Senior Software/Big Data Engineer Resume
OR
SUMMARY
- Over 7 years of experience in the development, implementation and testing of Data Warehousing and Business Intelligence solutions
- 5 years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
- Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase and Spark.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Hive, Pig, and PySpark.
- Experienced in using distributed computing architectures such as Hadoop and Spark, and in the effective use of MapReduce, Python, SQL and Cassandra to solve big data problems
- Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala and Chef.
- Experience in developing and designing POCs using Scala, Spark SQL and MLlib libraries, then deploying them on the YARN cluster
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Hortonworks and Cloudera.
- Expertise in integration of various data sources like RDBMS, spreadsheets, text files, JSON and XML files.
- Experience in extracting the data from RDBMS into HDFS using Sqoop.
- Experience in collecting the logs from log collector into HDFS using Flume.
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper
- Solid knowledge of Data Marts, Operational Data Store (ODS), and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services
- Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
- Experience in NoSQL databases - HBase, Cassandra & MongoDB - database performance tuning & data modeling.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience in architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra
- Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs and sends them to a Kafka- and ZooKeeper-based log collection platform
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce (MR), Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark with Scala
Operating Systems / Environment: Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans
Modeling Tools: UML on Rational Rose, Rational Clear Case, Enterprise Architect, Microsoft Visio
IDEs: Eclipse, NetBeans; JUnit testing tool; Log4j for logging
Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, NoSQL (HBase, MongoDB, Cassandra)
Web Servers: WebLogic, WebSphere, Apache Tomcat 7
Build Tools: Maven, Scala Build Tool (SBT), Ant
Operating systems and Virtual Machines: Linux (Red Hat, Ubuntu, Centos), Oracle virtual box, VMware player, Workstation 11
ETL Tools: Talend for Big data, Informatica
PROFESSIONAL EXPERIENCE
Confidential - OR
Senior Software/Big Data Engineer
Responsibilities:
- Worked on building an ingestion framework to ingest data from different sources like Oracle, SQL Server, delimited flat files, XML, Parquet and JSON into Hadoop, and building tables in Hive
- Worked on building big data analytic solutions to provide near real time and batch data as per Business requirements.
- Assisted in leading the plan, building, and running states within the Enterprise Analytics Team
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
- Performed detailed analysis of business problems and technical environments, and used this data in designing the solution and maintaining the data architecture
- Designed and developed software applications, tests, and build automation tools.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences
- Worked on building a Spark framework to ingest data into Hive external tables and run complex computational and non-equi-join SQL in Spark.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Used the Oozie workflow engine to schedule and run multiple Hive and Pig jobs.
- Involved in developing a data quality tool for checking all data ingestion into Hive tables
- Collaborated with BI teams to ensure data quality and availability with live visualization
- Designed, developed and maintained workflows in Oozie, integrating Shell, Java, Sqoop, Hive and Spark actions in Oozie workflow nodes to run data pipelines
- Designed and supported multi-tenancy on our data platform to allow other teams to run their applications
- Used Impala for low latency queries, visualization and faster querying purposes.
- Created Hive queries to process large sets of structured, semi-structured and unstructured data and store it in Managed and External tables.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Coded to ENCRYPT/DECRYPT data for PII groups.
- Performed Real time event processing of data from multiple servers in the organization using Apache Kafka and Flume.
- Processed JSON files and ingested into Hive tables
- Used python to parse XML files and created flat files from them.
- Used Hbase to support front end applications that retrieve data using row keys.
- Used Control-M as Enterprise Scheduler to schedule all our jobs
- Used Bit-Bucket extensively for code repository
Environment: Cloudera, Hue, Java, Python, SQL, shell scripting, Control-M, Oozie, Spark, Sqoop, Bitbucket, Hive, Impala
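The XML-to-flat-file parsing mentioned in the responsibilities above can be sketched with the Python standard library; the element names, fields and delimiter here are invented for illustration, not taken from the actual project:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; real feeds would be read from files on an edge node.
SAMPLE_XML = """<customers>
  <customer id="1001"><name>Acme Corp</name><state>OR</state></customer>
  <customer id="1002"><name>Globex</name><state>TX</state></customer>
</customers>"""

def xml_to_delimited(xml_text: str, delimiter: str = "|") -> str:
    """Flatten <customer> records into a delimited flat-file body
    suitable for loading into a Hive external table."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["id", "name", "state"])  # header row
    for cust in root.iter("customer"):
        writer.writerow([cust.get("id"), cust.findtext("name"), cust.findtext("state")])
    return out.getvalue()

print(xml_to_delimited(SAMPLE_XML))
```

In practice the resulting flat file would be copied to HDFS and exposed through a Hive external table definition.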
Confidential - TX
Senior Software/Big Data Engineer
Responsibilities:
- Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into the HDFS system.
- Installed and configured Hadoop ecosystem components like HBase, Flume, Pig and Sqoop.
- Architected, designed and developed business applications and Data Marts for reporting.
- Worked with SMEs and conducted JAD sessions; documented the requirements using UML and use case diagrams
- Used an SDLC methodology for Data Warehouse development using Kanbanize.
- Configured Apache Mahout Engine.
- Used Agile (SCRUM) methodologies for Software Development.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Developed Big Data solutions focused on pattern matching and predictive modeling
- The objective of this project was to build a data lake as a solution using Apache Hadoop.
- Developed the code to perform data extractions from the Oracle database and load it into the Hadoop platform using the ingestion framework.
- Created Hive external tables to stage data and then moved the data from staging to main tables
- Worked on exporting data from Hive tables into the Oracle database.
- Pulled the data from the data lake (HDFS) and massaged the data with various RDD transformations.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
- Created a data pipeline using processor groups and multiple processors in Apache NiFi for flat files and RDBMS as part of a POC using Amazon EC2.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded the data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem
- Used Hortonworks distribution with Infrastructure Provisioning / Configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in different phases of Development life including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on configuring and managing disaster recovery and backup of Cassandra data.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile and network devices, and pushed it to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache Nifi 1.7, Yarn, HBase, PL/SQL, MongoDB, Pig 0.17, Sqoop 1.4, Apache Flume 1.8
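The bucketing technique mentioned above can be illustrated in plain Python; this is a simplified stand-in for Hive's bucket assignment (Hive applies its own hash function internally), with invented keys:

```python
from collections import defaultdict

def assign_bucket(key: str, num_buckets: int) -> int:
    """Simplified bucket assignment: hash the key and take it modulo the
    bucket count, so equal keys always land in the same bucket."""
    return sum(ord(c) for c in key) % num_buckets

def bucket_rows(rows, key_field, num_buckets):
    """Group rows into buckets by the hashed value of the bucketing column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[assign_bucket(row[key_field], num_buckets)].append(row)
    return buckets

rows = [{"id": "a1"}, {"id": "b2"}, {"id": "c3"}, {"id": "a1"}]
buckets = bucket_rows(rows, "id", 4)
```

Because identical keys hash to the same bucket, joins and sampling on the bucketed column can skip most of the data, which is what makes bucketed Hive tables faster to query.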
Confidential - Bridgeport, CT
Hadoop Developer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Researched, evaluated, architected, and deployed new tools, frameworks and patterns to build sustainable Big Data platforms.
- Designed and developed the architecture for a data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Responsible for the data architecture design delivery, data model development, review, approval and Data Warehouse implementation.
- Designed and developed the conceptual, logical and physical data models to meet the needs of reporting.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational databases and maintained database objects in the data model using Erwin.
- Responsible for Big Data initiatives and engagement, including analysis, brainstorming, POCs, and architecture.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN and MapReduce.
- Performed the Data Mapping and Data design (Data Modeling) to integrate the data across multiple databases into the EDW.
- Designed both 3NF data models and dimensional data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
- Worked on data modeling and advanced SQL with columnar databases using AWS.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Cleansed, extracted and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Involved in debugging and tuning the PL/SQL code, tuning queries, and optimizing the SQL database.
- Led data migration from legacy systems into modern data integration frameworks, from conception to completion.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Involved in performing extensive back-end testing by writing SQL queries and PL/SQL stored procedures to extract the data from the SQL database.
- Participated in code/design reviews and provided input into best practices for report and universe development.
- Involved in the validation of the OLAP reports: unit testing and system testing of the OLAP report functionality and the data displayed in the reports.
- Created a high-level, industry-standard, generalized data model and converted it into logical and physical models at later stages of the project using Erwin and Visio.
- Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
Environment: Erwin 9.7, HDFS, HBase, Hadoop 3.0, Metadata, MS Visio 2016, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
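A source-to-target mapping document of the kind described above can be sketched as a simple rename-and-validate step; the source and target column names below are invented for illustration:

```python
# Hypothetical source-to-target mapping, in the spirit of the mapping
# documents described above (column names are invented).
MAPPING = {
    "src_cust_nm": "customer_name",
    "src_cust_dob": "date_of_birth",
    "src_st_cd": "state_code",
}

def apply_mapping(source_row: dict, mapping: dict) -> dict:
    """Rename source columns to their EDW target names, failing loudly on
    any source column that has no mapping entry."""
    unmapped = [c for c in source_row if c not in mapping]
    if unmapped:
        raise KeyError(f"unmapped source columns: {unmapped}")
    return {mapping[c]: v for c, v in source_row.items()}

row = {"src_cust_nm": "Jane Doe", "src_cust_dob": "1980-01-01", "src_st_cd": "CT"}
print(apply_mapping(row, MAPPING))
```

Failing on unmapped columns is the point: a mapping document is only trustworthy if every source element is accounted for in the target model.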
Confidential - San Francisco, CA
Data Analyst/Data Engineer
Responsibilities:
- Participated in requirement gathering session, JAD sessions with users, Subject Matter experts, Architect's and BAs.
- Optimized and updated UML models (Visio) and Relational Data Models for various applications.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
- Participated in JAD sessions involving the discussion of various reporting needs.
- Reverse engineered the existing data marts and identified the Data Elements, Dimensions, Facts and Measures required for reports.
- Extensively used PL/SQL in writing database packages, stored procedures, functions and triggers in Oracle.
- Created data dictionaries for various data models to help other teams understand the actual purpose of each table and its columns.
- Developed the required data warehouse model using a Star schema for the generalized model.
- Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
- Used forward engineering approach for designing and creating databases for OLAP model.
- Developed and maintained a Data Dictionary to create metadata reports for technical and business purposes.
- Worked with the BI team in providing SQL queries, Data Dictionaries and mapping documents.
- Responsible for the analysis of business requirements and design implementation of the business solution.
- Extensively involved in Data Governance, covering data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
- Designed and developed Oracle database tables, views and indexes, and maintained the databases by deleting and removing old data.
- Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
- Conducted user interviews and gathered requirements, analyzing the requirements using Rational Rose, Requisite Pro and RUP.
- Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
- Created E/R Diagrams and Data Flow Diagrams, grouped and created the tables, and validated the data.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Translated business concepts into XML vocabularies by designing Schemas with UML.
Environment: MS Visio 2014, PL/SQL, Oracle 11g, OLAP, XML, OLTP, SQL server, Transact-SQL.
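The data-dictionary work described above amounts to flattening table metadata into a readable report; a minimal sketch, with invented star-schema table and column definitions:

```python
# Invented example definitions for a dimension table and a fact table.
TABLES = {
    "dim_customer": [("customer_key", "INT", "Surrogate key"),
                     ("customer_name", "VARCHAR(100)", "Full legal name")],
    "fact_sales": [("customer_key", "INT", "FK to dim_customer"),
                   ("sale_amount", "DECIMAL(12,2)", "Sale amount in USD")],
}

def data_dictionary(tables: dict) -> list:
    """Flatten table metadata into 'table.column<TAB>type<TAB>description'
    lines, sorted by table name for a stable report."""
    lines = []
    for table, cols in sorted(tables.items()):
        for name, dtype, desc in cols:
            lines.append(f"{table}.{name}\t{dtype}\t{desc}")
    return lines

for line in data_dictionary(TABLES):
    print(line)
```

In a real project the definitions would be pulled from the modeling tool or the database catalog rather than hand-written.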
Confidential, IL
ETL Consultant
Responsibilities:
- Worked as an Informatica developer and involved in creation of initial documentation for the project and setting the goals for Data Integration team from ETL perspective.
- Played a key role in designing the application that would effectively migrate the existing data into the Annuity warehouse using Informatica PowerCenter.
- Parsed high-level design spec to simple ETL coding and mapping standards.
- Created FTP connections and database connections for the sources and targets.
- Involved in creating test files and performed testing to check the errors.
- Loaded Data to the Interface tables from multiple data sources such as MS Access, SQL Server, Flat files and Excel Spreadsheets using SQL Loader, Informatica and ODBC connection.
- Created different transformations for loading the data into targets, such as Source Qualifier, Joiner, Update Strategy, Lookup (connected and unconnected), Rank, Expression, Aggregator, and Sequence Generator transformations.
- Simplified the data flow by using a Router transformation to check multiple conditions at the same time.
- Created reusable transformations and mapplets to encapsulate the common aspects of the data flow and avoid complexity in the mappings.
- Created sessions, sequential and concurrent batches for proper execution of mappings using workflow manager.
- Used shortcuts to reuse objects without creating multiple objects in the repository and inherit changes made to the source automatically.
- Extensively worked with SQL scripts to validate the pre and post data load.
- Used Session parameters, Mapping variable/parameters and created Parameter files for imparting flexible runs of workflows based on changing variable values.
- Responsible for monitoring scheduled, running, completed and failed sessions. Involved in debugging the failed mappings and developing error handling methods.
- Generated weekly and monthly report Status for the number of incidents handled by the support team.
- Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
- Designed a mapplet to update a slowly changing dimension table to keep full history which was used across the board.
- Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
- Responsible for post production support and SME to the project.
Environment: Informatica PowerCenter, Oracle 10g, PL/SQL, SQL Server, SQL Developer, TOAD, Windows NT, Stored Procedures, Business Intelligence Development Studio, Microsoft Visio 2003, Business Objects.
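The slowly-changing-dimension mapplet described above (Type 2: keep full history) can be sketched in plain Python; field names and sample values are hypothetical, and the real implementation lived in Informatica, not Python:

```python
from datetime import date

def scd2_upsert(dimension: list, key: str, new_attrs: dict, today: date) -> None:
    """Type-2 SCD update: expire the current row for `key` if its
    attributes changed, then append a new current row, preserving history."""
    for row in dimension:
        if row["key"] == key and row["end_date"] is None:
            if all(row[k] == v for k, v in new_attrs.items()):
                return  # no change: nothing to do
            row["end_date"] = today  # close out the old version
            break
    dimension.append({"key": key, **new_attrs,
                      "start_date": today, "end_date": None})

dim = []
scd2_upsert(dim, "C1", {"city": "Chicago"}, date(2020, 1, 1))
scd2_upsert(dim, "C1", {"city": "Denver"}, date(2021, 6, 1))
# dim now holds the expired Chicago row and the current Denver row
```

The design choice here is that history is never overwritten: every change appends a row, and "current" is simply the row whose end date is open.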
Confidential, McLean, VA
ETL Developer
Responsibilities:
- Experience in developing logical data models, reverse engineering and physical data modeling of a CRM system using Erwin and InfoSphere.
- Involved in the design and development of Data Migration from the legacy system using Oracle Loader and import/export tools for the OLTP system.
- Worked closely with the Data Business Analyst to ensure the process stays on track, develop consensus on data requirements, and document data element/data model requirements via the approved process and templates.
- Was involved in writing Batch Programs to run Validation Packages.
- Extensively worked on Informatica Power Center-Source analyzer, Data warehousing designer, Mapping Designer, Mapplet and Transformations to import source and target definitions into the repository and to build mappings.
- Extensive use of Stored Procedures/Functions/Packages and User-Defined Functions
- Made proper use of indexes to enhance the performance of individual queries and stored procedures for the OLTP system
- Dropped and recreated the indexes on tables for performance improvements in the OLTP application
- Tuned SQL queries using Showplans and Execution Plans for better performance
- Followed full life cycle software development processes, especially as they pertain to data movement and data integration
Environment: Informatica PowerCenter, Oracle 10g, SQL, PL/SQL, DB2, Stored Procedures, UNIX
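The index-tuning work described above (dropping/recreating indexes and reading execution plans) can be demonstrated with SQLite as a stand-in for the production Oracle/DB2 databases; the table and column names are invented:

```python
import sqlite3

# In-memory database with a hypothetical OLTP table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policy_id INTEGER, holder_name TEXT)")
conn.executemany("INSERT INTO policy VALUES (?, ?)",
                 [(i, f"holder_{i}") for i in range(1000)])

def query_plan(conn):
    """Return SQLite's execution-plan text for a point lookup on policy_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM policy WHERE policy_id = 500").fetchall()
    return " ".join(r[-1] for r in rows)

before = query_plan(conn)  # no index yet: the planner scans the table
conn.execute("CREATE INDEX idx_policy_id ON policy (policy_id)")
after = query_plan(conn)   # with the index: the planner searches the index

print(before)
print(after)
```

The same workflow applies on Oracle or SQL Server, just with their own plan tools (Explain Plan, Showplan) in place of SQLite's `EXPLAIN QUERY PLAN`.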