Data Engineer Resume
NC
SUMMARY:
- 12 years of IT experience, including more than 4 years in Big Data and 8 years in Data Warehousing.
- Worked as an Integration Specialist, Solution Architect, Data Engineer, and Data Analyst.
- Experience with Apache Hadoop technologies: Hadoop Distributed File System (HDFS), the MapReduce framework, YARN, Pig, Hive, Sqoop, Spark, BigSheets, and BigSQL.
- Strong experience in Apache Hadoop MapReduce programming, Pig scripting, and HDFS.
- Good knowledge of Hadoop cluster administration; monitored and managed Hadoop clusters using Cloudera Manager.
- Experience in data transformation and data analysis using HiveQL and Pig Latin.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Worked with NoSQL databases such as HBase.
- Experience using Oozie to define and schedule jobs.
- Worked with BigSheets for data and report validation.
- Experience using the Spark API over Hadoop MapReduce to perform analytics on data; applied performance optimizations such as the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Good hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations in Spark with Scala.
- Worked on Hive queries through Spark SQL, which integrates with the Spark environment.
- Experience converting Hive queries into Spark transformations using Spark RDDs and Scala (a minimal sketch follows this summary).
- Experience with large-scale distributed systems and ETL tools.
- Experience in OLTP modeling and OLAP dimensional modeling (star and snowflake schemas).
- Worked with multiple relational databases such as Oracle 8i/9i/10g, SQL Server 2005 and DB2.
- Experience in UNIX shell scripting and Scala.
- Knowledge of core Java.
- Good experience in data warehousing methodologies and dimensional data modeling techniques such as star/snowflake schemas using ERwin.
- Strong skills in SQL and PL/SQL functions, stored procedures, and materialized views to implement business logic in Oracle databases.
- Experience with Database SQL tuning and query optimization.
- Extensive experience developing Informatica mappings, mapplets, sessions, worklets, and workflows for data loads.
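The Spark and Hive bullets above refer to a common pattern; the sketch below is a minimal, hypothetical Spark Scala example (paths, database, and table names are placeholders, not from an actual project) that builds an RDD and a DataFrame from HDFS input, runs a Hive-style aggregation through Spark SQL, and applies a broadcast (map-side) join against a small dimension table.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object SparkHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()                 // lets Spark SQL read existing Hive tables
      .getOrCreate()
    import spark.implicits._

    // Hypothetical HDFS path; raw delimited records are parsed into typed columns
    val ordersRdd = spark.sparkContext.textFile("hdfs:///data/raw/orders/*.csv")
    val ordersDf = ordersRdd
      .map(_.split(","))
      .map(cols => (cols(0), cols(1), cols(2).toDouble))
      .toDF("order_id", "region", "amount")

    // A Hive-style aggregation expressed as a Spark SQL query over a temp view
    ordersDf.createOrReplaceTempView("orders")
    val regionTotals = spark.sql(
      "SELECT region, SUM(amount) AS total_amount FROM orders GROUP BY region")

    // Broadcast (map-side) join against a small Hive dimension table
    val regionDim = spark.table("dw.region_dim")
    val enriched  = regionTotals.join(broadcast(regionDim), Seq("region"))

    enriched.write.mode("overwrite").saveAsTable("dw.region_totals")
    spark.stop()
  }
}
```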
PROFESSIONAL EXPERIENCE:
Confidential, NC
Data Engineer
Environment: DataStage, BigSheets, BigSQL, HDFS, Pig, Hive, Sqoop, HBase, Spark.
Responsibilities:
- Responsible for designing and managing the Sqoop jobs that migrated data from DB2 (mainframe) to HDFS.
- Used DataStage to ingest incremental data from different source systems.
- Created Pig and Hive scripts to ingest, extract, and manage data in HDFS.
- Created different schemas in Hive to store processed data in a tabular format.
- Used HBase to store incremental data (see the sketch after this list).
- Used BigSheets and BigSQL to validate data and reports.
- Implemented test scripts to support test driven development and continuous integration.
- Handled risk assessment and subsequently created contingency and mitigation plans.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Participated in the requirement gathering and analysis phase of the project, documenting the business requirements.
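As a rough illustration of the incremental HBase storage mentioned above, the sketch below writes one record through the standard HBase client API from Scala; the table name, column family, and row-key layout are hypothetical placeholders rather than the actual project schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseIncrementalWrite {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()          // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Hypothetical table and column family; the row key combines entity, load date, and id
      val table = connection.getTable(TableName.valueOf("incremental_orders"))
      val put = new Put(Bytes.toBytes("order#2016-01-01#12345"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("amount"), Bytes.toBytes("199.99"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("status"), Bytes.toBytes("NEW"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```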
Confidential, Los Angeles, CA
Data Engineer
Environment: Oracle 11g, HDFS, Pig, Hive, Oozie, Sqoop, Informatica.
Responsibilities:
- Collected business requirements from the Business Partners and Subject Matter Experts.
- Worked on loading data from ERP sources into HDFS using Sqoop.
- Created various Pig and Hive scripts to ingest, extract, and manage data in HDFS.
- Responsible for designing and managing Sqoop jobs that loaded data from Oracle into HDFS.
- Wrote Hive queries to provide a consolidated view of the transactional data for analysis and ad hoc querying.
- Created partitioned tables in Hive for better performance and faster querying (see the sketch after this list).
- Loaded data into HBase for reporting, so that business users could analyze and visualize it.
- Used Python to embed data from a file attachment into the body of the email.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Participated in the requirement gathering and analysis phase of the project and documented the business requirements.
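A minimal sketch of the partitioned Hive tables and consolidated queries described above. The original work issued HiveQL directly; this example runs equivalent statements through a Hive-enabled Spark session, and the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioning by load date keeps scans limited to the partitions a query actually needs
    spark.sql(
      """CREATE TABLE IF NOT EXISTS erp.transactions (
        |  txn_id      STRING,
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Consolidated view of transactional data for ad hoc analysis;
    // the txn_date predicate prunes partitions instead of scanning the whole table
    val monthly = spark.sql(
      """SELECT customer_id, SUM(amount) AS month_total
        |FROM erp.transactions
        |WHERE txn_date BETWEEN '2014-01-01' AND '2014-01-31'
        |GROUP BY customer_id""".stripMargin)

    monthly.show(20)
    spark.stop()
  }
}
```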
Confidential, CA
Sr. ETL Consultant
Environment: Informatica 9.1, MS SQL Server, Oracle 10g, Linux, Anaplan.
Responsibilities:
- Worked in Informatica 9.x/8.x on the ETL from an MS SQL Server source to the data warehouse.
- Created the technical requirements document based on the FRD.
- Provided estimated timelines for integration work.
- Completed the development of Informatica mappings, sessions, and workflows.
- Worked with Trillium libraries for cleansing the address fields.
- Created the UNIX scripts required to process data from source to target, including SFTP scripts and encryption/decryption scripts.
- Designed and developed the error-handling mechanism using Oracle stored procedures and UNIX scripts.
- Worked with the EBS Concurrent Program Manager for Informatica job scheduling.
- Provided support activities and enhancements for ETM.
Confidential, San Jose, CA
ETL/ Data Analyst
Environment: Informatica 8.6, Teradata R12, Oracle 9i, OBIEE 10.x, Trillium, DataFlux.
Responsibilities:
- Extensively created CRs to fix data issues on Teradata (long-term and short-term fixes).
- Used the DataFlux tool extensively to compare data between Oracle and Teradata.
- Created SQL scripts for daily data validations in the Products subject area.
- Collected requirements for data quality checks on data within individual applications and across applications using DataFlux.
- Defined data quality rules, especially for data elements whose quality is not controlled by the application itself due to various application limitations.
- Worked with Informatica PowerCenter to create mappings for extracting data from various sources, including Oracle databases and flat files.
- Worked on OBIEE for generating reports and dashboards for DQG (Data Quality & Governance) project.
- Migrated data from Oracle to Teradata by writing BTEQ, FastLoad, and MultiLoad scripts.
- Wrote Teradata BTEQ, FastLoad, MultiLoad, TPump, and FastExport scripts to import and export data.
Confidential, Pleasanton, CA
ETL Development Lead
Environment: Informatica 8.6, Teradata R12, Oracle 9i.
Responsibilities:
- Worked with the business analysts and users to develop, document, and evolve process and data models based on their input and feedback.
- Prepared and documented logical and physical data models.
- Worked with DBAs to design and build the staging and target databases.
- Worked in Informatica 8.6 for the Extraction, Transformation and Loading from various sources to the enterprise data warehouse.
- Developed and tested extraction, transformation, and load (ETL) processes.
- Involved in change data capture (CDC) ETL process.
- Created sessions and workflows for the source to staging data flow and staging to Target loading.
- Implemented Type II changes for different dimensions (see the sketch after this list).
- Developed functions and stored procedures to aid complex mappings.
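The Type II (slowly changing dimension) handling described in these roles was implemented in Informatica mappings; purely as an illustration of the same pattern, here is a minimal Spark/Scala sketch with hypothetical data and column names that expires the current version of a changed row and inserts a new current version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ScdType2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scd2-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical current dimension rows and an incoming extract for one load date
    val dim = Seq(
      ("C1", "Gold",   "2010-01-01", "9999-12-31", true),
      ("C2", "Silver", "2010-01-01", "9999-12-31", true)
    ).toDF("customer_id", "tier", "eff_date", "end_date", "is_current")

    val incoming = Seq(("C1", "Platinum"), ("C3", "Bronze"))
      .toDF("customer_id", "new_tier")

    val loadDate = "2012-06-01"

    // Keys whose tracked attribute changed in the incoming feed
    val changedKeys = dim.filter($"is_current")
      .join(incoming, "customer_id")
      .filter($"tier" =!= $"new_tier")
      .select($"customer_id")

    // Close out (expire) the current version of each changed key
    val expired = dim.filter($"is_current")
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .withColumn("end_date", lit(loadDate))
      .withColumn("is_current", lit(false))

    // Keep every row that is not a current version of a changed key
    val untouched = dim.join(changedKeys.select($"customer_id".as("chg_id")),
      $"customer_id" === $"chg_id" && $"is_current", "left_anti")

    // Insert a new current version for changed keys and for brand-new keys
    val currentPairs = dim.filter($"is_current").select($"customer_id", $"tier")
    val newVersions = incoming
      .join(currentPairs,
        incoming("customer_id") === currentPairs("customer_id") &&
          incoming("new_tier") === currentPairs("tier"), "left_anti")
      .select($"customer_id", $"new_tier".as("tier"),
        lit(loadDate).as("eff_date"), lit("9999-12-31").as("end_date"),
        lit(true).as("is_current"))

    untouched.unionByName(expired).unionByName(newVersions)
      .orderBy("customer_id", "eff_date")
      .show(false)
    spark.stop()
  }
}
```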
Confidential, Sunnyvale, CA
Informatica Consultant / Onsite Coordinator
Environment: Informatica 7.1, Oracle, Oracle Apps, Unix (AIX).
Responsibilities:
- Worked in Informatica 7.1 for the Extraction, Transformation and Loading from various sources to the enterprise data warehouse.
- Interacted with the business analysts to gather the project requirements.
- Responsibilities included production support and enhancements.
- Involved in bulk extracts as well as change data capture (CDC).
- Created sessions and workflows for the source to staging data flow and staging to Target loading.
- Designed, developed, and tested the various mappings and mapplets involved in the ETL process.
- Implemented Type II changes for different Dimensions.
Confidential, NJ
Informatica Consultant
Environment: Informatica 8.1, MS SQL Server, Oracle 10g, UNIX, DB Artisan.
Responsibilities:
- Worked in Informatica 8.1 on the ETL from an MS SQL Server source to the governance technology data warehouse.
- Created sessions and workflows for the source-to-staging data flow and the staging-to-data-warehouse loading.
- Designed, developed, and tested the various mappings, mapplets, worklets, and workflows involved in the ETL process.
- Implemented Type II changes for Project and Organization Dimensions.
- Developed stored procedures to aid complex mappings.
- Involved in fixing invalid mappings, testing stored procedures, and unit and integration testing of Informatica sessions and the target data.
- Created, tested, and documented UNIX scripts in detail and implemented them during the quality analysis phase.
Confidential, Boston, MA
ETL Consultant
Environment: Informatica 8.1, Mainframe, Oracle 9i, UNIX (AIX), Toad.
Responsibilities:
- Worked in Informatica 8.1 for the ETL from various sources to the central data warehouse.
- Designed, developed, and tested the various mappings involved in the ETL process.
- Designed a mapping to concatenate the mainframe fixed-length flat files instead of using the Normalizer transformation.
- Created customized transformations to handle miscellaneous characters.
- Created various mapplets and worklets per the business requirements.
- Implemented common error handling for all the workflows loading fact data.
- Implemented Type I and Type II slowly changing dimensions.
- Designed Informatica components for daily and monthly incremental loading of around 50 tables.
- Created Unix scripts for scheduling.
Confidential, San Jose, CA
Informatica Consultant
Environment: Informatica PowerCenter 8.1, Oracle 10g, SQL*Plus, Business Objects XI.
Responsibilities:
- Validated Informatica mappings for source compatibility after version changes at the source level.
- Converted Oracle procedures to Informatica mappings.
- Customized mappings to accommodate the new environment.
- Performance-tuned Informatica mappings.
- Migrated repositories across the Development, SIT, and UAT environments.
- Changed the universe design to accommodate new business requirements.
- Created new Deski and Webi reports per user requirements.
- Documented existing Informatica mappings.
- Created the UAT environment for Informatica by creating a repository from the production repository.
- Took a backup of the production repository and restored it into DEV and UAT.
- Migrated reports from Business Objects 6.5 to XI R2.
Confidential, Minneapolis, MN
Informatica Consultant
Environment: Informatica 7.1, DB2 v7, Oracle 9i, Unix (AIX).
Responsibilities:
- Worked in Informatica 7.1 for the ETL from various sources to the central data warehouse.
- Created mappings, sessions, and workflows for loading three fact tables and six dimension tables.
- Designed, developed, and tested the various mappings involved in the ETL process.
- Extensively used transformations such as Router, Aggregator, Normalizer, Source Qualifier, Joiner, Expression, and Sequence Generator.
- Implemented a common restart strategy for all the workflows.
Confidential, Boise, Idaho
Informatica Consultant
Environment: Informatica 7.1, Teradata, Mainframes, Unix (AIX).
Responsibilities:
- Worked in Informatica 7.1 for the ETL from various remote sources to the central data warehouse.
- Involved in bulk extracts as well as change data capture (CDC) from the remote mainframe source using San info mover.
- Created sessions and workflows for the source to staging data flow and staging to presentation layer loading.
- Designed, developed, and tested the various mappings and mapplets involved in the ETL process.
- Scheduled the workflow for change data capture from the source.
- Implemented Type II changes for Currency and Unit Of Measure Dimensions.
- Created a master workflow for the daily extract from the remote sources to the staging area and then to the presentation layer.