Senior Data/ETL Engineer Resume
Irvine, CA
SUMMARY
- Total 8 years of experience implementing ETL solutions using the Hadoop framework and Informatica Power Center/Big Data Edition.
- Two years of experience with Big Data frameworks and migrating projects from traditional ETL to a Hive warehouse.
- Designed and implemented SCD ETL data flow in Hive Warehouse.
- Experience working with NoSQL databases. Designed data models and solutions based on Cassandra.
- Used Solr to query huge volumes of data.
- Good understanding of Hadoop and MapReduce internals.
- Implemented a Hive solution to convert semi-structured data into a format consumable by Data Scientists.
- Performed dynamic partitioning on Hive external and internal tables for faster query results (see the sketch after this summary).
- Experience working with Apache Spark Streaming, Spark SQL and DataFrames.
- Proficient in UNIX shell scripting, Perl and Python.
- Wrote shell scripts to execute workflows, load data from source XMLs into staging tables, and purge data from tables and their corresponding paths.
- Worked in all phases of the Software Development Life Cycle (SDLC), with experience in the Education, Financial, Insurance, Telecom, Healthcare and Biotechnology domains.
- Developed PL/SQL Stored procedures using Functions, Packages, Triggers and Cursors.
- Used the Informatica Data Quality 9.6.1 (IDQ) toolkit for analysis and wrote complex rules for data cleansing and data matching/conversion using components such as the Character Labeler, Rule-Based Analyzer and Dictionary Manager.
- Involved in building Enterprise Data Warehouses (EDW) and Data Marts using the ERwin data modeling tool.
- Worked with Informatica Client Tools - Repository Manager, Mapping Designer, Transformation Developer, Mapplet Designer, Workflow Manager and Monitor.
- Expert in troubleshooting and performance tuning at the source, mapping, target and session levels.
- Developed complex mappings using Informatica Power Center transformations - Lookup, Filter, Expression, Router, Joiner, Update Strategy, Aggregator, Stored Procedure, Sorter and Sequence Generator.
- Experience developing complex Mapplets, Worklets, reusable Tasks and reusable mappings.
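Below is a minimal sketch of the Hive dynamic-partitioning pattern referenced above, shown through Spark SQL with Hive support; the table and column names (student_activity_stg, student_activity, load_date) are hypothetical and not taken from any of the projects listed here.

```python
# Minimal sketch of Hive dynamic partitioning via Spark SQL.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dynamic-partition-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Let Hive derive partition values from the data instead of a static spec.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Load into a table partitioned by load_date; each distinct load_date in the
# staging data becomes its own partition directory.
spark.sql("""
    INSERT OVERWRITE TABLE student_activity PARTITION (load_date)
    SELECT student_id, course_id, activity_ts, load_date
    FROM student_activity_stg
""")
```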
TECHNICAL SKILLS
Big Data Technologies: Hive, Sqoop, Kafka, ZooKeeper, Redis, Storm, Spark, HDFS, YARN
Hadoop Distributions: Hortonworks, Azure, AWS (Amazon Web Services) Cloud, Cloudera
ETL Data Warehousing Tools: Informatica Power Center 9.6.1/9.5.1/9.1.0/8.6.1, Power Exchange, Informatica Data Quality (IDQ), Big Data Edition (BDE), IBM WebSphere DataStage 9.1/8.5
Dimensional Data Modeling: Data Modeling, Star/Snowflake models, ERwin Data Modeler, Visio, Wiki, Jira, Trello
Business Intelligence Reporting: Business Objects 6.5, Cognos Series 7.0, SQL Server Reporting and Analysis, WebFOCUS
Programming: Unix Shell Scripting, Python, SQL, PL/SQL, Transact-SQL, C++
Databases/Tools: Cassandra, Solr, AWS Redshift, Oracle, IBM Netezza, Microsoft SQL Server
PROFESSIONAL EXPERIENCE
Confidential, Irvine, CA
Senior Data/ETL Engineer
Responsibilities:
- Attended business meetings with end users and the UAT team, as well as student advisor meetings/demos, to understand the new front end so data could be analyzed against changing data sources.
- Integrated data from multiple sources, including Canvas and Registrar/Admissions data, using the Informatica ETL tool and stored procedures.
- Converted processes from Tableau ETL to Informatica Power Center, which improved performance for large daily fact table loads.
- Pulled high-volume Canvas fact and dimension files (70 different files in a star schema) daily from the AWS (Amazon Web Services) cloud to the local Informatica SFTP server using a Python script (see the sketch after this list).
- Performed analysis on both historic and real-time data using Spark Streaming and Spark SQL in Python before loading the data into storage for Compass reporting.
- Worked on converting warehouse from SQL server to Oracle database using stored procedures.
- Provided analysis and Type 2 data for Cognos reports that give interim student learner activity information to advisors and student services support staff, offering a more complete view of student progress/success.
- Developed a design for Audit table load for the Compass project and other Informatica jobs.
- Performed Cognos reporting on top of the Audit table to track the analytics and performance of daily data loads.
- Created Probation, Demographic and Lower Division reports for future use of learning management data, providing “nudges” to students to encourage improved study habits and interaction with course materials.
- Worked on Informatica IDQ mappings to write rules for completing data profiling, cleansing.
- Worked with JIRA Agile for agile planning, management and integration, as it gives each user the flexibility to plan sprints, changes and releases.
- Updated wiki and Trello for task tracking and documentation.
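A minimal sketch of the daily Canvas file pull referenced above, assuming the extract files are staged in an S3 bucket and fetched with boto3; the bucket name, key prefix and landing directory are hypothetical.

```python
# Minimal sketch of the daily Canvas file pull; assumes the extracts land in
# an S3 bucket. Bucket, prefix and local path are hypothetical.
import os
import boto3

BUCKET = "canvas-extracts"               # hypothetical bucket
PREFIX = "daily/"                        # hypothetical key prefix
LOCAL_DIR = "/data/informatica/canvas"   # hypothetical SFTP landing directory

s3 = boto3.client("s3")

# List every object under the prefix and copy it to the local landing area,
# where the Informatica workflows pick the files up as flat-file sources.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        name = os.path.basename(key)
        if not name:
            continue  # skip the prefix "folder" entry itself
        target = os.path.join(LOCAL_DIR, name)
        s3.download_file(BUCKET, key, target)
        print(f"downloaded {key} -> {target}")
```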
Environment: Hive 1.2, Hortonworks, Kafka, Cassandra, Apache Spark, Python, Informatica Power Center 9.6.1/Data Quality (IDQ), XML, AWS cloud, Wiki, JIRA Studio/Agile, Cognos Report Studio 10.2.2.0, Trello, SQL Server 2012, Oracle, Microsoft SQL Server Management Studio 11.0
Confidential, San Diego, CA
Senior Data Engineer
Responsibilities:
- Processed machine generated data into format consumable by Data Scientists using Hive.
- Used Kerberos authentication to connect to Hadoop from Edge node.
- Created external tables in Hive and linked them to source files.
- Used Hive functions such as extract and explode to convert data from XML into a structured column format (see the sketch after this list).
- Loaded data into final external table in Hive Warehouse for querying by reporting team.
- Extensively worked on Hive UDFs and Performance tuning.
- Optimized the performance of Ingestion and consumption.
- Used the Kafka messaging queue to read data into Storm.
- Used Storm to process data and metadata and load them into the Cassandra database.
- Created new Informatica SAP/ABAP extractors using SAP Exchange for Informatica for new custom finance contracts (near-real-time OLTP data), running delta or full loads from the source as needed.
- Worked on Informatica IDQ mappings to write rules for completing data profiling, cleansing and matching/removing duplicate data.
- Worked with ETL Architect/Data modelers to design ETL for jobs with performance issues/reporting bugs.
- Worked with general ledger/finance and product data and reports to correct amounts and produce accurate annual profit figures in reporting for business users.
- Involved in Cognos report unit testing after changes were published in the framework model.
- Used JIRA Studio for bug tracking, issue tracking and content management.
- Worked on updating Confidential website customer and product data which is located on AWS (Amazon Web Services) cloud.
- Used the Skybot scheduling tool to run individual/member jobs or job suites as a batch; member jobs ensure no data dependencies between dimension and fact tables are missed.
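A minimal sketch of the XML-to-columns conversion referenced above, using Hive's xpath/explode functions through Spark SQL; the table, column and XML element names are hypothetical.

```python
# Minimal sketch of converting XML stored in a Hive column into structured
# columns with xpath_string()/xpath() and explode(); the table, column and
# element names (raw_events, payload_xml, /event/...) are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# xpath_string() extracts a single value; xpath(... '/text()') returns an
# array of values, which LATERAL VIEW explode() turns into one row per value.
structured = spark.sql("""
    SELECT
        e.event_id,
        xpath_string(e.payload_xml, '/event/device/id') AS device_id,
        r.reading_value
    FROM raw_events e
    LATERAL VIEW explode(
        xpath(e.payload_xml, '/event/reading/value/text()')
    ) r AS reading_value
""")

# Land the structured rows in the table queried by the reporting team.
structured.write.mode("overwrite").saveAsTable("events_structured")
```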
Environment: Hive 1.2, Hadoop 2.4, Hortonworks, Azure, Kafka, ZooKeeper, Redis, Storm, YARN, Cassandra, Solr, Informatica Power Center 9.6.1/Data Quality (IDQ), SAP Exchange, XML, AWS Redshift, Wiki, JIRA Studio/Agile, Cognos Report Studio 10.2.2.0, Skybot, SQL Server 2012, Microsoft SQL Server Management Studio 11.0
Confidential, Brea, CA
Senior Data Engineer
Responsibilities:
- Coordinated with business analysts and users to get specific requirements to build new reports for Fluid Management project.
- Worked on developing Functional design, detailed level designs and Mapping documents.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Wrote HiveQL (HQL) statements using commands like Create, Drop, Alter, Describe, Truncate and Join, Aggregate grouping functions.
- Submitted HiveQL statements to Informatica BDE for execution to convert unstructured data into a structured format.
- Wrote HQL queries to produce monthly and yearly data such as inpatient/outpatient counts, mode of dialysis, vitals checks and patient risk factors for reporting team users (see the sketch after this list).
- Hands-on experience installing a Hadoop cluster (using the command-line interface) consisting of multiple data nodes and name nodes, and maintaining it on AWS EC2 servers using both Windows and Unix as the local OS.
- Coordinated with onsite teams at different locations across the US and with offshore teams to gather all required pieces of code and data, and ensured every QA/Production release was done per the project plan.
- Performed impact analysis on the whole warehouse when major design changes were implemented to improve the performance of fact/dimension jobs as data volumes changed.
- Wrote NZSQL scripts to load fact and dimension tables in Netezza using Aginity Workbench.
- Developed complex Mappings using Informatica Power Center Transformations - Lookup, Filter, Expression, Router, Joiner, Update Strategy, Aggregator, Stored Procedure, Sorter, Sequence Generator and Slowly Changing Dimensions.
- Wrote shell scripts to automate the creation of workflows directly in the repository, including the Start/End jobs required at the enterprise level, with minimal inputs such as a text file with source/target details and a SQL script to load the fact/dimension table.
- Point of contact for QA and Business users and helped resolve UAT issues, acted as release coordinator.
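A minimal sketch of the kind of monthly HQL aggregate referenced above, run here through the PyHive client as one possible interface; the host, database and schema (patient_visits, visit_type, visit_date) are hypothetical.

```python
# Minimal sketch of a monthly inpatient/outpatient count query against Hive.
# Host, database and column names are hypothetical.
from pyhive import hive

conn = hive.connect(host="hive-gateway.example.com", port=10000,
                    database="fluid_mgmt")
cursor = conn.cursor()

# Aggregate visits by year/month and visit type for the reporting team.
cursor.execute("""
    SELECT year(visit_date)  AS visit_year,
           month(visit_date) AS visit_month,
           visit_type,
           COUNT(*)          AS visit_count
    FROM patient_visits
    GROUP BY year(visit_date), month(visit_date), visit_type
    ORDER BY visit_year, visit_month
""")
for row in cursor.fetchall():
    print(row)
```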
Confidential, Naperville, IL
Senior ETL Production Support Engineer
Responsibilities:
- Adopted existing UNIX directories and scripts and modified them for the required data processes and for better performance and enhancements.
- Developed and implemented the new component trades flow, which includes loading the feed file via Perl/Unix shell, applying business rules through stored procedures and generating ad hoc reports in Perl (see the sketch after this list).
- Redesigned ETL jobs with performance issues as part of the Production Support on-call rotation.
- Created fix workflows to load data after job failures caused by data issues as part of Production Support.
- Developed custom item file jobs to provide special pricing on products to specific customers with bulk orders throughout the year.
- Identified and eliminated duplicates in datasets through IDQ 8.6.1 components such as Edit Distance and Mixed Field Matcher, enabling a single view of customers and helping control mailing-list costs by preventing duplicate mailings.
- Designed and developed the DataStage server as well as parallel jobs for extracting, cleansing, transforming, integrating and loading data using DataStage designer.
- Created UNIX shell scripts to utilize the DataStage Parallel engine to handle large volumes of data.
- Used HP Quality Center for incident management to track ongoing major L1 and L2 tickets, and played a key role in closing many critical issues/bugs during the company's high-profit/sales season.
- Used UC4 to schedule and monitor jobs.
- Used Quality center for tracking and resolving defects.
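A sketch of the component-trades feed flow referenced above. The original was implemented in Perl/Unix shell with stored procedures; this analogue uses Python with cx_Oracle, and the feed path, staging table and procedure names are hypothetical.

```python
# Python analogue of the component-trades flow (original used Perl/Unix shell).
# Feed path, connection details, table and procedure names are hypothetical.
import csv
import cx_Oracle

FEED_FILE = "/data/feeds/component_trades.dat"  # hypothetical pipe-delimited feed

conn = cx_Oracle.connect("etl_user", "etl_password", "PRODDB")  # hypothetical DSN
cursor = conn.cursor()

# 1. Load the feed file (no header row assumed) into a staging table.
with open(FEED_FILE) as fh:
    rows = [(r[0], r[1], float(r[2])) for r in csv.reader(fh, delimiter="|")]
cursor.executemany(
    "INSERT INTO stg_component_trades (trade_id, account_id, amount) "
    "VALUES (:1, :2, :3)",
    rows,
)

# 2. Apply the business rules through a stored procedure.
cursor.callproc("pkg_trades.apply_business_rules")

conn.commit()
cursor.close()
conn.close()
```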
Environment: Informatica Power Center/IDQ 8.6.1, IBM InfoSphere DataStage 8.5, PL/SQL, Perl scripting, Oracle 10g/11g, SQL Server, Toad for Oracle 10.0, IT Remedy, flat files, Business Objects Enterprise 11.5, ParAccel, UC4 scheduler.
Confidential, Monroe, LA
ETL Informatica Developer/Business Analyst
Responsibilities:
- Worked on migration of data from Confidential system to Qwest server as a part of merging of the two companies.
- Involved in Requirement Analysis, ETL Design and Development for extracting from the source systems and loading it into the Warehouse and DataMarts.
- Analyzed, designed, developed, implemented and maintained moderate to complex initial and incremental load mappings to provide data for enterprise data warehouse.
- Wrote UNIX scripts to run and schedule workflows on Production server for the daily runs.
- Used transformations like Source Qualifier, Expression, Filter, Lookup transformations for transformation of the data and loading into the targets.
- Worked on Teradata Utility scripts like FastLoad, MultiLoad to load data from various source systems to Teradata.
- Created BTEQ (Basic Teradata Query) scripts to generate keys (see the sketch after this list).
- Wrote complex SQL scripts to avoid Informatica joiners and look-ups to improve the performance where data volumes were heavy.
- Used BTEQ and SQL Assistant (Queryman) front-end tools to issue SQL commands matching the business requirements against the Teradata RDBMS.
- Automated the Informatica process to update a control table in database when maps are run successfully.
- Developed PL/SQL procedures/packages to kick off the SQL Loader control files/procedures to load the data into Oracle.
- Defined the file provisioning process (DW preprocessing steps); achieved 100% automation of the process using UNIX, Informatica mappings and Oracle utilities.
- Solely responsible for the daily loads, handling the reject data and re-loading the fixed records.
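A sketch of the key-generation step referenced above. The original used BTEQ scripts; the same SQL pattern is shown here through the teradatasql Python driver, with hypothetical host, credentials and table names.

```python
# Sketch of surrogate-key generation in Teradata (original was a BTEQ script);
# host, credentials and table names are hypothetical.
import teradatasql

SQL = """
INSERT INTO dw.customer_dim (customer_key, customer_nbr, customer_name)
SELECT
    ROW_NUMBER() OVER (ORDER BY s.customer_nbr)
      + COALESCE((SELECT MAX(customer_key) FROM dw.customer_dim), 0),
    s.customer_nbr,
    s.customer_name
FROM stg.customer s
LEFT JOIN dw.customer_dim d
  ON d.customer_nbr = s.customer_nbr
WHERE d.customer_nbr IS NULL
"""

conn = teradatasql.connect(host="tdprod.example.com", user="etl_user",
                           password="etl_password")
cur = conn.cursor()
# New customers receive the next keys after the current maximum.
cur.execute(SQL)
conn.close()
```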
Environment: Informatica Power Center 9.1/8.6.1/8.6.0, Teradata SQL Assistant 12.0, Oracle PL/SQL Developer 8.0, Oracle 10g, Python, SQL*Plus, Unix shell scripts, WebFOCUS, PuTTY.
Confidential, Reston, VA
ETL Informatica Developer
Responsibilities:
- Gathered requirements, analyzed the specifications provided by clients, and updated Report Description and Software Requirements Specification documents.
- Worked on Informatica tool - Source Analyzer, Data warehousing designer, Mapping Designer & Mapplets, and Transformations.
- Developed mappings using reusable transformations.
- Created workflows with event wait task to make sure all prerequisites are met for each job in the flow.
- Modified UNIX scripts to monitor systems and automation of daily tasks and customer requests.
- Extensively used the Debugger to test data and perform unit testing.
- Loaded data from various sources (Flat files, Oracle, SQL Server, XML) using different Transformations like Source Qualifier, Joiner, Router, Sorter, Aggregator, Connected and Unconnected Lookup, Expression, Sequence Generator, Union and Update Strategy to load the data into the target.
- Used persistent/static/dynamic cache for better throughput of sessions containing Lookup, Joiner, and Aggregator and Rank transformations.
- Used Incremental Aggregation technique for better performance of aggregator transformation.
- Created mappings for populating data to dimensions and fact tables with huge volumes for history and daily loads separately.
- Provided Production Support for business users and documented problems and solutions for running the workflow.
- Moved mappings and workflows to the staging and then production environments, testing the process at every level.
- Used IBM ClearCase and ClearQuest for incident management to keep track of open and critical tickets.
Environment: Informatica Power Center 8.6.1, Power Exchange 8.6.1, Oracle 11g/10g, IBM ClearCase/ClearQuest, Cognos 8, XML, PL/SQL, Toad, Unix Korn shell scripts, Perl scripts, Erwin, Linux.
Confidential, Irving, TX
ETL Developer
Responsibilities:
- Participated from the initial Data Warehouse build phase, which included logical and physical modeling.
- Handled various types of sources like flat files, Oracle, SQL Server.
- Extracted SAP HR data into Informatica using SAP PowerConnect and the Application Source Qualifier to bring data from the SAP system into the Source Analyzer.
- Worked extensively with complex mappings using transformations like update strategy, expression, aggregator, stored procedure, filter, lookup.
- Created reusable transformations and mapplets in the designer using transformation developer and mapplet designer tools.
- Used the Workflow Monitor to monitor tasks and workflows, and to track performance using collected statistics.
- Worked on writing and tuning SQL and PL/SQL Procedures.
- Worked on UNIX shell Scripts.
- Appropriate tuning was performed to improve performance and provide maximum efficiency for ETL Jobs and Oracle database level scripts.
Environment: Informatica Power Center 8.6.1/8.5/7.1, PowerConnect 8.6.1, Oracle 11g/10g, SAP, MS SQL Server 2005, XML, PL/SQL, Toad 8.1, Unix shell scripts, MS Visio 2007.
Confidential
Oracle Developer
Responsibilities:
- Designed and developed data tables to store employee payroll information.
- Interacted with end users to prepare system requirements.
- Identified the data and activities that needed to be maintained.
- Designed and developed all the tables and views for the system in Oracle.
- Created indexes, sequences, constraints.
- Developed application programs using Oracle PL/SQL packages, procedures, functions and database triggers.
- Supported technical teams involved with Oracle issues and operations.
- Involved in Implementation and production Support of the system.
Environment: Oracle 8i/7, MS SQL, Developer 2000, Forms and Reports, SQL Plus.