Data Engineer Resume
SUMMARY
- Over 7 years of experience in Data Warehousing / Business Intelligence with strong analysis skills, handling large-scale development efforts using industry-standard tools such as Talend and Informatica
- Experience with very large datasets (>100 TB) and with building ETL processes per business requirements
- Experience with Talend Open Studio to develop processes for extracting, cleansing, transforming, integrating, and loading data into data mart database
- Expertise in creating mappings in Talend using tMap, tJoin, tParallelize, tConvertType, tAggregateRow, tSortRow, tLogCatcher, tRowGenerator, tNormalize, tMysqlSCD, tFilterRow, etc.
- Well acquainted with Informatica Designer components - Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, and Mapping Designer
- Good knowledge of data warehouse concepts and principles - Star Schema, Snowflake, SCD, Surrogate keys, Normalization/De-normalization
- Experienced in integrating various data sources with multiple relational databases such as MySQL, Teradata, and Redshift, as well as flat files (fixed-width and delimited)
- Extensive client interaction; led a team of 3, managing their day-to-day activities
- Knowledge of cloud computing concepts and able to deploy solutions on cloud using Amazon Web Services (AWS)
TECHNICAL SKILLS
Data Analysis and Mining Tools: Informatica PowerCenter 10.x, Talend DI 7.x, Tableau, Grafana, HiveQL
Databases: MySQL, Hive, Microsoft SQL Server, Redshift, Teradata 14, Vertica
Programming and Scripting: Python, UNIX
Amazon Web Services (AWS): EC2, S3, DynamoDB, Redshift, RDS
Other Tools: Excel, PowerPoint, Access, JIRA, JAMA, ServiceNow, HP Quality Center, Erwin
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Responsible for developing ETL processes in Telecom (CRM, Billing, Provisioning, Order Management, Inventory System) and integrating data from various sources into data marts
- Migrated ETL processes from Informatica to Talend for a more open-source approach with minimal impact to the system, and evaluated AWS S3 and Redshift for migrating the system as a whole
- Used various components in Talend such as tReplace, tMap, tJava, tJavaRow; database components - tMysqlInput, tMysqlRow; file-based components - tFileCopy, tFileCompare, tFileExist, tFileDelete, tFileRename
- Built a Talend/JIRA API based reporting system to track project status with reports such as defect aging, planned vs. actual, and created vs. resolved
- Used Informatica PowerCenter with SCD techniques to process up to 40 million transactional records daily, providing data to end users to support strategic decisions
- Developed test cases for unit testing of Informatica workflows for the EDW based extracts and parameterized the mappings for re-usability
- Developed Teradata SQL queries to be used in the source qualifier transformation of Informatica using joins, case statements
- Developed reusable MySQL and Python scripts for data analysis to improve data quality, efficiently tracking 15+ key metrics for various KPIs in a contact center environment (Net Promoter Score, AHT, FCR, etc.)
- Worked as the Change Management Lead, performed root cause analysis to minimize the impact and ensure optimal uptime of the system to prevent SLA breach
- Created dashboards in Power BI and Tableau to analyze source data trends and measure account performance; shared them with the client and C-level management to report progress and recalibrate strategy
- Eliminated manual work by automating report consumption with VB scripts, saving 15 man-hours every week
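As an illustration of the JIRA-based status reporting described above, a defect-aging calculation can be sketched as follows; the issue records and field names here are illustrative placeholders, not the exact JIRA REST API schema:

```python
from datetime import date

# Hypothetical issue records, simplified from what a JIRA REST API
# call might return (keys and statuses are illustrative).
issues = [
    {"key": "PRJ-1", "status": "Open",     "created": date(2024, 1, 2)},
    {"key": "PRJ-2", "status": "Resolved", "created": date(2024, 1, 5)},
    {"key": "PRJ-3", "status": "Open",     "created": date(2024, 1, 10)},
]

def defect_aging(issues, as_of):
    """Age in days of each unresolved defect as of a given date."""
    return {
        i["key"]: (as_of - i["created"]).days
        for i in issues
        if i["status"] != "Resolved"
    }

print(defect_aging(issues, date(2024, 1, 15)))
# {'PRJ-1': 13, 'PRJ-3': 5}
```

In a real pipeline, the `issues` list would be fetched from the JIRA API and the resulting ages bucketed (e.g. 0-7, 8-30, 30+ days) for the aging report.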
Confidential
Business Technology Analyst/ Consultant
Responsibilities:
- Involved in full Software Development Life Cycle (SDLC) - Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Logical and Physical database design, Coding, Testing, Implementing, and deploying to business users
- Monitored data quality, generating weekly/monthly/yearly statistics reports on production processes for root cause analysis, and enhanced existing production ETL scripts
- Created ETL mappings in Informatica using transformations such as Lookup, Expression, Aggregator, and Filter
- Developed SCD Type 2 methodology to insert and update the records for the daily load in the target tables.
- Used Oracle extensively to develop complex SQL queries with joins, subqueries, and correlated subqueries, and performance-tuned existing production SQL queries
- Analyzed data, performed gap analysis and determined quality issues in the existing system, which led to the inception of a new module for re-designing the architecture, generating $400K revenue annually
- Implemented a SQL-based reporting system that publishes ~10 reports daily, leading to a 15% increase in efficiency
- Performed unit testing to detect defects in the code and tracked them using HP Quality Center
Confidential
Informatica Developer
Responsibilities:
- Worked as Informatica consultant for Electronic Arts
- Analyzed their financial data and gathered requirements from end users to develop financial reports
- Created and managed Informatica workflows required for data transformation and developed reports using Hyperion Financial Management tools
- Developed mappings, sessions, and workflows to load historical and daily data into ODS tables
- Implemented various Performance Tuning techniques on Sources, Targets, Mappings, and Workflows
- Created technical design documents for the team based on plan review and business specification documents
Confidential
Senior Informatica Developer
Responsibilities:
- Managed Teradata data warehouse with a size of approximately 110TB, which contained the historical data
- Implemented pushdown optimization techniques and developed Slowly Changing Dimensions (SCD) and complex mappings using Informatica PowerCenter
- Monitored sessions using the workflow monitor and debugged mappings for failed sessions
- Optimized SQL queries for data migration from DB2 to Teradata which increased the reporting efficiency, reducing the average runtime by 40%
- Handled the operational process flow and analyzed daily processing, identifying and resolving bottlenecks in Teradata SQL queries and Informatica mappings that hindered performance, avoiding SLA breaches with a resolution efficiency of 100%
- Created complex workflows to load data from different upstream source systems into common tables in the Teradata warehouse
- Created user-defined functions and stored procedures to execute required queries on demand
- Identified performance issues in existing sources, targets and mappings by analyzing the data flow, evaluating transformations and tuned accordingly for better performance
- Developed and modified UNIX shell scripts as part of the ETL process
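The bottleneck-identification work above can be illustrated with a small Python sketch that flags long-running sessions against an SLA threshold; the log-line format here is a made-up simplification, not the real Informatica session-log layout:

```python
# Hypothetical session-runtime summary lines (format is illustrative;
# real Informatica session logs are far more verbose).
log_lines = [
    "s_m_load_orders runtime_sec=420",
    "s_m_load_customers runtime_sec=95",
    "s_m_load_inventory runtime_sec=1310",
]

def flag_bottlenecks(lines, threshold_sec=600):
    """Return (session, runtime) pairs whose runtime exceeds the threshold."""
    slow = []
    for line in lines:
        name, field = line.split()
        runtime = int(field.split("=")[1])
        if runtime > threshold_sec:
            slow.append((name, runtime))
    return slow

print(flag_bottlenecks(log_lines))
# [('s_m_load_inventory', 1310)]
```

Sessions flagged this way would then be examined for skewed Teradata joins or untuned transformations before the next SLA window.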
Confidential
Informatica Developer
Responsibilities:
- Worked as part of DBO migration team and involved in migrating existing Sybase Stored Procedures to Teradata
- Identified sources, targets, mappings, and sessions and tuned them to improve performance; scheduled and ran extraction and load batches and monitored sessions using Informatica Workflow Manager
- Used Informatica debugger wizard to troubleshoot the existing mappings and tune them if required for optimal processing of the ETL jobs
- Converted Sybase stored procedures to Teradata and checked for possible syntactical errors
- Debugged stored procedures when the corresponding job failed in the Event Engine by analyzing the logs
- Documented the common errors that occur while converting a Sybase procedure to Teradata and best practices to be followed to avoid these errors
- Involved in identifying bugs in existing mappings by analyzing the data flow, evaluating transformations and fixing the bugs so that they conform to the business needs
- Involved in the Migration process from Development, Test and Production Environments
- Developed shell scripts and PL/SQL procedures to create/drop tables and indexes for pre- and post-session performance management
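The documentation of common Sybase-to-Teradata conversion errors described above could be backed by a simple automated scan; the pattern list below covers a few well-known Sybase T-SQL constructs that do not carry over directly, and is illustrative rather than exhaustive:

```python
import re

# A few Sybase T-SQL constructs that typically need rewriting for
# Teradata (e.g. #temp tables become volatile tables). Illustrative list.
SYBASE_PATTERNS = {
    "temp table (#name)": re.compile(r"#\w+"),
    "@@identity": re.compile(r"@@identity", re.IGNORECASE),
    "SELECT ... INTO": re.compile(r"\bSELECT\b.*\bINTO\s+#",
                                  re.IGNORECASE | re.DOTALL),
}

def scan_procedure(sql_text):
    """Flag Sybase-specific constructs found in a stored procedure's text."""
    return sorted(name for name, pat in SYBASE_PATTERNS.items()
                  if pat.search(sql_text))

proc = "SELECT id INTO #stage FROM orders WHERE id = @@identity"
print(scan_procedure(proc))
```

Running such a scan over each procedure before conversion surfaces the known trouble spots up front, rather than discovering them one job failure at a time.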