Sr. Data Engineer / ETL Developer Resume
Woonsocket, RI
SUMMARY
- Over 6 years of experience in the IT industry as a Data Engineer/ETL Developer on cloud platforms, spanning Data Analysis, Statistical Analysis, Machine Learning, Deep Learning, and Data Mining with large structured and unstructured data sets and Big Data.
- Experience in analysis, design, development, and testing of ETL methodologies across all phases of Data Warehousing.
- 2+ Years of Experience in IICS (Informatica Intelligent Cloud Services).
- Expertise in managing the full life cycle of Data Science projects, including transforming business requirements into Data Collection, Data Cleaning, Data Preparation, Data Validation, Data Mining, and Data Visualization.
- Experience in building end-to-end data science solutions using Python, SQL, and Tableau by leveraging machine learning algorithms, Statistical Modeling, Data Mining, Natural Language Processing (NLP), and Data Visualization.
- Extensively worked on Spark with Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
- Domain knowledge primarily in the Investment Management sector.
- Solid experience in designing and developing complex mappings to extract data from diverse legacy sources including CSV files, flat files, fixed-width files, delimited files, XML files, Oracle, MS SQL Server, DB2, Sybase, Salesforce, AWS, and FTP into a common reporting and analytical data model using Informatica PowerCenter.
- Solid experience in Developing/Optimizing/Tuning mappings using Informatica.
- Expertise in various transformations such as XML, HTTP, Web Services, Lookup, Update Strategy, Stored Procedure, Joiner, Filter, Aggregator, Rank, Router, Normalizer, Sorter, External Procedure, Sequence Generator, and Source Qualifier, as well as SCD Type 2.
- Professional experience in Integration, Functional, Regression, System Testing, Load Testing and UAT Testing.
- Delivered a proof of concept for Informatica Cloud integration on a project.
- Expertise using the Informatica Debugger, Mapping Wizards, Workflow Administrator, and Workflow Monitor.
- Expertise in the Extraction, Transformation, and Loading (ETL) process and Dimensional Data Modeling, including Star schema/Snowflake schema modeling, Fact and Dimension tables, multidimensional modeling, and de-normalization techniques.
- Experience in writing UNIX shell scripts.
- Expertise in monitoring workflows, pipeline partitioning, and configuring email notifications.
- Strong back-end experience with PL/SQL stored procedures, functions, packages, and triggers.
- Involved in performance tuning of Data Warehouses, including the creation of materialized views, bitmap indexes, and partitions.
- Team player with strong communication and interpersonal skills.
- Good knowledge of and experience in implementing Software Development Life Cycle (SDLC) practices.
- Expertise in design and implementation of Slowly Changing Dimensions (SCD).
- Experienced in loading data, troubleshooting, debugging mappings, and performance tuning of Informatica objects (sources, targets, mappings, and sessions), and fine-tuned transformations to improve session performance.
- Involved in designing and implementing Data Mart applications, mainly the transformation process, using ETL tools such as Informatica.
- Experienced with Integration of data from heterogeneous sources such as Relational tables, flat files, MS Excel, and XML files.
- Solid understanding of big data technologies like Hadoop, Spark, HDFS, MapReduce, Pig and Hive.
- Good knowledge of data migration in the Supply Chain and Finance domains.
- Good understanding of Views, Synonyms, Indexes, Joins, and Sub-Queries.
- Builds effective working relationships with client teams to understand support requirements and manage client expectations; able to learn new technologies quickly.
- Involved in Unit testing and Integration testing.
- Experience in Infrastructure Development and Operations involving AWS Cloud platforms.
TECHNICAL SKILLS
ETL /DW Tools: Informatica PowerCenter 10.3/10.2/10.1/9.1/8.6.0, IICS.
Operating Systems: Windows 10/NT/XP, UNIX
DBMS: Oracle 11g/10g/9i, Sybase, SQL Server, DB2, AWS Redshift (Aginity tool).
Query/Scripting Languages: SQL, PL/SQL (Oracle), Perl, SOQL (Salesforce).
Scheduling Tools: Maestro, Control M, Autosys
Management Tools: Remedy, JIRA, Bitbucket, TFS, ServiceNow, Changepoint.
Cloud Platform: AWS (EC2, S3, EMR, VPC, Route 53, Lambda, CloudWatch, CloudTrail, IAM) and GCP
SDLC/Testing Methodologies: Agile, Waterfall, Scrum, TDD
Languages: Python, Scala, Java, Perl.
Databases & Query Engines: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, HBase, Teradata, MongoDB, and Cassandra
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Business Intelligence, BusinessObjects 5.x/6.x
PROFESSIONAL EXPERIENCE
Confidential, Woonsocket, RI
Sr. Data Engineer / ETL Developer
Responsibilities:
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values (a simplified sketch follows this responsibilities list).
- Generated Python Django Forms to record data of online users.
- Developed monitoring and notification tools using Python.
- Set up continuous integration, automated deployment, and management using Jenkins.
- Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQL database package.
- Monitored and optimized database performance through database tuning, system tuning, and application/SQL tuning.
- Designed, built, and maintained scalable data warehouses, data models, and data processing pipelines.
- Worked with data scientists to build models for understanding and content personalization.
- Designed and developed ETL mapping for data collection from various data feeds using REST API.
- Worked effectively with data engineers on other teams across the world on global data products and global project teams.
- Designed ADD for Workforce Scheduler Applications.
- Developed mappings/workflows using Informatica PowerCenter.
- Developed shell scripts to copy, SFTP, purge, and zip files.
- Scheduled and automated jobs using Control M.
- Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
- Set up storage and data analysis tools on Amazon Web Services (AWS) cloud computing infrastructure.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using Python, and MongoDB.
- Wrote SQL queries and implemented stored procedures, functions, packages, tables, views, cursors, and triggers.
- Built data warehouses and data modeling pipelines on AWS.
- Developed and managed cloud VMs with AWS EC2 command-line clients and the management console.
- Built S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Extracted the data from hive tables by writing efficient Hive queries.
- Created ETL scripts for ad-hoc requests to retrieve data from analytics sites.
- Worked with business users to identify requirements and design mapping documents.
- Extracted data from XML files and loaded it into the Oracle DB.
- Provided production support when needed.
- Created test plans and documented the result sets.
- Migrated Workflows, parameter files and scripts using Tortoise SVN.
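The preliminary data analysis bullet above corresponds roughly to the following Python/pandas sketch. It is a minimal, illustrative example only: the file name and the median/mode imputation choices are assumptions, not the exact production logic.

```python
import pandas as pd

# Load a source extract (path and columns are illustrative placeholders).
df = pd.read_csv("daily_extract.csv")

# Preliminary profiling with descriptive statistics.
print(df.describe(include="all"))
print(df.isna().sum())

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Impute missing values: median for numeric columns, mode for categorical ones.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        mode = df[col].mode()
        df[col] = df[col].fillna(mode.iloc[0] if not mode.empty else "UNKNOWN")

df.to_csv("daily_extract_clean.csv", index=False)
```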
Confidential, Boston, MA
Cloud Data Engineer
Responsibilities:
- Worked on major data loads like Salesforce, Xactly, MARS and Epicor projects.
- Developed Informatica ETL code consistent with departmental standards and practices.
- Built Daily loads to copy from Salesforce objects to Dim/Fact tables in AWS.
- Upserted data to Salesforce from AWS, MySQL, and flat files.
- Used IICS for Sales department to extract, transform and load data primarily to Salesforce.
- Performed data synchronization and masking tasks using IICS.
- Used the AutoSys automation tool to run INFA jobs in PROD.
- Worked closely with the reporting team, which uses Tableau.
- Designed and built scalable data pipelines to ingest, translate, and analyze large data sets.
- Developed tools supporting self-service data pipeline (ETL) management.
- Worked on Workbench to query Salesforce objects (SOQL).
- Effectively used IICS Data Integration to create mapping templates to bring data into the staging layer from hybrid source systems such as SQL Server, Oracle, AWS, and Salesforce.
- Used Informatica 10.x to create mappings, build workflows, and monitor runs.
- Converted the existing MARS project data feeds to source from AWS for faster performance.
- Provided production support when needed.
- Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables (a simplified comparison sketch follows this responsibilities list).
- Installed Hadoop, MapReduce, HDFS, and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Deployed/migrated ETL code, DDL, and DML to production.
- Developed SQL scripts to load and transform data as necessary.
- Used ServiceNow as the primary ticketing tool for managing release tasks.
- Translated existing PL/SQL to Informatica ETL code.
- Performed unit testing and integration testing from source to target.
- Documented performed tasks.
- Performed code reviews for other teams' projects.
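The change-data-capture bullet above ran through Sqoop; the sketch below is only a simplified Python/pandas illustration of the underlying comparison between newly arrived and existing RDBMS records. The key and column names are assumptions.

```python
import pandas as pd

# Existing warehouse snapshot and newly arrived extract (illustrative data).
existing = pd.DataFrame({
    "id": [1, 2, 3],
    "amount": [100.0, 250.0, 75.0],
    "updated_at": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})
incoming = pd.DataFrame({
    "id": [2, 3, 4],
    "amount": [250.0, 80.0, 60.0],
    "updated_at": pd.to_datetime(["2020-01-02", "2020-01-05", "2020-01-06"]),
})

# Join on the business key to classify incremental records.
merged = incoming.merge(existing, on="id", how="left",
                        suffixes=("", "_old"), indicator=True)

inserts = merged[merged["_merge"] == "left_only"]                    # brand-new keys
updates = merged[(merged["_merge"] == "both") &
                 (merged["updated_at"] > merged["updated_at_old"])]  # changed rows

print("inserts:", inserts["id"].tolist())
print("updates:", updates["id"].tolist())
```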
Confidential, Boston, MA
Sr. ETL Developer
Responsibilities:
- Worked on Agile methodology as SDLC using JIRA.
- Actively participated in daily scrums, sprint reviews, backlog refinement, sprint planning, sprint demos, and sprint retrospective sessions.
- Worked on Informatica Cloud to implement a proof of concept for integrating Oracle and flat files with Salesforce.
- Used HTTP transformations to download XML data from websites (a rough Python equivalent is sketched after this responsibilities list).
- Used Informatica 10.1.1 to create mappings, build workflows, and monitor runs.
- Coded a Perl script to convert .csv files to .xlsx and installed the related Perl modules.
- Worked on multiple time-sensitive reporting projects and completed them on time within the proposed budgets.
- Developed hundreds of SCD Type 1 and Type 2 mappings in PowerCenter.
- Used parameter variables and mapping parameters to streamline mapping creation.
- Used the Maestro automation tool to run INFA jobs in PROD.
- Used SQL repository queries in PowerCenter to identify modified objects and capture them in XML files for migration; involved in synchronizing data across the DEV, MOD, STAGE, and PROD environments.
- Deployed workflows using UDeploy.
- Used Perl scripting for file moves and FTP processing.
- Used CDC methodology in mappings to provide the latest data to the reporting team.
- Worked closely with the reporting team, which uses GRIP.
- Used Remedy for creating CRQ and incident tickets.
- Worked with Bitbucket and Sourcetree for SQL code and parameter file deployments.
- Performed code review requests as part of the daily routine.
- Worked on unstructured data sources.
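The HTTP-transformation bullet above used Informatica to pull XML from a website; the sketch below is only a rough Python equivalent of that download-and-parse step, with a placeholder URL and illustrative element names.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder URL; the real feed endpoint differed.
URL = "https://example.com/feed.xml"

response = requests.get(URL, timeout=30)
response.raise_for_status()

# Parse the XML payload and flatten selected elements into rows.
root = ET.fromstring(response.content)
rows = []
for record in root.findall(".//record"):  # element names are illustrative
    rows.append({
        "id": record.findtext("id"),
        "name": record.findtext("name"),
        "value": record.findtext("value"),
    })

print(f"Downloaded {len(rows)} records")
```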
Confidential
ETL Developer
Responsibilities:
- Responsible for analyzing requirements and designing a generic, standard ETL process to load data from different source systems.
- Tested the developed objects at the unit/component level and prepared test case documents for mappings/sessions/workflows (a reconciliation-check sketch follows this responsibilities list).
- Participated in daily status meetings and interacted with the onshore team through emails/calls to follow up on modules and resolve data/code issues.
- Handled the classification system part of the project, which involved loading data based on certain preconditions.
- Understanding the existing business model and customer requirements.
- Involved in developing and documenting the ETL (Extract, Transform, and Load) strategy to populate the Data Warehouse from various source systems.
- Involved in data extraction, staging, transformation, and loading into targets.
- Involved in testing at the database end and reviewing the Informatica mappings against the business logic.
- Listed issues that did not conform to business requirements; developed some mappings and made changes to others.
- Wrote several test cases, identified issues that could occur, and analyzed the data merge/match process.
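The unit/component testing work above typically reduces to a source-to-target reconciliation check; the Python sketch below shows one minimal form of it, with small literal data frames standing in for the real source and target tables.

```python
import hashlib

import pandas as pd


def frame_checksum(df: pd.DataFrame) -> str:
    """Order-insensitive checksum of a data set, for source-vs-target comparison."""
    canonical = df.sort_values(list(df.columns)).to_csv(index=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# In practice these would be pulled from the source system and the target table;
# small literal frames stand in here.
source = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
target = pd.DataFrame({"id": [3, 1, 2], "amount": [30.0, 10.0, 20.0]})

assert len(source) == len(target), "row count mismatch"
assert frame_checksum(source) == frame_checksum(target), "content mismatch"
print("source-to-target reconciliation passed")
```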