Data Engineer Resume
Negaunee, MI
SUMMARY
- 8 years of experience in Analysis, Design, Development, and Implementation as a Data Engineer.
- Expert in providing ETL solutions for a wide range of business models.
- Experience in the design and development of scalable systems using Hadoop technologies in various environments. Extensive experience in analyzing data using Hadoop ecosystem components including HDFS, MapReduce, Hive, and Pig.
- Solid understanding of Hadoop security requirements.
- Extensive experience in working with Informatica PowerCenter.
- Well versed in big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Experience in job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, and AutoSys.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Played a key role in migrating Cassandra and Hadoop clusters to AWS and defined different read/write strategies.
- Docker container orchestration using ECS, ALB, and Lambda.
- Strong data warehousing ETL experience with Informatica PowerCenter 9.1/8.6.1/8.5/8.1/7.1 client tools (Mapping Designer, Repository Manager, Workflow Manager/Monitor) and server tools (Informatica Server, Repository Server).
- Worked on big data with AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
- Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
- Experience with Teradata tools and utilities (BTEQ, FastLoad, MultiLoad, FastExport, and TPump).
- Experience with Unix/Linux systems with scripting experience and building data pipelines.
- Responsible for migrating on-premise applications to the Azure cloud.
- Implemented Integration solutions for cloud platforms with Informatica Cloud.
- Proficient in SQL, PL/SQL and Python coding.
- Excellent understanding of enterprise data warehouse best practices; involved in full life-cycle development of data warehousing.
- Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
- Skilled in designing and implementing ETL Architecture for cost effective and efficient environment.
- Optimized and tuned ETL processes & SQL Queries for better performance.
- Performed complex data analysis and provided critical reports to support various departments.
- Worked with Business Intelligence tools like Business Objects and data visualization tools like Tableau.
- Extensive Shell/Python scripting experience for Scheduling and Process Automation.
- Good exposure to Development, Testing, Implementation, Documentation and Production support.
- Hands-on experience with programming languages such as Java, Python, R, and SAS.
- Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, and crontab tools.
- Expert in creating Hive UDFs in Java to analyze data sets with complex aggregate requirements.
- Designed, developed, tested, implemented, and supported data warehousing ETL using Talend.
- Developed PL/SQL stored procedures and functions.
- Experience in developing ETL applications on large volumes of data using tools such as MapReduce, Spark-Scala, PySpark, Spark SQL, and Pig; a minimal PySpark sketch appears at the end of this summary.
- Experience in using Sqoop for importing and exporting data between RDBMS and HDFS/Hive.
- Experience on MS SQL Server, including SSRS, SSIS, and T-SQL.
- An excellent team member, also able to work independently, with good interpersonal skills, effective communication, a strong work ethic, and a high level of motivation.
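To illustrate the PySpark ETL experience referenced above, here is a minimal sketch of a batch job that reads raw Parquet, cleanses it, and writes a partitioned table. The paths, column names, and database/table names are hypothetical placeholders, not details from any specific engagement.

```python
# Minimal PySpark ETL sketch: read raw Parquet, cleanse, and write a
# partitioned table. All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("etl-sketch")
    .enableHiveSupport()          # assumes a Hive metastore is available
    .getOrCreate()
)

# Extract: raw orders landed as Parquet (placeholder S3 path)
raw = spark.read.parquet("s3a://example-bucket/raw/orders/")

# Transform: basic cleansing and derivation of a partition column
cleansed = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write as a partitioned table for efficient downstream queries
(
    cleansed.write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.orders_cleansed")
)
```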
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, Negaunee, MI
Responsibilities:
- Analyzed and cleansed raw data using HiveQL.
- Performed data transformations using MapReduce and Hive for different file formats.
- Involved in converting Hive/SQL queries into transformations using Python.
- Performed complex joins on Hive tables using various optimization techniques.
- Created internal and external Hive tables as per requirements, defined with appropriate static and dynamic partitions for efficiency; a sketch of a dynamic-partition load appears after this list.
- Worked extensively with Hive DDLs and Hive Query Language (HQL).
- Involved in loading data from edge node to HDFS using shell scripting.
- In-depth knowledge of the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, Hue, Sqoop, Flume, Kafka, Spark, Oozie, NiFi, and Cassandra.
- Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
- Worked with different file formats such as JSON, Avro, and Parquet and compression codecs such as Snappy within the NiFi ecosystem.
- Set up, configured, and monitored a Kafka environment on Windows from scratch.
- Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Spark SQL, and Spark Streaming.
- Designed and implemented topic configurations for the new Kafka cluster across all environments.
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Managed Hadoop infrastructure with Cloudera Manager.
- Created and maintained technical documentation for launching Hadoop cluster and for executing Hive queries.
- Built integrations between applications, primarily Salesforce.
- Technical/functional background and experience with MDM tools and ERP applications.
- Defined relationship types using the Hierarchy tool to enable Hierarchy Manager (HM) in the MDM Hub implementation.
- Deployed a new MDM Hub for portals in conjunction with the user interface of the IDD application.
- Implemented data standards and the data governance process.
- Implemented the import and export of data using XML and SSIS.
- Involved in planning, defining, and designing the database using ER/Studio based on business requirements, and provided documentation.
- Responsible for migrating on-premise applications to the Azure cloud.
- Used SSIS to build automated multi-dimensional cubes.
- Wrote indexing and data distribution strategies optimized for sub-second query response.
- Verified data consistency between systems.
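As a companion to the Hive partitioning work noted above, the following is a minimal sketch of creating an external, partitioned Hive table and loading it with dynamic partitions from PySpark. The database, table, column names, and HDFS location are hypothetical.

```python
# Sketch of creating an external Hive table with dynamic partitions from
# PySpark (table, columns, and HDFS location are hypothetical examples).
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# External table: data stays at the HDFS location even if the table is dropped
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web.click_events (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/click_events'
""")

# Allow fully dynamic partitioning for the insert below
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Dynamic-partition insert: event_date values in the source decide the partitions
spark.sql("""
    INSERT OVERWRITE TABLE web.click_events PARTITION (event_date)
    SELECT user_id, url, ts, date_format(ts, 'yyyy-MM-dd') AS event_date
    FROM staging.raw_clicks
""")
```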
Data Engineer
Confidential, Indianapolis, IN
Responsibilities:
- Built reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
- Extensive work in Informatica PowerCenter.
- Acted as SME for Data Warehouse-related processes.
- Performed Data analysis for building Reporting Data Mart.
- Tuned Informatica mappings and sessions, eliminating bottlenecks to make the process more efficient.
- Managed the implementation of the first master data hub for customer, vendor, and product data, within a larger program merging two new companies into VF, upgrading SAP, and implementing SOA.
- Designed, developed, tested, reviewed, and optimized Informatica MDM (Siperian).
- Experience using Snowflake Clone and Time Travel; a short sketch appears after this list.
- Built logical and physical data models for Snowflake as per the required changes.
- Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tasks.
- Worked with PowerShell and UNIX scripts for file transfer, emailing, and other file-related tasks.
- Worked with deployments from Dev to UAT, and then to Prod.
- Worked with Informatica Cloud for data integration between Salesforce, RightNow, Eloqua, and web service applications.
- Expertise in Informatica Cloud apps: Data Synchronization (DS), Data Replication (DR), Task Flows, and Mapping Configurations.
- Identified data issues within DWH dimension and fact tables like missing keys, joins, etc.
- Validated reporting numbers between source and target systems.
- Found technical solutions and business logic to fix missing or incorrect data issues that were identified.
- Coordinated with reporting developers and provided them with technical details.
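To illustrate the Snowflake Clone and Time Travel usage mentioned above, here is a minimal sketch driven from Python. It assumes the snowflake-connector-python package; the account, credentials, and table names are hypothetical placeholders.

```python
# Sketch of Snowflake zero-copy cloning and Time Travel from Python.
# Assumes snowflake-connector-python; all identifiers are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",        # placeholder account identifier
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="REPORTING",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()

    # Clone: instant copy of a table without duplicating storage
    cur.execute("CREATE OR REPLACE TABLE orders_backup CLONE orders")

    # Time Travel: query the table as it looked one hour ago
    cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -60 * 60)")
    print(cur.fetchone())
finally:
    conn.close()
```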
Data Engineer
Confidential, Oak Brook, IL
Responsibilities:
- Implemented reporting Data Warehouse with online transaction system data.
- Managed user account, group, and workspace creation for different users in PowerCenter.
- Wrote complex UNIX/Windows scripts for file transfer and emailing tasks over FTP/SFTP.
- Worked with PL/SQL procedures and used them in Stored Procedure Transformations.
- Extensively worked on Oracle and SQL Server; wrote complex SQL queries against the ERP system for data analysis purposes.
- Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark; see the sketch after this list.
- Developed Spark programs for the application to process data faster than standard MapReduce programs.
- Resolved the data type inconsistencies between the source systems and the target system using the Mapping Documents and analyzing the database using SQL queries.
- Worked with the DevOps team to clusterize the NiFi pipeline on EC2 nodes, integrated with Spark, Kafka, and Postgres running on other instances, using SSL handshakes in QA and production environments.
- Healthcare system implementation, including enterprise Electronic Medical Records (EMR) software.
- Used knowledge of healthcare information systems and the EMR model to develop the proposed workflow in MS Visio.
- Designed, tested, and customized EMR templates, documents, and Crystal Reports.
- Worked closely with Business Analyst and report developers in writing the source to target specifications for Data warehouse tables based on the business requirement needs.
- Exported data into Excel for business meetings, which made discussions easier while reviewing the data.
- Experience working with GitHub and JIRA for version management and bug tracking.
- Worked with Git cloning to create Git repositories.
- Set up and maintained Git and Subversion (SVN) source code repositories, managing branching, merging, and tagging of files.
- Responsible for designing and deploying SCM best practices, processes, and procedures.
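As an illustration of the JSON encoding/decoding work with PySpark mentioned above, here is a minimal sketch; the schema and column names are hypothetical examples rather than project specifics.

```python
# Sketch of decoding and re-encoding JSON columns with PySpark.
# Schema and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("json-sketch").getOrCreate()

# Incoming records carry a raw JSON payload as a string column
raw = spark.createDataFrame(
    [('{"patient_id": "p-100", "charge": 42.5}',)],
    ["payload"],
)

schema = StructType([
    StructField("patient_id", StringType()),
    StructField("charge", DoubleType()),
])

# Decode: parse the JSON string into a struct, then flatten to columns
decoded = raw.withColumn("data", F.from_json("payload", schema)).select("data.*")

# Modify the DataFrame, then encode back to a JSON string column
encoded = (
    decoded.withColumn("charge", F.round(F.col("charge") * 1.07, 2))
           .withColumn("payload", F.to_json(F.struct("patient_id", "charge")))
)
encoded.show(truncate=False)
```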
ETL Developer
Confidential
Responsibilities:
- Gathered business requirements and prepared technical design documents, target-to-source mapping documents, and mapping specification documents.
- Extensively worked on Informatica PowerCenter.
- Parsed complex files through Informatica Data Transformation and loaded them into the database.
- Optimized query performance using Oracle hints, forced indexes, constraint-based loading, and a few other approaches.
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Automated the configuration management of database and big data systems.
- Created Sqoop scripts to ingest data from HDFS to Teradata, and from SQL Server to HDFS and PostgreSQL.
- Installed and monitored PostgreSQL databases using standard monitoring tools such as Nagios.
- Analyzed existing systems and proposed improvements, including adopting modern scheduling tools like Airflow (a minimal DAG sketch appears after this list) and migrating legacy systems into an enterprise data lake built on the Azure cloud.
- Implemented Biopic features using the SIFT algorithm in Java.
- Provided end-user support to all Confidential end users via phone, email, Jabber, in person, or remotely.
- Reimaged units via SCCM according to end users' assigned roles.
- Worked on Oracle databases, Redshift, and Snowflake.
- Extensively worked on UNIX shell scripting for splitting groups of files into smaller files and for file transfer automation.
- Worked with the AutoSys scheduler to schedule different processes.
- Performed basic and unit testing.
- Assisted in UAT Testing and provided necessary reports to the business users.
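To illustrate the Airflow scheduling work referenced above, here is a minimal DAG sketch. The DAG id, task names, commands, and paths are hypothetical placeholders.

```python
# Minimal Airflow DAG sketch for the scheduling work mentioned above.
# DAG id, task names, and commands are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="legacy_feed_to_data_lake",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # replaces a legacy cron/AutoSys schedule
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_from_source",
        bash_command="python /opt/etl/extract.py --date {{ ds }}",
    )
    load = BashOperator(
        task_id="load_to_data_lake",
        bash_command="python /opt/etl/load.py --date {{ ds }}",
    )

    extract >> load   # load runs only after extraction succeeds
```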
ETL/Data Warehouse Developer
Confidential
Responsibilities:
- Gathered requirements from Business and documented for project development.
- Coordinated design reviews, ETL code reviews with teammates.
- Developed mappings using Informatica to load data from sources such as Relational tables, Sequential files into the target system.
- Provided database coding to support business applications using Sybase T-SQL.
- Performed quality assurance and testing of SQL server environment.
- Used Erwin tool for dimensional modeling (Star schema) of the staging database as well as the relational data warehouse.
- Developed new processes to facilitate import and normalization, including data files for counterparties.
- Developed parameter and dimension-based reports, drill-down reports, matrix reports, charts, and Tabular reports using Tableau Desktop.
- Retrieved data from data warehouse and generated a series of meaningful business reports using SSRS.
- Expertise in client-server application development using Oracle 12c/11g/10g/9i, PL/SQL, SQL*Plus, TOAD, and SQL*Loader.
- Extensively worked with Informatica transformations.
- Created datamaps in Informatica to extract data from Sequential files.
- Extensively worked on UNIX Shell Scripting for file transfer and error logging.
- Scheduled processes in ESP Job Scheduler.
- Conducted code reviews to ensure the work delivered by the team met high quality standards.
- Maintained relationships with assigned customers post-integration, supporting their needs and building the relationship to encourage future business growth.
- Used shell scripts and pmcmd commands to perform basic ETL operations; a short sketch appears after this list.
- Performed Unit, Integration and System testing of various jobs.
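As a companion to the pmcmd automation mentioned above, here is a minimal sketch of launching an Informatica workflow from a Python script. The integration service, domain, folder, and workflow names are hypothetical, and the flags reflect common pmcmd startworkflow usage rather than a specific environment.

```python
# Sketch of driving an Informatica workflow with pmcmd from a script.
# Service, domain, folder, and workflow names are hypothetical placeholders.
import subprocess

cmd = [
    "pmcmd", "startworkflow",
    "-sv", "INT_SVC_DEV",        # integration service (placeholder)
    "-d", "Domain_DEV",          # Informatica domain (placeholder)
    "-u", "etl_user",
    "-p", "********",
    "-f", "SALES_DM",            # repository folder (placeholder)
    "-wait",                     # block until the workflow finishes
    "wf_load_sales_fact",
]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)

# pmcmd returns a non-zero exit code when the workflow fails
if result.returncode != 0:
    raise RuntimeError(f"Workflow failed: {result.stderr}")
```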