
Sr. Data Engineer Resume


Evansville, IN

SUMMARY

  • Eight-plus years of experience in analysis, design, development, and implementation as a Data Engineer.
  • Expert in designing and delivering ETL solutions across diverse business models.
  • Designed and built solutions for complex data issues.
  • Experience in the design and development of scalable systems using Hadoop technologies across multiple environments.
  • Extensive experience analyzing data with Hadoop ecosystem components including HDFS, MapReduce, Hive, and Pig.
  • Solid understanding of Hadoop security requirements.
  • Extensive experience working with Informatica PowerCenter.
  • Implemented Integration solutions for cloud platforms with Informatica Cloud.
  • Worked with Talend, a Java-based ETL tool.
  • Proficient in SQL, PL/SQL and Python coding.
  • Develop and maintain ETL solutions for large-scale data warehouses using the latest technologies.
  • Design and implement data pipelines using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Spark, and Kafka (see the sketch after this list).
  • Work with cloud platforms and develop integration solutions using Informatica Cloud, AWS Glue, or Azure Data Factory.
  • Develop and maintain data models and dimensional modeling using 3NF, Star, and Snowflake schemas for OLAP and Operational Data Store (ODS) applications.
  • Expert in SQL and PL/SQL coding, and proficient in programming languages such as Python and Java.
  • Optimize ETL processes and SQL queries for better performance using techniques such as partitioning, indexing, and data compression.
  • Perform complex data analysis and provide critical reports to support various departments.
  • Work with Business Intelligence tools such as Power BI and Tableau to create dashboards and visualizations.
  • Automate job scheduling and recurring processes using shell scripting and Python.
  • Ensure data security and privacy requirements are met while working with large datasets.
  • Provide technical guidance to the team and mentor junior developers.
  • Collaborate with cross-functional teams to gather requirements, design, develop and implement data solutions, and manage client expectations.
  • Stay up-to-date with the latest technologies and trends in the industry and evaluate their potential use in the organization.
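
The following is a minimal PySpark sketch of the kind of data pipeline described above: read raw CSV from a landing zone, apply basic cleansing, and write partitioned Parquet. The paths and column names are hypothetical, and the session setup assumes an available Spark installation.

# Minimal PySpark sketch: read raw CSV, apply a simple transformation, and
# write the result as Parquet partitioned by load date. Paths and column
# names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_orders_pipeline").getOrCreate()

raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("hdfs:///landing/orders/")          # hypothetical landing zone
)

cleansed = (
    raw.dropDuplicates(["order_id"])          # basic cleansing step
    .withColumn("order_amount", F.col("order_amount").cast("double"))
    .withColumn("load_date", F.current_date())
)

(
    cleansed.write
    .mode("overwrite")
    .partitionBy("load_date")                 # partition for efficient pruning
    .parquet("hdfs:///curated/orders/")       # hypothetical curated zone
)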

TECHNICAL SKILLS

Big Data Ecosystems: Apache Hadoop, HDFS, MapReduce, Apache Spark, Apache Kafka, Apache Flink, Apache Cassandra, Apache Hive (with LLAP and Tez engines), Apache Pig, Apache Sqoop, Apache Oozie, Apache HBase, Apache Phoenix, Apache ZooKeeper, Apache Flume

Programming: Python, Java, Scala, R.

Data Warehousing: Informatica PowerCenter, Talend Data Integration, AWS Glue, Azure Data Factory

Applications: Salesforce, Oracle CX, HubSpot, Marketo, Adobe Experience Cloud

Databases: Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, Amazon Redshift

BI Tools: Tableau, Microsoft Power BI, QlikView, Looker

Query Languages: SQL, PL/SQL, T-SQL, ANSI SQL:2016, HiveQL, Spark SQL, Amazon Redshift SQL, PostgreSQL

Scripting Languages: Bash, Python, PowerShell, Perl

RDBMS Utilities: SQLcl, DBeaver, Oracle SQL Developer, Microsoft SQL Server Management Studio, MySQL Workbench

Scheduling Tools: Apache Airflow, Apache NiFi, Cron, Jenkins, Azure Logic Apps.

PROFESSIONAL EXPERIENCE

Confidential, Evansville, IN

Sr. Data Engineer

Responsibilities:

  • Analyze and cleanse raw data using HiveQL, Spark SQL, and Apache Flink
  • Perform data transformations using MapReduce, Apache Spark, Apache Flink, and Hive across file formats including Parquet, ORC, and Avro
  • Convert Hive/SQL queries into transformations using Python, Scala, and Java
  • Perform complex joins on tables in Hive, Spark, and Flink using optimization techniques such as broadcast joins, bucketing, and partitioning
  • Create internal and external Hive tables per requirements, defined with appropriate static and dynamic partitions for query efficiency (see the sketch after this list)
  • Work extensively with Hive DDL and Hive Query Language (HiveQL), Spark SQL, and Flink SQL
  • Manage and monitor Hadoop infrastructure with Apache Ambari, Cloudera Manager, and Hortonworks Data Platform
  • Work with various Big Data technologies such as Apache Kafka, Apache Cassandra, and Apache HBase
  • Build integrations between applications, primarily Salesforce, Oracle CX, HubSpot, Marketo, Adobe Experience Cloud, and AWS Glue
  • Work extensively with Informatica Cloud and Talend Data Integration for ETL processing
  • Use Informatica Cloud apps including Data Integration, Data Quality, and Data Governance, along with Task Flows, Mapping Configurations, and real-time apps such as Process Designer and Process Developer.
  • Work extensively with file formats such as CSV, JSON, and XML, loading them into on-premises applications and extracting data from applications back to files
  • Develop Informatica Cloud Real Time (ICRT) processes and Talend jobs for streaming data processing
  • Work with RESTful and SOAP APIs, and API management tools such as Apigee, Mulesoft, and AWS API Gateway
  • Write SOQL, SQL, and Spark SQL queries and create test data in Salesforce, Oracle CX, HubSpot, Marketo, Adobe Experience Cloud, and AWS Glue for unit testing of ETL mappings.
  • Prepare technical design documents (TDDs) and test case documents after each process is developed.
  • Develop and maintain technical documentation for launching Hadoop cluster and executing ETL mappings and workflows.
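
Below is an illustrative Spark SQL sketch of the Hive partitioning and broadcast-join patterns referenced above. The table names, columns, and HDFS locations are hypothetical, and exact settings would depend on the cluster configuration.

# Illustrative Spark SQL sketch: an external Hive table partitioned by region,
# loaded with dynamic partitioning, and queried with a broadcast-join hint.
# All names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_partitioning_demo")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_curated (
        order_id      BIGINT,
        customer_id   BIGINT,
        order_amount  DOUBLE
    )
    PARTITIONED BY (region STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///curated/sales_curated/'
""")

spark.sql("""
    INSERT OVERWRITE TABLE sales_curated PARTITION (region)
    SELECT order_id, customer_id, order_amount, region
    FROM sales_staging
""")

# Broadcast the small dimension table to avoid shuffling the large fact table.
result = spark.sql("""
    SELECT /*+ BROADCAST(c) */ c.customer_name, SUM(s.order_amount) AS total
    FROM sales_curated s
    JOIN dim_customer c ON s.customer_id = c.customer_id
    GROUP BY c.customer_name
""")
result.show()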

Technologies Used: Big Data ecosystem (Hadoop, HDFS, Hive, Pig, MapReduce, Cloudera), Python, Informatica Cloud Services, Salesforce, Unix scripts, flat files, XML files.

Confidential, St. Louis, MO

Data Engineer

Responsibilities:

  • Built and maintained reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
  • Worked extensively with Informatica PowerCenter and served as an SME for data warehouse-related processes.
  • Performed Data analysis for building and maintaining Reporting Data Mart.
  • Optimized Informatica mappings and sessions for improving process efficiency and eliminating bottlenecks.
  • Developed and maintained complex SQL Queries, PL/SQL procedures and converted them to ETL tasks.
  • Developed and maintained Python scripts for automating file transfer, emailing and other file-related tasks.
  • Worked with CI/CD pipelines for deployments from Dev to UAT, and then to Prod.
  • Used Informatica Cloud for data integration between Salesforce, ServiceNow, HubSpot, Marketo, and other applications using Data Synchronization, Data Replication, Task Flows, and Mapping Configurations.
  • Built and maintained SOAP and REST APIs and implemented proofs of concept.
  • Developed web service mappings and exposed them as SOAP WSDLs.
  • Worked with Reporting developers to oversee the implementation of reports and dashboard designs in Tableau and other BI tools.
  • Assisted users in creating/modifying worksheets and data visualization dashboards in Tableau.
  • Tuned and optimized report and dashboard performance.
  • Assisted report developers with writing the required logic to achieve desired goals.
  • Performed data gap analysis and identified issues within DWH dimension and fact tables, such as missing keys and broken joins.
  • Wrote SQL queries and Python scripts to identify and validate data inconsistencies in the data warehouse against source systems (see the sketch after this list).
  • Validated reporting numbers between source and target systems.
  • Identified technical solutions and business logic to fix missing or incorrect data and coordinated with reporting developers on the fixes.
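
The sketch below illustrates the kind of source-to-warehouse validation described above, assuming generic Python DB-API connections; the table and column names are hypothetical.

# Illustrative validation sketch: compare row counts between source and
# warehouse, and flag fact rows whose customer key is missing from the
# dimension. Connections are generic DB-API objects; names are hypothetical.

def compare_row_counts(src_conn, dwh_conn, src_table, dwh_table):
    """Return (source_count, warehouse_count, difference)."""
    src_cur, dwh_cur = src_conn.cursor(), dwh_conn.cursor()
    src_cur.execute(f"SELECT COUNT(*) FROM {src_table}")
    dwh_cur.execute(f"SELECT COUNT(*) FROM {dwh_table}")
    src_count = src_cur.fetchone()[0]
    dwh_count = dwh_cur.fetchone()[0]
    return src_count, dwh_count, src_count - dwh_count


def find_orphan_fact_keys(dwh_conn):
    """Fact rows whose customer surrogate key has no match in the dimension."""
    cur = dwh_conn.cursor()
    cur.execute("""
        SELECT f.order_key, f.customer_key
        FROM fact_orders f
        LEFT JOIN dim_customer d ON f.customer_key = d.customer_key
        WHERE d.customer_key IS NULL
    """)
    return cur.fetchall()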

Technologies Used: Informatica PowerCenter 10.x/9.x, Informatica Cloud, Oracle 12c/19c, SQL Server 2019, Tableau 2022.1, Salesforce, ServiceNow, HubSpot, Marketo, Python, Unix, PowerShell, CI/CD pipelines.

Confidential, Patskala, Ohio

ETL Developer

Responsibilities:

  • Designed and implemented reporting Data Warehouse with online transaction system data using Informatica PowerCenter and Azure Synapse Analytics.
  • Developed and maintained data warehouse for PSN project.
  • Created reports and publications for Third Parties using Power BI for Royalty payments.
  • Managed user accounts, groups, and workspace creation for different users in Azure Synapse Studio.
  • Wrote complex UNIX/Windows scripts for file transfers and email tasks from FTP/SFTP using Azure Logic Apps.
  • Worked with PL/SQL procedures and used them in Stored Procedure Transformations.
  • Worked extensively on Oracle and SQL Server. Wrote complex SQL queries to query ERP system for data analysis purpose.
  • Migrated ETL code from Talend to Informatica. Involved in development, testing, and post-production for the entire migration project.
  • Tuned ETL jobs in the new environment after fully understanding the existing code.
  • Maintained Talend admin console and provided quick assistance on production jobs.
  • Involved in designing Power BI reports and dashboards.
  • Built ad-hoc reports using stand-alone tables in Power BI.
  • Created publications that split into separate reports per vendor using Power BI (see the sketch after this list).
  • Wrote custom SQL for some complex reports in Power BI.
  • Worked with internal and external business partners during requirements gathering.
  • Worked closely with Business Analysts and report developers to write source-to-target specifications for data warehouse tables based on business requirements.
  • Exported data to Excel for business meetings, making discussions easier when reviewing the data.
  • Performed analysis after requirements gathering and walked the team through major impacts.
  • Provided and debugged crucial reports for finance teams during the month-end period using Power BI.
  • Addressed issues reported by Business Users in standard reports by identifying the root cause.
  • Resolved reporting issues by identifying whether each was report-related or source-related.
  • Created ad-hoc reports per user needs using Power BI.
  • Investigated and analyzed data discrepancies and resolved them.
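
The following pandas sketch illustrates, under hypothetical file paths and column names, how a single royalty dataset can be split into per-vendor Excel outputs, similar in spirit to the per-vendor publications described above.

# Illustrative pandas sketch: split one royalty dataset into per-vendor
# Excel files. File paths and column names are hypothetical.
import pandas as pd

royalties = pd.read_csv("royalty_payments.csv")   # hypothetical extract

for vendor, vendor_df in royalties.groupby("vendor_name"):
    out_path = f"reports/royalties_{vendor}.xlsx"
    vendor_df.to_excel(out_path, index=False, sheet_name="Royalties")
    print(f"Wrote {len(vendor_df)} rows to {out_path}")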

Technologies Used: Informatica PowerCenter 10.x, Azure Synapse Analytics, Talend Integration suite, Power BI, Oracle 12c, SQL Server 2019, Azure Logic Apps, JIRA.

Confidential

Data Engineer

Responsibilities:

  • Gathered business requirements and prepared technical design documents, target to source mapping document, mapping specification document.
  • Worked extensively with ETL tools such as Informatica PowerCenter and Talend to extract, transform, and load data from various sources into data warehouses.
  • Developed and maintained ETL workflows and mappings for data integration and data quality.
  • Optimized query performance by working with indexing, partitioning, and other database tuning techniques.
  • Developed complex data transformations using SQL, PL/SQL, and other programming languages.
  • Utilized cloud-based technologies such as AWS S3, Redshift, and Snowflake for data storage and processing.
  • Designed and implemented data quality checks and data cleansing routines to ensure accuracy and consistency of data.
  • Developed and maintained shell scripts for file-transfer automation and scheduled ETL workflows using tools such as Control-M and Airflow (see the sketch after this list).
  • Worked with business users and data analysts to understand data requirements and provide data solutions that meet their needs.
  • Performed testing and debugging of ETL workflows and provided necessary reports to the business users.
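
Below is a minimal Apache Airflow sketch of the scheduling pattern described above: a daily DAG chaining a file pull, an ETL load, and a validation step. The DAG id, scripts, and paths are hypothetical.

# Illustrative Airflow DAG: daily schedule with three chained tasks.
# Task commands, paths, and the DAG id are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_sales_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    fetch_file = BashOperator(
        task_id="fetch_source_file",
        bash_command="sftp_get.sh /incoming/sales_{{ ds }}.csv",   # hypothetical script
    )

    load_warehouse = BashOperator(
        task_id="load_warehouse",
        bash_command="run_etl.sh sales_{{ ds }}.csv",              # hypothetical script
    )

    validate = BashOperator(
        task_id="validate_counts",
        bash_command="python validate_counts.py --date {{ ds }}",  # hypothetical script
    )

    fetch_file >> load_warehouse >> validate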

Technologies Used: ETL tools such as Informatica PowerCenter and Talend, AWS S3, Redshift, Snowflake, SQL, PL/SQL, Control-M, Airflow, and various database technologies.

Confidential

ETL/Data Warehouse Developer

Responsibilities:

  • Collaborated with business stakeholders to gather requirements and documented them for project development.
  • Conducted design reviews and ETL code reviews with teammates.
  • Developed ETL mappings using Informatica to extract, transform and load data from various sources such as Relational databases, Flat files, XML files, and APIs into the target Data Warehouse.
  • Implemented complex Informatica transformations to transform data as per business rules and requirements.
  • Created data mappings in Informatica to extract data from Sequential files and other non-database sources.
  • Extensively worked on UNIX Shell Scripting and Python scripting for file transfer automation, error logging, and other automation tasks.
  • Scheduled processes using job scheduling tools like Autosys, Control-M, and Airflow.
  • Performed various types of testing, including Unit, Integration, System, and Acceptance testing of various ETL jobs.
  • Optimized ETL jobs and SQL queries for performance tuning and data processing.
  • Provided production support and troubleshooting for ETL issues and worked with other teams to resolve them.
  • Designed and developed data models and dimensional models using 3NF, Star, and Snowflake schemas for OLAP and Operational Data Store (ODS) applications (see the sketch after this list).
  • Implemented ETL solutions for cloud platforms such as AWS, Azure and Google Cloud using cloud-based ETL tools like AWS Glue, Azure Data Factory, and Google Cloud Dataflow.
  • Extensively worked with Big Data technologies such as Hadoop, Spark, Hive, and Pig for analyzing and processing large volumes of data.
  • Worked with containerization technologies such as Docker and Kubernetes for deploying ETL jobs and other data processing tasks.
  • Collaborated with data scientists and machine learning engineers to design and develop data pipelines for machine learning and predictive analytics models.
  • Provided technical leadership and guidance to junior team members and mentored them on best practices and industry standards for ETL development and data warehousing.
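
The following sketch illustrates a star-schema fact load with surrogate-key lookups against dimension tables, using small in-memory pandas DataFrames as stand-ins for staging and dimension tables; all table and column names are hypothetical.

# Illustrative star-schema load: resolve surrogate keys from dimensions,
# then assemble the fact table. All names and sample data are hypothetical.
import pandas as pd

staging_orders = pd.DataFrame({
    "order_id": [1001, 1002],
    "customer_code": ["C01", "C02"],
    "order_date": ["2023-01-05", "2023-01-06"],
    "amount": [250.0, 99.5],
})

dim_customer = pd.DataFrame({
    "customer_key": [1, 2],            # surrogate keys
    "customer_code": ["C01", "C02"],   # natural/business keys
})

dim_date = pd.DataFrame({
    "date_key": [20230105, 20230106],
    "calendar_date": ["2023-01-05", "2023-01-06"],
})

fact_orders = (
    staging_orders
    .merge(dim_customer, on="customer_code", how="left")
    .merge(dim_date, left_on="order_date", right_on="calendar_date", how="left")
    [["order_id", "customer_key", "date_key", "amount"]]
)

# Rows with a null surrogate key indicate late-arriving or missing dimension rows.
missing_dims = fact_orders[fact_orders["customer_key"].isna()]
print(fact_orders)
print(f"Rows missing dimension keys: {len(missing_dims)}")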

Technologies Used: Informatica Power Center, Oracle, SQL Server, Python, UNIX Shell Scripting, Job Scheduling Tools (Autosys, Control-M, Airflow), Data Warehousing concepts.
