
Data Engineer Resume


Chicago, IL

SUMMARY:

  • Certified professional data engineer and architect with more than 5 years of overall IT experience across clients in different domains and technologies.
  • Experienced in designing and building highly scalable, large-scale applications using Cloud, Big Data, DevOps and Spring Boot technologies.
  • Expert in working under both Agile and Waterfall methodologies, and in onboarding applications into multi-cloud environments such as AWS and Azure.
  • Experience in all phases of software development (analysis, design, development, integration, testing and maintenance) for big data applications in various programming languages.
  • Experience working with various Hadoop distributions such as Cloudera, Hortonworks and MapR.
  • Expert in ingesting incremental batch loads from various RDBMS sources using Apache Sqoop.
  • Developed scalable applications for real-time ingestion into various databases using Apache Kafka.
  • Experience in building optimized ETL data pipelines using Apache Hive, Impala and Spark.
  • Implemented various optimization techniques in Hive scripts for data crunching and transformations.
  • Experience in building ETL scripts in Impala to provide faster access for the reporting layer.
  • Built Spark data pipelines with various optimization techniques using Python and Scala.
  • Experience in loading transactional and delta data into NoSQL databases such as HBase.
  • Developed various automation flows using Apache Oozie, Airflow and Autosys.

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Data Engineer

Responsibilities:

  • Implemented Ab Initio graphs based on business requirements.
  • Debugged failures from log files and reran the jobs by analyzing the checkpoints within graphs using Ab Initio GDE.
  • Constructed Ab Initio ETL graphs to populate both staging and warehouse data.
  • Built and maintained multiple stages of a data lake implementation using Spark with Python.
  • Implemented multiple optimization techniques on data pipelines using Hive, Impala and Spark.
  • Developed producer and consumer code to ingest real-time JSON and Avro data into Hive.
  • Developed batch ingestion pipelines from different sources using Sqoop and Spark.
  • Developed AutoSys jobs to ingest data from NAS locations into HDFS.
  • Implemented job automation in the Oozie scheduler using bundles, coordinators and workflows.
  • Exposed data through BI tools such as Tableau for business exploration and reporting.
  • Worked on a proof of concept on Google Cloud Platform (GCP) to migrate data from the on-premises environment into BigQuery.
  • Worked with GCP AI Platform Jupyter notebooks and developed Python scripts for various transformations and for loading data into BigQuery tables.
  • Shared development code and scripts, and collaborated with development, QA and production support team members using GitHub version control and Jupyter notebooks.
  • Worked on continuous integration and continuous deployment (CI/CD) processes and pipelines in Jenkins.
  • Involved in loading and processing data in Microsoft Azure using Azure Blob Storage, Azure Data Factory (ADF), Databricks, data lakes and the Synapse data warehouse.
  • Built integrations with AWS S3 for bi-directional data flow between S3 and HDFS.
  • Used Athena to query S3 data for processing and for altering tables.
  • Good understanding of operationalizing large-scale data and analytics solutions on Snowflake.
  • In-depth knowledge of Snowflake pipelines (Snowpipe).
  • In-depth knowledge of Snowflake database, schema and table structures.
  • Served as cloud architect for Snowflake use cases, loading project data from S3 into Snowflake and performing ETL operations in Snowflake worksheets to validate the results.

Environment: Ab Initio 3.5.1, Co-op 3.1.7.10, Oracle 11g/12c/19c, Teradata 15.0, SQL Developer, Business Objects, Tableau, Linux, IBM Workload Scheduler, Tivoli IBM, Python, Snowflake, Google Cloud Platform (GCP), BigQuery, AI Platform Jupyter Notebooks, Jenkins, Microsoft Visual Studio, Unix, GitHub, AutoSys, Hadoop, Cloudera, Azure Data Factory, AWS Console.

Confidential, Chicago, IL

Data Analyst

Responsibilities:

  • Worked on multiple projects and agile teams focused on analytics and cloud platforms.
  • Built scalable data pipelines on the Azure cloud platform using different tools.
  • Developed multiple optimized PySpark applications using Azure Databricks.
  • Developed data pipelines using Azure Data Factory that process Cosmos activity.
  • Implemented reporting statistics on top of real-time data using Tableau.
  • Developed ETL solutions using SSIS, Azure Data Factory and Azure Databricks.
  • Expert in continuous integration and deployment using Jenkins.
  • Developed real-time ingestion pipelines from Event Hub into different tools.
  • Experience in building ETL solutions using Hive and Spark with Python and Scala.
  • Expert in optimizing applications built with tools such as Spark and Hive.
  • Developed job automations across different clusters using the Airflow scheduler.
  • Worked on Talend integration between the on-premises cluster and Azure SQL for data migrations.
  • Developed code coverage and test case integrations using Sonar and Mockito.
  • Implemented complex data types in Hive and used multiple data formats such as ORC and Parquet.
  • Deployed data pipelines in Azure Data Factory (ADF) that process data using Cosmos.
  • Developed custom dashboards on top of streaming data using Power BI.
  • Developed a custom message consumer to consume data from the Kafka producer and push the messages to Azure Service Bus and Event Hub.
  • Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hub and send it to DocumentDB.
  • Wrote a Spark application to capture the change feed from DocumentDB using the Java API and write the updates to a new DocumentDB instance.
  • Used Python to run the Ansible playbook that deploys Logic Apps to Azure.

Environment: Ab Initio 3.5.1, Co-op 3.1.7.10, Oracle 11g/12c/19c, Teradata 15.0, SQL Developer, Business Objects, Tableau, Linux, IBM Workload Scheduler, Tivoli IBM, Python, Hadoop.

Confidential

System Engineer

Responsibilities:

  • Developed code for bi-directional data flow between different workstreams and HDFS using Sqoop.
  • Implemented HBase tables from Hive and wrote HiveQL statements to access HBase table data.
  • Developed optimized ETL data pipelines using PySpark and Hive.
  • Involved in Spark Streaming using Scala for real-time computations to process JSON files.
  • Orchestrated tasks using Oozie for loading data into HDFS through Sqoop and Hive.
  • Developed various scripting functionality using shell scripts and Python.
  • Pushed application logs and data stream logs to the Kibana server for monitoring and alerting.
  • Analyzed business requirements and coordinated with developers and business analysts.
  • Involved in developing the ETL strategy and subsequently developed various complex Ab Initio graphs.
  • Troubleshot and debugged graphs using intermediate files, phases, checkpoints and the debugger (isolation mode).
  • Resolved issues that arose during project execution by providing root cause analysis.

Environment: Ab Initio 3.5.1, Co-op 3.1.7.10, Oracle 11g/12c/19c, Teradata 15.0, SQL Developer, Business Objects, Tableau, Linux, IBM Workload Scheduler, Tivoli IBM, Python, Hadoop.
