Sr. Data Engineer Resume

SUMMARY

  • Experience working with different big data distributions such as Hortonworks, Cloudera, and MapR.
  • Worked on optimized data formats like Parquet, ORC and Avro for data storage and extraction.
  • Experience in building batch ingestion pipelines using Sqoop, Flume, Spark, Talend, and NiFi.
  • Experience in designing, developing, and maintaining data lake projects using different big data tool stacks.
  • Migrated data from different data sources like Oracle, MySQL, Teradata to Hive, HBase and HDFS.
  • Built optimized Hive ETL and Spark data pipelines and datastores in Hive and HBase.
  • Designed and developed optimized data pipelines using Spark SQL and Spark with Scala and Python (a sketch follows this list).
  • Implemented various optimization techniques in Hive and Spark ETL process.
  • Experienced in building Jupyter notebooks using PySpark for extensive data analysis.
  • Working knowledge of different NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience in ingesting and exporting data from Apache Kafka using Apache Spark Streaming.
  • Experience in building end to end automation for different actions using Apache Oozie.
  • Implemented scalable microservices applications using Spring Boot for building REST endpoints.
  • Built Talend and NiFi integrations for bi-directional data ingestion across different sources.
  • Good exposure to different cloud platforms such as Amazon Web Services (AWS), Azure, and Google Cloud.
  • Expert with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Implemented applications using various Azure services such as Cosmos DB, Azure Databricks, Event Hub, and Azure HDInsight.
  • Built historical and incremental reporting dashboards in Power BI and Tableau.
  • Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
  • Built application metrics and alerting dashboards using Kibana.
  • Built Kibana dashboards for real-time log aggregation and analysis.
  • Experience in CI/CD using Jenkins and Docker.
  • Worked on scaling microservices applications using Kubernetes and Docker.
  • Experience in GCP tools like BigQuery, Pub/Sub, Dataproc, and Datalab.
  • Experience working in Agile and Waterfall environments.
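
A minimal PySpark sketch of the kind of optimized Spark SQL batch pipeline summarized above; the landing path, database, table, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Assumed Hive-enabled Spark session; all paths and names are illustrative.
    spark = (SparkSession.builder
             .appName("orders_daily_load")
             .enableHiveSupport()
             .getOrCreate())

    orders = (spark.read.parquet("hdfs:///landing/orders")  # raw landing zone
              .filter(col("order_status") == "COMPLETE")
              .select("order_id", "customer_id", "order_total", "order_date"))

    # Partition by date so Hive/Spark queries can prune files they do not need.
    (orders.repartition("order_date")
           .write.mode("overwrite")
           .partitionBy("order_date")
           .format("parquet")
           .saveAsTable("curated.orders"))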

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Engineer

Responsibilities:

  • Developed code for importing and exporting data from different workstreams into HDFS using Sqoop.
  • Implemented partitioning and bucketing (by State) in Hive to enable bucket-based joins for further processing.
  • Created HBase tables from Hive and wrote HiveQL statements to access HBase table data.
  • Developed UDFs in Java as needed for use in Hive queries.
  • Handled the ETL framework in Spark for writing data from HDFS to Hive.
  • Involved in Spark Streaming using Scala and Python for real-time computations over JSON data (see the streaming sketch after this list).
  • Worked with Kafka as the messaging queue for data streaming in both batch and real-time applications.
  • Automated tasks with Oozie to load data into HDFS through Sqoop and to run ETL preprocessing with Hive and Spark (Scala and Python).
  • Developed various scripting functionality using shell scripts and Python.
  • Built Jupyter notebooks using PySpark for extensive data analysis and exploration.
  • Implemented code coverage and integrations using Sonar for improving code testability.
  • Pushed application logs and data stream logs to the Kibana server for monitoring and alerting purposes.
  • Worked on migrating data from HDFS to Azure HDInsight and Azure Databricks.
  • Implemented multiple modules in microservices to expose data through RESTful APIs.
  • Developed Jenkins pipelines for continuous integration and deployment purpose.
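
A hedged PySpark sketch of the Kafka-to-HDFS streaming pattern referenced above; the broker address, topic, JSON schema, and output paths are assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("events_stream").getOrCreate()

    # Hypothetical JSON payload schema for messages on the topic.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
    ])

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # assumed brokers
              .option("subscribe", "events")                      # assumed topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Append parsed records to HDFS as Parquet; the checkpoint location is illustrative.
    (events.writeStream
           .format("parquet")
           .option("path", "hdfs:///data/events")
           .option("checkpointLocation", "hdfs:///checkpoints/events")
           .outputMode("append")
           .start()
           .awaitTermination())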

Environment: Hadoop 2.0, Cloudera, HDFS, MapReduce, Hive, Impala, HBase, Sqoop, Kafka, Spark, Linux, MySQL, Azure

Confidential

Sr. Data Engineer

Responsibilities:

  • Developed and tuned Spark Streaming applications using Scala for processing data from Kafka.
  • Imported batch data using Sqoop to load data from MySQL to HDFS on a regular basis from various sources.
  • Extracted data from various APIs and performed data cleansing and processing using Java and Scala.
  • Handled Hive queries using Spark SQL integrated with the Spark environment.
  • Worked on migrating data from HDFS to Azure HDInsight.
  • Developed complex queries and ETL processes in notebooks using Spark on Azure Databricks.
  • Developed different microservice modules to collect application stats for visualization.
  • Worked with Docker and Kubernetes to containerize and deploy applications.
  • Developed complex Hive ETL logic for cleansing and transforming data coming from relational systems.
  • Implemented complex data types in Hive and used multiple data formats such as ORC and Parquet.
  • Created custom dashboards in Azure using Application Insights and its query language to process the metrics sent to Application Insights.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the Cosmos activity.
  • Created real-time streaming dashboards in Power BI using Stream Analytics to push datasets to Power BI.
  • Developed a custom message consumer to consume data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure components); a sketch follows this list.
  • Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hub and send it to DocumentDB.
  • Wrote Spark applications to capture the change feed from DocumentDB using the Java API and write updates to the new DocumentDB.
  • Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
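
A small Python sketch of the Kafka-to-Event-Hub consumer described above, assuming the kafka-python and azure-eventhub client libraries; the topic, brokers, connection string, and hub name are placeholders:

    from kafka import KafkaConsumer
    from azure.eventhub import EventHubProducerClient, EventData

    # Hypothetical Kafka topic, brokers, and Event Hub connection details.
    consumer = KafkaConsumer(
        "transactions",
        bootstrap_servers=["broker1:9092"],
        group_id="eventhub-forwarder",
    )
    producer = EventHubProducerClient.from_connection_string(
        conn_str="<event-hub-connection-string>",  # placeholder
        eventhub_name="transactions-hub",          # placeholder
    )

    # Forward each raw Kafka message payload to Event Hub.
    for message in consumer:
        batch = producer.create_batch()
        batch.add(EventData(message.value))
        producer.send_batch(batch)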

Environment: Hive, Sqoop, LINUX, Cloudera CDH 5, Scala, Kafka, HBase, Avro, Spark, Zookeeper and MySQL.

Confidential

Sr. Data Engineer

Responsibilities:

  • Worked on building and developing ETL pipelines using Spark-based applications.
  • Worked on migrating RDBMS data into data lake applications using Sqoop and Spark.
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
  • Automated all the jobs for pulling data from the FTP server to load data into Hive tables using Oozie workflows.
  • Built optimized Spark jobs for data cleansing and transformations.
  • Developed Spark Scala applications optimized to complete within expected run times.
  • Developed a monitoring platform for our jobs in Kibana and Grafana.
  • Developed real-time log aggregations on Kibana for analyzing data.
  • Worked on developing NiFi pipelines for extracting data from external sources.
  • Implemented NiFi pipelines to export data from HDFS to different endpoints and cloud locations such as AWS S3.
  • Built bi-directional ingestion pipelines in Hadoop and AWS S3 storage.
  • Experience in building ingestion pipelines into AWS Redshift and DynamoDB.
  • Developed data pipelines for events to load data from DynamoDB to an AWS S3 bucket and then into an HDFS location.
  • Worked on ETL migration by developing AWS Lambda functions to build a serverless data pipeline whose output is registered in the Glue Catalog and can be queried from Athena (see the Lambda sketch after this list).
  • Developed Jenkins pipelines for data pipeline and application deployments.
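
A minimal boto3 sketch of the serverless pattern described above: an S3-triggered Lambda that re-runs a Glue crawler so new files are registered in the Glue Catalog and become queryable from Athena. The crawler name and the S3 event trigger are assumptions:

    import boto3

    glue = boto3.client("glue")
    CRAWLER_NAME = "events_raw_crawler"  # hypothetical crawler pointed at the S3 prefix

    def handler(event, context):
        # Triggered by S3 ObjectCreated notifications; log the newly arrived objects.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object: s3://{bucket}/{key}")

        # Re-run the crawler so the Glue Catalog table that Athena queries stays current.
        try:
            glue.start_crawler(Name=CRAWLER_NAME)
        except glue.exceptions.CrawlerRunningException:
            pass  # a previous run is still in progress; new files are picked up next run
        return {"status": "ok"}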

Environment: Hadoop, Sqoop, Pig, HBase, Hive, Flume, Java 6, Eclipse, Apache Tomcat 7.0, Oracle, J2EE, AWS, NiFi, Kibana, Grafana.

Confidential

Data Engineer

Responsibilities:

  • Ingested various data sources to Hive and HDFS using different tools like Sqoop, Kafka, Flume.
  • Developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Built complex Unix/Windows scripts for file transfers and emailing tasks over FTP/SFTP.
  • Experienced in using Elasticsearch to build a data catalog (a sketch follows this list).
  • Experience in scheduling multiple Hadoop jobs in Oozie and scripts in AutoSys.
  • Ingested transactional data into HBase for extensive CRUD operations.
  • Transformed large sets of structured, semi-structured, and unstructured data using Hive and Pig.
  • Created SSIS packages to load data into SQL Server using various SSIS transformations.
  • Developed PL/SQL procedures and used them in Stored Procedure Transformations.
  • Worked on critical Finance projects and had been the go-to person for any data related issues.
  • Migrated ETL code from Talend to Informatica. Involved in development, testing and postproduction for the entire migration project.
  • Tuned ETL jobs in the new environment after fully understanding the existing code.
  • Maintained Talend admin console and provided quick assistance on production jobs.
  • Worked with internal and external business partners during requirements gathering.
  • Worked closely with Business Analyst and report developers in writing the source to target specifications for Data warehouse tables based on the business requirement needs.
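
A small sketch of using Elasticsearch as a data catalog, as referenced above; it assumes the 8.x Python client, a local cluster endpoint, and hypothetical dataset metadata:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed cluster endpoint

    # Index one catalog entry describing where a dataset lives and who owns it.
    es.index(
        index="data-catalog",
        id="orders_raw",
        document={
            "dataset": "orders_raw",
            "location": "hdfs:///data/orders/raw",
            "format": "parquet",
            "owner": "data-engineering",
        },
    )

    # Look a dataset up by name and print its storage location.
    hits = es.search(index="data-catalog", query={"match": {"dataset": "orders"}})
    for hit in hits["hits"]["hits"]:
        print(hit["_source"]["location"])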

Environment: Informatica Power Center 9.1/9.0, Talend 4.x & Integration suite, Business Objects XI, Oracle 10g/11g, Oracle ERP, EDI, SQL Server 2005, UNIX, Windows Scripting, JIRA, MapR, Sqoop, Hive, HBase, Kafka.
