Sr. Data Engineer Resume
SUMMARY
- Experience working with different big data distributions such as Hortonworks, Cloudera, and MapR.
- Worked with optimized data formats such as Parquet, ORC, and Avro for data storage and retrieval.
- Experience building batch ingestion pipelines using Sqoop, Flume, Spark, Talend, and NiFi.
- Experience designing, developing, and maintaining data lake projects with a variety of big data tool stacks.
- Migrated data from sources such as Oracle, MySQL, and Teradata into Hive, HBase, and HDFS.
- Built optimized Hive and Spark ETL pipelines and data stores in Hive and HBase.
- Designed and developed optimized data pipelines using Spark SQL and Spark with Scala and Python (a representative sketch follows this summary).
- Implemented various optimization techniques in Hive and Spark ETL processes.
- Experienced in building Jupyter notebooks using PySpark for extensive data analysis.
- Working knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience ingesting data from and publishing data to Apache Kafka using Spark Streaming.
- Experience building end-to-end workflow automation using Apache Oozie.
- Implemented scalable microservice applications with Spring Boot to expose REST endpoints.
- Built Talend and NiFi integrations for bi-directional data ingestion across different sources.
- Good exposure to cloud platforms including Amazon Web Services, Azure, and Google Cloud.
- Expert with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
- Developed applications using Azure services such as Cosmos DB, Azure Databricks, Event Hubs, and Azure HDInsight.
- Built historical and incremental reporting dashboards in Power BI and Tableau.
- Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
- Built application metrics and alerting dashboards in Kibana.
- Built Kibana dashboards for real-time log aggregation and analysis.
- Experience with CI/CD using Jenkins and Docker.
- Scaled microservice applications using Kubernetes and Docker.
- Experience with GCP services such as BigQuery, Pub/Sub, Dataproc, and Datalab.
- Experience working in both Agile and Waterfall environments.
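A minimal PySpark sketch of the batch ingestion pattern summarized above: pull a table from a relational source over JDBC and land it in the data lake as partitioned Parquet. The connection URL, credentials, table, and output path are illustrative placeholders, not details from any specific engagement.

```python
# Hypothetical sketch: JDBC table -> partitioned Parquet in the data lake.
# Assumes the appropriate JDBC driver (here MySQL) is on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("jdbc-to-datalake").getOrCreate()

orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")   # placeholder source
          .option("dbtable", "orders")                        # placeholder table
          .option("user", "etl_user")
          .option("password", "***")
          .load())

(orders
 .withColumn("load_date", F.current_date())   # partition column for incremental loads
 .write
 .mode("append")
 .partitionBy("load_date")
 .parquet("/datalake/sales/orders"))           # placeholder data lake path
```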
PROFESSIONAL EXPERIENCE
Confidential
Sr. Data Engineer
Responsibilities:
- Developed code for importing and exporting data between different workstreams and HDFS using Sqoop.
- Partitioned and bucketed Hive tables by state to enable bucket-based Hive joins in downstream processing.
- Created HBase-backed tables from Hive and wrote HiveQL statements to access HBase table data.
- Developed Java UDFs as needed for use in Hive queries.
- Maintained a Spark ETL framework for writing data from HDFS to Hive.
- Built Spark streaming jobs in Scala and Python for real-time processing of JSON data (see the sketch at the end of this section).
- Worked with Kafka as the messaging layer for both batch and real-time streaming applications.
- Automated Oozie workflows that load data into HDFS through Sqoop and preprocess it with Hive and Spark (Scala and Python).
- Developed various scripts in shell and Python.
- Built Jupyter notebooks using PySpark for extensive data analysis and exploration.
- Implemented code coverage and Sonar integration to improve code testability.
- Pushed application and data-stream logs to Kibana for monitoring and alerting.
- Worked on migrating data from HDFS to Azure HDInsight and Azure Databricks.
- Implemented multiple microservice modules to expose data through RESTful APIs.
- Developed Jenkins pipelines for continuous integration and deployment.
Environment: Hadoop 2.0, Cloudera, HDFS, MapReduce, Hive, Impala, HBase, Sqoop, Kafka, Spark, Linux, MySQL, Azure
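A minimal sketch of the Kafka-to-HDFS streaming path described above, assuming Spark Structured Streaming as the engine; the broker address, topic, schema fields, and output paths are illustrative placeholders.

```python
# Hypothetical sketch: consume JSON events from Kafka and land them in HDFS
# as partitioned Parquet. Requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-json-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("state", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Append the parsed stream to HDFS as Parquet, partitioned by state.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/events")                    # placeholder HDFS path
         .option("checkpointLocation", "/checkpoints/events")
         .partitionBy("state")
         .outputMode("append")
         .start())
```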
Confidential
Sr. Data Engineer
Responsibilities:
- Developed and tuned a Spark Streaming application in Scala for processing data from Kafka.
- Imported batch data from MySQL and other sources into HDFS on a regular basis using Sqoop.
- Extracted data from various APIs and performed data cleansing and processing using Java and Scala.
- Knowledgeable in running Hive queries through Spark SQL integrated with the Spark environment.
- Worked on migrating data from HDFS to Azure HDInsight.
- Developed complex queries and ETL processes in Azure Databricks notebooks using Spark.
- Developed microservice modules to collect application statistics for visualization.
- Containerized the application and deployed it with Docker and Kubernetes.
- Developed complex Hive ETL logic for cleansing and transforming data sourced from relational systems (see the sketch at the end of this section).
- Implemented complex data types in Hive and used columnar formats such as ORC and Parquet.
- Created custom Azure dashboards on top of metrics sent to Application Insights, using its query language.
- Developed JSON definitions for deploying Azure Data Factory (ADF) pipelines that process data using the Cosmos activity.
- Created real-time streaming dashboards in Power BI by pushing datasets from Azure Stream Analytics.
- Developed a custom consumer that reads messages from Kafka producers and pushes them to Azure Service Bus and Event Hubs.
- Wrote auto-scaling Azure Functions that consume data from Azure Service Bus or Event Hubs and write it to DocumentDB.
- Wrote a Spark application that captures the DocumentDB change feed through the Java API and writes updates to a new DocumentDB instance.
- Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
Environment: Hive, Sqoop, LINUX, Cloudera CDH 5, Scala, Kafka, HBase, Avro, Spark, Zookeeper and MySQL.
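A minimal PySpark sketch of the notebook-style cleansing and transformation ETL described above; the input extract, column names, struct layout, and output path are illustrative placeholders.

```python
# Hypothetical sketch: cleanse rows from a relational extract, derive a nested
# (complex-type) column, and write ORC partitioned by load date.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleanse-orders").getOrCreate()

raw = spark.read.parquet("/raw/orders")          # placeholder relational extract

cleansed = (raw
            .dropDuplicates(["order_id"])
            .filter(F.col("order_id").isNotNull())
            .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
            # Complex type: collapse address parts into a single struct column.
            .withColumn("address", F.struct("street", "city", "zip"))
            .withColumn("load_date", F.current_date()))

(cleansed.write
 .mode("overwrite")
 .partitionBy("load_date")
 .orc("/curated/orders"))                        # placeholder curated path
```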
Confidential
Sr. Data Engineer
Responsibilities:
- Built and developed ETL pipelines using Spark-based applications.
- Migrated RDBMS data into data lake applications using Sqoop and Spark.
- Developed Python wrapper scripts that extract specific date ranges with Sqoop by passing the custom properties required by the workflow.
- Automated jobs that pull data from FTP servers and load it into Hive tables using Oozie workflows.
- Built optimized Spark jobs for data cleansing and transformation.
- Developed Spark applications in Scala, tuned to complete within their processing windows.
- Developed a job-monitoring platform in Kibana and Grafana.
- Built real-time log aggregations in Kibana for data analysis.
- Developed NiFi pipelines for extracting data from external sources.
- Implemented NiFi pipelines to export data from HDFS to various endpoints and cloud locations such as AWS S3.
- Built bi-directional ingestion pipelines between Hadoop and AWS S3 (see the sketch at the end of this section).
- Built ingestion pipelines into AWS Redshift and DynamoDB.
- Developed event data pipelines that load data from DynamoDB into an AWS S3 bucket and then into HDFS.
- Worked on ETL migration by developing AWS Lambda functions for a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
- Developed Jenkins pipelines for data pipeline and application deployments.
Environment: Hadoop, Sqoop, Pig, HBase, Hive, Flume, Java 6, Eclipse, Apache Tomcat 7.0, Oracle, J2EE, AWS, NiFi, Kibana, Grafana.
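A minimal sketch of the bi-directional HDFS and S3 movement described above, assuming the cluster's Hadoop S3A connector and credentials are already configured; bucket and path names are placeholders.

```python
# Hypothetical sketch: copy curated HDFS data to S3 as Parquet, and pull
# reference data landed in S3 back into HDFS for Hive consumption.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-s3-sync").getOrCreate()

# Export: curated HDFS data -> S3.
hdfs_df = spark.read.parquet("/curated/events")                       # placeholder
hdfs_df.write.mode("append").parquet("s3a://example-bucket/curated/events")

# Import: S3 reference data -> HDFS.
s3_df = spark.read.parquet("s3a://example-bucket/reference/customers")  # placeholder
s3_df.write.mode("overwrite").parquet("/reference/customers")
```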
Confidential
Data Engineer
Responsibilities:
- Ingested data from various sources into Hive and HDFS using tools such as Sqoop, Kafka, and Flume.
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Built complex Unix and Windows scripts for FTP/SFTP file transfers and email notification tasks.
- Used Elasticsearch to build a data catalog.
- Scheduled Hadoop jobs in Oozie and scripts in Autosys.
- Ingested transactional data into HBase for extensive CRUD operations (see the sketch at the end of this section).
- Transformed large sets of structured, semi-structured, and unstructured data using Hive and Pig.
- Created SSIS packages to load data into SQL Server using various SSIS transformations.
- Developed PL/SQL procedures and used them in Stored Procedure transformations.
- Worked on critical Finance projects and had been the go-to person for any data related issues.
- Migrated ETL code from Talend to Informatica; involved in development, testing, and post-production support for the entire migration project.
- Tuned ETL jobs in the new environment after fully understanding the existing code.
- Maintained Talend admin console and provided quick assistance on production jobs.
- Worked with internal and external business partners during requirements gathering.
- Worked closely with business analysts and report developers to write source-to-target specifications for data warehouse tables based on business requirements.
Environment: Informatica Power Center 9.1/9.0, Talend 4.x & Integration Suite, Business Objects XI, Oracle 10g/11g, Oracle ERP, EDI, SQL Server 2005, UNIX, Windows scripting, JIRA, MapR, Sqoop, Hive, HBase, Kafka.
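A minimal sketch of the HBase CRUD access pattern described above, using the Python happybase client (which talks to HBase through its Thrift gateway) purely for illustration; the actual work used the Hadoop/Java tooling listed in this section, and the host, table, and column names here are placeholders.

```python
# Hypothetical sketch: basic create/read/delete against an HBase table of
# transactional records via the happybase Thrift client.
import happybase

connection = happybase.Connection("hbase-thrift-host")   # placeholder Thrift host
table = connection.table("transactions")                  # placeholder table

row_key = b"txn#2015-03-01#000123"

# Create / update: write one transaction row into the 'd' column family.
table.put(row_key, {
    b"d:account_id": b"ACC-42",
    b"d:amount": b"199.99",
    b"d:status": b"POSTED",
})

# Read: fetch the row back, plus a short prefix scan over the same day.
print(table.row(row_key))
for key, data in table.scan(row_prefix=b"txn#2015-03-01#", limit=10):
    print(key, data)

# Delete: remove the row once it has been archived downstream.
table.delete(row_key)

connection.close()
```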