Sr. Data Engineer Resume
SUMMARY
- Experience working with different big data distributions such as Hortonworks, Cloudera, and MapR.
- Worked with optimized data formats such as Parquet, ORC, and Avro for data storage and retrieval.
- Experience building batch ingestion pipelines using Sqoop, Flume, Spark, Talend, and NiFi.
- Experience designing, developing, and maintaining data lake projects with a variety of big data tool stacks.
- Migrated data from sources such as Oracle, MySQL, and Teradata into Hive, HBase, and HDFS.
- Built optimized Hive and Spark ETL pipelines and data stores in Hive and HBase.
- Designed and developed optimized data pipelines using Spark SQL and Spark with Scala and Python (a representative sketch follows this summary).
- Implemented various optimization techniques in Hive and Spark ETL processes.
- Experienced in building Jupyter notebooks using PySpark for extensive data analysis.
- Working knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience ingesting data from and publishing data to Apache Kafka using Spark Streaming.
- Experience building end-to-end workflow automation using Apache Oozie.
- Implemented scalable microservice applications with Spring Boot to expose REST endpoints.
- Built Talend and NiFi integrations for bi-directional data ingestion across different sources.
- Good exposure to cloud platforms including Amazon Web Services, Azure, and Google Cloud.
- Expert with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
- Developed applications using Azure services such as Cosmos DB, Azure Databricks, Event Hubs, and Azure HDInsight.
- Built historical and incremental reporting dashboards in Power BI and Tableau.
- Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
- Built application metrics and alerting dashboards in Kibana.
- Built Kibana dashboards for real-time log aggregation and analysis.
- Experience with CI/CD using Jenkins and Docker.
- Scaled microservice applications using Kubernetes and Docker.
- Experience with GCP services such as BigQuery, Pub/Sub, Dataproc, and Datalab.
- Experience working in both Agile and Waterfall environments.
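A minimal PySpark sketch of the batch ingestion pattern summarized above: pull a table from a relational source over JDBC and land it in the data lake as partitioned Parquet. The connection URL, credentials, table, and output path are illustrative placeholders, not details from any specific engagement.

```python
# Hypothetical sketch: JDBC table -> partitioned Parquet in the data lake.
# Assumes the appropriate JDBC driver (here MySQL) is on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("jdbc-to-datalake").getOrCreate()

orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")   # placeholder source
          .option("dbtable", "orders")                        # placeholder table
          .option("user", "etl_user")
          .option("password", "***")
          .load())

(orders
 .withColumn("load_date", F.current_date())   # partition column for incremental loads
 .write
 .mode("append")
 .partitionBy("load_date")
 .parquet("/datalake/sales/orders"))           # placeholder data lake path
```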
PROFESSIONAL EXPERIENCE
Confidential
Sr. Data Engineer
Responsibilities:
- Developed code for importing and exporting data between different workstreams and HDFS using Sqoop.
- Partitioned and bucketed Hive tables by state to enable bucket-based Hive joins in downstream processing.
- Created HBase-backed tables from Hive and wrote HiveQL statements to access HBase table data.
- Developed Java UDFs as needed for use in Hive queries.
- Maintained a Spark ETL framework for writing data from HDFS to Hive.
- Built Spark streaming jobs in Scala and Python for real-time processing of JSON data (see the sketch at the end of this section).
- Worked with Kafka as the messaging layer for both batch and real-time streaming applications.
- Automated Oozie workflows that load data into HDFS through Sqoop and preprocess it with Hive and Spark (Scala and Python).
- Developed various scripts in shell and Python.
- Built Jupyter notebooks using PySpark for extensive data analysis and exploration.
- Implemented code coverage and Sonar integration to improve code testability.
- Pushed application and data-stream logs to Kibana for monitoring and alerting.
- Worked on migrating data from HDFS to Azure HDInsight and Azure Databricks.
- Implemented multiple microservice modules to expose data through RESTful APIs.
- Developed Jenkins pipelines for continuous integration and deployment.
Environment: Hadoop 2.0, Cloudera, HDFS, MapReduce, Hive, Impala, HBase, Sqoop, Kafka, Spark, Linux, MySQL, Azure
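A minimal sketch of the Kafka-to-HDFS streaming path described above, assuming Spark Structured Streaming as the engine; the broker address, topic, schema fields, and output paths are illustrative placeholders.

```python
# Hypothetical sketch: consume JSON events from Kafka and land them in HDFS
# as partitioned Parquet. Requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-json-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("state", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Append the parsed stream to HDFS as Parquet, partitioned by state.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/events")                    # placeholder HDFS path
         .option("checkpointLocation", "/checkpoints/events")
         .partitionBy("state")
         .outputMode("append")
         .start())
```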
Confidential
Sr. Data Engineer
Responsibilities:
- Developed and tuned a Spark Streaming application in Scala for processing data from Kafka.
- Imported batch data from MySQL and other sources into HDFS on a regular basis using Sqoop.
- Extracted data from various APIs and performed data cleansing and processing using Java and Scala.
- Knowledgeable in running Hive queries through Spark SQL integrated with the Spark environment.
- Worked on migrating data from HDFS to Azure HDInsight.
- Developed complex queries and ETL processes in Azure Databricks notebooks using Spark.
- Developed microservice modules to collect application statistics for visualization.
- Containerized the application and deployed it with Docker and Kubernetes.
- Developed complex Hive ETL logic for cleansing and transforming data sourced from relational systems (see the sketch at the end of this section).
- Implemented complex data types in Hive and used columnar formats such as ORC and Parquet.
- Created custom Azure dashboards on top of metrics sent to Application Insights, using its query language.
- Developed JSON definitions for deploying Azure Data Factory (ADF) pipelines that process data using the Cosmos activity.
- Created real-time streaming dashboards in Power BI by pushing datasets from Azure Stream Analytics.
- Developed a custom consumer that reads messages from Kafka producers and pushes them to Azure Service Bus and Event Hubs.
- Wrote auto-scaling Azure Functions that consume data from Azure Service Bus or Event Hubs and write it to DocumentDB.
- Wrote a Spark application that captures the DocumentDB change feed through the Java API and writes updates to a new DocumentDB instance.
- Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
Environment: Hive, Sqoop, LINUX, Cloudera CDH 5, Scala, Kafka, HBase, Avro, Spark, Zookeeper and MySQL.
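A minimal PySpark sketch of the notebook-style cleansing and transformation ETL described above; the input extract, column names, struct layout, and output path are illustrative placeholders.

```python
# Hypothetical sketch: cleanse rows from a relational extract, derive a nested
# (complex-type) column, and write ORC partitioned by load date.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleanse-orders").getOrCreate()

raw = spark.read.parquet("/raw/orders")          # placeholder relational extract

cleansed = (raw
            .dropDuplicates(["order_id"])
            .filter(F.col("order_id").isNotNull())
            .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
            # Complex type: collapse address parts into a single struct column.
            .withColumn("address", F.struct("street", "city", "zip"))
            .withColumn("load_date", F.current_date()))

(cleansed.write
 .mode("overwrite")
 .partitionBy("load_date")
 .orc("/curated/orders"))                        # placeholder curated path
```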
Confidential
Sr. Data Engineer
Responsibilities:
- Built and developed ETL pipelines using Spark-based applications.
- Migrated RDBMS data into data lake applications using Sqoop and Spark.
- Developed Python wrapper scripts that extract specific date ranges with Sqoop by passing the custom properties required by the workflow.
- Automated jobs that pull data from FTP servers and load it into Hive tables using Oozie workflows.
- Built optimized Spark jobs for data cleansing and transformation.
- Developed Spark applications in Scala, tuned to complete within their processing windows.
- Developed a job-monitoring platform in Kibana and Grafana.
- Built real-time log aggregations in Kibana for data analysis.
- Developed NiFi pipelines for extracting data from external sources.
- Implemented NiFi pipelines to export data from HDFS to various endpoints and cloud locations such as AWS S3.
- Built bi-directional ingestion pipelines between Hadoop and AWS S3 (see the sketch at the end of this section).
- Built ingestion pipelines into AWS Redshift and DynamoDB.
- Developed event data pipelines that load data from DynamoDB into an AWS S3 bucket and then into HDFS.
- Worked on ETL migration by developing AWS Lambda functions for a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
- Developed Jenkins pipelines for data pipeline and application deployments.
Environment: Hadoop, Sqoop, Pig, HBase, Hive, Flume, Java 6, Eclipse, Apache Tomcat 7.0, Oracle, J2EE, AWS, NiFi, Kibana, Grafana.
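A minimal sketch of the bi-directional HDFS and S3 movement described above, assuming the cluster's Hadoop S3A connector and credentials are already configured; bucket and path names are placeholders.

```python
# Hypothetical sketch: copy curated HDFS data to S3 as Parquet, and pull
# reference data landed in S3 back into HDFS for Hive consumption.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-s3-sync").getOrCreate()

# Export: curated HDFS data -> S3.
hdfs_df = spark.read.parquet("/curated/events")                       # placeholder
hdfs_df.write.mode("append").parquet("s3a://example-bucket/curated/events")

# Import: S3 reference data -> HDFS.
s3_df = spark.read.parquet("s3a://example-bucket/reference/customers")  # placeholder
s3_df.write.mode("overwrite").parquet("/reference/customers")
```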
Confidential
Data Engineer
Responsibilities:
- Ingested data from various sources into Hive and HDFS using tools such as Sqoop, Kafka, and Flume.
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Built complex Unix and Windows scripts for FTP/SFTP file transfers and email notification tasks.
- Used Elasticsearch to build a data catalog.
- Scheduled Hadoop jobs in Oozie and scripts in Autosys.
- Ingested transactional data into HBase for extensive CRUD operations (see the sketch at the end of this section).
- Transformed large sets of structured, semi-structured, and unstructured data using Hive and Pig.
- Created SSIS packages to load data into SQL Server using various SSIS transformations.
- Developed PL/SQL procedures and used them in Stored Procedure transformations.
- Worked on critical Finance projects and had been the go-to person for any data related issues.
- Migrated ETL code from Talend to Informatica; involved in development, testing, and post-production support for the entire migration project.
- Tuned ETL jobs in the new environment after fully understanding the existing code.
- Maintained Talend admin console and provided quick assistance on production jobs.
- Worked with internal and external business partners during requirements gathering.
- Worked closely with business analysts and report developers to write source-to-target specifications for data warehouse tables based on business requirements.
Environment: Informatica Power Center 9.1/9.0, Talend 4.x & Integration Suite, Business Objects XI, Oracle 10g/11g, Oracle ERP, EDI, SQL Server 2005, UNIX, Windows scripting, JIRA, MapR, Sqoop, Hive, HBase, Kafka.
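A minimal sketch of the HBase CRUD access pattern described above, using the Python happybase client (which talks to HBase through its Thrift gateway) purely for illustration; the actual work used the Hadoop/Java tooling listed in this section, and the host, table, and column names here are placeholders.

```python
# Hypothetical sketch: basic create/read/delete against an HBase table of
# transactional records via the happybase Thrift client.
import happybase

connection = happybase.Connection("hbase-thrift-host")   # placeholder Thrift host
table = connection.table("transactions")                  # placeholder table

row_key = b"txn#2015-03-01#000123"

# Create / update: write one transaction row into the 'd' column family.
table.put(row_key, {
    b"d:account_id": b"ACC-42",
    b"d:amount": b"199.99",
    b"d:status": b"POSTED",
})

# Read: fetch the row back, plus a short prefix scan over the same day.
print(table.row(row_key))
for key, data in table.scan(row_prefix=b"txn#2015-03-01#", limit=10):
    print(key, data)

# Delete: remove the row once it has been archived downstream.
table.delete(row_key)

connection.close()
```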