
Sr. Data Engineer Resume


Piscataway, NJ

SUMMARY

  • Around 8 years of experience in various IT technologies, including 6 years of hands-on experience in Big Data technologies.
  • Extensive implementation and working experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper and HBase.
  • Around 4 years of experience in cloud ecosystems such as AWS, GCP and Azure.
  • Experienced in the software development life cycle (SDLC) under Agile, DevOps and Scrum methodologies, including creating requirements and test plans.
  • Experienced in data transfer between HDFS and RDBMS using tools like Sqoop, Talend and Spark.
  • Experience in building optimized Hive scripts for ETL cleansing and transformations.
  • Expert in building custom functionality using UDFs in Hive and Pig.
  • Implemented various optimization techniques in ETL applications using Spark and Hive.
  • Implemented optimized queries for BI applications using engines like Impala and Tez.
  • Built end-to-end job automation using Oozie bundles, coordinators and workflows.
  • Experience building distributed high-performance systems using Spark, Scala and Python.
  • Experience working with optimized data formats like ORC, Parquet and Avro.
  • Experience in building Scala applications for loading data into NoSQL databases (MongoDB).
  • Experience in creating data lakes using Spark that feed downstream applications (a PySpark sketch follows this summary).
  • Designed and implemented a product search service using Apache Solr.
  • Worked on building application models using machine learning libraries in Spark MLlib.
  • Experience in building applications with backend NoSQL stores like MongoDB and Cassandra.
  • Experience in building data lakes using optimized techniques in HBase.
  • Worked with AWS tools like EMR and EC2, which provide fast and efficient processing.
  • Experience using AWS services like Lambda, Glue, DynamoDB, Redshift and EMR.
  • Expert in building applications using Java, Scala, SQL, PL/SQL and RESTful services.
  • Built real-time applications using Apache Kafka, microservices and Spark Streaming.
  • Experience automating end-to-end jobs with the Airflow scheduler.
  • Built code coverage dashboards using Sonar integrations for code testability.
  • Developed alerting and logging reports using tools like Grafana and Kibana.
  • Working knowledge of building and using various components in Talend and NiFi.
  • Experience building Jupyter notebooks with various PySpark operations.
  • Expert in writing scripts in shell and Python.
  • Worked with different Hadoop distributions such as Hortonworks, Cloudera and MapR.
  • Experience working in various cloud environments like AWS, Azure and GCP.
  • Worked on migration projects from on-premises clusters to Azure HDInsight and Azure Databricks.
  • Worked on Google Cloud services like BigQuery, Pub/Sub, Cloud SQL, Dataproc and GCS.
  • Experience building containerized deployments and scripts with Docker and Kubernetes.
  • Experience building BI reports using Tableau, QlikView and Apache Superset.
  • Worked on supporting projects using technologies like Teradata and Informatica.
  • Experience building deployment pipelines using Jenkins.
  • Worked in an Agile environment and used Rally and Jira to maintain user stories and tasks.
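
As a concrete illustration of the Spark and Hive ETL work summarized above, the following is a minimal PySpark sketch of a Hive-to-data-lake cleansing job. The table and path names (sales_db.raw_orders, /data/lake/orders) are hypothetical placeholders, not actual project artifacts.

    # Minimal sketch: cleanse a Hive table with PySpark and land it in a
    # partitioned Parquet data lake. Table and path names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date, trim

    spark = (SparkSession.builder
             .appName("hive-to-datalake")
             .enableHiveSupport()        # allows reading Hive tables directly
             .getOrCreate())

    raw = spark.table("sales_db.raw_orders")

    cleansed = (raw
                .filter(col("order_id").isNotNull())          # drop bad records
                .withColumn("customer", trim(col("customer")))
                .withColumn("order_date", to_date(col("order_ts"))))

    (cleansed.write
             .mode("overwrite")
             .partitionBy("order_date")   # partition layout for downstream reads
             .parquet("/data/lake/orders"))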

PROFESSIONAL EXPERIENCE

Confidential, Piscataway, NJ

Sr. Data Engineer

Responsibilities:

  • Built data applications using GCP services like BigQuery, Cloud Dataflow, Dataproc and GCS (a load job in this style is sketched after this list).
  • Provisioned clusters on demand using Dataproc for Spark applications.
  • Developed various modules in Spring applications for building REST APIs.
  • Integrated StreamSets applications with the GCP cluster for data migration from on-premises servers.
  • Implemented streaming applications using Pub/Sub for real-time data.
  • Used Dataproc and PySpark to leverage GCP's distributed framework to train and forecast models in a distributed environment.
  • Architected, designed and operationalized large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
  • Automated various applications using Airflow for orchestration.
  • Developed Kubernetes manifests for deploying and automating application runs.
  • Implemented various ETL Spark applications using Python and Scala.
  • Developed various ETL scripts in BigQuery for transformations.
  • Experience with streaming and batch processing frameworks like Apache Beam and Spark.
  • Implemented various Cloud Functions to enable automated data loads from multiple vendors.
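
A minimal sketch of the kind of Dataproc PySpark job described above: read raw CSV from GCS, aggregate it, and load the result into BigQuery through the spark-bigquery connector (assumed to be available on the cluster). The bucket, dataset and table names are hypothetical.

    # Sketch of a GCS-to-BigQuery load on Dataproc; names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, sum as sum_

    spark = SparkSession.builder.appName("gcs-to-bigquery").getOrCreate()

    orders = (spark.read
              .option("header", "true")
              .csv("gs://example-raw-bucket/orders/*.csv"))

    daily = (orders
             .withColumn("amount", col("amount").cast("double"))
             .groupBy("order_date")
             .agg(sum_("amount").alias("total_amount")))

    (daily.write
          .format("bigquery")
          .option("table", "analytics_ds.daily_order_totals")
          .option("temporaryGcsBucket", "example-temp-bucket")  # staging bucket for the load
          .mode("overwrite")
          .save())

A job like this would typically be submitted with gcloud dataproc jobs submit pyspark, with the connector jar supplied on the cluster or via --jars.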

Confidential, Manhattan, NY

Sr. Data Engineer

Responsibilities:

  • Experience in implementing data migration and data processing using Azure services like Azure Data Factory, Azure SQL Database, Event Hubs, Azure Stream Analytics, HDInsight, Azure Databricks and Cosmos DB.
  • Built data pipelines that extract, classify, merge and deliver new insights on the data.
  • Developed Python and shell scripts to automate and schedule the workflows to run on Azure.
  • Built modern data solutions using Azure PaaS services to support data visualization.
  • Advanced knowledge of performance troubleshooting and tuning for Azure services like HDInsight clusters, ADF and Databricks.
  • Designed and implemented ADF data pipelines from on-premises sources into Azure SQL Data Warehouse.
  • Used Apache NiFi for loading data from RDBMS to HDFS and HBase.
  • Developed Spark jobs using PySpark and Spark-SQL for data extraction, transformation & aggregation from multiple file formats.
  • Implemented various data modeling techniques for Cosmos DB.
  • Designed and implemented database solutions in Azure SQL Data Warehouse.
  • Experience in building Spark data pipelines and notebooks in Azure Databricks (see the sketch after this list).
  • Built scalable applications to pull data from Azure Event Hubs into various Azure services.
  • Developed various PySpark applications for ETL operations.
  • Consumed data from various REST endpoints and loaded it into Cosmos DB.
  • Worked extensively on data science and data engineering pipelines with Azure Databricks as the environment.
  • Analyzed mechanisms to pull data from AWS into ADLS and Blob Storage.
  • Worked on ADF to build the layers that hand functionality off to Databricks.
  • Led the migration of data pipelines from Azure HDInsight to Azure Databricks.
  • Designed and developed data pipelines for IoT sources using Azure Databricks.
  • Experience in building Talend pipelines using various built-in and custom components.
  • Automated end-to-end job actions in the Airflow scheduler for event-driven jobs.
  • Built custom Power BI dashboards for reporting on daily incremental data loads.
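
A minimal Azure Databricks sketch of the notebook-style pipelines described above: read incremental raw data from ADLS Gen2, aggregate it with PySpark, and write a Delta output for reporting. The storage account, container and path names are hypothetical, and the snippet assumes a Databricks runtime where the spark session and Delta Lake are available and storage access is already configured.

    # Placeholder paths; `spark` is provided by the Databricks notebook.
    from pyspark.sql.functions import col, to_date

    raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/events/"
    curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/events_daily/"

    events = spark.read.json(raw_path)

    daily = (events
             .withColumn("event_date", to_date(col("event_ts")))
             .filter(col("event_type").isNotNull())
             .groupBy("event_date", "event_type")
             .count())

    (daily.write
          .format("delta")
          .mode("overwrite")
          .partitionBy("event_date")
          .save(curated_path))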

Confidential, Atlanta, Georgia

Data Engineer

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that followed the project guidelines required for development.
  • Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
  • Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (a streaming read in this style is sketched at the end of this section).
  • Performed data validation using the AWS Glue service; created Glue jobs and scheduled them to run at set times.
  • Implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Worked on Hive to implement web interfacing and stored the data in Hive tables.
  • Migrated MapReduce programs to Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL and Spark on YARN.
  • Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster data processing.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Developed traits, case classes and related constructs in Scala.
  • Developed Spark scripts using Scala shell commands as per the business requirement.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) to store the data in S3.
  • Used Amazon EMR to process applications on Amazon Elastic Compute Cloud (EC2) virtual servers with Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for setting backup storage.
  • Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint.

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
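
An illustrative sketch of the Kafka-to-Spark ingestion described in this section, written with the Spark Structured Streaming Kafka source (the newer equivalent of the DStream direct approach referenced above). Broker addresses, topic names, schema fields and output paths are hypothetical.

    # Read weblog events from Kafka, filter them, and persist to Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("weblog-streaming").getOrCreate()

    schema = StructType([
        StructField("url", StringType()),
        StructField("status", StringType()),
        StructField("user_id", StringType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "weblogs")
           .option("startingOffsets", "latest")
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
                 .select(from_json(col("json"), schema).alias("e"))
                 .select("e.*")
                 .filter(col("status") == "200"))       # keep only successful requests

    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/weblogs/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/weblogs/")
             .start())
    query.awaitTermination()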

Confidential, New York, NY

Data Engineer

Responsibilities:

  • Experience in developing Hive ETL scripts for data transformation.
  • Experience in converting Hive scripts to PySpark applications for faster ETL operations (see the sketch at the end of this section).
  • Used Sqoop to import data from Relational Databases like MySQL, Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Created Hive views on top of flattened files and loaded them in parallel on the cluster by setting the configurations in a single configuration file.
  • Developed Java MapReduce programs to analyze sample log files stored in the cluster.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked on optimizing and upgrading Spark applications to improve performance.
  • Automated various kinds of Hadoop jobs using Apache Oozie.
  • Developed Jenkins pipeline for continuous integration and deployment for production jobs.
  • Exposure towards streaming technologies like Apache Kafka and Spark Streaming.
  • Developed Shell scripts for validating data and sending production status to the team.
  • Converted SAS datasets into CSV files using PySpark.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Migrated ETL processes from RDBMS to Hive to enable easier data manipulation.
  • Developed Hive queries to process the data for visualizing.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Worked on building integration pipelines using NiFi for various ingestion workflows.

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
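
A minimal sketch of converting a Hive ETL statement to a PySpark application, as described in this section; the database, table and column names are hypothetical.

    # Hive-style original, for comparison:
    #   INSERT OVERWRITE TABLE curated.page_views_daily
    #   SELECT page, to_date(view_ts) AS view_date, COUNT(*) AS views
    #   FROM raw.page_views
    #   WHERE page IS NOT NULL
    #   GROUP BY page, to_date(view_ts);
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date, count

    spark = (SparkSession.builder
             .appName("hive-to-pyspark")
             .enableHiveSupport()
             .getOrCreate())

    views = (spark.table("raw.page_views")
             .filter(col("page").isNotNull())
             .withColumn("view_date", to_date(col("view_ts")))
             .groupBy("page", "view_date")
             .agg(count("*").alias("views")))

    (views.write
          .mode("overwrite")
          .saveAsTable("curated.page_views_daily"))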

Confidential

Associate

Responsibilities:

  • Responsible for the analysis, documenting the requirements and architecting the application based on J2EE standards.
  • Attended Scrum meetings daily as a part of Agile Methodology.
  • Involved in complete Software Development Life Cycle (SDLC) with Object Oriented Approach of client's business process and continuous client feedback.
  • Implemented MVC architecture using the Spring Framework and customized user interfaces. Used Core Java and Spring Aspect-Oriented Programming concepts for logging, security and error-handling mechanisms.
  • Developed application modules using Spring MVC, Spring Annotations, Spring Beans, Dependency Injection, with database interface using Hibernate.
  • Used the Java Collections API extensively in the application and applied security protections for XML, SOAP, REST and JSON to make the web deployment secure.
  • Developed server-side services using Java, Spring and web services (SOAP, REST, WSDL, JAXB, JAX-RPC).
  • Built more interactive web pages using jQuery plugins for drag-and-drop and autocomplete, along with AJAX, JSON, AngularJS, JavaScript and Bootstrap.
  • Used XSL to transform XML data structure into HTML pages.
  • Used Struts as the framework for this project and developed Struts action classes and form beans.
  • Created DispatchAction classes and validation plug-ins using the Struts framework.
  • Used DB2 as the database and wrote queries to extract data from it.
  • Developed SQL queries and stored procedures.
  • Designed and developed white-box test cases using JUnit, JMeter and the Mockito framework, with Git for version control.

Environment: Core Java, Agile, Scrum, XML, HTML, JMeter, SOAP, REST, JDK, JSP, Servlets, JDBC, CSS, JUnit, SQL, MySQL, Windows, Oracle, Eclipse
