Sr. Data Engineer Resume
Piscataway, NJ
SUMMARY
- Around 8 years of experience in various IT technologies, including 6 years of hands-on experience in Big Data technologies.
- Extensive implementation and working experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper and HBase.
- Around 4 years of experience in various Cloud ecosystems like AWS, GCP and Azure.
- Experienced in the full software development life cycle (SDLC) with Agile, DevOps and Scrum methodologies, including creating requirements and test plans.
- Experienced in data transfer between HDFS and RDBMS using tools like Sqoop, Talend and Spark.
- Experience in building optimized Hive scripts for ETL cleansing and transformations.
- Expert in building custom functionality using UDFs in Hive and Pig.
- Implemented various optimization techniques in ETL applications using Spark and Hive (a minimal PySpark ETL sketch follows this summary).
- Implemented optimized queries for BI applications using engines like Impala and Tez.
- Built end-to-end job automation using Oozie bundles, coordinators and workflows.
- Experience building distributed high-performance systems using Spark, Scala and Python.
- Experience in working with optimized data formats like ORC, Parquet and Avro.
- Experience in building Scala applications for loading data into NoSQL databases (MongoDB).
- Experience in creating data lakes using Spark for consumption by downstream applications.
- Designed and implemented a product search service using Apache Solr.
- Worked on building application models using machine learning libraries in Spark MLlib.
- Experience in building applications with backend NoSQL stores like MongoDB and Cassandra.
- Experience in building data lakes using optimized techniques in HBase.
- Worked with AWS services like EMR and EC2 for fast and efficient processing.
- Experience in using AWS services like Lambda, Glue, DynamoDB, Redshift and EMR.
- Expert in building applications using Java, Scala, SQL, PL/SQL and RESTful services.
- Built real-time applications using Apache Kafka, microservices and Spark Streaming.
- Experience in automating end-to-end jobs with the Airflow scheduler.
- Built code coverage dashboards using tools like Sonar for code testability.
- Developed alerting and logging reports using tools like Grafana and Kibana.
- Working knowledge of building and using various components in Talend and NiFi.
- Experience in building Jupyter notebooks for various Spark operations in Python.
- Expert in writing scripts in shell and Python.
- Worked with different Hadoop distributions like Hortonworks, Cloudera and MapR.
- Experience working in various cloud environments like AWS, Azure and GCP.
- Worked on migration projects from on-prem clusters to Azure HDInsight and Azure Databricks.
- Worked on Google Cloud tools like BigQuery, Pub/Sub, Cloud SQL, Dataproc and GCS.
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Experience in building BI reports using Tableau, QlikView and Apache Superset.
- Worked on supporting projects using technologies like Teradata and Informatica.
- Experience in building deployment pipelines using Jenkins.
- Worked in an Agile environment and used Rally and Jira to maintain user stories and tasks.
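Illustrative sketch (not from any specific project): a minimal PySpark ETL job of the kind described above, reading raw CSV from HDFS, applying basic cleansing, and writing a partitioned Parquet table for downstream Hive/Spark use. Paths, column names and the target table are hypothetical placeholders.

```python
# Minimal PySpark ETL sketch: read raw CSV from HDFS, cleanse, write partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sample-etl-cleansing")
         .enableHiveSupport()
         .getOrCreate())

raw = (spark.read
       .option("header", "true")
       .csv("hdfs:///data/raw/orders/"))          # hypothetical source path

cleansed = (raw
            .dropDuplicates(["order_id"])                        # assumed key column
            .filter(F.col("order_id").isNotNull())
            .withColumn("order_date", F.to_date("order_date")))  # normalize types

(cleansed.write
 .mode("overwrite")
 .partitionBy("order_date")                       # partitioned, columnar output
 .format("parquet")
 .saveAsTable("analytics.orders_cleansed"))       # hypothetical Hive table

spark.stop()
```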
PROFESSIONAL EXPERIENCE
Confidential, Piscataway, NJ
Sr. Data Engineer
Responsibilities:
- Experience in building data applications using GCP services like BigQuery, Cloud Dataflow, Dataproc and GCS.
- Provisioned various servers on an on-demand basis using Dataproc for Spark applications.
- Developed various modules in Spring applications for building REST APIs.
- Integrated StreamSets applications with the GCP cluster for data migration from on-prem servers.
- Implemented data streaming applications using Pub/Sub services for real-time data.
- Used Dataproc and PySpark to leverage GCP's distributed framework to train and forecast models in a distributed environment (a PySpark-on-Dataproc sketch follows this list).
- Experience in architecting, designing and operationalizing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
- Automated various applications using Airflow for orchestration.
- Developed Kubernetes scripts for deploying applications and automating their runs.
- Implemented various Spark ETL applications using Python and Scala.
- Developed ETL scripts in BigQuery for various transformations.
- Experience with streaming and batch processing tools like Apache Beam and Spark.
- Implemented various Cloud Functions to enable automated data loads from various vendors.
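Illustrative sketch: a minimal PySpark job of the kind run on Dataproc, reading JSON from GCS and writing aggregated results to BigQuery via the spark-bigquery connector (pre-installed on recent Dataproc images). The bucket, dataset, table and column names are hypothetical placeholders.

```python
# PySpark on Dataproc sketch: GCS JSON -> aggregate -> BigQuery table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gcs-to-bigquery-etl").getOrCreate()

events = spark.read.json("gs://example-landing-bucket/events/*.json")  # hypothetical bucket

daily_counts = (events
                .withColumn("event_date", F.to_date("event_ts"))       # assumed timestamp column
                .groupBy("event_date", "event_type")
                .count())

(daily_counts.write
 .format("bigquery")
 .option("table", "example_dataset.daily_event_counts")   # hypothetical BigQuery table
 .option("temporaryGcsBucket", "example-temp-bucket")      # staging bucket for the connector
 .mode("overwrite")
 .save())
```

Such a job would typically be submitted with `gcloud dataproc jobs submit pyspark etl_job.py --cluster=<cluster> --region=<region>`.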
Confidential, Manhattan, NY
Sr. Data Engineer
Responsibilities:
- Experience in implementing data migration and data processing using Azure services like Azure Data Factory, Azure SQL DB, Event Hubs, Azure Stream Analytics, HDInsight, Azure Databricks and Cosmos DB.
- Built data pipelines that extract, classify, merge and deliver new insights on the data.
- Developed Python and shell scripts to automate and schedule the workflows to run on Azure.
- Built modern data solutions using Azure PaaS services to support visualization of data.
- Advanced knowledge in performance troubleshooting and tuning of Azure services like HDInsight clusters, ADF and Databricks.
- Designed and implemented data pipelines using ADF to move data from on-premises sources into Azure SQL DW.
- Used Apache NiFi for loading data from RDBMS to HDFS and HBase.
- Developed Spark jobs using PySpark and Spark-SQL for data extraction, transformation & aggregation from multiple file formats.
- Implemented various data modeling techniques for Cosmos DB.
- Designed and implemented database solutions in Azure SQL Data Warehouse.
- Experience in building Spark data pipelines and notebooks in Azure Databricks (illustrated in the sketch after this list).
- Built scalable applications to pull data from Azure Event Hubs into various Azure services.
- Developed various PySpark applications for ETL processing.
- Consumed data from various REST endpoints and loaded it into Cosmos DB.
- Extensively worked on data science and data engineering pipelines with Azure Databricks as the environment.
- Analyzed AWS mechanisms to pull data into ADLS and Blob Storage services.
- Worked on ADF to build the layers that pass functionality through to Databricks.
- Led the migration of data pipelines from Azure HDInsight to Azure Databricks.
- Designed and developed data pipelines for IoT sources using Azure Databricks.
- Experience in building Talend pipelines using various built-in and custom components.
- Automated end-to-end job actions in the Airflow scheduler for event-driven jobs.
- Built custom Power BI dashboards for reporting on daily incremental data applications.
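Illustrative sketch: a minimal Azure Databricks PySpark step that reads raw JSON landed in ADLS Gen2 and persists it as a partitioned Delta table for downstream notebooks and Power BI reporting. The storage account, container, path and table names are hypothetical placeholders, and the cluster is assumed to already have access to the storage account configured.

```python
# Databricks notebook sketch: ADLS Gen2 JSON -> partitioned Delta table.
from pyspark.sql import functions as F

raw_path = "abfss://landing@exampleadls.dfs.core.windows.net/iot/telemetry/"  # hypothetical path

telemetry = (spark.read                       # `spark` is provided by the Databricks runtime
             .json(raw_path)
             .withColumn("ingest_date", F.current_date()))

(telemetry.write
 .format("delta")
 .mode("append")
 .partitionBy("ingest_date")
 .saveAsTable("bronze.iot_telemetry"))        # hypothetical Delta table in the metastore
```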
Confidential, Atlanta, Georgia
Data Engineer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after the environment line below).
- Performed data validation using the AWS Glue service; created and added jobs to run data checks at scheduled times.
- Implemented real-time streaming ETL pipeline using Kafka Streams API.
- Worked on Hive to implement web interfacing and stored the data in Hive tables.
- Migrated MapReduce programs to Spark transformations using Spark and Scala.
- Experienced with SparkContext, Spark SQL and Spark on YARN.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Developed traits, case classes and related constructs in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirement.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) to store data in S3.
- Used Amazon EMR to process applications on virtual servers in Amazon Elastic Compute Cloud (EC2) with storage in Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity for setting backup storage.
- Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
- Coordinated with the Scrum team to deliver agreed user stories on time every sprint.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
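Illustrative sketch of the Kafka-to-Spark ingestion pattern above, written with PySpark Structured Streaming for consistency with the other sketches (the project itself used Scala and the Kafka direct DStream API). It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic, schema and sink/checkpoint paths are hypothetical placeholders.

```python
# Structured Streaming sketch: consume weblog events from Kafka, parse JSON, land as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("weblog-streaming").getOrCreate()

weblog_schema = StructType([                      # assumed weblog layout
    StructField("url", StringType()),
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
])

raw_stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
              .option("subscribe", "weblogs")                     # hypothetical topic
              .load())

parsed = (raw_stream
          .select(F.from_json(F.col("value").cast("string"), weblog_schema).alias("log"))
          .select("log.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/weblogs/")                    # hypothetical sink
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/weblogs/")
         .start())

query.awaitTermination()
```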
Confidential, New York, NY
Data Engineer
Responsibilities:
- Experience in developing Hive ETL scripts for data transformation.
- Experience in converting Hive scripts to PySpark applications for faster ETL operations (a conversion sketch follows the environment line below).
- Used Sqoop to import data from Relational Databases like MySQL, Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Created Hive views on top of flattened files and loaded them in parallel on the cluster by setting the configurations in a single configuration file.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on optimizing and upgrading Spark applications to improve performance.
- Automated various Hadoop jobs using Apache Oozie.
- Developed Jenkins pipeline for continuous integration and deployment for production jobs.
- Exposure to streaming technologies like Apache Kafka and Spark Streaming.
- Developed Shell scripts for validating data and sending production status to the team.
- Converted SAS datasets into CSV files using PySpark.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive for easier data manipulation.
- Developed Hive queries to process the data for visualizing.
- Configured workflows that involve Hadoop actions using Oozie.
- Worked on building integration pipelines using NiFi for various ingestion workflows.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
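Illustrative sketch of the Hive-to-PySpark conversion mentioned above: a simple HiveQL aggregation expressed once as Spark SQL against the metastore and once with the DataFrame API. The database, table and column names are hypothetical placeholders.

```python
# Hive-to-PySpark conversion sketch: same aggregation via Spark SQL and the DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-pyspark-etl")
         .enableHiveSupport()
         .getOrCreate())

# Spark SQL form, close to the original HiveQL script.
daily_sales_sql = spark.sql("""
    SELECT sale_date, store_id, SUM(amount) AS total_amount
    FROM sales_db.transactions
    GROUP BY sale_date, store_id
""")

# Equivalent DataFrame API form.
daily_sales_df = (spark.table("sales_db.transactions")
                  .groupBy("sale_date", "store_id")
                  .agg(F.sum("amount").alias("total_amount")))

daily_sales_df.write.mode("overwrite").saveAsTable("sales_db.daily_sales")  # hypothetical target

spark.stop()
```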
Confidential
Associate
Responsibilities:
- Responsible for the analysis, documenting the requirements and architecting the application based on J2EE standards.
- Attended Scrum meetings daily as a part of Agile Methodology.
- Involved in complete Software Development Life Cycle (SDLC) with Object Oriented Approach of client's business process and continuous client feedback.
- Implemented MVC architecture using the Spring Framework and customized user interfaces. Used Core Java and Spring Aspect-Oriented Programming concepts for logging, security and error-handling mechanisms.
- Developed application modules using Spring MVC, Spring Annotations, Spring Beans, Dependency Injection, with database interface using Hibernate.
- Used the Java Collections API extensively in the application and applied security protections for XML, SOAP, REST and JSON to make a secure web deployment.
- Developed server-side services using Java, Spring and Web Services (SOAP, RESTful, WSDL, JAXB, JAX-RPC).
- Built more user-interactive web pages using jQuery plugins for drag-and-drop and autocomplete, along with AJAX, JSON, AngularJS, JavaScript and Bootstrap.
- Used XSL to transform XML data structures into HTML pages.
- Used Struts as the framework for this project and developed Struts action classes and form beans.
- Created DispatchAction classes and validation plug-ins using the Struts framework.
- Used DB2 as the database and wrote queries to extract data from it.
- Developed SQL queries and stored procedures.
- Designed and developed white-box test cases using JUnit, JMeter and the Mockito framework, with Git for version control.
Environment: Core Java, Agile, Scrum, XML, HTML, CSS, JMeter, SOAP, REST, JDK, JSP, Servlets, JDBC, JUnit, SQL, MySQL, Windows, Oracle, Eclipse