Sr. Data Engineer Resume
Piscataway, NJ
SUMMARY
- Around 8 years of experience in various IT technologies, including 6 years of hands-on experience in Big Data technologies.
- Extensive implementation and working experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper and HBase.
- Around 4 years of experience in various Cloud ecosystems like AWS, GCP and Azure.
- Experienced in the full software development life cycle (SDLC) with Agile, DevOps and Scrum methodologies, including creating requirements and test plans.
- Experienced in data transfer between HDFS and RDBMS using tools like Sqoop, Talend and Spark.
- Experience in building optimized Hive scripts for ETL cleansing and transformations.
- Expert in building custom functionality using UDFs in Hive and Pig.
- Implemented various optimization techniques in ETL applications using Spark and Hive (a minimal PySpark ETL sketch follows this summary).
- Implemented optimized queries for BI applications using engines like Impala and Tez.
- Built end-to-end job automation using Oozie bundles, coordinators and workflows.
- Experience building distributed high-performance systems using Spark, Scala and Python.
- Experience in working with optimized data formats like ORC, Parquet and Avro.
- Experience in building Scala applications for loading data into NoSQL databases (MongoDB).
- Experience in creating data lakes using Spark for consumption by downstream applications.
- Designed and implemented a product search service using Apache Solr.
- Worked on building application models using machine learning libraries in Spark MLlib.
- Experience in building applications with backend NoSQL stores like MongoDB and Cassandra.
- Experience in building data lakes using optimized techniques in HBase.
- Worked with AWS services like EMR and EC2 for fast and efficient processing.
- Experience in using AWS services like Lambda, Glue, DynamoDB, Redshift and EMR.
- Expert in building applications using Java, Scala, SQL, PL/SQL and RESTful services.
- Built real-time applications using Apache Kafka, microservices and Spark Streaming.
- Experience in automating end-to-end jobs with the Airflow scheduler.
- Built code coverage dashboards using tools like Sonar for code testability.
- Developed alerting and logging reports using tools like Grafana and Kibana.
- Working knowledge of building and using various components in Talend and NiFi.
- Experience in building Jupyter notebooks for various Spark operations in Python.
- Expert in writing scripts in shell and Python.
- Worked with different Hadoop distributions like Hortonworks, Cloudera and MapR.
- Experience working in various cloud environments like AWS, Azure and GCP.
- Worked on migration projects from on-prem clusters to Azure HDInsight and Azure Databricks.
- Worked on Google Cloud tools like BigQuery, Pub/Sub, Cloud SQL, Dataproc and GCS.
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Experience in building BI reports using Tableau, QlikView and Apache Superset.
- Worked on supporting projects using technologies like Teradata and Informatica.
- Experience in building deployment pipelines using Jenkins.
- Worked in an Agile environment and used Rally and Jira to maintain user stories and tasks.
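Illustrative sketch (not from any specific project): a minimal PySpark ETL job of the kind described above, reading raw CSV from HDFS, applying basic cleansing, and writing a partitioned Parquet table for downstream Hive/Spark use. Paths, column names and the target table are hypothetical placeholders.

```python
# Minimal PySpark ETL sketch: read raw CSV from HDFS, cleanse, write partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sample-etl-cleansing")
         .enableHiveSupport()
         .getOrCreate())

raw = (spark.read
       .option("header", "true")
       .csv("hdfs:///data/raw/orders/"))          # hypothetical source path

cleansed = (raw
            .dropDuplicates(["order_id"])                        # assumed key column
            .filter(F.col("order_id").isNotNull())
            .withColumn("order_date", F.to_date("order_date")))  # normalize types

(cleansed.write
 .mode("overwrite")
 .partitionBy("order_date")                       # partitioned, columnar output
 .format("parquet")
 .saveAsTable("analytics.orders_cleansed"))       # hypothetical Hive table

spark.stop()
```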
PROFESSIONAL EXPERIENCE
Confidential, Piscataway, NJ
Sr. Data Engineer
Responsibilities:
- Experience in building data applications using GCP services like BigQuery, Cloud Dataflow, Dataproc and GCS.
- Provisioned various servers on an on-demand basis using Dataproc for Spark applications.
- Developed various modules in Spring applications for building REST APIs.
- Integrated StreamSets applications with the GCP cluster for data migration from on-prem servers.
- Implemented data streaming applications using Pub/Sub services for real-time data.
- Used Dataproc and PySpark to leverage GCP's distributed framework to train and forecast models in a distributed environment (a PySpark-on-Dataproc sketch follows this list).
- Experience in architecting, designing and operationalizing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
- Automated various applications using Airflow for orchestration.
- Developed Kubernetes scripts for deploying applications and automating their runs.
- Implemented various Spark ETL applications using Python and Scala.
- Developed ETL scripts in BigQuery for various transformations.
- Experience with streaming and batch processing tools like Apache Beam and Spark.
- Implemented various Cloud Functions to enable automated data loads from various vendors.
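Illustrative sketch: a minimal PySpark job of the kind run on Dataproc, reading JSON from GCS and writing aggregated results to BigQuery via the spark-bigquery connector (pre-installed on recent Dataproc images). The bucket, dataset, table and column names are hypothetical placeholders.

```python
# PySpark on Dataproc sketch: GCS JSON -> aggregate -> BigQuery table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gcs-to-bigquery-etl").getOrCreate()

events = spark.read.json("gs://example-landing-bucket/events/*.json")  # hypothetical bucket

daily_counts = (events
                .withColumn("event_date", F.to_date("event_ts"))       # assumed timestamp column
                .groupBy("event_date", "event_type")
                .count())

(daily_counts.write
 .format("bigquery")
 .option("table", "example_dataset.daily_event_counts")   # hypothetical BigQuery table
 .option("temporaryGcsBucket", "example-temp-bucket")      # staging bucket for the connector
 .mode("overwrite")
 .save())
```

Such a job would typically be submitted with `gcloud dataproc jobs submit pyspark etl_job.py --cluster=<cluster> --region=<region>`.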
Confidential, Manhattan, NY
Sr. Data Engineer
Responsibilities:
- Experience in implementing data migration and data processing using Azure services like Azure Data Factory, Azure SQL DB, Event Hubs, Azure Stream Analytics, HDInsight, Azure Databricks and Cosmos DB.
- Built data pipelines that extract, classify, merge and deliver new insights on the data.
- Developed Python and shell scripts to automate and schedule the workflows to run on Azure.
- Built modern data solutions using Azure PaaS services to support visualization of data.
- Advanced knowledge in performance troubleshooting and tuning of Azure services like HDInsight clusters, ADF and Databricks.
- Designed and implemented data pipelines using ADF to move data from on-premises sources into Azure SQL DW.
- Used Apache NiFi for loading data from RDBMS to HDFS and HBase.
- Developed Spark jobs using PySpark and Spark-SQL for data extraction, transformation & aggregation from multiple file formats.
- Implemented various data modeling techniques for Cosmos DB.
- Designed and implemented database solutions in Azure SQL Data Warehouse.
- Experience in building Spark data pipelines and notebooks in Azure Databricks (illustrated in the sketch after this list).
- Built scalable applications to pull data from Azure Event Hubs into various Azure services.
- Developed various PySpark applications for ETL processing.
- Consumed data from various REST endpoints and loaded it into Cosmos DB.
- Extensively worked on data science and data engineering pipelines with Azure Databricks as the environment.
- Analyzed AWS mechanisms to pull data into ADLS and Blob Storage services.
- Worked on ADF to build the layers that pass functionality through to Databricks.
- Led the migration of data pipelines from Azure HDInsight to Azure Databricks.
- Designed and developed data pipelines for IoT sources using Azure Databricks.
- Experience in building Talend pipelines using various built-in and custom components.
- Automated end-to-end job actions in the Airflow scheduler for event-driven jobs.
- Built custom Power BI dashboards for reporting on daily incremental data applications.
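Illustrative sketch: a minimal Azure Databricks PySpark step that reads raw JSON landed in ADLS Gen2 and persists it as a partitioned Delta table for downstream notebooks and Power BI reporting. The storage account, container, path and table names are hypothetical placeholders, and the cluster is assumed to already have access to the storage account configured.

```python
# Databricks notebook sketch: ADLS Gen2 JSON -> partitioned Delta table.
from pyspark.sql import functions as F

raw_path = "abfss://landing@exampleadls.dfs.core.windows.net/iot/telemetry/"  # hypothetical path

telemetry = (spark.read                       # `spark` is provided by the Databricks runtime
             .json(raw_path)
             .withColumn("ingest_date", F.current_date()))

(telemetry.write
 .format("delta")
 .mode("append")
 .partitionBy("ingest_date")
 .saveAsTable("bronze.iot_telemetry"))        # hypothetical Delta table in the metastore
```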
Confidential, Atlanta, Georgia
Data Engineer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after the environment line below).
- Performed data validation using the AWS Glue service; created and added jobs to run data checks at scheduled times.
- Implemented real-time streaming ETL pipeline using Kafka Streams API.
- Worked on Hive to implement web interfacing and stored the data in Hive tables.
- Migrated MapReduce programs to Spark transformations using Spark and Scala.
- Experienced with SparkContext, Spark SQL and Spark on YARN.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Developed traits, case classes and related constructs in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirement.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) to store data in S3.
- Used Amazon EMR to process applications on virtual servers in Amazon Elastic Compute Cloud (EC2) with storage in Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity for setting backup storage.
- Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
- Coordinated with the Scrum team to deliver agreed user stories on time every sprint.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
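Illustrative sketch of the Kafka-to-Spark ingestion pattern above, written with PySpark Structured Streaming for consistency with the other sketches (the project itself used Scala and the Kafka direct DStream API). It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic, schema and sink/checkpoint paths are hypothetical placeholders.

```python
# Structured Streaming sketch: consume weblog events from Kafka, parse JSON, land as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("weblog-streaming").getOrCreate()

weblog_schema = StructType([                      # assumed weblog layout
    StructField("url", StringType()),
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
])

raw_stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
              .option("subscribe", "weblogs")                     # hypothetical topic
              .load())

parsed = (raw_stream
          .select(F.from_json(F.col("value").cast("string"), weblog_schema).alias("log"))
          .select("log.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/weblogs/")                    # hypothetical sink
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/weblogs/")
         .start())

query.awaitTermination()
```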
Confidential, New York, NY
Data Engineer
Responsibilities:
- Experience in developing Hive ETL scripts for data transformation.
- Experience in converting Hive scripts to PySpark applications for faster ETL operations (a conversion sketch follows the environment line below).
- Used Sqoop to import data from Relational Databases like MySQL, Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Created Hive views on top of flattened files and loaded them in parallel on the cluster by setting the configurations in a single configuration file.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on optimizing and upgrading Spark applications to improve performance.
- Automated various Hadoop jobs using Apache Oozie.
- Developed Jenkins pipeline for continuous integration and deployment for production jobs.
- Exposure to streaming technologies like Apache Kafka and Spark Streaming.
- Developed Shell scripts for validating data and sending production status to the team.
- Converted SAS datasets into CSV files using PySpark.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive for easier data manipulation.
- Developed Hive queries to process the data for visualizing.
- Configured workflows that involve Hadoop actions using Oozie.
- Worked on building integration pipelines using NiFi for various ingestion workflows.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
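Illustrative sketch of the Hive-to-PySpark conversion mentioned above: a simple HiveQL aggregation expressed once as Spark SQL against the metastore and once with the DataFrame API. The database, table and column names are hypothetical placeholders.

```python
# Hive-to-PySpark conversion sketch: same aggregation via Spark SQL and the DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-pyspark-etl")
         .enableHiveSupport()
         .getOrCreate())

# Spark SQL form, close to the original HiveQL script.
daily_sales_sql = spark.sql("""
    SELECT sale_date, store_id, SUM(amount) AS total_amount
    FROM sales_db.transactions
    GROUP BY sale_date, store_id
""")

# Equivalent DataFrame API form.
daily_sales_df = (spark.table("sales_db.transactions")
                  .groupBy("sale_date", "store_id")
                  .agg(F.sum("amount").alias("total_amount")))

daily_sales_df.write.mode("overwrite").saveAsTable("sales_db.daily_sales")  # hypothetical target

spark.stop()
```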
Confidential
Associate
Responsibilities:
- Responsible for the analysis, documenting the requirements and architecting the application based on J2EE standards.
- Attended Scrum meetings daily as a part of Agile Methodology.
- Involved in complete Software Development Life Cycle (SDLC) with Object Oriented Approach of client's business process and continuous client feedback.
- Implemented MVC architecture using the Spring Framework and customized user interfaces. Used Core Java and Spring Aspect-Oriented Programming concepts for logging, security and error-handling mechanisms.
- Developed application modules using Spring MVC, Spring Annotations, Spring Beans, Dependency Injection, with database interface using Hibernate.
- Used the Java Collections API extensively in the application and applied security protections for XML, SOAP, REST and JSON to make a secure web deployment.
- Developed server-side services using Java, Spring and Web Services (SOAP, RESTful, WSDL, JAXB, JAX-RPC).
- Built more user-interactive web pages using jQuery plugins for drag-and-drop and autocomplete, along with AJAX, JSON, AngularJS, JavaScript and Bootstrap.
- Used XSL to transform XML data structures into HTML pages.
- Used Struts as the framework for this project and developed Struts action classes and form beans.
- Created DispatchAction classes and validation plug-ins using the Struts framework.
- Used DB2 as the database and wrote queries to extract data from it.
- Developed SQL queries and stored procedures.
- Designed and developed white-box test cases using JUnit, JMeter and the Mockito framework, with Git for version control.
Environment: Core Java, Agile, Scrum, XML, HTML, CSS, JMeter, SOAP, REST, JDK, JSP, Servlets, JDBC, JUnit, SQL, MySQL, Windows, Oracle, Eclipse