Sr. Data Engineer Resume

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Data Engineer

Responsibilities:

  • Responsible for digital transformation from the legacy BW system to a cloud data warehouse. Built use cases for data that had been manually maintained in different manufacturing data warehouses. Core member of the Cloud Analytics Selection Team, which selected a cloud platform suitable for many of the organization's use cases. Experienced in creating Dataproc and Dataflow clusters on GCP for running all computations in the cloud. Developed CI/CD pipelines using Python, Spark, and Spark SQL for data extraction, transformation, pivoting, and aggregation into the format specified by business requirements. Loaded data into BigQuery incrementally every 15 minutes using Google Dataproc, PySpark, gsutil, and shell scripts.
  • Used Google Cloud Composer and Airflow to automate data pipelines from Cloud Storage to BigQuery. Created BigQuery queries for different data sets and integrated them with Power BI to build dashboards. Used REST APIs with Python to ingest data from external sites into BigQuery. Created Power BI dashboards for data visualization and for quarterly and monthly reporting.
  • Worked on a POC for integrating Snowflake with AWS for one of our data use cases: created a demo data pipeline using an AWS S3 bucket, Glue, and Python transformations, and ingested the curated dataset into Snowflake. Worked on a POC with AWS SageMaker for one of our ML and AI use cases, and on GCP with Vertex AI, to compare the two platforms. Expert in monitoring BigQuery, Dataproc, and Cloud Dataflow jobs via the Monitoring agent for GCP.

Confidential

Sr. Data Engineer

Responsibilities:

  • Developed external vendor file export pipelines using Spark, Hive, Python, Scala, and shell scripting. Implemented optimized Spark Scala data pipelines for aggregating large amounts of data. Worked on generating business reports from a custom vendor data platform. Developed integration pipelines for SFTP and cloud storage services such as S3 and GCS. Developed Spark applications using Spark SQL for data extraction, transformation, and aggregation into customer-requested formats, and for analyzing the data to uncover insights.
  • Experienced in SQL, data transformations, statistical analysis, and troubleshooting across more than one database platform (MySQL, PostgreSQL, Teradata, and Azure SQL warehouse). Migrated existing data pipelines from Hortonworks Platform 2 to Hortonworks Platform 3. Automated data pipelines using Oozie and internally open-sourced tools such as an automation portal. Implemented a reporting layer on top of Apache Druid for incremental updates to business reports. Expert in building optimized Hive queries over large volumes of data in different data formats. Developed a continuous deployment process using container-based tools such as Drone. Implemented Docker pipelines for testing and validation in the integration and deployment process. Developed end-to-end unit and integration tests for data pipelines using PySpark. Developed daily metrics pipelines and exposed them through a Grafana dashboard with alerting.

Confidential

Sr. Data Engineer

Responsibilities:

  • Implemented real-time data pipelines for streaming analytics using Kafka and Spark Streaming with Scala. Worked on migrating on-premises cluster data to Azure for implementing real-time features.
  • Created custom dashboards in Azure using Application Insights and the Application Insights query language to process the metrics sent to Application Insights. Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI. Developed a custom message consumer to consume data from the Kafka producer and push the messages to Azure Service Bus and Event Hubs. Implemented Spark ETL jobs in Azure HDInsight for ETL operations in the cloud.
  • Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment using Jenkins. Implemented a data platform in a Hive data warehouse for on-premises use and archival purposes.

Confidential

Sr. Big Data Engineer

Responsibilities:

  • Extracted data from Teradata and MySQL into HDFS using Sqoop import/export. Developed Sqoop jobs with incremental loads to populate Hive external tables. Expertise in using MapReduce design patterns to convert business data into custom formats. Experienced in handling different compression codecs such as LZO, GZIP, and Snappy. Expert in optimizing Hive performance using partitioning and bucketing.
  • Experience working with Hive dynamic partitions to work around the Hive locking mechanism. Developed UDFs in Java as needed for use in Hive queries. Developed crontab entries for scheduling and orchestrating the ETL process. Involved in indexing Hive data using Solr and preparing custom tokenizer formats for querying. Involved in designing a real-time computation engine using Kafka. Worked on a POC to stream data to Solr with Spark Streaming and index it.
  • Experienced in writing build jobs using Maven and integrating them with Jenkins. Ingested third-party data from AWS cloud buckets.

Confidential

Big Data Engineer

Responsibilities:

  • Developed Oozie automations using custom MapReduce, Pig, Hive, and Sqoop. Built reusable Hive UDF libraries for business users. Expertise in performance tuning of Hive queries, joins, and configuration parameters to improve query response time. Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables. Developed applications on Spark as part of the next-generation platform implementation. Implemented real-time data ingestion using Kafka. Developed a data pipeline using Kafka and Storm to store data in HDFS. Used Apache Maven extensively while developing MapReduce programs.
  • Worked extensively on Pig scripts and Pig UDFs to perform ETL activities. Developed Spark scripts using Python. Developed Oozie workflows to automate tasks. Collected log data from web servers and loaded it into HDFS using Flume.

Confidential

Hadoop Developer

Responsibilities:

  • Gathered the exact report requirements from business groups and users. Imported trading and derivatives data into the Hadoop Distributed File System using the ecosystem components MapReduce, Pig, Hive, and Sqoop. Responsible for writing Hive queries and Pig scripts for data processing. Ran Sqoop to import data from Oracle and another database. Created shell scripts to collect raw logs from different machines. Created Hive tables with static and dynamic partitions.
  • Optimized Pig scripts using ILLUSTRATE and EXPLAIN, and used parameterized Pig scripts. Defined Pig UDFs for functions such as swap, hedging, speculation, and arbitrage. Processed unstructured log files using MapReduce programs. Imported and exported data into HDFS and Hive using Sqoop.
  • Involved in configuring HA, resolving Kerberos security issues, and performing NameNode failure restoration from time to time as part of zero-downtime operations. Developed JUnit test cases for application unit testing. Used SVN for version control to check in code, create branches, and tag releases.

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed modules based on the Struts MVC architecture. Developed business components using core Java concepts and classes such as inheritance, polymorphism, collections, serialization, and multithreading.
  • Developed the web interface using Servlets, JavaServer Pages, HTML, and CSS. Developed DAO objects using JDBC. Used the Spring Framework for dependency injection and integrated it with the Struts framework and Hibernate.
  • Used Log4j to capture logs, including runtime exceptions, monitored error logs, and fixed the problems found. Performed unit, system, and integration testing. Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions.