
Data Engineer Resume


High Point, NC

SUMMARY:

  • Over 4 years of experience in the IT industry, playing a major role in implementing, developing, and maintaining various web-based applications using Java and the Big Data ecosystem.
  • Around 3 years of strong end-to-end experience in Hadoop, Spark, and cloud development using different Big Data tools.
  • Strong knowledge of Hadoop architecture and daemons such as NameNode, DataNode, JobTracker, and TaskTracker, as well as HDFS and MapReduce concepts.
  • Expertise in importing data from relational databases into HDFS and Hive using Sqoop, and exporting it back out.
  • Experience in adding and removing nodes in a Hadoop cluster.
  • Experience in extracting data from RDBMS into HDFS using Sqoop.
  • Experience in collecting logs from log collectors into HDFS using Flume.
  • Good understanding of NoSQL databases such as HBase.
  • Experience in analyzing data in HDFS through MapReduce, Hive, and Pig (a short sketch follows this list).
  • Designed, implemented, and reviewed features and enhancements for Cassandra.
  • Experience in writing MapReduce programs in Java.
  • Experienced in optimizing Hive queries by tuning configuration parameters.
  • Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
  • Expert in data cleansing operations using Pig Latin transformations.
  • Hands-on experience with NoSQL databases like Cassandra.
  • Experience in developing near-real-time workflows using Spark Streaming.
  • Experienced in using messaging queues such as Kafka, Event Hub, and Service Bus.
  • Good knowledge of Java, UNIX shell scripting, Linux, and SQL Developer.
  • Extensive knowledge of data ingestion, data processing, and batch analytics.
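
As a brief illustration of the HDFS analysis noted above, the following is a minimal PySpark sketch that reads data landed in HDFS and runs a Hive-style aggregation. The path, table, and column names are hypothetical and stand in for whatever a real project would use.

# Minimal sketch: analyze data in HDFS with Spark SQL (names are illustrative).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hdfs-analysis-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Load raw records landed in HDFS (for example, by Sqoop or Flume).
orders = spark.read.parquet("hdfs:///data/raw/orders")

# Register the data and run an aggregation the way a Hive query would.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")

# Persist the result as a Hive table for downstream reporting.
daily_totals.write.mode("overwrite").saveAsTable("daily_order_totals")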

TECHNICAL SKILLS:

Big Data Stack: HDFS, Spark, Hive, Sqoop, Pig, MapReduce, Flume, Oozie, HBase, Kafka.

Programming Languages: Java, Scala, Python

Search Engine: Elasticsearch

Databases and NoSQL DBs: Oracle, MySQL, SQL, Cassandra, MongoDB, DynamoDB

Cloud: Azure, AWS

Others: Shell Scripting, SBT, Jenkins

PROFESSIONAL EXPERIENCE:

Confidential, High Point, NC

Data Engineer

Responsibilities:

  • Implemented workflows to process around 400 messages per second and push them to DocumentDB as well as Event Hubs.
  • Developed a custom message producer that can produce about 4,000 messages per second for scalability testing.
  • Implemented call-back and notification architectures for real-time data.
  • Implemented Spark Streaming in Scala to process JSON messages and push them to a Kafka topic (a sketch of this flow follows this list).
  • Created custom dashboards in Azure using Application Insights and the Application Insights query language to process the metrics sent to Application Insights.
  • Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
  • Developed a custom message consumer to consume data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure components).
  • Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hub and send it to DocumentDB.
  • Wrote a Spark application to capture the change feed from DocumentDB using the Java API and write updates to a new DocumentDB.
  • Implemented zero-downtime deployment for the entire production pipeline in Azure.
  • Implemented CI/CD pipelines to build and deploy the projects in the Hadoop environment.
  • Experienced in implementing pipelines in Jenkins.
  • Used custom receivers, socket streams, file streams, and directory streams in Spark Streaming.
  • Used Lambda, Kinesis, DynamoDB, and CloudWatch from AWS.
  • Used App Insights, DocumentDB, Service Bus, Azure Data Lake Store, Azure Blob Store, Event Hub, and Azure Functions.
  • Developed Python code to gather data from HBase and designed the solution to implement it using PySpark.
  • Developed Hadoop streaming jobs in Python to integrate applications with Python API support.
  • Used Python to run the Ansible playbook that deploys the Logic Apps to Azure.
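
The Spark Streaming step above was implemented in Scala; below is a minimal PySpark analogue of the same flow for illustration, parsing incoming JSON messages and republishing them to a Kafka topic. The broker address, topic names, schema, and checkpoint path are placeholders, and the Spark Kafka integration package (spark-sql-kafka) must be on the classpath.

# Minimal PySpark analogue of the streaming flow above: parse incoming JSON
# messages and republish them to a Kafka topic. Broker, topics, schema, and
# checkpoint path are placeholders; the production implementation was in Scala.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_json
from pyspark.sql.types import StringType, StructField, StructType

spark = (SparkSession.builder
         .appName("json-to-kafka-sketch")
         .getOrCreate())

# Assumed message layout; the real schema would match the upstream producer.
schema = StructType([
    StructField("deviceId", StringType()),
    StructField("eventType", StringType()),
    StructField("payload", StringType()),
])

# Read the raw JSON stream from a source Kafka topic (placeholder names).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "incoming-json")
       .load())

# Parse the JSON value, drop malformed records, and re-serialize for output.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("msg"))
          .where(col("msg").isNotNull())
          .select(to_json(col("msg")).alias("value")))

# Push the processed messages to the downstream Kafka topic.
query = (parsed.writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("topic", "processed-json")
         .option("checkpointLocation", "/tmp/checkpoints/json-to-kafka")
         .start())

query.awaitTermination()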

Environment: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.

Confidential, DE

Data Engineer

Responsibilities:

  • Developed a daily process to do incremental imports of data from DB2 and Oracle into HDFS using Sqoop.
  • Extensively used Pig for data transformation and data cleansing.
  • Experienced in converting Hive scripts into Spark using Scala and optimizing Spark jobs.
  • Developed Python code to gather data from HBase and designed the solution to implement it using PySpark (see the sketch after this list).
  • Experience in writing stored procedures to transform data in Microsoft SQL Server.
  • Scheduled multiple Spark jobs in the Oozie scheduler.
  • Developed an error-logging script to load the ingestion logs into an HBase table.
  • Worked on CI/CD deployments using Jenkins.
  • Ingested XML files captured from RabbitMQ and stored them in an HBase table.
  • Developed complex Hive queries to analyze data.
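
As a sketch of the HBase-to-PySpark step referenced above, the following gathers rows from HBase through the Thrift-based happybase client and continues the analysis as a Spark DataFrame. The HBase host, table, column family, and column names are hypothetical; for large tables a Spark-HBase connector would be the usual choice instead of scanning on the driver.

# Minimal sketch: pull rows from HBase into a Spark DataFrame (names are illustrative).
import happybase
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("hbase-to-spark-sketch").getOrCreate()

# Scan the HBase table via the Thrift gateway (placeholder host and table).
connection = happybase.Connection(host="hbase-thrift-host", port=9090)
table = connection.table("ingestion_logs")

rows = []
for row_key, data in table.scan():
    rows.append(Row(
        row_key=row_key.decode("utf-8"),
        status=data.get(b"cf:status", b"").decode("utf-8"),
        message=data.get(b"cf:message", b"").decode("utf-8"),
    ))
connection.close()

# Continue the analysis in Spark, e.g. summarizing ingestion outcomes.
df = spark.createDataFrame(rows)
df.groupBy("status").count().show()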

Environment: MapR, Spark, Hive, Sqoop, HBase, Oozie, Pig, Scala, Unix, Agile, Code Hub, Splunk, PySpark.

Confidential

Hadoop / Java developer

Responsibilities:

  • Developed Hive queries as required by analytics for report generation.
  • Involved in developing Pig scripts to process data coming from different sources.
  • Worked on data cleaning using Pig scripts and stored the results in HDFS.
  • Worked on Pig user-defined functions (UDFs) written in Java for external functions.
  • Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
  • Planned, coordinated, and managed internal process documentation and presentations describing the process improvements identified, along with the workflows and diagrams associated with the process flow.
  • Scheduled jobs with Oozie to automate regularly executed processes.
  • Moved data into HDFS using Sqoop.
  • Expertise in Hive optimization techniques like partitioning and bucketing on different formats of data (see the sketch after this list).
  • Worked on UDFs in Hive using Java.
  • Expertise in Pig joins to handle data across different data sets.
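
As a sketch of the partitioning and bucketing mentioned above, the DDL below mirrors what would typically be run in Hive itself; it is issued here through PySpark's Hive support only to keep these examples in one language. The database, table, and column names are illustrative.

# Minimal sketch: a partitioned, bucketed Hive table and a partition-pruned query.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales")

# Partition by load date and bucket by customer id so common filters and joins
# read less data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.transactions_opt (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A query with a partition predicate only scans the matching load_date partition.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.transactions_opt
    WHERE load_date = '2017-06-01'
    GROUP BY customer_id
""").show()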

Environment: Hadoop, Hive, Pig, Sqoop, HBase, MapReduce, Java, Python

Confidential

Java Developer

Responsibilities:

  • Involved in the implementation of the service layer and DAOs.
  • Responsible for designing JSPs as per the requirements.
  • Developed Core Java applications.
  • Worked on JDBC, collections, multithreading, the Collections API and generics, and file handling.
  • Experience in working with Java collections.
  • Wrote SQL queries to create databases and tables and to load data.
  • Worked on Java exceptions.
  • Involved in developing and deploying server-side components.
  • Worked on fixed/floating interest rate projections.
  • Coding, debugging, and bug fixing.

Environment: Core Java, JDBC, Collections, Multithreading, Hibernate, Collection API and Generics, File Handling, SQL Server 2005, Tortoise SVN, J2EE
