Data Engineer Resume
High Point, NC
SUMMARY:
- 4+ years of experience in the IT industry, playing a major role in implementing, developing, and maintaining various web-based applications using Java and the Big Data ecosystem.
- 3+ years of strong end-to-end experience in Hadoop, Spark, and cloud development using a range of Big Data tools.
- Strong knowledge of Hadoop architecture and daemons such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of MapReduce concepts.
- Expertise in importing and exporting data between relational databases and HDFS/Hive using Sqoop.
- Experience in adding and removing nodes in a Hadoop cluster.
- Experience in extracting data from RDBMS into HDFS using Sqoop.
- Experience in collecting logs from log collectors into HDFS using Flume.
- Good understanding of NoSQL databases such as HBase.
- Experience in analyzing data in HDFS through MapReduce, Hive, and Pig.
- Designed, implemented, and reviewed features and enhancements for Cassandra.
- Experience in writing MapReduce programs in Java (a brief sketch follows this summary).
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Involved in designing the Hive data model for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
- Expert in data cleansing operations using Pig Latin transformations.
- Hands-on experience with NoSQL databases like Cassandra.
- Experience in developing near-real-time workflows using Spark Streaming.
- Experienced in using messaging systems such as Kafka, Event Hubs, and Service Bus.
- Good knowledge of Java, UNIX shell scripting, Linux, and SQL Developer.
- Extensive knowledge of data ingestion, data processing, and batch analytics.
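A minimal sketch of the kind of MapReduce program in Java referenced above: the standard word-count pattern, with hypothetical input and output paths taken from the command line rather than any specific production job.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```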
TECHNICAL SKILLS:
Big Data Stack: HDFS, Spark, Hive, Sqoop, Pig, MapReduce, Flume, Oozie, HBase, Kafka.
Programming Languages: Java, Scala, Python
Search Engine: Elasticsearch
Databases and NoSQL DBs: Oracle, MySQL, SQL Server, Cassandra, MongoDB, DynamoDB
Cloud: Azure, AWS
Others: Shell Scripting, SBT, Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, High Point, NC
Data Engineer
Responsibilities:
- Implemented workflows to process around 400 messages per second and push them to DocumentDB as well as Event Hubs.
- Developed a custom message producer capable of publishing about 4,000 messages per second for scalability testing (see the Kafka producer sketch at the end of this section).
- Implemented callback and notification architectures for real-time data.
- Implemented Spark Streaming in Scala to process the JSON messages and push them to a Kafka topic.
- Created custom dashboards in Azure using Application Insights and the Application Insights query language to process the metrics sent to Application Insights.
- Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
- Developed a custom message consumer to consume data from the Kafka topic and push the messages to Service Bus and Event Hubs (Azure components).
- Wrote auto-scaling Azure Functions that consume data from Azure Service Bus or Azure Event Hubs and send it to DocumentDB.
- Wrote a Spark application to capture the change feed from DocumentDB using the Java API and write the updates to a new DocumentDB instance.
- Implemented zero-downtime deployment for all production pipelines in Azure.
- Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
- Implemented the build and deployment pipelines in Jenkins.
- Used custom receivers, socket streams, file streams, and directory streams in Spark Streaming.
- Used Lambda, Kinesis, DynamoDB, and CloudWatch on AWS.
- Used Application Insights, DocumentDB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hubs, and Azure Functions.
- Developed Python code to gather data from HBase and designed the solution for implementation in PySpark.
- Developed Hadoop Streaming jobs in Python to integrate applications with Python API support.
- Used Python to run the Ansible playbook that deploys the Logic Apps to Azure.
Environment: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.
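A minimal sketch of the kind of custom message producer described above for scalability testing, assuming a Kafka broker at localhost:9092; the topic name, payload shape, and loop bound are hypothetical placeholders for the real load profile.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LoadTestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "1");                              // favor throughput over full durability

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish synthetic JSON messages; the fixed bound stands in for the target rate.
            for (int i = 0; i < 4000; i++) {
                String payload = "{\"id\":" + i + ",\"ts\":" + System.currentTimeMillis() + "}";
                producer.send(new ProducerRecord<>("load-test-topic", Integer.toString(i), payload));
            }
            producer.flush();
        }
    }
}
```

In practice the loop would be driven by a rate limiter or a load-testing harness rather than a fixed count.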
Confidential, DE
Data Engineer
Responsibilities:
- Developed a daily process for incremental imports of data from DB2 and Oracle into HDFS using Sqoop.
- Extensively used Pig for data transformation and data cleansing.
- Converted Hive scripts into Spark using Scala and optimized the resulting Spark jobs.
- Developed Python code to gather data from HBase and designed the solution for implementation in PySpark.
- Wrote stored procedures to transform data in Microsoft SQL Server.
- Scheduled multiple Spark jobs with the Oozie scheduler.
- Developed an error-logging script to load ingestion logs into an HBase table (a minimal sketch follows this section).
- Worked on CI/CD deployments using Jenkins.
- Ingested XML files captured from RabbitMQ and stored them in an HBase table.
- Developed complex Hive queries to analyze data.
Environment: MapR, Spark, Hive, Sqoop, HBase, Oozie, Pig, Scala, Unix, Agile, Code Hub, Splunk, PySpark.
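A minimal sketch of the error-logging write into HBase mentioned above, using the standard HBase Java client; the table name, column family, columns, and row-key scheme are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IngestionLogWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("ingestion_log"))) {

            // Row key: source name plus timestamp so log entries stay sorted per source.
            String rowKey = "db2_orders|" + System.currentTimeMillis();
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("status"), Bytes.toBytes("FAILED"));
            put.addColumn(Bytes.toBytes("log"), Bytes.toBytes("message"),
                          Bytes.toBytes("Sqoop incremental import failed: connection timeout"));

            table.put(put);   // write the log record
        }
    }
}
```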
Confidential
Hadoop / Java Developer
Responsibilities:
- Developed Hive queries for the required analytics and report generation.
- Involved in developing Pig scripts to process data coming from different sources.
- Worked on data cleansing using Pig scripts and stored the results in HDFS.
- Worked on Pig user-defined functions (UDFs) written in Java for external functions (see the sketch after this section).
- Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
- Planned, coordinated, and managed internal process documentation and presentations, describing the identified process improvements along with the workflows and diagrams associated with the process flow.
- Scheduled jobs with Oozie to automate regularly executing processes.
- Moved data into HDFS using Sqoop.
- Expertise in Hive optimization techniques such as partitioning and bucketing on different data formats.
- Worked on UDFs in Hive using Java.
- Expertise in Pig joins to handle data across different data sets.
Environment: Hadoop, Hive, Pig, Sqoop, HBase, MapReduce, Java, Python
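A minimal sketch of a Pig UDF in Java of the kind mentioned above; the class name and the cleansing rule (trimming and upper-casing a chararray field) are hypothetical.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// A simple Pig UDF that trims whitespace and upper-cases a chararray field.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;   // propagate nulls instead of failing the task
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In Pig Latin such a UDF is registered with REGISTER and invoked inside a FOREACH ... GENERATE expression.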
Confidential
Java Developer
Responsibilities:
- Involved in the implementation of service layer and DAO.
- Responsible for designing JSPs as per the requirements.
- Developed Core Java applications.
- Worked on JDBC, collections, multithreading, the Collections API and generics, and file handling (a brief JDBC sketch follows this section).
- Experience working with Java collections.
- Wrote SQL queries to create databases and tables and to load data.
- Worked on Java exception handling.
- Involved in developing and deploying the server-side components.
- Worked on fixed/floating interest rate projections.
- Coding, debugging and bug fixing.
Environment: Core Java, JDBC, Collections, Multithreading, Hibernate, Collection API and Generics, File Handling, SQL Server 2005, Tortoise SVN, J2EE
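A minimal JDBC sketch along the lines of the data-access work described above, assuming a SQL Server instance reachable through the Microsoft JDBC driver; the connection string, credentials, table, and columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RateDao {
    private static final String URL =
            "jdbc:sqlserver://localhost:1433;databaseName=rates";   // hypothetical connection string

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT rate_type, rate_value FROM interest_rates WHERE rate_type = ?")) {

            ps.setString(1, "FIXED");   // bind the rate type parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Read each projected rate row returned by the query.
                    System.out.println(rs.getString("rate_type") + " = " + rs.getDouble("rate_value"));
                }
            }
        }
    }
}
```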