Data Engineer Resume
SUMMARY
- Seeking an opportunity to apply 8 years of programming, technology, and engineering expertise to software development, combined with critical thinking, problem solving, and leadership.
TECHNICAL SKILLS
Hadoop Core Services: HDFS, MapReduce, Hadoop YARN, Spark
Hadoop Data Services: Apache Hive, Sqoop, Flume, Kafka
Hadoop Distributions: Hortonworks, Cloudera
Hadoop Operational Services: Apache Zookeeper, Oozie
Cloud Computing Services: AWS (Confidential Web Services), Confidential EC2, EMR, Azure, AAS
IDE Tools: Eclipse, NetBeans, IntelliJ, PyCharm
Programming Languages: C, Java, Unix Shell scripting, Scala, Python
Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS
Databases: Confidential, DB2, NoSQL databases (HBase, Cassandra), Confidential
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, Minnesota, MN
Responsibilities:
- Worked as a Data Engineer on a Supply Planning team, analyzing large data sets to deliver actionable insights.
- Ingested data from various sources (Confidential DB, Confidential, Snowflake) into AWS S3 using Spark-Scala JDBC and Snowflake connectors (see the ingestion sketch after this list).
- Created DAGs for the ingestion jobs in Airflow using Python and scheduled them with cron expressions (see the DAG sketch after this list).
- Created both external and internal Hive tables on top of S3 data and worked with dynamic partitioning (see the table-definition sketch after this list).
- Developed ETL jobs using PySpark and a data lineage tool, transforming the data in multiple stages and performing actions such as aggregations (see the transform sketch after this list).
- Hands-on experience with HL7 v2 and FHIR.
- Used HL7 v2 for exchanging customer information and data, and used FHIR with XML payloads.
- Actively involved in creating data pipelines and designing their architecture.
- Ingested data from Confidential into S3 and HDFS and developed ETL jobs.
- Created DataFrames and RDDs in Spark by reading data from Hive tables and Parquet files in S3.
- Fine-tuned and optimized Spark jobs using different configurations, caching, and broadcast joins.
- Stored the resulting data sets in S3 and Snowflake for visualization in Tableau reports.
- Read spreadsheet (.xls, .xlsx) files using Spark Scala by integrating ZuInnoTe's HadoopOffice library, a Spark data source for Office files.
- Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
- Used a data lineage tool for mapping analysis.
- Understood client requirements and designed the best possible approach to meet each customer use case.
- Worked in a story-driven Agile development methodology and actively participated in daily scrum meetings.
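Ingestion sketch (for the JDBC/Snowflake bullet above): a minimal PySpark version of the pattern, assuming placeholder hostnames, credentials, table names, and bucket names; the production jobs used Spark-Scala JDBC and Snowflake connectors.

```python
# Minimal PySpark sketch of the JDBC-to-S3 ingestion pattern (illustrative only).
# The URL, credentials, table, and bucket below are placeholders, not project values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc_to_s3_ingest").getOrCreate()

# Read one source table over JDBC (the production jobs also used Snowflake connectors).
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")   # placeholder source
          .option("dbtable", "SUPPLY.ORDERS")                       # placeholder table
          .option("user", "etl_user")
          .option("password", "********")
          .option("fetchsize", "10000")
          .load())

# Land the raw data in S3 as Parquet, partitioned by load date.
(orders.write.mode("overwrite")
       .partitionBy("LOAD_DATE")
       .parquet("s3a://placeholder-bucket/raw/orders/"))
```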
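DAG sketch (for the Airflow bullet above): a minimal cron-scheduled ingestion DAG, assuming Airflow 2.x; the DAG id, schedule, and spark-submit paths are hypothetical.

```python
# Minimal Airflow 2.x DAG sketch: two cron-scheduled ingestion tasks chained together.
# The DAG id, schedule, and spark-submit paths are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="supply_ingestion",               # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",           # cron expression: daily at 02:00
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="jdbc_to_s3_ingest",
        bash_command="spark-submit /opt/jobs/jdbc_to_s3_ingest.py",     # placeholder path
    )
    refresh = BashOperator(
        task_id="refresh_hive_partitions",
        bash_command="spark-submit /opt/jobs/refresh_partitions.py",    # placeholder path
    )
    ingest >> refresh   # run ingestion before the partition refresh
```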
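Table-definition sketch (for the Hive-on-S3 bullet above): registering an external table over the S3 landing path with dynamic partitioning enabled; the database, table, column, and bucket names are placeholders.

```python
# Sketch of registering a Hive external table over the S3 landing path with dynamic
# partitioning enabled; database, table, column, and bucket names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive_ddl").enableHiveSupport().getOrCreate()

spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS supply")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS supply.orders (
        order_id STRING,
        item_id  STRING,
        quantity INT
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 's3a://placeholder-bucket/raw/orders/'
""")

# Register any partitions already present under the S3 location.
spark.sql("MSCK REPAIR TABLE supply.orders")
```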
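Transform sketch (for the PySpark ETL bullet above): a minimal multi-stage job that reads a Hive table and S3 Parquet, broadcasts the smaller dimension, caches the reused frame, aggregates, and writes the result back to S3; all table, bucket, and column names are placeholders.

```python
# Minimal PySpark sketch of the transform stage: read from a Hive table and S3 Parquet,
# broadcast the smaller dimension, cache the reused frame, aggregate, and write to S3.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("supply_etl")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.table("supply.orders")                               # Hive external table over S3
items = spark.read.parquet("s3a://placeholder-bucket/raw/items/")   # Parquet files in S3

# Broadcast the small dimension to avoid a shuffle join; cache the reused frame.
enriched = orders.join(F.broadcast(items), "item_id").cache()

daily_totals = (enriched.groupBy("load_date", "region")
                .agg(F.sum("quantity").alias("total_quantity"),
                     F.countDistinct("order_id").alias("order_count")))

# Persist the result set for the downstream Snowflake / Tableau layer.
(daily_totals.write.mode("overwrite")
             .partitionBy("load_date")
             .parquet("s3a://placeholder-bucket/curated/daily_totals/"))
```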
Environment: Hadoop, MapReduce, Sqoop, Advanced SQL, Python, Spark 2.3, Hortonworks 2.5.3, HDFS, Confidential 12c, HBase, Confidential, Azure Analytics, AWS, EMR
Big Data Developer
Confidential, Chicago, IL
Responsibilities:
- Developed Sqoop scripts for importing data from RDBMS to Hadoop.
- Used a custom framework (Aorta) and YAML script files that internally invoked Sqoop and Hive.
- Scheduled automated jobs using the Automic scheduler and managed data coming from different sources.
- Created logic to handle history loads and incremental data.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Implemented workflows using the Automic framework to automate tasks.
- Performed data quality checks to validate that the data met the defined standards.
- Worked on the CI/CD pipeline, integrating code changes into the Git repository and building with Jenkins.
- Read ORC files and created DataFrames for use in Spark (see the sketch after this list).
- Experienced with Spark Core and Spark SQL, using Scala as the programming language.
- Performed data transformations and analytics on large datasets using Spark.
- Used Confidential S3 for storage as a replacement for HDFS.
- Well versed in the Spark Streaming APIs.
- Good knowledge of interactive notebooks such as Jupyter and Zeppelin.
- Built a POC on Confidential EMR to assess the feasibility of moving to the cloud and being future ready.
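Sketch for the ORC bullet above: reading ORC files into a DataFrame and transforming them with Spark SQL. The project code was written in Scala; this PySpark version with placeholder paths and columns is only illustrative.

```python
# Sketch: read ORC files into a DataFrame and transform them with Spark SQL.
# Paths and column names are placeholders; the project code was written in Scala.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc_transform").getOrCreate()

sales = spark.read.orc("s3a://placeholder-bucket/staging/sales/")   # ORC files landed upstream
sales.createOrReplaceTempView("sales")

monthly = spark.sql("""
    SELECT store_id,
           date_format(sale_date, 'yyyy-MM') AS sale_month,
           SUM(amount)                       AS total_amount
    FROM sales
    GROUP BY store_id, date_format(sale_date, 'yyyy-MM')
""")

monthly.write.mode("overwrite").parquet("s3a://placeholder-bucket/curated/monthly_sales/")
```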
Environment: Hortonworks, Hadoop, Spark, HDFS, YARN, Oozie, Hive, Linux, Java, Automic, SQL, Aorta, YAML, Confidential, Informix, Scala, AWS, S3, Confidential EMR.
Hadoop Developer
Confidential, Richmond, VA
Responsibilities:
- Ingested data from various sources (Confidential DB, FTP, Couchbase) into HDFS.
- Developed shell and Python scripts to run Sqoop commands, handle exceptions, and store logs while ingesting transactional and electronic invoice data from Confidential into HDFS (see the wrapper sketch after this list).
- Created Hive external tables for incremental imports into Hive using the ingest, reconcile, compact, and purge strategy.
- Migrated HiveQL queries to Impala to minimize query response time.
- Created both external and internal tables in Hive and worked with partitioning and bucketing.
- Fine-tuned and productionized Confidential SQL queries that had been running for a long time in the queue.
- Worked extensively on building NiFi data pipelines during the development phase to process large data sets, and configured lookups for data validation and integrity.
- Performed data analysis on the stored data using Spark Scala.
- Created case classes and objects, built DataFrames and RDDs, and applied transformations and actions.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Worked on AWS to create and manage EC2 instances and Hadoop clusters; connected to target databases to retrieve data.
- Built a POC for loading data from the Linux file system into AWS S3 and HDFS.
- Used Python to connect to the SFTP server and pull all OMS (Order Management System) and post-tax files into HDFS (see the SFTP sketch after this list).
- Developed a Spark application in Scala that parses data in HDFS and ingests only the filtered records into HBase and Solr.
- Created DataFrames using Spark SQL, performed joins over a large number of tables, and ingested the resulting DataFrame into HBase and Solr.
- Created an HBase table with column families and column qualifiers and inserted data using the REST API.
- Stored metadata in Solr, from which the UI reads the data and presents it to the end user.
- Understood client requirements and designed the best possible approach to meet each customer use case.
- Worked in a story-driven Agile development methodology and actively participated in daily scrum meetings.
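Wrapper sketch (for the Sqoop scripting bullet above): a minimal Python wrapper that runs a Sqoop import, captures its logs, and handles failures; the connection string, password file, table, and HDFS paths are placeholders.

```python
# Minimal Python wrapper around a Sqoop import with logging and exception handling.
# The JDBC URL, credentials file, table, and HDFS directory are placeholders.
import logging
import subprocess
import sys

logging.basicConfig(filename="sqoop_ingest.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",   # placeholder source database
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",              # placeholder HDFS password file
    "--table", "INVOICES",                                  # placeholder table
    "--target-dir", "/data/raw/invoices",                   # placeholder HDFS landing directory
    "--num-mappers", "4",
]

def main():
    logging.info("Starting Sqoop import: %s", " ".join(SQOOP_CMD))
    try:
        result = subprocess.run(SQOOP_CMD, capture_output=True, text=True, check=True)
        logging.info("Sqoop import finished:\n%s", result.stderr)   # Sqoop writes its log to stderr
    except subprocess.CalledProcessError as exc:
        logging.error("Sqoop import failed (exit %s):\n%s", exc.returncode, exc.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```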
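SFTP sketch (for the OMS/post-tax bullet above): pulling files from an SFTP server with paramiko and pushing them into HDFS; the host, credentials, and directory names are placeholders.

```python
# Sketch: pull files from an SFTP server with paramiko and push them into HDFS.
# The host, credentials, and directory names are placeholders.
import os
import subprocess

import paramiko

SFTP_HOST = "sftp.example.com"      # placeholder host
REMOTE_DIR = "/outbound/oms"        # placeholder remote directory
LOCAL_DIR = "/tmp/oms"              # staging directory on the edge node
HDFS_DIR = "/data/raw/oms"          # placeholder HDFS target directory

os.makedirs(LOCAL_DIR, exist_ok=True)

transport = paramiko.Transport((SFTP_HOST, 22))
transport.connect(username="etl_user", password="********")
sftp = paramiko.SFTPClient.from_transport(transport)

try:
    for name in sftp.listdir(REMOTE_DIR):
        local_path = os.path.join(LOCAL_DIR, name)
        sftp.get(f"{REMOTE_DIR}/{name}", local_path)   # download to the edge node
        subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, HDFS_DIR], check=True)
finally:
    sftp.close()
    transport.close()
```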
Environment: Hadoop, MapReduce, Sqoop, NiFi, Splunk, Python, Spark 2.0, Hortonworks 2.5.3, HDFS, Confidential 12c, HBase, Solr
Hadoop/Data Engineer
Confidential, Tennessee
Responsibilities:
- Worked in a story-driven Agile development methodology and actively participated in daily scrum meetings.
- Ingested data into the Indie data lake using the open-source Hadoop distribution, processing structured, semi-structured, and unstructured datasets with open-source Apache tools such as Flume and Sqoop into the Hive environment.
- Expertise in Hive queries; created user-defined aggregate functions, worked on advanced optimization techniques, and have extensive knowledge of joins.
- Developed Sqoop scripts to extract data from DB2 EDW source databases into HDFS.
- Used Kafka to stream data onto the Hadoop file system and move the same data into the Cassandra NoSQL database.
- Worked extensively with the Cloudera Distribution of Hadoop (CDH 4.x and 5.x).
- Gained knowledge of Confidential AWS services; created an EC2 instance and S3 storage.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Responsible for continuously monitoring and managing the Elastic MapReduce cluster through the AWS console.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Knowledgeable in running Hive queries through Spark SQL integrated with the Spark environment.
- Migrated MapReduce programs into Spark transformations using Spark and Scala (see the migration sketch after this list).
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS (see the streaming sketch after this list).
- Experience in managing and reviewing Hadoop log files.
- Tested raw data and executed performance scripts using MRUnit.
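Migration sketch (for the MapReduce-to-Spark bullet above): an illustrative word-count-style job rewritten from map/reduce into RDD transformations, with placeholder input and output paths; the actual migrated programs are not reproduced here.

```python
# Illustrative MapReduce-to-Spark migration: a map/reduce word count rewritten as
# RDD transformations. The input and output paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr_to_spark").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/raw/logs/")             # placeholder input path

counts = (lines.flatMap(lambda line: line.split())        # map: emit one record per word
               .map(lambda word: (word, 1))               # map: emit (key, 1) pairs
               .reduceByKey(lambda a, b: a + b))          # reduce: sum the counts per key

counts.saveAsTextFile("hdfs:///data/out/word_counts/")    # placeholder output path
```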
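Streaming sketch (for the Kafka/Spark Streaming bullets above): consuming a Kafka topic and landing the stream on HDFS. This version uses the Structured Streaming API with placeholder brokers, topic, and paths; the original work may have used the older DStream-based API, and it assumes the spark-sql-kafka connector package is available.

```python
# Sketch: consume a Kafka topic with Spark and land the stream on HDFS as Parquet.
# Uses Structured Streaming with placeholder brokers, topic, and paths; requires the
# spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
          .option("subscribe", "events")                       # placeholder topic
          .load()
          .select(F.col("value").cast("string").alias("payload"),
                  F.col("timestamp")))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/events/")             # HDFS landing path
         .option("checkpointLocation", "hdfs:///checkpoints/events/")
         .start())

query.awaitTermination()
```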
Environment: Hadoop, Hive, Talend, MapReduce, Pig, Sqoop, Splunk, CDH 5, Python, Cloudera Manager 5.1.1, HDFS, DB2, Oozie, PuTTY, Java.
Jr. Java Developer
Confidential
Responsibilities:
- Performed analysis, design, and development of a J2EE application using Struts and Hibernate.
- Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
- Implemented point-to-point JMS queues and MDBs to fetch diagnostic details across various interfaces.
- Worked with WebSphere business integration technologies such as WebSphere MQ and Message Broker 7.0 (middleware tools) on various operating systems.
- Configured WebSphere resources including JDBC providers, JDBC data sources, connection pooling, and JavaMail sessions; deployed session and entity EJBs in WebSphere.
- Developed a rich user interface using RIA, HTML, JSP, JSTL, JavaScript, jQuery, CSS, YUI, and AUI on the Liferay portal.
- Worked on a new portal theme for the website using Liferay and customized its look and feel.
- Experienced in all aspects of AngularJS, including routing, modularity, dependency injection, service calls, and custom directives, for developing single-page applications.
- Used Hibernate for object-relational mapping with the Confidential database.
- Developed the user interface using Struts tags; performed core Java development involving concurrency/multi-threading, Struts-Hibernate integration, and database operations.
- Integrated Struts and Hibernate ORM framework for persistence and used Hibernate DAO Support with Hibernate Template to access the data.
- Implemented core Java functionality such as collections, multi-threading, and exception handling.
- Involved in unit testing using JUnit 4.
- Performed code optimization and rewrote database queries to resolve performance-related issues in the application.
- Implemented DAO classes that use JDBC to communicate with and retrieve information from a DB2 database on a Linux/UNIX server.
- Involved in writing SQL, PL/SQL stored procedures using PL/SQL Developer.
- Used Eclipse as IDE for application development.