Data Engineer Resume
SUMMARY
- Seeking an opportunity to apply 8 years of programming, technology, and engineering expertise to software development, combined with critical thinking, problem solving, and leadership.
TECHNICAL SKILLS
Hadoop Core Services: HDFS, MapReduce, Hadoop YARN, Spark
Hadoop Data Services: Apache Hive, Sqoop, Flume, Kafka
Hadoop Distributions: Hortonworks, Cloudera
Hadoop Operational Services: Apache Zookeeper, Oozie
Cloud Computing Services: AWS (Confidential Web Services), Confidential EC2, EMR, Azure, AAS
IDE Tools: Eclipse, NetBeans, IntelliJ, PyCharm
Programming Languages: C, Java, Unix Shell scripting, Scala, Python
Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS
Databases: Confidential, DB2, NoSQL databases (HBase, Cassandra), Confidential
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, Minnesota, MN
Responsibilities:
- Worked as a Data Engineer on a Supply Planning team, analyzing large data sets to deliver actionable insights.
- Ingested data from various sources (Confidential DB, Confidential, Snowflake) into AWS S3 using Spark-Scala JDBC and Snowflake connectors (see the ingestion sketch after this list).
- Created DAGs for the ingestion jobs in Airflow using Python and scheduled them with cron expressions (see the DAG sketch after this list).
- Created both external and internal Hive tables on top of S3 data and worked with dynamic partitioning (see the table-definition sketch after this list).
- Developed ETL jobs using PySpark and a data lineage tool, transforming the data in multiple stages and performing actions such as aggregations (see the transform sketch after this list).
- Hands-on experience with HL7 v2 and FHIR.
- Used HL7 v2 for exchanging customer information and data, and used FHIR with XML payloads.
- Actively involved in creating data pipelines and designing their architecture.
- Ingested data from Confidential into S3 and HDFS and developed ETL jobs.
- Created DataFrames and RDDs in Spark by reading data from Hive tables and Parquet files in S3.
- Fine-tuned and optimized Spark jobs using different configurations, caching, and broadcast joins.
- Stored the resulting data sets in S3 and Snowflake for visualization in Tableau reports.
- Read spreadsheet (.xls, .xlsx) files using Spark Scala by integrating ZuInnoTe's HadoopOffice library, a Spark data source for Office files.
- Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
- Used a data lineage tool for mapping analysis.
- Understood client requirements and designed the best possible approach to meet each customer use case.
- Worked in a story-driven Agile development methodology and actively participated in daily scrum meetings.
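Ingestion sketch (for the JDBC/Snowflake bullet above): a minimal PySpark version of the pattern, assuming placeholder hostnames, credentials, table names, and bucket names; the production jobs used Spark-Scala JDBC and Snowflake connectors.

```python
# Minimal PySpark sketch of the JDBC-to-S3 ingestion pattern (illustrative only).
# The URL, credentials, table, and bucket below are placeholders, not project values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc_to_s3_ingest").getOrCreate()

# Read one source table over JDBC (the production jobs also used Snowflake connectors).
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")   # placeholder source
          .option("dbtable", "SUPPLY.ORDERS")                       # placeholder table
          .option("user", "etl_user")
          .option("password", "********")
          .option("fetchsize", "10000")
          .load())

# Land the raw data in S3 as Parquet, partitioned by load date.
(orders.write.mode("overwrite")
       .partitionBy("LOAD_DATE")
       .parquet("s3a://placeholder-bucket/raw/orders/"))
```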
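DAG sketch (for the Airflow bullet above): a minimal cron-scheduled ingestion DAG, assuming Airflow 2.x; the DAG id, schedule, and spark-submit paths are hypothetical.

```python
# Minimal Airflow 2.x DAG sketch: two cron-scheduled ingestion tasks chained together.
# The DAG id, schedule, and spark-submit paths are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="supply_ingestion",               # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",           # cron expression: daily at 02:00
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="jdbc_to_s3_ingest",
        bash_command="spark-submit /opt/jobs/jdbc_to_s3_ingest.py",     # placeholder path
    )
    refresh = BashOperator(
        task_id="refresh_hive_partitions",
        bash_command="spark-submit /opt/jobs/refresh_partitions.py",    # placeholder path
    )
    ingest >> refresh   # run ingestion before the partition refresh
```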
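Table-definition sketch (for the Hive-on-S3 bullet above): registering an external table over the S3 landing path with dynamic partitioning enabled; the database, table, column, and bucket names are placeholders.

```python
# Sketch of registering a Hive external table over the S3 landing path with dynamic
# partitioning enabled; database, table, column, and bucket names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive_ddl").enableHiveSupport().getOrCreate()

spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS supply")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS supply.orders (
        order_id STRING,
        item_id  STRING,
        quantity INT
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 's3a://placeholder-bucket/raw/orders/'
""")

# Register any partitions already present under the S3 location.
spark.sql("MSCK REPAIR TABLE supply.orders")
```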
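Transform sketch (for the PySpark ETL bullet above): a minimal multi-stage job that reads a Hive table and S3 Parquet, broadcasts the smaller dimension, caches the reused frame, aggregates, and writes the result back to S3; all table, bucket, and column names are placeholders.

```python
# Minimal PySpark sketch of the transform stage: read from a Hive table and S3 Parquet,
# broadcast the smaller dimension, cache the reused frame, aggregate, and write to S3.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("supply_etl")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.table("supply.orders")                               # Hive external table over S3
items = spark.read.parquet("s3a://placeholder-bucket/raw/items/")   # Parquet files in S3

# Broadcast the small dimension to avoid a shuffle join; cache the reused frame.
enriched = orders.join(F.broadcast(items), "item_id").cache()

daily_totals = (enriched.groupBy("load_date", "region")
                .agg(F.sum("quantity").alias("total_quantity"),
                     F.countDistinct("order_id").alias("order_count")))

# Persist the result set for the downstream Snowflake / Tableau layer.
(daily_totals.write.mode("overwrite")
             .partitionBy("load_date")
             .parquet("s3a://placeholder-bucket/curated/daily_totals/"))
```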
Environment: Hadoop, MapReduce, Sqoop, Advanced SQL, Python, Spark 2.3, Hortonworks 2.5.3, HDFS, Confidential 12c, HBase, Confidential, Azure Analytics, AWS, EMR
Big Data Developer
Confidential, Chicago, IL
Responsibilities:
- Developed Sqoop scripts for importing data from RDBMS to Hadoop.
- Used a custom framework (Aorta) and YAML script files that internally invoked Sqoop and Hive.
- Scheduled automated jobs using the Automic scheduler and managed data coming from different sources.
- Created logic to handle history loads and incremental data.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Implemented workflows using the Automic framework to automate tasks.
- Performed data quality checks to validate that the data met the defined standards.
- Worked on the CI/CD pipeline, integrating code changes into the Git repository and building with Jenkins.
- Read ORC files and created DataFrames for use in Spark (see the sketch after this list).
- Experienced with Spark Core and Spark SQL, using Scala as the programming language.
- Performed data transformations and analytics on large datasets using Spark.
- Used Confidential S3 for storage as a replacement for HDFS.
- Well versed in the Spark Streaming APIs.
- Good knowledge of interactive notebooks such as Jupyter and Zeppelin.
- Built a POC on Confidential EMR to assess the feasibility of moving to the cloud and being future ready.
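Sketch for the ORC bullet above: reading ORC files into a DataFrame and transforming them with Spark SQL. The project code was written in Scala; this PySpark version with placeholder paths and columns is only illustrative.

```python
# Sketch: read ORC files into a DataFrame and transform them with Spark SQL.
# Paths and column names are placeholders; the project code was written in Scala.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc_transform").getOrCreate()

sales = spark.read.orc("s3a://placeholder-bucket/staging/sales/")   # ORC files landed upstream
sales.createOrReplaceTempView("sales")

monthly = spark.sql("""
    SELECT store_id,
           date_format(sale_date, 'yyyy-MM') AS sale_month,
           SUM(amount)                       AS total_amount
    FROM sales
    GROUP BY store_id, date_format(sale_date, 'yyyy-MM')
""")

monthly.write.mode("overwrite").parquet("s3a://placeholder-bucket/curated/monthly_sales/")
```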
Environment: Hortonworks, Hadoop, Spark, HDFS, YARN, Oozie, Hive, Linux, Java, Automic, SQL, Aorta, YAML, Confidential, Informix, Scala, AWS, S3, Confidential EMR.
Hadoop Developer
Confidential, Richmond, VA
Responsibilities:
- Ingested data from various sources (Confidential DB, FTP, Couchbase) into HDFS.
- Developed shell and Python scripts to run Sqoop commands, handle exceptions, and store logs while ingesting transactional and electronic invoice data from Confidential into HDFS (see the wrapper sketch after this list).
- Created Hive external tables for incremental imports into Hive using the ingest, reconcile, compact, and purge strategy.
- Migrated HiveQL queries to Impala to minimize query response time.
- Created both external and internal tables in Hive and worked with partitioning and bucketing.
- Fine-tuned and productionized Confidential SQL queries that had been running for a long time in the queue.
- Worked extensively on building NiFi data pipelines during the development phase to process large data sets, and configured lookups for data validation and integrity.
- Performed data analysis on the stored data using Spark Scala.
- Created case classes and objects, built DataFrames and RDDs, and applied transformations and actions.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Worked on AWS to create and manage EC2 instances and Hadoop clusters; connected to target databases to retrieve data.
- Built a POC for loading data from the Linux file system into AWS S3 and HDFS.
- Used Python to connect to the SFTP server and pull all OMS (Order Management System) and post-tax files into HDFS (see the SFTP sketch after this list).
- Developed a Spark application in Scala that parses data in HDFS and ingests only the filtered records into HBase and Solr.
- Created DataFrames using Spark SQL, performed joins over a large number of tables, and ingested the resulting DataFrame into HBase and Solr.
- Created an HBase table with column families and column qualifiers and inserted data using the REST API.
- Stored metadata in Solr, from which the UI reads the data and presents it to the end user.
- Understood client requirements and designed the best possible approach to meet each customer use case.
- Worked in a story-driven Agile development methodology and actively participated in daily scrum meetings.
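Wrapper sketch (for the Sqoop scripting bullet above): a minimal Python wrapper that runs a Sqoop import, captures its logs, and handles failures; the connection string, password file, table, and HDFS paths are placeholders.

```python
# Minimal Python wrapper around a Sqoop import with logging and exception handling.
# The JDBC URL, credentials file, table, and HDFS directory are placeholders.
import logging
import subprocess
import sys

logging.basicConfig(filename="sqoop_ingest.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",   # placeholder source database
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",              # placeholder HDFS password file
    "--table", "INVOICES",                                  # placeholder table
    "--target-dir", "/data/raw/invoices",                   # placeholder HDFS landing directory
    "--num-mappers", "4",
]

def main():
    logging.info("Starting Sqoop import: %s", " ".join(SQOOP_CMD))
    try:
        result = subprocess.run(SQOOP_CMD, capture_output=True, text=True, check=True)
        logging.info("Sqoop import finished:\n%s", result.stderr)   # Sqoop writes its log to stderr
    except subprocess.CalledProcessError as exc:
        logging.error("Sqoop import failed (exit %s):\n%s", exc.returncode, exc.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```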
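SFTP sketch (for the OMS/post-tax bullet above): pulling files from an SFTP server with paramiko and pushing them into HDFS; the host, credentials, and directory names are placeholders.

```python
# Sketch: pull files from an SFTP server with paramiko and push them into HDFS.
# The host, credentials, and directory names are placeholders.
import os
import subprocess

import paramiko

SFTP_HOST = "sftp.example.com"      # placeholder host
REMOTE_DIR = "/outbound/oms"        # placeholder remote directory
LOCAL_DIR = "/tmp/oms"              # staging directory on the edge node
HDFS_DIR = "/data/raw/oms"          # placeholder HDFS target directory

os.makedirs(LOCAL_DIR, exist_ok=True)

transport = paramiko.Transport((SFTP_HOST, 22))
transport.connect(username="etl_user", password="********")
sftp = paramiko.SFTPClient.from_transport(transport)

try:
    for name in sftp.listdir(REMOTE_DIR):
        local_path = os.path.join(LOCAL_DIR, name)
        sftp.get(f"{REMOTE_DIR}/{name}", local_path)   # download to the edge node
        subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, HDFS_DIR], check=True)
finally:
    sftp.close()
    transport.close()
```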
Environment: Hadoop, MapReduce, Sqoop, NiFi, Splunk, Python, Spark 2.0, Hortonworks 2.5.3, HDFS, Confidential 12c, HBase, Solr
Hadoop/Data Engineer
Confidential, Tennessee
Responsibilities:
- Worked in a story-driven Agile development methodology and actively participated in daily scrum meetings.
- Ingested data into the Indie data lake using the open-source Hadoop distribution, processing structured, semi-structured, and unstructured datasets with open-source Apache tools such as Flume and Sqoop into the Hive environment.
- Expertise in Hive queries; created user-defined aggregate functions, worked on advanced optimization techniques, and have extensive knowledge of joins.
- Developed Sqoop scripts to extract data from DB2 EDW source databases into HDFS.
- Used Kafka to stream data onto the Hadoop file system and move the same data into the Cassandra NoSQL database.
- Worked extensively with the Cloudera Distribution of Hadoop (CDH 4.x and 5.x).
- Gained knowledge of Confidential AWS services; created an EC2 instance and S3 storage.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Responsible for continuously monitoring and managing the Elastic MapReduce cluster through the AWS console.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Knowledgeable in running Hive queries through Spark SQL integrated with the Spark environment.
- Migrated MapReduce programs into Spark transformations using Spark and Scala (see the migration sketch after this list).
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS (see the streaming sketch after this list).
- Experience in managing and reviewing Hadoop log files.
- Tested raw data and executed performance scripts using MRUnit.
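Migration sketch (for the MapReduce-to-Spark bullet above): an illustrative word-count-style job rewritten from map/reduce into RDD transformations, with placeholder input and output paths; the actual migrated programs are not reproduced here.

```python
# Illustrative MapReduce-to-Spark migration: a map/reduce word count rewritten as
# RDD transformations. The input and output paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr_to_spark").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/raw/logs/")             # placeholder input path

counts = (lines.flatMap(lambda line: line.split())        # map: emit one record per word
               .map(lambda word: (word, 1))               # map: emit (key, 1) pairs
               .reduceByKey(lambda a, b: a + b))          # reduce: sum the counts per key

counts.saveAsTextFile("hdfs:///data/out/word_counts/")    # placeholder output path
```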
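Streaming sketch (for the Kafka/Spark Streaming bullets above): consuming a Kafka topic and landing the stream on HDFS. This version uses the Structured Streaming API with placeholder brokers, topic, and paths; the original work may have used the older DStream-based API, and it assumes the spark-sql-kafka connector package is available.

```python
# Sketch: consume a Kafka topic with Spark and land the stream on HDFS as Parquet.
# Uses Structured Streaming with placeholder brokers, topic, and paths; requires the
# spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
          .option("subscribe", "events")                       # placeholder topic
          .load()
          .select(F.col("value").cast("string").alias("payload"),
                  F.col("timestamp")))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/events/")             # HDFS landing path
         .option("checkpointLocation", "hdfs:///checkpoints/events/")
         .start())

query.awaitTermination()
```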
Environment: Hadoop, Hive, Talend, MapReduce, Pig, Sqoop, Splunk, CDH 5, Python, Cloudera Manager 5.1.1, HDFS, DB2, Oozie, PuTTY, Java.
Jr. Java Developer
Confidential
Responsibilities:
- Performed analysis, design, and development of a J2EE application using Struts and Hibernate.
- Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
- Implemented point-to-point JMS queues and MDBs to fetch diagnostic details across various interfaces.
- Worked with WebSphere business integration technologies such as WebSphere MQ and Message Broker 7.0 (middleware tools) on various operating systems.
- Configured WebSphere resources including JDBC providers, JDBC data sources, connection pooling, and JavaMail sessions; deployed session and entity EJBs in WebSphere.
- Developed a rich user interface using RIA, HTML, JSP, JSTL, JavaScript, jQuery, CSS, YUI, and AUI on the Liferay portal.
- Worked on a new portal theme for the website using Liferay and customized its look and feel.
- Experienced in all aspects of AngularJS, including routing, modularity, dependency injection, service calls, and custom directives, for developing single-page applications.
- Used Hibernate for object-relational mapping with the Confidential database.
- Developed the user interface using Struts tags; performed core Java development involving concurrency/multi-threading, Struts-Hibernate integration, and database operations.
- Integrated Struts and Hibernate ORM framework for persistence and used Hibernate DAO Support with Hibernate Template to access the data.
- Implemented core Java functionality such as collections, multi-threading, and exception handling.
- Involved in unit testing using JUnit 4.
- Performed code optimization and rewrote database queries to resolve performance-related issues in the application.
- Implemented DAO classes that use JDBC to communicate with and retrieve information from a DB2 database on a Linux/UNIX server.
- Involved in writing SQL, PL/SQL stored procedures using PL/SQL Developer.
- Used Eclipse as IDE for application development.