
Data Engineer Resume


SUMMARY

  • Seeking an opportunity to bring 8 years of programming, technology, and engineering expertise to software development while applying critical thinking, problem solving, and leadership.

TECHNICAL SKILLS

Hadoop Core Services: HDFS, MapReduce, Hadoop YARN, Spark

Hadoop Data Services: Apache Hive, Sqoop, Flume, Kafka

Hadoop Distributions: Hortonworks, Cloudera

Hadoop Operational Services: Apache Zookeeper, Oozie

Cloud Computing Services: AWS (Confidential Web Services), Confidential EC2, EMR, Azure, AAS

IDE Tools: Eclipse, NetBeans, IntelliJ, PyCharm

Programming Languages: C, Java, Unix Shell scripting, Scala, Python

Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS

Databases: Confidential, DB2, NoSQL databases (HBase, Cassandra), Confidential

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, MINNESOTA, MN

Responsibilities:

  • Worked as a Data Engineer on a Supply Planning team, analyzing large data sets to provide the best possible insights.
  • Ingested data from various sources (Confidential DB, Confidential, Snowflake) into AWS S3 using Spark-Scala JDBC and Snowflake connectors (a minimal sketch follows this list).
  • Created DAGs for the ingestion jobs in Airflow using Python and scheduled them with cron expressions.
  • Created both external and internal tables in Hive on top of S3 data; worked on dynamic partitioning.
  • Developed ETL jobs using PySpark and a data lineage tool, transforming data in multiple stages and performing actions such as aggregations.
  • Hands-on experience with HL7 v2 and FHIR; used HL7 v2 for exchanging customer information and FHIR with XML payloads.
  • Actively involved in designing the architecture for and building data pipelines.
  • Ingested data from Confidential into S3 and HDFS and developed ETLs.
  • Created DataFrames and RDDs in Spark by reading data from Hive tables and Parquet files in S3.
  • Fine-tuned and optimized Spark jobs using configuration changes, caching, and broadcast joins.
  • Stored the resulting data sets in S3 and Snowflake for visualization in Tableau reports.
  • Read spreadsheet (.xls, .xlsx) files in Spark Scala by integrating ZuInnoTe's HadoopOffice library, a Spark data source for Office files.
  • Experience implementing OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Used a data lineage tool for mapping analysis.
  • Understood the client requirements and designed the best possible approach to meet the customer use case.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
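
The ingestion bullet above refers to the kind of job sketched below. This is a minimal, illustrative Spark-Scala sketch, not the production code: the JDBC URL, credentials, table, partition column, and S3 bucket are all placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a JDBC-to-S3 ingestion job (all connection details are placeholders).
object IngestToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-to-s3-ingest")
      .getOrCreate()

    // Read one source table over JDBC; the source system is anonymized, so the
    // driver URL, credentials, and table name here are assumptions.
    val sourceDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1") // placeholder
      .option("dbtable", "SUPPLY_PLANNING.ORDERS")               // placeholder
      .option("user", "etl_user")                                // placeholder
      .option("password", "etl_password")                        // placeholder
      .load()

    // Land the data in S3 as Parquet, partitioned so Hive external tables can
    // use dynamic partitioning downstream.
    sourceDf.write
      .mode("overwrite")
      .partitionBy("order_date")                                 // placeholder column
      .parquet("s3a://supply-planning-raw/orders/")              // placeholder bucket

    spark.stop()
  }
}
```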

Environment: Hadoop, MapReduce, Sqoop, advanced SQL, Python, Spark 2.3, Hortonworks 2.5.3, HDFS, Confidential 12c, HBase, Confidential, Azure Analytics, AWS, EMR

BIG DATA DEVELOPER

Confidential, CHICAGO - IL

Responsibilities:

  • Developed Sqoop scripts for importing data from RDBMS to Hadoop.
  • Used a custom framework (Aorta) and YAML script files that internally invoke Sqoop and Hive.
  • Scheduled automated jobs using the Automic scheduler and managed data coming from different sources.
  • Created logic to handle history loads and incremental data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Implemented the workflows using Automic framework to automate tasks.
  • Performed data quality checks to validate that the data met the defined standards.
  • Worked on the CI/CD pipeline, integrating code changes into the Git repository and building with Jenkins.
  • Read ORC files and created DataFrames for use in Spark (a minimal sketch follows this list).
  • Experienced working with Spark Core and Spark SQL using Scala as the programming language.
  • Performed data transformations and analytics on large datasets using Spark.
  • Used Confidential S3 for storage as a replacement for HDFS.
  • Well versed in the Spark Streaming APIs.
  • Good knowledge of interactive notebooks such as Jupyter and Zeppelin.
  • Built a POC on Confidential EMR to check the feasibility of moving to the cloud and to be future ready.
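
The ORC-to-DataFrame bullet above corresponds to the kind of job sketched below: a minimal Spark SQL example in Scala, where the HDFS path, view name, columns, and output bucket are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: read ORC files into a DataFrame, query them with Spark SQL, write results to S3.
object OrcToSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orc-spark-sql").getOrCreate()

    // ORC files landed by the ingestion jobs (path is a placeholder)
    val sales = spark.read.orc("hdfs:///data/raw/sales/")

    // Expose the DataFrame to Spark SQL and run an aggregation (columns are placeholders)
    sales.createOrReplaceTempView("sales")
    val dailyTotals = spark.sql(
      """SELECT store_id, sale_date, SUM(amount) AS total_amount
        |FROM sales
        |GROUP BY store_id, sale_date""".stripMargin)

    // Persist the result to S3, used here in place of HDFS (bucket is a placeholder)
    dailyTotals.write.mode("overwrite").parquet("s3a://analytics-out/daily-totals/")
  }
}
```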

Environment: Hortonworks, Hadoop, Spark, HDFS, YARN, Oozie, Hive, Linux, Java, Automic, SQL, Aorta, YAML, Confidential, Informix, Scala, AWS, S3, Confidential EMR.

Hadoop Developer

Confidential, RICHMOND, VA

Responsibilities:

  • Ingested data from various sources (Confidential DB, FTP, Couchbase) into HDFS.
  • Developed shell and Python scripts for running Sqoop commands, handling exceptions, and storing logs while ingesting transactional and electronic invoice data from Confidential to HDFS.
  • Created Hive external tables for incremental imports into Hive using the Ingest, Reconcile, Compact and Purge strategy.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Created both external and internal tables in Hive; worked on partitioning and bucketing.
  • Fine-tuned and productionized long-running Confidential SQL queries.
  • Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
  • Worked extensively on building NiFi data pipelines during the development phase.
  • Performed data analysis on the stored data using Spark Scala.
  • Created case classes and objects, built DataFrames and RDDs, and applied transformations and actions (a minimal sketch follows this list).
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Worked on AWS to create and manage EC2 instances and Hadoop clusters, and connected to the target database to retrieve data.
  • Built a POC for loading data from the Linux file system into AWS S3 and HDFS.
  • Used Python to connect to the SFTP server and pull all the OMS (Order Management System) and post-tax files into HDFS.
  • Developed a Spark application in Scala that parses data in HDFS and ingests only the filtered data into HBase and Solr.
  • Created DataFrames using Spark SQL, joined a large number of tables, and ingested the resulting DataFrame into HBase and Solr.
  • Created a table in HBase with column families and column qualifiers and inserted the data using the REST API.
  • Stored metadata in Solr, from which the UI reads the data and presents it to the end user.
  • Understood the client requirements and designed the best possible approach to meet the customer use case.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
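
A minimal sketch of the case-class / DataFrame pattern referenced above: parse records from HDFS, apply transformations and actions, and keep only the filtered subset that would then be handed to the HBase/Solr ingestion step. Field names, the delimiter, and paths are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Case class describing one record; the fields are illustrative assumptions.
case class Invoice(invoiceId: String, customerId: String, amount: Double, status: String)

object InvoiceFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("invoice-filter").getOrCreate()
    import spark.implicits._

    // Parse pipe-delimited records from HDFS into an RDD of case classes (path and delimiter are placeholders)
    val invoiceRdd = spark.sparkContext
      .textFile("hdfs:///data/invoices/")
      .map(_.split('|'))
      .filter(_.length == 4)
      .map(f => Invoice(f(0), f(1), f(2).toDouble, f(3)))

    // Convert to a DataFrame, apply a transformation, then an action
    val invoices = invoiceRdd.toDF()
    val approved = invoices.filter($"status" === "APPROVED")
    println(s"Approved invoices: ${approved.count()}")

    // Only this filtered subset would be ingested into HBase and Solr
    approved.write.mode("overwrite").parquet("hdfs:///data/invoices_filtered/")
  }
}
```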

Environment: Hadoop, MapReduce, Sqoop, NiFi, Splunk, Python, Spark 2.0, Hortonworks 2.5.3, HDFS, Confidential 12c, HBase, Solr

Hadoop/Data Engineer

Confidential, TENNESSEE

Responsibilities:

  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Ingested data into the Indie-Data Lake using an open-source Hadoop distribution, processing structured, semi-structured, and unstructured datasets with open-source Apache tools such as Flume and Sqoop into the Hive environment.
  • Expertise in Hive queries; created user-defined aggregate functions, worked on advanced optimization techniques, and have extensive knowledge of joins.
  • Developed Sqoop scripts to extract data from DB2 EDW source databases onto HDFS.
  • Used Kafka to load data onto the Hadoop file system and move the same data into the Cassandra NoSQL database.
  • Worked extensively with the Cloudera Distribution of Hadoop (CDH 4.x and 5.x).
  • Gained knowledge of Confidential AWS services; created EC2 instances and S3 storage.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Knowledge of handling Hive queries through Spark SQL integrated with the Spark environment.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS (a minimal sketch follows this list).
  • Experience in managing and reviewing Hadoop log files.
  • Tested raw data and executed performance scripts using MRUnit.
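
The Kafka-to-HDFS bullet above corresponds to the kind of streaming job sketched below. It is written against the Spark Structured Streaming Kafka source as an assumption (the original work may have used the DStream API); brokers, topic, and paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: consume a Kafka topic with Spark and continuously persist it to HDFS as Parquet.
// Assumes the spark-sql-kafka connector is on the classpath.
object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

    // Subscribe to a Kafka topic as a streaming DataFrame (brokers and topic are placeholders)
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .load()

    // Kafka keys and values arrive as bytes; cast them to strings before persisting
    val events = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Continuously append the stream to HDFS (paths are placeholders)
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events/")
      .option("checkpointLocation", "hdfs:///checkpoints/events/")
      .start()

    query.awaitTermination()
  }
}
```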

Environment: Hadoop, Hive, Talend, MapReduce, Pig, Sqoop, Splunk, CDH5, Python, Cloudera Manager CM 5.1.1, HDFS, DB2, Oozie, PuTTY, Java.

JR JAVA DEVELOPER

Confidential

Responsibilities:

  • Analysis, design, and development of a J2EE application using Struts and Hibernate.
  • Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
  • Implemented point-to-point JMS queues and MDBs to fetch diagnostic details across various interfaces.
  • Worked with WebSphere business integration technologies such as WebSphere MQ and Message Broker 7.0 (middleware tools) on various operating systems.
  • Configured WebSphere resources including JDBC providers, JDBC data sources, connection pooling, and JavaMail sessions. Deployed Session and Entity EJBs in WebSphere.
  • Developed a rich user interface using RIA, HTML, JSP, JSTL, JavaScript, jQuery, CSS, YUI, and AUI on the Liferay portal.
  • Worked on a new portal theme for the website using Liferay and customized its look and feel.
  • Experience with all aspects of AngularJS, such as routing, modularity, dependency injection, service calls, and custom directives, for development of single-page applications.
  • Used Hibernate for object-relational mapping with the Confidential database.
  • Developed the user interface using Struts tags and worked on core Java development involving concurrency/multithreading, Struts-Hibernate integration, and database operations.
  • Integrated the Struts and Hibernate ORM frameworks for persistence and used Hibernate DAO support with Hibernate Template to access the data.
  • Implemented core Java functionality such as collections, multithreading, and exception handling.
  • Involved in unit testing using JUnit 4.
  • Performed code optimization and rewrote database queries to resolve performance issues in the application.
  • Implemented DAO classes that use JDBC to communicate with and retrieve information from a DB2 database running on a Linux/UNIX server (a minimal sketch follows this list).
  • Wrote SQL and PL/SQL stored procedures using PL/SQL Developer.
  • Used Eclipse as the IDE for application development.
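
A minimal sketch of the DAO-over-JDBC pattern referenced above. The original application was Java with Struts and Hibernate, so this illustrative analogue in Scala (the language used for the other sketches in this document) is not the production code; the DB2 table, columns, and entity are placeholders.

```scala
import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

// Hypothetical entity; the real application used Hibernate-mapped entities.
case class Diagnostic(id: Long, code: String, message: String)

// DAO that talks to DB2 over plain JDBC (connection details are placeholders).
class DiagnosticDao(url: String, user: String, password: String) {
  def findByCode(code: String): List[Diagnostic] = {
    val conn = DriverManager.getConnection(url, user, password)
    try {
      val stmt = conn.prepareStatement(
        "SELECT ID, CODE, MESSAGE FROM APP.DIAGNOSTICS WHERE CODE = ?") // placeholder table/columns
      stmt.setString(1, code)
      val rs = stmt.executeQuery()
      val results = ListBuffer.empty[Diagnostic]
      while (rs.next()) {
        results += Diagnostic(rs.getLong("ID"), rs.getString("CODE"), rs.getString("MESSAGE"))
      }
      results.toList
    } finally conn.close()
  }
}
```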
