
Sr. Data Engineer Resume

Northville, MI

PROFESSIONAL SUMMARY:

  • 8+ years of IT experience in software development, including Java/J2EE, the Big Data ecosystem (with an emphasis on Spark), Pentaho Data Integration, and Tableau visualizations.
  • Certified as a Tableau Desktop 10 Qualified Associate from Tableau Software.
  • In-Depth Understanding of Spark Architecture including Spark Core, Spark SQL, Spark Streaming and Spark MLlib.
  • Proficient in Spark using Scala for loading data from local file systems, HDFS, Amazon S3, and relational and NoSQL databases using Spark SQL, importing data into RDDs and DataFrames, and ingesting data from a range of sources using Spark Streaming (see the sketch following this list).
  • Sound knowledge of programming Spark 2.0 using Scala and Java.
  • Developed Spark Jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Proficiency in performing ETL operations from Hive to Spark.
  • Worked with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Kafka, Hue, and ZooKeeper.
  • Solid understanding of Hadoop architecture and its daemons: NameNode, DataNode, ResourceManager, NodeManager, TaskTracker, and JobTracker.
  • Experience in writing Hive Queries for processing and analyzing large volumes of data.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Experienced in writing Java Map Reduce Programs.
  • Managed cluster with YARN, Zookeeper, and Hue.
  • Handled streaming data in real time with Kafka, Flume and Spark Streaming.
  • Well versed in creating ETL transformations and jobs using Pentaho Kettle (Spoon) and Pentaho Data Integration Designer, and in scheduling them on the Pentaho BI Server.
  • Experience in designing, modelling, performance tuning, and analysis of ETL processes using Pentaho Data Integration (PDI) for data extraction, transformation, and loading; designed end-to-end ETL processes to support reporting requirements, including aggregates, summary tables, and materialized views for reporting.
  • Extensive experience in installing and configuring Pentaho BI Server 8.0 for ETL and reporting purposes.
  • Experienced in using Metadata Injection to perform ETL operations dynamically and generate JSON files based on different client needs.
  • Experienced in using JavaScript for performing ETL operations and generating JSON files based on client requirements.
  • More than 4 years of experience in JAVA, J2EE, SOAP, HTML and XML related technologies demonstrating strong analytical and problem-solving skills, and the ability to follow through with projects from inception to completion.
  • Extensive experience with design and development of Tableau Visualization Solutions.
  • Created Multiple Interactive Dashboards and views using Tableau Desktop 10.
  • Experienced in using Tableau Online and Tableau Server.
  • Worked with different data sources in Tableau, including Excel files, text files, and JSON files.
  • Worked with different types of server connections like MSSQL Server, MySQL Server, Hadoop and Hive with Tableau.
  • Extensive data virtualization experience creating base views and derived views from multiple heterogeneous data sources using Denodo 6.0.
  • Excellent communication and interpersonal skills, with the ability to work independently as well as part of a team.
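As a brief illustration of the Spark data-loading bullet above: a minimal sketch, assuming Spark 2.x, in which the HDFS path, S3 bucket, and claims/members column names are hypothetical placeholders rather than details from any of the projects below.

```scala
// Minimal sketch, assuming Spark 2.x; paths, bucket, and column names are hypothetical.
import org.apache.spark.sql.SparkSession

object LoadAndQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-and-query-sketch")
      .getOrCreate()

    // Load a headered CSV file from HDFS into a DataFrame.
    val claims = spark.read
      .option("header", "true")
      .csv("hdfs:///data/claims/claims.csv")

    // Load JSON records from an Amazon S3 bucket.
    val members = spark.read.json("s3a://example-bucket/members/")

    // Register temporary views and query them with Spark SQL.
    claims.createOrReplaceTempView("claims")
    members.createOrReplaceTempView("members")

    spark.sql(
      """SELECT m.member_id, COUNT(*) AS claim_count
        |FROM claims c JOIN members m ON c.member_id = m.member_id
        |GROUP BY m.member_id""".stripMargin
    ).show()

    spark.stop()
  }
}
```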

TECHNICAL PROFICIENCY AND SKILLS:

Data Visualization Tools: Tableau

Data Integration Tools: Denodo, PDI

Programming languages: Java, Scala, Python, R, HTML and SQL.

J2EE Technologies: Java Beans, Servlets, JSP, JDBC

ETL Tools: Spark, Pentaho

Web Services: Spring Web Services, REST, SOAP

IDEs: NetBeans, Eclipse, Android Studio, IntelliJ

Operating systems: Windows, Linux

Big data Platforms: Hortonworks, Cloudera

Big Data Ecosystem: MapReduce, Pig, Hive, HBase, Sqoop, YARN, Oozie, Apache Spark, Kafka

PROFESSIONAL EXPERIENCE:

Confidential, Northville, MI

Sr. Data Engineer

Responsibilities:

  • Architected, designed and developed a fully automated payment integrity solution using Spark and Hadoop.
  • Involved in architecting the data model to store claims data in proper format for efficient analysis.
  • Designed a solution to perform ETL tasks such as data acquisition, data transformation, data cleaning, and efficient data storage on HDFS.
  • Responsible for cleaning data from EDI file formats (e.g., 837, 834, 277, 276), flat files, and client databases, and loading it into a local database.
  • Built a rules engine using the Drools rule engine and Spark RDDs for quick analysis of the data.
  • Designed and developed business rules and rule flows in a BRMS using Drools 7.1.0.
  • Analyzed healthcare data from multiple sources, including relational data, flat-file data, Amazon S3, and 837/834 file data.
  • Tested the product in various environments, including Amazon EC2, Azure HDInsight, and a local multi-node Hadoop cluster.
  • Gained hands-on experience with Amazon cloud services such as EC2 and S3, as well as Azure HDInsight.
  • Developed Spark scripts by using Scala shell commands as per the requirements for cleaning and testing.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created managed tables and external tables in Hive and loaded data from HDFS (see the sketch after this list).
  • Used Hue to create external Hive Tables on the data imported and monitored the system for performance metrics and capacity usage.
  • Developed Oozie workflow to automate the loading of the data to HDFS for data pre-processing.
  • Created the test environment for the staging area, loaded the staging area with data from procedures, and cross-verified the field sizes defined within the application against metadata from multiple sources.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Created Multiple Interactive Dashboards and views using Tableau Desktop 10.
  • Using publicly available data, created a Tableau dashboard that predicts the amount that can be recovered from claims.
  • Implemented SQL, PL/SQL stored procedures.
  • Actively involved in code review and bug fixing for improving the performance.
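As referenced in the Hive bullet above, a minimal sketch of creating and querying an external Hive table through Spark SQL, assuming Spark 2.x with Hive support on YARN; the table name, columns, and HDFS location are hypothetical.

```scala
// Minimal sketch: external Hive table over cleaned data on HDFS, queried via Spark SQL.
// Assumes Spark 2.x with Hive support; table, columns, and location are hypothetical.
import org.apache.spark.sql.SparkSession

object HiveExternalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-external-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Define an external table over claim records already present on HDFS.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS claims_clean (
        |  claim_id STRING,
        |  member_id STRING,
        |  billed_amount DOUBLE
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION 'hdfs:///data/claims_clean'""".stripMargin)

    // Simple analytic query over the external table.
    spark.sql(
      "SELECT member_id, SUM(billed_amount) AS total_billed FROM claims_clean GROUP BY member_id"
    ).show()

    spark.stop()
  }
}
```

Because the table is external, dropping it in Hive leaves the underlying HDFS files in place.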

Environment: Spark, Drools, MapR Hadoop framework, MapReduce, HDFS, Sqoop, Hive, MS SQL Server, MySQL, Amazon EMR, S3, Azure HDInsight, Azure SQL Server, Tableau, Java.

Confidential, Piscataway, NJ

Data Integration for Quest Analytics

Responsibilities:

  • Responsible for developing and rebuilding the current architecture using Pentaho Data Integration.
  • Created the test environment for the staging area, loaded the staging area with data from procedures, and cross-verified the field sizes defined within the application against metadata from multiple sources.
  • Used the Metadata Injection procedure to automate the ETL process, and used Pentaho to extract data from MongoDB and AWS S3 buckets.
  • Used Pentaho data transformation steps to transform the data into the client's DSF (Data Standard Format).
  • Used API calls and tunneling techniques to load the transformed data into client’s inbuilt API system.
  • Created various mapping documents for source and target mapping.
  • Used various input types in PDI for parallel access.
  • Created connections to various databases, such as AWS Redshift and MySQL, in PDI.
  • Identified performance issues in existing sources, targets and mappings by analyzing the data flow, evaluating transformations and tuned accordingly for better performance.
  • Used various types of inputs and outputs in Pentaho Kettle including Database Tables, MS Access, Text Files, Excel files and CSV files.
  • Designed and implemented a Business Intelligence platform from scratch and integrated it with upstream systems using MongoDB and other Big Data components for various functionalities; made the platform more resilient and reduced the configuration needed so clients can be onboarded with minimal setup.
  • Performed the required ETL operations and converted the input files to the standard DSF output in JSON format (see the sketch after this list).
  • Automated data transfer processes and mail notifications using the FTP task and send-mail task in transformations.
  • Designed documents for ETL pipeline mappings, sessions, and workflows.
  • Supported QA and various cross functional users in all their data needs.
  • Scheduled jobs using Clarinet tool to run all Pentaho jobs and transformations nightly, weekly and monthly.
  • Identify, document and communicate BI and ETL best practices and industry accepted development methodologies and techniques.
  • Troubleshoot BI tool problems and provide technical support as needed. Perform other tasks as assigned.
  • Worked very closely with Project Manager to understand the requirement of reporting solutions to be built.
  • Good experience in designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
  • Good experience in using various PDI steps to cleanse and load data per business needs.
  • Good experience in configuring the Data Integration server to run jobs in local, remote-server, and cluster modes.
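The DSF conversion referenced above was implemented with Pentaho Data Integration steps and Metadata Injection; purely as an illustration of the underlying row-to-JSON idea, the sketch below uses Spark/Scala instead of PDI, with hypothetical paths and field names.

```scala
// Illustrative sketch only: the actual conversion used PDI steps and Metadata Injection.
// This Spark/Scala equivalent shows the same idea of reading a delimited input file and
// writing it out as JSON records in a (hypothetical) DSF-style layout.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DsfConversionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dsf-conversion-sketch").getOrCreate()

    // Read the raw delimited extract (path and columns are hypothetical).
    val input = spark.read.option("header", "true").csv("/data/provider_extract.csv")

    // Rename source columns to the target DSF field names (names are hypothetical).
    val dsf = input
      .withColumnRenamed("prov_id", "providerId")
      .withColumnRenamed("npi_num", "npi")
      .select(col("providerId"), col("npi"))

    // Each output record becomes one JSON document.
    dsf.write.mode("overwrite").json("/data/out/dsf_json")

    spark.stop()
  }
}
```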

Environment: Pentaho Data Integration, MongoDB, AWS, Metadata Injection, JavaScript and SQL Programming

Confidential, Columbus, OH

Hadoop Developer

Responsibilities:

  • Architected, designed and developed the data lake for this solution considering all the scenarios.
  • Designed and documented all the possible scenarios in OR scheduling.
  • Created the test environment for the staging area, loaded the staging area with data from procedures, and cross-verified the field sizes defined within the application against metadata from multiple sources.
  • Responsible for building the architecture and implementing the predictive analytics into the solution.
  • Responsible for building the workflow for extracting the data from different Electronic Medical Record (EMR) systems such as EPIC, Cerner etc.
  • Worked with EPIC and Pentaho Data integration to build the pipeline which has been used for extracting the data.
  • Used Pentaho Data Integration tool to transform the historical unstructured data into structured data.
  • Used Tableau to produce dashboards that compare results before and after using this solution in the staging environment.

Environment: Pentaho Data Integration, MSSQL SERVER, Python and SQL Programming, Tableau.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Developed mappings to extract data from different data sources such as MySQL, flat files, XML files, and Excel into HDFS.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in Hive.
  • Imported unstructured data into the HDFS data lake using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Automated Sqoop jobs to extract data from data sources such as MySQL and push the result sets to HDFS.
  • Implemented static and dynamic partitioning and bucketing in Hive.
  • Implemented MR programs on our private Multi-Node Hadoop cluster.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Implemented Kafka producers with custom partitioning, configured brokers, and implemented high-level consumers for the data platform (see the sketch after this list).
  • Collected and aggregated large amounts of log data using Apache Flume and tagged the data in HDFS for further analysis.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Worked on tuning the performance of Pig Scripts.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and pig jobs.
  • Developed Pig Latin scripts to extract data from Web Server output files to HDFS.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Implemented SQL, PL/SQL stored procedures.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Actively involved in code review and bug fixing for improving the performance.
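As referenced in the Kafka bullet above, a minimal producer sketch in Scala using the Kafka Java client; the broker addresses, topic, and custom partitioner class are hypothetical placeholders.

```scala
// Minimal Kafka producer sketch; brokers, topic, and the partitioner class are hypothetical.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object LogEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    // Plug in a custom partitioner (hypothetical class) so related events share a partition.
    props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.RegionPartitioner")

    val producer = new KafkaProducer[String, String](props)
    try {
      // The key drives partition selection via the custom partitioner; the value is a raw log line.
      producer.send(new ProducerRecord[String, String]("weblogs", "us-east", "GET /index 200"))
    } finally {
      producer.close()
    }
  }
}
```

Custom partitioning of this kind keeps related events on the same partition, which is what makes per-key ordering predictable for the downstream consumers.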

Environment: Hadoop, HDFS, Hive, Flume, Kafka, SQOOP, Pig, Map Reduce, Java, MySQL, MSSQL, Flat files, CSV, Avro data files.

Confidential, NY

Hadoop Spark Developer

Responsibilities:

  • Involved in the complete SDLC of the project from requirement analysis to testing.
  • Developed the modules based on the Struts MVC framework.
  • Used JSP, JavaScript, jQuery, AJAX, HTML, and CSS for designing and developing the data and presentation layers.
  • Developed the GUI using JavaScript, jQuery, HTML, and CSS for interactive cross-browser functionality and a complex user interface.
  • Coded the business logic using Servlets and Session Beans and deployed them using the Tomcat web server.
  • Used MVC Struts framework for the application Design.
  • Created SQL queries and PL/SQL stored procedures and functions for the back end.
  • Involved in writing the stored procedures for database cross validations.
  • Performed unit testing, system testing and integration testing.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in preparing Code Review, Deployment and Documentation.

Environment: Java, Servlets, JSP, HTML, XML, CSS, AJAX, JavaScript, jQuery, Tomcat, JUnit, PL/SQL and Eclipse.

Confidential, TX

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Involved in developing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript, and Highcharts.
  • Wrote Spark programs to load, parse, refine, and store sensor data in Hadoop, and to process, analyze, and aggregate data for visualizations.
  • Developed data pipeline programs with the Spark Scala API, performed data aggregations with Hive, and formatted data as JSON for visualization, generating Highcharts outputs (outlier, data distribution, correlation/comparison, and two-dimensional charts) with JavaScript (see the sketch after this list).
  • Created various views for HBase tables, leveraging the performance of Hive on top of HBase.
  • Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Developed MapReduce programs for parsing data and loading it into HDFS.
  • Migrated an existing on-premises application to AWS.
  • Used AWS services like EC2 and S3 for small data sets.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications.
  • Built reusable Hive UDF libraries for business requirements which enabled users to use these UDF's in Hive Querying.
  • Wrote Hive UDFs to sort struct fields and return complex data types.
  • Responsible for loading data from UNIX file system to HDFS.
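As referenced above, a minimal sketch of one Spark Scala pipeline step (load and parse sensor data, aggregate, and emit JSON for charting), assuming Spark 2.x; the sensor schema, paths, and aggregation are hypothetical stand-ins.

```scala
// Minimal sketch, assuming Spark 2.x; the sensor schema, paths, and aggregation are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object SensorPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sensor-pipeline-sketch").getOrCreate()

    // Parse raw sensor readings (device_id, ts, temperature) from CSV files on HDFS.
    val readings = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/sensors/raw")

    // Aggregate per device, e.g. an average temperature for comparison charts.
    val perDevice = readings
      .groupBy(col("device_id"))
      .agg(avg(col("temperature")).as("avg_temperature"))

    // Write the aggregates as JSON that a charting front end (e.g. Highcharts) can consume.
    perDevice.write.mode("overwrite").json("hdfs:///data/sensors/aggregates_json")

    spark.stop()
  }
}
```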

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, AWS, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Oozie, Sqoop, Spark, Unix, Tableau, Cosmos
