Data Engineer Resume
Roseland, NJ
SUMMARY:
- Around 7 years of professional IT experience in Big Data Ecosystem and Java/J2EE
- Technical experience in financial and telecommunications industries
- Experienced in the Big Data ecosystem with Hadoop 2.0, HDFS, MapReduce, Pig 0.12+, Hive 1.0+, HBase 0.98+, Sqoop 1.3+, Flume 1.3+, Kafka 1.2+, Oozie 3.0+ and Spark 2.0+
- Proficient in Java, Python, and Scala for Apache Spark development
- Experienced with distributions including Cloudera CDH 5.x, Hortonworks HDP 2.x, and AWS EMR
- Expert in RDBMS including MySQL, Oracle, SQL Server, and PostgreSQL
- Worked with NoSQL databases including HBase, MongoDB, Redis, and Cassandra
- Experienced in writing UDFs for Hive and Pig Latin in Scala/Java to extend functionality; capable of writing HiveQL queries to process and analyze data
- Skilled in using Sqoop/Flume to transfer data between RDBMS and HDFS
- Utilized Kafka, RabbitMQ, and Flume to ingest real-time data streams into HDFS and HBase
- Applied open-source tools such as ZooKeeper, Oozie, and shell scripts for scheduling
- Strong in data structures, algorithm design, object-oriented design (OOD), and core components such as the Collections Framework, multithreading, exception handling, and the I/O system, in both C++ and Java
- Experienced in graphic and UI design with Adobe Photoshop
- Experienced in all phases of the data warehouse life cycle, including requirements analysis, design, coding, testing, and deployment
- Involved in Tableau Server Configuration and Dashboard building
- Developed Machine Learning algorithms including Linear Regression, Logistic Regression, K-Means, Decision Trees
- Experienced in optimizing NUMA (non-uniform memory access) systems, including synchronization for multithreaded programs, lock optimization, and benchmarking with Intel VTune Amplifier 2016
- Good knowledge of Unit Testing with Pytest, ScalaCheck, ScalaTest, JUnit and MRUnit
- Exposed to Agile environment and familiar with tools like JIRA, Confluence, Bitbucket etc.
- Self-motivated fast learner with a team spirit; enjoys working both independently and collaboratively to solve challenging business problems
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop 2.0.0+, MapReduce, Spark 1.3+, Hive 1.1+, Pig 0.12+, Kafka 1.2+, Sqoop 1.3+, Flume 1.3+, Impala 1.2+, Oozie 3.0+, ZooKeeper 3.4+
NoSQL: HBase 0.98+, Cassandra 2.0+, MongoDB 3.0+
Programming Languages: C/C++, Java 7+, Scala, Python 2.7+, SQL, Spark SQL, HiveQL, Pig Latin
Operating Systems: Mac OS, Ubuntu, CentOS, Windows
Databases: MySQL 5.x, Oracle 10g, PostgreSQL 9.x, MongoDB 3.2, HBase 0.98
Machine Learning: Linear Regression, Logistic Regression, K-Means, Decision Tree
PROFESSIONAL EXPERIENCE:
Confidential, Roseland, NJ
Data Engineer
Responsibilities:
- Design and develop high-throughput, scalable, extensible, maintainable, and testable applications; automate, extend, and scale the data processing and analytics pipeline
- Design and implement MapReduce, Spark, and machine learning jobs to support distributed data processing (a minimal Spark sketch follows this section)
- Acquire, clean and analyze large data sets
- Manage technology and environments for Data Scientists, Data Engineers & Data Analysts
- Integrate data from multiple internal/external data sources and APIs
- Create custom tools to streamline and optimize workflow and enable cohesive data driven applications
- Design and develop SQL scripts and tools to support ad hoc analytical requests
- Diagnose, tune, and architect the advanced data science technology stack
- Pushed cleansed data sets into Hive using Sqoop, developed BI reports in Tableau, and designed Oozie workflows to automate data-loading tasks
- Involved in design and development phases of Software Development Life Cycle using Scrum methodology
- Used Git for version control and JIRA for project tracking
Environment: Red Hat Linux, HDFS, MapReduce, Hive, Java, Sqoop, Oozie, CDH, Tableau, Flume, Eclipse, JIRA, Scala, Python
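Below is a minimal sketch of the kind of Spark cleansing-and-load job described above, written in Java; the input path, table name, and columns (customer_id, event_ts) are hypothetical placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CleanAndLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clean-and-load")
                .enableHiveSupport()   // lets the job write managed Hive tables
                .getOrCreate();

        // Hypothetical input: raw CSV events with a header row
        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/raw/events");

        // Basic cleansing: drop rows missing a key column, then de-duplicate
        Dataset<Row> cleaned = raw
                .filter("customer_id IS NOT NULL")
                .dropDuplicates(new String[]{"customer_id", "event_ts"});

        // Persist the cleansed set as a Hive table for downstream BI reports
        cleaned.write().mode(SaveMode.Overwrite).saveAsTable("analytics.events_clean");

        spark.stop();
    }
}
```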
Confidential, New York, NY
Big Data Developer/Analyst
Responsibilities:
- Designed data pipeline using Flume, Sqoop to ingest customers’ data into HDFS
- Developed multiple MapReduce jobs in Java for data cleaning
- Wrote customized UDFs with Scala/Python for data preprocessing
- Extended the capabilities of DataFrames using UDFs in Python and Scala
- Worked with multiple data formats (XML, CSV, JSON, Avro) and imported data into Hive
- Wrote customized Hive UDFs (user-defined functions) for data transformation (sample UDF after this section)
- Built a star-schema data model (fact/dimension tables) using the Kimball approach for data analysis
- Worked with various Hive compression codecs, such as gzip, bzip2, LZO, and Snappy
- Saved aggregation result into tables for fast data retrieval
- Pushed cleansed data sets into HBase using Sqoop, developed BI reports in Tableau, and designed Oozie workflows to automate data-loading tasks
- Involved in design and development phases of Software Development Life Cycle using Scrum methodology
- Performed unit testing using JUnit and MRUnit
- Used Git for version control and JIRA for project tracking
Environment: Red Hat Linux, HDFS, MapReduce, Hive, Java, Sqoop, Oozie, CDH, Tableau, HBase, Flume, Eclipse, JIRA, JUnit, MRUnit, Scala, Python
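A minimal example of a Hive UDF of this kind, using the classic Hive 1.x UDF API; the normalization rule and class name are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF; registered in Hive with, e.g.:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.NormalizePhone';
public final class NormalizePhone extends UDF {
    // Strips everything but digits so phone numbers compare consistently
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().replaceAll("[^0-9]", ""));
    }
}
```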
Confidential, Richmond, VA
Big Data Developer
Responsibilities:
- Worked with Amazon Web Services
- Extracted data from various source systems (Oracle, MySQL, SQL Server, MongoDB, log files) to HDFS cluster using Sqoop, Flume
- Implemented Hive UDFs to incorporate business logic into Hive queries
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Configured Kafka producers/consumers and a Kafka cluster to serve as temporary data storage, using Scala and Java (producer sketch after this section)
- Persisted ingested high-throughput data in Cassandra
- Processed semi-structured data into structured form using Spark Core and Spark SQL
- Analyzed real-time data using Spark Streaming
- Worked on Oozie to automate data load jobs into HDFS and HIVE
- Involved in managing and reviewing Hadoop log files
- Involved in handling the issues related to cluster start, node failures on the system
- Performed unit testing for Spark and Spark Streaming with Pytest, ScalaCheck
- Used JIRA for project tracking and Jenkins for continuous integration
Environment: Hadoop, Cloudera CDH 5.x, HDFS, MapReduce, Kafka, Oozie, Pig, Hive, Sqoop, JIRA, Jenkins, Cassandra, MongoDB, AWS
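A minimal sketch of a Kafka producer of this kind in Java; the broker list, topic name, and payload are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker list
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", "all");  // wait for full replication before acking
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical topic, key, and JSON payload
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "customer-42",
                    "{\"action\":\"login\"}"));
        }
    }
}
```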
Confidential, Herndon, VA
Hadoop Developer
Responsibilities:
- Installed and configured Apache Hadoop clusters and Hadoop tools for application development, including HDFS, YARN, Sqoop, Flume, Hive, Pig, Oozie, ZooKeeper, and HBase
- Wrote MapReduce jobs in Java to launch and monitor computation on the cluster (sample job after this list)
- Migrated the needed data from RDBMS into HDFS using Sqoop, imported flat files of various formats into HDFS, and worked on bulk loads of data from the enterprise data warehouse to Hadoop
- Wrote Pig Scripts to perform transformation procedures on the data in HDFS
- Created Oozie workflows to automate the data pipeline and schedule data by using Oozie coordinator
- Involved in designing Oozie workflows and resource management for YARN
- Worked with serialization formats such as JSON and XML, and big-data serialization formats such as Avro and SequenceFiles
- Verified importing and exporting data into HDFS and Hive using Sqoop
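A minimal sketch of a map-only MapReduce cleaning job of this kind in Java; the record-width check (12 comma-separated fields) is a hypothetical cleaning rule.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilterJob {
    // Map-only job: keep records that have the expected number of fields
    public static class FilterMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split(",", -1).length == 12) {  // hypothetical record width
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "filter-malformed-rows");
        job.setJarByClass(FilterJob.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);  // map-only; mapper output goes straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```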
Confidential
SQL Developer
Responsibilities:
- Observed database performance and optimized system resources and SQL
- Supported internal projects by creating update procedures to fix data issues
- Designed analyses for supply chain management projects involving multiple databases, ETL, and materialized views
- Set up database monitoring for existing environments using shell scripts
- Read data from SQL databases and web APIs, and processed it for further use in Python with the pandas module
- Wrote SQL queries for the JDBC connection in accordance with the business logic (JDBC sketch after this section)
Environment: MS SQL Server 2005/2008, Visual Studio 2008, MS Access, MS Excel, Crystal Reports, SQL Server Analysis Services (SSAS)
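A minimal sketch of a parameterized JDBC query of this kind in Java; the connection string, table, and column names are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderLookup {
    public static void main(String[] args) throws SQLException {
        // Hypothetical SQL Server connection string and query
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=supply_chain";
        String sql = "SELECT order_id, status FROM orders WHERE region = ?";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, "NORTHEAST");  // parameterized to avoid SQL injection
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("order_id") + " " + rs.getString("status"));
                }
            }
        }
    }
}
```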
Confidential, Fort Wayne, IN
Java/J2EE Developer
Responsibilities:
- Developed unit test code using Java
- Involved in quality testing and inspection of tests written by other engineers, and generated feedback reports
- Gathered business requirements and wrote technical reports for potential customers
- Involved in designing and implementing web applications according to customers' needs
- Implemented client-side applications to invoke SOAP and REST web services (client sketch after this section)
Environment: Java 7, ASP.NET, Entity Framework 6, MySQL, PostgreSQL, WCF, WPF, SOAP, REST
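A minimal Java 7-compatible sketch of a client call to a REST web service; the endpoint URL is a hypothetical placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestClient {
    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint; any JSON-returning REST service works the same way
        URL url = new URL("https://api.example.com/v1/customers/42");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            System.out.println(conn.getResponseCode() + ": " + body);
        } finally {
            conn.disconnect();
        }
    }
}
```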
Confidential, Indianapolis, IN
Front End Developer
Responsibilities:
- Involved in SDLC Requirements gathering, Analysis, Design, Development and Testing of application
- Created standards compliant HTML, CSS and JavaScript pages as needed
- Developed interactive features with JavaScript, jQuery, and related JavaScript libraries
- Involved in user interface testing to check website compatibility across multiple browsers
- Worked with Java back-end, utilizing AJAX to pull in and parse XML
Environment: HTML, JavaScript, JAVA, CSS, AJAX, jQuery, XML