Data Engineer Resume
Roseland, NJ
SUMMARY:
- Around 7 years of professional IT experience in Big Data Ecosystem and Java/J2EE
- Technical experience in financial and telecommunications industries
- Experienced in the Big Data ecosystem with Hadoop 2.0, HDFS, MapReduce, Pig 0.12+, Hive 1.0+, HBase 0.98+, Sqoop 1.3+, Flume 1.3+, Kafka 1.2+, Oozie 3.0+ and Spark 2.0+
- Proficient in Java, Python, and Scala for Apache Spark development
- Experienced with distributions including Cloudera CDH 5.x, Hortonworks HDP 2.x, and AWS EMR
- Expert in RDBMS including MySQL, Oracle, SQL Server, and PostgreSQL
- Worked with NoSQL databases including HBase, MongoDB, Redis, and Cassandra
- Experienced in writing UDFs for Hive and Pig Latin in Scala/Java to extend functionality; capable of writing HiveQL queries to process and analyze data
- Skilled in using Sqoop/Flume to transfer data between RDBMS and HDFS
- Utilized Kafka, RabbitMQ, and Flume to ingest real-time data streams into HDFS and HBase
- Applied open-source tools such as ZooKeeper, Oozie, and shell scripts for scheduling
- Strong in data structures, algorithm design, object-oriented design (OOD), and core components such as the Collections Framework, multithreading, exception handling, and the I/O system, in both C++ and Java
- Experienced in graphic and UI design with Adobe Photoshop
- Experienced in all phases of the data warehouse life cycle, including requirements analysis, design, coding, testing, and deployment
- Involved in Tableau Server Configuration and Dashboard building
- Developed Machine Learning algorithms including Linear Regression, Logistic Regression, K-Means, Decision Trees
- Experienced in optimizing NUMA (non-uniform memory access) systems, including synchronization for multithreaded programs, lock optimization, and benchmarking with Intel VTune Amplifier 2016
- Good knowledge of Unit Testing with Pytest, ScalaCheck, ScalaTest, JUnit and MRUnit
- Exposed to Agile environment and familiar with tools like JIRA, Confluence, Bitbucket etc.
- Self-motivated fast learner with a team spirit; enjoys working both independently and collaboratively to solve challenging business problems
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop 2.0.0+, MapReduce, Spark 1.3+, Hive 1.1+, Pig 0.12+, Kafka 1.2+, Sqoop 1.3+, Flume 1.3+, Impala 1.2+, Oozie 3.0+, ZooKeeper 3.4+
NoSQL: HBase 0.98+, Cassandra 2.0+, MongoDB 3.0+
Programming Languages: C/C++, Java 7+, Scala, Python 2.7+, SQL, Spark SQL, HiveQL, Pig Latin
Operating Systems: Mac OS, Ubuntu, CentOS, Windows
Databases: MySQL 5.x, Oracle 10g, PostgreSQL 9.x, MongoDB 3.2, HBase 0.98
Machine Learning: Linear Regression, Logistic Regression, K-Means, Decision Tree
PROFESSIONAL EXPERIENCE:
Confidential, Roseland, NJ
Data Engineer
Responsibilities:
- Design and develop high-throughput, scalable, extensible, maintainable, and testable applications; automate, extend, and scale the data processing and analytics pipeline
- Design and implement MapReduce, Spark, and machine learning jobs to support distributed data processing (a minimal Spark sketch follows this section)
- Acquire, clean and analyze large data sets
- Manage technology and environments for Data Scientists, Data Engineers & Data Analysts
- Integrate data from multiple internal/external data sources and APIs
- Create custom tools to streamline and optimize workflow and enable cohesive data driven applications
- Design and develop SQL scripts and tools to support ad hoc analytical requests
- Diagnose, tune, and architect the advanced data science technology stack
- Pushed cleansed data sets into Hive using Sqoop, developed BI reports in Tableau, and designed Oozie workflows to automate data-loading tasks
- Involved in design and development phases of Software Development Life Cycle using Scrum methodology
- Used Git for version control and JIRA for project tracking
Environment: Red Hat Linux, HDFS, MapReduce, Hive, Java, Sqoop, Oozie, CDH, Tableau, Flume, Eclipse, JIRA, Scala, Python
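Below is a minimal sketch of the kind of Spark cleansing-and-load job described above, written in Java; the input path, table name, and columns (customer_id, event_ts) are hypothetical placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CleanAndLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clean-and-load")
                .enableHiveSupport()   // lets the job write managed Hive tables
                .getOrCreate();

        // Hypothetical input: raw CSV events with a header row
        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/raw/events");

        // Basic cleansing: drop rows missing a key column, then de-duplicate
        Dataset<Row> cleaned = raw
                .filter("customer_id IS NOT NULL")
                .dropDuplicates(new String[]{"customer_id", "event_ts"});

        // Persist the cleansed set as a Hive table for downstream BI reports
        cleaned.write().mode(SaveMode.Overwrite).saveAsTable("analytics.events_clean");

        spark.stop();
    }
}
```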
Confidential, New York, NY
Big Data Developer/Analyst
Responsibilities:
- Designed data pipeline using Flume, Sqoop to ingest customers’ data into HDFS
- Developed multiple MapReduce jobs in Java for data cleaning
- Wrote customized UDFs with Scala/Python for data preprocessing
- Extended the capabilities of DataFrames using UDFs in Python and Scala
- Worked with multiple data formats (XML, CSV, JSON, Avro) and imported data into Hive
- Wrote customized Hive UDFs (user-defined functions) for data transformation (sample UDF after this section)
- Built a star-schema data model (fact/dimension tables) using the Kimball approach for data analysis
- Worked with various Hive compression codecs, such as gzip, bzip2, LZO, and Snappy
- Saved aggregation result into tables for fast data retrieval
- Pushed cleansed data sets into HBase using Sqoop, developed BI reports in Tableau, and designed Oozie workflows to automate data-loading tasks
- Involved in design and development phases of Software Development Life Cycle using Scrum methodology
- Performed unit testing using JUnit and MRUnit
- Used Git for version control and JIRA for project tracking
Environment: Red Hat Linux, HDFS, MapReduce, Hive, Java, Sqoop, Oozie, CDH, Tableau, HBase, Flume, Eclipse, JIRA, JUnit, MRUnit, Scala, Python
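A minimal example of a Hive UDF of this kind, using the classic Hive 1.x UDF API; the normalization rule and class name are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF; registered in Hive with, e.g.:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.NormalizePhone';
public final class NormalizePhone extends UDF {
    // Strips everything but digits so phone numbers compare consistently
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().replaceAll("[^0-9]", ""));
    }
}
```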
Confidential, Richmond, VA
Big Data Developer
Responsibilities:
- Worked with Amazon Web Services
- Extracted data from various source systems (Oracle, MySQL, SQL Server, MongoDB, log files) to HDFS cluster using Sqoop, Flume
- Implemented Hive UDFs to incorporate business logic into Hive queries
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Configured Kafka producers/consumers and a Kafka cluster to serve as temporary data storage, using Scala and Java (producer sketch after this section)
- Persisted ingested high-throughput data in Cassandra
- Processed semi-structured data into structured form using Spark Core and Spark SQL
- Analyzed real-time data using Spark Streaming
- Worked on Oozie to automate data load jobs into HDFS and HIVE
- Involved in managing and reviewing Hadoop log files
- Involved in handling the issues related to cluster start, node failures on the system
- Performed unit testing for Spark and Spark Streaming with Pytest, ScalaCheck
- Used JIRA for project tracking and Jenkins for continuous integration
Environment: Hadoop, Cloudera CDH 5.x, HDFS, MapReduce, Kafka, Oozie, Pig, Hive, Sqoop, JIRA, Jenkins, Cassandra, MongoDB, AWS
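A minimal sketch of a Kafka producer of this kind in Java; the broker list, topic name, and payload are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker list
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", "all");  // wait for full replication before acking
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical topic, key, and JSON payload
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "customer-42",
                    "{\"action\":\"login\"}"));
        }
    }
}
```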
Confidential, Herndon, VA
Hadoop Developer
Responsibilities:
- Installed and configured Apache Hadoop clusters and Hadoop tools for application development, including HDFS, YARN, Sqoop, Flume, Hive, Pig, Oozie, ZooKeeper, and HBase
- Wrote MapReduce jobs in Java to launch and monitor computation on the cluster (sample job after this list)
- Migrated the needed data from RDBMS into HDFS using Sqoop, imported flat files of various formats into HDFS, and worked on bulk loads of data from the enterprise data warehouse to Hadoop
- Wrote Pig Scripts to perform transformation procedures on the data in HDFS
- Created Oozie workflows to automate the data pipeline and schedule data by using Oozie coordinator
- Involved in designing Oozie workflows and resource management for YARN
- Worked with serialization formats such as JSON and XML, and big-data serialization formats such as Avro and SequenceFiles
- Verified importing and exporting data into HDFS and Hive using Sqoop
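A minimal sketch of a map-only MapReduce cleaning job of this kind in Java; the record-width check (12 comma-separated fields) is a hypothetical cleaning rule.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilterJob {
    // Map-only job: keep records that have the expected number of fields
    public static class FilterMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split(",", -1).length == 12) {  // hypothetical record width
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "filter-malformed-rows");
        job.setJarByClass(FilterJob.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);  // map-only; mapper output goes straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```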
Confidential
SQL Developer
Responsibilities:
- Observed database performance and optimized system resources and SQL
- Supported internal projects by creating update procedures to fix data issues
- Designed analyses for supply chain management projects involving multiple databases, ETL, and materialized views
- Set up database monitoring for existing environments using shell scripts
- Read data from SQL databases and web APIs, and processed it for further use in Python with the pandas module
- Wrote SQL queries for the JDBC connection in accordance with the business logic (JDBC sketch after this section)
Environment: MS SQL Server 2005/2008, Visual Studio 2008, MS Access, MS Excel, Crystal Reports, SQL Server Analysis Services (SSAS)
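A minimal sketch of a parameterized JDBC query of this kind in Java; the connection string, table, and column names are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderLookup {
    public static void main(String[] args) throws SQLException {
        // Hypothetical SQL Server connection string and query
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=supply_chain";
        String sql = "SELECT order_id, status FROM orders WHERE region = ?";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, "NORTHEAST");  // parameterized to avoid SQL injection
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("order_id") + " " + rs.getString("status"));
                }
            }
        }
    }
}
```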
Confidential, Fort Wayne, IN
Java/J2EE Developer
Responsibilities:
- Developed unit test code using Java
- Involved in quality testing and inspection of tests written by other engineers, and generated feedback reports
- Gathered business requirements and wrote technical reports for potential customers
- Involved in designing and implementing web applications according to customers' needs
- Implemented client-side applications to invoke SOAP and REST web services (client sketch after this section)
Environment: Java 7, ASP.NET, Entity Framework 6, MySQL, PostgreSQL, WCF, WPF, SOAP, REST
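A minimal Java 7-compatible sketch of a client call to a REST web service; the endpoint URL is a hypothetical placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestClient {
    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint; any JSON-returning REST service works the same way
        URL url = new URL("https://api.example.com/v1/customers/42");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            System.out.println(conn.getResponseCode() + ": " + body);
        } finally {
            conn.disconnect();
        }
    }
}
```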
Confidential, Indianapolis, IN
Front End Developer
Responsibilities:
- Involved in SDLC Requirements gathering, Analysis, Design, Development and Testing of application
- Created standards compliant HTML, CSS and JavaScript pages as needed
- Developed interactive features with JavaScript, jQuery, and related JavaScript libraries
- Involved in user interface testing to check website compatibility across multiple browsers
- Worked with Java back-end, utilizing AJAX to pull in and parse XML
Environment: HTML, JavaScript, JAVA, CSS, AJAX, jQuery, XML