Big Data Engineer Resume New Jersey - Hire IT People

SUMMARY:

Around 6 years of IT experience, including 3 years developing data pipelines with Big Data technologies such as Spark, Hive, Pig, Hadoop, MapReduce, Sqoop, and Kafka.
Experience in developing Apache Spark programs using Java, Scala, and Python.
Strong knowledge of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
Experienced in writing Spark applications in Scala using Spark APIs for data extraction, transformation, and aggregation.
Expertise in processing large sets of structured and semi-structured data in Spark and Hadoop and storing them in HDFS.
Experienced in using the Spark framework for both batch and real-time data processing.
Experience in developing Spark applications in Scala that use the Kafka Consumer API.
In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, and DataNode, as well as MapReduce v1 and v2 concepts.
Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Hortonworks and Cloudera.
Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
Implemented Hive UDFs to achieve customized functionality.
Experience in ETL operations from Hive to Spark.
Imported and exported data between RDBMS and HDFS/Hive using Sqoop.
Experienced in analyzing data using Pig Latin scripts.
Proficient in big data ingestion and streaming tools such as Sqoop and Kafka.
Good knowledge of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
Working knowledge of relational databases such as Oracle 11g, SQL Server, MySQL, and MS Access.
Good knowledge of data warehousing concepts and ETL processes.
Good knowledge of scripting languages such as Linux/Unix shell and Python.
Experienced in using IDEs and tools such as Eclipse, NetBeans, GitHub, Maven, and IntelliJ.
Experienced in working with different file formats: Avro, text, XML, JSON, and CSV.
Good understanding of algorithms, data structures, performance optimization techniques, and object-oriented programming.
Proficient in data visualization, having created multiple dashboards using Tableau and R.
Skilled in using version control software such as Git.
Solid understanding of Agile methodology and of implementing Scrum in project development.
Involved in various stages of the Waterfall model, such as analysis, development, and maintenance.
Able to work independently and as a strong team player, with excellent communication skills.
Quick learner, self-motivated, and adaptable to new environments.

TECHNICAL SKILLS:

Languages: Python 2.7, Java 1.8, Scala 2.10, SQL, R, C, C++.

Cluster Mgmt. & Monitoring: Cloudera 5.7.6, Hortonworks Ambari 2.5.

Hadoop Ecosystem: Hadoop 2.6, MapReduce v1 & v2, YARN, Spark 1.6, Spark SQL, Spark Streaming with HDFS, Sqoop 1.4.6, Hive 0.13, Pig, Kafka, Spark with Scala, Spark with Python.

Database: Oracle 11g, SQL Server, MySQL, MS Access.

Virtualization: VMware Workstation, Oracle VM VirtualBox.

NoSQL Databases: MongoDB, Cassandra.

Data Visualization: MS Excel, R, Tableau.

Cloud Computing: Google Cloud.

IDEs and Tools: Eclipse, NetBeans, GitHub, Maven, IntelliJ.

Operating Systems: Unix, Linux, Windows.

Version Control: Git, SVN.

PROFESSIONAL EXPERIENCE:

Confidential, New Jersey

Big Data Engineer

Responsibilities:

Performed data ingestion from various sources into the Hadoop data lake using Kafka.
Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
Wrote and ran Java producer programs to post messages to Kafka topics.
Wrote and ran Java consumer programs to read and process messages from Kafka topics.
Created tables in DataStax Cassandra and loaded large sets of data for processing.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Implemented a proof of concept (POC) to migrate MapReduce jobs to Spark RDD transformations using Scala.
Created a Spark application to load data into a Hive table with dynamic partitioning enabled.
Created Hive external tables for each source table in the Hadoop data lake.
Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
Optimized data sets by applying dynamic partitioning and bucketing in Hive.
Developed business-specific custom UDFs in Hive and Pig.
Participated in Agile practices, including daily Scrum meetings and sprint planning.
Used code repositories such as Git.

Environment: CDH 5.7.6, Hadoop 2.6, Spark 1.6.0, Scala 2.10, Maven, Kafka 2.10, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, IntelliJ, Oracle, DataStax Cassandra 4.8, CentOS, Windows, Python 2.7, Tableau 9.0

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

Worked on analyzing the Hadoop cluster using various big data analytics tools, including Spark, Pig, Hive, and MapReduce.
Developed Spark code using Scala for faster processing of data.
Migrated complex MapReduce programs and Hive scripts to Spark RDD transformations and actions.
Developed Scala scripts and UDFs using both SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
Wrote Pig scripts to process unstructured data and make it available for processing in Hive.
Created Hive schemas using performance techniques such as partitioning and bucketing.
Performed data analysis with Cassandra using Hive external tables.
Exported the analyzed data to Cassandra using Sqoop to generate reports for the BI team.
Involved in checking code into Git version control.
Worked on different data formats such as CSV and JSON.

Environment: CDH, HDFS, Spark, Pig, Hive, Sqoop, MapReduce, YARN, UNIX Shell Scripting, Agile Methodology

Confidential

Hadoop Developer

Responsibilities:

Worked on live 8-node Hadoop clusters running CDH 4.
Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS).
Developed several MapReduce programs to analyze and transform the data to uncover insights into customer usage patterns.
Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
Created Hive external tables, loaded data into them, and queried the data using HiveQL.
Used the Hive data warehouse tool to analyze the unified historical data in HDFS to identify issues and behavioral patterns.
Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports through the ZooKeeper implementation in the cluster.
Involved in the design, development, and support of the application using Agile methodology, and participated in Scrum meetings.
Developed user interfaces using JSP, HTML, JavaScript, and CSS.
Client-server network communication design and development.
Offline location-based ERP design and development.
Conducted design reviews and technical reviews with other project stakeholders.
Implemented services using Core Java.
Developed analysis-level documentation such as use case, business domain model, activity, sequence, and class diagrams.
Developed the client and server using Core Java, Swing, and C++.
Provided technical support to the client.

Environment: Java, Core Java, AWT, Applet, Swing, C++, Struts, JSP, Servlet, JDBC, SQL Server

Confidential

UI Developer

Responsibilities:

Used HTML and CSS to build page layouts.
Used JavaScript and jQuery to handle all events triggered by users, such as hover and click.
Followed the design requirements to create user-friendly layouts using HTML and CSS.
Used AJAX to request data from the back end and exchange JSON data with it.
Used SVN for version control and QC for defect tracking.
Created cross-browser-compatible, standards-compliant CSS-based page layouts.
Performed daily website maintenance and content updates.

Environment: HTML, XHTML, XSL, CSS, AJAX, JSON, jQuery, RESTful
