
Big Data Engineer Resume

New Jersey


  • Around 6 years of IT experience, including 3 years developing data pipelines on Big Data technologies such as Spark, Hive, Pig, Hadoop, MapReduce, Sqoop and Kafka.
  • Experience in developing Apache Spark programs using Java, Scala and Python.
  • Strong knowledge of Spark architecture including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Experienced in writing Spark programs/applications in Scala using Spark APIs for data extraction, transformation and aggregation.
  • Expertise in processing large sets of structured and semi-structured data in Spark and Hadoop and storing them in HDFS.
  • Experienced with the Spark framework for both batch and real-time data processing.
  • Experience in developing Kafka consumers with the Consumer API in Spark Scala applications.
  • In-depth understanding of Hadoop architecture including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, DataNode, and MR v1 & v2 concepts.
  • Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
  • Hands-on experience installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Implemented Hive UDFs to achieve customized functionality.
  • Experience in ETL operations from Hive to Spark.
  • Worked on importing and exporting RDBMS data into HDFS and Hive using Sqoop.
  • Experienced in analyzing data using Pig Latin scripts.
  • Proficient in big data ingestion and streaming tools like Sqoop and Kafka.
  • Good knowledge of NoSQL databases and hands-on experience writing applications on NoSQL databases like Cassandra and MongoDB.
  • Working knowledge of RDBMS databases like Oracle 11g, SQL Server, MySQL and MS Access.
  • Good knowledge of Data warehousing concepts and ETL processes.
  • Good knowledge of scripting languages such as Linux/Unix shell scripting and Python.
  • Experienced in using IDEs and tools like Eclipse, NetBeans, GitHub, Maven and IntelliJ.
  • Experienced in working with different file formats - Avro, text file, XML, JSON, CSV.
  • Good understanding of algorithms, data structures, performance optimization techniques and object-oriented programming.
  • Proficient in data visualization, creating multiple dashboards using Tableau and R.
  • Skilled in using version control software such as Git.
  • Solid understanding of Agile methodology and of implementing Scrum in project development.
  • Involved in various stages of the Waterfall methodology, such as analysis, development and maintenance.
  • Able to work independently and as a strong team player, with excellent communication skills.
  • Quick learner, self-motivated, and adaptable to new environments.


Languages: Python 2.7, Java 1.8, Scala 2.10, SQL, R, C, C++.

Cluster Mgmt. & Monitoring: Cloudera 5.7.6, Hortonworks Ambari 2.5.

Hadoop Ecosystem: Hadoop 2.6, MapReduce v1 & v2, YARN, Spark 1.6, Spark SQL, Spark Streaming with HDFS, Sqoop 1.4.6, Hive 0.13, Pig, Kafka, Scala, Spark with Python.

Database: Oracle 11g, SQL Server, MySQL, MS Access.

Virtualization: VMware Workstation, Oracle VM VirtualBox.

NoSQL Databases: MongoDB, Cassandra.

Data Visualization: MS Excel, R, Tableau.

Cloud Computing: Google Cloud.

IDEs and Tools: Eclipse, NetBeans, GitHub, Maven, IntelliJ.

Operating Systems: Unix, Linux, Windows.

Version Control: Git, SVN.


Confidential, New Jersey

Big Data Engineer


  • Performed data ingestion from various sources into the Hadoop Data Lake using Kafka.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Wrote and ran Java Producer programs to post messages to Kafka topics.
  • Wrote and ran Java Consumer programs to read and process messages from Kafka topics.
  • Created tables in DataStax Cassandra and loaded large sets of data for processing.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Created Spark Application to load data into Dynamic Partition Enabled Hive Table.
  • Created Hive external tables for each source table in Hadoop Data Lake.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Optimized the data sets by creating dynamic partitioning and bucketing in Hive.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
  • Worked with code repositories such as Git.

Environment: CDH 5.7.6, Hadoop 2.6, Spark 1.6.0, Scala 2.10, Maven, Kafka 2.10, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, IntelliJ, Oracle, DataStax Cassandra 4.8, CentOS, Windows, Python 2.7, Tableau 9.0

Confidential, Charlotte, NC

Data Engineer


  • Analyzed the Hadoop cluster using big data analytics tools including Spark, Pig, Hive and MapReduce.
  • Developed Spark code using Scala for faster processing of data.
  • Migrated complex MapReduce programs and Hive scripts into Spark RDD transformations and actions.
  • Developed Scala scripts and UDFs using both SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into RDBMS through Sqoop.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Wrote Pig scripts to process unstructured data and make it available for processing in Hive.
  • Created Hive schemas using performance techniques like partitioning and bucketing.
  • Performed data analysis with Cassandra using Hive External tables.
  • Exported the analyzed data to Cassandra using Sqoop to generate reports for the BI team.
  • Checked code into Git version control.
  • Worked on different data formats such as CSV and JSON.

Environment: CDH, HDFS, Spark, Pig, Hive, Sqoop, MapReduce, YARN, UNIX Shell Scripting, Agile Methodology


Hadoop Developer

  • Worked on live 8-node Hadoop clusters running CDH 4.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS).
  • Developed several MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Created Hive external tables, loaded data into them, and queried the data using HiveQL.
  • Used the Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, supported by the ZooKeeper implementation in the cluster.
  • Involved in design, development and support of the application; used Agile methodology and participated in Scrum meetings.
  • Developed user interfaces using JSP, HTML, JavaScript and CSS.
  • Client-server network communication design and development.
  • Offline location-based ERP design and development.
  • Conducted design reviews and technical reviews with other project stakeholders.
  • Implemented services using Core Java.
  • Developed analysis-level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
  • Developed client and server using Core Java, Swing and C++.
  • Provided technical support to the client.

Environment: Java, Core Java, AWT, Applet, Swing and C++, Struts, JSP and Servlet, JDBC and SQL Server


UI Developer


  • Used HTML and CSS to build page layouts.
  • Used JavaScript and jQuery to handle all events that are triggered by users, such as hover and click.
  • Followed the design requirements to create user-friendly layouts using HTML and CSS.
  • Used AJAX to request data from the back end and exchange JSON data with it.
  • Used SVN for version control and QC for defect tracking.
  • Created cross-browser compatible, standards-compliant CSS-based page layouts.
  • Performed daily website maintenance and content updates.

Environment: HTML, XHTML, XSL, CSS, AJAX, JSON, jQuery, RESTful
