Big Data Engineer Resume

Franklin Lakes, NJ


  • Over 14 years of professional ITES and IT experience, including 4+ years in the Hadoop/Big Data ecosystem.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
  • Hands-on experience installing and deploying Hadoop ecosystem components such as Hadoop MapReduce, YARN, HDFS, NoSQL, HBase, Oozie, Hive, Tableau, Sqoop, ZooKeeper and Flume.
  • Working knowledge of the PySpark APIs.
  • Strong experience in Hadoop distributions like Cloudera and Hortonworks.
  • Excellent hands-on experience developing Hadoop architecture within projects on Windows and Linux platforms.
  • Good technical skills in Oracle 11i, SQL Server, and ETL development using Informatica.
  • Expert in importing and exporting data between Oracle and MySQL databases and HDFS using Sqoop and Flume.
  • Experience ingesting streaming data into Hadoop clusters using Flume and Kafka.
  • Performed data analytics using Pig and Hive for Data Architects and Data Scientists within the team.
  • Experience with NoSQL databases such as HBase and Cassandra, as well as other ecosystem tools such as ZooKeeper, Oozie and Storm.
  • Experience in job scheduling using automation tools such as Control-M, JAMS and Autosys.
  • Developed stored procedures and queries using PL/SQL.
  • Expertise in RDBMSs such as Oracle, MS SQL Server, Teradata, MySQL and DB2.
  • Strong analytical skills with the ability to quickly understand a client’s business needs. Participated in client meetings to gather information and requirements. Led the team and handled onsite/offshore coordination.


Analytical Tools: SQL, Jupyter Notebook, Tableau

NoSQL: Cassandra, HBase, MongoDB

Hadoop Distributions: Cloudera, Hortonworks

Workload Automation Tool: Control-M, JAMS, Autosys.

Big Data: Spark, Hive, Sqoop, HBase, Hadoop, HDFS, Flume, Shell Script, PySpark, Scala.

Databases: Oracle 11g/10g, DB2 8.1, MS SQL Server, MySQL

Operating Systems: Unix / Linux, Windows 2000/NT/XP


Confidential, Franklin Lakes, NJ

Big Data Engineer


  • Involved in requirements gathering and business analysis, and translated business requirements into technical designs in Hadoop and Big Data.
  • Created Hive Partitioned and Bucketed tables to improve performance.
  • Involved in capacity planning and configuration of the Cassandra cluster on DataStax.
  • Designed, developed, and implemented performant ETL pipelines using the Python API (PySpark) of Apache Spark.
  • Wrote reusable, testable, and efficient code.
  • Used Sqoop to import data into Cassandra tables from different relational databases.
  • Importing and exporting data into HDFS from database and vice versa using Sqoop.
  • Worked on the core and Spark SQL modules of Spark extensively using programming languages like Python and Scala.
  • Worked closely with the Business Intelligence team on data analysis, reporting dashboards, and recommending solutions.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase, using Python/PySpark and Scala.
  • Created partitions and buckets based on State for further processing using bucket-based Hive joins.
  • Performance tuning of PySpark scripts.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Responsible for exporting analyzed data to relational databases using Sqoop.
  • Creating the tables in Hive and integrating data between Hive & Spark.
  • Responsible for tuning Hive to improve performance.
  • Documented the technical details of Hadoop cluster management and the daily batch pipeline, which includes several Hive, Sqoop, and Oozie jobs and other scripts.

Environment: Cassandra, HDFS, HBase, Spark, PySpark, Hortonworks, Hive, Oozie, YARN, and Sqoop

Confidential, El Segundo, CA

Big Data Developer


  • Defining the metrics for the Big data analytics proof of concepts
  • Defining the requirements for data lakes/pipelines
  • Creating the tables in Hive and integrating data between Hive & Spark
  • Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
  • Created Hive Partitioned and Bucketed tables to improve performance
  • Created Hive tables with User defined functions
  • Involved in code review and bug fixing for improving the performance
  • Worked on the core and Spark SQL modules of Spark extensively using programming languages like Scala
  • Perform extensive studies of different technologies and capture metrics by running different algorithms
  • Converting the SAS algorithms into different technologies
  • Design and implement data ingestion techniques for real-time data coming from various source systems
  • Defining the data layouts and rules after consultation with ETL teams
  • Worked in an Agile environment and participated in daily Stand-ups/Scrum Meetings

Environment: Python, HDFS, MapReduce, PL/SQL, Hive, Spark, Agile, Spark SQL, Scala, Sqoop, Oracle, Quality Center, Windows.


Big Data Consultant


  • Analyzed the Hadoop cluster and different Big Data analytics tools, including Hive, HBase and Sqoop.
  • Coordinated with business customers to gather business requirements and interacted with other technical peers to derive technical requirements.
  • Extensively involved in the Design phase and delivered Design documents.
  • Involved in Testing and coordination with business in User testing.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Transforming the data using Spark applications for analytics consumption.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Experienced in defining job flows.
  • Experience in developing ETL data pipelines using PySpark.
  • Used Hive to analyze the partitioned data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used an ETL tool to perform transformations, joins, and some pre-aggregation.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Responsible for managing data coming from different sources.
  • Created Data model for Hive tables.
  • Involved in Unit testing and delivered Unit test plans and results documents.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
  • Worked on Oozie workflow engine for job scheduling.

Environment: Hadoop, HDFS, Spark, Cloudera Manager, Hive, YARN, Sqoop, HBase, Oozie, Flume.


Data Processing Analyst


  • Processing applications from across India.
  • Downloading the scanned images from the data source.
  • Creating a database by sequencing them in the order in which they are to be processed.
  • Digitizing the data by keying it into the SQL database.
  • Creating Excel tables with the data to make it available to the users.
  • Arranging the tables according to user requirements using an Oracle database.

Environment: SQL Database, PL/SQL, MS SQL.

Confidential, Harrisburg, PA

Process Associate


  • Receiving claim forms as images in Lotus Notes.
  • Batch processing by typing the numbers and information from the image into a template.
  • Vertexing (verifying text) by viewing the claim forms and keying the information into the database.
  • Separating multi-page images into batches so that the pages can be joined.
  • Processing medical claim forms.
  • Reviewing the database entries made by other Vertexers.

Environment: IBM Lotus Notes