
Sr. Big Data Engineer Resume


  • A total of 7 years of progressive IT experience in analysis, enterprise application development, database administration and big data technologies. Complete software development life cycle (SDLC) experience covering system analysis, architecture, technical design, development, testing, deployment and support of medium- to large-scale business applications using Agile Scrum and iterative development methodologies.
  • Experience in developing, implementing, configuring and testing various systems using Hadoop technologies.
  • Good understanding of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker, and of the YARN architecture.
  • Experience in using HiveQL for analyzing, querying and summarizing huge data sets.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used Pig as an ETL tool for transformations, joins, filters and some pre-aggregation.
  • Developed User Defined Functions (UDFs) for Pig and Hive using Java-based languages.
  • Queried both Managed and External tables created by Hive using Impala.
  • Experience in loading logs from multiple sources directly into HDFS using Flume.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Experience in scheduling jobs through Oozie and knowledge of Autosys, TAC and Zena.
  • Experience in processing real-time data using Spark and Scala.
  • Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
  • Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
  • Experience working with MapR volumes and snapshots for data redundancy.
  • Experience in importing data into a Hadoop data lake from various databases such as MySQL, Oracle, DB2, Teradata and SQL Server using Sqoop.
  • Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
  • Experience in generating reports using Tableau by connecting to Hive.
  • Experience in using Kerberos to authenticate end users of Hadoop in secure mode.
  • Experience in working with file formats such as Parquet, Avro, RC, ORC, Sequence Files and JSON.
  • Excellent knowledge of UNIX and shell scripting.
  • Expertise in design and development of Web Applications involving J2EE technologies with Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS, JDBC etc.
  • Extensive experience in using Relational databases like Oracle, SQL Server, DB2, Teradata and MySQL.
  • Experience in working with different Hadoop distributions such as CDH, MapR and Hortonworks.
  • Expertise in using the Tomcat server and application servers such as JBoss and WebLogic.
  • Good knowledge of the finance and health care domains.


Hadoop / Big Data Stack: HDFS, YARN, MapReduce, Pig, Hive, Spark, SparkSQL, Scala, Kafka, ZooKeeper, HBase, Sqoop, Flume, Shell script, Oozie.

Hadoop Distributions: MapR, Hortonworks.

Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.

NoSQL Databases: HBase, Cassandra.

Query Languages: HiveQL, SQL, Pig.

Web Technologies: Java, Servlets, EJB, JavaScript, CSS, Bootstrap.

Frameworks: MVC, Struts, Spring and Hibernate.

Build & Integration Tools: Maven, Ant, Jenkins.

Operating Systems: Windows, Linux, Unix and CentOS.



Sr. Big Data Engineer


  • Developed a Sqoop framework to ingest historical and incremental data from Oracle, DB2, SQL Server and other sources.
  • Used Flume to read messages from a JMS queue and load them into HDFS.
  • Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
  • Transformed raw data by developing Pig scripts and loaded the data into HBase tables.
  • Developed custom UDFs to generate unique keys for use in Pig transformations.
  • Designed the HBase schema to avoid hotspotting and exposed data from HBase tables to the UI through a REST API.
  • Identified control characters in the data and developed scripts to remove them.
  • Converted existing Pig scripts to Spark to improve performance.
  • Helped market analysts by creating Hive queries to spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
  • Developed Spark code in Scala for faster data processing using RDDs and the DataFrame API.
  • Executed Spark SQL queries against Hive data from the Spark context and tuned their performance.
  • Created Kafka topics and partitions and wrote custom partitioner classes.
  • Defined job flows in Oozie to automate data loading into HDFS and Pig.
  • Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins in MapReduce.
  • Created branches in GitHub, pushed code and deployed production releases through Jenkins.
  • Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing and production.

Environment: Hadoop, HDFS, MapR, Pig, Hive, Spark, SparkSQL, Scala, HBase, Oozie, Sqoop, Flume, Kafka, Linux, Java, Maven, JUnit, GitHub, Jenkins.


Hadoop Developer


  • Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze or refine the data in a standard way.
  • Involved in moving legacy data from the Sybase ASE data warehouse to the Hadoop data lake and migrating the data processing to the lake.
  • Responsible for creating Data store, Datasets and Virtual Warehouse in the lake and then creating Spark and Hive refiners to implement the existing SQL Stored Procedures.
  • Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
  • Created Hive refiners for simple UNIONs and JOINs.
  • Ran Hive queries through Spark SQL, which integrates with the Spark environment.
  • Automated the triggering of Data Lake REST API calls using Unix Shell Scripting.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Redesigned and implemented the Scala REPL (read-eval-print loop) to integrate tightly with other IDE features in Eclipse.
  • Added AppDynamics monitoring to JVM to gather statistics for REST application.
  • Used Avro format for staging data and Parquet for final repository.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for hive performance enhancement and storage improvement.
  • Worked on an in-house data modeling service (PURE MODEL), using data from the data lake virtual warehouse and exposing the model output through Java web services accessed by end users.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig using Oozie coordinator jobs.
  • Used Sqoop import and export functionality to handle large data set transfers between the Sybase database and HDFS.
  • Tuned Hive and Pig scripts to improve performance.
  • Involved in creating Oozie workflow and coordinator jobs to trigger jobs based on time and data availability.
  • Performed unit testing and integration testing using the JUnit framework.
  • Configured build scripts for multi-module projects with Maven and Jenkins.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, HDFS, Hortonworks, Spark, SparkSQL, Scala, Pig, Hive, Oozie, Sqoop, Sybase, Kafka, Linux, Java, Maven, JUnit, Jenkins.


Java Developer


  • Involved in requirement analysis, design and development of the application using Java technologies.
  • Developed the login screen so that the application can be accessed only by authorized and authenticated administrators.
  • Used HTML, CSS and JSPs to design and develop the front end and used JavaScript to perform user validation.
  • Designed, developed and configured server-side J2EE components such as EJBs, JavaBeans and Servlets.
  • Involved in creating tables, functions, triggers, sequences and stored procedures in PL/SQL.
  • Implemented business logic by developing Session Beans.
  • Involved in developing JSP pages using Struts custom tags, jQuery and the Tiles framework.
  • Used Hibernate as the ORM and PL/SQL for handling database processing.
  • Used the JDBC API to communicate with the database.
  • Developed the application following the Waterfall software development methodology.
  • Involved in technical documentation of the project.

Environment: Java, HTML, CSS, JSP, Servlets, EJB, jQuery, JDBC, Hibernate, PL/SQL.
