
Sr. Bigdata Engineer Resume


  • 14+ years of extensive IT experience in analysis, architecture, design, development, testing, maintenance, and user training of software applications, including 4+ years of experience with the Apache Hadoop ecosystem and Apache Spark and 10+ years of experience in Java/J2EE.
  • Hands-on experience in developing and deploying enterprise-based applications using major components of the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, Sqoop, Spark, Scala, Kafka, and Oozie.
  • Good knowledge of handling messaging services using Apache Kafka.
  • Experienced in using Spark to improve the performance of, and optimize, existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive.
  • Experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
  • Experience in creating tables, partitioning, bucketing, loading, and aggregation using Hive.
  • Experience in migrating code from Hive to Spark/PySpark (Scala/Python) using Spark SQL and Spark window functions.
  • Extensive experience in Spring Core, Spring IoC, Spring MVC, Spring Web Flow, Spring Batch, Spring Security, Spring Boot for microservices, the Hibernate framework, iBatis, and AJAX.
  • Experience with Apache Maven for builds, Log4j for logging, and JUnit for unit testing.
  • Extensive experience in developing Use Cases, Activity Diagrams, Sequence Diagrams and Class Diagrams using Visio.
  • Sun Certified Java Programmer (SCJP 1.5).
  • Worked with business analysts to understand problems and provide optimal architectural solutions.
  • Experience working in environments using Agile (Scrum), BDD (Behavior-Driven Development), and Test-Driven Development methodologies.
  • Excellent team player with good communication, people and leadership skills.



Sr. Bigdata Engineer


  • Responsible for developing a data pipeline using Spark, Scala, and Apache Kafka to ingest data from the CSL source and store it in a protected HDFS folder.
  • Implemented many Kafka ingestion jobs to consume data for real-time and batch processing.
  • Used HBase to store Kafka topics, partition numbers, and offset values. Also used the Phoenix JAR to connect to HBase tables.
  • Used PySpark to create a batch job that merges multiple small files (Kafka stream files) into single larger files in Parquet format.
  • Implemented the Protegrity API in all Spark/PySpark jobs for writing and reading PCI/PII data from HDFS locations or Hive tables.
  • Implemented multiple functions in PySpark programs, such as 'UnionAll' to combine two datasets and remove duplicates.
  • Implemented custom Scala/Java functions in Spark for map objects.
  • Developed Autosys scripts to schedule the Kafka streaming and batch jobs.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Used Ambari to monitor node health and job status in Hadoop clusters.
  • Used Rally for user-story and bug tracking and Bitbucket to check in and check out code changes.


Bigdata Developer & Java Tech Lead


  • Responsible for building scalable distributed data solutions using the Hadoop ecosystem and Spark.
  • Developed Spark applications in Scala for all batch processing.
  • Imported data from different sources into Spark RDDs for processing. Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Utilized the Spark DataFrame and Spark SQL APIs extensively for all processing.
  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Experience in managing and reviewing Hadoop log files. Experience in Hive partitioning, bucketing, and performing joins on Hive tables.
  • Imported data from relational databases into HDFS and exported the analyzed data back to relational databases using Sqoop.
  • Developed new libraries with a microservices architecture using REST APIs, Spring Boot, Pivotal Cloud Foundry, and AWS.
  • Created and configured continuous delivery pipelines for deploying microservices using Jenkins.
