- 14+ years of extensive IT experience in Analysis, Architecture, Design, Development, Testing, Maintenance, and User Training for software applications, including 4+ years working with the Apache Hadoop ecosystem and Apache Spark and 10+ years of experience in Java/J2EE.
- Hands-on experience developing and deploying enterprise applications using major components of the Hadoop ecosystem, such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, Sqoop, Spark, Scala, Kafka, and Oozie.
- Good knowledge of message handling with Apache Kafka.
- Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Expertise in using Spark SQL with data sources such as JSON, Parquet, and Hive.
- Experience with Hadoop distributions such as Cloudera and Hortonworks.
- Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Experience in Hive table creation, partitioning, bucketing, data loading, and aggregation.
- Experience migrating code from Hive to Spark/PySpark in Scala/Python using Spark SQL and Spark window functions.
- Extensive experience in Spring Core, Spring IoC, Spring MVC, Spring Web Flow, Spring Batch, Spring Security, Spring Boot for microservices, the Hibernate framework, iBATIS, and AJAX.
- Experience with Apache Maven builds, Log4j logging, and JUnit for unit testing.
- Extensive experience in developing Use Cases, Activity Diagrams, Sequence Diagrams and Class Diagrams using Visio.
- Sun Certified Java Programmer (SCJP 1.5).
- Worked with business analysts to understand problems and provide optimal architectural solutions.
- Experience working in Agile (Scrum) environments using Behavior-Driven Development (BDD) and Test-Driven Development (TDD) methodologies.
- Excellent team player with good communication, people and leadership skills.
Sr. Big Data Engineer
- Responsible for developing a data pipeline using Spark, Scala, and Apache Kafka to ingest data from the CSL source and store it in a protected HDFS folder.
- Implemented multiple Kafka ingestion jobs for real-time and batch processing.
- Used HBase to store Kafka topic names, partition numbers, and offset values, and used the Apache Phoenix JAR to connect to the HBase table.
- Used PySpark to create a batch job that merges multiple small files (Kafka stream output files) into larger files in Parquet format.
- Implemented the Protegrity API in all Spark/PySpark jobs for reading and writing PCI/PII data in HDFS locations and Hive tables.
- Implemented multiple functions in PySpark programs, such as using 'unionAll' to combine two DataFrames and then removing duplicates.
- Implemented custom Scala/Java functions in Spark for processing map objects.
- Developed AutoSys scripts to schedule the Kafka streaming and batch jobs.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Used Ambari to monitor node health and job status in Hadoop clusters.
- Used Rally for user story and bug tracking, and Bitbucket to check in and check out code changes.
Big Data Developer & Java Tech Lead
- Responsible for building scalable distributed data solutions using the Hadoop ecosystem and Spark.
- Developed Spark applications in Scala for all batch processing.
- Imported data from different sources into Spark RDDs for processing. Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Used the Spark DataFrame and Spark SQL APIs extensively for all processing.
- Integrated Kafka with Spark Streaming for real-time data processing.
- Managed and reviewed Hadoop log files. Worked with Hive partitioning and bucketing and performed joins on Hive tables.
- Imported data from relational databases into HDFS and exported analyzed data back to them using Sqoop.
- Developed new libraries with a microservices architecture using REST APIs, Spring Boot, Pivotal Cloud Foundry, and AWS.
- Created and configured continuous delivery pipelines in Jenkins for deploying microservices.