
Sr. Big Data/Spark Developer Resume

Philadelphia, PA


  • Currently working in a Big Data capacity with the Hadoop ecosystem across internal and cloud-based platforms
  • 6+ years of experience as a Big Data/Hadoop developer with skills in analysis, design, development, testing and deployment of various software applications
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa
  • Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL)
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS
  • Experience in developing custom UDFs for Pig and Apache Hive to incorporate Java methods and functionality into Pig Latin and HiveQL
  • Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformations, pre-processing and analysis
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2 web services, which provide fast and efficient processing of Teradata Big Data Analytics
  • Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig
  • Strong exposure to Web 2.0 client technologies using JSP, JSTL, XHTML, HTML5, DOM, CSS3, JavaScript and AJAX
  • Experience working with cloud platforms, setting up environments and applications on AWS, automation of code and infrastructure (DevOps) using Chef and Jenkins
  • Extensive experience in developing Spark Streaming jobs with RDDs (Resilient Distributed Datasets), using Spark SQL as required
  • Experience in developing Java MapReduce jobs for data cleaning and data manipulation as required for the business
  • Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop and ZooKeeper
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate
  • Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls
  • Installation, configuration and administration experience on Big Data platforms, including Cloudera Manager (Cloudera) and MCS (MapR)
  • Extensive experience in working with Oracle, MS SQL Server, DB2 and MySQL
  • Experience working with Hortonworks and Cloudera environments
  • Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required
  • Excellent experience in installing and running various Oozie workflows and automating parallel job executions
  • Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib
  • Extensive development experience with IDEs such as Eclipse, NetBeans, IntelliJ and STS
  • Strong experience in core SQL and RESTful web services (RWS)
  • Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters
  • Good experience in Tableau for Data Visualization and analysis on large datasets, drawing various conclusions
  • Experience in using Python, R for statistical analysis
  • Good knowledge of coding using SQL, SQL*Plus, T-SQL, PL/SQL and stored procedures/functions
  • Worked with Bootstrap, AngularJS, Node.js, Knockout, Ember and the Java Persistence API (JPA)
  • Experienced in developing applications using Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JNDI, JMS, SOAP, REST and Grails
  • Well versed in working with relational database management systems such as Oracle 12c, MS SQL Server and MySQL
  • Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.


Confidential, Philadelphia, PA

Sr. Big Data/Spark Developer


  • Analyzed business requirements and prepared detailed specifications that follow the project guidelines required for development
  • Used Sqoop to import data from relational databases such as MySQL and Oracle
  • Involved in importing structured and unstructured data into HDFS
  • Responsible for fetching real-time data using Kafka and processing using Spark and Scala
  • Worked on Kafka to import real-time weblogs and ingested the data to Spark Streaming
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API
  • Worked on Hive to implement Web Interfacing and stored the data in Hive tables
  • Migrated MapReduce programs into Spark transformations using Spark and Scala
  • Experienced with SparkContext, Spark SQL and Spark on YARN
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS
  • Involved in data querying and summarization using Hive and Pig, and created UDFs, UDAFs and UDTFs
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters
  • Used ZooKeeper extensively as a backup server and job scheduler for Spark jobs
  • Developed traits, case classes and other constructs in Scala
  • Developed Spark scripts using Scala shell commands as per the business requirement
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances
  • Experienced in loading real-time data into NoSQL databases such as Cassandra
  • Well versed in data manipulation and compaction in Cassandra
  • Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language)
  • Worked on connecting the Cassandra database to the Amazon EMR File System for storing the database in S3
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
  • Deployed the project on Amazon EMR with S3 connectivity for setting backup storage
  • Well versed in using Elastic Load Balancer with Auto Scaling on EC2 servers
  • Configured workflows that involve Hadoop actions using Oozie
  • Used Python for pattern matching in build logs to format warnings and errors
  • Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.

Confidential, NJ

Hadoop Developer


  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop
  • Designed and developed a flattened view (merged and flattened dataset) that de-normalizes several datasets in Hive/HDFS and consists of key attributes consumed by the business and other downstream systems
  • Worked on NoSQL (HBase) to support enterprise production, loading data into HBase using Impala and Sqoop
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS
  • Transferred data using Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting
  • Architected, designed and developed Hadoop ETL using Kafka
  • Supported REST-based ETL Hadoop software in higher environments such as UAT and production
  • Worked in AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database
  • Configured Hive metastore with MySQL, which stores the metadata for Hive tables
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
  • Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop
  • Implemented MapReduce jobs in Hive by querying the available data
  • Tuned the performance of Hive queries and MapReduce programs for different applications
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster
  • Developed Spark code using Scala, Spark SQL and Spark Streaming for faster testing and processing of data
  • Used Cloudera Manager for installation and management of Hadoop Cluster
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Worked on MongoDB, HBase (NoSQL) databases which differ from classic relational databases
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala
  • Integrated Kafka with Spark Streaming for high throughput and reliability
  • Worked on Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.


Jr. Java Developer


  • Involved in the analysis, design, implementation, and testing of the project
  • Implemented the presentation layer with HTML, XHTML and JavaScript
  • Developed web components using JSP, Servlets, and JDBC
  • Designed tables and indexes
  • Worked extensively with JUnit to test the application code for client-server data transfer
  • Developed and enhanced products in design and in alignment with business objectives
  • Used SVN as a repository for managing/deploying application code
  • Involved in system integration and user acceptance testing
  • Developed front end using JSTL, JSP, HTML, and JavaScript
  • Wrote complex SQL queries and stored procedures
  • Involved in fixing bugs and unit testing with test cases using JUnit
  • Actively involved in system testing
  • Involved in implementing the service layer using the Spring IoC module
  • Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product

Environment: Java, JSP, JSTL, HTML, JavaScript, Servlets, JDBC, MySQL, JUnit, Eclipse IDE.
