
Spark/Hadoop Developer Resume


Dover, NH


  • Spark/Hadoop developer with 8+ years of professional IT experience, including 3+ years as a Big Data consultant working with Hadoop ecosystem components for data ingestion, data modeling, querying, processing, storage analysis, data integration, and implementation of enterprise-level Big Data systems.
  • Excellent hands-on experience in data extraction, transformation, and loading (ETL), data analysis, and data visualization on the Cloudera platform (Spark, Scala, HDFS, Hive, Sqoop, Kafka, Oozie).
  • Developed end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization according to requirements.
  • Well versed in developing and implementing Spark programs in Scala on Hadoop to process structured and semi-structured data.
  • Used Spark for interactive queries, streaming data processing, and integration with NoSQL databases for large data volumes.
  • Extracted data from heterogeneous sources such as flat files, MySQL, and Teradata into HDFS using Sqoop, and exported processed data from HDFS back to relational databases.
  • Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries, and incorporating complex UDFs into business logic.
  • Experience working with Text, SequenceFile, XML, Parquet, JSON, ORC, and Avro file formats, as well as clickstream log files.
  • Experienced in migrating ETL transformations to Spark jobs and Pig Latin scripts.
  • Experience in moving streaming data from different sources into HDFS and HBase using Apache Kafka and Flume.
  • Experience using the Oozie scheduler and Unix shell scripting to implement cron jobs that execute different kinds of Hadoop actions.
  • Good experience in optimization and performance tuning of Spark jobs and Pig and Hive queries.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, data mining, and advanced data processing. Experience optimizing ETL workflows.
  • Excellent understanding of Spark architecture and its components: SparkContext, APIs, RDDs, Spark SQL, DataFrames, Spark Streaming, and MLlib.
  • Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode, and DataNode; the YARN architecture and its daemons (NodeManager, ResourceManager, and ApplicationMaster); and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MRv1/MRv2, HDFS, Hive, Oozie, Sqoop, Hue, Pig, Flume, HBase, and ZooKeeper on the Cloudera distribution.
  • Hands-on experience using the Hue web interface to interact with Hadoop components.
  • Good understanding of and experience with Agile and Waterfall Software Development Life Cycle (SDLC) methodologies.
  • Highly motivated self-learner with a positive attitude, a willingness to learn new concepts, and readiness to accept challenges.


Languages: Scala, Python, SQL, Java

Big Data: Spark and Scala, Hadoop ecosystem components (HDFS, Hive, Sqoop, Impala, Flume, MapReduce, Pig), Cloudera Hadoop Distribution CDH 5.8.2

Databases: NoSQL: HBase, Cassandra; SQL: DB2, MySQL, Teradata

Schedulers: Oozie, CA7-ESP

OS: Windows 7/8.1/10, Unix, Linux

Other Tools: Apache Zeppelin, Solr, Hue, IntelliJ IDEA, Eclipse, DbVisualizer, Maven, ZooKeeper


Confidential, Dover, NH

Spark/Hadoop Developer

Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.0, Apache Zeppelin, MySQL, CDH 5.8.2, IntelliJ IDEA, Hive, HDFS, YARN, MapReduce, Sqoop 1.4.3, Ivy 2.0, Flume, Solr, Unix Shell Scripting, Python 2.6, Apache Kafka.


  • Wrote Spark applications in Scala to perform data cleansing, validation, transformation, and summarization according to requirements.
  • Loaded data into Spark RDDs and performed in-memory computations to generate the required output.
  • Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze operational data.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Tuned Spark applications to improve performance.
  • Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
  • Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large data volumes.
  • Performance-tuned Spark jobs by adjusting configuration properties and using broadcast variables.
  • Implemented real-time data streaming using Spark with Kafka; responsible for handling streaming data from web server console logs.
  • Performance-tuned long-running Greenplum user-defined functions: used temporary tables to break the code into smaller sub-parts, loading each into a temp table and later joining it with the corresponding tables. Modified table distribution keys based on data granularity and primary key column combinations.
  • Worked with file formats such as Text, SequenceFile, Avro, Parquet, ORC, JSON, XML, and flat files using MapReduce programs.
  • Developed a daily process for incremental imports of data from DB2 and Teradata into Hive tables using Sqoop.
  • Analyzed existing SQL scripts and designed solutions to implement them in Scala.
  • Solved performance issues in Hive and Pig scripts based on an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Worked with cross-functional consulting teams within the data science and analytics group to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Extensively used HiveQL to query data in Hive tables and loaded data into HBase tables.
  • Worked extensively with partitioning, dynamic partitioning, and bucketing in Hive; designed both managed and external tables; and optimized Hive queries.
  • Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
  • Designed Oozie workflows for job scheduling and batch processing.

Confidential, Portsmouth, NH

Big Data/Hadoop Developer

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, DB2, Flume, ESP, Oozie, CDH 5.6.1, Maven, Unix Shell Scripting.

Roles and Responsibilities:

  • Developed MapReduce programs for data extraction, transformation, and aggregation, and supported MapReduce jobs running on the cluster.
  • Implemented solutions for ingesting data from various sources and processing it using Big Data technologies such as Hive, Spark, Pig, Sqoop, HBase, and MapReduce.
  • Implemented combiners and partitioners and used the Distributed Cache to improve the performance of MapReduce jobs.
  • Wrote Pig scripts that compile into MapReduce jobs and performed ETL procedures on data in HDFS.
  • Handled Avro data files by passing schemas into HDFS using Avro Tools and MapReduce.
  • Optimized MapReduce algorithms using combiners and partitioners and worked on application performance optimization for an HDFS cluster.
  • Wrote Hive queries to produce a consolidated view of the telematics data.
  • Orchestrated Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Used Flume to collect, aggregate, and store web log data from sources such as web servers and push it to HDFS.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run as MapReduce jobs in the backend.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Tested and debugged MapReduce jobs using the MRUnit framework and optimized MapReduce jobs.
  • Troubleshot errors in shell scripts, Hive, and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive, and Apache Pig.
  • Created Hive external tables on the MapReduce output, then applied partitioning and bucketing on top of them.
  • Developed Pig scripts to migrate the existing home loans legacy process to Hadoop, with the data fed back to retail legacy mainframe systems.

Confidential, Portsmouth, NH

Big Data/Hadoop Developer


  • Wrote MapReduce programs, HiveQL queries, and Pig scripts.
  • Worked on data ingestion with Sqoop and Flume, and on analytics with Splunk.
  • Worked with the HBase NoSQL database.
  • Performed end-to-end development using Hive, Pig, Sqoop, HBase, Oozie, and the related Hadoop stack.
  • Designed Oozie workflows for job scheduling and batch processing.
  • Worked on the Cloudera distribution of Hadoop.
  • Provided analytical dashboards and reports to directors, service managers, and stakeholders, and presented the system's work progress.
  • Wrote queries using the Impala query engine for faster results.
  • Solid understanding of and experience with extract, transform, load (ETL) methodologies.
  • Involved in setup and maintenance of distributed Hadoop clusters for dev, staging, and production.
  • Managed onsite and offshore teams and the on-call rotation.

Environment: Hadoop, Cloudera, MapReduce, Hive, Impala, Pig, HBase, Sqoop, Flume, Oozie, Java, Maven, Splunk, RHEL and UNIX Shell


Java/Mainframes Developer

Roles and Responsibilities:

  • Involved in technical workflow discussions and documentation, mapping documents, and data model design.
  • Involved in the analysis, design, development, and testing phases of new feature implementation.
  • Analyzed requirements, designed class and sequence diagrams using UML, and prepared high-level technical documents.
  • Implemented server-side functionality using Java, servlets, and Hibernate, and developed business logic in Java using the J2EE API.
  • Worked on various solutions to accelerate the business turnaround time by anticipating the risks involved and increasing the acceptability rate.
  • PL/SQL and SQL coding: stored procedures, functions, cursors, triggers, constraints, views, and materialized views.
  • Wrote analytical and complex SQL using joins, cursors, and exception handling.
  • Created scripts using TOAD to build new tables, views, and queries for new application enhancements.
  • Created indexes on tables for faster data retrieval and improved database performance.
  • Loaded data using PL/SQL and SQL*Loader, calling UNIX scripts to download and manipulate files.
  • Worked closely with the business team on requirements gathering and risk analysis.
  • Worked on business process management to identify real-time business rules.
  • Worked on BRE business logic and wrote the corresponding code in Java.
  • Implemented the Struts MVC framework.
  • Collaborated effectively with other engineers, architects, managers, and product managers to solve complex problems spanning multiple projects.
  • Involved in the full SDLC, including requirements gathering, high-level design, detailed design, data design, coding, testing, and creation of functional documentation.
  • Created Action Forms and Action classes for the modules.
  • Wrote build scripts using Apache Ant.
  • Worked on defects assigned in Bugzilla.

Environment: COBOL, Java, JCL, DB2, ENDEVOR, JDK 1.6, JSP, Struts framework (MVC model), Apache Tomcat 6.0, Eclipse, Linux.
