
Spark Developer Resume

Boston, MA


  • Around 8 years of professional IT experience, including 4+ years of Big Data ecosystem experience in the ingestion, querying, processing, and analysis of big data.
  • Hands-on experience with Spark Core, Spark SQL, and Spark Streaming using Scala and Python.
  • Good understanding of the Hadoop ecosystem, including Spark, Hive, Pig, HBase, Oozie, Sqoop, and Kafka.
  • Experience with DataFrame and RDD architecture, implementing Spark operations on RDDs and DataFrames, and optimizing transformations and actions in Spark.
  • Experience processing large sets of structured, semi-structured, and unstructured data using Spark and Scala.
  • Expert knowledge of object-oriented programming concepts and solid experience with exception handling, debugging, and tracing.
  • Experience with Hadoop architecture and its components, such as HDFS, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience creating Hive tables, Hive joins, and HQL for querying databases, up to and including complex Hive UDFs.
  • Experience working with Cassandra for fast retrieval of data.
  • Worked with several file formats, such as Avro, Parquet, CSV, JSON, SequenceFile, and ORC.
  • Knowledge of ETL (Extract, Transform, Load) methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.
  • Understanding of several programming languages, including Scala, Python, Java, and C++.
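
The transformation/action distinction mentioned above can be sketched as follows. This is an illustrative example only, assuming a local Spark session; the data and scaling factor are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TransformVsAction {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would run on a cluster.
    val spark = SparkSession.builder()
      .appName("transform-vs-action")
      .master("local[*]")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 100)

    // Transformations are lazy: filter and map build a lineage,
    // but nothing executes yet.
    val evens = rdd.filter(_ % 2 == 0).map(_ * 10)

    // Actions trigger execution of the whole lineage.
    val total = evens.sum()
    println(s"sum of scaled evens = $total")

    spark.stop()
  }
}
```

Understanding which operations are lazy transformations and which are eager actions is the basis for the optimization work described above.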


Data Ingestion: Sqoop, Kafka, Flume, Apache Hadoop ecosystem

Data Processing: Spark, Impala, YARN, MapReduce

Distributed Storage and Computing: HDFS, ZooKeeper, S3

Programming Languages: Scala, SQL, Python

ETL: IBM DataStage, Talend, Ab Initio

Relational Databases: Oracle, MySQL, MS Access

NoSQL Databases: MongoDB, Cassandra, HBase, DynamoDB

Cloud (AWS): EMR, EC2, S3

Build Tools: Jenkins, Maven, Gradle

Version Control: Git, SVN

IDEs: IntelliJ, Eclipse

Operating Systems: Linux, UNIX, Windows

Data Formats: Parquet, SequenceFile, Avro, ORC, CSV, JSON

Monitoring: Ambari, Cloudera Manager


Spark Developer

Confidential, Boston, MA


  • Developed Spark programs for faster data processing than standard MapReduce jobs.
  • Wrote extensive Hive queries to transform data consumed by downstream models.
  • Developed Hive queries to analyze the data and generate end reports used by business users.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases, such as Oracle and Teradata, to HDFS.
  • Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop and Apache Spark, and ingested streaming data into Hadoop using Spark and Scala.
  • Wrote Sqoop scripts for importing and exporting data between RDBMSs and HDFS.
  • Developed Scala scripts using both DataFrames and RDDs in Spark for data aggregation and queries, and moved data with Sqoop.
  • Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
  • Worked with various HDFS file formats, such as Avro, ORC, and SequenceFile, and various compression formats, such as Snappy and gzip.
  • Extensively used Maven for builds, SVN as the code repository, and VersionOne to manage the day-to-day agile development process and track issues and blockers.
  • Used Spark for interactive queries and processing of streaming data, integrating with popular NoSQL databases for huge volumes of data.
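
The Hive-backed aggregation work above can be sketched like this. The table name, columns, and output path are placeholders; the job assumes a Spark deployment with Hive support enabled.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL query existing Hive tables directly.
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // "sales.transactions" is a hypothetical Hive table.
    val txns = spark.table("sales.transactions")

    // Aggregate for downstream models, then persist as Parquet.
    txns.groupBy(col("txn_date"), col("region"))
      .agg(sum("amount").as("total_amount"),
           count("*").as("txn_count"))
      .write.mode("overwrite")
      .parquet("/data/curated/daily_sales")

    spark.stop()
  }
}
```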

Environment: Hadoop, Spark, Hive, Oracle, Maven, Scala, Python, Pig, Sqoop, Oozie, MongoDB, SVN.

Hadoop Developer

Confidential, Reston, VA


  • Ingested incremental batch data from MySQL and Teradata using Sqoop.
  • Ingested real-time data into HDFS using Kafka and Oozie.
  • Worked with Elastic MapReduce (EMR) for data processing on Amazon Web Services (AWS).
  • Worked with Amazon S3 for storage on AWS.
  • Worked with different Spark modules, including Spark Core, Spark SQL, Spark Streaming, Datasets, and DataFrames.
  • Converted files in HDFS from multiple data formats into RDDs and performed data cleansing using RDD operations.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Wrote complex queries and user-defined functions (UDFs) in Scala for custom functionality in Hive.
  • Worked with different HDFS file formats, such as Avro, ORC, Parquet, and SequenceFile.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several job types as well as system-specific jobs such as Java programs and shell scripts.
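
The external-table and partitioning work described above can be sketched with Spark SQL DDL. Database, columns, and HDFS paths are hypothetical; the session assumes Hive support.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive owns only the metadata, so dropping the
    // table leaves the underlying HDFS data intact.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS logs.click_events (
        user_id BIGINT,
        url     STRING
      )
      PARTITIONED BY (event_date STRING)
      STORED AS ORC
      LOCATION '/data/raw/click_events'
    """)

    // Filtering on the partition column prunes to that partition's
    // files instead of scanning the whole table.
    val oneDay = spark.sql(
      "SELECT user_id, url FROM logs.click_events WHERE event_date = '2024-01-15'")
    println(s"rows for 2024-01-15: ${oneDay.count()}")

    spark.stop()
  }
}
```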

Environment: HDFS, Spark, Hive, Sqoop, Kafka, AWS EMR, AWS S3, Oozie, Spark Core, Spark SQL, Maven, Scala, SQL, Linux, YARN, IntelliJ, Agile Methodology

Big Data Developer

Confidential, Pittsburgh, PA


  • Utilized Sqoop, Kafka, Flume, and the Hadoop FileSystem APIs to implement data ingestion pipelines from heterogeneous data sources.
  • Created Amazon S3 storage for data and worked on transferring data from Kafka topics into AWS S3.
  • Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.
  • Implemented Spark scripts in Scala and used Spark SQL to read Hive tables into Spark for faster data processing.
  • Created data pipelines for several events, covering ingestion, aggregation, and loading of consumer-response data from AWS S3 buckets into Hive external tables.
  • Worked on various data formats, including Avro, SequenceFile, JSON, MapFile, Parquet, and XML.
  • Used Apache NiFi to automate data movement between Hadoop components and to convert raw XML data into JSON and Avro.
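
The Kafka-to-S3 streaming pipeline above can be sketched with Spark's Structured Streaming API (one way to implement a Kafka/Spark Streaming job; the broker address, topic, and bucket are placeholders).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-s3")
      .getOrCreate()

    // Subscribe to a Kafka topic; broker and topic names are hypothetical.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "consumer-events")
      .load()

    // Kafka values arrive as bytes; cast to string before transforming.
    val parsed = events.select(
      col("value").cast("string").as("payload"),
      col("timestamp"))

    // Each micro-batch lands as Parquet files in the S3 bucket;
    // the checkpoint location makes the stream restartable.
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "s3a://my-bucket/consumer-events/")
      .option("checkpointLocation", "s3a://my-bucket/checkpoints/consumer-events/")
      .start()

    query.awaitTermination()
  }
}
```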

Environment: Hadoop, HDFS, AWS, Scala, Kafka, MapReduce, YARN, Spark, Pig, Hive, Python, Java, NiFi, HBase, IMS Mainframe, Maven.

Java Developer



  • Designed and developed the user interface using HTML5, CSS, JavaScript, jQuery, AJAX, and JSON.
  • Implemented Spring Security features using AOP interceptors for authentication.
  • Developed Spring Framework-based RESTful web services for handling and persisting requests, and Spring MVC for returning responses to the presentation tier.
  • Used multithreading to improve overall performance and applied the Singleton design pattern in a Hibernate utility class.
  • Wrote complex SQL queries to pull data from different tables to build reports.
  • Used Log4j for error handling, to monitor the status of the service, and to filter bad loans.
  • Performed development and debugging using the Eclipse IDE.

Environment: Java, Hibernate, Spring, HTML5, CSS, JavaScript, jQuery, AJAX, JSON, WebLogic, Oracle, PL/SQL.

Java/ J2EE Developer



  • Developed the application using the Spring Framework, leveraging Model-View-Controller (MVC) architecture, Spring Security, and the Java API.
  • Implemented design patterns such as Singleton, Factory, and MVC.
  • Deployed the applications on IBM WebSphere Application Server.
  • Worked on JavaScript, CSS style sheets, and jQuery.
  • Worked one-on-one with the client to develop the layout and color scheme for their website and implemented the final interface design with HTML5/CSS3 and JavaScript.
  • Used advanced HTML5, JavaScript, jQuery, CSS3, and pure CSS (table-less) layouts.
  • Wrote SQL queries to retrieve data from the Oracle and MySQL databases.

Environment: Java, Oracle 11g Express, CVS, Struts, Spring, HTML, CSS, JavaScript, Apache Tomcat, Eclipse IDE, Maven, JUnit.
