Spark Developer Resume

Charlotte, NC


  • Around 6 years of IT experience across the Software Development Life Cycle (analysis, design, development, testing, deployment and support) using Waterfall and Agile methodologies.
  • Around 4 years of experience in data analysis using Hadoop ecosystem components (Spark, HDFS, MapReduce, Pig, Sqoop, Hive, Cassandra and HBase) in the financial, retail and healthcare sectors.
  • Experience with Hadoop components such as HDFS, MapReduce, Job Tracker, Name Node, Data Node, Task Tracker and Apache Spark.
  • Hands-on experience in capturing data from existing relational databases (Oracle, MySQL, SQL and Teradata) that provide SQL interfaces using Sqoop.
  • Hands-on experience with Sequence files, RC files, Avro and Parquet, and with Combiners, Counters, dynamic partitions and bucketing for best practice and performance improvement.
  • Skilled in developing Java MapReduce programs using the Java API, and in using Hive and Pig to perform data analysis, data cleaning and data transformation.
  • Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design to load data into the Hadoop environment.
  • Experience in working with Cloudera and Hortonworks Hadoop distribution.
  • Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
  • Implemented Sqoop for large dataset transfer between Hadoop and RDBMS.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Worked on different file formats (ORCFILE, Parquet, Avro, TEXTFILE) and different compression codecs (GZIP, SNAPPY, LZO, BZIP).
  • Experience in composing shell scripts to dump the shared information from MySQL servers to HDFS.
  • Proficiency in using Apache Sqoop to import and export data from other databases to HDFS and vice versa.
  • Performed ETL operations using Pig to join, clean, aggregate and analyze data.
  • Worked with Maven, Ant, sbt and Gradle for the build process.
  • Extensive experience in importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Experience with the coordination service ZooKeeper and the workflow scheduler Oozie to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Knowledge of creating Impala views on top of Hive tables for faster access when analyzing data.
  • Integrated BI tools such as Tableau with Impala and analyzed the data.
  • Experienced in performance tuning and real-time analytics in both relational database and NoSQL database (HBase).
  • Worked on MongoDB using CRUD (Create, Read, Update and Delete), indexing, replication and sharding features.
  • Experience with NoSQL databases like HBase, MongoDB and Cassandra.
  • Hands on Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive.
  • Experienced in integrating Kafka with Spark streaming for high speed data processing.
  • Experience in collecting log data from different sources (web servers and social media) using Flume and Kafka, and storing it in HDFS for processing with MapReduce jobs.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
  • Exposure in working with data frames.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
  • Profound experience in working with Cloudera (CDH4 & CDH5), Hortonworks and Amazon EMR Hadoop distributions on multi-node clusters.
  • Exposure to simplifying and automating big data integration with graphical tools and wizards that generate native code using Talend (ETL).
  • Knowledge in importing results into visualization tool Tableau to create dashboards.
  • Very good understanding and working knowledge of Object-Oriented Programming (OOP), J2SE, multithreading in Core Java, HTML, Servlets, JSP and JDBC.
  • Experience in working with different relational databases like MySQL, MS SQL and Oracle.
  • Strong experience in database design and in writing complex SQL queries and stored procedures.
  • Expertise in various phases of software development, including analysis, design, development and deployment of applications using Servlets, JSP, Java Beans, Struts, Spring Framework and JDBC.
  • Experience with development environments such as Eclipse and NetBeans.
  • Involved in Agile methodologies, daily Scrum meetings and sprint planning.
  • Good analytical, communication and problem-solving skills, and eager to learn new technical and functional skills.


Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, HUE, Oozie, ZooKeeper, Solr, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume.

NoSQL Databases: HBase, Cassandra, and MongoDB

Hadoop Distributions: Cloudera, Hortonworks

Programming Languages: Java, C/C++, Scala, Pig Latin, HiveQL.

Scripting Languages: Shell Scripting, JavaScript

Databases: MySQL, Oracle, Teradata, DB2

Build Tools: Maven, Ant, Gradle, sbt

Reporting Tool: Tableau

Version control Tools: SVN, Git, GitHub

Cloud: AWS, Azure

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

Web Design Tools: HTML, AJAX, JavaScript, jQuery, CSS and JSON.

Operating Systems: Windows 10/8/Vista/XP

Development IDEs: NetBeans, Eclipse IDE, Python (IDLE)


Spark Developer

Confidential, Charlotte, NC

  • Developed a data pipeline using Kafka, Sqoop and Hive to ingest customer transactional data and behavioral data into HDFS for processing and analysis.
  • Involved in writing Sqoop scripts for importing and exporting data into HDFS and Hive.
  • Developed scripts to automate end-to-end data management and synchronization across all clusters.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
  • Responsible for importing and exporting streaming data into HDFS using stream processing platforms such as Flume and the Kafka messaging system.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Extensively worked on Spark streaming and Apache Kafka to fetch live stream data.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Worked on SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Scala.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Imported results into visualization BI tool Tableau to create dashboards.
  • Worked in an Agile methodology and used JIRA to maintain project stories.
  • Involved in requirements gathering, design, development and testing.
  • Developed services to run MapReduce jobs as required.
  • Responsible for managing data coming from different sources.
  • Developed business logic using Scala.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Spark Streaming, Kafka.

Spark/Hadoop Developer

Confidential, Hartford, CT

  • Extracted the data from RDBMS into HDFS using Sqoop.
  • Loaded and transformed large sets of structured and semi-structured data using Spark SQL and the DataFrame API on Spark clusters.
  • Developed Spark applications using Scala as per the business requirements.
  • Used Spark DataFrame operations to perform required validations on the data.
  • Implemented Hive Partitioning and bucketing for data analytics.
  • Responsible for performing sort, join, aggregation, filter and other transformations on the datasets.
  • Created Hive tables and used them for data analysis to meet the requirements.
  • Involved in creating views for data security.
  • Involved in the performance tuning of Spark applications.
  • Worked on performance tuning in Hive.
  • Involved in creating workflows to run Sqoop jobs.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
  • Experienced in using version control tools such as GitHub to share code snippets among team members.
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Developed UDFs for Hive and wrote complex Hive queries for data analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Used ETL processes to load data from flat files into the target database by applying business logic on transformation mapping for inserting and updating records when loaded.
  • Worked on data serialization, converting complex objects into sequences of bits using the Avro and ORC file formats.
  • Designed and developed Hive tables to store staging and historical data.
  • Implemented Hortonworks NiFi (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
  • Experience in using ORC file format with Snappy compression for optimized storage of Hive tables.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Involved in migrating MapReduce jobs into Spark jobs.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Integrated BI tools such as Tableau with Impala and analyzed the data.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Pig, Hive, Oozie, Scala, Spark, Spark SQL, NiFi, Hortonworks.

Hadoop Developer


  • Developed Hive queries for extracting data and sending them to clients.
  • Created Scala programs to develop reports for business users.
  • Created Hive UDFs in Scala for formatting data.
  • Performed distributed programming with Spark, specifically in Scala.
  • Performed transformation and analysis in Hive/Pig and parsed raw data using MapReduce and Spark.
  • Worked on capturing transactional changes in the data using MapReduce and HBase.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop and Pig Latin.
  • Familiar with AWS components such as EC2 and S3.
  • Worked with Sqoop import and export functionalities to handle large data set transfers between the DB2 database and HDFS.
  • Worked on ingesting data from different sources.
  • Supported multiple application extracts coming out of Big Data Platform.
  • Followed agile methodology during project delivery.
  • Knowledge of CodeHub and Git.
  • Coordinated with the offshore team to complete tasks.
  • Understanding of the ServiceNow tool for submitting change requests and incidents for application deployments.

Environment: MapR, Hive, Pig, Spark, Scala, MapReduce, UNIX scripting, Talend.

Java Developer


  • Interacted with the business team for detailed specifications on the requirements and for issue resolution.
  • Developed user interfaces using HTML, XML, CSS, JSP, JavaScript and Struts Tag Libraries and defined common page layouts using custom tags.
  • Developed client-side validations using JavaScript.
  • Implemented Struts MVC Paradigm components such as Action Mapping, Action class, Action Form, Validation Framework, Struts Tiles and Struts Tag Libraries.
  • Involved in the development of the front end of the application using the Struts framework and its interaction with controller Java classes.
  • Created and enhanced the domain model using XSD and Hibernate.
  • Provided development support for System Testing, User Acceptance Testing and Production and deployed application on JBoss Application Server.
  • Wrote and executed efficient SQL queries (CRUD operations), JOINs on multiple tables, to create and test sample test data in Oracle Database using Oracle SQL Developer.
  • Used CVS for check-in, check-out of files to control versions of files.
  • Used Eclipse as an IDE.
  • Used HP Quality Center to track activities and defects.
  • Implemented logging with Log4j.
  • Used Ant to compile and build the project.
  • Developed style sheets to give the pages a dynamic look; extensively involved in unit testing and system testing using JUnit, and in critical bug fixing.
  • Utilized the base UML methodologies and Use cases modeled by architects to develop the front-end interface. The class, sequence and state diagrams were developed using Visio.

Environment: Java, Struts 1.2, Hibernate 3.0, JSP, JavaScript, HTML, XML, Oracle, Eclipse, JBoss Application Server, ANT, CVS, and SQL Developer.