Spark/Big Data Developer Resume

Irving, TX


  • 6 years of overall IT experience, including 2+ years of Hadoop/Big Data experience and 4 years of Java programming across the entire Software Development Life Cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java and J2EE technologies.
  • Experience working with the Cloudera and Hortonworks distributions.
  • Experience in dealing with large data sets and making performance improvements.
  • Experience implementing Spark integrated with the Hadoop ecosystem.
  • Experience using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
  • Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Experience in using different build tools like SBT and Maven.
  • Implemented Spark Streaming for fast data processing.
  • Experience in designing and developing applications in Spark using Scala.
  • Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Experience managing and scheduling Spark jobs on a Hadoop cluster using Oozie.
  • Experience in data cleansing using Spark map and filter functions.
  • Experience in developing and debugging Hive queries.
  • Experience in performing read and write operations on HDFS.
  • Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Good experience importing and exporting data to and from Hive and HDFS with Sqoop.
  • Experience in creating Hive tables and loading data from different file formats.
  • Experience in processing data using HiveQL for data analytics.
  • Extended Hive core functionality by writing UDFs for data analysis.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Experience in dealing with the different file formats like Sequence files, Avro and Parquet.
  • Experience in using the Producer and Consumer APIs of Apache Kafka.
  • Extensively used Apache Flume to collect the logs and error messages across the cluster.
  • Proficient with version control tools such as GitHub and SVN.
  • Worked with MySQL and Oracle 11g databases.
  • Strong knowledge of UNIX/Linux commands.
  • Strong knowledge of the Python scripting language.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.


Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache ZooKeeper, Cassandra.

Hadoop Distributions: Cloudera, Hortonworks.

Programming Languages: Scala, Python, Java.

Scripting Languages: Angular 2, JavaScript.

Build Tools: Maven, SBT.

Version Control Tools: Git, SVN.

Cloud: AWS.

Databases: MySQL, Oracle 10g, 11g.

NoSQL Databases: HBase, Cassandra.

Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), Mac OS.

Development Tools: IntelliJ IDEA, Eclipse, NetBeans.


Confidential, Irving, TX

Spark/Big Data Developer


  • Worked on the Cloudera distribution, CDH version 5.13.
  • Involved in ingesting weblog data into HDFS using Kafka.
  • Processed JSON data with Spark SQL.
  • Cleansed the data to get it into the desired format.
  • Involved in writing Spark SQL DataFrames to Parquet files.
  • Involved in tuning Spark jobs for optimal efficiency.
  • Wrote Scala functions, procedures, constructors, and traits.
  • Created Hive tables to load the transformed data.
  • Implemented partitioning and bucketing in Hive for easy data classification.
  • Involved in analyzing data by writing HiveQL queries for faster data processing.
  • Involved in working with Sqoop for loading the data into RDBMS.
  • Created a data pipeline using Oozie that runs on a daily basis.
  • Involved in persisting metadata into HDFS for further data processing.
  • Loaded data from Linux filesystems to HDFS and vice versa.
  • Involved in creating, partitioning, and bucketing tables, creating UDFs, and fine-tuning in Hive.
  • Loaded the cleaned data into the Hive tables and performed analysis based on the requirements.
  • Responsible for performing sort, join, aggregation, filter, and other transformations on the datasets.
  • Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.

Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.

Confidential, Farmington, CT

Big Data/Hadoop Developer


  • Worked on the Hortonworks HDP Enterprise distribution.
  • Worked on large sets of structured and semi-structured data.
  • Involved in copying large volumes of data from Amazon S3 buckets to HDFS using Flume.
  • Used Spark SQL with Scala to create DataFrames and perform transformations on them.
  • Involved in working with Avro files using Spark SQL.
  • Wrote UDFs in Spark SQL using Scala.
  • Performed data aggregation operations using Spark SQL queries.
  • Configured Spark Streaming to receive data from Kafka and store the streamed data to HDFS using Scala.
  • Implemented Hive partitioning and bucketing for data analytics.
  • Worked on performance tuning in Hive.
  • Extensively used the Maven build tool to build the project code.
  • Used Git as the version control system.
  • Involved in working with Sqoop to export the data from Hive to S3 buckets.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.

Environment: Apache Spark, Apache Flume, Amazon S3, Apache Sqoop, Apache Oozie, Apache Kafka, Hive, Apache.


Java Developer


  • Involved in requirement collection and analysis.
  • Involved in developing front-end screens using JSP, Struts, and HTML.
  • Involved in implementing persistent data management using JDBC.
  • Involved in problem analysis and coding.
  • Designed and coded screens involving complex calculations on various data windows accessing different tables in the Oracle database.
  • Developed screens for the Patient Registration, Inventory of Medicines, Billing of Services, and Asset modules.
  • Used the JSF framework to develop user interfaces with JSF UI components, validators, events, and listeners.
  • Created several pieces of the JSF engine, including value bindings, bean discovery, method bindings, event generation, and component binding.
  • Involved in unit testing, integration testing, SoapUI testing, smoke testing, system testing, and user acceptance testing of the application.
  • Wrote stored procedures and database triggers.
  • Involved in debugging and troubleshooting production and environment issues.
  • Performed unit testing.

Environment: JSP, Servlets, SQL, PL/SQL, WebSphere Application Server, Oracle 9i, JavaScript, Windows XP, HTML, Unix shell script, JUnit.


Java Developer


  • Involved in the complete development, testing and maintenance of the application.
  • Designed UI Screens using Servlets, JavaScript, CSS, Ajax, DHTML, XSL, XHTML and HTML.
  • Implemented design patterns such as Singleton, Factory, Facade, Prototype, Decorator, Business Delegate, and MVC.
  • Created Session Beans to handle the business logic associated with the Inspection.
  • Developed and deployed various Entity EJBs and session EJBs.
  • Involved in the object-oriented requirement analysis phase of the project to gather business logic requirements.
  • Developed the GUI using JSP.
  • Coded JSP pages for the External Application (EXA) using a custom tag library that provides the standard tags used in the application.
  • Involved in designing application based on MVC Architecture.
  • Developed Session beans to implement the core Business logic.
  • Designed use case diagrams, class diagrams, and sequence diagrams using Microsoft Visio.
  • Involved in coding the helper classes for better data exchange between different layers.
  • Provided production support by fixing bugs.
  • Performed unit testing, system testing, and user acceptance testing.
  • Used CVS for version control.

Environment: Java, Servlets, JSP, CSS3, XML, DHTML, EJB, JavaScript, AJAX, DB2, Web Services, WebSphere Application Server, Log4j, CVS, JUnit, IBM RAD, UML.
