
Spark/Scala Developer Resume

Longwood, FL


  • Over 8 years of professional experience in project development, implementation, deployment and maintenance with Java/J2EE, Hadoop and Spark-related technologies on Cloudera and Hortonworks.
  • Hadoop Developer with 5 years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase etc.
  • Experience in installation, configuration, management and deployment of Hadoop Cluster, HDFS, Map Reduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase and Zookeeper.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Spark.
  • Knowledge of Hadoop-related ecosystems as data storage and retrieval systems.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Profound experience in creating real time data streaming solutions using Apache Spark/Spark Streaming, Kafka.
  • Good knowledge of the Spark ecosystem and Spark architecture.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, and extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing.
  • Good hands-on experience with full life cycle implementation using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Experience in handling messaging services using Apache Kafka.
  • Experience with migrating data to and from RDBMS into HDFS using Sqoop.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.
  • Experience working with the Java HBase API for ingesting processed data into HBase tables.
  • Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce, Hive and Pig jobs.
  • Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Extensive knowledge of programming with Resilient Distributed Datasets (RDDs).
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
  • Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.
  • Extensive experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading in Talend.
  • Used Talend for ETL processing based on business needs and extensively used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experience with Talend and Informatica/Data Exchange.
  • Solid experience in developing workflow using Oozie for running Map Reduce jobs and Hive Queries.
  • Experience in managing and reviewing Hadoop log files.
  • Responsible for performing advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Experience with Apache Solr's out-of-the-box replication, distribution, rebalancing and fault tolerance.
  • Experience in architectural patterns like Apache Lucene search development, full-text search development, cross-platform, High Performance Indexing and ranked searching.
  • Good experience in working with cloud environment like Amazon Web Services (AWS) EC2 and S3.
  • Experienced in Java application development, client/server applications and Internet/intranet-based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
  • Profound knowledge of core Java concepts like exceptions, collections, data structures, I/O, multi-threading, serialization and deserialization.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Expert at creating UML diagrams (use case, activity, class and sequence diagrams) using Microsoft Visio and IBM Rational Rose.
  • Good experience in development of software applications using Core Java, JDBC, Servlets, JSPs, Spring and RESTful Web Services.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Experience in Requirements Gathering/Analysis, Design, Development, Versioning, Integration, Documentation, Testing, Build and Deployment.
  • Efficient in packaging & deploying J2EE applications using ANT, Maven & Cruise Control on WebLogic, WebSphere & JBoss.
  • Experience in using Jenkins and Maven to compile and package applications and deploy them to the application servers.
  • Deployment, distribution and implementation of enterprise applications in J2EE environments.
  • Good understanding of Bootstrap, Spring REST and integration.
  • Strong knowledge of version control systems like SVN, Git and CVS.
  • Familiar with multiple software systems; quick to learn new technologies and adapt to new environments; self-motivated, focused team player with excellent interpersonal, technical and communication skills.


Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios, Splunk, Elasticsearch, Kibana

Hadoop Distributions: Cloudera, Hortonworks, AWS

Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS, Red Hat

Programming Languages: C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting

Java Technologies: JSP, Servlets, Spring, Hibernate, Maven

Databases: MS-SQL, Oracle, MS-Access, NoSQL, MySQL

Reporting Tools/ETL Tools: Tableau, Informatica, DataStage, Talend, Pentaho, Power View

Methodologies: Agile/Scrum, Waterfall, DevOps

Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)


Confidential, Longwood, FL

Spark/Scala Developer


  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Worked on the Spark SQL for analyzing the data.
  • Used Scala to write code for all Spark use cases.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, pair RDDs and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked in Spark SQL on different data formats like JSON and Parquet.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Familiar with the Hadoop open source stack, including YARN, Kafka and Hive.
  • Used Kafka to ingest data into the Spark engine.
  • Worked on streaming pipeline that uses Spark to read data from Kafka, transform it and write it to HDFS.
  • Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven and ZooKeeper.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in preparing JILs for AutoSys jobs.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
  • Used Oozie for automating the end to end data pipelines and Oozie coordinators for scheduling the work flows.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Installed and configured Hadoop on multiple nodes on AWS EC2.
  • Implemented Hadoop on AWS EC2, using a few instances to gather and analyze log files.
  • Implemented a Continuous Delivery pipeline with Docker, Jenkins, GitHub, Nexus, Maven and AWS.

Environment: Spark 1.5.2, Spark SQL, Java 1.8, Hive, HDFS, HQL, YARN, HBase, MapReduce, Sqoop, Flume, Oozie, Kafka, Scala, AWS, Oracle 12c.

Confidential, Chicago, IL

Hadoop Developer


  • Worked on a live 60-node Hadoop cluster running CDH5.4.4, CDH5.2.0 and CDH5.2.1.
  • Worked on a Hadoop cluster using different big data analytics tools including Kafka, Pig, Hive and MapReduce.
  • Developed simple to complex MapReduce streaming jobs in Python, implemented using Hive and Pig.
  • Implemented data access jobs through Pig, Hive, HBase (0.98.0), Storm (0.91)
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Altered existing Scala programs to enhance performance and obtain partitioned results using Spark.
  • Used the Spark SQL interfaces for Scala and Python, which automatically convert an RDD of case classes to a SchemaRDD.
  • Used Spark SQL to read and write tables stored in Hive.
  • Involved in importing real-time data to Hadoop using Kafka and implemented a daily Oozie job.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers and brokers.
  • Experience in data migration from RDBMS to Cassandra.
  • Created data-models for customer data using the Cassandra Query Language.
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Worked on processing unstructured data using Pig and Hive.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Used Impala to read, write and query the Hadoop data in HDFS or HBase.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.
  • Responsible for backing up and restoring the Tableau repository.
  • Converted ETL operations to Hadoop system using Pig Latin operations, transformations and functions.
  • Experience in a Talend migration project from one version to another.
  • Worked with the majority of Talend components and designed simple ETL jobs to handle complex business logic.
  • Knowledge of error handling and Performance tuning in Talend and SQL.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Exported the result set from Hive to MySQL using Shell Scripts.
  • Actively involved in code review and bug fixing for improving the performance.
  • Experience with Cassandra (DataStax distribution).
  • Collaborated with development teams on the architecture and deployment of NoSQL database systems like Cassandra.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Storm, Kafka, Linux, Hortonworks distribution, Big Data, Java APIs, Java collections, SQL, NoSQL, MongoDB, Cassandra.

Confidential, Malvern, PA

Hadoop Administrator/Developer


  • Responsible for installation, configuration, maintenance, monitoring, performance tuning and troubleshooting Hadoop Clusters in different environments such as Development Cluster, Test Cluster and Production.
  • Used Job Tracker to assign MapReduce tasks to Task Tracker in cluster of nodes.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Implemented Kerberos security in all environments.
  • Defined file system layout and data set permissions.
  • Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Involved in loading data from Linux and Unix file system to HDFS.
  • Involved in exporting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
  • Involved in cluster planning and setting up the multi-node cluster.
  • Commissioned and Decommissioned nodes from time to time.
  • Involved in HDFS maintenance and administering it through HDFS-Java API.
  • Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.

Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, Linux, UNIX Shell Scripting.


Hadoop Administrator/Developer


  • Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS from Oracle database and vice versa using Sqoop.
  • Installed and configured Hadoop Cluster for major Hadoop distributions.
  • Used Hive and Pig as an ETL tool for event joins, filters, transformations and pre-aggregations.
  • Created partitions and buckets by state in Hive to handle structured data.
  • Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Kafka and Flume, and processed the files using Piggybank.
  • Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
  • Used Sqoop for various file transfers through the HBase tables to process data into several NoSQL databases (Cassandra, MongoDB).
  • Created tables, secondary indices, join indices and views in the Teradata development environment for testing.
  • Captured data logs from web servers into HDFS using Flume for analysis.
  • Managed and reviewed Hadoop log files.

Environment: Hive, Pig, MapReduce, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, SQL, Hadoop 1.x, HDFS, GitHub, Talend, Python Scripting.


Java Developer


  • Involved in development of JavaScript code for client-side validations.
  • Developed the HTML based web pages for displaying the reports.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript and CSS.
  • Performed data validation in Struts form beans and Action classes.
  • Developed dynamic content of presentation layer using JSP.
  • Accessed stored procedures and functions using JDBC Callable statements.
  • Involved in designing use-case diagrams, class diagrams and interaction diagrams using UML models with Rational Rose.
  • Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
  • Developed code using SQL and PL/SQL, including queries, joins, views, procedures/functions, triggers and packages.
  • Developed web applications with rich internet application features using Java applets, Silverlight and Java.
  • Used JDBC for database access.
  • Played a key role in the high-level design for the implementation of the application.
  • Designed and established the process and mapped the functional requirements to the workflow process.

Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, Struts, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.
