
Sr. Spark/Scala Developer Resume

Cupertino, CA


  • 8+ years of professional IT experience in analyzing requirements and in designing and building highly distributed, mission-critical products and applications.
  • 4+ years of strong end-to-end experience in Hadoop development and implementing Big Data technologies.
  • Expertise in core Hadoop and its ecosystem tools such as Spark, Cassandra, HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Kafka, HBase, and ZooKeeper.
  • Experience with NoSQL databases such as Cassandra, HBase, and MongoDB and their integration with Hadoop clusters.
  • Experienced in Spark Core, Spark SQL, and Spark Streaming.
  • Implemented Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data into HDFS through Sqoop.
  • Developed a fan-out workflow using Flume and Kafka to ingest data from various sources, such as web servers and REST APIs, through network sources, landing the data in Hadoop with the HDFS sink.
  • Exposure to Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Knowledge of Flume for extracting clickstream data from web servers.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a DAG of actions with control flows.
  • Experience using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
  • Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming.
  • Experience designing and building data ingestion pipelines using the Kafka, Flume, and NiFi frameworks.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases.
  • Worked with Spark for parallel computing, deepening knowledge of RDDs with Cassandra.
  • Good understanding of and working experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Experience managing a large shared MongoDB cluster and the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
  • Developed Python scripts to monitor the health of MongoDB databases and perform ad hoc backups using mongodump and mongorestore.
  • Experience with Apache Tez on Hive and Pig to achieve better response times while running MR jobs.
  • Experience with Hadoop deployment and automation tools such as Ambari, Cloudbreak, and EMR.
  • Good working knowledge of moving data between HDFS and RDBMS using Sqoop.
  • Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
  • Experienced in working with structured data using HiveQL, JOIN operations, Hive UDFs, partitions, bucketing, and internal/external tables.
  • Implemented POCs using the Amazon cloud components S3, EC2, Elastic Beanstalk, and SimpleDB.
  • Hands-on experience creating custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
  • Experienced with full-text search; implemented data querying with faceted reader search using Solr.
  • Hands-on experience with monitoring tools, checking cluster status using Cloudera Manager.
  • Worked with Amazon AWS cloud services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing and ETL tools such as IBM DataStage, Informatica, and Talend.
  • Knowledge of integrating Kerberos into Hadoop to secure the cluster against unauthorized users.
  • Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
  • Experienced in backend development using SQL and stored procedures on Oracle 10g and 11g.
  • Good working knowledge of multithreading, Core Java, J2EE, JDBC, jQuery, JavaScript, and web services (SOAP, REST).


Big Data/Hadoop Ecosystem: Hortonworks, Cloudera, Apache, EMR, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, ZooKeeper, NiFi, Impala, Tez

NoSQL Databases: Cassandra, HBase, MongoDB

Cloud Services: Amazon AWS

Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, C, Java, Python, Scala

ETL Tools: Informatica, IBM DataStage, Talend

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC

Application Servers: WebLogic, WebSphere, Tomcat

BI Tools: Tableau, Splunk

Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server

Operating Systems: UNIX, Windows, iOS, Linux

IDEs: IntelliJ IDEA, Eclipse, NetBeans


Confidential, Cupertino, CA

Sr. Spark/Scala Developer


  • Imported data from Cassandra databases and stored it in AWS.
  • Performed transformations on the data using different Spark modules.
  • Responsible for Spark Core configuration based on the type of input source.
  • Executed Spark code using Scala for Spark Streaming/Spark SQL for faster processing of data.
  • Performed SQL joins among Hive tables to produce input for the Spark batch process.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Developed a Kafka producer and brokers for message handling.
  • Imported data into Hadoop using Kafka and implemented an Oozie job for daily imports.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Used the AWS CLI for data transfers to and from Amazon S3 buckets.
  • Worked on POCs for stream processing using Apache NiFi.
  • Worked on Hortonworks Hadoop solutions with real-time streaming using Apache NiFi.
  • Ran Hadoop/Spark programs as jobs on AWS EMR, with the data stored in S3 buckets.
  • Built and configured Apache Tez on Hive and Pig to achieve better response times while running MR jobs.
  • Pulled data from an Amazon S3 bucket into the data lake, built Hive tables on top of it, and created DataFrames in Spark for further analysis.
  • Implemented Spark RDD transformations and actions to carry out business analysis.
  • Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into sequences of bits.
  • Developed Spark scripts using Scala shell commands as required.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Worked with Cassandra for non-relational data storage and retrieval.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Used Spark Streaming APIs to transform data from Kafka in real time and persist it into Cassandra.
  • Analyzed and examined customer behavioral data using Cassandra.
  • Automated event-driven and time-based jobs using Oozie workflows.

Environment: Cassandra, Kafka, Spark, Pig, Hive, Oozie, AWS, Solr, SQL, Scala, Tez, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.

Confidential, Fort Myers, FL

Sr. Big Data/Hadoop Developer


  • Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
  • Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
  • Developed an Apache Flume client to send data as events to the Flume server, storing them in files and HDFS.
  • Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables.
  • Created a POC in the Cloudera environment for Oracle migration to HDFS, using Talend ETL to replace Informatica.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports through the ZooKeeper implementation in the cluster.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Developed Sqoop scripts to move data between HDFS and RDBMS (Oracle, MySQL).
  • Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
  • Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
  • Implemented NameNode backup using NFS for high availability.
  • Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and Bash.
  • Used Flume to stream log data from various sources.
  • Configured Flume to extract data from web server output files and load it into HDFS.
  • Designed and implemented the MongoDB schema.
  • Used Pig to validate the data ingested using Sqoop and Flume, and pushed the cleansed data set into MongoDB.
  • Wrote services to store and retrieve user data from MongoDB for the application on devices.
  • Used the Mongoose API to access MongoDB from Node.js.
  • Involved in development of Talend components to validate data quality across different data sources.
  • Created and implemented business validation and coverage Price Gap rules in Talend on Hive.
  • Wrote shell scripts to monitor the Hadoop daemon services and respond to any warning or failure conditions.
  • Wrote shell scripts to automate rolling day-to-day processes.

Environment: Apache Flume, Hive, Pig, HDFS, ZooKeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse.

Confidential, Chicago, IL

Hadoop Developer


  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed simple to complex MapReduce jobs implemented using Hive and Pig.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior, and used UDFs to implement business logic in Hadoop.
  • Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX and NoSQL sources.
  • Supported HBase architecture design with the Hadoop architect group to build up a database design in HDFS.
  • Created, dropped, and altered tables at run time without blocking updates and queries using HBase.
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Exported data from HDFS into RDBMS using Sqoop for report generation and visualization purposes.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Used Oozie workflows and enabled email alerts for any failure cases.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.
  • Used Pig to convert fixed-width files to delimited files.
  • Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
  • Developed Kafka producer and consumer components for real-time data processing.

Environment: MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Flume, Cloudera, HDFS, RDBMS

Confidential, Woodstock, GA

Hadoop Developer


  • Analyzed the Hadoop cluster and different big data analytic tools, including Pig, Sqoop, and ZooKeeper.
  • Exported the analyzed data to the relational database using Sqoop for visualization and to generate reports for the BI team.
  • Imported and exported data between SQL Server and HDFS/Hive using Sqoop.
  • Developed Hive queries to process the data for visualization.
  • Developed multiple MapReduce jobs in Java for data cleaning.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Involved in creating Hive tables, loading them with data, and performing transformations on the imported data.
  • Supported MapReduce programs running on the cluster.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Worked on tuning the performance of Pig queries.
  • Worked with ZooKeeper to coordinate between the master node and data nodes.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.

Environment: HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Python, Oozie, ZooKeeper, SQL Server.


Java Developer


  • Involved in end-to-end design and development of the UI layer, service layer, and persistence layer.
  • Implemented Spring MVC for designing and implementing the UI Layer for the application.
  • Implemented UI screens using JSF for defining and executing UI flow in the application.
  • Used AJAX to retrieve data from the server asynchronously in the background, without interfering with the display of the existing page.
  • Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
  • Worked with JavaScript for dynamic manipulation of the elements on the screen and to validate the input.
  • Involved in writing Spring Validator Classes for validating the input data.
  • Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
  • Involved in writing complex PL/SQL and SQL blocks for the application.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.

Environment: MVC Architecture, Spring IoC, Hibernate/JPA, Java Web Services (REST), Spring Batch, XML, JavaScript, Oracle Database (10g), JSF UI, JUnit.


Java Developer


  • Used Servlets, Struts, JSP, and JavaBeans to develop the Performance module on top of legacy code.
  • Involved in coding JSP pages, Form Beans, and Action Classes in Struts.
  • Created custom tag libraries to support the Struts framework.
  • Involved in writing validations.
  • Involved in database connectivity through JDBC.
  • Implemented JDBC for mapping an object-oriented domain model to a traditional relational database.
  • Created SQL queries, sequences, and views for the back-end Oracle database.
  • Developed JUnit test cases and performed application testing for the QC team.
  • Used JavaScript for client-side validations.
  • Developed dynamic JSP pages with Struts.
  • Participated in weekly project meetings and updates, and provided estimates for assigned tasks.

Environment: Java, J2EE, Struts, JSP, JDBC, Ant, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.
