Sr. Spark/Scala Developer Resume



  • 8+ years of experience in the IT industry designing, developing and maintaining applications using Big Data technologies such as the Hadoop and Spark ecosystems and Java/J2EE technologies.
  • Extensively worked with Spark using Scala on clusters for analytics, installed Spark on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Excellent programming skills at a higher level of abstraction using Scala, Java and Python.
  • Extensive experience with various Hadoop enterprise distributions, including Cloudera (CDH4/CDH5) and Hortonworks, and good knowledge of the MapR distribution, IBM BigInsights and Amazon EMR (Elastic MapReduce).
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experienced in implementing schedulers using Oozie, Airflow, crontab and shell scripts.
  • Good working experience importing data using Sqoop and SFTP from various sources such as RDBMS, Teradata, mainframes, Oracle and Netezza into HDFS, and performing transformations on it using Hive, Pig and Spark.
  • Extensive experience importing and exporting streaming data into HDFS using stream processing platforms such as Flume and the Kafka messaging system.
  • Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka and Flume.
  • Extensively worked with Spark Streaming and Apache Kafka to fetch live stream data.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core.
  • Experience integrating Hive queries into the Spark environment using Spark SQL.
  • Expertise in performing real-time analytics on big data using HBase and Cassandra.
  • Developed customized UDFs and UDAFs in Java to extend Pig and Hive core functionality.
  • Proficient in NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
  • Good experience optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Extracted data from various data sources including OLE DB, Excel, flat files and XML.
  • Experienced in using build and logging tools such as Ant, SBT, Maven and Log4j to build and deploy applications to the server.
  • Competent in using the Chef, Puppet and Ansible configuration and automation tools. Configured and administered CI tools such as Jenkins, Hudson and Bamboo for automated builds.
  • Proficient in developing, deploying and managing Solr from development to production.
  • Experience in enterprise search using Solr to implement full-text search with advanced text analysis, faceted search, and filtering using advanced features such as dismax, extended dismax and grouping.
  • Worked with data warehousing and ETL tools such as Informatica, Talend and Pentaho.
  • Designed ETL workflows on Tableau and deployed data from various sources to HDFS.
  • Working experience with test management and testing tools including HP Quality Center, HP ALM, LoadRunner, QTP and Selenium.
  • Worked with the ELK stack (Elasticsearch, Logstash, Kibana) for log management.
  • Worked with various programming languages using IDEs such as Eclipse, NetBeans and IntelliJ.
  • Experience in software design, development and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, JavaBeans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML and AJAX, with high-level familiarity with the React JavaScript library.
  • Used project management services such as JIRA for tracking issues and code-related bugs, GitHub for code reviews, and various version control tools such as CVS, Git, PVCS and SVN.
  • Experience with best practices of web services development and integration (both REST and SOAP).
  • Generated various kinds of knowledge reports using Power BI and Qlik based on business specifications.
  • Experience writing automated scripts using UNIX shell scripting to perform database activities.
  • Experience in the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
  • Good understanding of all aspects of testing, such as unit, regression, Agile, white-box and black-box.


Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

Methodology: Agile, Waterfall

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2

Operating systems: UNIX, Linux, Mac OS and Windows variants

ETL Tools: Talend, Informatica, Pentaho


Confidential, MO

Sr. Spark/Scala Developer


  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
  • Implemented Spark RDDs in Scala.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the stream data in HDFS.
  • Used Kafka features such as distribution, partitioning and the replicated commit log service to maintain feeds for messaging systems.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Experienced in using Spark Core to join data for delivering reports and identifying fraudulent activities.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Created data models for the client's transactional logs and analyzed data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
  • Tested cluster performance using the cassandra-stress tool to measure and improve reads and writes.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitch, etc.
  • Developed Sqoop jobs to load data from RDBMS and external systems into HDFS and Hive.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
  • Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets.
  • Maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Worked with Elastic MapReduce and set up a Hadoop environment on AWS EC2 instances.
  • Used Avro, Parquet, RCFile and JSON file formats, and developed UDFs in Hive and Pig.
  • Developed custom Pig UDFs in Java and used UDFs from Piggybank for sorting and preparing the data.
  • Experienced with full-text search and faceted search using Solr, and implemented data querying with Solr.
  • Set up and worked with Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig and MapReduce access for cluster users.
  • Generated various kinds of reports using Power BI and Tableau based on client specifications.
  • Worked with network, database, application, QA and BI teams to ensure data quality and availability.
  • Worked with the Scrum team to deliver agreed user stories on time for every sprint.

Environment: Hadoop, Spark, Spark Streaming, Spark SQL, AWS EMR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Scala, Shell scripting, Linux, MySQL, Solr, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, NiFi, Cassandra and Agile.

Confidential, NJ

Sr. Spark Developer


  • Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded the logs into MongoDB for further analysis.
  • Implemented custom interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
  • Implemented custom serializers to perform encryption using the DES algorithm.
  • Developed collections in MongoDB and performed aggregations on the collections.
  • Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables, and handled structured data using Spark SQL.
  • Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables.
  • Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Experienced in writing Spark applications in Scala and Python (PySpark).
  • Created HBase tables, loaded data into them using HBase sinks, and performed analytics using Tableau.
  • Created HBase tables and column families to store the user event data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Configured, monitored and optimized the Flume agent to capture web logs from the VPN server and load them into the Hadoop data lake.
  • Expertise in extracting, transforming and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Experience working with Hadoop clusters using the Hortonworks distribution.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig and Hive UDFs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis.
  • Developed ETL processes using Spark, Scala, Hive and HBase.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Used codecs such as Snappy and LZO to store data in HDFS to improve performance.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery and backup.
  • Created HBase tables to store variable data formats of data coming from different legacy systems.
  • Used Hive to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Developed Sqoop jobs to load data from RDBMS into HDFS and Hive.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
  • Imported data from different sources such as AWS S3 and the local file system (LFS) into Spark RDDs.
  • Troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked with Network, database, application and BI teams to ensure data quality and availability.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Maintained the cluster on AWS EMR.
  • Experienced in NoSQL databases such as HBase and MongoDB, and with the Hortonworks distribution of Hadoop.
  • Experienced in creating ETL mappings in Informatica.
  • Scheduled the ETL jobs using ESP scheduler.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, MongoDB, Flume, Apache Spark, Accumulo, Oozie, Kerberos, AWS, Tableau, Java, Informatica, Elasticsearch, Git, Maven.

Confidential, TX

Hadoop/ETL Developer


  • Extensively involved in the installation and configuration of the Cloudera Hadoop distribution, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
  • Developed MapReduce programs in Java and used Sqoop to import data from the Oracle database.
  • Responsible for building scalable distributed data solutions using Hadoop. Wrote various Hive and Pig scripts.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Experienced with scripting languages such as Python and shell scripts.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Experienced in handling administration activities using Cloudera Manager.
  • Expertise in partitioning and bucketing concepts in Hive.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner. Responsible for loading data from the UNIX file system to HDFS.
  • Analyzed weblog data using HiveQL and integrated Oozie with the rest of the Hadoop stack.
  • Utilized cluster coordination services through ZooKeeper.
  • Gained good experience with various NoSQL databases and comprehensive knowledge of process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation.
  • Experience creating scripts for data modeling and data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
  • Created partitioned Hive tables and worked on them using HiveQL.
  • Developed shell scripts to automate routine DBA tasks.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.

Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera, Eclipse and Shell Scripting.


Java/ETL Developer


  • Developed Maven scripts to build and deploy the application.
  • Developed Spring MVC controllers for all the modules.
  • Ran SAS scripts on UNIX and exported the output datasets into SAS.
  • Implemented jQuery validator components.
  • Extracted data from Oracle as one of the source databases.
  • Used the DataStage ETL tool to copy data from Teradata to Netezza.
  • Created ETL data mapping spreadsheets describing column-level transformation details to load data from Teradata landing zone tables to the tables in the Party and Policy subject areas of the EDW, based on the SAS Insurance model.
  • Used JSON and XML documents extensively with the MarkLogic NoSQL database. Made REST API calls using Node.js and the Java API.
  • Created and updated SAS data sets using the SET and UPDATE statements.
  • Built data transformations with SSIS, including importing data from files.
  • Loaded flat file data into the staging area using Informatica.
  • Created shell scripts for generic use.

Environment: Java, Spring, MPP, Windows XP/NT, Informatica PowerCenter 9.1/8.6, UNIX, Teradata, Oracle Designer, Autosys, Shell, Quality Center 10.


Java Developer


  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented the database using SQL Server.
  • Implemented the Spring IoC framework.
  • Developed Spring REST services for all the modules.
  • Developed custom SAML and SOAP integration for healthcare.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Used DAO and JDBC for database access.
  • Built responsive Web pages using Kendo UI mobile.
  • Designed dynamic, multi-browser compatible pages using HTML, CSS, jQuery, JavaScript, RequireJS and Kendo UI.

Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, JSP, JDBC, JavaScript, MySQL, Eclipse IDE, REST.


Jr. Java Developer


  • Analyzed requirements and prepared the requirement analysis document.
  • Deployed the application to the JBoss application server.
  • Developed Spring REST services for all the modules.
  • Implemented web services using the SOAP protocol with Apache Axis.
  • Gathered requirements from the various parties involved in the project.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Involved in the complete SDLC of the development with full system dependency.
  • Actively coordinated with the deployment manager for the application production launch.
  • Monitored test cases to verify actual results against expected results.
  • Carried out regression testing to support problem tracking.

Environment: Java, J2EE, EJB, UNIX, XML, Work Flow, JMS, JIRA, Oracle, JBoss, SOAP.