Hadoop/Spark Developer Resume

Raleigh, NC


  • 11 years of IT experience in analysis, design, development, and implementation of business applications, with thorough knowledge of Java, J2EE, Big Data, the Hadoop ecosystem, Spark, Scala, ETL, and RDBMS technologies, and domain exposure in banking and financial services, insurance, and management systems.
  • 4+ years of strong experience developing distributed-computing Big Data analytical applications using open-source frameworks such as Apache Spark, Apache Hadoop, Hive, and Kafka.
  • Good understanding of Apache Spark features and its advantages over MapReduce and traditional systems.
  • Strong hands-on experience with Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using Scala and Python.
  • Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables, and optimizing broadcasts.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
  • Good understanding of the Spark driver, executors, and web UI.
  • Experience submitting Apache Spark and MapReduce jobs to YARN.
  • Experience in exposing Apache Spark as web services.
  • Worked under the direction of a data scientist to develop an efficient solution to a predictive analytics problem, testing a number of candidate machine learning algorithms in Apache Spark.
  • Experience in real-time processing using Apache Spark and Kafka.
  • Good working experience with NoSQL databases such as Cassandra and MongoDB.
  • Delivered three end-to-end Big Data analytics solutions.
  • Good experience in machine learning.
  • Migrated Python machine learning modules to scalable, high-performance, fault-tolerant distributed systems such as Apache Spark.
  • Strong experience with Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning.
  • Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
  • Worked with large data sets from multiple sources, using Big Data tools and techniques to analyze data, provide insight, and validate hypotheses.
  • Experience applying DevOps practices such as Continuous Integration, Continuous Deployment, Build Automation, and Test Automation.
  • Hands-on experience leading delivery through Agile methodologies.
  • Good expertise in coding in Python, Scala and Java.
  • Experience managing code on GitHub.
  • Very good knowledge of YARN (Hadoop 2.x.x) terminology and high-availability Hadoop clusters.
  • Experience analyzing log files for Hadoop and ecosystem services to find root causes.
  • Performed thread dump analysis for stuck threads and manual heap dump analysis for leaked memory with the Memory Analyzer Tool.
  • Proficient in Java, with a good knowledge of its ecosystems.
  • Good hands-on experience with the Spring and Hibernate frameworks.
  • Solid understanding of object-oriented programming.
  • Familiarity with MVC, JDBC, and RESTful concepts.
  • Familiarity with build tools such as Maven and SBT.
  • Experience with test-driven development (TDD).
  • Extensive usage of Verbose GC for JVM monitoring in performance tuning.
  • Good understanding of garbage collection and its performance tuning.
  • Strong experience with high-volume transactional systems running on Unix/Linux and Windows.
  • Involved in all phases of Software Development Life Cycle (SDLC) in large-scale enterprise software using Object Oriented Analysis and Design.
  • Able to coordinate work across tight schedules and efficient in meeting deadlines.


Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB.

Languages: Core Java, Scala, Python, SQL, Shell Scripting.

Big Data Distributions: Cloudera, Amazon EMR

Cloud: Amazon Web Services.

Front End Technologies: JSP, HTML5, Ajax, jQuery and XML

Servers: CherryPy, WebSphere and Tomcat.

Web and Enterprise Technologies: Servlet and Spring

Persistence: Hibernate

Visualization Tool: Apache Zeppelin & Matplotlib.

Databases: Oracle, SQL Server.

Operating Systems: Linux, Windows, MS-DOS

Scheduling Tools: Control-M, Autosys

Query Tools: TOAD, SQL Navigator.


Confidential, Raleigh, NC

Hadoop/Spark Developer


  • Coordinated with the Data Science team to gather requirements for various data mining projects.
  • Cleaned input text data using the Spark machine learning feature extraction API.
  • Created Machine learning features to train algorithms.
  • Used various algorithms of Spark ML API.
  • Used Spark Streaming to load the trained model and predict on real-time data from Kafka.
  • Trained model using historical data stored in HDFS and AWS S3.
  • Processed data using the Spark framework in Scala and stored the results in the NoSQL database Cassandra.
  • Implemented Spark RDD transformations and actions for business analysis.
  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Built MapReduce scripts for processing the data.
  • Developed batch scripts to fetch data from AWS S3 storage, apply the required transformations in Scala using the Spark framework, and save the results to a Cassandra keyspace.
  • Developed high-performance applications on EMR EC2 instances by choosing appropriate instance types and capacity.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Involved in building a REST API to read data from the CDR database and load it into an EDH Kafka topic.
  • Converted all Hadoop Cascading flows to the Enterprise Data Hub using Spark Streaming, Kafka, and Spark batch processing.
  • Created separate keyspaces in Cassandra: an intermediate keyspace and a final keyspace holding the aggregated data that the MicroStrategy team uses for reports and dashboard visualization.
  • Decided on the structure and DDL of the Cassandra tables, choosing appropriate partition and clustering keys based on business requirements and table lookup patterns.
  • Created Hive internal and external tables as required, with appropriate static and dynamic partitions for efficiency.
  • Created Hive managed, external, and partitioned tables using Hive indexes and bucketing, and wrote HQL scripts to perform data analysis.
  • Worked on SOAP and REST web services with asynchronous messaging.
  • Used Gerrit for code review and JIRA to track requests and issues.
  • Developed custom UDFs in Java to extend Hive for querying and processing data.
  • Handled delta processing (incremental updates) using Hive and processed the data in Hive tables.
  • Worked on Hive optimization techniques using joins and subqueries, and used various functions to improve the performance of long-running jobs.
  • Processed real-time streaming data using Kafka and Flume integrated with the Spark Streaming API.
  • Imported streaming data using Apache Kafka and designed Hive tables on top of it.
  • Migrated HiveQL queries to Spark SQL on the Spark engine to minimize query response time, and used Spark SQL to process near-real-time (NRT) data into Hive tables.
  • Converted sequential file data formats into Avro format with Snappy compression.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing in development and production.
  • Developed Scala unit tests using ScalaTest FunSuite.
  • Involved in loading data from the UNIX file system to HDFS.

Tools and Technologies: Spark Core, Spark SQL, Spark Streaming, Scala, AWS S3, Cloudera, Kafka, Cassandra, Hive, Sqoop, Java/J2EE, GitHub, SOAP, REST, Gerrit, JIRA.

Confidential, NY

Hadoop Developer


  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Managed data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed MapReduce programs on the cluster to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
  • Created common Java components shared between applications to convert data into the state each application requires.
  • Applied various Java, J2EE, and EAI patterns.
  • Wrote MapReduce jobs using the Java API and HiveQL.
  • Implemented Spark RDD transformations and actions for business analysis.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
  • Wrote custom Hive UDFs in Java where the required functionality was too complex for plain HiveQL.
  • Designed the HBase schema to avoid hotspotting and exposed data from HBase tables to the UI through a REST API.
  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Created business data reports using Spark SQL.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Imported and exported data to and from HDFS using Sqoop and Kafka.
  • Processed streaming data using Kafka integrated with the Spark Streaming API.
  • Maintained data import scripts using Hive and MapReduce jobs.
  • Understood and captured technical as well as business requirements.
  • Worked side by side with R&D, QA, and Operations teams to understand, design, develop, and support the ETL platform and end-to-end data flow requirements.
  • Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
  • Set up Hadoop software for lower environments such as DEV and SIT, covering Ubuntu 14.x/16.x, Hadoop 2.6.x, and spark-shell cluster-mode configurations.
  • Involved in migrating Hadoop Java MapReduce programs to Spark Scala APIs.
  • Wrote Scala programs to create Spark RDDs and SQL DataFrames to load processed data into an RDBMS for the mortgage analysis dashboard.
  • Stored, processed, and analyzed huge data sets to extract valuable insights.
  • Worked on UI layer to publish the VMAA analysis data using Java/J2EE, Spring, Hibernate, JSP, HTML and XML.

Tools and Technologies: Apache Hadoop, Cloudera, MapReduce, HBase, Flume, Impala, Sqoop, Pig, ZooKeeper, Java/J2EE, Spring, Hibernate, JSP, ETL, Spark 1.6, Unix, REST API.

Confidential, Newark, NJ

Sr. Java Developer


  • Developed DAO implementation classes using Spring's HibernateTemplate with the Hibernate API.
  • Developed business components and configured them as beans using Spring DI.
  • Developed Java Message Service (JMS) components using message queues and topics.
  • Coded, tested, debugged, implemented, and documented highly complex programs.
  • Developed complex test plans to verify the logic of new or modified programs using Java and other technologies.
  • Devised creative and efficient technical solutions.
  • Implemented a service-oriented architecture with web services components such as SOAP, WSDL, and RESTful APIs.
  • Fixed NYSOH consumer production defects.
  • Fixed the NYSOH application's 834 inbound and outbound issues.
  • Involved in EDI data file processing to ISSUER and EMEDNY with NYSOH.
  • Handled enrollment data processing issues from the AUTO ENROLL QUEUE task.
  • Implemented schedulers for batch jobs for the respective enrollment periods and renewals.
  • Developed code using relational databases to facilitate programming software.
  • Created appropriate documentation for work assignments, such as program code and technical documentation.
  • Assisted in resolving production support problems; developed and suggested alternative approaches to resolving them.
  • Researched and analyzed existing systems and program requirements under periodic supervision.
  • Developed test plans to verify the logic of new or modified programs and identified issues as appropriate.
  • Designed programs for projects or enhancements to existing programs.
  • Maintained build related scripts developed in ANT, shell and WLST. Created and modified build configuration files including Ant's build.xml.
  • Worked with development team to migrate Ant scripts to Maven.

Environment: Core Java, J2EE, JavaScript, Eclipse, Web Services, HTML, CSS, XML, Spring with Hibernate, DB2, Apache Tomcat, Maven, SOAP, RESTful, Queue and Topic.

Confidential, Saint Louis, MO

Java Developer


  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
  • Analyzed the software requirements to determine feasibility of design within time and cost constraints.
  • Designed and developed the user interface (UI) of web pages using HTML, CSS/CSS3, JavaScript, jQuery, Bootstrap, and AJAX.
  • Involved in creating an interface to manage the user menu and bulk update of attributes using AngularJS, Node.js, Ext JS, RequireJS, and jQuery.
  • Developed the Controller, Service layer using Spring MVC & Spring JDBC.
  • Implemented RESTful web services with Spring and AngularJS.
  • Configured the Transaction Management for project using Spring Container Managed transactions.
  • Created custom AngularJS directives and used dependency injection.
  • Applied SQL commands and Stored Procedures to retrieve data from Oracle 11g database.
  • Used Hibernate ORM Framework to communicate with Oracle 11g database.
  • Performed unit testing on AngularJS applications using tools such as Karma and Jasmine.
  • Developed XML, HTML, and JavaScript for client-side presentation and client-side form data validation.
  • Implemented a CSS3- and JavaScript-based navigation system, visually identical to the previous table-based system, to improve ease of maintenance and organic search engine placement.
  • Used various JavaScript libraries such as jQuery, jQuery UI, Backbone.js, and Node.js.
  • Created master pages and CSS style sheets, integrated them with Silverlight, and obtained approval from business stakeholders.
  • Created a role-based HTML navigation menu whose items change dynamically, derived from the database in the form of XML.
  • Used jQuery core library functions for client-side logic in all applications.
  • Designed new classes and functionality using various jQuery components for customer service. Designed and developed user interface web forms using Adobe Flash, CSS, and JavaScript.
  • Used AJAX to implement dynamic web pages whose content was fetched via API calls, parsed from JSON, and used to update the DOM.
  • Applied web data visualization skills to render large data sets in an MS Excel-like table view.

Tools and Technologies: Core Java, J2EE, JSP, JBoss, Oracle, Eclipse, JMS, XML, HTML, AJAX, SVN, Struts.

Confidential, Charlotte, NC

Java Developer


  • Designed and developed the project using MVC design pattern.
  • Involved in preparing Technical Design Document of the project.
  • Designed and developed the application using JSP custom tags and Struts tag libraries.
  • Developed controller servlets, Action, and ActionForm objects to interact with the Oracle database using Struts.
  • Developed the front-end using Java, JSP, servlets, HTML, DHTML, and JavaScript.
  • Used and configured Struts ActionForms, MessageResources, ActionMessages, ActionErrors, validation.xml, and validator-rules.xml.
  • Followed the Waterfall methodology to produce high-quality software and satisfy customers.
  • Wrote stored procedures and Database Triggers using PL/SQL.
  • Used JBoss Application Server to deploy and test the code.
  • Responsible for preparing use cases, class and sequence diagrams for the modules using UML.
  • Developed Data Access Layer to interact with backend by implementing Hibernate Framework.
  • Implemented Spring dependency injection of the database helper instance into the action objects.
  • Used Hibernate in data access layer to access and update information in the database.
  • Wrote JUnit classes for the services and prepared documentation.
  • Used HP Quality Center to track defect details and work on issues.

Tools and Technologies: Java, JSP, Struts, Spring, Hibernate, Oracle 10g, Eclipse, TOAD, CVS, Ant, JUnit, HP Quality Center, JBoss, Windows, and SQL Server.

Confidential, Milpitas, CA

Software Developer


  • Developed complex mappings using Informatica PowerCenter 7.1.4 Designer to load data from external sources such as flat files, XML, Oracle, and SQL Server into the staging and data warehouse databases.
  • Created Mappings using transformations like Source Qualifier, XML Source Qualifier, Expression, Sorter, Aggregator, Filter, Lookup, Update Strategy, Sequence Generator, Normalizer, Stored Procedure, Joiner, Rank, Router and Transaction Control transformations.
  • Used connected and unconnected lookups with static and dynamic caches to implement business logic and improve performance.
  • Used partitioning in Sessions to improve the performance of the database load time.
  • Created Session Tasks, Worklets, and Workflows comprising different mappings to control the flow of data into the data warehouse based on the success of the previous session or worklet.
  • Implemented Error Handling strategy for trapping errors from Date validation, Data type mismatch and Lookup failures.
  • Scheduled jobs using Autosys to run at a particular time or interval through the scheduler using Workflow Manager.
  • Tested the existing mappings and redesigned the mappings to improve the performance and the efficiency of the design logic.
  • Performed various operations such as scheduling, publishing, and retrieving reports from corporate documents using Business Objects reporting.
  • Performed unit test, string test, parallel test and user testing by developing test cases and test scripts.

Tools and Technologies: Informatica PowerCenter 7.1.4, SQL Server 2000, MS Access, Autosys, TOAD, XML, Unix.
