
Sr. Apache Spark/Hadoop Lead Developer Resume

Atlanta, GA


  • 14+ years of IT experience executing all major facets of development, maintenance, and enhancement projects.
  • 4+ years of strong experience in software development using Big Data, Hadoop, Apache Spark, and Scala technologies to efficiently solve big data processing requirements.
  • Hands-on experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, and Flume.
  • Currently building all-domain pipelines using Kafka, Spark Streaming, Spark batch processing, and Spark SQL, with ingestion into HBase/Hive for near-real-time spend analytics.
  • Developed analytical components using Kafka, Spark Streaming, and Scala.
  • Solid experience with the NoSQL column-oriented database HBase and its integration with a Hadoop cluster.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Designed and implemented a framework of Hive queries and Sqoop jobs to import data from relational databases into Hadoop Hive tables and the HDFS file system.
  • Hands-on experience writing Pig scripts and HiveQL, and mentoring the team to resolve their queries.
  • Experience in understanding data and designing/implementing enterprise platforms such as data lakes and large data warehouses on cloud platforms (AWS S3).
  • Worked in various roles such as Project Lead, Senior ETL Architect and Data Analyst.
  • Fine-tuned several complex ETL reporting applications with the goal of providing a faster, more efficient BI platform for business users.
  • Experience in Production, quality assurance (QA), SIT (System Integration testing) and User Acceptance Testing (UAT).
  • Functional knowledge of domains such as Energy & Utilities and Manufacturing & Operations.
  • Experience in delivering solutions for integration projects and conducting demos and trainings for large audiences.
  • Experience in handling end to end projects in Business Intelligence (BI) and Mainframe technologies.
  • Solid analytics, leadership, and training skills, with a proven ability to supervise and train individuals from an array of backgrounds.
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Extensive experience working in the onsite-offshore model, leveraged to lead a team of 15.


Big Data Ecosystem: Hadoop, MapReduce, HDFS, Zookeeper, Hive, Pig, Sqoop, Oozie, Flume, YARN, Spark Core, Spark SQL, Spark Streaming, Kafka

Programming Languages: Scala, PL/SQL, IBM Mainframe S/390 (CICS, COBOL, JCL, VSAM, REXX, CSF 4.2.1, CSF Designer - For AFP file creation)

Databases: Relational Databases, HBase, Redshift

ETL Tools: Informatica, Talend (Big Data Version)

Reporting Tool: Cognos 10.0

Version Control: SVN, Git

Development Methodology: Waterfall, Agile

AWS Services: AWS EC2, S3

Tools: Eclipse, PuTTY, Maven


Confidential, Atlanta, GA

Sr. Apache Spark/Hadoop Lead Developer


  • Worked with business users to gather business requirements and analyze possible technical solutions, involving client-side architects and big data experts. Design and technology-stack decisions were made after carefully evaluating infrastructure, technology stack, data size, and the business vision.
  • Designed the data transformation logic and the data flow from one data lake layer to another to meet the end objective.
  • Moved data from MS SQL Server into AWS S3 using Sqoop (MS SQL Server to HDFS to AWS S3), ensuring that both daily incremental data and 5 years of history data were transferred, that data-load failures were handled, and that all data reached the target.
  • Implemented Spark Core jobs in Scala to process data in memory, filtering out unnecessary records to fine-tune the Spark jobs. Data was filtered on criteria such as specific statuses and POs received from certain source systems, then transformed from the ODS layer to the DW layer.
  • The existing data warehouse was in SQL Server, with complex logic written in SQL packages. The responsibility was to reverse-engineer the existing SQL packages and convert the logic into Spark RDD and Dataset transformations and Spark SQL.
  • Some of the existing SQL packages took too long to execute. The responsibility was to improve performance and optimize the algorithms using Spark SQL and Scala; techniques included removing unwanted columns and caching datasets to make processing faster.
  • Populated data into Hive/HBase tables for outbound and intra-layer interfaces. Hive tables were populated for downstream systems to consume; HBase tables stored the data while it was transferred from the ODS to the DW layer of AWS S3.
  • Populated data into AWS Redshift for the reporting module. Once data was ingested into AWS S3 for storage and processed, the COPY command was used to transfer it to Redshift.
  • Working on a POC to integrate Kafka with Spark to import structured and unstructured data from various data sources.
  • Working on a POC to bring OLTP application logs into S3 using Kafka-Spark Streaming integration and massaging the data for the BI-on-BI application.
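The filter-and-cache tuning described above can be sketched in Spark/Scala. The record layout, status values, and source-system names below are illustrative assumptions, not details from the actual project:

```scala
import org.apache.spark.sql.SparkSession

object PoFilterSketch {
  // Hypothetical PO record layout; the real ODS schema is not shown here.
  case class PurchaseOrder(poId: String, status: String, sourceSystem: String, amount: Double)

  // Keep only the records the DW layer needs (hypothetical criteria).
  def keepForDw(po: PurchaseOrder): Boolean =
    po.status == "APPROVED" && po.sourceSystem == "SAP"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ods-to-dw-sketch")
      .master("local[*]") // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    val ods = Seq(
      PurchaseOrder("PO-1", "APPROVED", "SAP", 1200.0),
      PurchaseOrder("PO-2", "CANCELLED", "SAP", 300.0),
      PurchaseOrder("PO-3", "APPROVED", "LEGACY", 950.0)
    ).toDS()

    // Filter unneeded records early, drop unwanted columns, and cache the
    // pruned Dataset before the heavier ODS-to-DW transformations run.
    val pruned = ods
      .filter(po => keepForDw(po))
      .select("poId", "amount")
      .cache()

    pruned.show()
    spark.stop()
  }
}
```

Filtering and pruning before `cache()` keeps only the narrow, relevant slice of data in memory, which is the main lever for fine-tuning jobs like these.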
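A Kafka-to-S3 log pipeline like the POC above could be sketched with Spark Structured Streaming as follows. The broker, topic, bucket names, and the `massage` clean-up step are all placeholders, assumed for illustration:

```scala
import org.apache.spark.sql.SparkSession

object LogIngestSketch {
  // Hypothetical "massaging" step: drop blank lines and normalize case before
  // the logs reach the BI layer. The real transformation is not shown here.
  def massage(line: String): Option[String] = {
    val t = line.trim
    if (t.isEmpty) None else Some(t.toLowerCase)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("oltp-log-ingest").getOrCreate()
    import spark.implicits._

    // Read raw OLTP application logs from Kafka (placeholder broker/topic).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "oltp-app-logs")
      .load()

    // Kafka delivers value as binary; cast to string, then clean each line.
    val logs = raw
      .selectExpr("CAST(value AS STRING) AS line")
      .as[String]
      .flatMap(line => massage(line))

    // Land the massaged logs in S3 as Parquet (placeholder bucket paths).
    logs.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/logs/")
      .option("checkpointLocation", "s3a://example-bucket/chk/")
      .start()
      .awaitTermination()
  }
}
```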

Environment: AWS EMR, S3, Sqoop, Zookeeper, Oozie, Spark-Core 2.0, Spark Streaming, Spark SQL, Maven, SQL Server, Git, Agile, Scala 2.11, Redshift

Confidential, Auburn Hills, MI

Lead Big Data Developer


  • Worked on a live 20-node Hadoop cluster running CDH4. One of the responsibilities was to set up the cluster with the help of the administrator and to serve as the first level of contact for any issues arising while spinning up the cluster, or for Hive data-ingestion job failures during the warranty phase of the project.
  • Worked with highly unstructured and semi-structured data of ~50 TB (150 TB with a replication factor of 3), including bringing in 8 years of history data.
  • Worked with Sqoop (version 1.4.3) jobs; Sqoop was also used to bring in incremental data and load it into Hive tables.
  • Users needed data on an ad-hoc basis for some outbound systems or for analysis, but did not have direct access to the Hive data. In such cases, with management approval, developed Hive scripts for end users/analysts to perform ad-hoc analysis.
  • Developed Sqoop scripts to load data into a relational database (SQL Server) for downstream systems to consume; the data was transformed using Spark SQL before loading.
  • Involved in unit testing and system testing. Tested the system end to end to ensure the quality of the adjustments made to accommodate the source system upgrades.
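The Spark SQL shaping step before the SQL Server load could look roughly like this. The table layout, column names, and sample data are assumptions; in the actual project the source was Hive tables and the result was exported with Sqoop:

```scala
import org.apache.spark.sql.SparkSession

object ExportPrepSketch {
  case class Order(customerId: String, amount: Double) // hypothetical schema

  // Plain-Scala equivalent of the GROUP BY below, for a quick sanity check;
  // Spark SQL performs the same aggregation at cluster scale.
  def aggregate(orders: Seq[Order]): Map[String, Double] =
    orders.groupBy(_.customerId).map { case (id, os) => id -> os.map(_.amount).sum }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("export-prep")
      .master("local[*]") // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    // In production the source was Hive; a small in-memory sample stands in.
    Seq(Order("C1", 100.0), Order("C1", 50.0), Order("C2", 75.0))
      .toDS()
      .createOrReplaceTempView("orders")

    // Shape the data with Spark SQL before the downstream load; the prepared
    // result was then written to HDFS and exported via `sqoop export`.
    val prepared = spark.sql(
      """SELECT customerId, SUM(amount) AS total_spend
        |FROM orders
        |GROUP BY customerId""".stripMargin)

    prepared.show()
    spark.stop()
  }
}
```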

Environment: Sqoop, Zookeeper, Oozie, SQL Server, Git, Scala 2.11, HiveQL, Pig, HDFS

Confidential, New York

Sr. Big Data Developer


  • Understood the nature of the data from different OLTP systems and designed the ingestion processes for HDFS.
  • Worked with the Hadoop input formats TextInputFormat and KeyValueTextInputFormat. Designed the data model on Hive.
  • Extracted data from Oracle, SQL Server, and MySQL databases to HDFS using Sqoop.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Created Hive tables to store processed results in tabular form and wrote Hive scripts to transform and aggregate the disparate data.
  • Used Avro, Parquet, and JSON file formats and developed UDFs for Hive and Pig.
  • Designed and developed inventory prediction logic based on history data in Spark.
  • Automated data extraction from warehouses into Hive tables by developing workflows and coordinator jobs in Oozie.
  • Built, tuned, and maintained HiveQL and Pig scripts for reporting.
  • Implemented POCs for big data tools such as Mahout and Impala.
  • Supported team members and helped them through technical difficulties.
  • Tested developed code to ensure it met the design and ultimate business goals.
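The per-field UDF and file-format work above can be sketched with a Spark UDF in Scala. The normalization logic, column name, and sample values are hypothetical; the project's actual UDFs were written for Hive and Pig:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object FileFormatSketch {
  // Hypothetical normalization rule standing in for the project's real
  // Hive/Pig UDF logic: trim, upper-case, and hyphenate an identifier.
  def normalizeSku(raw: String): String = raw.trim.toUpperCase.replace(" ", "-")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-format-sketch")
      .master("local[*]") // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    val skus = Seq("  ab 123 ", "cd 456").toDF("sku")
    val normalize = udf(normalizeSku _)
    val cleaned = skus.select(normalize($"sku").as("sku"))

    // The cleaned frame can be persisted in any of the formats used in the
    // project: Parquet shown here; JSON is analogous (`.json(...)`), and Avro
    // works the same way via the spark-avro package.
    val out = java.nio.file.Files.createTempDirectory("skus").toString
    cleaned.write.mode("overwrite").parquet(out)

    spark.stop()
  }
}
```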

Environment: Sqoop, Zookeeper, Oozie, Spark-Core 2.0, Spark SQL, Maven, SQL Server, Git, Scala 2.11, HiveQL, Pig


Sr. Mainframe Developer/Project Lead


  • Generated business solutions to fit new requirements into the current business workflow.
  • Performed detailed analysis of the business requirements and created high-level and detailed design documents.
  • Provided onsite coordination for the offshored part of the project.
  • Assigned tasks to team members, tracked them on a day-to-day basis, and maintained the capacity planner for the team.
  • Performed code reviews, prepared implementation and back-out plans, and tracked tasks assigned to team members.
  • Developed and tested the business modules.
  • Performed unit, integration, and regression testing.
  • Led the team during the design, development, and testing phases.
  • Supported User Acceptance Testing.
  • Provided implementation and warranty support.


Sr. Mainframe Developer/Project Lead


  • Gathered end-user requirements and converted them into functional and technical specifications.
  • Analyzed the current business workflow to accommodate new requirements.
  • Mapped business requirements to IT requirements and produced estimates for the identified requirements.
  • Performed detailed analysis of the business requirements and created high-level and detailed design documents.
  • Coordinated with the onsite client team on requirements and status reporting.
  • Developed code and monitored the team during development and testing of the business modules.
  • Performed code reviews, unit testing, and integration and regression testing.
  • Coordinated between client users and the testing team during User Acceptance Testing.
  • Handed over to the support team and coordinated between the support and development teams during the implementation and warranty support phases.
  • Created performance metrics for all projects and presented them to management.

Confidential, Los Angeles, California

Mainframe Developer


  • Gathered requirements by engaging various layers of the business, including frequent meetings with business directors, marketing staff, and floor staff to understand their functional boundaries and map them to the requirements.
  • Conducted design walkthroughs with all project stakeholders to gain their buy-in.
  • Coordinated with offshore resources on development; this often involved status checks on delivery dates, adjusting staffing in case of delayed deliveries, accommodating out-of-scope requirements, and resolving queries.
  • Performed demos for end users through UAT, typically to obtain stakeholder buy-in for implementing the work in production.
