Sr. Big Data Developer Resume

Dallas, TX


  • Over 9 years of IT experience in analysis, design and development using the Big Data ecosystem (Pig, Hive, Impala, Spark and Scala), Java and J2EE.
  • Progressive experience in all phases of the iterative Software Development Life Cycle (SDLC).
  • Experience working in environments using Agile (Scrum), RUP and Test-Driven Development methodologies.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing.
  • Good understanding of NoSQL databases and hands-on experience writing applications on MongoDB.
  • Strong knowledge in using MapReduce programming model for analyzing the data stored in Hadoop.
  • Extensive experience in installing, configuring and using Big Data ecosystem components like MapReduce, HDFS, Sqoop, Pig, Impala & Spark
  • Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers like Apache Tomcat.
  • Good knowledge of Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
  • Proficient in configuring ZooKeeper, Cassandra & Flume on an existing Hadoop cluster.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Expert in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
  • Good knowledge on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experience in working with version control tools like Rational Team Concert, Harvest, ClearCase, SVN and GitHub.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Experience developing enterprise applications with MVC architecture on application servers and web servers.
  • Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
  • Good experience in defining the XML schemas and in working with XML parsers to read and validate the data held in XML documents.
  • Hands-on experience in writing Pig Latin scripts, working with grunt shells and job scheduling with Oozie.
  • Involved in moving log files generated from various sources to HDFS and Spark for further processing.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology
  • Experience on R and Python for statistical computing. Also experience with MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS
  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem like Hadoop 2.x, YARN, Hive, Pig, Spark, MapReduce, Impala, Kafka, Oozie, HBase, Flume, Sqoop and ZooKeeper.
  • Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration and support of Hadoop.
  • Experience in Python for framework development and in core Java concepts.
  • Experience with on-prem (HortonWorks, MapR) and Google Cloud Platform.
  • Experience in monitoring, tuning and administrating Hadoop cluster.
  • Experience in understanding Big Data business requirements and providing them Hadoop based solutions.
  • Experience in importing data from different databases like MySQL, Oracle and Teradata into HDFS, and exporting it back, using Sqoop.
  • Worked on Spark 1.6.0 for data processing using RDDs and the DataFrame API.
  • Experience in writing UDFs in Hive for processing and analyzing large datasets.
  • Experience in working with different file formats and compression techniques in Hadoop.
  • Experience in using NFS (Network File System) for backing up NameNode metadata.
  • Experience in managing the cluster resources by implementing fair scheduler and capacity scheduler.
  • Experience in developing Pig Latin scripts for data processing on HDFS.
  • Excellent team player with good communication skills and effective time management.
  • Understand business process management and business requirements of the customers and translate them to specific software requirements.
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Spark Streaming.
  • Experience in using Scala to convert Hive/SQL queries into RDD transformations in Spark.
  • Strong knowledge of real-time data analytics using Spark Streaming, Kafka & Flume.
  • Proficient with Kafka and with Spark in YARN, local and standalone modes.
  • Expertise in writing Spark RDD transformations, Actions, Case classes for input data and performing data transformations using Spark-Core
  • Implementing Scheduler using Azkaban, Tidal Enterprise scheduler, Crontab and Oozie.
  • Experience in using DStreams, Broadcast Variables, RDD caching for Spark Streaming.
  • Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs & Spark on YARN.
  • Hands on experience with ORC, AVRO, Sequence and Parquet file formats.
  • Experience in analyzing data using PIG Latin, HiveQL, Spark SQL
  • Experience with Hadoop Distributions like Cloudera and Hortonworks.
  • Extensive knowledge on designing Hive Managed/External tables, Views & Hive Analytical functions.
  • Experience in tuning the performance of hive queries using Partitioning and Bucketing.
  • Experience working with Flume to handle large volumes of streaming data ingestion.
  • Experience in developing customized UDFs and UDAFs to extend the core functionality of Pig and Hive.
  • Experience in various Big Data application phases like Data Ingestion, Data analytics and Data visualization.
  • Proficient in working with NoSQL databases such as HBase and MongoDB.
  • Expertise in writing Pig scripts and Hive queries for analyzing data to meet business requirements.
  • Experience in design and pipeline flows with Jenkins, Tonomi and Azkaban.
  • Exposure to build tools like Maven and SBT and the bug tracking tool JIRA in the work environment.
  • Good knowledge of job/workflow scheduling and monitoring tools like Azkaban and Cisco Tidal Scheduler.
  • Hands-on experience in importing/exporting data between RDBMS and HDFS using Sqoop.
  • Excellent programming skills at high level abstraction using Java, Scala, Python & SQL.
  • Co-ordinate patch upgrades, bug fixes and new releases for the application within stipulated timelines
  • Performing Team Lead Activities and Coordination with the team members and defining time estimations for deliverables of change requests, patches and upgrades to the application.


Apache (6 years), APACHE CASSANDRA (5 years), APACHE HADOOP HDFS (5 years), APACHE HADOOP MAPREDUCE (5 years), APACHE HADOOP OOZIE (5 years), APACHE HADOOP SQOOP (5 years), APACHE KAFKA (5 years), ASTERADATA (5 years), Cassandra (5 years), Flume (5 years), Hadoop (5 years), HADOOP DISTRIBUTED FILE SYSTEM (5 years), HDFS (5 years), Hive (5 years), Java (7 years), Kafka (5 years), MapReduce (5 years), Oozie (5 years), Scala (5 years), SQL (6 years)


Sr. Big Data Developer

Confidential, Dallas, TX


  • As a Sr. Big Data Developer, worked on Hadoop ecosystems including Hive, MongoDB, ZooKeeper and Spark Streaming with the MapR distribution.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Participated in all aspects of Software Development Life Cycle (SDLC) and Production troubleshooting, Software testing using Standard Test Tool.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output as per the requirements.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
  • Involved in identifying job dependencies to design workflow for Oozie and YARN resource management.
  • Designed solution for various system components using Microsoft Azure.
  • Transferred data between HDFS and relational database systems using Sqoop, including maintenance and troubleshooting.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Developed Apache NiFi flows dealing with various kinds of data formats such as XML, JSON and Avro.
  • Worked on importing data from HDFS to MySQL database and vice-versa using Sqoop.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Worked on NoSQL support enterprise production and loading data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
  • Analyzed large data sets using HBase to aggregate and report on them.
  • Developed reports, dashboards using Tableau for quick reviews to be presented to business.
  • Worked on configuring and managing disaster recovery and backup of Cassandra data.
  • Developed many distributed, transactional, portable applications using Enterprise JavaBeans (EJB) architecture for Java 2 Enterprise Edition (J2EE) platform.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked on MongoDB and HBase databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs in Scala.
  • Used Hive to perform data validation on the data ingested using Sqoop and cleansed the data.
  • Developed several business services using Java RESTful Web Services using Spring MVC framework.

Environment: Hadoop 3.0, Oozie 4.3, Zookeeper 3.4, Cassandra 3.0, Sqoop 1.4, Apache NiFi 1.4, ETL, Azure, Hive 2.3, HBase 1.4, Pig 0.17, HDFS 3.1, Flume 1.8, Tableau, GIT, Kafka 1.1, MapReduce, JSON, AVRO, Teradata, Maven, SOAP.

Big Data /Hadoop Developer

Confidential, Bellevue, WA


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Interacted with stakeholders and gathered requirements and business artifacts based on Agile Scrum methodology.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs and processed the data as DataFrames.
  • Used Kafka and REST APIs to collect data and load it into the Hadoop file system; also used Sqoop to load data from relational databases.
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.
  • Used Spark DataFrame operations to perform required validations and analytics on the Hive data.
  • Developed Apache Spark applications for processing data from various streaming sources.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Involved in migrating MapReduce jobs to RDDs (Resilient Distributed Datasets) and creating Spark jobs for better performance.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Developed Scala scripts, UDF using both Data frames/SQL and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into RDBMS through Sqoop.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Involved in transforming data from legacy tables to HDFS and Hive tables using Sqoop.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Developed Oozie Bundles to Schedule Pig, Sqoop and Hive jobs to create data pipelines.
  • Experienced in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Used Pig to validate the data ingested using Sqoop and Flume, and pushed the cleansed data set into MongoDB.
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
  • Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
  • Explored MLlib algorithms in Spark to understand the machine learning functionality that could be applied to the project's use case.
  • Worked with teams in setting up AWS EC2 instances by using different AWS services like S3, EBS, Elastic Load Balancer, and Auto scaling groups, VPC subnets and CloudWatch.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Created and maintained various shell and Python scripts for automating processes; optimized MapReduce code and Pig scripts through performance tuning and analysis.

Environment: Hadoop 3.0, Spark, Hive 2.3, Agile, MapReduce, Kafka 1.1, HBase 1.4, HDFS 3.1, Sqoop 1.4, Scala, AWS, RDBMS, Oozie, Pig 0.17, Cassandra 3.11, NoSQL, Elastic Search, Java

Spark Developer

Confidential, Atlanta, GA


  • Wrote Programs in Spark using Scala and Python for Data quality check.
  • Worked on Big Data infrastructure for batch processing and real time processing. Built scalable distributed data solutions using Hadoop.
  • Wrote transformations and actions on DataFrames and used Spark SQL to load Hive tables into Spark for faster processing of data.
  • Imported and exported terabytes of data using Sqoop and real time data using Flume and Kafka.
  • Created various hive external tables, staging tables and joined the tables as per the requirement.
  • Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used Hive to do transformations, joins, filters and some pre-aggregations after storing the data in HDFS.
  • Used Spark-Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time and Persists into Cassandra.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented the workflows using Apache Oozie framework to automate tasks. Used Zookeeper to co-ordinate cluster services.
  • Have used Enterprise Data Warehouse (EDW) architecture and various data modeling concepts like star schema, snowflake schema in the project.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters and implemented real-time data ingestion and processing using Kafka.
  • Performed various benchmarking steps to optimize the performance of spark jobs and thus improve the overall processing.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed PySpark code to read multiple data formats on HDFS.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive and involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Designed ETL workflows on Tableau, deployed data from various sources to HDFS and generated reports using Tableau.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Hadoop 2.8, MapReduce, HDFS, Yarn, Hive 2.1, Sqoop 1.1, Cassandra 2.7, Oozie, Spark, Scala, Python, AWS, Flume 1.4, Kafka, Tableau, Linux, Shell Scripting.

Java/J2EE Developer

Confidential, Newport Beach, CA


  • As a Java/J2EE developer, worked on both the back-end and front-end development teams.
  • Worked extensively on system analysis, design and development using J2EE architecture.
  • Actively participated in the requirements gathering, analysis, design and testing phases.
  • Developed the application using Spring Framework that leverages classical Model View Controller (MVC) architecture.
  • Involved in the Software Development Life Cycle starting from requirements gathering, and performed OOA and OOD.
  • Created a Transaction History Web Service using SOAP that is used for internal communication in the workflow process.
  • Designed and created components for company's object framework using best practices and design Patterns such as Model-View-Controller (MVC).
  • Worked with the DOM and DOM functions using Firefox and the IE Developer Toolbar.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server.
  • Developed the presentation layer using CSS and HTML taken from Bootstrap to develop for browsers.
  • Implemented XML parsers with the SAX, DOM and JAXB XML parser libraries to modify the user view of products and product information in a customized view with XML, XSD and XSLT in HTML, XML and PDF formats.
  • Used Spring Core and Spring-web framework. Created a lot of classes for backend.
  • Involved in developing web pages using HTML and JSP.
  • Exposed business functionality to external systems (Interoperable clients) using Web Services (WSDL-SOAP) Apache Axis.
  • Developed POJO classes and wrote Hibernate Query Language (HQL) queries.
  • Used PL/SQL for queries and stored procedures in the backend RDBMS.
  • Involved in the Analysis and Design of the front-end and middle tier using JSP, Servlets and Ajax.
  • Implemented Spring Inversion of Control (IoC) by way of Dependency Injection, where a Factory class was written for creating and assembling the objects.
  • Created EJB, JPA and Hibernate component for the application.
  • Established continuous integration with JIRA, Jenkins.
  • Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
  • Used Hibernate to manage Transactions (update, delete) along with writing complex SQL and HQL queries.
  • Used Microsoft VISIO for developing Use Case Diagrams, Sequence Diagrams and Class Diagrams in the design phase.
  • Developed Restful Web services client to consume JSON messages using Spring JMS configuration. Developed the message listener code.
  • Provided production support, which included handling tickets & providing resolutions, and used the BMC Remedy tool to add issues & update resolutions.
  • Created database objects like tables, sequences, views, triggers, stored procedures, functions and packages.
  • Used Maven as the build tool and Tortoise SVN as the Source version controller.

Environment: Java, J2EE, MVC, CSS, HTML, Bootstrap, Hibernate, Jenkins, Microsoft Visio, JSON, Maven

Java/J2EE Developer



  • Implemented Java advances incorporating specialization in XML, XSL, and XSLT.
  • Developed the web application using Spring MVC architecture and implemented business layer using Spring Framework and Spring Validator.
  • Implemented RESTful web services using JAX-RS and the Jersey API to expose the data as a service.
  • Developed test cases to perform unit testing using the JUnit framework.
  • Used React and AngularJS as the development frameworks to build a single-page application.
  • Designed the GUI and user interface using JSP, JSTL, HTML, CSS, JavaScript, AJAX and jQuery technologies.
  • Used JavaScript for client-side validations and a validation framework for server-side validations.
  • Implemented Log4J for Logging Errors, debugging and tracking using loggers.
  • Used Ant scripts to perform automated builds of the project files and WebSphere enterprise server to deploy the application.
  • Used JIRA for bug tracking and monitoring completion of work in the system in agile methodology.
  • Worked extensively on Web Services (SOAP & RESTful), XML, JMS and Spring Controller.
  • Used Git as source control management giving a huge speed advantage on centralized systems that have to communicate with a server.
  • Practiced Test-Driven Development (TDD) using Spring, JUnit and Cucumber.
  • Used SOAP based web services to develop interfaces to integrate between front end systems and back end systems.
  • Developed server-side services using Spring Web Services (SOAP, WSDL).
  • Designed in-house build automation and continuous integration systems by utilizing Node.js.
  • Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve the data, and did other database configurations using EJB.
  • Integrated Docker and Maven plugin with Jenkins for the continuous integration and continuous deployment.
  • Developed WebLogic container security components for adding vendor specific Security Realm to application using JMX.

Environment: Java, XML, MVC, Spring, AngularJS, JUnit, JSP, HTML, AJAX, jQuery, POJO, Maven, Jenkins