
Sr. Big Data Developer Resume

Valley Forge, PA


  • 8+ years of professional IT experience, including Java/J2EE and Big Data ecosystem experience developing Spark/Hadoop applications.
  • Good knowledge of using Hibernate to map Java classes to database tables and of Hibernate Query Language (HQL).
  • Worked on Java/J2EE systems with different databases, including Oracle, MySQL, and DB2.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR), processing data and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Good working experience on different file formats (PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
  • Valuable experience in practical implementation of cloud-specific AWS technologies, including IAM and Amazon cloud services such as Elastic Compute Cloud (EC2), Amazon ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
  • Built secure AWS solutions by creating VPCs with private and public subnets.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge of developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Hands-on experience working with NoSQL databases, including HBase, Cassandra, and MongoDB, and their integration with Hadoop and Kubernetes clusters.
  • Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Experience in building Pig scripts to extract, transform and load data onto HDFS for processing.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
  • Experience in Big Data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Strong work ethics with desire to succeed and make significant contributions to the organization.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Experience in deploying Hadoop clusters using Puppet.
  • Experience in scheduling Cron jobs on EMR, Kafka, and Spark using Clover Server.
  • Proficient in RDBMS concepts with Oracle, SQL Server, and MySQL.
  • Hands-on experience with build and deployment tools such as Maven and GitHub using Bash scripting.
  • Hands-on experience with Spring Tool Suite for development of Scala applications.
  • Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries, and incorporating complex UDFs into business logic.
  • Experience working with Text, Sequence files, XML, Parquet, JSON, ORC, AVRO file formats and Click Stream log files.
  • Experience working with DataFrames, RDDs, Spark SQL, Spark Streaming, APIs, system architecture, and infrastructure planning.
  • Experience using Hadoop distributions such as Cloudera and Hortonworks.
  • Strong knowledge of working in UNIX/Linux environments, writing shell scripts and PL/SQL stored procedures.


Hadoop Ecosystem: HDFS, YARN, Spark Core, Spark SQL, Spark Streaming, Scala, MapReduce, Hive 2.3, Pig 0.17, ZooKeeper 3.4.11, Sqoop 1.4, Oozie 4.3, Bedrock, Apache Flume 1.8, Kafka 2.0, Impala 3.0, NiFi, MongoDB, HBase.

Languages: Python, PL/SQL, Java, HiveQL, Pig Latin, Scala, UNIX shell scripting.

Hadoop Platforms: Hortonworks, Cloudera, Azure, Amazon Web Services (AWS).

Amazon Web Services: Redshift, EMR, EC2, S3, RDS, CloudSearch, Data Pipeline, Lambda.

Databases: Oracle 12c, MS SQL Server 2017, MySQL, PostgreSQL, NoSQL (HBase, Cassandra 3.11, MongoDB), Teradata r14.

Tools: Eclipse 4.8, NetBeans 9.0, Informatica, IBM DataStage, Talend, Maven, Jenkins 2.12.

Operating Systems: Windows XP/2000/NT, Linux, UNIX.

Version Control: GitHub, SVN, CVS.

Packages: MS Office Suite, MS Visio, MS Project Professional.


Confidential, Valley Forge, PA

Sr. Big Data Developer


  • Worked as a Sr. Big Data Developer with Big Data and Hadoop ecosystem components.
  • Responsible for automating build processes towards CI/CD/DevOps automation goals.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Analyzed the Hadoop cluster and different big data analytic tools, including Pig, HBase, and Sqoop.
  • Ingested structured, semi-structured, and unstructured datasets into the Indie-Data Lake using an open-source Hadoop distribution and open-source Apache tools such as Flume and Sqoop, loading them into the Hive environment.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Developed MapReduce jobs using the Java API to parse the raw data and store the refined data.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Wrote Puppet manifests and modules to deploy, configure, and manage virtualized environments.
  • Heavily involved in a fully automated CI/CD pipeline process using GitHub, Jenkins, and Puppet.
  • Built and deployed Docker containers to improve developer workflow, increasing scalability and optimization.
  • Used AWS CloudTrail for audit findings and CloudWatch for monitoring AWS resources.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Moved data between HDFS and relational database systems using Sqoop, including maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Involved in scheduling Oozie workflow to automatically update the firewall.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Worked in AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
  • Developed NiFi flows dealing with various data formats such as XML, JSON, and Avro.
  • Implemented test scripts to support test driven development and continuous integration.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Responsible for managing data coming from different sources.
  • Imported millions of structured records from relational databases using Sqoop import, processed them using Spark, and stored the data in HDFS in CSV format.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
  • Used Spark SQL to process large volumes of structured data.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.

Environment: HDFS, MapReduce, Pig 0.17, Hive 2.3, Sqoop 1.4, Flume 1.8, Oozie 4.3, HBase, Impala 3.0.0, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera.

Confidential, Sunnyvale, CA

Hadoop/Spark Developer


  • Worked on Big Data infrastructure for batch processing and real time processing. Built scalable distributed data solutions using Hadoop.
  • Imported and exported terabytes of data using Sqoop, and real-time data using Flume and Kafka.
  • Wrote Spark programs in Scala and Python for data quality checks.
  • Created various Hive external tables and staging tables, and joined the tables as per the requirements.
  • Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to read Hive tables into Spark for faster data processing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used Hive to do transformations, joins, filter and some pre-aggregations after storing the data to HDFS.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented workflows using the Apache Oozie framework to automate tasks. Used ZooKeeper to coordinate cluster services.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters, and implemented real-time data ingestion and processing using Kafka.
  • Performed various benchmarking steps to optimize the performance of Spark jobs and thus improve overall processing.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed PySpark code to read multiple data formats on HDFS.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; created Hive tables, loaded them with data, and wrote Hive queries that run MapReduce jobs in the backend.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Designed ETL workflows on Tableau, deployed data from various sources to HDFS, and generated reports using Tableau.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Hadoop 2.8, MapReduce, HDFS, YARN, Hive 2.1, Sqoop 1.1, Cassandra 2.7, Oozie, Spark, Scala, Python, AWS, Flume 1.4, Kafka, Tableau, Linux, Shell Scripting.

Confidential, Durham, NC

Spark Developer


  • Actively involved in designing the Hadoop ecosystem pipeline.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Involved in designing a Kafka deployment for a multi-data-center cluster and monitoring it.
  • Responsible for importing real-time data from sources into Kafka clusters.
  • Worked with Spark techniques such as refreshing tables, handling parallelism, and modifying the Spark defaults for performance tuning.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs to Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive, and submitted the data to the BI team for report generation after processing and analyzing it in Spark SQL.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Imported data from various sources into the Cassandra cluster using Sqoop.
  • Created data models for Cassandra from the existing Oracle data model.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per the business requirements.
  • Used Sqoop import functionality to load historical data from RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on an Apache Hadoop environment by Hortonworks (HDP 2.2).
  • Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
  • Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
  • Worked extensively with Apache NiFi, building NiFi flows for the existing Oozie jobs to handle incremental loads, full loads, and semi-structured data, and to bring data from REST APIs into Hadoop; automated all NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on failures.
  • Developed shell scripts to periodically perform incremental imports of data from a third-party API to AWS.
  • Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Used version control tools such as GitHub to share code snippets among team members.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.

Environment: Hadoop 3.0, HDFS, Hive 2.3, Python 3.7, Spark 2.3, MySQL, Oracle 12c, Linux, Hortonworks, Oozie 4.3, MapReduce, Sqoop 1.4, Shell Scripting, Apache Kafka 2.0, Scala, AWS.

Confidential, Centennial, CO

Java/Hadoop Developer


  • Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS and processed them.
  • Performed data modeling to connect data stored in Cassandra DB to the data processing layers and wrote queries in CQL.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC) using Agile software development methodology.
  • Used Rational Rose for developing Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
  • Involved in implementation of the presentation layer (GUI) for the application using JSF, HTML4, CSS2/3 and JavaScript.
  • Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Automated all the jobs from pulling data from databases to loading data into SQL server using shell scripts.
  • Developed integration services using SOA, Mule ESB, Web Services, SOAP, and WSDL.
  • Designed UI screens using JSP 2.0 and HTML. Used JavaScript for client-side validation.
  • Actively involved in designing and implementing Singleton, MVC, Front Controller, and DAO design patterns.
  • Used log4j to log the messages in the database.
  • Performed unit testing using the JUnit framework.
  • Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
  • Used Hibernate to access the database, mapped different POJO classes to the database tables, and persisted the data into the database.
  • Used Spring Dependency Injection to set up dependencies between the objects.
  • Developed Spring-Hibernate and Struts integration modules.
  • Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Integrated the Struts application with Spring Framework by configuring the deployment descriptor file and the application context file in Spring Framework.
  • Implemented Model View Controller (MVC) architecture using Spring Framework.
  • Worked on Java Beans and other business components for the application and implemented new functionalities for the ERIC application.
  • Developed various SQL queries and PL/SQL procedures in the Oracle database for the application.

Environment: Hadoop 2.2, Hive 1.8, HDFS, Sqoop, Spark, Java, Hibernate 4.0, Oracle 10g, HTML3, CSS2/3, SQL Server 2012, Spring 3.1 framework, Spring Model View Controller (MVC), Servlets 3.0, JDBC 4.0, AJAX, Web Services, RESTful, JSON, jQuery, JavaScript

Confidential, Santa Clara, CA

Java/J2EE Developer


  • As a Java/J2EE developer, worked on the back-end and front-end development team.
  • Responsible for system analysis, design and development using J2EE architecture.
  • Actively participated in requirements gathering, analysis, design and testing phases.
  • Developed the application using Spring Framework, which leverages the classical Model View Controller (MVC) architecture.
  • Involved in the Software Development Life Cycle starting from requirements gathering, and performed OOA and OOD.
  • Used Spring JDBC to execute database queries.
  • Created row mappers and query classes for DB operations.
  • Created a Transaction History Web Service using SOAP that is used for internal communication in the workflow process.
  • Designed and created components for the company's object framework using best practices and design patterns such as Model-View-Controller (MVC).
  • Worked with the DOM and DOM functions using Firefox and the IE Developer Toolbar.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server.
  • Developed the browser presentation layer using CSS and HTML taken from Bootstrap.
  • Did core Java coding using JDK 1.3, the Eclipse Integrated Development Environment (IDE), ClearCase, and Ant.
  • Used Spring Core and the Spring Web framework. Created many classes for the back end.
  • Involved in developing web pages using HTML and JSP.
  • Exposed business functionality to external systems (interoperable clients) using Web Services (WSDL/SOAP) with Apache Axis.
  • Developed POJO classes and wrote Hibernate Query Language (HQL) queries.
  • Used PL/SQL for queries and stored procedures against the back-end RDBMS.
  • Involved in the Analysis and Design of the front-end and middle tier using JSP, Servlets and Ajax.
  • Implemented Spring Inversion of Control (IoC) by way of Dependency Injection, where a Factory class was written to create and assemble the objects.
  • Implemented modules using Core Java APIs, Java collections, threads, and XML, integrated the modules, and used SOAP for Web Services, exchanging XML data between applications over HTTP.
  • Created EJB, JPA, and Hibernate components for the application.
  • Established continuous integration with JIRA and Jenkins.
  • Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
  • Used Hibernate to manage transactions (update, delete) along with writing complex SQL and HQL queries.
  • Developed a RESTful Web Services client to consume JSON messages using Spring JMS configuration. Developed the message listener code.
  • Used Maven as the build tool and TortoiseSVN for source version control.

Environment: Core Java, UNIX, J2EE, XML Schemas, XML, JavaScript 2014, JSON, CSS3, HTML3, Spring, Hibernate, Design Patterns, Servlets, JUnit, JMS, MySQL 2010, RESTful Web Services, SOAP, TortoiseSVN 1.5, Web Services, Apache Tomcat 8.0, Windows XP
