Big Data Developer Resume

Peoria, IL


  • Over 8 years of professional IT experience, including Java/J2EE and Big Data ecosystem experience developing Spark/Hadoop applications.
  • Good knowledge of using Hibernate to map Java classes to database tables and of Hibernate Query Language (HQL).
  • Worked on Java/J2EE systems with different databases, including Oracle, MySQL and DB2.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR) to process data and manage the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Good working experience with different file formats (PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
  • Valuable experience in the practical implementation of cloud-specific AWS technologies, including IAM, Elastic Compute Cloud (EC2), Amazon ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda and EBS.
  • Built secure AWS solutions by creating VPCs with private and public subnets.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge of developing data pipelines using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Hands-on experience working with NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop and Kubernetes clusters.
  • Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications and HDFS.
  • Experience in building Pig scripts to extract, transform and load data onto HDFS for processing.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Worked as part of the Cloud Dev Infrastructure team to design and implement appropriate business security measures in the new cloud environment.
  • Experience in Hadoop shell commands, writing MapReduce programs, and verifying, managing and reviewing Hadoop log files.
  • Experience in Big Data analysis using Pig and Hive, and an understanding of Sqoop and Puppet.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Experience in deploying Hadoop clusters using Puppet.
  • Good understanding of migrating applications to Google Cloud Platform, off-site disaster recovery and archival with Google Cloud Platform, hybrid architectures and multi-cloud deployments.
  • Experience in scheduling Cron jobs on EMR, Kafka, and Spark using Clover Server.
  • Proficient in using RDBMS concepts with Oracle, SQL Server and MySQL.
  • Hands-on experience with build and deployment tools like Maven and GitHub using Bash scripting.
  • Hands-on experience with Spring Tool Suite for development of Scala applications.
  • Extensive experience working with structured data using Spark SQL, DataFrames and HiveQL, optimizing queries, and incorporating complex UDFs into business logic.
  • Experience working with Text, Sequence files, XML, Parquet, JSON, ORC and AVRO file formats and clickstream log files.
  • Experience working with Data Frames, RDD, Spark SQL, Spark Streaming, APIs, System Architecture, and Infrastructure Planning.
  • Experience using Hadoop distributions such as Cloudera and Hortonworks.
  • Strong knowledge of working with UNIX/Linux environments, writing shell scripts and PL/SQL stored procedures.


Hadoop Ecosystem: HDFS, YARN, Spark Core, Spark SQL, Spark Streaming, Scala, MapReduce, Hive 2.3, Pig 0.17, Zookeeper 3.4.11, Sqoop 1.4, Oozie 4.3, Bedrock, Apache Flume 1.8, Kafka 2.0, Impala 3.0, NiFi, MongoDB, HBase.

Languages: Python, PL/SQL, Java, HiveQL, Pig Latin, Scala, UNIX shell scripting.

Hadoop Platforms: Hortonworks, Cloudera, Azure, Amazon Web Services (AWS).

Amazon Web Services: Redshift, EMR, EC2, S3, RDS, CloudSearch, Data Pipeline, Lambda.

Databases: Oracle 12c, MS-SQL Server 2017, MySQL, PostgreSQL, NoSQL (HBase, Cassandra 3.11, MongoDB), Teradata r14.

Tools: Eclipse 4.8, NetBeans 9.0, Informatica, IBM DataStage, Talend, Maven, Jenkins 2.12.

Operating Systems: Windows XP/2000/NT, Linux, UNIX.

Version Control: GitHub, SVN, CVS.

Packages: MS Office Suite 2016, MS Visio, MS Project Professional.


Confidential - Peoria, IL

Big Data Developer


  • Worked as a Sr. Big Data Developer with Big Data and Hadoop ecosystem components.
  • Responsible for automating build processes towards CI/CD/DevOps automation goals.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Ingested data into the Indie-DataLake using an open-source Hadoop distribution, processing structured, semi-structured and unstructured datasets with open-source Apache tools such as Flume and Sqoop into the Hive environment.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster and integrated Hive with existing applications.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Developed MapReduce jobs with the Java API to parse the raw data and store the refined data.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Wrote Puppet manifests and modules to deploy, configure and manage virtualized environments.
  • Heavily involved in a fully automated CI/CD pipeline process using GitHub, Jenkins and Puppet.
  • Built and deployed Docker containers to improve developer workflow, increasing scalability and optimization.
  • Used AWS CloudTrail for audit findings and CloudWatch for monitoring AWS resources.
  • Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
  • Moved data between HDFS and relational database systems using Sqoop, including maintenance and troubleshooting.
  • Developed strong relationships with strategic accounts and helped CxOs realize the full potential of the Bluemix cloud platform.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Involved in scheduling Oozie workflows to automatically update the firewall.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked in the BI team on Big Data Hadoop cluster implementation and data integration while developing large-scale system software.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Worked with AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
  • Developed NiFi flows dealing with various data formats such as XML, JSON and Avro.
  • Implemented test scripts to support test driven development and continuous integration.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Responsible for managing data coming from different sources.
  • Imported millions of structured records from relational databases using Sqoop import, processed them with Spark and stored the data in HDFS in CSV format.
  • Tuned Hive and Pig to improve performance and resolved performance issues in scripts for both.
  • Used Spark SQL to process large amounts of structured data.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.

Environment: HDFS, MapReduce, Pig 0.17, Hive 2.3, Sqoop 1.4, Flume 1.8, Oozie 4.3, HBase, Impala 3.0.0, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera.

Confidential - Durham, NC

Sr. Hadoop/Spark Developer


  • Worked on Big Data infrastructure for batch processing and real-time processing. Built scalable distributed data solutions using Hadoop.
  • Imported and exported terabytes of data using Sqoop, and real-time data using Flume and Kafka.
  • Wrote programs in Spark using Scala and Python for data quality checks.
  • Created various Hive external tables and staging tables, and joined the tables as per the requirements.
  • Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Wrote transformations and actions on DataFrames, and used Spark SQL on DataFrames to access Hive tables in Spark for faster processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used Hive to do transformations, joins, filters and some pre-aggregations after storing the data in HDFS.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented the workflows using Apache Oozie framework to automate tasks. Used Zookeeper to co-ordinate cluster services.
  • Used Enterprise Data Warehouse (EDW) architecture and various data modeling concepts such as star schema and snowflake schema in the project.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters, and implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Performed various benchmarking steps to optimize the performance of spark jobs and thus improve the overall processing.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed code to read multiple data formats on HDFS using PySpark.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive and involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Designed ETL workflows on Tableau, deployed data from various sources to HDFS and generated reports using Tableau.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Hadoop 2.8, MapReduce, HDFS, YARN, Hive 2.1, Sqoop 1.1, Cassandra 2.7, Oozie, Spark, Scala, Python, AWS, Flume 1.4, Kafka, Tableau, Linux, Shell Scripting.

Confidential - Boston, MA

Spark Developer


  • Actively involved in designing Hadoop ecosystem pipeline.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Involved in designing Kafka for multi data center cluster and monitoring it.
  • Responsible for importing real time data to pull the data from sources to Kafka clusters.
  • Worked with Spark techniques such as refreshing tables, handling parallelism and modifying Spark defaults for performance tuning.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used the Spark SQL and DataFrames APIs to load structured data into Spark clusters.
  • Involved in using Spark API over Hadoop YARN as execution engine for data analytics using Hive and submitted the data to BI team for generating reports, after the processing and analyzing of data in Spark SQL.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Involved in importing data from various sources into the Cassandra cluster using Sqoop.
  • Worked on creating data models for Cassandra from the existing Oracle data model.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirements.
  • Used Sqoop import functionality to load historical data present in RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on an Apache Hadoop environment by Hortonworks (HDP 2.2).
  • Configured Hive bolts and wrote data to Hive on Hortonworks as part of a POC.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
  • Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
  • Worked extensively on Apache NiFi to build NiFi flows for the existing Oozie jobs to handle incremental loads, full loads and semi-structured data, to get data from REST APIs into Hadoop, and to automate all the NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on failures.
  • Developed shell scripts to periodically perform incremental imports of data from a third-party API to AWS.
  • Worked extensively with importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and AWS cloud.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Used version control tools like GitHub to share code among the team members.
  • Involved in daily Scrum meetings to discuss development progress and was active in making Scrum meetings more productive.

Environment: Hadoop 3.0, HDFS, Hive 2.3, Python 3.7, Spark 2.3, MySQL, Oracle 12c, Linux, Hortonworks, Oozie 4.3, MapReduce, Sqoop 1.4, Shell Scripting, Apache Kafka 2.0, Scala, AWS.

Confidential - Centennial, CO

Sr. Java/Hadoop Developer


  • Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Created Hive tables, loaded the data and performed data manipulations using Hive queries in MapReduce execution mode.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS and processed them.
  • Performed data modeling to connect data stored in Cassandra to the data processing layers and wrote queries in CQL.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC) using Agile software development methodology.
  • Used Rational Rose for developing use case diagrams, activity flow diagrams, class diagrams and object diagrams in the design phase.
  • Involved in implementation of the presentation layer (GUI) for the application using JSF, HTML4, CSS2/3 and JavaScript.
  • Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Automated all the jobs, from pulling data from databases to loading data into SQL Server, using shell scripts.
  • Developed integration services using SOA, Mule ESB, web services, SOAP and WSDL.
  • Designed UI screens using JSP 2.0 and HTML. Used JavaScript for client-side validation.
  • Actively involved in designing and implementing the Singleton, MVC, Front Controller and DAO design patterns.
  • Used Log4j to log the messages in the database.
  • Performed unit testing using the JUnit framework.
  • Created complex SQL queries, PL/SQL stored procedures and functions for the back end.
  • Used Hibernate to access the database, mapped different POJO classes to the database tables and persisted the data into the database.
  • Used Spring Dependency Injection to set up dependencies between the objects.
  • Developed Spring-Hibernate and Struts integration modules.
  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Integrated the Struts application with Spring Framework by configuring the deployment descriptor file and the application context file in Spring Framework.
  • Implemented Model View Controller (MVC) architecture using Spring Framework.
  • Worked on Java Beans and other business components for the application and implemented new functionalities for the ERIC application.
  • Developed various SQL queries and PL/SQL procedures in Oracle DB for the application.

Environment: Hadoop 2.2, Hive 1.8, HDFS, Sqoop, Spark, Java, Hibernate 4.0, Oracle 10g, HTML3, CSS2/3, SQL Server 2012, Spring 3.1 framework, Spring Model View Controller (MVC), Servlets 3.0, JDBC 4.0, AJAX, Web services, RESTful, JSON, jQuery, JavaScript
