
Big Data Developer Resume

Livonia, MI


  • Certified Hadoop and Spark Developer from Cloudera and Hortonworks with 8+ years of experience developing applications using SQL, Java, Spark, AWS, and Big Data technologies.
  • 4 years of experience with Big Data tools.
  • In-depth knowledge and experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, Oozie, and ZooKeeper.
  • Excellent understanding and extensive knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm; strong knowledge of rack-awareness topology.
  • Experienced with Apache Hadoop as well as the enterprise Cloudera, Hortonworks, and MapR distributions.
  • Expert in importing and exporting data between relational database systems such as MySQL and Oracle and HDFS.
  • Hands-on experience with data ingestion tools such as Kafka and Flume and workflow management tools such as Oozie.
  • Experience analyzing data in NoSQL databases such as HBase, Cassandra, and DynamoDB.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Experience using data visualization tools such as Tableau.
  • Expertise in extending Hive and Pig core functionality by writing custom UDFs.
  • Developed Spark applications for data transformations and loading into HDFS using RDDs and DataFrames.
  • Extensive knowledge of performance tuning Spark applications and converting Hive/SQL queries into Spark transformations.
  • Hands-on experience handling file formats such as JSON, Avro, ORC, and Parquet, and compression codecs such as Snappy, zlib, and LZ4.
  • Experience executing batch jobs on data streams with Spark Streaming.
  • Hands-on experience with Amazon Web Services (AWS): using Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB).
  • Extensive knowledge of creating and configuring Hadoop clusters on multiple EC2 instances in AWS and using IAM (Identity and Access Management) to create groups and users and assign permissions.
  • Hands-on experience with Data Pipeline, moving data between S3 and DynamoDB.
  • Extensive programming experience with core Java concepts such as OOP, multithreading, collections, and I/O.
  • Extensive experience with UNIX commands, shell scripting, and setting up cron jobs.
  • Good experience using relational databases such as Oracle and MySQL.
  • Experience working with build tools such as Maven, SBT, and Gradle to build and deploy applications to servers.
  • Expertise in Object-Oriented Analysis and Design (OOAD) and knowledge of the Unified Modeling Language (UML).
  • Expertise across the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile.
  • Experience in software configuration management using Git.
  • Experience using IDEs such as Eclipse, NetBeans, and IntelliJ.
  • Experience developing web page interfaces using JSP, Java Swing, and HTML.
  • Comprehensive knowledge of software development using shell scripting, core Java, and web technologies.
  • Experience working with Java, JDBC, ODBC, JSP, Servlets, and JavaBeans.
  • Developed stored procedures and queries using PL/SQL.
  • Successfully work in fast-paced environments, both independently and in collaborative teams.
  • Strong background in mathematics with very good analytical and problem-solving skills.
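The Spark points above (RDD transformations and DataFrame loads into HDFS) can be sketched in Scala as follows; the input path, delimiter, and column names are hypothetical, not from an actual project:

```scala
import org.apache.spark.sql.SparkSession

object TransformAndLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransformAndLoad")
      .getOrCreate()

    // RDD-style transformation: parse delimited lines and drop malformed records
    val raw = spark.sparkContext.textFile("hdfs:///data/raw/orders") // hypothetical path
    val parsed = raw.map(_.split('|'))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toDouble))

    // Convert the RDD to a DataFrame and write it back to HDFS as Parquet
    import spark.implicits._
    val df = parsed.toDF("orderId", "customerId", "amount")
    df.write.mode("overwrite").parquet("hdfs:///data/curated/orders")
  }
}
```

This requires a running Spark cluster (or local mode) and is a sketch of the general pattern rather than the exact jobs described.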


Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Scala, Spark, Kafka, Flume, Ambari.

Hadoop Frameworks: Cloudera CDH, Hortonworks HDP, MapR.

Databases: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012, DB2

Languages: C, C++, Java, Scala, Python

AWS Components: IAM, S3, EMR, EC2, Lambda, Route 53, CloudWatch, SNS

Development Methodologies: Agile, Waterfall

Build Tools: Maven, Gradle, Jenkins.

NOSQL Databases: HBase, Cassandra, MongoDB, DynamoDB

IDE Tools: Eclipse, NetBeans, IntelliJ

Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML

Architecture: Relational DBMS, Client-Server

Cloud Platforms: AWS Cloud

BI Tools: Tableau

Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X


Confidential, Livonia, MI

Big Data Developer


  • Worked on the Hortonworks HDP 2.5 distribution.
  • Involved in reviewing functional and non-functional requirements.
  • Involved in importing data from IBM DB2 into HDFS using Sqoop and created Hive external tables.
  • Developed Spark scripts in Scala as per requirements.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Performed different types of transformations and actions on RDDs to meet business requirements.
  • Involved in loading data from the UNIX file system to HDFS.
  • Developed multiple MapReduce jobs in Scala for data cleaning and pre-processing.
  • Involved in managing and reviewing Hadoop log files.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Involved in writing shell scripts for executing HiveQL.
  • Analyzed NDW data based on business logic and visualized the data in Tableau.
  • Deployed Tableau dashboards to Tableau Server and scheduled data loading.
  • Migrated existing SQL queries to Spark SQL and executed Spark jobs in cluster mode.
  • Implemented best-offer logic in Spark by writing Spark UDFs in Scala.
  • Created calculated fields in Tableau for aggregation and transformation of data.
  • Involved in scheduling Hive queries with the Oozie workflow engine.
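The best-offer logic above was delivered as Spark UDFs in Scala; a minimal sketch of registering and applying such a UDF follows. The discount rule and column names are hypothetical stand-ins, not the actual business logic:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object BestOffer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BestOffer").getOrCreate()
    import spark.implicits._

    // Hypothetical rule: the best offer is the larger of a flat discount
    // and a percentage-of-price discount
    val bestOffer = udf((price: Double, flat: Double, pct: Double) =>
      math.max(flat, price * pct))

    val offers = Seq((100.0, 5.0, 0.10), (40.0, 5.0, 0.10))
      .toDF("price", "flatOff", "pctOff")

    // Apply the UDF as a derived column
    offers.withColumn("discount", bestOffer($"price", $"flatOff", $"pctOff"))
      .show()
  }
}
```

The same UDF can be registered with `spark.udf.register` for use from Spark SQL queries.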

Environment: Hortonworks, Hadoop, HDFS, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, Shell Scripting, Scala, Spark, Spark SQL, Tableau, Tableau Server.

Confidential, Middletown, NJ

Hadoop Developer


  • Worked on the Hortonworks HDP 2.5 distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in importing data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
  • Played a key role in dynamic partitioning and bucketing of data stored in Hive.
  • Wrote HiveQL queries integrating different tables to create views and produce result sets.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Experienced in loading and transforming large sets of structured and unstructured data.
  • Developed MapReduce programs for data cleaning and transformation and loaded the output into Hive tables in different file formats.
  • Wrote several data processing jobs in Spark using the Scala programming language.
  • Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files, and sequence files for log files.
  • Involved in loading data into the HBase NoSQL database.
  • Built, managed, and scheduled Oozie workflows for end-to-end job processing.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
  • Analyzed large volumes of structured data using Spark SQL.
  • Wrote shell scripts to execute HiveQL.
  • Wrote automated shell scripts in Linux/UNIX environments using Bash.
  • Analyzed Hive log files and fixed issues to ensure all jobs ran correctly.
  • Migrated HiveQL queries to Spark SQL to improve performance.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded it into HBase.
  • Experienced in using the DataStax Spark connector to store data in and retrieve data from the Cassandra database.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded it into Cassandra.
  • Imported real-time weblogs using Flume and ingested the data into Spark Streaming.
  • Used Flume to collect, aggregate, and push log data from different log servers.
  • Extensive experience tuning Hive queries using map-side joins for faster execution and appropriate resource allocation.
  • Implemented the Flume, Spark, and Spark Streaming framework for real-time data processing.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Generated reports using Tableau.
  • Migrated MapReduce jobs to Spark jobs for better performance.
  • Used GitHub as the repository for committing and retrieving code and Jenkins for continuous integration.
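The streaming-to-Cassandra flow above can be sketched with the DataStax Spark-Cassandra connector. This is an assumed setup: the socket source stands in for the actual Flume/Kafka feed, and the Cassandra host, keyspace, and table names are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._ // DataStax Spark-Cassandra connector

object LogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogStream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical socket source standing in for the Flume/Kafka weblog feed
    val lines = ssc.socketTextStream("localhost", 9999)
    val events = lines.map(_.split(','))
      .filter(_.length == 2)
      .map(f => (f(0), f(1)))

    // Persist each micro-batch to a Cassandra table (keyspace and table assumed)
    events.saveToCassandra("logs", "events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Swapping the source for a Kafka or Flume receiver changes only the input DStream; the rest of the pipeline stays the same.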

Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL, Git, GitHub.

Confidential, Raleigh, NC

Hadoop Developer


  • Worked on the Cloudera CDH distribution.
  • Hands-on experience with cloud services such as Amazon Web Services (AWS).
  • Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
  • Developed Sqoop scripts to import data from relational sources and handled incremental loading.
  • Created Hive external tables for data in HDFS locations.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Used various Hive optimization techniques such as partitioning, bucketing, map joins, small-file merging, and vectorization.
  • Processed complex/nested JSON and CSV data using Spark DataFrames.
  • Used Spark as an ETL tool.
  • Developed various Spark jobs in Scala for data analysis on different data formats.
  • Automatically scaled up EMR instances based on data volume.
  • Applied transformation rules on top of DataFrames.
  • Imported real-time weblogs using Kafka as the messaging system and ingested the data into Spark Streaming.
  • Used Kafka to load data into HDFS and move data into NoSQL databases.
  • Deployed the project on Amazon EMR with S3 connectivity.
  • Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Loaded data into Simple Storage Service (S3) in the AWS Cloud.
  • Good knowledge of using Amazon load balancers for auto scaling EC2 servers.
  • Executed Spark jobs on Amazon EMR.
  • Generated reports using Tableau.
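Processing complex/nested JSON with Spark DataFrames, as above, typically means selecting struct fields and exploding arrays. A minimal sketch, where the S3 path, schema, and column names are all hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FlattenJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FlattenJson").getOrCreate()

    // Hypothetical S3 path; each event contains a nested struct and an array of items
    val df = spark.read.json("s3a://my-bucket/events/")

    val flat = df.select(
        col("eventId"),
        col("user.id").alias("userId"),      // pull a field out of a nested struct
        explode(col("items")).alias("item")) // one output row per array element
      .select(col("eventId"), col("userId"), col("item.sku"), col("item.qty"))

    // Land the flattened data in HDFS for downstream Hive/Impala queries
    flat.write.mode("append").parquet("hdfs:///data/flattened/events")
  }
}
```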

Environment: Data Pipeline, Hive, Impala, Amazon EC2, Amazon Load Balancer, Amazon S3, Amazon EMR, Spark, Spark SQL, Cloudera, IntelliJ IDE.

Confidential, Portland, Oregon

Hadoop Developer


  • Worked on the Cloudera CDH distribution.
  • Worked on a cluster of 50-100 nodes.
  • Loaded data from different relational databases to HDFS using Sqoop.
  • Implemented daily jobs that automate parallel tasks of loading data into HDFS using Oozie coordinator jobs.
  • Involved in reviewing functional and non-functional requirements.
  • Created external Hive tables and executed complex Hive queries on them using HiveQL.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Used Spark for transformations, event joins, and some aggregations before storing the data in HDFS.
  • Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
  • Analyzed large data sets to determine the optimal way to aggregate.
  • Used Oozie to automate and schedule business workflows that invoke Sqoop, MapReduce, and Pig jobs as required.
  • Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions.
  • Involved in processing ingested raw data using Apache Pig.
  • Involved in migrating HiveQL to Impala to minimize query response time.
  • Used Pig and Spark as ETL tools.
  • Involved in creating UDFs in Spark using the Scala programming language.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Created Hive-HBase tables, using Hive for the metastore and HBase for data storage in row-key format.
  • Implemented Impala for better management of data stored in clusters running on Apache Hadoop, with less CPU load than Hive.
  • Worked with different file formats such as JSON, Avro, ORC, and Parquet and compression codecs such as Snappy, zlib, and LZ4.
  • Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
  • Gained knowledge creating Tableau dashboards for reporting on analyzed data.
  • Expertise with NoSQL databases such as HBase; loaded data into HBase.
  • Experienced in managing and reviewing Hadoop log files.
  • Used GitHub as the repository for committing and retrieving code and Jenkins for continuous integration.
  • Involved in creating and maintaining technical documentation for MapReduce, Hive, Sqoop, and Spark jobs along with the Hadoop clusters, and reviewed it to fix post-production issues.
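The Spark event joins and aggregations above, performed before landing data in HDFS, follow a common pattern; a sketch in which the Hive table names, join key, and measures are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object EventJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EventJoin")
      .enableHiveSupport() // read tables registered in the Hive metastore
      .getOrCreate()

    // Hypothetical Hive tables holding two event streams
    val clicks      = spark.table("raw.clicks")
    val impressions = spark.table("raw.impressions")

    // Join the events on a shared key, then aggregate before writing to HDFS
    val joined = clicks.join(impressions, Seq("adId"))
      .groupBy("adId")
      .agg(sum("cost").alias("totalCost"))

    joined.write.mode("overwrite").parquet("hdfs:///data/agg/ad_costs")
  }
}
```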

Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, Cloudera, MySQL, Eclipse, Spark, Git, GitHub, Jenkins.


Java Developer


  • Involved in all phases of the project life cycle, from requirements gathering to quality assurance testing.
  • Developed UML diagrams using Rational Rose.
  • Involved in developing applications using Java, JSP, Servlets, and Swing.
  • Developed the UI using HTML, CSS, Ajax, and jQuery, and developed business logic and interfacing components using business objects, XML, and JDBC.
  • Created applications and connection pools and deployed JSPs and Servlets.
  • Used Oracle and MySQL databases for storing user information.
  • Developed the application back end using PHP for web applications.
  • Used Eclipse as the IDE; configured and deployed the application onto the WebLogic application server using Maven build scripts to automate the build and deployment process.

Environment: Java, JSP, Servlets, Swing, Oracle, MySQL, HTML, CSS, PHP, Eclipse.


Java Developer


  • Hands-on experience in all phases of the software development life cycle (SDLC).
  • Developed UML diagrams using Rational Rose.
  • Created UIs for web applications using HTML and CSS.
  • Created desktop applications using J2EE and Swing.
  • Developed the process using the Waterfall model.
  • Created SQL scripts for the Oracle database.
  • Executed test cases manually to verify expected results.
  • Used JDBC to establish connections between the database and the application.
  • Involved in designing, coding, debugging, documenting, and maintaining applications.
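The JDBC connection pattern above is standard; a minimal sketch is shown here in Scala for consistency with the other examples (the original work was in Java, and the `java.sql` API is identical). The Oracle URL, credentials, and table are placeholders:

```scala
import java.sql.DriverManager

object JdbcExample {
  def main(args: Array[String]): Unit = {
    // Placeholder Oracle connection details; a real driver JAR must be on the classpath
    val url  = "jdbc:oracle:thin:@//localhost:1521/ORCL"
    val conn = DriverManager.getConnection(url, "app_user", "secret")
    try {
      // Parameterized query to avoid SQL injection
      val stmt = conn.prepareStatement(
        "SELECT user_id, user_name FROM users WHERE user_id = ?")
      stmt.setInt(1, 42)
      val rs = stmt.executeQuery()
      while (rs.next())
        println(s"${rs.getInt("user_id")} -> ${rs.getString("user_name")}")
    } finally conn.close() // always release the connection
  }
}
```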

Environment: Rational Rose, HTML, CSS, J2EE, Swing, SQL, Oracle 9i, Java, Servlets.
