
Sr. Hadoop/Spark Developer Resume

Atlanta, GA


  • Results-driven IT professional with proven expertise in the software development life cycle (SDLC), including requirements gathering, analysis, design, development, writing technical specifications, and interface development using object-oriented methodologies and RDBMSs such as MySQL and Oracle.
  • 4+ years of recent experience developing and administering Hadoop and Big Data technologies such as HDFS, MapReduce, Pig, Hive, Oozie, Flume, HCatalog, Sqoop, and ZooKeeper, as well as NoSQL databases such as Cassandra.
  • Skilled in using Java technologies in business, web, and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets, and Java Database Connectivity (JDBC).
  • In-depth knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker.
  • Experience designing, developing, and deploying Hadoop ecosystem technologies such as HDFS, ZooKeeper, MapReduce, Pig, Hive, Oozie, Flume, Hue, and Sqoop.
  • Expert in deploying Spark on YARN and connecting it to Hive Server to run Spark SQL commands.
  • Experience working with multiple file formats, including JSON, XML, SequenceFiles, and RCFiles, using SerDes.
  • Extensive hands-on experience analyzing data by writing Hadoop MapReduce jobs in Java.
  • Expert in optimizing MapReduce algorithms using Combiners, Partitioners, and the Distributed Cache.
  • Hands-on experience analyzing data by developing custom UDFs and scripts in Pig.
  • Experience using HCatalog to transfer data between Pig and Hive.
  • Hands-on experience implementing and optimizing HiveQL queries using partitioning and custom UDFs.
  • Experience importing streaming logs and aggregating the data into HDFS through Flume.
  • Hands-on experience using the Oozie workflow engine to schedule multiple Flume, MapReduce, Hive, and Pig jobs.
  • Experience working with relational databases, integrating them with Hadoop infrastructure, and piping data into HDFS using Sqoop.
  • Experience administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
  • Experience writing Python, shell, and Java scripts to monitor cluster usage metrics.
  • Experience integrating Hadoop with enterprise business analytics tools, such as Tableau and Pentaho Business Analytics, to generate reports.
  • Experience installing and customizing Pentaho Data Integration and Pentaho Business Analytics to use Active Directory, LDAP, and custom header authentication.
  • Excellent communication skills and adept at building strong working relationships with coworkers and management.


Big Data Distributions: Apache Hadoop, Cloudera Manager, Cloudera Hadoop Distributions

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Flume, Sqoop, Oozie, ZooKeeper, Spark

Monitoring Tools: Ganglia, Nagios, Cloudera Manager, DataStax OpsCenter

Security: Kerberos

Databases: Oracle, MySQL, PostgreSQL

NoSQL Databases: Cassandra, HBase, MongoDB

Languages: Shell scripting, C, C++, Java, JavaScript, JSP, HTML, XML, SQL

Technologies: J2EE, JDBC, Servlets, JSP, Web Services, XML

Tools: Eclipse, NetBeans IDE, Apache Tomcat, Visual Basic, Microsoft Office Suite, Adobe Dreamweaver, Stash, GitHub

Operating Systems: Windows XP/7/8, Macintosh, Ubuntu, Linux (CentOS 6)


Confidential - Atlanta, GA

Sr. Hadoop/Spark Developer

Roles & Responsibilities:

  • Queried data using Spark SQL on top of the Spark engine.
  • Managed and monitored the Hadoop cluster using Cloudera Manager.
  • Developed HQL scripts to perform transformation logic and to load data from the staging zone into the landing and semantic zones.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Sqoop to transfer data between DB2 and HDFS.
  • Wrote transformations implementing various business rules in Hive SQL using Spark.
  • Involved in loading .csv files into Hive databases according to business rules.
  • Wrote transformations for tables of about 160 columns and upserted the results into a DB2 database using Spark.
  • Created and worked with large DataFrames with schemas of more than 300 columns.
  • Created strongly typed Datasets.
  • Wrote Scala functions as needed for column validation and data cleansing to implement the business logic.
  • Created UDFs as needed and registered them for use throughout the application.
  • Worked with various file formats, such as Parquet, ORC, and Avro.
  • Developed quality code adhering to Scala coding standards and best practices.
  • Tuned the performance of Spark jobs by setting the right batch interval, the correct level of parallelism, and memory settings, adjusting configuration properties, and using broadcast variables.
  • Involved in Hive data cleansing in the Eclipse IDE, such as trimming data, joining columns, computing aggregations such as percentages on columns, and trimming leading zeros from columns in Hive tables.
  • Performed column transformations and stored the data in Hive databases.
  • Wrote CASE statements in HiveQL.
  • Used Spark for parallel data processing and better performance.
  • Developed scalable solutions using the NoSQL database Cassandra.
  • Involved in writing transformations for Hive tables using Spark and upserting the results into DB2.
  • Worked closely on Cassandra loading activities, including history and incremental loads from Oracle databases, resolving loading issues and tuning the loader for optimal performance.
  • Involved in migrating several databases from the on-premises data center to Cassandra.

Environment: Hive, Hadoop Cluster, Sqoop, Spark 2.1.1, Scala, Eclipse Scala IDE, Cassandra, Toad DB2

Confidential - Franklin, WI

Sr. Big Data Analyst/Developer

Roles & Responsibilities:

  • Analyzed the existing system processes.
  • Prepared design documents for the specified models.
  • Handled the implementation for data preparation, scoring and trend analysis.
  • Developed a common export framework to transfer data to different target systems (COM, EXE).
  • Built an in-house comparator tool using MapReduce to validate the output data of the Data Science and Engineering team.
  • Led quality assurance reviews, code inspections, and walkthroughs of the developers' code.
  • Acted as the technical interface between the development team and external groups.
  • Provided training to team members and members of other teams.
  • Prepared validation scripts to compare source and target data after ingestion.
  • Implemented scoring logic using Python and Hive scripts.
  • Created and configured coordinators, workflows, and bundles in Oozie.
  • Deployed JAR files to EC2 instances after development.
  • Worked on a 62-node physical cluster running Hadoop 1.x and a 31-node cluster running Hadoop 2.x (YARN).
  • Worked on a 10-node cluster in AWS for the Dev and QA environments.
  • Involved in setting up AWS Identity and Access Management (IAM) roles.
  • Involved in network setup for the physical cluster with the administrator.

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, Python, AWS, Cloudera, ZooKeeper, HBase, DB2, Teradata, Linux

Confidential - Bloomington, IL

Hadoop Developer

Roles & Responsibilities:

  • Interacted with business users to identify process metrics and key dimensions and measures, and was involved in the complete project life cycle.
  • Responsible for writing Hive queries in Hive Query Language (HQL) to analyze data in the Hive warehouse.
  • Deployed and analyzed large volumes of data using Hive and HBase.
  • Supported data analysts in running Pig and Hive queries.
  • Used Hive and Python at various stages of the project life cycle.
  • Created business intelligence dashboards in Tableau for data reconciliation and verification.
  • Redesigned and developed a critical ingestion pipeline to process over 200 TB of data.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Imported and exported data between MySQL/Oracle and HDFS.
  • Designed and built many applications to handle vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
  • Specified the cluster size, resource pool allocation, and Hadoop distribution by writing specifications in JSON format.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it; exported result sets from Hive to MySQL using shell scripts.
  • Created models and customized data analysis tools in Python and MATLAB.
  • Delivered data analysis projects using Hadoop-based tools and the Python data science stack, and developed new data analyses and visualizations in Python.
  • Handled importing of data from various data sources and performed transformations; imported and exported data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in creating, partitioning, and bucketing tables.
  • Created mapplets and reusable transformations and used them in different mappings.
  • Created workflows using various tasks, such as Email, Event-Wait, Event-Raise, Timer, Scheduler, Control, Decision, and Session, in Workflow Manager.
  • Used post-session success and post-session failure commands in the Session task to execute cleanup and update scripts.
  • Prepared ETL mapping documents for every mapping and a data migration document for a smooth transfer of the project from the development environment to the testing environment and then to production.
  • Worked with the reporting team to help them understand the user requirements for the reports and their measures.
  • Migrated repository objects, services, and scripts from the development environment to the production environment.
  • Troubleshot and resolved migration and production issues.
  • Actively involved in production support. Implemented fixes and solutions for issues and tickets raised by the user community.

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, Python, AWS, Cloudera, ZooKeeper, HBase, Informatica PowerCenter, DB2, Teradata, UNIX, Tableau

Confidential - Fremont, CA

Hadoop Admin

Roles & Responsibilities:

  • Worked actively with the development team to meet specific customer requirements and proposed effective Hadoop solutions.
  • Involved in extracting data from various sources into HDFS for processing.
  • Worked on streaming data into HDFS from web servers using Flume.
  • Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
  • Responsible for collecting the data required to test various MapReduce applications from different sources.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into Hive schemas for analysis.
  • Implemented complex MapReduce programs to perform map-side joins using the Distributed Cache.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Created Hive tables as required, either internal or external, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented Hive UDFs, UDAFs, and UDTFs in Java to process data in ways not possible with Hive's built-in functions.
  • Wrote Pig scripts for advanced analytics on the data for recommendations.
  • Designed and implemented Pig UDFs for evaluating, filtering, loading, and storing data.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using a MapReduce testing library.
  • Implemented unit testing with MRUnit and JUnit.
  • Debugged and troubleshot issues in the development and test environments.
  • Worked with BI teams to generate reports in Tableau.
  • Involved in minor and major release activities.

Environment: HDFS, Oozie, Sqoop, Pig, Hive, Flume, Shell scripting, MapReduce, Eclipse, Tidal, Stash

Confidential - Portland, ME

Hadoop Developer

Roles & Responsibilities:

  • Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, Hive, and Sqoop.
  • Used Sqoop to transfer data between relational databases and HDFS.
  • Involved in loading data from the UNIX file system to HDFS.
  • Worked on streaming data into HDFS from web servers using Flume.
  • Installed and configured Hadoop and HDFS and developed MapReduce jobs in Java for data processing.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Implemented custom Hive UDFs and UDAFs in Java to process and analyze data.
  • Worked on custom Pig loader and storage classes to handle semi-structured and unstructured data.
  • Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, and Hive scripts.
  • Automated jobs using the Oozie workflow engine to chain together shell scripts, Flume, MapReduce jobs, and Hive and Pig scripts.
  • Used the Tidal Enterprise Scheduler to schedule and automate daily jobs.
  • Implemented unit testing with MRUnit and JUnit.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Worked with big data analysts, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, and Flume.
  • Created reports for the BI team, using Sqoop to load data into HDFS and Hive.
  • Developed test plans, test scripts, and test environments to understand and resolve defects.

Environment: MapReduce, Oozie, Sqoop, Pig, Hive, Flume, Shell scripting, Eclipse, Tidal, GitHub


Java Developer

Roles & Responsibilities:

  • Participated in planning and developing UML diagrams, such as use case, object, and class diagrams, for the detailed design phase.
  • Designed and developed static and dynamic user interface web pages using JSP, HTML, and CSS.
  • Identified and fixed transactional issues caused by incorrect exception handling and concurrency issues caused by unsynchronized blocks of code.
  • Developed client-side validation scripts in JavaScript.
  • Developed SQL scripts for batch processing of data.
  • Created and implemented stored procedures, functions, and triggers in SQL.
  • Incorporated a custom logging mechanism for tracing errors and resolved all issues and bugs before deploying the application.
  • Performed unit testing, system testing, and user acceptance testing.
  • Worked with QA to move the application to the production environment.
  • Prepared technical reports and documentation manuals during program development.
