Senior Hadoop Developer Resume

Chicago, IL


  • Senior Hadoop Developer with 7+ years of programming and software development experience, with skills in data analysis, design, development, testing and deployment of software systems from development through production in Big Data and Java technologies.
  • 4 years of experience in Big Data and tools in Hadoop Ecosystem including Pig, Hive, Sqoop, Oozie, Zookeeper and Flume.
  • Well versed in installing, configuring, supporting and managing Big Data and the underlying infrastructure of Hadoop clusters.
  • Expert in creating Pig Latin scripts and UDFs in Java for efficient data analysis.
  • Expert in creating Hive queries and UDFs in Java for efficient data analysis.
  • Knowledge of Hadoop Gen2 Federation, High Availability and YARN architecture.
  • Expert in using Sqoop to fetch data from different systems and Confidential for analysis in Confidential, and to export it back to the source system for further processing.
  • Used HBase in conjunction with Pig/Hive as required for real-time, low-latency queries.
  • Hands-on experience with operating system internals in multithreaded environments using inter-process communication, deep knowledge of UNIX operating system internals, and working knowledge of Linux distributions and High Performance Computing (HPC).
  • Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Excellent knowledge of Hadoop ecosystem architecture and components such as Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager and MapReduce programming.
  • Good experience in optimizing MapReduce algorithms using Mappers, Reducers, Combiners and Partitioners to deliver the best results on large datasets.
  • Used Flume to process real-time data.
  • Good understanding of NoSQL Databases.
  • Worked on Windows and UNIX/Linux platforms with different technologies such as SQL, PL/SQL, XML, HTML, CSS, JavaScript and Core Java.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera and AWS.
  • Experience in using IDEs like Eclipse and NetBeans.
  • Extensive programming experience in developing web based applications using Core Java, J2EE, JSP and JDBC.
  • Experience in deploying applications in Web/Application Servers like Tomcat, WebLogic and Oracle Application Servers.
  • Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
  • Extensive experience with Waterfall and Agile Scrum Methodologies.
  • Experience in developing logging standards and mechanisms based on Log4j.
  • Developed UML Diagrams for Object Oriented Design: Use Cases, Sequence Diagrams and Class Diagrams using Visual.
  • Working knowledge of databases such as Oracle 8i/9i/10g and Microsoft SQL Server.
  • Experienced in creating Product Documentation & Presentations.
  • Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
  • Extensive experience working with CVS, ClearCase and SVN for source control.
  • Ability to perform at a high level, meet deadlines and adapt to ever-changing priorities.
  • Understanding of Data warehouse and ETL tools.
  • Highly proficient in Object Oriented Programming concepts.
  • Ability to blend technical expertise with strong conceptual, business and analytical skills to provide quality solutions, along with result-oriented problem-solving techniques and leadership skills.


Hadoop/Big Data Technologies: Hadoop 2.x, Confidential, MapReduce, HBase 0.94.8, Pig 0.14.0, Hive 1.1.0, Sqoop 1.4.6, Flume 1.5.2, Cloudera CDH 4, Oozie, Avro, YARN, Storm 0.9.1 and ZooKeeper 3.5.0.

Programming Languages: Java, C, C++, MATLAB, SQL and PL/SQL.

Scripting/Web Technologies: JavaScript, HTML, XML, Shell Scripting, J2EE, JDBC, JSP, CSS, JSON.

Databases: Oracle 9i/10g/11g, MySQL and NoSQL

Operating Systems: Linux, UNIX and Windows.

Java IDE: Eclipse and NetBeans.

Visualization Tools: Crystal Reports, Tableau 8.1/8.0/7.0


Confidential, Chicago, IL

Senior Hadoop Developer


  • Developed and maintained the ingestion process that loads data from SQL Server into Hadoop using shell scripts.
  • Involved in copying data from old cluster to new development cluster.
  • Automated the Hadoop ETL process using Oozie workflows.
  • Responsible for setting up Oozie workflows in the new development cluster.
  • Designed and implemented an efficient way to run workflows concurrently in Oozie, reducing ETL processing time and helping the project meet its deliverables on time.
  • Re-designed the code to run in MapReduce using Hadoop streaming with Python.
  • Exported Hadoop ETL data to relational databases using Sqoop as input data for Tableau dashboards.
  • Created partitioned external Hive tables so that Tableau could access the data easily through the partition columns.
  • Used different compression techniques, such as zip and tar, to save storage and optimize data transfer over the network.
  • Developed custom aggregate functions and business-required column values using Python.
  • Used Pig on larger data sets in the deduplication process and stored the data in Confidential.
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse Notes data, stored it in Confidential and then exported it to an Oracle database using Sqoop.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Implemented a script to transmit information from Oracle to Confidential using Sqoop.
  • Tuned the performance of Hive and Pig queries.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Installed the Oozie workflow engine to run multiple Hive jobs, Hadoop MapReduce jobs with Python streaming, and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Assisted the QA team in writing Hive queries to test data accuracy and in designing test cases.
  • Involved in debugging major defects for quicker resolution.
  • Automated the onboarding of new hospitals' data into Hadoop using Oozie workflows.
  • Responsible for delivering the weekly and monthly processed data to the QA team.

Environment: Hadoop, Confidential, Pig, Sqoop, Spark, MapReduce, MapR, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux, Oracle.

Confidential, Chicago, IL

Hadoop Developer


  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to Confidential using shell scripting.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning and slot configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used different compression techniques, such as LZO and Snappy, to save storage and optimize data transfer over the network.
  • Analyzed large and critical datasets using Cloudera, Confidential, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Used Pig to store the data into HBase.
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in Confidential for further analysis.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Tuned the performance of Pig queries.
  • Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
  • Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Hadoop, Confidential, Pig, Sqoop, Oozie, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.

Confidential, Rochester, MN

Hadoop Developer


  • Worked on writing transformer/mapping Map-Reduce pipelines using Java.
  • Handled structured and unstructured data and applied ETL processes.
  • Collected log data from web servers and integrated it into Confidential using Flume.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
  • Designed and implemented Incremental Imports into Hive tables.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Extensively used Pig for data cleansing.
  • Involved in collecting, aggregating and moving data from servers to Confidential using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Worked extensively with Sqoop for importing and exporting data between Confidential and relational database systems, and for loading data into Confidential.
  • Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts that perform transformations, event joins and some pre-aggregations before storing the data in Confidential.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with the Avro data serialization system to handle JSON data formats.
  • Worked with different file formats such as sequence files, XML files and map files using MapReduce programs.
  • Developed scripts to automate end-to-end data management and synchronization between all the clusters.
  • Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.

Environment: Hadoop, Big Data, Confidential, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop distribution, PL/SQL, Windows and UNIX Shell Scripting.

Confidential, St. Louis, MO

Hadoop Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed Pig scripts for analyzing large data sets in Confidential.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into Confidential using Flume.
  • Designed and presented a plan for Confidential on Impala.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily jobs that automate parallel tasks of loading data into Confidential using AutoSys and Oozie coordinator jobs.
  • Streamed data into Apache Ignite by setting up caches for efficient data analysis.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Kafka to load data into Confidential and move data into NoSQL databases (Cassandra).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking MapReduce jobs using JobTracker.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
  • Responsible for cleansing the data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter by Expression and Rollup.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Wrote Hive generic UDFs to implement business logic.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, MapReduce, Confidential, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, ZooKeeper, AutoSys, HBase, Cassandra, Apache Ignite.


Java/Hadoop Developer


  • Involved in analysis, design, implementation and bug-fixing activities.
  • Involved in reviewing functional and technical specification documents.
  • Created and configured domains in production, development and testing environments using configuration wizard.
  • Involved in creating and configuring the clusters in production environment and deploying the applications on clusters.
  • Deployed and tested the application using Tomcat web server.
  • Analyzed the specifications provided by clients.
  • Involved in the design of the application.
  • Ability to understand functional requirements and design documents.
  • Developed use case diagrams, class diagrams, sequence diagrams and data flow diagrams.
  • Coordinated with other functional consultants.
  • Performed web development with JSP, AJAX, HTML, XML, XSLT and CSS.
  • Created and enhanced stored procedures in PL/SQL and SQL for Oracle 9i RDBMS.
  • Designed and implemented a generic parser framework using a SAX parser to parse XML documents that store SQL.
  • Identified the data to be pulled into Hadoop and created the required Sqoop scripts, scheduled periodically, to migrate data to the Hadoop environment.
  • Provided ongoing maintenance and support, working with the client to solve their problems, including major bug fixes.

Environment: Java 1.4, WebLogic Server 9.0, Oracle 10g, Web Services Monitoring, Web Drive, UNIX/Linux, Hadoop, Hive, JavaScript, HTML, CSS, XML.


Implementation Engineer


  • Responsible for gathering the requirements of customizations requested by the clients.
  • Developed PL/SQL stored procedures, Functions, database triggers and created packages to access the database from front end screens.
  • Developed client-specific customization reports using Oracle SQL and PL/SQL code for Crystal Reports.
  • Designed and developed business-required reports using Crystal Reports.
  • Automated product reports for specified authorized users.
  • Documented reports and trained users on report parameters and inputs when scheduling reports.
  • Tuned the performance of reports and workflow procedures.
  • Created indexes on large data tables to improve performance when generating reports or moving data to a different database.
  • Created Oracle external tables, views, triggers, procedures and directories as required.
  • Tested and integrated the provisioning adapters with clients' applications.
  • Scheduled monthly, weekly, daily and hourly reports after month-end bill generation.
  • Scheduled business process actions such as disconnections, reconnections and renewals, as requested by clients, based on customer status.
  • Knowledge of designing and scheduling Broadcaster reports for audit purposes.
  • Configured monthly Statement of Account reports to be sent to customers' personal email through the application.
  • Loaded data into the relocation database using the Toad import utility and transferred data using the Export/Import utility.
  • Helped transfer data during migration of the application to the latest version.
  • Helped users correct data through a ticket tracking system, documenting the reason for each error and the correction process.
  • Responsible for sending periodic reports to clients on the status of reported issues.
  • Responsible for assisting various clients in installing patch releases and product upgrades.
  • Identified alert mechanisms for service failures.
  • Coordinated with different teams during bill generation, which usually happens at every month end.
  • Responsible for coordinating with various clients in analyzing and addressing the problems reported by them as per service level agreement.
  • Responsible for providing assistance in configuring complex business rules.

Environment: Oracle 10g, Crystal Reports, Windows Server 2006, Toad, SQL Developer, SQL*Loader, .NET application monitoring, UNIX/Linux.
