Hadoop Developer Resume

Albany, NY


  • 8+ years of professional experience, including about 3 years as a Java Developer and 5+ years in Big Data analytics as a Hadoop Developer.
  • Experience in all phases of the data warehouse life cycle, including requirements analysis, design, coding, testing, and deployment.
  • Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters and the Cloudera Hadoop Distribution.
  • Worked closely with QA and Operations teams to understand end-to-end data flow requirements and to design and develop the corresponding data flows.
  • Experienced in migrating HiveQL queries to Impala to minimize query response time.
  • Implemented and scheduled workflows with the Apache Oozie framework to automate tasks.
  • Experience in managing the Hadoop infrastructure with Cloudera Manager.
  • Practical knowledge of the functions of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
  • Experience in analyzing and managing Hadoop log files.
  • Understanding of the multiple data processing engines that run on Hadoop YARN, such as interactive SQL, real-time streaming, data science, and batch processing, all operating on data stored in a single platform.
  • Experience in adding and removing nodes in a Hadoop cluster.
  • Experience in importing data from RDBMS into HDFS using Sqoop.
  • Experience in collecting logs from log collectors into HDFS using Flume.
  • Good understanding of NoSQL databases such as HBase.
  • Experience in analyzing data in HDFS using MapReduce, Hive, and Pig.
  • Designed, implemented, and reviewed features and enhancements to Cassandra.
  • Experience with UNIX commands and shell scripting.
  • Experience in Data Analysis, Data Cleansing (Scrubbing), Data Validation and Verification, Data Conversion, Data Migrations and Data Mining.
  • Excellent interpersonal, communication, documentation, and presentation skills.


Hadoop/Big Data: MapReduce, HDFS, Hive 2.3, Pig 0.17, HBase 1.2, ZooKeeper 3.4, Sqoop 1.4, Oozie, Flume 1.8, Scala 2.12, Kafka 1.0, Storm, MongoDB 3.6, Hadoop 3.0, Spark, Cassandra 3.11, Impala 2.1

Database: Oracle 12c, MySQL, MS SQL Server, Teradata 15

Web Tools: HTML 5.1, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Servlets, Java, Struts, Spring, and Avro.

Cloud Technology: Amazon Web Services (AWS), EC2, S3, Elasticsearch, Microsoft Azure.

Languages: Java/J2EE, SQL, Shell Scripting, C/C++, Python

Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery

IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence

Version Control: Git, SVN, CVS

Operating System: Windows, Unix, Linux.

Tools: Eclipse, Maven, Ant, JUnit, Jenkins, SoapUI, Log4j

Scripting Languages: JavaScript, jQuery, AJAX, CSS, XML, DOM, SOAP, REST


Hadoop Developer

Confidential, Albany, NY


  • Analyzed business requirements and helped prepare design documents according to client requirements.
  • Analyzed Teradata procedures to document each of the individual queries they contain.
  • Developed Hive queries according to business requirements.
  • Developed Hive UDFs for functionality not covered by Hive's built-in functions.
  • Developed a UDF to convert data from Hive tables to JSON format per client requirements.
  • Implemented dynamic partitioning and bucketing in Hive as part of performance tuning.
  • Implemented workflow and coordinator files using the Oozie framework to automate tasks.
  • Involved in unit, integration, and system testing.
  • Prepared unit test case documents and flow diagrams for all scripts used in the project.
  • Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Transformed unstructured data into structured data using Pig.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Designed and developed Pig Latin scripts to process data in batches for trend analysis.
  • Worked extensively with Hadoop tools such as MapReduce, Hive, and HBase.
  • Worked with both external and managed Hive tables to optimize performance.
  • Developed Hive scripts to meet analysts' requirements.
  • Maintained data import scripts built on Hive and MapReduce jobs.
  • Performed data design and analysis to handle large volumes of data.
  • Cross-checked data loaded into Hive tables against the source data in Oracle.
  • Worked closely with QA and Operations teams to understand end-to-end data flow requirements and to design and develop the corresponding data flows.
  • Developed structured, efficient, and error-free code for Big Data requirements using Hadoop and its ecosystem.
  • Stored, processed, and analyzed large datasets to derive valuable insights.

Environment: HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, PL/SQL, and UNIX Shell Scripting.

Hadoop Engineer

Confidential, St. Louis, MO


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • Responsible for architecting Hadoop clusters on Hortonworks Data Platform (HDP) 1.3.2 and Cloudera CDH4.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files on Hortonworks, MapR, and Cloudera clusters.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Loaded data from various data sources into HDFS using Kafka.
  • Designed and presented a plan for a proof of concept (POC) on Impala.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Used the Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Worked with SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning to improve Hive performance and storage.
  • Performed extensive data mining using Hive.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases for comparison with historical data.
  • Used Storm to process large volumes of data.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and pre-aggregations.
  • Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
  • Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Followed a story-driven Agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, SVN, Autosys, HBase.
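The map-side joins listed in this role avoid a shuffle by loading the smaller table into memory in every mapper and streaming the larger table past it. The sketch below simulates that idea in plain Python; the sample tables and the `build_lookup` and `map_side_join` helpers are hypothetical, and the code illustrates the general in-memory hash-join technique rather than Hive's actual MAPJOIN implementation.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def build_lookup(small_rows: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    """Load the small (dimension) table into an in-memory hash table keyed on the join column."""
    lookup: Dict[object, List[Row]] = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    return lookup


def map_side_join(large_rows: Iterable[Row], small_rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Inner-join each streamed row of the large table against the in-memory small table.

    In a real cluster every mapper would hold its own copy of the lookup table,
    so no shuffle or reduce phase is needed for the join.
    """
    lookup = build_lookup(small_rows, key)
    for row in large_rows:
        for match in lookup.get(row[key], []):
            # Merge the two rows; on duplicate column names the large-table value wins.
            yield {**match, **row}


if __name__ == "__main__":
    customers = [{"cust_id": 1, "name": "A"}, {"cust_id": 2, "name": "B"}]
    orders = [
        {"order_id": 10, "cust_id": 1, "amount": 5},
        {"order_id": 11, "cust_id": 3, "amount": 7},  # no matching customer: dropped (inner join)
        {"order_id": 12, "cust_id": 2, "amount": 9},
    ]
    for joined in map_side_join(orders, customers, "cust_id"):
        print(joined)
```

The approach only pays off when the small table fits comfortably in each mapper's memory; otherwise a regular reduce-side join is the safer choice.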

Big Data Engineer

Confidential, Columbus, OH


  • Used Sqoop and Java APIs to import data into Cassandra from different relational databases.
  • Created tables in Cassandra and loaded large sets of structured, semi-structured, and unstructured data from various data sources.
  • Developed MapReduce jobs in Java for cleaning and preprocessing data.
  • Wrote Python scripts for wrappers and utility automation.
  • Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
  • Configured Hive, Pig, Impala, Sqoop, Flume, and Oozie on Cloudera.
  • Automated data movement between different Hadoop systems using Apache NiFi.
  • Wrote MapReduce programs in Python using the Hadoop Streaming API.
  • Created Hive tables, loaded them with data, and wrote Hive queries.
  • Migrated ETL processes from SQL Server to Hadoop, using Pig for data manipulation.
  • Developed Spark jobs in Scala in a test environment and used Spark SQL for querying.
  • Imported data from Oracle tables into HDFS and HBase tables using Sqoop.
  • Wrote scripts to load data into Spark RDDs and perform in-memory computations.
  • Wrote a Spark Streaming job that consumes topics from Kafka, a distributed messaging system, and periodically pushes batches of data to Spark for real-time processing.
  • Experience with search technologies, including Elasticsearch and creating custom Solr query components.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Worked with data sources such as Oracle, Netezza, MySQL, and flat files.
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Worked with Flume to load the log data from different sources into HDFS.
  • Developed Talend jobs to move inbound files to HDFS file location based on monthly, weekly, daily, and hourly partitioning.

Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Hue, Oozie, Java, Eclipse, ZooKeeper, Cassandra, HBase, Talend, GitHub.
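Hadoop Streaming, mentioned in this role, runs any executable that reads records from standard input and writes tab-separated key/value lines to standard output as the mapper or reducer; the framework sorts mapper output by key before the reducer reads it. A minimal word-count sketch in Python, assuming plain-text input; word count is a stand-in example, not the project's actual job, and the script name `wc.py` is hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, the key/value format Hadoop Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word.

    Hadoop sorts mapper output by key, so all lines for the same word arrive
    next to each other and can be grouped with itertools.groupby.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Usage: `python wc.py map` or `python wc.py reduce`; reads stdin, writes stdout.
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        step = mapper if sys.argv[1] == "map" else reducer
        for out in step(sys.stdin):
            print(out)
```

Locally the job can be approximated with `cat input.txt | python wc.py map | sort | python wc.py reduce`; on a cluster the same two commands would be passed to the streaming JAR through its `-mapper` and `-reducer` options, with the JAR path depending on the installation.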

Hadoop Developer



  • Wrote transformer/mapping MapReduce pipelines in Java.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run as MapReduce jobs in the backend.
  • Involved in loading data into HBase using the HBase shell, HBase Client API, Pig, and Sqoop.
  • Designed and implemented incremental imports into Hive tables.
  • Deployed an Apache Solr search engine server to help speed up searches of government cultural assets.
  • Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Migrated ETL jobs to Pig scripts to perform transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Implemented the workflows using the Apache Oozie framework to automate tasks.
  • Used the Avro data serialization system to work with JSON data formats.
  • Worked with file formats such as SequenceFiles, XML files, and MapFiles in MapReduce programs.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop Distribution, PL/SQL, Windows NT, UNIX Shell Scripting, and PuTTY.

J2EE Developer



  • Responsible for system design, architecture, implementation, and integration with technologies such as Spring Integration, Web Services, Oracle Advanced Queues, and WebSphere MQ (WMQ).
  • Implemented upgrades to Spring 3.0.5 and Spring Integration 2.0.5.
  • Used OSGi container framework to install bundles (modules) developed using Spring and Spring Integration.
  • Worked on UI development using JSP on Struts and Spring MVC Frameworks.
  • Developed Data Access Objects (DAOs) and Data Objects (DOs) using Hibernate as the ORM to interact with an Oracle database.
  • Developed modules that integrate with web services that provide global information.
  • Used Log4j to log the running application, tracing errors and certain automated routine functions.
  • Worked as a Web Dynpro Java developer, developing custom applications and creating portal screens.

Java Developer



  • Analyzed, designed, and developed a J2EE-based application using Struts and Hibernate.
  • Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
  • Implemented point-to-point JMS queues and message-driven beans (MDBs) to fetch diagnostic details across various interfaces.
  • Worked with WebSphere business integration technologies such as WebSphere MQ and Message Broker 7.0 (middleware tools) on various operating systems.
  • Performed incident resolution for WebSphere Application Server, WebSphere MQ, IBM Message Broker, Process Server, and Portal Server.
  • Configured WebSphere resources including JDBC providers, JDBC data sources, connection pooling, and JavaMail sessions. Deployed Session and Entity EJBs in WebSphere.