
Hadoop Developer Resume

Phoenix, AZ


  • 8+ years of overall experience with a strong emphasis on the design, development, implementation, testing and deployment of software applications.
  • Comprehensive IT experience in Big Data and Big Data analytics: Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and shell scripting.
  • Highly capable of processing large sets of structured, semi-structured and unstructured data and supporting Big Data applications.
  • Expertise in transferring data between the Hadoop ecosystem and structured storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
  • Experience with Apache Spark clusters and stream processing using Spark Streaming.
  • Expertise in moving large amounts of log, streaming event and transactional data using Flume.
  • Experience in developing MapReduce jobs in Java for data cleaning and pre-processing.
  • Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
  • Good knowledge of Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
  • Experience in analyzing data using Pig Latin, HiveQL and HBase.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from databases (e.g. Teradata, Oracle, MySQL) to Hadoop.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Successfully loaded files into Hive and HDFS from MongoDB and HBase.
  • Experience in configuring Hadoop clusters and HDFS.
  • Expertise in organizing data layouts using partitions and bucketing in Hive.
  • Expertise in preparing interactive data visualizations from different sources using Tableau.
  • Hands-on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and shell scripts using Oozie.
  • Experience working with the Cloudera Hue interface and Impala.
  • Expertise in developing SQL queries and stored procedures, with excellent development experience in Agile methodology.
  • Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
  • Excellent leadership, interpersonal, problem-solving and time management skills.
  • Excellent communication skills, both written (documentation) and verbal (presentation).
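The partitioning and bucketing expertise above rests on one idea: Hive routes each row to a bucket by hashing a chosen column modulo the bucket count, so equal keys always co-locate. A minimal illustrative sketch in plain Python (Hive uses its own hash function, and the key names here are hypothetical, not from any actual project):

```python
# Sketch of Hive-style bucketing: rows with the same hash(key) % NUM_BUCKETS
# land in the same bucket file, which enables bucketed map-side joins and
# efficient sampling. Illustrative only.

NUM_BUCKETS = 4

def bucket_for(key: str) -> int:
    """Assign a key to one of NUM_BUCKETS buckets by hashing."""
    # Simple deterministic hash so the example is reproducible.
    h = sum(ord(c) for c in key)
    return h % NUM_BUCKETS

rows = ["user_a", "user_b", "user_c", "user_a"]
buckets = {}
for key in rows:
    buckets.setdefault(bucket_for(key), []).append(key)

# Identical keys always fall into the same bucket.
assert bucket_for("user_a") == bucket_for("user_a")
```

In Hive itself the same layout would come from `CLUSTERED BY (user_id) INTO 4 BUCKETS` in the table DDL.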


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripting, R

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, Ant, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP


Confidential, Phoenix, AZ

Hadoop Developer


  • Involved in end-to-end data processing: ingestion, processing, quality checks and splitting.
  • Developed Spark scripts in Scala as per requirements.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Developed shell scripts to automate ETL execution in a Unix environment.
  • Supported other Talend developers, providing mentoring, technical assistance, troubleshooting and alternative development solutions.
  • Designed ETL processes and developed source-to-target mappings.
  • Developed, tested and deployed ETL jobs in Talend.
  • Performed different types of transformations and actions on RDDs to meet business requirements.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Analyzed the Hadoop cluster using different big data analytic tools, including Pig, HBase and Sqoop.
  • Good experience with AWS services, networking, storage and cloud technology.
  • Primarily responsible for designing, implementing, testing and maintaining database solutions on Azure.
  • Involved in loading data from the UNIX file system to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best-offer logic using Pig scripts and Pig UDFs.
  • Responsible for managing data coming from various sources.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Used the MapReduce programming model for batch processing of data stored in HDFS.
  • Provided cluster coordination services through Zookeeper.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Responsible for setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in managing and reviewing Hadoop log files.
  • Imported data using Sqoop from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Responsible for writing Hive queries for data analysis to meet business requirements.
  • Responsible for creating Hive tables and working on them using HiveQL.
  • Responsible for importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported unstructured data into HDFS using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract data in a timely manner.
  • Involved in using the HBase Java API in a Java application.
  • Automated all jobs for extracting data from different data sources such as MySQL and pushing the result sets to HDFS.
  • Hands-on design and development of an application using Hive UDFs.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Supported data analysts in running Pig and Hive queries.
  • Wrote HiveQL and Pig Latin scripts.
  • Imported and exported data from MySQL/Oracle to Hive using Sqoop.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.

Environment: Hadoop, MapReduce 2.7.2, Hive 2.0, Pig 0.16, Talend, Sqoop 2, Java, Oozie, HBase 0.98.19, Kafka 0.10.1.1, Spark 2.0, Scala 2.12.0, Eclipse, Linux, Oracle, Teradata.
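The Spark work above hinges on the transformation-versus-action distinction: transformations build a lazy lineage, and only an action triggers computation. A minimal sketch of that pattern using plain Python generators (no cluster required; the event format is invented for illustration, not taken from the actual pipeline):

```python
# Transformation-vs-action pattern, simulated with lazy Python generators.
# In Spark, map/filter on an RDD are lazy transformations; reduce/collect
# are actions that trigger the computation.

from functools import reduce

raw_events = ["ok:10", "ok:20", "bad:-1", "ok:5"]

# "Transformations": build a lazy pipeline; nothing executes yet.
parsed  = (line.split(":") for line in raw_events)
valid   = ((tag, int(val)) for tag, val in parsed if tag == "ok")
amounts = (val for _tag, val in valid)

# "Action": materialize the pipeline and aggregate.
total = reduce(lambda a, b: a + b, amounts, 0)
print(total)  # → 35
```

The same shape in Spark/Scala would be `rdd.map(...).filter(...).reduce(_ + _)`, with the reduce as the action.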

Confidential, San Francisco, CA

Hadoop Developer


  • Worked on the Hortonworks HDP 2.5 distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
  • Used the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrated applications to the AWS cloud.
  • Played a key role in dynamic partitioning and bucketing of data stored in Hive.
  • Wrote HiveQL queries integrating different tables and creating views to produce result sets.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Experienced in loading and transforming large sets of structured and unstructured data.
  • Used MapReduce programs for data cleaning and transformation, loading the output into Hive tables in different file formats.
  • Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
  • Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
  • Involved in loading data into the HBase NoSQL database.
  • Built, managed and scheduled Oozie workflows for end-to-end job processing.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
  • Analyzed large volumes of structured data using Spark SQL.
  • Wrote shell scripts to execute HiveQL.
  • Used Spark as an ETL tool.
  • Wrote automated shell scripts in a Linux/Unix environment using bash.
  • Migrated HiveQL queries to Spark SQL to improve performance.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into HBase.
  • Experienced in using the DataStax Spark connector to store data in and retrieve data from the Cassandra database.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into Cassandra.

Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL.
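The MapReduce jobs above that clean JSON log data follow a standard map-side pattern: parse each line, drop malformed records, and emit flat tuples suitable for loading into Hive. A small illustrative sketch (the field names "event" and "bytes" are hypothetical placeholders, not from the actual logs):

```python
# Map-side cleaning of semi-structured JSON log lines: parse each record,
# skip anything malformed, emit flat (event, bytes) tuples for Hive loading.

import json

log_lines = [
    '{"event": "click", "bytes": 120}',
    'not-json-garbage',
    '{"event": "view", "bytes": 80}',
]

def parse_line(line):
    """Return (event, bytes) for a valid record, or None for bad input."""
    try:
        rec = json.loads(line)
        return rec["event"], rec["bytes"]
    except (ValueError, KeyError):
        return None  # malformed record: dropped, as in a cleaning mapper

records = [r for r in (parse_line(l) for l in log_lines) if r is not None]
print(records)  # → [('click', 120), ('view', 80)]
```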

Confidential, Dallas, TX

Hadoop Developer


  • Developed a process for Sqooping data from multiple sources such as SQL Server, Oracle and Teradata.
  • Responsible for creating the mapping document from source fields to destination fields.
  • Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
  • Developed Oozie workflows for executing Sqoop and Hive actions.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Involved in building database models, APIs and views using Python to build an interactive web-based solution.
  • Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
  • Developed scripts to run Oozie workflows, capture the logs of all jobs run on the cluster and create a metadata table that specifies the execution time of each job.
  • Developed Hive scripts for performing transformation logic and for loading data from the staging zone to the final landing zone.
  • Developed monitoring and notification tools using Python.
  • Worked with the Parquet file format for better storage and performance of publish tables.
  • Involved in loading transactional data into HDFS using Flume for fraud analytics.
  • Developed a Python utility to validate HDFS tables against source tables.
  • Designed and developed UDFs to extend functionality in both Pig and Hive.
  • Imported and exported data using Sqoop between MySQL and HDFS on a regular basis.
  • Managed datasets using Pandas data frames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
  • Automated all jobs for pulling data from the FTP server to load into Hive tables using Oozie workflows.
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH 3, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
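A Python wrapper that hands Sqoop a specific date range, as described above, typically just assembles the import command with a date-bounded `--where` clause. A hedged sketch of the idea (the table name, JDBC URL and column names are hypothetical placeholders, not from the actual workflow):

```python
# Illustrative Python wrapper that builds a date-bounded Sqoop import
# command. Connection string, table and column names are placeholders.

from datetime import date, timedelta

def sqoop_import_cmd(table: str, start: date, end: date) -> list:
    """Build the argv for a Sqoop import bounded by [start, end)."""
    where = (f"load_date >= '{start.isoformat()}' "
             f"AND load_date < '{end.isoformat()}'")
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host/sales",  # placeholder JDBC URL
        "--table", table,
        "--where", where,
        "--target-dir", f"/data/staging/{table}/{start.isoformat()}",
    ]

# A typical daily run pulls yesterday's partition:
end = date(2017, 6, 2)
cmd = sqoop_import_cmd("orders", end - timedelta(days=1), end)
print(" ".join(cmd))
```

An Oozie coordinator would then pass the materialized dates in as workflow properties and invoke this wrapper per run.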


Java Developer


  • Responsible for understanding the scope of the project and gathering requirements.
  • Created the database, user, environment, activity and class diagrams for the project (UML).
  • Implemented the database using the Oracle database engine.
  • Created entity objects (business rules and policy, validation logic, default value logic, security).
  • Developed web applications using J2EE, JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, custom tags, EJB, Hibernate, Ant, JUnit, Apache Log4j, Web Services and Message Queue (MQ).
  • Created applications and connection pools; deployed JSPs and Servlets.
  • Used Oracle and MySQL databases for storing user information.
  • Developed the application back end using PHP for web applications.
  • Hands-on experience in all phases of the SDLC (software development life cycle).
  • Used Eclipse as the IDE; configured and deployed the application onto the WebLogic application server using Maven build scripts to automate the build and deployment process.
  • Developed UML diagrams using Rational Rose.
  • Created the UI for web applications using HTML and CSS.
  • Created desktop applications using J2EE and Swing.
  • Followed the Waterfall development model.
  • Created SQL scripts for the Oracle database.

Environment: Java, Servlets, JSF, ADF rich client UI framework, ADF-BC (BC4J) 11g, Web Services using Oracle SOA, Oracle WebLogic.
