Hadoop Developer Resume

New York, NY


  • Over 8 years of IT experience in software development, Big Data technologies, and analytical solutions, including 2+ years of hands-on experience in the design and development of Java and related frameworks and full-stack web development, and 2+ years of experience in design, architecture, and data modeling as a database developer.
  • Over 4 years’ experience as Hadoop Developer with good knowledge of Hadoop framework, HDFS and Parallel processing implementation, Hadoop Ecosystems, Map Reduce programming paradigm, Job Tracker, Task Tracker, Name Node, Data Node, Hive, Pig, Python, HBase, Sqoop, Hue, Oozie, Impala and Spark.
  • Built and Deployed Industrial scale Data Lake on-premise and cloud platforms.
  • Experienced in handling different file formats like Text file, Avro data files, Parquet file, Sequence files, Xml and JSON files.
  • Extensively worked on Spark Core, numeric RDDs, pair RDDs, DataFrames, and caching for developing Spark applications.
  • Expertise in deployment of Hadoop, YARN, and Spark integration with Cassandra.
  • Experienced in ETL, Data analysis and designing data warehouse strategies.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Upgraded Hadoop CDH to 5.x and Hortonworks Ambari. Installed, upgraded and maintained CDH-based software, Cloudera Clusters, Cloudera Navigator.
  • Industry experience in creating applications in Python, Java, Scala, JavaScript (AngularJS, Node.js), and SQL Server 2017.
  • Experienced in writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java. Extensively worked on MRV1 and MRV2 Hadoop architectures.
  • Experienced in analyzing large data sets by writing PySpark scripts and Hive queries, using Python packages such as xlrd, numpy, pandas, scipy, and scikit-learn, and tools such as PyCharm, Spyder, Anaconda, Jupyter, and IPython.
  • Experienced in working with structured data using HiveQL, including join operations, custom UDFs, and query optimization.
  • Experienced in working with diverse data by implementing complex MapReduce programs using design patterns.
  • Experience in importing and exporting data between HDFS and relational databases using Sqoop.
  • Experience in Apache Flume for collecting, aggregating and moving huge chunks of data from various sources such as webserver, Telnet sources etc.
  • Experienced in Oozie Workflow Engine in running workflow jobs with Hadoop MapReduce, Hive, Spark jobs.
  • Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
  • Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
  • Experience in implementing the Kerberos authentication protocol in Hadoop for data security, and in dimensional, logical, and physical data modeling.
  • Experienced with version control and dependency management systems such as Git, SVN, and Maven, and in testing MapReduce programs using MRUnit, JUnit, Ant, and Maven.
  • Experienced in working with scheduling tools such as UC4, Cisco Tidal enterprise scheduler, or Autosys.
  • Adequate knowledge and working experience in Agile & Waterfall methodologies.
  • Great team player and quick learner with effective communication, motivation, and organizational skills combined with attention to details and business improvements.


Confidential, New York, NY

Hadoop Developer


  • Research and recommend suitable technology stack for Hadoop migration considering current enterprise architecture.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming of large sets of structured, semi structured and unstructured data.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
  • Implemented a Spark application in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Wrote Scala scripts to make Spark Streaming work with Kafka as part of Spark-Kafka integration efforts.
  • Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala, and Spark connectors.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Responsible for bulk-loading data into HBase using MapReduce by directly creating HFiles and loading them.
  • Developed various custom filters and applied predefined filters on HBase data using the HBase API.
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
  • Worked on Solr configuration and customizations based on requirements.
  • Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster data processing.
  • Handled importing data from different data sources into HDFS using Sqoop and performing transformations using Hive, MapReduce and then loading data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Responsible for developing data pipeline by implementing Kafka producers and consumers and configuring brokers.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Unit tests for MapReduce Programs using MRUnit testing library.
  • Experience in managing and reviewing Hadoop Log files.
  • Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
  • Set up Spark on Amazon EMR to process large data sets stored in Amazon S3.
  • Developed Pig UDFs for manipulating data as per business requirements and developed custom Pig loaders.
  • Used Gradle for building and testing the project.
  • Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects, and identified their source.
  • Used Mingle and later moved to JIRA for task/bug tracking, and used Git for version control.

Environment: Cloudera 5.8, Hadoop 2.7.2, HDFS 2.7.2, AWS, Pig 0.16.0, Hive 2.0, Impala, Drill 1.9, Spark SQL 1.6.1, MapReduce 1.x, Flume 1.7.0, Sqoop 1.4.6, Oozie 4.1, Storm 1.0, Docker 1.12.1, Kafka 0.10, Spark 1.6.3, Scala 2.12, HBase 0.98.19, ZooKeeper 3.4.9, MySQL, Tableau, Shell Scripting, Java.

Confidential, San Jose, California

Hadoop Developer


  • Analyzed the requirements for setting up a cluster.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Involved in loading data from the Linux file system, servers, and Java web services into Kafka partitions using Kafka producers.
  • Implemented Kafka Custom encoders for custom input format to load data into Kafka Partitions.
  • Implemented Storm topologies to pre-process data before moving into HDFS system.
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS.
  • Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Migrated complex MapReduce programs into Spark RDD transformations, actions.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Experienced in developing custom input formats and data types to parse and process unstructured and semi structured input data and mapped them into key value pairs to implement business logic in MapReduce.
  • Involved in using HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Experience in implementing custom serializers, interceptors, sources, and sinks in Flume as required to ingest data from multiple sources.
  • Experience in setting up a fan-out workflow in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Converted unstructured data to structured data by writing Spark code.
  • Indexed documents using Apache Solr.
  • Set up Solr Clouds for distributing indexing and search.
  • Automated all jobs, from pulling data from sources such as MySQL and pushing the result data sets to the Hadoop Distributed File System to running MapReduce, Pig, and Hive jobs, using Kettle and Oozie (workflow management).
  • Worked on NoSQL databases like Cassandra and MongoDB for POC purposes in storing images and URIs.
  • Integrating bulk data into Cassandra file system using MapReduce programs.
  • Used Talend ETL tool to develop multiple jobs and in setting workflows.
  • Created Talend jobs to copy the files from one server to another and utilized Talend FTP components.
  • Worked on MongoDB for distributed storage and processing.
  • Designed and implemented Cassandra and associated RESTful web service.
  • Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Created partitioned tables in Hive, mentored analyst and SQA team for writing Hive Queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in building and deploying applications using Maven, integrated with the Jenkins CI/CD server.
  • Involved in agile methodologies, daily scrum meetings, and sprint planning.

Environment: Hadoop, Cloudera 5.4, HDFS, Pig 0.15, Hive 1.2.1, Flume 1.6.0, Sqoop 1.4.6, Oozie 0.4, AWS Redshift 3.3.2.9, Python 3.5.1, Spark 1.5.0, Scala 2.11, MongoDB 3.0, Cassandra 2.0.15, Solr 6.6.1, ZooKeeper 3.4.7, MySQL, Talend 6.2, Shell Scripting 7.x, Linux Red Hat, Java.

Confidential, New Brunswick, New Jersey

Hadoop Developer


  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Installed application on AWS EC2 instances and also configured the storage on S3 buckets.
  • Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized the JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed servers on Amazon Web Services (AWS) instances using Puppet and Chef configuration management.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Tuned Hive and Pig scripts to improve performance and resolve performance-related issues, drawing on a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

Confidential, New York, New York



  • Responsible for the setup, implementation, and ongoing administration of Hadoop infrastructure.
  • Analyzed technical and functional requirements documents, and designed and developed the QA test plan, test cases, and test scenarios while maintaining the E2E process flow.
  • Developed testing script for internal brokerage application that is utilized by branch and financial market representatives to recommend and manage customer portfolios; including international and capital markets.
  • Designed and Developed Smoke and Regression automation script and Automation of functional testing framework for all modules using Selenium and WebDriver.
  • Created Data Driven scripts for adding multiple customers, checking online accounts, user interfaces validations, and reports validations.
  • Performed cross verification of trade entry between mainframe system, its web application and downstream system.
  • Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven tool, Cucumber, and BDD Framework and created Selenium automation scripts in java using TestNG.
  • Performed Data-Driven testing by developing Java based library to read test data from Excel & Properties files.
  • Extensively performed DB2 database testing to validate trade entry from the mainframe to the backend system.
  • Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used for multiple trade order entry.
  • Developed internal application using Angular.js and Node.js connecting to Oracle on the backend.
  • Expertise in debugging front-end issues in web-based applications developed using HTML5, CSS3, AngularJS, Node.js, and Java.
  • Developed smoke automation test suite for regression test suite.
  • Applied various testing technique in test cases to cover all business scenario for quality coverage.
  • Interacted with development team to understand design flow, code review, discuss unit test plan.
  • Executed system, integration, and regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
  • Provided QA/UAT sign-off after closely reviewing all the test cases in Quality Center, along with receiving the Policy sign-off for the project.

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, Putty, WinSCP, FTP Server, Notepad++, C#, DB Visualizer.

Confidential, Jersey City, New Jersey

Software Test Engineer


  • Involved in preparing Test Plan and Test cases.
  • Used Java with the TestNG framework for automating scripts.
  • Developed test scripts to automate process of testing in Selenium WebDriver.
  • Implemented data-driven frameworks to create parameterized test scripts and generate XSLT reports using Selenium WebDriver and the TestNG framework.
  • Involved in writing Gherkin scenarios and generating step definitions and methods using Cucumber for different functionalities.
  • Created automated Test scripts using automated tools and ran the test scripts on various Builds and instances.
  • Implemented robust logging mechanisms within new and existing automated tests using log4j.
  • Maintained a repository for version control.
  • Performed Cross browser testing using Selenium Grid and Java on various browsers.
  • Responsible for attending daily scrums and discuss daily issues and testing activities.
  • Executed automated selenium scripts and reproduced failures manually.
  • Developed and executed test cases for various web services using SoapUI.
  • Prepared Traceability Matrix to show the test coverage requirement vs. Test scripts.
  • Participated in walkthroughs and peer reviews with team members and other project teams.
  • Involved in web service testing with SoapUI and validated various responses against annotations.
  • Performed database testing by passing SQL Queries to retrieve data.
  • Performed usability, GUI, Functionality and regression testing of the new builds.
  • Performed browser (IE, Firefox, and Chrome) and platform (Windows 7) compatibility testing.
  • Identified and isolated software defects and reported them via JIRA.
  • Attended daily scrums and reported daily activities and issues to the scrum master.
  • Performed functional and compatibility testing on different browsers like Firefox, Chrome, and IE.
  • Used Git as a version control tool to store the test scripts.
  • Responsible for tracking daily testing activities and provide daily testing updates to higher management.

Environment: Java, Cucumber, Selenium WebDriver, Data Driven, TestNG, Eclipse, Jira, SoapUI v4.5, Oracle v9i/8i, XML, SQL, Windows 7, MS Project, HTML, Firebug, FirePath, Git.