
Hadoop Developer Resume

Minneapolis, MN


  • Over 10 years of IT experience as a Hadoop, DataStage, and Mainframe developer and application engineer.
  • Extensive experience designing and testing systems under both waterfall and agile models, covering analysis, design, development, testing, documentation, deployment, integration, and maintenance of web-based and client/server applications using SQL and Big Data technologies.
  • Experience in application development using Hadoop and related Big Data technologies such as Hive, Pig, Oozie, Sqoop, and ZooKeeper.
  • In - depth Knowledge of Data Structures, Design and Analysis of Algorithms and good understanding of Data Mining and Machine Learning techniques.
  • Excellent knowledge on Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node.
  • Hadoop/Big Data experience in storage, querying, processing, and analysis of data.
  • Proficient in design and development of Map Reduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, MapReduce, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig.
  • Skilled in writing Map Reduce jobs in Pig and Hive.
  • Knowledge in managing and reviewing Hadoop Log files.
  • Expertise in wide array of tools in the Big Data Stack such as Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Yarn, Oozie, and Zookeeper.
  • Excellent programming skills with experience in SQL; working knowledge of Java and Scala.
  • In depth and extensive knowledge of analyzing data using HiveQL, Pig Latin, HBase.
  • Experience in tuning and troubleshooting performance issues in Hadoop cluster.
  • Worked on importing data into HBase using HBase Shell and HBase Client API.
  • Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Extensive experience working on various databases and database script development using SQL and PL/SQL
  • Hands on experience in application development using RDBMS and Linux Shell Scripting.
  • Experience in writing Pig and Hive scripts and extending Hive and Pig core functionality by writing custom UDFs.
  • Knowledge of writing real-time processing applications using Spark Streaming with Kafka.
  • Involved in HBase setup and in storing data into HBase for further analysis.
  • Good experience in Hive partitioning, bucketing, and performing different types of joins on Hive tables, and in implementing Hive with JSON and Avro.
  • Extensive experience in all phases of Software Development Life Cycle (SDLC) including identification of business needs and constraints, collection of requirements, design, implementation, testing, deployment and maintenance.
  • Worked on various database technologies such as Oracle, DB2-UDB, and Teradata.
  • Enhancement and production support activities with extensive working knowledge on IBM Mainframe technologies like COBOL, JCL, VSAM and DB2.
  • Worked extensively with the DataStage ETL tool.
  • Developed various Data Stage Mappings, Transformations for migration of data from various existing systems to the new systems using DataStage Designer.
  • Good exposure to Data Warehouse concepts and UNIX operating system.
  • Other technical expertise includes SQL.
  • Worked in various phases of the SDLC (Coding, Unit Testing, System Testing, Integration Testing, Stress Testing, UAT, Post-implementation support).
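The MapReduce work described above can be sketched with a minimal, Hadoop-streaming-style word count in pure Python (an illustrative example, not production code; on a real cluster the mapper and reducer would read stdin and be launched via the hadoop-streaming jar):

```python
# Minimal MapReduce-style word count: the mapper emits (word, 1)
# pairs and the reducer sums the counts per word.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce phase,
    # which is what the sorted() call simulates here.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

counts = dict(reducer(mapper(["big data", "Big Data big"])))
# counts == {"big": 3, "data": 2}
```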


Confidential, Minneapolis, MN

Hadoop developer


  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Worked on analyzing a Hadoop 2.7.2 cluster and different Big Data analytic tools, including Pig 0.16.0, Hive 2.0, the HBase 1.1.2 database, and Sqoop 1.4.6.
  • Implemented Spark 2.0 using Scala and Spark SQL for faster processing of data.
  • Involved in validating the aggregate table based on the rollup process documented in the data mapping; developed HiveQL and Spark RDD/SQL code and automated the flow using shell scripting.
  • Designed and Modified Database tables and used HBASE Queries to insert and fetch data from tables.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive tables, loaded data and wrote Hive queries that run within the map.
  • Used Oozie 1.2.1 Operational Services for batch processing and scheduling workflows dynamically, and created UDFs to store specialized data structures in HBase and Cassandra.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Hands on experience in application development using RDBMS, and Linux shell scripting.
  • Developed and updated social media analytics dashboards on regular basis.
  • Created a complete processing engine based on the Hortonworks distribution, enhanced for performance.
  • Managed and reviewed Hadoop log files.
  • Developed and generated insights based on brand conversations, which in turn helpful for effectively driving brand awareness, engagement and traffic to social media pages.
  • Involved in identifying and analyzing defects, questionable functional errors, and inconsistencies in output.
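The rollup validation mentioned above can be illustrated as recomputing the aggregate from detail rows and comparing it with the stored aggregate table (a hedged sketch; all names and figures here are invented for illustration):

```python
# Recompute a rollup from detail rows and flag stored totals
# that disagree with the recomputed ones.
from collections import defaultdict

def rollup(detail_rows):
    """Group detail rows by key and sum the amounts."""
    totals = defaultdict(float)
    for key, amount in detail_rows:
        totals[key] += amount
    return dict(totals)

def validate(aggregate_table, detail_rows):
    """Return keys whose stored total differs from the recomputed total."""
    expected = rollup(detail_rows)
    return {k for k in aggregate_table if aggregate_table.get(k) != expected.get(k)}

detail = [("MN", 10.0), ("MN", 5.0), ("TN", 7.0)]
bad = validate({"MN": 15.0, "TN": 9.0}, detail)
# bad == {"TN"} -- the stored TN total does not match the detail rows
```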

Environment: Hadoop, MapReduce, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Oracle 11g, Scala, HDFS, Eclipse.

Confidential, MN

Hadoop developer


  • Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structure and problem statement; set up the 64-node cluster and configured the entire Hadoop platform.
  • Implemented a data interface to get customer information using REST APIs, pre-processed the data using MapReduce 2.0, and stored it in HDFS (Hortonworks).
  • Extracted files from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed them in HDFS (Hortonworks distribution), and processed them.
  • Configured the Hive 1.1.1 metastore, which stores the metadata for Hive tables and partitions in a relational database.
  • Worked with various HDFS file formats like Avro 1.7.6, ORC, Parquet, SequenceFile, and JSON, and compression formats like Snappy and bzip2, for cleaning and preprocessing on Hortonworks.
  • Developed Pig 0.15.0 UDFs to pre-process the data for analysis, and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python 3.5.1 scripts.
  • Used Pig as ETL tool to do transformations, event joins, filtering and some pre-aggregations before storing the data into HDFS.
  • Troubleshooting, debugging & altering DataStage issues, while maintaining the health and performance of the ETL environment.
  • Developed Hive queries for data sampling and analysis to the analysts.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Developed custom UNIX shell scripts to do pre- and post-validations of master and slave nodes, before and after configuring the name node and data nodes respectively.
  • Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
  • Developed small distributed applications in our projects using Zookeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.
  • Developed complex DataStage flows to load the data from various sources using different components.
  • Created HBase tables from Hive and wrote HiveQL statements to access HBase table data.
  • Proficient in row-key and schema design for the NoSQL database HBase. Used Hive to perform data validation on the data ingested using Sqoop, and pushed the cleansed data set into HBase.
  • Created a MapReduce program which considers data in HBase current and prior versions to identify transactional updates. These updates are loaded into Hive external tables which are in turn referred by Hive scripts in transactional feeds generation.
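The update-detection logic described above can be sketched as comparing the current and prior versions of each row key and keeping only the keys whose value was added or changed (an illustrative simplification; the actual work used a MapReduce job over HBase versions, and all names here are hypothetical):

```python
# Compare current and prior versions of keyed records and
# keep the rows that were added or changed since the prior load.
def transactional_updates(current, prior):
    """Return {row_key: new_value} for rows added or changed since prior."""
    return {
        key: value
        for key, value in current.items()
        if prior.get(key) != value
    }

prior   = {"r1": "A", "r2": "B"}
current = {"r1": "A", "r2": "C", "r3": "D"}
updates = transactional_updates(current, prior)
# updates == {"r2": "C", "r3": "D"}
```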

Environment: Hadoop (Hortonworks), HDFS, Map Reduce, Hive, Scala, Python, Pig, Sqoop, WebSphere, Hibernate, Oozie, REST Web Services, Solaris, DB2, UNIX Shell Scripting, JDBC.

Confidential - Nashville, TN

Hadoop developer


  • Executed Hive queries that helped in analysis of market trends by comparing the new data with EDW reference tables and historical data.
  • Managed and reviewed Hadoop log files for the job tracker, NameNode, secondary NameNode, data nodes, and task trackers.
  • Tested raw market data and executed performance scripts on data to reduce the runtime.
  • Involved in loading the created Files into HBase for faster access of large sets of customer data without affecting the performance.
  • Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
  • Executed the Hive jobs to parse the logs and structure them in relational format to provide effective queries on the log data.
  • Created Hive tables (Internal/external) for loading data and have written queries that will run internally in MapReduce and queries to process the data.
  • Developed Pig Scripts for capturing data change and record processing between new data and already existed data in HDFS.
  • Involved in importing of data from different data sources, and performed various queries using Hive, MapReduce, and Pig Latin.
  • Involved in loading data from local file system to HDFS using HDFS Shell commands.
  • Experienced in UNIX shell scripts for processing and loading data from various interfaces to HDFS.
  • Developed different components of the Hadoop ecosystem process involving MapReduce and Hive.
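The log-structuring step described above (parsing raw logs into a relational layout for querying) can be illustrated as follows. This is a hedged sketch: the log format and field names are invented for illustration, and on the cluster this shaping was done with Hive rather than Python.

```python
# Parse one raw log line into relational columns; malformed
# lines return None and are dropped as a cleansing step.
import re

LOG_PATTERN = re.compile(r"^(\S+) (\S+) \[(.+?)\] (\d{3})$")

def parse_log(line):
    """Split one log line into (host, user, timestamp, status) columns."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    host, user, ts, status = match.groups()
    return {"host": host, "user": user, "ts": ts, "status": int(status)}

row = parse_log("10.0.0.1 alice [2016-03-01 10:00:00] 200")
# row["status"] == 200
```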

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Big Data, Yarn, HBase, Oozie, SQL scripting, Linux shell scripting, Mahout, Eclipse and Hortonworks.

Confidential - Greenville, SC

Hadoop developer


  • Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, Zookeeper, etc.) on Hortonworks.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Involved in installing Hadoop Ecosystem components.
  • Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
  • Loaded data from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Responsible for managing data coming from different sources.
  • Involved in gathering the requirements, designing, development and testing.
  • Worked on loading and transformation of large sets of structured, semi structured data into Hadoop system.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Worked on Hue interface for querying the data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive Scripts for implementing dynamic Partitions.
  • Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs.
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Experience in managing and reviewing Hadoop log files.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Exported analyzed data to relational databases using SQOOP for visualization to generate reports for the BI team.
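The capture-data-change processing described above (comparing a new extract against the data already in HDFS) can be sketched as classifying keyed records into inserts, updates, and deletes. This is an illustrative simplification; the actual work used Pig scripts, and the record keys and values here are invented.

```python
# Classify keyed records as inserts, updates, or deletes by
# comparing the new extract with the existing data set.
def capture_changes(existing, new):
    inserts = {k: v for k, v in new.items() if k not in existing}
    updates = {k: v for k, v in new.items()
               if k in existing and existing[k] != v}
    deletes = {k: v for k, v in existing.items() if k not in new}
    return inserts, updates, deletes

existing = {"c1": "active", "c2": "closed"}
new      = {"c1": "active", "c2": "open", "c3": "active"}
ins, upd, dels = capture_changes(existing, new)
# ins == {"c3": "active"}, upd == {"c2": "open"}, dels == {}
```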

Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, MapReduce, Apache Pig, Hive, HBase, Oozie, SQOOP and MySQL.

Confidential, Minneapolis, MN

Sr BI Developer


  • Extensive experience in system analysis, design, development, and implementation of data warehousing systems using IBM DataStage V8.5 on DB2 and Oracle.
  • Sound knowledge of shell scripting, DataStage parallel jobs using multiple stages (join, filter, funnel, lookup, sequence generator, etc.), and SQL queries.
  • Involved in gathering Business requirement and Designing Functional specifications for OLAP systems.
  • Experienced in effectively working and coordinating with multiple teams across multiple time zones.
  • Monitored critical application batch jobs through Control-M.
  • Debugged production failures, recovered the jobs, and conducted root cause analysis for permanent fixes.
  • Worked with business users to understand their stability problems and provide solutions.
  • Partnered with vendors such as NICE, Gemalto, TSYS, and Argus for new development.
  • Set up data transfer mechanisms such as SFTP and MFT.

Environment: Visual Studio 2005/2010, XML, MS SQL Server 2005, SSIS, Microsoft TFS 2005/2010, DataStage


Mainframe Developer


  • Business Requirement gathering/analyzing
  • Conduct impact analysis of interface applications
  • Involved in effort estimation
  • Design the HLDs & LLDs
  • Participate in design review meetings
  • Develop application in Mainframes using JCL, COBOL Batch and CICS, DB2
  • Perform unit testing to verify the functionalities of developed applications work as expected
  • Work with testing team for RITs, SITs & defect fixes
  • Track & fix the defects raised by testing teams
  • Work with business for UAT testing
  • Creating knowledge document & providing knowledge transition to support teams
  • Provide post implementation support for the developed applications
  • Provide solution for stability issues post implementation

Environment: O/S 390 - MVS, MS Excel, MS Visio, TSO/ISPF, JCL, COBOL, DB2, CICS, VSAM, IDMS, Xpediter, FILE-AID, change man, FTP, Connect Direct, QC, OPC, SAVERS.

Confidential, Minneapolis

Mainframe developer (Enhancement)


  • Gathered and analyzed business requirements for the client's customers.
  • Conducted impact analysis of interface applications.
  • Created low- and high-level design documents.
  • Performed the changes in the COBOL components.
  • Created test data based on the business requirements and performed unit testing of the modified application.
  • Coordinated with the onsite coordinator for User Acceptance Testing (UAT).


Languages: Scala, Python, SQL, Java

Other Languages: COBOL, EASYTRIEVE, JCL, SQL, DB2 stored procedures, HTML, XML, Shell scripting

Big Data: Spark and Scala, Hadoop Ecosystem Components - HDFS, Hive, Sqoop, Map Reduce, Pig and Hortonworks

ETL Tool: DataStage

Databases: NoSQL- HBase, SQL- DB2, MySQL, Teradata

Schedulers: Oozie, Control-M, CA7

OS: Windows 7/8/8.1/10, Unix, Linux

Other Tools: Hue, IntelliJ IDEA, Eclipse, DB Visualizer, Maven, ZooKeeper

Data Interaction tools: Hue, SQL Developer, Aqua data Studio, Teradata SQL

Transfer Method: MQ, FTP, XCOM & MFT

Version control: Git & NED
