
Hadoop Developer Resume

Reston, VA


  • 8 years of experience, including 4 years in Big Data in the e-commerce and healthcare domains and 4 years of software development in ETL with Informatica.
  • Hands-on experience with Hadoop technologies such as HDFS, Hive, Sqoop, and Impala.
  • Hands-on experience writing MapReduce jobs through Hive and Pig.
  • Experience importing and exporting data between different systems and the Hadoop file system using Sqoop.
  • Used Hadoop ecosystem components for storing and processing data.
  • Experience creating databases, tables, and views in HiveQL, Impala, and Pig Latin.
  • Strong knowledge of MapReduce concepts.
  • Around 1 year of experience with Spark and Scala.
  • Hands-on experience working with ecosystem components such as Hive, Pig, and MapReduce.
  • Strong knowledge of Hadoop, Hive, and Hive analytic functions.
  • Efficient in building MapReduce programs using Hive and Pig.
  • Involved in migrating data from different databases (SQL Server 2008 R2, Oracle, and MySQL) to the Hadoop stack.
  • Loaded files from MySQL into Hive and HDFS.
  • Loaded the dataset into Hive for ETL operations.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Strong written, oral, interpersonal, and presentation communication skills.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
  • Extensive experience with different SDLC methodologies such as Waterfall and Agile.
  • Ability to identify and resolve problems quickly and independently.
  • Moved data between HDFS and RDBMS in both directions using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Commissioned new nodes into, and decommissioned nodes from, the existing cluster.
  • Analyzed and transformed data with Hive and Pig.


Skills: Apache Hadoop, HDFS (Hadoop Distributed File System), Oracle, SQL, Data Warehouse, Informatica, UNIX

Big Data Hadoop Skills: HDFS, YARN, Sqoop, Flume, Pig, and Hive

Spark: Spark Core, Spark Streaming, Spark SQL

NoSQL: HBase

Programming Language: Java

Analytics Tools: Informatica, RDBMS, Oracle


Hadoop Developer

Confidential - Reston, VA


  • Analyzed large datasets to provide strategic direction to the company.
  • Involved in system and business analysis.
  • Developed SQL statements to improve back-end communications.
  • Loaded unstructured data into the Hadoop Distributed File System (HDFS).
  • Created ETL jobs to load Twitter JSON data and server data into MongoDB and moved the MongoDB data into the data warehouse.
  • Created reports and dashboards using structured and unstructured data.
  • Imported data from MySQL into HDFS using Sqoop.
  • Wrote Hive queries to load and process data in HDFS.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Impala for data retrieval.
  • Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
  • Performed sentiment analysis on product reviews from the client's website.
  • Exported the resulting sentiment analysis data to Tableau to create dashboards.

Environment: Cloudera, CDH4.3, Hadoop, MapReduce, HDFS, Hive, MongoDB, Sqoop, MySQL, SQL, Impala, Tableau.

Hadoop Developer

Confidential - Boston, MA


  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive, and Pig.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and a first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Gained a clear understanding of the client's business requirements for the risk rating and report modules.
  • Worked on setting up 2-node and 5-node clusters with the CDH3 distribution.
  • Involved in data prediction analysis using the k-means algorithm.
  • Coordinated discussions with the customer and functional team as required to gather inputs.
  • Worked closely with technology counterparts to communicate the business requirements.
  • Performed application design and database design.
  • Prepared technical design documents.

Environment: Java, Machine Learning, Cloudera, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, Scala, Git.

Hadoop Developer

Confidential - San Francisco, CA


  • Led the Big Data Analytics solution project to load data from the source systems into the client's Modern Analytics Platform.
  • Analyzed and ingested policy, claims, billing, and agency data into the client's solution in multiple stages.
  • Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from different sources in multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Assisted with data capacity planning and node forecasting.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop, and automated the Sqoop jobs by scheduling them in Oozie.
  • Created Hive scripts to load data from one stage into another and implemented incremental loads based on the changed-data architecture.
  • Created Hive tables as internal or external tables, as each requirement dictated, with appropriate static and dynamic partitions and bucketing for efficiency.
  • Performed data analysis and ran Hive and Pig queries on Ambari (Hortonworks).
  • Improved Hive performance by applying optimization and compression techniques.
  • Implemented Hive partitioning and bucketing to improve query performance in the staging layer, which is a denormalized form of the analytics model.
  • Implemented techniques for efficient Hive query execution, such as map joins, compressed map/reduce output, and parallel query execution.
  • Issued SQL queries via Impala to process the data stored in HDFS and HBase.
  • Planned and reviewed the deliverables, and assisted the team in their development and deployment activities.
  • Involved in cluster setup meetings with the administration team.

Environment: Apache Hadoop 2.2.0, Hortonworks, MapReduce, Hive, HBase, HDFS, Pig, Sqoop, Flume, Impala, Spark, Oozie, Kafka, MongoDB, UNIX, Shell Scripting, XML, JSON.

Jr. Hadoop Developer

Confidential - Memphis, TN


  • Imported and exported data to and from HDFS using the Sqoop import and export command-line utilities.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Developed Hive UDFs for required functionality.
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Managed work on the Solr search engine, including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Pig to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Extended Hive functionality by writing custom UDFs.
  • Managed and reviewed Hadoop log files.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Exported processed data from Hadoop to relational databases and external file systems using Sqoop.
  • Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Loaded cache data into HBase using Sqoop.
  • Developed custom Talend jobs to ingest, enrich, and distribute data in the MapR and Cloudera Hadoop ecosystems.
  • Created many external Hive tables pointing to HBase tables.
  • Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
  • Worked with cache data stored in Cassandra.
  • Ingested data from external and internal flow organizations.
  • Used the external tables in Impala for data analysis.
  • Supported MapReduce programs running on the cluster.
  • Participated in Apache Spark POCs for analyzing sales data based on several business factors.
  • Participated in daily scrum meetings and iterative development.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, Oozie, Apache Spark, Java, Linux, SQL Server, Zookeeper, Tableau.

Informatica Developer

Confidential - Louisville, KY


  • Developed ETL programs using Informatica to implement the business requirements.
  • Communicated with business customers to discuss the issues and requirements.
  • Created shell scripts to fine-tune the ETL flow of the Informatica workflows.
  • Used Informatica file watch events to poll the FTP sites for the external mainframe files.
  • Provided production support to resolve ongoing issues and troubleshoot problems.
  • Performed performance tuning at the functional and mapping levels, and used relational SQL wherever possible to minimize data transfer over the network.
  • Effectively used Informatica parameter files for defining mapping variables, workflow variables, FTP connections, and relational connections.
  • Involved in enhancement and maintenance activities for the data warehouse, including tuning and modifying stored procedures for code enhancements.
  • Worked effectively in a version-based Informatica environment and used deployment groups to migrate objects.
  • Used the debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
  • Used pre- and post-session assignment variables to pass variable values from one session to another.
  • Designed workflows with many sessions using decision, assignment, event-wait, and event-raise tasks, and used the Informatica scheduler to schedule jobs.
  • Reviewed and analyzed functional requirements and mapping documents, and performed problem solving and troubleshooting.
  • Performed unit testing at various levels of the ETL and actively involved in team code reviews.
  • Identified problems in existing production data and developed one-time scripts to correct them.
  • Fixed invalid mappings and troubleshot technical problems in the database.
