Hadoop Developer Resume

San Jose, CA


  • Professional software developer with 4.5+ years of technical expertise in all phases of the software development life cycle (SDLC) across various industry sectors, specializing in Big Data analytics frameworks and ETL tools.
  • 3+ years of industry experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, Spark, Flume, Sqoop, Avro, AWS and ZooKeeper.
  • Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as YARN, NameNode, DataNode and HDFS.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Developed custom UDFs and UDAFs in Java to extend Hive's core functionality.
  • Created Hive tables to store structured data in HDFS and processed it using HiveQL.
  • Worked extensively on Spark with Scala on a cluster for computational analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
  • Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Extensive knowledge of AWS cloud infrastructure: RDS, Redshift, DynamoDB, EC2, EMR, Route 53, CloudWatch, Lambda and IAM.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Strong testing and debugging skills with exposure to the complete software development life cycle, from requirements gathering to product release.
  • Good Experience in creating Business Intelligence solutions and designing ETL workflows using Tableau.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Proven ability to manage all stages of project development, with strong problem-solving and analytical skills and the ability to make balanced, independent decisions.


Big Data Ecosystem: HDFS, MapReduce, Spark, YARN, Hive, Pig, Sqoop, Flume

Hadoop Distributions: Cloudera (CDH5), Hortonworks, Apache

Languages: Python, SQL and Java

NoSQL Databases: MongoDB and Amazon DynamoDB

Cloud Computing Tools: Amazon AWS

DB Languages: PL/SQL

RDBMS: Oracle 11c, MySQL, Teradata

Development Tools: Microsoft SQL Studio, Toad, Eclipse

Development methodologies: Agile/Scrum

Visualization and analytics tool: Tableau Software

Operating systems: UNIX, Red Hat Linux, macOS and Windows variants

ETL Tools: Informatica


Confidential, San Jose, CA

Hadoop Developer


  • Handled large datasets during the ingestion process using partitioning, Spark's in-memory capabilities, broadcast variables and transformations.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Created AWS Lambda functions triggered by SNS and CloudWatch events to run daily jobs on schedule.
  • Built BI reports in Tableau on top of Spark; Tableau was integrated with Spark through Spark SQL.
  • Monitored and debugged Spark jobs running on a Spark cluster using Cloudera Manager.
  • Worked with different file formats such as text file, Parquet and ORC for Hive querying and processing based on business logic.
  • Worked extensively with Hive DDL and Hive Query Language (HiveQL).

Environment: Scala, Python, Hadoop, Apache Spark, MapReduce, Amazon Web Services, CDH 5.9, Cloudera Manager, Control-M Scheduler, Shell Scripting, Agile Methodology, JIRA, Git, Tableau.


Hadoop Developer


  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Created Hive tables and wrote complex Hive queries to populate them.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Optimized HiveQL scripts by using an execution engine such as Tez.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Used Sqoop to transfer large datasets between Hadoop and RDBMS.
  • Worked with different file formats such as text files, Avro, Parquet and ORC for Hive querying and processing based on business logic.
  • Used JIRA to create user stories and created branches in Bitbucket repositories based on each story.
  • Created repositories and managed version control using Git.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Python, Hadoop, MapReduce, CDH 5.9, Cloudera Manager, Control-M Scheduler, Shell Scripting, Agile Methodology, JIRA, Git, Tableau.


Informatica ETL Developer


  • Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
  • Created Informatica mappings using transformations such as Source Qualifier, Expression, Lookup, Stored Procedure, Aggregator, Update Strategy, Joiner, Filter and Router.
  • Performed coding, testing and implementation of Informatica mappings and workflows.
  • Mentored team members on support activities.
  • Migrated ETL code from development to test to production.
  • Resolved high-severity batch and online problems in the production system.

Environment: Informatica PowerCenter 8.x/7.x, Oracle/DB2, UNIX, Control-M
