Hadoop Senior Consultant Resume

Ashburn, VA


  • Over 9 years of IT experience in various domains with Hadoop ecosystems, Core Java, and SQL and PL/SQL technologies, with hands-on project experience in various verticals, including financial services and trade compliance.
  • Extensive hands-on experience in Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
  • Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables, and optimizing broadcasts.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
  • Experience in exposing Apache Spark as web services.
  • Good understanding of the driver, executors, and the Spark web UI.
  • Experience in submitting Apache Spark and MapReduce jobs to YARN.
  • Experience in real-time processing using Apache Spark with Flume and Kafka.
  • Migrated Python machine learning modules to scalable, high-performance, fault-tolerant distributed systems such as Apache Spark.
  • Strong experience in Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning. Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
  • Good expertise in coding in Python, Scala and Java.
  • Good understanding of the MapReduce framework architectures (MRv1 and YARN).
  • Good knowledge and understanding of Hadoop architecture and various components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Sqoop, and Hive.
  • Developed various MapReduce applications to perform ETL workloads on metadata and terabytes of data.
  • Hands-on experience in cleansing semi-structured and unstructured data using Pig Latin scripts.
  • Good working knowledge of creating Hive tables and using HQL for data analysis to meet the business requirements.
  • Experience in managing and reviewing Hadoop log files.
  • Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems/mainframe and vice versa.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Experience in scheduling time-driven and data-driven Oozie workflows.
  • Used ZooKeeper on a distributed HBase for cluster configuration and management.
  • Worked with the Avro data serialization system.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Experience in writing shell scripts to dump the shared data from landing zones to HDFS.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Expertise in client-side design and validation using HTML and JavaScript.
  • Excellent communication and interpersonal skills; detail-oriented, analytical, time-bound, and responsible team player with the ability to coordinate in a team environment, a high degree of self-motivation, and a quick learner.


Confidential Frameworks: Hadoop, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB, Spark, Scala.

Confidential distribution: Cloudera, Amazon EMR

Programming languages: Oracle PL/SQL, Core Java, Scala, Python, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle 10g, MySQL, Netezza, SQL Server, Teradata, Postgres

Designing Tools: Eclipse, PL/SQL Developer, Toad, PuTTY

Development methodologies: Agile, Waterfall

Messaging Services: ActiveMQ, Kafka, JMS

Version Tools: PVCS, SVN, CVS, Git

Analytics: Tableau, SPSS, SAS EM and SAS JMP


Confidential, Ashburn, VA

Hadoop Senior Consultant


  • Write Spark jobs to read data into a DataFrame and apply various transformations and actions to filter and transform data into the required format.
  • Build a scalable framework using Spark’s advanced framework.
  • Write Spark jobs to write final data into HDFS and RDBMS.
  • Manage and monitor Hadoop cluster.
  • Manage ETL team and motivate/educate them to learn and work efficiently
  • Converted Hive queries into Spark SQL to improve performance.
  • Executed Spark RDD transformations and actions as per business analysis needs
  • Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files
  • Fully automated job scheduling, monitoring, and cluster management without human intervention.
  • Used Sqoop to import and export data among HDFS, MySQL database and Hive.

Environment: Linux, Hadoop, Spark Core, Spark SQL, Scala, Hive


CITI Bank, USA, Jul 2015 to Jun 2017

Hadoop Consultant


  • Load and transform large sets of structured, semi-structured, and unstructured data coming from different source systems and a variety of portfolios
  • Used Spark DataFrames to read text data, CSV data, and image data from HDFS, S3, and Hive.
  • Worked closely with data scientists to build predictive models using Spark.
  • Cleaned input text data using the Spark machine learning feature extraction API.
  • Migrated Hive queries into Spark SQL to improve performance.
  • Involved in migrating code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Trained the model using historical data stored in HDFS and Amazon S3.
  • Used Spark Streaming to load the trained model and predict on real-time data from Kafka.
  • Executed Spark RDD transformations and actions as per business analysis needs
  • Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files
  • Fully automated job scheduling, monitoring, and cluster management without human intervention.
  • Created Hive tables and was involved in metadata loading and writing Hive UDFs
  • Used Sqoop to import and export data among HDFS, MySQL database and Hive.
  • Migrated Python scikit-learn machine learning code to DataFrame-based Spark machine learning algorithms.

Environment: Spark Core, Spark SQL, Spark Streaming, Spark machine learning, Scala, DataFrames, Datasets, AWS, Kafka, Hive, Sqoop, HBase, GitHub, Webflow, Amazon S3, Amazon EMR.


Hadoop Associate Consultant


  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Imported data from our relational data stores to Hadoop using Sqoop
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Worked on the conversion of existing MapReduce batch applications for better performance.
  • Confidential analysis using Pig and user-defined functions (UDFs).
  • Worked on loading tables to Impala for faster retrieval using different file formats.
  • The system was initially developed using Java. The Java filtering program was restructured to have the business rule engine in a JAR that can be called from both Java and Hadoop.
  • Created Reports and Dashboards using structured and unstructured data.
  • Upgraded the operating system and/or Hadoop distribution using Puppet as new versions were released.
  • Performed joins, group-by, and other operations in MapReduce using Java and Pig.
  • Processed the output from Pig and Hive and formatted it before sending it to the Hadoop output file.
  • Used Hive definitions to map the output file to tables.
  • Set up and benchmarked Hadoop/HBase clusters for internal use
  • Wrote data ingesters and MapReduce programs
  • Reviewed the HDFS usage and system design for future scalability and fault tolerance
  • Wrote MapReduce/HBase jobs
  • Worked with HBase, a NoSQL database.

Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL.


Hadoop Associate Consultant


  • Developed Confidential Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Installed and configured Apache Hadoop, Hive, and HBase.
  • Worked on a Hortonworks cluster, which was used to process the Confidential.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Used Sqoop to pull data into the Hadoop Distributed File System from RDBMS and vice versa
  • Defined workflows using Oozie.
  • Used Hive to create partitions on Hive tables and analyze this data to compute various metrics for reporting.
  • Created Data model for Hive tables
  • Good Experience in managing and reviewing Hadoop log files
  • Used Pig as ETL tool to do transformations, joins and pre-aggregations before loading data onto HDFS.
  • Worked on large sets of structured, semi-structured, and unstructured data
  • Responsible for managing data coming from different sources
  • Installed and configured Hive and developed Hive UDFs to extend the core functionality of Hive
  • Responsible for loading data from UNIX file systems to HDFS.

Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, HBase, Pig, Oozie, Linux, Java 7, Eclipse.


Associate Consultant


  • Full life cycle experience including requirements analysis, high level design, detailed design, data model design, coding, testing and creation of functional and technical design documentation.
  • Extensively involved in writing stored procedures, functions, and packages as per the business requirements.
  • Redesigned existing procedures and packages to enhance performance.
  • Debugged Pro*C and PL/SQL code blocks of stored procedures.
  • Generated ad hoc reports using SQL and stored procedures.
  • Involved in continuous enhancements and fixing of production problems.
  • Analyzed CRs raised from UAT and Production.
  • Coordinated with the UAT and Production teams as well as with the users.
  • Used Bulk Collections for better performance and easy retrieval of data, by reducing context switching between SQL and PL/SQL engines.
  • Wrote SQL, PL/SQL, and SQL*Plus programs required to retrieve data using cursors and exception handling
  • Involved in SIT and UAT Support for solving critical issues.
  • Involved in the requirements, design, coding, and testing phases of the functionality.
  • Creating and Maintaining Database objects.
  • End to end functional testing for entire application

Environment: Oracle 11g, SQL, PL/SQL, Pro*C, PuTTY, Sun Solaris.


Software Engineer


  • Involved in writing stored procedures, functions, and packages as per the business requirements.
  • Developed Pro*C programs for flat file generation.
  • Worked on Request for Changes (RFC) and Production Problem Resolutions (PPR).
  • Provided support across the various phases of the project.
  • Prepared and executed unit test cases.

Environment: Oracle 11g, SQL, PL/SQL, Pro*C, PuTTY, Sun Solaris
