
Hadoop Engineer Resume


San Jose, CA

PROFESSIONAL SUMMARY:

  • Over 11 years of experience as a Technology Lead in Big Data, Google Cloud Platform, and ETL (Informatica PowerCenter).
  • Good experience in Google Cloud Platform Big Data and machine learning with TensorFlow, serverless data analysis with Google BigQuery and Cloud Dataflow, and leveraging unstructured data with Cloud Dataproc.
  • Good hands-on experience with AWS security, storage, compute, and analytics components such as S3, Glacier, RDS, EMR, and IAM.
  • Strong hands-on experience with major components of the Hadoop ecosystem, including MapReduce, Spark, Scala, Python, HDFS, Hive, Pig, Solr, ZooKeeper, Sqoop, Oozie, Cassandra (NoSQL), RapidMiner, Databricks APIs, and different file formats.
  • Extensively used the ETL tool Informatica PowerCenter to extract data from various sources and load it into data marts and the data warehouse.

TECHNICAL SKILLS:

Domain: Hadoop, Big Data, ETL, Databricks APIs, and Google APIs

Languages: Spark, Scala, MapReduce, Solr, Python, PL/I, SQL, CQL, Assembler

Integration Tools: Informatica PowerCenter, RapidMiner

Databases & Warehouses: Hive, HBase, DB2, Cassandra

Frameworks / Tools: Hadoop, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Databricks APIs, and different file formats (CSV, XML, Avro, Parquet, JSON)

Job Scheduling and Monitoring Tools: Control-M, Tidal Enterprise Scheduler

Cloud Experience: Google Cloud Platform & APIs, AWS

MAJOR PROFESSIONAL EXPERIENCE:

Hadoop Engineer

Confidential, San Jose, CA

Technologies used - Spark (Scala), Python, Spark SQL, Sqoop, Hive, HDFS, Solr, Cassandra, Tidal, different file formats, Databricks APIs, ETL (Informatica), RapidMiner, Tableau, and GCP components (BigQuery, Cloud Storage, Cloud ML, Dataproc, Datalab, IAM, Cloud SQL)

Responsibilities:

  • Built a framework that runs on Python and uses Hive queries for data-related operations in Hadoop. The whole framework runs on a UNIX server with the help of shell scripts and is designed to meet the stated requirements along with some important additional features (a simplified sketch appears after this list).
  • Sourcing and processing of data for any particular campaign can be done in one go, as the framework executes Hive queries one after another on the same data set to produce the required output.
  • Allows business rules to be applied in the form of Hive queries.
  • Automatically generates an independent log file for every campaign run and sends a count summary report to the team as an automated email.
  • The framework identifies Party ID drop-offs caused by any particular business rule and generates a log table that summarizes each step in detail, helping the team debug and analyze the whole process.
  • Translated complex functional and technical requirements into detailed designs.
  • Developed Hive and Spark jobs and deployed them into various environments.
  • Developed Sqoop jobs to import and export data.
  • Developed jobs to populate data into presentation layers such as Tableau.
  • Implemented high-speed querying processes.
  • Loaded disparate data sets into Hadoop.
  • Pre-processed data using Hive and Spark.
  • Designed, built, installed, configured, and supported Hadoop.
  • Performed analysis of vast data stores to uncover insights.
  • Maintained security and data privacy.
  • Involved in many POC efforts to help build new Hadoop clusters.
  • Tested prototypes and oversaw handover to operational teams.
  • Proposed best practices/standards.
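The campaign framework described above could be sketched roughly as follows. This is a minimal illustration only; the database and table names, log path, email addresses, and SMTP host are hypothetical placeholders, not details from the actual project.

```python
#!/usr/bin/env python
"""Minimal sketch: chain Hive queries for a campaign, write an independent
log file per run, and email a count summary. All names are placeholders."""
import logging
import smtplib
import subprocess
from datetime import datetime
from email.mime.text import MIMEText

# Business rules expressed as Hive queries, executed one after another
# on the same data set (hypothetical database/table names).
BUSINESS_RULES = [
    ("drop_missing_party_id",
     "INSERT OVERWRITE TABLE campaign.stage1 "
     "SELECT * FROM campaign.raw WHERE party_id IS NOT NULL"),
    ("apply_region_filter",
     "INSERT OVERWRITE TABLE campaign.stage2 "
     "SELECT * FROM campaign.stage1 WHERE region = 'NA'"),
]

def run_hive(query):
    """Run a single Hive query through the CLI and return its stdout."""
    result = subprocess.run(["hive", "-e", query],
                            capture_output=True, text=True, check=True)
    return result.stdout

def main(campaign_name):
    # Independent log file for every campaign run.
    log_file = "/var/log/campaigns/{}_{:%Y%m%d_%H%M%S}.log".format(
        campaign_name, datetime.now())
    logging.basicConfig(filename=log_file, level=logging.INFO)

    for rule_name, query in BUSINESS_RULES:
        logging.info("Applying business rule: %s", rule_name)
        run_hive(query)

    # Count summary per stage, sent to the team as an automated email.
    summary = []
    for table in ("campaign.raw", "campaign.stage1", "campaign.stage2"):
        count = run_hive("SELECT COUNT(*) FROM {}".format(table)).strip()
        summary.append("{}: {} rows".format(table, count))

    msg = MIMEText("\n".join(summary))
    msg["Subject"] = "Count summary for campaign {}".format(campaign_name)
    msg["From"] = "etl-framework@example.com"
    msg["To"] = "campaign-team@example.com"
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    main("sample_campaign")
```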

Technology Lead

Confidential, Irvine, CA

Technologies used - Spark SQL, Sqoop, Hive, DB2, Cassandra, Control-M, different file formats, Databricks APIs, and ETL (Informatica).

Responsibilities:

  • Developed high-level and detailed specification documents for the requirements.
  • Responsible for managing on-time, quality code delivery.
  • Defined coding standards and created templates for database object development.
  • Developed database objects to ETL the MNAO data into the historical data store and the MNAO custom data mart for reporting.
  • Developed workflows using derived columns, conditional splits, aggregates, etc. to generate the underlying data for the reports.
  • Developed ETL scripts to import data from disparate RDBMSs into Hive and HDFS using different file formats.
  • Developed CQL scripts to import data from disparate RDBMSs into Cassandra tables.
  • Developed Scala programs to apply business rules, cleanse the data, transform the raw data into meaningful business information, and load it into Hive (a simplified sketch of this pattern appears after this list).
  • Implemented partitioning and bucketing and used Avro, JSON, and Parquet formats to optimize storage and performance in Hive, and developed complex Hive queries to support business analysis.
  • Performed deployments in multiple environments, testing and monitoring the jobs using Control-M. Project - Mazda Brand and Customer Experience Program.
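The transform-and-load pattern described above could look roughly like the sketch below. The original work used Scala programs and CQL scripts; this simplified illustration uses PySpark for consistency with the other examples, the paths, table names, columns, and keyspace are hypothetical, and the Cassandra write assumes the DataStax spark-cassandra-connector is available on the classpath.

```python
"""Simplified PySpark sketch: apply business rules to raw data, load a
partitioned Parquet-backed Hive table, and append to Cassandra. All names
are placeholders."""
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("business-rules-load")
         .enableHiveSupport()
         .getOrCreate())

# Raw extract previously landed in HDFS (hypothetical path and schema).
raw = spark.read.option("header", "true").csv("hdfs:///landing/sales/raw/")

# Business rules / cleansing: drop invalid records and standardize types.
cleansed = (raw
            .filter(F.col("customer_id").isNotNull())
            .withColumn("sale_amount", F.col("sale_amount").cast("double"))
            .withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd")))

# Load into a Hive table partitioned by sale_date and stored as Parquet.
(cleansed.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("sale_date")
 .saveAsTable("reporting.sales_cleansed"))

# Append the same cleansed data to a Cassandra table (connector assumed).
(cleansed.write
 .format("org.apache.spark.sql.cassandra")
 .options(table="sales_cleansed", keyspace="reporting")
 .mode("append")
 .save())
```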

Technology Lead

Confidential, Irvine, CA

Technologies used - Spark (Scala), MapReduce, Sqoop, Hive, HDFS, DB2, Control-M, different file formats, Databricks APIs, and Informatica.

Responsibilities:

  • Performed requirements gathering with business groups and translated the results into Business Requirement Documents (BRDs).
  • Developed high-level and detailed specification documents for the requirements.
  • Designed the data model and data integration specification.
  • Developed database objects such as tables, views, indexes, synonyms, sequences, and procedures.
  • Created Informatica workflows to extract, transform, and load batch data received from the host system, with detailed logging and exception handling.
  • Extracted data from flat files and other RDBMS databases into the staging area and populated the data warehouse.
  • Imported and exported data into HDFS and Hive using Sqoop; developed Sqoop jobs with incremental load to populate Hive external tables with delta data (see the sketch after this list).
  • Worked with different file formats such as text, SequenceFile, Avro, ORC, JSON, and Parquet.
  • Managed critical incidents/outages, the incident alerting process, incident resolution/recovery, RCA, and incident closure.
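An incremental Sqoop load of the kind mentioned above might be wired up roughly as follows. The connection string, credentials file, table, columns, and HDFS paths are hypothetical placeholders; in practice the last imported value would typically be tracked by a saved Sqoop job or a metadata table rather than hard-coded.

```python
"""Minimal sketch of an incremental Sqoop import feeding a Hive external
table with delta data. Connection details, table names, and paths are
hypothetical placeholders."""
import subprocess

def create_external_table():
    """One-time DDL: Hive external table over the Sqoop target directory."""
    ddl = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_delta ("
        " order_id BIGINT, customer_id BIGINT, order_date STRING, amount DOUBLE)"
        " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
        " LOCATION '/data/raw/orders'"
    )
    subprocess.run(["hive", "-e", ddl], check=True)

def sqoop_incremental_import(last_value):
    """Append only rows with ORDER_ID greater than last_value into HDFS."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:db2://db2host:50000/SALESDB",
        "--username", "etl_user",
        "--password-file", "/user/etl/.db2.password",
        "--table", "ORDERS",
        "--target-dir", "/data/raw/orders",
        "--fields-terminated-by", "\\t",
        "--incremental", "append",
        "--check-column", "ORDER_ID",
        "--last-value", str(last_value),
        "-m", "4",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    create_external_table()
    sqoop_incremental_import(last_value=120000)
```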

Technology Analyst

Confidential, Torrance, CA

Technologies used - Hive, Sqoop, Pig, MapReduce, DB2, SOAP, HDFS, Oozie and ETL (Informatica).

Responsibilities:

  • Involved in source system analysis and documented the data structures and business definitions.
  • Involved in the solution architecture and design for data loading and the migration of data to Hadoop.
  • Developed MapReduce and Pig scripts to cleanse and transform the raw data into meaningful business information and loaded it into Hive.
  • Developed job scheduling using Oozie workflows to automate the data processing.
  • Implemented partitioning and bucketing and used ORC and Parquet formats to optimize storage and performance in Hive (see the sketch after this list).
  • Developed complex Hive queries to support business analysis.
  • Solved various performance problems in Hive, Pig, and Sqoop with a deep understanding of how joins, groups, and aggregations are converted into MapReduce.
  • Performed deployments in multiple environments, testing and monitoring the jobs.
  • Supported user acceptance testing and resolved the defects/queries raised by business users.
  • Performed code build and release activities with proper version control.
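The partitioning and bucketing approach mentioned above might look roughly like the HiveQL below, driven from Python for consistency with the other sketches. The database, table, column names, and bucket count are illustrative assumptions, not details from the project.

```python
"""Sketch of a partitioned, bucketed ORC table in Hive plus a dynamic-
partition insert. Database, table, and column names are placeholders."""
import subprocess

HQL = """
CREATE TABLE IF NOT EXISTS analytics.orders_orc (
  order_id     BIGINT,
  customer_id  BIGINT,
  amount       DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Needed on Hive 1.x; bucketing is always enforced on newer releases.
SET hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE analytics.orders_orc PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date
FROM staging.orders_delta;
"""

if __name__ == "__main__":
    # The Hive CLI accepts multiple semicolon-separated statements via -e.
    subprocess.run(["hive", "-e", HQL], check=True)
```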

Technology Analyst

Confidential, Torrance, CA

Technologies used - COBOL, JCL, VSAM, DB2, IMS DB/DC, File-AID, CA-7, ChangeMan, Informatica PowerCenter

Responsibilities:

  • Responsible for managing on-time, quality code delivery; defined coding standards and created templates for database object development.
  • Developed database objects to ETL the incentive data into the historical data store and the TMS custom data mart for CSR application reporting.
  • Performed impact analysis and research on the affected systems; provided technical support to resolve abends and implemented fixes for application problems.
  • Analyzed and understood the requirements and functional specifications.
  • Prepared technical specifications (TS) based on the existing functionality and requirements.
  • Created detailed technical design specifications for enhancing the batch programs, taking care to reuse most of the existing components/modules.
  • Conducted knowledge transfer to the offshore team on business functionality and the new requirements.
  • Developed new batch programs and debugged and implemented the system using IBM debugging tools.
  • Set up the processes to be followed by the offshore team.
  • Performed peer reviews, logged issues, and tracked them to closure.
  • Prepared a turnover elements checklist to make the turnover process seamless.
  • Responsible for correct code versioning by creating and moving requests using Endevor.
  • Handled status reporting and was responsible for the final delivery.

Programmer Analyst, Sep 2008 to June 2011

Confidential, Atlanta, GA

Technologies used - COBOL, JCL, VSAM, DB2, CICS, IMS DB/DC, File-AID, CA-7, MySQL, Datacom, Toad

Responsibilities:

  • Monitored application batch processes to ensure successful completion of jobs.
  • Tested programs using tools such as InterTest and Xpediter and debugged runtime problems.
  • Prepared many automated tools to fix frequently encountered abends.
  • Created JCL, CARDLIBs, copybooks, and PROCs for new programs.
  • Performed post-release installation verification and necessary corrections.
  • Provided 24x7 application production support as the first point of contact.
  • Mentored team members on the product and processes.
  • Performed unit and integration testing.
