Hadoop Engineer Resume

San Francisco, CA

SUMMARY

  • 7+ years of IT experience along with 3 years of Big Data/Hadoop experience.
  • Experienced Big Data Engineer with good knowledge of the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop & Cloudera Manager).
  • Working knowledge of the Hadoop Distributed File System and MapReduce.
  • Well versed in the different layers of the Hadoop framework: storage layer (HDFS), analysis layer (Pig and Hive), and engineering layer (jobs and workflows).
  • Proficient in analysing data using Pig Latin and Hive QL.
  • Knowledge of Spark, Scala, and in-memory computation.
  • Knowledge of Python.
  • Hands-on knowledge of job/workflow scheduling and monitoring tools like Oozie & ZooKeeper.
  • Experienced in ETL, importing and exporting data from existing databases that provide SQL interfaces using tools such as Sqoop and Informatica.
  • Experienced in working with different data sources like flat files, spreadsheet files, log files, and databases.
  • Knowledge of monitoring and managing Hadoop clusters using CDH4 Cloudera Manager.
  • Expertise in managing and reviewing Hadoop log files.
  • Good working experience with Hadoop data pipeline processes and data warehousing.
  • Background with traditional databases such as Oracle, SQL Server, MySQL and Teradata.
  • Knowledge of NoSQL databases such as HBase & Cassandra.
  • Expertise in Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
  • Working knowledge of ETL tools and processes (Informatica PowerCenter), RDBMS (Oracle 9i), SDLC, QA/UAT, and technical documentation.
  • Production support experience.
  • Involved in data collection, data analysis, data security, methodologies, and designs. Conducted research to collect and assemble data for databases; was responsible for the design and development of relational databases for collecting data.
  • Worked in the research division of India’s premier technology institute (Birla Institute of Technology).
  • Published technical papers and journal articles internationally on numerous facets of IT.

TECHNICAL SKILLS

Big Data/Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, ZooKeeper, Flume, HBase, Cassandra, Splunk, Amazon Redshift, Spark, Scala

Databases/RDBMS: Oracle 9i/10g, SQL Server 2005, MySQL, Teradata

Languages: Pig Latin, Java, C++, Shell Script

ETL, Reporting Tools: Informatica PowerCenter 8.1, Tableau, Denodo

Operating Systems: Windows XP/8, Ubuntu, Linux, Unix

Web Technologies: JSP, XML, VMware, Amazon AWS

Front-End: HTML/HTML 5, JavaScript

Project Skills: Waterfall, MS Project, AGILE-SCRUM

Library Software/s: LibSys, Koha, DSpace, WordPress, Drupal, MediaWiki

Statistical Analysis Tool: SPSS 18.0

PROFESSIONAL EXPERIENCE

Confidential, San Francisco, CA

Hadoop Engineer

Responsibilities:

  • Participated in gathering and analysing requirements and designing technical documents for business requirements.
  • Worked autonomously within a team of Data Analysts to analyse, review, update, edit, clean, translate, and ensure the accuracy of customer data.
  • Involved in data pipeline and ETL processes and testing.
  • Involved in different phases of big data projects, such as data acquisition, data processing, data monitoring, and data serving using dashboards.
  • Imported and exported data between an Oracle database and HDFS using Sqoop and JDBC.
  • Created partitions and buckets by state in Hive to handle structured data.
  • Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations, and different kinds of join operations.
  • Implemented state-based business logic in Hive using Generic UDFs.
  • Created production jobs using Oozie workflows that integrated different actions like MapReduce, Sqoop, and Hive.
  • Managed and reviewed Hadoop log files.
  • Designed and built databases in Hadoop Hive and Redshift.
  • Designed a performance monitoring and metadata management database in Amazon RDS.
  • Automated data processing using UNIX shell scripts and Oozie.
  • Used Splunk for capturing logs.
  • Involved in deployment and production support.
  • Used Jenkins to automate the deployment process.

Environment: CDH4, Hadoop, Informatica, AWS EMR, Hive, Sqoop, Java, MySQL 4.x, Ubuntu, Shell Script, Splunk, Jenkins.

Confidential, San Jose, CA

Hadoop Developer

Responsibilities:

  • Worked in a team that built big data analytic solutions using Cloudera Hadoop Distribution.
  • Disparate datasets: Worked with huge explicit and implicit datasets of consumer-provided search behavior to spot trends. This gave Confidential consumers an intelligent home search experience and helped agents make valuable connections with buyers.
  • Developed MapReduce programs against MySQL data, CSV, and text files.
  • Data operation: Operated on data stored in HDFS and other MySQL or NoSQL data stores, in both batch-oriented and ad-hoc contexts.
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Loaded large data sets into Hive and wrote software that accesses Hive data.
  • Used scripting languages like Pig to manipulate and filter data.
  • Developed reports and dashboards using Tableau 8.3 for quick reviews presented to business and IT users.
  • Developed POCs by building reports and dashboards using Tableau 8.3 to demonstrate quick wins.
  • Worked with end users, business users, and IT users to understand, elicit, and analyze reporting requirements.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Java, MySQL 4.x, Tableau, Ubuntu

Confidential, Oakland, CA

Big Data Consultant

Responsibilities:

  • Exported data from MySQL to HDFS using Sqoop and NFS mounts.
  • Validated and cleaned data before performing analysis.
  • Developed Hive scripts to de-normalize and aggregate data.
  • UDF usage: Developed product profiles using Pig UDFs from the PiggyBank library.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Implemented external tables and dynamic partitions using Hive.
  • Data formats: Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, compressed CSV, etc.

Environment: Apache Hadoop, Hive, Pig, MySQL, Java (MapReduce), Flume, JDBC, Amazon AWS

Confidential, Palo Alto, CA

Big Data Engineer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Gained good experience with NoSQL databases.
  • Created Hive tables, loaded them with data, and wrote Hive queries, which run internally as MapReduce jobs.
  • Involved in creating, partitioning, and bucketing tables.
  • Good understanding of, and related experience with, Hadoop stack internals, Hive, Pig, and MapReduce.

Environment: Core Java, MS Excel 2007, Oracle, Apache Hadoop, Pig, Hive, MapReduce, Sqoop, Java/J2EE, Windows.

ETL Developer

Confidential

Responsibilities:

  • Developed mappings in Informatica PowerCenter 6.1 that catered to extraction, transformation, and loading from various source systems to target systems.
  • Created mappings and workflows.
  • Used Informatica to handle complex mappings and extensively used various transformations such as Source Qualifier, Aggregator, Lookup, Filter, Update Strategy, Expression, Sequence Generator, and Sorter.
  • Extensively used Workflow Manager to create tasks and workflows.
  • Performed rule-based data cleansing, data conversion, and process implementation.
  • Used PL/SQL with Informatica PowerCenter 6.1 for an Oracle 8i database.
  • Developed mappings using Informatica PowerCenter Designer to bulk load data from Oracle and flat file source systems to the target database.

Environment: Teradata, Informatica PowerCenter 6.1, Flat files, Oracle 8i, BO 5, Windows 2000

Information Officer

Confidential

Responsibilities:

  • Managed & maintained inventory & library tools like Web 2.0, DSpace, DELNET, LibSys, etc.
  • Used ICT for acquisition, cataloguing, circulation, serials management, and reference.
  • Automated library operations, replacing manual processes & systems.
  • Created high-level design documents & collaborated with cross-functional teams to align technical direction and approach.
  • Trained users on systems automating cataloguing, circulation, acquisition, serial control, interlibrary loan (ILL), and Web OPAC.
  • Provided technical assistance to the research division.
  • Taught classes on “Digital Library & Multimedia” for MIS students.
  • Published technical papers and journal articles internationally on numerous facets of IT.
  • Wrote test cases, identified bugs & logged defects.
