
Hadoop Consultant Resume


Chicago, Illinois

SUMMARY

  • Over 7 years of professional IT experience, including 3+ years with Hadoop and Big Data ecosystem technologies.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager, and the MapReduce programming paradigm.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, and Flume.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Involved in all the phases of Software Development Life Cycle (SDLC): Requirements gathering, analysis, design, development, testing, production and post-production support.
  • Well versed in developing and implementing MapReduce programs for analyzing Big Data in different formats, both structured and unstructured.
  • Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Experience with Apache Hadoop technologies: Hadoop Distributed File System (HDFS), the MapReduce framework, YARN, Pig, Hive, HCatalog, Sqoop, and Flume.
  • Experience with developing large-scale distributed applications.
  • Experienced in writing custom UDFs and UDAFs to extend Hive and Pig functionality (a minimal UDF sketch in Java follows this list).
  • Ability to develop Pig UDFs to pre-process data for analysis.
  • Led many data analysis and integration efforts involving Hadoop along with ETL.
  • Good knowledge of Hadoop cluster administration, monitoring, and management using Cloudera Manager.
  • In-depth understanding of data structures and algorithms.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in NoSQL database HBase.
  • Experienced with Java API and REST to access HBase data.
  • Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and Core Java design patterns.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Collected data from different sources such as web servers and social media for storage in HDFS and analysis with other Hadoop technologies.
  • Worked extensively with Dimensional modeling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Good experience with Core Java, implementing OOP concepts, multithreading, collections, and exception handling.
  • Experience in Java, JSP, Servlets, WebLogic, WebSphere, JDBC, XML, and HTML.
  • Experienced with IDEs such as Eclipse and NetBeans.
  • Strong and effective problem-solving, analytical and interpersonal skills, besides being a valuable team player.
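
The Hive and Pig UDF work mentioned above was done in Java. As a minimal sketch (the class name and normalization logic are hypothetical, not taken from an actual project), a simple Hive UDF that cleans up a text column looks roughly like this:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: trims and lower-cases a string column before analysis.
    public final class NormalizeText extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;  // pass NULLs through untouched
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Once the jar is added to the Hive session, the function is registered with CREATE TEMPORARY FUNCTION and used in HiveQL like any built-in.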

TECHNICAL SKILLS

  • Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, HBase
  • Programming Languages: Java, C/C++, VC++, Objective C
  • Databases & ETL Tools: Teradata, MS SQL Server, Oracle, PL/SQL, Informix, Sybase, Informatica, Datastage
  • Java Technologies: Java, J2EE, Spring, Hibernate, EJB, Web Services, Servlets, JSP, Jakarta Struts
  • Application Servers: JBoss, Tomcat
  • Methodologies: UML, OOAD
  • Web Technologies: HTML, AJAX, CSS, XHTML, XML, XSL, XSLT, WSDL
  • Tools: JUnit, MRUnit, Ant, Maven, Log4j, FrontPage
  • IDEs: Eclipse, NetBeans
  • Operating Systems: Linux, UNIX, Windows

PROFESSIONAL EXPERIENCE

Hadoop Consultant

Confidential, Chicago, Illinois

Responsibilities:

  • Worked on a 300-node Hadoop cluster running CDH 4.4.
  • Worked with 2 petabytes of highly unstructured and semi-structured data.
  • Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Created and ran Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
  • Developed MapReduce (YARN) jobs for cleansing, accessing, and validating the data (a mapper sketch follows this list).
  • Developed MapReduce pipeline jobs to process the data, generate the necessary HFiles, and bulk-load them into HBase for faster access without a performance hit.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
  • Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into forming baseline data.
  • Developed Hive (version 0.10) scripts for end user / analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Developed UDFs in Java as and when necessary to use in PIG and HIVE queries.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and network devices, and moved it to HDFS.
  • Used different file formats like Sequence files, Text Files and Avro.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Worked with the admin team in designing and executing the upgrade from CDH 3 to CDH 4.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Involved in fixing issues arising out of duration testing.
  • Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
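
As an illustration of the cleansing and validation jobs mentioned above (a hedged sketch only; the delimiter, field count, and counter names are assumptions, not details of the actual project), a YARN-era mapper that drops malformed delimited records might look like this:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: filters out malformed CSV records and counts them.
    public class CleansingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        private static final int EXPECTED_FIELDS = 5;   // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            // Drop records with a missing column or an empty mandatory first field.
            if (fields.length != EXPECTED_FIELDS || fields[0].trim().isEmpty()) {
                context.getCounter("Cleansing", "MALFORMED_RECORDS").increment(1);
                return;
            }
            context.write(new Text(value.toString().trim()), NullWritable.get());
        }
    }

The malformed-record counter makes each run easy to validate from the job history UI before the cleaned data moves downstream.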

Environment: Hadoop-2.4.1, Hive-0.10.0, Pig-0.11.1, Map Reduce, Sqoop-1.4.3, Zookeeper-3.4.5, Flume-1.2.0, Oozie-3.3.2, MySQL, DB2, Teradata, Linux, Eclipse Juno, JDK-1.7.21.

Hadoop Consultant

Confidential, NY

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Designed and modified database tables and used HBase operations to insert and fetch data from tables (a Java API sketch follows this list).
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Involved in fetching brand data from social media applications such as Facebook and Twitter.
  • Developed and updated social media analytics dashboards on a regular basis.
  • Performed data mining investigations to find new insights related to customers.
  • Involved in forecasting based on present results and insights derived from data analysis.
  • Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
  • Managed and reviewed Hadoop log files.
  • Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement, and traffic to social media pages.
  • Involved in identifying topics and trends and building context around the brand.
  • Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
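
The HBase inserts and fetches above went through the Java client API (the summary also notes Java API access to HBase). A minimal sketch using the older HTable-style API current at the time; the table, row key, and column names are assumed for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical example: write one cell and read it back.
    public class HBaseAccessExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "web_logs");            // assumed table name
            try {
                Put put = new Put(Bytes.toBytes("row-001"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("url"),  // assumed family/qualifier
                        Bytes.toBytes("/products/123"));
                table.put(put);

                Get get = new Get(Bytes.toBytes("row-001"));
                Result result = table.get(get);
                byte[] url = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("url"));
                System.out.println(Bytes.toString(url));
            } finally {
                table.close();
            }
        }
    }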

Environment: Java, HBase, Hadoop, HDFS, Map Reduce, Hive, Sqoop, Flume, Oozie, Zookeeper, MySQL, and Eclipse.

ETL Developer

Confidential, Minneapolis, MN

Responsibilities:

  • Developed ETL best practices and standards.
  • Design of source to target mapping (STM) document.
  • Worked extensively on Source Analyzer, Mapping Designer, Target Designer, Workflow Manager and Workflow Monitor.
  • Used various Transformations like Joiner, Aggregate, Java, Expression, Lookup, Filter, Union, Update Strategy, Stored Procedures, and Router etc. to implement the business logic.
  • Created complex mappings using Connected and Unconnected Lookup, Aggregator, Update Strategy, Stored Procedure, and Router transformations for populating the target table in an efficient manner.
  • Created Informatica mappings with PL/SQL procedures and functions to build business rules to load data.
  • Developed SCD Type 1 and SCD Type 2 mappings to track Change Data Capture (CDC); the underlying decision logic is sketched after this list.
  • Prepared SDLC Document and conducted walk through while moving from Development to Test and Test to Integration Test environments and so forth.
  • Coordinated with the testing team in providing explanations and resolutions to the observations and defects raised by the testers.
  • Involved in Performance Tuning of SQL Queries, Sources, Targets and sessions by identifying and rectifying performance bottlenecks.
  • Created the job setup document for the job scheduling tool.
  • Worked on Unix shell scripts used in scheduling Informatica pre/post session operations.
  • Implemented different Tasks in workflows which included Session, Command, E-mail, Event-Wait etc.
  • Migrated the Code from Informatica Power center 8.6.1 to 9.1.0.
  • Involved in Folder Migrations from one environment to the other environments.
  • Performed extensive Unit Testing on the developed Mappings and was also involved in the documentation of Test Plans and testing with the users (UAT).
  • Extracted data from Flat files and Oracle and loaded them into Teradata.
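
The SCD Type 2 mappings above are built with Lookup, Router, and Update Strategy transformations in Informatica rather than hand-written code, but the per-row decision they implement can be sketched in plain Java for clarity (the hash-compare approach and names here are illustrative assumptions, not the project's actual design):

    // Illustrative sketch of the per-row SCD Type 2 decision made by a
    // Lookup + Router + Update Strategy mapping; the hash comparison is assumed.
    public class Scd2Decision {

        enum Action { INSERT_NEW, EXPIRE_AND_INSERT, IGNORE }

        static Action decide(String currentDimHash, String incomingHash) {
            if (currentDimHash == null) {
                return Action.INSERT_NEW;        // no dimension row yet: DD_INSERT
            }
            if (!currentDimHash.equals(incomingHash)) {
                return Action.EXPIRE_AND_INSERT; // attributes changed: end-date old row, insert new version
            }
            return Action.IGNORE;                // unchanged: DD_REJECT / filtered out
        }
    }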

Environment: Informatica Power Center 9.x, SQL Server 2000, Oracle 10g, TOAD, AutoSys, Business Objects XI, Teradata.

ETL Developer

Confidential, Battle Creek, MI

Responsibilities:

  • Extensively involved in extraction of data from Oracle, Flat files.
  • Design of the ETL process using Informatica.
  • Developed various Mappings using Source Qualifier, Aggregator, Joiners, Lookups, Filters, Router and Update strategy.
  • Extensively worked with joiner functions like normal join, full outer join, master outer join, and detail outer join in the joiner transformation.
  • Used Update Strategy DD INSERT, DD DELETE, DD UPDATE, AND DD REJECT to insert, delete, update and reject the items based on the requirement.
  • Extensively worked with aggregate functions like Avg, Min, Max, First, Last, and Count in the Aggregator Transformation.
  • Extensively used SQL Override, Sorter, and Filter in the Source Qualifier Transformation.
  • Extensively used Mapping Variables, Mapping Parameters, and Parameter Files for capturing delta loads.
  • Worked with various tasks like Session, E-Mail, Workflows, and Command.
  • Optimized various Mappings, Mapplets, Sessions, Sources and Target Databases.
  • Performed unit testing on Mappings.
  • Developed simple & complex mappings using Informatica to load Dimension & Fact tables as per STAR Schema techniques.
  • Extensively used various transformations to load data into slowly changing dimensions (SCD).
  • Code review for other developers and prepared Production Release sheet.
  • Generated the standard reports (daily, weekly, and monthly) in Excel format.

Environment: Informatica 7, SQL, PL/SQL, Oracle 8i and Flat files.

Informatica Developer

Confidential 

Responsibilities:

  • Analyzed project requirements and designed specifications
  • Built dimension tables and fact tables
  • Worked extensively on Informatica development tools such as Source Analyzer, Data Warehouse Designer, Transformation Designer, Mapplet Designer and Mapping Designer
  • Used major components like Mappers and Streamers in Data Transformation Studio for conversion of XML files to other formats.
  • Used update strategy transformation to effectively migrate data from source to target
  • Designed mappings and mapplets using various transformations such as Lookup, Aggregator, Expression, Sequence Generator, Router, Filter and Update Strategy
  • Designed mappings using reusable transformations and mapplets
  • Involved in design, development and maintenance of catalogs, reports using different types of drill downs and multiple prompt selections

Environment: Informatica Power center 5.1, SOA, Business Objects, Oracle 8i, DB2, JAVA, J2EE, XML, XSL, Windows NT/2000 and UNIX.
