Hadoop Consultant Resume
Chicago, Illinois
SUMMARY
- Over 7 years of professional IT experience, including 3+ years of experience with Hadoop and Big Data ecosystem technologies.
- In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager, and the MapReduce programming paradigm.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, and Flume.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Involved in all the phases of Software Development Life Cycle (SDLC): Requirements gathering, analysis, design, development, testing, production and post-production support.
- Well versed in developing and implementing MapReduce programs for analyzing Big Data in different formats, both structured and unstructured.
- Practical knowledge of cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with Apache Hadoop technologies: Hadoop Distributed File System (HDFS), the MapReduce framework, YARN, Pig, Hive, HCatalog, Sqoop, and Flume.
- Experience with developing large-scale distributed applications.
- Experienced in writing custom UDFs and UDAFs to extend Hive and Pig functionality (a brief Java UDF sketch follows this summary).
- Ability to develop Pig UDFs to pre-process data for analysis.
- Led many data analysis and integration efforts involving Hadoop along with ETL.
- Good knowledge of Hadoop cluster administration; monitored and managed Hadoop clusters using Cloudera Manager.
- In-depth understanding of data structures and algorithms.
- Experience in managing and reviewing Hadoop log files.
- Experience in NoSQL database HBase.
- Experienced with Java API and REST to access HBase data.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and Core Java design patterns.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Collected data from different sources like web servers and social media for storage in HDFS and analyzed the data using other Hadoop technologies.
- Worked extensively with Dimensional modeling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Good experience with Core Java, implementing OOP concepts, multithreading, collections, and exception handling.
- Experience in Java, JSP, Servlets, WebLogic, WebSphere, JDBC, XML, and HTML.
- Experienced with IDEs such as Eclipse and NetBeans.
- Strong problem-solving, analytical, and interpersonal skills, and a valuable team player.
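As a brief illustration of the custom UDF work noted above, the following is a minimal sketch of a Hive UDF in Java. The class name, behavior, and usage are hypothetical examples, not taken from a specific project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch: normalizes a string column (trim + lower-case).
// Class name and behavior are illustrative only.
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once packaged into a JAR, such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.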
TECHNICAL SKILLS
- Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, HBase
- Languages: Java, C/C++, VC++, Objective-C
- Databases: Teradata, MS SQL Server, Oracle, PL/SQL, Informix, Sybase
- ETL Tools: Informatica, DataStage
- Java/J2EE Technologies: Java, J2EE, Spring, Hibernate, EJB, Web Services, Servlets, JSP, Jakarta Struts
- Application Servers: JBoss, Tomcat
- Methodologies: UML, OOAD
- Web Technologies: HTML, AJAX, CSS, XHTML, XML, XSL, XSLT, WSDL
- Tools: JUnit, MRUnit, Ant, Maven, Log4j, FrontPage
- IDEs: Eclipse, NetBeans
- Operating Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE
Hadoop Consultant
Confidential, Chicago, Illinois
Responsibilities:
- Worked on a 300-node Hadoop cluster running CDH 4.4.
- Worked with highly unstructured and semi-structured data, 2 petabytes in size.
- Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created and ran Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data (see the sketch at the end of this entry).
- Developed MapReduce pipeline jobs to process the data, create the necessary HFiles, and load the HFiles into HBase for faster access without taking a performance hit.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
- Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
- Developed Hive (version 0.10) scripts for end user / analyst requirements to perform ad hoc analysis.
- Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Used Flume to collect, aggregate, and store web log data from different sources like web servers and network devices, and moved it to HDFS.
- Used different file formats like Sequence files, Text Files and Avro.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Worked on cluster coordination services through ZooKeeper.
- Worked with the admin team on designing and performing the upgrade from CDH 3 to CDH 4.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in fixing issues arising out of duration testing.
- Proactively monitored systems and services; worked on architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
Environment: Hadoop 2.4.1, Hive 0.10.0, Pig 0.11.1, MapReduce, Sqoop 1.4.3, ZooKeeper 3.4.5, Flume 1.2.0, Oozie 3.3.2, MySQL, DB2, Teradata, Linux, Eclipse Juno, JDK 1.7.21.
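As a minimal illustration of the cleansing and validation MapReduce jobs referenced above, the following is a map-only sketch in Java. The record layout, expected field count, and class name are assumptions made for the example, not details from the actual project.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleansing sketch: passes through well-formed CSV records and
// counts malformed ones. Field count and validation rule are hypothetical.
public class CsvCleansingMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 5; // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
            context.write(value, NullWritable.get()); // keep valid record as-is
        } else {
            context.getCounter("Cleansing", "MALFORMED_RECORDS").increment(1);
        }
    }
}
```

A driver for a job like this would typically set the number of reduce tasks to zero so the cleansed records are written directly to HDFS.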
Hadoop Consultant
Confidential, NY
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables (a brief Java client sketch follows this entry).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Involved in fetching brand data from social media applications such as Facebook and Twitter.
- Developed and updated social media analytics dashboards on a regular basis.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecasting based on current results and insights derived from data analysis.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Managed and reviewed Hadoop log files.
- Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement, and traffic to social media pages.
- Involved in identifying topics and trends and building context around the brand.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
Environment: Java, HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume, Oozie, ZooKeeper, MySQL, and Eclipse.
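To illustrate the HBase insert and fetch work mentioned above, here is a minimal sketch using the older HBase Java client API of that era. The table name, column family, qualifier, and row key are hypothetical and used only for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal HBase client sketch: insert one row and read it back.
// Table, column family, qualifier, and row key are illustrative only.
public class HBaseClientSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "web_logs"); // hypothetical table name
        try {
            Put put = new Put(Bytes.toBytes("row-001"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("url"),
                    Bytes.toBytes("/index.html"));
            table.put(put);

            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            byte[] url = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("url"));
            System.out.println(Bytes.toString(url));
        } finally {
            table.close();
        }
    }
}
```

In newer HBase releases the same flow goes through Connection and Table objects rather than HTable, but the Put/Get pattern is the same.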
ETL Developer
Confidential, Minneapolis, MN
Responsibilities:
- Developed ETL best practices and standards.
- Designed the source-to-target mapping (STM) document.
- Worked extensively on Source Analyzer, Mapping Designer, Target Designer, Workflow Manager and Workflow Monitor.
- Used various transformations such as Joiner, Aggregator, Java, Expression, Lookup, Filter, Union, Update Strategy, Stored Procedure, and Router to implement the business logic.
- Created complex mappings using Connected and Unconnected Lookup, Aggregator, Update Strategy, Stored Procedure, and Router transformations to populate target tables efficiently.
- Created Informatica mappings with PL/SQL procedures and functions to build business rules for loading data.
- Developed SCD Type 1 and SCD Type 2 mappings to track changes with Change Data Capture (CDC).
- Prepared SDLC documents and conducted walkthroughs while moving from Development to Test and from Test to Integration Test environments.
- Coordinated with the testing team to provide explanations and resolutions for observations and defects raised by the testers.
- Involved in Performance Tuning of SQL Queries, Sources, Targets and sessions by identifying and rectifying performance bottlenecks.
- Created the job setup document for the job scheduling tool.
- Worked on Unix shell scripts used in scheduling Informatica pre/post session operations.
- Implemented different Tasks in workflows which included Session, Command, E-mail, Event-Wait etc.
- Migrated the code from Informatica PowerCenter 8.6.1 to 9.1.0.
- Involved in Folder Migrations from one environment to the other environments.
- Performed extensive Unit Testing on the developed Mappings and was also involved in the documentation of Test Plans and testing with the users (UAT).
- Extracted data from Flat files and Oracle and loaded them into Teradata.
Environment: Informatica PowerCenter 9.x, SQL Server 2000, Oracle 10g, TOAD, AutoSys, Business Objects XI, Teradata.
ETL Developer
Confidential, Battle Creek, MI
Responsibilities:
- Extensively involved in extraction of data from Oracle, Flat files.
- Designed the ETL process using Informatica.
- Developed various mappings using Source Qualifier, Aggregator, Joiner, Lookup, Filter, Router, and Update Strategy transformations.
- Extensively worked with join types such as normal join, full outer join, master outer join, and detail outer join in the Joiner transformation.
- Used the Update Strategy expressions DD_INSERT, DD_DELETE, DD_UPDATE, and DD_REJECT to insert, delete, update, and reject records based on the requirements.
- Extensively worked with aggregate functions like Avg, Min, Max, First, Last, and Count in the Aggregator Transformation.
- Extensively used SQL Override, Sorter, and Filter in the Source Qualifier Transformation.
- Extensively used mapping variables, mapping parameters, and parameter files for capturing delta loads.
- Worked with various tasks like Session, E-Mail, Workflows, and Command.
- Optimized various mappings, mapplets, sessions, sources, and target databases.
- Performed unit testing on mappings.
- Developed simple & complex mappings using Informatica to load Dimension & Fact tables as per STAR Schema techniques.
- Extensively used various transformations to load data into slowly changing dimensions (SCD).
- Reviewed code for other developers and prepared the production release sheet.
- Generated standard daily, weekly, and monthly reports in Excel format.
Environment: Informatica 7, SQL, PL/SQL, Oracle 8i and Flat files.
Informatica Developer
Confidential
Responsibilities:
- Analyzed projects requirements and designed specifications
- Built dimension tables and fact tables
- Worked extensively on Informatica development tools such as Source Analyzer, Data Warehouse Designer, Transformation Designer, Mapplet Designer and Mapping Designer
- Used major components like Mappers and Streamers in Data Transformation Studio for conversion of XML files to other formats.
- Used update strategy transformation to effectively migrate data from source to target
- Designed mappings and mapplets using various transformations such as Lookup, Aggregator, Expression, Sequence Generator, Router, Filter and Update Strategy
- Designed mappings using reusable transformations and mapplets
- Involved in design, development and maintenance of catalogs, reports using different types of drill downs and multiple prompt selections
Environment: Informatica PowerCenter 5.1, SOA, Business Objects, Oracle 8i, DB2, Java, J2EE, XML, XSL, Windows NT/2000, and UNIX.