Hadoop Consultant Resume
Chicago, Illinois
SUMMARY
- Over 7 years of professional IT experience, including 3+ years of experience with Hadoop and Big Data ecosystem technologies.
- In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager, and the MapReduce programming paradigm.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, and Flume.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Involved in all the phases of Software Development Life Cycle (SDLC): Requirements gathering, analysis, design, development, testing, production and post-production support.
- Well versed in developing and implementing MapReduce programs for analyzing Big Data in different formats, both structured and unstructured.
- Practical knowledge of cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with Apache Hadoop technologies: Hadoop Distributed File System (HDFS), the MapReduce framework, YARN, Pig, Hive, HCatalog, Sqoop, and Flume.
- Experience with developing large-scale distributed applications.
- Experienced in writing custom UDFs and UDAFs to extend Hive and Pig functionality (a brief Java UDF sketch follows this summary).
- Ability to develop Pig UDFs to pre-process data for analysis.
- Led many data analysis and integration efforts involving Hadoop along with ETL.
- Good knowledge of Hadoop cluster administration; monitored and managed Hadoop clusters using Cloudera Manager.
- In-depth understanding of data structures and algorithms.
- Experience in managing and reviewing Hadoop log files.
- Experience in NoSQL database HBase.
- Experienced with Java API and REST to access HBase data.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and Core Java design patterns.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Collected data from different sources like web servers and social media for storage in HDFS and analyzed the data using other Hadoop technologies.
- Worked extensively with Dimensional modeling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Good experience with Core Java, implementing OOP concepts, multithreading, collections, and exception handling.
- Experience in Java, JSP, Servlets, WebLogic, WebSphere, JDBC, XML, and HTML.
- Experienced with IDEs such as Eclipse and NetBeans.
- Strong problem-solving, analytical, and interpersonal skills, and a valuable team player.
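As a brief illustration of the custom UDF work noted above, the following is a minimal sketch of a Hive UDF in Java. The class name, behavior, and usage are hypothetical examples, not taken from a specific project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch: normalizes a string column (trim + lower-case).
// Class name and behavior are illustrative only.
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once packaged into a JAR, such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.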
TECHNICAL SKILLS
- Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, HBase
- Languages: Java, C/C++, VC++, Objective-C
- Databases: Teradata, MS SQL Server, Oracle, PL/SQL, Informix, Sybase
- ETL Tools: Informatica, DataStage
- Java/J2EE Technologies: Java, J2EE, Spring, Hibernate, EJB, Web Services, Servlets, JSP, Jakarta Struts
- Application Servers: JBoss, Tomcat
- Methodologies: UML, OOAD
- Web Technologies: HTML, AJAX, CSS, XHTML, XML, XSL, XSLT, WSDL
- Tools: JUnit, MRUnit, Ant, Maven, Log4j, FrontPage
- IDEs: Eclipse, NetBeans
- Operating Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE
Hadoop Consultant
Confidential, Chicago, Illinois
Responsibilities:
- Worked on a 300-node Hadoop cluster running CDH 4.4.
- Worked with highly unstructured and semi-structured data, 2 petabytes in size.
- Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created and ran Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data (see the sketch at the end of this entry).
- Developed MapReduce pipeline jobs to process the data, create the necessary HFiles, and load the HFiles into HBase for faster access without taking a performance hit.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
- Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
- Developed Hive (version 0.10) scripts for end user / analyst requirements to perform ad hoc analysis.
- Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Used Flume to collect, aggregate, and store web log data from different sources like web servers and network devices, and moved it to HDFS.
- Used different file formats like Sequence files, Text Files and Avro.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Worked on cluster coordination services through ZooKeeper.
- Worked with the admin team on designing and performing the upgrade from CDH 3 to CDH 4.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in fixing issues arising out of duration testing.
- Proactively monitored systems and services; worked on architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
Environment: Hadoop 2.4.1, Hive 0.10.0, Pig 0.11.1, MapReduce, Sqoop 1.4.3, ZooKeeper 3.4.5, Flume 1.2.0, Oozie 3.3.2, MySQL, DB2, Teradata, Linux, Eclipse Juno, JDK 1.7.21.
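As a minimal illustration of the cleansing and validation MapReduce jobs referenced above, the following is a map-only sketch in Java. The record layout, expected field count, and class name are assumptions made for the example, not details from the actual project.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleansing sketch: passes through well-formed CSV records and
// counts malformed ones. Field count and validation rule are hypothetical.
public class CsvCleansingMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 5; // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
            context.write(value, NullWritable.get()); // keep valid record as-is
        } else {
            context.getCounter("Cleansing", "MALFORMED_RECORDS").increment(1);
        }
    }
}
```

A driver for a job like this would typically set the number of reduce tasks to zero so the cleansed records are written directly to HDFS.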
Hadoop Consultant
Confidential, NY
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables (a brief Java client sketch follows this entry).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Involved in fetching brand data from social media applications such as Facebook and Twitter.
- Developed and updated social media analytics dashboards on a regular basis.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecasting based on current results and insights derived from data analysis.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Managed and reviewed Hadoop log files.
- Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement, and traffic to social media pages.
- Involved in identifying topics and trends and building context around the brand.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
Environment: Java, HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume, Oozie, ZooKeeper, MySQL, and Eclipse.
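To illustrate the HBase insert and fetch work mentioned above, here is a minimal sketch using the older HBase Java client API of that era. The table name, column family, qualifier, and row key are hypothetical and used only for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal HBase client sketch: insert one row and read it back.
// Table, column family, qualifier, and row key are illustrative only.
public class HBaseClientSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "web_logs"); // hypothetical table name
        try {
            Put put = new Put(Bytes.toBytes("row-001"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("url"),
                    Bytes.toBytes("/index.html"));
            table.put(put);

            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            byte[] url = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("url"));
            System.out.println(Bytes.toString(url));
        } finally {
            table.close();
        }
    }
}
```

In newer HBase releases the same flow goes through Connection and Table objects rather than HTable, but the Put/Get pattern is the same.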
ETL Developer
Confidential, Minneapolis, MN
Responsibilities:
- Developed ETL best practices and standards.
- Designed the source-to-target mapping (STM) document.
- Worked extensively on Source Analyzer, Mapping Designer, Target Designer, Workflow Manager and Workflow Monitor.
- Used various transformations such as Joiner, Aggregator, Java, Expression, Lookup, Filter, Union, Update Strategy, Stored Procedure, and Router to implement the business logic.
- Created complex mappings using Connected and Unconnected Lookup, Aggregator, Update Strategy, Stored Procedure, and Router transformations to populate target tables efficiently.
- Created Informatica mappings with PL/SQL procedures and functions to build business rules for loading data.
- Developed SCD Type 1 and SCD Type 2 mappings to track changes with Change Data Capture (CDC).
- Prepared SDLC documents and conducted walkthroughs while moving from Development to Test and from Test to Integration Test environments.
- Coordinated with the testing team to provide explanations and resolutions for observations and defects raised by the testers.
- Involved in Performance Tuning of SQL Queries, Sources, Targets and sessions by identifying and rectifying performance bottlenecks.
- Created the job setup document for the job scheduling tool.
- Worked on Unix shell scripts used in scheduling Informatica pre/post session operations.
- Implemented different Tasks in workflows which included Session, Command, E-mail, Event-Wait etc.
- Migrated the code from Informatica PowerCenter 8.6.1 to 9.1.0.
- Involved in Folder Migrations from one environment to the other environments.
- Performed extensive Unit Testing on the developed Mappings and was also involved in the documentation of Test Plans and testing with the users (UAT).
- Extracted data from Flat files and Oracle and loaded them into Teradata.
Environment: Informatica PowerCenter 9.x, SQL Server 2000, Oracle 10g, TOAD, AutoSys, Business Objects XI, Teradata.
ETL Developer
Confidential, Battle Creek, MI
Responsibilities:
- Extensively involved in extraction of data from Oracle, Flat files.
- Designed the ETL process using Informatica.
- Developed various mappings using Source Qualifier, Aggregator, Joiner, Lookup, Filter, Router, and Update Strategy transformations.
- Extensively worked with join types such as normal join, full outer join, master outer join, and detail outer join in the Joiner transformation.
- Used the Update Strategy expressions DD_INSERT, DD_DELETE, DD_UPDATE, and DD_REJECT to insert, delete, update, and reject records based on the requirements.
- Extensively worked with aggregate functions like Avg, Min, Max, First, Last, and Count in the Aggregator Transformation.
- Extensively used SQL Override, Sorter, and Filter in the Source Qualifier Transformation.
- Extensively used mapping variables, mapping parameters, and parameter files for capturing delta loads.
- Worked with various tasks like Session, E-Mail, Workflows, and Command.
- Optimized various mappings, mapplets, sessions, sources, and target databases.
- Performed unit testing on mappings.
- Developed simple & complex mappings using Informatica to load Dimension & Fact tables as per STAR Schema techniques.
- Extensively used various transformations to load data into slowly changing dimensions (SCD).
- Reviewed code for other developers and prepared the production release sheet.
- Generated standard daily, weekly, and monthly reports in Excel format.
Environment: Informatica 7, SQL, PL/SQL, Oracle 8i and Flat files.
Informatica Developer
Confidential
Responsibilities:
- Analyzed projects requirements and designed specifications
- Built dimension tables and fact tables
- Worked extensively on Informatica development tools such as Source Analyzer, Data Warehouse Designer, Transformation Designer, Mapplet Designer and Mapping Designer
- Used major components like Mappers and Streamers in Data Transformation Studio for conversion of XML files to other formats.
- Used update strategy transformation to effectively migrate data from source to target
- Designed mappings and mapplets using various transformations such as Lookup, Aggregator, Expression, Sequence Generator, Router, Filter and Update Strategy
- Designed mappings using reusable transformations and mapplets
- Involved in design, development and maintenance of catalogs, reports using different types of drill downs and multiple prompt selections
Environment: Informatica PowerCenter 5.1, SOA, Business Objects, Oracle 8i, DB2, Java, J2EE, XML, XSL, Windows NT/2000, and UNIX.