Hadoop Developer/Admin Resume

Dallas, TX

SUMMARY

  • A qualified IT professional with 9+ years of experience, including 5+ years as a Hadoop Developer. In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, and the wider Hadoop ecosystem.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, Flume.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems.
  • Involved in all phases of the Software Development Life Cycle (SDLC): requirements gathering, analysis, design, development, testing, production, and post-production support.
  • Well versed with developing and implementing MapReduce programs for analysing Big Data with different file formats.
  • Procedural knowledge in cleansing and analysing data using HiveQL, Pig Latin, and custom MapReduce programs in Java. Expertise in setting up, configuring, and monitoring Hadoop clusters using Cloudera CDH3/CDH4, Apache tarballs, and Hortonworks Ambari on Ubuntu, Red Hat, CentOS, and Windows.
  • Experience with Apache Hadoop technologies: the Hadoop Distributed File System (HDFS), the MapReduce framework, Pig, Hive, Sqoop, Flume, and YARN.
  • Experience with developing large-scale distributed applications.
  • Experienced in writing custom UDFs and UDAFs to extend Hive and Pig functionality (a minimal Hive UDF sketch in Java follows this summary).
  • Ability to develop Pig UDFs to pre-process data for analysis.
  • Good understanding of NoSQL databases like MongoDB, HBase, and Cassandra, as well as Storm, Spark, and Kafka.
  • Led many Data Analysis & Integration efforts involving HADOOP along with ETL.
  • Good knowledge of Hadoop cluster administration, monitoring, and management using Cloudera Manager.
  • Extensive experience with relational databases: Oracle 10g/9i, DB2, SQL Server 2005/2000, and MS Access.
  • Hands-on experience defining technical requirements for ETL processes and developing complex mappings to load data into the enterprise data warehouse and data marts.
  • Hands-on experience implementing data warehouse methodologies such as Star schema, Snowflake schema, Slowly Changing Dimensions, Change Data Capture, and Incremental Aggregation.
  • Knowledge in designing Dimensional Models, Physical and Logical Data Models by using Erwin and MS Visio.
  • Experience in creating Detail Design documents, Mapping Specifications documents and Test Cases documents.
  • Extensive expertise using various Performance Tuning Techniques on Sources, Targets, Mappings and Workflows using Partitions/Parallelization and eliminating Cache Intensive Transformations.
  • Hands-on experience with Informatica administration, performing tasks like creating user accounts, creating folders, setting user privileges, and carrying out code migrations.
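
As an illustration of the custom Hive UDF work mentioned above, here is a minimal sketch in Java; the class name, function name, and behaviour are hypothetical examples rather than code from any of the projects below.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Minimal sketch of a Hive UDF: trims and lower-cases a string column (hypothetical example). */
@Description(name = "clean_str", value = "_FUNC_(str) - returns str trimmed and lower-cased")
public class CleanStringUDF extends UDF {
    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                  // pass NULLs through, as Hive built-ins do
        }
        result.set(input.toString().trim().toLowerCase());
        return result;
    }
}
```

Packaged into a jar, a function of this shape would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.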

TECHNICAL SKILLS

Big Data (Hadoop Framework): HDFS, MapReduce, Pig, Hive, Oozie, ZooKeeper, HBase, and Sqoop

Databases: MySQL, HBase, MongoDB, Cassandra

Languages: SQL, Java, Pig Latin, MapReduce

Development Tools: Eclipse, TOAD, MySQL

Web Technologies: XML, VMware

Office Tools: Microsoft Office suite

Project Management: Microsoft Project

Operating Systems: Windows 8, Windows 7, UNIX, Linux, CentOS, Ubuntu

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Hadoop Developer/Admin

Responsibilities:

  • Responsible for architecting Hadoop clusters with CDH3
  • Extensively involved in installation and configuration of the Cloudera distribution of Hadoop, including the NameNode, JobTracker, TaskTrackers, and DataNodes
  • Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, Hive, Oozie, and Sqoop
  • Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs or data
  • Parsed files in various formats, such as XML and JSON, and loaded them into an Oracle database using Python XML parsing
  • Managed and reviewed Hadoop Log files
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Responsible for smooth, error-free configuration of the DWH-ETL solution and its integration with Hadoop
  • Designed a data warehouse using Hive
  • Created partitioned tables in Hive (a minimal DDL sketch follows this list)
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
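
As an illustration of the partitioned Hive tables mentioned in this project, the sketch below issues the kind of DDL involved over Hive JDBC. The table, columns, HDFS location, and HiveServer2 endpoint are all assumptions for illustration, not details taken from the engagement.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

/** Minimal sketch: create a date-partitioned Hive table over JDBC (names and endpoint are hypothetical). */
public class CreatePartitionedTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // External table over web-server logs already landed in HDFS, partitioned by load date
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS web_logs ("
                + " ip STRING, url STRING, status INT, bytes BIGINT)"
                + " PARTITIONED BY (load_date STRING)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                + " STORED AS TEXTFILE"
                + " LOCATION '/data/web_logs'");
            // Register one day's partition so queries can prune by load_date
            stmt.execute("ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (load_date='2015-01-01')");
        }
    }
}
```

The same DDL could equally be run from the Hive CLI; partitioning by load date keeps each day's files in their own HDFS directory so queries touch only the partitions they need.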

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, MySQL, Ubuntu, Java 1.6, Python, Apache NiFi

Confidential, NJ

Hadoop Developer/Admin

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts
  • Involved in installing Hadoop Ecosystem components
  • Responsible to manage data coming from different sources
  • Supported MapReduce programs that were running on the cluster (jobs of the general shape sketched after this list)
  • Involved in HDFS maintenance and loading of structured and unstructured data
  • Installed and configured Pig, Hive and Sqoop
  • Wrote MapReduce job using Pig Latin
  • Imported data using Sqoop to load data from MySQL into HDFS on a regular basis
  • Developed scripts and batch jobs to schedule various Hadoop programs
  • Wrote Hive queries for data analysis to meet the business requirements
  • Created Hive tables and worked on them using HiveQL.
  • Utilized PyUnit, the Python unit test framework, for all Python applications.
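
As an illustration of the MapReduce programs supported on this cluster, here is a minimal Java sketch that counts records per key in pipe-delimited input; the input layout, field positions, and class names are assumptions, not code from the project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Minimal sketch: count records per key in pipe-delimited input (layout is hypothetical). */
public class RecordCountJob {

    public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\\|");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                outKey.set(fields[0]);            // first field is the grouping key
                context.write(outKey, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) {
                sum += c.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-count");
        job.setJarByClass(RecordCountJob.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);   // summing is associative, so the reducer doubles as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```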

Environment: Java, Hadoop, MapReduce, HDFS, Sqoop, Hive, Pig, Linux, MySQL, and Ubuntu.

Confidential

Hadoop Developer

Responsibilities:

  • Hands-on experience doing POCs in Hadoop, including the 2 months spent on this project
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase, and with monitoring them using Cloudera Manager.
  • Hands-on experience working with NoSQL databases, including HBase and its integration with the Hadoop cluster.
  • Integrated Hadoop with Sqoop
  • Implemented Hive tables and HQL Queries for the reports.
  • Developed simple to complex MapReduce jobs using Hive and Pig
  • Developed MapReduce programs for data analysis and data cleaning (a map-only cleaning sketch follows this list).
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Developed industry-specific UDFs (user-defined functions)
  • Created Hive tables and was involved in data loading and in writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Developed Hive queries to process the data for visualization.
  • Loaded log data directly into HDFS using Flume.
  • Experienced in managing and reviewing Hadoop log files.
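
As an illustration of the data-cleaning MapReduce work in this project, here is a minimal map-only Java sketch that drops malformed rows and passes the rest through; the delimiter, expected field count, and class names are assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Minimal map-only cleaning job: keeps well-formed comma-delimited rows and drops the rest. */
public class CleanRecordsJob {

    public static class CleaningMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5;   // assumed record layout

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",", -1);
            if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
                context.write(NullWritable.get(), line);  // emit well-formed rows unchanged
            }
            // malformed rows are dropped; a Hadoop counter could track how many were rejected
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleaningMapper.class);
        job.setNumReduceTasks(0);                         // map-only: mapper output is written straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```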

Environment: Hadoop, MapReduce, Hive, HBase, Flume, Pig, ZooKeeper, Java, ET, SQL, and CentOS.

Confidential

ETL Developer

Responsibilities:

  • Participated in the build of the Data Warehouse, which included the design of a data mart using a Star Schema
  • Created Dimension Tables and Fact Tables based on the warehouse design
  • Extensively worked on Informatica to extract data from flat files and Oracle, and to load the data into the target database.
  • Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems.
  • Imported Source/Target Tables from the respective databases and created reusable transformations (Joiner, Router, Lookup, Rank, Filter, Expression, and Aggregator) inside a Mapplet and created new mappings using Designer module of Informatica.
  • Extensively worked on the performance tuning of the Informatica Power Center Mappings as well as the tuning of the sessions
  • Created Stored Procedures for data transformation purpose
  • Created worklets to run several sessions sequentially and concurrently
  • Extensively used Informatica to load data from a wide range of sources, such as Oracle, SQL Server, Teradata, and flat files, into a DB2 database.
  • Developed schedules to automate the update processes and Informatica Sessions/Worklets
  • Extensively used Sorter and Aggregator transformations to improve performance at the mapping level.
  • Created and used the Normalizer Transformation to normalize the flat files in the source data.
  • Extensively used Slowly Changing Dimensions Type II in various mappings.
  • Used debugger to test the data flow and fix the mappings.
  • Created the Source and Target Definitions in Informatica Power Center Designer
  • Created Reusable Transformations to use in Multiple Mappings
  • Created Sessions using Work Flow Manager and Monitored the performance of the session through Gantt chart, Task View in Work Flow Monitor.
  • Wrote Stored Procedures using PL/SQL for incremental updates
  • Integrated various sources into the Staging area of the Data Warehouse for integrating and cleansing data.
  • Used UNIX commands (Vi Editor) to perform the DB2 load operations.
  • Extensively implemented data conversions while loading the flat files with respect to DB2 file format.

Environment: Informatica PowerCenter 8.1, XML, Teradata, Workflow Manager, Informatica PowerConnect, Workflow Monitor, ERwin, Windows 2000, Oracle 9i, PL/SQL Developer, Business Objects 6.5, TOAD, UNIX, SQL*Loader, Microsoft Excel

Confidential

ETL Developer

Responsibilities:

  • Responsible for Business Analysis and Requirements Collection.
  • Documented data conversion, integration, load and verification specifications.
  • Parsed high-level design specs into simple ETL coding and mapping standards.
  • Maintained warehouse metadata, naming standards and warehouse standards for future application development.
  • Extensively used ETL to load data from a wide range of sources, such as flat files and Oracle, into XML documents.
  • Involved in Informatica administrative work such as creating Informatica folders and repositories and managing folder permissions.
  • Worked with ETL Developers and SQL Server DBA Team
  • Collected performance data for sessions and performance tuned by adjusting Informatica session parameters.
  • Used XML schema to extract data from Oracle, Teradata into XML using Export Option in Informatica.
  • Created pre-session and post-session shell scripts and mail-notifications.
  • Developed Shell scripts using UNIX to drop and recreate indexes and key constraints.
  • Used TOAD to develop oracle PL/SQL Stored Procedures.
  • Extensively worked on the Informatica Designer, Repository Manager, Repository Server, Workflow Manager/Server Manager and Workflow Monitor.
  • Created Workflows containing command, email, session, decision and a wide variety of tasks to load the data into Target database.
  • Scheduled batch and sessions within Informatica using Informatica scheduler and also wrote shell scripts for job scheduling.

Environment: Informatica PowerCenter 8.1, XML, Teradata, Workflow Manager, Informatica PowerConnect, Workflow Monitor, Windows 2000, Oracle 9i, PL/SQL Developer, TOAD, UNIX, SQL*Loader
