
Hadoop Developer Resume


Moline, IL

SUMMARY

  • Around 9 years of IT experience in Data Warehousing, Java, and the Hadoop framework.
  • Around 3 years of work experience as a Hadoop Developer with good knowledge of the Hadoop framework, the Hadoop Distributed File System (HDFS), and parallel processing implementations.
  • Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, Hive, Sqoop, Pig, HDFS, HBase, Cassandra, ZooKeeper, Oozie, and Flume.
  • Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce.
  • Strong experience writing MapReduce programs in Java for data analysis as per business requirements.
  • Good experience writing Pig scripts and Hive queries for processing and analyzing large volumes of data.
  • Experience importing and exporting data between HDFS and databases such as MySQL, Oracle, Netezza, Teradata, and DB2 using Sqoop.
  • Experience in developing and scheduling ETL workflows in Hadoop using Oozie and ZooKeeper.
  • Experience optimizing MapReduce jobs with Combiners and Partitioners to deliver the best results (a minimal sketch follows this list).
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience analyzing MongoDB and comparing it with other open-source NoSQL databases to determine which best suits current requirements.
  • Good experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL MapReduce systems, data modeling, database performance, and multi-terabyte data warehouses.
  • Highly proficient in Extract, Transform, and Load (ETL) processing of data into target systems using DataStage.
  • Solid experience in developing complex mappings using DataStage transformations.
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in writing complex shell scripts and scheduling them with cron to run on a recurring basis.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in writing SQL scripts, PL/SQL scripts, and BTEQ scripts.
  • Excellent development experience with DBMSs such as Oracle, DB2, MS SQL Server, Teradata, and MySQL.
  • Strong knowledge of data warehousing, including Extract, Transform and Load Processes.
  • Involved in designing logical and physical data models using Erwin and Visio.
  • Experience in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
  • Experience in setting up definitions and processes for test phases, including unit, system, integration, user acceptance, and product testing, and uploading them into HP ALM.
  • Involved in migration, enhancement, maintenance, and support of projects.
  • Ability to work independently as well as in a team, and able to effectively communicate with customers, peers, and management at all levels in and outside the organization.
  • Good working experience with Agile/Scrum methodologies, including technical discussions with clients and daily scrum calls covering project analysis, specifications, and development.
  • Experience using SVN for version control, including check-in and check-out of code.
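
The Combiner/Partitioner bullet above can be illustrated with a minimal, self-contained sketch. This is a generic word-count-style job, not code from any project listed below; the class names, whitespace tokenization, and first-character partitioning rule are illustrative assumptions.

```java
// Generic word-count-style job showing a Combiner and a custom Partitioner.
// Class names and the partitioning rule are illustrative assumptions.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TokenCountJob {

  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text token = new Text();

    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String word : value.toString().split("\\s+")) {
        if (word.length() > 0) {
          token.set(word);
          context.write(token, ONE);             // one record per token
        }
      }
    }
  }

  // Used both as the Combiner (local pre-aggregation) and the final Reducer.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  // Spreads keys across reducers by the first character of the key.
  public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    public int getPartition(Text key, IntWritable value, int numPartitions) {
      if (key.getLength() == 0) {
        return 0;
      }
      return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "token count");
    job.setJarByClass(TokenCountJob.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);      // cuts shuffle volume before reducers
    job.setPartitionerClass(FirstCharPartitioner.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Reusing the reducer as the combiner is safe here because the sum is associative and commutative; the custom partitioner only changes how keys are distributed across reducers.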

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, NoSQL (HBase, Cassandra), ZooKeeper, Oozie, and Flume

Databases: Oracle 10g/9i, DB2, Teradata, MS SQL Server, MySQL; NoSQL: HBase, MongoDB

Scheduling Tools: Oozie, ASG Zena, Control-M

ETL Tools: IBM Information Server 8.5/8.0.1 (WebSphere DataStage and QualityStage, Information Analyzer), Ascential DataStage 7.5.2/7.1/6.0/5.1 (Designer, Director, Manager, Administrator)

Data Modeling Tools: ERwin 4.2/3.5.2

Database Tools: SQL Developer, TOAD, SQL*Loader

Operating Systems: MS-DOS 6.22, Windows NT/98/2000, Solaris, IBM AIX, UNIX, Linux

Languages: Java 1.6, SQL, PL/SQL, T-SQL, C, XML, HTML

Scripting: UNIX Shell Scripting, Perl Scripting

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, Moline, IL

Responsibilities:

  • Participated in requirements gathering from Business Users and converting the requirements into technical specifications.
  • Used Flume for moving all log files generated from various sources to HDFS for further processing and later analyzed the imported data using Hadoop Components.
  • Developed multiple MapReduce programs in Java for data cleaning and pre-processing.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (a minimal sketch follows this list).
  • Involved in creating Hive tables and applied HiveQL on those tables for data validation.
  • Created Hive external tables and managed tables, designed data models in Hive.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Used Sqoop to export data from Hadoop Distributed File System (HDFS) to RDBMS.
  • Used Oozie job scheduler to automate the job flows.
  • Configured an Oozie coordinator for the end-to-end process covering MapReduce, Hive, and Sqoop to MongoDB.
  • Wrote workflow.xml, coordinator.xml, job properties, and coordinator properties for the Oozie configuration.
  • Involved in development of Oozie code for moving files between different folders in HDFS.
  • Incorporated email notifications for successful and failed jobs into the Oozie configuration.
  • Used ZooKeeper to manage coordination among the clusters.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
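
A minimal sketch of the distributed-cache map-side join pattern mentioned above. It is not the project's actual code: the lookup-file layout (comma-separated "id,value" records), the argument paths, and the class names are all assumptions.

```java
// Sketch of a map-side join: a small lookup dataset is shipped to every mapper
// via the distributed cache and joined against the large input in map().
// The "id,value" record layout and argument order are assumptions.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoinJob {

  public static class JoinMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    protected void setup(Context context) throws IOException {
      // Load the small dataset that the driver placed in the distributed cache.
      Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
      if (cached != null && cached.length > 0) {
        BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
        try {
          String line;
          while ((line = reader.readLine()) != null) {
            String[] parts = line.split(",", 2);   // assumed "id,value" layout
            if (parts.length == 2) {
              lookup.put(parts[0], parts[1]);
            }
          }
        } finally {
          reader.close();
        }
      }
    }

    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Large input assumed to be "id,<rest of record>"; the join happens here, map-side.
      String[] parts = value.toString().split(",", 2);
      if (parts.length < 2) {
        return;                                   // skip malformed records
      }
      String joined = lookup.get(parts[0]);
      if (joined != null) {
        context.write(new Text(parts[0] + "," + joined + "," + parts[1]),
                      NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // args[2] is the small lookup file on HDFS; args[0]/args[1] are input/output dirs.
    DistributedCache.addCacheFile(new URI(args[2]), conf);
    Job job = new Job(conf, "map-side join");
    job.setJarByClass(MapSideJoinJob.class);
    job.setMapperClass(JoinMapper.class);
    job.setNumReduceTasks(0);                     // map-only job, no shuffle
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The join runs entirely on the map side (zero reduce tasks), which is the point of shipping the small dataset to every mapper through the cache.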

Environment: Hadoop, Pig, Hive, Flume, MapReduce, HDFS, Linux, MongoDB, Java (JDK 1.6), Eclipse, Oozie, ZooKeeper, Sqoop, Hortonworks Hadoop distribution, Teradata, SQL, NoSQL, HBase, UNIX shell scripting, Oracle

Hadoop Developer

Confidential, IL

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Handled importing data from various data sources and performed transformations using Hive and Pig.
  • Developed simple to complex MapReduce jobs using Java, Hive, and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms (a minimal sketch follows this list).
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Performed Extraction, Transform and Loading of large sets of semi structured and unstructured data as per Business requirement.
  • Exported the analyzed data to the RDBMS using Sqoop for visualization and to generate reports for the BI team.
  • Used Oozie job scheduler to automate the job flows.
  • Provided cluster coordination services using ZooKeeper.
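
A minimal driver-only sketch of how map-output and final-output compression can be enabled on a MapReduce job, as referenced above. The codec choices (Snappy for the shuffle, Gzip for the final output) and the Hadoop 1.x-style property names are assumptions, not details from this project; Snappy also requires the native Hadoop libraries.

```java
// Driver-only sketch: compress the map output (shuffle) and the final job output.
// No mapper/reducer classes are set, so Hadoop's identity classes run and the
// job simply rewrites its input in compressed form.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Compress intermediate map output to cut shuffle traffic between nodes.
    conf.setBoolean("mapred.compress.map.output", true);
    conf.setClass("mapred.map.output.compression.codec",
                  SnappyCodec.class, CompressionCodec.class);

    Job job = new Job(conf, "compressed output job");
    job.setJarByClass(CompressedJobDriver.class);

    // Compress the job's final output files as they are written to HDFS.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```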

Environment: Hadoop, Pig, Hive, Flume, MapReduce, HDFS, Linux, MongoDB, Java (JDK 1.6), Eclipse, Oozie, ZooKeeper, Sqoop, Teradata, SQL, NoSQL, HBase, DataStage 8.5, Oracle 10g, DB2.

ETL Developer

Confidential, TX

Responsibilities:

  • Worked closely with the Data Modeler and Database Administrator to understand the business processes and participated in the analysis of business requirements to build the data warehouse model.
  • Extensively used DataStage Designer to develop the Parallel jobs and Server jobs.
  • Designed several DataStage jobs using Join, Merge, Lookup, Change Apply, Change Capture, Funnel, Filter, ODBC, Column Generator, Transformer, Modify, Surrogate Key, Aggregator, Decode, Row Generator, and XML stages.
  • Developed parameter-driven ETL processes to map source systems to target systems, with complete source system profiling in DataStage.
  • Implemented various partition methods like Hash, Entire, Auto, and DB2.
  • Utilized various Database Connector stages like DB2/UDB, ODBC.
  • Tuned DataStage jobs for better performance by creating DataStage Lookup files for staging the data and lookups.
  • Developed complex transformations, surrogate keys, routines, dimension tables and fact tables.
  • Utilized Environment Variables, Stage Variables and Routines for developing parameter-driven jobs and debugging them.
  • Identified facts and dimensions for developing logical design of the data marts.
  • Enhanced the job performance by using proper partitioning methods.
  • Analyzed the resources utilized using Job Monitor.
  • Extracted and transformed source data from DB2 and XML sources to load into data warehouses in Oracle and Teradata.
  • Performed data extractions and loads using Teradata database utilities such as MultiLoad, FastLoad, and FastExport for the large warehouse and for data marts in Teradata.
  • Developed sequencers to control the job execution of the DataStage jobs using Job activity and Email Notification stages.
  • Created many user-defined server routines, before/after job subroutines, and shared containers, which helped implement some of the complex logic; also used stage variables.
  • Extensively involved in creating partitioned tables, indexes, triggers, and many complex SQL queries.
  • Utilized DataStage Director for running jobs and monitoring performance statistics.
  • Performed unit testing of the developed jobs to ensure they met the requirements.
  • Migrated DataStage code to the Integration and Production environments and conducted integration testing.
  • Developed UNIX shell scripts to automate file manipulation and data loading procedures, to create and drop indexes on tables, and enhanced the master script to run the master sequencer.

Environment: DataStage 8.5/8.0.1, ASG Zena, Teradata, DB2 UDB, Oracle 10g, SQL scripts, BTEQ scripts, UNIX shell scripting, Serena, XML, MS SQL Server, MS Word.

ETL Developer

Confidential, IL

Responsibilities:

  • Worked on analyzing the systems and gathering requirements.
  • Involved in the design and development of the data warehouse environment.
  • Involved in implementing a star schema for the data warehouse, using Erwin for logical/physical data modeling and dimensional data modeling.
  • Extensively used DataStage Manager, Designer, Director, and Integrity for creating and implementing jobs.
  • Used the DataStage Director and its run-time engine to schedule and run the jobs, test and debug their components, and monitor the resulting executable versions (on an ad hoc or scheduled basis).
  • Developed jobs using different stages like Transformer, Aggregator, Lookup, Source dataset, External Filter, Row generator, XML, Sequential file, OCI, Column Generator and Peek stages.
  • Implemented SCD Type 1 and SCD Type 2 using Dataset, Change Capture, Transformer, and Oracle Enterprise stages (a minimal sketch of the Type 2 logic follows this list).
  • Used DataStage to transform the data to multiple stages and prepared documentation
  • Improved performance by optimizing SQL statements and implemented Bitmap Indexes.
  • Extensively used DataStage Director to validate, monitor, schedule, and execute the jobs.
  • Migrated the jobs from Development to QA and then to Production server using Version Control
  • Created and monitored sequences using the DataStage sequencer; also created containers for use in multiple jobs.
  • Used Teradata database utilities for extraction and loading the records.
  • Developed user-defined subroutines to implement some of the complex transformations, data conversions, and code validations using various DataStage-supplied functions.
  • Created different types of reports, such as Master/Detail, Cross Tab, and Chart.
  • Used Analyzer and Explorer to view reports from different perspectives, such as slice-and-dice and drill-down.
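
The SCD Type 2 bullet above was implemented with DataStage stages; purely as an illustration of the same row-versioning logic, here is a plain JDBC sketch. The DIM_CUSTOMER table, its columns, and the connection details are hypothetical, not taken from the project.

```java
// Illustration only (the project used DataStage stages): SCD Type 2 expressed
// as JDBC against a hypothetical DIM_CUSTOMER table with EFF_DATE, END_DATE,
// and CURRENT_FLAG columns. Table, columns, and connection details are made up.
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ScdType2Loader {

  public static void applyChange(Connection conn, String customerId,
                                 String newAddress, Date loadDate) throws Exception {
    // 1. Expire the current version of the row, if one exists.
    PreparedStatement expire = conn.prepareStatement(
        "UPDATE DIM_CUSTOMER SET END_DATE = ?, CURRENT_FLAG = 'N' " +
        "WHERE CUSTOMER_ID = ? AND CURRENT_FLAG = 'Y'");
    expire.setDate(1, loadDate);
    expire.setString(2, customerId);
    expire.executeUpdate();
    expire.close();

    // 2. Insert the new version with an open-ended effective date range.
    PreparedStatement insert = conn.prepareStatement(
        "INSERT INTO DIM_CUSTOMER (CUSTOMER_ID, ADDRESS, EFF_DATE, END_DATE, CURRENT_FLAG) " +
        "VALUES (?, ?, ?, NULL, 'Y')");
    insert.setString(1, customerId);
    insert.setString(2, newAddress);
    insert.setDate(3, loadDate);
    insert.executeUpdate();
    insert.close();
  }

  public static void main(String[] args) throws Exception {
    // Connection details are placeholders.
    Connection conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/DWH", "etl_user", "password");
    conn.setAutoCommit(false);
    applyChange(conn, "C1001", "100 Main St, Springfield",
                new Date(System.currentTimeMillis()));
    conn.commit();
    conn.close();
  }
}
```

Keeping history rows with an expired END_DATE while flagging exactly one row as current is what distinguishes Type 2 from the overwrite-in-place behavior of Type 1.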

Environment: DataStage 7.5.2 (Enterprise Edition), Erwin 4.2, Oracle 9i, SQL Server, Windows NT, Business Objects, AutoSys, Perl scripting, XML, HTML, Teradata, Web services, UNIX shell scripts.

ETL Developer

Confidential, MI

Responsibilities:

  • Gathered requirements, analyzed source data, and identified business rules for data migration to the data warehouse and data marts.
  • Automated ETL processes using DataStage Job Sequencer, shell scripting, and PL/SQL programming.
  • Performed administrative tasks such as setting up users and privileges, creating and maintaining DataStage projects, and timely project cleanup.
  • Tuned jobs using various configurations by understanding their complexity during volume testing before moving to production.
  • Developed parallel jobs in the Parallel Extender framework and implemented Type II slowly changing dimensions.
  • Extracted and transformed source data from DB2 and XML sources to load into data warehouses in Oracle and Teradata.
  • Performed data extractions and loads using Teradata database utilities such as MultiLoad, FastLoad, and FastExport for a large warehouse and for data marts in Teradata.
  • Created source and target databases to load metadata for profiling the data and generated analysis reports per client requirements.
  • Worked on data profiling to perform column analysis, table analysis, and primary key analysis.
  • Created custom reports for column and table analysis, adding notes and comments on the analysis results so business users could develop the business rules to be implemented in the ETL process.
  • Generated source-to-target specification reports to assist the data modelers.
  • Processed XML documents using XML stages.
  • Wrote AutoSys JIL scripts to activate the UNIX scripts in production, i.e., JILs for boxes and commands.
  • Optimized queries using hints, EXPLAIN PLAN, etc.
  • Wrote job control routines and transform functions while designing jobs.
  • Maintained backups of ETL code and shell scripts using PVCS Tracker and PVCS Version Manager.

Environment: DataStage 7.0, Oracle 9.x, DB2, DB2 UDB, COBOL, JCL, UNIX-AIX, Teradata, Windows 2000/XP, Oracle Designer, SQL Server, AutoSys.

Java Developer

Confidential

Responsibilities:

  • Involved in the coding of User Interface Screens using JSPs.
  • Performed unit testing of the code I developed and was also involved in system testing and integration testing of the application.
  • Interacted with end users to fix and close issues (bugs).
  • Performed module-specific database creation and query tuning (a minimal sketch follows this list).
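
A minimal sketch of the servlet/JSP + JDBC pattern used in this role. The OrderLookupServlet class, the ORDERS table, and the connection string are illustrative assumptions; the point is the bound PreparedStatement, which keeps query plans reusable and avoids SQL injection.

```java
// Sketch: a servlet looks up a user's orders with a bound PreparedStatement
// and forwards the result to a JSP for rendering. All names are illustrative.
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class OrderLookupServlet extends HttpServlet {

  protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    String userId = request.getParameter("userId");
    List<String> orders = new ArrayList<String>();
    try {
      // Connection details are placeholders; a container DataSource would normally be used.
      Connection conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@//dbhost:1521/APPDB", "app_user", "password");
      PreparedStatement ps = conn.prepareStatement(
          "SELECT ORDER_ID, STATUS FROM ORDERS WHERE USER_ID = ? ORDER BY ORDER_DATE DESC");
      ps.setString(1, userId);                       // bound parameter, no string concatenation
      ResultSet rs = ps.executeQuery();
      while (rs.next()) {
        orders.add(rs.getString("ORDER_ID") + " - " + rs.getString("STATUS"));
      }
      rs.close();
      ps.close();
      conn.close();
    } catch (Exception e) {
      throw new ServletException("Order lookup failed", e);
    }
    request.setAttribute("orders", orders);
    request.getRequestDispatcher("/orders.jsp").forward(request, response);
  }
}
```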

Environment: Java, Core Java, J2EE, Servlets, JSP, JavaBeans, JDBC, Swing, JavaScript, HTML, CSS, XML, XSLT, Eclipse IDE, Tomcat, JUnit, WebLogic, Oracle, SQL, PL/SQL, CVS, Log4j, ANT.
