Hadoop Developer Resume Profile
TX
SUMMARY:
- IT professional with over 8 years of experience in system analysis, design, and development, and extensive experience in IBM DataStage v8.7/8.1/8.0/7.x using components such as Administrator, Manager, Designer, and Director.
- Over seven years in data warehousing, data integration, and data migration using IBM WebSphere DataStage, Oracle, PL/SQL, DB2 UDB, SQL Server 2000/2005, SQL procedural language, and shell scripts.
- Around seven years of experience with ETL methodologies across all phases of the data warehousing life cycle.
- Experience in data ingestion into HDFS using the Hadoop ecosystem tools Sqoop and Flume, and in data transformation/analysis using Pig and Hive.
- In-depth knowledge of data warehousing and business intelligence concepts with emphasis on ETL and full life cycle development, including requirement analysis, design, development, testing, and implementation.
- Expertise in all phases of the system development life cycle (SDLC) using methodologies such as Agile and Waterfall.
- Good grasp of data warehousing fundamentals and proven ability to implement them; conversant with ETL processes.
- Worked with SQL, SQL*Plus, Oracle PL/SQL, stored procedures, table partitions, triggers, SQL queries, PL/SQL packages, and loading data into data warehouses/data marts.
- Excellent experience with most major RDBMSs, including Oracle 10g/9i/8.x, SQL Server 7.0/6.5, and DB2 8.1/9.0.
- Extensively used DataStage Designer to design and develop server and PX jobs to migrate data from transactional systems (Sybase, DB2 UDB) into the data warehouse.
- Extensively used DataStage Manager to export/import DataStage job components and import plug-in table definitions from DB2 UDB, Oracle, and Sybase databases.
- Designed server jobs, job sequencers, batch jobs, and parallel jobs; handled multiple pieces of a project.
- Experience in writing UNIX shell scripts for purposes such as file validation, automation of ETL processes, and job scheduling using crontab.
- Designed parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Modify, Aggregator, CFF, Transformer, XML, and MQ plug-in stages.
- Good experience in extraction, transformation, and loading (ETL) processes using the DataStage ETL tool, Parallel Extender, MetaStage, QualityStage, and ProfileStage.
- Developed server jobs using stages such as Sequential File, ODBC, Hashed File, Aggregator, Transformer, Sort, Link Partitioner, and Link Collector.
- Experience in integrating various data sources such as Oracle, Teradata, DB2, SQL Server, MS Access, and flat files into the staging area; extensively worked with materialized views and TOAD.
- Proven track record in troubleshooting DataStage jobs and addressing production issues such as performance tuning and enhancement.
- Excellent knowledge of studying data dependencies using metadata stored in the repository and preparing batches for existing sessions to facilitate scheduling of multiple sessions.
- Excellent analytical, problem-solving and communication skills
Technical Skills:
| Languages | SQL, PL/SQL, C, C++, VB, XML, Java, J2EE, DOS, COBOL, UNIX/Korn shell scripting, Perl scripting, Python |
| Big Data / Hadoop | Hadoop, HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Hortonworks, Cloudera |
| Operating Systems | Sun Solaris, IBM AIX 5.3/5.2/4.2, MS DOS 6.22, Win 2000, Win NT 4.0, Win XP |
| Databases and Tools | Oracle 10g/9i/8i/8.0/7.0, DB2 UDB 7.2/8.1/9.0, Mainframe, Teradata V2R6/13, MS SQL Server 2005/2008 |
| Web Technologies | HTML, JavaScript |
| ETL Tools | IBM DataStage 8.7/8.1/8.0/7.5.3/7.5.2/7.5.1 (Designer, Director, Manager, Administrator), Parallel Extender, Server Edition, MVS Edition, QualityStage, ETL, OLAP, OLTP, SQL*Plus, Business Glossary, FastTrack, Information Analyzer, Metadata Workbench |
| Data Modeling Tools | Erwin |
| Scheduling Tools | Tivoli, Zena, Autosys, Control-M |
PROFESSIONAL EXPERIENCE:
Sr. ETL Developer/Hadoop Developer
Academy Sports and Outdoors is one of the leading retail stores of its kind. Worked on two different projects here: one to build a data warehouse for the ECOM project using IBM DataStage, the other to build an ODS layer in Hadoop for different internal applications to access the data.
Responsibilities:
- Created a process to pull the data from existing applications and land it on Hadoop.
- Used Sqoop to pull the data from source databases such as the Oracle RMS database and the DB2 ECOM database.
- Created Hive tables on top of the data extracted from the source systems.
- Partitioned the Hive tables depending on the load type.
- Created Hive tables presenting the current snapshot of the source data.
- Created and troubleshot MapReduce programs.
- Created Pig scripts to cleanse and transform the data.
- Good understanding of and experience using tools like Sqoop and Flume.
- Wrote shell scripts for the data ingestion and data cleansing process (see the ingestion sketch after this list).
- Created the DataStage jobs to load data from the ECOM database into the ODS and on to the business intelligence layer.
- Developed a generic-job solution in DataStage to load 300 source tables into the current ODS layer (Netezza).
- Created reusable components in DataStage to pull data from different source systems into the ODS.
- Developed the jobs to load 29 dimension and 10 fact tables related to ECOM into the business intelligence layer.
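A minimal sketch of this Sqoop-to-Hive ingestion pattern is shown below; all host, schema, table, and path names are hypothetical placeholders, not the actual project objects.

```bash
#!/bin/sh
# Hypothetical example: land one Oracle RMS table on HDFS with Sqoop and
# expose the load as a partition of an external Hive table.
LOAD_DT=$(date +%Y-%m-%d)

# Import the source table as comma-delimited text into a dated directory.
sqoop import \
  --connect jdbc:oracle:thin:@rmsdb01:1521:RMS \
  --username etl_user --password-file /user/etl/.rms_pwd \
  --table ITEM_MASTER \
  --target-dir /data/ods/item_master/load_dt=${LOAD_DT} \
  --num-mappers 4

# Register the new directory as a partition of the external table.
hive -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS ods.item_master (
    item_id BIGINT, item_desc STRING, dept_no INT)
  PARTITIONED BY (load_dt STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/ods/item_master';
  ALTER TABLE ods.item_master ADD IF NOT EXISTS
    PARTITION (load_dt='${LOAD_DT}');"
```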
Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), Cloudera CDH5, Pig, Impala, Oozie, IBM InfoSphere DataStage 8.7, UC4, shell scripts, Windows XP, UNIX, Netezza, Oracle, SQL Server 2008, PL/SQL.
Confidential
Hadoop Developer
Cardinal Health provides services such as logistics, specialty solutions, pharmacy solutions, and supply chain management to its health care clients. The objective of the Hadoop data analytics project was to bring the source data from different applications such as Teradata, DB2, SQL Server, SAP HANA, and some flat files onto the Hadoop layer for the business to analyze.
Responsibilities:
- Created a process to pull the data from existing applications and land the data on Hadoop.
- Worked in an agile environment; involved in sprint planning, grooming, and daily standup meetings.
- Responsible for meeting with application owners to define and plan the Sqoop extracts from the source systems.
- Used Sqoop to pull the data from source databases such as Teradata, DB2, and MS SQL Server.
- Created Hive tables on top of the data extracted from the source systems.
- Created Hive and Pig UDFs in Java for data transformations and date conversions.
- Partitioned the Hive tables depending on the load type.
- Worked with Avro and SequenceFile formats.
- Created MapReduce programs for data transformations.
- Responsible for creating Pig scripts for data transformations.
- Responsible for creating Datameer links for data visualization.
- Assisted the business in validating and analyzing the data.
- Created shell wrapper scripts for the Sqoop, Hive, and MapReduce jobs (a wrapper sketch follows this list).
- Deployed and scheduled the tested Sqoop, Hive, and Datameer jobs using Autosys.
- Experienced in managing and reviewing Hadoop log files.
- Created workflows using Oozie.
- Good understanding of Hadoop architecture and knowledge of the NoSQL databases Cassandra and HBase.
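Such wrapper scripts follow a common pattern: log every step and surface a non-zero exit code so the scheduled Autosys job fails visibly. A minimal sketch, with hypothetical job and path names (the saved Sqoop job and HQL file are assumed to exist):

```bash
#!/bin/sh
# Hypothetical wrapper: run a saved Sqoop job and the dependent Hive load,
# logging each step; any failure aborts with that step's return code.
LOG=/var/log/etl/ingest_$(date +%Y%m%d_%H%M%S).log

run_step() {
  STEP=$1; shift
  echo "$(date '+%F %T') starting ${STEP}" >> "${LOG}"
  "$@" >> "${LOG}" 2>&1
  RC=$?
  if [ ${RC} -ne 0 ]; then
    echo "$(date '+%F %T') ${STEP} FAILED rc=${RC}" >> "${LOG}"
    exit ${RC}       # non-zero exit lets Autosys mark the job as failed
  fi
  echo "$(date '+%F %T') ${STEP} finished" >> "${LOG}"
}

run_step sqoop_extract sqoop job --exec td_orders_daily
run_step hive_load     hive -f /opt/etl/hql/load_orders.hql
```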
Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), Pig, Datameer, UNIX, shell scripting, Teradata, DB2, MySQL, Autosys, Oozie.
Confidential
Sr. Data Stage Consultant / Hadoop Developer
- Teacher's Insurance is a nonprofit organization that is the leading retirement provider for people who work in the academic, research, medical, and cultural fields. Worked on multiple projects. One of them, Daily Client Transactions, extracts the daily client financial transaction data from Omni Plus that was processed in the prior business night's batch cycle; the EDW daily extract file created is transmitted from DataStage to a NetApp server via WMQFTE.
- Fee Disclosure: this project sends either an email or a hard-copy statement to all participants about the service fee charged to the participant. Worked as the ETL technical lead for this project, participated in scrum meetings, and successfully implemented it using agile methodology.
- EDW Optimization: worked on tuning the existing ETL jobs to improve performance and reduce run time.
Confidential
Hadoop experience:
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in managing and reviewing Hadoop log files.
- Created components in Hive/Pig for converting fixed-length ASCII files to Hive tables; loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Understanding of cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system to HDFS.
- Good understanding of the installation and configuration of Hive, as well as Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows (sketched below).
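The shell action behind such an Oozie workflow could look like the sketch below; the host, table, and path names are hypothetical, and the FTP credentials are assumed to live in a .netrc file rather than in the script.

```bash
#!/bin/sh
# Hypothetical Oozie shell action: fetch the daily extract from the FTP
# server, stage it in HDFS, and load it into the Hive staging table.
FILE=clients_$(date +%Y%m%d).dat

# Pull the file from the FTP server (-n reads credentials from ~/.netrc).
curl -s -n "ftp://ftphost.example.com/outbound/${FILE}" -o "/tmp/${FILE}"

# Move the file from the local UNIX file system into HDFS.
hdfs dfs -put -f "/tmp/${FILE}" "/landing/clients/${FILE}"

# LOAD DATA INPATH moves the staged HDFS file into the table's location.
hive -e "LOAD DATA INPATH '/landing/clients/${FILE}' INTO TABLE staging.clients;"
```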
ETL Responsibilities:
- Involved in requirement gathering, analysis and study of existing systems.
- Involved in preparing technical designs/specifications for data extraction, transformation, and loading.
- Led a team of four developers, participated in daily scrum meetings, and created ETL solutions for complex business requirements.
- Wrote stored procedures, functions, and packages to modify and load the data and to create extracts.
- Wrote Teradata MultiLoad, FastLoad, FastExport, and BTEQ scripts for loading, modifying, and exporting data (a BTEQ sketch follows this list).
- Extensively used DataStage Designer to develop jobs to extract, cleanse, transform, and integrate data and create extract files as needed.
- Developed complex Teradata SQL involving many tables and calculating the necessary summary values.
- Used Information Analyzer for column analysis and wrote data rules for quality checks.
- Used Business Glossary and FastTrack for ETL mapping and to link business terms with technical terms and solutions.
- Also used Metadata Workbench for impact analysis of the existing data model.
- Also worked as a part-time admin: involved in DataStage configuration, creating ODBC connections, assigning roles to users, monitoring the system, and killing processes when needed.
- Performed general cleanup and maintenance of the DataStage server.
- Developed a generic shell script to run WMQFTE and initiate file transfers between two servers.
- Scheduled the jobs using the Autosys scheduler, which triggers the ETL jobs and invokes the WMQFTE shell scripts to initiate file transfers between the two servers.
- Involved in writing JIL scripts to create Autosys jobs that trigger the ETL jobs and shell scripts.
- Created technical specification documents for the DataStage jobs; developed several test plans, and error logs/audit trails were maintained.
- Implemented performance-tuning techniques at various stages of the ETL process.
- Followed up the deployment process of DataStage code migration across the development, test, and production environments with the admin team.
- Coordinated with client managers, business architects, and data architects for various sign-offs on data models, ETL design docs, testing docs, migrations, and end-user review specs.
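A BTEQ export of the kind referenced above is typically driven from a shell wrapper via a here-document. A minimal sketch, with hypothetical logon, table, and file names (the real scripts would also add error-code checks):

```bash
#!/bin/sh
# Hypothetical BTEQ export: write yesterday's client transactions to a
# pipe-delimited extract file.
bteq <<'EOF'
.LOGON tdprod/etl_user,password;
.EXPORT REPORT FILE=/data/extracts/daily_txn.txt;
SELECT TRIM(txn_id) || '|' || TRIM(plan_id)
FROM   edw.daily_client_txn
WHERE  txn_dt = CURRENT_DATE - 1;
.EXPORT RESET;
.LOGOFF;
.QUIT;
EOF
```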
Admin Responsibilities:
- Performed system level and application level tuning and other Datastage administration activities. Adhered to SLA and resolved the tickets on time.
- Supported the application development teams with DataStage needs and guidance (see the dsjob sketch after this list).
- Involved in upgrades and hot fixes for the new releases.
- Supported, configured, installed, and upgraded IBM Information Server environments, versions 8.1/8.7.
- Created new database connections and projects, set parameters at various levels (job, project, environment), and set up the DataStage configuration.
- Provided support for user creation, environment variable creation, and UNIX box folder creation.
- Created projects using the Administrator client.
- Set up auto-purging of logs in the Administrator at the project level.
- Cleared and purged the dataset control files on a regular basis from the IBM server directory.
- Allocated privileges to users using the server console.
- Used the DataStage import and export options to migrate jobs from one environment to another.
- Managed resource and scratch disks.
- Worked on DataStage client installation.
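Much of this support work revolves around the dsjob command-line client. A minimal sketch of running and checking a job from the server side, with hypothetical project and job names (it assumes the DataStage environment, e.g. dsenv, is already sourced):

```bash
#!/bin/sh
# Hypothetical support task: run a DataStage job via dsjob and pull a log
# summary for the ticket.
PROJECT=DSTG_PROD
JOB=seq_Load_Claims

# -jobstatus waits for completion and sets the exit code from the job's
# finishing status; -param passes a job parameter.
dsjob -run -jobstatus -param LoadDate=$(date +%Y-%m-%d) ${PROJECT} ${JOB}
RC=$?

# Summarize the job log entries for review.
dsjob -logsum ${PROJECT} ${JOB}
exit ${RC}
```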
Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), Hadoop distributions from Hortonworks, Cloudera, and MapR, IBM InfoSphere DataStage 8.1/8.7, OBIEE, Teradata 13.0, Autosys 4.5.1, shell scripts, Windows XP, UNIX, Teradata SQL Assistant, Oracle, SQL Server 2008, PL/SQL, BTEQ scripts.
Confidential
Sr. Datastage Consultant
Network Appliance is a manufacturer of network and data storage systems; this project automates the professional services NetApp provides to its clients. Data is loaded from ERP systems into the respective dimensions and facts, and OBIEE is used as the reporting tool to generate reports.
Responsibilities:
- Interacted with business users and technical architects to analyze the data, gathering the requirements from various sources.
- Developed jobs in DataStage Designer to extract data from different operational sources (flat files, CSV files, delimited files, the SFDC plug-in stage), performed business operations on the data such as cleansing and transforming, and loaded initial/incremental data into the target DWH.
- Involved in daily meetings with the client on requirements and provided services to meet the required SLAs.
- Created design documents and unit test cases with test results documents.
- Used job sequencers to run the jobs sequentially and report the status of the job through email.
- Exported and imported the DataStage jobs between the production and development servers.
- Developed jobs to load the data into the warehouse environment using slowly changing dimension techniques.
- Checked the output against the specifications during unit testing.
- Provided support for the built applications and developed new dimensions while enhancing the application.
- Loaded the data into staging and moved the data from staging to the ODS.
- Involved in business requirement gathering sessions.
- Designed the ETL jobs as per the client's requirements.
- Involved in unit, integration, system, and performance testing; involved in performance tuning at the source, target, job, and system levels; and tested jobs with the unit test plan.
- Integrated the data from the source system into the target database.
- Involved in SIT (system integration testing) and UAT (user acceptance testing).
- Supported the data reconciliation team during the data recon process.
- Extensively used SQL coding to override the generated SQL in DataStage and also tested the data loaded into the database.
- Used the Autosys scheduler to schedule the jobs.
- Supported the application after the move into production and troubleshot the issues faced in production support.
Environment: IBM InfoSphere DataStage 8.1, OBIEE, Salesforce.com, Autosys, Oracle 10g, Windows 2000/NT, and UNIX
Confidential
Sr.ETL Developer
Confidential is a medical equipment manufacturing company with 7 different manufacturing divisions and a few sales divisions. This project involved integrating and processing the data coming from the different divisions/source systems related to revenue, capital expenditure, and work force; loading the processed data into the Hyperion Planning application, for which Essbase is the backend database; and loading the planned data from the Essbase cubes back to the different source systems.
Responsibilities:
- Created the technical design documents from the functional specifications.
- Worked with the project lead, technical lead, and functional analysts to understand the functional requirements, and designed the technical specifications from them.
- Worked with complex flat files in the process of extracting data.
- Developed parallel jobs involving the SAP BW Open Hub Extract, Essbase plug-in, and Oracle Enterprise stages.
- Worked as the onsite coordinator and managed a team of four offsite.
- Worked in an environment involving different source/target systems: SAP BW, Essbase, Oracle, MS SQL, and flat files.
- Worked on data migration from legacy Essbase to Essbase V10.
- Reviewed the jobs developed by the offshore team and assisted them during the daily calls.
- Developed the jobs and documented them following the SJM ETL standards.
- Involved in the creation of UTPs, the code review checklist, and process flow documents.
- Responsible for building the metadata hierarchy to load into the MDM; DRM is the MDM tool used in this project.
- Created the jobs to update the metadata in the cubes using the incremental files.
- Implemented complex logic in the Transformer stage, such as date validation and the use of stage variables.
- Unit tested the individual jobs and integration tested the extract-transform-load jobs in sequence.
- Extensively used the IBM Information Server Director for scheduling jobs to run in batch, emailing, and online production support troubleshooting from the log files.
- Used job sequencers to run the jobs sequentially and report the status of the job through email.
- Exported and imported the DataStage jobs between the production and development servers.
- Involved in SIT (system integration testing) and UAT (user acceptance testing).
- Supported the data reconciliation team during the data recon process.
- As a DataStage developer, created parallel, server, and sequence jobs.
- Used the client components: Designer, Manager, Director, and Administrator.
- Developed jobs using stages such as Sequential File, Dataset, Oracle Connector, ODBC, Lookup, Join, Aggregator, Pivot, External Source, BW Open Hub Extractor, Essbase Connector, Link Partitioner/Collector, Column Generator, Copy, Transformer, Sort, Remove Duplicates, Funnel, and FTP.
- Also used UNIX scripts for moving files, scheduling jobs, and removing NULs from the flat files (see the sketch after this list).
- Used AutoSys to schedule the load process and worked closely with the operations team.
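One such file-handling script is sketched below; the directory layout and file names are hypothetical. Stripping embedded NUL characters keeps the flat files readable for the downstream Sequential File stages.

```bash
#!/bin/sh
# Hypothetical pre-load cleanup: strip NULs from an incoming flat file and
# archive the raw original before the DataStage load picks the file up.
SRC=/data/inbound/planning_extract.dat
CLEAN=/data/ready/planning_extract.dat

# Remove embedded NUL characters that break the sequential-file read.
tr -d '\000' < "${SRC}" > "${CLEAN}"

# Keep the raw file, date-stamped, in case the load has to be rerun.
mv "${SRC}" "/data/archive/planning_extract_$(date +%Y%m%d).dat"
```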
Environment: DataStage 8.1, Oracle 10g, MS SQL Server 2005, SAP BW, Autosys, EssBase Visual Explorer 11, SQL/PLSQL, FTP client, Linux, Windows XP
Confidential
DataStage Developer
Responsibilities:
- Responsible for Business Analysis and Requirements Collection.
- Worked as an onsite coordinator for efficient development of the project.
- As the onsite coordinator, was also responsible for requirement analysis, design, coding, and testing of the jobs.
- Responsible for working with the offshore coordination team for further development and maintenance.
- Extensively worked with DataStage (Manager, Designer, Director, and Administrator) to load data from flat files and legacy data into the target Oracle database.
- Used DataStage to subject the data to multiple stages, thereby transforming it, and prepared documentation. Used DataStage Manager to define table definitions, custom routines, and custom transformations.
- Extensively worked with DataStage Designer to pull data from flat files and Oracle, and also worked with Information Analyzer.
- Used Integrity and Parallel Extender for data cleansing and performance improvement.
- Extensively worked with the DataStage Job Sequencer to schedule jobs to run in sequence.
- Used shell scripts in the development of test cases and automation (a sketch follows this list).
- Used the debugger to test the data flow and fix jobs.
- Performed data loading with multiple, parallel ETL processes.
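A typical automated test case reconciles row counts between the source file and the loaded target. A minimal sketch, with hypothetical file, table, and connection names (the connect string is assumed to be kept in a protected file):

```bash
#!/bin/sh
# Hypothetical test case: compare the source flat-file record count with
# the row count loaded into the Oracle staging table.
SRC_CNT=$(wc -l < /data/src/customers.dat)

ORA_CONN=$(cat /opt/etl/.ora_conn)   # user/password@tns, stored outside the script
TGT_CNT=$(sqlplus -s "${ORA_CONN}" <<'EOF'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT COUNT(*) FROM stg_customers;
EXIT;
EOF
)

# Strip whitespace from the SQL*Plus output before comparing.
TGT_CNT=$(echo ${TGT_CNT} | tr -d ' ')
if [ "${SRC_CNT}" -eq "${TGT_CNT}" ]; then
  echo "PASS: counts match (${SRC_CNT})"
else
  echo "FAIL: source=${SRC_CNT} target=${TGT_CNT}"
  exit 1
fi
```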
Environment: Ascential DataStage 7.5 (DataStage Manager, DataStage Administrator, DataStage Designer, DataStage Director, Parallel Extender, Integrity, and MetaRecon), ETL, IBM Cognos, PL/SQL, Oracle 9i.
Confidential
DataStage Developer / QualityStage
Responsibilities:
- Designed jobs that populated the tables using the Column Generator, Surrogate Key, Join, and Oracle Enterprise stages.
- Created shared containers so they could be reused by other modules of the plan.
- Exported the jobs from development to the testing environment and then to production.
- Worked closely with the project lead/manager, architects, data modelers, and system analysts to understand the business process and functional requirements.
- Performed unit testing of the developed jobs before taking them to UAT and finally production.
- Involved in writing shell scripts for reading parameters from files, invoking DataStage jobs, and FTPing files to specific locations (see the sketch after this list).
- Worked on DB2 databases for importing and exporting data; performed performance tuning of the jobs.
- Also worked on data migration from DB2 to Teradata.
- Used stages such as Change Capture, Modify, Lookup, Slowly Changing Dimension, Remove Duplicates, Filter, and Sort to modify already-developed jobs, improving performance and meeting the requirements.
- Followed agile methodology to produce the results.
- Developed DataStage job sequences using the User Variables Activity, Job Activity, and Wait For File stages.
- Built a block called Eligibility, which involved reading data from complex flat files and loading it into the database.
- Created and verified QualityStage jobs for matching and unduplication.
- Used QualityStage's built-in wizards for parsing and removing duplicate records.
- Used QualityStage jobs to validate the quality of the output data generated.
- Extensively used the IBM Information Server Designer to develop jobs to extract, transform, and load data from various source systems into the data warehouse.
- Extensively used the IBM Information Server Director for scheduling jobs to run in batch, emailing, and online production support troubleshooting from the log files.
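Such a parameter-driven launcher could look like the sketch below; the config file, parameter names, project/job names, and FTP host are all hypothetical, and in practice the FTP login would be supplied from a secured source rather than hard-coded.

```bash
#!/bin/sh
# Hypothetical launcher: read job parameters from a config file, run the
# DataStage job via dsjob, then FTP the resulting extract onward.
. /opt/etl/cfg/plan_load.params      # defines PROJECT, JOB, RUN_DT, OUT_FILE

# -jobstatus makes the exit code reflect the job's finishing status.
dsjob -run -jobstatus -param RunDate=${RUN_DT} ${PROJECT} ${JOB} || exit 1

# Push the extract to the downstream server (-n to supply the login here).
ftp -n dropbox.example.com <<EOF
user etlftp secret
put ${OUT_FILE}
bye
EOF
```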
Environment: IBM Information Server 7.5, Oracle 10g, SQL/PLSQL, shell scripts, QualityStage (Integrity), UML, FTP client, DB2, Teradata V2R6, IBM AIX 5.2/5.3, Control-M, Windows XP
