Hadoop Developer Resume
Newark, Delaware
SUMMARY:
- 8 years of Total IT Experience.
- 2 years of experience with Big Data - Hadoop Stack.
- 8 years of experience in Unix Shell and Perl scripting.
- 8 years of Datastage Experience.
- 1 year of Informatica PowerCenter Experience.
- 7 years of experience in DB2 LUW.
- 1 year of experience in DB2 BLU Acceleration.
- 3 years of experience in HP Vertica.
- 2 years of experience in Teradata.
SUMMARY OF EXPERIENCE:
- IT: 8 years of experience in the field of Information Technology with good experience in Data Analysis, Design and Development. Experience in Data warehousing and application development across various verticals.
- Big Data: 2 years of experience with the Hadoop stack - Hive, Impala, Sqoop and Spark. Worked on applications built on HDFS, Hive and Spark.
- ETL: 8 years of extensive experience with Data Warehousing using IBM DataStage 9.0.1/8.5/8.1 and Informatica PowerCenter 8.x.
- Scripting: Shell, Perl and Python. Extensively wrote shell scripts and Perl programs to automate various ETL processes and to schedule ETL jobs.
- Database: 7 years of experience in DB2 9.x, 1 year in DB2 BLU and 3 years in Vertica. Expertise in SQL.
- Data Modeling: Dimensional Data Modeling, Star Schema and Snowflake Schema Modeling, Fact and Dimension tables. Proficient at Enterprise Database Integration and Management with DB2 and Vertica.
- Teradata: 2 years of experience with Teradata utilities such as BTEQ, FASTLOAD and MULTILOAD to load input files from the operational database into the analytical repository. Experience with fine-tuning SQL to reduce cost and writing efficient queries.
- Vertica: Expertise in creating projections and buddy projections, and in using vsql options for exporting and copying data (see the sketch after this list). Knowledge of Vertica's columnar architecture, including the WOS and ROS concepts. Experience debugging and fine-tuning vsql queries for performance improvement.
- Scheduling tools: Autosys and Tivoli Workload Schedulers.
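A minimal sketch of the vsql load and export pattern referenced in the Vertica bullet above; the host, database, table and file names are hypothetical placeholders, not details from the original projects.

    #!/bin/sh
    # Hypothetical Vertica load/export wrapper; connection details, table and
    # file names are placeholders. Credentials are assumed to be supplied
    # through vsql's usual password handling.
    VSQL="vsql -h vertica-host -U etl_user -d analytics"

    # Bulk-load a delimited feed file into a staging table. DIRECT writes
    # straight to ROS instead of staging through WOS; ABORT ON ERROR stops
    # the load on the first rejected row.
    $VSQL -c "COPY stg.customer_feed FROM LOCAL '/data/in/customer_feed.dat'
              DELIMITER '|' NULL '' DIRECT ABORT ON ERROR;"

    # Export query results as a pipe-delimited flat file for downstream use.
    $VSQL -At -F '|' -c "SELECT cust_id, cust_nm, open_dt FROM dw.customer;" \
          > /data/out/customer_extract.dat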
TECHNICAL SKILLS:
Operating Systems: AIX 6.1, Windows, Linux
Scripting Languages: Unix Shell scripting, Perl and Python
Databases: DB2 9.7, DB2 BLU, HP Vertica
Version Control: SCCS, SVN Tortoise
ETL Tools: DataStage 8.5, 8.7, 9.1
Scheduler: Maestro (with TWS web admin) and Autosys
DOMAIN: Banking and Financial Services
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer
Technologies: Hive, HQL, Impala, Sqoop, Shell, MapReduce, Spark
Responsibilities:
- Created HLD and LLD documents for reference as per the requirements provided on a per-project basis.
- Developed the HQLs required for data processing and data extraction from the sources, based on the requirements.
- Worked on application development to process incoming files and load Parquet tables with compression techniques applied.
- Used Hive analytical functions to perform different analytical activities.
- Optimized HQLs for application performance.
- Captured the performance of queries and re-wrote utilities where needed.
- Debugged runtime issues and resolved them during the development phase.
- Uncovered limitations of the Hadoop cluster and found workarounds to run the application flawlessly.
- Implemented Partitioning, Dynamic Partitions, Bucketing in Hive.
- Used the Sqoop utility to extract data from different sources such as RDBMS systems and Teradata.
- Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL (see the Hive sketch after this list).
- Used Pig to parse the data and store it in Avro format.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Loaded flat files into Parquet tables for wide use in other Hadoop components.
- Interacted with end users to improve/change the partitioning as per their query requirements.
- Implemented test scripts to perform the data quality checks and modified them according to the requirements.
- Data loading is a two-step process: data is initially processed into the stage layer and then moved to the permanent tables.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked on analysing and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Used Sqoop jobs to import data from RDBMS sources using incremental import (see the Sqoop sketch below).
- Customized Avro tools used in MapReduce, Pig and Hive for deserialization and to work with the Avro ingestion framework.
- Worked with different kinds of compression techniques, such as LZO and Snappy, to save storage and optimize data transfer over the network.
- Analysed large and critical datasets using Cloudera, HDFS, MapReduce, Hive, Hive UDFs, Pig and Spark.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
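The following is a minimal sketch of the Hive partitioning, bucketing and compressed-Parquet stage-to-perm pattern described in the bullets above; the database, table and column names are hypothetical placeholders.

    #!/bin/sh
    # Hypothetical stage-to-perm load illustrating dynamic partitioning,
    # bucketing and Snappy-compressed Parquet storage; all object names are
    # placeholders.
    hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;
    SET parquet.compression=SNAPPY;

    CREATE TABLE IF NOT EXISTS edw.txn_perm (
        txn_id   BIGINT,
        acct_id  BIGINT,
        txn_amt  DECIMAL(18,2)
    )
    PARTITIONED BY (txn_dt STRING)
    CLUSTERED BY (acct_id) INTO 32 BUCKETS
    STORED AS PARQUET;

    -- Two-step pattern: the stage table is loaded first, then moved to the
    -- perm table with the partition column taken from the data itself.
    INSERT OVERWRITE TABLE edw.txn_perm PARTITION (txn_dt)
    SELECT txn_id, acct_id, txn_amt, txn_dt
    FROM   stage.txn_stg;
    "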
Environment: Hadoop, HDFS, Hive, Impala, Sqoop, Spark, Python, Cloudera, Shell scripting, Autosys
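A minimal sketch of the Sqoop incremental-import job mentioned above; the JDBC URL, credentials file, table and directory names are hypothetical placeholders.

    #!/bin/sh
    # Hypothetical saved Sqoop job that appends only new rows on each run;
    # connection string, table and directory names are placeholders.
    sqoop job --create txn_incr_import -- import \
        --connect jdbc:db2://db2host:50000/SRCDB \
        --username etl_user \
        --password-file /user/etl/.db2_pwd \
        --table TXN_DETAIL \
        --target-dir /data/raw/txn_detail \
        --incremental append \
        --check-column TXN_ID \
        --last-value 0 \
        --num-mappers 4

    # Each scheduled execution imports only rows with TXN_ID greater than the
    # last value recorded by Sqoop after the previous run.
    sqoop job --exec txn_incr_import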
Confidential, Newark, Delaware
ETL Developer
Responsibilities:
- Gathering and documenting requirements, requirements analysis, converting requirements into High-Level Design Documents.
- Developed Parallel jobs to Extract, Transform and Load the data into the data warehouse.
- Developed DataStage parallel jobs using different stages such as Aggregator, Join, Merge, Lookup, Source Dataset, External Filter, Row Generator, Column Generator, Change Capture, Copy, Funnel, Sort and Peek.
- Design and develop ETL jobs using Datastage tool as needed for project development assignments based on requirements.
- Ensure accuracy and integrity of data and application through data analysis, coding and problem resolution.
- Design, develop and enforce best practices and standards around data quality and ETL solutions.
- Using Datastage Director for running and monitoring the jobs.
- Creating Autosys jobs to execute the DataStage job sequences/jobs (see the wrapper sketch after this list).
- Create Datastage sequencers to run the batch jobs.
- Develop parallel jobs with the different partitioning techniques like HASH, MODULUS, RANGE, SAME, ENTIRE, ROUND ROBIN.
- Develop parallel jobs using FTP, XML and pivot stages.
- Using the optimizer to improve parallel job performance.
- Working with Unix and Linux servers and job scheduling, as well as languages and tools such as C, shell script, AWK and sed.
- Using SVN Tortoise for version control.
- Using Debug stages for better understanding and performance improvement for the ETL code.
- Working with IBM InfoSphere Metadata Workbench for better data lineage.
- Develop parallel jobs to process different varieties of input data using CFF, SEQUENTIAL file stages.
- Develop jobs to read data from different data sources using DB2 Connector, ODBC Connector, Oracle and mainframe z/OS sources, and further load it into the data warehouse.
- Designing solutions for optimal performance and handling other non-functional aspects of availability, reliability and security of DATASTAGE ETL Platform.
- Developed and Documented technical architecture, system design, performance test results and other technical Aspects of the implementation.
- Expertise in designing and modelling Data Warehousing concepts in OLTP/OLAP systems. Analysed and developed database schemas such as Star Schema and Snowflake Schema for relational and dimensional modelling.
- Participated in all phases including Requirement Analysis, Client Interaction, Design, develop, Testing, Support and Documentation.
- Generate the Test Plan; specify the overview of the Testing Approach, Testing Strategy, Roles and Responsibilities, and Scope.
- Prepare test scripts and Test cases for Unit testing
- Perform UAT and engage Business Users for the Testing
- Work with LOB on UAT or Functional Testing Sign-off
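A minimal sketch of the kind of shell wrapper an Autosys job could invoke to run a DataStage sequence through the dsjob command line; the project, sequence and parameter names are hypothetical placeholders.

    #!/bin/sh
    # Hypothetical wrapper called by an Autosys job to run a DataStage
    # sequence; project, sequence and parameter names are placeholders.
    PROJECT=DW_PROJECT
    SEQUENCE=Seq_Load_Customer
    RUN_DATE=$(date +%Y-%m-%d)

    # -run starts the sequence; -jobstatus waits for completion and returns
    # the job status as the exit code, which Autosys evaluates.
    dsjob -run -jobstatus -param RUN_DATE="$RUN_DATE" "$PROJECT" "$SEQUENCE"
    RC=$?

    # Status 1 (finished OK) and 2 (finished with warnings) are treated as
    # success; anything else fails the Autosys job.
    if [ "$RC" -eq 1 ] || [ "$RC" -eq 2 ]; then
        exit 0
    fi
    echo "DataStage sequence $SEQUENCE failed with dsjob status $RC" >&2
    exit 1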
Environment: Datastage V9.1, DB2 9.7/11, Shell Scripting, Linux, Autosys, Perl.
Confidential, Newark, Delaware
Data Stage/Unix Developer
Responsibilities:
- Participated in all phases including Requirement Analysis, Client Interaction, Design, Coding, Testing, Support and Documentation.
- Gathering and documenting requirements, requirements analysis, and converting requirements into High-Level and Low-Level Design Documents based on project specification meetings and discussions with Confidential technology and functional architects.
- Develop processes for extracting, transforming, integrating and loading data from various sources into the Data Warehouse database using Datastage.
- Prepare test plan and test scripts.
- Review unit test results and SIT test results.
- Co-ordinating in the preparation of schedules, jobs and production documents for installs.
- Deployment in production and providing PIW support.
Environment: DB2, Vertica, UNIX, DataStage 8.7 (mainly with OSH scripts); wrapper scripts were built around the orchestrate scripts for data load activity (see the sketch below).
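A minimal sketch of such a wrapper, assuming the orchestrate script was executed through the osh command; the script, log and directory names are hypothetical placeholders.

    #!/bin/sh
    # Hypothetical wrapper around an orchestrate (osh) script for a data
    # load; all paths are placeholders, and the osh invocation below is an
    # assumption about how the engine was driven.
    OSH_SCRIPT=/etl/scripts/load_customer.osh
    LOG_FILE=/etl/logs/load_customer_$(date +%Y%m%d_%H%M%S).log

    echo "Starting data load: $OSH_SCRIPT" >> "$LOG_FILE"
    osh -f "$OSH_SCRIPT" >> "$LOG_FILE" 2>&1
    RC=$?

    if [ "$RC" -ne 0 ]; then
        echo "Data load failed with return code $RC" >> "$LOG_FILE"
        exit "$RC"
    fi
    echo "Data load completed successfully" >> "$LOG_FILE"
    exit 0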
Confidential
Data Stage Developer
Responsibilities:
- Develop HLD and LLD documents from Project Specifications meetings and discussion with Confidential technology associates.
- Coordinate with offshore team in developing the code components in line with the design documents created.
- Prepare test plan and test scripts.
- Review unit test results and SIT test results.
- Coordinating in the preparation of schedules, jobs and production documents for installs.
- Deployment in production and providing PIW support.
Environment: UNIX, Vertica, DataStage 8.7 (the transformations in the COBOL programs were converted to parallel jobs in DataStage).
Confidential
UNIX and Datastage Developer
Responsibilities:
- Analysis of requirements.
- Participated with onshore team for HLD and LLD confirmation for the alerts and attestation.
- Design, development and implementation of the DataStage framework statistics collection job. The XML Reader stage was used as the main stage; data was read from the XML files and inserted into framework tables.
- Design, development and implementation of Alerts back-end scripts.
- Design, development and implementation of attestation back-end scripts.
- Automated framework data migration across development/UAT/Production through export, verify and import methodology.
- Providing solutions to framework users for implementing DataStage jobs.
- Fixing code-related issues raised by the developers through an institutionalized procedure (using a Bank-owned ticketing tool).
- Fine-tuning queries hitting the framework database to optimize DB2 interaction timings (see the sketch after this list).
- Coordinating with Clients (Framework owners) and onshore coordinator for reviewing test results, Code components and enhancing functionalities.
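A minimal sketch of the kind of explain-based DB2 tuning workflow implied above, assuming the explain tables already exist in the framework database (for example, created from the EXPLAIN.DDL script shipped with DB2); the database, schema and column names are hypothetical placeholders.

    #!/bin/sh
    # Hypothetical DB2 tuning session for a framework query; database and
    # object names are placeholders.
    db2 connect to FWKDB

    # Capture the access plan for the query instead of executing it.
    db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
    db2 "SELECT job_name, run_dt, row_cnt
         FROM   fwk.job_stats
         WHERE  run_dt > CURRENT DATE - 7 DAYS"
    db2 "SET CURRENT EXPLAIN MODE NO"

    # Format the latest explain output to review join order and index usage.
    db2exfmt -d FWKDB -1 -o /tmp/job_stats_plan.txt

    db2 connect reset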
Environment: UNIX shell scripting, Datastage 8.5 and Maestro (TWS web admin)
Confidential
UNIX and Datastage Developer
Responsibilities:
- Develop HLD and LLD documents from Project Specifications meetings and discussion with Confidential technology associates.
- Preparing Estimations (for forecasting efforts based on the various components).
- Coordinate with onshore team in developing the code components in line with the design documents created.
- Developed the ETL processes using DataStage and shell scripts.
- Prepare test plan and test scripts.
- Review unit test results and SIT test results.
- Co-ordinating in preparation of schedules, jobs and production documents for installs.
- Deployment in production and providing PIW support.