DataStage Developer Resume
Riverwoods, IL
SUMMARY:
- 9+ years of experience as an ETL job designer with DataStage and Informatica and in the Hadoop ecosystem, covering ETL job design, data replication, defect analysis, maintenance, production support, and release management. In Hadoop, used the Sqoop, Hive, and Apache Pig utilities to extract and process data.
- Worked extensively with Hive, Sqoop, and Pig to streamline high-volume processing, and carried out performance tuning that delivered monetary benefit to the client.
- Developed API data extraction and validation in Python as a key feature.
- Used Sqoop as the extraction tool to reduce cost and scheduled it with crontab, moving data into Hive with zero investment and no third-party tools.
- Efficiently implemented large databases averaging 60 TB in size, and consistently followed coding standards to meet first-time-right goals.
- Expertise in ETL design and development, job performance tuning, and the development and implementation of large-scale enterprise data warehousing.
- Experience in preparing Big Data lakes in an ETL environment.
- Strong understanding of Relational Database Management Systems (RDBMS); proficient in SQL on Oracle and SQL Server database environments, with basic knowledge of Teradata. Used database utilities such as SQL Developer, SQL Server Management Studio, and Teradata SQL Assistant.
- Serve as Team Lead, coordinating with the offshore team.
- Management skills include team management, project planning, and monitoring; a self-starter with good communication skills and the ability to work independently and as part of a team.
- Good experience in writing, managing, and optimizing existing DataStage jobs on UNIX and Windows.
- Experience in installation, application failover handling, and administration of IBM DataStage 8.5.
- Hands-on experience with Big Data technologies such as HBase, Pig, and the Kognitio database.
TECHNICAL SKILLS:
ETL Tools: IBM DataStage, Informatica 9.6.1
RDBMS: Oracle, SQL Server, DB2, Netezza, Hive, HBase, Teradata.
Languages: Python, Shell Scripting, Basic Java
Big Data Technologies: Hadoop, Spark, MapReduce, Pig, Sqoop, Hive, Kafka, and HBase.
Operating Systems: Unix, Windows.
Reporting: Power BI.
PROFESSIONAL EXPERIENCE:
Confidential, Riverwoods IL
DataStage Developer
Environment: IBM InfoSphere DataStage 11.5, Oracle, Netezza, Snowflake, Hive, IIP, GCP, HBase, Spark, Python.
Responsibilities:
- Created jobs using DataStage Designer to load data from source to staging and from staging to the data warehouse.
- Designed and developed jobs to extract data from different sources, cleanse it, apply business rules and logic at the transformation stage, and load it into staging tables.
- Used Reject Link, Job Parameters, and Stage Variables in developing jobs.
- Involved in Business requirement analysis and design documentation preparation.
- Working on data migration to the AWS Cloud using Glue ETL, Lambda functions, and NiFi.
- Created generic API extract flows in Glue Python shell to pull data from different endpoints using multithreading.
- Created generic extract flows in Glue Spark to move data from the raw S3 folder to the landing S3 folder; the flow runs automatically once a NiFi job places a file in a specific S3 folder.
- Created end-to-end event scheduling using Lambda functions and CloudWatch Events.
- Created end-to-end ETL flows in the AWS Cloud environment for all dimension types, including Type 1, 2, and 3.
- Working with the Neon Lightbox product migration team, providing data to help the business with user migration before and after implementation.
- Used Chef for configuration management of hosted instances within GCP. Configured Virtual Private Cloud (VPC) networking.
- Migrated applications to the PKS, GCP cloud.
- Planning to move from VCLOUD to GCP.
- Used DataStage Director to run, schedule, monitor, and test the application in development, and to obtain performance statistics.
Confidential, Irving, TX
DataStage Developer
Environment: IBM InfoSphere DataStage 11.5, Oracle, Netezza, Snowflake, Spark, Python.
Responsibilities:
- Analyzed business requirements and prepared the high-level design (HLD).
- Working on a churn prediction algorithm to implement an auto machine program that delivers prepaid/postpaid churn reports to business users.
- Worked on the Proactive Assurance auto machine project and delivered it successfully.
- Extracted data from target tables using Sqoop, processed the high-volume data through business rules in Pig, and loaded it into Hive tables.
- Implemented performance tuning activities such as UDFs to reduce load, Python UDFs to turn complex functionality into simple functions, and reusable Pig scripts that run loads faster using the Tez and Pig combination.
- Prepared a data lake for HR dashboards from the Carrier API using Python and NiFi.
- Created a test plan to ensure data quality for the delivery.
- Working as a data scientist on customer analytics to identify issues and customer responses on social websites such as Facebook and Twitter.
- Leading the team for the DataStage migration project from DataStage 9.1 to DataStage 11.5.
- Prepared project implementation and rollback plans.
- Created UNIX scripts to archive source files and delete files from the source path. Created file validation scripts, repository scripts, Oracle scripts, and daily recon scripts.
Confidential
ETL Developer
Environment: IBM WebSphere DataStage 8.5, Unix, Teradata.
Responsibilities:
- Involved in the design of dimensional data models (star schema and snowflake schema).
- Generated DB scripts from the data modeling tool and created physical tables in the database.
- Worked with SCDs to populate Type I and Type II slowly changing dimension tables from several operational source files.
- Created some routines (Before-After, transform function) used across the project.
- Experienced in PX file stages that include Complex Flat File stage, DataSet stage, LookUp File Stage, Sequential file stage.
- Implemented Shared containers for multiple jobs and Local containers for the same job as per requirements.
- Strong knowledge and experience in mapping source data to target using IBM DataStage 8.x.
- Implemented multi-node declaration using configuration files (APT Config file) for performance enhancement.
- Experienced in developing parallel jobs using various Development/debug stages (Peek stage, Head & Tail Stage, Row generator stage, Column generator stage, Sample Stage) and processing stages (Aggregator, Change Capture, Change Apply, Filter, Sort & Merge, Funnel, Remove Duplicate Stage)
- Debugged, tested, and fixed the transformation logic applied in the parallel jobs.
- Created UNIX shell scripts for database connectivity and for executing queries during parallel job execution.
- Used DataStage Director to schedule and run jobs, test and debug their components, and monitor performance statistics.
- Experienced in using SQL*Loader and the import utility in TOAD to populate tables in the data warehouse.
Confidential
ETL Developer
Environment: IBM WebSphere DataStage 8.5, Unix, Teradata, and DB2.
Responsibilities:
- Analysis of Business Requirements and Impact Analysis of enhancements.
- Extensively used DataStage Designer to design and develop Server and PX jobs for migrating data from the database into the data warehouse.
- Extensively used Processing stages (Join, Funnel, Filter, Aggregator, Sort, Remove Duplicates, Copy, Transformer, Change Data Capture, and Lookup), Develop/Debug stages (Row Generator and Peek), and File Set stages (Dataset, Sequential File, DB2).
- Analyzed Pro*C scripts and prepared a high-level design document for converting them into DataStage jobs.
- Created new projects in Administrator, added user-defined variables at the project level, assigned specific roles to users, and applied configurations.
- Prepared project implementation and risk handling plans.
- Developed SQL queries to validate the data cleansed, transformed, and loaded using DataStage.
- Involved in the testing of the various jobs developed and maintaining the test log.
- Created UNIX Korn shell (ksh) scripts for pre-load and post-load auditing of source files.
Confidential
ETL DataStage Developer
Environment: IBM DataStage, Oracle.
Responsibilities:
- Interpreted the business logic provided in mappings and created queries based on it to extract data.
- Worked extensively with various Parallel job stages such as Oracle Connector, Teradata MLoad and Enterprise stages, Transformer, Join, Lookup, Merge, Sort, Filter, Aggregator, Copy, Remove Duplicates, Funnel, Change Capture, Data Set, Sequential File, and Sequence job stages.
- Separated loading jobs from extraction and transformation jobs, so that extraction and transformation jobs can be scheduled before the load window starts.
- Served as Build Master for the RDM track in R2.1 and released code into higher environments using TFS.
- Monitored production jobs and resolved the incident tickets raised.
- Developed the system testing strategy and helped set up the System Testing (ST) environment for DataStage and CDC.
- Member of the Sprint Team formed after the Release 2.1 go-live to resolve production issues at a faster pace.
Confidential
DataStage Developer
Environment: IBM WebSphere DataStage 7.5.1A, Windows XP, Teradata, SQL Server 2005.
Responsibilities:
- Studied the existing ISS system functionality and helped prepare the impact analysis document for the enhancements.
- Created DB stored procedures and UNIX scripts to load data into target tables.
- Extracted data from target tables using Sqoop, processed the high-volume data through business rules in Pig, and loaded it into Hive tables.
- Created a mini-MDM using HiveQL that runs in a Spark-compatible environment and can process 10+ million records in under 30 minutes.
- Created the high-level design document for the new implementations.
- Coordinated with offshore and onsite teams.
- Created UNIX Korn shell (ksh) scripts to archive source files and delete files from the source path.
- Handled heterogeneous combinations of databases and technologies so that the existing flow was not disturbed while data was moved to the Big Data platform for further analysis.
