Sr. ETL/Big Data Integration Lead Consultant Resume
SUMMARY
- A dynamic professional with 12+ years of experience in JAD, Agile, and Scrum/Sprint methodologies across the Software Development Life Cycle (SDLC), specializing in ETL integration with system analysis, design, development, and support of relational/NoSQL databases on data warehousing and data lake platforms.
- Built ETL strategies for highly scalable data warehouses/marts using IBM InfoSphere DataStage & QualityStage Enterprise editions (v7.5.x through v11.5.0.1) and IBM Big Integrate v11.5.0.2.
- Specialized in Slowly Changing Dimension (SCD) and Change Data Capture (CDC) data loads of Type-I, Type-II, and similar patterns. Tuned ETL performance with sorting, partitioning, job splitting, and APT configuration files to improve overall job throughput.
- Big Data enthusiast with 2+ years of experience in Hadoop data lake architecture, design, and development on Hortonworks and Cloudera platforms: Hadoop/HDFS, Hive, Pig, Spark, Kafka, NiFi, and Sqoop. Source data analysis with Apache Spark, Python, and Trifacta Data Wrangler.
- Well versed in conceptual, logical, physical, relational (E-R), and Star/Snowflake dimensional data modeling using ERwin Data Modeler 7.x.
- Database processes: SQL/PL-SQL, Oracle SQL*Loader, Teradata FastLoad/MultiLoad, and BTEQ scripting.
- SQL performance tuning. Oracle v8-v11g: explain plans, optimizer costs, hints, indexes, and statistics. IBM DB2 UDB 9.x: snapshots, buffer pool size, hit ratio, and parameters. Teradata 14.x: EXPLAIN, collecting statistics on join columns and unique/non-unique primary/secondary indexes. SQL Server: clustered/non-clustered indexes, statistics, partitioning, QUERYTRACEON trace flags with optimizer rules, and table/query-level hints.
- Experienced in data quality, data profiling, metadata management, flat-file management, data scrubbing, and batch integration.
- Worked in the Life Insurance, Healthcare, Financial, Pharmaceutical, Medical, and Telecom domains.
- Highly perceptive, with a proven ability to pinpoint problems and follow through to resolution in a timely, cost-effective manner. Excellent team player with strong analytical, communication, interpersonal, and coordination skills.
TECHNICAL SKILLS
- Data Analysis
- Data Lineage
- Mapping/Trackers
- OLTP/Data Warehouse
- ETL Integration: Design & Development (Data Combine, Sort, Aggregate, Transform)
- Databases (Oracle, Teradata, IBM DB2): SQL, PL/SQL, Performance Tuning
- Semi-Structured Data Processing (XML/JSON)
- Data Lake Design
- Hive
- Pig
- Spark
- Data Wrangling (Trifacta)
- Sqoop
- Python
- Kafka
- NoSQL
- Java
- C#.NET
- VB.NET
- Shell Script
PROFESSIONAL EXPERIENCE
Confidential
Sr. ETL/Big Data Integration Lead Consultant
Environment: IBM DataStage/QualityStage v11.5.0.1-v9.5.x, IBM DataStage Big Integrate v11.5.0.2, Toad 12.x, Hortonworks: Hadoop, HDFS, Ambari, Kafka, MirrorMaker, LogStash, Spark SQL, Hive, Pig, NiFi, SQL Workbench, Trifacta Data Wrangler, IBM AIX, Windows 7, IBM-ESP Scheduler, MS Visio 2013, MS Office 2016, SalesForce Workbench, SalesForce Data Loader, SOQL, MuleSoft ESB/Anypoint Studio, CA Agile Central-Rally, Atlassian Confluence/Jira.
Responsibilities:
- Performed detailed systems gap analysis on various source system data feeds to the legacy PCS CRM application and from NYL's ESDA data warehousing, TransferHub, and data lake platforms.
- Interacted effectively with business users, product managers, and the Salesforce application/functional team on new features and enhancements.
- Worked as Agile Scrum Master (CA Rally), running iterative sprint planning, translating requirements into business features and user stories, and tracking defects and backlogs.
- Prepared and maintained detailed data lineage mappings/trackers covering PCS sources (Policy (PDS), Investments, Client, Leads (CLT), Customer, Corporate & Agent Personal Copy legacy data, Users, Agents, Marketers) and Salesforce CRM Sales/Service objects.
- Performed data analysis over Hadoop HDFS using Trifacta Data Wrangler, SQL Workbench, Ambari, and Hive/Pig UI views.
- Designed and developed Type-I and Type-II slowly changing dimension (SCD) ETL with change data capture (CDC) for historical and incremental data loads from source-to-staging, staging-to-integration, and staging-to-TransferHub outbound into Salesforce objects such as Client/Account, Leads/Consumer, Policy (Life/Annuities/Other), Investments/Financial Account Assets, Asset Party Relationship, and User/Agents using DataStage v9.5.x/v11.5.x.
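A Type-II SCD load of the kind described above can be sketched in plain Python. This is a minimal illustration, not the actual DataStage job design; the record layout, hash-based change detection, and column names are assumptions.

```python
import hashlib
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # open-ended effective-to date for current rows

def row_hash(row, tracked_cols):
    """Hash the tracked attributes to detect changes cheaply."""
    joined = "|".join(str(row[c]) for c in tracked_cols)
    return hashlib.md5(joined.encode()).hexdigest()

def scd2_merge(dim_rows, incoming, key, tracked_cols, load_date):
    """Apply Type-II SCD logic: expire changed rows, insert new versions.

    dim_rows: list of dicts carrying 'eff_from'/'eff_to' columns; current
    rows have eff_to == HIGH_DATE. Returns the updated dimension list.
    """
    current = {r[key]: r for r in dim_rows if r["eff_to"] == HIGH_DATE}
    out = list(dim_rows)
    for src in incoming:
        old = current.get(src[key])
        if old is None:
            # brand-new business key: insert the first version
            out.append({**src, "eff_from": load_date, "eff_to": HIGH_DATE})
        elif row_hash(old, tracked_cols) != row_hash(src, tracked_cols):
            # attribute change: close the old version, open a new one
            old["eff_to"] = load_date
            out.append({**src, "eff_from": load_date, "eff_to": HIGH_DATE})
        # unchanged rows are left as-is
    return out
```

In a real warehouse the same compare-and-expire logic would run as set-based SQL or a DataStage change-capture stage; the Python version just makes the row lifecycle explicit.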
- Participated in DataLake architecture building discussions and iterative exercises with Data Architects, Big Data Admins and Business Stakeholders.
- Designed and developed delta data load processes for structured (RDBMS) and semi-structured (XML/JSON) Client, Applicant, Marketer, and Agent source data over Hadoop/HDFS into Hive tables (TEXTFILE, RCFILE formats) in the ingest, curated, conformed, and history zones of NYL's data lake, and into Elasticsearch indexes for further processing.
- Performed full and incremental Sqoop imports of Oracle RDBMS source data into the HDFS ingest/landing zone and loaded the data into Hive tables.
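An incremental Sqoop import of this kind roughly corresponds to the command shape below, assembled here by a small Python helper; the JDBC URL, table, and column names are placeholders, not the actual project configuration.

```python
def sqoop_incremental_cmd(jdbc_url, table, target_dir, check_col, last_value):
    """Assemble a Sqoop incremental-import command (lastmodified mode),
    pulling only rows whose check column advanced past last_value."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # e.g. jdbc:oracle:thin:@host:1521/SID
        "--table", table,
        "--target-dir", target_dir,     # HDFS ingest/landing zone path
        "--incremental", "lastmodified",
        "--check-column", check_col,    # timestamp column tracking updates
        "--last-value", last_value,     # high-water mark from the prior run
    ]
```

A scheduler would persist the `--last-value` high-water mark between runs so each import picks up where the previous one stopped.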
- Tuned Hive with partitioning, bucketing, compression parameters, join optimization (map join, skew join, bucketed map join), and parallel query execution.
- Built a proof of concept for Kafka producer/consumer topic push/pull functionality, processing JSON logs and loading them into Hive audit tables using Big Integrate v11.5.0.2 Hive, Hierarchical, File, and Kafka connectors along with shell scripts.
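The JSON-log audit flow above can be approximated in plain Python. A real implementation would pull from a Kafka client library rather than an in-memory list, and the field names (`job`, `status`, `ts`) are hypothetical stand-ins for whatever the actual log schema carried.

```python
import json

def consume_json_logs(messages):
    """Parse JSON log messages pulled from a (simulated) Kafka topic and
    shape them into flat audit records ready to load into a Hive audit table.

    Malformed messages are routed to a reject list instead of failing the run.
    """
    audit_rows, rejects = [], []
    for raw in messages:
        try:
            evt = json.loads(raw)
            audit_rows.append({
                "job_name": evt["job"],   # hypothetical field names
                "status": evt["status"],
                "ts": evt["ts"],
            })
        except (json.JSONDecodeError, KeyError):
            rejects.append(raw)
    return audit_rows, rejects
```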
- Tracked Agile SIT and integration testing with Confluence/Jira and CA Rally. Conducted UAT with business end users on Salesforce UI data validation using SFDC Data Loader tools and SOQL.
- ETL master integration: DataStage job sequences, shell scripts, and ESP jobs. Implemented an ETL control process to capture job success, failure, error, and audit information for reporting, managing data integration, error handling, restartability, and notifications.
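An ETL control process of the sort described (success/failure capture with restartability) can be sketched as follows; the table and column names are invented for illustration, and SQLite stands in for whatever control database the project actually used.

```python
import sqlite3
from datetime import datetime

def init_control(conn):
    """Create the (illustrative) control table if it does not exist."""
    conn.execute("""CREATE TABLE IF NOT EXISTS etl_control (
        job_name TEXT, run_ts TEXT, status TEXT, err_msg TEXT)""")

def run_with_control(conn, job_name, job_fn):
    """Run an ETL step, recording success/failure for audit and restart."""
    try:
        job_fn()
        status, err = "SUCCESS", None
    except Exception as exc:
        status, err = "FAILED", str(exc)
    conn.execute("INSERT INTO etl_control VALUES (?,?,?,?)",
                 (job_name, datetime.now().isoformat(), status, err))
    conn.commit()
    return status

def last_status(conn, job_name):
    """Support restartability: a sequencer can skip jobs whose last run succeeded."""
    row = conn.execute(
        "SELECT status FROM etl_control WHERE job_name=? "
        "ORDER BY run_ts DESC LIMIT 1", (job_name,)).fetchone()
    return row[0] if row else None
```

On restart, a master sequence would query `last_status` for each step and resume from the first non-SUCCESS entry.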
Confidential, Chicago, IL
Sr. ETL/Big Data Integration Lead Consultant
Environment: IBM DataStage & QualityStage 8.7.x Parallel Extender EE (Citrix), IBM MDM Server 8.0.1, DB2-MDM 9.6, Teradata 14.10, Teradata SQL Assistant, Toad 9.6, IBM AIX, Windows 7, Citrix, Zena Scheduler, IBM Team Concert and MS Visio 2007, Cloudera-Hadoop/HDFS, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, YARN, Python 2.7
Responsibilities:
- Led/managed ECM-ME, SDW-STN, and VBC/PI-3B project developers and performed detailed analysis of application membership-level demography, eligibility, claims, plans, medical, biometrics/lab, incentive, and other data.
- Designed and developed initial/incremental data loads involving RDBMS, flat, and XML files from source systems through the staging area to ODS/IDS (DB2) and the atomic data warehouse (ADW/TDW, Teradata) using DataStage 8.7.x.
- Wrote BTEQ scripts to extract data and performed bulk loads using MultiLoad (max five tables per DML) and FastLoad (empty-table DML).
- Designed ETL jobs/sequences implementing data extraction/load strategies for various source systems with CDC and Type-I/Type-II slowly changing dimensions.
- Designed ETL to process IBM DataPower MQ Series XML messages (HCSC/BCBSA inpatient prior-authorization admission and discharge notifications) using DataStage 8.5 MQ connectors, with Web Services XML/WSDL processing inside DataStage jobs via the legacy XML Input/Output and newer XML stages.
- ETL master integration: job sequences with the Zena scheduler (events, processes, tasks). Implemented ETL control logic to capture job success, failure, error, and audit information for reporting, and tuned/optimized DataStage parallel jobs.
- Worked extensively with the BCBSIL Big Data team on a Cloudera Hadoop data lake project, processing medical claims, patient vitals, and drug-related data feeds into data lake layers (ingest, curated, conformed, and history zones) with ETL and Hadoop tools (Hive/Pig). Performed full/incremental Sqoop imports of Oracle RDBMS source data into the HDFS ingest landing zone and loaded it into Hive tables (TEXTFILE, RCFILE formats).
- Implemented partitioning and bucketing on Hive tables; enabled compression parameters, parallel query execution, and join optimization (map join, skew join, bucketed map join).
- Teradata SQL tuning: EXPLAIN plans and collecting statistics on join columns and unique/non-unique primary/secondary indexes.
- Performed SIT, Functional, Integration and UAT Testing with IBM Team Concert for tracking/resolution.
Confidential, New York, NY
Sr. ETL Datastage Integration Lead/Consultant
Environment: IBM DataStage & QualityStage 8.5 Parallel Extender EE (Citrix), IBM MDM Server 8.0.1, Oracle 11g/10g, Toad 9.6, HP-UX B.11.31, Windows 7, Citrix, Autosys Scheduler, and MS Visio 2007
Responsibilities:
- Performed detailed Medicaid Redesign Team project analysis of legacy and other application layers: member demography, eligibility, claims, customer service, referrals, pre-authorizations, welcome kit, IVR systems, web usage, Market Prominence, and related areas.
- Prepared HLD and technical docs and presented them to SMEs and stakeholders in design review and walkthrough sessions.
- Extensively designed and developed ETL processes using DataStage 8.5 stages. Designed ETL jobs/job sequences to manage data extraction/load strategies for various source systems, and developed data load jobs implementing slowly changing dimensions.
- Created master job sequences for integration, with ETL control logic to capture job success, failure, error, and audit information for reporting.
- Tuned/optimized DataStage parallel jobs: partition/sorting methods, node configuration files, environment variables, and shared containers (code modularization/job splitting).
- Implemented business transformations with PL/SQL stored procedures and functions. Tuned SQL performance with explain plans, indexes, hints, and partitions.
- ETL integration with DataStage Job Sequences, DataStage Director, Autosys jobs (JIL scripts), and Shell Scripts.
- Performed functional, system integration, and UAT testing with HP Quality Center for tracking/resolution.
Confidential, Philadelphia, PA
Sr. ETL/Integration Module Lead/Consultant
Environment: IBM DataStage & QualityStage 8.x/7.5.x Parallel Extender EE (Citrix), Oracle 10g, SQL Developer, SQL Server 2010, BO.XI.Rel2, IBM MDM Server 8.0.1, IBM AIX 5.3, Windows 7, Citrix, ESP Scheduler, and MS Visio 2007.
Responsibilities:
- Performed detailed SRO process sunset project analysis of legacy and downstream applications (IMAGE, SROHIST, CARS, OSCARS, FICA Policy, CEPP EFTs, CEPP Write-backs, CheckFree, DHCC, TIAA-CREF, and others), some written 40+ years ago on mainframes.
- Led and coordinated onsite and offshore Confidential resources (20+), handling project process assignments, design walkthroughs, code development, and follow-up.
- Extensively prepared HLD/technical docs and presented them to client SMEs and stakeholders in design review/walkthrough sessions.
- Extensively designed and developed data loads using DataStage 8.1. Designed ETL jobs/job sequences implementing data extraction/load strategies for various source systems (Acclaim DB, Client, Policy, Consumer GADB, and FINEOS) into the centralized GCIF integration database, which serves as the data source driving legacy downstream processes.
- Developed ETL data load jobs implementing slowly changing dimensions (SCD Type-I, Type-II, and Type-IV).
- Built master job sequences for process integration, error handling, restartability, and email alerts. Implemented process (PCDB) logic to capture job success, failure, error, and audit information for reporting.
- Tuned/optimized DataStage parallel jobs - partition/sorting methods, node config. files, environmental variables, shared containers (code modularization/job splits).
- Business transformations: SQL/PL-SQL stored procedures, functions, and views. SQL performance tuning with explain plans, indexes, hints, and partitions. ETL integration with ESP jobs. Performed unit, functional, and system integration testing with HP Quality Center for tracking/resolution.
Confidential, Wilmington, DE
Sr. ETL/Integration Lead/Consultant
Environment: IBM DataStage & QualityStage 8.1.x Parallel Extender EE, Oracle 10g, Toad 10.0, IBM AIX 5.3, Windows XP, Control-M Scheduler, WinCVS, Microstrategy 8.1, and MS Visio 2007
Responsibilities:
- Performed detailed analysis for UDAP Disputes, Collections Ops, Multiproduct Account Structure Changes, Multiple ACH Payments (Draft 5/Return file), and legacy Money Movement processes involving MasterCard, TSYS, and Wachovia/Wells Fargo systems for the Collections/Customer Care and Data Retention/Compliance departments; some legacy processes were written 15+ years ago. Extensively prepared and presented HLD/LLD documents to client SMEs and stakeholders in technical code review and walkthrough sessions.
- Coordinated effectively with onsite/offshore resources (10+, Pune, India branch) on design, code development, handover, follow-up, and project assignment tracking in Agile Scrum sprints.
- Extensively developed ETL processes for Disputes Cases, Collections Operations, Mercury Account Structure Changes, Multiple ACH Payments and Money Movement using DataStage 8.1 with enterprise/processing stages by implementing Barclay’s custom data transfer processes.
- Designed master integration job sequences, Korn shell scripts, and Control-M jobs; implemented process run-log logic for job success/failure tracking and MIS reporting; and managed errors, restartability, and alert notifications.
- Designed BCUS custom file transfer process configurations for credit card, customer payment transactions, and third party vendor CMC and TSYS.
- Configured DataStage 8.1 Web Services connectivity using BPM/Java WSDL. Performed Web Services method invocation for BPM Disputes, Money Movement, and Credit Card Services within DataStage jobs using XML stages.
- Tuned, optimized parallel jobs - partition strategies, node config. files, environmental variables, shared containers, jobs splitting.
- PL/SQL stored procedures, functions, and triggers. SQL tuning: explain plans, indexes, SQL hints, and partitions.
- Closely worked with MicroStrategy BI reporting team for Disputes, Collection and Customer Payments, Money Movement, Oracle Data Retention processes reports generation.
- HP Quality Center: SIT, functional, integration, and regression testing with Feature Checklists (FCLs). Implemented release plans and the change management process. Interacted with the Release Quality and Configuration Management department on batch/web/data feed weekly and monthly QA/PROD release cycles.
Confidential, Deerfield, IL
Sr. ETL/Integration Consultant
Environment: IBM DataStage & QualityStage 8.1.x Parallel Extender EE, Information Analyzer 8.1.x, Oracle 10g, Toad 9.6, IBM AIX 5.3, Windows XP, IBM Tivoli Job Scheduler, PVCS, Erwin Data Modeler 7.3, Cognos BI 8 Reporting toolset, MS Visio 2007, MS Office SharePoint 2007, Siebel Pharma v7.8, Exploria SPS 2.5 software
Responsibilities:
- Performed Siebel SFA and Exploria "Clickstream" systems analysis for call planning, HCP/medical rep detailing, and tablet data syncing to Exploria servers. Handled task prioritization, planning, estimation, and QA for phases 1.0 through 1.2 of the CLP/CLM project.
- Involved in data analysis, profiling, functional dependency, table/ column relationships analysis using IBM Information Analyzer 8.1 for HCP, Medical Reps Segmentation and Exploria “Clickstream” and Siebel SFA data.
- Created technical design specs for ETL, conversion, and migration processes, operations, and production rollouts for go-live, and presented them to client SMEs in review sessions.
- Designed and implemented data standardization rules and a matching solution for HCP and medical rep addresses, IMS IDs, SSNs, and contact numbers using QualityStage. Standardized data into the most commonly used formats/patterns: converting variable-length records to fixed-length records, parsing fields into single-domain data fields, and trimming/removing spaces and non-printing characters.
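Standardization rules like these (trimming, single-domain parsing, fixed-length output) can be illustrated in Python; the phone-number pattern and field widths below are illustrative assumptions, not the actual QualityStage rule set.

```python
import re

def standardize_contact(raw_phone):
    """Normalize a contact number to digits only, then a common pattern."""
    digits = re.sub(r"\D", "", raw_phone)
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return digits  # leave non-10-digit values for the match stage to flag

def to_fixed_length(fields, widths):
    """Convert a variable-length record to fixed-length columns,
    removing non-printing characters and trimming surrounding spaces first."""
    out = []
    for value, width in zip(fields, widths):
        clean = "".join(ch for ch in value if ch.isprintable()).strip()
        out.append(clean[:width].ljust(width))
    return "".join(out)
```

Feeding every record through the same normalization before matching is what lets a match stage compare values pattern-to-pattern rather than string-to-string.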
- Extensively designed, developed ETL solution to handle Historical/Incremental data loads in data warehouse, Batch Audit and Controls to process Control Totals, Batch Data Errors, and load History using DataStage 8.1 parallel extender.
- Developed ETL data load jobs implementing Slowly Changing Dimension (SCD Type-II) logic and master job sequences.
- Tuned parallel jobs using sorting/partition strategies, tuning configuration parameters, shared containers to increase the throughput of the data load.
- Developed PL/SQL stored procedures, functions, and triggers for business transformations and for real time integration between Siebel Call Planning and Data Source Importer process. Designed and developed cleansing procedures for batch process.
- SQL performance tuning using Explain Plan, optimizer hints and applied indexes when required.
- Process integration with IBM Tivoli job streams, DataStage Director, and UNIX Korn shell scripts. Performed real-time integration process of Reps-HCP call planning, presentation-slide execution from Siebel SFA 7.8 to sync the data into Exploria “Clickstream” database.
- Interfaced with the BI reporting team on Cognos BI 8 reporting and dashboard design to meet CLP call planning, HCP segmentation trend, and related business metrics needs.
- Performed performance and integration testing; tracked defects and coordinated code fixes with MS SharePoint 2007.
Confidential, New York, NY
Sr. ETL/Data Warehouse Consultant
Environment: IBM DataStage & QualityStage 8.1/7.5.2 Parallel Extender, Information Analyzer 8.1/ProfileStage 7.5.x, Oracle 10g/9i, Toad 9.6, Windows XP, IBM AIX 5.3, Mainframe COBOL, IBM Tivoli Job Scheduler, IBM WebSphere Web Application Server 5.x. PVCS, Mercury/HP Quality Center 9.0, Erwin Data Modeler 7.x, MS Visio 2003.
Responsibilities:
- Provided strategic solutions, tracked progress, and regularly followed up on the ETL development effort onsite and offshore.
- Created technical High Level design documents with process flows, migration process documents on Royalty performance statement extract, distribution Y2D, Membership Hold/Release, deductions process.
- Performed project migration from DataStage v7.5 to DataStage v8.1, designed, enhanced Parallel jobs. Implemented Slowly Changing Dimension (Type-I & Type-II) data load.
- Data extract from Source system to ODS layer and to Target systems with Oracle Enterprise stage. Efficiently made use of parameter sets, range lookups, and other processing/development/debug parallel stages.
- Tuned and optimized parallel jobs using partition strategies, shared containers, configuration files, job splits, job lock resolution, and environment variable setup; migrated jobs to QA/UAT/Prod environments with DataStage Administrator.
- Extensively designed Job sequences to integrate flow, using various Activities, Triggers, sequencers, Email notification, and Wait-for-File activities & Watchers.
- Implemented PREP Statements business transformations with PL/SQL stored procedures, functions, and triggers. Developed cleansing procedures to revert to the previous state when a statement batch process aborted. Tuned SQL performance with explain plans, optimizer hints, and indexes.
- Integrated PREP Statements batch jobs via IBM Tivoli job scheduler job streams, DataStage Director, and Korn shell scripts.
- Performed Statements module unit and integration testing; tracked and fixed defects with Mercury/HP QC 9.0.
Confidential, Malvern, PA
ETL/Data Warehouse Consultant
Environment: DataStage 7.5.2 EE, ProfileStage 7.5.1, QualityStage 7.5.1, MetaStage 7.x, Oracle 10g/9i, IBM DB2/UDB 8.x,Toad 8.x, SAP R/3, SQL Server 2005, Visual Studio .NET 2005, CLR, MicroStrategy 8 Desktop, Web Intelligence Server 8, Rational ClearCase, Autosys, Erwin Data Modeler 4.0, Windows XP, and UNIX
Responsibilities:
- Analyzed SC-EDW architecture, technical specs, and mapping matrix on clinical observations, ordered and administered medications, and other key performance indicators (KPIs).
- Performed Source systems profiling for functional dependency, table/column relationships and reporting on source and target systems using ProfileStage along with Data Architects for data modeling exercise.
- Designed ETL parallel extender (PX) jobs with Oracle and DB2 Enterprise stages for grouping, summarizing on KPI’s for Patient Billing, Revenues, Outpatient Visits, Patient Clinical information, Cost-based benchmarking, and Provider Reports.
- Used SAP R/3 (ABAP EXT for R3 and IDOC EXT for R3 stages) extract pack for SAP Patient Management System for Patient Claims, Billing, Pharmacy, and Provider data extract/load into SC-EDW.
- Tuned ETL job performance with sorting and partitioning methods, configuration file parameters, and shared containers; migrated jobs to QA/Prod environments.
- Business transformations: developed PL/SQL packages, procedures, functions, and triggers under HIPAA guidelines in Oracle, DB2/UDB, and SQL Server.
- Implemented patient de-identification with Visual Studio .NET/CLR user-defined functions and DB objects deployed and integrated with SQL Server 2005 database assemblies.
- Wrote UNIX Korn shell scripts and Autosys job scheduler scripts to schedule and test ETL jobs during development. Performed unit, performance, and integration testing of the ETL process.
- Used MicroStrategy Desktop ProjectBuilder to import tables, build objects, & reports. Used FreeForm SQL to create FreeForm reports on Patient Trials, Cardiology Study, IV-to-Oral meds, and Provider Drug Cost Reports.
- Used MicroStrategy Web to let users run predefined sales and inventory reports and build customized reports. Configured MicroStrategy Intelligence Server to use project sources to start services and execute and test reports.
Confidential, TX
Software Programmer
Environment: Visual Basic 6.0, Microsoft Excel, Access, ChartFX 5.0, C#.NET, ASP.NET, Java, Jakarta Struts, MySQL 4.x, & Eclipse/MyEclipse IDE 3.x.
Responsibilities:
- Research project requirements analysis at Texas A & M Experiment Stations with Primary Research Investigator (PRI)
- Collaborated on a research proposal to Cotton Incorporated, NC: the "Cotton variety selection research project".
- Extensively designed and developed the "Cotton Wizard" Cotton Variety Test System, a web-based application built with MS Visual Studio .NET 2003, C#.NET, Excel, and Oracle 8i. Core features: raw data import/export, timely access over the internet, and statistical head-to-head analysis of critical cotton variety performance test data on key performance indicators (KPIs) such as lint yield, lint percent, fiber length, micronaire, strength, and maturity, helping cotton breeders and producers choose the highest-yielding varieties to grow.
- Designed, developed and maintained Cotton Wizard, a cotton variety performance assessment client-server application to serve as a statistical analysis tool to cotton breeders, and producers using tools such as Visual Basic 6.0, MS-Access 2000 & ChartFX 5.0.
- Developed On-line Reference System to store research references, and bookmarks for the department using Jakarta Struts, MySQL, and Eclipse/MyEclipse IDE.
- Developed and maintained user forms in VB 6, ASP.NET, and JSP.
Confidential
Software Engineer
Environment: MS-Visual Studio/VB.NET, IIS 5.0, Java 1.3 Servlets, JSP, JNDI-LDAP/JNI, WML, iPlanet Web & Calendar Server 2.0, IBM WebSphere App. Server 4.0, Oracle 8i, Toad, Windows NT, Sun Solaris 7, JavaScript, VBScript, VBA macros, MS Office.
Responsibilities:
- Wrote technical specs following ISO 9000 guidelines. Developed customized web application GUIs in Visual Basic .NET for customers (Idea Cellular, Airtel, Spice, and Reliance Telecom wireless operators).
- Designed, developed “Alerts” module for SMS, Email & Mobile alerts generation/automation using Java technologies, iPlanet Web/Calendar Server 2.0 and Oracle 8i.
- Developed SMS/Web Contest applications having Admin & User Modules for process tracking, user activities, housekeeping, and to configure Contest activities via log files, multiple mobile operator/users mgt. using Java Servlets, JSP, HTML & Oracle 8i.
- Prepared unit, black-box, and integration tests for Cellnext's p-Biz Gateway mobile middleware framework.