- 8+ years of experience in the IT industry on data warehousing, data integration and data migration projects, spanning development, enhancement and production support.
- Experience in analysis, design, development and integration using Big Data and Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala and Avro for data processing.
- Strong experience with Informatica Data Quality (IDQ) 9.6, Informatica Analyst and Informatica PowerCenter 9.x/10.x ETL tools, and good experience with Informatica PowerExchange Change Data Capture (CDC) and B2B MFT tools.
- Expertise in the analysis, design, development, implementation and testing of data warehouses, using DataStage for data extraction, transformation and loading.
- Extensively worked with various relational and non-relational source and target systems, including Teradata, DB2, SQL Server, Oracle, PL/SQL, flat files, XML files and Salesforce.com (SFDC).
- Good experience developing data profiles, scorecards, mapplets, rules, mappings, workflows, and data cleansing, data standardization and data exception handling processes using Informatica Data Quality (IDQ) Developer and Informatica Analyst tools.
- Strong experience in loading and maintaining Data Warehouses and Data Marts using DataStage ETL processes.
- Expert in designing DataStage parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Data Set, Complex Flat File, Aggregator, XML, ODBC Connector and Teradata Connector.
- Hands-on experience with Big Data concepts, Hadoop, Hive, IBM DataStage design, UNIX shell scripting, SAP Data Services, Business Objects reporting, Python and Teradata.
- Extensively worked with the cloud-based Salesforce.com (SFDC) system as both source and target, using SFDC source and target transformations, Lookup transformations and UPSERT logic on SFDC targets.
- Used Teradata utilities such as FastLoad, MultiLoad, TPump and BTEQ scripts to load data from flat files into staging, dimension and fact tables.
- Good experience with PL/SQL scripting and UNIX/Linux shell scripting, and expertise in using the Pushdown Optimization (PDO) technique for performance optimization.
- Strong experience in data warehousing concepts such as data warehouses, data marts, star schemas, snowflake schemas, fact tables, factless fact tables, dimension types, dimensional modeling and slowly changing dimension (SCD) techniques.
- Experience developing slowly changing dimension (SCD) Type 1, 2 and 3 mappings and meeting various Change Data Capture (CDC) requirements for incremental data loads.
- Expertise in troubleshooting and analyzing issues/errors and providing resolutions accordingly.
- Experience with data modeling tools such as Erwin and Microsoft Visio, and extensive work with XML transformations such as XML Generator and XML Parser to transform XML files.
- Experienced with the Control-M, Autosys and Tivoli Workload Scheduler (TWS/Maestro) scheduling tools.
- Strong knowledge of Extraction, Transformation and Loading (ETL) processes using Ascential DataStage, UNIX shell scripting and SQL*Loader.
- Expertise in all phases of the Software Development Life Cycle (SDLC), from requirements gathering, analysis, development, unit testing and QA support through production deployment and knowledge transition to support teams.
- Good experience with production support activities, including incidents, service requests, problem tickets, change tickets, DR activities and maintenance activities, using ServiceNow and Remedy.
- Expertise in working with Business Analysts, Architects and project members from different areas, both technical and non-technical.
ETL Tools: IBM DataStage 11.x, 8.x, Informatica PowerCenter 10.x/9.x (Source Analyzer, Mapping Designer, Workflow Monitor, Workflow Manager, PowerConnect for ERP and Mainframes, Power Plugs), PowerExchange, Informatica Data Integrator, PowerConnect, Data Junction (Map Designer, Process Designer, Meta Data Query).
OLAP/DSS Tools: Business Objects XI, Hyperion, Crystal Reports XI
Databases: Oracle 10g/11g/12c, Sybase, DB2, MS SQL Server, Teradata V2R6/V2R5, Netezza, HBase, MongoDB and Cassandra
Others: AWS Cloud, AWS Redshift, TOAD, PL/SQL Developer, Tivoli, Cognos, Visual Basic, Perl, SQL Navigator, TestDirector, WinRunner
Database Skills: Stored Procedures, Database Triggers and Packages
Data Modeling Tools: Physical and Logical Data Modeling using ERWIN
Languages: UNIX shell scripts, XML, SQL, Python, T-SQL, Scala.
Operating Systems: Windows NT, AIX, Linux, UNIX
Sr. ETL Developer
Confidential, Chicago, IL
- Interacted with Business Analysts and the Project Architect to understand and analyze requirements; prepared design documents (High-Level Design and Low-Level Design), security and other compliance standards documents, and the unit test plan executed in the Development environment after code changes were completed.
- Led an offshore team, ensuring deadlines were met and work was delivered on schedule for each phase.
- Used the Informatica PowerCenter ETL tool to develop mappings and mapplets and to create sessions, tasks and workflows for integrating data into large data warehouses on Oracle and Teradata.
- Created mappings using Informatica Big Data Edition, with jobs configured to run on the Hive, Blaze or Spark engines.
- Involved in migrating the entire data warehouse using AWS services, Apache Spark and Sqoop.
- Created IDQ data profiles, scorecards, mapplets, rules, mappings, workflows, and data cleansing and data standardization processes using IDQ Developer and Informatica Analyst tools.
- Involved in extraction, transformation and loading of data: ingested data from different sources into AWS Redshift and performed Redshift maintenance, including running VACUUM and ANALYZE on Redshift tables.
- Created all JDBC, ODBC and Hive connections in the Informatica Developer client tool to import Parquet files and relational tables.
- Developed mappings for Change Data Capture (CDC) with PowerExchange, built mappings and workflows to extract CDC data, and scheduled them in the TWS scheduler.
- Installed SQL Server on EC2 instances in the new environment and created a high-availability solution with the existing Vault-Mart servers.
- Worked on streaming near-real-time (NRT) data using Kafka and Flume integration, and integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
- Developed jobs to send and read data from AWS S3 buckets using components like tS3Connection, tS3BucketExist, tS3Get, tS3Put.
- Ingested a wide variety of structured, unstructured and semi-structured data lake data into Big Data ecosystems using batch processing, real-time streaming and SQL.
- Used the Informatica B2B Data Exchange MFT tool to transfer files from external vendors to internal servers, and used the Pushdown Optimization (PDO) technique for performance optimization.
- Involved in installing and configuring Informatica MDM Hub Console, Hub Store, Cleanse and Match Server, AddressDoctor and Informatica PowerCenter applications, and worked with the Informatica Big Data tool to move data from Oracle, flat files and JSON files to Hive.
- Used Teradata utilities such as FastLoad, MultiLoad and BTEQ scripts to load data from flat files into staging, dimension and fact tables, which are used for Cognos reporting.
- Worked with the Hadoop input formats TextInputFormat and KeyValueTextInputFormat, designed data models on HBase and Hive, and created MapReduce jobs for ad hoc data requests.
- Configured Spark streaming to receive real time data from Kafka and store the stream data to HDFS using Scala.
- Developed mappings and wrote PL/SQL scripts to load data into dimension tables (SCD Types 1, 2, 3 and 4) and fact tables, and wrote shell scripts to automate Informatica workflows in the scheduling tool.
- Identified bottlenecks and tuned performance at the source, target, mapping and session levels where they occurred, using partitioning and pushdown optimization to improve performance.
- Worked closely with the client on planning and brainstorming the migration of the current RDBMS to Hadoop and the migration of Informatica jobs to Hadoop Sqoop jobs that load into an Oracle database.
- Migrated code changes to the PreProd/UAT and QA environments, troubleshot and fixed issues, and provided support until QA sign-off.
Environment: Informatica Data Quality IDQ 9.x/10.2, Informatica Analyst, Informatica PowerExchange CDC (Change Data Capture), Informatica PowerCenter 10.x/9.x, B2B MFT Tool, Oracle, PL/SQL, Teradata, DB2, Salesforce.com (SFDC), Hadoop, Hive, Kafka, HBase, Python, Parquet Files, SQL Server 2016, MongoDB, SQL, Business Objects, Shell Scripting, UNIX, Impala, AWS S3, AWS Glue, EC2 and Redshift.
Sr. ETL Developer
Confidential, New York, NY
- Involved in data modeling sessions, developed technical design documents, and used DataStage Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into the data warehouse database.
- Worked on a Big Data proof of concept (POC), loading data into HDFS and creating MapReduce jobs.
- Created mappings using pushdown optimization to achieve good performance when loading data into Netezza, and prepared the scheduling plan and job execution timings to share with the scheduling team (Control-M).
- Worked with different APIs to retrieve data using the curl command, and ingested data from different sources by loading files into AWS S3 buckets and copying them into AWS Redshift tables.
- Used UNIX shell scripting, Python, IBM DataStage 9.1/11.3/11.5 and Hive SQL to transform data and load it into Teradata or Oracle.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Migrated existing Teradata scripts to Netezza, converting BTEQ to NZSQL while keeping the business logic the same and validating the results across both systems.
- Developed parallel jobs using various development/debug stages (Peek, Head, Tail, Row Generator, Column Generator, Sample) and processing stages (Aggregator, Change Capture, Change Apply, Filter, Sort, Merge, Funnel, Remove Duplicates).
- Defined process flows using DataStage job sequences, scheduled the sequences using Tivoli Workload Scheduler (TWS), and prepared the TWS job stream and job info files required for upload to the TWS database.
- Extracted transformed data from Hadoop to destination systems as a one-off job, batch process or Hadoop streaming process.
- Processed streaming data from Kafka topics using the Spark Streaming API and loaded it into DataFrames and Datasets; also used Spark SQL to pull in the data and apply complex SQL logic for analytics.
- Executed and monitored jobs through TWS and created VAR tables in TWS.
- Established best practices for DataStage jobs to ensure optimal performance, reusability and restartability, and used DataStage Designer and Director to load data from source to target tables according to business requirements and business rules.
- Wrote Python MapReduce scripts to process unstructured data, loaded log data into HDFS using Flume, and worked extensively on MapReduce jobs that power data for search and aggregation.
- Excellent with PL/SQL, T-SQL, stored procedures, database triggers and SQL*Loader. Resolved defects raised by QA.
- Created Hive external tables using a shared metastore, with partitioning, clustering and dynamic partitioning for faster data retrieval.
- Reviewed code developed by team members against naming standards and best practices, and worked with different stages to create jobs for business applications.
- Worked with Sqoop to export analyzed data from HDFS into RDBMS for report generation and visualization.
- Developed UNIX shell scripts, updated the backup logs, and performed unit testing of the jobs and the data loaded into the target database.
Environment: IBM WebSphere DataStage 11.3, Oracle 11g, TWS, DB2, QualityStage (Designer, Director, Administrator), DB2 UDB, Teradata V13, Hadoop, HDFS, Hive, Flume, Kafka, Sqoop, Python, AWS S3, AWS Redshift, EC2, Linux, SQL, PL/SQL, UNIX Shell Scripting, DataStage Version Control, MS SQL Server, Netezza 4.x, Mainframe, Autosys, Information Analyzer.
Sr. ETL Developer
Confidential, Grand Rapids, MI
- Analyzed the Business Requirement Documents (BRD) and laid out the steps for the data extraction, business logic implementation & loading into targets.
- Responsible for Impact Analysis, upstream/downstream impacts and created detailed Technical specifications for Data Warehouse and ETL processes.
- Used Informatica as the ETL tool, together with stored procedures, to pull data from source systems and files, cleanse and transform it, and load it into Teradata using Teradata utilities.
- Applied the concept of Change Data Capture (CDC) and imported the source from Legacy systems.
- Involved in deployment and administration of SSIS packages with Business Intelligence Development Studio.
- Worked with the Informatica Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer and Transformation Developer, and used most of the transformations, such as Source Qualifier, Expression, Aggregator, Filter, connected and unconnected Lookup, Joiner, Update Strategy and Stored Procedure.
- Extensively used Pre-SQL and Post-SQL scripts for loading the data into the targets according to the requirement.
- Leveraged the Informatica Big Data Management (BDM) tool for ETL processing on Hive tables and HDFS files in a Hadoop Big Data environment. Executed mappings in both Native and Hadoop modes.
- Extracted data from various heterogeneous sources such as Oracle, Sybase, SFDC, flat files and COBOL (VSAM) using Informatica PowerCenter and loaded it into the target DB2 database.
- Extracted data from Hadoop, modified it according to business requirements, and loaded it back into Hadoop.
- Built a framework for inbound/outbound files for the Facets system and used the application to set up group and subscriber data in Facets.
- Modified SOQL (Salesforce Object Query Language) for Salesforce targets at the session level.
- Developed mappings to load fact and dimension tables, including SCD Type 1 and Type 2 dimensions and incremental loads, and unit tested the mappings.
- Successfully upgraded Informatica from 9.1 to 9.5 and was responsible for validating objects in the new version.
- Involved in Initial loads, Incremental loads and Daily loads to ensure that the data is loaded in the tables in a timely and appropriate manner.
- Extensively worked on performance tuning of Teradata SQL, ETL and other processes to optimize session performance, and loaded data into Teradata tables using the Teradata utilities BTEQ, FastLoad, MultiLoad, FastExport and TPT.
- Worked extensively with different caches, such as index cache, data cache and lookup cache (static, dynamic and persistent), while developing mappings, and created reusable transformations, mapplets and worklets using Transformation Developer, Mapplet Designer and Worklet Designer.
- Involved in the creation and maintenance of BMC Control-M jobs that submit Cognos cube build scripts, which automatically publish PowerPlay operational reporting cubes with version control.
- Implemented partitioning and bucketing techniques in Hive, developed scripts to create external Hive tables, and worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Responsible for unit testing and integration testing, helped with user acceptance testing, and scheduled Informatica jobs with any necessary dependencies using Autosys.
- Tuned the performance of mappings by following Informatica best practices and also applied several methods to get best performance by decreasing the run time of workflows.
- Worked extensively on Informatica Partitioning when dealing with huge volumes of data and also partitioned the tables in Teradata for optimal performance.
- Managed post-production issues and delivered all assignments and projects within the specified timelines.
Environment: Informatica PowerCenter 9.5.1, Oracle 11g, DB2, Teradata, Flat Files, Erwin 4.1.2, SQL Assistant, Cognos, Facets, Hadoop, Hive, SQL, Netezza, HDFS, Sqoop, Shell Scripting, UNIX, Toad, WinSCP, PuTTY, Autosys, Agile.