- 8+ years of IT industry experience in all aspects of Analysis, Design, Testing, Development, Implementation and Support of Relational Database (OLTP), Data Warehousing Systems (OLAP) and Data Marts in various domains, including 3+ years of experience with Talend Open Studio and the Talend Enterprise platform for Data Management.
- Experienced in working with Data Warehousing concepts such as OLAP, OLTP, Star Schema, Snowflake Schema, Logical Data Modeling, Physical Modeling and Dimensional Data Modeling, and in using tStatCatcher, tDie and tLogRow to create a generic job that stores processing stats.
- Experienced with the MapReduce programming model and Pig, and with installing and configuring Hadoop, HBase, Hive, Pig, Sqoop and Flume using Linux commands.
- Work experience in solving Big Data problems using Apache Hadoop (MapReduce, HDFS) and its ecosystem (Hive, Pig Latin, Sqoop, Flume, Oozie, Spark, Avro, ZooKeeper).
- Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tFilterRow, globalMap variables, etc.
- Experienced in creating Spark applications in both Scala and Python, and in writing Hive and Pig queries for data analysis to meet business requirements.
- Experienced in scheduling Talend jobs using Talend Administration Center (TAC).
- Experience with Talend DI Installation, Administration and development for data warehouse and application integration.
- Experienced in extracting user data from various data sources into the Hadoop Distributed File System (HDFS), and automated all jobs that pull data from an FTP server and load it into Hive tables using Oozie workflows.
- Expertise in data modeling techniques such as Dimensional/Star Schema and Snowflake modeling and Slowly Changing Dimensions (SCD Type 1, Type 2 and Type 3), as well as tracking daily data loads and sending monthly data extracts to clients for verification.
- Experienced in Hadoop Big Data integration with ETL, performing data extract, transform and load processes for ERP data.
- Expertise in writing MapReduce programs in Java, and in Pig Latin, HQL, Perl scripting, PostgreSQL, VBScript, Shell scripting, SQL, PL/SQL and Core Java.
- Strong experience in designing and developing Business Intelligence solutions in Data Warehousing using ETL tools, with an excellent understanding of Data Warehousing concepts and best practices; involved in the full development life cycle of Data Warehousing projects.
- Experienced working with NoSQL databases such as HBase and MongoDB, with knowledge of Cassandra.
- Expertise in working with relational databases such as Oracle, SQL Server, DB2 8.0/7.0, UDB, MS Access, Teradata and Netezza.
- Strong Data Warehousing ETL experience using Informatica PowerCenter 9.x/8.x/7.x client tools (Mapping Designer, Repository Manager, Workflow Manager/Monitor) and server tools (Informatica Server, Repository Server Manager).
- Extensive experience with Teradata utilities such as MLOAD, FLOAD, TPUMP, FASTEXPORT and TPT to improve target loading performance; also created complex BTEQ scripts.
- Experience in Data Analysis, User Requirement Gathering, User Requirement Analysis, Gap Analysis, Data Cleansing, Data Transformations, Data Relationships, Source Systems Analysis and Reporting Analysis.
- Experienced in analyzing, designing and developing ETL strategies and processes, and in writing ETL specifications.
- Experienced in working with different data sources such as flat files, spreadsheet files, log files and databases, and worked extensively with slowly changing dimensions.
- Experienced in SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), Data Transformation Services (DTS) and SQL Server Analysis Services (SSAS).
- Proficient in writing complex PL/SQL packages, stored procedures and triggers, and in performance tuning, application tuning and query optimization using hints, EXPLAIN PLAN and TKPROF.
- Extensive experience in writing UNIX shell scripts and automating ETL processes with them; also used Netezza utilities to load data and execute SQL scripts from UNIX.
- Hands-on experience across all stages of Software Development Life Cycle (SDLC) including business requirement analysis, data mapping, build, unit testing, systems integration and user acceptance testing.
- Excellent interpersonal and communication skills; experienced in working with senior-level managers, business users and developers across multiple disciplines.
Languages: JAVA, Python and Scala
Big Data Ecosystems: Hadoop, HDFS, Spark, Pig, HBase, Hive, Sqoop, ZooKeeper, Oozie, Kafka
ETL Tools: IBM InfoSphere DataStage 8.1, 11.5
Relational Databases: Oracle, MySQL, SQL Server, Netezza, Teradata, DB2, MS Access
NoSQL Databases: HBase
Scripting Languages: UNIX Shell Scripting, SQL, NZSQL, PL/SQL
Tools: Eclipse, IDLE, SQL Developer, DbVisualizer, TOAD
Operating Systems: UNIX, Linux, Windows XP and Windows 7
Domain Skills: Retail - Consumer Market Research
Data Modeling Tools: SQL Developer Data Modeler
Confidential, Chicago, IL
Sr. ETL / Hadoop Developer
- Responsible for building scalable distributed data solutions using Hadoop, and for continuously monitoring and managing the Hadoop cluster through Cloudera Manager.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster and integrated Hive with existing applications.
- Designed Oozie jobs for the automated processing of similar data, and worked on a project to retrieve and collect log messages using Spark Streaming.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior, and extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, and developed Pig scripts in areas where extensive coding needed to be reduced.
- Handled importing of data from various data sources using Sqoop, performed transformations, cleaning and filtering on the imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Imported data from the RDBMS environment into HDFS using Sqoop for report generation and visualization in Tableau.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
- Created HBase tables to store PII data in various formats coming from different portfolios, and configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, the HBase database and Sqoop.
- Processed data using Spark.
- Translated high-level design specifications into simple ETL coding and mapping standards, and provided cluster coordination services through ZooKeeper.
- Developed complex Talend job mappings to load data from various sources using different components, and designed, developed and implemented solutions using Talend Integration Suite.
- Built Big Data solutions using HBase to handle millions of records for different data trends, exported the data to Hive, and tested the data coming from the source before processing.
- Debugged and resolved technical issues and errors.
- Developed processes on both Teradata and Oracle using shell scripting and RDBMS utilities such as MultiLoad, FastLoad, FastExport and BTEQ (Teradata) and SQL*Plus and SQL*Loader (Oracle).
- Responsible for developing, supporting and maintaining the ETL (Extract, Transform and Load) processes using Talend Integration Suite, and worked on improving the performance of Talend jobs.
- Introduced Tableau visualization on Hadoop to produce reports for the business and BI teams, designed ETL jobs per the given criteria in ODI, and loaded data tables into the Teradata server.
- Implemented slowly changing dimensions (SCD) for some tables per user requirements, performed unit testing and integration testing after development, and had the code reviewed.
Environment: Hadoop (Cloudera), Talend ETL Tool, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Flume, ZooKeeper, Java, SQL, Scripting, Spark, Oracle 11g, XML files, Agile Methodology, Tableau, Teradata, Netezza, T-SQL, PL/SQL, JSON
Confidential, Chicago, IL
Sr. Hadoop/ETL Developer
- Analyzed, designed, developed, tested, implemented and troubleshot integrations between mission-critical business applications, including cloud-based data warehouses; worked closely with business analysts and data architects to understand and analyze user requirements, helped design tables and modified technical specifications.
- Designed and implemented the ETL process using Talend Enterprise Big Data Edition to load data from source to target databases.
- Imported system-generated log files into HDFS, implemented MapReduce programs in Java to compute trade statistics on the log files stored in HDFS, and warehoused RDBMS tables on HDFS using Hive. Used Sqoop to import data from RDBMS (Oracle) into HDFS.
- Extracted data from Oracle, flat files and XML files using Talend with Java as the backend language, and used the tWaitForFile component for file-watch event jobs.
- Used more than 20 Talend components (tMap, tFileList, tJava, tLogRow, tOracleInput, tOracleOutput, tSendMail, etc.), and used the debugger and breakpoints to view transformation output and debug mappings.
- Developed ETL mappings for various sources (.TXT, .CSV, .XML) and loaded the data from these sources into relational tables with Talend Enterprise Edition.
- Worked with global context variables and context variables, extensively used more than 100 Talend components to create jobs, and created child jobs that are called from parent jobs using tRunJob.
- Extracted transformed data from Hadoop to destination systems as one-off jobs, batch processes or Hadoop streaming processes.
- Developed Spark applications in both Scala and Python, used Spark SQL to connect to different databases and HDFS, and actively used Sqoop to move structured, unstructured and semi-structured data from multiple databases to HDFS.
- Worked on error-handling techniques and tuned the ETL flow for better performance; worked extensively with TAC (Talend Administration Center), scheduling jobs in Job Conductor.
- Extensively used Talend components tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tOracleInput, tOracleOutput, tFileList, tFileInputDelimited, etc.
- Used Oracle SQL Developer for unit testing of Talend ETL jobs, and scheduled the ETL mappings on a daily, weekly, monthly and yearly basis.
- Worked on a Big Data POC, loading data into HDFS and creating MapReduce jobs; worked on project documentation, prepared source-to-target mapping specs with the business logic, and was involved in data modeling.
- Worked on migrating data warehouses from the existing SQL Server to an Oracle database.
- Involved in building the ETL architecture and source-to-target mappings to load data into the data warehouse; developed mappings, transformations and joblets, and designed ETL jobs/packages using Talend Integration Suite (TIS).
- Performance-tuned mappings and sessions by identifying bottlenecks and implementing effective transformation logic.
- Used Teradata utilities (TPT, BTEQ) to load data from source to target tables, and created various kinds of indexes for performance enhancement.
- Created workflows using various tasks (session, control, decision, email, command, worklet and assignment), scheduled the workflows, verified the logs to confirm that all relevant jobs completed successfully and on time, and provided production support to resolve production issues.
- Designed and developed a Big Data analytics platform for processing customer viewing preferences and social media comments using Java, Hadoop, Hive and Pig, providing an ETL solution for the requirements on Hadoop.
Environment: Talend Platform for Big Data 6.2, Talend Open Studio 5.0.1, Cognos Data Manager, Cognos 10.2.2, UNIX, Oracle 12c, TAC (Talend Administration Center), SQL Server, TOAD, Autosys, Spark, XML files, MongoDB, Flat files, HL7 files, JSON, AWS, HDFS, Hive, HBase, Agile Methodology, Tableau, SSRS, SQL, PL/SQL, T-SQL, Teradata, Netezza, Aginity, SQL Assistant, UNIX Shell Scripting and Cloudera Manager.
Confidential, Chicago, IL
- Worked closely with Business Analysts to review the business specifications of the project and to gather the ETL requirements.
- Created Talend jobs to copy files from one server to another using Talend FTP components.
- Created and managed source-to-target mapping documents for all fact and dimension tables, analyzed the source data quality using Talend Data Quality, and wrote SQL queries using joins to access data from Oracle and MySQL.
- Prepared ETL mapping documents for every mapping and a data migration document for a smooth transfer of the project from development to testing and then to production.
- Designed and implemented ETL for data loads from heterogeneous sources into SQL Server and Oracle target databases, including fact tables and Slowly Changing Dimensions (SCD Type 1 and SCD Type 2).
- Utilized Big Data components such as tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumns, tPigStoreResult, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport and tSqoopExport.
- Used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput and tHashOutput, among many others).
- Created many complex ETL jobs for data exchange to and from database servers and various other systems, including RDBMS, XML, CSV and flat file structures, and used Talend's debug mode to debug jobs and fix errors.
- Responsible for developing, supporting and maintaining the ETL (Extract, Transform and Load) processes using Talend Integration Suite.
- Conducted JAD sessions with business users and SMEs for a better understanding of the reporting requirements.
- Developed Talend jobs to populate claims data into the data warehouse star schema, and used the Talend Admin Console Job Conductor to schedule ETL jobs on a daily, weekly, monthly and yearly basis.
- Worked with various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList and tDie, and worked extensively with Talend Admin Console, scheduling jobs in Job Conductor.
Environment: Talend Data Integration 5.5.1, Talend Enterprise Big Data Edition 5.1, Talend Administrator Console, MS SQL Server 2012/2008, Oracle 11g, Hive, HDFS, Sqoop, TOAD, UNIX.
Confidential, Chicago, IL & India
- Participated in requirement analysis with the help of business model and functional model and wrote documentation to describe program development, logic, coding, testing, changes and corrections.
- Created complex mappings using various transformations like Transaction control, SQL Transformations, etc.
- Wrote PL/SQL stored procedures, triggers and cursors to implement business rules and transformations, and created complex T-SQL queries and functions.
- Provided support to develop the entire warehouse architecture and planned the ETL process.
- Extracted data from flat files, XML files and Oracle, and applied business logic to load them into the central Oracle database.
- Involved in migrating maps from IDQ to PowerCenter, and applied the rules and profiled the source and target table data using IDQ.
- Developed and maintained ETL (Extract, Transformation and Loading) mappings to extract the data from multiple source systems like Oracle, SQL server and Flat files and loaded into Oracle.
- Performance-tuned various mappings, sources, targets and transformations by optimizing caches for Lookup, Joiner, Rank, Aggregator and Sorter transformations; tuned Informatica session performance for data files by increasing the buffer block size, data cache size and sequence buffer length; and used an optimized target-based commit interval and pipeline partitioning to speed up mapping execution.
- Populated or refreshed Teradata tables using FastLoad, MultiLoad and FastExport utilities for user acceptance testing, and wrote SQL queries and PL/SQL procedures to perform database operations according to business requirements.
- Created dedicated mappings in Informatica to load data from external sources into the landing tables of the MDM hub.
- Developed mappings, reusable objects, transformations and mapplets using Mapping Designer, Transformation Developer and Mapplet Designer in Informatica PowerCenter.
- Monitored and tuned the ETL repository and system for performance improvements, and created folders, users, repositories and deployment groups using Repository Manager.
- Extensively used Netezza utilities like NZLOAD and NZSQL and loaded data directly from Oracle to Netezza without any intermediate files.
- Defined the content, structure and quality of highly complex data structures using Informatica Data Explorer (IDE).
- Implemented slowly changing dimension to maintain current information and history information in dimension tables.
- Generated SAP BusinessObjects reports involving complex queries, subqueries, unions and intersections.
- Primary activities included data analysis, identifying and implementing data quality rules in IDQ, linking those rules to the PowerCenter ETL process, and delivering data to other data consumers.
- Designed and developed an ETL strategy to populate the data warehouse from various source systems such as Oracle, Teradata, Netezza, flat files, XML and SQL Server.
- Responsibilities included designing and developing complex mappings using Informatica PowerCenter and Informatica Developer (IDQ), and working extensively with the Address Validator transformation in Informatica Developer (IDQ).
- Wrote SQL queries to check the consistency of the data in the tables and to update the tables per business requirements.
- Created jobs and job streams in the Autosys scheduling tool to schedule Informatica, SQL script and shell script jobs.
- Implemented real-time Change Data Capture (CDC) for Salesforce.com (SFDC) sources using Informatica PowerCenter, and implemented Slowly Changing Dimensions by applying INSERT else UPDATE to target tables.
- Designed complex mappings in PowerCenter Designer using Aggregator, Expression, Filter, Sequence Generator, Update Strategy, Union, Lookup, Joiner, XML Source Qualifier and Stored Procedure transformations.
- Proposed PL/SQL and UNIX Shell Scripts for scheduling the sessions in Informatica.
- Created mappings to load data from different base objects in MDM into a single flat structure in Informatica Developer.
- Worked with the reporting team, using the BusinessObjects BI interface, to support business improvement.
Environment: Informatica PowerCenter 9.3/5 (PowerCenter Designer, Workflow Manager, Workflow Monitor), Oracle 11g, IDQ, SQL Server 2010, MDM, Teradata, PL/SQL, TOAD, Informatica Scheduler, Netezza, Teradata SQL Assistant, SQL, SSRS, UNIX, Shell Scripting, Autosys, SAP, T-SQL