- Around 10 years of experience in data analysis, data profiling, and report development using Talend Studio (Big Data Integration / Enterprise editions), Tableau, Jasper, Oracle SQL, SQL Server, Amazon S3, Redshift, and Hadoop ecosystem tools such as Hive, Hue, Spark SQL, Sqoop, and Impala, as well as Epic data sources.
- Experience working in various industry sectors, including core banking, retail, telecommunications, and healthcare.
- Proficient in writing complex Spark jobs/programs that build DataFrames for analyzing disparate datasets in HDFS.
- Working knowledge of, and hands-on practice in, an Agile development environment.
- Proficient in loading the financial services data mart (FSDM) and in data lake designs in Hadoop using Sqoop.
- Good hands-on experience developing Talend DI jobs that transfer data from source views to Hadoop staging and target layers to support fraud identification analysis of transactions.
- Strong in transferring data from relational databases to cloud platforms such as Amazon S3 and Redshift using Talend Big Data Spark jobs.
- Extensively used external and managed tables in Hive when transforming data from multiple source systems into HDFS.
- Extended Hive and Pig core functionality by writing custom UDFs, UDTFs, and UDAFs to handle data according to business requirements.
- Worked in L2-tier production support, monitoring jobs in the CA7 scheduler, fixing data load issues within the given timelines, and documenting resolutions for future reference.
- Experienced in using Flume to gather real-time data from multiple source systems, such as web servers and social media, and load it into HDFS for further business analysis.
- Hands-on experience with YARN, configuring and maintaining job schedulers.
- Extensive knowledge of dimensional data modeling, star schema, snowflake schema, normalization, and physical and logical data modeling.
- Hands-on experience working with Hive, PostgreSQL, and the NoSQL database Cassandra.
- Expertise in processing semi-structured data (XML, JSON, RC, and CSV) in Hive/Impala using the Talend ETL tool.
- Experience writing Spark SQL scripts using the Python interpreter.
- Created and maintained the Talend job run book to trigger Hive data transfer jobs in HDFS through the CA scheduler.
- Developed POC projects, writing SQL scripts and queries to extract data from various data sources into BI tools, visualization tools, and Excel reports.
- Built and expanded report availability for both internal support and external customer/carrier reports, including increasing the scope of the data extracted and used for reporting.
- Created numerous ad-hoc analyses, dashboards, reports, graphs, and charts for management to support quick decision-making.
- Experienced in unit testing; hands-on with complex SQL queries in Hive, per business requirements, to validate data loaded by Sqoop and by ETL tools such as Talend.
- Worked extensively with HP Application Lifecycle Management (ALM 12) to track execution and defects and to pull the traceability matrix.
- Excellent communicator, with the ability to tell a story through data.
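Hive UDFs of the kind mentioned above are typically written in Java, but the same row-level custom logic can be prototyped as a Python streaming script used with Hive's TRANSFORM clause. A minimal sketch, with an entirely hypothetical column layout and banding rule:

```python
# Illustrative Python streaming script for Hive's TRANSFORM clause
# (a lightweight alternative to a Java UDF; columns are hypothetical).
# Hive would invoke it roughly as:
#   SELECT TRANSFORM(txn_id, amount) USING 'python band_amounts.py'
#   AS (txn_id, amount_band) FROM transactions;
import sys

def amount_band(amount: float) -> str:
    """Bucket a transaction amount into a coarse band."""
    if amount < 100:
        return "LOW"
    if amount < 10_000:
        return "MEDIUM"
    return "HIGH"

def main(stream=sys.stdin, out=sys.stdout):
    # Hive streams each input row as one tab-separated line on stdin
    # and reads tab-separated output rows back from stdout.
    for line in stream:
        txn_id, amount = line.rstrip("\n").split("\t")
        out.write(f"{txn_id}\t{amount_band(float(amount))}\n")
```

The script is deployed to the cluster with `ADD FILE` before the query runs; because it only reads stdin and writes stdout, it can be unit tested without a Hive installation.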
ETL/ Talend Developer
Confidential, Atlanta, GA
- Worked closely with Business Analysts to review the business specifications of the project and to gather the ETL requirements.
- Created Talend jobs to copy files from one server to another, utilizing Talend FTP components.
- Created and managed source-to-target mapping documents for all fact and dimension tables.
- Analyzed source data to assess data quality using Talend Data Quality.
- Involved in writing SQL queries, using joins to access data from Oracle and MySQL.
- Assisted in migrating the existing data center into the AWS environment.
- Prepared ETL mapping documents for every mapping, and data migration documents for smooth transfer of the project from the development environment to testing and then to production.
- Designed and implemented ETL for data loads from heterogeneous sources to SQL Server and Oracle target databases, covering fact tables and Slowly Changing Dimensions (SCD Type 1 and Type 2).
- Utilized Big Data components like tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHbaseInput, tHbaseOutput, tSqoopImport and tSqoopExport.
- Used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and many more).
- Created many complex ETL jobs for data exchange from and to Database Server and various other systems including RDBMS, XML, CSV, and Flat file structures.
- Experienced in using debug mode of Talend to debug a job to fix errors.
- Responsible for developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Talend Integration Suite.
- Conducted JAD sessions with business users and SMEs for a better understanding of the reporting requirements.
- Developed Talend jobs to populate the claims data to data warehouse - star schema.
- Used Talend Admin Console Job conductor to schedule ETL Jobs on daily, weekly, monthly and yearly basis.
- Worked on various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie etc.
- Worked extensively in the Talend Admin Console and scheduled jobs in Job Conductor.
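The SCD Type 2 loads above are built with Talend's SCD components, but the underlying change-handling logic can be sketched in a few lines of plain Python. This is an illustration only; the row layout (`nk`, `attrs`, `start_date`, `end_date`) and the function name are hypothetical:

```python
from datetime import date

# Minimal sketch of SCD Type 2 logic: when a tracked attribute changes,
# expire the current version of the row and append a new current version.

def apply_scd2(dim_rows: list, nk: str, new_attrs: dict, load_date: date) -> list:
    """Close the current row for natural key `nk` if its attributes
    changed, then append a new row effective from `load_date`."""
    for row in dim_rows:
        if row["nk"] == nk and row["end_date"] is None:
            if row["attrs"] == new_attrs:
                return dim_rows            # no change: keep current version
            row["end_date"] = load_date    # expire the old version
            break
    dim_rows.append({"nk": nk, "attrs": new_attrs,
                     "start_date": load_date, "end_date": None})
    return dim_rows
```

A Type 1 load, by contrast, would simply overwrite `attrs` in place, losing history; keeping both behaviors behind the same natural key is what the Talend SCD components automate.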
Environment: Talend Enterprise Big Data Edition 5.1, Talend Administrator Console, MS SQL Server 2012/2008, Oracle 11g, Hive, HDFS, Sqoop, TOAD, UNIX Enterprise Platform for Data integration.
Confidential, Irving, TX
- Developed an ETL process to load Oracle data into a SQL Server system using the following Talend components:
- Oracle Components - tOracleConnection, tOracleInput, tOracleBulkExec
- Worked on Linux system (Red Hat) to deploy the Talend code.
- Deployed the code to other machines using shell scripts.
- Worked extensively on SQL Queries for validating the records.
- Paginated SQL statements in the ETL flow to avoid memory issues and improve performance.
- Handled deadlock errors encountered while updating SQL Server tables in the ETL flow.
- Parameterized the overall workflow so the code could execute in different environments.
- Parallelized workflows to reduce execution time.
- Developed ETL mappings.
- Developed and tested all the backend programs, Informatica mappings and update processes.
- Developed Informatica mappings to load data into various dimensions and fact tables from various source systems.
- Worked with Informatica PowerCenter Designer tools: Source Analyzer, Target Designer, Transformation Developer, Mapping Designer, and Mapplet Designer.
- Worked with Informatica PowerCenter Workflow Manager tools: Task Developer, Workflow Designer, and Worklet Designer.
- Designed and developed medium-to-complex Informatica PowerCenter mappings using transformations such as Source Qualifier, Aggregator, Expression, Lookup, Filter, Router, Rank, Sequence Generator, Stored Procedure, and Update Strategy.
- Worked as a key project resource taking day-to-day work direction and accepting accountability for technical aspects of development.
- Developed the Business rules for cleansing, validating and standardization of data using Informatica Data Quality.
- Designed and developed multiple reusable cleanse components.
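The SQL pagination mentioned above can be illustrated with keyset pagination, which keeps memory flat by fetching one page per query in primary-key order. A minimal sketch using Python's built-in sqlite3 as a stand-in database; the table and column names are hypothetical:

```python
import sqlite3

# Sketch of keyset (seek-based) pagination for a large extract: each query
# resumes after the last key seen, so no full result set is held in memory.

def paged_rows(conn, page_size=1000):
    """Yield rows from the staging table in primary-key order, one page per query."""
    last_id = 0
    while True:
        page = conn.execute(
            "SELECT id, payload FROM staging WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size)).fetchall()
        if not page:
            return
        yield from page
        last_id = page[-1][0]  # resume after the last key of this page
```

Keyset pagination avoids the growing scan cost of LIMIT/OFFSET on deep pages, which is one reason it is often preferred for large ETL extracts.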
Environment: Talend Open Studio 5.0.1, Informatica Power center, UNIX, SQL Server, TOAD, AutoSys.
- Worked with Informatica tools: Source Analyzer, Data Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
- Interacted with business users and managers to gather business requirements.
- Created mappings using Informatica Designer to build business rules to load data.
- Used most of the common transformations, such as Source Qualifier, Aggregator, Lookup, Router, Filter, Sequence Generator, Expression, Joiner, and Update Strategy.
- Developed and modified mappings according to the business logic.
- Created mapping variables, mapping parameters, and session parameters.
- Recovering Data from Failed sessions and Workflows.
- Created workflows and sessions using Workflow Manager.
- Performed coding and debugging, sorting out technical problems as they arose.
- Analyzed the types of business rules and transformations to be applied.
- Tuned Informatica session performance for large data files by increasing block size, data cache size, and the target-based commit interval. Worked with shortcuts for various PowerCenter objects, such as sources, targets, and transformations.
- Involved in writing the Unit Test Cases using SQL.
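SQL-based unit test cases like those above typically assert data-quality invariants after a load. A hedged sketch using sqlite3 as a stand-in; the schema, table names, and checks are hypothetical illustrations of the pattern:

```python
import sqlite3

# Illustrative unit-test-style SQL checks run after a load: each query
# counts violations of an invariant, so every result should be zero.

def run_load_checks(conn) -> dict:
    checks = {
        # every fact row must join to a dimension row (no orphans)
        "orphan_facts": "SELECT COUNT(*) FROM fact_sales f "
                        "LEFT JOIN dim_customer d ON f.cust_id = d.cust_id "
                        "WHERE d.cust_id IS NULL",
        # the natural key must be unique within the dimension
        "dup_dim_keys": "SELECT COUNT(*) FROM (SELECT cust_id FROM dim_customer "
                        "GROUP BY cust_id HAVING COUNT(*) > 1)",
    }
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
```

In practice the same queries run against the warehouse database, with a nonzero count failing the test case and blocking promotion of the load.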
Environment: Informatica Power Center 8.6, Oracle 9i, SQL/PLSQL, TOAD, UNIX, and Windows XP
- Prepared detailed ETL design documents explaining the mapping logic in technical as well as business terminology.
- Attended business user sessions to request/validate the enhancements for the current project.
- Proven excellent data analysis skills, with a thorough understanding of data flow all the way from the various source systems (Oracle and flat files) into the data marts.
- Reviewed new and existing systems design projects and procurement plans for compliance with standards and architectural plans; ensured that solutions across platforms adhered to the enterprise architecture road map and supported both short-term tactical plans and long-term strategic goals.
- Attended sessions on architecture, quality, governance, and management assessment enhancements for client contact data.
- Developed complex mappings to format, parse and load the data by applying complex business rules using transformations like Source Qualifier, Expression, Aggregator, Joiner, Connected and Unconnected lookups, Router, Normalizer and Update strategy.
- Extensively used the reusable transformations and mapplets for faster development and standardization.
- Implemented the Slowly Changing Dimension Type 1 and Type 2 approaches for loading dimension and fact tables.
- Used a Type II SCD, implemented via date ranges, to store unique person data (names, addresses, etc.) in one of the dimension tables of the HR data mart, a star-schema layout on Oracle 11g.
- Modified the existing mappings and mapplets based on user change requests to implement new business logic and improve the session performance.
- Performed code migrations as part of Weekly/Monthly Change request releases.
- Created workflows and worklets with parallel and sequential sessions that extract, transform, and load data to one or more targets.
- Tuned existing code, by partitioning the sources, optimizing the SQL query in Source Qualifier and Lookup transformations, rearranging jobs in the schedule based on dependencies, rearranging sessions in the workflow to improve session performance.
- Created effective test data, developed and performed thorough Unit test cases to ensure successful execution of the data loading processes.
- Supported various test cycles, including integration, system, user acceptance, and data warehouse testing.
- Conducted workflow analysis and suggested modifications through WCC GUI applications.
- Designed AutoSys based solutions for communication of issues to technical teams.
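The date-range Type II dimension described above supports point-in-time lookups: given a natural key and an as-of date, return the version of the row that was effective then. A minimal sketch in plain Python; the row layout and field names are hypothetical:

```python
from datetime import date

# Sketch of the point-in-time lookup a date-range Type II SCD supports.
# Convention assumed here: a row is effective on [start_date, end_date),
# and end_date of None marks the current version.

def version_as_of(rows, nk, as_of: date):
    """Return the dimension row for natural key `nk` effective on `as_of`,
    or None if no version covers that date."""
    for row in rows:
        if (row["nk"] == nk and row["start_date"] <= as_of
                and (row["end_date"] is None or as_of < row["end_date"])):
            return row
    return None
```

In the warehouse itself the same lookup is a SQL predicate (`start_date <= :as_of AND (:as_of < end_date OR end_date IS NULL)`) inside the fact-load join, typically via a Lookup transformation.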
Environment: Informatica Power Center 9.1, Oracle 11g/10g, Flat Files, Sun Solaris, AutoSys, SQL, UNIX Shell Scripting, Toad, SVN, Windows 7.0 and MS Office Suite.