- Over 7 years of IT experience in analysis, design, development, testing, and implementation of business application systems.
- Highly skilled ETL engineer with 7 years of software development experience using SQL, Informatica, and Talend.
- 3+ years of experience with Talend Enterprise Edition for Big Data / data integration.
- Experience in Talend administration, installation, and configuration; worked extensively with Talend Big Data to load data into HDFS, S3, Hive, and Redshift.
- Experience with Talend administration: creating projects and users, assigning projects to users, and scheduling jobs.
- Experienced in ETL methodology for data migration, extraction, transformation, and loading using Talend; designed data conversions from a large variety of source systems, including Oracle 10g/9i/8i/7.x, DB2, Netezza, SQL Server, Teradata, and Hive, as well as non-relational sources such as delimited files, positional files, flat files, and XML.
- Expertise in creating mappings using Talend DI / Big Data components such as tS3Configuration, tSqlRow, tHDFSConfiguration, tRedshiftInput, tFileInputParquet, tFileOutputParquet, tFilterRow, tMap, tJoin, tS3Put, tS3Get, tS3Copy, tFileList, tJava, tAggregateRow, tDie, tLogRow, tUniqRow, and tAmazonEMRManage.
- Involved in code migrations across Dev, QA, CERT, and Production environments and provided operational instructions for deployments.
- Implemented stand-alone installation, file system management, backups, process control, user administration and device management in a networked environment.
- Experience in developing stored procedures, functions, and SQL queries using SQL Server.
- Experienced in integrating various data sources, such as Oracle SQL, AWS (S3, EMR, Redshift, Aurora), Netezza, SQL Server, and MS Access, into a staging area.
- Experienced in all phases of data warehouse development, from requirements gathering through code development, unit testing, and documentation.
- Extensive experience writing UNIX shell scripts and automating ETL processes with shell scripting; used Netezza utilities to load data and execute SQL scripts from UNIX.
- 6+ months as a Scrum Master: set up meetings and demo sessions and maintained the Confluence page based on requirements gathered from BAs.
- Experience in data migration, consolidating data from different applications into a single application.
- Responsible for data migration from SQL Server to Redshift databases.
- Experienced in batch scripting on Windows and worked extensively with slowly changing dimensions (SCD) using Talend.
- Hands-on experience across all stages of Software Development Life Cycle (SDLC) including business requirement analysis, data mapping, build, unit testing, system integration and user acceptance testing.
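The slowly-changing-dimension work above can be sketched in plain Python. This is a minimal SCD Type 2 illustration only; the function name, row shape, and column names are hypothetical, not from any project described here.

```python
from datetime import date

def scd2_apply(dim_rows, incoming, key, today=None):
    """Minimal SCD Type 2 merge: expire changed rows, insert new versions.

    dim_rows: current dimension rows (dicts with is_current/start_date/end_date).
    incoming: latest source rows keyed by `key`.
    """
    today = today or date.today().isoformat()
    incoming_by_key = {r[key]: r for r in incoming}
    out, seen = [], set()
    for row in dim_rows:
        if row["is_current"] and row[key] in incoming_by_key:
            new = incoming_by_key[row[key]]
            # compare only the attributes present in the incoming record
            changed = any(row[c] != new[c] for c in new if c != key)
            if changed:
                # expire the current version, then insert the new version
                out.append({**row, "is_current": False, "end_date": today})
                out.append({**new, "is_current": True,
                            "start_date": today, "end_date": None})
            else:
                out.append(row)
            seen.add(row[key])
        else:
            out.append(row)
    # brand-new business keys become new current rows
    for k, new in incoming_by_key.items():
        if k not in seen:
            out.append({**new, "is_current": True,
                        "start_date": today, "end_date": None})
    return out
```

In a Talend job this logic is typically handled by the SCD components or a tMap with lookup; the sketch only shows the expire-and-insert rule they implement.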
Operating Systems: Windows 2008/7, UNIX, Linux.
Data warehousing: Talend DI, Talend Big Data, AWS (S3, EMR, Redshift, Aurora, Lambda, SNS); Informatica PowerCenter 7.x (Designer, Workflow Manager, Workflow Monitor, Repository Manager).
Databases: Redshift, Aurora, Oracle 12c/11g/10g, MS SQL Server 2012/2008/2005, DB2 v8.1, Netezza.
Methodologies: Agile, Waterfall
Languages: SQL, UNIX shell scripting, C++.
Scheduling Tools: TAC (Talend Administration Center), Control-M
Confidential, Reston, VA
Talend Big Data Developer
- Designed, developed, and improved databases and ETL processes within the scope of application development.
- Worked on API implementation.
- Created joblets, subjobs, and routines for code reusability.
- Created and managed Source to Target mapping documents for all Facts and Dimension tables.
- Created implicit, local, and global context variables in jobs.
- Broad design, development, and testing experience with Talend Integration Suite; knowledgeable in performance tuning of mappings.
- Used Talend components such as tMap, tDie, tConvertType, tFlowMeter, tParallelize, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, tCacheInput, tCacheOutput, tSqlRow, tRESTClient, and many more.
- Worked on error-handling techniques and tuned ETL flows for better performance.
- Wrote Spark SQL to apply complex business logic in Talend Big Data batch jobs using the tSqlRow component.
- Created DataFrames in Hadoop to execute automated test scripts.
- Loaded data to S3, and from S3 to Redshift using COPY commands.
- Implemented Change Data Capture (CDC) in Talend to load deltas to a data warehouse.
- Wrote ETL jobs to read from web APIs using REST/HTTP calls and load into HDFS using Java and Talend.
- Developed jobs to move inbound files to vendor server locations on monthly, weekly, and daily frequencies.
- Configured EMR Hadoop clusters to run complex Talend Spark jobs on large data sets, provisioning the necessary nodes and memory for the cluster.
- Designed Talend jobs with the tParallelize component to reduce run time.
- Used ETL methodologies and best practices to create Talend ETL jobs. Followed and enhanced programming and naming standards.
- Developed stored procedures to automate the testing process, easing QA effort and reducing test timelines for data comparison across tables.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Used a GitHub repository for code version control, Jira to build and track current work and backlog stories, and Scrum/Kanban agile project management methods to track project progress.
Environment: Talend 7.1, AWS (S3, Lambda, SNS, Step Functions, etc.), Redshift, Parquet files, Git, Jenkins, Spark SQL, PuTTY, API, JIRA.
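The S3-to-Redshift load step described above can be illustrated with a small helper that assembles the COPY statement. The table name, S3 path, and IAM role ARN below are placeholders, not values from any actual environment.

```python
def build_redshift_copy(table, s3_path, iam_role, fmt="PARQUET"):
    """Build a Redshift COPY statement for loading a file set from S3.

    In practice this statement would be executed through a Redshift
    connection (e.g. a Talend tRedshiftRow component); here we only
    construct the SQL text.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt};"
    )
```

For delimited inputs, `fmt` would instead carry options such as `CSV`; Parquet is shown because the environment above lists Parquet files.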
Confidential, Lombard, IL
Talend Big Data Developer
- Involved in creating jobs to ingest multiple client data sets using Talend Data Integration (DI) and Talend Big Data Spark job components.
- Created joblets and subjobs for code reusability.
- Used tHDFSConfiguration and tS3Configuration components to access data from S3 and HDFS when running jobs on Amazon EMR.
- Used the tAmazonEMRManage component to launch and shut down EMR clusters from Talend, ran Talend Big Data Spark jobs on the launched clusters, and created Parquet files on S3 from delimited, Excel, and gzip files, loading the data into S3, Redshift, and Aurora.
- Stored and read Parquet and CSV files using tFileOutputParquet, tFileInputParquet, tFileInputDelimited, and tFileOutputDelimited.
- Involved in running jobs on TAC from Amazon S3 events using Lambda functions.
- Performed data manipulations using various Talend components such as tMap, tSqlRow, tSchemaComplianceCheck, tFilterRow, tUniqRow, tJavaRow, tJava, tAmazonRedshift, tOracleRow, tOracleInput, tOracleOutput, and many more.
- Extracted appropriate features from data sets to handle bad or partial records using Spark SQL.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Implemented a mechanism to start an EMR cluster based on input file size, using a Lambda function and S3 events.
- Used Zeppelin and a Parquet viewer for data analytics and visualization.
- Developed jobs to extract data from Redshift using the UNLOAD command and load data into Redshift using the COPY command via the tRedshiftRow component.
- Built Talend Big Data Spark jobs to process large volumes of data, perform the necessary transformations, and roll up large raw files.
- Experience triggering Spark applications on EMR upon client file arrival at the source location.
- Experience using S3 components (tS3Get, tS3Put, tS3Copy, tS3List, and tS3Delete).
- Monitored and supported Talend jobs scheduled through the Talend Administration Center (TAC).
- Involved in creating S3 event-based trigger jobs in Talend.
- Experienced in using tCacheInput and tCacheOutput components to load data into memory, along with other persistence techniques, to improve job performance.
- Involved in deploying jobs across all environments (DEV/QA/UAT/PROD) using Talend Repository Manager (TRM) and in resolving production issues.
Environment: Talend Big Data 6.3.1/7.1.1, AWS Redshift, S3, Aurora RDS, Spark, Oracle 12c, pipe-delimited/positional files/Parquet, SQL Workbench, SQL Developer, PuTTY, WinSCP, FileZilla, Zeppelin, JIRA, SVN, UNIX scripting, Agile.
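The file-size-based EMR trigger mentioned above can be sketched as the decision logic inside an S3-event Lambda. The bucket layout, threshold, and key names here are illustrative placeholders; the actual cluster launch (a boto3 `run_job_flow` call) is deliberately omitted.

```python
# Example threshold: only spin up an EMR cluster for large inputs.
SIZE_THRESHOLD_BYTES = 512 * 1024 * 1024  # 512 MB, illustrative value

def should_start_emr(s3_event, threshold=SIZE_THRESHOLD_BYTES):
    """Inspect an S3 PUT event and return the object keys large enough
    to justify starting a cluster.

    In a real Lambda handler, a non-empty result would lead to an EMR
    launch via boto3 (omitted here to keep the sketch self-contained).
    """
    big_objects = []
    for record in s3_event.get("Records", []):
        obj = record["s3"]["object"]
        if obj["size"] > threshold:
            big_objects.append(obj["key"])
    return big_objects
```

Keeping the size check as a pure function makes the trigger logic unit-testable without any AWS dependency.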