
Talend Big Data Developer Resume


Charlotte, NC

SUMMARY

  • 6+ years of IT experience in the analysis, design, development, testing, and implementation of business application systems.
  • Highly skilled ETL engineer with 6+ years of software development experience in SQL, Informatica, and Talend.
  • 3+ years' experience on Talend ETL Enterprise Edition for Big Data/Data Integration.
  • Experience in Talend administration, installation, and configuration; worked extensively with Talend Big Data to load data into HDFS, S3, Hive, and Redshift.
  • Experience with Talend administration: creating projects and users, assigning projects to users, and scheduling jobs.
  • Experienced in ETL methodology for data migration, extraction, transformation, and loading using Talend; designed data conversions from a wide variety of source systems including Oracle 10g/9i/8i/7.x, DB2, Netezza, SQL Server, Teradata, and Hive, as well as non-relational sources such as delimited files, positional files, flat files, and XML.
  • Expertise in creating mappings using Talend DI/Big Data components such as tS3Configuration, tSQLRow, tHDFSConfiguration, tRedshiftInput, tFileInputParquet, tFileOutputParquet, tFilterRow, tMap, tJoin, tS3Put, tS3Get, tS3Copy, tFileList, tJava, tAggregateRow, tDie, tLogRow, tUniqRow, tAmazonEMRManage, etc.
  • Involved in code migrations across Dev, QA, CERT, and Production environments, providing operational instructions for deployments.
  • Hands-on experience with the Hadoop technology stack (HDFS, MapReduce, Hive, and Spark).
  • Implemented stand-alone installation, file system management, backups, process control, user administration, and device management in a networked environment.
  • Experience in developing stored procedures, functions, and SQL queries using SQL Server.
  • Experienced in integrating various data sources such as Oracle, AWS (S3, EMR, Redshift, Aurora), Netezza, SQL Server, and MS Access into a staging area.
  • Experienced in all phases of data warehouse development, from requirements gathering through code development, unit testing, and documentation.
  • Extensive experience writing UNIX shell scripts, automating ETL processes with shell scripting, and using Netezza utilities to load data and execute SQL scripts from UNIX.
  • 6+ months as a Scrum Master: set up meetings and demo sessions and maintained the Confluence page according to requirements gathered from the BA.
  • Experience migrating data from multiple applications into a single application.
  • Responsible for data migration from SQL Server to Redshift databases.
  • Experienced in batch scripting on Windows; worked extensively with slowly changing dimensions (SCD) using Talend.
  • Hands-on experience across all stages of the Software Development Life Cycle (SDLC), including business requirement analysis, data mapping, build, unit testing, system integration, and user acceptance testing.
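The slowly-changing-dimension work noted above (Type 2 style) can be sketched as plain logic outside of Talend; the row layout and field names below are illustrative assumptions, not the actual job design.

```python
from datetime import date

# Minimal SCD Type 2 sketch (hypothetical columns): each dimension row keeps
# effective-from / effective-to dates and a current flag. An incoming record
# with changed attributes expires the current row and appends a new version.
HIGH_DATE = date(9999, 12, 31)

def scd2_merge(dimension, incoming, as_of):
    """dimension: list of dicts with 'key', 'attrs', 'from', 'to', 'current'.
    incoming: list of dicts with 'key' and 'attrs'. Returns the updated dimension."""
    current = {row["key"]: row for row in dimension if row["current"]}
    for rec in incoming:
        old = current.get(rec["key"])
        if old is not None and old["attrs"] == rec["attrs"]:
            continue  # no attribute change: keep the existing current row
        if old is not None:
            old["to"] = as_of       # expire the changed row
            old["current"] = False
        dimension.append({"key": rec["key"], "attrs": rec["attrs"],
                          "from": as_of, "to": HIGH_DATE, "current": True})
    return dimension
```

In a Talend job this comparison is typically wired with lookup and output components rather than hand-written, but the expire-and-insert shape is the same.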

TECHNICAL SKILLS

Operating Systems: Windows 2008/2007, UNIX, Linux.

Data warehousing: Talend DI, Talend Big Data, AWS (S3, EMR, Redshift, Aurora, Lambda, SNS topics), Informatica PowerCenter 7.x (Designer, Workflow Manager, Workflow Monitor, Repository Manager).

Databases: Redshift, Aurora, Oracle 12c/11g/10g, MS SQL Server 2005, DB2 v8.1, Netezza.

Methodologies: Agile, Waterfall

Languages: SQL, UNIX shell scripting, C++.

Scheduling Tools: TAC (Talend Administrator Console), Control-M

PROFESSIONAL EXPERIENCE

Confidential, Charlotte NC

Talend Big Data Developer

Responsibilities:

  • Designed, developed, and improved databases and ETL processes within the scope of application development.
  • Worked on API implementation.
  • Created joblets, subjobs, and routines for code reusability.
  • Created and managed source-to-target mapping documents for all fact and dimension tables.
  • Created implicit, local, and global context variables in jobs.
  • Broad design, development, and testing experience with Talend Integration Suite; knowledgeable in performance tuning of mappings.
  • Used Talend components such as tMap, tDie, tConvertType, tFlowMeter, tParallelize, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, tCacheInput, tCacheOutput, tSQLRow, tRESTClient, and many more.
  • Worked on error-handling techniques and tuned the ETL flow for better performance.
  • Wrote Spark SQL to apply complex business logic in Talend Big Data batch jobs using the tSQLRow component.
  • Created DataFrames in Hadoop to execute automated test scripts.
  • Loaded data to S3 and from S3 into Redshift using COPY commands.
  • Implemented Change Data Capture (CDC) in Talend to load deltas into a data warehouse.
  • Wrote ETL jobs that read from web APIs via REST/HTTP calls and loaded the data into HDFS using Java and Talend.
  • Developed jobs to move inbound files to the vendor server location on monthly, weekly, and daily frequencies.
  • Configured EMR Hadoop clusters to run complex Talend Spark jobs on large data sets, provisioning the necessary nodes and memory for the cluster.
  • Designed Talend jobs with the tParallelize component to reduce run time.
  • Used ETL methodologies and best practices to create Talend ETL jobs; followed and enhanced programming and naming standards.
  • Developed stored procedures to automate the testing process, easing QA efforts and reducing test timelines for data comparison on tables.
  • Involved in story-driven Agile development and actively participated in daily scrum meetings.
  • Used a GitHub repository for code version control, Jira to build and track current work and backlog stories, and Scrum/sprint and Kanban Agile project management methods to track project progress.
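The S3-to-Redshift loads above rely on Redshift's COPY command; a minimal sketch of assembling such a statement follows. The table, path, and IAM role strings are hypothetical placeholders.

```python
def build_redshift_copy(table, s3_path, iam_role, file_format="PARQUET"):
    """Build a Redshift COPY statement for loading a table from S3.
    All identifiers here are illustrative; in a Talend job the resulting
    SQL would typically be executed through a component like tRedshiftRow."""
    if file_format == "PARQUET":
        options = "FORMAT AS PARQUET"
    else:
        # assume gzip-compressed pipe-delimited text for other formats
        options = "DELIMITER '|' GZIP"
    return (f"COPY {table} FROM '{s3_path}' "
            f"IAM_ROLE '{iam_role}' {options};")
```

In production the statement usually runs inside a transaction, often loading a staging table that is then merged into the target.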

Environment: Talend 7.1, AWS (S3, Lambda, SNS, Step Functions, etc.), Redshift, Parquet files, Git, Jenkins, Spark SQL, PuTTY, API, JIRA.

Confidential, Columbus, Ohio

ETL Developer

Responsibilities:

  • Involved in creating jobs to ingest multiple client data sets using Talend Data Integration (DI) and Talend Big Data Spark job components.
  • Created joblets and subjobs for code reusability.
  • Used tHDFSConfiguration and tS3Configuration components to access data in S3 and HDFS when running jobs on Amazon EMR.
  • Used the tAmazonEMRManage component to launch and shut down EMR clusters from Talend, ran Talend Big Data Spark jobs on the launched clusters, created Parquet files on S3 from delimited, Excel, and gzip files, and loaded the data into S3, Redshift, and Aurora.
  • Stored and read Parquet and CSV files using tFileOutputParquet, tFileInputParquet, tFileInputDelimited, and tFileOutputDelimited.
  • Involved in running jobs on TAC triggered by Amazon S3 events via Lambda functions.
  • Performed data manipulations using various Talend components such as tMap, tSQLRow, tSchemaComplianceCheck, tFilterRow, tUniqRow, tJavaRow, tJava, tAmazonRedshift, tOracleRow, tOracleInput, tOracleOutput, and many more.
  • Extracted appropriate features from data sets to handle bad and partial records using Spark SQL.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented a mechanism to start an EMR cluster based on the input file size, using a Lambda function and S3 events.
  • Experience using Zeppelin and a Parquet viewer for data analytics and data visualization.
  • Developed jobs to extract data from Redshift using the UNLOAD command and to load data into Redshift using the COPY command via the tRedshiftRow component.
  • Built Talend Big Data Spark jobs to process large volumes of data, performing the necessary transformations and rollups on huge raw files.
  • Experience triggering Spark applications on EMR upon file arrival from the client at the source location.
  • Experience using S3 components (tS3Get, tS3Put, tS3Copy, tS3List, and tS3Delete).
  • Monitored and supported Talend jobs scheduled through Talend Administration Center (TAC).
  • Involved in creating S3 event-based trigger jobs in Talend.
  • Experienced using tCacheInput and tCacheOutput components to load data into memory, along with other persistence techniques, to improve job performance.
  • Involved in deploying jobs across all environments (DEV/QA/UAT/PROD) using Talend Repo Manager (TRM) and in resolving production issues.
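The file-size-driven EMR startup described above can be sketched as the decision a Lambda handler makes when an S3 event arrives; the size thresholds and node counts below are illustrative assumptions, not the production values.

```python
# Sketch: choose an EMR node count from the size of the file that landed in S3.
# Thresholds and counts are hypothetical.
def emr_node_count(file_size_bytes):
    gib = file_size_bytes / (1024 ** 3)
    if gib < 1:
        return 3    # small file: minimal cluster
    if gib < 50:
        return 5    # medium file
    return 10       # large file

def handle_s3_event(event):
    """Pull the object size from a standard S3 event record and size the cluster.
    A real handler would then start EMR (e.g., via the RunJobFlow API) with it."""
    record = event["Records"][0]
    size = record["s3"]["object"]["size"]
    return emr_node_count(size)
```

The `Records[0].s3.object.size` path matches the S3 event notification structure Lambda receives.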

Environment: Talend Big Data 6.3.1/7.1.1, AWS Redshift, S3, Aurora RDS, Spark, Oracle 12c, pipe-delimited/positional/Parquet files, SQL Workbench, SQL Developer, PuTTY, WinSCP, FileZilla, Zeppelin, JIRA, SVN, UNIX scripting, Agile.

Confidential

Informatica Developer

Responsibilities:

  • Performed extraction, transformation, and loading of data into the database using Informatica; involved in logical and physical modeling of the database.
  • Designed ETL processes in Informatica to load data from Oracle and flat files into the target Oracle data warehouse database.
  • Created functional design documents and technical design specifications for ETL based on the requirements.
  • Created tables, views, indexes, sequences, and constraints.
  • Loaded data into the database using SQL*Loader.
  • Involved in testing stored procedures and functions; designed and developed table structures, stored procedures, and functions to implement business rules.
  • Implemented SCD methodology, including Type 1 and Type 2 changes.
  • Extracted and loaded data from legacy systems, Oracle, and SQL Server sources.
  • Involved in the design and development of data validation, load processes, and error-control routines.
  • Used pmcmd to run workflows and created cron jobs to automate session scheduling.
  • Involved in the ETL process from development through testing and production environments.
  • Analyzed the database for performance issues and conducted detailed tuning activities.
  • Generated monthly and quarterly inventory/purchase reports.
  • Coordinated database requirements with Oracle programmers and wrote reports on sales data.
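Validation and error-control routines of the kind listed above typically split a batch into accepted rows and rejects tagged with a reason; a minimal sketch, assuming simple required-field rules (the rule set is hypothetical):

```python
def validate_rows(rows, required_fields):
    """Split incoming rows into good rows and rejects with a reason string,
    the usual shape of a load-time error-control routine."""
    good, rejects = [], []
    for row in rows:
        missing = [f for f in required_fields if not row.get(f)]
        if missing:
            rejects.append({"row": row, "reason": f"missing: {', '.join(missing)}"})
        else:
            good.append(row)
    return good, rejects
```

Rejects would normally be written to an error table or reject file for reprocessing rather than discarded.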

Environment: Informatica PowerCenter 7.1, Oracle 9, Control-M, SQL Server 2005, XML, SQL, PL/SQL, UNIX shell scripting.
