Talend Big Data Developer Resume
SUMMARY
- Around 8 years of experience in analysis, design, development, testing, implementation, enhancement, and support of ETL applications for Enterprise Data Warehouse development, using best practices with Talend and DataStage.
- 4+ years developing and leading end-to-end implementations of data integration and Talend Big Data projects using Talend Data Integration and Talend Real-Time Big Data, with comprehensive experience in relational databases.
- Extensive experience in ETL methodology for data migration, extraction, transformation, and loading using Talend; designed data conversions from a wide variety of source systems including Oracle, DB2, SQL Server, MySQL, Azure Tables, Hive, Impala, AWS S3, flat files, and XML files.
- Hands-on experience with the Hadoop technology stack (MapReduce, Hive, Impala, and Spark).
- Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Good understanding of relational database management systems; experience integrating data from sources such as Oracle, MS SQL Server, MySQL, and flat files.
- Good knowledge of the deployment process from DEV to QA, UAT, and PROD using both the Deployment Group and Import/Export methods.
- Familiar with implementation of the data warehouse life cycle and excellent knowledge of entity-relationship/multidimensional modeling (star schema, snowflake schema) and Slowly Changing Dimensions (SCD Types 1, 2, and 3).
- Debugged ETL job errors, performed ETL sanity checks, and handled production deployments in the Talend Administration Center (TAC) using SVN.
- Experience in troubleshooting and performance tuning at various levels (source, target, mapping, session, and system) in the ETL process.
- Experienced in working with the Talend Administration Center (TAC).
- Hands-on experience working with Cloudera.
- Collected, transformed, cleansed, and moved data from a wide range of data sources to Azure.
- Tracked daily data loads and monthly data extracts and sent them to the client for verification.
- Experience in converting stored procedure logic into ETL requirements.
- Good communication and interpersonal skills, ability to learn quickly, strong analytical reasoning, and adaptability to new and challenging technological environments.
TECHNICAL SKILLS
Development Tools: Talend Real-Time Big Data 7.1, Talend Data Integration 5.6/6.2, IBM WebSphere DataStage 8.1, Shell Scripting
DBMS: CDH 5, Hive, Impala, Azure Tables, Oracle 12c, Netezza, DB2, SQL, PL/SQL, MySQL
Database Tools: SQL Developer, Aginity Workbench, WinSQL, Aqua Data Studio, PuTTY, Hue
Operating Platforms: Windows, Unix, and Linux
Scheduling Tools: Tidal Enterprise Scheduler, TAC, Control-M
PROFESSIONAL EXPERIENCE
Confidential
Talend Bigdata Developer
Responsibilities:
- Created complex mappings in Talend Real-Time Big Data using components such as tMap, tParallelize, tFileInputPositional, tFileInputParquet, tPartition, tFileOutputParquet, tHiveInput, tHiveOutput, tExtractPositionalFields, tAzureStorageInputTable, tAzureStorageOutputTable, tAzureStorageGet, tAzureStoragePut, and tSqlRow.
- Hands-on experience with Hadoop ecosystem components such as Spark, HDFS, YARN, Hive, Sqoop, and MapReduce.
- Designed and developed end-to-end ETL processes using Talend Real-Time Big Data, moving data from various source systems to the staging area and from staging to data marts.
- Developed complex ETL jobs from sources such as Netezza, Oracle, AWS S3, flat files, and positional flat files, and loaded them into target systems such as Azure Storage Tables, HDFS, Hive, Impala, and AWS S3 buckets using Talend Real-Time Big Data.
- Developed end-to-end Talend Spark jobs to handle large volumes of data in the HDFS environment.
- Moved files from AWS S3 to HDFS and Azure Tables using Talend batch jobs.
- Enabled dynamic allocation of resources for Talend Spark jobs to manage executors at runtime (see the configuration sketch after this list).
- Applied partitioning in Spark jobs to enable parallel processing, handle large data volumes, and improve performance.
- Developed UNIX shell scripts to automate and streamline existing manual procedures.
- Worked with the team that supports the existing technical landscape.
- Understood and assessed source data integration requirements and recommended effective solutions from a data integration standpoint.
- Migrated existing DataStage ETL jobs to Talend.
- Implemented File Transfer Protocol operations using Talend Studio to transfer files between network folders.
- Responsible for tuning ETL mappings, workflows, and the underlying data model to optimize load and query performance in Spark.
- Handled slowly changing dimensions (SCD) to populate current and historical data into dimension and fact tables in the data warehouse.
- Understood and documented business requirements.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager and reviewed the logs.
- Executed Impala SQL statements in the CDH environment using the impala-shell command from shell scripts (see the shell sketch after this list).
- Worked on reconciliation reports, maintaining job execution statistics and error logging.
- Used debugger and breakpoints to view transformations output and debug mappings.
- Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster.
- Used Azure to integrate and cleanse data from multiple sources and deliver real-time insights; with a clear view of each customer segment's profitability, we could target customers with customized offers at the right time to maximize engagement.
- Designed and created both Managed and External tables in Hive to optimize performance.
- Created users, granted user and project access to the team, and deployed and scheduled jobs in TAC.
- Scheduled Talend jobs with the Talend Administration Center and Control-M, and set up best practices and a migration strategy.
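The dynamic resource allocation noted above is configured through Spark properties on the Talend Spark job (its Spark configuration / advanced properties). A minimal sketch of the equivalent spark-submit flags; the executor limits, class name, and jar path are illustrative placeholders, not values from this project:

```sh
# Equivalent Spark properties expressed as spark-submit flags (illustrative values).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --class com.example.etl.LoadJob \
  /opt/jobs/load_job.jar
```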
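For the impala-shell bullet above, a minimal shell sketch of running an Impala SQL file from a script and logging a failure; the host, file paths, and log location are hypothetical:

```sh
#!/bin/bash
# Run an Impala SQL file via impala-shell and log a failure (illustrative paths).
IMPALA_HOST="impalad.example.com:21000"
SQL_FILE="/opt/etl/sql/refresh_sales_mart.sql"
LOG_FILE="/var/log/etl/impala_load.log"

if ! impala-shell -i "${IMPALA_HOST}" -f "${SQL_FILE}"; then
  echo "$(date '+%Y-%m-%d %H:%M:%S') Impala load failed for ${SQL_FILE}" >> "${LOG_FILE}"
  exit 1
fi
```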
Skills: Talend Real-Time Big Data 6.5/7.1.1, Cloudera 5, UNIX, MySQL, Oracle 12.2, AWS EC2, AWS S3, Azure Tables, Netezza, Excel files, flat files, Talend Administration Center, Hive, Impala, HDFS, Control-M, Agile methodology.
Confidential
ETL Talend Consultant
Responsibilities:
- Created complex mappings in Talend 6.2 using tMap, tJoin, tReplicate, tParallelize, tFixedFlowInput, tAggregateRow, tFilterRow, tIterateToFlow, tFlowToIterate, tAssert, tWarn etc.
- Designed and developed end-to-end data integration jobs using Talend Data Integration Enterprise Edition, moving data from various source systems to the staging area and from staging to data marts.
- Handled Type 2 slowly changing dimensions (SCD) to populate current and historical data into dimension and fact tables in the data warehouse.
- Integrated Java code into Talend jobs using components such as tJavaRow, tJava, and tJavaFlex, as well as routines.
- Created ETL mappings and workflows using Talend to extract data from flat files (CSV, text) and load it into the warehouse.
- Used the tRunJob component to run child jobs from a parent job and to pass parameters from parent to child.
- Used the tParallelize component and the multi-thread execution option to run subjobs in parallel, which improves job performance.
- Developed Talend jobs to call REST API services.
- Extensively used tREST components to make GET, PUT, and POST calls against REST APIs.
- Responsible for tuning ETL mappings, workflows, and the underlying data model to optimize load and query performance.
- Implemented FTP operations using Talend Studio to transfer files between network folders as well as to FTP servers, using components such as tFileCopy, tFileArchive, tFileDelete, tCreateTemporaryFile, tFTPDelete, tFTPCopy, tFTPRename, tFTPPut, and tFTPGet.
- Created generic schemas and created context groups and variables to run jobs against different environments such as Dev, Test, and Prod (see the launcher sketch after this list).
- Responsible for modification of ETL data load scripts, scheduling automated jobs and resolving production issues (if any) on time.
- Scheduled Talend jobs with the Talend Administration Center and the Tidal job scheduler, and set up best practices and a migration strategy.
- Developed an error-logging module to capture both system and logical errors, including email notification and movement of files to error directories.
- Involved in the knowledge transfer framework and documentation for the application.
- Hands-on experience with many of the components in the palette for designing jobs; used context variables and groups to parameterize Talend jobs.
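As referenced above, jobs parameterized with context groups can be pointed at Dev, Test, or Prod at run time. When a Talend job is built and exported, the generated launcher script accepts --context and --context_param options; a minimal sketch, with the job name, directories, and parameter names as hypothetical placeholders:

```sh
#!/bin/bash
# Run an exported Talend job against the Prod context with parameter overrides
# (job name, directories, and parameter names are placeholders).
JOB_HOME=/opt/talend/jobs/LoadCustomerDim
cd "${JOB_HOME}" || exit 1

./LoadCustomerDim_run.sh \
  --context=Prod \
  --context_param inputDir=/data/incoming/customers \
  --context_param dbSchema=EDW
```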
Skills: Talend Data Integration 5.6/6.2, Oracle 11g, XML files, flat files, Talend Administration Center, Tidal Job Scheduler, IMS, Waterfall methodology
Confidential
DataStage Developer
Responsibilities:
- Experience in creating job sequencers with stages such as Job Activity, User Variables Activity, Terminator Activity, Routine Activity, Execute Command Activity, and Notification Activity.
- Designed, built, and maintained batch processes that derive data from the data warehouse database and propagate it to downstream systems.
- Worked on Oracle 9i and SQL Server 2005 databases as part of DataStage development.
- Experienced with UNIX shell scripts.
- Worked closely with SAs to understand requirements and convert them into technical specifications and mapping documents.
- Experience in data collection, design, analysis, and development of ETL systems.
- Worked on Agile and Waterfall projects with cross-functional teams, business analysts, and quality analysts.
- Detail-oriented and problem-solving focused when building DataStage jobs and addressing production issues.
- Extensive experience in loading high-volume data and in performance tuning.
- Involved in tuning DataStage jobs for optimal performance and in performance tuning of Oracle queries.
- Used DataStage Director for validating, running, and monitoring jobs and reviewing logs.
- Experience in design, unit testing, and preparing and executing test cases.
- Responsible for modifying ETL data load scripts, scheduling automated jobs (see the wrapper sketch after this list), and resolving production issues on time.
- Worked closely with testing team in all phases of testing for successful closure of defects and issues.
- Experience in importing/exporting jobs from the development to the production environment.
- Worked extensively with Parallel Extender for parallel processing to improve job performance with bulk data sources.
- Performed data profiling of source data using Information Analyzer: generated reports, table analysis, column analysis, primary key analysis, and foreign key analysis.
- Worked closely with business analysts to review the business specifications of the project and to gather the ETL requirements.
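The script modification and job scheduling mentioned above typically wrap the dsjob command line, the scriptable counterpart of the Director run/monitor operations. A minimal sketch; the project name, job name, and log path are hypothetical:

```sh
#!/bin/bash
# Run a DataStage job via dsjob and check its final status (illustrative names).
PROJECT=DW_PROJECT
JOB=seq_load_sales
LOG=/var/log/etl/${JOB}_$(date +%Y%m%d).log

# -jobstatus waits for completion and returns the job status as the exit code:
# 1 = finished OK, 2 = finished with warnings.
dsjob -run -jobstatus "${PROJECT}" "${JOB}"
STATUS=$?

if [ "${STATUS}" -ne 1 ] && [ "${STATUS}" -ne 2 ]; then
  dsjob -logsum "${PROJECT}" "${JOB}" >> "${LOG}"
  exit 1
fi
```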
Skills: IBM WebSphere DataStage 8.1, Oracle 10g, Windows XP, and HP-UX (UNIX).
Confidential
SQL Developer
Responsibilities:
- Created and managed schema objects such as tables, views, indexes, and referential integrity constraints based on user requirements.
- Used MS SQL Server 2005/2008 to design, implement, and manage data warehouses, OLAP cubes, and reporting solutions to improve asset management, incident management, data center services, system event support, and billing.
- Used T-SQL daily to create custom views for data and business analysis.
- Utilized Dynamic T-SQL within functions, stored procedures, views, and tables.
- Used the SQL Server Profiler tool to monitor the performance of SQL Server particularly to analyze the performance of the stored procedures.
- Optimized stored procedures and functions to handle business-critical calculations.
- Implemented data collection and transformation between heterogeneous sources such as flat files, Excel, and SQL Server 2005/2008 using SSIS.
- Migrated all DTS packages to SQL Server Integration Services (SSIS) and modified the packages to take advantage of advanced SSIS features.
- Defined Check constraints, rules, indexes and views based on the business requirements.
- Extensively used SQL Server Reporting Services and Report Builder models to generate custom reports.
- Designed and deployed reports with drop-down menu options and linked reports.
- Developed drill-down and drill-through reports from multidimensional objects such as star and snowflake schemas using SSRS and PerformancePoint Server.
- Created subscriptions to deliver reports daily, and managed and troubleshot report server issues.
Skills: MS SQL Server 2005/2008, SSIS, SSRS, Microsoft .NET, MS Access 2003, Excel, Erwin.