Lead Talend Big Data Developer Resume
Farmington, CT
PROFESSIONAL SUMMARY:
- 5 years of professional experience in data warehouse projects with the Talend ETL suite (DI, DQ, MDM, ESB, Data Mapper and Big Data).
- Hands-on experience in Talend Big Data with MDM for creating data models, data containers, views and workflows. Used Talend MDM components such as tMDMInput, tMDMOutput, tMDMBulkLoad, tMDMConnection, tMDMReceive and tMDMRollback.
- Created complex mappings in Talend 6.4.1 using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaFlex, tAggregateRow, tDie, tWarn, tLogCatcher, etc.
- Talend administrator with hands-on experience in Big Data (Hadoop) on the Cloudera framework (Hue).
- Good experience in Talend administration (users, user groups, projects, project authorization, locks, licenses, backup, notifications, software updates, Job Conductor, Big Data streaming, execution plans, servers, monitoring, logging, Activity Monitoring Console, audit, Drools and migration check).
- Planned job schedules based on business requirements in TAC (simple trigger, cron trigger and file trigger), changing context parameters and JVM parameters as needed.
- Created AMC table and file for LOG, STAT and FLOW to capture job information.
- Expert knowledge in creating test plans, test cases, test scenarios and test strategies, and in defect management, to ensure quality assurance and cover all business requirements in testing.
- Experience in preparing test reports from Quality Center and daily test status reports to communicate testing progress to the team.
- Experienced in architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
- Worked on analyzing Hadoop clusters and various big data analytics and processing tools, including Pig, Hive, Spark and Spark Streaming.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Migrated various Hive UDFs and queries to Spark SQL for faster execution.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to Confidential using Scala (see the sketch after this list).
- Hands-on experience in Spark and Spark Streaming: creating RDDs and applying transformations and actions.
- Developed and implemented custom Hive UDFs involving date functions.
- Used Sqoop to import data from Oracle to Hadoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Experienced in developing transformation scripts in Scala.
- Involved in developing shell scripts to orchestrate the execution of other scripts and to move data files within and outside of Confidential.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Used Kafka for publish-subscribe messaging as a distributed commit log; experienced with its speed, scalability and durability.
- Used Oozie to orchestrate the MapReduce jobs that extract the data on a scheduled basis.
- Used tStatCatcher, tDie and tLogRow to create a generic joblet to store processing statistics.
- Created Talend Mappings to populate the data into dimensions and fact tables.
- Broad design, development and testing experience with Talend Integration Suite and Talend MDM, and knowledge of performance tuning of mappings.
- Proficient in supporting Data warehouse ETL activities using queries and functionalities of SQL, PL/SQL, SQL*Loader, AWS and SQL*Plus. Solid experience in implementing complex business rules by creating re-usable transformations and robust mappings/mapplets using various transformations like Unconnected and Connected lookups, Source Qualifier, Router, Filter, Expression, Aggregator, Joiner, Update Strategy etc.
- Experience in working on Talend Administration Activities and Talend Data Integration ETL Tool.
- Highly experienced in row- and column-oriented databases, as well as in SQL performance tuning and debugging of existing ETL processes.
- Familiar with design and implementation of the data warehouse life cycle and excellent knowledge of entity-relationship/multidimensional modeling (star schema, snowflake schema).
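Code sketch (referenced in the Spark Streaming bullet above): a minimal Scala example of the receive/transform/persist pattern, written against the Spark 1.x Kafka direct-stream API. The broker list, topic name and HDFS output path are placeholders, since the actual target system is redacted as Confidential above.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToHdfsStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfsStream")
        val ssc  = new StreamingContext(conf, Seconds(30))          // 30-second micro-batches

        // Placeholder broker list and topic; real values would come from configuration.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics      = Set("web_events")

        // Direct stream: one Kafka partition maps to one RDD partition.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        stream
          .map { case (_, value) => value }        // keep the message payload only
          .filter(_.nonEmpty)                      // transformation: drop empty records
          .foreachRDD { (rdd, batchTime) =>        // action: persist each micro-batch
            if (!rdd.isEmpty())
              rdd.saveAsTextFile(s"/data/raw/web_events/${batchTime.milliseconds}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }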
PROFESSIONAL EXPERIENCE:
Confidential, Farmington, CT
Lead Talend Big Data Developer
Responsibilities:
- Acquire and interpret business requirements, create technical artifacts, and determine the most efficient/appropriate solution design, thinking from an enterprise-wide view.
- Worked in the Data Integration team to perform data and application integration, with the goal of moving data effectively, efficiently and with high performance to support business-critical projects involving large-scale data extraction.
- Perform technical analysis, ETL design, development, testing, and deployment of IT solutions as needed by business or IT.
- Worked on analyzing the Hadoop cluster and various Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases and Sqoop. Installed Hadoop, MapReduce and Confidential, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing (see the sketch after this list).
- Imported and exported data into Confidential and Hive using Sqoop.
- Participated in designing the overall logical and physical data warehouse/data mart data models and data architectures to support business requirements.
- Explored prebuilt ETL metadata, mappings and Confidential metadata, and developed and maintained SQL code as needed for the SQL Server database.
- Performed data manipulations using various Talend components such as tMap, tJavaRow, tJava, tOracleRow, tOracleInput, tOracleOutput, tMSSqlInput and many more.
- Analyzed the source data to know the quality of the data using Talend Data Quality.
- Troubleshoot data integration issues and bugs, analyze reasons for failure, implement optimal solutions, and revise procedures and documentation as needed.
- Worked on migration projects to move data from data warehouses on Oracle/DB2 to Netezza.
- Used SQL queries and other data analysis methods, as well as the Talend Enterprise Data Quality Platform, for profiling and comparison of data, used to decide how to measure business rules and the quality of the data.
- Worked on the Talend RTX ETL tool; developed and scheduled jobs in Talend Integration Suite.
- Wrote Netezza SQL queries for joins and other table modifications.
- Used Talend reusable components such as routines, context variables and globalMap variables.
- Responsible for tuning ETL mappings, workflows and the underlying data model to optimize load and query performance.
- Developed Talend ESB services and deployed them on ESB servers on different instances.
- Implementing fast and efficient data acquisition using Big Data processing techniques and tools.
- Monitored and supported the Talend jobs scheduled through Talend Administration Center (TAC).
- Developed Oracle PL/SQL, DDLs and stored procedures, and worked on performance tuning of SQL.
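Code sketch (referenced in the data cleaning bullet above): the Pig and Hive cleansing scripts themselves are not reproduced here; this is only an illustrative Scala/Spark SQL equivalent of that kind of pre-processing, with hypothetical table and column names.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.{col, lower}

    object CleanseCustomerFeed {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CleanseCustomerFeed"))
        val hc = new HiveContext(sc)

        // Hypothetical raw Hive table landed by the ingestion jobs.
        val raw = hc.table("staging.customer_feed_raw")

        val cleansed = raw
          .filter("customer_id IS NOT NULL")                   // drop rows missing the key
          .dropDuplicates(Seq("customer_id", "feed_date"))     // de-duplicate on the natural key
          .withColumn("email", lower(col("email")))            // normalize casing

        // Write the curated table consumed by downstream ETL.
        cleansed.write.mode("overwrite").saveAsTable("curated.customer_feed")
      }
    }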
Environment: Talend 6.4.1/5.6, Netezza, Oracle 12c, Confidential DB2, TOAD, Aginity, BusinessObjects 4.1, MLOAD, SQL Server 2012, XML, SQL, PL/SQL, HP ALM, JIRA, Amazon EC2, Apache Hadoop 1.0.1, MapReduce, Confidential, CentOS 6.4, HBase, Kafka, Scala, Elasticsearch, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Eclipse, Sqoop, Ganglia.
Confidential, Atlanta, GA
Sr. Talend Big Data Developer
Responsibilities:
- Created Talend Mappings to populate the data into dimensions and fact tables.
- Broad design, development and testing experience with Talend and Big Data Integration Suite, and knowledge of performance tuning of mappings.
- Development of staging and data warehouse scripts, and deployment.
- Writing specifications for ETL processes.
- Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into Confidential and Hive using Sqoop.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume, and stored the data into Confidential for analysis (see the sketch after this list).
- Validated customer requirements and performed analysis to fit them into the Jasper Reports framework.
- Designed Jasper embedded report components for embedding into the customer's application.
- Development of Jaspersoft reports, dashboards and UI components, and writing complex queries to support the interactive reporting logic.
- Implemented Change Data Capture technology in Talend in order to load deltas to a data warehouse.
- Experienced in using Talend database components, File components and processing components based up on requirements.
- Responsible for development, support and maintenance of the ETL (Extract, Transform and Load) processes using Talend Integration Suite.
- Development of reports using various chart types
- Coordination with the offshore team, providing guidance and clarifications related to reports and underlying queries.
- Performed validation checks and deployment of reports to the customer's staging environment (BusinessObjects client).
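Code sketch (referenced in the web log bullet above): a minimal Scala/Spark aggregation over log files that Flume has landed in the (redacted) Confidential store. The input path and log layout (space-delimited, field 7 holding the request URL) are assumptions made for the example only.

    import org.apache.spark.{SparkConf, SparkContext}

    object WebLogAggregation {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WebLogAggregation"))

        // Hypothetical landing directory written by the Flume agents.
        val logs = sc.textFile("/data/flume/weblogs/*/*")

        val hitsPerUrl = logs
          .map(_.split(" "))
          .filter(_.length > 6)                     // skip malformed lines
          .map(fields => (fields(6), 1L))           // key by requested URL
          .reduceByKey(_ + _)                       // aggregate hit counts per URL

        hitsPerUrl.saveAsTextFile("/data/reports/hits_per_url")
        sc.stop()
      }
    }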
Environment: Talend Studio 6.0.1, XML files, flat files, Talend Administration Console, IMS, Agile methodology, Amazon EC2, Apache Hadoop 1.0.1, MapReduce, Confidential, CentOS 6.4, HBase, Kafka, Scala, Elasticsearch, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Eclipse, Sqoop, Ganglia.
Confidential, Boston, MA
Sr. Talend ETL/MDM Developer
Responsibilities:
- Designed and developed a new ETL process to extract and load Vendors from Legacy System to MDM by using the Talend Jobs.
- Designed and developed the business rules and workflow system in Talend MDM 5.1.1.
- Developed Talend ETL jobs to push data into Talend MDM and jobs to extract data from MDM.
- Developed data validation rules in Talend MDM to confirm the golden record.
- Developed data matching/linking rules to standardize records in Talend MDM.
- Writing specifications for ETL processes.
- Installation and configuration of MySQL database servers and Amazon Redshift.
- Experience in Talend MDM and Big Data for functionality integration and creating data models, data containers, views and workflows.
- Developed mappings to load fact and dimension tables, SCD Type 1 and SCD Type 2 dimensions and incremental loads, and unit tested the mappings (see the sketch after this list).
- Rolled out documentation for the ETL Process, early Data Inventory, and Data Profiling
- Implemented the data integration process with Talend Integration Suite 3.2/4.2/5.1.2/5.2.2.
- Designing, developing and deploying end-to-end Data Integration solution.
- Used different components in Talend such as tMap, tMSSqlInput, tMSSqlOutput, tFileInputDelimited, tFileOutputDelimited, tMSSqlOutputBulkExec, tUniqRow, tFlowToIterate, tIntervalMatch, tLogCatcher, tFlowMeterCatcher, tFileList, tAggregateRow, tSortRow, tMDMInput, tMDMOutput and tFilterRow.
- System design and architecture, data warehouse design and ingestion.
- Development of staging and data warehouse scripts, and deployment.
- Led a team of 5 developers working on BI reports, plus one UI specialist.
- Coordinated with onsite and Offshore team located in India
- Extensively worked on Data Mart Schema Design.
- Developed ETL mappings for XML, .csv and .txt sources and loaded the data from these sources into relational tables with Talend ETL.
- Working with Healthcare EDI 834.
- Backup/restore of databases and writing complex queries.
- Experience on Teradata development and performance tuning.
- Analysis, design, development using Amazon Redshift.
- Used efficient optimization techniques to design ETL Scripts.
- Loaded data into Infobright using Talend, FastLoad, MultiLoad, and shell scripts.
- Validated customer requirements and performed analysis to fit them into the Jasper Reports framework.
- Designed Jasper embedded report components for embedding into the customer's application.
- Development of Jaspersoft reports, dashboards and UI components, and writing complex queries to support the interactive reporting logic.
- Development of reports using various chart types
- Coordination with the offshore team, providing guidance and clarifications related to reports and underlying queries.
- Performed validation checks and deployment of reports to the customer's staging environment.
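Code sketch (referenced in the SCD bullet above): the actual loads were built as graphical Talend mappings, so this Scala/Spark (1.x API) sketch only illustrates the SCD Type 2 rule they apply: expire the current dimension row when a tracked attribute changes and append a new current version. Table and column names are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.{col, current_date, lit}

    object ScdType2Sketch {
      def main(args: Array[String]): Unit = {
        val hc = new HiveContext(new SparkContext(new SparkConf().setAppName("ScdType2Sketch")))

        val dim   = hc.table("dw.customer_dim").as("d")          // existing dimension rows
        val stage = hc.table("staging.customer_delta").as("s")   // incoming delta records

        // Close out current rows whose tracked attribute (address) has changed.
        val expired = dim.join(stage, col("d.customer_id") === col("s.customer_id"))
          .filter("d.is_current = true AND d.address <> s.address")
          .select(col("d.customer_id"), col("d.address"), col("d.eff_date"),
                  current_date().as("end_date"), lit(false).as("is_current"))

        // Incoming records become the new current versions.
        val fresh = stage.select(col("s.customer_id"), col("s.address"),
          current_date().as("eff_date"), lit(null).cast("date").as("end_date"),
          lit(true).as("is_current"))

        expired.unionAll(fresh).write.mode("append").saveAsTable("dw.customer_dim_updates")
      }
    }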
Environment: Talend Integration Suite 5.4.1, Talend MDM, Microsoft SQL Server 2008, Oracle 9i, Windows XP, Flat Files
Confidential
Sr. ETL Developer
Responsibilities:
- Analysed business requirements to design, develop, and implement highly efficient, highly scalable Informatica ETL processes.
- Worked closely with architects and data analysts to ensure the ETL solution meets business requirements.
- Interacted with key users and assisted them with various data issues, understood data needs and assisted them with data analysis.
- Involved in Documentation, including source-to-target mappings and business-driven transformation rules.
- Designed mappings that loaded data from flat-files to the staging tables.
- Involved in designing the end-to-end data flow in a mapping.
- Designed, developed and implemented scalable processes for capturing incremental load.
- Used a wide range of transformations such as Source Qualifier, Aggregator, Expression, Lookup, Router, Filter, Sequence Generator, Update Strategy and Union transformations.
- Used FTP connection to store, stage and archive Flat Files.
- Developed structures to support the front-end Business Objects reports.
- Extensively worked with Repository Manager, Designer, Workflow Manager and Workflow Monitor.
- Developed Informatica mappings, sessions and workflows.
- Developed and executed test plans to ensure that the ETL process fulfills data requirements.
- Worked with required Support Teams on High Critical Bridge for any Prod issues.
- Involved in tuning Informatica Mappings and Sessions as well as tuning Confidential the database level.
- Participated in peer-to-peer code review meetings.
- Data warehouse design and integration, development of staging and data warehouse scripts, and deployment.
Environment: Informatica PowerCenter 9.5.1, Talend Open Studio, Netezza, Teradata, UNIX, Big Data.