Talend Developer/Admin Resume
Minnesota
SUMMARY
- Over 7 years of extensive experience in administration, development, support, and maintenance of data migration and data warehousing projects using the ETL tools Talend and DataStage, along with big data processing using Apache Hadoop, HDFS, MapReduce, Pig, and Hive.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in development, support, and maintenance of ETL (Extract, Transform, Load) processes using Talend Integration Suite.
- Worked on Talend Administrator Console (TAC) to schedule jobs through the Job Conductor, create execution plans for managing jobs, add users, and migrate code versions.
- Experienced in designing DataStage jobs using DataStage Designer with various stages such as Teradata Connector, Oracle Enterprise, SQL, XML Input, XML Output, WebSphere MQ, flat files, Sequential File, and Data Set, and processing stages such as Join, Lookup, Filter, Transformer, Aggregator, and Sort.
- Installed, configured, and managed relational and NoSQL databases such as MySQL, DB2, PostgreSQL, MongoDB, DynamoDB, and Cassandra.
- Strong experience with essential DevOps tools such as Chef, Puppet, Ansible, Docker, Kubernetes, Subversion (SVN), Git, Hudson, Jenkins, Ant, and Maven; migrated VMware VMs to AWS and managed services such as EC2, S3, Route 53, ELB, and EBS.
- Involved in developing complex ETL transformations and performance tuning.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Maintained a multi-datacenter Cassandra cluster; experienced in performance tuning a Cassandra cluster to optimize writes and reads; involved in Cassandra schema data modeling.
- Worked on setting up Pig, Hive, and HBase on multiple nodes and developed solutions using Pig, Hive, HBase, and MapReduce.
- Experience importing/exporting jobs from development to production environments.
- Experience working with various relational databases (RDBMS) such as Oracle, MySQL, and SQL Server, as well as complex flat files, datasets, XML, and flat files.
- Maintained Sqoop processes to move data from SQL Server and MySQL into the data lake.
- Developed and maintained Oozie jobs to ensure successful scheduling of shell and Spark jobs.
- Work experience with and exposure to Hadoop and related technologies: MapReduce, Pig, HBase, Hive, Jenkins, CI/CD, Talend Admin, JSON, Splunk, MapR Streams, CDC, and Elasticsearch.
- Worked on UNIX shell scripts using ksh for scheduling sessions, automating processes, and pre- and post-session scripts (see the sketch after this list).
- Extensive knowledge of relational and multidimensional modeling: data modeling, star schema, snowflake schema, and determining fact and dimension tables.
- Excellent work experience with Agile methodology and development.
- Committed to excellence; a self-motivated, fast-learning, team-oriented, and prudent developer with strong problem-solving and communication skills.
- Able to learn and use new systems and paradigms quickly; a solid communicator with exceptional team-building skills.
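A minimal sketch of the kind of ksh pre/post-session wrapper described above; the session name, paths, run command, and alert address are illustrative assumptions rather than actual project artifacts.

```sh
#!/bin/ksh
# Illustrative pre/post-session wrapper (session name, paths, and the run
# command are assumptions used only to show the pattern).
SESSION_NAME="daily_load"
LOG_DIR="/var/log/etl"
LOG_FILE="${LOG_DIR}/${SESSION_NAME}_$(date +%Y%m%d_%H%M%S).log"
mkdir -p "${LOG_DIR}"

# Pre-session step: make sure the source file landed before starting the load.
if [ ! -s "/data/incoming/${SESSION_NAME}.csv" ]; then
    echo "Source file missing or empty; aborting." >> "${LOG_FILE}"
    exit 1
fi

# Run the session (hypothetical command) and capture its exit status.
/opt/etl/bin/run_session.sh "${SESSION_NAME}" >> "${LOG_FILE}" 2>&1
STATUS=$?

# Post-session step: archive the source file on success, alert on failure.
if [ ${STATUS} -eq 0 ]; then
    mv "/data/incoming/${SESSION_NAME}.csv" /data/archive/
else
    mail -s "${SESSION_NAME} failed" etl-support@example.com < "${LOG_FILE}"
fi
exit ${STATUS}
```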
TECHNICAL SKILLS
Languages: T-SQL, PL/SQL, XML, UNIX/Linux Shell Scripting
Databases: MS SQL Server 2014/2012/2008 R2, Oracle, Teradata
DWH / BI Tools: Talend Big Data Integration 6.4, Talend Data Integration 6.x/7.x, Talend Admin Console, Datastage 7.x
Hadoop Ecosystem: HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, DataStax Apache Cassandra, Flume, Spark
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Visual Studio .NET, Microsoft Management Console, Microsoft Office
Operating Systems: Windows Vista/XP/2003/2000, NT & Windows 9x, MS-DOS and UNIX.
PROFESSIONAL EXPERIENCE
Confidential, Minnesota
Talend Developer/Admin
Responsibilities:
- Worked on Talend Administrator Console (TAC) to schedule jobs through the Job Conductor, create execution plans for managing jobs, add users, and migrate code versions.
- Performed security configuration for users, projects, and roles in TAC for SVN and Git, working alongside the security and continuous-integration groups.
- Used Splunk to monitor system logs and to notify the incident-management system when thresholds were exceeded.
- Ensured availability of sufficient queue memory and cores to run Spark processes on the cluster, avoiding downtime for the end client working with important insurance data.
- Created projects in the company's cloud platform, OpenShift Enterprise; provided pod-level security and monitored procured resources and logs through a Splunk dashboard.
- Supported development of Hive internal/external tables and views using a shared metastore, writing scripts in HiveQL.
- Developed Pig scripts to perform data enrichment and make the data available for BI and reporting.
- Assisted in loading large sets of structured, semi-structured, and unstructured data from various applications into HBase tables and MongoDB databases.
- Utilized Jenkins and Maven to deliver build integration and big data cluster deployment solutions across application teams through private keys and secure shell (SSH).
- Maintained Sqoop processes to move data from SQL Server and MySQL into the data lake; developed and maintained Oozie jobs to ensure successful scheduling of shell and Spark jobs (see the sketch after this list).
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Installed, monitored and maintained hardware/software related issues on Linux/Unix systems.
- Experienced in developing, profiling, and maintaining multi-threaded/asynchronous applications.
- Experience installing new software releases, system upgrades (both hardware and software, where applicable), and patches, and evaluating stability.
- Experience with TWS and Talend Open Studio, and in building Jenkins/GitHub CI/CD pipelines for supported projects.
- Created and managed Source to Target mapping documents for all Facts and Dimension tables.
- Prepared ETL mapping documents for every mapping and a data migration document for smooth transfer of the project from the development to the testing environment and then to production.
- Designed and implemented ETL for data loads from heterogeneous sources to SQL Server and Oracle target databases, including fact and slowly changing dimension tables.
- Utilized Big Data components like tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHbaseInput, tHbaseOutput, tSqoopImport and tSqoopExport.
- Created many complex ETL jobs for data exchange to and from database servers and various other systems, including RDBMS, XML, CSV, and flat-file structures. Integrated Java code inside Talend Studio using components such as tJavaRow, tJava, and tJavaFlex; used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and many more) and Routines.
- Experience in converting Hive or SQL queries into Spark transformations using Python and Scala.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Worked extensively on Talend Admin Console and scheduled jobs in the Job Conductor, an option that is not available in Talend Open Studio.
- Tuned maximum JVM parameters and cursor size in Talend as part of performance tuning.
- Hands-on experience with many components available in the palette for designing jobs; used context variables to parameterize Talend jobs.
- Worked with parallel connectors for parallel processing to improve job performance with bulk data sources in Talend.
- Performed root-cause analysis on failed components and implemented corrective measures; assisted application development teams during design and development of highly complex and critical data projects, including identifying the root cause of slow-performing jobs/queries (HDFS).
- Monitored running applications and provided guidance to developers for improving database performance, providing 24/7 coverage.
- Good experience writing and tuning queries in MongoDB, Teradata, Oracle, and SQL Server; installed, configured, and administered Linux servers from scratch; experienced in migration efforts (Java, tools, MapR on servers, migration of SVN to GitHub).
- Working experience with Hadoop-related technologies such as Splunk, Pig, Protegrity, and MapR Streams.
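A minimal sketch of the Sqoop-to-data-lake handoff and Oozie scheduling mentioned in the bullets above; the JDBC connection, table, HDFS paths, Oozie URL, and properties file are assumptions, not the actual project configuration.

```sh
#!/bin/ksh
# Illustrative Sqoop ingest plus Oozie submission; connection details, table
# names, paths, and property files are assumptions.

# Pull an incremental extract from SQL Server into a raw zone on HDFS.
sqoop import \
  --connect "jdbc:sqlserver://sqlhost:1433;databaseName=claims" \
  --username etl_user \
  --password-file /user/etl/.sqlserver.pwd \
  --table policy_txn \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2020-01-01 00:00:00" \
  --target-dir /data/raw/claims/policy_txn \
  --num-mappers 4

# Hand the downstream shell and Spark steps to an Oozie coordinator for scheduling.
oozie job -oozie http://oozie-host:11000/oozie \
  -config /home/etl/jobs/policy_txn/coordinator.properties -run
```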
Environment: Talend DI/BDE 7.5.3, Hadoop, MapReduce, HDFS, Hive, HBase, Pig, Impala, Cassandra, Spark, Java, SQL, Tableau, Zookeeper, Sqoop, Teradata, Oozie, etc.
Confidential - Dallas
TALEND Developer/Admin
Responsibilities:
- Deployed and scheduled Talend jobs in the Administration console and monitored their execution.
- Created separate branches within the Talend repository for development, production, and deployment.
- Excellent knowledge of the Talend Administration console, Talend installation, and the use of context and globalMap variables in Talend.
- Developed mappings/transformations/joblets and designed ETL jobs/packages using Talend Integration Suite (TIS) in Talend 6.1.
- Used Talend joblets and various commonly used Talend transformation components such as tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and many more.
- Responsible for configuring SVN with Talend projects and created multiple users for accessing SVN repositories.
- Utilized Big Data components like tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHbaseInput, tHbaseOutput, tSqoopImport and tSqoopExport.
- Created Hive databases and tables over the HDFS data and wrote HiveQL queries on the tables (see the sketch after this list).
- Scheduled Hadoop and UNIX jobs using Oozie.
- Responsible for building the data model for ODS/OLAP logical/physical design.
- Modified, installed, and prepared technical documentation for system software applications.
- Developed POCs for bulk-load options and web service APIs within Talend.
- Heavily used Talend for building ODS and OLAP structures, data movements, and XML and JSON processing.
- Responsible for generating high-volume SOAP web service requests, running them through the SOAP service, and loading the SOAP responses into a PostgreSQL database.
- Set up and managed transaction log shipping, SQL Server mirroring, failover clustering, and replication.
- Designed the architecture of Talend jobs for parallel execution to reduce run time.
- Handled issues related to cluster startup, node failures, and several Java-specific errors on the system.
- Performed troubleshooting on all tools, maintained multiple servers, and provided backups for all file and script management servers.
- Wrote backup and recovery shell scripts to provide failover capabilities.
- Worked extensively on Talend Admin Console and scheduled jobs in Job Conductors, an option that is not available in Talend Open Studio.
- Hands-on experience with many components available in the palette for designing jobs; used context variables/groups to parameterize Talend jobs.
- Experience using Repository Manager for migration of source code from lower to higher environments.
- Created projects in TAC, assigned appropriate roles to developers, and integrated SVN (Subversion).
- Worked on custom component design and embedded the custom components in Talend Studio.
- Used the Talend Admin Console Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis (cron trigger).
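A minimal sketch of the Hive-over-HDFS pattern referenced in the bullets above; the database, table, columns, and HDFS location are assumptions.

```sh
#!/bin/ksh
# Illustrative HiveQL: create an external table over existing HDFS data and
# query it (names and locations are assumptions).
hive -e "
CREATE DATABASE IF NOT EXISTS staging;

CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_raw (
  order_id    BIGINT,
  customer_id BIGINT,
  order_ts    STRING,
  amount      DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/orders';

-- Simple aggregate over the external table.
SELECT customer_id, SUM(amount) AS total_amount
FROM staging.orders_raw
GROUP BY customer_id;
"
```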
Environment: Talend DI 6.1, Linux, UNIX, Shell Scripting, Spark, Pig, Hive, HDFS, YARN, Hue, Sentry, Oozie, ZooKeeper.
Confidential
DataStage developer
Responsibilities:
- Experience in Creating the Job Sequencer with various stages like Job activity, User Variable Activity, Terminator Activity, Routine Activity, Execute Command Activity and Notification Activity.
- Designed, built, and maintained batch processes that derive data from the data warehouse database and propagate it to downstream systems.
- Worked on Teradata 14.1, Oracle 9i, and SQL Server 2005 databases as part of DataStage development.
- Experience with UNIX shell scripts and the ability to debug Perl scripts.
- Worked closely with SAs to understand requirements and convert them into technical specifications and mapping documents.
- Experience in data collection, design, analysis, and development of systems using ETL.
- Worked on agile and waterfall projects with cross-functional teams, business analysts, and quality analysts.
- Detail-oriented and problem-solving focused in building DataStage jobs and addressing production issues.
- Extensive experience in loading high-volume data and in performance tuning.
- Involved in tuning DataStage jobs for optimal performance and in performance tuning of Oracle queries.
- Used DataStage Director for validating, running, and monitoring jobs and reviewing logs.
- Experience in design, unit testing, and preparing and executing test cases.
- Worked closely with testing team in all phases of testing for successful closure of defects and issues.
- Experience in defect fixing using HP ALM 11.0 and HP QC tools.
- Experience importing/exporting jobs from development to production environments.
- Good experience in analyzing and debugging job routines.
- Worked extensively with Parallel Extender for parallel processing to improve job performance with bulk data sources.
- Worked on data profiling of the source data using Information Analyzer: generated reports, table analysis, column analysis, primary key analysis, and foreign key analysis.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge data volumes.
- Wrote various Pig scripts to clean up the ingested data and created partitions for the daily data (see the sketch after this list).
- Developed Spark programs with Scala and applied functional programming principles to process complex unstructured and structured data sets.
- Worked closely with business analysts to review the project's business specifications and gather ETL requirements.
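A minimal sketch of the Pig cleanup-and-partition pattern referenced in the bullets above; the relation names, fields, and paths are assumptions.

```sh
#!/bin/ksh
# Illustrative Pig cleanup run; field names and paths are assumptions.
RUN_DATE=$(date +%Y-%m-%d)

# Generate the Pig script with the run date substituted by the shell.
cat > /tmp/clean_events.pig <<EOF
raw     = LOAD '/data/ingest/events/${RUN_DATE}' USING PigStorage(',')
          AS (event_id:chararray, user_id:chararray, amount:double, event_ts:chararray);
-- Drop malformed rows before they reach downstream consumers.
cleaned = FILTER raw BY event_id IS NOT NULL AND user_id IS NOT NULL AND amount >= 0.0;
-- Write the cleaned data under a per-day partition directory.
STORE cleaned INTO '/data/clean/events/dt=${RUN_DATE}' USING PigStorage(',');
EOF

pig -f /tmp/clean_events.pig
```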
Environment: IBM WebSphere DataStage 8.1, Oracle 10g, Windows XP, and HP-UX (UNIX).
Confidential
Data Stage Developer
Responsibilities:
- Worked on agile and waterfall projects with cross-functional teams, business analysts, and quality analysts.
- Detail-oriented and problem-solving focused in building DataStage jobs and addressing production issues.
- Extensive experience in loading high-volume data and in performance tuning.
- Involved in tuning DataStage jobs for optimal performance and in performance tuning of Oracle queries.
- Used DataStage Director for validating, running, and monitoring jobs and reviewing logs.
- Experience in design, unit testing, and preparing and executing test cases.
- Worked closely with the testing team in all phases of testing for successful closure of defects and issues.
- Experience in defect fixing using HP ALM 11.0 and HP QC tools.
- Experience importing/exporting jobs from development to production environments.
- Good experience in analyzing and debugging job routines.
- Worked extensively with Parallel Extender for parallel processing to improve job performance with bulk data sources.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge data volumes.
- Wrote various Pig scripts to clean up the ingested data and created partitions for the daily data.
- Developed Spark programs with Scala and applied functional programming principles to process complex unstructured and structured data sets.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Analyzed the SQL scripts and designed solutions to implement them using PySpark.
- Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools such as Tableau to connect to Hive and generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
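A minimal sketch of the Hive partitioning, dynamic-partition insert, and bucketing mentioned above; the databases, tables, and columns are assumptions.

```sh
#!/bin/ksh
# Illustrative Hive DDL/DML for partitioning, dynamic partitions, and buckets
# (database, table, and column names are assumptions).
hive -e "
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.enforce.bucketing = true;

CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.web_events (
  event_id STRING,
  user_id  STRING,
  url      STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic-partition insert: Hive derives event_date from the last SELECT column.
INSERT OVERWRITE TABLE analytics.web_events PARTITION (event_date)
SELECT event_id, user_id, url, to_date(event_ts) AS event_date
FROM staging.web_events_raw;
"
```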
Environment: DataStage 7.5.x2 Parallel Extender, Oracle 9i, Windows 2003, and SunOS 5.8 (UNIX).