Solution Lead Resume
SUMMARY
- Highly motivated, solutions-driven Big Data professional with extensive experience in ETL design and development across multiple ETL, ELT, and Big Data applications, including development, integration, code conversion, data migration, support and maintenance, and enhancement projects.
- 3+ years of experience with Hadoop architecture and technologies such as HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, and Impala, including storage management.
- Hands-on experience with NoSQL databases such as HBase and MongoDB.
- Responsible for managing data from multiple sources; involved in HDFS maintenance and the loading of structured and unstructured data.
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs written in Java and Python.
- Built backend aggregation logic using Scala and Spark.
- Converted applications from ETL platforms such as DataStage to a Big Data ELT platform built on the Hadoop ecosystem and Talend.
- Broad experience and knowledge in the Insurance, Banking, Financial Services, Manufacturing, Managed Health Care Services, Retail, and E-commerce business domains.
- Involved in the complete Software Development Life Cycle (SDLC) of various projects, including requirements gathering, planning, analysis, system design, coding, testing, and implementation of database structures, with hands-on exposure to database administration, PL/SQL development, production support, installation, configuration, upgrades, patches, performance tuning, backup and recovery, space management, database security, cloning, migration, scheduling, shell scripting, documentation, defect fixes, and troubleshooting.
- Strong experience in Data Quality, Source System Analysis, defining data granularity, Business Rules Validation, Source-to-Target Mapping design, Performance Tuning, and High-Volume Data Loads.
- Extensive knowledge of Relational and Dimensional Modeling, Entity Relationships, the Kimball Methodology, Relational Databases, Fact and Dimension tables, 3NF, Star and Snowflake schemas, Slowly Changing Dimensions (SCD Types 1, 2, and 3), ODS, Data Marts, Data Lakes, Active Data Warehousing, and Active Enterprise Intelligence.
- Knowledge of Data Flow Diagrams, Process Models, and E-R diagrams using modeling tools such as ERwin.
- Experienced in designing, deploying, and operating highly available, scalable, and fault-tolerant systems on Amazon Web Services (AWS).
- Demonstrated understanding of AWS data migration tools and technologies, including Storage Gateway, Database Migration, and Import/Export services.
- Good experience building pipelines with Microsoft Azure Data Factory (ADF) and moving data into Azure Data Lake Store (ADLS), Azure Blob Storage, and COSMOS.
- 3+ years of experience using Talend Integration Suite - Data Integration & Big Data (5.x/6.0.1/6.1/7.0.1/7.1.1) and other Talend Data Fabric tools: Talend MDM, Talend DQ, Talend Data Preparation, ESB, and TAC.
- Extensively created mappings in Talend DI and Big Data using components such as tMap, tJoin, tReplicate, tParallelize, tConvertType, tFlowToIterate, tAggregateRow, tSortRow, tFlowMeter, tLogCatcher, tRowGenerator, tNormalize, tDenormalize, tSetGlobalVar, tHashInput, tHashOutput, tJava, tJavaRow, tFilterRow, tFilterColumns, tHDFSGet, tHDFSPut, tWarn, tMysqlSCD, tFilter, tGlobalmap, tDie, etc.
- Extensive experience using Talend features such as context variables, triggers, and connectors for databases and flat files, including tMysqlInput, tMysqlConnection, tOracle, tMSSqlInput, tMSSqlOutput, tMSSqlRow, tFileCopy, tFileInputDelimited, and tFileExists.
- Experience using cloud components and connectors to make API calls for accessing data from cloud storage (Google Drive, Salesforce, Amazon S3, Dropbox) in Talend Open Studio.
- Hands-on experience with Google Cloud Platform (Google Cloud Storage, BigQuery, Bigtable, Cloud SQL, Pub/Sub).
- Expertise using the BigQuery browser tool and the BigQuery command line.
- Experience creating Joblets in Talend for processes that can be reused across most jobs in a project, such as job start and commit steps.
- Expertise in running subjobs in parallel to maximize performance and reduce overall job execution time, using the tParallelize component in TIS and multithreaded execution in TOS.
- Experience monitoring and scheduling jobs using AutoSys, ASG-Zena, Skybot, IBM TWS, and TAC with UNIX scripting.
- Strong skills in SQL and PL/SQL, with expertise in writing complex SQL queries, performance analysis, query optimization, and creating database objects: tables, schemas, indexes, materialized views, partitions, stored procedures, macros, etc.
- Expert in working with DataStage 7.5/8.7/9.1/11.3 (Manager, Designer, Administrator, and Director) with over 7 years of experience.
- Developed DataStage job sequences, including server and parallel jobs, using processing stages such as Transformer, Aggregator, Lookup, Lookup File Set, Join, Sort, Copy, Merge, Funnel, Modify, Checksum, and Filter.
- Expertise in building DataStage Real Time Integration (RTI) processes with SOAP web services.
- Experience troubleshooting jobs and addressing ETL issues such as data issues, environment issues, performance tuning, and enhancements.
- Proficient in preparing ETL technical and user documentation following business rules, procedures, and naming conventions.
- Strong in Software Engineering concepts and experienced in several project delivery methodologies: Agile (Scrum/SAFe), Iterative, Test-Driven, Waterfall, and Kanban.
- Worked in several roles, including Developer, Solution Lead, Onsite-Offshore Coordinator, and Scrum Master.
- Facilitated business and project team meetings and managed agendas and meeting recaps to ensure all project tasks and goals were accomplished as expected.
- Excellent coordination, cooperation, organizational, and interpersonal skills, along with cross-group collaboration.
- Results-oriented professional with the ability to remain highly focused and self-assured in fast-paced, high-pressure environments.
TECHNICAL SKILLS
Operating System: Windows XP, Windows 7, UNIX, LINUX
Programming Languages: SQL, PL/SQL, XML, JSON, Java, Python, UNIX Shell scripting, NoSQL, HiveQL, R, Scala, Pig Latin
Scripting Languages: VB Script, Java Script
Database: IBM DB2, MS SQL Server 2005, Oracle 9i/10g/11g, Teradata, MS Access, Hive, Greenplum, HBase, MongoDB
ETL and Design Tools: Hadoop and Big Data ecosystem (MapReduce, HDFS, HBase, Hive, Sqoop, Oozie, Spark, Impala), IBM DataStage v7.5.x/8.5/8.7/9.1/11.3, Kafka, Talend Studio (TOS) for Data Integration and Big Data 6.0.1/6.1.2/6.2.2/7.0.1/7.1.1, GCP, Microsoft Azure, Oracle SQL Developer, Toad for DB2, Toad for Oracle, IBM Data Studio, PuTTY, ERwin 7.0, Microsoft Visio
Version Control: PVCS, GitHub
Job Scheduling Tool: Autosys, ASG-Zena, IBM TWS, Skybot, TAC, CA Workload Automation
Defect Tracking & Mgmt. Tools: HP-QC, JIRA, Rally
Utilities: MS Office, FileZilla, WinSCP, Context Tool
PROFESSIONAL EXPERIENCE
Confidential
Solution Lead
Responsibilities:
- Designed and built a multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Redshift, handling millions of records every day.
- Implemented and managed ETL solutions and automated operational processes.
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Wrote various data normalization jobs for new data ingested into Redshift.
- Applied advanced knowledge of Redshift and MPP database concepts.
- Migrated the on-premise database structure to the Redshift data warehouse.
- Responsible for ETL and data validation using SQL Server Integration Services.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, providing a more reliable and faster reporting interface with sub-second response times for basic queries.
- Published interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
- Designed logical and physical data models for various data sources on Redshift.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into the data mart in Redshift.
- Built S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.
- Involved in designing and developing solutions on Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon SWF, Amazon SQS, and other AWS infrastructure services.
- Wrote Python scripts to manage AWS resources through API calls using the Boto SDK and also worked with the AWS CLI (see the sketch after this list).
- Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS using services such as CloudFormation, CodeBuild, CodeDeploy, S3, and Puppet.
- Additionally, as Scrum Master, organized and facilitated Scrum ceremonies including poker estimation, backlog grooming, sprint/iteration planning, daily stand-ups, sprint retrospectives, sprint demos, and release planning.
- Tracked key Scrum metrics such as capacity, velocity, burndown, product/sprint backlogs, and sprint/release progress in Rally, and communicated them to the team, stakeholders, and leadership.
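A minimal Python sketch of the kind of Boto-based resource management described above (it assumes boto3 is installed and AWS credentials are configured; the tag filter, bucket, and object names are hypothetical illustrations, not project values):

# Illustrative sketch only: list running EC2 instances by tag and archive a file to S3.
import boto3

def list_tagged_instances(tag_key="Project", tag_value="redshift-dwh"):
    """Return IDs of running EC2 instances that carry the given (hypothetical) tag."""
    ec2 = boto3.client("ec2")
    response = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]

def archive_to_s3(bucket, key, local_path):
    """Upload a local file to S3, where lifecycle rules can transition it to Glacier."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

if __name__ == "__main__":
    print(list_tagged_instances())
    archive_to_s3("example-dwh-archive", "extracts/2018-01-01.csv.gz", "/tmp/extract.csv.gz")

The same client pattern extends to the CloudWatch and Redshift APIs used for the monitoring and metrics work noted above.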
Environment: Redshift, AWS Data Pipeline, S3, SQL Server Integration Services, SQL Server 2014, AWS Data Migration Services, DQS, SAS Visual Analytics, SAS Forecast Server, and Tableau.
Confidential
Talend Migration Lead
Responsibilities:
- Led the discovery process of understanding the existing DataStage applications, source and target data, business rules, transformations, data exchange protocols, APIs, web services, and shell scripts under each application of the 340B and CDI systems, and prepared technical understanding documents and source-to-target mappings.
- Developed and maintained current- and target-state data architectures, and defined the strategy and roadmaps to guide transformations.
- Provided best-fit architectural solutions and a framework for converting DataStage code into Talend code; defined the scope and sizing of work; anchored proof-of-concept development; and drafted best practices, component-level conversion guidelines for the ETL tools, unit test cases, and exit criteria meeting the SLAs.
- Created multiple Talend projects mirroring the existing DataStage projects.
- Set up FTP and SFTP (using eXact and EDI) between the Talend ETL server and the 340B/CDI applications.
- Set up project environments and databases for the 340B and CDI applications.
- Upgraded and set up vendor proprietary tools for automated code conversion from DataStage to the Talend platform, and set up tools such as QuerySurge to automate post-migration data testing.
- Implemented automated and manual ETL code conversion from DataStage to the Talend platform.
- Developed structures and processes to support data warehouse solutions including availability, logging, monitoring, performance, scalability, and data quality of the new Talend platform solution.
- Created reusable components such as routines, context variables, and globalMap variables.
- Implemented complex business rules by creating reusable transformations using the dynamic schema feature and robust mappings with Talend components such as tMap, tJoin, tReplicate, tParallelize, tConvertType, tFlowToIterate, tAggregateRow, tSortRow, tFlowMeter, tLogCatcher, tReplace, tUnite, tRowGenerator, tNormalize, tDenormalize, tSetGlobalVar, tHashInput, tHashOutput, tJava, tJavaRow, tFilterRow, tFilterColumns, tHDFSGet, tHDFSPut, tWarn, tMysqlSCD, tFilter, tGlobalmap, tDie, etc.
- Created and maintained metadata tables and wrote Talend jobs to generate dynamic SQL queries used to extract data from the source Oracle and Teradata databases into staging.
- Implemented FTP operations in Talend Studio to transfer files between network folders as well as to the FTP server, using components such as tFileCopy, tFileArchive, tFileDelete, tCreateTemporaryFile, tFTPDelete, tFTPCopy, tFTPRename, tFTPPut, tFTPGet, etc.
- Developed Talend DI jobs to populate data in REF/XREF tables and to create data stewardship tasks.
- Created Talend ETL jobs to receive attachment files from POP email using tPop, tFileList, and tFileInputMail, then loaded the data from the attachments into the database and archived the files.
- Used tStatCatcher, tDie, and tLogRow to create a generic Joblet to store processing stats.
- Created tables, indexes, partitioned tables, materialized views, stored procedures, and packages in the Oracle database.
- Used the tRunJob component to run child jobs from a parent job and to pass parameters from parent to child.
- Ran jobs in Talend debug mode to troubleshoot and fix errors.
- Developed Python scripts to automate the data sampling process and ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
- Automated SFTP process by exchanging SSH keys between UNIX servers.
- Performed error handling, performance analysis, and performance tuning of Talend ETL components, DB utilities, UNIX scripts, SQL scripts, etc.
- Set up the ESP schedule and dependencies across 340B Talend jobs, mirroring the old DataStage jobs.
- Used the Talend Administration Center (TAC) Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis (Cron trigger).
- Wrote MapReduce programs to perform data cleansing, transformations, and joins.
- Created Hive external and partitioned tables with Hive indexes and used HiveQL to simplify data analytics.
- Created Hive queries to join multiple source-system tables and load them into Elasticsearch tables, and used HiveQL scripts to perform incremental loads.
- Used Sqoop to move structured Oracle data to HBase.
- Processed data into HDFS, analyzed it using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
- Wrote Apache Pig scripts to process HDFS data.
- Used Sqoop to move data from the Hadoop Distributed File System (HDFS) to an RDBMS.
- Built custom MapReduce programs to analyze data and used Pig Latin to clean bad records.
- Created algorithms for address cleansing and address-matching count factors.
- Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Wrote a Python program that checks a Linux directory for incoming XML files and uploads all new files to a Google Cloud Storage location before the data is parsed and loaded into a BigQuery table (see the sketch after this list).
- Built data pipelines to ingest structured and unstructured data into Google Cloud to enable ETL capabilities.
- Wrote BigQuery queries to look up customer-, product-, and order-level data.
- Helped to deploy a containerized application on a cluster using Kubernetes, scale the deployment, and debug the containerized application.
- Used Cloud Dataflow with Cloud Pub/Sub to enrich, deduplicate, order, aggregate, and land events.
- Mixed real-time and batch processing via Cloud Pub/Sub's durable storage.
- Interacted with Release Management, Configuration Management, Quality Assurance, Architecture Support, Database Support, other development teams, and Operations as required to facilitate the smooth flow of project activities.
- Led tier-1, tier-2, and tier-3 support and tracked issues and requests for system enhancements.
- Provided Go-Live and warranty support for 60 days for each project and module.
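A minimal Python sketch of the directory-watch, Cloud Storage upload, and BigQuery load flow described above (it assumes the google-cloud-storage and google-cloud-bigquery client libraries and configured credentials; the directory, bucket, dataset, and table names are hypothetical placeholders):

# Illustrative sketch of the watch-upload-load flow; all names below are placeholders.
import os
from google.cloud import storage, bigquery

INCOMING_DIR = "/data/incoming"           # hypothetical landing directory
BUCKET = "example-raw-xml"                # hypothetical GCS bucket
TABLE = "example_dataset.orders_staging"  # hypothetical BigQuery table

def upload_new_xml_files():
    """Upload any XML file in the landing directory that is not yet in Cloud Storage."""
    client = storage.Client()
    bucket = client.bucket(BUCKET)
    uploaded = []
    for name in os.listdir(INCOMING_DIR):
        if not name.endswith(".xml"):
            continue
        blob = bucket.blob(f"incoming/{name}")
        if not blob.exists():
            blob.upload_from_filename(os.path.join(INCOMING_DIR, name))
            uploaded.append(name)
    return uploaded

def load_parsed_csv_to_bigquery(gcs_uri):
    """Load an already-parsed CSV extract from Cloud Storage into the staging table."""
    bq = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    bq.load_table_from_uri(gcs_uri, TABLE, job_config=job_config).result()

if __name__ == "__main__":
    print("Uploaded:", upload_new_xml_files())

The watch step can be driven by cron or a simple polling loop; because BigQuery does not ingest XML directly, the load targets the parsed CSV output of the pipeline.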
Environment: IBM DataStage v9.1/11.3, Talend Open Studio 7.0.1/7.1.1, IBM DB2, Hadoop, HDFS, Hive, Impala, MongoDB, AS/400, Pig, Oracle 10g/11g, Toad for Oracle, Teradata, TAC, CA Workload Automation AE, PostgreSQL, XML, JIRA, Unix, Python, Google Cloud Platform, BigQuery, GitHub, ServiceNow, proprietary tools for test data generation and code conversion.
Confidential
Big Data Developer
Responsibilities:
- Gathered requirements from business analysts and business users and collaborated with ETL data acquisition and reporting teams.
- Acquired and interpreted business requirements, created technical artifacts, and determined the most efficient and appropriate solution design from an enterprise-wide view.
- Responsible for setting up the correct architecture and framework and for developing, testing, and implementing tools and processes related to data acquisition/transfer, integration, and management.
- Developed complete end-to-end Big Data processing using the Hadoop framework for distributed computing across a cluster of up to twenty-five nodes.
- Prepared detailed design and technical documents from the functional specifications.
- Created Talend jobs using the dynamic schema feature and components such as tMap, tFilterRow, tJava, tOracle, tXMLMap, delimited-file components, tLogRow, tlogback, etc.
- Ingested data into the data lake from multiple source systems using Talend Big Data.
- Developed Spark code using Spark SQL/Streaming and Scala for faster data processing (an illustrative PySpark sketch follows this list).
- Copied files from one server to another using Talend FTP components.
- Created implicit, local, and global context variables in jobs.
- Pushed delimited files and bulk data into HDFS using Talend Big Data Studio with components such as tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport, and tSqoopExport.
- Analyzed JSON file schema definitions with multiple nested data hierarchy levels using Hive SerDes and created ETL jobs to capture, transform, and load data into the target DWH.
- Developed Oozie coordinator workflows and sub-workflows for Sqoop, Hive and Spark.
- Worked on data pre-processing and cleaning to support feature engineering, and applied imputation techniques for missing values in the dataset using Python (see the imputation sketch after this list).
- Used the existing Deal Model in Python to inherit from and create an object data structure for regulatory reporting.
- Prepared Hive and Pig scripts for ELT purposes.
- Loaded and transformed large sets of structured data into HDFS from AS/400, Mainframe, Oracle, and SQL Server sources using Talend Big Data Studio.
- Optimized Hive scripts to use HDFS efficiently by applying various compression mechanisms.
- Migrated Mainframe code to Hive/Impala.
- Performed optimization and performance tuning of HiveQL and formatted table columns using Hive functions.
- Involved in designing and creating Hive tables to upload data into Hadoop, and in processes such as merging, sorting, and joining tables.
- Designed, developed, and unit tested scripts for data items using Hive/Impala.
- Exported final tables from HDFS to SQL Server using Sqoop.
- Developed MapReduce jobs in Java for log analysis, analytics, and data cleaning.
- Performed big data processing using Hadoop, MapReduce, Sqoop, Oozie, and Impala.
- Assisted in designing and developing ETL procedures per business requirements for the Finance domain.
- Executed parameterized Pig, Hive, Impala, and UNIX batches in production.
- Performance tuning: used tMap cache properties, multithreading, and the tParallelize component for better performance with large source data, and tuned SQL source queries to restrict unwanted data in the ETL process.
- Wrote scripts in Python/Java and Shell to meet business requirements and to automate routine tasks carried out by the applications support team.
- Used AWS components (Amazon Web Services): downloaded and uploaded data files (with ETL) to AWS using S3 Talend components.
- Used the Talend Administration Center (TAC) Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis (Cron trigger).
- Extracted data from flat files and other RDBMS databases into the staging area and populated the data warehouse.
- Created and maintained metadata tables and wrote Talend jobs to generate dynamic SQL queries used to extract data from the source Oracle and Teradata databases into staging.
- Troubleshot data integration issues and bugs, analyzed reasons for failure, implemented optimal solutions, and revised procedures and documentation as needed.
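An illustrative PySpark sketch of the kind of Spark SQL aggregation described above (the project code was written in Scala; the HDFS paths and column names here are hypothetical):

# Illustrative sketch only: aggregate a staged extract with Spark SQL functions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_aggregation").getOrCreate()

# Read a delimited extract already landed in HDFS (hypothetical path).
claims = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/staging/claims/*.csv")
)

# Aggregate claim amounts per member and month, the kind of logic pushed to Spark
# for faster processing than an equivalent batch job.
monthly_totals = (
    claims
    .withColumn("claim_month", F.date_format(F.to_date(F.col("service_date")), "yyyy-MM"))
    .groupBy("member_id", "claim_month")
    .agg(
        F.sum("claim_amount").alias("total_claim_amount"),
        F.count(F.lit(1)).alias("claim_count"),
    )
)

monthly_totals.write.mode("overwrite").parquet("hdfs:///data/curated/claims_monthly")

The same logic translates directly to the Scala API used in the project.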
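A minimal Python sketch of the missing-value imputation step mentioned above (pandas is assumed; the file name and column handling are illustrative):

# Illustrative sketch only: simple median/mode imputation plus basic integrity checks.
import pandas as pd

df = pd.read_csv("sample_extract.csv")  # hypothetical sampled extract

# Numeric gaps: fill with the column median; categorical gaps: fill with the mode.
for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna(df[col].median())
for col in df.select_dtypes(exclude="number").columns:
    if df[col].isna().any() and not df[col].mode().empty:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

# Basic integrity checks: completeness and duplication.
print("remaining missing values:", int(df.isna().sum().sum()))
print("duplicate rows:", int(df.duplicated().sum()))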
Environment: Talend Open Studio 6.1.2/6.2.2, Hadoop, HDFS, Hive, Impala, Sqoop, MongoDB, MapReduce, Pig, Oozie, Oracle 10g/11g, Toad for Oracle, Teradata, TAC, CA Workload Automation AE, PostgreSQL, XML, JIRA, Unix.
Confidential
Datastage Module Lead
Responsibilities:
- Gathered requirements from business analysts and business users and collaborated with ETL data acquisition and reporting teams.
- Designed and developed mappings between sources and operational staging targets using Star and Snowflake schemas.
- Designed models for various dimension and fact tables and created jobs in DataStage Designer for extracting, cleansing, transforming, integrating, and loading data into the data warehouse.
- Defined strategies for back- and front-end construction, data security, metadata population and usage, Web usage, error handling, reconciliation, and reference data maintenance.
- Performed Data Profiling, Data Migration, Extraction, Transformation, and Loading using Talend components such as tMap, tJoin, tReplicate, tParallelize, tConvertType, tFlowToIterate, tAggregateRow, tSortRow, tFlowMeter, tLogCatcher, tRowGenerator, tNormalize, tDenormalize, tSetGlobalVar, tHashInput, tHashOutput, tJava, tJavaRow, tWarn, tMysqlSCD, tFilter, tGlobalmap, tDie, etc.
- Designed data conversions from a wide variety of source systems, including Oracle, DB2, Teradata, Hive, and non-relational sources such as flat files, XML, and mainframe files, by creating mappings in Talend.
- Used Talend features such as context variables, triggers, and connectors for databases and flat files, including tMysqlInput, tMysqlConnection, tOracle, tMSSqlInput, tMSSqlOutput, tMSSqlRow, tFileCopy, tFileInputDelimited, and tFileExists.
- Utilized Big Data components such as tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport, and tSqoopExport.
- Developed ETL processes using Talend including processing of NoSQL and JSON formats.
- Pushed data as delimited files into HDFS using Talend Big Data Studio 6.0.1.
- Developed complete end-to-end Big Data processing in the Hadoop environment using components such as Hive, Pig, Spark, and HBase.
- Built pipelines to move hashed and un-hashed data from Azure Blob Storage to the data lake.
- Created pipelines to move data from on-premise servers to Azure Data Lake.
- Loaded data into Parquet files by applying transformations using Impala.
- Ran Hadoop Streaming jobs to process terabytes of XML data (a minimal mapper sketch follows this list).
- Implemented a 50-node Cloudera CDH4 Hadoop cluster on AWS running SUSE Linux.
- Loaded data and ran DataStage jobs through TWS and Skybot.
- Created Implicit, local and global Context variables in the jobs.
- Worked on the Talend Administration Center (TAC) to schedule jobs and add users.
- Followed all standard operating procedures (SOPs) and maintained up-to-date tickets for events, incidents, requests, changes, problems, etc.
- Completed a POC on ASG-Zena and implemented it as a scheduling tool in the DWH to run DataStage jobs daily; documented the process and learnings in our DWH process knowledge repository.
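A minimal Python mapper sketch for the Hadoop Streaming XML processing mentioned above (it assumes each input line carries one complete, hypothetical <record> element with hypothetical field names; real streaming XML jobs typically rely on an XML-aware input format or record reader):

#!/usr/bin/env python3
# Illustrative Hadoop Streaming mapper: reads XML records from stdin and emits
# key<TAB>value pairs for a downstream reducer. Element/field names are hypothetical.
import sys
import xml.etree.ElementTree as ET

def main():
    for line in sys.stdin:
        line = line.strip()
        if not line.startswith("<record"):
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records rather than failing the task
        customer_id = record.findtext("customerId", default="UNKNOWN")
        amount = record.findtext("amount", default="0")
        print(f"{customer_id}\t{amount}")

if __name__ == "__main__":
    main()

In practice such a mapper is submitted through the hadoop-streaming JAR together with a matching reducer.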
Environment: Windows, IBM DataStage v11.3, Oracle 11g, Toad for Oracle, Hadoop, Hive, Pig, Impala, Spark, HBase, Talend Big Data Studio 6.0.1, MS Azure, CA Workload Automation AE, ASG-Zena, Unix