Sr. Data Engineer/Developer Resume
SUMMARY
- Over 12 years of progressive IT experience and Technical proficiency in Business Requirement Analysis, Application Design, Development, Testing and Implementation.
- Over 7 years of experience in Teradata Database and Application Design, Development, Implementation, Performance Tuning and Optimization.
- 4 years of experience on Big Data utilizing Hadoop ecosystem components like Apache Spark (PySpark), HIVE, HQL, MapReduce and Sqoop.
- Experience working in Python, using constructs such as map/reduce-style functions and the collections module for data processing (a minimal sketch follows this summary).
- Extensive experience in working on Data Profiling, Data Analysis, implementing standard elements of Data Quality Dimensions in various verticals such as Banking, Retail, Telecom and Healthcare.
- Technical proficiency in Designing and Developing ETL Applications using Teradata and Hadoop ecosystem tools and utilities such as Apache Spark, Sqoop, BTEQ, FastLoad, MultiLoad and FastExport.
- Proficient in Performance Tuning and Optimization of high-volume Databases, ETL applications, SPOOL space issues and Complex Queries using functional and technical approaches such as enforcing Parallel Job Streams, applying PPIs, PIs, DBQL, Volatile/Global Temporary Tables, EXPLAIN PLAN, ViewPoint and Collect Statistics.
- Experienced in OLTP/OLAP system design, analysis and design of Database Schemas such as Star and Snowflake (Fact and Dimension), Data Modeling of large data sets, normalization, writing source-to-target mapping documents, and Dimensional and Multi-dimensional modeling.
- Involved in Full Lifecycle of various projects, including requirement gathering, System Designing, Architectural decision making, application development, enhancement, deployment, maintenance and Post-Production Support.
- Strong commitment to quality; experienced in ensuring compliance with coding standards and the review process.
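A minimal sketch of the Python data-processing pattern mentioned above, combining functools.reduce with collections objects; the record fields and sample values are hypothetical.

```python
from functools import reduce
from collections import Counter, defaultdict

# Hypothetical account-level records; field names are illustrative only.
records = [
    {"account": "A1", "region": "NE", "amount": 120.0},
    {"account": "A2", "region": "NE", "amount": 80.0},
    {"account": "A3", "region": "SW", "amount": 45.5},
]

# Count records per region with collections.Counter.
per_region = Counter(r["region"] for r in records)

# Aggregate amounts per region in a map/reduce style.
def merge(acc, rec):
    acc[rec["region"]] += rec["amount"]
    return acc

totals = reduce(merge, records, defaultdict(float))

print(per_region)    # Counter({'NE': 2, 'SW': 1})
print(dict(totals))  # {'NE': 200.0, 'SW': 45.5}
```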
TECHNICAL SKILLS
Big Data: Hadoop, Apache Spark (PySpark), HIVE, HQL, Sqoop
RDBMS/Databases: Teradata, SQL Server, Oracle, FoxPro, dBASE-III+.
Languages: Python, Visual Basic 6.0, VB.Net
Tools & Utilities: BTEQ, FastLoad, MultiLoad, TPump, FastExport, SQL Server DTS Packages, IBM Tivoli, WLM, AutoSys
OS: LINUX, DOS, Windows.
PROFESSIONAL EXPERIENCE
Sr. Data Engineer/Developer
Confidential
Responsibilities:
- Requirement analysis for various source systems to bring data into Bronze, Silver and Gold Layer (Data Lake).
- Design HLD/LLD and data integration strategy from heterogeneous source systems such as Teradata, Oracle, SAS, Flat Files and HDFS to the SDE Data Lake.
- Development of all Data Lake layers, Bronze (Raw), Silver (Pre-processing) and Gold (Final), using Hadoop and Teradata components such as Apache Spark (PySpark), Sqoop, HIVE, BTEQ, FastLoad and TPT (a minimal PySpark sketch follows this list).
- Design and develop ETL solutions for Teradata source systems using TPT, FastLoad and BTEQ that provision trigger-based, account-level data to SDE.
- Write Sqoop jobs to export data from the Gold layer to Oracle, which serves as an analytical layer for the Strategy team.
- Build and prepare test environments for all required platforms like Hadoop, Teradata and Oracle.
- Perform data validation whenever required by utilizing Hadoop and Teradata components such as HIVE/HQL, BTEQ, SQL Assistant etc.
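A minimal PySpark sketch of the Bronze/Silver/Gold flow described in the responsibilities above; the paths, table names and deduplication key are assumptions, not the actual project logic.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sde_data_lake").enableHiveSupport().getOrCreate()

# Bronze (Raw): land source extracts as-is. Path and schema are hypothetical.
bronze = spark.read.option("header", True).csv("/data/bronze/accounts/")
bronze.write.mode("overwrite").saveAsTable("bronze.accounts")

# Silver (Pre-processing): basic cleansing and deduplication on an assumed business key.
silver = (
    spark.table("bronze.accounts")
    .withColumn("load_dt", F.current_date())
    .dropDuplicates(["account_id"])
    .filter(F.col("account_id").isNotNull())
)
silver.write.mode("overwrite").saveAsTable("silver.accounts")

# Gold (Final): conformed, aggregated layer consumed downstream
# (e.g. exported to Oracle via Sqoop in a separate job).
gold = silver.groupBy("region").agg(F.count("*").alias("account_cnt"))
gold.write.mode("overwrite").saveAsTable("gold.account_summary")
```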
Confidential
Data Control Check
Responsibilities:
- Co-ordinate with the Business and Business Analysts to understand requirements and prepare data-sourcing mapping documents.
- Design HLD/LLD and the data flow strategy to source data from Teradata and HDFS, with proper implementation of Data Control checks at various levels such as Source, Raw, Stage and Target.
- Development of Bronze, Silver and Gold layers using Hadoop components like Sqoop, PySpark and HIVE.
- Co-ordinate with downstream teams to explain the changes and help with their impact analysis.
- Database design at the Raw-to-Stage layer to hold temporary and history data for Change Data Capture (CDC); see the CDC sketch at the end of this section.
- Write ETL scripts such as BTEQ, TPT, Shell Scripts to load data into CP tables as per designed data flow.
- Implement PREPAID data sourcing changes in all impacted objects such as BTEQ scripts, existing profile tables and views.
- Perform data migration from HDFS to the Teradata system using Sqoop for data analysis, and use HIVE to analyze data in HDFS.
- Co-ordinate with the Source Team (W) to set up the NDM connection to send the trigger file and facilitate data for SIT and UAT validation.
- Environment setup for SIT and UAT validation, such as implementation of new and impacted DDLs and migration of code from the Dev to the UAT environment.
- SIT and UAT execution, facilitate data for business validation and work on observations.
- Co-ordinate and work with all stakeholders on the documents necessary for release approval.
- Prepare a comprehensive deployment plan in co-ordination with other teams' releases.
- Involve and co-ordinate with DBA, ETL and Production support team to implement all the steps as per deployment document.
Environment: Hadoop, PySpark, Sqoop, HIVE, Teradata 14.10, UNIX, BTEQ, MultiLoad, FastLoad, TPT, AutoSys.
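A minimal PySpark sketch of the Raw-to-Stage Change Data Capture pattern referenced above; the table names, business key and hash-based change test are assumptions, not the project's actual rules.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdc_raw_to_stage").enableHiveSupport().getOrCreate()

raw = spark.table("raw.subscriber")          # today's extract (hypothetical table)
hist = spark.table("stage.subscriber_hist")  # prior snapshot kept for CDC

key = "subscriber_id"                        # assumed business key
attrs = [c for c in raw.columns if c != key]

# Hash the non-key attributes so changed rows can be detected with one comparison.
hashed_raw = raw.withColumn("row_hash", F.sha2(F.concat_ws("||", *attrs), 256))
hashed_hist = (
    hist.withColumn("hist_hash",
                    F.sha2(F.concat_ws("||", *[c for c in hist.columns if c != key]), 256))
        .select(key, "hist_hash")
)

delta = (
    hashed_raw.join(hashed_hist, on=key, how="left")
    .withColumn(
        "cdc_flag",
        F.when(F.col("hist_hash").isNull(), F.lit("I"))              # new record
         .when(F.col("row_hash") != F.col("hist_hash"), F.lit("U"))  # changed record
         .otherwise(F.lit("N")),                                     # unchanged
    )
    .filter(F.col("cdc_flag") != "N")
)

delta.write.mode("overwrite").saveAsTable("stage.subscriber_delta")
```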
Technical Lead/Analyst/Developer
Confidential
Responsibilities:
- As a Technical Lead/Analyst/Developer, involved in end-to-end solution design: requirement analysis, understanding business requirements, preparing the Technical Design, defining the ETL strategy, development, preparing test cases, data validation and solution implementation.
- Worked on the AGP FACETS NY BRANDING project. The purpose of this effort was to eliminate a data integrity issue caused by commissioning new branding prefix criteria in the AGP source system. Involved in design and development of ETL scripts using Teradata utilities such as BTEQ and MultiLoad. Worked on multiple BTEQ components to modify complex queries, with data validation and data flow design to fix and load existing membership tables.
- Worked on multiple decommission projects such as Dental DeCare, ZIP CODE and CMS NPPES, which were developed using Teradata load utilities such as BTEQ, MultiLoad and FastLoad and the Informatica ETL tool.
- Involved in Performance Tuning activities and ad-hoc tuning of complex queries to reduce response time and spool space utilization, identify skew factors and suggest solutions.
- Worked on the analysis and discovery phase to build a new Dependent Data Mart (a set of 146 facts) from FDS to Clinical Program (CP). Involved in impact and risk analysis, development, and back-tracking Informatica workflows/Teradata components to understand the transformation and business logic and replicate the solution into EDWard.
- Worked on the CMS NPPES project to design and develop an ETL program using Teradata/Informatica to automate receiving and processing the monthly data feed from the CMS Gov. website.
- Designed and developed Teradata scripts using BTEQ, MultiLoad and FastLoad to source data from CSV and fixed-length delimited files (see the sketch after this section).
- Co-ordination with various distributed teams to discuss the proposed solution, ETL strategy, impact assessment, risk assessment, etc.
- Developed procedures to ensure conformity and compliance with Confidential EDWard standards.
Environment: Teradata 14.10, UNIX, Informatica PowerCenter 9.6.1, BTEQ, MultiLoad, FastLoad, WLM, ClearCase
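The actual loads above were written as BTEQ/MultiLoad/FastLoad scripts; purely as an illustration of the same pattern, this minimal Python sketch parses a fixed-length file and stages the rows into Teradata through the teradatasql driver. The file layout, table name and connection details are hypothetical.

```python
import teradatasql

# Assumed fixed-length layout: provider_id (10), npi (10), state (2).
LAYOUT = [("provider_id", 0, 10), ("npi", 10, 20), ("state", 20, 22)]

def parse_line(line):
    """Slice one fixed-length record into a tuple of trimmed fields."""
    return tuple(line[start:end].strip() for _, start, end in LAYOUT)

# Hypothetical monthly CMS feed file.
with open("nppes_monthly.dat") as fh:
    rows = [parse_line(line) for line in fh if line.strip()]

with teradatasql.connect(host="tdprod", user="etl_user", password="***") as con:
    with con.cursor() as cur:
        cur.executemany(
            "INSERT INTO stage.cms_nppes_provider (provider_id, npi, state) VALUES (?, ?, ?)",
            rows,
        )
```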
Teradata Analyst/ Developer /Architect
Confidential
Responsibilities:
- Analyze business requirement, design, development, testing and solution deployment.
- Prepare and analyze data flow diagram (DFD) for existing Margin Minder application.
- Prepare the Architectural Diagram, discuss and seek approvals from the stakeholders.
- Impact analysis, risk assessment, technical document design and Low Level document design.
- Data validation and comparison with legacy Margin Minder process.
- Daily status discussion and task allocation with off-shore team.
- Develop/Modify numerous Teradata load scripts such as BTEQ and MultiLoad.
- Develop ad-hoc FastExport and FastLoad scripts to transfer production data to Test environment.
- Write FastExport scripts to fetch historical data for initial load of Margin Minder application.
- Creation of numerous Tables, Views and Macros as per new Margin Minder requirement.
- Responsible for performance tuning and optimization of existing Views, Queries and base Tables.
- Write numerous complex queries spanning various base tables and assist MicroStrategy developers in generating business reports.
- Prepare and execute system and integration test plans that integrate Data Services jobs, the FTP file watcher, CONA FTP servers and Margin Minder FTP servers.
- Responsible for following and enforcing Architectural Review Board (ARB) and Confidential EDW coding standards.
- Prepare detailed pre- and post-deployment checklists and ensure all Change Management requirements are met.
- Detailed Post-production validation such as data validation, proper execution of deployed scripts and monitor jobs.
Environment: Hadoop, HIVE, Teradata 14, UNIX, BTEQ, MultiLoad, FastLoad, FastExport, SAP BW, Data Services, IBM Tivoli, MicroStrategy.
Teradata Analyst/ Developer /Architect
Confidential
Responsibilities:
- Analyze problems, design solution document, discuss with Team of Architects, stakeholders and take approvals.
- Analyze upstream and downstream applications, job logs and DBQLs, and perform impact analysis of the change.
- Prepare detailed step-by-step Low Level Design (LLD) document for implementing Performance Tuning and Optimization approach.
- Responsible for replicating Teradata, UNIX and Ab Initio test environments for development and testing purposes.
- Developed, modified numerous Teradata Scripts such as BTEQ, MultiLoad, FastLoad, FastExport.
- Developed numerous new Wrappers, Jobs and Schedules in UNIX for IBM Tivoli scheduler.
- Identified and optimized unusually long-running queries by scanning monthly queries with the help of DBQL queries, ViewPoint and production logs.
- Optimized long running and complex queries by various optimization techniques such as Explain Plan, Visual Plan and Collect Stats.
- Performance tuning, monitoring and index selection using the ViewPoint Statistics Wizard, Index Wizard and Teradata Visual Explain to view the flow of SQL queries as icons and make join plans more effective and faster.
- Pre and Post data validation to ensure proper implementation of Optimized solution.
- Analyze sequentially running jobs and optimize them to run in parallel to make optimum use of Teradata resources.
- Optimized numerous skewed, high-volume LIVE base tables containing 2-3 billion records by analyzing the skew factor and applying proper PIs to enforce uniqueness.
- Optimized tables for various applications by applying optimization techniques such as changing SET to MULTISET, PIs to enforce uniqueness, Multi-Value Compression (MVC), removing unused columns and changing data types to save space.
- Resolved SPOOL space issues for various queries by breaking up complex queries and utilizing Volatile and Global Temporary Tables (see the sketch after this section).
- Optimized the Presentation Layer (PL) by introducing a process to eliminate 30% (about 23 million) disconnected registrations from the daily job stream.
- Utilized the Teradata monitoring tools ViewPoint and PMON to analyze query execution steps and prioritize queries.
- End-to-end impact and risk analysis of upstream and downstream applications to avoid any failure.
- Prepare detailed integration and system test plans.
- Prepare deployment checklist, regression plan, review checklist, post deployment activities.
- Post-production validation such as data validation, job monitoring and log analysis using ETLREAD.
Environment: Teradata 13/14, Ab Initio, Siebel, Dice, UNIX, BTEQ, MultiLoad, FastLoad, FastExport, IBM Tivoli, ViewPoint, PMON, ETLREAD.
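Representative Teradata SQL for the optimization patterns described above (SET to MULTISET with Multi-Value Compression, PI selection, a Volatile Table to relieve SPOOL pressure, and Collect Statistics), issued here through the Python teradatasql driver purely for illustration; all object names, compression values and connection details are assumptions.

```python
import teradatasql

# Illustrative DDL/SQL only; object names, compression lists and the PI are assumptions.
STEPS = [
    # SET -> MULTISET with multi-value compression and a PI chosen to reduce skew.
    """CREATE MULTISET TABLE dw.registration_new (
           registration_id BIGINT NOT NULL,
           status_cd CHAR(2) COMPRESS ('AC', 'DC', 'SU'),
           region_cd CHAR(3) COMPRESS ('NE', 'SW', 'MW'),
           event_ts TIMESTAMP(0)
       ) PRIMARY INDEX (registration_id)""",

    # Volatile table to stage an intermediate result and break up a spool-heavy query.
    """CREATE VOLATILE TABLE vt_active_reg AS (
           SELECT registration_id, region_cd
           FROM dw.registration_new
           WHERE status_cd = 'AC'
       ) WITH DATA ON COMMIT PRESERVE ROWS""",

    # Refresh statistics so the optimizer can build better join plans.
    "COLLECT STATISTICS COLUMN (registration_id), COLUMN (region_cd) ON dw.registration_new",
]

with teradatasql.connect(host="tdprod", user="perf_user", password="***") as con:
    with con.cursor() as cur:
        for sql in STEPS:
            cur.execute(sql)
```

In practice these statements ran inside BTEQ scripts scheduled by IBM Tivoli; the Python wrapper here is only a convenient single-language way to show the SQL.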
Confidential
Sr. Data Engineer/Developer
Responsibilities:
- Identify and prepare a list of the BAG application's long-running queries by scanning monthly queries with the help of DBQL queries, ViewPoint and logs (a DBQL sketch follows this section).
- Optimized at least 50-55 long-running complex queries for the BAG application.
- Identified and resolved SPOOL space issues for various queries by splitting them and utilizing Volatile and Global Temporary Tables.
- Track down each query to find all source temporary and base tables.
- Set up the test environment, including replication of the Teradata and UNIX environments and bringing and loading sample data from production into the test environment.
- Analyze and prepare the design document, and discuss the optimization approach with the team.
- Impact analysis of applications, job logs, DBQLs for the targeted queries.
- Prepare detailed step-by-step Low Level Design (LLD) document for implementing Performance Tuning and Optimization approach.
- Developed, modified Teradata Scripts - BTEQ, MultiLoad, FastLoad, FastExport.
- Developed Jobs and Schedules in UNIX for IBM Tivoli scheduler.
- Optimized queries by optimization techniques such as Explain Plan, Visual Plan and Collect Stats.
- Performance tuning, monitoring and index selection using the ViewPoint Statistics Wizard, Index Wizard and Teradata Visual Explain to view the flow of SQL queries as icons and make join plans more effective and faster.
- Pre and Post data validation to ensure proper implementation of Optimized solution.
- Optimized jobs to run in parallel to make optimum use of Teradata resources.
- Utilized Teradata Monitoring tools ViewPoint, PMON to analyze query execution steps.
- Prepare deployment checklist, regression plan, review checklist, post deployment activities.
- Successfully deployed fix for critical LIVE data stream applications with high volume of data.
- Post-production validation such as data validation, job monitoring and log analysis using ETLREAD.
Environment: Teradata 13/14, Ab-Initio, Siebel, Dice, UNIX, BTEQ, MultiLoad, FastLoad, FastExport, IBM Tivoli, ViewPoint, PMON, ETLREAD.
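A minimal sketch of the DBQL scan used to shortlist long-running queries, again shown through the Python teradatasql driver for illustration; the view and column names follow DBC.QryLogV and may vary slightly by Teradata release, and the thresholds and connection details are assumptions.

```python
import teradatasql

# Shortlist candidate queries from DBQL by elapsed time and CPU; thresholds are illustrative.
DBQL_SCAN = """
SELECT UserName,
       QueryID,
       StartTime,
       FirstRespTime,
       AMPCPUTime,
       TotalIOCount,
       QueryText
FROM DBC.QryLogV
WHERE CAST(StartTime AS DATE) >= CURRENT_DATE - 30
  AND (FirstRespTime - StartTime) DAY(4) TO SECOND > INTERVAL '0 00:10:00' DAY TO SECOND
ORDER BY AMPCPUTime DESC
"""

with teradatasql.connect(host="tdprod", user="perf_user", password="***") as con:
    with con.cursor() as cur:
        cur.execute(DBQL_SCAN)
        # Print the top 50 candidates for manual EXPLAIN/Visual Explain review.
        for user, qid, start, resp, cpu, io, text in cur.fetchmany(50):
            print(user, qid, cpu, io, str(text)[:80])
```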
Confidential
Sr. Data Engineer/Developer
Responsibilities:
- Lead the team and provide technical assistance to the development team.
- Understand requirements and document business rules/derivations as per the HLD and other technical documents.
- Conduct proof of concept and system analysis.
- Impact analysis and risk assessment.
- Provide technical assistance on tactical (Teradata) solutions in all areas.
- Designed the Technical Design Document as per the Scope document and HLD.
- LLD designing and task allocation to the development team.
- Explain solution approach with Architectural Board and take approvals.
- Interacted with Business System Analyst to gather information as per changes to be made on Business Data Model.
- Responsible for preparing source/target mapping documents for incremental and initial load.
- Performance tuning and troubleshooting.
- Prepare integrated deployment plan, review checklist, post deployment activities.
- Drive daily status meeting with team to discuss the status, risks and challenges.
Environment: Teradata 13/14, Ab Initio, Siebel, Dice, UNIX, BTEQ, MultiLoad, FastLoad, FastExport, IBM Tivoli, ViewPoint, PMON, ETLREAD.
Sr. System Analyst
Confidential
Responsibilities:
- Analyze business requirement and provide conceptual design.
- Design Low Level Document (LLD) and programming specifications.
- Developed various Teradata Load Scripts for initial and incremental load using BTEQ, FastLoad, MultiLoad, FastExport.
- Developed numerous ad-hoc FastExport, MultiLoad and FastLoad scripts to fetch historical data from Production to Business.
- Created numerous INSERT and UPDATE Macros to trigger data loads from the Java environment (a macro sketch follows this section).
- Responsible for providing daily, weekly and on-demand CSA extracts to the business users.
- Developed and configured many jobs for the Global Command Center (GCC) and Capacity Management Planning (CMP) projects through Control-M.
- Performance tuning and optimization of queries and tables throughout the entire development cycle.
- Database design and data modeling in initial phase of the project.
- Developed Disaster Recovery (DR) module for routine backup of daily snapshots.
- Involved in developing a UNIX shell script to send automatic emails to users when thresholds are met.
- Responsible for post-production validation, troubleshooting and fixing issues.
- Provide technical assistance to the team.
Environment: Teradata 13, UNIX, BTEQ, MultiLoad, FastLoad, FastExport, Control M, D1, CSA, ES (Oracle)
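A minimal sketch of the INSERT macro pattern referenced above; in the project the macros were executed from the Java application, while this illustration creates and executes one through the Python teradatasql driver. The macro, table and column names are hypothetical.

```python
import teradatasql

# Hypothetical macro encapsulating an INSERT so the calling application only issues EXEC calls.
CREATE_MACRO = """
CREATE MACRO stage.ins_csa_account (acct_id INTEGER, status_cd CHAR(2)) AS (
    INSERT INTO stage.csa_account (account_id, status_cd, load_ts)
    VALUES (:acct_id, :status_cd, CURRENT_TIMESTAMP);
)
"""

with teradatasql.connect(host="tdprod", user="etl_user", password="***") as con:
    with con.cursor() as cur:
        cur.execute(CREATE_MACRO)
        # The caller (here Python, in the project Java via JDBC) simply executes the macro.
        cur.execute("EXEC stage.ins_csa_account (1001, 'AC')")
```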
Technical Lead
Confidential
Responsibilities:
- Analyze Business Analyst and client requirements.
- Responsible for interface design, process design and functional specification.
- Conduct proof of concept and system analysis.
- Impact analysis and risk assessment.
- Explain solution approach with Architectural Board and take approvals.
- Develop solution using Teradata Load utilities such as BTEQ, FastLoad and MultiLoad.
- Writing load processes using Teradata tools like BTEQ, MLoad and TPump.
- Writing new and changing existing stored procedures to implement new pricing features.
- Prepare and execute integration and system test plans.
- Performance tuning and troubleshooting.
- Prepare, execute deployment checklist.
- Post-production validation.
Environment: Teradata, UNIX, BTEQ, MultiLoad, FastLoad, FastExport, MS-ACCESS.