Architect / Big Data Engineer Resume
Memphis, TN
SUMMARY:
- Over 16 years of IT experience in Data Warehouse / Data Mart architecture, design, and development using ETL / Big Data technologies.
- 5 years of installation, configuration, and administration of Big Data technologies including Hortonworks Ambari, HDP 2.5, HDFS, HBase, Hive, Cassandra, ZooKeeper, YARN, Vertica, Pig Latin, Oozie, MapReduce, and Talend.
- Excellent experience with SQL and NoSQL databases such as HBase, Hive, MongoDB, Cassandra, Vertica, Oracle, DB2, Netezza, Teradata, SQL Server, PeopleSoft EPM, and SAP.
- Proficient in managing complex Big Data environments on Cloudera and Hortonworks Hadoop distributions.
- Strong command of Hadoop platform technologies and tools across Hortonworks and Cloudera.
- Strong Big Data programming skills in Spark, Scala, Python, Pig, and Talend with databases such as HBase, Hive, MongoDB, Cassandra, and Vertica.
- 12 years of Linux / Unix Administration and support experience.
- 10 years of IBM InfoSphere Information Server (DataStage) administration experience.
- Led projects from the initiation phase, capturing business requirements, through the final phases of production support.
- Served as mentor and reviewer to junior specialists in all areas of data warehousing projects.
- Experienced in end-user training, documentation, audit record maintenance, and application and product demos.
- Managed all aspects of the SDLC process (Waterfall, Iterative, and Agile): requirements analysis, time estimates, functional specification, design, development, testing, and production support/maintenance.
- Strong programming experience in Linux shell scripting, C, C++, Java, Perl, and Python.
- Solid understanding of Data Warehousing, OLTP and OLAP concepts.
- Extensively worked on implementing physical and logical data warehouses and data marts, built multidimensional cubes for strategic decision making, and performed data modeling with star and snowflake schemas.
- Experienced in creating documentation for production support handoff and training production support teams on commonly encountered problems, their likely causes, and ways to debug them.
- Expertise in MQ Series for loading and unloading data; troubleshot MQ channels, queues, and services and verified logs.
TECHNICAL SKILLS:
Big Data: Hortonworks HDP, Ambari, Talend, Spark, Storm, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, MapReduce.
ETL Tools: IBM DataStage, Talend, Informatica, and Sagent.
Databases: HBase, Hive, MongoDB, Cassandra, Vertica, Oracle, DB2, Netezza, Teradata, SQL Server, PeopleSoft EPM, SAP
Operating Systems: Red Hat Linux, AIX, SunOS, z/OS
Data Modeling: Star schema, snowflake schema, physical and logical data modeling
Reporting Tools: Seagate Crystal Reports 6.x/5.x, Developer 2000, and MS Access Reports.
Languages: SQL, PL/SQL, C, C++, VC++, Perl, Java, JDBC, VB, and XML.
ERP: SAP R/3, ECC 6.0, BW 3.5, BI 7.0, XI, and SAP Pack 5.0.
PROFESSIONAL EXPERIENCE:
Architect / Big Data Engineer
Confidential, Memphis, TN
Responsibilities:
- Provided architectural solutions and best practices covering data structures, sourcing, integration and data retention.
- Created the software installation plan, configured Linux servers for development, test, and production, and installed required fix packs.
- Installed and configured a 60+ node HDP 2.5 cluster for development and production.
- Installed tools including Hortonworks Ambari, Sqoop, Flume, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, HDP, HDFS, Talend, YARN, Spark, and Python.
- Set up projects, roles, users, and privileges in different environments (DEV, QA, and PRD).
- Worked on cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Created metrics, measured resource utilization, and performed performance tuning.
- Worked on capacity planning and implementation of new / upgraded hardware and software releases as well as storage infrastructure.
- Performed day-to-day administration activities.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Troubleshot, debugged, and fixed Talend-specific issues while maintaining the health and performance of the ETL environment.
- Conducted team meetings for code reviews and worked with QA and production support teams on code testing and production migration.
- Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed, and mined meaningful data in HDFS using MapReduce for further analysis (see the Sqoop sketch after this list).
- Introduced the use of development techniques such as source control, code reviews, test plans and Production support run books.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Documented Production support run books for future use.
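A minimal sketch of the kind of Sqoop transfer referenced above; the connection string, table, and HDFS path are illustrative placeholders, not actual project values.

    #!/bin/bash
    # Export a tab-delimited HDFS directory into an Oracle table (all names are hypothetical).
    sqoop export \
      --connect jdbc:oracle:thin:@//dw-db-host:1521/DWDB \
      --username etl_user -P \
      --table SALES_FACT \
      --export-dir /data/warehouse/sales_fact \
      --input-fields-terminated-by '\t' \
      -m 8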
Environment: Hortonworks HDP 2.5, Ambari, HDFS, Red Hat Enterprise Linux 7, Talend, Pig, Hive, HBase, ZooKeeper, YARN, Spark, Python, Oracle, DB2, Netezza, SAP, SQL Server.
Lead Big Data Engineer
Confidential, Detroit, MI
Responsibilities:
- Evaluated and recommended systems software and hardware for the enterprise system including capacity modeling.
- Provided architectural solutions and best practices covering data structures, sourcing, integration and retention.
- Installed an HDP 2.5 cluster with Ambari, Pig, Hive, HBase, ZooKeeper, YARN, Spark, Python, and HDFS.
- Set up projects, roles, users, and privileges in different environments (DEV, Model, and PRD).
- Participated in business meetings with vendors and clients to finalize data warehouse enhancements.
- Retrieved data from HDFS into relational databases with Sqoop.
- Conducted team meetings for code reviews and worked with QA and production support teams on code testing and production migration.
- Monitored workload, job performance, and capacity planning using Ambari (see the monitoring sketch after this list).
- Added and removed cluster components through Ambari.
- Worked on performance tuning and database improvements across multiple databases.
- Documented Production support run books for future use.
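A hedged sketch of an Ambari REST check of the sort used for the monitoring described above; the host, credentials, and cluster name are placeholders.

    #!/bin/bash
    # List each service and its current state on the cluster (read-only call).
    curl -s -u admin:admin -H 'X-Requested-By: ambari' \
      "http://ambari-host:8080/api/v1/clusters/prod_cluster/services?fields=ServiceInfo/state"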
Environment: Hortonworks HDP 2.5, Ambari, HDFS, Red Hat Enterprise Linux 7, Talend, Pig, Hive, HBase, ZooKeeper, YARN, Spark, Python, Oracle, DB2, Netezza, SAP, SQL Server.
Infrastructure Admin
Confidential, Plano, TX
Responsibilities:
- Worked with conventional data warehouse technologies to develop plans that best support customers' business strategies.
- Provided architectural solutions and best practices covering data structures, sourcing, integration and retention.
- Worked on installation and configuration of Big Data tools: HDP 2.5, Ambari, Talend, Python, YARN, Pig, Hive, and Vertica.
- Created the IBM InfoSphere software installation plan, configured Linux servers for development, test, and production, installed IBM Information Server 11.3 and fix packs, and created required projects.
- Set up the projects, roles, users, privileges in different environments (DEV, QA, and PRD).
- Configured, tuned, and maintained DataStage application instances and the supporting underlying database and OS environments.
- Worked with IBM on PMR tickets and patch installations to fix InfoSphere issues.
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Created design standards and shell scripts for data manipulation and scheduling.
- Worked with the scheduling tools Oozie, AutoSys, crontab, and Control-M (see the scheduling sketch after this list).
- Worked with developers to define ETL best practices and standards and design, develop and maintain ETL architecture, reusable code assets and solutions.
- Conducted team meetings for code reviews and worked with QA and production support teams on code testing and production migration.
- Introduced the use of development techniques such as source control, code reviews, test plans and Production support run books.
- Worked on performance tuning and database improvements across multiple databases.
- Documented Production support run books for future use.
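Illustrative crontab entries for the scheduling work referenced above; the script paths, schedules, and log locations are assumptions.

    # Run the nightly ETL wrapper at 2:00 AM and keep output in a dated log (hypothetical paths).
    0 2 * * * /opt/etl/scripts/run_nightly_load.sh >> /var/log/etl/nightly_load_$(date +\%Y\%m\%d).log 2>&1
    # Weekly cleanup of work files older than 14 days, Sundays at 6:00 AM.
    0 6 * * 0 find /opt/etl/work -type f -mtime +14 -delete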
Environment: Hortonworks HDP 2.1, HDFS, GFS, Talend, Python, Red Hat Enterprise Linux 7, AIX 7.1, Oracle 11g, Netezza, SAP, Vertica, Pig, Hive, HBase, SQL Server, Aqua Data, Toad, FTP Switch, ClearCase.
Data Warehouse Admin
Confidential, Arlington, TX
Responsibilities:
- Worked with conventional data warehouse technologies to develop plans that best support customers' business strategies.
- Provided architectural solutions and best practices covering data structures, sourcing, integration and retention.
- Created the IBM InfoSphere software installation plan, configured AIX servers for development, test, and production, installed IBM Information Server 11.3, applied fix packs, and created required projects.
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Created ETL design standards and shell scripts for data manipulation and scheduling.
- Worked with ETL developers to define ETL best practices and standards and design, develop and maintain ETL architecture, reusable code assets and solutions.
- Conducted team meetings for code reviews and worked with QA and production support teams on code testing and production migration.
- Introduced the use of development techniques such as source control, code reviews, test plans, and production support run books.
- Worked on data warehousing projects using Hortonworks, Hive, Pig, HBase, and DataStage to load data into the data warehouse (see the Hive load sketch after this list).
- Processed billions of records using datasets and HDFS.
- Worked with Business Intelligence teams running business reports using Cognos and Business Objects.
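A minimal sketch of a Hive-based warehouse load like the one referenced above; database, table, and path names are hypothetical.

    #!/bin/bash
    # Move a landed file set into a staging table, then insert into a partitioned fact table.
    hive -e "
      LOAD DATA INPATH '/landing/orders/2016-05-01' INTO TABLE stg.orders;
      INSERT INTO TABLE dw.orders_fact PARTITION (load_dt='2016-05-01')
      SELECT order_id, customer_id, order_amount FROM stg.orders;
    "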
Environment: Hortonworks HDP 2.1, IBM IIS 9.7, AIX 7.1, HDFS, Hive, HBase, Oracle 11g, DB2, Netezza, SQL Server, Aqua Data, Toad, FTP Switch, ClearCase, and Control-M.
ETL Admin
Confidential, Richardson, TX
Responsibilities:
- Participated in business meetings with vendors and clients to finalize data warehouse enhancements.
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Assigned tasks to the development team and participated in code review sessions.
- Conducted business meetings to resolve the design issues.
- Installed IBM Information Server 8.5 on Red Hat Linux 6; created projects and users and performed code migration.
- Created data mappings from source to target as per business requirements.
- Provided oversight and direction for the design and development of the data layers and models.
- Created a UNIX shell script for job runs, exception handling, and email notification, which controlled the number of job runs on the server and produced success/failure reports (see the dsjob wrapper sketch after this list).
- Maintained quality of data in the data warehouse, ensuring data integrity and established design procedures.
- Tuned production jobs to reduce the cycle time and resource utilization.
- Extracted data from various sources (Oracle, SQL Server, DB2, and SAP systems), processed it, and loaded it into the Oracle data warehouse.
- Worked with business analysts to ensure they received the required data.
- Worked with IBM on PMR tickets and patch installations to fix InfoSphere issues.
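A hedged sketch of the dsjob-based wrapper described above; the project name, mail address, and exit-code handling are assumptions and would be adjusted to the actual environment.

    #!/bin/bash
    # Run a DataStage job, wait for completion, and mail the log summary on failure.
    PROJECT="DW_PROJECT"           # hypothetical project name
    JOB="$1"

    dsjob -run -jobstatus "$PROJECT" "$JOB"
    STATUS=$?                      # exit code reflects the job's finishing status

    if [ "$STATUS" -gt 2 ]; then   # assumption: >2 means aborted/failed rather than OK/warnings
        dsjob -logsum "$PROJECT" "$JOB" | tail -20 | \
            mailx -s "DataStage job $JOB failed (status $STATUS)" etl-support@example.com
        exit 1
    fi
    exit 0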
Environment: IBM Information Server 8.5, Oracle 11g, DB2 9.7, Netezza, SQL Server 2008, SAP, Red Hat Linux 6, ClearCase, Control-M, crontab, Erwin, Toad, Aqua Data, FTP Switch, and UltraEdit.
ETL Lead
Confidential, Fort Worth, TX
Responsibilities:
- Participated in business meetings with vendors and clients to finalize the business requirements.
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Set up new development and production environments for the OMLA and BROKER LOANS projects.
- Designed new jobs for the OMLA and BROKER LOANS projects.
- Studied the existing systems and created generic jobs to reduce manual effort.
- Tuned jobs to improve performance.
- Wrote shell scripts for automation, job run, file processing, FTP, initial load, batch loads, cleanup, job scheduling and reports.
- Converted Stored Procedures into Parallel Jobs.
- Involved in system and performance testing and performance tuning of SQL queries.
- Took part in code review and performance analysis of new jobs.
Environment: DataStage 8.1, Oracle 10g, UNIX AIX, MS Access, crontab, Erwin, Toad, FTP Switch, and UltraEdit.
Sr. ETL Developer
Confidential, Middletown, CT
Responsibilities:
- Participated in business meetings with vendors and clients in finalizing the enhancements.
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Set up new project environments for the BOA and ATV projects.
- Studied the existing systems and created generic jobs to reduce manual effort.
- Involved in new project development and created parallel jobs using DB2 UDB, ODBC, .NET, Lookup, Join, Sequential File, Transformer, Dataset, Peek, Funnel, Sort, Aggregator, Remove Duplicates, Modify, CDC, Row Generator, Row Merger, and other stages.
- Designed the jobs from sources ( Confidential and MVS databases) to the data warehouse / data marts for the BOA ( Confidential ) and ATV (Aetna Total Clinical View) projects on the Informatics and TKC teams.
- Involved in data mapping from source to target as per business requirements.
- Created a wrapper script for job runs, exception handling, and email notification, which controlled the number of job runs on the server and produced success/failure reports.
- Established best practices for DataStage jobs to ensure optimal performance, reusability, and restartability.
- Wrote shell scripts for automation, job run, file processing, initial load, batch loads, cleanup, job scheduling and reports.
- Involved in system and performance testing and performance tuning of SQL queries.
- Took part in code review and performance analysis of new jobs.
- Designed and tested the migration of Data Stage jobs from development system to test system and then to production system for operational data store and Data Warehouse.
- Scheduled jobs using UNIX crontab and ZEKE Scheduler.
- Involved in framework design to minimize the number of jobs in development.
- Developed load utility using MS Access for loading tables.
- Worked with DataStage Manager to import metadata from the repository, create new job categories, and create new data elements.
- Developed metadata and implemented RCP (Runtime Column Propagation).
- Involved in metadata-driven processes, where inserting a row into a DB2 table was enough to process new, similar sources.
- Created generic jobs for recurring processes such as FTPing files from various sources, SAS, and other BOA servers.
Environment: Ascential DataStage PX 7.5, DB2 UDB 9.5, VB, Oracle 10g, UNIX AIX, z/OS, Tivoli, MS Access, crontab, ZEKE Scheduler, Erwin, Control Center, FTP Switch, and UltraEdit.
Sr. Data Stage Admin
Confidential, Milwaukee, WI
Responsibilities:
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Applied patches and installed clients on end-user machines.
- Created projects in new development, QES and production environment.
- Involved in the migration from DataStage 7.0 to IBM WebSphere DataStage 8.0.
- Exported and imported projects and metadata from DS 7.0 to the new DS 8.0 development box.
- Involved in new project development, creating server and mainframe jobs.
- Involved in data mapping from source to target as per business requirements.
- Monitored jobs, identified abended, hung, or failed jobs, and resolved them accordingly.
- Created cleanup and archival UNIX scripts, tested them, and exported them to production.
- Tested and validated jobs in DS8.0 development.
- Involved in system and performance testing.
- Scheduled jobs using UNIX crontab and ZEKE for development, QES, production jobs, cleanup jobs and reports.
- Monitored disk space and resolved issues by deleting obsolete files, large log files, and test files (see the cleanup sketch after this list).
- Created tickets for all failed or hung jobs as well as for disk space usage.
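An illustrative cleanup-and-alert script for the disk-space housekeeping mentioned above; the paths, retention windows, and usage threshold are assumptions.

    #!/bin/bash
    # Purge old log and scratch files (hypothetical locations and retention).
    find /data/ds_logs    -type f -name '*.log' -mtime +30 -exec rm -f {} \;
    find /data/ds_scratch -type f -name '*.tmp' -mtime +7  -delete

    # Raise an alert when the filesystem crosses 90% usage.
    USAGE=$(df -P /data | awk 'NR==2 {gsub("%", "", $5); print $5}')
    if [ "$USAGE" -gt 90 ]; then
        echo "Filesystem /data at ${USAGE}% used" | mailx -s "Disk space alert" ds-admin@example.com
    fi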
Environment: IBM WebSphere DataStage 8.0, Ascential DataStage 7.0 (server and mainframe), Oracle 10g/9i, Teradata R12, DB2 UDB 9.5, SQL Server, SAP R/3, UNIX, Linux, MVS (TSO/ISPF), crontab, and ZEKE Scheduler.
Sr. Data Stage Developer
Confidential, San Francisco, CA
Responsibilities:
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Participated in business meetings with vendors and clients in finalizing the enhancements.
- Designed logical and physical models using Erwin data modeling tool.
- Involved in designing and developing both server and parallel jobs to extract data from PeopleSoft GL and PeopleSoft EPM and load it into Oracle, text files, sequential files, flat files, and MS Access.
- Wrote SQL and PL/SQL queries using aggregation and outer joins for better performance (see the query sketch after this list).
- Wrote shell scripts for job scheduling.
- Created job sequencers.
- Involved in system and performance testing.
- Took part in code review and performance analysis of new jobs.
- Involved in production support along with designing and development of jobs.
- Performed code migration and performance tuning.
- Scheduled jobs using UNIX crontab, Data Stage Scheduler and ZEKE for development, QES, production jobs, cleanup jobs and reports.
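A hedged example of the aggregation and outer-join style of query referenced above, run through sqlplus; the table and column names are illustrative only.

    #!/bin/bash
    # Summarize posted amounts per department, keeping departments that have no postings.
    echo "
    SELECT d.dept_name,
           NVL(SUM(f.posted_amount), 0) AS total_posted
    FROM   dept_dim d
    LEFT OUTER JOIN gl_fact f
           ON f.dept_key = d.dept_key
    GROUP  BY d.dept_name
    ORDER  BY total_posted DESC;
    " | sqlplus -s etl_user/"$DB_PASS"@DWDB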
Environment: Ascential DataStage PX 7.5.2, QualityStage, PeopleSoft EPM, GL, Oracle 10g/9i, DB2, MS Access, UNIX, Linux, crontab, Tivoli, MVS, XML, sequential files, and complex flat files.
Data Stage Developer
Confidential, Richmond, VA
Responsibilities:
- Involved in designing and developing both server and parallel jobs to load data from text files, sequential files, flat files, and MS Access.
- Created high level design documents for extract, transform, validate and load ETL process and flow diagrams.
- Used stages such as Lookup, Join, Sequential File, Transformer, Dataset, Peek, Funnel, Row Generator, Row Merger, and others.
- Created source files by loading data from various tables in the repository.
- Wrote routines to validate fax/phone numbers and e-mail addresses in validation jobs for contact preferences.
- Created both local and shared containers to use or move data in or across the repositories.
- Used Change Capture, Aggregator, and Merge stages to compare, summarize, and combine data sets.
- Created job sequencers and applied dependency logic using Data Stage job sequence.
- Took part in code review and performance analysis of new jobs.
- Involved in code migration and performance tuning.
- Involved in production support along with designing and development of jobs.
- Scheduled jobs in the production environment by using the scheduling functionality of the Data Stage Director.
- Monitored disk space and resolved issues by deleting obsolete files, large log files, and test files.
- Created tickets for all failed or hung jobs as well as for disk space usage.
Environment: Ascential DataStage PX 7.5.1, QualityStage, Oracle 10g, Teradata, UNIX, crontab, ZEKE Scheduler, XML, Perl, sequential files, complex flat files, and SQL Server 2000.
Data Stage Developer
Confidential, Cypress, CA
Responsibilities:
- Involved in the development phase, performing business analysis and requirements gathering.
- Coordinated with the team members to translate business requirements into Data Mart design.
- Designed logical and physical models using Erwin data modeling tool.
- Created and loaded Data Warehouse tables such as dimensional, fact and aggregate tables using Ascential Data Stage.
- Created Data Stage jobs, batches and job sequences and tuned them for better performance.
- Designed and developed ETL processes using Ascential Data Stage.
- Extensively used ODBC, OraOCI, Transformer, Sort, Sequential File, and Aggregator stages in jobs.
- Worked with MQ Series for loading and unloading data; troubleshot channels, queues, and services, verified logs, and started and stopped channels (see the MQSC sketch after this list).
- Designed the jobs from the source (Oracle database) to the data warehouse / data marts (MS SQL Server).
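An illustrative set of MQSC checks for the channel and queue troubleshooting described above; the queue manager, channel, and queue names are placeholders.

    #!/bin/bash
    # Check channel status and queue depth, then restart the channel (object names are hypothetical).
    printf '%s\n' \
      'DISPLAY CHSTATUS(TO.DW.CHANNEL)' \
      'DISPLAY QLOCAL(DW.INPUT.QUEUE) CURDEPTH MAXDEPTH' \
      'START CHANNEL(TO.DW.CHANNEL)' | runmqsc QM_DEV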
Environment: Ascential DataStage PX 7.5 (Designer, Director, Manager, and Administrator), QualityStage, Oracle 9i, MS SQL Server, DB2, DB2 UDB, MQ Series, AIX, and shell scripting.
Data Stage Developer
Confidential
Responsibilities:
- Involved in the analysis phase, performing business analysis and requirements gathering.
- Coordinated with the team members to translate business requirements into Data Mart design.
- Created and loaded Data Warehouse tables such as dimensional, fact and aggregate tables using Ascential Data Stage.
- Worked on modifying the star schema dimensions to meet the current requirements. The granularity of the fact table was set to the most atomic level according to the requirements.
- Worked with parallel extender for heavy load of data transfer.
- Worked with DataStage Administrator to create, delete, and edit projects and to tune the memory settings for read cache and write cache.
- Worked with DataStage Administrator to unlock jobs when a job got locked up and recreated the project indexes (clean-up).
- Analyzed the scope of the application and defined relationships within and between groups of data and the star schema.
Environment: Ascential DataStage 6.0 (DS Designer, DS Director, DS Manager, and DS Administrator), Oracle 8i, DB2, flat files, PL/SQL, Windows NT 4.0, UNIX (Korn shell scripts), and SQL*Loader.
Oracle Developer
Confidential
Responsibilities:
- Developed Oracle SQL and PL/SQL code based on requests arising from changes in business logic, norms, and standards.
- Developed unit, system, and performance test plans.
- Performed unit testing and quality assurance activities on code developed by various team members.
- Designed custom performance reports for the UK, China, and India from web server logs using the WebTrends tool for business and management.
- As part of the Infinity project, modified Oracle procedures for downloading data from Financial and SQL*Loader scripts for uploading data into the database (see the SQL*Loader sketch after this list).
- Prepared MIS reports for the Infinity application system on the status of change requests coming from the business and proactive changes initiated by the team.
- Worked with the database admin to create appropriate indexes for faster job execution.
- Developed UNIX shell scripts to automate the Data Load processes to the target Data warehouse.
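A hypothetical SQL*Loader run for the upload step described above; the file, table, and credential names are assumptions.

    #!/bin/bash
    # Write a simple control file, then load the flat file into a staging table (all names hypothetical).
    printf '%s\n' \
      'LOAD DATA' \
      "INFILE 'infinity_feed.dat'" \
      'APPEND INTO TABLE infinity_stage' \
      "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'" \
      '(txn_id, txn_date DATE "YYYY-MM-DD", amount, source_system)' \
      > infinity_load.ctl

    sqlldr userid=etl_user/"$DB_PASS"@FINDB control=infinity_load.ctl \
           log=infinity_load.log bad=infinity_load.bad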