Senior Big Data/ETL Architect Resume
Atlanta, GA
PROFESSIONAL SUMMARY:
- 12+ years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience in Data Warehousing, Big Data, ETL, and Data Migration
- Facilitated and documented the evaluation of business/IT systems for an internal IT systems audit project (As-Is vs. To-Be)
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce
- Aspiring Project Management Professional with a proven record of planning, executing, controlling, and reporting on implementations across the project life cycle using Waterfall and Agile; assisted PMO and competency centers
- Good knowledge of the Hadoop ecosystem, HDFS, Big Data, ETL concepts, and RDBMS
- Hands-on delivery experience with popular Hadoop distribution platforms such as Cloudera and Hortonworks
- Expertise in data issue resolution, debugging, and application performance tuning
- Experience with Google Cloud Platform architecture and BigQuery; proficient in designing efficient workflows
- Expertise in Confidential InfoSphere Information Governance Catalog
- Broad understanding of dimensional data modeling, including star schema, snowflake schema, and fact and dimension table design
- Conducted POCs on PLC HANA, SAP ERP Packs, MQ Connector, and ODBC Connector
- Experience with the Ready-to-Launch (RTL) solution, used primarily for SAP migrations
- Expertise in designing and architecting large-scale Hadoop deployments for production, development, and testing environments, including planning, configuration, installation, performance tuning, monitoring, and support
- Good understanding of distributed systems and parallel processing architectures
- Experience working with XML and DoubleClick data (Google's unstructured data)
- Innovative thinker with deep technology expertise; self-motivated team player with excellent communication and interpersonal skills and a results-oriented dedication to goals
- Proficient in Waterfall and Agile development methodologies
- Involved in supporting 24/7 production applications
- SDLC, Big Data, Hadoop, SAP, Data Architecture, Data Governance, Data Migration, Reporting, Business Intelligence, Audit & Compliance
TECHNICAL SKILLS:
Big Data Ecosystems: Greenplum, HANA Cloud, Datameer, Sqoop, Spark, Pig, Hive, Flume, Kafka, Hadoop, Ambari, MapReduce, Netezza, MongoDB, Pentaho Community Edition, Lumira
Reporting Tools: Spotfire, Tableau, OBIEE, Hyperion, Business Objects
Databases: Greenplum, Netezza, HANA, HBase, Teradata, Oracle, DB2, SQL Server, MS Access, Mainframes, MongoDB, Hue, GemFire
ETL & Scheduling Tools: Confidential InfoSphere Information Server, Informatica PowerCenter 9.0/8.6/8.1, BODS, SSIS, Control-M, Autosys, DAC, Tivoli
Operating Systems: UNIX, Windows, Red Hat Linux, MVS Mainframes
Languages: SQL, NoSQL, HQL, PostgreSQL, Shell scripting, Java, C, C++
Tools and Utilities: SharePoint, eRoom, LDAP, LSMW, Erwin, SQL Assistant, Visio, Teradata suite of tools, Priority Scheduler, Index Wizard, Statistics Wizard, ARCMAIN, HPQC
ERP: SAP R/3, SAP ECC 6
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Senior Big Data/ETL Architect
Responsibilities:
- Architected data systems, strategies, and business processes for new reporting projects (IBIS, PLC, BVS)
- Highlighted gaps in reporting and data management capabilities, and developed mitigation strategies
- Provided recommendations on data gathering and analysis based on best practices and emerging technologies
- Involved in creation of enterprise-level information/data standards and governance processes
- Involved in the full project life cycle, from design and analysis through logical and physical architecture modeling, development, implementation, and testing
- Involved in data migration to SAP HANA from multiple national and international sites
- Worked with highly unstructured and semi-structured data
- Created data feeds using PostgreSQL and used Hive extensively for transformations and pre-aggregations before storing the data on HDFS
- Processed HDFS data and created external Hive tables to analyze visitors per day, page views, and most-purchased parts (a minimal sketch follows this list)
- Analyzed data using Google BigQuery to discover information, business value, patterns, and trends in support of decision making; worked on data profiling and data quality analysis
- Interacted with business teams, technical teams, and application owners at AGCO globally
- Defined the scope of each business object using the Business Data Roadmap tool
- Exported the database into the Conversion Workbench (CWDB) database
- Extracted data from legacy systems into the staging area using ETL jobs and SQL queries
- Performed structural harmonization by integrating multiple staging source tables into the alignment schema tables
- Applied business rules while loading data into the alignment schema after it was extracted from the staging area tables
- Ran GAP report prerequisite jobs to generate Information Analyzer data rules
- Documented the workflow for knowledge transfer and maintained documentation standards, including screenshots, to serve a variety of audiences and audit purposes
- Involved in multiple cutover/release activities for different rollouts
- Used JIRA for bug tracking and ActiveBatch for job scheduling
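For illustration, a minimal HiveQL sketch of the pattern described above: an external table over data landed on HDFS plus a daily pre-aggregation. Table and column names are hypothetical, not taken from the actual project.

    -- Hypothetical external table over raw visit data already landed on HDFS
    CREATE EXTERNAL TABLE IF NOT EXISTS web_visits (
        visitor_id  STRING,
        page_url    STRING,
        part_id     STRING,
        visit_ts    TIMESTAMP
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/raw/web_visits';

    -- Pre-aggregation: visitors and page views per day, stored as a summary table
    CREATE TABLE IF NOT EXISTS daily_visit_summary AS
    SELECT  to_date(visit_ts)           AS visit_date,
            COUNT(DISTINCT visitor_id)  AS visitors,
            COUNT(*)                    AS page_views
    FROM    web_visits
    GROUP BY to_date(visit_ts);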
Environment: Cloudera suite of tools, Greenplum, HDFS, Tableau, MapReduce, MongoDB, Sqoop, Information Server 11.3/DataStage, Informatica 9.6, SAP ECC 6, Jira, Confluence, Pentaho Community Edition
Confidential, Greater Philadelphia, PA
Senior Big Data Architect/Consultant
Responsibilities:
- Wrote the company's metadata standards and practices, including naming standards, modeling guidelines, and the data warehouse strategy
- Created a comprehensive Project Plan outlining all tasks and deliverables supporting the development of an enterprise data model
- Reviewed, analyzed, and evaluated business/IT systems and user requirements for the IT systems Audit
- Created UDFs in Pig and Hive where custom functionality was required
- Created HBase tables to load large sets of structured and semi-structured data, and grouped related columns into column families to improve query performance
- Performed server assessment, resource estimation, and time estimation before installing Confidential InfoSphere DataStage 11.3 on a Confidential AIX server with Oracle 11g as the database
- Worked closely with NYC HRA Infrastructure teams to resolve database issues and server issues.
- Documented the installation, configuration, and usage of the tools and mentored the team on them
- Made recommendations for resolving performance issues when importing data into and exporting data out of Greenplum
- Worked with partitioning and bucketing concepts in Hive, and designed and managed external Hive tables for optimized performance (see the sketch following this list)
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
- Managed and reviewed Hadoop log files, and managed and scheduled jobs on the Hadoop cluster
- Used Hive as an ETL tool for transformations and pre-aggregations before storing the data on HDFS
- Processed HDFS data and created external Hive tables to analyze visitors per day, page views, and most-purchased plans
- Optimized MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster
- Trained and guided the team on Big Data, Greenplum, the Hadoop framework, and HDFS concepts
- Used JIRA for bug tracking and ActiveBatch for job scheduling
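As an illustration of the partitioning and bucketing approach mentioned above, a minimal HiveQL sketch; the table, columns, and bucket count are hypothetical, not taken from the actual project.

    -- Hypothetical external table partitioned by load date and bucketed by customer
    CREATE EXTERNAL TABLE IF NOT EXISTS plan_purchases (
        customer_id  STRING,
        plan_id      STRING,
        amount       DECIMAL(10,2)
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/curated/plan_purchases';

    -- A query that filters on the partition column reads only the matching partition directories
    SELECT plan_id, COUNT(*) AS purchases
    FROM   plan_purchases
    WHERE  load_date = '2015-06-01'
    GROUP BY plan_id;

Partition pruning keeps scans narrow, and bucketing by customer_id makes joins and sampling on that key more efficient.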
Environment: Hadoop, Ambari, Cloudera suite of tools, Confidential InfoSphere Information Server 11.3/DataStage, Confidential AIX 7.0, Oracle 11g, Netezza, MySQL, shell scripts, HDFS, UNIX, ActiveBatch, JIRA
Confidential, Atlanta, GA
Senior Big Data/ETL Architect
Responsibilities:
- Involved in the full lifecycle for the Big Data solution including requirements analysis, technical architecture design, solution design, solution development, testing & deployment
- Involved in IT strategies and practices for client engagement, project methodology, architectural governance
- Responsible for building scalable distributed data solutions using Hadoop & data coming from different sources
- Designed and developed the full Hadoop Distributed File System (HDFS) stack, including MapReduce, HBase, and Hive, and was involved in loading data from the UNIX file system into HDFS
- Responsible for creating daily, monthly feeds and reports on Greenplum
- Migrated data from the current platform (DataStage, Informatica) to a new Big Data platform built on Hadoop clusters
- Worked with the security team to set standards for maintaining data security and integrity
- Executed test cases and helped troubleshoot issues by analyzing data
- Interacted with Business and Engineering teams to understand reporting requirements.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required
- Led a team of 5 on-site resources and maintained documentation, policies, and standards
Environment: Hadoop, Cloudera suite of tools, Datameer 2.1.5, Greenplum 4.2.1, Hue 2.3.0, Hive, HBase, Sqoop, Oozie, Spotfire, Java 7, MapReduce, MySQL, shell scripts, Siebel, Information Server 8.5/8.1, Informatica PowerCenter 9.1, HDFS, UNIX, DAC
Confidential - Johnson Controls, WI
Senior RTL Architect/Consultant
Responsibilities:
- Performed server assessment, volume estimation, topology design, resource estimation, and time estimation before installing Confidential InfoSphere DataStage 9.1
- Installed and configured Confidential InfoSphere DataStage 9.1 in all three environments: Dev, Unified Dev, and Production
- Worked closely with the Confidential support team to resolve open PMRs
- Involved in building best practices for the InfoSphere tools, including Metadata Workbench, Business Glossary, Information Analyzer, FastTrack, Asset Manager, QualityStage, and DataStage
- Responsible for functional spec sign-offs for OTC and STP objects, and involved in source-to-target mapping documents
- Involved in creating the roadmap for the pilot release and future releases
- Created a strategy to deploy solutions and ensure there were no gaps or deviations in implementations
- Responsibilities included migrating approximately 450 jobs for 16 process areas across different projects
- Responsible for creating connectivity between DataStage and different SAP clients using DataStage Administrator for SAP; extensively used the IDoc and BAPI stages to load data directly into SAP
- Performed unit testing and validated data on the target SAP system; familiar with SAP table structures and T-codes, and used SAP background processing for faster data processing
- Documented the workflow for knowledge transfer and maintained documentation standards, including screenshots, to serve a variety of audiences and audit purposes; used T-codes to load data into SAP tables
- Performed unit testing and developed test cases for various complex input data scenarios
- Led a team of 16 on-site and offshore developers and provided 24x7 on-call support when required
Environment: Information Server/DataStage 8.5/8.1, BODS, QualityStage, SAP ECC 6.0, Hortonworks suite of tools, Datameer 2.1.5, MongoDB, Oracle 9i/11g, UNIX, Windows XP, Citrix, Remedy, HP Quality Center, Microsoft Access, eRoom
Confidential, TX
ETL Architect
Responsibilities:
- Involved in IT strategies and practices for client engagement, project methodology, architectural governance, enterprise documentation, infrastructure migration, development team evolution, and IT support.
- Stabilized ad hoc support operations by clearly defining the boundaries and responsibilities of project and operational teams, while progressively injecting fitness checks earlier into the project life cycle
- Involved in requirements gathering, analysis, design, development, test and implementation.
- Responsibilities included migrating 5,000 jobs (32 subject areas) from DataStage 7.5 to DataStage 8.1
- Worked on requirements to exclude or mask any business information outside of the JB network
- Involved in the design of the project for compensation received by flight customers (HELIX)
- Migrated data to and from external vendor systems and internal databases via FTP and SFTP
- Worked with the reporting team to identify the Universe structure, helped identify shared dimensions to reduce data redundancy, and used Teradata load utilities such as FastLoad and MultiLoad
- Documented the workflow for knowledge transfer, using documentation standards suited to a variety of audiences
- Provided on-call support to enable 24x7 coverage and coordinated a team of seven offshore resources
Environment: Information Server 8.1.1, Informatica PowerCenter 9.1, Citrix, Teradata, DB2, Oracle 9i/11g, UNIX, Red Hat Linux, Windows Server 2003, Netezza, Tivoli Scheduler, BTEQ, MultiLoad, TPump, FastLoad, FastExport, Teradata Manager
Confidential
Senior ETL/Data Architect
Responsibilities:
- Worked with the State of Delaware team to improve the performance of the DataStage environment
- Identified and prioritized issues for resolution
- Responsibilities included project planning and upgrading DataStage 7.5 to 8.1 with web services
- Made recommendations to improve the performance of the DataStage environment
- Documented the entire installation process with step-by-step screenshots and gave a presentation on the new version of the tool and its components
- Installed patches and updates as needed, created projects in all environments, troubleshot system issues, and recommended solutions
- Involved in data conversion (contacts, leads, accounts, splits, products, quotes, etc.) from the CRM/Siebel system to the warehouse and Siebel data mart
- Set up security, assigned roles and project permissions, and provided on-call support to enable 24x7 coverage
Environment: SAP ECC, Windows, Oracle, SSIS, SQL Server, DataStage 8.1/7.5, LSMW, Siebel On Premise, CRM On Demand application, AIX 6.1
Confidential
Data Conversion Lead
Responsibilities:
- Involved in DataStage migration from 7.5 to 8.1 (with QualityStage)
- Coordinated with the Master Data and Legacy Data teams to create design standards and technical specifications for DataStage jobs; scheduled the jobs through UNIX shell scripts using Control-M
- Extensively used the ABAP Extract stage, along with LSMW, to extract data from the SAP R/3 repository to files
- Led a team of 3 on-site and offshore developers
- Created job sequences to control networks of jobs and handle load dependencies
- Used AddressDoctor, a third-party tool, to format addresses into a standard layout
- Developed user-defined routines and transforms in Universe Basic to cleanse the data
- Extensively used job parameters in the developed jobs to support job automation
- Responsible for migrating projects between multiple environments and for production support
Environment: SAP ECC, SAP R/3, Control-M, Quality Center, UNIX, Windows, Oracle, SQL Server, DB2, DataStage 8.0/QualityStage, Shell Scripts, Erwin
Confidential, Salinas, CA
ETL Developer
Responsibilities:
- Reviewed various Analysis and Design documents.
- Developed and executed effective test plans to ensure quality & stability
- Extensively used DataStage Designer to develop parallel jobs that extract, cleanse, and transform data into enterprise data warehouse tables, along with Teradata utilities such as FastLoad, MultiLoad, FastExport, and BTEQ
- Implemented various performance tuning techniques for improving the performance of the application.
- Worked extensively with the release team to plan and execute releases, and was involved in production support
Environment: UNIX, MVS Mainframes, COBOL, JCL, Control-M, Quality Center, Windows, Teradata, DB2, Oracle 10g, SQL, Shell Scripts, DataStage 8.0, QualityStage 7.5, BTEQ, MultiLoad, TPump, FastLoad, FastExport, Teradata Manager, PMON
Confidential, Hartford, CT
ETL Developer
Responsibilities:
- Worked extensively in an EDW environment and was involved in logical and physical modeling
- Built DataStage jobs to FTP data files for loading into the data warehouse
- Performed unit testing and developed test cases for various complex input data scenarios
- Actively participated in collecting enterprise module design requirements and discussions
Environment: UNIX, MVS Mainframes, COBOL, JCL, Windows, Teradata, Oracle 9i, SQL, Shell Scripts, Erwin, Ascential DataStage 7.5.2, Ascential DataStage Server Edition