Lead Data Architect Resume
Lead Data Architect Bedminster, NJ
SUMMARY:
- Seasoned data architect with over 25 years of international experience in the commercial, financial, government and telecom sectors with top-notch system integrators such as Computer Sciences Corporation, LogicaCMG, Spectrum Technology Group, and General Research Corporation. My accomplishments include hands-on experience in the design and delivery of cost-effective, high-performance and highly available information technology infrastructures and applications with emphasis on data architecture, data warehouse design, big data, analytics and cloud computing.
TECHNICAL SKILLS:
Data Modeling: Enterprise, Logical, & Physical Data Modeling (EDW, OLTP, OLAP, ODS, Metadata Mgmt), ERwin, PowerDesigner, Embarcadero ER/Studio, Oracle Designer, SQLWorkbench
Database Management: Oracle Enterprise Manager Tools, TOAD, DB Artisan, SQL Server Management Studio
Master Data Management: Siperian Hub XU, Siebel UCM
Enterprise Architecture: TOGAF 9, Zachman Framework, System Architect
Database Languages: SQL, PL/SQL, T-SQL, Pig, Hive
Data Processing Systems: Oracle, MS SQL Server, MySQL, Azure SQL Data Warehouse, Azure Cosmos DB, Azure HDInsight (Hadoop, Hive, Pig, Oozie, Sqoop, Spark, Kafka)
Data Mining: SQL Server Reporting/Analysis Services (SSRS/SSAS), R, Power BI, Cognos, Crystal Reports, Business Objects
Data Integration: SSIS/DTS, DataStage, Informatica, PL/SQL, Azure Data Factory
PROFESSIONAL EXPERIENCE:
Confidential
Lead Data Architect
Responsibilities:
- Architected and deployed an Azure HDInsight Hadoop cluster (ver 2.7.3, 30 data nodes, 120 cores, 6 TB SSD storage) to process in excess of 148M encounter attendance records and associated data. The Hadoop cluster was refreshed using an Oozie-coordinated workflow that invoked Sqoop and optimized Hive scripts. Client applications connected to the Hadoop cluster through Hive ODBC/HiveQL.
- Developed a multiple linear regression model in R to analyze the correlation between standardized test scores and socio-economic indicators as a Capstone project for the Harvard Data Science Program. Final grade: 96%.
- Developed a pilot implementation of the Encounter Attendance dimensional model in the Azure Cloud using Azure Data Factory and Azure SQL Data Warehouse.
- Standardizing infrastructure on MS SQL Server products, resulting in savings exceeding $1M annually in licensing and support fees. This effort entailed generating an ERwin r9.5 data model, data dictionary and data lineage documentation for the target Operational Data Store (ODS) encompassing 8 data domains: Referral, Evaluation, Individualized Education Program (IEP), Placement, Student, Staff, Location and Encounter Attendance. Over 100 DataStage 8 ETL packages were migrated to SSIS utilizing Visual Studio 2013.
- Implementing the Special Education Data Warehouse to support several mandated reports, such as the Mayor’s Management Reports and New York City Council reports, that detail compliance metrics of Special Education in New York City public schools.
- Developing multiple data feeds to support critical NYC DOE functionality such as Medicaid reimbursement for encounter attendance service records, Office of Pupil Transport (OPT) pre-school transportation, and Charter school feeds.
- Leading the SESIS Master Data Management (MDM) initiative to promote a shared foundation of common data definitions within the DOE complying with the Common Education Data Standard (CEDS).
Chief Data Architect
Confidential, Bedminster, NJ
Responsibilities:
- Leading the development of the Special Education ODS that reconciled data entities from many legacy DOE systems (ATS, CAP, TIENET, HRHUB, LCGMS, and SEC) and propagated data to various business stakeholders within the DOE.
- Modeling the ODS logical data models in both ERwin r7 and ER/Studio and instantiating optimized Oracle 10g/11g DDL to handle multi-million row tables and efficient data access.
- Generating and validating source-to-target mapping documents that facilitated the design of DataStage jobs to populate the ODS from various DOE source systems mentioned above.
- Implementing data profiling, validation and remediation processes that detected anomalies across multiple systems.
- Leading the design and development of a real-time student update service that utilized an IBM Integration Bus (IIB) v10 to capture real-time updates from ATS.
- Participating in architectural design, capacity planning and failover procedures for the underlying Oracle 11g database infrastructure.
- Designing a complete reconciliation process for a master data management environment based on Siperian Hub XU SP1. This activity entailed designing the base object model, identifying source systems and address cleansing functions between landing and staging tables, and optimizing match and merge criteria for base object identification and survivorship criteria.
- Generating dimensional models that were based on end-user requirements and augmented with data profiling and validation processes using ER/Studio.
- Generating physical DDL from ER/Studio to instantiate a corresponding Oracle 10g database to store multi-million row fact and dimension tables.
- Authoring source-to-target mapping documents that facilitated the design of DataStage jobs to populate the EDW.
- Identified performance bottlenecks and optimized data access by clients (Cognos) utilizing the TOAD Explain Plan and SGA Trace facilities.
- SQL Server Upgrade: Performed an in-place upgrade for various servers in the Pilot and QA environments. To ensure backward compatibility, the database compatibility level was initially set to 80. Utilized the output of the Upgrade Advisor to update stored procedures, indexed views and other database objects to compatibility level 90.
- Replication tuning: Reduced the overall latency of the system by optimizing the publisher, distributor and subscriber configurations.
Confidential
Lead Data Architect
Responsibilities:
- Relational Modeling: Generation of Logical & Physical Data models for the 17 components of POAP through the identification of business requirements and the analysis of Hibernate object-relational mappings of Java data structures (POJOs and JavaBeans). ERwin r7 was utilized to generate a relational DDL that was instantiated in an Oracle 10g database.
- Capacity Planning and H/W Configuration: Specified a highly available infrastructure that consisted of a clustered Oracle 10g database (RAC) and a physical standby database (Data Guard) for a distributed environment that spanned data centers from Los Angeles to Staten Island.
- Customer Data Integration (CDI) Project: Created a unified representation of customers that integrated data elements from extended credit information, sales transactions and interview information. An integrated customer data model was created in ERwin and instantiated in an Oracle 10g database. Data quality for information received from various telemarketing vendors (in excess of 4 million interviews and 26 million responses) was analyzed against pre-defined constraints.
- Data warehouse implementation: Designed and implemented a data warehouse of US household information that exceeded 220 million records, augmented with ethnic and credit information. Data migration incorporated ETL processes that utilized SQL*Loader (direct mode) and PL/SQL.
- Customer Classification: A database of more than 2.7 million interviewed individuals, with tens of attributes, was loaded into SQL Server 2005 to further classify potential leads for more expensive live operator interviews. The Microsoft Decision Trees algorithm was utilized to classify these customers and generate a lead profile.
Confidential
Lead Data Architect
Responsibilities:
- Relational & Dimensional Modeling: Generation of Logical & Physical Data models for the Staging and Enterprise data warehouse utilizing ERwin r7. A dimensional data model for the Enterprise data warehouse was generated with five fact tables at different levels of aggregation. In addition, source-to-target mappings were generated to outline data lineage. The physical data models were generated and optimized for SQL Server 2005 and included a partitioning scheme.
- Near Real-Time ETL Process Design: SSIS was utilized to develop a near-real-time ETL process that polled the production system every 5 minutes for specific events. Based on the event, the resultant data was loaded into the real-time partition of the EDW.
- Capacity Planning: Various use cases and scenarios were developed to determine critical resource capacities that are needed (such as server processors, memory, internal disk space, SAN connectivity and scalability) within the infrastructure to meet forecasted workloads.
Telelogic ME Partner
Confidential
Responsibilities:
- Creation of an enterprise ODS that captured the topology of the STC data network, which consisted of element management systems from five main vendors (Alcatel, Lucent, Nortel, Cisco and Tellabs). Shell and SQL scripts against the underlying Oracle, Informix or Sybase databases were utilized to extract source data. The final topology was loaded using Xperload, Xpercom’s data loading facility, into an Oracle 9i database.
- Creation of a data reconciliation routine that automatically refreshes the enterprise ODS to reflect any changes in the underlying network configuration and generates appropriate discrepancy and activity reports.
- Re-engineered the fulfillment and assurance business processes in accordance with the enhanced Telecom Operations Map (eTOM) standard.
Enterprise Architect
Confidential
Responsibilities:
- Masriya (Egyptian PTT): Led the Customer Data Integration (CDI) project by designing a data cleansing and enrichment process that analyzed, enriched, and clustered data received from the NCR billing system by extrapolating attributes from alternate information sources, with the goal of achieving a coherent and correct view of the customer. The resultant customer database was instantiated in Oracle 8i.
- Data modeling and generation of a relational database schema targeted to Oracle 8i using ERwin 3.5.2.
- Tuned data access by analyzing SQL statements and the object-relational mappings generated by TOPLink between Java data structures (EJB entity beans) and the physical DDL generated by ERwin.
- Database administration functions such as backup/recovery, SQL tuning, user maintenance, and optimizing database transactional parameters (e.g. isolation levels and locking schemes).
- Data conversion and migration procedures that converted legacy data into the schema utilized by our product. Data Transformation Services (DTS) of MS SQL Server 7 was utilized for this activity.
- Authoring a document type definition (DTD) for the XML document that described the system configuration.
Principal Analyst, General Research Corporation, New Jersey
- Global Combat Support System - Air Force (GCSS-AF): Formulated the architecture specification for the Seamless Supply Initiative encompassing the technical, system and operational architectures. The architecture specification complied with Defense Information Systems Agency's Common Infrastructure Environment/Common Operating Environment (DISA CIE/COE) specifications and was modeled using the Rational Rose tool set.
- Material Management Standard System Integration (MMSSI): Developed an enterprise data model covering the business areas of Asset Management, Requirements Determination, and Supply and Technical Data, based on the implementations of the Configuration Management Information System (CMIS 4.0 & CMIS 5.0) and the Provisioning and Cataloging Technical Support System (PCTSS).
- Command, Control and Communication (C3) Database Management Strategy: Analyzed the informational requirements and compliance with respect to the Department of Defense Intelligence Information System (DODIIS) Profile. This included reviewing ANSI/SQL92, DCE (X.500, X.400), ANSI X3H2, and the Government Open System Interconnection Profile (GOSIP).
- Global Directory/Dictionary (GDD): Designed and implemented the Global Directory/Dictionary (GDD) for the Global Database Management System (GDMS). The GDD design is based upon OSI X.500 standards and entailed specifying the fragmentation/allocation, consistency control, role-based access control, data mappings and maintenance policies of the GDMS.
