
Data Analytics & Machine Learning Solutions Architect Resume


SUMMARY

  • 30 years' experience successfully delivering clear, simple solutions to very complex data management challenges as a consultant leading fixed-length data management engagements, typically staffed by a mix of onshore and offshore resources.
  • Focused on delivering answers to executives' most complex business questions using a mix of analytics, data science, machine learning and natural language processing (NLP) applied to large volumes of unstructured data collected from many sources.

TECHNICAL SKILLS

  • Data Science • Machine Learning • Python • Anaconda • Computational Statistics • Enterprise Information Architecture
  • AWS, Azure & Google Compute Cloud Servers • ClearNLP • NLTK • Ontology and RDF Design (OWL, Turtle, N-Triple, etc.)
  • GDPR • TOGAF • Data Vault Architecture • Master Data MDM/CDM • R Studio • Neo4j • Graph Databases • Tableau

PROFESSIONAL EXPERIENCE

Data Analytics & Machine Learning Solutions Architect

Confidential

Responsibilities:

  • Using workflows compliant with the EU's General Data Protection Regulation (GDPR) privacy laws, raw datasets were cataloged and received into a Hadoop 2.6.0 data lake, then loaded into an architecture based on Data Vault structures as shown here: confidential
  • Designed and deployed data science infrastructure that ingested (via Kafka/Oozie), parsed, classified and indexed unstructured content from public government records using Python, BeautifulSoup, the Natural Language Toolkit (NLTK), Neo4j, MySQL and Logstash into Elasticsearch/Kibana (see the ingestion sketch after this list).
  • Master Data Management (MDM) was performed using the open-source OpenRefine (formerly Google Refine) software.
  • Converted all of the client's legacy relational databases into Resource Description Framework (RDF) records for use with semantic browsers (Marbles, Fluent Editor, etc.) using D2R Server. The D2R Server tutorial I recorded is now available on my YouTube channel at: confidential
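
A minimal sketch of the parse-and-index step above, assuming a local Elasticsearch instance; the URL, the public_records index name and the field names are illustrative assumptions, not the client's actual configuration:

```python
# Sketch: fetch a public-records page, strip it to text with BeautifulSoup,
# tokenize with NLTK, and index the result into Elasticsearch.
# The URL, index name, and field names are illustrative assumptions.
import nltk
import requests
from bs4 import BeautifulSoup
from elasticsearch import Elasticsearch

nltk.download("punkt", quiet=True)

def ingest(url: str, es: Elasticsearch) -> None:
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    doc = {
        "source_url": url,
        "body": text,
        "tokens": nltk.word_tokenize(text),  # feeds downstream NLP steps
    }
    es.index(index="public_records", document=doc)  # elasticsearch-py 8.x signature

es = Elasticsearch("http://localhost:9200")
ingest("https://example.gov/records/123", es)
```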

Confidential, New York

Data Analytics & Machine Learning Solutions Architect

Responsibilities:

  • Designed and deployed a new Data Science infrastructure and led an India-based development team. The new system successfully delivers Natural Language Processing (NLP) capabilities for ingesting, parsing, classifying and indexing unstructured content from legal documents.
  • Ingestion of the ‘raw’ documents uses a Data Vault Hub, Link, Satellite approach, with storage in Cassandra. k-Nearest Neighbor (kNN) and Support Vector Machine (SVM) classification was done using Python Anaconda’s Natural Language Toolkit (NLTK); a classification sketch follows this list. Tableau and R Studio were used for computational statistics and visualizations.
  • Relationship visualizations were done using Neo4j. Metadata management, data stewardship and automatic generation of Neo4j Cypher statements were done using the Chameleon Metadata approach (a Cypher-generation sketch also follows).
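
A minimal sketch of the SVM/kNN classification step, using NLTK's scikit-learn wrapper over bag-of-words features; the training documents and labels are invented for illustration:

```python
# Sketch: SVM and kNN document classification through NLTK's scikit-learn
# wrapper. Training documents and labels are invented for illustration.
import nltk
from nltk.classify import SklearnClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

nltk.download("punkt", quiet=True)

def features(text: str) -> dict:
    # Simple bag-of-words feature set
    return {tok.lower(): True for tok in nltk.word_tokenize(text)}

train = [
    (features("The parties agree to binding arbitration."), "contract"),
    (features("Plaintiff moves for summary judgment."), "motion"),
    (features("This lease commences on the first of the month."), "contract"),
    (features("Defendant requests dismissal of all claims."), "motion"),
]

svm = SklearnClassifier(LinearSVC()).train(train)
knn = SklearnClassifier(KNeighborsClassifier(n_neighbors=3)).train(train)

doc = features("The parties agree the lease term is twelve months.")
print(svm.classify(doc), knn.classify(doc))
```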
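And a minimal sketch of metadata-driven Cypher generation; the metadata layout and node label are hypothetical stand-ins, not the Chameleon Metadata format itself:

```python
# Sketch: generate a Neo4j Cypher MERGE statement from a metadata record.
# The metadata layout and node label are hypothetical stand-ins.
def to_cypher(meta: dict) -> str:
    props = ", ".join(f"{k}: ${k}" for k in meta["properties"])
    return f"MERGE (n:{meta['label']} {{{props}}})"

meta = {"label": "Document", "properties": ["doc_id", "title"]}
print(to_cypher(meta))  # MERGE (n:Document {doc_id: $doc_id, title: $title})
```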

Confidential, Massachusetts

Data Science & Master Data Management Solutions Architect

Responsibilities:

  • This project delivered, in just five months, an open-source alternative to the client's planned deployment of an $8M Oracle MDM Suite (license cost alone, excluding maintenance). Total software cost for my solution was $2,350.
  • Google Refine was used for data clustering and cleanup. MonkeyLearn APIs were used for legal-entity extraction, location extraction and content sentiment analysis. A Data Vault ETL staging area ensures complete auditability.
  • Non-technical users are now able to explore enterprise data through a polyglot persistence approach, with Neo4j and MySQL as the graph (DAG) and relational databases (see the query sketch after this list). Unstructured data was stored in Cassandra for “small record, rapid arrival” data and in Hadoop for “large record, slow arrival” data.
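
A minimal sketch of the polyglot read path, following relationships in Neo4j and then fetching row details from MySQL; connection settings, labels and table names are all assumptions, not the client's actual schema:

```python
# Sketch: polyglot read path -- follow relationships in Neo4j, then fetch row
# details from MySQL. Connection settings, labels, and table names are all
# assumptions, not the client's actual schema.
import pymysql
from neo4j import GraphDatabase

graph = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
rdb = pymysql.connect(host="localhost", user="app", password="secret", database="mdm")

with graph.session() as session:
    result = session.run(
        "MATCH (c:Customer {id: $id})-[:OWNS]->(a:Account) RETURN a.id AS acct",
        id="C-1001",
    )
    account_ids = [record["acct"] for record in result]

with rdb.cursor() as cur:
    for acct in account_ids:
        cur.execute("SELECT id, balance FROM accounts WHERE id = %s", (acct,))
        print(cur.fetchone())
```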

Confidential

Lead Integration Architect

Responsibilities:

  • Designed the information integration architecture for the John Hancock subsidiary as Phase I of Manulife's initial Master Data Management effort for Party data across several lines of business on IBM’s MDM Server (MDMS) V11.
  • Managed a Malaysia-based ETL team of Manulife employees developing Informatica PowerCenter workflows.
  • Designed and modeled a ‘Heavy Onboarding Footprint’ ETL ecosystem using a Data Vault data model to align incoming data to the IBM/MDMS RDF definitions (see the RDF mapping sketch after this list). The graph database nodes, relationships, properties and OWL ontologies were documented using the IHMC CmapTools Knowledge Modeling Kit.
  • A second, ‘post-MDM’ RDF feed was used to populate Hadoop 1.2.1, graph databases (Neo4j) and NoSQL stores.
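
A minimal sketch of aligning one relational Party row to RDF with rdflib; the namespace, predicates and row contents are placeholders, not the actual IBM/MDMS RDF definitions:

```python
# Sketch: map one relational Party row to RDF triples with rdflib. The
# namespace, predicates, and row contents are placeholders, not the
# actual IBM/MDMS RDF definitions.
from rdflib import Graph, Literal, Namespace, RDF

MDM = Namespace("http://example.com/mdm/party#")
g = Graph()
g.bind("mdm", MDM)

row = {"party_id": "P-42", "name": "Acme Holdings", "type": "Organization"}

subject = MDM[row["party_id"]]
g.add((subject, RDF.type, MDM[row["type"]]))
g.add((subject, MDM.name, Literal(row["name"])))

print(g.serialize(format="turtle"))
```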

Confidential, New Jersey

Analytics Specialist

Responsibilities:

  • Provided design enhancement recommendations for an analytics environment staging via Data Vault stores (i.e., Hubs, Links and Satellites) atop Teradata; a Hub hash-key sketch follows this list.
  • Performed a data quality and analytics assessment focused on: Financial and Compliance Services; Risk Management and Compliance; Product Development; and Global Account Management.
  • The technical landscape included SAP (ECC, SD, CRM, BP, BI/BOBJ and BW), MicroStrategy reporting, and Microsoft TFS for service design and delivery management.
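
For context, a common Data Vault convention keys each Hub row on a hash of the normalized business key; the sketch below shows one way to compute it, with an illustrative column layout:

```python
# Sketch: a common Data Vault convention keys each Hub row on a hash of the
# normalized business key. Column layout is illustrative only.
import hashlib
from datetime import datetime, timezone

def hub_hash_key(business_key: str) -> str:
    # Trim and uppercase first, so variant spellings land on the same Hub row
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

hub_row = {
    "hub_customer_hk": hub_hash_key("  cust-1001 "),
    "customer_bk": "CUST-1001",
    "load_dts": datetime.now(timezone.utc).isoformat(),
    "record_source": "SAP_ECC",
}
print(hub_row)
```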

Confidential, New York

Hadoop Big Data Architect

Responsibilities:

  • Captured existing business processes, lineage and metadata as RDF triples (i.e., Subject-Predicate-Object) for the new Hadoop 0.23.1, HDFS, Hive and HBase data stores.
  • Designed object-oriented RDF using CmapTools 5.0.03 to segment incoming source data, with a UDEF-based RDF Schema aligned to W3C XSD 1.1 Part 2 datatypes for organizing captured domains and ranges and linking to source-system URIs.
  • Guided IDC subject matter experts through defining and documenting business processes and required execution sequences, using Directed Acyclic Graphs expressed as RDF triples to represent the lifecycles of IDC information-based products from data vendor to ultimate consumer, including compliance with source data vendor contract agreements (see the DAG sketch after this list).
  • Designed the Exchange-to-Hadoop Business, Information and Technical architectures for any data lifecycle.
  • Standardized the valid Linked Data value pairs and their relationships to URIs and business processes via object-relational mappings, associating them with available OAGIS GS1 Global Product Classification (GPC) Business Object Documents (BODs) and, where possible, linking ISO 10383 identifiers for source data vendors.
  • Created a Corporate Product Information Ontology using the Florida Institute for Human & Machine Cognition (IHMC) CmapTools Knowledge Modeling Kit.
  • Created TOGAF-based workflow management, role-based product entitlements and task-level audit metrics to capture data product information management (PIM) knowledge.
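
A minimal sketch of the DAG-as-triples idea above, with illustrative step names: lifecycle edges are kept as Subject-Predicate-Object tuples and the resulting process graph is verified acyclic before a valid execution sequence is derived:

```python
# Sketch: keep lifecycle edges as Subject-Predicate-Object triples and verify
# the resulting process graph is acyclic. Step names are illustrative.
import networkx as nx

triples = [
    ("VendorFeed", "precedes", "Normalize"),
    ("Normalize", "precedes", "Entitle"),
    ("Entitle", "precedes", "Distribute"),
    ("Distribute", "precedes", "Consumer"),
]

g = nx.DiGraph((s, o) for s, p, o in triples if p == "precedes")
assert nx.is_directed_acyclic_graph(g), "lifecycle must be a DAG"
print(list(nx.topological_sort(g)))  # valid execution sequence
```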

Confidential, Ohio

Master Data Architect

Responsibilities:

  • Designed a reusable-process strategy to increase the predictability and efficiency of the Production (Conceive/Design/Produce/Deploy/Service) and Customer-interaction (Campaign/Order/Cash/Care) lifecycles.
  • Designed conceptual and logical canonical product information master (PIM) models.
  • Designed the business process architecture using Oracle’s AIA PIM Hub and the Siperian UCM tool, along with the roadmap for migrating canonical PARTY/CUSTOMER data from legacy Oracle Trading Community Architecture (TCA) to Oracle AIA.
  • Deployed proof-of-concept MDM systems (profiling: Trillium; metadata: CA Repository; ETL: IBM DataStage).
  • Led senior business and IT executives through the identification, consensus building and project planning phases of this MDM initiative under standards of the Capability Maturity Model (CMM).
