Lead Big Data Architect - Engineer / Lead Cloud Architect - Engineer / Data Scientist Resume
Jacksonville, FL
WORK HISTORY:
Confidential, Jacksonville, FL
Lead Big Data Architect - Engineer / Lead Cloud Architect-Engineer / Data Scientist
Responsibilities:
- Design/Plan/Architect Pivotal Big Data Suite roadmap including use of the HDP / HDF Hadoop technology stack
- Design/Plan/Architect data ingestion strategy from 400+ data sources into (HDP) Hadoop Data Lake
- Design/Plan/Architect ETL strategies for real-time data pipeline ingestion to (HDP/HDF) Hadoop Data Lake
- Design/Plan/Architect Storm, Kafka and Spark architecture included in HDP/HDF real-time data solutions (see the Kafka-to-Spark streaming sketch after this list)
- Data Discovery, Data Profiling, Predictive modelling, Machine Learning, R & Python development
- Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
- Led the implementation, design (one full lifecycle) of Master Data Management (MDM) using MuleSoft / Talend.
- Proven "hands-on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
- Phased implementation leveraging best practices and a strong focus on data quality
- Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
- Proficient using Talend, Profisee Maestro, Informatica Siperian and IBM InfoSphere MDM, DQ & DG tools.
- R & Python with comprehensive proficiency; Scala - Architecture & Collection Library, REPL, Scaladoc, Reflection, Macros
- Deployment of Hadoop and Spark ecosystems.
- Using Erwin for logical and physical database design, database optimization, loading strategy design and implementation, conducting business analysis, event modeling & using knowledge of standard commercial databases (Oracle, Teradata, DB2).
- Working in Big Data and microservices technologies such as Hadoop, MapReduce frameworks, Cassandra, Kafka, Spark, HBase, Hive, Spring Boot, Node.js, etc.
- Developing database solutions by designing the proposed system; defining database physical structure, functional capabilities, security, backup, and recovery specifications; and providing database support by coding utilities, responding to user queries and troubleshooting issues
- Interacting and collaborating with cross functional teams including application development, peer reviews, testing, operations, security and compliance and project management office, as well as business customers and external vendors.
- Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / Singa / H2O / Spark MLlib
- Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
- Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
- Detailed understanding of machine learning pipelines and ability to discuss concepts such as feature discovery/engineering, model evaluation/validation, online vs. offline learning, and model deployment.
- Create predictive and clustering models utilizing Oracle, SQL Server and HDFS data sources (see the clustering sketch after this list)
- Define when predictive or clustering models could be utilized, and the type of data required to make them insightful
- Develop, extract and maintain logical and physical data models for data analytics within Direct Energy
- Enhancing data collection procedures to include information that is relevant for building analytic systems
- Data mining using state-of-the-art methods to produce actionable insight
- Selecting features, building and optimizing classifiers using machine learning techniques
- Design and develop predictive models and machine learning algorithms using advanced methodologies
- Architect - AWS, AWS RDS, AWS Data Warehouse, AWS Redshift & AWS Storage solutions
- Architect - EC2, S3, CloudFormation, RDS, CloudFront, VPC, Route53, IAM, CloudWatch, Beanstalk, Lambda
- Architect - Design, build and implement high-volume, high-scale data analytics and machine learning Snowflake solutions
- Architect - Data Migrations from Oracle, SQL Server and Hadoop (Hive / HDFS) to Snowflake databases (see the Snowflake load sketch after this list)
- Engineer - Snowflake Data Warehouse - Analyze and performance-tune the query processing engine within the Snowflake DW.
- Engineer - Snowflake Data Warehouse - Data Migration Strategy from On-Prem to the Snowflake DW solution - Ingestion Plan
- Engineer - Deploy cloud infrastructure (Security Groups and load balancers needed to support EBS environment)
- Engineer - Create and manage TFS continuous integration builds on VSTS
- Engineer - Responsible for maintaining AWS instances as part of EBS deployment
- Engineer - Systems administration with Windows / Unix scripting
- Engineer - Support on AWS services and DevOps deploying applications
- Architect - Technical / Solution SME within the Data Integration across on-premise and AWS data sources / applications
- Architect - MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) - UDF and UDAF requirements
- Architect - Talend Data Fabric through Spark and AWS EMR for Big Data Batch Jobs - UDF and UDAF requirements
- Architect - Azure Data Factory, Data Pipeline Design, Azure Data Lake / Azure Storage - Oracle, DB2, SQL Server, MySQL
- Engineer - Azure Data Factory, Data Pipeline Development - SQL, SSIS, PowerShell and ETL scripting
- Engineer – Azure Data Flow, Data Modeling in Azure, and Azure Ad Hoc Reporting (design / development)
- Architect – ETL from AWS to Google Cloud to Azure and from/to other On-Prem data sources / targets.
- Architect – Google Cloud Platform utilizing the Data Analytics, Data Stream Analytics, Hadoop, Data Lake and BI toolset
- Engineer – Google Cloud Platform Data ingestion, Analytics datasets, data lake integration, data migration to Google Cloud
- Engineer – Google Cloud Platform to Kafka and Spark cluster solutions, Google Cloud Platform to Azure via HDFS/Hive
- Architect – Google BigQuery for use cases where other Hadoop solutions didn’t provide the results needed by the business.
- Engineer – Develop / Design data patterns via microservices into data pipelines across the Azure Technology Stack.
- Architect & Administrator (AWS) GemFire XD, PostgreSQL, Greenplum, HAWQ & Kafka environments; Go (Golang) programming
- Architect & Administrator (Azure) Azure SQL DB, Hadoop, Hadoop Spark w/ NoSQL (Mongo, Cassandra & Couchbase)
- Architect – Salesforce Data Extractions into Azure Data Lake & Cosmos DB / MuleSoft Microsoft Service Bus Connector
- Architect – MuleSoft Anypoint Platform to Azure API – Data Ingestion / Data Services from/to Salesforce, SAP, Databricks
- Architect – Designed / Developed Data Migration Strategy from On Premise to Cloud (SQL & NoSQL Technology)
- Architect & Administrator (Google Cloud) Hadoop, MongoDB, Couchbase, Hbase, PostgreSQL, Cassandra (Spark / Storm)
- Architect - Callidus Cloud w/ SAP HANA; ETL Sales Data via Kafka to Data Lake (Hadoop); Data Visualizations
- Architect – Callidus Cloud integration to enterprise data stores via both SAP data, non-SAP data and Master Data Management
- Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
- Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
- Engineer – NetApp Data Fabric architecture for UDF and UDAF deployments
- Engineer - Azure Data Bricks, Azure Data Lake Service, Azure SQL Data Warehouse, Azure Data Catalog
- Engineer - Technical / Solution SME within the Data Integration of Azure, Blob Storage, Log Analytics
- Architect & Administrator (Azure) Cosmos DB – Schema Design, Data ingestion, Performance and Query optimization
- Oracle GoldenGate / Oracle Data Integrator / Hive / PostgreSQL data integration design & configuration
- Oracle 12c Enterprise Metadata Management installation and deployment
- Data Integration with MuleSoft EBS, JMS Transport - TIBCO Suite (EMS) for EAI, SOA and BPM
- MuleSoft integration patterns – Migration, Broadcast, Aggregation, Bidirectional Sync, Correlation.
- MuleSoft Anypoint Platform, Connectors / Transports, Enterprise Service Bus, Integration Services
- Installation and Deployment – Oracle Enterprise Metadata Management for Hadoop / PostgreSQL Data Store
- Implement Talend Data Integration- Reading an input file, transforming data, combining columns, Joining data sources
- Implement Talend Data Integration- Creating database metadata, Joining data, Master Data Management model design
- Define Cloud Native approach (microservices, containers/Docker, Agile, behavior-driven development (BDD))
- Define data ingestion strategies; Kafka, Storm, NiFi, Zookeeper, Oozie, Sqoop – Lambda Architecture
- Cloud to/from On-Prem – “Apache” - Kafka, NiFi, Storm, Flume, Sqoop, Samza, Chukwa
- Cloud to/from On-Prem - Wavefront, DataTorrent, Amazon Kinesis, Syncsort, Gobblin, Fluentd, Cloudera Morphlines
- Cloud to/from On-Prem - White Elephant, Heka, Scribe, Databus
- Tools were tested, compared and benchmarked via POC and performance testing to determine the ingestion strategy / tool outcome
- Delta Architecture Ingestion - Acquisition -> Sqoop, Flume, Python; Messaging -> Kafka, Pulsar
- Delta Architecture Ingestion - Stateful -> Flink; Query / Processing / Lambda -> Hadoop, MapReduce, Hive, NoSQL
- Neo4j - Graph Data – Query and analyze highly connected data; Native Graph Storage, Native Graph Processing
- Neo4j - Graph scalability, high availability, Graph Clustering – Graphs on Spark, Graphs in Azure Cloud Graph Platform
- HBase - Cluster design and management, including backup/recovery, replication, cluster failover, and disaster recovery
- HBase – Large structured and unstructured datasets from multiple data sources – pipeline to Hadoop / Kafka clusters
- Couchbase - 5.0/4.6.x (15 Node/Cluster) Document Data modeling, Cluster Management w/ Hadoop HDF
- Couchbase - Node Configuration, Data Conversion to JSON, Hadoop/HDP
- Cassandra - DataStax (25 Node/Cluster) Transaction data with replicas across 3 data centers
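
The Kafka/Spark bullets above describe real-time ingestion into the Hadoop Data Lake. A minimal PySpark Structured Streaming sketch of that pattern, assuming a hypothetical events topic, broker address, event schema, and HDFS landing paths (none taken from the engagement itself):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, TimestampType

    # Hypothetical schema for an ingested event; real feeds would differ.
    schema = (StructType()
              .add("source_id", StringType())
              .add("payload", StringType())
              .add("event_time", TimestampType()))

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Read the Kafka topic as a stream; broker address and topic are placeholders.
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events")
           .load())

    # Kafka delivers bytes; decode the value column and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("e"))
              .select("e.*"))

    # Land the parsed stream in the data lake as Parquet (HDFS paths illustrative).
    query = (parsed.writeStream.format("parquet")
             .option("path", "hdfs:///data/lake/events")
             .option("checkpointLocation", "hdfs:///chk/events")
             .start())
    query.awaitTermination()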
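For the predictive/clustering modeling bullets, a minimal clustering sketch using scikit-learn; the input file, feature names, and k=5 are illustrative assumptions, not details from the project:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Hypothetical feature extract; in practice this would come from the
    # Oracle, SQL Server, or HDFS sources described above.
    df = pd.read_parquet("customer_features.parquet")
    features = df[["usage_kwh", "tenure_months", "avg_bill"]]

    # Standardize so no single feature dominates the distance metric.
    X = StandardScaler().fit_transform(features)

    # Fit k-means with an assumed k=5; in practice k is chosen via elbow/silhouette.
    model = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
    df["segment"] = model.labels_
    print(df.groupby("segment")[["usage_kwh", "avg_bill"]].mean())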
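The Snowflake migration bullets center on staging extracted files and bulk-loading them. A minimal sketch with the Snowflake Python connector, assuming hypothetical connection parameters and an ORDERS table stage:

    import snowflake.connector

    # Connection parameters are placeholders.
    conn = snowflake.connector.connect(
        account="myaccount", user="etl_user", password="...",
        warehouse="LOAD_WH", database="ANALYTICS", schema="STAGING")

    cur = conn.cursor()
    # Stage files exported from the source system (e.g., Oracle/Hive extracts).
    cur.execute("PUT file:///exports/orders_*.csv @%ORDERS")
    # Bulk-load the staged files; COPY INTO is Snowflake's ingestion workhorse.
    cur.execute("""
        COPY INTO ORDERS
        FROM @%ORDERS
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    """)
    cur.close()
    conn.close()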
Confidential, Santa Monica, CA
Lead Big Data Architect-Engineer / Lead Cloud Architect-Engineer / Enterprise Data Architect / Data Scientist
Responsibilities:
- Design/Plan/Architect Pivotal Big Data Suite roadmap including use of the HDP / HDF Hadoop technology stack
- Design/Plan/Architect data ingestion strategy from 2500+ data sources into (HDP) Hadoop Data Lake
- Design/Plan/Architect ETL strategies for real-time data pipeline ingestion to (HDP/HDF) Hadoop Data Lake
- Design/Plan/Architect Storm, Kafka and Spark architecture included in HDP/HDF real-time data solutions
- Data Discovery, Data Profiling, Predictive modelling, Machine Learning, R & Python development
- Architect - AWS, AWS RDS, AWS Data Warehouse, AWS Redshift & AWS Storage solutions
- Architect – EC2, S3, CloudFormation, RDS, CloudFront, VPC, Route53, IAM, CloudWatch, Beanstalk, Lambda
- Architect - Design, build and implement high-volume, high-scale data analytics and machine learning Snowflake solutions
- Architect – Azure Data Factory, Data Pipeline Design, Azure Data Lake / Azure Storage – Oracle, DB2, SQL Server, MySQL
- Engineer – Azure Data Factory, Data Pipeline Development – SQL, SSIS, PowerShell and ETL scripting
- Engineer – Snowflake Data Warehouse - Analyze and performance-tune the query processing engine within the Snowflake DW.
- Engineer – Snowflake Data Warehouse - Data Migration Strategy from On-Prem to the Snowflake DW solution - Ingestion Plan
- Engineer - Deploy cloud infrastructure (Security Groups and load balancers needed to support EBS environment)
- Engineer - Create and manage TFS continuous integration builds on VSTS
- Engineer - Responsible for maintaining AWS instances as part of EBS deployment
- Engineer - Systems administration with Windows / Unix scripting
- Excellent grasp of integrating multiple data sources into an enterprise data management platform; can lead data storage solution design
- Ability to understand business requirements and build pragmatic/cost-effective solutions using agile project methodologies
- Participate in Agile/Scrum ceremonies, including two-week release sprints
- Perform requirements analysis and high quality code development
- Review the code of coworkers and offer feedback
- Design frameworks, libraries, and components that are reusable
- Engineer - Support on AWS services and DevOps deploying applications
- Architect - Technical / Solution SME within the Data Integration across on-premise and AWS data sources / applications
- Architect – MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
- Architect – Talend Data Fabric through Spark and AWS EMR for Big Data Batch Jobs – UDF and UDAF requirements
- Engineer – Azure Data Flow, Data Modeling in Azure, and Azure Ad Hoc Reporting (design / development)
- Architect – ETL from AWS to Google Cloud to Azure and from/to other On-Prem data sources / targets.
- Architect – Google Cloud Platform utilizing the Data Analytics, Data Stream Analytics, Hadoop, Data Lake and BI toolset
- Engineer – Google Cloud Platform Data ingestion, Analytics datasets, data lake integration, data migration to Google Cloud
- Engineer – Google Cloud Platform to Kafka and Spark cluster solutions, Google Cloud Platform to Azure via HDFS/Hive
- Architect – Google BigQuery for use cases where other Hadoop solutions didn’t provide the results needed by the business.
- Engineer – Develop / Design data patterns via microservices into data pipelines across the Azure Technology Stack.
- Architect & Administrator (AWS) GemFire XD, PostgreSQL, Greenplum, HAWQ & Kafka environments; Go (Golang) programming
- Architect & Administrator (Azure) Azure SQL DB, Hadoop, Hadoop Spark w/ NoSQL (Mongo, Cassandra & Couchbase)
- Architect – Designed / Developed Data Migration Strategy from On Premise to Cloud (SQL & NoSQL Technology)
- Architect & Administrator (Google Cloud) Hadoop, MongoDB, Couchbase, Hbase, PostgreSQL, Cassandra (Spark / Storm)
- Architect - Callidus Cloud w/ SAP Hana; ETL Sales Data via Kafka to Data Lake (Hadoop); Data Visualizations
- Architect – Callidus Cloud integration to enterprise data stores via both SAP data, non-SAP data and Master Data Management
- Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
- Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
- Engineer – NetApp Data Fabric architecture for UDF and UDAF deployments
- Engineer - Azure Data Bricks, Azure Data Lake Service, Azure SQL Data Warehouse, Azure Data Catalog
- Engineer - Technical / Solution SME within the Data Integration of Azure, Blob Storage, Log Analytics
- Architect & Administrator (Azure) Cosmos DB – Schema Design, Data ingestion, Performance and Query optimization
- Engineer – ETL from Azure SQL to multitude of data targets and to/from data targets/sources.
- DevOps – Automation for support, deployment, patching, configuration, SDLC, migration efforts, and sync with on-premise systems
- DevOps - Build/Release/Deployment/Operations; Tools (Datical, Jenkins, SolarWinds, Splunk, Vagrant, Nagios)
- DevOps - Linux/Unix/Windows Administration
- Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
- Led the implementation, design (one full lifecycle) of Master Data Management (MDM) using Profisee Maestro / Talend.
- Proven "hands-on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
- Phased implementation leveraging best practices and a strong focus on data quality
- Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
- Proficient using Talend, Profisee Maestro, Informatica Siperian and IBM InfoSphere MDM, DQ & DG tools.
- R & Python with comprehensive proficiency; Scala – Architecture & Collection Library, REPL, Scaladoc, Reflection, Macros
- Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / Singa / H2O / Spark MLlib
- Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
- Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
- Understanding of machine learning pipelines, feature discovery/engineering, model evaluation/validation/deployment.
- Translate complex business issues into achievable analytical learning objectives and actionable analytic projects
- Create predictive and clustering models utilizing Oracle, SQL Server and HDFS data sources
- Processing, cleansing, and verifying the integrity of data used for analytics
- Selecting features, building and optimizing classifiers using machine learning techniques
- Design and develop predictive models and machine learning algorithms using advanced methodologies
- Oracle GoldenGate / Oracle Data Integrator / Hive / PostgreSQL data integration design & configuration
- Oracle 12c Enterprise Metadata Management installation and deployment
- Installation and Deployment – Oracle Enterprise Metadata Management for Hadoop / PostgreSQL Data Store
- Implement Talend Data Integration- Reading an input file, transforming data, combining columns, Joining data sources
- Implement Talend Data Integration- Creating database metadata, Joining data, Master Data Management model design
- Define SOA based applications and micro-services for Data Pipeline Architecture
- Define Cloud Native approach (microservices, containers/Docker, Agile, behavior-driven development (BDD))
- Define data ingestion strategies; Kafka, Storm, NiFi, Zookeeper, Oozie, Sqoop – Lambda Architecture
- Cloud to/from On-Prem – “Apache” - Kafka, NiFi, Storm, Flume, Sqoop, Samza, Chukwa
- Cloud to/from On-Prem - Wavefront, DataTorrent, Amazon Kinesis, Syncsort, Gobblin, Fluentd, Cloudera Morphlines
- Cloud to/from On-Prem - White Elephant, Heka, Scribe, Databus
- Tools were tested, compared and benchmarked via POC and performance testing to determine the ingestion strategy / tool outcome
- Delta Architecture Ingestion - Acquisition -> Sqoop, Flume, Python; Messaging -> Kafka, Pulsar
- Delta Architecture Ingestion - Stateful -> Flink; Query / Processing / Lambda -> Hadoop, MapReduce, Hive, NoSQL
- Database Refactoring, Database Upgrades, Database Migrations, Database Platform Changes (SQL->SQL, SQL->NoSQL)
- Database connection pooling and configuration (Oracle, SQL Server, DB2, PostgreSQL)
- Migration from Informatica ETL / ELT methods into Azure Data Factory Data Pipeline architecture.
- Data Pipeline from relational database into cloud data lake and data storage using ETL extraction via Azure Data Factory
- SQL Env – Oracle 12c (450+ PRD DBs); SQL Server 2012/2014/2016 (425+ PRD DBs); DB2 9/10 (75 PRD DBs)
- Oracle – Oracle Performance Tuning, backup & recovery, DevOps Automated Changes, Data Migrations, ETL
- SQL – Oracle 11g, 12c, 12c R2, OEM 13c; SQL Server 2012, 2014, 2016; DB2 11.1, 10.5, 10.1, 9.8, 9.7
- Sybase – (ASE 15.x/16.x) – 30+ Dev/TST/PRD DBs (Log Shipping, Replication Server, DB Mirror)
- MySQL – (Percona 5.7.x) – 15+ PRD DBs Percona Cluster; (MariaDB 10.1.x, 10.2.x, 10.3.x) 20+ TST/DEV
- MySQL – Replication, cluster configurations, sharding – IaaS (On-Prem & Cloud), indexing, Elasticsearch
- PostgreSQL DBA 9.5/9.6 (20+ PRD DBs) – Data ingestion, data model optimization, table/index optimization
- PostgreSQL DBA – Backup/Recovery, Performance Tuning, Data loads, Connectors to other environments, SQL Tuning
- PostgreSQL DBA – Replication and Replica Management; Sync Management and Optimization;
- PostgreSQL DBA – Query, Storage, Index – Data Analytics Optimization
- Exadata – X5 migration to X7, 4-node NVMe, 18-core configuration
- Exadata - Migration from Oracle RAC to Oracle Exadata multi-tenant (RAC) Cluster
- Exadata – Parallelization Optimization, Index Optimization, Partition optimization, Stats optimization
- Exadata – Perf. tuning, Optimizer optimization, Configuration optimization, Smart Scans optimization
- Exadata – EXACHK, ASM, Clusterware, DCLI, CellCLI – Storage, Operating System and Network
- Oracle – Oracle 12c Pluggable Databases through Database Consolidation; Redaction Policy; Top-N Query and Fetch
- ODI – Oracle 12c Data Integrator configuration with Hadoop/Hive & Oracle DB integration
- All Databases – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
- SQL Server – SQL Profiler, Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
- Netezza – (15+ PRD DBs) – 30+ Dev/TST/PRD DBs (Log Shipping, Replication Server, DB Mirror)
- Netezza – Provided technical efficiency, performance and security functions of Netezza databases.
- Netezza - Implemented procedures for allocation of hardware resources and performance tuning
- MongoDB – 3.6/3.4 (15 MongoDB PRD DBs) w/ sharding across a 20-node cluster (see the sharding sketch after this list)
- MongoDB – Full high availability within two data centers; document store for more than 5,000 users.
- Neo4j - Graph Data – Query and analyze highly connected data; Native Graph Storage, Native Graph Processing (see the Cypher query sketch after this list)
- Neo4j - Graph scalability, high availability, Graph Clustering – Graphs on Spark, Graphs in Azure Cloud Graph Platform
- HBase - Cluster design and management, including backup/recovery, replication, cluster failover, and disaster recovery
- HBase – Large structured and unstructured datasets from multiple data sources – pipeline to Hadoop / Kafka clusters
- Greenplum – (15+ PRD DBs), 10-node cluster, HAWQ, Pivotal HD & Big Data Suite
- Greenplum – Data Ingestion, Backup/Recovery, Performance Tuning, Connection Pooling, Query Tuning
- Couchbase - 5.0/4.6.x (15 Node/Cluster) Document Data modeling, Cluster Management w/ Hadoop HDF
- Couchbase - Node Configuration, Data Conversion to JSON, Hadoop/HDP
- Cassandra - DataStax (25 Node/Cluster) Transaction data with replicas across 3 data centers
- Oracle EBS 12.2.5 / DB 11gR2 Vision installations at 5 different locations for 11i-to-12c upgrade planning
- Oracle EBS upgrade from 12.1.3 to 12.2.5 / RDBMS upgrade from 11gR2 to 12c / EBS performance tuning
- Oracle EBS to Oracle PeopleSoft bi-directional data replication and data ETL into Big Data (Hadoop) Data Lake (Batch)
- PeopleSoft Prod, Test, QA & Dev support & administration
- PeopleSoft upgrade from 9.1 to 9.2 via Upgrade Assistant; PeopleSoft Tools upgrade from 8.51 to 8.54 via Change Assistant
- PeopleSoft Integration Broker, Gateway Properties, App Messages – Optimization
- PeopleSoft database to Oracle Exadata platform – Optimized: Statistics, Indexes, Configurations, Scans & Parallelization
- PeopleSoft Application Server, Process Scheduler, REN Server, Gateway, WebLogic & File Server support/upgrade
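
A minimal sketch of the MongoDB sharding setup described above, using pymongo admin commands; the database, collection, shard key, and router address are hypothetical:

    from pymongo import MongoClient

    # Connect through a mongos router; host/port are placeholders.
    client = MongoClient("mongodb://mongos-router:27017")

    # Enable sharding on the database, then shard the collection on a hashed
    # key so documents spread evenly across the cluster's shards.
    client.admin.command("enableSharding", "sales")
    client.admin.command(
        "shardCollection", "sales.orders",
        key={"customer_id": "hashed"})

    # Writes now route by the shard key; a quick smoke test:
    client.sales.orders.insert_one({"customer_id": "c-1001", "total": 42.50})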
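For the Neo4j bullets, a minimal Cypher query sketch via the official Python driver; the Account/TRANSFERRED_TO graph model and connection details are invented for illustration:

    from neo4j import GraphDatabase

    # Connection details are placeholders.
    driver = GraphDatabase.driver("bolt://neo4j-host:7687", auth=("neo4j", "..."))

    # Example of the "highly connected data" queries mentioned above:
    # find accounts within two hops of a flagged account.
    CYPHER = """
    MATCH (a:Account {flagged: true})-[:TRANSFERRED_TO*1..2]->(b:Account)
    RETURN DISTINCT b.id AS related_account
    """

    with driver.session() as session:
        for record in session.run(CYPHER):
            print(record["related_account"])

    driver.close()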
Confidential
Big Data Architect / Enterprise Architect / Data Scientist / Big Data Technical Lead / SQL Server 2016 DBA / Azure Cloud Architect
Responsibilities:
- Build services that help categorize data based on usage and underlying attributes coming from a variety of systems.
- Create systems that help quickly make anomalous patterns in data pipelines known to teams throughout enterprise.
- Provide requirements and techniques into systems that help cleanse data being used in key business data pipelines.
- Analyze data originating from many different source systems and database technologies.
- Work with people/teams throughout the enterprise to find opportunities to improve data quality for overall data products.
- Build features to support data categorization models, data quality anomaly detection and better data cleansing processes (a simple anomaly check is sketched after this list).
- Identify and improve data elements within existing data lakes and new data lakes still in design phase.
- Design and develop data requirements and samples that can be incorporated into engineering (technical) processes.
- Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / H2O / Spark MLlib
- Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
- Utilizing Event Stream Processing and Complex Event Processing, Egress, Visualization and Utilization
- Edit Python and R code for optimization and performance improvements.
- R & Python with comprehensive proficiency; Scala – Architecture & Collection Library, REPL, Scaladoc, Reflection, Macros
- Detailed understanding of machine learning pipelines and ability to discuss concepts such as feature discovery/engineering, model evaluation/validation, online vs. offline learning, and model deployment.
- Deployment of Hadoop and Spark ecosystems.
- Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
- Big Data Architecture and Security, Maintenance and Governance
- Big Data Design Patterns - Data Ingress, Data Wrangling, Data Storage
- Big Data Solution Patterns - Data Processing, Data Analysis, Data Egress, Data Visualization
- Defined scope for the Big Data Platform and identified/selected initial use cases that would drive the Big Data project
- Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
- Big Data Strategy – performance management, data exploration, social analytics, data science
- Architect & Admin. (Azure) PostgreSQL, Cloudera & Kafka environments; Go (Golang) programming
- Architect & Admin. (Azure) Hadoop Cloudera, Hadoop YARN, Hadoop Spark w/ Mongo, Cassandra & Couchbase
- Architect & Admin. (Azure) Cloudera, Hadoop YARN, Storm, NiFi w/ Mongo, Cassandra & Couchbase
- Architect – Azure Data Factory, Data Pipeline Design, Azure Data Lake / Azure Storage – HDFS, SQL, NoSQL
- Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
- Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
- Architect – MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
- Support API and Java developer teams with both administration of the total cluster and data requests from legacy and cloud systems
- DevOps – Automation for support, deployment, patching, configuration, SDLC, migration efforts, and sync with on-premise systems
- DevOps - Build/Release/Deployment/Operations; Tools (Datical, Jenkins, SolarWinds, Splunk, Vagrant, Nagios)
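
A minimal sketch of the data-quality anomaly detection mentioned above, flagging unusual daily row counts with a rolling z-score; the input file, column names, and 3-sigma threshold are assumptions for illustration:

    import pandas as pd

    # Hypothetical daily row counts per pipeline feed.
    counts = pd.read_csv("feed_row_counts.csv", parse_dates=["day"])

    # Rolling mean/std over a 30-day window, computed per feed.
    g = counts.sort_values("day").groupby("feed_name")["rows"]
    counts["mean30"] = g.transform(lambda s: s.rolling(30, min_periods=7).mean())
    counts["std30"] = g.transform(lambda s: s.rolling(30, min_periods=7).std())

    # Flag days whose volume is more than 3 standard deviations from the norm.
    counts["zscore"] = (counts["rows"] - counts["mean30"]) / counts["std30"]
    anomalies = counts[counts["zscore"].abs() > 3]
    print(anomalies[["feed_name", "day", "rows", "zscore"]])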
Confidential
Database Administrator / Database Engineer / Big Data Architect / Big Data Administrator / Hadoop Administrator / Hadoop Technical Lead
Responsibilities:
- Drafted Enterprise Big Data Platform policy, which was incorporated into executive Project Management guidance
- Defined scope for the Big Data Platform and identified/selected initial use cases that would drive the Big Data project
- Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
- Big Data Strategy – performance management, data exploration, social analytics, data science
- Architect & Administrator Hadoop Cluster, MongoDB
- Architect & Administrator Hadoop Cluster; Hadoop HDFS; Hadoop Hive; Hadoop Map Reduce; Hadoop Pig
- Oracle 12c Enterprise Metadata Management installation and deployment
- Database connection pooling and configuration (Oracle, SQL Server, DB2, MySQL – ODBC & JDBC; see the pooling sketch after this list)
- Oracle Enterprise Metadata Management - Impact Analysis, Annotation and Tagging functions, Reporting Source Lineage
- Oracle Exadata - Migration from Oracle RAC to Oracle Exadata multi-tenant (RAC) Cluster
- Oracle Exadata – Parallelization Optimization, Index Optimization, Partition optimization, Statistics optimization
- Oracle Exadata – Performance tuning, Optimizer optimization, Configuration optimization, Smart Scans optimization
- Oracle GoldenGate / Oracle Data Integrator / Hive / PostgreSQL data integration design & configuration
- SQL DBA - Log Shipping, Database Restore, Database Refreshes, Monitoring
- SQL DBA – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
- SQL DBA – SQL Profiler, Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
- DB2/UDB DBA – Backups, Performance Tuning, Parameter/Configuration Optimization, Partitioning, Query Optimization
- DB2/UDB DBA - Log Shipping, Database Restore, Database Refreshes, Monitoring
- DB2/UDB DBA – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
- DB2/UDB DBA – Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
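
A minimal connection-pooling sketch for the bullet above, using SQLAlchemy over pyodbc against SQL Server; the DSN, credentials, and pool sizes are placeholders, and the same pattern applies to Oracle, DB2, or MySQL drivers:

    from sqlalchemy import create_engine, text

    # Pooled engine against SQL Server via ODBC; DSN/credentials are placeholders.
    # pool_size / max_overflow bound concurrent connections; pool_pre_ping
    # validates connections before reuse so stale ones are recycled.
    engine = create_engine(
        "mssql+pyodbc://etl_user:secret@MY_DSN",
        pool_size=10,
        max_overflow=5,
        pool_recycle=1800,   # recycle connections after 30 minutes
        pool_pre_ping=True,
    )

    # Each block checks a connection out of the pool and returns it on exit.
    with engine.connect() as conn:
        row_count = conn.execute(text("SELECT COUNT(*) FROM dbo.orders")).scalar()
        print(row_count)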
Confidential, McLean, VA
Big Data Architect / Data Governance and Master Data Management
Responsibilities:
- Database Refactoring, Database Upgrades, Database Migrations, Database Platform Changes (SQL->SQL, SQL->NoSQL)
- Defined scope for the Big Data Platform and identified/selected initial use cases that would drive the Big Data project
- Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
- Technical Architect of analytics platform collecting usage from billions of records.
- Database connection pooling and configuration (Oracle, SQL Server – ODBC & JDBC)
- Led a team of engineers and coordinated the QA effort for prod-ops, QA, and presentations to product and executive teams.
- Implemented a high-speed caching engine directly serving millions of customers, which had not been possible previously.
- Database Refactoring, Database Upgrades, Database Migrations, Database Platform Changes (SQL->SQL)
- Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
- Led the implementation, design (one full lifecycle) of Master Data Management (MDM).
- Proven "hands-on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
- Phased implementation leveraging best practices and a strong focus on data quality
- Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
- Documented cloud strategy for Big Data Platform and showed value for use cases selected for project
- Set up initial ETL design for Big Data Platform from the production EDW (replacing Informatica with the Big Data Platform; a Hive extract sketch follows this list)
- Architect & Administrator Hadoop Cluster; Hadoop HDFS; Hadoop Hive; Hadoop Pig
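
As a sketch of the Hive side of that ETL design, a minimal PySpark job that reads a Hive-registered table and writes a partitioned extract; the database, table, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Hive support lets Spark read tables registered in the Hive metastore.
    spark = (SparkSession.builder
             .appName("edw-extract")
             .enableHiveSupport()
             .getOrCreate())

    # Pull current-year transactions from a hypothetical EDW staging table.
    txns = spark.sql("SELECT * FROM edw_staging.transactions WHERE txn_year = 2018")

    # Light transformation, then write back to the lake partitioned by month.
    (txns.filter(col("amount") > 0)
         .write.mode("overwrite")
         .partitionBy("txn_month")
         .saveAsTable("analytics.transactions_clean"))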