
Lead Big Data Architect - Engineer / Lead Cloud Architect-Engineer / Data Scientist Resume


Jacksonville, FL

WORK HISTORY:

Confidential, Jacksonville, FL

Lead Big Data Architect - Engineer / Lead Cloud Architect-Engineer / Data Scientist

Responsibilities:

  • Design/Plan/Architect Pivotal Big Data Suite roadmap including use of the HDP / HDF Hadoop technology stack
  • Design/Plan/Architect data ingestion strategy from 400+ data sources into (HDP) Hadoop Data Lake
  • Design/Plan/Architect ETL strategies for real-time data pipeline ingestion into the (HDP/HDF) Hadoop Data Lake (see the streaming-ingestion sketch after this list)
  • Design/Plan/Architect Storm, Kafka and Spark architectures included in HDP/HDF real-time data solutions
  • Data Discovery, Data Profiling, Predictive Modeling, Machine Learning, R & Python development
  • Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
  • Led the implementation, design (one full lifecycle) of Master Data Management (MDM) using MuleSoft / Talend.
  • Proven "hands on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
  • Phased implementation leveraging best practices and strong focus in data quality
  • Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
  • Proficient using Talend, Profisee Maestro, Informatica Siperian and IBM InfoSphere MDM, DQ & DG tools.
  • R & Python with comprehensive proficiency; Scala - Architecture & Collection Library, REPL, Scaladoc, Reflection, Macros
  • Deployment of Hadoop and Spark ecosystems.
  • Using Erwin for logical and physical database design, database optimization, loading strategy design and implementation, conducting business analysis, event modeling & using knowledge of standard commercial databases (Oracle, Teradata, DB2).
  • Working in Big Data and Microservices technologies like Hadoop, MapReduce frameworks, Cassandra, Kafka, Spark, HBase, Hive, Spring Boot, Node.js, etc.
  • Developing database solutions by designing the proposed system; defining database physical structure and functional capabilities, security, backup, and recovery specifications; and providing database support by coding utilities, responding to user queries, and troubleshooting issues
  • Interacting and collaborating with cross-functional teams including application development, peer reviews, testing, operations, security and compliance, and the project management office, as well as business customers and external vendors.
  • Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / Singa / H2O / Spark MLlib
  • Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
  • Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
  • Detailed understanding of machine learning pipelines and ability to discuss concepts such as feature discovery/engineering, model evaluation/validation, online vs. offline learning, and model deployment.
  • Create predictive and clustering models utilizing Oracle, SQL Server and HDFS data sources
  • Define when predictive or clustering models could be utilized, and the type of data required to make them insightful
  • Develop, extract and maintain logical and physical data models for data analytics within Direct Energy
  • Enhancing data collection procedures to include information that is relevant for building analytic systems
  • Data mining using state-of-the-art methods to produce actionable insight
  • Selecting features, building and optimizing classifiers using machine learning techniques
  • Design and develop predictive models and machine learning algorithms using advanced methodologies
  • Architect - AWS, AWS RDB, AWS Data Warehouse, AWS Redshift & AWS Storage solutions
  • Architect - EC2, S3, CloudFormation, RDS, CloudFront, VPC, Route53, IAM, CloudWatch, Beanstalk, Lambda
  • Architect - Build, design, architect, and implement high-volume, high-scale data analytics and machine learning Snowflake solutions
  • Architect - Data Migrations from Oracle, SQL Server and Hadoop (Hive / HDFS) to Snowflake Databases
  • Engineer - Snowflake Data Warehouse - Analyze and performance-tune the query processing engine within Snowflake DW
  • Engineer - Snowflake Data Warehouse - Data Migration Strategy from On-Prem to Snowflake DW solution - Ingestion Plan
  • Engineer - Deploy cloud infrastructure (Security Groups and load balancers needed to support EBS environment)
  • Engineer - Create and manage TFS continuous integration builds on VSTS
  • Engineer - Responsible for maintaining AWS instances as part of EBS deployment
  • Engineer - Systems administration with Windows / Unix scripting
  • Engineer - Support AWS services and DevOps teams deploying applications
  • Architect - Technical / Solution SME within the Data Integration across on-premise and AWS data sources / applications
  • Architect - MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) - UDF and UDAF requirements
  • Architect - Talend Data Fabric through Spark and AWS EMR for Big Data Batch Jobs - UDF and UDAF requirements
  • Architect - Azure Data Factory, Data Pipeline Design, Azure Data Lake / Azure Storage - Oracle, DB2, SQL Server, MySQL
  • Engineer - Azure Data Factory, Data Pipeline Development - SQL, SSIS, PowerShell and ETL scripting
  • Engineer – Azure Data Flow, Data Modeling in Azure, and Azure Ad-Hoc Reporting (design / development)
  • Architect – ETL from AWS to Google Cloud to Azure and from/to other On-Prem data sources / targets.
  • Architect – Google Cloud Platform utilizing the Data Analytics, Data Stream Analytics, Hadoop, Data Lake and BI toolset
  • Engineer – Google Cloud Platform Data ingestion, Analytics datasets, data lake integration, data migration to Google Cloud
  • Engineer – Google Cloud Platform to Kafka and Spark cluster solutions, Google Cloud Platform to Azure via HDFS/Hive
  • Architect – Google BigQuery for use cases where other Hadoop solutions didn't provide the results needed by the business.
  • Engineer – Develop / Design data patterns via microservices into data pipelines across the Azure Technology Stack.
  • Architect & Administrator (AWS) GenGireXD, PostgresSQL, Greenplum, Hawq & Kafka environments; GoLANG Program
  • Architect & Administrator (Azure) Azure SQL DB, Hadoop, Hadoop Spark w/ NoSQL (Mongo, Cassandra & Couchbase)
  • Architect – Salesforce Data Extractions into Azure Data Lake & Cosmos DB / MuleSoft Microsoft Service Bus Connector
  • Architect – MuleSoft Anypoint Platform to Azure API – Data Ingestion / Data Services from/to Salesforce, SAP, Databricks
  • Architect – Designed / Developed Data Migration Strategy from On Premise to Cloud (SQL & NoSQL Technology)
  • Architect & Administrator (Google Cloud) Hadoop, MongoDB, Couchbase, Hbase, PostgreSQL, Cassandra (Spark / Storm)
  • Architect - Callidus Cloud w/ SAP Hana; ETL Sales Data via Kafka to Data Lake (Hadoop); Data Visualizations
  • Architect – Callidus Cloud Integration to enterprise data stores via both SAP data, non-SAP data and Master Data Mgmt.
  • Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
  • Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
  • Engineer – NetApp Data Fabric architecture for UDF and UDAF deployments
  • Engineer - Azure Databricks, Azure Data Lake Service, Azure SQL Data Warehouse, Azure Data Catalog
  • Engineer - Technical / Solution SME within the Data Integration of Azure, Blob Storage, Log Analytics
  • Architect & Administrator (Azure) Cosmos DB – Schema Design, Data ingestion, Performance and Query optimization
  • Oracle Golden Gate / Oracle Data Integrator / Hive / PostgreSQL Data integration design & configuration
  • Oracle 12c Enterprise Metadata Management installation and deployment
  • Data Integration with MuleSoft EBS, JMS Transport - TIBCO Suite (EMS) for EAI, SOA and BPM
  • MuleSoft integration patterns – Migration, Broadcast, Aggregation, Bidirectional Sync, Correlation.
  • MuleSoft Anypoint Platform, Connectors / Transports, Enterprise Service Bus, Integration Services
  • Installation and Deployment – Oracle Enterprise Metadata Management for Hadoop / PostgreSQL Data Store
  • Implement Talend Data Integration - reading an input file, transforming data, combining columns, joining data sources
  • Implement Talend Data Integration - creating database metadata, joining data, Master Data Management model design
  • Define Cloud Native practices - microservices, BDD (behavior-driven development), containers/Docker, Agile
  • Define data ingestion strategies; Kafka, Storm, NiFi, ZooKeeper, Oozie, Sqoop – Lambda Architecture
  • Cloud to/from On-Prem – “Apache” - Kafka, NiFi, Storm, Flume, Sqoop, Samza, Chukwa
  • Cloud to/from On-Prem - Wavefront, DataTorrent, Amazon Kinesis, Syncsort, Gobblin, FluentD, Cloudera Morphlines
  • Cloud to/from On-Prem - White Elephant, Heka, Scribe, Databus
  • Tools were tested, compared, and benchmarked via POC and performance testing to determine the ingestion strategy / tool outcome
  • Delta Architecture Ingestion - Acquisition -> Sqoop, Flume, Python; Messaging -> Kafka, Pulsar
  • Delta Architecture Ingestion - Stateful -> Flink; Query / Processing / Lambda -> Hadoop, MapReduce, Hive, NoSQL
  • Neo4j - Graph Data – Query, Analyze for highly connected data; Native Graph Storage, Native Graph Processing
  • Neo4j - Graph scalability, high availability, Graph Clustering – Graphs on Spark, Graphs in Azure Cloud Graph Platform
  • HBase - Clusters Design, Management, including backup/recovery, replication, cluster failover, and disaster recovery
  • HBase – Large structured and unstructured datasets from multiple data sources – pipeline to Hadoop / Kafka clusters
  • Couchbase - 5.0/4.6.x (15 Node/Cluster) Document Data modeling, Cluster Management w/ Hadoop HDF
  • Couchbase - Node Configuration, Data Conversion to JSON, Hadoop/HDP
  • Cassandra - DataStax (25 Node/Cluster) Transaction Data with replicas across 3 data centers
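
Illustrative sketch (streaming ingestion): a minimal PySpark Structured Streaming job showing the Kafka-to-Hadoop-Data-Lake pattern referenced in the real-time ETL bullets above. The broker addresses, topic name, and HDFS paths are hypothetical placeholders, not the actual project configuration.

```python
# Minimal Kafka -> HDFS data lake ingestion sketch (hypothetical names).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-datalake").getOrCreate()

# Read the raw event stream from Kafka (hypothetical brokers/topic).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "source.events")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS key",
                      "CAST(value AS STRING) AS value",
                      "timestamp"))

# Land the stream in the data lake as Parquet; checkpointing lets the
# pipeline recover its Kafka offsets after a restart.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/raw/events")
         .option("checkpointLocation", "hdfs:///data/lake/_checkpoints/events")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```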

Confidential, Santa Monica, CA

Lead Big Data Architect-Engineer / Lead Cloud Architect-Engineer / Enterprise Data Architect / Data Scientist

Responsibilities:

  • Design/Plan/Architect Pivotal Big Data Suite roadmap including use of the HDP / HDF Hadoop technology stack
  • Design/Plan/Architect data ingestion strategy from 2500+ data sources into (HDP) Hadoop Data Lake
  • Design/Plan/Architect ETL strategies for real-time data pipeline ingestion into the (HDP/HDF) Hadoop Data Lake
  • Design/Plan/Architect Storm, Kafka and Spark architectures included in HDP/HDF real-time data solutions
  • Data Discovery, Data Profiling, Predictive Modeling, Machine Learning, R & Python development
  • Architect - AWS, AWS RDB, AWS Data Warehouse, AWS Redshift & AWS Storage solutions
  • Architect – EC2, S3, CloudFormation, RDS, CloudFront, VPC, Route53, IAM, CloudWatch, Beanstalk, Lambda
  • Architect - Build, design, architect, and implement high-volume, high-scale data analytics and machine learning Snowflake solutions
  • Architect – Azure Data Factory, Data Pipeline Design, Azure Data Lake / Azure Storage – Oracle, DB2, SQL Server, MySQL
  • Engineer – Azure Data Factory, Data Pipeline Development – SQL, SSIS, PowerShell and ETL scripting
  • Engineer – Snowflake Data Warehouse - Analyze and performance-tune the query processing engine within Snowflake DW
  • Engineer – Snowflake Data Warehouse - Data Migration Strategy from On-Prem to Snowflake DW solution - Ingestion Plan (see the migration sketch after this list)
  • Engineer - Deploy cloud infrastructure (Security Groups and load balancers needed to support EBS environment)
  • Engineer - Create and manage TFS continuous integration builds on VSTS
  • Engineer - Responsible for maintaining AWS instances as part of EBS deployment
  • Engineer - Systems administration with Windows / Unix scripting
  • Excellent grasp of integrating multiple data sources into an enterprise data management platform; able to lead data storage solution design
  • Ability to understand business requirements and build pragmatic, cost-effective solutions using agile project methodologies
  • Participate in Agile/Scrum ceremonies, including two-week release sprints
  • Perform requirements analysis and high-quality code development
  • Review the code of coworkers and offer feedback
  • Design frameworks, libraries, and components that are reusable
  • Engineer - Support AWS services and DevOps teams deploying applications
  • Architect - Technical / Solution SME within the Data Integration across on-premise and AWS data sources / applications
  • Architect – MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
  • Architect – Talend Data Fabric through Spark and AWS EMR for Big Data Batch Jobs – UDF and UDAF requirements
  • Engineer – Azure Data Flow, Data Modeling in Azure, and Azure Ad-Hoc Reporting (design / development)
  • Architect – ETL from AWS to Google Cloud to Azure and from/to other On-Prem data sources / targets.
  • Architect – Google Cloud Platform utilizing the Data Analytics, Data Stream Analytics, Hadoop, Data Lake and BI toolset
  • Engineer – Google Cloud Platform Data ingestion, Analytics datasets, data lake integration, data migration to Google Cloud
  • Engineer – Google Cloud Platform to Kafka and Spark cluster solutions, Google Cloud Platform to Azure via HDFS/Hive
  • Architect – Google BigQuery for use cases where other Hadoop solutions didn't provide the results needed by the business.
  • Engineer – Develop / Design data patterns via microservices into data pipelines across the Azure Technology Stack.
  • Architect & Administrator (AWS) GenGireXD, PostgresSQL, Greenplum, Hawq & Kafka environments; GoLANG Program
  • Architect & Administrator (Azure) Azure SQL DB, Hadoop, Hadoop Spark w/ NoSQL (Mongo, Cassandra & Couchbase)
  • Architect – Designed / Developed Data Migration Strategy from On Premise to Cloud (SQL & NoSQL Technology)
  • Architect & Administrator (Google Cloud) Hadoop, MongoDB, Couchbase, Hbase, PostgreSQL, Cassandra (Spark / Storm)
  • Architect - Callidus Cloud w/ SAP Hana; ETL Sales Data via Kafka to Data Lake (Hadoop); Data Visualizations
  • Architect – Callidus Cloud Integration to enterprise data stores via both SAP data, non-SAP data and Master Data Mgmt.
  • Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
  • Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
  • Engineer – NetApp Data Fabric architecture for UDF and UDAF deployments
  • Engineer - Azure Databricks, Azure Data Lake Service, Azure SQL Data Warehouse, Azure Data Catalog
  • Engineer - Technical / Solution SME within the Data Integration of Azure, Blob Storage, Log Analytics
  • Architect & Administrator (Azure) Cosmos DB – Schema Design, Data ingestion, Performance and Query optimization
  • Engineer – ETL from Azure SQL to a multitude of data targets and to/from other data targets/sources
  • DevOps – Automation for support, deployment, patching, configuration, SDLC, migration efforts, sync with on-premise
  • DevOps - Build/Release/Deployment/Operations; Tools (Datical, Jenkins, SolarWinds, Splunk, Vagrant, Nagios)
  • DevOps - Linux/Unix/Windows Administration
  • Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
  • Led the implementation, design (one full lifecycle) of Master Data Management (MDM) using Profisee Maestro / Talend.
  • Proven "hands on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
  • Phased implementation leveraging best practices and strong focus in data quality
  • Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
  • Proficient using Talend, Profisee Maestro, Informatica Siperian and IBM InfoSphere MDM, DQ & DG tools.
  • R & Python with comprehensive proficiency; Scala – Architecture & Collection Library, REPL, Scaladoc, Reflection, Macros
  • Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / Singa / H2O / Spark MLlib
  • Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
  • Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
  • Understanding of machine learning pipelines, feature discovery/engineering, model evaluation/validation/deployment.
  • Translate complex business issues into achievable analytical learning objectives and actionable analytic projects
  • Create predictive and clustering models utilizing Oracle, SQL Server and HDFS data sources
  • Processing, cleansing, and verifying the integrity of data used for analytics
  • Selecting features, building and optimizing classifiers using machine learning techniques
  • Design and develop predictive models and machine learning algorithms using advanced methodologies
  • Oracle Golden Gate / Oracle Data Integrator / Hive / PostgreSQL Data integration design & configuration
  • Oracle 12c Enterprise Metadata Management installation and deployment
  • Installation and Deployment – Oracle Enterprise Metadata Management for Hadoop / PostgreSQL Data Store
  • Implement Talend Data Integration - reading an input file, transforming data, combining columns, joining data sources
  • Implement Talend Data Integration - creating database metadata, joining data, Master Data Management model design
  • Define SOA based applications and micro-services for Data Pipeline Architecture
  • Define Cloud Native practices - microservices, BDD (behavior-driven development), containers/Docker, Agile
  • Define data ingestion strategies; Kafka, Storm, NiFi, ZooKeeper, Oozie, Sqoop – Lambda Architecture
  • Cloud to/from On-Prem – “Apache” - Kafka, NiFi, Storm, Flume, Sqoop, Samza, Chukwa
  • Cloud to/from On-Prem - Wavefront, DataTorrent, Amazon Kinesis, Syncsort, Gobblin, FluentD, Cloudera Morphlines
  • Cloud to/from On-Prem - White Elephant, Heka, Scribe, Databus
  • Tools were tested, compared, and benchmarked via POC and performance testing to determine the ingestion strategy / tool outcome
  • Delta Architecture Ingestion - Acquisition -> Sqoop, Flume, Python; Messaging -> Kafka, Pulsar
  • Delta Architecture Ingestion - Stateful -> Flink; Query / Processing / Lambda -> Hadoop, MapReduce, Hive, NoSQL
  • Database Refactoring, Database Upgrades, Database Migrations, Database Platform Changes (SQL->SQL, SQL->NoSQL)
  • Database connection pooling and configuration (Oracle, SQL Server, DB2, PostgreSQL)
  • Migration from Informatica ETL / ELT methods into Azure Data Factory Data Pipeline architecture.
  • Data Pipeline from relational database into cloud data lake and data storage using ETL extraction via Azure Data Factory
  • SQL Env – Oracle 12c (450+ PRD DBs); SQL Server 2012/2014/2016 (425+ PRD DBs); DB2 9/10 (75 PRD DBs)
  • Oracle – Oracle Performance Tuning, backup & recovery, DevOps Automated Changes, Data Migrations, ETL
  • SQL – Oracle 11i, 12c, 12c R2, OEM 13c; SQL Server 2012, 2014, 2016; DB2 11.1, 10.5, 10.1, 9.8, 9.7;
  • Sybase – (ASE 15.x/16.x) – 30+ Dev/TST/PRD DBs (Log Shipping, Replication Server, DB Mirror)
  • MySQL – (Percona 5.7.x) – 15+ PRD DBs Percona Cluster; (MariaDB 10.1.x, 10.2.x, 10.3.x) 20+ TST/DEV
  • MySQL – Replication, cluster configurations, sharding – IaaS (On-Prem & Cloud), Index, Elastic Search
  • PostgreSQL DBA 9.5/9.6 (20+ PRD DBs) – Data Ingestion, Data Model optimization, Table/Index Optimization
  • PostgreSQL DBA – Backup/Recovery, Performance Tuning, Data loads, Connectors to other environments, SQL Tuning
  • PostgreSQL DBA – Replication and Replica Management; Sync Management and Optimization;
  • PostgreSQL DBA – Query, Storage, and Index – Data Analytics Optimization
  • Exadata – X5 migration to X7, 4-node NVMe 18-core configuration
  • Exadata - Migration from Oracle RAC to Oracle Exadata multi-tenant (RAC) Cluster
  • Exadata – Parallelization Optimization, Index Optimization, Partition optimization, Stats optimization
  • Exadata – Perf. tuning, Optimizer optimization, Configuration optimization, Smart Scans optimization
  • Exadata – EXACHK, ASM, Clusterware, DCLI, CellCLI – Storage, Operating System and Network
  • Oracle – Oracle 12c Pluggable Databases through Database Consolidation; Redaction Policy; Top N Query and Fetch
  • ODI – Oracle 12c Data Integrator configuration with Hadoop/Hive & Oracle DB integration
  • All Databases – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
  • SQL Server – SQL Profiler, Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
  • Netezza - (15+ PRD DBs) – 30+ Dev/TST/PRD DBs (Log Shipping, Replication Server, DB Mirror)
  • Netezza - Provided technical efficiency, performance, and security functions for Netezza databases
  • Netezza - Implemented procedures for allocation of hardware resources and performance tuning
  • MongoDB – 3.6/3.4 (15 MongoDB PRD DBs) w/ sharding across a 20-node cluster
  • MongoDB – Full High Availability within two data centers. Document store for more than 5,000 users.
  • Neo4j - Graph Data – Query, Analyze for highly connected data; Native Graph Storage, Native Graph Processing
  • Neo4j - Graph scalability, high availability, Graph Clustering – Graphs on Spark, Graphs in Azure Cloud Graph Platform
  • HBase - Clusters Design, Management, including backup/recovery, replication, cluster failover, and disaster recovery
  • HBase – Large structured and unstructured datasets from multiple data sources – pipeline to Hadoop / Kafka clusters
  • Greenplum – (15+ PRD DBs) – 10-Node Cluster, HAWQ, Pivotal HD & Big Data Suite
  • Greenplum – Data Ingestion, Backup/Recovery, Performance Tuning, Connection Pooling, Query Tuning
  • Couchbase - 5.0/4.6.x (15 Node/Cluster) Document Data modeling, Cluster Management w/ Hadoop HDF
  • Couchbase - Node Configuration, Data Conversion to JSON, Hadoop/HDP
  • Cassandra - DataStax (25 Node/Cluster) Transaction Data with replicas across 3 data centers
  • Oracle EBS 12.2.5 / DB 11gR2 Vision installations at 5 different locations for 11i to 12c upgrade planning
  • Oracle EBS upgrade from 12.1.3 to 12.2.5 / RDBMS upgrade from 11gR2 to 12c / EBS performance tuning
  • Oracle EBS to Oracle PeopleSoft bi-directional data replication and data ETL into Big Data (Hadoop) Data Lake (Batch)
  • PeopleSoft Prod, Test, QA & Dev support & administration
  • PeopleSoft upgrade from 9.1 to 9.2 via Upgrade Assistant; PeopleSoft Tools upgrade from 8.51 to 8.54 via Change Assistant
  • PeopleSoft Integration Broker, Gateway Properties, App Messages - Optimization
  • PeopleSoft Database to Oracle Exadata Platform – Optimized: Statistics, Indexes, Configurations, Scans & Parallelization
  • PeopleSoft Application Server, Process Scheduler, REN Server, Gateway, WebLogic & File Server Support/Upgrade
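
Illustrative sketch (Snowflake migration): a minimal example of the on-prem-to-Snowflake ingestion step referenced above, using the snowflake-connector-python package to stage extracted files and bulk-load them with COPY INTO. The account, credentials, file path, and table name are hypothetical placeholders.

```python
# Minimal on-prem -> Snowflake bulk-load sketch (hypothetical names).
import snowflake.connector

conn = snowflake.connector.connect(
    user="ETL_USER",              # hypothetical service account
    password="...",
    account="xy12345.us-east-1",  # hypothetical account locator
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

cur = conn.cursor()
try:
    # 1. Upload extracted on-prem files to the table's internal stage.
    cur.execute("PUT file:///exports/orders_*.csv @%ORDERS AUTO_COMPRESS=TRUE")

    # 2. Bulk-load the staged files into the target table.
    cur.execute("""
        COPY INTO ORDERS
        FROM @%ORDERS
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)

    # 3. Verify row counts before cutting the source system over.
    cur.execute("SELECT COUNT(*) FROM ORDERS")
    print("rows loaded:", cur.fetchone()[0])
finally:
    cur.close()
    conn.close()
```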

Confidential

Big Data Architect / Enterprise Architect / Data Scientist / Big Data Technical Lead / SQL Server 2016 DBA / Azure Cloud Architect

Responsibilities:

  • Build services that help categorize data based on usage and underlying attributes coming from a variety of systems.
  • Create systems that help quickly surface anomalous patterns in data pipelines to teams throughout the enterprise (see the anomaly-detection sketch after this list).
  • Provide requirements and techniques into systems that help cleanse data being used in key business data pipelines.
  • Analyze data originating from many different source systems and database technologies.
  • Work with people/teams throughout the enterprise to find opportunities to improve data quality for overall data products.
  • Build features to support data categorization models, data quality anomaly detection and better data cleansing processes.
  • Identify and improve data elements within existing data lakes and new data lakes still in design phase.
  • Design and develop data requirements and samples that can be incorporated into engineering (technical) processes.
  • Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / H2O / Spark MLlib
  • Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
  • Utilizing Event Stream Processing and Complex Event Processing, Egress, Visualization and Utilization
  • Edit Python and R code for optimization and performance improvements.
  • R & Python with comprehensive proficiency; Scala – Architecture & Collection Library, REPL, Scaladoc, Refection, Macros
  • Detailed understanding of machine learning pipelines and ability to discuss concepts such as feature discovery/engineeringmodel evaluation/validation, online vs. offline learning, and model deployment.
  • Deployment of Hadoop and Spark ecosystems.
  • Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
  • Big Data Architecture and Security, Maintenance and Governance
  • Big Data Design Patterns - Data Ingress, Data Wrangling, Data Storage
  • Big Data Solution Patterns - Data Processing, Data Analysis, Data Egress, Data Visualization
  • Defined scope for Big Data Platform and identified/selected initial Use Cases that would drive the Big Data Project
  • Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
  • Big Data Strategy – performance management, data exploration, social analytics, data science
  • Architect & Admin. (Azure) PostgresSQ, Cloudera & Kafka environments;GoLANG Program
  • Architect & Admin. (Azure) Hadoop Cloudera, Hadoop Yarn, Hadoop Spark w/Mongo Cassandra & Couchbase
  • Architect & Admin. (Azure) Cloudera, Hadoop Yarn, Storm, Nifi w/Mongo, Cassandra & Couchbase
  • Architect – Azure Data Factory, Data Pipeline Design, Azure Data lake / Azure Storage – HDFS, SQL, NoSQL
  • Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
  • Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
  • Architect – MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
  • Support API and Java developer teams with both administration of the total cluster and data requests from legacy and cloud
  • DevOps – Automation for support, deployment, patching, configuration, SDLC, migration efforts, sync with on-premise
  • DevOps - Build/Release/Deployment/Operations; Tools (Datical, Jenkins, SolarWinds, Splunk, Vagrant, Nagios)
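
Illustrative sketch (anomaly detection): a minimal scikit-learn example of the data-pipeline anomaly detection described above, flagging pipeline runs whose metrics deviate from the recent norm. The metric values and column names are hypothetical.

```python
# Minimal pipeline-metrics anomaly-detection sketch (hypothetical data).
import pandas as pd
from sklearn.ensemble import IsolationForest

# Daily pipeline metrics, e.g. pulled from a metadata/telemetry store.
metrics = pd.DataFrame({
    "row_count": [10_020, 9_980, 10_150, 9_900, 55, 10_060],
    "null_rate": [0.010, 0.012, 0.009, 0.011, 0.480, 0.010],
    "load_secs": [62, 58, 65, 60, 310, 61],
})

# Fit an isolation forest; contamination is the expected anomaly fraction.
model = IsolationForest(contamination=0.2, random_state=42)
metrics["anomaly"] = model.fit_predict(metrics)  # -1 = anomalous, 1 = normal

# Surface anomalous runs to the owning teams.
print(metrics[metrics["anomaly"] == -1])
```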

Confidential

Database Administrator / Database Engineer / Big Data Architect / Big Data Administrator / Hadoop Administrator / Hadoop Technical Lead

Responsibilities:

  • Drafted Enterprise Big Data Platform policy, which was incorporated into executive Project Management guidance
  • Defined scope for Big Data Platform and identified/selected initial Use Cases that would drive the Big Data Project
  • Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
  • Big Data Strategy – performance management, data exploration, social analytics, data science
  • Architect & Administrator Hadoop, MongoDB, Hadoop Cluster
  • Architect & Administrator Hadoop Cluster; Hadoop HDFS; Hadoop Hive; Hadoop Map Reduce; Hadoop Pig
  • Oracle 12c Enterprise Metadata Management installation and deployment
  • Database connection pooling and configuration (Oracle, SQL Server, DB2, MySQL – ODBC & JDBC); see the pooling sketch after this list
  • Oracle Enterprise Metadata Management - Impact Analysis, Annotation and Tagging functions, Reporting Source Lineage
  • Oracle Exadata - Migration from Oracle RAC to Oracle Exadata multi-tenant (RAC) Cluster
  • Oracle Exadata – Parallelization Optimization, Index Optimization, Partition optimization, Statistics optimization
  • Oracle Exadata – Performance tuning, Optimizer optimization, Configuration optimization, Smart Scans optimization
  • Oracle Golden Gate / Oracle Data Integrator / Hive / PostgreSQL Data integration design & configuration
  • SQL DBA - Log Shipping, Database Restore, Database Refreshes, Monitoring
  • SQL DBA – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
  • SQL DBA – SQL Profiler, Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
  • DB2/UDB DBA – Backups, Performance Tuning, Parameter/Configuration Optimization, Partitioning, Query Optimization
  • DB2/UDB DBA - Log Shipping, Database Restore, Database Refreshes, Monitoring
  • DB2/UDB DBA – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
  • DB2/UDB DBA – Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
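
Illustrative sketch (connection pooling): a minimal SQLAlchemy example of the database connection pooling configuration referenced above. The Oracle DSN and pool settings are hypothetical placeholders; the same pattern applies to SQL Server, DB2, and MySQL through their respective drivers.

```python
# Minimal pooled-connection sketch (hypothetical DSN and pool sizes).
from sqlalchemy import create_engine, text

engine = create_engine(
    "oracle+cx_oracle://app_user:secret@db-host:1521/?service_name=ORCLPDB1",
    pool_size=10,        # steady-state connections held open
    max_overflow=20,     # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=1800,   # recycle connections before idle timeouts kill them
    pool_pre_ping=True,  # validate a connection before handing it out
)

# Connections are checked out of the pool and returned on context exit.
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1 FROM dual")).scalar())
```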

Confidential, McLean, VA

Big Data Architect / Data Governance and Master Data Management

Responsibilities:

  • Database Refactoring, Database Upgrades, Database Migrations, Database Platform Changes (SQL->SQL, SQL->NoSQL)
  • Defined scope for Big Data Platform and identified/selected initial Use Cases that would drive the Big Data Project
  • Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
  • Technical Architect of analytics platform collecting usage from billions of records.
  • Database connection pooling and configuration (Oracle, SQL Server – ODBC & JDBC)
  • Led a team of engineers and coordinated the QA effort for prod-ops, QA, and presentations to product and executive teams.
  • Implemented a high-speed caching engine directly serving millions of customers, which had not been possible previously (see the caching sketch after this list).
  • Database Refactoring, Database Upgrades, Database Migrations, Database Platform Changes (SQL->SQL)
  • Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
  • Led the implementation, design (one full lifecycle) of Master Data Management (MDM).
  • Proven "hands on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
  • Phased implementation leveraging best practices and strong focus in data quality
  • Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
  • Documented cloud strategy for Big Data Platform and showed value for use cases selected for project
  • Setup initial ETL design for Big Data Platform from Production EDW (Replacing Informatica with Big Data Platform)
  • Architect & Administrator Hadoop Cluster; Hadoop HDFS; Hadoop Hive; Hadoop Pig
