Lead Big Data Architect - Engineer Resume
PROFESSIONAL EXPERIENCE
Confidential
Lead Big Data Architect - Engineer
Responsibilities:
- Design/Plan/Architect Pivotal Big Data Suite roadmap including use of the HDP / HDF Hadoop technology stack
- Design/Plan/Architect data ingestion strategy from 400+ data sources into (HDP) Hadoop Data Lake
- Design/Plan/Architect ETL strategies for real-time data pipeline ingestion to the (HDP/HDF) Hadoop Data Lake
- Design/Plan/Architect Storm, Kafka and Spark architecture included in HDP/HDF real-time data solutions (a minimal ingestion sketch follows this list)
- Data Discovery, Data Profiling, Predictive Modeling, Machine Learning, R & Python development
- Led the creation of Data Governance vision, charter, framework, committees and processes for the enterprise.
- Led the implementation, design (one full lifecycle) of Master Data Management (MDM) using MuleSoft / Talend.
- Proven "hands on" MDM experience with expertise in MDM strategy proposal, roadmap and planning
- Phased implementation leveraging best practices and strong focus in data quality
- Experience in design/architecture of MDM Hub, data integration, data governance process and data quality.
- Proficient using Talend, Profisee Maestro, Informatica Siperion and IBM Infosphere MDM, DQ & DG tools.
- R & Python with comprehensive proficiency; Scala - Architecture & Collection Library, REPL, Scala, Refection, Macros
- Deployment of Hadoop and Spark ecosystems.
- Using Erwin for logical and physical database design, database optimization, loading strategy design and implementation, conducting business analysis, event modeling & using knowledge of standard commercial databases (Oracle, Teradata, DB2).
- Working in Big Data and Microservices technologies such as Hadoop, MapReduce frameworks, Cassandra, Kafka, Spark, HBase, Hive, Spring Boot, Node.js, etc.
- Developing database solutions by designing the proposed system; defining database physical structure, functional capabilities, security, backup, and recovery specifications; and providing database support by coding utilities, responding to user queries, and troubleshooting issues
- Interacting and collaborating with cross-functional teams including application development, peer reviews, testing, operations, security and compliance, and the project management office, as well as business customers and external vendors.
- Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / Singa / H2O / Spark MLlib
- Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
- Regression, trees, neural networks, survival analysis, cluster analysis, forecasting, anomaly detection, association rules.
- Detailed understanding of machine learning pipelines and ability to discuss concepts such as feature discovery/engineering, model evaluation/validation, online vs. offline learning, and model deployment.
- Create predictive and clustering models utilizing Oracle, SQL Server and HDFS data sources
- Define when predictive or clustering models could be utilized, and the type of data required to make them insightful
- Develop, extract and maintain logical and physical data models for data analytics within Direct Energy
- Enhancing data collection procedures to include information that is relevant for building analytic systems
- Data mining using state-of-the-art methods to produce actionable insights
- Selecting features, building and optimizing classifiers using machine learning techniques
- Design and develop predictive models and machine learning algorithms using advanced methodologies
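The ingestion bullets above pair Kafka topics with Spark for real-time loads into the HDP data lake. Below is a minimal sketch of that pattern using PySpark Structured Streaming; the broker address, topic name, event schema, and HDFS paths are hypothetical placeholders, not details of the actual engagement.

```python
# Minimal sketch: streaming ingestion from Kafka into an HDFS-backed data lake
# with Spark Structured Streaming. Requires the spark-sql-kafka connector on
# the classpath. All broker/topic/path names below are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-data-lake").getOrCreate()

# Assumed event contract; each real feed would carry its own schema.
event_schema = StructType([
    StructField("source_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "ingest.events")                # hypothetical topic
       .load())

# Kafka delivers key/value as bytes; parse the JSON value into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Land partitioned Parquet in the data lake; the checkpoint gives the stream
# at-least-once recovery across restarts.
(events.writeStream
 .format("parquet")
 .option("path", "hdfs:///data/lake/raw/events")            # hypothetical path
 .option("checkpointLocation", "hdfs:///data/lake/_chk/events")
 .partitionBy("source_id")
 .start()
 .awaitTermination())
```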
Confidential
Lead Big Data Architect-Engineer
Responsibilities:
- Design/Plan/Architect Pivotal Big Data Suite roadmap including use of the HDP / HDF Hadoop technology stack
- Design/Plan/Architect data ingestion strategy from 2500+ data sources into (HDP) Hadoop Data Lake
- Design/Plan/Architect ETL strategies for real-time data pipeline ingestion to the (HDP/HDF) Hadoop Data Lake
- Design/Plan/Architect Storm, Kafka and Spark architecture included in HDP/HDF real-time data solutions
- Data Discovery, Data Profiling, Predictive Modeling, Machine Learning, R & Python development
- Architect - AWS relational database (RDS), data warehouse (Redshift) & AWS storage solutions
- Architect - EC2, S3, CloudFormation, RDS, CloudFront, VPC, Route53, IAM, CloudWatch, Elastic Beanstalk, Lambda
- Architect - Design and implement high-volume, high-scale data analytics and machine learning solutions on Snowflake
- Architect - Azure Data Factory, Data Pipeline Design, Azure Data Lake / Azure Storage - Oracle, DB2, SQL Server, MySQL
- Engineer - Azure Data Factory, Data Pipeline Development - SQL, SSIS, PowerShell and ETL scripting
- Engineer - Snowflake Data Warehouse - Analyze and performance-tune the query processing engine within the Snowflake DW.
- Engineer - Snowflake Data Warehouse - Data migration strategy from on-prem to the Snowflake DW solution; ingestion plan (a staged-load sketch follows this list)
- Engineer - Deploy cloud infrastructure (Security Groups and load balancers needed to support EBS environment)
- Engineer - Create and manage TFS continuous integration (CI) builds on VSTS
- Engineer - Responsible for maintaining AWS instances as part of EBS deployment
- Engineer - Systems administration with Windows / Unix scripting
- Excellent grasp of integrating multiple data sources into an enterprise data management platform; able to lead data storage solution design
- Ability to understand business requirements and build pragmatic, cost-effective solutions using agile project methodologies
- Participate in Agile/Scrum ceremonies, including two-week release sprints
- Perform requirements analysis and high quality code development
- Review the code of coworkers and offer feedback
- Design frameworks, libraries, and components that are reusable
- Engineer - Support on AWS services and DevOps deploying applications
- Architect - Technical / Solution SME within the Data Integration across on-premise and AWS data sources / applications
- Architect - MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) - UDF and UDAF requirements
- Architect – Talend Data Fabric through Spark and AWS EMR for Big Data Batch Jobs – UDF and UDAF requirements
- Engineer – Azure Data Flow, Data Modeling in Azure, and Azure Ad-Hoc Reporting (design / development)
- Architect – ETL from AWS to Google Cloud to Azure and from/to other On-Prem data sources / targets.
- Architect – Google Cloud Platform utilizing the Data Analytics, Data Stream Analytics, Hadoop, Data Lake and BI toolset
- Engineer – Google Cloud Platform Data ingestion, Analytics datasets, data lake integration, data migration to Google Cloud
- Engineer – Google Cloud Platform to Kafka and Spark cluster solutions, Google Cloud Platform to Azure via HDFS/Hive
- Architect – Google BigQuery for use cases where other Hadoop solutions didn't provide the results needed by the business.
- Engineer – Develop / Design data patterns via microservices into data pipelines across the Azure Technology Stack.
- Architect & Administrator (AWS) GenGireXD, PostgresSQL, Greenplum, Hawq & Kafka environments; GoLANG Program
- Architect & Administrator (Azure) Azure SQL DB, Hadoop, Hadoop Spark w/ NoSQL (Mongo, Cassandra & Couchbase)
- Architect – Designed / Developed Data Migration Strategy from On Premise to Cloud (SQL & NoSQL Technology)
- Architect & Administrator (Google Cloud) Hadoop, MongoDB, Couchbase, Hbase, PostgreSQL, Cassandra (Spark / Storm)
- Architect - Callidus Cloud w/ SAP HANA; ETL sales data via Kafka to the Data Lake (Hadoop); data visualizations
- Architect – Callidus Cloud integration to enterprise data stores via SAP data, non-SAP data and Master Data Management.
- Engineer – Deployment MapR Data Fabric for Kubernetes (FlexVolume, PersistentVolume) – UDF and UDAF requirements
- Engineer – Deployment Talend Data Fabric, Spark within AWS EMR – UDF and UDAF requirements
- Engineer – NetApp Data Fabric architecture for UDF and UDAF deployments
- Engineer - Azure Data Bricks, Azure Data Lake Service, Azure SQL Data Warehouse, Azure Data Catalog
- Engineer - Technical / Solution SME within the Data Integration of Azure, Blob Storage, Log Analytics
- Architect & Administrator (Azure) Cosmos DB – Schema Design, Data ingestion, Performance and Query optimization
- Engineer – ETL from Azure SQL to a multitude of data targets, and to/from other data sources and targets.
- DevOps – Automation for support, deployment, patching, configuration, SDLC, migration efforts, and sync with on-premise systems
- DevOps - Build/Release/Deployment/Operations; Tools (Datical, Jenkins, SolarWinds, Splunk, Vagrant, Nagios)
- DevOps - Linux/Unix/Windows Administration
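The Snowflake migration bullets above describe moving on-prem tables into a Snowflake warehouse under a defined ingestion plan. The following is a minimal sketch of the common stage-and-COPY load pattern using the snowflake-connector-python package; the account, credentials, stage, table, and file-format settings are illustrative assumptions, not values from the actual engagement.

```python
# Minimal sketch of the stage-and-COPY ingestion pattern for loading on-prem
# extracts into Snowflake. All connection values and object names are
# hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",   # hypothetical account locator
    user="LOADER",
    password="...",      # use a secrets manager in practice
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Extract files are first pushed to an internal stage (e.g. via PUT from
    # the extract host), then bulk-loaded into the target table with COPY INTO.
    cur.execute("CREATE STAGE IF NOT EXISTS onprem_stage")
    cur.execute("""
        COPY INTO RAW.ORDERS
        FROM @onprem_stage/orders/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
finally:
    cur.close()
    conn.close()
```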
Confidential
Big Data Architect
Responsibilities:
- Build services that help categorize data based on usage and underlying attributes coming from a variety of systems.
- Create systems that quickly surface anomalous patterns in data pipelines to teams throughout the enterprise (a minimal volume-check sketch follows this list).
- Provide requirements and techniques into systems that help cleanse data being used in key business data pipelines.
- Analyze data originating from many different source systems and database technologies.
- Work with people and teams throughout the enterprise to find opportunities to improve data quality for overall data products.
- Build features to support data categorization models, data quality anomaly detection and better data cleansing processes.
- Identify and improve data elements within existing data lakes and new data lakes still in design phase.
- Design and develop data requirements and samples that can be incorporated into engineering (technical) processes.
- Machine Learning Frameworks - Amazon Machine Learning / Azure Machine Learning / H2O / Spark MLlib
- Machine Learning Frameworks (Streams) - Massive Online Analysis / Spark MLlib
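One bullet above covers surfacing anomalous patterns in data pipelines. As a minimal illustration of the idea (not the production system), the sketch below flags days whose load volume deviates sharply from a trailing window using a rolling z-score in pandas; the window size, threshold, column semantics, and sample data are all assumptions.

```python
# Minimal sketch of pipeline anomaly detection on daily row counts: flag days
# whose volume deviates strongly from the trailing mean. Thresholds and the
# sample data are illustrative assumptions.
import pandas as pd

def flag_volume_anomalies(counts: pd.Series, window: int = 14, z: float = 3.0) -> pd.Series:
    """Return a boolean Series marking days with anomalous load volume."""
    rolling_mean = counts.rolling(window, min_periods=window).mean()
    rolling_std = counts.rolling(window, min_periods=window).std()
    zscores = (counts - rolling_mean) / rolling_std
    return zscores.abs() > z

# Example: the sudden drop on the last day is flagged for the owning team.
daily_counts = pd.Series(
    [10_200, 10_150, 9_980, 10_300, 10_050, 10_120, 10_210,
     10_090, 10_180, 10_240, 10_110, 10_060, 10_300, 10_190, 1_250],
    index=pd.date_range("2024-01-01", periods=15),
)
print(daily_counts[flag_volume_anomalies(daily_counts)])
```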
Confidential
Database Administrator / Database Engineer
Responsibilities:
- Drafted Enterprise Big Data Platform policy, which was incorporated into executive Project Management guidance
- Defined scope for the Big Data Platform and identified/selected initial use cases that would drive the Big Data Project
- Big Data Strategy – Developed Initial Approach and Selected Initial Technology Stack
- Big Data Strategy – performance management, data exploration, social analytics, data science
- Architect & Administrator Hadoop, MongoDB, Hadoop Cluster
- Architect & Administrator Hadoop Cluster; Hadoop HDFS; Hadoop Hive; Hadoop Map Reduce; Hadoop Pig
- Oracle 12c Enterprise Metadata Management installation and deployment
- Database connection pooling and configuration (Oracle, SQL Server, DB2, MySQL – ODBC & JDBC); a pooling sketch follows this list
- Oracle Enterprise Metadata Management - Impact Analysis, Annotation and Tagging functions, Reporting Source Lineage
- Oracle Exadata - Migration from Oracle RAC to Oracle Exadata multi-tenant (RAC) Cluster
- Oracle Exadata – Parallelization Optimization, Index Optimization, Partition optimization, Statistics optimization
- Oracle Exadata – Performance tuning, Optimizer optimization, Configuration optimization, Smart Scans optimization
- Oracle GoldenGate / Oracle Data Integrator / Hive / PostgreSQL data integration design & configuration
- SQL DBA - Log Shipping, Database Restore, Database Refreshes, Monitoring
- SQL DBA – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
- SQL DBA – SQL Profiler, Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
- DB2/UDB DBA – Backups, Performance Tuning, Parameter/Configuration Optimization, Partitioning, Query Optimization
- DB2/UDB DBA - Log Shipping, Database Restore, Database Refreshes, Monitoring
- DB2/UDB DBA – Meta Data Management, Log Management, In-Memory Optimization, Database Cluster Tuning
- DB2/UDB DBA – Indexing Optimization, Parallel Query Optimization, Storage Optimization (Data Files, Logs)
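The connection pooling bullet above spans Oracle, SQL Server, DB2, and MySQL via ODBC/JDBC. As a minimal application-side analogue, the sketch below configures a pooled SQLAlchemy engine against a hypothetical Oracle service; the DSN, credentials, table name, and pool parameters are illustrative assumptions rather than the actual configuration.

```python
# Minimal sketch of application-side connection pooling with SQLAlchemy,
# analogous to the ODBC/JDBC pool configuration described above. The DSN,
# credentials, and pool sizes are hypothetical placeholders.
from sqlalchemy import create_engine, text

engine = create_engine(
    "oracle+cx_oracle://app_user:secret@db-host:1521/?service_name=ORCLPDB1",
    pool_size=10,        # steady-state connections kept open
    max_overflow=5,      # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=1800,   # recycle connections to survive idle timeouts
    pool_pre_ping=True,  # validate each connection before handing it out
)

# Each checkout borrows from the pool and returns the connection on exit.
with engine.connect() as conn:
    count = conn.execute(text("SELECT COUNT(*) FROM orders")).scalar_one()
    print(f"orders: {count}")
```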