
Senior AWS Solution Architect Resume


SUMMARY

  • My primary Big Data platform (12 years overall) is AWS (6 years), with extensive Cloudera (4 years) and Azure (4 years) Big Data architecture and design. This spans not only data lake design and development but also a total solution architecture approach to Big Data implementation.
  • 10+ years of experience in Hadoop, both development and architecture, with 6+ years as an architect. My initial work began at Berkeley in 2003 and continued through early Hadoop releases, then with AWS (Amazon Web Services), Cloudera Navigator 2.9, Cloudera CDH 5.7 - 5.10, Impala/Kudu, Cloudera Director, and Hortonworks/Azure.
  • Talend MDM 4 years; ETL processes including scripting 14+ years; ZooKeeper, HMaster, HBase (database, HFiles); Apache Flume (log-file ingest) 2 years; Oozie (workflow scheduling) 3 years; Sqoop (data transfer) 3 years; Python (2.7 and 3.6 with SPSS Statistics 23) 5 years; dev tools such as Spark (performance and caching) 2 years; HBase 4 years; Pig 4 years; analysis with Drill (SQL) 2 years, Hive (HQL) 4 years, Mahout (clustering, classification, collaborative filtering) 6 months; additionally C, C++, and shell scripting 5 years.
  • I have extensive experience with MDM tools and Erwin, as well as PowerDesigner and IBM's ER tool. I have done extensive work on Apache Hadoop, a highly scalable storage platform designed to process very large data sets across hundreds to thousands of computing nodes operating in parallel; Hadoop provides a cost-effective storage solution on commodity hardware for large data volumes with no format requirements. Additionally, extensive work with MapReduce, the programming paradigm that enables this massive scalability and is the heart of Hadoop. The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform (a minimal sketch appears after this list). Hadoop has two main components: HDFS and YARN.
  • I utilized Ansible and Red Hat Ansible Tower to scale automation, manage complex deployments, and speed productivity at the client site for Confidential. I further used them to extend workflow processes, streamline jobs, and share solutions with the Confidential team through simple tooling. With Ansible, we automated away the drudgery of daily tasks, freeing admins to focus on efforts that deliver more value to the business by speeding application delivery and building a culture of success, and giving teams the one thing they can never get enough of: time, so smart people could focus on smart things.
  • I used StreamSets Data Collector (SDC), an open-source, lightweight engine for streaming data in real time. It allowed us to configure data flows as pipelines through a web UI in minutes and, among its many features, to view real-time statistics and inspect data as it passes through the pipeline.
  • 25+ years of experience in IT systems or applications development
  • 15+ years of experience architecting or delivering large scale systems on multiple platforms, with a focus on Big Data Hadoop
  • Talend (4 years) utilized on several projects to simplify and automate big data integration with graphical tools and wizards that generate native code, allowing teams to start working with Apache Hadoop, Apache Spark, Spark Streaming, and NoSQL databases right away. The Talend Big Data Integration platform, part of the Talend Data Fabric solution, was used to deliver high-scale, in-memory data processing so the enterprise systems could bring more data into real-time decisions. It provided fast speed and scale with Spark and Hadoop, allowed anyone to access and cleanse big data while governing its use, and enabled optimization of big data performance in the cloud on several projects.
  • Graphics and statistics implementations with R and RStudio, a free and open-source integrated development environment (IDE) for R.
  • Agile development experience as development team leader.
  • Experience working with network operations center (NOC) administrators who supervise, monitor, and maintain telecommunications networks.
  • Extensive data warehousing experience (Teradata, DB2, SQL Server, MySQL, and Oracle), including building and implementing warehouses.
  • Microsoft Azure cloud technologies (dashboards and performance, with web problem identification and drill-down capability); Hortonworks 2.0; AWS ecosystem; Cloudera CDH 5.7 - 5.10 with Cloudera Manager; Cloudera Navigator 2.9; extensive Java coding experience.
  • Azure Data Factory to manage Batch, HDInsight, and Machine Learning; administration tools such as Apache ZooKeeper, MapReduce, and YARN; maintaining cluster configuration info (Cassandra, ZooKeeper); monitoring cluster heartbeats; Apache Flume (log files), Oozie (workflow scheduling), Sqoop (ingests data from relational DBs), Python, Scala, Java.
  • Dev tools: Spark (performance, with caching), HBase, Pig, shell, MongoDB. Analysis with Drill (SQL), Hive (HQL), Mahout (clustering, classification, collaborative filtering), Tableau (dashboards), and Talend (MDM, mapping, and data lineage).
  • MongoDB: one of the most popular document stores; a document-oriented database. All data in MongoDB is handled in JSON/BSON format. It is a schema-less database that scales to terabytes of data and supports master-slave replication for keeping multiple copies of data across servers, making data integration easier and faster for certain types of applications. MongoDB combines the best of relational databases with the innovations of NoSQL, retaining the most valuable relational features: strong consistency, an expressive query language, and secondary indexes. As a result, developers can build highly functional applications faster than with other NoSQL databases, while MongoDB still provides the data-model flexibility, elastic scalability, and high performance of NoSQL, so engineers can continuously enhance applications and deliver them at almost unlimited scale on commodity hardware, with full index support for high performance. Experienced with MongoDB integrations and migrations.
  • Collaboration as a consultant with Teradata Professional Services
  • Collaboration as a consultant with IBM Professional Services
  • Advanced Analytical solutions - IBM, Teradata, HCL
  • PhD in Business Psychology/Computer Science (Machine Learning and Artificial Intelligence), with excellent verbal and written communication and persuasion skills; able to collaborate and engage effectively with technical and non-technical resources, speaking the language of the business.
  • Proven experience solving complex problems in multi-platform systems environments.
  • Cloud/XaaS solutions
  • Demonstrated comprehensive expert knowledge and exceptional insight into the information technology industry
  • Expertise in application and information architecture / design artifacts and mechanisms
  • Practical experience using common architecture frameworks, including TOGAF and Zachman.
  • Experience with high-level (HL) conceptual models and the development, implementation, and management of enterprise data models, data architecture strategies, delivery roadmaps, information lifecycle management, and data governance capabilities.
  • PhD Psychology with minor in Computer Science
  • Encryption tools such as Protegrity, with an in-depth understanding of security legislation that affects the business, including but not limited to Sarbanes-Oxley, Payment Card Industry regulations, customer data protection regulations, and emerging security legislation that may impact future plans.
  • Significant experience with three or more of the following technologies: Teradata, Tableau, Cognos, Oracle, SAS, Hadoop, Hive, SQL Server, DB2, SSIS, Essbase, Microsoft Analysis Services
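Where a project called for hand-written MapReduce, the two tasks named above (map, then reduce) can be sketched in Python via Hadoop Streaming. A minimal word-count sketch follows; it is illustrative only and not drawn from a specific engagement.

```python
#!/usr/bin/env python3
"""Word count via Hadoop Streaming: run with 'map' or 'reduce' as the first argument."""
import sys
from itertools import groupby

def mapper():
    # Map task: emit (word, 1) for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Reduce task: Hadoop delivers input sorted by key, so group by word and sum counts.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

A job like this would typically be submitted with the hadoop-streaming JAR, passing the script as both the mapper and the reducer.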

TECHNICAL EXPERIENCE:

Languages: shell script, Scala, Python, JavaScript, Java, C++, and many others.

Operating Systems: Linux, BSD Unix variants, Macintosh, OpenVMS

Sign-on: LDAP/OpenLDAP

Linux: Debian, Linux Mint, Ubuntu, Red Hat Enterprise Linux (RHEL), Fedora, CentOS

Desktop GUI design: Java, GTK+/GNOME, Qt/KDE. Custom-tailored Linux kernels for Alpha, PowerPC, Intel

OS configuration: filesystem layouts, packaging systems (Debian)

Version control and build: CVS, Subversion, Git

Web API development / DevOps: Postman, REST, SOAP, JSON-RPC, and Confluence as repository

Parallel APIs: MPI, PVM (from C, FORTRAN, Python)

Threading: Pthreads from C and compiled languages, Python and Ruby threads.

Network protocols: TCP/IP suite (TCP, UDP, ARP, etc.), MIDI

Primary databases: MySQL, PostgreSQL, Teradata, DB2, Oracle (with SQLObject and SQLAlchemy ORMs)

Web Frameworks: Express (ExpressJS), Django, Flask

GUI toolkits: Java/Swing, Tk, GTK/GTK+, Glade, GNOME, PyQt, Qt/KDE, wxWidgets

Amazon Web Services (AWS): EC2, S3, EMR, Redshift, Lambda

PROFESSIONAL EXPERIENCE

Confidential

SENIOR AWS SOLUTION ARCHITECT

Responsibilities:

  • Architected the AWS solution for the EDP/SSUP program, migrating the company's work-request processing from on-prem (Oracle) to the AWS cloud. This included delivery of the three primary service models, IaaS, SaaS, and PaaS, for the new cloud solution architecture.
  • The migration path ingested multiple source entities via ELT using AWS DMS into both S3 (raw data) and Aurora (raw data). Lambda was utilized for data cleansing and data validation, with messaging via AWS SNS for load notification (a minimal sketch of this pattern follows this list).
  • Alation was additionally employed as the data catalog and metadata repository. Performance tuning of Aurora PostgreSQL via index modifications yielded sub-second latencies for the validation queries.
  • APIs were established from AWS Aurora to microservices and Apigee (I obtained an average latency of 435 ms across all APIs). Initial bulk loads were followed by incremental and CDC implementation (I obtained a CDC latency of 5 seconds for table loads).
  • DevOps was established with EC2 instances, ECS/ECR, and Docker (all access authorities set up for the offshore dev team); CI/CD was implemented with a Jenkins pipeline.
  • Established guidelines for S3 buckets, such as naming standards, to support migration paths to production. Developed conceptual, logical, and physical models using Erwin, with Confluence as the project documentation repository. Functioned as the Scrum Master/AWS Solution Architect for a team of 14 offshore developers, with hands-on development, governance, and business analysis.
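A minimal sketch, assuming hypothetical field names, event shape, and SNS topic ARN, of the Lambda cleansing/validation and SNS load-notification pattern described above; it is not the project's actual handler.

```python
import json
import boto3

sns = boto3.client("sns")

# Hypothetical topic ARN and required fields; the real project used its own names.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:load-notifications"
REQUIRED_FIELDS = {"work_request_id", "status", "updated_at"}

def handler(event, context):
    """Validate incoming records and publish a load notification via SNS."""
    records = event.get("records", [])
    valid, rejected = [], []
    for rec in records:
        # Basic cleansing/validation: all required fields present and non-empty.
        if REQUIRED_FIELDS.issubset(rec) and all(rec[f] for f in REQUIRED_FIELDS):
            valid.append(rec)
        else:
            rejected.append(rec)

    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="EDP/SSUP load notification",
        Message=json.dumps({"valid": len(valid), "rejected": len(rejected)}),
    )
    return {"valid": len(valid), "rejected": len(rejected)}
```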

Confidential

AWS Data Architect/Web Architecture

Responsibilities:

  • Worked as the Chief Data Architect at Confidential Cruise Lines. AWS was utilized as the primary platform (VPC), with extensive MongoDB work and Python coding. Provided data governance, workflow diagrams, schema documentation, and best practices as the SME for the client's development team.
  • Confluence was the primary repository for all documentation once deliverables were completed and signed off by me as governance lead. Development was primarily done in Postman (including base documentation) and SOAP. Endpoints defined the data for the landing page, with AEM and MongoDB as input sources (a minimal endpoint sketch follows this list).
  • Managed the DevOps team through development, testing, QA, and final release. Established meetings with client stakeholders for reviews and signoffs on deliverables, worked closely with the Scrum Master to coordinate Agile sprint releases against client expectations, and conducted meetings for user approvals and signoffs.
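A minimal sketch of one landing-page endpoint backed by MongoDB, as described above; Flask is used only for illustration, and the connection string, database, collection, and field names are hypothetical.

```python
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)

# Hypothetical connection string and names.
client = MongoClient("mongodb://localhost:27017")
db = client["landing_page"]

@app.route("/api/landing/<section>")
def landing_section(section):
    """Return the documents that back one section of the landing page."""
    docs = db["content"].find({"section": section}, {"_id": 0})
    return jsonify(list(docs))

if __name__ == "__main__":
    app.run(port=5000)
```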

Confidential

AWS SUPPORT ARCHITECTURE/DESIGN

Responsibilities:

  • Problem resolution, including monitoring and responding to issues across 18 clusters ranging from 7 to 10 nodes: development, POC, stage, UAT, utility, and production clusters. Python coding. Ranger installation and administration for data security across the Hadoop platform.
  • Extensive use of Ranger, a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform and to support a true data lake architecture. Resolved issues such as authorizations to AWS S3 buckets and Hive insert/overwrite problems in production.
  • EMR with Service Catalog deployments; identification of cluster version drift (ensuring versions of Hive UDFs, Red Hat, etc. were at the same level between clusters); documentation of methods to ensure stability between clusters and of all versioning across all clusters (a version-check sketch follows this list). Provided 24/7 support as the only Hadoop admin for Vertex. Worked with AMIs for release verification via a shell script. Completed the project and turned it over to the offshore Chania team.
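A hedged sketch, using boto3, of one way to detect the release-label drift described above across EMR clusters; it stands in for the shell-script approach used on the project, and the region and cluster states shown are illustrative.

```python
import boto3
from collections import defaultdict

emr = boto3.client("emr", region_name="us-east-1")

def release_labels_by_cluster():
    """Map each active EMR cluster name to its release label (e.g. emr-5.29.0)."""
    labels = {}
    paginator = emr.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=["WAITING", "RUNNING"]):
        for summary in page["Clusters"]:
            detail = emr.describe_cluster(ClusterId=summary["Id"])["Cluster"]
            labels[detail["Name"]] = detail.get("ReleaseLabel", "unknown")
    return labels

if __name__ == "__main__":
    by_label = defaultdict(list)
    for name, label in release_labels_by_cluster().items():
        by_label[label].append(name)
    # More than one release label in use means the clusters have drifted apart.
    if len(by_label) > 1:
        print("Version drift detected:", dict(by_label))
    else:
        print("All clusters on", next(iter(by_label), "n/a"))
```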

Confidential

ARCHITECTURE/DESIGN/PROGRAMMING

Responsibilities:

  • Design and architecture of the state's cloud data lake solution in AWS, with Redshift, EMR, and S3 buckets as the primary tools for migration and integration. Also designed the Cloudera data lake infrastructure. The vision with Ranger was to provide comprehensive security services across the Apache Hadoop ecosystem. Python coding.
  • With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture.
  • Installation and administration of both AWS and Cloudera 5.11.1 for production and development clusters.
  • Involvement with audit reporting at the state and support for application programming, including installation and use of Eclipse with associated plugin packages for Scala, Python, Java, and R, plus SQL Server SSIS data work.
  • Audit and Compliance project where I served as architect, designer, analyst, capacity planner, programmer, and production implementer: migrated 100+ terabytes of current and historical state audit and compliance data from Oracle Exadata databases to HDFS for data analysis and reporting. The CDH 5.11.1 production cluster was established with 12 nodes and full services, including analysis of the Teradata Connector in the implementation of Sqoop1 and JDBC drivers.
  • Utilized Copy2Hadoop (Data Pump files, for Oracle data type control) to join the large audit tables and consolidate them into one Hive table, also configured for real-time updates from 12 additional tables.
  • Provided solutions for performance issues and resource utilization. Responsible for system operations, networking, operating systems, and storage in the state's complex network, drawing on strong knowledge of computer hardware and operations.
  • Amazon Redshift cluster management. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. We began with a few hundred gigabytes of data and scaled to petabytes, which enabled us to use the data to acquire new insights and provide better customer service.
  • At the state I provided:
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
  • Cluster maintenance as well as creation and removal of nodes using Ganglia, Nagios, Cloudera Manager Enterprise, Dell OpenManage, and other tools.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screen Hadoop cluster job performances and capacity planning
  • Monitor Hadoop cluster connectivity and security
  • Manage and review Hadoop log files.
  • File system management and monitoring.
  • Ingestion from SQL Server via SSIS.
  • HDFS support and maintenance.
  • Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Point of Contact for Vendor escalation
  • Redshift: 2+ years at Chemical Abstract Services and the State of MN.
  • Utilized Amazon Redshift clusters. A cluster is a set of nodes consisting of a leader node and one or more compute nodes; the type and number of compute nodes depend on the size of your data, the number of queries you execute, and the query execution performance you need (see the load sketch after this list).
  • EMR (Elastic MapReduce): 2+ years.
  • Utilized EMR, a managed Hadoop framework, to make it easy, fast, and cost-effective at Confidential and the State of MN to process vast amounts of data across dynamically scalable Amazon EC2 instances. We ran distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR and interacted with data in other AWS stores such as S3.
  • S3 buckets (Simple Storage Service) at Confidential and the State of MN.
  • Utilized S3 buckets to provide comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements while managing cost, and to run powerful analytics directly on data at rest in S3.
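A minimal sketch of loading data staged in S3 into a Redshift table with a COPY statement via psycopg2, illustrating the Redshift-plus-S3 pattern above; the endpoint, credentials, table, bucket, and IAM role are all hypothetical.

```python
import psycopg2

# Hypothetical endpoint, credentials, table, bucket, and IAM role.
conn = psycopg2.connect(
    host="audit-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="audit",
    user="loader",
    password="********",
)

COPY_SQL = """
    COPY audit.work_requests
    FROM 's3://state-audit-raw/work_requests/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

with conn, conn.cursor() as cur:
    # Bulk-load Parquet files staged in S3 into the Redshift compute nodes.
    cur.execute(COPY_SQL)
```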

Confidential

SENIOR BIG DATA AWS/CLOUDERA ADMIN/SECURITY

Responsibilities:

  • AWS: design and architecture of Confidential's cloud data lake solution in AWS, with some use of Lambda and Amazon S3.
  • Ranger for data security across the Hadoop platform. Utilized Redshift, EMR, and S3 buckets. Cloudera administrator (versions 4 to 5.10) and Kerberos 2.0 security administrator at Confidential, working with a small team of Hadoop administrators.
  • Talend MDM for data lineage and master data management.
  • Extensive SPSS and Impala/Kudu administration; Python coding. I mentored and assisted the team with Cloudera administration and Cloudera Navigator. Cloud installation utilizing Cloudera Director with the AWS provider. Performance and tuning: assisted with establishing the queue architecture through the Fair Scheduler.
  • Tuned MapReduce jobs for enhanced throughput (Java heap adjustments), block size adjustments, and Spark performance adjustments (an illustrative Spark configuration sketch follows this list); ingestion from SQL Server via SSIS.
  • I assisted with the setup and administration of Kerberos to allow trusted, secure communication between trusted entities.
  • Hadoop security with Kerberos and Sentry together: for Hadoop operators in finance, government, healthcare, and other highly regulated industries to enable access to sensitive data under proper compliance, four functional requirements must be met: (1) perimeter security, guarding access to the cluster through network security, firewalls, and ultimately authentication to confirm user identities; (2) data security, protecting the data in the cluster from unauthorized visibility through masking and encryption, both at rest and in transit; (3) access security, defining what authenticated users and applications can do with the data in the cluster through filesystem ACLs and fine-grained authorization; and (4) visibility, reporting on the origins of data and on data usage through centralized auditing and lineage capabilities.
  • Requirements 1 and 2 are addressed through Kerberos authentication, encryption, and masking; Cloudera Navigator supports requirement 4 via centralized auditing for files, records, and metadata; but requirement 3, access security, had been largely unaddressed until Sentry.
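An illustrative PySpark sketch of the kind of executor-memory, parallelism, and caching adjustments mentioned above; the values are placeholders, not the settings used on the engagement.

```python
from pyspark.sql import SparkSession

# Illustrative values only; real settings depend on cluster size and workload.
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.executor.memory", "4g")          # Java heap per executor
    .config("spark.executor.cores", "4")
    .config("spark.sql.shuffle.partitions", "200")  # shuffle parallelism
    .getOrCreate()
)

# Cache a frequently reused DataFrame so repeated queries avoid re-reading source data.
df = spark.range(1_000_000)
df.cache()
print(df.count())
```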

Confidential, Foster City CA

Senior AWS/AZURE Data Lake Architect, Onsite

Responsibilities:

  • Lead architect for the AWS cloud architecture and data model at Confidential, which provided a framework for assessing the Hadoop solution architecture upgrade as a staging-area repository for unstructured, semi-structured, and structured data. Installed and developed a 5-node cluster on both AWS and Hortonworks (Microsoft Azure). Performed data science calculations in SPSS for multivariate linear regression analysis (a regression sketch follows this list).
  • This work was done with the medical model in mind, specifically issues of signal refinement and positive results. "Positive results" are defined as cases where an association is detected between a medical product and an adverse outcome that exceeds a pre-specified threshold in the direction of increased risk.
  • Talend was utilized to simplify and automate big data integration with graphical tools and wizards that generate native code, allowing the teams to start working with Apache Hadoop, Apache Spark, Spark Streaming, and NoSQL databases right away. The Talend Big Data Integration platform, part of the Talend Data Fabric solution, delivered high-scale, in-memory data processing so the project's enterprise systems could bring more data into real-time decisions.
  • Selection bias is a distortion in an effect estimate due to the manner in which the study sample is selected from the source population. To avoid case selection bias, the cases (outcomes) that contributed to a safety signal must represent cases in the source population.
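A minimal Python sketch of a multivariate linear regression comparable to the SPSS analysis noted above; the data is synthetic and the two predictors are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: two hypothetical predictors and a noisy linear response.
X = rng.normal(size=(200, 2))              # e.g. exposure level, patient age
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=200)

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)        # estimated effect of each predictor
print("intercept:", model.intercept_)
print("R^2:", model.score(X, y))
```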

Confidential

AWS BIG DATA, DATA LAKE Analysis/Architecture

Responsibilities:

  • Data lake work included development with AWS, completed on a 9-node clustered data lake architecture. Primarily unstructured and semi-structured data, using Sqoop, MongoDB, Spark (Hive, Python, and Java), Flume, Cloudera Search, and Talend as the MDM repository, with Apache Sentry for Impala and Hive authorization.
  • Lead Hadoop architect for the denormalization project at Optum Corporation, which involved simplifying third-normal-form tables to enhance performance and usability for the end-user business community.
  • Extensive consideration was given to Hadoop as the staging-area repository for ingesting source data, which could then be identified and used for marketing analysis. Also of interest was logging information that might be mined to improve monitoring of issues related to anomalies in the data. Erwin was a primary tool for the denormalization/simplification project.
  • Both logical data models (LDM) and physical data models (PDM) were generated on all platforms, from development through UAT and finally to production. The ALM (Application Lifecycle Management) tool greatly assisted in reporting and tracking project fixes, and Rally allowed tracking and timely reporting of deliverables to the business.
  • Involved the business users at all decision-making and signoff points. Projects were delivered on time and on budget. MongoDB: one of the most popular document stores, a document-oriented database (a brief document-store sketch follows this list).
  • All data in MongoDB is handled in JSON/BSON format. It is a schema-less database that scales to terabytes of data and supports master-slave replication for keeping multiple copies of data across servers, making data integration easier and faster for certain types of applications.
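A brief pymongo sketch of the JSON/BSON document model and secondary indexing described above; the connection string, collection, and fields are hypothetical.

```python
from pymongo import ASCENDING, MongoClient

# Hypothetical connection string and names.
client = MongoClient("mongodb://localhost:27017")
orders = client["claims"]["orders"]

# Documents are stored as JSON/BSON, so nested, schema-less structures are fine.
orders.insert_one({
    "order_id": "A-1001",
    "member": {"id": 42, "state": "MN"},
    "lines": [{"sku": "X1", "qty": 2}, {"sku": "Y9", "qty": 1}],
})

# Secondary index to keep lookups fast as the collection grows.
orders.create_index([("member.id", ASCENDING)])

for doc in orders.find({"member.id": 42}, {"_id": 0, "order_id": 1}):
    print(doc)
```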

Confidential, Newport Beach, CA

Big Data Architect/Senior Modeler

Responsibilities:

  • AWS at Boeing Space and Security, which needed skilled modelers, data architects, and Hadoop/Oracle implementers to transition Oracle and other system-of-record (SOR) data to a data lake. The work included development with Cloudera CDH, completed on a 6-node clustered data lake architecture. Python coding. Ingested unstructured and semi-structured data using Sqoop, HBase, Spark (Hive, Python, and Java), Flume, and the Talend platform in the cloud, with Apache Sentry securing the data lake (a staging sketch follows this list).
  • This implementation required interfacing with end users and business units to migrate data and data attributes to the newly modeled enterprise architecture. The effort involved extensive user interaction to determine the correct mappings for attributes and their datatypes, with metadata passed to the new staging areas and on to the base third-normal-form table architecture.
  • This activity consisted of interfaces, email, and formal meetings to establish the correct lineage of data from initial attribute discovery through the Agile development process to ensure data integrity. When funding at Boeing was cut, the work was concluded and final turnover meetings took place.
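A minimal PySpark sketch of staging semi-structured source data into a Hive table along the lines described above; the paths, columns, and table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sor-staging-sketch")
    .enableHiveSupport()          # write managed tables into the Hive metastore
    .getOrCreate()
)

# Hypothetical landing path for semi-structured JSON extracts.
raw = spark.read.json("hdfs:///landing/sor/work_orders/")

# Light attribute mapping/typing before the data moves on to the 3NF base layer.
staged = raw.selectExpr(
    "cast(order_id as bigint) as order_id",
    "upper(status) as status",
    "cast(updated_at as timestamp) as updated_at",
)

staged.write.mode("overwrite").saveAsTable("staging.work_orders")
```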

Confidential

Senior Teradata/Big Data Architect

Responsibilities:

  • This project utilized AWS and was an extensive evaluation of Hadoop systems and infrastructure at Freescale (FSL), providing detailed evaluation and recommendations for the modeling environment, the current modeling architecture at FSL, and the EDW. Key concerns were the scalability and reliability of daily operations, which were among the most significant requirements, along with data quality (directly from the originating source) and very high performance, achieved with the MapR distribution for Hadoop.
  • Additionally, we investigated Cloudera CDH 5.6 with a POC completed on a 2-node clustered data lake architecture for Freescale. Ingested unstructured and semi-structured data using Sqoop, Spark (Hive, Python, and Java), Flume, and the Talend platform in the cloud.
  • Talend administration for the big data data lake.
  • Python coding.
  • Rather than employ HBase, the authentication system uses MapR-DB, a NoSQL database that supports the HBase API and is part of MapR. It meets strict availability requirements, provides robustness in the face of machine failure, operates across multiple datacenters, and delivers sub-second performance (a brief HBase-API sketch follows this list).
  • Provided extensive executive-level reporting on findings and recommendations. Evaluated additional tools such as PDCR, MDS, MDM, AntanaSuite, Appfluent, and Hadoop 14.10 functions and features, and migrated from Erwin and ModelMart v7.2 to 8.2 and finally to v9.5. I functioned as the lead consultant for the 6-month effort at FSL, assuming responsibility for delivery and for status reporting to executive meetings on all aspects of the project.
  • Provided numerous PowerPoint presentations, including delivery of the "scorecard" evaluation of the "as is" ongoing modeling, DBA, and support activities at FSL. Identified areas to improve, especially in modeling, and assisted with the BI semantic layer performance tuning effort and the MDS glossary deliverable for metadata. Designed and assisted with development of the executive dashboard reporting process.
  • Recommended and provided information regarding three new primary tools at FSL: Appfluent, AntanaSuite, and MDS/MDM (Hadoop). These tools were recommended as part of the agile improvement process to increase productivity and ROI, estimated to yield a 73% overall realized benefit.
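A brief sketch of reading and writing through the HBase API, which MapR-DB also exposes, using the happybase Python client; the Thrift gateway host, table, and column names are hypothetical.

```python
import happybase

# Hypothetical Thrift gateway host and table; MapR-DB exposes the same HBase API.
connection = happybase.Connection(host="hbase-thrift.example.com", port=9090)
table = connection.table("auth_events")

# Row key and column family/qualifiers are illustrative.
table.put(b"user42|2020-01-15T10:00:00", {
    b"event:type": b"login",
    b"event:result": b"success",
})

# Point read by row key; sub-second lookups are the goal described above.
row = table.row(b"user42|2020-01-15T10:00:00")
print({k.decode(): v.decode() for k, v in row.items()})

connection.close()
```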
