
Senior AWS & Azure Solution Architect Resume


SUMMARY:

  • 8+ years of data architecture/admin experience
  • 3+ years of experience working with AWS DMS, S3, Glue, Kinesis, Lambda, Athena, EMR, SageMaker, Redshift, and RDS (a minimal boto3 sketch of this toolchain follows this list)
  • 5+ years on Big Data (Hadoop, Spark, Java, Scala, Python)
  • 7+ years on metadata management, data governance, data quality, and security
  • 2+ years of Python development experience
  • 3+ years of Redshift experience, with a strong understanding of relational data models
  • 3+ years of experience working with big data architectures in high-volume environments
  • Extensive experience building and managing ETL pipelines on cloud-based platforms from inception to production rollout
  • Utilized Ansible and Red Hat Ansible Tower to scale automation, manage complex deployments, and improve productivity at the client site for CAS. Extended workflow processes to streamline jobs and provided simple tools for sharing solutions with the CAS team. With Ansible, we automated away the drudgery of daily administrative tasks, freeing admins to focus on efforts that deliver more value to the business by speeding application delivery and building a culture of success, and giving teams the one thing they can never get enough of: time to focus smart people on smart things.
  • Used StreamSets Data Collector (SDC), an open-source, lightweight engine for streaming data in real time. It allowed us to configure data flows as pipelines through a web UI in minutes and, among its many features, to view real-time statistics and inspect data as it passes through the pipeline.
  • 25+ years of experience in IT systems or applications development
  • 15+ years of experience architecting or delivering large scale systems on multiple platforms, with a focus on Big Data Hadoop
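
The AWS data services listed above are typically driven through boto3. Below is a minimal, hedged sketch of that toolchain (an S3 landing step plus an Athena query); the bucket, Glue database, table, and output location are illustrative assumptions, not actual project names.

    # Hedged sketch of the S3 + Athena portion of the AWS toolchain listed above.
    import boto3

    s3 = boto3.client("s3")
    athena = boto3.client("athena")

    # Land a raw file in S3, the usual first stop for DMS/Glue pipelines.
    s3.upload_file("orders.csv", "example-raw-bucket", "landing/orders.csv")  # assumed bucket

    # Run a query against the cataloged data with Athena.
    run = athena.start_query_execution(
        QueryString="SELECT count(*) FROM orders",            # assumed table
        QueryExecutionContext={"Database": "analytics"},      # assumed Glue database
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    print("Athena query started:", run["QueryExecutionId"])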

TECHNICAL EXPERIENCE:

Languages: Scala, Python, JavaScript, Java, C++, scripting languages, and many others

Operating Systems: Linux, BSD Unix variants, Macintosh, OpenVMS

Sign-on: LDAP/OpenLDAP

Linux: Debian, Linux Mint, Ubuntu, Red Hat RHEL, Fedora, CentOS

Desktop GUI design: Java, GTK+/GNOME, Qt/KDE

Linux kernels: Custom-tailored kernels for Alpha, PowerPC, Intel

OS configuration: Filesystem layouts and packaging systems (Debian)

Version control and build: CVS, Subversion, Git

Web API development and DevOps: Postman, REST, SOAP, JSON-RPC; Confluence as documentation repository

Parallel APIs: MPI, PVM (from C, FORTRAN, Python)

Threading: Pthreads from C and compiled languages, Python and Ruby threads.

Network protocols: TCP/IP suite (TCP, UDP, ARP, etc.), MIDI

Primary Databases: MySQL, PostgreSQL, Teradata, DB2, Oracle; ORM layers: SQLObject, SQLAlchemy

Web Frameworks: Express (ExpressJS), Django, Flask

GUI Toolkits: Java/Swing, Tk, GTK, GTK+, Glade, GNOME, PyQt, Qt/KDE, Wx

Amazon Web Services (AWS): EC2

PROFESSIONAL EXPERIENCE:

Confidential

SENIOR AWS & AZURE SOLUTION ARCHITECT

Responsibilities:

  • Responsible for architecting the AWS solution for the EDP/SSUP initiative, migrating the company's work request processing from on-prem Oracle to the AWS Cloud.
  • This included delivery of the three primary service layers (IaaS, PaaS, and SaaS) for the new cloud solution architecture.
  • The migration architecture included multiple source entities for ELT ingestion via AWS DMS to both S3 (raw data) and Aurora (raw data). Lambda was utilized for data cleansing and data validation, with messaging via AWS SNS for load notification (a minimal sketch of this pattern follows this list).
  • Alation was additionally employed as the data catalog and metadata repository. Performance tuning of Aurora PostgreSQL via index modifications yielded sub-second latencies for the validation queries. APIs were established from AWS Aurora to microservices and Apigee (achieving an average latency of 435 ms across all APIs).
  • Implemented initial bulk loads followed by incremental and CDC replication (achieving a CDC latency of roughly 5 seconds for table loads).
  • DevOps was established with EC2 instances, ECS/ECR, and Docker (all access authorities set up for the offshore dev team); CI/CD was implemented with a Jenkins pipeline. Guidelines, such as naming standards for S3 buckets, were established to support migration paths to production.
  • Developed the conceptual, logical, and physical models using Erwin. Used Confluence as the documentation repository for the project. Functioned as the Scrum Master/AWS Solution Architect for a team of 14 developers, largely offshore. Hands-on development, governance, and business analysis.
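
The DMS-to-S3/Aurora flow above used Lambda for cleansing/validation and SNS for load notifications. Below is a minimal, hypothetical sketch of that pattern: an S3-triggered Lambda that runs a placeholder validation and publishes a notification. The topic ARN, bucket structure, and validation rule are assumptions, not the actual EDP/SSUP implementation.

    # Hypothetical S3-triggered validation Lambda with an SNS load notification.
    import json
    import boto3

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:load-notifications"  # assumed ARN

    def handler(event, context):
        """Validate each object landed in the raw bucket and publish a load notification."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            # Placeholder for the real cleansing/validation rules.
            valid = not key.endswith(".tmp")

            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="Load notification",
                Message=json.dumps({"bucket": bucket, "key": key, "valid": valid}),
            )
        return {"statusCode": 200}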

Confidential

AWS & AZURE Solution Architect - Web Architecture

Responsibilities:

  • Worked as the Chief Data Architect at Confidential Cruise Lines (RCCL). AWS was utilized as the primary platform (VPC) with extensive MongoDB; Python coding. Provided data governance, workflow diagrams, schema documentation, and best practices as the SME for the development team at RCCL. Confluence was the primary repository for all documentation once deliverables were completed and signed off by me as governance lead. Development was primarily done in Postman (including base documentation) and SOAP; endpoints defined the data for the landing page. Sources of input included AEM and MongoDB. Managed the DevOps team through development, testing, QA, and final release. Established meetings with stakeholders at RCCL for reviews and signoffs on deliverables. Worked closely with the Scrum Master to coordinate Agile sprint releases and RCCL expectations for our deliverables. Conducted meetings for user approvals and signoffs. (A minimal endpoint-check sketch follows this section.)
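
Endpoint work at RCCL was done primarily in Postman; the sketch below shows the same kind of landing-page endpoint check expressed in Python with the requests library. The URL and expected field names are illustrative assumptions, not RCCL's actual API.

    # Hedged sketch of a landing-page endpoint check (Postman-style, in Python).
    import requests

    LANDING_PAGE_API = "https://api.example.com/v1/landing-page"  # assumed endpoint

    def check_landing_page_endpoint():
        """Call the landing-page endpoint and verify the fields the page expects."""
        resp = requests.get(LANDING_PAGE_API, timeout=5)
        resp.raise_for_status()
        payload = resp.json()

        # Confirm content sourced from AEM/MongoDB is present (assumed field names).
        for field in ("title", "heroImage", "offers"):
            if field not in payload:
                raise ValueError(f"Missing expected field: {field}")
        print("Endpoint OK, latency:", resp.elapsed.total_seconds(), "s")

    if __name__ == "__main__":
        check_landing_page_endpoint()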

Confidential

SOLUTION ARCHITECTURE & DESIGN

Responsibilities:

  • Problem resolution, including monitoring and responding to cluster issues across 18 clusters ranging from 7 to 10 nodes.
  • Python coding. These included development, POC, stage, UAT, utility, and production clusters. Ranger: installation and administration for data security across the Hadoop platform. Ranger can support a true data lake architecture.
  • Extensive use of Ranger, a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Resolution of issues such as authorization to AWS S3 buckets and Hive insert/overwrite issues in production. EMR with Service Catalog deployments; identification of cluster version drift, ensuring versions of Hive UDFs, Red Hat, etc. were at the same level between clusters (a minimal drift-check sketch follows this list).
  • Produced documentation suggesting methods to ensure stability between clusters and identifying all versioning across all clusters. Provided 24/7 support as the only Hadoop admin for Confidential. Worked with AMIs for release verification via a shell script (sample script available). Completed the project and turned it over to the offshore Chania team.
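
Cluster version drift was identified by comparing component versions across clusters. The sketch below is a hypothetical Python version of that check; the edge-node host names and component commands are assumptions, not the actual environment.

    # Hedged sketch of a cluster version-drift check across several clusters.
    import subprocess

    CLUSTERS = {"dev": "dev-edge01", "uat": "uat-edge01", "prod": "prod-edge01"}  # assumed hosts
    COMMANDS = {
        "hive": "hive --version | head -1",
        "redhat": "cat /etc/redhat-release",
    }

    def collect_versions():
        """SSH to one edge node per cluster and record component version strings."""
        report = {}
        for cluster, host in CLUSTERS.items():
            report[cluster] = {}
            for component, cmd in COMMANDS.items():
                out = subprocess.run(["ssh", host, cmd],
                                     capture_output=True, text=True, check=False)
                report[cluster][component] = out.stdout.strip() or out.stderr.strip()
        return report

    def find_drift(report):
        """Flag any component whose version string differs between clusters."""
        return {
            component: {c: report[c][component] for c in report}
            for component in COMMANDS
            if len({report[c][component] for c in report}) > 1
        }

    if __name__ == "__main__":
        print(find_drift(collect_versions()))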

Confidential

SOLUTION ARCHITECTURE, DESIGN, PROGRAMMING

Responsibilities:

  • Audit & Compliance project where I served as architect, designer, analyst, capacity planner, programmer, and production implementer. Moved 100+ terabytes of current and historical state audit and compliance data from Oracle Exadata databases to HDFS for data analysis and reporting. The CDH 5.11.1 production cluster was established with 12 nodes and full services. This included analysis of the Teradata Connector in the implementation of Sqoop1 and JDBC drivers. Utilized Copy2Hadoop (Data Pump files, for Oracle data type control) to join the large audit tables and consolidate them into one Hive table. Also configured real-time updates from 12 additional tables.
  • Provided solutions for performance issues and resource utilization. Responsible for system operations, networking, operating systems, and storage, drawing on strong knowledge of computer hardware and operations within the state's complex network.
  • Amazon Redshift cluster management. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. We started with a few hundred gigabytes of data and scaled to petabytes, enabling us to use the data to acquire new insights and provide better customer service.
  • At the state I provided:
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Working with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
  • Cluster maintenance as well as creation and removal of nodes using Ganglia, Nagios, Cloudera Manager Enterprise, Dell OpenManage, and other tools.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screening Hadoop cluster job performance and capacity planning.
  • Monitoring Hadoop cluster connectivity and security.
  • Managing and reviewing Hadoop log files.
  • File system management and monitoring.
  • Ingesting data from SQL Server via SSIS.
  • HDFS support and maintenance.
  • Diligently teaming with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
  • Point of Contact for Vendor escalation
  • Redshift: 2+ years at Chemical Abstract Services (CAS) and the State of MN.
  • Utilized an Amazon Redshift cluster, a set of nodes consisting of a leader node and one or more compute nodes. The type and number of compute nodes needed depends on the size of your data, the number of queries you execute, and the query execution performance you need.
  • EMR (Elastic MapReduce): 2+ years.
  • Utilized EMR, a managed Hadoop framework that made it easy, fast, and cost-effective at CAS and the State of MN to process vast amounts of data across dynamically scalable Amazon EC2 instances. We ran distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR and interacted with data in other AWS stores such as S3 (a minimal EMR launch sketch follows this list).
  • S3 buckets (Simple Storage Service) at CAS and the State of MN.
  • Utilized S3 buckets to provide comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements while managing cost. This allowed us to run powerful analytics directly on data at rest in S3.
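
EMR clusters like the ones described above can be provisioned programmatically. Below is a minimal boto3 sketch of launching an EMR cluster with Spark; the cluster name, release label, instance types, log bucket, and region are illustrative assumptions, not the CAS or State of MN configuration.

    # Hedged sketch of launching an EMR cluster with Spark via boto3.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")  # assumed region

    response = emr.run_job_flow(
        Name="spark-processing-cluster",            # assumed cluster name
        ReleaseLabel="emr-5.30.0",                  # assumed EMR release
        Applications=[{"Name": "Spark"}, {"Name": "Presto"}],
        LogUri="s3://example-emr-logs/",            # assumed log bucket
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 3},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print("Started cluster:", response["JobFlowId"])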

Confidential, Onsite Columbus, OH

SOLUTION ARCHITECTURE

Responsibilities:

  • AWS: Design and architecture of CAS's cloud data lake solution in AWS.
  • Use of AWS with some Lambda and Amazon S3
  • Ranger: data security across the Hadoop platform.
  • Utilized Redshift, EMR and S3 buckets
  • Cloudera administrator (versions 4 through 5.10) and Kerberos 2.0 security administrator at CAS, working with a small team of Hadoop administrators.
  • Talend MDM for data lineage and master data management.
  • Extensive SPSS
  • Impala/Kudu administration.
  • Python coding.
  • Mentored and assisted the team with Cloudera administration and Cloudera Navigator.
  • Cloud installation utilizing Cloudera Director with AWS provider.
  • Performance and Tuning:
  • Assisted with establishment of queue architecture through the Fair Scheduler.
  • Tuning MapReduce jobs for enhanced throughput (Java heap adjustments)
  • Block size adjustments
  • Spark performance adjustments (a minimal tuning sketch follows this list)
  • Ingesting data from SQL Server via SSIS
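
The Spark performance adjustments above typically amount to sizing executors, shuffle partitions, and scheduler queues for the workload. The sketch below shows the shape of such tuning in PySpark; the property values, queue name, and HDFS paths are illustrative assumptions, not the actual CAS settings.

    # Hedged sketch of Spark tuning settings applied when building a session.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("tuned-etl-job")                       # assumed job name
        .config("spark.executor.memory", "8g")          # executor heap adjustment (assumed)
        .config("spark.executor.cores", "4")
        .config("spark.sql.shuffle.partitions", "400")  # sized to data volume (assumed)
        .config("spark.yarn.queue", "etl")              # Fair Scheduler queue (assumed name)
        .getOrCreate()
    )

    # Example job: read staged data and rewrite it with fewer, larger files.
    df = spark.read.parquet("hdfs:///data/staging/events")  # assumed path
    df.repartition(64).write.mode("overwrite").parquet("hdfs:///data/curated/events")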

Confidential, Foster City CA

SOLUTION ARCHITECTURE

Responsibilities:

  • Lead Architect for the AWS cloud architecture and data model at Confidential, which provided a framework to assess the Hadoop solution architecture upgrade as a staging-area repository for unstructured, semi-structured, and structured data. Installation and a 5-node cluster developed on both AWS and Hortonworks (Microsoft Azure). SPSS data-science calculations for multivariate linear regression analysis (a minimal Python equivalent is sketched after this list). This work was done with the primary target in mind and with regard to the medical model, which has been specifically concerned with issues of signal refinement for positive results. "Positive results" are defined as cases where an association is detected between a medical product and an adverse outcome that exceeds a pre-specified threshold in the direction of increased risk.
  • Talend was utilized on several projects to simplify and automate big data integration with graphical tools and wizards that generate native code. This allowed the teams to start working with Apache Hadoop, Apache Spark, Spark Streaming, and NoSQL databases right away. The Talend Big Data Integration platform was utilized to deliver high-scale, in-memory data processing as part of the Talend Data Fabric solution, allowing the project's enterprise systems to bring more data into real-time decisions.
  • Selection bias is a distortion in an effect estimate due to the manner in which the study sample is selected from the source population. To avoid case selection bias, the cases (outcomes) that contributed to a safety signal must represent cases in the source population.
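
The multivariate linear regression above was run in SPSS; the sketch below shows an equivalent calculation in Python with statsmodels. The file name, outcome column, and predictor columns are hypothetical placeholders, not the study's actual variables.

    # Hedged sketch of a multivariate (multiple) linear regression in Python.
    import pandas as pd
    import statsmodels.api as sm

    # Assumed dataset: one row per case, with an outcome measure and candidate predictors.
    df = pd.read_csv("signal_refinement_cases.csv")  # assumed file

    y = df["adverse_outcome_rate"]                                           # assumed outcome
    X = sm.add_constant(df[["exposure_dose", "age", "comorbidity_index"]])   # assumed predictors

    model = sm.OLS(y, X).fit()
    print(model.summary())  # coefficients, p-values, and fit statistics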

Confidential

AWS & AZURE DATA LAKE Analysis/Architecture

Responsibilities:

  • Data lake work included development with AWS, completed on a 9-node clustered data lake architecture. Primarily unstructured and semi-structured data, with utilization of Sqoop, MongoDB, Spark (Hive, Python & Java), Flume, Cloudera Search, Talend as the MDM repository, and Apache Sentry for authorization of Impala and Hive access.
  • Lead Hadoop Architect for the de-normalization project at Optum Corporation, which involved the simplification of 3rd-normal-form tables to enhance performance and usability for the end-user business community. Extensive consideration was given to Hadoop as the staging-area repository for ingesting source data, the thought being that this data could then be identified and used for marketing analysis.
  • Also of interest was logging information that might be mined to improve monitoring of issues related to anomalies in the data. Erwin was a primary tool used for the de-normalization/simplification project. Both Logical Data Models (LDM) and Physical Data Models (PDM) were generated on all platforms, through development, to UAT, and finally to production. Use of the ALM (Application Lifecycle Management) tool greatly assisted in reporting and tracking project fixes as required, and the Rally tool allowed for tracking and timely reporting of deliverables to the business. Involved the business users at all points of decision-making and signoff processes. Projects were delivered on time and on budget. MongoDB:
  • One of the most popular document stores; a document-oriented database. All data in MongoDB is handled in JSON/BSON format. It is a schema-less database that scales to terabytes of data, and it supports master-slave replication for making multiple copies of data across servers, which makes integrating data into certain types of applications easier and faster. (A minimal PyMongo sketch follows this list.)
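
A minimal PyMongo sketch of the document-store usage described above follows. The connection string, database, collection, and field names are illustrative assumptions.

    # Hedged sketch of writing and querying JSON/BSON documents in MongoDB.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed connection string
    collection = client["datalake"]["staged_events"]   # assumed database/collection

    # Documents are plain JSON/BSON; no fixed schema is required up front.
    collection.insert_one({
        "source": "flume",
        "event_type": "page_view",
        "payload": {"member_id": "M12345", "url": "/claims/search"},  # assumed fields
    })

    # Query by any attribute, including nested ones.
    for doc in collection.find({"event_type": "page_view"}).limit(5):
        print(doc["payload"]["url"])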

Confidential

Solution Architect, Senior Modeler

Responsibilities:

  • AWS at Boeing Space and Security. They needed skilled modelers, data architects, and Hadoop/Oracle implementers to transition systems from Oracle and other System of Record (SOR) data to a data lake. The work included development with Cloudera CDH, completed on a 6-node clustered data lake architecture. Python coding. Ingested unstructured and semi-structured data using Sqoop, HBase, Spark (Hive, Python & Java), Flume, and the Talend platform in the cloud (a minimal Spark ingestion sketch follows this list). Security for the data lake was provided via Apache Sentry. This implementation required interfacing with end users and business units to migrate data and data attributes to the newly modeled enterprise architecture. The effort involved extensive user interaction to determine the correct mappings for attributes and their datatypes, with metadata information passed to the new staging areas and on to the base 3rd-normal-form table architecture.
  • This activity consisted of interfaces, email, and formal meetings to establish the correct lineage of data from its initial attribute-discovery level on through the Agile development process to ensure data integrity. As funding "went south" at Boeing, the work was concluded and final turnover meetings took place.
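
Ingestion from Oracle SOR tables into the data lake can be expressed either in Sqoop or directly in Spark. Below is a minimal PySpark sketch of the Spark-based path; the JDBC URL, credentials, table names, and Hive staging table are illustrative assumptions, not Boeing's actual systems.

    # Hedged sketch of pulling an Oracle System-of-Record table into a Hive staging table.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sor-oracle-ingest")   # assumed job name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read the source table over JDBC (the Oracle driver must be on the classpath).
    src = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")  # assumed URL
        .option("dbtable", "SOR.WORK_ORDERS")                           # assumed table
        .option("user", "etl_user")                                     # assumed credentials
        .option("password", "***")
        .option("driver", "oracle.jdbc.OracleDriver")
        .load()
    )

    # Land the data in a Hive staging table for downstream modeling.
    src.write.mode("overwrite").saveAsTable("staging.work_orders")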

Confidential

Solution Architecture

Responsibilities:

  • This project utilized AWS and was an extensive evaluation of Hadoop systems and infrastructure at Freescale (FSL), providing detailed evaluation and recommendations for the modeling environment, the current modeling architecture at FSL, and the EDW. Of concern on this project were the scalability and reliability of daily operations; these were among the most significant requirements, along with data quality (taken directly from the originating source) and the capability for very high performance, which is accomplished with the MapR distribution for Hadoop.
  • Additionally, we investigated Cloudera CDH 5.6, with a POC completed on a 2-node clustered data lake architecture for Freescale. Ingested unstructured and semi-structured data using Sqoop, Spark (Hive, Python & Java), Flume, and the Talend platform in the cloud.
  • Talend administration for the big data data lake.
  • Python coding.
  • Rather than employ HBase, the authentication system uses MapR-DB, a NoSQL database that supports the HBase API and is part of MapR. It meets strict availability requirements, provides robustness in the face of machine failure, operates across multiple datacenters, and delivers sub-second performance (a minimal HBase-API sketch follows this list).
  • Provided extensive executive-level reporting regarding findings and recommendations. Implemented and evaluated additional tools such as PDCR, MDS, MDM, AntanaSuite, Appfluent, and Hadoop 14.10 functions and features, and migrated from Erwin & ModelMart v7.2 to v8.2 and then finally to v9.5. I functioned as the lead consultant for the 6-month effort at FSL, assuming responsibility for delivery and executive status reporting on all aspects of the project.
  • Provided numerous PowerPoint presentations, including delivery of the "scorecard" evaluation of the "as is" ongoing modeling, DBA, and support activities at FSL. Identified areas to improve upon, especially in the modeling area, and rendered assistance with the BI semantic layer performance-tuning effort and the MDS glossary deliverable for metadata. Designed and assisted with the development of the executive dashboard reporting process.
  • Recommended and provided information on three new primary tools at FSL: Appfluent, AntanaSuite, and MDS/MDM (Hadoop). These tools were recommended as part of the agile improvement process to increase productivity and ROI, estimated at yielding a 73% overall realized benefit.
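
Because MapR-DB exposes an HBase-compatible API, it can be driven from Python through an HBase client such as happybase (via a Thrift gateway). The sketch below is a hypothetical illustration only; the gateway host, table, row key, and column names are assumptions and not the actual FSL authentication system.

    # Hedged sketch of reading/writing MapR-DB through its HBase-compatible API.
    import happybase

    connection = happybase.Connection("maprdb-gateway-host")  # assumed Thrift gateway host
    table = connection.table("auth_sessions")                 # assumed table name

    # Write a row: row key plus column-family:qualifier values (as bytes).
    table.put(b"user:1001", {
        b"cf:last_login": b"2016-08-01T12:34:56Z",
        b"cf:datacenter": b"dc-east",
    })

    # Point read by row key; sub-second on a healthy cluster.
    row = table.row(b"user:1001")
    print(row.get(b"cf:last_login"))

    connection.close()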
