Sr. Big Data Architect Resume
Seattle, WA
SUMMARY:
Enterprise IT Professional with 16+ years of industry experience ~ Collaborative data engineer with extensive hands-on experience developing and deploying robust, scalable cloud-based and on-premises data analytics systems involving large-scale CDI, MDM, MPP, data integration/migration and storage, data warehousing, real-time data analytics, and reporting. With sound business and technical acumen, able to research, explore, and leverage Big Data technologies (HDFS, Hadoop or Spark clusters, NLP/text mining, machine learning) when existing enterprise frameworks are not sufficient to effectively process and analyze terabytes of structured and unstructured data. Excellence in deriving value from data using primarily open-source technologies such as Apache Spark, Kafka, and Python, combined with a passion for sharing knowledge, has led me to speak at events such as the Spark + AI Summit and to lead workshops around the country for organizations like PyData.
TECHNICAL SKILLS
- Expertise in developing OLTP, OLAP, and Data Warehouse systems (using 3NF, multidimensional Star, and Snowflake schema designs), NoSQL databases, MDM and Data Governance, Data Mining, and statistical and historical/predictive Data Analytics solutions.
- Strong Data modeling (Logical and Physical) experience and high proficiency in using Erwin, ER/Studio, and Entity-Relationship Modeling with in-depth understanding of business applications, dataflow and the use of stored procedures/triggers in ETL development tasks.
- Strong understanding of data structure/model classifications, data wrangling and cleansing.
- Extensive hands-on experience in APIs, Docker, Cloud, and Virtualization technologies.
- Have hands-on experience in implementing Big Data solutions using Hadoop/Storm and Spark; loading data using Flume and Sqoop; and analyzing big data using Python, Pig, and the Impala/Hive SQL editor in Hue.
- Have hands-on experience in SaaS, PaaS, IaaS, DaaS, and the SDLC (Agile/Scrum framework: sprint planning, daily Scrum meetings, product/sprint backlog, sprint review, and sprint retrospective).
- Have strong communication and presentation skills, with experience in translating complex business requirements into detailed functional and/or technical specifications.
PROFESSIONAL EXPERIENCE
Confidential Seattle, WA
Sr. Big Data Architect
- Leverage AWS, Informatica Cloud, Snowflake Data Warehouse, the HashiCorp platform, AutoSys, and Rally Agile/Scrum to implement Data Lake, Enterprise Data Warehouse, and advanced data analytics solutions based on data collection and integration from multiple sources (Salesforce, Salesconnect, S3, SQL Server, Oracle, NoSQL, and Mainframe systems).
- Lead the architectural design and development of highly scalable and optimized data models, Data Marts, Snowflake Data Warehouse, Data Lineage, and Metadata repository, using Jenkins, Vagrant, Vault, GitHub/Git Bash Enterprise, and Terraform as IaC to provision cloud infrastructure and security.
- Implement AWS Data Lake leveraging S3, Terraform, Vagrant/Vault, EC2, Lambda, VPC, and IAM to perform data processing and storage, while writing complex SQL queries with analytical and aggregate functions on views in the Snowflake data warehouse to develop near real-time visualizations using Tableau Desktop/Server 10.4 and Alteryx.
- Perform data masking and ETL processes using S3, Informatica Cloud, Informatica PowerCenter, and Informatica Test Data Management to support the Snowflake data warehousing solution in the cloud.
- Query large data sets on the Snowflake data warehouse and distributed file systems using SQL/Spark SQL, and manage JSON, XML, Parquet, and CSV data formats for ingestion into the Snowflake Data Warehouse before performing advanced data analytics/visualization using Tableau (see the query sketch after this list).
- Provide guidance and support in data masking/mapping tasks, data profiling, data quality checks, data standardization and metadata capture using Informatica Data Quality and Test Data Management.
- Use Informatica Designer, Workflow Manager, and Repository Manager to create source and target definitions, design mappings, and create repositories for the Test Data Management process.
- Provide guidance and support in sharing data between legacy data sources and cloud-based enterprise relational/NoSQL systems.
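
The sketch below illustrates the kind of Snowflake query pattern referenced above (analytical and aggregate functions over a view), using the snowflake-connector-python package. It is a minimal, hypothetical example: the account, credentials, warehouse/database/schema names, and the SALES_ORDERS_V view are placeholders, not details from the engagement.

```python
# Minimal sketch: querying a Snowflake view with aggregate and window functions
# via snowflake-connector-python. All connection parameters and object names
# below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-west-2",   # hypothetical account locator
    user="ANALYTICS_SVC",
    password="***",
    warehouse="ANALYTICS_WH",
    database="EDW",
    schema="SALES",
)

query = """
    SELECT region,
           order_month,
           SUM(order_amount)                                 AS monthly_sales,
           AVG(SUM(order_amount)) OVER (PARTITION BY region) AS avg_monthly_sales,
           RANK() OVER (PARTITION BY region
                        ORDER BY SUM(order_amount) DESC)     AS month_rank
    FROM SALES_ORDERS_V          -- hypothetical view
    GROUP BY region, order_month
    ORDER BY region, month_rank
"""

try:
    # The cursor is iterable; each row comes back as a tuple.
    for region, month, sales, avg_sales, rank in conn.cursor().execute(query):
        print(region, month, sales, avg_sales, rank)
finally:
    conn.close()
```

Results like these can then be exposed to Tableau either directly through Snowflake's Tableau connector or via an extract refreshed on a schedule.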
Confidential Seattle, WA
Big Data Architect
- Leveraged Microsoft Azure, Amazon Web Services, and Rally Agile/Scrum in developing prototypes to build Data Lake and real-time big data analytics solutions based on data streaming and data feeds from a broad range of customer-facing devices, web/mobile applications, and back-end Java-based systems; analyzed and integrated data streams in motion with back-office systems, processed big data using Spark and Hadoop (HDInsight/EMR) clusters, and developed machine-learning solutions in Azure Machine Learning (AML) Studio and Spark MLlib.
- Served as lead architect in prototyping, deploying, and implementing a Spark data lake cluster to support big data processing, leveraging AWS services, Kafka, NiFi, Ambari, Zeppelin, HDFS, Elasticsearch/Kibana, Hive, a metadata repository platform (Apache Atlas), and Spark components (Spark SQL, Spark Streaming, Spark ML) in a Test-Driven Development (TDD), Continuous Integration (CI), and Continuous Delivery (CD) environment (see the streaming-ingestion sketch after this list).
- Led the architectural design and development of highly scalable and optimized Mobile solutions and Clinical Workflow Applications including management of on-site, off-site and off-shore team members and developers of Python/Java/C# applications using Jenkins and Docker.
- Served as lead architect in building enterprise big data solutions from multiple data sources comprising 12 data repositories and 6 databases, using S3, EMR, Data Pipeline, EC2, and Redshift.
- Queried large data sets on HDFS/distributed file systems using HiveQL/Spark SQL, and managed JSON, XML, Parquet, and CSV data conversion, integration, transfer, and ingestion into Elasticsearch, HBase, MongoDB, and Cassandra using Kafka/Sqoop and across Oracle 11g, MySQL, and SQL Server 2012, to perform real-time data analytics using Power BI and Kibana.
- Used AML service and Cortana Analytics to create and deploy cloud-based predictive analytics models that learn from existing/current data in order to forecast future outcomes and trends.
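
The sketch below shows the general Kafka-to-Spark Structured Streaming ingestion pattern referenced above. It is a minimal, hypothetical example: the broker address, the "device-events" topic, the JSON field names, and the S3 paths are placeholders, and it lands windowed counts in Parquet (rather than Elasticsearch) to keep the example self-contained; it also assumes the spark-sql-kafka package is on the classpath.

```python
# Minimal sketch: ingest JSON device events from Kafka with Spark Structured
# Streaming and write 5-minute windowed counts to Parquet. Broker, topic,
# schema fields, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("device-event-ingest").getOrCreate()

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_type", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw Kafka stream; the message value arrives as bytes.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "device-events")                # hypothetical topic
       .load())

# Parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# 5-minute tumbling-window counts per device, tolerating 10 minutes of late data.
counts = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(window(col("event_time"), "5 minutes"), col("device_id"))
          .count())

query = (counts.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "s3a://example-bucket/device-event-counts/")            # hypothetical path
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/counts/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```

In an Elasticsearch-backed deployment, the Parquet sink would typically be swapped for the ES-Hadoop Spark connector so the windowed results become immediately queryable from Kibana.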
Confidential Seattle & Bay Area
Financial/Data Analyst
- Leveraged a Global Banking application to analyze the risk factors involved in a loan and the likelihood of default, helping the credit administrator make a reasonable decision on approving the loan, determining the interest rate, or denying the loan altogether.
- Collected loan applicants' information, analyzed applicants' credit history, and used statistics to assess the risks associated with lending.
- Reviewed and synthesized large amounts of financial data from Core Banking and Financial Banking System Applications to produce reports for management and decision-makers.
- Performed data extraction and manipulation using Microsoft SQL Server Management Studio 2005/2008 and Microsoft Excel, creating and manipulating spreadsheets with intensive use of financial and statistical functions, VLOOKUP, pivot tables, charts, and graphs.
- Played a lead role in re-engineering SQL Server 2008 OLTP/OLAP operations into Microsoft Access, using linked tables, complex queries, and macros to automate database tasks; imported and exported data and used various tools to manipulate and present data in a comprehensible format.
- Used MS Expression Builder and functions such as string functions, mathematical functions, date functions, logical functions, aggregate and group by functions to extract and manipulate data.
- Created new MS Access databases, made major modifications to existing databases and their user-interface forms, and corrected data integrity and constraint issues.