Sr Big Data Architect Resume
Austin, TX
SUMMARY
- Enterprise information management professional with expertise in leading enterprise data analytics initiatives and building data capabilities.
- Proficient in defining and architecting data requirements for a wide variety of data domains across the enterprise.
- Demonstrated ability to synthesize and translate complex business requirements into accurate, flexible solutions and data products, and to communicate results effectively to a diverse audience.
TECHNICAL SKILLS
Cloud: AWS big data services (S3, Glue, Data Pipeline, Athena, Redshift, Kinesis, etc.), Databricks, Snowflake, Azure Data Factory (ADF)
Big Data: Spark, Hadoop
ETL Tools: Informatica, Ab Initio
Metadata: ER/Studio, Erwin
Reporting: Tableau, Excel, Informatica Analyst & IDQ, Ab Initio Data Quality
Databases: Oracle, DB2, Teradata, Snowflake
Programming: Python, Core Java
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Sr Big Data Architect
Responsibilities:
- Drove roadmap and strategy discussions for Enterprise Information Management (EIM) initiative projects.
- Helped the product management team execute enterprise data analytics projects.
- Architected and solutioned Minimum Viable Products (MVPs) with the Analytics Product Owner in an advisory capacity.
- Developed POCs for EIM initiatives.
- Conducted the current-state information management assessment for Confidential.
- Defined enterprise data quality and data stewardship models for EIM projects.
- Responsible for the design and implementation of an enterprise Data Lake for effective reporting and analytics.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
- Implemented Spark RDD transformations and actions to support business analysis.
- Worked closely with the Elasticsearch team to eliminate data ingestion issues by adopting Parquet files and enabling broadcast joins to fulfill the requirement.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance (see the sketch after this list).
- Developed predictive analytics using the Apache Spark Scala API.
- Created Azure Data Factory (ADF) pipelines for transforming, enriching, and managing enterprise data.
- Led data quality and metadata management programs for data projects across Confidential.
- Prepared a gap analysis between the Confidential current-state and future-state systems architecture for a better data analytics solution.
- Used Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) for storage.
- Led data engineering teams for effective delivery of data products for the enterprise.
- Built a Spark and Python data management platform for data quality and analytics using Databricks.
- Worked on intermediate and future-state master data and reference data solutions from the discovery phase through implementation.
- Revised and modernized data management platforms to align with the enterprise data strategy.
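Illustrative only: a minimal Spark/Scala sketch of the Parquet-plus-broadcast-join pattern referenced above, migrating a HiveQL-style aggregation to Spark SQL. The S3 paths, table layout, and column names are hypothetical placeholders, not the actual Confidential datasets.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object IngestionJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-broadcast-join")
          .enableHiveSupport()
          .getOrCreate()

        // Large fact data stored as Parquet (columnar format cuts I/O
        // versus the original text-backed Hive tables).
        val events = spark.read.parquet("s3://bucket/events/") // hypothetical path

        // The dimension table is small, so broadcasting it avoids a shuffle.
        val customers = spark.read.parquet("s3://bucket/customers/") // hypothetical path
        val enriched = events.join(broadcast(customers), Seq("customer_id"))

        // Equivalent of the former HiveQL aggregation, now run through Spark SQL.
        enriched.createOrReplaceTempView("enriched_events")
        val daily = spark.sql(
          """SELECT event_date, segment, COUNT(*) AS event_count
            |FROM enriched_events
            |GROUP BY event_date, segment""".stripMargin)

        daily.write.mode("overwrite").parquet("s3://bucket/reports/daily/") // hypothetical path
        spark.stop()
      }
    }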
Environment: Spark, Amazon Web Services (AWS), Informatica, Talend, Hive, Oracle, Python, Tableau, Visio, Linux
Confidential, Baltimore, MD
Big Data Architect
Responsibilities:
- Co-authored the enterprise data architecture strategy.
- Conducted deep requirements gathering and analysis with data owners and integration teams.
- Identified KPIs and designed/modeled the data requirements.
- Developed metadata standards for the assigned project.
- Led data quality assessments for better metadata management using Spark and Python.
- Utilized Spark SQL to extract and process data, parsing it into Datasets or RDDs in HiveContext and applying transformations and actions (map, flatMap, filter, reduce, reduceByKey), as shown in the sketch after this list.
- Worked on solutioning and engineering enterprise data management using big data tools.
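A short sketch of the Spark RDD flow described above, showing the named transformations (flatMap, filter, map, reduceByKey) followed by an action; the input path and record format are hypothetical.

    import org.apache.spark.sql.SparkSession

    object RddWordStats {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdd-transformations").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical input: one raw record per line.
        val lines = sc.textFile("hdfs:///data/raw/records.txt")

        val counts = lines
          .flatMap(_.split("\\s+")) // transformation: split records into tokens
          .filter(_.nonEmpty)       // transformation: drop empty tokens
          .map(token => (token, 1)) // transformation: pair each token with a count
          .reduceByKey(_ + _)       // transformation: sum counts per token

        // Action: materialize a small sample on the driver.
        counts.take(10).foreach(println)

        spark.stop()
      }
    }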
Environments & Toolsets: Informatica Suite, Teradata, AWS, Spark, Python, Tableau
Confidential, Atlanta GA
Big Data
Responsibilities:
- Responsible for designing big data management and analytics platforms for enterprise data.
- Responsible for designing ETL data pipelines for effective data ingestion from existing data management platforms into the enterprise Data Lake.
- Responsible for bringing data into the data lake from various heterogeneous sources using ETL utilities.
- Involved in designing the different data zones within the lake to enable various enterprise data capabilities for effective data management and analytics.
- Developed the strategy and roadmap for an efficient big data quality program.
- Developed a roadmap for analyzing structured, semi-structured, and unstructured data requirements for enterprise data ingestion.
- Involved in data lineage and data reconciliation for the movement of data across the different zones within the lake.
- Involved in the design and implementation of a change data capture (CDC) and technical data quality (TDQ) solution for the Data Lake (a CDC sketch follows this list).
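A minimal sketch of one watermark-based CDC pass between lake zones, assuming hypothetical raw and curated zone paths and an updated_at audit column; the production solution was built with the toolset listed in the Environment line below.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, max}

    object CdcIncrementalLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("cdc-incremental").getOrCreate()

        // Hypothetical zones: "raw" holds full source extracts,
        // "curated" holds the reconciled lake copy.
        val curated = spark.read.parquet("/lake/curated/orders/")
        val raw     = spark.read.parquet("/lake/raw/orders/")

        // Watermark-based CDC: keep only rows modified since the last load.
        // (First-run bootstrap on an empty curated zone is omitted for brevity.)
        val lastLoad = curated.agg(max(col("updated_at"))).head().getTimestamp(0)
        val changes  = raw.filter(col("updated_at") > lastLoad)

        // Append the captured changes; a downstream merge step applies
        // updates and deletes during reconciliation.
        changes.write.mode("append").parquet("/lake/curated/orders/")
        spark.stop()
      }
    }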
Environment: Hadoop, Ab Initio GDE, Tableau, Hive, Linux.
Confidential, Franklin Lakes, NJ
Data Architect
Responsibilities:
- Responsible for data migration (loading business data into production) and streamlining the process using ETL utilities.
- Responsible for running the everyday incremental and delta loads of critical business data into the enterprise data hub (see the sketch after this list).
- Led the production support team in resolving data issues and anomalies after every incremental load.
- Responsible for reporting data load results to the business owners.
- Designed and streamlined process methodologies for different data anomaly categories, from data issue identification to resolution, using an in-house ETL tool.
- Involved in enhancing the existing ETL data pipeline for better data migration with fewer data issues.
- Responsible for reverse engineering source data using metadata management tools for better data migration and integration.
- Applied SQL tuning and database performance optimization techniques for improved query performance.
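The loads above ran through Ab Initio; purely as an illustration of the same delta-detection idea, here is a sketch in Spark/Scala (the language used elsewhere in this resume), with hypothetical connection details, table names, and paths.

    import org.apache.spark.sql.SparkSession

    object DeltaLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("daily-delta-load").getOrCreate()

        // Hypothetical JDBC source; credentials would come from a vault or config.
        val source = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical
          .option("dbtable", "SALES.ORDERS")                     // hypothetical
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        val hub = spark.read.parquet("/hub/orders/") // hypothetical target extract

        // Delta detection: rows present in the source but not yet in the hub.
        val newRows = source.join(hub.select("order_id"), Seq("order_id"), "left_anti")

        // Land the delta in staging; a downstream step merges it into the hub.
        newRows.write.mode("overwrite").parquet("/hub/staging/orders_delta/")
        spark.stop()
      }
    }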
Environment: Ab Initio GDE, DB2, Oracle, Teradata, UNIX.
Confidential, Irving, TX
Sr MDM Data Analyst
Responsibilities:
- Worked on the Citigroup enterprise Master Data Management (MDM) solution (Customer Management Repository), a 360-degree view of Citi customer demographics data.
- Responsible for identifying data domains for an effective MDM technology implementation.
- Responsible for defining data standards and mapping data elements from multiple heterogeneous data sources to MDM.
- Responsible for analyzing data from the different lines of business to fit into an enterprise master data repository.
- Responsible for profiling in-house and third-party data sourced to the master data hub (a profiling sketch follows this list).
- Worked with ETL and DBA teams on effective data migration and resolution of production data issues.
- Designed conceptual, logical, and physical data models for effective metadata management.
- Responsible for developing and tuning existing ETL and SQL code for new data requirements.
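The profiling itself used Ab Initio Data Profiler; the sketch below reproduces the same per-column checks (non-null, distinct, and null counts) in Spark/Scala against a hypothetical staging path, the kind of pass run before sourcing data into the master data hub.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, count, countDistinct, sum, when}

    object ProfileColumns {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("data-profiling").getOrCreate()

        val customers = spark.read.parquet("/mdm/staging/customers/") // hypothetical path

        // Basic profile per column: non-null count, distinct count, null count.
        customers.columns.foreach { c =>
          val p = customers.agg(
            count(col(c)).as("non_null"),
            countDistinct(col(c)).as("distinct"),
            sum(when(col(c).isNull, 1).otherwise(0)).as("nulls")
          ).head()
          println(s"$c -> non_null=${p.getLong(0)}, distinct=${p.getLong(1)}, nulls=${p.getLong(2)}")
        }

        spark.stop()
      }
    }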
Environment: Oracle, Teradata, Ab Initio Data Profiler, Ab Initio GDE, IBM InfoSphere, Erwin
Confidential, NJ
Data Analyst
Responsibilities:
- Responsible for gathering business data requirements for data management projects.
- Responsible for conducting design walkthroughs with data management teams.
- Responsible for database design and implementation to meet project requirements.
- Developed complex SQL queries to fetch data from different tables across databases (see the sketch after this list).
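A representative multi-table query of the kind described above, expressed through Spark SQL in Scala for consistency with the other sketches; the table names, paths, and filter values are hypothetical.

    import org.apache.spark.sql.SparkSession

    object ReportQuery {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("multi-table-query").getOrCreate()

        // Hypothetical tables registered from the project database.
        spark.read.parquet("/data/orders/").createOrReplaceTempView("orders")
        spark.read.parquet("/data/customers/").createOrReplaceTempView("customers")

        // Join, filter, aggregate, and order: the shape of the reporting
        // queries described above.
        val report = spark.sql(
          """SELECT c.customer_name, SUM(o.amount) AS total_amount
            |FROM orders o
            |JOIN customers c ON o.customer_id = c.customer_id
            |WHERE o.order_date >= DATE '2020-01-01'
            |GROUP BY c.customer_name
            |HAVING SUM(o.amount) > 1000
            |ORDER BY total_amount DESC""".stripMargin)

        report.show()
        spark.stop()
      }
    }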
Environment: Microsoft Office suite, SQL