
Big Data Architect & Data Architect Resume


CA

SUMMARY:

  • Seasoned Information Professional with over 16 years of experience in software consulting and industry positions, enabling organizations to harness the power of information and leverage data assets by implementing robust and scalable information platforms.
  • Specialist in architecting, designing and developing Data Warehouses, Data Marts, Data Integration, Master Data Management and Big Data Solutions.
  • Excellent track record for executing and managing Business Intelligence platforms and projects in cost effective ways and within established timelines.
  • Strong blend of technical expertise with good business acumen.
  • Highly adept at partnering with Business and IT leadership teams to establish roadmaps and execution strategies.
  • Exceptional people leader with a proven track record of building and leading global, high-performance teams.
  • Excellent interpersonal and organizational skills. Results-focused and adaptive in a team environment.

FUNCTIONAL SKILLS:

Enterprise Architecture: Enterprise Data Strategy, Enterprise Data Integration, Enterprise Business Intelligence, Data Architecture.

Big Data: Big Data Platform Architecture, Hadoop (MapReduce, HDFS), Impala, Spark, HBase, Oozie, Flume, Sqoop, ZooKeeper.

Data Architecture: Dimensional Modeling, Conceptual Modeling, Logical Modeling, Physical Modeling, ER Modeling - 3NF, Star Schema, Snowflake Schema, Conformed Dimensions, Reference Master Data, Master Data Management (MDM).

ETL Architecture: Integration and Semantic Transformations, Optimized Data Extraction and Load Strategy, Informatica/ETL Performance Tuning, Database Performance Tuning, Partitioning Strategy, Indexing Strategy.

Business Intelligence: Data Discovery, Self Service BI, Data Visualization, Executive Dashboards

Data Governance: Master Data Governance, Enterprise Data Quality, Subject Matter Expert (SME), Technical Data Steward

Management: Project Management, Vendor Management, Offshore Resource Management.

TECHNICAL SKILLS:

Big Data: Hadoop, Cloudera CDH, HDP2.0, HBase, Cassandra

ETL Tools: Talend Open Studio, Data Torrent, SyncSort, Informatica, Ab Initio

Data Visualization: QlikView, R Studio, Visual Insight.

BI Tools: TIBCO Spotfire, QlikView, MicroStrategy, Cognos, Crystal Reports.

Database: Oracle, DB2, MS SQL Server 2000/7/6, Teradata.

Predictive & Statistical: Statistica, Rapid Miner, R Studio

Languages: HiveQL, Pig (Pig Latin), Python, Scala, SQL, PL/SQL, Perl, Java, C/C++, Objective-C, R, ksh.

Data Virtualization: Denodo, Informatica Data Virtualization edition

MDM: Informatica Master Data Management, Talend MDM

Data Modeling: ERwin, ER Studio, Oracle Data Modeler, Visio

PROFESSIONAL EXPERIENCE:

Confidential, CA

Big Data Architect & Data Architect

Responsibilities:

  • Responsible for architecting and building new Hadoop, NoSQL and in-memory platforms and data collectors: a Big Data platform that can ingest hundreds of terabytes of data to be consumed for Business Analytics, Operational Analytics, Text Analytics, Data Services and other Big Data solutions for various business units.
  • Responsible for designing and architecting the Customer Viewership Model to understand customers' viewership patterns, predict churn, provide recommendations for personalized content and content acquisition, right-size packages and measure campaign effectiveness.
  • Architected a Big Data Lake of 500 nodes with over 5 PB of data, ingesting 5TB of raw data daily. Curated data is processed into Enterprise Data Views (EDV) using AVRO and Parquet.
  • Designed the offloading and migration of resource-hungry ETL data warehouse processes to Hadoop. EDW capacity was freed up for faster analytics, and the cost savings from offloading helped justify further investment in Hadoop capacity expansion.
  • Engineered a system for parsing log files with complex, variable-depth nested structures (JSON/XML/text/CLOB), processing them with Spark into a set of relational tables for easy access.
  • Built hybrid on-premises and Amazon AWS cloud clusters (Cloudera distribution). Private-Red reference data is masked before being uploaded to the AWS cluster for workload off-loading, while regional data collectors feed directly into the cloud cluster.
  • Architected Lambda Architecture pipeline for real time agent incentive reporting and forecasting. Speed layer processes streaming data feeds and provides operational reporting. Batch layer processes incoming and late arriving files. A serving layer (parquet) presents both data views for reporting.
  • Engineered streaming data ingestion from the Twitter firehose using Flume, Kafka and Spark Streaming. Processed the data with Spark (Scala/Python) to generate reports correlating call-center volume with social media events (a streaming sketch follows this list).
  • Designed the migration from the DirecTV Cloudera cluster to the Confidential & Confidential Hortonworks cluster: gap analysis of architectural differences, feasibility analysis for lift-and-shift of ETL code, and designing around the network challenges of transporting over 5 PB of data.
  • Designed (PoC) a Supernova customer-centric data mart to correlate customer events and understand the customer experience. Time-series data/event logs in Cassandra are processed using Spark to uncover patterns leading to customer churn.
  • Engaged with PwC for a CDC-in-Hadoop PoC (CDC: Change Data Capture in Hadoop) for delta identification, replication and synchronization of Hadoop datasets using Python, Hive, Pig and Impala scripts.
  • Designed virtual slowly changing dimensions Type 1 (SCD 1) and Type 2 (SCD 2) using Hive analytical windowing functions and table partitioning (Parquet/Snappy), with nightly compaction of small files (an SCD sketch follows this list).
  • Designed the Big Data platform to ingest over 3 TB of unstructured log files from STBs (set-top boxes). Flume ingests XML and JSON log files into the DTV-E Hadoop cluster, where data is tokenized/masked for security before being uploaded to the AWS Hadoop cluster. DistCP/BDR replicate files in hourly batches to the DTV-BI Hadoop cluster, where the XML and JSON log files are parsed and transformed into AVRO staging tables and aggregated into Parquet tables.
  • Architected the conversion of existing data pipeline code from Hive, Pig and MapReduce to Spark (PySpark:Python). Performance Tuning of Spark and Spark Streaming processes.
  • Performance tuning of MapReduce and Hive queries: tuned map-side joins, long-running reducers, broadcast joins and compression, and improved join performance by pinning tables in cache and managing memory.
  • PoC of real-time operational dashboards leveraging SOLR for streaming data. Back-end feeds stream data through Flume and Kafka while the operational dashboards show current metrics from SOLR.
  • PoC of Tachyon and Apache Ignite Data Fabric to reap the benefits of an in-memory file system/in-memory data fabric. Tachyon improves I/O times and enables shared RDDs for Spark, while the Ignite Data Fabric speeds up Hive and MapReduce processing on Hadoop.
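
Streaming sketch: a minimal illustration of the Kafka-to-Spark portion of the ingestion described above, written against the Spark Structured Streaming API for brevity. The broker address, topic name and message fields are illustrative assumptions, and the spark-sql-kafka connector must be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Requires the spark-sql-kafka connector, e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark-version>
    spark = SparkSession.builder.appName("tweet-volume-stream").getOrCreate()

    # Assumed payload: Flume publishes JSON tweets with an ISO event timestamp.
    tweet_schema = StructType([
        StructField("event_ts", TimestampType()),
        StructField("text", StringType()),
    ])

    tweets = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # illustrative broker
              .option("subscribe", "twitter_firehose")            # illustrative topic
              .load()
              .select(from_json(col("value").cast("string"), tweet_schema).alias("t"))
              .select("t.*"))

    # Tweet volume per 5-minute window; call-center volume is joined to these
    # windows downstream to produce the correlation reports.
    volume = tweets.groupBy(window(col("event_ts"), "5 minutes")).count()

    query = (volume.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()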
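
SCD sketch: in the virtual SCD 2 design above, effective-date ranges are derived at read time from the partitioned change feed rather than maintained in a physical dimension table. A minimal PySpark sketch of the windowing logic; table and column names are illustrative assumptions.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("virtual-scd2")
             .enableHiveSupport()
             .getOrCreate())

    # Daily change records for the customer dimension, stored as partitioned Parquet/Snappy.
    # load_dt is assumed to be a string date in yyyy-MM-dd form.
    changes = spark.table("edv.stg_customer_changes")

    # One window per business key, ordered by the load date of each change record.
    w = Window.partitionBy("customer_id").orderBy("load_dt")

    scd2 = (changes
            .withColumn("effective_from", F.col("load_dt"))
            .withColumn("effective_to",
                        F.coalesce(F.lead("load_dt").over(w), F.lit("9999-12-31")))
            .withColumn("is_current",
                        F.when(F.lead("load_dt").over(w).isNull(), F.lit("Y")).otherwise(F.lit("N"))))

    # "Virtual" dimension: no SCD table is materialized, history is exposed as a view.
    scd2.createOrReplaceTempView("dim_customer_scd2_v")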

Environment: Cloudera CDH 4.x/5.x, Hive, Pig, Impala, Cassandra, HBase, Oozie, Spark, Spark Streaming, Sqoop, Flume, Kafka, SOLR, Data Torrent, SyncSort DMX-h, Talend for Big Data, Redis, Aster Data, Teradata, Oracle, Unix Scripting, QlikView.

Confidential, Newport Beach, California

Data Architect/Data Warehouse S olution Architect

Responsibilities:

  • Responsible for guiding the evolution of business intelligence and data services for the RSD data warehouse. Re-engineered and re-architected siloed legacy data marts to use the bus architecture, and architected the incorporation of new subject areas and data marts as new business units were acquired.
  • PoC of a hub-and-spoke Big Data platform (Cloudera CDH Hadoop distribution) to feed the statistical and predictive analytics platform with larger volumes of data, evolving and extending the enterprise data warehouse capabilities with advanced analytics that deliver inferences from granular data.
  • PoC of Cloudera EDH on AWS: explored secure storage of data while running a variety of enterprise workloads (batch/interactive SQL/REPL/advanced analytics) with Amazon's scalability and flexibility.
  • PoC of Amazon Elastic MapReduce (Amazon EMR), a hosted and managed Hadoop service, versus in-house installation of Hadoop in PL data centers.
  • Architected an MDM (Master Data Management) solution for a customer-centric (Customer 360) repository of chronological interactions. The objective is to create an enriched and engaging experience when interacting with representatives and agents.
  • Designed a Data Discovery platform for actuarial teams to analyze large output files from statistical models. Sandbox scenarios and solutions to iteratively tune the statistical models.
  • Designed a Data Virtualization platform (Informatica) to improve time to value by enabling rapid development, agile reporting and real-time data feeds from heterogeneous source systems.
  • Engaged with strategic partners to explore the propensity to cross-sell amongst customer representatives using predictive analytics platforms and statistical models. Prescriptive intelligence gained would be used to develop an actionable target marketing strategy.
  • Evaluated Cloud Analytics and Amazon Cloud (AWS) hosted infrastructure to address increasing infrastructure maintenance, administrative tasks and data center scalability concerns, refocusing resources on business-driven projects and innovation.
  • Promoted the cultural shift towards adoption of Data Visualization and data discovery for analytical and strategic insight while servicing operational reporting with the traditional reporting stack.

Environment: ER Studio, Hadoop CDH 4, Hive 0.9.0, Pig 0.10.0, Impala, HBase 0.94.6, Oozie 2.3.2, Sqoop 1.3.0, Flume 1.2.0, Talend MDM, Informatica PowerCenter 9.x Advanced Edition, Informatica Data Virtualization Edition, Informatica Metadata Manager, Oracle 11g, MS SQL Server, DB2, SQL, PL/SQL, Unix Scripting, MicroStrategy.

Confidential, CA

Data Warehouse Solution Architect

Responsibilities:

  • Responsible for re-architecting and re-engineering the legacy Data Warehouse implementations. The defined roadmap was to design and implement a decision support/analytics platform that encourages self-service data discovery to obtain insightful and actionable information.
  • Defined key business drivers for the Data Warehouse initiative and delivered a Project Charter and Project Scope that align with those drivers.
  • Architected a Hadoop big data platform for a lift-and-shift project to migrate components of the legacy Oracle data warehouse.
  • Optimized EDW storage and performance by archiving cold data and offloading ETL processing to Hadoop, using Hadoop as a landing zone to ingest and transform data efficiently and economically, freeing up capacity on the EDW (an offload sketch follows this list).
  • PoC of the Big Data Hadoop platform: selected a pilot project to demonstrate business value, with ETL components re-engineered onto the Hadoop platform.
  • Evaluated data integration technologies to extract and run sentiment analysis algorithms on unstructured data from social networking sites to measure the effectiveness of marketing campaigns by integrating with data from the data warehouse.
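
Offload sketch: a minimal illustration of the Hadoop landing-zone pattern described above, where raw extracts are transformed and aggregated on the cluster and only the curated result is published back to the EDW/BI layer. Paths, file layout and column names are illustrative assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("edw-offload").enableHiveSupport().getOrCreate()

    # Raw extracts land here untouched; Hadoop does the heavy transform work
    # that previously ran inside the EDW.
    raw = spark.read.option("header", "true").csv("hdfs:///landing/sales/2015-06-01/")

    curated = (raw
               .withColumn("order_dt", F.to_date("order_dt"))
               .withColumn("order_amount", F.col("order_amount").cast("double"))
               .filter(F.col("order_status") != "CANCELLED")
               .groupBy("order_dt", "region")
               .agg(F.sum("order_amount").alias("total_amount")))

    # Only the summarized, query-ready data is published, freeing EDW capacity.
    curated.write.mode("overwrite").partitionBy("order_dt").parquet("hdfs:///curated/sales_daily/")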

Environment: Informatica PowerCenter 9.x, Informatica 8.x, Apache Hadoop, Apache Hive, Apache Pig, Oracle 9i, DB2, SQL Server 2008, Teradata, ksh, Cognos 9/10, QlikView, Tibco Spotfire, MicroStrategy, ER Studio.

Confidential, CA

Data Architect

Responsibilities:

  • Responsible for the data architecture and ETL architecture of data marts. Prepared functional and technical specifications. Managing onsite/offshore delivery teams. Performance tuning and user training.
  • Coordinated requirements between business teams. Managed external vendors and their deliverables. Leveraged Agile to boost team productivity and effectiveness.
  • Collaborated with Stakeholders and SME’s to gather business requirements. Documented functional requirements and business process flow.

Environment: Informatica 8.x, DB2 v6r2, Oracle 8i/9i/10g, SQL Server, Teradata, ER Studio, ERwin 4.0, MicroStrategy, Business Objects, Crystal Reports X & XI, Perl, ksh, SVN, Merced, eWFM, Autosys, Tivoli.

Confidential, CA

Data Warehouse Solution Architect

Responsibilities:

  • Architected Logical and Physical data models and ETL design for a trickle-flow near real time data warehouse.
  • Prepared technical design documentation, data mapping (source to target mapping) document.
  • Documented Data Warehouse Standards, ETL best practices guide, database naming conventions and ETL naming convention.
  • Responsible for the data architecture and the Data Warehouse ETL architecture. Managed on-site/offshore development teams. Interacted with business users to gather and analyze business requirements, and created report mockups, functional specifications, high-level technical design and the mapping specifications. Managed deliverables from the offshore development teams.

Environment: Informatica 7.1/8.x, Oracle 9i/10g, SQL Server 2000, Teradata, MQ Series, CoSort, Autosys, SVN, Perl, ksh, Business Objects, SAS, Crystal Reports, ERwin 4.0.

Confidential, CA

Data Warehouse Architect

Responsibilities:

  • Translated functional requirements to technical requirements for the implementation teams. Implementing a solution that is robust and scalable, leveraging reusable components and a flexible database design.
  • Designed incremental data extracts using CDC (Change Data Capture) to feed incremental transactional data to the Data Warehouse. Loaded master reference tables and dimensions using Type 2 slowly changing dimensions (SCD 2) (a delta-extract sketch follows this list).
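
Delta-extract sketch: the incremental extracts above were implemented in Informatica PowerCenter against Oracle; the sketch below illustrates the same high-water-mark delta pattern in PySpark over JDBC, purely for illustration. The connection details, table and audit column (last_update_ts) are assumptions, and the watermark would normally come from an ETL control table rather than a literal.

    from pyspark.sql import SparkSession

    # Requires the Oracle JDBC driver on the Spark classpath.
    spark = SparkSession.builder.appName("delta-extract").getOrCreate()

    # High-water mark recorded by the previous successful run.
    last_extract_ts = "2008-03-01 00:00:00"

    # Push the delta predicate down to the source so only changed rows are moved.
    delta_query = f"""
        (SELECT policy_id, holder_name, status, last_update_ts
           FROM policy_master
          WHERE last_update_ts > TO_TIMESTAMP('{last_extract_ts}', 'YYYY-MM-DD HH24:MI:SS')) src
    """

    increment = (spark.read.format("jdbc")
                 .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # illustrative DSN
                 .option("dbtable", delta_query)
                 .option("user", "etl_user")
                 .option("password", "***")                              # placeholder credentials
                 .load())

    # Land the increment in staging; the SCD 2 merge into the dimension runs downstream.
    increment.write.mode("append").parquet("hdfs:///staging/policy_master_delta/")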

Environment: Informatica PowerCenter 7.1, Oracle 9i, MySQL, SQL Server 2000, Kalido, Murex, ERwin 4.0, JD Edwards OneWorld, PL/SQL, Crystal Reports 8.5, Cognos.
