Lead Big Data Architect/Delivery Lead Resume

SUMMARY:

  • Highly experienced big data management professional with expertise in managing, architecting, and delivering big data, data warehouse/ETL, business intelligence/analytics systems, enterprise data integrations, data marts, hubs, and operational data stores.
  • Implemented big data architectures and processing frameworks using Apache Hadoop/Spark ecosystems across multiple distributions, both on-premises and in the cloud.
  • In-depth, hands-on knowledge of big data/information architecture, metadata management, data governance, and data warehousing methodologies, including the design of complex dimensional schemas, data warehouses, and marts.
  • Has successfully served as Technical Project Manager, Architect, Team/Delivery Lead, and Lead Developer, mentoring teams of developers.

TECHNICAL SKILLS:

Big Data: Hadoop/HDFS, Spark (PySpark), Hive/Impala, Oozie, Sqoop, HBase, Kafka, Datameer

Data Modelling: Erwin, Embarcadero ER-Studio, PowerDesigner

ETL: Custom Spark/Hadoop scripting, Informatica PowerCenter, Talend

BI /Reporting: Datameer, Tableau, Cognos, Business Objects, MicroStrategy

Programming: Python, Java, Shell, SQL

Database: Netezza, Oracle, Teradata, MS SQL Server, MySQL

PROFESSIONAL EXPERIENCE:

Lead Big Data Architect/Delivery Lead

Confidential

Responsibilities:

  • Worked as Technical Lead/Architect coordinating among the hardware procurement group, network group, application team, and vendors to drive implementation of Confidential clusters across five environments (Dev/SIT/UAT/Prod/DR)
  • Architected an application development platform on Hadoop (HDP 2.6) to offload credit risk analytics processing from the SAS platform, performing capacity sizing, memory/workload analysis, and component selection
  • Designed and developed a metadata-driven distributed compute framework using Apache Spark, Python, and Hadoop ecosystem components (a sketch follows this list)
  • Achieved roughly 10-fold performance gains in month-end risk analytics processes such as attribution analysis, overrides/what-ifs, and other simulations by rewriting them in PySpark and Hadoop components
  • Worked with SAS platform admins and the Hadoop infrastructure team to configure SAS/ACCESS for Hive/HDFS; developed alternative data transfer methods for larger datasets to work around limitations of the SAS connector
  • Built a handshake architecture letting users trigger Hadoop workloads on demand from the SAS RFW UI
  • Redesigned and redeveloped the entire Expected Credit Loss calculator from SAS to Python/pandas/NumPy/PySpark components to enable parallelization, cutting load times roughly 10-fold
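
A minimal sketch of what such a metadata-driven Spark runner might look like; the config format, paths, and table names below are illustrative assumptions, not the actual framework:

    # metadata_runner.py - illustrative sketch of a metadata-driven Spark job runner.
    # Config format, paths, and table names are hypothetical.
    from pyspark.sql import SparkSession

    # Each entry describes a source, a SQL transform, and a Hive target.
    JOB_METADATA = [
        {
            "name": "ecl_attribution",
            "source_path": "/data/raw/risk/positions",  # hypothetical HDFS path
            "view_name": "positions",
            "transform_sql": """
                SELECT portfolio_id,
                       SUM(exposure * pd * lgd) AS expected_credit_loss
                FROM positions
                GROUP BY portfolio_id
            """,
            "target_table": "risk.ecl_by_portfolio",    # hypothetical Hive table
        },
    ]

    def run_job(spark, job):
        """Load the source, register it as a view, run the configured SQL, write the target."""
        spark.read.parquet(job["source_path"]).createOrReplaceTempView(job["view_name"])
        result = spark.sql(job["transform_sql"])
        result.write.mode("overwrite").saveAsTable(job["target_table"])

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("metadata-runner").enableHiveSupport().getOrCreate()
        for job in JOB_METADATA:
            run_job(spark, job)
        spark.stop()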

Environment: Python/PySpark, Confidential (HDP 2.6.5), Spark 2.x, Hive, Oozie, Sqoop, Git, SAS 9.4, RFW, Visual Analytics, Postgres

Big Data Solution Architect/Scrum Lead

Confidential

Responsibilities:

  • Collaborated with business sponsors, enterprise architects, and project/program management to lead and deliver multiple big data initiatives
  • Laid out the roadmap for the Hadoop/big data architecture and developed a data ingestion framework for the enterprise data hub
  • Implemented and validated big data use cases spanning data processing, real-time fraud detection, data ingestion, and real-time analytics using components such as Hive, Spark, Oozie, Kafka, HBase, and Python (see the streaming sketch after this list)
  • Evaluated the cost-benefit of migrating the on-premises Netezza database to the cloud (AWS and Azure)
  • Built a self-service data engineering framework for data scientists to retrieve the key model data needed for their predictive modelling exercises
  • Grew the development and support teams through cross-training and continuous mentoring
  • Led and managed multiple application teams in an Agile framework to deliver large-scale data processing and analytics on the big data platform
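
A minimal sketch of the kind of real-time fraud-detection pipeline described above, using Spark Structured Streaming with Kafka; the broker address, topic names, schema, and flagging rule are illustrative assumptions:

    # fraud_stream.py - illustrative Spark Structured Streaming sketch.
    # Broker address, topic names, schema, and the flagging rule are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, to_json, struct
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("fraud-detect").getOrCreate()

    schema = (StructType()
              .add("txn_id", StringType())
              .add("account", StringType())
              .add("amount", DoubleType()))

    # Read raw transactions from Kafka and parse the JSON payload.
    txns = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "transactions")
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("t"))
            .select("t.*"))

    # Toy rule: flag unusually large transactions for downstream review.
    flagged = (txns.filter(col("amount") > 10000.0)
               .select(to_json(struct("*")).alias("value")))

    # Publish flagged transactions to an alerts topic.
    (flagged.writeStream.format("kafka")
     .option("kafka.bootstrap.servers", "broker:9092")
     .option("topic", "fraud-alerts")
     .option("checkpointLocation", "/tmp/fraud-ckpt")
     .start()
     .awaitTermination())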

Environment: Python, Confidential (CDH 5.x), Hive, Oozie, Sqoop, Spark, HBase, Kafka, Git, Tableau

Technical Project Manager/Big Data Architect

Confidential

Responsibilities:

  • Managed relationships with clients in the financial/banking verticals, assisting them in migrating data sourcing and processing from legacy platforms to HDP clusters
  • Designed data lake architectures and infrastructure roadmaps to support big data initiatives at client sites
  • Provided advisory and troubleshooting services to Confidential's premier clients to ensure their big data strategies were operationalized with minimal roadblocks
  • Coordinated client requests and critical bug escalations between product management and engineering teams via JIRA
  • Acted as liaison between clients and Confidential product management to pursue critical enhancements and feature requests for emerging data processing components such as Spark, HBase (NoSQL), Hive, and related Hadoop ecosystem components

Environment: Hadoop (HDP 2.x), Hive, Oozie, Pig, HBase/Phoenix, Sqoop, MapReduce, Spark

Big Data Architect/Developer

Confidential

Responsibilities:

  • Provided architectural input to the big data strategy initiative for implementing a big data platform to enable large-volume batch processing
  • Designed the foundation layer of the data lake repository so that source data generated by various applications lands in the Hadoop platform
  • Conceived and developed the data sourcing and processing strategy for moving data from various databases/mainframe systems into Hadoop
  • Developed Hadoop ETL processes (using Hive/Pig/Oozie and MapReduce) to migrate Teradata data processing into the cluster; led performance tuning of large-dataset joins in Hive queries (see the join sketch after this list)
  • Collaborated with the infrastructure team and other architects to establish Hadoop coding standards, best practices, and development guidelines
  • Co-architected a job workflow framework using Oozie to modularize multiple source systems' ETL into single workflows/templates
  • Assisted in beta-testing and validation of Confidential Navigator (metadata component) to ensure it integrated with MetaCenter (DAG tool) and met clients' metadata/audit/data lineage requirements
  • Established Datameer development guidelines/best practices and implemented workbooks for data analysis, data quality validations, and scorecards/dashboards in Datameer
  • Designed dashboards and administered user/group security and migrations within Datameer
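
The join tuning itself was done in Hive; purely as an illustration of the same idea in PySpark, broadcasting the small side of a join avoids shuffling the large table, analogous to Hive's map-side join (hive.auto.convert.join). Table names here are hypothetical:

    # join_tuning.py - illustrative broadcast-join sketch in PySpark.
    # Table names are hypothetical; the original tuning was done in Hive
    # with the analogous map-side join settings.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (SparkSession.builder.appName("join-tuning")
             .enableHiveSupport().getOrCreate())

    facts = spark.table("dw.transactions")   # large fact table
    dims = spark.table("dw.branch_dim")      # small dimension table

    # Broadcasting the small side ships it to every executor and avoids
    # shuffling the large fact table - the same idea as a Hive map-side join.
    joined = facts.join(broadcast(dims), on="branch_id", how="left")
    joined.write.mode("overwrite").saveAsTable("dw.transactions_enriched")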

Environment: Hadoop (CDH 4.x), Hive, Oozie, Sqoop, MapReduce, Teradata, Netezza, MetaCenter, Talend, Spotfire

Data Integration Architect

Confidential

Responsibilities:

  • Guided various project teams, providing ETL architecture best practices and technical expertise
  • Worked with data architects and functional users to design system controls and auditability
  • Reviewed logical and physical models from an ETL and auditability perspective; reviewed ETL specifications and ETL code developed by onsite/offshore developers
  • Designed and developed complex real-time ETLs to interface with IBM MQ messaging systems (an illustrative sketch follows this list)
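
The MQ-facing interfaces above were built in Informatica; purely as an illustration, a minimal Python consumer using the pymqi client might look like this (queue manager, channel, and queue names are assumptions):

    # mq_reader.py - illustrative IBM MQ consumer using the pymqi client.
    # The original interfaces were built in Informatica PowerCenter; this is
    # only a sketch. Queue manager, channel, and queue names are hypothetical.
    import pymqi

    QUEUE_MANAGER = "QM1"
    CHANNEL = "APP.SVRCONN"
    CONN_INFO = "mqhost(1414)"   # host(port)
    QUEUE_NAME = "TRADES.IN"

    def process(message):
        # Stand-in for the downstream ETL handler.
        print(message.decode("utf-8", errors="replace"))

    qmgr = pymqi.connect(QUEUE_MANAGER, CHANNEL, CONN_INFO)
    queue = pymqi.Queue(qmgr, QUEUE_NAME)
    try:
        while True:
            process(queue.get())  # get() raises MQMIError when the queue is empty
    except pymqi.MQMIError as e:
        if e.reason != pymqi.CMQC.MQRC_NO_MSG_AVAILABLE:
            raise                 # anything other than "no message available" is real
    finally:
        queue.close()
        qmgr.disconnect()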

Data Integration Architect

Confidential

Responsibilities:

  • Collaborated with business partners and application owners to define data sourcing and transmission requirements for various trading systems
  • Contributed to the design of an automated reconciliation process for transactions and positions between trading system sources (such as Calypso, IBIS, and OPICS) and the FACS data warehouse
  • Redesigned the file-based source integration to provide audit statistics and traceability for every piece of data in each pipeline, e.g. file arrival time, byte size, record counts, and audit column sums at each stage (a sketch follows this list)
  • Authored migration guidelines and Informatica/UNIX-specific development guidelines and naming standards, covering reusable components across various systems and dynamic parameter files
  • Mentored onsite and offshore development teams and provided ETL development best practices
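
A minimal sketch of the per-file audit statistics described above; the field layout and audit column are illustrative assumptions:

    # file_audit.py - illustrative per-file audit statistics collector.
    # The delimiter, audit column, and field layout are hypothetical.
    import csv
    import os
    from datetime import datetime, timezone

    def audit_file(path, audit_column="notional"):
        """Capture arrival time, byte size, record count, and an audit column sum."""
        stats = {
            "file": os.path.basename(path),
            "arrival_time": datetime.fromtimestamp(
                os.path.getmtime(path), tz=timezone.utc).isoformat(),
            "byte_size": os.path.getsize(path),
            "record_count": 0,
            "audit_sum": 0.0,
        }
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                stats["record_count"] += 1
                stats["audit_sum"] += float(row.get(audit_column, 0) or 0)
        return stats

    if __name__ == "__main__":
        print(audit_file("/data/inbound/calypso_trades.csv"))  # hypothetical feed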

ETL Lead Consultant

Confidential

Responsibilities:

  • Designed, developed, and managed an ETL architecture written in Python/Shell/XML to bring Confidential's advertisement transactions into the Netezza DW, enabling metrics such as clicks, impressions, and cost to be reported across dimensions such as region, platform/web property, time, and industry vertical (a sketch follows this list)
  • Led an effort to integrate external demographic data feeds from D&B, Thomson, etc. for gap analysis between Confidential's existing customers, prospects, and their respective media spends
  • Architected conformed dimensions for Confidential's internal data mart subject areas and contributed to developing the common platform (bus architecture)
  • Performed data security/sensitivity analysis and classified attributes into Sensitive Information (SI), Personal Information (PI), and Personally Identifiable Information (PII)
  • Facilitated user acceptance testing for MicroStrategy BI reports/dashboards, along with end-to-end data analysis of reported bugs and their resolutions
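
A minimal sketch of the kind of Python aggregation step used in such an ETL, rolling ad transactions up into reportable metrics; the input layout and column names are illustrative assumptions, and the Netezza load step is omitted:

    # ad_metrics_etl.py - illustrative aggregation step for ad transactions.
    # The input layout and column names are hypothetical; the Netezza load
    # step (e.g. via external tables or nzload) is omitted.
    import csv
    from collections import defaultdict

    def aggregate(path):
        """Roll up clicks, impressions, and cost by (region, platform)."""
        totals = defaultdict(lambda: {"clicks": 0, "impressions": 0, "cost": 0.0})
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                key = (row["region"], row["platform"])
                totals[key]["clicks"] += int(row["clicks"])
                totals[key]["impressions"] += int(row["impressions"])
                totals[key]["cost"] += float(row["cost"])
        return totals

    if __name__ == "__main__":
        for (region, platform), m in aggregate("/data/ads/transactions.csv").items():
            print(region, platform, m)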

Data Architect

Confidential

Responsibilities:

  • Reviewed the prevailing DW architecture and presented recommendations for areas of improvement in data mart design, ETL design, and the overall business processes for data mart implementation
  • Worked with HR application implementation teams to identify data attributes important for the HR data mart, and created a data dictionary and element glossary for business users and metadata management

DW Architect

Confidential

Responsibilities:

  • Led and supported business requirements sessions with various marketing groups to design the dimensional model and ETL architecture (including an encryption mechanism for sensitive data; see the sketch after this list) for retrieving information from VOD (video-on-demand) applications
  • Provided technical assistance to the team on complex ETL program development and performance tuning of data loads
  • Coordinated efforts and transitions with various support groups for systems implementation procedures
  • Evaluated new projects/reporting requests and provided scope/approach recommendations along with effort estimates and timeline projections
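
The encryption mechanism actually used is not documented in this resume; purely as an illustration, column-level encryption of sensitive attributes in an ETL step might look like the following, using the cryptography library's Fernet recipe (field names and key handling are assumptions):

    # encrypt_columns.py - illustrative column-level encryption for an ETL step.
    # The mechanism actually used on the project is not documented here;
    # field names and key handling are hypothetical.
    from cryptography.fernet import Fernet

    # In practice the key would come from a secrets manager, not be generated inline.
    cipher = Fernet(Fernet.generate_key())

    def protect(record, sensitive_fields=("account_number", "email")):
        """Encrypt the configured sensitive fields before loading to the warehouse."""
        out = dict(record)
        for field in sensitive_fields:
            if out.get(field) is not None:
                out[field] = cipher.encrypt(str(out[field]).encode()).decode()
        return out

    if __name__ == "__main__":
        row = {"subscriber_id": 42, "email": "viewer@example.com", "title": "Movie"}
        print(protect(row))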
