
Chief Data Architect/Modeler/Data Engineer Resume


New York

SUMMARY:

  • 15+ years of experience; seeking a Data Engineering, Data Modeling, Data Science or Data Management role in
  • Leadership in Data Management: built complete Data Management, Big Data Migration and Data Governance teams from the ground up. Deep understanding of file-feed data issues and their resolution.
  • Can build SaaS tools with complex processing using web technologies such as Python Django/Flask and Node.js.
  • Lead Data Architect in Global Compliance IT, New York.
  • Designed the Compliance Data Warehouse (conceptual, logical and physical models) in PowerDesigner, organized by subject area: Equity, Fixed Income, Listed Derivatives, Index, Equity Transactions, Cost Centre, HR Employee data, Asset Management and FIX protocol data.
  • Defined and designed the complete Data Management ecosystem: standards and naming conventions, document formats for feed onboarding, data lineage from source down to attribute level, data delivery, Data Quality rules for CDEs (Critical Data Elements), Data Dictionary publishing, and a Data Taxonomy assigning a data class/category to each element.
  • Migrated data from diverse sources (Oracle databases, legacy mainframes, social-media streams, file watchers) to HDFS (Hadoop Distributed File System), or fed it into Apache Spark for processing with RDDs before publishing. Technologies: Apache Sqoop, Apache Flume, Apache Kafka, Apache Spark, Python and Scala scripting.
  • Designed AWS migration plans: archiving data to low-cost storage such as AWS Glacier, uploading to S3 buckets, or creating Hadoop clusters on AWS EMR or EC2 instances, with monthly processing-cost estimates; then implemented and configured them. Designed security access using AWS IAM roles, and content distribution using edge servers and Availability Zones (AZs).
  • Designed many OLTP, OLAP, data warehouse and NoSQL document-store databases.
  • Designed strategies for, and implemented, AI/machine learning algorithms.
  • Managed offshore/onsite staff when required; created L1/L2 runbooks.
  • Performed data analysis and wrote detailed ETL mapping documents.

KEY IT SKILLS:

Data Architecture | Core data modeling with PowerDesigner/ERwin | Data Governance: DQ rules engine, data lineage/taxonomy, MDM, data sourcing, CDEs (Critical Data Elements), data profiling | Designing complete data warehouses from scratch: dimensional, non-OLTP, NoSQL or cloud-based (DynamoDB) | Migrating data from any source (streams, feeds, market data, social feeds) to targets such as Hadoop HDFS, HBase, Oracle/Netezza or the Apache Spark engine | Configuring and implementing Apache Kafka-based streams/file watchers that write to HDFS or Apache Spark | Machine learning: data pre-processing, cleaning and filling missing data, creating training/test sets and building models in RStudio or with Python libraries in Anaconda Spyder | NLP (Natural Language Processing) with Python and NLTK | Setting up or migrating complete databases or processes to Amazon S3 buckets or EC2 instances running Ubuntu/RHEL Linux | Setting up AWS EMR clusters for Hadoop, migrating data to HDFS and processing it with Python, Scala and Java | ETL using Apache Sqoop from any database, Apache Flume-based stream capture into HDFS, or standard ETL tools such as Ab Initio and Informatica | Designing and implementing Hive and HBase data stores | Scripting in Perl, Unix shell, Python, Java and Scala; DevOps with Chef, Bamboo and Bitbucket

PROFESSIONAL EXPERIENCE:

Confidential, New York

Chief Data Architect/Modeler/Data Engineer

Environment: PowerDesigner (ER), Oracle 11g, Informatica, big data technologies (Cloudera), Data Management (data lineage, taxonomy), AWS EMR/EC2/DynamoDB, Hive/HBase, Python, RStudio, R, scikit-learn, machine learning, data preparation, etc.

Responsibilities:

  • Designed in PowerDesigner the conceptual, logical and physical data models for multiple subject areas: Reference Data (Party Model, Instrument Master for Equity/Fixed Income/Listed Derivatives/Index…, Exchanges, Cost Center, UBR, HR data, Instrument Listing, Aggregation Unit, Trader Book) and Transaction Data (Orders, Executions, Trades, Placements, Portfolio Managers, Brokers…).
  • Defined the Data Governance framework and standards (object naming conventions, production job naming, database and data mapping document standards), attribute-level data lineage from authoritative sources through to consumers, Master Data, CDEs (Critical Data Elements), DQ rules and their implementation for daily feeds, Data Steward reviews and the Data Dictionary.
  • Represented Compliance in the CDO (Chief Data Office) and CTO forums.
  • Guided all development teams on data transformations and managed end-to-end releases.
  • Designed the Data Lake: installed and configured a multi-node Hadoop cluster, Apache Spark, Spark Streaming, Python 2.7/3.5, Apache Kafka, Flume, Sqoop and Oozie.
  • Coded multiple ETL processes in Python 3.5 that read XML/JSON files, parsed the data into PySpark DataFrames and sorted/cleaned it before writing to databases or flat files; submitted jobs with YARN as the master (client mode) for cluster processing.
  • Daily data-feed archival: coded Flume pipelines with a sink writing to HDFS for 7-10 years' retention.
  • Communication surveillance (email/chat): coded Kafka source-to-sink pipelines through ZooKeeper-coordinated Kafka brokers, and cleaned the data with Spark Streaming (as RDDs) before writing it to HDFS.
  • For another team, designed the AWS (Amazon Web Services) S3 buckets, an EC2 Spark AMI and EMR cluster creation; connected via PuTTY using AWS private-key credentials; and configured IAM security access roles, Apache Spark and related libraries.
  • Used libraries such as scikit-learn and matplotlib for smaller machine learning projects.
  • Designed and coded the initial feed on a RESTful API-based data delivery mechanism using Python Django and Flask.
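The ETL steps described above (read XML/JSON, clean, sort, write to a flat file) can be sketched with the Python standard library alone. This is a minimal illustration, not the original PySpark job: the record layout (`trade_id`, `symbol`, `qty`) and all function names are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

FIELDS = ["trade_id", "symbol", "qty"]  # hypothetical record layout


def read_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def read_xml_records(text):
    """Turn each <trade> element's children into a dict (hypothetical layout)."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in trade} for trade in root.iter("trade")]


def clean(records):
    """Drop rows without a trade_id, trim/upper-case symbols, cast qty to int."""
    cleaned = []
    for rec in records:
        if not rec.get("trade_id"):
            continue
        cleaned.append({
            "trade_id": str(rec["trade_id"]).strip(),
            "symbol": str(rec.get("symbol") or "").strip().upper(),
            "qty": int(rec.get("qty") or 0),
        })
    return cleaned


def write_csv(records, out):
    """Write cleaned records to a flat file (any text stream)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


json_feed = '[{"trade_id": "T2", "symbol": " ibm ", "qty": "100"}, {"symbol": "x"}]'
xml_feed = ("<trades><trade><trade_id>T1</trade_id><symbol>aapl</symbol>"
            "<qty>50</qty></trade></trades>")

rows = clean(read_json_records(json_feed) + read_xml_records(xml_feed))
rows.sort(key=lambda r: r["trade_id"])
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

In a PySpark version the same read, filter, cast and sort steps would be expressed as DataFrame transformations; the standard-library form simply keeps the sketch dependency-free.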

Confidential, New York

Enterprise Data Architect/Sr Manager/GCP Consultant

Environment: ER/Studio Data Architect 9.5, UDB DB2 v9, Oracle 11g, Informatica PowerCenter 9.2, Tidal, Linux, Informatica Metadata Manager & Business Glossary 9.1, Hadoop, Cloudera

Responsibilities:

  • Engaged with clients and met senior management (CXO/CIO level) alongside Sales and Accounts leads, helping clients address data-related issues.
  • Eli Lilly, a major pharmaceutical company based in Indianapolis, had difficulty identifying its data touch points and sources: we proposed and delivered complete end-to-end data lineage, identified the master sources of all data, assigned data owners, and created a complete plan to maintain the refined design.
  • JPMC: we designed and delivered the reference data architecture and its components.
  • Gilead Sciences needed help managing a large volume of research and pharmaceutical trial data. We designed a Hadoop-based solution with an Apache Pig/Hive access layer and Apache Sqoop- and Flume-based ingestion, coded a major part of the project with our offshore partners, and gave the client manager daily progress updates. We also demonstrated NLTK-based solutions.
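Attribute-level lineage like that delivered for the pharmaceutical client can be modelled as a directed graph from each target attribute to the attributes it is derived from; walking the graph upstream yields the root sources, which are the candidates for designation as master sources. A minimal sketch, with a hypothetical `LINEAGE` mapping and attribute names:

```python
from collections import deque

# Hypothetical lineage edges: target attribute -> attributes it is derived from.
LINEAGE = {
    "report.patient_count": ["mart.patient_dim.patient_id"],
    "mart.patient_dim.patient_id": ["stage.patients.id"],
    "stage.patients.id": ["crm.patient.patient_id", "trials.subject.subject_id"],
}


def master_sources(attribute, lineage):
    """Breadth-first walk upstream; attributes with no parents are root sources."""
    sources, seen = set(), {attribute}
    queue = deque([attribute])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents:
            sources.add(current)
        for parent in parents:
            if parent not in seen:  # guards against cycles in the mapping
                seen.add(parent)
                queue.append(parent)
    return sources


print(sorted(master_sources("report.patient_count", LINEAGE)))
```

Storing lineage as edges per attribute, rather than per table, is what allows tracing a single report field back to the individual source columns.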

Confidential, Jersey City, NJ

Solution Data Architect/Consultant

Environment: ER/Studio Data Architect 9.5, UDB DB2 v9, Oracle 11g, Informatica PowerCenter 9.2, Tidal, Linux.

Responsibilities:

  • Working with multiple analysts who were documenting the data elements for various BO (Business Objects) reports, we created the data subject areas.
  • Designed the banking data model with another architect and client SMEs, through several iterations and walk-throughs. It followed the classic IBM party model, with Party, Arrangement, Location, Condition, Classification, Resource, Events, etc.
  • Guided Informatica ETL developers on component design and transformations as we released feed after feed into the data warehouse over continuous sprint cycles and project releases.
  • Next, I designed the reporting data marts: mostly dimensional, with 70+ dimensions and 20+ facts covering subject areas such as Deposit, Loans, Customer…
  • Designed the batch-process metadata repository covering scheduled job creation, job names, job consumers, daily feed names, feed status, DQ rule threshold failures, email alerts, etc.
  • BO reporting required strict access control based on bank employees' roles and responsibilities. I worked with our enterprise security team and intranet development team to tap into the HR change message streams and capture them in near real time in our database, which the BO reports then used to update entitlements.
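The batch metadata repository described above records DQ rule threshold failures. A minimal sketch of how such a check might be evaluated for a feed before it is marked as loaded; the `DQRule` type, rule names, thresholds and sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DQRule:
    name: str
    check: Callable[[Dict], bool]  # returns True when a row passes
    max_failure_rate: float        # fraction of rows allowed to fail


def evaluate(rows: List[Dict], rules: List[DQRule]) -> List[str]:
    """Return a message for every rule whose failure rate exceeds its threshold."""
    breaches = []
    for rule in rules:
        failures = sum(1 for row in rows if not rule.check(row))
        rate = failures / len(rows) if rows else 0.0
        if rate > rule.max_failure_rate:
            breaches.append(f"{rule.name}: {failures}/{len(rows)} rows failed")
    return breaches


rules = [
    DQRule("account_id_not_null", lambda r: r.get("account_id") is not None, 0.0),
    DQRule("balance_non_negative", lambda r: (r.get("balance") or 0) >= 0, 0.01),
]
feed = [
    {"account_id": "A1", "balance": 100.0},
    {"account_id": None, "balance": 50.0},
    {"account_id": "A3", "balance": -5.0},
]
for breach in evaluate(feed, rules):
    print(breach)  # in a metadata repository, a breach would be logged and alerted on
```

Keeping the threshold on the rule, rather than failing on the first bad row, lets tolerant rules (such as the 1% one here) coexist with zero-tolerance rules for critical data elements.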

Confidential

Head, Data Warehouse Team, Senior VP

Environment: Netezza TwinFin, Oracle 11g, UDB DB2, Informatica 8.6/9.x, Autosys, Linux, Java/.Net

Responsibilities:

  • Worked with the NJ-based Equity Data Warehouse Director and his JC teams to define the roles and responsibilities to be built locally.
  • Conducted hiring drives and hired the best candidates, both experienced and fresh out of school.
  • Ran knowledge sessions on the existing landscape and assigned developers to each role to retain expertise.
  • Built the local teams: architecture, data modeling, Linux and Business Objects development, Oracle database development and Java development.
  • Supported regulatory reports including the OATS Execution Report (trade capture), Riskless Principal OTC Report, Best Execution Report, Market Order Timeliness (MOT) Report, Reporting Firm 10 Second Compliance Report and Reg NMS Trade-Through Report, among others.
  • Built and expanded the global Equity Netezza data warehouse, with daily data synchronization between the Asia-Pacific (Singapore), London and NY regional databases.
  • Created plans for primary/secondary production support for the global data warehouse and served as the escalation point.
  • Built ETL flows into Netezza hands-on as part of the team, drawing on my Netezza certification and expertise.
  • Used JIRA for Scrum work management, Confluence, and Chef (recipes, cookbooks) for infrastructure management.

Confidential

Technical Program Manager, Data Architecture (Senior VP)

Environment: Oracle 11g, Informatica 8.6, Linux, Autosys, Java, J2EE

Responsibilities:

  • Worked closely with senior management to understand short-term and long-term plans and objectives, and gave frequent updates for feedback.
  • Worked with business SMEs and managers in the London office, guiding them as we built expertise at our Gurgaon location; travelled frequently to London to meet customers.
  • Managed 45+ team members across multiple locations, handling team issues, promotions, pay, reviews, ratings, etc.; hired as needed for roles from PM to developer.
  • Built the Architecture & Design team that was instrumental in delivering the ETL redesign for our Credit Risk platform.
  • Provided guidance on database design, ETL design, data capture design, performance and SQL tuning, test cases, test strategy, deployment strategy, run books for support and implementations, project documentation and best practices.
  • Data feeds delivered included customer data, collateral, loss data, credit data, market ratings and core banking products.
  • The Basel II credit risk engine calculated VaR (Value-at-Risk), EAD (Exposure at Default), PD (Probability of Default), LGD (Loss Given Default), RWA (Risk-Weighted Assets), etc.
  • Reports we generated for the business included:
      • Credit Risk Assessment, Credit Risk Exposure Analysis, Customer Credit Risk Profile
      • Liquidity Position Analysis and Liquidity Retail Funding Risk
      • Best Execution Analysis, Sarbanes-Oxley Act (SOX) Analysis and SOX Balance Sheet Analysis
      • Liquidity Risk Analysis, etc.
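For reference, the standard Basel II relationships among the credit risk measures listed above: expected loss (EL) multiplies PD, LGD and EAD, and under the IRB approach risk-weighted assets are 12.5 times the capital requirement K (expressed per unit of exposure) times EAD. These are the textbook Basel II formulas, not a description of that engine's internals, and the worked numbers are purely illustrative.

```latex
\begin{aligned}
\mathrm{EL}  &= \mathrm{PD} \times \mathrm{LGD} \times \mathrm{EAD} \\
\mathrm{RWA} &= 12.5 \times K \times \mathrm{EAD} \\
\text{e.g. } \mathrm{PD} = 2\%,\ \mathrm{LGD} = 45\%,\ \mathrm{EAD} = 1{,}000{,}000:\quad
\mathrm{EL}  &= 0.02 \times 0.45 \times 1{,}000{,}000 = 9{,}000
\end{aligned}
```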

Confidential

Head, Data Management Platform Team

Environment: Oracle 10g, Informatica, Linux, Autosys, Perl, Java, Business Objects XI

Responsibilities:

  • Documented the "as-is" state: what data was available, what was needed, where the gaps were and who the users were.
  • Data gaps: identified the need for tick data and market data, which we eventually purchased from Confidential corp.
  • Defined the hardware platform architecture based on industry benchmarks: recommended Dell PowerEdge 2950 quad-core servers for a 2-node RAC, plus one such server as a standby at a remote location, on SAN RAID 10 storage. Implemented the failover standby with Oracle Data Guard 10g, ETL with Perl/shell scripting, and the Oracle backup/recovery strategy with RMAN scripts; in fact, I hand-coded most of it.
  • Designed the BI architecture: a 2-node BO cluster with BO SSO (Single Sign-On) user security and authentication via Windows Exchange LDAP.
  • BI folder access security: planned 6 folders and access roles based on groups (Finance, Marketing, Research, Sales, Compliance, IT).
  • Hired, over time, a team of 9 developers, a BO XI admin and a DBA as we designed, developed and executed the projects.
  • We used a PostgreSQL database, shared with our C++ front-end system, which captured FIX messages for dark-pool trade orders/executions arriving from brokerages and clearing houses.
  • Put Oracle backups and Oracle Enterprise Manager monitoring alerts and tuning in place; tested and implemented RMAN backups, and also tested block-level recoveries.
  • Planned and created the rules for project testing and deployment. Worked with the business team to create test plans and conducted system and UAT testing before any production deployment or change, to keep the environment under tight control.
  • Also created the L1/L2/L3 support model, with roles and responsibilities, run books and a heat-map design.
  • It was one of the best experiences.
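FIX messages like the dark-pool order/execution messages captured above are sequences of `tag=value` fields separated by the SOH character (`\x01`); tag 9 (BodyLength) counts the bytes between itself and tag 10, and tag 10 (CheckSum) is the byte sum of everything before it, modulo 256, as three digits. A minimal parsing sketch; the helper names are hypothetical, and the sample is a trimmed, made-up ExecutionReport that omits required header fields.

```python
SOH = "\x01"  # FIX field delimiter


def fix_checksum(text: str) -> str:
    """Byte sum modulo 256, zero-padded to three digits (FIX tag 10)."""
    return f"{sum(text.encode('ascii')) % 256:03d}"


def build_fix(fields):
    """Assemble a message from (tag, value) pairs, adding tags 8, 9 and 10."""
    payload = "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    # BodyLength (9) counts the bytes after its own delimiter up to tag 10.
    head = f"8=FIX.4.2{SOH}9={len(payload.encode('ascii'))}{SOH}"
    body = head + payload
    return f"{body}10={fix_checksum(body)}{SOH}"


def parse_fix(message: str) -> dict:
    """Split a raw FIX message into {tag: value} after verifying the checksum."""
    cut = message.rfind(SOH + "10=")
    if cut == -1:
        raise ValueError("missing checksum field")
    fields = {}
    for field in message.split(SOH):
        if field:
            tag, _, value = field.partition("=")
            fields[int(tag)] = value  # repeating groups would need a list per tag
    if fields[10] != fix_checksum(message[: cut + 1]):
        raise ValueError("checksum mismatch")
    return fields


# Trimmed ExecutionReport (35=8); header fields such as 49/56/34/52 are omitted.
raw = build_fix([(35, "8"), (55, "IBM"), (54, "1"), (32, "100"), (31, "125.50")])
msg = parse_fix(raw)
print(msg[35], msg[55], msg[32], msg[31])  # 8 IBM 100 125.50
```

Verifying tag 10 before storing a message is a cheap way to reject truncated or corrupted input; a production capture system would normally rely on a full FIX engine rather than a dict-based parser like this one.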

Confidential, New York

Technical Manager

Environment: Oracle 10g, Linux, Autosys, Perl, Java, Actimize

Responsibilities:

  • Spent the initial days getting to know the stakeholders: desk heads, compliance supervisors, RCG (Regulatory Control Group), data source teams and vendors' resource managers.
  • Studied the existing systems through multiple walk-through sessions, and met team members to create a matrix of skills, backgrounds, interests and working relationships.
  • Working with management, I then presented my requirements for resources to be transitioned to the new team: NY FTEs, Kean contractors, Confidential contractors in India, Confidential employees in Mumbai, etc.
  • For BAU development and support, I created a weekly on-call rota of primary/secondary production support developers, with myself as the escalation point.
  • Continued development of existing regulatory reports: Rule 606, OATS, Rule 92, Employee Trading, ATS-R, Reg NMS, 1% Market Volume, Short-Sell Locate, Pink Sheets, 5% market making and other NYSE, NASD, AMEX and TSX reporting.
  • Held regular meetings with project stakeholders (compliance and RCG teams) and onsite and offshore engineers to walk through development progress, testing strategies, test results, milestones achieved, issues and concerns, new change requests from users, estimates and project impacts, risks and mitigations, regulatory audits and new projects.
  • Hired staff at vendor locations and worked with vendors on contracts and SLAs.
  • Within about 10-11 months we successfully separated our systems, ring-fencing the surveillance systems with separate login IDs, servers, test and QA environments, repositories, database schemas, Unix environments and schedulers.
  • Later I created a production support model to outsource Level 1 and 2 support to Kean in Canada, as part of a low-cost negotiation that saved the bank money. This involved creating the run book, heat map and instructions, training the support staff and remaining on standby for 6-8 weeks while they came up to speed.
  • Once the separation and support structure were in place, our team moved across business units to the Global Legal IT and Compliance team under a Managing Director.
  • In the new group, a much smaller Equities Trade Surveillance team under a VP delivered using the Actimize tool, and management decided to migrate all our reports to Actimize. Given that decision and cost-conscious management, I decided to look for greener pastures elsewhere.

Confidential, New York

Team Lead & Data Warehouse Architect

Environment: Oracle 9i, Ab Initio, Linux, Autosys, Perl, Java

Responsibilities:

  • Spent the initial days reading design documents and learning the OLTP databases, data, tools, user requirements, issues and concerns, and wish lists.
  • Proposed a POC (proof of concept) comparing Sybase IQ and Oracle, and secured a Sun Fire 25K machine with 500 GB of SAN space, on which I created 2 partitions, one for Oracle 9i and one for Sybase IQ. Installed the server software and drivers; created the test plan and test scripts/SQL; loaded production data, multiplying it many times to reach test volumes; ran the tests, captured the performance metrics and presented them to management. We chose Oracle 9i as our strategic database.
  • Data warehouse design: proposed the Kimball dimensional schema for the Market Data DB, which was ultimately implemented. We used SCD (Slowly Changing Dimension) Type 2 for historical data retention, daily partitioning, and local/global and bitmap indexes to handle a daily feed of 20-30+ GB.
  • Historical data were stored in separate read-only tablespaces.
  • Data sources: Bloomberg and Reuters market data for exchanges worldwide (North America, Asia1 and Asia2, Latin America, Europe), captured throughout the day as each time zone closed.
  • ETL load processes for the market data feed and other internal feeds were built as Ab Initio GDE graphs.
  • Metadata was maintained in the Ab Initio Enterprise Meta>Environment (EME), which stores both business and technical metadata. EME metadata can be accessed from the Ab Initio GDE, a web browser or the Ab Initio Co>Operating System command line (air commands).
  • The Co>Operating System, provided by Ab Initio, runs on top of the native operating system and is the base for all Ab Initio processes.
  • For change requests, we checked graphs out of the EME datastore into individual sandboxes and ran them there, locking a graph before making any change. Used various types of parallelism (pipeline, data, component), along with Aggregate and Rollup, Reformat, SCP/FTP, Updater, Lookup File, Intermediate File and Run SQL components, running in phases. It is one of the best ETL tools and among the simplest to develop on.
  • ETL load: data were loaded both end-of-day and intraday, leaving a very small maintenance window.
  • We classified each piece of project work as minor or major, with a defined implementation cycle ranging from 30 days to 3-6 months. Every project started with a BRD (Business Requirements Document), a kickoff and agreed estimates, then went through functional and technical specs, build, unit testing, UAT, sign-off and implementation.
  • As part of implementation planning, we held many rounds of meetings with the production DBA and Unix support teams to secure their support and time, and built the Level 1 run book.
  • Post-implementation, we were responsible for Level 2 and 3 support for any production failures or data issues.
  • We delivered the Oracle Market Data warehouse and the ETL processes to load data intraday and at EOD, then built the ETL processes to deliver data to user groups in CSV, TXT, XML and DAT formats.
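Type 2 history retention in a dimensional warehouse closes the current row of a dimension and inserts a new version whenever a tracked attribute changes. A minimal in-memory sketch of that idea; the instrument `XYZ`, the columns (`instrument_id`, `exchange`, `valid_from`, `valid_to`, `is_current`) and the `HIGH_DATE` sentinel are assumptions for illustration, not the warehouse's actual design.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed sentinel for the open-ended current row


def apply_scd2(dimension, incoming, load_date):
    """Slowly Changing Dimension Type 2: expire changed rows, append new versions."""
    current = {row["instrument_id"]: row for row in dimension if row["is_current"]}
    for rec in incoming:
        existing = current.get(rec["instrument_id"])
        if existing is not None and existing["exchange"] == rec["exchange"]:
            continue  # tracked attribute unchanged: keep the current version
        if existing is not None:
            existing["valid_to"] = load_date  # close the old version
            existing["is_current"] = False
        new_row = {
            "instrument_id": rec["instrument_id"],
            "exchange": rec["exchange"],
            "valid_from": load_date,
            "valid_to": HIGH_DATE,
            "is_current": True,
        }
        dimension.append(new_row)
        current[rec["instrument_id"]] = new_row


    return dimension


dim = apply_scd2([], [{"instrument_id": "XYZ", "exchange": "NYSE"}], date(2010, 1, 4))
dim = apply_scd2(dim, [{"instrument_id": "XYZ", "exchange": "NASDAQ"}], date(2010, 6, 1))
for row in dim:
    print(row["exchange"], row["valid_from"], row["valid_to"], row["is_current"])
```

Whether the expired row's `valid_to` equals the new load date or the day before is a convention choice; this sketch uses the load date. In a relational warehouse the same logic is typically an UPDATE of the current row plus an INSERT of the new version.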
