
Sr. ETL/Big Data Lead Resume

NYC, NC


  • Confidential is a Big Data, Data Warehouse, Data Governance and ETL Subject Matter Expert, Architect and Developer with over 12 years of professional IT experience in the Healthcare, Banking/Finance, Telecom and Internet domains.
  • He has extensive experience in Big Data technologies, Analytics, Data Integration, Data Architecture and Modeling, Data Quality, Data Governance, Master Data Management (MDM), Data Warehousing, Business Intelligence and Applications Architecture, with specific expertise in:
  • Big Data (Hadoop Stack), Data Warehousing/BI/ETL, Informatica Big Data Manager (BDM)
  • Data Architecture, Analytics, Informatica Data Quality (IDQ), Informatica PowerCenter
  • Oracle, SQL Server, DB2, Netezza, Teradata
  • Informatica Cloud, Data Modeling, UNIX Scripting
  • Visualization (Tableau), Python, R, Amazon S3


Sr. ETL/Big Data Lead

Confidential, NYC, NC

  • Delivered Amazon AWS and Cloudera based Big Data, Data Integration and Analytics products
  • Performed in-depth analysis and ingestion of clinical, M2Gen and TruVen data into Hadoop to support analytical models and cohort building
  • Created best practices and standards for design documents, source-to-target (S2T) mapping, unit testing, performance testing, capacity planning documentation and the production support response strategy
  • Designed and developed an enterprise data lake comprising Landing (Amazon S3 buckets), Raw (HDFS and Hive tables), Refined (Hive tables) and Analytical (Hive tables) zones for Informatica Intelligent Data Lake (IDL), to catalog and tag enterprise data assets
  • Designed and created data models (relational, dimensional and flattened) for the Raw, Refined and Analytical zones of the Hadoop data lake using Erwin
  • Designed the ETL/Data Integration engine to meet the highest standards for data quality, performance, scalability and modularity using the BDM Developer tool, data profiling tools and scripting
  • Used Informatica Analyst for data profiling and scorecarding to surface data quality issues early, then developed Informatica BDM jobs/workflows to operationalize data delivery to data scientists and the cancer research teams
  • Used the Amazon S3 Informatica connector to source data from the landing zone and load it into Hive tables in the data lake
  • Created dynamic BDM mappings to handle dynamic schemas on read and write
  • Executed BDM jobs on various execution engines such as Spark, Hive/MapReduce and Blaze
  • Delivered products in a fast-paced, volatile Agile (Scrum) environment with 3-week sprints
  • Used Atlassian products such as Jira for task management and Bitbucket for code versioning

Design Lead

Confidential, NYC

  • Strategy, roadmap and delivery of cloud-based and on-premise Data Integration, Big Data and Analytics products related to Claims (provider referral patterns and leaks to out-of-network sites), Provider, Patient, Consumer (propensity/predictive modeling), Call Center, Digital Marketing and ROI
  • Defined, led and implemented the architecture of data products including the reporting dashboards (using Tableau), call center and Experian consumer data
  • Used IDQ Match, Consolidation, Association and Address Doctor transformations to master patient/consumer and practice data, then performed analytics for targeted marketing of healthcare services to potential consumers
  • Worked on Health link, IMS, Symphony and Optum healthcare data assets to build next-generation analytics products
  • Architected patient/provider Master Data Management initiatives using Informatica MDM
  • Designed the onboarding of Big Data technologies (Hadoop ecosystem) for claims data (provider referral patterns) using Hive, Pig Latin, Mahout, SparkSQL, Sqoop, Impala and the Hue UI
  • Designed a Hadoop data ingestion engine to load batch (RDBMS via Sqoop/ETL) and file (SFTP via ETL tools) data into the Hadoop data lake
  • Created reusable mappings using Informatica BDM (Big Data Manager) version 10.0 for various clients using the Dynamic Schema functionality, and researched the SQL-to-mapping offering in BDM version 10.1
  • Expert in implementing and performance tuning the Informatica Developer tool's IDQ transformations such as Match (fuzzy match algorithms to master consumer/patient data), Consolidation, Association, Address Doctor (to clean, standardize and score address data) and change data capture (CDC)
  • Performed hands-on data analysis using Informatica Analyst (data profiling, scorecarding), Hive and SparkSQL
  • Contributed to growing the startup from a small company into a multi-million business

Sr. ETL Consultant

Confidential, DE

  • Analyzed data models and created proofs of concept to prepare high-level and low-level design documents
  • Designed the ETL/Data Integration engine to meet the highest standards for data quality, performance, scalability and modularity
  • Analyzed the data models of the source and target systems to develop comprehensive mapping specifications
  • Coordinated with the offshore team on a day-to-day basis
  • Coordinated with the business and product teams to understand the system requirements
  • Worked extensively with dimensional tables to create SCD Type 1 and Type 2 mappings using Informatica
  • Developed the performance tuning and ETL error-handling strategy
  • Designed and developed ETL processes to load and extract data using the Teradata utilities BTEQ, FastLoad (FLOAD), MultiLoad (MLOAD) and FastExport
  • Performed Informatica Administrator functions such as installation, repository creation, management of groups, users and folders, and code deployments
  • Wrote PL/SQL packages, procedures and functions to implement various business functionality
