Sr. ETL/Big Data Lead Resume
NYC, NC
SUMMARY:
- Confidential is a Big Data, Data Warehouse, Data Governance, and ETL Subject Matter Expert/Architect and Developer with over 12 years of professional IT experience in the Healthcare, Banking/Finance, Telecom, and Internet domains.
- He has extensive experience in Big Data technologies, Analytics, Data Integration, Data Architecture and Modeling, Data Quality, Data Governance, Master Data Management (MDM), Data Warehousing, Business Intelligence, and Applications Architecture, with specific expertise in:
- Big Data (Hadoop stack); Data Warehousing/BI/ETL; Informatica Big Data Manager (BDM)
- Data Architecture; Analytics; Informatica Data Quality (IDQ); Informatica PowerCenter; Oracle, SQL Server, DB2; Netezza, Teradata
- Informatica Cloud; Data Modelling; UNIX Scripting
- Visualization (Tableau); Python, R; Amazon S3
PROFESSIONAL EXPERIENCE:
Sr. ETL/Big Data Lead
Confidential, NYC, NC
Responsibilities:
- Delivered Amazon AWS- and Cloudera-based Big Data, Data Integration, and Analytics products
- Performed in-depth analysis and ingestion into Hadoop of clinical, M2Gen, and TruVen data to support analytical models and cohort building
- Created best practices and standards for design documents, source-to-target (S2T) mappings, unit testing, performance testing, capacity-planning documentation, and a production-support response strategy
- Designed and developed an enterprise data lake comprising Landing (Amazon S3 buckets), Raw (HDFS and Hive tables), Refined (Hive tables), and Analytical (Hive tables) zones, feeding Informatica Intelligent Data Lake (IDL) for cataloging and tagging enterprise data assets
- Designed and created data models (relational, dimensional, and flattened) for the Raw, Refined, and Analytical zones of the Hadoop data lake using Erwin
- Designed the ETL/Data Integration engine to meet the highest standards for data quality, performance, scalability, and modularity using BDM Developer, data profiling tools, and scripting
- Used Informatica Analyst for data profiling and scorecarding to surface data quality issues early, then developed Informatica BDM jobs/workflows to operationalize data delivery to data scientists and cancer research teams
- Used the Informatica Amazon S3 connector to source data from the landing zone and load it into Hive tables in the data lake
- Created dynamic BDM mappings to handle dynamic schemas on read and write
- Executed BDM jobs on various execution engines, including Spark, Hive/MapReduce, and Blaze
- Delivered products in a fast-paced Agile (Scrum) environment with three-week sprints
- Used Atlassian products: Jira for task management and Bitbucket for code versioning
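The dynamic-schema handling described above (new columns appearing in landing-zone data as it moves into the lake) can be sketched in plain Python. This is a minimal illustration of schema-on-read normalization, not Informatica BDM itself; the field names and sample records are hypothetical.

```python
# Minimal sketch of schema-on-read normalization: records arriving from a
# landing zone may carry different (dynamic) schemas, so we pad each record
# to the union of all observed fields before loading a refined-zone table.
# Field names here are illustrative assumptions, not real clinical columns.

def union_schema(records):
    """Collect the union of all field names seen across records, in order."""
    fields = []
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)
    return fields

def normalize(records):
    """Pad each record to the union schema, filling missing fields with None."""
    fields = union_schema(records)
    return [{f: rec.get(f) for f in fields} for rec in records]

landing = [
    {"patient_id": 1, "dx_code": "C50"},
    {"patient_id": 2, "dx_code": "C61", "site": "lung"},  # new column appears
]
refined = normalize(landing)
```

In a real BDM dynamic mapping the engine resolves ports at run time; this sketch only shows the shape of the problem the mapping solves.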
Design Lead
Confidential, NYC
Responsibilities:
- Owned strategy, roadmap, and delivery of cloud-based and on-premise Data Integration, Big Data, and Analytics products related to Claims (provider referral patterns and leakage to out-of-network sites), Provider, Patient, Consumer (propensity/predictive modeling), Call Center, Digital Marketing, and ROI
- Defined, led, and implemented the architecture of data products, including reporting dashboards (Tableau), call center data, and Experian consumer data
- Used IDQ Match, Consolidation, Association, and Address Doctor transformations to master patient/consumer and practice data, then performed analytics for targeted marketing of healthcare services to potential consumers
- Worked with Health link, IMS, Symphony, and Optum healthcare data assets to build next-generation analytics products
- Architected patient/provider Master data management initiatives using Informatica MDM.
- Designed onboarding of Big Data technologies (Hadoop ecosystem) for claims data (provider referral patterns) using Hive, Pig, Mahout, Spark SQL, Sqoop, Impala, and the Hue UI
- Designed a Hadoop data-ingestion engine to store batch (RDBMS via Sqoop/ETL) and file (SFTP via ETL tools) data in the Hadoop data lake
- Created reusable mappings in Informatica BDM (Big Data Manager) version 10.0 for various clients using the dynamic schema functionality, and researched the SQL-to-mapping offering in BDM version 10.1
- Expert in implementing and performance-tuning Informatica Developer (IDQ) transformations such as Match (fuzzy-match algorithms to master consumer/patient data), Consolidation, Association, Address Doctor (to cleanse, standardize, and score address data), and change data capture (CDC)
- Performed hands-on data analysis using Informatica Analyst (data profiling, scorecarding), Hive, and Spark SQL
- Contributed to growing the startup from a small firm into a multi-million-dollar company
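The mastering work above hinges on fuzzy matching of patient/consumer names. As a minimal sketch, the same idea can be shown with the standard library's `difflib` in place of IDQ's Match transformation; the 0.85 threshold and record fields are illustrative assumptions, not IDQ defaults.

```python
# Minimal sketch of fuzzy name matching for mastering patient records,
# using stdlib difflib rather than IDQ's Match transformation. The
# threshold (0.85) and the "name"/"id" fields are hypothetical choices.
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_candidates(record, master, threshold=0.85):
    """Return master records whose name is a likely fuzzy match."""
    return [m for m in master
            if similarity(record["name"], m["name"]) >= threshold]

master = [{"id": 1, "name": "Jonathan Smith"},
          {"id": 2, "name": "Maria Garcia"}]
hits = match_candidates({"name": "Jonathon Smith"}, master)  # typo variant
```

A production match step would combine several strategies (phonetic keys, address, date of birth) and feed survivors into a Consolidation step; this sketch shows only the core similarity test.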
Sr. ETL Consultant
Confidential, DE
Responsibilities:
- Analyzed data models and created proofs of concept to prepare high-level and low-level design documents
- Designed the ETL/Data Integration engine by meeting the highest standards for Data quality, performance, scalability and modularity
- Analyzed the data models of the source & target systems to develop comprehensive mapping specifications
- Coordinated with the offshore team on a day-to-day basis
- Coordinated with the business and products team to understand the system requirements
- Worked extensively with dimensional tables to create SCD Type 1 and Type 2 mappings using Informatica
- Developed the performance tuning and ETL error Handling Strategy
- Designed and developed ETL processes to load and extract data using Teradata BTEQ, FastLoad (FLOAD), MultiLoad (MLOAD), and FastExport
- Performed Informatica administrator functions such as installation, repository creation, managing groups, users, and folders, and code deployments
- Wrote PL/SQL packages, procedures, and functions to implement various business functionalities
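The SCD Type 2 mappings mentioned above follow a standard pattern: when a tracked attribute changes, the current dimension row is expired and a new versioned row is inserted. A minimal pure-Python sketch of that logic (column names, the tracked `city` attribute, and dates are hypothetical; the real implementation lived in Informatica mappings):

```python
# Minimal sketch of SCD Type 2 change handling: expire the current row and
# append a new version when a tracked attribute changes. Column names and
# dates are illustrative assumptions, not the actual warehouse schema.
def apply_scd2(dimension, incoming, load_date):
    """Expire changed rows and append new versions (Type 2 history)."""
    for row in incoming:
        current = next((d for d in dimension
                        if d["cust_id"] == row["cust_id"] and d["is_current"]),
                       None)
        if current is None:  # brand-new key: insert as the current version
            dimension.append({**row, "eff_date": load_date,
                              "end_date": None, "is_current": True})
        elif current["city"] != row["city"]:  # tracked attribute changed
            current["end_date"] = load_date   # close out the old version
            current["is_current"] = False
            dimension.append({**row, "eff_date": load_date,
                              "end_date": None, "is_current": True})
    return dimension

dim = [{"cust_id": 7, "city": "Dover", "eff_date": "2019-01-01",
        "end_date": None, "is_current": True}]
dim = apply_scd2(dim, [{"cust_id": 7, "city": "Newark"}], "2019-06-01")
```

In Informatica this is typically built with a Lookup on the dimension, an Expression/Router to classify rows as new vs. changed, and separate Update Strategy paths for the expire and insert branches.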