- 10+ years of experience architecting, installing, configuring, analyzing, designing, integrating, re-engineering, and developing highly sophisticated software systems with databases at their center, including 3.5+ years in the Big Data space and 8+ years in Advanced Analytics, Data Warehousing, and Business Intelligence.
- Excellent understanding of Big Data technologies, analytics, data warehousing concepts, business intelligence, cloud platforms, and support. Demonstrable knowledge of Hadoop, Spark, MapReduce (MR), HDFS, HBase, Hive, Sqoop, Flume, Ambari, Oozie, Impala, Apache Phoenix, Knox, Kylin, and DataFu.
- Installed and configured Hadoop in the cloud, creating multi-node clusters on AWS EC2 and Azure. Good knowledge of and experience with RHEL and Ubuntu Linux, and scripting in Pig and Shell. Experienced with the Hortonworks, Cloudera, and IBM BigInsights distributions. Pursuing knowledge of HDInsight.
- Extensively worked on IBM BigInsights 4.2.1, using BigSQL and IBM Big Integrate and Big Quality per client requirements. Worked on improving BigSQL performance, with an understanding of analyzing statistics, WLM capabilities, access plans, the optimizer, and benchmarks in BigSQL.
- Hadoop cluster environment administration, including adding and removing nodes from the cluster. Excellent understanding of the NameNode, DataNode, Secondary NameNode, YARN (ResourceManager, NodeManager, WebAppProxy), the MapReduce Job History Server, and other Hadoop ecosystem components.
- Good awareness of Elasticsearch, MongoDB, Teradata, and IBM PureData (Netezza).
- Involved in pre-sales activity and product evaluation; can provide recommendations for the technology stack and target architecture within the Hadoop ecosystem.
- Experience in Data Warehousing (ETL/ELT), developing and designing applications using Tableau and IBM DataStage. Good understanding of OLTP and OLAP concepts.
- Project management experience: worked across the project phases of Initiating, Planning, Executing, and Monitoring & Controlling, mostly on Big Data projects in a wide range of domains/applications.
- Hands-on with project schedule, risk, resource, scope, and integration management. Experience with the SDLC (Software Development Life Cycle) and Agile/Scrum processes.
- Stay current on emerging tools, techniques, and technologies. Strong knowledge of dimensional data modeling (Star and Snowflake schemas), of designing/modeling tools such as Embarcadero, Erwin, and PowerDesigner, and of the techniques defined by Bill Inmon and Ralph Kimball.
BIG DATA FRAMEWORKS: Hortonworks (HDP 2.1), Cloudera (CDH 5, Impala 1.3), IBM BigInsights 4.2
ETL TOOLS: IBM DataStage, SSRS, SSAS, SSIS, Informatica
NoSQL: HBase, Cassandra, MongoDB
BI Tools: Tableau, Power BI, MSBI, QlikView
BIG DATA TOOLS & TECHNOLOGIES: Hadoop 2.4.6 (HDFS, MapReduce), Pig, Hive, Spark, Ambari, Flume, Sqoop, Kafka, Storm, Knox, Oozie, Ganglia, ZooKeeper, Kylin, etc.
Analytics: Statistical Modelling, Predictive Analytics, Machine Learning, R
RDBMS: Confidential 9.x/10.x/11.x/12c, MS-SQL Server 2016, MySQL, H2, Netezza
OS: Windows, Linux/Unix (with Kerberos security)
CLOUD: AWS, Azure and private cloud
Mainframes: COBOL, JCL, VSAM, CICS
Data Modeling: Dimensional Data Modeling, Star Schema Modeling, Snowflake Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling
Languages / Tools: Java, Scala, Python, Eclipse, Xpediter, Changeman, File Manager, Sysdebug, MS Visio
Main Technologies: Technology Stack - SQL Server 2012, Confidential 12C, SFTP
- Define and document data architecture and technical specifications for enterprise data assets, working with business and other stakeholders
- Source data analysis and validation of data integrity norms
- Design the data lake schema for raw storage of initial and incremental data loads; specifications are maintained on Confluence
- Develop data marts, along with Confluence documentation, considering data lineage and technical data definitions, using the Bill Inmon and Ralph Kimball methodologies
- Design and create reports and supporting database objects, helping with data discovery, advanced analytics, and standard enterprise reporting needs
- Design and maintain a job execution framework for NiFi
- Performance enhancements: query/process tuning for workloads handling large data volumes
Main Technologies: Technology Stack - IBM Datastage 11.0, SQL Server 2012, Netezza 7.0.x, Confidential 12C, InfoSphere Data Architect 126.96.36.199, SQL Server Integration Services 2012, Tibco
- Requirement gathering - identify reporting needs and KPIs to map the client's source system to the IBM data model; scope the AWM and DWH for reporting requirements
- Source analytics - analysis of existing source systems, validation of data integrity norms, reverse engineering of source databases, and creation of DFDs and a data dictionary.
- Documentation - Create conceptual, logical and physical data models with associated metadata using Ralph Kimball technique. Developed standard vocabulary, Business Data Model (BDM), Atomic Warehouse Model (AWM) and the Dimensional Warehouse Model (DWM) to fit unique requirements using IDA and business glossaries.
Main Technologies: IBM technology stack - Netezza, Datastage 11.5, InfoSphere Data Architect (IDA)
Hadoop Technology stack - Hbase, Hive, BigSQL
Hadoop Distribution - IBM BigInsights
- Member of overall architecture team, leading database architecture
- Defined data modeling policies, standards, data quality, object naming conventions, definitions, security and formatting of data
- Worked with business and IT stakeholders on-site to define and document the data architecture for enterprise data assets
- Developed and designed standard Vocabulary, Business Data Model (BDM), Atomic Warehouse Model (AWM) and the Dimensional Warehouse Model (DWM) to fit unique requirements using IDA and business glossaries
- Designed the DataStage job execution framework (sequences, exception handling, restart from point of failure, notifications, etc.)
- Created technical specifications for ETL and reporting teams
- Developed/tuned complex jobs handling large volumes of data (billions of records)
- Assessed the performance implications of accessing HBase/Hive tables through DataStage with Hadoop operating on Tez vs. MapReduce
Main Technologies: Hadoop Technology Stack - Impala
Secondary Technology Stack - SQL Server, Shell scripting, VB.NET, IIS 7, Git, and the Google Charts API
Hadoop Distribution - Cloudera
- Requirement gathering - Identify reporting needs and KPIs to map client’s source system
- Designed the overall project architecture, detailing the components used and listing their limitations, etc.
- Source system analysis - Validation of data integrity norms, data quality analysis and creation of data health report for various source systems
- Documentation - creation of DFDs, a data dictionary, and logical and conceptual models for business process insights
- Dashboard designing - Designed required dashboards by analyzing claims data
- Led a team of 5 developers
- Developed and implemented Dashboards as per requirements
Main Technologies: Hadoop Technology Stack - Apache MapReduce, Oozie, HBase, Hive, Phoenix, and Spark
Hadoop Distribution - Hortonworks
- DB Architect and R&D Lead (DB activities)
- Member of the overall architecture team; designed the data lake and defined data read patterns for the required use case
- Extracted data capture points from standard HL7 and CCD messages eligible for storage as per use case
- Led and mentored the development team implementing the data ingestion module
- Infra setup - responsible for setting up and maintaining the Hortonworks cluster on Azure and internal networks
- Worked on various critical Big Data implementations
- HBase region hotspotting
- Performance implications of Phoenix vs. Hive (on Tez), and Spark vs. MapReduce
- Sqoop vs. Flume
- Custom implementation in Impala
- Integration with a Kafka module relaying data to HBase/HDFS via messages in Talend
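A common mitigation for the HBase region hotspotting noted above is salting row keys so that sequential writes (e.g. timestamp-prefixed keys) spread across pre-split regions instead of piling onto one "hot" region. A minimal Python sketch of the idea; the bucket count and key format here are hypothetical illustrations, not from the original project:

```python
import hashlib

NUM_BUCKETS = 16  # assumed number of salt buckets / pre-split regions


def salted_row_key(row_key: str) -> str:
    """Prefix the key with a stable hash-derived salt so sequential
    keys land in different buckets (regions) while reads can still
    recompute the same salt to locate a key."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}_{row_key}"


# Sequential timestamp keys now scatter across bucket prefixes:
keys = [salted_row_key(f"2016-01-01T00:00:{s:02d}") for s in range(5)]
```

The trade-off is that full-range scans must fan out across all bucket prefixes, so salting suits write-heavy, point-read workloads.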
Main Technologies: Technology Stack - Core Java, Azure Cloud, SQL Server
Hadoop Distribution: Hortonworks, Cloudera
- DB Architect and R&D Lead for cloud infrastructure
- Member of the overall architecture team; designed a cloud-based deployment architecture considering security and performance
- Worked closely with client stakeholders and the development team to gather requirements and build the database model
- Critical Big Data POC/Implementations
- Set up a cluster on the Azure cloud (manual installation; wizard unavailable)
- HBase vs Hive data storage use case implementation.
- Used Azure cloud storage as HDFS storage (similar to S3 storage)
- MapReduce vs. Tez framework for complex data processing
- Creation of a data lake, ODS, and data mart in Hive
- MapReduce code to read/process image file headers and create a metastore in Hive
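The header-parsing step at the core of the image-metadata work above can be illustrated with a standalone sketch; this is not the original MR code, just the parsing a mapper might perform before emitting metadata for Hive, assuming PNG input:

```python
import struct


def png_dimensions(data: bytes):
    """Parse width/height from a PNG header: an 8-byte signature,
    then the IHDR chunk (4-byte length, 4-byte type, then
    big-endian width and height)."""
    signature = b"\x89PNG\r\n\x1a\n"
    if not data.startswith(signature):
        raise ValueError("not a PNG file")
    # Bytes 8-11 are the chunk length; 12-15 must be b'IHDR'
    if data[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

In an MR job, a mapper would read just the first few bytes of each file, call a parser like this, and emit (path, width, height) records that an external Hive table points at.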
Main Technologies: Technology Stack - Java, Spring, SOAP services, ActiveMQ JMS, XML, Git
Database - Confidential 11gR2
- DB Modeler / Dev Lead -
- Designed database model to persist HL7 messages considering SCD and CDC updates
- Infra Setup - Installed and deployed Confidential 11gR2 databases on Azure and local environments
- Maintained and promoted database changes across all environments
- Developed Jenkins prototype for automated DB builds to facilitate release onto various environments
- Designed and implemented a module to load data from flat or fixed-format files into the ODS: a configurable system for loading medical trial files from different vendors/labs into the appropriate studies
- Developed stored procedures and views, designed tables and indexes, and worked on performance enhancement of the module
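The configurable fixed-format loader described above could be sketched as follows; the layout config and field names are hypothetical illustrations (real layouts came per vendor/lab), not the actual specifications:

```python
from typing import Dict, List, Tuple

# Hypothetical per-vendor layout config: (field_name, start_offset, width).
# In the real system, a layout like this would be loaded per vendor/lab.
LAYOUT: List[Tuple[str, int, int]] = [
    ("study_id", 0, 8),
    ("subject_id", 8, 6),
    ("result", 14, 10),
]


def parse_fixed_width(line: str, layout: List[Tuple[str, int, int]]) -> Dict[str, str]:
    """Slice one fixed-width record into a dict per the layout,
    trimming padding so the record is ready for an ODS staging insert."""
    return {name: line[start:start + width].strip()
            for name, start, width in layout}
```

Driving every vendor's file through one parser with swappable layout configs is what makes the loader configurable rather than hard-coded per source.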