- Scale technology, engineering, and process.
- Designed, managed, tuned, and implemented enterprise-scale, highly transactional, complex, strategically important, business-critical databases and applications.
- Hands-on experience with common machine learning algorithms on real-time streaming data. Extract data, analyze information, and communicate insights to non-technical stakeholders.
- Hands-on programming experience in multiple technologies, frameworks, and methodologies across varied platforms and technology stacks.
- Hands-on experience with Master Data Management (MDM) and Data Governance.
- Advanced knowledge of and experience with all aspects of the Hadoop ecosystem.
- Designed multiple Big Data systems using Sqoop, Pig, Hive, Oozie, Kafka, Hue, Spark, Zeppelin, Atlas, Solr, LLAP, etc., and messaging technologies such as NiFi and RabbitMQ.
- Articulate and adopt the latest in technology. Refactor messy code and configuration and vastly improve key algorithms. Multi-paradigm programming (Java, Python, SQL, R), APIs, services, and concurrent programming. Self-taught continuous learner. Design for high performance and scalability with 100Ks of concurrent usage. Lead and resolve "war rooms" during major crises.
Big Data Architect/Consultant
- Object detection, classification, and segmentation, implemented with a new mathematical algorithm (patent in process).
- Developed a data pipeline to perform pre-processing and NLP on collected data for sentiment analysis.
- Developed a real-time data pipeline using Kafka, Spark Streaming, and HBase.
- Designed and built a data pipeline that streams data from client apps to the server, through a Kafka consumer, and into HDFS. Spark jobs read the HDFS data using Spark SQL for stream and batch processing.
- Designed queries against data in the HDFS environment (Hive, HBase).
- Used Apache Kafka with Storm connectors to consume live streaming data and create a data lake for building the RWI (Real World Intelligence) application, with HDFS also used as a consumer.
- Used Sqoop to load data from Oracle, Postgres, and SQL Server databases into HDFS, and Flume to load sensor data and web logs into HDFS for malware analysis.
Principal Engineer/Big Data Architect
- Applied machine learning and deep learning methods for predictions.
- Demonstrated understanding of statistical inference, model comparison, and feature extraction.
- Managed the horizontal Architecture and Principal Engineers team. Built a Product Data Mart and implemented a consolidated Analytic Cloud on Hadoop. Provided performance, availability, and scalability leadership. Handled data capacity planning and node forecasting.
- Installed and built a Hadoop cluster from scratch. Configured and tuned the Hadoop environment to ensure high throughput and availability.
- Implemented multi-tenancy, integrated security, and authentication with Kerberos and LDAP integration via PAM and ACLs.
- Created and deployed applications on Amazon (AWS) managed instances in our VPC, where each deployable microservice has an application containing multiple environments, with each environment deployed in the appropriate subnets for auto scaling.
- Designed and developed data ingestion, data processing, data export, and visualization frameworks.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured HBase by installing the HBase Master and HBase RegionServers.
- Performed benchmarking on the Hadoop cluster using several benchmarking mechanisms.
- Tuned the cluster by commissioning and decommissioning DataNodes.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Data ingestion: imported and exported data between HDFS and local/external file systems and RDBMSs using Sqoop. Wrote Python scripts for slicing and dicing data and automating processes.
- Installed Apache SolrCloud on the cluster and configured it with ZooKeeper; indexed documents using the Hive-Solr storage handler to import datasets in XML, CSV, and JSON formats.
- Built a Kafka consumer feeding Spark Streaming for business transformations of application log files.
- Designed and built a logging solution using Elasticsearch, Logstash, and Kibana.
- Designed, developed, maintained, and tuned a highly available data warehouse using Postgres, Pgpool, and Kettle (ETL) to execute delta refreshes of 90,000 view objects for day-to-day reporting.
- Loaded data from different servers into an S3 bucket and set appropriate bucket permissions.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Designed and implemented a real-time reporting solution for small and medium-sized customers who cannot afford to run an enterprise edition of the database.
- Developed web and mobile application modules using Spring, Hibernate, and web services (SOAP and REST).
- Developed the presentation layer using Servlets, JSP, and MVC.
- Used Spring Repository to load data from an Oracle database and implement the DAO layer.
- Designed, developed, and delivered enterprise-level, full life-cycle Business Intelligence platforms and solutions for various banking and financial organizations, including Commercial Cards, Mortgage, Commodities, and Derivative Trading, covering data infrastructure, security, applications, architecture, design, and delivery.
- Led BI development using Business Objects XI R2; built KPIs and dashboards.
- Consolidated various reporting tools inherited from Mergers and Acquisitions.
- Actively participated and contributed to the data governance and architecture council.