- Data Scientist with 6+ years of industry experience in data science, including extensive work building data pipelines, handling large structured and unstructured data sets, and deriving optimal solutions using state-of-the-art data science, machine learning, and deep learning tools.
Tools / Languages: R, Python, Java, Scala, Ruby; web frameworks: Shiny, Django; deep learning: TensorFlow, Keras; BI: QlikView
Cloud Platforms: AWS (EC2, S3), Google Cloud, MS Azure HDInsight, Docker, Vagrant, HPC Grid Engine
Packages (Python): pandas, scikit-learn, Matplotlib, Bokeh, Plotly, Theano, NLTK, Gensim
Confidential, Raleigh, NC
Sr. Data Science Consultant
- Working on applied machine learning and applied research, while also representing the business perspective, for Digital Reasoning's cognitive computing AI platform, Synthesys, specifically its Conduct Surveillance feature: a deep learning/artificial intelligence product that helps banks mitigate insider risk and enforce employee compliance. Responsible for several pre-trained and POC models for NLP text classification and pattern recognition using logistic regression and neural networks.
- Working on unsupervised methods, such as probabilistic graphical models and WarpLDA, to reduce model false-positive rates as part of next-generation model research.
- Developed an opinion-mining algorithm for customer insights that uses NLP to detect product mentions in a corpus and statistical methods to measure their sentiment polarity.
- Involved in research on synthetic data generation using statistical methods, sentiment analysis for customer insights combined with time series forecasting, and custom algorithms for speech act profiling, aspect extraction, rare-word embeddings, dynamic meta-embeddings, and Bayesian modeling.
Confidential, Boston, MA
Data Science Consultant
- Built a recommender system that helps customers configure a new truck from available features; it was implemented in SAP HANA using association rules mined from historical configuration selections.
- Performed data profiling to understand user behavior, along with data sourcing and EDA, using R and Hive on Hadoop HDFS. Involved in all aspects of data preprocessing and the implementation of novel ML algorithms, including standard modeling practices such as feature scaling and feature engineering.
- Prototyped machine learning algorithms for a proof of concept (POC) implemented on the SAP HANA platform, which provides several built-in mining algorithms such as association rules, with an SAP Lumira front end.
Confidential, Indianapolis, IN
Data Science Consultant
- The objective of this project was to evaluate the treatment effect on the overall population and to identify subgroups with significantly better or worse outcomes than the overall population. Involved in business planning for problem definition.
- Prototyped machine learning algorithms for a proof of concept (POC); participated in data acquisition from multiple sources, data wrangling, EDA and data preparation in R, and data analysis using Spark from R with ML algorithms (SVM, neural networks, random forests, etc.).
- Created an R Shiny application for interactive data visualization that can generate R Markdown reports in Word and PDF formats from saved objects.
Confidential, Columbus, OH
Text classification, NLP
- Used topic modeling to build categories that accurately classify scientific documents into therapeutic department categories, using data from all historical articles and other variables available from separate applications.
- Experimented with various feature representations and feature normalization techniques
- Performed topic modeling with latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA) to identify topics in clustered documents.
- Sandman is a proprietary financial forecasting tool that performs multiple functions, such as forecasting a drug's market revenue. Worked within the Advanced Analytics group to build the forecasting tool for their finance team and prototyped algorithms for POCs.
- Worked on all parts of the ML workflow, from data collection through model building.
- Used hierarchical forecasting approaches (top-down, bottom-up, combination, and middle-out), range forecasts, time series models such as ARIMA, and other probabilistic models.
- Presented the tool to top financial executives.
- Multiple comparisons: the multiple comparisons problem arises when a set of statistical inferences is considered simultaneously on the same data. Applied this technique to evaluate whether a molecule met its primary outcome and to discover multiple secondary outcomes from large datasets, generating p-values that identify statistically significant differences while controlling the type I error rate. These results are key to differentiating from competitors and to product labeling and marketing. The outcomes were part of the findings submitted to the FDA, and this project helped in the approval of two major drugs. Involved in evaluating different machine learning algorithms.
- Used graph databases and graph-based ML algorithms, including naïve Bayes classification of graph nodes.
Environment: R/RStudio, Shiny (interactive visualizations), HPC Grid Engine
Confidential, Evanston, IL
Data Science Consultant
- Develop advanced algorithms that solve problems of large dimensionality in a computationally efficient and statistically effective manner.
- Execute and teach the application of statistical and data mining techniques (e.g., hypothesis testing, machine learning, and retrieval processes) to large, unstructured data sets to identify trends, figures, and other relevant information;
- Lead conversations with clients and other Confidential stakeholders to effectively integrate and communicate analysis findings;
- Evaluate emerging datasets and technologies and help define our analytic platform;
- Own development of select assets/accelerators that create scale;
- Contribute to thought leadership through research and publication;
- Guide and manage project teams of Associates/Associate Consultants.