- Unique combination of data science and technology strategy expertise, with over 25 years of experience as a research scientist, data scientist, product/engineering manager, people manager and architect.
- Extensively managed and led R&D projects involving business insights and analytics through predictive modeling, big data systems and web-based software development.
- Expertise and experience in specific domains: B2B bid response pricing, Confidential pricing and revenue optimization, customer conversion and retention modeling, fraud and other anomaly detection, competitive tactics and strategy, advertisement budget optimization, and supply chain (demand forecasting, capacity planning and inventory optimization).
FUNCTIONAL AND ALGORITHMS EXPERTISE:
- Statistical Machine Learning, Hypothesis Testing, Classification and Regression, Feature Engineering, NLP, Pattern Recognition (used in Forecasting), Clustering (Market Segmentation), Anomaly Detection for Intrusion Detection and IoT, User/Item-Based Recommenders (Collaborative Filtering).
- Time Series Analysis and Modeling, Forecasting, Moving Averages, Exponential Smoothing, and ARIMA
- Optimization - Cyclic and Safety Inventory Optimization, Bidding/Pricing and Revenue Optimization, Advertisement Budget Optimization
Software Tools Architecture and Design ML: Python (Pandas, NumPy, SciPy, Scikit-learn, NLTK), OpenNLP, R, MATLAB, Tkinter, Jupyter Notebook
ML/DL: Spark ML/MLlib, Mahout, Weka, DL4J, MALLET, MOA, ELKI
Big Data: Apache Spark MLlib, Spark ML with Hadoop/YARN; AWS S3, EC2
Confidential, Phoenix, AZ
Senior Data Scientist
- Prognostics and Diagnostics of Aircraft Mission Critical Systems
- Detect the failure modes of aircraft LRUs: engine, landing gear, and pneumatic systems
- Data mining from repositories, conditioning data, developing models, creating visualizations, preparing technical presentations and documenting the development.
- Multivariate data models: Hotelling's T-squared and Q statistics (fault diagnostics), RNN/LSTM (fault prognostics), and PCA to find feature importance and reduce dimensionality.
- Python, Keras and TensorFlow, Pandas, NumPy, Scikit-learn, Jupyter, etc.
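Illustrative sketch of PCA-based T-squared and Q monitoring, not the project's actual implementation. It assumes NumPy, synthetic data from five hypothetical sensors driven by two latent factors, and an injected offset on one sensor. The function names, the mixing matrix W and the number of retained components are assumptions made for illustration only.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components):
    """Fit a PCA model on fault-free data (rows = samples, columns = sensors)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    # SVD of the standardized data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                            # loadings (sensors x k)
    eigvals = s[:n_components] ** 2 / (len(Z) - 1)     # variance per component
    return mean, std, P, eigvals

def t2_and_q(x, mean, std, P, eigvals):
    """Return Hotelling's T-squared and the Q statistic (squared prediction error)."""
    z = (x - mean) / std
    scores = z @ P
    t2 = float(np.sum(scores ** 2 / eigvals))   # distance inside the PCA subspace
    residual = z - scores @ P.T
    q = float(residual @ residual)              # distance from the PCA subspace
    return t2, q

# Synthetic "healthy" data: two latent factors observed through five sensors.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.8, 0.5, 0.2, 0.1],
              [0.1, 0.3, 0.5, 0.8, 1.0]])   # hypothetical mixing matrix
X = rng.normal(size=(500, 2)) @ W + 0.05 * rng.normal(size=(500, 5))
model = fit_pca_monitor(X, n_components=2)

x_ok = X[0]
x_bad = X[0].copy()
x_bad[2] += 5.0                              # injected offset on one sensor
t2_ok, q_ok = t2_and_q(x_ok, *model)
t2_bad, q_bad = t2_and_q(x_bad, *model)
print(f"healthy: T2={t2_ok:.2f} Q={q_ok:.4f}; faulty: T2={t2_bad:.2f} Q={q_bad:.2f}")
```

The offset breaks the correlation structure learned from healthy data, so it shows up mainly in Q. In practice each statistic would be compared against a control limit estimated from the training data.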
Confidential, Cupertino, CA
Sr. Director and Chief Data Scientist
- As Director, built the business insights (data science) and relevant technology team to develop predictive models using structured, semi-structured and unstructured data. Mentored data analysts to bring them up to speed on projects. Clarified project objectives, delegated responsibilities, and delivered results to executive management.
- As Data Scientist, developed predictive models for classification, regression and clustering problems. Developed standards and created patents and copyrights.
- As Product Manager, interacted with clients and captured and documented business requirements; advised clients on potential concepts, system functionality and use cases. Interfaced with development teams to ensure on-schedule, accurate delivery of use cases.
- Developed classification and clustering models for customer conversion (free users to paid users) and retention of existing customers.
- Developed analytics to identify and study the effect of features influencing conversion and retention.
- Responsible for data extraction, preprocessing, model evaluation, and visualizations.
- Developed a logistic regression model to estimate probabilities of customer conversion.
- Architected and designed the application for deployment in a multi-node cluster environment.
Tools: Scala or Java, Python, Scikit-learn, MLlib, Spark ML, Hadoop, Spark RDDs, Spark SQL, and R/RStudio/SparkR visualizations
- Developed a clustering-based fraud detection system.
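Illustrative sketch of estimating conversion probabilities with logistic regression, not the model described above. It assumes NumPy, plain batch gradient descent, and a hypothetical toy dataset with two made-up features (sessions per week, product features used).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Probability of the positive class (converted to paid) for each row."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return sigmoid(Xb @ w)

# Hypothetical data: [sessions per week, features used]; 1 = converted to paid.
X = np.array([[1, 1], [2, 1], [3, 2], [2, 2],
              [6, 4], [7, 5], [8, 4], [9, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

w = fit_logistic(X, y)
p_low, p_high = predict_proba(np.array([[1.0, 1.0], [9.0, 6.0]]), w)
print(f"light user: {p_low:.3f}, heavy user: {p_high:.3f}")
```

The fitted weights also indicate which features push the conversion probability up or down. A library implementation such as Scikit-learn or Spark ML would add regularization and proper evaluation on held-out data.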
- Responsible for data extraction, visualization, preprocessing, model evaluation, and post-processing.
- Used a K-means-based model and determined the optimal number of clusters.
- Computed the mean entropy of clusters and the F1 score to evaluate model relevance.
- Architected and designed the application to deploy in a multi-node cluster environment.
Tools: Scala, Java, MLlib, Spark ML, Hadoop, Spark RDDs, Spark SQL, and R/RStudio/SparkR visualizations
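Illustrative sketch of the two evaluation measures named above, mean cluster entropy and F1 score, in plain Python. The size-weighted averaging, the toy cluster assignments and the label names ("ok", "fraud") are assumptions for illustration; the original evaluation may have differed.

```python
import math
from collections import Counter, defaultdict

def mean_cluster_entropy(cluster_ids, labels):
    """Size-weighted mean Shannon entropy (in bits) of the labels in each cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster].append(label)
    total = len(labels)
    weighted = 0.0
    for labs in members.values():
        counts = Counter(labs)
        h = -sum((n / len(labs)) * math.log2(n / len(labs)) for n in counts.values())
        weighted += (len(labs) / total) * h
    return weighted

def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: cluster 1 is treated as the "fraud" cluster.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["ok", "ok", "ok", "ok", "fraud", "fraud", "fraud", "ok"]
predicted = ["fraud" if c == 1 else "ok" for c in clusters]

entropy = mean_cluster_entropy(clusters, labels)
f1 = f1_score(labels, predicted, positive="fraud")
print(f"mean entropy = {entropy:.4f} bits, F1 = {f1:.4f}")
```

Lower entropy means purer clusters. Repeating the calculation for several values of k is one way to compare candidate cluster counts.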
- Pricing models/algorithms development - Price analytics, market sensitivity analysis, market segmentation, tactical promotion planning.
- Designed APIs for clustering/segmentation: K-means for customer segmentation.
- Developed best response functions to deal with pricing from competitors based on game theory.
- Developed optimization models to maximize margin.
- Created a proprietary multi-recursive market segmentation and optimization algorithm.
Environment: Python (Scikit-learn), Hadoop HDFS, Spark, MLlib, Java API, MySQL, SQL and Eclipse
- Responsible for price segmentation/clustering and models development and code delivery.
- Conducted comparison studies of classification algorithms to select the best response (LR) model.
- Designed APIs for bidding LR algorithms and machine-learning-based clustering/segmentation (K-means).
- Preprocessed data and compressed training sets through dimensionality reduction using LDA.
- Tuned and tested algorithms for performance - accuracy and speed.
Environment: Python(x,y): Scikit-learn, SQL, Java, J2EE, HDFS, MapReduce, MySQL DB
Confidential, Seattle, WA
- Reviewed functional requirements; proposed system requirements.
- Architected and designed a data-driven remote multiplexer XMA8/16 setup software.
- Proposed architectural standards: highly extensible pipelines-elements grid model.
- Wrote architecture document.
- Architected the systems to be avionics MDL and iNet compliant, and vendor agnostic.
- Identified and designed the client, client proxy, remote server and repository.
- Architected the key subsystem interfaces: multiplexer and MDL models, data transfer processes, etc.
- Served as one of the key contacts for communicating project progress to management.
- Used Agile and Scrum.
Technologies Utilized: Java/J2EE, Spring DI, MVC, JPA, RESTful, JAXB, AngularJS, Jackson, XML.
Confidential, Plano, TX
Senior Data Scientist
- Created models using extrapolation-based smoothing and pattern-recognition-based algorithms to minimize statistical errors (MAPE, MAD, and MSE).
- Models performed well on level, trend and seasonal data, as well as on data with varying trend.
- Developed causal forecasting models using multivariate regression on pricing, ad budget and team size.
- Developed API interfaces to work with inventory optimization models and related data.
- Supported the team with analytical modeling and algorithms through the life cycle of product.
Environment: Java/J2EE, Numerical Recipes, Struts, SQL, Oracle DB, Tomcat 6.0, Eclipse IDE
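Illustrative sketch of smoothing-based forecasting scored with MAPE, MAD and MSE, in plain Python. It uses simple exponential smoothing as a stand-in for the extrapolation-based smoothing mentioned above; the demand figures and the alpha grid are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """One-step-ahead forecasts; forecasts[t] is the prediction for series[t]."""
    forecasts = [series[0]]          # seed the first forecast with the first value
    for actual in series[:-1]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

def mad(actuals, forecasts):
    """Mean absolute deviation."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

def mse(actuals, forecasts):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)

def mape(actuals, forecasts):
    """Mean absolute percentage error (assumes no actual value is zero)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)

demand = [100, 110, 105, 120]        # hypothetical demand history
forecasts = exponential_smoothing(demand, alpha=0.5)
print(forecasts, mad(demand, forecasts), mse(demand, forecasts), mape(demand, forecasts))

# Pick the smoothing constant that minimizes MSE over a small hypothetical grid.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_alpha = min(grid, key=lambda a: mse(demand, exponential_smoothing(demand, a)))
```

Data with trend or seasonality would call for double or triple (Holt-Winters) smoothing, scored with the same error measures.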