- Over 7 years of experience in data science, building end-to-end solutions using R, Python, Java, SQL and Tableau by leveraging machine learning algorithms, Statistical Modeling, Data Mining, Natural Language Processing (NLP) and Data Visualization
- Experienced with a range of data analysis techniques such as linear regression, random forest, clustering, boosting, factor analysis, supervised and unsupervised learning, STL analysis, graphical modeling, graph algorithms, etc.
- Experience building user-centric analytics: defined outliers statistically, developed methods to identify them systematically, investigated why such cases were outliers, and determined whether any action was needed
- Proactive participation in product roadmap discussions, data science initiatives, and decisions on the optimal approach to applying the underlying algorithms
- Proficient in applying unsupervised learning algorithms such as K-Means and K-Medoids, Principal Component Analysis, Hierarchical clustering, Gaussian Mixture Models and Bayesian models in Retail, Finance and Consumer analytics
- Expert knowledge of a broad range of machine learning algorithms, with a love for finding the best approach to a specific problem. Implemented several supervised and unsupervised learning algorithms such as Ensemble Methods (Random Forests), Logistic Regression, Regularized Linear Regression, SVMs, Deep Neural Networks, Extreme Gradient Boosting, Decision Trees, KMeans, Gaussian Mixture Models, Hierarchical models, and time series models (ARIMA, GARCH, VARCH, etc.)
- Experience working with large data and metadata sources; interpret and communicate insights and findings from analysis and experiments to both technical and non-technical audiences in product, service, and business
- Strong track record of contributing to successful end-to-end analytic solutions (clarifying business objectives and hypotheses, communicating project deliverables and timelines, and informing action based on findings)
- Strong computational background (complemented by Statistics/Math/Algorithmic expertise), healthy portfolio of projects dealing with Big Data, solid understanding of machine learning algorithms, and a love for finding meaning in multiple imperfect, mixed, varied, and inconsistent data sets
- Hands-on experience in data visualization with Tableau: Line and scatter plots, Bar charts, Histograms, Pie charts, Dot charts, Box plots, Time series, Error bars, Multiple chart types, Multiple axes, subplots, etc.
- Well-versed in writing production-quality code in R, Python, Java, SQL and Spark. Developed and deployed dashboards in Tableau and RShiny to identify trends and opportunities, surface actionable insights, and help teams set goals, build forecasts and prioritize initiatives
- Professional working experience in writing Spark Streaming and Spark batch jobs using Spark MLlib
- Experience using multiple ETL tools, such as Ab Initio, Alteryx and Informatica PowerCenter, for Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export
- Hands-on experience in optimizing SQL queries and database performance tuning in Oracle, SQL Server and Teradata databases. Experienced in data modeling grounded in RDBMS concepts, Logical and Physical Data Modeling up to Third Normal Form (3NF), and Multidimensional Data Modeling (Star schema, Snowflake schema, Facts and Dimensions)
- Exposure to AI and deep learning frameworks such as PyTorch, TensorFlow, Keras and Theano, and to architectures such as CNNs and RNNs
Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau
Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Deep Neural Networks, Bayesian Learning, Time Series Forecasting (ARIMA, Holt-Winters and Exponential Smoothing)
Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization
Feature Selection: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods
Statistical Tests: T Test, Chi-Square tests, Stationarity tests, Auto Correlation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA
Sampling Methods: Bootstrap sampling methods and Stratified sampling
Model Tuning/Selection: Cross Validation, Walk Forward Estimation, AIC/BIC Criteria, Grid Search and Regularization
Machine Learning / Deep Learning:
R: caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot2
Python: pandas, numpy, scikit-learn, scipy, statsmodels, ggplot, tensorflow
SAS: Forecast server, SAS Procedures and Data Steps
Spark: MLlib, GraphX
Frameworks: ReactJS, NodeJS, ExpressJS, AngularJS, Spring boot, JDBC
Tools/IDE: GitHub, Eclipse, Jenkins, Celery, RabbitMQ, AWS, Jira, Azure, Azure IoT
Databases/ETL/Query: Teradata, SQL Server, Postgres, HDFS, YARN, MapReduce, Pig, Hive, Hue, Spark and Kafka
Visualization: Tableau, ggplot2 and RShiny
Prototyping: RShiny, Tableau, Balsamiq and PowerPoint
Confidential, Sunnyvale, CA
- Developed 5 customer segments using unsupervised learning techniques such as KMeans and Gaussian mixture models. The clusters helped the business simplify complex patterns into a manageable set of 11 patterns that helped set strategic and tactical objectives for customer retention, acquisition, spend and loyalty
- Productionized models and built an end-to-end automated system with insights that drove business actions
- Implemented market basket algorithms on transactional data to identify products frequently ordered together. Discovering frequent product sets helped uncover cross-sell and upsell opportunities and led to better pricing, bundling and promotion strategies for sales and marketing teams
- Actively participated in the design and modification of machine learning models. Analyzed prediction data and revised existing machine learning models and efforts
- Provided feedback to signal processing and algorithm teams for further improvement. Monitored data integrity and quality, alerted team members when inconsistencies arose, and helped troubleshoot the problems
- Implemented advanced forecasting algorithms that detected seasonality and trends, improving sales/demand forecast accuracy by 20-25% and helping the business plan better for budgeting and sales and operations planning
- Tuned model parameters (p, d, q for ARIMA) using walk-forward validation techniques
- Communicated results and ideas to key decision makers
- Predicted the likelihood of customer attrition by developing classification models based on customer attributes such as customer size, revenue, industry type, competitor products and growth rates. Deployed in production, the models detected churn early and helped sales/marketing teams plan retention strategies such as price discounts and custom licensing plans
- Communicated key results to senior management in verbal, visual, and written media
- Developed machine learning models that predicted customers' propensity to buy based on attributes such as the verticals they operate in, revenue, historic purchases and other related attributes. Predicting customer propensity helped sales teams aggressively pursue prospective clients
- Projected customer lifetime values based on historic customer usage and churn rates using survival models. Understanding customer lifetime values helped the business establish strategies to selectively attract customers who tend to be more profitable for Groupon, and to set appropriate marketing strategies based on customer value
- Developed a machine learning system that predicted the purchase probability of a particular offer based on the customer's real-time location data and past purchase behavior; these predictions are used for mobile coupon pushes
Confidential, Minneapolis, MN
- Developed algorithms for selecting the optimal set of stock keeping units (SKUs) to place in a store to maximize store sales, subject to business constraints; advised the retailer on gauging demand transfer due to SKU deletions/additions in its assortment
- Measured the price elasticity of products that experienced price cuts and promotions using regression methods; based on the elasticity, Groupon made selective and cautious price cuts for certain licensing categories
- Designed and deployed real-time Tableau dashboards that used key performance metrics to identify the items customers liked most and least, guiding the retailer toward more customer-centric assortments and better product placement and bundling strategies
- Developed a personalized item recommender system using recommender algorithms (collaborative filtering, low rank matrix factorization) that recommended the best offers to a user based on similar user profiles. The recommendations improved user engagement and helped improve overall user retention rates
- Clustered the supply chain of Confidential stores based on volume, volatility in demand and proximity to warehouses using Hierarchical clustering models and identified strategies for each of the clusters to better optimize the service level to stores
- Built Tableau dashboards that tracked changes in customer behavior before and after campaign launches; the ROI measurements helped the retailer strategically extend the campaigns to other potential markets
Confidential, San Francisco, CA
Jr. Data Scientist
- Automated the scraping and cleaning of data from various data sources in R and Python.
- Developed the bank's loss forecasting process using relevant forecasting and regression algorithms in R. The projected losses under stress conditions helped the bank reserve sufficient funds per DFAST policies
- Forecasted bank-wide loan balances under normal and stressed macroeconomic scenarios using R.
- Performed variable reduction using the stepwise, lasso, and elastic net algorithms and tuned the models for accuracy using cross-validation and grid search techniques
- Built mortgage risk scorecards and marketing response models using SQL and SAS. Distilled complex technical analysis into easily digestible reports for top executives in the bank
- Developed several interactive dashboards in Tableau to visualize nearly 2 Terabytes of credit data by designing a scalable data cube structure
- Built classification models using features related to customer demographics, macroeconomic dynamics, historic loan payment behavior, type and size of loans, credit scores and loan-to-value ratios; the model predicted the likelihood of default under various stressed conditions with 95% accuracy
Confidential, San Jose, CA
Data Modeler/Data Analyst
- Analyzed large datasets to provide strategic direction to the company.
- Performed quantitative analysis of product sales trends to recommend pricing decisions.
- Conducted cost and benefit analysis on new ideas.
- Scrutinized and tracked customer behavior to identify trends and unmet needs.
- Developed statistical models to forecast inventory and procurement cycles. Assisted in developing internal tools for data analysis.
- Designed scalable processes to collect, manipulate, present, and analyze large datasets in a production-ready environment, using Akamai's big data platform
- Delivered a broad range of results by finding and interpreting rich data sources, merging data sources, ensuring consistency of datasets, creating visualizations to aid understanding of the data, building mathematical models, and presenting insights and findings to specialists and scientists on the team
- Implemented the full data modeling and analysis lifecycle for data warehouses and data marts with Star Schemas, Snowflake Schemas, SCD and Dimensional Modeling in Erwin. Performed data mining using complex SQL queries to discover patterns, and used SQL extensively for data profiling/analysis to guide building the data model
Confidential, Franklin Lakes, New Jersey
- Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements
- Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models
- Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP and OLAP systems.
- Generated DDL scripts using Forward Engineering technique to create objects and deploy them into the database
Data Analyst/Data Modeler
- Worked with SMEs and other stakeholders to determine requirements and identify Entities and Attributes to build Conceptual, Logical and Physical data models.
- Used Star Schema methodologies extensively to turn the logical data model into Dimensional Models. Developed dimensional models based on Star and Snowflake schemas to build the data warehouse. Designed Context Flow Diagrams, Structure Charts and ER diagrams