Data Scientist Resume
SUMMARY
- 8+ years of data science experience building interpretable machine learning models and end-to-end data pipelines that extract, transform, and combine incoming data to discover hidden insights, improve business processes, address business problems, and deliver cost savings
- Experience working with large data and metadata sources; interpret and communicate insights and findings from analyses and experiments to both technical and non-technical audiences across advertising, service, and business functions
- Expert knowledge across a breadth of machine learning algorithms, with a focus on finding the best approach for each problem. Implemented supervised and unsupervised learning algorithms such as ensemble methods (random forests), logistic regression, regularized linear regression, SVMs, deep neural networks, extreme gradient boosting, decision trees, k-means, Gaussian mixture models, hierarchical models, and time series models (ARIMA, GARCH, VARCH, etc.)
- Experience with applied statistical techniques and machine learning, including Bayesian methods, time-series modeling, classification, regression, mixture models, clustering, dimensionality reduction, model selection, feature extraction, experimental design, and choice modeling
- Led independent research and experimentation with new methodologies to discover insights and improvements. Delivered findings and actionable results to the management team through data visualizations, presentations, and training sessions. Proactively involved in roadmap discussions, data science initiatives, and choosing the optimal approach for the underlying algorithms
- Fluent in writing production-quality code in SQL, R, Python, Spark, and Scala
- Hands-on experience building regression, classification, and recommender systems with large datasets in distributed systems and constrained environments
- Domain expertise in architecting and building comprehensive analytical solutions in Marketing, Sales and Operations functions across Technology, Retail and Banking industries
- Hands-on experience communicating business insights through dashboards in Tableau. Developed automated Tableau dashboards that helped evaluate and evolve existing user data strategies, including user metrics, measurement frameworks, and measurement methods
- Strong track record of contributing to successful end-to-end analytic solutions (clarifying business objectives and hypotheses, communicating project deliverables and timelines, and informing action based on findings)
- Developed and deployed dashboards in Tableau and RShiny to identify trends and opportunities, surface actionable insights, and help teams set goals, build forecasts, and prioritize initiatives
- Experienced in data modeling, with a command of RDBMS concepts, logical and physical data modeling up to Third Normal Form (3NF), and multidimensional data modeling schemas (star schema, snowflake modeling, facts and dimensions)
- Professional experience writing Spark Streaming and Spark batch jobs using Spark MLlib
- Hands-on experience optimizing SQL queries and tuning database performance in Oracle, SQL Server, and Teradata databases
TECHNICAL SKILLS
Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau
Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGBoost, Deep Neural Networks, Bayesian Learning
Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization
Feature Selection: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods
Statistical Tests: t-tests, Chi-Square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA
Sampling Methods: Bootstrap sampling methods and Stratified sampling
Model Tuning/Selection: Cross Validation, Walk Forward Estimation, AIC/BIC criteria, Grid Search and Regularization
Time Series: ARIMA, Holt-Winters, Exponential Smoothing, Bayesian Structural Time Series
PROFESSIONAL EXPERIENCE
Confidential
Data Scientist
Responsibilities:
- Automated anomaly detection through an iterative outlier detection and imputation algorithm using multiple density-based clustering techniques (DBSCAN, kernel density estimation); see the outlier detection sketch after this list. Identified and corrected outlier records in third-party sales data by applying the newly developed algorithm and a custom data normalization function. Created an interactive dashboard suite in R (Shiny) that illustrated outlier characteristics across several sales-related dimensions and the overall impact of outlier imputation
- Built relational databases in SQL Server from several large (5-10 GB) flat files of partner information using Python. Used logistic regression and random forest models in R/Python to predict the likelihood of partner participation in various marketing programs (see the classification sketch below). Designed and developed visualizations and dashboards in R/Tableau that surfaced the primary factors driving program participation and identified the best targets for future targeted marketing efforts
- Developed classification models to predict the likelihood of customer churn based on customer attributes such as customer size, revenue, industry, competitor products, and growth rates. The models, deployed in a production environment, helped detect churn in advance and aided sales/marketing teams in planning retention strategies such as price discounts and custom licensing plans
- Developed 11 customer segments using unsupervised learning techniques such as k-means and Gaussian mixture models (see the segmentation sketch below). The clusters helped the business simplify complex patterns into a manageable set of 11 segments, which informed strategic and tactical objectives for customer retention, acquisition, and spend
- Improved sales/demand forecast accuracy by 20-25% by implementing advanced forecasting algorithms that detected seasonality and trends and incorporated exogenous covariates (see the forecasting sketch below). The increased accuracy helped the business plan better for budgeting and sales and operations planning
- Implemented market basket algorithms on transactional data to identify products frequently ordered together (see the association rules sketch below). Discovering frequent product sets unearthed cross-sell and upsell opportunities and led to better pricing, bundling, and promotion strategies for the sales and marketing teams
- Measured the price elasticity of products that experienced price cuts and promotions using regression methods (see the elasticity sketch below); based on the elasticity, Confidential made selective and cautious price cuts for certain licensing categories
- Developed machine learning models that predicted customers' purchase propensity based on attributes such as the verticals they operate in, revenue, historic purchases, and other related attributes. Predicting customer propensity helped sales teams aggressively pursue prospective clients
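Outlier detection sketch: a minimal illustration of the density-based approach above, assuming scikit-learn and pandas. The function name, eps/min_samples defaults, and median imputation rule are placeholders rather than the original settings, and the kernel density estimation step is omitted.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def flag_and_impute_outliers(df: pd.DataFrame, cols, eps=0.5, min_samples=10):
    """Flag DBSCAN noise points (label -1) as outliers, then impute them
    with the median of the non-outlier rows."""
    X = StandardScaler().fit_transform(df[cols])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    is_outlier = labels == -1
    cleaned = df.copy()
    cleaned.loc[is_outlier, cols] = df.loc[~is_outlier, cols].median().values
    return cleaned, is_outlier
```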
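Classification sketch: a compact comparison of logistic regression and random forests, the pattern behind the participation, churn, and propensity models above, shown on synthetic stand-in data; the feature counts and hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the partner attributes and participation label.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           random_state=0)

# Compare the two model families on cross-validated AUC.
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=300, random_state=0)),
]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean CV AUC = {auc:.3f}")
```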
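Segmentation sketch: k-means and Gaussian mixture segmentation with 11 clusters, as in the bullet above, on synthetic stand-in data; the GMM additionally yields soft membership probabilities useful for borderline customers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for standardized customer attributes.
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(5000, 6)))

# Hard assignments from k-means; soft (probabilistic) assignments from the GMM.
kmeans_labels = KMeans(n_clusters=11, n_init=10, random_state=0).fit_predict(X)
gmm = GaussianMixture(n_components=11, random_state=0).fit(X)
gmm_labels = gmm.predict(X)
membership = gmm.predict_proba(X)  # per-cluster probabilities for each customer
```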
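Forecasting sketch: one plausible implementation of seasonality-plus-covariates forecasting, using a SARIMAX model from statsmodels; the synthetic series, model orders, and promo covariate are assumptions, not the original specification.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly demand with trend, annual seasonality, and a promo covariate.
rng = np.random.default_rng(0)
t = np.arange(72)
idx = pd.date_range("2015-01-31", periods=72, freq="M")
promo = pd.Series(rng.integers(0, 2, size=72), index=idx, name="promo")
sales = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
                  + 8 * promo.values + rng.normal(0, 2, size=72), index=idx)

fit = SARIMAX(sales, exog=promo, order=(1, 1, 1),
              seasonal_order=(1, 1, 1, 12)).fit(disp=False)

# Forecasting requires future values of the exogenous covariate.
future_idx = pd.date_range(idx[-1] + pd.offsets.MonthEnd(1), periods=12, freq="M")
future_promo = pd.Series([1, 0] * 6, index=future_idx, name="promo")
forecast = fit.forecast(steps=12, exog=future_promo)
```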
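Association rules sketch: frequent itemsets and rules via apriori, assuming the mlxtend library; the transactions and support/lift thresholds are invented for illustration.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Illustrative transactions; in practice these come from the order-line table.
transactions = [["licenseA", "supportPlan"],
                ["licenseA", "training"],
                ["licenseA", "supportPlan", "training"],
                ["supportPlan"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
itemsets = apriori(onehot, min_support=0.25, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```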
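Elasticity sketch: a log-log OLS regression in which the coefficient on log(price) estimates price elasticity; the data here are synthetic, generated with a true elasticity near -1.3.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic price/quantity pairs as a stand-in for the real sales data.
rng = np.random.default_rng(0)
price = rng.uniform(10, 100, size=500)
quantity = 2000 * price ** -1.3 * rng.lognormal(sigma=0.1, size=500)

# Log-log specification: slope on log(price) is the elasticity estimate.
X = sm.add_constant(np.log(price))
fit = sm.OLS(np.log(quantity), X).fit()
elasticity = fit.params[1]
print(f"estimated elasticity: {elasticity:.2f}")
```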
Confidential
Data Scientist
Responsibilities:
- Developed algorithms to select the optimal set of stock-keeping units (SKUs) to place in stores, maximizing store sales subject to business constraints (see the optimization sketch after this list); advised the retailer on gauging demand transfer due to SKU deletions from or additions to its assortment.
- Developed a personalized coupon recommender system using recommender algorithms (collaborative filtering, low-rank matrix factorization) that recommended the best offers to a user based on similar user profiles (see the factorization sketch below). The recommendations improved user engagement and helped lift overall user retention rates at Confidential
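Optimization sketch: a toy version of the constrained assortment problem, modeled as a 0/1 knapsack and assuming the PuLP library; the SKU names, sales, and space figures are invented, and a real model would also capture demand transfer between SKUs.

```python
import pulp

# Toy assortment: choose SKUs to maximize expected sales within a space budget.
skus = {"A": (120, 3), "B": (90, 2), "C": (60, 1), "D": (200, 5)}  # (sales, space)
capacity = 7

prob = pulp.LpProblem("assortment", pulp.LpMaximize)
pick = pulp.LpVariable.dicts("pick", skus, cat="Binary")
prob += pulp.lpSum(sales * pick[s] for s, (sales, _) in skus.items())
prob += pulp.lpSum(space * pick[s] for s, (_, space) in skus.items()) <= capacity
prob.solve(pulp.PULP_CBC_CMD(msg=False))

chosen = [s for s in skus if pick[s].value() == 1]
print("selected SKUs:", chosen)
```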
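Factorization sketch: low-rank matrix factorization of a tiny, invented user-coupon matrix via truncated SVD; a production recommender would typically use implicit-feedback ALS or a similar method at scale.

```python
import numpy as np

# Tiny user x coupon interaction matrix (0 = no interaction); values invented.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 0.0, 5.0, 4.0]])

k = 2                                           # latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
scores = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction

# Recommend each user the top-scoring coupon they have not yet interacted with.
masked = np.where(R == 0, scores, -np.inf)
top_coupon = masked.argmax(axis=1)
print(top_coupon)
```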
Confidential
Data Scientist
Responsibilities:
- Clustered Confidential's customers based on demographics, health attributes, and policy inclinations using hierarchical clustering models (see the clustering sketch after this list), and identified strategies for each cluster to better optimize retention, marketing, and product offering strategies
- Built executive dashboards in Tableau that measured changes in customer behavior after campaign launches; the ROI measurements helped Confidential strategically select the most effective campaigns
- Projected customer lifetime values from historic customer usage and churn rates using survival models (see the survival sketch below). Understanding customer lifetime value helped the business establish strategies to selectively attract customers who tend to be more profitable for Confidential, and to set appropriate marketing strategies based on customer value
- Designed and deployed real-time Tableau dashboards that identified the policies most and least liked by customers using key performance metrics, which aided Confidential in better rationalizing its product offerings
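Clustering sketch: agglomerative hierarchical clustering with Ward linkage via SciPy, on synthetic stand-in data; the attribute count and the five-cluster cut are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for standardized customer attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))

Z = linkage(X, method="ward")                    # agglomerative, Ward's criterion
labels = fcluster(Z, t=5, criterion="maxclust")  # cut dendrogram into 5 clusters
```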
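Survival sketch: a Kaplan-Meier estimate of customer survival and a lifetime-value calculation from the area under the survival curve, assuming the lifelines library; the tenures, churn rates, and $30 monthly margin are invented.

```python
import numpy as np
from lifelines import KaplanMeierFitter

# Synthetic tenures (months) and churn indicators (0 = still active, censored).
rng = np.random.default_rng(0)
tenure = rng.exponential(scale=24, size=1000)
churned = rng.uniform(size=1000) < 0.7

kmf = KaplanMeierFitter().fit(tenure, event_observed=churned)

# Expected lifetime = area under the survival curve; CLV = lifetime x margin.
surv = kmf.survival_function_["KM_estimate"]
expected_lifetime = np.trapz(surv.values, surv.index.values)
clv = expected_lifetime * 30.0                   # assumed $30/month margin
print(f"expected lifetime: {expected_lifetime:.1f} months, CLV: ${clv:.0f}")
```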
Confidential
Data Scientist
Responsibilities:
- Forecasted bank-wide loan balances under normal and stressed macroeconomic scenarios using R. Performed variable reduction using stepwise, lasso, and elastic net algorithms and tuned the models for accuracy using cross-validation and grid search techniques (see the regularization sketch after this list).
- Automated the scraping and cleaning of data from various data sources in R and Python. Developed the bank's loss forecasting process using relevant forecasting and regression algorithms in R.
- The projected losses under stress conditions helped the bank reserve adequate funds per DFAST policies
- Built classification models using features covering customer demographics, macroeconomic dynamics, historic loan payment behavior, loan type and size, credit scores, and loan-to-value ratios; the model predicted the likelihood of default under various stressed conditions with 95% accuracy.
- Built credit risk scorecards and marketing response models using SQL and SAS. Translated complex technical analyses into easily digestible reports for top executives at the bank.
- Developed several interactive dashboards in Tableau to visualize nearly 2 terabytes of credit data by designing a scalable data cube structure.
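Regularization sketch: elastic net variable reduction with cross-validated grid search, shown in Python (the original work was in R) on synthetic data; the grid and data shapes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the loan-balance modeling matrix.
X, y = make_regression(n_samples=500, n_features=40, n_informative=8,
                       noise=5.0, random_state=0)

grid = GridSearchCV(
    ElasticNet(max_iter=10_000),
    {"alpha": np.logspace(-3, 1, 20), "l1_ratio": [0.1, 0.5, 0.9, 1.0]},
    cv=5, scoring="neg_mean_squared_error",
)
grid.fit(X, y)
kept = np.flatnonzero(grid.best_estimator_.coef_)  # features surviving shrinkage
print(f"{len(kept)} of {X.shape[1]} features retained")
```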
Confidential
Data Modeler / Data Analyst
Responsibilities:
- Analyzed large datasets to provide strategic direction to the company. Performed quantitative analysis of ad sales trends to recommend pricing decisions.
- Conducted cost and benefit analysis on new ideas. Scrutinized and tracked customer behavior to identify trends and unmet needs.
- Developed statistical models to forecast inventory and procurement cycles. Assisted in developing internal tools for data analysis.
- Designed scalable processes to collect, manipulate, present, and analyze large datasets in a production-ready environment using Akamai's big data platform
- Achieved a broad spectrum of end results by finding and interpreting rich data sources, merging data sources, ensuring consistency of datasets, creating visualizations to aid understanding, building mathematical models on the data, and presenting and communicating insights/findings to specialists and scientists on the team
- Implemented the full lifecycle as a Data Modeler/Data Analyst for data warehouses and data marts with star schemas, snowflake schemas, SCDs, and dimensional modeling in Erwin. Performed data mining using complex SQL queries to discover patterns, and used extensive SQL for data profiling/analysis to guide building the data model
Confidential
Data Analyst / Data Modeler
Responsibilities:
- Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements
- Designed the data warehouse and MDM hub conceptual, logical, and physical data models
- Used normalization methods up to 3NF and denormalization techniques for effective performance in OLTP and OLAP systems. Generated DDL scripts using the forward engineering technique to create objects and deploy them into the database
- Worked with SMEs and other stakeholders to determine the requirements and identify entities and attributes to build conceptual, logical, and physical data models.
- Used star schema methodologies extensively in building and designing the logical data model into dimensional models. Developed star and snowflake schema based dimensional models to build the data warehouse. Designed context flow diagrams, structure charts, and ER diagrams