Data Scientist / Data Science Consultant Resume
San Jose, CA
SUMMARY:
- Data analytics professional with 7 years of experience delivering end-to-end data analytics and operational intelligence projects, with a demonstrated ability to deliver valuable business insights using advanced data-driven methods across multiple domains.
TECHNICAL SKILLS:
Programming Languages & Libraries: Python, R, H2O.ai, H2O Flow, Bash scripting, SAS Base, SAS Enterprise Miner, Regular Expressions and SQL (Oracle, MySQL & SQL Server); Keras, TensorFlow, Pandas, NumPy, SciPy, Scikit-learn, NLTK, Matplotlib, Seaborn, Bokeh, ggplot2, dplyr, data.table, SparkR and PySpark
Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles (Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost)), Time Series Forecasting (ARIMA, Exponential Smoothing methods), Dimensionality Reduction (Principal Component Analysis (PCA), LDA), Weight of Evidence (WOE) and Information Value, Hierarchical & K-Means Clustering, K-Nearest Neighbors and A/B Testing.
Working knowledge of: Neural Networks, Deep Neural Networks (CNN & RNN) and LSTM.
Business Intelligence: Tableau, Qlik Sense, Google Cloud Data Studio, Advanced Microsoft Excel and Power BI.
Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume, Kafka and Oozie
Text Mining: Text Pre-processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis and Non-negative Matrix Factorization (NMF).
Cloud Technologies: Google Cloud Platform Big Data & Machine Learning modules - Cloud Storage, Cloud Dataflow, Cloud ML, BigQuery, Cloud Dataproc, Cloud Datastore, Bigtable. Familiarity with AWS - EMR, EC2, S3.
Version Control: Git
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Data Scientist / Data Science Consultant
- Designed a price optimization engine that enables better pricing of refresh products for customers across different programs. Used a non-linear optimization technique (NLopt) to maximize revenue while satisfying business constraints. Performed cluster analysis on the PIDs that cover 80% of total revenue.
- Built a 3-month demand forecast using the ARIMA time-series technique. Performed demand and discount variation analysis across different channels and conducted exploratory data analysis in Python on candidate variables for building model constraints.
- Built RF and MFG price correlation analyses and segment-based pricing analysis for demand and inventory mapping. Developed optimization models for clusters covering PIDs that contribute 68% of total revenue, with expected incremental revenue of 15% to 20% (~$5M).
- Predominantly used Python, R, PySpark, HDFS, Sqoop, Hive (HQL) and Oracle SQL Developer to provide insights to business users.
- Built an analytical model to forecast Build to Max for each PID across different theatres. Provided recommendations for building optimal safety stock to avoid overbuild or underbuild situations. This model could reduce overbuilding of finished goods inventory (FGI) by about 40% compared to the as-is model.
- Developed a segmentation framework based on historical demand, product life-cycle attributes (end of life (EOL), NPI activation date) and inventory-holding attributes.
- Blended lead times for damaged-goods processing at different repair sites with demand variability to calculate safety stock. Automated the model's safety stock predictions per business requirements.
Confidential, Fort Worth, TX
Data Scientist
- Built an analytics tool that helps make better pricing and contracting decisions, enabling commercial pricing strategy teams to properly assess the profitability of proposed deals.
- Performed data cleansing and data mapping in Python to generate price variations, enabling pricing decisions for contract profitability analysis.
- Segmented the customer base using unsupervised clustering techniques based on historical sales, contracts and new business opportunities. Analyzed customer attrition and player market share across the resulting cluster segments.
- Designed data pipelines using big data tools such as Hive and PySpark to support enterprise pricing dashboards. Developed Hive queries to interact with the Hadoop data lake on AWS for data transformation (ETL).
- Integrated disparate data from sources like SalesForce, SAP BI extracts, Siebel and third-party data like AnalySource, IMS NSP, Predictive Acquisition Cost® (PAC) and internal sources.
- Developed comprehensive data visualizations in Tableau to illustrate complex ideas to stakeholders at various levels on KPIs such as product supply & capacity, cost of moving out of recommended tiers, and price increase trend analysis by competitor.
- Helped the client accelerate responses to competitive situations and provided insight into revenue leaks for a stronger negotiating position. This solution could save up to $4M annually.
Confidential, Bowling Green, OH
Graduate Assistant
- Analyzed Medicare resource utilization groups (RUGs) and managed care insurance claims data from a healthcare provider and predicted residents with negative margins using regression and CART.
- Handled class imbalance using re-sampling techniques.
- Utilized logistic regression in R to identify the factors affecting margin and to predict residents with negative margins.
- Built a gradient boosting model using H2O.ai in R to analyze variable importance and evaluate model performance.
- Performed clustering analysis on historical patient-level data to classify patients into payment groups (total expense per stay), identified parameters impacting expenditures and provided recommendations to drive reimbursements.