Data Scientist Resume
SUMMARY:
- Highly motivated Data Scientist with 13+ years of experience providing enterprise solutions in Data Science, Big Data Analytics, Business Intelligence, Data Mining & Warehousing. Building robust Machine Learning models in Regression, Classification, Clustering, Anomaly Detection and Text Mining through the Confidential - DM methodology. Empowering business Decision Support Systems (DSS) by delivering data-driven solutions, visualizations and statistical inferences to maximize Return on Investment (ROI), minimize costs, mitigate customer churn and accelerate business processes.
- Supervised Learning - Regression problems (Linear & Stepwise); Classification problems, e.g. Logistic Regression, K-Nearest Neighbor, Decision Tree, Random Forest, SVM; Time Series Analysis & Forecasting
- Unsupervised Learning - K-means Clustering, Hierarchical Clustering, Association Rules
- Natural Language Processing (Confidential) - Naïve Bayes and Text Mining
- Leveraging the Hadoop framework (MapReduce & HDFS) to ingest, store, analyze and process large structured & unstructured datasets using Sqoop, Flume, Kafka and Hive
- Exploratory Data Analysis of large datasets, descriptive statistical analysis
- Data Preprocessing - cleansing, blending, transformation, imputation, aggregation, feature scaling, resolving data inconsistencies, filtering, handling outliers
- Dimensionality reduction using feature engineering & feature selection, Confidential, Factor Analysis, Stepwise Regression (backward & forward) to maximize information gain
- Hypothesis testing; null hypothesis and alternative hypothesis
- Statistical programming using R libraries (ggplot2, plyr, dplyr, tidyr, Shiny, caret) and Python packages (NumPy, Pandas, Matplotlib)
- Model evaluation based on predictive performance: Confusion Matrix, accuracy, misclassification rate, sensitivity, specificity, precision, type I & II error, R-squared, MSE, ROC Curve, k-fold cross validation, p-values, statistical significance, t-test, ANOVA
- Visualizations, e.g. histograms, scatterplots, box plots, correlation matrices, pie charts, line graphs
- Business Intelligence Analytics - data acquisition and integration (ETL), data warehousing, relational databases, star schema modeling, conformed dimensions modeling. Delivering interactive dashboards, visualizations, Customer 360 & Product 360 business metrics, ad hoc queries, drill-down reports, KPI reports, Operations metric reports
TECHNICAL SKILLS:
BI Analytical Tools: OBIEE 12c/11g/10, Tableau, Microsoft Power BI, Alteryx, Excel, Hive
Big Data Technology: Hadoop Ecosystem, HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Kafka, Cloudera Manager, HUE, YARN
Classification Models: Logistic Regression, Decision Trees, K-Nearest Neighbor, Naive Bayes Classifier, Random Forest, SVM
Regression Models: Linear & Multiple Regression, Stepwise
Unsupervised Machine Learning: K-Means Clustering, Confidential, Hierarchical Clustering, Association Rules, Time Series Analysis
Delivered Use Cases: Customer Segmentation, Churn Analysis, Predictive Analytics, Text Mining, Anomaly Detection, Market Basket Analysis
Programming: SQL, HiveQL, R, Python, WEKA
ETL: Informatica 10/9, PowerCenter Client
Databases: MS SQL Server, MySQL, Oracle 12c/11g/10g
Agile Methodology: KANBAN, JIRA
PROFESSIONAL EXPERIENCE:
Confidential
Data Scientist
Responsibilities:
- Confidential - DM approach to understanding business objectives, business problems, gaps and areas of growth & optimization
- K-means algorithm for Customer Segmentation based on geography, similar purchasing behavior, products & services, utilization, high value accounts, risk of default and attrition
- Customer Churn analysis leveraging predictive models to increase customer retention efforts by 3% by analyzing and predicting customers' likelihood to churn based on current and historical behavioral patterns
- Speech & Text Analytics to mine consumer sentiment from recorded call center calls, correspondence, surveys, social media platforms and media publications
- Frequent itemset mining using the Apriori algorithm to derive product associations (> 0.60 %) measured by support, confidence and lift
- Hypothesis testing, comparing and selecting the best performing algorithmic model based on the target variable measurement, generalization performance, training speed, statistical significance of predictors
- Model evaluation based on R-squared, Adjusted R-squared, Mean Squared Error (MSE), Mean Absolute Error (MAE), Confusion Matrix (accuracy, misclassification rate, type I & type II errors, specificity, sensitivity, precision vs recall balance, PPV, NPV), ROC curve, k-fold cross validation, Null Deviance and Residual Deviance
- Collaborating with colleagues and SMEs in peer reviews to ensure data products, statistical inferences and recommendations are accurate, generalizable and production-ready
- Automation and deployment of robust models into production and performance tuning
- Communicating and providing visualizations to empower business stakeholders to make informed, strategic decisions based on statistical inferences, insights, trends and forecasts, in order to maximize revenue, optimize risk, accelerate business processes and improve the overall quality of the customer lifecycle
Confidential
Sr Software Developer
Responsibilities:
- OBIEE 11.1.1.7.1 installation and configuration on Windows OS
- T-shaped contributor rotating across all 3 roles: BI Analyst, Developer and UAT Tester
- Semantic layer modeling using the Administration Tool; Physical, BMM, Presentation Layers based on OBIEE best practices
- Establishing physical joins, star schema dimensional modeling, defining logical tables, aggregation levels, content level, hierarchies, building subject areas, Initialization blocks and repository variables
- Report development using OBIEE Analytics, BI Publisher and Tableau to build interactive dashboards, ad hoc reports, Performance Tiles, Scatterplots, Heat Maps, Bar charts, Line charts, Box Plots, Geographic maps, Tree maps, Highlight maps, Gantt charts, Bubble charts etc.
- Implementing data-level security, configuration of users, groups & Application Roles, performance tuning, monitoring system metrics, pinging server availability, usage tracking, managing Catalog root directories, unit testing, version control and object migrations
- BI Administrative functions - catalog migration, repository versioning, deployments, code migration, monitoring system health, restarting BI services, performance tuning, debugging, cache management, provisioning user access
- OBIEE Training and development of both onshore and offshore resources
- Providing deliverables within SLA guidelines to both internal and external Stakeholders