
Data Scientist Resume

PROFESSIONAL SUMMARY:

  • Over 11 years of experience in Analytics, Consulting and Implementation across the BFS, CPG/Retail and Insurance industries.
  • Experienced with techniques such as Clustering, PCA, Regression (Linear, Generalized Linear, and Logistic), Time Series Forecasting, and Machine Learning methods including Decision Trees, Random Forests and Support Vector Machines.
  • Experienced in Text Mining, Sentiment Analysis and Neural Networks.
  • Extensive experience in data preparation, validation, processing and predictive modelling for Banking and Financial Services portfolios such as investment banking and mortgage home loans; for Retail/CPG clients, worked on point-of-sale data to forecast future sales demand.
  • Extensive knowledge of extraction, transformation and loading (ETL) using SAS, R, SQL, Python and SPSS.
  • Primary role involves data preparation, cleaning, validation, processing and predictive modelling.
  • Experience with Azure Machine Learning, using R to build report visualizations.
  • Extensive knowledge of SAS programming and preparation of ad hoc reports.
  • Led multiple predictive modelling and forecasting projects, including managing team members.
  • Provided consulting insights on forecasting and predictive modelling projects for banking and retail CPG clients.
  • Performed missing value and outlier analysis using SAS, R and Python.
  • Worked on loan performance and loan monitoring for mortgage and auto insurance portfolios.
  • Excellent analytical, problem-solving, interpersonal and communication skills.

KEY SKILLS:

BASE/SAS, SAS/STAT, SAS/ETS, SAS/Macros, SQL Server, RStudio, Python (pandas, NumPy, scikit-learn, Matplotlib), Tableau

PROFESSIONAL EXPERIENCE:

Confidential

Data scientist

Technology: R-Studio, R-Shiny and Python

Responsibilities:

  • Extracted structured and unstructured data from flat files and loaded it into the Python and RStudio platforms.
  • Used the NumPy, pandas and Matplotlib packages for cleaning and processing the unstructured data.
  • Applied natural language processing and text mining: converted words to vectors using the gensim package and removed punctuation, special characters and stop words.
  • Created document-term and term co-occurrence matrices and mapped each defect word to test case words using a similarity matrix.
  • Computed historical weighted averages of test case passes and failures, together with test case functionality, to predict failures in future test cycles.
  • Built Logistic Regression, Random Forest and Naïve Bayes models to predict future test cycle failures.
  • Validated the models using Accuracy, Precision, Recall, F1 score and the confusion matrix.
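
A minimal sketch of the failure-prediction pipeline described above, using scikit-learn in place of the original gensim/R tooling: a document-term matrix over test case text, a logistic regression classifier, and validation with the listed metrics. The toy test case descriptions and pass/fail labels are illustrative, not data from the original project.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hypothetical test case descriptions and pass(0)/fail(1) outcomes.
docs = [
    "login page timeout error on submit",
    "checkout flow completes successfully",
    "payment gateway timeout error",
    "search results render successfully",
    "profile update timeout on save",
    "report export completes successfully",
]
labels = [1, 0, 1, 0, 1, 0]

# Document-term matrix: rows are test cases, columns are token counts.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit a logistic regression classifier on the document-term matrix.
clf = LogisticRegression().fit(dtm, labels)
pred = clf.predict(dtm)

# Validate with accuracy, precision, recall, F1 and the confusion matrix.
print("accuracy :", accuracy_score(labels, pred))
print("precision:", precision_score(labels, pred))
print("recall   :", recall_score(labels, pred))
print("f1       :", f1_score(labels, pred))
print("confusion matrix:\n", confusion_matrix(labels, pred))
```

In practice the same metrics would be computed on a held-out test cycle rather than the training data, and Random Forest or Naïve Bayes classifiers could be swapped in with the same matrix.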

Confidential

Data scientist

Technology: BASE/SAS, SAS/STAT, SAS/ETS, SAS/Macros and R-Studio

Responsibilities:

  • Extracted data from flat files, DB2 and Oracle databases into the SAS and RStudio platforms.
  • Used SAS and R packages such as dplyr, sqldf and stringr for data preparation and cleaning.
  • Performed missing value and outlier analysis for all variables: imputed missing values with the mean, median or multiple regression, and captured outliers with box plots.
  • Defined the dependent/target variable from loan repayment behaviour over the past 12 months, flagging 90/120-day late payments.
  • Performed exploratory statistical data analysis for both categorical and numerical variables.
  • Produced univariate and bivariate analyses with various graphs and plots to understand the distribution of the data.
  • Built Logistic Regression and Decision Tree models to predict the probability of default and prepayment for home loan types such as prime and subprime.
  • Validated the models on performance data using the confusion matrix, decile analysis, lift charts, ROC curves and the Accuracy Ratio.
  • Documented the R and SAS code and prepared project reports.
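
An illustrative Python sketch of the workflow above: median imputation of missing values, a logistic regression for probability of default on a 90-day-late flag, and ROC-based validation. The column names (`ltv`, `fico`, `dti`) and all data are synthetic assumptions, not the original portfolio.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
n = 500
loans = pd.DataFrame({
    "ltv": rng.normal(80, 10, n),    # loan-to-value ratio (hypothetical)
    "fico": rng.normal(680, 50, n),  # credit score (hypothetical)
    "dti": rng.normal(35, 8, n),     # debt-to-income ratio (hypothetical)
})

# Inject missing values, then impute with the column median.
loans.loc[rng.choice(n, 25, replace=False), "fico"] = np.nan
loans["fico"] = loans["fico"].fillna(loans["fico"].median())

# Synthetic 90-day-late default flag driven by the covariates.
logit = 0.05 * loans["ltv"] - 0.02 * loans["fico"] + 0.04 * loans["dti"] + 7
loans["default_90d"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X, y = loans[["ltv", "fico", "dti"]], loans["default_90d"]
model = LogisticRegression(max_iter=1000).fit(X, y)
pd_scores = model.predict_proba(X)[:, 1]  # probability of default

print("AUC:", roc_auc_score(y, pd_scores))
print(confusion_matrix(y, model.predict(X)))
```

A production credit model would validate on an out-of-time performance window and add the decile analysis and lift chart mentioned above; this sketch shows only the imputation, fit and ROC steps.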

Confidential

Data scientist

Technology: R, SAS and Python

Responsibilities:

  • Extracted data from a SQL database into the Python platform.
  • Used the pandas and NumPy libraries for data preparation and cleaning.
  • Performed missing value and outlier analysis for all variables.
  • Defined the dependent/target variable as whether a property asset is sold within 90/120 days.
  • Performed exploratory statistical data analysis for both categorical and numerical variables.
  • Performed predictive modelling to generate marketability scores for REO properties sold through the Auction and Retail channels. Scores were developed for three phases after a property is listed (Pre-Valuation, Post-Valuation and Auction); the weighted scores were then fed into the auction engine for further computation and to generate statistical insights.
  • Built a Logistic Regression model to predict the probability that an asset sells within 90 days.
  • Performed EDA on bidding data across dimensions such as bid timing, occurrence of online bids, bid amount distribution and auctioneer activity over time. Developed an artificial neural network based simulator engine that predicts future bids and simulates different bid-time scenarios, accounting for other process decisions taken by auctioneers; the algorithm refines its results after each new bid instance in the live environment.
  • Statistical techniques applied: Logistic Regression and artificial neural network based algorithms implemented in R.
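
A heavily simplified sketch of an ANN-style bid predictor along the lines described above, using scikit-learn's MLPRegressor as a stand-in for the original simulator engine. The feature set (minutes into the auction, prior bid amount, online-bid flag) and the synthetic bid data are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 300
minutes_in = rng.uniform(0, 60, n)           # bid timing within the auction
prior_bid = rng.uniform(50_000, 200_000, n)  # last observed bid amount
online = rng.integers(0, 2, n)               # occurrence of an online bid

# Synthetic next-bid amount (in $ thousands): increments grow late in the
# auction, with a small bump for online bids.
next_bid_k = (prior_bid * (1 + 0.01 + 0.002 * minutes_in / 60)
              + 500 * online) / 1000

# Scale features before fitting the neural network.
X = np.column_stack([minutes_in, prior_bid, online])
X_scaled = StandardScaler().fit_transform(X)

ann = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
ann.fit(X_scaled, next_bid_k)
pred = ann.predict(X_scaled)
print("first predicted bids ($k):", np.round(pred[:3], 1))
```

The real engine would be retrained or updated as each live bid arrives; this sketch only shows the feature layout and a single fit.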

Confidential

Predictive modeler

Technology: BASE/SAS, SAS/STAT, SAS/ETS and SAS/Macros, R-Studio

Responsibilities:

  • Extracted data from flat files into the R platform.
  • Used R packages such as dplyr and stringr for data preparation and cleaning.
  • Performed missing value and outlier analysis for all variables.
  • Built models using the PROC LOGISTIC and PROC PHREG procedures, tested the assumptions of logistic and multinomial logistic regression, and interpreted the results.
  • Developed forecasting models for the acquisition of new and existing balances in money market accounts, based on account age, customer age, customer segment, region and channel, with interest rate, cross rate, comp rate, CPI and other covariates and regimes.
  • Constructed the Accuracy Ratio, Gini coefficient and ROC curve and interpreted the results.
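
A small sketch of the validation step above: computing the ROC AUC for a set of model scores and deriving the Gini coefficient from it via the standard identity Gini = 2·AUC − 1. The scores and outcomes below are illustrative placeholders.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical predicted scores and observed outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]
actual = [1,   1,   0,    1,   0,   0,    0,   0]

auc = roc_auc_score(actual, scores)  # area under the ROC curve
gini = 2 * auc - 1                   # Gini coefficient from AUC
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")
```

The accuracy ratio used in credit model validation coincides with this Gini coefficient, so the same computation serves both metrics.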

Confidential

Statistical Analyst

Technology: SAS, R & Python

Responsibilities:

  • Prepared, cleaned and merged the necessary files for time series analysis with SAS and R.
  • Built models using multiple regression and time series methods such as exponential smoothing and ARIMA, and finalized models based on MAPE, historical CAGR, forecasted CAGR and month-on-month % change, combined with business intuition.
  • Wrote PROC SQL to extract data from CSV files, Excel and an Oracle database.
  • Performed data preparation and processing using BASE/SAS and SAS/Macros for the time series analysis.
  • Built models using exponential smoothing and the PROC ARIMA procedure for univariate and multivariate forecasts.
  • Applied top-down and bottom-up approaches to align channel forecasts with segment and category forecasts using BASE/SAS, SAS/ETS and SAS/Macros.
  • Decomposed volume and dollar sales into the factors contributing most to the volume growth rate using BASE/SAS and SAS/STAT.
  • Prepared SAS code to pick the best forecasts using MAPE, CAGR, YTD, YTG, YOY growth and annual share.
  • Provided business insights for all categories, segments and their channels.
  • Documented the analyses and SAS programs; prepared waterfall charts to make the historical data easy to understand.
  • Prepared spaghetti graphs to visualize all forecasts in a single plot, and documented the code and reports.
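
An illustrative pure-NumPy sketch of the model-selection logic above: fit two candidate forecasts to a monthly series (here simple exponential smoothing and a seasonal naive baseline, standing in for the SAS ESM/ARIMA procedures) and keep whichever scores the lower MAPE on a 12-month holdout. The sales series is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)
months = 48
# Synthetic monthly sales: upward trend + annual seasonality + noise.
series = (np.linspace(100, 160, months)
          + 10 * np.sin(2 * np.pi * np.arange(months) / 12)
          + rng.normal(0, 3, months))
train, test = series[:-12], series[-12:]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def ses_forecast(y, alpha=0.3, horizon=12):
    """Simple exponential smoothing: flat forecast at the last level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(horizon, level)

def seasonal_naive(y, horizon=12, period=12):
    """Repeat the last observed 12-month seasonal cycle."""
    return y[-period:][:horizon]

candidates = {
    "SES": ses_forecast(train),
    "seasonal naive": seasonal_naive(train),
}
errors = {name: mape(test, fc) for name, fc in candidates.items()}
best = min(errors, key=errors.get)
print({name: round(e, 2) for name, e in errors.items()}, "-> best:", best)
```

The same pick-the-lowest-MAPE loop extends naturally to ARIMA, Holt-Winters and regression candidates, with CAGR and YOY checks layered on before a forecast is finalized.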

Confidential

Statistical Analyst

Technology: R and Python

Responsibilities:

  • Created a data audit document to confirm the sufficiency of the data to meet the objectives (solutions to business problems).
  • Understood the business objectives and goals of the project.
  • Performed data profiling, identified data characteristics for important attributes, and defined attribute cleansing rules. Created a delivery document covering which objectives can be achieved with the given data, a brief methodology, the final client deliverables and timelines.
  • Prepared end-to-end ETL mappings from source to staging to cleansing to the final data hub.
  • Designed the ETL framework and defined and built the metadata repository.
  • Measured the health of the data by reporting the percentage of missing values, variable types and quick distribution charts (box plots, histograms).
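
A minimal pandas sketch of the data-health measurement described in the last bullet: percentage of missing values and variable type per column. The sample frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical source table with some missing values.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "segment": ["A", "B", None, "A", "B"],
    "balance": [1200.5, np.nan, 830.0, np.nan, 410.2],
})

# Per-column health summary: % missing and dtype.
health = pd.DataFrame({
    "pct_missing": df.isna().mean() * 100,
    "dtype": df.dtypes.astype(str),
})
print(health)

# Quick distribution checks (box plot / histogram) would follow, e.g.:
# df["balance"].plot.box(); df["balance"].hist()
```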
