- Around 8 years of professional experience in the Insurance, Banking and Financial Services industry, with strong knowledge of Data Analytics, Machine Learning (ML), Predictive Modelling, Natural Language Processing (NLP) and Deep Learning algorithms.
- Proficient in data cleaning, Exploratory Data Analysis (EDA) and Initial Data Analysis (IDA).
- Experienced in facilitating the entire lifecycle of a data science project: Data Extraction, Data Pre-Processing, Feature Engineering, Dimensionality Reduction, Algorithm Implementation, Back Testing and Validation.
- Expert knowledge in machine learning algorithms such as Ensemble Methods (Random forests), Linear, Polynomial, Logistic Regression, Regularized Linear Regression, SVMs, Deep Neural Networks, Extreme Gradient Boosting, Decision Trees, K-Means, K-NN, Gaussian Mixture Models, Hierarchical models, Naïve Bayes.
- Well versed in working with structured and unstructured data, time series data and statistical methodologies such as hypothesis testing, ANOVA, multivariate statistics, regression, classification, modeling, decision theory, time-series analysis and descriptive statistics.
- Proficient in data transformations using log, square-root, reciprocal, cube-root, square and Box-Cox transformations, depending on the dataset.
- Solid mathematical background in Statistics, Probability, Differentiation and Integration, Linear Algebra and Geometry.
- Proficient in a wide variety of data science languages, tools and libraries: Python, R, SQL, Tableau, scikit-learn, NumPy, SciPy and Pandas.
- Experience with relational databases such as MySQL, SQLite and SQL Server, and non-relational stores such as MongoDB and HBase.
- Adroit at employing various data visualization tools such as Tableau, Power BI, Matplotlib, Seaborn, ggplot2 and Plotly.
- Extremely organized with demonstrated skills to perform several tasks and assignments simultaneously within the scheduled time.
TECHNICAL SKILLS AND TOOLS:
Machine Learning: Classification, Regression, Feature Engineering, Clustering, Neural Networks, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, KNN, Ensemble Methods, K-Means Clustering, Natural Language Processing (NLP), Sentiment Analysis, Collaborative Filtering, ML packages (TensorFlow, PyTorch, Keras, Caffe).
Statistical Analysis: Time Series Analysis, Regression Models, Confidence Intervals, Principal Component Analysis and Dimensionality Reduction, Cluster Analysis.
Programming Languages: Python (Pandas, NumPy, scikit-learn), R, SQL.
Selected Coursework: Linear Algebra, Multivariate Calculus, Probability and Statistics, Time Series Analysis.
IDE: Jupyter Notebook, RStudio, Spyder.
Confidential, Dayton, OH
- Used statistical techniques for hypothesis testing to validate data and interpretations.
- Used text mining and predictive modeling to cleanse and mine collected data, providing modeling and analysis of structured and unstructured data for major business initiatives.
- Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.
- Created data visualizations and reports to convey results and analyze data using Tableau.
- Created visualizations in Tableau using an Excel data extract as the source. Tuned performance by analyzing and comparing turnaround times between SQL and Tableau.
- Carried out statistical analysis such as hypothesis tests and chi-square tests using R.
- Built models using supervised classification techniques such as K-Nearest Neighbor (KNN), Logistic Regression and Random Forests, with Principal Component Analysis to identify important features.
- Built models using the K-means clustering algorithm to create user groups.
- Scheduled data refresh on Tableau Server for weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
- Used Jupyter Notebook to write Python scripts for training and testing data sets.
- Implemented Naïve Bayes, Decision Trees, Random Forest and Gradient Boosting for predictive analysis using Python and scikit-learn.
- Maintained data warehouse tables through the loading of data and monitored system configurations to ensure data integrity.
- Performed data extraction, sampling, advanced data mining and statistical analysis using linear and logistic regression, time series analysis and multivariate analysis in R and Python.
Confidential, San Antonio, TX
- Built advanced machine learning models such as KNN and SVR, and clustering algorithms such as Hierarchical Clustering.
- Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
- Used the K-Means algorithm with different numbers of clusters to find meaningful customer segments, and calculated the accuracy of the model.
- Performed data mining using complex SQL queries to discover patterns, and used SQL extensively for data profiling and analysis to guide the building of the data model.
- Mined large data sets and connected data from different sources to identify insights and designs.
- Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
- Used Python and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees and Support Vector Machines for estimating risk.
- Helped create a data lake by extracting the customer's Big Data from various sources (Excel, flat files, Oracle, SQL Server, MongoDB, HBase, Teradata and server log data) into Hadoop HDFS.
- Generated ad-hoc SQL queries to fetch data from SQL Server database systems.
- Created dashboards in Tableau Desktop based on data collected from MS Excel and CSV files and from MS SQL Server databases.
- Prepared and presented complex written and verbal reports, findings and presentations using visualization tools such as Matplotlib and ggplot2.
- Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect the outliers and extract important variables numerically and graphically.
- Performed various data manipulation techniques in statistical analysis like missing data imputation, indexing, merging, and sampling.
- Built multiple classification models, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost and Gradient Boosting, using Python and scikit-learn, and evaluated their performance on customer discount optimization.
- Applied multiple Machine Learning (ML) and data mining techniques to improve the quality of product ads and personalized recommendations.
- Developed NLP models with Deep Learning algorithms for analyzing text, improving on the existing dictionary-based approaches.
- Employed statistical methods such as hypothesis testing, t-tests and confidence intervals for error measurement.
- Used the Simulated Annealing optimization technique and Decision Tree ML concepts. Widely applied statistical concepts including the Central Limit Theorem, probability concepts and probability distributions (Binomial and Poisson).
- Built a Proof of Concept (POC) by researching user behavior and historical trends, and developed a fraud detection model strategy using Random Forests and Decision Trees.
- Created different visualizations in Tableau using bar charts, line charts, pie charts, maps, scatter plots and table reports.
- Implemented machine learning techniques and interpreted statistical results, making them ready for consumption by senior management and clients.
- Generated reports on declined claims using Tableau.
- Compiled data from various sources, including public and private databases, to perform complex analysis and data manipulation for actionable results.
- Applied concepts of probability, distributions and statistical inference to the given dataset to unearth interesting findings using comparisons, t-tests, F-tests, R-squared, p-values, etc.
- Applied linear regression, multiple regression, the ordinary least squares method, mean-variance, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Naive Bayes, function fitting, etc. to data with the help of the scikit-learn, SciPy, NumPy and Pandas modules of Python.
- Applied a Principal Component Analysis (PCA) based unsupervised technique to detect unusual VPN log-on times.
- Performed clustering with historical, demographic and behavioral data as features to implement personalized marketing to customers.
- Created classification models using Logistic Regression and Random Forests to classify the dependent variable into two classes: risky and okay.
- Used F-score, precision and recall to evaluate model performance.
- Built user behavior models for finding activity patterns and evaluating risk scores for every transaction, using historic data to train supervised learning models such as Decision Trees, Random Forests and SVM.
- Performed real-time analysis of customers' financial profiles and recommended the best-suited financial products.
- Collected historical data and third-party data from different data sources and performed data integration using Alteryx.
- Forecasted demand for loans and interest rates using time series models such as ARIMAX, VARMAX and Holt-Winters.
- Improved predictive performance to 81% accuracy using ensemble methods such as bootstrap aggregation (bagging) and boosting (LightGBM, Gradient Boosting).
- Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
- Developed visualizations and dashboards using ggplot and Tableau.
- Prepared and presented data quality reports to stakeholders to give them an understanding of the data.