Sr. Data Scientist/Analyst Resume
Yonkers, New York
SUMMARY
- 6+ years of experience in Data Analytics, Data Science, Natural Language Processing (NLP) and Data Visualization
- 4+ years of work experience in customer segmentation, strategy development, operation analysis and customer behavior analysis
- Good Experience in handling Structured and Unstructured data, developing various Statistical Machine Learning solutions to address several business problems and creating data visualizations using Python and Tableau
- Critical thinker with strong ability to frame and address challenging scientific problems
- Experienced in Python with a focus on improving, validating, deploying, and optimizing the machine learning models that support many aspects of the business
- Passionate about data forecasting and pattern detection with a strong ability to process, analyze and visualize the data to find patterns and key trends
- Proficient in coding in Python, SQL, R and SAS, with good knowledge of Spark
- Proficient in Tableau for visualizations, creating ad hoc reports, dashboards, and storytelling
- Strong experience working in an integrated Agile environment across all phases of the Software Development Lifecycle (SDLC), including statistical data analysis and hypothesis testing
- Worked in Apache Spark and Hadoop ecosystem (HDFS, MapReduce, Sqoop, Hive) to handle large datasets
- Knowledge of Spark MLlib algorithms and services such as regression, classification, clustering, collaborative filtering and dimensionality reduction
- Experience in designing and applying machine learning models such as Logistic Regression, Decision Trees, K-Means, Random Forest, Support Vector Machines, KNN, XGBoost and Neural Networks
- Experienced in Text mining, Natural Language Processing, Sentiment Analysis, Text classification, Topic modeling, Segmentation methodologies
TECHNICAL SKILLS
Programming Languages: SQL, Python, R, SAS, JavaScript
Machine Learning: Linear regression, SVR, KNN, Naive Bayes, Logistic Regression, Linear Discriminant Analysis (LDA), SVM, Random Forest, Boosting, K-means clustering, Hierarchical clustering, Latent Dirichlet Allocation (LDA), Collaborative filtering, Artificial Neural Networks, CNN, RNN, LSTM, NLP
Development Environments: Jupyter, Spyder, PyCharm, Microsoft VS Code, RStudio, XML, JSON, REST
Big Data Tools/Services: Spark, Hadoop MapReduce, Hive, Sqoop
Databases: Microsoft SQL Server, Oracle, PostgreSQL
Python Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, NLTK, Gensim - Word2Vec, GloVe; Keras, TensorFlow
Data Visualization: Tableau, Microsoft Power BI
PROFESSIONAL EXPERIENCE
SR. DATA SCIENTIST/ANALYST
Confidential, Yonkers, New York
Responsibilities:
- Support, extract and deliver physical health, behavioral health, care coordination, care integration, care transition, collaborative care, chronic disease, patient survey and continuity reports
- Utilize Oracle SQL skills to assess and analyze data for quality and integrity
- Built Machine Learning models for claim fraud detection
- Developed Predictive Models to assess Healthcare Quality and administer Healthcare Costs using Medical Claims and Pharmacy Claims Data
- Built a Multi-Class Classification Model that classifies claims into around 3 fraudulent claim categories using Python, TensorFlow and Neural Networks, and integrated the model with the front end using the Flask framework
- Develop and implement SQL scripts to normalize and reformat client data elements into company standards
- Worked on an application that involved building Machine Learning models, Data Cleaning and Analysis using Python
- Automated and optimized various existing processes and current reports using complex SQL queries and Python scripts
- Performed data pre-processing such as handling missing values, data skewness and outliers, scaling, feature engineering and feature selection, followed by statistical analysis such as univariate, multivariate and correlation analysis
- Collaborated across teams with multiple stakeholders (physicians, scientists, statisticians, internal and external clients including major hospital/regional directors)
- Transform complex analytical outcomes into Tableau dashboards and PowerPoint presentations
Environment: SQL, Python, RStudio, Tableau
DATA SCIENTIST
Confidential, Indianapolis, Indiana
Responsibilities:
- Used academic assignment data for sentiment classification with TensorFlow and NLTK modules
- Cleaned and pre-processed the dataset using the Python NLTK library for tokenization, extracting Part-of-Speech (POS) tags and checking for foreign words, and the third-party libraries PyEnchant and VADER for spellchecking and sentiment analysis of essays
- Extracted features including Bing snippets, sentiment, unique n-gram count, long word count, part-of-speech count, spelling error count and essay length, and used the unsupervised learning algorithm GloVe to obtain vector representations for words
- Applied bag-of-words feature extraction with Linear Regression, Ridge Regression, Lasso Regression, Support Vector Regression and Gradient Boosting Regression
- Built Support Vector Regression, Gensim’s Word2Vec, Decision Tree and Random Forest models, boosting with Decision Trees, and RNN (Long Short-Term Memory) models, with Quadratic Weighted Kappa as the performance measure
- Used the AdaBoost boosting algorithm from scikit-learn to boost the results of multiple decision trees (also from scikit-learn)
- The LSTM with GloVe vectors achieved the best essay-grading accuracy, outperforming all of the other methods
- Saved 35% of time for teaching assistants and professors, with the model achieving 82% accuracy
Environment: Python, RStudio, TensorFlow, Neural Networks
DATA ANALYST/DATA SCIENTIST
Confidential, Grand Rapids, Michigan
Responsibilities:
- Data Mining and Pre-processing: Extracted, merged and cleansed disparate data using Alteryx, SQL, RStudio and Hadoop Hive for tactical business projects
- Wrote complex SQL queries, stored procedures, triggers, views, cursors, joins, constraints, DDL, DML and user-defined functions to implement the business logic, and created clustered and non-clustered indexes
- Customer Churn: Merged multiple data sources to understand why customers defect. Built a predictive model and helped put it into production to trigger specific actions based on likelihood of leaving
- Carried out the full process using Sqoop, Hive (HQL), and SQL for data munging. Data pre-processing and predictive modeling were carried out with R, Python, and H2O
- Developed and operationalized security incident trends and vulnerability management dashboards using Power BI, resulting in a 15% decrease in the number of breaches and improving remediation time by 25%
- Bundled Churn: Data mining was carried out to understand why people decide to drop certain policies, while maintaining others
- Performed data pre-processing tasks such as normalization, scaling, and treating missing values and outliers, preparing the data for statistical analysis such as multivariate and correlation analysis
- Assisted in creating a prototype model to predict customer retention by applying various machine learning algorithms, adopting the SMOTE technique to address class imbalance and using the AUROC metric for performance comparison
- Applied data modeling techniques, namely logistic regression, classification trees (CART), K-nearest neighbors and Gradient Boosting models
- Built and trained machine learning models including XGBoost and neural networks on features including demographics, co-indicator and counter indicator conditions to predict disorders, and evaluated different models
- Created Dashboards in Tableau for Customer and Bundle Churn
Environment: Python, SQL, RStudio, Hadoop Hive, Sqoop, H2O, Tableau
DATA ANALYST
Confidential
Responsibilities:
- Extracted data from the central data warehouse using R through Hive connections and created a parallel database as RDS files
- Created a time-based model for inventory data to predict the future stock required in inventory to meet customer demand
- Created a Never Out of Stock report for Category teams and higher management to understand which items sell on a regular basis on regular or event days
- Created an Open to Buy report for Lifestyle Category teams based on past purchases, their sale rate and the days on hand remaining before the items go out of stock
- Responsible for creating sell-through and discount reports using previous sales data
- Utilized SQL to develop and run stored procedures and views that create result sets to meet varying reporting requirements
- Conducted Exploratory Data Analysis on the customer historical billing information to improve the model forecasting whether customers' product use will increase or decline
- Extensively used Tableau dashboards and MS Excel for visualizations and report generation
Environment: Python, RStudio, Hive, MS Excel, Tableau
DATA ANALYST
Confidential
Responsibilities:
- Worked with large and complex data sets (both internal and external data) to evaluate, recommend, and support the implementation of business strategies
- Developed SQL queries to bring loans data together from various systems
- Performed data migration using an ETL (Extract, Transform and Load) tool
- Worked closely with the Data Science team on a POC migration from SAS to Python during the data cleaning phase and helped them enhance the base model performance
- Performed data alignment and data cleansing and supported loans data integrity using Pandas
- Performed Data Analysis and automated monthly vintage monitoring report using Python
- Collaborated with data scientists and created the dependent (referrals) variable to build a fraud score model
- Developed an interactive dashboard for the home loans and vehicle loans report covering all the underperforming loans, with the reasons breakout, using Tableau
- Followed Agile methodology
- Used Blueprint for tracing the requirements and documentation
Environment: Python, SQL, SAS, MS Excel, Tableau
