Sr. Data Scientist/Analyst Resume, Yonkers, New York

SUMMARY

6+ years of experience in Data Analytics, Data Science, Natural Language Processing (NLP) and Data Visualization
4+ years of work experience in customer segmentation, strategy development, operation analysis and customer behavior analysis
Good experience handling structured and unstructured data, developing statistical machine learning solutions to business problems, and creating data visualizations using Python and Tableau
Critical thinker with strong ability to frame and address challenging scientific problems
Experienced in Python, with a focus on improving, validating, deploying, and optimizing the machine learning models that support many aspects of the business
Passionate about data forecasting and pattern detection, with a strong ability to process, analyze and visualize data to find patterns and key trends
Proficient in coding in Python, SQL, R and SAS, with good knowledge of Spark
Proficient in Tableau for visualizations, creating ad hoc reports, dashboards, and storytelling
Strong experience working in an integrated Agile environment across all phases of the Software Development Lifecycle (SDLC), including statistical data analysis and hypothesis testing
Worked in Apache Spark and Hadoop ecosystem (HDFS, MapReduce, Sqoop, Hive) to handle large datasets
Knowledge of Spark MLlib algorithms and services such as regression, classification, clustering, collaborative filtering and dimensionality reduction
Experience in designing and applying machine learning models such as Logistic Regression, Decision Trees, K Means, Random Forest, Support Vector Machines, KNN, XGBoost and Neural Networks
Experienced in text mining, Natural Language Processing, sentiment analysis, text classification, topic modeling and segmentation methodologies

TECHNICAL SKILLS

Programming Languages: SQL, Python, R, SAS, JavaScript

Machine Learning: Linear regression, SVR, KNN, Naive Bayes, Logistic Regression, Linear Discriminant Analysis (LDA), SVM, Random Forest, Boosting, K-means clustering, Hierarchical clustering, Latent Dirichlet Allocation (LDA), Collaborative filtering, Artificial Neural Networks, CNN, RNN, LSTM, NLP

Development Environments: Jupyter, Spyder, PyCharm, Microsoft VS Code, RStudio, XML, JSON, REST

Big Data Tools/Services: Spark, Hadoop MapReduce, Hive, Sqoop

Databases: Microsoft SQL Server, Oracle, PostgreSQL

Python Libraries: scikit-learn, pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, NLTK, Gensim (Word2Vec, GloVe), Keras, TensorFlow

Data Visualization: Tableau, Microsoft Power BI

PROFESSIONAL EXPERIENCE

Sr. Data Scientist/Analyst

Confidential, Yonkers, New York

Responsibilities:

Support, extract and deliver reports on physical health, behavioral health, care coordination, care integration, care transition, collaborative care, chronic disease, patient surveys and continuity of care
Utilize Oracle SQL skills to assess and analyze data for quality and integrity
Built machine learning models to detect claim fraud
Developed Predictive Models to assess Healthcare Quality and administer Healthcare Costs using Medical Claims and Pharmacy Claims Data
Built a multi-class classification model that classifies claims into around 3 fraudulent claim categories using Python, TensorFlow and neural networks, and integrated the model with the front end using the Flask framework
Develop and implement SQL scripts to normalize and reformat client data elements into company standards
Worked on application that involved building Machine Learning models, Data Cleaning and Analysis using Python
Automated and optimized existing processes and current reports using complex SQL queries and Python scripts
Performed data pre-processing, including handling missing values, skewness and outliers, scaling, and feature engineering and selection, followed by statistical analysis such as univariate, multivariate and correlation analysis
Collaborated across teams with multiple stakeholders (physicians, scientists, statisticians, internal and external clients including major hospital/regional directors)
Transform complex analytical outcomes into Tableau dashboards and PowerPoint presentations

Environment: SQL, Python, RStudio, Tableau

DATA SCIENTIST

Confidential, Indianapolis, Indiana

Responsibilities:

Used academic assignment data for sentiment classification with TensorFlow and NLTK modules
Cleaned and pre-processed the dataset using the Python NLTK library for tokenization, part-of-speech (POS) tagging and foreign-word checks, and the third-party libraries PyEnchant and VADER for spellchecking and sentiment analysis of essays
Extracted features including Bing snippets, sentiment, unique n-gram count, long word count, part-of-speech count, spelling error count and essay length, and used the unsupervised learning algorithm GloVe to obtain vector representations for words
Applied bag-of-words feature extraction with Linear Regression, Ridge Regression, Lasso Regression, Support Vector Regression and Gradient Boosting Regression
Built Support Vector Regression, Gensim Word2Vec, Decision Tree, Random Forest, boosted Decision Tree and RNN (Long Short-Term Memory) models, with Quadratic Weighted Kappa as the performance measure
Used the AdaBoost algorithm from scikit-learn to boost the results of multiple decision trees (also from scikit-learn)
The LSTM with GloVe vectors outperformed all of the other methods for essay grading
The model achieved 82% accuracy and saved teaching assistants and professors 35% of their time
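The Quadratic Weighted Kappa measure named above can be illustrated with a minimal sketch. This is not the code used on the project: the function name, the integer grade scale and the sample grades are assumptions for illustration only. scikit-learn offers the same metric through `cohen_kappa_score` with `weights="quadratic"`.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two lists of integer grades.

    Disagreements are penalized by the squared distance between grades,
    so a 1-vs-3 mismatch costs more than a 1-vs-2 mismatch.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n_ratings = max_rating - min_rating + 1
    if n_ratings == 1:
        return 1.0  # only one possible grade, so agreement is trivial
    n_items = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)
    hist_b = Counter(rater_b)

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n_items
            numerator += weight * observed[(i, j)]
            denominator += weight * expected
    if denominator == 0:
        return 1.0  # both raters gave one identical grade to every item
    return 1.0 - numerator / denominator


# Hypothetical grades on a 1-3 scale (illustrative data only).
human_grades = [1, 2, 3]
model_grades = [1, 2, 3]
kappa = quadratic_weighted_kappa(human_grades, model_grades, 1, 3)
```

A value of 1 means perfect agreement between model and human grades, 0 means agreement no better than chance, and negative values mean systematic disagreement.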

Environment: Python, RStudio, TensorFlow, Neural Networks

Data Analyst/Data Scientist

Confidential, Grand Rapids, Michigan

Responsibilities:

Data Mining and Pre-processing: Extracted, merged and cleansed disparate data using Alteryx, SQL, RStudio and Hadoop Hive for tactical business projects
Wrote complex SQL queries, stored procedures, triggers, views, cursors, joins, constraints, DDL, DML and user-defined functions to implement the business logic, and created clustered and non-clustered indexes
Customer Churn: Merged multiple data sources to understand why customers defect. Built a predictive model and helped put it into production to trigger specific actions based on likelihood of leaving.
Carried out data munging with Sqoop, Hive (HQL) and SQL; data pre-processing and predictive modeling were carried out with R, Python and H2O
Developed and operationalized security incident trends and vulnerability management dashboards using Power BI, resulting in a 15% decrease in the number of breaches and a 25% improvement in remediation time
Bundled Churn: Data mining was carried out to understand why people decide to drop certain policies, while maintaining others
Performed data pre-processing tasks such as normalization, scaling, and treating missing values and outliers, preparing the data for statistical analysis such as multivariate and correlation analysis
Assisted in creating a prototype model to predict customer retention by applying various machine learning algorithms, adopting the SMOTE technique to address class imbalance and using the AUROC metric for performance comparison
Applied data modeling techniques, namely logistic regression, classification trees (CART), K-nearest neighbors and Gradient Boosting models
Built and trained machine learning models including XGBoost and neural networks on features including demographics, co-indicator and counter indicator conditions to predict disorders, and evaluated different models
Created Dashboards in Tableau for Customer and Bundle Churn
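The AUROC comparison mentioned in the retention bullet can be sketched as follows. This is a minimal illustration, not the project code: the function name, labels and scores are made up, and the pairwise loop is chosen for readability over speed. In practice `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels (1 = churned) and scores from two candidate models.
churned = [0, 0, 1, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.8]
model_b_scores = [0.2, 0.3, 0.6, 0.9]

auc_a = auroc(churned, model_a_scores)
auc_b = auroc(churned, model_b_scores)
best_model = "B" if auc_b > auc_a else "A"
```

Because AUROC depends only on how the scores rank churners against non-churners, it stays meaningful when churners are a small minority, which is why it pairs naturally with a resampling step such as SMOTE.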

Environment: Python, SQL, RStudio, Hadoop Hive, Sqoop, H2O, Tableau

DATA ANALYST

Confidential

Responsibilities:

Extracted data from the central data warehouse using R through Hive connections and created a parallel database as RDS files
Created a time-based model for inventory data to predict the future stock required to meet customer demand
Created a Never Out of Stock report for Category teams and higher management to understand which items sell regularly on regular or event days
Created an Open to Buy report for Lifestyle Category teams based on past purchases, sale rate and days on hand remaining before items go out of stock
Responsible for creating sell-through and discount reports using previous sales data
Utilized SQL to develop and run stored procedures and views that create result sets to meet varying reporting requirements
Conducted Exploratory Data Analysis on customers' historical billing information to improve the model forecasting increasing or declining product use
Extensively used Tableau dashboards and MS Excel for visualizations and report generation

Environment: Python, RStudio, Hive, MS Excel, Tableau

DATA ANALYST

Confidential

Responsibilities:

Worked with large and complex data sets (both internal and external data) to evaluate, recommend, and support the implementation of business strategies
Developed SQL queries to bring loans data together from various systems
Performed data migration using an ETL (Extract, Transform and Load) tool
Worked closely with the Data Science team on a POC migration from SAS to Python during the data cleaning phase, and helped enhance the base model performance
Performed data alignment and data cleansing, and supported loans data integrity using Pandas
Performed Data Analysis and automated monthly vintage monitoring report using Python
Collaborated with data scientists to create the dependent (referrals) variable for building a fraud score model
Developed an interactive Tableau dashboard for the home loans and vehicle loans report covering all underperforming loans, with a breakout of reasons
Followed Agile methodology
Used Blueprint for tracing requirements and documentation

Environment: Python, SQL, SAS, MS Excel, Tableau
