Senior Data Scientist Resume
PROFESSIONAL SUMMARY:
- 8+ years of experience as a Data Scientist and Data Engineer, with expertise in Artificial Intelligence, Machine Learning, Deep Learning, Data Mining, Predictive Analytics and Decision Science.
- Specialized knowledge to deal with exponentially growing data. Understands the underlying science of data and applies the same in a diverse set of problem statements in a variety of fields.
- Top skills: Python, R, SAS, AWS, GCP, Spark, Hadoop, SQL, T-SQL, Airflow, R Shiny, Python Dash, Tableau.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Experience in building deep learning models, using Natural Language Processing methods to aid in normalizing vendor names, implementing clustering algorithms, and deriving novel metrics.
- Experienced in building and deploying machine learning and deep learning models in AWS SageMaker and GCP BigQuery.
- Built and deployed image recognition models using the Keras and TensorFlow frameworks, as well as text comprehension, classification, pattern recognition, recommendation, targeting, and ranking systems.
- Experience implementing stored procedures, triggers, and functions using T-SQL.
- Skilled in building fully automated, highly elastic cloud orchestration frameworks on AWS and GCP and scheduling jobs in Airflow.
- Skilled in predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, the normal distribution, and other advanced statistical and econometric techniques.
- Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights using large data sets. Created visually powerful and actionable interactive reports and dashboards.
- Experience in developing and analyzing data models. Involved in writing simple and complex SQL queries to extract data from a database for data analysis and testing
- Advanced statistical programming skills in R, Python, and SAS, plus experience with cloud platforms such as Azure ML and AWS ML.
- Experience using CUDA/GPU API for real-time image processing.
- Strong SQL programming skills working with functions, packages, and triggers.
- Developed predictive models using Decision Trees, Random Forests, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
- Strong knowledge in all phases of the SDLC (Software Development Life Cycle).
- Knowledgeable in machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, Natural Language Processing (NLP) etc.
- Experience with full-SDLC software development and with Agile and Scrum methodologies.
- Experience in stochastic optimization and regression with machine learning algorithms
- Experience formulating and solving discrete and continuous optimization problems.
- Able to research statistical machine learning, supervised learning, and classification methods
- Knowledgeable in implementing technical solutions using machine learning and other advanced technologies.
- Able to create new methods and solutions through a combination of foundational research and collaboration with ongoing initiatives.
- Strong mathematical and statistical modeling and computer programming skills, applied innovatively.
- Experience with AWS cloud computing, Spark, Tableau.
- Capable of writing efficient code and working with large data sets.
- Able to identify and learn applicable new techniques independently as needed.
- Able to work comfortably and effectively within an interdisciplinary research environment.
- Experience with validation of machine learning ensemble classifiers.
- Strong critical thinking and problem-solving skills in applied work.
- Familiarity with project management, Agile Scrum methodologies and technical knowledge of how different IT functions integrate with one another.
TECHNICAL SKILLS:
Analytics Tools: Classification and Regression Trees (CART), Support Vector Machine (SVM), Random Forest, Gradient Boosting Machine (GBM), Principal Component Analysis (PCA), Regression, Naïve Bayes
Analytic Languages and Scripts: Python, PySpark, Spark SQL, Scala, Java, Ruby, LaTeX, R
Data Integration: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS)
Languages: SAS, R, Perl, SQL, Python
Version Control: GitHub, Git, SVN
Libraries: ggplot, NLTK, NumPy, OpenCV, pandas, scikit-learn, SciPy, spaCy, Keras, TensorFlow
Computing Environments: Jupyter, Spyder IDE, Atom
Command Shell: iPython, Mac OS X
Data Query: Azure, Google, and various SQL and NoSQL databases and data warehouses
Operating Systems: Windows 8 and 10, Linux Ubuntu
PROFESSIONAL EXPERIENCE:
Senior Data Scientist
Confidential
Responsibilities:
- Employed Hadoop and GCP to ingest and curate the Client’s data.
- Currently working on an RNN model to perform sentiment analysis on customer tweets and product reviews; the model processes thousands of scraped reviews and produces a multi-class (softmax) output that classifies each tweet as positive, negative, positive with reservations, or unsatisfied customer.
- Clean data, perform EDA, and impute missing values for a predictive model of delivery failures. Modeling was performed using SVM, logistic regression, KNN, Bernoulli Naïve Bayes, and artificial neural networks (RNN, CNN, ANN, LSTM).
- Evaluate the performance of binary classification models using the confusion matrix, F1 score, precision, and recall.
- Built an MVP for customer segmentation using unsupervised models: K-means, grid search, and DBSCAN.
- Performed data extraction, transformation, and loading for time series modeling to predict SKU-level demand per store; models built using ARIMA and Prophet reached near 90% accuracy on SKUs (home and personal care).
- Develop code to store RMSE and R² scores for all time series models in production. These scores are regularly updated and model performance is analyzed. This error analysis helps teams quickly see whether models need reassessment.
- Time series models computed trend, seasonality, and residual values and stored them in separate tables.
- Designed, wrote, and supported the code to automate the data processing pipeline for business-critical space and floor planning data for the entire Confidential enterprise.
- Developed PySpark applications using the DataFrame and Spark SQL APIs for faster processing of data.
- Provided data support and debugging for the aforementioned space and floor planning data.
- Designed visualizations for the Retail Operations data sets to inform the clients of the quality of the data they are receiving and to which they are applying data science methodologies
- Engaged with the principal data scientist to cross-pollinate and leverage design patterns and practices for big data queries that filter out anomalies or input entries in data files.
- Develop, test and enhance data stores (tables, views, files) fitting to design and architecture of data solution.
- Developed machine learning models such as KNN and K-means on incoming data and applied unsupervised models for segmentation.
- Investigate performance issues, identify optimization measures including but not limited to coalescing fragmented data, optimizing resource usage, re-partitioning, indexing, bucketing/distribution.
- Implemented PySpark and Spark SQL for faster testing and processing of data; worked on migrating Oracle queries and Alteryx workflows into PySpark transformations.
Data Scientist
Confidential
Responsibilities:
- Developed applications to standardize and ETL data; designed large-scale data analytics and reporting platforms.
- Developed and coded software programs, algorithms, and user interfaces to cleanse, integrate, and evaluate large datasets from multiple disparate sources, primarily in RStudio and delivered as Shiny apps.
- Designed and built core data visualization capabilities in Tableau or as Shiny apps built with Plotly.
- Identified meaningful insights from large data and metadata sources; built a recommender app to map customer behavior and recommend the next sample on the Aroma platform.
- Interacted with research teams to identify questions and issues. Built easy-to-use ANN and tree models (XGBoost, gradient descent) and deployed them using R Shiny. The dashboard was easy to use and was frequently used by non-IT staff to generate predictions for a population using data.
- Responsible for A/B testing of new flavor extractions.
- Used AWS SQS to queue multiple requests and AWS CloudWatch to trigger the Lambda functions, and integrated AWS SNS to send recommendations to the user via email.
- Worked with data science team and provided respective data as required on an ad-hoc request basis.
- Delivered portfolio risk dashboard as a package covering all aspects of the credit life cycle for retail unsecured loans.
- Created time series forecasts using Prophet for default rates of bank financial instruments.
- Worked with very large data sets using Hadoop, HDFS, MapReduce, and Spark.
- Information used included structured and semi-structured data elements collected from both internal and external sources.
- Handled class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE) and missing data using KNN imputation.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data, deployment of production models etc.
- Incorporated Python, MLlib, and a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Applied supervised, unsupervised, and semi-supervised methods to the classification and clustering of documents.
- Applied strong communication and problem-solving skills in a team environment.
- Contributed to security projects involving real-time object tracking and classification using OpenCV libraries.
- Interrogated analytical results to assess algorithmic success, robustness, and validity.
- Assisted in developing Spark/Scala and Python code for regular expression (regex) projects in a Hadoop/Hive environment.
Consultant Machine Learning Engineer
Confidential
Responsibilities:
- Used predictive modeling techniques on MNG Health's extensive datasets to identify patterns through time series analysis.
- Applied hypothesis testing, regression analysis, linear models, non-linear models, forecasting, and machine learning
- Built an R Shiny dashboard, used by non-data-science teams, that allows users to filter and select from over a million records.
- The final model predicts the likelihood of prescription for hundreds of thousands of HCPs. Validated against empirical data sets, it enables a laser-focused marketing strategy, saving hundreds of man-hours and thousands in costs.
- Performed feature engineering and data cleaning across different files: imported, mutated, and added new variables for machine learning.
- Applied both Random Forest and gradient boosting to cross-validate results and identified the top Gini split variables.
- Managed the project with Scrum; all project deadlines were met with no backlogs and with a clear distribution of work within the team.
- Thrived as a contributor, scientist and developer in an Agile development process.
- Used Python, R, and similar scripting languages to manipulate, analyze, and visualize large data sets.
- Created models rapidly in Python using pandas, NumPy, and scikit-learn, with Plotly for data visualization.
- Implemented models in SAS, interfaced them with MSSQL databases, and scheduled them to update regularly.
- Developed anomaly and outlier detection algorithms using CNNs.
- Classified documents using machine learning: neural network and deep learning language techniques, K-nearest neighbors, K-means, Random Forest, and logistic regression.
- Trained different classification models, such as decision trees, SVM, and Random Forest, on the data.
- Managed BI group and associated cross-departmental collaborators in the design, development, and implementation of near-real-time cloud and traditional data systems to capture, cleanse, store, and process data
- Programmatic usage of SQL databases and search engines.
- Segmented medical images using OpenCV libraries.
- Developed and deployed machine learning as a service on Microsoft Azure cloud service.
- Conducted and interpreted analyses using noisy data sets.
Data Scientist/Data Engineer
Confidential, Englewood Cliffs, NJ
Responsibilities:
- Test solutions on AWS using services such as SageMaker, EC2, and Snowball Edge.
- Measured, monitored, and analyzed statistical trends for route planning and route optimization.
- Ran A/B tests with the development team on the app and website; applied ML (association) to help upsell and cross-sell.
- Used machine learning to predict demand and supply patterns and to guide operations toward lower fuel costs and time.
- Implemented solutions to logistics problems, including the vehicle routing problem, the traveling salesman problem, and the capacitated vehicle routing problem.
- Developed personalized product recommendations with machine learning algorithms that used collaborative filtering to better meet the needs of existing customers and acquire new customers.
- Created machine learning models employing logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and K-means.
- Developed optimization algorithms that can be used with data-driven models, such as supervised, unsupervised, or reinforcement learning models.
- Able to research statistical machine learning methods which may include forecasting, supervised learning, classification, and Bayesian methods.
- Able to advance the technical sophistication of solutions through the use of machine learning and other advanced technologies.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Collaborated with data engineers to implement the ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
- Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
Data Scientist
Confidential
Responsibilities:
- Create data quality products for monitoring and reporting, develop documentation for production and maintenance efficiency
- Establish and maintain an ongoing process for reviewing suspect data, determining root cause, and communicating remediation requirements
- Extracted payroll data from SQL and NoSQL databases and performed statistical analysis of performance and people-analytics measures; led analytics and rewards for a team that achieved a 10% gain in productivity, in the form of reduced breaks and more work hours, and negotiated a contract with a modest increase in wages.
- Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction, using NLTK, word2vec, and TF-IDF.
- Cleaned, parsed, and tokenized 1.6 million sentences using NLTK and scikit-learn.
- Developed, deployed, and maintained production NLP models with scalability in mind.
- Used OLAP cubes in preparation for data mining, behavioral and attitudinal segmentation, predictive modeling, insight extraction, and data visualization.
- Worked with tools such as R and SAS to develop neural network and cluster analysis models.
- Implemented machine learning algorithms and concepts such as K-means clustering (several varieties), Gaussian distributions, and decision trees.
- Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.
- Analyzed large data sets and applied machine learning techniques to develop predictive and statistical models.
- Used key indicators in Python for machine learning concepts such as regression, bootstrap aggregation, and random forest.
- Used inferential statistics and machine learning data pipelines.
