Data Scientist Resume

Framingham, MA

SUMMARY:

  • Data Scientist with 7+ years of professional experience in E-commerce, Finance, Retail and Telecommunications domains, performing Statistical Modelling, Data Extraction, Data Screening & Cleaning, Data Exploration and Data Visualization of structured and unstructured data, and implementing Machine Learning algorithms at scale to deliver resourceful insights and inferences targeted towards boosting revenues and enriching customer experiences.
  • Experienced in facilitating the entire lifecycle of a data science project: Data Collection, Data Extraction, Data Pre-Processing, Feature Engineering, Dimensionality Reduction, Algorithm Implementation, Back Testing and Validation.
  • Expert at working with statistical tests such as Two-sample independent & Paired t-tests and One-way & Two-way ANOVA, as well as with non-parametric tests such as Mann-Whitney U, Wilcoxon Rank Test, Shapiro-Wilk Test, & Kruskal-Wallis Test, using RStudio or Python.
  • Adept at analysis of missing data by exploring correlations and similarities and choosing from imputation methods such as iterative imputation in Python (a minimal sketch follows this list).
  • Experienced in Machine Learning techniques such as regression models like Linear, Polynomial and Support Vector Regression, and classification models like Logistic Regression, Decision Trees and Support Vector Machines.
  • Experienced in use of Ensemble learning with Bagging, Boosting & Random Forests; clustering methods like K-means and DBSCAN; association rule learning methods like Apriori and Eclat.
  • In-depth knowledge of Dimensionality Reduction (PCA, LDA), Model Regularization (Ridge, Lasso, Elastic Net), Hyper-parameter tuning, and Grid Search techniques to optimize model performance.
  • Experienced in working with Artificial Neural Networks to develop AI solutions.
  • Expertise in creating executive Tableau Dashboards and Stories for Data Visualization and deploying them to Tableau Server.
  • Working knowledge of extracting, transforming and loading (ETL) data from spreadsheets, database tables, flat files and other sources using Informatica.
  • Skilled in using Tidyverse in R and Pandas, Matplotlib & Seaborn in Python for performing exploratory data analysis.
  • Proficient in Data Visualization tools such as Tableau and PowerBI; Big Data tools such as Hadoop HDFS, Spark and MapReduce; MySQL, Oracle SQL and Redshift SQL; and Microsoft Excel (VLOOKUP, Pivot tables).
  • Skilled in Big Data Technologies like Spark, Spark SQL, PySpark, and Kafka.
  • Experience in Web Data Mining with Python's BeautifulSoup along with knowledge of Natural Language Processing (NLP) to analyze text patterns.
  • Experience with Python libraries including NumPy, Pandas, SciPy, scikit-learn & statsmodels, Matplotlib, Seaborn, Theano, TensorFlow, Keras and NLTK, and R modules like ggplot2.
  • Working knowledge of database creation and maintenance of physical data models with Oracle, DB2 and SQL Server databases, as well as normalizing databases up to third normal form using SQL functions.
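
A minimal sketch of the iterative-imputation approach referenced above, assuming scikit-learn's IterativeImputer; the DataFrame and its column names are illustrative placeholders rather than project data:

    # Minimal sketch: multivariate (iterative) imputation of missing values.
    # Assumes scikit-learn >= 0.21; the columns below are illustrative only.
    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    df = pd.DataFrame({
        "age":    [34, 51, np.nan, 29, 46],
        "income": [72000, np.nan, 58000, 49000, np.nan],
        "tenure": [3, 12, 7, np.nan, 9],
    })

    # Each feature with missing values is modelled as a function of the others,
    # iterating until the imputed values stabilise.
    imputer = IterativeImputer(max_iter=10, random_state=0)
    df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    print(df_imputed)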

SKILLS:

Languages: Python, R, MATLAB, SQL

Database: MySQL, PostgreSQL, Oracle, Microsoft SQL Server

Statistical Tests: Hypothesis Testing, ANOVA Test, t-Test, F-test, Chi-Square Goodness-of-Fit Test

Validation Techniques: k-Fold Cross Validation, Out-of-Bag Estimates, A/B Tests, Monte Carlo Simulations

Optimization Techniques: Gradient Descent, Stochastic Gradient Descent, Mini-Batch Gradient Descent, Gradient Optimization - Adam, Momentum

Data Visualization: Tableau, Microsoft PowerBI, ggplot2, Matplotlib, Seaborn, Alteryx

Data Modeling: Entity Relationship Diagrams (ERD), Snowflake Schema, SPSS Modeler, Dataiku DSS

Big Data: Apache Hadoop, HDFS, MapReduce, Spark

Version Control: GitHub, Git

Cloud Services: AWS EC2/S3/SageMaker, Microsoft Azure, Google Cloud Platform

PROFESSIONAL EXPERIENCE:

Confidential, Framingham, MA

Data Scientist

Responsibilities:

  • Reviewed business requirements to analyze the data source and worked closely with the business analysts to understand project objectives.
  • Collaborated with Data Engineers and Business Analysts while building the data pipeline.
  • Involved in Data Collection, Data Extraction, Data Pre-Processing, Feature Engineering, Dimensionality Reduction, Algorithm Implementation, Back Testing and Validation.
  • Optimized SQL queries to transform raw data into MySQL using Informatica, preparing structured data for machine learning.
  • Imported data from Amazon S3 buckets using the boto3 library.
  • Imported reviews data into the Python environment using Beautiful Soup and the re (regex) library.
  • Created Tableau dashboards from the structured data to visualize data.
  • Performed text-data pre-processing by implementing Tokenization and Lemmatization to convert the raw text data to structured data.
  • Used TF-IDF, pattern, and NLTK to evaluate the sentiment shown in user reviews and classify products as having poor, bad, neutral, good, or excellent customer popularity.
  • Used a Neural Network as the classification model, built with Keras on a TensorFlow backend (sketched below).
  • Executed processes in parallel using distributed environment of TensorFlow across multiple devices (CPUs & GPUs).
  • Deployed the model for production use with AWS SageMaker. Worked and delivered results in an agile environment.
  • Evaluated the classification model using Precision, Recall & F1 Score; the focus was to maximize recall by minimizing false negatives in order to ensure all fake reviews are detected.
  • Analyzed and grouped product reviews into different clusters based on product description, purchase and historic data using techniques such as LDA.
  • Developed a RESTful API which generates a list of most frequent issues encountered by users for a certain product.

Environment: Python (NumPy, Pandas, Matplotlib, boto3, Beautiful Soup, Requests, re), NLTK, pattern, Tableau, Keras, TensorFlow, AWS SageMaker.
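
A minimal sketch of the review-classification step above, pairing TF-IDF features with a small Keras network; the reviews, labels and layer sizes are illustrative assumptions rather than the production pipeline:

    # Minimal sketch: TF-IDF features feeding a small Keras classifier.
    # The reviews and labels are placeholders; real data came from S3 and scraping.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from tensorflow import keras

    reviews = ["great product, works perfectly",
               "terrible, broke after a day",
               "okay for the price",
               "excellent quality, very happy",
               "would not recommend this to anyone"]
    labels = np.array([4, 0, 2, 4, 1])  # 0=poor ... 4=excellent (illustrative)

    # Turn raw text into a TF-IDF feature matrix.
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    X = vectorizer.fit_transform(reviews).toarray()

    # Small feed-forward network running on a TensorFlow backend.
    model = keras.Sequential([
        keras.layers.Input(shape=(X.shape[1],)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(5, activation="softmax"),  # 5 popularity classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, labels, epochs=5, verbose=0)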

Confidential, Wilmington, DE

Data Scientist

Responsibilities:

  • Performed data visualization, cleaning, model development, and model validation in the project to deliver data science solutions.
  • Retrieved data from a Postgres database by writing SQL queries using stored procedures, temp tables, and views in pgAdmin.
  • Used Tableau for quick visualization of data at hand.
  • Used SMOTE to create minority class instances in order to deal with data imbalance.
  • Implemented Principal Component Analysis (PCA) to reduce the number of features in the model.
  • Built prediction models for fraud detection of loans by use of supervised learning techniques such as Logistic Regression, Support Vector Machines, Decision Trees, Random Forests, etc.
  • Used Ensemble methods to increase the accuracy of the training model, applying different Boosting methods through XGBoost (sketched below).
  • Applied DBSCAN to discover hidden patterns and relationships between features and response that might hint towards fraudulent behavior.
  • Built a sequential prediction model by training a Neural Net in Python using Keras.
  • Used various metrics such as F1 Score, ROC and AUC to evaluate the performance of each model.
  • Performed model tuning by using cross-validation and hyperparameter tuning to prevent overfitting.

Environment: Python 3.x (Keras, NumPy, Pandas, Seaborn, scikit-learn, XGBoost), PostgreSQL, pgAdmin, Jupyter Notebooks, Tableau, DBSCAN.
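
A minimal sketch of the imbalance handling and boosting steps described above, assuming imbalanced-learn's SMOTE and XGBoost's scikit-learn API; the synthetic dataset stands in for the loan data:

    # Minimal sketch: SMOTE oversampling followed by an XGBoost classifier,
    # evaluated with F1 and ROC-AUC. Synthetic data stands in for the loan data.
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.95, 0.05], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=42)

    # Oversample only the training split so the test split keeps the true imbalance.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    clf.fit(X_res, y_res)

    pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)[:, 1]
    print("F1:", f1_score(y_test, pred), "ROC-AUC:", roc_auc_score(y_test, proba))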

Confidential, Philadelphia, PA

Data Scientist

Responsibilities:

  • Defined the appropriate churn scope and target metric for the project.
  • Created feature requirements for the project by working with a team of data engineers and business analysts.
  • Imported cleaned and structured data from PostgreSQL servers using pgAdmin by writing SQL queries.
  • Supported in data visualization code migration from R to Python.
  • Performed data visualization using Matplotlib and Seaborn.
  • Applied various classification models such as Logistic Regression, K-NN, SVM, Random Forests and Neural Networks using scikit-learn.
  • Addressed overfitting and underfitting by using K-fold Cross Validation.
  • Applied K-means clustering to look for churn patterns among customers based on various features (sketched below).
  • Documented the factors crucial for customer retention.

Environment: Python (NumPy, Pandas, Matplotlib, scikit-learn), Tidyverse, R, PostgreSQL, pgAdmin.
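
A minimal sketch of the cross-validated churn classification and K-means segmentation described above, using scikit-learn; the synthetic dataset is a stand-in for the customer features:

    # Minimal sketch: k-fold cross-validated churn classifier plus K-means
    # segmentation. Synthetic data stands in for the customer features.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_scaled = StandardScaler().fit_transform(X)

    # 5-fold cross validation guards against over- and underfitting a single split.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X_scaled, y, cv=5)
    print("Mean CV accuracy:", scores.mean())

    # K-means to look for churn-related segments among customers.
    clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
    print("Cluster sizes:", [int((clusters == k).sum()) for k in range(4)])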

Confidential

Data Analyst

Responsibilities:

  • Received intensive training on Python, Object Oriented Programming, Oracle, MySQL, JavaScript, Bootstrap and Project Management.
  • Analyzed business requirements and project objectives as proposed by stakeholders, combining them with market analysis.
  • Maintained and updated 10 databases related to customer info using MySQL.
  • Designed interactive dashboards for developing business solutions, insights, and monthly reports using Microsoft Power BI.
  • Collaborated with the Business Intelligence team by facilitating prioritization of user stories based on the data analyzed.
  • Performed day-to-day data visualization tasks as per the requirements of stakeholders.
  • Performed A/B Testing to check for significant improvements of the new website over the old one (sketched below).

Environment: Python, Oracle, MySQL, JavaScript, Bootstrap, Microsoft Power BI.
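
A minimal sketch of the A/B test mentioned above, assuming a two-proportion z-test from statsmodels; the visitor and conversion counts are illustrative, not the actual experiment figures:

    # Minimal sketch: two-proportion z-test for an A/B test on website conversions.
    # The counts below are illustrative placeholders.
    from statsmodels.stats.proportion import proportions_ztest

    conversions = [480, 530]      # old site, new site
    visitors = [10000, 10000]

    # One-sided test: is the new site's conversion rate higher than the old one's?
    z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors,
                                        alternative="smaller")
    print(f"z = {z_stat:.3f}, p = {p_value:.4f}")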

Confidential

Data Analyst

Responsibilities:

  • Drew statistical inferences using t-tests, ANOVA and Chi-square tests, and performed Post-Hoc Analysis using Tukey's HSD and Bonferroni correction to assess differences across levels of raw material categories, test the significance of proportional differences, and assess whether the sample size was large enough to detect those differences (sketched below).
  • Provided statistical insights into semi-deviation & the skewness-to-kurtosis ratio to guide vendor decisions and to inform optimum pricing for raw material order quantities.
  • Performed Data Analysis on target data after transfer to Data Warehouse.
  • Developed interactive executive dashboards using PowerBI and Excel VBA to provide a reporting tool that facilitates organizational metrics and data.
  • Created database designs through data-mapping using ER diagrams and normalization up to the 3rd normal form, and extracted relevant data whenever required using joins in PostgreSQL and Microsoft SQL Server.
  • Conducted data preparation and outlier detection using Python.
  • Worked with the team manager to develop a lucrative system for classifying auditions and vendors best fitting the company in the long run.
  • Presented executive dashboards and scorecards to visualize and present trends in the data using Excel and VBA-Macros.

Environment: Microsoft Office, Microsoft PowerBI, SQL, Tableau, SPSS
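
A minimal sketch of the ANOVA and Tukey's HSD post-hoc analysis described above, using SciPy and statsmodels; the raw material measurements and category names are simulated placeholders:

    # Minimal sketch: one-way ANOVA across raw material categories, followed by
    # Tukey's HSD to identify which pairs of categories differ. Data is simulated.
    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    values = np.concatenate([rng.normal(10, 2, 30),
                             rng.normal(12, 2, 30),
                             rng.normal(11, 2, 30)])
    groups = np.repeat(["cat_A", "cat_B", "cat_C"], 30)

    # ANOVA: is there any difference in mean values across the three categories?
    f_stat, p_value = stats.f_oneway(values[groups == "cat_A"],
                                     values[groups == "cat_B"],
                                     values[groups == "cat_C"])
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

    # Tukey HSD: which specific pairs of categories differ?
    print(pairwise_tukeyhsd(values, groups, alpha=0.05))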
