
Data Scientist Resume

San Antonio, Texas

SUMMARY

  • Data Scientist/ Machine Learning Engineer with overall 5+ years of experience in providing end-to-end analytics solutions across the Banking, Insurance, Pharmaceutical/Healthcare, Manufacturing, and Retail industries.
  • Hands-on experience in Machine Learning, Data Analytics, Data Mining, Text Mining, Statistical Modelling, Predictive Modelling, and Natural Language Processing (NLP), and familiar with Time Series.
  • Hands-on experience in implementing Linear Regression, Logistic Regression, Multiple Regression, Support Vector Machine (SVM), Naïve Bayes, Decision Trees, CART, Random Forest, K-Nearest Neighbors (k-NN), K-means clustering, and Hierarchical clustering algorithms, and skilled in Boosting, Principal Component Analysis (PCA), LDA, Artificial Neural Networks (ANN), and Convolutional Neural Networks (CNN).
  • Worked on different data formats - JSON, CSV, XML and performed machine learning algorithms using different python libraries - pandas, NumPy, matplotlib, seaborn, SciPy, scikit-learn, NLTK and PySpark.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS like MySQL and NoSQL databases like MongoDB.
  • Extensive experience in Data Visualization, including producing tables, graphs, and listings for data storytelling, and deriving insights from analyzing large, complex data sets using tools such as Tableau, R Shiny, MS Excel, and SQL.
  • Expertise in Excel Macros, Pivot Tables, and other advanced functions. Extensive experience with ETL (Extract, Transform, Load) data warehousing processes.
  • Hands-on expertise in cloud computing services like Amazon Web Services (AWS) and Microsoft Azure.
  • Experience with Big Data and data mining technologies including Hadoop, Hive, etc., and a good understanding of R (ggplot2, gbm, dplyr, randomForest) and Java programming.
  • Proficient in deep learning techniques for artificial neural networks (ANN) using TensorFlow and Keras, and familiar with image processing using OpenCV and MATLAB.
  • In-depth expertise in Statistical Procedures like Parametric and Non-Parametric Tests, Hypothesis Testing, ANOVA, ARIMA, Interpreting P values.
  • Involved in the entire data science project life cycle and actively involved in all the phases, working with large data sets of structured and unstructured data in an Agile/SCRUM environment.

TECHNICAL SKILLS

Programming Languages: Python, SQL, HiveQL, NoSQL, R, SAS, Java

Machine Learning and Data Science: Classification models, Regression models, Clustering, Visualization, Dimensionality Reduction, Bootstrapping, Recommender Systems, Neural Networks using TensorFlow and Keras, NLP with NLTK (LSA, LDA, Markov Models, POS tagging), CNN, Time Series

Image Processing: MATLAB, OpenCV

Databases: MySQL, MongoDB

BI/ Reporting Tools: Tableau, Excel

Big Data: Knowledge of Big Data, Hadoop and Hive

Other: Spyder, Anaconda, Jupyter, Git, Windows, Linux, AWS, LaTeX

PROFESSIONAL EXPERIENCE

Confidential - San Antonio, Texas

Data Scientist

Responsibilities:

  • Built a propensity model for marketing emails using machine learning algorithms.
  • Handled importing of data from SAS data libraries and loaded data through SQL/Hive queries.
  • Performed exploratory data analysis (EDA) using Python libraries NumPy, pandas, and SciPy to visualize and understand the data distribution, clean the data, find patterns, and detect anomalies (outliers), and analyzed data using PySpark.
  • Performed data organization, feature extraction, feature engineering and feature preprocessing.
  • Created insightful visualizations (scatter plots, box plots, and histograms) using matplotlib and seaborn in Python, and developed BI dashboards using Tableau.
  • Implemented supervised machine learning algorithms such as Logistic Regression and XGBoost in Python.
  • Fine-tuned the model using hyperparameter tuning with GridSearchCV, increasing accuracy from 78% to 83% on validation data as measured by F1-score.
  • Evaluated the performance of the model using ROC curves, AUC, gains charts, and confusion matrices, and applied K-fold cross-validation to test the models with different samples of data and optimize them.
  • Collected feedback after deployment and retrained the model to improve its performance.
  • Implemented agile methodology for building internal applications.
  • Provided detailed documentation using Tableau, MS Office, LaTeX, and Excel.
  • Involved in the business requirements understanding phase, discussing with different teams to come up with a product implementation plan.
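The GridSearchCV tuning step described above can be sketched as follows; this is an illustrative example only, using synthetic data and a hypothetical parameter grid in place of the actual marketing-email dataset:

```python
# Illustrative sketch of hyperparameter tuning with GridSearchCV scored by F1.
# Synthetic data stands in for the real marketing-email dataset (an assumption).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Hypothetical grid of regularization strengths
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      scoring="f1", cv=5)
search.fit(X_train, y_train)

# Score the tuned model on held-out validation data, as described above
val_f1 = f1_score(y_val, search.best_estimator_.predict(X_val))
```

The same pattern applies to an XGBoost classifier by swapping the estimator and grid.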

Environment: Python, Tableau, PySpark, SQL/Hive, Hadoop, HDFS, SAS, Jupyter Notebook, Anaconda, Python libraries (pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn), GitHub, MS Office, LaTeX, Excel.

Confidential - Mason, Ohio

Data Scientist/ Machine Learning Engineer

Responsibilities:

  • Built an unsupervised machine learning model to segment customers for targeted marketing.
  • Worked on large datasets in different formats such as JSON and CSV.
  • Created AWS S3 buckets, managed their access policies, and utilized S3 and Glacier for archival storage and backup.
  • Performed data wrangling to clean, transform, and structure the data utilizing the NumPy and pandas libraries.
  • Performed exploratory data analysis (EDA) to visualize and understand data distribution.
  • Created various types of data visualizations using the Python libraries matplotlib and seaborn, and ggplot2 in R.
  • Clustered customer actions using K-means Clustering and Hierarchical Clustering and segmented them into different groups, which helped the marketing team further analyze behavioral patterns of customers.
  • Developed the model and used an elbow plot to find the optimum value of K, using Sum of Squared Errors (SSE) as the error measure.
  • Evaluated and optimized model performance, tuned parameters with K-Fold Cross-validation, and utilized a hypothesis-driven approach to analyze A/B testing.
  • Built a model to analyze bad customer reviews using NLP.
  • Performed text analytics on review data using machine learning techniques in Python with NLTK.
  • Utilized CountVectorizer and TF-IDF to identify the most frequently used, most relevant, and most important words in the feedback documents.
  • Performed tokenization, case conversion, word replacement, lemmatization, and stemming during the data preprocessing stage using the NLTK package in Python.
  • Implemented the Random Forest algorithm in Python and used ROC and AUC to evaluate the model.
  • Visualized, interpreted, and reported findings, and developed strategic uses of data using Tableau with AWS Redshift to create interactive dashboards.
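The elbow-plot approach mentioned above can be sketched in a few lines; the blob data here is a made-up stand-in for the real customer features:

```python
# Sketch of the elbow method: fit K-means for several values of K and record
# the sum of squared errors (scikit-learn's inertia_). Synthetic blob data
# stands in for the actual customer dataset (an assumption).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

sse = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = km.inertia_  # sum of squared distances to the nearest centroid

# Plotting k against sse[k] shows the "elbow" where adding clusters stops
# paying off; that K is chosen for the final segmentation.
```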

Environment: Jupyter, Spyder, Python, Tableau, Python libraries (pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, NLTK), NLP, AWS, R.

Confidential, Collegeville, Pennsylvania

Data Analyst

Responsibilities:

  • Built a model to predict whether a patient is suffering from a particular disease by analyzing patient medical records (healthcare data).
  • Used correlation matrices, histograms, and bar plots, and performed exploratory data analysis (EDA) to analyze, visualize, and understand the dataset.
  • Performed data preprocessing in Python by eliminating duplicate and inaccurate data, identifying and filling null values, and transforming raw data.
  • Used different libraries in Python like NumPy, pandas, and scikit-learn for data extraction and cleansing.
  • Implemented several machine learning algorithms such as k-NN, Support Vector Machines (SVM), decision trees, random forest classifiers, and XGBoost.
  • Tested performance using ROC curves and used different parameters to evaluate the performance of each model.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Designed data visualizations to model data with Tableau, seaborn, matplotlib, and R Shiny.
  • Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold, and hold-out schemes, to test and verify the models' significance.
  • Built a chatbot using Natural Language Processing (NLP).
  • Used the NLTK package and performed noise and stop word removal, stemming, lemmatization, etc.
  • Utilized CountVectorizer and TF-IDF to identify the most frequently used words for the chatbot.
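The CountVectorizer / TF-IDF step above can be sketched as follows; scikit-learn is used here to keep the example self-contained, and the toy utterances are invented, not the actual chatbot corpus:

```python
# Minimal sketch of term-frequency vs. TF-IDF weighting over a tiny corpus.
# The example utterances below are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "how do I reset my password",
    "password reset is not working",
    "how can I contact support",
]

counts = CountVectorizer(stop_words="english")
freq = counts.fit_transform(docs).sum(axis=0)   # raw term frequencies

tfidf = TfidfVectorizer(stop_words="english")
weights = tfidf.fit_transform(docs)             # TF-IDF-weighted importance

vocab = sorted(counts.vocabulary_)              # the retained (non-stop) words
```

Sorting terms by summed frequency or TF-IDF weight surfaces the most used and most important words, as described above.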

Environment: Python, Python libraries (pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn), Machine Learning algorithms (Logistic Regression, Decision Trees, Naïve Bayes), Anaconda, Hadoop, NoSQL, GitHub, Hive, Tableau, R, R Shiny, Linux.

Confidential

Data Analyst

Responsibilities:

  • Built a model to determine customer churn rate to develop retention strategy.
  • Worked on model building using Machine Learning algorithms like Linear Regression, Logistic Regression, Gradient Boosting, Support Vector Machines (SVM), and Naïve Bayes.
  • Incorporated advanced charts, drill-downs, and interactivity into reporting for different stakeholders, and integrated the publishing of reports into the client infrastructure.
  • Collaborated with data engineers and the operations team to implement the ETL process, and wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy.
  • Explored and analyzed customer-specific features using matplotlib and seaborn in Python and dashboards in Tableau.
  • Performed data imputation using the scikit-learn package in Python.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing.
  • Designed rich data visualizations to model data into human-readable form with Tableau and matplotlib.
  • Involved in preparing and delivering daily, weekly, monthly, and annual statistical reports and analyses using SQL, Python, Excel Pivot Tables, VLOOKUPs, Tableau, and graphical representations.
  • Developed complex SQL queries to extract data, performing various joins, grouping, and sorting using JOIN, GROUP BY, and HAVING clauses.
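The preprocessing steps above (imputation, normalization, label encoding, PCA) can be sketched with scikit-learn on a tiny hypothetical churn table; the column names and values are invented for illustration:

```python
# Sketch of imputation, scaling, label encoding, and PCA with scikit-learn.
# The churn-style table below is hypothetical, not the actual dataset.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.DataFrame({
    "tenure": [12, 5, np.nan, 30],
    "monthly_spend": [70.5, 20.0, 55.0, np.nan],
    "plan": ["basic", "premium", "basic", "premium"],
})

num = SimpleImputer(strategy="mean").fit_transform(df[["tenure", "monthly_spend"]])
num = StandardScaler().fit_transform(num)         # feature normalization
plan = LabelEncoder().fit_transform(df["plan"])   # label encoding of a category
reduced = PCA(n_components=2).fit_transform(num)  # dimensionality reduction
```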

Environment: Python, Python libraries (scikit-learn, SciPy, NumPy, pandas, matplotlib, seaborn, NLTK), Tableau, SQL, relational database, MySQL, ETL, Machine Learning algorithms, GitHub.

Confidential

Data Analyst

Responsibilities:

  • Involved in the entire data science project life cycle, including data extraction, data cleansing, and transforming and preparing the data ahead of the analysis and data visualization phases.
  • Interacted with clients to understand business requirements and build a product based on those requirements.
  • Imported and exported large amounts of data between files and a relational MySQL database.
  • Maintained the database and created Stored Procedures for data extraction.
  • Developed complex SQL queries, performing various joins, grouping, and sorting using GROUP BY and HAVING clauses to fetch data later used for monthly quality audit reporting and analysis across multiple teams.
  • Ensured reporting, creation of dashboards, and digitization of reports using SQL, MS Excel, and Tableau.
  • Automated the ETL process, including all QA/QC at various levels, using SQL.
  • Utilized MS Excel to the maximum extent for creating, managing, and reporting using Macros, pivot tables, and VLOOKUP.
  • Developed firm knowledge of Excel, SQL, and SAS for large dataset manipulations.
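The Excel pivot-table style reporting described above can be reproduced in pandas; the small sales table here is made up purely for illustration:

```python
# Sketch of Excel-style pivot reporting with pandas.pivot_table.
# The sales table below is invented, not taken from the actual engagement.
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
})

# Rows: region, columns: quarter, cells: summed revenue
report = sales.pivot_table(index="region", columns="quarter",
                           values="revenue", aggfunc="sum")
```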

Environment: Relational database, SQL, SAS, Tableau, Python.
