
Data Scientist Resume


San Diego, CA

SUMMARY

  • Overall 6+ years of experience in the fields of Statistical Modeling, Hypothesis Testing, Predictive Modeling, Machine Learning, Multivariate Analysis, Correlation, ANOVA, Data Analytics, Text Mining, SQL Querying, and Database Management.
  • Expert in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured data.
  • Well - versed in predictive statistical modeling using Python and R.
  • Proven track record of successfully handling data science projects from start to finish.
  • Proficient in Data Analysis, Cleansing, Transformation, Model Building, Model Evaluation, and Presentation.
  • Extensively worked with Python 3.5/2.7 (NumPy, SciPy, Pandas, Matplotlib, Seaborn, NLTK, and Scikit-Learn).
  • Experienced in extracting data from various sources such as CSV, text, tab-separated, and pipe-separated files, and in reading tables from the web using Python.
  • Expert in performing data manipulation operations using Python.
  • Skilled in R programming using packages like caret, ggplot2, dplyr.
  • Expert in implementing machine learning algorithms such as K-Means Clustering, K-Nearest Neighbors (KNN), Naïve Bayes, SVM, Decision Trees, and Linear and Logistic Regression.
  • Experienced in applying ensemble methods such as Random Forests and Gradient Boosting to improve prediction accuracy.
  • Exposure to Natural Language Processing (NLP) and text mining.
  • Experienced in dimensionality reduction methods such as PCA (Principal Component Analysis) and regularization techniques such as Lasso and Ridge.
  • Possess a solid understanding of the bias-variance trade-off, cross-validation, and the steps that must be taken to avoid overfitting.
  • Experienced in writing complex SQL queries and designing database objects such as Stored Procedures, Indexes, Views, and Triggers.
  • Designed and implemented critical database solutions that are reliable, scalable, and performant, meeting the service levels associated with the software that supports the organizations' core products.
  • Solid ability to write and optimize diverse SQL queries, working knowledge of RDBMS.
  • Well-versed in manipulating data using string and aggregate functions in MySQL as well as Sybase databases.
  • Exposure to cloud computing using Amazon AWS (S3 and EMR).
  • Able to use MLlib, Spark's machine learning library, to build and evaluate different models.
  • Adept at pivot-table operations in Microsoft Excel for slicing and dicing data and computing counts, sums, and other aggregate metrics.
  • Excellent ability to communicate technical details to non-technical audience.
  • Proficient at summarizing and presenting key insights through effective visualizations using Microsoft PowerPoint.
  • Implemented projects across various SDLC methodologies, including Scrum and Waterfall.
  • Able to work both independently and collaboratively within a team setting.
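The cross-validation practice mentioned above can be sketched with a minimal, self-contained fold-splitting helper. This is an illustrative example only (the function name and data sizes are hypothetical, not taken from any project described here):

```python
# Minimal k-fold cross-validation index splitter (illustrative sketch).
# In practice scikit-learn's KFold would be used; this shows the idea.

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs covering k roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        # Each fold serves as the held-out test set exactly once.
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Example: 10 samples, 5 folds -> each test fold holds 2 samples.
folds = list(k_fold_indices(10, k=5))
```

Averaging a model's score across the k held-out folds gives a less optimistic estimate of generalization error than a single train/test split, which is how cross-validation guards against overfitting.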

TECHNICAL SKILLS

Programming Languages: Python, R, SQL

Python Libraries: NumPy, Pandas, Matplotlib

Database: Sybase, SQL Server

Big Data: Spark, PySpark, Spark ML

Machine Learning Methodology: Supervised and Unsupervised Learning

Analytical models: Linear Regression, Logistic Regression, PCA, SVM, Classification, Predictive Modeling

Tools: RStudio, Jupyter Notebook, PuTTY, Spark on AWS, Microsoft Excel, Scikit-Learn, SciPy

Others: Exploratory Data Analysis, Data visualization, NLP, Statistical Analysis, Regression Analysis, Correlation

PROFESSIONAL EXPERIENCE

Confidential, San Diego, CA

Data Scientist

Responsibilities:

  • Loaded dataset and merged customer information table and subscription table into one table. Performed exploratory data analysis to answer business questions.
  • Worked with various graphing methods to visualize and understand the data, such as scatter plots, pie charts, bar charts, box plots, and histograms, using Seaborn and Matplotlib.
  • Cleansed data by applying techniques such as missing-value treatment, outlier treatment, data normalization, and hypothesis testing.
  • Manipulated and summarized data efficiently to surface actionable insights.
  • By slicing and dicing the dataset, identified the factors/variables responsible for churn.
  • Identified which groups of customers were likely to churn through exploratory analysis, and visualized the findings.
  • Discovered which services were most and least preferred among customers.
  • Performed regression analysis and correlation analysis.
  • Applied class-imbalance handling techniques in Python to balance the dataset, improving churn-prediction accuracy by 50%.
  • Increased business revenue by $6,000 per month.
  • Provided recommendations for customer engagement.
  • Presented findings and business impact to the technical team using Microsoft PowerPoint.
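The class-imbalance handling described above can be sketched as minority-class upsampling, one common approach (the project's actual technique is not specified; all data, field names, and the seed below are synthetic):

```python
import random

# Illustrative minority-class upsampling for an imbalanced churn dataset.
# Resamples the minority class with replacement until both classes match.

def upsample_minority(rows, label_key="churned", seed=42):
    """Return rows with the minority class upsampled to parity."""
    pos = [r for r in rows if r[label_key]]
    neg = [r for r in rows if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    rng = random.Random(seed)  # seeded for reproducibility
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

# Synthetic example: 2 churners vs. 8 non-churners.
data = [{"id": i, "churned": i < 2} for i in range(10)]
balanced = upsample_minority(data)  # 8 churners, 8 non-churners
```

Balancing the classes this way keeps a classifier from trivially predicting the majority class, which is why it can sharply improve churn-prediction accuracy on the minority class.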

Environment: Python 3.6, NumPy, Pandas, Matplotlib, Seaborn, Microsoft Excel, Microsoft PowerPoint.

Confidential

Data Analyst/Software Engineer

Responsibilities:

  • Predicted whether a borrower would default on a loan using Python.
  • Discovered patterns among borrowers likely to default using NumPy and Pandas, and visualized the findings using Seaborn and Matplotlib.
  • Merged different data sets into a single data set, reading data from various sources in CSV, text, Excel, and JSON formats.
  • Performed pre-processing steps like imputing missing values, removing highly correlated variables and converting categorical variables to dummy variables.
  • Selected the best features using Recursive Feature Elimination to mitigate the curse of dimensionality.
  • Drove the model-building process, incorporating cross-validation to avoid overfitting, and implemented classification algorithms including Logistic Regression, Decision Trees, Random Forest, and Support Vector Machines.
  • Validated the machine learning classifiers using the ROC AUC curve and accuracy; the best model, a Decision Tree, achieved 82.7% accuracy.
  • Presented the findings to the technical team.
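The Recursive Feature Elimination step can be sketched as follows. This is a simplified stand-in: a real project would refit a model at each step (e.g. scikit-learn's `RFE` wrapping `LogisticRegression`) and read importances from its coefficients, whereas here a fixed importance dict with hypothetical feature names plays that role:

```python
# Simplified recursive feature elimination: repeatedly drop the feature
# with the lowest importance score until only n_keep features remain.

def recursive_eliminate(importances, n_keep):
    """Return the n_keep highest-importance feature names, sorted."""
    remaining = dict(importances)
    while len(remaining) > n_keep:
        weakest = min(remaining, key=remaining.get)
        del remaining[weakest]
        # In true RFE the model is refit here and importances recomputed.
    return sorted(remaining)

# Hypothetical loan-default features and importance scores.
scores = {"income": 0.9, "loan_amount": 0.7, "zip_code": 0.1, "age": 0.4}
selected = recursive_eliminate(scores, n_keep=2)  # -> ['income', 'loan_amount']
```

Pruning weak features this way shrinks the hypothesis space, which is what mitigates the curse of dimensionality for the downstream classifiers.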

Environment: Python, Pandas, NumPy, Scikit-Learn, Seaborn, PCA, Linear models, Non-linear models, Ensemble models.

Confidential

Responsibilities:

  • Involved in all phases of the SDLC (Software Development Life Cycle), from analysis, design, development, and testing to implementation and maintenance, with timely delivery against aggressive deadlines.
  • Developed stored procedures and wrote complex SQL queries using multiple table joins and sub-queries, applying optimal software performance techniques, and generated reports for business stakeholders and customers.
  • Migrated code from the testing to the production environment, ensuring accuracy and allowing stakeholders to leverage the information for strategic business decisions.
  • Monitored systems post go live. Proactively delivered solutions for continuous improvement.
  • Worked in a Scrum team setting as a developer, with a strong understanding of each of the Scrum roles.
  • As a Scrum team player, liaised with the product manager and product owner to understand client requirements and helped develop software that exceeds client expectations.
  • Collaborated with team members to eliminate unwanted dependencies among AutoSys batch jobs.
  • Automated reports to end clients using TCL scripts and a job scheduler, saving 72 hours of manual effort per month.
  • Improved stored procedure performance using techniques such as bulk insertion of data from the source into the database.
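The bulk-insertion idea above can be sketched with the standard-library `sqlite3` module (the original work used Sybase stored procedures; this stand-in only illustrates batching rows into one call instead of one statement per row, and the table and column names are hypothetical):

```python
import sqlite3

# Bulk insertion sketch: one executemany() call batches all rows,
# which is far cheaper than issuing 1000 separate execute() calls.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(i, f"report-{i}") for i in range(1000)]
with conn:  # context manager commits the whole batch as one transaction
    conn.executemany("INSERT INTO reports (id, payload) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0]  # -> 1000
```

Grouping the inserts into a single transaction avoids a commit per row, which is the main reason bulk loading outperforms row-at-a-time inserts.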

Environment: Sybase, TCL scripting, SQL, Python scripting, PuTTY, UNIX.
