Data Scientist Resume
San Diego, CA
SUMMARY
- Over 6 years of experience in Statistical Modeling, Hypothesis Testing, Predictive Modeling, Machine Learning, Multivariate Analysis, Correlation, ANOVA, Data Analytics, Text Mining, SQL, and Database Management.
- Expert in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured data.
- Well-versed in predictive statistical modeling using Python and R.
- Proven track record of successfully handling data science projects from start to finish.
- Proficient in Data Analysis, Cleansing, Transformation, Model Building, Model Evaluation, and Presentation.
- Extensively worked with Python 3.5/2.7 (NumPy, SciPy, Pandas, Matplotlib, Seaborn, NLTK, and Scikit-Learn).
- Experienced in extracting data from various sources, including CSV, text, tab-separated, and pipe-separated files, and reading tables from the web using Python.
- Expert in performing data manipulation operations using Python.
- Skilled in R programming using packages like caret, ggplot2, dplyr.
- Expert in implementing machine learning algorithms such as K-Means Clustering, KNN, Naïve Bayes, SVM, Decision Trees, and Linear and Logistic Regression.
- Experienced in applying ensemble methods such as Random Forests and Gradient Boosting to improve prediction accuracy.
- Exposure to Natural Language Processing (NLP) and text mining.
- Experienced in dimensionality reduction methods such as PCA (Principal Component Analysis) and regularization techniques such as Lasso and Ridge.
- Possess a solid understanding of the bias-variance trade-off, cross-validation, and the steps required to avoid overfitting.
- Experienced in writing complex SQL queries and designing database objects such as Stored Procedures, Indexes, Views, and Triggers.
- Designed and implemented critical database solutions that are reliable, scalable, and high-performing, meeting the service levels of the software that supports the organization's core products.
- Solid ability to write and optimize diverse SQL queries, working knowledge of RDBMS.
- Well-versed in manipulating data using string and aggregate functions in MySQL and Sybase databases.
- Exposure to cloud computing using Amazon AWS (S3 and EMR).
- Able to use MLlib, Spark's machine learning library, to build and evaluate different models.
- Skilled in pivot-table operations in Microsoft Excel for slicing and dicing data and computing counts, sums, and other aggregate metrics.
- Excellent ability to communicate technical details to non-technical audiences.
- Proficient at summarizing and presenting key insights through effective visualizations using Microsoft PowerPoint.
- Delivered projects under various project life cycle and SDLC methodologies, including Scrum and Waterfall.
- Able to work independently and collaboratively within a team setting.
TECHNICAL SKILLS
Programming Languages: Python (NumPy, Pandas, Matplotlib), R, SQL
Database: Sybase, SQL Server
Big Data: Spark, PySpark, Spark ML
Machine Learning Methodology: Supervised and Unsupervised Learning
Analytical models: Linear Regression, Logistic Regression, PCA, SVM, Classification, Predictive Modelling
Tools: RStudio, Jupyter Notebook, PuTTY, Spark on AWS, Microsoft Excel, Scikit-Learn, SciPy
Others: Exploratory Data Analysis, Data visualization, NLP, Statistical Analysis, Regression Analysis, Correlation
PROFESSIONAL EXPERIENCE
Confidential, San Diego, CA
Data Scientist
Responsibilities:
- Loaded dataset and merged customer information table and subscription table into one table. Performed exploratory data analysis to answer business questions.
- Visualized and explored the data with scatter plots, pie charts, bar charts, box plots, and histograms using Seaborn and Matplotlib.
- Cleansed data using techniques such as missing-value treatment, outlier treatment, data normalization, and hypothesis testing.
- Manipulated and summarized data efficiently to support downstream analysis.
- Identified the factors/variables responsible for churn by slicing and dicing the dataset.
- Identified which groups of customers were likely to churn through exploratory analysis and visualized the findings.
- Discovered which services were most and least preferred among customers.
- Performed regression analysis and correlation analysis.
- Applied class-imbalance handling techniques in Python to balance the dataset, improving churn-prediction accuracy by 50%.
- Increased business revenue by $6,000 per month.
- Provided recommendations for customer engagement.
- Presented findings and business impact to the technical team using Microsoft PowerPoint.
Environment: Python 3.6, NumPy, Pandas, Matplotlib, Seaborn, Microsoft Excel, Microsoft PowerPoint.
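The churn workflow above can be sketched in a few lines; the table names, columns, and the balanced-class-weight choice here are illustrative assumptions, not the actual production setup:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Illustrative stand-ins for the customer-information and subscription tables.
customers = pd.DataFrame({
    "customer_id": range(8),
    "tenure_months": [1, 24, 3, 36, 2, 48, 5, 60],
})
subscriptions = pd.DataFrame({
    "customer_id": range(8),
    "monthly_fee": [70, 30, 80, 25, 90, 20, 85, 15],
    "churned": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Merge the two tables into one, as in the first step above.
df = customers.merge(subscriptions, on="customer_id")

# Handle class imbalance by weighting classes inversely to their frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(df[["tenure_months", "monthly_fee"]], df["churned"])
print(model.predict(df[["tenure_months", "monthly_fee"]]))
```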
Confidential
Data Analyst/Software Engineer
Responsibilities:
- Predicted whether a borrower would default on a loan using Python.
- Discovered patterns among borrowers likely to default using NumPy and Pandas, and visualized the findings using Seaborn and Matplotlib.
- Merged different data sets into a single data set, reading data from various sources including CSV, text, Excel, and JSON formats.
- Performed pre-processing steps like imputing missing values, removing highly correlated variables and converting categorical variables to dummy variables.
- Selected the best features using Recursive Feature Elimination to mitigate the curse of dimensionality.
- Built models with cross-validation to avoid over-fitting, implementing classification algorithms such as Logistic Regression, Decision Trees, Random Forest, and Support Vector Machines.
- Validated the classifiers using the ROC AUC curve and accuracy; the best model (Decision Trees) achieved 82.7% accuracy.
- Presented the findings to the technical team.
Environment: Python, Pandas, NumPy, Scikit-Learn, Seaborn, PCA, Linear models, Non-linear models, Ensemble models.
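A minimal, self-contained sketch of the feature-selection and validation steps above; synthetic data stands in for the borrower dataset, and the feature and fold counts are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the borrower dataset (real data came from CSV/Excel/JSON).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# Recursive Feature Elimination keeps the strongest predictors,
# mitigating the curse of dimensionality.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
X_selected = X[:, selector.support_]

# 5-fold cross-validation estimates out-of-sample accuracy and flags over-fitting.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_selected, y, cv=5)
print(round(scores.mean(), 3))
```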
Confidential
Responsibilities:
- Involved in all phases of the SDLC (Software Development Life Cycle), from analysis and design through development, testing, implementation, and maintenance, with timely delivery against aggressive deadlines.
- Developed stored procedures and wrote complex SQL queries using multiple table joins and sub-queries, applying optimal performance techniques, and generated reports for business stakeholders and customers.
- Migrated code from testing to production, ensuring accuracy and allowing stakeholders to leverage the information for strategic business decisions.
- Monitored systems post go live. Proactively delivered solutions for continuous improvement.
- Worked as a developer in a Scrum team setting with a strong understanding of each of the Scrum roles.
- As a Scrum team player, liaised with the product manager and product owner to understand client requirements and helped develop software that exceeded client expectations.
- Collaborated with team members to eliminate unwanted dependencies among AutoSys batch jobs.
- Automated reports to end clients using TCL scripts and a job scheduler, saving 72 hours of manual effort per month.
- Improved stored-procedure performance using techniques such as bulk insertion of data from the source into the database.
Environment: Sybase, TCL scripting, SQL, Python scripting, PuTTY, UNIX.