Data Scientist Resume
Auburn Hills, MI
SUMMARY
- Data scientist with 7 years of experience transforming business requirements into actionable data models, predictive models, and informative reporting solutions.
- Expert in the entire data science life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation, and Visualization.
- Experience in Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, MATLAB.
- Hands-on experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, XGBoost, and Deep Learning.
- Experienced in Artificial Neural Networks, CNNs, and RNNs (LSTM, GRU).
- Experience with Natural Language techniques such as Tokenization, Lemmatization, Stemming, Count Vectorization, TF-IDF Vectorization, and Word2Vec (see the sketch below).
- Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib, and Seaborn.
- Experience working with deep learning frameworks TensorFlow, Keras, and PyTorch.
- Experience working in Integrated Development Environments (IDEs) such as PyCharm, Sublime Text, and Eclipse.
- Developed highly scalable classifiers and tools by leveraging machine learning, Apache Spark, and deep learning.
- Proficiency in R (e.g., ggplot2, cluster, dplyr, caret), Python (e.g., pandas, Keras, PyTorch, NumPy, scikit-learn, bokeh, NLTK), Spark MLlib, H2O, and other statistical tools.
- Worked on integration of diverse mathematical and statistical procedures, pattern recognition, and model building.
- Knowledge and experience in agile environments such as Scrum and version control tools such as GitHub/Git.
- Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to extract data from the cloud and merge data from Oracle.
- Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
- Hands-on experience importing and exporting data using relational databases including MySQL and MS SQL Server, and NoSQL databases such as MongoDB.
- Good team player and quick learner; highly self-motivated person with good communication and interpersonal skills.
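A minimal sketch of the TF-IDF vectorization technique mentioned above, using scikit-learn; the two-document corpus is a hypothetical stand-in, not project data.

```python
# Minimal TF-IDF sketch; the corpus below is a hypothetical example.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "The claim was approved after review.",
    "The review of the claim is pending.",
]

# Tokenize, build the vocabulary, and weight terms by TF-IDF.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(tfidf_matrix.toarray())              # document-term weights
```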
TECHNICAL SKILLS
Programming Languages: Python, R, C, C++, MATLAB
ML Algorithms: Linear Regression, Logistic regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, SVM, K-NN, Deep learning (CNN, RNN) and Ensemble methods
Python Packages: Scikit-learn, NumPy, Pandas, Keras, NLTK, Matplotlib, Seaborn, SciPy
Deep Learning Frameworks: TensorFlow, Keras, PyTorch
Big Data Ecosystems: Hadoop, Spark
Database Systems: SQL, MongoDB
Operating System: Linux, Windows, Unix
PROFESSIONAL EXPERIENCE
Confidential - Auburn Hills, MI
Data Scientist
Responsibilities:
- Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development for the buyback program.
- Developed models using Random Forest and XGBoost.
- Used Pandas, NumPy, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Tuned Bayes point, logistic regression, decision tree, and neural network models for good accuracy; deployed prediction models and tested them on held-out test data.
- Visualized, interpreted, and reported findings, and developed strategic uses of data with Python libraries such as NumPy, scikit-learn, Matplotlib, and Seaborn.
- Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and R (caret, trees, arules) to develop a variety of models and algorithms for analytic purposes.
- Performed data cleaning, feature scaling, and feature engineering.
- Treated missing values, capped outliers, and handled anomalies using statistical methods; derived customized key metrics.
- Performed analysis using industry-leading text mining, data mining, and analytical tools, as well as open-source software.
- Applied natural language processing (NLP) methods to data to extract structured information.
- Implemented deep learning algorithms such as Artificial Neural Networks (ANN) and Recurrent Neural Networks (RNN); tuned hyperparameters and improved models with the TensorFlow Python package.
- Evaluated models using cross-validation and ROC curves, and used AUC for feature selection (see the sketch below).
- Created dummy variables for certain datasets to feed into the regression.
- Created data pipelines using big data technologies such as Hadoop and Spark.
Environment: Python, R, Deep Learning, NLP, TensorFlow, Machine Learning, ROC, Hadoop, AUC, SQL, Spark, MongoDB
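A hedged sketch of the evaluation workflow described above: scoring a Random Forest classifier with cross-validated ROC AUC in scikit-learn. The synthetic dataset from make_classification stands in for the actual buyback-program data.

```python
# Cross-validated ROC AUC on a Random Forest; data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data as a stand-in for project data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)

# 5-fold cross-validation scored by area under the ROC curve.
auc_scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Mean AUC: {auc_scores.mean():.3f} (+/- {auc_scores.std():.3f})")
```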
Confidential - Chicago, IL
Data Scientist
Responsibilities:
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization to deliver data science solutions.
- Gathered requirements from the FIU (Financial Intelligence Unit) and regulators.
- Developed advanced analytical models and computational solutions using large-scale data manipulation and transformation, statistical analysis, machine learning, and visualization.
- Generated reports to meet regulatory requirements.
- Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
- Used deep learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras to help customers build DL models.
- Created statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
- Performed data imputation using the scikit-learn package in Python (see the sketch below).
- Used RMSE/MSE to evaluate different models' performance.
- Created charts such as heat maps, bar charts, and line charts.
Environment: Python, NumPy, Pandas, Predictive models, Scikit-learn
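A minimal sketch of the imputation and RMSE evaluation steps noted above, assuming scikit-learn's SimpleImputer and mean_squared_error; the toy arrays are illustrative only.

```python
# Mean imputation of missing values, then RMSE evaluation; toy data only.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error

# Toy feature matrix with missing entries.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

# Replace missing entries with the column mean.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
print(X_imputed)

# RMSE between hypothetical actual and predicted values.
y_true = np.array([3.0, 2.5, 4.0])
y_pred = np.array([2.8, 2.9, 4.2])
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE: {rmse:.3f}")
```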
Confidential
Data Scientist
Responsibilities:
- Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to perform data extraction to fit analytical requirements.
- Experimented with and built predictive models, including ensemble methods such as gradient-boosted trees and neural networks, to predict sales amounts.
- Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
- Performed feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing (see the sketch below).
- Used RMSE/MSE to evaluate different models' performance.
- Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
- Designed database solution for applications, including all required database design components and artifacts.
Environment: Python, Matplotlib, Seaborn, Tableau, NumPy, Pandas, ETL, SQL, Predictive models, Scikit-learn
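An illustrative sketch of the preprocessing steps above: label encoding, feature normalization, and PCA with scikit-learn. The categorical labels and feature matrix are synthetic examples.

```python
# Label encoding, scaling, and PCA with scikit-learn; data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical categorical target and numeric features.
labels = ["low", "high", "medium", "high"]
X = np.array([[10.0, 200.0], [12.0, 180.0], [9.0, 220.0], [11.0, 210.0]])

# Encode string labels as integers.
y = LabelEncoder().fit_transform(labels)

# Normalize to zero mean / unit variance, then reduce to one component.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=1).fit_transform(X_scaled)

print(y)
print(X_reduced)
```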
Confidential
Data Scientist
Responsibilities:
- Responsible for reporting findings, using gathered metrics to infer and draw logical conclusions from past behavior.
- Wrote SQL queries to pull historical data from Policy and Claim Center.
- Collected data needs and requirements by interacting with other departments.
- Created various types of data visualizations using Python.
- Communicated results to the operations team to support sound decision-making.
- Performed data wrangling and analysis in Python for statistical models such as the Retention Model.
- Used Logistic Regression, Random Forest, Decision Tree, and SVM to build predictive models (see the sketch below).
- Created reports and presentations to communicate insights to both technical and non-technical audiences.
Environment: Python, SQL, Retention model, Logistic Regression, Random Forest, Decision Tree
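A minimal sketch of a logistic-regression predictive model along the lines of the Retention Model above, assuming scikit-learn; make_classification generates synthetic stand-in data.

```python
# Train/test split and logistic regression on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary target (e.g., retained vs. churned) with 10 features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier and evaluate on the held-out split.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```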