Data Scientist Resume
Plano, TX
SUMMARY
- 4 years of experience in Python programming, Deep Learning, Machine Learning, Data Visualization, and Data Analytics.
- Expertise in transforming business requirements into models, algorithms, and reports based on massive volumes of unstructured and structured data.
- Proficient in working with NumPy for matrix calculations, linear algebra operations, and numerical processing.
- Experience working with Pandas for manipulating DataFrames, handling missing values, and profiling data (a brief sketch follows this list).
- Working experience with Matplotlib for creating bar charts, correlation graphs, and other visualizations.
- Hands-on experience with Scikit-learn, Keras, and TensorFlow for writing end-to-end machine learning and deep learning applications.
- Hands-on experience using R machine learning packages.
- Expert in working with PySpark for distributed computing using PySpark RDDs.
- Proficient in integrating, profiling, validating, and cleansing data using Python.
- Hands-on experience creating data visualizations and dashboards in Tableau Desktop.
- Adept in writing complex SQL queries for data analysis.
- Experience using AWS S3 for data storage and accessing it programmatically.
- Hands-on experience with AWS Lambda for serverless computing and AWS EC2 for training deep learning models in the cloud.
- Hands-on experience implementing supervised learning algorithms such as Neural Networks, Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees, KNN, and Random Forest.
- Experience implementing Convolutional Neural Networks (CNNs) for computer vision and object detection tasks.
- Experience implementing Recurrent Neural Networks (RNNs) for natural language processing tasks such as text classification and sentiment analysis.
- Worked with the MLflow library for managing the end-to-end machine learning lifecycle.
- Expert in text analysis using the NLTK library.
- Hands-on experience scraping and crawling websites using Python to gather data.
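A minimal sketch of the Pandas missing-value handling and data profiling referenced above; the file name, column names, and outlier threshold are illustrative assumptions, not project specifics.

```python
import numpy as np
import pandas as pd

# Hypothetical input file and columns, for illustration only.
df = pd.read_csv("customers.csv")

# Profile the data: shape, dtypes, and per-column missing counts.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Handle missing values: impute numeric columns with the median and
# drop rows missing the (assumed) key identifier.
numeric_cols = df.select_dtypes(include=np.number).columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df = df.dropna(subset=["customer_id"])

# Flag rows more than 3 standard deviations from a column mean as outliers.
z = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()
print(f"{(z.abs() > 3).any(axis=1).sum()} potential outlier rows")
```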
TECHNICAL SKILLS
Certifications: Neural Networks and Deep Learning (deeplearning.ai); Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (deeplearning.ai); Structuring Machine Learning Projects (deeplearning.ai); Convolutional Neural Networks (deeplearning.ai); Applied AI with Deep Learning (IBM); Fundamentals of Scalable Data Science (IBM); Advanced Machine Learning and Signal Processing (IBM); SQL for Data Science (UC Davis); Hands-on Tableau for Data Science; AWS Cloud Practitioner (Linux Academy).
Languages: Python, C, C++, SQL
Libraries and Frameworks: Keras, NumPy, Pandas, Matplotlib, Scikit-Learn, TensorFlow
Programming Software: Microsoft Visual Studio, Microsoft SSMS, Azure Data Studio, Jupyter Notebooks, PyCharm, Tableau Desktop, Tableau Prep, Azure ML Studio, Databricks, Apache Spark, Git
Machine Learning Algorithms & Models: Convolutional Neural Networks, Recurrent Neural Networks, Linear Regression, Logistic Regression, Anomaly Detection, K-Means Clustering, Decision Trees, Random Forest, Time Series Analysis
Research Papers: Document classification in NLP (Natural Language Processing) using RNNs: performed text preprocessing to clean the text by removing stop words, used NLTK for tokenization and lemmatization, used BOW, word2vec, and TF-IDF for vectorization, and built natural language models using Recurrent Neural Networks (a rough sketch of this pipeline follows).
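As a rough illustration of the pipeline described above (not the paper's actual code), the sketch below preprocesses text with NLTK and trains a small Keras RNN classifier; the corpus, labels, and model sizes are placeholders.

```python
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

for pkg in ("punkt", "stopwords", "wordnet"):
    nltk.download(pkg)

docs = ["The quarterly report shows strong growth.",      # placeholder corpus
        "The match ended in a dramatic penalty shootout."]
labels = np.array([0, 1])                                  # placeholder classes

# Preprocess: tokenize, drop stop words, lemmatize.
stop = set(stopwords.words("english"))
lemm = WordNetLemmatizer()
cleaned = [" ".join(lemm.lemmatize(t.lower())
                    for t in nltk.word_tokenize(d)
                    if t.isalpha() and t.lower() not in stop)
           for d in docs]

# Vectorize: integer sequences padded to a fixed length.
tok = Tokenizer(num_words=10000)
tok.fit_on_texts(cleaned)
X = pad_sequences(tok.texts_to_sequences(cleaned), maxlen=50)

# Small recurrent classifier for document classification.
model = Sequential([
    Embedding(input_dim=10000, output_dim=64),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3)  # toy run on the placeholder data
```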
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Data Scientist
Responsibilities:
- Performed data collection, data cleaning, data clustering, and data visualization using Python.
- Preprocessed data by identifying missing values, outliers, and invalid values.
- Used Tableau Desktop for creating data visualizations.
- Loaded data into fast, in-memory Pandas DataFrames to perform data wrangling and allow efficient use by other libraries.
- Used the Python scientific library stack.
- Created neural network models using Python and TensorFlow in the cloud.
- Performed hyperparameter tuning to improve efficiency and accuracy.
- Implemented regularization techniques to reduce overfitting (see the sketch after this entry).
- Used Amazon S3 buckets for storage and EC2 machine images for training models.
- Used AWS Lambda for serverless computing.
Environment: Python, Tableau Desktop, Tableau Prep, AWS services, Matplotlib, NumPy, Jupyter Notebooks
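A minimal sketch of the regularization techniques mentioned in this role (dropout and L2 weight penalties in Keras); the data, architecture, and parameter values are assumptions for illustration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

# Random placeholder data standing in for the project's dataset.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# L2 penalties and dropout both constrain the network to reduce overfitting.
model = Sequential([
    Dense(64, activation="relu", kernel_regularizer=l2(1e-4), input_shape=(20,)),
    Dropout(0.3),
    Dense(32, activation="relu", kernel_regularizer=l2(1e-4)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A validation split makes overfitting visible while tuning hyperparameters.
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```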
Confidential, Dallas, TX
Consultant, Data Science/ Python Programmer
Responsibilities:
- Captured data from databases and integrated the required data into a single data source.
- Preprocessed text data using the NLTK library for tokenization, stop-word removal, stemming, etc.
- Performed feature extraction and reduction using bigrams and trigrams.
- Created word vectors using BOW, TF-IDF, and word2vec algorithms (a brief vectorization sketch follows this entry).
- Developed sentiment analysis and entity extraction models using RNNs in Keras.
- Iterated multiple times to create an automated end-to-end ML pipeline.
- Deployed the model behind web APIs to serve results.
- Created model summaries to compare the results of various models.
- Built reports and visualizations for easy analytics.
Environment: Python, Jupyter Notebooks, TensorFlow, AWS EC2, R, NLTK.
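A brief sketch of the n-gram vectorization steps above, using scikit-learn's bag-of-words and TF-IDF vectorizers restricted to bigrams and trigrams; the example documents are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the service was great", "the service was terrible"]  # placeholder reviews

# Bag-of-words counts over bigrams and trigrams only.
bow = CountVectorizer(ngram_range=(2, 3))
X_bow = bow.fit_transform(docs)
print(bow.get_feature_names_out())

# TF-IDF weighting over the same n-gram features.
tfidf = TfidfVectorizer(ngram_range=(2, 3))
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.shape)
```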
Confidential
Consultant, Data Science/Data Engineering
Responsibilities:
- Worked on approximately 20 TB of click data.
- Captured data from AWS S3 buckets using Python and PySpark.
- Used Spark SQL to query the data.
- Performed data preprocessing by removing missing data or imputing required values in its place.
- Created visualizations and dashboards for stakeholders in Tableau Desktop.
- Performed feature engineering by dropping irrelevant variables for better model generalization.
- Engineered additional features from the original dataset.
- Implemented random forest and logistic regression models using Spark's machine learning library (MLlib).
- Used mini-batch gradient descent to optimize the algorithms.
- Created general as well as campaign-specific models.
- Achieved an F-score of 0.80 on the final logistic regression model.
- Developed models in an Anaconda environment using Jupyter Notebooks.
- Applied lasso and ridge regularization using Spark's regularization parameters (see the sketch after this entry).
Environment: Python, PySpark, SparkSQL, T-SQL, AWS S3, Jupyter Notebooks, Tableau Desktop.
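A hedged sketch of the Spark MLlib logistic regression with lasso/ridge regularization described in this role; the S3 path, column names, and parameter values are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("click-model").getOrCreate()

# Hypothetical S3 path and feature columns standing in for the real click data.
df = spark.read.parquet("s3a://bucket/clicks/")
assembler = VectorAssembler(
    inputCols=["impressions", "dwell_time", "position"], outputCol="features")
data = assembler.transform(df).select("features", "label")
train, test = data.randomSplit([0.8, 0.2], seed=42)

# elasticNetParam blends ridge (0.0 = pure L2) and lasso (1.0 = pure L1).
lr = LogisticRegression(regParam=0.01, elasticNetParam=0.5)
model = lr.fit(train)

# Evaluate with an F-score, as reported above.
preds = model.transform(test)
f1 = MulticlassClassificationEvaluator(metricName="f1").evaluate(preds)
print(f"F-score: {f1:.2f}")
```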
Confidential
Python Programmer
Responsibilities:
- Cleaned data using Python's regular expressions (re) module.
- Created Spark RDDs to distribute computation.
- Performed analytics on logs to capture frequent hosts, top endpoints, unique daily hosts, etc. (see the sketch after this entry).
- Created visualizations for data analysis.
- Used the PyCharm IDE to create an end-to-end analytics pipeline.
- Debugged Python applications.
Environment: Python, PyCharm, PySpark
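A minimal sketch of the regex-based log cleaning and RDD analytics described in this role (frequent hosts and top endpoints); the log path and pattern are illustrative assumptions.

```python
import re
from pyspark import SparkContext

sc = SparkContext(appName="log-analytics")

# Assumed Apache-style access log format; the path is a placeholder.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+) \S+"')

def parse(line):
    m = LOG_PATTERN.match(line)
    return (m.group(1), m.group(2)) if m else None  # (host, endpoint)

logs = (sc.textFile("access.log")
          .map(parse)
          .filter(lambda x: x is not None)
          .cache())

# Most frequent hosts.
frequent_hosts = (logs.map(lambda x: (x[0], 1))
                      .reduceByKey(lambda a, b: a + b)
                      .takeOrdered(10, key=lambda kv: -kv[1]))

# Most requested endpoints.
top_endpoints = (logs.map(lambda x: (x[1], 1))
                     .reduceByKey(lambda a, b: a + b)
                     .takeOrdered(10, key=lambda kv: -kv[1]))

print(frequent_hosts)
print(top_endpoints)
```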