Data Scientist Resume
Chicago, IL
SUMMARY
- 6 years’ experience in Data Science.
- Demonstrated skill in statistical analysis, data analytics, data modeling, and creation of custom algorithms.
- Adept with machine learning and neural networks using a variety of systems and methods.
- Experienced with NLP and Computer Vision technologies.
- Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
- Develop, deploy, and maintain production NLP models for scalability.
- Build automated customer-response applications using NLP libraries such as NLTK, spaCy, and other modules.
- Expert in Robotic Process Automation (RPA) for implementing machine learning models in NLP and computer vision.
- Applies advanced statistical and predictive modeling techniques to build, maintain, and improve multiple real-time decision systems; works closely with product managers, service development managers, and product development teams to productize the algorithms developed.
- Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
- Experience in designing visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
- Hands-on experience working with Support Vector Machines (SVM), K Nearest Neighbors (KNN), Random Forests, Decision Trees, Bagging, Boosting (AdaBoost, Gradient Boosting, XGBoost), and Neural Networks (FNN, CNN, RNN, LSTM); a brief model-comparison sketch appears at the end of this summary.
- Experience with Public Cloud (Google Cloud, Amazon AWS, and/or Microsoft Azure).
- Experience with knowledge databases and language ontologies.
- Experience with data analysis methods such as data reporting, ad-hoc reporting, graphs, scales, pivot tables, and OLAP reporting with Microsoft Excel, R Markdown, R Shiny, Python Markdown, and RStudio.
- Discovers patterns in data using algorithms and SQL queries, then validates findings in Python (TensorFlow) using an experimental, iterative approach.
- Thinks creatively and proposes innovative ways to look at problems by applying data mining approaches to the available information.
- Identifies or creates the appropriate algorithm to discover patterns and validates findings using an experimental, iterative approach.
- Experience in working with relational databases (Teradata, Oracle) with advanced SQL programming skills.
- In-depth knowledge of statistical procedures applied to supervised and unsupervised problems.
- Basic-to-intermediate proficiency in SAS (Base SAS, Enterprise Guide, Enterprise Miner) and UNIX.
- Track record of applying machine learning techniques to marketing and merchandising ideas.
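A minimal sketch of the kind of classifier comparison referenced above, using scikit-learn; the synthetic dataset and hyperparameters are illustrative assumptions, not taken from any engagement below.

```python
# Compare a few of the classifiers listed above (SVM, KNN, Random Forest,
# AdaBoost, Gradient Boosting) on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {accuracy_score(y_test, model.predict(X_test)):.3f}")
```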
TECHNICAL SKILLS
Data Science Specialties: Natural Language Processing, Machine Learning, Predictive Maintenance, Stochastic Analytics, Internet of Things (IoT) analytics, Social Analytics
Analytic Skills: Bayesian Analysis, Inference, Models, Segmentation, Clustering, Naïve Bayes Classification, Sentiment Analysis, Predictive Analytics, Regression Analysis, Linear models, Multivariate analysis, Stochastic Gradient Descent, Sampling methods, Forecasting
Data Query: Azure, Google BigQuery, Amazon Redshift, Kinesis, EMR, HDFS, MongoDB, HBase, Cassandra, and various other SQL and NoSQL databases, data warehouses, and data lakes
Languages: Python, R, C/C++, SQL, Java, shell/command line
Version Control: GitHub, Git, SVN
IDEs: Jupyter Notebook, PyCharm, IntelliJ, Spyder, Eclipse
RPA: Robotic Process Automation implementations across several industries
Deep Learning: Multi-Layer Perceptron, Machine Learning algorithms, Neural Networks, TensorFlow, Keras, PyTorch, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), LSTM
Python Packages: NumPy, Pandas, Scikit-learn, TensorFlow, SciPy, Matplotlib, Seaborn, Plotly, NLTK, Scrapy, Gensim
Analytic Tools: Classification and Regression Trees (CART), H2O, Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Linear and Logistic Regression
Analytic Languages and Scripts: Python, R, HiveQL, Spark, Spark MLlib, Spark SQL, Hadoop, Scala, Impala, MapReduce
Soft Skills: Deliver presentations and technical reports. Collaborate with stakeholders and cross-functional teams. Advise about how to leverage analytical insights. Develop analytical reports that directly address strategic goals.
Cloud Computing: AWS, GCP, Azure
PROFESSIONAL EXPERIENCE
Confidential, Chicago, IL
Data Scientist
Responsibilities:
- Performed data profiling on available data sources to identify potentially useful data sources for the proposed machine learning use cases.
- Consulted with the Compliance Department to determine relevant use cases.
- Applied models including Convolutional and Recurrent Neural Networks, LSTMs, and Transformers.
- Worked with NLP models such as BERT, GPT, and ELMo.
- Applied Python packages NumPy, Matplotlib, Plotly, Pandas, SciPy, and FeatureTools for data analytics, cleaning, and feature engineering.
- Utilized Docker, AWS, Python, NoSQL, and Kubernetes.
- Used Hadoop HBase on Spark using PySpark modules for retrieving data from a NoSQL database.
- Utilized NLTK and Gensim for NLP processes such as Tokenization and for creating custom Word Embeddings.
- Utilized TensorFlow for building Neural Network models.
- Used Bagging and Boosting methods (XGBoost, Random Forest, etc.).
- Utilized Docker to contain the model for use in applications.
- Deployed operational models to a RESTful API using the Python Flask package and Docker containers (see the deployment sketch after this list).
- Created customized applications to make critical predictions, automate reasoning and decisions, and run optimization algorithms.
- Developed advanced easy-to-understand visualizations to map and simplify the analysis of heavily numeric data and reports.
- Designed, implemented, and evaluated new models and rapid software prototypes to solve problems in machine learning and systems engineering.
- Analyzed work to generate logic for new systems, procedures, and tests.
- Implemented and evaluated artificial intelligence and machine learning algorithms and neural networks for diverse industries.
- Improved performance of models with fine-tuning and data cleaning.
- Applied new technologies to improve model performance and reduce the time to build machine learning models.
- Practiced Agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.
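A minimal sketch of serving a trained model behind a Flask REST endpoint, as in the deployment bullet above. The model file name, route, and request schema are illustrative assumptions, not the actual production service.

```python
# Load a pickled model and expose it over a /predict endpoint.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:      # hypothetical pre-trained model artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)          # expects {"features": [[...], ...]}
    preds = model.predict(payload["features"])
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

In a Docker-based setup, a script like this is typically the container entry point, with the model artifact baked into the image or mounted at runtime.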
Confidential, Indianapolis, IN
Data Scientist
Responsibilities:
- Extracted text from documents using OCR.
- Applied cosine similarity and BERT to find relevant sections of text in documents.
- Applied OCR to extract handwritten signatures and dates.
- Generated Regex patterns to collect text from relevant sections.
- Utilized OpenCV to find page numbers and text coordinates.
- Stored data on a local Hadoop cluster.
- Led weekly presentations to business stakeholders to refine the output.
- Used Jira for sprint planning and cards.
- Used Bitbucket and Git for code management.
- Built deep learning neural network models from scratch using GPU-accelerated libraries like PyTorch.
- Employed PyTorch, Scikit-Learn, and XGBoost libraries to build and evaluate the performance of different models.
- Used time series analysis with ARIMA, SARIMAX, CNN, LSTM, and RNN models.
- Utilized Amazon Textract machine learning (ML) service to automatically extract text, handwriting, and data from scanned documents.
- Used containers and Kubernetes for model deployment.
- Used Pandas for data manipulation.
- Troubleshot machine learning models in Python (TensorFlow) using PyTest to keep the pipeline moving.
- Applied fuzzy search algorithms to help locate records relevant to searches.
- Designed deep learning models in TensorFlow and ran them on Amazon Web Services (AWS) EC2.
- Used gradient-boosted trees and random forests to create a benchmark for potential accuracy.
- Utilized K-Means, Gaussian Mixture Models, DBSCAN, and Hierarchical Agglomerative clustering algorithms to discover patient groups.
- Filled missing data using k-Nearest Neighbors (kNN) imputation in Python with scikit-learn (see the imputation sketch after this list).
- Documented solutions and presented the results to stakeholders.
- Utilized Python, Seaborn, and Matplotlib to produce data visualizations highlighting a variety of metrics.
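A minimal sketch of the kNN imputation mentioned above, using scikit-learn's KNNImputer; the column names, values, and choice of k are assumptions for illustration only.

```python
# Fill missing values from the k nearest rows (by Euclidean distance).
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "age":      [34, 51, np.nan, 29, 62],
    "bmi":      [22.1, np.nan, 27.5, 24.3, 31.0],
    "systolic": [118, 135, 142, np.nan, 150],
})

imputer = KNNImputer(n_neighbors=2)     # each gap is filled from the 2 nearest rows
df_filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(df_filled)
```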
Confidential, New York, NY
Data Scientist
Responsibilities:
- Classified thousands of articles and tweets to build a complete dataset for the model.
- Constructed an NLP-based filter utilizing embedding and LSTM layers in TensorFlow and Keras (a minimal model sketch follows this list).
- Produced classifications of whether a given text was news or fit into other categories of potential interest, such as spam.
- Cleaned text to standardize input to the model and ensure consistent results.
- Built functions to automatically remove symbols, hyperlinks, and emojis and to spell-check the received text.
- Built exception handling to treat potential edge cases of incorrect or unusable data being fed to the model in production.
- Ran sentiment analysis of text and determined whether the text was overall positive, negative, or neutral.
- Deployed solutions to a Flask app on a cloud-based service (AWS) to which future user applications are connected via an API.
- Tested and compared this solution to Amazon Comprehend, which achieved a slightly higher accuracy of 94.7%.
- Performed stemming and lemmatization of text to remove superfluous components and make the resulting corpus as small as possible while containing all important information.
- Produced a bag-of-words representation built from scratch using NLTK and TensorFlow packages for text processing and tokenization.
- The finalized model was then handed over to Android, iOS, and web developers to create a user front end.
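A minimal sketch of an embedding + LSTM text classifier in Keras, along the lines of the filter described above; the vocabulary size, sequence length, and binary "news vs. other" framing are assumptions for illustration.

```python
# Embedding + LSTM binary text classifier skeleton: tokenized, padded
# integer sequences of length MAX_LEN go in; P(text is "news") comes out.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 100        # assumed padded sequence length

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=128),
    layers.LSTM(64),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```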
Confidential, Bloomfield Hills, MI
Data Scientist
Responsibilities:
- Used Python and sklearn to interpret Random Forest statistical results to facilitate enhanced decision-making by stakeholders.
- Modeled sale price data with Neural Networks using Keras API for TensorFlow, Random Forests, and XGBoost (Gradient Boosted Decision Trees).
- Used ARIMA and SARIMA models for time series forecasting.
- Applied Python to perform survival analysis on time until sale date using Cox Proportional Hazards (see the survival-analysis sketch after this list).
- Conducted statistical significance testing on factors in Cox Proportional Hazards.
- Scraped and mined data on replacement part prices to assess the cost of damages.
- Produced SQL tables/databases to store part-pricing data.
- Classified degree of damage in used car damage reports using sentiment analysis and Natural Language Processing (NLP).
- Wrote SQL queries to merge data from multiple tables to obtain relationships between used car model data, damage data, sale price data, and time until sale data.
- Utilized a Git versioning environment.
- Provided actionable insights about which used cars to keep on the lot in order to maximize revenue from lot space by determining which cars provided the largest return per unit time on the lot.
- Used Tableau to create visualizations of pricing and survival analysis.
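A minimal sketch of the Cox Proportional Hazards analysis described above, using the lifelines package (an assumed library choice; the resume only specifies Python) on synthetic time-until-sale data.

```python
# Fit a Cox PH model on synthetic "days until sale" data with censoring.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "days_on_lot":  [12, 45, 7, 90, 30, 60, 21, 75],   # observed time until sale (or censoring)
    "sold":         [1, 1, 0, 0, 1, 1, 1, 0],          # 1 = sold, 0 = still on the lot (censored)
    "list_price":   [14.5, 22.0, 9.9, 35.0, 18.2, 27.5, 12.0, 30.1],   # in $1,000s
    "damage_score": [0.1, 0.4, 0.0, 0.8, 0.2, 0.6, 0.3, 0.5],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days_on_lot", event_col="sold")
cph.print_summary()   # hazard ratios and p-values for each covariate
```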