Resume
Principal Data Scientist Bothell, WA
SUMMARY:
- Data Scientist with seven years of progressive experience in the field; started as an ML intern in 2011 and advanced to Principal and Lead Data Scientist roles over the last two years.
- Practitioner of data mining methodologies such as the Cross Industry Standard Process for Data Mining (CRISP-DM) and Knowledge Discovery in Databases (KDD).
- Experienced in presenting complex analyses accessibly to executive-level stakeholders and colleagues alike to gain their support for data-driven initiatives and strategies.
- Expertise in visualizing Big Data with Python (matplotlib) and Tableau.
- Well versed in algorithm design techniques; able to document ML project requirements and assess deliverable timelines; proficient in manipulating and analyzing complex, high-volume, high-dimensionality data from varied sources.
- Extensive experience in data cleaning and data transformation activities using Python and R
- Proficient in MS Office applications (Word, Excel, PowerPoint, Access, Project)
- Have guided and mentored team members in the implementation and execution of machine learning models at different levels of the project life cycle.
TECHNICAL SKILLS:
PROGRAMMING LANGUAGES: Python, Java, C, C++, R
MACHINE LEARNING: Machine learning techniques such as Data Preprocessing (Data Cleaning), Regression models, Classification, Clustering, Association Rule Learning (Apriori and Eclat), Reinforcement Learning (UCB and Thompson Sampling), Natural Language Processing (NLTK, spaCy), Text Mining, Data Extraction, Predictive Modeling, Statistical Modeling, Dimensionality Reduction (PCA and SVD), and Recommender Systems (Collaborative Filtering).
DEEP LEARNING: NLP algorithms coupled with Deep Learning (ANN and CNN), Time Series Analysis, Speech and Text Analysis (RNN, LSTM), SOMs, Recommender Systems (RBM, AutoEncoders); libraries such as Keras, TensorFlow, and PyTorch.
DATABASE: MySQL, Apache Spark, NoSQL (MongoDB and DynamoDB)
IDE: Anaconda - Spyder, IPython - Jupyter, Amazon SageMaker, Azure ML Studio
WORK EXPERIENCE:
Principal Data Scientist
Confidential, Bothell, WA
Responsibilities:
- Develop methods to create Multi-Touch Attribution and Propensity models from clickstream data; apply Predictive Analytics to estimate the likelihood of customer purchase behavior (see the attribution sketch below).
- Work with stakeholders throughout the organization to identify opportunities for leveraging clickstream data.
- Mine and analyze data from multiple sources, including databases and real-time streams.
- Assess the effectiveness and accuracy of data sources and data gathering techniques.
- Coordinate with different functional teams to implement models and monitor outcomes.
Environment: Adobe Analytics, clickstream data, Python, Elliptic Envelope, Isolation Forest, One-Class SVM, ANN Classifier, XGBoost Classifier, Markov Chain, Shapley Value, TensorFlow, Keras, Anaconda
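Illustrative sketch of the multi-touch attribution work above, using the removal-effect heuristic associated with Markov-chain attribution; the channels, paths, and conversion outcomes are invented for illustration, not taken from the actual engagement:

    # Removal-effect attribution: how much does conversion drop when a channel
    # is taken out of every path it appears in? (Simplified; a full Markov model
    # would build a transition matrix over channel states.)
    paths = [
        (["email", "search", "display"], 1),   # (touch sequence, converted?)
        (["search", "display"], 1),
        (["email", "display"], 0),
        (["display"], 0),
    ]

    def conversion_rate(paths, removed=None):
        # Count a path's conversion only if the removed channel is absent.
        converted = sum(c for touches, c in paths
                        if removed is None or removed not in touches)
        return converted / len(paths)

    channels = {c for touches, _ in paths for c in touches}
    base = conversion_rate(paths)
    effects = {c: 1 - conversion_rate(paths, removed=c) / base for c in channels}
    total = sum(effects.values())
    credit = {c: round(e / total, 3) for c, e in effects.items()}
    print(credit)   # fractional conversion credit per channel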
Lead Data Scientist
Confidential, Tampa, FL
Responsibilities:
- Built a data science platform on top of DTCC’s data lake to explore the data and derive the various use cases to be performed on it.
- Set up the MLiy platform with all data-science-related Python and R libraries on AWS EC2.
- Configured the composition of AMIs to execute data science use cases.
- Implemented ML algorithms using Amazon SageMaker (see the training-script sketch below).
Environment: Python, R, MLiy (Jupyter & H2O), TensorFlow, NLTK, spaCy, Git, AWS, EC2, RHEL7, SageMaker, JIRA.
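A minimal sketch of a SageMaker "script mode" training entry point of the kind the platform work above enables; the dataset layout, column names, and model choice are assumptions, not details of the actual use cases:

    # train.py: SageMaker injects SM_MODEL_DIR and SM_CHANNEL_TRAIN into the
    # training container; locally the script falls back to the current directory.
    import argparse, os
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--n-estimators", type=int, default=100)
        parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "."))
        parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
        args = parser.parse_args()

        df = pd.read_csv(os.path.join(args.train, "train.csv"))  # hypothetical file
        X, y = df.drop(columns=["label"]), df["label"]
        model = RandomForestClassifier(n_estimators=args.n_estimators).fit(X, y)
        joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))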
Lead Machine Learning/ AI Scientist
Confidential, Tampa, FL
Responsibilities:
- Preprocessed the documents to be redacted; developed and configured custom entities using spaCy (see the redaction sketch below).
- Trained the model and performed redaction using displaCy, spaCy’s visualization module.
- Cleaned and preprocessed the HSPS requests dataset; designed the classification model that makes decisions on incoming requests.
- Achieved the desired efficiency rate by applying a Random Forest model.
- Constructed an Artificial Neural Network (ANN) classifier to make the final decision.
- Built a text classification algorithm using Natural Language Processing.
- Built an enterprise-level chatbot; configured and trained its ML model.
Environment: Python, numpy, pandas, matplotlib, scikit-learn, Spyder, CART, Random Forest, NLP, spaCy, NER, Deep Learning (CNN), Keras, TensorFlow, IBM Watson, Azure ML Studio, Node.js, and VSTS.
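A minimal sketch of NER-based redaction with spaCy in the spirit of the work above; the pretrained pipeline name and the label set are assumptions rather than details of the actual project:

    import spacy

    nlp = spacy.load("en_core_web_sm")          # pretrained English pipeline
    REDACT_LABELS = {"PERSON", "ORG", "GPE"}    # entity types to mask

    def redact(text):
        # Replace sensitive named entities with a placeholder, left to right.
        doc = nlp(text)
        out, last = [], 0
        for ent in doc.ents:
            if ent.label_ in REDACT_LABELS:
                out.append(text[last:ent.start_char])
                out.append("[REDACTED]")
                last = ent.end_char
        out.append(text[last:])
        return "".join(out)

    print(redact("John Smith filed a request with Acme Corp in Tampa."))
    # spacy.displacy.render(doc, style="ent") visualizes the detected entities.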
Sr. Data Scientist
Confidential, Tampa, FL
Responsibilities:
- Mined and analyzed large-scale data using advanced statistical and machine learning models.
- Developed a Random Forest Regression model to predict the upcoming month’s claims and an XGBoost model to flag possible policy cancellations.
- Achieved 86.2% efficiency on the churn problem using XGBoost (see the churn sketch below).
- Designed a Service Request Analysis model using Natural Language Processing (NLP).
- Worked with the Data Visualization team, which used Tableau; performed data visualization using matplotlib.pyplot and Seaborn.
- Discovered patterns, formulated and tested hypotheses, and translated results into strategies that drove growth, increasing revenue and customer satisfaction.
- Performed image classification by building neural networks using TensorFlow and PyTorch.
- Interpreted complex simulation data using statistical methods.
Environment: Python 3, numpy, pandas, matplotlib, scikit-learn, Spyder, Jupyter, Apache Spark, MLlib, CART, Random Forest, XGBoost, NLP, NLTK, Bag of Words, Deep Learning (ANN and CNN), Keras, TensorFlow, PyTorch, PySpark, Scala, Jira, MongoDB, MySQL.
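An illustrative churn-classification sketch using XGBoost’s scikit-learn interface; the data file, feature columns, and hyperparameters are placeholders, not the actual insurance dataset or tuned settings:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from xgboost import XGBClassifier

    df = pd.read_csv("policies.csv")                     # hypothetical extract
    X, y = df.drop(columns=["churned"]), df["churned"]   # 1 = policy cancelled

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)
    print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))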
Software Engineer
Confidential
Responsibilities:
- Designed and developed machine learning models in Apache Spark (MLlib).
- Analyzed large datasets and trained various machine learning algorithms on them to provide strategic direction to the company.
- Performed data cleaning on large datasets using data preprocessing methods, reducing processing errors by 20% and conserving 2 TB of storage.
- Developed classification models to determine supply-chain efficiency.
- Developed a search algorithm for patients’ medical records (see the TF-IDF sketch below).
- Designed a Health Care Chart using the Apriori association rule learning algorithm, reducing the previous model’s error by 10%.
- Performed data migration from APIs to AWS DynamoDB.
- Applied machine learning models to terabytes of data.
Environment: Python 3.x, Anaconda - Spyder, TF-IDF, Association Rule Learning, Apriori, PySpark, numpy, pandas, scikit-learn, CART, MySQL, DynamoDB
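A sketch of TF-IDF retrieval of the kind the medical-records search above suggests; the sample records and query are invented for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    records = [
        "patient reports chest pain and shortness of breath",
        "follow-up visit for type 2 diabetes management",
        "fracture of the left wrist after a fall",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(records)      # one TF-IDF row per record

    def search(query, top_k=2):
        # Rank records by cosine similarity to the query's TF-IDF vector.
        scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
        ranked = sorted(enumerate(scores), key=lambda p: p[1], reverse=True)
        return [(records[i], round(s, 3)) for i, s in ranked[:top_k]]

    print(search("chest pain"))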
Machine Learning Engineer (Intern)
Confidential
Responsibilities:
- Collected and analyzed large amounts of data from all states of India.
- Performed Data Preprocessing and Data Cleaning and organized the collected data.
- Classified and organized the data by gender, region (urban and rural), financial reports, religion, and caste.
- Applied machine learning (clustering) and statistical models to the collected data (see the clustering sketch below).
- Created data classes and DB tables for integrating with external systems using MySQL.
- Worked in an Agile development methodology.
Environment: Python 2.7, Java, Eclipse, Anaconda - Spyder, MySQL, Agile
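A minimal clustering sketch in the spirit of the demographic analysis above; the per-region features and cluster count are illustrative assumptions:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical per-region features: [urban_share, literacy_rate, avg_income]
    X = np.array([
        [0.8, 0.92, 55000],
        [0.3, 0.61, 18000],
        [0.6, 0.78, 32000],
        [0.2, 0.55, 15000],
    ])

    X_scaled = StandardScaler().fit_transform(X)    # put features on one scale
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
    print(labels)                                   # cluster id per region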
Machine Learning Engineer (Intern)
Confidential
Responsibilities:
- Implemented Document Management Solutions.
- Worked on end-to-end creation of the application.
- Worked with huge datasets; performed data cleaning and preprocessing (see the cleaning sketch below).
- Developed Use Case, Class, and Sequence diagrams to express the detailed design.
- Handled integration, installation, and deployment.
- Created data classes and DB tables for integrating with external systems using MySQL.
- Involved in integration, connecting to systems internal and external to the organization.
Environment: Python 2.7, Java, Eclipse, Anaconda - Spyder, MySQL, Agile
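An illustrative data-cleaning sketch of the kind described above; the file and column names are placeholders, not from the actual document management system:

    import pandas as pd

    df = pd.read_csv("documents.csv")               # hypothetical raw export

    # Drop exact duplicates and rows missing the required identifier.
    df = df.drop_duplicates().dropna(subset=["doc_id"])

    # Normalize free-text fields and fill remaining gaps with a sentinel.
    df["title"] = df["title"].str.strip().str.lower()
    df["category"] = df["category"].fillna("unknown")

    df.to_csv("documents_clean.csv", index=False)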