- A proactive, fast-learning individual seeking an opportunity to work as a data scientist, applying analytical and statistical skills and relevant expertise to help the company achieve its business goals while adhering to its vision, mission and values.
- Proficient in algorithms and design techniques.
- Ability to document ML project requirements and assess deliverable timelines.
- Visualization of big data using Python (matplotlib) and Tableau.
- Proficiency in manipulating and analyzing complex, high-volume, high-dimensionality data from varying sources.
- Experienced in data cleaning and data transformation using Python and R.
- Practitioner of Data Mining methodologies like Cross Industry Standard Process for Data Mining (CRISP-DM) and Knowledge Discovery in Databases (KDD)
- Practitioner of the Software Development Life Cycle (SDLC) process using different approaches like Waterfall, Agile, Scrum and Test Driven Development (TDD).
- Proficient in MS Office applications (Word, Excel, PowerPoint, Access, Project)
- Experienced in presenting in an accessible way to executive-level stakeholders and colleagues alike to gain their support for data-driven initiatives and strategies.
- Experienced in guiding team members in the implementation and execution of machine learning models at different stages of the project life cycle.
- Highly motivated self-starter with effective communication and organizational skills.
- Experienced in working with both technical and non-technical team members.
- Tutored high school and undergraduate students for 6 years. Good team player who can also lead a group.
PROGRAMMING LANGUAGES: Python, Java, C, C++, R, Scala
MACHINE LEARNING: Machine Learning Techniques such as Data Preprocessing (Data Cleaning), Regression models, Classification, Clustering, Association Rule Learning (Apriori and Eclat), Reinforcement Learning (UCB and Thompson Sampling), Text Mining, Data Extraction, Predictive Modeling, Statistical Modeling, Dimensionality Reduction (PCA and SVD) and Recommender Systems (Collaborative Filtering).
DEEP LEARNING: NLP algorithms coupled with Deep Learning (ANN and CNN), Time Series Analysis, Speech and Text Analysis (RNN, LSTM), SOMs, Recommender Systems (RBM, Autoencoders), libraries such as Keras, TensorFlow and PyTorch
DATABASE: MySQL, Apache Spark, NoSQL (MongoDB and DynamoDB)
IDE: Anaconda - Spyder, IPython - Jupyter, Eclipse
Confidential, Tampa, FL
Sr. Data Scientist
- Performed data cleaning on large quantities of data. Monitored the data using MySQL and MongoDB.
- Mentored large-scale data and analytics work using advanced statistical and machine learning models.
- Developed a predictive model using Random Forest Regression to forecast the upcoming month’s claims, and predicted possible policy cancellations using XGBoost.
- Achieved 86.2% efficiency on the churn problem using XGBoost.
- Designed a sentiment analysis model using NLP on internet data.
- Worked with the data visualization team, which used Tableau; performed data visualization using matplotlib.pyplot.
- Discovered patterns, formulated and tested hypotheses, and translated results into strategies that drive growth, resulting in increased revenue and customer satisfaction.
- Performed image classification using TensorFlow and PyTorch.
- Interpreted complex simulation data using statistical methods.
Environment: Python, numpy, pandas, matplotlib, scikit-learn, Spyder, Jupyter, Apache Spark, MLlib, CART, Random Forest, XGBoost, NLP, NLTK, Bag of Words, Data Preprocessing, Deep Learning (ANN and CNN), Keras, TensorFlow, PyTorch, PySpark, Scala, Jira, MongoDB, MySQL.
- Worked with data science related libraries in Python (numpy, scipy, scikit-learn, etc.)
- Analyzed and trained large datasets using various machine learning algorithms to provide strategic direction to the company
- Performed data cleaning on large datasets using data preprocessing methods, which reduced processing errors by 20% and saved 2 TB of storage space.
- Developed algorithms using classification models to determine supply chain efficiency.
- Developed a search algorithm for patients’ medical records.
- Developed a program based on time series analysis to analyze and predict the stock price of the company.
- Designed a Health Care Chart using the Apriori algorithm from Association Rule Learning (ARL) and reduced the previous model’s error by 10%.
- Performed data migration from APIs to AWS using DynamoDB.
- Applied machine learning models to terabytes of data.
Environment: Python, Anaconda – Spyder, Apache Spark, Eclipse, TF-IDF, Association Rule Learning, Apriori, PySpark, numpy, pandas, scikit-learn, Regression, CART, MySQL, DynamoDB
- Worked on end-to-end creation of the application.
- Worked with huge datasets. Performed data cleaning and preprocessing.
- Developed Use Case diagrams, Class diagrams and Sequence diagrams to express the detailed design.
- Identified reusable components and implemented them accordingly.
- Participated in Agile development, including design, systems development, testing, systems integration, installation and deployment.
- Created data classes and database tables for integrating with external systems using MySQL.
- Involved in integration: connecting to systems both internal and external to the organization.
Environment: Python, Java, Eclipse, Anaconda – Spyder, MySQL, Agile/Scrum