- Data Scientist/Machine Learning Engineer with 8 years of progressive experience and emphasis on Data Analytics, Text Mining, Machine Learning, Statistical Modeling, Predictive Modeling and Natural Language Processing (NLP).
- Experience in Data Mining with large datasets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive modeling and Data Visualization.
- Hands-on experience with Python libraries such as NumPy, Pandas, Matplotlib, Seaborn, NLTK, scikit-learn and SciPy.
- Knowledge in Text mining, Topic Modelling, Sentiment Analysis, Recommendation systems, Named-entity recognition and Hidden Markov models.
- Experience with Agile methodologies, Scrum stories and sprints in a Python-based environment, along with data analytics and data wrangling.
- Hands-on implementation of machine learning models in the cloud using Microsoft Azure.
- Knowledge and understanding of SQL Server, Teradata, Hadoop/Hive.
- Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Association Rule Mining.
- Performed advanced procedures such as text analytics and processing with the in-memory computing capabilities of Spark in Scala.
- Extensive experience in Text Analytics, developing Statistical Machine Learning, Deep Learning and Data Mining solutions to various business problems, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
- Regression Analysis (Linear Regression, Lasso Regression, Ridge Regression & Elastic net Regression)
- Natural Language Understanding (Sentiment Analysis, Custom Analyzers, Entity Analysis, Word embedding)
- Natural Language Processing-NLP (LSA, LDA, TF-IDF, Markov Models, Tokenizers, Analyzers, POS tagging)
- Unsupervised techniques (PCA, LDA, LSA using SVD, K-Means, K-Modes, Hierarchical Clustering)
- Dimensionality reduction techniques (PCA, SVD, MFA, MCA)
- Non-Parametric Fast Learning ML Algorithms (Decision Trees, Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost), SVM)
- Deep Learning Models (Neural Network: ANN, CNN, RNN), Deep Learning Frameworks (TensorFlow, Keras, H2O)
- Data Visualization (Tableau, MicroStrategy)
Statistical Methods: Classification models, Regression models, Time Series, Market-Basket Analysis, Dimensionality Reduction, Bootstrapping, Recommender Systems, Neural Networks using TensorFlow, Keras
Programming Languages: Python, R
ETL/BI Tools: Tableau, Excel
Natural Language Processing: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Markov Models, POS tagging
Databases: MySQL, Oracle, Hadoop HBase, MongoDB, AWS S3
IDE: Spyder, RStudio, Jupyter, JupyterLab, Anaconda
Other: Git, Windows, UNIX, Linux, JIRA, Statistics, Big Data, HTML, Hadoop, Radar, Microsoft Azure, Spark, Google Cloud, Amazon AWS
Confidential, Lowell, AR
Senior Data Scientist
- Created and implemented a research proposal for the analysis of customer reviews using NLP techniques.
- Applied machine learning algorithms to text extraction, summarization, classification and categorization, including email classification.
- Used Generative Adversarial Networks (GANs) with a pair of ANNs to identify fake feedback and reviews.
- Developed statistical and NLP algorithms using machine learning to analyze data for better parsing.
- Used TensorFlow for language detection, text summarization and sentiment analysis.
- Performed exploratory data analysis of lexical resources using descriptive statistics.
- Used Python to train, test, evaluate and finally integrate and deploy the various machine learning models.
- Used R & Tableau to understand the data by fitting various plots and performing statistical analysis.
- Used TensorFlow for its automatic differentiation capabilities, which benefit gradient-based machine learning algorithms.
- Performed semantic analysis to categorize annotations collected from different websites where job hunts were advertised.
- Worked on the full text classification cycle: prepared and pre-processed data, extracted features, and applied machine learning classification models to training and test data.
- Performed text classification to determine whether given feedback is positive or negative.
- Applied text vectorization and word embedding techniques such as bag-of-words (count vectorizer), TF-IDF and Word2vec to identify which words are relevant, frequently used and most important, and to classify them.
- Performed tokenization, case conversion, word replacement, lemmatization and stemming during the data preprocessing stage using the NLTK package in Python.
- Used part-of-speech tagging to recognize the similarities and differences between words.
- Used ASCII and N-gram filters (especially 2-grams) to determine which words occur frequently.
- Used Named-entity recognition (NER) to further filter the reviews based on locations, branches, etc.
- Built a TF-IDF model and used word count as a feature, fit to training and test sets.
- Performed topic modelling and grouped the data into different topics using LDA and LSA.
- Applied machine learning techniques such as Naïve Bayes, Logistic Regression, SVM and XGBM with scikit-learn on TF-IDF features, and calculated prediction probabilities and log loss.
- Built a sourcing intelligence module for a comprehensive talent acquisition solution using text mining and NLP methods like topic modelling, word embeddings, NER and similarity index in Python.
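The log loss evaluation mentioned in the bullets above can be illustrated with a minimal sketch in plain Python. It is not the project's code: the function name `log_loss`, the clipping constant `eps` and the binary (0/1) label convention are assumptions made for the example.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    y_true: iterable of 0/1 labels (1 = positive feedback, by assumption).
    p_pred: iterable of predicted probabilities for the positive class.
    eps:    clipping constant so that log(0) is never evaluated.
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, p_pred):
        # Clip the probability away from exactly 0 and 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    if count == 0:
        raise ValueError("log_loss needs at least one sample")
    return total / count
```

A lower value means the predicted probabilities sit closer to the true labels, so the metric rewards well-calibrated classifiers rather than only correct hard predictions.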
Confidential, Richmond, VA
Senior Data Scientist/Machine Learning Engineer
- Implemented a highly immersive data science program involving Data Manipulation, Web Scraping, Machine Learning, Python programming and data visualization.
- Analyzed the data statistically and prepared statistical reports using SAS.
- Used Customer Profiling models with K-means and K-means++ clustering algorithms to enable targeted marketing. Developed the model and used an elbow plot to find the optimum value of K, with the sum of squared errors (SSE) as the error measure.
- Evaluated and optimized performance of models and tuned parameters with K-fold cross-validation.
- Tuned the model's hyperparameters iteratively using grid search, yielding a model with a very low RMSE (root mean square error).
- Developed Spark and Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Performed Data Wrangling to clean, transform and reshape the data using the NumPy and Pandas libraries.
- Validated data elements using exploratory data analysis (univariate, bivariate and multivariate analysis).
- Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce.
- Used Matplotlib and Spark’s machine learning library to build and evaluate different models.
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Engineered many features during the feature engineering phase: performed fuzzy string matching, added many statistical features and combined features.
- Used a document-term matrix (DTM) to store the TF-IDF values and later computed cosine similarity in the vector space model to find the similarity between queries and product descriptions.
- Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-means clustering and Naïve Bayes.
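The query-to-product-description matching described in the bullets above (TF-IDF weights stored per document, then cosine similarity) can be sketched in plain Python as follows. This is an illustrative sketch only, not the original implementation: the helper names (`tokenize`, `build_tfidf`, `vectorize_query`, `cosine`, `rank`), the whitespace tokenizer and the smoothed IDF formula are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return text.lower().split()


def build_tfidf(docs):
    """Return (rows, idf): one sparse TF-IDF dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    # One common smoothed IDF variant (an assumption, not the source's formula).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append({t: count * idf[t] for t, count in tf.items()})
    return rows, idf


def vectorize_query(query, idf):
    # Terms never seen in the documents are dropped from the query vector.
    tf = Counter(t for t in tokenize(query) if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_index) pairs sorted from most to least similar."""
    rows, idf = build_tfidf(docs)
    q = vectorize_query(query, idf)
    scores = [(cosine(q, row), i) for i, row in enumerate(rows)]
    return sorted(scores, reverse=True)
```

Calling `rank("blue jeans", descriptions)` on a list of product descriptions returns the descriptions' indices ordered by similarity to the query; storing each row as a dict keeps the matrix sparse, which matters when the vocabulary is large.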
Confidential, Atlanta, GA
Data Scientist/Machine Learning Engineer
- Applied various statistical analysis methods such as regression, hypothesis testing and other statistical tests to optimize the data preprocessing techniques.
- Transformed the raw data into actionable insights by incorporating various statistical techniques and data mining tools such as Python (Scikit-Learn, NumPy, Pandas, Matplotlib) and SQL.
- Performed data gathering, data cleaning and data wrangling using Python and R.
- Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive and Spark jobs.
- Responsible for the design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modeling.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Developed Spark/Scala and Python code for a regular expression (regex) project in a Hadoop/Hive environment for big data resources.
- Used clustering techniques like K-means to identify outliers and to classify unlabeled data.
- Calculated the errors of various machine learning algorithms such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, KNN, DecisionTreeRegressor, SVM, Bagging Decision Trees, Random Forest, AdaBoost and XGBoost, and chose the best model based on MAE.
- Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
- Explored and analyzed the customer-specific features using Matplotlib and Seaborn in Python and dashboards in Tableau.
Confidential, Portland, OR
Junior Machine Learning Engineer
- Translated business requirements into analytical models, designed algorithms and built models by mining huge volumes of structured and unstructured data.
- Applied advanced statistical and predictive modeling techniques to build, maintain and improve on multiple real-time decision systems. Conducted advanced data analysis and developed complex algorithms.
- Identified the target groups by conducting Segmentation analysis using Clustering techniques like K-means.
- Conducted model optimization and comparison using a stepwise function based on AIC values.
- Selected models based on confusion matrices, minimizing Type II error. Generated a cost-benefit analysis to quantify the model implementation against the former situation.
- Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
- Worked on data cleaning and ensured data quality, consistency and integrity using Pandas, NumPy and scikit-learn.
- Explored and analyzed the customer-specific features using Matplotlib and Seaborn in Python and dashboards in Tableau.
- Collaborated with various business teams (operations, commercial, innovation, HR, logistics, safety, environmental, accounting) to analyze and understand changes in key financial metrics and provide ad-hoc analysis that can be leveraged to build long-term points of view where value can be captured.
- Maintained a portfolio of historical and upcoming relevant business changes/events, internal and external, to provide performance clarity and maintain management focus on highest value opportunities.
- Provided analytical support that evaluates venture acquisition model vs. actual performance.
- Conducted performance benchmarking (internal, external and state of the art) to identify and accelerate strategies to close and outperform the cost gaps identified in the benchmarking process.
- Participated in and learned the decision-making framework process for capital ventures.
Confidential, San Francisco, CA
- Involved in the preparation of data mapping documents which include data integration tasks such as Data Transformation, Data Lineage Analysis, Data Profiling and Data Masking.
- Performed root cause analysis and created mathematical diagrams, graphs and flowcharts to describe processes and improve organizational efficiency.
- Conducted data mining to examine patterns in the datasets.
- Performed gap analysis for business rules, business and system process flows, user administration and user requirements.
- Configured software to optimize data storage and database growth.
- Generated complex SQL queries against large relational databases and created efficient, meaningful and accurate data visualizations in Tableau.
- Analyzed the number and types of support cases and set the production pace through Statistical Process Control charts and pivot tables using MS Excel.
- Worked closely with business units to drive the data conversion process during data migration, ensuring that data conversion and formatting did not affect the regular reporting program.
- Analyzed the client’s business requirements and processes through document analysis and workflow analysis to design a data warehouse and transform the data into interactive data visualizations.
- Performed data profiling to cleanse the data in the database and raise the issues found.
- Performed data mining on source data to make sure that data is accurate for reporting needs.
- Developed a firm knowledge of Excel, SQL and SAS for large dataset manipulations.
- Developed key performance indicators to monitor sales and improve cost efficiency.
- Developed, validated and implemented statistical models to solve strategic and challenging business problems.
- Converted data models into business insights and proposed actions based on analysis.
- Wrote SQL queries for data extraction, manipulation and formation of tables.
Environment: Oracle DB, SQL Joins, Flat files, MS Office