
Data Scientist Resume


SUMMARY

  • Passionate Data Scientist with a strong background in statistics and 6 years of experience, seeking to solve business bottlenecks and increase productivity by implementing machine learning models. Skilled in predictive modeling, data processing, and data mining algorithms.
  • Expertise in Statistical analysis, Predictive modeling, Text mining, Supervised learning, Unsupervised Learning, and Reinforcement learning.
  • Proficient in Statistical methodologies such as Hypothesis Testing, ANOVA, Monte Carlo Sampling and Time Series Analysis.
  • Strong mathematical background in Linear Algebra, Probability, Statistics, Differentiation and Integration.
  • Expertise in handling various data sources, including semi-structured, unstructured, time series, and spatial data.
  • Excellent understanding of Analytics concepts and Supervised Machine Learning algorithms (Decision Trees, Linear Regression, Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Excellent understanding of Analytics concepts and Unsupervised Machine Learning algorithms like K-Means, Density Based Clustering (DBSCAN), Hierarchical Clustering (Agglomerative, Divisive) and good knowledge on Recommender Systems.
  • Proficient in implementing Dimensionality Reduction techniques such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Linear Discriminant Analysis (LDA).
  • Familiar with predictive models using numeric and classification prediction algorithms like support vector machines and neural networks, and ensemble methods like bagging, boosting and random forest to improve the efficiency of the predictive model.
  • Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python.
  • Implemented machine learning algorithms in Python, such as NLP-based text classification using TensorFlow and time-series-forecasting-based anomaly detection models.
  • Researched feature optimization on machine learning algorithms created with TensorFlow, predicting the categories of consumer complaints using NLP with >90% accuracy.
  • Worked on Chat Bot Product Management using NLP/NLU and designed roadmap for launch/future phases.
  • Excellent knowledge in numerical and scientific libraries such as SciPy and NumPy.
  • Experience building and optimizing big data pipelines, architectures, and data sets using technologies such as Hadoop, Spark, Zeppelin, Hive, MongoDB, and Cassandra.
  • Hands on experience in importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Experience with Deep Learning net families, such as Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks and GANs.
  • Experience in building models with TensorFlow and high-level frameworks such as Keras, Theano, and PyTorch.
  • Actively involved in all phases of data science project life cycle including Data Extraction, Data Cleaning, Data Visualization and building Models.
  • Hands-on experience with AWS services such as Amazon EC2, Amazon S3, Amazon Redshift, Amazon EMR, Amazon SQS, SageMaker, and AWS Lambda, as well as Azure Data Lake.
  • Built a GPU-accelerated machine learning package with APIs in Python and R, using H2O4GPU, that allows anyone to take advantage of GPUs to build advanced machine learning models.
  • Experience in Extract, Transfer and Load process using ETL tools like Data Stage, Data Integrator and SSIS for Data migration and Data Warehousing projects.
  • Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.
  • Worked on low-latency applications with performance tuning, including regularization.
  • Proficient in data visualization tools such as Tableau and the Python libraries Matplotlib and Plotly to create visually powerful, actionable interactive reports and dashboards.
  • Experience in using GIT Version Control System.
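A minimal sketch of the supervised-learning workflow summarized above, comparing the bagging and boosting ensemble methods mentioned; the dataset and parameters are synthetic and purely illustrative:

```python
# Illustrative sketch: comparing bagging and boosting ensembles on
# synthetic data (all data and hyperparameters are hypothetical).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification dataset standing in for real business data.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=8, random_state=42)

for name, model in [
    ("bagging", BaggingClassifier(n_estimators=50, random_state=42)),
    ("boosting", GradientBoostingClassifier(n_estimators=50, random_state=42)),
]:
    # 5-fold cross-validated accuracy for each ensemble.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```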

TECHNICAL SKILLS

  • R, Python, Vue, JavaScript, HTML, Scala, Java, C, C++, Node.js
  • Oracle 11g/10g, SQL Server, MS-Access, SSIS, SSRS.
  • Matrix operations, Differentiation, Integration, Probability, Statistics, Linear Algebra, Geometry.
  • SQL Server Tools: SQL Server Management Studio (SSMS), Erwin Data Modeler, SAS
  • Data Warehouse Tools: MS SQL Server 2005/2008/2012/2014/2016 Integration Services
  • Business Intelligence Tools: SQL Server 2005/2008/2012/2014 Business Intelligence Development Studio
  • Reporting Tools: SQL Server 2005/2008/2012/2014 Reporting Services
  • Machine Learning Algorithms: Logistic Regression, Linear Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors, Random Forests, Gradient Boosted Decision Trees, Stacking Classifiers, Cascading Models, Naive Bayes, K-Means Clustering, Hierarchical Clustering, and Density-Based Clustering
  • Machine Learning Techniques: Principal Component Analysis, Truncated SVD, t-SNE, Data Standardization, L1 and L2 Regularization, Loss Minimization, Hyperparameter Tuning, Model Performance Measurement, Featurization and Feature Engineering, Content-Based and Collaborative Filtering, Matrix Factorization, Model Calibration, Productionizing Models, A/B Testing, Point and Interval Estimation, Hypothesis Testing, Cross-Validation, Decision Surface Analysis, and Periodic Model Retraining
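The L1/L2 regularization and hyperparameter tuning listed above can be sketched with scikit-learn's GridSearchCV; the grid values and synthetic data below are illustrative, not tuned for any real problem:

```python
# Sketch of hyperparameter tuning over L1/L2 penalties with GridSearchCV.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Search over penalty type (L1 vs L2) and inverse regularization strength C.
grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print(f"best CV accuracy: {grid.best_score_:.3f}")
```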

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
  • Performed Exploratory Data Analysis (EDA) using Python and R to maximize insight into the dataset, detect outliers, and extract important variables both graphically and numerically; evaluated results using accuracy, recall/precision, F1 score, etc.
  • Implemented algorithms such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce dimensionality and normalize the large datasets. Developed various clustering algorithms for market segmentation to analyze customer behavior patterns.
  • Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, and NLTK in Python at various stages of developing the machine learning models.
  • Implemented various machine learning algorithms such as linear regression, classification, time series, decision trees and black box algorithms like Support Vector Machine, Gradient Boosting Machine to create robust data models for our clients.
  • Identified patterns of behavior in customer migration to products and services. Created and ran jobs using AWS Lambda, Snowflake, and S3 services.
  • Collected historical data and third-party data from different data sources. Also worked on outlier identification with box plots using Pandas and NumPy.
  • Developed new algorithms and models to create new functionality for products and services that meets business needs for generating LEADS.
  • Redesigned the entire SAS code used to generate LEADS in Python and implemented ML pipelines, from data pre-processing and feature extraction to application of ML algorithms, in Spark and Python by combining H2O applications.
  • Performed feature engineering such as RFE feature selection, feature normalization, and label encoding with the Scikit-learn preprocessing library.
  • Built machine learning models and constructed multilayer perceptions for Deep Neural Networks (DNN) to identify fraudulent applications for insurance pre-approvals and to identify fraudulent credit card transactions using the history of customer transactions and compared the results.
  • Implemented an LSTM network of moderate depth to capture information in the sequence, with the help of TensorFlow.
  • Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) and ran them in parallel.
  • Implemented a recommendation model to optimize sales and marketing efforts that increased revenue by ~3%.
  • Used cross-validation techniques to avoid overfitting of the model to make sure the predictions are accurate and measured the performance using Confusion matrix and Classification report.
  • Improved accuracy using Ensemble methods of the training model with different Bagging and Boosting methods.
  • Converted ML models to PoC and production code in various iterations based on product requirements.
  • Worked on reading queues in Amazon SQS, which have paths to files in Amazon S3 Bucket. Also worked on AWS CLI to aggregate clean files in Amazon S3.
  • Built a web application using Spark, R, and d3.js that processed terabytes of data to extract insights into progression and performance within the models.
  • Experienced in building Data Warehouses on the Azure platform using Azure Databricks and Data Factory.
  • Used data lineage methodologies in the field of business intelligence, which involves gathering data and building conclusions from that data.
  • Provided clear direction and motivation to project team members, champion communication across all levels of the organization and provide daily reports to management on project status.
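The PCA-plus-clustering segmentation flow above can be sketched as follows; synthetic blob data stands in for the confidential customer data, and the parameters are illustrative:

```python
# Simplified sketch of PCA + K-Means for customer segmentation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic "customer" feature matrix with 3 latent segments.
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=1)

# Standardize, then reduce to 2 principal components before clustering.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=1).fit_transform(X_scaled)

labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_2d)
print("segment sizes:", np.bincount(labels))
```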

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Microsoft Excel, Random Forests, SVM, t-SNE, PCA, SSIS, SAS Analytics, Python, DNN, K-NN, Spark.

Confidential, Jacksonville, FL

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Performed sentiment analysis (NLP) to determine the emotional tone behind a series of words and capture the expressed attitudes and emotions, using Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNN); also worked on chatbots.
  • Evaluated a prior chatbot pilot to anticipate issues/blockers and uncover potential enhancements for upcoming end-to-end services reaching millions of users.
  • Worked on master chatbot product feature-set/capabilities including intents/entities, NLU/NLP, dialog, and business rules.
  • In the context of chatbots, the system assesses the intent of user input and then creates responses based on contextual analysis, like a human being, using NLP.
  • Implemented machine learning algorithms, Random Forest and Support Vector Machines, on historical customer credit history and payment activity to predict whether customers are eligible for loans based on their credit score, and reported the results to the marketing team.
  • Extracted data from HDFS using Hive and Presto, performed data analysis using Spark with Scala, PySpark, and Redshift, performed feature selection, and created nonparametric models in MongoDB and Cassandra.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Performed Text analytics on unstructured email data using Natural language processing tool kit (NLTK).
  • Involved in various pre-processing phases of text data, such as tokenizing, stemming, and lemmatization, converting the raw text data to structured data.
  • Performed feature engineering and NLP using techniques such as Bag of Words (BoW), tf-idf, Word2Vec, average Word2Vec, and tf-idf-weighted Word2Vec.
  • Leveraged tools for capturing business and technical metadata, data lineage, and transformation rules for assigned data domains/data sets, with special attention to data profiling, data testing, and metric reporting. Emphasis on building data scorecards.
  • Used LSTM to predict sentiment of customer reviews using PyTorch.
  • Used PyTorch and MongoDB as part of the customer feedback challenge and built image classification based on a pretrained VGG16 network.
  • Used PySpark Machine learning library to build and evaluate different models.
  • Used Tableau to convey the results by using dashboards to communicate with team members and with other data science teams, marketing and engineering teams.
  • Communicated the results with operations team for taking best decisions.
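A toy sketch of text classification on customer feedback like that described above, using a tf-idf bag-of-words model rather than the LSTM approach named in the bullets; the corpus and labels are invented for illustration:

```python
# Minimal tf-idf sentiment classifier on an invented toy corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "great product, works perfectly",
    "terrible support, very disappointed",
    "love it, excellent quality",
    "awful experience, would not recommend",
    "fantastic value and fast shipping",
    "broken on arrival, very poor",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

# Pipeline: raw text -> tf-idf features -> logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["excellent quality, great value"]))
```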

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark-ML, Random Forests, SVM, t-SNE, PCA, TensorFlow, K-Means, NLTK, LSTM-RNN, AWS, PySpark, Redshift, Scala, MapReduce.

Confidential, Denver, CO

Data Scientist / Machine Learning Engineer

Responsibilities:

  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
  • Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect outliers, and extract important variables both graphically and numerically.
  • Extract data and actionable insights from a variety of client sources and systems, find probabilistic and deterministic matches across second- and third-party data sources, and complete exploratory data analysis.
  • Applied resampling methods like Synthetic Minority Over Sampling Technique (SMOTE) to balance the classes in large data sets.
  • Gathered, analyzed, and translated business requirements, communicated with other departments to collect client business requirements and access available data.
  • Implemented algorithms such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce dimensionality and normalize the large datasets.
  • Performed K-Means clustering, regression, and decision trees, and produced data visualization reports for management using R.
  • Implemented classification algorithms such as Logistic Regression, K-Nearest Neighbors, and Random Forests to predict customer churn and customer interactions.
  • Analyzed questionnaire responses using Named Entity Recognition (NER) and Natural Language Processing (NLP) to classify the degree of bad connections reported by customers.
  • Performed sentiment analysis on customers' email feedback to determine the emotional tone behind the series of words and capture the expressed attitudes and emotions, using Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNN).
  • Built models with TensorFlow and high-level frameworks such as Keras and PyTorch.
  • Performed model tuning to find the best hyperparameter fit for each algorithm and achieve better results.
  • Used ensemble methods, with different bagging and boosting approaches, to increase the accuracy of the training model.
  • Used data lineage techniques to trace how customer information was collected and what role it could play in new or improved processes, putting the data through additional flow charts.
  • Performed data visualization and designed dashboards with Tableau, generating complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
  • Excellent working knowledge of and hands-on experience with SAS, including writing SAS programs using various procedures and macros.
  • Evaluated models using Cross Validation, Log loss function, ROC Curves and AUC for feature selection and measured the performance using Confusion matrix and Classification report.
  • Developed the full life cycle of a Data Lake and Data Warehouse with big data technologies such as Spark and Hadoop, using Java.
  • Responsible for full data loads from production to AWS Redshift staging environment.
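The SMOTE-style resampling used above to balance classes can be illustrated with a simplified interpolation sketch; the real work would use a library implementation, and the data here is random stand-in data:

```python
# Simplified SMOTE-style oversampling: synthesize minority-class points by
# interpolating between a minority sample and one of its nearest neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating each sampled
    minority point toward one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = idx[i, rng.integers(1, k + 1)]  # a random true neighbor
        lam = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

minority = np.random.default_rng(1).normal(size=(20, 4))
new_points = smote_like(minority, n_new=30)
print(new_points.shape)  # (30, 4)
```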

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Linux, Git, Microsoft Excel, Random Forests, SVM, t-SNE, PCA, TensorFlow, Python, DNN, K-NN, Spark, Hadoop, AWS Redshift.

Confidential

Jr. Data Scientist

Responsibilities:

  • Designed and implemented a customized linear regression model utilizing diverse data sources to predict sales, demand, risk, and price elasticity.
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
  • Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
  • Performed statistical analysis and compared the models using R. Created visualization graphs and compared models based on R² values, F1 score, and correlation.
  • Used information value, principal component analysis, and Chi-square feature selection techniques to identify the most significant features.
  • Applied resampling methods like Synthetic Minority Over Sampling Technique (SMOTE) to balance the classes in large data sets.
  • Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost, and Gradient Boosting, using Python Scikit-Learn, and evaluated the performance on customer discount optimization across millions of customers.
  • Used F-Score, AUC/ROC, Confusion Matrix and RMSE to evaluate different model performance.
  • Used Keras for implementation and trained using a cyclic learning rate schedule.
  • Resolved overfitting issues using batch normalization and dropout.
  • Built models using Python and PySpark to predict the probability of attendance for various campaigns and events.
  • Developed ADaM dataset specifications and wrote SAS programs to produce ADaM datasets.
  • Produced summary tables, data listings, and graphs (TLGs) using SAS software.
  • Performed data visualization and designed dashboards with Tableau, generating complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
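The sales-prediction linear regression above, with R²-based model comparison, can be sketched as follows; the demand relationship and features are hypothetical stand-ins for the real data:

```python
# Illustrative sketch: fit a linear regression to predict sales and
# evaluate with R² and RMSE on a held-out split (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
price = rng.uniform(5, 50, size=200)
promo = rng.integers(0, 2, size=200)
# Hypothetical linear demand relationship plus noise.
sales = 500 - 6.0 * price + 40.0 * promo + rng.normal(0, 10, size=200)

X = np.column_stack([price, promo])
X_tr, X_te, y_tr, y_te = train_test_split(X, sales, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R² = {r2_score(y_te, pred):.3f}")
print(f"RMSE = {mean_squared_error(y_te, pred) ** 0.5:.3f}")
```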

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark-ML, Random Forests, SVM, TensorFlow, Keras.
