
Data Scientist Resume

Watertown, MA

SUMMARY:

  • Experienced Data Scientist with a demonstrated history of working with Neural Networks, Natural Language Processing, and various Machine Learning algorithms. Passionate about solving problems involving large amounts of data.
  • 8+ years of experience in transforming business requirements into models using Machine Learning and Predictive Modeling.
  • Experienced in Data mining with large datasets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive modelling, Data Visualization.
  • Strong knowledge of writing subqueries, procedures, triggers, cursors, and functions in SQL.
  • Expert-level mathematical knowledge in statistics, differentiation, integration, trigonometry, and geometry.
  • Hands-on experience in Exploratory Data Analysis using numerical summaries and relevant visualizations to guide feature engineering.
  • Experience with statistical programming languages such as R and Python.
  • Experienced in visualizing data and publishing dashboards using Tableau, Matplotlib, Seaborn, and Plotly.
  • Experienced in optimization techniques like Gradient Descent, Stochastic Gradient Descent, Adam, Adadelta, and Adagrad.
  • Expertise in dimensionality reduction techniques like Truncated SVD and Principal Component Analysis.
  • Hands-on experience in machine learning algorithms like Decision Trees, Support Vector Machines, K-Nearest Neighbors, Linear Regression, Logistic Regression, Random Forest, Naïve Bayes Classifier, and Ensemble Methods.
  • Expertise in unsupervised machine learning algorithms like K-Means Clustering, Hierarchical Clustering, and Density-Based Clustering.
  • Experienced in collaborative and content-based filtering for Recommender Systems.
  • Proficient with Deep Learning concepts like Multi-Layer Perceptron, Deep Neural Networks, Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks.
  • Hands-on experience with Deep Learning techniques such as Back Propagation, choosing activation functions, weight initialization based on the optimizer, avoiding vanishing and exploding gradient problems, Dropout, Regularization, Batch Normalization, gradient monitoring and clipping, Padding and Striding, Max Pooling, LSTM, and GRU.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis, extraction, and summarization of data in the infrastructure.
  • Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
  • Extensive experience working in Test-Driven Development and Agile-Scrum environments.
  • Experience in using the Git version control system.
  • Experienced in Agile methodologies (Scrum stories and sprints) in a Python-based environment, along with data analytics and data wrangling.
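
As a brief illustration of the optimization techniques listed above, here is a minimal sketch of batch Gradient Descent and Stochastic Gradient Descent in plain Python; the quadratic objective, learning rates, and toy data are illustrative assumptions, not drawn from any project described here.

```python
import random

def gradient_descent(grad, w, lr=0.1, steps=100):
    # Batch gradient descent: step against the full gradient each iteration.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def sgd(data, w, lr=0.05, epochs=50):
    # Stochastic gradient descent: update per sample, in shuffled order.
    # Per-sample gradient of (w - x)**2 is 2 * (w - x).
    for _ in range(epochs):
        random.shuffle(data)
        for x in data:
            w -= lr * 2 * (w - x)
    return w

random.seed(0)
w1 = gradient_descent(lambda w: 2 * (w - 3.0), 0.0)  # minimum of (w - 3)^2 is at w = 3
w2 = sgd([1.0, 2.0, 3.0, 4.0], 0.0)                  # hovers near the sample mean, 2.5
```

Adam, Adadelta, and Adagrad extend this same update rule with per-parameter adaptive step sizes.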

TECHNICAL SKILLS:

Languages: C, C++, R, Python

Databases: Oracle 11g/10g, SQL Server.

Mathematical: Matrix Operations, Differentiation, Integration, Probability, Statistics, Linear Algebra, Geometry.

Machine Learning Algorithms: Logistic Regression, Linear Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors, Random Forests, Gradient Boosted Decision Trees, Stacking Classifiers, Cascading Models, Naïve Bayes, K-Means Clustering, Hierarchical Clustering, and Density-Based Clustering.

Machine Learning Techniques: Principal Component Analysis, Truncated SVD, Data Standardization, L1 and L2 Regularization, Loss Minimization, Hyperparameter Tuning, Performance Measurement of Models, Featurization and Feature Engineering, Content-Based and Collaborative Filtering, Matrix Factorization, Model Calibration, Productionizing Models, A/B Testing, Point and Interval Estimation, Hypothesis Testing, Cross Validation, Decision Surface Analysis, Periodic Model Retraining, t-Distributed Stochastic Neighbor Embedding (t-SNE).
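
As a minimal sketch of the two dimensionality reduction techniques named above, using scikit-learn on random synthetic data (the data shapes and component counts are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Random high-dimensional data standing in for real features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# PCA centers the data before factorizing; TruncatedSVD does not,
# which makes it the usual choice for large sparse matrices.
X_pca = PCA(n_components=2).fit_transform(X)
X_svd = TruncatedSVD(n_components=2).fit_transform(X)

print(X_pca.shape, X_svd.shape)  # both reduce 20 features to 2 components
```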

Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Multi-Layer Perceptrons, Recurrent Neural Networks, LSTM, GRU, Softmax Classifier, Back Propagation, Chain Rule, Choosing Activation Functions, Dropout, Optimization Algorithms, Vanishing and Exploding Gradients, Striding, Padding, Optimized Weight Initialization, Gradient Monitoring and Clipping, Batch Normalization, Max Pooling.

Methodologies: Agile, Scrum

PROFESSIONAL EXPERIENCE:

Confidential, Watertown, MA

Data Scientist

Responsibilities:

  • Responsible for Dimensionality Reduction and Regularization.
  • Performed data cleaning, featurization, feature engineering, and feature scaling.
  • Applied forward elimination and backward elimination to identify the most statistically significant variables for data analysis.
  • Utilized a variety of machine learning methods, including Classification, Regression, and Clustering techniques.
  • Applied segmentation to applicant data.
  • Created dashboards of large datasets using Tableau, which helped in understanding insights from the data.
  • Created and implemented SQL Queries to deliver efficient results.
  • In the data exploration stage, used correlation analysis and graphical techniques to gain insights about the data.
  • Used Principal Component Analysis and Truncated SVD to analyze high-dimensional data.
  • Isolated truly relevant data from terabytes of customer transaction records.
  • Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, and NLTK in Python at various stages of developing machine learning models, and utilized algorithms such as Linear Regression, Naïve Bayes, Random Forests, Decision Trees, K-Means, and KNN.
  • Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) to run training in parallel.
  • Worked with several R packages including ggplot2, dplyr, and knitr.
  • Tested classification algorithms such as Logistic Regression, Gradient Boosting, and Random Forest using Pandas and Scikit-learn and evaluated their performance.
  • Evaluated models using cross validation, used the log loss function to measure performance, and used ROC curves and AUC for feature selection.
  • Used K-Means clustering to identify outliers and to classify unlabeled data.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau Desktop.
  • Used Spark's Machine learning library to build and evaluate different models.
  • Maintained scripts using the Git version control tool.
  • Used JIRA for bug tracking, story and task management.
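
The evaluation workflow described above (cross-validation scored with log loss and ROC AUC) can be sketched with scikit-learn; the synthetic dataset and model settings below are illustrative assumptions, not the actual project data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation, scored with ROC AUC and with log loss
# (scikit-learn reports negated log loss, so flip the sign back).
auc_scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
log_loss_scores = -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss")

print(auc_scores.mean(), log_loss_scores.mean())
```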

Environment: Cluster Analysis, Regression, Natural Language Processing, Spark MLlib, Logistic Regression, Softmax Classifier, Random Forest, Python, SQL, Oracle 12c, NLTK, Recurrent Neural Networks, LSTM cells, Natural Language Toolkit, NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, TensorFlow, Keras.

Confidential

Data Scientist

Responsibilities:

  • Performed data cleaning, featurization, feature engineering and feature scaling.
  • Wrote software to clean and investigate large, messy data sets of numerical and textual data.
  • Involved in various pre-processing phases of text data, such as tokenizing, stemming, lemmatization, and converting raw text data to structured data.
  • Developed NLP-based deep learning algorithms for analyzing text, improving on the existing dictionary-based approaches.
  • Built and improved models using natural language processing (NLP) and machine learning to extract insights from unstructured data.
  • Built multi-layer Neural Networks to implement Deep Learning using TensorFlow and Keras.
  • Performed text analytics on review data using the Natural Language Toolkit (NLTK).
  • Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) to run training in parallel.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Constructed the vocabulary to convert text data into numbers for machine processing, using approaches such as Bag of Words, TF-IDF, Word2Vec, and Average Word2Vec.
  • Implemented Bi-Directional Recurrent Neural Networks that act as an encoder to process the input and as a decoder to generate the output.
  • Used Recurrent Neural Networks with LSTM cells to preserve sequence information and capture longer-term dependencies.
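
The Bag of Words and TF-IDF featurizations mentioned above can be sketched with scikit-learn; the toy review corpus is an illustrative assumption standing in for the real data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy review corpus (illustrative only).
docs = [
    "great product works great",
    "terrible product stopped working",
    "works as described",
]

# Bag of Words: each document becomes a vector of raw term counts.
bow = CountVectorizer()
counts = bow.fit_transform(docs)

# TF-IDF: term counts re-weighted to down-weight words common to many documents.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)

print(sorted(bow.vocabulary_))      # the learned vocabulary
print(counts.shape, weights.shape)  # (number of documents, vocabulary size)
```

Word2Vec and Average Word2Vec replace these sparse counts with dense learned embeddings that preserve semantic similarity between words.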

Environment: Cluster Analysis, Regression, Natural Language Processing, Spark MLlib, Logistic Regression, Softmax Classifier, Random Forest, Python, SQL, Oracle 12c, NLTK, Recurrent Neural Networks, LSTM cells, Natural Language Toolkit, NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, TensorFlow, Keras.

Confidential

Machine Learning Engineer

Responsibilities:

  • Involved in all phases of the data science project life cycle, including data extraction, data cleaning, transformation, and visualization.
  • Responsible for data identification, collection, exploration, and cleaning for modeling.
  • Performed data cleaning, feature scaling, featurization, and feature engineering.
  • Queried and aggregated data from SQL Server, Oracle 10g, and MySQL databases to get sample datasets.
  • Performed Exploratory Data Analysis to understand the insights of the data and spot anomalies using Pandas and Matplotlib.
  • Ensured data accuracy and treated missing values using NumPy and Pandas.
  • Used Principal Component Analysis to analyze high-dimensional data in feature engineering and to eliminate unrelated features.
  • Utilized a variety of machine learning methods, including Classification, Regression, Dimensionality Reduction, and Clustering techniques.
  • Achieved customer segmentation by using clustering algorithms to group customers into segments based on their behavioral and geographical data, which helped improve target marketing.
  • Observed groups of customers being neglected by the pricing algorithm and used hierarchical clustering to improve customer segmentation.
  • Designed and developed Recommendation models to recommend products to customers using Content based and Collaborative filtering.
  • Developed NLP models for Sentiment Analysis for customer reviews.
  • Performed text analytics on review data using the Natural Language Toolkit (NLTK).
  • Wrote software to clean and investigate large, messy data sets of numerical and textual data.
  • Addressed overfitting and underfitting by tuning the hyperparameters of the algorithms and by using L1 and L2 Regularization.
  • Created various types of data visualizations using Matplotlib, Tableau to convey the results to other data and marketing teams.
  • Used Predictive Analytics to analyze the shopping behavior of the customers.
  • Responsible for establishing a detailed program specification through interaction with clients.
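
The clustering-based customer segmentation described above can be sketched with scikit-learn; the two customer features, the data values, and the cluster count below are hypothetical, chosen only to make the grouping visible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: [annual_spend, visits_per_month] (illustrative).
customers = np.array([
    [200.0, 1.0], [250.0, 2.0], [220.0, 1.0],        # low-spend, infrequent
    [5000.0, 12.0], [5200.0, 15.0], [4800.0, 11.0],  # high-spend, frequent
])

# Scale features so that raw spend does not dominate the distance metric.
X = StandardScaler().fit_transform(customers)

# Group customers into two behavioral segments.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
```

Each segment's centroid (`kmeans.cluster_centers_`) then characterizes that group for targeted marketing.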

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Spark MLlib, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark, Spark SQL, Logistic Regression, Random Forests, Decision Trees, t-SNE, PCA, TensorFlow, K-Means, Natural Language Toolkit.

Confidential

Data Scientist

Responsibilities:

  • Working closely with the marketing team to deliver actionable insights from huge volumes of data coming from different marketing campaigns and customer interaction metrics, such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters.
  • Characterizing false positives and false negatives to improve a model for predicting customer churn rate.
  • Performing consumer segmentation and characterization to predict behavior, and analyzing promoters and detractors (defined using Net Promoter Score).
  • Detecting outliers in high-dimensional historical data; acquiring, cleaning, and structuring data from multiple sources and maintaining databases/data systems; identifying, analyzing, and interpreting trends and patterns in complex data sets.
  • Developing, prototyping, and testing predictive algorithms; filtering and "cleaning" data and reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
  • Developing and implementing data collection systems and other strategies that optimize statistical efficiency and data quality.
  • Using statistical models such as regression and classification to create contact scoring models; also applying clustering to customer data profiles for customer segmentation and analysis.
  • Interpreting data, analyzing results using statistical techniques, and providing ongoing reports.
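
Characterizing false positives and false negatives, as described above, comes down to the confusion matrix; here is a minimal sketch in plain Python, with toy churn labels as an illustrative assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for a binary churn model."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Toy labels: 1 = churned, 0 = retained (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of predicted churners, how many actually churned
recall = tp / (tp + fn)     # of actual churners, how many the model caught
```

A false positive here is a retained customer flagged as a churner (wasted retention spend); a false negative is a churner the model missed (lost revenue). Which error is costlier decides whether to tune for precision or recall.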
