
Data Scientist/Machine Learning Engineer Resume

Princeton, NJ

SUMMARY:

  • Over 5 years of experience as a qualified professional Data Scientist/Data Analyst in Data Science and Analytics, including Machine Learning, Data Mining, and Statistical Analysis.
  • Extensive experience in applying Machine Learning solutions to various business problems and generating data visualizations using Python.
  • Used Pandas, NumPy, and Scikit-learn in Python for developing various machine learning models.
  • Hands-on experience in implementing Naive Bayes, Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Rich experience in managing the entire data science project life cycle, involved in all phases including data extraction, data cleaning, statistical modeling, and data visualization, with large datasets of structured and unstructured data.
  • Implemented deep learning models and numerical computation with the help of data flow graphs using TensorFlow.
  • Worked with numerous data visualization tools in Python such as matplotlib, seaborn, ggplot, and pygal.
  • Experience in designing visualizations using Tableau software and publishing and presenting dashboards, Storylines on web and desktop platforms.
  • Experience with data visualizations using Python 2.X/3.X and R 2.15/3.0, and generating dashboards with Tableau 8.0/9.2/10.0.
  • Experience in Statistical Analysis and Testing, including Hypothesis Testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experiment Design, Sample Determination, and A/B Testing.
  • Working experience with version control tools such as Git 2.X to coordinate work on files with multiple team members.
  • Employed various SDLC methodologies such as Agile and SCRUM.
  • Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms, and analytics, with an excellent understanding of business operations and analytics tools for effective analysis of data.
  • Highly self-motivated, enthusiastic, and result-driven, with the ability to effectively communicate with all levels of the organization, including senior management and executives.
  • Guided development teams to break down large and complex user stories into simplified versions for execution.
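As a rough illustration of the pandas/NumPy/Scikit-learn modeling workflow mentioned above, here is a minimal, self-contained sketch; the dataset, column names, and target rule are synthetic assumptions, not from any actual project:

```python
# Illustrative only: a minimal pandas/NumPy/scikit-learn modeling workflow.
# The dataset, column names, and target rule below are synthetic assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, size=200),
    "monthly_spend": rng.normal(60, 15, size=200),
})
# Hypothetical target: short-tenure customers are labeled as churned.
df["churned"] = (df["tenure_months"] < 12).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["tenure_months", "monthly_spend"]], df["churned"],
    test_size=0.25, random_state=0,
)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```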

TECHNICAL SKILLS:

ML Algorithms: Linear Regression, LDA/QDA, SVM, CART, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, Collaborative Filtering, NLP

Analytic Tools: Anaconda (Jupyter Notebook, Spyder), R (Reshape, ggplot2, Car, Mass, and Lme4), Matlab, Excel

Statistical Analysis: Hypothesis Testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experiment Design, Sample Determination, A/B Testing

Python Packages: NumPy, SciPy, pandas, seaborn, Beautiful Soup, Scikit-learn, NLTK

Programming Languages: Python, R, MATLAB, SQL, Unix, MongoDB, Spark, Hadoop, TensorFlow

Relational Databases: MySQL, Oracle, MS SQL

Visualization: Tableau, ggplot2 (R), Matplotlib

NoSQL Database: MongoDB

Hadoop Ecosystem: Spark Framework, HDFS, MapReduce, Hive, HBase, SparkSQL, PySpark, MLlib

ML/Deep Learning Techniques: Trees, Bayes Model, SVM, Ensemble Methods, Neural Networks, RNN, KNN, CNN, MLP, Ensemble SVM, Majority Voting, Linear Models, Classification, Regression, Logistic Regression, Clustering, Kernel Methods, Memory Networks, LSTMs, Dimension Reduction, Deep Belief Networks (DBN), Statistical Tests, Gaussian Mixtures, DCN

Python Libraries: Scikit-learn, pandas, NumPy, SciPy, Theano, Keras, Matplotlib, PyMongo

R Libraries: dplyr, ggplot2, jsonlite, plyr, rvest, rjson, httr, xml2, curl

PROFESSIONAL EXPERIENCE:

Confidential, Princeton, NJ

Data Scientist/Machine Learning Engineer

Responsibilities:

  • The job required deriving key attributes from the obfuscated data, processing the data, and building machine learning algorithms to identify the ideal customer base.
  • Implemented solutions in Python using various libraries such as matplotlib for charts and graphs, MySQLdb for database connectivity, pandas DataFrames, and NumPy.
  • Developed predictive models and deployed them with interactive visualizations in seaborn, matplotlib, and Tableau.
  • Forecasted the sales of products on a yearly basis using a Regression model, based on the past 3 years of data.
  • Built a Random Forest Regression model in R for time series prediction and connected it with Tableau using an external ODBC connection.
  • Leveraged Machine Learning to predict Customer Churn and NLP to perform Sentiment Analysis.
  • Generated interactive Bar Graphs on the forecasted sales through Tableau.
  • Wrangled data and worked on large datasets (acquired and cleaned the data); analyzed trends by creating visualizations using matplotlib and Python.
  • Built Artificial Neural Networks using TensorFlow in Python to identify each customer's probability of cancelling the connection (churn rate prediction).
  • Understood the business problems and analyzed the data using appropriate statistical models to generate insights.
  • Predicted the products that are prone to be back ordered and products that are expected to be canceled.
  • Participated in data mining and machine learning project: Comparing algorithms for predicting backorders for a specific product
  • Performed data cleansing, transformation, and creation of new variables using R.
  • Validated data by cleaning and scaling the required variables.
  • Model validation was done through K-fold Cross Validation and backward elimination method.
  • Created visualizations using Tableau and ggplot2 (R) for the insights of Data.
  • Used Jupyter notebooks to write Python scripts on AWS EC2 instances for analysis.
  • Developed data pipelines to automate the training and testing of the models.
  • Applied K-means clustering algorithm on the customer generated tickets to group them into related clusters.
  • Implemented the Porter Stemmer (Natural Language Toolkit) and an NLP bag-of-words model (CountVectorizer) to prepare the data. The resulting clusters were plotted visually using Tableau legends.
  • Developed Natural Language Processing models to automate the classification of positive and negative reviews via text processing with NLTK (Sentiment Analyzer).
  • All the built models were tuned by finding the best parameters using grid search.
  • Created and worked on Amazon EC2 cloud instances using Linux Ubuntu and configured launched instances for specific applications.
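The ticket-clustering steps above (bag-of-words features fed to K-means) can be sketched roughly as follows; the ticket texts and the cluster count are illustrative, not from the actual project:

```python
# Rough sketch of the ticket-clustering step: bag-of-words features
# (CountVectorizer) fed to K-means. Ticket texts and the cluster count
# are illustrative; the project additionally Porter-stemmed tokens via NLTK.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

tickets = [
    "cannot connect to the internet",
    "internet connection keeps dropping",
    "billing charge looks wrong on my invoice",
    "incorrect charge on last month's bill",
]

bow = CountVectorizer(stop_words="english")
X = bow.fit_transform(tickets)          # sparse document-term matrix

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_                 # cluster id per ticket
```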

Environment: Python, R, Tableau, AWS, TensorFlow, Theano, Keras, MySQL, NLP, NLTK, Hadoop

Confidential, Washington, DC

Data Scientist/Machine Learning Engineer

Responsibilities:

  • This project focused on customer clustering, an ML and statistical modeling effort including building predictive models and generating data products to support customer classification and segmentation.
  • Performed data analysis, visualization, feature extraction, feature selection, and feature engineering using Python pandas, NumPy, seaborn, Apache Spark, etc.
  • Applied Spark RDD transformations and actions on raw data.
  • Developed scalable machine learning solutions within a distributed computation framework (e.g., Hadoop, Spark).
  • Established scalable, efficient, automated processes for large-scale data analysis, model development, model validation, and model implementation.
  • Designed and developed machine learning and deep learning systems.
  • Developed Named Entity Recognition using Bidirectional Long Short-Term Memory (LSTM) networks.
  • Implemented Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Utilized Convolutional Neural Networks to implement a machine learning image recognition component using TensorFlow.
  • Implemented profile/rule-based machine learning classifiers including SVM, Linear Least Squares Fit Classifier (SVD), DNF (Disjunctive Normal Form), and Decision Tree.
  • Implemented back-propagation to generate accurate predictions.
  • Implemented Apache Spark to speed up Convolutional Neural Network modeling.
  • Utilized NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive datasets.
  • Avoided overfitting by following standard practices such as keeping the number of independent parameters smaller than the number of available data points.
  • Loaded data from Hadoop and made it available for modeling in Keras.
  • Prepared multi-class classification data for modeling using one-hot encoding.
  • Used Keras neural network models with Apache Spark.
  • Enhanced model performance by calibrating parameters and researching and improving optimization and weight initialization methods.
  • Used PySpark data frames to read data from HDFS and S3.
  • Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
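The one-hot encoding step used above to prepare multi-class labels can be sketched with plain NumPy; the class names are illustrative (Keras's `to_categorical` is an equivalent route for integer labels):

```python
# Sketch of one-hot encoding multi-class labels with plain NumPy.
# Class names are illustrative; Keras's to_categorical is an equivalent route.
import numpy as np

labels = np.array(["low", "high", "medium", "low"])
classes = np.unique(labels)  # sorted unique classes: ['high', 'low', 'medium']

# Compare each label against every class to build the indicator matrix.
one_hot = (labels[:, None] == classes).astype(np.float32)
```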

Environment: Python, R, Tableau, AWS, TensorFlow, Theano, Keras, MySQL, NLP, NLTK, Hadoop, Apache Spark, Spark MLlib, HDFS, and S3

Confidential, Washington, DC

Data Scientist

Responsibilities:

  • The job involved creating statistical machine learning models for fraud detection, implementing automated customer scoring systems, and performing sentiment analysis; it also covered customer segmentation based on behavior or specific characteristics such as age, region, income, and geographical location, applying clustering algorithms to group customers with similar behavior patterns.
  • Involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Created classification models to recognize web requests with product associations in order to classify orders and score products for analytics, which improved the online sales percentage by 13%.
  • Used Pandas, NumPy, and Scikit-learn in Python for developing various machine learning models such as Random Forest and stepwise regression.
  • Worked with the NLTK library in Python for sentiment analysis on customer product reviews and other third-party websites using web scraping.
  • Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
  • Developed a fraud detection model by implementing a Feed-Forward Multilayer Perceptron, a type of ANN.
  • Worked with ANN (Artificial Neural Networks) and BBN (Bayesian Belief Networks).
  • Used pruning algorithms to cut away the connections and perceptrons to significantly improve the performance of back-propagation algorithms.
  • Hands-on experience in Dimensionality Reduction, Model Selection, and Model Boosting methods using Principal Component Analysis (PCA), K-Fold Cross Validation, and Gradient Tree Boosting.
  • Implemented a structured learning method based on a search-and-score approach.
  • Created and maintained reports to display the status and performance of deployed models and algorithms with Tableau.
  • Worked with numerous data visualization tools in Python such as matplotlib, seaborn, ggplot, and pygal.
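The cross-validation practice described above (testing models on different batches of data to prevent overfitting) can be sketched as follows; the synthetic dataset and the Random Forest model choice are assumptions:

```python
# Illustrative sketch of K-fold cross-validation on synthetic data;
# the dataset and the Random Forest model choice are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
# Hypothetical target driven mostly by the first feature.
y = (X[:, 0] + 0.1 * rng.normal(size=150) > 0).astype(int)

# 5-fold CV: each fold serves once as the held-out test batch.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
mean_accuracy = scores.mean()
```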

Environment: Python, R, Tableau, AWS, TensorFlow, Theano, Keras, MySQL, NLP, NLTK, Hadoop, Spark, Pandas, NumPy, Scikit-learn, ANN, BBN, K-Fold, matplotlib, seaborn, ggplot, pygal

Confidential

Data Scientist

Responsibilities:

  • The goal was to create customer profiling models and perform customer value analysis, as well as to improve customer services by automating some tasks using machine learning, pattern analytics, and exploratory analysis.
  • Developed Python modules and machine learning & predictive analytics for day-to-day business activities.
  • Performed exploratory analysis, hypothesis testing, cluster analysis, correlation, ANOVA, and ROC curve analysis, and built models using supervised and unsupervised machine learning algorithms, text analytics, and time series forecasting.
  • Implemented the Porter Stemmer (Natural Language Toolkit) and an NLP bag-of-words model (CountVectorizer) to prepare the data.
  • Implemented a number of customer clustering models; the resulting clusters were plotted visually using Tableau legends for higher management.
  • Developed Natural Language Processing models to automate the classification of customer incident queries into levels of classes to improve customer services.
  • Implemented a machine learning model for customer sentiment patterns to better assess customer trends.
  • Conducted studies and rapid plots, and used advanced data mining and statistical modeling techniques to build a solution that optimizes the quality and performance of data.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data lifecycle management in both RDBMS and Big Data environments.
  • Developed simple to mid-level MapReduce jobs using Hive and Pig, and developed multiple MapReduce jobs in Python for data cleaning and preprocessing.
  • Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
  • Developed the model with ~1.4 million data points and used the elbow method to find the optimal value of K, using the Sum of Squared Errors as the error measure.
  • Designed and implemented a probabilistic churn prediction model with ~80k customer records to predict the probability of customer churn using Logistic Regression in Python. The client utilized the results to finalize the list of customers to receive a discount.
  • Implemented dimensionality reduction using Principal Component Analysis and k-fold cross validation as part of model improvement.
  • Implemented Pearson's Correlation and Maximum Variance techniques to find the key predictors for the Regression models.
  • Worked with numerous data visualization tools in Python such as matplotlib, seaborn, ggplot, and pygal.
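The elbow-method step described above (choosing K by the drop in Sum of Squared Errors) can be sketched on synthetic data; the blob locations and range of K are illustrative:

```python
# Sketch of the elbow method: fit K-means for a range of K and track the
# sum of squared errors (inertia). Synthetic, well-separated blobs are
# assumed here, so SSE should drop sharply up to K=3 and flatten after.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
blobs = [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)]
X = np.vstack(blobs)

sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)}
# Plotting k vs. sse[k] would show the "elbow" at the optimal K.
```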

Environment: Python, R, Tableau, AWS, TensorFlow, Theano, Keras, MySQL, NLP, NLTK, Hadoop, Spark, Pandas, NumPy, Scikit-learn, ANN, BBN, K-Fold, matplotlib, seaborn, ggplot, pygal

Confidential, Coralville, Iowa

Data Analyst

Responsibilities:

  • The job involved collecting data from various data sources and pumping it through Informatica workflows into the data warehouse. This project also involved data correction and business logic implementation using PL/SQL and other scripting languages such as Shell scripting.
  • Understood and articulated business requirements from user interviews, then converted requirements into technical specifications. Effectively communicated with the SMEs to gather the requirements.
  • Worked on Regression in performing Safety Stock and Inventory Analysis using R and performed data visualizations using Tableau and R.
  • Established scalable, efficient, automated processes for large-scale data analysis, model development, model validation, and model implementation.
  • Used Tableau to visualize data from a given dataset. Used R for statistical modeling and performed data transformations before visualizing the data in Tableau. Wrote R scripts and connected them with Tableau using an external ODBC connection.
  • Designed different types of STAR schemas for detailed data marts and plan data marts in the OLAP environment.
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Involved wif Data Analysis Primarily Identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
  • Well experienced in Normalization and Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Performed Decision Tree analysis and Random Forests for strategic planning and forecasting, and manipulated and cleaned data using the dplyr and tidyr packages in R.
  • Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
  • Created histograms, bar charts, and frequency plots, and computed three measures of central tendency: the mean, median, and mode.
  • Wrote, executed, and performance-tuned SQL queries for Data Analysis & Profiling, and wrote complex SQL queries using joins, subqueries, and correlated subqueries.
  • Worked wif the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, Flat files, CSV files into SQL Server.
  • Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
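As a quick illustration of the three central-tendency measures computed for the distribution plots above, using only Python's standard library (sample values are made up):

```python
# Quick illustration of the three central-tendency measures computed for
# the distribution plots, using only the standard library (values made up).
import statistics

order_quantities = [2, 3, 3, 5, 7, 3, 8, 5]

mean = statistics.mean(order_quantities)      # arithmetic average
median = statistics.median(order_quantities)  # middle value when sorted
mode = statistics.mode(order_quantities)      # most frequent value
```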

Environment: SQL, Python, R, Tableau, STAR schemas, Logistic Regression, Decision trees, KNN, Naive Bayes, SQL Server, Rational Rose, UML and Microsoft Visio