Data Scientist Resume

Boston, MA

SUMMARY:

  • Data scientist with 5 years of experience in transforming business requirements into actionable data models, prediction models and informative reporting solutions.
  • Experience in developing business solutions and generating data-driven ideas across different industries and platforms.
  • Experience in Data Mining with large datasets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive Modeling and Data Visualization.
  • Expert in the entire Data Science process life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation and Visualization.
  • Strong knowledge in Statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA), Sampling Distributions and Time Series Analysis.
  • Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib and Seaborn.
  • Expert in preprocessing data in Pandas using visualization, data cleaning and engineering methods such as checking correlations, imputation, scaling and handling categorical variables.
  • Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.
  • Experience in building various machine learning models using algorithms such as Linear Regression, Gradient Descent, Support Vector Machines (SVM), Logistic Regression, KNN, Decision Tree, and Ensembles such as Random Forest, AdaBoost and Gradient Boosting Trees.
  • Experience in working with Hadoop Big Data tools such as HDFS, Hive, Pig Latin and Spark.
  • Experience in using Amazon Web Services (AWS) cloud services such as EC2 and S3 to work with different virtual machines.
  • Experience in Apache Spark and Kafka for Big Data processing and in Scala functional programming.
  • Experience in manipulating large data sets with R packages like tidyr, tidyverse, dplyr, reshape, lubridate and caret, and visualizing the data using the lattice and ggplot2 packages.
  • Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems), (iii) probability & statistics, experiment analysis, confidence intervals, A/B testing, (iv) algorithms and data structures.
  • Extensive knowledge on Azure Data Lake and Azure Storage.
  • Experience in tuning algorithms using methods such as Grid Search, Randomized Search, K-Fold Cross Validation and Error Analysis (a brief Scikit-learn sketch follows this list).
  • Experience in Unsupervised learning working on social network datasets using K-means Clustering and Dimension Reduction methods.
  • Expertise in writing and optimizing HiveQL queries.
  • Experience in building machine learning solutions using PySpark for large data sets on the Hadoop ecosystem.
  • Experience in Natural Language Processing (NLP) and in Time Series Analysis and Forecasting using the ARIMA model in Python and R (see the forecasting sketch after this list).
  • Prototyped, conducted, and reported on data science experiments (to technical and non-technical audiences) using supervised, semi-supervised and unsupervised learning techniques: anomaly detection, named entity recognition, ontology creation, etc.
  • Experience in building and publishing interactive Tableau reports and dashboards with design customizations based on stakeholders' needs.
  • Deep knowledge of SQL for writing Queries, Stored Procedures, User-Defined Functions, Views, Triggers and Indexes.
  • Experience in developing and designing ETL packages and reporting solutions using MS BI Suite (SSIS/SSRS) and Tableau.
  • Knowledge and experience in Agile environments such as Scrum, using project management tools like Jira/Confluence and version control tools such as GitHub/Git.
  • Quick learner in new business domains and software environments, delivering solutions adapted to new requirements and challenges.
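
Below is a minimal, self-contained sketch of the preprocessing-and-tuning workflow the bullets above refer to: imputation, scaling and category handling composed into a Scikit-learn Pipeline, then tuned by grid search with k-fold cross-validation. The toy DataFrame, column names and parameter grid are hypothetical placeholders, not details from any engagement listed here.

    # Hedged sketch: preprocessing pipeline + grid-search tuning with k-fold CV.
    # All data, column names and the search space are illustrative assumptions.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical toy data; a real project would load survey or sales extracts.
    df = pd.DataFrame({
        "age":     [34, None, 29, 41, 52, 38, 27, 45],
        "income":  [52e3, 61e3, None, 58e3, 75e3, 49e3, 43e3, 67e3],
        "segment": ["a", "b", "a", "b", "a", "b", "a", "b"],
        "target":  [0, 1, 0, 1, 1, 0, 0, 1],
    })

    # Impute and scale numeric columns; one-hot encode the categorical column.
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
    ])
    model = Pipeline([("prep", preprocess),
                      ("clf", LogisticRegression(max_iter=1000))])

    param_grid = {"clf__C": [0.1, 1.0, 10.0]}       # assumed search space
    search = GridSearchCV(model, param_grid, cv=2)  # cv=2 only because the toy set is tiny
    search.fit(df.drop(columns="target"), df["target"])
    print(search.best_params_, search.best_score_)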
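
In the same spirit, a brief sketch of ARIMA-based forecasting with the statsmodels API, as referenced in the time series bullet above; the series and the (p, d, q) order are synthetic assumptions chosen only so the example runs.

    # Hedged sketch: ARIMA forecasting (statsmodels); data and order are assumed.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic monthly series standing in for real demand or revenue data.
    idx = pd.date_range("2015-01-01", periods=48, freq="MS")
    y = pd.Series(np.random.default_rng(0).normal(100, 5, 48).cumsum(), index=idx)

    fit = ARIMA(y, order=(1, 1, 1)).fit()  # assumed order; validate with ACF/PACF or AIC
    print(fit.forecast(steps=12))          # 12-step-ahead forecast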

TECHNICAL SKILLS:

Languages and Libraries: Java 8; Python (NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Rpy2); R (ggplot2, caret, dplyr, purrr, readxl, tidyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, reshape2, rjson, plyr)

Machine Learning Algorithms: Kernel Density Estimation and Non-parametric Bayes Classifier, K-Means, Linear Regression, Neighbors (Nearest, Farthest, Range, k, Classification), Non-Negative Matrix Factorization, Dimensionality Reduction, Decision Tree, Gaussian Processes, Logistic Regression, Naïve Bayes, Random Forest, Ridge Regression, Matrix Factorization/SVD

NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Stanford NLP, Sentiment Analysis, SVMs, ANN, RNN, CNN, TensorFlow, MXNet, Caffe, H2O, Keras, PyTorch, Theano, Azure ML

Cloud: Google Cloud Platform, AWS, Azure, Bluemix

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0

ETL Tools: Informatica PowerCenter, SSIS

Version Control Tools: SVN, Git, GitHub

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Operating Systems: Windows, Linux, Unix, macOS, Red Hat

PROFESSIONAL EXPERIENCE:

Confidential - Boston, MA

Data Scientist

Responsibilities:

  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization to deliver data science solutions.
  • Built a linear model and visualized customer survey data from Confidential to explore the correlation between customer NPS and the features of each hotel.
  • Developed a machine learning model to predict the likelihood to recommend a given hotel.
  • Performed hotel segmentation based on the purpose of visit of the majority of customers and provided suggestions for hotels to improve their services.
  • Collected the data required for building models using Hadoop tools such as Hive and Pig Latin.
  • Extensively used Python data science packages including Pandas, NumPy, Matplotlib, Seaborn, SciPy, Scikit-learn and NLTK.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
  • Worked on Amazon Web Services cloud virtual machines to do machine learning on big data.
  • Used Pandas, NumPy, Seaborn, Matplotlib and Scikit-learn in Python to develop various machine learning models, utilizing algorithms such as Linear Regression, Logistic Regression, Gradient Boosting, SVM and KNN.
  • Used cross-validation to test the models with different batches of data, optimizing the models and preventing overfitting (illustrated in the sketch after this section).
  • Implemented machine learning models (Logistic Regression, XGBoost, SVM) with Python Scikit-learn.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Created transformation pipelines for preprocessing large amounts of data with methods such as imputation, scaling and feature selection.
  • Worked with the NLTK library for NLP data processing and pattern finding.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree and SVM to classify whether a package will be delivered on time on a new route.
  • Implemented different models like Logistic Regression, Random Forest and Gradient-Boosted Trees to predict whether a given die will pass or fail the test.
  • Developed MapReduce pipelines for feature extraction using Hive and Pig.
  • Created and published multiple dashboards and reports using Tableau Server.
  • Worked on text analytics, Naive Bayes, sentiment analysis, word cloud creation and data retrieval from Twitter and other social networking platforms.
  • Performed ensemble methods to increase the accuracy of the training model with different bagging and boosting methods.

Environment: Hadoop 2.x, HDFS, Hive, Pig Latin, PySpark, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib), Jupyter, GitHub, Linux
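
As a companion to the modeling and cross-validation bullets above, here is a minimal sketch of comparing candidate classifiers with 5-fold cross-validation in Scikit-learn. The dataset is synthetic and the model choices and settings are illustrative assumptions, not the actual project configuration.

    # Hedged sketch: model comparison via 5-fold cross-validation (Scikit-learn).
    # Synthetic data stands in for the hotel survey features used on the project.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(),
        "svm": SVC(),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)  # 5 folds guard against overfitting
        print(f"{name}: mean accuracy {scores.mean():.3f}")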

Confidential - Iselin, PA

Data Scientist

Responsibilities:

  • Used IBM Watson Studio to analyze data with RStudio, Jupyter and Python in a configured, collaborative environment that includes IBM value-adds such as managed Spark.
  • Used Jupyter Notebooks to create and share documents that contain live code, equations, visualizations and explanatory text.
  • Implemented text mining concepts, graph processing, and semi-structured and unstructured data processing.
  • Worked with Ajax API calls to communicate with Hadoop through an Impala connection, using SQL to render the required data.
  • Used Elasticsearch to retrieve data into the application as required.
  • Ran MapReduce programs on the cluster.
  • Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark).
  • Analyzed partitioned and bucketed data and computed various metrics for reporting (see the PySpark sketch after this section).
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to a database.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications to improve robustness.
  • Used Hive to partition and bucket data.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Created HBase tables to store various data formats of data coming from different portfolios.
  • Worked on improving performance of existing Pig and Hive Queries.

Environment: SQL Server, Oracle 9i, MS Office, Keras, TensorFlow, IBM Watson Studio, Jupyter Notebook, RStudio, Teradata 14.1, XML, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, Hive, AWS.
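
A minimal sketch of the partitioned-data metric computation described above, expressed in PySpark with Hive support; the table name, partition column and metrics are hypothetical assumptions.

    # Hedged sketch: reporting metrics over partitioned Hive data with PySpark.
    # Table, columns and the date filter below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("banner-metrics")
             .enableHiveSupport()   # requires a configured Hive metastore
             .getOrCreate())

    df = spark.table("sales.transactions")            # hypothetical Hive table
    metrics = (df.where(F.col("dt") >= "2018-01-01")  # prune on the partition column
                 .groupBy("banner")
                 .agg(F.count("*").alias("orders"),
                      F.sum("amount").alias("revenue")))
    metrics.show()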

Confidential - New York, NY

Data Analyst/ Data Scientist

Responsibilities:

  • Worked with data scientists and the research team to gain valuable insights.
  • Worked on data cleaning and ensured data quality, consistency and integrity using Pandas and NumPy (a brief sketch follows this section).
  • Performed data imputation using Scikit-learn package in Python.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Worked with Amazon EC2 based cloud-hosted architecture systems to provide solutions for clients.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Used the AWS environment to load data files from cloud servers.
  • Collaborated with business leaders to analyze problems, optimize processes and build presentation dashboards.
  • Merged data into the AWS environment so that several teams could access the data from different locations, saving time and increasing security.
  • Programmed R and Python scripts and modules for data collection, cleaning, analysis and visualization.
  • Updated legacy data systems to convert hard copies to searchable online database format.

Environment: Apache Hadoop, Linux, SQL, Tableau, Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), AWS.
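
To illustrate the cleaning and imputation work referenced in this section, here is a minimal Pandas/Scikit-learn sketch; the DataFrame contents are fabricated placeholders.

    # Hedged sketch: data cleaning and mean imputation with Pandas/Scikit-learn.
    # The values below are made up purely for illustration.
    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    df = pd.DataFrame({"age":    [34, np.nan, 29, 41, 41],
                       "income": [52000, 61000, np.nan, 58000, 58000]})

    df = df.drop_duplicates()                 # enforce consistency/integrity
    imputer = SimpleImputer(strategy="mean")  # fill missing values with column means
    df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])
    print(df)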
