Data Scientist Resume

San Francisco, CA


  • Data Scientist with over 10 years’ experience in Data Extraction, Data Modelling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning and Data Visualization.
  • Domain knowledge and experience in the Retail, Banking and Manufacturing industries.
  • Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.
  • Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models, Decision Trees, Random Forests, Sentiment Analysis, Naïve Bayes Classifier, SVM and Ensemble Models.
  • Proficient in Statistical Methodologies including Natural Language Processing, Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis and Discriminant Analysis.
  • Knowledge of time series analysis using AR, MA, ARIMA, ARCH and GARCH models.
  • Worked in large-scale data environments such as Hadoop and MapReduce, with knowledge of the working mechanism of Hadoop clusters, nodes and the Hadoop Distributed File System (HDFS).
  • Strong experience with Python (2.x, 3.x) to develop analytic models and solutions.
  • Proficient in Python 2.x/3.x with SciPy Stack packages including NumPy, Pandas, SciPy, Matplotlib and IPython.
  • Working experience with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, SparkSQL and PySpark.
  • Experience in provisioning virtual clusters on the AWS cloud, using services such as EC2, S3 and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib and R Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Experienced Tableau developer with expertise in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau (9.x/10.x).
  • Experienced in Agile methodology and SCRUM process.
  • Strong business sense and ability to communicate data insights to both technical and nontechnical clients.


Databases: MySQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008.

Statistical Methods: Hypothesis testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation

Machine Learning: Regression analysis, Bayesian Method, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-means Clustering, KNN and Ensemble Method

Reporting Tools: Tableau Suite of Tools 10.x, 9.x, 8.x (Desktop, Server and Online), SQL Server Reporting Services (SSRS)

Data Visualization Tools: Tableau, Matplotlib, Seaborn, ggplot2, JavaScript Libraries - D3, React, Node, Angular

Languages: Python (2.x/3.x), R, SAS, Excel, SQL, T-SQL

Process Modeling Tools: BPMN, DMN - Signavio, Visio

Operating Systems: Windows 98/NT/2000/2003/XP/7/8/10, Linux/Unix


Data Scientist

Confidential, San Francisco, CA


  • Led the full machine learning system implementation process: data collection, model design, feature selection, system implementation, and evaluation.
  • Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
  • Developed a Machine Learning test-bed with different model learning and feature learning algorithms.
  • Through a thorough systematic search, demonstrated performance surpassing the state of the art (deep learning).
  • Used Text Mining and NLP techniques to find the sentiment about the organization.
  • Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
  • Used K-means clustering to identify outliers and to group unlabeled data.
  • Worked with datasets of varying size and complexity, including both structured and unstructured data.
  • Participated in all phases of data mining, including data collection, data cleaning, model development, validation and visualization, and performed gap analysis.
  • Used the R programming language to explore the datasets graphically and gain insight into the nature of the data.
  • Implemented predictive analytics and machine learning algorithms to forecast key metrics for the company's core business, delivered as dashboards on AWS (S3/EC2) and a Django platform.
  • Worked with AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
  • Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
  • Extensively used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy and NLTK in Python for developing various machine learning algorithms.
  • Built multi-layer Neural Networks to implement Deep Learning using TensorFlow and Keras.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means and KNN for data analysis.
  • Researched customer behavior extensively and designed multiple models to fit the client's needs; performed extensive behavioral modeling and customer segmentation using K-means clustering to discover customer behavior patterns.
  • Supervised the use of open-source tools such as R Studio (R) for statistical analysis and building machine learning models.
  • Established a robust machine learning process (MLbase) to ensure the quality of the predictive analytics and of all algorithms and processes.
  • Ensured operational and optimal execution of production data science routines and processes.
  • Implemented supervised learning algorithms such as Neural networks, SVM, Decision trees and Naïve Bayes for advanced text analytics.
  • Performed data wrangling to clean, transform and reshape the data using the NumPy and Pandas libraries.
  • Contributed to data mining architectures, modeling standards, reporting, and data analysis methodologies.
  • Conducted research and made recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
  • Involved in defining the Source to Target data mappings, Business rules, data definitions.
  • Worked with different data science teams and provided data as required on an ad-hoc request basis.
  • Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.

Environment: R Studio 3.5.1, AWS S3, NLP, EC2, Neural networks, SVM, Decision trees, MLbase, ad-hoc, Mahout, NoSQL, PL/SQL, MDM

Data Scientist

Confidential, San Francisco, CA


  • Responsible for applying machine learning techniques (regression and classification) to predict outcomes.
  • Responsible for design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modeling.
  • Designed and automated the process of score cuts that achieve increased close and good rates using advanced R programming.
  • Managed datasets using Pandas DataFrames and MySQL; queried the MySQL relational database (RDBMS) from Python using the Python-MySQL connector.
  • Utilized standard Python modules such as csv, itertools and pickle for development.
  • Analyzed large datasets to answer business questions by generating reports and outcomes.
  • Worked in a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
  • Executed SQL queries from R/Python on complex table configurations.
  • Retrieved data from the database through SQL as per business requirements.
  • Created, maintained, modified and optimized SQL Server databases.
  • Manipulated data using Python.
  • Adhered to best practices for project support and documentation.
  • Understood the business problem, built hypotheses and validated them using the data.
  • Managed reporting and dashboarding for the key metrics of the business.
  • Involved in data analysis using different analytic and modeling techniques.

Environment: R, Python, MySQL, exploratory analysis, feature engineering, Machine Learning, Python (NumPy, SciPy, pandas, scikit-learn, NLTK, NLP), Tableau.

Data Scientist

Confidential, San Francisco, CA


  • Worked closely with data scientists to assist with feature engineering, model frameworks, and model deployments, implementing documentation discipline.
  • Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
  • Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
  • Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
  • Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
  • Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
  • Created or modified T-SQL queries as per the business requirements and worked on creating role-playing dimensions, factless fact tables, and snowflake and star schemas.
  • Wrote, executed, performance tuned SQL Queries for Data Analysis & Profiling and wrote complex SQL queries using joins, sub queries and correlated sub queries.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XIR2.
  • Developed Data Mapping, Transformation and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS and OLAP.
  • Performed Decision Tree Analysis and Random Forests for strategic planning and forecasting, and manipulated and cleaned data using the dplyr and tidyr packages in R.
  • Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
  • Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
  • Prepared data analysis reports weekly, biweekly and monthly using MS Excel, SQL and Unix.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, logistic regression and Gradient Boosting Machine to build predictive models using the scikit-learn package in Python.

Environment: Python 2.7, T-SQL, SSIS, SSRS, SQL, PL/SQL, OLTP, Oracle, MS Access 2007, MS Excel, XML, Microsoft Visio, UML, OLAP, Unix
