Data Scientist Resume



  • Over 6 years of experience in Data Analysis, Machine Learning, Data Mining with large sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Classification, Data Visualization, and discovering meaningful business insights.
  • Proficient in applying statistical modelling and machine learning techniques (Linear Regression, Logistic Regression, Ridge Regression, K-Nearest Neighbors, Decision Trees, Bagging, Boosting, Random Forest, Support Vector Machine, Bayesian methods, Gradient Boosting, XGBoost, Neural Networks, Clustering) in Predictive Analytics, Segmentation Methodologies, Regression-Based Models, Factor Analysis, PCA, and Ensembles.
  • Excellent knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification, Data Visualization, and identifying data mismatches.
  • Experience working in Azure Cloud, including Azure Data Lake Gen2 for data storage, Azure Data Factory, Azure DevOps, and Azure Databricks.
  • Experience building Docker images and containers to deploy web applications.
  • Experience designing visualizations in Tableau and Power BI, and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experience using Agile methodology for team-based project development.
  • Expertise in relational databases such as Oracle and MySQL.
  • Skilled in writing SQL queries for RDBMS such as Microsoft SQL Server, MySQL, and Oracle, and in using NoSQL databases such as MongoDB to handle unstructured data.
  • Experience using advanced Excel, VBA scripts, and MS Access to extract and analyze data based on business needs.
  • Works effectively in a fast-paced, multitasking environment, both independently and as part of a collaborative team. Comfortable with challenging projects and with working through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.


Programming Languages: Python, SQL, PySpark

Query Languages: Spark SQL, MySQL, Microsoft SQL

Machine Learning: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, Matplotlib, ggplot2, Scrapy, Seaborn, Statsmodels, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, PCA

Visualization Tools: Tableau, Power BI, Python - Matplotlib, Seaborn, Plotly

Databases: Azure Data Lake Gen2, MySQL, Oracle, IBM DB2

IDE Tools: Databricks, Visual Studio, Jupyter Notebook

Deployment Tools: Azure DevOps, Anaconda Enterprise, GitHub

SDLC Methodologies: Agile, Scrum, Waterfall

Project Management: JIRA, Azure DevOps Boards

Cloud: Azure Cloud, AWS Cloud


Confidential, WI

Data Scientist


  • Responsible for building an Azure Cloud Enterprise Data Platform, including establishing connections between Azure resources (ADF, Databricks, ADLS Gen2, and storage-layer access for ADF).
  • Implemented Azure DevOps for Continuous Integration (CI) and Continuous Deployment (CD).
  • Used Advanced Analytics to gain more insight into the Stator Manufacturing Process.
  • Extracted data from different data sources, including IIoT data, metadata, and test data.
  • Worked closely with the business team to examine different processes in depth, understand the business, and draw insights from the data.
  • Worked with semi-structured (JSON) data and transformed it into a structured format as part of data preprocessing.
  • Identified process variations in the manufacturing process, reducing failures and increasing business value.
  • Applied machine learning algorithms to identify threshold values for machines.
  • Developed predictive analytics using PySpark and Spark SQL on Databricks to extract, transform, and uncover insights from the raw data.
  • Applied recent machine learning tools (LightGBM, PyCaret) to identify meaningful patterns and for predictive modelling.
  • Responsible for sizing, monitoring, and troubleshooting the Spark cluster on Databricks.
  • Involved in data ingestion into Azure Data Lake and Azure Databricks by building pipelines in Azure Data Factory.
  • Used Tableau and Python libraries (Matplotlib, Seaborn, Plotly) for data visualization.
  • Presented the findings to stakeholders and created technical documentation.
  • Worked with large data sets, including ingestion, extraction, transformation, and modelling, using Python and Spark.

Environment: Python 3.6.4, PySpark, Spark SQL, Databricks, Azure Data Lake Gen2, Azure DevOps, Azure Data Factory, MLlib, Regression, SQL Server, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn

Confidential, Dallas, TX

Data Scientist


  • Served as a bridge between technologists and business stakeholders to drive innovation from conception to production.
  • Contributed to building and managing the analytics infrastructure for Advanced Research.
  • Improved data mining processes that derive insights from company data for use in developing marketing strategies.
  • Improved the data cleansing and mining process using Python, resulting in a 50% time reduction.
  • Collaborated on data gathering from various sources, data preparation, data visualization, and data reporting to support analysis of large and complex data sets and identify trends, patterns, and relationships.
  • Performed data pre-processing tasks such as merging, sorting, outlier detection, missing-value imputation, and data normalization to prepare data for further analysis.
  • Applied feature selection and feature extraction, including dimensionality reduction techniques such as PCA and LDA.
  • Implemented predictive analytics and machine learning algorithms to forecast key metrics, presented in custom dashboards.
  • Analyzed data using SQL, PySpark, and Python, and presented analytical reports to management and technical teams.
  • Achieved 90% monthly customer retention by predicting the likelihood of returning customers using Random Forest and XGBoost algorithms.
  • Used SVM, Random Forest, KNN, XGBoost, and Logistic Regression models for data modelling.
  • Implemented various machine learning algorithms on very large data sets in PySpark using MLlib.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using Python, Tableau, and Confidential Studio.

Environment: Python 3.6.4, Confidential Studio, MLlib, Regression, SQL Server, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn


Data Scientist


  • Served as a bridge between technologists and business stakeholders to drive innovation from conception to production.
  • Performed market analysis to efficiently achieve objectives, increasing portfolio and customer base by approximately 17% and 6% respectively.
  • Worked with data sets of varying size and complexity, including both structured and unstructured data.
  • Performed extensive data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Implemented data acquisition and manipulation using SQL.
  • Implemented financial time series analysis techniques in Python, including ARIMA, GARCH, Exponential Smoothing, and Markov chains.
  • Implemented fraud detection techniques in Python on highly imbalanced datasets.
  • Extracted transaction data (1 million+) for all 11 major territories using PySpark and analyzed it with scikit-learn/MLlib to forecast the higher-revenue areas with 95% accuracy.
  • Identified trends and insights, and optimized spend and performance based on those insights.
  • Used classification models such as logistic regression, decision trees, boosted trees, and random forest, and tuned them using grid search with K-fold cross-validation.
  • Deployed various machine learning models and updated them quarterly with new improvements.
  • Raised revenue by 10% by reducing false positives and false negatives through bagging and boosting algorithms.
  • Performed data pre-processing tasks such as merging, sorting, outlier detection, missing-value imputation, and data normalization to prepare data for further analysis.
  • Evaluated model performance using K-fold cross-validation, generated ROC and PR curves for comparison, and analyzed feature importance to identify the top factors influencing prediction results.

Environment: Python, Oracle, Tableau, MS Excel, Server Services, SQL, Microsoft Test Manager, MS Office Suite, Spark


Data Analyst


  • Identified, formulated, and documented detailed business rules and use cases based on requirements analysis.
  • Served as a bridge between technologists and business stakeholders to drive innovation from conception to production.
  • Facilitated development, testing, and maintenance of quality guidelines and procedures along with necessary documentation.
  • Collaborated with team members to collect and analyze data.
  • Led data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and storing results as normalized tables for dashboards.
  • Involved in database design and optimization on SQL Server.
  • Performed data visualization and designed dashboards with Tableau, and provided complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
  • Improved data cleansing and mining processes using SQL, resulting in a 50% time reduction.
  • Analyzed data using SQL and Python, and presented analytical reports to management and technical teams.
  • Studied and analyzed the HTML and CSS scripts of the web pages.
  • Created presentations and reports based on recommendations and findings.
  • Cleaned text by removing punctuation and stop words and lemmatizing.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.

Environment: Python, Linux, Tableau, JIRA
