
Data Scientist Resume

Pittsburgh, PA


  • Over 10 years of IT and Data Scientist experience, specializing in implementing advanced Machine Learning and Natural Language Processing algorithms on data from diverse domains and building highly efficient models that derive actionable insights for business, leveraging exploratory data analysis, feature engineering, statistical modeling, and predictive analytics.
  • Experience in machine learning, data mining, structured and unstructured data analysis, and image data analysis, including feature extraction, pattern recognition, algorithm development, text mining, computer simulation, data modeling, database design, model evaluation, and deployment.
  • Experience with Statistical Analysis, Data Mining and Machine Learning Skills using R, Python and SQL.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, regression analysis, hypothesis testing, decision trees, machine learning), business rules, and an ever-evolving regulatory environment.
  • Experience in building intuitive products and experiences for millions, while working alongside an excellent, cross-functional team across Engineering, Product, and Design.
  • Expert in transforming business requirements into analytical models and designing algorithms.
  • Proficient in developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data to improve business performance in every aspect.
  • Equipped with experience in statistical techniques including correlation, hypothesis testing, and inferential statistics, as well as data mining and modeling techniques.
  • Experience working with supervised machine learning algorithms - Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Tree, Random Forest, Support Vector Machines (SVM), Naive Bayes, K-Nearest Neighbors.
  • Experience working with unsupervised machine learning algorithms - Hierarchical Clustering, K-Means Clustering, Density-Based Clustering (DBSCAN).
  • Expert-level mathematical knowledge of Linear Algebra, Probability, Statistics, Stochastic Processes, Information Theory, and Algorithms.
  • Strong experience with Python and its libraries pandas, NumPy, scikit-learn, Seaborn, and Matplotlib, as well as R, for algorithm development, data manipulation, analysis, and visualization.
  • Proficient in writing complex SQL queries, stored procedures, triggers, joins, and subqueries to access and manipulate database systems such as MySQL and PostgreSQL.
  • Expertise in Cost Benefit Analysis, Feasibility Analysis, Impact Analysis, Gap Analysis, SWOT analysis and ROI analysis, SCRUM, leading JAD sessions and Dashboard Reporting using tools like Tableau & Power BI.
  • Experience in SAS/STAT, STATA, R, SQL, Tableau, Python, and MS Excel (VLOOKUP, pivot tables, macros).
  • Experience in using Tableau for data visualization and designing dashboards for publishing and presenting storyline on web and desktop platforms.
  • Experience with deploying systems on Amazon Web Services (AWS).
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, and Spark.
  • Proficient in the entire project life cycle, actively involved in all phases including data acquisition, cleaning, feature scaling, feature engineering, statistical modeling, and visualization.
  • Experienced working with different file formats such as JSON, CSV, and XML in Anaconda Navigator, Jupyter Notebook, Visual Studio Code, and Spyder. Experienced in using Git and GitHub for source code management.
  • Excellent team player and self-starter with good communication skills.


Programming & Scripting Languages: Python, R, C, C++

Big Data Ecosystem: Apache Hadoop ecosystem, Apache Spark, HDFS, MapReduce, Apache Kafka, Hive, Pig, ETL

Libraries: Python (NumPy, pandas, scikit-learn, SciPy, Matplotlib), Spark ML, Spark MLlib

Databases: MySQL, PostgreSQL, Oracle, Teradata

NoSQL: Cassandra, MongoDB

BI and Visualization: SAS, Tableau, Power BI, RShiny

IDE: Jupyter, Zeppelin, PyCharm, Eclipse

Cloud Based Tools: Microsoft Azure, Google Cloud Platform

Machine Learning: Principal Component Analysis, Association Rules, K-means, Hierarchical clustering, Market Basket Analysis, Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Random Forest, K-Nearest Neighbor, Naive Bayes, Support Vector Machines, Gradient Boosting, Bayesian models, Ensemble Methods, Regularization (L1, L2)

Statistics: T-tests, Chi-square analysis, Correlation tests, A/B testing, Normality tests, Residual diagnostics, ANOVA, ARIMA, Holt-Winters, Exponential smoothing, Bayesian structural time series


Confidential, Pittsburgh, PA

Data Scientist


  • Perform analytical tasks on structured and unstructured data to extract business insights and build data transformation modules.
  • Conduct data acquisition and data preparation by pulling data from various sources to create modeling dataset for predictive models.
  • Analyze data and build machine learning models using Python libraries such as NumPy, pandas, and scikit-learn.
  • Conduct exploratory data analysis and visualizations using the Matplotlib and Seaborn libraries.
  • Recommend application and process improvements using root cause analysis, data mining, and best practices if models are not scoring.
  • Identify, analyze and interpret patterns in datasets. Engineer features by analyzing existing data and building new data features to improve the accuracy of machine learning models.
  • Responsible for parallel testing of predictive models with other automated models.
  • Build dashboards and present results to the business teams using Tableau for reporting.
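
The modeling workflow described above (pull data, explore it, fit a scikit-learn model) could be sketched roughly as follows; the dataset, column names, and target here are purely illustrative, not taken from the actual engagement:

```python
# Hypothetical sketch of the EDA-plus-modeling workflow: build a
# modeling dataset, summarize it, and fit a baseline classifier.
# All data and column names below are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, 500),
    "monthly_spend": rng.normal(70, 20, 500),
})
# Synthetic target loosely tied to the features
df["churned"] = ((df["monthly_spend"] > 80) & (df["tenure_months"] < 12)).astype(int)

# Quick exploratory summary before modeling
print(df.describe())

X_train, X_test, y_train, y_test = train_test_split(
    df[["tenure_months", "monthly_spend"]], df["churned"],
    test_size=0.25, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

In practice the exploratory step would also include Matplotlib/Seaborn plots of distributions and correlations before committing to a model family.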

Confidential, Franklin Lakes, NJ

Data Scientist


  • Design data visualization and data transformation modules and support enterprise data science models and the Pulse platform.
  • Apply in-depth knowledge and experience to manage data science applications and liaise with partners in IT and Infrastructure team to develop new approaches and systems for efficiently implementing Data Science solutions.
  • Help debug complex system issues and rollout fixes.
  • Perform data analysis and assist in data reporting as needed.
  • Recommend application and/or process improvements using root cause analysis, data mining, and best practices if models are not scoring.
  • Assist in data acquisition and data preparation by pulling data from various sources to create modeling dataset for predictive models or text mining.
  • Help the Senior Data Scientist schedule and manage scoring jobs on Unix.
  • Assist in parallel testing of predictive models with other automated models.
  • Translate business questions into research objectives, design and conduct analyses, and develop findings and synthesize recommendations to deliver valuable, relevant, and actionable insights.
  • Identify, analyze and interpret patterns in complex datasets. Engineer features by analyzing existing data and building new data features to improve the accuracy of machine learning models.
  • Collaborate with a multi-disciplinary team of engineers, data analysts, product managers and marketing, to leverage data and to facilitate the use of research.
  • Created data science and machine learning products in the supply-chain and inventory space.
  • Leveraged large-scale data processing tools such as Spark and Hive in data science products.
  • Planned and implemented supply chain optimization projects (e.g., warehouse slotting, route planning).
  • Conducted exploratory analysis and feature engineering to fit the best models using scikit-learn.
  • Used Python libraries such as NumPy, pandas, and scikit-learn to analyze the data and build machine learning models.
  • Used the Matplotlib and Seaborn libraries for exploratory data analysis and visualizations.
  • Used various cross validation techniques to tune the hyperparameters.
  • Involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
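
The cross-validation-based hyperparameter tuning mentioned above could look something like the sketch below, using scikit-learn's GridSearchCV on a synthetic dataset; the estimator choice and parameter grid are illustrative assumptions, not the actual production setup:

```python
# Hypothetical sketch of cross-validated hyperparameter tuning.
# GridSearchCV refits the model for every grid point on each of the
# cv folds and keeps the combination with the best mean score.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV accuracy: %.3f" % search.best_score_)
```

For larger grids, RandomizedSearchCV trades exhaustive coverage for a fixed sampling budget, which is the usual choice when tuning many hyperparameters at once.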

Confidential, Overland Park, KS

Data Engineer


  • Coordinate with analysts and business stakeholders to define data event structures that enable downstream analytics.
  • Cross functionally work with teams to design data structures to increase the efficiency of analytical processes providing data in easily accessible, usable formats for end users.
  • Involved in data cleaning by standardizing the data and treating missing values and outliers.
  • Performed exploratory data analysis to understand the distributions of the attributes and their relations and correlations.
  • Addressed overfitting and underfitting by tuning hyperparameters with L1 and L2 regularization.
  • Extensively used Hive, SQL queries to extract data from various sources.
  • Used PySpark and SparkSQL for data transformations and aggregations.
  • Collected various store attributes and added them into our segmentation model in order to better classify different segments using clustering algorithms.
  • Used Tableau to build the dashboards and presented the results to the business teams and client.
  • Experience with the AWS SDK for Python (Boto3).
  • Worked with Amazon S3 to store and retrieve any amount of data with improved reliability.
  • Familiar with Amazon EC2 for setting up cloud computing capacity, reducing the need to forecast traffic.
  • Created S3 resources and launched EC2 instances on Amazon Web Services.
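
The L1/L2 regularization approach to overfitting noted above can be sketched with scikit-learn's Lasso and Ridge estimators; the dataset and alpha values here are illustrative assumptions, not the tuned production values:

```python
# Hypothetical sketch of L1 (Lasso) vs. L2 (Ridge) regularization on a
# synthetic regression problem with many uninformative features.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# alpha controls penalty strength; in practice it is tuned by CV
for name, model in [("L1/Lasso", Lasso(alpha=1.0)),
                    ("L2/Ridge", Ridge(alpha=1.0))]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")

# The L1 penalty drives uninformative coefficients to exactly zero,
# while the L2 penalty only shrinks them toward zero.
lasso = Lasso(alpha=1.0).fit(X, y)
print("nonzero Lasso coefficients:", int(np.sum(lasso.coef_ != 0)))
```

The sparsity induced by L1 is why Lasso doubles as a feature-selection step, whereas Ridge is the usual choice when correlated features should all keep some weight.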
