Data Scientist Resume

Richmond, VA

SUMMARY

  • Twenty years of experience in data analytics, computational fluid dynamics, and engineering.
  • Data Scientist skilled in collecting, cleaning, and organizing data to build efficient predictive models using machine learning algorithms.
  • Strong in programming, advanced statistical data analysis, model-based engineering, predictive analytics, data engineering, data visualization, and data science technologies using Python.
  • Experienced in distributed and parallel computing using PySpark (Apache Spark), Microsoft Azure Databricks, and AWS Databricks.
  • Experienced in managing and analyzing data using Python (NumPy, SciPy, pandas, scikit-learn).
  • Developed and maintained structured and unstructured datasets for analysis and reporting.
  • Expert in using MySQL and SQLite3 to collect, clean, and summarize data.
  • Experienced with Natural Language Processing and deep learning frameworks.
  • Developed interactive business intelligence dashboards using Python (Plotly).
  • Followed standard solution test procedures to check for data vulnerabilities and open-source code bugs (PyChecker).
  • Experienced in Data Profiling, Data Quality, and concepts of Data Governance.
  • Proficient in normalization (1NF/2NF/3NF) and de-normalization techniques in relational and dimensional database environments.
  • Worked with random forest, logistic regression, KNN, k-means clustering, linear regression, decision trees, ANN, CNN, and RNN models.
  • Worked with grid search cross-validation, hyperparameter tuning, cross-validation, bootstrapping, bagging, gradient boosting, and ensemble techniques to improve predictive model performance.
  • Worked with noise reduction methods, exponential smoothing, univariate analysis, bivariate analysis, random sampling, and Fast Fourier Transform methods.
  • Experienced in using SAS Base programming to produce various reports, charts, and graphs.
  • Demonstrated experience in generating and communicating business insights to executive stakeholders.

TECHNICAL SKILLS

Data Science: ETL, Data Science pipeline (cleansing, wrangling, visualization, EDA, modeling, interpretation), Data enrichment, Feature engineering, Model verification & validation, Data Quality assessment (missing values & outliers)

Machine Learning: Supervised ML, Unsupervised ML, LDA, PCA, Time-series analysis, NLP, Regression methods, Cost function minimization methods, Cross-validation, Bootstrapping, Bagging, Gradient Boosting, and Ensemble techniques

Statistical Methods: A/B testing, ANOVA, Multiple linear regression, Hypothesis testing, Sampling, Random sampling, Normalization, De-normalization, Multivariate analysis, Chi-squared analysis

Cloud Computing: HPC, Microsoft Azure, AWS EC2, AWS S3 Storage, Large-scale distributed and parallel computing, Microsoft Azure Databricks, AWS Databricks, Apache Spark

Programming Languages: Python, Shell Scripting, C, C++, Visual Basic, HTML, MySQL, SQLite3, Scala, Jupyter notebooks, PyCharm, Excel VBA/Macro techniques

Tools: Tableau, Microsoft Power BI, Minitab, MATLAB, GitHub, Microsoft Office

Program Management & Reporting: JIRA documentation, Microsoft Project, Scientific writing, Executive summary and PowerPoint presentations, effective visualizations in Tableau and Power BI.

PROFESSIONAL EXPERIENCE

Confidential, Richmond, VA

Data Scientist

Responsibilities:

  • Performed time series sales forecasting for short shelf-life food products.
  • Forecast demand across multiple geographic regions for various food products using time series models.
  • Drafted solution plan, deployment plan, and testing plan.
  • Wrote and deployed code to support collaborative development and deployment.
  • Implemented ETL pipelines for transactional data extraction from legacy data sources.
  • Developed SARIMA and SARIMAX time series models.
  • Cross-validated models using MAE, MAPE, RMSE, and linear fold evaluation.
  • Developed Spark pipelines for data preprocessing and for data ingestion into the Databricks environment.
  • Presented time series plots to multiple business entities, including Walmart, Target, and Food Lion.
  • Implemented MLflow for model retraining and automated model deployment.
  • Communicated multiple use cases and forecasting outcomes to critical stakeholders.
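A minimal sketch of scoring a forecaster with expanding-window (rolling-origin) cross-validation on MAE, MAPE, and RMSE. It assumes daily sales with weekly seasonality, and it interprets "linear fold evaluation" as rolling-origin evaluation, which is itself an assumption. The `seasonal_naive` forecaster is a hypothetical stand-in for the SARIMA/SARIMAX models, which in Python would typically come from statsmodels' `SARIMAX` class. Any function with the same `(history, horizon)` signature can be plugged in.

```python
import math
from typing import Callable, Dict, List, Sequence

# A forecaster takes the training history and a horizon, and returns `horizon` predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(history: Sequence[float], horizon: int, season: int) -> List[float]:
    """Hypothetical stand-in forecaster: repeat the last observed season."""
    last = list(history[-season:])
    return [last[i % season] for i in range(horizon)]


def rolling_origin_cv(series: Sequence[float], forecaster: Forecaster,
                      initial: int, horizon: int, step: int) -> Dict[str, float]:
    """Train on series[:origin], score the next `horizon` points, then move the origin forward."""
    folds = []
    origin = initial
    while origin + horizon <= len(series):
        train = series[:origin]
        test = series[origin:origin + horizon]
        pred = forecaster(train, horizon)
        folds.append({"MAE": mae(test, pred), "MAPE": mape(test, pred), "RMSE": rmse(test, pred)})
        origin += step
    if not folds:
        raise ValueError("series too short for the requested initial window and horizon")
    # Average each metric over all folds.
    return {name: sum(f[name] for f in folds) / len(folds) for name in folds[0]}


# Toy, perfectly periodic daily series (8 weeks), so every metric comes out as 0.0.
sales = [10, 12, 14, 11, 9, 20, 25] * 8
scores = rolling_origin_cv(sales, lambda h, n: seasonal_naive(h, n, season=7),
                           initial=28, horizon=7, step=7)
print(scores)
```

The expanding window ensures that each fold only trains on data that precedes its test period, which is what makes the scores meaningful for forecasting. An ordinary shuffled k-fold split would leak future observations into training.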

Confidential, Clarksville, TN

Project Design Team Leader

Responsibilities:

  • Wrangled 3 TB of data to identify target variables and potentially valuable features using pandas, NumPy, scikit-learn, and related libraries.
  • Scaled the data and reduced its dimensionality using principal component analysis (PCA); the first two components explained over 75% of the variance and the first four over 95%.
  • Performed feature engineering and geography-based data enrichment, and used heatmaps to understand complex relationships.
  • Assessed model performance on train and test splits using cross-validation and scikit-learn metrics (R-squared, mean absolute error, and mean squared error).
  • Applied linear classifiers, decision trees, and random forests for classification, selecting models based on accuracy and requirements.
  • Provided analytics-based recommendations to increase revenue and cut costs, and implemented models for sales and engineering automation.
  • Developed dashboards for centralized business analytics.
  • Wrote Business Requirements Documents (BRDs) and Functional Requirements Documents (FRDs).
  • Delivered statistics training and seminars for product and marketing managers.
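Cumulative explained-variance figures like the PCA ones above can be computed and used to choose the number of components. Below is a minimal NumPy sketch of that calculation. The toy data, the standardization step, and the function names are illustrative assumptions, not the original project's code. scikit-learn's `PCA` exposes the same per-component quantity as `explained_variance_ratio_`.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component, in descending order."""
    # Standardize each column; assumes no constant columns (std > 0).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Squared singular values of the centered matrix are proportional to component variances.
    s = np.linalg.svd(Xs, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def components_for_threshold(ratios: np.ndarray, threshold: float) -> int:
    """Smallest number of leading components whose cumulative ratio reaches `threshold`."""
    cumulative = np.cumsum(ratios)
    idx = int(np.searchsorted(cumulative, threshold))  # first index with cumulative >= threshold
    return min(idx + 1, len(ratios))


# Toy data: 6 correlated features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(500, 6))

ratios = explained_variance_ratio(X)
print(np.round(np.cumsum(ratios), 3))
print(components_for_threshold(ratios, 0.95))
```

Standardizing first keeps features measured in large units from dominating the components. That choice matters when the raw columns mix scales, as enriched geographic and sales features usually do.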

Confidential, Richmond, VA

Technical Lead, Analytical Methods

Responsibilities:

  • Used Python to collect, clean, and organize extensive data from sensors and controllers to predict system failures hours or even days in advance.
  • Trained client engineers and other employees at organizations with little data-driven culture to collect and record vital data.
  • Implemented the PPDAC cycle (Problem, Plan, Data, Analysis, Conclusion) and derived data insights using statistical inference and ANOVA.
  • Built advanced time series forecasts of temperature, humidity, and other conditions to keep food fresh and improve food safety.
  • Developed mathematical and probabilistic models and correlations to predict the energy consumption of a refrigerator. Validated the model with empirical data, achieving 80% model accuracy. Overall, model predictions reduced lab testing time by 70% and cost by 80%.
  • Identified fundamental value creation opportunities and expanded the client's portfolio. This involved extensive research on company profiles, acquiring domain knowledge, and analyzing existing data to discover insights and control strategies.
  • Built a performance map predicting heater performance as a function of surrounding conditions.
  • Automated the entire process using Python and developed algorithm-based models.
  • Handled large volumes of data by splitting datasets, automated data collection and analysis from laboratory reports, and wrote error-checking Python scripts to ensure data quality.
  • Created Python scripts to automatically check regressions on datasets and to generate multivariate regressions and model performance metrics using MLflow.
  • Ensured the model had predictive power by balancing bias and variance to avoid overfitting or underfitting the regressions.
  • Created a pipeline for data collection, EDA, and predictive outputs for the lab and analytics departments.
  • Created reports in Tableau for data visualization and data insights.
  • Presented results to the team’s global head, detailing the value proposition and enabling strategic planning across all business units and product lines.
  • Predicted room occupancy from environmental factors such as temperature, humidity, and CO2 level using support vector machine (SVM) classification and hyperparameter-tuned KNN models. The resulting machine learning predictions helped the heating, ventilation, and air conditioning (HVAC) industry.
  • Performed data wrangling and EDA to generate feature correlation tables and a 4-dimensional occupancy plot to identify the best options. Validated with test data; the SVM model achieved 85% accuracy.
  • Compared occupancy detection using ML and DL methods (neural network classification with and without regularization).
  • Developed dashboards that transform raw data into relevant data insights.
  • Worked closely with data engineers to create data pipelines, and with software developers and architects to help operationalize models into production.
  • Trained a convolutional neural network with a U-Net architecture to predict velocity and pressure for any flow.
  • Validated with empirical data and found the trained neural network model to be highly accurate, with an average relative error of around 1%. The ML approach computed a single sample in under half a second, compared with 180 seconds for the standard CFD approach.
  • Optimized the airfoil shape using the neural network model, reducing testing/simulation cost by 60% and design cycle time by 50%.
  • Implemented convolutional neural network algorithms to train and test various models.
  • Developed standard procedures for making neural networks part of a typical simulation pipeline.
  • Developed ML algorithms for the Department of Energy (DOE) to estimate commercial building energy usage.
  • Worked on a dataset with three years of hourly meter readings from over 1,000 buildings across the globe. Predicted the target variable, mean hourly energy consumption per day, using supervised machine learning models (linear regression and random forest).
  • Preprocessed the dataset by removing anomalies, imputing missing weather values, and correcting time zones.
  • Engineered 28 features, including categorical interactions between building metadata and meters.
  • Tuned models using 12-fold cross-validation with a test/validation dataset.
  • Ensembled the model predictions using a weighted generalized mean.
  • Read and preprocessed data and analyzed time-series effects on energy consumption using Python; predictions achieved an R² score of 0.82.
  • Presented the results to the team members using visualization in Power BI.
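The weighted generalized (power) mean used for this kind of ensemble is M_p = (Σ w_i x_i^p / Σ w_i)^(1/p). It reduces to the weighted arithmetic mean at p = 1, the weighted geometric mean in the limit p → 0, and the weighted harmonic mean at p = -1. Below is a minimal sketch. The exponent p, the per-model weights, and the sample predictions are hypothetical, since the values actually used are not stated.

```python
import math
from typing import List, Sequence


def weighted_generalized_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """(sum(w * x**p) / sum(w)) ** (1/p); p == 0 gives the weighted geometric mean.

    Assumes strictly positive values (e.g., energy predictions) and positive weights.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    total = sum(weights)
    if p == 0:
        # Limit as p -> 0: exp of the weighted mean of logs.
        return math.exp(sum(w * math.log(x) for x, w in zip(values, weights)) / total)
    return (sum(w * x ** p for x, w in zip(values, weights)) / total) ** (1.0 / p)


def ensemble(predictions_per_model: Sequence[Sequence[float]],
             weights: Sequence[float], p: float) -> List[float]:
    """Combine per-sample predictions; predictions_per_model[m][i] is model m on sample i."""
    return [weighted_generalized_mean(sample, weights, p) for sample in zip(*predictions_per_model)]


# Hypothetical example: linear regression and random forest predictions for three samples,
# with the random forest weighted twice as heavily.
linear = [10.0, 12.0, 15.0]
forest = [11.0, 13.0, 14.0]
print(ensemble([linear, forest], weights=[1.0, 2.0], p=1.0))
```

Tuning p alongside the weights on the validation folds lets the ensemble lean toward the larger predictions (p > 1) or the smaller ones (p < 1) instead of fixing a plain average.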

Confidential, Indianapolis, IN

Sr. Mechanical Engineer

Responsibilities:

  • Worked in the lab as an R&D engineer on display case tests to meet U.S. Food and Drug Administration (FDA) requirements.
  • Analyzed ten years of empirical and simulation (computational fluid dynamics) data and developed correlations to predict key energy-usage performance indicators for a commercial supermarket display case.
  • Ensured on-time implementation of energy-efficient cases in the market by coordinating with technicians, design engineers, and strategic partners.
  • Created customized analytical spreadsheets to automate reports for data acquisition systems and internal component sizing (the automation saved 40 hours/week for an engineer and 10 hours/week for sales).
  • Worked on problem formulation, solution methodology & verification testing from start to finish.
