Lead-Data Scientist Resume

Ashburn, Virginia


  • Data scientist specializing in technology and innovation. Key contributions in identifying, drafting and implementing innovation driven technology solutions in biomedical, material science and telecommunication industries.
  • Domain specific scientific research and comprehensive validation of data science outputs in relation to business goals.
  • Scientific writing, business proposal writing, executive level reporting with effective presentation skills.
  • Patent opportunity identification with futuristic technology solutions.
  • Excellent communication skills needed for swift implementation of data science and data analytic projects.
  • Efficient team management skills with mentoring, prioritizing and managing resource allocation.
  • Full-stack Python data science skills to distill multi-dimensional data into effective decision-making visualizations. Well-versed in databases, SQL, ETL, Tableau and other data science related technologies.
  • Led data science teams to build a unified engineering, geographical and financial information portal for advanced analytics with a cognitive database using fuzzy logic and AI in Confidential Business Group.
  • Performed extensive statistical evaluations to build propensity models, patient safety models, and survey bias mitigation models for underperforming Insurance Star Ratings for Aetna Insurance Group.
  • Steered company towards investment into digital biomarker data infrastructure for advanced member analytics.
  • Led data science projects involving member archetyping for member focused health care campaigning.
  • Developed an award-winning system to digitize customer contracts. Scraped product pricing data from millions of PDF documents into the Oracle database and structured the database for a Python Flask application using fuzzy logic for the business users. Deployed an application that saves 38,000 man-hours per year for Confidential Business Group.
  • Used customer transactional history to segment B2B customers with complex price elasticity factors using XGBoost.
  • Authored 5 peer reviewed research publications in high impact journals on drug discovery and chemical structures.
  • Implemented machine learning experiments for material science projects to evaluate silica anode coatings for charge holding capacity optimizations. Tested using multifactorial and multivariate analysis in an advanced research environment.
  • Molecular dynamic simulations to evaluate Quantitative Structure Activity Relationships (QSAR) in drug discovery and energy storage systems.
  • Extensive experience in statistics and applied machine learning using Python.
  • Experience in managing and analyzing data using Python (NumPy, SciPy, pandas, scikit-learn).
  • Developed and maintained structured and unstructured data for analysis and reporting.
  • Worked on Spotfire, Plotly, Dash, Bokeh to create deployable visualizations.
  • Experienced in Data Profiling, Data Quality and concepts of Data Governance.
  • Follow testing procedures for data vulnerability and open source code bugs (PyChecker).
  • Technical knowledge of OLTP, data warehousing, OLAP and ETL environments, with quick adaptability to new technologies.
  • Query optimization, execution plan and performance tuning of queries for better performance in SQL.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Experience with Natural Language Processing and deep learning frameworks.
  • Optical Character Recognition (OCR) packages for text extraction and enhancement for NLP.
  • Developed interactive business intelligence dashboards using Bokeh packages in Python and Tableau.
  • Containerized and deployed Python-based models using Flask and Jinja.
  • Proficient in Normalization (1NF/2NF/3NF)/De-normalization techniques in Relational/Dimensional database environments.
  • Microsoft Azure Deployments.
  • Experience in using SAS Base programming to produce various reports, charts and graphs.
  • Efficient in analyzing and documenting Business Requirement Documents (BRD) and Functional Requirement Document (FRD).
  • SQL Server and Tableau integrations.


Core competencies: Data mining, programming, business analysis, data analytics, data science, Big data, ETL, predictive analytics, agile tools and techniques, statistical modeling, SQL, Machine Learning, NLP, PyBrain, Time-Series analysis, Data visualization, Solution Architecture

Database: Oracle 9i/10g, MS-SQL Server, MySQL, Cassandra, BigQuery, PostgreSQL.

Programming Languages: Python (scikit-learn, NumPy, Matplotlib, pandas, SFrame, PyTorch, Keras, Flask, OCR, OpenCV), R, Hadoop (MapReduce, Hive, Spark), Java.

Statistical tools & BI: Tableau, Power BI, SAS 9.4, SAS Visual Analytics, MS Azure Machine Learning Studio, Crystal Reports.

Server: Windows Server 2016, Unix, AWS EC2, AWS EMR, MS Azure, Google Cloud Platform

Additional tools: MS Project, MS Excel, MS PowerPoint, GitHub, JIRA, Google Analytics, Google SEM, A/B Testing.

Computing: Large Scale, Distributed Computing, Nanosecond simulations.

Deployment: Jenkins, Artifactory, Version Control, Continuous Integration/Continuous Deployment (CI/CD), Containerization, Jar technologies


Confidential, Ashburn, Virginia

Lead-Data Scientist


  • Lead two data science teams through hypothesis formulation, variable definitions for feature selection, and expected visualizations.
  • Provide architecture for portal and application development with UI, database designs and integrations focused on NLP, fuzzy logic, and machine learning models.
  • Manage and mentor junior data scientists.
  • Develop conceptual data science proposals for projects focused on business goals.
  • Generate tables with key terms and populate each term field with business user requested data.
  • Classify text specific to various products of Confidential.
  • Deposit digitized data to business user databases.
  • Perform NLP sentiment analysis for product pricing and advertising.
  • Maintain CI/CD procedures.
  • B) Global Customer Hierarchy:
  • Led team of data scientists to classify business to business (B2B) customers of Confidential Wireless, Confidential Wired and Connect into various segments based on billing system features.
  • Obtain customer data, clean, and categorize based on billing system distribution.
  • Track billing system mismatches and provide visualizations to the business users to track and avoid billing mismatches.
  • C) SmartPricing & SmartAD:
  • Designed an AI tool for near-real-time pricing and marketing assistance with pricing forecasts.
  • Obtained historic customer data and dynamic and semi-dynamic price-influencing data.
  • Cleaned and prepared data for categorizing products into specific price bins.
  • Used customer behaviors for segmentation using K-means clustering.
  • Predicted pricing for each customer using polynomial regression analysis.

Environment: Spark, Python, machine learning algorithms, scikit-learn, pandas, Flask, Jinja, OCR, pywinauto, Python Selenium, SQL Developer, HTML, CSS, JavaScript.

Confidential, Alabama

Lead-Data Scientist Consultant


  • Ranking and recommendation of daily deals.
  • Advanced statistical methods for A/B testing and user experience analytics.
  • Exploratory data analysis (EDA), including hypothesis testing, generating graphs, cleaning data, removing outliers, transforming and melting data, and importing and exporting data.
  • Prototyped group sequential testing theory and multivariate analysis in Python and SQL.
  • Designed and developed RDBMS using MySQL.
  • Carried out regression analysis in Python and investigated the models for problems such as goodness of fit, overfitting, multicollinearity, and residual normality.
  • Established predictive models for forecasting sales.
  • Worked on noise reduction methods, exponential smoothing and Fast Fourier Transform methods, and compared them with regression methods.
  • Deriving actionable inputs for merchants and consumer data
  • Implemented machine learning models for sales automation.
  • Statistical education and seminars for product and marketing managers.

Environment: Python, Machine Learning Algorithms, Tableau Desktop, SQL, Predictive Modelling.

Confidential, Baton Rouge, Louisiana



  • Secured POC funding (Delta IFund) for a lifestyle-based chronic disease management application.
  • Developed info-visual application with user dashboard that transforms raw data and metadata from users’ mobile inputs into relevant health insights.
  • Created a pipeline for data collection, EDA and predictive outputs.
  • Identified statistically significant variables.
  • Actively involved in the design and development of the Star schema data model.
  • Using NLP techniques (with NLTK and Gensim libraries), prototyped extraction of structured data from EHRs.
  • Developed various workbooks in Spotfire from multiple data sources.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Created dashboards and visualizations using Spotfire desktop.
  • Worked on Business forecasting and segmentation analysis.
  • Wrote connectors to extract data from databases.
  • Worked in Amazon Web Services cloud computing environment.

Environment: Python, Machine learning, Amazon Web Services, Jupyter Notebook, Excel, SQLite3

Confidential, New Orleans, Louisiana



  • Estimated the state-of-charge and faded capacity of Li-ion cells based on time domain analysis.
  • Collected battery charge datasets from thousands of batteries at nanosecond to millisecond time scales.
  • Performed time-series analysis in Python.
  • Used the Vienna package for nanosecond time scale simulations.
  • Performed Monte-Carlo simulations in python for elucidation of atomic level interactions.
  • Performed iterative supervised machine learning for building models to predict battery longevity based on usage.
  • Characterized the cells based on the different ground parameters of Li-ions.
  • Collected, cleaned and prepared frequency, time domain and parameter data for analysis.
  • Trained algorithms and boosted regression trees with Scikit-learn libraries.
  • Bayesian inference and boosted regression trees achieved 95% accuracy in capacity estimation, within +/- 2% of the nominal cell capacity from the true value.
  • Created Dashboards with interactive views, trends and drill downs, published Workbooks and Dashboards.

Environment: Python, shell scripting, Matlab, BASH scripting, Excel, LONI supercomputing platform, R Studio

Confidential, New Orleans, Louisiana

Technology Licensing Associate


  • Performed data analytics to identify potential technology licensing partners.
  • Collected, cleaned and prepared broad pharmaceutical industry investment portfolio data.
  • Performed advanced feature engineering for high-cardinality variables for EDA.
  • Used linear classification, decision trees and random forest algorithms for classification of technologies based on big-pharma prioritization.
  • Provided potential revenue predictions for commercial licensing partners.
  • Negotiated Intellectual property license agreements in national and international jurisdictions.
  • Drafted key technological summaries with potential revenue predictions.
  • Created structured database solutions for IP data management.
  • Created custom stored procedures, functions and packages as per logic to extract data, and extensively used SQL to create various reports, such as model attribution and data analysis for marketing intelligence and data quality.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, and normalization (3NF) and de-normalization of databases.

Environment: Innography, Python, Tableau Desktop, SQL.

Confidential, New Orleans, Louisiana

Post Doctoral Researcher


  • Synthesized, purified and crystallized a diverse class of cellular proteins for structural studies.
  • Collected X-ray diffraction data from a particle accelerator synchrotron.
  • Extracted data from electronic images and processed it using the Python-based CCP4 suite.
  • Performed advanced mathematical Fourier Transformations for processing data.
  • Performed parameter refining, integration, scaling and merging of scintillation intensities.
  • Used classification techniques from the CCP4 suite to identify Bravais lattice forms of crystals.
  • Performed iterative model building and refinement using the Python-based Phenix suite.
  • Performed model B-factor refinement, multivariate analysis and ANOVA.
  • Published and presented models in peer reviewed journals and communicated internationally.

Environment: Matlab, Python, CCP4, PHENIX, MosFlm, Machine Learning Algorithms.

Confidential, Baton Rouge, Louisiana

Biomedical Consultant


  • Built a data pipeline that performs ETL on data from public repositories to sample and identify gene-targeting drugs.
  • Performed exploratory data analysis and data profiling to identify the cause and effect relationship between explanatory variables and target variable.
  • Performed data preparation (transformations, imputations, outlier filtering, sampling), then organized and interpreted the data for modeling.
  • Imputed missing values and outliers based on similar case substitution.
  • Used machine learning algorithms to provide statistical and mechanical predictions for gene-targeting drugs.
  • Used Boltzmann statistics to calculate a gene's chemical potential of drug binding.
  • Identified potential gene targets with hotspots for drug binding with near 99% accuracy based on structural conformations.
  • Performed RMSE, ANOVA, and B-factor refinements for gene model refinement.
  • Identified the chemical segments that are likely to bind and cure genetic diseases.
  • Played a key role in developing a ranking scorecard for all chemical structures binding to given targets, to classify them into high-risk and low-risk customers.
  • Performed logistic regression to estimate the probability of a new structure being classified as a good or bad lead.
  • Identified the threshold levels of variables that help in tracking the binding moieties in the presence and absence of water molecules.
  • Developed supercomputer macromolecular models of biological structures, simultaneously integrating large sets of partial differential equations in a massively parallel framework.
  • Built nanosecond-timescale ensemble distributions of macromolecular models.
  • Used PCA analysis for dimensionality reduction in base-pairing probability calculations.
  • High resolution models have been used extensively in drug discovery process.
  • Developed drug data libraries with boolean array queries to identify drug binding pockets.
  • Used classification methods to predict chemical moieties with structural similarities.
  • Used extensive statistical normalization, denormalization, and standardization methods for spectroscopic and calorimetric experiments using Excel, MATLAB and Python.
  • Published and presented models in peer reviewed journals and obtained various awards.

Environment: Molecular Operating Environment, SQL, Python, Matlab, CCP4, PHENIX, MosFlm, Machine Learning Algorithms.