Chief Data Scientist Resume
Indianapolis, IN
PROFESSIONAL SUMMARY:
- An innovative Data Scientist/AI expert with over 10 years of corporate experience delivering end-to-end solutions that address customer pain points across domains such as recommender systems, anomaly detection, NLP, machine vision, collaborative filtering, clustering, classification, regression, deep learning, SLAM, and statistical modeling.
- Expert in statistical programming languages such as R, SPSS, and Python (Pandas, NumPy, Scikit-Learn, Beautiful Soup), as well as Mahout, for implementing machine learning algorithms in production environments.
- Strong experience using TensorFlow, MXNet, Theano, Caffe, and other open-source frameworks.
- Actively participated in all phases of the project life cycle, including data acquisition (web scraping); data cleaning; data engineering (dimensionality reduction via PCA and LDA, normalization, weight of evidence, information value); feature selection, feature scaling, and feature engineering; statistical modeling (decision trees, regression models, neural networks, SVM, clustering); testing and validation (ROC plots, k-fold cross-validation); association rule learning; reinforcement learning; deep learning; and data visualization. A minimal end-to-end sketch follows this list.
- Experience working the full data-insight cycle: discussions with the business to understand business logic and drivers, exploratory data analysis, identifying predictors, enriching data, handling missing values, exploring data dynamics, and building predictive models where predictability can be found.
- Excellent data visualization experience, whether with custom code in R or Python or with other visualization tools, producing output ready for consumption by the business and by senior decision makers (Global CTO, Global BI leadership level).
- Extensive experience using R packages such as ggplot2, caret, and dplyr.
- Extensive experience in creating visualizations and dashboards using R Shiny.
- Developed numerous visualizations in D3.js.
- Hands-on experience with natural language processing, ingesting datasets from a variety of sources, including HDFS and AWS.
- Packaging applications with Docker and Vagrant.
- Developed visualizations in numerous tools such as Spotfire, Tableau, and Power BI.
- Knowledge of Groovy, Scala, and Ruby.
- Developed numerous reports in R Markdown and Jupyter notebooks.
- Experience collating sparse data into a single source, working with unstructured data, and writing custom data-validation scripts.
- Extensive experience in data cleaning, web scraping, fetching live streaming data, data loading, and data parsing using a wide variety of Python and R packages such as Beautiful Soup.
- Hands-on experience implementing SVM, Naïve Bayes, logistic regression, LDA, decision trees, random forests, recursive partitioning (CART), passive-aggressive classifiers, bagging, and boosting.
- Experienced with big data tools such as Hadoop (HDFS), SAP HANA, Hive, and Pig.
- Expertise in writing effective test cases and requirement traceability matrices to ensure adequate software testing and manage scope creep.
- Experience working on data management and data governance assignments.
- Proficient with high-level Logical Data Models, Data Mapping and Data Analysis.
- Extensive knowledge of data validation in Oracle and MySQL using SQL queries.
- Experience in healthcare management and retail, with excellent domain knowledge of the financial industry's instruments and markets (capital and money markets).
- Excellent communication, analytical, interpersonal, and presentation skills; adept at managing multiple projects simultaneously.
- Experience working with on-shore, offshore, on-site and off-site individuals and teams.
- Strong understanding of software testing techniques, especially those performed or supervised by a BA, including black-box testing, regression testing, and UAT.
- Experience with object-oriented programming, data structures, algorithms, and design patterns.
- Experience using markup and scripting languages.
- Working knowledge of source control systems, including Git, SVN, and CVS.
- Experience building web services using SOAP and REST.
- Software development experience in Java and Java libraries such as Hibernate and Spring.
- Experience using various IDEs.
- Developed an application in Node.js using Gulp, Browserify, Sass, ESLint, image compression, and Material Design Lite.
- Enhanced applications with React.js and AngularJS.
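As referenced above, a minimal sketch of that end-to-end workflow (feature scaling, PCA-based dimensionality reduction, a statistical model, and k-fold cross-validation scored by ROC AUC) using scikit-learn on synthetic data; every name and parameter here is illustrative, not taken from any real project.

```python
# Illustrative modeling pipeline: scaling -> PCA -> classifier,
# validated with stratified k-fold cross-validation and ROC AUC.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for real inputs.
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),               # feature scaling
    ("pca", PCA(n_components=10)),             # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),# statistical model
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print("ROC AUC per fold:", scores.round(3))
```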
PROFESSIONAL EXPERIENCE:
Confidential, Indianapolis, IN
Chief Data Scientist
Responsibilities:
- Performed data cleaning and manipulation on labeled and unlabeled image data sets.
- Handled imbalanced-data problems, such as models failing to learn and label-imbalance issues; implemented several resampling methods.
- Addressed overfitting, which appeared when the model failed to generalize on the resampled data; data augmentation, batch normalization, L2 regularization, and dropout helped overcome this issue.
- Used a ResNet-based model with a comparatively small network.
- Implemented the model in Keras and trained it with a cyclic learning rate schedule; the schedule ran automatically over three cycles for roughly 20 hours (a hedged sketch of the cyclic schedule appears after this list).
- Calculated accuracy, kappa, precision, and F1 score to compare four approaches: naïve, resampled, class-weighted, and ResNet; the ResNet model achieved about 80% accuracy.
- Developed SDTM Code List Conversion Tool for its intended use as per the Computer Systems and Electronic Records; Electronic Signatures (LQS302) procedure.
- Built dynamic UIs using React.js.
- Performed Exploratory Data Analysis using R.
- Developed numerous dashboards and stories using Tableau.
- Prototyped statistical models for proof-of-concept (POC) work.
- Performed data cleaning, feature scaling, and feature engineering.
- Participated in design, development, and optimization of code in R.
- Implemented real-time association rules that make use of prior probabilities.
- Performed data mining in R (tm and lsa packages) on the SAP HANA platform.
- Developed backend APIs and services for internal and external consumption.
- Developed Performance metrics to evaluate Algorithm's performance.
- Performed unit testing, functional and user-acceptance testing.
- Generated several reports in R Markdown for submissions.
- Performed data manipulation and data cleaning using R and Python.
- Improved the efficiency and performance of the application by finding memory leaks and redundant code and by implementing multi-threading at critical points.
- Developed many Shiny Applications.
- Implemented Spring Batch with the Quartz scheduler framework.
- Implemented role-based authentication using Spring Security.
- Generated reports using Altova MapForce.
- Dynamically generated graphical PDF reports using iText 1.1 and Excel reports using Apache POI.
- Developed a business process to dynamically generate data using Oracle BI publisher.
- Generated reports from the database using PL/SQL and SQL.
- Created analytics reporting charts using D3.js.
- Developed and modified SharePoint user privileges as directed and in compliance with all standard operating procedures.
- Assisted in designing and development of requirements and statistical models.
- Performed data cleaning and manipulation on clinical trial data sets for different drugs.
- Wrote R functions for data piping and manipulation before the data was fed into the Bayesian models for meta-analysis (a hedged sketch of such a model appears after this list).
- Included several likelihood models, such as normal, binomial, time-to-event (TTE), cloglog, survival, and Poisson.
- Performed data parsing using the docopt package.
- Implemented MCMC sampling using the JAGS sampler.
- Included WinBUGS code for data processing and model implementation.
- Produced several visualizations (density plots, forest plots, leverage plots, network plots, covariate-adjustment plots, etc.) using packages such as ggplot2, ggmcmc, and animation.
- Generated customized reports and presentations for the different models autonomously using R packages such as rmarkdown, animation, knitr, and ReporteRs.
- Eventually packaged everything for Lilly-internal use.
- Tested the tool under system testing and user acceptance testing in a regulated environment.
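As referenced above, a hedged sketch of a triangular cyclic learning rate schedule in Keras, with class weights as one common counter to label imbalance; the tiny network, random data, and every hyperparameter are placeholders, not the actual ResNet setup.

```python
# Minimal sketch: triangular cyclic learning rate via a Keras callback.
import numpy as np
import tensorflow as tf

BASE_LR, MAX_LR, CYCLE_LEN = 1e-4, 1e-2, 10  # assumed, for illustration

def cyclic_lr(epoch, lr):
    """Ramp from BASE_LR up to MAX_LR and back once per CYCLE_LEN epochs;
    the current rate `lr` is ignored, the schedule depends on epoch only."""
    pos = epoch % CYCLE_LEN
    half = CYCLE_LEN / 2.0
    frac = pos / half if pos < half else (CYCLE_LEN - pos) / half
    return BASE_LR + (MAX_LR - BASE_LR) * frac

# Toy stand-in for the image model; the real project used a ResNet.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(64, 32, 32, 3).astype("float32")  # fake images
y = np.random.randint(0, 2, size=(64,))               # fake labels

# Three cycles of ten epochs, echoing the three-cycle training run;
# class_weight counters label imbalance during training.
model.fit(x, y, epochs=30, verbose=0,
          class_weight={0: 1.0, 1: 2.0},
          callbacks=[tf.keras.callbacks.LearningRateScheduler(cyclic_lr)])
```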
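Also as referenced above, a hedged sketch of a Bayesian random-effects meta-analysis with a normal likelihood; PyMC stands in here for the R/JAGS/WinBUGS stack the project actually used, and the effect estimates and standard errors are invented for illustration.

```python
# Random-effects meta-analysis: study effects theta_i ~ Normal(mu, tau),
# observed estimates y_i ~ Normal(theta_i, se_i).
import arviz as az
import numpy as np
import pymc as pm

y = np.array([0.30, 0.12, 0.45, 0.25])   # per-study effects (invented)
se = np.array([0.10, 0.15, 0.20, 0.12])  # per-study std errors (invented)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)       # pooled effect
    tau = pm.HalfNormal("tau", sigma=1.0)          # between-study spread
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=len(y))
    pm.Normal("obs", mu=theta, sigma=se, observed=y)
    trace = pm.sample(2000, tune=1000, chains=2)   # MCMC (NUTS) sampling

print(az.summary(trace, var_names=["mu", "tau"]))
```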
Environment: TensorFlow, Keras, Python, HPC, R, MATLAB, JavaScript, Java, SQL, C++, R Shiny, HDFS, Azure, Docker, D3.js, Tableau, Spotfire
Confidential
Principal Data Scientist
Responsibilities:
- Successfully delivered multiple NLP projects, including a chatbot that helps customers troubleshoot vehicle problems and recommends actions; the bot could handle questions asked in natural language about common vehicle issues, e.g., when the car is due for an oil change or new brake pads, or how to sync a phone over Bluetooth.
- Extracted data from multiple sources, such as vehicle sensor data and records of previous claims and services/repairs performed.
- Performed data pre-processing, including data cleaning, text preprocessing, noise removal, lexicon normalization, and object standardization.
- Performed feature engineering, such as word embedding using word2vec models.
- Built seq2seq models using structured data and word embeddings. A seq2seq model maps an input to a desired output: it can take a question as input and return an answer, and the benefit is that it can take a question of arbitrary length and answer in natural language. It uses a recurrent neural network (LSTM/memory network) at the back end (a minimal encoder-decoder sketch follows this list).
- Used multiple evaluation metrics to validate model performance.
Recommender system to configure a new truck:
- Built a recommender system to help a customer configure a new truck from available features; it was implemented with association rules in SAP HANA over historical data of configuration selections (a hedged association-rules sketch follows this list).
- Performed data profiling to learn user behavior, handled data sourcing, and performed EDA using R and Hive on Hadoop HDFS; was involved in all aspects of data pre-processing, with QlikView used to display results dynamically.
- Developed a novel approach to building a machine learning algorithm and implementing it in a production environment.
- Performed data cleaning, feature scaling, and feature engineering; via string aggregation, the 1,500 data codes were transformed into 24 unique features. Historical data from the past five years was used.
- Implemented real-time association rules that use prior probabilities and market basket analysis.
- Prototyped the machine learning algorithm for a proof of concept (POC); the SAP HANA platform, which provides several mining and association-rule algorithms, was used for the implementation, with SAP Lumira for the front end.
- Developed performance metrics to evaluate the algorithm's performance; addressed the cold-start problem and created visualizations for the top 10 customers.
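As referenced above, a minimal encoder-decoder (seq2seq) sketch in Keras: an LSTM encoder summarizes the question, and an LSTM decoder generates the answer conditioned on the encoder's final states. Vocabulary sizes and dimensions are placeholders, and a trainable Embedding layer approximates the word2vec embeddings described above.

```python
# Minimal seq2seq (encoder-decoder LSTM) for question -> answer mapping.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_IN, VOCAB_OUT, EMB, UNITS = 5000, 5000, 128, 256  # assumed sizes

# Encoder: embed the question tokens and keep the final LSTM states.
enc_inputs = layers.Input(shape=(None,), name="question")
enc_emb = layers.Embedding(VOCAB_IN, EMB)(enc_inputs)
_, state_h, state_c = layers.LSTM(UNITS, return_state=True)(enc_emb)

# Decoder: generate the answer, initialized from the encoder states.
dec_inputs = layers.Input(shape=(None,), name="answer_in")
dec_emb = layers.Embedding(VOCAB_OUT, EMB)(dec_inputs)
dec_out, _, _ = layers.LSTM(UNITS, return_sequences=True,
                            return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(VOCAB_OUT, activation="softmax")(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```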
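And a hedged sketch of association-rule mining for configuration recommendations, using the mlxtend package in place of SAP HANA's built-in algorithms; the one-hot order matrix and feature names below are invented for illustration.

```python
# Mine frequent feature combinations from past orders, then derive
# rules whose consequents can be recommended for a partial config.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Rows = past truck orders, columns = selected configuration features
# (hypothetical names).
orders = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 1],
     [1, 1, 1, 0]],
    columns=["sleeper_cab", "alum_wheels", "tow_pkg", "led_lights"],
).astype(bool)

freq = apriori(orders, min_support=0.5, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.6)

# Rank candidate recommendations by lift.
print(rules[["antecedents", "consequents", "support",
             "confidence", "lift"]].sort_values("lift", ascending=False))
```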
Confidential
Data Analyst
Responsibilities:
- Performed data analysis and visualization (Python, R).
- Designed, implemented and automated modeling and analysis procedures on existing and experimentally created data
- Increased the pace and confidence of the learning algorithm by combining state-of-the-art technology and statistical methods; provided expertise and assistance in integrating advanced analytics into ongoing business processes.
- Parsed data, producing concise conclusions from raw data in a clean, well-structured, and easily maintainable format.
- Implemented topic modeling, passive-aggressive, and other linear classifier models (a minimal TF-IDF plus passive-aggressive sketch follows this list).
- Performed TF-IDF weighting and normalization.
- Performed scheduled and ad hoc data-driven statistical analysis, supporting existing processes.
- Developed clustering models for customer segmentation using R (see the segmentation sketch after this list).
- Created dynamic linear models to perform trend analysis on customer transactional data in R.
- Performed topic modeling.
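As referenced above, a minimal sketch of TF-IDF weighting feeding a passive-aggressive linear classifier in scikit-learn; the tiny corpus and labels are illustrative only.

```python
# TF-IDF features piped into a passive-aggressive linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

docs = ["refund not received", "loved the quick delivery",
        "package arrived damaged", "great customer support"]
labels = ["complaint", "praise", "complaint", "praise"]

# TfidfVectorizer L2-normalizes the weighted term vectors by default.
clf = make_pipeline(TfidfVectorizer(),
                    PassiveAggressiveClassifier(max_iter=1000))
clf.fit(docs, labels)
print(clf.predict(["delivery was damaged"]))
```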
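And a hedged customer-segmentation sketch: the original clustering work was done in R, so scikit-learn's k-means stands in here, and the RFM-style features (recency, frequency, monetary value) are assumed rather than taken from the source.

```python
# K-means segmentation on standardized, synthetic RFM-style features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
rfm = rng.gamma(shape=2.0, scale=1.0, size=(200, 3))  # fake RFM matrix

X = StandardScaler().fit_transform(rfm)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Each customer gets a segment label; centroids describe the segments.
print(np.bincount(km.labels_))
print(km.cluster_centers_.round(2))
```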