Senior Data Scientist Resume
SUMMARY
- Highly efficient, results-oriented data scientist with strong quantitative skills, development experience, and a strong educational background (MSc).
- Responsible self-starter with demonstrated experience in statistical programming languages (R, Python, Scala, SAS) and in Python for APIs.
- Strong visualization skills with tools such as Tableau, and a good understanding of relational databases (SQL, Oracle) and non-relational databases (HBase, MongoDB, Redis).
- Experienced with machine learning tools such as Hadoop, Spark, H2O, Sparkling Water, PySparkling, and SAS, and with deep learning tools such as Keras, TensorFlow, Theano, MXNet, and PyTorch, as well as GPU programming with CUDA.
- Experienced in scaling data science. Expert in predictive modeling, including XGBoost, regression, logit, probit, GBM, random forest, neural networks (generative models, GAN, VAE, RNN, CNN, word2vec, etc.), Naive Bayes, k-nearest neighbors, and PCA, across supervised, unsupervised, semi-supervised, and reinforcement learning. Also experienced in probabilistic modeling (PyMC3, Edward, Pyro), including MCMC, HMC, NUTS, Bayesian linear regression, and variational models.
- Data mining skills such as parsing and NLP (natural language processing); proficient in language modeling, including topic models, text clustering, word embeddings (Word2Vec, GloVe), text classification, RNN, and convolutional RNN. Familiar with development environments such as Hadoop, cloud (AWS, GCP, Azure), GPU, and Spark.
TECHNICAL SKILLS
Programming skills: Python, R, H2O, Sparkling Water, PySparkling, TensorFlow, Keras, Theano, MXNet, PyTorch, RWeka, Stan, Scala, SAS, MATLAB, Fortran
Data Science skills: SVM, XGBoost, RandomForest, Deep Learning, Machine Learning, Clustering, D3, AWS, Cloudera Workbench, Microsoft Azure, GCP (Google Cloud Platform), DGX GPU installation and operation, etc.
Managerial Skills: Project and business management and development, leadership, networking, network security (Accredited Configuration Engineer certificate holder)
Language Skills: English & Korean
PROFESSIONAL EXPERIENCE
Confidential
Senior Data Scientist
Responsibilities:
- Work as a senior data scientist.
- Work on rainfall forecasting modeling.
- Conduct R&D (from research to development) and advanced analytics.
- Conduct case studies of advanced analytics in the water business sector.
- Lead the team executing projects in the water business sector.
- Work with Azure cloud (Databricks, Azure Machine Learning Services, data storage, etc.) and AWS.
- Build water flow rate forecasting models for the Monterey Water Project.
Confidential
Chief Data Scientist
Responsibilities:
- Work as chief data scientist.
- Work on multiple projects, including end-to-end (E2E) data science solutions and AI projects.
- Conduct R&D (from research to development) and advanced analytics.
Senior Data Scientist
Responsibilities:
- Neighbor Comparison project (unsupervised learning model, Fuzzy Clustering model)
- Microsoft Azure Machine Learning Services (MLS) proof of concept (POC).
- Meter pipe image recognition modeling (computer vision): detecting rust on meters from images.
- Real-time video image recognition for IoT.
- Fast Meter Predictive modeling
- Gas engineering projects: built predictive models (ML and DL) and productionized them in the Emergency Services and Safety Monitoring (ESSM) project.
- Transfer learning, GAN, DCGAN, bidirectional LSTM, Ladder Network, Ladder VAE, Auxiliary VAE, InfoVAE, Disentangled Sequential Autoencoder, triplet loss model, multi-armed bandit model, deep reinforcement learning model, etc.
- Perceptron, MLP, Conv1, LSTM, Conv1-LSTM, LSTM-VAE, ARIMA, VAR etc.
- Bidirectional-LSTM, GRU, Bidirectional-LSTM-VAE etc.
- Pressure prediction modeling (July 2019 - present): regression model to predict future pipe pressure.
- Time series clustering modeling (Dec. 2018): k-means time series clustering, k-Shape time series clustering, global assignment k-means clustering, etc.
- Variational autoencoder and LSTM-VAE models: applied VAE and LSTM-VAE models to detect various gas leaks and water leaks in a very large dataset.
- DRF model (gap model) (Aug. 2018 - Oct. 2018): distributed random forest model to predict gaps in transmissions.
- Text model (Aug. 2018 - Jan. 2019): predicted gas leaks and water leaks with a text model built using NLP and the AdaBoost learning algorithm.
- Time-series modeling (Nov. 2018 - present): applied time series models to time-dependent consumption data, weather data, and generated pilot data to predict gas leaks and water leaks.
- Bad-debt model (logit) production implementation (April 2018 - Nov. 2018): implemented a SAS predictive model I had previously built in the Confidential system, including end-to-end testing, data pipeline, etc.
- Battery model (Sep. 2018 - present): building a predictive model that classifies whether a battery is defective.
- Speaker at PyData Conference (Oct. 2018): presented “Hot Water Leak Detection Using Variational Autoencoder” to an experienced-level audience.
Data Scientist
Responsibilities:
- Gas Leak Detection (Oct. 2, 2017 - Aug. 24, 2018): There are 6 million customers, which means there are 6 million meters. This is a pilot to implement a machine learning algorithm that forecasts the next hour of daily usage. This could increase safety, because a value far out of bounds could be the result of a gas leak; it can also be used to better market energy savings to customers. Applied deep learning models: segmentation analysis, text analytics using service orders and a language model, variational autoencoder, generative model, and semi-supervised learning.
- High Bill Predictive Model (Oct. 2, 2017 - Aug. 24, 2018): Some customers call with high-bill inquiries. This model predicts the likelihood that a customer will call, who will call, and when. This is a time series problem that requires building VAR and LSTM models to predict call timing, as well as batch analysis with models such as XGBoost.
- Drivers of Bad Debt (Oct. 2, 2017 - Aug. 24, 2018): The credit group wanted to know which customer attributes drive or increase bad debt. Selected the right customer features (attributes) through feature engineering.
- Calculated correlation coefficients and feature importance.
- Selected important features through business analysis.
- Checked variance loadings between variables.
- Calculated p-values to assess how significant the features are.
Confidential
Technical Reviewer
Responsibilities:
- Review and edit technical content in data science books.
- Currently reviewing “What's New in TensorFlow 2.0”.
- Served as technical reviewer for the published book "Unsupervised Learning with Python".
Confidential
Data Scientist
Responsibilities:
- Performed advanced feature engineering, including categorical feature encodings (Bayesian target encoding, WOE encoding, etc.), information value, p-values, and correlation coefficients.
- Built advanced predictive models, such as multi-class predictive models.
Confidential
Data Scientist
Responsibilities:
- Predicted optimal prices of washers and dryers in facilities of Confidential.
- Used machine learning and deep learning algorithms to solve the business problems.
- Applied NLP and text mining to convert unstructured text data into a machine-learnable format.
- Performed topic modeling, text classification, and clustering to extract contextual keywords and topics.
- Performed hypothesis testing, feature engineering, etc. to verify the right features for building machine learning models.
- Built various algorithms for machine learning, deep learning and feature generation.
- Planned and executed experiments to optimize machine learning and deep learning models.
- Wrote custom algorithms to generate new features, defined by business rules, for use in predictions.
Confidential
Data Scientist
Responsibilities:
- Predicted utility damages and the number of outages from weather data and historical outage and customer impact (damages) data.
- Performed image processing on satellite weather data.
- Performed text mining and NLP on the event title and event description columns.
- Analyzed data distributions and skewness to check sensitivity in machine learning applications.
- Applied XGBoost, random forest, GLM, SVM, decision tree, and other algorithms.
- Performed data manipulation and wrangling.
- Visualized data in diverse ways, using tools such as APIs, Shapely, and Shiny.
- Wrote various algorithms and built models.
- Conducted research and analytics.
Confidential
Data Scientist
Responsibilities:
- Predicted operational and failure modes from physical and calculated sensor data from ESPs (electrical submersible pumps).
- Wrote a unique classifier that predicts gas lock, water cut, impeller wear, etc. with 95% confidence. Expected to save 10 million dollars by predicting downhole pump failures.
- Leveraged machine learning to create diagnostic classifiers and clustering analyses.
- Analyzed and designed the feature set in R to maximize model fit.
- Implemented the machine learning algorithm in production software using Python.
- Applied the SVM algorithm to fit and predict on non-linear data.
- Programmed algorithms in R and Python, and occasionally MATLAB.
- Performed data mining using various models.
- Worked with shapelets and time series data warping.
Confidential
Data Scientist
Responsibilities:
- Using various industrial geophysical and engineering data, identified 3 new reservoirs with an estimated 2 billion barrels of oil offshore northeastern Australia. Determined where petroleum and water are located.
- Analyzed subsurface rock properties using wavelets from well data integrated with seismic data to create a synthetic map.
- Applied machine learning models to the dataset to make predictions.
- Designed and ran uncertainty analyses.
- Accurately predicted the size of underground petroleum reserves with 90% confidence.
- Performed visualization and interpretation (2D, 3D, 4D).
Confidential
Data Science Researcher
Responsibilities:
- Worked with various geological, geographical, and engineering data. Using well logging and production data, determined how water layers changed over time.
- Analyzed subsurface rock properties based on data.
- Ran AVO (amplitude versus offset), fuzzy, and uncertainty analyses.
- Programmed full waveform inversion models in Fortran, C, and C++.
- Analyzed changes in the CO2 layer over time from seismic maps after data processing and tomography.
- Modeled underground CO2 expansion and the timing at which CO2 would overflow the reservoir's storage capacity.
