- 10 years of experience in data analytics, data-driven entrepreneurship, and IT operations, specializing in big data analysis tools such as Spark and Hadoop, machine learning, sales analysis, customer lifetime value analysis, product data analysis, and building scalable, interpretable machine learning models. Built end-to-end data pipelines that extract, transform, and combine incoming data to uncover hidden insights, and worked on complex IT business analysis projects to optimize business processes and address business problems with reliable, scalable data science products, services, and solutions.
- Expert in adopting state-of-the-art big data and cloud computing technologies such as Google BigQuery, Dataproc, Dataflow, Azure Databricks, Azure Data Factory, and IBM Watson Studio to deliver efficient, data-based solutions and actionable insights that grow revenue, reduce costs, and optimize operations.
- Experienced in Python machine learning and deep learning modeling across the full production cycle of multiple data science projects, using libraries such as NumPy, Pandas, Matplotlib, scikit-learn, BeautifulSoup, Pyecharts, PySpark, SparkSQL, OpenCV, TensorFlow, and Keras.
- Solid SQL database design knowledge and advanced query skills, including subqueries, CASE expressions, views, and stored procedures.
- Proficient in Tableau, Tableau Server, and other data visualization tools such as Matplotlib, Pyecharts, R Shiny, and ggplot2 for creating visually powerful, actionable interactive reports and dashboards.
- Experience with A/B testing and multivariate experiment design, deployment, and evaluation.
- Involved in various phases of the Software Development Life Cycle (SDLC), including requirements gathering, modeling, analysis, design, and development, with experience in Agile methodologies and the Scrum process.
- Proficient in writing functional specifications, translating business requirements into technical specifications, and creating, maintaining, and modifying database designs with detailed descriptions of logical entities and physical tables.
- Excellent communication skills for delivering data insights to a broad audience to support decision making and business growth.
- Effective interpersonal skills for interacting professionally with diverse groups, including executives, managers, and subject matter experts, and for building productive, meaningful working relationships across departments to drive business growth.
- Strong experience with Python (3.x) for developing analytic models and solutions, and good knowledge of machine learning with packages such as NumPy, Pandas, SciPy, Datetime, PySpark, MLlib, Scikit-learn, NLTK, TensorFlow, Keras, XGBoost, Matplotlib, and Seaborn in Python, and DPR, data.table, reshape, caret, dplyr, neuralnet, nnet, MCMC, randomForest, ctree, rpart, lm, glm, xgboost, ksvm, lda, tm, Markdown, and R Shiny in R.
- Working knowledge of the Hadoop ecosystem: HDFS, MapReduce, Hive, Apache Airflow, and Apache Spark.
Programming for DS: Python (Pandas, NumPy, Matplotlib, scikit-learn, BeautifulSoup, TensorFlow, Keras), SQL, Hive, SparkSQL, R, SAS (Enterprise Miner), RapidMiner
Databases: Microsoft SQL Server, PL/SQL, Toad Database Modeler, Cassandra, Splunk, Oracle MySQL, Azure Cosmos DB
Big Data Architecture: Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Kafka, Lambda, Flink, Beam, PAC theory, SLA
Cloud Computing Platforms: AWS: S3, EC2, Redshift, Redshift Spectrum; GCP: BigQuery, Data Studio, Bigtable, Dataproc; Azure: ADLS, Databricks, Data Factory, SQL Server, Machine Learning Studio
Data Analysis Methods: Data distribution analysis, Data normality analysis, Data moment analysis, Outlier analysis, Hypothesis testing, Correlation analysis, Multivariate analysis
Business Analysis Methods: Customer lifetime value analysis, Survival analysis, Cohort analysis, PEST analysis, Porter’s Five Forces analysis, Operations research, Dynamic programming, Continuous optimization
Statistical Methods: Hypothesis Testing, A/B Testing, Confidence Intervals, ANOVA, Chi-square test, Correlation Analysis
ML and Deep Learning Algorithms: Regression, Multivariate analysis, CNN, RNN, LSTM, SVM, K-means, Cluster analysis, PCA, BERT
IDE and Version Control: Anaconda3, PyCharm Professional, Jira, Jupyter Notebook, Git, GitHub, Bitbucket
Data Visualization Tools: Tableau Desktop, Tableau Server, R Shiny, Pyecharts, ggplot2, Matplotlib, Bokeh
Web Data Scraping: BeautifulSoup, Regular expressions, Error handling, HTML, CSS
Data Reporting Tools: Tableau Desktop, PowerPoint, Excel, Tableau Server, Data Studio
ML for Model Optimization: L1/L2 Regularization, Feature reduction, ROC analysis, Adam, Boosting, Pooling, Padding, Filtering
Auto Machine Learning Tools: RapidMiner Studio, Microsoft Machine Learning Studio
Senior Data Scientist/Senior AI Developer
- Decreased internal anti-money laundering investigation time from 2 months to 45 minutes per case by developing a high-performance convolutional neural network model for suspicious transaction detection using Python, PyTorch, Spark, Jira, Bitbucket, and Keras.
- Designed, built, trained, and tested a word embedding model for transaction code categorization, achieving 92% recall and 85% precision, using Word2Vec, LSTM, and recurrent neural networks.
- Improved the Erica sentiment analysis model’s accuracy by 25% using dropout, early stopping, and spatial pyramid pooling.
- Boosted training performance of the Erica automatic text summarization model by introducing Shake-Shake regularization.
- Designed highly stable, scalable ETL pipelines for transaction data and customer profile transformation, feeding model consumption, using Spark and Airflow.
- Automated the code production process and increased code production efficiency using Airflow, Slack, and Kafka.
- Increased code integration efficiency and deployment speed by introducing Pytest and using Ansible.
- Designed and validated experiments for detecting irregular customer behavior patterns with different neural network architectures, such as RNN, CNN, spatial pyramid pooling, gated recurrent neural networks, and unstructured embedding.
- Increased server speed and efficiency by performing server housekeeping jobs with Bash shell scripts.
Data Scientist/Data Engineer
- Improved product usability by analyzing terabyte-scale user behavior and online traffic data using Google BigQuery.
- Worked with the engineering team to design, deploy, and evaluate A/B tests and other hypothesis-based experiments, using statistical significance to determine whether new features were robust.
- Helped build a word detection model for speech recognition to streamline the user onboarding process and improve user experience.
- Increased product profitability using dynamic programming in Excel and scenario analysis in Tableau Desktop.
- Designed image captioning model experiments for customer orders using CNN, LeNet-5, ImageNet, and PySpark.
- Migrated terabyte-scale on-premises Hadoop and Spark jobs to GCP using Dataproc, Pub/Sub, and Dataflow.
- Automated and optimized ETL jobs processing millions of rows of data from different sources using Google Dataproc.
- Developed predictive models in TensorFlow using machine learning algorithms such as logistic regression, classification, Naive Bayes, random forests, K-means clustering, KNN, PCA, and regularization for data analysis.
- Evaluated model accuracy using ROC, MSE, and learning curves to identify overfitting or underfitting issues and find appropriate approaches to correct them and improve model accuracy.
- Supported data-driven business decision making by providing solid reports and data visualizations using Google Data Studio, SQL, and Tableau, including visualizations of product market performance and prediction results.
- Gathered, analyzed, documented, and translated application requirements into data models and supported standardization of documentation and the adoption of standards and practices related to data and applications.