- A strong leader in Computer Science and Machine Learning, with over 20 years of real-world experience and inspirational leadership in building innovative teams from the ground up to deliver world-class technology products and services.
- My research interests are machine learning and statistical modeling, including natural language processing (NLP), linear and logistic regression, supervised classifiers (naïve Bayes, SVM, decision trees), unsupervised clustering (connectivity-based and centroid-based), and deep learning (CNN and RNN) for time-series and sentiment analysis.
- I am proficient in Python, R, and TensorFlow, and have hands-on experience with the Big Data technology stack, including Hadoop, Spark, Cassandra, graph databases, and in-memory data grids.
- Machine learning algorithms
- Enterprise database and data warehouse analysis
- Data modeling
- Logical and physical design
- Service Oriented Architecture (SOA)
- Web Services
- Project Planning
- Requirement Gathering/Analysis
- Technical Documentation
- Build up the data science practice as a key value-add service offering that provides real-time, actionable insights into customer, employee, and organizational behavior, improves the quality of services, and reduces costs.
- Build a public-cloud-based data platform from the ground up, including an Enterprise Data Grid, data ingestion (Spark), data curation (semantic analysis, free-text search, and graph), and data science (customer behavior prediction, time-series analysis, recommendation engine, fraud detection).
- Create a new technology stack, including Hadoop, Spark, Kafka, TensorFlow, an in-memory data grid (Hazelcast), graph processing, and APIs.
- Build and lead an innovative learning organization of over 40 members in three teams: distributed platform engineering, advanced analytics, and products.
- Hands-on experience creating Natural Language Processing (NLP) analyses to extract and tokenize unstructured text (e.g., machine data and customer service logs), perform sentiment analysis and communication pattern recognition, and identify security risks such as lateral movement.
- Lead development of supervised and unsupervised machine learning algorithms (k-means clustering, naïve Bayes classification, linear regression) to solve challenging business problems, and of deep learning neural networks that analyze time-series data to predict infrastructure and security fault events, as well as consumer credit risk and fraud.
- Strong hands-on experience with R, Python, the TensorFlow and Theano frameworks, and the pandas and scikit-learn packages.
Chief Data Scientist
- Design and implement the Big Data strategy and technology infrastructure, including Hadoop clusters, the MapReduce framework, data ingestion, and persistence.
- Create graph models of social networks in the workplace, covering people, positions, skills, work experience, projects, etc.
- Created an in-memory graph processing engine with graph label propagation, crowdsourcing with built-in bias detection, binary ranking, classification, and cluster analysis.
- Created a framework for predictive analysis, including clickstream construction, user session identification, linear and non-linear classification analysis, association rule mining, and naïve Bayes analysis, to predict users’ online behavior.
- Design and develop a Big Data analytics strategy to meet business needs, including predictive analysis, pattern identification, and sentiment analysis.
- Develop a Big Data/NoSQL risk platform to support global trading, including Value at Risk (VaR) evaluation (Monte Carlo and historical) of portfolio, credit, interest rate, and counterparty risks, pricing engines for options, and quant models.
- The platform was also adopted for ingesting networking logs to detect suspicious and potentially fraudulent activities, such as lateral movement, using a stochastic gradient descent model.
- Develop predictive and analytical modeling architectures to support trend spotting, including naïve Bayes analysis, n-gram text mining, linear and non-linear classification analysis, association rule mining, sequential pattern mining, and machine learning (random forest, k-means, etc.), using Python (pandas, scikit-learn).
- Responsible for developing the technology platform and roadmaps, tracking issues and the status of day-to-day operations, and socializing technology demos and architectural proofs of concept with partners and peers in the IT, business, and operations communities.
Senior Vice President
- Lead design and implementation of Master Data Management (MDM), including setting up and leading data governance and data steward committees, and establishing data acquisition, storage, delivery, and retention policies, standards, protocols, and best practices.
- Lead design of the Enterprise Logical Data Model (ELDM) and Canonical Data Model (CDM), including dimensional modeling and the mapping of logical entities and attributes to existing logical and physical database designs throughout the organization.
- Created the first Global Watch and Restricted Surveillance (GWRS) system for compliance and regulatory reporting, including SEC filings, Anti-Money Laundering (AML) reporting, and employee trading monitoring.