
Big Data Scientist Resume

Emeryville, California


  • Over 15 years of experience as a scientist in big data analytics, artificial intelligence (AI), machine learning (ML), modeling, and statistics.
  • Extensive experience architecting and developing software applications and SDKs on Windows, Linux, and iOS.


Hadoop/Spark packages: HDFS, Hive, Machine learning, MLlib, H2O/Flow, Kafka, Flink, TensorFlow, NiFi, Oozie, SQL, etc.

Languages: R, Scala, Python, C/C++, C#, .NET, Objective-C, JS, SQL, etc.

Analytics Tools: SPSS, Alteryx, GenStat, etc.

Application development: IntelliJ, Eclipse, Visual Studio, Xcode, etc.

Databases: SQL Server, Vertica, OLAP, HBase, Hive, Parquet, etc.

PROFESSIONAL EXPERIENCE:

Confidential, Emeryville, California

Big Data Scientist


  • ML and analytics modeling: Used Spark ML models and SPSS to predict the risk of clients not paying their bills. In Scala, used the Spark ML DecisionTreeClassifier, GBTClassifier, and RandomForestClassifier with the Pipeline API to compare results on the same training datasets, and compared those results with an SPSS decision tree model. Also implemented the ML models in Python to compare performance and results.
  • Designed and implemented an ETL engine in Scala/Spark for a Hadoop/Spark environment that performs the entire ETL process for different clients' data through configuration files alone.
  • Client data ETL project: worked with a team of 7 big data engineers to implement an HDFS/Hive/Spark SQL ETL process in Scala for loading terabytes of raw client data into the EDW.
  • Designed and architected AI/ML infrastructure for the Hadoop/Spark data lake ecosystem to automate the development and deployment of analytic models with Spark ML.
  • Big data modeling: Because Spark ML has no time series modeling package, developed a predictive analytics tool in R that selects the best model type among Holt-Winters, linear, polynomial, exponential, and ARIMA for a given time series in order to forecast near-term medical cost trends. Users can configure how the training time series is handled and how the model output is produced, such as the forecast horizon and the overlap of the forecast with the current time series data.
  • Designed and implemented two SQL Server OLAP schemas to support two iPad mobile applications.
  • Mobile applications: Designed and developed 7 iPad mobile applications. Scrum master of three mobile projects. Worked with product managers on benefit models such as the impact of clinical risks, cost sharing forecasts, and cost analysis of chronic conditions.
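
The time series model selection described above can be illustrated with a minimal sketch. It is written in Python rather than R, covers only two of the listed candidate types (linear and exponential), and assumes the best model is chosen by mean absolute error on a holdout window; the resume does not state the actual selection criterion, and all function names here are hypothetical.

```python
import math


def fit_linear(ys):
    """Least-squares line y = a + b*t over t = 0..n-1; returns a forecaster."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return lambda t: a + b * t


def fit_exponential(ys):
    """Fit y = exp(a + b*t) via a linear fit on log(y); requires every y > 0."""
    log_line = fit_linear([math.log(y) for y in ys])
    return lambda t: math.exp(log_line(t))


# Hypothetical candidate registry; Holt-Winters, polynomial, and ARIMA are omitted.
CANDIDATES = {"linear": fit_linear, "exponential": fit_exponential}


def select_model(series, holdout=3):
    """Fit each candidate on the head of the series and score it on the tail."""
    train, test = series[:-holdout], series[-holdout:]
    errors = {}
    for name, fit in CANDIDATES.items():
        model = fit(train)
        preds = [model(len(train) + i) for i in range(holdout)]
        errors[name] = sum(abs(p - y) for p, y in zip(preds, test)) / holdout
    best = min(errors, key=errors.get)
    return best, errors


def forecast(series, horizon, holdout=3):
    """Pick the best candidate, refit it on the full series, and forecast ahead."""
    best, _ = select_model(series, holdout)
    model = CANDIDATES[best](series)
    return best, [model(len(series) + i) for i in range(horizon)]
```

Here `horizon` plays the role of the configurable "how far ahead" setting, and `holdout` is an assumed tuning knob for the comparison window.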

Confidential, Sunnyvale, California

Senior Technical Staff


  • Storage Technology on Linux system (kernel 2.6 and 2.4)
  • Enhanced and modified the Linux SCSI layer driver, significantly increasing the problem-drive recovery rate in Confidential storage systems compared with other companies' storage products (reduced the faulty drive rate to 1.23%/year).
  • Developed Linux kernel device drivers: SAS target driver, SAS HBA controller, SCSI modules, block device, iSCSI target driver, and I2C device drivers.
  • Bootloaders: GRUB and RedBoot, to support Confidential KDI boot requirements.
  • Power management: managing the system state when AC power fails over to battery backup, and enabling the system to resume quickly and reboot to the kernel image when AC power returns.
  • Enhanced and modified the LSI MPT driver suite for the LSI I/O controller chip.
  • Redesigned and enhanced the iSCSI target driver.
  • Redesigned some components at the kernel and user levels.
  • Designed and enhanced the storage system health monitoring infrastructure.
  • Wrote system utilities, such as firmware downloads for expanders, SAS resource discovery, and disk performance measurements.
  • Modified and enhanced proprietary expander and I/O controller firmware for Confidential hardware configurations.
  • Wrote I2C device drivers to monitor hardware status.
  • Virtualization: enabled Confidential systems to support guest OSes with good performance (using Xen/KVM).
  • Expander firmware development (Maxim expander chip).
  • Windows automatic NVR storage system setup application
  • Designed, architected, and implemented an automatic setup application that sets up a storage system for an NVR system. It integrates Windows network setup/configuration, the iSCSI initiator, device management, and telnet session functionality into one GUI application.

Confidential, Milpitas, California

Principal Engineer


  • Managed the development of several Windows GUI applications, supporting nine international languages, that access protected areas on the hard disk.
  • Wrote the original requirements/specifications for installation and deployment of Confidential Always, as well as user guides and technical documents.
  • Provided technical expertise on deployment of Confidential Always in WinPE environment.
  • Designed and implemented several new features (such as memory caches, multithreading, and asynchronous operations) for ImageCast, which improved the product's performance by 100% on Linux and 30% on Windows, along with a new implementation of the image file format that improves the product's flexibility, extensibility, and portability.
  • Provided many Windows utility applications to our group and other groups within the company.
  • Network security projects
  • Platform/system enhancement projects
  • Developed hot-swap support for external devices on laptops and an intelligent laptop power management application.
