Traditional machine learning classifiers such as boosted trees and random forests form the backbone of critical software infrastructure across a broad range of industries. Risk assessment models, threat detection models, and financial models help protect user privacy and safety, and they enable corporations to make decisions quickly in complex informational landscapes. These models, while critically useful, are generally proprietary, are trained on extensive private data, and give bad actors an opportunity to manipulate mission-critical systems.
In recent years, the Ethereum ecosystem has expanded into domains previously exclusive to traditional finance. The emergence of fixed-rate lending protocols (Notional Finance), on-chain treasury bill ETF tokens (Ondo Finance), and credit score engines (Spectral Finance) attests to the burgeoning demand for stable on-chain financial products. As new protocols expand the current slate of product offerings, the need to aggregate and model large amounts of data will likely continue to grow, and machine learning models will increasingly become critical components of the DeFi landscape. However, on-chain computation can be prohibitively expensive in gas and is also fully public. The latter poses a problem for protocols that wish to preserve the privacy of machine learning model parameters and of sensitive user data, such as token balances and transaction histories.
Very recently, researchers have combined machine learning with zero-knowledge cryptography, giving rise to the emergent field of zkML. Together, the two disciplines can greatly expand the utility, safety, privacy, and trustworthiness of machine learning technology.
Enter RISC Zero: a zero-knowledge virtual machine that can execute public or private programs on public or private inputs and generate proofs attesting to the correctness of the computation. RISC Zero abstracts away the need to define custom circuits and constraints, offering developers a low-friction option for building custom ZK apps, including machine learning models, in Rust.
Using the Rust-based SmartCore machine learning crate, developers can generate proofs of inference (and training) for a variety of models, including classifiers and regressors such as linear and logistic regression, nearest neighbors, decision trees, and random forests, as well as principal component analysis. These models are well suited to applications germane to Ethereum and DeFi protocols, such as modeling credit scores, estimating gas fees, and generating pricing models for DEXs. For these use cases, the SmartCore library provides an API that will feel familiar to developers accustomed to Python’s scikit-learn library.
For developers looking to experiment with provable inference of classifier and regression models, we have provided two guides: a Jupyter notebook that explains how to train models in Rust using the SmartCore framework, and a RISC Zero starter template for proving inference of a private machine learning model. The tutorial uses the Iris data set as a demonstration, but any data set that has been pre-processed and properly formatted using established best practices can be used to train SmartCore models.
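To make the idea of "proving inference" more concrete, here is a minimal sketch of the kind of computation such a proof would attest to: walking a small decision tree over the four Iris features. It is written in plain Rust and deliberately uses neither the SmartCore nor the RISC Zero APIs. The `Species`, `Node`, `predict`, and `iris_tree` names are hypothetical, and the split thresholds are illustrative values rather than the output of any trained model.

```rust
// Plain-Rust sketch of decision-tree inference on Iris-style inputs.
// Assumption: every name and threshold below is hypothetical and chosen for
// illustration; a real model would come from a trained SmartCore classifier.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Species {
    Setosa,
    Versicolor,
    Virginica,
}

/// A binary decision tree: either a leaf holding a class label, or a split
/// that compares one feature against a threshold.
pub enum Node {
    Leaf(Species),
    Split {
        feature: usize, // index into the feature vector
        threshold: f64, // go left when x[feature] <= threshold
        left: Box<Node>,
        right: Box<Node>,
    },
}

/// Walks the tree from the root to a leaf for one sample.
/// Feature order: [sepal length, sepal width, petal length, petal width].
pub fn predict(node: &Node, x: &[f64; 4]) -> Species {
    match node {
        Node::Leaf(species) => *species,
        Node::Split { feature, threshold, left, right } => {
            if x[*feature] <= *threshold {
                predict(left, x)
            } else {
                predict(right, x)
            }
        }
    }
}

/// Builds a two-level tree with illustrative (not trained) thresholds.
pub fn iris_tree() -> Node {
    Node::Split {
        feature: 2, // petal length
        threshold: 2.45,
        left: Box::new(Node::Leaf(Species::Setosa)),
        right: Box::new(Node::Split {
            feature: 3, // petal width
            threshold: 1.75,
            left: Box::new(Node::Leaf(Species::Versicolor)),
            right: Box::new(Node::Leaf(Species::Virginica)),
        }),
    }
}
```

Under these assumed thresholds, `predict(&iris_tree(), &[5.1, 3.5, 1.4, 0.2])` returns `Species::Setosa`. In a zkML setting, the tree parameters would be the private model, and the proof would attest that the published prediction really is the result of running this function on the given input.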