地址:CCS20:Zero Knowledge Proofs for Decision Tree Predictions and Accuracy | Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

zkDT

Zhang et al. 开创了 zkml 的研究,专注于解决黑盒机器学习模型的准确度验证问题(验证阶段的证明)。zkDT 可以实现:模型的验证者在不 reveal 自己的决策树模型的情况下,说服别人 the prediction result is indeed made by his decision tree model on the give the input data sample. 通过扩展一个测试数据的证明到 100, 1000 个测试数据,那样就可以实现对黑盒机器学习模型的验证问题了。

本文专注于 binary decision trees on classification, 并声称更复杂的决策树模型可以通过少量修改来实现.

zkDT construction

recall 一下 zero knowledge proof system 的四个部分:Arithmetization, IOP interactive oracle proof, cryptographic commitment -->a real-world proof system (a non-interactive proof with Fiat-Shamir transformation)

zkDT 采用 Aurora: Transparent Succinct Arguments for R1CS 实现 Arithmetization, IOP interactive oracle proof 两部分。Aurora 提出针对 R1CS 算术化进行优化的 Polynomail IOP,实现更快的证明时间与更小的证明大小,并且不需要 trusted Setup。在 cryptographic commitment 方面,zkDT 通过设计 authenticated decision trees with hiding and binding properties 作为 commitment scheme 来将 idealized IOP protocol(with Random Oracle Model) compile 为真正的 zero knowledge argument with computational soundness.

注:Preliminary 写的很好,可以借鉴

binary decision trees 的预测过程为:不断与非叶子节点的对应属性 attribute 的分类阈值 threshold 进行比较,从而分类到左右节点,直到叶子节点,便是预测的类别。

非叶子节点包括: 类别,类别对应的阈值

zkDT 算法流程

  1. Given security parameter \(1^{\lambda}\),Setup (Note: the height of the decision tree is a common knowledge for both prover and verifier)

  2. make the commitment of a decision tree T with a random r

  3. Open the commitment at point \(x\).

    given a data point \(x1\), the decision tree model predict the corresponding results \(y1\). With the (x1,y1) as public inputs and the model parameter as witness, the prover will give the corresponding proof \(\pi\).

  4. valid the prediction \(y1\) is indeed the output of the exact decision model with the input \(x1\).

Contributions:

  1. 开创了 zk on ml 的研究,实现决策树验证阶段的验证以及准确度验证。
  2. 设计了 authenticated data structures scheme for the commitment of decision tree (with hiding property)
  3. 利用通用零知识证明系统 Aurora 实现决策树的预测阶段验证,将 prediction path and the hashes of the siblings 作为 the witness.
  4. 借助 permutation 以及 characteristic polynomials 来解决 circuit model 下 zkp for decision tree 随机访问开销大(证明时间,证明大小)的问题

authenticated decision trees commitment

本文提出直接运用 merkle tree proof 在所有决策树节点上,开销是 O(hlogN) hashes。(不懂,merkle tree 不是 logN 吗?),于是根据 ADS(Authenticated Data Structure),提出 authenticated decision trees (ADT) ,它具有 completeness, computational soundness benefiting from collision resistance of the cryptographic hash functions, and hiding property benefiting the blinding factor(致盲因子,just a secure random number).

题外话,Authenticated Data Structure 是一大类,merkle tree 是其中一种。Past work on ADSs has focused on particular data structures, 最早被提出的便是针对与树结构的 ADS, 也就是 merkle tree.

proving the valid of prediction with zero Knowledge

prover 直接把 ADT 的 验证 path 给到 verifier,verifier 就可以验证,但是这样就暴露了一条路径上的值。

解决办法:直接用通用的 zkp,将预测路径和 sibling 的 hashes 设置成 witness (private)即可。

By applying generic zero knowledge proofs on this relationship, the prediction path and the hashes of the siblings remain confidential as the witness, a

zkp for decision tree 在 circuit model 下开销大(证明时间和证明大小)

大多数通用 zkp 协议都是基于将运算表示为算术电路的,将决策树表示为算术电路开销很大。(节点预测时,根据节点分类属性随机 access 数据点的对应属性,是 classical random access operation,算数电路对这种操作不友好?不懂)

遇事不决 GPT:

Classical random access operations are not friendly with arithmetic circuits because they require the circuit to perform a large number of memory accesses, which can be computationally expensive . In an arithmetic circuit, each gate computes a linear combination of its input wires, and the output of the circuit is a linear combination of the input wires .

To perform a random access operation on an arithmetic circuit, one would need to access a specific wire in the circuit, which would require traversing the circuit from the input wires to the desired wire . This traversal would involve performing a large number of linear combinations, which can be computationally expensive To avoid this computational overhead, researchers have developed alternative methods for performing random access operations on arithmetic circuits, such as the use of permutation arguments and the use of interactive oracle proofs (IOPs)

本文采用的 alternative methods 仿佛就是 the use of permutation arguments?将 data sample 的属性顺序 premute 调整为 prediction path 上节点的属性 v.attr 顺序,二者顺序一致,便可一一对应,不用 random access 操作了,便降低了 decision tree 在 circuit model 上的开销。

the permutation \(\overline{a}\) of the data sample a ordered by v.att of the nodes on the prediction path as part of the witness With the help of the permutation \(\overline{a}\) of the data sample a, this can be efficiently implemented using an arithmetic circuit.

不过,又引入了 permutation test 的问题,毕竟胡乱 permutation 混过验证也不得行,然后本文采用如下方法:

The construction is inspired by the RAM-based zero knowledge proof systems and we also apply the techniques of characteristic polynomials proposed in [15, 43, 45] to check permutations.

因此,zkDT 对一个数据点的验证包括三个部分:

  1. validating the prediction algorithm of the decision tree, 这就是验证预测阶段的执行,是主要部分
  2. checking the permutation between a and \(\overline{a}\),这个为了减小开销,引入的 permutation 带来的额外验证步骤。
  3. checking the validity of the prediction path in the committed decision tree, 这个是 open 之前的 commitment 看 prediction path 是否被篡改。

步骤 1 和步骤 3 是 zkml 的主要步骤,步骤二是额外步骤。

zkDT 代码复现

环境:Ubuntu 18.04 and 20.04

地址:TAMUCrypto/ZKDT_release,直接 git clone 即可

  1. 安装缺少的包:

    1
    2
    3
    sudo apt-get install -y libprocps-dev # libprocps
    sudo apt-get install libssl-dev # libcrypto
    sudo apt-get install pkg-config

    针对报错:

    No Boost libraries were found. You may need to set BOOST_LIBRARYDIR to the directory containing Boost libraries or BOOST_ROOT to the location of Boost.

    1
    sudo apt-get install libboost-all-dev

  2. 编译

    1
    2
    3
    cd ZKDT
    mkdir build && cd build && cmake ..
    make

  3. 运行

    在 build 的 src 文件夹下直接运行编译生成的二进制文件即可。

    1
    ./build/src/dt_batch

参考资料

  1. Authenticated Data Structures (Generically) - The PL Enthusiast
  2. c++ - CMake is not able to find BOOST libraries - Stack Overflow
  3. C++ Standards Support in GCC - GNU Project