Practical Differentially Private and Byzantine-resilient Federated Learning

Practical Differentially Private and Byzantine-resilient Federated Learning ，源代码: zihangxiang/Practical-Differentially-Private-and-Byzantine-resilient-Federated

发表在了 Proceedings of the ACM on Management of Data (PACMMOD) 2023, 看知乎上说是 SIGMOD 23 的文章改为在 PACMMOD 期刊。

Backgroud and Motivation

如何在实现 privacy 的前提下实现 Byzantine robust FL, 关于使用差分隐私保护模型参数和使用 Byzantine-robust 方式来缓解毒化等攻击的研究都已经有了，但是如何同时实现这两个目标还没有较好的结合。

Contributions

本文提出将

DP-SGD
类似于 NDSS'21 FLTrust（normalize and use cosine similarity）但是衡量指标改为向量内积的检测攻击

相结合，实现了 DP 和 Byzantine-robust 的联邦学习。

Byzantine_detect { DP{W_i} }(先加 DP，然后再检测恶意攻击)

第一阶段检查

由于随机噪声的引入会影响现存 byzantine attacks 检测的效果，现存有些工作通过增大 batch size 来限制噪声的影响。本文使用 small batch size 来作为第一阶段的聚合。第一阶段的 small training batch size for each worker 和第一阶段检查，可以防止一些攻击

具体方式

Different from existing works that adopt big batch size (\(10^2 −10^6\)), we adopt small batch size \(𝑏_𝑐\) (typically 8 or 16).
idea

small batch size 可以使得每次的上传结果中 DP noise 占据主导，根据加的高斯噪声，从而推导出上传结果的统计特性来检测攻击。此外，尽管每个 worker 的上传结果中噪声占据主导，但是所有的上传结果平均之后，便会减小噪声的方差，平均梯度到非零期望，便可以实现较好的效用。
细节

每个客户端的上传结果中 DP noise 占据主导，因此可以将上传结果看作噪声采样的高斯分布 N(0, \(\sigma^2\)), 根据高斯分布和卡方分布的特性，可以得知 \(\frac{||g||^2}{\sigma^2}\) 服从卡方分布 chi-squared distribution。根据中心极限定理，可以将 \(∥𝑔∥^2\) 近似为 Gaussian distribution: \(N(𝜎^2𝑑, 2𝜎^4𝑑)\), 从而 \(||g||^2\) 落在 3𝜎 范围之内

通过 1. 检查\(||g||^2\) 是否落在 3𝜎 范围之内以及 2. 通过 Kolmogorov–Smirnov test (KS test) 检测 g 是否是 Gaussian distribution 来检测攻击

第二阶段聚合

发现 bounding gradient sensitivity by normalizing 更适合 DP learning。通过对 normalize 和向量内积来作为第二阶段的检查。

use normalizing instead of clipping to bound per-example gradient norm 和第二阶段检查可以阻止绕过第一阶段的其他攻击

具体方式

Replace the clipping operation in vanilla DP-SGD by normalization to normalization, which is change the multiplication factor from Factor = min{1, \(\frac{𝐶}{∥g∥_2}\)} (𝐶 is called the clipping threshold) to Factor = \(\frac{1}{∥g∥_2}\), which normalizes the gradients to be of unit 𝑔 length.
idea

通过 server 维护的 trusted 梯度与每个客户端的梯度计算向量内积作为 score 来检测梯度更新的方向，选择 top \(⌈𝛾𝑛⌉\) scores (要求 server 预先确定 n 个 worker 中至少有 𝛾 比例的良性节点)的平均数作为筛选阈值。

局限性

要求 server 预先确定 n 个 worker 中至少有 𝛾 比例的良性节点

the server has a prior belief that at least ⌈𝛾𝑛⌉ workers are honest,
只能应用于高斯噪声
first aggregation 的约束看起来并不 solid？
1. 本文仿佛是先 DP，然后使用 FLTrust 的 normalization 和相似度换成内积直接组合而成。第一阶段的确实增加了约束，但是我没有看到该约束和第二阶段约束的互补性在哪里
虽然作者在文章中 4.7 Discussion 讲到第二阶段可能会因为噪声的随机性效果不好，但是使用第一阶段会好的原因，它只解释了因为 norm 了所以威胁小了，并没有说第一阶段的两个约束具体约束了哪些攻击,虽然确实多了一些正确约束。
1. 第二阶段使用的客户端梯度是第一阶段之后的，但是还是加了噪声，byzantine attack 检测也会不准确吧。

参考资料

如何评价 SIGMOD23 的文章改为在 PACMMOD 期刊上发表？ - 知乎