FedRecover 论文阅读笔记

Cao, Xiaoyu, et al. "Fedrecover: Recovering from poisoning attacks in federated learning using historical information." 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023. Paper link

Backgroud and Motivation

目前关于 poisoning attack 的防御主要集中在：

Prevention：对于小规模的恶意客户端，通过 robust 联邦学习方法来检测, 缓解或排除被毒化的本地模型来避免全局模型遭受毒化。
Detection: 检测大规模的恶意客户端

问题：在检测到恶意客户端之后，如何恢复被毒化攻击的全局模型？

Contributions

FedRecover idea 的出发点应该是 poisoning attacks 和 machine unlearning 的结合。但我看到本论文之前有一些文章：针对 poisioning attack 等需求，提出 federeated machine unlearning，应该是没有直接考虑 poisoning attacks 之后 recovery 这个应用场景以及对应的解决办法。

naive method: train-from-scratch 是检测到恶意客户端之后，剔除恶意的，用剩下的客户端重新训练一遍，但是通信和计算开销显然很大。

本文提出的 FedRecover 方法使用检测到恶意攻击者之前的本地模型与更新来估计以后每轮的本地模型更新，从而恢复全局模型。

the server stores the historical information, including the global models and clients’ model updates in each round, when training the poisoned global model before the malicious clients are detected.
During the recovery process, the server estimates a client’s model update in each round using its stored historical information.
当 server 认为 estimated global model update 不够精确时，它可以在进行一轮常规的 FL 训练来提升准确度。

核心公式：

integral version of the Cauchy mean value theorem:

\[ \begin{equation} f(x+y)-f(x)=\int_0^1 f^{\prime}(x+t y) y d t=\int_0^1 f^{\prime}(x+t y) d t \cdot y \end{equation} \]

from Theorem 4.2 on page 341 of S. Lang, Real and Functional Analysis. Springer, 1993.

本文的核心公式：

$$ \[\begin{equation} \begin{aligned} g_t^i=\overline{g}_t^i+\mathbf{H}_t^i\left(\hat{w}_t-\overline{w}_t\right), where\\ \mathbf{H}_t^i=\int_0^1 \mathbf{H}\left(\overline{w}_t+z\left(\hat{w}_t-\overline{w}_t\right)\right) dz \end{aligned} \end{equation}\] $$

其中

$\overline{w}_t$ to denote the original global
$\overline{g}_t^i$ denote the original model update reported by the $i$-th client in the $t$-th round
$\hat{w}^t$ denotes the recovered global model in the $t$-th round of FedRecover.
$g^t_i$ to denote the $i$-th client’s exact model update in the tth round of the recovery process if the client computes it
梯度 g 是模型参数 w 的函数，$g = Q(w)$，具体函数形式取决于使用的 loss function 以及模型的结构。

\[ Q'(\overline{w}_t+z\left(\hat{w}_t-\overline{w}_t\right)) =\mathbf{H}\left(\overline{w}_t+z\left(\hat{w}_t-\overline{w}_t\right)\right)\left(\hat{w}_t-\overline{w}_t\right) \]

则有：

\[ \begin{equation} \begin{aligned} g_t^i - \overline{g}_t^i &= \mathbf{H}_t^i\left(\hat{w}_t-\overline{w}_t\right)\\ Q(1)-Q(0) &= \int_0^1 Q^{\prime}(z) dz \end{aligned} \end{equation} \]

由于直接计算 integrated Hessian matrix 比较困难，本文使用了 L-BFGS 算法来近似计算。

L-BFGS 算法属于最优化算法方面的.

L-BFGS（Limited-memory Broyden-Fletcher-Goldfarb-Shanno）算法是一种常用于无约束优化问题的优化算法。它是 BFGS 算法的一种变体，用于解决高维问题，特别是当存储 Hessian 矩阵不可行时。

BFGS 算法是一种拟牛顿方法，旨在寻找多元函数的局部极小值。它使用梯度信息来近似目标函数的海森矩阵（Hessian matrix），从而进行优化。

L-BFGS 算法通过限制存储和计算 Hessian 矩阵的内存需求，解决了高维问题中的性能问题。L-BFGS 只需要存储少量的历史信息，并通过迭代来逐步逼近 Hessian 矩阵的逆。这使得 L-BFGS 在处理大规模问题时表现优异。