1 Introduction

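In the previous post, "Learning Theory: Predictor-Rejector Multi-Class Abstention Learning", we introduced the basic concepts and methods of learning with abstention. These included the following single-stage predictor-rejector abstention loss $L_{\text{abst}}$ for multi-class classification:

\[L_{\text{abst}}(h, r, x, y) = \underbrace{\mathbb{I}_{\text{h}(x) \neq y}\mathbb{I}_{r(x) > 0}}_{\text{no abstention}} + \underbrace{c(x) \mathbb{I}_{r(x)\leqslant 0}}_{\text{abstention}} \]

Here $(x, y)\in \mathcal{X}\times \mathcal{Y}$ with label set $\mathcal{Y} = \{1, \cdots, n\}$ ($n\geqslant 2$). The pair $(h, r)\in \mathcal{H}\times\mathcal{R}$ is a predictor-rejector pair, where $\mathcal{H}$ is a family of score functions from $\mathcal{X}$ to $\mathbb{R}^n$ and $\mathcal{R}$ is a family of functions from $\mathcal{X}$ to $\mathbb{R}$. The map $\text{h}(x) = \text{arg max}_{y\in \mathcal{Y}} {h(x)}_y$ outputs the predicted label of instance $x$. To simplify the discussion, from here on we assume the cost is a constant $c\in (0, 1)$.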
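To make the definition concrete, here is a minimal Python sketch of $L_{\text{abst}}$ for a single example under the constant-cost assumption. The helper names `predict` and `abstention_loss` are hypothetical. Labels are 0-based, and ties in the arg max are broken toward the smallest index; this is an assumption, since the text does not fix a tie-breaking rule.

```python
def predict(scores):
    """h(x): index of the largest score; ties go to the smallest index."""
    return max(range(len(scores)), key=lambda k: scores[k])


def abstention_loss(scores, r, y, c):
    """Target loss L_abst(h, r, x, y) with a constant cost c in (0, 1).

    scores: the score vector h(x) as a list of n floats (labels 0..n-1)
    r:      the rejector value r(x); the model abstains when r <= 0
    y:      the true label (0-based)
    """
    if r > 0:
        # No abstention: pay the multi-class 0-1 loss.
        return float(predict(scores) != y)
    # Abstention: pay the constant cost c.
    return c


# The prediction is label 0, so accepting an example with y = 1 costs 1.
print(abstention_loss([2.0, 0.5, -1.0], r=1.0, y=1, c=0.3))   # 1.0
print(abstention_loss([2.0, 0.5, -1.0], r=-0.5, y=1, c=0.3))  # 0.3
```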

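Let $\mathcal{l}$ be a surrogate loss for the multi-class 0-1 loss over the label set $\mathcal{Y}$. On top of it we define the abstention surrogate loss $L$:

\[L(h, r, x, y) = \mathcal{l}(h, x, y)\phi(-\alpha r(x)) + \psi(c) \phi(\beta r(x)) \]

Here $\psi$ is a non-decreasing function, $\phi$ is a non-increasing auxiliary function that upper-bounds $z \mapsto \mathbb{I}_{z \leqslant 0}$, and $\alpha$, $\beta$ are positive constants. For simplicity we mainly analyze $\phi(z) = \exp(-z)$ below, although a similar analysis applies to other choices of $\phi$.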
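A minimal sketch of the surrogate $L$ with $\phi(z) = \exp(-z)$, assuming the value of $\mathcal{l}(h, x, y)$ has already been computed for the example. The function names are hypothetical, and `psi` defaults to the identity.

```python
import math


def phi_exp(z):
    """Auxiliary function phi(z) = exp(-z); it upper-bounds 1{z <= 0}."""
    return math.exp(-z)


def surrogate_abstention_loss(ell, r, c, alpha=1.0, beta=1.0, psi=lambda z: z):
    """L(h, r, x, y) = ell * phi(-alpha * r) + psi(c) * phi(beta * r).

    ell: value of the multi-class surrogate l(h, x, y) at this example
    r:   rejector value r(x)
    c:   constant abstention cost; psi must be non-decreasing
    """
    return ell * phi_exp(-alpha * r) + psi(c) * phi_exp(beta * r)


# Accepting (r > 0) up-weights ell and down-weights the cost term.
print(surrogate_abstention_loss(0.4, r=1.0, c=0.3))
```

Since $\phi(-\alpha r)$ grows with $r$ while $\phi(\beta r)$ shrinks, a large positive $r(x)$ makes the surrogate emphasize the classification term, mirroring the role of $\mathbb{I}_{r(x) > 0}$ in $L_{\text{abst}}$.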

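In the previous post we also stated the $(\mathcal{H}, \mathcal{R})$-consistency bounds satisfied by the single-stage surrogate losses:

Theorem 1 ($(\mathcal{H}, \mathcal{R})$-consistency bounds for single-stage surrogate losses). Assume that $\mathcal{H}$ is symmetric and complete and that $\mathcal{R}$ is complete. Then for $\alpha=\beta$, and either $\mathcal{l} = \mathcal{l}_{\text{mae}}$ or $\mathcal{l} = \mathcal{l}_{\rho}$ with $\psi(z) = z$, or $\mathcal{l} = \mathcal{l}_{\rho - \text{hinge}}$ with $\psi(z) = nz$, the following $(\mathcal{H}, \mathcal{R})$-consistency bound holds for all $h\in \mathcal{H}$, $r\in \mathcal{R}$ and any distribution:

\[R_{L_{\text{abst}}}(h, r) - R_{L_{\text{abst}}}^{*}(\mathcal{H}, \mathcal{R}) + M_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}) \leqslant \Gamma\left(R_L(h, r) - R_{L}^{*}(\mathcal{H}, \mathcal{R}) + M_{L}(\mathcal{H}, \mathcal{R})\right) \]

where $\Gamma (z) = \max\{2n\sqrt{z}, nz\}$ for $\mathcal{l} = \mathcal{l}_{\text{mae}}$, $\Gamma (z) = \max\{2\sqrt{z}, z\}$ for $\mathcal{l}=\mathcal{l}_{\rho}$, and $\Gamma (z) = \max\{2\sqrt{nz}, z\}$ for $\mathcal{l} = \mathcal{l}_{\rho - \text{hinge}}$.

In the previous post, however, we did not give the detailed proof of this bound. In this post we work through how to prove the theorem (conveniently, my advisor has also asked me to study the analysis sections of these papers carefully and to master the proof techniques for the single-stage method).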

2 Preliminaries for the Analysis

Assume the labeled sample $S=((x_1, y_1), \cdots, (x_m, y_m))$ is drawn i.i.d. from $p(x, y)$. For the target loss $L_{\text{abst}}$ and the surrogate loss $L$ we define, respectively, the $L_{\text{abst}}$-expected abstention loss $R_{L_{\text{abst}}}(h, r)$ (the generalization error of the target loss) and the $L$-expected abstention surrogate loss $R_{L}(h, r)$ (the generalization error of the surrogate loss):

\[R_{L_{\text{abst}}}(h, r) = \mathbb{E}_{p(x, y)}\left[L_{\text{abst}}(h, r, x, y)\right], \quad R_{L}(h, r) = \mathbb{E}_{p(x, y)}\left[L(h, r, x, y)\right] \]

Let $R_{L_{\text{abst}}}^{*}(\mathcal{H}, \mathcal{R}) = \inf_{h\in \mathcal{H}, r\in \mathcal{R}}R_{L_{\text{abst}}}(h, r)$ and $R_{L}^{*}(\mathcal{H}, \mathcal{R}) = \inf_{h\in \mathcal{H}, r\in \mathcal{R}}R_{L}(h, r)$ denote the infima of $R_{L_{\text{abst}}}$ and $R_L$ over $\mathcal{H}\times \mathcal{R}$.

To simplify the analysis further, we use the product rule $p(x, y) = p(x)p(y\mid x)$ to rewrite $R_L(h, r)$ as:

\[R_{L}(h, r) = \mathbb{E}_{p(x, y)}\left[L(h, r, x, y)\right] = \mathbb{E}_{p(x)}\underbrace{\left[\mathbb{E}_{p(y\mid x)}\left[L(h, r, x, y)\right]\right]}_{\text{conditional risk }C_L} \]

We call the inner conditional expectation the conditional risk of the surrogate loss $L$ (also called its pointwise risk). Since $y$ has been integrated out, this term depends only on $h$, $r$ and $x$, so we denote it by $C_L(h, r, x)$:

\[C_L(h, r, x) = \mathbb{E}_{p(y\mid x)}\left[L(h, r, x, y)\right] = \sum_{y\in \mathcal{Y}}p(y\mid x)L(h, r, x, y) \]
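Numerically, the conditional risk is just a $p(\cdot\mid x)$-weighted sum over labels. A minimal sketch, assuming labels $0, \dots, n-1$ and a loss already evaluated at a fixed $x$ with $h$ and $r$ held fixed (the name `conditional_risk` is hypothetical):

```python
def conditional_risk(loss_of_label, p):
    """C(h, r, x) = sum_y p(y|x) * loss(h, r, x, y) at a fixed x.

    loss_of_label: function y -> loss value at this x (h and r held fixed)
    p:             list of conditional probabilities p(y|x), y = 0..n-1
    """
    return sum(p_y * loss_of_label(y) for y, p_y in enumerate(p))


p = [0.5, 0.3, 0.2]
# 0-1 loss of a classifier that predicts label 0: equals 1 - p(0|x).
print(conditional_risk(lambda y: float(y != 0), p))
```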

We write $C^*_L(\mathcal{H}, \mathcal{R}, x) = \inf_{h\in \mathcal{H}, r\in \mathcal{R}} C_L(h, r, x)$ for the best-in-class conditional risk of $L$. Likewise, $C_{L_{\text{abst}}}$ denotes the conditional risk of the target loss $L_{\text{abst}}$, and $C^*_{L_{\text{abst}}}$ its best-in-class conditional risk.

From $R_{L}^*(\mathcal{H}, \mathcal{R})$ and $C^*_L(\mathcal{H}, \mathcal{R}, x)$ we define the minimizability gap:

\[M_L(\mathcal{H}, \mathcal{R}) = R_{L}^*(\mathcal{H}, \mathcal{R}) - \mathbb{E}_{p(x)}\left[C_L^*(\mathcal{H}, \mathcal{R}, x)\right] \]

$M_{L_{\text{abst}}}$ is defined in the same way.

With these quantities, the $(\mathcal{H}, \mathcal{R})$-consistency bound to be proved can be rewritten as:

\[\begin{aligned} & R_{L_{\text{abst}}}(h, r) - R_{L_{\text{abst}}}^{*}(\mathcal{H}, \mathcal{R}) + M_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}) \leqslant \Gamma\left(R_L(h, r) - R_{L}^{*}(\mathcal{H}, \mathcal{R}) + M_{L}(\mathcal{H}, \mathcal{R})\right) \\ \iff & R_{L_{\text{abst}}}(h, r) - \mathbb{E}_{p(x)}\left[C_{L_{\text{abst}}}^*(\mathcal{H}, \mathcal{R}, x)\right] \leqslant \Gamma\left(R_{L}(h, r) - \mathbb{E}_{p(x)}\left[C_{L}^*(\mathcal{H}, \mathcal{R}, x)\right]\right) \end{aligned} \]

Since $R_{L_{\text{abst}}}(h, r) = \mathbb{E}_{p(x)}\left[C_{L_{\text{abst}}}(h, r, x)\right]$ and $R_L(h, r) = \mathbb{E}_{p(x)}\left[C_{L}(h, r, x)\right]$, this inequality is equivalent to

\[\mathbb{E}_{p(x)}\underbrace{\left[C_{L_{\text{abst}}}(h, r, x) - C_{L_{\text{abst}}}^*(\mathcal{H}, \mathcal{R}, x)\right]}_{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)} \leqslant \Gamma\left(\mathbb{E}_{p(x)}\underbrace{\left[C_{L}(h, r, x) - C_{L}^*(\mathcal{H}, \mathcal{R}, x)\right]}_{\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)}\right) \]

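We abbreviate the integrands on the two sides as $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$ and $\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)$; the latter is called the calibration gap. If $\Gamma$ is concave and non-decreasing, Jensen's inequality gives:

\[\mathbb{E}_{p(x)}\left[\Gamma\left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\right] \leqslant \Gamma\left(\mathbb{E}_{p(x)}\left[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right]\right) \]

Hence it suffices to prove the following pointwise inequality for every $x$; taking expectations and applying the inequality above then yields the original bound:

\[\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right) \]

Strictly speaking, the functions $\Gamma$ in Theorem 1 are not concave on all of $[0, +\infty)$: for example, $\max\{2\sqrt{z}, z\}$ switches from slope $1/2$ to slope $1$ at $z = 4$. This does no harm. The linear branch exceeds the square-root branch only for $z > 4$ (for $\mathcal{l}_{\text{mae}}$ and $\mathcal{l}_{\rho}$) or $z > 4n$ (for $\mathcal{l}_{\rho-\text{hinge}}$), and there the square-root branch is already larger than $1 \geqslant \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$. So the pointwise inequality also holds with the concave square-root branch alone ($2n\sqrt{z}$, $2\sqrt{z}$ or $2\sqrt{nz}$); Jensen's inequality applies to that branch, and since it is at most $\Gamma$, the conclusion holds with $\Gamma$ as well.

As we will see, the key step in proving the $(\mathcal{H}, \mathcal{R})$-consistency bound is showing that $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$ is bounded by $\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$.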

3 Expressing $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$

We first work out $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = C_{L_{\text{abst}}}(h, r, x) - C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x)$. By definition,

\[\begin{aligned} C_{L_{\text{abst}}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x)L_{\text{abst}}(h, r, x, y) \\ &= \sum_{y\in \mathcal{Y}}p(y\mid x) \mathbb{I}_{\text{h}(x) \neq y}\mathbb{I}_{r(x) > 0} + c\, \mathbb{I}_{r(x)\leqslant 0} \end{aligned} \]

Since the expectation is over $y$ only and the abstention term does not depend on $y$ (and $\sum_{y\in \mathcal{Y}} p(y\mid x) = 1$), only $\mathbb{I}_{\text{h}(x) \neq y}$ needs to be weighted and summed in the last line. To simplify $C_{L_{\text{abst}}}(h, r, x)$ further, we distinguish the sign of $r(x)$:

$r(x) > 0$: then $C_{L_{\text{abst}}}(h, r, x) = \sum_{y\in \mathcal{Y}}p(y\mid x) \mathbb{I}_{\text{h}(x) \neq y} = 1 - p(\text{h}(x)\mid x)$.
$r(x) \leqslant 0$: then $C_{L_{\text{abst}}}(h, r, x) = c$.

Next we work out $C^*_{L_{\text{abst}}}$. We assume the rejector family $\mathcal{R}$ is complete (i.e., $\{r(x): r\in \mathcal{R}\} = \mathbb{R}$ for every $x\in \mathcal{X}$), which implies that $\mathcal{R}$ is also regular for abstention (i.e., for every $x\in \mathcal{X}$ there exist $r_1, r_2\in \mathcal{R}$ with $r_1(x) > 0$ and $r_2(x) \leqslant 0$). We then have

\[\begin{aligned} C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) &= \inf_{h\in \mathcal{H}, r\in \mathcal{R}}C_{L_{\text{abst}}}(h, r, x)\\ & = \min \left\{\min_{h\in \mathcal{H}}\left(1 - p\left( \text{h}(x)\mid x\right)\right), c\right\}\\ & = 1 - \max\left\{\max_{h\in \mathcal{H}}p\left(\text{h}(x)\mid x\right), 1 - c\right\} \end{aligned} \]

We also assume $\mathcal{H}$ is symmetric and complete (see the post "Learning Theory: Predictor-Rejector Multi-Class Abstention Learning" for the definitions), so that $\{\text{h}(x): h\in \mathcal{H}\} = \mathcal{Y}$, and therefore

\[C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) = 1 - \max\left\{\max_{y\in \mathcal{Y}}p\left(y\mid x\right), 1 - c\right\} \]

To make $C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x)$ explicit, we compare $\max_{y\in \mathcal{Y}}p(y\mid x)$ with $1 - c$:

$\max_{y\in \mathcal{Y}}p(y\mid x) > 1 - c$: then $C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) = 1 - \max_{y\in \mathcal{Y}}p(y\mid x)$.
$\max_{y\in \mathcal{Y}}p(y\mid x) \leqslant 1 - c$: then $C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) = c$.

Combining the two case analyses, we obtain:

\[\begin{aligned} \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_{L_{\text{abst}}}(h, r, x) - C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) \\ & = \left\{\begin{aligned} &\max_{y\in \mathcal{Y}}p(y\mid x) - p(\text{h}(x)\mid x)\quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) > 1 - c,\ r(x) > 0 \\ &1 - c - p(\text{h}(x)\mid x) \quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) \leqslant 1 - c,\ r(x) > 0 \\ &0 \quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) \leqslant 1 - c,\ r(x) \leqslant 0 \\ &\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c \quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) > 1 - c,\ r(x) \leqslant 0 \end{aligned}\right. \end{aligned} \]
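As a sanity check on the four-case table, the sketch below computes $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}$ directly as $C_{L_{\text{abst}}} - C^*_{L_{\text{abst}}}$ from the two closed forms above (constant cost $c$, symmetric and complete $\mathcal{H}$, complete $\mathcal{R}$), so its values can be compared branch by branch with the table. The function name and the 0-based labels are assumptions of this sketch.

```python
def delta_c_abst(p, pred, r, c):
    """Calibration gap of L_abst at a point x.

    p:    list of p(y|x) for y = 0..n-1
    pred: predicted label h(x)
    r:    rejector value r(x)
    c:    constant abstention cost in (0, 1)
    """
    risk = 1.0 - p[pred] if r > 0 else c          # C_{L_abst}(h, r, x)
    best = 1.0 - max(max(p), 1.0 - c)             # C*_{L_abst}(H, R, x)
    return risk - best


# max p = 0.7 > 1 - c = 0.6 and we accept with h(x) = 1: gap is 0.7 - 0.2.
print(delta_c_abst([0.7, 0.2, 0.1], pred=1, r=1.0, c=0.4))
```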

4 Expressing $\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)$

4.1 Preparing the Case Analysis

Next we work out $\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) = C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x)$. By definition, when $\alpha = \beta$ and $\phi(z) = \exp(-z)$, we have:

\[\begin{aligned} C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x)L(h, r, x, y) \\ &= \sum_{y\in \mathcal{Y}}p(y\mid x) \mathcal{l}(h, x, y)e^{\alpha r(x)} + \psi(c) e^{-\alpha r(x)} \end{aligned} \]

Again, since the expectation is over $y$ only, only $\mathcal{l}(h, x, y)$ needs to be weighted and summed in the last line. Below we treat $C_L(h, r, x)$ separately for the following three choices of $\mathcal{l}$ and $\psi$:

$\mathcal{l} = \mathcal{l}_{\text{mae}}$, $\psi(z) = z$;
$\mathcal{l} = \mathcal{l}_{\rho}$, $\psi(z) = z$;
$\mathcal{l} = \mathcal{l}_{\rho-\text{hinge}}$, $\psi(z) = nz$.

Note: the three losses are defined in the post "Learning Theory: Predictor-Rejector Multi-Class Abstention Learning"; for convenience I repeat the definitions here:

Mean absolute error loss: $\mathcal{l}_{\text{mae}}(h, x, y) = 1 - \frac{e^{{h(x)}_y}}{\sum_{y^{\prime}\in \mathcal{Y}}e^{{h(x)}_{y^{\prime}}}}$;

Constrained $\rho$-hinge loss: $\mathcal{l}_{\rho-\text{hinge}}(h, x, y) = \sum_{y^{\prime}\neq y}\phi_{\rho-\text{hinge}}(-{h(x)}_{y^{\prime}})$, $\rho > 0$, where $\phi_{\rho-\text{hinge}}(z) = \max\{0, 1 - \frac{z}{\rho}\}$ is the $\rho$-hinge loss, subject to the constraint $\sum_{y\in \mathcal{Y}}{h(x)}_y=0$;

$\rho$-margin loss: $\mathcal{l}_{\rho}(h, x, y) = \phi_{\rho}({\rho_h (x, y)})$, where $\rho_{h}(x, y) = {h(x)}_y - \max_{y^{\prime} \neq y}{h(x)}_{y^{\prime}}$ is the confidence margin and $\phi_{\rho}(z) = \min\{\max\{0, 1 - \frac{z}{\rho}\}, 1\}$, $\rho > 0$, is the $\rho$-margin loss.
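For experimentation, the three losses can be written directly from these definitions. This is a minimal sketch with 0-based labels and plain Python lists; the function names are hypothetical, and `l_rho_hinge` assumes (but does not enforce) that the scores sum to zero.

```python
import math


def l_mae(h, y):
    """Mean absolute error loss: 1 - softmax(h)_y."""
    top = max(h)                                  # shift for numerical stability
    exps = [math.exp(v - top) for v in h]
    return 1.0 - exps[y] / sum(exps)


def l_rho_hinge(h, y, rho):
    """Constrained rho-hinge loss: sum over y' != y of max(0, 1 + h_{y'} / rho).

    Assumes (without checking) that the scores satisfy sum(h) == 0.
    """
    return sum(max(0.0, 1.0 + h[k] / rho) for k in range(len(h)) if k != y)


def margin(h, y):
    """Confidence margin rho_h(x, y) = h_y - max over y' != y of h_{y'}."""
    return h[y] - max(h[k] for k in range(len(h)) if k != y)


def l_rho(h, y, rho):
    """rho-margin loss: min(max(0, 1 - margin / rho), 1)."""
    return min(max(0.0, 1.0 - margin(h, y) / rho), 1.0)


h = [1.0, 0.0, -1.0]  # a zero-sum score vector for n = 3
print([round(l_mae(h, y), 4) for y in range(3)])
print([l_rho_hinge(h, y, rho=1.0) for y in range(3)])
print([l_rho(h, y, rho=1.0) for y in range(3)])
```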

4.2 $\mathcal{l} = \mathcal{l}_{\text{mae}}$, $\psi(z) = z$

In this case $C_L(h, r, x)$ can be written as:

\[\begin{aligned} C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \underbrace{\left(1 - \frac{e^{{h(x)}_y}}{\sum_{y^{\prime}\in \mathcal{Y}}e^{{h(x)}_{y^{\prime}}}}\right)}_{\mathcal{l}_{\text{mae}}}e^{\alpha r(x)} + c e^{-\alpha r(x)} \\ &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} \end{aligned} \]

where $s_h(x, y) = \frac{e^{{h(x)}_y}}{\sum_{y^{\prime}\in \mathcal{Y}}e^{{h(x)}_{y^{\prime}}}}$ is the softmax score.

Therefore

\[\begin{aligned} C_L^*(\mathcal{H}, \mathcal{R}, x) &= \inf_{h\in \mathcal{H}, r\in\mathcal{R}} \left\{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)}\right\} \\ &= \inf_{r\in\mathcal{R}} \left\{\inf_{h\in \mathcal{H}}\left\{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)\right\}e^{\alpha r(x)} + c e^{-\alpha r(x)}\right\} \end{aligned} \]

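Since $\mathcal{H}$ is assumed symmetric and complete, we have

\[\begin{aligned} &\inf_{h\in \mathcal{H}}\left\{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)\right\} \\ &= 1 - \sup_{h\in \mathcal{H}}\sum_{y\in \mathcal{Y}}p(y\mid x)s_h(x, y) \\ &= 1 - \max_{y\in \mathcal{Y}}p(y\mid x) \end{aligned} \]

The last equality holds because $s_h(x, \cdot)$ is a probability vector over $\mathcal{Y}$ (so the sum is at most $\max_{y\in \mathcal{Y}}p(y\mid x)$), and by symmetry and completeness it can be made arbitrarily close to the one-hot vector on $\text{arg max}_{y\in \mathcal{Y}}p(y\mid x)$.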

Note: in fact, for any $h\in \mathcal{H}$,

\[\begin{aligned} &\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\ &= \max_{y\in \mathcal{Y}} p(y\mid x) - \sum_{y\in \mathcal{Y}}p(y\mid x)s_h(x, y) \\ &= \max_{y\in \mathcal{Y}} p(y\mid x) - \left(p\left(\text{h}(x)\mid x\right)s_h\left(x, \text{h}(x)\right) + \sum_{y\neq \text{h}(x)}p(y\mid x)s_h(x, y)\right) \\ &\geqslant \max_{y\in \mathcal{Y}} p(y\mid x) - \left(p\left(\text{h}(x)\mid x\right)s_h\left(x, \text{h}(x)\right) + \max_{y\in \mathcal{Y}}p(y\mid x)\left(1 - s_h\left(x, \text{h}(x)\right)\right)\right) \\ &= s_h\left(x, \text{h}(x)\right)\left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) \\ &\geqslant \frac{1}{n} \left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) \end{aligned} \]

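The last step uses $s_h(x, \text{h}(x)) = \max_{y\in \mathcal{Y}}s_h(x, y) \geqslant \frac{1}{n}$, since $\text{h}(x)$ maximizes the scores and the softmax is monotone. We will use this inequality several times in the proofs below. One consequence is that the left-hand side is nonnegative for every $h$, and it can vanish only if $p(\text{h}(x)\mid x) = \max_{y\in \mathcal{Y}} p(y\mid x)$, that is, only if $h$ agrees with the Bayes-optimal classifier at $x$. Intuitively, driving $\mathbb{E}_{p(y\mid x)}\left[\mathcal{l}_{\text{mae}}\right]$ toward its infimum forces $h$ toward the Bayes-optimal prediction.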
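The inequality in the note above can also be spot-checked numerically. The sketch below samples random conditional distributions and score vectors and reports the smallest observed slack of $\sum_{y}p(y\mid x)(1 - s_h(x, y)) - (1 - \max_y p(y\mid x)) \geqslant \frac{1}{n}(\max_y p(y\mid x) - p(\text{h}(x)\mid x))$. Random testing is not a proof; the sampling ranges and the name `check_mae_lemma` are arbitrary choices of this sketch.

```python
import math
import random


def softmax(h):
    """Softmax scores s_h(x, .) of a score vector."""
    top = max(h)
    exps = [math.exp(v - top) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def check_mae_lemma(trials=20000, seed=0):
    """Smallest observed slack of
    sum_y p_y (1 - s_y) - (1 - max p) >= (max p - p_{h(x)}) / n."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        n = rng.randint(2, 6)
        w = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(w)
        p = [v / total for v in w]                     # p(y|x)
        h = [rng.gauss(0.0, 3.0) for _ in range(n)]    # scores h(x)_y
        s = softmax(h)
        pred = max(range(n), key=lambda k: h[k])       # h(x)
        lhs = sum(py * (1.0 - sy) for py, sy in zip(p, s)) - (1.0 - max(p))
        rhs = (max(p) - p[pred]) / n
        worst = min(worst, lhs - rhs)
    return worst


print(check_mae_lemma())  # should be >= 0 up to rounding
```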

Hence

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = \inf_{r\in\mathcal{R}} \left\{\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)}\right\} \]

Denote the expression inside the infimum by the functional $F(r)$. Its functional derivative is

\[\frac{\delta F}{\delta r(x)} = \alpha \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)e^{\alpha r(x)} - c\alpha e^{-\alpha r(x)} \]

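Setting $\frac{\delta F}{\delta r(x)} = 0$ for every $x\in \mathcal{X}$ gives $r^*(x) = -\frac{1}{2\alpha}\log \left(\frac{1 - \max_{y\in \mathcal{Y}}p(y\mid x)}{c}\right)$. Since $F$ is convex in $r(x)$, this stationary point is the minimizer, and completeness of $\mathcal{R}$ guarantees that the value $r^*(x)$ is attainable. Substituting it into $F(r)$ gives: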

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p(y\mid x))} \]
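The closed form for $C_L^*$ can be sanity-checked by comparing it with a brute-force grid minimization of $F$ at a single $x$. In this sketch, `a` stands for $1 - \max_{y}p(y\mid x)$ and `u` for the value $r(x)$; the grid bounds, the resolution and all function names are assumptions made for illustration.

```python
import math


def objective(u, a, c, alpha):
    """F evaluated at r(x) = u: a * exp(alpha u) + c * exp(-alpha u)."""
    return a * math.exp(alpha * u) + c * math.exp(-alpha * u)


def closed_form(a, c, alpha):
    """Stationary point u* = -log(a / c) / (2 alpha) and minimum 2 sqrt(a c)."""
    u_star = -math.log(a / c) / (2.0 * alpha)
    return u_star, 2.0 * math.sqrt(a * c)


def grid_min(a, c, alpha, lo=-20.0, hi=20.0, steps=40001):
    """Brute-force minimum of the objective over an evenly spaced grid."""
    return min(objective(lo + (hi - lo) * i / (steps - 1), a, c, alpha)
               for i in range(steps))


# a = 1 - max_y p(y|x) = 0.2, c = 0.3, alpha = 1.5
u_star, best = closed_form(0.2, 0.3, 1.5)
print(u_star, best, grid_min(0.2, 0.3, 1.5))
```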

Therefore

\[\begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x) \\ &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p(y\mid x))} \end{aligned} \]

To relate $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$ to $\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$, we follow the approach of Section 3 and split $\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)$ into cases according to how $\max_{y\in \mathcal{Y}} p(y\mid x)$ compares with $1 - c$ and the sign of $r(x)$:

$\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)$, $r(x) > 0$:

In this case

\[\begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \left(\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)\right)e^{\alpha r(x)} \\ &\quad + \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\ &\geqslant \left(\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)\right)e^{\alpha r(x)} \quad (\text{AM-GM inequality}) \\ &\geqslant \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right) \quad (r(x) > 0) \\ &\geqslant \frac{1}{n} \left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) \\ &= \frac{1}{n} \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \end{aligned} \]

(Here AM-GM is the inequality of arithmetic and geometric means, applied as $\left(1 - \max_{y}p(y\mid x)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} \geqslant 2\sqrt{c\left(1 - \max_{y}p(y\mid x)\right)}$; the next step uses $e^{\alpha r(x)} \geqslant 1$ and the nonnegativity of the bracket, and the last inequality is the one from the note above.)

Taking $\Gamma_1 (z) = nz$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_1 \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$.
$\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)$, $r(x) > 0$:

\[ \begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\ & \geqslant \underbrace{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h\left(x, y\right)\right)}_{\geqslant c}e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(\sum_{y\in \mathcal{Y}}p(y\mid x)\left(1 - s_h(x, y)\right)\right)} \\ & \geqslant \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) + c - 2\sqrt{c\left(\sum_{y\in \mathcal{Y}}p(y\mid x)\left(1 - s_h(x, y)\right)\right)} \\ &= \left(\sqrt{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)} - \sqrt{c}\right)^2 \\ &= \left(\frac{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - c}{\sqrt{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)} + \sqrt{c}}\right)^2 \\ &\geqslant \left(\frac{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right) + \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x) - c\right)}{2}\right)^2 \\ &\geqslant \left(\frac{\frac{1}{n} \left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) + \frac{1}{n}\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x) - c\right)}{2}\right)^2 \\ &= \frac{1}{4n^2}\left(1 - c - p\left(\text{h}(x)\mid x\right)\right)^2 \\ &= \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4n^2} \end{aligned} \]

(The first inequality uses $1 - \max_{y}p(y\mid x) \leqslant \sum_{y}p(y\mid x)(1 - s_h(x, y))$. The second holds because, writing $A = \sum_{y\in \mathcal{Y}}p(y\mid x)(1 - s_h(x, y)) \geqslant c$, the map $u \mapsto Ae^{\alpha u} + ce^{-\alpha u}$ is non-decreasing for $u \geqslant 0$. The later inequalities use $\sqrt{A} + \sqrt{c} \leqslant 2$ and the inequality from the note above.)

Taking $\Gamma_2 (z) = 2n\sqrt{z}$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_2 \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$.
$\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)$, $r(x) \leqslant 0$:

In this case $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = 0$, so $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma\left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$ holds for any nonnegative $\Gamma$.
$\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)$, $r(x) \leqslant 0$:

\[ \begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\ &\geqslant \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)\underbrace{e^{\alpha r(x)}}_{\leqslant 1} + c \underbrace{e^{-\alpha r(x)}}_{\geqslant 1} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\ &\geqslant 1 - \max_{y\in \mathcal{Y}}p(y\mid x) + c - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\ &= \left(\sqrt{1 - \max_{y\in \mathcal{Y}}p(y\mid x)} - \sqrt{c}\right)^2 \\ &= \left(\frac{1 - \max_{y\in \mathcal{Y}}p(y\mid x) - c}{\sqrt{1 - \max_{y\in \mathcal{Y}}p(y\mid x)} + \sqrt{c}}\right)^2 \\ &\geqslant \left(\frac{\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c}{2}\right)^2 \\ &= \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4} \end{aligned} \]

(The second inequality holds because $1 - \max_{y}p(y\mid x) < c$ makes $u \mapsto \left(1 - \max_{y}p(y\mid x)\right)e^{\alpha u} + ce^{-\alpha u}$ non-increasing for $u \leqslant 0$, so its value at $u = r(x) \leqslant 0$ is at least its value at $u = 0$.)

Taking $\Gamma_3 (z) = 2\sqrt{z}$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_3 \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$.

In summary, with $\Gamma(z) = \max\{\Gamma_1(z), \Gamma_2(z), \Gamma_3(z)\} = \max\{2n\sqrt{z}, nz\}$, the inequality $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$ holds in every case. This proves the $(\mathcal{H}, \mathcal{R})$-consistency bound of the single-stage surrogate loss for $\mathcal{l} = \mathcal{l}_{\text{mae}}$, $\psi(z) = z$.
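Because a case analysis like this is easy to get wrong, a quick randomized check of the pointwise inequality is a useful complement. The sketch below assumes the closed forms derived above ($C^*_{L_{\text{abst}}} = 1 - \max\{\max_y p(y\mid x), 1 - c\}$ and $C_L^* = 2\sqrt{c(1 - \max_y p(y\mid x))}$), samples $p(\cdot\mid x)$, $h(x)$, $r(x)$, $c$ and $\alpha$ at random, and returns the largest observed value of $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}} - \Gamma(\Delta C_{L, \mathcal{H}, \mathcal{R}})$, which should not be positive. The sampling ranges and the function name are arbitrary, and passing the check is evidence, not a proof.

```python
import math
import random


def check_mae_bound(trials=20000, seed=1):
    """Largest observed Delta C_abst - Gamma(Delta C_L) for l = l_mae, psi(z) = z."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        n = rng.randint(2, 6)
        w = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(w)
        p = [v / total for v in w]                       # p(y|x)
        h = [rng.gauss(0.0, 3.0) for _ in range(n)]      # scores h(x)_y
        r = rng.uniform(-3.0, 3.0)                       # rejector value r(x)
        c = rng.uniform(0.01, 0.99)                      # abstention cost
        alpha = rng.uniform(0.2, 3.0)
        top = max(h)
        e = [math.exp(v - top) for v in h]
        z = sum(e)
        s = [v / z for v in e]                           # softmax s_h(x, y)
        pred = max(range(n), key=lambda k: h[k])         # h(x)
        m = max(p)
        # Target calibration gap: C_{L_abst} - C*_{L_abst}.
        d_abst = (1.0 - p[pred] if r > 0 else c) - (1.0 - max(m, 1.0 - c))
        # Surrogate calibration gap with alpha = beta and phi(z) = exp(-z).
        a = sum(py * (1.0 - sy) for py, sy in zip(p, s))
        d_l = (a * math.exp(alpha * r) + c * math.exp(-alpha * r)
               - 2.0 * math.sqrt(c * (1.0 - m)))
        d_l = max(d_l, 0.0)                              # guard against rounding
        gamma = max(2.0 * n * math.sqrt(d_l), n * d_l)
        worst = max(worst, d_abst - gamma)
    return worst


print(check_mae_bound())
```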

4.3 $\mathcal{l} = \mathcal{l}_{\rho}$, $\psi(z) = z$

In this case $C_L(h, r, x)$ can be written as:

\[\begin{aligned} C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \underbrace{\min\left\{\max\left\{0, 1 - \frac{\rho_h(x, y)}{\rho}\right\}, 1\right\}}_{\mathcal{l}_{\rho}}e^{\alpha r(x)} + c e^{-\alpha r(x)} \\ &= \left(1 - \sum_{y\in \mathcal{Y}} p(y\mid x)\max\left\{\min\left\{1, \frac{\rho_h(x, y)}{\rho}\right\}, 0\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} \\ &= \left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} \end{aligned} \]

where $\rho_h(x, y) = {h(x)}_y - \max_{y^{\prime}\neq y}{h(x)}_{y^{\prime}}$ is the margin. The last equality holds because $\rho_h(x, y) \leqslant 0$ for every $y \neq \text{h}(x)$, so only $y = \text{h}(x)$ contributes, while $\rho_h(x, \text{h}(x)) \geqslant 0$.

Since $\mathcal{H}$ is assumed symmetric and complete, we have

\[\begin{aligned} &\inf_{h\in \mathcal{H}}\left\{1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right\} \\ &= 1 - \sup_{h\in \mathcal{H}}p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\} \\ &= 1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\quad \left(\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\in [0, 1]\right) \end{aligned} \]

Note: in fact, for any $h\in \mathcal{H}$,

\[\begin{aligned} &\left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right) - \left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\ &= \max_{y\in \mathcal{Y}} p(y\mid x) - \min \left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}p\left(\text{h}(x)\mid x\right) \\ &\geqslant \max_{y\in \mathcal{Y}}p\left(y\mid x\right) - p\left(\text{h}(x)\mid x\right) \end{aligned} \]

As in the $\mathcal{l}_{\text{mae}}$ proof, we will use this inequality several times below; note that here the factor $\frac{1}{n}$ is absent.

Hence, as for $\mathcal{l}_{\text{mae}}$, we have

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))} \]

Therefore

\[\begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x) \\ &= \left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))} \end{aligned} \]

To relate $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$ to $\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$, we follow the approach of the $\mathcal{l}_{\text{mae}}$ proof and split $\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)$ into cases according to how $\max_{y\in \mathcal{Y}} p(y\mid x)$ compares with $1 - c$ and the sign of $r(x)$:

$\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)$, $r(x) > 0$:

In this case, by the same argument as for $\mathcal{l}_{\text{mae}}$ (without the factor $\frac{1}{n}$),

\[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \max_{y\in \mathcal{Y}}p(y\mid x) - p(\text{h}(x)\mid x) = \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \]

(Since the steps mirror the $\mathcal{l}_{\text{mae}}$ proof, they are abbreviated here and in the cases below.)

Taking $\Gamma_1 (z) = z$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_1 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$.
$\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)$, $r(x) > 0$:

\[\begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right)} \\ &\geqslant \frac{1}{4}\left(1 - c - p\left(\text{h}\left(x\right)\mid x\right)\right)^2 \\ &= \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4} \end{aligned} \]

Taking $\Gamma_2 (z) = 2\sqrt{z}$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_2 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$.
$\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)$, $r(x) \leqslant 0$:

In this case $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = 0$, so $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$ holds for any nonnegative $\Gamma$.
$\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)$, $r(x) \leqslant 0$:

\[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \left(\frac{\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c}{2}\right)^2 = \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4} \]

Taking $\Gamma_3 (z) = 2\sqrt{z}$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_3 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$.

In summary, with $\Gamma(z) = \max\{\Gamma_1(z), \Gamma_2(z), \Gamma_3(z)\} = \max\{2\sqrt{z}, z\}$, the inequality $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$ holds in every case. This proves the $(\mathcal{H}, \mathcal{R})$-consistency bound of the single-stage surrogate loss for $\mathcal{l} = \mathcal{l}_{\rho}$, $\psi(z) = z$.
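The same kind of randomized check applies to the $\rho$-margin loss. The sketch below (hypothetical names, arbitrary sampling ranges) first confirms that the full sum $\sum_y p(y\mid x)\max\{\min\{1, \rho_h(x, y)/\rho\}, 0\}$ collapses to the single term for $y = \text{h}(x)$, then returns the largest observed $\Delta C_{L_\text{abst}} - \max\{2\sqrt{\Delta C_L}, \Delta C_L\}$ using $C_L^* = 2\sqrt{c(1 - \max_y p(y\mid x))}$. It is a numerical sanity check, not a proof.

```python
import math
import random


def margin(h, y):
    """Confidence margin rho_h(x, y) = h_y - max over y' != y of h_{y'}."""
    return h[y] - max(h[k] for k in range(len(h)) if k != y)


def check_rho_bound(trials=20000, seed=2):
    """Return (identity error, largest Delta C_abst - Gamma(Delta C_L)) for l = l_rho."""
    rng = random.Random(seed)
    identity_err, worst = 0.0, -math.inf
    for _ in range(trials):
        n = rng.randint(2, 6)
        w = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(w)
        p = [v / total for v in w]                       # p(y|x)
        h = [rng.gauss(0.0, 3.0) for _ in range(n)]      # scores h(x)_y
        r = rng.uniform(-3.0, 3.0)
        c = rng.uniform(0.01, 0.99)
        alpha = rng.uniform(0.2, 3.0)
        rho = rng.uniform(0.1, 3.0)
        pred = max(range(n), key=lambda k: h[k])         # h(x)
        m = max(p)
        # E_{p(y|x)}[1 - l_rho]: full sum versus the single term y = h(x).
        full = sum(p[y] * max(min(1.0, margin(h, y) / rho), 0.0) for y in range(n))
        single = p[pred] * min(1.0, margin(h, pred) / rho)
        identity_err = max(identity_err, abs(full - single))
        d_abst = (1.0 - p[pred] if r > 0 else c) - (1.0 - max(m, 1.0 - c))
        d_l = ((1.0 - single) * math.exp(alpha * r) + c * math.exp(-alpha * r)
               - 2.0 * math.sqrt(c * (1.0 - m)))
        d_l = max(d_l, 0.0)                              # guard against rounding
        gamma = max(2.0 * math.sqrt(d_l), d_l)
        worst = max(worst, d_abst - gamma)
    return identity_err, worst


print(check_rho_bound())
```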

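4.4 $\mathcal{l} = \mathcal{l}_{\rho-\text{hinge}}$, $\psi(z) = nz$

In this case $C_L(h, r, x)$ can be written as follows (the second step exchanges the order of summation):

\[\begin{aligned} C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \underbrace{\sum_{y^{\prime} \neq y}\max\left\{0, 1 + \frac{{h(x)}_{y^{\prime}}}{\rho}\right\}}_{\mathcal{l}_{\rho-\text{hinge}}}e^{\alpha r(x)} + nce^{-\alpha r(x)} \\ &= \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\}e^{\alpha r(x)} + nce^{-\alpha r(x)} \end{aligned} \]

Since $\mathcal{H}$ is assumed symmetric and complete, we have

\[\begin{aligned} &\inf_{h\in \mathcal{H}}\left\{\sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\}\right\} \\ &\geqslant \left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right)\inf_{h\in \mathcal{H}}\sum_{y\in \mathcal{Y}}\max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\} \\ &\geqslant \left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right)\inf_{h\in \mathcal{H}}\sum_{y\in \mathcal{Y}}\left(1 + \frac{{h(x)}_y}{\rho}\right) = n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \end{aligned} \]

where the last equality uses the constraint $\sum_{y\in \mathcal{Y}}{h(x)}_y = 0$. The bound is attained by scores with ${h(x)}_{y} = -\rho$ for $y \neq y_{\max}$ and ${h(x)}_{y_{\max}} = (n-1)\rho$, where $y_{\max} = \text{arg max}_{y\in \mathcal{Y}}p(y\mid x)$, so the infimum equals $n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right)$.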

Note: in fact, let $y_{\max} = \text{arg max}_{y\in \mathcal{Y}}p(y\mid x)$ and assume $y_{\max} \neq \text{h}(x)$ (otherwise the right-hand side below is $0$). Define $h_{\rho}$ by ${h_{\rho}(x)}_y = \left\{\begin{aligned} &{h(x)}_y\quad &\text{if } y\notin \left\{y_{\max}, \text{h}(x)\right\} \\ &-\rho \quad &\text{if } y = \text{h}(x) \\ &{h(x)}_{y_{\max}} + {h(x)}_{\text{h}(x)} + \rho \quad &\text{if } y = y_{\max} \end{aligned}\right.$, which satisfies the constraint $\sum_{y\in \mathcal{Y}}{h_{\rho}(x)}_y=0$. Then for any $h\in \mathcal{H}$:

\[\begin{aligned} &\sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\} - n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\ &\geqslant \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\min\left\{n, \max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\}\right\} - n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\ &\geqslant \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\min\left\{n, \max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\}\right\} \\ &\quad - \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\min\left\{n, \max\left\{0, 1 + \frac{{h_{\rho}(x)}_y}{\rho}\right\}\right\} \\ &= \left(p(y_{\max}\mid x) - p(\text{h}(x)\mid x)\right)\min\left\{n, 1 + \frac{{h(x)}_{\text{h}(x)}}{\rho}\right\} \\ &\geqslant \max_{y\in \mathcal{Y}}p\left(y\mid x\right) - p\left(\text{h}\left(x\right)\mid x\right) \end{aligned} \]

As with $\mathcal{l}_{\text{mae}}$ and $\mathcal{l}_{\rho}$, we will use this inequality several times below.
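The two facts used for $\mathcal{l}_{\rho-\text{hinge}}$, namely the lower bound $n(1 - \max_y p(y\mid x))$ and the inequality just derived, can be spot-checked numerically. The sketch samples zero-sum score vectors (enforcing the constraint by subtracting the mean) and also evaluates the score vector that puts $(n-1)\rho$ on the most likely label and $-\rho$ elsewhere, which one can verify attains $n(1 - \max_y p(y\mid x))$. The function names and sampling ranges are assumptions of this sketch.

```python
import random


def hinge_sum(p, h, rho):
    """Q(h) = sum_y (1 - p(y|x)) * max(0, 1 + h_y / rho)."""
    return sum((1.0 - py) * max(0.0, 1.0 + hy / rho) for py, hy in zip(p, h))


def check_hinge_facts(trials=20000, seed=3):
    """Return (smallest slack of Q >= n(1 - max p) + max p - p(h(x)),
               largest error of the claimed minimiser's value)."""
    rng = random.Random(seed)
    slack, min_err = float("inf"), 0.0
    for _ in range(trials):
        n = rng.randint(2, 6)
        w = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(w)
        p = [v / total for v in w]                     # p(y|x)
        rho = rng.uniform(0.1, 3.0)
        raw = [rng.gauss(0.0, 3.0) for _ in range(n)]
        mean = sum(raw) / n
        h = [v - mean for v in raw]                    # enforce sum_y h_y = 0
        pred = max(range(n), key=lambda k: h[k])       # h(x)
        m = max(p)
        base = n * (1.0 - m)                           # claimed infimum
        slack = min(slack, hinge_sum(p, h, rho) - base - (m - p[pred]))
        y_max = max(range(n), key=lambda k: p[k])
        h_star = [(n - 1) * rho if k == y_max else -rho for k in range(n)]
        min_err = max(min_err, abs(hinge_sum(p, h_star, rho) - base))
    return slack, min_err


print(check_hinge_facts())
```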

Hence, as for $\mathcal{l}_{\text{mae}}$ and $\mathcal{l}_{\rho}$, we have

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = 2\sqrt{n^2c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))} = 2n\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))} \]

Therefore

\[\begin{aligned} \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x) \\ &= \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{{h(x)}_y}{\rho}\right\}e^{\alpha r(x)} + nc e^{-\alpha r(x)} - 2n\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))} \end{aligned} \]

To relate $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)$ to $\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)$, we follow the approach of the $\mathcal{l}_{\text{mae}}$ and $\mathcal{l}_{\rho}$ proofs and split $\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)$ into cases according to how $\max_{y\in \mathcal{Y}} p(y\mid x)$ compares with $1 - c$ and the sign of $r(x)$:

$\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)$, $r(x) > 0$:

In this case

\[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \max_{y\in \mathcal{Y}}p(y\mid x) - p(\text{h}(x)\mid x) = \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \]

Taking $\Gamma_1 (z) = z$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_1 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$.
$\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)$, $r(x) > 0$:

\[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \frac{1}{4n}\left(1 - c - p\left(\text{h}\left(x\right)\mid x\right)\right)^2 = \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4n} \]

Taking $\Gamma_2 (z) = 2\sqrt{nz}$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_2 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$.
$\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)$, $r(x) \leqslant 0$:

In this case $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = 0$, so $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$ holds for any nonnegative $\Gamma$.
$\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)$, $r(x) \leqslant 0$:

\[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant n\left(\frac{\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c}{2}\right)^2 = \frac{n\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4} \]

Taking $\Gamma_3 (z) = 2\sqrt{z/n}$ then gives $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_3 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$.

In summary, with $\Gamma(z) = \max\{\Gamma_1(z), \Gamma_2(z), \Gamma_3(z)\} = \max\{2\sqrt{nz}, z\}$, the inequality $\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))$ holds in every case. This proves the $(\mathcal{H}, \mathcal{R})$-consistency bound of the single-stage surrogate loss for $\mathcal{l} = \mathcal{l}_{\rho-\text{hinge}}$, $\psi(z) = nz$.
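Finally, a randomized check of the pointwise inequality for the constrained $\rho$-hinge loss, assuming $\psi(c) = nc$, zero-sum scores and $C_L^* = 2n\sqrt{c(1 - \max_y p(y\mid x))}$ as derived above. As before, the function name and sampling ranges are hypothetical choices, and a non-positive result is evidence for the bound, not a proof.

```python
import math
import random


def check_hinge_bound(trials=20000, seed=4):
    """Largest Delta C_abst - Gamma(Delta C_L) for l = l_rho_hinge, psi(z) = n z."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        n = rng.randint(2, 6)
        w = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(w)
        p = [v / total for v in w]                     # p(y|x)
        raw = [rng.gauss(0.0, 3.0) for _ in range(n)]
        mean = sum(raw) / n
        h = [v - mean for v in raw]                    # constraint sum_y h_y = 0
        r = rng.uniform(-3.0, 3.0)
        c = rng.uniform(0.01, 0.99)
        alpha = rng.uniform(0.2, 3.0)
        rho = rng.uniform(0.1, 3.0)
        pred = max(range(n), key=lambda k: h[k])       # h(x)
        m = max(p)
        q = sum((1.0 - py) * max(0.0, 1.0 + hy / rho) for py, hy in zip(p, h))
        d_abst = (1.0 - p[pred] if r > 0 else c) - (1.0 - max(m, 1.0 - c))
        # psi(c) = n c and C*_L = 2 n sqrt(c (1 - max p)).
        d_l = (q * math.exp(alpha * r) + n * c * math.exp(-alpha * r)
               - 2.0 * n * math.sqrt(c * (1.0 - m)))
        d_l = max(d_l, 0.0)                            # guard against rounding
        gamma = max(2.0 * math.sqrt(n * d_l), d_l)
        worst = max(worst, d_abst - gamma)
    return worst


print(check_hinge_bound())
```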

References

[1] Mao A, Mohri M, Zhong Y. Predictor-rejector multi-class abstention: Theoretical analysis and algorithms[C]//International Conference on Algorithmic Learning Theory. PMLR, 2024: 822-867.
[3] Ni C, Charoenphakdee N, Honda J, et al. On the calibration of multiclass classification with rejection[J]. Advances in Neural Information Processing Systems, 2019, 32.
[5] Crammer K, Singer Y. On the algorithmic implementation of multiclass kernel-based vector machines[J]. Journal of Machine Learning Research, 2001, 2(Dec): 265-292.

Learning Theory: Proof of the (H, R)-Consistency Bounds for Single-Stage Surrogate Losses
