【DeepID】《Deep Learning Face Representation from Predicting 10,000 Classes》

CVPR-2014

Sun Y, Wang X, Tang X. Deep learning face representation from predicting 10,000 classes[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 1891-1898.

Table of Contents

[1、Background and Motivation](#1、Background and Motivation)
[2、Related Work](#2、Related Work)
[3、Advantages / Contributions](#3、Advantages / Contributions)
4、Method
- [4.1、Deep ConvNets](#4.1、Deep ConvNets)
- [4.2、Feature extraction](#4.2、Feature extraction)
- [4.3、Face verification](#4.3、Face verification)
5、Experiments
- [5.1、Datasets and Metrics](#5.1、Datasets and Metrics)
- [5.2、Multi-scale ConvNets](#5.2、Multi-scale ConvNets)
- [5.3、Learning effective features](#5.3、Learning effective features)
- [5.4、Over-complete representation](#5.4、Over-complete representation)
- [5.5、Method comparison](#5.5、Method comparison)
[6、Conclusion（own）/ Future work](#6、Conclusion（own）/ Future work)

1、Background and Motivation

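With the rapid development of computer vision and deep learning, face verification, an important form of biometric recognition, has shown great potential in security surveillance, human-computer interaction, social media, and other fields.

However, in unconstrained or uncontrolled environments (varying illumination, diverse poses, rich expressions, and so on), face images vary enormously, which makes accurate face verification very difficult.

Traditional methods built on low-level feature extraction and shallow models (over-complete low-level features, followed by shallow models) handle such complex variations poorly, so stronger feature representations and models are needed to improve the accuracy and robustness of face verification.

This paper proposes learning high-level face feature representations with deep learning, called Deep hidden IDentity features (DeepID), to improve the accuracy and generalization of face verification.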

2、Related Work

High-dimensional over-complete face descriptors, followed by shallow models:
- 26K learning-based (LE) descriptors
- 1.7M SIFT descriptors
- 1.2M CMD descriptors

Identity-related features learned on top of low-level features

Deep models

3、Advantages / Contributions

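- Proposes the DeepID feature representation. Compared with traditional methods that rely on low-level features or shallow models, DeepID carries richer and more essential identity information and significantly improves face verification accuracy.
- DeepID generalizes: the features can be generalized to other tasks (such as verification) and to new identities unseen in the training set.
- Strong results on LFW: 97.45% verification accuracy is achieved with only weakly aligned faces.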

4、Method

Features extracted from many face patches form over-complete representations.

Highly compact and discriminative features are acquired from the last hidden layer of each ConvNet.

4.1、Deep ConvNets

The feature extractor is a ConvNet with four convolutional layers; its inputs are different face patches.

Rectangular patches are 39×31×k and square patches are 31×31×k, where k = 3 for color patches and k = 1 for gray patches.

The features extracted from different face regions are complementary and further boost the performance

The paper gives the convolution operation in formula form; the activation function is ReLU.
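As a hedged reconstruction in the paper's notation (recalled from the paper, so the symbols should be checked against it), the convolution with ReLU can be written as:

```latex
y^{j(r)} = \max\Bigl(0,\; b^{j(r)} + \sum_{i} k^{ij(r)} * x^{i(r)}\Bigr)
```

Here $x^{i}$ and $y^{j}$ are the i-th input map and the j-th output map, $k^{ij}$ is the convolution kernel between them, $*$ denotes convolution, $b^{j}$ is the bias of the j-th output map, and $r$ indicates a local region in which weights are shared.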

Max pooling takes the maximum over each non-overlapping s×s local region of a feature map.
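A minimal sketch of non-overlapping s×s max pooling in plain Python, assuming the feature map is a list of rows. The function name `max_pool2d` and the toy input are made up for illustration and do not come from the paper.

```python
def max_pool2d(feature_map, s):
    """Non-overlapping s x s max pooling over a 2D map given as a list of rows.

    Rows and columns that do not fill a complete s x s window are dropped.
    """
    rows = len(feature_map) // s
    cols = len(feature_map[0]) // s
    return [
        [
            max(feature_map[j * s + m][k * s + n] for m in range(s) for n in range(s))
            for k in range(cols)
        ]
        for j in range(rows)
    ]


# Toy 4x4 feature map, pooled with s = 2 into a 2x2 map.
x = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 3],
]
pooled = max_pool2d(x, 2)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output entry is the largest value inside its 2×2 window, so the map shrinks by a factor of s along each axis.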

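The last hidden layer (the DeepID layer) is fully connected to both the third convolutional layer (after max-pooling) and the fourth convolutional layer, a bypass (shortcut) connection that lets it see multi-scale features.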
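The bypass connection can be sketched as a single ReLU layer that sums two linear maps, one over the flattened conv3 (pooled) features and one over the flattened conv4 features. This is an illustration under that assumption, with tiny made-up dimensions and weights; the function name `deepid_layer` is hypothetical, and the real layer outputs 160 values.

```python
def deepid_layer(x3, x4, w3, w4, b):
    """ReLU over a joint linear map of conv3 (pooled) and conv4 features.

    x3, x4: flattened feature vectors from the two layers.
    w3[i][j], w4[i][j]: weight from input i to output neuron j.
    b[j]: bias of output neuron j.
    """
    out = []
    for j in range(len(b)):
        z = b[j]
        z += sum(x * w3[i][j] for i, x in enumerate(x3))
        z += sum(x * w4[i][j] for i, x in enumerate(x4))
        out.append(max(0.0, z))
    return out


# Toy example: 2 conv3 features, 1 conv4 feature, 2 output neurons.
x3 = [1.0, 2.0]
x4 = [3.0]
w3 = [[1.0, -1.0], [0.5, 0.0]]
w4 = [[-0.5, 1.0]]
b = [0.5, -3.0]

features = deepid_layer(x3, x4, w3, w4, b)
print(features)  # [1.0, 0.0]
```

Because both inputs feed the same neurons, the layer mixes the coarser conv4 features with the finer conv3 features instead of relying on conv4 alone.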

The ConvNet output is an n-way softmax predicting the probability distribution over n different identities
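A small sketch of the n-way softmax and the cross-entropy loss typically paired with it, assuming raw class scores (logits) as input. The function names are illustrative, and the paper's training details are not reproduced here.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target identity."""
    return -math.log(softmax(logits)[target])


probs = softmax([1.0, 2.0, 3.0])
loss = cross_entropy([1.0, 2.0, 3.0], target=2)
print(probs, loss)
```

In DeepID the number of classes n is the number of training identities, while the feature used later is the 160-dimensional hidden layer feeding this softmax, not the softmax output itself.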

4.2、Feature extraction

Features are extracted from 60 face patches with ten regions, three scales, and RGB or gray channels

10 x 3 x 2 = 60 patches

The total length of DeepID is 19,200 (160×2×60), which is ready for the final face verification

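One ConvNet is trained per patch type, 60 in total. Each network outputs a 160-dimensional feature, and the factor of 2 comes from the features extracted from the horizontally flipped patch.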

We trained 60 ConvNets, each of which extracts two 160-dimensional DeepID vectors from a particular patch and its horizontally flipped counterpart
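The patch count and the total feature length can be checked with a few lines of Python. The region, scale, and channel labels below are placeholders, not names from the paper.

```python
from itertools import product

regions = range(10)          # ten face regions
scales = range(3)            # three scales
channels = ("rgb", "gray")   # color or gray input

# One ConvNet per (region, scale, channel) combination.
patches = list(product(regions, scales, channels))

FEATURE_DIM = 160  # DeepID length per ConvNet
VIEWS = 2          # the patch and its horizontally flipped counterpart

deepid_length = FEATURE_DIM * VIEWS * len(patches)
print(len(patches), deepid_length)  # 60 19200
```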

4.3、Face verification

Either Joint Bayesian or a neural network can be used. The input is the features extracted by the ConvNets, and the output is the 1:1 verification result.

（1）Joint Bayesian

Core formula: the log-likelihood ratio between the intra-personal and extra-personal hypotheses.
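As a hedged sketch of the standard Joint Bayesian formulation (written from the usual presentation of the method, so the notation may differ slightly from the paper's), a face feature is modeled as the sum of an identity term and an intra-personal variation term, and a pair of faces is scored by a log-likelihood ratio:

```latex
x = \mu + \epsilon, \qquad \mu \sim \mathcal{N}(0, S_\mu), \quad \epsilon \sim \mathcal{N}(0, S_\epsilon)

r(x_1, x_2) = \log \frac{P(x_1, x_2 \mid H_I)}{P(x_1, x_2 \mid H_E)}
```

Here $H_I$ is the hypothesis that the two faces belong to the same person and $H_E$ that they belong to different people. Both joint distributions are zero-mean Gaussians: under $H_I$ the cross-covariance between $x_1$ and $x_2$ is $S_\mu$, and under $H_E$ it is zero. A pair is accepted as the same identity when $r(x_1, x_2)$ exceeds a threshold.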

（2）neural network

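Note how 640 is computed here. Each patch yields 160 features, which is only 320 even with the horizontal flip, so where does 640 come from?

Face verification takes two faces as input each time, so each patch contributes 320 × 2 = 640 features.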
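The grouping can be sketched as follows, assuming the verification network receives one group per patch, each group concatenating the features of the two faces being compared. The helper `group_pair_features` and the dummy features are hypothetical.

```python
FEATURE_DIM = 160   # DeepID length per ConvNet
VIEWS = 2           # the patch and its horizontally flipped counterpart
NUM_PATCHES = 60
PER_FACE = FEATURE_DIM * VIEWS  # 320 features per patch for one face


def group_pair_features(face_a, face_b):
    """Concatenate the per-patch features of two faces, one group per patch."""
    return [feats_a + feats_b for feats_a, feats_b in zip(face_a, face_b)]


# Dummy features: one list of 320 values per patch for each face.
face_a = [[0.0] * PER_FACE for _ in range(NUM_PATCHES)]
face_b = [[1.0] * PER_FACE for _ in range(NUM_PATCHES)]

groups = group_pair_features(face_a, face_b)
group_size = len(groups[0])
total_inputs = sum(len(g) for g in groups)
print(len(groups), group_size, total_inputs)  # 60 640 38400
```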

5、Experiments

5.1、Datasets and Metrics

LFW

5749 people, of whom only 85 have more than 15 images, and 4069 people have only one image

CelebFaces

87,628 face images of 5436 celebrities from the Internet, with approximately 16 images per person on average.

CelebFaces+

CelebFaces is extended to the CelebFaces+ dataset, which contains 202,599 face images of 10,177 celebrities

The model is trained on CelebFaces and evaluated on LFW.

We randomly choose 80% (4349) of the people from CelebFaces to learn the DeepID, and use the remaining 20% of the people to learn the face verification model (Joint Bayesian or neural networks).
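A minimal sketch of such an identity-disjoint split, assuming identities are represented by integer ids. The function name, the seed, and the use of `round` are illustrative choices, not the authors' procedure.

```python
import random


def split_identities(identity_ids, train_fraction=0.8, seed=0):
    """Randomly split identities into two disjoint groups."""
    ids = list(identity_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# CelebFaces has 5436 identities; 80% of them rounds to 4349.
deepid_ids, verifier_ids = split_identities(range(5436))
print(len(deepid_ids), len(verifier_ids))  # 4349 1087
```

Splitting by person rather than by image keeps the verification model from being trained on identities the feature extractor has already seen.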

Evaluation metrics

top-1 error rates
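For reference, a top-1 error rate is the fraction of samples whose highest-scoring class differs from the true label. A small sketch with made-up scores (the function name `top1_error` is illustrative):

```python
def top1_error(scores, labels):
    """Fraction of samples whose highest-scoring class is not the true label."""
    wrong = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / len(labels)


# Four samples, three classes; only the last prediction is wrong.
scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]
error = top1_error(scores, labels)
print(error)  # 0.25
```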

5.2、Multi-scale ConvNets

The lower error rates indicate the better hidden features learned.

5.3、Learning effective features

With more identities to classify, the hidden layer still copes even though its size stays the same.

More identity classes help to learn better hidden representations that can distinguish more people (discriminative) without increasing the feature length (compact).

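A visualization of the learned 160-dimensional hidden-layer features (a dimension far smaller than the number of identities used in training).

Faces of the same identity have more similar activations (shown in white), while the activations differ across identities.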

5.4、Over-complete representation

Feature subsets compared (here k is the number of patches, not the number of channels):
- best-performing single patch (k = 1)
- global color patches in a single scale (k = 5)
- all the global color patches (k = 15)
- all the color patches (k = 30)
- all the patches (k = 60)
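The subset sizes above can be reproduced by counting patch combinations, assuming the ten regions split into five global and five local (landmark-centered) regions, which is how I recall the paper describing them. The region indices and labels below are placeholders.

```python
from itertools import product

# Assumption: regions 0-4 are the global regions and 5-9 the local ones.
GLOBAL_REGIONS = set(range(5))
ALL_REGIONS = range(10)
SCALES = range(3)
CHANNELS = ("color", "gray")

patches = list(product(ALL_REGIONS, SCALES, CHANNELS))


def is_global_color(patch):
    region, _scale, channel = patch
    return region in GLOBAL_REGIONS and channel == "color"


subsets = {
    "global color, single scale": [p for p in patches if is_global_color(p) and p[1] == 0],
    "all global color": [p for p in patches if is_global_color(p)],
    "all color": [p for p in patches if p[2] == "color"],
    "all patches": patches,
}
counts = {name: len(members) for name, members in subsets.items()}
print(counts)
```

Each step adds scales, local regions, or gray channels to the previous subset.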

The curves show that the performance may be further improved if more features are extracted.

5.5、Method comparison

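"Number of points" refers to the number of facial landmarks used for face alignment. Some methods need heavy preprocessing (e.g., one utilized 3D alignment and pose transform as preprocessing), while others only use five simple landmarks on the eyes, nose, and mouth.

Low feature dimensions indicate efficient face recognition systems

The feature dimension here is only 150.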

6、Conclusion（own）/ Future work

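- Based on the article 人脸识别合集 | 2 DeepID解析.
- Authors: Yi Sun, Xiaoou Tang, and Xiaogang Wang, from The Chinese University of Hong Kong.
- Q: What is the difference between face identification and face recognition?
- A: Face identification is mainly a one-to-many matching process, while face recognition covers a broader set of steps, including face detection, preprocessing, feature extraction, and matching.
- Face Identification is 1:N; Face Verification is 1:1.
- Note how the feature dimension adds up: 160 per patch is small, but with 60 patches per image plus the horizontal flip, the extracted feature is large: 60 × 2 × 160 = 19,200.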
