Homework Workflow

General Guide

- model bias: increase the model's flexibility, e.g., use different activation functions, add more layers, etc.
- model bias or optimization issue?
- comparison
- Start from shallower network (or other models), which are easier to optimize
- If deeper networks do not obtain smaller loss on training data, then there is an optimization issue.
- Overfitting: caused by too much flexibility combined with too little training data (see the sketch after this list for the common countermeasures)
- more training data:
- data augmentation: generate more data by applying transformations to the existing data; the transformations must be reasonable (e.g., horizontally flipping an image is fine, flipping it upside down usually is not)
- less flexibility, constrained model:
- Fewer parameters: fewer neurons, fewer layers
- Sharing parameters: e.g., CNN
- Fewer features
- Early stopping
- Regularization
- Dropout
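
A minimal sketch, assuming PyTorch/torchvision (the transforms, network size, and hyperparameters are illustrative, not the homework's), of the anti-overfitting tools listed above: data augmentation, a small constrained model with Dropout, L2 regularization via weight decay, and early stopping on a validation loss.

```python
# Illustrative sketch of common anti-overfitting tools; the dataset, model size,
# and hyperparameters are placeholders, not the course's actual setup.
import copy
import torch
from torch import nn
from torchvision import transforms

# Data augmentation: reasonable transforms of existing images, applied on the fly.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# A constrained model: few layers / few neurons, plus Dropout.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # Dropout
    nn.Linear(64, 10),
)

# Regularization: weight_decay adds an L2 penalty on the parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def train_with_early_stopping(train_step, eval_val_loss, max_epochs=100, patience=5):
    """Early stopping: keep the weights with the best validation loss and stop
    once it has not improved for `patience` epochs.
    `train_step` and `eval_val_loss` are placeholders for the real loops."""
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = eval_val_loss()
        if val_loss < best_loss:
            best_loss, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    model.load_state_dict(best_state)   # restore the best weights found
    return best_loss
```
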
Bias-Complexity Trade-off

- benchmark corpora: standard corpora used for benchmarking models
- how to select the best one?
- Not recommended: picking the model by directly comparing scores on the public testing set. Why? Think of monkeys typing out Shakespeare: if you test enough times, even a useless model can get a good score by luck
- The testing set is split into public and private: the public set can be treated as data you use during development, while the private set stands for what the model actually faces once released. Good performance on the public testing set may come from tricks that overfit to it, while performance on the private testing set stays poor
- Recommended: cross validation. Use a validation set to select the model, and look at the public testing set results as little as possible
- n-fold cross validation (a minimal sketch follows below)
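
A minimal k-fold cross-validation sketch for model selection, assuming scikit-learn; the candidate models and the toy data below are placeholders, not part of the course material. The point is to pick the model with the best average validation loss instead of chasing the public testing set score.

```python
# k-fold cross validation for model selection (toy data, placeholder models).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression, Ridge

# Toy data standing in for the real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

candidates = {
    "linear": LinearRegression(),
    "ridge":  Ridge(alpha=1.0),
}

kf = KFold(n_splits=3, shuffle=True, random_state=0)   # 3-fold CV
avg_val_loss = {}
for name, model in candidates.items():
    losses = []
    for train_idx, val_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        losses.append(np.mean((pred - y[val_idx]) ** 2))   # validation MSE
    avg_val_loss[name] = np.mean(losses)

best = min(avg_val_loss, key=avg_val_loss.get)
print(avg_val_loss, "-> pick:", best)   # select by validation loss, not public test score
```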

Mismatch
- training and testing data have different distributions

Critical Point: local minima, saddle points
- Points where the gradient is 0 are collectively called critical points, e.g., local minima and saddle points
- Determining the type of a critical point: use the Hessian
- Rough idea: a Taylor series approximation describes the loss near a point θ': L(θ) ≈ L(θ') + (θ-θ')ᵀg + ½(θ-θ')ᵀH(θ-θ'). At a critical point the gradient term (green in the slides) is 0, leaving only the Hessian term (red), so we only need to look at the eigenvalues of H: all positive → local minimum, all negative → local maximum, some positive and some negative → saddle point
- saddle point: you can escape by updating the parameters along an eigenvector whose eigenvalue is negative (a minimal sketch follows at the end of this section)

- local minima: when you have lots of parameters, perhaps local minima are rare; what looks like a local minimum may just be a saddle point in the high-dimensional space. Empirically, local minima are indeed uncommon, and most critical points are saddle points
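
To make the Hessian recipe concrete, here is a toy sketch (the quadratic loss, the critical point, and the step size are made up for illustration): classify a critical point by the eigenvalues of H, and if it is a saddle point, step along an eigenvector with a negative eigenvalue so the loss keeps decreasing.

```python
# Classify a critical point via the Hessian and escape a saddle point
# along an eigenvector with a negative eigenvalue (illustrative loss only).
import numpy as np

def loss(w):                      # L(w1, w2) = w1^2 - w2^2, saddle at (0, 0)
    return w[0]**2 - w[1]**2

def hessian(w):                   # analytic Hessian of the toy loss
    return np.array([[2.0, 0.0],
                     [0.0, -2.0]])

w = np.zeros(2)                   # a critical point: the gradient (2*w1, -2*w2) is 0 here
H = hessian(w)
eigvals, eigvecs = np.linalg.eigh(H)

if np.all(eigvals > 0):
    print("local minimum")
elif np.all(eigvals < 0):
    print("local maximum")
else:
    print("saddle point")
    # pick an eigenvector with a negative eigenvalue and step along it
    u = eigvecs[:, np.argmin(eigvals)]
    w_new = w + 0.5 * u
    print("loss before:", loss(w), "after escaping:", loss(w_new))   # loss decreases
```
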
Batch
- shuffle after each epoch
- Why batch? Without batches, the whole training set would be used for a single update, which is the same as batch size = training set size, i.e., the extreme case of a large batch
- Small Batch v.s. Large Batch
- large batch: long time for cooldown, but powerful (stable update direction)
- small batch: short time for cooldown, but noisy update direction
- In terms of time, a large batch actually finishes one epoch faster, thanks to the GPU's parallel computation: one large-batch update takes roughly as long as a small-batch one (up to some batch size), but far fewer updates are needed per epoch
- However, the noise actually helps training: noisy gradients make it easier to escape critical points

- Moreover, small batch is better on testing data: a large batch size tends to drive us into sharp, canyon-like minima, which generalize worse than flat ones

- Summary: Batch size is a hyperparameter you have to decide (a minimal DataLoader sketch follows below)
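
A minimal sketch, assuming PyTorch with toy data (not the homework code), of the points above: DataLoader(shuffle=True) reshuffles the data every epoch, and the batch size trades the number of updates per epoch against how noisy each update is.

```python
# Batch size as a hyperparameter: shuffling per epoch, and updates per epoch
# for a small batch vs. the full-batch extreme (toy data, placeholder model).
import torch
from torch.utils.data import TensorDataset, DataLoader

X = torch.randn(1000, 10)                  # toy inputs
y = torch.randn(1000, 1)                   # toy targets
dataset = TensorDataset(X, y)

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for batch_size in [10, 1000]:              # small batch vs. full-batch ("large batch" extreme)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)  # reshuffled every epoch
    n_updates = 0
    for xb, yb in loader:                  # one epoch
        loss = torch.nn.functional.mse_loss(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        n_updates += 1
    print(f"batch_size={batch_size}: {n_updates} updates in one epoch")
```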

