Next, taking $J(\mathbf{\theta}) = \frac{1}{2m}\sum_{i=0}^{m-1}(y_{pi} - y_i)^2$ as the objective function, let's solve for the model parameters together. Here $y_{pi} = \theta_1 x_i + \theta_0$ is the line's predicted value for the $i$-th sample.
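The following is a minimal sketch of the cost function in plain Python. It assumes the straight-line model $y_{pi} = \theta_1 x_i + \theta_0$ used in the derivation below. The helper name `cost` and the sample data are hypothetical, chosen only for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta) = 1/(2m) * sum((y_pi - y_i)^2),
    where y_pi = theta1 * x_i + theta0 is the line's prediction for sample i."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_pred = theta1 * x + theta0  # y_{pi}
        total += (y_pred - y) ** 2
    return total / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]  # made-up points lying exactly on y = 2x + 1
print(cost(1.0, 2.0, xs, ys))  # 0.0
print(cost(0.0, 2.0, xs, ys))  # 0.5
```

A line that fits every point gives $J = 0$. Lowering the intercept by 1 makes each error $-1$, so $J = 3/(2 \cdot 3) = 0.5$.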
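At an extremum, the derivative must be zero. A zero derivative means that the function's rate of change with respect to that unknown variable is zero. First, take the partial derivative with respect to $\theta_0$: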
$$
\begin{aligned}
\frac{\partial J(\mathbf{\theta})}{\partial\theta_0}
&= \frac{\partial}{\partial\theta_0}\left(\frac{1}{2m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0) - y_i\right)^2\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\frac{\partial}{\partial\theta_0}\left((\theta_1 x_i + \theta_0) - y_i\right)^2 \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)\frac{\partial}{\partial\theta_0}\left((\theta_1 x_i + \theta_0) - y_i\right)\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)\frac{\partial}{\partial\theta_0}(\theta_1 x_i + \theta_0)\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)\right) \\
&= \frac{1}{m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0) - y_i\right)
\end{aligned}
$$
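One way to sanity-check this result is to compare it with a central finite-difference approximation of $\partial J/\partial\theta_0$. The plain-Python sketch below does that. The helper names (`grad_theta0`, `numeric_grad_theta0`), the step size `h` and the sample data are illustrative assumptions and do not come from the original text.

```python
def cost(theta0, theta1, xs, ys):
    # J(theta) = 1/(2m) * sum(((theta1*x_i + theta0) - y_i)^2)
    m = len(xs)
    return sum((theta1 * x + theta0 - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grad_theta0(theta0, theta1, xs, ys):
    # Analytic result derived above: (1/m) * sum((theta1*x_i + theta0) - y_i)
    m = len(xs)
    return sum((theta1 * x + theta0) - y for x, y in zip(xs, ys)) / m


def numeric_grad_theta0(theta0, theta1, xs, ys, h=1e-6):
    # Central-difference approximation of dJ/dtheta0
    return (cost(theta0 + h, theta1, xs, ys) - cost(theta0 - h, theta1, xs, ys)) / (2 * h)


xs = [0.5, 1.5, 2.0, 4.0]  # made-up sample data
ys = [1.0, 2.5, 2.0, 5.5]
a = grad_theta0(0.3, 1.2, xs, ys)
b = numeric_grad_theta0(0.3, 1.2, xs, ys)
print(a, b)  # the two values agree up to floating-point rounding
```

$J$ is quadratic in $\theta_0$, so the central difference matches the analytic derivative except for rounding error.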
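To build a solid mathematical foundation for machine learning, see the "人人可懂" (Anyone Can Understand) series from Tsinghua University Press. It includes 《人人可懂的微积分》 (Calculus Anyone Can Understand, now available), 《人人可懂的线性代数》 (Linear Algebra Anyone Can Understand, coming soon) and 《人人可懂的概率统计》 (Probability and Statistics Anyone Can Understand, coming soon).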

Next, take the partial derivative with respect to $\theta_1$:
$$
\begin{aligned}
\frac{\partial J(\mathbf{\theta})}{\partial\theta_1}
&= \frac{\partial}{\partial\theta_1}\left(\frac{1}{2m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0) - y_i\right)^2\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\frac{\partial}{\partial\theta_1}\left((\theta_1 x_i + \theta_0) - y_i\right)^2 \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)\frac{\partial}{\partial\theta_1}\left((\theta_1 x_i + \theta_0) - y_i\right)\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)\frac{\partial}{\partial\theta_1}(\theta_1 x_i + \theta_0)\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)\frac{\partial}{\partial\theta_1}(\theta_1 x_i)\right) \\
&= \frac{1}{2m}\sum_{i=0}^{m-1}\left(2\left((\theta_1 x_i + \theta_0) - y_i\right)x_i\right) \\
&= \frac{1}{m}\sum_{i=0}^{m-1}\left(\left((\theta_1 x_i + \theta_0) - y_i\right)x_i\right) \\
&= \frac{1}{m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0)x_i - y_i x_i\right)
\end{aligned}
$$
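The same finite-difference check can be applied to $\partial J/\partial\theta_1$. This is a sketch only. The helper names (`grad_theta1`, `numeric_grad_theta1`), the step `h` and the data are hypothetical.

```python
def cost(theta0, theta1, xs, ys):
    # J(theta) = 1/(2m) * sum(((theta1*x_i + theta0) - y_i)^2)
    m = len(xs)
    return sum((theta1 * x + theta0 - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grad_theta1(theta0, theta1, xs, ys):
    # Analytic result derived above: (1/m) * sum((theta1*x_i + theta0)*x_i - y_i*x_i)
    m = len(xs)
    return sum((theta1 * x + theta0) * x - y * x for x, y in zip(xs, ys)) / m


def numeric_grad_theta1(theta0, theta1, xs, ys, h=1e-6):
    # Central-difference approximation of dJ/dtheta1 (theta0 held fixed)
    return (cost(theta0, theta1 + h, xs, ys) - cost(theta0, theta1 - h, xs, ys)) / (2 * h)


xs = [0.5, 1.5, 2.0, 4.0]  # made-up sample data
ys = [1.0, 2.5, 2.0, 5.5]
a = grad_theta1(0.3, 1.2, xs, ys)
b = numeric_grad_theta1(0.3, 1.2, xs, ys)
print(a, b)  # the two values agree up to floating-point rounding
```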
Setting both partial derivatives to zero gives the system of equations:
$$
\begin{cases}
\dfrac{\partial J(\mathbf{\theta})}{\partial\theta_0} = \dfrac{1}{m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0) - y_i\right) = 0 \\[2ex]
\dfrac{\partial J(\mathbf{\theta})}{\partial\theta_1} = \dfrac{1}{m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0)x_i - y_i x_i\right) = 0
\end{cases}
$$
(System 1)
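Multiply both equations by $m$ and collect terms, and it becomes visible that System 1 is linear in $\theta_0$ and $\theta_1$. The matrix form below is a restatement added for clarity and is not part of the original derivation:

$$
\begin{pmatrix}
m & \sum_{i=0}^{m-1} x_i \\
\sum_{i=0}^{m-1} x_i & \sum_{i=0}^{m-1} x_i^2
\end{pmatrix}
\begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix}
=
\begin{pmatrix} \sum_{i=0}^{m-1} y_i \\ \sum_{i=0}^{m-1} x_i y_i \end{pmatrix}
$$

The determinant of the coefficient matrix is $m\sum_{i=0}^{m-1} x_i^2 - \sum_{i=0}^{m-1} x_i\sum_{i=0}^{m-1} x_i$. This is the same expression that ends up in the denominator of $\theta_1$.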
How do we solve this system? From the first equation, we get:
$$
\begin{aligned}
&\frac{1}{m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0) - y_i\right) = 0
\Longrightarrow \frac{1}{m}\sum_{i=0}^{m-1}\left(\theta_1 x_i + \theta_0 - y_i\right) = 0 \\
&\Longrightarrow \sum_{i=0}^{m-1}\theta_0 = \sum_{i=0}^{m-1}\left(y_i - \theta_1 x_i\right)
\Longrightarrow m\theta_0 = \sum_{i=0}^{m-1}\left(y_i - \theta_1 x_i\right) \\
&\Longrightarrow m\theta_0 = \sum_{i=0}^{m-1}y_i - \sum_{i=0}^{m-1}\left(\theta_1 x_i\right)
\Longrightarrow m\theta_0 = \sum_{i=0}^{m-1}y_i - \theta_1\sum_{i=0}^{m-1}x_i \\
&\Longrightarrow \theta_0 = \frac{1}{m}\sum_{i=0}^{m-1}y_i - \frac{1}{m}\theta_1\sum_{i=0}^{m-1}x_i
\end{aligned}
$$
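For any slope $\theta_1$, choosing $\theta_0$ this way makes the residuals sum to zero. Equivalently, the line passes through the point of sample means $(\bar{x}, \bar{y})$. A small sketch shows this; the helper `theta0_from_theta1` and the data are hypothetical.

```python
def theta0_from_theta1(theta1, xs, ys):
    # theta0 = (1/m) * sum(y_i) - (1/m) * theta1 * sum(x_i), i.e. mean(y) - theta1 * mean(x)
    m = len(xs)
    return sum(ys) / m - theta1 * sum(xs) / m


xs = [0.5, 1.5, 2.0, 4.0]  # made-up sample data
ys = [1.0, 2.5, 2.0, 5.5]
theta1 = 0.8  # arbitrary slope, not necessarily the optimal one
theta0 = theta0_from_theta1(theta1, xs, ys)

# The first equation of System 1: the residuals sum to zero
residual_sum = sum(theta1 * x + theta0 - y for x, y in zip(xs, ys))
print(theta0, residual_sum)  # residual_sum is zero up to rounding
```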
From the second equation of System 1, we get:
$$
\begin{aligned}
&\frac{1}{m}\sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0)x_i - y_i x_i\right) = 0
\Longrightarrow \sum_{i=0}^{m-1}\left((\theta_1 x_i + \theta_0)x_i - y_i x_i\right) = 0 \\
&\Longrightarrow \sum_{i=0}^{m-1}\left(\theta_1 x_i^2 + \theta_0 x_i - y_i x_i\right) = 0
\Longrightarrow \sum_{i=0}^{m-1}\left(\theta_0 x_i\right) = \sum_{i=0}^{m-1}\left(y_i x_i - \theta_1 x_i^2\right) \\
&\Longrightarrow \theta_0\sum_{i=0}^{m-1}x_i = \sum_{i=0}^{m-1}\left(y_i x_i\right) - \theta_1\sum_{i=0}^{m-1}x_i^2 \\
&\Longrightarrow \theta_0 = \frac{\sum_{i=0}^{m-1}\left(y_i x_i\right) - \theta_1\sum_{i=0}^{m-1}x_i^2}{\sum_{i=0}^{m-1}x_i}
\end{aligned}
$$
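Setting the two expressions for $\theta_0$ equal gives the result below. Dividing by $\sum_{i=0}^{m-1}x_i$ in the last step assumes that sum is nonzero. Substituting the first expression for $\theta_0$ directly into the second equation gives the same $\theta_1$ without that assumption.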
$$
\begin{aligned}
&\frac{1}{m}\sum_{i=0}^{m-1}y_i - \frac{1}{m}\theta_1\sum_{i=0}^{m-1}x_i
= \frac{\sum_{i=0}^{m-1}\left(y_i x_i\right) - \theta_1\sum_{i=0}^{m-1}x_i^2}{\sum_{i=0}^{m-1}x_i} \\
&\Longrightarrow \frac{1}{m}\sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i - \frac{1}{m}\theta_1\sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i
= \sum_{i=0}^{m-1}\left(y_i x_i\right) - \theta_1\sum_{i=0}^{m-1}x_i^2 \\
&\Longrightarrow \sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i - \theta_1\sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i
= m\sum_{i=0}^{m-1}\left(y_i x_i\right) - m\theta_1\sum_{i=0}^{m-1}x_i^2 \\
&\Longrightarrow m\theta_1\sum_{i=0}^{m-1}x_i^2 - \theta_1\sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i
= m\sum_{i=0}^{m-1}\left(y_i x_i\right) - \sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i \\
&\Longrightarrow \theta_1\left(m\sum_{i=0}^{m-1}x_i^2 - \sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i\right)
= m\sum_{i=0}^{m-1}\left(y_i x_i\right) - \sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i \\
&\Longrightarrow \theta_1 = \frac{m\sum_{i=0}^{m-1}\left(y_i x_i\right) - \sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i}{m\sum_{i=0}^{m-1}x_i^2 - \sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i}
\end{aligned}
$$
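We have now solved for the two parameters of the line model. The denominator of $\theta_1$ is nonzero as long as the $x_i$ are not all equal.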
$$
\begin{cases}
\theta_1 = \dfrac{m\sum_{i=0}^{m-1}\left(y_i x_i\right) - \sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i}{m\sum_{i=0}^{m-1}x_i^2 - \sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i} \\[3ex]
\theta_0 = \dfrac{1}{m}\sum_{i=0}^{m-1}y_i - \dfrac{1}{m}\theta_1\sum_{i=0}^{m-1}x_i
\end{cases}
$$
(Solution of System 1)
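The closed-form solution maps directly onto code. Below is a minimal plain-Python sketch and not the author's implementation. The function name `fit_line` and the `ValueError` checks are assumptions added for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares line y = theta1 * x + theta0 (solution of System 1)."""
    m = len(xs)
    if m == 0 or m != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = m * sum_x2 - sum_x * sum_x
    if denom == 0:
        # In exact arithmetic this is zero only when all x_i are equal;
        # the slope is then undefined.
        raise ValueError("all x values are equal; slope is undefined")
    theta1 = (m * sum_xy - sum_y * sum_x) / denom
    theta0 = sum_y / m - theta1 * sum_x / m
    return theta0, theta1


theta0, theta1 = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(theta0, theta1)  # 1.0 2.0
```

On noise-free points from $y = 2x + 1$, it recovers $\theta_0 = 1$ and $\theta_1 = 2$ exactly. On noisy data, both partial derivatives of $J$ are zero (up to rounding) at the parameters it returns.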
Note: $\sum_{i=0}^{m-1}\left(y_i x_i\right)$ is not the same as $\sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i$. Expanding both expressions makes the difference clear:
$$\sum_{i=0}^{m-1}\left(y_i x_i\right) = y_0 x_0 + \ldots + y_{m-1}x_{m-1}$$
$$\sum_{i=0}^{m-1}y_i\sum_{i=0}^{m-1}x_i = (y_0 + \ldots + y_{m-1})(x_0 + \ldots + x_{m-1})$$
These two expressions are clearly different. Now compare $\sum_{i=0}^{m-1}x_i^2$ with $\sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i$. Expanding them also makes the difference clear:
$$\sum_{i=0}^{m-1}x_i^2 = x_0^2 + \ldots + x_{m-1}^2$$
$$\sum_{i=0}^{m-1}x_i\sum_{i=0}^{m-1}x_i = (x_0 + \ldots + x_{m-1})^2$$
These two are clearly different as well.
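A quick numeric check with the made-up values $x = (1, 2, 3)$ and $y = (4, 5, 6)$ shows the difference concretely:

```python
xs = [1, 2, 3]  # made-up values for illustration
ys = [4, 5, 6]

sum_of_products = sum(x * y for x, y in zip(xs, ys))  # x0*y0 + x1*y1 + x2*y2
product_of_sums = sum(xs) * sum(ys)                   # (x0 + x1 + x2)(y0 + y1 + y2)
sum_of_squares = sum(x * x for x in xs)               # x0^2 + x1^2 + x2^2
square_of_sum = sum(xs) ** 2                          # (x0 + x1 + x2)^2

print(sum_of_products, product_of_sums)  # 32 90
print(sum_of_squares, square_of_sum)     # 14 36
```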