Maximum_Likelihood

Statistics with Prof. Liu Sept 6, 2024

Statistics has two schools: frequentist and Bayesian.

Maximum likelihood is a frequentist method.

The likelihood function is powerful: it contains all the information in the data. We don't need anything else, just this function.

The likelihood function is the joint probability of all the data. Likelihood is simply the probability of the data, which is why everything the data can tell us is already in the likelihood function!

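We assume the data are iid (independent and identically distributed). In statistics an iid sample is called a "random sample". It means, e.g., that every draw of a ball has the same probabilities and does not depend on the other draws.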

That joint probability is just the product of P(x_i | parameter) over all the individual data points x_i.
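A minimal sketch of that product, assuming each data point is a 0/1 outcome (say 1 = black ball) with the same success probability `p`. The function name and the 0/1 encoding are illustrative choices, not something from the lecture.

```python
import math

def joint_probability(observations, p):
    """Joint probability of iid 0/1 observations, each equal to 1 with probability p."""
    # Probability of each single observation given the parameter p.
    per_observation = [p if x == 1 else 1.0 - p for x in observations]
    # iid means the joint probability is the product of the individual ones.
    return math.prod(per_observation)

# Three successes and one failure under p = 0.5.
print(joint_probability([1, 1, 0, 1], 0.5))
```

Read as a function of `p` with the observations held fixed, this same quantity is the likelihood.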

It is the data's probability, not the parameter's probability!

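Example: there are two boxes. One has 5 black balls and 5 white balls; the other has 9 black balls and 1 white ball. We draw 4 times, one ball each time, with replacement. All 4 balls are black. Which box were they most likely drawn from?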

For box1, with 4 draws, P(4 black) = P(black)^4 = 0.5^4 = 0.0625

For box2, with 4 draws, P(4 black) = P(black)^4 = 0.9^4 = 0.6561. This is why we need iid: it lets us take the product.

The latter probability is larger, so we choose box2.
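The comparison can be sketched in a few lines. The box proportions come from the example; the variable names are only for illustration.

```python
# Proportion of black balls in each box, taken from the example.
boxes = {"box1": 5 / 10, "box2": 9 / 10}
n_draws = 4  # four draws with replacement, all black

# Likelihood of "4 black in 4 draws" under each box: P(black)^4.
likelihoods = {name: p ** n_draws for name, p in boxes.items()}

# Maximum likelihood choice: the box under which the data are most probable.
best_box = max(likelihoods, key=likelihoods.get)

print(likelihoods)
print(best_box)
```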

The parameters for each box: box1 is binomial with n=4, p=0.5; box2 is binomial with n=4, p=0.9.
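As a quick check on the binomial description, here is a sketch of the binomial probability of k black balls in n draws. The helper `binomial_pmf` is a hypothetical name written for these notes. When k = n = 4 the formula collapses to p^4, the same numbers as above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k black balls in n iid draws) when each draw is black with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# All 4 draws black: the binomial coefficient is 1 and (1 - p)^0 is 1, leaving p^4.
print(binomial_pmf(4, 4, 0.5))
print(binomial_pmf(4, 4, 0.9))
```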

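Example: there are infinitely many boxes, whose proportions of black balls range from 0 to 1. We draw 4 times and all 4 balls are black. Which box were they drawn from?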

Now the parameters are n=4 and an unknown p. We want to find p; once we know p, we know which box.

**We still choose the box that makes the data most probable: we pick the p that maximizes P(data given p), i.e. P(data given box). Here P(4 black given p) = p^4, so that is the box with the highest p.**
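A rough sketch of this search, using a grid of 101 candidate proportions as a stand-in for the infinitely many boxes. The grid and the function name are assumptions made for illustration.

```python
def likelihood(p, n_black=4, n_draws=4):
    """P(one specific sequence with n_black black balls in n_draws iid draws | p)."""
    return p ** n_black * (1 - p) ** (n_draws - n_black)

# Candidate boxes: black-ball proportions 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

# The maximum likelihood estimate is the candidate p with the largest likelihood.
p_hat = max(grid, key=likelihood)
print(p_hat)

# Same search if only 3 of the 4 draws had been black.
p_hat_3_of_4 = max(grid, key=lambda p: likelihood(p, n_black=3, n_draws=4))
print(p_hat_3_of_4)
```

With 4 black out of 4 the search ends at p = 1, the highest p. With 3 black out of 4 it ends at p = 0.75 instead, so "highest p" is a consequence of the all-black data rather than a general rule.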

But what is the probability that box2 is the one I drew from? What is the probability for box1?

The numbers above (0.5^4 and 0.9^4) are not the probabilities of the two boxes! It would be more intuitive to decide based on the probabilities of the boxes themselves, the way a forecast says 40% rain and 60% no rain.
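To see the difference, here is a sketch that turns the two likelihoods into probabilities of the boxes with Bayes' rule. It needs a prior over the boxes, which the example does not give. The 50/50 prior below is purely an assumption for illustration.

```python
# Likelihoods from the two-box example: P(4 black | box).
likelihoods = {"box1": 0.5 ** 4, "box2": 0.9 ** 4}

# ASSUMPTION (not in the notes): each box is equally likely before drawing.
prior = {"box1": 0.5, "box2": 0.5}

# The likelihoods alone do not sum to 1, so they are not probabilities of boxes.
print(sum(likelihoods.values()))

# Bayes' rule: P(box | data) = P(data | box) * P(box) / P(data).
evidence = sum(likelihoods[b] * prior[b] for b in likelihoods)
posterior = {b: likelihoods[b] * prior[b] / evidence for b in likelihoods}
print(posterior)
```

Under this assumed prior the posterior comes out at roughly 0.09 for box1 and 0.91 for box2, and those two numbers do sum to 1. A different prior would give different posteriors from the same likelihoods.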

Still, the logic is sound: we choose the parameter under which the observed data are most probable.

Likelihood is just a probability: the probability of the data. For discrete data such as the ball draws, it lies between 0 and 1.

**Applied to scientific methodology: we can use likelihood to measure the distance between a theory and real-world data, i.e. to judge whether a theory is good or bad.**


Statistics does inference: estimation and prediction.

Estimation has two categories: 1. Assume we know the population distribution, we just estimate its parameters. 2. We don't even know the population distribution.

After that, if we apply our fitted model to new data, then it's prediction.

**Prediction error is larger than estimation error.** Estimation error here is just the RSS, the residual sum of squares on the data used for fitting. For prediction, a new dataset brings new noise on top of the model's original error.
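A small simulation can illustrate this. It is only a sketch under assumed settings: the true line y = 1 + 2x, the noise level, and the sample size are all made up, and errors are reported per observation (RSS divided by n) so the two datasets are comparable.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed-form solution."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_squared_error(a, b, xs, ys):
    """RSS divided by the number of observations."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(n=20, repeats=2000, seed=0):
    """Average error on the fitting data and on fresh data over many repetitions."""
    rng = random.Random(seed)

    def sample():
        # ASSUMED true model: y = 1 + 2x + standard normal noise.
        xs = [rng.uniform(0, 1) for _ in range(n)]
        ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
        return xs, ys

    train_total = test_total = 0.0
    for _ in range(repeats):
        x_train, y_train = sample()
        x_new, y_new = sample()  # new dataset, with new noise
        a, b = fit_line(x_train, y_train)
        train_total += mean_squared_error(a, b, x_train, y_train)
        test_total += mean_squared_error(a, b, x_new, y_new)
    return train_total / repeats, test_total / repeats

train_mse, test_mse = simulate()
print(train_mse, test_mse)
```

Averaged over the repetitions, the error on new data should come out larger than the error on the data the line was fitted to.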


In logistic regression and linear discriminant analysis, the conditional class probabilities sum to 1, so they are posterior probabilities P(class given data), with the class playing the role that the box (the parameter) played above. We can just compare them directly, which is what the Bayes optimal classifier does. That decision compares posteriors, not likelihoods.
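A minimal sketch of that direct comparison for a two-class logistic model. The intercept and slope are made-up numbers, not fitted values, and the function names are hypothetical.

```python
import math

def class_posteriors(x, intercept=-1.0, slope=2.0):
    """P(class | x) for classes 0 and 1 from a logistic model with made-up coefficients."""
    p1 = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
    return {0: 1.0 - p1, 1: p1}

def classify(x):
    """Pick the class with the larger conditional probability."""
    posteriors = class_posteriors(x)
    return max(posteriors, key=posteriors.get)

print(class_posteriors(2.0), classify(2.0))
print(class_posteriors(0.0), classify(0.0))
```

The two probabilities sum to 1 for every x, so comparing them is a comparison between classes, unlike the box likelihoods, which do not sum to 1 across boxes.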
