Maximum_Likelihood

Statistics with Prof. Liu Sept 6, 2024

Statistics has two schools: frequentist and Bayesian.

Maximum likelihood is a frequentist method.

The likelihood function is powerful: it contains all the information the data carry about the parameter. We don't need anything else, just this function.

The likelihood function is the joint probability of all the data. The likelihood is simply the probability of the data. All the information in the data is in this likelihood function!

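We assume the data are iid. In statistics, an iid sample is called a "random sample". It means, e.g., that the probability is the same every time a ball is drawn (and that the draws do not influence one another).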

That joint probability is just the product of P(data point | parameter) over all the data points.
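Written out, under the iid assumption and with notation that is mine rather than the lecture's ($x_1, \dots, x_n$ for the observations, $\theta$ for the parameter), the likelihood would be:

$$
L(\theta) = P(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)
$$

The data are held fixed and $\theta$ varies, which is why it is read as a function of the parameter.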

It's the data's probability! Not the parameter's probability.

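Example: there are two boxes. One holds 5 black balls and 5 white balls; the other holds 9 black balls and 1 white ball. We draw 4 times, one ball each time, with replacement. All 4 balls are black. Which box were they most likely drawn from?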

For box1, with 4 draws, P(4 black) = P(black)^4 = 0.5^4 = 0.0625.

For box2, with 4 draws, P(4 black) = P(black)^4 = 0.9^4 = 0.6561. This is why we assume iid: it lets us take the product.

The latter probability is larger, so we choose box2.
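The two-box comparison can be checked numerically. This is a minimal sketch; the function name `likelihood_all_black` is made up for illustration.

```python
def likelihood_all_black(p_black: float, n_draws: int) -> float:
    """Probability of drawing n_draws black balls in a row (iid draws with replacement)."""
    return p_black ** n_draws

# Proportion of black balls in each box, taken from the example.
boxes = {"box1": 5 / 10, "box2": 9 / 10}

# Likelihood of the observed data (4 black out of 4) under each box.
likelihoods = {name: likelihood_all_black(p, 4) for name, p in boxes.items()}

# Maximum likelihood: pick the box under which the data are most probable.
best_box = max(likelihoods, key=likelihoods.get)

for name, value in likelihoods.items():
    print(f"{name}: {value:.4f}")
print("maximum likelihood choice:", best_box)
```

Note that the two likelihoods do not sum to 1; each is the probability of the same data under a different box.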

The parameters for each box: box1 is binomial with n=4, p=0.5; box2 is binomial with n=4, p=0.9.

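Example: there are infinitely many boxes, whose proportions of black balls range from 0 to 1. We draw 4 times and all 4 balls are black. Which box were they drawn from?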

Now the parameters are n=4 and an unknown p. We want to know p; once we know p, we know which box.

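**We still choose the box that makes the data most probable: we choose the max of P(data given p), i.e. P(data given box). Here P(4 black given p) = p^4 increases with p, so that is the box with the highest p.**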
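With a continuum of boxes, the same choice can be approximated by scanning a grid of candidate p values. This is a sketch under my own assumptions: a grid of 101 points, and illustrative helper names `sequence_likelihood` and `mle_on_grid`. The binomial coefficient is left out because it does not depend on p and so does not move the maximum.

```python
def sequence_likelihood(p: float, n_black: int, n_draws: int) -> float:
    """P(observed draws | p) for iid draws: p^k * (1 - p)^(n - k)."""
    return p ** n_black * (1 - p) ** (n_draws - n_black)

# Candidate boxes: p = 0.00, 0.01, ..., 1.00 (grid resolution is an assumption).
grid = [i / 100 for i in range(101)]

def mle_on_grid(n_black: int, n_draws: int) -> float:
    """Return the grid value of p that maximizes the likelihood."""
    return max(grid, key=lambda p: sequence_likelihood(p, n_black, n_draws))

p_hat_all_black = mle_on_grid(4, 4)    # the example: 4 black out of 4
p_hat_three_black = mle_on_grid(3, 4)  # hypothetical variation: 3 black out of 4

print("4 black of 4:", p_hat_all_black)
print("3 black of 4:", p_hat_three_black)
```

In the all-black case the likelihood keeps rising up to the edge of the grid, so the box with the highest p wins. In the hypothetical 3-of-4 case the maximum sits in the interior, at the observed proportion of black balls.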

But what is the probability that box2 is the one I drew from? What is the probability for box1?

The numbers 0.5^4 and 0.9^4 are not the probabilities of the two boxes! It would be more intuitive to make decisions based on such probabilities, like a 40% chance of rain versus a 60% chance of no rain.
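To get probabilities for the boxes themselves, one extra ingredient is needed: a prior over the boxes. The following sketch applies Bayes' rule and assumes, which the lecture does not state, that both boxes were equally likely to be picked beforehand.

```python
# Likelihoods of "4 black out of 4" from the two-box example.
likelihood = {"box1": 0.5 ** 4, "box2": 0.9 ** 4}

# ASSUMPTION: equal prior probability for each box (not given in the notes).
prior = {"box1": 0.5, "box2": 0.5}

# Bayes' rule: posterior is proportional to likelihood * prior.
unnormalized = {box: likelihood[box] * prior[box] for box in likelihood}
evidence = sum(unnormalized.values())
posterior = {box: value / evidence for box, value in unnormalized.items()}

for box, prob in posterior.items():
    print(f"P({box} | 4 black) = {prob:.3f}")
```

Unlike the likelihoods, these two numbers sum to 1. A different prior would give different posteriors, which is exactly the information maximum likelihood does without.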

Maximum likelihood has its own logic, though: we choose the parameter value under which the observed data are most probable.

The likelihood is just a probability, the probability of the data, so (for discrete data such as these draws) it lies between 0 and 1.

**Applied to scientific methodology: we can use the likelihood to measure how far a theory is from real-world data, i.e. to examine whether a theory is good or bad.**


Statistics does inference: estimation and prediction.

Estimation has two categories: 1. We assume we know the form of the population distribution and just estimate its parameters (parametric). 2. We don't even know the population distribution (nonparametric).

After that, if we apply the fitted model to new data, it's prediction.

**Prediction error is larger than estimation error.** Estimation error is just the RSS, the residual sum of squares. For prediction, a new dataset introduces new noise on top of the model's original RSS.
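A small simulation can illustrate the gap. This is a sketch under assumptions that are mine, not the lecture's: a simple linear model y = 1 + 2x + noise with unit-variance Gaussian noise, 20 fixed x values, and mean squared error (RSS divided by n) as the error measure. The fitted line is scored once on the data it was fitted to and once on a fresh dataset at the same x values.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def mse(xs, ys, a, b):
    """Mean squared error of the line a + b * x, i.e. RSS / n."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(xs, rng):
    """One dataset from the assumed model y = 1 + 2x + N(0, 1) noise."""
    return [1 + 2 * x + rng.gauss(0, 1) for x in xs]

rng = random.Random(0)            # fixed seed so the run is repeatable
xs = [i / 19 for i in range(20)]  # 20 evenly spaced x values in [0, 1]
n_reps = 2000

train_total = 0.0
test_total = 0.0
for _ in range(n_reps):
    y_train = simulate(xs, rng)
    y_new = simulate(xs, rng)     # new data: same model, fresh noise
    a, b = fit_line(xs, y_train)
    train_total += mse(xs, y_train, a, b)
    test_total += mse(xs, y_new, a, b)

avg_train_mse = train_total / n_reps
avg_test_mse = test_total / n_reps
print(f"average error on fitted data: {avg_train_mse:.3f}")
print(f"average error on new data:    {avg_test_mse:.3f}")
```

Averaged over many repetitions, the error on new data comes out larger than the error on the fitted data. A single repetition can go either way, which is why the sketch averages.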


In logistic regression, linear regression, and linear discriminant analysis, the conditional class probabilities sum to 1, so they are posterior probabilities: P(class given data), which plays the role of P(parameter given data). We can just compare them directly and use the Bayes optimal classifier. That comparison does not involve the likelihood.
