Maximum Likelihood

Statistics with Prof. Liu, Sept 6, 2024

Statistics has two streams: frequentist and Bayesian.

Maximum likelihood is a frequentist method.

The likelihood function is powerful: it contains all the information the data carry. We don't need anything else, just this function.

The likelihood function is the joint probability of all the data. The likelihood is simply the probability of the data. All the information in the data is contained in this likelihood function!

We assume the data are iid. In statistics, an iid sample is called a "random sample". It means, e.g., that every draw of a ball has the same probability.

That joint probability is just the product of the individual P(data point | parameter) terms.

It is the data's probability, not the parameters' probability!
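As a minimal sketch of this idea (a hypothetical helper, not from the lecture), the likelihood of iid 0/1 draws is the product of the per-draw probabilities:

```python
import math

def likelihood(data, p):
    """Joint probability of iid draws (1 = black, 0 = white),
    viewed as a function of the parameter p = P(black)."""
    return math.prod(p if x == 1 else 1 - p for x in data)

print(likelihood([1, 1, 1, 1], 0.5))  # 0.5**4 = 0.0625
```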

Example: two boxes. One has 5 black balls and 5 white balls; the other has 9 black balls and 1 white ball. We draw one ball 4 times, with replacement, and all 4 draws are black. Which box did we most likely draw from?

For box 1, with 4 draws: P(4 black) = P(black)^4 = 0.5^4 = 0.0625.

For box 2, with 4 draws: P(4 black) = P(black)^4 = 0.9^4 = 0.6561. This product form is exactly what iid gives us.

The latter probability is larger, so we choose box 2.

The parameters for each box: box 1 is binomial with n = 4, p = 0.5; box 2 is binomial with n = 4, p = 0.9.
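A quick check of the two-box arithmetic (the box names are just labels for this sketch):

```python
n_black = 4                         # four black draws out of four
boxes = {"box1": 0.5, "box2": 0.9}  # p = P(black) for each box

for name, p in boxes.items():
    print(name, p ** n_black)       # box1: 0.0625, box2: 0.6561

best = max(boxes, key=lambda name: boxes[name] ** n_black)
print("most likely source:", best)  # box2
```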

Example: infinitely many boxes, whose proportions of black balls range from 0 to 1. Draw 4 times; all 4 are black. Which box did we draw from?

Now the parameters: n = 4, but we don't know p. We want to know p; once we know p, we know which box.

**We still choose the box that makes the data most probable: we maximize P(data given p), i.e., P(data given box). Here the likelihood is p^4, which is largest at the largest p.**
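A minimal sketch of the continuous case, scanning a grid of p values (the grid step is an arbitrary choice):

```python
# Likelihood of "4 black out of 4 draws" as a function of p.
candidates = [i / 1000 for i in range(1001)]   # p = 0.000, 0.001, ..., 1.000
p_hat = max(candidates, key=lambda p: p ** 4)  # maximize the likelihood p^4
print(p_hat)  # 1.0 -- the MLE: the box containing only black balls
```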

But what is the probability that box 2 is the one I actually drew from? And what is that probability for box 1?

But these numbers (0.5^4 and 0.9^4) are not the probabilities of the two boxes! It would feel more intuitive to decide based on the boxes' own probabilities, the way we do with a forecast of 40% rain and 60% no rain.

Still, the method has its own logic: we choose the parameter that makes the observed data most likely.

The likelihood here is just a probability, the probability of the data, so it lies between 0 and 1.

**Applied to scientific methodology: we can measure the distance between a theory and real-world data, i.e., examine whether a theory is good or bad, using likelihood.**
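For instance (a sketch with made-up data, still assuming the ball-drawing model): two theories propose different values of p, and the one with the higher log-likelihood sits closer to the observed data.

```python
import math

data = [1, 1, 1, 0]  # three black draws and one white, say

def log_likelihood(data, p):
    """Sum of per-draw log probabilities under parameter p."""
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

print(log_likelihood(data, 0.5))  # theory A: p = 0.5 -> about -2.77
print(log_likelihood(data, 0.9))  # theory B: p = 0.9 -> about -2.62, a slightly better fit
```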


Statistics does inference: estimation and prediction.

Estimation has two categories: 1. We assume we know the population distribution and only estimate its parameters. 2. We don't even know the population distribution.

After that, if we apply the fitted model to new data, that is prediction.

**Prediction error is larger than estimation error.** Estimation error is just the RSS, the sum of squared residuals, on the data we fit. But for prediction, a new dataset introduces new noise on top of the model's original RSS.
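A small simulation of this gap (a sketch; all numbers and names here are invented for illustration): fit simple least squares on one sample, then evaluate on a fresh sample.

```python
import random

random.seed(0)

def simulate(n):
    """Data from y = 2x + 1 + Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
    return xs, ys

xs, ys = simulate(50)  # training sample
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
intercept = ybar - slope * xbar

def rss(xs, ys):
    """Residual sum of squares of the fitted line on a dataset."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print("training RSS:", rss(xs, ys))
xs_new, ys_new = simulate(50)                  # fresh data brings fresh noise
print("prediction RSS:", rss(xs_new, ys_new))  # typically larger than training RSS
```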


In logistic regression, linear regression, and linear discriminant analysis, the conditional class probabilities sum to 1 and are thus posterior probabilities P(class given data). We can compare them directly and use the Bayes optimal classifier. That is not related to likelihood.
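A toy illustration of that decision rule (the class names and numbers are invented): once we have posterior class probabilities that sum to 1, the Bayes optimal classifier simply picks the largest one.

```python
# Hypothetical conditional class probabilities P(class | x); they sum to 1.
posterior = {"spam": 0.7, "ham": 0.3}
prediction = max(posterior, key=posterior.get)
print(prediction)  # spam -- the Bayes optimal decision for this x
```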
