Python data analysis: pprobs, a library for simulating probability theory problems, especially conditional probability

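Introduction

This is a library for simulating probability theory problems, especially conditional probability. It is also useful for creating custom single or joint distributions from a specific PMF or PDF, in order to obtain probability tables and to generate data based on the probability function.

How to install?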

pip install pprobs

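Probability Simulator

The probability simulator solves probability theory problems, especially those involving conditional probability.

Example 1

We define a few events with known probabilities and then query the probabilities we are interested in.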

P(A) = 0.3
P(B) = 0.2
P(A^B) = 0.1
A and B are dependent
P(A+B) = ? , P(A|B) = ?

from pprobs.simulation import Simulator

space = Simulator()

space.add_event('A', 0.3)
space.add_event('B', 0.2)
space.add_event('A^B', 0.1)

prob_1 = space.get_prob('A+B') # A+B means union of A and B
prob_2 = space.get_prob('A|B')

print(prob_1, prob_2) # 0.4  0.5
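
As a cross-check that does not depend on pprobs, the two results can be derived by hand from the inclusion-exclusion rule and the definition of conditional probability. This is only a sketch of the arithmetic; the variable names are illustrative and are not part of the library.

```python
# Cross-check of Example 1 with plain arithmetic (no pprobs needed).
p_a = 0.3        # P(A)
p_b = 0.2        # P(B)
p_a_and_b = 0.1  # P(A^B), the intersection

# Inclusion-exclusion: P(A+B) = P(A) + P(B) - P(A^B)
p_a_or_b = p_a + p_b - p_a_and_b

# Definition of conditional probability: P(A|B) = P(A^B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(round(p_a_or_b, 4), round(p_a_given_b, 4))  # 0.4 0.5
```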

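Example 2

Of 100 sports car buyers, 40 bought an alarm system, 30 bought bucket seats, and 20 bought both an alarm system and bucket seats. If a randomly selected car buyer bought an alarm system, what is the probability that they also bought bucket seats?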

Source: Statistics How To

P(SEAT) = 0.3
P(ALARM) = 0.4
P(SEAT ^ ALARM) = 0.2
P(SEAT | ALARM) = ?

from pprobs.simulation import Simulator

space = Simulator()

space.add_event('SEAT', 0.3).add_event('ALARM', 0.4) # We can also add events sequentially in a line (chaining) 
space.add_event('SEAT^ALARM', 0.2) # A^B means intersection of A & B

print(space.get_prob('SEAT|ALARM')) # 0.5
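
Because the problem is stated in counts, the same answer can be checked with exact fractions. The following stand-alone sketch uses only the standard library; its variable names are illustrative and not part of pprobs.

```python
from fractions import Fraction

# Counts taken from the problem statement.
total_buyers = 100
alarm_buyers = 40
seat_buyers = 30
both_buyers = 20

p_alarm = Fraction(alarm_buyers, total_buyers)  # P(ALARM)
p_seat = Fraction(seat_buyers, total_buyers)    # P(SEAT)
p_both = Fraction(both_buyers, total_buyers)    # P(SEAT ^ ALARM)

# P(SEAT | ALARM) = P(SEAT ^ ALARM) / P(ALARM)
p_seat_given_alarm = p_both / p_alarm
print(p_seat_given_alarm, float(p_seat_given_alarm))  # 1/2 0.5

# The conditional probability differs from P(SEAT), so the events are dependent.
print(p_seat_given_alarm != p_seat)  # True
```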

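Example 3

In total, 1% of people have a certain genetic defect. The genetic test detects the defect in 90% of the people who have it (true positives). 9.6% of the tests on people without the defect are false positives. If a person tests positive, what is the probability that they actually have the genetic defect?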

Source: Statistics How To

P(GEN_DEF) = 0.01
P(POSITIVE|GEN_DEF) = 0.9
P(POSITIVE|GEN_DEF!) = 0.096
P(GEN_DEF|POSITIVE) = ?

from pprobs.simulation import Simulator

space = Simulator()

space.add_event('GEN_DEF', 0.01)
space.add_event('POSITIVE|GEN_DEF', 0.9) # A|B means A given B
space.add_event('POSITIVE|GEN_DEF!', 0.096) # A! means complement of A

print(space.get_prob('GEN_DEF|POSITIVE')) # 0.0865
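
The result follows from Bayes' theorem combined with the law of total probability. The sketch below reproduces the number with plain arithmetic; it illustrates the formula only and says nothing about how pprobs computes it internally. The variable names are illustrative.

```python
# Bayes' theorem for Example 3 (no pprobs needed).
p_def = 0.01                # P(GEN_DEF)
p_pos_given_def = 0.9       # P(POSITIVE|GEN_DEF)
p_pos_given_no_def = 0.096  # P(POSITIVE|GEN_DEF!)

# Law of total probability:
# P(POSITIVE) = P(POSITIVE|GEN_DEF) P(GEN_DEF) + P(POSITIVE|GEN_DEF!) P(GEN_DEF!)
p_pos = p_pos_given_def * p_def + p_pos_given_no_def * (1 - p_def)

# Bayes' theorem:
# P(GEN_DEF|POSITIVE) = P(POSITIVE|GEN_DEF) P(GEN_DEF) / P(POSITIVE)
p_def_given_pos = p_pos_given_def * p_def / p_pos

print(round(p_def_given_pos, 4))  # 0.0865
```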

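Example 4

Bob has an important meeting tomorrow and has to reach the office on time in the morning. He usually travels by car; under normal conditions (no car trouble), the probability that he arrives on time is 0.3. The probability that his car has trouble is 0.2. If the car has trouble, he will have to take the train, and only 2 out of 10 trains will get him to the office on time.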

Source: HackerEarth

P(ON_TIME|CAR_OK) = 0.3
P(ON_TIME|CAR_OK!) = 2/10 => Go by train
P(CAR_OK!) = 0.2
P(ON_TIME) = ?

from pprobs.simulation import Simulator

space = Simulator()

space.add_event('ON_TIME|CAR_OK', 0.3)
space.add_event('ON_TIME|CAR_OK!', 2/10)
space.add_event('CAR_OK!', 0.2)

prob = space.get_prob('ON_TIME') # Probability of ON_TIME

print(prob) # 0.28
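
The answer is an application of the law of total probability over the two cases "car is fine" and "car has trouble". A minimal sketch of that calculation, with illustrative variable names:

```python
# Law of total probability for Example 4 (no pprobs needed).
p_car_trouble = 0.2              # P(CAR_OK!)
p_on_time_given_ok = 0.3         # P(ON_TIME|CAR_OK)
p_on_time_given_trouble = 2 / 10 # P(ON_TIME|CAR_OK!), i.e. going by train

p_car_ok = 1 - p_car_trouble     # P(CAR_OK)

# P(ON_TIME) = P(ON_TIME|CAR_OK) P(CAR_OK) + P(ON_TIME|CAR_OK!) P(CAR_OK!)
p_on_time = (p_on_time_given_ok * p_car_ok
             + p_on_time_given_trouble * p_car_trouble)

print(round(p_on_time, 4))  # 0.28
```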

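Distribution Simulator

Creating a custom single or joint distribution from a specific PMF or PDF is useful for obtaining probability tables and for generating data based on the probability function.

Example 1

Suppose we have a discrete random variable with a specific PMF and want to generate many data points from it. As the second case in the code shows, 1 has the highest probability and therefore appears most often, while 4 has the lowest probability and appears less often.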

from pprobs.distribution import Discrete

# First case: a fair die, every outcome has the same probability
def pmf(x):
    return 1 / 6

dist = Discrete(pmf, [1, 2, 3, 4, 5, 6]) # The second argument is the sample space of our PMF

print(dist.generate(15)) # e.g. [4, 3, 1, 6, 5, 3, 5, 3, 5, 4, 2, 5, 6, 1, 6]


# Second case: the probability decreases as x grows
def pmf(x):
    return 1 / x

dist = Discrete(pmf, [1, 2, 3, 4])
print(dist.generate(15)) # e.g. [1, 2, 1, 1, 1, 4, 3, 1, 1, 3, 2, 4, 1, 2, 2]
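
Note that the values of the second PMF sum to about 2.08 rather than 1, so they can only act as relative weights. Whether pprobs normalizes them internally is not stated here. The sketch below assumes that it does and mimics the behaviour with `random.choices` from the standard library, which accepts unnormalized weights. It is an illustration, not the library's implementation.

```python
import random
from collections import Counter

def pmf(x):
    return 1 / x

sample_space = [1, 2, 3, 4]

# Treat the PMF values as relative weights and normalize them.
weights = [pmf(x) for x in sample_space]
total = sum(weights)
probabilities = [w / total for w in weights]
print([round(p, 2) for p in probabilities])  # [0.48, 0.24, 0.16, 0.12]

# Draw many samples; a fixed seed keeps the run reproducible.
rng = random.Random(0)
samples = rng.choices(sample_space, weights=weights, k=10_000)

# Values with larger weights should appear more often.
counts = Counter(samples)
print(counts.most_common())
```

With enough samples, the observed frequencies approach the normalized probabilities, which is why 1 dominates the short sample shown above.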

Example 2

Suppose we have a continuous random variable with a specific PDF.

from pprobs.distribution import Continuous

def pdf(x):
    if x > 1:
        return x / x ** 2 # equivalent to 1 / x
    return 0

dist = Continuous(pdf, [1, 6]) # The second argument is the interval of our PDF

print(dist.generate(15)) # [2.206896551724138, 4.103448275862069, ..., 5.655172413793104, 6.0]
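
The sample values printed above (2.2069, 4.1034, 5.6552 and 6.0) are all consistent with an evenly spaced 30-point grid over [1, 6] with step 5/29. That suggests, but does not prove, that the interval is discretized before sampling. The sketch below illustrates this grid-based idea using only the standard library. The grid size `n_points` and the whole approach are assumptions, not the documented behaviour of pprobs.

```python
import random

def pdf(x):
    if x > 1:
        return 1 / x
    return 0

# Assumption: discretize the interval into an evenly spaced grid.
low, high, n_points = 1, 6, 30
step = (high - low) / (n_points - 1)
grid = [low + i * step for i in range(n_points)]

# Use the PDF values at the grid points as relative weights.
weights = [pdf(x) for x in grid]

# Sample grid points in proportion to their weights.
rng = random.Random(0)
samples = rng.choices(grid, weights=weights, k=15)
print(samples)
```

The point x = 1 gets weight 0 because the PDF is only positive for x > 1, so it is never drawn.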

Example 3

Suppose we have a continuous joint random variable with a specific PDF.

from pprobs.distribution import Joint

def pdf(x, y):
    if x > 1:
        return 1 / (x * y)
    return 0

dist = Joint(pdf, [1, 6], [3, 10]) # The second and third arguments are the intervals of X and Y

print(dist.probability_table(force=20)) # The larger force is, the more values are generated

Output:

X/Y	Y=3.0	Y=3.7	...	Y=10
X=1.0	0.000	0.000	...	0.000
...	...	...	...	...
X=6.0	0.055	0.044	...	0.016

print(dist.get_prob(3.5, 3.5)) # 0.081 is P(X=3.5, Y=3.5)
print(dist.get_prob([1, 6], 4)) # 0.041 is P(Y=4) because X includes its whole domain
print(dist.get_prob(2.1, [1, 4])) # 0.206 is P(X=2.1, Y in [1, 4])

Example 4

Suppose we have a discrete joint random variable with a specific PMF.

from pprobs.distribution import Joint

def pmf(x, y):
    if x > 1:
        return 1 / (x * y)
    return 0

dist = Joint(pmf, range(1, 6), range(6, 10)) # The second and third arguments are the sample spaces of X and Y

print(dist.probability_table())

Output:

X/Y	Y=6	Y=7	Y=8	Y=9
X=1	0.000000	0.000000	0.000000	0.000000
X=2	0.083333	0.071429	0.062500	0.055556
X=3	0.055556	0.047619	0.041667	0.037037
X=4	0.041667	0.035714	0.031250	0.027778
X=5	0.033333	0.028571	0.025000	0.022222

print(dist.get_prob(2, range(6, 10))) # 0.272 is P(X=2)
print(dist.get_prob(2, 6)) # 0.083 is P(X=2, Y=6)
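
The entries of the table above match the raw values of `pmf(x, y)`, and the marginal P(X=2) is the sum of the X=2 row. The following stand-alone sketch rebuilds the table and both queries with plain Python, so the numbers can be checked without pprobs. The dictionary layout and variable names are illustrative choices, not the library's internals.

```python
def pmf(x, y):
    if x > 1:
        return 1 / (x * y)
    return 0

xs = range(1, 6)
ys = range(6, 10)

# Evaluate the PMF on every (x, y) pair of the sample space.
table = {(x, y): pmf(x, y) for x in xs for y in ys}

# Print the table in the same tab-separated layout as above.
print("X/Y\t" + "\t".join(f"Y={y}" for y in ys))
for x in xs:
    print(f"X={x}\t" + "\t".join(f"{table[(x, y)]:.6f}" for y in ys))

# Marginal P(X=2): sum the X=2 row over all values of Y.
p_x2 = sum(table[(2, y)] for y in ys)
print(round(p_x2, 4))            # 0.2728

# Joint probability P(X=2, Y=6): a single table entry.
print(round(table[(2, 6)], 4))   # 0.0833
```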

Thank you for giving the project a star on GitHub: mokar2001 (MohammadReza KarimiNejad)