Artificial Intelligence 09: Bayesian Networks (PPT, 57 slides)
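• In a probabilistic graphical model – each node represents a random variable (or a group of random variables)
Graphical Models in CS
• A natural tool for dealing with uncertainty and complexity – which occur throughout applied mathematics and engineering
• The most important idea in graphical models is modularity – a complex system is built by combining simpler parts.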
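Bayesian networks
A simple, graphical data structure for representing the dependencies among variables (conditional independence), giving a concise specification of any full joint probability distribution.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph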
Causality?
• When Bayesian networks reflect the true causal patterns: – Often simpler (nodes have fewer parents) – Often easier to think about – Often easier to elicit from experts
• BNs need not actually be causal – Sometimes no causal network exists (especially if variables are missing) – Arrows reflect correlation, not causation
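Why are Graphical Models useful
• Probability theory provides the "glue" whereby – the parts are combined, ensuring that the system as a whole is consistent – models are connected to data.
• The graph-theoretic side provides:
– an intuitive interface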
• by which humans can model highly-
Deciding conditional independence is hard in noncausal directions (Causal models and conditional independence seem hardwired for humans!) Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
– Sum Rule
• The probability of a variable is obtained by marginalizing or summing out the other variables
– Product Rule
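The equations themselves did not survive on these slides. In standard notation they can be written as below; the symbols X and Y are placeholders chosen here, not taken from the deck.

```latex
\begin{align}
  p(X)    &= \sum_{Y} p(X, Y)        && \text{(sum rule)} \\
  p(X, Y) &= p(Y \mid X)\, p(X)      && \text{(product rule)}
\end{align}
```

Applying the product rule in both orders and dividing gives Bayes' theorem, p(Y | X) = p(X | Y) p(Y) / p(X).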
Outline
• Graphical models • Bayesian networks
– Syntax – Semantics • Inference in Bayesian networks
The conditional probability table of a Boolean variable with k Boolean parents has 2^k independently specifiable probabilities
Each row requires one number p for Xi = true (the number for Xi = false is just 1-p)
If each variable has no more than k parents, the
Common Effect
• The last configuration: two causes of one effect (v-structures)
– Are X and Z independent?
• Yes: remember the ballgame and the rain causing traffic, no correlation?
Network topology reflects causal knowledge:
– A burglar can set the alarm off
Example contd.
Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
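As a rough illustration of the row counting, the sketch below totals 2^k numbers per Boolean node for two parent structures over the burglary variables. Both structures are assumptions taken from the usual textbook version of this example, since the slide figures are missing here; the function names are made up for the sketch.

```python
# Parents of each Boolean variable in the causal ordering.
# Assumed structure: the slide's figure did not survive extraction.
CAUSAL = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

# Same variables added in the noncausal order MaryCalls, JohnCalls, Alarm,
# Burglary, Earthquake (also an assumed structure).
NONCAUSAL = {
    "MaryCalls": [],
    "JohnCalls": ["MaryCalls"],
    "Alarm": ["MaryCalls", "JohnCalls"],
    "Burglary": ["Alarm"],
    "Earthquake": ["Alarm", "Burglary"],
}


def cpt_numbers(parents):
    """Numbers needed when every variable is Boolean: 2**k per node."""
    return sum(2 ** len(p) for p in parents.values())


def full_joint_numbers(n):
    """Independent entries in a full joint table over n Boolean variables."""
    return 2 ** n - 1


print(cpt_numbers(CAUSAL))     # 1 + 1 + 4 + 2 + 2
print(cpt_numbers(NONCAUSAL))  # 1 + 2 + 4 + 2 + 4
print(full_joint_numbers(5))
```

The second total matches the "13 numbers needed" quoted for the noncausal ordering elsewhere in these slides.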
Global semantics
The full joint distribution is defined as the product of the local conditional distributions: P(x1, …, xn) = ∏i P(xi | parents(Xi))
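A minimal sketch of this product for the burglary network follows. The CPT values are the commonly quoted textbook numbers and are an assumption here, because the slides' tables are missing; `bern` and `joint` are names invented for the sketch.

```python
# Assumed CPT values (not present in these slides).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {                          # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def bern(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the five local conditionals."""
    return (
        bern(P_B, b)
        * bern(P_E, e)
        * bern(P_A[(b, e)], a)
        * bern(P_J[a], j)
        * bern(P_M[a], m)
    )


# Both neighbours call, the alarm rings, no burglary, no earthquake.
p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.6f}")
```

Each factor conditions only on the node's parents, so five small tables replace a 32-entry joint table.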
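Local semantics
Need a method such that local conditional independence relationships guarantee that the global semantics holds
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n
   add Xi to the network
   select parents from X1, …, Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)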
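Constructing Bayesian networks
The network topology must genuinely reflect, for each variable, the direct influences captured by an appropriate set of parents.
The right order in which to add nodes is to add the "root causes" first, then the variables they directly influence, and so on.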
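The effect of the ordering can be tried out on a toy model. The sketch below is not the deck's own procedure: it assumes a hypothetical three-variable v-structure X -> Y <- Z with made-up probabilities, and for each variable picks the smallest set of earlier variables that screens off the rest. It also assumes every evidence combination has nonzero probability.

```python
from itertools import combinations, product


def make_worlds():
    """All (assignment, probability) pairs for the toy model X -> Y <- Z."""
    p_x, p_z = 0.3, 0.2  # made-up priors
    p_y = {(True, True): 0.95, (True, False): 0.8,
           (False, True): 0.7, (False, False): 0.1}
    worlds = []
    for x, z, y in product((True, False), repeat=3):
        p = (p_x if x else 1 - p_x) * (p_z if z else 1 - p_z)
        p *= p_y[(x, z)] if y else 1 - p_y[(x, z)]
        worlds.append(({"X": x, "Z": z, "Y": y}, p))
    return worlds


def cond_prob(worlds, var, given):
    """P(var = true | given), with `given` a dict of observed values."""
    num = den = 0.0
    for world, p in worlds:
        if all(world[k] == v for k, v in given.items()):
            den += p
            if world[var]:
                num += p
    return num / den


def minimal_parents(worlds, var, predecessors, tol=1e-9):
    """Smallest subset S with P(var | S) = P(var | all predecessors)."""
    for size in range(len(predecessors) + 1):
        for subset in combinations(predecessors, size):
            matches = True
            for values in product((True, False), repeat=len(predecessors)):
                full = dict(zip(predecessors, values))
                part = {k: full[k] for k in subset}
                gap = cond_prob(worlds, var, full) - cond_prob(worlds, var, part)
                if abs(gap) > tol:
                    matches = False
                    break
            if matches:
                return list(subset)
    return list(predecessors)


def build_network(worlds, order):
    """Add variables in `order`, choosing minimal parents among earlier ones."""
    return {var: minimal_parents(worlds, var, order[:i])
            for i, var in enumerate(order)}


worlds = make_worlds()
print(build_network(worlds, ["X", "Z", "Y"]))  # causes first
print(build_network(worlds, ["Y", "X", "Z"]))  # effect first
```

With the causes first the result has two arcs; with the effect first it needs three, which is the loss of compactness the slides warn about.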
Example
Example contd.
Joint probability distribution specifies the probability of every atomic event, i.e. of every complete assignment of values to the random variables
Independence / Conditional Independence
A and B are independent iff P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A) P(B)
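The third form of the definition is easy to check numerically. The sketch below does so for two hypothetical joint tables over Boolean A and B; the tables and the function name are invented for illustration.

```python
def is_independent(joint, tol=1e-12):
    """joint maps (a, b) -> P(A=a, B=b); checks P(a, b) = P(a) P(b) everywhere."""
    # Marginals via the sum rule.
    p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a, _ in joint}
    p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for _, b in joint}
    return all(abs(p - p_a[a] * p_b[b]) <= tol for (a, b), p in joint.items())


# Hypothetical table built as a product of P(A=true)=0.2 and P(B=true)=0.3.
independent = {(True, True): 0.06, (True, False): 0.14,
               (False, True): 0.24, (False, False): 0.56}

# Same marginals, but A=true forces B=true.
dependent = {(True, True): 0.20, (True, False): 0.00,
             (False, True): 0.10, (False, False): 0.70}

print(is_independent(independent))
print(is_independent(dependent))
```

Both tables share the same marginals, so independence is a property of the joint table and cannot be read off the marginals alone.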
(link ≈ "direct influences")
Example
Topology of network encodes conditional independence assertions:
Weather is independent of the other variables
Toothache and Catch are conditionally
– Are X and Z independent given Y?
• No: remember that seeing traffic put the rain and the ballgame in competition?
– This is backwards from the other cases
opportunities.
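"The probability of some event is 0.1" means that 0.1 is the fraction that would be observed in the limit of infinitely many samples
However, in many situations repeated trials are impossible
Probability
Probability is a rigorous formalism for uncertain knowledge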
– Is X independent of Z given Y?
Common Cause
• Another basic configuration: two effects of the same cause – Are X and Z independent? – Are X and Z independent given Y?
A is conditionally independent of B given C: P(A | B, C) = P(A | C)
In most cases, the use of conditional independence reduces the size of the representation of the full joint distribution from exponential in n to linear in n.
Probability Theory
Probability theory can be expressed in terms of two simple equations
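Optimal decisions: decision networks include utility
Inference by enumeration
The previous chapter explained that any conditional probability can be computed by summing terms from the full joint distribution table
In a Bayesian network, a query can be answered by computing sums of products of conditional probabilities.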
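A minimal sketch of enumeration on the burglary network follows: the hidden variables Earthquake and Alarm are summed out of the product of conditionals, and the result is normalized. The CPT values are the commonly quoted textbook numbers, assumed here because the slides' tables are missing; the function names are made up.

```python
from itertools import product

# Assumed CPT values (not present in these slides).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def bern(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Product of the local conditional probabilities."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def query_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m) by summing out E and A."""
    unnormalized = {}
    for b in (True, False):
        unnormalized[b] = sum(
            joint(b, e, a, j, m) for e, a in product((True, False), repeat=2)
        )
    z = sum(unnormalized.values())  # normalization constant P(j, m)
    return {b: p / z for b, p in unnormalized.items()}


posterior = query_burglary(j=True, m=True)
print(round(posterior[True], 3))  # -> 0.284

# The scenario in the slides' story: John calls, Mary does not.
print(query_burglary(j=True, m=False)[True])
```

Enumeration touches every assignment of the hidden variables, so its cost grows exponentially with their number; it is shown here only because it follows the definition directly.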
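What is a graphical model?
A graphical representation of a probability distribution – a combination of probability theory and graph theory
• Also called probabilistic graphical models • They augment analysis instead of using pure algebra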
What is a Graph?
• Consists of nodes (also called vertices) and links (also called edges or arcs)
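Directionality of graphs
• Directed graphical models – direction is given by the arrows
• Bayesian networks – causal relationships between random variables
• More popular in AI and statistics
• Undirected graphical models – edges have no arrows
• Markov random fields – better suited to expressing soft constraints between variables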
• More popular in Vision and physics
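The role of graphical models in machine learning
1. A simple way to visualize the structure of a probabilistic model
2. Insights into properties of the model: conditional independence properties can be read off by inspecting the graph
3. Complex computations required to perform inference and learning can be expressed as graphical manipulations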
independent given Cavity
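Example
I am at work in the evening when my neighbor John calls to say that my alarm is ringing, but my neighbor Mary does not call. Sometimes the alarm is set off by minor earthquakes. Has my home really been burgled?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls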
Inference in Bayesian networks
Inference tasks
Simple queries: compute the posterior probability P(Xi | E=e), e.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
Conjunctive queries: P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Observing the effect enables influence between causes.
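This "explaining away" effect can be reproduced with a toy version of the rain / ballgame / traffic example. All probabilities below are invented for illustration, and `prob` is a brute-force helper written for this sketch.

```python
from itertools import product

# Made-up numbers for the v-structure Rain -> Traffic <- Ballgame.
P_RAIN = 0.3
P_GAME = 0.2
P_TRAFFIC = {(True, True): 0.95, (True, False): 0.8,
             (False, True): 0.7, (False, False): 0.1}


def joint(rain, game, traffic):
    """P(rain, game, traffic) for the v-structure."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_GAME if game else 1 - P_GAME)
    p_t = P_TRAFFIC[(rain, game)]
    return p * (p_t if traffic else 1 - p_t)


def prob(query, evidence=None):
    """P(query | evidence); both are dicts keyed by 'rain', 'game', 'traffic'."""
    evidence = evidence or {}
    num = den = 0.0
    for rain, game, traffic in product((True, False), repeat=3):
        world = {"rain": rain, "game": game, "traffic": traffic}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(rain, game, traffic)
        den += p
        if all(world[k] == v for k, v in query.items()):
            num += p
    return num / den


p_rain = prob({"rain": True})
p_rain_given_game = prob({"rain": True}, {"game": True})
p_rain_given_traffic = prob({"rain": True}, {"traffic": True})
p_rain_given_both = prob({"rain": True}, {"traffic": True, "game": True})

print(p_rain, p_rain_given_game)                # equal: causes are independent
print(p_rain_given_traffic, p_rain_given_both)  # the ballgame explains away rain
```

Without evidence about traffic, learning about the ballgame leaves the belief in rain unchanged. Once traffic is observed, learning about the ballgame lowers it.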
Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
interacting sets of variables
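– a data structure
Graphical models: a unifying framework
• View classical multivariate probabilistic systems as instances of a common underlying formalism – mixture models, factor analysis, hidden Markov models, Kalman filters, etc. – used in systems engineering, information theory, pattern recognition and statistical mechanics
Bayesian networks
Frequentist vs. Bayesian
Objective vs. subjective
Frequentist: probability is the long-run expected frequency of occurrence. P(A) = n/N, where n is the number of times event A occurs in N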
Local semantics: each node is conditionally independent of its nondescendants given its parents
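As a numerical check of this statement, the sketch below tests it for JohnCalls in the burglary network: given its parent Alarm, its distribution should not change when the nondescendants Burglary, Earthquake and MaryCalls are also observed. The CPT values are the commonly quoted textbook numbers, assumed here because the slides' tables are missing; `prob` is a brute-force helper written for this sketch.

```python
from itertools import product

VARS = ("B", "E", "A", "J", "M")

# Assumed CPT values (not present in these slides).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def bern(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(B, E, A, J, M):
    """Product of the local conditional probabilities."""
    return (bern(P_B, B) * bern(P_E, E) * bern(P_A[(B, E)], A)
            * bern(P_J[A], J) * bern(P_M[A], M))


def prob(query, evidence):
    """P(query | evidence) by brute-force summation over all 32 worlds."""
    num = den = 0.0
    for values in product((True, False), repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        den += p
        if all(world[k] == v for k, v in query.items()):
            num += p
    return num / den


# Largest change in P(J | A) caused by also observing the nondescendants.
max_gap = 0.0
for a, b, e, m in product((True, False), repeat=4):
    with_nondesc = prob({"J": True}, {"A": a, "B": b, "E": e, "M": m})
    parent_only = prob({"J": True}, {"A": a})
    max_gap = max(max_gap, abs(with_nondesc - parent_only))
print(max_gap)  # zero up to floating-point error

# Without conditioning on Alarm, Burglary does change the belief in JohnCalls.
marginal_gap = abs(prob({"J": True}, {"B": True}) - prob({"J": True}, {}))
print(marginal_gap)
```

The independence holds only given the parent: without knowing Alarm, a burglary makes a call from John far more likely.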
Causal Chains
• One basic configuration: