Implementing Bayesian Networks with the bnlearn Package in R


Implementing Bayesian Networks with the bnlearn Package in R (1)

Tags: Life  2013-08-01 22:26, Thursday

1. Loading the packages and importing data

library(bnlearn) # Available on CRAN: install it with install.packages("bnlearn"), or download it and copy it into the library folder.

library(Rgraphviz) # Used for plotting. This package is not on CRAN; it has to be downloaded from /packages/release/BiocViews.html#___Software.

data(learning.test) # Load the data. Every variable in the data frame must be either a factor (discrete) or numeric (continuous).

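lear.test = read.csv("***.csv", colClasses = "factor") # Data can also be imported directly from a csv file.

Note that if the data contain 0-1 boolean values or ordinal codes such as 1-3, those columns must be explicitly declared as factors, otherwise many bnlearn functions will raise an error. The read functions only ever convert character columns to factors automatically (and, since R 4.0, only when stringsAsFactors = TRUE); numeric codes are never converted.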
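The factor requirement can be checked before any learning is attempted. The sketch below is only an illustration: the data frame `raw` and its columns `flag` and `grade` are made-up names, not part of the original post.

```r
library(bnlearn)

data(learning.test)

# Every column of the bundled data set is already a factor.
sapply(learning.test, class)

# A small made-up data frame with a 0/1 flag and a 1-3 grade stored as integers
# (hypothetical data, used only to illustrate the conversion).
raw <- data.frame(flag = c(0L, 1L, 1L, 0L), grade = c(1L, 3L, 2L, 3L))

# Convert every column to a factor after the fact; this has the same effect
# as passing colClasses = "factor" to read.csv() at import time.
raw[] <- lapply(raw, factor)
sapply(raw, class)
```

Converting with `raw[] <- lapply(raw, factor)` keeps the object a data frame, which is what the bnlearn learning functions expect.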

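The package covers three aspects of Bayesian networks: structure learning, parameter learning and inference. Structure learning includes constraint-based, score-based and hybrid algorithms; parameter learning includes maximum likelihood estimation and Bayesian estimation.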
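As a rough sketch of how the three steps fit together, the code below learns a structure, fits parameters with both estimators, and runs one query. The choice of `hc()` for the structure, the imaginary sample size `iss = 10` and the particular query are assumptions made for illustration, not choices taken from the original post.

```r
library(bnlearn)

data(learning.test)

# Structure learning: hill climbing (a score-based algorithm) returns a DAG.
dag <- hc(learning.test)

# Parameter learning: maximum likelihood and Bayesian estimates of the
# conditional probability tables. iss = 10 is an arbitrary illustrative value.
fit_mle   <- bn.fit(dag, learning.test, method = "mle")
fit_bayes <- bn.fit(dag, learning.test, method = "bayes", iss = 10)

# Inference: approximate P(B = "b" | A = "a") by sampling from the fitted
# network. The result varies slightly from run to run unless a seed is set.
set.seed(1)
p <- cpquery(fit_mle, event = (B == "b"), evidence = (A == "a"))
p
```

`cpquery()` is a sampling-based approximation, so the estimate should be read as approximate rather than exact.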

It also provides bootstrap, cross-validation and stochastic simulation; the additional plotting functions rely on the Rgraphviz package mentioned above and on lattice.

Constraint-based algorithms, also known as conditional independence learners, are all optimized derivatives of the Inductive Causation algorithm (Verma and Pearl, 1991).

These algorithms use conditional independence tests to detect the Markov blankets of the variables, which in turn are used to compute the structure of the Bayesian network.

Score-based learning algorithms are general purpose heuristic optimization algorithms which rank network structures with respect to a goodness-of-fit score.

Hybrid algorithms combine aspects of both constraint-based and score-based algorithms, as they use conditional independence tests (usually to reduce the search space) and network scores (to find the optimal network in the reduced space) at the same time.
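To make the three families concrete, the sketch below runs one representative of each on learning.test. Picking `gs()`, `hc()` and `mmhc()` as the representatives, and BIC as the score, are assumptions made for this example.

```r
library(bnlearn)

data(learning.test)

cb <- gs(learning.test)    # constraint-based: conditional independence tests
sb <- hc(learning.test)    # score-based: hill climbing over network scores
hy <- mmhc(learning.test)  # hybrid: tests restrict the search, scores pick the DAG

# Constraint-based learners may leave some arcs undirected,
# so directed and undirected arcs are listed separately.
directed.arcs(cb)
undirected.arcs(cb)

# Score-based and hybrid learners return DAGs, which can be scored directly.
score(sb, learning.test, type = "bic")
score(hy, learning.test, type = "bic")

# Check whether the two DAGs have the same structure.
all.equal(sb, hy)
```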

Implementing Bayesian Networks with the bnlearn Package in R (2)

Tags: Life  2013-08-01 22:27, Thursday

2. Constraint-based algorithms

The constraint-based algorithms available in bnlearn are gs, iamb, fast.iamb and inter.iamb.

Available constraint-based learning algorithms

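Calling them is simple: pass the data frame as the argument to the function named after the algorithm. During structure learning you can also supply custom blacklists and whitelists, which brings expert knowledge into the learning process.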

res = gs(learning.test)
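A minimal sketch of the whitelist and blacklist mechanism follows. The "expert knowledge" encoded here (force the arc A -> B, forbid any arc between B and F) is invented purely for illustration.

```r
library(bnlearn)

data(learning.test)

# Whitelist: arcs that must be present in the learned network.
wl <- data.frame(from = "A", to = "B")

# Blacklist: arcs that must not appear. Listing both directions
# forbids any arc between B and F.
bl <- data.frame(from = c("B", "F"), to = c("F", "B"))

res <- gs(learning.test, whitelist = wl, blacklist = bl)

# The arc set now contains A -> B as a directed arc.
arcs(res)
```

Both arguments take a two-column structure with one row per arc, giving the node the arc starts from and the node it points to.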

Grow-Shrink (gs): based on the Grow-Shrink Markov Blanket, the first (and simplest) Markov blanket detection algorithm (Margaritis, 2003) used in a structure learning algorithm.

Incremental Association (iamb): based on the Markov blanket detection algorithm of the same name (Tsamardinos et al., 2003), which is based on a two-phase selection scheme (a forward selection followed by an attempt to remove false positives).

Fast Incremental Association (fast.iamb): a variant of IAMB which uses speculative stepwise forward selection to reduce the number of conditional independence tests (Yaramakala and Margaritis, 2005).

Interleaved Incremental Association (inter.iamb): another variant of IAMB which uses forward stepwise selection (Tsamardinos et al., 2003) to avoid false positives in the Markov blanket detection phase.
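Since all four functions share the same calling convention, they can be run side by side. The sketch below does so and compares the Markov blanket each one ends up with for a single node; the choice of node D is arbitrary.

```r
library(bnlearn)

data(learning.test)

# The four constraint-based learners named in the text.
algos <- list(gs = gs, iamb = iamb, fast.iamb = fast.iamb, inter.iamb = inter.iamb)

# Run each of them on the same data set with default settings.
nets <- lapply(algos, function(learn) learn(learning.test))

# Markov blanket of node D in each learned network.
lapply(nets, mb, node = "D")
```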

The computational complexity of these algorithms is polynomial in the number of tests, usually O(N^2) (O(N^4) in the worst case scenario), where N is the number of variables. Execution time scales linearly with the size of the data set.
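One way to get a feel for how the number of tests depends on the number of variables is to count the tests performed on the full data set and on a subset of its columns. This is only an informal illustration on one small data set, not a verification of the complexity bounds; the four-variable subset is an arbitrary choice.

```r
library(bnlearn)

data(learning.test)

# Learn from all six variables, then from a four-variable subset.
full <- gs(learning.test)
sub  <- gs(learning.test[, c("A", "B", "C", "D")])

# Number of conditional independence tests used in each run.
ntests(full)
ntests(sub)
```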

Available (conditional) independence tests

The conditional independence tests used in constraint-based algorithms in practice are statistical tests on the data set. Available tests (and the respective labels) are:

discrete case (multinomial distribution)
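For discrete data, a single test can be run on its own with `ci.test()`, and a learner can be told which test to use through its `test` argument. The labels "mi" (mutual information) and "x2" (Pearson's chi-squared) used below come from the bnlearn documentation rather than from the text above, and the variables tested and the threshold `alpha = 0.01` are arbitrary choices for illustration.

```r
library(bnlearn)

data(learning.test)

# Is B independent of D given A? Run the same question through two tests.
t_mi <- ci.test("B", "D", "A", data = learning.test, test = "mi")
t_x2 <- ci.test("B", "D", "A", data = learning.test, test = "x2")

# Each result is a standard htest object with a statistic and a p-value.
t_mi$p.value
t_x2$p.value

# The same labels select the test used inside a structure learning algorithm.
res_x2 <- gs(learning.test, test = "x2", alpha = 0.01)
```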
