Bayesian Regularization (Bayesian BP Regularization)

The L-M optimization algorithm and the Bayesian regularization algorithm

% Use the Bayesian regularization algorithm to improve the generalization ability of a BP network.

In this example we use two training methods, the L-M optimization algorithm (trainlm) and the Bayesian regularization algorithm (trainbr), to train a BP network so that it fits sinusoidal sample data with added white noise.

The sample data can be generated with the following MATLAB statements:

% Input vector:
P = [-1:0.05:1];
% Target vector:
randn('seed',78341223);
T = sin(2*pi*P)+0.1*randn(size(P));

The MATLAB program is as follows:

close all
clear all
clc
% P is the input vector
P = [-1:0.05:1];
% T is the target vector
T = sin(2*pi*P)+0.1*randn(size(P));
% Create a new feed-forward neural network
net=newff(minmax(P),[20,1],{'tansig','purelin'});
disp('1. L-M optimization algorithm TRAINLM');
disp('2. Bayesian regularization algorithm TRAINBR');
choice=input('Select the training algorithm (1,2):');
if(choice==1)
    % Use the L-M optimization algorithm TRAINLM
    net.trainFcn='trainlm';
    % Set the training parameters
    net.trainParam.epochs = 500;
    net.trainParam.goal = 1e-6;
    % Re-initialize
    net=init(net);
    pause;
elseif(choice==2)
    % Use the Bayesian regularization algorithm TRAINBR
    net.trainFcn='trainbr';
    % Set the training parameters
    net.trainParam.epochs = 500;
    % Re-initialize
    net = init(net);
    pause;
end
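For readers without MATLAB, the sample data described above (inputs from -1 to 1 in steps of 0.05, targets sin(2πP) plus Gaussian noise with standard deviation 0.1) can be sketched in plain Python. The function name `make_noisy_sine` and the seed value are assumptions made for this illustration; the seed does not reproduce MATLAB's random stream.

```python
import math
import random


def make_noisy_sine(start=-1.0, stop=1.0, step=0.05, noise_std=0.1, seed=0):
    """Return inputs P and noisy targets T = sin(2*pi*P) + noise_std * N(0, 1)."""
    rng = random.Random(seed)  # assumed seed, unrelated to MATLAB's generator
    n = int(round((stop - start) / step)) + 1
    inputs = [start + i * step for i in range(n)]
    targets = [
        math.sin(2 * math.pi * p) + noise_std * rng.gauss(0.0, 1.0)
        for p in inputs
    ]
    return inputs, targets


P, T = make_noisy_sine()
print(len(P), len(T))
```

The grid has 41 points, so a network with 20 hidden neurons has more free parameters than data points, which is why the example is a natural test of regularization.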








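II. Common Algorithm Tuning Methods

1. Grid Search: Grid search is a widely used tuning method. It systematically traverses every possible combination in the parameter space in order to find the best parameter combination.

2. Random Search: In contrast to grid search, random search draws parameter sets at random from the parameter space. Because the selection is random rather than exhaustive, it can avoid the drawbacks of grid search.

3. Bayesian Optimization: Bayesian optimization is a tuning method based on Bayesian inference.

4. Ensemble Methods: Ensemble learning improves the overall performance of a model by combining multiple predictive models.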
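The difference between the first two methods in the list above can be sketched as follows. The objective function, the parameter names `lr` and `depth`, and the search ranges are hypothetical and chosen only to make the example runnable.

```python
import itertools
import random


def objective(lr, depth):
    """Toy validation loss (hypothetical); smallest at lr=0.1, depth=4."""
    return (lr - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid and return (score, params)."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def random_search(bounds, n_trials, seed=0):
    """Sample n_trials random points inside the bounds and return the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lr": rng.uniform(*bounds["lr"]),
            "depth": rng.randint(*bounds["depth"]),
        }
        score = objective(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


grid_best = grid_search({"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]})
rand_best = random_search({"lr": (0.01, 1.0), "depth": (2, 8)}, n_trials=20)
print(grid_best)
print(rand_best)
```

Grid search costs one evaluation per combination (nine here), while random search costs exactly `n_trials` evaluations regardless of how many parameters are tuned.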


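III. Selecting and Adjusting Parameters

When tuning an algorithm, pay particular attention to the following parameters:

1. Learning Rate: The learning rate is the step size of the parameter update in each iteration.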







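1. Grid Search: Grid search is one of the most common and intuitive parameter tuning methods.

2. Random Search: Random search is a more efficient parameter tuning method.

3. Bayesian Optimization: Bayesian optimization is a parameter tuning method that optimizes the objective function by building a model of it.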








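Application of the Bayesian-Regularized BP Neural Network to Economic Forecasting
Li Xujun (李旭军), School of Mathematics and Statistics, Central China Normal University
[Abstract] This paper applies the Bayesian regularization algorithm to improve the generalization ability of the BP neural network.

[Keywords] BP neural network; Bayesian regularization; economic forecasting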

























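Decision rule: $\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} \dfrac{q}{p}$

2. Derived Bayes criteria and their computation

Example 2: Consider the binary hypotheses $H_0: z = n$ and $H_1: z = A + n$, where $n \sim N(0, \sigma_n^2)$, the prior probabilities are $p = q = 0.5$, and $A > 0$. Based on a single observation, …

Solution:

1. Minimum probability of error (MPE) criterion

Take $c_{00} = c_{11} = 0$ and $c_{01} = c_{10} = 1$, that is, correct decisions cost nothing and each error has unit cost. With $q = P(H_0)$ and $p = P(H_1)$, the average cost is

$$C = C_{00}\,q + C_{01}\,p + (C_{10} - C_{00})\,q\,P_F - (C_{01} - C_{11})\,p\,P_D ,$$

which under the MPE costs reduces to the total probability of error, $C = q\,P_F + p\,(1 - P_D) = P_e$.

Written as an integral over the decision region $Z_1$ (where $H_1$ is chosen):

$$C = C_{00}\,q + C_{01}\,p - (C_{01} - C_{11})\,p \int_{Z_1} f(z)\,dz ,$$

$$f(z) = p(z \mid H_1) - \frac{(C_{10} - C_{00})\,q}{(C_{01} - C_{11})\,p}\; p(z \mid H_0) .$$

The cost is minimized by assigning to $Z_1$ exactly the points where the integrand is positive:

$$p(z \mid H_1) - \frac{(C_{10} - C_{00})\,q}{(C_{01} - C_{11})\,p}\; p(z \mid H_0) > 0 .$$

Likelihood ratio: $\Lambda(z) = \dfrac{p(z \mid H_1)}{p(z \mid H_0)}$, compared against the threshold $\dfrac{(C_{10} - C_{00})\,q}{(C_{01} - C_{11})\,p}$.

1. The Bayes criterion and its computation

Solution: (1)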
1 p( z | H 0 ) exp ( z 1)2








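1. Least Squares Method: Least squares is a common parameter estimation method used to fit linear regression models.

2. Maximum Likelihood Estimation: Maximum likelihood estimation is a common parameter estimation method that chooses the model parameters under which the observed data are most probable.

3. Bayesian Estimation: Bayesian estimation is a parameter estimation method based on Bayes' theorem that estimates the posterior distribution of the model parameters.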
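The three estimators listed above can be illustrated on toy data. This is a minimal sketch: the data values, the Bernoulli model, and the Beta(1, 1) prior are assumptions chosen for the example, not part of the source.

```python
def least_squares_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate of a success probability."""
    return successes / trials


def bernoulli_posterior_mean(successes, trials, a=1.0, b=1.0):
    """Posterior mean under a Beta(a, b) prior (the prior is an assumption)."""
    return (successes + a) / (trials + a + b)


slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
mle = bernoulli_mle(3, 4)
post_mean = bernoulli_posterior_mean(3, 4)
print(slope, intercept, mle, post_mean)
```

With only four trials the prior pulls the Bayesian estimate toward 0.5, which is the same shrinkage effect that regularization has on network weights.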













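1. Grid Search: Grid search is an exhaustive search method. It traverses all possible combinations in the hyperparameter space to find the best one.

2. Random Search: Unlike grid search, random search does not traverse every possible combination in the hyperparameter space. It selects from randomly drawn hyperparameter combinations.

3. Bayesian Optimization: Bayesian optimization is a method based on Bayes' theorem and Gaussian process regression.

4. Learning Curve: A learning curve is a visualization method. It plots the accuracy (or another performance metric) on the training set and the test set against the amount of training data, for different hyperparameter combinations.

5. Validation Curve: A validation curve is similar to a learning curve. It plots the training-set and test-set accuracy (or another performance metric) against the values of a single hyperparameter.

6. Cross-Validation: Cross-validation is a method that partitions the data set into several subsets.

7. Regularization: Regularization is a method for limiting model complexity.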
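The partitioning step behind cross-validation (item 6 above) can be sketched as follows. The contiguous, unshuffled fold layout and the function names are assumptions made for this illustration.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n_samples, k):
    """Yield (train_indices, validation_indices) once per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(cross_validation_splits(10, 3))
for train, val in splits:
    print(len(train), val)
```

Each sample appears in exactly one validation fold, so every data point is used for both training and validation across the k rounds.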


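Software Guide (软件导刊), Vol. 9, No. 12, Dec. 2010

Application of the Bayesian-Regularized BP Neural Network to Oil and Gas Drilling Cost Prediction
Yuan Shu (袁姝), Hu Hongtao (胡宏涛), Zhao Yue (赵越)
(1. School of Computer Science, Xi'an Shiyou University, Xi'an, Shaanxi 710065; 2. School of Management, Northwestern Polytechnical University, Xi'an, Shaanxi 710072)

Abstract: Accurate prediction of oil and gas drilling cost, which reflects oilfield performance, helps in making sound decisions and assessments. Two problems arise when a BP neural network is used to predict oil and gas drilling cost: the factors that influence drilling cost are hard to determine, and the standard BP neural network generalizes poorly. To address them, a Bayesian-regularized BP neural network model for drilling cost prediction based on principal component analysis is established. Drilling cost data from the blocks of a China National Petroleum company are used to verify that the model has high prediction accuracy and practical value.
Keywords: principal component analysis; oil and gas drilling cost prediction; BP neural network; Bayesian regularization
CLC number: TP311.25; Document code: A; Article ID: 1672-7800(2010)12-0130-03

… ensuring good generalization ability has become the key to improving prediction accuracy.

Principal component analysis (PCA) is a basic method for system dimensionality reduction and feature extraction. Through a linear transformation, it converts the original multiple indicators …

Step 1: data standardization. For a sample of size n with p variables $(X_1, X_2, \ldots, X_p)$, written in matrix form as: $[X_{11} \;\ldots\; X_{1p}]$ …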



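Bayesian inference regularization means constraining the parameters of a model during Bayesian inference by introducing a regularization term, in order to reduce the risk of overfitting.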














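Discussion: This paper improves the BP neural network with the Bayesian regularization algorithm so that, while the training error of the network is kept as small as possible, …

Journal of Shanxi Medical University (J Shanxi Med Univ), September 2008, 39(9). Article ID: 1007-6611(2008)09-0835-03
Determination of coenzyme Q10 in photosynthetic bacteria culture by HPLC

…, where $y_{nk}$ is the desired output and $c_{nk}$ is the actual output of the network. In the regularization method, the network performance function is modified to:

$$F = \alpha E_W + \beta E_D \qquad (2)$$

where $E_W$ is the sum of squares of the network weights, $E_D$ is the residual sum of squares between the network response and the target values, and $\alpha$ and $\beta$ control the distribution of the other parameters (weights and thresholds). If $\alpha \ll \beta$, the training algorithm aims to make the training error of the network as small as possible. If $\alpha \gg \beta$, the training algorithm aims to make the network response smoother, that is, to reduce the number of effective network parameters as far as possible, to compensate for a larger network error. With conventional regularization it is difficult to determine $\alpha$ and $\beta$. This study determines them with the Bayesian method: the network weights are treated as random variables, the training data set D and the prior probability of the weight set W are assumed to follow Gaussian distributions, and then, by Bayes' rule, maximizing the posterior probability yields $\alpha$ and $\beta$ at the minimum point $w_{MP}$ of the objective function $F(w)$.

… steps, network training converged. At that point the sum of squared errors (SSE) and the sum of squared weights (SSW) of the network were both constant: SSE = 0.0113968 and SSW = 11.4356. The number of effective network parameters (effective weights and thresholds) was 4.20998.

2.2 Fitting performance of the BP neural network: After training was complete, the samples were fed into the trained network to check the fit (Figure 1). In Figure 1 the circles are the coordinate points (T, A) formed by the predicted and actual values of each sample, and the dotted line represents A (predicted value) = T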
















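Common algorithms include logistic regression and the back-propagation neural network. Unsupervised learning: In unsupervised learning the data are not specially labeled, and the learning model is meant to infer some intrinsic structure of the data.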







Can we directly control the posterior distributions?

An extra freedom to perform Bayesian inference
Arguably more direct to control the behavior of models
Can be easier and more natural in some examples
“There are two potent arrows in the statistician's quiver; there is no need to go hunting armed with only one.”
Parametric Bayesian Inference
is represented as a finite set of parameters
Indian Buffet Process Prior [Griffiths & Ghahramani, 2005] + Gaussian/Sigmoid/Softmax likelihood
Gaussian Process Prior [Doob, 1944; Rasmussen & Williams, 2006] + Gaussian/Sigmoid/Softmax likelihood

Elicit expert knowledge E.g., logic rules
Others … (ongoing & future work)

E.g., decision making, cognitive constraints, etc.



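When a new message arrives, generate its TOKEN strings as in step 2 and look up the value of each TOKEN string in hashtable_probability. Suppose the message yields N TOKEN strings t1, t2, …, tn whose values in hashtable_probability are P1, P2, …, PN. P(A|t1, t2, t3, …, tn) denotes the probability that the message is spam given that the TOKEN strings t1, t2, …, tn all appear in it.

Computed from this table: the probability of "法" is 0.3, the probability of "轮" is 0.3, and the probability of "功" is 0.3.

hashtable_good, generated from message B, contains the records: 法: 1 occurrence; 律: 1 occurrence. Computed from this table: the probability of "法" is 0.5 and the probability of "律" is 0.5.

2. Extract the independent strings in the message subject and body, for example ABC32 and ¥234, as TOKEN strings, and count how many times each extracted TOKEN string occurs (its frequency). Process every message in the spam set and in the non-spam set in this way.

3. Each message set has its own hash table: hashtable_good corresponds to the non-spam set and hashtable_bad to the spam set. Each table stores the mapping from TOKEN string to frequency.

When "功" appears, the probability that the message is spam is: P = 0.3 / (0.3 + 0) = 1

When "律" appears, the probability that the message is spam is: P = 0 / (0 + 0.5) = 0

This gives the third hash table, hashtable_probability, with the data: 法: 0.375; 轮: 1; 功: 1; 律: 0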
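The per-token calculation in the worked example can be sketched as below. The token probabilities are the ones quoted in the text; the dictionary names `prob_bad` and `prob_good` and the helper function are assumptions for this illustration, standing in for the probabilities derived from hashtable_bad and hashtable_good.

```python
def token_spam_probability(p_bad, p_good):
    """P(spam | token) = p_bad / (p_bad + p_good), as in the worked example."""
    return p_bad / (p_bad + p_good)


# Token probabilities quoted in the text for the spam and non-spam tables.
prob_bad = {"法": 0.3, "轮": 0.3, "功": 0.3}
prob_good = {"法": 0.5, "律": 0.5}

hashtable_probability = {}
for token in sorted(set(prob_bad) | set(prob_good)):
    hashtable_probability[token] = token_spam_probability(
        prob_bad.get(token, 0.0), prob_good.get(token, 0.0)
    )

print(hashtable_probability)
```

A token that appears in only one table gets probability exactly 0 or 1, which is why practical filters usually clamp these values away from the extremes before combining them.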









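1.1 Grid Search: Grid search is a commonly used parameter tuning technique.

1.2 Random Search: Unlike grid search, random search does not traverse every possible parameter combination. It randomly selects parameters from the given parameter space for validation.

1.3 Bayesian Optimization: Bayesian optimization is a more efficient parameter tuning method. Using the idea of Bayesian inference, it samples from and models the results of the experiments already run in order to choose the next parameter combination to validate.

2.1 Filter Methods: Filter methods evaluate the correlation between each feature and the target variable individually and then select the features with the highest correlation.

2.2 Wrapper Methods: Wrapper methods treat feature selection as a search problem and determine the best feature subset by training a machine learning algorithm.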
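A filter method as described in 2.1 can be sketched with a Pearson correlation ranking. The feature names and values are hypothetical, and using the absolute Pearson coefficient as the score is one common choice assumed here, not the only one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, target, top_k):
    """Rank features by |correlation with target| and keep the top_k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


features = {
    "f1": [1, 2, 3, 4, 5],    # perfectly correlated with the target
    "f2": [2, 1, 4, 3, 5],    # partly correlated
    "f3": [1, -1, 0, -1, 1],  # uncorrelated
}
target = [2, 4, 6, 8, 10]
selected = filter_select(features, target, top_k=2)
print(selected)
```

Because each feature is scored on its own, a filter method is cheap but cannot detect features that are useful only in combination, which is the gap wrapper methods try to close.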



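… data analysis verified that the prediction method reduces both the training error and the prediction error and improves the accuracy of network security situation prediction, which demonstrates the feasibility of the method.
Keywords: Bayesian regularization; BP neural network; network security situation; situation prediction
The electric power network security situation prediction about …

… came into being with the times.

In recent years network security problems have become increasingly prominent, and analyzing and predicting the network security situation is of great importance for network security. One study in the literature proposed a network security situation assessment method based on Bayesian networks, but in that method inference must rely on the posterior distribution alone and can no longer involve the sample distribution. Another study used a BP neural network to assess the network security situation. With that method training may fall into a local extremum, so that the weights converge to a local minimum, which in turn causes network training … Drawing on the strengths of these two prediction algorithms, and considering that network security situation values behave as a nonlinear time series and that neural networks can handle nonlinear data …, this paper proposes a network security situation prediction method based on a Bayesian BP neural network model. Finally, a simulation experiment demonstrates the effectiveness and soundness of the prediction method.

… the logs, alarms, and other data produced by the host systems and network devices in the network are quantified into network situation indicators using a bottom-up quantification strategy for network security situation values. In the experimental environment, performance parameters of many kinds of devices are extracted while the network is running, so that the security situation of the network is reflected more faithfully. The modeling process is as follows:

… the raw data, filter out the relevant data and correlate and fuse them, and derive the network service … the network service security index of the services.
Second, based on the service information from the first step, compute the weight of each service in the active host systems in the network, and thereby obtain the security index of the network system.
Third, collect the … of the host systems while the network is running …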



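9.520: Class 20, Bayesian Interpretations
Tomaso Poggio and Sayan Mukherjee

Plan
• Bayesian interpretation of regularization
• Bayesian interpretation of the regularizer
• Bayesian interpretation of quadratic loss
• Bayesian interpretation of SVM loss
• Consistency check of MAP and the mean solution for quadratic loss
• Synthesizing kernels from data: Bayesian foundations
• Selection (called "alignment") as a special case of kernel synthesis

Bayesian interpretation of RN, SVM, and BPD in regression

Consider

$$\min_{f \in \mathcal{H}} \; \frac{1}{l}\sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2 + \lambda \|f\|_K^2 .$$

We will show that there is a Bayesian interpretation of RN in which the data term (with its loss function) is a noise model.

Definitions
1. $D_l = \{(x_i, y_i)\}$, $i = 1, \ldots, l$, is the set of training examples.
2. $P[f \mid D_l]$ is the conditional probability of the function f given the examples g.
3. $P[D_l \mid f]$ is the conditional probability of g given f, that is, the noise model.
4. $P[f]$ is the prior probability of the random field f.

Posterior probability
The posterior probability $P[f \mid g]$ can be computed by applying Bayes' rule:

$$P[f \mid D_l] = \frac{P[D_l \mid f]\, P[f]}{P[D_l]} .$$

If the noise is normally distributed with variance $\sigma^2$, the probability $P[D_l \mid f]$ is

$$P[D_l \mid f] = \frac{1}{Z_l} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2\right),$$

where $Z_l$ is a normalization constant.

Posterior probability
Informally (we will make this more precise later), if

$$P[f] = \frac{1}{Z_r} \exp\!\left(-\|f\|_K^2\right),$$

where $Z_r$ is another normalization constant, then

$$P[f \mid D_l] \propto \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2 - \|f\|_K^2\right).$$

MAP estimate
One of the many possible estimates of f from $P[f \mid D_l]$ is the so-called MAP estimate, that is,

$$\max_f P[f \mid D_l] \;\Longleftrightarrow\; \min_f \sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2 + 2\sigma^2 \|f\|_K^2 ,$$

which is the same as the regularization functional if $\lambda = 2\sigma^2 / l$.

Bayesian interpretation of the data term (quadratic loss)
As we have just shown, the quadratic loss (the standard RN case) corresponds, in the Bayesian interpretation, to assuming that the data $y_i$ are affected by additive independent Gaussian noise processes, that is, $y_i = f(x_i) + \varepsilon_i$ with $E[\varepsilon_i \varepsilon_j] = 2\delta_{i,j}$.

Bayesian interpretation of the data term (non-quadratic loss)
To find a Bayesian interpretation of the SVM loss, we now assume a more general form of noise.

We assume that the data are affected by additive independent noise sampled from a continuous mixture of Gaussian distributions with mean µ and variance β. The quadratic loss above corresponds to one particular choice of the mixture density $P(\beta, \mu)$.

Bayesian interpretation of the data term (absolute loss)
To find the $P(\beta, \mu)$ that yields a given loss function $V(\gamma)$, where $\gamma = y - f(x)$, we must solve the corresponding equation for $P(\beta, \mu)$.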

Bayesian Regularization Training



















Handling Overfitting: Regularization with Bayesian Methods




























APPLICATION OF BAYESIAN REGULARIZED BP NEURAL NETWORK MODEL FOR TREND ANALYSIS, ACIDITY AND CHEMICAL COMPOSITION OF PRECIPITATION IN NORTH CAROLINA

MIN XU¹, GUANGMING ZENG¹,²,*, XINYI XU¹, GUOHE HUANG¹,², RU JIANG¹ and WEI SUN²
¹College of Environmental Science and Engineering, Hunan University, Changsha 410082, China; ²Sino-Canadian Center of Energy and Environment Research, University of Regina, Regina, SK, S4S 0A2, Canada
(*author for correspondence, e-mail: zgming@, ykxumin@, Tel.: 86-731-882-2754, Fax: 86-731-882-3701)

(Received 1 August 2005; accepted 12 December 2005)

Abstract. A Bayesian regularized back-propagation neural network (BRBPNN) was developed for trend analysis, acidity and chemical composition of precipitation in North Carolina using precipitation chemistry data in NADP. This study included two BRBPNN application problems: (i) the relationship between precipitation acidity (pH) and other ions (NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺) was modeled by BRBPNN, and the optimal network structure achieved was 8-15-1. The relative importance index, obtained through the sum of square weights between each input neuron and the hidden layer of BRBPNN (8-15-1), then indicated that the ions' contribution to the acidity declined in the order NH₄⁺ > SO₄²⁻ > NO₃⁻; and (ii) BRBPNN was also used to investigate the temporal variation of monthly mean NH₄⁺, SO₄²⁻ and NO₃⁻ concentrations, and the optimal architectures for the 1990–2003 data were 4-6-1, 4-6-1 and 4-4-1, respectively. All the estimated results of the optimal BRBPNNs showed that the relationship between the acidity and other ions, and that between NH₄⁺, SO₄²⁻ and NO₃⁻ concentrations and the precipitation amount and time variable, was obviously nonlinear, since in contrast to multiple linear regression (MLR), BRBPNN was clearly better, with less error in prediction and higher correlation coefficients. Meanwhile, the results also showed that BRBPNN is capable of automated regularization parameter selection and may ensure excellent fitting and
robustness. Thus, this study laid the foundation for the application of BRBPNN in the analysis of acid precipitation.

Keywords: Bayesian regularized back-propagation neural network (BRBPNN), precipitation, chemical composition, temporal trend, the sum of square weights

Water, Air, and Soil Pollution (2006) 172: 167–184. DOI: 10.1007/s11270-005-9068-8. © Springer 2006

1. Introduction

Characterization of the chemical nature of precipitation is currently under considerable investigation due to the increasing concern about man's atmospheric inputs of substances and their effects on land, surface waters, vegetation and materials. In particular, temporal trend and chemical composition have been the subject of extensive research in North America, Canada and Japan in the past 30 years (Zeng and Hopke, 1989; Khawaja and Husain, 1990; Lim et al., 1991; Sinya et al., 2002; Grimm and Lynch, 2005).

Linear regression (LR) methods such as multiple linear regression (MLR) have been widely used to develop models of temporal trend and chemical composition analysis in precipitation (Sinya et al., 2002; George, 2003; Aherne and Farrell, 2002; Christopher et al., 2005; Migliavacca et al., 2004; Yasushi et al., 2001). However, LR is an "ill-posed" problem in statistics and sometimes results in unstable models when trained with noisy data. It also requires the investigator to make subjective decisions about the likely functional (e.g. nonlinear) relationships among variables (Burden and Winkler, 1999; 2000).
On the other hand, there has recently been increasing interest in estimating the uncertainties and nonlinearities associated with impact prediction of atmospheric deposition (Page et al., 2004). For precipitation amount and for human activities, such as local and regional land cover and emission sources, the actual role each plays in determining the concentration at a given location is unknown and uncertain (Grimm and Lynch, 2005). Therefore, it is of much significance that the model of temporal variation and precipitation chemistry is efficient, gives unambiguous models and does not depend upon any subjective decisions about the relationships among ionic concentrations.

In this study, we propose a Bayesian regularized back-propagation neural network (BRBPNN) to overcome MLR's deficiencies and investigate nonlinearity and uncertainty in acid precipitation. The network is trained through Bayesian regularized methods, a mathematical process which converts the regression into a well-behaved, "well-posed" problem. In contrast to MLR and traditional neural networks (NNs), BRBPNN performs better when the relationship between variables is nonlinear (Sovan et al., 1996; Archontoula et al., 2003) and generalizes better, because BRBPNN is capable of automated regularization parameter selection to obtain the optimal network architecture of posterior distribution and avoid the over-fitting problem (Burden and Winkler, 1999; 2000). Thus, the main purpose of our paper is to apply the BRBPNN method to modeling the nonlinear relationship between the acidity and chemical compositions of precipitation and to improve the accuracy of the monthly ionic concentration model used to provide precipitation estimates. Both are helpful for predicting precipitation variables and interpreting mechanisms of acid precipitation.

2. Theories and Methods

2.1. THEORY OF BAYESIAN REGULARIZED BP NEURAL NETWORK

Traditional NN modeling was based on back-propagation that was created by generalizing the Widrow-Hoff learning rule to
multiple-layer networks and nonlinear differentiable transfer functions. Commonly, a BPNN comprises three types of neuron layers: an input layer, one or several hidden layers and an output layer comprising one or several neurons. In most cases only one hidden layer is used (Figure 1) to limit the calculation time. Although BPNNs with biases, a sigmoid layer and a linear output layer are capable of approximating any function with a finite number of discontinuities (The MathWorks,), we select the tansig and purelin transfer functions of MATLAB to improve the efficiency (Burden and Winkler, 1999; 2000).

Figure 1. Structure of the neural network used: hidden layer $a^1 = \mathrm{tansig}(IW^{1,1}p + b^1)$, output layer $a^2 = \mathrm{purelin}(LW^{2,1}a^1 + b^2)$. R = number of elements in input vector; S = number of hidden neurons; p is a vector of R input elements. The network input to the transfer function tansig is $n^1$ and the sum of the bias $b^1$. The network output to the transfer function purelin is $n^2$ and the sum of the bias $b^2$. $IW^{1,1}$ is the input weight matrix and $LW^{2,1}$ is the layer weight matrix. $a^1$ is the output of the hidden layer by the tansig transfer function and y ($a^2$) is the network output.

Bayesian methods are the optimal methods for solving learning problems of neural networks. They can automatically select the regularization parameters and integrate the high convergence rate of traditional BPNN with the prior information of Bayesian statistics (Burden and Winkler, 1999; 2000; Jouko and Aki, 2001; Sun et al., 2005). To improve the generalization ability of the network, the regularized training objective function F is denoted as:

$$F = \alpha E_w + \beta E_D \qquad (1)$$

where $E_w$ is the sum of squared network weights, $E_D$ is the sum of squared network errors, and $\alpha$ and $\beta$ are objective function parameters (regularization parameters). Setting the correct values for the objective parameters is the main problem with implementing regularization, and their relative size dictates the emphasis for
training. Specifically, in this study, the mean square error (MSE) is chosen as a measure of the network training approximation. Consider a desired neural network with a training data set $D = \{(p_1, t_1), (p_2, t_2), \ldots, (p_i, t_i), \ldots, (p_n, t_n)\}$, where $p_i$ is an input to the network and $t_i$ is the corresponding target output. As each input is applied to the network, the network output is compared to the target, and the error is calculated as the difference between the target output and the network output. We then want to minimize the average of the sum of these squared errors (namely, MSE) through the iterative network training:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} e(i)^2 = \frac{1}{n}\sum_{i=1}^{n} \big(t(i) - a(i)\big)^2 \qquad (2)$$

where n is the number of samples, $e(i)$ is the error and $a(i)$ is the network output.

In the Bayesian framework the weights of the network are considered random variables, and the posterior distribution of the weights can be updated according to Bayes' rule:

$$P(w \mid D, \alpha, \beta, M) = \frac{P(D \mid w, \beta, M)\, P(w \mid \alpha, M)}{P(D \mid \alpha, \beta, M)} \qquad (3)$$

where M is the particular neural network model used and w is the vector of network weights. $P(w \mid \alpha, M)$ is the prior density, which represents our knowledge of the weights before any data are collected. $P(D \mid w, \beta, M)$ is the likelihood function, which is the probability of the data occurring given the weights w.
$P(D \mid \alpha, \beta, M)$ is a normalization factor, which guarantees that the total probability is 1. Thus, we have

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} \qquad (4)$$

Likelihood: A network with a specified architecture M and w can be viewed as making predictions about the target output as a function of input data in accordance with the probability distribution:

$$P(D \mid w, \beta, M) = \frac{\exp(-\beta E_D)}{Z_D(\beta)} \qquad (5)$$

where $Z_D(\beta)$ is the normalization factor:

$$Z_D(\beta) = (\pi/\beta)^{n/2} \qquad (6)$$

Prior: A prior probability is assigned to alternative network connection strengths w, written in the form:

$$P(w \mid \alpha, M) = \frac{\exp(-\alpha E_w)}{Z_w(\alpha)} \qquad (7)$$

where $Z_w(\alpha)$ is the normalization factor:

$$Z_w(\alpha) = (\pi/\alpha)^{K/2} \qquad (8)$$

Finally, the posterior probability of the network connections w is:

$$P(w \mid D, \alpha, \beta, M) = \frac{\exp\big(-(\alpha E_w + \beta E_D)\big)}{Z_F(\alpha, \beta)} = \frac{\exp\big(-F(w)\big)}{Z_F(\alpha, \beta)} \qquad (9)$$

Setting regularization parameters $\alpha$ and $\beta$. The regularization parameters $\alpha$ and $\beta$ determine the complexity of the model M. Now we apply Bayes' rule to optimize the objective function parameters $\alpha$ and $\beta$. Here, we have

$$P(\alpha, \beta \mid D, M) = \frac{P(D \mid \alpha, \beta, M)\, P(\alpha, \beta \mid M)}{P(D \mid M)} \qquad (10)$$

If we assume a uniform prior density $P(\alpha, \beta \mid M)$ for the regularization parameters $\alpha$ and $\beta$, then maximizing the posterior is achieved by maximizing the likelihood function $P(D \mid \alpha, \beta, M)$. We also notice that the likelihood function $P(D \mid \alpha, \beta, M)$ on the right side of Equation (10) is the normalization factor for Equation (3).
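To make Equations (1), (5), (6) and (9) concrete, here is a minimal numerical sketch. The weight and error values are illustrative assumptions, not values from the paper, and the function names are hypothetical; the posterior is computed only up to the constant $-\log Z_F$.

```python
import math


def regularized_objective(weights, errors, alpha, beta):
    """F = alpha * E_w + beta * E_D, Equation (1)."""
    e_w = sum(w * w for w in weights)  # sum of squared network weights
    e_d = sum(e * e for e in errors)   # sum of squared network errors
    return alpha * e_w + beta * e_d


def log_posterior_unnormalized(weights, errors, alpha, beta):
    """log of Equation (9) without the -log Z_F(alpha, beta) term."""
    return -regularized_objective(weights, errors, alpha, beta)


def log_likelihood(errors, beta):
    """log of Equation (5), using Z_D(beta) = (pi / beta)^(n / 2)."""
    n = len(errors)
    e_d = sum(e * e for e in errors)
    return -beta * e_d - 0.5 * n * math.log(math.pi / beta)


weights = [0.5, -1.0, 2.0]  # illustrative values, not from the paper
errors = [0.1, -0.2, 0.05]
objective_value = regularized_objective(weights, errors, alpha=0.01, beta=1.0)
print(objective_value)
```

Raising alpha relative to beta makes large weights more costly, so the weight vector that maximizes the posterior moves toward a smoother network response.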
According to Foresee and Hagan (1997), we have:

$$P(D \mid \alpha, \beta, M) = \frac{P(D \mid w, \beta, M)\, P(w \mid \alpha, M)}{P(w \mid D, \alpha, \beta, M)} = \frac{Z_F(\alpha, \beta)}{Z_w(\alpha)\, Z_D(\beta)} \qquad (11)$$

In Equation (11), the only unknown part is $Z_F(\alpha, \beta)$. Since the objective function has the shape of a quadratic in a small area surrounding the minimum point, we can expand $F(w)$ around the minimum point of the posterior density $w_{MP}$, where the gradient is zero. Solving for the normalizing constant yields:

$$Z_F(\alpha, \beta) = (2\pi)^{K/2} \det{}^{-1/2}(H)\, \exp\big(-F(w_{MP})\big) \qquad (12)$$

where H is the Hessian matrix of the objective function:

$$H = \beta \nabla^2 E_D + \alpha \nabla^2 E_w \qquad (13)$$

Substituting Equation (12) into Equation (11), we can find the optimal values for $\alpha$ and $\beta$ at the minimum point. Taking the derivative of the log of Equation (11) with respect to each of them and setting it equal to zero, we have:

$$\alpha_{MP} = \frac{\gamma}{2E_w(w_{MP})} \quad \text{and} \quad \beta_{MP} = \frac{n - \gamma}{2E_D(w_{MP})} \qquad (14)$$

where $\gamma = K - \alpha_{MP}\,\mathrm{tr}\big(H_{MP}^{-1}\big)$ is the number of effective parameters, n is the number of samples and K is the total number of parameters in the network. The number of effective parameters is a measure of how many parameters in the network are effectively used in reducing the error function. It can range from zero to K. After training, we need to do the following checks: (i) If $\gamma$ is very close to K, the network may not be large enough to properly represent the true function. In this case, we simply add more hidden neurons and retrain to obtain a larger network. If the larger network has the same final $\gamma$, then the smaller network was large enough; and (ii) if the network is sufficiently large, then a second larger network will achieve comparable values for $\gamma$.

The Bayesian optimization of the regularization parameters requires the computation of the Hessian matrix of the objective function $F(w)$ at the minimum point $w_{MP}$. To overcome this problem, the Gauss-Newton approximation to the Hessian matrix has been proposed by Foresee and Hagan (1997). Here are the steps required for Bayesian optimization of the regularization parameters:
(i) Initialize $\alpha$, $\beta$ and the weights. After the first training step, the objective
function parameters will recover from the initial setting;
(ii) Take one step of the Levenberg-Marquardt algorithm to minimize the objective function $F(w)$;
(iii) Compute $\gamma$ using the Gauss-Newton approximation to the Hessian matrix in the Levenberg-Marquardt training algorithm;
(iv) Compute new estimates for the objective function parameters $\alpha$ and $\beta$;
(v) Iterate steps (ii) through (iv) until convergence.

2.2. WEIGHT CALCULATION OF THE NETWORK

Generally, one of the difficult research topics of the BRBPNN model is how to obtain effective information from a neural network. To a certain extent, the network weights and biases can reflect the complex nonlinear relationships between input variables and the output variable. When the output layer involves only one neuron, the influences of input variables on the output variable are directly presented in the influences of input parameters upon the network. At the same time, given the connections along the paths from the input layer to the hidden layer and from the hidden layer to the output layer, we attempt to study how input variables act on the hidden layer, which can be considered as the impact of input variables on the output variable. According to Joseph et al. (2003), the relative importance of an individual input variable upon the output variable can be expressed as:

$$I = \frac{\sum_{j=1}^{S} \mathrm{ABS}(w_{ji})}{\sum_{i=1}^{Num} \sum_{j=1}^{S} \mathrm{ABS}(w_{ji})} \qquad (15)$$

where $w_{ji}$ is the connection weight from input neuron i to hidden neuron j, ABS is the absolute value function, and Num and S are the numbers of input variables and hidden neurons, respectively.

2.3. MULTIPLE LINEAR REGRESSION

This study attempts to ascertain whether BRBPNN is preferred to the MLR models widely used in the past for temporal variation of acid precipitation (Buishand et al., 1988; Dana and Easter, 1987; MAP3S/RAINE, 1982). MLR employs the following regression model:

$$Y_i = a_0 + a\cos(2\pi i/12 - \varphi) + b\,i + c\,P_i + e_i, \qquad i = 1, 2, \ldots, 12N \qquad (16)$$

where N represents the number of years in the time series. In this case, $Y_i$ is the
natural logarithm of the monthly mean concentration (mg/L) in precipitation for the i-th month. The term $a_0$ represents the intercept. $P_i$ represents the natural logarithm of the precipitation amount (ml) for the i-th month. The term $b\,i$, where i (month) goes from 1 to 12N, represents the monotonic trend in concentration in precipitation over time. To facilitate the estimation of the coefficients $a_0$, a, b, c and $\varphi$, following Buishand et al. (1988) and John et al. (2000), the reparameterized MLR model was established and the final form of Equation (16) becomes:

$$Y_i = a_0 + \alpha\cos(2\pi i/12) + \beta\sin(2\pi i/12) + b\,i + c\,P_i + e_i, \qquad i = 1, 2, \ldots, 12N \qquad (17)$$

where $\alpha = a\cos\varphi$ and $\beta = a\sin\varphi$. The regression coefficients $a_0$, $\alpha$, $\beta$, b and c in Equation (17) are estimated using the ordinary least squares method.

2.4. DATA SET SELECTION

The precipitation chemistry data used are derived from NADP (the National Atmospheric Deposition Program), a nationwide precipitation collection network founded in 1978. Monthly precipitation information for nine species (pH, NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺) and precipitation amount in 1990–2003 was collected at Clinton Crops Research Station (NC35), North Carolina. Information on the data validation can be found at the NADP website.

The advantages of BRBPNN are that it is able to produce models that are robust and well matched to the data. At the end of training, a Bayesian regularized neural network has the optimal generalization qualities and thus there is no need for a test set (MacKay, 1992; 1995). Husmeier et al. (1999) have also shown theoretically and by example that in a Bayesian regularized neural network, the training and test set performance do not differ significantly. Thus, this study does not need to select a test set, and only the training set problem remains.

i. Training set of BRBPNN between precipitation acidity and other ions
With regard to the relationship between precipitation acidity and other ions, the input neurons are taken from monthly concentrations of NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺, and precipitation acidity (pH) is
regarded as the output of the network.

ii. Training set of BRBPNN for temporal trend analysis
Based on the weight calculations of BRBPNN between precipitation acidity and other ions, this study will simulate the temporal trend of three main ions using BRBPNN and MLR, respectively. In Equation (17) of MLR, $a_0$, $\alpha$, $\beta$, b and c are the estimated coefficients and i, $P_i$, $\cos(2\pi i/12)$ and $\sin(2\pi i/12)$ are the independent variables. To try to achieve satisfactory fitting results with the BRBPNN model, we similarly employ the four items (i, $P_i$, $\cos(2\pi i/12)$ and $\sin(2\pi i/12)$) as the input neurons of BRBPNN, the suitability of which will be shown in the following.

2.5. SOFTWARE AND METHOD

MLR is carried out through SPSS 11.0 software. BRBPNN is debugged in the neural network toolbox of MATLAB 6.5 for the algorithm described in Section 2.1. Concretely, the BRBPNN algorithm is implemented through the "trainbr" network training function in the MATLAB toolbox, which updates the weights and biases according to Levenberg-Marquardt optimization. The function minimizes both squared errors and weights, provides the number of network parameters being effectively used by the network, and then determines the correct combination so as to produce a network that generalizes well. The training is stopped if the maximum number of epochs is reached, the performance has been minimized to a suitably small goal, or the performance gradient falls below a suitable target. Each of these targets and goals is set at the default value of the MATLAB implementation unless we set it manually. To eliminate the guesswork required in determining the optimum network size, the training should be carried out many times to ensure convergence.

3. Results and Discussions

3.1. CORRELATION COEFFICIENTS OF PRECIPITATION IONS

Table I shows the correlation coefficients for the ion components and precipitation amount in NC35, which illustrates that the acidity of precipitation results from the integrative interactions of anions and cations and
mainly depends upon four species, i.e. SO₄²⁻, NO₃⁻, Ca²⁺ and NH₄⁺. In particular, pH is strongly correlated with SO₄²⁻ and NO₃⁻, and their correlation coefficients are −0.708 and −0.629, respectively. In addition, it can be found that all the ionic species have a negative correlation with precipitation amount, which accords with the theory that the higher the precipitation amount, the lower the ionic concentration (Li, 1999).

TABLE I
Correlation coefficients of precipitation ions

Ions                  Ca²⁺    Mg²⁺    K⁺      Na⁺     NH₄⁺    NO₃⁻    Cl⁻     SO₄²⁻   pH      Precipitation amount
Ca²⁺                  1.000   0.462   0.548   0.349   0.449   0.627   0.349   0.654   −0.342  −0.369
Mg²⁺                          1.000   0.381   0.980   0.051   0.132   0.980   0.123   0.006   −0.303
K⁺                                    1.000   0.320   0.248   0.226   0.327   0.316   −0.024  −0.237
Na⁺                                           1.000   −0.031  0.021   0.992   0.021   0.074   −0.272
NH₄⁺                                                  1.000   0.733   0.011   0.610   −0.106  −0.140
NO₃⁻                                                          1.000   0.050   0.912   −0.629  −0.258
Cl⁻                                                                   1.000   0.049   0.075   −0.265
SO₄²⁻                                                                         1.000   −0.708  −0.245
pH                                                                                    1.000   0.132
Precipitation amount                                                                          1.000

3.2. RELATIONSHIP BETWEEN PH AND CHEMICAL COMPOSITIONS

3.2.1. BRBPNN Structure and Robustness

For the BRBPNN of the relationship between pH and chemical compositions, the number of input neurons is determined by the number of selected input variables, comprising the eight ions NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺, and the output neuron includes only pH. Generally, the number of hidden neurons for a traditional BPNN is roughly estimated by investigating the effects of the repeatedly trained network. BRBPNN, however, can automatically search for the optimal network parameters in the posterior distribution (MacKay, 1992; Foresee and Hagan, 1997). Based on the algorithm of Section 2.1 and Section 2.5, the "trainbr" network training function is used to implement BRBPNNs with a tansig hidden layer and a purelin output layer. To acquire the optimal architecture, the BRBPNNs are trained independently 20 times to eliminate spurious effects caused by the random set of initial weights, and the network training is stopped when the maximum number of repetitions reaches 3000 epochs. Increase the number
of hidden neurons (S) from 1 to 20 and retrain BRBPNNs until the network performance (the number of effective parameters, MSE, $E_w$ and $E_D$, etc.) remains approximately the same.

In order to determine the optimal BRBPNN structure, Figure 2 summarizes the results for training many different networks of the 8-S-1 architecture for the relationship between pH and chemical constituents of precipitation. It describes how MSE and the number of effective parameters change with the number of hidden neurons (S). When S is less than 15, the number of effective parameters becomes bigger and MSE becomes smaller as S increases. But when S is larger than 15, MSE and the number of effective parameters are roughly constant for any network. This is the minimum number of hidden neurons required to properly represent the true function. From Figure 2, the number of hidden neurons (S) can increase up to 20, but MSE and the number of effective parameters are still roughly equal to those of the network with 15 hidden neurons, which suggests that BRBPNN is robust. Therefore, using the BRBPNN technique, we can determine the optimal size 8-15-1 of the neural network.

Figure 2. Changes of optimal BRBPNNs along with the number of hidden neurons.

Figure 3. Comparison of calculations between BRBPNN (8-15-1) and MLR.

3.2.2. Prediction Results Comparison

Figure 3 illustrates the output response of the BRBPNN (8-15-1), with a quite good fit. Obviously, the calculations of BRBPNN (8-15-1) have a much higher correlation coefficient (R² = 0.968) and are more concentrated near the isoline than those of MLR.
By contrast, in previous studies of the relationship between the acidity and other ions by MLR, most of the average regression R² values are less than 0.769 (Yu et al., 1998; Baez et al., 1997; Li, 1999).

Additionally, Figures 2 and 3 show that any BRBPNN of the 8-S-1 architecture has better approximating qualities. Even if S is equal to 1, the MSE of BRBPNN (8-1-1) is much smaller than that of MLR. Thus, we can judge that there are strong nonlinear relationships between the acidity and other ion concentrations, which cannot be explained by MLR, and that it may be quite reasonable to apply a neural network methodology to interpret nonlinear mechanisms between the acidity and other input variables.

3.2.3. Weight Interpretation for the Acidity of Precipitation

To interpret the weights of the optimal BRBPNN (8-15-1), Equation (15) is used to evaluate the significance of each individual input variable, and the calculations are illustrated in Table II. Among the eight inputs of BRBPNN (8-15-1), NH₄⁺, SO₄²⁻, NO₃⁻, Ca²⁺ and Mg²⁺ have comparatively greater impacts upon the network, which also indicates that these five factors are of more significance for the acidity. Table II shows that NH₄⁺ contributes by far the most (35.38%) to the acidity prediction, while SO₄²⁻ and NO₃⁻ contribute 17.70% and 13.88%, respectively. On the other hand, Ca²⁺ and Mg²⁺ contribute 10.06% and 9.38%, respectively.

TABLE II
Sum of square weights (SSW) and the relative importance (I) from input neurons to hidden layer

         Ca²⁺     Mg²⁺     K⁺       Na⁺      NH₄⁺      NO₃⁻     Cl⁻      SO₄²⁻
SSW      2.9589   2.7575   1.7417   0.8805   10.4063   4.0828   1.3771   5.2050
I (%)    10.06    9.38     5.92     2.99     35.38     13.88    4.68     17.70

3.3. TEMPORAL TREND ANALYSIS

3.3.1. Determination of BRBPNN Structure

Universally, there have always been low fitting results in the analysis of temporal trend estimation in precipitation. For example, the regression R² values of NH₄⁺ and NO₃⁻ for the Chesapeake Bay Watershed in Grimm and Lynch (2005) are 0.3148 and 0.4940; and the R² values of SO₄²⁻, NH₄⁺ and NO₃⁻ for Japan in Sinya et al. (2002) are 0.4205,
0.4323 and 0.4519, respectively. This study also applies BRBPNN to estimate the temporal trend of precipitation chemistry. According to the weight results, we select NH₄⁺, SO₄²⁻ and NO₃⁻ to predict temporal trends using BRBPNN. The four variables (i, P_i, cos(2πi/12), and sin(2πi/12)) in Equation (17) are used as the input neurons of the BRBPNNs. Specifically, two periods (i.e. 1990–1996 and 1990–2003) of input variables for the NH₄⁺ temporal trend are selected so that the BRBPNN results can be compared with the past MLR results of the NH₄⁺ trend analysis for 1990–1996 (John et al., 2000).

Similar to Figure 2, with 20 training runs and a maximum of 3000 epochs, Figure 4 summarizes the results of training many different networks of the 4-S-1 architecture to approximate the temporal variation of the three ions, and shows how the MSE and the number of effective parameters change with the number of hidden neurons (S). The MSE and the number of effective parameters converge and stabilize as S gradually increases. For the 1990–2003 data, as the number of hidden neurons (S) increases up to 10, we find that the minimum number of hidden neurons required to properly represent the function and achieve satisfactory results is at least 6, 6 and 4 for the trend analysis of NH₄⁺, SO₄²⁻ and NO₃⁻, respectively. Thus, the best BRBPNN structures for NH₄⁺, SO₄²⁻ and NO₃⁻ are 4-6-1, 4-6-1 and 4-4-1, respectively. Additionally, for the NH₄⁺ data in 1990–1996, the optimal network is BRBPNN(4-10-1), which differs from BRBPNN(4-6-1) for the 1990–2003 data and indicates that the optimal BRBPNN architecture changes when different data are input.

Figure 4. Changes of optimal BRBPNNs along with the number of hidden neurons for different ions. ∗a: the period of 1990–2003; b: the period of 1990–1996.

3.3.2. Comparison between BRBPNN and MLR

Figures 5–8 summarize the comparison results of the trend analysis for different ions using BRBPNN and MLR. In particular, for Figure 5, John et al.
(2000) report that the R² of NH₄⁺ obtained through MLR Equation (17) is just 0.530 for the 1990–1996 data in NC35. If the BRBPNN method is used to train on the same 1990–1996 data, however, R² reaches 0.760. This shows that it is indispensable to consider nonlinearity in the NH₄⁺ trend analysis, which can make up for the insufficiencies of MLR to some extent. Figures 6–8 demonstrate the general feasibility and applicability of the BRBPNN model in the temporal trend analysis of NH₄⁺, SO₄²⁻ and NO₃⁻; the model reflects nonlinear properties and is much more precise than MLR.

Figure 5. Comparison of NH₄⁺ calculations between BRBPNN(4-10-1) and MLR in 1990–1996.

Figure 6. Comparison of NH₄⁺ calculations between BRBPNN(4-6-1) and MLR in 1990–2003.

Figure 7. Comparison of SO₄²⁻ calculations between BRBPNN(4-6-1) and MLR in 1990–2003.

3.3.3. Temporal Trend Prediction

Using the above optimal BRBPNNs for the ion components, we can obtain the optimal prediction results for the ionic temporal trends. Figures 9–12 illustrate the typical seasonal cycle of monthly NH₄⁺, SO₄²⁻ and NO₃⁻ concentrations in NC35, in agreement with the trend of John et al. (2000).

Based on Figure 9, the estimated increase of NH₄⁺ concentration in precipitation for the 1990–1996 data corresponds to an annual increase of approximately 11.12%, which is slightly higher than the 9.5% obtained by the MLR of John et al. (2000).
Here, we can confirm that the results of BRBPNN are more reasonable and objective because BRBPNN considers nonlinear characteristics. In contrast with

Figure 8. Comparison of NO₃⁻ calculations between BRBPNN(4-4-1) and MLR in 1990–2003.

Figure 9. Temporal trend in the natural log (log NH₄⁺) of NH₄⁺ concentration in 1990–1996. ∗Dots (o) represent monitoring values. The solid and dashed lines respectively represent predicted values and estimated trend given by BRBPNN method.

Figure 10. Temporal trend in the natural log (log NH₄⁺) of NH₄⁺ concentration in 1990–2003. ∗Dots (o) represent monitoring values. The solid and dashed lines respectively represent predicted values and estimated trend given by BRBPNN method.
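
As a consistency check on Table II (Section 3.2.3), the relative importance values can be reproduced by dividing each input's sum of squared weights by the total over all eight inputs. Equation (15) is not reproduced in this section, so treating I as SSW divided by the total SSW is an assumption here, although it agrees with the tabulated percentages to two decimals. The function name `relative_importance` and the dictionary layout are illustrative only.

```python
# Sum of squared weights (SSW) from each input neuron to the hidden layer,
# copied from Table II for the optimal BRBPNN(8-15-1).
SSW = {
    "Ca2+": 2.9589,
    "Mg2+": 2.7575,
    "K+": 1.7417,
    "Na+": 0.8805,
    "NH4+": 10.4063,
    "NO3-": 4.0828,
    "Cl-": 1.3771,
    "SO4^2-": 5.2050,
}


def relative_importance(ssw):
    """Return each input's share of the total SSW, in percent.

    Assumption: relative importance I_i = 100 * SSW_i / sum(SSW).
    """
    total = sum(ssw.values())
    return {ion: 100.0 * value / total for ion, value in ssw.items()}


importance = relative_importance(SSW)

# Print the inputs ranked from most to least important.
for ion, pct in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ion:7s} {pct:6.2f} %")
```

Ranking the result puts NH₄⁺ first, followed by SO₄²⁻ and NO₃⁻, which is the ordering discussed in Section 3.2.3.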

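
The four inputs of the 4-S-1 trend networks described in Section 3.3.1 could be assembled roughly as follows. This is a minimal sketch, assuming that i is a month index starting at 1 and that P_i is a monthly value supplied by the user; neither the indexing convention nor the definition of P_i is stated in this section. The helper `trend_inputs` and the sample values are hypothetical and are not study data.

```python
import math


def trend_inputs(p_values):
    """Build one row [i, P_i, cos(2*pi*i/12), sin(2*pi*i/12)] per month.

    Assumptions: months are indexed i = 1, 2, ... in time order, and
    p_values[i - 1] holds P_i for month i.
    """
    rows = []
    for i, p in enumerate(p_values, start=1):
        angle = 2.0 * math.pi * i / 12.0
        rows.append([float(i), float(p), math.cos(angle), math.sin(angle)])
    return rows


# Hypothetical P_i values for 24 months (illustration only).
sample_p = [10.0 + (m % 12) for m in range(24)]
X = trend_inputs(sample_p)
print(len(X), len(X[0]))
```

Each row would then be paired with the target value for the same month when training a network. The cosine and sine columns repeat every 12 rows, which is how the seasonal cycle enters the inputs.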