Graph-based Correlation of SNMP Objects for Anomaly Detection
Application of lattice-based linearly homomorphic signatures in cloud storage dynamic verification
CLC number: TP309
Document code: A
Article ID: 2095-2783(2016)20-2381-06
Cloud storage is a basic service of cloud computing. Cloud storage providers offer users large amounts of storage space, and users can access their cloud data anytime and anywhere. Along with this convenience come new challenges and security risks [1]. Once users upload local data to a cloud server, they lose direct control over it; a malicious cloud service provider may, out of curiosity or for other undisclosed purposes, snoop on or tamper with user data. The integrity and availability of cloud data have therefore become pressing problems [2]. Cloud verification protocols based on traditional cryptographic schemes generally reduce to the hardness of some computational problem, for example verification protocols based on the RSA signature algorithm [3-4], or bilinear-map verification protocols based on the Diffie-Hellman problem [5]. With the advance of science and technology, the arrival of quantum computers has become a realistic prospect. A quantum computer can solve the above hard problems in polynomial time [6], so data verification protocols based on traditional cryptography will no longer be secure.

According to current research, no efficient algorithm is known for breaking hard lattice problems, and constructing cryptographic schemes from lattice problems is an important direction in cryptography. From the definition of a lattice in [7], lattice-based verification protocols have several advantages: algebraically, a lattice is an additive abelian group, and most lattice schemes work over integer lattices, so linear operations on a lattice are far more efficient than exponentiations; moreover, hard lattice problems come with ready-made reduction proofs, which guarantee the security of lattice cryptography. The signature scheme designed in [8] (the GPV signature), a standard digital signature scheme, has become a basic tool for many lattice-based public-key algorithms. The authors of [9] used GPV signatures to build a lattice-based linearly homomorphic signature scheme over binary fields (LHS), and building on LHS, the authors of [10] proposed a public verification scheme for cloud storage. However, that scheme does not support dynamic data verification; in cloud storage verification, files or data are frequently inserted,
Gaussian-mixture sparse representation for image recognition based on Gabor features and dictionary learning
Gaussian-mixture sparse representation for image recognition based on Gabor features and dictionary learning
ZHAN Shu, WANG Jun, YANG Fu-meng, FANG Qi

Abstract: To overcome the recognition difficulties caused by variations such as illumination and pose, and to improve the robustness of sparse-representation-based image recognition, this paper proposes a Gaussian-mixture sparse representation recognition algorithm based on Gabor features and dictionary learning. Gaussian-mixture sparse representation builds on the maximum-likelihood estimation criterion: the sparse fidelity term is expressed as the maximum-likelihood function of the residual, and recognition is finally reduced to a weighted-norm optimization problem. The algorithm first extracts the Gabor features of an image; a dictionary is then learned on the Gabor feature set, with the Fisher criterion introduced as a constraint during learning so that the learned dictionary carries class labels; finally, the Gaussian-mixture sparse representation method performs classification. Experiments on three public databases (the AR and FERET face databases and the USPS handwritten-digit database) verify the effectiveness and robustness of the algorithm.
Journal: Acta Electronica Sinica
Year (volume), issue: 2015, 43(3)
Pages: 6 (P523-528)
Keywords: Gabor features; sparse representation; Fisher dictionary learning; maximum-likelihood estimation
Authors: ZHAN Shu; WANG Jun; YANG Fu-meng; FANG Qi
Affiliations: School of Computer and Information, Hefei University of Technology, Hefei 230009, China; School of Electronic Information Engineering, Sanjiang University, Nanjing 210012, China
Language: Chinese
CLC number: TP391

1 Introduction
In recent years, image recognition has become a hot topic in computer vision and pattern recognition, with broad application prospects. Many problems remain far from solved, however: variations in illumination, expression and pose all limit progress to some degree, and handling them is the key difficulty in the field. Among the many approaches to image recognition, the idea of classification by sparse representation [1] has gained an important position. Sparse-representation-based image classification represents or encodes a high-dimensional image with a small number of low-dimensional images of the same class, in two stages: sparse representation and classification. The test image is first represented with dictionary atoms under sparsity constraints, and classification is then performed from the sparse coefficients and the dictionary. In face recognition, Wright et al. [2] proposed Sparse Representation Classification (SRC) in 2009: the original training face images form the dictionary, the sparse coefficients of a test sample are solved by norm minimization, the test face is reconstructed per class from those coefficients, its residuals are computed, and it is assigned to the class with the smallest residual, with good classification results. Because the original training images may contain uncertainty and noise, however, such a dictionary is not very effective. In recent years many over-complete dictionary-learning methods [3,4] have been proposed to learn, from the training samples, a set of bases that represent or encode test samples better. For example, the K-SVD algorithm of Elad and Aharon [5], currently among the best for learning over-complete sparse-representation dictionaries, updates dictionary atoms one by one, accounting for each atom's effect on every training sample and strengthening the descriptive power of the atoms. Zhang et al. [6] proposed D-KSVD (Discriminative K-SVD), which maps the sparse coefficients into a space where between-class differences are more prominent and trains the classifier jointly with the K-SVD dictionary, improving both, with good results. Yang et al. [7] proposed FDDL, which incorporates the Fisher discrimination criterion into dictionary learning to train a structured dictionary under which same-class samples have small representation error and different-class samples have large error, improving classification accuracy. All of the above learn dictionaries directly from the original training samples; although they achieve good recognition, they process global features and ignore local feature information, and in practice, with limited sample numbers, global features cannot cope well with illumination, expression and pose changes. Classification based on local features has therefore attracted attention: Gabor wavelets [8-11], for example, extract features at different spatial positions, orientations and frequencies of the target image, and because these features are extracted locally they better resist global disturbances such as illumination, pose and expression. Meanwhile, traditional sparse representation makes poor use of discriminative information; applying discriminative information to the sparse coding coefficients yields a learned dictionary with small within-class scatter and large between-class scatter, and hence strong discriminative power. Combining these observations, and inspired by the algorithm of [7], this paper proposes a Gaussian-mixture sparse representation recognition algorithm based on Gabor features and dictionary learning: Gabor features replace the original global features for dictionary learning, and the Gaussian-mixture sparse representation model serves as the classification strategy.

2 Sparse representation classification methods
2.1 Sparse representation classification
Sparse representation classification expresses a test image as a linear combination of the training samples and encodes the fidelity term with an l1 or l2 norm. The traditional algorithm of [2] assumes k classes: let A = [A1, A2, ..., Ak] denote the training set, where Ai is the training subset of class i, and let y denote a test sample, represented as a linear combination y = Aα with α = [α1; ...; αi; ...; αk]. SRC proceeds as follows: (1) represent the test sample as a linear combination of the dictionary A and obtain the sparse coefficients by l1-norm minimization, with λ a scalar regularization weight; (2) compute the approximation residual of each class, where δi(α̂) is the coefficient vector associated with class i; (3) assign the test image to the class with the smallest approximation residual.

2.2 Gaussian-mixture sparse representation model
The coding model above assumes the residual follows a Gaussian or Laplacian distribution, but this assumption may not hold when the test image contains abnormal pixels (partial occlusion, noise, or local shape changes), so the effectiveness and robustness of the sparse representation model degrade in those cases. Reference [12] therefore proposed a Gaussian-mixture sparse representation model based on maximum-likelihood estimation. Rewrite the dictionary as A = [a1; a2; ...; an], where the row vector ai is the i-th row of A; the residual is then e = y - Aα = [e1; e2; ...; en] with ei = yi - aiα, i = 1, 2, ..., n. Assume e1, e2, ..., en are independent and identically distributed with probability density fθ(ei), where θ denotes the distribution's parameters. Ignoring the sparsity constraint on α, the maximum-likelihood estimate of the residual e maximizes this likelihood function.

3 The proposed algorithm
When a test image contains partial occlusion, noise, or local deformation [13-15], traditional sparse-representation recognition cannot fully and accurately describe the residual distribution simply with an l1 or l2 norm, and thus lacks robustness. Gabor features, being extracted from local image regions, are more robust than holistic features to illumination, expression and pose changes; in addition, the construction of the dictionary is crucial, playing a key role in both image reconstruction and classification [16]. Combining these considerations, this paper proposes the Gaussian-mixture sparse representation recognition algorithm based on Gabor features and dictionary learning; Figure 1 shows its flow. The main idea: first extract Gabor features from the original images; take the Gabor feature set as the initial dictionary and learn from it a new dictionary with class labels — the Fisher discrimination constraint on the coding coefficients gives small within-class scatter and large between-class scatter, so the learned dictionary has strong discriminative power; finally, classify with the Gaussian-mixture sparse representation method, expressing the sparse-representation fidelity as the maximum-likelihood function of the residual, turning recognition into a weighted-norm optimization problem in which different residual entries receive different weights, solved by an iteratively reweighted sparse coding algorithm.

3.1 Dictionary learning from the Gabor feature set with the Fisher criterion
The Gabor filter is defined in the usual form. The objective function in Eq. (11) is solved by a combined iterative optimization, alternately fixing D and X to obtain the required new dictionary D and discriminative coefficients X.

3.2 Classification with Gaussian-mixture sparse representation
The discriminative new dictionary D learned in Section 3.1 is used in the Gaussian-mixture sparse representation model to solve for the coding coefficients and reconstruction residual; in this algorithm, Eq. (7) is rewritten as Eq. (12). Comparing Eqs. (7) and (12), the classification method converts the traditional sparse representation model into a weighted one: Eq. (12) is a weighted-norm approximation problem, where W is a diagonal weight matrix assigning different weights to the residual entries, i.e. Wi,i is the weight assigned to each feature point. Abnormal Gabor feature points usually produce large residuals and should be given low weights to obtain good robustness, so a function with properties similar to the SVM hinge loss is chosen as the weight function, where μ controls the speed of the descent from 1 to 0 and δ controls the break point. Combining Eqs. (12) and (13) gives the objective function (14) for the sparse representation coefficients; the weight matrix W is updated by repeated iteration until convergence, the optimal sparse coefficients are obtained, the residuals are computed from Eq. (2), and the test image is classified by the minimum-residual criterion.

4 Experimental results
To verify the effectiveness of the algorithm, we ran a series of experiments on the AR [17] and FERET [18] face databases and the USPS handwritten-digit database. All results are average recognition rates over repeated runs, obtained on an Intel(R) Core(TM) i5-4130 (3.4 GHz) with 4 GB RAM under Windows 7, in MATLAB 2012b.
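The SRC procedure in Section 2.1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the l1 problem is solved here with a generic ISTA proximal-gradient loop, and the tiny dictionary and test vector are made-up toy data.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, iters=500):
    """Solve min_a 0.5*||y - A a||^2 + lam*||a||_1 by ISTA (proximal gradient)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    a = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ a - y)              # gradient of the least-squares term
        z = a - g / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def src_classify(A, labels, y, lam=0.01):
    """SRC: code y over the whole dictionary, classify by minimum class residual."""
    a = ista_l1(A, y, lam)
    residuals = {}
    for c in set(labels):
        mask = np.array([lbl == c for lbl in labels])
        a_c = np.where(mask, a, 0.0)       # delta_c(a): keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ a_c)
    return min(residuals, key=residuals.get)

# toy example: two classes with distinct column directions (hypothetical data)
A = np.column_stack([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # class 0 atoms
                     [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]])  # class 1 atoms
A = A / np.linalg.norm(A, axis=0)          # unit-norm atoms, as SRC assumes
labels = [0, 0, 1, 1]
y = np.array([0.0, 0.05, 1.0]); y = y / np.linalg.norm(y)
print(src_classify(A, labels, y))  # -> 1 (y lies along the class-1 atoms)
```

The per-class residual step is exactly the δi(α̂) selection described in step (2) above.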
4.1 AR face database
This experiment targets illumination variation. The AR face database contains more than 4000 frontal face images of 126 people; we select 50 men and 50 women, 14 images per person, the first 7 for training and the last 7 for testing. The images contain illumination and expression changes; all are cropped to 83×60 and normalized, PCA reduces the dimension to d = 300, and the algorithm parameters are λ1 = 0.005 and λ2 = 0.005. Table 1 compares the correct recognition rates of discriminative K-SVD (DKSVD) [4], the sparse representation classification algorithm (SRC) [2], the discriminative-dictionary-learning sparse representation (FDDL) algorithm [6], Gabor-feature-based FDDL (GFDDL), and the proposed algorithm, together with their running times in the recognition stage.

Table 1 Recognition results on the AR face database
algorithm | recognition rate (%) | time (s)
DKSVD | 85.4 | 526
SRC | 88.8 | 619
FDDL | 92.0 | 635
GFDDL | 93.7 | 628
proposed | 96.2 | 1016

Since the AR images contain rich illumination and expression variation, SRC and similar algorithms simply use all training samples to build the dictionary and do not fully exploit the known class information; FDDL adds a discriminative constraint in dictionary learning, but as a holistic-feature algorithm it still cannot handle these variations effectively. The proposed algorithm, which builds its dictionary from Gabor features with the Fisher criterion, makes the dictionary both more expressive and more discriminative. To verify that the Gaussian-mixture sparse model is more robust in classification than the SRC model, GFDDL was included as a comparison experiment; the results show the proposed method better handles illumination and expression interference, confirming its feasibility and effectiveness.

4.2 FERET face database
This experiment targets pose variation. A pose subset of FERET is used as the experimental data: 1400 images of 200 people, 7 per person, 3 frontal and 4 pose-rotated, labeled ba, bd, be, bf, bg, bj and bk. All images are normalized to 80×80. We ran five experiments at different pose angles. In experiment 1 (0° pose), the ba and bj images of each class form the training set and the bk images the test set; in the remaining four experiments, the ba, bj and bk images form the training set, and bg (-25°), bf (-15°), be (+15°) and bd (+25°) serve as the respective test sets, giving recognition results at different pose angles. PCA reduces the dimension to 350, with parameters λ1 = 0.005 and λ2 = 0.005. Face images of one person are shown in Figure 2. Figure 3 plots the correct recognition rates of the algorithms at the different pose angles: the proposed algorithm is clearly higher than SRC and the other algorithms. Table 2 gives the running times of the algorithms in experiment 1. At moderate pose angles (0° and ±15°), SRC is very sensitive to pose change, and the proposed algorithm improves the recognition rate by at least 12% over SRC. As the rotation angle grows (±25°), the recognition rates of all algorithms drop, but the proposed algorithm remains highest. Thanks to the superiority of Gabor features in describing local image information, GFDDL shows good robustness to pose change compared with plain discriminative dictionary learning (FDDL), raising the recognition rate markedly; and because Gaussian-mixture sparse representation is insensitive to local deformation, the proposed algorithm overcomes pose variation better than the other algorithms. The results show a clear improvement at moderate pose change, which has practical application value.

Table 2 Running times on FERET in experiment 1 (0° pose)
algorithm | time (s)
SRC | 203
FDDL | 195
GFDDL | 217
proposed | 490

4.3 USPS handwritten-digit database
The United States Postal Service handwritten-digit database (USPS) contains handwritten images of the ten digits 0-9, with 7291 training images and 2007 test images. Figure 4 shows images of three groups of characters from the database. The experiment randomly selects 100 images per digit class for training and 100 for testing; the images are 16×16 and normalized, PCA reduces the dimension to d = 120, and the parameters are λ1 = 0.005 and λ2 = 0.05. Table 3 compares the recognition results of the proposed and other methods, along with their time consumption in the recognition stage.

Table 3 Recognition results on the USPS database
algorithm | recognition rate (%) | time (s)
SRSC | 91.8 | 241
SRC | 92.1 | 278
FDDL | 90.1 | 243
GFDDL | 93.1 | 265
proposed | 95.9 | 485

Because differing writing styles deform the characters, the pattern features become unstable and highly complex; Figure 4 shows that stroke shape, thickness and gray level differ markedly across the database. Compared with SRC, the dictionary learned here from Gabor features with the Fisher criterion extracts the discriminative information of each class well and has stronger discriminative sparse-representation ability; the GFDDL comparison experiment shows that the Gaussian-mixture sparse model is more robust than the SRC model to the shape variation of the different scripts. Compared with the other algorithms, the proposed method achieves very good recognition, confirming its soundness and effectiveness.

To evaluate the algorithms fully, the time consumed in the recognition stage was recorded in the experiments; the complexity is analyzed as follows. SRC simply uses all training samples to construct the dictionary, while FDDL, GFDDL and the proposed algorithm require Gabor feature extraction and dictionary learning, so their computation time is relatively larger; but this cost clearly occurs in the training stage, so we compare time complexity in the test (recognition) stage. There, the time is dominated by sparse coding, i.e. solving Eq. (2) or Eq. (14). With m dictionary atoms, solving Eq. (2) (traditional sparse representation classification) by l1-regularized sparse coding has time complexity O(m^ε), with ε ≈ 1.5 in general. SRC, FDDL and GFDDL perform one sparse-coding pass in the recognition stage and then classify by residual, whereas the Gaussian-mixture sparse classification algorithm needs several iterations to complete the coding, i.e. complexity O(k·m^ε), where k is the iteration count (k = 2 in our experiments). The proposed algorithm's time complexity is therefore about k times that of the others; with k = 2 its recognition time is roughly twice theirs, which the experimental results confirm.

5 Conclusion
To address illumination, expression and pose variation in image recognition, and considering that Gabor features are more robust than holistic features, the proposed algorithm learns a dictionary on the extracted Gabor features, imposes the Fisher criterion on the sparse representation coefficients during dictionary learning, and finally classifies with the Gaussian-mixture sparse representation algorithm. Experiments on public image databases show that, relative to currently popular sparse representation algorithms, the proposed algorithm achieves better recognition under illumination and pose variation.

References
[1] Huang K, Aviyente S. Sparse representation for signal classification [C] // Advances in Neural Information Processing Systems. Vancouver, Canada: The MIT Press, 2006: 609-616.
[2] Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210-227.
[3] Jiang Zhuolin, Lin Zhe, Davis L S. Label consistent K-SVD: learning a discriminative dictionary for recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11): 2651-2664.
[4] ZHU Jie, YANG Wan-kou, TANG Zhen-min. A dictionary learning based kernel sparse representation method for face recognition [J]. Pattern Recognition and Artificial Intelligence, 2012, 25(5): 859-864. (in Chinese)
[5] Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries [J]. IEEE Transactions on Image Processing, 2006, 15(12): 3736-3745.
[6] Zhang Q, Li B. Discriminative K-SVD for dictionary learning in face recognition [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA: IEEE, 2010: 2691-2698.
[7] Yang M, Zhang L, Feng X, et al. Fisher discrimination dictionary learning for sparse representation [C] // Proceedings of the IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain: IEEE, 2011: 543-550.
[8] Yang M, Zhang L. Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary [C] // Computer Vision - ECCV 2010. Crete, Greece: Springer Berlin Heidelberg, 2010: 448-461.
[9] WANG Ke-jun, ZOU Guo-feng. A sub-pattern Gabor features fusion method for single sample face recognition [J]. Pattern Recognition and Artificial Intelligence, 2013, 26(1): 50-56. (in Chinese)
[10] ZHOU Jia-rui, JI Zhen, et al. Face recognition using Gabor wavelets and Memetic algorithm [J]. Acta Electronica Sinica, 2012, 40(4): 642-646. (in Chinese)
[11] ZHAN Shu, ZHANG Qi-xiang, et al. 3D face recognition by kernel collaborative representation based on Gabor feature [J]. Acta Photonica Sinica, 2013, 42(12): 1448-1453. (in Chinese)
[12] Yang M, Zhang L, Yang J, et al. Robust sparse coding for face recognition [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Colorado Springs: IEEE, 2011: 625-632.
[13] Ou W, You X, Tao D, et al. Robust face recognition via occlusion dictionary learning [J]. Pattern Recognition, 2014, 47(4): 1559-1572.
[14] Deng W, Hu J, Guo J. Extended SRC: undersampled face recognition via intraclass variant dictionary [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(9): 1864-1870.
[15] PING Qiang, ZHUANG Lian-sheng. Affine minimum linear reconstruction error face recognition under varying pose and illumination [J]. Acta Electronica Sinica, 2013, 40(10): 1965-1970. (in Chinese)
[16] HU Zheng-ping, LI Jing. Face recognition of joint sparse representation based on low-rank subspace recovery [J]. Acta Electronica Sinica, 2013, 41(5): 987-991. (in Chinese)
[17] Martinez A M. The AR face database [R]. Barcelona: Computer Vision Center, Universitat Autònoma de Barcelona, 1998.
[18] Phillips P J, Wechsler H, Huang J, et al. The FERET database and evaluation procedure for face-recognition algorithms [J]. Image and Vision Computing, 1998, 16(5): 295-306.
McAfee SIEM: Security Information and Event Management System Fundamentals Guide
SIEM McAfee – Security Information & Event ManagementAdministrationCourse Content –Module 1: SIEM Overview▪What is SIEM?– Security Information and Event Management (SIEM)▪Event Analysis and Workflow▪Event Normalization▪Event Aggregation▪Event Correlation▪Log Management and Retention▪Security Information Management▪Security Event Management▪How SIEM is Used▪Compliance Obligations▪Elusive Security Events▪SIEM Components Overview– McAfee® Enterprise Security Manager (ESM)▪McAfee® Enterprise Log Manager (ELM)▪McAfee® Event Receiver (ERC)▪McAfee® Application Data Monitor (ADM)▪McAfee® Database Event Monitor (DEM)▪McAfee® Advanced Correlation Engine (ACE)▪McAfee SIEM Architecture –“Combo Boxes”▪Enterprise Security Manager(ESM)▪Receiver (ERC)▪Database Event Monitor (DEM)▪Application Data Monitor (ADM)▪Advanced Correlation Engine (ACE)▪Risk Correlation▪Correlation▪The Big Picture– Identifying Business Needs and Stakeholders▪Deployment Scenarios▪Large Centralized Deployment Example▪Large Distributed Deployment Example▪First-Time ESM Setup– Navigating the ESMI– Configure the Properties for the ESMI System▪Add the Devices to the System▪Configure the Device Properties▪FIPS Compliant Mode▪Implementation Process Checklist– Back-up and recovery plans– Consider integration with existing products▪Ensure end-user communications▪Apply Software Updates▪Do Validation Testing▪Follow Testing Procedures▪Change ControlModule 2: ESM and Receiver Overview▪McAfee Enterprise Security Manager– McAfee ESM Properties– ESM System Information– Content Packs– ESM Custom Settings– Login and Print Settings– Custom Device Event Links– Remedy Email Server Settings– Cyber Threat Feeds– ESM Email Settings– ESM – Configuration, Key Management and Maintenance ▪ESM Settings – File Maintenance▪ESM – Login Security▪ESM – Profile Management▪ESM – Reports▪ESM – System Logs▪ESM – Users and Groups▪ESM – Add User▪ESM – Add Group▪ESM – Add Privileges▪ESM – Watchlists▪McAfee Receiver– Receiver Properties– Receiver 
Name and Description– Receiver Connection– Receiver Configuration– Receiver Management– Receiver Key Management– Receiver Device Log– Receiver Asset Sources▪Receiver HA▪Practice 2: SIEM Users and GroupsModule 3: ESMI Views▪The Data Problem– Increased Incidents– Filtering Issues– Event Management Challenges– The Solution▪McAfee ESMI– McAfee User Interface– ESMI Desktop– Views Toolbar– Views Toolbar– Out-of-Box Views– Use-Case Scenarios Using ESMI Dashboards▪Key Dashboards– Summarize By– Normalized Dashboard– Asset Vulnerability Summary– Geolocation Map– Source User Summary– Host Summary– Default Flow Summary– Incident Dashboard– Incidents Dashboard – Event Drilldown– Custom Views– Data Binding– SIEM Workflow Demonstration– Identify Slow and Low Data Exfiltration– Key take-aways from this demonstration▪Configure User-specific ESM Settings– Configure User Time Zone– Configure User Default Views– Practice 3: Creating a Custom ViewModule 4: Filtering, Watchlists, and Variables ▪Filters– Filter a view– Filter Sets– Default Filters– Using Multiple Filter Sets– Description of contains and regex Filters▪Syntax for contains and regex▪Points to consider when using contains or regex: ▪String Normalization▪String Normalization File▪Watchlists and Variables– Watchlists– Creating a Watchlist– Adding a Watchlist– Static and Dynamic Watchlists– GTI Watchlist– Create a watchlist of threat or IOC feeds from the Internet – Rule Variables– Common list of Variables– Configure Variables▪Practice 4: WatchlistsModule 5: Receiver Data Source Configuration▪Receiver Data Sources– Data Sources Screen– Add Data Source Definitions– Client Data Sources– Adding Client Data Sources: Match on Type vs. 
IP▪Child Data Sources▪Data Source Grouping▪Data Source Profiles▪Data Sources – Auto Learn▪Data Sources – WMI▪Data Sources – WMI Event Logs▪Data Sources – Syslog▪Data Sources – Generic Net Flow▪Data Sources – Correlation Engine▪McAfee ePO▪Importing and Exporting Data Sources▪Data Source Time Problems▪Time Delta Page▪Discovered Assets– Asset Manager– Vulnerability Assessment Data Sources– Vulnerability Assessments– Enable VA▪Real Time Data Enrichment▪Case Management– Remedy (Ticketing System) Interface ▪Practice 5: Data SourcesModule 6: Aggregation▪Aggregation Overview– SIEM Architecture▪How Aggregation Works– Simplified Aggregation Example– Raw Events– Aggregated Events– Event Aggregation– Dynamic Aggregation– Automatic Retrieval– Manual Retrieval– Changing Settings– Sample Aggregated Event Count– Event Aggregation– Start at Level Aggregation– Level Aggregation– Level Aggregation– Event Aggregation – Custom– Custom Field Aggregation Example ▪Modify Event Aggregation Settings▪Flow Aggregation▪Flow Aggregation Levels▪Start at Level Aggregation▪Level Aggregation▪Level Aggregation▪Flow Aggregation – Custom▪Flow Aggregation – Ports▪Port Values▪Practice 6: AggregationModule 7: Policy Editor▪Policy Editor Overview– Policy Editor Screen– Default Policy– Policy Tree– Policy Tree icons– Policy Tree Menu Items– Copy or Copy and Replace a Policy ▪To copy a policy, follow the steps▪Import a Policy▪Export a Policy▪Policy Change History▪Policy Status▪Policy Rollout▪Rollout Policy Correlation▪Tags▪Operations Menu▪Tools Menu▪Normalization Categories▪Severity Weights▪Rule Types▪Rules Display Pane▪Rule Inheritance▪The Inheritance Icons▪Rule Properties – Settings▪Action▪Severity▪Blacklisting▪Aggregation▪Copy packet▪Advanced Syslog Parser Rules▪Parsing Tab▪Field Assignment Tab▪Mapping▪Data Source Rules – Auto Learned▪Practice 1: Using the Syslog Parser – Part 1▪Practice 2: Using the Syslog Parser – Part 2Module 8: Correlation▪Optimized Risk Management– SIEM Technology Adoption 
Curve▪Event Normalization▪Event Correlation▪Event Correlation Engine– Understanding Correlation– Multiple Attackers Example▪Scanning Single Server (Distributed Dictionary Attack) – Receiver-based Correlation– Advanced Correlation Engine– Advanced Correlation Engine Risk– Content, Context and Risk Correlation– Add a Correlation Data Source– Correlation Rule Editor– Component of a rule– Correlation Rule Editor – Filters– Simple Example: Creating a Custom Correlation Rule ▪Criteria▪System penetration scenario▪Rollout Correlation Policy▪Scenarios▪Rollout Correlation Policy▪Practice 8.1: Correlation Rules▪Practice 2: Adding an ACE Appliance▪Practice 8.3: Historical CorrelationModule 9: Notifications and Reporting ▪Alarms– Create an Alarm– Alarm Settings– Alarm Settings – Condition Types▪Deviation from Baseline▪Device Failure▪Device Status Change▪Event Delta▪FIPS Failure▪HA Failure▪Field Match▪Internal Event Match▪Specified Event Rate▪Alarm Settings – Devices▪Alarm Settings – Actions▪Alarm Settings – Escalation▪Alarm Settings Additional Notes▪Additional Alarm Options▪Alarms Log▪Alarm Details▪Triggered Alarms View▪Reporting Overview– Out of Box Reports– Create Reports– System Properties – Reports– Add Report– Sections 1, 2, and 3– Section 4▪Section 5▪Section 6▪New Report Layout▪Designing Report Layout▪Document Properties▪Report Conditions▪Query Wizard▪UCF Report Filter▪Email Report Recipients▪Email Report Groups▪SMS Report Recipients▪SNMP Reports Recipients▪Syslog Report Recipients▪Add a Syslog Recipient▪Remove a Syslog Recipient▪View Running Reports▪View Report Files▪Export views and reports▪Practice 9.1: Creating Alarms▪Practice 9.2: ReportingModule 10: Working with ELM▪ELM Overview– Important Terms– Adding an ELM– ELM Properties– ELM Information– ELM Configuration– ELM Management– ELM Redundancy– Device Log– ELM Data– Enhanced ELM Search View▪Configuring the ELM for Storage – ELM Storage– Estimating ELM Storage Example– ELM Storage Pools– Add, Edit, or Delete a Storage 
Device – Add, Edit, Delete a Storage Pool– Mapping data sources to ELM storage Pools– ELM MigrateDB– ELM Mirrored Data Storage– Creating an Integrity Check JobModule 11: Troubleshooting and System Management▪McAfee Technical Support– ServicePortal (http://m )– Web Gateway Extranet (https:// )– McAfee Customer Service () 1-866-622-3911 261 – Login Troubleshooting– ESM Fails to Communicate with the Client– Client Fails Version Validation Test– ESM is Rebuilding– ESM is Backing Up or Restoring the Database▪Unable to SSH or login to the ESM▪The NGCP password for the ESMI desktop has been lost▪User can log in to ESMI but they have no rights▪Operating System and Browser- specific Issues▪ESM Login Screen Does Not Come Up on Linux Browser▪Login – unable to get the certificate using Firefox using IPv6 address▪Export/Download Troubleshooting When Using Windows 7▪Hardware Issues▪How to obtain the serial number from a device▪Beeping during initial startup▪Update and Upgrade Issues▪Software Upgrade Process▪How to ensure that the update file is not corrupt▪Manual rules updates▪Troubleshooting Upgrade to Version 9.5.0▪Reasons for Flags▪Device Status Alerts▪Device Status Window▪ESM and ESMI Troubleshooting▪How to initiate a callhome▪How to access the terminal via the GUI▪ESM Settings – Database▪How to export the ESMI login history▪How to manually set the time if no NTP server is available▪Unable to download rules from the McAfee servers▪How to determine if you are getting data from your data source ▪McAfee SIEM Sizing Overview。
Big-data-based radar health management system
Journal of Terahertz Science and Electronic Information Technology, Vol. 17, No. 4, Aug. 2019. Article ID: 2095-4980(2019)04-0691-07. CLC number: TN911.7; Document code: A; doi: 10.11805/TKYDA201904.0691

Parameter estimation of frequency hopping signals based on piecewise compression and atomic norm

LI Huiqi¹, LI Lichun¹, ZHANG Yunfei², LIU Zhipeng¹
(1. College of Information Systems Engineering, Information Engineering University, Zhengzhou, Henan 450002, China; 2. College of Communications Engineering, Xidian University, Xi'an, Shaanxi 710071, China)

Abstract: Parameter estimation of frequency-hopping signals in the compressed domain needs to find the digital characteristics of the compressed samples by means of the measurement matrix, which results in high computational complexity and basis mismatch. To solve this problem, a parameter-estimation method based on compressed-domain digital characteristics and the atomic norm is proposed. First, a block-diagonal measurement matrix is established to compress the signal piecewise, and the digital characteristics of the compressed samples are analyzed to obtain a coarse estimate of the hop instants. Then, the signal segments without frequency hops are separated, and the hop frequencies are estimated accurately by atomic-norm minimization. Finally, based on the accurately estimated hop frequencies, an atomic dictionary is designed and the hop instants are estimated accurately in the compressed domain. The proposed method's frequency-estimation performance is better than that of grid-based compressive sensing, and the hop instants of frequency-hopping signals can also be estimated accurately. Simulation results show that when the SNR is higher than -2 dB and the compression ratio higher than 0.5, the normalized hop-frequency estimation error of the proposed algorithm is below 10⁻⁴ and the normalized hop-instant estimation error below 10⁻².

Keywords: frequency-hopping signal; piecewise compression; atomic norm; parameter estimation

Frequency-hopping communication offers excellent anti-jamming and multiple-access networking performance and is widely used in military communications [1].
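The piecewise-compression step in the abstract can be sketched as follows. This is a simplified illustration under assumptions: a complex-exponential test signal, one Gaussian sub-block reused along the diagonal of the block-diagonal measurement matrix, and normalized cross-correlation of consecutive compressed segments as a stand-in for the paper's "digital characteristics". Within one hop, consecutive segments of a complex tone differ only by a constant phase, so their correlation is 1; it drops at the hop boundary. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, N, M, S = 1000.0, 64, 32, 8          # sample rate, segment length, measurements, segments
f1, f2 = 110.0, 260.0                     # two hop frequencies (made-up values)
n = np.arange(S * N)
hop = 4 * N                               # true hop instant: start of segment 4
x = np.where(n < hop,
             np.exp(2j * np.pi * f1 * n / fs),
             np.exp(2j * np.pi * f2 * n / fs))

Phi = rng.standard_normal((M, N)) / np.sqrt(M)          # one sub-block of the diagonal
Y = np.stack([Phi @ x[i*N:(i+1)*N] for i in range(S)])  # piecewise compression

# coarse hop detection: within a hop, consecutive compressed segments are
# collinear (correlation 1); the minimum marks the boundary pair
corr = [abs(np.vdot(Y[i], Y[i+1])) / (np.linalg.norm(Y[i]) * np.linalg.norm(Y[i+1]))
        for i in range(S - 1)]
hop_segment = int(np.argmin(corr)) + 1
print(hop_segment)  # segment index where the coarse estimate places the hop
```

The subsequent atomic-norm refinement of the hop frequency is a convex program (e.g. an SDP) and is omitted here.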
A lithium-ion battery state-of-health estimation method based on improved grid search and generalized regression neural network
(CALCE) [13]. This dataset also uses constant-current then constant-voltage charging, but with different current levels. Taking CS2-35 as an example, the battery's rated capacity is about 1.1 A·h. Charging first applies a 0.55 A constant current until the voltage reaches 4.2 V, then holds 4.2 V constant voltage until the current drops to 50 mA; discharging applies a 1.1 A constant current until the voltage falls to 2.7 V. Usually, SOH represents the battery capacity
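The capacity-based SOH notion used with such CC-CV data can be illustrated by coulomb counting over a discharge log: integrate the discharge current to get the delivered capacity, then divide by the rated capacity. The 1.1 A·h rating comes from the CS2-35 description above; the discharge samples are hypothetical.

```python
# coulomb counting: discharge capacity = integral of I dt, SOH = capacity / rated capacity
def discharge_capacity_ah(current_a, dt_s):
    """Sum sampled currents (A) at a fixed step dt_s (s), convert to ampere-hours."""
    return sum(current_a) * dt_s / 3600.0

rated_ah = 1.1                        # CS2-35 rated capacity from the text
# toy discharge log: 1.1 A constant current sustained for 3000 s (hypothetical)
samples = [1.1] * 3000
cap = discharge_capacity_ah(samples, 1.0)
soh = cap / rated_ah
print(round(cap, 3), round(soh, 3))   # -> 0.917 0.833
```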
Electrical Engineering, Vol. 22, No. 7, Jul. 2021

A lithium-ion battery state-of-health estimation method based on improved grid search and generalized regression neural network
YAO Yuan, CHEN Zhicong, WU Lijun, CHENG Shuying, LIN Peijie
(College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China)
Abstract: To estimate the state of health (SOH) of lithium-ion batteries accurately, this paper proposes a new estimation method based on improved grid search (GS) and a generalized regression neural network (GRNN). First, the collected data are processed, and effective features, including voltage and current, are extracted by correlation analysis. Second, a regression model based on improved grid search and GRNN is proposed to estimate battery SOH. Finally, the proposed method is validated on two public lithium-ion battery datasets. Experimental results show that, compared with other estimation methods, the proposed method is superior in accuracy, generalization and reliability.
To address the problems above, this paper proposes a new SOH estimation method. First, the dataset is preprocessed, and correlation analysis selects and extracts highly correlated feature information, greatly reducing the training data; a model based on improved grid search (GS) and a generalized regression neural network (GRNN) is then built for SOH estimation. Compared with other methods, this approach greatly shortens training time and offers stronger generalization and more accurate SOH estimates.
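A minimal sketch of the GS + GRNN idea (a plain grid search, not the authors' improved variant): a GRNN prediction is a Gaussian-kernel-weighted average of the training targets with a single smoothing factor σ, and the grid search picks σ by validation RMSE. The single "cycle count" feature and all constants are made-up toy data.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN: kernel-weighted average of training targets (Nadaraya-Watson form)."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

def grid_search_sigma(X_tr, y_tr, X_val, y_val, sigmas):
    """Pick the smoothing factor minimizing validation RMSE (plain grid search)."""
    errs = [np.sqrt(np.mean((grnn_predict(X_tr, y_tr, X_val, s) - y_val) ** 2))
            for s in sigmas]
    return sigmas[int(np.argmin(errs))]

# toy SOH-like data: linear capacity fade vs. cycle count, plus noise (hypothetical)
rng = np.random.default_rng(0)
cycles = rng.uniform(0, 500, 80)[:, None]
soh = 1.0 - 0.0008 * cycles[:, 0] + rng.normal(0, 0.005, 80)
X_tr, y_tr, X_val, y_val = cycles[:60], soh[:60], cycles[60:], soh[60:]

best = grid_search_sigma(X_tr, y_tr, X_val, y_val, [5, 20, 100])
pred = grnn_predict(X_tr, y_tr, np.array([[250.0]]), best)
print(best, round(float(pred[0]), 3))
```

A GRNN has no iterative weight training, which is why the method's only tuning cost is this one-dimensional search over σ.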
Non-negative local sparse coding based on elastic net and histogram intersection
DOI: 10.11772/j.issn.1001-9081.2018071483
WAN Yuan, ZHANG Jinghui*, CHEN Zhiping, MENG Xiaojing
(School of Science, Wuhan University of Technology, Wuhan 430070, China)
(* Corresponding author, e-mail: Jingzhang@whut.edu.cn)
Abstract: Sparse coding models ignore the group effect when selecting dictionary bases, and the Euclidean distance cannot effectively measure the distance between a feature and a dictionary base. To address these problems, a non-negative local sparse coding method based on elastic net and histogram intersection (EH-NLSC) is proposed. First, an elastic-net model is introduced into the objective function, removing the limit on the number of selected dictionary bases, so that multiple groups of correlated features can be selected while redundant ones are excluded, improving the discriminability and effectiveness of the coding. Then, histogram intersection is introduced into the locality constraint to redefine the distance between features and dictionary bases, ensuring that similar features can share their local bases. Finally, a multi-class linear support vector machine performs classification. Experiments on four public datasets show that, compared with locality-constrained linear coding (LLC) and sparse coding based on non-negative elastic net (NENSC), EH-NLSC improves classification accuracy by an average of 10 and 9 percentage points respectively, fully demonstrating its effectiveness in image representation and classification.
Key words: sparse coding; elastic net model; locality; histogram intersection; image classification
0 Introduction
Image classification is an important research direction in computer vision, widely applied in biometric recognition, web image retrieval, robot vision and other fields; its key lies in extracting features that represent images effectively. Sparse coding is an effective method of image feature representation. Considering that the bag-of-words (BoW) model [1] and the spatial pyramid matching (SPM) model [2] easily introduce quantization error, Yang et al. [3] combined the SPM model and proposed an image classification algorithm using spatial pyramid matching with sparse coding (ScSPM), which codes sparsely at different image scales with good classification results. In sparse coding models, because the l1 norm considers only sparsity and ignores the group effect when selecting dictionary bases, Zou et al. [4] proposed a new regularization method taking the elastic net as the regularizer and variable-selection method. Zhang et al. [5] proposed a discriminative elastic-net-regularized linear
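The locality idea from the abstract above — measuring feature-to-base closeness by histogram intersection rather than Euclidean distance — can be sketched as follows. This is a simplified stand-in, not the paper's full elastic-net objective: it solves a non-negative, HI-weighted ridge problem by projected gradient, so bases more similar to the feature are penalized less and attract the code. The dictionary and feature are random toy histograms.

```python
import numpy as np

def hist_intersection(x, B):
    """Histogram intersection similarity of feature x to each base (column of B)."""
    return np.minimum(x[:, None], B).sum(axis=0)

def hi_local_nonneg_code(x, B, lam=0.1, iters=2000):
    """Non-negative, locality-weighted coding: HI-similar bases get smaller penalty."""
    sim = hist_intersection(x, B)
    d = 1.0 - sim / (sim.max() + 1e-12)              # HI-based locality "distance"
    L = np.linalg.norm(B, 2) ** 2 + 2 * lam * d.max() + 1e-12  # Lipschitz bound
    c = np.zeros(B.shape[1])
    for _ in range(iters):
        g = B.T @ (B @ c - x) + 2 * lam * d * c      # gradient of weighted ridge loss
        c = np.maximum(c - g / L, 0.0)               # project onto nonnegative orthant
    return c

rng = np.random.default_rng(2)
B = rng.random((16, 6)); B /= B.sum(axis=0)          # 6 random normalized histogram bases
x = 0.7 * B[:, 0] + 0.3 * B[:, 1]                    # feature lying near bases 0 and 1
c = hi_local_nonneg_code(x, B)
print(np.round(c, 2))  # weight concentrates on the bases most similar to x
```

The point of the weighting is visible in `d`: a base with high histogram intersection with `x` gets `d` near 0 and is essentially free to use, which is the "similar features share their local bases" behavior described above.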
Network management and tools: overview of OpenView files and processes
• snmpCollect
– Collects MIB data and stores it in the $OV_DB/snmpCollect directory – Performs threshold monitoring and sends threshold events to pmd
Foreground Processes (1)
• ovw
– Provides map drawing, map editing and menu management
• ipmap
– Runs under ovw to automatically draw IP topology maps representing network
CORBA-based Agent (3) POSTECH DP&NM Lab.
$OV_CONF/ovsuf
• Contains the configuration information for ovspmd
• Each entry in ovsuf is created by ovaddobj from information in the LRF (Local Registration File).
• There is at least one LRF for each of the background processes
• ovsuf example:
  1:ovwdb:ovwdb:OVs_YES_START::-O:OVs_WELL_BEHAVED:15:PAUSE::
  0:pmd:pmd:OVs_YES_START:::OVs_WELL_BEHAVED:15:PAUSE::
• LRF example:
  # @(#)netmon.lrf
  netmon:netmon:
  OVs_YES_START:ovtopmd,pmd,ovwdb:-P -k segRedux=true:OVs_WELL_BEHAVED:15:PAUSE::
Fuzzy-based energy-efficient multiple-cluster-head selection routing protocol for wireless sensor networks (IJCNIS-V7-N4-7)
I. J. Computer Network and Information Security, 2015, 4, 54-61. Published Online March 2015 in MECS. DOI: 10.5815/ijcnis.2015.04.07

Fuzzy Based Energy Efficient Multiple Cluster Head Selection Routing Protocol for Wireless Sensor Networks

Sohel Rana, Ali Newaz Bahar, Nazrul Islam, Johirul Islam
Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail-1902, Bangladesh
Email: sohel.rana10045@, bahar_mitdu@, nazrul.islam@mbstu.ac.bd, johirul.islam.6814@

Abstract—The Wireless Sensor Network (WSN) is made up of small battery-powered sensor devices with limited energy resources. These sensor nodes are used to monitor physical or environmental conditions and to pass their data through the wireless network to the main location. One of the crucial issues in wireless sensor networks is to create a more energy-efficient system. Clustering is one mechanism in wireless sensor networks to prolong the network lifetime and to reduce network energy consumption. In this paper, we propose a new routing protocol called Fuzzy Based Energy Efficient Multiple Cluster Head Selection Routing Protocol (FEMCHRP) for wireless sensor networks. The routing process involves the clustering of nodes and the selection of cluster head (CH) nodes of these clusters, which send all the information to the cluster head leader (CHL). After that, the cluster head leaders send aggregated data to the base station (BS). The selection of cluster heads and cluster head leaders is performed using fuzzy logic, and the data transmission process follows the shortest energy path, selected by applying the Dijkstra algorithm. The simulation results of this research are compared with the BCDCP, CELRP and ECHERP protocols to evaluate the performance of the proposed routing protocol. The evaluation concludes that the proposed routing protocol is better at prolonging network lifetime and balancing energy consumption.
Index Terms—Fuzzy logic, Wireless Sensor Network, Cluster Head Leader, Shortest Energy Path, Dijkstra Al-gorithm.I.I NTRODUCTIONA wireless sensor network is one kind of energy con-strained network. Wireless sensor networks are formed by a number of sensor nodes, which are powered by bat-teries. The replacement or recharging process of these batteries is very difficult. Sensor nodes are used to moni-tor environmental or physical conditions, such as temper-ature, sound, and motion, etc. Recent technological de-velopment in the Micro Electronic Mechanical system (MEMS) and wireless communication technologies have enabled the invention of tiny, low power, low cost, and multi-functional smart sensor nodes in a wireless sensor network. The transmission of a finite amount of infor-mation can be only supported by finite energy.In twenty-first century, WSN have been widely con-sidered as one of the most important technology. The most important factor in WSN is energy efficiency for prolonging network lifetime and also for balancing ener-gy consumption. Routing is also an important factor that affects wireless sensor networks [1, 7]. One of the most restrictive factors on the lifetime of wireless sensor net-works is the limited energy resources of the sensor nodes. Sensor nodes can be organized hierarchically by group-ing them into clusters in order to achieve energy efficien-cy.Previously, a several numbers of literatures have been done to improve energy efficiency of Wireless Sensor Networks. One of them is Low Energy Adaptive Cluster-ing Hierarchy (LEACH) [3, 4]. It is a hierarchical proto-col. Moreover, it uses single-hop routing that means eve-ry sensor node transmits information directly to the clus-ter head. Therefore, it is not recommended for large area networks. After that, some protocols BCDCP [7], PEG-ASIS [8], CELRP [10] and GPSR [11] are proposed to improve the energy efficiency of LEACH protocol using multi hop routing schema. 
Base-Station Control, Dynam-ic Clustering Protocol (BCDCP) [7] is a centralized rout-ing protocol, which uses Minimal Spanning Tree (MST) [2] to connect to CH which randomly chooses a leader to send data to sink. BCDCP route data energy efficiency in small-scale network. A Cluster Based Energy Efficient Location Routing Protocol (CELRP) [10] is a location based routing protocol. It applies the Greedy algorithm to chain the cluster heads. In Power-Efficient Gathering in Sensor Information Systems (PEGASIS) [8], each node can transfer data to only its nearby neighbor. It uses a greedy algorithm to form a chain of nodes. These proto-cols [7, 8, 10] use only one cluster head leader to transmit data to the base station. These protocols are not appropri-ate for large area networks. GPSR [11] is a position based routing protocol, which performed by using a geo-graphic positioning system (GPS). Then a literature [14] is introduced the trust concept in GPSR and name T-GPSR. Recently, an improvement of T-GPSR is performed in [15]. Some other position based protocols are Location Aided Routing Protocol (LAR) [12] and GRID [13]. A wide description of geography based routing pro-tocol is found in [16, 17].After that, some protocols ECHERP [5], TEEN [6] and SHORT [9] are also proposed to improve energy effi-ciency of wireless sensor networks. Equalized Cluster Head Election Routing Protocol (ECHERP) [5] models the network field as a linear system. However, in this protocol, only a first level Cluster Heads can directly transmit data to the BS, so first level nodes will die first. Threshold Sensitive Energy Efficient (TEEN) is a proto-col which designed for sudden changes in the sensed en-vironment [6]. In TEEN, the sensor network architecture is designed hierarchically. It does not operate properly when the numbers of layers increases.Energy consumption and network lifetime are the pa-rameters to measure the energy efficiency of a wireless sensor network. 
In a network, which uses only one CHL to transmit aggregated data to the base station, the sensor nodes start to die in a very short round and also the nodes which are close to the base station die first. It causes to decrease network lifetime and imbalance energy con-sumption, which affects the energy efficiency of the whole network. It would be interesting to evaluate, how we can minimize the total energy consumption and pro-long network lifetime of wireless sensor networks.This research mainly focuses on multiple CHs and CHLs which are used in the large area network to trans-mit data to the base station and also in the data transmis-sion process of the network. These CHs and CHLs are selected by using fuzzy logic. This study attempts to min-imize the total energy consumption and prolong network lifetime of this large area network.This paper is structured as follows: Section II presents the methodology of proposed energy efficient routing protocol. Section III describes details about the network model of this protocol. Section IV illustrates details about the simulation of this study to analyses energy consump-tion and network lifetime. Section V shows simulation results and evaluates performance on network life time and energy consumption. Finally, Section VI represents a set of conclusions and the future works.II.R ESEARCH M ETHODMany literatures have been proposed based on single-hop routing [3], multi-hop routing [5-10] and fuzzy logic [19-22] and also position based routing [10-13]. However, these solutions depend on one elected CHL to directly transmit aggregated data to the BS. This dependency on only one CHL sharply decreases total resume energy of whole network.This literature mainly follows [5, 7, 10] and introduces an energy efficient data transmission process based on multiple CHs and CHLs for WSN.We study different simulation tools used in previous studies [16-22]. There are many simulation tools to simu-late the proposed protocol. 
One of these simulation tools is selected based on its accuracy and minimal runtime complexity. First, the simulation setup is performed by mapping the network field. We then select CHs and CHLs in the network field using a Fuzzy Inference Engine and apply the Dijkstra algorithm to chain cluster members and CHs.

After finishing the simulation setup, we collect the simulation data. We use the same simulation parameters previously used by other protocols [5, 7, 10] to compare the simulation results and evaluate the performance of this protocol. We plot graphs comparing the simulation results of this study with those of the other protocols [5, 7, 10]. The plotted graphs show that the proposed protocol outperforms BCDCP, CELRP and ECHERP in prolonging the network lifetime and balancing energy consumption, and careful observation of these plots provides a quantitative measure of the energy efficiency of the new routing protocol.

III. NETWORK MODEL OF PROPOSED ROUTING PROTOCOL

The network model of this study is shown in Figure 1. This routing protocol balances energy consumption and prolongs the network lifetime.

Fig 1. Scenario of proposed network model

The new routing protocol organizes clusters so that all nodes are included in a cluster. It chooses a CH for each cluster using fuzzy logic, based on the highest residual energy and the minimum distance from the BS. The Dijkstra algorithm is applied to find the shortest energy path of each node; the cluster members and CHs are then chained according to these shortest energy paths. Finally, cluster members send data packets to the CHs. Multiple CHLs are chosen by the BS using fuzzy logic, based on the highest residual energy and the minimum distance from the BS of each CH.
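The shortest-energy-path chaining just described relies on Dijkstra's algorithm. A minimal sketch follows; the adjacency-dict layout and the use of a per-link transmission energy cost as the edge weight are illustrative assumptions, not the paper's MATLAB code:

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path costs from `source` over a weighted adjacency dict.
    adj: {node: [(neighbor, cost), ...]}, where cost stands in for the
    energy needed to transmit over that link (an assumption here)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already relaxed via a cheaper path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def path_to(prev, source, target):
    """Reconstruct the chain of nodes from source to target."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

Running it on a three-node toy field, the chain member → relay → CH emerges whenever the two-hop energy cost is lower than the direct link's.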
Each CHL can transmit data to the BS either directly or via other CHLs, depending on the shortest energy path. We simulate this network to analyze the network lifetime, average residual energy and energy dissipation of this study. The simulation results show that this protocol prolongs the network lifetime and balances energy consumption better than BCDCP, CELRP and ECHERP.

IV. SIMULATION

This section discusses the simulation of this study. The simulation is carried out in MATLAB. We use the Fuzzy Inference Engine to select CHs and CHLs, and apply the Dijkstra algorithm to chain the cluster members according to their shortest energy paths.

A. Network Field Mapping

As shown in Figure 2, we design a network field with 100 nodes randomly scattered in a sensing field of dimension 100 m × 80 m, with the BS located at position (130, 100).

Fig 2. A snapshot of random deployment of sensor nodes in the network field

To compare the performance of the study with the other protocols, we ignore the effects of signal collision and interference in the wireless channel. Table 1 summarizes the parameters used in our simulation.

Table 1. Simulation Parameters

B. Clustering and Cluster Head Selection

In this protocol, clustering is done with the fuzzy clustering method, and cluster heads are selected by fuzzy logic based on both the residual energy and the distance from the base station of a sensor node. Following the equations given in [10], the energy spent to transmit an l-bit packet from the transmitter to a receiver at distance d is defined as:

E_Tx(l, d) = l·E_elec + l·ε_fs·d²,  if d < d0
E_Tx(l, d) = l·E_elec + l·ε_mp·d⁴,  if d ≥ d0    (1)

E_Tx is the energy dissipated in the transmitter of the source node. The electronic energy E_elec is the per-bit energy dissipation for running the transceiver circuitry.
The threshold distance d0 can be obtained from:

d0 = √(ε_fs / ε_mp)    (2)

E_Rx is the energy expended to receive messages:

E_Rx(l) = l·E_elec    (3)

The distance d between two nodes is calculated by the following equation:

d = √((x₁ − x₂)² + (y₁ − y₂)²)    (4)

E_cluster is the sum of the energy spent in the cluster heads:

E_cluster = Σᵢ [kᵢ·E_Rx(l) + kᵢ·l·E_DA + E_Tx(l, d)]    (5)

where kᵢ indicates the number of member nodes of cluster head i, E_Tx(l, d) indicates the transmission energy, E_Rx(l) indicates the reception energy and E_DA indicates the energy of data aggregation.

B1. Fuzzy Membership Functions Implementation

The membership functions [19, 20] of the fuzzy system parameters used to determine the cluster heads are shown in Figures 3 (a), (b) and (c).

Fig 3. (a) Fuzzy input membership function (Distance); (b) Fuzzy input membership function (Energy); (c) Fuzzy output membership function (Possibility)

B2. Fuzzy Rules Generation

To find the possibility of a node becoming a CH, fuzzy rules must be assigned for all possible inputs. Table 2 shows these fuzzy rules.

Table 2. Fuzzy Rules to Select Cluster Heads

Table 2 shows that a sensor node with a greater distance from the base station and less residual energy has the lowest possibility of becoming a CH. Conversely, a sensor node with a smaller distance from the base station and high residual energy has the highest possibility of becoming a CH.

Fig 4. (a) Implementation of Fuzzy Rules (b) The Fuzzy Rules Viewer

As shown in Figure 4, we implement the fuzzy rules and find the possibility of each node becoming a CH. We select as CH the node with the maximum possibility among all cluster members, and then select the CHLs among all CHs using the same process.

C. Data Transmission Process

The cluster members are chained by finding the shortest energy path of each member. These shortest energy paths are selected by applying the Dijkstra algorithm to transmit data from each cluster member to its CH, and then from all CHs to the CHLs in the network field. Finally, the CHLs send data to the BS according to their shortest energy paths.
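The first-order radio energy model used above can be sketched as follows. The constants are typical values from the literature, not necessarily those of Table 1, and the per-cluster-head formula is one plausible reading of the garbled Eq. (5):

```python
import math

# Illustrative constants (typical first-order radio model values;
# the paper's Table 1 may differ).
E_ELEC = 50e-9       # J/bit, transceiver electronics
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier
E_DA = 5e-9          # J/bit, data aggregation

D0 = math.sqrt(EPS_FS / EPS_MP)  # threshold distance, Eq. (2)

def e_tx(l, d):
    """Energy to transmit an l-bit packet over distance d, Eq. (1)."""
    if d < D0:
        return l * E_ELEC + l * EPS_FS * d ** 2
    return l * E_ELEC + l * EPS_MP * d ** 4

def e_rx(l):
    """Energy to receive an l-bit packet, Eq. (3)."""
    return l * E_ELEC

def distance(p, q):
    """Euclidean distance between node positions p and q, Eq. (4)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def e_cluster_head(k, l, d):
    """Energy spent by one CH with k members, as Eq. (5) reads here:
    receive and aggregate k member packets, then transmit one packet."""
    return k * e_rx(l) + k * l * E_DA + e_tx(l, d)
```

With these constants the threshold distance works out to roughly 87.7 m, so links inside the 100 m × 80 m field mostly use the free-space term while the hop to the BS at (130, 100) may fall in the multipath regime.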
V. RESULT AND ANALYSIS

We simulate this protocol with the Fuzzy Inference Engine as shown in Figures 3 and 4, and then examine the results of the simulation. The simulation results of this study are shown for a few rounds.

Fig 5. Network field scenario of first round

Figure 5 shows the network field scenario of the first round, in which the different clusters are indicated by different colors, together with the CH of each cluster. It also shows the CHLs of the network that transmit data to the BS in the first round.

Fig 6. (a) Network field scenario after round 200. (b) Network field scenario after round 300.

The network field scenarios after rounds 200 and 300 are shown in Figures 6 (a) and (b), respectively. Figure 6 (a) shows the CHs of the different clusters and the CHLs of the network in round 200; the CH and CHL nodes have changed, chosen by fuzzy logic and fuzzy rules based on their highest residual energy and minimum distance from the base station. Figure 6 (b) shows the CHs and CHLs in round 300, where the CH and CHL nodes have changed again.

A. Performance Evaluation of FEMCHRP Protocol

The results of this study are compared with the BCDCP, CELRP and ECHERP protocols in the same heterogeneous setting. We use three metrics to analyze and compare the results: network lifetime, energy dissipation and residual energy. We define the network lifetime as the number of rounds until the first node exhausts all of its energy. One round spans the operation from the beginning of cluster formation until the BS has received all data from the CHLs.

A1. Average Energy Dissipation for Several Rounds

In WSNs, the average energy dissipation is an important measurement for comparing protocols. The graph in Figure 7 shows the average energy dissipation over several rounds.
We observe that the protocol significantly reduces energy consumption, since it uses an alternative method to select the CH based on the location and residual energy of nodes. Moreover, the use of multi-hop data transmission in each cluster also results in more efficient energy usage and lower energy consumption for both intra- and inter-cluster data transmission in our protocol.

The line graph in Figure 7 shows that the reduction in average energy dissipation is about 48% relative to BCDCP, 41% relative to ECHERP and 36% relative to CELRP, which means that FEMCHRP consumes about 48% less energy than BCDCP, 41% less than ECHERP and 36% less than CELRP. The curve also shows that the round-to-round variation in dissipation of FEMCHRP is higher than that of BCDCP, CELRP and ECHERP.

Fig 7. A comparison of FEMCHRP's Average Energy Dissipation with BCDCP, CELRP and ECHERP

According to this discussion, the protocol performs better than BCDCP, CELRP and ECHERP in terms of energy consumption.

A2. Number of Nodes Alive over Several Rounds

Another important issue in WSNs is the number of nodes alive over several rounds. Figure 8 presents the node lifetime, i.e., the number of rounds until the first node dies, for our protocol; it is higher than for BCDCP, CELRP and ECHERP. We also note that the lifetime starts decreasing at round 150 in BCDCP, at round 200 in ECHERP and at round 320 in CELRP, while in FEMCHRP the decrease only starts after more than 410 rounds. We calculated that the first node dies 39% faster in BCDCP, 27% faster in ECHERP and 17% faster in CELRP than in FEMCHRP. This means that the average number of live sensor nodes in FEMCHRP is 39% higher than in BCDCP, 27% higher than in ECHERP and 17% higher than in CELRP.

Fig 8.
A comparison of FEMCHRP's system lifetime with BCDCP, CELRP and ECHERP

Therefore, FEMCHRP has been shown to prolong the network lifetime better than BCDCP, CELRP and ECHERP. It should also be noted that the curve of FEMCHRP is smoother than those of BCDCP, CELRP and ECHERP.

A3. Average Residual Energy in Several Rounds

We measure the average residual energy of our network and compare the results to BCDCP, CELRP and ECHERP. We observe that the residual energy of the network is higher than for the other protocols. Figure 9 shows that the average residual energy of FEMCHRP is about 42% higher than BCDCP, 22% higher than ECHERP and 10% higher than CELRP, meaning that FEMCHRP consumes about 42% less energy than BCDCP, 22% less than ECHERP and 10% less than CELRP. Figure 9 also shows that the round-to-round variation of the protocol is higher than that of BCDCP, CELRP and ECHERP. Therefore, the protocol outperforms BCDCP, CELRP and ECHERP in terms of energy efficiency and is able to prolong the network lifetime of sensor nodes.

Fig 9. A comparison of FEMCHRP's Average Residual Energy with BCDCP, CELRP and ECHERP

After observing and analyzing the different simulation results, we obtain a set of findings that make this protocol better than BCDCP, CELRP and ECHERP. These findings lead to the conclusion.

VI. CONCLUSION

In this paper, we present a set of observations with regard to the average energy dissipation, network lifetime and average residual energy of the proposed network. The findings of this study are summarized below. First, this network consumes less energy to transmit the total aggregated data to the Base Station than the other protocols; its average energy dissipation is much lower than that of BCDCP, CELRP and ECHERP. Second, the network lifetime starts decreasing only after more than 410 rounds, which is much later than in the other protocols, meaning that this protocol is better than the other protocols in terms of network lifetime.
Finally, the average residual energy of this study is high, which also means that it transmits more data than the other protocols. We conclude that the proposed protocol is an energy efficient protocol that effectively prolongs the network lifetime.

Future work will consider the delay of the system. In addition, we plan to design a heterogeneous network with several Base Stations that communicate with each other, using this protocol to select multiple Cluster Heads with fuzzy logic and to transmit data through them to the Base Stations.

REFERENCES

[1] Zou, Y., & Chakrabarty, K. (2005). A distributed coverage- and connectivity-centric technique for selecting active nodes in wireless sensor networks. Computers, IEEE Transactions on, 54(8), 978-991.
[2] Shen, H. (1999). Finding the k most vital edges with respect to minimum spanning tree. Acta Informatica, 36(5), 405-424.
[3] Heinzelman, W. R., Chandrakasan, A., & Balakrishnan, H. (2000, January). Energy-efficient communication protocol for wireless microsensor networks. In System Sciences, 2000. Proceedings of the 33rd Annual Hawaii International Conference on (pp. 10-pp). IEEE.
[4] Heinzelman, W. B., Chandrakasan, A. P., & Balakrishnan, H. (2002). An application-specific protocol architecture for wireless microsensor networks. Wireless Communications, IEEE Transactions on, 1(4), 660-670.
[5] Nikolidakis, S. A., Kandris, D., Vergados, D. D., & Douligeris, C. (2013). Energy efficient routing in wireless sensor networks through balanced clustering. Algorithms, 6(1), 29-42.
[6] Manjeshwar, A., & Agrawal, D. P. (2001, April). TEEN: a routing protocol for enhanced efficiency in wireless sensor networks. In Parallel and Distributed Processing Symposium, International (Vol. 3, pp. 30189a-30189a). IEEE Computer Society.
[7] Sabbineni, H., & Chakrabarty, K. (2005). Location-aided flooding: an energy-efficient data dissemination protocol for wireless-sensor networks.
Computers, IEEE Transactions on, 54(1), 36-46.
[8] Lindsey, S., & Raghavendra, C. S. (2002). PEGASIS: Power-efficient gathering in sensor information systems. In Aerospace conference proceedings, 2002. IEEE (Vol. 3, pp. 3-1125). IEEE.
[9] Yang, Y., Wu, H. H., & Chen, H. H. (2007). SHORT: shortest hop routing tree for wireless sensor networks. International Journal of Sensor Networks, 2(5), 368-374.
[10] Nurhayati, S. H. C., & Lee, K. O. (2011). A Cluster Based Energy Efficient Location Routing Protocol in Wireless Sensor Networks. Proceedings International Journal of Computers and Communications, 5(2).
[11] Karp, B., & Kung, H. T. (2000, August). GPSR: Greedy perimeter stateless routing for wireless networks. In Proceedings of the 6th annual international conference on Mobile computing and networking (pp. 243-254). ACM.
[12] Ko, Y. B., & Vaidya, N. H. (2000). Location-Aided Routing (LAR) in mobile ad hoc networks. Wireless Networks, 6(4), 307-321.
[13] Liao, W. H., Sheu, J. P., & Tseng, Y. C. (2001). GRID: A fully location-aware routing protocol for mobile ad hoc networks. Telecommunication Systems, 18(1-3), 37-60.
[14] Pirzada, A. A., & McDonald, C. (2007, November). Trusted greedy perimeter stateless routing. In Networks, 2007. ICON 2007. 15th IEEE International Conference on (pp. 206-211). IEEE.
[15] Vamsi, P. R., & Kant, K. (2014). An Improved Trusted Greedy Perimeter Stateless Routing for Wireless Sensor Networks. International Journal of Computer Network and Information Security (IJCNIS), 5(11), 13-19.
[16] Tsai, M. J., Yang, H. Y., Liu, B. H., & Huang, W. Q. (2008). Virtual Coordinate: A Geography-based Heterogeneous Hierarchy Routing Protocol in Wireless Sensor Networks. INFOCOM (pp. 351-355).
[17] Chen, X., Qu, W., Ma, H., & Li, K. (2008, September). A Geography-Based Heterogeneous Hierarchy Routing Protocol for Wireless Sensor Networks. In High Performance Computing and Communications, 2008. HPCC'08.
10th IEEE International Conference on (pp. 767-774). IEEE.
[18] Su, X., Choi, D., Moh, S., & Chung, I. (2010, February). An energy-efficient clustering for normal distributed sensor networks. In Proceedings of the 9th WSEAS International Conference on VLSI and Signal Processing (ICNVS'10), Cambridge, UK (pp. 81-84).
[19] Minhas, M. R., Gopalakrishnan, S., & Leung, V. C. (2008, November). Fuzzy algorithms for maximum lifetime routing in wireless sensor networks. In Global Telecommunications Conference, 2008. IEEE GLOBECOM 2008. IEEE (pp. 1-6). IEEE.
[20] Gupta, I., Riordan, D., & Sampalli, S. (2005, May). Cluster-head election using fuzzy logic for wireless sensor networks. In Communication Networks and Services Research Conference, 2005. Proceedings of the 3rd Annual (pp. 255-260). IEEE.
[21] Tashtoush, Y. M., & Okour, M. A. (2008, December). Fuzzy self-clustering for wireless sensor networks. In Embedded and Ubiquitous Computing, 2008. EUC'08. IEEE/IFIP International Conference on (Vol. 1, pp. 223-229). IEEE.
[22] Banerjee, P. S., Paulchoudhury, J., & Chaudhuri, S. B. (2013). Fuzzy Membership Function in a Trust Based AODV for MANET. International Journal of Computer Network and Information Security (IJCNIS), 5(12), 27-34.

Authors' Profiles

Sohel Rana was born in Comilla, Bangladesh, on 25th October 1992. He completed his Bachelor of Engineering in Information and Communication Technology (ICT) at Mawlana Bhashani Science and Technology University, Tangail-1902, Bangladesh, in 2014. His research interests include Image Processing, Wireless Sensor Networks and Neural Networks.

Ali Newaz Bahar received a B.Sc. (Engg.) degree in Information and Communication Technology (ICT) from Mawlana Bhashani Science and Technology University (MBSTU) in 2010 and a Masters from the Institute of Information Technology (IIT), University of Dhaka, Bangladesh, in 2012.
His areas of interest are congestion control for Mobile Ad hoc Networks, Wireless Sensor Networks, Cognitive Radio Networks, Quantum-dot Cellular Automata (QCA), Artificial Intelligence and Cloud Computing.

Nazrul Islam received a Bachelor degree in Information and Communication Technology (ICT) from Mawlana Bhashani Science and Technology University, Tangail, Bangladesh. He holds an M.Sc. degree in Electrical Engineering with emphasis on Telecommunication Systems from the Blekinge Institute of Technology, Karlskrona, Sweden. He is currently working as a Lecturer in the Department of Information and Communication Technology at Mawlana Bhashani Science and Technology University, Tangail, Bangladesh. His current research interests lie in Communication Networks and their applications, mainly modeling and analysis with respect to Quality of Service (QoS) and Quality of Experience (QoE).

Johirul Islam was born in Noakhali, Bangladesh, on 1st January 1992. He received a B.Sc. (Engg.) degree in Information and Communication Technology (ICT) from Mawlana Bhashani Science and Technology University, Tangail-1902, Bangladesh, in 2014. His research interests include wireless sensor networks, communication networks and network protocols.

How to cite this paper: Sohel Rana, Ali Newaz Bahar, Nazrul Islam, Johirul Islam, "Fuzzy Based Energy Efficient Multiple Cluster Head Selection Routing Protocol for Wireless Sensor Networks", IJCNIS, vol.7, no.4, pp.54-61, 2015. DOI: 10.5815/ijcnis.2015.04.07
Object Tracking Algorithm Using a Sparse Collaborative Model
Authors: Li Feibin; Cao Tieyong; Song Zhijun; Zha Yi; Wang Wen
Journal: Journal of Computer-Aided Design & Computer Graphics, 2016, 28(12)
Abstract: Focusing on strengthening the robustness of video object tracking, an algorithm via a sparse collaborative model is proposed. In the discriminative model, prior visual information is exploited to learn an over-complete dictionary based on the SIFT feature; the dictionary is used to represent the object and to train the classifier that separates the object from the background. In the generative model, the algorithm extracts local features and computes the occlusion information of the object to construct the object templates, and tracking is then implemented by computing the similarity between the candidates and the templates. Finally, a multiplicative formula joins the two models to obtain the final tracking result. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed algorithm performs favorably against several state-of-the-art methods.
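The multiplicative fusion described in the abstract can be sketched very simply: each candidate region gets a discriminative classifier confidence and a generative template-similarity score, and the joint confidence is their product. The score values below are placeholders, not outputs of the paper's actual SIFT-dictionary or template models:

```python
def collaborative_scores(disc_conf, gen_sim):
    """Joint confidence via the multiplicative formula: a candidate
    must score well under BOTH models to score well overall."""
    return [d * g for d, g in zip(disc_conf, gen_sim)]

def select_target(disc_conf, gen_sim):
    """Index of the best tracking candidate under the joint score."""
    scores = collaborative_scores(disc_conf, gen_sim)
    return max(range(len(scores)), key=scores.__getitem__)
```

The product form means a candidate favored strongly by only one model (e.g., background clutter that fools the classifier) is suppressed unless the other model agrees.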
Correlation Search in Graph Databases
Yiping Ke (keyiping@t.hk), James Cheng (csjames@t.hk), Wilfred Ng (wilfred@t.hk)
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
KDD'07, August 12-15, 2007, San Jose, California, USA. Copyright 2007 ACM 978-1-59593-609-7/07/0008.

ABSTRACT
Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking, despite the fact that graph data, especially in various scientific domains, have proliferated in recent years. In this paper, we propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we design an efficient algorithm that operates on a much smaller projected database, and thus we are able to obtain a significantly smaller set of candidates. To further improve the efficiency, we develop three heuristic rules and apply them on the candidate set to further reduce the search space. Our extensive experiments demonstrate the effectiveness of our method on candidate reduction. The results also justify the efficiency of our algorithm in mining correlations from large real and synthetic datasets.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining
General Terms: Algorithms
Keywords: Correlation, Graph Databases, Pearson's Correlation Coefficient

1. INTRODUCTION
Correlation mining is recognized as one of the most important data mining tasks for its capability of identifying the underlying dependency between objects. It has a wide range of application domains and has been studied extensively on market-basket databases [5, 13, 15, 24, 23, 29], quantitative databases [11], multimedia databases [16], data streams [20], and many more. However, little attention has been paid to mining correlations from graph databases, in spite of the popularity of the graph data model in various domains, such as biology [4, 10], chemistry [2], social science [3], the Web [17] and XML [1].

In this paper, we study a new problem of mining correlations from graph databases. We propose to use Pearson's correlation coefficient [21] to measure the correlation between a query graph and an answer graph. We formulate this mining problem, named Correlated Graph Search (CGS), as follows. Given a graph database D that consists of N graphs, a query graph q and a minimum correlation threshold θ, the problem of CGS is to find all graphs whose Pearson's correlation coefficient w.r.t. q is no less than θ. Pearson's correlation coefficient is shown to be one of the most desirable correlation measures in [21] for its ability to capture the departure of two variables from independence.
It has been widely used to describe the strength of correlation among boolean variables in transaction databases [21, 23, 29]. This motivates us to apply the measure in the context of graph databases. However, graph mining is a much harder problem due to the high complexity of graph operations (e.g., subgraph isomorphism testing is NP-complete [7]). The difficulty of the problem is further compounded by the fact that the search space of CGS is often large, since a graph consists of exponentially many subgraphs and each subgraph of a graph in D can be a candidate graph. Thus, it poses great challenges to tackle the problem of CGS.

How can we reduce the large search space of CGS and avoid as many expensive graph operations as possible? We investigate the property of Pearson's correlation coefficient and derive two necessary conditions for the correlation condition to be satisfied. More specifically, we derive the lower bound and upper bound of the occurrence probability (also called support), supp(g), of a candidate graph g. This effectively reduces the search space to the set of Frequent subGraphs (FGs) [12] in D with support values between the lower and upper bounds of supp(g).

However, mining FGs from D is still expensive when the lower bound of supp(g) is low or D is large. Moreover, we still have a large number of candidates and the solution is not scalable. Thus, we need to further reduce the number of candidates as well as address the scalability problem. Our solution to this problem is as follows. Let D_q be the projected database of D on q, which is the set of all graphs in D that are supergraphs of q. We prove that the set of FGs mined from D_q using lowerbound(supp(g))/supp(q) as the minimum support threshold is complete w.r.t. the answer set. Since D_q is in general much smaller than D, while lowerbound(supp(g))/supp(q) is greater than lowerbound(supp(g)), our finding not only saves the computational cost for generating the candidate set, but also significantly reduces the number of
candidates. Furthermore, we develop three heuristic rules to be applied on the candidate set to identify the graphs that are guaranteed to be in the answer set, as well as to prune the graphs that are guaranteed to be false positives.

In addition to the formulation of the new CGS problem and its efficient solution, the significance of our work also lies in its close connection to graph similarity search, which is an important research area of graph querying. There are two types of similarity: structural similarity (i.e., two graphs are similar in structure) and statistical similarity (i.e., the occurrence distributions of two graphs are similar). Existing work [8, 18, 27, 22] mainly focuses on structural similarity search. However, in many applications, two graphs that are structurally dissimilar but always appear together in a graph in D may be more interesting. For example, in chemistry, isomers refer to molecules with the same chemical formula and similar structures. The chemical properties of isomers can be quite different due to different positions of atoms and functional groups. Consider the case where a chemist needs to find some molecule that shares the chemical properties of a given molecule. Structural similarity search is not relevant, since it mostly returns isomers of the given molecule that have similar structures but different chemical properties, which is undesirable. On the contrary, CGS is able to obtain the molecules that share similar chemical properties but may or may not have similar structures to the given molecule. Therefore, our proposed CGS solves a problem orthogonal to structural similarity search.

Our extensive experiments on both real and synthetic datasets show that our algorithm, called CGSearch, achieves short response time for various queries with relatively small memory consumption. Compared with the approach whose candidate set is generated from the whole database with a support range, CGSearch is orders of magnitude faster and consumes up to 41 times less memory. The
effectiveness of the candidate generation from the projected database and of the three heuristic rules is also demonstrated.

Contributions. The specific contributions of the paper are stated as follows.
• We formulate the new problem of correlation search in graph databases, which takes into account the occurrence distributions of graphs using Pearson's correlation coefficient.
• We derive theoretical bounds for the support of a candidate graph, which reduces the search space considerably.
• We propose to generate the candidate set by mining FGs from the projected database of the query graph. Three heuristic rules are developed to further reduce the size of the candidate set.
• We present an efficient algorithm to solve the problem of CGS. We also prove the soundness and completeness of the query results returned by the algorithm.
• A comprehensive set of experiments is conducted to verify the efficiency of the algorithm, and the effectiveness of the candidate generation and the heuristic rules.

Organization. We give preliminaries in Section 2. We define the CGS problem in Section 3. We propose the effective candidate generation from a projected database in Section 4. We present the algorithm, as well as the three heuristic rules, in Section 5. Then, we analyze the performance study in Section 6. Finally, we discuss related work in Section 7 and conclude our paper in Section 8.

2. PRELIMINARIES
In this paper, we restrict our discussion to undirected, labelled connected graphs (or simply graphs hereinafter), since most of the interesting graphs in practice are connected graphs; our method can be easily extended to process directed and unlabelled graphs.

A graph g is defined as a 4-tuple (V, E, L, l), where V is the set of vertices, E is the set of edges, L is the set of labels and l is a labelling function that maps each vertex or edge to a label in L. We define the size of a graph g as size(g) = |E(g)|.

Given two graphs, g = (V, E, L, l) and g' = (V', E', L', l'), a subgraph isomorphism from g to g' is an injective function f: V → V', such
that ∀(u,v) ∈ E, (f(u), f(v)) ∈ E', l(u) = l'(f(u)), l(v) = l'(f(v)), and l(u,v) = l'(f(u), f(v)). Subgraph isomorphism testing is known to be an NP-complete problem [7]. A graph g is called a subgraph of another graph g' (or g' is a supergraph of g), denoted as g ⊆ g' (or g' ⊇ g), if there exists a subgraph isomorphism from g to g'.

Let D = {g1, g2, ..., gN} be a graph database that consists of N graphs. Given D and a graph g, we denote the set of all graphs in D that are supergraphs of g as D_g = {g': g' ∈ D, g' ⊇ g}. We call D_g the projected database of D on g. The frequency of g in D, denoted as freq(g; D), is defined as |D_g|. The support of g in D, denoted as supp(g; D), is defined as freq(g; D)/|D|. A graph g is called a Frequent subGraph (FG) [9, 12, 25] in D if supp(g; D) ≥ σ, where σ (0 ≤ σ ≤ 1) is a user-specified minimum support threshold. For simplicity, we use freq(g) and supp(g) to denote the frequency and support of g in D when there is no confusion. Given two graphs, g1 and g2, we define the joint frequency, denoted as freq(g1, g2), as the number of graphs in D that are supergraphs of both g1 and g2, i.e., freq(g1, g2) = |D_g1 ∩ D_g2|. Similarly, we define the joint support of g1 and g2 as supp(g1, g2) = freq(g1, g2)/|D|.

The support measure is anti-monotone, i.e., if g1 ⊆ g2, then supp(g1) ≥ supp(g2). Moreover, by the definition of joint support, we have the following property: supp(g1, g2) ≤ supp(g1) and supp(g1, g2) ≤ supp(g2).

Example 1. Figure 1 shows a graph database, D, that consists of 10 graphs, g1, ..., g10. For clarity of presentation, all the nodes have the same label (not shown in the figure), while the characters a, b and c represent distinct edge labels.
The graph g8is a subgraph of g2.The projected database of g8,i.e.,D g8,is{g2,g3,g6,g7,g8}.The frequency of g8 is computed as freq(g8)=|D g8|=5.The support of g8 is supp(g8)=freq(g8)|D|=0.5.As for g9,we have D g9= {g6,g7,g9}.The joint frequency of g8and g9is computedas freq(g8,g9)=|D g8∩D g9|=|{g6,g7}|=2.The joint support of g8and g9is supp(g8,g9)=freq(g8,g9)|D|=0.2.(g 1 )(g 2 ) b abc c acb (g 3 ) a bc ac(g 4 ) aa b (g 5 ) bb(g 6 )a ac c (g 7 )acc a c (g 8 )cc(g 9 )a ac (g 10 ) a aa Figure 1:A Graph Database,D3.PROBLEM DEFINITIONWe first define Pearson’s correlation coefficient [19]for two given graphs.Pearson’s correlation coefficient for boolean variables is also known as “φcorrelation coefficient ”[28].Definition 1.(Pearson’s Correlation Coefficient )Given two graphs g 1and g 2,the Pearson’s Correlation Co-efficient of g 1and g 2,denoted as φ(g 1,g 2),is defined as fol-lows:φ(g 1,g 2)=supp (g 1,g 2)−supp (g 1)supp (g 2)supp (g 1)supp (g 2)(1−supp (g 1))(1−supp (g 2)).When supp (g 1)or supp (g 2)is equal to 0or 1,φ(g 1,g 2)isdefined to be 0.The range of φ(g 1,g 2)falls within [−1,1].If φ(g 1,g 2)is positive,then g 1and g 2are positively correlated;otherwise,g 1and g 2are negatively correlated.In this paper,we focus on positively correlated graphs defined as follows.Definition 2.(Correlated Graphs )Two graphs g 1and g 2are correlated if and only if φ(g 1,g 2)≥θ,where θ(0<θ≤1)is a user-specified minimum correlation thresh-old .We now define the correlation mining problem in graph databases as follows.Definition 3.(Correlated Graph Search )Given a graph database D ,a correlation query graph q and a min-imum correlation threshold θ,the problem of Correlated Graph Search (CGS)is to find the set of all graphs that are correlated with q .The answer set of the CGS problem is defined as A q ={(g,D g ):φ(q,g )≥θ}.For each correlated graph g of q ,we include D g in the answer set in order to indicate the distribution of g in D .We also define the set of correlated graphs in 
the answer set as the base of the answer set,denoted as base (A q )={g :(g,D g )∈A q }.In the subsequent discussions,a correlation query graph is simply called a query .Table 1gives the notations used throughout the paper.Table 1:Notations Used ThroughoutNotationDescription D a graph database q a query graphθa minimum correlation threshold,0<θ≤1φ(q,g )Pearson’s correlation coefficient of q and g A qthe answer set of qbase (A q )the base of the answer setD gthe projected database of D on graph g freq (g ),supp (g )the frequency/support of g in Dfreq (q,g ),supp (q,g )the joint frequency/support of q and g in D freq (g ;D q ),supp (g ;D q )the frequency/support of g in D qfreq (q,g ;D q ),supp (q,g ;D q )the joint frequency/support of q and g in D q lower (g ),upper (g )the lower/upper bound of supp (g )lower (q,g ),upper (q,g )the lower/upper bound of supp (q,g )4.CANDIDATE GENERATIONA crucial step for solving the problem of CGS is to ob-tain the set of candidate graphs.Obviously,it is infeasible to test all subgraphs of the graphs in D because there are exponentially many subgraphs.In this section,we discuss how to effectively select a small set of candidates for a given query.4.1Support Bounds of Correlated GraphsWe begin by investigating the bounds on the support of a candidate graph,g ,with respect to the support of a query q .We state and prove the bounds in Lemma 1.Lemma 1.If q and g are correlated,then the following bounds of supp (g )hold:supp (q )θ−2(1−supp (q ))+supp (q )≤supp (g )≤supp (q )θ2(1−supp (q ))+supp (q ).Proof.By the definition of the joint support,we have supp (q,g )≤supp (g )and supp (q,g )≤supp (q ).Since q and g are correlated,φ(q,g )≥θ.By replacing supp (q,g )with supp (g )in φ(q,g ),we have:supp (g )−supp (q )supp (g )supp (q )supp (g )(1−supp (q ))(1−supp (g ))≥θ⇒supp (g )≥supp (q )θ−2(1−supp (q ))+supp (q ).Similarly,by replacing supp (q,g )with supp (q )in φ(q,g ),we obtain the upper bound:supp (g )≤supp (q )θ2(1−supp (q ))+supp (q 
).For simplicity,we use lower (g )and upper (g )to denote the respective lower and upper bounds of supp (g )with respect to q ,as given in Lemma 1.The above lemma states a nec-essary condition for a correlated answer graph.Thus,a candidate graph should have support within the range of [lower (g ),upper (g )].With the result of Lemma 1,we can obtain the candi-date set by mining the set of FGs from D using lower (g )as the minimum support threshold and upper (g )as the max-imum support threshold.However,according to the anti-monotone property of the support measure,the graphs withhigher support are always generated before those with lower support,no matter adopting a breadth-first or a depth-first strategy.As a result,the maximum threshold upper(g)is not able to speed up the mining process.Therefore,generat-ing the candidates by mining the FGs from D with a support range is still not efficient enough,especially when lower(g) is small or D is large.This motivates us to devise a more efficient and effective approach to generate the candidates.4.2Candidate Generation From a ProjectedDatabaseFrom Definition1,it follows that ifφ>0,then supp(q,g)> 0.This means that q and g must appear together in at least one graph in D.This also implies that∀g∈base(A q),g appears in at least one graph in the projected database of q,D q.Since D q is in general much smaller than D,this gives rise to the following natural question:can we mine the candidate set more efficiently from D q instead of D?The challenge is that,however,we need to determine a minimum support threshold that can be used to mine the FGs from D q,so that no correlated answer graph is missed. 
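Definition 1 and Lemma 1 are easy to make concrete in a few lines of code. The sketch below is not from the paper (which gives no code); Python and the function names are my own. It reproduces the numbers of the running example (supp(g8) = 0.5, supp(g9) = 0.3, supp(g8, g9) = 0.2) and the bounds for the parameters used in Example 2 later in the paper (supp(q) = 0.4, θ = 0.6).

```python
import math

def phi(supp_qg, supp_q, supp_g):
    """Pearson's correlation coefficient of two graphs (Definition 1)."""
    if supp_q in (0.0, 1.0) or supp_g in (0.0, 1.0):
        return 0.0
    return (supp_qg - supp_q * supp_g) / math.sqrt(
        supp_q * supp_g * (1 - supp_q) * (1 - supp_g))

def support_bounds(supp_q, theta):
    """lower(g) and upper(g) of a candidate's support (Lemma 1)."""
    lower = supp_q / (theta ** -2 * (1 - supp_q) + supp_q)
    upper = supp_q / (theta ** 2 * (1 - supp_q) + supp_q)
    return lower, upper

# Running example: supp(g8) = 0.5, supp(g9) = 0.3, supp(g8, g9) = 0.2
print(round(phi(0.2, 0.5, 0.3), 4))   # 0.2182: weakly positively correlated
# Example 2 of the paper: supp(q) = 0.4, theta = 0.6 gives lower(g) ~ 0.19
lo, up = support_bounds(0.4, 0.6)
print(round(lo, 2), round(up, 4))
```

With supp(q) = 0.4 and θ = 0.6 the bounds evaluate to roughly [0.19, 0.65], matching the lower(q, g) = 0.19 reported in Example 2.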
Obviously, we cannot use a trivial threshold, since it is too expensive. In this subsection, we derive a minimum support threshold which enables us to efficiently compute the candidates from D_q. Our solution is inspired by the following important observation, stated in Lemma 2.

Lemma 2. Given a graph g, supp(g; D_q) = supp(q, g; D_q) = supp(q, g)/supp(q).

Proof. By the definition of the projected database, every graph in D_q must contain q. Therefore, every graph in D_q that contains g must also contain q. Thus, supp(g; D_q) = supp(q, g; D_q) holds. Since the number of graphs containing both q and g in D is the same as that in D_q, that is, freq(q, g) = freq(q, g; D_q), we have

  supp(q, g)/supp(q) = (freq(q, g)/|D|) / (freq(q)/|D|) = freq(q, g; D_q)/|D_q| = supp(q, g; D_q).

Lemma 2 states that the support of a graph g in D_q is the same as the joint support of q and g in D_q. This prompts us to derive the lower bound and upper bound for supp(q, g; D_q), given that g is correlated with q. Then, we can use the bounds as the minimum and maximum support thresholds to compute the candidates from D_q. Since supp(q, g; D_q) = supp(q, g)/supp(q) by Lemma 2, we try to derive the bounds for supp(q, g). First, by the definition of the joint support, we obtain the upper bound of supp(q, g) as follows:

  supp(q, g) ≤ supp(q).    (1)

Then, we construct a lower bound for supp(q, g) from Definition 1. Given φ(q, g) ≥ θ, we have the following inequality:

  supp(q, g) ≥ f(supp(g)),    (2)

where

  f(supp(g)) = θ·sqrt(supp(q)·supp(g)·(1 − supp(q))·(1 − supp(g))) + supp(q)·supp(g).

The lower bound of supp(q, g) stated in Inequality (2) cannot be used directly, since it is a function of supp(g), where g is exactly what we try to obtain using supp(q, g). However, since we have obtained the range of supp(g), i.e., [lower(g), upper(g)] as stated in Lemma 1, we now show that this range can be used in Inequality (2) to obtain the lower bound of supp(q, g).
By investigating the properties of the function f, we find that f is monotonically increasing with supp(g) in the range [lower(g), upper(g)]. Therefore, by substituting supp(g) with lower(g) in Inequality (2), we obtain the lower bound of supp(q, g). We state and prove the bounds of supp(q, g) in the following lemma.

Lemma 3. If q and g are correlated, then the following bounds of supp(q, g) hold:

  supp(q) / (θ^(-2)·(1 − supp(q)) + supp(q)) ≤ supp(q, g) ≤ supp(q).

Proof. The upper bound follows from the definition of the joint support. To show that the lower bound holds, we need to prove that the function f is monotonically increasing within the bounds of supp(g) given in Lemma 1. This can be done by differentiating f with respect to supp(g):

  f′(supp(g)) = θ·supp(q)·(1 − supp(q))·(1 − 2·supp(g)) / (2·sqrt(supp(q)·supp(g)·(1 − supp(q))·(1 − supp(g)))) + supp(q).

Thus, we need to prove that, within [lower(g), upper(g)], f′(supp(g)) ≥ 0, or equivalently the following inequality:

  (1 − 2·supp(g)) / sqrt(supp(g)·(1 − supp(g))) ≥ −(2/θ)·sqrt(supp(q) / (1 − supp(q))).    (3)

First, if supp(g) ≤ upper(g) ≤ 0.5, then (1 − 2·supp(g)) ≥ 0 and hence f′(supp(g)) ≥ 0. Now we consider the case when upper(g) ≥ supp(g) > 0.5.
Since the left-hand side of Inequality (3) is less than 0, we square both sides of Inequality (3) and obtain:

  (1 − 2·supp(g))^2 / (supp(g)·(1 − supp(g))) ≤ 4·supp(q) / (θ^2·(1 − supp(q)))
  ⇔ a·(supp(g))^2 − a·supp(g) + θ^2·(1 − supp(q)) ≤ 0,    (4)

where a = 4·θ^2·(1 − supp(q)) + 4·supp(q).

The left-hand side of Inequality (4) is a quadratic function, which is monotonically increasing within the range [0.5, ∞). Since 0.5 < supp(g) ≤ upper(g), we replace supp(g) with upper(g) in this quadratic function:

  a·(upper(g))^2 − a·upper(g) + θ^2·(1 − supp(q))
  = θ^2·(1 − supp(q))·(−4·upper(g) + 1)
  < θ^2·(1 − supp(q))·(−4 × 0.5 + 1)    (since upper(g) > 0.5)
  < 0.

Therefore, when 0.5 < supp(g) ≤ upper(g), Inequality (4) holds and hence f′(supp(g)) ≥ 0. Thus, f is monotonically increasing within the range [lower(g), upper(g)]. By substituting supp(g) with lower(g) in Inequality (2), the lower bound of supp(q, g) thus follows:

  supp(q, g) ≥ f(supp(g)) ≥ f(supp(q) / (θ^(-2)·(1 − supp(q)) + supp(q))) = supp(q) / (θ^(-2)·(1 − supp(q)) + supp(q)).

We use lower(q, g) and upper(q, g) to denote the lower and upper bounds of supp(q, g) with respect to q, as given in Lemma 3.

With the results of Lemmas 2 and 3, we propose to generate the candidates by mining FGs from D_q using lower(q, g)/supp(q) as the minimum support threshold. A generated candidate set, C, is said to be complete with respect to q if ∀g ∈ base(A_q), g ∈ C. We establish the result of completeness in the following theorem.

Theorem 1. Let C be the set of FGs mined from D_q with the minimum support threshold lower(q, g)/supp(q). Then C is complete with respect to q.

Proof. Let g ∈ base(A_q). Since φ(q, g) ≥ θ, it follows by Lemma 3 that lower(q, g) ≤ supp(q, g) ≤ upper(q, g). Dividing all the expressions in the inequality by supp(q), we have lower(q, g)/supp(q) ≤ supp(q, g)/supp(q) ≤ 1. By Lemma 2, we have lower(q, g)/supp(q) ≤ supp(g; D_q) ≤ 1. The result g ∈ C follows, since C is the set of FGs mined from D_q using lower(q, g)/supp(q) as the minimum support threshold.

The result of Theorem 1 is significant, since it implies that we are now able to mine the set of candidate graphs
from a much smaller projected database D_q (compared with D) with a greater minimum support threshold lower(q, g)/supp(q) (compared with lower(g), which is equal to lower(q, g), as shown in Lemmas 1 and 3).

5. CORRELATED GRAPH SEARCH

In this section, we present our solution to the CGS problem. The framework of the solution consists of the following four steps.

1. Obtain the projected database D_q of q.
2. Mine the set of candidate graphs C from D_q, using lower(q, g)/supp(q) as the minimum support threshold.
3. Refine C by three heuristic rules.
4. For each candidate graph g ∈ C:
   (a) Obtain D_g.
   (b) Add (g, D_g) to A_q if φ(q, g) ≥ θ.

Step 1 obtains the projected database of q. This step can be performed efficiently using any existing graph indexing technique [26, 6] that can be used to obtain the projected database of a given graph. Step 2 mines the set of FGs from D_q using some existing FG mining algorithm [12, 25, 14]. The minimum support threshold is determined by Theorem 1. The set of FGs forms the candidate set, C. For each graph g ∈ C, the set of graphs in D_q that contain g is also obtained by the FG mining process. In Step 3, three heuristic rules are applied to C to further prune the graphs that are guaranteed to be false positives, as well as to identify the graphs that are guaranteed to be in the answer set. Finally, for each remaining graph g in C, Step 4(a) obtains D_g using the same indexing technique as in Step 1. Then Step 4(b) checks the correlation condition of g with respect to q to produce the answer set. Note that the joint support of q and g, which is needed for computing φ(q, g), is computed as supp(g; D_q)·supp(q) by Lemma 2.

In the remainder of this section, we present three heuristic rules and our algorithm, CGSearch, to solve the problem of CGS.

5.1 Heuristic Rules

To check whether each graph g in C is correlated with q, a query operation to obtain D_g is needed for each candidate (Step 4(a)). This step can be expensive if the candidate set is large. Thus, we develop three
heuristic rules to further refine the candidate set.

First, if we are able to identify the graphs that are guaranteed to be correlated with q before processing Step 4, we can save the cost of verifying the result. We achieve this goal by Heuristic 1.

Heuristic 1. Given a graph g, if g ∈ C and g ⊇ q, then g ∈ base(A_q).

Proof. Since g ⊇ q, we have supp(q, g) = supp(g). Moreover, since g ∈ C, we have supp(g, q; D_q) ≥ lower(q, g)/supp(q). By Lemma 2, we further have supp(q, g) ≥ lower(q, g). By replacing supp(q, g) with supp(g) in φ(q, g), we have

  φ(q, g) = sqrt((1 − supp(q)) / supp(q)) · sqrt(supp(g) / (1 − supp(g))).

Now, φ is monotonically increasing with supp(g), and supp(g) = supp(q, g) ≥ lower(q, g). We replace supp(g) with its lower bound lower(q, g) = supp(q) / (θ^(-2)·(1 − supp(q)) + supp(q)) in φ(q, g). Then we have the following:

  φ(q, g) ≥ sqrt((1 − supp(q)) / supp(q)) · sqrt(θ^2·supp(q) / (1 − supp(q))) = θ.

Therefore, g ∈ base(A_q).

Based on Heuristic 1, if we find that a graph g in the candidate set is a supergraph of q, we can add (g, D_g) into the answer set without checking the correlation condition. In addition, since g is a supergraph of q, D_g can be obtained when g is mined from the projected database D_q.

We next seek to save the cost of unrewarding query operations by pruning those candidate graphs that are guaranteed to be uncorrelated with q. For this purpose, we develop the following two heuristic rules. Before introducing Heuristic 2, we establish the following lemma, which describes a useful property of the function φ.

Lemma 4. If both supp(q) and supp(q, g) are fixed, then φ(q, g) is monotonically decreasing with supp(g).

Proof. Since both supp(q) and supp(q, g) are fixed, we first simplify φ for clarity of presentation. Let x = supp(g), a = supp(q, g), b = supp(q), and c = supp(q)·(1 − supp(q)). Then we have

  φ(x) = (a − b·x) / sqrt(c·x·(1 − x)).

The derivative of φ at x is given as follows:

  φ′(x) = (1/sqrt(c)) · ((2a − b)·x − a) / (2·x·(1 − x)·sqrt(x·(1 − x))).

Since 0 ≤ x ≤ 1, we have x·(1 − x) ≥ 0. Thus, the sign of φ′(x) depends on the sign of ((2a − b)·x − a). Since ((2a − b)·x − a) is a
linear function, we can derive its extreme values by substituting 0 and 1 for x in the function. The two extreme values of ((2a − b)·x − a) are (−a) and (a − b), both of which are non-positive, since a ≥ 0 and a ≤ b. Therefore, we have ((2a − b)·x − a) ≤ 0 and φ′(x) ≤ 0. It follows that φ(q, g) is monotonically decreasing with supp(g).

Heuristic 2. Given two graphs g1 and g2, where g1 ⊇ g2 and supp(g1, q) = supp(g2, q), if g1 ∉ base(A_q), then g2 ∉ base(A_q).

Proof. Since g1 ⊇ g2, we have supp(g1) ≤ supp(g2). Since supp(g1, q) = supp(g2, q) and supp(q) is fixed, by Lemma 4 we have φ(q, g1) ≥ φ(q, g2). Since g1 ∉ base(A_q), we have φ(q, g1) < θ. Therefore, φ(q, g2) ≤ φ(q, g1) < θ. Thus, we have g2 ∉ base(A_q).

By Lemma 2, if supp(g1, q) = supp(g2, q), then supp(g1; D_q) = supp(g2; D_q). Thus, Heuristic 2 can be applied as follows: if we find that a graph g is uncorrelated with q, we can prune all the subgraphs of g that have the same support as g in D_q.

We now use the function f again to present the third heuristic:

  f(supp(g1)) = θ·sqrt(supp(q)·(1 − supp(q))·supp(g1)·(1 − supp(g1))) + supp(q)·supp(g1).

Heuristic 3. Given two graphs g1 and g2, where g1 ⊇ g2, if supp(g2, q) < f(supp(g1)), then g2 ∉ base(A_q).

Proof. Since g1 ⊇ g2, we have supp(g1) ≤ supp(g2). By Lemma 1, the necessary condition for φ(q, g2) ≥ θ is that supp(g2) falls within the range [lower(g2), upper(g2)]. As shown in the proof of Lemma 3, the function f is monotonically increasing within the range [lower(g2), upper(g2)]. Therefore, we have supp(g2, q) < f(supp(g1)) ≤ f(supp(g2)). By replacing supp(g2, q) with f(supp(g2)) in φ(q, g2), we have the following derivation:

  φ(q, g2) < (f(supp(g2)) − supp(q)·supp(g2)) / sqrt(supp(q)·supp(g2)·(1 − supp(q))·(1 − supp(g2)))
  = θ·sqrt(supp(q)·supp(g2)·(1 − supp(q))·(1 − supp(g2))) / sqrt(supp(q)·supp(g2)·(1 − supp(q))·(1 − supp(g2)))
  = θ.

Therefore, we have g2 ∉ base(A_q).

Note that supp(g2, q) < f(supp(g1)) also implies g1 ∉ base(A_q). This is because g1 ⊇ g2 implies supp(g1, q) ≤ supp(g2, q). Therefore, we have supp(g1, q) < f(supp(g1)). Similarly, by replacing supp(g1, q) with f(supp(g1)) in φ(q, g1), we obtain φ(q, g1) < θ and thus g1 ∉ base(A_q).

By Lemma 2, we have supp(g2, q) = supp(g2; D_q)·supp(q).
Thus, if supp(g2, q) < f(supp(g1)), then supp(g2; D_q) < f(supp(g1))/supp(q). Heuristic 3 can therefore be applied as follows: if we find that a graph g is uncorrelated with q, we can prune all the subgraphs of g whose support in D_q is less than f(supp(g))/supp(q).

5.2 CGSearch Algorithm

Now we present the CGSearch algorithm. As shown in Algorithm 1, after we obtain the candidate set C from the projected database D_q (Lines 1-2), we process each candidate graph in C in descending order of graph size. Lines 4-5 apply Heuristic 1 to include the supergraphs of q in C directly in the answer set without performing the query operation (as in Line 7). For the other graphs in C, if they are verified to be correlated with q, we include them in the answer set (Lines 8-9); otherwise, Heuristic 2 (Lines 11-12) and Heuristic 3 (Lines 13-14) are applied to further reduce the search space, so that the unrewarding query costs for false positives are saved.

Algorithm 1 CGSearch
Input: A graph database D, a query graph q, and a correlation threshold θ.
Output: The answer set A_q.
1.  Obtain D_q;
2.  Mine FGs from D_q using lower(q, g)/supp(q) as the minimum support threshold and add the FGs to C;
3.  for each graph g ∈ C in size-descending order do
4.    if (g ⊇ q)
5.      Add (g, D_g) to A_q;
6.    else
7.      Obtain D_g;
8.      if (φ(q, g) ≥ θ)
9.        Add (g, D_g) to A_q;
10.     else
11.       H2 ← {g′ ∈ C : g′ ⊆ g, supp(g′; D_q) = supp(g; D_q)};
12.       C ← C − H2;
13.       H3 ← {g′ ∈ C : g′ ⊆ g, supp(g′; D_q) < f(supp(g))/supp(q)};
14.       C ← C − H3;

We now prove the soundness and completeness of the result returned by the CGSearch algorithm. In other words, we prove that CGSearch precisely returns A_q with respect to a given q.

Theorem 2. The answer set A_q returned by Algorithm 1 is sound and complete with respect to q.

Proof. We first prove the soundness. ∀(g, D_g) ∈ A_q, (g, D_g) is added to A_q in either Line 5 or Line 9. For the case of Line 5, we have proved in Heuristic 1 that g is correlated with q, while for the case of Line 9, the soundness is guaranteed by the condition in Line 8. Thus, the soundness of A_q follows. It remains to show the completeness. By
Theorem 1, the candidate set C produced in Line 2 of Algorithm 1 is complete. ∀g ∈ C, if g is not included in A_q, then either φ(q, g) is checked to be less than θ (Line 10), or g is pruned by Heuristic 2 or 3 (Lines 11-14). In all cases, g is proved to be uncorrelated with q and thus is not in A_q. Therefore, the completeness of A_q follows.

Example 2. Consider the graph database in Figure 1 and the query q in Figure 2(a). Let θ = 0.6. CGSearch (Line 1) first obtains D_q = {g1, g2, g3, g4}. Thus, we have supp(q) = 0.4 and lower(q, g) = 0.19. Then, CGSearch (Line 2) mines FGs from D_q using 0.19/0.4 = 0.475 as the minimum support threshold and obtains 9 candidates, which are shown in Figure 2(b). The number following ":" in the figure is the support of each candidate in D_q. Since the candidates are sorted in descending order of their size, CGSearch first processes c1. Since c1 is a super-
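The four-step framework above can be sketched end to end. The code below is a simplification I am supplying, not the authors' implementation: to stay self-contained it models each "graph" as a frozenset of labels, so subgraph containment reduces to subset testing; the candidate "FG mining" is brute-force subset enumeration rather than a real frequent-graph miner; and the heuristic pruning of Algorithm 1 is omitted. The name cg_search and the toy database are hypothetical.

```python
import math
from itertools import combinations

def phi(joint, sq, sg):
    """Pearson's correlation coefficient from joint and marginal supports."""
    if sq in (0.0, 1.0) or sg in (0.0, 1.0):
        return 0.0
    return (joint - sq * sg) / math.sqrt(sq * sg * (1 - sq) * (1 - sg))

def cg_search(db, q, theta):
    """Simplified CGSearch over itemsets (subset stands in for 'subgraph')."""
    n = len(db)
    dq = [g for g in db if q <= g]              # Step 1: projected database D_q
    if not dq:
        return {}
    supp_q = len(dq) / n
    lower_qg = supp_q / (theta ** -2 * (1 - supp_q) + supp_q)
    min_sup = lower_qg / supp_q                 # Theorem 1's threshold on D_q
    # Step 2: brute-force "FG mining" from D_q (a real system would use an
    # FG mining algorithm such as gSpan here)
    items = sorted(set().union(*dq))
    cands = set()
    for k in range(1, len(items) + 1):
        for sub in combinations(items, k):
            c = frozenset(sub)
            if sum(c <= g for g in dq) / len(dq) >= min_sup:
                cands.add(c)
    # Step 4: verify each candidate against the whole database
    answers = {}
    for c in cands:
        dg = [g for g in db if c <= g]
        supp_g = len(dg) / n
        joint = sum(c <= g for g in dq) / n     # supp(g; D_q) * supp(q), Lemma 2
        if phi(joint, supp_q, supp_g) >= theta:
            answers[c] = dg
    return answers

# toy database of ten "graphs" (itemsets), loosely echoing Figure 1's labels
db = [frozenset(s) for s in ("ab", "abc", "ac", "ab", "b",
                             "ac", "ac", "c", "a", "a")]
res = cg_search(db, frozenset("ab"), 0.6)
print(frozenset("ab") in res)   # True
```

On this toy data the query {a, b} (support 0.3) is found to correlate with {b} and with itself, while the ubiquitous {a} is filtered out by the θ check, which mirrors the intuition that very frequent graphs correlate weakly.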
Pipeline-Based Multi-Query Optimization for Similarity Queries in a Grid Environment
ISSN 1000-9825, CODEN RUXUEW E-mail: jos@Journal of Software, Vol.21, No.1, January 2010, pp.55−67 doi: 10.3724/SP.J.1001.2010.03665 Tel/Fax: +86-10-62562563© by Institute of Software, the Chinese Academy of Sciences. All rights reserved.∗网格环境下基于流水线的多重相似查询优化胡华1,2, 庄毅2+, 胡海洋1,2, 赵格华31(杭州电子科技大学计算机学院,浙江杭州 310018)2(浙江工商大学计算机与信息工程学院,浙江杭州 310018)3(香港中文大学计算机科学与工程系,香港)Pipeline-Based Multi-Query Optimization for Similarity Queries in Grid EnvironmentHU Hua1,2, ZHUANG Yi2+, HU Hai-Yang1,2, Dickson CHIU31(College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China)2(College of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China)3(Department of Computer Science and Engineering, Chinese University of Hong Kong, China)+ Corresponding author: E-mail: zhuang@Hu H, Zhuang Y, Hu HY, Chiu D. Pipeline-Based multi-query optimization for similarity queries in gridenvironment Journal of Software, 2010,21(1):55−67. /1000-9825/3665.htmAbstract: This paper proposes a multi-query optimization algorithm for pipeline-based distributed similarityquery processing (pGMSQ) in grid environment. First, when a number of query requests are simultaneouslysubmitted by users, a cost-based dynamic query clustering (DQC) is invoked to quickly and effectively identify thecorrelation among the query spheres (requests). Then, index-support vector set reduction is performed at data nodelevel in parallel. Finally, refinement of the candidate vectors is conducted to get the answer set at the execution nodelevel. 
By adopting pipeline-based technique, this algorithm is experimentally proved to be efficient and effective inminimizing the response time by decreasing network transfer cost and increasing the throughput.Key words: grid; multi-query optimization; high-dimensional indexing; data partition摘要: 提出一种网格环境下基于流水线技术的分布式多重相似查询的优化算法(pipeline-based distributedsimilarity query processing,简称pGMSQ).首先,当用户提交若干个查询请求时,采用基于代价的动态层次聚类策略(dynamic query clustering,简称DQC)对其进行合并.然后在数据结点层,采用索引支持的向量集缩减方法快速过滤无关向量.最后,在执行结点层对候选向量执行求精操作返回结果向量.由于本查询采用了流水线技术,实验结果表明,该方法在提高查询性能的同时也提高了系统的吞吐量.∗ Supported by the National Natural Science Foundation of China under Grant Nos.60873022, 60903053 (国家自然科学基金); theZhejiang Provincal Natural Science Foundation of China under Grant Nos.Y1080148, Y1090165 (浙江省自然科学基金); the KeyProgram of Science and Technology of Zhejiang Province of China under Grant No.2008C13082 (浙江省科技厅重大科技项目); the KeyProject of Special Foundation for Young Scholars in Zhejiang Gongshang University of China under Grant No.Q09-7 (浙江工商大学青年人才基金重点资助项目); the Open Fund Provided by State Key Laboratory for Novel Software Technology of Nanjing University ofChina (南京大学计算机软件新技术国家重点实验室开放基金)Received 2008-06-05; Revised 2008-11-27; Accepted 2009-02-2456Journal of Software 软件学报 V ol.21, No.1, January 2010关键词: 网格;多重查询优化;高维索引;数据分片 中图法分类号: TP311 文献标识码: AInternet 和多媒体技术的飞速发展使得查询密集(并发用户查询)条件下基于内容的海量多媒体信息检索[1]的研究成为一个热点问题.为了有效地提高其查询性能,需要应用高维索引技术.然而传统的高维索引方法都是从数据本身来设计索引结构[2−6].较少有研究从对用户查询请求的优化来提高查询性能.如图1所示,假设用户提交3个查询请求(query request,简称QR),即Q 1,Q 2和Q 3,可以看出,Q 1和Q 2存在一些相关性(相交),这意味着它们具有公共的查询区域(如图1中阴影部分所示).通过合并Q 1和Q 2使得原来的3次查询能够以批量的方式快速完成(即2次),提高其总体查询性能.这种对独立的查询序列发现它们之间的相关性并且用于查询的优化称为多重查询优化(multi-query optimization)[1],即从用户查询的角度来提高查询性能.网格环境为用户提供了一个统一的并行分布式计算平台[7].为了充分利用网格强大的并行计算能力,突出数据网格资源共享的特点,提出一种面向网格环境的基于流水线技术的分布式相似查询的多重优化方法——pGMSQ,以显著提高分布式相似查询效率及吞吐量.与传统分布式系统不同,网格中的结点呈现异构(处理能力不同等)特点且需要通过网格资源管理系统(grid resource management system,简称GRMS)动态来发现.pGMSQ 面临的技术挑战表现在以下两个方面:1) 快速、有效地发现不同查询请求的相关性:对于若干查询请求,如何快速、有效地对其进行聚类不是一个简单的问题.2) 
有效的负载平衡:对于绝大多数的并行数据库系统,设计一种高效的负载均衡方法对于提高查询的并行性非常重要,特别是对于异构的网格环境.为了应对上述挑战,pGMSQ 算法包括3种支撑技术,包括动态查询聚类策略,基于中心环(centroid ring)的负载均衡机制和索引支持的向量集缩减.图2为网格环境下的多重相似查询的框架.假设用户提交一批查询(即,m 个查询向量及其半径)到查询结点N q ,pGMSQ 算法的基本思想如下:首先,利用基于代价的动态查询聚类算法对提交的查询在查询结点层进行快速合并.然后将新得到的查询类发送至数据结点层,进行基于iDistance [8]索引的不相关高维向量的快速过滤处理.之后,再将得到的候选向量发送至执行结点层并行地进行求精(距离计算)操作得到结果向量,其中该层的结点是通过GRMS 动态发现的.最后将结果向量返回查询结点N q ,这些方法的目的是减少网络传输的代价以及提高查询的并行性.不失一般性,我们将欧式距离作为本文的相似距离度量.Fig.1 Intersection part of query spheres图1 查询超球相交部分Fig.2 Topology of grid图2 网格的拓扑结构Query NodeL evelData NodeL evelqN 2e N 1dN 3dN d N α本文第1节介绍相关工作.第2节为预备工作.第3节给出3种支持pGMSQ 的支撑技术,包括基于代价的动态查询层次聚类算法、自适应负载均衡及索引支持的向量集缩减.第4节提出网格环境下的基于流水线技术的多重相似查询优化算法.第5节通过实验验证算法的有效性.第6节总结本文.E xecuting Node L evel eN β2e N 1e N Data node level N q Query node level1d N 3d NdN α 2dN 2eN e N β1eN Executing node level胡华 等:网格环境下基于流水线的多重相似查询优化 571 相关工作本文工作与多维(高维)索引相关.目前高维索引分成两类:集中式高维索引[2]和分布式高维索引[5,9,10].集中式多(或高)维索引在近20年来一直受到学术界广泛的研究[2].在这些索引方法中,数据空间被划分成多个小的区域,其中数据子空间或重叠[3]或独立[4].近年来,相继提出了一些新的索引方法,如基于向量压缩的索引(如VA-file [11])及基于距离的高维索引(如iDistance [8]).其中,在iDistance 中,首先对高维数据进行聚类,再计算每个对象与其质心的距离并用B+树索引.从而将高维查询转换为一维范围查找.以上索引都是针对集中式查询设计的.Berchtold 等人[9]提出了一种基于近似的最优数据项分布的、快速的并行相似检索.之后,文献[10]提出一种基于磁盘阵列的并行相似检索.最近,基于Peer-to-Peer(P2P)的相似检索越来越受到关注.CAN [5]是第一个支持多维数据检索的P2P 系统.文献[12]采用空间填充曲线将多维数据映射到一维空间.pSearch [13]是一个基于CAN 的文档检索的P2P 系统.另一个由Sahin 等人[14]开发的系统也是基于哈希函数的CAN 系统,然而,精确的查询在该系统中是低效的.SkipIndex [15]和文献[16]提出的查询算法支持在P2P 网络的相似查询.上述分布式索引都集中在数据本身,很少考虑到从查询请求的角度来提高查询效率.在数据网格研究领域,美国和欧洲等国已经进行了广泛而深入的研究,并且推出了一些实验系统,其中最著名的是欧洲数据网格项目[17]、美国的国际虚拟数据网格实验室IVDGL(International Virtual Data Grid Laboratory)项目[18]等.最著名的数据网格系统工具是Globus 中的数据网格支撑模块和SDSC(San Diego Supercomputer Centre)的SRB(storage resource 
broker)系统.虽然目前对网格环境下的传统数据库查询进行了一定的研究[7,19],但是还很少有文献研究基于网格的多重相似查询优化.与传统分布式计算环境不同[20],网格环境下各结点高度自治并且异构;所处理的数据一般都是海量数据;各结点之间的连接带宽不同,其传输速度可能会有很大的差异;网络环境不稳定,经常会出现结点之间连接不上以及连接中断的情况,这些都为基于网格环境的多重相似查询提出了新的要求.多重查询优化在传统数据库查询中已进行了广泛的研究[1].一般来说,数据库管理系统(DBMS)经常会执行一组相关的查询请求,这些相关查询请求包含一些共同的查询部分(子查询表达式).多种查询优化通过发现查询计划中的公共部分来进行,包括消除共同的子查询表达式、重新排序查询计划以及使用物化视图等.2 预备工作本文采用符号见表1.Table 1 Meaning of symbols used表1 符号表Symbol Meaning SymbolMeaningΩVector set d (V i ,V j )Similarity distanc V i The i-th vector and V i ∈Ω Ω′ Candidate vector set D Dimensionality Ω′′ Answer vector set N The number of vectors in Ω α The number of data nodes jq V The j-th query vector β The number of executing nodesΘ(,r j q V j ) The j -th query sphere, j ∈[1,m ] Vol (⋅) The volume of (⋅)Θ(,R j Q V j )The j -th query cluster, j ∈[1,m ′] and m ′<m定义1(网格形式化描述). 网格(G )可看作一张图,由结点(node)和边(edge)构成,形式化表示为G =(N ,E ),其中N 为结点集,E 为边,表示结点间数据传输的网络带宽.定义2(网格结点). 网格(G )中的结点(N )分为3种类型:查询结点(N q )、数据结点(N d )和执行结点(N e ),形式化表示为N =N q +N d +N e ,其中N q 由1个结点构成,N d 包含α个数据结点(),N d iN e 包含β个执行结点(e jN ),其中为第i 个数据结点且i ∈[1,α],d iN e jN 为第j 个执行结点且j ∈[1,β](如图2所示).58 Journal of Software 软件学报 V ol.21, No.1, January 2010根据定义2,在该网格中,查询结点N q 负责提交用户的查询请求和接受返回结果(Ω″),同时对查询进行动态批量聚类与排序.α个数据结点N d 用于存储高维向量(Ω)及其索引,向量集缩减在该结点层面并行完成,返回候选向量集(Ω′);求精过程(距离计算)在β个执行结点N e 上并行完成,返回结果向量集(Ω′′).不失一般性,假设向量在高维空间均匀分布.定义3(最小包围超球). 给定m 个查询超球:Θ(,r j q V j ),它们对应的最小包围超球(minimal bounded sphere,简称MBS)表示为MBS()()(11,,mmjjq j q j j j V r V r ==Θ⊇Θ∪∪),使得()()1,mj q j j Vol MBS V r =⎛⎞Θ⎜⎝⎠∪⎟最小,其中j ∈[1,m ]. 如图3所示,给定两个查询超球:和,它们对应的MBS 同样是一个超球,表示为MBS (,(,)i q i r V Θ(,)j q j V r Θ(,)i q i V r Θ(,)j j q V r Θ)=Θ(Q x V ,R x ).Θ(Q x V ,R x )对应的中心(Q xV )和半径表示为(,)2(,)j i q q i j x i q Q j i q q d V V r r d V V V V −++××=或()jiq qV V −(,)(2(,)i j q q j i )x ji jV Qq q q i j q q d V V r r V V d V V −+=+×−×V ,半径R x ,2()i j q q i j d V V r r +=+. 定义4(最大内切超球). 
给定两个超球:Θ(V Q ,R )和Θ(V q ,r ),其对应的最大内切超球(maximal inner tangentsphere,简称MITS)表示为一个超球Θ(V x ,R x ),它包含在这两个超球的相交部分,其中(,)2Q q x r d V V R R −+=,V x = (,)()2(,)Q q Q q Q q d d V V R r V V V V V +×−+×−q 或()(,)2,()Q Q q x q Q q d R rV V V d V V V V V −−=×+×Q +.定理1. 给定两个查询超球:和,其对应的最小包围超球(MBS)表示为Θ((,)i q i r V Θ(,)j q j r V Θx Q V ,R x ),当满足: ((,)(,))12((,))Q i j q i q j x x Vol V r V r Vol V R ΘΘΘ∪>时,则得到:2()()if (,)2()()if (,)j j D D i D i i j q q i j q q i j j D D D i i j i j i j q q d d d r +r >V ,V r +r >V V >r r r +r >r +r r +r V V⎧×−⎨×≤⎩. 证明:如图5所示,根据半径及与中心间的距离可分成3种情况.为了描述方便,令A 为Θ(,r i q V i ),B 为Θ(,r j q V j ),C 为Θ(xQ V ,R x ),且r i >r j ,则按照两个查询超球中向量总数是否大于阴影部分中的向量个数,存在以下3 种情况:(i) A 包含B .如图5(a)所示,即r i >d (,)+r i q V j q V j ,则()()11()()2Vol A B Vol A Vol C Vol A ∪==>.(ii) A 与B 相交.如图5(b)所示,即r i +r j >d (,)≥r i q V j q V i −r j ,令()()()()1()()2Vol A B Vol A Vol B Vol A B Vol C Vol C ∪+−∩=>,则222()(21)(21)12(,)2(21)D D D D j i Dj i i j q q D r r Vol A B D D r r d V V D ΓΓΓ×π×π+−∩++>⎛⎞++×π⎜⎟⎝⎠+.由于任意两个高维超球A 和B 的交集部分体积Vol (A ∩B )的计算代价非常高[21],因此采用A 和B 的MITS 的体积来近似表示.如图4所示,给定两个相交的超球:Θ(V Q ,R )和Θ(V q ,r ),其对应的MITS 可用一个内切阴影圆来表示,其半径x =(,)2Q q r d V V R−+,则Vol (Θ(V Q ,R )∩Θ(V q ,r ))≈Vol (MITS (Θ(V Q ,R ),Θ(V q ,r )))=22(,)2(21)(21)DQ q D D D r d V V R x D D ΓΓ−+⎛⎞×π⎜⎟×π⎝⎠=++. 
另外,因为Vol (A ∩B )≈Vol (MITS (A ,B )),则2(,)2()(21)Di j i j q q D r r d V V Vol A B D Γ⎛⎞+−×π⎜⎟⎝⎠∩≈+.合并上式,则D D i j r r +−(,)122Dji i j q q r r d V V ⎛⎞++×⎜⎟⎝⎠(,)2D j i i j q q r r d V V ⎛⎞+−>⎜⎝⎠⎟.由于r i +r j >d (,),则iq V j q V (,)1122Dji i j q q r r d V V ⎛⎞+−2×>×⎜⎟⎝⎠胡华 等:网格环境下基于流水线的多重相似查询优化 59(,)j iqq Dd V V .另外,由于(,)02Dj i i j q q r r d V V ⎛⎞+−>⎜⎝⎠⎟,所以得到2×(,)()j D D iq q i j d V V r r +>D .Fig.3 Corresponding MBS of the two spheres图3 两个超球对应的MBSFig.4 Corresponding MITS of the two spheres图4i q V j q V ir jr(a) r i >d (V ,V )+r i q jq j(b) r i +r j >d (V ,V )≥r i q jq i −r j(c) r i +r j ≤d (V ,V )i q j q Fig.5 Three cases for the query clustering图5 查询聚类的3种情况iq V jq V ir jr (iii) B 与A 相外切或B 不与A 相交,如图5(c)所示,即r i +r j ≤d (V ,),令i q j q V ()()()1()()2Vol A B Vol A Vol B Vol C Vol C ∪+=>,则有222(21)(21)12(,)2(21)D D D D j i D j i i j q q D r r D D r r d V V D ΓΓΓ×π×π+++>⎛⎞++×π⎜⎟⎝⎠+,得到(,)2()2D j i i j q q D D i j r r d V V r r ⎛⎞++×+>⎜⎟⎝⎠.由于r i +r j ≤d (V ,V ),则 iq j q 2(D i r ×+)()D D i j j r r r >+.基于以上分析,得到2()(,),if (,).2()(),if (,)j j D D i D i j q q i j q q i i j D D D ii j i j i j q q d d d r +r >V Vr +r >V V >r r r +r >r +r r +r V V ⎧j×−⎪⎨×≤⎪⎩□3 支撑技术为了更好地支持网格环境下的分布式相似查询的多重优化(pGMSQ)处理,提出3种支撑技术,包括动态查询聚类算法、自适应负载均衡及索引支持的向量集缩减.这些方法的目的是降低网络传输的代价并提高查询的并行性.3.1 动态查询层次聚类算法在某个时段,对于用户提交的一批查询请求,较难对这些查询进行快速而有效的聚类.为此,提出一种动态查询层次聚类算法(dynamic query clustering scheme,简称DQC),得到新的查询类.该查询请求集可形式化表示iq V jq V ir jr x Q V xR Query clusterx QV R xQuery cluster60 Journal of Software 软件学报 V ol.21, No.1, January 2010)为QSet ::=〈Q 1,Q 2,…,Q m 〉 (1) 其中Q j =,是指第j 个用户提交的查询请求,j ∈[1,m ].(j q j r V Θ需要注意的是,DQC 算法的第3行是基于定理1的.也就是说,对于任意两个超球,当满足定理1的结论时,则可将两者合并成一个新的查询超球,称为查询类.算法1. Dynamic query clustering(DQC) algorithm. 输入:m query spheres; 输出:m ′ query clusters.1. while (TRUE)2. for any two hyperspheres A and B do3. if Theorem.(1) is satisfied then4. A and B are merged;5. update the query list;6. m ←m −1;7. end if8. end for9. end while /* the value of m has been reduced to m ′ and m ′<m*/ 10. 
return m ′(updated m ) query clusters; 3.2 自适应数据负载均衡3.2.1 基于“查询投票”的结点处理能力估计对于多数并行数据库系统来说,数据分片对查询性能的影响非常关键.如第1节所述,网格是一个异构网络环境,其中每个结点的处理能力(如磁盘存储能力、磁盘转速及数据传输率)都不相同.因此,较难对每个结点(如数据结点或执行结点)的综合处理能力进行准确的模型建立.因此,作为数据分片的预处理阶段,对于每个结点的处理能力的估计对优化的数据分片非常重要.本节提出查询投票(query voting,简称QV)的方法.在该方法中,每个结点的处理能力可以通过来自不同用户查询请求得到的平均查询时间来度量,如图6所示.给定m 个用户和x 个结点,令T ij 为来自第j 个用户查询请求的在第i 个结点上的响应时间,其中i ∈[1,x ]且j ∈[1,m ].这样,对于m 个用户和x 个结点,可以得到一张用户时间表(user-time,简称UT),见式(2):111212122212............m m xm x x T T T T T T UT T T T ⎡⎤⎢⎥⎢⎥=⎢⎥⎢⎥⎢⎥⎣⎦ (2)对于第i 个结点,其处理能力(记作ρi )与其平均查询时间11mij j T m=∑成反比,见式(3): 1111ijij i mmj j mT T m ρ==∝=∑∑ (3)也就是说,平均响应时间可以通过ρi 的函数来表示,见式(4):111)(ij ij i m m j j mT T m f ρ====∑∑ (4) 其中,i ∈[1,x ]且j ∈[1,m ].基于上面的推导,对于第i 个结点来说,其处理能力所占比率可以表示为胡华 等:网格环境下基于流水线的多重相似查询优化 6112=1=1=1=11)111 (i)jjmj mmmj j j T i T T Per =+++∑∑∑∑xjT (5)算法2. Query voting algorithm.输入: Ω: Vector set, x nodes, m users;输出: ρi : The processing capabilities of x nodes.1. for i :=1 to x do /*对每个结点来说*/2. for j :=1 to m do /*对每个用户来说*/3. submit the j -th user query from i -th node;4. the query executing time(T ij ) is recorded ;5. end for6. the processing capability (ρi ) of the i -th node is obtained by Eq.(4);7.end forP 3P 2P 1PmP 4P 5Fig.6 A query polling example图6 查询投票示例Fig.7 Centroid rings intersected with query sphere图7 与查询超球相交的中心qV cV3.2.2 基于中心环的自适应数据分布策略为了使数据结点层的向量集缩减的并行度最大化,提出一种基于中心环(centroid-ring)的自适应负载均衡策略,这样可以保证向量随机地分布在不同的数据结点上.具体来说,该方法可以保证每个结点上几乎都存在一个类超球与查询超球相交.这样,对于每个查询,向量集缩减可以在每个结点并行执行.定义5(中心距离,centroid distance ). 给定一个向量V i ,其中心距离为其到中心点V c 的距离,记作CD (V i )=d (V i ,V c ),其中V c =(0.5,0.5,…,0.5).假设将高维空间按照中心距离均匀地“切”成α个“分片”,如图7所示.该分片定义为中心环.随着中心距离的增加,中心环编号也随之增加.定义6(中心环,centroid ring ). 给定一个向量V i ,其对应中心环(记作R (V i ))的编号为2(i CD V α⎡⋅+⎢1 .对于不同数据结点的向量数据分布(如算法3),将每个中心环中的向量按照数据结点处理能力的大小随机取出.这样可以保证几乎在每个数据结点上,查询超球都能与类超球∗∗相交.算法3. 
Vector allocation algorithm.∗∗由于每个结点上的向量集采用iDistance [6]来建立索引.而在建立iDistance 索引之前,需要对高维向量聚类,得到若干类超球.62 Journal of Software 软件学报 V ol.21, No.1, January 2010输入: Ω: The vector set, α data nodes;输出: Ω(1 to α): The vector set stored in the α data nodes.The high-dimensional spaces is equally sliced into α centroid rings; for each data node doj d N 3. randomly select |per (j )×n | vectors (Ω(j )) from α centroid rings; 4. the vectors Ω(j ) are stored in the j -th data node; 5. end for3.3 索引支持的快速向量集缩减由于向量集(Ω)存储在数据结点层,对于一个查询,直接在数据结点层执行向量的求精(距离计算)操作显然是低效的.因此,提出iDistance [8]索引支持的向量集缩减(VSR)方法.该方法的目的是减少求精过程的CPU 计算量和网络传输代价.算法4给出第j 个数据结点的向量集缩减.其中,函数RSearch (V p ,r )表示返回中心为V q 半径为r 的范围查询.算法4. Vector set reduction algorithm (VReduce).输入: V q : Query vector, r : Query radius, Ω(j ): The vector set in the j -th data node and j ∈[1,α]; 输出: Ω′(j ): The candidate vector set in the j -th data node.1. S 1←∅, S 2←∅; /*初始化*/2. for each cluster sphere Θ(O j ,CR j ) do3. if Θ(O j ,CR j ) dose not intersect with Θ(V q ,r ) then4. break;5. else6. S 2←RSearch (V q ,r );7. S 1←S 1∪S 2;8. if Θ(O j ,CR j ) contains Θ(V q ,r ) then end loop; 9. end if 10. end for11. return S 1; /*返回候选向量*/4 pGMSQ 算法基于上述3种支撑技术,我们提出一种基于网格并行流水线的分布式多重相似查询优化策略——pGMSQ.在介绍该算法之前,首先假设Ω中的向量数据已经按照中心环部署在数据结点层并且在每个数据结点分别采用iDistance [8]进行索引,用以支持快速的向量集缩减.pGMSQ 算法分为如下3步:(1) 动态查询聚类.作为pGMSQ 的第1步,首先用户提交m 个查询请求到查询结点N q ,然后执行动态查询聚类返回m ′个新的查询类作为新的查询请求,其中m ′<m .具体步骤如算法1所示.(2) 全局向量集缩减.完成查询结点层的动态查询调度后,将m ′个新的查询请求()打包并行地发(,)i Q i V R Θ送到α个数据结点.在数据结点层并行执行索引支持的向量集缩减(参见第3.3节).具体来说,对于每个数据结点j d N ,首先建立输入缓存IB j 用于保存Ω(j )中的向量,然后建立一个输出缓存OB j 用于存储候选向量集Ω′(j ),其中,j ∈[1,α].一旦得到候选向量集Ω′(j ),就对其进行散列操作并发送到执行结点层进行求精操作. 算法5. Global vector set reduction algorithm (GVReduce).输入: Ω(j ): The vector set in the j -th data node, : m ′ new query cluster spheres, α: Number of the(,)iQ i V R Θdata nodes;输出: Ω′(1 to α): The candidate vector set.胡华 等:网格环境下基于流水线的多重相似查询优化 631. for j :=1 to α do /*对于α个数据结点来说*/2. for i :=1 to m ′ do3. Ω′(j )←Ω′(j )∪VReduce (,j ); ,iQi V R 4. 
end for5. Ω′(j ) is cached in the output buffer OB j ;6. end for在算法5中,函数VReduce (V q ,r ,j )(参见算法4)返回第j 个数据结点的候选向量集Ω′(j ).向量集缩减是在不同数据结点上并行执行的.(3) 求精操作.作为最后一步,通过GRMS 动态地发现β个空闲结点作为执行结点,并在其上并行地计算候 选向量与查询向量的距离.如算法6所述,当距离值小于或等于r i q V i 时,其中i ∈[1,m ],则该结果向量可以通过打 包方式发送到查询结点N q .与全局向量集缩减中采用新的m ′查询类作为查询超球不同,在求精过程中,是采用m 个原来的查询超球来执行距离计算操作,其中m ′<m .算法6. Refine algorithm.输入: Ω′(j ): The candidate vector set in the j -th data node, : m query vectors and m query radius r i q V i , β:Number of the executing nodes;输出: Ω″(1 to β): The answer vector set.1. for j :=1 to β do /*对于β个执行结点来说*/2. Ω″(j )←∅;3. for i :=1 to m do4. if d (V i ,)≤r i q V i and V i ∈Ω′(j ) then Ω″(j ←Ω″(j )∪V i ;5. end for6. Ω″(j ) is cached in the output buffer OB j ;7. end for在第j 个执行结点j e N ,建立一个输入缓存IB j 用于存储候选向量集Ω′(j ),同时为Ω′(j )分配在结点j e N 上的内 存空间M j ,用于暂存Ω′(j )的向量.另外,建立一个输出缓存OB j ,用于暂时缓存结果向量.一旦在OB j 中的结果向量集大小等于包的大小,则以打包方式发送回查询结点N q .算法7为pGMSQ 算法.其中函数GVReduce (Ω,m ′ query clusters)为全局向量缩减算法(参见算法5);函数Refine (Ω′,m query spheres)是对候选向量集Ω′进行求精操作,如算法6所示;步骤4~步骤7并行执行.算法7. The pGMSQ algorithm.输入: A query list containing several queries; 输出: Ω′′: Query result.1. while the query list is not empty do2. Ω′←∅, Ω′′←∅; /*初始化*/3. m queries are extracted from the query list and submitted to the query node N q ;4. The m queries are clustered in the query node; /*在查询结点层*/5. Ω′←GVReduce (Ω,m ′ query clusters,α); /*在数据结点层*/6. The β executing nodes are dynamically discovered by the GRMS; /*动态发现β个空闲结点作为执行结点*/7. Ω′′←Refine (Ω′,m query spheres,β); /*在执行结点层*/ 8. The query answer Ω′′ is returned to the query node N q ; 9. end while由于以上讨论的查询是相对静态的,当用户的查询请求持续不断地产生时,为了进一步提高系统吞吐量,则64 Journal of Software 软件学报 V ol.21, No.1, January 2010需采用流水线技术,即在查询结点、数据结点和执行结点上的操作并发执行,使得这3个结点层在每一时刻都能够同时工作.定义7(加速比). 
Given m query requests, the speedup is the ratio of the time for m executions of the grid-based similarity query (GSQ) to the time for the grid-based parallel-pipeline multiple query (pGMSQ), written Speedup = Σ_{i=1}^{m} TIME_GSQ / TIME_pGMSQ.

Let T_1 be the time for query submission and clustering, T_2 the time for vector set reduction, and T_3 the time for refinement. Depending on the relative sizes of T_1, T_2 and T_3, five cases arise, as shown in Fig.8. Given n batches of queries, the speedups are:

(1) When T_1 < T_2:
Speedup = lim_{n→∞} n(T_1+T_2+T_3) / (T_1+n·T_2+T_3) ≈ (T_1+T_2+T_3)/T_2 < 3.

(2) When T_1 > T_2 and T_1 < T_3:
Speedup = lim_{n→∞} n(T_1+T_2+T_3) / (T_1+T_2+n·T_3) ≈ (T_1+T_2+T_3)/T_3 < 3.

(3) When T_1 > T_2 and T_1 > T_3:
Speedup = lim_{n→∞} n(T_1+T_2+T_3) / (n·T_1+T_2+T_3) ≈ (T_1+T_2+T_3)/T_1 < 3.

(4) When T_1 = T_2 = T_3 = T:
Speedup = lim_{n→∞} 3nT / ((n+2)T) ≈ 3.

Fig.8 Five cases of pipeline-based parallel query (T_1: time for clustering queries; T_2: time for global vector set reduction; T_3: time for refinement)

From the analysis above, when T_1 = T_2 = T_3 each node layer works concurrently and without interfering with the others, and in theory the system achieves the best parallelism. In the experiments of Section 5.4 we tune the numbers of data nodes (α) and executing nodes (β) to obtain maximal pipeline parallelism, i.e., to satisfy T_1 = T_2 = T_3.

5 Experimental comparison
To validate the pGMSQ method, we built a LAN environment to simulate a grid, with executing nodes generated dynamically by a random function. The iDistance-based vector set reduction algorithm was implemented in C and deployed on every data node; a B+-tree serves as the one-dimensional index structure, with an index page size of 4096 bytes. Two data sets are used: (i) real data, the color histogram data from the UCI KDD Archive [22], containing 68 040 32-dimensional color histogram features with each dimension in the range 0 to 1; (ii) synthetic data, 5 000 000 randomly generated, uniformly distributed 100-dimensional vectors, each dimension also in the range 0 to 1. In the experiments, 100 user query requests are generated at random, and each experiment is run 100 times to obtain the average time.

5.1 Effect of VSR on queries
This experiment uses two methods to study the effect of vector set reduction (VSR) on pGMSQ query performance. Method 1 does not use the VSR algorithm: the vector set Ω is refined directly at the data-node layer N_d without vector set reduction, and the result set Ω″ is then sent back to the query node N_q. Method 2 uses the VSR algorithm: the vector set Ω is first reduced to the candidate set Ω′ by Algorithm 4; the vectors of Ω′ are then refined to obtain the result vectors; finally, the result set Ω″ is sent back to the query node N_q. As Fig.9 shows, as the data volume grows, query performance with VSR is far better than without, because the irrelevant vectors filtered out by iDistance [6] greatly reduce both the computation in the refinement stage and the network transfer cost.

5.2 Effect of DQC on queries
This experiment uses two methods to study the effect of dynamic query clustering (DQC) on query performance: Method 1 does not use the DQC algorithm, Method 2 does. Fig.10 shows that as the data volume grows, Method 2 with DQC performs far better than the method without it, and the performance gap widens, because DQC further improves the performance of multiple queries.

5.3 Effect of dimensionality on queries
This experiment studies the effect of dimensionality on the speedup, using the synthetic data as the test data, with vector dimensionality ranging from 20 to 100 and a data volume of 1 000 000. As Fig.11 shows, the speedup decreases slowly as the dimensionality rises, because higher dimensionality incurs a higher CPU cost during refinement and, for every result vector, a higher data transfer cost as well.
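As a concrete illustration of the pruning idea in Algorithm 4 (VReduce), here is a minimal Python sketch of candidate vector set reduction. It assumes plain Euclidean distance and represents each cluster hypersphere as a (center, radius, vectors) tuple in place of a real iDistance index; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def vreduce(query, r, clusters):
    """Sketch of VReduce-style candidate reduction: prune cluster spheres
    that cannot intersect the query sphere, range-search the survivors,
    and stop early when a cluster fully contains the query sphere.
    `clusters` is a list of (center, radius, vectors) tuples -- a toy
    stand-in for the iDistance index assumed by the paper."""
    candidates = []
    for center, cr, vectors in clusters:
        d = np.linalg.norm(center - query)
        if d > cr + r:          # spheres disjoint: whole cluster pruned
            continue
        # range search inside the cluster (brute force here; iDistance
        # would answer this via its one-dimensional B+-tree instead)
        for v in vectors:
            if np.linalg.norm(v - query) <= r:
                candidates.append(v)
        if d + r <= cr:         # query sphere fully contained: no other
            break               # cluster can hold answers, stop early
    return candidates
```

The containment test mirrors step 8 of Algorithm 4: once the query sphere lies entirely inside one cluster sphere, no further clusters need to be examined.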
5.4 Effect of α and β on pipeline-based parallel queries
This experiment again uses the second data set to study the effect of the number of data nodes (α) and the number of executing nodes (β) on the speedup of pipeline-based parallel queries. As Fig.12 shows, the speedup increases slowly as α grows, but decreases once the number of data nodes exceeds 40: with more data nodes, the transfer cost from the query node to the data-node layer grows and partly offsets the performance gain from vector set reduction. Similarly, Fig.13 shows that the speedup first increases slowly with β and then decreases once β exceeds 30: with more executing nodes, the transfer cost from the data-node layer to the executing-node layer grows and partly offsets the gain from parallel querying.

5.5 Effect of m on queries
This experiment uses the second data set to study the effect of the number of user query requests (m) on the speedup, assuming a data volume of 1 000 000, α = 40 data nodes, and β = 30 executing nodes. As Fig.14 shows, as m grows from 20 to 100 it has little effect on the speedup.

Fig.9 Effect of VSR on queries. Fig.10 Effect of DQC on queries. Fig.11 Effect of dimensionality on queries.
Sparse reconstruction integrating the data fidelity term and the sparsity constraint term
Gao Hongxia; Xie Jianhe; Zeng Runhao; Wu Ziling; Ma Ge

Abstract: Aiming at the Poisson-Gaussian mixed noise produced in low-dose photon counting imaging, a sparse reconstruction method integrating a data fidelity term and a sparsity constraint term is proposed. First, based on the hypothesis that the Poisson and Gaussian noise are mutually independent, a sparse reconstruction objective function integrating the data fidelity term and the sparsity constraint term is established. Then, building on patch clustering, an improved greedy algorithm performs within-class sparse decomposition and dictionary update. Finally, a clean image is obtained by alternating iterations of sparse decomposition and dictionary update. Contrast experiments on images corrupted with strong Poisson-Gaussian mixed noise show that the average PSNR of images reconstructed by the proposed method is 5.5% higher than that of the contrast methods, and the MSSIM also increases significantly. These results demonstrate that the proposed method achieves better image restoration and denoising for low photon counting images with strong Poisson-Gaussian mixed noise.

Journal: Optics and Precision Engineering
Year (volume), issue: 2017, 25(9)
Pages: 11 (P2437-2447)
Keywords: sparse reconstruction; dictionary learning; mixed noise; strong noise; low-dose photon counting imaging
Authors' affiliations: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China; Research Center of Precision Electronic Manufacturing Equipment of the Ministry of Education, South China University of Technology, Guangzhou 510640, China; School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou 510006, China
Language: Chinese
CLC number: TP394.1; TP242.6

The development of low photon counting imaging systems such as high-precision industrial CT and low-dose medical CT is advancing industrial defect detection and medical diagnosis.
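The objective described above combines a data fidelity term with a sparsity constraint. As a hedged sketch of the fidelity side only: under the stated independence hypothesis, a common approximation treats the observation variance as the sum of the Poisson variance (approximately the signal itself) and the Gaussian variance σ², which yields a weighted least-squares fidelity term. This illustrates the general idea, not the authors' exact formulation; the function name is made up for the example.

```python
import numpy as np

def mixed_fidelity(x, y, sigma):
    """Illustrative data-fidelity term for Poisson-Gaussian mixed noise.
    Uses the common approximation Var(y) ~= x + sigma^2 (Poisson variance
    plus Gaussian variance, the two noises assumed independent). A full
    objective would add a sparsity penalty on the patch coefficients,
    e.g. fidelity + lambda * ||alpha||_0."""
    x = np.maximum(x, 1e-8)                    # keep the variance positive
    return np.sum((y - x) ** 2 / (x + sigma ** 2))
```

Minimizing this term alternately with the dictionary/coefficient updates mirrors the alternating-iteration scheme the abstract describes.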
Gaussian-map-based fitting of cylindrical and conical point clouds (Li An)
... the initial value of k; (m1×m2)/|m1×m2| serves as the initial value of n (i.e., of the parameters _ and θ); the minimum-curvature direction m1 serves as a, which determines the initial value of the parameter α; the initial value of the parameter ρ is set to zero. The initial value S = (ρ, _, θ, k, σ, τ) of the iterative method is determined much as for the cylindrical surface; the only difference is that the position and direction of the cone's axis of rotation must be computed from two of the data points
2 The Gauss map
At every point P of a surface S one can construct the unit normal vector n. Since |n| = 1, translating the starting point of n to the origin O places its endpoint at a point P′ on the unit sphere S² centered at O. This mapping of points is called the Gauss map of the surface S. Under the Gauss map, the image of S is a point set on the unit sphere; this set may be a single point or a spherical curve, and is generally called the Gauss image [5]. The Gauss map has found wide use in practical engineering. M. Hebert and J. Ponce used the distribution of surface normals on the Gauss sphere, via the Hough technique, to classify surfaces into planes, cylinders, cones, and so on [6]. H. Pottmann proposed that, in surface type recognition, cylinders can be recognized from the normal-based Gauss image [7]. P. Benkő et al., in their discussion of feature recognition, likewise mentioned using the normal-based Gauss sphere to visualize surface features and thereby help recognize extruded surfaces [8].
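The classification idea above can be sketched numerically: a plane's normals collapse to a single point on the sphere, while a cylinder's normals lie on a great circle, i.e., in a plane through the origin, so the second-moment matrix of the unit normals has rank 1 and rank 2 respectively. This is a toy heuristic assuming estimated unit normals as input, not the Hough-based method of the cited work.

```python
import numpy as np

def classify_gauss_image(normals, tol=1e-3):
    """Classify a surface patch from its Gauss image (unit normals moved
    to the origin). Plane -> one point on the sphere (rank-1 scatter);
    cylinder -> great circle in a plane through the origin (rank 2);
    cone or doubly curved patch -> rank 3. Illustrative heuristic only."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    scatter = n.T @ n / len(n)                 # 3x3 second-moment matrix
    eig = np.linalg.eigvalsh(scatter)
    rank = int(np.sum(eig > tol))
    return {1: "plane", 2: "cylinder"}.get(rank, "cone or doubly curved")
```

For noisy normals the tolerance `tol` would need tuning against the noise level; the rank test is the simplest possible reading of the Gauss-image geometry.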
0 Introduction
The surfaces of most physical objects, mechanical parts in particular, largely consist of quadric surfaces such as planes, spheres, cylinders, cones, and tori, or have them as important components, so the study of quadric surface fitting from point cloud data is of real significance in reverse engineering. Many researchers have done extensive work on this topic. Chen and Liu [1] proposed a general quadric surface extraction algorithm based on genetic algorithms (GA). Vaughan Pratt [2] proposed a "Quasi-Least-Squares" approach.
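As a small, hedged example of algebraic least-squares fitting in the spirit of the quadric-fitting work cited above (deliberately simpler than Pratt's constrained formulation): a sphere can be fitted linearly by solving x²+y²+z² + a·x + b·y + c·z + d = 0 for (a, b, c, d) and then recovering the center and radius.

```python
import numpy as np

def fit_sphere(points):
    """Plain algebraic least-squares sphere fit: solve the linear system
    x^2+y^2+z^2 + a*x + b*y + c*z + d = 0 in the unknowns (a, b, c, d),
    then recover center = -(a,b,c)/2 and radius = sqrt(|center|^2 - d).
    Illustrative sketch, not Pratt's method or the GA-based extraction."""
    p = np.asarray(points, dtype=float)
    A = np.hstack([p, np.ones((len(p), 1))])   # columns: x, y, z, 1
    b = -np.sum(p * p, axis=1)                 # right-hand side: -(x^2+y^2+z^2)
    (a1, a2, a3, d), *_ = np.linalg.lstsq(A, b, rcond=None)
    center = -0.5 * np.array([a1, a2, a3])
    radius = np.sqrt(center @ center - d)
    return center, radius
```

Algebraic fits of this kind minimize an algebraic residual rather than true geometric distance, which is precisely the gap that quasi-least-squares and iterative geometric fitting aim to close.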
An efficient subgraph matching method based on RDF graph segmentation and vertex selectivity
An efficient subgraph matching method based on RDF graph segmentation and vertex selectivity. Authors: Guan Haoyuan, Zhu Bin, Li Guanyu, Cai Yongjia. Source: Journal of Computer Applications, 2019, No. 2. Abstract: In SPARQL query processing, queries over Resource Description Framework (RDF) graphs with complex structures are inefficient.
To address this, by analyzing several basic RDF graph structures and the selectivity of RDF vertices, RDF Triple Pattern Selectivity (RTPS) is proposed: a graph structure segmentation rule based on RDF vertex selectivity that improves the efficiency of subgraph matching over RDF graphs.
First, exploiting the commonality of predicate structures between the data graph and the query graph, an RDF Adjacent Predicate Path (RAPP) index is built, transforming the data graph into an incoming/outgoing bidirectional predicate path structure that determines the search space of the query vertices and speeds up vertex filtering. Next, an Integer Linear Programming (ILP) model is built to decompose a structurally complex RDF query graph into several structurally simple query subgraphs; by analyzing the structures and features of the subgraphs adjacent to each RDF vertex in the query graph, the selectivity of the query vertices is established and the optimal segmentation is determined. Then, the RDF vertex selectivity and the structural features of adjacent subgraphs narrow the search space of the query vertices, and the qualifying RDF vertices are found in the data graph. Finally, the data graph is traversed to find subgraphs whose structure matches that of the query subgraphs, and the resulting subgraphs are joined and output as the query answer.
The experiments use the controlled variable method to compare the query response times of RTPS, RDF Subgraph Matching (RSM), RDF-3X, GraSS, and R3F.
The experimental results clearly show that, compared with the other four methods, RTPS has a shorter query response time and higher query efficiency when the query graph complexity exceeds 9.
Keywords: SPARQL query processing; Resource Description Framework; subgraph matching; graph structure segmentation; vertex selectivity. CLC number: TP181; TP311. Document code: A. Abstract: As graph-based queries in SPARQL query processing become more and more inefficient with the increasing structural complexity of Resource Description Framework (RDF) graphs, by analyzing the basic structures of RDF graphs and the selectivity of RDF vertices, RDF Triple Patterns Selectivity (RTPS) was proposed to improve the efficiency of subgraph matching for RDF graphs; it is a graph structure segmentation rule based on the selectivity of RDF vertices. Firstly, according to the commonality of the predicate structure in the data graph and the query graph, an RDF Adjacent Predicate Path (RAPP) index was built, and the data graph structure was transformed into an incoming-outgoing predicate path structure to determine the search space of query vertices and speed up the filtering of RDF vertices. Secondly, a model of the Integer Linear Programming (ILP) problem was built to divide an RDF query graph with complicated structure into several query subgraphs with simple structure. By analyzing the structural characteristics of the RDF vertices in the adjacent subgraphs, the selectivity of the query vertices was established and the optimal segmentation method was determined. Thirdly, with the search space narrowed down by the RDF vertex selectivity and the structural characteristics of adjacent subgraphs, the matchable RDF vertices in the data graph were found. Finally, the RDF data graph was traversed to find the subgraphs whose structure matched the structure of the query subgraphs. Then, the result graph was output by joining the subgraphs together. The controlled variable method was used in the experiments to compare the query response time of RTPS, RDF Subgraph Matching (RSM), RDF-3X, GraSS, and R3F.
The experimental results show that, compared with the other four methods, when the number of triple patterns in a query graph is more than 9, RTPS has a shorter query response time and higher query efficiency. Key words: SPARQL query processing; Resource Description Framework (RDF); subgraph matching; graph structure segmentation; vertex selectivity.

0 Introduction
Resource Description Framework (RDF) is a standard data model for describing machine-readable information on the Semantic Web, and SPARQL is the W3C-recommended standard graph query language for RDF graphs [1]. Pattern matching is the core problem in SPARQL query processing; its goal is to quickly search complex RDF graph data for results that satisfy the query request.
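The notion of triple-pattern selectivity that drives RTPS can be illustrated with a toy counting estimate: the fewer data triples a pattern can match, the more selective it is, so it should be matched first. This sketch is illustrative and far cruder than the paper's RAPP index and ILP-based segmentation; all names are made up for the example.

```python
def pattern_selectivity(triples, pattern):
    """Toy estimate of triple-pattern selectivity: the fraction of data
    triples a (subject, predicate, object) pattern can match, with None
    acting as a wildcard. Lower values mean a more selective pattern."""
    def matches(t, p):
        return all(pi is None or pi == ti for ti, pi in zip(t, p))
    hits = sum(matches(t, pattern) for t in triples)
    return hits / len(triples)
```

In a real engine these fractions would come from precomputed statistics rather than a scan, and the query optimizer would order (or segment) patterns by them before subgraph matching.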
Dell SNMP Trap Correlation Guide
Dell SNMP Trap Correlation Guide

Notes and Cautions
NOTE: A NOTE indicates important information that helps you make better use of your computer.
CAUTION: A CAUTION indicates potential damage to hardware or loss of data if instructions are not followed.

Information in this document is subject to change without notice.
© 2010 Dell Inc. All rights reserved.
Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.
Trademarks used in this text: Dell™, the DELL logo, and OpenManage™ are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.
October 2010

Contents
1 Trap Correlation
  Overview
  Instrumentation Traps
  Miscellaneous Traps
  Temperature Probe Traps
  Cooling Device Traps
  Voltage Probe Traps
  Amperage Probe Traps
  Chassis Intrusion Traps
  Redundancy Unit Traps
  Power Supply Traps
  Memory Device Traps
  Fan Enclosure Traps
  AC Power Cord Traps
  Hardware Log Traps
  Processor Device Status Traps
  Pluggable Device Traps
  Battery Traps

Trap Correlation

Overview
This reference guide provides detailed information about the SNMP traps generated by Dell OpenManage Server Administrator (OMSA) that are displayed as messages on the HP Operations Manager (HPOM) console. It is intended for system administrators who use HPOM to monitor Dell systems. The SNMP Interceptor policy has predefined rules for processing all the OMSA and OpenManage Storage Systems (OMSS) traps sent by the Dell systems.
For every OMSA or OMSS trap received, there are one or more Clear Event traps that auto-acknowledge or clear the trap that is received. This guide provides information about the OMSA Clear Event traps that HPOM uses to auto-acknowledge the SNMP traps it receives from the Dell systems.
For information on the OMSS Clear Event traps, see the "Storage Management Message Reference" section in the Dell OpenManage Server Administrator Version 6.3 Messages Reference Guide available on the Dell Support website at /manuals.

Instrumentation Traps
This section describes the traps that are generated by the Instrumentation service of the Server Administrator. All the traps documented in this section belong to the MIB enterprise identified by OID 1.3.6.1.4.1.674.10892.1. For information on the description of the traps, see the Instrumentation Traps section in the Dell OpenManage Server Administrator Version 6.3 SNMP Reference Guide.

Miscellaneous Traps
Table 2-1 lists Miscellaneous traps that inform you that certain alert systems are up and working.

Temperature Probe Traps
Temperature probes help protect critical components by alerting the systems management console when temperatures become too high inside a chassis.
The temperature probe traps use additional variables: sensor location, chassis location, previous state, and temperature sensor value reported in degrees Celsius.

Table 2-1. Miscellaneous Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1001 | System Up | None
1004 | Thermal Shutdown | None
1006 | Automatic System Recovery | None
1007 | Host System Reset | None
1013 | System Peak Power New Peak | None

Table 2-2. Temperature Probe Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1052 | Temperature Probe Normal | TemperatureProbeWarning (1053), TemperatureProbeFailure (1054), TemperatureProbeNonRecoverable (1055)
1053 | Temperature Probe Warning | TemperatureProbeFailure (1054), TemperatureProbeNonRecoverable (1055)
1054 | Temperature Probe Failure | TemperatureProbeWarning (1053), TemperatureProbeNonRecoverable (1055)
1055 | Temperature Probe Nonrecoverable | TemperatureProbeWarning (1053), TemperatureProbeFailure (1054)

Cooling Device Traps
Cooling device traps monitor how well a fan is functioning.

Voltage Probe Traps
Voltage probes monitor the number of volts across critical components.

Table 2-3. Cooling Device Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1102 | Cooling Device Normal | CoolingDeviceWarning (1103), CoolingDeviceFailure (1104), CoolingDeviceNonRecoverable (1105)
1103 | Cooling Device Warning | CoolingDeviceFailure (1104), CoolingDeviceNonRecoverable (1105)
1104 | Cooling Device Failure | CoolingDeviceWarning (1103), CoolingDeviceNonRecoverable (1105)
1105 | Cooling Device Nonrecoverable | CoolingDeviceWarning (1103), CoolingDeviceFailure (1104)

Table 2-4. Voltage Probe Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1152 | Voltage Probe Normal | VoltageProbeWarning (1153), VoltageProbeFailure (1154), VoltageProbeNonRecoverable (1155)
1153 | Voltage Probe Warning | VoltageProbeNonRecoverable (1155), VoltageProbeFailure (1154)
1154 | Voltage Probe Failure | VoltageProbeWarning (1153), VoltageProbeNonRecoverable (1155)
1155 | Voltage Probe Nonrecoverable | VoltageProbeWarning (1153), VoltageProbeFailure (1154)
Amperage Probe Traps
Amperage probes measure the amount of current (in amperes) that is traversing critical components.

Chassis Intrusion Traps
Chassis intrusion traps are a security measure. Chassis intrusion indicates that there is some disturbance to a system's chassis. Alerts are sent to prevent unauthorized removal of parts from a chassis.

Table 2-5. Amperage Probe Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1202 | Amperage Probe Normal | AmperageProbeWarning (1203), AmperageProbeFailure (1204), AmperageProbeNonRecoverable (1205)
1203 | Amperage Probe Warning | AmperageProbeFailure (1204), AmperageProbeNonRecoverable (1205)
1204 | Amperage Probe Failure | AmperageProbeWarning (1203), AmperageProbeNonRecoverable (1205)
1205 | Amperage Probe Nonrecoverable | AmperageProbeWarning (1203), AmperageProbeFailure (1204)

Table 2-6. Chassis Intrusion Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1252 | Chassis Intrusion Normal | ChassisIntrusionDetected (1254)
1254 | Chassis Intrusion Detected | None

Redundancy Unit Traps
Redundancy indicates that a system chassis has more than one of certain critical components. Fans and power supplies, for example, are so important for preventing damage or disruption of a computer system that a chassis may have "extra" fans or power supplies installed. Redundancy allows a second or nth fan to keep the chassis components at a safe temperature when the primary fan has failed. Redundancy is normal when the intended number of critical components are operating. Redundancy is degraded when a component fails but others are still operating. Redundancy is lost when the number of components functioning falls below the redundancy threshold. The number of devices required for full redundancy is provided as part of the trap message when applicable for the redundancy unit and the platform.
For more details on redundancy computation, please refer to the respective platform documentation.

Power Supply Traps
Power supply traps provide status and warning information for power supplies present in a particular chassis.

Table 2-7. Redundancy Unit Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1304 | Redundancy Normal | RedundancyDegraded (1305), RedundancyLost (1306)
1305 | Redundancy Degraded | RedundancyLost (1306)
1306 | Redundancy Lost | RedundancyDegraded (1305)

Table 2-8. Power Supply Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1352 | Power Supply Normal | PowerSupplyWarning (1353), PowerSupplyFailure (1354)
1353 | Power Supply Warning | PowerSupplyFailure (1354)
1354 | Power Supply Failure | PowerSupplyWarning (1353)

Memory Device Traps
Memory device messages provide status and warning information for memory modules present in a particular system. Memory devices determine health status by counting the number of ECC memory corrections.
NOTE: A value of failure or non-recoverable does not indicate a system failure or loss of data, but rather that the specified system exceeded the specified ECC correction threshold.

Fan Enclosure Traps
Some systems are equipped with a protective enclosure for fans.
Fan enclosure traps monitor enclosures for whether foreign objects are present and for how long a fan enclosure is absent from a chassis.

Table 2-9. Memory Device Messages
Trap ID | Alert Name | Related Alerts that are Cleared
1403 | MemoryDeviceWarning | MemoryDeviceFailure (1404), MemoryDeviceNonRecoverable (1405)
1404 | MemoryDeviceFailure | MemoryDeviceWarning (1403), MemoryDeviceNonRecoverable (1405)
1405 | MemoryDeviceNonRecoverable | MemoryDeviceFailure (1404), MemoryDeviceWarning (1403)

Table 2-10. Fan Enclosure Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1452 | Fan Enclosure Insertion | FanEnclosureRemoval (1453), FanEnclosureExtendedRemoval (1454)
1453 | Fan Enclosure Removal | FanEnclosureExtendedRemoval (1454)
1454 | Fan Enclosure Extended Removal | FanEnclosureRemoval (1453)

AC Power Cord Traps
The AC power cord sensor monitors the presence of AC power for an AC power cord. AC power cord traps provide status and warning information for power cords that are part of an AC power switch, if your system supports AC switching.

Hardware Log Traps
Hardware logs provide hardware status messages to systems management software. On certain systems, the hardware log is implemented as a circular queue. When the log becomes full, the oldest status messages are overwritten when new status messages are logged. On some systems, the log is not circular. On these systems, when the log becomes full, subsequent hardware status messages are lost.
Hardware log sensor messages provide status and warning information about the noncircular logs that may fill up, resulting in lost status messages.

Table 2-11. AC Power Cord Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1501 | AC Power Cord No Power Nonredundant | ACPowerCordFailure (1504)
1502 | AC Power Cord Normal | ACPowerCordNoPowerNonRedundant (1501), ACPowerCordFailure (1504)
1504 | AC Power Cord Failure | ACPowerCordNoPowerNonRedundant (1501), ACPowerCordNormal (1502)

Table 2-12. Hardware Log Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1552 | Log Normal | LogWarning (1553), LogFull (1554)
1553 | Log Warning | LogFull (1554)
1554 | Log Full | LogWarning (1553)

Processor Device Status Traps
The BMC on some systems reports the status of processor devices. Processor device status traps provide status and warning information for processor devices present in a system with a BMC that reports the status of processor devices.

Pluggable Device Traps
Server Administrator monitors the addition and removal of pluggable devices such as memory cards. Device traps provide information about the addition and removal of such devices.

Table 2-13. Processor Device Status Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1602 | Processor Device Status Normal | ProcessorDeviceStatusWarning (1603), ProcessorDeviceStatusFailure (1604)
1603 | Processor Device Status Warning | ProcessorDeviceStatusFailure (1604)
1604 | Processor Device Status Failure | ProcessorDeviceStatusWarning (1603)

Table 2-14. Pluggable Device Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1651 | DeviceAdd | None
1652 | DeviceRemove | None
1653 | DeviceConfigError | None

Battery Traps
The BMC on some systems reports the status of batteries.
Battery traps provide status and warning information for batteries present in a system with a BMC that reports the status of batteries.

Table 2-15. Battery Traps
Trap ID | Alert Name | Related Alerts that are Cleared
1702 | Battery Normal | BatteryWarning (1703), BatteryFailure (1704)
1703 | Battery Warning | BatteryFailure (1704)
1704 | Battery Failure | BatteryWarning (1703)
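The correlation tables in this guide can be read as a mapping from each incoming trap ID to the alert IDs it auto-acknowledges. A minimal sketch of that clearing logic follows (a toy console model, not HPOM's actual implementation; only a few rows from Tables 2-2 and 2-15 are included):

```python
# Subset of the guide's correlation tables: trap ID -> alert IDs it clears.
CLEARS = {
    1052: [1053, 1054, 1055],   # Temperature Probe Normal
    1053: [1054, 1055],         # Temperature Probe Warning
    1702: [1703, 1704],         # Battery Normal
    1703: [1704],               # Battery Warning
}

def process_trap(trap_id, active_alerts):
    """Auto-acknowledge every active alert the incoming trap clears,
    then record the trap itself as the current state for its sensor."""
    for cleared in CLEARS.get(trap_id, []):
        active_alerts.discard(cleared)
    active_alerts.add(trap_id)
    return active_alerts
```

For example, a Temperature Probe Normal trap (1052) arriving while a Temperature Probe Failure alert (1054) is active removes the failure alert from the console, exactly as Table 2-2 specifies.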
FortiMonitor End-to-End Digital Experience Monitoring Data Sheet
DATA SHEET | FortiMonitor

CORE COMPONENTS

Digital Experience Monitoring
FortiMonitor consolidates monitoring data into a single SaaS-based platform which collects telemetry across complex, hybrid infrastructures. FortiMonitor visualizes the information in a unified capacity, enabling seamless correlation. The solution:
§ Centralizes monitoring of the end user, network, and infrastructure that hosts your applications to lower costs, increase efficiency, and optimize resources
§ Performs synthetic testing from multiple vantage points to monitor application uptime, user experience, and performance
§ Improves customer and end-user experience by proactively identifying service degradation
§ Correlates network and application performance data to quickly pinpoint root cause of issues
§ Leverages endpoint performance data, such as CPU, memory, disk, and network metrics, along with network and application performance; this facilitates troubleshooting with insights into what the user is experiencing

Comprehensive Full Stack Visibility and Fortinet Fabric Integration
FortiMonitor uniquely offers the ability to observe health and performance of all of your devices, across any network, and the infrastructure that applications utilize, whether containers, cloud, on-premises, or hybrid.
§ Monitors status and health of end-user devices, network, cloud and on-prem infrastructure, and public or privately hosted applications
§ Integrates with the Fortinet Security Fabric to easily discover and gain insight into FortiGate and downstream Fortinet device health and performance metrics, including LAN, Wi-Fi, and SD-WAN
§ Simulates application-specific traffic over SD-WAN underlays to accurately measure real-time user experience and application performance over SD-WAN

Enriched Incident and Event Management
FortiMonitor enables organizations to observe, correlate, and respond to incidents from a single unified platform.
Organizations can:
§ Enrich incidents by automating the diagnostic collection and designing remedial measures with automated runbooks, allowing teams to fully customize their incident handling experience at scale
§ Maximize team collaboration through native timeline and messaging communication features, accelerating response coordination
§ Ensure critical events trigger notification of team members instantly, through email, the FortiMonitor mobile application, and SMS alerting
§ Eliminate fragmentation with a single source of reliable data insights

Automation
FortiMonitor enables a network operator to respond to performance issues with flexible automated runbooks. These can be tuned to an organization's best practices to accelerate operational and response tasks. Benefits of automated runbooks include:
§ Reduction of repetitive manual tasks and high-volume context switching
§ Accelerated mean time to detect (MTTD) and mean time to respond (MTTR), while improving service delivery, permitting teams to meet and exceed SLAs
§ Proactive optimization of processes and resolution of incidents
§ Swift deployment of automation for immediate use and onboarding

FortiMonitor Live Incident Tracking

CORE COMPONENTS

Feature Highlights
§ Comprehensive Performance Monitoring: Monitor endpoints, vendor-agnostic network devices, infrastructure, applications, and cloud services with a single, SaaS-based platform.
§ Endpoint Digital Experience: Leverage endpoint performance data, such as CPU, memory, disk, and network metrics, along with network and application performance to facilitate troubleshooting with insights into what an end user is experiencing.
§ Fortinet Fabric Integration: Discover and monitor FortiGate and downstream Fortinet device health and performance metrics, including LAN, Wi-Fi, and SD-WAN.
§ Flexible automated onboarding: Easily onboard FortiGate and connected FortiAP, FortiSwitch, and FortiExtender devices.
§ Auto Discovery: Network-wide SNMP device discovery.
Host and guest performance of VMware, Kubernetes Helm.
§ Cloud Integrations: AWS, Azure,
§ Synthetic Transaction Monitoring (STM): Gain visibility into the availability and performance of any application using browser tracing or JavaScript that runs from global public nodes and private networks
§ Incident Flow: Send and escalate alerts via mobile app, email, SMS, phone call, and third-party ticketing integrations, with advanced alert timelines
§ Topology Mapping: Visualize all network elements in real time to see how they connect and identify performance issues.
§ Network Configuration Management (NCM): Get centralized network configuration management and templates for seamless configuration updates.
§ Multi-tenancy: Manage client devices while providing role-based access control (RBAC) for insights into digital experience performance monitoring.
§ Dashboards and reporting: Save time with out-of-the-box and customizable dashboards, reports, and public status pages for real-time and historical reporting.
§ Alert Integrations: Integrate with JIRA, Slack, Microsoft Teams

COLLECTOR APPLIANCE

Deploy Key Component: 100F Appliance
FortiMonitor is a cloud-native, vendor-neutral, SaaS-based monitoring platform offering key strengths in digital experience monitoring, FortiOS and Fabric integration, and, importantly, network device detection and monitoring. Furthermore, with offerings such as Incident Management, NetFlow, and NCM, FortiMonitor quickly becomes a platform of tool consolidation and workflow centralization. Importantly, one of the key components in the FortiMonitor deployment strategy is the FortiMonitor OnSight Collector appliance.
The FortiMonitor OnSight appliance is perhaps the most ubiquitous of facilities in the FortiMonitor toolbox, offering a plethora of network-level insights, including FortiOS insights, all in a single appliance.
The FortiMonitor 100F comes fully prepared with everything you need to quickly get up and running, but with the added benefit of a "plug and play" deployment strategy. This unit brings significant compute capacity right to the edge: to the end-user environment, network, and remote branch location. Out of the box, the 100F is designed to support maximum volumes of metric collection as follows:
§ 150,000 SNMP metrics
§ 2500 VMware instances (hosts, clusters, VMs, datastores)
§ 4500 synthetic network checks (ping, HTTP, HTTPS)
§ 100 browser-based synthetic checks
However, if you wish to deploy the FortiMonitor OnSight appliance where a requirement exists for values beyond these specifications, FortiMonitor has this prospect taken care of. Using the "OnSight Groups" feature (a FortiMonitor configuration option), it is entirely possible to harness the abilities of multiple OnSight units operating in HA/LB tandem, depending on your configuration.
Upon deployment, the initial setup is performed locally on the device itself, and once that is complete, all administrative and management functions are streamlined through the web-based FortiMonitor control panel. This approach gives the FortiMonitor 100F a familiar look and feel. Enabling monitoring at scale has never been so easy.

THE FORTINET SECURITY FABRIC
FortiMonitor integration with the Fortinet Security Fabric enables organizations to import all eligible Fortinet devices into the FortiMonitor platform. This enables digital experience and network performance monitoring at scale.
The FortiMonitor platform brings enriched alerting and an incident management toolset to managed devices for proactive monitoring and generation of alerts in response to error conditions.

Fortinet Services

24/7 FortiCare Service
24x7 FortiCare technical support is delivered through our Global Technical Assistance Centers. Each geographical region has a Center of Expertise that is supplemented by regional support centers. This enables us to provide regional and local language support. Foundational FortiCare device-level support includes:
§ Global toll-free numbers that are available 24x7
§ Web chat for quick answers
§ A support portal for ticket creation or to manage assets and life cycles
§ Access to software updates and a standard next-business-day RMA service for the device

FortiCare Best Practice Services for FortiMonitor
The FortiCare Best Practice Service (BPS) provides technical advice to help organizations make the most of their Fortinet investment. FortiCare BPS is an annual subscription-based service. Once a ticket is created through the FortiCare Support Portal, the BPS ticket is rerouted to a product-specific technical expert. Responses for these consultations are handled as per a standard P3 ticket.

Security Fabric
The industry's highest-performing cybersecurity platform, powered by FortiOS, with a rich ecosystem designed to span the extended digital attack surface, delivering fully automated, self-healing network security.
§ Broad. Coordinated detection and enforcement across the entire digital attack surface and lifecycle with converged networking and security across edges, clouds, endpoints, and users
§ Integrated. Integrated and unified security, operation, and performance across different technologies, locations, deployment options, and the richest ecosystem
§ Automated.
Context-aware, self-healing network and security posture leveraging cloud scale and advanced AI to automatically deliver near-real-time, user-to-application coordinated protection across the Fabric

The Fabric empowers organizations of any size to secure and simplify their hybrid infrastructure on the journey to digital innovation.

FMR-DAT-R05-20230206
Copyright © 2023 Fortinet, Inc. All rights reserved. Fortinet, FortiGate, FortiCare and FortiGuard, and certain other marks are registered trademarks of Fortinet, Inc., and other Fortinet names herein may also be registered and/or common law trademarks of Fortinet. All other product or company names may be trademarks of their respective owners. Performance and other metrics contained herein were attained in internal lab tests under ideal conditions, and actual performance and other results may vary. Network variables, different network environments and other conditions may affect performance results. Nothing herein represents any binding commitment by Fortinet, and Fortinet disclaims all warranties, whether express or implied, except to the extent Fortinet enters a binding written contract, signed by Fortinet's General Counsel, with a purchaser that expressly warrants that the identified product will perform according to certain expressly-identified performance metrics and, in such event, only the specific performance metrics expressly identified in such binding written contract shall be binding on Fortinet. For absolute clarity, any such warranty will be limited to performance in the same ideal conditions as in Fortinet's internal lab tests. Fortinet disclaims in full any covenants, representations, and guarantees pursuant hereto, whether express or implied.
Fortinet reserves the right to change, modify, transfer, or otherwise revise this publication without notice, and the most current version of the publication shall be applicable.
Fortinet is committed to driving progress and sustainability for all through cybersecurity, with respect for human rights and ethical business practices, making possible a digital world you can always trust. You represent and warrant to Fortinet that you will not use Fortinet's products and services to engage in, or support in any way, violations or abuses of human rights, including those involving illegal censorship, surveillance, detention, or excessive use of force. Users of Fortinet products are required to comply with the Fortinet EULA and report any suspected violations of the EULA via the procedures outlined in the Fortinet Whistleblower Policy.

ORDER INFORMATION

Device/Server Subscriptions
25-pack: FC2-10-MNCLD-436-01-DD, FC2-10-MNCLD-437-01-DD
500-pack: FC3-10-MNCLD-436-01-DD, FC3-10-MNCLD-437-01-DD
2000-pack: FC4-10-MNCLD-436-01-DD, FC4-10-MNCLD-437-01-DD
10 000-pack: FC5-10-MNCLD-436-01-DD, FC5-10-MNCLD-437-01-DD

Container Subscriptions
25-pack: FC2-10-MNCLD-439-01-DD, FC2-10-MNCLD-440-01-DD
500-pack: FC3-10-MNCLD-439-01-DD, FC3-10-MNCLD-440-01-DD
2000-pack: FC4-10-MNCLD-439-01-DD, FC4-10-MNCLD-440-01-DD
10 000-pack: FC5-10-MNCLD-439-01-DD, FC5-10-MNCLD-440-01-DD

FortiGate Subscriptions
25-pack: FC2-10-MNCLD-456-01-DD, FC2-10-MNCLD-457-01-DD
500-pack: FC3-10-MNCLD-456-01-DD, FC3-10-MNCLD-460-01-DD
2000-pack: FC4-10-MNCLD-456-01-DD, FC4-10-MNCLD-457-01-DD
10 000-pack: FC5-10-MNCLD-456-01-DD, FC5-10-MNCLD-457-01-DD

LAN Edge Device Subscriptions
25-pack: FC2-10-MNCLD-459-01-DD, FC2-10-MNCLD-460-01-DD
500-pack: FC3-10-MNCLD-459-01-DD, FC3-10-MNCLD-460-01-DD
2000-pack: FC4-10-MNCLD-459-01-DD, FC4-10-MNCLD-460-01-DD
10 000-pack: FC5-10-MNCLD-459-01-DD, FC5-10-MNCLD-460-01-DD

DEM Synthetics Subscriptions
25-pack:
FC2-10-MNCLD-441-01-DD 500-pack FC3-10-MNCLD-441-01-DD 2000-pack FC4-10-MNCLD-441-01-DD 10 000-packFC5-10-MNCLD-441-01-DDDigital Experience Monitoring (DEM) subscriptions for advanced synthetics monitoring are also available independently from endpoint servers and containers.Separate SKUs are provided for devices/servers, containers, FortiGate, and LAN Edge devices.FortiCare Best Practices (BPS) Onboarding Service< 250 EndpointsFC1-10-MNBPS-310-02-DD 250 - 999 Endpoints FC2-10-MNBPS-310-02-DD 1000 - 4999 Endpoints FC3-10-MNBPS-310-02-DD >= 5000 EndpointsFC5-10-MNBPS-310-02-DDFortiMonitor 100FFMR-100FThe FortiMonitor 100F is the appliance-based OnSight Collector, allowing a single hardware appliance for STM, SNMP Trap/Get and FortiMonitor Agent connectivity.Additional onboarding services are available as subscriptions providing onboarding consultation services.FortiMonitor Collector appliance is available now.Other FortiMonitor SKUs are orderable for Basic Nodes: Basic instances are also available for simple uptime/availabilitymonitoring of endpoints, devices, servers, and websites. This instance type does not include advanced performance metrics.ALSO AVAILABLE。
Oracle Financial Services Crime and Compliance Investigation Hub: Data Sheet
Data Sheet: Oracle Financial Services Crime and Compliance Investigation Hub

Discover criminal patterns hidden under complex layers of money laundering networks

Financial information stored in operational silos and disparate transaction monitoring and sanctions detection engines means investigations into financial crime are complex, time-consuming and often incomplete. Leveraging the Oracle Enterprise Financial Crimes Graph Model, which links customers, accounts, external entities, transactions and external data, Oracle Financial Services Crime and Compliance Investigation Hub is Oracle's end-user application for consolidated financial crime investigations.

Comprehensive and contextualized financial crime investigation
Oracle Financial Services Crime and Compliance Investigation Hub is an intuitive investigation platform based on advanced graph technology, purpose-built for accelerated and effective discovery of complex hidden financial crime network patterns. It links customers, accounts, external entities, transactions, and ad hoc external data to eliminate information and operational silos, resolve entities, and provide a holistic, intuitive graph representation of data that aids the uncovering of otherwise hidden suspicious patterns. Investigators can simply click their way through the entities and their connections, represented as nodes on the graph model, to analyze networks and suspicious activities. The in-built scoring, matching and correlation engines create meaningful units of investigation, and pre-configured red flags and risk factors target investigative effort effectively.
Oracle Financial Services Crime and Compliance Investigation Hub enables financial institutions to:
• Prevent the manual collation of information from disparate sources for ad hoc investigations
• Accelerate investigations by aggregating relevant data and automatically generating case narratives and insights
• Increase investigator productivity by recommending focus areas based on risk and historic decisions
• Provide a 360-degree view of a customer, external entity or account, covering alerts, transactions and external data of interest

Industry Challenges
• A huge quantum of structured and unstructured data makes it difficult to identify hidden patterns
• Complex and time-consuming investigations

Why Oracle Investigation Hub?
• 25 years of fighting financial crime for over 175 global FIs
• Recognized as a Leader in the Quadrant SPARK Matrix for AML, 2021
• Recognized as a Category Leader by Chartis for AML in 2020, 2021 and 2022
• Breaking complex crime networks with intuitive graph-based investigations
• Holistic graph representation of data aids identification of hidden suspicious patterns
• Pre-built user interfaces for case investigations and sanctions
• Inbuilt correlation and scoring algorithms that provide a consolidated risk score for any entity
• Readily usable across the enterprise financial crimes data lake as well as external data sources

Utilize fuzzy matching and transliteration to identify links within the graph model and with external sources of data such as watchlist information, company hierarchy and beneficial owner details, and external investigations.

Advanced data-driven investigations with intuitive graph-based network visualization
Empower your teams to uncover hidden relationships and suspicious patterns between entities in complex criminal networks using an interactive and intuitive graph-based data visualization tool that provides a comprehensive view into case investigation data.
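The fuzzy name matching mentioned above can be illustrated with Python's standard difflib; the function names, threshold, and sample data below are illustrative stand-ins, not the product's actual matching or transliteration engine:

```python
import difflib

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; a simple stand-in for a fuzzy matcher."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_links(entities, watchlist, threshold=0.85):
    """Candidate (entity, watchlist entry, score) links above a threshold."""
    return [(e, w, round(name_similarity(e, w), 3))
            for e in entities for w in watchlist
            if name_similarity(e, w) >= threshold]
```

For example, "Jon Smith" links to a watchlist entry "John Smith" while an unrelated name falls below the threshold; production systems would add transliteration and phonetic normalization on top of this.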
Discover repetitive patterns between current and historical case investigations, enabling investigators to get to the root of suspicious financial transactions.

Powered by Oracle's highly scalable in-memory Graph Analytics Engine (PGX), making it possible to depict business entities, related transactions, beneficiary relationships, related sanctions and industry data in a simplified visual graph pattern.

Ability to intuitively explore relationships among all sorts of data points by simply clicking from one node to the next.

Entity Resolution
Enables investigators to search for customer identifier information such as name, date of birth, address, and TIN. Checks for compliance history and the nature of the customer transactions.

Investigation Analysis
Provides the investigator with the ability to view links, relationships and risk factors. Provides on-screen reference data to view the effect of any transaction deletion or addition.

Explain Graph
Enables the investigator to incorporate visual and text components to explain risk and properties in the graph findings.

[Figure: graph network showing an electronic funds transfer and customer network.]

Document Finding
Investigation Hub also makes suggestions for next actions and saves evidence of the complete case and findings.

Enhanced investigation accuracy with risk-based scoring and case recommendations
Ensure effective investigations with intelligent case decisions using machine-learning-based risk scores for entities and networks, coupled with case recommendations. Search on, and compute, a consolidated risk score for any focal entity, and get a view of transactions, accounts and related parties.
Real-time risk-based scoring, matching and correlation engines, combined with pre-configured red flags and risk factors, create meaningful units of investigation.
Detailed case analysis breakdowns facilitate informed case decisions.
Automated natural-language-based case recommendations aid investigators in decision-making.

Key Features
• Pre-built user interfaces for case investigation, special and ad hoc investigations, and sanctions
• Configurable red flags and risk factors to highlight key areas for investigation
• Case summary in narrative format and case recommendation
• In-built correlation and scoring algorithms
• Exploration of the financial crimes global graph using an interactive and visual graph explorer tool
• Utilizes the proven Enterprise Financial Crimes Graph model, which accelerates financial crime investigation use cases
• Integrates fully with Oracle Financial Crime and Compliance Management application data and external data sources such as watchlist and company hierarchy data, and is readily usable across the enterprise financial crimes data lake
• Built on Oracle Financial Services Crime and Compliance Studio, which includes the highly scalable in-memory Oracle Graph Analytics Engine (PGX), AI and machine learning

Key Benefits
• Richer, more contextual data on cases and case focal entities
• Immediately search the full Enterprise Financial Crime Graph model for persons or entities of interest; no more laboriously piecing together siloed data
• Search on and compute a consolidated risk score for any focal entity and get a view of transactions, accounts and related parties
• Configurable fuzzy name matching and transliteration to find suspicious parties and obfuscated links within the data

About Oracle Financial Services Analytical Applications
Oracle Financial Services Analytical Applications bring financial institutions best-of-breed capabilities to proactively manage Financial Crime, Compliance, Risk, Treasury, Finance and the Front Office. The applications are built upon a commonly available analytical infrastructure consisting of a unified financial services data model, analytical computations, a metadata-driven "R" modeling platform, and the industry-leading Oracle Business Intelligence platform. A single, unified data model and infrastructure provides one version of the analytical "truth" to business users throughout the entire enterprise.
This enables financial services institutions to confidently manage performance, governance, risk and compliance. Shared data, metadata, computations and business rules enable institutions to meet emerging business and regulatory requirements with reduced expenses, and the unified platform helps financial institutions leverage existing investments.

Copyright © 2022, Oracle and/or its affiliates. All rights reserved.
Dynamic Uncertain Knowledge Representation Method Based on Fuzzy Petri Nets and a Genetic-PSO Algorithm
Dynamic Uncertain Knowledge Representation Method Based on Fuzzy Petri Nets and a Genetic-PSO Algorithm
Peng Xun; Wang Weiming; Gu Chaochen; Hu Jie
Journal: Pattern Recognition and Artificial Intelligence, 2014(000)010, 7 pages (P887-893)
Affiliation: Institute of Mechatronic Design and Knowledge Engineering, Shanghai Jiao Tong University, Shanghai 200240
Language: Chinese. CLC classification: TP181

Abstract: To address the evolving nature of uncertain information in complex systems, a dynamically adaptive uncertain knowledge representation method based on fuzzy Petri nets and a genetic particle swarm optimization (GPSO) algorithm is proposed. Building on the fuzzy-Petri-net-based uncertain knowledge representation model, an exact mathematical representation of the model is given, and GPSO is used to dynamically solve for, and self-learn, the parameters that characterize the uncertainty. Finally, the effectiveness of the GPSO-based self-learning fuzzy Petri net is verified through its application to fault diagnosis of the servo mechanism of a launch vehicle.

English abstract (as published): To represent and reason uncertain knowledge in the complex system dynamically and effectively, a self-adaptive method based on fuzzy Petri net (FPN) and genetic particle swarm optimization (GPSO) algorithm is proposed. In this method, the knowledge-representation model based on FPN is established to build the mathematical model. And the GPSO is used in self-learning of the uncertain parameters to achieve self-adaptation of the model. Finally, a servo mechanism fault diagnosis of a launch vehicle is used to verify the proposed method.
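As an illustration of the kind of fuzzy-Petri-net forward reasoning the abstract describes, the sketch below implements one firing step under a common FPN convention (minimum composition with a certainty factor). The data structures and firing rule are assumptions for illustration, not the paper's exact model, and the GPSO parameter learning is not shown:

```python
def fpn_step(tokens, transitions):
    """One forward-reasoning step on a fuzzy Petri net.

    tokens: dict mapping place -> truth degree in [0, 1]
    transitions: list of (input_places, threshold, certainty, output_place)

    Firing rule (a common FPN convention, assumed here): a transition fires
    when min(input degrees) >= threshold; the output place then receives
    max(existing degree, min(input degrees) * certainty factor).
    """
    new_tokens = dict(tokens)
    for inputs, threshold, certainty, output in transitions:
        degree = min(tokens.get(p, 0.0) for p in inputs)
        if degree >= threshold:
            new_tokens[output] = max(new_tokens.get(output, 0.0),
                                     degree * certainty)
    return new_tokens
```

In the paper's setting, the thresholds and certainty factors would be the uncertainty parameters tuned by the GPSO self-learning loop.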
Weighted RPCA Non-Local Image Denoising Based on Iterative Logarithmic Thresholding (Yang Guoliang)
The iterative logarithmic thresholding method is then used to solve:

f(L) = ‖L - N‖_F^2 + λ2 ‖L‖_{W,*}
     = ‖L - N‖_F^2 + λ2 Σ_{i=1}^{r} log(δ + σ_i(L) - W_i)    (15)

where N = UΣV^T = D - S + Y/μ and λ2 = 2/μ. Then:

L* = U Γ_{λ2}(Σ - W) V^T
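A single weighted shrinkage step of the form L* = U Γ_{λ2}(Σ - W) V^T can be sketched as follows. The exact log-penalty threshold operator is simplified here to a subtraction rule, so this is an assumption-laden illustration rather than the paper's operator:

```python
import numpy as np

def log_weighted_shrink(N, W, lam2, delta=1e-3):
    """One weighted singular-value shrinkage step L* = U G(Sigma - W) V^T.

    The rule below (subtract the weight W_i plus a lam2/(delta + s_i) term,
    then clip at zero) is a simplified stand-in for the paper's iterative
    log-thresholding operator: larger singular values are shrunk less,
    smaller ones more, as the surrounding text motivates.
    """
    U, s, Vt = np.linalg.svd(N, full_matrices=False)
    shrunk = np.maximum(s - W - lam2 / (delta + s), 0.0)
    return U @ np.diag(shrunk) @ Vt
```

Note that lam2/(delta + s) is larger for small singular values, so small components are suppressed more strongly, which is the qualitative behavior the log penalty is chosen for.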
…in this way, every singular value is treated equally during singular-value shrinkage, ignoring prior knowledge about the singular values of the matrix: large singular values represent the principal projection directions of the data, so larger singular values should be shrunk less in order to preserve the main components of the data, while smaller singular values should be shrunk as much as possible, thereby strengthening the low-rank property of the image.

Inspired by the weighted nuclear norm minimization (WNNM) algorithm proposed by Gu [7], the low-rank part of the RPCA model can be weighted accordingly.

Combining the advantages of the above methods, this paper proposes a weighted RPCA non-local image denoising method based on iterative logarithmic thresholding (Log Robust Principal Component Analysis, LRPCA). The algorithm first uses non-local similarity to find the blocks similar to each image block, converts the similar blocks into column vectors to form a new observation data matrix, then applies weighting to RPCA, and finally solves the optimization objective with the iterative logarithmic thresholding method.
min_{L,S} ‖L‖_* + λ‖S‖_1   s.t.  D = L + S    (1)

In equation (1), ‖·‖_1 denotes the l1 norm, i.e. the sum of the absolute values of all matrix entries, and ‖·‖_* denotes the nuclear norm, i.e. the sum of the singular values of the matrix: ‖L‖_* = Σ_{i=1}^{r} σ_i, where r is the rank of L and σ_i is its i-th singular value; λ is the regularization weight parameter. In practical computation,
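For reference, the unweighted problem in equation (1) is commonly solved with an inexact augmented Lagrange multiplier (ALM) scheme that alternates singular-value thresholding for L with entrywise soft-thresholding for S. The sketch below uses a fixed penalty μ and a fixed iteration count, which are simplifications for illustration, not choices taken from this paper:

```python
import numpy as np

def rpca(D, lam=None, mu=1.0, iters=50):
    """Inexact-ALM sketch for min ||L||_* + lam*||S||_1 s.t. D = L + S."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # common default weight
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                 # Lagrange multiplier
    for _ in range(iters):
        # L-step: singular value thresholding of D - S + Y/mu
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-step: soft-threshold the entries of D - L + Y/mu
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # dual update drives the residual D - L - S toward zero
        Y = Y + mu * (D - L - S)
    return L, S
```

The LRPCA method described above replaces the uniform singular-value threshold in the L-step with the weighted, log-thresholded shrinkage of equation (15).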
Power Grid Topology Tracking Method Based on Graph Theory and Artificial Intelligence Search Techniques

…changes, the nodes inside each substation are regrouped and mapped onto the nodes of the power grid; a heuristic search algorithm is then used to locally update the original search tree, achieving fast tracking of the changed local grid topology. Test results on a real power grid show that the method has good generality and real-time performance, and can meet the system's requirements for real-time grid topology analysis.
Keywords: power grid topology tracking; graph theory; adjacency matrix; artificial intelligence; heuristic search; power system
[0/1 connection-matrix fragment from the original figure omitted]

C = ∨_k (a_ik ∧ b_jk),

where C is a temporary variable; a_ik and b_jk are the elements in column k of the row vectors A_i and B_j, respectively; ∨ denotes the logical "OR" operation; and ∧ denotes the logical "AND" operation.
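This OR-of-ANDs row combination (a Boolean matrix product over rows of A and B) can be sketched in NumPy; the function name and matrix shapes are illustrative, not from the paper:

```python
import numpy as np

def bool_combine(A, B):
    """Compute c_ij = OR_k (a_ik AND b_jk) for every row pair of A and B.

    Broadcasting builds all (row of A, row of B) elementwise ANDs, and
    any(axis=2) takes the OR over the shared column index k.
    """
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    return (A[:, None, :] & B[None, :, :]).any(axis=2).astype(int)
```

Applied to rows of an adjacency matrix, a 1 in the result indicates that the two node groups share at least one common connection, which is the merging test the topology-tracking method relies on.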
SONG Shao-qun,ZHU Yong-li,YU Hong (Key Laboratory of Power System Protection and Dynamic Security Monitoring and Control under Ministry of Education (North China Electric Power University),Baoding 071003,Hebei Province,China)
Aadj =
1 0 0 0 1 0 1 1
0 1 1 1 0 1 0 0
0 1 1 1 0 1 0 0
0 1 1 1 0 1 0 0
1 0 0 0 1 0 1 1
0 1 1 1 0 1 0 0
1 0 0 0 1 0 1 1
1 0 0 0 1 0 1 1
Manuscript received May 5, 2006. Manuscript revised May 25, 2006.

Graph-based Correlation of SNMP Objects for Anomaly Detection
Bruno Bogaz Zarpelão 1, Leonardo de Souza Mendes 1 and Mario Lemes Proença Jr. 2
1 School of Electrical and Computer Engineering, State University of Campinas (UNICAMP), Campinas, SP, Brazil
2 Computer Science Department, State University of Londrina (UEL), Londrina, PR, Brazil

Summary
Anomaly detection is essential because it allows a rapid reaction to problems and helps assure performance and security in computer networks. This paper presents an anomaly detection system based on: (i) the traffic characterization performed by the BLGBA model, which is responsible for the DSNS generation; (ii) an alarm system that compares the DSNS with the real movement obtained from SNMP objects, sending alarms to a correlation system when a behavior deviation is detected; (iii) a correlation system based on a directed graph which represents the possible paths of anomaly propagation through the SNMP objects of a network element. Three years of data collected from the State University of Londrina network were used to evaluate this anomaly detection system. The results were encouraging and confirmed that our system is able to detect anomalies on the monitored network elements while avoiding a high false-alarm rate.

Key words: Anomaly Detection, SNMP, DSNS, Correlation, Directed Graph

1. Introduction
Computer networks are nowadays of vital importance for modern society, comparable to essential services like piped water, electricity and telephone. Their functioning cannot be interrupted, given their importance to the people who use their services. In this context, the automation of network management has become fundamental for reducing costs, detecting network failures early and avoiding performance bottlenecks.
Anomaly detection allows administrators to respond appropriately to problems, thus ensuring network reliability and throughput [1][4][11][12]. Despite the latest advances in the development of technologies related to network monitoring, traffic characterization and intrusion detection, identifying anomalies correctly is still a challenging task. Anomalies can arise from many different situations, which makes it difficult to develop techniques to detect them. Among the various situations that can cause anomalies we can mention flash crowds, malfunctioning, network element failures, vendor implementation bugs, misconfigurations, transfers of very big files, outages and malicious attacks such as DoS (Denial of Service), DDoS (Distributed Denial of Service) and worms [6][7][12][14][16][17][18]. The anomaly detection techniques known as profile-based or statistical-based do not require any previous knowledge about the nature and properties of the anomalies to be detected. Their main advantages are their effectiveness in detecting unknown anomalies and the ease with which they adapt to new environments. This method establishes a profile for the normal behavior of the network by studying the history of its movements. Detection is accomplished by searching for significant behavior changes that are not coherent with the previously established profile [5][6][8][14][17]. The first difficulty met when using this method is the fact that there is no consensus about an effective model to characterize network traffic. Factors such as human working hours create a dynamic network behavior, making traffic characterization more difficult [1][4][5]. An efficient traffic characterization model must be able to deal with these factors. This work proposes the BLGBA (Baseline for Automatic Backbone Management) model and the DSNS (Digital Signature of Network Segment) it calculates [10][11] as a solution to this issue.
The definition of which events represent an anomaly, and therefore must be reported to the network administrators, is still an open question. The difficulty resides in the non-stationary behavior of the network traffic [5][7][16]. Because of the natural variations that occur in the traffic, normal events can be considered anomalous by the anomaly detection system, which will then generate a false alarm, also known as a false positive. Thus, besides using a successful traffic characterization, we must possess means to avoid the generation of false positives and false alarms when comparing the real traffic to the profile established by the DSNS. Besides identifying the occurrence of an anomaly, the anomaly detection system must offer additional information about the detected situation, thus helping the network administrator to find the origin and solution of the problem quickly. The amount of problem notifications that reach the network administrators must also be observed, in order not to overload them [16][18]. An important resource to be used in anomaly detection is the monitoring of different SNMP objects, trying to correlate the results obtained from the analysis performed for each one of these objects. Each of them offers a particular perspective on the problem. After the correlation, these perspectives converge into a single notification containing the additional information useful for the localization of the anomaly. Besides, the correlation reduces the amount of notifications generated. This work proposes the correlation of SNMP objects based on a directed graph that represents the possible courses of anomaly propagation through the objects.
This correlation graph is used to verify the occurrence of an anomaly and to generate a map of its behavior on the analyzed network element, thus increasing the semantic power of the notifications sent to the network administrator. The anomaly detection system presented in this work performs, in the first place, the comparison between the real traffic and the profile of normal operations obtained from the traffic characterization. This comparison is based on the identification of behavior deviations in each SNMP object monitored through the BLGBA model. After the comparison, the detected deviations are analyzed using the correlation graph and the occurrence of an anomaly is verified. The detected anomalies can be classified in one of the following categories: input flow, output flow or forwarding flow anomaly. This classification is based on the anomaly behavior map obtained with the aid of the correlation graph. This work is organized as follows: Section 2 summarizes work related to the anomaly detection area. Section 3 describes the network environment used to obtain the results. Section 4 deals with the concepts of traffic characterization and presents the BLGBA model and the DSNS. Section 5 presents the Anomaly Detection System and the results of its application to the previously described network environment. Finally, Section 6 offers some final considerations and discusses possible future work.

2. Related Work
Anomaly detection has been studied by many researchers. The first works were related mainly to security issues. Usually, techniques based on the signatures of attacks were used instead of characterizing the normal operations of the traffic.
Considering the need to detect unknown anomalies, authors then initiated the development of techniques based on the characterization of normal network operations. It has recently been recognized that, besides detecting the occurrence of anomalies, it is important to offer additional information about the problem to facilitate the identification of its origin. Other works such as [2], [8] and [16] also explore the properties of SNMP (Simple Network Management Protocol) [15] and the MIB-II (Management Information Base) [9], aiming to detect anomalies. Cabrera et al. [2] presented the possibility of detecting Distributed Denial of Service attacks using data from SNMP objects. Li et al. [8] approached the detection of Denial of Service attacks using SNMP objects. Thottan et al. [16] used the correlation of several SNMP objects in the presence of anomalies in order to increase the effectiveness of their detection mechanism. Roughan et al. [12] assumed a simple approach to correlate the generated alarms with the use of two data sources: the SNMP management protocol and the BGP external routing protocol. Based on the premise that false alarms found in the two data sources are not related, i.e., are not simultaneous, the system detects anomalies only when it finds behavior deviations in both data sources for the same situation. The study of traffic matrices represents another branch in the area of detection and diagnosis of anomalies. The global view of the network offered by the matrices can be useful to infer the cause of the anomalies. Zhang et al. [18] presented a framework that uses traffic matrices to perform network anomography. The name of this technique comes from the union of the words anomaly and tomography. The anomography process is divided in two main steps: anomaly detection and inference about the anomalies' origin.
The different algorithms used in the framework are based on the ARIMA model (Autoregressive Integrated Moving Average), the Fourier transform, wavelets and PCA (Principal Component Analysis). Soule et al. [14] have also used traffic matrices. In their approach, matrices are used to obtain a panoramic view of the network, to which the Kalman filter is applied. Data resulting from the filtering are analyzed by four different methods that include, for example, statistical techniques to detect sudden behavior changes and wavelet algorithms. These methods are responsible for flagging anomalous situations. A very interesting point of this work is the comparison performed between the results obtained from the application of the four methods.

3. Network Environment Studied
Tests were performed at the backbone of the network of the State University of Londrina (UEL). The network elements used in the experiments were:
• S1: the Firewall server of the State University of Londrina;
• S2: the main Web server of the State University of Londrina;
• S3: responsible for interconnecting the ATM router to the other backbone segments of the State University of Londrina network; it gathers the traffic of approximately 3000 computers.

4. Traffic Characterization
The first step considered fundamental for anomaly detection is the traffic characterization. The model used must be efficient at establishing a profile for the normal behavior of the network traffic, which presents self-similar characteristics and a lot of noise. The complete control of this normal behavior profile will lead to a precise diagnosis of anomalies. In this work, traffic characterization is performed by the BLGBA model (Baseline for Automatic Backbone Management), which is responsible for the generation of the DSNS (Digital Signature of Network Segment). The BLGBA model and the DSNS it generates were both proposed by Proença et al. [10][11]. The DSNS is the result of the traffic characterization.
It can be defined as a set of basic information that constitutes the traffic profile of a network element. This information includes data such as traffic volume, number of errors, types of protocols and the services carried through the network element during the day. The BLGBA model was developed based on statistical analyses. It performs analyses for each second of the day, for each day of the week, respecting the exact moment of the collection, second by second, for twenty-four hours, preserving the characteristics of the traffic with respect to the time variations along the day. Therefore, the goal of the traffic characterization performed by BLGBA is that the resulting DSNS reflects the normal behavior expected for the network traffic along the day. The BLGBA algorithm is based on a variation of the calculation of the mode, which takes into consideration the frequencies of the underlying classes as well as the frequency of the modal class. The calculation distributes the elements into frequency classes, based on the difference between the greatest element G_aj and the smallest element S_aj of the sample, using only 5 classes. This difference is divided by five to form the amplitude h between the classes, according to equation (1):

h = (G_aj - S_aj) / 5    (1)

Then the limits L_Ck of each class are obtained. They are calculated according to equation (2), where Ck represents the k-th class (k = 1...5):

L_Ck = S_aj + h * k    (2)

The proposal of the DSNS calculation for each second, B_li, is to obtain the element that represents 80% of the analyzed samples. B_li is defined as the greatest element inserted in the class with accumulated frequency equal to or greater than 80%. The purpose is to obtain the element that would be above most samples, respecting the limit of 80%. More information about the BLGBA model and the DSNS can be found in [10] and [11]. Figure 4.1 illustrates, in the form of a histogram, the daily movement of S2 and its respective DSNS, generated by the BLGBA model.
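Equations (1) and (2) and the 80% rule can be sketched as follows; the class-boundary handling and the fallback for degenerate samples are interpretations, not taken verbatim from [10][11]:

```python
def blgba_baseline(samples):
    """BLGBA sketch: class amplitude h (eq. 1), class limits L_Ck (eq. 2),
    then the greatest element of the class whose accumulated frequency
    reaches 80% of the samples."""
    g, s = max(samples), min(samples)
    h = (g - s) / 5.0                          # equation (1)
    limits = [s + h * k for k in range(1, 6)]  # equation (2), k = 1..5
    # distribute the samples into the 5 classes
    classes = [[] for _ in range(5)]
    for x in samples:
        for k, lim in enumerate(limits):
            if x <= lim:
                classes[k].append(x)
                break
    # find the class where the accumulated frequency reaches 80%
    acc, target = 0, 0.8 * len(samples)
    for cls in classes:
        acc += len(cls)
        if acc >= target and cls:
            return max(cls)
    return limits[-1]                          # safety fallback
```

For instance, for samples 1..10 the class limits are 2.8, 4.6, 6.4, 8.2 and 10, the fourth class reaches the 80% accumulated frequency, and the baseline value is 8.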
In this figure, graphs concerning a week of September 2005 are shown, with the DSNS in blue and the real movement that occurred on each day in green and red. A close adjustment between the real movement and the DSNS can be observed.

Figure 4.1: Real traffic and DSNS at S2, SNMP object tcpInSegs

5. Anomaly Detection
Anomaly detection is performed by comparing the profile of normal traffic operations with the real movement, in order to identify anomalous behaviors in the network. The Anomaly Detection System must be effective and present a low rate of false positives, besides generating a reduced amount of notifications, so that the network administrators are not overloaded and receive additional information useful to the search for the cause of the anomaly. The first objective of this stage is to compare the data obtained through the monitored SNMP objects with their respective DSNS, in search of significant behavior deviations. Deviations detected at each SNMP object bring different perspectives on the present event. They are later correlated, leading to a more precise diagnosis, which will indicate whether an anomaly exists or not. The correlation is based on a directed graph that expresses the existing relations between the monitored SNMP objects, according to figure 5.1.

Figure 5.1: Correlation graph

Figure 5.2 presents the reference model of the Anomaly Detection System. The GBA tool (Automatic Backbone Management) [11] is responsible for the collection and storage of samples and for the execution of the BLGBA model to generate the DSNS. The Alarm system reports the deviations detected through the comparison between the DSNS and the real movement pictured by the SNMP objects. The Correlation system gathers these alarms and analyzes them using the correlation graph.
Its function is to verify the occurrence of an anomaly and to offer a map of its behavior to the network administrator, aiming to help in the search for the origin and solution of the problem.

Figure 5.2: Reference model of the Anomaly Detection System

5.1 Alarm System
The Alarm system indicates the occurrence of a behavior deviation in a specific SNMP object, generating an alarm when the three following facts happen simultaneously:
• Fact 1: the real sample analyzed deviates from the limit established by the DSNS and the hysteresis interval t is initiated.
• Fact 2: the current sample analyzed overcomes the previous one concerning the occurrence of fact 1 in the hysteresis interval t.
• Fact 3: the number of occurrences of fact 2 in the hysteresis interval t overcomes the value of δ.
The occurrence of these three facts is required to characterize a significant behavior deviation, aiming to avoid the generation of false alarms. The hysteresis interval t is 60 seconds. The value of δ is 25. These conventions were defined after a great number of practical and analytical tests in various situations. The emission of alarms does not generate a direct notification to the administrator, since alarms indicate not the occurrence of an anomaly but a behavior deviation in a single SNMP object. However, the generated alarms are available in log files and in graphics referring to the network movements, so that the administrators can perform a more precise analysis of the event when necessary. Information such as the moment of generation, the number and frequency of the alarms, and the values of the DSNS and real traffic can be useful for network planning, so that preventable future anomalous situations are avoided. The operation of the Anomaly Detection System is based on constant time frames of 5 minutes. The comparison of the DSNS with the real movement and the correlation of the alarms are performed within these time frames.
For each five-minute time frame, several hysteresis intervals t can exist. Figure 5.3 presents an example of the behavior of the hysteresis interval t within the five-minute time frame during the operation of the Alarm system. The hysteresis interval is initiated only when a real sample deviates from the limit established by the DSNS. The five-minute frame, however, is fixed and independent of any other factor. Once the hysteresis interval has started, if the number of occurrences of fact 2 overcomes δ, an alarm is generated and sent to the correlation system.

Figure 5.3: Five-minute frame and hysteresis interval t

5.2 Correlation System
The Correlation system is based on the correlation directed graph presented in figure 5.1. A graph G is a data structure defined by G = (V, E), where V represents the set of vertices of the graph and E the set of edges that link vertices respecting a specific relation between them. For directed graphs, edges express a unidirectional relationship between two vertices and are represented by ordered pairs (x, y) [3]. In the correlation graph, an ordered pair (x, y) defines that an anomaly can propagate from the SNMP object represented by vertex x to the SNMP object represented by vertex y. The relations of the correlation graph were built based on the characteristics of the information available at each monitored SNMP object. The correlation graph aims to assemble in its scope the possible courses of propagation of anomalies along the SNMP objects, mapping their behavior at the analyzed network element. The analysis and correlation of SNMP objects belonging to four different groups of the MIB-II [9] (interface, ip, tcp and udp) allow the detection of a very diversified set of anomaly behaviors. The correlation graph helped the identification of three data flows at network elements in general. The first one is the input flow, formed by the SNMP objects ifInOctets, ipInReceives, ipInDelivers, udpInDatagrams and tcpInSegs.
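A minimal sketch of the three-fact alarm rule follows, assuming samples arrive once per second, t is counted in samples, and δ = 25 as stated in the text; the exact window bookkeeping is an interpretation:

```python
def alarm_fired(samples, limits, t=60, delta=25):
    """Three-fact alarm sketch: fact 1 opens a hysteresis window of t
    samples; fact 2 counts deviating samples that exceed the previous
    deviating sample; fact 3 fires the alarm once the count passes delta."""
    window_end = None
    prev = None
    count = 0
    for i, (x, lim) in enumerate(zip(samples, limits)):
        if x <= lim:
            continue                      # no deviation from the DSNS limit
        if window_end is None or i >= window_end:
            window_end = i + t            # fact 1: start hysteresis interval
            prev, count = x, 0
        else:
            if x > prev:                  # fact 2: the deviation keeps growing
                count += 1
                if count > delta:         # fact 3: more than delta occurrences
                    return True
            prev = x
    return False
```

A steadily growing deviation above the DSNS limit fires the alarm, while samples that stay under the limit, or deviate without growing, do not.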
The second one is the output flow, formed by the objects tcpOutSegs, udpOutDatagrams, ipOutRequests and ifOutOctets. The third and last one is the forwarding flow, presenting the objects ifInOctets, ipInReceives, ipForwDatagrams and ifOutOctets.

Figure 5.4 shows the behavior of the data flows with respect to the layers of the TCP/IP set of protocols. The input flow crosses all the layers until it reaches the application layer, where the data will be used. The output flow follows the inverse course, starting at the application layer and sending data to the network. The forwarding flow goes up to the network layer, where the forwarding of data to other points of the network is defined, in a process typical of routing equipment.

Figure 5.4 TCP/IP protocol layers and the three flows

The algorithm that gathers all the alarms generated and verifies the occurrence of an anomaly through the correlation graph was built based on the depth-first search algorithm [3]. The difference is that in the depth-first search algorithm the graph is processed going from a vertex to its adjacent vertices, while in the algorithm used by the Correlation system the graph is processed going from a vertex to its correlated vertices. Two vertices are correlated when they are adjacent and there are alarms generated for both SNMP objects in the same five-minute time frame.

The algorithm developed requires the definition of the initial and final vertices in order to perform the search. The choice of these vertices is based on what we call initial and final monitoring points for the input, output and forwarding flows. The initial monitoring point of the input flow is the object ifInOctets and the final monitoring points are the objects udpInDatagrams and tcpInSegs. The initial points of the output flow are the objects udpOutDatagrams and tcpOutSegs and the final point is the object ifOutOctets.
The initial point of the forwarding flow is the object ifInOctets and the final point is the object ifOutOctets.

The correlation algorithm is presented in table 5.1. The depthFirstSearch routine is recursive and is in charge of processing the correlation graph, searching for the course followed by the supposed anomaly, verifying its occurrence and preparing the map to be presented to the network administrator in case the anomaly is detected. The situation is considered anomalous when the search process at the correlation graph reaches a final monitoring point.

5.3 Evaluation and Results

The parameters used to evaluate the performance of the Anomaly Detection System are the rate of false positives and the rate of anomalies detected compared to the total of occurrences. The following variables are necessary to calculate these parameters:
• amount_of_detected: the total number of anomalies that were correctly detected by the Anomaly Detection System;
• amount_of_missed: the total number of anomalies that occurred and were not detected;
• amount_of_false: the total number of anomaly notifications that do not correspond to an anomaly.

The following rates can be calculated based on these variables:

detection_rate = amount_of_detected / (amount_of_detected + amount_of_missed)    (3)

false_rate = amount_of_false / (amount_of_detected + amount_of_false)    (4)

Figures 5.5 and 5.6 present histograms with the results of the evaluation of the Anomaly Detection System.
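Equations (3) and (4) translate directly into code; the counts used in the example below are illustrative, not measured values from the evaluation.

```python
# Evaluation rates from equations (3) and (4).

def detection_rate(amount_of_detected, amount_of_missed):
    """Fraction of the anomalies that occurred and were detected, eq. (3)."""
    return amount_of_detected / (amount_of_detected + amount_of_missed)

def false_rate(amount_of_detected, amount_of_false):
    """Fraction of the notifications that are false positives, eq. (4)."""
    return amount_of_false / (amount_of_detected + amount_of_false)

# Illustrative month: 90 anomalies detected, 10 missed, 3 false notifications.
print(detection_rate(90, 10))  # 0.9
print(false_rate(90, 3))       # about 0.032
```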
These results were calculated for each month of the second semester of 2005 for the three network elements analyzed in this work: S1, S2 and S3.

Table 5.1 Correlation algorithm

Input data of the algorithm:
• G = (V, E): correlation graph, where V is the set of vertices that represent the SNMP objects and E is the set of edges that make their relationships explicit;
• O_i: set of initial objects of the three flows;
• O_f: set of final objects of the three flows;
• a: alarm sent by the Alarm system;
• S: stack used in the depth-first search algorithm;

Output data of the algorithm:
• anomaly notification that includes the anomaly behavior map shown in subgraph g ∈ G;

Functions:
• C(o): returns the set of objects correlated to object o;
• F(a): identifies the flows related to alarm a and returns the set of objects belonging to these flows;

/* main program */
begin
    Correlation system receives alarm a;
    for each o ∈ (O_i ∩ F(a)) do
        depthFirstSearch(o);
end;

procedure depthFirstSearch(o)
begin
    mark o as visited;
    push o on the stack S;
    if (o ∈ O_f) then
        send anomaly notification;
    for each o′ ∈ C(o) do
    begin
        if o′ not marked then
            depthFirstSearch(o′);
    end for;
    pop S;
end;

Figure 5.5 shows the rates of the anomalies correctly detected, calculated according to equation (3). Only the months of July and December for S1 and July and September for S2 presented a detection rate under 85%. The other results in figure 5.5 present rates near 90%, which indicates the effectiveness of the Anomaly Detection System.

Figure 5.5 Detection rate of the Anomaly Detection System for S1, S2 and S3

Figure 5.6 shows the rate of false positives, calculated with equation (4). All the rates of false positives found are under 4%. Element S2 presents various incidences of short-duration traffic peaks that can lead the Anomaly Detection System to generate false positives.
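The search of table 5.1 can be sketched in runnable form as follows. The adjacency list reconstructs the edges implied by the three flow descriptions (the exact edge set of figure 5.1 is an assumption), and the F(a) flow-filtering step is simplified to intersecting the initial objects with the set of alarmed objects; only the strategy itself (depth-first search over correlated vertices, notifying when a final monitoring point is reached) follows the table.

```python
# Sketch of the Table 5.1 correlation algorithm (edge set reconstructed
# from the flow descriptions; an assumption, not the paper's figure 5.1).

GRAPH = {
    "ifInOctets": ["ipInReceives"],
    "ipInReceives": ["ipInDelivers", "ipForwDatagrams"],
    "ipInDelivers": ["udpInDatagrams", "tcpInSegs"],
    "ipForwDatagrams": ["ifOutOctets"],
    "udpOutDatagrams": ["ipOutRequests"],
    "tcpOutSegs": ["ipOutRequests"],
    "ipOutRequests": ["ifOutOctets"],
}
INITIAL = {"ifInOctets", "udpOutDatagrams", "tcpOutSegs"}  # O_i
FINAL = {"udpInDatagrams", "tcpInSegs", "ifOutOctets"}     # O_f

def correlate(alarmed):
    """alarmed: SNMP objects with alarms in the same five-minute frame.
    Returns the anomaly behavior maps (paths that reach a final point)."""
    notifications, visited, stack = [], set(), []

    def depth_first_search(o):
        visited.add(o)                         # mark o as visited
        stack.append(o)                        # push o on the stack S
        if o in FINAL:                         # final monitoring point reached
            notifications.append(list(stack))  # send anomaly notification
        for nxt in GRAPH.get(o, []):
            # correlated vertices: adjacent, and both objects alarmed
            if nxt in alarmed and nxt not in visited:
                depth_first_search(nxt)
        stack.pop()                            # pop S

    for o in INITIAL & alarmed:                # o in (O_i ∩ F(a)), simplified
        depth_first_search(o)
    return notifications
```

For the S1 example discussed later (alarms on udpOutDatagrams, ipOutRequests and ifOutOctets), the search starts at udpOutDatagrams, reaches the final point ifOutOctets, and reports the output-flow path as the anomaly behavior map.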
Elements S1 and S3 do not present so many occurrences of this type, and for this reason their rates of false positives are in general lower than those presented by S2. This difference of behavior between the network elements analyzed did not influence the detection rates presented in figure 5.5.

Figure 5.6 False positives rate of the Anomaly Detection System for S1, S2 and S3

The results obtained indicate that the Anomaly Detection System is able to adapt to the different network elements, presenting low rates of false positives and high rates of successful detection.

The amount of anomalies found in each of the three data flows of a network element is closely related to the characteristics of its operation. S1 is a firewall that concentrates its operations on filtering and forwarding packets. For this reason, most of the anomalies detected in S1 were present at the forwarding flow. S2, which is responsible for final-user services through TCP connections and concentrates its operations on the application layer, had its anomalies detected in the input and output flows. Finally, S3 is a router and therefore offers services that are important from the operational point of view and that are transparent to the final user. Its main function is related to data forwarding, performed at the forwarding flow, where most of the anomalies for this element were detected. The proportions of anomalies found in each flow for the network elements are presented in table 5.2.

Table 5.2 Classification of anomalies occurred during the second semester of 2005

        Input flow   Output flow   Forwarding flow
S1      0.00         0.04          0.96
S2      0.50         0.50          0.00
S3      0.03         0.02          0.95

Figure 5.7 presents the graphics related to the occurrence of an anomaly at the S1 output flow. There was a great behavior deviation in three SNMP objects: ifOutOctets, ipOutRequests and udpOutDatagrams.
The Alarm system generated alarms that were sent to the Correlation system, which executed the search at the correlation graph and detected the occurrence of the anomaly at the output flow. The initial object of the search at the correlation graph was udpOutDatagrams and the final object was ifOutOctets. The notification sent to the network administrators informed the SNMP objects involved, building an anomaly behavior map. Considering that the anomaly was at the output flow, it was possible to conclude that S1 was injecting anomalous traffic into the network and could be causing anomalies that would be detected in other network elements.

Figure 5.7 Anomaly detected for output flow at S1 (firewall)

6. Conclusions

This work reconfirms, as in [10] and [11], that the DSNS generated by the BLGBA model presents good results for traffic characterization, thus accomplishing its main purpose, which is the creation of baselines for various network elements. The effectiveness of the traffic characterization is essential for a good performance of the anomaly detection system, since it is the first and fundamental step that must be accomplished in this kind of system.

It was possible to observe that the behavior of the SNMP objects is intimately related to the characteristics of the operation of each element. At elements S1 and S3, which deal with packet forwarding, the anomalies appeared most of the time in objects related to the forwarding flow. At element S2, which offers services to the final user and whose actions are concentrated in the application layer, the anomalies appeared at the SNMP objects belonging to the input and output flows.

The good results presented by the rates of false positives and of detection in the three elements analyzed show that the Anomaly Detection System is able to adapt to network elements with different characteristics while maintaining a good performance.
Besides, it is possible to conclude that the system carries out its function effectively, notifying the network administrators about the occurrence of unexpected movements in the traffic. The notifications sent present, among other information, a map of the anomaly behavior obtained from the correlation graph, which facilitates the intervention of the network administrators on the problem.

The method presented here fulfilled an important requirement that had not been approached in [17]: to offer additional information to the network administrator to facilitate the search for the origin and the solution of the problem. The correlation graph proposed in this paper is responsible for this gain in the semantics of the notifications sent to network administrators.

Future works include increasing the variety of SNMP objects monitored for each one of the MIB-II groups approached in this work, always searching for the behavior correlation between them. The inclusion of SNMP objects related to packet discards and to errors can enhance even more the diagnosis offered by the tool. Another step is related to the localization of the origin of these anomalies, facilitating the determination of the cause and the solution of the problem.

7. Acknowledgements

Our thanks to The State of São Paulo Research Foundation (FAPESP), which supports this work.

References

[1] Z. U. M. Abusina, S. M. S. Zabir, A. Ashir, D. Chakraborty, T. Suganuma and N. Shiratori. "An Engineering Approach to Dynamic Prediction of Network Performance from Application Logs". International Journal of Network Management, v. 15, p. 151-162, Feb. 2005.
[2] J. B. D. Cabrera, L. Lewis, X. Qin, W. Lee, R. K. Prasanth, B. Ravichandran, R. K. Mehra. "Proactive Detection of Distributed Denial of Service Attacks using MIB Traffic Variables – A Feasibility Study". Integrated Network Management Proceedings, 2001 IEEE/IFIP International Symposium on, p. 609-622, May 2001.
[3] J. L. Gersting. "Mathematical Structures for Computer Science". 5 ed., W. H. Freeman, 2002.
[4] H. Hajji. "Baselining Network Traffic and Online Faults Detection". IEEE International Conference on Communications, 2003 (ICC '03), v. 1, p. 301-308, May 2003.
[5] J. Jiang, S. Papavassiliou. "Detecting Network Attacks in the Internet via Statistical Network Traffic Normality Prediction". Journal of Network and Systems Management, v. 12, p. 51-72, Mar. 2004.
[6] B. Krishnamurthy, S. Sen, Y. Zhang, Y. Chen. "Sketch-based change detection: methods, evaluation, and applications". Proceedings of the 2003 ACM SIGCOMM Conference on Internet Measurement, Miami Beach, p. 234-247, ISBN 1-58113-773-7.