Deep Priority Local Aggregated Hashing
Journal of Hunan University (Natural Sciences), Vol. 48, No. 6, Jun. 2021
Article ID: 1674-2974(2021)06-0058-09    DOI: 10.16339/j.cnki.hdxbzkb.2021.06.009

Deep Priority Local Aggregated Hashing

LONG Xianzhong1, CHENG Cheng1,2, LI Yun1,2
(1. School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 2. Key Laboratory of Jiangsu Big Data Security and Intelligent Processing, Nanjing 210023, China)

CLC number: TP391.4    Document code: A

Abstract: The existing deep supervised hashing methods cannot effectively utilize the extracted convolutional features, and they also ignore the role that the distribution of similarity information between data pairs plays in the hash network, resulting in insufficient discrimination between the learned hash codes. To solve this problem, a novel deep supervised hashing method called Deep Priority Locally Aggregated Hashing (DPLAH) is proposed in this paper, which embeds the vector of locally aggregated descriptors (VLAD) into the hash network so as to improve the ability of the hash network to express data of the same class, and reduces the impact of similarity-distribution skew on the hash network by imposing different weights on the data pairs. The DPLAH experiments are carried out with the PyTorch deep learning framework: the convolutional features output by a ResNet-18 backbone are aggregated with a NetVLAD layer, and the hash codes are learned from the aggregated features. 
The image retrieval experiments on the CIFAR-10 and NUS - WIDE datasets show that the mean average precision (MAP) of DPLAH is11 percentage points higher than that of* 收稿日期:2020-04-26基金项目:国家自然科学基金资助项目(61906098,61772284),National Natural Science Foundation of China(61906098, 61772284);国家重 点研发计划项目(2018YFB 1003702) , National Key Research and Development Program of China (2018YFB1003702)作者简介:龙显忠(1985—),男,河南信阳人,南京邮电大学讲师,工学博士,硕士生导师覮 通信联系人,E-mail : *************.cn第6期龙显忠等:深度优先局部聚合哈希59non-deep hash learning algorithms using manual features and convolution neural network features,and the MAP of DPLAH is2percentage points higher than that of asymmetric deep supervised hashing method.Key words:deep Hash learning;convolutional neural network;image retrieval;vector of locally aggregated de-scriptors(VLAD)随着信息检索技术的不断发展和完善,如今人们可以利用互联网轻易获取感兴趣的数据内容,然而,信息技术的发展同时导致了数据规模的迅猛增长.面对海量的数据以及超大规模的数据集,利用最近邻搜索[1(Nearest Neighbor Search,NN)的检索技术已经无法获得理想的检索效果与可接受的检索时间.因此,近年来,近似最近邻搜索[2(Approximate Nearest Neighbor Search,ANN)变得越来越流行,它通过搜索可能相似的几个数据而不再局限于返回最相似的数据,在牺牲可接受范围的精度下提高了检索效率.作为一种广泛使用的ANN搜索技术,哈希方法(Hashing)[3]将数据转换为紧凑的二进制编码(哈希编码)表示,同时保证相似的数据对生成相似的二进制编码.利用哈希编码来表示原始数据,显著减少了数据的存储和查询开销,从而可以应对大规模数据中的检索问题.因此,哈希方法吸引了越来越多学者的关注.当前哈希方法主要分为两类:数据独立的哈希方法和数据依赖的哈希方法,这两类哈希方法的区别在于哈希函数是否需要训练数据来定义.局部敏感哈希(Locality Sensitive Hashing,LSH)[4]作为数据独立的哈希代表,它利用独立于训练数据的随机投影作为哈希函数•相反,数据依赖哈希的哈希函数需要通过训练数据学习出来,因此,数据依赖的哈希也被称为哈希学习,数据依赖的哈希通常具有更好的性能.近年来,哈希方法的研究主要侧重于哈希学习方面.根据哈希学习过程中是否使用标签,哈希学习方法可以进一步分为:监督哈希学习和无监督哈希学习.典型的无监督哈希学习包括:谱哈希[5(Spectral Hashing,SH);迭代量化哈希[6](Iterative Quantization, ITQ);离散图哈希[7(Discrete Graph Hashing,DGH);有序嵌入哈希[8](Ordinal Embedding Hashing,OEH)等.无监督哈希学习方法仅使用无标签的数据来学习哈希函数,将输入的数据映射为哈希编码的形式.相反,监督哈希学习方法通过利用监督信息来学习哈希函数,由于利用了带有标签的数据,监督哈希方法往往比无监督哈希方法具有更好的准确性,本文的研究主要针对监督哈希学习方法.传统的监督哈希方法包括:核监督哈希[9](Supervised Hashing with Kernels,KSH);潜在因子哈希[10](Latent Factor Hashing,LFH);快速监督哈希[11](Fast Supervised Hashing,FastH);监督离散哈希[1(Super-vised Discrete Hashing,SDH)等.随着深度学习技术的发展[13],利用神经网络提取的特征已经逐渐替代手工特征,推动了深度监督哈希的进步.具有代表性的深度监督哈希方法包括:卷积神经网络哈希[1(Convolutional Neural Networks Hashing,CNNH);深度语义排序哈希[15](Deep Semantic Ranking Based Hash-ing,DSRH);深度成对监督哈希[16](Deep Pairwise-Supervised Hashing,DPSH);深度监督离散哈希[17](Deep Supervised Discrete Hashing,DSDH);深度优先哈希[18](Deep Priority Hashing,DPH)等.通过将特征学习和哈希编码学习(或哈希函数学习)集成到一个端到端网络中,深度监督哈希方法可以显著优于非深度监督哈希方法.到目前为止,大多数现有的深度哈希方法都采用对称策略来学习查询数据和数据集的哈希编码以及深度哈希函数.相反,非对称深度监督哈希[19](Asymmetric Deep Supervised Hashing,ADSH)以非对称的方式处理查询数据和整个数据库数据,解决了对称方式中训练开销较大的问题,仅仅通过查询数据就可以对神经网络进行训练来学习哈希函数,整个数据库的哈希编码可以通过优化直接得到.本文的模型同样利用了ADSH的非对称训练策略.然而,现有的非对称深度监督哈希方法并没有考虑到数据之间的相似性分布对于哈希网络的影响,可能导致结果是:容易在汉明空间中保持相似关系的数据对,往往会被训练得越来越好;相反,那些难以在汉明空间中保持相似关系的数据对,往往在训练后得到的提升并不显著.同时大部分现有的深度监督哈希方法在哈希网络中没有充分有效利用提60湖南大学学报(自然科学版)2021年取到的卷积特征.本文提出了一种新的深度监督哈希方法,称为深度优先局部聚合哈希(Deep Priority Local Aggregated Hashing,DPLAH).DPLAH的贡献主要有三个方面:1)DPLAH采用非对称的方式处理查询数据和数据库数据,同时DPLAH网络会优先学习查询数据和数据库数据之间困难的数据对,从而减轻相似性分布倾斜对哈希网络的影响.2)DPLAH设计了全新的深度哈希网络,具体来说,DPLAH将局部聚合表示融入到哈希网络中,提高了哈希网络对同类数据的表达能力.同时考虑到数据的局部聚合表示对于分类任务的有效性.3)在两个大型数据集上的实验结果表明,DPLAH在实际应用中性能优越.1相关工作本节分别对哈希学习[3]、NetVLAD[20]和Focal Loss[21]进行介绍.DPLAH分别利用NetVLAD和Focal Loss提高哈希网络对同类数据的表达能力及减轻数据之间相似性分布倾斜对于哈希网络的影响. 
1.1哈希学习哈希学习[3]的任务是学习查询数据和数据库数据的哈希编码表示,同时要满足原始数据之间的近邻关系与数据哈希编码之间的近邻关系相一致的条件.具体来说,利用机器学习方法将所有数据映射成{0,1}r形式的二进制编码(r表示哈希编码长度),在原空间中不相似的数据点将被映射成不相似)即汉明距离较大)的两个二进制编码,而原空间中相似的两个数据点将被映射成相似(即汉明距离较小)的两个二进制编码.为了便于计算,大部分哈希方法学习{-1,1}r形式的哈希编码,这是因为{-1,1}r形式的哈希编码对之间的内积等于哈希编码的长度减去汉明距离的两倍,同时{-1,1}r形式的哈希编码可以容易转化为{0,1}r形式的二进制编码.图1是哈希学习的示意图.经过特征提取后的高维向量被用来表示原始图像,哈希函数h将每张图像映射成8bits的哈希编码,使原来相似的数据对(图中老虎1和老虎2)之间的哈希编码汉明距离尽可能小,原来不相似的数据对(图中大象和老虎1)之间的哈希编码汉明距离尽可能大.h(大象)=10001010h(老虎1)=01100001h(老虎2)=01100101相似度尽可能小相似度尽可能大图1哈希学习示意图Fig.1Hashing learning diagram1.2NetVLADNetVLAD的提出是用于解决端到端的场景识别问题[20(场景识别被当作一个实例检索任务),它将传统的局部聚合描述子向量(Vector of Locally Aggregated Descriptors,VLAD[22])结构嵌入到CNN网络中,得到了一个新的VLAD层.可以容易地将NetVLAD 使用在任意CNN结构中,利用反向传播算法进行优化,它能够有效地提高对同类别图像的表达能力,并提高分类的性能.NetVLAD的编码步骤为:利用卷积神经网络提取图像的卷积特征;利用NetVLAD层对卷积特征进行聚合操作.图2为NetVLAD层的示意图.在特征提取阶段,NetVLAD会在最后一个卷积层上裁剪卷积特征,并将其视为密集的描述符提取器,最后一个卷积层的输出是H伊W伊D映射,可以将其视为在H伊W空间位置提取的一组D维特征,该方法在实例检索和纹理识别任务[23別中都表现出了很好的效果.NetVLAD layer(KxD)x lVLADvectorh------->图2NetVLAD层示意图⑷Fig.2NetVLAD layer diagram1201NetVLAD在特征聚合阶段,利用一个新的池化层对裁剪的CNN特征进行聚合,这个新的池化层被称为NetVLAD层.NetVLAD的聚合操作公式如下:NV((,k)二移a(x)(血⑺-C((j))(1)i=1式中:血(j)和C)(j)分别表示第i个特征的第j维和第k个聚类中心的第j维;恣&)表示特征您与第k个视觉单词之间的权.NetVLAD特征聚合的输入为:NetVLAD裁剪得到的N个D维的卷积特征,K个聚第6期龙显忠等:深度优先局部聚合哈希61类中心.VLAD的特征分配方式是硬分配,即每个特征只和对应的最近邻聚类中心相关联,这种分配方式会造成较大的量化误差,并且,这种分配方式嵌入到卷积神经网络中无法进行反向传播更新参数.因此,NetVLAD采用软分配的方式进行特征分配,软分配对应的公式如下:-琢II Xi-C*II 2=—e(2)-琢II X-Ck,II2k,如果琢寅+肄,那么对于最接近的聚类中心,龟&)的值为1,其他为0.aS)可以进一步重写为:w j X i+b ka(x i)=—e-)3)w J'X i+b kk,式中:W k=2琢C k;b k=-琢||C k||2.最终的NetVLAD的聚合表示可以写为:N w;x+b kv(j,k)=移—----(x(j)-Ck(j))(4)i=1w j.X i+b k移ek,1.3Focal Loss对于目标检测方法,一般可以分为两种类型:单阶段目标检测和两阶段目标检测,通常情况下,两阶段的目标检测效果要优于单阶段的目标检测.Lin等人[21]揭示了前景和背景的极度不平衡导致了单阶段目标检测的效果无法令人满意,具体而言,容易被分类的背景虽然对应的损失很低,但由于图像中背景的比重很大,对于损失依旧有很大的贡献,从而导致收敛到不够好的一个结果.Lin等人[21]提出了Focal Loss应对这一问题,图3是对应的示意图.使用交叉爛作为目标检测中的分类损失,对于易分类的样本,它的损失虽然很低,但数据的不平衡导致大量易分类的损失之和压倒了难分类的样本损失,最终难分类的样本不能在神经网络中得到有效的训练.Focal Loss的本质是一种加权思想,权重可根据分类正确的概率p得到,利用酌可以对该权重的强度进行调整.针对非对称深度哈希方法,希望难以在汉明空间中保持相似关系的数据对优先训练,具体来说,对于DPLAH的整体训练损失,通过施加权重的方式,相对提高难以在汉明空间中保持相似关系的数据对之间的训练损失.然而深度哈希学习并不是一个分类任务,因此无法像Focal Loss一样根据分类正确的概率设计权重,哈希学习的目的是学到保相似性的哈希编码,本文最终利用数据对哈希编码的相似度作为权重的设计依据具体的权重形式将在模型部分详细介绍.正确分类的概率图3Focal Loss示意图[21】Fig.3Focal Loss diagram12112深度优先局部聚合哈希2.1基本定义DPLAH模型采用非对称的网络设计.Q={0},=1表示n张查询图像,X={X i}m1表示数据库有m张图像;查询图像和数据库图像的标签分别用Z={Z i},=1和Y ={川1表示;i=[Z i1,…,zj1,i=1,…,n;c表示类另数;如果查询图像0属于类别j,j=1,…,c;那么z”=1,否则=0.利用标签信息,可以构造图像对的相似性矩阵S沂{-1,1}"伊”,s”=1表示查询图像q,和数据库中的图像X j语义相似,S j=-1表示查询图像和数据库中的图像X j语义不相似.深度哈希方法的目标是学习查询图像和数据库中图像的哈希编码,查询图像的哈希编码用U沂{-1,1}"",表示,数据库中图像的哈希编码用B沂{-1,1}m伊r表示,其中r表示哈希编码的长度.对于DPLAH模型,它在特征提取部分采用预训练好的Resnet18网络[25].图4为DPLAH网络的结构示意图,利用NetVLAD层聚合Resnet18网络提取到的卷积特征,哈希编码通过VLAD编码得到,由于VLAD编码在分类任务中被广泛使用,于是本文将NetVLAD层的输出作为分类任务的输入,利用图像的标签信息监督NetVLAD层对卷积特征的利用.事实上,任何一种CNN模型都能实现图像特征提取的功能,所以对于选用哪种网络进行特征学习并不是本文的重点.62湖南大学学报(自然科学版)2021年conv1图4DPLAH结构Fig.4DPLAH structure图像标签soft-max1,0,1,1,0□1,0,0,0,11,1,0,1,0---------*----------VLADVLAD core)c)l・>:i>数据库图像的哈希编码2.2DPLAH模型的目标函数为了学习可以保留查询图像与数据库图像之间相似性的哈希编码,一种常见的方法是利用相似性的监督信息S e{-1,1}n伊"、生成的哈希编码长度r,以及查询图像的哈希编码仏和数据库中图像的哈希编码b三者之间的关系[9],即最小化相似性的监督信息与哈希编码对内积之间的L损失.考虑到相似性分布的倾斜问题,本文通过施加权重来调节查询图像和数据库图像之间的损失,其公式可以表示为:min J=移移(1-w)(u T b j-rs)专,B i=1j=1s.t.U沂{-1,1}n伊r,B沂{-1,1}m伊r,W沂R n伊m(5)受FocalLoss启发,希望深度哈希网络优先训练相似性不容易保留图像对,然而Focal Loss利用图像的分类结果对损失进行调整,因此,需要重新进行设计,由于哈希学习的目的是为了保留图像在汉明空间中的相似性关系,本文利用哈希编码的余弦相似度来设计权重,其表达式为:1+。
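The NetVLAD aggregation formulas (1)-(4) above are garbled by the PDF extraction. As a reading aid, here is a minimal PyTorch sketch of the soft-assignment VLAD aggregation they describe, following the standard NetVLAD formulation (soft weights a_k(x_i) produced by a 1x1 convolution, residuals against learned cluster centres c_k). The class name, dimensions, and initialisation constant are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADLayer(nn.Module):
    """Soft-assignment VLAD aggregation of an (H, W, D) convolutional feature map."""

    def __init__(self, num_clusters=64, dim=512, alpha=100.0):
        super().__init__()
        self.num_clusters = num_clusters
        # Learnable cluster centres c_k and soft-assignment parameters (w_k, b_k) as a 1x1 conv
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1, bias=True)
        with torch.no_grad():
            # Initialise so the softmax approximates the distance-based assignment of Eq. (2)-(3)
            self.assign.weight.copy_((2.0 * alpha * self.centroids).unsqueeze(-1).unsqueeze(-1))
            self.assign.bias.copy_(-alpha * self.centroids.norm(dim=1) ** 2)

    def forward(self, x):                                   # x: (B, D, H, W) conv features
        B, D, H, W = x.shape
        soft_assign = F.softmax(self.assign(x), dim=1)      # a_k(x_i): (B, K, H, W)
        soft_assign = soft_assign.view(B, self.num_clusters, -1)         # (B, K, N)
        x_flat = x.view(B, D, -1)                                         # (B, D, N)
        # Residuals (x_i - c_k), weighted by the soft assignment and summed over locations
        residual = x_flat.unsqueeze(1) - self.centroids.view(1, self.num_clusters, D, 1)
        vlad = (residual * soft_assign.unsqueeze(2)).sum(dim=-1)          # (B, K, D)
        vlad = F.normalize(vlad, p=2, dim=2)                # intra-normalisation per cluster
        return F.normalize(vlad.view(B, -1), p=2, dim=1)    # (B, K*D) aggregated descriptor
```

In a DPLAH-style network, the (K x D)-dimensional output of such a layer would feed both the hash-code head and the auxiliary classification branch shown in Fig. 4; the exact pairwise loss weighting is described (in garbled form) in Section 2.2.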
面向高层次应用的点云数据结构化及语义化表达研究
点云数据是一种重要的三维数据表示形式,它由大量的离散点组成,通常用于描述真实世界中的物体表面或环境。
在许多领域,如计算机视觉、机器人技术、地理信息系统等,点云数据都扮演着至关重要的角色。
然而,点云数据的非结构化和无序性给其应用带来了很大的挑战。
如何对点云数据进行结构化处理,并实现语义化的表达,一直是学术界和工业界共同关注的研究方向之一。
1.点云数据的特点点云数据是由大量的离散点组成的,每个点包含空间坐标信息和可能的属性信息。
这种数据表示形式具有高度的灵活性和真实性,能够精确地描述物体的表面细节和环境的空间结构。
然而,由于点云数据的无序性和非结构化特点,使其难以直接用于高层次的应用,例如物体识别、场景分割、路径规划等。
2.点云数据的结构化处理为了克服点云数据的非结构化特点,研究者们提出了许多方法和算法,对点云数据进行结构化处理。
其中,最常见的方法包括网格化处理、基于特征的描述和深度学习方法。
在网格化处理中,点云数据被转换为规则网格的形式,从而方便后续的分析和处理。
而基于特征的描述则是通过对点云数据的局部特征进行提取和描述,从而实现对点云的结构化表达。
近年来深度学习方法的发展也为点云数据的结构化处理提供了新的途径,例如基于卷积神经网络的点云处理方法已经取得了一系列的突破,使得点云数据得以高效地进行结构化处理。
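As an illustration of the gridding (voxelization) route to structuring an unordered point cloud mentioned above, here is a minimal NumPy sketch; the function name and voxel size are illustrative assumptions rather than a reference to any particular library.

```python
import numpy as np

def voxel_grid_downsample(points, voxel_size=0.05):
    """Structure an unordered point cloud by bucketing points into a regular voxel grid
    and keeping one centroid per occupied voxel.

    points: (N, 3) array of x, y, z coordinates.
    """
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)   # integer voxel index per point
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    centroids = np.zeros((counts.shape[0], 3))
    np.add.at(centroids, inverse, points)                        # sum coordinates per voxel
    return centroids / counts[:, None]                           # average -> voxel centroids

# Example: 100k random points reduced to a structured set of voxel centroids
cloud = np.random.rand(100_000, 3)
print(voxel_grid_downsample(cloud, voxel_size=0.1).shape)
```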
3.点云数据的语义化表达除了结构化处理外,对点云数据进行语义化的表达也是十分重要的。
语义化表达是指将点云数据中的每个点赋予相应的语义标签或分类信息,从而实现对点云数据的语义理解和应用。
在实际应用中,语义化表达可以帮助机器识别不同的物体或环境,并做出相应的决策。
研究如何实现点云数据的有效语义化表达是当前研究的热点之一。
4.个人观点与展望在我的观点看来,面向高层次应用的点云数据结构化及语义化表达研究是一项既具有挑战性又具有重要意义的工作。
通过结构化处理和语义化表达,点云数据得以更好地应用于各种实际场景中,为人工智能、自动驾驶、智能制造等领域的发展提供了重要的支持。
遥感影像语义理解
基于自适应深度稀疏语义建模的高分辨率遥感影像场景分类:为了挖掘高分辨率遥感场景更具区分性的语义信息,提出了一种将稀疏主题和深层特征自适应相融合的深度稀疏语义建模(ADSSM)框架。
首先,为了从影像中发现本质底层特征,ADSSM框架集成了基于中层的稀疏主题模型FSTM和基于高层的卷积神经网络CNN。
基于稀疏主题和深度特征视觉信息的互补性,设计了三种异质性稀疏主题和深度场景特征来描述高分辨率遥感影像的复杂的几何结构和空间模式。
其中,FSTM可以从影像中获取局部和显著性信息,而CNN则更多关注的是全局和细节信息。
稀疏主题和深度特征的集成为高分辨率遥感场景提供了多层次的特征描述。
其次,为了改善稀疏主题和深度特征的融合,针对稀疏主题和深度特征之间的差异性,提出了一种自适应特征标准化策略。
在ADSSM中,挖掘的稀疏主题和深度特征各自进行自适应的标准化,以增强代表性特征的重要性。
基于自适应融合特征的表达,ADSSM框架可以减少复杂场景的混淆。
ADSSM框架在UCM、Google、NWPU-RESISC45以及OSRSI20四个数据集上的结果表明,提出的方法相较于目前公认的高精度场景分类方法来说有了较大的提升。
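The adaptive standardization and fusion of the two feature families described above can be pictured with a small sketch. Per-family z-score standardization is only an assumption made for illustration; the paper's exact adaptive normalization strategy may differ.

```python
import numpy as np

def adaptive_fuse(topic_feats, deep_feats, eps=1e-8):
    """Standardise each feature family separately, then concatenate them for the scene classifier.

    topic_feats: (n_scenes, n_topics) sparse-topic (FSTM) features.
    deep_feats:  (n_scenes, n_dims) CNN features.
    """
    def zscore(f):
        return (f - f.mean(axis=0)) / (f.std(axis=0) + eps)

    return np.concatenate([zscore(topic_feats), zscore(deep_feats)], axis=1)

# e.g. 1000 scenes with 50-topic FSTM features and 512-D CNN features
fused = adaptive_fuse(np.random.rand(1000, 50), np.random.rand(1000, 512))
print(fused.shape)   # (1000, 562)
```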
资源共享1.公开数据集(1)SIRI-WHU ⾕歌影像数据集 (The Google image dataset of SIRI-WHU, 更新⽇期:2019.12.10).该数据集包括12个类别,主要⽤于科研⽤途。
以下各个类别中均包含200幅影像:农场、商业区、港⼝、闲置⽤地、⼯业区、草地、⽴交桥、停车场、池塘、居民区、河流、⽔体每⼀幅影像⼤⼩为200*200,空间分辨率为2⽶。
该数据集获取⾃⾕歌地球,由武汉⼤学RS-IDEA研究组(SIRI-WHU)搜集制作,主要覆盖了中国的城市地区。
当您发表的结果中⽤到了该数据集,请引⽤以下⽂献:[1]Q. Zhu, Y. Zhong, L. Zhang, and D. Li, "Adaptive Deep Sparse Semantic Modeling Framework for High Spatial Resolution Image Scene Classification," IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(10): 6180-6195. DOI: 10.1109/TGRS.2018.2833293.[2]Q. Zhu, Y. Zhong, B. Zhao, G.-S. Xia, and L. Zhang, "Bag-of-Visual-Words Scene Classifier with Local and Global Features for High Spatial Resolution Remote Sensing Imagery," IEEE Geoscience and Remote Sensing Letters, 2016, 13(6): 747-751. DOI:10.1109/LGRS.2015.2513443 2016.(2)SIRI-WHU USGS标注影像数据集 (The USGS image dataset of SIRI-WHU, 更新⽇期:2019.12.10).该数据集包括4个场景类别:农场、森林、居民区、停车场,其主要⽤于科研⽤途。
多层融合深度局部PCA子空间稀疏优化特征提取模型
多层融合深度局部PCA子空间稀疏优化特征提取模型胡正平;陈俊岭【摘要】Subspace method is classical pattem recognition method,that uses global information mainly to denote an image.Recently,with the introduction of deep learning,the feature extraction model based on local self-learning has attracted more and more attention.By using the theory of deep learning,this paper presents a new feature extraction model based on multi-layered deep local subspace sparse optimization to solve the problem of object recognition.Firstly,we calculate the PCA mapping matrix on the first layer by minimizing the reconstruction error on the training sample set,then we optimize the feature mapping results through L1 norm to enhance the robustness of algorithm.Secondly,we use the output of the first layer as the input of second layer,then we implement same actions of feature learning.In this way we can map the image to deep PCA subspace.Finally we merge these feature extraction results from different layers with weighting and encode the merged feature with binary hash code and histogram segment code.After that,we obtain the multi-layered deep local subspace sparse feature.The experimental results on face database of FERET 、AR 、Yale and target database of MNIST 、CIFAR-10 show that this feature extraction model can achieve high recognition rate and robustness for illumination,expression and pose.At the sametime,compared with the convolutional neural networks,our algorithm owns the advantages of simple structure and fast convergent rate.%子空间方法是主要利用全局信息的经典模式识别方法,随着深度学习思想的引入,局部自学习结构特征模型得到大家的关注.利用深度学习原理,本文提出一种多层融合的深度局部子空间稀疏优化特征自学习抽取模型解决目标识别问题.首先,对训练样本集通过最小化重构误差得到第一层的主成分(Principal Component Analysis,PCA)特征映射矩阵;然后,通过L1范数约束对特征映射结果进行稀疏优化,提高算法鲁棒性.接着,在第二层映射层以第一层的特征输出为输入,进行同样的特征矩阵学习操作,最终将图像映射至深层PCA子空间;然后,对各个映射层的特征提取结果进行加权融合,进行二值化哈希编码和直方图分块编码,提取图像的深度子空间稀疏特征.在FE-RET、AR、Yale等经典人脸数据库以及MNIST、CIFAR-10等目标数据库上的实验结果表明,该算法可以取得较高的识别率以及较好的光照、表情、人脸朝向鲁棒性,并且相对于卷积神经网络等深度学习框架具有结构简洁、收敛速度快等优点.【期刊名称】《电子学报》【年(卷),期】2017(045)010【总页数】7页(P2383-2389)【关键词】深度学习;多层融合;子空间;稀疏优化【作者】胡正平;陈俊岭【作者单位】燕山大学信息科学与工程学院,河北秦皇岛066004;燕山大学信息科学与工程学院,河北秦皇岛066004【正文语种】中文【中图分类】TP391.4人脸识别技术是模式识别中最具挑战性的研究方向之一,在身份认证、安保监控方面有着广泛应用.人脸识别的发展主要分为三个阶段,从最开始的基于子空间的识别算法,到浅层机器学习算法,再到深度学习人脸识别方法.成熟的子空间人脸识别方法主要包括Turk等[1]提出的Principal Component Analysis(PCA)特征脸(Eigenface)方法和Belhumeur等[2]提出的Fisher脸(FisherFace)方法,奠定了代数特征人脸识别的理论基础.之后兴起的以Back Propagation(BP)神经网络[3]、Support Vector Machine(SVM)支持向量机和稀疏表示[4]为代表的浅层机器学习人脸识别算法,使得人们在像素数据层面对图像有了更深层次的理解.2006年Hinton等[5]提出多隐层神经网络具有更为优异的特征学习能力,并且其在网络训练上的难度可以通过“逐层初始化”的方式来降低,开启了深度学习的大门.近年来随着深度学习研究的深入,越来越多的学者试图将深度学习与子空间方法相结合,其中Chan等[6]提出的PCA Network(PCANet)方法,为子空间深度化指明了方向. 
卷积神经网络(Convolutional Neural Network,CNN)作为目前应用最为广泛的深度学习模型,所提取的图像深层特征要优于Local Binary Patterns(LBP)、Histogram of Oriented Gradients(HOG)[7]、Gabor [8]等人为构建的图像特征,其在人脸识别方面的表现也相当出色.在2012年的ImageNet大赛上基于CNN的深度学习模型在两个比赛中均取得第一名[9],并超出第二名至少10%.2014年Taigman等[10]提出的DeepFace算法,在LFW人脸数据库上达到97%以上的正确率.之后由Sun等[11-13]提出的DeepID算法更是将CNN人脸认证的准确率提高到了99.75%.卷积神经网络虽然准确率高,但其参数众多,其参数调整过程不仅复杂而且需要依赖大量的先验知识,收敛速度缓慢,为此研究人员试图寻找更为简单有效的深度结构框架.Lu等[14]将卷积神经网络中的映射核改为加权PCA(WPCA)映射矩阵,采用生成码本的方式融合得到最终的特征向量,提高了网络收敛性能.Liong等[15]提出的深度PCA(DeepPCA,DPCA)人脸识别算法,通过构建双层PCA映射网络,结合ZCA白化,提取深度子空间特征.Chan等[16]在DPCA基础上进行扩展,在2DPCA[17]框架内,加入哈希编码和直方图分块提取,构建了PCANet算法,在人脸识别、手写字体识别、物体识别方面的识别率都达到95%以上.在此基础上,Huang等[18]将PCANet与线性回归分类器相结合,应用于人脸识别,在算法鲁棒性方面有所提高.本文提出一种多层融合深度PCA子空间稀疏优化特征提取算法,在经典PCA特征脸方法中融入深度结构思想,将图像映射至深层PCA子空间,并将不同层次子空间特征映射结果进行加权融合,通过稀疏优化算法进行迭代以增加特征映射鲁棒性.在FERET、AR、Yale等人脸数据库上的实验表明,该方法在单训练样本条件下,其正确率与传统子空间识别方法相比有很大提高,在算法鲁棒性方面相比PCANet等新兴方法也有所改善.在MNIST、CIFAR-10等目标数据库上的实验结果表明该方法和卷积神经网络深度学习框架的识别率相当,但其结构简洁、收敛速度快.2.1 DPCA原理深度学习结构模型在一定程度上与人脑分层信息处理机制[19]相似,Sun Y等[13]通过对卷积神经网络的研究,从数据层面阐述深度分层特征模型在特征提取过程中所具有的稀疏性、特征选择性和特征鲁棒性,提出特征多层映射能够更好的对图像语义进行描述.因此越来越多的研究人员致力于寻找新的有效的多层特征模型,Liong 等[15]提出的深度PCA(DPCA)算法属于子空间深度化改造的典型算法.DPCA算法流程图如图1所示.在第一层对输入图像首先进行ZCA白化处理,去除特征之间相关性,降低冗余,然后进行PCA映射,提取输入图像的主成分信息.在第二层中,以上一层的特征输出为输入,进行同样操作,得到深层映射后的主成分特征.最终,将两层特征映射结果进行加权融合,送入分类器进行训练,在单训练样本条件下使用欧氏距离分类器进行预测.2.2 PCANet模型2015年Chan等[16]提出的PCANet识别算法是深度子空间模型的又一典型算法.它在DPCA算法的基础上进行了大量扩充,以单向的2DPCA[17]为核心,在完成双层特征映射之后,对输入特征进行直方图分块特征提取,然后经过哈希编码,得到鲁棒性更强的深度特征,具体流程如图2所示.(1)输入层假设有N个训练样本样本尺寸为m×n,PCA滤波器尺寸为k1×k2,首先,将第i个训练样本分块,减去均值之后转换成向量形式:此时输入训练样本可以表示为向量集形式:(2)第一映射层首先,对输入的数据矩阵X通过最小化重构误差来得到PCA滤波器核:训练得到的第一层滤波器核为:其中ql(XXT)代表XXT的第l个特征向量.(3)第二映射层第二层,输入数据为前一层映射输出结果:同样,在第二层中对输入数据去均值后进行PCA分析,得到该层的PCA卷积核:通过PCA卷积核将输入数据映射至深层子空间,得到深层子空间主成分特征:(4)输出层在输出层,对提取到的特征进行二值化哈希编码:其中H(·)为类Heaviside阶跃函数,去除特征值中的负值部分.编码完成后,对得到的特征图进行直方图分块处理,得到最终的特征输出:3.1 层间融合深度结构扩展深度结构模型本质上就是将信号在多层次模型中进行逐层映射,在每一层都进行相应的映射特征提取.实验表明[13,16,19],在深层模型中,底层特征多为目标结构信息,顶层特征则多为目标抽象语义信息,因此将各个层面的特征进行融合来进行特征表达,将比单纯使用顶层特征更为全面而准确.Wang[20]等在通过卷积神经网络解决年龄分类问题过程中,同样采取了层间特征融合手段,对深度结构中每层映射结果通过PCA降维进行下采样,融合得到多层深度特征.如图3所示,在本文所提出的算法中,同样借鉴了这种特征级联思想,对第一层和第二层子空间信息进行加权融合,得到更为全面的人脸主成分信息.在特征提取过程中,输入图像为X,尺寸为a×b,第一层的PCA滤波器个数为L1,第二层的滤波器个数为L2,第一层的PCA映射矩阵可表示为第二层映射矩阵可表示为则第一层的映射子空间特征可表示为:第二层映射子空间特征可表示为:将两层特征进行加权融合,得到融合的深度特征F:3.2 子空间2DPCA L1范数稀疏优化人脑视觉皮层在处理图像信号过程中具有结构化、层次化稀疏性[19],同样深度学习结构在信号响应的过程中也表现出这种特性.实验证明[13],深度神经网络的神经元响应稀疏性主要表现在两个方面:一是对于单张图片样本,只有半数神经元进行激活响应;二是对于每个神经元,只会响应半数图片样本.因此人们在构建深度神经网络时会着重对其稀疏性进行约束,其中具有代表性的有稀疏滤波(Sparse Filtering)算法[21]以及线性修正单元[22](Rectified Linear Units,ReLU)激活函数算法.在此基础上,人们对子空间方法也进行了相应的稀疏化改造,包括Lu等[14]提出的稀疏加权PCA(WPCA)算法和Wang等[23]提出的L1范数约束的2DPCA算法.在本文所提出算法中,同样借助L1范数对子空间单向2DPCA映射核进行稀疏优化,使得每层特征输出呈现出视觉稀疏性.假设有K个训练样本图像X=[X1,X2,…,XK],单向2DPCA目的在于找出最优投影矩阵U以及投影系数Z=[Z1,Z2,…,ZK],通过优化以下目标函数:由于在人脸识别中,不可避免的要引入光照、表情等干扰因素,导致重构系数Z丧失原有的稀疏性,因此需要引入一个额外的稀疏误差矩阵E参与重构:重新构造目标函数L(Z,E):优化求解最优的Z和E:这里X代表训练样本数据矩阵,Z代表子空间投影系数,E为误差矩阵,λ为规则化参数,通过l1范数约束,可以求出具有最佳稀疏性的误差矩阵Eopt和Zopt,这里通过迭代来完成优化.第一步:固定Eopt,此时目标函数可表示为Z的函数J(Z):其导数为:令导数为零,求得全局最优值Zopt:第二步:固定Zopt,则目标函数可以表示为:解这个凸优化问题,可得全局最优值Eopt:循环迭代第一步和第二步,得到优化的重构系数Zopt和误差矩阵Eopt,在本文实验中设计迭代次数为10次,得到的重构系数Zopt即为子空间特征.3.3 
多层融合深度局部子空间稀疏优化算法流程多层融合深度子空间稀疏优化深度特征提取流程如图4所示.对于一个双层深度子空间模型,假设第一层的映射核个数为L1,第二层的映射核个数为L2,输入图像为X.在第一层映射过程中,对训练样本通过最小化重构误差得到第一层的PCA映射矩阵然后针对子空间重构系数进行迭代优化,迭代10次之后得到第一层子空间稀疏特征在第二层中以第一层的特征输出为输入,再次进行最小化重构和稀疏映射,得到第二层子空间稀疏特征然后再将前后两层得到的映射结果进行加权融合,得到深度稀疏子空间特征再经过哈希编码和直方图分块编码,得到最终的鲁棒深度PCA子空间特征.本文在FERET、Yale、AR等人脸数据库上评估算法在人脸识别方面的准确性和鲁棒性,在CIFAR-10、Minist等数据库上验证算法在目标识别方面的性能.在进行人脸识别实验时采用欧氏距离分类器,在单训练样本的条件下进行识别;在目标识别方面测试时采用支持向量机作为模式识别分类器.采用双层网络结构,第一层映射核个数L1=8,第二层映射核个数L2=8,映射核的尺寸均为7×7,采用半覆盖式分块采样,迭代次数为10次,迭代收敛误差τ=0.05.本文采用CMU-PIE人脸数据库作为训练样本对特征提取模型进行训练,CMU-PIE人脸数据库包含来自68人的41 368张人脸图片,包含不同的姿态、光照、表情变化.这里从中选取1万张相对规范的人脸样本进行训练.实验证明多层融合深度子空间稀疏优化算法由于在特征训练和提取过程中针对噪声干扰进行了稀疏优化,因此其在光照、表情方面的鲁棒性相对于PCANet算法有有一定提升,同时在正确率方面也远远超过传统的单层子空间识别方法.4.1 光照、表情人脸识别为验证算法在面对光照、表情、人脸朝向等干扰因素时的鲁棒性,选用FERET和AR两个人脸库进行仿真实验.本文用到FERET数据库子集共200人,每人7张图片,包括1张正面人脸及6张不同朝向人脸,其中1张具有表情因素干扰,1张具有光照因素干扰,图片大小为64×64.AR数据库子集共100人,每人7张图片,包括1张正面人脸图像及6张具有表情、光照变化的人脸图像,图片大小64×64.采用单训练样本模式进行实验,将人脸数据库中每个人正面中性表情均匀光照人脸图片作为训练样本,其余图片作为测试样本,依据测试样本和训练样本之间深层子空间欧式距离进行分类.图5给出FERET数据库中的部分人脸图像,图6给出AR数据库中的部分人脸图像,表1给出在这两个数据库子集上本文算法与最新的PCANet算法在识别率方面的对比情况.在原文中[16],PCANet-2算法在FERET数据库上达到97.26%的准确率,这主要是由于Chan等所构建的PCANet-2算法是在大型人脸数据库MultiPIE上进行训练的,MultiPIE人脸数据库包含数十万张人脸训练样本,而本文实验由于条件限制,选择在CMU_PIE人脸数据库中的1万张训练样本进行试验,使得学习得到的特征映射矩阵性能相对于原文中的算法模型会有所降低.不过实验证明在同等训练水平下,本文提出的多层融合深度子空间稀疏优化人脸识别算法由于在特征训练和提取过程中针对存在的干扰噪声进行了稀疏优化,因此其相对于之前的PCANet深度学习框架,在存在光照、表情、人脸朝向干扰的情况下,识别率提高了4%~5%左右,充分说明本文提出的算法具有较好的鲁棒性,能够在PCANet算法的基础上更好的应对干扰问题.4.2 深度化模型-单层局部模型对比实验深度学习理论指出多隐层神经网络相对于单一结构网络系统具有更好的特征提取能力.很多学者都尝试对传统子空间方法进行深度化改造,将其扩展为两层甚至多层网络,具有代表性的包括Chan等[16]提出的PCANet算法以及Lu等[14]提出的联合特征学习(Joint Feature Learning,JFL)人脸识别算法.与经典的卷积神经网络等深度学习框架不同,本文所提出算法也属于单层特征提取算法多层堆叠的深度化改造,为了验证分层扩展后的特征提取性能,我们在FERET数据库、AR数据库、Yale B 数据库上进行仿真实验,对比经过深度化改造的特征提取模型与传统特征提取模型的性能,为了着重比较模型特征提取能力,需要简化人脸识别过程中用到的分类器,因此采用最简单的欧氏距离分类器,使得最终识别效果主要取决于模型所提取特征对图像语义的表达能力.表2给出各个方法之间的实验对比结果.实验证明,在同等硬件条件、均使用欧氏距离分类器的情况下,PCANet算法以及本文提出的深层PCA子空间方法都要优于以往的单层全局以及局部特征提取方法.尤其是在处理FERET这种存在较为严重的光照、人脸朝向干扰的人脸数据库时,经典的单层特征提取手段在欧氏距离分类器条件下只能达到50%左右的识别率.可见深层特征提取模型提取的层次化子空间稀疏特征能够更好的对原始图像语义进行描述.4.3 深度化模型-卷积神经网络对比实验本文提出的算法不仅在处理单训练样本人脸识别问题上表现良好,其在多训练样本目标识别方面也具有优异特征学习能力.在手写数字数据库MNIST和目标识别数据库CIFAR-10上进行仿真实验,将本文算法与卷积神经网络算法进行对比,评估算法在准确率、时间消耗等方面的性能.MNIST数据库是深度学习模型测试时常用数据库,主要采集美国中学生手写的数字样本,如图7所示.这里选用60000张图片作为训练样本,10000张样本作为测试样本.类似的CIFAR-10是一个广泛应用于深度学习训练的目标识别库,包含有10种目标,70000张图片,如图8所示,这里同样选择灰度化之后的60000张图片作为训练样本,10000张图片作为测试样本.在MNIST数据库和CIFAR-10数据库上进行仿真实验时,先在训练样本的基础上学习得到深度特征提取模型,然后再对训练样本进行特征提取,将提取到的特征送入支持向量机中进行训练,测试过程中也是先提取测试样本的深度子空间稀疏特征,然后送入支持向量机中进行分类预测.表3给出了卷积神经网络与本文算法在以上两个数据库上的仿真结果.实验证明,在同等硬件条件下(CPU主频3.10GHz,内存64G),本文所提出算法在CIFAR-10数据库上的识别率相对于卷积神经网络降低了大约3%,但训练和识别的时间消耗只为卷积神经网络的五分之一;类似的在MNIST数据库上本文算法识别率相对于卷积神经网络只降低了0.15%,但模型训练的时间损耗为原来的六分之一.不仅如此,在可调参数数量、模型收敛速度等方面本文算法也要优于卷积神经网络,充分证明了多层融合深度子空间稀疏优化算法的特征提取性能.受深度学习理论启发,本文重点研究经典子空间识别方法经过深度化改造之后的特征提取能力,对PCA算法进行深度分层扩展之后将其用于图像深度特征提取.于此同时为了解决人脸图像识别中存在的光照、表情、朝向等干扰问题,增加算法的稳定性,向其中加入稀疏优化算法,通过对特征映射矩阵进行迭代优化来增强其映射的鲁棒性.在FERET、Yale、AR等人脸数据库上的实验表明,该算法在单训练样本、只使用欧氏距离分类器进行分类的情况下,对于存在光照、表情、人脸朝向干扰的数据库仍能够达到较高的识别率,并且相对于单层局部特征提取模型具有更高程度的抽象语义特征提取能力;在MNIST和CIFAR-10等目标识别数据库上的实验表明在多训练样本、借助支持向量机分类的情况下,该方法与卷积神经网络的识别率相当,并且在时间消耗、模型复杂度、模型收敛速度方面要优于卷积神经网络.胡正平男,1970年生于四川仪陇县,博士、教授、博士生导师,中国电子学会高级会员,中国图像图形学会高级会员,目前研究方向为稀疏表示、模式识别.E-mail:***********.cn陈俊岭男,1991年生于河北唐山,燕山大学信息科学与工程学院信息与通信工程专业硕士研究生,主要研究方向为深度学习分类.E-mail:*****************【相关文献】[1]TURK M,PENTLAND A.Eigenfaces for recognition [J].Journal of Cognitive Neuroscience,1991,3(1): 71-86.[2]BELHUMEUR P N,HESPANHA J P,KRIEGMAN D.Eigenfaces vs Fisherfaces: recognition using class specific linear projection [J].IEEE Transactions on Pattern 
Analysis and Machine Intelligence,1997,19(7): 711-720.[3]LI Yong-qiang,PAN Jin.Face recognition algorithm based on improved BP neural network [J].International Journal of Security and Its Applications,2015,9(5): 175-184. [4]WRIGHT J,YANG A Y,GANESH A,et al.Robust face recognition via sparse representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2009,31(2): 210-227. [5]HINTON G,SALAKHUTDINOV R.Reducing the dimensionality of data with neural networks [J].Science,2006,313(5786): 504-507.[6]CHAN Tsung Han,JIA Kui,GAO Shenghua,LU Jiwen,ZENG Zinan,MA Yi.PCANet: A simple deep learning baseline for image classification [J].IEEE Transactions on Image Processing,2015,24(12): 5017-5032.[7]TAN Hengliang,YANG Bing,MA Zhengming.Face recognition based on the fusion of global and local HOG features of face images [J].Computer Vision,2014,8(3): 224-234.[8]LIU Chengjun,WECHSLER H.Independent component analysis of Gabor features for face recognition [J].IEEE Transactions on Neural Networks,2003,14(4): 919-928.[9]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks [A].Advances in Neural Information Processing Systems[C].Lake Tahoe,NV,United states: NIPS,2012.1097-1105.[10]TAIGMAN Y,YANG Ming,RANZATO M,WOLF L.DeepFace: Closing the gap to human-level performance in face verification [A].IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [C].Columbus,OH,United states: CVPR,2014.1701-1708.[11]SUN Yi,WANG Xiaogang,TANG Xiaoou.Deep learning face representation from predicting 10,000 classes [A].IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [C].Columbus,OH,United states: CVPR,2014.1891-1898.[12]SUN Y,CHEN Y,WANG X,et al.Deep learning face representation by joint identification-verification [A].Advances in Neural Information Processing Systems [C].Montreal,Canada: NIPS,2014.1988-1996.[13]SUN Yi,WANG Xiaogang,TANG Xiaoou.Deeply learned face representations are sparse,selective,and robust [A].The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [C].Boston,MA,USA: CVPR,2015.2892-2900.[14]LU Jiwen,LIONG V E,WANG Gang,MOULIN P.Joint feature learning for face recognition [J].IEEE Transactions on Information Forensics and Security,2015,10(7): 1371-1383. [15]LIONG V E,LU J,WANG G.Face recognition using Deep PCA [A].IEEE Transactions on Information,Communications and Signal Processing (ICICS) [C].Beijing,China: ICICS,2013.1-5.[16]CHAN T H,JIA K,GAO S,et al.PCANet: A simple deep learning baseline for image classification? 
[J].IEEE Transactions on Image Processing,2015,24(12): 5017-5032.[17]YANG J,ZHANG D,FRANGI A F,et al.Two-dimensional PCA: a new approach to appearance-based face representation and recognition [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2004,26(1): 131-137.[18]HUANG J,YUAN C.Weighted-PCANet for Face Recognition [A].Neural Information Proces sing [C].Montréal,Quebec,Canada: Springer International Publishing,2015.246-254.[19]焦李成,赵进,杨淑媛,刘芳,谢雯.稀疏认知学习、计算与识别的研究进展[J].计算机学报,2015,38(10): 1-18.JIAO Li-cheng,ZHAO Jin,YANG Shu-yuan,LIU Fang,XIE Wen.Research advances on sparse cognitive learning,computing and recognition [J].Chinese Journal ofComputers,2015,38(10): 1-18.( in Chinese)[20]WANG Xiaolong,GUO Rui,KAMBHAMETTU C.Deeply-learned feature for age estimation [A].IEEE Winter Conference on Applications of Computer Vision (WACV) [C].Waikoloa,HI,United states: IEEE,2015.534-541.[21]NGIAM Jiquan,KOH Pangwei,Chen Zhenghao,BHASKAR Sonia,NG ANDREW Y.Sparse filtering [A].Advances in Neural Information Processing Systems [C].Granada,Spain: NIPS,2011.1125-1133.[22]DAHL G E,SAINATH T N,HINTON G E.Improving deep neural networks for LVCSR using rectified linear units and dropout [A].IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP) [C].Vancouver,BC,Canada: IEEE,2013.8609-8613.[23]WANG D,LU H.Object tracking via 2DPCA and L1-regularization [J].Signal Processing Letters,2012,19(11): 711-714.。
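As a concrete picture of the two-stage PCA filter learning at the core of the DPCA/PCANet models discussed in Section 2 above, here is a minimal NumPy sketch of stage-1 filter learning (the leading eigenvectors of the patch covariance minimise the reconstruction error). It omits the L1-norm sparse optimization, the binary hashing, and the block-histogram stages, and all names, patch sizes, and filter counts are illustrative assumptions.

```python
import numpy as np

def extract_patches(img, k=7):
    """Collect all mean-removed k x k patches of one image as columns of a matrix."""
    h, w = img.shape
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            p = img[i:i + k, j:j + k].ravel()
            cols.append(p - p.mean())
    return np.stack(cols, axis=1)                      # (k*k, num_patches)

def pca_filters(patch_matrix, num_filters):
    """PCA filter bank: the leading eigenvectors of X X^T, i.e. the kernels that minimise
    the patch reconstruction error used in the first mapping layer."""
    eigvals, eigvecs = np.linalg.eigh(patch_matrix @ patch_matrix.T)   # ascending order
    return eigvecs[:, ::-1][:, :num_filters]                           # top-L eigenvectors

# Stage 1: learn L1 filters from all training images; stage 2 would repeat the same
# procedure on the stage-1 feature maps before hashing and block histograms.
imgs = [np.random.rand(32, 32) for _ in range(20)]     # toy training set
X = np.concatenate([extract_patches(im) for im in imgs], axis=1)
W1 = pca_filters(X, num_filters=8)                     # (49, 8) stage-1 kernels
print(W1.shape)
```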
Super-Resolution Reconstruction of Magnetic Resonance Images Based on an Adaptive Dual Dictionary

LIU Zhen-qi, BAO Li-jun, CHEN Zhong
(Department of Electronic Science, Xiamen University, Xiamen 361005, China)

Electro-Optic Technology Application, Vol. 28, No. 4, August 2013 (Signal and Information Processing)

Abstract: To improve the image quality of magnetic resonance imaging, a super-resolution denoising reconstruction method based on an adaptive dual dictionary is proposed. A denoising capability is introduced into the super-resolution reconstruction process, so that noise in the image is effectively filtered out while the resolution is improved, organically combining super-resolution reconstruction and denoising. The method uses a clustering-PCA algorithm to extract the principal features of the image and construct a principal-feature dictionary, and uses a training procedure to design a self-learned dictionary that expresses the detail information of the image; together the two constitute an adaptive dual dictionary with good sparsity and adaptivity. Experiments show that, compared with other super-resolution algorithms, the proposed method achieves a markedly better super-resolution reconstruction, with improved peak signal-to-noise ratio and mean structural similarity.
基于多层特征嵌入的单目标跟踪算法
1. 内容描述
基于多层特征嵌入的单目标跟踪算法是一种在计算机视觉领域中广泛应用的跟踪技术。
该算法的核心思想是通过多层特征嵌入来提取目标物体的特征表示,并利用这些特征表示进行目标跟踪。
该算法首先通过预处理步骤对输入图像进行降维和增强,然后将降维后的图像输入到神经网络中,得到不同层次的特征图。
通过对这些特征图进行池化操作,得到一个低维度的特征向量。
将这个特征向量输入到跟踪器中,以实现对目标物体的实时跟踪。
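A minimal PyTorch sketch of the multi-layer feature embedding pipeline described above: feature maps from several network stages are pooled into one low-dimensional target descriptor that the tracker can match. The ResNet-18 backbone, stage choices, and embedding size are illustrative assumptions, not the algorithm's actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class MultiLayerEmbedding(nn.Module):
    """Pool feature maps from several ResNet stages into one low-dimensional target descriptor."""

    def __init__(self, embed_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)   # torchvision >= 0.13 API
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.layer1, self.layer2, self.layer3 = backbone.layer1, backbone.layer2, backbone.layer3
        self.proj = nn.Linear(64 + 128 + 256, embed_dim)   # channels of the three stages

    def forward(self, patch):                    # patch: (B, 3, H, W) target region
        f1 = self.layer1(self.stem(patch))       # shallow: appearance / structure
        f2 = self.layer2(f1)                     # mid-level
        f3 = self.layer3(f2)                     # deeper: semantics
        pooled = torch.cat([F.adaptive_avg_pool2d(f, 1).flatten(1) for f in (f1, f2, f3)], dim=1)
        return F.normalize(self.proj(pooled), dim=1)       # (B, embed_dim) descriptor for matching
```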
为了提高单目标跟踪算法的性能,本研究提出了一种基于多层特征嵌入的方法。
该方法首先引入了一个自适应的学习率策略,使得神经网络能够根据当前训练状态自动调整学习率。
通过引入注意力机制,使得神经网络能够更加关注重要的特征信息。
为了进一步提高跟踪器的鲁棒性,本研究还采用了一种多目标融合的方法,将多个跟踪器的结果进行加权融合,从而得到更加准确的目标位置估计。
通过实验验证,本研究提出的方法在多种数据集上均取得了显著的性能提升,证明了其在单目标跟踪领域的有效性和可行性。
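The multi-tracker weighted fusion mentioned above can be sketched as a confidence-weighted average of the individual trackers' box estimates; the box format and confidence values below are illustrative assumptions.

```python
import numpy as np

def fuse_boxes(boxes, scores):
    """Confidence-weighted fusion of per-tracker box estimates given as (x, y, w, h)."""
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()
    return (np.asarray(boxes, dtype=float) * w[:, None]).sum(axis=0)

# Three hypothetical trackers voting on the target position
print(fuse_boxes([[50, 60, 32, 32], [52, 58, 30, 34], [49, 61, 33, 31]], [0.9, 0.6, 0.75]))
```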
1.1 研究背景随着计算机视觉和深度学习技术的快速发展,目标跟踪在许多领域(如安防、智能监控、自动驾驶等)中发挥着越来越重要的作用。
单目标跟踪(MOT)算法是一种广泛应用于视频分析领域的技术,它能够实时跟踪视频序列中的单个目标物体,并将其位置信息与相邻帧进行比较,以估计目标的运动轨迹。
传统的单目标跟踪算法在处理复杂场景、遮挡、运动模糊等问题时表现出较差的鲁棒性。
为了解决这些问题,研究者们提出了许多改进的单目标跟踪算法,如基于卡尔曼滤波的目标跟踪、基于扩展卡尔曼滤波的目标跟踪以及基于深度学习的目标跟踪等。
这些方法在一定程度上提高了单目标跟踪的性能,但仍然存在一些局限性,如对多目标跟踪的支持不足、对非平稳运动的适应性差等。
开发一种既能有效跟踪单个目标物体,又能应对多种挑战的单目标跟踪算法具有重要的理论和实际意义。
1.2 研究目的本研究旨在设计一种基于多层特征嵌入的单目标跟踪算法,以提高目标跟踪的准确性和鲁棒性。
自组织特征映射神经网络研究与应用
自组织特征映射神经网络,又称Kohonen网络,在机器学习领域中具有广泛的研究和应用价值。
它是由芬兰科学家Teuvo Kohonen于1982年提出的,用来解决模式分类和聚类问题。
本文将分别从网络结构、学习规则、应用场景等多个角度来介绍自组织特征映射神经网络的研究与应用。
一、网络结构自组织特征映射神经网络是一种有两层或多层的神经元组成的全连接网络,其特点是每个神经元与输入节点全连接,但只有部分神经元与输出节点连接,这些与输出节点相连接的神经元被称作胜者神经元。
胜者神经元的选择根据输入数据与神经元之间的权值距离进行,即越接近输入数据的神经元越容易胜出。
自组织特征映射神经网络的网络结构简单,但它可以通过适当调整参数,从而实现多种复杂的函数映射。
在具体应用中,还可以采用层级结构的自组织特征映射神经网络,对于复杂的数据集,可以通过层层处理,逐步提取其更高层次的特征。
二、学习规则自组织特征映射神经网络的学习规则是基于竞争性学习的,其原理是将输入数据投影到高维空间中的低维网格上,使其可以进行分类和聚类。
其学习过程中所用的算法有两种:批处理算法和在线算法。
批处理算法在每个Epoth后,在一个批次中对全部样本进行训练,并更新权值,从而可以获得更稳定的结果,但训练时间较长。
而在线算法则是对每个样本逐个进行学习,因此训练速度较快,但结果相对不稳定。
在学习过程中,自组织特征映射神经网络会通过不断调整权值,形成特征抽取与分类能力强的模型。
其学习的结果可以通过可视化方式,将数据点在网格上的分布呈现出来,形成热图的形式,便于分析与理解。
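A minimal NumPy sketch of the online (per-sample) SOM training loop described above: the best-matching unit is chosen by Euclidean distance, and the winner together with its Gaussian neighbourhood is pulled towards the sample with a decaying learning rate and shrinking radius. The grid size and decay schedules are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
    """Online SOM training: winner selection by Euclidean distance, Gaussian-neighbourhood update."""
    h, w = grid
    rng = np.random.default_rng(0)
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * np.exp(-step / n_steps)              # decaying learning rate
            sigma = sigma0 * np.exp(-step / n_steps)        # shrinking neighbourhood radius
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(dists.argmin(), dists.shape)   # winner (best-matching unit)
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights

# Toy example: map 3-D colour vectors onto a 10 x 10 grid
print(train_som(np.random.rand(500, 3)).shape)   # (10, 10, 3)
```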
三、应用场景自组织特征映射神经网络在数据挖掘、图像处理、生物信息学等领域都有着广泛的应用。
在图像处理领域中,可以通过自组织特征映射神经网络对图像进行压缩和分类。
在数据挖掘方面,自组织特征映射神经网络可用于数据聚类和数据可视化。
通过自组织特征映射神经网络,大量数据可以被投射到低维空间,并形成可视化热图,从而能够更好地理解数据的分布规律。
From Data Mining to Knowledge Discovery in Databases
s Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media atten-tion of late. What is all the excitement about?This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges in-volved in real-world applications of knowledge discovery, and current and future research direc-tions in the field.A cross a wide variety of fields, data arebeing collected and accumulated at adramatic pace. There is an urgent need for a new generation of computational theo-ries and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).At an abstract level, the KDD field is con-cerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easi-ly) into other forms that might be more com-pact (for example, a short report), more ab-stract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for exam-ple, a predictive model for estimating the val-ue of future cases). At the core of the process is the application of specific data-mining meth-ods for pattern discovery and extraction.1This article begins by discussing the histori-cal context of KDD and data mining and theirintersection with other related fields. A briefsummary of recent KDD real-world applica-tions is provided. Definitions of KDD and da-ta mining are provided, and the general mul-tistep KDD process is outlined. This multistepprocess has the application of data-mining al-gorithms as one particular step in the process.The data-mining step is discussed in more de-tail in the context of specific data-mining al-gorithms and their application. Real-worldpractical application issues are also outlined.Finally, the article enumerates challenges forfuture research and development and in par-ticular discusses potential opportunities for AItechnology in KDD systems.Why Do We Need KDD?The traditional method of turning data intoknowledge relies on manual analysis and in-terpretation. For example, in the health-careindustry, it is common for specialists to peri-odically analyze current trends and changesin health-care data, say, on a quarterly basis.The specialists then provide a report detailingthe analysis to the sponsoring health-care or-ganization; this report becomes the basis forfuture decision making and planning forhealth-care management. In a totally differ-ent type of application, planetary geologistssift through remotely sensed images of plan-ets and asteroids, carefully locating and cata-loging such geologic objects of interest as im-pact craters. Be it science, marketing, finance,health care, retail, or any other field, the clas-sical approach to data analysis relies funda-mentally on one or more analysts becomingArticlesFALL 1996 37From Data Mining to Knowledge Discovery inDatabasesUsama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 
0738-4602-1996 / $2.00areas is astronomy. Here, a notable success was achieved by SKICAT ,a system used by as-tronomers to perform image analysis,classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski,and Weir 1996). In its first application, the system was used to process the 3 terabytes (1012bytes) of image data resulting from the Second Palomar Observatory Sky Survey,where it is estimated that on the order of 109sky objects are detectable. SKICAT can outper-form humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a sur-vey of scientific applications.In business, main KDD application areas includes marketing, finance (especially in-vestment), fraud detection, manufacturing,telecommunications, and Internet agents.Marketing:In marketing, the primary ap-plication is database marketing systems,which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimat-ed that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for ex-ample, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-bas-ket analysis (Agrawal et al. 1996) systems,which find patterns such as, “If customer bought X, he/she is also likely to buy Y and Z.” Such patterns are valuable to retailers.Investment: Numerous companies use da-ta mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million;since its start in 1993, the system has outper-formed the broad stock market (Hall, Mani,and Barr 1996).Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of ac-counts. The FAIS system (Senator et al. 1995),from the U.S. Treasury Financial Crimes En-forcement Network, is used to identify finan-cial transactions that might indicate money-laundering activity.Manufacturing: The CASSIOPEE trou-bleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major Euro-pean airlines to diagnose and predict prob-lems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innova-intimately familiar with the data and serving as an interface between the data and the users and products.For these (and many other) applications,this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains.Databases are increasing in size in two ways:(1) the number N of records or objects in the database and (2) the number d of fields or at-tributes to an object. Databases containing on the order of N = 109objects are becoming in-creasingly common, for example, in the as-tronomical sciences. Similarly, the number of fields d can easily be on the order of 102or even 103, for example, in medical diagnostic applications. Who could be expected to di-gest millions of records, each having tens or hundreds of fields? 
We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.The need to scale up human analysis capa-bilities to handling the large number of bytes that we can collect is both economic and sci-entific. Businesses use data to gain competi-tive advantage, increase efficiency, and pro-vide more valuable services to customers.Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Be-cause computers have enabled humans to gather more data than we can digest, it is on-ly natural to turn to computational tech-niques to help us unearth meaningful pat-terns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital informa-tion era made a fact of life for all of us: data overload.Data Mining and Knowledge Discovery in the Real WorldA large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week , Newsweek , Byte , PC Week , and other large-circulation periodicals. Unfortu-nately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.In science, one of the primary applicationThere is an urgent need for a new generation of computation-al theories and tools toassist humans in extractinguseful information (knowledge)from the rapidly growing volumes ofdigital data.Articles38AI MAGAZINEtive applications (Manago and Auriol 1996).Telecommunications: The telecommuni-cations alarm-sequence analyzer (TASA) wasbuilt in cooperation with a manufacturer oftelecommunications equipment and threetelephone networks (Mannila, Toivonen, andVerkamo 1995). The system uses a novelframework for locating frequently occurringalarm episodes from the alarm stream andpresenting them as rules. Large sets of discov-ered rules can be explored with flexible infor-mation-retrieval tools supporting interactivityand iteration. In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. 
For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. (1996) for an overview of issues in devel-oping industrial KDD applications.Data Mining and KDDHistorically, the notion of finding useful pat-terns in data has been given a variety ofnames, including data mining, knowledge ex-traction, information discovery, informationharvesting, data archaeology, and data patternprocessing. The term data mining has mostlybeen used by statisticians, data analysts, andthe management information systems (MIS)communities. It has also gained popularity inthe database field. The phrase knowledge dis-covery in databases was coined at the first KDDworkshop in 1989 (Piatetsky-Shapiro 1991) toemphasize that knowledge is the end productof a data-driven discovery. It has been popular-ized in the AI and machine-learning fields.In our view, KDD refers to the overall pro-cess of discovering useful knowledge from da-ta, and data mining refers to a particular stepin this process. Data mining is the applicationof specific algorithms for extracting patternsfrom data. The distinction between the KDDprocess and the data-mining step (within theprocess) is a central point of this article. Theadditional steps in the KDD process, such asdata preparation, data selection, data cleaning,incorporation of appropriate prior knowledge,and proper interpretation of the results ofmining, are essential to ensure that usefulknowledge is derived from the data. Blind ap-plication of data-mining methods (rightly crit-icized as data dredging in the statistical litera-ture) can be a dangerous activity, easilyleading to the discovery of meaningless andinvalid patterns.The Interdisciplinary Nature of KDDKDD has evolved, and continues to evolve,from the intersection of research fields such asmachine learning, pattern recognition,databases, statistics, AI, knowledge acquisitionfor expert systems, data visualization, andhigh-performance computing. The unifyinggoal is extracting high-level knowledge fromlow-level data in the context of large data sets.The data-mining component of KDD cur-rently relies heavily on known techniquesfrom machine learning, pattern recognition,and statistics to find patterns from data in thedata-mining step of the KDD process. A natu-ral question is, How is KDD different from pat-tern recognition or machine learning (and re-lated fields)? The answer is that these fieldsprovide some of the data-mining methodsthat are used in the data-mining step of theKDD process. KDD focuses on the overall pro-cess of knowledge discovery from data, includ-ing how the data are stored and accessed, howalgorithms can be scaled to massive data setsThe basicproblemaddressed bythe KDDprocess isone ofmappinglow-leveldata intoother formsthat might bemorecompact,moreabstract,or moreuseful.ArticlesFALL 1996 39A driving force behind KDD is the database field (the second D in KDD). 
Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fun-damental importance to KDD. Database tech-niques for gaining efficient data access,grouping and ordering operations when ac-cessing data, and optimizing queries consti-tute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memo-ry and pay no attention to how the algorithm breaks down if only limited views of the data are possible.A related field evolving from databases is data warehousing,which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2)data access.Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they pos-sess, they have to address the issues of map-ping data to a single naming convention,uniformly representing and handling missing data, and handling noise and errors when possible.Data access: Uniform and well-defined methods must be created for accessing the da-ta and providing access paths to data that were historically difficult to get to (for exam-ple, stored offline).Once organizations and individuals have solved the problem of how to store and ac-cess their data, the natural next step is the question, What else do we do with all the da-ta? This is where opportunities for KDD natu-rally arise.A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles pro-posed by Codd (1993). OLAP tools focus on providing multidimensional data analysis,which is superior to SQL in computing sum-maries and breakdowns along many dimen-sions. OLAP tools are targeted toward simpli-fying and supporting interactive data analysis,but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.Basic DefinitionsKDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimate-and still run efficiently, how results can be in-terpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (be-sides machine learning) to contribute to KDD. KDD places a special emphasis on find-ing understandable patterns that can be inter-preted as useful or interesting knowledge.Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and ro-bustness properties of modeling algorithms for large noisy data sets.Related AI research fields include machine discovery, which targets the discovery of em-pirical laws from observation and experimen-tation (Shrager and Langley 1990) (see Kloes-gen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery),and causal modeling for the inference of causal models from data (Spirtes, Glymour,and Scheines 1993). 
Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al.[1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quan-tifying the uncertainty that results when one tries to infer general patterns from a particu-lar sample of an overall population. As men-tioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data),one can find patterns that appear to be statis-tically significant but, in fact, are not. Clearly,this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct rele-vance to KDD. Thus, data mining is a legiti-mate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical as-pects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. KDD aims to provide tools to automate (to the degree pos-sible) the entire process of data analysis and the statistician’s “art” of hypothesis selection.Data mining is a step in the KDD process that consists of ap-plying data analysis and discovery al-gorithms that produce a par-ticular enu-meration ofpatterns (or models)over the data.Articles40AI MAGAZINEly understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).Here, data are a set of facts (for example, cases in a database), and pattern is an expres-sion in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; find-ing structure from data; or, in general, mak-ing any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple itera-tions. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the av-erage value of a set of numbers.The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and poten-tially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possi-ble to define measures of certainty (for exam-ple, estimated prediction accuracy on new data) or utility (for example, gain, perhaps indollars saved because of better predictions orspeedup in response time of a system). No-tions such as novelty and understandabilityare much more subjective. In certain contexts,understandability can be estimated by sim-plicity (for example, the number of bits to de-scribe a pattern). 
An important notion, calledinterestingness(for example, see Silberschatzand Tuzhilin [1995] and Piatetsky-Shapiro andMatheus [1994]), is usually taken as an overallmeasure of pattern value, combining validity,novelty, usefulness, and simplicity. Interest-ingness functions can be defined explicitly orcan be manifested implicitly through an or-dering placed by the KDD system on the dis-covered patterns or models.Given these notions, we can consider apattern to be knowledge if it exceeds some in-terestingness threshold, which is by nomeans an attempt to define knowledge in thephilosophical or even the popular view. As amatter of fact, knowledge in this definition ispurely user oriented and domain specific andis determined by whatever functions andthresholds the user chooses.Data mining is a step in the KDD processthat consists of applying data analysis anddiscovery algorithms that, under acceptablecomputational efficiency limitations, pro-duce a particular enumeration of patterns (ormodels) over the data. Note that the space ofArticlesFALL 1996 41Figure 1. An Overview of the Steps That Compose the KDD Process.methods, the effective number of variables under consideration can be reduced, or in-variant representations for the data can be found.Fifth is matching the goals of the KDD pro-cess (step 1) to a particular data-mining method. For example, summarization, clas-sification, regression, clustering, and so on,are described later as well as in Fayyad, Piatet-sky-Shapiro, and Smyth (1996).Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s)to be used for searching for data patterns.This process includes deciding which models and parameters might be appropriate (for ex-ample, models of categorical data are differ-ent than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more in-terested in understanding the model than its predictive capabilities).Seventh is data mining: searching for pat-terns of interest in a particular representa-tional form or a set of such representations,including classification rules or trees, regres-sion, and clustering. The user can significant-ly aid the data-mining method by correctly performing the preceding steps.Eighth is interpreting mined patterns, pos-sibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.Ninth is acting on the discovered knowl-edge: using the knowledge directly, incorpo-rating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving po-tential conflicts with previously believed (or extracted) knowledge.The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (al-though not the potential multitude of itera-tions and loops) is illustrated in figure 1.Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. 
Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component,which has, by far, received the most atten-tion in the literature.patterns is often infinite, and the enumera-tion of patterns involves some form of search in this space. Practical computational constraints place severe limits on the sub-space that can be explored by a data-mining algorithm.The KDD process involves using the database along with any required selection,preprocessing, subsampling, and transforma-tions of it; applying data-mining methods (algorithms) to enumerate patterns from it;and evaluating the products of data mining to identify the subset of the enumerated pat-terns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which pat-terns are extracted and enumerated from da-ta. The overall KDD process (figure 1) in-cludes the evaluation and possible interpretation of the mined patterns to de-termine which patterns can be considered new knowledge. The KDD process also in-cludes all the additional steps described in the next section.The notion of an overall user-driven pro-cess is not unique to KDD: analogous propos-als have been put forward both in statistics (Hand 1994) and in machine learning (Brod-ley and Smyth 1996).The KDD ProcessThe KDD process is interactive and iterative,involving numerous steps with many deci-sions made by the user. Brachman and Anand (1996) give a practical view of the KDD pro-cess, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer’s viewpoint.Second is creating a target data set: select-ing a data set, or focusing on a subset of vari-ables or data samples, on which discovery is to be performed.Third is data cleaning and preprocessing.Basic operations include removing noise if appropriate, collecting the necessary informa-tion to model or account for noise, deciding on strategies for handling missing data fields,and accounting for time-sequence informa-tion and known changes.Fourth is data reduction and projection:finding useful features to represent the data depending on the goal of the task. With di-mensionality reduction or transformationArticles42AI MAGAZINEThe Data-Mining Stepof the KDD ProcessThe data-mining component of the KDD pro-cess often involves repeated iterative applica-tion of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algo-rithms that incorporate these methods.The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification,the sys-tem is limited to verifying the user’s hypothe-sis. With discovery,the system autonomously finds new patterns. We further subdivide the discovery goal into prediction,where the sys-tem finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presenta-tion to a user in a human-understandableform. In this article, we are primarily con-cerned with discovery-oriented data mining.Data mining involves fitting models to, or determining patterns from, observed data. 
The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the over-all, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeter-ministic effects in the model, whereas a logi-cal model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applica-tions given the typical presence of uncertain-ty in real-world data-generating processes.Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewilder-ing to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fun-damental techniques. The actual underlying model representation being used by a particu-lar method typically comes from a composi-tion of a small number of well-known op-tions: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primar-ily in the goodness-of-fit criterion used toevaluate model fit or in the search methodused to find a good fit.In our brief overview of data-mining meth-ods, we try in particular to convey the notionthat most (if not all) methods can be viewedas extensions or hybrids of a few basic tech-niques and principles. We first discuss the pri-mary methods of data mining and then showthat the data- mining methods can be viewedas consisting of three primary algorithmiccomponents: (1) model representation, (2)model evaluation, and (3) search. In the dis-cussion of KDD and data-mining methods,we use a simple example to make some of thenotions more concrete. Figure 2 shows a sim-ple two-dimensional artificial data set consist-ing of 23 cases. Each point on the graph rep-resents a person who has been given a loanby a particular bank at some time in the past.The horizontal axis represents the income ofthe person; the vertical axis represents the to-tal personal debt of the person (mortgage, carpayments, and so on). The data have beenclassified into two classes: (1) the x’s repre-sent persons who have defaulted on theirloans and (2) the o’s represent persons whoseloans are in good status with the bank. Thus,this simple artificial data set could represent ahistorical data set that can contain usefulknowledge from the point of view of thebank making the loans. Note that in actualKDD applications, there are typically manymore dimensions (as many as several hun-dreds) and many more data points (manythousands or even millions).ArticlesFALL 1996 43Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes.。
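To make the data-mining step concrete on a loan data set like the one in figure 2 (income and debt as the two fields, defaulted versus good status as the class), here is a small scikit-learn sketch that fits an understandable model, a shallow decision tree, and reports holdout accuracy. The data are synthetic and purely illustrative, not the data set shown in the figure.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic loan records in the spirit of figure 2: income and debt -> defaulted (1) or good (0)
rng = np.random.default_rng(0)
income = rng.uniform(20, 120, 500)                     # thousands of dollars
debt = rng.uniform(0, 80, 500)
defaulted = (debt > 0.6 * income + rng.normal(0, 5, 500)).astype(int)

X = np.column_stack([income, debt])
X_train, X_test, y_train, y_test = train_test_split(X, defaulted, random_state=0)

# The "data-mining step": fit an understandable model (a shallow decision tree) to the data
model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
print(export_text(model, feature_names=["income", "debt"]))
```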
Semantic Segmentation Based on Semi-Supervised Deep Learning
Chapter 1: Introduction. Semantic segmentation is one of the central problems in computer vision; its goal is to label each pixel of an image with the semantic class it belongs to.
The task has a wide range of applications, including autonomous driving, medical image analysis and image editing.
However, semantic segmentation suffers from the high cost of pixel-level annotation, since large numbers of labelled samples are required.
Semi-supervised deep learning offers an effective answer to this problem: it allows unlabelled samples to be used for model training.
Chapter 2: An Overview of Semi-Supervised Deep Learning. 2.1 Deep learning. Deep learning is a machine-learning approach that uses multi-layer neural networks, loosely modelled on how the human brain works, and provides strong learning and representational capacity.
2.2 Supervised versus semi-supervised learning. Supervised learning trains a model on labelled data only, whereas semi-supervised learning uses both labelled and unlabelled data, providing more information for the model to learn from.
2.3 Semi-supervised deep learning algorithms. Semi-supervised deep learning combines the ideas of deep learning and semi-supervised learning, and can improve model performance by exploiting the feature information contained in unlabelled samples.
Chapter 3: A Review of Traditional Semantic Segmentation Methods. 3.1 Methods based on classical machine learning. Traditional approaches usually perform pixel-level classification with hand-crafted features and a classifier, but they struggle to capture the high-level semantics of an image.
3.2 Methods based on deep learning. Deep-learning approaches extract features and classify pixels with convolutional neural networks (CNNs) and have achieved remarkable results, but they still require large amounts of labelled data.
Chapter 4: Semi-Supervised Deep Learning for Semantic Segmentation. 4.1 Methods based on semi-supervised transfer learning. Semi-supervised transfer learning transfers knowledge gained from labelled samples to the unlabelled samples, improving the generalisation ability of the model.
4.2 Methods based on generative adversarial networks. A generative adversarial network (GAN) can, through the adversarial game between generator and discriminator, synthesise realistic unlabelled samples and use them for training.
4.3 Methods based on self-supervised learning. Self-supervised learning trains the model with unsupervised objectives, for example by generating pseudo-labels through image rotations or colour transformations, so that training can proceed without manual annotation; a minimal pseudo-labelling sketch is given below.
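The following is a generic, hedged sketch of one common realisation of these ideas: confidence-thresholded pseudo-labelling for segmentation. It is not a specific published method; `model` is assumed to be any segmentation network that outputs per-pixel class logits of shape (N, C, H, W), and the threshold value is an arbitrary choice.

```python
import torch

# Minimal pseudo-labelling step for semi-supervised segmentation (illustrative sketch).
def pseudo_label(model, unlabeled_images, threshold=0.9, ignore_index=255):
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_images), dim=1)   # (N, C, H, W)
        conf, labels = probs.max(dim=1)                         # per-pixel argmax class
        labels[conf < threshold] = ignore_index                 # keep only confident pixels
    return labels
```

The returned maps can then be used as targets for the unlabelled images, e.g. with a cross-entropy loss configured to skip the ignore_index pixels, while the labelled images keep their ground-truth annotations.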
Chapter 5: Experiments and Evaluation. 5.1 Datasets and experimental setup. Suitable datasets are selected and reasonable experimental parameters are set in order to evaluate the performance of semi-supervised deep learning for semantic segmentation.
5.2 Results and comparative analysis. The performance of semi-supervised deep-learning methods is compared with that of traditional methods and the results are analysed, demonstrating the advantages of semi-supervised deep learning for semantic segmentation.
Research on 3D Dense Surfel Reconstruction in Dynamic Scenes Based on Semantic Segmentation (sample essay)
Part One. I. Introduction. With the continuous development of computer vision, 3D reconstruction has become a research hotspot in many fields.
Reconstructing dense 3D surfels in a dynamic scene is a challenging task.
In recent years, 3D dense surfel reconstruction based on semantic segmentation has gradually become a focus of research.
This article discusses such reconstruction in dynamic scenes, analyses its principles, methods and challenges, and looks ahead to future development trends.
II. Background and significance. 3D dense surfel reconstruction recovers the surfel (surface-element) structure of 3D space from 2D images.
In dynamic scenes, with many moving targets and complex background clutter, traditional reconstruction methods often fail to produce satisfactory results.
How to accurately extract and reconstruct the 3D surfel structure of the target objects has therefore become an urgent problem.
Semantic-segmentation-based reconstruction analyses the semantic information in the images to extract the target objects effectively and thus achieve a more accurate 3D reconstruction.
The technique therefore has considerable research significance and application value.
III. Principles and methods. 1. Semantic segmentation. Semantic segmentation is a key technique in computer vision whose purpose is to partition the pixels of an image into different semantic regions.
Segmenting an image semantically makes it possible to extract the target objects and provides the basis for the subsequent 3D reconstruction.
Deep learning has achieved remarkable results in this area: a deep neural network can be trained to extract feature information from the image and perform pixel-level classification (a minimal sketch of such a pixel-wise classification head is given below).
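As a hedged illustration of what "pixel-level classification" means in practice, the sketch below shows a minimal fully-convolutional classification head placed on top of an arbitrary backbone feature map; the channel counts, class count and input sizes are assumptions, not values from this article.

```python
import torch
import torch.nn as nn

# A minimal fully-convolutional head: a 1x1 convolution scores every feature-map
# position for each class, and bilinear upsampling restores the image resolution.
class SegmentationHead(nn.Module):
    def __init__(self, in_channels=256, num_classes=21):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, features, out_size):
        logits = self.classifier(features)                    # (N, num_classes, h, w)
        return nn.functional.interpolate(
            logits, size=out_size, mode="bilinear", align_corners=False)

head = SegmentationHead()
feats = torch.randn(1, 256, 32, 32)                           # dummy backbone output
print(head(feats, out_size=(256, 256)).shape)                 # (1, 21, 256, 256)
```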
2. 3D dense surfel reconstruction. This technique recovers the surfel structure of 3D space by analysing feature information in 2D images.
In dynamic scenes, with many moving targets and complex background clutter, traditional reconstruction methods often perform poorly.
Using the semantic information of the extracted target objects allows a more accurate 3D reconstruction.
In practice, multi-view image information is required, and techniques such as stereo matching and depth estimation are used to obtain a dense reconstruction of the 3D surfels.
IV. Research method and experimental results. This study combines deep-learning-based semantic segmentation with 3D dense surfel reconstruction.
First, a deep neural network segments the images semantically and extracts the target objects.
Then, based on the semantic information of the extracted objects, multi-view image information is used to carry out the 3D reconstruction.
Face Recognition Based on a Sub-Pattern Weighted Neighbourhood Maximum Margin Criterion
(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Among biometric technologies, identification by facial features is the most natural, most direct and most user-friendly approach. Compared with other biometric modalities, face images are very easy to acquire and can be captured almost without the subject's awareness. Face recognition has therefore received wide attention in recent years, and researchers have proposed many recognition methods. These methods fall roughly into two categories: holistic (global) methods and sub-pattern-based methods. Holistic methods use the whole face image directly as the input for recognition; the most representative are the PCA method, linear discriminant methods, independent component analysis and the locality preserving projection algorithm.
Unlike holistic methods, sub-pattern-based methods extract facial features from the individual regions of the face image for recognition and are also known as local matching techniques. Pentland et al. were the first to introduce sub-pattern-based methods into face recognition and obtained good recognition results, and further researchers followed.
In the method considered here, the class information of neighbouring points is used to compute an adaptive weight for each sub-block, which is then used in the subsequent recognition. Secondly, features are extracted from each sub-image with a weighted neighbourhood maximum margin criterion; this criterion makes full use of the class information in the data and selects the optimal reconstruction coefficients of each sample's neighbourhood points for use in the objective function.
Thesis Proposal: Research on High-Dimensional-Space Classifiers Based on Manifold Learning
1. Background and significance. In real-world problems the data often live in a high-dimensional space and are usually not linearly separable.
In image recognition and natural language processing, for example, the data typically contain a very large number of features.
High-dimensional classifiers based on manifold learning can handle such problems effectively and have broad application value.
Studying them is therefore of both theoretical and practical importance.
2. Research content and methods. This study will investigate three aspects: (1) for high-dimensional classification problems, study different manifold-learning methods, including locally linear embedding (LLE), weighted nearest neighbours (WKNN) and Laplacian-regularised embedding (LRE), and compare their performance.
(2) Compare classifiers built on manifold-learning methods with traditional classification algorithms such as the support vector machine (SVM), analysing their accuracy and complexity; a small illustrative comparison is sketched below.
(3) For practical applications, study how manifold-learning-based high-dimensional classifiers can be used to solve problems such as image recognition and natural language processing.
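The following is a hedged sketch of the kind of comparison meant in point (2): an SVM on raw high-dimensional features versus an SVM on a low-dimensional LLE embedding. The dataset, the parameter values and any conclusion drawn from a single run are illustrative assumptions, not results of this proposal.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                    # 64-dimensional example data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw_svm = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)

lle_svm = make_pipeline(StandardScaler(),
                        LocallyLinearEmbedding(n_neighbors=30, n_components=10),
                        SVC()).fit(X_tr, y_tr)

print("SVM on raw features :", raw_svm.score(X_te, y_te))
print("SVM on LLE embedding:", lle_svm.score(X_te, y_te))
```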
3. Expected results. The study is expected to reach the following conclusions: (1) although the different manifold-learning methods handle high-dimensional classification problems in different ways, their performance differences are not large.
(2) Compared with traditional classification algorithms, manifold-learning-based high-dimensional classifiers have clear advantages in performance and complexity.
(3) Such classifiers have broad application prospects in fields such as image recognition and natural language processing.
4. Work plan and schedule. Year 1: collect and study the literature on manifold-learning-based high-dimensional classifiers, master the basic concepts and methods of manifold learning and classification, and study manifold-learning methods such as locally linear embedding (LLE) and weighted nearest neighbours (WKNN).
Year 2: study further manifold-learning methods such as Laplacian-regularised embedding (LRE), compare the different methods with traditional classification algorithms, and assess their accuracy and complexity.
Year 3: for practical problems such as image recognition and natural language processing, study how manifold-learning-based classifiers can be used to solve them, and verify the approach experimentally.
Year 4: write the thesis and prepare the defence.
Cost-Sensitive Learning Based on the Scaled Convex Hull
LIU Zhenbing
(School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China)
Abstract: Inspired by the idea of changing the class distribution, a recent maximum-margin technique, the scaled convex hull, is adopted to solve cost-sensitive learning. The method can change the distribution of the samples, and this change is achieved simply by assigning different scale factors to the different classes. Experimental results show the effectiveness of the scaled convex hull method for cost-sensitive problems, and its solution procedure is also very simple.
Key words: scaled convex hull; cost-sensitive; classification
A Modulation Recognition Algorithm Using Sparse Autoencoders
A sparse autoencoder (SAE) can be used to recognise modulation types; the approach is usually combined with a Softmax classifier to improve recognition accuracy and efficiency.
The key points of the algorithm are the following: 1. Structure. The recognition system is built by cascading a sparse autoencoder with a Softmax classifier.
The sparse autoencoder extracts features from the input data, and the Softmax classifier assigns the modulation type on the basis of those features.
2. Feature parameters. To exploit the modulation information carried by signal amplitude and phase, the complex signal is preprocessed, and the Gray-coded values of two higher-order cumulant feature parameters form the input vector of the system.
3. Principle. A sparse autoencoder is a neural-network autoencoder whose goal is to reconstruct the input data from a learned sparse representation.
Compared with a conventional autoencoder, the sparse autoencoder adds a sparsity penalty that encourages the hidden units to be activated only sparsely.
This helps capture the important characteristics of the input data.
4. Training. A sparse autoencoder is usually trained with back-propagation and gradient descent.
During back-propagation, the reconstruction error is computed first, the gradients are then derived from it, and the network parameters are updated.
To enforce sparsity, a sparsity penalty is added as well, usually an L1 regulariser or a KL-divergence term that measures how sparsely the hidden units are activated (a minimal sketch of such a penalty is given at the end of this passage).
5. Recognition performance. This feature-parameter-plus-SAE algorithm (the FSAE algorithm) can effectively recognise many modulation types, such as BPSK, QPSK, 8PSK, 16QAM, 32QAM, 16APSK and 32APSK.
6. Advantages. The algorithm reduces the number of feature parameters needed to recognise complex amplitude-phase signals and improves the recognition rate, especially for complex signals that traditional methods find hard to recognise.
In summary, modulation recognition with sparse autoencoders is an effective signal-processing technique: it extracts signal features automatically through deep learning and combines them with a classifier for highly accurate modulation recognition.
The approach is of considerable value in wireless communications, in particular in automatic modulation classification (AMC) systems, where it can markedly raise the level of automation.
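The sketch below shows a minimal sparse autoencoder trained with a KL-divergence sparsity penalty, as described in point 4. It is a generic illustration: the layer sizes, the target activation rho, the weight beta and the random placeholder input are assumptions, not the settings of the FSAE algorithm above.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in=48, n_hidden=24):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

def kl_sparsity(h, rho=0.05, eps=1e-8):
    rho_hat = h.mean(dim=0).clamp(eps, 1 - eps)      # mean activation of each hidden unit
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

model, beta = SparseAE(), 3.0
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(128, 48)                              # placeholder feature vectors
for _ in range(100):
    opt.zero_grad()
    x_hat, h = model(x)
    loss = nn.functional.mse_loss(x_hat, x) + beta * kl_sparsity(h)   # reconstruction + sparsity
    loss.backward()
    opt.step()
```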
A Human Action Recognition Method Based on Low-Rank Action Information and a Multi-Scale Convolutional Neural Network
A mainstream research direction is to combine hand-crafted features with deep learning [2-3]. However, since human actions are intrinsically complex and easily affected by cluttered backgrounds, occlusion, lighting and other environmental factors, current feature-extraction pipelines mostly involve many steps and tend to propagate errors, and they find it hard to model actions that are slow or nearly static; in addition, a single-scale convolutional neural network cannot describe human action characteristics from multiple perspectives, which is unfavourable for the final recognition.
In recent years deep-learning methods have increasingly been applied to human action recognition and have achieved better results. Representative methods include: 1) the two-stream convolutional network proposed by Simonyan et al. [8], in which one branch processes RGB images and the other processes optical-flow images, the two branches are trained jointly, and their predictions are finally fused with weights; 2) TSN (Temporal Segment Network) [9], an improved two-stream network designed to overcome the inability of the original two-stream network to model long videos, and RPAN (Recurrent Pose-Attention Network) proposed by Du et al. [10], which first generates features in a two-stream manner, then introduces a pose-attention mechanism, and finally performs recognition with a long short-term memory (LSTM) network; 3) C3D (3-Dimensional Convolution) [11], which processes action videos with 3D convolution kernels (a minimal 3D-convolution sketch is given below). After these models were proposed, many researchers improved on them and obtained good results [12]. The introduction of deep learning has to some extent reduced the dependence on hand-crafted action features, and its recognition performance, especially in complex backgrounds, …
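As a hedged illustration of the C3D idea mentioned above, the block below applies a 3D convolution to a short video clip so that the kernel slides over time as well as space; the channel counts and the clip shape are assumptions for illustration only.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=3, padding=1),   # kernel spans (time, height, width)
    nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pool space only in the first stage
)
clip = torch.randn(1, 3, 16, 112, 112)            # a 16-frame RGB clip
print(block(clip).shape)                          # torch.Size([1, 64, 16, 56, 56])
```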
DOI:10. 11772/j. issn. 1001-9081. 2020060958
LDA classifier: worked example
Summary:
1. Introduction to Lagrange orthogonal polynomials
2. Properties of Lagrange orthogonal polynomials
3. Applications of Lagrange orthogonal polynomials
4. Conclusion
Main text:
Lagrange orthogonal polynomials are a mathematical tool that is widely used in mathematics and in engineering.
These polynomials have several distinctive properties that make them useful in many settings.
First, consider their definition: Lagrange orthogonal polynomials are a family of polynomials that are orthogonal on a given interval.
This means that the inner product of any two of them over that interval is zero.
This property makes them useful in many applications, because they can be used to build a basis that is linearly independent on the interval in question.
Another important property is uniqueness: for any given interval there is only one family of such orthogonal polynomials associated with it.
This gives them a deterministic character in many applications.
Their fields of application are very broad, including numerical analysis, engineering design, physics and finance.
In numerical analysis, for instance, they can be used to solve differential equations or to construct basis functions in finite-element analysis.
In engineering design they can be used to design structures so that they perform optimally under given conditions.
In physics they can be used to describe the motion of a system, and in finance they can be used to build option-pricing models.
In short, Lagrange orthogonal polynomials are a very useful mathematical tool with distinctive properties and a wide range of applications.
Research on 3D Dense Surfel Reconstruction in Dynamic Scenes Based on Semantic Segmentation (sample essay)
Part One. I. Introduction. With the rapid development of computer-vision technology, 3D reconstruction is now applied widely in many fields.
In dynamic scenes, dense 3D surfel reconstruction is particularly important, because it allows complex environments to be captured in detail and reproduced with high fidelity.
The introduction of semantic segmentation has greatly improved the efficiency and accuracy of 3D surfel reconstruction.
This article studies the theory, the methods and the experimental results of semantic-segmentation-based 3D dense surfel reconstruction in dynamic scenes.
II. Technical background. 1. Semantic segmentation: an image-processing technique whose aim is to assign a specific label to every pixel of an image so that the different objects in the image can be distinguished.
In 3D reconstruction, semantic segmentation helps to identify and separate the different objects in a scene.
2. 3D dense surfel reconstruction: by capturing and analysing the depth information of a scene, a model of the 3D space can be rebuilt.
Dense surfel reconstruction aims to construct as many surfels as possible so that the details of the 3D scene are represented more accurately.
III. Method for semantic-segmentation-based 3D dense surfel reconstruction in dynamic scenes. 1. Data acquisition and preprocessing: image sequences and depth information of the dynamic scene are captured with an RGB-D camera or another depth camera.
The images are then preprocessed, including denoising and registration.
2. Semantic segmentation: a deep-learning algorithm segments the preprocessed images semantically, identifying the different objects in the scene and extracting their contour information.
3. Extraction of 3D information: the depth information in the images is combined with the object contours to extract surfel information in 3D space (a minimal sketch of this masking and back-projection step follows after this list).
4. Surfel reconstruction: based on the extracted 3D surfel information, a dense surfel reconstruction is carried out to build a detailed model of the scene.
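The sketch below illustrates step 3 in a minimal, hedged form: depth pixels that the segmentation marks as belonging to the target object are back-projected to 3D points, which would then seed the surfel reconstruction. The pinhole intrinsics fx, fy, cx, cy and all example values are assumptions, not parameters from this article.

```python
import numpy as np

def backproject_object(depth, mask, fx, fy, cx, cy):
    """depth: (H, W) metric depth map; mask: (H, W) boolean object mask."""
    v, u = np.nonzero(mask & (depth > 0))        # pixel coordinates of the object
    z = depth[v, u]
    x = (u - cx) * z / fx                        # inverse pinhole projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)           # (N, 3) object point cloud

depth = np.full((480, 640), 2.0)                 # dummy data for illustration
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:340] = True                    # pretend segmentation mask
points = backproject_object(depth, mask, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(points.shape)
```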
IV. Experiments and analysis. 1. Datasets and experimental setup: the proposed method is validated on public datasets.
A set of parameters and thresholds is fixed in the experiments to ensure their accuracy and reliability.
2. Results and analysis: comparing traditional 3D reconstruction methods with the semantic-segmentation-based 3D dense surfel reconstruction shows that the latter achieves higher reconstruction accuracy and efficiency in dynamic scenes.
Specifically, with the help of semantic segmentation the different objects in the scene can be identified and separated more accurately, so the 3D information can be extracted and used more effectively.
In addition, the influence of different parameters and thresholds on the results is analysed, which provides a useful reference for further research.
Deep learning-based cascade method — a reply
Topic: Cascade Methods Based on Deep Learning. Introduction: with the rapid progress of deep learning in computer vision, cascade methods are widely used in tasks such as object detection and face recognition.
This article explains in detail how deep-learning-based cascade methods work and discusses their application to, and advantages for, improving detection accuracy and reducing the false-detection rate.
I. The basic idea of cascade methods. A cascade method is a multi-stage object-detection approach that uses a sequence of classifiers, applied one after another, to filter out the target regions step by step.
The basic idea is to exploit the cascade structure to decompose the detection task into several simple binary classifiers, thereby improving the detection performance of the overall system.
II. Deep-learning-based cascade methods. Traditional cascade methods rely mainly on hand-designed features, but with the rise of deep learning, cascades based on deep learning have gradually become the mainstream.
Deep learning learns feature representations suited to the task automatically and avoids the heavy work of hand-designing features.
The basic workflow of a deep-learning-based cascade is as follows: 1. Dataset preparation: first a labelled dataset is needed as training material.
For object detection, the location and class of each object usually have to be annotated.
2. Initial network: an initial deep convolutional neural network is built for detection.
An existing architecture such as VGG or ResNet can be used, or a new network structure can be designed for the situation at hand.
3. Training the first-stage classifier: the labelled data are fed into the initial network for training, and a binary classifier is attached at the output of each stage to distinguish objects from non-objects.
4. Updating the network parameters: the parameters are first updated from the classification error on positive and negative samples, and on this basis the network is further fine-tuned with back-propagation.
5. Adding layers and classifiers: after the first stage has been trained, the number of layers and classifiers can be increased to improve detection accuracy.
6. Training the later-stage classifiers: the network trained in the first stage is used as the initial network, and the classifiers of the following stages are trained in turn.
7. Cascading the classifiers: the classifiers trained at the individual stages are chained together to form a multi-stage cascade detection system (a minimal inference-time sketch follows after this list).
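The following is a generic, hedged sketch of step 7 at inference time: each stage is a binary classifier, and a candidate region survives only if every stage accepts it, so easy negatives are rejected early. The stage models (assumed to expose a scikit-learn-style predict_proba) and the thresholds are placeholders, not a specific trained detector.

```python
import numpy as np

def cascade_predict(stages, thresholds, candidates):
    """stages: list of binary classifiers; candidates: (N, d) feature rows."""
    keep = np.arange(len(candidates))
    for stage, t in zip(stages, thresholds):
        if len(keep) == 0:
            break
        scores = stage.predict_proba(candidates[keep])[:, 1]   # P(object) for survivors
        keep = keep[scores >= t]                                # reject easy negatives early
    return keep                                                 # indices of accepted candidates
```

In a real system the stages would be the classifiers trained in steps 3 to 6, and each threshold would be tuned so that the stage keeps almost all true objects while discarding a large share of the remaining negatives.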
III. Advantages of deep-learning-based cascade methods. Deep-learning-based cascades have many advantages in object detection: 1. Higher accuracy: deep learning can automatically learn feature representations suited to the task; compared with the hand-designed features of traditional methods, it captures the abstract characteristics of the targets better and improves detection accuracy.
Principle of the Haar cascade classifier
The Haar cascade classifier is a computer-vision algorithm used mainly for object detection, for example face detection.
Its principle is feature-based classification: a large number of positive and negative samples are used to train a series of classifiers, which are then applied to an image to detect the target.
The Haar feature is the core of the algorithm: a small rectangular patch is taken at some position in the image and divided into a black part and a white part, and the grey values of the pixels covered by each part (every point of the image is a pixel) are summed separately.
The sum over the black part is then subtracted from the sum over the white part, giving the value of one Haar feature.
In OpenCV's cascade detector class CascadeClassifier, AdaBoost is used together with LBP, HOG or Haar features for object detection.
The cascade chains several strong classifiers together; only a sample that satisfies all of the classifiers is judged to be a face.
Using Haar features with a cascade classifier improves both the accuracy and the efficiency of object detection (a minimal OpenCV usage sketch is given below).
The above is for general reference only; for more detail, consult the relevant literature or a specialist.
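A minimal usage sketch of OpenCV's CascadeClassifier with one of the Haar models shipped with the library follows; the image path is a placeholder and the detection parameters are common default-style values, not tuned settings.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("input.jpg")                          # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                             # each detection is a rectangle
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("output.jpg", img)
```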
Local Modelling in Classificationon Different Feature SubspacesGero Szepannek Claus WeihsFebruar2006Fachbereich StatistikUniversit¨a t DortmundAbstractSometimes one may be confronted with classification problems where classes are constituted of several subclasses that possess different distributions and therefore destroy accurate models of the entire classes as one similar group.An issue is modelling via local models of several subclasses.In this paper,a method is presented of how to handle such classification problems where the subclasses are furthermore characterized by different subsets of the variables.Situations are outlined and tested where such local models in different variable subspaces dramatically improve the classification error.1IntroductionIn order to minimize the misclassification error in a C−class classification prob-lem one aims at searching for a classification ruleˆc=arg maxP(c|x)(1)c=1,...,Cthat maximizes the conditional posterior probability given the observation x.It may be the case that a class c is composed of several”subclasses”with differ-ent distributions.For an accurate estimation of P(c|x)these subclasses have to1be modelled separately by local models.During this paper,we assume all the subclass-memberships in the training data to be known,whereas these member-ships in the test data-of course-are not known(else the class of the observation would also be given!).If the subclasses are not known in advance clustering meth-ods can be used to investigate if the data of some class is composed from several subgroups of data.We call k={1,...,K}the index of all subclasses.There is existing a(surjective) relationship f:{1,...,K}→{1,...,C}.Given the posterior probabilities of the membership of any of the subclasses P(k|x),the classification rule for any class c is given byˆc=arg maxc=1,...,CkI{c}(f(k))∗P(k|x)(2)Moreover,the subclasses may be characterized by different variables in the data. If size of training set is not very large,a variable selection may particularly be useful to model only such variables that are relevant to the classification problem. 
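To make the aggregation rule of equation (2) concrete, here is a minimal numpy sketch: estimated subclass posteriors P(k|x) are summed within each class via the labelling function f, and the class with the largest sum is predicted. The mapping f and the numerical values are invented for illustration and are not taken from this paper.

```python
import numpy as np

def aggregate_subclass_posteriors(subclass_post, f, n_classes):
    """subclass_post: shape (K,) with P(k | x); f: shape (K,) giving the class f(k)."""
    class_post = np.zeros(n_classes)
    for k, p in enumerate(subclass_post):
        class_post[f[k]] += p                    # I_{c}(f(k)) * P(k | x)
    return class_post, int(np.argmax(class_post))

# toy example: 3 classes with 3 subclasses each
f = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
p_k = np.array([0.05, 0.30, 0.10, 0.05, 0.05, 0.05, 0.20, 0.10, 0.10])
print(aggregate_subclass_posteriors(p_k, f, 3))  # class posteriors and predicted class
```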
Example1Imagine the case of two classes A and B each consisting of two subclasses A i and B i,i=1,2.Let now the distribution of the subclasses in variable X f(X|A i)= f(X|B i),i=1,2.Figure1shows this example for subclasses being normally distributed with unit variance but differing meansμi.In such case subclasses A1 and B2can be discriminated,as can be subclasses A2and B1.For discrimination of the subclasses A1and B1as well as A2and B2this variable contributes no infor-mation and should therefore preferably be omitted.This reflection is summarized in the matrix of table1.Subclass A2B1B2A1(+)-+A2+-B1(+)Table1:+/−indicates whether variable X in example1serves for discrimination of two subclasses or not.Parentheses indicate the same(class c=A or B).Only half of the subclass-pairs can be discriminated in this variable.2024680.00.10.20.30.4Variable X D e n s i t y o f e t h s u b c l a s s e sSubclass A1Subclass A2Subclass B1Subclass B2Figure 1:Example of a ”2classes with 2subclasses each”problem as introduced in example 1.Only half of the subclasses can be separated by differing distribu-tions in this variable.If any preceeding variable selection in local modelling is desired,this usually has to be performed globally,since comparing local models in different variable sub-sets is a difficult task.This problem is outlined in Szepannek et al.(2006).Szepannek and Weihs (2006)proposed a method of pairwise variable selection[PVS].By this method,the simulated misclassification test-error in the well-known Waveform data set (see Breiman et al.,1984)for Linear Discriminant Analysis (which works quite well on this task)has been reduced from 20.02%to 16.96%(being bounded by 14.9%Bayes error from below).A K -class prob-lem is splitted into K (K −1)/2two-class-problems.For any of these class pairs a classification rule is built after some variable selection procedure.The result consists of K (K −1)/2classification models in a ”locally maximally reduced”variable space.Such classification of an observation leads to K (K −1)/2pairwise decisions,returning the same number of pair wise posterior probabilities.The remaining question consists in building a classification rule from these K (K −1)/2pair wise classifiers.3To solve this task a Pairwise Coupling algorithm can be used.It is described in Section2.If we perform such classification for the subclass-models k=1,...,K the desired classification can then be obtained by aggregating the subclass-posterior probabilities as in equation2.This procedure can be performed principally for any classification method returning posterior probabilities in combination with any meaningful method of variable selection.The following pseudo-code summarizes the steps of the suggested proceeding: Build.classification.model(data[containing the bels],f,classification.method,variable.selection.method)###f is the function as described above labelling the subclasses to the classes.1.For each pair of two subclasses do2.(a)Remove temporarily all observations that do not belong to one of bothsubclasses from data:return newdata.(b)Perform variable.selection.method on newdata:return subspace.of.subclass-pair.(c)Perform classification.method on newdata only consideringsubspace.of.subclass-pair:return model.of.subclass-pair.(d)Return subspace.of.subclass-pair and model.of.subclass-pair for thispair of two subclasses.3.Return the whole model consisting of:f and for all pairs of subclasses thesubspace.of.subclass-pair and 
model.of.subclass-pair.4Predict.class(new.object,subspaces.of.subclass-pairs,models.of.subclass-pairs,f)1.For each pair of subclasses do2.(a)Calculate the class pair wise posterior probabilities for new object as-suming the object being of in one of the actually considered two sub-classes according to model.of.subclass-pair on subspace.of.subclass-pair.(b)Return the subclass.pair.posterior.probabilities.e the Pairwise coupling algorithm to calculate the posterior probabilitiesfor all K subclasses from the set of all estimated pairs of conditionalsubclass.pair.posterior.probabilities,return:subclass.posterior.probabilities.4.Calculate the class.posterior.probabilities using the class-labelling functionf according to equation2.5.Return the predicted class c with maximal class.posterior.probability.The following section describes a solution to the question of gaining the vector of subclass-posterior probabilities form the pair wise classifications built on the different selected variable subsets.Section3briefly describes some variable se-lection methods that are used in the studies in this paper.In Section4,a simulation study is performed that shows possible benefit of such local variable reduction.In Section5,the method is applied to some real-world data.52Pairwise Coupling2.1DefinitionsWe now tackle the problem offinding posterior probabilities of a K-(sub)class classification problem given the posterior probabilities for all K(K−1)/2pair wise comparisons.Let us start with some definitions.Let p(x)=p=(p1,...,p K)be the vector of(unknown)posterior probabilities. p depends on the specific realization x.For simplicity in notation we will omit x. Assume the”true”conditional probabilities of a pair wise classification problemto be given byμij=P r(i|i∪j)=p ip i+p j(3)Let r ij denote the estimated posterior probabilities of the two-class problems.The aim is now tofind the vector of probabilities p i for a given set of values r ij. 
Example2:Given p=(0.7,0.2,0.1).Theμij can be calculated according to equation3and can be presented in a matrix:{μij}=⎛⎝.7/97/82/9.2/31/81/3.⎞⎠(4)Example3:The inverse problem does not necessarily have a proper solution,since there are only K−1free parameters but K(K−1)/2constraints.Consider{r ij}=⎛⎝.0.90.40.1.0.70.60.3.⎞⎠(5)where the row i contains the estimated conditional pairwise posterior probabilities r ij for class i.From Machine Learning,majority voting(”Which class wins most comparisons?”)is a well known approach to solve such problems.But here,it will not lead to a result since any class wins exactly one comparison.Intuitively, class1may be preferable since it dominates the comparisons the most clearly.62.2AlgorithmIn this section we present the Pairwise Coupling algorithm of Hastie and Tibshi-rani(1998)tofind p for a given set of r ij.They transform the problem into an it-erative optimization problem by introducing a criterion to measure thefit between the observed r ij and theˆμij,calculated from a possible solutionˆp.To measure the fit they define the weighted Kullback-Leibler distance:l(ˆp)=i<j n ijr ij∗logr ijˆμij+(1−r ij)∗log1−r ij1−ˆμij(6)n ij is the number of objects that fall into one of the classes i or j.The best solutionˆp of posterior probabilities is found as in Iterative Proportional Scaling(IPS)(for details on the IPS-method see e.g.Bishop,Fienberg and Hol-land,1975).The algorithm consists of the following three steps:1.Start with anyˆp and calculate allˆμij.2.Repeat until convergence i=(1,2,...,K,1,...):ˆp i←ˆp i∗j=in ij r ijj=in ijˆμij(7)renormalizeˆp and calculate the newˆμij3.Finally scale the solution toˆp←ˆpiˆp iMotivation of the algorithm:Hastie and Tibshirani(1998),show that l(p)increases at each step.For this rea-son,since it is bounded above by0,if there exists a proper solutionˆp providing ˆμij=r ij∀i=j,it will be found.Even if the choice of l(p)as optimization criterion is rather heuristic,it can be mo-tivated in the following way:consider a random variable n ij r ij,being the number of observations of class i among the n ij observations of class i and j.This random variable can be considered to be binomially distributed n ij r ij∼B(n ij,μij)with ”true”(unknown)parameterμij.Since the same(training)data is used for all pair wise estimates r ij,the r ij are not independent,but if they were,l(p)of equation6 would be equivalent to the log-likelihood of this model(see Bradley and Terry,71952).Then,maximizing l(p)would correspond to maximum-likelihood estima-tion forμij.Going back to example3,we obtainˆp=(0.47,0.25,0.28),a result being consis-tent with the intuition that class1may be slightly preferable.In Wu et al.(2004)several methods for multi-class probability by pairwise cou-pling algorithms are presented and compared.In the simulations of this paper,the method of Hastie and Tibshirani(1998)is used.3Validation of the principleIn this section,the suggested procedure of a subclass pair wise variable selection combined with Pairwise Coupling[PVS]is compared to classification using linear and quadratic Discriminant Analysis[LDA,QDA]with global variable subset selection.Variable selection:The method of variable selection in our implementation is a quite simple one.We used subclass pair-wise Kolmogorov-Smirnov tests(see Hajek,1969,pp.62–69) to check whether the distributions of two subclasses differ in a variable or not.For every subclass pair and every variable,the statisticD=maxx |F nk1(x)−F nk2(x)|(8)is calculated,where the F nk i (x)are the empirical 
distributions of subclass k i,i=1,2.A variable is taken into a pair wise model if its p value strongly indicates differing densities.Of course,any other variable selection could be used instead. Especially one could refer here to the stepclass method(see Weihs et al.,2005) which is a prediction orientated method of variable selection.Variables are in-cluded in the model if they improve some predefined measure like e.g.the mis-classification rate on the cross-validated data set.This method possesses the ad-vantage that it is adaptive to the specifics of any classification method.83.1Afirst exampleOurfirst example is chosen according to the introducing example1in Section1to again illustrate the problem.Data are simulated in3classes(`a3subclasses each) and8variables.Subclass k is distributed according to X∼N(2∗1.64∗e k,I)if k<9and X∼N(0,I),if k=9.Here e k represents the standard basis vector,0 is the0vector and I is the identity matrix.This means,two subclasses k=l,k,l<9differ in their distributions in only2 variables(k and l).Subclass9can be discriminated from any other class k only in variable k.Subclasses k=1to3are subclasses of class c=1.Subclasses k=4,5and6belong to class c=2,so do subclass k=7,8and9to class c=3. By construction,no variable can be omitted.For that reason,”global”variable se-lection will not remove any of the variables,using Linear Discriminant Analysis. Variable selection is especially useful if there are few training examples in the data for estimating the structure of the classes.If classes consist of several subclasses, the amount of available data is further reduced since there are more populations to befitted with the same amount of data.We therefore computed simulations with varying(equal)(sub)class sizes in the training data to investigate the effect of sparse data.In the test data each subclass contains50objects.Error rates are averaged over50repetitive simulations of the data set.The results are given in table2.size LDA QDA PVS(with LDA)40.186-0.15460.140-0.11080.123-0.096100.1120.4160.096150.0980.2400.087200.0950.1850.086500.0840.1050.079Table2:Averaged error rates of LDA,QDA and PVS at varying subclass sizes910203040500.080.100.120.140.160.18Objects per subclassM e a n c l a s s i f i c a t i o n e r r o rLDAPVS (LDA)Figure 2:Averaged error rates on test data in simulation 3.1.The QDA classification rules can only be build having enough data.Even at larger class sizes QDA error rates are still very high.The PVS approach shows systematically lower error rates on the test data than LDA with ”global”variable selection,especially if there are only few observations in the training data.For larger class sizes the differences of both methods in the error rates are still present but seem to vanish.3.2Differing variancesWe now extend the situation of the first example.In real life it may be possible that one is confronted with data where one of the (sub)classes is strongly concentrated in a specific variable.Of course,this class can be more easily identified by its realizations in this ing LDA will fail to detect this property by pooling all classes’covariances.We modelled this situation with data consisting of 3classes each consisting of 3subclasses (as in the previous example)in 9variables.Subclass k is distributed following X ∼N (2e k ,Σ)with Σbeing the identity except from (σ)kk :=0.1.An illustration of the phenomenon is given in Figure 3where the vertical line10−2024601234Unequal variances x d e n s i ty −202460.00.10.20.30.4Pooled variancesx d e n s i t yFigure 
3:Example of unequal variances and their pooled estimators (by LDA).in the left plot indicates the wrong ’optimal decision’if wrongly assuming equal covariances as in the right plot.Intuitively,QDA seems to be more appropriate in this situation.The results for varying training data sizes are shown in Figure 4.sizeLDA QDA PVS (with QDA)100.2500.4530.177150.2260.2730.161200.2010.2180.151300.1820.1900.145500.1740.1710.1431000.1570.1510.133Table 3:Averaged error rates of LDA,QDA and PVS at varying class sizesAstonishingly,here LDA still shows smaller error rates than QDA.For QDA,there does not seem to be enough data.Both methods can be largely improved by a class pair wise variable selection using QDA.But note that such variable selec-tion simply using the KS-test statistic will fail to detect situations of correlation between variables.11204060801000.150.200.250.300.350.400.45Objects per subclassM e a n c l a s s i f i c a t i o n e r r o rLDAQDAPVS (QDA)Figure 4:Averaged error rates on test data in example 3.2.3.3Real world dataThe method is now applied to some real world data.The task is register classifi-cation (i.e.correct labelling into high and low pitch)of singers and instruments by pitch-independent features.As predictor variables characteristics of the funda-mental and the first 12harmonics are used.The fundamental [F 0]of a sound is exactly its pitch frequency,where the harmonics [F 1,F 2,...]are all integer mul-tiples of the fundamental frequency.The pitch-independent variables are the mass of the harmonics F 0to F 12and the width (number of fourier frequencies above some specified threshold in direct neighbourship to the harmonics in the normal-ized periodogram)without the information about its corresponding frequency.Figure 5illustrates the so-called voice print corresponding to the whole song “Tochter Zion”for a particular singer.For masses and widths boxplots are in-dicating variation over the involved tones (cp.Weihs and Ligges (2003)).For the analyses of this paper we use these characteristics of the voice print for individual tones per harmonic and singer or instrument.This classification problem may be an example for local modelling as it is de-120.00.10.20.30.40.5Mass F0F2F4F6F8F10F12050100150200250WidthF0F2F4F6F8F10F12Figure 5:V oice print of professional bass singer.scribed in the previous sections,since apart from the classes (namely:high and low register)and the 26variables also the subclass,i.e.the instrument-type,may influence the distribution of the data.For this reason,local modelling has already been shown here to improve the results.The data set consists of 432observations.The subclasses k :=(i,c ),i ∈{all instruments },c ∈{low,high }are all combinations of instrument i AND reg-ister c and contain between 9and 90observations.A detailed description of the classification problem as well as a description of the data set and the results of global and local modelling are described in Szepannek et al.(2005).In that paper Linear Discriminant Analysis and Decision Trees are used to build both local and global classification rules.It turned out that the best results are obtained using lo-cal LDA-classifiers.Several methods are derived to build classification rules from the local LDA-models for each instrument.The error rates (estimated by leave one out cross validation)have been improved up to 26.9%.Two of the winner-classification rules are briefly described here:The first one is referred to as average density rule .The estimated multivariate normal densities of the 
local instrument-subclasses as they are returned by LDA are summed up for the classes,leading to the classification rule:ˆc =arg max c kp (x |k )I {c }(f (k ))(9)where f (k )=f (i,c )=c is the function that labels the subclasses k =(i,c )to the corresponding classes c as it is introduced in Section 1and p (x |k )is the esti-mated density of the observation given the subclass k =(i,c ).Since comparing13densities on different variable subsets is questionable the local models here have to be built on a globally chosen variable subspace.The second method will be called global weighting of local posteriors.It makes use of the fact,that each of the instruments(i.e.the subclasses)appears in com-bination with all registers in an attribute-like manner and therefore an additional ”global”classification into the correct(unknown)instrument-subclass can be per-formed.Local LDA classification rules are built for every instrument separately. The obtained local posterior probabilities for the register of a new object are then weighted by some global weights that are gained by the posterior probabilities of the”global”classification into the instrument-subclass.The classification rule canbe described byˆc=arg maxciP(c|i,x)∗P(i|x)(10)which is an applicaton of Bayes’theorem.i here denotes the index of the subclass-attribute(instrument).This method turned out to render the smallest obtained er-ror rate.The different local models(given the instrument-subclass)can be built on different variable subsets.But for calculation of the global classification posterior probabilities into the right instrument-subclass of course for all instruments the same variables have to be taken into account.For comparison,an analysis has been performed using external knowledge about the instrument for the prediction(i.e.an object is classified with respect to the correct local model).Using this extra information the error rates can be improved up to15%which can be considered as a”lower bound”for the error rates. While the average density rule does not allow modelling on different variable subsets,the method of global weighting of local posteriors does allow models on different feature subsets for different instruments but for the global instrument-classification for all instruments the variables must be the same.For application of this method,it is necessary that the subclasses possess an attribute-like structure. 
Implementing the PVS method,leads to pairwise comparisons of any combina-tions(i,c)of instrument and register on possibly differing variables.Using now the PVS approach(with LDA)one observes a further slight improve-ment of the error rate up to24.3%.A summary of the different modelling results given in table4.14method l1o error rateglobal LDA0.345average density rule0.301global weighting of local posteriors0.269PVS(with LDA)0.243”lower bound”0.150 Table4:Leave one out cross validated error rates for the different methodsRemark:Relationship between the PVS-method and the’winner model’By definition the conditional probability of register,given instrument(and obser-vation x)is given byP(c|i,x)=P(i,c|x)P(i|x)(11)This changes the classification rule of the”winner model”of global weighting of local posteriors in equation10intoˆc=arg maxciP(c,i|x)(12)Using the function f(k)=f(i,c)=c as it is defined above,then our classificationrule becomesˆc=arg maxc(i,c∗)I{c}(f(i,c∗))P(c∗,i|x)(13)This classification rule is of the same form as it is introduced in equation2in Section1for local modelling by the PVS approach.It can be seen,that in both methods modelling is essentially done in the same way.The difference is in esti-mating the local membership probabilities.The PVS method here only uses those variables that are important for decision between two subclasses.This explains why the result of the winner rule is even slightly improved by using the proposed method.Additionally,the proposed PVS method is moreflexible since it can also be ap-plied to subclasses that do not possess an attribute-like character as the subclasses in the example do.154SummaryThe problem is tackled to perform local modelling for classification where the variable subspaces of the different local models can differ.An approach of pair-wise variable selection[PVS]is suggested to perform the maximal possible vari-able selection by splitting a K-subclass classification problem into K(K−1)/2 subclass pair-wise classification problems.An algorithm is presented to build a classification rule from the results using this method.This principle can be applied to any classification method returning class-membership posterior probabilities in combination with any(meaningful)variable selection procedure.Situations are outlined where such proceeding is strongly beneficial.The method is investigated on different simulated and real world data sets using(linear and quadratic)Discriminant Analysis and the results are compared to their original results using global variable selection.Gain in classification error rate can be no-ticed,especially if the number of observations is not very large. Additionally,the pairwise variable subset selection can give interpretational in-sight into which features characterize the differences between two(sub)classes. On the other hand,the computation time grows since there have to be built K(K−1)/2classification models.Furthermore,the classification rule of each object has to be iteratively evaluated by the Pairwise Coupling algorithm. 
Acknowledgment.This work has been supported by the Collaborative Research Center‘Reduction of Complexity in Multivariate Data Structures’(SFB475)of the German Research Foundation(DFG).ReferencesBISHOP,Y.,FIENBERG,S.and HOLLAND,P.(1975):Discrete multivariate analysis,MIT Press,Cambridge.BRADLEY,R.and TERRY,M.(1952):The rank analysis of incomplete block designs,i.the method of paired comparisons,Bimometrics,324–345.BREIMAN,L.FRIEDMAN,J.,OLSHEN,R.and STONE,C.(1984):Classifi-cation and regression trees.Chapman&Hall,NY.HAJEK,J.(1969):A course in nonparametric statistics.Holden Day,San Fran-cisco.16HASTIE,T.and TIBSHIRANI,R.(1998):Classification by Pairwise Coupling. Annals of Statistics,26(1),451–471.SZEPANNEK,G.,LIGGES,U.,LUEBKE,K.,RAABE,N.and WEIHS,C. (2005):Local Models in Register Classification by Timbre.Technical Report 47/2005,SFB475,Fachbereich Statistik,Universit¨a t Dortmund.SZEPANNEK,G.and WEIHS,C.(2006):Variable Selection for Discrimination of moren than two Classes where Data are Sparse.In:M.Spiliopoulou,R. Kruse,A.N¨u rnberger,C.Borgelt,W.Gaul(Eds.):From Data and Information Analysis to Knowledge Engineering,Springer-Verlag,Heidelberg(accepted).WEIHS,C.and LIGGES,U.(2003):V oice Prints as a Tool for Automatic Classi-fication of V ocal Performance.In:R.Kopiez,A.C.Lehmann,I.Wolther and C. Wolf(Eds.):Proceedings of the5th Triennial ESCOM Conference.Hannover University of Music and Drama,Germany,8-13September2003,332–335. WEIHS,C.,LIGGES,U.,LUEBKE,K.and RAABE,N.(2005):klaR Analyz-ing German Business Cycles.In:D.Baier,R.Becker and L.Schmidt-Thieme (Eds.):Data Analysis and Decision Support,Springer,Berlin,335–343.WU,T.-F.,LIN,C.-J.and WENG,R.(2004):Probability Estimates for Multi-class Classification by Pairwise Coupling.Journal of Machine Learning Re-search,5,975–1005.17。