Semi-supervised Clustering with Limited Background Knowledge
超融合解决方案完整版本
超融合解决方案V3.0 超融合解决方案目录第1章、前言 51.1IT时代的变革 (5)1.2白皮书总览 (7)第2章、超融合技术架构 (8)1.1超融合架构概述 (8)1.1.1超融合架构的定义 (8)1.2超融合架构组成模块 (8)1.2.1.1系统总体架构 (8)1.2.1.2aSV计算虚拟化平台 (9)1.2.1.2.1概述 (9)1.2.1.2.2aSV技术原理 (10)1.2.1.2.2.1 aSV的Hypervisor架构 (12)1.2.1.2.2.2 Hypervisor虚拟化实现 (16)1.2.1.2.3aSV的技术特性 (24)1.2.1.2.3.1 内存NUMA技术 (24)1.2.1.2.3.2 SR-IOV (25)1.2.1.2.3.3 Faik-raid (27)1.2.1.2.3.4 虚拟机生命周期管理 (28)1.2.1.2.3.5 虚拟交换机 (29)1.2.1.2.3.6 动态资源调度 (29)1.2.1.2.4aSV的特色技术 (30)1.2.1.2.4.1 快虚 (30)1.2.1.2.4.2 虚拟机热迁移 (31)1.2.1.2.4.3 虚拟磁盘加密 (32)1.2.1.2.4.4 虚拟机的HA (33)1.2.1.2.4.5 多USB映射 (34)1.2.1.3aSAN存储虚拟化 (35)1.2.1.3.1存储虚拟化概述 (35)1.2.1.3.1.1 虚拟后对存储带来的挑战 (35)1.2.1.3.1.2 分布式存储技术的发展 (36)1.2.1.3.1.3 aSAN概述 (36)1.2.1.3.2aSAN技术原理 (37)1.2.1.3.2.1 主机管理 (37)1.2.1.3.2.2 文件副本 (37)1.2.1.3.2.3 磁盘管理 (39)1.2.1.3.2.4 SSD读缓存原理 (41)1.2.1.3.2.5 SSD写缓存原理 (47)1.2.1.3.2.6 磁盘故障处理机制 (51)1.2.1.3.3aSAN功能特性 (63)1.2.1.3.3.1 存储精简配置 (63)1.2.1.3.3.2 aSAN私网链路聚合 (64)1.2.1.3.3.3 数据一致性检查 (65)1.2.1.4aNet网络虚拟化 (65)1.2.1.4.1网络虚拟化概述 (65)1.2.1.4.2aNET网络虚拟化技术原理 (66)1.2.1.4.2.1 SDN (66)1.2.1.4.2.2 NFV (67)1.2.1.4.2.3 aNet底层的实现 (68)1.2.1.4.3功能特性 (73)1.2.1.4.3.1 aSW分布式虚拟交换机 (73)1.2.1.4.3.2 aRouter (73)1.2.1.4.3.3 vAF (74)1.2.1.4.3.4 vAD (74)1.2.1.4.4aNet的特色技术 (75)1.2.1.4.4.1 网络探测功能 (75)1.2.1.4.4.2 全网流量可视 (75)1.2.1.4.4.3 所画即所得业务逻辑拓扑 (76)1.2.2超融合架构产品介绍 (76)1.2.2.1产品概述 (76)1.2.2.2产品定位 (77)第3章、超融合架构带来的核心价值 (78)1.1可靠性: (78)1.2安全性 (78)1.3灵活弹性 (78)1.4易操作性 (78)第4章、超融合架构最佳实践 (80)1.1 IT时代的变革20 世纪 90 年代,随着 Windows 的广泛使用及 Linux 服务器操作系统的出现奠定了 x86服务器的行业标准地位,然而 x86 服务器部署的增长带来了新的 IT 基础架构和运作难题,包括:基础架构利用率低、物理基础架构成本日益攀升、IT 管理成本不断提高以及对关键应用故障和灾难保护不足等问题。
基于密度的半监督复杂网络聚类算法
De n s i t y - b a s e d s e mi — s u p e r v i s e d c l u s t e r i n g a l g o r i t h m
摘 要 :针对 大多数 复杂网络 聚类算法不能有效利 用先验知 识的 问题 ,提 出 了一种基 于 密度 的半监督 复杂 网络 聚类 算法。 通过 已有的成对约束 关系及其 传递 性质发现 网络 中所有潜在 的约束 关系,以更充分地指 导聚类过 程;在 基 于密度 的聚类算 法基础上 ,综合考虑 节点之 间的可达性 以及 成对约束关 系,以发现 网络 中满足 连通性和 最大性的社 区结构 。将 实验 结果与 其 它算法进 行 比较 ,比较 结果表 明了该 算法能更加有效的利用先验知识来提 高聚类性能 。
i n c o mp le x n e t wo r k
M ENG Fa n — r on g,ZHA NG Ke - we i + ,ZHU Mu
( S c h o o l o f Co mp u t e r S c i e n c e a n d Te c h n o l o g y ,Ch i n a Un i v e r s i t y o f Mi n i n g a n d Te c h n o l o g y ,Xu z h o u 2 2 1 1 1 6,Ch i n a ) Ab s t r a c t : Ai mi n g a t t h e p r o b l e m t h a t mo s t o f t h e e x i s t i n g c l u s t e r i n g a l g o r i t h ms f o r c o mp l e x n e t wo r k s c a n n o t ma k e u s e o f t h e p r i o r i n f o r ma t i o n e f f e c t i v e l y,a d e n s i t y - b a s e d s e mi - s u p e r v i s e d c l u s t e r i n g a l g o r i t h m i s p r o p o s e d . Fi r s t l y ,a l l t h e h i d d e n p a i r s o f c o n s t r a i n t s a r e f o u n d b y t h e a l g o r i t h m v i a t h e e x i s t i n g p a i r s o f o n e s t o g e t h e r wi t h t h e i r t r a n s i t i v i t y t o ma k e f u l l u s e o f p r i o r i n f o r — ma t i o n . Th e n,t h e c o mmu n i t y s t r u c t u r e ,s a t i s f y i n g c o n n e c t i v i t y a n d ma x i ma l i t y ,i s d i s c o v e r e d b y t h e r e a c h a b i l i t i e s b e t we e n n o d e s a n d a l l t h e p a i r s o f c o n s t r a i n t s .Ex p e r i me n t a l r e s u l t s c o mp a r e d wi t h o t h e r a l g o r i t h ms d e mo n s t r a t e t h a t t h e p r o p o s e d a l g o — r i t h m c a n u t i l i z e t h e s ma l l a mo u n t o f p r i o r i n f o r ma t i o n t O i mp r o v e t h e c l u s t e r i n g p e r f o r ma n c e . Ke y wo r d s :c o mp l e x n e t wo r k;c l u s t e r i n g;d e n s i t y - b a s e d ;s e mi - s u p e r v i s e d ;c o n s t r a i n t s
深度优先局部聚合哈希
Vol.48,No.6Jun. 202 1第48卷第6期2 0 2 1年6月湖南大学学报)自然科学版)Journal of Hunan University (Natural Sciences )文章编号:1674-2974(2021 )06-0058-09 DOI : 10.16339/ki.hdxbzkb.2021.06.009深度优先局艺B 聚合哈希龙显忠g,程成李云12(1.南京邮电大学计算机学院,江苏南京210023;2.江苏省大数据安全与智能处理重点实验室,江苏南京210023)摘 要:已有的深度监督哈希方法不能有效地利用提取到的卷积特征,同时,也忽视了数据对之间相似性信息分布对于哈希网络的作用,最终导致学到的哈希编码之间的区分性不足.为了解决该问题,提出了一种新颖的深度监督哈希方法,称之为深度优先局部聚合哈希(DeepPriority Local Aggregated Hashing , DPLAH ). DPLAH 将局部聚合描述子向量嵌入到哈希网络 中,提高网络对同类数据的表达能力,并且通过在数据对之间施加不同权重,从而减少相似性 信息分布倾斜对哈希网络的影响.利用Pytorch 深度框架进行DPLAH 实验,使用NetVLAD 层 对Resnet18网络模型输出的卷积特征进行聚合,将聚合得到的特征进行哈希编码学习.在CI-FAR-10和NUS-WIDE 数据集上的图像检索实验表明,与使用手工特征和卷积神经网络特征的非深度哈希学习算法的最好结果相比,DPLAH 的平均准确率均值要高出11%,同时,DPLAH 的平均准确率均值比非对称深度监督哈希方法高出2%.关键词:深度哈希学习;卷积神经网络;图像检索;局部聚合描述子向量中图分类号:TP391.4文献标志码:ADeep Priority Local Aggregated HashingLONG Xianzhong 1,覮,CHENG Cheng1,2,LI Yun 1,2(1. School of Computer Science & Technology ,Nanjing University of Posts and Telecommunications ,Nanjing 210023, China ;2. Key Laboratory of Jiangsu Big Data Security and Intelligent Processing ,Nanjing 210023, China )Abstract : The existing deep supervised hashing methods cannot effectively utilize the extracted convolution fea tures, but also ignore the role of the similarity information distribution between data pairs on the hash network, result ing in insufficient discrimination between the learned hash codes. In order to solve this problem, a novel deep super vised hashing method called deep priority locally aggregated hashing (DPLAH) is proposed in this paper, which em beds the vector of locally aggregated descriptors (VLAD) into the hash network, so as to improve the ability of the hashnetwork to express the similar data, and reduce the impact of similarity distribution skew on the hash network by im posing different weights on the data pairs. DPLAH experiment is carried out by using the Pytorch deep framework. Theconvolution features of the Resnet18 network model output are aggregated by using the NetVLAD layer, and the hashcoding is learned by using the aggregated features. The image retrieval experiments on the CIFAR-10 and NUS - WIDE datasets show that the mean average precision (MAP) of DPLAH is11 percentage points higher than that of* 收稿日期:2020-04-26基金项目:国家自然科学基金资助项目(61906098,61772284),National Natural Science Foundation of China(61906098, 61772284);国家重 点研发计划项目(2018YFB 1003702) , National Key Research and Development Program of China (2018YFB1003702)作者简介:龙显忠(1985—),男,河南信阳人,南京邮电大学讲师,工学博士,硕士生导师覮 通信联系人,E-mail : *************.cn第6期龙显忠等:深度优先局部聚合哈希59non-deep hash learning algorithms using manual features and convolution neural network features,and the MAP of DPLAH is2percentage points higher than that of asymmetric deep supervised hashing method.Key words:deep Hash learning;convolutional neural network;image retrieval;vector of locally aggregated de-scriptors(VLAD)随着信息检索技术的不断发展和完善,如今人们可以利用互联网轻易获取感兴趣的数据内容,然而,信息技术的发展同时导致了数据规模的迅猛增长.面对海量的数据以及超大规模的数据集,利用最近邻搜索[1(Nearest Neighbor Search,NN)的检索技术已经无法获得理想的检索效果与可接受的检索时间.因此,近年来,近似最近邻搜索[2(Approximate Nearest Neighbor Search,ANN)变得越来越流行,它通过搜索可能相似的几个数据而不再局限于返回最相似的数据,在牺牲可接受范围的精度下提高了检索效率.作为一种广泛使用的ANN搜索技术,哈希方法(Hashing)[3]将数据转换为紧凑的二进制编码(哈希编码)表示,同时保证相似的数据对生成相似的二进制编码.利用哈希编码来表示原始数据,显著减少了数据的存储和查询开销,从而可以应对大规模数据中的检索问题.因此,哈希方法吸引了越来越多学者的关注.当前哈希方法主要分为两类:数据独立的哈希方法和数据依赖的哈希方法,这两类哈希方法的区别在于哈希函数是否需要训练数据来定义.局部敏感哈希(Locality Sensitive Hashing,LSH)[4]作为数据独立的哈希代表,它利用独立于训练数据的随机投影作为哈希函数•相反,数据依赖哈希的哈希函数需要通过训练数据学习出来,因此,数据依赖的哈希也被称为哈希学习,数据依赖的哈希通常具有更好的性能.近年来,哈希方法的研究主要侧重于哈希学习方面.根据哈希学习过程中是否使用标签,哈希学习方法可以进一步分为:监督哈希学习和无监督哈希学习.典型的无监督哈希学习包括:谱哈希[5(Spectral Hashing,SH);迭代量化哈希[6](Iterative Quantization, ITQ);离散图哈希[7(Discrete Graph Hashing,DGH);有序嵌入哈希[8](Ordinal Embedding Hashing,OEH)等.无监督哈希学习方法仅使用无标签的数据来学习哈希函数,将输入的数据映射为哈希编码的形式.相反,监督哈希学习方法通过利用监督信息来学习哈希函数,由于利用了带有标签的数据,监督哈希方法往往比无监督哈希方法具有更好的准确性,本文的研究主要针对监督哈希学习方法.传统的监督哈希方法包括:核监督哈希[9](Supervised Hashing with Kernels,KSH);潜在因子哈希[10](Latent Factor Hashing,LFH);快速监督哈希[11](Fast Supervised Hashing,FastH);监督离散哈希[1(Super-vised Discrete Hashing,SDH)等.随着深度学习技术的发展[13],利用神经网络提取的特征已经逐渐替代手工特征,推动了深度监督哈希的进步.具有代表性的深度监督哈希方法包括:卷积神经网络哈希[1(Convolutional Neural Networks Hashing,CNNH);深度语义排序哈希[15](Deep Semantic Ranking Based Hash-ing,DSRH);深度成对监督哈希[16](Deep Pairwise-Supervised Hashing,DPSH);深度监督离散哈希[17](Deep Supervised Discrete Hashing,DSDH);深度优先哈希[18](Deep Priority Hashing,DPH)等.通过将特征学习和哈希编码学习(或哈希函数学习)集成到一个端到端网络中,深度监督哈希方法可以显著优于非深度监督哈希方法.到目前为止,大多数现有的深度哈希方法都采用对称策略来学习查询数据和数据集的哈希编码以及深度哈希函数.相反,非对称深度监督哈希[19](Asymmetric Deep Supervised Hashing,ADSH)以非对称的方式处理查询数据和整个数据库数据,解决了对称方式中训练开销较大的问题,仅仅通过查询数据就可以对神经网络进行训练来学习哈希函数,整个数据库的哈希编码可以通过优化直接得到.本文的模型同样利用了ADSH的非对称训练策略.然而,现有的非对称深度监督哈希方法并没有考虑到数据之间的相似性分布对于哈希网络的影响,可能导致结果是:容易在汉明空间中保持相似关系的数据对,往往会被训练得越来越好;相反,那些难以在汉明空间中保持相似关系的数据对,往往在训练后得到的提升并不显著.同时大部分现有的深度监督哈希方法在哈希网络中没有充分有效利用提60湖南大学学报(自然科学版)2021年取到的卷积特征.本文提出了一种新的深度监督哈希方法,称为深度优先局部聚合哈希(Deep Priority Local Aggregated Hashing,DPLAH).DPLAH的贡献主要有三个方面:1)DPLAH采用非对称的方式处理查询数据和数据库数据,同时DPLAH网络会优先学习查询数据和数据库数据之间困难的数据对,从而减轻相似性分布倾斜对哈希网络的影响.2)DPLAH设计了全新的深度哈希网络,具体来说,DPLAH将局部聚合表示融入到哈希网络中,提高了哈希网络对同类数据的表达能力.同时考虑到数据的局部聚合表示对于分类任务的有效性.3)在两个大型数据集上的实验结果表明,DPLAH在实际应用中性能优越.1相关工作本节分别对哈希学习[3]、NetVLAD[20]和Focal Loss[21]进行介绍.DPLAH分别利用NetVLAD和Focal Loss提高哈希网络对同类数据的表达能力及减轻数据之间相似性分布倾斜对于哈希网络的影响. 1.1哈希学习哈希学习[3]的任务是学习查询数据和数据库数据的哈希编码表示,同时要满足原始数据之间的近邻关系与数据哈希编码之间的近邻关系相一致的条件.具体来说,利用机器学习方法将所有数据映射成{0,1}r形式的二进制编码(r表示哈希编码长度),在原空间中不相似的数据点将被映射成不相似)即汉明距离较大)的两个二进制编码,而原空间中相似的两个数据点将被映射成相似(即汉明距离较小)的两个二进制编码.为了便于计算,大部分哈希方法学习{-1,1}r形式的哈希编码,这是因为{-1,1}r形式的哈希编码对之间的内积等于哈希编码的长度减去汉明距离的两倍,同时{-1,1}r形式的哈希编码可以容易转化为{0,1}r形式的二进制编码.图1是哈希学习的示意图.经过特征提取后的高维向量被用来表示原始图像,哈希函数h将每张图像映射成8bits的哈希编码,使原来相似的数据对(图中老虎1和老虎2)之间的哈希编码汉明距离尽可能小,原来不相似的数据对(图中大象和老虎1)之间的哈希编码汉明距离尽可能大.h(大象)=10001010h(老虎1)=01100001h(老虎2)=01100101相似度尽可能小相似度尽可能大图1哈希学习示意图Fig.1Hashing learning diagram1.2NetVLADNetVLAD的提出是用于解决端到端的场景识别问题[20(场景识别被当作一个实例检索任务),它将传统的局部聚合描述子向量(Vector of Locally Aggregated Descriptors,VLAD[22])结构嵌入到CNN网络中,得到了一个新的VLAD层.可以容易地将NetVLAD 使用在任意CNN结构中,利用反向传播算法进行优化,它能够有效地提高对同类别图像的表达能力,并提高分类的性能.NetVLAD的编码步骤为:利用卷积神经网络提取图像的卷积特征;利用NetVLAD层对卷积特征进行聚合操作.图2为NetVLAD层的示意图.在特征提取阶段,NetVLAD会在最后一个卷积层上裁剪卷积特征,并将其视为密集的描述符提取器,最后一个卷积层的输出是H伊W伊D映射,可以将其视为在H伊W空间位置提取的一组D维特征,该方法在实例检索和纹理识别任务[23別中都表现出了很好的效果.NetVLAD layer(KxD)x lVLADvectorh------->图2NetVLAD层示意图⑷Fig.2NetVLAD layer diagram1201NetVLAD在特征聚合阶段,利用一个新的池化层对裁剪的CNN特征进行聚合,这个新的池化层被称为NetVLAD层.NetVLAD的聚合操作公式如下:NV((,k)二移a(x)(血⑺-C((j))(1)i=1式中:血(j)和C)(j)分别表示第i个特征的第j维和第k个聚类中心的第j维;恣&)表示特征您与第k个视觉单词之间的权.NetVLAD特征聚合的输入为:NetVLAD裁剪得到的N个D维的卷积特征,K个聚第6期龙显忠等:深度优先局部聚合哈希61类中心.VLAD的特征分配方式是硬分配,即每个特征只和对应的最近邻聚类中心相关联,这种分配方式会造成较大的量化误差,并且,这种分配方式嵌入到卷积神经网络中无法进行反向传播更新参数.因此,NetVLAD采用软分配的方式进行特征分配,软分配对应的公式如下:-琢II Xi-C*II 2=—e(2)-琢II X-Ck,II2k,如果琢寅+肄,那么对于最接近的聚类中心,龟&)的值为1,其他为0.aS)可以进一步重写为:w j X i+b ka(x i)=—e-)3)w J'X i+b kk,式中:W k=2琢C k;b k=-琢||C k||2.最终的NetVLAD的聚合表示可以写为:N w;x+b kv(j,k)=移—----(x(j)-Ck(j))(4)i=1w j.X i+b k移ek,1.3Focal Loss对于目标检测方法,一般可以分为两种类型:单阶段目标检测和两阶段目标检测,通常情况下,两阶段的目标检测效果要优于单阶段的目标检测.Lin等人[21]揭示了前景和背景的极度不平衡导致了单阶段目标检测的效果无法令人满意,具体而言,容易被分类的背景虽然对应的损失很低,但由于图像中背景的比重很大,对于损失依旧有很大的贡献,从而导致收敛到不够好的一个结果.Lin等人[21]提出了Focal Loss应对这一问题,图3是对应的示意图.使用交叉爛作为目标检测中的分类损失,对于易分类的样本,它的损失虽然很低,但数据的不平衡导致大量易分类的损失之和压倒了难分类的样本损失,最终难分类的样本不能在神经网络中得到有效的训练.Focal Loss的本质是一种加权思想,权重可根据分类正确的概率p得到,利用酌可以对该权重的强度进行调整.针对非对称深度哈希方法,希望难以在汉明空间中保持相似关系的数据对优先训练,具体来说,对于DPLAH的整体训练损失,通过施加权重的方式,相对提高难以在汉明空间中保持相似关系的数据对之间的训练损失.然而深度哈希学习并不是一个分类任务,因此无法像Focal Loss一样根据分类正确的概率设计权重,哈希学习的目的是学到保相似性的哈希编码,本文最终利用数据对哈希编码的相似度作为权重的设计依据具体的权重形式将在模型部分详细介绍.正确分类的概率图3Focal Loss示意图[21】Fig.3Focal Loss diagram12112深度优先局部聚合哈希2.1基本定义DPLAH模型采用非对称的网络设计.Q={0},=1表示n张查询图像,X={X i}m1表示数据库有m张图像;查询图像和数据库图像的标签分别用Z={Z i},=1和Y ={川1表示;i=[Z i1,…,zj1,i=1,…,n;c表示类另数;如果查询图像0属于类别j,j=1,…,c;那么z”=1,否则=0.利用标签信息,可以构造图像对的相似性矩阵S沂{-1,1}"伊”,s”=1表示查询图像q,和数据库中的图像X j语义相似,S j=-1表示查询图像和数据库中的图像X j语义不相似.深度哈希方法的目标是学习查询图像和数据库中图像的哈希编码,查询图像的哈希编码用U沂{-1,1}"",表示,数据库中图像的哈希编码用B沂{-1,1}m伊r表示,其中r表示哈希编码的长度.对于DPLAH模型,它在特征提取部分采用预训练好的Resnet18网络[25].图4为DPLAH网络的结构示意图,利用NetVLAD层聚合Resnet18网络提取到的卷积特征,哈希编码通过VLAD编码得到,由于VLAD编码在分类任务中被广泛使用,于是本文将NetVLAD层的输出作为分类任务的输入,利用图像的标签信息监督NetVLAD层对卷积特征的利用.事实上,任何一种CNN模型都能实现图像特征提取的功能,所以对于选用哪种网络进行特征学习并不是本文的重点.62湖南大学学报(自然科学版)2021年conv1图4DPLAH结构Fig.4DPLAH structure图像标签soft-max1,0,1,1,0□1,0,0,0,11,1,0,1,0---------*----------VLADVLAD core)c)l・>:i>数据库图像的哈希编码2.2DPLAH模型的目标函数为了学习可以保留查询图像与数据库图像之间相似性的哈希编码,一种常见的方法是利用相似性的监督信息S e{-1,1}n伊"、生成的哈希编码长度r,以及查询图像的哈希编码仏和数据库中图像的哈希编码b三者之间的关系[9],即最小化相似性的监督信息与哈希编码对内积之间的L损失.考虑到相似性分布的倾斜问题,本文通过施加权重来调节查询图像和数据库图像之间的损失,其公式可以表示为:min J=移移(1-w)(u T b j-rs)专,B i=1j=1s.t.U沂{-1,1}n伊r,B沂{-1,1}m伊r,W沂R n伊m(5)受FocalLoss启发,希望深度哈希网络优先训练相似性不容易保留图像对,然而Focal Loss利用图像的分类结果对损失进行调整,因此,需要重新进行设计,由于哈希学习的目的是为了保留图像在汉明空间中的相似性关系,本文利用哈希编码的余弦相似度来设计权重,其表达式为:1+。
文献 (10)Semi-supervised and unsupervised extreme learning
Semi-supervised and unsupervised extreme learningmachinesGao Huang,Shiji Song,Jatinder N.D.Gupta,and Cheng WuAbstract—Extreme learning machines(ELMs)have proven to be an efficient and effective learning paradigm for pattern classification and regression.However,ELMs are primarily applied to supervised learning problems.Only a few existing research studies have used ELMs to explore unlabeled data. In this paper,we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization,thus greatly expanding the applicability of ELMs.The key advantages of the proposed algorithms are1)both the semi-supervised ELM (SS-ELM)and the unsupervised ELM(US-ELM)exhibit the learning capability and computational efficiency of ELMs;2) both algorithms naturally handle multi-class classification or multi-cluster clustering;and3)both algorithms are inductive and can handle unseen data at test time directly.Moreover,it is shown in this paper that all the supervised,semi-supervised and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping,which is the key concept in ELM theory.Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.Index Terms—Clustering,embedding,extreme learning ma-chine,manifold regularization,semi-supervised learning,unsu-pervised learning.I.I NTRODUCTIONS INGLE layer feedforward networks(SLFNs)have been intensively studied during the past several decades.Most of the existing learning algorithms for training SLFNs,such as the famous back-propagation algorithm[1]and the Levenberg-Marquardt algorithm[2],adopt gradient methods to optimize the weights in the network.Some existing works also use forward selection or backward elimination approaches to con-struct network dynamically during the training process[3]–[7].However,neither the gradient based methods nor the grow/prune methods guarantee a global optimal solution.Al-though various methods,such as the generic and evolutionary algorithms,have been proposed to handle the local minimum This work was supported by the National Natural Science Foundation of China under Grant61273233,the Research Fund for the Doctoral Program of Higher Education under Grant20120002110035and20130002130010, the National Key Technology R&D Program under Grant2012BAF01B03, the Project of China Ocean Association under Grant DY125-25-02,and Tsinghua University Initiative Scientific Research Program under Grants 2011THZ07132.Gao Huang,Shiji Song,and Cheng Wu are with the Department of Automation,Tsinghua University,Beijing100084,China(e-mail:huang-g09@;shijis@; wuc@).Jatinder N.D.Gupta is with the College of Business Administration,The University of Alabama in Huntsville,Huntsville,AL35899,USA.(e-mail: guptaj@).problem,they basically introduce high computational cost. One of the most successful algorithms for training SLFNs is the support vector machines(SVMs)[8],[9],which is a maximal margin classifier derived under the framework of structural risk minimization(SRM).The dual problem of SVMs is a quadratic programming and can be solved conveniently.Due to its simplicity and stable generalization performance,SVMs have been widely studied and applied to various domains[10]–[14].Recently,Huang et al.[15],[16]proposed the extreme learning machines(ELMs)for training SLFNs.In contrast to most of the existing approaches,ELMs only update the output weights between the hidden layer and the output layer, while the parameters,i.e.,the input weights and biases,of the hidden layer are randomly generated.By adopting squared loss on the prediction error,the training of output weights turns into a regularized least squares(or ridge regression)problem which can be solved efficiently in closed form.It has been shown that even without updating the parameters of the hidden layer,the SLFN with randomly generated hidden neurons and tunable output weights maintains its universal approximation capability[17]–[19].Compared to gradient based algorithms, ELMs are much more efficient and usually lead to better generalization performance[20]–[22].Compared to SVMs, solving the regularized least squares problem in ELMs is also faster than solving the quadratic programming problem in standard SVMs.Moreover,ELMs can be used for multi-class classification problems directly.The predicting accuracy achieved by ELMs is comparable with or even higher than that of SVMs[16],[22]–[24].The differences and similarities between ELMs and SVMs are discussed in[25]and[26], and new algorithms are proposed by combining the advan-tages of both models.In[25],an extreme SVM(ESVM) model is proposed by combining ELMs and the proximal SVM(PSVM).The ESVM algorithm is shown to be more accurate than the basic ELMs model due to the introduced regularization technique,and much more efficient than SVMs since there is no kernel matrix multiplication in ESVM.In [26],the traditional RBF kernel are replaced by ELM kernel, leading to an efficient algorithm with matched accuracy of SVMs.In the past years,researchers from variesfields have made substantial contribution to ELM theories and applications.For example,the universal approximation ability of ELMs has been further studied in a classification context[23].The gen-eralization error bound of ELMs has been investigated from the perspective of the Vapnik-Chervonenkis(VC)dimension theory and the initial localized generalization error model(LGEM)[27],[28].Varies extensions have been made to the basic ELMs to make it more efficient and more suitable for specific problems,such as ELMs for online sequential data [29]–[31],ELMs for noisy/missing data[32]–[34],ELMs for imbalanced data[35],etc.From the implementation aspect, ELMs has recently been implemented using parallel tech-niques[36],[37],and realized on hardware[38],which made ELMs feasible for large data sets and real time reasoning. Though ELMs have become popular in a wide range of domains,they are primarily used for supervised learning tasks such as classification and regression,which greatly limits their applicability.In some cases,such as text classification, information retrieval and fault diagnosis,obtaining labels for fully supervised learning is time consuming and expensive, while a multitude of unlabeled data are easy and cheap to collect.To overcome the disadvantage of supervised learning al-gorithms that they cannot make use of unlabeled data,semi-supervised learning(SSL)has been proposed to leverage both labeled and unlabeled data[39],[40].The SSL algorithms assume that the input patterns from both labeled and unlabeled data are drawn from the same marginal distribution.Therefore, the unlabeled data naturally provide useful information for exploring the data structure in the input space.By assuming that the input data follows some cluster structure or manifold in the input space,SSL algorithms can incorporate both la-beled and unlabeled data into the learning process.Since SSL requires less effort to collect labeled data and can offer higher accuracy,it has been applied to various domains[41]–[43].In some other cases where no labeled data are available,people may be interested in exploring the underlying structure of the data.To this end,unsupervised learning(USL)techniques, such as clustering,dimension reduction or data representation, are widely used to fulfill these tasks.In this paper,we extend ELMs to handle both semi-supervised and unsupervised learning problems by introducing the manifold regularization framework.Both the proposed semi-supervised ELM(SS-ELM)and unsupervised ELM(US-ELM)inherit the computational efficiency and the learn-ing capability of traditional pared with existing algorithms,SS-ELM and US-ELM are not only inductive (straightforward extension for out-of-sample examples at test time),but also can be used for multi-class classification or multi-cluster clustering directly.We test our algorithms on a variety of data sets,and make comparisons with other related algorithms.The results show that the proposed algorithms are competitive with state-of-the-art algorithms in terms of accuracy and efficiency.It is worth to mention that all the supervised,semi-supervised and unsupervised ELMs can actually be put into a unified framework,that is all the algorithms consist of two stages:1)random feature mapping;and2)output weights solving.Thefirst stage is to construct the hidden layer using randomly generated hidden neurons.This is the key concept in the ELM theory,which differs it from many existing feature learning methods.Generating feature mapping randomly en-ables ELMs for fast nonlinear feature learning and alleviates the problem of over-fitting.The second stage is to solve the weights between the hidden layer and the output layer, and this is where the main difference of supervised,semi-supervised and unsupervised ELMs lies.We believe that the unified framework for the three types of ELMs might provide us a new perspective to understand the underlying behavior of the random feature mapping in ELMs.The rest of the paper is organized as follows.In Section II,we give a brief review of related existing literature on semi-supervised and unsupervised learning.Section III and IV introduce the basic formulation of ELMs and the man-ifold regularization framework,respectively.We present the proposed SS-ELM and US-ELM algorithms in Sections V and VI.Experiment results are given in Section VII,and Section VIII concludes the paper.II.R ELATED WORKSOnly a few existing research studies on ELMs have dealt with the problem of semi-supervised learning or unsupervised learning.In[44]and[45],the manifold regularization frame-work was introduce into the ELMs model to leverage both labeled and unlabeled data,thus extended ELMs for semi-supervised learning.However,both of these two works are limited to binary classification problems,thus they haven’t explore the full power of ELMs.Moreover,both algorithms are only effective when the number of training patterns is more than the number of hidden neurons.Unfortunately,this condition is usually violated in semi-supervised learning since the training data is relatively scarce compared to the hidden neurons,whose number is commonly set to several hundreds or several thousands.Recently,a co-training approach have been proposed to train ELMs in a semi-supervised setting [46].In this algorithm,the labeled training sets are augmented gradually by moving a small set of most confidently predicted unlabeled data to the labeled set at each loop,and ELMs are trained repeatedly on the pseudo-labeled set.Since the algo-rithm need to train ELMs repeatedly,it introduces considerable extra computational cost.The proposed SS-ELM is related to a few other mani-fold assumption based semi-supervised learning algorithms, such as the Laplacian support vector machines(LapSVMs) [47],the Laplacian regularized least squares(LapRLS)[47], semi-supervised neural networks(SSNNs)[48],and semi-supervised deep embedding[49].It has been shown in these works that manifold regularization is effective in a wide range of domains and often leads to a state-of-the-art performance in terms of accuracy and efficiency.The US-ELM proposed in this paper are related to the Laplacian Eigenmaps(LE)[50]and spectral clustering(SC) [51]in that they both use spectral techniques for embedding and clustering.In all these algorithms,an affinity matrix is first built from the input patterns.The SC performs eigen-decomposition on the normalized affinity matrix,and then embeds the original data into a d-dimensional space using the first d eigenvectors(each row is normalized to have unit length and represents a point in the embedded space)corresponding to the d largest eigenvalues.The LE algorithm performs generalized eigen-decomposition on the graph Laplacian,anduses the d eigenvectors corresponding to the second through the(d+1)th smallest eigenvalues for embedding.When LE and SC are used for clustering,then k-means is adopted to cluster the data in the embedded space.Similar to LE and SC,the US-ELM are also based on the affinity matrix,and it is converted to solving a generalized eigen-decomposition problem.However,the eigenvectors obtained in US-ELM are not used for data representation directly,but are used as the parameters of the network,i.e.,the output weights.Note that once the US-ELM model is trained,it can be applied to any presented data in the original input space.In this way,US-ELM provide a straightforward way for handling new patterns without recomputing eigenvectors as in LE and SC.III.E XTREME LEARNING MACHINES Consider a supervised learning problem where we have a training set with N samples,{X,Y}={x i,y i}N i=1.Herex i∈R n i,y i is a n o-dimensional binary vector with only one entry(correspond to the class that x i belongs to)equal to one for multi-classification tasks,or y i∈R n o for regression tasks,where n i and n o are the dimensions of input and output respectively.ELMs aim to learn a decision rule or an approximation function based on the training data. Generally,the training of ELMs consists of two stages.The first stage is to construct the hidden layer using afixed number of randomly generated mapping neurons,which can be any nonlinear piecewise continuous functions,such as the Sigmoid function and Gaussian function given below.1)Sigmoid functiong(x;θ)=11+exp(−(a T x+b));(1)2)Gaussian functiong(x;θ)=exp(−b∥x−a∥);(2) whereθ={a,b}are the parameters of the mapping function and∥·∥denotes the Euclidean norm.A notable feature of ELMs is that the parameters of the hidden mapping functions can be randomly generated ac-cording to any continuous probability distribution,e.g.,the uniform distribution on(-1,1).This makes ELMs distinct from the traditional feedforward neural networks and SVMs. The only free parameters that need to be optimized in the training process are the output weights between the hidden neurons and the output nodes.By doing so,training ELMs is equivalent to solving a regularized least squares problem which is considerately more efficient than the training of SVMs or backpropagation algorithms.In thefirst stage,a number of hidden neurons which map the data from the input space into a n h-dimensional feature space (n h is the number of hidden neurons)are randomly generated. We denote by h(x i)∈R1×n h the output vector of the hidden layer with respect to x i,andβ∈R n h×n o the output weights that connect the hidden layer with the output layer.Then,the outputs of the network are given byf(x i)=h(x i)β,i=1,...,N.(3)In the second stage,ELMs aim to solve the output weights by minimizing the sum of the squared losses of the prediction errors,which leads to the following formulationminβ∈R n h×n o12∥β∥2+C2N∑i=1∥e i∥2s.t.h(x i)β=y T i−e T i,i=1,...,N,(4)where thefirst term in the objective function is a regularization term which controls the complexity of the model,e i∈R n o is the error vector with respect to the i th training pattern,and C is a penalty coefficient on the training errors.By substituting the constraints into the objective function, we obtain the following equivalent unconstrained optimization problem:minβ∈R n h×n oL ELM=12∥β∥2+C2∥Y−Hβ∥2(5)where H=[h(x1)T,...,h(x N)T]T∈R N×n h.The above problem is widely known as the ridge regression or regularized least squares.By setting the gradient of L ELM with respect toβto zero,we have∇L ELM=β+CH H T(Y−Hβ)=0(6) If H has more rows than columns and is of full column rank,which is usually the case where the number of training patterns are more than the number of the hidden neurons,the above equation is overdetermined,and we have the following closed form solution for(5):β∗=(H T H+I nhC)−1H T Y,(7)where I nhis an identity matrix of dimension n h.Note that in practice,rather than explicitly inverting the n h×n h matrix in the above expression,we can use Gaussian elimination to directly solve a set of linear equations in a more efficient and numerically stable manner.If the number of training patterns are less than the number of hidden neurons,then H will have more columns than rows, which often leads to an underdetermined least squares prob-lem.In this case,βmay have infinite number of solutions.To handle this problem,we restrictβto be a linear combination of the rows of H:β=H Tα(α∈R N×n o).Notice that when H has more columns than rows and is of full row rank,then H H T is invertible.Multiplying both side of(6) by(H H T)−1H,we getα+C(Y−H H Tα)=0,(8) This yieldsβ∗=H Tα∗=H T(H H T+I NC)−1Y(9)where I N is an identity matrix of dimension N. Therefore,in the case where training patterns are plentiful compared to the hidden neurons,we use(7)to compute the output weights,otherwise we use(9).IV.T HE MANIFOLD REGULARIZATION FRAMEWORK Semi-supervised learning is built on the following two assumptions:(1)both the label data X l and the unlabeled data X u are drawn from the same marginal distribution P X ;and (2)if two points x 1and x 2are close to each other,then the conditional probabilities P (y |x 1)and P (y |x 2)should be similar as well.The latter assumption is widely known as the smoothness assumption in machine learning.To enforce this assumption on the data,the manifold regularization framework proposes to minimize the following cost functionL m=12∑i,jw ij ∥P (y |x i )−P (y |x j )∥2,(10)where w ij is the pair-wise similarity between two patterns x iand x j .Note that the similarity matrix W =[w ij ]is usually sparse,since we only place a nonzero weight between two patterns x i and x j if they are close,e.g.,x i is among the k nearest neighbors of x j or x j is among the k nearest neighbors of x i .The nonzero weights are usually computed using Gaussian function exp (−∥x i −x j ∥2/2σ2),or simply fixed to 1.Intuitively,the formulation (10)penalizes large variation in the conditional probability P (y |x )when x has a small change.This requires that P (y |x )vary smoothly along the geodesics of P (x ).Since it is difficult to compute the conditional probability,we can approximate (10)with the following expression:ˆLm =12∑i,jw ij ∥ˆyi −ˆy j ∥2,(11)where ˆyi and ˆy j are the predictions with respect to pattern x i and x j ,respectively.It is straightforward to simplify the above expression in a matrix form:ˆL m =Tr (ˆY T L ˆY ),(12)where Tr (·)denotes the trace of a matrix,L =D −W isknown as the graph Laplacian ,and D is a diagonal matrixwith its diagonal elements D ii =l +u∑j =1w i,j .As discussed in [52],instead of using L directly,we can normalize it byD −12L D −12or replace it by L p (p is an integer),based on some prior knowledge.V.S EMI -SUPERVISED ELMIn the semi-supervised setting,we have few labeled data and plenty of unlabeled data.We denote the labeled data in the training set as {X l ,Y l }={x i ,y i }l i =1,and unlabeled dataas X u ={x i }ui =1,where l and u are the number of labeled and unlabeled data,respectively.The proposed SS-ELM incorporates the manifold regular-ization to leverage unlabeled data to improve the classification accuracy when labeled data are scarce.By modifying the ordinary ELM formulation (4),we give the formulation ofSS-ELM as:minβ∈R n h ×n o12∥β∥2+12l∑i =1C i ∥e i ∥2+λ2Tr (F T L F )s.t.h (x i )β=y T i −e T i ,i =1,...,l,f i =h (x i )β,i =1,...,l +u(13)where L ∈R (l +u )×(l +u )is the graph Laplacian built fromboth labeled and unlabeled data,and F ∈R (l +u )×n o is the output matrix of the network with its i th row equal to f (x i ),λis a tradeoff parameter.Note that similar to the weighted ELM algorithm (W-ELM)introduced in [35],here we associate different penalty coeffi-cient C i on the prediction errors with respect to patterns from different classes.This is because we found that when the data is skewed,i.e.,some classes have significantly more training patterns than other classes,traditional ELMs tend to fit the classes that having the majority of patterns quite well but fits other classes poorly.This usually leads to poor generalization performance on the testing set (while the prediction accuracy may be high,but the some classes are neglected).Therefore,we propose to alleviate this problem by re-weighting instances from different classes.Suppose that x i belongs to class t i ,which has N t i training patterns,then we associate e i with a penalty ofC i =C 0N t i.(14)where C 0is a user defined parameter as in traditional ELMs.In this way,the patterns from the dominant classes will not be over fitted by the algorithm,and the patterns from a class with less samples will not be neglected.We substitute the constraints into the objective function,and rewrite the above formulation in a matrix form:min β∈R n h×n o 12∥β∥2+12∥C 12( Y −Hβ)∥2+λ2Tr (βT H TL Hβ)(15)where Y∈R (l +u )×n o is the training target with its first l rows equal to Y l and the rest equal to 0,C is a (l +u )×(l +u )diagonal matrix with its first l diagonal elements [C ]ii =C i ,i =1,...,l and the rest equal to 0.Again,we compute the gradient of the objective function with respect to β:∇L SS −ELM =β+H T C ( Y−H β)+λH H T L H β.(16)By setting the gradient to zero,we obtain the solution tothe SS-ELM:β∗=(I n h +H T C H +λH H T L H )−1H TC Y .(17)As in Section III,if the number of labeled data is fewer thanthe number of hidden neurons,which is common in SSL,we have the following alternative solution:β∗=H T (I l +u +C H H T +λL L H H T )−1C Y .(18)where I l +u is an identity matrix of dimension l +u .Note that by settingλto be zero and the diagonal elements of C i(i=1,...,l)to be the same constant,(17)and (18)reduce to the solutions of traditional ELMs(7)and(9), respectively.Based on the above discussion,the SS-ELM algorithm is summarized as Algorithm1.Algorithm1The SS-ELM algorithmInput:The labeled patterns,{X l,Y l}={x i,y i}l i=1;The unlabeled patterns,X u={x i}u i=1;Output:The mapping function of SS-ELM:f:R n i→R n oStep1:Construct the graph Laplacian L from both X l and X u.Step2:Initiate an ELM network of n h hidden neurons with random input weights and biases,and calculate the output matrix of the hidden neurons H∈R(l+u)×n h.Step3:Choose the tradeoff parameter C0andλ.Step4:•If n h≤NCompute the output weightsβusing(17)•ElseCompute the output weightsβusing(18)return The mapping function f(x)=h(x)β.VI.U NSUPERVISED ELMIn this section,we introduce the US-ELM algorithm for unsupervised learning.In an unsupervised setting,the entire training data X={x i}N i=1are unlabeled(N is the number of training patterns)and our target is tofind the underlying structure of the original data.The formulation of US-ELM follows from the formulation of SS-ELM.When there is no labeled data,(15)is reduced tomin β∈R n h×n o ∥β∥2+λTr(βT H T L Hβ)(19)Notice that the above formulation always attains its mini-mum atβ=0.As suggested in[50],we have to introduce addtional constraints to avoid a degenerated solution.Specifi-cally,the formulation of US-ELM is given bymin β∈R n h×n o ∥β∥2+λTr(βT H T L Hβ)s.t.(Hβ)T Hβ=I no(20)Theorem1:An optimal solution to problem(20)is given by choosingβas the matrix whose columns are the eigenvectors (normalized to satisfy the constraint)corresponding to thefirst n o smallest eigenvalues of the generalized eigenvalue problem:(I nh +λH H T L H)v=γH H T H v.(21)Proof:We can rewrite the problem(20)asminβ∈R n h×n o,ββT Bβ=I no Tr(βT Aβ),(22)Algorithm2The US-ELM algorithmInput:The training data:X∈R N×n i;Output:•For embedding task:The embedding in a n o-dimensional space:E∈R N×n o;•For clustering task:The label vector of cluster index:y∈N N×1+.Step1:Construct the graph Laplacian L from X.Step2:Initiate an ELM network of n h hidden neurons withrandom input weights,and calculate the output matrix of thehidden neurons H∈R N×n h.Step3:•If n h≤NFind the generalized eigenvectors v2,v3,...,v no+1of(21)corresponding to the second through the n o+1smallest eigenvalues.Letβ=[ v2, v3,..., v no+1],where v i=v i/∥H v i∥,i=2,...,n o+1.•ElseFind the generalized eigenvectors u2,u3,...,u no+1of(24)corresponding to the second through the n o+1smallest eigenvalues.Letβ=H T[ u2, u3,..., u no+1],where u i=u i/∥H H T u i∥,i=2,...,n o+1.Step4:Calculate the embedding matrix:E=Hβ.Step5(For clustering only):Treat each row of E as a point,and cluster the N points into K clusters using the k-meansalgorithm.Let y be the label vector of cluster index for allthe points.return E(for embedding task)or y(for clustering task);where A=I nh+λH H T L H and B=H T H.It is easy to verify that both A and B are Hermitianmatrices.Thus,according to the Rayleigh-Ritz theorem[53],the above trace minimization problem attains its optimum ifand only if the column span ofβis the minimum span ofthe eigenspace corresponding to the smallest n o eigenvaluesof(21).Therefore,by stacking the normalized eigenvectors of(21)corresponding to the smallest n o generalized eigenvalues,we obtain an optimal solution to(20).In the algorithm of Laplacian eigenmaps,thefirst eigenvec-tor is discarded since it is always a constant vector proportionalto1(corresponding to the smallest eigenvalue0)[50].In theUS-ELM algorithm,thefirst eigenvector of(21)also leadsto small variations in embedding and is not useful for datarepresentation.Therefore,we suggest to discard this trivialsolution as well.Letγ1,γ2,...,γno+1(γ1≤γ2≤...≤γn o+1)be the(n o+1)smallest eigenvalues of(21)and v1,v2,...,v no+1be their corresponding eigenvectors.Then,the solution to theoutput weightsβis given byβ∗=[ v2, v3,..., v no+1],(23)where v i=v i/∥H v i∥,i=2,...,n o+1are the normalizedeigenvectors.If the number of labeled data is fewer than the numberTABLE ID ETAILS OF THE DATA SETS USED FOR SEMI-SUPERVISED LEARNINGData set Class Dimension|L||U||V||T|G50C2505031450136COIL20(B)2102440100040360USPST(B)225650140950498COIL2020102440100040360USPST1025650140950498of hidden neurons,problem(21)is underdetermined.In this case,we have the following alternative formulation by using the same trick as in previous sections:(I u+λL L H H T )u=γH H H T u.(24)Again,let u1,u2,...,u no +1be generalized eigenvectorscorresponding to the(n o+1)smallest eigenvalues of(24), then thefinal solution is given byβ∗=H T[ u2, u3,..., u no +1],(25)where u i=u i/∥H H T u i∥,i=2,...,n o+1are the normal-ized eigenvectors.If our task is clustering,then we can adopt the k-means algorithm to perform clustering in the embedded space.We summarize the proposed US-ELM in Algorithm2. Remark:Comparing the supervised ELM,the semi-supervised ELM and the unsupervised ELM,we can observe that all the algorithms have two similar stages in the training process,that is the random feature learning stage and the out-put weights learning stage.Under this two-stage framework,it is easy tofind the differences and similarities between the three algorithms.Actually,all the algorithms share the same stage of random feature learning,and this is the essence of the ELM theory.This also means that no matter the task is a supervised, semi-supervised or unsupervised learning problem,we can always follow the same step to generate the hidden layer. The differences of the three types of ELMs lie in the second stage on how the output weights are computed.In supervised ELM and SS-ELM,the output weights are trained by solving a regularized least squares problem;while the output weights in the US-ELM are obtained by solving a generalized eigenvalue problem.The unified framework for the three types of ELMs might provide new perspectives to further develop the ELM theory.VII.E XPERIMENTAL RESULTSWe evaluated our algorithms on wide range of semi-supervised and unsupervised parisons were made with related state-of-the-art algorithms, e.g.,Transductive SVM(TSVM)[54],LapSVM[47]and LapRLS[47]for semi-supervised learning;and Laplacian Eigenmap(LE)[50], spectral clustering(SC)[51]and deep autoencoder(DA)[55] for unsupervised learning.All algorithms were implemented using Matlab R2012a on a2.60GHz machine with4GB of memory.TABLE IIIT RAINING TIME(IN SECONDS)COMPARISON OF TSVM,L AP RLS,L AP SVM AND SS-ELMData set TSVM LapRLS LapSVM SS-ELMG50C0.3240.0410.0450.035COIL20(B)16.820.5120.4590.516USPST(B)68.440.9210.947 1.029COIL2018.43 5.841 4.9460.814USPST68.147.1217.259 1.373A.Semi-supervised learning results1)Data sets:We tested the SS-ELM onfive popular semi-supervised learning benchmarks,which have been widely usedfor evaluating semi-supervised algorithms[52],[56],[57].•The G50C is a binary classification data set of which each class is generated by a50-dimensional multivariate Gaus-sian distribution.This classification problem is explicitlydesigned so that the true Bayes error is5%.•The Columbia Object Image Library(COIL20)is a multi-class image classification data set which consists1440 gray-scale images of20objects.Each pattern is a32×32 gray scale image of one object taken from a specific view.The COIL20(B)data set is a binary classification taskobtained from COIL20by grouping thefirst10objectsas Class1,and the last10objects as Class2.•The USPST data set is a subset(the testing set)of the well known handwritten digit recognition data set USPS.The USPST(B)data set is a binary classification task obtained from USPST by grouping thefirst5digits as Class1and the last5digits as Class2.2)Experimental setup:We followed the experimental setup in[57]to evaluate the semi-supervised algorithms.Specifi-cally,each of the data sets is split into4folds,one of which was used for testing(denoted by T)and the rest3folds for training.Each of the folds was used as the testing set once(4-fold cross-validation).As in[57],this random fold generation process were repeated3times,resulted in12different splits in total.Every training set was further partitioned into a labeled set L,a validation set V,and an unlabeled set U.When we train a semi-supervised learning algorithm,the labeled data from L and the unlabeled data from U were used.The validation set which consists of labeled data was only used for model selection,i.e.,finding the optimal hyperparameters C0andλin the SS-ELM algorithm.The characteristics of the data sets used in our experiment are summarized in Table I. The training of SS-ELM consists of two stages:1)generat-ing the random hidden layer;and2)training the output weights using(17)or(18).In thefirst stage,we adopted the Sigmoid function for nonlinear mapping,and the input weights and biases were generated according to the uniform distribution on(-1,1).The number of hidden neurons n h wasfixed to 1000for G50C,and2000for the rest four data sets.In the second stage,wefirst need to build the graph Laplacian L.We followed the methods discussed in[52]and[57]to compute L,and the hyperparameter settings can be found in[47],[52] and[57].The trade off parameters C andλwere selected from。
一种基于约束的半监督聚类查询扩展方法
中 国 科 技 论 文
CH I NA S CI ENCEP AP ER
Vo 1 . 8 No . 1 0
0C t .2 0 1 3
一
种 基 于 约 束 的 半 监 督 聚 类 查 询 扩 展 方 法
杨 静, 刘 宁 , 张键 沛
A q u e r y e x p a ns i o n me t ho d b a s e d o n c o ns t r a i n d e s e mi - s u p e r v i s d e c l u s t e r i n g
Ya n g J i n g, Li u Ni n g, Z h a n g J i a n p e i
( 哈 尔滨 工 程 大 学计 算机 科 学 与技 术 学 院 , 哈 尔滨 1 5 0 0 0 1 )
摘 要: 针 对伪相关反馈模型反馈文档信息质量差和扩展词选择不适 产生的漂移现象等 问题 , 提 出 了一种基于约束 的半监督聚 类查询扩展方法。该方法对初检 结果 的前 k个文档进行人工标注 , 分成相关文档与不相 关文档两类 ; 并利用一种半监 督聚类算 法对初检结果 的前 n个文档进行分析, 提取 出与查询相关的文档作 为反馈 文档。该方 法通 过对少量标 注文档 与查询相 关性 的 学习, 能够较准确地估计 出大量未知文档与查询 的相关性 , 提 高反馈 文档 的质 量, 从 而有效提高检 索 的查全 率和查 准率。实验 结果 表 明 , 该 方 法 比传 统 的伪 相 关 反 馈 和 基 于 无 监 督 聚 类 的 伪 相 关 反 馈 有 更优 的检 索性 能 。 关键词 : 信 息检 索 ; 查询扩展 ; 约束 聚 类 ; 半 监督 聚 类 ; 伪 相 关 反馈 中 图分 类 号 : TP 3 9 1 文献标志码 : A 文章编号 : 2 0 9 5 —2 7 8 3 ( 2 0 1 3 ) 1 0 ~0 9 9 4— 0 4
稀疏距离扩展目标自适应检测及性能分析
第39卷第7期自动化学报Vol.39,No.7 2013年7月ACTA AUTOMATICA SINICA July,2013稀疏距离扩展目标自适应检测及性能分析魏广芬1苏峰2简涛2摘要在球不变随机向量杂波背景下,研究了稀疏距离扩展目标的自适应检测问题.基于有序检测理论,利用协方差矩阵估计方法,分析了自适应检测器(Adaptive detector,AD).其中,基于采样协方差矩阵(Sample covariance matrix,SCM)和归一化采样协方差矩阵(Normalized sample covariance matrix,NSCM),分别建立了AD-SCM和AD-NSCM检测器.从恒虚警率特性和检测性能综合来看,AD-NSCM的性能优于AD-SCM和已有的修正广义似然比检测器.最后,通过仿真实验验证了所提方法的有效性.关键词稀疏距离扩展目标,自适应检测,采样协方差矩阵,归一化采样协方差矩阵,有序统计量引用格式魏广芬,苏峰,简涛.稀疏距离扩展目标自适应检测及性能分析.自动化学报,2013,39(7):1126−1132DOI10.3724/SP.J.1004.2013.01126Sparsely Range-spread Target Detector and Performance AssessmentWEI Guang-Fen1SU Feng2JIAN Tao2Abstract In the background where the clutter is modeled as a spherically invariant random vector,the adaptive detection of sparsely range-spread targets is addressed.By exploiting the order statistics and the covariance matrix estimators,the adaptive detector(AD)is assessed.Herein,the detectors of AD-SCM and AD-NSCM are proposed based on the sample covariance matrix(SCM)and normalized sample covariance matrix(NSCM),respectively.In terms of constant false alarm rate properties and detection performance,the AD-NSCM outperforms the AD-SCM and the existing detector of modified generalized likelihood ratio.Finally,the performance assessment conducted by simulation confirms the effectiveness of the proposed detectors.Key words Sparsely range-spread target,adaptive detection(AD),sample covariance matrix(SCM),normalized sample covariance matrix(NSCM),order statisticsCitation Wei Guang-Fen,Su Feng,Jian Tao.Sparsely range-spread target detector and performance assessment.Acta Automatica Sinica,2013,39(7):1126−1132低分辨率雷达的目标尺寸小于距离分辨率,这种目标常称之为点目标[1].通过采用脉冲压缩技术,高分辨率雷达能够在空间上把一个目标分解成许多散射点[2−3],目标回波在雷达径向上的多个散射点分布在不同的距离分辨单元中,形成距离扩展目标[4].在许多情况下,距离扩展目标的散射点密度是稀疏的,可将这种目标简称为“稀疏距离扩展目标”.目前,高斯背景下的距离扩展目标检测已取得一定进收稿日期2011-12-28录用日期2012-08-27Manuscript received December28,2011;accepted August27, 2012国家自然科学基金(61174007,61102166),山东省优秀中青年科学家科研奖励基金(BS2010DX022)资助Supported by National Natural Science Foundation of China (61174007,61102166)and the Scientific Research Founda-tion for Outstanding Young Scientists of Shandong Province (BS2010DX022)本文责任编委韩崇昭Recommended by Associate Editor HAN Chong-Zhao1.山东工商学院信息与电子工程学院烟台2640052.海军航空工程学院信息融合技术研究所烟台2640011.School of Information and Electronics,Shandong Institute of Business and Technology,Yantai2640052.Research Insti-tute of Information Fusion,Naval Aeronautical and Astronauti-cal University,Yantai264001展,其中,针对估计参数空间过大的问题,文献[5]提出了一种无需辅助数据的检测器,简称为修正的广义似然比检验(Modified generalized likelihood ratio test,MGLRT)检测器,其在高斯背景下是有界恒虚警率(Constant false alarm rate,CFAR)的.但在高距离分辨率的条件下,背景杂波呈现出诸多的非高斯特性[1],高斯背景下获得的检测器已无法有效检测目标.在非高斯背景下,文献[6]研究了已知杂波协方差矩阵条件下的距离扩展目标检测;而通过利用不含目标信号的辅助数据,文献[7]和文献[8]分别针对距离扩展目标和距离–多普勒二维分布式目标展开了自适应检测研究.需要指出的是,以上自适应检测方法[7−8]都是基于辅助数据的.当无法获得满足条件的辅助数据时,实现非高斯背景下距离扩展目标的自适应检测具有重要意义.文献[9]基于迭代估计方法实现了自适应检测,但迭代估计计算量较大,如何在保证性能的同时减小计算量,也是值得探讨的问题.7期魏广芬等:稀疏距离扩展目标自适应检测及性能分析1127稀疏距离扩展目标的散射点只占据目标距离扩展范围的一部分,与含纯杂波的距离分辨单元幅值相比,含目标散射点的距离分辨单元幅值明显更高,这就为实现目标的自适应检测提供了条件.本文针对非高斯杂波中的稀疏距离扩展目标检测问题,在不需要辅助数据的条件下,首先,采用有序统计检测理论和协方差矩阵估计方法,粗略估计目标散射点单元集合;然后,进一步利用适当估计方法获得协方差矩阵的精确估计,设计了自适应检测器(Adaptivedetector,AD),并通过仿真实验验证了检测器的有效性.1问题观测数据来源于N个阵元的线性阵列天线,需跨过K个可能存在目标的距离分辨单元z t,t=1,···,K,判决一个距离扩展目标的存在与否.假设可能的目标完全包含在这些数据中,并且忽略目标距离走动的问题.在杂波背景下,待解决的检测问题可由以下二元假设检验公式来表达.H0:z t=c t,t=1,···,KH1:z t=αt p+c t,t=1,···,K(1)其中,p=(1,e jφ,e j2φ,···,e j(N−1)φ)T/√N表示已知单位导向矢量,即p H p=1,这里(·)H表示共轭转置,φ表示相移常量,(·)T表示转置,αt,t=1,···,K是反映目标幅度的未知参数.非高斯杂波可用球不变随机向量建模[10],由于中心极限定理在较小区域的杂波范围内仍是有效的,球不变随机向量可以表示为两个分量的乘积:一个是反映受照区域反射率的时空“慢变化”纹理分量,另一个是变化“较快”的“散斑”高斯过程.那么,距离分辨单元t的N维杂波向量c t为c t=√τt·ηt,t=1,···,K(2)其中,ηt=(ηt(1),ηt(2),···,ηt(N))T是零均值协方差矩阵为Σ的复高斯随机向量,非负的纹理分量τt与ηt相互独立,其用来描述杂波功率在不同距离分辨单元间的起伏,且服从未知分布fτ.另外,杂波协方差矩阵结构Σ可以表示为Σ=E{ηt ηHt}(3)距离扩展目标完全包含在K个距离分辨单元的滑窗中,假设一个等效散射点最多只占据一个距离分辨单元,即目标等效散射点数目与其所占据的距离分辨单元数目是相等的.通常目标散射点是稀疏分布的,与含纯杂波的距离分辨单元相比,有散射点的距离分辨单元幅值往往更高.含目标等效散射点的距离分辨单元数目用h0表示,而其所对应的距离分辨单元下标用集合Θh表示.为了简化分析,假设h0是已知的,若其未知,可利用模型阶数选择方法获得合适的估计值[11].如前所述,对距离扩展目标的检测只需在距离分辨单元Θh内进行,式(1)表示的假设检验问题可以进一步表示为H0:z t=c t,t∈ΘhH1:z t=αt p+c t,t∈Θh(4)在分布fτ未知的条件下,距离分辨单元t的杂波是条件高斯的,其相应的方差为τt.由于幅度αt 未知而向量p已知,针对不同假设,观测向量z t的联合概率密度可表示为t∈Θhf(z t|τt,H0)=t∈Θh1πNτN t det(Σ)×exp[−1τtz HtΣ−1z t](5)t∈Θhf(z t|αt,τt,H1)=t∈Θh1πNτN t det(Σ)×exp−1τt(z t−αt p)HΣ−1(z t−αt p)(6)其中,det(·)表示方阵的行列式.2检测器实现在未知集合Θh的条件下,为了获得估计的参数集合ˆΘh,这里先假设已知矩阵Σ.由于未知参数α={αt|t∈Θh}和τ={τt|t∈Θh},可利用广义似然比检验(GLRT)原理进行检测器设计[12].在矩阵Σ已知的条件下,根据GLRT原理,对于似然比中的未知参数,可用最大似然(Maximum likelihood,ML)估计进行替换,即考虑如下二元判决:maxτmaxαt∈Θhf(z t|αt,τt,H1)maxτt∈Θhf(z t|τt,H0)H1><H0T0(7)在H1假设下求得αt的ML估计为[13]ˆαt=p HΣ−1z tp HΣ−1p(8)将ˆαt代入式(7)后,可进一步在不同假设条件下求得τt的ML估计:H0:ˆτt=1Nz HtΣ−1z t(9) H1:ˆτt=1N(z t−ˆαt p)HΣ−1(z t−ˆαt p)(10)1128自动化学报39卷将式(8)∼(10)代入式(7)中,可得自然对数形式的GLRT判决为λ1=−Nt∈Θh0ln1−|p HΣ−1z t|2(z H tΣ−1z t)(p HΣ−1p)H1><H0T1(11)令w t=|p HΣ−1z t|2(z H tΣ−1z t)(p HΣ−1p)(12)值得注意的是,w t的结构类似于一个归一化匹配滤波器(权向量为Σ−1p)[14].可以看出,式(12)的分子部分p HΣ−1z t等效于给定距离分辨单元观测z t经过匹配滤波后的结果[14].而分母部分的两项z HtΣ−1z t和p HΣ−1p起到了归一化处理的作用,因此,w t是距离单元观测z t经过匹配滤波后模平方的归一化,可以看作是距离单元观测经归一化匹配滤波后的能量.由于目标完全包含在K个单元的距离滑窗中,且距离扩展目标等效散射点所占据的距离分辨单元幅值往往大于纯杂波的距离分辨单元幅值,因此,可通过归一化能量w t,t=1,···,K中最大的h0个值来确定未知集合ˆΘh.实际应用中协方差矩阵结构Σ往往是未知的,为了确定集合ˆΘh,需先对协方差矩阵结构进行估计.如前所述,纹理分量τt的分布fτ是未知的,因此,协方差矩阵结构Σ的ML估计不能通过期望最大化得到[13].本文考虑两种协方差矩阵估计方法.一种是高斯背景下的经典采样协方差矩阵(Sample covariance matrix,SCM),其可以表示为ˆΣSCM =1RRr=1y r y Hr(13)其中,y r,r=1,···,R表示可用于估计的R个数据.当R≥N时,SCM是以概率为1非奇异的,同时也是正定Hermitian矩阵[12].另外,在非高斯背景下,也常常利用辅助数据获得归一化采样协方差矩阵(Normalized sample covariance matrix, NSCM),可以表示为ˆΣNSCM =1RRr=1Ny Hry ry r y Hr(14)与文献[9]类似,针对稀疏距离扩展目标的自适应检测,AD检测器的实现分为如下三个步骤.步骤1.基于SCM或NSCM方法,利用K个待检测单元的观测数据获得初步估计矩阵ˆΣ1,进一步将估计矩阵ˆΣ1代入式(12)中,可得到初步估计ˆw(1)t.对ˆw(1)t,t=1,···,K按升序排列,可得如下有序序列:0≤ˆw(1)(1)≤···≤ˆw(1)(t)≤···≤ˆw(1)(K)≤1(15)步骤2.考虑有序序列的K−h0个最小值(即ˆw(1)(t),t=1,···,K−h0),并用Ωh表示相应距离分辨单元下标的集合.为了获得可逆的估计矩阵,需满足K−h0≥N.根据之前的分析,集合Ωh中的距离分辨单元极可能只包含纯杂波,故可以利用Ωh0对应的距离分辨单元观测值,精确估计矩阵Σ,并采用与初步估计中相同的估计方法(SCM或NSCM),进一步获得较为精确的协方差矩阵结构估计ˆΣ2.利用ˆΣ2代替式(12)中的未知矩阵Σ,得到w t的精确估计值用ˆw(2)(t)表示.对ˆw(2)(t),t=1,···,K按升序排列,可得如下有序序列:0≤ˆw(2)(1)≤···≤ˆw(2)(t)≤···≤ˆw(2)(K)≤1(16)考虑有序序列的h0个最大值(即ˆw(2)(t),t=K−h0+1,···,K),并用ˆΘh表示相应距离分辨单元下标的集合.步骤3.将距离分辨单元下标的集合ˆΘh和协方差矩阵的精确估计ˆΣ2代入式(11)中,获得自适应检测器AD的检测统计量可以表示为λ2=−NKt=K−h0+1ln(1−ˆw(2)(t))=−Nt∈ˆΘhln[1−|p HˆΣ−12z t|2(z H tˆΣ−12z t)(p HˆΣ−12p)]H1><H0T2(17)需要说明的是,在存在目标散射点的情况下,步骤1的初步估计矩阵不可避免地引入了估计误差,虽然这种误差在步骤2中得到了一定的抑制,但它仍将影响后续精确估计矩阵的精度.在存在辅助数据的前提下,为了获得良好的检测性能,一般要求辅助数据个数不小于阵元数N的两倍[15].在待检测单元数K不变的情况下,可利用的纯杂波单元数(K−h0)将随着散射点个数的增加而减小,因此,此处需等价满足(K−h0)≥2N.进一步考虑到步骤1中散射点单元所引起的估计误差,实际应用中可能需要更大的(K−h0)/N值以弥补步骤1中导致的性能损失,具体取值将在接下来的性能评估中给出.由于采用不同的估计方法会获得不同的自适应检测器,在这里,我们分别将采用SCM和NSCM估计方法获得的相应检测器简称为AD-SCM和AD-NSCM.由于本文的自适应检测器中ˆΘh和ˆΣ2均受到协方差矩阵估计方法的影响,因此,有必要评估自适应距离扩展目标检测器的CFAR特性,这将在接下来的性能分析中进行.7期魏广芬等:稀疏距离扩展目标自适应检测及性能分析1129 3性能评估本节对稀疏距离扩展目标自适应检测器AD-SCM和AD-NSCM进行了CFAR特性和检测性能评估,并与无需辅助数据的MGLRT检测器[5]进行了比较分析.利用Toeplitz矩阵对Σ进行建模,具体采用指数相关结构,在杂波一阶相关系数为γ的条件下,第m行第n列的矩阵元素为[Σ]m,n=γ|m−n|,1≤m,n≤N(18)利用Γ分布对纹理分量的分布fτ进行建模:fτ(x)=LbLΓ(L)x L−1e−(L b)x,x≥0(19)其中,Γ(·)是Gamma函数,均值b代表了平均杂波功率;参数L表示分布fτ的非高斯拖尾特征,具体来说,随着L的减小,函数fτ的拖尾将增大,而杂波的非高斯尖峰程度将增大.采用蒙特卡罗方法计算相应的检测概率P d和虚警概率P fa.根据前面的假设,在所有距离分辨单元均存在杂波的条件下,目标等效散射点只存在于h0个距离分辨单元中,且一个等效散射点最多只占据一个距离分辨单元.在所有K个距离分辨单元上,每个单元的目标或杂波的平均功率分别用σ2s 或σ2c表示.对于存在目标散射点的距离分辨单元(t∈Θh),用零均值独立复高斯变量对等效散射点建模,即目标散射点幅度在不同距离分辨单元间瑞利起伏;相应的方差表示为E{|αt|2}=εtσ2sK(εt表示单个散射点占目标总能量的比率).由|αt|2,t=1,···,K的独立性可知,检测性能与散射点在待检测单元中的位置无关.几种典型的散射点分布模型如表1所示.其中,Model 1中的目标能量等量分布在h0个距离分辨单元范围内;Model2∼4中某个距离分辨单元具有大部分能量,而剩下的能量在其余距离分辨单元中等量分布.Model5相当于点目标,是Model2∼4的极端特例.输入信杂比(Signal to clutter ratio,SCR)定义为K个距离分辨单元内的平均信杂比,即SCR=σ2sσ2cp HΣ−1p(20)为了便于CFAR特性评估,需针对杂波功率水平(对应于b)、尖峰程度(对应于L)和协方差矩阵结构(对应于γ)的不同情况,分析检测器的检测阈值与虚警概率间的关系.相关研究表明[9],在非高斯杂波下MGLRT是非CFAR的,即高斯背景下获得的MGLRT检测器不适用于非高斯背景.为了便于比较,在K=15,h0=3,N=2,L=0.1,1,γ=0,0.5,0.9和b=1,10条件下,图1和图2分别给出了AD-SCM和AD-NSCM的检测阈值(De-tection threshold)与虚警概率(False alarm prob-ability)的关系曲线.图1表明,AD-SCM检测器对杂波协方差矩阵结构和功率水平具有自适应性,但对杂波尖峰不具有适应能力.而图2说明,AD-NSCM对杂波尖峰和杂波功率水平具有CFAR特性,但其检测阈值仍受协方差矩阵结构的轻微影响.综合来看,AD-NSCM的检测阈值在不同杂波条件下的鲁棒性更好.图1K=15,N=2,L=0.1,1,γ=0,0.5,0.9,b=1,10,h0=3时,AD-SCM的CFAR特性曲线Fig.1CFAR curves of AD-SCM for K=15,N=2, L=0.1,1,γ=0,0.5,0.9,b=1,10,h0=3表1不同散射点分布模型的εt值Table1Values ofεt for typical scatters models目标距离分辨单元12···h0Model11h01h01h01h0Model20.50.5h0−10.5h0−10.5h0−1Model30.90.1h0−10.1h0−10.1h0−1Model40.990.01h0−10.01h0−10.01h0−1Model510001130自动化学报39卷图2K=15,N=2,L=0.1,1,γ=0,0.5,0.9,b=1,10,h0=3时,AD-NSCM的CFAR特性曲线Fig.2CFAR curves of AD-NSCM for K=15,N=2, L=0.1,1,γ=0,0.5,0.9,b=1,10,h0=3接下来分析AD检测器的检测性能.图3给出了MGLRT、AD-SCM和AD-NSCM的性能曲线.可以看出,AD-NSCM的检测性能最优,MGLRT 其次,而AD-SCM的检测性能最差.从以上分析综合来看,与MGLRT和AD-SCM相比,AD-NSCM 在CFAR特性和检测性能方面均具有一定的优势.下文将重点对AD-NSCM的检测性能展开分析.图3K=15,N=2,L=1,γ=0.9,h0=3,P fa=10−4, Model1时,MGLRT,AD-SCM和AD-NSCM的检测性能曲线Fig.3Detectability curves of MGLRT,AD-SCM and AD-NSCM for K=15,N=2,L=1,γ=0.9,h0=3,P fa=10−4,Model1首先,针对表1中5种不同模型,图4评估了散射点能量分布对AD-NSCM检测性能的影响.可以看出,随着距离分辨单元间散射点能量分布的均匀性增加,检测性能逐渐改善.为了便于分析,下文中主要针对Model1模型.另外,在不同的散射点密度条件下,图5分析了AD-NSCM检测性能.由图5可知,当h0<7时,协方差矩阵结构的估计误差较小,其对检测性能的影响也较小,当散射点数目增加时,检测器可利用的目标能量增大,AD-NSCM的检测性能得到一定的改善.当h0≥7时,协方差矩阵结构的估计误差影响较大,当散射点数目增加时,进行矩阵估计所用的观测数据量减少,估计矩阵的误差加大,导致较为严重的检测损失,且损失量高于增加散射点数目所获得的性能增益,并引起总检测性能的退化.综合来看,当h0<K/2时,AD-NSCM 的检测性能较好.图4K=15,N=2,L=1,γ=0.9,h0=3,P fa=10−4, Model1∼5对应的AD-NSCM检测性能曲线Fig.4Detectability curves of AD-NSCM for K=15, N=2,L=1,γ=0.9,h0=3,P fa=10−4,Model1∼5图5K=15,N=2,L=1,γ=0.9,P fa=10−4,Model 1时,h0=2,4,6,7,8,10,12对应的AD-NSCM检测性能曲线Fig.5Detectability curves of AD-NSCM for K=15, N=2,L=1,γ=0.9,P fa=10−4,Model1,h0=2,4,6,7,8,10,12在不同杂波尖峰条件下,图6给出了AD-NSCM检测性能.由图6可知,随着L的减小,杂波尖峰程度增大,AD-NSCM的检测性能有所改善.图7给出了不同杂波相关性对应的检测性能曲线.可以看出,杂波一阶相关系数的变化对检测性能几乎没有影响,说明AD-NSCM对杂波相关性7期魏广芬等:稀疏距离扩展目标自适应检测及性能分析1131的变化具有良好适应性.图8进一步分析了阵元数变化(N =2,4,6,8)对AD-NSCM 检测性能的影响.可以看出,在阵元数N ≤4的条件下,当N 增加时,检测性能有所提高;而在N >4的条件下,当N 增加时,检测性能反而有所下降.可能的原因是,当进行矩阵估计所用的观测数据量不变时(R =K −h 0=12),N 的增加会导致协方差矩阵维数变大,待估参量的数目增加,估计精度下降,并直接引起检测性能的退化.综合来看,当K −h 0≥3N 时,AD-NSCM 的检测性能较好.图6K =15,N =2,γ=0.9,h 0=3,P fa =10−4,Model 1时,L =0.5,1,2,10对应的AD-NSCM 检测性能曲线Fig.6Detectability curves of AD-NSCM for K =15,N =2,γ=0.9,h 0=3,P fa =10−4,Model 1,L =0.5,1,2,10图7K =15,N =2,L =1,h 0=3,P fa =10−4,Model 1时,γ=0,0.5,0.9对应的AD-NSCM 检测性能曲线Fig.7Detectability curves of AD-NSCM for K =15,N =2,L =1,h 0=3,P fa =10−4,Model 1,γ=0,0.5,0.94结论本文研究了非高斯杂波中的稀疏距离扩展目标检测问题.在不需要辅助数据的条件下,基于SCM 和NSCM 估计器,分别建立了AD-SCM 和AD-NSCM 检测器.从CFAR 特性和检测性能综合来看,AD-NSCM 的性能优于AD-SCM 和MGLRT.对于典型的非高斯杂波环境,随着杂波尖峰程度的增大,AD-NSCM 的检测性能得到提高,且其对杂波相关性的变化也具有良好适应性.另外,对于h 0<K/2的稀疏距离扩展目标,在K −h 0≥3N 条件下,AD-NSCM 能获得满意的检测性能.需要说明的是,与文献[9]中的检测器相比,AD-NSCM 虽然减小了计算量,但也牺牲了部分CFAR 特性.如何减小检测器对散射点信息的依赖性,是下一步需要研究的问题.图8K =15,L =1,γ=0.9,h 0=3,P fa =10−4,Model 1时,N =2,4,6,8对应的AD-NSCM 检测性能曲线Fig.8Detectability curves of AD-NSCM for K =15,L =1,γ=0.9,h 0=3,P fa =10−4,Model 1,N =2,4,6,8References1Zhou Yu,Zhang Lin-Rang,Liu Xin,Liu Nan.Adap-tive detection based on Bayesian approach in heteroge-neous environments.Acta Automatica Sinica ,2011,37(10):1206−1212(周宇,张林让,刘昕,刘楠.非均匀杂波环境下基于贝叶斯方法的自适应检测.自动化学报,2011,37(10):1206−1212)2He Chu,Liu Ming,Feng Qian,Deng Xin-Ping.PolIn-SAR image classification based on compressed sensing and multi-scale pyramid.Acta Automatica Sinica ,2011,37(7):820−827(何楚,刘明,冯倩,邓新萍.基于多尺度压缩感知金字塔的极化干涉SAR 图像分类.自动化学报,2011,37(7):820−827)3Zhu Yi-Long,Fan Hong-Qi,Lu Zai-Qi,Fu Qiang.A radar target maneuver detection algorithm based on the one-dimensional high resolution Doppler profile.Acta Automat-ica Sinica ,2011,37(8):901−914(祝依龙,范红旗,卢再奇,付强.基于高分辨一维多普勒像的雷达目标机动检测算法.自动化学报,2011,37(8):901−914)4Jian T,He Y,Su F,Qu C W,Ping D F.High resolu-tion radar target adaptive detector and performance assess-ment.Journal of Systems Engineering and Electronics ,2011,22(2):212−2185Gerlach K,Steiner M J.Adaptive detection of range dis-tributed targets.IEEE Transactions on Signal Processing ,1999,47(7):1844−18511132自动化学报39卷6Gerlach K.Spatially distributed target detection in non-Gaussian clutter.IEEE Transactions on Aerospace and Elec-tronic Systems,1999,35(3):926−9347Conte E,De Maio A,Ricci G.CFAR detection of distributed targets in non-Gaussian disturbance.IEEE Transactions on Aerospace and Electronic Systems,2002,38(2):612−6218Bon N,Khenchaf A,Garello R.GLRT subspace detection for range and Doppler distributed targets.IEEE Transac-tions on Aerospace and Electronic Systems,2008,44(2): 678−6969Jian Tao,Su Feng,He You,Li Bing-Rong,Gu Xue-Feng.Distributed target detection without secondary data.Acta Aeronautica et Astronautica Sinica,2011,32(8): 1542−1547(简涛,苏峰,何友,李炳荣,顾雪峰.无需辅助数据的分布式目标自适应检测器.航空学报,2011,32(8):1542−1547)10Jian T,He Y,Su F,Qu C W,Gu X F.Performance charac-terization of two adaptive range-spread target detectors for unwanted signal.In:Proceedings of the9th International Conference on Signal Processing.Beijing,China:IEEE, 2008.2326−232911Gini F,Bordoni F,Farina A.Multiple radar targets de-tection by exploiting induced amplitude modulation.IEEE Transactions on Signal Processing,2004,52(4):903−913 12Kay S M.Fundamentals of Statistical Signal Processing, vol.2:Detection Theory.New York:Prentice-Hall,1998.196−26013Kay S M.Fundamentals of Statistical Signal Processing, vol.1:Estimation Theory.New York:Prentice-Hall,1993.157−21814Kraut S,Scharf L L,McWhorter L T.Adaptive subspace detectors.IEEE Transactions on Signal Processing,2001, 49(1):1−1615Reed I S,Mallett J D,Brennan L E.Rapid convergence rate in adaptive arrays.IEEE Transactions on Aerospace and Electronic Systems,1974,10(6):853−863魏广芬博士,山东工商学院副教授.2005年获得大连理工大学机械电子工程专业工学博士学位.主要研究方向为传感器检测与信号处理理论及技术.本文通信作者.E-mail:*******************(WEI Guang-Fen Ph.D.,associateprofessor at Shandong Institute of Busi-ness and Technology.She received her Ph.D.degree from Dalian University of Technology in2005.Her research in-terest covers theory and technology of sensor detection and signal processing.Corresponding author of this paper.)苏峰博士,海军航空工程学院信息融合技术研究所讲师.主要研究方向为雷达信号检测与信号处理.E-mail:*****************(SU Feng Ph.D.,lecturer at NavalAeronautical and Astronautical Univer-sity.His research interest covers radarsignal detection and signal processing.)简涛博士,海军航空工程学院信息融合技术研究所讲师.主要研究方向为雷达信号检测与信号处理.E-mail:********************.cn(JIAN Tao Ph.D.,lecturer at NavalAeronautical and Astronautical Univer-sity.His research interest covers radarsignal detection and signal processing.)。
深度学习中的半监督学习方法
深度学习中的半监督学习方法在深度学习领域,半监督学习(Semi-Supervised Learning)是一种处理具有标记和未标记样本的学习方法。
相比于完全监督学习,半监督学习利用未标记样本的信息能够提供更多的数据,从而改善模型的性能。
在本文中,我们将深入探讨深度学习中的半监督学习方法,包括其优势、主要技术以及应用领域。
半监督学习背景传统的监督学习方法通常需要大量标记样本来训练模型,但在许多实际应用中,标记样本往往难以获取或者标记成本过高。
与此同时,未标记样本相对容易获取,但其无法直接用于模型的训练。
半监督学习的目标就是充分利用未标记样本的信息,提高模型的性能。
半监督学习方法可以看作是无监督学习和监督学习的结合,通过利用无标记样本进行模型训练,同时使用有标记样本进行模型优化。
半监督学习方法1. 自训练(Self-training)自训练是最基本的半监督学习方法之一。
该方法通过将有标记样本的预测结果作为伪标签,然后使用伪标签和未标记样本一起训练模型。
自训练方法通常采用迭代的方式,每轮迭代后,使用更新的模型对未标记样本进行预测并生成新的伪标签。
2. 半监督生成模型(Semi-supervised Generative Models)半监督生成模型利用生成模型来学习数据的分布,并且通过生成模型与有标记样本的条件概率进行建模。
典型的半监督生成模型包括生成对抗网络(GAN)、变分自编码器(Variational Autoencoder)等。
通过生成模型,半监督生成模型可以生成未标记样本,从而扩大样本空间,提高模型的性能。
3. 半监督降噪(Semi-Supervised Denoising)半监督降噪方法通过在训练过程中引入噪声,利用噪声和未标记样本之间的关系来改进模型。
该方法的核心思想是将未标记样本与具有噪声的样本进行混合,并在训练过程中对模型进行约束,以提高模型的泛化能力。
半监督学习的优势半监督学习方法相比于完全监督学习方法具有以下几个优势:1. 数据利用率高:通过利用未标记样本,半监督学习能够充分利用数据资源,提高模型的性能。
半监督深度学习图像分类方法研究综述
半监督深度学习图像分类方法研究综述吕昊远+,俞璐,周星宇,邓祥陆军工程大学通信工程学院,南京210007+通信作者E-mail:*******************摘要:作为人工智能领域近十年来最受关注的技术之一,深度学习在诸多应用中取得了优异的效果,但目前的学习策略严重依赖大量的有标记数据。
在许多实际问题中,获得众多有标记的训练数据并不可行,因此加大了模型的训练难度,但容易获得大量无标记的数据。
半监督学习充分利用无标记数据,提供了在有限标记数据条件下提高模型性能的解决思路和有效方法,在图像分类任务中达到了很高的识别精准度。
首先对于半监督学习进行概述,然后介绍了分类算法中常用的基本思想,重点对近年来基于半监督深度学习框架的图像分类方法,包括多视图训练、一致性正则、多样混合和半监督生成对抗网络进行全面的综述,总结多种方法共有的技术,分析比较不同方法的实验效果差异,最后思考当前存在的问题并展望未来可行的研究方向。
关键词:半监督深度学习;多视图训练;一致性正则;多样混合;半监督生成对抗网络文献标志码:A中图分类号:TP391.4Review of Semi-supervised Deep Learning Image Classification MethodsLYU Haoyuan +,YU Lu,ZHOU Xingyu,DENG XiangCollege of Communication Engineering,Army Engineering University of PLA,Nanjing 210007,ChinaAbstract:As one of the most concerned technologies in the field of artificial intelligence in recent ten years,deep learning has achieved excellent results in many applications,but the current learning strategies rely heavily on a large number of labeled data.In many practical problems,it is not feasible to obtain a large number of labeled training data,so it increases the training difficulty of the model.But it is easy to obtain a large number of unlabeled data.Semi-supervised learning makes full use of unlabeled data,provides solutions and effective methods to improve the performance of the model under the condition of limited labeled data,and achieves high recognition accuracy in the task of image classification.This paper first gives an overview of semi-supervised learning,and then introduces the basic ideas commonly used in classification algorithms.It focuses on the comprehensive review of image classification methods based on semi-supervised deep learning framework in recent years,including multi-view training,consistency regularization,diversity mixing and semi-supervised generative adversarial networks.It summarizes the common technologies of various methods,analyzes and compares the differences of experimental results of different methods.Finally,this paper thinks about the existing problems and looks forward to the feasible research direction in the future.Key words:semi-supervised deep learning;multi-view training;consistency regularization;diversity mixing;semi-supervised generative adversarial networks计算机科学与探索1673-9418/2021/15(06)-1038-11doi:10.3778/j.issn.1673-9418.2011020基金项目:国家自然科学基金(61702543)。
GSM0710中文版
CMUX协议文档修订情况记录:目录CMUX协议文档 (1)1. 引言 (4)1.1. 编写目的 (4)1.2. 编写背景 (4)1.3. 参考资料 (4)1.4. 名字解释 (4)2. CMUX协议框架 (5)3. Non Error Recovery Mode (8)3.1. 服务接口定义 (8)3.1.1. 服务模型 (8)3.1.2. 启动CMUX服务模式 (9)3.1.3. 建立DLC服务 (9)3.1.4. 数据服务 (10)3.1.5. 功耗控制 (10)3.1.5.1. Sleep模式 (11)3.1.5.2. Wake up模式 (11)3.1.6. 释放DLC服务 (11)3.1.7. 关闭服务 (11)3.1.8. 控制服务 (11)3.1.8.1. 07.10协议服务 (12)3.1.8.2 虚拟端口服务 (13)3.2. 数据帧结构 (14)3.2.1. 帧域 (14)3.2.1.1. 标志域(Flag Sequence Field) (14)3.2.1.2. 地址域(Address Field) (14)3.2.1.3. 控制域(Control Field) (15)3.2.1.4. 信息域(Information Field) (15)3.2.1.5. 长度指示域(Length Indicator) (15)3.2.1.6. FCS域(Frame Checking Sequence Field) (15)3.2.2. 格式约定 (16)3.2.3. 帧有效性 (16)3.2.4. 帧中止 (16)3.2.5. 数据帧之间的填充 (16)3.2.6. 基本Basic (16)3.2.6.1. 约束 (17)3.2.7. 高级Advanced (17)3.2.7.1. 控制字节透明 (17)3.2.7.2. 开始/停止传输-扩展透明 (17)3.2.7.3. 流控(Flow-control)透明 (18)3.2.7.4. 帧的结构 (18)3.3. 帧类型 (18)3.4. 过程和状态 (20)3.4.1. 建立DLC链路 (20)3.4.2. 释放DLC链路 (20)3.4.3. 信息传输 (21)3.4.4. 帧变量 (21)3.4.5. 超时的考虑 (22)3.4.6. 多路控制通道 (22)3.4.6.1. 控制消息格式 (22)3.4.6.2. 控制消息类型参数 (23)3.4.7. 电源控制与唤醒机制 (32)3.4.8. 流控 (32)3.5. 集成层Convergence Layer (34)3.5.1. 类型1-未结构化的字节流 (34)3.5.2. 类型2-带参数的未结构化的字节流 (34)3.5.3. 类型3-不可中断的帧数据 (36)3.5.4. 类型4-可中断的帧数据 (36)3.6. DLCI值 (37)3.7. 系统参数 (37)3.7.1. 确认时间T1 (37)3.7.2. 帧的最大长度N1 (38)3.7.3. 最大重发次数N2 (38)3.7.4. 窗口大小k (38)3.7.5. 控制通道的响应时间T2 (38)3.7.6. 唤醒流程的响应时间T3 (38)3.8. 启动和关闭MUX (38)4. Error Recovery Mode (39)1. 引言1.1. 编写目的本文档根据GSM07.10协议同时结合E901 Mobile参考代码,对CMUX多串口协议进行了大致的描述。
机器学习技术中的半监督聚类方法
机器学习技术中的半监督聚类方法半监督聚类是机器学习领域中一种重要的技术,它结合了监督学习和无监督学习的方法。
通过利用少量标记数据和大量无标记数据,半监督聚类可以提供更准确和可靠的聚类结果。
半监督聚类方法旨在解决无标记数据量大、有标记数据量少的问题。
在传统的无监督聚类方法中,只利用无标记数据进行聚类,无法充分利用已有的有标记数据的信息。
而在监督学习中,虽然可以利用有标记数据进行分类或回归任务,但由于标记数据量的限制,很难满足大规模数据的需要。
半监督聚类方法的核心思想是将无标记数据和少量有标记数据的信息结合起来,通过半监督学习的方式进行聚类。
其中最经典的方法之一是S3C(Semi-Supervised Spectral Clustering)算法,它将无标记数据和有标记数据进行低维表示,并通过优化一个目标函数来实现聚类。
S3C算法在处理大规模数据集时具有较高的效率和可扩展性。
另一个常用的半监督聚类方法是Co-training算法,它通过同时训练两个相互独立的分类器来实现聚类。
其中一个分类器使用有标记数据进行训练,另一个分类器使用无标记数据进行训练。
通过交替迭代训练分类器,并利用它们在未标记数据上的一致性进行更新,Co-training算法能够充分利用有标记数据和无标记数据的信息,提高聚类的准确性。
除了以上两种方法,还有许多其他的半监督聚类方法,如基于图的半监督聚类算法、基于聚类原型的半监督聚类算法等。
这些方法根据不同的数据特点和问题需求,采用不同的策略进行模型设计和优化。
在选择合适的半监督聚类方法时,需要综合考虑数据规模、数据特征、标记数据的可用性等因素。
半监督聚类方法在许多领域都有广泛的应用。
例如,在社交网络分析中,可以利用半监督聚类方法对用户进行聚类,发现潜在的社交群体或兴趣群体。
在图像分割中,可以利用半监督聚类方法对图像进行分割,获取更准确的边界和目标提取结果。
在推荐系统中,可以利用半监督聚类方法对用户和物品进行聚类,实现个性化推荐和精准广告投放。
集成梯度特征归属方法-概述说明以及解释
集成梯度特征归属方法-概述说明以及解释1.引言1.1 概述在概述部分,你可以从以下角度来描述集成梯度特征归属方法的背景和重要性:集成梯度特征归属方法是一种用于分析和解释机器学习模型预测结果的技术。
随着机器学习的快速发展和广泛应用,对于模型的解释性需求也越来越高。
传统的机器学习模型通常被认为是“黑盒子”,即无法解释模型做出预测的原因。
这限制了模型在一些关键应用领域的应用,如金融风险评估、医疗诊断和自动驾驶等。
为了解决这个问题,研究人员提出了各种机器学习模型的解释方法,其中集成梯度特征归属方法是一种非常受关注和有效的技术。
集成梯度特征归属方法能够为机器学习模型的预测结果提供可解释的解释,从而揭示模型对于不同特征的关注程度和影响力。
通过分析模型中每个特征的梯度值,可以确定该特征在预测中扮演的角色和贡献度,从而帮助用户理解模型的决策过程。
这对于模型的评估、优化和改进具有重要意义。
集成梯度特征归属方法的应用广泛,不仅适用于传统的机器学习模型,如决策树、支持向量机和逻辑回归等,也可以应用于深度学习模型,如神经网络和卷积神经网络等。
它能够为各种类型的特征,包括数值型特征和类别型特征,提供有益的信息和解释。
本文将对集成梯度特征归属方法的原理、应用优势和未来发展进行详细阐述,旨在为读者提供全面的了解和使用指南。
在接下来的章节中,我们将首先介绍集成梯度特征归属方法的基本原理和算法,然后探讨应用该方法的优势和实际应用场景。
最后,我们将总结该方法的重要性,并展望未来该方法的发展前景。
1.2文章结构文章结构内容应包括以下内容:文章的结构部分主要是对整篇文章的框架进行概述,指导读者在阅读过程中能够清晰地了解文章的组织结构和内容安排。
第一部分是引言,介绍了整篇文章的背景和意义。
其中,1.1小节概述文章所要讨论的主题,简要介绍了集成梯度特征归属方法的基本概念和应用领域。
1.2小节重点在于介绍文章的结构,将列出本文各个部分的标题和内容概要,方便读者快速了解文章的大致内容。
联合训练生成对抗网络的半监督分类方法
光学 精密工程Optics and Precision Engineering第 29 卷 第 5 期2021年5月Vol. 29 No. 5May 2021文章编号 1004-924X( 2021)05-1127-09联合训练生成对抗网络的半监督分类方法徐哲,耿杰*,蒋雯,张卓,曾庆捷(西北工业大学电子信息学院,西安710072)摘要:深度神经网络需要大量数据进行监督训练学习,而实际应用中往往难以获取大量标签数据°半监督学习可以减小深度网络对标签数据的依赖,基于半监督学习的生成对抗网络可以提升分类效果,旦仍存在训练不稳定的问题°为进一步提高网络的分类精度并解决网络训练不稳定的问题,本文提出一种基于联合训练生成对抗网络的半监督分类方法,通 过两个判别器的联合训练来消除单个判别器的分布误差,同时选取无标签数据中置信度高的样本来扩充标签数据集,提高半监督分类精度并提升网络模型的泛化能力°在CIFAR -10和SVHN 数据集上的实验结果表明,本文方法在不同数量的标签数据下都获得更好的分类精度°当标签数量为2 000时,在CIFAR -10数据集上分类精度可达80.36% ;当标签 数量为10时,相比于现有的半监督方法,分类精度提升了约5%°在一定程度上解决了 GAN 网络在小样本条件下的过拟合问题°关键词:生成对抗网络;半监督学习;图像分类;深度学习中图分类号:TP391文献标识码:Adoi :10. 37188/OPE. 20212905.1127Co -training generative adversarial networks forsemi -supervised classification methodXU Zhe , GENG Jie * , JIANG Wen , ZHANG Zhuo , ZENG Qing -jie(School of E lectronics and Information , Northwestern Polytechnical University , Xian 710072, China )* Corresponding author , E -mail : gengjie@nwpu. edu. cnAbstract : Deep neural networks require a large amount of data for supervised learning ; however , it is dif ficult to obtain enough labeled data in practical applications. Semi -supervised learning can train deep neuralnetworks with limited samples. Semi -supervised generative adversarial networks can yield superior classifi cation performance ; however , they are unstable during training in classical networks. To further improve the classification accuracy and solve the problem of training instability for networks , we propose a semi -su pervised classification model called co -training generative adversarial networks ( CT -GAN ) for image clas sification. In the proposed model , co -training of two discriminators is applied to eliminate the distribution error of a single discriminator and unlabeled samples with higher confidence are selected to expand thetraining set , which can be utilized for semi -supervised classification and enhance the generalization of deep networks. Experimental results on the CIFAR -10 dataset and the SVHN dataset showed that the pro posed method achieved better classification accuracies with different numbers of labeled data. The classifi cation accuracy was 80. 36% with 2000 labeled data on the CIFAR -10 dataset , whereas it improved by收稿日期:2020-11-04;修订日期:2021-01-04.基金项目:装备预研领域基金资助项目(No. 61400010304);国家自然科学基金资助项目(No. 61901376)1128光学精密工程第29卷about5%compared with the existing semi-supervised method with10labeled data.To a certain extent, the problem of GAN overfitting under a few sample conditions is solved.Key words:generative adversarial networks;semi-supervised learning;image classification;deep learning1引言图像分类作为计算机视觉领域最基础的任务之一,主要通过提取原始图像的特征并根据特征学习进行分类[11o传统的特征提取方法主要是对图像的颜色、纹理、局部特征等图像表层特征进行处理实现的,例如尺度不变特征变换法[21,方向梯度法[31以及局部二值法[41等。
纹理物体缺陷的视觉检测算法研究--优秀毕业论文
摘 要
在竞争激烈的工业自动化生产过程中,机器视觉对产品质量的把关起着举足 轻重的作用,机器视觉在缺陷检测技术方面的应用也逐渐普遍起来。与常规的检 测技术相比,自动化的视觉检测系统更加经济、快捷、高效与 安全。纹理物体在 工业生产中广泛存在,像用于半导体装配和封装底板和发光二极管,现代 化电子 系统中的印制电路板,以及纺织行业中的布匹和织物等都可认为是含有纹理特征 的物体。本论文主要致力于纹理物体的缺陷检测技术研究,为纹理物体的自动化 检测提供高效而可靠的检测算法。 纹理是描述图像内容的重要特征,纹理分析也已经被成功的应用与纹理分割 和纹理分类当中。本研究提出了一种基于纹理分析技术和参考比较方式的缺陷检 测算法。这种算法能容忍物体变形引起的图像配准误差,对纹理的影响也具有鲁 棒性。本算法旨在为检测出的缺陷区域提供丰富而重要的物理意义,如缺陷区域 的大小、形状、亮度对比度及空间分布等。同时,在参考图像可行的情况下,本 算法可用于同质纹理物体和非同质纹理物体的检测,对非纹理物体 的检测也可取 得不错的效果。 在整个检测过程中,我们采用了可调控金字塔的纹理分析和重构技术。与传 统的小波纹理分析技术不同,我们在小波域中加入处理物体变形和纹理影响的容 忍度控制算法,来实现容忍物体变形和对纹理影响鲁棒的目的。最后可调控金字 塔的重构保证了缺陷区域物理意义恢复的准确性。实验阶段,我们检测了一系列 具有实际应用价值的图像。实验结果表明 本文提出的纹理物体缺陷检测算法具有 高效性和易于实现性。 关键字: 缺陷检测;纹理;物体变形;可调控金字塔;重构
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
II
semi-supervised action recognition代码解读
semi-supervised action recognition代码解读“semi-supervised action recognition代码解读”这句话的意思是解读或分析用于半监督动作识别的代码。
对于“semi-supervised action recognition代码解读”的举例,由于具体的代码实现会因研究团队或个人而异,这里只能提供一些相关的半监督学习在动作识别中的应用案例或概念性示例:1.预训练与微调(Pretraining and Fine-tuning):首先使用大量有标签的数据进行预训练,然后使用少量有标签的数据对模型进行微调。
这种策略通常在迁移学习中使用,其中预训练模型已经在大量数据上进行了训练。
2.伪标签(Pseudo Labeling):给无标签数据添加伪标签,然后用这些带标签的数据来训练模型。
一种常见的做法是使用已经训练好的模型对无标签数据进行预测,并把这些预测结果作为伪标签。
3.生成对抗网络(Generative Adversarial Networks, GANs):GANs可以生成新的数据,这些数据可以用于扩充训练集。
在动作识别中,GANs 可以生成模拟的运动序列,这些序列可以与真实的有标签数据一起用于训练模型。
4.自监督学习(Self-supervised Learning):在这种方法中,模型通过学习如何根据输入数据的某种内部表示生成目标输出,来间接地学习预测任务。
例如,模型可以学习根据视频帧预测未来的帧。
总结来说,“semi-supervised action recognition代码解读”是指对用于半监督动作识别的代码进行深入分析和理解。
这需要理解半监督学习的原理以及在动作识别中的具体应用策略,同时也要具备相关的编程和机器学习知识。
基于标记信息的新增型半监督图像分割技术研究(IJIGSP-V4-N1-7)
I.J. Image, Graphics and Signal Processing, 2012, 1, 51-56Published Online February 2012 in MECS (/) DOI: 10.5815/ijigsp.2012.01.07A New Enhanced semi supervised image segmentation using Marker as Prior information.L.Sankari 11.Asst Professor, Department of Computer Science,Sri Ramakrishna College of Arts and Science for women, Coimbatore, IndiaE-mail: sankarivnm@Dr.C.Chandrasekar 22.Associate Professor , Dept of Computer Science, Periyar University, Salem, IndiaE-mail: ccsekar@Abstract — In Recent days Semi supervised image segmentation techniques play a noteworthy role in image processing. Semi supervised image segmentation needs both labeled data and unlabeled data. It means that a Small amount of human assistance or Prior information is given during clustering process. This paper discusses an enhancedsemi supervised image segmentation method from labeledimage. It uses both a background selection marker and foreground object selection marker separately. The EM (Expectation Maximization) algorithm is used for clusteringalong with must link constraints. The proposed method isapplied for natural images using MATLAB 7. Thus the proposed method extracts Object of Interest (OOI) from OONI (Object of Not Interest) efficiently and the experimental results are compared with Standard K Means and EM Algorithm also. The results show that the proposedsystem gives better results than the other two methods. Itmay also be suitable for object extraction from natural images and medical image analysis.Index Terms — Semi supervised image segmentation - prior knowledge – constrained clustering.I. I NTRODUCTIONImage segmentation is the method of dividing an image into different regions such that each region is homogeneous. By partitioning an image into a set of disjoint segments, image segmentation leads to more compact image representation. As the central step in computer vision and image understanding, image segmentation has been extensively investigated in the past decades, with a large number of image segmentation algorithms. There are number of segmentation techniques exist in the literature. But no single method can be considered best for all kind of images. Most of the techniques are being pretty ad hoc in nature.A)Need of semi supervised image segmentation:Semi supervised method is the combination of both supervised (classification) and unsupervised (clustering) and classification concept. Before performing clustering some prior knowledge is given. If the algorithm is purely an unsupervised (clustering) algorithm it will not show good result for all kind of images since an iterativeclustering algorithms commonly do not lead to optimal cluster solutions. Partitions that are generated by these algorithms are known to be sensitive to the initial partitions that are fed as an input parameter. A “good” selection of initial seed is an important clustering problem. Likewise the classification algorithm will not give best solution since the result depends on type of classifier. So this paper discuss about combination ofthese two methods called ‘semi supervised model’ for image segmentation. The following section discuss about semi supervised model[13][14]. b.)Semi supervised clustering: During clustering process a small amount of priorknowledge is given either as labels or constraints or any other prior information. The following figure explains about the semi supervised clustering model[15][16][17].Figure 1. Semi supervised clustering modelIn the above figure, the three clusters are formedusing certain constraints or prior information. Besides the similarity information which is used as color knowledge, the other kind of knowledge is also available by either pair wise (must-link or cannot-link) constraints between data items or class labels for some items. Instead of simply using this knowledge for the external validation of the results of clustering, one can imagine letting it “guide” or “adjust” the clustering process, i.e. provide a limited form of supervision. There are two ways to provide information for semi supervised clustering.1. Search based.2. Similarity based.52 A New Enhanced semi supervised image segmentation using Marker as Prior information1.Search based :The clustering algorithm itself is modified so that user-provided constraints or labels can be used to bias the search for an appropriate clustering. This can be done in several ways, such as by performing a transitive closure of the constraints and using them to initialize clusters [4], by including the cost function a penalty for lack of compliance with the specified constraints [10][11], or by requiring constraints to be satisfied during cluster assignment in the clustering process [12].2.Similarity based:There are several similarity measures existing in the domain. Any one similarity measure is adapted[7][8][9] so that the given constraints can be easily satisfied.In this paper semi supervised image segmentation with minimum user label is discussed. Instead of selecting some sample pixels with mouse clicks [1], a group of pixels are selected as a region using mouse. A pixel which has the same color and intensity as in the selected marker region (by mouse selection) will come under one cluster and others pixels will not be. This concept is given in detail in the following sections.I.P REVIOUS R ELATED W ORKSRecently there are many papers focusing the importance of semi supervised image segmentation. Among them a few papers are analyzed. According to paper [3], the semi-supervised C-Means algorithm is introduced in this paper to solve three problems in the domains like choosing and validating the correct number of clusters, Insuring that algorithmic labels correspond to meaningful physical labels tendency to recommend solutions that equalize cluster populations. The algorithm used MRI brain image for segmentation.In this [4] paper, how the popular k-means Clustering algorithm can be modified to make use of the available information with some artificial constraints. This method was implemented for six datasets and it has showed good improvement in clustering accuracy. This method was also applied to the real world problem of automatically detecting road lanes from GPS data and observed dramatic increases in performance.In paper [5] a novel semi-supervised Fuzzy C-means algorithm is proposed. A set called as seed set which contains a small amount of labeled data is used. First, an initial partition in the seed set is done, then use the center of each partition as the cluster center and optimize the objective function of FCM using EM algorithm. Experiments results show that the defect of fuzzy c-means is avoided that is sensitive to the initial centers partly and give much better partition accuracy.In Paper [6], Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. Here labeled data is used to generate initial seed clusters along with the constraints generated from labeled data to guide the clustering process. It introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.This paper [12] focuses on semi-supervised clustering, where the goal is to cluster a set of data-points given a set of similar/dissimilar examples. Along with instance-level equivalence (similar pairs belong to the same cluster) and in-equivalence constraints (dissimilar pairs belong to different clusters) feature space level constraints (how similar are two regions in feature space) are also used for getting final clustering. This task is accomplished by learning distance metrics (i.e., how similar are two regions in the feature space?) over the feature space which that are guided by the instance-level. A bag of words models, which are nothing but code words (or visual-words) are used as building blocks. Our proposed technique learns non-parametric distance metrics over codewords from these equivalence (and optionally, in-equivalence) constraints, which are then able to propagate back to compute a dissimilarity measure between any two points in the feature space. Thus this work is more advanced than previous works. First, unlike past efforts on global distance metric learning which try to transform the entire feature space so that similar pairs are close. This transformation is non-parametric and thus allows arbitrary non-linear deformations of the feature space. Second, while most Mahalanobis metrics are learnt using Semi-Definite Programming (SDP), this paper discuss about a Linear Program (LP) and in practice, is extremely fast. Finally, Corel image datasets (MSRC, Corel) where ground-truth segmentation is available. Over all, this idea gives improved clustering accuracy.II.METHODOLOGYIn this paper, ground truth image is taken with proper class labels. The Octree color quantization algorithm is applied to get the reduced colors. This color table is integrated with must link constraints for the given image using EM algorithm.A group of pixels as a region must be selected separately for object of interest (OOI) and object of not interest (OONI) for back ground and foreground from an input image. Here OOI and OONI refer foreground and back ground objects respectively. Find the smallestA New Enhanced semi supervised image segmentation using Marker as Prior information 53distance for each and every pixel in the marker to its neighboring pixels using Mahalanobi’s formula.---- (1) Wherex row vector which contains the pixels inside a marker and the other area.COV represents sample covariance matrix.If this distance is less than the assumed threshold value ( 0.5) then find the exact color index. If the index values are same then assign to the same region otherwise (same value & different class index) delete any one index and group. The above process is repeated for all pixels in the marker. Finally the object of interest (OOI) is clearly segmented than the other methods.Fig 1. Work Flow Diagram of the proposed work The algorithm based on proposed idea is given below:beled image is taken as input image.2.Octree color quantization to reduce the colorsand store with class label.3.Mark object of interest (OOI) and object of notinterest (OONI) using mouse.4.Cluster quantized color table using EMAlgorithm interated with must-link constraints.Must link constraints means that if any pointbelongs to same region, group them into oneregion otherwise need not group.5. Repeats steps 4 and 5 for each point in object ofinterest(OOI) and object of not interest(OONI)6. Let X be a selection with N points7. For each point in X na.Let Y = x i (x ∈ X)b.Consider its neighboring coordinates withina rectangle, R of size 3 x 3.c.Calculate distance vector usingMahalanobis distance d(R i, Y)d.Find minimum distance d(R i, Y).e.If d < threshold then find Color index of Yand R ii.If belong to same label, group intosame regionii.If belong to different labels buthave same color then delete R iIII.RESULTS AND DISCUSSIONSFigure 2:Here three different labeled images are taken as input .These three images are partially separated images.Fig 2.a Fig 2.bFig 2.cFigure 3:Marking of OOI (Object of interest) and OONI (Objectof not interest) for all the above three figures using blueand green color.Object of intereste (Green Color mark) Object of Not interest (Blue color mark )54A New Enhanced semi supervised image segmentation using Marker as Prior informationFig 3.a Fig 3.Fig 3.cFigure 4:Resullt of proposed method for the above three figures.Fig 4.aFig 4.bFig 4.c Figure 5:Labeled image and different marker selection.Fig 5.a Fig.5.bFig 5.cFigure 6:Results using proposed idea for the above figure 5 usingdifferent marker selection..Fig 6.a Fig 6.b(From these two pictues the object is segmented properly)Fig.6.cThe object of interest is not segmented properlyFigure 7:Results of input images using K Means method.Fig 7.aFig 7.bFig 7.cFigure 8:Result of input images using Standard EM method.Fig 8.aFig.8.bA New Enhanced semi supervised image segmentation using Marker as Prior information 55Fig 8.cThe time taken for getting the result of the input image1 (Bird) using proposed method is noted down in the table.Table 1. Performance Table for image1It shows that the proposed method has taken more time than K means and less time than EM method. The performance is shown in the chart given below.Figure 9. Performance chartUsing proposed idea any labeled image is taken as input. The two markings are given like OOI and OONI(object of interest and object of not interest) with different colors. There are two different marker selections are shown in the figure 3 and 5. According to the marker selection given, the object is extracted from its background. Compared to the results in Figure 4, the segmentation results are better in figure 6 for some pictures. This is because the quality of segmentation depends on marker selection. These proposed results are also compared with K means and Standard EM Algorithm. In Figure7 and 8, the K Means & EM Algorithms do not segment the foreground of the object accurately. But the proposed method results are better in figure 6 and the object of interest is also separatedaccurately.IV. CONCLUSIONThe above result shows that the proposed semi supervised segmentation extracts the object of interest precisely. But the result of segmentation depends on the marker selection on left and right side of the image. If the marker selection is not given properly, the result of segmentation will not be good. This may be eliminated in future by adding certain other constraints for texture, color etc.REFERENCES[1] Yuntao Qian, Wenwu Si, IEEE, ” Semi-supervised Color Image Segmentation Method”-2005[2] Yanhua Chen, Manjeet Rege, Ming Dong, JingHua FarshadFotouhi Department of Computer Science Wayne State UniversityDetroit, MI48202 “Incorporating User Provided Constraints into Document Clustering”,2009[3] Amine M. Bensaid, Lawrence O. Hall Department ofComputer Science and Engineering Universit of South Florida Tampa, Partially Supervised Clustering for Image Segmentation -1994[4] Kiri Wagstaff, Claire Cardie ,Seth Rogers &StefanSchroedl ,”Constrained K-means Clustering with Background Knowledge-2001” [5]Kunlun Li; Zheng Cao; Liping Cao; Rui Zhao; Coll. Of Electron. & Inf. Eng., Hebei Univ., Baoding, China ,“A novel semi-supervised fuzzy c-means clustering method”2009,IEEE Explorer[6]Sugato Basu , Arindam Banerjee , R. Mooney , In proceedings of 19th international conference on Machine Learning(ICML-2002),Semi-supervised Clustering by Seeding (2002)[7] David Cohn, Rich Caruana, and Andrew McCallum. Semi-supervised clustering with user feedback, 2000.[8]Dan Klein, Sepandar D. Kamvar, and Christopher D. Manning. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proceedings of the 19th International Conference on Machine Learning, pages 307–314. Morgan Kaufmann Publishers Inc., 2002.[9]Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell. Distance metric learning with application to clustering with sideinformation. In S. Thrun S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 505–512, Cambridge, MA, 2003. MIT Press.[10] A. Demiriz, K. Bennett, and M. Embrechts. Semi-supervised clustering using genetic algorithms. In C. H. Dagli et al., editor, Intelligent Engineering Systems Through Artificial Neural Networks 9, pages 809–814. ASME Press, 1999. [11] [11] K.Wagstaff and C. Cardie. Clustering with instance-level constraints. In Proceedings of the 17th International Conference on Machine Learning, pages 1103–1110, 2000. [12]Dhruv Batra, Rahul Sukthankar and Tsuhan Chen,“Semi-Supervised Clustering via LearntCodeword Distances”,2008.[13]Richard Nock, Frank Nielsen,Semi supervisedstatistical region refinement for color image segmentation,2005S NoMethodsTime in sec.1 K Means 6.32142 EM 40.22405 3Proposed13.0559856 A New Enhanced semi supervised image segmentation using Marker as Prior information [14]` Jan Kohout,Czech Technical University in PragueFaculty of Electrical Engineering,Supervisor: Ing .Jan Urban,Semi supervised image segmentation ofbiological samples-PPT,July 29, 2010[15] Ant´onio R. C. Paiva1 and Tolga Tasdizen,Fast SemiSupervised image segmentation by novelty selection,2009[16] Kwangcheol Shin and Ajith Abraham,Two Phase Semi-supervised ClusteringUsing Background Knowledge,2006.[17] M´ario A. T. Figueiredo, Dong Seon Cheng, Vittorio Murino,Clustering Under Prior Knowledge with Application toImage Segmentation,2005.Biography:Mrs L.Sankari is currently workingas an Assistant Professor in theDepartment of Computer Science, SriRamakrishna College of Arts andScience for women, Coimbatore- 641044, Tamilnadu, India. She is about16 years of teaching experience. She has published fournational and four international research papers. Her researchinterest area includes image processing, Data mining , Patternclassification and optimization techniques.Dr. C. Chandrasekar received hisPh.D. degree from Periyar University,Salem. He has been working asAssociate Professor at Dept. ofComputer Science, Periyar University,Salem – 636 011, TamilNadu, India.His research interest includes Wirelessnetworking, Mobile computing, Computer Communication andNetworks. He was a Research guide at various University inIndia. He has been published more than 50 research papers atvarious National/ International Journals.。
semi-supervised deep continuous learning 笔记
semi-supervised deep continuous
learning 笔记
半监督学习(semi-supervised learning)研究的是如何充分利用已标记数据和大量未标记数据来提升学习性能的方法。
假设数据取样自样本空间$X$和标记空间$Y$上的未知分布$D$。
在半监督学习中,给定标记数据集$L$和未标记数据集$U$,其中$xi∈X,yi∈Y$,且通常有$l<<m$,目标为学习获得$H:X→V$。
简单起见,考虑二分类任务,即$Y=\{-1,+1\}$。
未标记数据可以帮助揭示潜在数据分布的一些信息,进而帮助构建具有更强泛化能力的模型。
半监督学习方法的基础就在于如何构建假设来刻画未标记数据所揭示的数据分布和类别标记信息之间的联系。
两类基本假设包括聚类假设(cluster assumption)和流形假设(manifold assumption)。
聚类假设假设相似的输入具有相似的类别标记;流形假设假设所有的数据都在一个低维流形中,未标记数据有助于构造此流形。
聚类假设关注分类任务,流形假设可被应用到分类以外的其他学习任务上。
另外,直推学习和半监督学习紧密相关,两者的区别在于对测试数据的假设。
直推学习采用封闭世界假设(closed-world assumption),即未标记数据就是测试数据,事先已知;半监督学习是基于开放世界假设(open-world assumption),即测试数据未知,未标记数据未必是测试数据。
因此,直推学习可以看成一种特殊的半监督学习。
总的来说,半监督学习是一个不断发展和演进的领域,它在机器学习和数据挖掘领域中具有重要的研究价值和应用前景。
dbscan算法的基本概念
dbscan算法的基本概念DBSCAN(Density-Based Spatial Clustering of Applications with Noise)是一种常用的密度聚类算法,用于对数据集进行聚类分析。
它将数据根据样本之间的密度关系划分为不同的簇,并且可以有效地识别出噪声点。
本文将逐步回答DBSCAN算法的基本概念,以便更好地理解其原理和应用。
1. 什么是密度聚类?密度聚类是一种基于样本之间密度关系的聚类方法,与传统的基于距离的聚类方法不同。
它认为,在聚类中,样本点在高密度区域周围相对较近,而在低密度区域之间相对较远。
密度聚类算法通过评估样本点周围的邻域密度来确定样本点的聚类情况。
2. DBSCAN算法的基本原理是什么?DBSCAN算法基于以下两个基本概念:核心点(Core Point)和密度直达(Density-Reachability)。
核心点是指在给定的半径ε内至少包含最小样本数MinPts的样本点。
密度直达是指如果点A在核心点B的ε邻域内,并且B是核心点,则A可以从B密度直达,即可以通过一系列核心点的连接到达该点。
DBSCAN算法的基本思想是从任意一个未被访问的点开始,找出它的密度可达点,并将这些点加入同一个簇中。
然后,对这些点进行进一步的拓展,找出其密度可达点,并继续加入簇中,直到没有更多的点可以加入。
这样,一个簇就被发现了。
然后,寻找下一个未被访问的点,并重复以上过程,直到所有点都被访问。
3. DBSCAN算法的基本步骤是什么?DBSCAN算法的基本步骤包括以下几个步骤:步骤1:选择一个未被访问的点P。
步骤2:计算点P的ε邻域内的所有点个数。
步骤3:如果点P的ε邻域内的点个数大于等于MinPts,则将点P标记为核心点,并为其创建一个新的簇。
步骤4:找出点P的ε邻域内的所有点,并判断是否为核心点,如果是,则将其加入到簇中。
步骤5:重复步骤4,直到无法再加入更多的点。
步骤6:重复步骤1-5,直到所有点都被访问。
深度强化学习算法及其在无监督去噪中的应用
深度强化学习算法及其在无监督去噪中的应用汇报人:2023-11-20•深度强化学习算法介绍•深度强化学习算法应用•无监督去噪算法介绍目•深度强化学习算法在无监督去噪中的应用•未来展望与挑战录深度强化学习算法介绍强化学习定义01强化学习问题定义02强化学习问题分类03深度神经网络卷积神经网络DQN是一种基于Q-learning的深度强化学习算法,它可以解决具有离散动作空间的强化学习问题。
DQN通过使用神经网络来估计Q值,从而实现对复杂环境的处理和决策。
Proximal Policy Optimization (PPO)PPO是一种基于策略的深度强化学习算法,它可以解决具有连续动作空间的强化学习问题。
PPO通过使用神经网络来估计策略,并使用Actor-Critic结构来更新策略,从而实现对复杂环境的处理和决策。
Double Deep Q-Network (DDQN)DDQN是一种改进的DQN算法,它通过使用两个神经网络来估计Q值,从而解决DQN中存在的稳定性问题。
Asynchronous Advantage Actor-Critic (A3C)A3C是一种基于策略的深度强化学习算法,它可以解决多智能体任务的问题。
A3C通过使用多个并行智能体来收集数据并更新策略,从而实现对复杂环境的处理和决策。
深度强化学习算法应用传感器数据融合机器人通过多个传感器获取环境信息,深度强化学习算法可以帮助融合这些数据,提高机器人的感知能力,从而更好地适应环境变化。
机器人运动控制深度强化学习算法可以用于控制机器人的运动,使其能够自主地执行一系列复杂的动作,实现精准的目标追踪、抓取和放置等任务。
实时决策与规划深度强化学习算法还可以用于实时决策和路径规划,使机器人在动态环境中能够快速做出最优决策,适应各种复杂场景。
在机器人控制中的应用策略学习游戏中的角色需要执行各种复杂的动作,深度强化学习算法可以用于生成这些动作,提高游戏的真实感和流畅度。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Proceedings of the Ninth AAAI/SIGART Doctoral Consortium,pp. 979-980, San Jose, CA, July 2004Semi-supervised Clustering with Limited Background KnowledgeSugato BasuEmail:sugato@Address:Department of Computer Sciences,University of Texas at Austin,Austin,TX-78712,USAThesis GoalIn many machine learning domains,there is a large supply of unlabeled data but limited labeled data,which can be expen-sive to generate.Consequently,semi-supervised learning, learning from a combination of both labeled and unlabeled data,has become a topic of significant recent interest.Our research focus is on semi-supervised clustering,which uses a small amount of supervised data in the form of class labels or pairwise constraints on some examples to aid unsuper-vised clustering.Semi-supervised clustering can be either constraint-based,i.e.,changes are made to the clustering ob-jective to satisfy user-specified labels/constraints,or metric-based,i.e.,the clustering distortion measure is trained to sat-isfy the given labels/constraints.Our main goal in this thesis is to study constraint-based semi-supervised clustering algo-rithms,integrate them with metric-based approaches,char-acterize some of their properties and empirically validate our algorithms on different domains,e.g.,text processing and bioinformatics.BackgroundExisting methods for semi-supervised clustering fall into two general approaches that we call constraint-based and metric-based methods.In constraint-based approaches,the clustering algorithm itself is modified so that user-provided labels or constraints are used to get a more appropriate clustering.Previous work in this area includes modifying the clustering objective func-tion so that it includes a term for satisfying specified con-straints(Demiriz,Bennett,&Embrechts1999),and enforc-ing constraints to be satisfied during the cluster assignment in the clustering process(Wagstaff et al.2001).In metric-based approaches,an existing clustering al-gorithm that uses a particular distortion measure is em-ployed;however,the measure isfirst trained to satisfy the labels or constraints in the supervised data.Several dis-tortion measures have been used for metric-based semi-supervised clustering,including Jensen-Shannon divergence trained using gradient descent(Cohn,Caruana,&McCal-lum2003),Euclidean distance modified by a shortest-path algorithm(Klein,Kamvar,&Manning2002),or Maha-Copyright c 2004,American Association for Artificial Intelli-gence().All rights nobis distances trained using convex optimization(Bar-Hillel et al.2003;Xing et al.2003).However,metric-based and constraint-based approaches to semi-supervised clustering have not been adequately compared in previous work,and so their relative strengths and weaknesses are largely unknown.An important domain that motivates the semi-supervised clustering problem is the clustering of genes for functional prediction.For most organisms,only a limited number of genes are annotated with their functional pathways,with the majority of the genes still having unknown functions. Categorization of these genes into functional groups using gene microarray data,phylogenetic profiles,etc.is a nat-ural semi-supervised clustering problem.Clustering(with model selection to choose the right number of clusters)is more well suited to this domain than classification,since the number of functional classes is not known a priori.More-over background knowledge,available in the form of func-tional pathway labels(KEGG,GO)or constraints over some of the genes(DIP),could easily be incorporated as supervi-sion to improve the clustering accuracy.ProgressIn ourfirst work,we showed how supervision in the form of labeled data can be incorporated into clustering(Basu, Banerjee,&Mooney2002).The labeled data were used to generate seed clusters for initializing model-based clustering algorithms,and constraints generated from the labeled data were used to guide the clustering process towards a parti-tioning similar to the user-specified labels.We showed that the K-Means algorithm is equivalent to an EM algorithm on a mixture of K Gaussians under assumptions of iden-tity covariance of the Gaussians,uniform mixture compo-nent priors and expectation under a particular type of condi-tional distribution.This underlying model helps us to prove convergence guarantees for the proposed label-based semi-supervised clustering algorithms.Next,we showed that semi-supervised clustering with pairwise must-link and cannot-link constraints has an under-lying probabilistic model–a Hidden Markov Random Field (HMRF)(Basu,Banerjee,&Mooney2004).In this work, we also outlined a method for selecting maximally infor-mative constraints in a query-driven framework for pairwise constrained clustering.In order to maximize the utility ofthe limited supervised data available in a semi-supervised setting,supervised training examples should be,if possible, actively selected as maximally informative ones rather than chosen at random.This would imply that fewer constraints will be required to significantly improve the clustering accu-racy.To this end,a new algorithm was developed to actively select good pairwise constraints for semi-supervised cluster-ing,using an active learning strategy based on farthest-first traversal.The proposed scheme has two phases:(a)explore the given data to get pairwise disjoint non-null neighbor-hoods,each belonging to a different cluster in the underly-ing true categorization of the data,within a small number of queries,and(b)consolidate this cluster structure using the remaining queries,to get better centroid estimates.In recent work,we have shown that the HMRF clus-tering model is able to incorporate any Bregman diver-gence(Banerjee et al.2004)as the clustering distortion measure,which allows using the framework with such com-mon distortion measures as KL-divergence,I-divergence, and parameterized squared Mahalanobis distance.Addition-ally,cosine similarity can also be used as the clustering dis-tortion measure in the framework,which makes it useful for directional datasets(Basu,Bilenko,&Mooney2004). For all such measures,minimizing the semi-supervised clus-tering objective function becomes equivalent tofinding the maximum a posteriori probability(MAP)configuration of the underlying HMRF.We have also developed a new semi-supervised cluster-ing approach that unifies constraint-based and metric-based techniques in an integrated framework(Bilenko,Basu,& Mooney2004).This algorithm trains the distortion measure with each clustering iteration,utilizing both unlabeled data and pairwise constraints.The formulation is able to learn individual metrics for each cluster,which permits clusters of different shapes.This work also explores metric learning for feature generation(in contrast to simple feature weight-ing),which we empirically demonstrate to outperform cur-rent state-of-the-art metric learning algorithms,under cer-tain conditions.In all these projects,experiments have been performed on both low dimensional UCI datasets and high dimensional text data sets,using KMeans and EM as the baseline cluster-ing algorithms.Proposed ResearchIn future,we want to study the following aspects of semi-supervised clustering,in decreasing order of priority: (1)Semi-supervised approaches forfinding overlapping clusters in the data.This is especially relevant for gene clus-tering in the bioinformatics domain,since genes often be-long to multiple functional pathways.(2)The feasibility of semi-supervising other clustering al-gorithms,e.g.,spectral clustering,agglomerative clustering, etc.We are currently exploring semi-supervised versions of kernel-based clustering,which would be useful for datasets that are not linearly separable.(3)Application of the semi-supervised clustering model to other domains apart from UCI datasets and text.We are currently focusing on two domains:(a)search result cluster-ing of web search engines,e.g.,Google,and(b)clustering of gene microarray data in bioinformatics.(4)Effect of noisy or probabilistic supervision in pairwise constrained clustering.This study will be especially impor-tant for deploying our proposed semi-supervised clustering algorithms to practical settings,where background knowl-edge would be in general noisy.(5)Model selection using both unsupervised data and the limited supervised data,for automatic selection of number of clusters in semi-supervised clustering.Most model se-lection criteria in clustering are only based on unsupervised data–we want to explore whether supervised data available in the form of labels or constraints can be used to select the number of clusters more effectively.(6)Theoretical study of the relative benefits of supervised and unsupervised data in semi-supervised clustering,similar to the analysis of(Ratsaby&Venkatesh1995).ReferencesBanerjee,A.;Merugu,S.;Dhillon,I.S.;and Ghosh,J.2004. Clustering with Bregman divergences.In Proc.of the2004SIAM Intl.Conf.on Data Mining(SDM-04).Bar-Hillel,A.;Hertz,T.;Shental,N.;and Weinshall,D.2003. Learning distance functions using equivalence relations.In Proc. of20th Intl.Conf.on Machine Learning(ICML-2003),11–18. Basu,S.;Banerjee,A.;and Mooney,R.J.2002.Semi-supervised clustering by seeding.In Proc.of19th Intl.Conf.on Machine Learning(ICML-2002),19–26.Basu,S.;Banerjee,A.;and Mooney,R.J.2004.Active semi-supervision for pairwise constrained clustering.In Proc.of the 2004SIAM Intl.Conf.on Data Mining(SDM-04).Basu,S.;Bilenko,M.;and Mooney,R.J. 2004.A probabilistic framework for semi-supervised clustering.In submission,available at /˜ml/publication. Bilenko,M.;Basu,S.;and Mooney,R.J.2004.Integrating constraints and metric learning in semi-supervised clustering.In Proc.of21st Intl.Conf.on Machine Learning(ICML-2004). Cohn,D.;Caruana,R.;and McCallum,A.2003.Semi-supervised clustering with user feedback.Technical Report TR2003-1892, Cornell University.Demiriz,A.;Bennett,K.P.;and Embrechts,M.J.1999.Semi-supervised clustering using genetic algorithms.In Artificial Neu-ral Networks in Engineering(ANNIE-99),809–814.Klein,D.;Kamvar,S.D.;and Manning,C.2002.From instance-level constraints to space-level constraints:Making the most of prior knowledge in data clustering.In Proc.of19th Intl.Conf.on Machine Learning(ICML-2002),307–314.Ratsaby,J.,and Venkatesh,S.S.1995.Learning from a mixture of labeled and unlabeled examples with parametric side informa-tion.In Proc.of the8th Annual Conf.on Computational Learning Theory,412–417.Wagstaff,K.;Cardie,C.;Rogers,S.;and Schroedl,S.2001. Constrained K-Means clustering with background knowledge.In Proc.of18th Intl.Conf.on Machine Learning(ICML-2001),577–584.Xing,E.P.;Ng,A.Y.;Jordan,M.I.;and Russell,S.2003.Dis-tance metric learning,with application to clustering with side-information.In Advances in Neural Information Processing Sys-tems15,505–512.Cambridge,MA:MIT Press.。