Localized Support Vector Machine and Its Efficient Algorithm
Hessian Regularized Support Vector Machines for Mobile Image Annotation on the Cloud
D. Tao and L. Jin are with the School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, P. R. China. W. Liu is with Control Engineering, China University of Petroleum, P. R. China. X. Li is with the Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, P. R. China.
Support Vector Machines and Kernel Methods
Slack variables
If the data are not linearly separable, add slack variables s_i ≥ 0 with y_i (x_i · w + c) + s_i ≥ 1. Then Σ_i s_i is the total amount by which the constraints are violated, so we try to make Σ_i s_i as small as possible.
Perceptron as convex program
The final convex program for the perceptron is: minimize Σ_i s_i subject to (y_i x_i) · w + y_i c + s_i ≥ 1 and s_i ≥ 0. We will try to understand this program using convex duality.
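This convex program is a linear program, so it can be handed to an off-the-shelf LP solver. Below is a minimal sketch (not part of the original slides) that solves it with SciPy's linprog on a tiny made-up data set; the variable layout [w, c, s] and the data are my own choices for illustration.

```python
# Solve  min sum_i s_i  s.t.  y_i (x_i . w + c) + s_i >= 1,  s_i >= 0
import numpy as np
from scipy.optimize import linprog

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
n, d = X.shape

# Decision variables: [w (d entries), c, s (n entries)]
c_obj = np.concatenate([np.zeros(d + 1), np.ones(n)])           # minimize sum_i s_i
A_ub = np.hstack([-(y[:, None] * X), -y[:, None], -np.eye(n)])  # -(y_i x_i).w - y_i c - s_i <= -1
b_ub = -np.ones(n)
bounds = [(None, None)] * (d + 1) + [(0, None)] * n             # w, c free; s_i >= 0

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w, c, s = res.x[:d], res.x[d], res.x[d + 1:]
print("w =", w, "c =", c, "total violation =", s.sum())
```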
Classification problem
[Figure: scatter plot of % Middle & Upper Class against X]
An Introduction to Support Vector Machines
Input space → Feature space
Example Transformation
Consider the following transformation to a higher-dimensional feature space, and define the kernel function K(x, y) as the inner product of the transformed points, computed directly from x and y.
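The exact transformation used on the original slide is not recoverable here, so the following is a hedged stand-in: the standard degree-2 polynomial kernel K(x, y) = (1 + x · y)^2, which equals the inner product of an explicit six-dimensional feature map φ.

```python
# Verify that the kernel value equals the inner product of the mapped vectors.
import numpy as np

def phi(v):
    x1, x2 = v
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def K(x, y):
    return (1.0 + x @ y) ** 2

x, y = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(K(x, y), phi(x) @ phi(y))  # the two numbers agree
```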
The Optimization Problem: transform the problem to its dual.
This is a quadratic programming (QP) problem
Kernel methods, large margin classifiers, reproducing kernel Hilbert space, Gaussian process
Two-Class Problem: Linearly Separable Case
A global maximum of the α_i can always be found; w can be recovered as w = Σ_i α_i y_i x_i.
Characteristics of the Solution
Many of the α_i are zero: w is a linear combination of only the points with nonzero α_i (the support vectors).
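A minimal sketch (mine, not from the slides) that illustrates both points with scikit-learn: after fitting a linear SVM, only a few training points carry nonzero dual coefficients, and w can be recovered as Σ_i α_i y_i x_i over those support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=10.0).fit(X, y)
print("support vectors:", len(clf.support_), "out of", len(X))

# w = sum_i alpha_i y_i x_i; dual_coef_ already stores alpha_i * y_i
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))  # matches the fitted weight vector
```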
Many decision boundaries can separate these two classes. Which one should we choose?
Fuzzy support vector machines
Fuzzy Support Vector Machines
Chun-Fu Lin and Sheng-De Wang

Abstract—A support vector machine (SVM) learns the decision surface from two distinct classes of input points. In many applications, each input point may not be fully assigned to one of these two classes. In this paper, we apply a fuzzy membership to each input point and reformulate the SVM so that different input points can make different contributions to the learning of the decision surface. We call the proposed method fuzzy SVMs (FSVMs).

Index Terms—Classification, fuzzy membership, quadratic programming, support vector machines (SVMs).

I. INTRODUCTION
The theory of support vector machines (SVMs) is a new classification technique that has drawn much attention in recent years [1]–[5]. It is based on the idea of structural risk minimization (SRM) [3]. In many applications, SVMs have been shown to provide higher performance than traditional learning machines [1] and have been introduced as powerful tools for solving classification problems. An SVM first maps the input points into a high-dimensional feature space and finds a separating hyperplane that maximizes the margin between the two classes in this space. Maximizing the margin is a quadratic programming (QP) problem and can be solved from its dual by introducing Lagrange multipliers. Without any explicit knowledge of the mapping, the SVM finds the optimal hyperplane by using dot-product functions in feature space, called kernels. The solution of the optimal hyperplane can be written as a combination of a few input points, called support vectors.

More and more applications use SVM techniques. In many of them, however, some input points may not be exactly assigned to one of the two classes: some are more important and should be fully assigned to one class so that the SVM separates them correctly, while data points corrupted by noise are less meaningful and the machine would do better to discard them. The standard SVM lacks this ability. In this paper, we apply a fuzzy membership to each input point of the SVM and reformulate the SVM into a fuzzy SVM (FSVM) such that different input points can make different contributions to the learning of the decision surface. The proposed method enhances the SVM by reducing the effect of outliers and noise in the data; the FSVM is suitable for applications in which the data points have unmodeled characteristics.

The rest of this paper is organized as follows. A brief review of SVM theory is given in Section II. The FSVM is derived in Section III. Three experiments are presented in Section IV. Concluding remarks are given in Section V.

II. SVMs
In this section we briefly review the basis of SVM theory in classification problems [2]–[4]. Suppose we are given a set of labeled training points (y_1, x_1), …, (y_l, x_l) with y_i ∈ {−1, 1}, and let z_i = φ(x_i) denote the feature-space image of x_i. The optimal hyperplane problem is regarded as the solution to problem (6), min (1/2) w·w + C Σ_i ξ_i subject to y_i (w·z_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0, where C is a regularization parameter. C is the only free parameter in the SVM formulation; tuning it balances margin maximization against classification violation. Detailed discussions can be found in [4], [6]. Searching for the optimal hyperplane in (6) is a QP problem, which can be solved by constructing a Lagrangian and transforming it into the dual (7), max W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j) subject to Σ_i y_i α_i = 0 and 0 ≤ α_i ≤ C, where α is the vector of nonnegative Lagrange multipliers associated with the constraints (5).

The Kuhn–Tucker theorem plays an important role in SVM theory. According to this theorem, the nonzero α_i in the solution (8) are those for which the constraints (5) are satisfied with the equality sign; the corresponding points, called support vectors, determine the hyperplane, while every other point is classified correctly and lies clearly away from the decision margin. Since the mapping into feature space may be unknown or intractable, direct computation of problems (7) and (11) would be impossible; a useful property of the SVM is that the mapping need not be known explicitly. A kernel that computes the dot product of the data points in feature space, K(x, x') = φ(x)·φ(x'), is sufficient (14), and the decision function is f(x) = sign(Σ_i α_i y_i K(x, x_i) + b) (15).

III. FSVMs
In this section we describe in detail the idea and formulation of FSVMs.

A. Fuzzy Property of the Input
The SVM is a powerful tool for solving classification problems [1], but the theory still has limitations. From the training set (1) and the formulations above, each training point belongs to either one class or the other, and all training points of a class are treated uniformly. In many real-world applications the effects of the training points differ: some are more important than others, and we would require that the meaningful training points be classified correctly while not caring whether noise-like points are misclassified. That is, a training point no longer belongs exactly to one of the two classes: it may belong 90% to one class and be 10% meaningless, or belong 20% to one class and be 80% meaningless. In other words, there is a fuzzy membership s_i that can be regarded as the degree to which the point is meaningful, with 1 − s_i the degree to which it is meaningless. We extend the concept of SVM with fuzzy membership and obtain the FSVM.

B. Reformulating the SVM
Suppose we are given a set of labeled training points with associated fuzzy memberships, (y_1, x_1, s_1), …, (y_l, x_l, s_l), where σ ≤ s_i ≤ 1 for some sufficiently small σ > 0, and let z_i = φ(x_i) denote the corresponding feature-space vector. Since the fuzzy membership s_i is the attitude of the corresponding point toward one class and the slack ξ_i measures the error, the term s_i ξ_i is an error measure with different weighting. The optimal hyperplane problem becomes (17): min (1/2) w·w + C Σ_i s_i ξ_i subject to y_i (w·z_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0. Constructing the Lagrangian and transforming to the dual gives (19)–(22), which differ from the standard SVM dual only in the box constraint 0 ≤ α_i ≤ s_i C, and the Kuhn–Tucker conditions are defined as in (23); a point with α_i = s_i C is misclassified. An important difference between SVM and FSVM is that points with the same value of α_i may represent different importance. In the SVM, C controls the tradeoff between maximization of the margin and the amount of misclassification: a larger C tolerates fewer misclassified points and yields a narrower margin, while a smaller C makes the SVM ignore more training points and obtain a wider margin. In the FSVM, C can be set to a sufficiently large value and the memberships s_i then control the tradeoff for each individual point (25).

One experiment makes the fuzzy membership a function of time, so that points arriving later are more important; if the membership is chosen as a linear function of time (26), the membership values follow by applying the boundary conditions (27)–(31). Fig. 1 shows the result of SVM learning for data with this time property, and Fig. 2 the result of FSVM learning for the same data.

Another experiment sets the membership as a function of the respective class, so that one class is classified with higher accuracy than the other. Fig. 3 shows the result of the SVM and Fig. 4 the result of the FSVM obtained by assigning different memberships to the two classes, one class indicated by crosses and the other by squares. In Fig. 3 the SVM finds the optimal hyperplane with errors appearing in each class; in Fig. 4 the FSVM finds the optimal hyperplane with errors appearing only in one class. The FSVM classifies the class of crosses with high accuracy and the class of squares with low accuracy, while the SVM does not.

C. Using the Class Center to Reduce the Effects of Outliers
Many research results have shown that the SVM is very sensitive to noise and outliers [8], [9]. The FSVM can also be applied to reduce the effects of outliers. We propose a model that sets the fuzzy membership as a function of the distance between a point and its class center. This setting of the membership need not be the best way to solve the outlier problem; it is simply one way, and a different membership model may be better for a different training set. Given a sequence of training points, the mean and radius of each class are computed (37)–(38), and the membership s_i is defined as a decreasing function of the distance from x_i to its class mean (39). Fig. 5 shows the result of SVM learning for data sets with outliers, where the hyperplane is affected by the outliers (for example, the outlying square point); Fig. 6 shows the FSVM result. The distance of the two outliers to their corresponding class mean equals the radius, so, since the membership is a function of the mean and radius of each class, these two points are regarded as less important in FSVM training, and there is a large difference between the hyperplanes found by SVM and FSVM.

V. CONCLUSION
In this paper, we proposed the FSVM, which imposes a fuzzy membership on each input point so that different input points can make different contributions to the learning of the decision surface. By setting different types of fuzzy membership, the FSVM can easily be applied to different kinds of problems, which extends the application horizon of the SVM. Some future work remains. One issue is how to select a proper fuzzy membership function for a given problem; the goal is to automatically or adaptively determine a suitable membership model that reduces the effect of noise and outliers for a class of problems.

REFERENCES
[1] C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, 1998.
[2] C. Cortes and V. N. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273–297, 1995.
[3] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[4] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[5] B. Schölkopf, C. Burges, and A. Smola, Advances in Kernel Methods: Support Vector Learning. Cambridge, MA: MIT Press, 1999.
[6] M. Pontil and A. Verri, "Properties of support vector machines," Massachusetts Inst. Technol., AI Memo no. 1612, 1997.
[7] N. de Freitas, M. Milo, P. Clarkson, M. Niranjan, and A. Gee, "Sequential support vector machines," in Proc. IEEE NNSP'99, 1999, pp. 31–40.
[8] I. Guyon, N. Matić, and V. N. Vapnik, Discovering Information Patterns and Data Cleaning. Cambridge, MA: MIT Press, 1996, pp. 181–203.
[9] X. Zhang, "Using class-center vectors to build support vector machines," in Proc. IEEE NNSP'99, 1999, pp. 3–11.
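The FSVM objective weights each slack term by the membership s_i, i.e., C Σ_i s_i ξ_i. A hedged way to experiment with this idea (not the authors' code) is scikit-learn's per-sample weights, which scale the error cost of each training point in the same spirit; the membership model below, based on distance to the class center, mirrors Section III-C of the paper, but the data and constants are made up.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 1, (60, 2)), rng.normal(1.5, 1, (60, 2))])
y = np.array([-1] * 60 + [1] * 60)

# Membership as a decreasing function of distance to the class center.
s = np.empty(len(X))
for label in (-1, 1):
    idx = np.where(y == label)[0]
    center = X[idx].mean(axis=0)
    d = np.linalg.norm(X[idx] - center, axis=1)
    radius = d.max()
    s[idx] = 1.0 - d / (radius + 1e-6)   # points far from the center matter less

clf = SVC(kernel="linear", C=10.0)
clf.fit(X, y, sample_weight=s)           # per-point cost plays the role of C * s_i
```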
SVM Lecture Slides
The VC dimension can be loosely understood as the complexity of a problem: the higher the VC dimension, the more complex the problem. Precisely because SVM focuses on the VC dimension, we will see later that the way SVM solves a problem does not depend on the dimensionality of the samples (the samples can even have tens of thousands of dimensions), which makes SVM very suitable for problems like text classification (this capability, of course, also relies on the introduction of kernel functions).
Introduction to SVM

Confidence risk: it depends on two quantities. One is the number of training samples; clearly, the larger the number of samples, the more likely our learning result is to be correct and the smaller the confidence risk. The other is the VC dimension of the classification function; clearly, the larger the VC dimension, the worse the generalization ability and the larger the confidence risk.
Introduction to SVM

The generalization error bound takes the form R(w) ≤ Remp(w) + Φ(n/h), where R(w) is the true risk, Remp(w) the empirical risk, and Φ(n/h) the confidence risk. The objective therefore changes from minimizing the empirical risk alone to minimizing the sum of the empirical risk and the confidence risk, i.e., structural risk minimization.
Introduction to SVM

The support vector machine method is built on the VC-dimension theory of statistical learning theory and the principle of structural risk minimization. Based on limited sample information, it seeks the best tradeoff between model complexity (i.e., the learning accuracy on the given training samples) and learning capacity (i.e., the ability to classify arbitrary samples without error), in order to obtain the best generalization ability.
Introduction to SVM

Generalization error bound: to address the problem above, statistical learning theory introduces the concept of a generalization error bound. The true risk is described by two parts: the empirical risk, which represents the classifier's error on the given samples, and the confidence risk, which represents the extent to which we can trust the classifier's predictions on unseen samples. The second part obviously cannot be computed exactly; only an estimated interval can be given, so the total error can only be bounded from above rather than computed precisely (hence the name generalization error bound rather than generalization error).
SUPPORT VECTOR MACHINE
4 Classification Example: IRIS data
4.1 Applications
5 Support Vector Regression
5.1 Linear Regression
5.1.1 ε-Insensitive Loss Function
7 Conclusions
A Implementation Issues
A.1 Support Vector Classification
A.2 Support Vector Regression
Classification and Recognition Algorithm for Long-Wave Infrared Targets Based on Support Vector Machine
WANG Zhouchun, CUI Wennan, ZHANG Tao
(1. Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Shanghai Tech University, Shanghai 201210, China; 4. Key Laboratory of Intelligent Infrared Perception, Chinese Academy of Sciences, Shanghai 200083, China)

Abstract: Infrared images have low resolution and a single color channel, but because infrared devices can operate under all weather conditions they play an important role in some scenes. This study adopts a support vector machine (SVM) based algorithm for the classification and recognition of long-wave infrared target images: the edge features and texture features extracted from an image are used as the recognition features of the target, fed to a support vector machine, and the target category is output. In the experiments, a combined model of histogram of oriented gradients + gray-level co-occurrence matrix + support vector machine is designed, and images of eight types of person-target scenes are collected for training and testing. The results show that the model classifies the same or different target persons wearing different clothing with high accuracy. Under certain scene conditions, this combined model can therefore meet the needs of application areas such as security surveillance, industrial inspection, and military target recognition, and it has certain advantages in the field of infrared target recognition.

Key words: long-wave infrared target; support vector machine; recognition feature; target recognition

0 Introduction
Infrared radiation is electromagnetic radiation with wavelengths from 760 nm to 1 mm [1]. Infrared images have low resolution and a single color channel, but because infrared devices can work around the clock they have significant application value in certain scenarios, for example in the military, transportation, and security fields.
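A rough sketch of the combined HOG + GLCM + SVM model described in the abstract, under the assumption that scikit-image's hog, graycomatrix, and graycoprops (spelled grey* in older scikit-image releases) and scikit-learn's SVC are used; the images, labels, and parameter values are stand-ins, not the authors' settings.

```python
import numpy as np
from skimage.feature import hog, graycomatrix, graycoprops
from sklearn.svm import SVC

def extract_features(img):
    """img: 2-D uint8 grayscale (infrared) image."""
    edge_feat = hog(img, orientations=9, pixels_per_cell=(16, 16),
                    cells_per_block=(2, 2))
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    tex_feat = np.hstack([graycoprops(glcm, p).ravel()
                          for p in ("contrast", "homogeneity", "energy", "correlation")])
    return np.hstack([edge_feat, tex_feat])

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(20, 64, 64), dtype=np.uint8)  # stand-in images
labels = rng.integers(0, 8, size=20)                              # 8 target classes

X = np.array([extract_features(im) for im in images])
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, labels)
```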
Non-negative Local Sparse Coding Based on Elastic Net and Histogram Intersection
DOI: 10.11772/j.issn.1001-9081.2018071483
WAN Yuan, ZHANG Jinghui, CHEN Zhiping, MENG Xiaojing
(School of Science, Wuhan University of Technology, Wuhan 430070, China) (*Corresponding author e-mail: Jingzhang@whut.edu.cn)
Abstract: To address the problems that sparse coding models ignore the group effect when selecting dictionary bases and that the Euclidean distance cannot effectively measure the distance between features and dictionary bases, a non-negative local sparse coding method based on the elastic net and histogram intersection (EH-NLSC) is proposed. First, the elastic net model is introduced into the optimization function to remove the restriction on the number of selected dictionary bases, so that multiple groups of correlated features can be selected while redundant features are excluded, improving the discriminability and effectiveness of the coding. Then, histogram intersection is introduced into the locality constraint to redefine the distance between features and dictionary bases, ensuring that similar features can share their local bases. Finally, a multi-class linear support vector machine is used for classification. Experimental results on four public datasets show that, compared with the locality-constrained linear coding algorithm (LLC) and the sparse coding algorithm based on the non-negative elastic net (NENSC), EH-NLSC improves classification accuracy by an average of 10 and 9 percentage points respectively, which fully demonstrates its effectiveness in image representation and classification.
Key words: sparse coding; elastic net model; locality; histogram intersection; image classification
0 Introduction
Image classification is an important research direction in computer vision, widely applied in biometric recognition, web image retrieval, robot vision, and other fields; its key lies in extracting features that represent images effectively. Sparse coding is an effective method for image feature representation. Considering that the bag-of-words (BoW) model [1] and the spatial pyramid matching (SPM) model [2] easily introduce quantization errors, Yang et al. [3] combined the SPM model and proposed an image classification algorithm using sparse coding on spatial pyramids (ScSPM), performing sparse coding at different scales of the image and achieving good classification results. In sparse coding models, the l1 norm considers only sparsity and ignores the group effect when selecting dictionary bases, so Zou et al. [4] proposed a new regularization method that uses the elastic net as the regularizer and variable-selection method, and Zhang et al. [5] proposed a discriminative elastic-net regularized linear …
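As a hedged illustration of the elastic-net coding step discussed above (not the paper's EH-NLSC implementation, which also adds the histogram-intersection locality term), the sketch below encodes a feature over a dictionary with a non-negative elastic-net penalty using scikit-learn's ElasticNet; the dictionary and feature are random stand-ins.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
D = rng.random((128, 256))          # dictionary: 128-dim features, 256 atoms
x = rng.random(128)                 # feature vector to encode

coder = ElasticNet(alpha=0.05, l1_ratio=0.5, positive=True, max_iter=5000)
coder.fit(D, x)                     # min ||x - D c||^2 + elastic-net penalty, c >= 0
code = coder.coef_
print("nonzero coefficients:", np.count_nonzero(code), "of", code.size)
```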
Worked Examples of Kernel Functions in Artificial Intelligence
In artificial intelligence, kernel functions are a tool used in support vector machines (SVMs) and other machine learning algorithms. A kernel function can map input data into a high-dimensional space, so that data which are not linearly separable in the original low-dimensional space become linearly separable in the high-dimensional space. A linear classifier can then be used to handle such data.

For example, suppose we have a set of two-dimensional data points that are not linearly separable in the plane. If we use a quadratic polynomial kernel, it can map these two-dimensional points into a three-dimensional space in which they become linearly separable; a plane can then separate them, achieving a classification that was impossible in the original two-dimensional space.

Another common kernel is the Gaussian kernel (also called the radial basis function), which maps the data into an infinite-dimensional feature space and can therefore handle data that are not linearly separable.

Besides these examples, kernel functions can also be polynomial kernels, string kernels, and so on. They play an important role in different machine learning problems and help algorithms handle complex datasets better.

In short, kernel functions play a vital role in artificial intelligence: they allow us to handle linearly non-separable data and extend the range of problems that machine learning algorithms can address. By choosing and using kernel functions appropriately, we can better solve a variety of complex practical problems.
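A small illustration of the point above, assuming scikit-learn: on concentric-circle data that are not linearly separable, a linear SVM performs poorly while an RBF-kernel SVM separates the classes almost perfectly.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X_tr, y_tr)
    print(kernel, "test accuracy:", clf.score(X_te, y_te))
```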
Research on i-vector Speaker Recognition Based on Total Variability Subspace Adaptation
Brief Paper, Acta Automatica Sinica, Vol. 40, No. 8, August 2014

Total Variability Subspace Adaptation Based Speaker Recognition
LI Zhi-Yi, ZHANG Wei-Qiang, HE Liang, LIU Jia
(Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084)

Abstract: In speaker recognition research, subspace modeling based on the identity vector (i-vector) has proved to be the most advanced and effective speaker modeling technique, and estimating the total variability subspace matrix T effectively and accurately is a key factor in system performance. This paper studies how the i-vector technique can adaptively estimate T in a new application environment and proposes two effective adaptation algorithms. Experiments on the 2008 core speaker recognition evaluation dataset organized by the American National Institute of Standards and Technology (NIST) and on a self-collected test database show that, whether the test data themselves or development data matched to the test set are used, updating the total variability subspace with the proposed algorithms makes the updated subspace better suited to the low-dimensional description of the new test data and more favorable for speaker classification in the new test environment. The results also show that adaptation based on multi-subspace concatenation clearly outperforms iterative adaptation, that the combination of the two achieves the best recognition performance, and that adaptation with development data then approaches the best performance obtained by adapting with the test data themselves.

Key words: i-vector, total variability subspace, adaptation, speaker recognition

Citation: Li Zhi-Yi, Zhang Wei-Qiang, He Liang, Liu Jia. Total variability subspace adaptation based speaker recognition. Acta Automatica Sinica, 2014, 40(8): 1836−1840. DOI 10.3724/SP.J.1004.2014.01836

Manuscript received November 13, 2013; accepted November 23, 2013. Recommended by Associate Editor WU Xi-Hong. Supported by the National Natural Science Foundation of China (61370034, 61273268, 61005019, 90920302) and the Beijing Natural Science Foundation (KZ201110005005).

Speaker recognition identifies or verifies a speaker's identity using voiceprint features extracted from the speech signal. As an important biometric identification technology, it is widely applicable to national security, forensic identification, voice dialing, telephone banking, and many other areas [1]. In recent years, speaker modeling based on the identity vector (i-vector) has been very successful and has significantly improved speaker recognition performance [2−3]. In the international speaker recognition evaluations organized by NIST, i-vector systems clearly outperform the previously dominant Gaussian mixture model supervector-support vector machine (GSV-SVM) [4] and joint factor analysis (JFA) [5−6] approaches and have become the dominant type of speaker recognition system.

Like GSV-SVM and JFA, i-vector modeling is based on the Gaussian mixture model-universal background model (GMM-UBM) [7]. Its basic assumption is that speaker and channel information both lie in a low-dimensional linear subspace of the high-dimensional GMM mean supervector space, as in (1):

M = m + T w    (1)

where M is the GMM mean supervector, m is a supervector independent of both speaker and channel, and the total variability subspace matrix T maps from the high-dimensional space to the low-dimensional space, so that the dimension-reduced vector w is better suited to further classification and recognition. In this modeling process, the matrix T is first trained by factor analysis; the high-dimensional GMM mean supervector is then projected onto this subspace to obtain the low-dimensional total variability factor vector, also called the identity vector (i-vector); finally, the i-vector is processed with linear discriminant analysis (LDA) and within-class covariance normalization (WCCN). LDA further reduces the i-vector dimension under the discriminative criterion of minimizing within-speaker distances and maximizing between-speaker distances, while WCCN whitens the covariance matrix so that the bases of the transformed subspace are as orthogonal as possible. The transformed i-vector is then fed to a classifier for the final decision. Classical i-vector classifiers include the cosine distance scoring (CDS) classifier and the SVM classifier [8]; this paper uses the same CDS classifier as in [2].

From the basic assumption of i-vector modeling, accurately estimating the total variability matrix T is a fundamental and critical step: an accurate estimate means that the projected low-dimensional i-vectors describe speaker and channel information more discriminatively and are easier to classify. In recent NIST evaluations, i-vector systems consistently outperform GSV-SVM systems, partly because the NIST evaluation data are plentiful, of good quality, and from consistent sources, which benefits the training of T. In practical deployments, however, test conditions are complex, data are scarce and of poorer quality, and the subspace estimate is often not robust [9], so the i-vector system can actually be less robust than GSV-SVM at deployment time. Since the core of i-vector modeling is the estimate of T, this paper studies subspace adaptation for i-vector speaker recognition, proposes two practical adaptation algorithms, and compares them experimentally to give the best adaptation strategy.

The paper is organized as follows. Section 1 describes the estimation of the total variability matrix T; Section 2 describes i-vector model training and testing; Section 3 presents the proposed adaptation algorithms for T and the corresponding system diagrams; Section 4 gives experimental results and analysis; Section 5 concludes.

1 Estimation of the total variability matrix T
1.1 Statistics estimation
Since the GMM mean supervector is obtained from the zeroth-, first-, and second-order statistics of the acoustic features with respect to the UBM mean supervector, these statistics are estimated first. A UBM is trained with the expectation-maximization (EM) algorithm from training data; it provides a common reference coordinate space and, to some extent, alleviates the small-sample problem caused by limited per-speaker data. The GMM for a speaker is obtained by maximum a posteriori (MAP) adaptation from the UBM. For speaker s with acoustic features x_{s,t}, the statistics with respect to the UBM mean supervector m are

N_{c,s} = Σ_t γ_{c,s,t}
F_{c,s} = Σ_t γ_{c,s,t} (x_{s,t} − m_c)
S_{c,s} = diag{ Σ_t γ_{c,s,t} (x_{s,t} − m_c)(x_{s,t} − m_c)^T }    (2)

where m_c is the c-th Gaussian mean component of the UBM supervector m, t indexes time frames, γ_{c,s,t} is the posterior probability of the c-th Gaussian component, and diag{·} takes the diagonal. If each Gaussian has dimension F and there are C components, the concatenated mean supervector has dimension FC.

1.2 Estimating the subspace T
Given these statistics, T is estimated by EM. T is initialized randomly; then, with T fixed, the first- and second-order statistics of the latent variable w are estimated under the maximum-likelihood criterion as in (3), where the FC × 1 supervector F_s is the concatenation of the F_{c,s} and N_s is the FC × FC matrix with the N_{c,s} on the main diagonal:

L_s = I + T^T Σ^{-1} N_s T
E[w_s] = L_s^{-1} T^T Σ^{-1} F_s
E[w_s w_s^T] = E[w_s] E[w_s^T] + L_s^{-1}    (3)

where L_s is a temporary variable and Σ is the UBM covariance matrix. T is then updated by solving (4) (or with the fast algorithm of [10]):

Σ_s N_s T E[w_s w_s^T] = Σ_s F_s E[w_s]    (4)

and the UBM covariance matrix is updated by

Σ = N^{-1} Σ_s S_s − N^{-1} diag{ Σ_s F_s E[w_s^T] T^T }    (5)

where S_s is the FC × FC block-diagonal matrix built from the S_{c,s} and N = Σ_s N_s is the sum of the zeroth-order statistics over all speakers. After 6–8 iterations, T and Σ can be considered converged.

2 i-vector model training and testing
The i-vector model is trained and tested as in [2]: LDA and WCCN are applied to the projected i-vectors for discriminative dimension reduction and whitening, and cosine distance scoring gives the final score and decision.

2.1 Linear discriminant analysis
LDA [11] is a widely used discriminative dimension-reduction technique. Because the factor-analysis step above is not discriminative, LDA is usually applied to the i-vectors first. The LDA matrix is trained by optimizing the objective (6), i.e., minimizing within-speaker distance while maximizing between-speaker distance:

J(w) = (w^T S_B w) / (w^T S_W w)    (6)

with between-class and within-class covariance matrices

S_B = Σ_{s=1}^{S} (w̄_s − w̄)(w̄_s − w̄)^T    (7)
S_W = Σ_{s=1}^{S} (1/n_s) Σ_{i=1}^{n_s} (w_i^s − w̄_s)(w_i^s − w̄_s)^T    (8)

where w̄_s = (1/n_s) Σ_i w_i^s is the mean i-vector of speaker s, S is the number of speakers, and n_s the number of i-vector segments of speaker s. Solving (6) reduces to the generalized eigenvalue problem

S_B w = λ S_W w    (9)

2.2 Within-class covariance normalization
WCCN [12] whitens the speaker factors so that the bases of the transformed speaker subspace are as orthogonal as possible. The WCCN matrix is estimated from

W = (1/S) Σ_{s=1}^{S} (1/n_s) Σ_{i=1}^{n_s} (w_i^s − w̄_s)(w_i^s − w̄_s)^T    (10)

2.3 Cosine distance scoring
Cosine distance scoring [2] is a symmetric kernel classifier: exchanging the speaker-model i-vector and the test-segment i-vector does not change the score. The cosine distance between the speaker vector w_tar and the test vector w_tst is used directly as the decision score and compared with a threshold θ:

score(w_tar, w_tst) = ⟨w_tar, w_tst⟩ / (||w_tar|| · ||w_tst||) ≷ θ    (11)

By normalizing the vector norms, this classifier removes the influence of vector magnitude and uses only the angle between the two vectors; it is simple and fast, and merged computation can speed up subsequent score normalization.

3 Adaptation algorithms for the total variability subspace T
Unlike JFA subspace modeling, the estimation of T in i-vector modeling requires no labels and is unsupervised, which makes it possible to adapt T with large amounts of unlabeled data in practical applications. Starting from different viewpoints, two subspace adaptation algorithms and their combination are proposed to estimate T robustly in new test environments.

3.1 Iterative adaptation
The basic idea resembles GMM-UBM adaptation: a test-condition-independent subspace matrix T_o, called the universal subspace matrix, is first trained offline from the available training data following Section 1.2; adaptation then starts from T_o and migrates the subspace to the new condition (Fig. 1: iterative adaptation of the total variability subspace T). Concretely, T_o is the initialization seed and the EM algorithm is iterated on the new data set, similar to Section 1.

Algorithm 1. Iterative adaptation of T
Step 1. Train the universal subspace matrix T_o and the UBM covariance Σ from existing data as the initialization seed;
Step 2. Using T, compute the temporary variable L and estimate the first- and second-order statistics E[w_s] and E[w_s w_s^T] of the total variability factors;
Step 3. Update T from the statistics of Step 2, as in (4);
Step 4. Update the UBM covariance Σ from the results of Steps 2 and 3, as in (5);
Step 5. If the iteration count has not been reached, return to Step 2; otherwise stop.

3.2 Concatenation adaptation
The basic idea comes from channel-subspace concatenation in joint factor analysis [13−14], where concatenating channel subspaces removes channel factors more effectively. In i-vector modeling, the original training data and the data from the new environment can be used to train two total variability subspaces that reflect different viewpoints; projecting the high-dimensional vector onto the joint subspace formed by the two describes the low-dimensional total variability factors from both viewpoints (Fig. 2: concatenation adaptation of the total variability subspace T).

Algorithm 2. Concatenation adaptation of T
Step 1. Train the subspace matrix T_o from the original data;
Step 2. Train the subspace matrix T_n from the new test-domain data;
Step 3. Concatenate T_o and T_n to obtain the adapted subspace T.

3.3 Combined adaptation
The two algorithms above can be combined to complement each other and further benefit the low-dimensional representation of the total variability factors (Fig. 3: combination of iterative adaptation and concatenation adaptation).

Algorithm 3. Combined adaptation of T
Step 1. Train the subspace matrix T_o from the original data;
Step 2. Apply iterative adaptation (Algorithm 1) to T_o on the new data set to obtain T_n;
Step 3. Concatenate T_o and T_n to obtain the final adapted subspace T.

3.4 Complexity
Compared with no adaptation, each algorithm requires extra adaptation time that grows linearly with the amount of adaptation data. The three algorithms have the same time complexity for adaptation itself, but the concatenation-based algorithm roughly doubles the time and memory of the subsequent low-dimensional projection compared with the iterative algorithm, so in practice the most suitable algorithm should be chosen according to the real-time and latency requirements of the system.

4 Experimental setup and results
Whether new data can be used for adaptation depends on the application. In offline testing or speaker retrieval, all test data are available and can be used directly; in online testing, the full test set is not available, but development data close to the test data can be collected in advance.

4.1 Setup
Two datasets are used. The first is the NIST SRE 2008 core set: the original subspace training data are about 20,000 utterances from Switchboard I and II, also used to train the UBM, the ZT-norm cohort, and the LDA and WCCN matrices; the new data include new speaker-enrollment data, 12,922 test utterances, and about 3,000 development utterances. The second is a self-collected dataset in which the original and new data come from different regions and capture cards: about 20,000 original utterances train the UBM, the ZT-norm cohort, and the LDA and WCCN matrices; the new data include new speaker-enrollment data, 8,000 test utterances, and about 2,000 development utterances.

Mel-frequency cepstral coefficients (MFCCs) are used as acoustic features. Preprocessing uses G.723.1 voice activity detection (VAD), cepstral mean subtraction (CMS) to suppress convolutional channel noise, feature warping with a 3 s window, removal of 25% of low-energy frames, and pre-emphasis with factor 0.95. Thirteen base MFCCs plus first- and second-order differences form the final 39-dimensional features. The UBM has 1024 mixtures with diagonal covariances; the total variability subspace T has 400 columns and is trained with 6 iterations; LDA reduces the dimension to 200.

4.2 Results
Performance is measured by the equal error rate (EER) and the minimum detection cost function (MinDCF). Tables 1 and 2 compare, on the two datasets, the baseline T trained on the original data with iterative adaptation of T using either the new development data or the new test data. Adaptation with either set improves performance. Although the development data are closer to the test data than the original data, the amount of development data is limited, so the improvement from development-set adaptation is also limited; the test data themselves match best and give the best adaptation performance. In practice, the test data should therefore be used for adaptation of T when available; in online applications where they are not, development data can be used instead.

Table 1. Baseline vs. the proposed iterative adaptation of T on the NIST SRE 2008 core set
Algorithm | EER (%) | MinDCF
T trained on original data | 5.41 | 0.029
T adapted on new development data | 4.92 | 0.026
T adapted on new test data | 4.67 | 0.023

Table 2. Baseline vs. the proposed iterative adaptation of T on the self-collected dataset
Algorithm | EER (%) | MinDCF
T trained on original data | 3.00 | 0.014
T adapted on new development data | 2.99 | 0.013
T adapted on new test data | 2.00 | 0.011

Since iterative adaptation consistently improves both datasets, the combined algorithm of Section 3.3 (iterative adaptation plus subspace concatenation) is evaluated next. As shown in Tables 3 and 4, combining with subspace concatenation improves recognition further, and adaptation with development data then approaches the best performance obtained with the test data.

Table 3. Baseline vs. the proposed combination of iterative adaptation and subspace concatenation on the NIST SRE 2008 core set
Algorithm | EER (%) | MinDCF
T trained on original data | 5.41 | 0.029
T adapted on new development data | 4.01 | 0.021
T adapted on new test data | 3.89 | 0.020

Table 4. Baseline vs. the proposed combination of iterative adaptation and subspace concatenation on the self-collected dataset
Algorithm | EER (%) | MinDCF
T trained on original data | 3.00 | 0.014
T adapted on new development data | 1.99 | 0.012
T adapted on new test data | 1.99 | 0.010

Therefore, when the test data are available, adapting with them gives the best result; when they cannot be used to train the subspace, a development set matched to the test set achieves similarly good performance. This provides a practical guideline for improving i-vector speaker recognition systems through adaptation in real deployments.

5 Discussion and conclusion
From the assumptions underlying i-vector speaker modeling, effectively and accurately estimating the total variability matrix T is a fundamental and critical problem that directly affects recognition performance and the robustness of the technique in practical applications. This paper studied how to adapt T to new data in practical applications, proposed several feasible adaptation algorithms, and gave the best adaptation strategy for different test conditions. Experiments on the NIST SRE 2008 core test set and a self-collected test database show that, whether the test data themselves or matched development data are used, the proposed adaptation algorithms make the updated subspace more suitable for the low-dimensional description of new test data and thus more favorable for speaker recognition. The results also show that concatenation-based adaptation clearly outperforms iterative adaptation, that their combination achieves the best performance, and that adaptation with development data then approaches the improvement obtained with the test data, providing an important practical reference.

References
[1] Kinnunen T, Li H Z. An overview of text-independent speaker recognition: from features to supervectors. Speech Communication, 2010, 52(1): 12−40.
[2] Dehak N, Kenny P, Ouellet P, Dumouchel P. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 2011, 19(4): 788−798.
[3] Li Zhi-Yi, He Liang, Zhang Wei-Qiang, Liu Jia. Speaker recognition based on discriminant i-vector local distance preserving projection. Journal of Tsinghua University (Science and Technology), 2012, 52(5): 598−601 (in Chinese).
[4] Campbell W M, Campbell J P, Reynolds D A, Singer E, Torres-Carrasquillo P A. Support vector machines for speaker and language recognition. Computer Speech and Language, 2006, 20(2−3): 210−229.
[5] Kenny P, Boulianne G, Ouellet P, Dumouchel P. Speaker and session variability in GMM-based speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(4): 1448−1460.
[6] Kenny P, Boulianne G, Ouellet P, Dumouchel P. Joint factor analysis versus eigenchannels in speaker recognition. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(4): 1435−1447.
[7] Reynolds D A, Quatieri T F, Dunn R B. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 2000, 10(1−3): 19−41.
[8] Cortes C, Vapnik V. Support vector networks. Machine Learning, 1995, 20(3): 273−297.
[9] Zhang Wen-Lin, Zhang Wei-Qiang, Liu Jia, Li Bi-Cheng, Qu Dan. A new subspace-based speaker adaptation method. Acta Automatica Sinica, 2011, 37(12): 1495−1502 (in Chinese).
[10] Kenny P, Boulianne G, Dumouchel P. Eigenvoice modeling with sparse training data. IEEE Transactions on Audio, Speech, and Language Processing, 2005, 13(3): 345−354.
[11] Bishop C M. Pattern Recognition and Machine Learning. Berlin: Springer, 2008.
[12] Hatch A O, Kajarekar S, Stolcke A. Within-class covariance normalization for SVM-based speaker recognition. In: Proceedings of the International Conference on Spoken Language Processing. Pittsburgh, PA, 2006. 1471−1474.
[13] He Liang, Shi Yong-Zhe, Liu Jia. Eigenchannel space combination method of joint factor analysis. Acta Automatica Sinica, 2011, 37(7): 849−856 (in Chinese).
[14] Guo Wu, Li Yi-Jie, Dai Li-Rong, Wang Ren-Hua. Factor analysis and space assembling in speaker recognition. Acta Automatica Sinica, 2009, 35(9): 1193−1198 (in Chinese).

LI Zhi-Yi: Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. His research interests cover speaker recognition and language recognition. Corresponding author of this paper.
ZHANG Wei-Qiang: Assistant professor in the Department of Electronic Engineering, Tsinghua University. His research interests cover speaker recognition and language recognition.
HE Liang: Assistant professor in the Department of Electronic Engineering, Tsinghua University. His research interests cover speaker recognition and language recognition.
LIU Jia: Professor in the Department of Electronic Engineering, Tsinghua University. His research interests cover speech recognition and signal processing.
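A minimal sketch (mine, not the authors' code) of the cosine distance scoring in (11): the score is the cosine of the angle between the speaker-model i-vector and the test-segment i-vector, compared against a threshold; the 400-dimensional vectors below are random stand-ins.

```python
import numpy as np

def cds_score(w_tar, w_tst):
    return float(w_tar @ w_tst / (np.linalg.norm(w_tar) * np.linalg.norm(w_tst)))

rng = np.random.default_rng(0)
w_tar = rng.normal(size=400)                       # 400-dim i-vectors, as in Section 4.1
w_tst = 0.8 * w_tar + 0.2 * rng.normal(size=400)
print(cds_score(w_tar, w_tst))                     # compare against a threshold theta
```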
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
Thorsten Joachims
Universitat Dortmund Informatik LS8, Baroper Str. 301
3 Support Vector Machines
Support vector machines are based on the Structural Risk Minimization principle [9] from computational learning theory. The idea of structural risk minimization is to find a hypothesis h for which we can guarantee the lowest true error. The true error of h is the probability that h will make an error on an unseen and randomly selected test example. An upper bound can be used to connect the true error of a hypothesis h with the error of h on the training set and the complexity of the hypothesis space H containing h, measured by its VC-dimension [9]. Support vector machines find the hypothesis h which approximately minimizes this bound on the true error by effectively and efficiently controlling the VC-dimension of H.
A Tutorial on Support Vector Machines for Pattern Recognition
burges@
Bell Laboratories, Lucent Technologies Editor: Usama Fayyad Abstract. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light. Keywords: support vector machines, statistical learning theory, VC dimension, pattern recognition
RF Fingerprint Extraction Method Based on the Bispectrum
Journal of Terahertz Science and Electronic Information Technology, Vol. 19, No. 1, Feb. 2021

RF fingerprint extraction method based on bispectrum
JIA Jicheng, QI Lin
(College of Information and Communication Engineering, Harbin Engineering University, Harbin, Heilongjiang 150001, China)

Abstract: The classification and recognition of devices of the same type based on the radio-frequency fingerprint (RFF) of a communication emitter is studied. The contour-integral bispectrum values of the communication signal are extracted as the feature vector for individual device identification, and a support vector machine (SVM) classifier is used for recognition. An emitter identification system is built and tested in simulation with measured signals. The results show that the proposed method gives a stable recognition effect, and the system achieves close to 90% classification accuracy at a signal-to-noise ratio (SNR) of −22 dB, which validates the effectiveness of the bispectrum-based RFF extraction method.

Keywords: physical layer security; radio-frequency fingerprint (RFF); contour integral bispectrum; individual identification

DOI: 10.11805/TKYDA2019291

The rapid development of the Internet of Things and 5G has led to more and more interconnected wireless devices, and has also brought a series of regulatory and security challenges.
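A rough sketch (my own, not the paper's implementation) of a bispectrum feature: B(f1, f2) = X(f1) X(f2) X*(f1 + f2) estimated from one signal segment, with a simple sum along the diagonal standing in for the contour integral; the test signal and parameters are made up.

```python
import numpy as np

def bispectrum(x, nfft=256):
    X = np.fft.fft(x, nfft)
    B = np.empty((nfft // 2, nfft // 2), dtype=complex)
    for f1 in range(nfft // 2):
        for f2 in range(nfft // 2):
            B[f1, f2] = X[f1] * X[f2] * np.conj(X[(f1 + f2) % nfft])
    return B

rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * 0.05 * np.arange(1024)) + 0.1 * rng.normal(size=1024)
B = bispectrum(x)
feature = np.abs(B).diagonal().sum()   # one simple "contour integral" along f1 = f2
print(feature)
```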
A Detailed Explanation of the Sparse Autoencoder Method in Deep Learning
Deep learning has achieved great success in image recognition, speech recognition, natural language processing, and other fields, and the sparse autoencoder method is widely used within it. A sparse autoencoder is an unsupervised learning method that automatically learns an effective representation from the input data and reconstructs the data in that feature space.

A sparse autoencoder is an autoencoder with a sparsity constraint. An autoencoder is a neural network model consisting of an input layer, a hidden layer, and an output layer, where the hidden layer generally has a lower dimension than the input layer. The goal of an autoencoder is to reconstruct the input data as faithfully as possible while learning an effective representation of the data by compressing the hidden-layer dimension.

The sparsity constraint is one of the core characteristics of the sparse autoencoder. Sparsity requires that only a few elements of the hidden-layer activations (output values) be nonzero, with most elements close to zero. This constraint pushes the autoencoder to learn a sparse, compressed encoding of the input data.

The sparsity constraint can be implemented by adding a sparsity penalty term. The most common penalty is L1 regularization, which limits the sum of absolute values of the hidden activations: for each hidden activation a, the L1 penalty can be written as λ · |a|, where λ is a hyperparameter controlling the degree of sparsity.

Training a sparse autoencoder consists of two stages: encoding and decoding. In the encoding stage, the input is propagated forward to the hidden layer and the hidden activations are computed; in the decoding stage, the hidden activations are propagated to the output layer and the reconstruction is produced. The training objective is to minimize the reconstruction error, i.e., the gap between the input data and the reconstructed data.

A sparse autoencoder can be trained with gradient descent or other optimization algorithms; during training, the sparsity loss is considered in addition to the reconstruction error. The hidden activations can use a sigmoid or ReLU nonlinearity, and this choice has some influence on the model's performance and sparsity.

Sparse autoencoders have many applications in deep learning. One important application is feature learning: by training a sparse autoencoder, an effective representation of the input data can be learned and used as the feature input for other tasks such as classification and clustering. Sparse autoencoders can also be used for dimensionality reduction; reducing the dimensionality of the input data lowers computational complexity and improves the model's generalization ability.
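A minimal sketch of the method just described, assuming NumPy only: a one-hidden-layer autoencoder trained by gradient descent on the reconstruction error plus an L1 sparsity penalty on the hidden activations; the data and hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # 200 samples, 20-dimensional input
n_in, n_hid, lam, lr = 20, 8, 1e-3, 0.01

W1 = rng.normal(scale=0.1, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_in, n_hid)); b2 = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    H = sigmoid(X @ W1.T + b1)            # encode
    X_hat = H @ W2.T + b2                 # decode (linear output)
    err = X_hat - X                       # reconstruction error
    # Backpropagation of mean[0.5*||X_hat - X||^2 + lam * sum|H|]
    dW2 = err.T @ H / len(X); db2 = err.mean(axis=0)
    dH = err @ W2 + lam * np.sign(H)
    dZ1 = dH * H * (1 - H)
    dW1 = dZ1.T @ X / len(X); db1 = dZ1.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("mean |activation|:", np.abs(sigmoid(X @ W1.T + b1)).mean())
```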
Support Vector Machines (ctj)
The concept of the VC (Vapnik-Chervonenkis) dimension was introduced to study the rate of uniform convergence of the learning process and generalization; it is an important index, defined by statistical learning theory, of the learning capacity of a function class. For a class of indicator functions, if there exist h samples that can be separated by functions in the class in all 2^h possible ways, the function class is said to shatter the h samples, and the VC dimension of the function class is the largest number h of samples it can shatter. The VC dimension reflects the learning capacity of the function class: the larger the VC dimension, the more complex (and the more capable of learning) the learning machine.
These constraints can be written uniformly as y_i [(w^T x_i) + b] − 1 ≥ 0, i = 1, 2, …, m.
Support vector machine
The optimal separating hyperplane can finally be expressed as a constrained optimization problem:

min (1/2)||w||²  subject to  y_i [(w^T x_i) + b] ≥ 1, i = 1, 2, …, m    (1)

This is a strictly convex program, which can be solved by converting it into a Lagrangian problem. To this end, the following Lagrange function can be defined:

L(w, b, α) = (1/2)||w||² − Σ_i α_i { y_i [(w^T x_i) + b] − 1 },  α_i ≥ 0.
Compute w* = Σ_{i=1}^{l} y_i α_i* x_i, choose a positive component α_j* of α*, and from it compute b* = y_j − Σ_{i=1}^{l} y_i α_i* (x_i · x_j).
Construct the separating hyperplane (w* · x) + b* = 0 and the decision function f(x) = sgn((w* · x) + b*).
In fact, each component α_i* of α* corresponds to a training point, and the separating hyperplane depends only on the training points (x_i, y_i) whose α_i* is nonzero; it is independent of the training points whose α_i* is zero.
Support vector machine
Fig. 1 Underfitting
Fig. 2 Good fitting
Fig. 3 Overfitting
Support Vector Machines
Support vector machine
The support vector machine (SVM) is a new machine learning method proposed in the mid-1990s by Vapnik and colleagues on the basis of statistical learning theory, and it has become a research focus in machine learning. It trains the learning machine with the structural risk minimization criterion, which gives it good learning ability and, in particular, good generalization ability. SVMs have a solid theoretical foundation, a concise mathematical form, and an intuitive geometric interpretation, and they handle small-sample, nonlinear, high-dimensional (curse-of-dimensionality), and local-minimum problems well, so they have been widely applied in pattern classification, regression problems, bioinformatics, and many other fields.
Support Vector Machines for Histogram-Based Image Classification
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 5, SEPTEMBER 1999

Support Vector Machines for Histogram-Based Image Classification
Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik

Abstract—Traditional classification approaches generalize poorly on image classification tasks because of the high dimensionality of the feature space. This paper shows that support vector machines (SVMs) can generalize well on difficult image classification problems where the only features are high-dimensional histograms. Heavy-tailed RBF kernels of the form K(x, y) = exp(−ρ Σ_i |x_i^a − y_i^a|^b) with a ≤ 1 and b ≤ 2 are evaluated on the classification of images extracted from the Corel stock photo collection and shown to far outperform traditional polynomial or Gaussian RBF kernels; moreover, the observed performance gain is shown to be mostly attributable to the remapping x → x^a.

Once the solution of the maximization problem (3) has been found, the optimal separating hyperplane (OSH) has the expansion (4) over the support vectors, i.e., the points for which α_i > 0. Slack variables are introduced in (6) to allow examples that violate (2); the error penalty C, chosen by the user, sets the penalty for misclassifications, a larger C giving a higher penalty. When dealing with images the dimension of the input space is large, and in that case C has little impact on performance.

In nonlinear SVMs, the input data are mapped into a high-dimensional feature space through some nonlinear mapping chosen a priori [8], and the OSH is constructed in that feature space. Once a kernel satisfying Mercer's condition has been chosen, the training algorithm consists of minimizing (8), and the decision function becomes a kernel expansion over the support vectors. For multiclass problems, one hyperplane is constructed per class (one against the others) and a sample is assigned to the class with the largest decision function. We assume every point has a single label; in image classification an image may in fact belong to several classes, and multiclass learning could be made more robust and extended to multilabel problems with error-correcting codes [12], a more complex approach not experimented with in this paper.

III. THE DATA AND ITS REPRESENTATION
Among the many possible features that can be extracted from an image, we restrict ourselves to ones that are global and low level (segmentation of the image into regions, objects, or relations is not in the scope of this paper). The simplest representation, the bitmap, yields very large input vectors and lacks translation invariance, so our first choice was the histogram representation.

A. Color Histograms
Although the color histogram is a very simple and low-level technique, it has shown good results in practice [2], especially for image indexing and retrieval, where feature extraction has to be as simple and as fast as possible. Spatial features are lost, so spatial relations between parts of an image cannot be used, but this ensures full translation and rotation invariance. A color is represented by a three-dimensional vector corresponding to a position in a color space, which leaves us to select the color space and the quantization steps. We chose the hue-saturation-value (HSV) space, which is in bijection with the red-green-blue (RGB) space and is widely used in the literature: it separates the color components (HS) from the luminance component (V), is less sensitive to illumination changes, and its distances correspond to perceptual differences in color more consistently than in RGB. In practice, however, comparison experiments showed that the choice of color space had minimal impact compared with the choice of kernel and input remapping, because after quantization into bins no information about the color space is used by the classifier. The number of bins per color component was fixed to 16, so each histogram has dimension 16³ = 4096. Fewer bins gave worse results, and more were computationally too intensive. The histogram is computed from the highest available spatial resolution; subsampling the image too much results in significant losses in performance, because the histogram loses its sharp peaks as pixel colors turn into averages (aliasing).

B. Selecting Classes of Images in the Corel Stock Photo Collection
The Corel stock photo collection consists of about 200 categories, each with 100 images. Two labelings were used. In the first, Corel14, we kept the categories defined by Corel and, for the sake of comparison, chose the same subset as [13]: air shows, bears, elephants, tigers, Arabian horses, polar bears, African specialty animals, cheetahs-leopards-jaguars, bald eagles, mountains, fields, deserts, sunrises-sunsets, night scenes. We had no influence on these choices; unlike [13], we kept all 100 images per category including obvious outliers (for instance a "polar bear alert" sign counted as an image of a polar bear), giving a database of 1400 images. Because some Corel categories come from the same batch of photographs, a system trained to classify them may only be classifying color and exposure idiosyncrasies. To avoid these potential problems and move toward more generic classification, we also defined a second labeling, Corel7, with seven hand-designed categories: airplanes, birds, boats, buildings, fish, people, vehicles. Each category contains between 300 and 625 images hand-picked from several original Corel categories (for example, airplanes includes images of air shows, aviation photography, fighter jets, and WW-II planes), for a total of 2670 samples. Table I shows the origin of the images for each category, and Figs. 1–3 show example images from the Corel14 and Corel7 categories.

IV. SELECTING THE KERNEL
A. Introduction
The design of the SVM classifier architecture is very simple and mainly requires the choice of the kernel (the only other parameter is C). A polynomial kernel results in a classifier with a polynomial decision function. Encouraged by positive results with RBF kernels, we also consider generalized RBF kernels of the form K(x, y) = exp(−ρ d(x, y)), where d is a distance such as the L2, L1 (Laplacian), or χ² distance between histograms; it is not known whether the χ² kernel satisfies Mercer's condition.

B. Experiments
The first series of experiments roughly assesses the input representations and SVM kernels on the two Corel tasks. The 1400 examples of Corel14 were divided into 924 training and 476 test examples; the 2670 examples of Corel7 were split evenly between 1375 training and test examples. The error penalty C was selected heuristically; more rigorous procedures are described in the second series of experiments. Table II shows very similar results for the RGB and HSV representations and similar behavior between Corel14 and Corel7. The leap in performance does not happen, as normally expected, by using RBF kernels but with the proper choice of metric within the RBF kernel: Laplacian and χ² RBF kernels far outperform the Gaussian RBF kernel. To demonstrate that the gain comes from the SVM and not only from the metric, K-nearest-neighbor (KNN) classification with the same distances was also run (Table III): KNN with the χ² distance gave the best KNN results but remained well behind the SVM. Experiments on the raw 64×64 pixel representation were also attempted; except in the linear case, the convergence of the support-vector search was problematic, often finding a hyperplane where every sample is a support vector. On the same database, the decision-tree classifier of [13] gave an error rate of about 50%, compared with 47.7% for the traditional combination of an HSV histogram and a KNN classifier, and 14.7% for the SVM with a Laplacian or χ² RBF kernel.

Each bin of the histogram accounts for a single uniform color region in the image; moving some pixels to a neighboring bin results in a slightly different histogram, and heavy-tailed kernels are more tolerant of such changes. The decay rate of the kernel around zero is governed by the exponents: decreasing b (from 2 for the Gaussian to 1 for the Laplacian, or even below 1) provides a slower decay, as does decreasing a in the remapping x → x^a. A data-generating interpretation of RBFs is that they correspond to a mixture of local densities; lowering b amounts to using heavier-tailed densities, and, since histogram components are often distributed around zero (only a few bins have nonzero values), decreasing a spreads out the small values. The choice a ≤ 1 does not have to be interpreted in terms of kernel products: it is the simplest possible nonlinear remapping of the input that does not affect the dimension. In particular a = 0.25 works well, and the limit a → 0 corresponds to transforming all nonzero components to one (a binarized histogram).

For the reasons stated in Section III-A, the only image representation considered in the second series of experiments is the 16×16×16 HSV histogram. This series defines a rigorous procedure for choosing C and the remapping: C must be chosen large enough compared with the diameter of the sphere containing the input data, which depends on how the data are normalized; with proper renormalization, C can be fixed. Each category was divided into three sets of equal size used as training, validation, and test sets; for each value of the input renormalization, support vectors are obtained from the training set and tested on the validation set, and the best configuration is then evaluated on the test set. Computational requirements (Tables V and VI) are reported as the number of operations for the recognition of one example: in the linear case the decision function (5) allows the support vectors to be combined linearly, so there is only one multiply-add per class and component; in the RBF case there is one per class, component, and support vector; in the sublinear RBF case the square roots dominate, one per component of the kernel product, a pessimistic upper bound since computations can be avoided for components with value zero.

E. Observations
The analysis of Tables IV–VI shows characteristics that apply consistently to both Corel14 and Corel7. As anticipated, decreasing a improves performance. The remapping with a = 0.25 makes the linear SVM a very attractive solution for many applications: its error rate is only 30% higher than the best RBF-based SVM, while its computational and memory requirements are several orders of magnitude smaller than for the most efficient RBF-based SVM. In the model selection with the validation set, a solution with some training misclassifications was preferred (around 1% error on Corel14 and 5% on Corel7). Table VII presents the class-confusion matrix for the Laplacian kernel on Corel7 with a = 0.25 and b = 1.0 (the values that yield the best results on both Corel7 and Corel14): for example, of the 386 images of the airplanes category, 341 were correctly classified, 22 were classified as birds, seven as boats, four as buildings, and 12 as vehicles. The most common confusions happen between birds and airplanes, which is consistent.

VI. SUMMARY
In this paper, we have shown that it is possible to push the classification performance obtained on image histograms to surprisingly high levels, with error rates as low as 11% for the classification of 14 Corel categories and 16% for a more generic set of objects. This is achieved without any other knowledge about the task than the fact that the input is some sort of color histogram or discrete density. This extremely good performance is due to the superior generalization ability of SVMs in high-dimensional spaces, to the use of heavy-tailed RBFs as kernels, and to nonlinear transformations applied to the histogram bin values. We studied how the choice of kernel affects performance; the a-exponentiation with a = 0.25 improves the performance of linear SVMs to such an extent that it makes them a valid alternative to RBF kernels, giving comparable performance for a fraction of the computational and memory requirements. This suggests a new strategy for the use of SVMs when the dimension of the input space is extremely high: rather than kernels intended to make this dimension even higher, which may not be useful, it is recommended to first try nonlinear transformations of the input components in combination with linear SVMs; the computations may be orders of magnitude faster and the performances comparable.

This work can be extended in several ways. Higher-level spatial features can be added to the histogram features. Allowing for the detection of multiple objects in a single image would make this classification-based technique usable for image retrieval: an image would be described by the list of objects it contains. Histograms are used to characterize other types of data than images, for instance in fraud detection applications, and it would be interesting to investigate whether the same type of kernel brings the same gains in performance.

REFERENCES
[1] W. Niblack, R. Barber, W. Equitz, M. Flickner, D. Glasman, D. Petkovic, and P. Yanker, "The QBIC project: Querying images by content using color, texture, and shape," SPIE, vol. 1908, pp. 173–187, Feb. 1993.
[2] M. Swain and D. Ballard, "Indexing via color histograms," Int. J. Comput. Vision, vol. 7, pp. 11–32, 1991.
[3] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[4] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[5] P. Bartlett and J. Shawe-Taylor, "Generalization performance of support vector machines and other pattern classifiers," in Advances in Kernel Methods—Support Vector Learning. Cambridge, MA: MIT Press, 1998.
[6] M. Bazaraa and C. M. Shetty, Nonlinear Programming. New York: Wiley, 1979.
[7] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 1–25, 1995.
[8] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th ACM Workshop Comput. Learning Theory, Pittsburgh, PA, July 1992, pp. 144–152.
[9] J. Weston and C. Watkins, "Multiclass support vector machines," Univ. London, U.K., Tech. Rep. CSD-TR-98-04, 1998.
[10] M. Pontil and A. Verri, "Support vector machines for 3-D object recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, June 1998.
[11] V. Blanz, B. Schölkopf, H. Bülthoff, C. Burges, V. Vapnik, and T. Vetter, "Comparison of view-based object recognition algorithms using realistic 3D models," in Artificial Neural Networks—ICANN'96, Berlin, Germany, 1996, pp. 251–256.
[12] R. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," in Proc. Workshop Comput. Learning Theory, 1998.
[13] C. Carson, S. Belongie, H. Greenspan, and J. Malik, "Color- and texture-based image segmentation using EM and its application to image querying and classification," submitted to IEEE Trans. Pattern Anal. Machine Intell., 1998.
[14] B. Schölkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," Massachusetts Inst. Technol., A.I. Memo 1599, 1996.
[15] B. Schiele and J. L. Crowley, "Object recognition using multidimensional receptive field histograms," in ECCV'96, 4th European Conf. Comput. Vision, vol. I, 1996, pp. 610–619.
[16] S. Basu and C. A. Micchelli, "Parametric density estimation for the classification of acoustic feature vectors in speech recognition," in Nonlinear Modeling: Advanced Black-Box Techniques, J. A. K. Suykens and J. Vandewalle, Eds. Boston, MA: Kluwer, 1998.
[17] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: An application to face detection," in IEEE CVPR'97, Puerto Rico, June 17–19, 1997.
[18] E. Osuna, R. Freund, and F. Girosi, "Improved training algorithm for support vector machines," in IEEE NNSP'97, Amelia Island, FL, Sept. 24–26, 1997.

Olivier Chapelle received the B.Sc. degree in computer science from the École Normale Supérieure de Lyon, France, in 1998, has been a visiting scholar with the MOVI computer vision team at INRIA Grenoble and at AT&T Research Labs, and works with V. Vapnik on machine learning; his research interests include learning theory, computer vision, and support vector machines. Patrick Haffner received degrees from the École Polytechnique and ENST, Paris, and the Ph.D. degree in speech and signal processing from ENST in 1994; after work on the TDNN and MS-TDNN architectures and on connectionist learning for telephone speech recognition at CNET/France Télécom, he joined AT&T Bell Laboratories in 1995 and the Image Processing Services Research Department at AT&T Labs-Research in 1997; his research interests include statistical and connectionist models for sequence recognition, machine learning, speech and image recognition, and information theory.
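A hedged sketch of the paper's pipeline (my own code on made-up data): 16×16×16 HSV histograms, the x → x^a remapping with a = 0.25, and an SVM with a Laplacian (L1) RBF kernel supplied as a precomputed Gram matrix via scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import laplacian_kernel
from sklearn.svm import SVC

def hsv_histogram(hsv_image, bins=16):
    """hsv_image: (H, W, 3) array with channels scaled to [0, 1]."""
    hist, _ = np.histogramdd(hsv_image.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 1),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()                      # 4096-dimensional histogram

rng = np.random.default_rng(0)
images = rng.random((40, 32, 32, 3))              # stand-in for real HSV images
labels = rng.integers(0, 2, 40)

X = np.array([hsv_histogram(im) for im in images]) ** 0.25   # a = 0.25 remapping
K = laplacian_kernel(X, X, gamma=1.0)             # exp(-gamma * ||x - y||_1)
clf = SVC(kernel="precomputed", C=10.0).fit(K, labels)
print("training accuracy:", clf.score(K, labels))
```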
Magnetic Gradient Tensor Invariant-Based Target Localization Filtering

The precise localization of targets is a critical task in various applications, such as remote sensing, geophysical exploration, and defense systems. One effective approach to target localization is the utilization of magnetic gradient tensor (MGT) data, which provides valuable information about the spatial distribution of the magnetic field. The MGT is a second-order tensor that describes the rate of change of the magnetic field in different directions, and it can be used to extract useful features for target detection and localization.

The use of MGT data for target localization is based on the concept of tensor invariants, which are scalar quantities that are independent of the coordinate system used to represent the tensor. These invariants can be used to characterize the magnetic field and its spatial variations, and they can be employed in the design of target localization filters that are robust to changes in the sensor orientation or the target's position.

One of the key advantages of MGT-based target localization is its ability to provide accurate and reliable results even in the presence of various types of noise and interference, such as sensor errors, environmental disturbances, and target-induced magnetic fields. By exploiting the tensor invariant properties of the MGT, it is possible to develop filtering algorithms that can effectively suppress these unwanted effects and enhance target detection and localization performance.

In the literature, several MGT-based target localization approaches have been proposed, each with its own strengths and limitations. One common approach is to use the MGT eigenvalues, which represent the principal components of the magnetic field gradient, to identify and locate targets. Another approach is to utilize the MGT invariants, such as the trace, determinant, and eigenvalues, to construct target detection and localization filters.

For example, one study presented a target localization method based on the MGT invariants, in which the authors developed a filter that exploits the fact that the target-induced magnetic field produces a characteristic pattern in the MGT invariants. The filter was shown to be effective in locating targets even in the presence of strong background magnetic fields and sensor noise.

Another study proposed a Bayesian framework for MGT-based target localization, in which the authors used the MGT invariants to construct a likelihood function describing the probability of detecting a target given the observed MGT data. The Bayesian approach allowed prior information about the target's characteristics and the sensor's capabilities to be incorporated, leading to improved localization accuracy.

In addition to these approaches, researchers have also explored the use of machine learning techniques, such as neural networks and support vector machines, to leverage MGT data for target localization. These data-driven methods have the potential to capture complex relationships between the MGT features and the target's position, potentially leading to even more accurate and robust localization algorithms.

Despite the significant progress in MGT-based target localization, there are still several challenges and open research questions that need to be addressed.
For example, the sensitivity of MGT data to environmental factors, such as geological structures and electromagnetic interference, can pose challenges in real-world applications. Additionally, the development of efficient and scalable algorithms for processing large-scale MGT data, particularly in the context of sensor networks or aerial surveys, is an active area of research.

In conclusion, the use of magnetic gradient tensor data for target localization is a promising approach that has been extensively studied in the literature. By exploiting the tensor invariant properties of the MGT, researchers have developed a range of effective filtering and detection algorithms that can provide accurate and reliable target localization results, even in the presence of various types of noise and interference. As the field continues to evolve, further advances in MGT-based target localization are expected to have a significant impact on a wide range of applications, from remote sensing and geophysical exploration to defense and security systems.
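As a concrete illustration of the rotation-invariant quantities discussed above, the sketch below computes the trace, second invariant, determinant, eigenvalues, and Frobenius norm of a single 3x3 gradient tensor. The NumPy implementation and the numerical values are assumptions for illustration only; specific localization filters in the literature combine these invariants in different ways.

```python
import numpy as np

def mgt_invariants(G):
    """Rotation-invariant scalars of a 3x3 magnetic gradient tensor G."""
    G = 0.5 * (G + G.T)                       # symmetrize; the magnetostatic MGT is symmetric
    I1 = np.trace(G)                          # first invariant (trace; zero in source-free regions)
    I2 = 0.5 * (np.trace(G) ** 2 - np.trace(G @ G))   # second invariant
    I3 = np.linalg.det(G)                     # third invariant (determinant)
    eigvals = np.linalg.eigvalsh(G)           # principal gradients (eigenvalues)
    frob = np.linalg.norm(G, "fro")           # overall gradient magnitude
    return I1, I2, I3, eigvals, frob

# Hypothetical tensor measured at one sensor location (units of nT/m).
G = np.array([[ 1.2,  0.3, -0.4],
              [ 0.3, -0.7,  0.5],
              [-0.4,  0.5, -0.5]])
print(mgt_invariants(G))
```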
Locality-Preserved Maximum Information Variance v-Support Vector Machine (v-LPMIVSVM)
Authors: 陶剑文 (Tao Jianwen); 王士同 (Wang Shitong)
Journal: Acta Automatica Sinica (自动化学报), 2012, Vol. 38, No. 1, pp. 97-108 (12 pages)
Affiliations: School of Information Engineering, Jiangnan University, Wuxi 214122; School of Information Engineering, Zhejiang Business Technology Institute, Ningbo 315012
Language of the full text: Chinese

Abstract: The state-of-the-art pattern classifiers cannot efficiently preserve the local geometrical structure or the diversity (discriminative) information of data points embedded in a high-dimensional data space, which is useful for pattern recognition. A novel algorithm, called locality-preserved maximum information variance v-support vector machine (v-LPMIVSVM), is presented based on manifold learning to address the problems mentioned above. The v-LPMIVSVM introduces the within-locality homogeneous scatter and the within-locality heterogeneous scatter, which respectively denote the within-locality manifold information and the within-locality diversity information of the data points, thus constructing an optimal classifier with an optimal projection weight vector by minimizing the within-locality homogeneous scatter while maximizing the within-locality heterogeneous scatter. Meanwhile, v-LPMIVSVM adopts a geodesic distance metric to measure the distance between data points in the manifold space, which better reflects the true geometry of the manifold (see the sketch after the related-literature list below). Experimental results on artificial and real-world problems show that the effectiveness of v-LPMIVSVM is superior or comparable to existing methods.

Related literature:
1. Nonlinear system identification based on v-support vector machines and ε-support vector machines [J], 张智; 朱齐丹; 严勇杰
2. A maximum margin criterion method for two-dimensional discriminant locality preserving projections based on the L1 norm [J], 谢玉凯; 卢桂馥; 宣东东
3. v-support vector machine based on sample weights [J], 李凯; 翟璐璐; 崔丽娟
4. Research on a v-support vector machine flood forecasting model [J], 但灵芝; 王建群; 陈理想; 陈红红
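The geodesic distance mentioned in the abstract is commonly approximated by shortest-path lengths on a k-nearest-neighbor graph (as in Isomap). The following is a minimal sketch of that construction; the data, neighborhood size, and use of scikit-learn and SciPy are assumptions for illustration and are not taken from the paper, whose full text is in Chinese.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

# Hypothetical data lying near a one-dimensional manifold (a noisy spiral).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.5, 3 * np.pi, 300))
X = np.column_stack([t * np.cos(t), t * np.sin(t)]) + rng.normal(0, 0.05, (300, 2))

# Euclidean k-nearest-neighbor graph; its shortest-path lengths approximate geodesic distances.
knn = kneighbors_graph(X, n_neighbors=8, mode="distance")
geodesic = shortest_path(knn, method="D", directed=False)
print(geodesic.shape)          # (300, 300) matrix of approximate pairwise geodesic distances
```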
Localized Support Vector Machine and Its Efficient Algorithm
Haibin Cheng, Pang-Ning Tan, Rong Jin

Abstract
Nonlinear Support Vector Machines employ sophisticated kernel functions to classify data sets with complex decision surfaces. Determining the right parameters of such functions is not only computationally expensive; the resulting models are also susceptible to overfitting because of their large VC dimensions. Instead of fitting a nonlinear model, this paper presents a framework called Localized Support Vector Machine (LSVM), which builds multiple linear SVM models from the training data. Since each model is designed to classify a particular test example, LSVM has a high computational cost. To overcome this limitation, we propose an efficient implementation of LSVM, termed Profile SVM (PSVM). PSVM partitions the training examples into clusters and builds a separate linear SVM model for each cluster. Our empirical results show that (1) both LSVM and PSVM outperform nonlinear SVM on the majority of the evaluated data sets, and (2) PSVM achieves comparable accuracy to LSVM but with significant computational savings.

1 Introduction
Nonlinear Support Vector Machine (SVM) has been widely used in many applications, from text categorization to protein classification. Despite its well-documented successes, nonlinear SVM must employ sophisticated kernel functions to fit data sets with complex decision surfaces. Determining the right parameters of such functions is not only computationally expensive; the resulting models are also susceptible to overfitting when the number of training examples is small, because of their large VC dimensions.

Instead of learning such a complex global model, an alternative strategy is to build simple models that fit the data in the local neighborhood around a test example. A well-known technique that employs such a strategy is the K-nearest neighbor (KNN) classifier [4]. KNN does not require any prior assumptions about the characteristics of the data and its decision surfaces, thus avoiding the unnecessary bias of global function fitting [1]. Nevertheless, because of its lazy learning scheme, classifying test examples is computationally expensive.

In this paper, we present a framework called Localized Support Vector Machine (LSVM), which leverages the strengths of SVM and KNN. Instead of using sophisticated kernel functions, LSVM builds a linear SVM model for each test example using only the training examples located in the vicinity of the test example. We empirically show that such a strategy often leads to significant improvement in accuracy over nonlinear SVM.

Since each model is designed for a particular test example, LSVM can be very expensive when the number of test examples is large. To overcome this problem, we propose an efficient technique called Profile Support Vector Machine (PSVM). The intuition behind PSVM is that the models for test examples in the same neighborhood tend to have similar support vectors. Therefore, instead of building a separate model for each test example, PSVM partitions the training data into clusters and builds a linear SVM model for each cluster. PSVM then assigns each test example to its closest cluster and applies the corresponding linear SVM model to predict its class label. By reducing the number of constructed models, PSVM maintains the high accuracy of LSVM without its computational overhead.

2 Preliminaries
Consider a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i is an instance of the input space \mathbb{R}^d and y_i \in \{-1, +1\} is its corresponding class label.
In KNN [4], the label of a test example is determined by the training examples in its local neighborhood. Since KNN is sensitive to the neighborhood size, the weighted K-nearest neighbor classifier [6] was introduced, which assigns a weight factor to each training example based on its similarity to the test example. The posterior probability of a test example x is computed as follows:

p(y|x) = \frac{\sum_{i=1}^{n} \delta(y, y_i)\,\sigma(x, x_i)}{\sum_{i=1}^{n} \sigma(x, x_i)}    (2.1)

where \sigma(x, x_i) is the similarity between x and x_i, while

\delta(y, y_i) = \begin{cases} 1 & \text{if } y = y_i \\ 0 & \text{otherwise} \end{cases}    (2.2)

Without loss of generality, we assume that the weight factor \sigma is bounded between 0 and 1.

Support Vector Machine [7] finds an optimal hyperplane to separate training examples from different classes by maximizing the classification margin [5], [2]. It is applicable to nonlinear decision surfaces by employing a technique known as the kernel trick, which projects the input data to a higher-dimensional feature space where a separating hyperplane can be found. During model building, a nonlinear SVM is trained to solve the following optimization problem:

\max_{\alpha_1, \ldots, \alpha_n} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \phi(x_i, x_j)    (2.3)
s.t. \ \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \ i = 1, 2, \ldots, n

where \phi is the kernel function and \alpha_i is the weight assigned to the training example x_i. The kernel function \phi is used to compute the dot product \varphi(x_i) \cdot \varphi(x_j), where \varphi is a function that maps an instance to its higher-dimensional space. Data points with \alpha_i > 0 are called support vectors. Once the weights have been determined, a test example x is classified as follows:

y = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i \phi(x, x_i)\right)    (2.4)

3 Localized Support Vector Machine (LSVM)
To overcome the limitations of KNN and nonlinear SVM, a natural idea is to combine the strengths of both methods. Zhang et al. [10] proposed a hybrid algorithm called KNN-SVM for visual object recognition. Their algorithm selects the K nearest neighbors of each test example and builds a local SVM model from those nearest neighbors [9]. This method is a straightforward adaptation of KNN to SVM and suffers from a number of limitations. First, it is sensitive to the neighborhood size. Second, it is not very flexible because the neighborhood size is fixed for every test example. Finally, their method decouples nearest-neighbor search from the SVM learning algorithm. Once the K nearest neighbors have been identified, the SVM algorithm completely ignores their similarities to the given test example when solving the dual optimization problem given in (2.3).

This motivates us to develop a more integrated framework called Localized Support Vector Machine (LSVM), which incorporates the neighborhood information directly into SVM learning. The rationale behind LSVM is to reduce the impact of support vectors located far away from a given test example. This can be accomplished by weighting the classification error of each training example according to its similarity to the test example. The similarity is captured by a weight function \sigma, similar to the approach used by weighted KNN.

Let D = {x_1, x_2, ..., x_m} denote the set of test examples. For each x_s in this set, we construct its localized SVM model by solving the following optimization problem:

\min_{w} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \sigma(x_s, x_i)\,\xi_i    (3.5)
s.t. \ y_i (w^\top x_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \ i = 1, 2, \ldots, n

The solution to (3.5) identifies the decision surface as well as the local neighborhood of the test example. The function \sigma penalizes training examples that are located far away from the test example. As a result, the classification of the test example depends only on the support vectors in its local neighborhood.
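The paper only requires the weight factor \sigma to lie between 0 and 1; a Gaussian (RBF) similarity is one natural choice. The following minimal sketch, with made-up points and a hypothetical bandwidth, shows how such weights concentrate on the neighborhood of a test example.

```python
import numpy as np

def rbf_weights(x_test, X_train, bandwidth=1.0):
    """Similarity sigma(x_test, x_i) in (0, 1] for every training example."""
    d2 = np.sum((X_train - x_test) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

# Hypothetical data: six training points in the plane and one test point.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
x_test = np.array([0.5, 0.5])
print(np.round(rbf_weights(x_test, X_train), 3))   # near 1 for close points, near 0 for far ones
```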
To further appreciate the role of the weight function, consider the dual form of (3.5):

\max_{\alpha_1, \ldots, \alpha_n} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \phi(x_i, x_j)    (3.6)
s.t. \ \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C\,\sigma(x_s, x_i), \ i = 1, 2, \ldots, n

Compared to (2.3), the only difference between LSVM and nonlinear SVM is that the upper-bound constraint on \alpha_i has been changed from C to C\sigma(x_s, x_i). This modification has two effects: (1) it reduces the impact of far-away support vectors, and (2) non-support vectors of the nonlinear SVM may become support vectors of LSVM.

Note that the KNN-SVM method [10] is a special case of our LSVM framework. If \sigma produces a continuous-valued output, our LSVM framework is called Soft Localized Support Vector Machine (SLSVM). Conversely, if \sigma produces a binary-valued output (0 or 1), it is called Hard Localized Support Vector Machine (HLSVM). For HLSVM, the upper bound for \alpha_i is constrained to C, which is equivalent to the optimization problem for KNN-SVM. Finally, we use a linear kernel function for both HLSVM and SLSVM.
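Since the dual (3.6) differs from (2.3) only in the per-example upper bound C\sigma(x_s, x_i), one practical way to approximate SLSVM with off-the-shelf software is to train a linear SVM whose per-sample weights rescale C; scikit-learn's SVC, for instance, accepts such weights through the sample_weight argument of fit, which multiplies C for each example and therefore reproduces the modified box constraint. The sketch below takes this route using the Gaussian weights from the previous sketch; it is an approximation under these assumptions, not the authors' modified LIBSVM implementation.

```python
import numpy as np
from sklearn.svm import SVC

def slsvm_predict(x_test, X_train, y_train, C=1.0, bandwidth=1.0):
    """Classify one test example with a locally weighted (soft) linear SVM."""
    d2 = np.sum((X_train - x_test) ** 2, axis=1)
    sigma = np.exp(-d2 / (2.0 * bandwidth ** 2))     # similarity weights in (0, 1]
    clf = SVC(kernel="linear", C=C)
    clf.fit(X_train, y_train, sample_weight=sigma)   # per-sample bound becomes C * sigma_i
    return clf.predict(x_test.reshape(1, -1))[0]

# Hypothetical two-class data.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([-1] * 50 + [1] * 50)
print(slsvm_predict(np.array([2.5, 2.5]), X_train, y_train))
```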
4 Profile Support Vector Machine (PSVM)
Since LSVM builds a separate model for each test example, it is computationally expensive when the size of the test set is large. PSVM aims to provide a reasonable approximation to LSVM by reducing the number of constructed models via clustering. To understand the intuition behind PSVM, let \sigma_s = [\sigma(x_s, x_1), \ldots, \sigma(x_s, x_n)]^T denote a column vector of similarities between a test example x_s and each training example x_i (for all i in {1, ..., n}). From (3.6), notice that the local optimization problem to be solved for each test example x_s is almost identical, except for the upper-bound constraint on \alpha_i, which depends on \sigma(x_s, x_i). Since \alpha_i determines whether x_i is a support vector, we expect test examples with similar \sigma_s to share many common support vectors. Therefore, if we can find a set of prototype vectors \tilde{\sigma}_1, \tilde{\sigma}_2, \ldots, \tilde{\sigma}_\kappa such that the similarity vector of each test example is closely approximated by one of the \kappa prototypes, we need to build only \kappa linear SVM models instead of building a separate model for each test example.

4.1 Supervised Clustering for PSVM
Let \Sigma be an n x m weight matrix, where n is the training set size, m is the test set size, and the j-th column of the matrix corresponds to the similarity vector \sigma_j. Our clustering approach is equivalent to approximating \Sigma by the product of two lower-rank matrices \Lambda = [\lambda]_{n \times \kappa} and \Gamma = [\gamma]_{\kappa \times m}. The j-th column of \Lambda denotes the membership of each training example in cluster j, whereas the i-th row of \Gamma denotes the membership of each test example in cluster i.

Our clustering task is somewhat different from conventional unsupervised clustering. First, the data matrix to be clustered is \Sigma, which contains the similarities between the training and test examples. Second, conventional clustering methods consider only the proximity between examples and often end up grouping training examples from the same class into the same cluster. Because such clusters tend to be pure, their induced models are trivial. Therefore, the clustering criterion must be modified to ensure that each cluster contains training examples from all classes.

In this paper, we propose the "MagKmeans" algorithm, which modifies the clustering criterion of the k-means algorithm to incorporate the class distribution of training examples within each cluster. The data to be clustered consists of two parts: (1) the matrix \Sigma and (2) the class label vector Y = (y_1, \ldots, y_n)^\top of the training examples. The objective function for MagKmeans is

\min_{Z, C} \ \sum_{j=1}^{\kappa}\sum_{i=1}^{n} Z_{i,j}\,\|X_i - C_j\|_2^2 + R \sum_{j=1}^{\kappa}\left|\sum_{i=1}^{n} Z_{i,j} Y_i\right|

where X_i^T is the i-th row vector of \Sigma, C_j is the centroid of the j-th cluster, Y_i is an element of the vector Y, R > 0 is a scaling parameter, and Z is the cluster membership matrix, whose (i, j)-th element is one if the i-th training example is assigned to the j-th cluster, and zero otherwise. Note that the first term in the objective function is identical to the cluster cohesion criterion used by regular k-means; minimizing it leads to compact clusters. The second term in the objective function measures the class imbalance within the clusters. This term is minimized when every cluster contains an equal number of positive and negative examples, so minimizing it enforces the requirement that the class distribution within each cluster be balanced.

[Figure 1: An illustration of the MagKmeans clustering algorithm.]

Our algorithm iteratively performs the following two steps to optimize the clustering objective function. First, we compute the cluster membership matrix Z by fixing the centroids C_j for all j. Next, we compute the centroids C_j by fixing the cluster memberships Z. These steps are repeated until the algorithm converges to a local minimum. When the C_j are fixed, Z can be computed efficiently using linear programming. To do this, we first transform the original optimization problem into the following form using \kappa slack variables t_j (j = 1, \ldots, \kappa):

\min_{Z, t} \ \sum_{j=1}^{\kappa}\sum_{i=1}^{n} Z_{i,j}\,\|X_i - C_j\|^2 + R \sum_{j=1}^{\kappa} t_j
s.t. \ -t_j \le \sum_{i=1}^{n} Z_{i,j} Y_i \le t_j, \quad t_j > 0, \quad 0 \le Z_{i,j} \le 1, \quad \sum_{j=1}^{\kappa} Z_{i,j} = 1

When the cluster membership matrix Z is fixed, the following equation is used to update each centroid:

C_j = \frac{\sum_{i=1}^{n} Z_{i,j}\, X_i}{\sum_{i=1}^{n} Z_{i,j}}

Figure 1 illustrates how the MagKmeans algorithm works. The initial cluster (the left figure) contains only positive examples. As the algorithm progresses, some positive examples are expelled from the cluster while some negative examples are absorbed into it (the right figure). By ensuring that the cluster has almost equal representation from each class, one can then build a linear SVM from the training examples.

[Figure 2: Regular k-means clustering result (left). MagKmeans clustering result (right). Clusters are represented by different colors.]

Finally, we illustrate the difference between the resulting clusters produced by regular k-means and MagKmeans using the synthetic data set shown in Figure 2. Based on the cluster cohesion criterion, the k-means algorithm produces clusters that correspond to each class, whereas MagKmeans generates clusters with representatives from both classes.
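A compact way to see the two-step MagKmeans iteration is the sketch below: the assignment step solves the linear program stated above with SciPy's linprog, and the centroid step applies the weighted update. For readability the sketch clusters raw feature vectors, whereas in PSVM the rows of \Sigma would play the role of X_i; the initialization, fixed iteration count, soft memberships, and toy data are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def magkmeans(X, y, k, R=1.0, n_iter=10, seed=0):
    """Simplified MagKmeans: k-means with a class-balance penalty (labels y in {-1, +1})."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    C = X[rng.choice(n, size=k, replace=False)].astype(float)    # initial centroids
    Z = np.full((n, k), 1.0 / k)
    for _ in range(n_iter):
        D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # n x k squared distances
        # LP variables: Z flattened row-major (n*k entries) followed by the slacks t (k entries).
        c = np.concatenate([D.ravel(), np.full(k, R)])
        A_ub = np.zeros((2 * k, n * k + k))
        for j in range(k):
            cols = np.arange(j, n * k, k)          # positions of Z[:, j] in the flattened vector
            A_ub[2 * j, cols] = y                  #  sum_i Z_ij y_i - t_j <= 0
            A_ub[2 * j, n * k + j] = -1.0
            A_ub[2 * j + 1, cols] = -y             # -sum_i Z_ij y_i - t_j <= 0
            A_ub[2 * j + 1, n * k + j] = -1.0
        b_ub = np.zeros(2 * k)
        A_eq = np.zeros((n, n * k + k))
        for i in range(n):
            A_eq[i, i * k:(i + 1) * k] = 1.0       # memberships of example i sum to one
        b_eq = np.ones(n)
        bounds = [(0, 1)] * (n * k) + [(0, None)] * k
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        Z = res.x[:n * k].reshape(n, k)
        for j in range(k):                         # weighted centroid update from the paper
            w = Z[:, j]
            if w.sum() > 1e-12:
                C[j] = (w @ X) / w.sum()
    return Z, C

# Tiny demonstration with made-up two-dimensional data and labels in {-1, +1}.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
Z, C = magkmeans(X, y, k=2, R=5.0)
print(np.round(Z.sum(axis=0), 2))   # soft cluster sizes
```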
4.2 Model Building and Testing
After applying the MagKmeans algorithm, we obtain a prototype matrix \Lambda = [\lambda]_{n \times \kappa}, where each element \lambda_{i,j} = Z_{i,j}, i.e., it represents the membership of each training example x_i in cluster j. We then build a separate linear SVM model for each cluster k by solving the following optimization problem:

\max_{\tilde{\alpha}} \ \sum_{i=1}^{n} \tilde{\alpha}_i - \frac{1}{2}\sum_{i,j=1}^{n} \tilde{\alpha}_i \tilde{\alpha}_j\, y_i y_j\, x_i^\top x_j
s.t. \ \sum_{i=1}^{n} \tilde{\alpha}_i y_i = 0, \quad 0 \le \tilde{\alpha}_i \le C\,\lambda_{i,k}, \ i = 1, 2, \ldots, n

Since \lambda_{i,k} \in \{0, 1\}, this is equivalent to building a linear SVM using only the training examples assigned to cluster k. The models obtained from the clusters form a piecewise decision surface consisting of \kappa hyperplanes.

Let C denote the \kappa \times m centroid matrix obtained by the MagKmeans algorithm. During the testing step, we need to determine which local model should be used to classify a test example. Since the (i, j)-th element of the centroid matrix indicates the similarity between test example x_j and cluster i, we assign the test example to the cluster with the highest similarity. More specifically, we construct the lower-rank matrix \Gamma by setting

\gamma_{i,j} = \begin{cases} 1 & \text{if } i = \arg\max_{i'} C_{i',j} \\ 0 & \text{otherwise} \end{cases}

The class label of the test example x_j is then determined by applying the corresponding linear SVM model of its assigned cluster.

In short, PSVM uses the MagKmeans algorithm to identify the \kappa prototypes and then trains a local SVM model for each prototype. In our experiments, we found that the number of clusters \kappa tends to be much smaller than m and n, and that this criterion usually delivers satisfactory performance. Since \kappa is generally much smaller than the number of training examples n and the number of test examples m, PSVM reduces the computational cost of LSVM significantly. R is another parameter in PSVM that needs to be determined; in our work, we empirically set it to 1/\kappa times the diameter of the data set.
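Putting Section 4.2 together, the following sketch trains one linear SVM per cluster from hardened memberships and routes each test example to the model of its most similar cluster, taken here as the cluster with the nearest centroid in feature space (the paper instead uses the centroid matrix in the space of \Sigma). The scikit-learn classifier, the argmax hardening of Z, the handling of single-class clusters, and the toy data are illustrative assumptions; the sketch also assumes every cluster receives at least one training example.

```python
import numpy as np
from sklearn.svm import LinearSVC

def psvm_fit(X, y, Z, C_reg=1.0):
    """Train one linear SVM per cluster using hardened memberships (argmax of Z)."""
    hard = np.argmax(Z, axis=1)
    models = {}
    for j in np.unique(hard):
        idx = hard == j
        if len(np.unique(y[idx])) == 2:
            models[j] = LinearSVC(C=C_reg, max_iter=5000).fit(X[idx], y[idx])
        else:
            models[j] = y[idx][0]        # degenerate cluster: always predict its only class
    return models

def psvm_predict(X_test, centroids, models):
    """Route each test example to its most similar cluster and apply that cluster's model."""
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argmin(d, axis=1)       # smallest distance = highest similarity
    y_hat = np.empty(len(X_test), dtype=int)
    for s, j in enumerate(nearest):
        m = models[j]
        y_hat[s] = m.predict(X_test[s:s + 1])[0] if hasattr(m, "predict") else m
    return y_hat

# Tiny demonstration with made-up data; the memberships Z are hand-crafted here, but they
# would normally come from the MagKmeans step sketched above.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
y = np.array([-1, 1] * 30)
Z = np.zeros((60, 2)); Z[:30, 0] = 1.0; Z[30:, 1] = 1.0
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
models = psvm_fit(X, y, Z)
print(psvm_predict(np.array([[0.5, 0.0], [5.5, 5.0]]), centroids, models))
```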
5 Experimental Evaluation
We have conducted extensive experiments to evaluate the performance of our proposed algorithms in comparison with the KNN and nonlinear SVM algorithms. To implement the proposed LSVM algorithm, we modified the C++ code of the LIBSVM tool developed by Chang and Lin [3] to use C\sigma instead of C as the upper-bound constraint for \alpha. For PSVM, we implemented the MagKmeans algorithm to cluster the weight matrix \Sigma into \kappa prototypes. All our experiments were conducted on a Windows XP machine with a 3.0-GHz CPU and 1.0 GB of RAM.

5.1 Comparisons of Support Vectors and Decision Boundaries
In this experiment, we illustrate the piecewise linear decision boundaries formed by PSVM. The top panel of Figure 3 shows the data distributions for two synthetic data sets with nonlinear decision boundaries. The bottom panel illustrates their corresponding decision boundaries generated using PSVM. For the first data set, in the left of Figure 3, the horseshoe-shaped decision boundary is approximated by 11 piecewise linear decision boundaries. The spiral-shaped decision boundary of the data set in the right figure is also approximated by 11 piecewise linear decision boundaries. In short, the results of this experiment show the ability of PSVM to fit a complex decision boundary using multiple piecewise linear segments.

[Figure 3: The top panel shows two synthetic data sets. Each data set is comprised of two classes marked by ◦ and ∗. The bottom panel shows the decision boundaries generated by PSVM. Each color represents a different cluster produced by the MagKmeans algorithm.]

5.2 Performance Comparison
We use eight data sets from the UCI repository [8] to compare the performance of HLSVM, SLSVM, and PSVM against KNN and nonlinear SVM in terms of accuracy and computational time. Some of the data sets, such as "Breast", "Glass", "Iris", "KDDcup IDS", "Physics", "Yeast", "Robot", and "Forest", are multi-class prediction problems. Since our proposed algorithm is designed for binary classification problems, we divide the classes of these data sets into two groups and relabel one group as the positive class and the other as the negative class. For some of the larger data sets, such as "KDDcup IDS", "Physics", and "Robot", we randomly sample 600 records from each class to form the data sets. The attributes of all eight data sets are normalized by the maximum value of each attribute. The parameters of the classification algorithms, i.e., K in KNN, C in SVM, the bandwidth \lambda of the RBF kernel, and \kappa in PSVM, are determined by 10-fold cross validation on the training set.

Table 1: Classification accuracies (%) for SVM, KNN, HLSVM (KNN-SVM), SLSVM, and PSVM on the UCI data sets.

Data      SVM     KNN     HLSVM   SLSVM   PSVM
Breast    94.57   95.70   95.42   96.85   96.52
Glass     64.33   54.30   62.67   66.39   66.91
Iris      92.75   89.75   74.71   96.42   97.06
KDDCup    98.29   94.04   98.13   99.67   99.22
Physics   82.23   66.37   83.77   86.72   85.77
Yeast     92.31   93.36   94.62   96.00   95.83
Robot     86.51   84.54   85.45   89.01   90.21
Covtype   86.21   67.40   73.33   90.22   90.39

5.2.1 Accuracy Comparison
The experimental results reported in this study are obtained by applying 5-fold cross validation to the data sets. To make the problem more challenging, we use 1/5 of the entire data set for training and the remaining 4/5 for testing. Each experiment is repeated ten times, and the reported accuracy is obtained by averaging the results over the ten trials. Table 1 summarizes the results of our experiments. First, observe that, for most data sets, nonlinear SVM outperforms the KNN algorithm. The most noticeable case is the "Covtype" data set, for which the accuracy of SVM is 86.21% while the accuracy of KNN is only 67.40%. Second, observe that HLSVM, which is the hard version of LSVM, fails to improve the accuracy over nonlinear SVM. In fact, the performance of HLSVM degrades significantly on data sets such as "Glass", "Iris", and "Covtype". The most noticeable case is "Iris", where the classification accuracy drops from 92.75% to 74.71% when using HLSVM instead of nonlinear SVM. One possible explanation for the poor performance of HLSVM is the difficulty of choosing the right number of nearest neighbors (K) when the number of training examples is small. We observe that the SLSVM algorithm consistently outperforms nonlinear SVM on all the data sets. In fact, with the exception of "KDD Cup", the difference in classification accuracy between SLSVM and nonlinear SVM is statistically significant according to Student's t-test. This should not come as a surprise, because nonlinear SVM is a special case of SLSVM obtained by setting the kernel width to \infty. Finally, we observe that PSVM, which is an efficient implementation of LSVM, achieves accuracy comparable to SLSVM and outperforms nonlinear SVM on all the data sets.

5.2.2 Runtime Comparison
The purpose of this experiment is to evaluate the efficiency of LSVM and PSVM. Recall from Section 3 that the main drawback of LSVM is its high computational cost, since a separate LSVM model must be trained for each test example. PSVM alleviates this problem by grouping the training examples into a small number of clusters and building a linear SVM model for each cluster. Figure 4 shows the runtime comparison (in seconds) among the different classification methods on the "Physics" data set. To evaluate the performance, we choose the two largest classes of the data set and randomly sample 300 records from each class to form the training set. We then apply LSVM and PSVM to the data set while varying the number of test examples from 600 to 4800.

[Figure 4: The left panel shows the overall runtime of SVM, KNN, HLSVM, SLSVM, and PSVM. The right top panel shows the runtime of PSVM during clustering. The right bottom panel shows the runtime of each method when predicting the class labels of test examples. Horizontal axes: number of test examples.]

The left panel of Figure 4 shows the overall runtime for each method, which includes the training and testing times. Notice that the computational times of both HLSVM and SLSVM grow rapidly as the test set size increases. In contrast, the runtime of PSVM grows more gradually as the number of test examples increases. The right top panel shows a more detailed runtime analysis for PSVM. It compares the time needed to perform the MagKmeans clustering (represented by the PSVM-clustering line) and the time needed to build the LSVMs of the different clusters (represented by the PSVM-LSVM line).
The result suggests that PSVM spends the majority of its time clustering the training data. Furthermore, if the clustering time of PSVM is discounted, then the time needed to train the multiple linear SVMs of the clusters, as well as to apply the models to the test examples, is shorter than the training and testing times of nonlinear SVM, as shown in the right bottom panel of Figure 4.

To summarize, the SLSVM algorithm generally outperforms both SVM and KNN, but at the expense of incurring a much higher computational cost. PSVM is able to improve the computational efficiency while achieving accuracy comparable to the SLSVM algorithm.

6 Conclusion and Future Work
In this paper, we proposed a framework for Localized Support Vector Machine, which utilizes a weight function to constrain the maximum weights that can be assigned to training examples based on their similarity to the test example. We tested our LSVM framework on a number of data sets and showed that its soft version (SLSVM) outperforms both KNN and nonlinear SVM in terms of model accuracy. Nevertheless, SLSVM is computationally expensive since it requires training a separate model for each test example. To overcome this problem, we proposed an approximation algorithm called PSVM, which reduces the training time by extracting a small number of clusters and building linear SVM models only for the clusters. Our analysis further showed that PSVM outperforms both KNN and SVM, and is comparable in accuracy but much more computationally efficient than LSVM. In the future, we plan to expand our framework to multi-class problems; for MagKmeans, this can be accomplished by modifying the clustering criterion to maximize the cluster impurity. We will also investigate the possibility of using other clustering algorithms, such as spectral clustering, to further enhance the results.

References
[1] C. Atkeson, A. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11:11-73, April 1997.
[2] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. In Knowledge Discovery and Data Mining, 2(2), 1998.
[3] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. Software available at .tw/cjlin/libsvm, 2001.
[4] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13:21-27, 1967.
[5] S. R. Gunn. Support vector machines for classification and regression. Technical report, University of Southampton, 1997.
[6] K. Hechenbichler and K. Schliep. Weighted k-nearest-neighbor techniques and ordinal classification. Discussion Paper 399, SFB 386, 2006.
[7] T. Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning, San Francisco, 1999. Morgan Kaufmann.
[8] D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases, /~mlearn/mlrepository., 1998.
[9] J. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems 12, pp. 547-553, MIT Press, 2000.
[10] H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: discriminative nearest neighbor for visual object recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2006.