support vector machine

Support Vector Machine Reference Manual

sv        - the main SVM program
paragen   - program for generating parameter sets for the SVM
loadsv    - load a saved SVM and classify a new data set
rm_sv     - special SVM program for image recognition, that implements virtual support vectors [BS97]
snsv      - program to convert SN format to our format
ascii2bin - program to convert our ASCII format to our binary format
bin2ascii - program to convert our binary format to our ASCII format

The rest of this document will describe these programs. To find out more about SVMs, see the bibliography. We will not describe how SVMs work here. The first program we will describe is the paragen program, as it specifies all parameters needed for the SVM.
Controlling the sensitivity of support vector machines
…number of real world problems such as handwritten character and digit recognition [Scholkopf, 1997; Cortes, 1995; LeCun et al., 1995; Vapnik, 1995], face detection [… et al., 1997] and speaker identification.

Given training points x_i mapping to targets y_i (i = 1, …, p), the decision function is formulated in terms of these kernels:

    f(x) = sign( Σ_{i=1}^{p} α_i y_i K(x, x_i) + b )

where b is the bias and the coefficients α_i are found by maximising the Lagrangian:

    L = Σ_{i=1}^{p} α_i − (1/2) Σ_{i,j=1}^{p} α_i α_j y_i y_j K(x_i, x_j)        (1)

subject to the constraints:

    α_i ≥ 0,    Σ_{i=1}^{p} α_i y_i = 0.        (2)

Only those points which lie closest to the hyperplane have α_i > 0 (the support vectors). In the presence of noise, two techniques can be used to allow for, and control, a trade-off between training …
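To make the decision function above concrete, the following minimal sketch (in Python, not from the original paper) evaluates f(x) = sign(Σ_i α_i y_i K(x, x_i) + b) with a Gaussian kernel, assuming the multipliers α_i and bias b have already been obtained by solving the QP in (1)-(2); all names and toy values are illustrative.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_decision(x, X_train, y_train, alpha, b, kernel=rbf_kernel):
    """Evaluate f(x) = sign( sum_i alpha_i * y_i * K(x, x_i) + b )."""
    s = sum(a * y * kernel(x, xi) for a, y, xi in zip(alpha, y_train, X_train))
    return np.sign(s + b)

# Toy usage with made-up multipliers; in practice alpha and b come from the QP above,
# and only the support vectors have alpha_i > 0.
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([-1.0, 1.0])
alpha   = np.array([0.8, 0.8])
b       = 0.0
print(svm_decision(np.array([0.9, 1.1]), X_train, y_train, alpha, b))   # -> 1.0
```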
An Introduction to Support Vector Machines
(Slides: CSE 802, prepared by Martin Law.)

Related topics: kernel methods, large margin classifiers, reproducing kernel Hilbert space, Gaussian processes.

Two Class Problem: Linearly Separable Case
- Many decision boundaries can separate these two classes. Which one should we choose?

The Optimization Problem
- Transform the problem to its dual; this is a quadratic programming (QP) problem.
- A global maximum of the α_i can always be found, and w can be recovered from the α_i.

Characteristics of the Solution
- Many of the α_i are zero.

Example Transformation
- Consider the following transformation from input space to feature space, and define the kernel function K(x, y) accordingly.
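The details of the "Example Transformation" slide did not survive extraction, so as a hedged illustration here is the transformation usually shown in this context: the feature map Φ(x) = (x₁², √2·x₁x₂, x₂²) from 2-D input space to 3-D feature space, for which the degree-2 polynomial kernel K(x, y) = (x·y)² equals the feature-space inner product Φ(x)·Φ(y). The check below is a sketch, not material recovered from the slides.

```python
import numpy as np

def phi(x):
    """Feature map from 2-D input space to 3-D feature space."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, y):
    """Degree-2 polynomial kernel (no bias term)."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
# The kernel computes the feature-space inner product without forming phi explicitly.
print(K(x, y), np.dot(phi(x), phi(y)))   # both print 1.0
```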
Support vector machine: A tool for mapping mineral prospectivity
Support vector machine:A tool for mapping mineral prospectivityRenguang Zuo a,n,Emmanuel John M.Carranza ba State Key Laboratory of Geological Processes and Mineral Resources,China University of Geosciences,Wuhan430074;Beijing100083,Chinab Department of Earth Systems Analysis,Faculty of Geo-Information Science and Earth Observation(ITC),University of Twente,Enschede,The Netherlandsa r t i c l e i n f oArticle history:Received17May2010Received in revised form3September2010Accepted25September2010Keywords:Supervised learning algorithmsKernel functionsWeights-of-evidenceTurbidite-hosted AuMeguma Terraina b s t r a c tIn this contribution,we describe an application of support vector machine(SVM),a supervised learningalgorithm,to mineral prospectivity mapping.The free R package e1071is used to construct a SVM withsigmoid kernel function to map prospectivity for Au deposits in western Meguma Terrain of Nova Scotia(Canada).The SVM classification accuracies of‘deposit’are100%,and the SVM classification accuracies ofthe‘non-deposit’are greater than85%.The SVM classifications of mineral prospectivity have5–9%lowertotal errors,13–14%higher false-positive errors and25–30%lower false-negative errors compared tothose of the WofE prediction.The prospective target areas predicted by both SVM and WofE reflect,nonetheless,controls of Au deposit occurrence in the study area by NE–SW trending anticlines andcontact zones between Goldenville and Halifax Formations.The results of the study indicate theusefulness of SVM as a tool for predictive mapping of mineral prospectivity.&2010Elsevier Ltd.All rights reserved.1.IntroductionMapping of mineral prospectivity is crucial in mineral resourcesexploration and mining.It involves integration of information fromdiverse geoscience datasets including geological data(e.g.,geologicalmap),geochemical data(e.g.,stream sediment geochemical data),geophysical data(e.g.,magnetic data)and remote sensing data(e.g.,multispectral satellite data).These sorts of data can be visualized,processed and analyzed with the support of computer and GIStechniques.Geocomputational techniques for mapping mineral pro-spectivity include weights of evidence(WofE)(Bonham-Carter et al.,1989),fuzzy WofE(Cheng and Agterberg,1999),logistic regression(Agterberg and Bonham-Carter,1999),fuzzy logic(FL)(Ping et al.,1991),evidential belief functions(EBF)(An et al.,1992;Carranza andHale,2003;Carranza et al.,2005),neural networks(NN)(Singer andKouda,1996;Porwal et al.,2003,2004),a‘wildcat’method(Carranza,2008,2010;Carranza and Hale,2002)and a hybrid method(e.g.,Porwalet al.,2006;Zuo et al.,2009).These techniques have been developed toquantify indices of occurrence of mineral deposit occurrence byintegrating multiple evidence layers.Some geocomputational techni-ques can be performed using popular software packages,such asArcWofE(a free ArcView extension)(Kemp et al.,1999),ArcSDM9.3(afree ArcGIS9.3extension)(Sawatzky et al.,2009),MI-SDM2.50(aMapInfo extension)(Avantra Geosystems,2006),GeoDAS(developedbased on MapObjects,which is an Environmental Research InstituteDevelopment Kit)(Cheng,2000).Other geocomputational techniques(e.g.,FL and NN)can be performed by using R and Matlab.Geocomputational techniques for mineral prospectivity map-ping can be categorized generally into two types–knowledge-driven and data-driven–according to the type of inferencemechanism considered(Bonham-Carter1994;Pan and Harris2000;Carranza2008).Knowledge-driven techniques,such as thosethat apply FL and EBF,are based on expert knowledge 
andexperience about spatial associations between mineral prospec-tivity criteria and mineral deposits of the type sought.On the otherhand,data-driven techniques,such as WofE and NN,are based onthe quantification of spatial associations between mineral pro-spectivity criteria and known occurrences of mineral deposits ofthe type sought.Additional,the mixing of knowledge-driven anddata-driven methods also is used for mapping of mineral prospec-tivity(e.g.,Porwal et al.,2006;Zuo et al.,2009).Every geocomputa-tional technique has advantages and disadvantages,and one or theother may be more appropriate for a given geologic environmentand exploration scenario(Harris et al.,2001).For example,one ofthe advantages of WofE is its simplicity,and straightforwardinterpretation of the weights(Pan and Harris,2000),but thismodel ignores the effects of possible correlations amongst inputpredictor patterns,which generally leads to biased prospectivitymaps by assuming conditional independence(Porwal et al.,2010).Comparisons between WofE and NN,NN and LR,WofE,NN and LRfor mineral prospectivity mapping can be found in Singer andKouda(1999),Harris and Pan(1999)and Harris et al.(2003),respectively.Mapping of mineral prospectivity is a classification process,because its product(i.e.,index of mineral deposit occurrence)forevery location is classified as either prospective or non-prospectiveaccording to certain combinations of weighted mineral prospec-tivity criteria.There are two types of classification techniques.Contents lists available at ScienceDirectjournal homepage:/locate/cageoComputers&Geosciences0098-3004/$-see front matter&2010Elsevier Ltd.All rights reserved.doi:10.1016/j.cageo.2010.09.014n Corresponding author.E-mail addresses:zrguang@,zrguang1981@(R.Zuo).Computers&Geosciences](]]]])]]]–]]]One type is known as supervised classification,which classifies mineral prospectivity of every location based on a training set of locations of known deposits and non-deposits and a set of evidential data layers.The other type is known as unsupervised classification, which classifies mineral prospectivity of every location based solely on feature statistics of individual evidential data layers.A support vector machine(SVM)is a model of algorithms for supervised classification(Vapnik,1995).Certain types of SVMs have been developed and applied successfully to text categorization, handwriting recognition,gene-function prediction,remote sensing classification and other studies(e.g.,Joachims1998;Huang et al.,2002;Cristianini and Scholkopf,2002;Guo et al.,2005; Kavzoglu and Colkesen,2009).An SVM performs classification by constructing an n-dimensional hyperplane in feature space that optimally separates evidential data of a predictor variable into two categories.In the parlance of SVM literature,a predictor variable is called an attribute whereas a transformed attribute that is used to define the hyperplane is called a feature.The task of choosing the most suitable representation of the target variable(e.g.,mineral prospectivity)is known as feature selection.A set of features that describes one case(i.e.,a row of predictor values)is called a feature vector.The feature vectors near the hyperplane are the support feature vectors.The goal of SVM modeling is tofind the optimal hyperplane that separates clusters of feature vectors in such a way that feature vectors representing one category of the target variable (e.g.,prospective)are on one side of the plane and feature vectors representing the other category of the target 
variable(e.g.,non-prospective)are on the other size of the plane.A good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both categories,since in general the larger the margin the better the generalization error of the classifier.In this paper,SVM is demonstrated as an alternative tool for integrating multiple evidential variables to map mineral prospectivity.2.Support vector machine algorithmsSupport vector machines are supervised learning algorithms, which are considered as heuristic algorithms,based on statistical learning theory(Vapnik,1995).The classical task of a SVM is binary (two-class)classification.Suppose we have a training set composed of l feature vectors x i A R n,where i(¼1,2,y,n)is the number of feature vectors in training samples.The class in which each sample is identified to belong is labeled y i,which is equal to1for one class or is equal toÀ1for the other class(i.e.y i A{À1,1})(Huang et al., 2002).If the two classes are linearly separable,then there exists a family of linear separators,also called separating hyperplanes, which satisfy the following set of equations(KavzogluandFig.1.Support vectors and optimum hyperplane for the binary case of linearly separable data sets.Table1Experimental data.yer A Layer B Layer C Layer D Target yer A Layer B Layer C Layer D Target1111112100000 2111112200000 3111112300000 4111112401000 5111112510000 6111112600000 7111112711100 8111112800000 9111012900000 10111013000000 11101113111100 12111013200000 13111013300000 14111013400000 15011013510000 16101013600000 17011013700000 18010113811100 19010112900000 20101014010000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]2Colkesen,2009)(Fig.1):wx iþb Zþ1for y i¼þ1wx iþb rÀ1for y i¼À1ð1Þwhich is equivalent toy iðwx iþbÞZ1,i¼1,2,...,nð2ÞThe separating hyperplane can then be formalized as a decision functionfðxÞ¼sgnðwxþbÞð3Þwhere,sgn is a sign function,which is defined as follows:sgnðxÞ¼1,if x400,if x¼0À1,if x o08><>:ð4ÞThe two parameters of the separating hyperplane decision func-tion,w and b,can be obtained by solving the following optimization function:Minimize tðwÞ¼12J w J2ð5Þsubject toy Iððwx iÞþbÞZ1,i¼1,...,lð6ÞThe solution to this optimization problem is the saddle point of the Lagrange functionLðw,b,aÞ¼1J w J2ÀX li¼1a iðy iððx i wÞþbÞÀ1Þð7Þ@ @b Lðw,b,aÞ¼0@@wLðw,b,aÞ¼0ð8Þwhere a i is a Lagrange multiplier.The Lagrange function is minimized with respect to w and b and is maximized with respect to a grange multipliers a i are determined by the following optimization function:MaximizeX li¼1a iÀ12X li,j¼1a i a j y i y jðx i x jÞð9Þsubject toa i Z0,i¼1,...,l,andX li¼1a i y i¼0ð10ÞThe separating rule,based on the optimal hyperplane,is the following decision function:fðxÞ¼sgnX li¼1y i a iðxx iÞþb!ð11ÞMore details about SVM algorithms can be found in Vapnik(1995) and Tax and Duin(1999).3.Experiments with kernel functionsFor spatial geocomputational analysis of mineral exploration targets,the decision function in Eq.(3)is a kernel function.The choice of a kernel function(K)and its parameters for an SVM are crucial for obtaining good results.The kernel function can be usedTable2Errors of SVM classification using linear kernel functions.l Number ofsupportvectors Testingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.2580.00.00.0180.00.00.0 1080.00.00.0 10080.00.00.0 100080.00.00.0Table3Errors of SVM classification using polynomial kernel functions when d¼3and r¼0. 
l Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.25120.00.00.0160.00.00.01060.00.00.010060.00.00.0 100060.00.00.0Table4Errors of SVM classification using polynomial kernel functions when l¼0.25,r¼0.d Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)11110.00.0 5.010290.00.00.0100230.045.022.5 1000200.090.045.0Table5Errors of SVM classification using polynomial kernel functions when l¼0.25and d¼3.r Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0120.00.00.01100.00.00.01080.00.00.010080.00.00.0 100080.00.00.0Table6Errors of SVM classification using radial kernel functions.l Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.25140.00.00.01130.00.00.010130.00.00.0100130.00.00.0 1000130.00.00.0Table7Errors of SVM classification using sigmoid kernel functions when r¼0.l Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.25400.00.00.01400.035.017.510400.0 6.0 3.0100400.0 6.0 3.0 1000400.0 6.0 3.0R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]3to construct a non-linear decision boundary and to avoid expensive calculation of dot products in high-dimensional feature space.The four popular kernel functions are as follows:Linear:Kðx i,x jÞ¼l x i x j Polynomial of degree d:Kðx i,x jÞ¼ðl x i x jþrÞd,l40Radial basis functionðRBFÞ:Kðx i,x jÞ¼exp fÀl99x iÀx j992g,l40 Sigmoid:Kðx i,x jÞ¼tanhðl x i x jþrÞ,l40ð12ÞThe parameters l,r and d are referred to as kernel parameters. The parameter l serves as an inner product coefficient in the polynomial function.In the case of the RBF kernel(Eq.(12)),l determines the RBF width.In the sigmoid kernel,l serves as an inner product coefficient in the hyperbolic tangent function.The parameter r is used for kernels of polynomial and sigmoid types. 
The parameter d is the degree of a polynomial function.We performed some experiments to explore the performance of the parameters used in a kernel function.The dataset used in the experiments(Table1),which are derived from the study area(see below),were compiled according to the requirementfor Fig.2.Simplified geological map in western Meguma Terrain of Nova Scotia,Canada(after,Chatterjee1983;Cheng,2008).Table8Errors of SVM classification using sigmoid kernel functions when l¼0.25.r Number ofSupportVectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0400.00.00.01400.00.00.010400.00.00.0100400.00.00.01000400.00.00.0R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]4classification analysis.The e1071(Dimitriadou et al.,2010),a freeware R package,was used to construct a SVM.In e1071,the default values of l,r and d are1/(number of variables),0and3,respectively.From the study area,we used40geological feature vectors of four geoscience variables and a target variable for classification of mineral prospec-tivity(Table1).The target feature vector is either the‘non-deposit’class(or0)or the‘deposit’class(or1)representing whether mineral exploration target is absent or present,respectively.For‘deposit’locations,we used the20known Au deposits.For‘non-deposit’locations,we randomly selected them according to the following four criteria(Carranza et al.,2008):(i)non-deposit locations,in contrast to deposit locations,which tend to cluster and are thus non-random, must be random so that multivariate spatial data signatures are highly non-coherent;(ii)random non-deposit locations should be distal to any deposit location,because non-deposit locations proximal to deposit locations are likely to have similar multivariate spatial data signatures as the deposit locations and thus preclude achievement of desired results;(iii)distal and random non-deposit locations must have values for all the univariate geoscience spatial data;(iv)the number of distal and random non-deposit locations must be equaltoFig.3.Evidence layers used in mapping prospectivity for Au deposits(from Cheng,2008):(a)and(b)represent optimum proximity to anticline axes(2.5km)and contacts between Goldenville and Halifax formations(4km),respectively;(c)and(d)represent,respectively,background and anomaly maps obtained via S-Afiltering of thefirst principal component of As,Cu,Pb and Zn data.R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]5the number of deposit locations.We used point pattern analysis (Diggle,1983;2003;Boots and Getis,1988)to evaluate degrees of spatial randomness of sets of non-deposit locations and tofind distance from any deposit location and corresponding probability that one deposit location is situated next to another deposit location.In the study area,we found that the farthest distance between pairs of Au deposits is71km,indicating that within that distance from any deposit location in there is100%probability of another deposit location. 
However,few non-deposit locations can be selected beyond71km of the individual Au deposits in the study area.Instead,we selected random non-deposit locations beyond11km from any deposit location because within this distance from any deposit location there is90% probability of another deposit location.When using a linear kernel function and varying l from0.25to 1000,the number of support vectors and the testing errors for both ‘deposit’and‘non-deposit’do not vary(Table2).In this experiment the total error of classification is0.0%,indicating that the accuracy of classification is not sensitive to the choice of l.With a polynomial kernel function,we tested different values of l, d and r as follows.If d¼3,r¼0and l is increased from0.25to1000,the number of support vectors decreases from12to6,but the testing errors for‘deposit’and‘non-deposit’remain nil(Table3).If l¼0.25, r¼0and d is increased from1to1000,the number of support vectors firstly increases from11to29,then decreases from23to20,the testing error for‘non-deposit’decreases from10.0%to0.0%,whereas the testing error for‘deposit’increases from0.0%to90%(Table4). In this experiment,the total error of classification is minimum(0.0%) when d¼10(Table4).If l¼0.25,d¼3and r is increased from 0to1000,the number of support vectors decreases from12to8,but the testing errors for‘deposit’and‘non-deposit’remain nil(Table5).When using a radial kernel function and varying l from0.25to 1000,the number of support vectors decreases from14to13,but the testing errors of‘deposit’and‘non-deposit’remain nil(Table6).With a sigmoid kernel function,we experimented with different values of l and r as follows.If r¼0and l is increased from0.25to1000, the number of support vectors is40,the testing errors for‘non-deposit’do not change,but the testing error of‘deposit’increases from 0.0%to35.0%,then decreases to6.0%(Table7).In this experiment,the total error of classification is minimum at0.0%when l¼0.25 (Table7).If l¼0.25and r is increased from0to1000,the numbers of support vectors and the testing errors of‘deposit’and‘non-deposit’do not change and the total error remains nil(Table8).The results of the experiments demonstrate that,for the datasets in the study area,a linear kernel function,a polynomial kernel function with d¼3and r¼0,or l¼0.25,r¼0and d¼10,or l¼0.25and d¼3,a radial kernel function,and a sigmoid kernel function with r¼0and l¼0.25are optimal kernel functions.That is because the testing errors for‘deposit’and‘non-deposit’are0%in the SVM classifications(Tables2–8).Nevertheless,a sigmoid kernel with l¼0.25and r¼0,compared to all the other kernel functions,is the most optimal kernel function because it uses all the input support vectors for either‘deposit’or‘non-deposit’(Table1)and the training and testing errors for‘deposit’and‘non-deposit’are0% in the SVM classification(Tables7and8).4.Prospectivity mapping in the study areaThe study area is located in western Meguma Terrain of Nova Scotia,Canada.It measures about7780km2.The host rock of Au deposits in this area consists of Cambro-Ordovician low-middle grade metamorphosed sedimentary rocks and a suite of Devonian aluminous granitoid intrusions(Sangster,1990;Ryan and Ramsay, 1997).The metamorphosed sedimentary strata of the Meguma Group are the lower sand-dominatedflysch Goldenville Formation and the upper shalyflysch Halifax Formation occurring in the central part of the study area.The igneous rocks occur mostly in the northern part of the study area(Fig.2).In this area,20turbidite-hosted Au deposits and 
occurrences (Ryan and Ramsay,1997)are found in the Meguma Group, especially near the contact zones between Goldenville and Halifax Formations(Chatterjee,1983).The major Au mineralization-related geological features are the contact zones between Gold-enville and Halifax Formations,NE–SW trending anticline axes and NE–SW trending shear zones(Sangster,1990;Ryan and Ramsay, 1997).This dataset has been used to test many mineral prospec-tivity mapping algorithms(e.g.,Agterberg,1989;Cheng,2008). More details about the geological settings and datasets in this area can be found in Xu and Cheng(2001).We used four evidence layers(Fig.3)derived and used by Cheng (2008)for mapping prospectivity for Au deposits in the yers A and B represent optimum proximity to anticline axes(2.5km) and optimum proximity to contacts between Goldenville and Halifax Formations(4km),yers C and D represent variations in geochemical background and anomaly,respectively, as modeled by multifractalfilter mapping of thefirst principal component of As,Cu,Pb,and Zn data.Details of how the four evidence layers were obtained can be found in Cheng(2008).4.1.Training datasetThe application of SVM requires two subsets of training loca-tions:one training subset of‘deposit’locations representing presence of mineral deposits,and a training subset of‘non-deposit’locations representing absence of mineral deposits.The value of y i is1for‘deposits’andÀ1for‘non-deposits’.For‘deposit’locations, we used the20known Au deposits(the sixth column of Table1).For ‘non-deposit’locations(last column of Table1),we obtained two ‘non-deposit’datasets(Tables9and10)according to the above-described selection criteria(Carranza et al.,2008).We combined the‘deposits’dataset with each of the two‘non-deposit’datasets to obtain two training datasets.Each training dataset commonly contains20known Au deposits but contains different20randomly selected non-deposits(Fig.4).4.2.Application of SVMBy using the software e1071,separate SVMs both with sigmoid kernel with l¼0.25and r¼0were constructed using the twoTable9The value of each evidence layer occurring in‘non-deposit’dataset1.yer A Layer B Layer C Layer D100002000031110400005000061000700008000090100 100100 110000 120000 130000 140000 150000 160100 170000 180000 190100 200000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]] 6training datasets.With training dataset1,the classification accuracies for‘non-deposits’and‘deposits’are95%and100%, respectively;With training dataset2,the classification accuracies for‘non-deposits’and‘deposits’are85%and100%,respectively.The total classification accuracies using the two training datasets are97.5%and92.5%,respectively.The patterns of the predicted prospective target areas for Au deposits(Fig.5)are defined mainly by proximity to NE–SW trending anticlines and proximity to contact zones between Goldenville and Halifax Formations.This indicates that‘geology’is better than‘geochemistry’as evidence of prospectivity for Au deposits in this area.With training dataset1,the predicted prospective target areas occupy32.6%of the study area and contain100%of the known Au deposits(Fig.5a).With training dataset2,the predicted prospec-tive target areas occupy33.3%of the study area and contain95.0% of the known Au deposits(Fig.5b).In contrast,using the same datasets,the prospective target areas predicted via WofE occupy 19.3%of study area and contain70.0%of the known Au deposits (Cheng,2008).The error matrices for two SVM classifications show that the type1(false-positive)and 
type2(false-negative)errors based on training dataset1(Table11)and training dataset2(Table12)are 32.6%and0%,and33.3%and5%,respectively.The total errors for two SVM classifications are16.3%and19.15%based on training datasets1and2,respectively.In contrast,the type1and type2 errors for the WofE prediction are19.3%and30%(Table13), respectively,and the total error for the WofE prediction is24.65%.The results show that the total errors of the SVM classifications are5–9%lower than the total error of the WofE prediction.The 13–14%higher false-positive errors of the SVM classifications compared to that of the WofE prediction suggest that theSVMFig.4.The locations of‘deposit’and‘non-deposit’.Table10The value of each evidence layer occurring in‘non-deposit’dataset2.yer A Layer B Layer C Layer D110102000030000411105000060110710108000091000101110111000120010131000140000150000161000171000180010190010200000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]7classifications result in larger prospective areas that may not contain undiscovered deposits.However,the 25–30%higher false-negative error of the WofE prediction compared to those of the SVM classifications suggest that the WofE analysis results in larger non-prospective areas that may contain undiscovered deposits.Certainly,in mineral exploration the intentions are notto miss undiscovered deposits (i.e.,avoid false-negative error)and to minimize exploration cost in areas that may not really contain undiscovered deposits (i.e.,keep false-positive error as low as possible).Thus,results suggest the superiority of the SVM classi-fications over the WofE prediction.5.ConclusionsNowadays,SVMs have become a popular geocomputational tool for spatial analysis.In this paper,we used an SVM algorithm to integrate multiple variables for mineral prospectivity mapping.The results obtained by two SVM applications demonstrate that prospective target areas for Au deposits are defined mainly by proximity to NE–SW trending anticlines and to contact zones between the Goldenville and Halifax Formations.In the study area,the SVM classifications of mineral prospectivity have 5–9%lower total errors,13–14%higher false-positive errors and 25–30%lower false-negative errors compared to those of the WofE prediction.These results indicate that SVM is a potentially useful tool for integrating multiple evidence layers in mineral prospectivity mapping.Table 11Error matrix for SVM classification using training dataset 1.Known All ‘deposits’All ‘non-deposits’TotalPrediction ‘Deposit’10032.6132.6‘Non-deposit’067.467.4Total100100200Type 1(false-positive)error ¼32.6.Type 2(false-negative)error ¼0.Total error ¼16.3.Note :Values in the matrix are percentages of ‘deposit’and ‘non-deposit’locations.Table 12Error matrix for SVM classification using training dataset 2.Known All ‘deposits’All ‘non-deposits’TotalPrediction ‘Deposits’9533.3128.3‘Non-deposits’566.771.4Total100100200Type 1(false-positive)error ¼33.3.Type 2(false-negative)error ¼5.Total error ¼19.15.Note :Values in the matrix are percentages of ‘deposit’and ‘non-deposit’locations.Table 13Error matrix for WofE prediction.Known All ‘deposits’All ‘non-deposits’TotalPrediction ‘Deposit’7019.389.3‘Non-deposit’3080.7110.7Total100100200Type 1(false-positive)error ¼19.3.Type 2(false-negative)error ¼30.Total error ¼24.65.Note :Values in the matrix are percentages of ‘deposit’and ‘non-deposit’locations.Fig.5.Prospective targets area for Au deposits delineated by SVM.(a)and (b)are obtained using training dataset 1and 
2, respectively.
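The study above builds its classifier with the R package e1071; as a rough, non-authoritative sketch of the best-performing configuration it reports (sigmoid kernel with λ = 0.25 and r = 0), a scikit-learn equivalent is shown below. The evidence values are invented placeholders shaped like the paper's Table 1, and using scikit-learn (with gamma playing the role of λ and coef0 the role of r) is an assumption of this sketch, not part of the original study.

```python
import numpy as np
from sklearn.svm import SVC

# Four binary evidence layers (A-D) per location plus a deposit / non-deposit label;
# the values below are made up for illustration only.
X = np.array([[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0],
              [1, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 0, 1]])
y = np.array([1, 1, 1, -1, -1, -1, 1, -1])   # 1 = 'deposit', -1 = 'non-deposit'

# Sigmoid kernel K(xi, xj) = tanh(gamma * xi.xj + coef0).
clf = SVC(kernel="sigmoid", gamma=0.25, coef0=0.0, C=1.0)
clf.fit(X, y)
print(clf.predict([[1, 1, 0, 0]]))
```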
Optimizing SVM Parameters with a Genetic Algorithm

A genetic algorithm (GA) is an optimization algorithm based on the theory of natural adaptive evolution. By simulating the evolutionary process found in nature, it evolves and selects individuals through genetic operators (crossover and mutation) in order to find an optimal solution.

The support vector machine (SVM) is a very effective classification algorithm that constructs a hyperplane separating samples of different classes by finding the most representative sample points in the data set. Optimizing the SVM parameters can improve classification accuracy and stability.

The general steps for optimizing SVM parameters with a genetic algorithm are as follows:
1. Define the optimization objective: first make explicit which SVM parameters are to be optimized, such as the penalty factor C, the kernel type and its parameters, and the slack variables; these parameters affect model performance.
2. Design the gene encoding: map the parameters to be optimized to a gene encoding, which may be binary, integer, or floating-point. For example, a parameter whose range is [0, 1] can use a floating-point encoding.
3. Initialize the population: randomly generate an initial population in which each individual represents one combination of SVM parameter values.
4. Evaluate fitness: train each individual on the training set and compute its accuracy (or another metric) on the test set as that individual's fitness.
5. Selection: choose the better individuals for genetic operations, using fitness ranking, roulette-wheel selection, or a similar strategy.
6. Crossover: perform crossover among the selected individuals to generate new individuals. Single-point, multi-point, or uniform crossover strategies may be used.
7. Mutation: apply mutation to the newly generated individuals, introducing random perturbations to increase population diversity. A mutation may change the value of a gene or regenerate it at random.
8. Update the population: merge the individuals produced by crossover and mutation into the population.
9. Repeat steps 4-8 until a termination condition is met (for example, the maximum number of iterations is reached or the population fitness no longer improves).
10. Select the best individual: take the fittest individual in the final population as the optimal solution, i.e., the optimal SVM parameters.

Through these steps, the genetic algorithm searches the parameter space and finds an optimal solution. By trying different parameter combinations, the performance of the SVM model can be optimized. Note that the above is only the general procedure for optimizing SVM parameters with a genetic algorithm; in practice it may be adjusted to suit the specific problem. Other optimization techniques (such as local search) can also be introduced to further improve search efficiency.
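As a concrete, deliberately small illustration of steps 1-10 above (not a prescription), the sketch below uses a genetic algorithm to tune the penalty factor C and the RBF kernel parameter gamma of a scikit-learn SVM. The dataset, population size, mutation scale, and log2 encoding are all arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)

def fitness(ind):
    """Cross-validated accuracy of an SVM whose genes are (log2 C, log2 gamma)."""
    C, gamma = 2.0 ** ind[0], 2.0 ** ind[1]
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

pop = rng.uniform(-5, 5, size=(10, 2))                   # steps 2-3: encoding, initial population
for generation in range(15):                             # step 9: iterate until the budget is spent
    scores = np.array([fitness(ind) for ind in pop])     # step 4: fitness evaluation
    parents = pop[np.argsort(scores)[-5:]]               # step 5: keep the best half
    children = []
    for _ in range(5):
        a, b = parents[rng.integers(5)], parents[rng.integers(5)]
        child = np.where(rng.random(2) < 0.5, a, b)      # step 6: uniform crossover
        child = child + rng.normal(0.0, 0.5, size=2)     # step 7: Gaussian mutation
        children.append(child)
    pop = np.vstack([parents, children])                 # step 8: update the population

best = max(pop, key=fitness)                             # step 10: best individual
print("C = %.3f, gamma = %.5f" % (2.0 ** best[0], 2.0 ** best[1]))
```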
Learning SVM from Scratch — Support Vector Machine Series, Part 1

This article was originally written by 耳东陈 and first appeared as the author's Zhihu article. AI 研习社 has obtained permission to republish it.

If you are a graduate student in pattern recognition, or a machine learning enthusiast, SVM is a topic you cannot avoid. If you simply have a pile of data that you need an SVM to process, then Matlab's SVM toolbox, LIBSVM, or scikit-learn under Python all provide convenient and fast solutions. But if what you are after is not merely being able to use it, but the challenge of actually understanding it, then you will have to face a host of terms you may never have heard of, such as optimization under nonlinear constraints, the KKT conditions, Lagrangian duality, maximum margin, optimal lower bounds, kernel functions, and so on. These terms usually come wrapped in formulas that read like arcane scripture. If you have a little mathematical background, you may understand each formula on its own, but how one formula leads to the next can be baffling; and what is most confusing is often not the derivation itself, but how to connect these formulas with the geometric picture in your head.

I myself am one of the victims of these problems. I have gone through many books and materials on SVM, but I never found one source that analyzes and explains SVM completely in terms of formula derivation, theoretical introduction, systematic analysis, variable definitions, and algebraic and geometric meaning. In other words, for an ordinary first-year graduate student who is not a mathematics major, understanding SVM requires collecting many materials, reading them side by side, and thinking hard before the algorithm can be understood reasonably thoroughly. Since I teach the course "Pattern Recognition Technology and Applications" to first-year master's students at Northeastern University, I hope to put together a relatively complete, simple, and thorough introduction to the SVM algorithm so that students can understand it quickly and accurately.

Below I introduce the most basic linear SVM problem in four steps: 1) the problem prototype, 2) the mathematical model, 3) the optimization solution, and 4) the geometric interpretation. I use the simplest possible language and the most basic mathematics, and I hope this helps students who are confused by the SVM algorithm.

What problem does SVM solve? SVM stands for Support Vector Machine. It is mainly used to solve data classification problems in the field of pattern recognition and belongs to the family of supervised learning algorithms.
SVM Aggregation SVM, SVM Ensemble, SVM Classification Tree
SVM Aggregation: SVM, SVM Ensemble, SVM Classification TreeBy Shaoning Pang1. IntroductionSupport Vector Machine (SVM), since first proposed by Vapnik and his group at AT\&T laboratory has been extensively studied and discussed to develop its working principle of classification and regression. As a result, different types of SVM and SVM extensions [1] have been proposed. Suykens introduced the quadratic cost function in SVM and proposed LSSVM (Least Squares Support Vector Machine). Mangasarian et al. used an implicit Lagrangian reformulation in SVM, and proposed LSVM (Lagrangian Support Vector Machine) and NSVM (Newton Support Vector Machine). Later, Lee and Mangasarian used a smooth unconstrained optimization in SVM, and had SSVM (Smooth Support Vector Machine). Recently, new interesting SVM models were published, such as, Chun-fu Lin's FSVM (Fuzzy Support Vector Machine). Zhang et al proposed HSSVMs (Hidden Space Support Vector Machines). Shilton et al. proposed an incremental version of SVM. All these SVM types have significantly enhanced the original SVM performance. Most importantly, they have applied the original SVM to suit different real application needs.SVM aggregation, as an alternative aspect of SVM study, specializes on combining a family of concurrent SVMs for advanced artificial intelligence. The well known SVM aggregation methods are the One-against-all and One-against-one methods. The purpose of such aggregations is to expand SVM binary classification to multi-class classification. A typical procedure of SVM aggregation can be summarized as three steps, SVM model selection, convex aggregation, and aggregation training.Over the last 5 years, I have been working on SVM aggregation, and have developed the original single SVM classification in our previous work, to SVM ensemble for classification, SVM classification tree (including 2-class SVM tree (2-SVMT), and Multi-class SVMT tree (m-SVMT)). Evolving SVM classification tree is an ongoing research topic of adapting SVMTto the incremental learning of data stream by evolving SVM and SVM tree structure.2. SVM EnsembleIn SVM ensemble, individual SVMs are aggregated to make a collective decision in several ways such as the majority voting, least-squares estimation-based weighting, and the double layer hierarchical combing. The training SVM ensemble can be conducted in the way of bagging or boosting. In bagging, each individual SVM is trained independently using the randomly chosen training samples via a boostrap technique. In boosting, each individual SVM is trained using the training samples chosen according to the sample’s probability distribution that is updated in proportion to the error in the sample. SVM ensemble is essentially a type of cross-validation optimization of single SVM, having a more stable classification performance than other models. The details on SVM ensemble construction and application are described in [2,3].3. 2-class SVM Tree (2-SVMT)The principle of SVMT is to encapsulate a number of binary SVMs into a multi-layer hierarchy by adapting a "divide and conquer" strategy. The benefits of SVMT model can be summarized as: (1) SVMT is capable of effectively reducing classification difficulties from class mixture and overlap through a supervised LLE data partitioning.(2) Importantly, SVMT outperforms single SVM and SVM ensemble on the robustness to class imbalance.A 2-class SVM tree can be modeled under the ‘depth first’ policy. 
The employed partitioning function for depth first is a binary data splitting whose targeting function is to partition all samples of class 1 into one cluster and all samples of class 2 into the other cluster. 2-SVMT of this type is particularly useful for the 2-class task with serious class overlap. Fig 1 shows an example of 2-class SVM binary tree over a face membership authentication [3,4] case with 30 of 271 persons as membership group.Fig. 1.Example of 2-class SVM binary treeAlternatively, a 2-class SVM tree also can be modeled under the ‘width first’ policy, where the employed partitioning function is a multiple data splitting, and the targeting function for partitioning here is to steer data samples in the same cluster with the same class label. A multiple data splitting is capable of controlling the size of the tree to a limited size, which is very optimal for decision making in such a tree structural model. Fig 3 gives an example of 2-class SVM multiple tree over the same case of face membership authentication as Fig. 2.Fig. 2.Example of 2-class SVM multiple tree.3. multi-class SVM Tree (m -SVMT)The above SVMTs are merely two-class SVM classification tree (2-SVMT) model, which are not sustainable for normal multi-class classification tasks. However in real application, class imbalance of multi-class problem is also a critical challenge for most classifiers, thus it is desirable to develop a multi-class SVM tree with the above properties of 2-SVMT.Fig. 3.Example m-SVMT over a 3 class taskThe construction of m-SVMT [9] is to decompose an m -class task into a certain number of 1-m classes regional tasks, under the criterion of minimizing SVM tree loss function. The proposed m-SVMT is demonstrated very competitive in discriminability as compared to other typical classifiers such as single SVMs, C4.5, K-NN, and MLP neural network, and particularly has a superior robustness to class imbalance, which other classifiers can not match. Fig.3 gives an example of m-SVMT for a 3-class task.4. Evolving SVMT, an ongoing research topicLearning over datasets in real-world application, we often confront difficult situations where a complete set of training samples is not given in advance. Actually in most of cases, data is being presented as a data stream where we can not know what kind of data, even what class of data, is coming in the future. Obviously, one-pass incremental learning gives a method to deal with such data streams [8,9].For the needs of incremental learning over data stream, I am working to realize a concept of evolving SVM classification tree (eSVMT) for the classification of streaming data, where chunks of data is being presented at different time. The constructed eSVMT is supposed to be capable of accommodating new data by adjusting SVM classification tree as in the simulation shown in Fig. 4.Fig.4. A simulation of evolving SVM classification treeT=1 T=2 T=3T=4T=5The difficulty for eSVMT modelling is, (1) eSVMT needs to acquire knowledge with a single presentation of training data, and retaining the knowledge acquired in the past without keeping a large number of training samples in memory. (2) eSVMT needs to accommodate new data continuously, while always keeping a good size of SVM tree structure, and a good classification in real time.Acknowledgement:The research reported in the article was partially funded by the ministry of Education, South Korea under the program of BK21, and the New Zealand Foundation for Research, Science and Technology under the grant: NERF/AUTX02-01. 
Also the author would like to present the thanks to Prof. Nik Kasabov of Auckland University of Technology, Prof. S. Y.Bang, and Prof. Dajin Kim of Pohang University of Science and Technology, for their support and supervision during 2001 to 2005, when most of the reported research in this article was carried out.References:1.;; /dmi/2.Hyun-Chul Kim, Shaoning Pang, Hong-Mo Je, Daijin Kim, Sung Yang Bang: Constructing support vector machine ensemble. Pattern Recognition vol. 36, no. 12, pp. 2757-2767, 20033.Shaoning Pang, D. Kim, S. Y. Bang, Membership authentication in the dynamic group by face classification using SVM ensemble. Pattern Recognition Letters, vol. 24, no. (1-3), pp. 215-225, 2003.4.Shaoning Pang, D. Kim, S. Y. Bang, Face Membership Authentication Using SVM Classification Tree Generated by Membership-based LLE Data Partition, IEEE Trans. on Neural Network, vol. 16 no. 2, pp. 436-446, 2005.5.Shaoning Pang, Constructing SVM Multiple Tree for Face Membership Authentication. ICBA 2004, Lecture Notes in Computer Science 3072, pp. 37-43, Springer 2004.6.Shaoning Pang, Seiichi Ozawa, Nikola Kasabov, One-pass Incremental Membership Authentication by Face Classification. ICBA 2004, Lecture Notes in Computer Science 3072, pp. 155-161, Springer 2004.7.Shaoning Pang, and Nikola Kasabov, Multi-Class SVM Classification Tree, (submitted), 2005.8. Shaoning Pang, Seiichi Ozawa and Nik Kasabov, Incremental Linear Discriminant Analysis for Classification of Data Streams ,IEEE Trans. on System, Man, and Cybernetics-Part B, vol. 35, no. 5, pp. 905 – 914, 20059.Seiichi Ozawa, Soon Toh, Shigeo Abe, Shaoning Pang and Nikola Kasabov, Incremental Learning for Online Face Recognition, Neural Network, vol.18, no. (5-6), pp. 575-584, 2005.Dr. Shaoning PANGKnowledge Engineering & Discovery Research Institute Auckland University of Technology, New Zealand Email: spang@。
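Section 2 above describes bagging-style SVM ensembles combined by majority voting. As a hedged illustration of that idea only (not the author's original implementation), the following scikit-learn sketch trains each SVM on a bootstrap resample and votes over the members; the dataset and all parameter values are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Bagging: each individual SVM is trained on a bootstrap resample of the training set.
members = []
for _ in range(10):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    members.append(SVC(kernel="rbf", C=1.0).fit(X_tr[idx], y_tr[idx]))

# Majority voting over the individual SVM decisions (labels here are 0/1; ties go to 0).
votes = np.stack([m.predict(X_te) for m in members])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("single SVM  :", SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr).score(X_te, y_te))
print("SVM ensemble:", (majority == y_te).mean())
```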
【Support Vector Machines】

Linear discriminant functions and decision surfaces
A linear discriminant function is a function formed as a linear combination of the components of x.

Linear SVM, mathematically
What we know: w·x⁺ + b = +1 and w·x⁻ + b = −1, so w·(x⁺ − x⁻) = 2 and the margin width is M = 2/‖w‖. The classification margin therefore equals 2/‖w‖, so requiring the margin to be maximal means requiring 2/‖w‖ to be maximal (equivalently, ‖w‖ minimal). Requiring the separating surface to classify every sample correctly means requiring

    y_i (w·x_i + b) ≥ 1,  i = 1, …, l.

The sample points for which equality holds are called support vectors.

The theoretical basis of SVM
Because SVM training is ultimately converted into a quadratic programming problem, the SVM solution is the globally unique optimum. SVM shows many distinctive advantages on small-sample, nonlinear, and high-dimensional pattern recognition problems, and it can be extended to other machine learning problems such as function fitting.

The dual problem
First construct the Lagrange function

    J(w, b, α) = (1/2)‖w‖² − Σ_{i=1}^{l} α_i [ y_i (w·x_i + b) − 1 ].

Condition 1: ∂J(w, b, α)/∂w = 0. Condition 2: ∂J(w, b, α)/∂b = 0. Eliminating w and b finally gives

    Q(α) = Σ_{i=1}^{l} α_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j (x_i·x_j),

and the task becomes finding the Lagrange multipliers {α_i}_{i=1}^{l} that maximize the objective Q(α) subject to the constraints α_i ≥ 0 and Σ_{i=1}^{l} α_i y_i = 0.
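To connect the dual solution back to the primal hyperplane, the short check below (an illustrative sketch using scikit-learn, not the slides' own code) verifies that w = Σ_i α_i y_i x_i; in scikit-learn, dual_coef_ stores the products α_i y_i for the support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=1)
clf = SVC(kernel="linear", C=10.0).fit(X, y)

# dual_coef_[0] holds alpha_i * y_i for each support vector.
w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
print(np.allclose(w_from_dual, clf.coef_[0]))   # True: w = sum_i alpha_i y_i x_i
print("bias b =", clf.intercept_[0])
```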
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
Thorsten Joachims
Universitat Dortmund Informatik LS8, Baroper Str. 301
3 Support Vector Machines
Support vector machines are based on the Structural Risk Minimization principle [9] from computational learning theory. The idea of structural risk minimization is to find a hypothesis h for which we can guarantee the lowest true error. The true error of h is the probability that h will make an error on an unseen and randomly selected test example. An upper bound can be used to connect the true error of a hypothesis h with the error of h on the training set and the complexity of H, the hypothesis space containing h, measured by the VC-dimension [9]. Support vector machines find the hypothesis h which approximately minimizes this bound on the true error by effectively and efficiently controlling the VC-dimension of H.
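As a minimal modern sketch of the setup this excerpt discusses (bag-of-words features with a linear SVM), the pipeline below uses scikit-learn rather than the tools of the original paper; the two newsgroup categories are an arbitrary choice, and fetch_20newsgroups downloads the corpus on first use.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

cats = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

# Each document becomes a high-dimensional sparse TF-IDF vector; a linear SVM copes
# well with this many (mostly relevant) features, which is the paper's point.
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
model.fit(train.data, train.target)
print("test accuracy:", model.score(test.data, test.target))
```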
RVM
A Tutorial on Relevance Vector Machine
楊善詠, June 9, 2006

1 Preface
This article mainly introduces the basic concepts and approach of the Relevance Vector Machine (RVM). Because the RVM uses probabilistic methods to overcome the shortcomings of the Support Vector Machine (SVM), I will also introduce some important probability concepts along the way. I assume that readers have the most basic knowledge of machine learning and some understanding of how SVM works.

To avoid confusion, in all mathematical expressions ordinary lowercase italics denote scalars, such as w_i and t_i; bold lowercase denotes vectors, such as x, w, and α; and upright capitals or capital Greek letters denote matrices, such as A, Φ, and Σ. In addition, a capital P(·) denotes a discrete probability distribution, while a lowercase p(·) denotes a continuous probability density.

2 Introduction
Supervised learning means solving the following problem: given a set of vectors {x_i}_{i=1}^{N} and corresponding targets {t_i}_{i=1}^{N} as input, we want to find the relationship between x_i and t_i, so that when a new vector x* arrives we can predict its target t*. Here t_i may be a class label (classification) or an arbitrary real number (regression).

If SVM is used to solve this kind of problem, the mapping between x and t takes the form

    t = y(x; w) = Σ_{i=1}^{N} w_i K(x, x_i) + w_0        (1)

where K(x, x_i) is the chosen kernel function and the w_i are weights. Only when x_i is one of the support vectors is w_i nonzero. Practical implementations show that SVM performs well, so it is used in many applications. However, SVM is not without drawbacks; the following points are often criticized:

• Although the number of support vectors is clearly smaller than the number of training instances, it still grows linearly with the number of training instances. On the one hand this may cause overfitting; on the other hand it wastes computation time.
A Brief Introduction to the Support Vector Machine

Support Vector Machine 簡介, by Biano

1. Data Classification
For a set of data, we sometimes wish to divide the data into two groups according to some of its characteristics. For data classification, a number of methods are already known to work well, for example Nearest Neighbor, neural networks, and decision trees; when used correctly their accuracies are not far apart. The advantage of SVM is that it is comparatively easy to use.

2. The Concept of the Support Vector Machine
For a group of data points in the d-dimensional space R^d, we want to find a hyperplane in that space which splits the data into two groups (i.e., group A and group B), such that the data belonging to group A all lie on one side of the hyperplane and the data of group B all lie on the other side, as in the figure below. Comparing the left and right figures, we can see that for the hyperplane found in the left figure (dashed line), the two parallel hyperplanes tangent to the two groups of data points (solid lines) are close together, whereas the right figure has a larger margin. Since we want a hyperplane that separates the two groups of points as widely as possible, we consider the hyperplane found in the right figure to be the better one.

The problem can therefore be stated briefly as follows: given the training data set {(x_i, y_i)}, i = 1, 2, …, n, with x_i ∈ R^d and y_i ∈ {−1, 1}, we want to use the training data to find an optimal hyperplane H with which unknown points x_i can be classified.

3. Explanation of the SVM Theory: Preliminaries
In the figure above, the solid line is the hyperplane we found, and H1 and H2 are called the support hyperplanes. We want to find the optimal classification hyperplane such that the margin between the two support hyperplanes is as large as possible.
SVM-Based Feature Selection Methods

02 Main categories of SVM-based feature selection methods

Filter-style feature selection based on SVM
Summary: Filter feature selection is a classifier-independent approach that evaluates the importance of features by computing their statistical properties.
Details: An SVM-based filter method usually proceeds as follows: first, compute an importance score for each feature from its weight or correlation; then rank the features by these scores; finally, select the higher-scoring features to form a subset. Because the method does not depend on a particular classifier, it is highly general.
• The basic principle of SVM-based feature selection is to treat feature selection as part of a binary classification problem: an optimal separating hyperplane is constructed to divide the data set into two classes, and features are evaluated, individually or in combination, by their contribution to classification; the feature subset that makes the separating hyperplane optimal (i.e., maximizes the classification margin) is selected. Commonly used SVM-based feature selection methods include penalty-based feature selection, margin-based feature selection, and recursive feature elimination.
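As a small sketch of the recursive feature elimination variant mentioned above (illustrative only; the dataset and the number of features retained are placeholders), scikit-learn's RFE can rank features by the weights of a linear SVM:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# At each round the linear SVM is refit and the feature with the smallest weight
# magnitude is dropped, until 10 features remain (SVM-RFE style; no scaling, for brevity).
selector = RFE(estimator=LinearSVC(C=1.0, max_iter=5000), n_features_to_select=10)
selector.fit(X, y)
print("selected feature indices:", [i for i, keep in enumerate(selector.support_) if keep])
```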
Details (application example): In disease prediction, SVM-based feature selection can help screen, from massive bioinformatics data, the key features associated with the occurrence and development of a disease, providing strong support for precision medicine and personalized treatment. The method has broad application prospects in bioinformatics, medical data analysis, and precision medicine.

05 Future development and research directions of SVM-based feature selection

Shortcomings of existing research and directions for improvement
01 Lack of global consideration in feature selection: existing SVM feature selection methods often attend only to local features and neglect the feature set as a whole, so the selected features may not fully reflect the characteristics of the samples.
02 Lack of multi-perspective consideration in feature selection: existing SVM feature selection methods often consider selection from a single angle and cannot fully reflect the complexity of the samples.
03 Lack of evaluation criteria for feature selection methods.

Scope of application of SVM-based feature selection methods
A Survey of Model Parameter Selection Methods for Support Vector Machines

Computer Knowledge and Technology (电脑知识与技术), Artificial Intelligence and Recognition Technology column, Vol. 6, No. 28 (October 2010). Column editor: Tang Yidong.

FU Yang (1), LI Kun-lun (2)
(1. Information Engineering College of Nanchang University, Nanchang 330031, China; 2. Science and Technology College of Nanchang University, Nanchang 330029, China)

Abstract: The support vector machine is one of the hot research topics in machine learning and data mining. As a technology that is not yet fully mature, it still has many shortcomings, one of which is the lack of a unified standard and theory for model parameter selection. In practical use, the parameters that strongly influence SVM performance include the penalty factor C and the choice of kernel function and its parameters. This paper first analyzes how the model parameters affect SVM performance, then introduces, analyzes, and objectively evaluates several commonly used model parameter selection methods, and finally summarizes the current state of SVM model parameter selection and discusses its development trends.

Keywords: support vector machine; model parameter selection; penalty factor; kernel function; kernel parameter
CLC number: TP181; Document code: A; Article ID: 1009-3044(2010)28-8081-02

The support vector machine (SVM) is a machine learning method developed on the basis of statistical learning theory. It was first proposed by Vapnik and his colleagues in 1992 at a computational learning theory conference, its main content was essentially completed by 1995, and it is still under continuous development [1].
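The most widely used of the parameter-selection methods such a survey covers is exhaustive grid search with cross-validation over the penalty factor C and the kernel parameter; a minimal scikit-learn sketch follows (the grid and dataset are illustrative, not taken from the survey).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Score every (C, gamma) pair on a log2 grid by 5-fold cross-validated accuracy.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [2.0 ** k for k in range(-3, 6)],
                "gamma": [2.0 ** k for k in range(-7, 2)]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```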
IMPROVED SUPPORT VECTOR MACHINE
Patent title: IMPROVED SUPPORT VECTOR MACHINE. Inventor: GATES, Kevin, E. Application number: AU2004001507. Filing date: 2004-10-29. Publication number: WO05/043450P1. Publication date: 2005-05-12.

Abstract: A method for operating a computer as a support vector machine (SVM) in order to define a decision surface separating two opposing classes of a training set of vectors. The method involves associating a distance parameter with each vector of the SVM's training set. The distance parameter indicates a distance from its associated vector, being in a first class, to the opposite class. A number of approaches to calculating distance parameters are provided. For example, a distance parameter may be calculated as the average of the distances from its associated vector to each of the vectors in the opposite class. The method further involves determining a linearly independent set of support vectors from the training set such that the sum of the distances associated with the linearly independent support vectors is minimised.

Applicant: GATES, Kevin, E. (AU). Agent: EAGAR & BUCK PATENT AND TRADE MARK ATTORNEYS.
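To make the patent's "distance parameter" concrete, here is a small illustrative computation of one variant it mentions: for each training vector, the average Euclidean distance to all vectors of the opposite class. The function name and toy data are mine, not the patent's.

```python
import numpy as np

def average_distance_to_opposite_class(X, y):
    """For each row of X, the mean Euclidean distance to all rows of the other class."""
    d = np.zeros(len(X))
    for i, (xi, yi) in enumerate(zip(X, y)):
        other = X[y != yi]
        d[i] = np.linalg.norm(other - xi, axis=1).mean()
    return d

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([1, 1, -1, -1])
print(average_distance_to_opposite_class(X, y))
```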
Twin Support Vector Machine with Universum Data
Twin Support Vector Machine with Universum DataZhiquan Qi a,Yingjie Tian a,∗,Yong Shi a,b,∗a Research Center on Fictitious Economy&Data Science,Chinese Academy of Sciences,Beijing100190,Chinab College of Information Science&Technology,University of Nebraska at Omaha,Omaha,NE68182,USA.AbstractThe Universum,which is defined as the sample not belonging to either class of the classification problem of interest,has been proved to be helpful in supervised learning.In this work,we designed a new Twin Support Vec-tor Machine with Universum(called U-TSVM),which can utilize Universum data to improve the classification performance of TSVM.Unlike U-SVM,in U-TSVM,Universum data are located in a nonparallel insensitive loss tube by using two Hinge Loss functions,which can exploit these prior knowledge embedded in Universum data moreflexible.Empirical experiments demon-strate that U-TSVM can directly improve the classification accuracy of stan-dard TSVM that use the labeled data alone and is superior to U-SVM in most cases.Keywords:Classification,Twin support vector machine,Universum1.IntroductionSupervised learning problem with Universum samples is a new research subject in machine learning.The concept of Universum sample wasfirstly introduced by Weston et al.(2006),owing its name to the intuition that the Universum captures a general backdrop against which a problem at hand is solved.It is defined as the sample not belonging to any of the classes the learning task concerns.For instance,considering the classification of‘5’∗Corresponding authorEmail addresses:qizhiquan@(Zhiquan Qi),tyj@(Yingjie Tian),yshi@(Yong Shi)Preprint submitted to Neural Networks October28,2012Neural Networksagainst‘8’in handwritten digits recognition,‘0’,‘1’,‘2’,‘3’,‘4’,‘5’,‘6’,‘7’,‘9’can be considered as Universum samples.Since it is not required to have the same distribution with the training data,the Universum is able to show some prior information for the possible classifiers.Several works have been done using the Universum samples in machine learning.In Weston et al.(2006) the authors proposed a new Support Vector Machine(SVM)framework, called U-SVM and their experimental results show that U-SVM outperforms those SVMs without considering Universum data.Sinz et al.(2008)gave an analysis of U-SVM.Then they presented a Least Squares(LS)version of the U-SVM algorithm.Zhang et al.(2008)proposed a graph based semi-supervised algorithm,which learns from the labeled data,unlabeled data and the Universum data at the same time.Other literatures also can be found in Chen and Zhang(2009);Cherkassky et al.(2011);Shen et al.(2011). 
Recently,Jayadeva et al.(2007)proposed a twin support vector machine (TSVM)classifier for binary classification,motivated by GEPSVM1(Man-gasarian and Wild(2006)).TSVMs generates two nonparallel planes such that each plane is closer to one of two classes and is at least one distance from the other.It is implemented by solving two smaller Quadratic Programming Problems(QPPs)rather than a single large QPP,which makes the learning speed of TSVM faster than that of a classical SVM.Experimental results in Jayadeva et al.(2007);Kumar and Gopal(2008)show the TSVM outper-forms both standard SVM and GEPSVM in the most case.Some extensions to the TSVM can be found in Shao et al.(2011);Shao and Deng(2011); Kumar and Gopal(2008);Khemchandani et al.(2009);Kumar and Gopal (2009);Ghorai et al.(2009);Peng(2012,2010).Inspired by the success of TSVM,in this paper,we propose a new Twin Support Vector Machines with Universum(called U-TSVM).The proposed U-TSVM has the following compelling properties.♢Except for labeled data from two classes,U-TSVM exploits Universum data as well.All experiments in both Toy data,UCI datasets and TFDS datasets show that the classification accuracy of U-TSVM is better than conventional TSVM algorithms that do not use Universum data.In1In this approach,data points of each class are proximal to one of two nonparallel planes.Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set.Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to a smallest eigenvalue of a generalized eigenvalue problem Mangasarian and Wild(2006).Neural Networksaddtion,to our knowledge,this is thefirst TSVM implementation with Universum data.We also show that the TSVM are the special cases of U-TSVM.This provides an alternative explanation for the success of U-TSVM.♢The area of Universum data is defined by using two Hinge Loss functions.The definition is moreflexible than that of U-SVM,which can more fully exploit the information embedded in Universum data to construct thefinal classifier.Figure2in Section3gave the intuitive geometric interpretations.The remaining parts of the paper are organized as follows.Section2briefly introduces the background of SVM and TSVM;Section3describes the detail of U-TSVM;In the Section4,we show experiments of U-TSVM on various data sets.We conclude this work in Section5.2.Background2.1.Support Vector Classification(SVC)(Vapnik(1995))For classification about the training dataT={(x1,y1),···,(x l,y l)}∈(ℜn×Y)l,(1) where x i∈ℜn,y i∈Y={1,−1},i=1,···,l.Linear SVM is to solve the following primal QPPmin w,b,ξ12∥w∥22+Cl∑i=1ξi,s.t.y i((w·x i)+b)≥1−ξi,ξi≥0,i=1,2,···,l,(2)where C is a penalty parameter andξi are the slack variables.The goal is to find an optimal separating hyperplane(w·x)+b=0,(3) where x∈ℜn.The Wolf Dual of(2)can be expressed asmax αl∑j=1αj−12l∑i=1l∑j=1y i y j(x i·x j)αiαjs.t.l∑i=1y iαi=0,0≤αi≤C,i=1,···,l,(4)Neural Networkswhereα∈ℜl are lagrangian multipliers.The optimal separating hyperplane of(3)can be given byw=l∑i=1α∗iy i x i,b=1N sv(y j−N sv∑i=1α∗iy i(x i·x j)),(5)whereα∗is the solution of the dual problem(4),N sv represents the numberof support vectors satisfying0<α<C.A new sample is classified as+1or −1according to thefinally decision function f(x)=sgn((w·x)+b).2.2.Twin Support Vector Machine(TSVM)(Jayadeva et al.(2007)) Consider a binary classification problem of l1positive points and l2negativepoints(l1+l2=l).Suppose that data points 
belong to positive class aredenoted by A∈ℜl1×n,where each row A i∈ℜn represents a data point. Similarly,B∈ℜl2×n represents all of the data points belong to negativeclass.For the linear case,the TSVM(Jayadeva et al.(2007))determines twononparallel hyperplanes:f+(x)=(w+·x)+b+=0and f−(x)=(w−·x)+b−=0,(6) where w+,w−∈ℜn,b+,b−∈ℜ.Here,each hyperplane is closer to one of the two classes and is at least one distance from the other.A new data point is assigned to positive class or negative class depending upon its prox-imity to the two nonparallel hyperplanes.Formally,forfinding the positive and negative hyperplanes,the TSVM optimizes the following two respective QPPs:min w+,b+,ξ12∥Aw++e+b+∥2+c1e⊤−ξ,s.t.−(Bw++e−b+)+ξ≥e−,ξ≥0,(7) andmin w−,b−,η12∥Bw−+e−b−∥2+c2e⊤+η,s.t.(Aw−+e+b−)+η≥e+,η≥0,(8)where c1,c2≥0are the pre-specified penalty factors,e+,e−are vectors of ones of appropriate dimensions.By introducing the Lagrangian multipliers, the Wolfe dual of QPPs(7)and(8)can be represented as follows:Neural Networksmax αe ⊤−α−12α⊤G (H ⊤H )−1G ⊤αs.t.0≤α≤c 1e −,(9)andmax βe ⊤+β−12β⊤P (Q ⊤Q )−1P ⊤βs.t.0≤β≤c 2e +,(10)where G =[B e −],H =[A e +],P =[A e +]and Q =[B e −],α∈ℜm 2,β∈ℜm 1are Lagrangian multipliers.The non-parallel hyperplanes (6)can be obtained from the solutions αand βof (9)and (10)byv 1=−(H ⊤H )−1G ⊤α,where v 1=[w ⊤+b +]⊤,v 2=−(Q ⊤Q )−1P ⊤β,where v 1=[w ⊤−b −]⊤.(11)For the nonlinear case,we can refer to Jayadeva et al.(2007).3.Univesum-Twin Support Vector Machine(Linear U -TSVM)3.1.Linear CaseWe firstly give the formal representation of classification problem with Universum.Suppose that the training set ˜Tconsists of two parts:˜T=T ∪U,(12)where the symbol ∪means the union of sets;T ={(x 1,y 1),···,(x l ,y l )}∈(ℜn ×Y )l ,U ={x ∗1,···,x ∗u }∈ℜn ,(13)with x i ∈ℜn ,y ∈Y ={−1,1},i =1,···,l and x ∗j ∈ℜn ,j =1,···,u .Thegoal is to induce a real-valued functiony =sgn(g (x )),(14)Neural Networksto infer the label y corresponding to any sample x inℜn space. 
U-SVM uses theε-insensitive loss for Universum:1 2||w||22+cl∑i=1φε[y i f w,b(x i)]+du∑j=1ρ[f w,b(x∗j)],(15)whereφε[t]=max{0,ε−t}is the hinge loss function,prior knowledge em-bedded in the Universumρ[t]=ρ−ε[t]+ρ−ε[−t](16)Figure1:Three loss functions mentioned in this work.(a):the hinge loss function of the optimization problem(17)∼(19),(b):the standardεloss function in(15),(c):the hinge loss function of the optimization problem(20)∼(22).is theε-insensitive loss(see Figure1(b)).In this way,the prior knowl-edge embedded in the Universum can be reflected in the sum of the losses:∑uj=1ρ[f w,b(x∗j)],The smaller is this value,the higher prior possibility is thisclassifier f w,b,and vice versa(Zhang et al.(2008)).Figure2(a)shows the geometric interpretation of this formulation for a toy example.However,For TSVM,A new data point is assigned to class+1or−1depending upon its proximity to the two nonparallel hyperplanes,so the prior knowledge embed-ded in the Universum can not directly expressed by(16).In this paper,we use this knowledge through adding two hinge loss functions(see Figure1(a) and(c))to the following QQPs respectively.min w+,b+,ξ,ψ12||Aw++e+b+||2+c1e⊤−ξ+c u e⊤uψ,(17)Neural Networks(a)U-SVM(b)U-TSVMFigure2:The geometric interpretations of U-SVM and U-TSVM.s.t.−(Bw++e−b+)+ξ≥e−,ξ≥0,(18)(Uw++e u b+)+ψ≥(−1+ε)e u,ψ≥0,(19) andmin w−,b−,η,ψ∗12||Bw−+e−b−||2+c2e⊤+η+c u e⊤uψ∗,(20)s.t.(Aw−+e+b−)+η≥e+,η≥0,(21)Neural Networks−(Uw −+e u b −)+ψ∗≥(−1+ε)e u ,ψ∗≥0,(22)where ψ=(ψ1,···,ψu )T ,ψ∗=(ψ∗1,···,ψ∗u )T and ε,c 1,c 2,c u ∈[0,+∞]are prior parameters,e +,e −,e u are vectors of ones of appropriate dimensions,U ∈R u ×n denotes Universum class,each row U i ∈R n represents a Univesum sample.(19)means ψi ={0,f w +,b +(x ∗i )≥−1+ε−1+ε−f w +,b +(x ∗i ),f w +,b +(x ∗i )<−1+ε=max {0,−1+ε−f w +,b +(x ∗i )},(23)and (22)meansψ∗i ={0,−f w −,b −(x ∗i )≥−1+ε−1+ε+f w −,b −(x ∗i ),−f w −,b −(x ∗i )<−1+ε=max {0,−1+ε+f w −,b −(x ∗i )},(24)where i =1,···,u .It is easy to see that the sum of (23)and (24)is approx-imately equivalent to (16).Figure 2(b)shows the related geometric interpre-tation.The red “o”denotes positive sample;the blue “∗”denotes negative sample;the green “+”denotes Universum sample.(a):The cyan solid and dotted lines are the hyperplane and boundaries of U -SVM respectively;the part between purple dotted lines is the Universum area of U -SVM.(b):The red and blue solid lines are the two hyperplanes of U -TSVM;the red and blue dotted lines are the two boundaries of U -TSVM;the part between green dotted lines is the Universum area of U -SVM.The Lagrangian corresponding to the problem (17)∼(19)is given byL (Θ)=12||Aw ++e +b +||2+c 1e ⊤−ξ+c u e ⊤u ψ−α⊤(−(Bw ++e −b +)+ξ−e −)−β⊤ξ−µ⊤((Uw ++e u b +)+ψ+(1−ε)e u )−γ⊤ψ(25)where Θ={w +,b +,ξ,ψ,α,β,µ,γ},α=(α1,···,αm 1)⊤,β=(β1,···,βm 1)⊤,µ=(µ1,···,µu )⊤,γ=(γ1,···,γu )⊤)⊤are the Lagrange multipliers.The dual problem can be formulated asmax ΘL (Θ)s.t.∇w +,b +,ξ,ψL (Θ)=0,α,β,µ,γ≥0.(26)Neural NetworksFrom equation(26),we get∇w+L=A⊤(Aw++e+b+)+B⊤α−U⊤µ=0,(27)∇b+L=e⊤+(Aw++e+b+)+e⊤−α−e⊤uµ=0,(28)∇ξL=c1e−−α−β=0,(29)∇ψL=c u e u−µ−γ=0.(30) Sinceβ,γ≥0,(29)and(30)turn to be0≤α≤c1e−,(31)0≤µ≤cµe u.(32) Next,combining(27)and(28)leads to[A⊤e⊤+]⊤[A e+][w+b+]⊤+[B⊤e⊤−]⊤α−[U⊤e⊤u ]⊤µ=0.(33)LetH=[A e+],G=[B e−],O=[U e u],(34) and the augmented vectorϑ+=[w+b+]⊤,the equality(33)can be rewritten as:H⊤Hϑ++G⊤α−O⊤µ=0,i.e.,ϑ+=−(H⊤H)−1(G⊤α−O⊤µ).(35)According to the dual theory of optimization problem(Deng and Tian (2009)),the Wolfe dual of the problem(17)∼(19)can be expressed as:max 
max_{α,µ}  −(1/2)(α^⊤ G − µ^⊤ O)(H^⊤ H)^(−1)(G^⊤ α − O^⊤ µ) + e_−^⊤ α + (ε − 1) e_u^⊤ µ,
s.t.  0 ≤ α ≤ c_1 e_−,  0 ≤ µ ≤ c_u e_u,    (36)

where

H = [A  e_+],  G = [B  e_−],  O = [U  e_u],    (37)

and the augmented vector ϑ_+ = [w_+  b_+]^⊤ satisfies

H^⊤ H ϑ_+ + G^⊤ α − O^⊤ µ = 0,  i.e.,  ϑ_+ = −(H^⊤ H)^(−1)(G^⊤ α − O^⊤ µ).    (38)

Similarly, the dual of (20)–(22) is formulated as

max_{λ,υ}  −(1/2)(λ^⊤ P − υ^⊤ S)(Q^⊤ Q)^(−1)(P^⊤ λ − S^⊤ υ) + e_+^⊤ λ + (ε − 1) e_u^⊤ υ,
s.t.  0 ≤ λ ≤ c_2 e_+,  0 ≤ υ ≤ c_u e_u,    (39)

where

Q = [B  e_−],  P = [A  e_+],  S = [U  e_u],    (40)

and the augmented vector ϑ_− = [w_−  b_−]^⊤ is given by

ϑ_− = −(Q^⊤ Q)^(−1)(P^⊤ λ − S^⊤ υ).    (41)

Once the vectors ϑ_+ and ϑ_− are obtained from (38) and (41), the separating planes

w_+^⊤ x + b_+ = 0,  w_−^⊤ x + b_− = 0    (42)

are known. A new data point x ∈ ℜ^n is then assigned to the positive or negative class depending on which of the two hyperplanes given by (42) it lies closest to, i.e.,

f(x) = arg min_{+,−} {d_+(x), d_−(x)},    (43)

where

d_+(x) = |w_+^⊤ x + b_+|,  d_−(x) = |w_−^⊤ x + b_−|,    (44)

and | · | denotes the perpendicular distance of the point x from the planes w_+^⊤ x + b_+ = 0 and w_−^⊤ x + b_− = 0.

3.2. Nonlinear Case

The discussion above is restricted to the linear case. Here we analyze the nonlinear U-TSVM by introducing the Gaussian kernel function

K(x, x′) = exp(−‖x − x′‖² / 2σ²),    (45)

where σ is a real parameter, and the corresponding transformation

x = Φ(x),    (46)

where x ∈ H and H is a Hilbert space. The training set (12) then becomes

T̃ ∪ Ũ = {(Φ(x_1), y_1), …, (Φ(x_l), y_l)} ∪ {Φ(x*_1), …, Φ(x*_u)}.    (47)

We consider the following kernel-generated hyperplanes:

K(x^⊤, C^⊤) k_+ + b_+ = 0,  K(x^⊤, C^⊤) k_− + b_− = 0,    (48)

where C^⊤ = [A  B]^⊤ and K is a chosen kernel function. The nonlinear optimization problems can be expressed as

min_{k_+, b_+, ξ, ψ}  (1/2)‖K(A, C^⊤) k_+ + e_+ b_+‖² + c_1 e_−^⊤ ξ + c_u e_u^⊤ ψ,
s.t.  −(K(B, C^⊤) k_+ + e_− b_+) + ξ ≥ e_−,  ξ ≥ 0,
(K(U, C^⊤) k_+ + e_u b_+) + ψ ≥ (−1 + ε) e_u,  ψ ≥ 0,    (49)

and

min_{k_−, b_−, η, ψ*}  (1/2)‖K(B, C^⊤) k_− + e_− b_−‖² + c_2 e_+^⊤ η + c_u e_u^⊤ ψ*,
s.t.  (K(A, C^⊤) k_− + e_+ b_−) + η ≥ e_+,  η ≥ 0,
−(K(U, C^⊤) k_− + e_u b_−) + ψ* ≥ (−1 + ε) e_u,  ψ* ≥ 0.    (50)

The Wolfe dual of the problem (49) can be expressed as

max_{α,µ}  −(1/2)(α^⊤ G_Φ − µ^⊤ O_Φ)(H_Φ^⊤ H_Φ)^(−1)(G_Φ^⊤ α − O_Φ^⊤ µ) + e_−^⊤ α + (ε − 1) e_u^⊤ µ,
s.t.  0 ≤ α ≤ c_1 e_−,  0 ≤ µ ≤ c_u e_u,    (51)

where

H_Φ = [K(A, C^⊤)  e_+],  G_Φ = [K(B, C^⊤)  e_−],  O_Φ = [K(U, C^⊤)  e_u],    (52)

and the augmented vector ρ_+ = [k_+  b_+]^⊤ satisfies

H_Φ^⊤ H_Φ ρ_+ + G_Φ^⊤ α − O_Φ^⊤ µ = 0,  i.e.,  ρ_+ = −(H_Φ^⊤ H_Φ)^(−1)(G_Φ^⊤ α − O_Φ^⊤ µ).    (53)

In a similar manner, the dual of (50) can be formulated as

max_{λ,υ}  −(1/2)(λ^⊤ P_Φ − υ^⊤ S_Φ)(Q_Φ^⊤ Q_Φ)^(−1)(P_Φ^⊤ λ − S_Φ^⊤ υ) + e_+^⊤ λ + (ε − 1) e_u^⊤ υ,
s.t.  0 ≤ λ ≤ c_2 e_+,  0 ≤ υ ≤ c_u e_u,    (54)

where

Q_Φ = [K(B, C^⊤)  e_−],  P_Φ = [K(A, C^⊤)  e_+],  S_Φ = [K(U, C^⊤)  e_u],    (55)

and the augmented vector ρ_− = [k_−  b_−]^⊤ is given by

ρ_− = −(Q_Φ^⊤ Q_Φ)^(−1)(P_Φ^⊤ λ − S_Φ^⊤ υ).    (56)

Once the vectors ρ_+ and ρ_− are obtained from (53) and (56), a new data point x ∈ ℜ^n is assigned to the positive or negative class in a manner similar to the linear case.

Consider the primal optimization problem (17)–(19) of U-TSVM: if c_u = 0 and the Universum data are empty, the model degenerates to the TSVM. Therefore, the TSVM is a special case of our model.
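Before turning to the experiments, the following minimal sketch illustrates the linear U-TSVM pipeline just derived: build the augmented matrices (34)/(37), solve the box-constrained duals (36) and (39) numerically, recover ϑ_+ and ϑ_− via (38) and (41), and classify by the distance rule (43)–(44). It is not the authors' code: the toy data, the use of SciPy's L-BFGS-B routine as a generic box-constrained solver, and the small ridge term added to the Gram matrices for numerical invertibility are all assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal([0.0, 0.0], 1.0, size=(20, 2))   # positive-class points (rows of A)
B = rng.normal([4.0, 4.0], 1.0, size=(20, 2))   # negative-class points (rows of B)
U = rng.normal([2.0, 2.0], 0.5, size=(10, 2))   # Universum points (rows of U)

def aug(M):
    """Append a column of ones: [M  e]."""
    return np.hstack([M, np.ones((M.shape[0], 1))])

def solve_plane(Haug, Gaug, Oaug, c, c_u, eps):
    """Maximize -(1/2)(a'G - mu'O)(H'H)^(-1)(G'a - O'mu) + e'a + (eps-1)e'mu
    over 0 <= a <= c, 0 <= mu <= c_u, then recover theta = -(H'H)^(-1)(G'a - O'mu)."""
    d = Haug.shape[1]
    M = np.linalg.inv(Haug.T @ Haug + 1e-6 * np.eye(d))  # ridge term: an assumption, for invertibility
    m, u = Gaug.shape[0], Oaug.shape[0]

    def neg_dual(z):
        a, mu = z[:m], z[m:]
        v = Gaug.T @ a - Oaug.T @ mu
        return 0.5 * v @ M @ v - a.sum() - (eps - 1.0) * mu.sum()

    bounds = [(0.0, c)] * m + [(0.0, c_u)] * u
    z = minimize(neg_dual, np.zeros(m + u), bounds=bounds, method="L-BFGS-B").x
    a, mu = z[:m], z[m:]
    return -M @ (Gaug.T @ a - Oaug.T @ mu)               # [w; b]

c1 = c2 = c_u = 1.0
eps = 0.1
theta_pos = solve_plane(aug(A), aug(B), aug(U), c1, c_u, eps)  # plane close to class +1, cf. (38)
theta_neg = solve_plane(aug(B), aug(A), aug(U), c2, c_u, eps)  # plane close to class -1, cf. (41)

def predict(x):
    xa = np.append(x, 1.0)
    d_pos = abs(xa @ theta_pos) / np.linalg.norm(theta_pos[:-1])  # perpendicular distance, cf. (44)
    d_neg = abs(xa @ theta_neg) / np.linalg.norm(theta_neg[:-1])
    return 1 if d_pos < d_neg else -1                             # rule (43)

# classify two test points (illustration only)
print(predict(np.array([0.5, 0.5])), predict(np.array([4.5, 3.5])))
```

Setting c_u = 0 (or passing an empty Universum block) reduces the same sketch to plain TSVM, mirroring the degeneration noted above.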
4. Experiments

In this section we compare U-TSVM against TSVM and U-SVM (a Universum method based on the standard support vector machine) on various data sets. For simplicity, we set c_1 = c_2 = c_u in U-TSVM. All codes are written in MATLAB 2010, and the experimental environment is an Intel Core i5 CPU with 2 GB of memory. The "quadprog" function in Matlab is used to solve the related optimization problems in this paper. (In fact, U-TSVM can be solved quickly by the Successive Over-Relaxation (SOR) technique (Shao et al. (2011); Mangasarian and Musicant (1999)); however, since the focus of this paper is to analyze the performance of U-TSVM, we only use the traditional "quadprog" function to solve the QPPs of U-TSVM.) The testing accuracies are computed using standard 10-fold cross validation. The parameters c_1, c_2, c_u and the RBF kernel parameter σ are selected from the set {2^i | i = −7, …, 7} by 10-fold cross validation on a tuning set comprising a random 10% of the training data. Once the parameters are selected, the tuning set is returned to the training set to learn the final decision function. The accuracy is defined as

accuracy = (number of true positives + number of true negatives) / (number of all samples).    (57)

4.1. Toy data

In this subsection we use 2-D toy data to show the differences among the U-TSVM, TSVM and U-SVM algorithms. The first data set contains 200 positive points, 200 negative points and 200 Universum points. All points are generated from Gaussian distributions: positive points (mean µ = [0, 0], standard deviation σ = [3, 3]), negative points (µ = [4, 4], σ = [3, 3]) and Universum points (µ = [2, 2], σ = [1.5, 1.5]). We carry out two experiments and repeat each one 10 times. For the first experiment, we use 20% of the points for training and the rest for testing; the comparative results of U-TSVM and TSVM are given in Figure 3. For the second experiment, we use 50, 80, 110, 130 and 160 points, respectively, as the Universum points; the comparative results of U-TSVM and U-SVM are shown in Figure 4 and Figure 5.

For the first experiment, the average accuracies of TSVM and U-TSVM are 79.01% and 83.21%, respectively. This shows that the knowledge embedded in the Universum points indeed helps U-TSVM achieve better performance than its original model. For the second experiment, we compare U-SVM and U-TSVM. As the amount of Universum data increases, the average accuracies of both algorithms clearly improve, which again demonstrates the importance of Universum data. In addition, we find that U-TSVM is superior to U-SVM; the main reason is that U-TSVM is rooted in TSVM and inherits all the advantages of TSVM.

Figure 3: The results of TSVM and U-TSVM on the first toy data set (the training and testing sets of TSVM and of U-TSVM). The "+" and "∗" scatters of (e) and (f) denote positive and negative points, respectively.

Figure 4: The results of TSVM and U-TSVM on the second toy data set.

Figure 5: The accuracy rates of TSVM and U-TSVM on the second toy data set.

4.2. UCI datasets

In this section we evaluate these methods on datasets from the UCI Machine Learning Repository, a collection of databases, domain theories and data generators used by the machine learning community for the empirical analysis of machine learning algorithms (Asuncion and Newman (2007)). For each dataset, we randomly select the same number of data points from the different classes to compose a working dataset. 30% of each extracted dataset is used for training, 35% is used to generate Universum data (each Universum example is generated by selecting data from the two different categories and combining them with a mean coefficient), and the rest is used for testing. Each experiment is repeated 10 times, and the average results are shown in Table 1. From Table 1 we find that U-TSVM, which uses the Universum data, outperforms the normal TSVM and U-SVM in most cases. At the same time, the computing speed of U-TSVM is about four times faster than that of U-SVM.

Table 1: The testing accuracy and training time on UCI datasets in the case of the RBF kernel.

Dataset (size)           U-TSVM Accuracy   U-SVM Accuracy   TSVM Accuracy   U-TSVM Time (s)   U-SVM Time (s)   TSVM Time (s)
Hepatitis (155×19)       76.21±3.12        75.16±4.24       74.54±3.51      2.89              9.43             –
Australian (690×14)      66.48±3.47        64.91±4.13       64.97±3.25      30.22             109.25           –
BUPA liver (345×6)       65.21±3.22        64.67±2.78       64.22±3.18      11.38             45.61            –
CMC (844×9)              63.34±2.31        63.90±2.88       62.71±2.75      37.54             131.21           –
Credit (690×19)          73.93±2.63        71.64±3.24       72.47±2.89      25.11             86.46            –
Diabetis (768×8)         59.76±2.17        58.94±3.68       57.67±2.90      29.54             121.33           –
Flare-Solar (1066×9)     55.26±2.34        54.11±1.23       53.46±3.26      40.12             116.47           –
German (1000×20)         58.55±2.53        59.37±2.64       58.23±2.65      43.17             137.21           –
Heart-Statlog (270×14)   71.54±3.27        69.45±4.29       70.28±3.39      9.29              32.31            –
Image (2310×18)          79.34±2.11        79.84±3.27       77.38±2.41      89.33             342.21           –
Ionosphere (351×34)      70.44±2.78        68.89±1.19       67.34±1.84      13.37             50.30            –
Spect (267×44)           67.21±1.97        67.11±2.42       66.59±2.24      9.56              47.29            –

4.3. Dataset from the Trouble of moving Freight car Detection System (TFDS)

The TFDS is an intelligent system that integrates high-speed digital image collection, real-time processing of large volumes of image data, and recognition of train faults. It plays an important role in the field of transportation safety. In this section we apply our method to the recognition of brake shoe faults (see Figure 6). The brake shoe is a key component of the train braking system, and the loss of a brake shoe will probably result in a serious accident. The TFDS datasets were collected in Changsha City, Hunan Province, China. Figure 7 shows brake shoe images: (a) a normal brake shoe; (c) a brake shoe that has been lost. The brake shoe shown in (b) is in a special intermediate state; we can take such samples as Universum samples.

[Figure caption fragment] … shoes; (c), (d): Universum brake shoes; (e), (f): faulty brake shoes.

Adaboost (Viola and Jones (2001)) is employed to detect the brake shoe's position in an image, and Figure 8 shows the detection result of the Adaboost method. We use these results as the training samples for recognizing brake shoes; they are also randomly rotated within [−10°, +10°] and shifted within [−2, +2] to generate five virtual samples each. Their size is 20×20 pixels and each dimension value is normalized to [0, 1]. From Figure 8 we can see that the error rate of U-TSVM is lower than that of U-SVM when the number of Universum samples is less than 5000. However, when the number of Universum samples is more than 6000, the two methods have almost the same performance, and the Universum samples play a dominant role in this case.

Figure 8: The results of brake shoe recognition. Left: the size of the Universum sample set is fixed at 600. Right: the number of training samples is fixed at 100. The numbers of positive and negative samples are equal in this experiment.

5. Conclusion and Discussion

Universum is a recently proposed concept: it refers to samples that do not belong to any of the classes of concern (Weston et al.
(2006)). Previous work has shown that Universum data are helpful for supervised learning problems. In this paper, a new Twin Support Vector Machine with Universum (U-TSVM) has been proposed, which can exploit Universum samples to improve classification performance. Compared with U-SVM, the main differences are as follows: 1) We remove the restriction to parallel hyperplanes and define the Universum region of U-TSVM using two hinge loss functions; the definition used in U-SVM is only a special case of that of U-TSVM. 2) To further improve the computing speed of the algorithm, U-TSVM replaces one large model with two smaller-sized models; in theory, U-TSVM is approximately four times faster than U-SVM. 3) To improve classification accuracy, U-TSVM uses two nonparallel hyperplanes to construct the final classifier, and a new data point is assigned to the positive or negative class depending on its proximity to the two nonparallel hyperplanes. All experiments show that the performance of U-TSVM is better than that of the original twin support vector machine, which demonstrates that Universum examples help improve the generalization ability of the model. At the same time, the experiments also demonstrate that U-TSVM outperforms U-SVM in most cases. Furthermore, we confirm that the training of U-TSVM is faster than that of U-SVM. In future work, a semi-supervised U-TSVM may be developed for classification, and an online version is also of interest and under consideration.

6. Acknowledgment

This work has been partially supported by grants from the National Natural Science Foundation of China (No. 70921061), the CAS/SAFEA International Partnership Program for Creative Research Teams, the Major International (Regional) Joint Research Project (No. 71110107026), and the President Fund of GUCAS.

References

Asuncion, A., Newman, D., 2007. UCI machine learning repository.

Chen, S., Zhang, C., 2009. Selecting informative universum sample for semi-supervised learning. In: IJCAI, pp. 1016–1021.

Cherkassky, V., Dhar, S., Dai, W., 2011. Practical conditions for effectiveness of the universum learning. IEEE Transactions on Neural Networks 22(8), 1241–1255.

Deng, N., Tian, Y., 2009. Support Vector Machines: Theory, Algorithms and Extensions. Science Press, China.

Ghorai, S., Mukherjee, A., Dutta, P.K., 2009. Nonparallel plane proximal classifier. Signal Processing 89(4), 510–522.

Jayadeva, Khemchandani, R., Chandra, S., 2007. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell.
29(5), 905–910.

Khemchandani, R., Jayadeva, Chandra, S., 2009. Optimal kernel selection in twin support vector machines. Optimization Letters 3(1), 77–88.

Kumar, M.A., Gopal, M., 2008. Application of smoothing technique on twin support vector machines. Pattern Recognition Letters 29(13), 1842–1848.

Kumar, M.A., Gopal, M., 2009. Least squares twin support vector machines for pattern classification. Expert Systems with Applications 36(4), 7535–7543.

Mangasarian, O., Wild, E., 2006. Multisurface proximal support vector classification via generalized eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(1), 69–74.

Mangasarian, O.L., Musicant, D.R., 1999. Successive overrelaxation for support vector machines. IEEE Transactions on Neural Networks 10(5), 1032–1037.

Peng, X., 2010. Primal twin support vector regression and its sparse approximation. Neurocomputing 73(16–18), 2846–2858.

Peng, X., 2012. Efficient twin parametric insensitive support vector regression model. Neurocomputing 79, 26–38.

Shao, Y.-H., Deng, N.-Y., 2011. A coordinate descent margin based twin support vector machine for classification. Neural Networks.

Shao, Y.-H., Zhang, C.-H., Wang, X.-B., Deng, N.-Y., 2011. Improvements on twin support vector machines. IEEE Transactions on Neural Networks 22(6), 962–968.

Shen, C., Wang, P., Shen, F., Wang, H., 2011. UBoost: Boosting with the universum. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Sinz, F.H., Chapelle, O., Agarwal, A., Schölkopf, B., 2008. An analysis of inference with the universum. In: Advances in Neural Information Processing Systems 20. MIT Press, pp. 1369–1376.

Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA.

Viola, P., Jones, M., 2001. Rapid object detection using a boosted cascade of simple features. In: CVPR 2001, Vol. 1. IEEE Computer Society, Los Alamitos, CA, USA, pp. 511–518.

Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V., 2006. Inference with the Universum. In: ICML '06: Proceedings of the 23rd International Conference on Machine Learning. ACM, pp. 1009–1016.

Zhang, D., Wang, J., Wang, F., Zhang, C., 2008. Semi-supervised classification with universum. In: SIAM International Conference on Data Mining (SDM).
Example sentences using "fabric"

1. A soft fabric made of this wool or of similar fibers.
2. Prediction of Fabric Drape with Support Vector Machine.
3. Research on the Simulation of Fabric Drape.
4. A fabric or an article of apparel made from such silk.
5. Stretch fabric is quick-drying and wicks moisture.
6. A fabric that launders well.
7. An extra lining between the outer fabric and the regular lining of a garment.
8. Research on Application of the PTFE Fabric Bush.
9. A lightweight fabric woven with white threads across a colored warp.
10. A similar sturdy fabric made on a power loom.
11. Fabric treated with clay, oil, and pigments to make it waterproof.
12. The fabric is fed through the machine.
The Lagrange Multiplier Method in SVM

Introduction

The support vector machine (SVM) is a widely used nonlinear classifier. Its core idea is to construct an optimal hyperplane that separates samples of different classes as well as possible. In the training of an SVM, the method of Lagrange multipliers is widely used to solve the resulting optimization problem.

Basic principle of SVM

The core idea of SVM is to find a hyperplane in the sample space that separates the samples of different classes. For a binary classification problem, the hyperplane can be defined as

w · x + b = 0,

where w is the normal vector, x is the sample feature vector, and b is the intercept. In the linearly separable case there are infinitely many such hyperplanes, but we seek an optimal one, for which the margin between the samples of the two classes and the hyperplane is as large as possible. This is known as maximum-margin separation.

Functional margin and geometric margin

Given a sample point (x_i, y_i), where x_i is the feature vector and y_i ∈ {−1, 1} is the class label, the functional margin is f(x_i) = y_i (w · x_i + b). If the hyperplane w · x + b = 0 is rescaled, the value of the functional margin is rescaled accordingly. To remove this effect of rescaling, we introduce the geometric margin, defined as

γ_i = f(x_i) / ‖w‖,

where ‖w‖ is the norm of the hyperplane's normal vector w.
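As a quick numeric illustration (the numbers are made up for this example): take w = (3, 4), b = −1 and the positive sample x_i = (1, 1) with y_i = +1. Then the functional margin is f(x_i) = 1 · (3 + 4 − 1) = 6, while ‖w‖ = 5, so the geometric margin is γ_i = 6/5 = 1.2. Scaling (w, b) by 10 inflates the functional margin to 60 but leaves the geometric margin at 1.2, which is exactly why the geometric margin is the quantity worth maximizing.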
The optimization problem

The SVM optimization problem can be written as

min_{w,b}  (1/2)‖w‖²
s.t.  y_i (w · x_i + b) − 1 ≥ 0,  ∀i.

This is a quadratic optimization problem, and the presence of the inequality constraints makes it difficult to solve directly. To solve it, we introduce the method of Lagrange multipliers.

The method of Lagrange multipliers

The Lagrange multiplier method is an optimization technique for problems that contain equality and inequality constraints. By constructing a Lagrangian function, it converts the primal problem into an unconstrained optimization problem. For the SVM optimization problem, the Lagrangian can be defined as

L(w, b, α) = (1/2)‖w‖² − Σ_{i=1}^{n} α_i [ y_i (w · x_i + b) − 1 ],

where the α_i are the Lagrange multipliers, which penalize violations of the inequality constraints. By solving the min–max problem of this Lagrangian, we obtain the optimal solution of the original optimization problem.
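To make the formulation above concrete, the following minimal sketch solves the constrained primal problem numerically on a tiny hand-made 2-D dataset. It is only an illustration: the data values are invented, and SciPy's general-purpose SLSQP solver is used instead of a dedicated SVM solver.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2-D dataset (invented values for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],   # class +1
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

def objective(z):
    w = z[:2]                      # z = [w1, w2, b]
    return 0.5 * np.dot(w, w)      # (1/2)||w||^2

# One inequality constraint per sample: y_i (w . x_i + b) - 1 >= 0
constraints = [{"type": "ineq",
                "fun": lambda z, i=i: y[i] * (np.dot(z[:2], X[i]) + z[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
margins = y * (X @ w + b)
print("w =", w, "b =", b)
print("support vectors (points with y_i(w.x_i + b) close to 1):")
print(X[np.isclose(margins, 1.0, atol=1e-3)])
```

On this toy set the active constraints, i.e. the support vectors, are the points closest to the separating line, matching the role the multipliers α_i play in the dual.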
ML algorithm formulas

The formulas of machine learning (ML) algorithms vary with the type of algorithm and the specific application. Below are the formulas of several common machine learning algorithms; a short code sketch of the first two follows the list.

1. Linear Regression: linear regression predicts a continuous value by finding the best-fitting straight line. Its formula is y = b0 + b1 * x, where y is the dependent variable, x is the independent variable, and b0 and b1 are model parameters estimated by minimizing the squared error between predicted and actual values.

2. Logistic Regression: logistic regression is a machine learning algorithm for classification problems. Its formula is h(x) = g(b0 + b1 * x) = 1 / (1 + e^(-(b0 + b1 * x))), where h(x) is the probability that the dependent variable equals 1 given x, and g is the sigmoid function, which maps the output of the linear model into the interval (0, 1).

3. Decision Tree: a decision tree is a supervised learning algorithm of the form "if condition1 then result1 else result2", where condition1 is a condition on one or more attributes and result1 and result2 are the corresponding classification results. A decision tree is built by recursively partitioning the dataset into purer subsets.

4. Random Forest: a random forest is an ensemble learning algorithm of the form y = argmax(w * f(x)), where w is a weight vector and f(x) is a base learner (usually a decision tree). A random forest makes predictions by building multiple base learners and combining their outputs.

5. Support Vector Machine (SVM): SVM is a classification algorithm with decision function f(x) = w * x + b, where w and b are model parameters, x is the input feature vector, and f(x) is the classification function. SVM performs classification by finding the decision boundary that maximally separates the data points of different classes.

6. K-Nearest Neighbor (KNN): KNN is an instance-based learning algorithm of the form y = argmax(k), where k is the distance measure to the nearest neighbors and y is the class label predicted from the nearest neighbors (in practice, the majority label among the k nearest neighbors).
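As a small illustration of the first two formulas in the list above, the sketch below fits a one-variable linear regression by least squares and applies the logistic (sigmoid) function to a linear score. The data values and the logistic coefficients are invented for the example; in practice the logistic coefficients would be fitted by maximum likelihood.

```python
import numpy as np

# 1. Linear regression y = b0 + b1 * x, fitted by least squares (invented data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])
b1, b0 = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]
print("b0 =", b0, "b1 =", b1)
print("prediction at x = 6:", b0 + b1 * 6.0)

# 2. Logistic regression h(x) = 1 / (1 + exp(-(b0 + b1 * x)))
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0_log, b1_log = -3.0, 1.2            # illustrative coefficients (assumption)
for xi in (1.0, 2.5, 4.0):
    print("P(y=1 | x=%.1f) = %.3f" % (xi, sigmoid(b0_log + b1_log * xi)))
```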
Introduction to the Algorithm of Support Vector Machine and the Software ChemSVM

LU Wen-cong¹, CHEN Nian-yi¹, YE Chen-zhou², LI Guo-zheng²
(1. Laboratory of Chemical Data Mining, Department of Chemistry, Shanghai University, Shanghai 200436, China)
(2. Institute of Image and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030, China)

Abstract: Encouraging achievements have been made in the development of the statistical learning theory (SLT) and the support vector machine (SVM) algorithm proposed by Vladimir N. Vapnik and co-workers, as well as in kernel techniques. This paper introduces the principles of SLT and of the SVM algorithm, and surveys their prospective applications in the fields of chemistry and chemical engineering. The "ChemSVM" software provides general-purpose support vector machine algorithms and integrates them with databases, knowledge bases, atomic parameters and other data mining methods.

Key words: statistical learning theory; support vector machine; support vector classification; support vector regression; pattern recognition
CLC number: O 06-04

As is well known, statistical pattern recognition, linear and nonlinear regression, artificial neural networks and related methods are effective tools for data mining, and they have been widely applied along with the development of computer hardware and software technology [1-4]; we too have applied several data mining methods to materials design and to studies of drug structure-activity relationships [5-12].
For many years, however, we have been constrained by a difficulty: traditional pattern recognition and artificial neural network methods require fairly large numbers of training samples, while in many practical problems only a small number of known samples are available. For a small sample set, the model that fits the training data best is not necessarily the model with the best predictive ability. How to obtain a model with good predictive (generalization) ability from a small sample set has therefore become a difficult point in pattern recognition research, the so-called "small-sample problem".

Recently we have noted that the statistical learning theory (SLT) [13] and the support vector machine (SVM) algorithm, proposed by the mathematician Vladimir N. Vapnik and co-workers on the basis of more than thirty years of rigorous mathematical research, have attracted the attention of the international data mining community and have been applied successfully in fields such as speech recognition [14], character recognition [15], drug design [16], combinatorial chemistry [17] and time-series prediction [18]. Starting from a rigorous mathematical theory, this new approach demonstrates and implements methods that maximize the reliability of prediction in the small-sample situation, and its results are encouraging. Zhang Xuegong, Yang Jie and others were the first to introduce these results to the Chinese computer science community and to carry out research on the SVM algorithm and its applications [19], but no applications of SVM have yet been reported in the domestic fields of chemistry and chemical engineering.

Received: 2002-06-10; revised: 2002-09-10. Funding: jointly supported by the National Natural Science Foundation of China and the Ford Motor Company, grant No. 9716214. About the first author: LU Wen-cong (1964–), male, professor; research interests: computational chemistry.

This paper is the first in the series. It mainly introduces the SVM algorithm proposed by Vapnik et al. on the basis of SLT, including the support vector classification (SVC) algorithm and the support vector regression (SVR) algorithm, and surveys the prospects for applying this new achievement of the computer science community in chemistry and chemical engineering.
1 A Brief Introduction to Statistical Learning Theory (SLT) [13]

1.1 Background

In the real world there exist a great many things that we cannot yet understand precisely but that we can observe. How to derive, from observed data (samples), regularities that cannot at present be obtained by analysis from first principles, and then use these regularities to predict future data, is the problem that statistical pattern recognition (a special case of data-based machine learning) must solve. Statistics is the most basic (indeed the only) means of analysis available when we face data but lack a theoretical model.

Vapnik and co-workers began to study machine learning under finite-sample conditions as early as the 1960s, but for a long time this research did not receive adequate attention. Over the last decade, the theory of machine learning with finite samples has gradually matured and formed a fairly complete SLT framework. At the same time, research on newer machine learning methods such as neural networks has run into some important difficulties, for example how to determine the network structure, the problems of overfitting and underfitting, and the problem of local minima. Under these circumstances, the SLT framework, which tries to study machine learning at a more fundamental level, has gradually gained attention. Between 1992 and 1995, Vapnik et al. developed the SVM algorithm on the basis of SLT. It shows many distinctive advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems, and it can be extended to other machine learning problems such as function fitting. Many researchers believe that SLT and SVM are becoming a new research focus in machine learning, following pattern recognition and neural networks, and that they will drive major advances in machine learning theory and technology.

The tendency of neural networks to overfit is caused by insufficient training samples and by unsuitable design of the learning machine. Because of this contradiction, in the finite-sample situation: 1) minimum empirical risk does not necessarily imply minimum expected risk; and 2) the complexity of the learning machine must be matched not only to the system under study but also to the finite training sample. The outstanding progress of the SLT framework and the SVM algorithm in solving the "small-sample problem", for example through the use of kernel functions, is encouraging, and SLT is regarded as the best theory currently available for statistical estimation and predictive learning from small samples.

1.2 Principles

The core content of Vapnik's SLT consists of the following four aspects: 1) the conditions for the consistency of statistical learning under the empirical risk minimization principle; 2) the bounds on the generalization ability of statistical learning methods under these conditions; 3) the small-sample inductive inference principles established on the basis of these bounds; and 4) practical methods (algorithms) that implement these new principles.
Let the training sample set be (y_1, x_1), …, (y_n, x_n), with x ∈ R^m and y ∈ R. The mathematical essence of fitting (modeling) is to select from a set of functions a suitable function f(x) that minimizes the risk functional

R[f] = ∫_{X×Y} (y − f(x))² P(x, y) dx dy.    (1)

However, because the probability distribution P(x, y) in this expression is unknown, Eq. (1) cannot be computed, let alone minimized. Traditional statistics therefore assumes that the risk functional above can be replaced by the empirical risk functional

R_emp[f] = (1/n) Σ_{i=1}^{n} (y_i − f(x_i))².    (2)

By the law of large numbers, Eq. (2) approximates Eq. (1) only when the number of samples n tends to infinity and the function set is sufficiently small. This amounts to taking the smallest least-squares fitting error as the best criterion for modeling, with the consequence that algorithms with excessive fitting capacity actually end up with poorer predictive ability.
1-δ为表征计算的可靠程度的参数。
SLT 要求在控制以VC 维为标志的拟合能力上界(以限制过拟合)的前提下追求拟合精度。
控制VC 维的方法有三大类:1〕拉大两类样本点集在特征空间中的间隔;2〕缩小两类样本点各自在特征空间中的分布范围;3〕降低特征空间维数。
一般认为特征空间维数是控制过拟合的唯一手段,而新理论强调靠前两种手段可以保证在高维特征空间的运算仍有低的VC 维,从而保证限制过拟合。
对于分类学习问题,传统的模式识别方法强调降维,而SVM 与此相反。
对于特征空间中两类点不能靠超平面分开的非线性问题,SVM 采用映照方法将其映照到更高维的空间,并求得最佳区分二类样本点的超平面方程,作为判别未知样本的判据。
这样,空间维数虽较高,但VC 维仍可压低,从而限制了过拟合。
即使已知样本较少,仍能有效地作统计预报。
对于回归建模问题,传统的化学计量学算法在拟合训练样本时,将有限样本数据中的误差也拟合进数学模型了。
针对传统方法这一缺点,SVR 采用“ε 不敏感函数”,即对于用f (x)拟合目标值y 时()b x w x f T +=,目标值y i 拟合在 ε≤--b x w y T i 时,即认为进一步拟合是无意义的。
这样拟合得到的不是唯一解,而是一组无限多个解。
SVR 方法是在一定约束条件下,以2w 取极小的标准来选取数学模型的唯一解。
这一求解策略使过拟合受到限制,显著提高了数学模型的预报能力。
2 支持向量分类(SVC )算法2.1 线性可分情形SVM 算法是从线性可分情况下的最优分类面(Optimal Hyperplane )提出的。
所谓最优分类面就是要求分类面不但能将两类样本点无错误地分开,而且要使两类的分类空隙最大。
d 维空间中线性判别函数的一般形式为()b x w x g T +=,分类面方程是0=+b x w T ,我们将判别函数进行归一化,使两类所有样本都满足()1≥x g ,此时离分类面最近的样本的()1=x g ,而要求分类面对所有样本都能正确分类,就是要求它满足n i b x w y i T i ,,2,1,01)( =≥-+。
(4)式(4)中使等号成立的那些样本叫做支持向量(Support Vectors )。
两类样本的分类空隙(Margin )的间隔大小:Margin =w /2 (5)因此,最优分类面问题可以表示成如下的约束优化问题,即在条件(4)的约束下,求函数())(21221w w w w T ==φ (6) 的最小值。
为此,可以定义如下的Lagrange 函数: ]1)([21),,(1-+-=∑=b x w y w w b w L i T i n i i T αα (7) 其中,0≥i a 为Lagrange 系数,我们的问题是对w 和b 求Lagrange 函数的最小值。