Selection of Scale-Invariant Parts for Object Class Recognition
A Detailed Explanation of Local Image Features and Their Matching
We want to:
• detect the same interest points regardless of image changes
Models of Image Change
Harris Detector: Some Properties
• Quality of Harris detector for different scale changes
Repeatability rate:

repeatability = (# correspondences) / (# possible correspondences)
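A minimal sketch of computing this rate, assuming a known ground-truth homography between the two views and an illustrative pixel tolerance (both are assumptions, not values from the slides):

```python
import numpy as np

def repeatability(pts1, pts2, H, tol=1.5):
    """Fraction of points detected in image 1 that reappear in image 2.

    pts1, pts2: (N, 2) and (M, 2) arrays of detected (x, y) locations.
    H: 3x3 ground-truth homography mapping image-1 coords into image 2.
    tol: distance (pixels) within which two detections correspond.
    """
    # Map image-1 points into image 2 with the known homography.
    ones = np.ones((len(pts1), 1))
    proj = (H @ np.hstack([pts1, ones]).T).T
    proj = proj[:, :2] / proj[:, 2:3]
    # A projected point "corresponds" if some detection in image 2 is close.
    d = np.linalg.norm(proj[:, None, :] - pts2[None, :, :], axis=2)
    matched = (d.min(axis=1) <= tol).sum()
    return matched / len(pts1)
```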
Harris Detector: Mathematics
Measure of corner response:
R = \det M - k\,(\operatorname{trace} M)^2

where \det M = \lambda_1 \lambda_2 and \operatorname{trace} M = \lambda_1 + \lambda_2

(k – empirical constant, k = 0.04–0.06)
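As a hedged illustration of this response map (a sketch, not the slides' code; the Sobel derivatives, window scale, and k = 0.05 are assumed choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.0, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    img = img.astype(float)
    Ix = sobel(img, axis=1)
    Iy = sobel(img, axis=0)
    # Second moment matrix entries, averaged with a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det_M = Sxx * Syy - Sxy ** 2
    trace_M = Sxx + Syy
    return det_M - k * trace_M ** 2
```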
Harris Detector: Basic Idea
• Shifting a window in any direction should give a large change in intensity
“flat” region: no change in all directions
“edge”: no change along the edge direction
“corner”: significant change in all directions
Matching with Invariant Features
Example: Build a Panorama
M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003
Improved D-Nets Algorithm with Matching Quality Purification
Journal of Computer Applications, 2018, 38(4): 1121-1126. ISSN 1001-9081, CODEN JYIIDU. Article ID: 1001-9081(2018)04-1121-06. DOI: 10.11772/j.issn.1001-9081.2017102394. Published 2018-04-10.

YE Feng, HONG Zheng, LAI Yizong, ZHAO Yuting, XIE Xianzhi
(School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou Guangdong 510640, China)
(*Corresponding author, e-mail: mefengye@scut.edu.cn)

Abstract: To address the underperformance of feature-based image registration under large affine deformation and in the presence of similar targets, and to reduce time cost, an improved Descriptor-Nets (D-Nets) algorithm based on matching quality purification is proposed. Feature points are first detected by the Features from Accelerated Segment Test (FAST) algorithm and then filtered by combining the Harris corner response function with a grid partition. Next, on the basis of the computed line descriptors, a hash table is built and a vote is taken, yielding coarse matching pairs. Finally, mismatches are eliminated by a purification step based on matching quality. Experiments were carried out on the Mikolajczyk standard image data set of Oxford University. The results show that the proposed improved D-Nets algorithm achieves an average registration accuracy of 92.2% and an average time cost of 2.48 s under large variations of scale, parallax and illumination. Compared with the Scale-Invariant Feature Transform (SIFT), Affine-SIFT (ASIFT) and original D-Nets algorithms, the improved algorithm matches the registration accuracy of the original algorithm while running up to 80 times faster, and it offers the best robustness, significantly outperforming SIFT and ASIFT, which makes it well suited to image registration applications.

Keywords: image registration; feature matching; matching purification; feature point; feature descriptor

0 Introduction

Registration methods based on invariant image features are widely used in many areas of computer vision, such as object recognition [1], image stitching [2], image-based measurement [3], stereo vision [4] and scene reconstruction [5], owing to their robustness and low computational cost.
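The detection-and-filtering stage summarized in the abstract might look roughly like the following OpenCV sketch; the grid size and per-cell quota are illustrative assumptions, not the paper's values:

```python
import cv2
import numpy as np

def fast_harris_grid(gray, grid=(8, 8), per_cell=10):
    """Detect FAST keypoints, score them with the Harris response, and
    keep the strongest few in each grid cell (a sketch of the filtering
    idea, not the authors' exact procedure)."""
    fast = cv2.FastFeatureDetector_create()
    kps = fast.detect(gray, None)
    # blockSize=2, ksize=3, k=0.04 are common cornerHarris defaults.
    harris = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
    h, w = gray.shape
    cells = {}
    for kp in kps:
        x, y = int(kp.pt[0]), int(kp.pt[1])
        cell = (y * grid[0] // h, x * grid[1] // w)
        cells.setdefault(cell, []).append((harris[y, x], kp))
    kept = []
    for scored in cells.values():
        scored.sort(key=lambda s: -s[0])        # strongest response first
        kept.extend(kp for _, kp in scored[:per_cell])
    return kept
```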
A Comparison of Affine Region Detectors
International Journal of Computer Vision 65(1/2), 43-72, 2005. © 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands. DOI: 10.1007/s11263-005-3848-x

K. MIKOLAJCZYK, University of Oxford, OX1 3PJ, Oxford, United Kingdom (km@)
T. TUYTELAARS, University of Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium (tuytelaa@esat.kuleuven.be)
C. SCHMID, INRIA, GRAVIR-CNRS, 655, av. de l'Europe, 38330, Montbonnot, France (schmid@inrialpes.fr)
A. ZISSERMAN, University of Oxford, OX1 3PJ, Oxford, United Kingdom (az@)
J. MATAS, Czech Technical University, Karlovo Namesti 13, 121 35, Prague, Czech Republic (matas@cmp.felk.cvut.cz)
F. SCHAFFALITZKY and T. KADIR, University of Oxford (fsm@, tk@)
L. VAN GOOL, University of Leuven (vangool@esat.kuleuven.be)

Received August 20, 2004; Revised May 3, 2005; Accepted May 11, 2005. First online version published in January, 2006.

Abstract. The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002), a detector of 'maximally stable extremal regions' proposed by Matas et al. (2002), an edge-based region detector (Tuytelaars and Van Gool, 1999), a detector based on intensity extrema (Tuytelaars and Van Gool, 2000), and a detector of 'salient regions' proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.

Keywords: affine region detectors, invariant image description, local features, performance evaluation

1. Introduction

Detecting regions covariant with a class of transformations has now reached some maturity in the computer vision literature. These regions have been used in quite varied applications including: wide baseline matching for stereo pairs (Baumberg, 2000; Matas et al., 2002; Pritchett and Zisserman, 1998; Tuytelaars and Van Gool, 2000), reconstructing cameras for sets of disparate views (Schaffalitzky and Zisserman, 2002), image retrieval from large databases (Schmid and Mohr, 1997; Tuytelaars and Van Gool, 1999), model based recognition (Ferrari et al., 2004; Lowe, 1999; Obdržálek and Matas, 2002; Rothganger et al., 2003), object retrieval in video (Sivic and Zisserman, 2003; Sivic et al., 2004), visual data mining (Sivic and Zisserman, 2004), texture recognition (Lazebnik et al., 2003a, b), shot location (Schaffalitzky and Zisserman, 2003), robot localization (Se et al., 2002) and servoing (Tuytelaars et al., 1999), building panoramas (Brown and Lowe, 2003), symmetry detection (Turina et al., 2001), and object categorization (Csurka et al., 2004; Dorko and Schmid, 2003; Fergus et al., 2003; Opelt et al., 2004).

The requirement for these regions is that they should correspond to the same pre-image for different viewpoints, i.e., their shape is not fixed but automatically adapts, based on the underlying image intensities, so that they are the projection of the same 3D surface patch. In particular, consider images from two viewpoints and the geometric transformation between the images induced by the viewpoint change.
Regions detected after the viewpoint change should be the same, modulo noise, as the transformed versions of the regions detected in the original image; image transformation and region detection commute. As such, even though they have often been called invariant regions in the literature (e.g., Dorko and Schmid, 2003; Lazebnik et al., 2003a; Sivic and Zisserman, 2004; Tuytelaars and Van Gool, 1999), in principle they should be termed covariant regions, since they change covariantly with the transformation. The confusion probably arises from the fact that, even though the regions themselves are covariant, the normalized image pattern they cover and the feature descriptors derived from them are typically invariant.

Note, our use of the term 'region' simply refers to a set of pixels, i.e. any subset of the image. This differs from classical segmentation since the region boundaries do not have to correspond to changes in image appearance such as colour or texture. All the detectors presented here produce simply connected regions, but in general this need not be the case.

For viewpoint changes, the transformation of most interest is an affinity. This is illustrated in Fig. 1. Clearly, a region with fixed shape (a circular example is shown in Fig. 1(a) and (b)) cannot cope with the geometric deformations caused by the change in viewpoint. We can observe that the circle does not cover the same image content, i.e., the same physical surface. Instead, the shape of the region has to be adaptive, or covariant with respect to affinities (Fig. 1(c); close-ups shown in Fig. 1(d)-(f)). Indeed, an affinity is sufficient to locally model image distortions arising from viewpoint changes, provided that (1) the scene surface can be locally approximated by a plane or in case of a rotating camera, and (2) perspective effects are ignored, which are typically small on a local scale anyway. Aside from the geometric deformations, also photometric deformations need to be taken into account.
These can be modeled by a linear transformation of the intensities. To further illustrate these issues, and how affine covariant regions can be exploited to cope with the geometric and photometric deformation between wide baseline images, consider the example shown in Fig. 2. Unlike the example of Fig. 1 (where a circular region was chosen for one viewpoint), the elliptical image regions here are detected independently in each viewpoint. As is evident, the pre-images of these affine covariant regions correspond to the same surface region.

Figure 1. Class of transformations needed to cope with viewpoint changes. (a) First viewpoint; (b, c) second viewpoint. Fixed size circular patches (a, b) clearly do not suffice to deal with general viewpoint changes. What is needed is an anisotropic rescaling, i.e., an affinity (c). Bottom row shows close-ups of the images of the top row.

Figure 2. Affine covariant regions offer a solution to viewpoint and illumination changes. First row: one viewpoint; second row: other viewpoint. (a) Original images, (b) detected affine covariant regions, (c) close-up of the detected regions, (d) geometric normalization to circles — the regions are the same up to rotation, (e) photometric and geometric normalization. The slight residual difference in rotation is due to an estimation error.

Given such an affine covariant region, it is then possible to normalize against the geometric and photometric deformations (shown in Fig. 2(d), (e)) and to obtain a viewpoint and illumination invariant description of the intensity pattern within the region.

In a typical matching application, the regions are used as follows. First, a set of covariant regions is detected in an image. Often a large number, perhaps hundreds or thousands, of possibly overlapping regions are obtained. A vector descriptor is then associated with each region, computed from the intensity pattern within the region. This descriptor is chosen to be invariant to viewpoint changes and, to some extent, illumination changes, and to discriminate between the regions. Correspondences may then be established with another image of the same scene, by first detecting and representing regions (independently) in the new image, and then matching the regions based on their descriptors. By design the regions commute with viewpoint change, so by design, corresponding regions in the two images will have similar (ideally identical) vector descriptors. The benefits are that correspondences can then be easily established and, since there are multiple regions, the method is robust to partial occlusions.
This paper gives a snapshot of the state of the art in affine covariant region detection. We will describe and compare six methods of detecting these regions on images. These detectors have been designed and implemented by a number of researchers and the comparison is carried out using binaries supplied by the authors. The detectors are: (i) the 'Harris-Affine' detector (Mikolajczyk and Schmid, 2002, 2004; Schaffalitzky and Zisserman, 2002); (ii) the 'Hessian-Affine' detector (Mikolajczyk and Schmid, 2002, 2004); (iii) the 'maximally stable extremal region' detector (or MSER, for short) (Matas et al., 2002, 2004); (iv) an edge-based region detector (Tuytelaars and Van Gool, 1999, 2004) (referred to as EBR); (v) an intensity extrema-based region detector (Tuytelaars and Van Gool, 2000, 2004) (referred to as IBR); and (vi) an entropy-based region detector (Kadir et al., 2004) (referred to as salient regions).

To limit the scope of the paper we have not included methods for detecting regions which are covariant only to similarity transformations (i.e., in particular scale), such as (Lowe, 1999, 2004; Mikolajczyk and Schmid, 2001; Mikolajczyk et al., 2003), or other methods of computing affine invariant descriptors, such as image lines connecting interest points (Matas et al., 2000; Tell and Carlson, 2000, 2002), or invariant vertical line segments (Goedeme et al., 2004). Also the detectors proposed by Lindeberg and Gårding (1997) and Baumberg (2000) have not been included, as they come very close to the Harris-Affine and Hessian-Affine detectors.

The six detectors are described in Section 2. They are compared on the data set shown in Fig. 9. This data set includes structured and textured scenes as well as different types of transformations: viewpoint changes, scale changes, illumination changes, blur and JPEG compression. It is described in more detail in Section 3. Two types of comparisons are carried out. First, in Section 10, the repeatability of the detector is measured: how well does the detector determine corresponding scene regions? This is measured by comparing the overlap between the ground truth and detected regions, in a manner similar to the evaluation test used in Mikolajczyk and Schmid (2002), but with special attention paid to the effect of the different scales (region sizes) of the various detectors' output. Here, we also measure the accuracy of the regions' shape, scale and localization. Second, the distinctiveness of the detected regions is assessed: how distinguishable are the regions detected? Following Mikolajczyk and Schmid (2003, 2005), we use the SIFT descriptor developed by Lowe (1999), which is a 128-dimensional vector, to describe the intensity pattern within the image regions. This descriptor has been demonstrated to be superior to others used in literature on a number of measures (Mikolajczyk and Schmid, 2003).

Our intention is that the images and tests described here will be a benchmark against which future affine covariant region detectors can be assessed. The images, Matlab code to carry out the performance tests, and binaries of the detectors are available from /~vgg/research/affine.
2. Affine Covariant Detectors

In this section we give a brief description of the six region detectors used in the comparison. Section 2.1 describes the related methods Harris-Affine and Hessian-Affine. Sections 2.2 and 2.3 describe methods for detecting edge-based regions and intensity extrema-based regions. Finally, Sections 2.4 and 2.5 describe MSER and salient regions.

For the purpose of the comparisons the output regions of all detector types are represented by a common shape, which is an ellipse. Figures 3 and 4 show the ellipses for all detectors on one pair of images. In order not to overload the images, only some of the corresponding regions that were actually detected in both images have been shown. This selection is obtained by increasing the threshold.

Figure 3. Regions generated by different detectors on corresponding sub-parts of the first and third graffiti images of Fig. 9(a). The ellipses show the original detection size.

Figure 4. Regions generated by different detectors, continued.

In fact, for most of the detectors the output shape is an ellipse. However, for two of the detectors (edge-based regions and MSER) it is not, and information is lost by this representation, as ellipses can only be matched up to a rotational degree of freedom. Examples of the original regions detected by these two methods are given in Fig. 5. These are parallelogram-shaped regions for the edge-based region detector, and arbitrarily shaped regions for the MSER detector. In the following the representing ellipse is chosen to have the same first and second moments as the originally detected region, which is an affine covariant construction method.

Figure 5. Originally detected region shapes for the regions shown in Figs. 3(c) and 4(b).

2.1. Detectors Based on Affine Normalization — Harris-Affine & Hessian-Affine

We describe here two related methods which detect interest points in scale-space, and then determine an elliptical region for each point. Interest points are either detected with the Harris detector or with a detector based on the Hessian matrix. In both cases scale-selection is based on the Laplacian, and the shape of the elliptical region is determined with the second moment matrix of the intensity gradient (Baumberg, 2000; Lindeberg and Gårding, 1997).

The second moment matrix, also called the auto-correlation matrix, is often used for feature detection or for describing local image structures. Here it is used both in the Harris detector and the elliptical shape estimation. This matrix describes the gradient distribution in a local neighbourhood of a point:

M = \mu(\mathbf{x}, \sigma_I, \sigma_D) = \begin{bmatrix} \mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22} \end{bmatrix} = \sigma_D^2\, g(\sigma_I) * \begin{bmatrix} I_x^2(\mathbf{x}, \sigma_D) & I_x I_y(\mathbf{x}, \sigma_D) \\ I_x I_y(\mathbf{x}, \sigma_D) & I_y^2(\mathbf{x}, \sigma_D) \end{bmatrix} \qquad (1)

The local image derivatives are computed with Gaussian kernels of scale \sigma_D (differentiation scale). The derivatives are then averaged in the neighbourhood of the point by smoothing with a Gaussian window of scale \sigma_I (integration scale). The eigenvalues of this matrix represent two principal signal changes in a neighbourhood of the point. This property enables the extraction of points for which both curvatures are significant, that is, the signal change is significant in orthogonal directions. Such points are stable in arbitrary lighting conditions and are representative of an image. One of the most reliable interest point detectors, the Harris detector (Harris and Stephens, 1988), is based on this principle.
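A minimal NumPy/SciPy sketch of Eq. (1); the default scales are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def second_moment_matrix(img, sigma_d=1.0, sigma_i=1.6):
    """Per-pixel second moment (autocorrelation) matrix of Eq. (1).

    Derivatives are taken at scale sigma_d and averaged with a Gaussian
    window of scale sigma_i; entries are scaled by sigma_d**2.
    """
    img = img.astype(float)
    # Gaussian derivatives at the differentiation scale sigma_d.
    Ix = gaussian_filter(img, sigma_d, order=(0, 1))  # d/dx (axis 1)
    Iy = gaussian_filter(img, sigma_d, order=(1, 0))  # d/dy (axis 0)
    w = sigma_d ** 2
    mu11 = w * gaussian_filter(Ix * Ix, sigma_i)
    mu12 = w * gaussian_filter(Ix * Iy, sigma_i)
    mu22 = w * gaussian_filter(Iy * Iy, sigma_i)
    return mu11, mu12, mu22
```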
A similar idea is explored in the detector based on the Hessian matrix:

H = H(\mathbf{x}, \sigma_D) = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} = \begin{bmatrix} I_{xx}(\mathbf{x}, \sigma_D) & I_{xy}(\mathbf{x}, \sigma_D) \\ I_{xy}(\mathbf{x}, \sigma_D) & I_{yy}(\mathbf{x}, \sigma_D) \end{bmatrix} \qquad (2)

The second derivatives, which are used in this matrix, give strong responses on blobs and ridges. The regions are similar to those detected by a Laplacian operator (trace) (Lindeberg, 1998; Lowe, 1999), but a function based on the determinant of the Hessian matrix penalizes very long structures for which the second derivative in one particular orientation is very small. A local maximum of the determinant indicates the presence of a blob structure.

To deal with scale changes a scale selection method (Lindeberg, 1998) is applied. The idea is to select the characteristic scale of a local structure, for which a given function attains an extremum over scales (see Fig. 6). The selected scale is characteristic in the quantitative sense, since it measures the scale at which there is maximum similarity between the feature detection operator and the local image structures. The size of the region is therefore selected independently of image resolution for each point. The Laplacian operator is used for scale selection in both detectors since it gave the best results in the experimental comparison in Mikolajczyk and Schmid (2001).

Figure 6. Example of characteristic scales. Top row shows images taken with different zoom. Bottom row shows the responses of the Laplacian over scales. The characteristic scales are 10.1 and 3.9 for the left and right image, respectively. The ratio of scales corresponds to the scale factor (2.5) between the two images. The radius of displayed regions in the top row is equal to 3 times the selected scales.

Given the set of initial points extracted at their characteristic scales we can apply the iterative estimation of elliptical affine regions (Lindeberg and Gårding, 1997). The eigenvalues of the second moment matrix are used to measure the affine shape of the point neighbourhood. To determine the affine shape, we find the transformation that projects the affine pattern to the one with equal eigenvalues. This transformation is given by the square root of the second moment matrix, M^{1/2}. If the neighbourhoods of points x_R and x_L are normalized by transformations x'_R = M_R^{1/2} x_R and x'_L = M_L^{1/2} x_L, respectively, the normalized regions are related by a simple rotation x'_L = R x'_R (Baumberg, 2000; Lindeberg and Gårding, 1997). The matrices M'_L and M'_R computed in the normalized frames are equal to a rotation matrix (see Fig. 7). Note that rotation preserves the eigenvalue ratio for an image patch; therefore, the affine deformation can be determined up to a rotation factor. This factor can be recovered by other methods, for example normalization based on the dominant gradient orientation (Lowe, 1999; Mikolajczyk and Schmid, 2002).

Figure 7. Diagram illustrating the affine normalization using the second moment matrices. Image coordinates are transformed with matrices M_L^{-1/2} and M_R^{-1/2}.
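Characteristic scale selection can be sketched as follows (a toy version, assuming a geometric grid of candidate scales):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(img, x, y, sigmas=np.geomspace(1.0, 16.0, 15)):
    """Pick the scale at which the scale-normalized Laplacian response
    at pixel (x, y) attains its maximum magnitude (Lindeberg, 1998)."""
    img = img.astype(float)
    responses = [
        # sigma^2 normalization makes responses comparable across scales.
        abs(s ** 2 * gaussian_laplace(img, s)[y, x])
        for s in sigmas
    ]
    return sigmas[int(np.argmax(responses))]
```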
The estimation of affine shape can be applied to any initial point given that the determinant of the second moment matrix is larger than zero and the signal-to-noise ratio is insignificant for this point. We can therefore use this technique to estimate the shape of initial regions provided by the Harris and Hessian based detectors. The outline of the iterative region estimation:

1. Detect the initial region with the Harris or Hessian detector and select the scale.
2. Estimate the shape with the second moment matrix.
3. Normalize the affine region to a circular one.
4. Go to step 2 if the eigenvalues of the second moment matrix for the new point are not equal.

Examples of Harris-Affine and Hessian-Affine regions are displayed in Fig. 3(a) and (b).
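A rough sketch of this iteration, with an assumed helper M_at that returns the 2x2 second moment matrix in the warped frame and an assumed eigenvalue-ratio convergence test:

```python
import numpy as np

def affine_adapt(M_at, x, max_iter=20, tol=0.05):
    """Iterate shape estimation until the second moment matrix at the
    (warped) point has near-equal eigenvalues (steps 2-4 above).

    M_at(x, U): hypothetical callback returning the second moment
    matrix at point x in the image warped by the 2x2 matrix U.
    """
    U = np.eye(2)
    for _ in range(max_iter):
        M = M_at(x, U)
        evals = np.linalg.eigvalsh(M)
        if evals.min() / evals.max() > 1 - tol:   # nearly isotropic: done
            break
        # Normalize with M^(1/2), whitening the local gradient pattern.
        w, V = np.linalg.eigh(M)
        M_sqrt = V @ np.diag(np.sqrt(w)) @ V.T
        U = M_sqrt @ U
        U /= np.sqrt(np.linalg.det(U))            # keep overall scale fixed
    return U
```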
2.2. An Edge-Based Region Detector

We describe here a method to detect affine covariant regions in an image by exploiting the edges present in the image. The rationale behind this is that edges are typically rather stable features, which can be detected over a range of viewpoint, scale and/or illumination changes. Moreover, by exploiting the edge geometry, the dimensionality of the problem can be significantly reduced. Indeed, as will be shown next, the 6D search problem over all possible affinities (or 4D, once the center point is fixed) can further be reduced to a one-dimensional problem by exploiting the nearby edge geometry.

In practice, we start from a Harris corner point p (Harris and Stephens, 1988) and a nearby edge, extracted with the Canny edge detector (Canny, 1986). To increase the robustness to scale changes, these basic features are extracted at multiple scales. Two points p_1 and p_2 move away from the corner in both directions along the edge, as shown in Fig. 8(a). Their relative speed is coupled through the equality of relative affine invariant parameters l_1 and l_2:

l_i = \int \operatorname{abs}\left(\left| p_i^{(1)}(s_i) \;\; p - p_i(s_i) \right|\right) ds_i \qquad (3)

with s_i an arbitrary curve parameter (in both directions), p_i^{(1)}(s_i) the first derivative of p_i(s_i) with respect to s_i, abs() the absolute value and |...| the determinant. This condition prescribes that the areas between the joint ⟨p, p_1⟩ and the edge and between the joint ⟨p, p_2⟩ and the edge remain identical. This is an affine invariant criterion indeed. From now on, we simply use l when referring to l_1 = l_2.

For each value l, the two points p_1(l) and p_2(l) together with the corner p define a parallelogram Ω(l): the parallelogram spanned by the vectors p_1(l) − p and p_2(l) − p. This yields a one-dimensional family of parallelogram-shaped regions as a function of l. From this 1D family we select one (or a few) parallelograms for which the following photometric quantities of the texture go through an extremum:

\mathrm{Inv}_1 = \operatorname{abs}\left(\frac{|p_1 - p_g \;\; p_2 - p_g|}{|p - p_1 \;\; p - p_2|}\right) \frac{M^1_{00}}{\sqrt{M^2_{00} M^0_{00} - (M^1_{00})^2}}

\mathrm{Inv}_2 = \operatorname{abs}\left(\frac{|p - p_g \;\; q - p_g|}{|p - p_1 \;\; p - p_2|}\right) \frac{M^1_{00}}{\sqrt{M^2_{00} M^0_{00} - (M^1_{00})^2}}

with

M^n_{pq} = \int I^n(x, y)\, x^p y^q \, dx\, dy \qquad (4)

p_g = \left(\frac{M^1_{10}}{M^1_{00}}, \frac{M^1_{01}}{M^1_{00}}\right)

with M^n_{pq} the n-th order, (p+q)-th degree moment computed over the region Ω(l), p_g the center of gravity of the region, weighted with intensity I(x, y), and q the corner of the parallelogram opposite to the corner point p (see Fig. 8(a)). The second factor in these formulas has been added to ensure invariance under an intensity offset.

Figure 8. Construction methods for EBR and IBR. (a) The edge-based region detector starts from a corner point p and exploits nearby edge information; (b) the intensity extrema-based region detector starts from an intensity extremum and studies the intensity pattern along rays emanating from this point.

In the case of straight edges, the method described above cannot be applied, since l = 0 along the entire edge. Since intersections of two straight edges occur quite often, we cannot simply neglect this case. To circumvent this problem, the two photometric quantities given in Eq. (4) are combined and locations where both functions reach a minimum value are taken to fix the parameters s_1 and s_2 along the straight edges. Moreover, instead of relying on the correct detection of the Harris corner point, we can simply use the straight lines' intersection point instead. A more detailed explanation of this method can be found in Tuytelaars and Van Gool (1999, 2004).

Examples of detected regions are displayed in Fig. 5(b). For easy comparison in the context of this paper, the parallelograms representing the invariant regions are replaced by the enclosed ellipses, as shown in Fig. 4(b). However, in this way the orientation information is lost, so it should be avoided in a practical application, as discussed in the beginning of Section 2.

2.3. Intensity Extrema-Based Region Detector

Here we describe a method to detect affine covariant regions that starts from intensity extrema (detected at multiple scales), and explores the image around them in a radial way, delineating regions of arbitrary shape, which are then replaced by ellipses. More precisely, given a local extremum in intensity, the intensity function along rays emanating from the extremum is studied, as shown in Fig. 8(b). The following function is evaluated along each ray:

f_I(t) = \frac{\operatorname{abs}(I(t) - I_0)}{\max\left(\frac{\int_0^t \operatorname{abs}(I(t) - I_0)\, dt}{t},\; d\right)}

with t an arbitrary parameter along the ray, I(t) the intensity at position t, I_0 the intensity value at the extremum and d a small number which has been added to prevent a division by zero. The point for which this function reaches an extremum is invariant under affine geometric and linear photometric transformations (given the ray). Typically, a maximum is reached at positions where the intensity suddenly increases or decreases. The function f_I(t) is in itself already invariant. Nevertheless, we select the points where this function reaches an extremum to make a robust selection. Next, all points corresponding to maxima of f_I(t) along rays originating from the same local extremum are linked to enclose an affine covariant region (see Fig. 8(b)). This often irregularly-shaped region is replaced by an ellipse having the same shape moments up to the second order. This ellipse-fitting is again an affine covariant construction. Examples of detected regions are displayed in Fig. 4(a). More details about this method can be found in Tuytelaars and Van Gool (2000, 2004).
2.4. Maximally Stable Extremal Region Detector

A Maximally Stable Extremal Region (MSER) is a connected component of an appropriately thresholded image. The word 'extremal' refers to the property that all pixels inside the MSER have either higher (bright extremal regions) or lower (dark extremal regions) intensity than all the pixels on its outer boundary. The 'maximally stable' in MSER describes the property optimized in the threshold selection process.

The set of extremal regions E, i.e., the set of all connected components obtained by thresholding, has a number of desirable properties. Firstly, a monotonic change of image intensities leaves E unchanged, since it depends only on the ordering of pixel intensities, which is preserved under monotonic transformation. This ensures that common photometric changes modelled locally as linear or affine leave E unaffected, even if the camera is non-linear (gamma-corrected). Secondly, continuous geometric transformations preserve topology; pixels from a single connected component are transformed to a single connected component. Thus after a geometric change locally approximated by an affine transform, homography or even continuous non-linear warping, a matching extremal region will be in the transformed set E. Finally, there are no more extremal regions than there are pixels in the image. So a set of regions was defined that is preserved under a broad class of geometric and photometric changes and yet has the same cardinality as, e.g., the set of fixed-sized square windows commonly used in narrow-baseline matching.

Implementation Details. The enumeration of the set of extremal regions E is very efficient, almost linear in the number of image pixels. The enumeration proceeds as follows. First, pixels are sorted by intensity. After sorting, pixels are marked in the image (either in decreasing or increasing order) and the list of growing and merging connected components and their areas is maintained using the union-find algorithm (Sedgewick, 1988). During the enumeration process, the area of each connected component as a function of intensity is stored. Among the extremal regions, the 'maximally stable' ones are those corresponding to thresholds where the relative area change as a function of relative change of threshold is at a local minimum. In other words, the MSER are the parts of the image where local binarization is stable over a large range of thresholds. The definition of MSER stability based on relative area change is only affine invariant (both photometrically and geometrically). Consequently, the process of MSER detection is affine covariant.

Detection of MSER is related to thresholding, since every extremal region is a connected component of a thresholded image. However, no global or 'optimal' threshold is sought; all thresholds are tested and the stability of the connected components evaluated. The output of the MSER detector is not a binarized image. For some parts of the image, multiple stable thresholds exist and a system of nested subsets is output in this case. Finally we remark that different sets of extremal regions can be defined just by changing the ordering function. The MSER described in this section and used in the experiments should be more precisely called intensity induced MSERs.
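Because OpenCV ships an MSER implementation, the detector can be exercised directly; the parameter values and file name below are illustrative assumptions, not those used in the paper's experiments:

```python
import cv2

# Detect intensity-induced MSERs and fit affine-covariant ellipses.
gray = cv2.imread("graffiti.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
mser = cv2.MSER_create(5, 60, 14400)  # delta, min area, max area (assumed)
regions, _ = mser.detectRegions(gray)

# Replace each arbitrarily shaped region by its moment-matched ellipse.
ellipses = [cv2.fitEllipse(r.reshape(-1, 1, 2)) for r in regions if len(r) >= 5]
print(f"{len(regions)} MSERs, {len(ellipses)} ellipse fits")
```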
2.5. Salient Region Detector

This detector is based on the pdf of intensity values computed over an elliptical region. Detection proceeds in two steps: first, at each pixel the entropy of the pdf is evaluated over the three-parameter family of ellipses centred on that pixel. The set of entropy extrema over scale and the corresponding ellipse parameters are recorded. These are candidate salient regions. Second, the candidate salient regions over the entire image are ranked using the magnitude of the derivative of the pdf with respect to scale. The top P ranked regions are retained.

In more detail, the elliptical region E centred on a pixel x is parameterized by its scale s (which specifies the major axis), its orientation θ (of the major axis), and the ratio of major to minor axes λ. The pdf of intensities p(I) is computed over E. The entropy H is then given by

H = -\sum_I p(I) \log p(I)

The set of extrema over scale in H is computed for the parameters s, θ, λ for each pixel of the image. For each extremum the derivative of the pdf p(I; s, θ, λ) with respect to s is computed as

W = \frac{s^2}{2s - 1} \sum_I \left| \frac{\partial p(I; s, \theta, \lambda)}{\partial s} \right|

and the saliency Y of the elliptical region is computed as Y = H W. The regions are ranked by their saliency Y. Examples of detected regions are displayed in Fig. 4(c). More details about this method can be found in Kadir et al. (2004).

3. The Image Data Set

Figure 9 shows examples from the image sets used to evaluate the detectors. Five different changes in imaging conditions are evaluated: viewpoint changes (a) & (b); scale changes (c) & (d); image blur (e) & (f); JPEG compression (g); and illumination (h). In the cases of viewpoint change, scale change and blur, the same change in imaging conditions is applied to two different scene types. This means that the effect of changing the image conditions can be separated from the effect of changing the scene type. One scene type contains homogeneous regions with distinctive edge boundaries (e.g. graffiti, buildings), and the other contains repeated textures of different forms. These will be referred to as structured versus textured scenes respectively. In the viewpoint change test the camera varies from a fronto-parallel view to one with significant foreshortening at approximately 60 degrees to the camera. The scale change and blur sequences are acquired by varying the camera zoom and focus respectively.
LBP Features and Edge Operators
Local binary patterns (LBPs) are a type of feature descriptor used in computer vision for texture classification. LBPs were first introduced by Ojala et al. in 1994, and have since become one of the most popular texture descriptors due to their simplicity, efficiency, and robustness.

LBPs are calculated by comparing the value of a pixel with the values of its neighbors. The result is a binary string that represents the local texture pattern around the pixel. The binary string is then converted into a decimal number, which is the LBP code for that pixel.

LBPs can be used to describe a variety of texture patterns, including uniform patterns, such as stripes and checkerboards, and non-uniform patterns, such as clouds and wood. Rotation-invariant and multiscale LBP variants also exist, which makes the descriptor suitable for a variety of applications, such as object recognition and image retrieval.

Edge operators are used in image processing to detect edges in an image. Edges are important features in an image, as they can be used to segment the image into different regions and to identify objects. There are many different edge operators, each with its own strengths and weaknesses. Some of the most popular edge operators include the Sobel operator, the Canny operator, and the Laplacian operator.

The Sobel operator is a simple edge operator that uses a pair of 3x3 kernels to approximate the gradient of an image. The gradient is a vector that points in the direction of the greatest change in intensity. The Sobel operator is efficient and easy to implement, but it is fairly sensitive to noise.

The Canny operator is a more complex, multi-stage edge detector: it smooths the image with a Gaussian filter, computes the gradient, applies non-maximum suppression, and links edges with hysteresis thresholding. It is more robust to noise than the Sobel operator and produces more accurate, thinner edge maps.

The Laplacian operator is a second-order edge operator that uses a 3x3 kernel to approximate the Laplacian of an image, a measure of the second derivative of the image intensity. The Laplacian operator is very sensitive to noise, but it can produce very sharp edge detections.
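A short sketch using scikit-image's local_binary_pattern; P = 8 neighbors on a radius-1 circle is a typical choice, not a requirement:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1.0):
    """Rotation-invariant uniform LBP histogram of a grayscale image.

    P: number of circularly sampled neighbors; R: circle radius.
    """
    codes = local_binary_pattern(gray, P, R, method="uniform")
    # The 'uniform' method yields P + 2 distinct codes; normalize the
    # counts to obtain a texture distribution usable as a feature vector.
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist
```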
A Discriminatively Trained, Multiscale, Deformable Part Model
Pedro Felzenszwalb, University of Chicago (pff@)
David McAllester, Toyota Technological Institute at Chicago (mcallester@)
Deva Ramanan, UC Irvine (dramanan@)

Abstract

This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three dimensional pose.

1. Introduction

We consider the problem of detecting and localizing objects of a generic category, such as people or cars, in static images. We have developed a new multiscale deformable part model for solving this problem. The models are trained using a discriminative procedure that only requires bounding box labels for the positive examples. Using these models we implemented a detection system that is both highly efficient and accurate, processing an image in about 2 seconds and achieving recognition rates that are significantly better than previous systems.

Our system achieves a two-fold improvement in average precision over the winning system [5] in the 2006 PASCAL person detection challenge. The system also outperforms the best results in the 2007 challenge in ten out of twenty object categories. Figure 1 shows an example detection obtained with our person model.

This material is based upon work supported by the National Science Foundation under Grant No. 0534820 and 0535174.

Figure 1. Example detection obtained with the person model. The model is defined by a coarse template, several higher resolution part templates and a spatial model for the location of each part.

The notion that objects can be modeled by parts in a deformable configuration provides an elegant framework for representing object categories [1-3, 6, 10, 12, 13, 15, 16, 22]. While these models are appealing from a conceptual point of view, it has been difficult to establish their value in practice. On difficult datasets, deformable models are often outperformed by "conceptually weaker" models such as rigid templates [5] or bag-of-features [23]. One of our main goals is to address this performance gap.

Our models include both a coarse global template covering an entire object and higher resolution part templates. The templates represent histogram of gradient features [5]. As in [14, 19, 21], we train models discriminatively. However, our system is semi-supervised, trained with a max-margin framework, and does not rely on feature detection.
We also describe a simple and effective strategy for learning parts from weakly-labeled data. In contrast to computationally demanding approaches such as [4], we can learn a model in 3 hours on a single CPU.

Another contribution of our work is a new methodology for discriminative training. We generalize SVMs for handling latent variables such as part positions, and introduce a new method for data mining "hard negative" examples during training. We believe that handling partially labeled data is a significant issue in machine learning for computer vision. For example, the PASCAL dataset only specifies a bounding box for each positive example of an object. We treat the position of each object part as a latent variable. We also treat the exact location of the object as a latent variable, requiring only that our classifier select a window that has large overlap with the labeled bounding box.

A latent SVM, like a hidden CRF [19], leads to a non-convex training problem. However, unlike a hidden CRF, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive training examples. This leads to a general coordinate descent algorithm for latent SVMs.

System Overview. Our system uses a scanning window approach. A model for an object consists of a global "root" filter and several part models. Each part model specifies a spatial model and a part filter. The spatial model defines a set of allowed placements for a part relative to a detection window, and a deformation cost for each placement.

The score of a detection window is the score of the root filter on the window plus the sum over parts, of the maximum over placements of that part, of the part filter score on the resulting subwindow minus the deformation cost. This is similar to classical part-based models [10, 13]. Both root and part filters are scored by computing the dot product between a set of weights and histogram of gradient (HOG) features within a window. The root filter is equivalent to a Dalal-Triggs model [5]. The features for the part filters are computed at twice the spatial resolution of the root filter. Our model is defined at a fixed scale, and we detect objects by searching over an image pyramid.

In training we are given a set of images annotated with bounding boxes around each instance of an object. We reduce the detection problem to a binary classification problem. Each example x is scored by a function of the form f_β(x) = max_z β·Φ(x, z). Here β is a vector of model parameters and z are latent values (e.g. the part placements). To learn a model we define a generalization of SVMs that we call latent variable SVM (LSVM). An important property of LSVMs is that the training problem becomes convex if we fix the latent values for positive examples. This can be used in a coordinate descent algorithm.

In practice we iteratively apply classical SVM training to triples (⟨x_1, z_1, y_1⟩, ..., ⟨x_n, z_n, y_n⟩) where z_i is selected to be the best scoring latent label for x_i under the model learned in the previous iteration. An initial root filter is generated from the bounding boxes in the PASCAL dataset. The parts are initialized from this root filter.

2. Model

The underlying building blocks for our models are the Histogram of Oriented Gradient (HOG) features from [5].
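As an aside (not the authors' code), HOG features with the parameters described in Section 2.1 below — 8x8-pixel cells, 9 orientation bins, block normalization — can be computed with scikit-image; the file name is a placeholder:

```python
from skimage import io
from skimage.feature import hog

image = io.imread("person.png", as_gray=True)  # hypothetical input image

# 9 orientation bins over 8x8-pixel cells, normalized within 2x2 blocks,
# mirroring the construction described in Section 2.1.
features = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
)
print(features.shape)
```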
We represent HOG features at two different scales. Coarse features are captured by a rigid template covering an entire detection window. Finer scale features are captured by part templates that can be moved with respect to the detection window. The spatial model for the part locations is equivalent to a star graph or 1-fan [3] where the coarse template serves as a reference position.

Figure 2. The HOG feature pyramid and an object hypothesis defined in terms of a placement of the root filter (near the top of the pyramid) and the part filters (near the bottom of the pyramid).

2.1. HOG Representation

We follow the construction in [5] to define a dense representation of an image at a particular resolution. The image is first divided into 8x8 non-overlapping pixel regions, or cells. For each cell we accumulate a 1D histogram of gradient orientations over pixels in that cell. These histograms capture local shape properties but are also somewhat invariant to small deformations.

The gradient at each pixel is discretized into one of nine orientation bins, and each pixel "votes" for the orientation of its gradient, with a strength that depends on the gradient magnitude. For color images, we compute the gradient of each color channel and pick the channel with highest gradient magnitude at each pixel. Finally, the histogram of each cell is normalized with respect to the gradient energy in a neighborhood around it. We look at the four 2x2 blocks of cells that contain a particular cell and normalize the histogram of the given cell with respect to the total energy in each of these blocks. This leads to a vector of length 9x4 representing the local gradient information inside a cell.

We define a HOG feature pyramid by computing HOG features of each level of a standard image pyramid (see Figure 2). Features at the top of this pyramid capture coarse gradients histogrammed over fairly large areas of the input image while features at the bottom of the pyramid capture finer gradients histogrammed over small areas.

2.2. Filters

Filters are rectangular templates specifying weights for subwindows of a HOG pyramid. A w by h filter F is a vector with w x h x 9 x 4 weights. The score of a filter is defined by taking the dot product of the weight vector and the features in a w x h subwindow of a HOG pyramid.

The system in [5] uses a single filter to define an object model. That system detects objects from a particular class by scoring every w x h subwindow of a HOG pyramid and thresholding the scores.

Let H be a HOG pyramid and p = (x, y, l) be a cell in the l-th level of the pyramid. Let φ(H, p, w, h) denote the vector obtained by concatenating the HOG features in the w x h subwindow of H with top-left corner at p. The score of F on this detection window is F·φ(H, p, w, h). Below we use φ(H, p) to denote φ(H, p, w, h) when the dimensions are clear from context.
2.3. Deformable Parts

Here we consider models defined by a coarse root filter that covers the entire object and higher resolution part filters covering smaller parts of the object. Figure 2 illustrates a placement of such a model in a HOG pyramid. The root filter location defines the detection window (the pixels inside the cells covered by the filter). The part filters are placed several levels down in the pyramid, so the HOG cells at that level have half the size of cells in the root filter level.

We have found that using higher resolution features for defining part filters is essential for obtaining high recognition performance. With this approach the part filters represent finer resolution edges that are localized to greater accuracy when compared to the edges represented in the root filter. For example, consider building a model for a face. The root filter could capture coarse resolution edges such as the face boundary while the part filters could capture details such as eyes, nose and mouth.

The model for an object with n parts is formally defined by a root filter F_0 and a set of part models (P_1, ..., P_n) where P_i = (F_i, v_i, s_i, a_i, b_i). Here F_i is a filter for the i-th part, v_i is a two-dimensional vector specifying the center for a box of possible positions for part i relative to the root position, s_i gives the size of this box, while a_i and b_i are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part. Figure 1 illustrates a person model.

A placement of a model in a HOG pyramid is given by z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the location of the root filter when i = 0 and the location of the i-th part when i > 0. We assume the level of each part is such that a HOG cell at that level has half the size of a HOG cell at the root level. The score of a placement is given by the scores of each filter (the data term) plus a score of the placement of each part relative to the root (the spatial term),

\sum_{i=0}^{n} F_i \cdot \phi(H, p_i) + \sum_{i=1}^{n} a_i \cdot (\tilde{x}_i, \tilde{y}_i) + b_i \cdot (\tilde{x}_i^2, \tilde{y}_i^2), \qquad (1)

where (\tilde{x}_i, \tilde{y}_i) = ((x_i, y_i) - 2(x, y) + v_i)/s_i gives the location of the i-th part relative to the root location. Both \tilde{x}_i and \tilde{y}_i should be between -1 and 1.

There is a large (exponential) number of placements for a model in a HOG pyramid. We use dynamic programming and distance transform techniques [9, 10] to compute the best location for the parts of a model as a function of the root location. This takes O(nk) time, where n is the number of parts in the model and k is the number of cells in the HOG pyramid. To detect objects in an image we score root locations according to the best possible placement of the parts and threshold this score.

The score of a placement z can be expressed in terms of the dot product, β·ψ(H, z), between a vector of model parameters β and a vector ψ(H, z),

β = (F_0, ..., F_n, a_1, b_1, ..., a_n, b_n).
ψ(H, z) = (φ(H, p_0), φ(H, p_1), ..., φ(H, p_n), \tilde{x}_1, \tilde{y}_1, \tilde{x}_1^2, \tilde{y}_1^2, ..., \tilde{x}_n, \tilde{y}_n, \tilde{x}_n^2, \tilde{y}_n^2).

We use this representation for learning the model parameters as it makes a connection between our deformable models and linear classifiers.

One interesting aspect of the spatial models defined here is that we allow for the coefficients (a_i, b_i) to be negative. This is more general than the quadratic "spring" cost that has been used in previous work.
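For illustration, a brute-force version of this part-placement search (the paper uses distance transforms [9, 10] for efficiency; here the deformation cost is reduced to a scalar quadratic spring, and the filter-response arrays are assumed precomputed):

```python
import numpy as np

def placement_score(root_resp, part_resps, anchors, springs, radius=4):
    """Score each root location: root response plus, for every part,
    the best nearby part response minus a quadratic deformation cost.

    root_resp:  (H, W) root-filter responses at the root level.
    part_resps: list of (2H, 2W) part-filter responses one level down.
    anchors:    list of (vx, vy) anchor offsets in part-level cells.
    springs:    list of scalar quadratic deformation weights (simplified
                from the (a_i, b_i) pairs of Eq. (1)).
    """
    H, W = root_resp.shape
    total = root_resp.astype(float).copy()
    for resp, (vx, vy), b in zip(part_resps, anchors, springs):
        ph, pw = resp.shape
        best = np.full((H, W), -np.inf)
        for ry in range(H):
            for rx in range(W):
                # Part anchor at twice the root resolution, as in Eq. (1).
                cx, cy = 2 * rx + vx, 2 * ry + vy
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        px, py = cx + dx, cy + dy
                        if 0 <= px < pw and 0 <= py < ph:
                            s = resp[py, px] - b * (dx * dx + dy * dy)
                            if s > best[ry, rx]:
                                best[ry, rx] = s
        total += best
    return total
```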
3. Learning

The PASCAL training data consists of a large set of images with bounding boxes around each instance of an object. We reduce the problem of learning a deformable part model with this data to a binary classification problem. Let D = (⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩) be a set of labeled examples where y_i ∈ {-1, 1} and x_i specifies a HOG pyramid, H(x_i), together with a range, Z(x_i), of valid placements for the root and part filters. We construct a positive example from each bounding box in the training set. For these examples we define Z(x_i) so the root filter must be placed to overlap the bounding box by at least 50%. Negative examples come from images that do not contain the target object. Each placement of the root filter in such an image yields a negative training example.

Note that for the positive examples we treat both the part locations and the exact location of the root filter as latent variables. We have found that allowing uncertainty in the root location during training significantly improves the performance of the system (see Section 4).

3.1. Latent SVMs

A latent SVM is defined as follows. We assume that each example x is scored by a function of the form

f_β(x) = \max_{z \in Z(x)} β \cdot Φ(x, z), \qquad (2)

where β is a vector of model parameters and z is a set of latent values. For our deformable models we define Φ(x, z) = ψ(H(x), z) so that β·Φ(x, z) is the score of placing the model according to z.

In analogy to classical SVMs we would like to train β from labeled examples D = (⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩) by optimizing the following objective function,

β^*(D) = \operatorname{argmin}_β \; λ\,||β||^2 + \sum_{i=1}^{n} \max(0, 1 - y_i f_β(x_i)). \qquad (3)

By restricting the latent domains Z(x_i) to a single choice, f_β becomes linear in β, and we obtain linear SVMs as a special case of latent SVMs. Latent SVMs are instances of the general class of energy-based models [18].

3.2. Semi-Convexity

Note that f_β(x) as defined in (2) is a maximum of functions each of which is linear in β. Hence f_β(x) is convex in β. This implies that the hinge loss max(0, 1 - y_i f_β(x_i)) is convex in β when y_i = -1. That is, the loss function is convex in β for negative examples. We call this property of the loss function semi-convexity.

Consider an LSVM where the latent domains Z(x_i) for the positive examples are restricted to a single choice. The loss due to each positive example is now convex. Combined with the semi-convexity property, (3) becomes convex in β. If the labels for the positive examples are not fixed we can compute a local optimum of (3) using a coordinate descent algorithm:

1. Holding β fixed, optimize the latent values for the positive examples z_i = argmax_{z ∈ Z(x_i)} β·Φ(x, z).
2. Holding {z_i} fixed for positive examples, optimize β by solving the convex problem defined above.

It can be shown that both steps always improve or maintain the value of the objective function in (3). If both steps maintain the value we have a strong local optimum of (3), in the sense that Step 1 searches over an exponentially large space of latent labels for positive examples while Step 2 simultaneously searches over weight vectors and an exponentially large space of latent labels for negative examples.
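A schematic of this coordinate descent, using a standard linear SVM for Step 2; phi, latent_domain and the regularization constant are assumed stand-ins, not the paper's implementation:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_latent_svm(pos, neg, phi, latent_domain, iters=5):
    """Coordinate descent for a latent SVM (Section 3.2 above).

    phi(x, z) -> feature vector; latent_domain(x) -> candidate z values.
    pos, neg: lists of examples; every latent choice of a negative is
    itself a negative example.
    """
    # Initialize latent values for positives arbitrarily.
    z_pos = [latent_domain(x)[0] for x in pos]
    clf = None
    for _ in range(iters):
        # Step 2: train a linear SVM with positives' latent values fixed.
        X = [phi(x, z) for x, z in zip(pos, z_pos)]
        y = [1] * len(pos)
        for x in neg:
            for z in latent_domain(x):
                X.append(phi(x, z))
                y.append(-1)
        clf = LinearSVC(C=0.01).fit(np.array(X), np.array(y))
        # Step 1: relabel positives with their best-scoring latent value.
        z_pos = [
            max(latent_domain(x),
                key=lambda z: clf.decision_function(np.array([phi(x, z)]))[0])
            for x in pos
        ]
    return clf
```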
3.3. Data Mining Hard Negatives

In object detection the vast majority of training examples are negative. This makes it infeasible to consider all negative examples at a time. Instead, it is common to construct training data consisting of the positive instances and "hard negative" instances, where the hard negatives are data mined from the very large set of possible negative examples.

Here we describe a general method for data mining examples for SVMs and latent SVMs. The method iteratively solves subproblems using only hard instances. The innovation of our approach is a theoretical guarantee that it leads to the exact solution of the training problem defined using the complete training set. Our results require the use of a margin-sensitive definition of hard examples. The results described here apply both to classical SVMs and to the problem defined by Step 2 of the coordinate descent algorithm for latent SVMs. We omit the proofs of the theorems due to lack of space. These results are related to working set methods [17].

We define the hard instances of D relative to β as

M(β, D) = \{ ⟨x, y⟩ \in D \mid y f_β(x) \le 1 \}. \qquad (4)

That is, M(β, D) are training examples that are incorrectly classified or near the margin of the classifier defined by β. We can show that β^*(D) only depends on hard instances.

Theorem 1. Let C be a subset of the examples in D. If M(β^*(D), D) ⊆ C then β^*(C) = β^*(D).

This implies that in principle we could train a model using a small set of examples. However, this set is defined in terms of the optimal model β^*(D). Given a fixed β we can use M(β, D) to approximate M(β^*(D), D). This suggests an iterative algorithm where we repeatedly compute a model from the hard instances defined by the model from the last iteration. This is further justified by the following fixed-point theorem.

Theorem 2. If β^*(M(β, D)) = β then β = β^*(D).

Let C be an initial "cache" of examples. In practice we can take the positive examples together with random negative examples. Consider the following iterative algorithm:

1. Let β := β^*(C).
2. Shrink C by letting C := M(β, C).
3. Grow C by adding examples from M(β, D) up to a memory limit L.

Theorem 3. If |C| < L after each iteration of Step 2, the algorithm will converge to β = β^*(D) in finite time.
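A compact sketch of this cache scheme over example indices; train and margin are assumed callbacks wrapping the SVM solver and y_i f_β(x_i):

```python
def mine_hard_negatives(train, margin, pos_idx, neg_idx, limit=50000, iters=10):
    """Cache-based hard-example mining (Steps 1-3 above).

    train(idxs) -> model fit on those examples (Step 1).
    margin(model, i) -> y_i * f_beta(x_i); an example is 'hard' when
    this value is <= 1, matching Eq. (4).
    """
    cache = set(pos_idx) | set(neg_idx[: max(0, limit - len(pos_idx))])
    model = None
    for _ in range(iters):
        model = train(sorted(cache))                            # Step 1
        cache = {i for i in cache if margin(model, i) <= 1.0}   # Step 2
        for i in neg_idx:                                       # Step 3
            if len(cache) >= limit:
                break
            if i not in cache and margin(model, i) <= 1.0:
                cache.add(i)
    return model
```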
images not containing the target object.We add negative examples to a cache un-til we encounterfile size limits.A new model is trained by running SVMLight on the positive and negative examples, each labeled with part placements.We update the model10 times using the cache scheme described above.In each it-eration we keep the hard instances from the previous cache and add as many new hard instances as possible within the memory limit.Toward thefinal iterations,we are able to include all hard instances,M(β,D),in the cache.1We picked a simple heuristic by cross-validating over5object classes. We set the model aspect to be the most common(mode)aspect in the data. We set the model size to be the largest size not larger than80%of thedata.Figure3.The image on the left shows the optimization of the la-tent variables for a positive example.The dotted box is the bound-ing box label provided in the PASCAL training set.The large solid box shows the placement of the detection window while the smaller solid boxes show the placements of the parts.The image on the right shows a hard-negative example.4.ResultsWe evaluated our system using the PASCAL VOC2006 and2007comp3challenge datasets and protocol.We refer to[7,8]for details,but emphasize that both challenges are widely acknowledged as difficult testbeds for object detec-tion.Each dataset contains several thousand images of real-world scenes.The datasets specify ground-truth bounding boxes for several object classes,and a detection is consid-ered correct when it overlaps more than50%with a ground-truth bounding box.One scores a system by the average precision(AP)of its precision-recall curve across a testset.Recent work in pedestrian detection has tended to report detection rates versus false positives per window,measured with cropped positive examples and negative images with-out objects of interest.These scores are tied to the reso-lution of the scanning window search and ignore effects of non-maximum suppression,making it difficult to compare different systems.We believe the PASCAL scoring method gives a more reliable measure of performance.The2007challenge has20object categories.We entered a preliminary version of our system in the official competi-tion,and obtained the best score in6categories.Our current system obtains the highest score in10categories,and the second highest score in6categories.Table1summarizes the results.Our system performs well on rigid objects such as cars and sofas as well as highly deformable objects such as per-sons and horses.We also note that our system is successful when given a large or small amount of training data.There are roughly4700positive training examples in the person category but only250in the sofa category.Figure4shows some of the models we learned.Figure5shows some ex-ample detections.We evaluated different components of our system on the longer-established2006person dataset.The top AP scoreaero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tvOur rank 31211224111422112141Our score .180.411.092.098.249.349.396.110.155.165.110.062.301.337.267.140.141.156.206.336Darmstadt .301INRIA Normal .092.246.012.002.068.197.265.018.097.039.017.016.225.153.121.093.002.102.157.242INRIA Plus.136.287.041.025.077.279.294.132.106.127.067.071.335.249.092.072.011.092.242.275IRISA .281.318.026.097.119.289.227.221.175.253MPI Center .060.110.028.031.000.164.172.208.002.044.049.141.198.170.091.004.091.034.237.051MPI 
The 2007 challenge has 20 object categories. We entered a preliminary version of our system in the official competition, and obtained the best score in 6 categories. Our current system obtains the highest score in 10 categories, and the second highest score in 6 categories. Table 1 summarizes the results.

              aero bike bird boat bottle bus  car  cat  chair cow  table dog  horse mbike person plant sheep sofa train tv
Our rank       3    1    2    1    1     2    2    4    1     1    1     4    2     2     1      1     2     1    4     1
Our score     .180 .411 .092 .098 .249  .349 .396 .110 .155  .165 .110  .062 .301  .337  .267   .140  .141  .156 .206  .336
INRIA Normal  .092 .246 .012 .002 .068  .197 .265 .018 .097  .039 .017  .016 .225  .153  .121   .093  .002  .102 .157  .242
INRIA Plus    .136 .287 .041 .025 .077  .279 .294 .132 .106  .127 .067  .071 .335  .249  .092   .072  .011  .092 .242  .275
MPI Center    .060 .110 .028 .031 .000  .164 .172 .208 .002  .044 .049  .141 .198  .170  .091   .004  .091  .034 .237  .051
MPI ESSOL     .152 .157 .098 .016 .001  .186 .120 .240 .007  .061 .098  .162 .034  .208  .117   .002  .046  .147 .110  .054
TKK           .186 .078 .043 .072 .002  .116 .184 .050 .028  .100 .086  .126 .186  .135  .061   .019  .036  .058 .067  .090
Darmstadt     .301 (entered a single class)
IRISA         .281 .318 .026 .097 .119 .289 .227 .221 .175 .253 (scores for the ten classes entered, in class order)
Oxford        .262 .409 .393 .432 .375 .334 (scores for the six classes entered, in class order)

Table 1. PASCAL VOC 2007 results. Average precision scores of our system and other systems that entered the competition [7]. Empty boxes indicate that a method was not tested in the corresponding class. The best score in each class is shown in bold. Our current system ranks first in 10 out of 20 classes. A preliminary version of our system ranked first in 6 classes in the official competition.

Our system performs well on rigid objects such as cars and sofas as well as highly deformable objects such as persons and horses. We also note that our system is successful when given a large or small amount of training data. There are roughly 4700 positive training examples in the person category but only 250 in the sofa category. Figure 4 shows some of the models we learned. Figure 5 shows some example detections.

We evaluated different components of our system on the longer-established 2006 person dataset. The top AP score in the PASCAL competition was .16, obtained using a rigid template model of HOG features [5]. The best previous result of .19 adds a segmentation-based verification step [20]. Figure 6 summarizes the performance of several models we trained. Our root-only model is equivalent to the model from [5] and it scores slightly higher at .18. Performance jumps to .24 when the model is trained with a LSVM that selects a latent position and scale for each positive example. This suggests LSVMs are useful even for rigid templates because they allow for self-adjustment of the detection window in the training examples. Adding deformable parts increases performance to .34 AP, a factor of two above the best previous score. Finally, we trained a model with parts but no root filter and obtained .29 AP. This illustrates the advantage of using a multiscale representation.

We also investigated the effect of the spatial model and allowable deformations on the 2006 person dataset. Recall that si is the allowable displacement of a part, measured in HOG cells. We trained a rigid model with high-resolution parts by setting si to 0. This model outperforms the root-only system by .27 to .24. If we increase the amount of allowable displacements without using a deformation cost, we start to approach a bag-of-features. Performance peaks at si = 1, suggesting it is useful to constrain the part displacements. The optimal strategy allows for larger displacements while using an explicit deformation cost.

Figure 4. Some models learned from the PASCAL VOC 2007 dataset (Bottle, Car, Bicycle, Sofa). We show the total energy in each orientation of the HOG cells in the root and part filters, with the part filters placed at the center of the allowable displacements. We also show the spatial model for each part, where bright values represent "cheap" placements, and dark values represent "expensive" placements.

Figure 5. Some results from the PASCAL 2007 dataset. Each row shows detections using a model for a specific class (Person, Bottle, Car, Sofa, Bicycle, Horse). The first three columns show correct detections while the last column shows false positives. Our system is able to detect objects over a wide range of scales (such as the cars) and poses (such as the horses). The system can also detect partially occluded objects such as a person behind a bush. Note how the false detections are often quite reasonable, for example detecting a bus with the car model, a bicycle sign with the bicycle model, or a dog with the horse model. In general the part filters represent meaningful object parts that are well localized in each detection, such as the head in the person model.

Figure 6. Evaluation of our system on the PASCAL VOC 2006 person dataset. Root uses only a root filter and no latent placement of the detection windows on positive examples. Root+Latent uses a root filter with latent placement of the detection windows. Parts+Latent is a part-based system with latent detection windows but no root filter. Root+Parts+Latent includes both root and part filters, and latent placement of the detection windows.
The following table shows AP as a function of freely allowable deformation in the first three columns. The last column gives the performance when using a quadratic deformation cost and an allowable displacement of 2 HOG cells.

si    0    1    2    3    2 + quadratic cost
AP   .27  .33  .31  .31  .34
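As a concrete illustration of the quadratic deformation cost compared in the table, the sketch below scores part placements within a displacement limit; a brute-force search stands in for the distance-transform method of [9], and the argument names are ours.

import numpy as np

def place_part(responses, anchor, s=2, a=(0.0, 0.0), b=(-1.0, -1.0)):
    # Maximize filter response plus the deformation score
    # a . (dx, dy) + b . (dx^2, dy^2); a = (0, 0), b = -(1, 1) matches the
    # initialization in Section 3.4 (cost = squared norm of the displacement).
    ax, ay = anchor
    best, best_pos = -np.inf, anchor
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            x, y = ax + dx, ay + dy
            if 0 <= y < responses.shape[0] and 0 <= x < responses.shape[1]:
                score = (responses[y, x] + a[0] * dx + a[1] * dy
                         + b[0] * dx * dx + b[1] * dy * dy)
                if score > best:
                    best, best_pos = score, (x, y)
    return best_pos, best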
5. Discussion

We introduced a general framework for training SVMs with latent structure. We used it to build a recognition system based on multiscale, deformable models. Experimental results on difficult benchmark data suggest our system is the current state of the art in object detection.

LSVMs allow for exploration of additional latent structure for recognition. One can consider deeper part hierarchies (parts with parts), mixture models (frontal vs. side cars), and three-dimensional pose. We would like to train and detect multiple classes together using a shared vocabulary of parts (perhaps visual words). We also plan to use A* search [11] to efficiently search over latent parameters during detection.

References
[1] Y. Amit and A. Trouve. POP: Patchwork of parts models for object recognition. IJCV, 75(2):267-282, November 2007.
[2] M. Burl, M. Weber, and P. Perona. A probabilistic approach to object recognition using local photometry and global geometry. In ECCV, pages II:628-641, 1998.
[3] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for part-based recognition using statistical models. In CVPR, pages 10-17, 2005.
[4] D. Crandall and D. Huttenlocher. Weakly supervised learning of part-based spatial models for visual object recognition. In ECCV, pages I:16-29, 2006.
[5] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages I:886-893, 2005.
[6] B. Epshtein and S. Ullman. Semantic hierarchies for recognizing objects and parts. In CVPR, 2007.
[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. /challenges/VOC/voc2007/workshop.
[8] M. Everingham, A. Zisserman, C. K. I. Williams, and L. Van Gool. The PASCAL Visual Object Classes Challenge 2006 (VOC2006) Results. /challenges/VOC/voc2006/results.pdf.
[9] P. Felzenszwalb and D. Huttenlocher. Distance transforms of sampled functions. Cornell Computing and Information Science Technical Report TR2004-1963, September 2004.
[10] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. IJCV, 61(1), 2005.
[11] P. Felzenszwalb and D. McAllester. The generalized A* architecture. JAIR, 29:153-190, 2007.
[12] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In CVPR, 2003.
[13] M. Fischler and R. Elschlager. The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1):67-92, January 1973.
[14] A. Holub and P. Perona. A discriminative framework for modelling object classes. In CVPR, pages I:664-671, 2005.
[15] S. Ioffe and D. Forsyth. Probabilistic methods for finding people. IJCV, 43(1):45-68, June 2001.
[16] Y. Jin and S. Geman. Context and hierarchy in a probabilistic image model. In CVPR, pages II:2145-2152, 2006.
[17] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
[18] Y. LeCun, S. Chopra, R. Hadsell, R. Marc'Aurelio, and F. Huang. A tutorial on energy-based learning. In G. Bakir, T. Hofman, B. Schölkopf, A. Smola, and B. Taskar, editors, Predicting Structured Data. MIT Press, 2006.
[19] A. Quattoni, S. Wang, L. Morency, M. Collins, and T. Darrell. Hidden conditional random fields. PAMI, 29(10):1848-1852, October 2007.
[20] D. Ramanan. Using segmentation to verify object hypotheses. In CVPR, pages 1-8, 2007.
[21] D. Ramanan and C. Sminchisescu. Training deformable models for localization. In CVPR, pages I:206-213, 2006.
[22] H. Schneiderman and T. Kanade. Object detection using the statistics of parts. IJCV, 56(3):151-177, February 2004.
[23] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. IJCV, 73(2):213-238, June 2007.
Self-Localization and Dynamic Target Localization for a Robot Based on Binocular Vision
LU Hongjun (School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870, China)

Abstract: A mobile robot based on binocular vision is easily disturbed by a complex environment (noise, illumination changes, occlusion of the robot, and so on), which seriously degrades the accuracy of both self-localization and moving-target localization. To address this, the color features of the HSV model are used to segment the artificial landmarks accurately, and the robot position is determined according to the parallax principle. A method based on the Harris operator is further proposed so that a binocular robot can localize a moving target precisely in a complex environment: the moving target is detected by the frame-difference method, the Harris operator extracts feature points on the target, and the resulting disparity values yield an accurate position of the moving target. Experimental results show that self-localization and target localization with this method overcome external disturbances, adapt well, and run in real time.

Journal: Journal of Shenyang University, 2017, 29(1): 37-42.
Keywords: binocular vision; target localization; Harris operator; frame-difference method; HSV model
CLC number: TP391.4

At the end of the 20th century, target localization technology mainly comprised infrared-based, ultrasonic-based, and radio-frequency identification techniques [1]. In recent years, with the rapid development of image processing and computer vision, machine-vision research has attracted growing attention from experts and scholars [2]. Binocular stereo vision is an important branch of machine vision: it directly imitates the way human eyes process the external environment [3] and can replace humans in dangerous work such as deep-sea exploration, fire rescue, and nuclear-leak monitoring [4]. Detection and localization of dynamic targets based on binocular stereo vision is likewise one of the frontier topics in machine vision [5].

Binocular stereo localization comprises six steps [6]: (1) image acquisition; (2) image preprocessing; (3) camera calibration; (4) feature point extraction; (5) stereo matching of the feature points to obtain disparity values; (6) robot localization based on the parallax principle. Feature extraction and stereo matching are the key links. The usual approach detects the target from cues such as shape and color and takes the centroid or center of the moving object as the single feature point [7]. Although this is computationally simple, it is highly sensitive to noise: with only one feature point, any occlusion or illumination change seriously degrades the positioning accuracy. In 1977, Moravec proposed extracting image corners from local gray-level changes (Moravec corners) [8]. The computation is relatively simple, but points lying on edges can be falsely detected and the method is sensitive to illumination changes. SIFT features [9] and CenSurE features [10] are insensitive to scale and brightness changes, but stable feature points are hard to extract in weakly textured and other complex scenes, and the algorithms are complex and slow, failing the real-time requirements of a mobile robot.

To address these defects, this paper first detects the moving target with the frame-difference method and then extracts multiple Harris feature points on the target, allowing the mobile robot to localize a moving target accurately and in real time in a complex environment. The overall localization flow is shown in Fig. 1. The robot first segments the artificial landmark in HSV color space to achieve self-localization. It then detects the moving target with the frame-difference method, extracts feature points on the left and right images with the Harris algorithm, obtains disparity values by region matching, and computes the world coordinates of the moving target from the parallax principle, completing target localization.

1.1 Artificial landmark detection

(1) HSV color model. The RGB color space uses red, green, and blue as primaries, which can be mixed in suitable proportions to produce a vast range of colors; it is a common color representation. However, RGB differs greatly from human perception: proximity in RGB space does not imply similarity of the actual colors. To segment the artificial landmark more accurately, this paper uses the HSV color model (Fig. 2). Converting from RGB to HSV is only a simple nonlinear transform with little computation. In the HSV model, H is hue and S is saturation, both independent of the brightness V. Hue H encodes the color information, with a range of 0-180°; thresholding H distinguishes landmarks of different colors. Saturation S indicates how much white is mixed into the color, ranging from 0 to 1; the larger S is, the deeper the color. Brightness V indicates how light or dark the color is, ranging from 0 to 1; the larger V is, the brighter the object.

(2) Landmark extraction based on color features. Since the robot is localized indoors, the artificial landmark designed here is a rectangular board composed of red, yellow, and blue patches. Fig. 3a shows the indoor scene with the landmark captured by the left camera. Thresholding the H, S, and V components according to the HSV model segments the landmark (Fig. 3b); morphological operations then refine the segmentation (Fig. 3c). Fig. 3d shows the extracted center point of the landmark; the parallax principle then yields the robot position for the current frame.

1.2 Frame-difference method

The idea of the frame-difference method [11] is to compute the difference between two adjacent frames of a continuous video and obtain the contour of the moving target from the result. The algorithm is simple to implement, insensitive to illumination changes, and stable, and it suits scenes with multiple targets or rapidly changing backgrounds. Fig. 4 shows a moving object detected indoors by frame differencing; the results show that the method detects the moving target effectively.
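The two detection steps of Sections 1.1 and 1.2 map directly onto standard OpenCV calls. The sketch below is ours, and the threshold values are illustrative rather than those used in the paper.

import cv2

def landmark_mask(bgr, h_lo, h_hi, s_min=80, v_min=60):
    # Section 1.1: threshold one colour patch of the artificial landmark in HSV
    # (OpenCV stores H in [0, 180), matching the 0-180 degree range above)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, (h_lo, s_min, v_min), (h_hi, 255, 255))

def moving_target_mask(prev_gray, curr_gray, thresh=25):
    # Section 1.2: frame difference, then morphology to clean the contour
    _, mask = cv2.threshold(cv2.absdiff(curr_gray, prev_gray),
                            thresh, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)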
2.1 Ranging principle of binocular stereo vision

The parallax principle of binocular stereo vision [12] observes the same scene from two viewpoints with two cameras, acquires images under different viewing angles, and computes the disparity of a space point between the two images to obtain the three-dimensional coordinates of the target.

2.2 Harris corner detection

The Harris corner [13] improves on the Moravec corner. The Harris operator replaces the binary window function with a Gaussian, assigning smaller weights to pixels farther from the center so as to reduce the influence of noise. The Gaussian (restored here in its standard form, since the printed equations were lost in extraction) is

w(x, y) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))    (1)

The Moravec operator considers pixel values in only four directions, whereas the Harris operator uses a Taylor expansion to approximate an arbitrary direction. A shift (Δx, Δy) of the image I(x, y) can be approximated to first order as

I(x + Δx, y + Δy) ≈ I(x, y) + Ix·Δx + Iy·Δy    (2)

In the image I(u, v), the autocorrelation function after shifting the pixel (u, v) by (Δx, Δy) is

C(Δx, Δy) = Σ(u,v) w(u, v) [I(u + Δx, v + Δy) − I(u, v)]²    (3)

Substituting Eq. (2) into Eq. (3) gives

C(Δx, Δy) ≈ (Δx, Δy) M (Δx, Δy)ᵀ    (4)

where M is

M = Σ(u,v) w(u, v) [ Ix²   Ix·Iy ]
                   [ Ix·Iy  Iy²  ]    (5)

The eigenvalues of the matrix M in Eq. (5) approximately characterize the behavior of C(x, y). Three cases must be considered (Fig. 5):
(1) If both eigenvalues of M are small, the gray-level variation function C(x, y) is also small: the gray levels in the neighborhood of the pixel vary little, the image is locally smooth, and there is no corner.
(2) If one eigenvalue is large and the other small, the local curvature behaves likewise: the window region lies on an edge, and there is no corner.
(3) If both eigenvalues are large, the autocorrelation of the gray-level variation is also large, and shifting the window in any direction causes a strong gray-level change: the point is a corner.

By this criterion, corners can be detected simply by computing eigenvalue-based quantities of M. Harris proposed the corner response function

R = det(M) − k (trace(M))²    (6)

where det(M) is the determinant and trace(M) the trace of M, and k is generally taken between 0.04 and 0.06 following Harris's suggestion. A point whose Harris response exceeds a threshold is taken as a corner. The Harris corner involves only first derivatives, so it is insensitive to noise and to gray-level changes caused by illumination, making it a fairly stable feature extraction operator.

3.1 Experimental environment

The robot used here is the autonomous mobile robot Traveler II (旅行家Ⅱ号) developed by Beijing Bochuang Xingsheng Technology Co., Ltd. (Fig. 6). It carries a Bumblebee2 binocular camera made by Point Grey Research of Canada, whose performance parameters are listed in Table 1.

3.2 Localization of a moving target with the traditional method

The experimental environment is an office. The mobile robot equipped with the Bumblebee2 stereo camera is the working robot used to detect the moving target; another robot serves as the moving target, moving at 0.1 m/s. The traditional method extracts the center point of the moving target and obtains its disparity value to localize the target. Since only one image point is used as the stereo-matching feature, the positioning accuracy suffers greatly whenever that point is disturbed by the environment; Fig. 7 shows the center point extracted by the traditional method. Table 2 gives the localization data of the traditional method, and Table 3 the data after the illumination is changed. The localization error of the traditional method is comparatively large, and once the illumination changes, the error in localizing the moving object becomes much more severe.

3.3 Localization of a moving target based on the Harris operator

To overcome the limited accuracy and disturbance sensitivity of the traditional method, Harris-based feature extraction is adopted: multiple pairs of feature points are extracted with the Harris operator on the left and right images (Fig. 8). Tables 4 and 5 give the localization data of the Harris-based method. The localization error is small, the relative error drops to about 1%, and the moving target is still localized accurately when the illumination changes. Finally, for each frame the correct matches between the two images are found according to the region-matching principle [14] and the epipolar constraint, points easily disturbed by noise are excluded, and the resulting disparity values accurately localize the moving target.

Conclusions:
(1) This paper studied robot self-localization and moving-target localization based on binocular stereo vision, making full use of the binocular parallax principle and combining the Harris algorithm with the frame-difference method to localize a moving target precisely. The results show that extracting multiple feature points avoids the fragility of relying on a single, easily disturbed point and yields more accurate localization of the moving target.
(2) Although extracting multiple feature points on the moving target effectively overcomes the shortcomings of the traditional method, problems remain. First, a faster and more accurate stereo-matching algorithm for the feature points is needed. Second, this paper simply averages the multiple disparity values obtained in each frame; how to fuse multiple disparity values effectively is also key to precise localization of a moving target.
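Sections 2.1-3.3 combine into a short pipeline. The sketch below extracts Harris corners inside the motion mask, matches them across the stereo pair (pyramidal Lucas-Kanade is used here as a stand-in for the paper's region matching under the epipolar constraint), averages the disparities as the paper does, and triangulates; f, B, cx, cy are assumed camera parameters in pixels and metres, and the code assumes at least one corner is found and matched.

import cv2
import numpy as np

def locate_target(left_gray, right_gray, mask, f, B, cx, cy):
    corners = cv2.goodFeaturesToTrack(left_gray, maxCorners=50, qualityLevel=0.01,
                                      minDistance=5, mask=mask,
                                      useHarrisDetector=True, k=0.04)
    pts_l = corners.reshape(-1, 2).astype(np.float32)
    pts_r, status, _ = cv2.calcOpticalFlowPyrLK(left_gray, right_gray, pts_l, None)
    good = status.ravel() == 1
    d = pts_l[good, 0] - pts_r[good, 0]      # horizontal disparities
    d = d[d > 0]
    disparity = float(np.mean(d))            # paper averages over the feature points
    Z = f * B / disparity                    # depth from the parallax principle
    u, v = pts_l[good].mean(axis=0)
    return (u - cx) * Z / f, (v - cy) * Z / f, Z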
References
[1] LI Y. Dual video target location technology [J]. Journal of Shenyang University (Natural Science), 2016, 28(4): 302-305.
[2] LI T J. Research and implementation of auto parts surface defect detection algorithm based on robot vision [J]. Journal of Shenyang University (Natural Science), 2013, 25(6): 476-480.
[3] DU Y. Research on key technology of binocular stereo vision in three-dimensional reconstruction [D]. Harbin: Harbin University of Science and Technology, 2014: 1-5.
[4] YU J. Research on target detection and robot control based on binocular vision [D]. Beijing: Beijing Jiaotong University, 2011: 1-4.
[5] DESOUZA G N, KAK A C. Vision for mobile robot navigation: a survey [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(2): 237-267.
[6] GAO D D. Research on recognizing and locating binocular stereo vision technology [D]. Qinhuangdao: Yanshan University, 2013: 9-11.
[7] CUI B X, LUAN T T, ZHANG C, et al. Moving object detection and positioning of robot based on binocular vision [J]. Journal of Shenyang University of Technology, 2016, 38(4): 421-427.
[8] DENG G D. Object location of binocular stereo vision based on multi-scale feature [D]. Harbin: Harbin Institute of Technology, 2012: 21-22.
[9] LOWE D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[10] KONOLIGE K, AGRAWAL M, SOLA J. Large-scale visual odometry for rough terrain [C]// Robotics Research: The 13th International Symposium ISRR, 2011, 66: 201-212.
[11] XIONG Y. Moving object extraction based on background difference and frame difference method [J]. Computer Era, 2014(3): 38-41.
[12] LIN L. The research of visual positioning technology on the binocular robot [D]. Xi'an: Xidian University, 2009: 8-10.
[13] ZHANG C P, WEI X G. Rectangle detection based on Harris corner [J]. Optics and Precision Engineering, 2014, 22(8): 2259-2266.
[14] LUO G E. Some issues of depth perception and three dimension reconstruction from binocular stereo vision [D]. Changsha: Central South University, 2012: 48-53.
Local scale-invariance in disordered systems
1 Introduction
Understanding cooperative phenomena far from equilibrium poses one of the most challenging research problems of present-day many-body physics. At the same time, the practical handling of many of these materials has been pushed to great sophistication, and a lot of practical knowledge about them has existed since prehistoric times. Paradigmatic examples of such systems are glasses. In many cases, they are made by rapidly cooling ('quenching') a molten liquid to below some characteristic temperature threshold. If this cooling happens rapidly enough, normal crystallization no longer takes place and the material remains in some non-equilibrium state. These non-equilibrium states may at first and even second sight look very stationary – everyone has probably seen in archaeological museums intact specimens of Roman glass or even older tools from the Paleolithic or old stone age – after all, obsidian or fire-stone is a quenched volcanic melt. But since the material is not at equilibrium, it is at least in principle possible (and it does happen very often in practice) that over time the properties of the material change – in other words, the material ages.¹ The properties of such non-equilibrium systems depend on the time since they were brought out of equilibrium – their age – and this is colloquially referred to as ageing behaviour.

¹ Recall that physical ageing as it is understood here comes from reversible microscopic processes, whereas chemical or biological ageing may come from the action of essentially irreversible (bio-)chemical processes.
Feature Point Selection Method
Title: Feature Point Selection Method

Section 1: Introduction
The feature point selection method is a technique commonly used in image processing and computer vision to identify distinct points that represent important features in an image. These points are crucial for various applications such as image recognition, object detection, and image registration. The main objective of this method is to accurately detect and extract these feature points, which can then be used to perform further processing tasks.
Section 2: Feature Detection
Feature detection is the first step in the feature point selection method. It involves identifying points in an image that have distinct properties or characteristics. This can be achieved using various algorithms such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), or ORB (Oriented FAST and Rotated BRIEF). These algorithms analyze the image intensities and gradients to detect points with high contrast and uniqueness.
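As a minimal sketch of this detection step, the snippet below uses OpenCV's ORB (one of the algorithms named above); the file name is illustrative, and SIFT or SURF would be drop-in replacements where available.

import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)
print(len(keypoints), "feature points detected")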
Selection of Invariant Objects With a Data-Mining Approach
Andrew Kusiak, Member, IEEE

Abstract—An approach based on data mining for identifying invariant objects in semiconductor applications is presented. An invariant object represents a set of parameter (feature) values of a process and the corresponding outcome, e.g., process quality. The key characteristic of an invariant object is that its outcome can be accurately predicted in the changing data environment. One of the most powerful applications of invariant objects involves the generation of robust settings for controllers in a multiparameter process. The prediction accuracy of such robust settings should be invariant in time, features, and data form. The notion of time-invariant objects refers to objects for which the prediction accuracy is not affected by time. Analogous to time-invariance, objects can be invariant in the features and the data form. The former implies that the prediction accuracy of a set of objects is not impacted by the set of features selected from the same data set. The outcomes of data-form invariant objects prevail despite the change in the data form (data transformation). The use of data transformation methods defined in this paper is twofold: first, to identify the invariance of objects and secondly, to enhance prediction accuracy. The concepts presented in this paper are illustrated with a numerical example and two semiconductor case studies.

Index Terms—Data mining, form invariance, invariant objects, knowledge discovery, parameter invariance, process control, time invariance.

I. INTRODUCTION

Harnessing the value of the growing volume of data offers an opportunity to improve the quality of decision-making. Effective decision-making in a data-intensive environment is likely to define future business activities.

In this paper, data mining is applied for the selection of objects that are invariant in time, features, and data form. Other types of object-invariance are possible and therefore are referred to as X-invariant objects. The notion of time-invariant objects refers to objects for which the prediction accuracy is not affected by time. The concept of time-invariant objects is partially based on the time-invariant association rules discussed in [1]. Analogous to time-invariance, objects can be invariant in the features and the data form. The former implies that the prediction accuracy for a set of objects is not affected by the set of features selected from the same data set. The outcomes of data-form invariant objects prevail despite the change in the data form (data transformation).

The invariant objects have a multitude of applications. The feature values corresponding to the time-invariant objects are used as parameter settings (called process control signatures) in process control. Feature- and data-invariant objects are useful in dealing with missing and incompatible data sets used to derive protocols for disease diagnosis and treatment, process plans, and so on.

(Manuscript received November 23, 2003; revised August 31, 2004. The author is with the Intelligent Systems Laboratory, Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City, IA 52242-1527 USA. Digital Object Identifier 10.1109/TEPM.2005.846832.)

The recent advances in data mining have produced algorithms for extracting knowledge contained in large data sets.
This knowledge can be explicit, e.g., represented as decision rules, and utilized for decision-making in areas where decision models do not exist. Machine-learning algorithms construct relationships among various parameters (features) that can be used to control processes. A typical process, e.g., a wafer production process, may involve stages that are either not well understood or controlled based on an insufficient number of parameters. An important advantage of the data-mining approach is that models are built using the data collected during normal process operations. Machine-learning algorithms that form the basis of this paper are discussed in the next section.

A. Machine-Learning Algorithms

The learning systems of potential interest to this research fall into the following eight categories:

A) Classical statistical methods (e.g., linear discriminant, quadratic discriminant, and logistic discriminant analyses [2]). These methods are concerned with classification problems using a discriminant function for scoring the data. The construction of a discriminant function involves selection of weights. Once constructed, the discriminant function is used to predict decisions for unclassified objects [3].

B) Modern statistical techniques (e.g., projection pursuit classification, density estimation, k-nearest neighbor, causal networks, Bayes theorem [4]). As a new science, data mining borrows from other areas, including statistics. The techniques listed in this category involve different uses of probability to group objects or predict outcomes.

C) Neural networks (e.g., backpropagation, Kohonen, linear vector quantifiers, and radial function networks [2]). Neural networks represent a "black box" learning and decision-making tool. In the process of training a neural network, e.g., by a backpropagation algorithm, weights are assigned to the neural connectors. The efficiency and prediction accuracy of a neural network is greatly affected by the network type (e.g., Kohonen) and the network architecture (e.g., a three-layer network).

D) Support vector machines (SVMs) [4], [6]. SVMs are algorithms for learning and classifying data using a separating hyperplane. Usually they work with transformed data in a space larger than the original space. SVMs perform best when classifying objects with two outcome values (decisions).

E) Decision-tree methods (e.g., ID3 [7], CN2 [8], C4.5 [9], T2 [10], lazy decision trees [11], OODG [12], OC1 [13], AC, BayTree, CAL5, CART, ID5R, IDL, TDIDT, and PROSM (discussed in [2])). The decision-tree algorithms are widely known in industry. A decision-tree algorithm extracts a decision tree from the data, often based on an entropy-related function. The tree contains explicit knowledge that can be easily interpreted by a user.

F) Decision-rule algorithms (e.g., AQ15 [14], [15], LERS [16], and numerous other algorithms based on the rough set theory [17], [18]). Though the knowledge contained in a decision tree can be transformed into rules, decision-rule algorithms make a separate class of machine-learning algorithms due to different principles of knowledge extraction, e.g., the rough set theory.

G) Learning classifier systems (LCSs) (e.g., GOFFER-1 [19], MonaLysa [20], and XCS [21]). An LCS evolves a generalized stimulus-response representation of an environment into decision rules (classifiers). LCSs belong to a class of reinforcement algorithms.

H) Association-rule algorithms (e.g., DB2 Intelligent Miner [22]). This class of algorithm is widely used in segmenting data according to predefined parameters, strength, support, and confidence. The data are usually not labeled (i.e., no decision feature is present). Retail sale and marketing applications of association-rule algorithms are most often cited.
Lim [23] presented a comprehensive comparative study of more than 30 learning algorithms of categories A, B, C, E, and F. The background of category D learning algorithms is provided in [24]. The class G algorithms are discussed in [25]; they were initiated by Holland [26] and Goldberg [27], and expanded by Butz [28]. Class H and many other algorithms are discussed in [29].

II. THE CONCEPT OF INVARIANT OBJECTS

The topic of invariant objects has not been sufficiently researched in the data-mining literature. Most data-mining projects have naturally concentrated on the knowledge discovery process. In this section, the concept of invariant objects is illustrated with an example involving feature- and data form-invariant objects. The approach used to derive these objects is based on the assumption that in any data set there may exist a set of objects resulting in the desired decisions, and some feature values of these objects may prevail in spite of the changes in the data form and the features.

A. How to Construct Invariant Objects

Machine-learning algorithms extract decision rules that are supported by the objects from a training data set. The significance of a decision rule can be measured with the number of supporting objects, which is called rule support. The quality of the knowledge extracted by a learning algorithm is measured by the prediction accuracy, which in turn can be assessed by cross-validation. One of the cross-validation schemes (called the one-out-of-n scheme) is based on the selection of one object at a time, extraction of the knowledge from the remaining n − 1 objects, and checking whether the decision associated with this object agrees with the decision predicted by the extracted knowledge based on the feature values of the object [30]. This process is repeated for all n objects of the data set. The percentage of correctly predicted decisions is called prediction accuracy (for continuous outcomes) and classification accuracy (for discrete outcomes).
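The one-out-of-n scheme is straightforward to reproduce in code. The sketch below uses a scikit-learn decision tree as a stand-in for the paper's rule-induction algorithm; names and types are ours.

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

def correctly_predicted(X, y):
    # Return indices of objects whose outcome is predicted correctly by a
    # model trained on the remaining n-1 objects (one-out-of-n scheme).
    correct = []
    for train, test in LeaveOneOut().split(X):
        clf = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
        if clf.predict(X[test])[0] == y[test][0]:
            correct.append(int(test[0]))
    return correct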
The ideas discussed above are illustrated in the example pre-sented next.1)Illustrative Example:Consider the data set in Table I with seven examples,four features,and thedecision(nega-tive),Z(zero),and P(positive).The goal is to identify invariant objects with theoutcome.It is natural to hypothesize that the invariant objects could be potentially associated with the rules having the strongest support.The data of Table I and numerous other computational studies do not point to a mean-ingful association between the invariant objects and the strong support rules.Consider Rule3in Fig.1,which was extracted from the data set in Table I.It has the strongest support(the objects3,5,7)of all the rules in Fig.1.To derive invariant objects leading to thedecision, the followingfive data sets DS,)have been con-sidered:1)DS“As-is”data set(the same as in Table I);KUSIAK:SELECTION OF INV ARIANT OBJECTS WITH DATA-MINING APPROACH 189TABLE IIT HE R ESULTS OF THE O NE -O UT -OF n (=7)C ROSS -V ALIDATION FOR F IVE D ATA S ETS DS 0DS O RIGINATING F ROM THE D ATA S ET IN T ABLEI2)DS DS with feature F5removed (to demonstrate feature invariance);3)DSDS with the feature set F2_F3and the feature F4discretized into three intervalsint,int ,andint ,cor-responds to the column “Disc F3”in Table II (to demon-strate data form invariance);4)DSDS with features F2and F3removed (to demon-strate data form and feature invariance);5)DSDS with feature F5removed (to demonstrate data form and feature invariance).Theone-out-of cross-validation scheme was ap-plied to each of the five sets,i.e.,one object at a time was re-tained as a test object and rules were extracted from the re-maining six objects (see [30]for the details of cross-validation).This process was repeated seven times for each of the five data sets.The results of the cross-validation are shown in Table II,where entry“”denotes that the corresponding outcome has been correctly predicted,“”indicates a prediction error,and “?”indicates that the decision could not be made.The high-lighted entries in Table II denote the objectswith.For the two objects 3and 5,out of the three highlighted objects inTable II,theoutcomeshave been correctly predicted.For object 7,decisions could not be reached for data setsDS and DS ,and for sets DS andDS erroneous decisions have been predicted.This simple computational experiment shows that the out-come of one of the three highlighted objects in Table II with theoutcome,object 7,could not be predicted over all five data sets.The rules generated from two data sets,DS andDS ,have produced incorrect decisions,and for the other two data sets,DS and DS ,no conclusive decision could be reached.Only one data set,DS ,has produced the correct decision.The outcomes of objects 3and 5have been correctly predicted for all five data sets.The latter indicates that not all objects cor-responding to the rules with the strongest support guarantee the highest predictability.The results produced in Table II could be used in numerous ways.One way,discussed later in this paper,is to control an industrial process by generating a control signature involving the feature values associated with the two objects 3and 5shown in Table III.Note that for feature F4in Table III,both continuous and discrete values have been provided.The control signature derived from the two objects 3and 5of Table III iscs .TABLE IIIT HE O BJECTS I MPACTING THE C ONTROL SIGNATUREThe base data set used to derive this control signature providesbounds on the value of F4.Due to measurement 
errors,noise,and other factors,the con-trol signatures may provide a range of control parameters that need to be tested.The feature values provided by the two objects 3and 5of Table III have been derived based on full cross-vali-dation over five data sets of Table II.The classi fication accuracy of this control signature therefore is 100%.One could form a control signature based on Rule 3of Fig.1,cs.This control signature would include only one feature value,i.e.,.In contrast,the signature constructed from the two objects 3and 5of Table III includes all features but feature F1(note that F2and F3are represented as feature set F2_F3).The classi fication quality (see [17]and [31])of thisfeature isCQ.It is known that a feature with zero value of classi fication quality does not individually contribute to the prediction accuracy.Based on the data of the illustrative example,it has been shown that the objects corresponding to the rules with strongest support do not guarantee the highest prediction accuracy over different variations of data generated from the same process.This can be explained by the fact that the decision rules derived from a data set represent a set of objects that are similar in some features.The decision rules are generally numerous and are in-tended to make predictions based on the entire rule set,given an object with an unknown outcome rather than to recommend speci fic feature values,e.g.,a control signature.Rather than following the approach based on individual deci-sion rules,this paper aims at the determination of objects of high prediction accuracy that are invariant across multiple training data sets of interest.Such a collection of objects contains a range of feature values that can be used as the settings of process con-trol parameters.To reduce the scope of data analysis,the case considered in the illustrative example has the following limitations.a)A small number of objects.b)A small number of features.c)The classi fication quality for the four features in Table I is as follows:CQ ,CQ ,CQ,CQ .The small size of the data set [items a)and b)above]allowed for the concise presentation of the ideas discussed in this paper.The extreme values of the classi fication quality [item c)above]for the two featuresCQandCQ were used to demonstrate the association between the classi fication quality and the support of decision rules.The size limitation of the data set in the illustrative example has been addressed in the case study presented in Section IV .The next section introduces evolving data sets and provides the necessary de finitions used in the remainder of this paper.190IEEE TRANSACTIONS ON ELECTRONICS PACKAGING MANUFACTURING,VOL.28,NO.2,APRIL 2005III.E VOLVING D ATA S ETSMost data-mining projects assume that training data sets are static and do not take into account that data evolve in time.Re-cently,the problem of mining evolving data sets has received some attention,and incremental model maintenance algorithms have been developed (see [32]and [33]).These algorithms are designed to incrementally maintain a data-mining model under arbitrary insertions and deletions of data objects.However,they do not examine the nature of changes and trends of the data values.In many applications,the feature values change rather systematically,e.g.,the vibration of a jet engine rotor increases in time due to the aging of the material,or a patient ’s blood pres-sure may increase with his/her age.Rather than the direct mining of data sets,this paper investi-gates the mining of transformed 
data sets.The transformed data sets may result in rules that offer new qualities,e.g.,lead to the discovery of time invariant objects.The following notation is useful in de fining methods for the analysis of data sets,called data evaluation methods,and is used in this paper.Data set index (the time period in which a data set has beencollected,data formindex,featuresetindex),.T Collection of data sets.DSData set,.Feature of data set DS .Featureindex,,.Set of features for data set DS,.avAverage offeature of data set DS .varVariance offeature of data set DS .corrCorrelation coef ficient offeaturesand of data set DS .CQ Classi fication quality offeature of data setDS.p-value offeature generated from a regression model.noeNoise (e.g.,due to measurement error)associated withfeatureof data set DS .noeMaximum acceptable noise level offeature of data set DS .av Average offeature computed overlag,and .mavMoving average offeature computed overlag,and .A.Data Evaluation Methods P1)Computing and displaying the averageav and thevariancevarfor eachfeature,,and .P2)Computing and displaying the average and the vari-ance of eachfeatureover a time lag and movingaverages,,and .P3)Fitting in a statistical distribution for eachfeature,,and .P4)Computing and displaying correlation coef ficientscorrbetween for any twofeaturesand ,,,and .P5)Computing and displaying an upper and a lower bound(a value range)on eachfeature,,and .P6)Computing feature relevance metrics.Examples ofsuch metrics include:the p-value for each quantitativefeature,,andeach ;the classi fica-tion quality for each qualitative or integerfeature,,andeach .P7)Analysis of the noise (e.g.,due to measurement error)of,,andeach .The above data evaluation methods can be used to de fine suit-able data transformation methods.Some of the most widely ap-plicable data transformation methods are presented next.Ex-amples of data evaluation methods are provided for some data transformation methods.B.Data Transformation MethodsT1)Computing averages and moving averages overlag offeatures ,foralland .Note that in this case the averages are computed over multiple objects rather than the features.T2)Replacing the value of twofeaturesand with theratio ,e.g.,whenavav andavav ,for all,andeach .T3)Replacing the value of twofeaturesand withthedifference,e.g.,whenvarvarvar ,for all,andeach .T4)Forming a featureset , e.g.,whencorrandcorr for each,,,and .T5)Forming a featureset ,e.g.,for the se-lectedfeatureswith a low value ofCQ or high valueof,for some,,,andsome .T6)Removal of insigni ficant features for selected featureswith a low valueof,for some,,,andsome.T7)Discretization of feature values with signi ficant noise,noenoe .T8)Replacing the values offeature with averagesavand moving averagesmav computed overlagforalland ,e.g.,when the features involve many transient values.The above data transformation methods serve two main pur-poses:1)creating data sets in a form that may increase the likeli-hood of extracting rules that may prevail in time,which is important in mining temporal data sets;2)deriving high prediction accuracy objects that are re-silient to the variability of the data,e.g.,the data form.Some of these data transformation methods and the concept of invariant objects are discussed in the two case studies pre-sented below.IV .C ASE S TUDIESThe first study involves data from a wafer production process with product quality as the outcome.Process control signaturesKUSIAK:SELECTION OF INV ARIANT OBJECTS WITH DATA-MINING APPROACH 191derived from invariant objects 
IV. CASE STUDIES

The first study involves data from a wafer production process with product quality as the outcome. Process control signatures derived from invariant objects are presented. The domain of the second case study is an energy conversion process with efficiency as the predicted parameter. This case study demonstrates the impact of transformed features on the prediction accuracy of efficiency.

A. Wafer Production Process

In this case study a data set with 82 features, 86 objects, and three decision values was considered. This data set was transformed by forming feature sets [T3) data transformation method of Section III], removing insignificant features [T6) method of Section III], and discretization [T7) method of Section III]. To increase the degree of confidence in the results, the transformed data set was partitioned into two disjoint subsets P and Q, each with an independent subset of features. Each of the two subsets was discretized into eight, six, four, and three intervals, thus resulting in eight data sets P-8, P-6, P-4, P-3, Q-8, Q-6, Q-4, and Q-3 (see the corresponding columns of Table IV). To identify high prediction accuracy objects, the one-out-of-n (n = 86) cross-validation scheme was applied to each of the eight subsets, i.e., one object at a time was retained as a test object and rules were extracted from the remaining 85 objects. This process was repeated 86 times for each of the eight data sets. The results of the cross-validation are shown in Table IV, where an entry "+" denotes that the corresponding outcome has been correctly predicted, "−" indicates a prediction error, and "?" indicates that the decision could not be predicted. The notation for the last four columns in Table IV is as follows: "Neg" is the number of erroneous decisions for the corresponding object (the row number), "Und" is the number of data sets (rules) for which the decision could not be reached, "Object" denotes the objects of interest, and "D" is the known decision. The highlighted entries in the "Object" column denote the objects for which the decision has been correctly predicted for all eight subsets. The remaining entries in this column denote objects for which the vast majority of the eight decisions have been correctly predicted.

Thirteen of the 32 objects (marked in bold) in Table IV (6, 12, 25, 29, 33, 34, 37, 39, 42, 74, 76, 78, and 82) are associated with the decision P (Positive). This decision is the most desirable outcome of the process control.

The eight data sets used to generate all 13 invariant objects make up a small sample of all possible data sets that could be considered. This raises the question of whether each of these objects leads to a robust control signature. Some of the objects included in the training data set represent transient states while others represent steady states; however, such process status information is not possible to record.

The identification of invariant objects offers some additional benefits:
1) Detection of outcomes in the training data sets that have been assigned in error. The objects with a high value in the "Neg" column may have been assigned an erroneous decision value that could be modified, and the cross-validation process would have to be repeated.
2) The decision "D" assigned in error is likely the reason behind the high value in the "Neg" column in Table IV.
3) The insufficient value of the data (number of objects) is likely the reason behind the small number in the "Und" column of Table IV.
4) An unsuitable data transformation scheme is likely the reason behind the low value in the "Neg" column in Table IV.

TABLE IV. RESULTS OF THE ONE-OUT-OF-n (= 86) CROSS-VALIDATION
In the absence of the process status information, it is likely that only a subset of all invariant objects is associated with the steady state of the process. The process operators have observed that the desired process outcomes relate to groupings of certain features (control parameters). To follow this working hypothesis, from the eight subsets considered in Table IV, two arbitrary subsets P-8 and Q-8 have been created. The decision rules extracted from these two data sets are presented in Appendixes 1 and 2. These rules are concerned with three decisions: N (negative), Z (zero), and P (positive). The rules with the desirable outcome P are of interest to this research. To depict the relationship between the rules with outcome P and the invariant objects of Table IV, the incidence matrix shown in Table V was constructed.

TABLE V. RULE – INVARIANT OBJECT INCIDENCE MATRIX

Each entry "x" in Table V corresponds to a highlighted object from Table IV that is also on the list of objects supporting the corresponding rules. For example, object 82 (see row 82 in Table IV) is one of the nine objects supporting rule 9-1 (see Appendix 1). To enable further analysis, the data in the matrix of Table V have been clustered as shown in Table VI.

The cluster of four rules {9-2, 11-1, 13-2, 8-1} and three objects {37, 76, 42} is clearly visible in Table VI. To analyze the commonality among the invariant objects in this cluster, the features present in the rules of this cluster are considered. The original values of these features are shown in Table VII. The similarity between the feature values among the three objects can be observed. The last five features in Table VII appear as the set F68_F69_F70_F71_F72 in the rule R9-2 of Appendix 2. This leads one to believe that the feature values in Table VII might provide a comfortable range of settings for the eight parameters (features) and one parameter set (feature set). In addition to the similarity among the feature values in Table VII, the analysis of the original data for the object cluster {37, 42, 76} has resulted in the set of features in Table VIII. The sets of these features, F45_F46_F47_F48, F57_F58_F59_F60, F63_F64_F65_F66_F67, F73_F74_F75_F76, and F78_F79_F80_F81_F82, are the most similar of all 13 objects in Table VI. The control signature involves the feature values from Tables VII and VIII.

The case study presented in the next section illustrates the application of a transformation scheme to a large-scale data set to improve prediction accuracy.

B. Energy Conversion Process

The ideas proposed in this research have been applied to improve performance of the energy conversion process. A large database of legacy data and essentially any amount of new data can be collected. In this case the data for 90 parameters have been recorded every minute, which amounts to 1440 observations per 24 hours, every day throughout a year, except for a maintenance period. The 90 parameters encompass the main
An Efficiency Optimization Algorithm for Cross-Modal Retrieval
XU Mingliang, YU Xiaosheng (College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China)

Journal: Computer Technology and Development, 2019, 29(11): 67-70 (4 pages).
Keywords: cross-modal retrieval; semantic gap; canonical correlation analysis; principal component analysis; subspace projection
CLC number: TP301

0 Introduction

With the continuous development of internet technology, people pay more and more attention to information interaction.
People's demand for information has grown from the plain text of early news to images, video, audio, and more. On various network platforms, these different types of data are interwoven and complementary, and certain correlations exist among them. The same piece of information may be presented as different types of data. Cross-modal information retrieval arose in order to find, across different types of data, the items that represent the same information.

Traditional information retrieval extracts feature vectors from data of a single type, measures their similarity, and ranks the results by similarity to realize single-modality retrieval. Cross-modal information retrieval instead builds an implicit relation model between modalities, so that different modalities can be compared for similarity in a common space just as in the single-modality case, thereby enabling retrieval across modalities. Because different types of modal data use different feature extraction schemes, projecting and matching them in a common space involves a huge amount of computation.

To address the heavy computational load of traditional cross-modal retrieval algorithms in high dimensions, this paper proposes an optimization method for cross-modal information retrieval. Experiments show that, compared with the original algorithm, the method greatly reduces the computation and improves retrieval efficiency while keeping precision essentially unchanged.
1 Related Work

Cross-modal information retrieval involves three main steps: first, extract the feature information of each modality to construct feature subspaces; second, use an algorithm to judge the correlation between the data of different modalities in these feature subspaces; third, measure similarity in the feature subspace and produce the corresponding results.
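The keywords name canonical correlation analysis and principal component analysis, so a plausible minimal sketch of steps two and three looks as follows; the random matrices stand in for real paired image/text features, and whether this matches the paper's exact pipeline is an assumption.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

X_img = np.random.rand(200, 512)                  # paired image features (hypothetical)
X_txt = np.random.rand(200, 100)                  # paired text features (hypothetical)

# PCA first reduces dimensionality, cutting the cost of the CCA that follows
X_img = PCA(n_components=64).fit_transform(X_img)
X_txt = PCA(n_components=32).fit_transform(X_txt)

cca = CCA(n_components=10).fit(X_img, X_txt)      # step 2: learn a correlated subspace
Z_img, Z_txt = cca.transform(X_img, X_txt)

q = Z_img[0]                                      # step 3: image query against texts
sims = Z_txt @ q / (np.linalg.norm(Z_txt, axis=1) * np.linalg.norm(q) + 1e-12)
ranking = np.argsort(-sims)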
1.1 Feature representation of modal information

To extract the information carried by different types of data, features must first be extracted from the raw data to obtain feature vectors. For images, features fall into two broad categories: global features and local features. Common global features include color histograms and gray-level texture matrices; common local features include scale-invariant features and histograms of oriented gradients.
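For illustration, one representative of each family can be computed with OpenCV as follows; the file name and patch size are illustrative.

import cv2

img = cv2.imread("image.jpg")                       # hypothetical input
# global feature: a normalized 8x8x8 colour histogram
hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256]).flatten()
hist /= hist.sum() + 1e-12

# local feature: histogram of oriented gradients on a resized patch
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hog = cv2.HOGDescriptor()                           # default 64x128 detection window
hog_vec = hog.compute(cv2.resize(gray, (64, 128))).flatten()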
BEAST Operation Guide
Bayesian evolutionary analysis of viruses: A practical introduction to BEAST
Andrew Rambaut and Alexei J Drummond
September 3, 2009

Introduction

This practical will introduce the BEAST software for Bayesian evolutionary analysis, with a focus on virus evolution. It is divided into three sections or exercises:

1. Rates and dates - The first exercise will take you through the process of estimating divergence times in a phylogeny when you have calibration information from fossil evidence or other prior knowledge. It will also demonstrate how to analyze a data set using a relaxed molecular clock.
2. Time-stamped data - The second exercise will demonstrate how to use BEAST to estimate the rate of evolution of a virus that has been sampled from multiple time points.
3. Bayesian skyline plot - The third exercise will take you through the estimation of the population history of a virus epidemic using the Bayesian skyline plot.

To undertake these exercises you will need to have access to the following software packages in a format that is compatible with your computer system (all three are available for Mac OS X, Windows and Linux/UNIX operating systems):

• BEAST - this package contains the BEAST program, BEAUti, TreeAnnotator and other utility programs. At the time of writing, the current version is v1.5.1. It is available for download from / and /p/beast-mcmc/downloads/list.
• Tracer - this program is used to explore the output of BEAST (and other Bayesian MCMC programs). It graphically and quantitatively summarizes the distributions of continuous parameters and provides diagnostic information. At the time of writing, the current version is v1.4.1. It is available for download from /.
• FigTree - this is an application for displaying and printing molecular phylogenies, in particular those obtained using BEAST. At the time of writing, the current version is v1.2.3. It is available for download from /.

Exercise 1: Rates and dates

This exercise will guide you through the analysis of an alignment of feline papilloma virus (FPV) sequences. The goal is to estimate the rate of evolution on each lineage based on dates of divergence of their host species. BEAST is currently unique in its ability to estimate the phylogenetic tree and the divergence times simultaneously.

The first step will be to convert a NEXUS file with a DATA or CHARACTERS block into a BEAST XML input file. This is done using the program BEAUti (this stands for Bayesian Evolutionary Analysis Utility). This is a user-friendly program for setting the evolutionary model and options for the MCMC analysis. The second step is to actually run BEAST using the input file that contains the data, model and settings.
The final step is to explore the output of BEAST in order to diagnose problems and to summarize the results.

BEAUti

The program BEAUti is a user-friendly program for setting the model parameters for BEAST. Run BEAUti by double clicking on its icon.

Loading the NEXUS file

To load a NEXUS format alignment, simply select the Import Alignment... option from the File menu. Select the file called FPV.nex. This file contains an alignment of partial genome sequences of papilloma virus from 5 species of cat along with related viruses from a racoon and a dog. It looks like this (the lines have been truncated):

#NEXUS
Begin data;
Dimensions ntax=7 nchar=1434;
Format datatype=nucleotide gap=-;
Matrix
CanineOralPV    ATGGCAAGGAAAAGACGCGCAGCCCCTCAAGATATATACCCTGCTTGTAAA
FelisPV1        ATGCTTAGGCAAAAACGTGCAGCCCCAAAAGATATTTACCCACAATGCAAG
LynxPV1         ATGCTACGGCGAAAACGTGCAGCCCCCCATGATATCTACCCCCAATGCAAA
PumaPV1         ATGCTTAGGCGAAAACGTGCAGCCCCCAAAGATATTTACCCCCAATGCAAA
RacoonPV1       ATGACTCGCAAACGCCGCGCCGCTCCTCGTGATATATACCCCTCTTGCAAA
AsianLionPV1    ATGCTAAGGCGAAAACGTGCAGCCCCCTCAGATATCTACCCCCAATGCAAA
SnowLeopardPV1  ATGCTAAGGCGAAAACGTGCAGCCCCTTCTGATATTTACCCACAATGCAAA
;
End;

Once loaded, the list of taxa and the actual alignment will be displayed in the main panel.

Defining the calibration nodes

Select the Taxon Sets tab at the top of the window. You will see the panel that allows you to create sets of taxa. Once you have created a taxon set you will be able to add calibration information for its most recent common ancestor (MRCA) later on. Press the small "plus" button at the bottom left of the panel. This will create a new taxon set. Rename it by double-clicking on the entry that appears (it will initially be called untitled1). Call it Felis/Lynx/Puma. In the next table along you will see the available taxa. Select the FelisPV1, LynxPV1 and PumaPV1 taxa and press the green arrow button to move them into the included taxa set.

Now repeat the whole procedure, creating a set called Lion/Leopard that contains only the SnowLeopardPV1 and AsianLionPV1 taxa.

Finally, create a taxon group called Cats that contains all the cat papilloma virus sequences (i.e. everything except RacoonPV1 and CanineOralPV). If we wished to enforce the racoon and canine PV sequences as an outgroup, we could select the checkbox in the Monophyletic? column. This would ensure that the Cats ingroup is kept monophyletic during the MCMC analysis. However, for this analysis we are not going to enforce this assumption, as we want to confirm that the papilloma virus tree is the same as the hosts'.
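Although the tutorial works entirely in BEAUti, it can be handy to sanity-check the alignment programmatically before moving on. This sketch uses the DendroPy library, which is not part of the BEAST package (install with pip install dendropy).

import dendropy

chars = dendropy.DnaCharacterMatrix.get(path="FPV.nex", schema="nexus")
print(len(chars.taxon_namespace), "taxa loaded")
for taxon in chars:
    print(taxon.label, len(chars[taxon]), "sites")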
estimate the clock rate (and in doing so the divergence times).But this will be auto-matically checked,in this case,when we put a proper prior on tmcra statistics appeared in Priors panel.TreesThe Trees tab allows priors to be specified for each parameter in the model.The first thing to do is to specify that we wish to use the Yule model as the tree prior.Thisis a simple model of speciation that is generally more appropriate when considering sequences from different species.Select this from the Tree prior dropdown menu.需要点击选定PriorsWe now need to specify a prior distribution for some of the divergence times on Prioron our prior fossil knowledge.This is known as calibrating our tree.Weuse two calibrations in this analysis.on the button in the table next:box will appear allowing you to specify a prior for the MRCA of these three species.Select the Normal distribution我们需要详细说明先验分布为了分叉时间,并且表中如有红色字体,请先校正红字后再进行下面操作具体设置可参考第9页最上面表中内容。
Test datasets and source-code sites for computer vision algorithms and applications
The following are the test datasets and source-code sites for computer vision from the appendix of the book Computer Vision: Algorithms and Applications; I have organized them and added some brief annotations.
Computer Vision: Algorithms and Applications, Richard Szeliski. In the final appendix of this book, I summarize some additional material useful to students, professors, and researchers.
The book's Web site (/Book) contains updated datasets and software; please visit it as well.
C.1 Datasets

A key point is to test the reliability of your algorithm on challenging and representative datasets. Such testing can be even more informative (and of higher quality) when ground truth or others' results are available. Over the years, a large number of datasets have been proposed for testing and evaluating computer vision algorithms. Many of these datasets and software packages are indexed on the computer vision home pages. Some newer sites, such as CV Online (/rbf/CVonline), (/), and Computer Vision Online (/), carry more up-to-date datasets and software. Below, I list some of the most frequently used datasets, arranged by chapter so that related material is grouped together.
Chapter 2: Image formation
CUReT: Columbia-Utrecht Reflectance and Texture Database, /CAVE/software/curet/ (Dana, van Ginneken, Nayar et al. 1999).
Middlebury Color Datasets: registered color images taken by different cameras to study how they transform gamuts and colors, /color/data/ (Chakrabarti, Scharstein, and Zickler 2009).

Chapter 3: Image processing
Middlebury test datasets for evaluating MRF minimization/inference algorithms, /MRF/results/ (Szeliski, Zabih, Scharstein et al. 2008).

Chapter 4: Feature detection and matching
Affine Covariant Features database for evaluating feature detector and descriptor matching quality and repeatability, /~vgg/research/affine/ (Mikolajczyk and Schmid 2005; Mikolajczyk, Tuytelaars, Schmid et al. 2005).
Database of matched image patches for learning and feature descriptor evaluation, http://cvlab.epfl.ch/~brown/patchdata/patchdata.html (Winder and Brown 2007; Hua, Brown, and Winder 2007).

Chapter 5: Segmentation
Berkeley Segmentation Dataset and Benchmark of 1000 images labeled by 30 humans, along with an evaluation, /Research/Projects/CS/vision/grouping/segbench/ (Martin, Fowlkes, Tal et al. 2001).
Weizmann segmentation evaluation database of 100 grayscale images with ground truth segmentations, http://www.wisdom.weizmann.ac.il/~vision/SegEvaluationDB/index.html (Alpert, Galun, Basri et al. 2007).

Chapter 8: Dense motion estimation
The Middlebury optic flow evaluation Web site, /flow/data/ (Baker, Scharstein, Lewis et al. 2009).
The Human-Assisted Motion Annotation database, /celiu/motionAnnotation/ (Liu, Freeman, Adelson et al. 2008).

Chapter 10: Computational photography
High Dynamic Range radiance maps, /Research/HDR/ (Debevec and Malik 1997).
Alpha matting evaluation Web site, / (Rhemann, Rother, Wang et al. 2009).

Chapter 11: Stereo correspondence
Middlebury Stereo Datasets and Evaluation, /stereo/ (Scharstein and Szeliski 2002).
Stereo Classification and Performance Evaluation of different aggregation costs for stereo matching, http://www.vision.deis.unibo.it/spe/SPEHome.aspx (Tombari, Mattoccia, Di Stefano et al. 2008).
Middlebury Multi-View Stereo Datasets, /mview/data/ (Seitz, Curless, Diebel et al. 2006).
Multi-view and Oxford Colleges building reconstructions, /~vgg/data/data-mview.html.
Multi-View Stereo Datasets, http://cvlab.epfl.ch/data/strechamvs/ (Strecha, Fransens, and Van Gool 2006).
Multi-View Evaluation, http://cvlab.epfl.ch/~strecha/multiview/ (Strecha, von Hansen, Van Gool et al. 2008).

Chapter 12: 3D reconstruction
HumanEva: synchronized video and motion capture dataset for evaluation of articulated human motion, /humaneva/ (Sigal, Balan, and Black 2010).

Chapter 13: Image-based rendering
The (New) Stanford Light Field Archive, / (Wilburn, Joshi, Vaish et al. 2005).
Virtual Viewpoint Video: multi-viewpoint video with per-frame depth maps, /en-us/um/redmond/groups/ivm/vvv/ (Zitnick, Kang, Uyttendaele et al. 2004).

Chapter 14: Recognition
For a list of visual recognition databases, see Tables 14.1-14.2. In addition to those, there are:
Buffy pose classes, /~vgg/data/buffy pose classes/ and Buffy stickmen V2.1, /~vgg/data/stickmen/index.html (Ferrari, Marin-Jimenez, and Zisserman 2009; Eichner and Ferrari 2009).
H3D database of pose/joint annotated photographs of humans, /~lbourdev/h3d/ (Bourdev and Malik 2009).
Action Recognition Datasets, /projects/vision/action, has pointers to several datasets for action and activity recognition, as well as some papers.
The human action database at http://www.nada.kth.se/cvap/actions/ contains more action sequences.
Design and Implementation of a Recognition System for Printed Mathematical Formulas: Segmentation, Recognition, and Reconstruction
Abstract

With the popularization of computers, people increasingly use them to handle daily work and store information. The OCR systems now in wide use achieve high recognition rates on both handwritten and printed text and are widely applied in office automation, rapid data entry, and similar fields, overcoming the time and labor costs of manual input. However, a scientific document contains a large number of mathematical formulas, which are complex structures composed of special symbols, Greek letters, English characters, and digits. Current OCR systems can only recognize individual characters and cannot analyze formula structure, so a recognized formula is merely an unrelated string of characters that has lost the mathematical meaning it expressed. We therefore propose a new design for expression recognition and give a complete algorithm that converts printed mathematical formulas (in image form) into an editable electronic format (such as LaTeX or the Word formula editor).
Following the processing pipeline of the expression recognition system, this thesis falls into four parts. Part one: segmentation of touching characters.
Because of factors such as the print quality of the document, the smoothness of the paper, scanner resolution, and binarization, characters in the scanned image may be stuck together.
This makes character recognition difficult.
This thesis proposes a self-organizing map (SOM) method for character segmentation, with improvements to the classic self-organizing learning rule that let it approximate the distribution of the white pixels of touching characters with fewer neuron nodes and faster convergence.
The thesis compares shortest-path segmentation with SOM-based segmentation; the latter can separate some touching characters that the former cannot handle.
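For the flavor of this step, here is a minimal sketch of a plain 1-D self-organizing map fitted to the pixel coordinates of a touching-character blob. It is illustrative only: the thesis's modified learning rule and its cut-point selection are not reproduced, and all parameters are assumptions.

```python
import numpy as np

# Minimal 1-D SOM fitted to character pixel coordinates (illustrative;
# the thesis's improved learning rule is not reproduced here).
def train_som(pixels, n_nodes=10, epochs=50, lr0=0.5, sigma0=3.0):
    """pixels: (N, 2) array of foreground pixel coordinates."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(0)
    nodes = pixels[rng.choice(len(pixels), n_nodes)].copy()  # init from data
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                   # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)   # shrinking neighborhood
        for p in pixels[rng.permutation(len(pixels))]:
            w = np.argmin(((nodes - p) ** 2).sum(axis=1))  # winner node
            d = np.abs(np.arange(n_nodes) - w)             # distance on chain
            h = np.exp(-(d ** 2) / (2 * sigma ** 2))       # neighborhood kernel
            nodes += lr * h[:, None] * (p - nodes)         # pull nodes toward p
    return nodes
```

The resulting chain of nodes approximates the stroke layout of the blob; a segmentation method can then cut along weakly supported parts of the chain.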
Part two: feature extraction and selection.
A raw character image is only a pattern in pattern space and cannot be used for classification directly; geometric features invariant to rotation, scaling, and translation must be extracted from it.
The thesis introduces three commonly used moment methods: regular (geometric) moments, Zernike moments, and spline wavelet moments.
Computing separability measures for these three kinds of moments shows that Zernike moments are the best suited as character features.
The thesis also introduces a neural-network-based principal component analysis (PCA) method that selects 18 principal features from the 38-dimensional moment features, greatly reducing the dimensionality of the feature vector while retaining most of the information, removing correlation between samples, and emphasizing their differences.
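As a rough illustration of this reduction step, the sketch below uses classical eigendecomposition PCA rather than the thesis's neural-network formulation (which converges to the same principal subspace); the data here is a random stand-in for real moment features.

```python
import numpy as np

# Classical PCA reducing 38-dim moment features to 18 dims.
def pca_reduce(X, k=18):
    """X: (n_samples, 38) moment-feature matrix. Returns (n_samples, k)."""
    Xc = X - X.mean(axis=0)                 # center the features
    cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvec[:, ::-1][:, :k]            # k leading principal directions
    return Xc @ top                         # project onto the subspace

X = np.random.rand(500, 38)                 # stand-in for real features
print(pca_reduce(X, 18).shape)              # (500, 18)
```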
Part three: character recognition.
The classifier is the core of the whole recognition system.
Neural networks have been widely used in pattern recognition; they overcome shortcomings of the commonly used pattern recognition methods and effectively improve the recognition rate.
The thesis uses a self-organizing feature map for coarse classification, grouping characters with similar features together.
A BP (backpropagation) neural network then performs fine classification within each group, distinguishing the different characters of the same group and effectively improving classification accuracy. Part four: formula reconstruction.
Cameron Hydraulic Data Handbook (20th Edition)

Contents (selected sections): Section 1, Hydraulics; Section 4, Viscosity etc.; Section 6, Steam data.

CONTENTS OF SECTION 1: Hydraulics

Introduction ... 1-3
Liquids ... 1-3
Liquid Flow ... 1-4
Viscosity ... 1-5
Pumping ... 1-6
Volume - System Head Calculations - Suction Head ... 1-6, 1-7
Suction Lift - Total Discharge Head - Velocity Head ... 1-7, 1-8
Total System Head - Pump Head - Pressure - Specific Gravity ... 1-9, 1-10
Net Positive Suction Head ... 1-11
NPSH - Suction Head - Lift; Examples ... 1-11 to 1-16
NPSH - Hydrocarbon Corrections ... 1-16
NPSH - Reciprocating Pumps ... 1-17
Acceleration Head - Reciprocating Pumps ... 1-18
Entrance Losses - Specific Speed ... 1-19
Specific Speed - Impeller ... 1-19
Specific Speed - Suction ... 1-20, 1-21
Submergence ... 1-21, 1-22
Intake Design - Vertical Wet Pit Pumps ... 1-22, 1-27
Work Performed in Pumping ... 1-27
Temperature Rise ... 1-28
Characteristic Curves ... 1-29
Affinity Laws - Stepping Curves ... 1-30
System Curves ... 1-31
Parallel and Series Operation ... 1-32, 1-33
Water Hammer ... 1-34
Reciprocating Pumps - Performance ... 1-35
Reciprocating Pumps - Pulsation Analysis & System Piping ... 1-36 to 1-45
Pump Drivers - Speed Torque Curves ... 1-45, 1-46
Engine Drivers - Impeller Profiles ... 1-47
Hydraulic Institute Charts ... 1-48 to 1-52
Bibliography ... 1-53
Distinctive Image Features from Scale-Invariant Keypoints (Translation)

Distinctive Image Features from Scale-Invariant Keypoints
David G. Lowe, Computer Science Department, University of British Columbia, Vancouver, B.C., Canada

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching of an object or scene between different views.
These features are invariant to image scale and rotation, and provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, noise, and change in illumination.
The features are highly distinctive, in the sense that a single feature from a scene image can be correctly matched with high probability against a large database of features extracted from many images.
This paper also describes an approach to using these features for object recognition.
Recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally verification through a least-squares solution for consistent pose parameters.
This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
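As a hedged sketch of the matching front end described here, the following uses OpenCV's SIFT keypoints, an approximate nearest-neighbor search, and the distance-ratio test; the file names are placeholders, and the Hough clustering and least-squares verification stages are omitted.

```python
import cv2

# SIFT keypoints + approximate nearest-neighbor matching + ratio test.
img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # hypothetical inputs
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN with a KD-tree index (algorithm id 1) for fast approximate search.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des1, des2, k=2)              # 2 nearest neighbors

# Keep a match only if it is clearly better than the second-best candidate.
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(good)} candidate matches")
```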
Keywords: invariant features, object recognition, scale invariance, image matching

1. Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object and scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking.
This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene.
The features are invariant to image scale and rotation, and partially invariant to change in illumination and 3D camera viewpoint.
They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise.
Large numbers of features can be extracted from typical images with efficient algorithms.
In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.
A Survey of Content Based 3D Shape Retrieval Methods
A Survey of Content Based3D Shape Retrieval MethodsJohan W.H.Tangelder and Remco C.VeltkampInstitute of Information and Computing Sciences,Utrecht University hanst@cs.uu.nl,Remco.Veltkamp@cs.uu.nlAbstractRecent developments in techniques for modeling,digitiz-ing and visualizing3D shapes has led to an explosion in the number of available3D models on the Internet and in domain-specific databases.This has led to the development of3D shape retrieval systems that,given a query object, retrieve similar3D objects.For visualization,3D shapes are often represented as a surface,in particular polygo-nal meshes,for example in VRML format.Often these mod-els contain holes,intersecting polygons,are not manifold, and do not enclose a volume unambiguously.On the con-trary,3D volume models,such as solid models produced by CAD systems,or voxels models,enclose a volume prop-erly.This paper surveys the literature on methods for con-tent based3D retrieval,taking into account the applicabil-ity to surface models as well as to volume models.The meth-ods are evaluated with respect to several requirements of content based3D shape retrieval,such as:(1)shape repre-sentation requirements,(2)properties of dissimilarity mea-sures,(3)efficiency,(4)discrimination abilities,(5)ability to perform partial matching,(6)robustness,and(7)neces-sity of pose normalization.Finally,the advantages and lim-its of the several approaches in content based3D shape re-trieval are discussed.1.IntroductionThe advancement of modeling,digitizing and visualizing techniques for3D shapes has led to an increasing amount of3D models,both on the Internet and in domain-specific databases.This has led to the development of thefirst exper-imental search engines for3D shapes,such as the3D model search engine at Princeton university[2,57],the3D model retrieval system at the National Taiwan University[1,17], the Ogden IV system at the National Institute of Multimedia Education,Japan[62,77],the3D retrieval engine at Utrecht University[4,78],and the3D model similarity search en-gine at the University of Konstanz[3,84].Laser scanning has been applied to obtain archives recording cultural heritage like the Digital Michelan-gelo Project[25,48],and the Stanford Digital Formae Urbis Romae Project[75].Furthermore,archives contain-ing domain-specific shape models are now accessible by the Internet.Examples are the National Design Repos-itory,an online repository of CAD models[59,68], and the Protein Data Bank,an online archive of struc-tural data of biological macromolecules[10,80].Unlike text documents,3D models are not easily re-trieved.Attempting tofind a3D model using textual an-notation and a conventional text-based search engine would not work in many cases.The annotations added by human beings depend on language,culture,age,sex,and other fac-tors.They may be too limited or ambiguous.In contrast, content based3D shape retrieval methods,that use shape properties of the3D models to search for similar models, work better than text based methods[58].Matching is the process of determining how similar two shapes are.This is often done by computing a distance.A complementary process is indexing.In this paper,indexing is understood as the process of building a datastructure to speed up the search.Note that the term indexing is also of-ten used for the identification of features in models,or mul-timedia documents in general.Retrieval is the process of searching and delivering the query results.Matching and in-dexing are often part of the retrieval process.Recently,a 
lot of researchers have investigated the spe-cific problem of content based3D shape retrieval.Also,an extensive amount of literature can be found in the related fields of computer vision,object recognition and geomet-ric modelling.Survey papers to this literature have been provided by Besl and Jain[11],Loncaric[50]and Camp-bell and Flynn[16].For an overview of2D shape match-ing methods we refer the reader to the paper by Veltkamp [82].Unfortunately,most2D methods do not generalize di-rectly to3D model matching.Work in progress by Iyer et al.[40]provides an extensive overview of3D shape search-ing techniques.Atmosukarto and Naval[6]describe a num-ber of3D model retrieval systems and methods,but do not provide a categorization and evaluation.In contrast,this paper evaluates3D shape retrieval meth-ods with respect to several requirements on content based 3D shape retrieval,such as:(1)shape representation re-quirements,(2)properties of dissimilarity measures,(3)ef-ficiency,(4)discrimination abilities,(5)ability to perform partial matching,(6)robustness,and(7)necessity of posenormalization.In section2we discuss several aspects of3D shape retrieval.The literature on3D shape matching meth-ods is discussed in section3and evaluated in section4. 2.3D shape retrieval aspectsIn this section we discuss several issues related to3D shape retrieval.2.1.3D shape retrieval frameworkAt a conceptual level,a typical3D shape retrieval frame-work as illustrated byfig.1consists of a database with an index structure created offline and an online query engine. Each3D model has to be identified with a shape descrip-tor,providing a compact overall description of the shape. To efficiently search a large collection online,an indexing data structure and searching algorithm should be available. The online query engine computes the query descriptor,and models similar to the query model are retrieved by match-ing descriptors to the query descriptor from the index struc-ture of the database.The similarity between two descriptors is quantified by a dissimilarity measure.Three approaches can be distinguished to provide a query object:(1)browsing to select a new query object from the obtained results,(2) a direct query by providing a query descriptor,(3)query by example by providing an existing3D model or by creating a3D shape query from scratch using a3D tool or sketch-ing2D projections of the3D model.Finally,the retrieved models can be visualized.2.2.Shape representationsAn important issue is the type of shape representation(s) that a shape retrieval system accepts.Most of the3D models found on the World Wide Web are meshes defined in afile format supporting visual appearance.Currently,the most common format used for this purpose is the Virtual Real-ity Modeling Language(VRML)format.Since these mod-els have been designed for visualization,they often contain only geometry and appearance attributes.In particular,they are represented by“polygon soups”,consisting of unorga-nized sets of polygons.Also,in general these models are not“watertight”meshes,i.e.they do not enclose a volume. 
By contrast,for volume models retrieval methods depend-ing on a properly defined volume can be applied.2.3.Measuring similarityIn order to measure how similar two objects are,it is nec-essary to compute distances between pairs of descriptors us-ing a dissimilarity measure.Although the term similarity is often used,dissimilarity corresponds to the notion of dis-tance:small distances means small dissimilarity,and large similarity.A dissimilarity measure can be formalized by a func-tion defined on pairs of descriptors indicating the degree of their resemblance.Formally speaking,a dissimilarity measure d on a set S is a non-negative valued function d:S×S→R+∪{0}.Function d may have some of the following properties:i.Identity:For all x∈S,d(x,x)=0.ii.Positivity:For all x=y in S,d(x,y)>0.iii.Symmetry:For all x,y∈S,d(x,y)=d(y,x).iv.Triangle inequality:For all x,y,z∈S,d(x,z)≤d(x,y)+d(y,z).v.Transformation invariance:For a chosen transforma-tion group G,for all x,y∈S,g∈G,d(g(x),g(y))= d(x,y).The identity property says that a shape is completely similar to itself,while the positivity property claims that dif-ferent shapes are never completely similar.This property is very strong for a high-level shape descriptor,and is often not satisfied.However,this is not a severe drawback,if the loss of uniqueness depends on negligible details.Symmetry is not always wanted.Indeed,human percep-tion does not alwaysfind that shape x is equally similar to shape y,as y is to x.In particular,a variant x of prototype y,is often found more similar to y then vice versa[81].Dissimilarity measures for partial matching,giving a small distance d(x,y)if a part of x matches a part of y, do not obey the triangle inequality.Transformation invariance has to be satisfied,if the com-parison and the extraction process of shape descriptors have to be independent of the place,orientation and scale of the object in its Cartesian coordinate system.If we want that a dissimilarity measure is not affected by any transforma-tion on x,then we may use as alternative formulation for (v):Transformation invariance:For a chosen transforma-tion group G,for all x,y∈S,g∈G,d(g(x),y)=d(x,y).When all the properties(i)-(iv)hold,the dissimilarity measure is called a metric.Other combinations are possi-ble:a pseudo-metric is a dissimilarity measure that obeys (i),(iii)and(iv)while a semi-metric obeys only(i),(ii)and(iii).If a dissimilarity measure is a pseudo-metric,the tri-angle inequality can be applied to make retrieval more effi-cient[7,83].2.4.EfficiencyFor large shape collections,it is inefficient to sequen-tially match all objects in the database with the query object. 
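To make the dissimilarity properties above concrete, here is a small spot-check sketch for properties (iii) symmetry and (iv) triangle inequality, with the Euclidean distance standing in for a real descriptor dissimilarity; a semi-metric or partial-matching measure would fail the triangle-inequality check.

```python
import numpy as np

# Spot-check symmetry and the triangle inequality on random descriptors.
rng = np.random.default_rng(1)
descriptors = rng.random((100, 32))          # 100 random 32-dim descriptors

def d(x, y):
    return float(np.linalg.norm(x - y))      # candidate dissimilarity measure

for _ in range(1000):
    x, y, z = descriptors[rng.choice(100, 3)]
    assert abs(d(x, y) - d(y, x)) < 1e-12            # symmetry
    assert d(x, z) <= d(x, y) + d(y, z) + 1e-12      # triangle inequality
print("symmetry and triangle inequality hold on all sampled triples")
```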
Because retrieval should be fast,efficient indexing search structures are needed to support efficient retrieval.Since for query by example the shape descriptor is computed online, it is reasonable to require that the shape descriptor compu-tation is fast enough for interactive querying.2.5.Discriminative powerA shape descriptor should capture properties that dis-criminate objects well.However,the judgement of the sim-ilarity of the shapes of two3D objects is somewhat sub-jective,depending on the user preference or the application at hand.E.g.for solid modeling applications often topol-ogy properties such as the numbers of holes in a model are more important than minor differences in shapes.On the contrary,if a user searches for models looking visually sim-ilar the existence of a small hole in the model,may be of no importance to the user.2.6.Partial matchingIn contrast to global shape matching,partial matching finds a shape of which a part is similar to a part of another shape.Partial matching can be applied if3D shape mod-els are not complete,e.g.for objects obtained by laser scan-ning from one or two directions only.Another application is the search for“3D scenes”containing an instance of the query object.Also,this feature can potentially give the user flexibility towards the matching problem,if parts of inter-est of an object can be selected or weighted by the user. 2.7.RobustnessIt is often desirable that a shape descriptor is insensitive to noise and small extra features,and robust against arbi-trary topological degeneracies,e.g.if it is obtained by laser scanning.Also,if a model is given in multiple levels-of-detail,representations of different levels should not differ significantly from the original model.2.8.Pose normalizationIn the absence of prior knowledge,3D models have ar-bitrary scale,orientation and position in the3D space.Be-cause not all dissimilarity measures are invariant under ro-tation and translation,it may be necessary to place the3D models into a canonical coordinate system.This should be the same for a translated,rotated or scaled copy of the model.A natural choice is tofirst translate the center to the ori-gin.For volume models it is natural to translate the cen-ter of mass to the origin.But for meshes this is in gen-eral not possible,because they have not to enclose a vol-ume.For meshes it is an alternative to translate the cen-ter of mass of all the faces to the origin.For example the Principal Component Analysis(PCA)method computes for each model the principal axes of inertia e1,e2and e3 and their eigenvaluesλ1,λ2andλ3,and make the nec-essary conditions to get right-handed coordinate systems. These principal axes define an orthogonal coordinate sys-tem(e1,e2,e3),withλ1≥λ2≥λ3.Next,the polyhe-dral model is rotated around the origin such that the co-ordinate system(e x,e y,e z)coincides with the coordinatesystem(e1,e2,e3).The PCA algorithm for pose estimation is fairly simple and efficient.However,if the eigenvalues are equal,prin-cipal axes may switch,without affecting the eigenvalues. Similar eigenvalues may imply an almost symmetrical mass distribution around an axis(e.g.nearly cylindrical shapes) or around the center of mass(e.g.nearly spherical shapes). Fig.2illustrates the problem.3.Shape matching methodsIn this section we discuss3D shape matching methods. 
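Before the method categories, here is a minimal sketch of the PCA pose normalization described in section 2.8: translate the centroid to the origin, then rotate so the principal axes e1, e2, e3 (eigenvalues in decreasing order) align with the coordinate axes. Sign choices, the equal-eigenvalue ambiguity, and the continuous/weighted PCA variants are deliberately omitted.

```python
import numpy as np

# PCA pose normalization of a 3D point set (e.g., mesh face centroids).
def normalize_pose(points):
    """points: (N, 3) array. Returns the canonically oriented copy."""
    centered = points - points.mean(axis=0)        # move centroid to origin
    cov = centered.T @ centered / len(centered)    # 3x3 covariance
    eigval, eigvec = np.linalg.eigh(cov)           # ascending eigenvalues
    axes = eigvec[:, ::-1]                         # order so lambda1 >= lambda2 >= lambda3
    if np.linalg.det(axes) < 0:                    # keep a right-handed frame
        axes[:, 2] = -axes[:, 2]
    return centered @ axes                         # rotate into (e1, e2, e3)
```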
We divide shape matching methods in three broad cate-gories:(1)feature based methods,(2)graph based meth-ods and(3)other methods.Fig.3illustrates a more detailed categorization of shape matching methods.Note,that the classes of these methods are not completely disjoined.For instance,a graph-based shape descriptor,in some way,de-scribes also the global feature distribution.By this point of view the taxonomy should be a graph.3.1.Feature based methodsIn the context of3D shape matching,features denote ge-ometric and topological properties of3D shapes.So3D shapes can be discriminated by measuring and comparing their features.Feature based methods can be divided into four categories according to the type of shape features used: (1)global features,(2)global feature distributions,(3)spa-tial maps,and(4)local features.Feature based methods from thefirst three categories represent features of a shape using a single descriptor consisting of a d-dimensional vec-tor of values,where the dimension d isfixed for all shapes.The value of d can easily be a few hundred.The descriptor of a shape is a point in a high dimensional space,and two shapes are considered to be similar if they are close in this space.Retrieving the k best matches for a3D query model is equivalent to solving the k nearest neighbors -ing the Euclidean distance,matching feature descriptors can be done efficiently in practice by searching in multiple1D spaces to solve the approximate k nearest neighbor prob-lem as shown by Indyk and Motwani[36].In contrast with the feature based methods from thefirst three categories,lo-cal feature based methods describe for a number of surface points the3D shape around the point.For this purpose,for each surface point a descriptor is used instead of a single de-scriptor.3.1.1.Global feature based similarityGlobal features characterize the global shape of a3D model. 
Examples of these features are the statistical moments of the boundary or the volume of the model,volume-to-surface ra-tio,or the Fourier transform of the volume or the boundary of the shape.Zhang and Chen[88]describe methods to com-pute global features such as volume,area,statistical mo-ments,and Fourier transform coefficients efficiently.Paquet et al.[67]apply bounding boxes,cords-based, moments-based and wavelets-based descriptors for3D shape matching.Corney et al.[21]introduce convex-hull based indices like hull crumpliness(the ratio of the object surface area and the surface area of its convex hull),hull packing(the percentage of the convex hull volume not occupied by the object),and hull compactness(the ratio of the cubed sur-face area of the hull and the squared volume of the convex hull).Kazhdan et al.[42]describe a reflective symmetry de-scriptor as a2D function associating a measure of reflec-tive symmetry to every plane(specified by2parameters) through the model’s centroid.Every function value provides a measure of global shape,where peaks correspond to the planes near reflective symmetry,and valleys correspond to the planes of near anti-symmetry.Their experimental results show that the combination of the reflective symmetry de-scriptor with existing methods provides better results.Since only global features are used to characterize the overall shape of the objects,these methods are not very dis-criminative about object details,but their implementation is straightforward.Therefore,these methods can be used as an activefilter,after which more detailed comparisons can be made,or they can be used in combination with other meth-ods to improve results.Global feature methods are able to support user feed-back as illustrated by the following research.Zhang and Chen[89]applied features such as volume-surface ratio, moment invariants and Fourier transform coefficients for 3D shape retrieval.They improve the retrieval performance by an active learning phase in which a human annotator as-signs attributes such as airplane,car,body,and so on to a number of sample models.Elad et al.[28]use a moments-based classifier and a weighted Euclidean distance measure. Their method supports iterative and interactive database searching where the user can improve the weights of the distance measure by marking relevant search results.3.1.2.Global feature distribution based similarityThe concept of global feature based similarity has been re-fined recently by comparing distributions of global features instead of the global features directly.Osada et al.[66]introduce and compare shape distribu-tions,which measure properties based on distance,angle, area and volume measurements between random surface points.They evaluate the similarity between the objects us-ing a pseudo-metric that measures distances between distri-butions.In their experiments the D2shape distribution mea-suring distances between random surface points is most ef-fective.Ohbuchi et al.[64]investigate shape histograms that are discretely parameterized along the principal axes of inertia of the model.The shape descriptor consists of three shape histograms:(1)the moment of inertia about the axis,(2) the average distance from the surface to the axis,and(3) the variance of the distance from the surface to the axis. 
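As an illustration of the shape-distribution idea, here is a hedged sketch of Osada et al.'s D2 descriptor: a normalized histogram of distances between random surface points. It assumes points already sampled uniformly from the surface; area-weighted triangle sampling is omitted.

```python
import numpy as np

# D2 shape distribution: histogram of pairwise distances between
# random surface points.
def d2_distribution(surface_points, n_pairs=100_000, bins=64):
    rng = np.random.default_rng(0)
    i = rng.integers(0, len(surface_points), n_pairs)
    j = rng.integers(0, len(surface_points), n_pairs)
    dists = np.linalg.norm(surface_points[i] - surface_points[j], axis=1)
    hist, _ = np.histogram(dists, bins=bins, range=(0, dists.max()))
    return hist / hist.sum()                  # normalized descriptor

# Two shapes are then compared by a distance between their histograms,
# e.g. an L1 norm: np.abs(h1 - h2).sum()
```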
Their experiments show that the axis-parameterized shape features work only well for shapes having some form of ro-tational symmetry.Ip et al.[37]investigate the application of shape distri-butions in the context of CAD and solid modeling.They re-fined Osada’s D2shape distribution function by classifying2random points as1)IN distances if the line segment con-necting the points lies complete inside the model,2)OUT distances if the line segment connecting the points lies com-plete outside the model,3)MIXED distances if the line seg-ment connecting the points lies passes both inside and out-side the model.Their dissimilarity measure is a weighted distance measure comparing D2,IN,OUT and MIXED dis-tributions.Since their method requires that a line segment can be classified as lying inside or outside the model it is required that the model defines a volume properly.There-fore it can be applied to volume models,but not to polyg-onal soups.Recently,Ip et al.[38]extend this approach with a technique to automatically categorize a large model database,given a categorization on a number of training ex-amples from the database.Ohbuchi et al.[63],investigate another extension of the D2shape distribution function,called the Absolute Angle-Distance histogram,parameterized by a parameter denot-ing the distance between two random points and by a pa-rameter denoting the angle between the surfaces on which two random points are located.The latter parameter is ac-tually computed as an inner product of the surface normal vectors.In their evaluation experiment this shape distribu-tion function outperformed the D2distribution function at about1.5times higher computational costs.Ohbuchi et al.[65]improved this method further by a multi-resolution ap-proach computing a number of alpha-shapes at different scales,and computing for each alpha-shape their Absolute Angle-Distance descriptor.Their experimental results show that this approach outperforms the Angle-Distance descrip-tor at the cost of high processing time needed to compute the alpha-shapes.Shape distributions distinguish models in broad cate-gories very well:aircraft,boats,people,animals,etc.How-ever,they perform often poorly when having to discrimi-nate between shapes that have similar gross shape proper-ties but vastly different detailed shape properties.3.1.3.Spatial map based similaritySpatial maps are representations that capture the spatial lo-cation of an object.The map entries correspond to physi-cal locations or sections of the object,and are arranged in a manner that preserves the relative positions of the features in an object.Spatial maps are in general not invariant to ro-tations,except for specially designed maps.Therefore,typ-ically a pose normalization is donefirst.Ankerst et al.[5]use shape histograms as a means of an-alyzing the similarity of3D molecular surfaces.The his-tograms are not built from volume elements but from uni-formly distributed surface points taken from the molecular surfaces.The shape histograms are defined on concentric shells and sectors around a model’s centroid and compare shapes using a quadratic form distance measure to compare the histograms taking into account the distances between the shape histogram bins.Vrani´c et al.[85]describe a surface by associating to each ray from the origin,the value equal to the distance to the last point of intersection of the model with the ray and compute spherical harmonics for this spherical extent func-tion.Spherical harmonics form a Fourier basis on a sphere much like the 
familiar sine and cosine do on a line or a cir-cle.Their method requires pose normalization to provide rotational invariance.Also,Yu et al.[86]propose a descrip-tor similar to a spherical extent function and a descriptor counting the number of intersections of a ray from the ori-gin with the model.In both cases the dissimilarity between two shapes is computed by the Euclidean distance of the Fourier transforms of the descriptors of the shapes.Their method requires pose normalization to provide rotational in-variance.Kazhdan et al.[43]present a general approach based on spherical harmonics to transform rotation dependent shape descriptors into rotation independent ones.Their method is applicable to a shape descriptor which is defined as either a collection of spherical functions or as a function on a voxel grid.In the latter case a collection of spherical functions is obtained from the function on the voxel grid by restricting the grid to concentric spheres.From the collection of spher-ical functions they compute a rotation invariant descriptor by(1)decomposing the function into its spherical harmon-ics,(2)summing the harmonics within each frequency,and computing the L2-norm for each frequency component.The resulting shape descriptor is a2D histogram indexed by ra-dius and frequency,which is invariant to rotations about the center of the mass.This approach offers an alternative for pose normalization,because their method obtains rotation invariant shape descriptors.Their experimental results show indeed that in general the performance of the obtained ro-tation independent shape descriptors is better than the cor-responding normalized descriptors.Their experiments in-clude the ray-based spherical harmonic descriptor proposed by Vrani´c et al.[85].Finally,note that their approach gen-eralizes the method to compute voxel-based spherical har-monics shape descriptor,described by Funkhouser et al.[30],which is defined as a binary function on the voxel grid, where the value at each voxel is given by the negatively ex-ponentiated Euclidean Distance Transform of the surface of a3D model.Novotni and Klein[61]present a method to compute 3D Zernike descriptors from voxelized models as natural extensions of spherical harmonics based descriptors.3D Zernike descriptors capture object coherence in the radial direction as well as in the direction along a sphere.Both 3D Zernike descriptors and spherical harmonics based de-scriptors achieve rotation invariance.However,by sampling the space only in radial direction the latter descriptors donot capture object coherence in the radial direction,as illus-trated byfig.4.The limited experiments comparing spherical harmonics and3D Zernike moments performed by Novotni and Klein show similar results for a class of planes,but better results for the3D Zernike descriptor for a class of chairs.Vrani´c[84]expects that voxelization is not a good idea, because manyfine details are lost in the voxel grid.There-fore,he compares his ray-based spherical harmonic method [85]and a variation of it using functions defined on concen-tric shells with the voxel-based spherical harmonics shape descriptor proposed by Funkhouser et al.[30].Also,Vrani´c et al.[85]accomplish pose normalization using the so-called continuous PCA algorithm.In the paper it is claimed that the continuous PCA is better as the conventional PCA and better as the weighted PCA,which takes into account the differing sizes of the triangles of a mesh.In contrast with Kazhdan’s experiments[43]the experiments by Vrani´c 
show that for ray-based spherical harmonics using the con-tinuous PCA without voxelization is better than using rota-tion invariant shape descriptors obtained using voxelization. Perhaps,these results are opposite to Kazhdan results,be-cause of the use of different methods to compute the PCA or the use of different databases or both.Kriegel et al.[46,47]investigate similarity for voxelized models.They obtain a spatial map by partitioning a voxel grid into disjoint cells which correspond to the histograms bins.They investigate three different spatial features asso-ciated with the grid cells:(1)volume features recording the fraction of voxels from the volume in each cell,(2) solid-angle features measuring the convexity of the volume boundary in each cell,(3)eigenvalue features estimating the eigenvalues obtained by the PCA applied to the voxels of the model in each cell[47],and a fourth method,using in-stead of grid cells,a moreflexible partition of the voxels by cover sequence features,which approximate the model by unions and differences of cuboids,each containing a number of voxels[46].Their experimental results show that the eigenvalue method and the cover sequence method out-perform the volume and solid-angle feature method.Their method requires pose normalization to provide rotational in-variance.Instead of representing a cover sequence with a single feature vector,Kriegel et al.[46]represent a cover sequence by a set of feature vectors.This approach allows an efficient comparison of two cover sequences,by compar-ing the two sets of feature vectors using a minimal match-ing distance.The spatial map based approaches show good retrieval results.But a drawback of these methods is that partial matching is not supported,because they do not encode the relation between the features and parts of an object.Fur-ther,these methods provide no feedback to the user about why shapes match.3.1.4.Local feature based similarityLocal feature based methods provide various approaches to take into account the surface shape in the neighbourhood of points on the boundary of the shape.Shum et al.[74]use a spherical coordinate system to map the surface curvature of3D objects to the unit sphere. 
By searching over a spherical rotation space a distance be-tween two curvature distributions is computed and used as a measure for the similarity of two objects.Unfortunately, the method is limited to objects which contain no holes, i.e.have genus zero.Zaharia and Prˆe teux[87]describe the 3D Shape Spectrum Descriptor,which is defined as the histogram of shape index values,calculated over an en-tire mesh.The shape index,first introduced by Koenderink [44],is defined as a function of the two principal curvatures on continuous surfaces.They present a method to compute these shape indices for meshes,byfitting a quadric surface through the centroids of the faces of a mesh.Unfortunately, their method requires a non-trivial preprocessing phase for meshes that are not topologically correct or not orientable.Chua and Jarvis[18]compute point signatures that accu-mulate surface information along a3D curve in the neigh-bourhood of a point.Johnson and Herbert[41]apply spin images that are2D histograms of the surface locations around a point.They apply spin images to recognize models in a cluttered3D scene.Due to the complexity of their rep-resentation[18,41]these methods are very difficult to ap-ply to3D shape matching.Also,it is not clear how to define a dissimilarity function that satisfies the triangle inequality.K¨o rtgen et al.[45]apply3D shape contexts for3D shape retrieval and matching.3D shape contexts are semi-local descriptions of object shape centered at points on the sur-face of the object,and are a natural extension of2D shape contexts introduced by Belongie et al.[9]for recognition in2D images.The shape context of a point p,is defined as a coarse histogram of the relative coordinates of the re-maining surface points.The bins of the histogram are de-。
scale-invariant attack method (Reply)

What is a scale-invariant attack method, and why is it important? A scale-invariant attack method is an approach widely used in computer vision and image processing.
By analyzing and processing images at multiple scales, it enables an algorithm to achieve similar results on images of different resolutions.
Scale-invariant methods aim to improve the performance of image processing algorithms over large fields of view and to deliver more stable and reliable results in computer vision tasks.
Scale-invariant methods are widely applied in face recognition, object detection, image classification, image segmentation, and related areas.
In face recognition, face images may appear at different scales; for example, faces photographed at close range and from a distance have different resolutions.
A scale-invariant method lets the algorithm recognize faces across scales and produce more accurate results.
Likewise, in object detection, scale-invariant methods help an algorithm detect target objects at different scales, making it more adaptable and robust.
The importance of scale-invariant methods shows in several respects. 1. Adapting to any scale: a scale-invariant method achieves similar processing results on images at different scales, so the algorithm can handle images of different resolutions.
This strengthens the algorithm's adaptability, letting it perform well across different environments and scenes.
2. Robustness and reliability: scale-invariant methods improve an algorithm's robustness to scale transformations of the image.
Whether an image is scaled, stretched, or rotated, a scale-invariant method maintains performance and reduces recognition or detection errors caused by scale changes.
3. Better algorithm performance: scale-invariant methods improve the accuracy and stability of image processing algorithms.
By analyzing and processing an image at multiple scales, an algorithm can draw on more image information, raising its effectiveness and success rate.
4. Practicality and generality: scale-invariant methods are widely used in computer vision tasks such as face recognition, object detection, and image classification.
These tasks place high demands on scale invariance, so scale-invariant methods are of substantial practical value and broad applicability.
The main steps in implementing a scale-invariant method are as follows. 1. Scale-space construction: a common technique is to build the scale space with a Gaussian pyramid or a Laplacian pyramid, as sketched below.
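A minimal sketch of this scale-space construction with OpenCV follows; the input file name is a placeholder.

```python
import cv2

# Build a 5-level Gaussian pyramid and derive the Laplacian pyramid from it.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input

gaussian = [img]
for _ in range(4):
    gaussian.append(cv2.pyrDown(gaussian[-1]))        # each level half-size

laplacian = []
for fine, coarse in zip(gaussian, gaussian[1:]):
    up = cv2.pyrUp(coarse, dstsize=(fine.shape[1], fine.shape[0]))
    laplacian.append(cv2.subtract(fine, up))          # band-pass detail per level

for i, level in enumerate(gaussian):
    print(f"level {i}: {level.shape}")
```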
Selection of Scale-Invariant Parts for Object Class RecognitionGy.Dork´o and C.SchmidINRIA Rhˆo ne-Alpes,GRA VIR-CNRS655,av.de l’Europe,38330Montbonnot,Francedorko,schmid@inrialpes.frAbstractThis paper introduces a novel method for constructing and selecting scale-invariant object parts.Scale-invariant local descriptors arefirst grouped into basic parts.A clas-sifier is then learned for each of these parts,and feature selection is used to determine the most discriminative ones. This approach allows robust part detection,and it is invari-ant under scale changes—that is,neither the training im-ages nor the test images have to be normalized.The proposed method is evaluated in car detection tasks with significant variations in viewing conditions,and promising results are demonstrated.Different local re-gions,classifiers and feature selection methods are quanti-tatively compared.Our evaluation shows that local invari-ant descriptors are an appropriate representation for object classes such as cars,and it underlines the importance of feature selection.1.IntroductionRecognizing general object classes and parts is one of the most challenging problems in computer vision.The combi-nation of computer vision and machine learning techniques has recently led to significant progress[1,17,18],but exist-ing approaches are based onfixed-size windows and do not make use of recent advances in scale-invariant local feature extraction[6,8].Thus,they require normalizing the train-ing and test images.We propose in this paper a method for selecting discrimi-native scale-invariant object parts.Figure1(a)demonstrates the importance of feature selection.It shows the output of a scale-invariant operator forfinding significant circu-lar patches in images[6].In this context,it is natural to define object parts in terms of clusters of patches with sim-ilar brightness patterns.However,consider the two patchesmarked in black in thefigure.The corresponding patterns are very close,but one of the patches lies on a car,while the other lies in the background.This shows that the cor-responding part is not discriminative for cars(in this en-vironment at least).To demonstrate the effect of the pro-posed feature selection method,Fig.1(b)shows the initially detected features(white)and discriminative descriptors de-termined by feature selection(black).These are the ones which should be used in afinal,robust detectionsystem.(a)(b)Figure1.Illustration of feature selection(seetext for details).1.1.Related WorkMost appearance-based approaches to object class recognition characterize the object by its whole image [9,15].They are not robust to occlusion and also suffer 1from a lack of invariance.Furthermore,these methods are only applicable to rigid objects and either they require pre-liminary segmentation,or windows have to be extracted for different locations,scales and rotations.The representa-tion is also high-dimensional,therefore many learning tech-niques cannot be used.To overcome these problems the use of local features is becoming increasingly popular for object detection and recognition.Weber et al.[18]use localized image patches and ex-plicitly compute their joint spatial probability distribution. 
This approach has recently been extended to include scale-invariant image regions[11].Agarwal and Roth[1]first learn a vocabulary of parts,determine spatial relations on these parts and use them to train a Sparse Network of Win-nows(SNoW)learning zebnik et al.[5]take advantage of local affine invariants to represent textures.Some recent methods combine feature selection and lo-cal features.Viola and Jones[17]select rectangular(Haar-like)features with an AdaBoost trained classifier.Chen et al.[3]also use this boosting approach for components learned by local non-negative matrix factorization.Amit and Geman[2]employ small,localized and oriented edges and combine them with decision trees.Mahamud and Hebert[7]select discriminative object parts and develop an optimal distance measure for nearest neighbor search. Rikert et al.[12]use a mixture model,but only keep the discriminative clusters and Schmid[14]selects significant texture descriptors in a weakly supervised framework.Both approaches select features based on their likelihood.Ull-mann et al.[16]use image fragments and combine them with a linear discriminative type classification rule.Their selection algorithm is based on mutual information.1.2.OverviewThefirst step of our approach is the detection of scale-invariant interest points(regions)and the computation ofa rotation-invariant descriptor for each region(cf.section2.1).These descriptors are then clustered and we obtaina set of parts each of which is described by a classifier (cf.section2.2).Finally,we select a set of discriminative parts/classifiers(cf.section3).An experimental evaluation compares different approaches to region extraction,classi-fication and selection(cf.Section4).Finally in Section5 we conclude and outline our future work.2.Object-Part ClassifiersIn the following wefirst describe how to compute in-variant descriptors and then explain how to learn object part classifiers.2.1.Scale-Invariant DescriptorsTo obtain invariant descriptors we detect scale-invariant interest points(regions)and characterize each of them by a scale,rotation and illumination invariant descriptor.Scale-invariant detectors.We have used two differ-ent scale-invariant detectors:Harris-Laplace[8]and DoG (Difference-of-Gaussian)[6].Harris-Laplace detects multi-scale Harris points and then selects characteristic points in scale-space with the Laplace operator.DoG interest points[6]are local scale-space maxima of the Difference-of-Gaussian.The image locations(regions)selected by the two detec-tors differ:The DoG detectorfinds blob-like structures and Harris-Laplace detects corners and highly textured points. Examples for detection are shown in thefirst column of Fig-ure7.Scale and rotation invariant descriptors.The output of the two detectors are scale-invariant regions of different sizes.These regions arefirst mapped to circular regions of afixed-sized radius.Point neighborhoods which are larger than the normalized region,are smoothed before the size normalization.Rotation-invariance is obtained by rotation in the direction of the average gradient orientation(within a small point neighborhood).Affine illumination changes of the pixel intensities are eliminated by normal-ization of the image region with the mean and the standard deviation of the intensities within the point neighborhood. 
These normalized regions are then described by the SIFT descriptor(Scale Invariant Feature Transform)[6].SIFT is computed for8orientation planes and each gradient image is sampled over a4x4grid of locations.The resulting de-scriptor is of dimension128.2.2.ClassifiersObject-part classifiers are learned from sets of similar descriptors.These sets are obtained automatically by clus-tering local invariant descriptors.Figure2shows a few im-ages of two different clusters.The top row displays a“tire”cluster and the bottom row a“front window”cluster.We have used two types of classifiers:Support Vector Machines(SVMs)and classification based on a Gaussian mixture model(GMM).The training data consists of posi-tive and negative descriptors.Note that the descriptors are labeled manually.Support Vector Machine.Each object part is described by a separate SVM.A descriptor is classified as a part,if the the corresponding SVM has a positive response.The SVMs are trained as follows.Thefirst step is to de-termine groups of similar descriptors.We cluster the pos-itive training descriptors with a hierarchical clustering al-gorithm.The number of clusters is set to300.We thenFigure2.A few images of two different clusters.Thefirst row shows a cluster which represents “tires”.The second row shows a cluster which contains regions detected in the“front window”.learn a linear SVM[4]for each positive cluster.The SVM is trained with all descriptors of the positive cluster and a subset of the negative descriptors.This subset are the medi-ans of negative clusters.Note that this pre-selection of the negative samples is necessary.Otherwise we would have highly unbalanced training sets,which can not be handled by current state of the art SVM techniques.Gaussian mixture model.The distribution of the train-ing descriptors is described by a Gaussian mixture model.Each Gaussian corresponds to an “object-part”.A descriptor is assigned to the most likely Gaussian,i.e.it is classified as the corresponding part. 
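A hedged sketch of the part-classifier construction described in this section: hierarchical clustering of the positive descriptors into 300 groups, then one linear SVM per group. The descriptor arrays are random stand-ins, and random subsampling of negatives is used here for brevity where the paper uses the medians of negative clusters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.svm import LinearSVC

pos_desc = np.random.rand(3000, 128)   # stand-ins for real SIFT descriptors
neg_desc = np.random.rand(3000, 128)

# Hierarchical clustering of the positive descriptors into 300 parts.
labels = AgglomerativeClustering(n_clusters=300).fit_predict(pos_desc)

part_svms = []
for c in range(300):
    pos = pos_desc[labels == c]
    # Balanced negative subset (the paper pre-selects negative cluster medians).
    neg = neg_desc[np.random.choice(len(neg_desc), len(pos), replace=False)]
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    part_svms.append(LinearSVC().fit(X, y))   # one linear SVM per part
```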
Each component is assumed to be a Gaussian with mean μ_i and covariance matrix Σ_i. We use the EM algorithm to estimate the parameters of the mixture model, namely the means, covariances, and mixing weights. EM is initialized with the output of the k-means algorithm. In this work, we use 600 components to describe the training set, which includes positive and negative descriptors. We use all positive descriptors and randomly choose the same number of negative descriptors. We limit the number of free parameters in the optimization by using diagonal Gaussians. This restriction also helps prevent the covariance matrices from becoming singular.

3. Feature Selection

Given a set of classifiers, we want to rank them by their distinctiveness. Here, we use two different feature selection techniques: likelihood ratio and mutual information. These techniques assign a score to each classifier depending on its performance on a validation set.

The two feature selection methods are based on the following probabilities. Let C_i be a classifier and O the object class to be recognized (detected). P(C_i | O) is the probability that C_i classifies an object descriptor correctly (i.e., the true positives for C_i over the number of positive descriptors). P(C_i | ¬O) is the probability of non-object descriptors being accepted by classifier C_i.

Likelihood ratio. A classifier is representative of an object class if it is likely to be found in the class, but unlikely to be detected in non-class images. The likelihood ratio of classifier C_i is defined by:

    L(C_i) = P(C_i | O) / P(C_i | ¬O)

Mutual information. Mutual information [10] selects informative features. Mutual information between the classifier C_i and the object class O is defined by:

    I(C_i; O) = Σ_{c ∈ {C_i, ¬C_i}} Σ_{o ∈ {O, ¬O}} P(c, o) log [ P(c, o) / (P(c) P(o)) ]

For both feature selection methods, the higher the score, the more relevant the classifier.

Figure 3. Comparison of feature selection with likelihood ratio and mutual information. Detections per number of selected parts (likelihood ratio vs. mutual information): 1 selected part, SVM: no regions vs. 12 correct + 1 incorrect; 5 selected parts, SVM: 1 correct vs. 16 correct, 6 incorrect; 10 selected parts, SVM: 2 correct vs. 21 correct, 7 incorrect; 25 selected parts, SVM: 25 correct, 7 incorrect vs. 74 correct, 33 incorrect; 50 selected parts, GMM: 7 correct, 1 incorrect vs. 56 correct, 38 incorrect; 100 selected parts, GMM: 19 correct, 8 incorrect vs. 86 correct, 83 incorrect.

The difference between the two methods is illustrated by Figure 3. The image is one of the test images, and regions are detected with the DoG detector. The top four rows show the descriptors classified as object parts by the best SVM classifiers. We can see that the likelihood ratio selects very specific features which are highly discriminative. For example, there is no car feature detected by the "best" classifier in the case of likelihood ratio. This feature is very specific and only detected on certain cars. In contrast, mutual information selects informative features. For example, the first selected feature already classifies 13 descriptors as object descriptors. Note that one of them is incorrect. If we look at the overall performance of the two feature selection methods, we can observe that the likelihood ratio performs slightly better than mutual information.
Note however that fewer classifiers are used in the case of mutual information. This is confirmed by the images in the two bottom rows, which show the results for the best GMM classifiers, as well as by the quantitative evaluation in Section 4. Note that to obtain similar results for GMM we have to use more classifiers. This is due to the fact that there are twice as many classifiers and that they are in general more specific.

4. Experiments

In the previous sections we have presented several techniques for the different steps of our approach. We now evaluate these techniques in the context of car detection. We then present car detection results for a few test images.

4.1. Set-up

Our training database contains images of cars with a relatively large amount of background (more than on average). We have marked the cars in these images manually. Note that the car images can be at different scale levels and do not require normalization. We extract scale-invariant interest points (regions) with the DoG detector and Harris-Laplace. For DoG we obtained positive and negative regions. For Harris-Laplace we detected positive and negative regions.

The test images were taken independently and contain unseen cars and unseen background. We have used images which often contain several cars and a large amount of background. To evaluate and compare different methods, we marked them manually. We therefore know that the test images contain positive and negative descriptors if the DoG detector is used, and descriptors for Harris-Laplace.

4.2. Evaluation of Different Methods

In the following we evaluate our approach and compare the performance of different techniques. The comparison criterion is true positive rate (the number of positive descriptors retrieved over the total number of positive descriptors) against false positive rate (the number of false positives over the total number of negative descriptors).

Classification and feature selection. Figure 4 compares the performance of two different classification techniques and two different feature selection criteria. Regions are extracted with the DoG detector. Fig. 4 shows the ROC curve (true positive rate against false positive rate). We can observe that the combination of Gaussian mixture model and likelihood ratio performs best. The second best is the curve for SVM and likelihood ratio, which performs slightly better than SVM and mutual information. Results for the combination of mixture model and mutual information are significantly worse. This can be explained by the fact that the classifiers are mostly specific. Fig. 5 (a) and (b) compare the criteria true positive rate and false negative rate separately as a function of the number of selected classifiers. As expected, mutual information has a higher true positive rate, and the false negative rate is better (lower) for the likelihood ratio.

Figure 4. Comparison of the performance of the likelihood ratio and the mutual information for SVM and GMM (ROC curves: true positive rate against false positive rate). Regions are extracted with the DoG detector.

Figure 5. Comparison of the performance of the likelihood ratio and the mutual information for SVM: (a) true positive rate and (b) false positive rate as a function of the number of selected parts. Regions are extracted with the DoG detector.

Figure 6. Comparison of the performance of the likelihood ratio and the mutual information for SVM and GMM (ROC curves). Regions are extracted with the Harris-Laplace detector.

Descriptors. We have also compared the performance of the two detectors DoG and Harris-Laplace. Figure 6 shows the results for Harris-Laplace. We can observe that the rankings of the different combinations of classifier and feature selection techniques are the same as for DoG. Furthermore, Harris-Laplace and DoG show a similar performance. However, we have noticed that the behavior depends on the test
image.Furthermore,Harris-Laplace detects less points on the background and therefore detects more true positives than DoG for a fixed number of false positives.4.3.Car Recognition/DetectionIn this section we illustrate the performance of our ap-proach with two examples.Figure 7shows results for DoG and Harris-Laplace as well as the two classification tech-niques.The first column displays the detected regions.The0.2 0.4 0.6 0.8 1 050100150200250300T r u e P o s i t i v e R a t ePartsDoG, SVM, Likelihood DoG, SVM, Mutual Inf.a0.2 0.4 0.6 0.8 1 050100150 200250300F a l s e P o s i t i v e R a t ePartsDoG, SVM, Likelihood DoG, SVM, Mutual Inf.bFigure parison of the performance of the likelihood ratio and the mutual informa-tion for SVM.Regions are extracted with the DoG detector.0.20.40.60.810.20.40.60.81T r u e P o s i t i v e R a t eFalse Positive RateHarris, SVM, Likelihood Harris, SVM, Mutual Inf.Harris, GMM, Likelihood Harris, GMM, Mutual Inf.Figure parison of the performance of the likelihood ratio and the mutual informa-tion for SVM and GMM.Regions are extracted with the Harris-Laplace detector.second column shows the results of the25best parts ob-tained with SVM and selected by the likelihood ratio.The third column displays the results of the100best parts for GMM and likelihood ratio.We can see that the method al-lows to select car features.It can be further improved by adding relaxation.Relaxation.The descriptors selected in Figure7are sparsely distributed over the object(car).We would like to obtain a dense feature map which permits segmentation of the object.Given the selected features,we can use the order of se-lection to assign a probability to each descriptor.A de-scriptor which is classified by a more discriminative feature is assigned a higher probability.We can then use relax-ation[13]to improve the classification of the descriptors. Relaxation reinforces or weakens the probabilities depend-ing on the probabilities of the nearest neighbors(5in our experiments).Figure8shows the descriptors classified as car features after applying the relaxation algorithm.Initial results based only on feature selection are shown in Figure7 (DoG,SVM and likelihood).Compared to these initial re-sults,we can clearly observe that more features are detected on the cars and less on the background,that is the overall performance is significantly improved.Further improve-ment is possible by integrating spatial constraints into the neighborhood relations of the relaxation process.5.Conclusion and Future workIn this paper,we have introduced a method for construct-ing object-part classifiers and selecting the most discrimi-nant ones.Object-parts are invariant to scale and rotation as well as illumination changes.Alignment of the training and test images is therefore not necessary.This paper has also illustrated the importance of feature selection and has compared different techniques.This comparison shows that likelihood is well suited for object recognition and mutual information for focus of attention mechanisms,that is rapid localization based on a few classifiers.Learning of the parts is unsupervised,but the descrip-tors are manually marked as positive and negative.We plan to extend the approach to the weakly supervised case where the descriptors are unlabeled and only the images are marked as positive or negative.This should be straightfor-ward in the case of classification with a Gaussian mixture model.AcknowledgmentsThis work was supported by the European project LA V A. 
We thank S. Agarwal for providing the car images.

Figure 8. Improved results for object detection by adding relaxation.

References
[1] S. Agarwal and D. Roth. Learning a sparse representation for object detection. In ECCV, 2002.
[2] Y. Amit and D. Geman. A computational model for visual selection. Neural Computation, 11(7):1691–1715, 1999.
[3] X. Chen, L. Gu, S. Li, and H.-J. Zhang. Learning representative local features for face detection. In CVPR, 2001.
[4] N. Cristianini and J. Shawe-Taylor. Support Vector Machines. Cambridge University Press, 2000.
[5] S. Lazebnik, C. Schmid, and J. Ponce. Sparse texture representation using affine-invariant neighborhoods. In CVPR, 2003.
[6] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150–1157, 1999.
[7] S. Mahamud and M. Hebert. The optimal distance measure for object detection. In CVPR, 2003.
[8] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In ICCV, pages 525–531, 2001.
[9] C. Papageorgiou and T. Poggio. A trainable system for object detection. IJCV, 38(1):15–33, 2000.
[10] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw Hill, 1991.
[11] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In CVPR, 2003.
[12] T. Rikert, M. Jones, and P. Viola. A cluster-based statistical model for object detection. In ICCV, pages 1046–1053, 1999.
[13] A. Rosenfeld, R. Hummel, and S. Zucker. Scene labeling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics, 6:420–433, 1976.
[14] C. Schmid. Constructing models for content-based image retrieval. In CVPR, 2001.
[15] K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on PAMI, 20(1):39–51, 1998.
[16] S. Ullman, E. Sali, and M. Vidal-Naquet. A fragment-based approach to object representation and classification. In 4th International Workshop on Visual Form, Capri, Italy, 2001.
[17] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, volume I, pages 511–518, 2001.
[18] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition. In ECCV, pages 18–32, 2000.

Figure 7. Results for two test images. The left column shows the interest regions detected with DoG and Harris-Laplace. The middle column displays the descriptors classified by the 25 best SVM classifiers selected with the likelihood ratio. The right column shows the results for the 100 best GMM classifiers selected with the likelihood ratio. Detections per image:
  Image 1, DoG (508 regions): 25 parts, SVM, likelihood: 32 correct, 14 incorrect; 100 parts, GMM, likelihood: 18 correct, 8 incorrect.
  Image 1, Harris-Laplace (364 regions): 49 correct, 37 incorrect; 15 correct, 11 incorrect.
  Image 2, DoG (431 regions): 30 correct, 18 incorrect; 13 correct, 17 incorrect.
  Image 2, Harris-Laplace (277 regions): 43 correct, 36 incorrect; 32 correct, 13 incorrect.