Dense Matching and Sparse Matching
Dense matching (Dense Matching) and sparse matching (Sparse Matching) are two image-matching approaches commonly used in computer vision. They are widely applied in image processing, object recognition, 3D reconstruction, and related tasks.

Dense matching establishes a correspondence for every pixel in an image, i.e., for each pixel it finds the corresponding pixel in the other image. Its advantage is that it yields a detailed, per-pixel correspondence field between the images, which enables high-accuracy image registration, object tracking, and similar tasks. Commonly used dense-matching techniques include block matching and optical flow.

Block matching is a widely used dense-matching method that determines correspondences by comparing the similarity of image blocks. A reference block is first selected in one image, and the most similar block is then searched for in the other image. Similarity is usually measured with a pixel-wise difference metric over the block, such as the sum of squared differences (mean squared error) or correlation. Repeating this search for every pixel yields a dense matching result for the whole image, as the sketch below illustrates.
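The following minimal NumPy sketch shows the core of block matching for a rectified stereo pair: for one reference block it searches along the same image row with a sum-of-squared-differences cost. It is an illustrative toy, not a production matcher; the image arrays, block size, and search range are assumed inputs.

```python
import numpy as np

def match_block_ssd(left, right, y, x, block=7, max_disp=64):
    """Find the horizontal offset (disparity) of the block centered at (y, x)
    in `left` by exhaustive SSD search along the same row of `right`.
    `left`/`right` are 2-D grayscale arrays; returns the best disparity."""
    r = block // 2
    ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(0, max_disp + 1):
        if x - d - r < 0:          # candidate block would leave the image
            break
        cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(np.float32)
        cost = np.sum((ref - cand) ** 2)   # sum of squared differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Repeating this search for every pixel (with valid margins) yields a dense disparity map.
```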
Optical flow is another common dense-matching method; it determines correspondences from the apparent motion of pixels between images. It assumes that the motion of pixels between consecutive frames is continuous and can be described by a suitable mathematical model. By solving the optical-flow equation, the position of each pixel in the next frame is obtained, which again produces a dense matching.
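As a hedged illustration, OpenCV's Farnebäck algorithm is one widely available dense optical-flow implementation; the frame filenames below are placeholders for two consecutive grayscale frames of a sequence.

```python
import cv2
import numpy as np

# Placeholder filenames for two consecutive frames.
prev_gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Dense flow: one (dx, dy) vector per pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev_gray, next_gray, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# The dense correspondence: pixel (x, y) in the first frame maps to
# (x + flow[y, x, 0], y + flow[y, x, 1]) in the second frame.
h, w = prev_gray.shape
ys, xs = np.mgrid[0:h, 0:w]
corresp_x = xs + flow[..., 0]
corresp_y = ys + flow[..., 1]
```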
The counterpart of dense matching is sparse matching. Sparse matching establishes correspondences only for a subset of pixels in the image, finding the corresponding pixels in the other image for those locations only. Its advantage is a much lower computational cost, which makes it suitable for applications with stricter real-time requirements. Commonly used sparse-matching algorithms include SIFT and SURF.
SIFT (Scale-Invariant Feature Transform) is a widely used sparse-matching algorithm: it extracts keypoints from the image and computes a feature descriptor for each keypoint, and matching is then performed on these descriptors. SIFT is invariant to scale and rotation, so it produces stable matches across changes in scale and orientation.
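A minimal sketch using OpenCV's SIFT implementation (exposed as cv2.SIFT_create in recent OpenCV releases); the image filename is a placeholder.

```python
import cv2

# Placeholder filename.
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries position, scale and orientation; each descriptor is a
# 128-dimensional vector used for matching.
print(len(keypoints), "keypoints,", descriptors.shape, "descriptor array")
```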
SURF (Speeded Up Robust Features) is another common sparse-matching algorithm and was designed as a faster variant of SIFT. By accelerating feature extraction and matching (using integral images and box-filter approximations), SURF improves matching speed while retaining comparable robustness. SURF is widely used in image matching, object detection, and related applications; a descriptor-matching sketch is given below.
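SURF itself is patented and in OpenCV is available only through the contrib package (cv2.xfeatures2d.SURF_create), so the hedged sketch below matches SIFT descriptors with a brute-force matcher and Lowe's ratio test; the same matching code applies unchanged to SURF descriptors when the contrib module is installed. Filenames are placeholders.

```python
import cv2

img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder filenames
img2 = cv2.imread("target.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()        # swap in a SURF detector if opencv-contrib is installed
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)

# Lowe's ratio test: keep a match only if it is clearly better than the runner-up.
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
print(f"{len(good)} sparse correspondences retained")
```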
Fast 3D Recognition and Pose Using the Viewpoint Feature Histogram
Fast3D Recognition and Pose Using the Viewpoint Feature Histogram Radu Bogdan Rusu,Gary Bradski,Romain Thibaux,John HsuWillow Garage68Willow Rd.,Menlo Park,CA94025,USA{rusu,bradski,thibaux,hsu}@Abstract—We present the Viewpoint Feature Histogram (VFH),a descriptor for3D point cloud data that encodes geometry and viewpoint.We demonstrate experimentally on a set of60objects captured with stereo cameras that VFH can be used as a distinctive signature,allowing simultaneous recognition of the object and its pose.The pose is accurate enough for robot manipulation,and the computational cost is low enough for real time operation.VFH was designed to be robust to large surface noise and missing depth information in order to work reliably on stereo data.I.I NTRODUCTIONAs part of a long term goal to develop reliable capabilities in the area of perception for mobile manipulation,we address a table top manipulation task involving objects that can be manipulated by one robot hand.Our robot is shown in Fig.1. In order to manipulate an object,the robot must reliably identify it,as well as its6degree-of-freedom(6DOF)pose. This paper proposes a method to identify both at the same time,reliably and at high speed.We make the following assumptions.•Objects are rigid and relatively Lambertian.They can be shiny,but not reflective or transparent.•Objects are in light clutter.They can be easily seg-mented in3D and can be grabbed by the robot hand without obstruction.•The item of interest can be grabbed directly,so it is not occluded.•Items can be grasped even given an approximate pose.The gripper on our robot can open to9cm and each grip is2.5cm wide which allows an object8.5cm wide object to be grasped when the pose is off by+/-10 degrees.Despite these assumptions our problem has several prop-erties that make the task difficult.•The objects need not contain texture.•Our dataset includes objects of very similar shapes,for example many slight variations of typical wine glasses.•To be usable,the recognition accuracy must be very high,typically much higher than,say,for image retrieval tasks,since false positives have very high costs and so must be kept extremely rare.•To interact usefully with humans,recognition cannot take more than a fraction of a second.This puts constraints on computation,but more importantly this precludes the use of accurate but slow3Dacquisition Fig.1.A PR2robot from Willow Garage,showing its grippers and stereo camerasusing lasers.Instead we rely on stereo data,which suffers from higher noise and missing data.Our focus is perception for mobile manipulation.Working on a mobile versus a stationary robot means that we can’t depend on instrumenting the external world with active vision systems or special lighting,but we can put such devices on the robot.In our case,we use projected texture1 to yield dense stereo depth maps at30Hz.We also cannot ensure environmental conditions.We may move from a sunlit room to a dim hallway into a room with no light at all.The projected texture gives us a fair amount of resilience to local lighting conditions as well.1Not structured light,this is random textureAlthough this paper focuses on3D depth features,2D imagery is clearly important,for example for shiny and transparent objects,or to distinguish items based on texture such as telling apart a Coke can from a Diet Coke can.In our case,the textured light alternates with no light to allow for2D imagery aligned with the texture based dense depth, however adding2D visual features will be studied in future 
work.Here,we look for an effective purely3D feature. Our philosophy is that one should use or design a recogni-tion algorithm thatfits one’s engineering needs such as scal-ability,training speed,incremental training needs,and so on, and thenfind features that make the recognition performance of that architecture meet one’s specifications.For reasons of online training,and because of large memory availability, we choose fast approximate K-Nearest Neighbors(K-NN) implemented in the FLANN library[1]as our recognition architecture.The key contribution of this paper is then the design of a new,computationally efficient3D feature that yields object recognition and6DOF pose.The structure of this paper is as follows:Related work is described in Section II.Next,we give a brief description of our system architecture in Section III.We discuss our surface normal and segmentation algorithm in Section IV followed by a discussion of the Viewpoint Feature Histogram in Section V.Experimental setup and resulting computational and recognition performance are described in Section VI. Conclusions and future work are discussed in Section VII.II.R ELATED W ORKThe problem that we are trying to solve requires global (3D object level)classification based on estimated features. This has been under investigation for a long time in various researchfields,such as computer graphics,robotics,and pattern matching,see[2]–[4]for comprehensive reviews.We address the most relevant work below.Some of the widely used3D point feature extraction approaches include:spherical harmonic invariants[5],spin images[6],curvature maps[7],or more recently,Point Feature Histograms(PFH)[8],and conformal factors[9]. Spherical harmonic invariants and spin images have been successfully used for the problem of object recognition for densely sampled datasets,though their performance seems to degrade for noisier and sparser datasets[4].Our stereo data is noisier and sparser than typical line scan data which motivated the use of our new features.Conformal factors are based on conformal geometry,which is invariant to isometric transformations,and thus obtains good results on databases of watertight models.Its main drawback is that it can only be applied to manifold meshes which can be problematic in stereo.Curvature maps and PFH descriptors have been studied in the context of local shape comparisons for data registration.A side study[10]applied the PFH descriptors to the problem of surface classification into3D geometric primitives,although only for data acquired using precise laser sensors.A different pointfingerprint representation using the projections of geodesic circles onto the tangent plane at a point p i was proposed in[11]for the problem of surface registration.As the authors note,geodesic distances are more sensitive to surface sampling noise,and thus are unsuitable for real sensed data without a priori smoothing and reconstruction.A decomposition of objects into parts learned using spin images is presented in[12]for the problem of vehicle identification.Methods relying on global features include descriptors such as Extended Gaussian Images(EGI)[13],eigen shapes[14],or shape distributions[15].The latter samples statistics of the entire object and represents them as distri-butions of shape properties,however they do not take into account how the features are distributed over the surface of the object.Eigen shapes show promising results but they have limits on their discrimination ability since important higher order variances are 
discarded.EGIs describe objects based on the unit normal sphere,but have problems handling arbitrarily curved objects.The work in[16]makes use of spin-image signatures and normal-based signatures to achieve classification rates over 90%with synthetic and CAD model datasets.The datasets used however are very different than the ones acquired using noisy640×480stereo cameras such as the ones used in our work.In addition,the authors do not provide timing information on the estimation and matching parts which is critical for applications such as ours.A system for fully automatic3D model-based object recognition and segmentation is presented in[17]with good recognition rates of over95%for a database of55objects.Unfortunately,the computational performance of the proposed method is not suitable for real-time as the authors report the segmentation of an object model in a cluttered scene to be around2 minutes.Moreover,the objects in the database are scanned using a high resolution Minolta scanner and their geometric shapes are very different.As shown in Section VI,the objects used in our experiments are much more similar in terms of geometry,so such a registration-based method would fail. In[18],the authors propose a system for recognizing3D objects in photographs.The techniques presented can only be applied in the presence of texture information,and require a cumbersome generation of models in an offline step,which makes this unsuitable for our work.As previously presented,our requirements are real-time object recognition and pose identification from noisy real-world datasets acquired using projective texture stereo cam-eras.Our3D object classification is based on an extension of the recently proposed Fast Point Feature Histogram(FPFH) descriptors[8],which record the relative angular directions of surface normals with respect to one another.The FPFH performs well in classification applications and is robust to noise but it is invariant to viewpoint.This paper proposes a novel descriptor that encodes the viewpoint information and has two parts:(1)an extended FPFH descriptor that achieves O(k∗n)to O(n)speed up over FPFHs where n is the number of points in the point cloud and k is how many points used in each local neighborhood;(2)a new signature that encodes important statistics between the viewpoint and the surface normals on the object.We callthis new feature the Viewpoint Feature Histogram(VFH)as detailed below.III.A RCHITECTUREOur system architecture employs the following processing steps:•Synchronized,calibrated and epipolar aligned left and right images of the scene are acquired.•A dense depth map is computed from the stereo pair.•Surface normals in the scene are calculated.•Planes are identified and segmented out and the remain-ing point clouds from non-planar objects are clustered in Euclidean space.•The Viewpoint Feature Histogram(VFH)is calculated over large enough objects(here,objects having at least 100points).–If there are multiple objects in a scene,they are processed front to back relative to the camera.–Occluded point clouds with less than75%of the number of points of the frontal objects are notedbut not identified.•Fast approximate K-NN is used to classify the object and its view.Some steps from the early processing pipeline are shown in Figure2.Shown left to right,top to bottom in thatfigure are: a moderately complex scene with many different vertical and horizontal surfaces,the resulting depth map,the estimated surface normals and the objects segmented from the planar surfaces in 
thescene.Fig.2.Early processing steps row wise,top to bottom:A scene,its depthmap,surface normals and segmentation into planes and outlier objects.For computing3D depth maps,we use640x480stereowith textured light.The textureflashes on only very brieflyas the cameras take a picture resulting in lights that look dimto the human eye but bright to the camera.Textureflashesonly every other frame so that raw imagery without texturecan be gathered alternating with densely textured scenes.Thestereo has a38degreefield of view and is designed for closein manipulation tasks,thus the objects that we deal with arefrom0.5to1.5meters away.The stereo algorithm that weuse was developed in[19]and uses the implementation in theOpenCV library[20]as described in detail in[21],runningat30Hz.IV.S URFACE N ORMALS AND3D S EGMENTATIONWe employ segmentation prior to the actual feature es-timation because in robotic manipulation scenarios we areonly interested in certain precise parts of the environment,and thus computational resources can be saved by tacklingonly those parts.Here,we are looking to manipulate reach-able objects that lie on horizontal surfaces.Therefore,oursegmentation scheme proceeds at extracting these horizontalsurfacesfirst.Fig.3.From left to right:raw point cloud dataset,planar and clustersegmentation,more complex segmentation.Compared to our previous work[22],we have improvedthe planar segmentation algorithms by incorporating surfacenormals into the sample selection and model estimationsteps.We also took care to carefully build SSE aligneddata structures in memory for any computationally expensiveoperation.By rejecting candidates which do not supportour constraints,our system can segment data at about7Hz,including normal estimation,on a regular Core2Duo laptopusing a single core.To get frame rate performance(realtime),we use a voxelized data structure over the input point cloudand downsample with a leaf size of0.5cm.The surfacenormals are therefore estimated only for the downsampledresult,but using the information in the original point cloud.The planar components are extracted using a RMSAC(Ran-domized MSAC)method that takes into account weightedaverages of distances to the model together with the angleof the surface normals.We then select candidate table planesusing a heuristic combining the number of inliers whichsupport the planar model as well as their proximity to thecamera viewpoint.This approach emphasizes the part of thespace where the robot manipulators can reach and grasp theobjects.The segmentation of object candidates supported by thetable surface is performed by looking at points whose projec-tion falls inside the bounding2D polygon for the table,andapplying single-link clustering.The result of these processingsteps is a set of Euclidean point clusters.This works toreliably segment objects that are separated by about half theirminimum radius from each other.An can be seen in Figure3.To resolve further ambiguities with to the chosen candidate clusters,such as objects stacked on other planar objects(such as books),we repeat the mentioned step by treating each additional horizontal planar structure on top of the table candidates as a table itself and repeating the segmentation step(see results in Figure3).We emphasize that this segmentation step is of extreme importance for our application,because it allows our methods to achieve favorable computational performances by extract-ing only the regions of interest in a scene(i.e.,objects that are to be manipulated,located on horizontal 
surfaces). In cases where our "light clutter" assumption does not hold and the geometric Euclidean clustering is prone to failure, a more sophisticated segmentation scheme based on texture properties could be implemented.

V. VIEWPOINT FEATURE HISTOGRAM

In order to accurately and robustly classify points with respect to their underlying surface, we borrow ideas from the recently proposed Point Feature Histogram (PFH) [10]. The PFH is a histogram that collects the pairwise pan, tilt and yaw angles between every pair of normals on a surface patch (see Figure 4). In detail, for a pair of 3D points p_i, p_j, and their estimated surface normals n_i, n_j, the set of normal angular deviations can be estimated as:

    α = v · n_j
    φ = (u · (p_j − p_i)) / d
    θ = arctan(w · n_j, u · n_j)        (1)

where u, v, w represent a Darboux frame coordinate system chosen at p_i and d is the distance between p_i and p_j. Then, the Point Feature Histogram at a patch of points P = {p_i}, i = 1…n, captures all the sets of ⟨α, φ, θ⟩ between all pairs of p_i, p_j from P, and bins the results in a histogram. The bottom left part of Figure 4 presents the selection of the Darboux frame and a graphical representation of the three angular features. Because all possible pairs of points are considered, the computation complexity of a PFH is O(n²) in the number of surface normals n. In order to make a more efficient algorithm, the Fast Point Feature Histogram [8] was developed. The FPFH measures the same angular features as PFH, but estimates the sets of values only between every point and its k nearest neighbors, followed by a reweighting of the resultant histogram of a point with the neighboring histograms, thus reducing the computational complexity to O(k·n).

Our past work [22] has shown that a global descriptor (GFPFH) can be constructed from the classification results of many local FPFH features, and used on a wide range of confusable objects (20 different types of glasses, bowls, mugs) in 500 scenes achieving 96.69% on object class recognition. However, the categorized objects were only split into 4 distinct classes, which leaves the scaling problem open. Moreover, the GFPFH is susceptible to the errors of the local classification results, and is more cumbersome to estimate.

In any case, for manipulation, we require that the robot not only identifies objects, but also recognizes their 6DOF poses for grasping. FPFH is invariant both to object scale (distance) and object pose and so cannot achieve the latter task. In this work, we decided to leverage the strong recognition results of FPFH, but to add in viewpoint variance while retaining invariance to scale, since the dense stereo depth map gives us scale/distance directly. Our contribution to the problem of object recognition and pose identification is to extend the FPFH to be estimated for the entire object cluster (as seen in Figure 4), and to compute additional statistics between the viewpoint direction and the normals estimated at each point. To do this, we used the key idea of mixing the viewpoint direction directly into the relative normal angle calculation in the FPFH. Figure 6 presents this idea with the new feature consisting of two parts: (1) a viewpoint direction component (see Figure 5) and (2) a surface shape component comprised of an extended FPFH (see Figure 4). The viewpoint component is computed by collecting a histogram of the angles that the viewpoint direction makes with each normal. Note, we do not mean the view angle to each normal as this would not be scale invariant, but instead we mean the angle between the central viewpoint direction translated to each normal. The second component measures the relative
pan,tilt and yaw angles as described in[8],[10] but now measured between the viewpoint direction at the central point and each of the normals on the surface.We call the new assembled feature the Viewpoint Feature Histogram (VFH).Figure6presents the resultant assembled VFH for a random object.piαFig.5.The Viewpoint Feature Histogram is created from the extendedFast Point Feature Histogram as seen in Figure4together with the statisticsof the relative angles between each surface normal to the central viewpointdirection.The computational complexity of VFH is O(n).In ourexperiments,we divided the viewpoint angles into128binsand theα,φandθangles into45bins each or a total of263dimensions.The estimation of a VFH takes about0.3ms onaverage on a2.23GHz single core of a Core2Duo machineusing optimized SSE instructions.p 7p p 8p 9p 10p 11p 5p 1p p 3p 4cn c =uun 5v=(p 5-c)×u w=u ×vc p 5wv αφθFig.4.The extended Fast Point Feature Histogram collects the statistics of the relative angles between the surface normals at each point to the surface normal at the centroid of the object.The bottom left part of the figure describes the three angular feature for an example pair of points.Viewpoint componentextended FPFH componentFig.6.An example of the resultant Viewpoint Feature Histogram for one of the objects used.Note the two concatenated components.VI.V ALIDATION AND E XPERIMENTAL R ESULTS To evaluate our proposed descriptor and system archi-tecture,we collected a large dataset consisting of over 60IKEA kitchenware objects as show in Figure 8.These objects consisted of many kinds each of:wine glasses,tumblers,drinking glasses,mugs,bowls,and a couple of boxes.In each of these categories,many of the objects were distinguished only by subtle variations in shape as can be seen for example in the confusions in Figure 10.We captured over 54000scenes of these objects by spinning them on a turn table 180◦2at each of 2offsets on a platform that tilted 0,8,16,22and 30degrees.Each 180◦rotation was captured with about 90images.The turn table is shown in Fig.7.We additionally worked with a subset of 20objects in 500lightly cluttered scenes with varying arrangements of horizontal and vertical surfaces,using the same data set provided by in [22].No2Wedidn’t go 360degrees so that we could keep the calibration box inviewFig.7.The turn table used to collect views of objects with known orientation.pose information was available for this second dataset so we only ran experiments separately for object recognition results.The complete source code used to generate our experimen-tal results together with both object databases are available under a BSD open source license in our ROS repository at Willow Garage 3.We are currently taking steps towards creating a web page with complete tutorials on how to fully replicate the experiments presented herein.Both the objects in the [22]dataset as well as the ones we acquired,constitute valid examples of objects of daily use that our robot needs to be able to reliably identify and manipulate.While 60objects is far from the number of objects the robot eventually needs to be able to recognize,it may be enough if we assume that the robot knows what3Fig.8.The complete set of IKEA objects used for the purpose of our experiments.All transparent glasses have been painted white to obtain3D information during the acquisition process.TABLE IR ESULTS FOR OBJECT RECOGNITION AND POSE DETECTION OVER 54000SCENES PLUS500LIGHTLY CLUTTERED SCENES.Object PoseMethod Recognition 
EstimationVFH98.52%98.52%Spin75.3%61.2%context(kitchen table,workbench,coffee table)it is in, so that it needs only discriminate among a small context dependent set of objects.The geometric variations between objects are subtle,and the data acquired is noisy due to the stereo sensor character-istics,yet the perception system has to work well enough to differentiate between,say,glasses that look similar but serve different purposes(e.g.,a wine glass versus a brandy glass). As presented in Section II,the performance of the3D descriptors proposed in the literature degrade on noisier datasets.One of the most popular3D descriptor to date used on datasets acquired using sensing devices similar to ours (e.g.,similar noise characteristics)is the spin image[6].To validate the VFH feature we thus compare it to the spin image,by running the same experiments multiple times. For the reasons given in Section I,we base our recogni-tion architecture on fast approximate K-Nearest Neighbors (KNN)searches using kd-trees[1].The construction of the tree and the search of the nearest neighbors places an equal weight on each histogram bin in the VFH and spin images features.Figure11shows time stop sequentially aggregated exam-ples of the training set.Figure12shows example recognition results for VFH.Andfinally,Figure10gives some idea of the performance differences between VFH and spin images. The object recognition rates over the lightly cluttered dataset were98.1%for VFH and73.2%for spin images.The overall recognition rates for VFH andSpin imagesare shown inTable I where VFH handily outperforms spin images for both object recognition and pose.Fig.9.Data training performed in simulation.Thefigure presents a snapshot of the simulation with a water bottle from the object model database and the corresponding stereo point cloud output.VII.C ONCLUSIONS AND F UTURE W ORKIn this paper we presented a novel3D feature descriptor, the Viewpoint Feature Histogram(VFH),useful for object recognition and6DOF pose identification for application where a priori segmentation is possible.The high recognition performance and fast computational properties,demonstrated the superiority of VFH over spin images on a large scale dataset consisting of over54000scenes with pared to other similar initiatives,our architecture works well with noisy data acquired using standard stereo cameras in real-time,and can detect subtle variations in the geometry of objects.Moreover,we presented an integrated approach for both recognition and6DOF pose identification for untextured objects,the latter being of extreme importance for mobile manipulation and grasping applications.Fig.10.VFH consistently outperforms spin images for both recognition and for pose.The bottom of the figure presents an example result of VFH run on a mug.The bottom left corner is the learned models and the matches go from best to worse from left to right across the bottom followed by left to right across the top.The top part of the figure presents the results obtained using a spin image.For VFH,3of 5object recognition and 3of 5pose results are correct.For spin images,2of 5object recognition results are correct and 0of 5pose results arecorrect.Fig.11.Sequence examples of object training with calibration box on the outside.An automatic training pipeline can be integrated with our 3D simulator based on Gazebo [23]as depicted in figure 9,where the stereo point cloud is generated from perfectly rectified camera images.We are currently working on making both the fully an-notated database 
of objects together with the source codeof VFH available to the research community as open source.The preliminary results of our efforts can already be checked from the trunk of our Willow Garage ROS repository,but we are taking steps towards generating a set of tutorials on how to replicate and extend the experiments presented in this paper.R EFERENCES[1]M.Muja and D.G.Lowe,“Fast approximate nearest neighbors withautomatic algorithm configuration,”VISAPP ,2009.[2]J.W.Tangelder and R.C.Veltkamp,“A Survey of Content Based3D Shape Retrieval Methods,”in SMI ’04:Proceedings of the Shape Modeling International ,2004,pp.145–156.[3] A.K.Jain and C.Dorai,“3D object recognition:Representation andmatching,”Statistics and Computing ,vol.10,no.2,pp.167–182,2000.[4] A.D.Bimbo and P.Pala,“Content-based retrieval of 3D models,”ACM Trans.Multimedia mun.Appl.,vol.2,no.1,pp.20–43,2006.[5]G.Burel and H.H´e nocq,“Three-dimensional invariants and theirapplication to object recognition,”Signal Process.,vol.45,no.1,pp.1–22,1995.[6] A.Johnson and M.Hebert,“Using spin images for efficient objectrecognition in cluttered 3D scenes,”IEEE Transactions on Pattern Analysis and Machine Intelligence ,May 1999.[7]T.Gatzke,C.Grimm,M.Garland,and S.Zelinka,“Curvature Mapsfor Local Shape Comparison,”in SMI ’05:Proceedings of the Inter-national Conference on Shape Modeling and Applications 2005(SMI’05),2005,pp.246–255.[8]R.B.Rusu,N.Blodow,and M.Beetz,“Fast Point Feature Histograms(FPFH)for 3D Registration,”in ICRA ,2009.[9] B.-C.M.and G.C.,“Characterizing shape using conformal factors,”in Eurographics Workshop on 3D Object Retrieval ,2008.[10]R.B.Rusu,Z.C.Marton,N.Blodow,and M.Beetz,“LearningInformative Point Classes for the Acquisition of Object Model Maps,”in In Proceedings of the 10th International Conference on Control,Automation,Robotics and Vision (ICARCV),2008.[11]Y .Sun and M.A.Abidi,“Surface matching by 3D point’s fingerprint,”in Proc.IEEE Int’l Conf.on Computer Vision ,vol.II,2001,pp.263–269.[12] D.Huber,A.Kapuria,R.R.Donamukkala,and M.Hebert,“Parts-based 3D object classification,”in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 04),June 2004.[13] B.K.P.Horn,“Extended Gaussian Images,”Proceedings of the IEEE ,vol.72,pp.1671–1686,1984.[14]R.J.Campbell and P.J.Flynn,“Eigenshapes for 3D object recognitionin range data,”in Computer Vision and Pattern Recognition,1999.IEEE Computer Society Conference on.,pp.505–510.[15]R.Osada,T.Funkhouser,B.Chazelle,and D.Dobkin,“Shape distri-butions,”ACM Transactions on Graphics ,vol.21,pp.807–832,2002.[16]X.Li and I.Guskov,“3D object recognition from range images usingpyramid matching,”in ICCV07,2007,pp.1–6.[17] A.S.Mian,M.Bennamoun,and R.Owens,“Three-dimensionalmodel-based object recognition and segmentation in cluttered scenes,”IEEE Trans.Pattern Anal.Mach.Intell ,vol.28,pp.1584–1601,2006.[18] F.Rothganger,zebnik,C.Schmid,and J.Ponce,“3d objectmodeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints,”International Journal of Computer Vision ,vol.66,p.2006,2006.[19]K.Konolige,“Small vision systems:hardware and implementation,”in In Eighth International Symposium on Robotics Research ,1997,pp.111–116.[20]“OpenCV ,Open source Computer Vision library,”in/wiki/,2009.[21]G.Bradski and A.Kaehler,“Learning OpenCV:Computer Vision withthe OpenCV Library,”in O’Reilly Media,Inc.,2008,pp.415–453.[22]R. 
B. Rusu, A. Holzbach, M. Beetz, and G. Bradski, "Detecting and segmenting objects for mobile manipulation," in ICCV S3DV Workshop, 2009. [23] N. Koenig and A. Howard, "Design and use paradigms for Gazebo, an open-source multi-robot simulator," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, Sep 2004, pp. 2149–2154.
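As a postscript to the VFH paper above: to make equation (1) and the viewpoint component of the descriptor concrete, here is a small NumPy sketch that restates the angular features for a point cloud with precomputed unit normals. It is an illustration of the math only, not the authors' optimized SSE/PCL implementation, and the function and variable names are ours.

```python
import numpy as np

def pfh_angles(p_i, n_i, p_j, n_j):
    """Darboux-frame angles (alpha, phi, theta) of Eq. (1) for one point pair.
    p_i, p_j are 3-vectors; n_i, n_j are unit surface normals."""
    d_vec = p_j - p_i
    d = np.linalg.norm(d_vec)
    u = n_i                             # Darboux frame at p_i
    v = np.cross(u, d_vec / d)
    w = np.cross(u, v)
    alpha = np.dot(v, n_j)
    phi = np.dot(u, d_vec) / d
    theta = np.arctan2(np.dot(w, n_j), np.dot(u, n_j))
    return alpha, phi, theta

def viewpoint_component(points, normals, viewpoint, bins=128):
    """Histogram of cos(angle) between the central viewpoint direction and each
    surface normal -- the viewpoint part of the VFH signature."""
    centroid = points.mean(axis=0)
    view_dir = viewpoint - centroid
    view_dir /= np.linalg.norm(view_dir)
    cos_angles = normals @ view_dir     # normals are assumed unit length
    hist, _ = np.histogram(cos_angles, bins=bins, range=(-1.0, 1.0))
    return hist
```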
Low-Frequency Active Towed Sonar (LFATS) Datasheet
LOW-FREQUENCY ACTIVE TOWED SONAR (LFATS)

LFATS is a full-feature, long-range, low-frequency variable depth sonar. Developed for active sonar operation against modern diesel-electric submarines, LFATS has demonstrated consistent detection performance in shallow and deep water. LFATS also provides a passive mode and includes a full set of passive tools and features.

COMPACT SIZE
LFATS is a small, lightweight, air-transportable, ruggedized system designed specifically for easy installation on small vessels.

CONFIGURABLE
LFATS can operate in a stand-alone configuration or be easily integrated into the ship's combat system.

TACTICAL BISTATIC AND MULTISTATIC CAPABILITY
A robust infrastructure permits interoperability with the HELRAS helicopter dipping sonar and all key sonobuoys.

HIGHLY MANEUVERABLE
Own-ship noise reduction processing algorithms, coupled with compact twin-line receivers, enable short-scope towing for efficient maneuvering, fast deployment and unencumbered operation in shallow water.

COMPACT WINCH AND HANDLING SYSTEM
An ultrastable structure assures safe, reliable operation in heavy seas and permits manual or console-controlled deployment, retrieval and depth-keeping.

FULL 360° COVERAGE
A dual parallel array configuration and advanced signal processing achieve instantaneous, unambiguous left/right target discrimination.

SPACE-SAVING TRANSMITTER TOW-BODY CONFIGURATION
Innovative technology achieves omnidirectional, large-aperture acoustic performance in a compact, sleek tow-body assembly.

REVERBERATION SUPPRESSION
The unique transmitter design enables forward, aft, port and starboard directional transmission. This capability diverts energy concentration away from shorelines and landmasses, minimizing reverb and optimizing target detection.

SONAR PERFORMANCE PREDICTION
A key ingredient to mission planning, LFATS computes and displays system detection capability based on modeled or measured environmental data.

Key Features
> Wide-area search
> Target detection, localization and classification
> Tracking and attack
> Embedded training

Sonar Processing
> Active processing: State-of-the-art signal processing offers a comprehensive range of single- and multi-pulse, FM and CW processing for detection and tracking.
> Passive processing: LFATS features full 100-to-2,000 Hz continuous wideband coverage. Broadband, DEMON and narrowband analyzers, torpedo alert and extended tracking functions constitute a suite of passive tools to track and analyze targets.
> Playback mode: Playback is seamlessly integrated into passive and active operation, enabling post-analysis of pre-recorded mission data, and is a key component of operator training.
> Built-in test: Power-up, continuous background and operator-initiated test modes combine to boost system availability and accelerate operational readiness.

UNIQUE EXTENSION/RETRACTION MECHANISM TRANSFORMS COMPACT TOW-BODY CONFIGURATION TO A LARGE-APERTURE MULTIDIRECTIONAL TRANSMITTER

DISPLAYS AND OPERATOR INTERFACES
> State-of-the-art workstation-based operator machine interface: Trackball, point-and-click control, pull-down menu function and parameter selection allow easy access to key information.
> Displays: A strategic balance of multifunction displays, built on a modern OpenGL framework, offers flexible search, classification and geographic formats. Ground-stabilized, high-resolution color monitors capture details in the real-time processed sonar data.
> Built-in operator aids: To simplify operation, LFATS provides recommended mode/parameter settings, automated range-of-day estimation and data history recall.
> COTS hardware: LFATS incorporates a modular, expandable open architecture to accommodate future technology.

Major system elements (block diagram): winch and handling system, ship electronics, towed subsystem, sonar operator console, transmit power amplifier.

SPECIFICATIONS
Operating Modes: Active, passive, test, playback, multi-static
Source Level: 219 dB omnidirectional, 222 dB sector steered
Projector Elements: 16 in 4 staves
Transmission: Omnidirectional or by sector
Operating Depth: 15-to-300 m
Survival Speed: 30 knots
Size:
  Winch & Handling Subsystem: 180 in. x 138 in. x 84 in. (4.5 m x 3.5 m x 2.2 m)
  Sonar Operator Console: 60 in. x 26 in. x 68 in. (1.52 m x 0.66 m x 1.73 m)
  Transmit Power Amplifier: 42 in. x 28 in. x 68 in. (1.07 m x 0.71 m x 1.73 m)
Weight:
  Winch & Handling: 3,954 kg (8,717 lb.)
  Towed Subsystem: 678 kg (1,495 lb.)
  Ship Electronics: 928 kg (2,045 lb.)
Platforms: Frigates, corvettes, small patrol boats
Receive Array:
  Configuration: Twin-line
  Number of channels: 48 per line
  Length: 26.5 m (86.9 ft.)
  Array directivity: >18 dB @ 1,380 Hz

LFATS PROCESSING
Active:
  Active Band: 1,200-to-1,00 Hz
  Processing: CW, FM, wavetrain, multi-pulse matched filtering
  Pulse Lengths: Range-dependent, .039 to 10 sec. max.
  FM Bandwidth: 50, 100 and 300 Hz
  Tracking: 20 auto and operator-initiated
  Displays: PPI, bearing range, Doppler range, FM A-scan, geographic overlay
  Range Scale: 5, 10, 20, 40, and 80 kyd
Passive:
  Passive Band: Continuous 100-to-2,000 Hz
  Processing: Broadband, narrowband, ALI, DEMON and tracking
  Displays: BTR, BFI, NALI, DEMON and LOFAR
  Tracking: 20 auto and operator-initiated
Common:
  Own-ship noise reduction, Doppler nullification, directional audio
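The "sonar performance prediction" function described above amounts to evaluating an active sonar equation against modeled or measured environmental data. The sketch below uses the generic textbook (noise-limited, monostatic) form with illustrative numbers only; it is not L3Harris's prediction model, and aside from the 219 dB source level and 18 dB array directivity quoted in the specifications, the inputs are placeholders.

```python
def active_sonar_signal_excess(SL, TL, TS, NL, DI, DT):
    """Generic noise-limited active sonar equation (all quantities in dB):
    signal excess = SL - 2*TL + TS - (NL - DI) - DT."""
    return SL - 2 * TL + TS - (NL - DI) - DT

# Illustrative numbers only (not LFATS performance data):
se = active_sonar_signal_excess(SL=219, TL=60, TS=10, NL=65, DI=18, DT=10)
print(f"signal excess: {se:.1f} dB")   # > 0 dB suggests detection is likely
```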
Discriminative Regions for Human Face Detection
ACCV2002:The5th Asian Conference on Computer Vision,23–25January2002,Melbourne,Australia.Discriminative Regions for Human Face Detection∗J.Matas1,2,P.B´ılek1,M.Hamouz2,and J.Kittler21Center for Machine Perception,Czech Technical University{bilek,matas}@cmp.felk.cvut.cz2Centre for Vision,Speech,and Signal Processing,University of Surrey{m.hamouz,j.kittler}@AbstractWe propose a robust method for face detection based on the assumption that face can be represented by arrange-ments of automatically detectable discriminative regions. The appearance of face is modelled statistically in terms of local photometric information and the spatial relationship of the discriminative regions.The spatial relationship be-tween these regions serves mainly as a preliminary evidence for the hypothesis that a face is present in a particular po-sition.Thefinal decision is carried out using the complete information from the whole image patch.The results are very promising.1IntroductionDetection and recognition of objects is the most difficult task in computer vision.In many papers object detection and object recognition are considered as distinct problems, treated separately and under different names,e.g.object localisation(detection)and recognition.In our approach localisation of an object of a given class is a natural gener-alisation of object recognition.In the terminology that we introduce object detection is understood to mean the recog-nition of object’s class,while object recognition implies dis-tinguishing between specific objects from one class.Ac-cordingly,an object class,or category,is a set of objects with similar local surface properties and global geometry. In this paper we focus on object detection,in particular,we address the problem of face localisation.The main idea of this paper is based on the premise that objects in a class can be represented by arrangements of automatically detectable discriminative regions.Discrimi-∗This research was supported by the Czech Ministry of Education under Research Programme MSM210000012Transdisciplinary Biomedical En-gineering Research and by European Commission IST-1999-11159project BANCA.native regions are distinguished regions exhibiting proper-ties important for object detection and recognition.Distin-guished regions are”local parts”of the object surface,ap-pearance of which is stable over a wide range of views and illumination conditions.Instances of the category are repre-sented by a statistical model of appearance of local patches defined in terms of discriminative regions and by their re-lationship.Such a local model of objects has a number of attractive properties,e.g.robustness to partial occlusion and simpler illumination compensation in comparison with global models.Superficially,the framework seems to be no more than a local appearance-based method.The main difference is the focus in our work on the selection of regions where appear-ance is modelled.Detectors of such regions are built during the learning phase.In the detection stage,multiple detec-tors of discriminative regions process the image.Detection is then posed as a combinatorial optimisation problem.De-tails of the scheme are presented in Section3.Before that, previous work is revised in Section2.Experiments in de-tecting human faces based on the proposed framework are described in Section4.Possible refinements of the general framework are discussed in Section5.The main contribu-tions of this paper are summarised in Section6.2Previous WorkMany early object recognition systems were based on 
two basic approaches:•template matching—one or morefilters(templates), representing each object,are applied to a part of im-age,and from their responses the degree of similarity between the templates and the image is deduced.•measuring geometric features—geometric measure-ments(distance,angle...)between features are ob-tained and different objects are characterised by differ-ent constraints imposed on the measurements.It is was showed by Brunelli et al.[3]that template match-ing outperforms measuring geometric features,since the ap-proach exploits more information extracted from the image. Although template matching works well for some types of patterns,there must be complex solutions to cope with non-rigid objects,illumination variations or geometrical trans-formation due to different camera projections.Both approaches,template matching and measuring ge-ometric constraints,can be combined together to reduce their respective disadvantages.Brunelli et al.[3]showed that a face detector consisting of individual features linked together with crude geometry constraints have better per-formance than a detector based on”whole-face”template matching.Yuille[20]proposed the use of deformable templates to befitted to contrast profiles by the gradient descent of a suitable energy function.A similar approach was proposed by Lades et al.[9]and Wiskott et al.[19].They developed a recognition method based on deformable meshes.The mesh(representing object or object’s class)is overlaid over image and adjusted to obtain the best match between the node descriptors and the image.The likelihood of match is computed from the extent of mesh deformation.Schmid et al.[14,17]proposed detectors based on local-jets.The robustness is achieved by using spatial constraints between locally detected features.The spatial constraints are represented by angle and length ratios,that are supposed to be Gaussian variables each with their own mean and stan-dard deviation.Burl et al.[4,5,6]introduced a principled framework for representing possible deformations of objects using prob-abilistic shape models.The objects are again represented as constellations of rigid features(parts).The features are characterised photometrically.The variability of constella-tions is represented by a joint probability density function.A similar approach is used by Mohan et al.[13]for the detection of human bodies.The local parts are again recog-nised by detectors based on photometric information.The geometric constraints on mutual positions of the local parts in the image are defined heuristically.All the above mentioned methods make decisions about the presence or absence of the object in the image only from geometric constraints.Our proposed method shares the same framework,but in our work the local feature de-tector and geometric constraints define only a set of pos-sible locations of object in the image.Thefinal decision is made using photometric information,where the parts of object between the local features are taken into account as well.There are other differences between our approach and the approach of Schmid[17]or Burl[4,6].A coordinate system is introduced for each object from the object class. 
This allows us to tackle the problem of selecting distinctive and well localisable features in a natural way whereas in the case of Schmid’s approach,detectable regions were selected heuristically and a model was built from such selected fea-tures.Eventhough Weber[18]used an automatic feature selection,this was not carried out in an object-normalised space(as was in our approach),and consequently no re-quirements on the spatial stability of features were speci-fied.The relative spatial stability of discriminative regions used in our method facilitates a natural affine-invariant way of verifying the presence of a face in the image using corre-spondences between points in the normalized object space and the image,as will be discussed into detail further.3Method OutlineObject detection is performed in three stages.First,the discriminative region detectors are applied to image,and thus a set of candidate locations is obtained.In the second stage,the possible constellations(hypotheses)of discrimi-native regions are formed.In the third stage the likelihood of each hypothesis is computed.The best hypotheses are verified using the photometric information content from the test image.For algorithmic details see Section4.3.In the following sections we define several terms used in object recognition in a more formal way.The main aim of the sections is to unify different approaches in the literature and different taxonomy.3.1Object ClassesFor our purposes,we define an object class as a collec-tion of objects which share characteristic features,i.e.ob-jects are composed of several local parts and these parts are in a specific spatial relationship.We assume the local parts are detectable in the image directly and the possible arrangements of the local parts are given by geometrical constraints.The geometrical constraints should be invari-ant with respect to a predefined group of transformations. Under this assumption,the task of discrimination between two classes can be reduced to measuring the differences be-tween local parts and their geometrical relationships.3.2Discriminative RegionsImagine you are presented with two images depicting ob-jects from one class.You are asked to mark corresponding points in the image pair.We would argue that,unless distin-guished regions are present in the two images,the task is ex-tremely hard.Two views of a white featureless wall,a patch of grass,sea surface or an ant hill might be good examples. 
However,on most objects,wefind surface patches that can be separated from their surroundings and are detectable overa wide range of views.Before proceeding further,we give a more formal definition of distinguished region:Definition1Distinguished Region(DR)is any subset of an image that is a projection of a part of scene(an object) possessing a distinguishing property allowing its detection (segmentation,figure-ground separation)over a range of viewing and illumination conditions.In other words,the DR detection must be repeatable and stable w.r.t.viewpoint and illumination changes.DRs are referred to in the literature as’interest points’[7],’features’[1]or’invariant regions’[16].Note that we do not require DRs to have some transformation-invariant property that is unique in the image.If a DR possessed such a property,finding its corresponding DR in an other image would be greatly simplified.To increase the likelihood of this hap-pening,DRs can be equipped with a characterisation com-puted on associated measurement regions:Definition2A Measurement Region(MR)is any subset of an image defined by a transformation-invariant construc-tion(projective,affine,similarity invariant)from one or more(in case of grouping)regions.The separation of the concepts of DR and MRs is impor-tant and not made explicit in the literature.Since DRs are projections of the same part of an object in both views and MRs are defined in a transformation-invariant manner they are quasi view-point invariant.Besides the simplest and most common case where the MR is the DR itself,a MR may be constructed for example as a convex hull of a DR, afitted ellipse(affinelly invariant,[16]),a line segment be-tween a pair of interest points[15]or any region defined in a DR-derived coordinates.Of course,invariant measure-ments from a single or even multiple MRs associated with a DR will not guarantee a unique match on e.g.repetitive patterns.However,often DR characterisation by invariants computed on MR might be unique or almost unique.Note that,any set of pixels,not necessarily continu-ous,can posses a distinguishing property.Many percep-tual grouping processes detect such arrangements,e.g.a set of(unconnected)edges lying along a straight line form a DR of maximum edge density.The property is view-point quasi-invariant and detectable by the Hough Trans-form.The’distinguished pixel set’[10]would be a more precise term,but it is cumbersome.The definition of”local part”(sometimes also called ”feature”,”object component”etc.)is very vague in the recent literature.For our purpose it is important to define it more precisely.In the following discussion we will use the term”discriminative region”instead of”local part”.In this way,we would like to emphasise the difference between our definition of discriminative region and the usual sense of lo-cal part(a discriminative region is a local part with special properties important for its detection and recognition).Definition3A Discriminative Region is any subset of an image defined by discriminative descriptors computed on measurement region.Discriminative descriptors have to have the following properties:•Stability under change of imaging conditions.A discriminative region must be detectable over a wide range of imaging conditions(viewpoint,illumination).This property is guaranteed by definition of a DR.•Good intra-category localization.The variation in the position of the discriminative region in the object coordinate system should be small for different objects in the same category.•Uniqueness.A 
small number of similar discriminative regions should be present in the image of both object and background.•High incidence.The discriminative region should be detectable in a high proportion of objects from the same category.Note,there exists a trade-off between the ability to localise objects and the ability to discriminate between.A very dis-criminative part can be a strong cue,even if it appears in an arbitrary location on the surface of the object.On the other hand,a less discriminative part can only contribute infor-mation if it occurs in a stable spatial relationship relative to other parts.3.3Combining EvidenceThis is a rather important stage of the detection process, which significantly influences the overall performance of the system and makes it robust with respect to arbitrary geometrical transformations.The combination of evidence coming from the detected discriminative regions is carried out in a novel way,significantly different from approaches of the Schmid et al.[14,17]or Burl et al.[4,5,6].In most approaches,a shape model is built over the placement of particular discriminative regions.If an admis-sible configuration of these regions is found in an image,an instance of object in the image is hypothesised.It means that all the information conveyed by the area that lies be-tween the detected discriminative regions is discarded.If you imagine a collage,consisting of one eye,a nostril and a mouth corner placed in a reasonable manner on a black background,this will still be detected as a face,since no other parts of the image are needed to accept the”face-present”hypothesis.In our approach the geometrical constraints are modelled probabilistically in terms of spatial coordinates of discrim-inative regions.But these geometrical constraints are used only to define possible positions(hypotheses)of object inthe image.Thefinal decision about object presence in the image is deduced from the photometric information content in the original image.4ExperimentWe have carried out the experiment on face localisation [2]with the XM2VTS database[11].In order to verify the correctness of our localization framework,several simpli-fications to the general scheme are made.In the exper-iment the discriminative regions were semi-automatically defined as the eye-corners,the eye-centers the nostrils and the mouth corners.4.1Detector of discriminative regionsAs a distinguished region detector we use the improved Harris corner detector[8].Our implementation[2]of the detector is relatively insensitive to illumination changes, since the threshold is computed automatically from the neighborhood of the interest point.Such a corner detec-tor is not generally invariant to scale change,but we solve this problem by searching for interest points through several scales.We have observed[2]that the distribution of interest points coincide with the manually labelled points.It means, these points should define discriminative regions(here we suppose,that humans often identify interest points as most discriminative parts of object).Further,we have assumed that all potential in-plane face rotations and differences in face scale are covered by the training database.The MRs was defined very simply,as rectangular regions with the centre at the interest points.We select ten positions (the left eye centre,the right eye centre,the right left-eye corner,the left left-eye corner,the right right-eye corner, the left right-eye corner,the left nostril,the right nostril,the left mouth corner,the right mouth corner),which we 
further denote as regions1–10.All properties of a discriminative region are then determined by the size of the region.As a descriptor of a region we use the normalised colour infor-mation of all points contained in the region.Each region was modelled by a uni-modal Gaussian in a low-dimensional sub-space and the hypothesis whether the sample belongs to the class of faces is decided from the distance of this sample from the mean for a given region. The distance from the mean is measured as a sum of the in sub-space(DISS)and the from sub-space(DFSS)distances (Moghaddam et al.[12]).4.2Combining EvidenceThe proposed method is based onfinding the correspon-dences between generic face features(referred to as dis-criminative regions)that lie in the face-space and the face features detected in an image.This correspondence is then used to estimate the transformation that a generic face pos-sibly underwent.So far the correspondence of three points was used to estimate a four or six parametric affine trans-formation.When the the transformation from the face space to im-age space determined,the verification of a”face-present”hypothesis becomes an easy task.An inverse transforma-tion(i.e.transformation from the image space into the face-space)is found and the image patch(containing the three points of correspondence)is transformed into the face-space.The decision whether the”face-present”hypothesis holds or not is carried out in the face-space,where all the variations introduced by the geometrical transformation(so far only affine transformation is assumed to be the admis-sible transformation that a generic face can undergo)are compensated(or at least reduced to a negligible extent). The distance from a generic face class[12]is computed for the transformed patch and a threshold is used to determine whether the patch is from a face class or not.Moreover,many possible face patches do not have to be necessarily verified,since certain constraints can be put on the estimated transformation.Imagine for instance that all the feasible transformations that a face can undergo are the scaling from50%to150%of the original size in the face space and rotations up to30degrees.This is quite a rea-sonable limitation which will cause most of the correspon-dences to be discarded without doing a costly verification in the face space(in our experiments the pruning reached about70%).In case of the six parametric affine transform both shear and anisotropic scale is incorporated as the ad-missible transformation.4.3Algorithm summaryAlgorithm1:Detection of human faces1.Detection of the distinguished regions.For each im-age from the test set,detect the distinguished regions using the illumination invariant version of the Harris detector2.Detection of the discriminative regions.For each de-tected distinguished region determine to which class the region belongs using the PCA-based classifier in the colour space from among ten discriminative regionclasses(practically the eye corners,the eye centres,the nostrils and the mouth corners).The distinguished regions that do not belong to any of the predefined classes are discarded.bination of evidence.•Compute the estimate of the transformation fromthe image space to the face space using the corre-spondences between the three points in the facespace and in the image space.•Decompose this transformation into rotation,scale,translation and possibly shear and testwhether these parameters lie within a predefinedconstraints,i.e.make the decision,whether thetransformation is admissible or 
not.•If the transformation derived from the correspon-dences is admissible,transform the image patchthat is defined by the transformation of the faceoutline into the face space.4.Verification.Verify the”face present”hypothesis us-ing a PCA-based classifier.4.4ResultsResults of discriminative regions detector are sum-marised in Tab.1.Note that since the classifier is very sim-ple,the performance is not very high.However,even with such a simple detector of discriminative regions the system is capable of detecting faces with very low error,since we need only a small number of successfully detected discrim-inative regions(in our case only3).Several extensive experiments were conducted.Image patches were declared as”face”when their Mahanalobis distance based score lied below a certain threshold.200im-ages from the XM2VTS database were used for training a grayscale classifier based on the Moghaddam method[12], as mentioned earlier.The detection rate reached98%in case of XM2VTS database-see Fig.1for examples.Faces in several images containing cluttered background were successfully detected as shown in Fig.2.5Discussion and Future WorkWe proposed a method for face detection using discrim-inative regions.The detector performance is very good for the case when the general face detection problem is con-strained by assuming a particular camera and pose position.Table1.Performance of discriminative regiondetectorsfalse negative false positive%#%#Region131.8919172.263831Region210.686437.881342Region357.7634633.03433Region454.9232919.85218Region515.039022.34538Region613.698262.333260Region715.5393 4.0078Region812.5275 5.07104Region948.75292 6.2770Region1033.5620114.90233Correctly detected False rejections Figure1.Experiment resultsWe also assumed that the parts that appear distinctive to the human observer will be also discriminative,and therefore the discriminative regions were selected manually.In gen-eral,the correlation between distinctiveness and discrimi-nativeness cannot necessarily be assumed and therefore the discriminative regions should be”learned”from the training images.The training problem was addressed in this paper only partially.As an alternative the method proposed by Weber et al.[18]can be exploited.The admissible transformation,which a face can undergo has so far been restricted to affine transformation.Never-theless,the results showed even in such a simple case,that high detection performance can be achieved.Future modifi-cations will involve the employment of more complex trans-formations(such as general non-rigid transformations).The PCA based classification can be replaced by more powerful classifiers,such as Neural Networks,or Support Vector Ma-chines.Figure2.Experiments with cluttered back-ground6ConclusionIn the paper,a novel framework for face detection wasproposed.The framework is based on the idea that mostreal objects can be decomposed into a collection of localparts tied by geometrical constraints imposed on their spa-tial arrangement.By exploiting this fact,face detection canbe treated as recognition of local image patches(photomet-ric information)in a given configuration(geometric con-straints).In our approach,discriminative regions serve as apreliminary evidence reducing the search time dramatically.This evidence is utilised for generating a normalised versionof the image patch,which is then used for the verificationof the”face present”hypothesis.The proposed method was applied to the problem of facedetection.The results of extensive experiments are verypromising.The 
experiments demonstrated that the pro-posed method is able to solve a rather difficult problem incomputer vision.Moreover we showed that even simplerecognition methods(with a limited capability when usedalone)can be configured to create powerful framework ableto tackle such a difficult task as face detection.References[1] A.Baumberg.Reliable feature matching across widely sepa-rated views.In Proc.of Computer Vision and Pattern Recog-nition,pages I:774–781,2000.[2]P.B´ılek,J.Matas,M.Hamouz,and J.Kittler.Detection ofhuman faces from discriminative regions.Technical ReportVSSP–TR–2/2001,Department of Electronic&ElectricalEngineering,University of Surrey,2001.[3]R.Brunelli and T.Poggio.Face recognition:Features vs.templates.IEEE Trans.on Pattern Analysis and MachineIntelligence,15(10):1042–1053,1993.[4]M.C.Burl,T.K.Leung,and P.Perona.Face localizationvia shape statistics.In Proc.of International Workshop onAutomatic Face and Gesture Recognition,pages154–159,1995.[5]M.C.Burl and P.Perona.Recognition of planar objectclasses.In Proc.of Computer Vision and Pattern Recog-nition,pages223–230,1996.[6]M.C.Burl,M.Weber,and P.Perona.A Probabilistic ap-proach to object recognition using local photometry abdglobal Geometry.In Proc.of European Conference on Com-puter Vision,pages628–641,1998.[7]Y.Dufournaud,C.Schmid,and R.Horaud.Matching im-ages with different resolutions.In Proc.of Computer Visionand Pattern Recognition,pages I:612–618,2000.[8] C.J.Harris and M.Stephens.A combined corner and edgedetector.In Proc.of Alvey Vision Conference,pages147–151,1988.[9]des,J. C.V orbr¨u ggen,J.Buhmann,nge,C.von der Malsburg,R.P.W¨u rtz,and W.Konen.Distrotioninvariant object recognition in the dynamic link architecture.IEEE Trans.on Pattern Analysis and Machine Intelligence,42(3):300–310,1993.[10]J.Matas,M.Urban,and T.Pajdla.Unifying view for wide-baseline stereo.In B.Likar,editor,puter Vi-sion Winter Workshop,pages214–222,Ljubljana,Sloveni,February2001.Slovenian Pattern Recorgnition Society.[11]K.Messer,J.Matas,J.Kittler,J.Luettin,and G.Maitre.XM2VTSDB:The extended M2VTS database.In R.Chel-lapa,editor,Second International Conference on Audio andVideo-based Biometric Person Authentication,pages72–77,Washington,USA,March1999.University of Maryland.[12] B.Moghaddam and A.Pentland.Probabilistic visual learn-ing for object detection.In Proc.of International Confer-ence on Computer Vision,pages786–793,1995.[13] A.Mohan,C.Papageorgiou,and T.Poggio.Example-basedobject detection in images by components.IEEE Trans.onPattern Analysis and Machine Intelligence,23(4):349–361,2001.[14] C.Schmid and R.Mohr.Local grayvalue invariants for im-age retrieval.IEEE Trans.on Pattern Analysis and MachineIntelligence,19(5):530–535,1997.[15] D.Tell and S.Carlsson.Wide baseline point matching usingaffine invariants computed from intensity profiles.In Proc.of European Conference on Computer Vision,pages754–760,2000.[16]T.Tuytelaars and L.van Gool.Wide baseline stereo match-ing based on local,affinely invariant regions.In Proc.ofBritish Machine Vision Conference,pages412–422,2000.[17]V.V ogelhuber and C.Schmid.Face detection based ongeneric local descriptors and spatial constraints.In Proc.of International Conference on Computer Vision,pagesI:1084–1087,2000.[18]M.Weber,M.Welling,and P.Perona.Unsupervised learn-ing of models for recognition.In Proc.of European Confer-ence on Computer Vision,pages18–32,2000.[19]L.Wiskott,J.-M.Fellous,N.Kr¨u ger,and C.von der Mals-burg.Face recognition by elastic bunch graph matching.IEEE Trans.on Pattern Analysis 
and Machine Intelligence,19(7):775–779,1997.[20] A.L.Yuille.Deformable templates for face recognition.Journal of Cognitive Neuroscience,3(1):59–70,1991.。
Research on Visual Inspection Algorithms for Defects of Textured Objects -- Outstanding Graduate Thesis
Abstract
In today's highly competitive industrial automation, machine vision plays a pivotal role in product quality control, and its application to defect inspection is becoming increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient, and safer. Textured objects are ubiquitous in industrial production: substrates used in semiconductor assembly and packaging, light-emitting diodes, printed circuit boards in modern electronic systems, and the cloth and fabrics of the textile industry can all be regarded as objects with texture features. This thesis focuses on defect inspection techniques for textured objects, aiming to provide efficient and reliable detection algorithms for their automated inspection. Texture is an important feature for describing image content, and texture analysis has been successfully applied to texture segmentation and texture classification. This work proposes a defect detection algorithm based on texture analysis and reference comparison. The algorithm tolerates image registration errors caused by object deformation and is robust to texture variation. It is designed to provide rich and meaningful physical descriptions of the detected defect regions, such as their size, shape, brightness contrast, and spatial distribution. When a reference image is available, the algorithm can be applied to both homogeneously and non-homogeneously textured objects, and it also performs well on non-textured objects. Throughout the detection process we use steerable-pyramid texture analysis and reconstruction. Unlike traditional wavelet texture analysis, we introduce a tolerance-control step in the wavelet domain to handle object deformation and texture effects, achieving tolerance to deformation and robustness to texture. Finally, steerable-pyramid reconstruction ensures that the physical attributes of the defect regions are recovered accurately. In the experimental stage we tested a series of images of practical value; the results show that the proposed defect detection algorithm for textured objects is both efficient and easy to implement. Keywords: defect detection; texture; object deformation; steerable pyramid; reconstruction
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
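The reference-comparison idea in the abstract can be illustrated with a short sketch. The code below is only a simplified stand-in: it compares a test image against a defect-free reference at several scales using a Gaussian pyramid rather than the steerable pyramid used in the thesis, and the threshold and level count are illustrative assumptions.

```python
import cv2
import numpy as np

def pyramid_defect_map(test_img, ref_img, levels=3, thresh=25):
    """Compare a grayscale test image against a defect-free reference at
    several scales and return a binary defect map. A Gaussian pyramid is a
    simplified stand-in for the steerable pyramid described in the thesis."""
    h, w = test_img.shape[:2]
    test, ref = test_img.astype(np.float32), ref_img.astype(np.float32)
    defect = np.zeros((h, w), np.uint8)
    for _ in range(levels):
        diff = np.abs(test - ref)                        # reference comparison
        mask = (diff > thresh).astype(np.uint8) * 255    # candidate defect pixels
        mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
        defect = cv2.bitwise_or(defect, mask)            # merge scales
        test, ref = cv2.pyrDown(test), cv2.pyrDown(ref)  # next (coarser) scale
    return defect

# Size, shape and contrast of each defect region can then be read off with
# cv2.connectedComponentsWithStats(defect).
```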
The principle of RetinaFace
RetinaFace is a deep-learning face detection algorithm that can accurately detect multiple faces and estimate their position, size, and pose.
RetinaFace was first made public in 2019 through the paper "RetinaFace: Single-stage Dense Face Localisation in the Wild".
Its design draws on a combination of two approaches: SSD (Single Shot Multibox Detector) and DenseBox.
SSD is a deep-learning object detection algorithm that detects targets accurately without a separate region-proposal stage.
DenseBox, by contrast, is a dense candidate-box generation method that increases detection speed without sacrificing accuracy.
By combining the two, RetinaFace improves detection speed without sacrificing accuracy.
More concretely, RetinaFace first tiles the image with a set of anchor boxes and then performs classification and regression for each anchor.
Classification decides whether an anchor contains a face; regression refines the estimate of the face's position, size, and pose from the anchor's own position and size.
To improve detection, RetinaFace uses techniques such as multi-scale normalization and a feature pyramid, allowing the algorithm to handle faces of different sizes and poses.
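The anchor classification and regression step described above can be sketched in a few lines of NumPy. This is a generic, SSD-style decoding of per-anchor outputs, not the official RetinaFace implementation; the variance constants, array shapes, and score threshold are illustrative assumptions.

```python
import numpy as np

def decode_boxes(anchors, deltas, variances=(0.1, 0.2)):
    """SSD-style decoding of regression offsets into face boxes.

    anchors: (N, 4) array of [cx, cy, w, h] in normalised coordinates.
    deltas:  (N, 4) network outputs [dcx, dcy, dw, dh] for each anchor.
    """
    cxcy = anchors[:, :2] + deltas[:, :2] * variances[0] * anchors[:, 2:]
    wh = anchors[:, 2:] * np.exp(deltas[:, 2:] * variances[1])
    return np.hstack([cxcy - wh / 2, cxcy + wh / 2])   # [x1, y1, x2, y2]

def keep_faces(anchors, deltas, scores, score_thresh=0.5):
    """Classification step: keep anchors whose face score passes a threshold,
    then decode their regression offsets into boxes."""
    mask = scores > score_thresh
    return decode_boxes(anchors[mask], deltas[mask]), scores[mask]
```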
Overall, RetinaFace is a highly efficient and accurate face detection algorithm that can be widely applied to practical tasks such as face recognition, face alignment, and facial feature extraction.
It is particularly advantageous for large-scale crowd detection, where it substantially improves both detection speed and accuracy.
As deep learning continues to advance, RetinaFace is likely to see even broader adoption.
ENVI interface reference table: tools and their functions
Dark Subtraction: determines the pixel values of dark pixels (dark-object subtraction)
EFFORT Polishing: spectral polishing
Anomaly Detection Workflow: launches the anomaly detection workflow tool
RX Anomaly Detection: launches the RX anomaly detection tool
Band Ratio toolbox and its functions
Band Math: define simple or complex custom expressions for band-to-band arithmetic
Band Ratios: ratio operations between bands
Change Detection toolbox and its functions
Change Detection Difference Map: directly generates a change image from two input images
Change Detection Statistics: generates a land-use transition matrix from two classified datasets
Image Change Workflow: workflow tool for direct image comparison
Bow-tie correction tool: reprojects data using a GLT
Super GLT Georeference: performs geometric correction using a super GLT file
Super IGM Georeference: performs geometric correction using a super IGM file
Image Sharpening toolbox and its functions
CN Spectral Sharpening: CN fusion
RADARSAT: tools for RADARSAT data, including slant-to-ground range conversion, incidence-angle image generation, and metadata viewing
SIR-C: tools for SIR-C data, such as multilooking and polarimetric signatures
Save COSMO-SkyMed Metadata to XML: saves CSM metadata as an XML file
Face feature clustering algorithms
Face feature clustering algorithms are widely used in various applications, such as face recognition, emotion analysis, and facial expression detection. The goal of these algorithms is to group similar face features together based on their similarities and dissimilarities. This allows for efficient face feature representation and analysis.
One commonly used algorithm for face feature clustering is the k-means algorithm. It aims to partition the face features into k clusters, where each cluster represents a group of similar face features. The algorithm starts by randomly selecting k initial cluster centroids and then iteratively assigns each face feature to the nearest centroid. After each assignment, the centroids are updated based on the mean of the assigned face features. This process continues until convergence, when the face features are assigned to their final clusters.
Another popular algorithm for face feature clustering is hierarchical clustering. It builds a hierarchy of clusters by successively merging or splitting clusters based on their similarities. It starts with each face feature as a separate cluster and then merges the closest pair of clusters at each step. This process continues until all face features are in a single cluster or until a predefined stopping criterion is met. The resulting hierarchy of clusters can be represented as a dendrogram, which provides insights into the relationships between different clusters.
In addition to these algorithms, there are other advanced techniques for face feature clustering, such as spectral clustering, density-based clustering, and fuzzy clustering. These techniques offer different advantages and can be chosen based on the specific requirements of the application.
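A minimal scikit-learn sketch of the k-means procedure described above, applied to face embedding vectors; the embedding array here is placeholder data and the choice of k is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume `embeddings` is an (n_faces, d) array of face feature vectors
# produced by some upstream feature extractor.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))          # placeholder data

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)           # cluster index per face
centroids = kmeans.cluster_centers_               # (5, 128) cluster means

# The hierarchical alternative described above is available as
# sklearn.cluster.AgglomerativeClustering, and scipy.cluster.hierarchy can
# draw the corresponding dendrogram.
```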
Face recognition thesis keywords: face recognition, independent component analysis, kernel independent component analysis, blocking
[Keywords] face recognition; independent component analysis; kernel independent component analysis; blocking. [English keywords] Face Recognition, Independent Component Analysis, Kernel Independent Component Analysis, Block. Thesis: Research on face recognition methods based on block kernel independent component analysis. [Chinese abstract] Using the idea of partitioning face images into blocks and combining it with kernel independent component analysis (KICA), this thesis proposes a face recognition method based on column-block KICA.
The column-block KICA method first partitions each face image into blocks by column to obtain a new sample space, and then applies KICA in this new space to extract face features for recognition.
Experiments show that, by reducing sample dimensionality while increasing the number of samples, column-block KICA alleviates the high-dimensional small-sample problem to some extent; compared with traditional KICA it extracts local facial features more effectively and achieves better recognition performance.
By improving the column-block method, the thesis further proposes a face recognition method based on row-and-column-block KICA.
The row-and-column-block method first partitions and regroups each face image by rows and columns to obtain a new sample space, then applies KICA along the rows and then along the columns, and finally extracts face features for recognition by solving the left and right unmixing matrices.
Experiments show that applying KICA successively along the rows and columns of the training samples largely removes the correlation between samples, yielding better recognition results and greater robustness.
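The column-blocking idea can be sketched as follows. Note this uses scikit-learn's FastICA (ordinary linear ICA) as a stand-in for kernel ICA, which scikit-learn does not provide, and the block width, number of components, and array shapes are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

def column_block_features(images, block_cols=8, n_components=20):
    """Partition each face image into column blocks, treat every block as a
    separate lower-dimensional sample, and learn an ICA basis on the blocks.
    FastICA is used here in place of kernel ICA for illustration."""
    n, h, w = images.shape
    blocks = images.reshape(n, h, w // block_cols, block_cols)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(n * (w // block_cols), -1)
    ica = FastICA(n_components=n_components, random_state=0)
    codes = ica.fit_transform(blocks)                 # block-level features
    # Re-assemble per-image features by concatenating each image's block codes.
    return codes.reshape(n, -1), ica

# Example with placeholder data: 100 face images of size 32x32.
feats, model = column_block_features(np.random.rand(100, 32, 32))
```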
[English abstract] A face recognition method based on column blocking and kernel independent component analysis is proposed, combining kernel independent component analysis with the idea of dividing face images by columns. First, the face image matrix is divided into blocks by column. Kernel independent component analysis can then be applied directly in the new eigenspace constructed from all the blocks to extract face features and perform recognition. The experimental results show that this method mitigates the high-dimensional, small-sample problem to some degree by reducing the dimension of the samples and increasing their number. It extracts local facial features more effectively than traditional kernel independent component analysis, and its recognition performance is better. Another face recognition method, based on row-and-column blocking and kernel independent component analysis, is then proposed as an improvement of the first. The face image matrix is divided into blocks by rows and columns, and these blocks are combined to construct the new eigenspace. Kernel independent component analysis is applied twice, along rows and then along columns, in this eigenspace to obtain the left and right unmixing matrices, from which the face features are extracted and recognised. The experimental results show that this method removes the correlation between samples through the two applications of kernel independent component analysis, and that both its recognition performance and its robustness are better than those of the first method. [Contents] Research on face recognition methods based on block kernel independent component analysis: Abstract; Contents; Chapter 1 Introduction (1.1 Background; 1.2 Definition and scope of face recognition; 1.3 Development stages and current research: 1.3.1 Development stages, 1.3.2 Research status; 1.4 Outline of the thesis); Chapter 2 Independent Component Analysis (2.1 Introduction; 2.2 Independent component analysis: 2.2.1 The ICA model, 2.2.2 Estimation methods for the ICA model; 2.3 ICA-based face feature extraction; 2.4 Classification and recognition; 2.5 Summary); Chapter 3 Kernel Independent Component Analysis (3.1 Introduction; 3.2 Fundamentals of kernel methods: 3.2.1 Definition of kernels, 3.2.2 Definition and properties of the kernel matrix, 3.2.3 Common kernel functions; 3.3 KICA-based face feature extraction; 3.4 Summary); Chapter 4 Face recognition with column-block KICA (4.1 Introduction; 4.2 Column-block KICA face recognition: 4.2.1 Basic idea of column blocking, 4.2.2 Feature extraction algorithm, 4.2.3 Classification and recognition; 4.3 Experimental comparison and analysis; 4.4 Summary); Chapter 5 Face recognition with row-and-column-block KICA (5.1 Introduction; 5.2 Row-and-column-block KICA feature extraction: 5.2.1 Basic idea, 5.2.2 Feature extraction algorithm, 5.2.3 Classification and recognition; 5.3 Experimental comparison and analysis; 5.4 Summary); Conclusions; References; Publications during the master's programme; Acknowledgements.
ENVI tutorial: retrieving land surface temperature with the mono-window algorithm
1.1 Algorithm principle. 1.1.1 The mono-window algorithm. The mono-window (MW) algorithm was proposed by Qin Zhihao in 2001 for retrieving land surface temperature from TM data, which have only a single thermal infrared band.
Numerous studies have verified that the mono-window algorithm retrieves temperature with high accuracy, and it applies equally to ETM+ and Landsat 8 data.
The formula is: LST = {a6(1 − C6 − D6) + [b6(1 − C6 − D6) + C6 + D6]·T_sensor − D6·T_a} / C6, where LST is the land surface temperature (K), T_sensor the at-sensor brightness temperature (K), and T_a the effective mean atmospheric temperature (K); a6 and b6 are reference coefficients, with a6 = -67.355351 and b6 = 0.458606 for surface temperatures between 0 and 70 °C; C6 and D6 are intermediate variables computed as C6 = ε6·τ6 and D6 = (1 − τ6)·[1 + (1 − ε6)·τ6], where ε6 is the land surface emissivity and τ6 the total atmospheric transmittance from the ground to the sensor.
The key to retrieving land surface temperature with the mono-window algorithm is therefore to obtain the brightness temperature T_sensor, the land surface emissivity ε6, the atmospheric transmittance τ6, and the effective mean atmospheric temperature T_a.
1.1.2 Parameter computation. 1.1.2.1 Brightness temperature. The Planck formula converts the at-sensor radiance of each image pixel into the corresponding brightness temperature.
The formula is: T_sensor = K2 / ln(K1/L_λ + 1), where T_sensor is the brightness temperature, L_λ is the spectral radiance obtained after preprocessing, in W/(m²·sr·μm), and K1 and K2 are constants that can be read from the data header file.
Before computing the brightness temperature, the pixel digital numbers (DN) must be converted to thermal radiance using the radiometric calibration parameters: L_λ = M_L·DN + A_L, where M_L is the gain and A_L the offset. Both can be read directly from the image metadata and are already built into ENVI, so there is no need to look them up manually.
1.1.2.2 Land surface emissivity. The regional land surface emissivity is determined with the mixed-pixel decomposition method proposed by Qin Zhihao for TM imagery.
For urban areas we simply distinguish three surface types: water, natural surfaces, and built surfaces. The emissivity at the mixed-pixel scale is then estimated as ε = P_V·R_V·ε_V + (1 − P_V)·R_M·ε_M + dε, where ε is the emissivity of the mixed pixel, P_V the vegetation fraction, R_V the temperature ratio of vegetation, R_M the temperature ratio of built surfaces, ε_V and ε_M the emissivities of vegetation and built surfaces, and dε a radiative correction term.
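The processing chain just described (radiometric calibration, Planck brightness temperature, mono-window retrieval) is straightforward to express outside ENVI. A minimal NumPy sketch follows; the gain/offset and K1/K2 values must come from the actual scene metadata, and the emissivity, transmittance, and mean atmospheric temperature are assumed to be prepared separately. The values in the commented example are illustrative only.

```python
import numpy as np

A_COEF, B_COEF = -67.355351, 0.458606   # valid for 0-70 degC surface temperatures

def mono_window_lst(dn, ml, al, k1, k2, emissivity, transmittance, t_air):
    """Land surface temperature (K) from a single thermal band via the
    mono-window algorithm. `dn` is the raw digital-number array; the other
    inputs are scalars or arrays broadcastable to its shape."""
    radiance = ml * dn + al                       # radiometric calibration
    t_sensor = k2 / np.log(k1 / radiance + 1.0)   # Planck brightness temperature
    c = emissivity * transmittance
    d = (1.0 - transmittance) * (1.0 + (1.0 - emissivity) * transmittance)
    lst = (A_COEF * (1.0 - c - d)
           + (B_COEF * (1.0 - c - d) + c + d) * t_sensor
           - d * t_air) / c
    return lst

# Illustrative call (read the real calibration constants from the scene's
# metadata/MTL file):
# lst = mono_window_lst(dn, ml=3.34e-4, al=0.1, k1=774.9, k2=1321.1,
#                       emissivity=0.97, transmittance=0.85, t_air=290.0)
```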
Real-time facial feature extraction (translated paper)
Translated text. Title: Real-time facial feature extraction. Authors: Zhao Jieyu, Liu Zhen. Source: Science in China, Series E: Information Sciences, April 2008. Received 2007-07-10; accepted 2008-02-26. Abstract: Fast and accurate facial feature extraction is the basis of face recognition and expression analysis. This paper presents a new, efficient method for extracting geometric facial features from video in real time. The input video image is represented as a weighted graph, and pixel-level facial features are extracted automatically by random walks on this graph; the features include the outer contour, eyebrows, eyes, nose, and lips. The weighted graph uses an 8-adjacency structure, and the weight defined on each edge reflects the likelihood that a random walk traverses it. The random walk simulates an anisotropic diffusion process that removes image noise while preserving facial feature points. The walks start from seed points, determined in advance from colour and motion information, that are most characteristic of the face. The facial feature points obtained by the random walks are stored, in their original form, in several linked lists and grouped into feature-point sets according to the relative positions of the facial parts. Prior knowledge of facial structure is incorporated into the analysis through a Bayesian approach. To facilitate higher-level visual computation, statistical shape analysis is used to represent the facial feature points further as shape and registration information; the shape is affine-invariant geometric information describing the global characteristics of the face, and shape distances are measured with the Procrustes distance. Experimental results show that the proposed method is fast and effective: it extracts facial features from video in real time and continues to work under moderate changes in lighting and scale, head rotation, and interference from hands. Face information processing is a challenging, multidisciplinary research field; driven by technical progress and market demand, it has attracted wide attention from academia and industry in recent years. Facial feature extraction, the most important step in face information processing, is crucial for subsequent vision-based human-computer interaction: facial expression analysis, face recognition, biometric authentication, animation, and video conferencing all depend on efficient and accurate facial feature extraction, and small errors in feature extraction easily lead to errors in identity verification or expression analysis. However, for segmenting and extracting facial features from video, methods that compute a globally optimal solution of a graph model cannot yet run in real time because of their computational cost. We therefore use a limited number of random walks to approximate an anisotropic diffusion process that removes image noise while preserving facial feature points, and thereby obtain accurate geometric facial features. The paper is organised as follows: Section 1 introduces the graph representation and defines random walks on weighted graphs; Section 2 presents the segmentation of facial feature points by random walks; Section 3 describes the statistical shape analysis; Section 4 reports experimental results; Section 5 concludes. 1 Graph model and random walks. A weighted graph is a very natural representation of a digital image: vertices correspond to pixels and weighted edges represent the affinity between pixels. We use an undirected weighted graph with 8-adjacency. The weight on an edge reflects how strongly it exhibits facial characteristics: the more typical the facial feature, the larger the weight; the less face-like, the smaller the weight. We briefly give the formal definition of the graph model and of random walks on weighted graphs. A graph G = (V, E) consists of a countable vertex set V and an edge set E; an edge e in E is a vertex pair e = <x, y> = <y, x>, with x, y in V. If vertices x and y are joined by an edge, this adjacency is written x ~ y. A weighted graph G_w = (G, w) consists of a graph G and a real-valued function w: E(G) -> R, w > 0. A random walk on a weighted graph is a random process that visits a sequence of adjacent vertices by repeatedly choosing, at random, one of the edges incident to the current vertex; the probability of choosing an edge is the ratio of its weight to the sum of the weights of all incident edges. For an edge e_i in E let w_i = w(e_i); for a vertex v in V let N(v) = {e in E(G): e = <y, v>, y in V} be the set of edges incident to v and W(v) the sum of their weights. The random walk X on G_w = (G, w) is then the random process with transition probability P(X_{n+1} = v | X_n) = (w_{<v,X_n>} / W(X_n)) · 1_{v ~ X_n}, where 1_{v ~ X_n} indicates whether an edge joins v and X_n. 2 Segmentation of facial feature points. Random walks on weighted graphs can be used effectively for image filtering and image segmentation. The filtering process is described by the anisotropic diffusion equation proposed by Perona et al.; the segmentation process is expressed by a Laplace equation: given some seed pixels, the algorithm uses random walks to label the pixels that reach the seeds most easily. Most anisotropic diffusion filters aim to smooth homogeneous regions of the image without crossing boundaries, whereas segmentation aims to label the homogeneous regions. For extracting facial features from low-quality video we need to achieve both goals at once: remove image noise and segment the facial feature points. Suppose the current image is described by a real function F(x, y, t); the anisotropic diffusion can be written as ∂F/∂t = div(c(x, y, t)·∇F). To smooth the image and remove noise while preserving facial structure, the conduction coefficient is usually defined as a spatially dependent term, most commonly c(x, y, t) = exp(−‖∇F(x, y, t)‖² / (2λ²)), where λ is a constant. In regions where the image is fairly uniform, c is large and smoothing dominates; in facial feature regions with strong variation, c is small, so the structural information of the face is preserved. The anisotropic diffusion process above can be realised by a set of random walks. Let G_w = (G, w) be the weighted graph corresponding to the input image, with 8-adjacency, and let w_{ij} be the weight on the edge joining vertices i and j. The random walk is self-avoiding, i.e. it never visits the same vertex twice, and its single-step transition probability is given by the expression above.
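A small NumPy sketch of one step of the self-avoiding random walk on an 8-adjacency weighted image graph, as described above; the edge-weight function here is an illustrative placeholder mirroring the conduction coefficient, not the paper's exact weighting.

```python
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]        # 8-adjacency

def edge_weight(img, p, q, lam=10.0):
    """Illustrative weight: high where intensities are similar (smooth area),
    low across strong edges, mirroring the conduction coefficient c."""
    return np.exp(-((img[p] - img[q]) ** 2) / (2.0 * lam ** 2))

def walk_step(img, current, visited, rng):
    """Choose the next pixel with probability proportional to edge weight,
    skipping pixels already visited (self-avoiding walk)."""
    h, w = img.shape
    cands, weights = [], []
    for dy, dx in NEIGHBOURS:
        q = (current[0] + dy, current[1] + dx)
        if 0 <= q[0] < h and 0 <= q[1] < w and q not in visited:
            cands.append(q)
            weights.append(edge_weight(img, current, q))
    if not cands:
        return None                                    # walk is stuck
    p = np.array(weights) / np.sum(weights)
    return cands[rng.choice(len(cands), p=p)]
```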
A survey of content based 3d shape retrieval methods
A Survey of Content Based3D Shape Retrieval MethodsJohan W.H.Tangelder and Remco C.VeltkampInstitute of Information and Computing Sciences,Utrecht University hanst@cs.uu.nl,Remco.Veltkamp@cs.uu.nlAbstractRecent developments in techniques for modeling,digitiz-ing and visualizing3D shapes has led to an explosion in the number of available3D models on the Internet and in domain-specific databases.This has led to the development of3D shape retrieval systems that,given a query object, retrieve similar3D objects.For visualization,3D shapes are often represented as a surface,in particular polygo-nal meshes,for example in VRML format.Often these mod-els contain holes,intersecting polygons,are not manifold, and do not enclose a volume unambiguously.On the con-trary,3D volume models,such as solid models produced by CAD systems,or voxels models,enclose a volume prop-erly.This paper surveys the literature on methods for con-tent based3D retrieval,taking into account the applicabil-ity to surface models as well as to volume models.The meth-ods are evaluated with respect to several requirements of content based3D shape retrieval,such as:(1)shape repre-sentation requirements,(2)properties of dissimilarity mea-sures,(3)efficiency,(4)discrimination abilities,(5)ability to perform partial matching,(6)robustness,and(7)neces-sity of pose normalization.Finally,the advantages and lim-its of the several approaches in content based3D shape re-trieval are discussed.1.IntroductionThe advancement of modeling,digitizing and visualizing techniques for3D shapes has led to an increasing amount of3D models,both on the Internet and in domain-specific databases.This has led to the development of thefirst exper-imental search engines for3D shapes,such as the3D model search engine at Princeton university[2,57],the3D model retrieval system at the National Taiwan University[1,17], the Ogden IV system at the National Institute of Multimedia Education,Japan[62,77],the3D retrieval engine at Utrecht University[4,78],and the3D model similarity search en-gine at the University of Konstanz[3,84].Laser scanning has been applied to obtain archives recording cultural heritage like the Digital Michelan-gelo Project[25,48],and the Stanford Digital Formae Urbis Romae Project[75].Furthermore,archives contain-ing domain-specific shape models are now accessible by the Internet.Examples are the National Design Repos-itory,an online repository of CAD models[59,68], and the Protein Data Bank,an online archive of struc-tural data of biological macromolecules[10,80].Unlike text documents,3D models are not easily re-trieved.Attempting tofind a3D model using textual an-notation and a conventional text-based search engine would not work in many cases.The annotations added by human beings depend on language,culture,age,sex,and other fac-tors.They may be too limited or ambiguous.In contrast, content based3D shape retrieval methods,that use shape properties of the3D models to search for similar models, work better than text based methods[58].Matching is the process of determining how similar two shapes are.This is often done by computing a distance.A complementary process is indexing.In this paper,indexing is understood as the process of building a datastructure to speed up the search.Note that the term indexing is also of-ten used for the identification of features in models,or mul-timedia documents in general.Retrieval is the process of searching and delivering the query results.Matching and in-dexing are often part of the retrieval process.Recently,a 
lot of researchers have investigated the spe-cific problem of content based3D shape retrieval.Also,an extensive amount of literature can be found in the related fields of computer vision,object recognition and geomet-ric modelling.Survey papers to this literature have been provided by Besl and Jain[11],Loncaric[50]and Camp-bell and Flynn[16].For an overview of2D shape match-ing methods we refer the reader to the paper by Veltkamp [82].Unfortunately,most2D methods do not generalize di-rectly to3D model matching.Work in progress by Iyer et al.[40]provides an extensive overview of3D shape search-ing techniques.Atmosukarto and Naval[6]describe a num-ber of3D model retrieval systems and methods,but do not provide a categorization and evaluation.In contrast,this paper evaluates3D shape retrieval meth-ods with respect to several requirements on content based 3D shape retrieval,such as:(1)shape representation re-quirements,(2)properties of dissimilarity measures,(3)ef-ficiency,(4)discrimination abilities,(5)ability to perform partial matching,(6)robustness,and(7)necessity of posenormalization.In section2we discuss several aspects of3D shape retrieval.The literature on3D shape matching meth-ods is discussed in section3and evaluated in section4. 2.3D shape retrieval aspectsIn this section we discuss several issues related to3D shape retrieval.2.1.3D shape retrieval frameworkAt a conceptual level,a typical3D shape retrieval frame-work as illustrated byfig.1consists of a database with an index structure created offline and an online query engine. Each3D model has to be identified with a shape descrip-tor,providing a compact overall description of the shape. To efficiently search a large collection online,an indexing data structure and searching algorithm should be available. The online query engine computes the query descriptor,and models similar to the query model are retrieved by match-ing descriptors to the query descriptor from the index struc-ture of the database.The similarity between two descriptors is quantified by a dissimilarity measure.Three approaches can be distinguished to provide a query object:(1)browsing to select a new query object from the obtained results,(2) a direct query by providing a query descriptor,(3)query by example by providing an existing3D model or by creating a3D shape query from scratch using a3D tool or sketch-ing2D projections of the3D model.Finally,the retrieved models can be visualized.2.2.Shape representationsAn important issue is the type of shape representation(s) that a shape retrieval system accepts.Most of the3D models found on the World Wide Web are meshes defined in afile format supporting visual appearance.Currently,the most common format used for this purpose is the Virtual Real-ity Modeling Language(VRML)format.Since these mod-els have been designed for visualization,they often contain only geometry and appearance attributes.In particular,they are represented by“polygon soups”,consisting of unorga-nized sets of polygons.Also,in general these models are not“watertight”meshes,i.e.they do not enclose a volume. 
By contrast,for volume models retrieval methods depend-ing on a properly defined volume can be applied.2.3.Measuring similarityIn order to measure how similar two objects are,it is nec-essary to compute distances between pairs of descriptors us-ing a dissimilarity measure.Although the term similarity is often used,dissimilarity corresponds to the notion of dis-tance:small distances means small dissimilarity,and large similarity.A dissimilarity measure can be formalized by a func-tion defined on pairs of descriptors indicating the degree of their resemblance.Formally speaking,a dissimilarity measure d on a set S is a non-negative valued function d:S×S→R+∪{0}.Function d may have some of the following properties:i.Identity:For all x∈S,d(x,x)=0.ii.Positivity:For all x=y in S,d(x,y)>0.iii.Symmetry:For all x,y∈S,d(x,y)=d(y,x).iv.Triangle inequality:For all x,y,z∈S,d(x,z)≤d(x,y)+d(y,z).v.Transformation invariance:For a chosen transforma-tion group G,for all x,y∈S,g∈G,d(g(x),g(y))= d(x,y).The identity property says that a shape is completely similar to itself,while the positivity property claims that dif-ferent shapes are never completely similar.This property is very strong for a high-level shape descriptor,and is often not satisfied.However,this is not a severe drawback,if the loss of uniqueness depends on negligible details.Symmetry is not always wanted.Indeed,human percep-tion does not alwaysfind that shape x is equally similar to shape y,as y is to x.In particular,a variant x of prototype y,is often found more similar to y then vice versa[81].Dissimilarity measures for partial matching,giving a small distance d(x,y)if a part of x matches a part of y, do not obey the triangle inequality.Transformation invariance has to be satisfied,if the com-parison and the extraction process of shape descriptors have to be independent of the place,orientation and scale of the object in its Cartesian coordinate system.If we want that a dissimilarity measure is not affected by any transforma-tion on x,then we may use as alternative formulation for (v):Transformation invariance:For a chosen transforma-tion group G,for all x,y∈S,g∈G,d(g(x),y)=d(x,y).When all the properties(i)-(iv)hold,the dissimilarity measure is called a metric.Other combinations are possi-ble:a pseudo-metric is a dissimilarity measure that obeys (i),(iii)and(iv)while a semi-metric obeys only(i),(ii)and(iii).If a dissimilarity measure is a pseudo-metric,the tri-angle inequality can be applied to make retrieval more effi-cient[7,83].2.4.EfficiencyFor large shape collections,it is inefficient to sequen-tially match all objects in the database with the query object. 
Because retrieval should be fast,efficient indexing search structures are needed to support efficient retrieval.Since for query by example the shape descriptor is computed online, it is reasonable to require that the shape descriptor compu-tation is fast enough for interactive querying.2.5.Discriminative powerA shape descriptor should capture properties that dis-criminate objects well.However,the judgement of the sim-ilarity of the shapes of two3D objects is somewhat sub-jective,depending on the user preference or the application at hand.E.g.for solid modeling applications often topol-ogy properties such as the numbers of holes in a model are more important than minor differences in shapes.On the contrary,if a user searches for models looking visually sim-ilar the existence of a small hole in the model,may be of no importance to the user.2.6.Partial matchingIn contrast to global shape matching,partial matching finds a shape of which a part is similar to a part of another shape.Partial matching can be applied if3D shape mod-els are not complete,e.g.for objects obtained by laser scan-ning from one or two directions only.Another application is the search for“3D scenes”containing an instance of the query object.Also,this feature can potentially give the user flexibility towards the matching problem,if parts of inter-est of an object can be selected or weighted by the user. 2.7.RobustnessIt is often desirable that a shape descriptor is insensitive to noise and small extra features,and robust against arbi-trary topological degeneracies,e.g.if it is obtained by laser scanning.Also,if a model is given in multiple levels-of-detail,representations of different levels should not differ significantly from the original model.2.8.Pose normalizationIn the absence of prior knowledge,3D models have ar-bitrary scale,orientation and position in the3D space.Be-cause not all dissimilarity measures are invariant under ro-tation and translation,it may be necessary to place the3D models into a canonical coordinate system.This should be the same for a translated,rotated or scaled copy of the model.A natural choice is tofirst translate the center to the ori-gin.For volume models it is natural to translate the cen-ter of mass to the origin.But for meshes this is in gen-eral not possible,because they have not to enclose a vol-ume.For meshes it is an alternative to translate the cen-ter of mass of all the faces to the origin.For example the Principal Component Analysis(PCA)method computes for each model the principal axes of inertia e1,e2and e3 and their eigenvaluesλ1,λ2andλ3,and make the nec-essary conditions to get right-handed coordinate systems. These principal axes define an orthogonal coordinate sys-tem(e1,e2,e3),withλ1≥λ2≥λ3.Next,the polyhe-dral model is rotated around the origin such that the co-ordinate system(e x,e y,e z)coincides with the coordinatesystem(e1,e2,e3).The PCA algorithm for pose estimation is fairly simple and efficient.However,if the eigenvalues are equal,prin-cipal axes may switch,without affecting the eigenvalues. Similar eigenvalues may imply an almost symmetrical mass distribution around an axis(e.g.nearly cylindrical shapes) or around the center of mass(e.g.nearly spherical shapes). Fig.2illustrates the problem.3.Shape matching methodsIn this section we discuss3D shape matching methods. 
We divide shape matching methods in three broad cate-gories:(1)feature based methods,(2)graph based meth-ods and(3)other methods.Fig.3illustrates a more detailed categorization of shape matching methods.Note,that the classes of these methods are not completely disjoined.For instance,a graph-based shape descriptor,in some way,de-scribes also the global feature distribution.By this point of view the taxonomy should be a graph.3.1.Feature based methodsIn the context of3D shape matching,features denote ge-ometric and topological properties of3D shapes.So3D shapes can be discriminated by measuring and comparing their features.Feature based methods can be divided into four categories according to the type of shape features used: (1)global features,(2)global feature distributions,(3)spa-tial maps,and(4)local features.Feature based methods from thefirst three categories represent features of a shape using a single descriptor consisting of a d-dimensional vec-tor of values,where the dimension d isfixed for all shapes.The value of d can easily be a few hundred.The descriptor of a shape is a point in a high dimensional space,and two shapes are considered to be similar if they are close in this space.Retrieving the k best matches for a3D query model is equivalent to solving the k nearest neighbors -ing the Euclidean distance,matching feature descriptors can be done efficiently in practice by searching in multiple1D spaces to solve the approximate k nearest neighbor prob-lem as shown by Indyk and Motwani[36].In contrast with the feature based methods from thefirst three categories,lo-cal feature based methods describe for a number of surface points the3D shape around the point.For this purpose,for each surface point a descriptor is used instead of a single de-scriptor.3.1.1.Global feature based similarityGlobal features characterize the global shape of a3D model. 
Examples of these features are the statistical moments of the boundary or the volume of the model,volume-to-surface ra-tio,or the Fourier transform of the volume or the boundary of the shape.Zhang and Chen[88]describe methods to com-pute global features such as volume,area,statistical mo-ments,and Fourier transform coefficients efficiently.Paquet et al.[67]apply bounding boxes,cords-based, moments-based and wavelets-based descriptors for3D shape matching.Corney et al.[21]introduce convex-hull based indices like hull crumpliness(the ratio of the object surface area and the surface area of its convex hull),hull packing(the percentage of the convex hull volume not occupied by the object),and hull compactness(the ratio of the cubed sur-face area of the hull and the squared volume of the convex hull).Kazhdan et al.[42]describe a reflective symmetry de-scriptor as a2D function associating a measure of reflec-tive symmetry to every plane(specified by2parameters) through the model’s centroid.Every function value provides a measure of global shape,where peaks correspond to the planes near reflective symmetry,and valleys correspond to the planes of near anti-symmetry.Their experimental results show that the combination of the reflective symmetry de-scriptor with existing methods provides better results.Since only global features are used to characterize the overall shape of the objects,these methods are not very dis-criminative about object details,but their implementation is straightforward.Therefore,these methods can be used as an activefilter,after which more detailed comparisons can be made,or they can be used in combination with other meth-ods to improve results.Global feature methods are able to support user feed-back as illustrated by the following research.Zhang and Chen[89]applied features such as volume-surface ratio, moment invariants and Fourier transform coefficients for 3D shape retrieval.They improve the retrieval performance by an active learning phase in which a human annotator as-signs attributes such as airplane,car,body,and so on to a number of sample models.Elad et al.[28]use a moments-based classifier and a weighted Euclidean distance measure. Their method supports iterative and interactive database searching where the user can improve the weights of the distance measure by marking relevant search results.3.1.2.Global feature distribution based similarityThe concept of global feature based similarity has been re-fined recently by comparing distributions of global features instead of the global features directly.Osada et al.[66]introduce and compare shape distribu-tions,which measure properties based on distance,angle, area and volume measurements between random surface points.They evaluate the similarity between the objects us-ing a pseudo-metric that measures distances between distri-butions.In their experiments the D2shape distribution mea-suring distances between random surface points is most ef-fective.Ohbuchi et al.[64]investigate shape histograms that are discretely parameterized along the principal axes of inertia of the model.The shape descriptor consists of three shape histograms:(1)the moment of inertia about the axis,(2) the average distance from the surface to the axis,and(3) the variance of the distance from the surface to the axis. 
Their experiments show that the axis-parameterized shape features work only well for shapes having some form of ro-tational symmetry.Ip et al.[37]investigate the application of shape distri-butions in the context of CAD and solid modeling.They re-fined Osada’s D2shape distribution function by classifying2random points as1)IN distances if the line segment con-necting the points lies complete inside the model,2)OUT distances if the line segment connecting the points lies com-plete outside the model,3)MIXED distances if the line seg-ment connecting the points lies passes both inside and out-side the model.Their dissimilarity measure is a weighted distance measure comparing D2,IN,OUT and MIXED dis-tributions.Since their method requires that a line segment can be classified as lying inside or outside the model it is required that the model defines a volume properly.There-fore it can be applied to volume models,but not to polyg-onal soups.Recently,Ip et al.[38]extend this approach with a technique to automatically categorize a large model database,given a categorization on a number of training ex-amples from the database.Ohbuchi et al.[63],investigate another extension of the D2shape distribution function,called the Absolute Angle-Distance histogram,parameterized by a parameter denot-ing the distance between two random points and by a pa-rameter denoting the angle between the surfaces on which two random points are located.The latter parameter is ac-tually computed as an inner product of the surface normal vectors.In their evaluation experiment this shape distribu-tion function outperformed the D2distribution function at about1.5times higher computational costs.Ohbuchi et al.[65]improved this method further by a multi-resolution ap-proach computing a number of alpha-shapes at different scales,and computing for each alpha-shape their Absolute Angle-Distance descriptor.Their experimental results show that this approach outperforms the Angle-Distance descrip-tor at the cost of high processing time needed to compute the alpha-shapes.Shape distributions distinguish models in broad cate-gories very well:aircraft,boats,people,animals,etc.How-ever,they perform often poorly when having to discrimi-nate between shapes that have similar gross shape proper-ties but vastly different detailed shape properties.3.1.3.Spatial map based similaritySpatial maps are representations that capture the spatial lo-cation of an object.The map entries correspond to physi-cal locations or sections of the object,and are arranged in a manner that preserves the relative positions of the features in an object.Spatial maps are in general not invariant to ro-tations,except for specially designed maps.Therefore,typ-ically a pose normalization is donefirst.Ankerst et al.[5]use shape histograms as a means of an-alyzing the similarity of3D molecular surfaces.The his-tograms are not built from volume elements but from uni-formly distributed surface points taken from the molecular surfaces.The shape histograms are defined on concentric shells and sectors around a model’s centroid and compare shapes using a quadratic form distance measure to compare the histograms taking into account the distances between the shape histogram bins.Vrani´c et al.[85]describe a surface by associating to each ray from the origin,the value equal to the distance to the last point of intersection of the model with the ray and compute spherical harmonics for this spherical extent func-tion.Spherical harmonics form a Fourier basis on a sphere much like the 
familiar sine and cosine do on a line or a cir-cle.Their method requires pose normalization to provide rotational invariance.Also,Yu et al.[86]propose a descrip-tor similar to a spherical extent function and a descriptor counting the number of intersections of a ray from the ori-gin with the model.In both cases the dissimilarity between two shapes is computed by the Euclidean distance of the Fourier transforms of the descriptors of the shapes.Their method requires pose normalization to provide rotational in-variance.Kazhdan et al.[43]present a general approach based on spherical harmonics to transform rotation dependent shape descriptors into rotation independent ones.Their method is applicable to a shape descriptor which is defined as either a collection of spherical functions or as a function on a voxel grid.In the latter case a collection of spherical functions is obtained from the function on the voxel grid by restricting the grid to concentric spheres.From the collection of spher-ical functions they compute a rotation invariant descriptor by(1)decomposing the function into its spherical harmon-ics,(2)summing the harmonics within each frequency,and computing the L2-norm for each frequency component.The resulting shape descriptor is a2D histogram indexed by ra-dius and frequency,which is invariant to rotations about the center of the mass.This approach offers an alternative for pose normalization,because their method obtains rotation invariant shape descriptors.Their experimental results show indeed that in general the performance of the obtained ro-tation independent shape descriptors is better than the cor-responding normalized descriptors.Their experiments in-clude the ray-based spherical harmonic descriptor proposed by Vrani´c et al.[85].Finally,note that their approach gen-eralizes the method to compute voxel-based spherical har-monics shape descriptor,described by Funkhouser et al.[30],which is defined as a binary function on the voxel grid, where the value at each voxel is given by the negatively ex-ponentiated Euclidean Distance Transform of the surface of a3D model.Novotni and Klein[61]present a method to compute 3D Zernike descriptors from voxelized models as natural extensions of spherical harmonics based descriptors.3D Zernike descriptors capture object coherence in the radial direction as well as in the direction along a sphere.Both 3D Zernike descriptors and spherical harmonics based de-scriptors achieve rotation invariance.However,by sampling the space only in radial direction the latter descriptors donot capture object coherence in the radial direction,as illus-trated byfig.4.The limited experiments comparing spherical harmonics and3D Zernike moments performed by Novotni and Klein show similar results for a class of planes,but better results for the3D Zernike descriptor for a class of chairs.Vrani´c[84]expects that voxelization is not a good idea, because manyfine details are lost in the voxel grid.There-fore,he compares his ray-based spherical harmonic method [85]and a variation of it using functions defined on concen-tric shells with the voxel-based spherical harmonics shape descriptor proposed by Funkhouser et al.[30].Also,Vrani´c et al.[85]accomplish pose normalization using the so-called continuous PCA algorithm.In the paper it is claimed that the continuous PCA is better as the conventional PCA and better as the weighted PCA,which takes into account the differing sizes of the triangles of a mesh.In contrast with Kazhdan’s experiments[43]the experiments by Vrani´c 
show that for ray-based spherical harmonics using the con-tinuous PCA without voxelization is better than using rota-tion invariant shape descriptors obtained using voxelization. Perhaps,these results are opposite to Kazhdan results,be-cause of the use of different methods to compute the PCA or the use of different databases or both.Kriegel et al.[46,47]investigate similarity for voxelized models.They obtain a spatial map by partitioning a voxel grid into disjoint cells which correspond to the histograms bins.They investigate three different spatial features asso-ciated with the grid cells:(1)volume features recording the fraction of voxels from the volume in each cell,(2) solid-angle features measuring the convexity of the volume boundary in each cell,(3)eigenvalue features estimating the eigenvalues obtained by the PCA applied to the voxels of the model in each cell[47],and a fourth method,using in-stead of grid cells,a moreflexible partition of the voxels by cover sequence features,which approximate the model by unions and differences of cuboids,each containing a number of voxels[46].Their experimental results show that the eigenvalue method and the cover sequence method out-perform the volume and solid-angle feature method.Their method requires pose normalization to provide rotational in-variance.Instead of representing a cover sequence with a single feature vector,Kriegel et al.[46]represent a cover sequence by a set of feature vectors.This approach allows an efficient comparison of two cover sequences,by compar-ing the two sets of feature vectors using a minimal match-ing distance.The spatial map based approaches show good retrieval results.But a drawback of these methods is that partial matching is not supported,because they do not encode the relation between the features and parts of an object.Fur-ther,these methods provide no feedback to the user about why shapes match.3.1.4.Local feature based similarityLocal feature based methods provide various approaches to take into account the surface shape in the neighbourhood of points on the boundary of the shape.Shum et al.[74]use a spherical coordinate system to map the surface curvature of3D objects to the unit sphere. 
By searching over a spherical rotation space a distance be-tween two curvature distributions is computed and used as a measure for the similarity of two objects.Unfortunately, the method is limited to objects which contain no holes, i.e.have genus zero.Zaharia and Prˆe teux[87]describe the 3D Shape Spectrum Descriptor,which is defined as the histogram of shape index values,calculated over an en-tire mesh.The shape index,first introduced by Koenderink [44],is defined as a function of the two principal curvatures on continuous surfaces.They present a method to compute these shape indices for meshes,byfitting a quadric surface through the centroids of the faces of a mesh.Unfortunately, their method requires a non-trivial preprocessing phase for meshes that are not topologically correct or not orientable.Chua and Jarvis[18]compute point signatures that accu-mulate surface information along a3D curve in the neigh-bourhood of a point.Johnson and Herbert[41]apply spin images that are2D histograms of the surface locations around a point.They apply spin images to recognize models in a cluttered3D scene.Due to the complexity of their rep-resentation[18,41]these methods are very difficult to ap-ply to3D shape matching.Also,it is not clear how to define a dissimilarity function that satisfies the triangle inequality.K¨o rtgen et al.[45]apply3D shape contexts for3D shape retrieval and matching.3D shape contexts are semi-local descriptions of object shape centered at points on the sur-face of the object,and are a natural extension of2D shape contexts introduced by Belongie et al.[9]for recognition in2D images.The shape context of a point p,is defined as a coarse histogram of the relative coordinates of the re-maining surface points.The bins of the histogram are de-。
Augmentation methods for segmentation masks
Several methods are commonly used to augment segmentation masks:
1. Rotation: rotate the image by some angle; this helps the model recognise objects in different orientations.
2. Colour jitter: randomly vary exposure, saturation, and hue to produce images under different lighting and colour conditions; this improves the model's robustness to lighting and colour changes.
3. Random occlusion: occlude small regions of the image; this helps the model recognise partially occluded objects.
4. Greyscale conversion: convert the image to greyscale to remove colour information, so the model focuses on shape and structure.
5. Flipping: flip the image horizontally or vertically; this improves the model's handling of symmetry.
6. Scaling: rescale the image; this helps the model recognise objects at different scales.
7. Cropping: crop different regions of the image for training; this helps the model recognise objects at different positions and aspect ratios.
8. Combined rotation and flipping: apply both operations together to further increase the diversity and complexity of the images.
These methods can be selected according to the task at hand, and several of them can be combined for joint augmentation, as sketched below.
Note that when using these methods the labels must stay consistent: the mask must be updated to match the augmented image (for geometric transforms, by applying the identical transform to the mask).
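A minimal OpenCV/NumPy sketch applying the same random flip and rotation to an image and its mask so the labels stay aligned; the probability and angle range are illustrative assumptions.

```python
import cv2
import numpy as np

def augment_pair(image, mask, rng, max_angle=30):
    """Apply one random horizontal flip and one random rotation to an image
    and its segmentation mask, using identical parameters for both so the
    labels stay consistent."""
    if rng.random() < 0.5:                       # random horizontal flip
        image, mask = cv2.flip(image, 1), cv2.flip(mask, 1)

    angle = rng.uniform(-max_angle, max_angle)   # random rotation
    h, w = mask.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)  # keep labels discrete
    return image, mask

# rng = np.random.default_rng(0)
# aug_img, aug_mask = augment_pair(img, msk, rng)
```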
Processing and application of dense image-matching point clouds from UAV oblique imagery
Contents
4.4 Filtering experiments and quality analysis
4.4.1 Filtering experiments on the test plots
4.4.2 Quality analysis of the experimental results
4.2.2 Main algorithms for gross-error removal from point clouds
4.2.3 Experiments and analysis
4.3 Overview of point cloud filtering algorithms
4.3.3 Mathematical morphology
4.3.4 Progressive TIN densification
4.3.5 Cloth simulation filtering
5.3 Analysis of results
Application of maximum-margin sparse representation discriminant projection to face recognition
Author: Zhang Qiang. Source: Baokan Huicui (Part II), 2017, No. 11. Abstract: Starting from the preservation of local geometric structure and automatic parameter selection, and introducing the idea of sparse subspace learning, a feature extraction method called maximum-margin sparse representation discriminant projection (MSRDM) is proposed.
Face recognition experiments confirm the validity of the theoretical analysis and the algorithm design.
Keywords: sparse learning; discriminant projection; non-negative matrix factorisation; feature extraction; face recognition. 1 Introduction. In recent years many methods integrating sparse representation theory, compressed sensing, and subspace learning have been proposed and successfully applied in practice [1-2].
This paper proposes a novel discriminative feature extraction method called maximum-margin sparse representation discriminant projection (MSRDM).
MSRDM encodes local geometric structure and class label information simultaneously and is a supervised feature extraction method with automatic parameter selection.
To overcome the matrix singularity problem, MSRDM adopts a maximum-margin objective criterion, which avoids the singularity problem and keeps the implementation simple and efficient.
Experimental results on face databases verify the effectiveness of the algorithm.
2 Maximum-margin sparse representation discriminant projection (MSRDM). To preserve the local geometry and the class label information of the data simultaneously, a novel discriminative feature extraction method based on the maximum-margin criterion is derived from within-class sparse representation and between-class mean sparse reconstruction; we call it maximum-margin sparse representation discriminant projection (MSRDM).
Let the training sample images and their corresponding class labels be given, where N is the number of samples, c the number of classes, and X the matrix formed by all samples.
Consider a linear transformation that maps the D-dimensional image space to a d-dimensional feature space (d < D), as in Eq. (1), where the matrix in Eq. (1) is the transformation matrix.
To preserve the within-class neighbourhood geometry, and in analogy with sparse representation, the first objective of MSRDM is to seek, for each sample, its sparse representation over the samples of the same class, Eq. (2). Let the sparse coefficient vectors of the training samples be obtained from a sparsity-inducing norm-minimisation problem, with the coefficients corresponding to samples of other classes set to zero; the weighted adjacency matrix is then defined as in Eq. (3), where the absolute values of the coefficients express the importance of each sample in the sparse representation.
For any pair of samples, the entries are chosen so that W is a symmetric matrix.
The within-class local-geometry-preserving objective is therefore defined as in Eq. (4). The mean vectors of all classes are collected in a matrix whose number of columns equals the total number of classes.
To preserve the between-class neighbourhood geometry, and analogously to the within-class sparse representation, the second objective of MSRDM is to seek the sparse representation of each class mean over the matrix of class means, Eq. (5). The between-class margin-maximisation objective is defined as in Eq. (6). With classification as the goal, the desired low-dimensional mapping should bring samples of the same class closer together while pushing the class means of different classes further apart.
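The closed forms of Eqs. (1)-(6) are not recoverable from this excerpt, but the final goal ("same class closer, different class means further apart") is the classical maximum-margin criterion. The sketch below shows that generic criterion, with plain between/within-class scatter matrices standing in for the sparse-representation-weighted versions the paper actually uses; it is an illustration, not the paper's algorithm.

```python
import numpy as np

def max_margin_projection(X, y, d):
    """Return a D x d projection maximising tr(V^T (Sb - Sw) V).

    X: (N, D) samples, y: (N,) integer class labels, d: output dimension.
    Sb/Sw are ordinary scatter matrices here; MSRDM would build them from
    sparse-representation weights instead."""
    mu = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Maximum-margin criterion: no matrix inversion, so no singularity issue.
    eigvals, eigvecs = np.linalg.eigh(Sb - Sw)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]   # top-d eigenvectors
```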
Common uses of FACIAL_LANDMARKS_IDXS
FACIAL_LANDMARKS_IDXS is an ordered dictionary that stores the index ranges of facial landmark groups.
These landmarks are commonly used for face recognition and expression analysis.
In face recognition, detecting the positions of the facial landmarks yields important information such as the pose, expression, and shape of the face.
When working with the dlib library, FACIAL_LANDMARKS_IDXS can be used to extract the facial landmark groups.
The 68-point model surrounds each facial part with a set of points; for example, points 37 to 42 represent the right eye, and if these points are drawn on the image they enclose the right-eye region.
Changes in the distances between these points can be used to detect changes in the face, such as whether the eyes blink.
In practice, the face detector and the facial landmark detector must first be trained.
The detectors can then be used to find faces in an image or video frame and extract their landmarks.
These landmarks can be used for tasks such as face recognition, expression analysis, and pose estimation, as sketched below.
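A minimal sketch using dlib together with imutils, which is where FACIAL_LANDMARKS_IDXS actually lives (imutils.face_utils). The shape-predictor model path and the image filename are assumptions, and the eye-aspect-ratio test at the end is one common way to use the eye distances mentioned above (small values suggest a blink).

```python
import cv2
import dlib
import numpy as np
from imutils import face_utils

detector = dlib.get_frontal_face_detector()
# Assumes the standard 68-point model file has been downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_aspect_ratio(eye):
    """Ratio of vertical to horizontal eye extent; drops sharply during a blink."""
    a = np.linalg.norm(eye[1] - eye[5])
    b = np.linalg.norm(eye[2] - eye[4])
    c = np.linalg.norm(eye[0] - eye[3])
    return (a + b) / (2.0 * c)

image = cv2.imread("face.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for rect in detector(gray, 1):
    shape = face_utils.shape_to_np(predictor(gray, rect))    # (68, 2) points
    start, end = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
    right_eye = shape[start:end]
    print("right-eye EAR:", eye_aspect_ratio(right_eye))
```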
The extracted landmarks can also be used for face recognition and verification.
By comparing the landmarks of two faces, one can decide whether they match and thereby verify identity.
In short, FACIAL_LANDMARKS_IDXS is an ordered dictionary for facial landmark extraction that can be used in face recognition, expression analysis, pose estimation, and similar tasks.
Research on occluded face recognition based on LTS-optimised image-trimmed LRC
Shao Dan. [Abstract] In real-world face recognition, facial occlusion caused by disguises (such as scarves, sunglasses, and hair) or by other objects severely degrades the recognition rate. To address this, an image-trimming linear regression classification algorithm based on the least trimmed squares (LTS) estimator is proposed. First, a robust estimator detects and trims the contaminated pixels in the query and training samples; then linear regression classification is used to trim the images; finally, LTS is used to compute normalised reconstruction errors, and the face is assigned to the class with the smallest reconstruction error. Experiments on the public AR face database, the Extended Yale B database, and an outdoor face database verify the effectiveness and reliability of the proposed algorithm: compared with several other regression classification algorithms it achieves a higher recognition rate while greatly reducing the total training time. [Journal] Science Technology and Engineering. [Year (Volume), Issue] 2014, 14(18). [Pages] 6 pages (pp. 99-104). [Keywords] face recognition; image trimming; facial occlusion; least trimmed squares; linear regression classification; reconstruction error. [Author] Shao Dan. [Affiliation] Department of Public Security Technology, Railway Police College, Zhengzhou 450000. [Language] Chinese. [CLC] TP391.41. As a very widely used technology, face recognition [1] has attracted the research interest of many scholars.
Over the past few decades, many methods for building reliable face recognition systems have been proposed [2].
Current frameworks already handle face images captured under controlled conditions well; however, recognising faces occluded by disguises (such as scarves, sunglasses, and hair) or other objects is both common and difficult in real-world systems [3], so solving this kind of unconstrained face recognition problem is particularly important [4,5].
When a face is occluded, popular holistic algorithms such as Eigenfaces [6] and Fisherfaces [7] can no longer be used, because the extracted features become too noisy [8].
Local-feature methods have therefore been proposed: the face image is divided into several local blocks, each block is processed independently [9], and the final decision is based on combining the classification results of all blocks.
For example, reference [10] applies sparse representation coding (SRC) to each block and uses majority voting for the final decision.
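A minimal NumPy sketch of plain linear regression classification (LRC), the base classifier the paper builds on: each class's training images form a regression basis, and the query is assigned to the class with the smallest reconstruction error. The LTS-based trimming of contaminated pixels described in the abstract is not included here; data layout and names are assumptions.

```python
import numpy as np

def lrc_predict(query, class_bases):
    """Linear regression classification.

    query: (D,) vectorised face image.
    class_bases: dict mapping class label -> (D, n_c) matrix whose columns
                 are that class's vectorised training images.
    Returns the label whose class subspace reconstructs the query best.
    """
    best_label, best_err = None, np.inf
    for label, X in class_bases.items():
        beta, *_ = np.linalg.lstsq(X, query, rcond=None)   # class-specific regression
        err = np.linalg.norm(query - X @ beta)             # reconstruction error
        if err < best_err:
            best_label, best_err = label, err
    return best_label, best_err
```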
A non-local means denoising method based on superpixel segmentation
Yang Zhou; Chen Li; Jia Jian
[Journal] Application Research of Computers
[Year (Volume), Issue] 2018, 35(5)
[Abstract] This work studies a shortcoming of the non-local means (NLM) denoising algorithm, which selects similar patches with a sliding window and therefore performs poorly in richly textured regions, and proposes an NLM denoising algorithm based on superpixel segmentation. The method takes full account of the influence of patch similarity on noise removal in NLM. Exploiting the fact that, within a superpixel, neighbouring pixels and texture edges are all fairly similar, it optimises the strategy for selecting similar windows in textured regions on the basis of the superpixel blocks, increasing the similarity between candidate patches and the central patch. This raises the denoising quality of NLM while keeping edges and textures from being blurred. Experiments on several classic natural images show that the method removes noise effectively and preserves more texture information than the traditional non-local means method.
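The superpixel-guided variant proposed in the paper is not reproduced here, but the classical NLM baseline it improves on is available in OpenCV; the filter strength and window sizes below are illustrative.

```python
import cv2

noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)

# Classical non-local means: every patch is compared against patches drawn
# from a fixed search window around it (the behaviour the paper refines by
# restricting candidates with superpixel segmentation).
# Arguments: src, dst, filter strength h, patch size, search window size.
denoised = cv2.fastNlMeansDenoising(noisy, None, 10, 7, 21)
cv2.imwrite("denoised.png", denoised)
```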
[Pages] 5 pages (pp. 1573-1577)
[Authors] Yang Zhou; Chen Li; Jia Jian
[Affiliations] School of Information Science and Technology, Northwest University, Xi'an 710127; School of Mathematics, Northwest University, Xi'an 710127
[Language] Chinese
[CLC] TP391.41
Masking in ENVI
How to mask an image with a vector layer in ENVI: converting vectors to ROIs and using the mask tool are both common image-processing operations, and here the two are chained together to show how to mask an image using the extent defined by a vector layer, which is very useful in practice. When classifying an image we sometimes find that the background gets classified along with the content of interest, as in the example image.
In that case we can use a mask to remove the influence of the background.
The steps are as follows. 1. First, open the remote sensing image to be masked.
2. On the opened image choose Overlay -> Region of Interest to define regions of interest.
3. Prepare the vector layer and the image to be masked. ENVI cannot convert line vectors into polygon masks: line vectors can only be converted into line ROIs, whereas polygon vectors can be converted directly into polygon ROIs.
Since we want to mask a region of the image, a polygon vector file is required here.
Make sure the image and the vector overlay correctly; if they do not, the image and the vector must be co-registered first.
4. Convert the vector data to ROIs: (1) open the vector data with Vector -> Open Vector; (2) in the Vector window choose File -> Export Active Layer to ROIs; (3) select the file the ROIs should be associated with; (4) choose to generate one ROI per record; (5) right-click in the main image window, choose ROI Tools, and use Save ROIs to save the converted ROIs; (6) for line ROIs, either convert them to polygon ROIs before importing into ENVI, or convert them with a custom program.
5. Build a mask from the ROIs: (1) from the ENVI menu choose Basic Tools -> Masking -> Build Mask; (2) select the image to be masked; (3) in the mask definition window choose Options -> Import ROIs and select the polygon ROIs created above.
Click Apply, specify where the mask file should be saved, and the mask file is generated.
6. Apply the mask: under Basic Tools choose Masking -> Apply Mask; first select the file to be masked, then click Select Mask Band in the file selection dialog and choose the mask file generated above.
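Outside of ENVI, the "apply mask" step amounts to replacing pixels outside the mask with a fill value. A minimal NumPy sketch follows, assuming the image and mask have already been exported as co-registered arrays (for example via GDAL); names and shapes are illustrative.

```python
import numpy as np

def apply_mask(image, mask, fill_value=0):
    """Keep pixels where mask > 0 and replace everything else with fill_value.
    `image` may be (rows, cols) or (bands, rows, cols); `mask` is (rows, cols)."""
    mask = mask > 0
    if image.ndim == 3:                       # broadcast over the band axis
        mask = mask[np.newaxis, :, :]
    return np.where(mask, image, fill_value)

# masked = apply_mask(image_array, mask_array)
```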
Facets of Distributed Display EnvironmentsChia Shen Kathy RyallMitsubishi Electric Research Labs (MERL) 201 Broadway, Cambridge MA 02139{shen, ryall}@Katherine EverittComputer Science Department University of Washington everitt@INTRODUCTIONDistributed Display Environments come in diversely varied flavors and thus can arise and be applicable in different forms and spaces. In an interactive environment with multiple display surfaces, input interaction and output visualization are both equally important and challenging, and both can be characterized as multi-channeled and multi-sourced. In this position paper, we will discuss the workshop goals focusing on new interaction research in multi-surface DDEs. By multi-surface, we mean that a DDE in which walls, tables, tablet PCs, desktop displays and PDAs can be used in concert. The potential of DDEs goes beyond single-user multi-monitor desktop applications. Multiple physical information surfaces have long been used in certain industries that may offer us insights into both the fundamental HCI issues and the future for interaction and information visualization of DDE.We will first briefly discuss two application domains that have had long tradition and extensive experience in using multiple display and interaction surfaces in day-to-day work and operation. Note that what is presented here is not meant to be an exhaustive examination of the field, rather it is intended as an introduction to raise our awareness of this rich pool of knowledge and users. We will then list some of our recent attempts at piecing together the puzzles of multi-device multi-display DDE, and discuss issues related to the goals of this workshop.MULTI-SURFACE DISTRIBUTED DISPLAY APPLICATIONS Control RoomsIn industrial control rooms, as well as utility service centers, multiple surfaces have long been used to situate the user, to inform the operator and to provide interactivity. Figures 1 & 2 depict a western design (e.g., US and European) and an eastern design (e.g., Japanese) of plant control rooms, respectively. Most control room designs utilize each type and orientation of display surface in a well-understood and surprisingly similar manner, as shown in the labels in Figure 2. The “Mimic Panels” reflect what the actual real world information and control look like out in the field (e.g., in the plants); the “Meters and Computer Monitors” provide filtered information, as well as measurements. The “Bench Board Panels” provide users the controls and interaction affordances, while tabletops are general interaction spaces. The multiple display surfaces together provide an environment where informationvisualization and human-computer interactions are supported in concert.Figure 1 A US and European designed control room.Figure 2 A Japanese designed control room.Figure 3 A new generation of control rooms.Figure 3 is Barco’s new control room concept (/controlrooms/), designed for Stadtwerke Düsseldorf AG public utility company. All the surfaces are still there for display and interaction, similar to the conventional control rooms. 
However, display and interaction technologies have changed, most noticeably are the presence of multiple computer displays, and the replacement of physical knobs with virtual screen displayed controls, and for this workshop, the most relevant.Meters,Computer monitorsBench board panelsCopyright is held by the authors/owner(s).Urban Design and Engineering PracticeAs another application example, Figures 4 illustrates the current practice in urban design and engineering from Parsons Brinckerhoff Inc. (), one of the oldest and largest engineering firms for buildings, transportation, and railroads. PLACE 3S design process is used to support decision making. Physical maps, menus (in the right image of Figure 4) and digital data on mobile laptops are used in concert on tables and walls to allow citizens and engineers work side-by-side. This practice is now also being transformed into a multi-surface distributed display environment.Figure 4 Engineers evaluate and design (left). Citizens visualize and understand (right). (Photo credit: ParsonsBrinckerhoff Inc.) NEW MULTI-SURFACE INTERACTION RESEARCHThe above two example applications illustrate a few important requirements for a new generation of distributed display environment. For example, some of the interactive surfaces, such as tables, seem to be best supported by direct-touch surfaces [1] and gestural input [4]. Others, such as laptops and desktops, will have keyboards, mice and stylus as interaction input devices. Interactions on electronic walls/whiteboards will depend on the application at hand, the work style and the informational relationship between the wall and other displays. This indicates that such multi-surface distributed display environments will embody multiple interaction tools. New interaction techniques will be needed for users to fluidly interact across the interactive display surfaces while transition amongst different interaction tools.Orientation of DisplaysIn our previous research on UbiTable [2, 3], we have examined some of the design issues regarding distributed display environments with a horizontal tabletop display and multiple walk-up laptops as shown in Figure 5. UbiTable provides interaction techniques for (1) virtual association between mobile laptops and the table, (2) data movement, sharing and visualization among the displays, and (3) a division of private space, personal space and shared public space.Interaction in the UbiTable environment is supported by multiple input styles. The tabletop is a multi-user direct-touch DiamondTouch [1]. Thus users interact with their finger touch, while interactions on the laptop are supported by conventional keyboard and mouse. The different affordances, and the switching back-n-forth between a laptop and a tabletop revealed interesting research issues regarding (1) surface usage scenarios (what type of display surface is more appropriate for certain tasks), (2) visual and audio feedback problems, (3) design dilemma with respect to position of tools and menus, and (4) workspace arrangement.Figure 5 UbiTable.CONCLUSIONOur past research has revealed challenging research issues in the design of interaction techniques for multi-user collaborative DDEs involving interactive tabletops and laptop displays. In this workshop, we hope to share our experience and ideas on interaction and evaluation of DDEs, as well as charter out the broader research space of DDE.REFERENCES1. Dietz, P. and Leigh, D. DiamondTouch: A Multi-User Touch Technology. Proc. of UIST 2001, 219-226.2. 
Everitt, K.; Forlines, C.; Ryall, K.; Shen, C., "Observations of a Shared Tabletop User Study", ACM Conference on Computer Supported Cooperative Work (Interactive Posters), CSCW 2004, November 2004.3. Shen, C., Everitt, K.M.; Ryall, K., “UbiTable: Impromptu Face-to-Face Collaboration on Horizontal Interactive Surfaces”, UbiComp 2003. LNCS 2864. 281-288. 4. Wu, M., & Balakrishnan, R., Multi-finger and whole hand gestural interaction techniques for multi-user tabletop displays. ACM UIST 2003,193-20.。