Robust feature matching in 2.3
Robust forensic matching of confiscated horns to individual poached African rhinoceros
Cindy Harper 1,2,*, Anette Ludwig 1, Amy Clarke 1, Kagiso Makgopela 1, Andrey Yurchenko 2, Alan Guthrie 1, Pavel Dobrynin 2, Gaik Tamazian 2, Richard Emslie 3, Marile van Heerden 4, Markus Hofmeyr 1,5, Roderick Potter 6, Johannes Roets 7, Piet Beytell 8, Moses Otiende 9, Linus Kariuki 9, Raoul du Toit 10, Natasha Anderson 10, Joseph Okori 11, Alexey Antonik 2, Klaus-Peter Koepfli 2,12, Peter Thompson 1, and Stephen J. O'Brien 2,13

Black and white rhinoceros (Diceros bicornis and Ceratotherium simum) are iconic African species that are classified by the International Union for the Conservation of Nature (IUCN) as Critically Endangered and Near Threatened, respectively [1]. At the end of the 19th century, Southern white rhinoceros (Ceratotherium simum simum) numbers had declined to fewer than 50 animals in the Hluhluwe-iMfolozi region of the KwaZulu-Natal (KZN) province of South Africa, mainly due to uncontrolled hunting [2,3]. Efforts by the Natal Parks Board facilitated an increase in population to over 20,000 in 2015 through aggressive conservation management [2]. Black rhinoceros (Diceros bicornis) populations declined from several hundred thousand in the early 19th century to ~65,000 in 1970 and to ~2,400 by 1995 [1], with subsequent genetic reduction, also due to hunting, land clearances and later poaching [4]. In South Africa, rhinoceros poaching incidents have increased from 13 in 2007 to 1,215 in 2014 [1]. This has occurred despite strict trade bans on rhinoceros products and strict enforcement in recent years. The significant increase in illegal killing of African rhinoceros and the involvement of transnational organised criminal syndicates in horn trafficking has met with increased law enforcement efforts to apprehend, successfully prosecute and sentence traffickers and poachers with the aim of reducing poaching. In Africa, wildlife rangers, law enforcement officials and genome scientists have instituted a DNA-based individual identification protocol using composite short tandem repeat (STR) genotyping of rhinoceros horns, rhinoceros tissue products and crime scene carcasses to link confiscated evidence to specific poaching incidents for support of criminal investigations. This method has been used extensively and documented in the RhODIS® (Rhinoceros DNA Index System) database of confiscated horn and living rhinoceros genotypes (http://rhodis.co.za), eRhODIS™ applications to collect field and forensic sample data, and RhODIS® biospecimen collection kits. These are made available to trained, RhODIS®-certified officials to fulfill chain-of-custody requirements, providing a pipeline to connect illegally trafficked rhinoceros products to individual poached rhinoceros victims. This study applies a panel of 23 STR (microsatellite) loci to genotype 3,968 individual rhinoceros DNA specimens from distinct white and black rhinoceros populations [5].
We assessed the population genetic structure of these (Supplemental Information) and applied them to forensic match analyses of specific DNA profiles in more than 120 criminal cases to date. Four methods were applied to support forensic matching of confiscated tissue evidence to crime scenes: first, further characterization and optimization of STR panels informative for rhinoceros species; second, development and application of the RhODIS® database containing genotypes and demographic information of more than 20,000 rhinoceros acquisitions; third, analysis of the population genetic structure of white and black rhinoceros species, subspecies and structured populations; and fourth, computation of match probability statistics for specific profiles derived from white and black rhinoceroses. We established a reference database consisting of 3,085 genotypes of white rhinoceros (C. simum) and 883 black rhinoceros (D. bicornis) sampled since 2010, which provide the basis for robust match probability statistics. The effects of historic range contractions or expansions, migration, translocation and population fragmentation caused by poaching and habitat reduction on rhinoceros population genetic structure have been reported but are limited [6–8]. Southern white rhinoceros are traditionally considered panmictic and comprising a single subspecies, C. s. simum, as a result of the severe founder effect in the late 19th century [2]. Black rhinoceros are generally subdivided into three modern subspecies, D. b. bicornis, D. b. michaeli and D. b. minor [8]. Population structure of white and black rhinoceros based upon three different analyses (Supplemental Information) affirmed the partition of white versus black rhinoceros species plus the separation of the three black rhinoceros subspecies. The STRUCTURE algorithm revealed a fine-grain distinctiveness between black rhinoceros D. b. minor populations from Zimbabwe and KwaZulu-Natal (KZN), South Africa, and also indicates that black rhinoceros in the Kruger National Park (KNP) are comprised of a mix of KZN and Zimbabwe rhinoceros as expected, since KNP black rhinoceros founders originated from these two locales [9]. For forensic match applications, we calculated allele frequencies for all polymorphic unlinked loci for white (3,085 genotypes) and black rhinoceros (883 genotypes). These estimates and other STR locus statistics were calculated for each rhinoceros species. Population differentiation (F_ST) between white and black rhinoceros subspecies supports the recognition of the Southern white rhinoceros subspecies (C. s. simum), and three black rhinoceros subspecies, D. b. bicornis, D. b. michaeli and D. b. minor, with significant partitioning of the Zimbabwe versus KZN D. b. minor populations in the present African rhinoceros populations. Over 5,800 rhinoceros crime cases have been submitted to RhODIS® since 2010 and in excess of 120 case reports relating carcass material to evidence items (horn, tissue, blood stains and other confiscated materials) have been provided to investigators. Table 1 summarizes nine of these rhinoceros crime cases which have been concluded in court. These are illustrative of where DNA matches were made and the use of this evidence for prosecution, conviction and sentencing of perpetrators of rhinoceros crimes. Table 1 includes case sample details, species identified and match probability calculated using the RhODIS® reference database.
The successful prosecution, conviction and sentencing of suspects in South Africa and other countries affirm the utility of the RhODIS® approach in criminal prosecutions of the perpetrators of illegal rhinoceros trade and provide an international legal precedent for prosecution of rhinoceros crimes using a robust forensic matching of confiscated evidence items to specific wildlife crime scenes.

SUPPLEMENTAL INFORMATION
Supplemental Information including experimental procedures, one figure and one table can be found with this article online at https://doi.org/10.1016/j.cub.2017.11.005.

Table 1. Summary of nine prosecuted cases of rhinoceros crime. Samples were successfully matched using composite STR genotyping with cumulative match probability calculated using a conservative theta (θ) of 0.1. Details of case with matching evidence items, location of poaching incident, species and subspecies identified, cumulative match probability, status of the case (conviction date: sentence) and the nationalities of the accused are provided for six South African cases and single cases from Kenya, Namibia and Singapore. (KNP – Kruger National Park, SA – South Africa, ORTIA – OR Tambo International Airport, HiP – Hluhluwe-iMfolozi Park, OPC – Ol Pejeta Conservancy, ENP – Etosha National Park). a and b refer to match probability calculations for specific white and black rhinoceros summarised in Supplemental Information.

REFERENCES
1. Emslie, R.H., Milliken, T., Talukdar, B., Ellis, S., Adcock, K., and Knight, M.H. (2016). African and Asian Rhinoceroses – Status, Conservation and Trade. In A Report from the IUCN Species Survival Commission (IUCN SSC) African and Asian Rhino Specialist Groups and TRAFFIC to the CITES Secretariat pursuant to Resolution Conf. 9.14 (Rev. CoP15).
2. Player, I. (2013). The White Rhino Saga (Johannesburg: Jonathan Ball Publishers).
3. Walker, C., and Walker, A. (2012). The Rhino Keepers (Johannesburg: Jacana Media).
4. Milliken, T., and Shaw, J. (2012). The South Africa – Viet Nam Rhino Horn Trade Nexus: A deadly combination of institutional lapses, corrupt wildlife industry professionals and Asian crime syndicates. TRAFFIC, Johannesburg, South Africa.
5. Harper, C.K., Vermeulen, G.J., Clarke, A.B., De Wet, J.I., and Guthrie, A.J. (2013). Extraction of nuclear DNA from rhinoceros horn and characterization of DNA profiling systems for white (Ceratotherium simum) and black (Diceros bicornis) rhinoceros. Forensic Sci. Int. Genet. 7, 428–433.
6. Anderson-Lederer, R.M., Linklater, W.L., and Ritchie, P.A. (2012). Limited mitochondrial DNA variation within South Africa's black rhino (Diceros bicornis minor) population and implications for management. Afr. J. Ecol. 50, 404–413.
7. Kotzé, A., Dalton, D.L., Du Toit, R., Anderson, N., and Moodley, Y. (2014). Genetic structure of the black rhinoceros (Diceros bicornis) in south-eastern Africa. Conserv. Genet. 15, 1479–1489.
8. Moodley, Y., Russo, I.R.M., Dalton, D.L., Kotzé, A., Muya, S., Haubensak, P., Bálint, B., Munimanda, G.K., Deimel, C., Setzer, A., et al. (2017). Extinctions, genetic erosion and conservation options for the black rhinoceros (Diceros bicornis). Sci. Rep. 7, 41417.
9. Hall-Martin, A. (1988). Conservation of the black rhino: the strategy of the National Parks Board of South Africa. Quagga 1, 12–17.

1 Faculty of Veterinary Science, University of Pretoria, Onderstepoort 0110, South Africa. 2 Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia 199004.
3 IUCN SSC African Rhino Specialist Group, Hilton 3245, South Africa. 4 National Prosecuting Authority, Silverton 0184, South Africa. 5 Veterinary Wildlife Services, South African National Parks, Skukuza, South Africa. 6 Ezemvelo KZN Wildlife, Queen Elizabeth Park, Pietermaritzburg 3201, South Africa. 7 South African Police Service, Stock Theft and Endangered Species Unit, Pretoria 0001, South Africa. 8 Ministry of Environment and Tourism, Windhoek, Namibia. 9 Kenya Wildlife Service, Nairobi 00100, Kenya. 10 Lowveld Rhino Trust, Harare, Zimbabwe. 11 WWF: African Rhino Programme, Cape Town, South Africa. 12 Smithsonian Conservation Biology Institute, 3001 Connecticut Ave NW, Washington, DC 20008, USA. 13 Guy Harvey Oceanographic Center, Nova Southeastern University, 8000 North Ocean Drive, Ft Lauderdale, FL 33004, USA.
*E-mail: ******************.za
Image Stitching Algorithms and Implementation (Part I)
Keywords: image stitching, image registration, image fusion, panorama
Abstract: Image mosaicking is a technique that spatially aligns a sequence of mutually overlapping images and, after resampling and compositing, produces a single complete, high-resolution, wide-field-of-view image that contains the information of the whole sequence.
Image stitching has wide application value in photogrammetry, computer vision, remote sensing image processing, medical image analysis, computer graphics, and other fields.
In general, the image stitching process consists of three steps: image acquisition, image registration, and image compositing, of which image registration is the foundation of the whole pipeline.
This work studies two image registration algorithms: feature-based registration and transform-domain registration.
Building on feature-based registration, a robust registration algorithm based on feature points is proposed.
First, the Harris corner detection algorithm is improved, effectively raising the speed and accuracy of feature point extraction.
Then, using the normalized cross-correlation (NCC) similarity measure, initial feature point pairs are extracted by bidirectional maximum-correlation matching, and false pairs are rejected with random sample consensus (RANSAC), achieving accurate matching of the feature point pairs.
Finally, the correct feature point matches are used to register the images.
The proposed algorithm is highly adaptable and can still register images accurately in difficult automatic-matching scenarios such as repetitive texture and large rotation angles.
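The Harris + NCC + RANSAC pipeline summarized above can be sketched with common OpenCV and NumPy primitives. The code below is only an illustration of that general approach, not the authors' implementation; the corner count, patch size, NCC threshold, and RANSAC reprojection error are assumed values chosen for clarity.

```python
import cv2
import numpy as np

def match_and_register(img1_gray, img2_gray, patch=11, ncc_thresh=0.8):
    """Harris corners -> bidirectional NCC matching -> RANSAC homography."""
    # 1. Harris-style corner extraction.
    pts1 = cv2.goodFeaturesToTrack(img1_gray, 500, 0.01, 10, useHarrisDetector=True).reshape(-1, 2)
    pts2 = cv2.goodFeaturesToTrack(img2_gray, 500, 0.01, 10, useHarrisDetector=True).reshape(-1, 2)

    half = patch // 2

    def window(img, p):
        x, y = int(round(p[0])), int(round(p[1]))
        w = img[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
        return w if w.shape == (patch, patch) else None

    def ncc(a, b):
        a = a - a.mean(); b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
        return float((a * b).sum() / denom)

    def best_match(p, img_a, img_b, candidates):
        """Index and score of the candidate with the highest NCC to the patch at p."""
        wa = window(img_a, p)
        if wa is None:
            return None, -1.0
        best, best_s = None, -1.0
        for j, q in enumerate(candidates):
            wb = window(img_b, q)
            if wb is not None:
                s = ncc(wa, wb)
                if s > best_s:
                    best, best_s = j, s
        return best, best_s

    # 2. Bidirectional maximum-correlation matching: keep a pair only when each
    #    point is the other's best NCC match and the score clears the threshold.
    pairs = []
    for i, p in enumerate(pts1):
        j, s = best_match(p, img1_gray, img2_gray, pts2)
        if j is None or s < ncc_thresh:
            continue
        k, _ = best_match(pts2[j], img2_gray, img1_gray, pts1)
        if k == i:
            pairs.append((p, pts2[j]))

    # 3. RANSAC rejects the remaining false pairs and estimates the homography.
    src = np.float32([p for p, _ in pairs])
    dst = np.float32([q for _, q in pairs])
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    keep = inliers.ravel() == 1
    return H, src[keep], dst[keep]
```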
Abstract: Image mosaicking is a technique that spatially matches a series of mutually overlapping images and finally builds a seamless, high-quality image with high resolution and a wide field of view. Image mosaicking has wide applications in the fields of photogrammetry, computer vision, remote sensing image processing, medical image analysis, computer graphics, and so on.
Introduction to Face Recognition (IntroFaceDetectRecognition)
Knowledge-based Methods: Summary
Pros:
Easy to come up with simple rules
Based on the coded rules, facial features in an input image are extracted first, and face candidates are identified
Work well for face localization in uncluttered backgrounds
Template-Based Methods: Summary
Pros:
Simple
Cons:
Templates need to be initialized near the face images
Difficult to enumerate templates for different poses (similar to knowledge-based methods)
Knowledge-Based Methods
Top-down approach: represent a face using a set of human-coded rules. Example:
The center part of the face has uniform intensity values
The difference between the average intensity values of the center part and the upper part is significant
A face often appears with two eyes that are symmetric to each other, a nose and a mouth
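As a toy illustration of how such coded rules can be applied to a candidate window, the check below encodes the two intensity rules directly. The region proportions and thresholds are arbitrary assumptions, not values given in these notes.

```python
import numpy as np

def is_face_candidate(window, center_margin=0.25, diff_thresh=15.0, std_thresh=30.0):
    """Toy rule-based check in the spirit of top-down, knowledge-based methods:
    rule 1 - the centre of a face window should have fairly uniform intensity;
    rule 2 - its mean should differ noticeably from the upper part of the window."""
    h, w = window.shape
    cy0, cy1 = int(h * center_margin), int(h * (1 - center_margin))
    cx0, cx1 = int(w * center_margin), int(w * (1 - center_margin))
    center = window[cy0:cy1, cx0:cx1].astype(np.float64)
    upper = window[: h // 4, :].astype(np.float64)

    uniform_center = center.std() < std_thresh
    distinct_upper = abs(center.mean() - upper.mean()) > diff_thresh
    return bool(uniform_center and distinct_upper)
```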
Image Alignment and Stitching: A Tutorial
Richard Szeliski. Last updated December 10, 2006. Technical Report MSR-TR-2004-92.
This tutorial reviews image alignment and image stitching algorithms. Image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of panoramic mosaics. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce seamless mosaics. It closes with a discussion of open research problems in the area.
Research on Precise Mosaicking Methods for Multi-Strip Side-Scan Sonar Images
高飞 1, 王晓 2,*, 杨敬华 2, 张博宇 2, 周海波 2, 陈佳星 2
(1. Qingdao Geotechnical Investigation and Surveying Research Institute, Qingdao 266032, Shandong, China; 2. School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang 222005, Jiangsu, China)
Introduction: As land resources become increasingly depleted, countries around the world have shifted the focus of resource development and utilization to the ocean, and China has formulated a "maritime power" strategy to this end.
Survey activities targeting the ocean are increasing day by day, and understanding the seabed surface and shallow sub-surface structure is of great significance for marine scientific research and marine engineering construction [1].
Side-scan sonar (SSS), as a technique for rapidly acquiring high-resolution seabed images, is widely used in marine engineering construction, seabed resource development, and target detection and recognition [2-7].
Because side-scan sonar is towed and is affected by tides, waves, and other marine environmental factors, the towfish position derived from the survey vessel's GNSS coordinates is biased and the geocoded image positions are inaccurate; as a result, mosaics built from geocoded strips suffer from misalignment of targets across adjacent strips. Commonly used foreign data-processing packages such as Isis (Triton), Sips (Caris), and SonarWeb (Chesapeake) all provide geographic mosaicking functions [8,9], but they cannot achieve a fine, unified "single map" of the seabed topography.
To solve this problem, Zhao et al. [10,11] proposed a SURF-feature mosaicking method for SSS images of adjacent strips; to address the time cost of SURF feature matching, they adopted an image segmentation strategy based on track coordinates, which improved computational efficiency to some extent. Wang Aixue et al. [12] considered the local distortion of targets and presented an elastic matching strategy that achieves absolute shape preservation of seabed targets visible in both strips. Guo Jun [13], Ni Xianfeng [14], Hou Xue [15], Wu Meng [16], Pan Jianping [17], and others have also studied SURF-feature mosaicking of SSS images. However, for all of the above methods, the time consumed by SURF feature matching cannot meet the real-time processing requirements of large-area image mosaicking.
Moreover, in traditional feature-based mosaicking one image is held fixed and the remaining strips are rotated and transformed, so after the farthest strip has been stitched its geographic position is lost; and if some strip cannot be matched by features, geocoding and feature matching cannot be combined to achieve a fine "single map" of the seabed topography.
Illumination-robust pattern matching using distorted color histograms
Pattern Matching Using Distorted Color Histograms
Georg Thimm and Juergen Luettin
Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP), C.P. 592, CH-1920 Martigny, Switzerland. Email: Thimm, Luettin@idiap.ch

The appearance of objects is often subject to illumination variations, which impedes the recognition of these objects. Appearance changes caused by illumination variation can roughly be classified into partial shadowing (including shadows caused by the object itself), occlusion, specular reflection, total shadowing, and global illumination changes (i.e. the average grey-level of the scene, respectively of the whole object, is changing). Occlusion, partial shadowing, and specular reflection cause the most difficulties in the context of computer vision. Highly sophisticated approaches use, for example, an approximate 3-dimensional representation of the scene and the position of the light source(s) [6][8], a combined PCA model of shape and intensity on landmark points [4], respectively active shapes [5][7], a model for the object under multiple illumination situations (Eigenfaces [13]), a direct model of the illumination variation and specular reflections [2], or 3-dimensional models and neural networks to estimate the position of the light sources [3]. Global illumination changes and total shadowing, however, are not well modeled by these approaches. To the knowledge of the authors, global illumination changes were only considered in combination with other image analysis methods, for example in the context of change detection (see [12] for more references), or optical flow [10]. We assume that it is inefficient to model directly and only the appearance change of an object. A better approach will model the illumination that is global to the object separately from other appearance changes.
The example in Figure 1 illustrates a possible situation considered in this publication. Suppose that faces have to be recognized in an outdoor scene, neglecting appearance changes due to orientation and partial shadowing. Depending on the daytime, the illumination of the scene changes and at the same time the relative brightness of objects.
Consequently, a normalized grey-level histogram of the scene is also subject to alterations. For example, the grey-level histograms of the scene in the late and early evening are different. The artificial light sources correspond as before to the brightest parts of the histogram, but the middle of the histogram is "emptied" at the cost of the darker parts. At the same time, the face appears to be darker as compared to the image taken in the early evening. Looking at the distribution of grey-levels, this means that the face contributes to the score of lower intensities. This is symbolized by the dashed boxes in Figure 1. In other words, the grey-level histogram is non-linearly projected onto another, or distorted.

[Figure 1: illumination changes distort the grey-level histogram. Panels: normalized grey-level histograms (% per grey-level) of the scene in the early evening and the late evening.]

In order to compensate for the distortion of the grey-level histograms, the histogram of the template has to be modified prior to a matching with some image location. The function mapping the histogram of the template to the histogram of the image models the illumination variation. Therefore the shape of this function is constrained according to three assumptions:
1. As the image is normalized, the lowest and highest intensities in the grey-level histogram will be mapped onto themselves.
2. Contrasts diminish or augment comparably when the global illumination changes. Therefore, modifications of grey-levels must vary smoothly within neighboring intensity values.
3. The relative brightness of arbitrary objects must remain unchanged: if a certain spot in the image is brighter than another spot, it will remain brighter or, in the limit, assume the same intensity.

A simple pattern matching algorithm using such a histogram mapping function can be formulated in the following way: let $t$ be a feature vector of grey values representing the template and $v(p)$ a vector extracted at location $p$ of some image, to be compared with $t$. Then the most likely position $\hat{p}$ for the object represented by the template can be defined as

$$\hat{p} = \arg\min_{p} \big\| f_a(t) - v(p) \big\|^2 ,$$

where the function $f_a$ distorts the color or grey-level histogram. $f_a$ is parameterized by $a$, corresponding to the deviation of the illumination as compared to the illumination of the template. Since $a$ is usually unknown, it has to be included in the minimization of the error:

$$\hat{p} = \arg\min_{p} \min_{a} \big\| f_a(t) - v(p) \big\|^2 .$$

As discussed earlier, $f_a$ has to fulfill some conditions in order to avoid a too flexible mapping, which would result in low scores for illicit image locations.
1. The invariability of the lowest and highest intensity can be directly formulated as a condition on $f_a$. Supposing that the images from which $t$ and $v$ were extracted are normalized, and black is coded as $g_{\min}$ and white as $g_{\max}$ (usually $g_{\min} = 0$ and $g_{\max} = 255$), then $f_a$ has to fulfill $f_a(g_{\min}) = g_{\min}$ and $f_a(g_{\max}) = g_{\max}$.
2. The similarity constraint on the variation of close grey-levels can be fulfilled by demanding that $f_a$ possess a smooth first derivative.
3. That grey-levels are not interchangeable implies that the mapping function is non-decreasing for the range of valid grey-levels. As $f_a$ possesses a first derivative: $f_a'(g) \ge 0$ for $g_{\min} \le g \le g_{\max}$.

Considering these constraints, $f_a$ was chosen to be a second order polynomial. It follows from the constraints above that $f_a$ has the form

$$f_a(g) = g + a\,(g - g_{\min})(g_{\max} - g),$$

with $a$ a free variable restricted to the interval $|a| \le 1/(g_{\max} - g_{\min})$. This function has the property that, depending on the sign of $a$, either the contrasts in the brighter parts, respectively the darker parts, of the image are augmented. At the same time, the contrasts in the darker parts, respectively the brighter parts, are lowered. The form of $f_a$ has the advantage that an explicit solution for the minimizing $a$ exists.
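Under the notation reconstructed above, the distorted-histogram matching idea can be prototyped in a few lines. The sketch below uses a grid search over the distortion parameter a rather than the closed-form solution mentioned in the text, and the grey-level range and grid resolution are assumptions; it is an illustration, not the authors' implementation.

```python
import numpy as np

G_MIN, G_MAX = 0.0, 255.0

def distort(t, a):
    """Second-order polynomial grey-level mapping f_a(g) = g + a (g - g_min)(g_max - g);
    it fixes g_min and g_max and stays monotone for |a| <= 1/(g_max - g_min)."""
    return t + a * (t - G_MIN) * (G_MAX - t)

def match_error(t, v, n_a=21):
    """min over a of ||f_a(t) - v||^2, with a searched on a grid over its admissible interval."""
    a_max = 1.0 / (G_MAX - G_MIN)
    best = np.inf
    for a in np.linspace(-a_max, a_max, n_a):
        best = min(best, float(np.sum((distort(t, a) - v) ** 2)))
    return best

def best_position(template, image):
    """Slide the template over the image and return the location minimising the
    illumination-compensated matching error."""
    th, tw = template.shape
    t = template.astype(np.float64).ravel()
    best_err, best_pos = np.inf, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            v = image[y:y + th, x:x + tw].astype(np.float64).ravel()
            err = match_error(t, v)
            if err < best_err:
                best_err, best_pos = err, (x, y)
    return best_pos, best_err
```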
The proposed method was tested on 4,000 X-ray images of the vocal tract of talking persons [9]. In these tests, fillings in the upper and lower teeth, as well as the tips of the front teeth, were tracked. The results are compared in experiments with a standard pattern matching algorithm (which is equivalent to fixing $a = 0$) and the Eigenface method using different numbers of eigenvectors. The results showed that the proposed method performed better than each of the other approaches.
In the way the distorted pattern matching approach is described above, it cannot sensibly be applied when appearance changes are evoked by global illumination changes and other incidents. This deficiency may in principle be overcome by combining it with other techniques, for example PCA models of the grey-level appearance [5]. For this method of object modelization, the most likely position can be redefined as

$$\hat{p} = \arg\min_{p}\,\min_{a,\,c} \big\| f_a(\Phi c + \mu) - v(p) \big\|^2 ,$$

where an eigenvector matrix $\Phi$, a mean appearance $\mu$, and an appearance vector $c$ describe the appearance of an object under constant global illumination. First experiments were performed with automatically generated images of faces under various illuminations [1] (implemented by H. Rowley [11]). In these experiments the combined method showed some improvement for the tasks "locate the mouth" and "locate the left eye" over pattern matching with and without distorted grey-level histogram and genuine grey-level PCA modelization. Full details and statistics on the performance of the algorithms will be included in the full paper.

Conclusion
We proposed a simple to use, but still efficient, method for the modelization of a global illumination using distorted grey-level histograms. A quantitative comparison in experiments with standard pattern matching and PCA modelization of the grey-level appearance shows that the proposed algorithm outperforms both. Besides this, pattern matching with distorted histograms has a complexity close to standard pattern matching. This gives a further advantage over the Eigenface algorithm, which has a higher computational complexity and is somewhat more difficult to use and implement.

References
1. P.N. Belhumeur and D.J. Kriegman. What is the set of images of an object under all possible lighting conditions? In Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 270–277, 1996.
2. Michael J. Black, David J. Fleet, and Yaser Yacoob. A framework for modeling appearance change in image sequences. In Proc. of the Sixth International Conference on Computer Vision (ICCV98). IEEE, January 1998.
3. R. Brunelli. Estimation of pose and illuminant direction for face processing. Image and Vision Computing, 10(15):741–748, 1997.
4. T.F. Cootes and C.J. Taylor. Modelling object appearance using the grey-level surface. In Proceedings of the 5th British Machine Vision Conference, pages 479–488, York, 1994.
5. T.F. Cootes and C.J. Taylor. Using grey-level models to improve active shape model search. In Proceedings – International Conference on Pattern Recognition, volume 1, pages 63–67. IEEE, Piscataway, NJ, USA, 1994.
6. A.S. Georghiades, D.J. Kriegman, and P.N. Belhumeur. Illumination cones for recognition under variable lighting: Faces. In IEEE Conf. on Computer Vision and Pattern Recognition, 1998.
7. A. Lanitis, C.J. Taylor, and T.F. Cootes. Recognising human faces using shape and grey-level information. In Proceedings of the 3rd International Conference on Automation, Robotics and Computer Vision, volume 2, pages 1153–1157, Singapore, 1994.
8. N. Mukawa. Estimation of shape, reflection coefficients, and illuminant direction from image sequences. In International Conference on Computer
Vision (ICCV90), pages 507–512, 1990.
9. K.G. Munhall, E. Vatikiotis-Bateson, and Y. Tokhura. X-ray film database for speech research. Journal of the Acoustical Society of America, 98(2):1222–1224, 1995.
10. S. Negahdaripour and C.H. Yu. A generalized brightness change model for computing optical flow. In International Conference on Computer Vision (ICCV93), pages 2–11, 1993.
11. Henry Rowley. WWW Home page, URL: /afs//user/har/Web/home.html, 1998.
12. K.D. Skifstad and R.C. Jain. Illumination independent change detection for real world image sequences. Computer Vision Graphics and Image Processing (CVGIP), 46(3):387–399, June 1989.
13. M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–96, 1991.
Vision-Based Ground Target Tracking for a Rotor UAV
The SIFT algorithm is used to recognize the ground target in this paper. The SIFT algorithm, first proposed by David G. Lowe in 1999 [5] and improved in 2004 [6], is currently an active area of feature matching; its effectiveness is invariant to image rotation, scale zoom and brightness transformations, and it also maintains a certain degree of stability under perspective and affine transformations. SIFT feature points are scale-invariant local points of an image, with the characteristics of good uniqueness, rich information content, large numbers, high speed, scalability, and so on. A. SIFT Algorithm: the SIFT algorithm consists of four parts. The process of SIFT feature construction is shown in Fig. 1.
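A minimal illustration of SIFT-based target recognition in a camera frame, using OpenCV's built-in SIFT, is sketched below. It is a generic sketch of the approach rather than the system described in this paper; the ratio-test threshold, minimum match count, and RANSAC reprojection error are assumed values.

```python
import cv2
import numpy as np

def locate_target(target_gray, frame_gray, ratio=0.75, min_matches=10):
    """Detect SIFT keypoints in the target template and the current frame,
    match them with Lowe's ratio test, and estimate the target's homography."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(target_gray, None)
    kp_f, des_f = sift.detectAndCompute(frame_gray, None)
    if des_t is None or des_f is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_t, des_f, k=2)
    good = [m[0] for m in knn if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < min_matches:
        return None

    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # maps template coordinates into the frame
```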
I. INTRODUCTION
UAV is one of the best platforms to perform dull, dirty or dangerous (3D) tasks [1]. UAVs can be used in various applications where it is impossible for humans to intervene, which greatly expands the application space of visual tracking. Research on the technology of vision-based ground target tracking for UAVs has been of great concern among cybernetics and robotics experts, and has become one of the most active research directions in UAV applications. Currently, researchers from America, Britain, France and Sweden are on the cutting edge in this field [2]. Typical visual tracking platforms for UAVs include Scan Eagle, GTMax, RQ-11, RQ-16, DragonFly, etc. Because of many advantages, such as small size, light weight, flexibility, ease of transport and low cost, rotor UAVs have broad application prospects in the fields of traffic monitoring, resource exploration, electricity patrol, forest fire prevention, aerial photography, atmospheric monitoring, etc. [3]. A vision-based ground target tracking system for a rotor UAV is a system that acquires images with a camera installed on a low-flying rotor UAV, recognizes the target in the images, estimates the motion state of the target, and finally, according to the visual information, regulates the pan-tilt-zoom (PTZ) camera automatically to keep the target at the center of the camera view. In view of the current state of international research, the study of ground target tracking systems for
Image Stitching Technology
Introduction to image stitching: image stitching is a method of combining multiple overlapping images of the same scene into a larger image, and it is of great significance in medical imaging, computer vision, satellite data, automatic recognition of military targets, and other fields.
The output of image stitching is the union of the two input images.
Image stitching means seamlessly joining two images that share a common captured region.
Such applications include dynamic monitoring at stations, pedestrian-flow monitoring in shopping malls, traffic monitoring at intersections, and so on: they present a panoramic image, move beyond the current era of monitoring walls and per-camera video displays, and ease the visual burden on operators.
Basic idea: image stitching is not simply overlaying the common regions of two images. Because the two images are captured from different angles and positions, the camera intrinsics and extrinsics differ even though a common region exists, so simple overlay stitching is unreasonable.
Therefore, image stitching requires taking one image as the reference, applying a corresponding transformation (a perspective transformation) to the other image, and then translating the transformed image so that its common region coincides with that of the reference image.
Notes:
1. Image preprocessing is meant to enhance image features; preprocessing can include grayscale conversion, denoising, distortion correction, etc.
2. Usable feature point extraction methods include SIFT, SURF, FAST, Harris, etc. SIFT is invariant to rotation and scale; SURF is an accelerated version of SIFT; both detect well. SIFT is used here first for the implementation.
3. When computing the homography matrix, be clear about the mapping direction: whether it maps from the first image's space to the second image's space, or from the second image to the first. This matters greatly during the transformation.
4. Determining which image is the left (or top) one and which is the right (or bottom) one clarifies the stitching relationship; it is recommended to decide this before computing the homography so that the mapping direction is not reversed.
Otherwise, half of the stitched image will be empty.
Five steps are usually involved. Feature extraction: detect feature points in all input images. Image registration: establish geometric correspondences between the images so that they can be transformed, compared, and analyzed in a common reference frame.
Registration methods can be roughly divided into the following classes: 1. algorithms that use image pixel values directly, e.g., correlation methods; 2. algorithms that work in the frequency domain, e.g., FFT-based methods; 3. algorithms that use low-level features, usually edges and corners, e.g., feature-based methods; 4. algorithms that use high-level features, usually overlapping image objects and feature relations, e.g., graph-theoretic methods. Image warping: warping means reprojecting one of the images and placing it on a larger canvas.
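A minimal two-image version of this pipeline can be written with OpenCV (SIFT features, a RANSAC homography, a perspective warp onto a wider canvas, and a trivial overwrite composite instead of blending). The canvas size, ratio-test threshold, and RANSAC reprojection error below are illustration values, not prescribed by the text.

```python
import cv2
import numpy as np

def stitch_pair(img_left, img_right):
    """Stitch img_right onto img_left's coordinate frame: feature extraction,
    registration (homography), warping, and a simple overwrite composite."""
    gray_l = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(gray_l, None)
    kp_r, des_r = sift.detectAndCompute(gray_r, None)

    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_r, des_l, k=2)
    good = [m[0] for m in knn if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

    src = np.float32([kp_r[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_l[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Homography mapping the right image into the left (reference) image's space;
    # note the mapping direction, as emphasised in item 3 of the notes above.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 4.0)

    h, w = img_left.shape[:2]
    canvas = cv2.warpPerspective(img_right, H, (w * 2, h))  # warp onto a wider canvas
    canvas[:h, :w] = img_left                               # overwrite with the reference image
    return canvas
```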
Speeded-Up Robust Features (SURF)
Herbert Bay a, Andreas Ess a,*, Tinne Tuytelaars b, Luc Van Gool a,b
a ETH Zurich, BIWI, Sternwartstrasse 7, CH-8092 Zurich, Switzerland
b K.U. Leuven, ESAT-PSI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Received 31 October 2006; accepted 5 September 2007. Available online 15 December 2007.

Abstract
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision. © 2007 Elsevier Inc. All rights reserved.

Keywords: Interest points; Local features; Feature description; Camera calibration; Object recognition

1. Introduction
The task of finding point correspondences between two images of the same scene or object is part of many computer vision applications. Image registration, camera calibration, object recognition, and image retrieval are just a few. The search for discrete image point correspondences can be divided into three main steps. First, 'interest points' are selected at distinctive locations in the image, such as corners, blobs, and T-junctions. The most valuable property of an interest point detector is its repeatability. The repeatability expresses the reliability of a detector for finding the same physical interest points under different viewing conditions. Next, the neighbourhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and at the same time robust to noise, detection displacements and geometric and photometric deformations. Finally, the descriptor vectors are matched between different images. The matching is based on a distance between the vectors, e.g. the Mahalanobis or Euclidean distance. The dimension of the descriptor has a direct impact on the time this takes, and less dimensions are desirable for fast interest point matching. However, lower dimensional feature vectors are in general less distinctive than their high-dimensional counterparts.
It has been our goal to develop both a detector and descriptor that, in comparison to the state-of-the-art, are fast to compute while not sacrificing performance. In order to succeed, one has to strike a balance between the above requirements like simplifying the detection scheme while keeping it accurate, and reducing the descriptor's size while keeping it sufficiently distinctive. A wide variety of detectors and descriptors have already been proposed in the literature (e.g. [21,24,27,39,25]). Also, detailed comparisons and evaluations on benchmarking datasets have been performed [28,30,31]. Our fast detector and descriptor, called SURF (Speeded-Up Robust
Features), was introduced in [4]. It is built on the insights gained from this previous work. In our experiments on these benchmarking datasets, SURF's detector and descriptor are not only faster, but the former is also more repeatable and the latter more distinctive.
We focus on scale and in-plane rotation-invariant detectors and descriptors. These seem to offer a good compromise between feature complexity and robustness to commonly occurring photometric deformations. Skew, anisotropic scaling, and perspective effects are assumed to be second order effects, that are covered to some degree by the overall robustness of the descriptor. Note that the descriptor can be extended towards affine-invariant regions using affine normalisation of the ellipse (cf. [31]), although this will have an impact on the computation time. Extending the detector, on the other hand, is less straightforward. Concerning the photometric deformations, we assume a simple linear model with a bias (offset) and contrast change (scale factor). Neither detector nor descriptor use colour information.
The article is structured as follows. In Section 2, we give a review over previous work in interest point detection and description. In Section 3, we describe the strategy applied for fast and robust interest point detection. The input image is analysed at different scales in order to guarantee invariance to scale changes. The detected interest points are provided with a rotation and scale-invariant descriptor in Section 4. Furthermore, a simple and efficient first-line indexing technique, based on the contrast of the interest point with its surrounding, is proposed. In Section 5, some of the available parameters and their effects are discussed, including the benefits of an upright version (not invariant to image rotation). We also investigate SURF's performance in two important application scenarios. First, we consider a special case of image registration, namely the problem of camera calibration for 3D reconstruction. Second, we will explore SURF's application to an object recognition experiment. Both applications highlight SURF's benefits in terms of speed and robustness as opposed to other strategies. The article is concluded in Section 6.

2. Related work
2.1. Interest point detection
The most widely used detector is probably the Harris corner detector [15], proposed back in 1988. It is based on the eigenvalues of the second moment matrix. However, Harris corners are not scale invariant. Lindeberg [21] introduced the concept of automatic scale selection. This allows to detect interest points in an image, each with their own characteristic scale. He experimented with both the determinant of the Hessian matrix as well as the Laplacian (which corresponds to the trace of the Hessian matrix) to detect blob-like structures. Mikolajczyk and Schmid [26] refined this method, creating robust and scale-invariant feature detectors with high repeatability, which they coined Harris-Laplace and Hessian-Laplace. They used a (scale-adapted) Harris measure or the determinant of the Hessian matrix to select the location, and the Laplacian to select the scale. Focusing on speed, Lowe [23] proposed to approximate the Laplacian of Gaussians (LoG) by a Difference of Gaussians (DoG) filter.
Several other scale-invariant interest point detectors have been proposed. Examples are the salient region detector, proposed by Kadir and
Brady [17], which maximises the entropy within the region, and the edge-based region detector proposed by Jurie and Schmid [16]. They seem less amenable to acceleration though. Also several affine-invariant feature detectors have been proposed that can cope with wider viewpoint changes. However, these fall outside the scope of this article.
From studying the existing detectors and from published comparisons [29,30], we can conclude that Hessian-based detectors are more stable and repeatable than their Harris-based counterparts. Moreover, using the determinant of the Hessian matrix rather than its trace (the Laplacian) seems advantageous, as it fires less on elongated, ill-localised structures. We also observed that approximations like the DoG can bring speed at a low cost in terms of lost accuracy.

2.2. Interest point description
An even larger variety of feature descriptors has been proposed, like Gaussian derivatives [11], moment invariants [32], complex features [1], steerable filters [12], phase-based local features [6], and descriptors representing the distribution of smaller-scale features within the interest point neighbourhood. The latter, introduced by Lowe [24], have been shown to outperform the others [28]. This can be explained by the fact that they capture a substantial amount of information about the spatial intensity patterns, while at the same time being robust to small deformations or localisation errors. The descriptor in [24], called SIFT for short, computes a histogram of local oriented gradients around the interest point and stores the bins in a 128D vector (8 orientation bins for each of 4×4 location bins).
Various refinements on this basic scheme have been proposed. Ke and Sukthankar [18] applied PCA on the gradient image around the detected interest point. This PCA-SIFT yields a 36D descriptor which is fast for matching, but proved to be less distinctive than SIFT in a second comparative study by Mikolajczyk and Schmid [30]; and applying PCA slows down feature computation. In the same paper [30], the authors proposed a variant of SIFT, called GLOH, which proved to be even more distinctive with the same number of dimensions. However, GLOH is computationally more expensive as it uses again PCA for data compression. The SIFT descriptor still seems the most appealing descriptor for practical uses, and hence also the most widely used nowadays. It is distinctive and relatively fast, which is crucial for on-line applications. Recently, Se et al. [37] implemented SIFT on a Field Programmable Gate Array (FPGA) and improved its speed by an order of magnitude. Meanwhile, Grabner et al. [14] also used integral images to approximate SIFT. Their detection step is based on difference-of-mean (without interpolation), their description step on integral histograms. They achieve about the same speed as we do (though the description step is constant in speed), but at the cost of reduced quality compared to SIFT. Generally, the high dimensionality of the descriptor is a drawback of SIFT at the matching step.
For on-line applications relying only on a regular PC, each one of the three steps (detection, description, matching) has to be fast. An entire body of work is available on speeding up the matching step. All of them come at the expense of getting an approximative matching. Methods include the best-bin-first proposed by Lowe [24], balltrees [35], vocabulary trees [34], locality sensitive hashing [9], or redundant bit vectors [13]. Complementary to this, we suggest the use of the Hessian matrix's trace to significantly increase the matching speed. Together with the descriptor's low dimensionality, any matching algorithm is bound to perform faster.

3. Interest point detection
Our approach for interest point detection uses a very basic Hessian matrix approximation. This lends itself to the use of integral images as made popular by Viola and Jones [41], which reduces the computation time drastically. Integral images fit in the more general framework of boxlets, as proposed by Simard et al. [38].

3.1. Integral images
In order to make the article more self-contained, we briefly discuss the concept of integral images. They allow for fast computation of box type convolution filters. The entry of an integral image $I_\Sigma(\mathbf{x})$ at a location $\mathbf{x} = (x, y)^T$ represents the sum of all pixels in the input image $I$ within a rectangular region formed by the origin and $\mathbf{x}$,

$$I_\Sigma(\mathbf{x}) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j). \qquad (1)$$

Once the integral image has been computed, it takes three additions to calculate the sum of the intensities over any upright, rectangular area (see Fig. 1). Hence, the calculation time is independent of its size. This is important in our approach, as we use big filter sizes.

3.2. Hessian matrix-based interest points
We base our detector on the Hessian matrix because of its good performance in accuracy. More precisely, we detect blob-like structures at locations where the determinant is maximum. In contrast to the Hessian-Laplace detector by Mikolajczyk and Schmid [26], we rely on the determinant of the Hessian also for the scale selection, as done by Lindeberg [21]. Given a point $\mathbf{x} = (x, y)$ in an image $I$, the Hessian matrix $\mathcal{H}(\mathbf{x}, \sigma)$ in $\mathbf{x}$ at scale $\sigma$ is defined as follows:

$$\mathcal{H}(\mathbf{x}, \sigma) = \begin{pmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{pmatrix}, \qquad (2)$$

where $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the Gaussian second order derivative $\frac{\partial^2}{\partial x^2} g(\sigma)$ with the image $I$ in point $\mathbf{x}$, and similarly for $L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$.
Gaussians are optimal for scale-space analysis [19,20], but in practice they have to be discretised and cropped (Fig. 2, left half). This leads to a loss in repeatability under image rotations around odd multiples of π/4. This weakness holds for Hessian-based detectors in general. Fig. 3 shows the repeatability rate of two detectors based on the Hessian matrix for pure image rotation. The repeatability attains a maximum around multiples of π/2. This is due to the square shape of the filter. Nevertheless, the detectors still perform well, and the slight decrease in performance does not outweigh the advantage of fast convolutions brought by the discretisation and cropping. As real filters are non-ideal in any case, and given Lowe's success with his LoG approximations, we push the approximation for the Hessian matrix even further with box filters (in the right half of Fig. 2).
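The two building blocks just described, the integral image of Eq. (1) and the constant-time box sums behind the "three additions" claim, can be verified in a few lines of NumPy. This is an illustrative sketch, not code from the paper; the one-row/one-column zero padding is an implementation convenience.

```python
import numpy as np

def integral_image(img):
    """I_sigma(x, y) = sum of img[0..y, 0..x] (Eq. (1)); a zero row/column of
    padding makes the box-sum formula below branch-free at the image border."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups and
    three additions/subtractions, independent of the rectangle size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```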
These approximate second order Gaussian derivatives and can be evaluated at a very low computational cost using integral images. The calculation time therefore is independent of the filter size. As shown in Section 5 and Fig. 3, the performance is comparable or better than with the discretised and cropped Gaussians.

[Fig. 1. Using integral images, it takes only three additions and four memory accesses to calculate the sum of intensities inside a rectangular region of any size.]
[Fig. 2. Left to right: the (discretised and cropped) Gaussian second order partial derivative in y- (L_yy) and xy-direction (L_xy), respectively; our approximation for the second order Gaussian partial derivative in y- (D_yy) and xy-direction (D_xy). The grey regions are equal to zero.]

The 9×9 box filters in Fig. 2 are approximations of a Gaussian with σ = 1.2 and represent the lowest scale (i.e. highest spatial resolution) for computing the blob response maps. We will denote them by D_xx, D_yy, and D_xy. The weights applied to the rectangular regions are kept simple for computational efficiency. This yields

$$\det(\mathcal{H}_{\mathrm{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2. \qquad (3)$$

The relative weight w of the filter responses is used to balance the expression for the Hessian's determinant. This is needed for the energy conservation between the Gaussian kernels and the approximated Gaussian kernels,

$$w = \frac{|L_{xy}(1.2)|_F \, |D_{yy}(9)|_F}{|L_{yy}(1.2)|_F \, |D_{xy}(9)|_F} = 0.912\ldots \simeq 0.9, \qquad (4)$$

where $|x|_F$ is the Frobenius norm. Notice that for theoretical correctness, the weighting changes depending on the scale. In practice, we keep this factor constant, as this did not have a significant impact on the results in our experiments. Furthermore, the filter responses are normalised with respect to their size. This guarantees a constant Frobenius norm for any filter size, an important aspect for the scale space analysis as discussed in the next section.
The approximated determinant of the Hessian represents the blob response in the image at location x. These responses are stored in a blob response map over different scales, and local maxima are detected as explained in Section 3.4.

3.3. Scale space representation
Interest points need to be found at different scales, not least because the search of correspondences often requires their comparison in images where they are seen at different scales. Scale spaces are usually implemented as an image pyramid. The images are repeatedly smoothed with a Gaussian and then sub-sampled in order to achieve a higher level of the pyramid. Lowe [24] subtracts these pyramid layers in order to get the DoG (Difference of Gaussians) images where edges and blobs can be found. Due to the use of box filters and integral images, we do not have to iteratively apply the same filter to the output of a previously filtered layer, but instead can apply box filters of any size at exactly the same speed directly on the original image and even in parallel (although the latter is not exploited here). Therefore, the scale space is analysed by up-scaling the filter size rather than iteratively reducing the image size (Fig. 4). The output of the 9×9 filter, introduced in the previous section, is considered as the initial scale layer, to which we will refer as scale s = 1.2 (approximating Gaussian derivatives with σ = 1.2). The following layers are obtained by filtering the image with gradually bigger masks, taking into account the discrete nature of integral images and the specific structure of our filters. Note that our main motivation for this type of sampling is its computational efficiency. Furthermore, as we
do not have to downsample the image, there is no aliasing. On the downside, box filters preserve high-frequency components that can get lost in zoomed-out variants of the same scene, which can limit scale-invariance. This was however not noticeable in our experiments.
The scale space is divided into octaves. An octave represents a series of filter response maps obtained by convolving the same input image with a filter of increasing size. In total, an octave encompasses a scaling factor of 2 (which implies that one needs to more than double the filter size, see below). Each octave is subdivided into a constant number of scale levels. Due to the discrete nature of integral images, the minimum scale difference between two subsequent scales depends on the length l0 of the positive or negative lobes of the partial second order derivative in the direction of derivation (x or y), which is set to a third of the filter size length. For the 9×9 filter, this length l0 is 3. For two successive levels, we must increase this size by a minimum of 2 pixels (1 pixel on every side) in order to keep the size uneven and thus ensure the presence of the central pixel. This results in a total increase of the mask size by 6 pixels (see Fig. 5). Note that for dimensions different from l0 (e.g. the width of the central band for the vertical filter in Fig. 5), rescaling the mask introduces rounding-off errors. However, since these errors are typically much smaller than l0, this is an acceptable approximation.

[Fig. 3. Top: repeatability score for image rotation of up to 180°. Hessian-based detectors have in general a lower repeatability score for angles ...]
[Fig. 4. Instead of iteratively reducing the image size (left), the use of integral images allows the up-scaling of the filter at constant cost (right).]

The construction of the scale space starts with the 9×9 filter, which calculates the blob response of the image for the smallest scale. Then, filters with sizes 15×15, 21×21, and 27×27 are applied, by which even more than a scale change of two has been achieved. But this is needed, as a 3D non-maximum suppression is applied both spatially and over the neighbouring scales. Hence, the first and last Hessian response maps in the stack cannot contain such maxima themselves, as they are used for reasons of comparison only. Therefore, after interpolation, see Section 3.4, the smallest possible scale is σ = 1.6 = 1.2 × 12/9, corresponding to a filter size of 12×12, and the highest to σ = 3.2 = 1.2 × 24/9. For more details, we refer to [2].
Similar considerations hold for the other octaves. For each new octave, the filter size increase is doubled (going from 6 to 12 to 24 to 48). At the same time, the sampling intervals for the extraction of the interest points can be doubled as well for every new octave. This reduces the computation time and the loss in accuracy is comparable to the image sub-sampling of the traditional approaches. The filter sizes for the second octave are 15, 27, 39, 51. A third octave is computed with the filter sizes 27, 51, 75, 99 and, if the original image size is still larger than the corresponding filter sizes, the scale space analysis is performed for a fourth octave, using the filter sizes 51, 99, 147, and 195. Fig. 6 gives an overview of the filter sizes for the first three octaves. Further octaves can be computed in a similar way. In typical scale-space analysis however, the number of detected interest points per octave decays very quickly, cf. Fig. 7.
The large scale changes, especially between the first filters within these octaves (from 9 to
15 is a change of 1.7), renders the sampling of scales quite crude. Therefore, we have also implemented a scale space with a finer sampling of the scales. This computes the integral image on the image up-scaled by a factor of 2, and then starts the first octave by filtering with a filter of size 15. Additional filter sizes are 21, 27, 33, and 39. Then a second octave starts, again using filters which now increase their sizes by 12 pixels, after which a third and fourth octave follow. Now the scale change between the first two filters is only 1.4 (21/15). The lowest scale for the accurate version that can be detected through quadratic interpolation is σ = (1.2 × 18/9)/2 = 1.2.
As the Frobenius norm remains constant for our filters at any size, they are already scale normalised, and no further weighting of the filter response is required; for more information on that topic, see [22].

3.4. Interest point localisation
In order to localise interest points in the image and over scales, a non-maximum suppression in a 3×3×3 neighbourhood is applied. Specifically, we use a fast variant introduced by Neubeck and Van Gool [33]. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown and Lowe [5]. Scale space interpolation is especially important in our case, as the difference in scale between the first layers of every octave is relatively large. Fig. 8 shows an example of the detected interest points using our 'Fast-Hessian' detector.

[Fig. 5. Filters D_yy (top) and D_xy (bottom) for two successive scale levels (9×9 and 15×15). The length of the dark lobe can only be increased by an even number of pixels in order to guarantee the presence of a central pixel (top).]
[Fig. 6. Graphical representation of the filter side lengths for three different octaves. The logarithmic horizontal axis represents the scales. Note that the octaves are overlapping in order to cover all possible scales seamlessly.]

4. Interest point description and matching
Our descriptor describes the distribution of the intensity content within the interest point neighbourhood, similar to the gradient information extracted by SIFT [24] and its variants. We build on the distribution of first order Haar wavelet responses in x and y direction rather than the gradient, exploit integral images for speed, and use only 64D. This reduces the time for feature computation and matching, and has proven to simultaneously increase the robustness. Furthermore, we present a new indexing step based on the sign of the Laplacian, which increases not only the robustness of the descriptor, but also the matching speed (by a factor of 2 in the best case). We refer to our detector-descriptor scheme as SURF — Speeded-Up Robust Features.
The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point. Then, we construct a square region aligned to the selected orientation and extract the SURF descriptor from it. Finally, features are matched between two images. These three steps are explained in the following.

4.1. Orientation assignment
In order to be invariant to image rotation, we identify a reproducible orientation for the interest points. For that purpose, we first calculate the Haar wavelet responses in x and y direction within a circular neighbourhood of radius 6s around the interest point, with s the scale at which the interest point was detected. The sampling step is scale dependent and chosen to be s. In keeping with the
rest, also the size of the wavelets are scale dependent and set to a side length of 4s. Therefore, we can again use integral images for fast filtering. The used filters are shown in Fig. 9. Only six operations are needed to compute the response in x or y direction at any scale.
Once the wavelet responses are calculated and weighted with a Gaussian (σ = 2s) centred at the interest point, the responses are represented as points in a space with the horizontal response strength along the abscissa and the vertical response strength along the ordinate. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of size π/3, see Fig. 10. The horizontal and vertical responses within the window are summed. The two summed responses then yield a local orientation vector. The longest such vector over all windows defines the orientation of the interest point. The size of the sliding window is a parameter which had to be chosen carefully. Small sizes fire on single dominating gradients, large sizes tend to yield maxima in vector length that are not outspoken. Both result in a misorientation of the interest point.
Note that for many applications, rotation invariance is not necessary. Experiments of using the upright version of SURF (U-SURF, for short) for object detection can be found in [3,4]. U-SURF is faster to compute and can increase distinctivity, while maintaining a robustness to rotation of about ±15°.

[Fig. 8. Detected interest points for a Sunflower field. This kind of scenes shows the nature of the features obtained using Hessian-based detectors.]
[Fig. 9. Haar wavelet filters to compute the responses in x (left) and y direction (right). The dark parts have the weight −1 and the light parts +1.]

4.2. Descriptor based on sum of Haar wavelet responses
For the extraction of the descriptor, the first step consists of constructing a square region centred around the interest point and oriented along the orientation selected in the previous section. The size of this window is 20s. Examples of such square regions are illustrated in Fig. 11. The region is split up regularly into smaller 4×4 square sub-regions. This preserves important spatial information. For each sub-region, we compute Haar wavelet responses
three distinctively different image-intensity patterns within a sub-region.One can imagine combinations of such local intensity patterns,resulting in a distinctive descriptor.SURF is,up to some point,similar in concept as SIFT,in that they both focus on the spatial distribution of gradi-ent information.Nevertheless,SURF outperforms SIFT in practically all cases,as shown in Section 5.We believe this is due to the fact that SURF integrates the gradient infor-mation within a subpatch,whereas SIFT depends on the orientations of the individual gradients.This makesSURFFig.10.Orientation assignment:a sliding orientation window of size p3detects the dominant orientation of the Gaussian weighted Haar wavelet responses at every sample pointwithin a circular neighbourhood around the interest point.Fig.11.Detail of the Graffiti scene showing thesize of the oriented descriptor window at different scales.Fig.12.To build the descriptor,an oriented quadratic grid with 4Â4square sub-regions is laid over the interest point (left).For each square,the wavelet responses are computed from 5Â5samples (for illustrative purposes,we show only 2Â2sub-divisions here).For each field,we collect the sums d x ,j d x j ;d y ,and j d y j ,computed relatively to the orientation of the grid (right).1For efficiency reasons,the Haar wavelets are calculated in the unrotated image and the responses arethen interpolated,instead of actually rotating the image.Fig.13.The descriptor entries of a sub-region represent the nature of the underlying intensity pattern.Left:In case of a homogeneous region,all values are relatively low.Middle:In presence of frequencies in x direction,the value of P j d x j is high,but all others remain low.Ifthe intensity is gradually increasing in x direction,both values P d x andP j d x j are high.352H.Bay et al./Computer Vision and Image Understanding 110(2008)346–359。
SLAM feature tracking methods
From a technical standpoint, SLAM feature tracking methods play a vital role in accurately estimating the robot's pose and mapping the environment. These methods typically rely on extracting and matching visual or geometric features across consecutive frames to establish correspondences and compute the robot's motion. Feature tracking algorithms should be robust to changes in lighting conditions, viewpoint variations, occlusions, and dynamic objects. Moreover, they should be able to handle large-scale environments and real-time processing requirements. Achieving these objectives is challenging due to the complexity and dynamic nature of real-world environments.

One popular approach to SLAM feature tracking is the use of feature descriptors, such as SIFT (Scale-Invariant Feature Transform) or ORB (Oriented FAST and Rotated BRIEF). These descriptors encode distinctive information about the features, allowing for reliable matching across frames. However, feature descriptors alone may not be sufficient in challenging scenarios with significant viewpoint changes or occlusions. To address this, researchers have proposed methods that combine feature descriptors with geometric constraints, such as epipolar geometry or 3D point cloud information. These methods leverage the geometric relationships between the features to improve tracking accuracy and robustness.

Another important aspect of SLAM feature tracking is the initialization of the tracking process. When a robot starts exploring a new environment, it needs to identify and track features from scratch. This initialization step is crucial for accurate motion estimation and subsequent mapping. Various methods have been proposed to address this challenge, including keypoint detection algorithms such as Harris corners or FAST (Features from Accelerated Segment Test), which aim to identify salient features in the scene. Once the initial set of features is obtained, the tracking process can be initialized and refined using feature matching and motion estimation techniques.

In recent years, deep learning-based approaches have also shown promise in SLAM feature tracking. Convolutional neural networks (CNNs) have been employed to learn feature representations directly from raw image data, eliminating the need for handcrafted descriptors. These learned features can be more robust to variations in lighting and viewpoint, potentially improving tracking performance. Additionally, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks have been explored for modeling temporal dependencies in feature tracking, enabling better handling of motion blur or fast camera movements.

Despite the advancements in SLAM feature tracking methods, several challenges remain. One major challenge is the trade-off between tracking accuracy and computational efficiency. SLAM systems often operate in real time, and the feature tracking component should be able to process frames at high frame rates while maintaining accurate estimates. This requires efficient feature detection, matching, and motion estimation algorithms. Another challenge is the robustness of feature tracking in dynamic environments. Moving objects or changes in the scene can disrupt feature correspondences and lead to tracking failures. Developing methods that can handle dynamic environments and recover from failures is an ongoing research topic.

In conclusion, SLAM feature tracking methods are crucial for enabling mobile robots to navigate and map their surroundings simultaneously. These methods involve extracting, matching, and tracking distinctive features in the environment to estimate the robot's motion and build a map. While feature descriptors and geometric constraints have traditionally been used, recent advancements in deep learning have opened new possibilities for improving tracking accuracy and robustness. However, challenges such as real-time processing, dynamic environments, and tracking initialization still need to be addressed. Continued research and development in SLAM feature tracking methods will contribute to the advancement of robotics and computer vision, enabling robots to operate autonomously in complex and dynamic environments.
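The detect-then-track loop discussed above can be sketched with standard OpenCV calls. The minimal example below assumes greyscale frames and uses Shi-Tomasi corners with pyramidal Lucas-Kanade tracking plus a forward-backward consistency check; window sizes and thresholds are illustrative, not values from any particular SLAM system.

```python
import cv2
import numpy as np

def detect_features(gray, max_corners=500):
    # (Re-)initialise tracking with Shi-Tomasi corners; FAST would also work.
    return cv2.goodFeaturesToTrack(gray, max_corners, qualityLevel=0.01, minDistance=7)

def track_features(prev_gray, cur_gray, prev_pts):
    # Pyramidal Lucas-Kanade tracking of existing corners between frames.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
    # Forward-backward check: track back and reject inconsistent points.
    back_pts, _, _ = cv2.calcOpticalFlowPyrLK(
        cur_gray, prev_gray, cur_pts, None, winSize=(21, 21), maxLevel=3)
    fb_err = np.linalg.norm(prev_pts - back_pts, axis=2).ravel()
    good = (status.ravel() == 1) & (fb_err < 1.0)
    return prev_pts[good], cur_pts[good]
```

The surviving correspondences would then be fed to a motion estimator (e.g. essential-matrix or PnP estimation with RANSAC) in the pose-tracking stage.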
UAV Image Mosaic Algorithm Based on Improved ORB
Software Guide, Vol. 22, No. 4, April 2023

UAV Image Mosaic Algorithm Based on Improved ORB
ZHANG Ping, SUN Lin, HE Xian-hui (College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao 266590, China)

Abstract: Aiming at the problems that traditional image stitching algorithms are slow and inefficient in UAV remote sensing image stitching and cannot meet real-time, accurate stitching requirements, an improved ORB image stitching algorithm is proposed. First, a scale pyramid is constructed and feature points are extracted with the ORB algorithm; the feature points are described with the BEBLID descriptor and coarsely matched with the nearest neighbour distance ratio (NNDR) test. Then an optimised geometric constraint built from feature-point voting further refines the matches, and the random sample consensus (RANSAC) algorithm computes a high-precision transformation matrix. Finally, an improved gradual-in/gradual-out weighted fusion algorithm blends the images. Experimental results show that the registration accuracy of the proposed algorithm reaches up to 100%, registration takes less than 0.91 s, and the information entropy of the mosaic reaches 6.8079. Compared with traditional algorithms, the proposed algorithm stitches more efficiently and obtains higher-quality mosaics while reducing stitching time, a significant improvement in performance.

Key words: image mosaic; multi-scale FAST detection; BEBLID feature; optimal geometric constraint
DOI: 10.11907/rjdk.222267  CLC number: TP391.41  Document code: A  Article ID: 1672-7800(2023)004-0156-06

0 Introduction
In recent years, UAV aerial photography has become increasingly mature and is widely applied in remote sensing monitoring [1], power line inspection [2], disaster surveying [3], military reconnaissance [4] and other fields.
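A hedged sketch of the registration stage summarised in the abstract (ORB keypoints, BEBLID description, NNDR ratio test and RANSAC homography estimation), written with OpenCV in Python. It assumes an opencv-contrib build that ships `cv2.xfeatures2d.BEBLID_create`; if that is unavailable the code falls back to ORB's own descriptor. The blending step is not shown, and parameter values are illustrative rather than the paper's.

```python
import cv2
import numpy as np

def register_pair(img1, img2, ratio=0.75):
    g1, g2 = (cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) for im in (img1, img2))
    orb = cv2.ORB_create(nfeatures=4000)          # multi-scale FAST + orientation
    kp1 = orb.detect(g1, None)
    kp2 = orb.detect(g2, None)
    try:
        desc = cv2.xfeatures2d.BEBLID_create(0.75)   # scale factor suggested for ORB keypoints
    except AttributeError:
        desc = orb                                   # fall back to ORB's binary descriptor
    kp1, d1 = desc.compute(g1, kp1)
    kp2, d2 = desc.compute(g2, kp2)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = []
    for pair in matcher.knnMatch(d1, d2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:   # NNDR test
            good.append(pair[0])

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)   # may be None if too few matches
    return H, good, inliers
```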
Matching with PROSAC - Progressive Sample Consensus
Figure 1: The Great Wall image pair with an occlusion. Given 250 tentative correspondences as input, both PROSAC and RANSAC found 57 correct correspondences (inliers). To estimate the epipolar geometry, RANSAC tested 106,534 seven-tuples of correspondences in 10.76 seconds, while PROSAC tested only 9 seven-tuples in 0.06 sec (on average, over a hundred runs). Inlier correspondences are marked by a line segment joining the corresponding points.

Standard RANSAC does not model the local matching process. It is viewed as a black box that generates N tentative correspondences, i.e. the error-prone matches established by comparing local descriptors. The set U of tentative correspondences contains an a priori unknown number I of
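The core idea of PROSAC, drawing hypotheses first from the highest-quality tentative correspondences and only progressively widening the sampling pool, can be sketched as follows. This is a simplified illustration, not the paper's exact growth function or stopping criterion; `fit_model` and `count_inliers` are placeholders for the user's model estimator (e.g. a seven-point epipolar-geometry solver) and inlier test.

```python
import random

def prosac_like(correspondences, fit_model, count_inliers,
                sample_size=7, max_iters=2000, inlier_goal=0.5):
    # `correspondences` must be sorted by descending match quality
    # (e.g. ascending descriptor distance or distance ratio).
    best_model, best_inliers = None, -1
    pool = sample_size                      # start from the top-ranked matches only
    for it in range(max_iters):
        # Progressively enlarge the sampling pool (simplified growth schedule;
        # PROSAC derives the schedule from sampling statistics).
        if pool < len(correspondences) and it % 10 == 0:
            pool += 1
        sample = random.sample(correspondences[:pool], sample_size)
        model = fit_model(sample)
        if model is None:
            continue
        inliers = count_inliers(model, correspondences)
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
            if best_inliers >= inlier_goal * len(correspondences):
                break
    return best_model, best_inliers
```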
A survey of feature matching
Feature matching is an important task in computer vision and plays a key role in applications such as image processing, object detection and image stitching. In image processing, feature matching means comparing the feature points of two images to find the correspondences between them. This article surveys feature matching methods, techniques and application areas.

The most common approach to feature matching is matching based on feature descriptors. A feature descriptor describes the region around a feature point, usually using statistics of a local image patch. Common descriptors include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and ORB (Oriented FAST and Rotated BRIEF). These descriptors are invariant to scale, rotation and illumination changes, so features can be matched robustly across different images.
In the matching stage, common algorithms include brute-force matching, matching with FLANN (the Fast Library for Approximate Nearest Neighbors) and bag-of-words matching. Brute-force matching is the simplest and most direct method, but its computational cost is high, so it suits smaller feature sets. FLANN builds data structures such as k-d trees or k-means trees to perform fast nearest-neighbour search and is suitable for large-scale matching. The bag-of-words model clusters the descriptors, represents each image as a statistical distribution over visual words, and matches images on that basis.
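As an illustration of the brute-force versus approximate nearest-neighbour trade-off discussed above, the sketch below uses OpenCV's FLANN-based matcher with randomized k-d trees and Lowe's ratio test. Parameter values are illustrative, and float descriptors such as SIFT are assumed.

```python
import cv2
import numpy as np

def flann_match(desc1, desc2, ratio=0.7):
    # FLANN with randomized k-d trees: suited to float descriptors such as SIFT.
    index_params = dict(algorithm=1, trees=5)        # 1 = FLANN_INDEX_KDTREE
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc1.astype(np.float32), desc2.astype(np.float32), k=2)
    # Lowe's ratio test rejects ambiguous nearest neighbours.
    return [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]

# Typical use (SIFT keypoints/descriptors assumed):
# sift = cv2.SIFT_create()
# kp1, d1 = sift.detectAndCompute(img1, None)
# kp2, d2 = sift.detectAndCompute(img2, None)
# good = flann_match(d1, d2)
```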
Feature matching is widely used in computer vision. The most common application is object detection and tracking: by extracting key features from images and matching them, target objects can be detected and tracked automatically. Feature matching is also widely applied in image stitching, 3D reconstruction, image retrieval and image registration.

In summary, feature matching is an important computer vision task that compares the feature points of two images to find the correspondences between them. Feature descriptors and matching algorithms are its core techniques, and its applications cover object detection and tracking, image stitching, 3D reconstruction and more. As deep learning and related techniques develop, feature matching is likely to be applied and extended in still more areas.
Matching algorithms based on similarity measures
Matching algorithms based on similarity measures play a crucial role in various fields, including information retrieval, recommendation systems, and data mining. These algorithms aim to identify similarities between items or entities based on certain features or characteristics. By using similarity measures such as Jaccard similarity, cosine similarity, or Euclidean distance, these algorithms can make informed decisions about matching items that are most relevant or similar to each other.

One key aspect of matching algorithms based on similarity measures is the choice of the appropriate similarity measure for the specific task at hand. Different similarity measures have different strengths and weaknesses, and selecting the right one can significantly impact the performance of the matching algorithm. For example, cosine similarity is often used for text similarity tasks, while Euclidean distance is commonly used for matching numerical data.
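A minimal sketch of the two measures named above; the toy inputs in the comments are only for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based similarity between two numeric vectors, in [-1, 1].
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def jaccard_similarity(set_a, set_b):
    # Overlap of two sets: |A ∩ B| / |A ∪ B|, in [0, 1].
    set_a, set_b = set(set_a), set(set_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# e.g. cosine_similarity([1, 2, 0], [2, 4, 1]) ~ 0.98
#      jaccard_similarity({"cat", "dog"}, {"dog", "fox"}) = 1/3
```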
A feature-point-based image mosaic method
ZHANG Dong, YU Chaogang (School of Urban Rail Transportation, Shanghai University of Engineering Science, Shanghai 201620, China)
Computer Systems & Applications, 2016, 25(3): 107-112
Keywords: image mosaic; SIFT feature points; image registration; RANSAC; transformation matrix

Abstract: This paper proposes a panoramic image mosaic method based on feature point matching. The method first uses the SIFT algorithm to extract feature points from each image and uses the Harris measure to optimise the extracted points. The BBF algorithm based on a k-d tree is then used to find and confirm initial matching pairs, completing the coarse matching of the feature points. According to the registration result, the robust RANSAC algorithm filters the coarse matches and the transformation matrix H between the images is computed. Finally, a gradual-in/gradual-out weighted-average fusion algorithm stitches the two images seamlessly into a complete panorama. Experimental results verify the effectiveness of the method, with good stitching quality.

With the continuous development of computer image processing, panoramas are widely used in daily life, but various practical constraints make it difficult to capture wide-angle, high-resolution panoramic images directly, so image stitching technology, which seamlessly joins several overlapping images, has emerged. Image stitching combines multiple overlapping images of the same scene into a large, wide-angle, low-distortion, high-resolution image without an obvious seam [1]. Current stitching algorithms fall into two classes: region-based algorithms, which start from grey values and compute the grey-level correlation of two images to locate the overlap region, and feature-based algorithms, which extract feature points and search for matches in the corresponding feature regions of the overlap; the latter class is more stable and more widely used [2]. In recent years many stitching algorithms have been proposed. Lowe presented the complete SIFT algorithm in 2003 [3], integrating keypoint detection, description and matching into a unified process; Yan Ke et al. [4] proposed PCA-SIFT to address SIFT's heavy computation and long runtime, but the computation was not reduced and the detection stage was unchanged; Zhao Xiangyang and Du Limin [5] proposed a feature-point matching method that extracts and matches Harris corners [6] with robust transform estimation, improving robustness to some extent but remaining slow.

This paper combines the characteristics of SIFT and Harris. An improved SIFT algorithm extracts feature points from the images to be stitched; the BBF algorithm searches for candidate matches for coarse matching; RANSAC refines the matches and estimates the transformation matrix between the images; finally, gradual-in/gradual-out weighted-average fusion produces the seamless panorama.

1.1 SIFT feature extraction
SIFT (Scale-Invariant Feature Transform), proposed by Lowe in 1999, extracts local features by finding extrema in scale space together with their position, scale and rotation invariants. It has four steps: scale-space construction, scale-space extremum detection, orientation assignment and descriptor generation.

(1) Scale-space construction. The same image looks different at different scales; the local points sought are those whose relative positions remain unchanged as the image scale varies. Since the Gaussian kernel is the only linear convolution kernel suitable for this, the image is repeatedly convolved with Gaussian functions to build the first octave, then downsampled to half size and filtered again to form the second octave, and so on until the image is smaller than a given threshold. Differencing adjacent Gaussian images in each octave yields the difference-of-Gaussian (DoG) images [7]: D(x, y, σ) = (G(x, y, kσ) - G(x, y, σ)) * I(x, y), where G(x, y, σ) is the scale-variable Gaussian kernel, I(x, y) the two-dimensional image, * the convolution operation, (x, y) the spatial coordinates, σ the scale-space factor and k the constant ratio between adjacent scales.

(2) Extremum detection. Local extrema of the DoG images are taken as candidate feature points. To ensure a point is an extremum in both image space and scale space, every sample (except in the top and bottom layers) is compared with its 8 neighbours at the same scale and the 9 neighbours in each of the two adjacent scales, 26 neighbours in total. Unstable, noise-sensitive extrema are then removed by precise localisation: the scale function is expanded as a second-order Taylor series, its derivative is set to zero to obtain the offset between the sample and the true extremum, and the extremum value is evaluated there; if its absolute value is below a threshold the point is judged unstable and discarded, and the remaining extrema become the feature points.

(3) Orientation assignment. Each feature point is assigned an orientation from the gradient distribution of its neighbourhood: the gradient magnitude and direction are computed at the sampled neighbouring pixels and accumulated into a 36-bin orientation histogram (10° per bin); the histogram peak gives the dominant gradient direction of the neighbourhood, which is taken as the main orientation of the feature point. When another peak has energy comparable to the main peak, its direction is kept as an auxiliary orientation.

(4) Descriptor generation. The neighbourhood is rotated so that the main orientation coincides with the coordinate axis; a 16×16 pixel region centred on the feature point is divided into 16 sub-windows of 4×4 pixels, each producing an 8-dimensional orientation-histogram seed, giving a 16×8 = 128-dimensional SIFT descriptor [7].

1.2 Improved SIFT feature extraction
According to Lowe, object recognition succeeds once three or more keypoints match, so stitching needs only a small number of feature points in the overlap region. Yet a 512×512 image typically yields about 1000 keypoints, and even after robustness filtering 200 to 500 remain, so a considerable part of the stitching time is spent detecting redundant points. The improvement applies the Harris measure to the N SIFT feature points. Each point carries position (x, y), scale and orientation; these are used to compute the Harris autocorrelation matrix M, built from the x- and y-gradients with a Gaussian weighting function whose standard deviation follows the feature scale. The two eigenvalues of M are proportional to the local curvature, giving the Harris corner response R = det M - k (trace M)^2, with k usually 0.04 to 0.06. The responses of all feature points are computed, and half of the mean of their absolute values is used as a threshold; points whose response falls below it are removed, and the remaining points are the features kept by the improved SIFT.

2.1 Matching based on a k-d tree
Matching could be done exhaustively, but that wastes time, so a k-d tree is used to search for the nearest and second-nearest original-image feature points of each target feature point. A k-d tree partitions k-dimensional space; each node represents a point, each level splits along one dimension chosen by that level's discriminator, and splitting stops when a node holds fewer than a given number of points [10]. For high-dimensional data plain k-d tree search degrades, so the BBF (best-bin-first) algorithm [11] is used: a priority queue, ordered by ascending distance between search nodes and the query, records the unexplored branch at each decision, its distance and its position in the tree; after a leaf is reached, the branch containing the next nearest neighbour is searched from the queue head and the head is removed; after a fixed number of nodes the search stops and the approximate nearest neighbours are returned in ascending distance order.

With the K approximate nearest neighbours of a query feature point p found by BBF (distances in ascending order), matching uses the ratio of the nearest to the second-nearest distance: because the SIFT descriptor is high-dimensional, the distances of many non-matching points cluster together, so p is accepted as a match only if the nearest distance divided by the second-nearest distance is below a threshold T; otherwise p has no match. The matches obtained this way still contain a number of mismatches, so RANSAC is used to complete the matching.

2.2 RANSAC
RANSAC (Random Sample Consensus) [12] is the most common method for purifying feature matches in matrix estimation and pattern recognition: model parameters are computed from a sample set containing erroneous data, a hypothesised model is proposed and verified against the known data, inliers and outliers are identified, and mismatches are removed. Here RANSAC both refines the matches and yields the transformation between the images, as follows.

① Because the two images overlap, they are related by a projective transformation, which accurately describes rotation, scaling, translation and regular transformations as well as some irregular deformations; its parameters are the scaling/rotation terms h11, h12, h21, h22, the translation terms h13, h23 and the perspective terms h31, h32.
② Four matched pairs are drawn at random from the coarse match set S; if any three are collinear the draw is repeated, and the corresponding transformation model is computed from the four pairs.
③ The model is verified against all coarse matches: points the model can describe are inliers, the rest outliers; the number of inliers is counted and the Euclidean distances between each inlier's matched point and its transformed point are summed.
④ The distance sum is compared with a threshold to accept or reject the model parameters.
⑤ Steps ② and ③ are repeated, updating the model, until the optimal model is obtained; the final parameters computed from its inliers form the required transformation matrix, which is used for registration [13].

3 Image fusion
After matching, the transformation matrix determines the overlap region and the images are stitched into a panorama. Differences in camera angle and exposure can leave brightness differences and edge distortion, producing a visible light/dark change along the seam, so the seam must be smoothed [14]. Gradual-in/gradual-out weighted-average fusion is used: unlike simple averaging, the pixels of the overlap region are weighted before averaging, with the two weights summing to one; choosing suitable weights gives a smooth transition across the overlap and removes the stitching seam.

4 Experiments
The feature-point-based panoramic stitching was implemented on a PC with an Intel Core i3-350M (2.27 GHz, 2 GB RAM) under Visual Studio 2012. Two views of a scene were captured with a camera and their feature points extracted; BBF coarse matching was followed by RANSAC refinement, and the estimated transformation matrix was used with gradual-in/gradual-out weighted fusion to produce the mosaic (Fig. 1). In the test pair, 473 and 398 feature points were detected; BBF matching found 178 candidate pairs in 0.125 s, containing a relatively large number of mismatches, while RANSAC refinement took 0.082 s and left essentially no mismatches. Direct stitching shows an obvious brightness change in the overlap, whereas gradual-in/gradual-out weighted-average fusion gives a natural-looking result without visible brightness differences. To further verify the method, four photographs with roughly 30% to 50% mutual overlap were stitched (Figs. 2 and 3); the seams are removed, the overlap regions transition smoothly and a high-quality panorama is obtained.

This paper studied feature-point-based image stitching: an improved SIFT algorithm extracts the feature points, RANSAC achieves precise matching of features between adjacent frames, the projective model estimates the transformation matrix between the two images, and gradual-in/gradual-out weighted-average fusion removes the seam and colour differences, achieving high-quality panoramic stitching.

References
1 Brown M, Lowe DG. Automatic panoramic image stitching using invariant features. IJCV, 2007: 59-73.
2 Wang Huaqin. Research on image mosaic methods based on feature point matching [Master's thesis]. Wuhan: Central China Normal University, 2007.
3 Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
4 Ke Y, Sukthankar R. PCA-SIFT: a more distinctive representation for local image description. Proc. of IEEE Computer Vision and Pattern Recognition Conference, 2004, 2: 506-513.
5 Zhao Xiangyang, Du Limin. A fully automatic and robust image mosaic and fusion algorithm. Journal of Image and Graphics, 2004, 9(4): 417-422.
6 Harris C, Stephens M. A combined corner and edge detector. Proc. of 4th Alvey Vision Conference, UK, 1988: 15-50.
7 Li Han, Niu Jizhen, Guo He. Fully automatic seamless image mosaic method based on feature points. Computer Engineering and Design, 2007, 28(9): 2083-2085.
8 Zhang Heng. Research on image mosaic algorithms based on SIFT [Master's thesis]. Tianjin: Hebei University of Technology, 2012.
9 Guo Xiaoran, Cui Shaohui. Image mosaic algorithm based on local feature point registration. Semiconductor Optoelectronics, 2014, 35(1): 89-94.
10 Wang Junxiu, Kong Lingde. Research on panoramic image mosaic technology based on feature point matching. Software Engineer, 2014, 17(11): 10-13.
11 Beis JS, Lowe DG. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. Proc. of Conference on Computer Vision and Pattern Recognition, 1997: 1000-1006.
12 Hartley R, Zisserman A. Multiple View Geometry in Computer Vision. 2nd ed. Cambridge: Cambridge University Press, 2004.
13 Song Baosen. Research and implementation of panoramic image mosaic methods [Master's thesis]. Harbin: Harbin Engineering University, 2012.
14 Lin Chengkai, Li Hui, Pan Jingui. An improved algorithm for panorama generation. Computer Engineering and Applications, 2004, 40(35): 69-71, 159.
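The gradual-in/gradual-out weighted-average fusion used in the final stitching step can be sketched as follows. The geometry is deliberately simplified (a purely horizontal overlap between two already-aligned images); a real panorama would first warp the images with the estimated transformation matrix H.

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Gradual-in / gradual-out weighted blending of two aligned images.
    `left` and `right` are HxWx3 float arrays already warped into the same
    panorama frame; `overlap` is the width in pixels of their shared region,
    assumed to start where `left` ends. Purely illustrative geometry."""
    h, w_l = left.shape[:2]
    w_r = right.shape[1]
    out = np.zeros((h, w_l + w_r - overlap, 3), dtype=np.float32)

    out[:, :w_l - overlap] = left[:, :w_l - overlap]   # left-only region
    out[:, w_l:] = right[:, overlap:]                  # right-only region

    # weights fall linearly from 1 to 0 for the left image (0 to 1 for the right)
    d1 = np.linspace(1.0, 0.0, overlap)[None, :, None]
    d2 = 1.0 - d1
    out[:, w_l - overlap:w_l] = d1 * left[:, w_l - overlap:] + d2 * right[:, :overlap]
    return out
```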
SARscape offset tracking workflow
SARScape is a software tool that helps researchers track the movement of satellite imagery over time. It uses a technique called offset tracking to align images from different dates, allowing users to measure changes in the landscape. The offset tracking procedure in SARScape involves the following steps:

1. Image Preprocessing: The first step is to preprocess the satellite images to make them suitable for offset tracking. This includes radiometric correction, geometric correction, and mosaicking.

2. Feature Detection: Offset tracking relies on identifying corresponding features in the two images being aligned. SARScape uses a variety of feature detection algorithms, such as the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF).

3. Feature Matching: The detected features are then matched between the two images. SARScape uses a combination of geometric and photometric constraints to ensure that only corresponding features are matched.

4. Offset Estimation: Based on the matched features, SARScape estimates the translation and rotation offsets between the two images. This is done using a robust estimation algorithm that minimizes the geometric error.

5. Image Alignment: The estimated offsets are then used to align the two images. SARScape applies a geometric transformation to one of the images to bring it into alignment with the other.

6. Error Assessment: After alignment, SARScape assesses the accuracy of the offset tracking procedure. This is done by measuring the residual geometric error between the aligned images.

The offset tracking procedure in SARScape is highly automated and can be applied to large datasets of satellite imagery. It is a powerful tool for researchers studying landscape change, deforestation, and other environmental processes.
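Step 4 (offset estimation) boils down to finding where a reference patch best matches inside a larger search window. The sketch below is a generic normalised-cross-correlation implementation of that idea in Python, not a SARscape API call; in practice the search runs over a dense grid of patches and is refined to sub-pixel precision.

```python
import numpy as np

def estimate_offset(ref_patch, search_window):
    """Estimate the (dy, dx) shift of `ref_patch` inside `search_window`
    by normalised cross-correlation. Illustrative brute-force version."""
    ph, pw = ref_patch.shape
    ref = (ref_patch - ref_patch.mean()) / (ref_patch.std() + 1e-12)
    best, best_score = (0, 0), -np.inf
    for dy in range(search_window.shape[0] - ph + 1):
        for dx in range(search_window.shape[1] - pw + 1):
            cand = search_window[dy:dy + ph, dx:dx + pw]
            cand = (cand - cand.mean()) / (cand.std() + 1e-12)
            score = float((ref * cand).mean())
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best, best_score
```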
A real-time image landmark matching algorithm supported by improved FAST feature points
YANG Qili, ZHU Lanyan, LI Haitao (Faculty of Land Resource Engineering, Kunming University of Science and Technology, Kunming 650093, China)
Journal of Computer Applications, 2016, 36(5): 1404-1409. CLC number: TP391.41

Abstract: To address the problem that matching time and matching accuracy cannot both be satisfied in image matching, a feature-point matching method is proposed that uses a random forest classifier to match landmarks, turning the matching problem into a simple classification problem. This greatly simplifies the computation and keeps image matching real-time. FAST feature points represent the image landmarks, and a Gaussian pyramid structure together with an affine-augmentation strategy improves the scale and affine invariance of the FAST points, raising the landmark matching rate. The experimental results are compared with the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) algorithms: under scale change, occlusion and rotation the matching rate reaches about 90%, comparable with SIFT and SURF, while the matching time is an order of magnitude lower, so the method matches image landmarks effectively and within real-time requirements.

Key words: random forest; landmark matching; FAST feature points; Gaussian pyramid structure; affine augmentation strategy

The main supporting technique for image landmark matching is image matching. Reference [1] divides image matching into grey-level-correlation-based matching and feature-based matching, and feature-based matching can be further divided into feature-point-based matching and transform-domain-based matching. Reference [2] concludes that grey-level correlation matching searches for matches by sliding a window over the image to be matched and comparing similarity; its computation is heavy and the similarity measure is sensitive to scale change and rotation. Reference [3] points out that feature-based matching extracts features from the two images, describes them according to some mathematical rule or geometric constraint, and matches the images by matching the features; it matches well but is computationally complex and falls short of real-time requirements.
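The multi-scale use of FAST mentioned in the abstract (running the detector on a Gaussian pyramid to give it some scale robustness) can be sketched with OpenCV as follows; the threshold and number of levels are illustrative, not values from the paper.

```python
import cv2

def fast_keypoints_multiscale(gray, levels=3, threshold=25):
    """Detect FAST corners on each level of a Gaussian pyramid so that
    landmarks remain detectable under scale change."""
    fast = cv2.FastFeatureDetector_create(threshold=threshold, nonmaxSuppression=True)
    keypoints, img, scale = [], gray, 1.0
    for _ in range(levels):
        for kp in fast.detect(img, None):
            # map coordinates and size back to the base image resolution
            keypoints.append(cv2.KeyPoint(kp.pt[0] * scale, kp.pt[1] * scale,
                                          kp.size * scale))
        img = cv2.pyrDown(img)       # next (half-resolution) pyramid level
        scale *= 2.0
    return keypoints
```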
Robust feature matching in 2.3µs
Simon Taylor, Edward Rosten, Tom Drummond
Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK
{sjt59,er258,twd20}@

Abstract
In this paper we present a robust feature matching scheme in which features can be matched in 2.3µs. For a typical task involving 150 features per image, this results in a processing time of 500µs for feature extraction and matching. In order to achieve very fast matching we use simple features based on histograms of pixel intensities and an indexing scheme based on their joint distribution. The features are stored with a novel bit mask representation which requires only 44 bytes of memory per feature and allows computation of a dissimilarity score in 20ns. A training phase gives the patch-based features invariance to small viewpoint changes. Larger viewpoint variations are handled by training entirely independent sets of features from different viewpoints. A complete system is presented where a database of around 13,000 features is used to robustly localise a single planar target in just over a millisecond, including all steps from feature detection to model fitting. The resulting system shows comparable robustness to SIFT [8] and Ferns [14] while using a tiny fraction of the processing time, and in the latter case a fraction of the memory as well.

1. Introduction
Matching the same real world points in different images is a fundamental problem in computer vision, and a vital component of applications such as automated panorama stitching (e.g. [2]), image retrieval (e.g. [16]) and object localisation (e.g. [8]). Matching schemes must define a measure of similarity between parts of images, which in the ideal case is high if the image locations correspond to the same real-world point and low otherwise.

The most basic description of a region of an image is a patch of pixel values. Patch matches can be found by searching for a pair of patches with a high cross-correlation score or a low sum-of-squared-differences (SSD) score.

[Figure 1. Two frames from a sequence including partial occlusion and significant viewpoint variation. The average total processing time per 640x480 frame for the sequence is 1.37ms using one core of a 2.4GHz processor. Extracting runtime patch descriptors and finding matches in the database accounts for 520µs of this time.]

However patch matching with SSD provides no invariance to common image transformations such as
viewpoint change,and performing exhaustive patch match-ing between all possible pairs of patches is infeasible.Moravec proposed an interest point detector[13]to in-troduce some invariance to translation and hence reduce the number of patch matches to be considered.Interest point detection is now well-established as thefirst stage of state-of-the art matching schemes.There are many other trans-formations between the images,such as rotation and scale, which an ideal matching scheme should cope with.There are generally two approaches possible for each category of transformation;either factor out the effect of the transfor-mation,or make the representation of the area of interest invariant to it.Detecting interest points falls into thefirst category in that it factors out coarse changes in position.Schmid and Mohr[16]presented thefirst interest point approach to offer invariance to many image transforma-tions.A number of rotationally invariant features were com-puted around interest points in images.During matching the same features were computed at multiple scales to give the method invariance to both scale and rotation changes around the interest point.Instead of computing features invariant to rotation,a canonical orientation can be computed from the region around an interest point and used to factor out the effect of rotation.A variety of methods forfinding orientation have been proposed including the orientation of the largesteigenvector in Harris[4]corner detection,the maxima in an edge orientation histogram[8]or gradient direction at a very coarse scale[2].The interest point detection stage can also factor out more than just translation changes.Scale changes can be accounted for by a searching for interest regions over scale space[8,10].The space of affine orientation has too many dimensions to be searched directly,so schemes have been proposed to perform local searches for affine orientation starting from scale-space interest regions[11].Alterna-tively,interest regions can be found and affine orientation deduced from the shape of the region[9].Schemes such as those above can factor out large changes due to many common imaging transformations,but differences between matching patches will remain due to errors in the assignment of the canonical parameters and unmodelled distortions.To give robustness to these errors the patches extracted from the canonical frames undergo a further stage of processing.Lowe’s SIFT(scale invari-ant feature transform)method[8]typifies this approach and uses soft binning of edge orientation histograms which vary weakly with the position of edges.Other systems in this category include GLOH(Gradi-ent Location and Orientation Histogram)[12]and MOPS (Multi-scale Oriented Patches)[2]which extracts patches from a different scale image to the interest region detec-tion.Winder and Brown applied a learning approach tofind optimal parameters for these types of descriptor[18].The CS-LBP descriptor[5]uses a SIFT-style histogram of local information from the canonical patches but the local infor-mation used is a binary pattern rather than the local gradient used in SIFT.All of the above approaches aim to compute a single de-scriptor for a real-world feature which is as invariant as pos-sible to all likely image transformations.Correspondences between images are determined by extracting descriptors from both images andfinding those that are close neigh-bours in feature space.An interesting alternative approach recasts the match-ing problem as one of classification.This 
approach uses a training stage to train classifiers for the database features, which allows matching to be performed with less expensive computation at run-time than required by descriptor-based methods.Lepetit et al.demonstrated real-time matching us-ing randomised trees to classify patches extracted from lo-cation,scale and orientation-normalised interest regions[7]. Only around300bits are computed from the query images for each interest region to be classifiter work from Oyuzal et al.introduced the Ferns method[14]which im-proved classification performance to the point where the orientation normalisation of interest regions was no longer necessary.These methods only perform simple computa-tions on the runtime image,however the classifiers need to represent complicated joint distributions for each feature and so a large amount of memory is required.This limits the approach to a few hundred features on standard desktop PCs.Runtime performance is of key importance for many applications.The template tracking system of Jurie and Dhome[6]performs well but in common with any tracking scheme relies on small frame-to-frame motion and requires another method for initialisation.Recent work on adapting the SIFT and Fern approaches to mobile phones[17]made trade-offs to both approaches to increase speed whilst main-taining usable matching accuracy.Our method is around 4times faster than these optimised implementations and acheives more robust localisation.Existing state-of-the-art matching approaches based on descriptor computation or patch classification attempt to match any possible view of a target to a small set of key features.Descriptor-based approaches such as SIFT factor out image transformations with computationally expensive image processing.Classification methods such as Ferns of-fer reduced runtime computation but have a high memory cost to represent the complex joint distributions involved.Our method avoids the complexity inherent to matching areas of images subject to large transformations.Instead we employ a training phase to learn independent sets of features for different views of the target,and insert them all into the database for the target.The individual features are only in-variant to small changes of viewpoint.This simplifies the matching problem so neither the computationally expensive normalisation over transformations of SIFT-style methods or the complex classifier of the Fern-like approach are re-quired.As we only require features to be invariant to small view-point changes we need far less invariance from our interest point detector than other matching schemes.The FAST-9 (Features from Accelerated Segment Test)detector[15]is a perfectfit for our application as it shows good repeatability over small viewpoint variations and is extremely efficient as it requires no convolutions or searches over scale space.A potential problem with using features with less invari-ance than those of other approaches is that more database features will be required to allow robust matching over equivalent ranges of views at runtime.Therefore to make our new approach feasible we require features that have a low memory footprint and which permit rapid computation of a matching score.Our novel bit-mask patch feature ful-fils these criteria.As runtime performance is our primary concern we would like to avoid too much processing on the pixels around the detected interest ing pixel patches would be one of the simplest possible matching schemes but SSD-based patch matching would not even provide the 
small amount of viewpoint invariance we desire.One of thereasons SSD is very sensitive to registration errors is that it assigns equal weight to errors from all the pixels in the patch.Berg and Malik[1]state that registration errors,at least for scale and rotation,will have more effect on samples further from the centre of the patch.The authors reduce the weight of errors in those samples by employing a variable blur which is stronger further from the centre of the patch. We use the idea that not all pixels in a patch are equally im-portant for matching,but further note that the weights which should be assigned to pixels also depend on the individual feature:samples in the centre of large regions of constant intensity will be robust to small variations in viewpoint.We employ a training phase to learn a model for the range of patches expected for each feature.This model al-lows runtime matching to use simple pixel patches whilst providing sufficient viewpoint invariance for our frame-work.For fast localisation the memory and computational cost of matching is reduced by heavily quantising the model to a small binary representation that can be very efficiently matched at runtime.1.1.Our Contributions•We show fast and robust localisation of a target using simple features which only match under small view-point variations.•A large set of features from different views of a target are combined to allow matching under large transfor-mations.•We introduce a simple quantised-patch feature with a bit mask representation which enables very fast match-ing at runtime.The features represent the patch varia-tions observed in a training phase.2.Learning Features for a TargetWe use a large set of training images covering the entire range of viewpoints where localisation is required.The set of images could be captured for real,but we instead artifi-cially generate the set by warping a single reference image. Different scales,rotations and affine warps are included in the training set.Additionally random pixel noise and a blur of a small random size are added to each generated view so the trained features have more robustness to poor quality images.The training views for a target are grouped into sev-eral hundred viewpoint bins so that each bin covers a small range of viewpoints.The interest point detector is run on each image in the bin in sequence and patches are extracted from around the detected corners.The interest point loca-tions can be converted to a position in the reference frame as the warp between the reference and training image is known.If the database for the viewpoint already containsa Figure2.Left:The sparse8×8sampling grid used by the features. 
Right: The 13 samples selected to form the index. feature nearby the detected point in the new training image, then the patch model for that feature is updated with the new patch. Otherwise a new feature is created and added to the database. When all of the images in a viewpoint bin have been processed we select the n features (typically 50-100) which were most repeatably detected by the FAST detector and quantise their patch models to the binary feature descriptions used at runtime as described in the following section.

2.1. Database Feature Representation
The features in our system are based on an 8×8 pixel patch extracted from a sparsely sampled grid around an interest point, as shown in Figure 2. The extracted samples are firstly normalised such that they have zero mean and unity standard deviation to give robustness to affine lighting variations. During training we build a model of the feature which consists of 64 independent empirical distributions of normalised intensity, one per pixel of the sampling grid.

This model can be used to calculate the likelihood that a runtime patch is from a trained feature, assuming each pixel is independent. However computing this likelihood estimate would require too much memory and computation time to be used in real-time on a large database of features. Since features only need to match over small viewpoint ranges we are able to heavily quantise the model for a feature and still obtain excellent matching performance.

We quantise the per-pixel distribution in two ways. Firstly the empirical intensity distributions are represented as histograms with 5 intensity bins. Secondly when training is complete we replace the probability in each bin with a single bit which is 1 if pixels rarely fell into the bin (less than 5% of the time). The quantisation is illustrated in Figure 3. A feature in the database D can be written as:

D = \begin{pmatrix} D_{0,0} & D_{0,1} & D_{0,2} & D_{0,3} & D_{0,4} \\ D_{1,0} & D_{1,1} & D_{1,2} & D_{1,3} & D_{1,4} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ D_{63,0} & D_{63,1} & D_{63,2} & D_{63,3} & D_{63,4} \end{pmatrix}   (1)

where a row D_{i,\cdot} corresponds to the quantised histogram for a single pixel of the patch, and

D_{i,j} = \begin{cases} 1 & \text{if } P(B_j < I(x_i, y_i) < B_{j+1}) < 0.05 \\ 0 & \text{otherwise} \end{cases}   (2)

where B_j is the minimum intensity value of histogram bin j and I(x_i, y_i) is the normalised value of pixel i. The resulting descriptor requires 5 bits for each of the 64 samples giving a total of 40 bytes of memory per feature. 4 additional bytes are used to store the position of the feature in the reference image.

[Figure 3. The independent per-pixel empirical distributions are quantised into 5 intensity bins, and then further quantised into a bit mask identifying bins rarely observed during the training phase. This process is shown for: (left) a constant intensity region, (centre) a step change in intensity, (right) an intensity ramp. The data was created by taking the image (top) and adding random blur, noise and translation errors.]

3. Runtime Matching
After the quantisation to bits the patch descriptions no longer represent probability distributions and so we cannot compute the likelihood of a feature giving rise to a patch. However the bit mask does identify the intensity bins that samples rarely fell into at training time and so good matches should only have a small number of samples which fall into these bins in the runtime patch. Hence we use a count of the number of samples which fall into bins marked with a 1 in the database patch description as our dissimilarity score. The best matching feature in the database is the one that gives the lowest dissimilarity score when compared to the query patch, as that represents the match with fewest "errors" (runtime pixels in unexpected bins).

The major advantage of the simple error count measure is that it can be computed with bitwise operations, which allows a large number of potential matches to be scored very quickly. The bitwise representation of a runtime patch R is slightly different to the database feature of equation 1. It is also represented by a 320-bit value but has exactly 1 bit set for each pixel, corresponding to the intensity bin which the sample from the runtime patch is in:

R_{i,j} = \begin{cases} 1 & \text{if } B_j < RP(x_i, y_i) < B_{j+1} \\ 0 & \text{otherwise} \end{cases}   (3)

where RP(x_i, y_i) is the value of pixel i in the normalised runtime patch extracted from around an interest point detected in a runtime image.

With the preceding definitions of the database and runtime patch representations the dissimilarity score can be simply computed by counting the number of bits where both D_{i,j} and R_{i,j} are equal to 1:

e = \sum_{i,j} D_{i,j} \otimes R_{i,j}   (4)

where \otimes is a logical AND. Since each row of R always has one single bit set, this can be rewritten as:

e = \sum_i \big( (D_{i,0} \otimes R_{i,0}) \oplus \ldots \oplus (D_{i,4} \otimes R_{i,4}) \big)   (5)

where \oplus denotes logical OR. By packing each column of D and R into a 64-bit integer (D_j and R_j) the necessary logical operations can be performed for all rows in parallel. The dissimilarity score can thus be obtained from a bitcount of a 64-bit integer:

e = \mathrm{bitcount}\big( (D_0 \otimes R_0) \oplus \ldots \oplus (D_4 \otimes R_4) \big)   (6)

Computing the error measure therefore requires 5 ANDs, 4 ORs and a bit count of a 64-bit integer. Some architectures (including recent x86 CPUs with SSE4.2) support a single-instruction bitcount. For other architectures, including our test machine, the bitcount can be performed in 16 instructions using an 11-bit lookup table to count chunks of 11 bits at a time. The total time to compute an error measure using the lookup table bitcount is about 20ns.

The first stage of finding matches from a runtime image is to run the FAST-9 interest point detector. As the training phase has selected the most repeatable FAST features from each viewpoint it is not necessary to obtain too many interest points from the input image. We typically find no more than 200 are needed for robust localisation. The 8×8 patch of Figure 2 is extracted, and the mean and standard deviation of the samples are calculated to enable quantisation into the 320 bits R_{i,j} of equation 3. The dissimilarity score between the patch and each database feature is computed using the fast method of equation 6. The database feature with the lowest dissimilarity score for a runtime patch is treated as a match if the error count is below a threshold (typically 5). The matches from all the runtime patches can be sorted by error count to order them in terms of quality.

3.1. Indexing
The dissimilarity score between a runtime patch and a database feature can be computed very quickly using equation 6, however as we use larger numbers of features than alternative approaches it is desirable to combine the basic method above with an indexing scheme to reduce the number of scores which must be computed and to prevent the search time growing linearly with the database size. The indexing approach we use is inspired by the Ferns work [14] which uses joint distributions of simple binary tests from training images. Our current implementation uses the 13 samples shown on the right of Figure 2 to compute an index number. The samples have been selected reasonably close to the patch centre as they are expected to be more consistent under rotation and scale, but somewhat spaced apart so that they are reasonably uncorrelated.

Each of the samples selected for the index is quantised to a single bit: 1 if the pixel value is above the mean of the patch and 0 otherwise. The 13 samples are then concatenated to form a 13-bit integer. Thus the index in our current implementation can take 8192 distinct values (0 to 8191). The index value is used to index a lookup table of sets of database features. At runtime the dissimilarity score is only computed against the set of features in the entry of the table with the matching index.

The training phase is used to determine the set of index values which will account for most possible runtime views of a particular feature. Every patch from the training set that contributes to the model for a particular feature also contributes a vote for the index value computed from the patch. After training is complete we select the most-common indices until together the selected set of indices account for at least 80% of the training patches used in building the feature. This set of indices is saved with the feature, and the feature is inserted into all of the corresponding sets of features in the lookup table at runtime.

3.2. Improving Robustness to Blur
FAST is not an inherently multi-scale detector and fails to detect good features when the image is significantly blurred. Although our training set includes some random blur so the features are trained to be robust to this we still rely on the repeatability of the detector to find the features in the first place. The few frames where blur is a problem in typical image sequences do not justify switching to a multi-scale detector, so we take a different approach.

To perform detection in blurred images, we create an image pyramid with a factor of 2 in scale between images, and run FAST on each layer of the pyramid. In order to avoid incurring the cost of building the pyramid at each frame, we use a data driven approach to decide when to stop building the pyramid. Initially features are extracted and matched on the full-sized image. The features are then fed to the next stage of processing, such as estimating the camera pose. If the later stages of processing determine that there are too few good matches, then another set of features are extracted from the next layer of the image pyramid. These are aggregated with the first set of features, but the new features are assumed to have a better score. If again insufficient matches are found, the next layer of the pyramid is used and so on until either enough good matches or a minimum image size has been reached.

We choose a factor of 2 between images in the pyramid, as this allows for a particularly efficient implementation such that around 200µs are required to half-sample a 640×480 frame. We build a pyramid with a maximum of 3 layers. The resulting system obtains considerable robustness to blur, since the blur in the smallest layer is reduced by a factor of 4. Furthermore, it allows for matches to be made over a greater range of scales as the automatic fallback to sub-sampled images allows matching on frames when the camera is closer to the target than any training images.

4. Results and Discussion
In order to validate our method, we apply it to the task of matching points in frames of a video sequence to a known planar object, and finding the corresponding homography. After finding matches the homography is estimated using PROSAC [3] and refined using the inliers. The inlier set is reestimated and refined for several iterations. The resulting homography allows us to determine which points were matched correctly.

The database for the frames shown in Figure 1 was generated from a training set of 21672 images, generated by warping a single source image of the target. 7 different scale ranges and 36 different camera axis rotation ranges were used, giving a total of 252 viewpoint bins. Each bin covers a reduction in scale by a factor of 0.8, 10 degrees of camera axis rotation, and out-of-plane viewpoints in all directions of up to 30 degrees. We extract around 50 features from each viewpoint bin (more from larger scale images), giving a total of 13372 features in the database.

4.1. Validating the Bit Count Dissimilarity Score
Two short video sequences of the planar target of Figure 1 were captured using a cheap VGA webcam. The first sequence was captured from viewpoints which were known to have been covered by our training phase whereas the second sequence was viewed with a larger out-of-plane rotation, known to be outside the range of training. The database features were trained from the source image, whereas the test sequences were poor-quality webcam images of a printed version of the file. Thus both sequences test the method's
dissimilarity score obtained for the match in Figure 4.For the sequence on the left where the viewpoints are in-cluded in the training set many good matches are found in each frame,with on average 9.7zero-error inliers obtained.The inlier percentage for matches with low dissimilarity scores is also good at over 82%in the zero error case.The result that both the number of inliers and the inlier fraction drop off with increasing dissimilarity score demonstrates that the simple bit error count is a reasonable measure of the quality of a match.The figure provides strong support for a PROSAC-like robust estimation procedure once the matches have been sorted by dissimilarity score as the low error matches are very likely to be correct.Even when the viewpoint of the query image is outside the range for which features have been trained,as in the data on the right of Figure 4,the dissimilarity score still provides a reasonable way to sort the matches,as the inlier fraction can be seen to drop off with increasing dissimilarity.The inlier rate of the first matches when sorted by dissimilarity score is still sufficient in most frames to obtain a pose with a robust estimation stage such as PROSAC.4.2.Controllable Viewpoint InvarianceAs our framework uses independent features for different viewpoint bins it is possible to trade-off between robustness to viewpoint variations and computation required for local-isation by simply adding or removing more bins.For applications where viewpoints are restricted (for ex-ample if the camera has a roughly constant orientation)the number of database features can be drastically reduced lead-ing to even higher performance.Alternatively if more com-putational power is available it is possible to increase the。