Modeling Based Image Reconstruction in Time-Resolved Contrast-Enhanced Magnetic Resonance A
The Bamiyan project: multi-resolution image-based modeling
A. Gruen, F. Remondino & L. Zhang
Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland

ABSTRACT: The Bamiyan project of the Institute of Geodesy and Photogrammetry, ETH Zurich, started in July 2001. Its goals are the terrain modeling of the area, the 3D reconstruction of the two lost Buddha statues of Bamiyan and of the present empty niches, and the documentation of the UNESCO cultural heritage site with a topographic and tourist information system. Different types of images, with different spatial and temporal resolutions, have been used. The integration of all the photogrammetrically recovered information requires powerful software able to handle large amounts of textured 3D data.

1 INTRODUCTION
The region of Bamiyan, ca. 200 km north-west of Kabul, Afghanistan, was one of the major Buddhist centers from the second century AD until Islam entered the area in the ninth century. For centuries Bamiyan lay at the heart of the famous Silk Road, offering rest to caravans carrying goods between China and the Western empires. Its central location on the routes of travelers from north to south and from east to west made Bamiyan a common meeting place for many ancient cultures.
In the region, many Buddha statues and hundreds of caves were carved out of the sedimentary rock. In particular, near the village of Bamiyan, at 2600 m altitude, three big statues of Buddha were carved out of a vertical cliff (Fig. 1). Of the two big standing statues, the larger was 53 m high, while the smaller one measured 38 m. They were cut from the sandstone cliff and covered with a mixture of mud and straw to model fine details such as the facial expression, the hands and the folds of the robe.
Figure 1. Panorama of the Bamiyan cliff with the three Buddha statues prior to demolition, and a closer view of the two big standing statues of Bamiyan (color plate).
The invasion of the Soviet army in December 1979 started a 23-year period of war and barbarity. It left Afghanistan a heavily damaged country, with little hope for quick reconstruction of its infrastructure, economic improvement, political stability or social peace. Moreover, at the end of the 1990s the extremist Taleban regime started an internal war against all non-Islamic symbols. In March 2001 this led to the complete destruction of the two big standing Buddha statues of Bamiyan (Fig. 2), as well as of other small statues in Foladi and Kakrak. In 2003 the World Heritage Committee decided to include the cultural landscape and archaeological remains of the Bamiyan valley in the UNESCO World Heritage List (/). The area contains numerous Buddhist monastic ensembles and sanctuaries, as well as fortified edifices from the Islamic period. The site symbolizes the hope of the international community that extreme acts of intolerance, such as the deliberate destruction of the Buddhas, will never be repeated.
The whole area is nowadays in a fragile state of conservation, as it has suffered from abandonment, military actions and explosions.
The major dangers are the imminent collapse of the Buddha niches with the remaining fragments of the statues, further deterioration of the mural paintings still present in the caves, looting and illicit excavation.
The main goals of the Bamiyan project are:
- the terrain modeling of the entire Bamiyan area from satellite images, for the generation of virtual flights over the UNESCO cultural heritage site;
- the modeling of the rock cliff out of which the Buddhas were carved;
- the 3D computer reconstruction of the two lost Buddha statues and the mapping of all the frescos of the niches;
- the 3D modeling of the two empty niches where the Buddha statues once stood;
- the documentation of the cultural heritage area with a topographic, tourist and cultural information system.
The project is an excellent example of image-based modeling, using many types of images with different spatial and temporal resolutions. It shows the capabilities and achievements of photogrammetric modeling techniques. It also combines large-site landscape modeling with highly detailed modeling of objects (i.e. the statues) from terrestrial images. Automated image-based modeling algorithms were specifically developed for the modeling of the Great Buddha statue. In the end, however, manual measurements proved to be the best procedure for recovering reliable and accurate 3D models.
Figure 2. The explosion of March 2001 that destroyed the Buddha statues (Image source: CNN). The two empty niches, where the Buddhas once stood, as seen in August 2003 during our field campaign.

2 TERRAIN MODELING FROM SATELLITE IMAGERY
For the 3D modeling and visualization of the area of interest, an accurate DTM is required. Aerial images were not available to us, and acquiring them was unrealistic because no surveying company was operating in the area. Space-based image acquisition and processing was therefore the only alternative to aerial photos or any other surveying method.
Nowadays space images compete successfully with traditional aerial photos for DTM generation or terrain studies in problematic countries such as present-day Afghanistan. Moreover, the availability of high-resolution worldwide scenes taken from satellite platforms is constantly increasing. These scenes are available in different radiometric modes (panchromatic, multispectral) and also in stereo mode.
For the project, two data sets over the Bamiyan area were available: a B/W stereo pair acquired with the HRG sensor carried on SPOT-5, and a PAN Geo-level IKONOS image mosaic. The SPOT-5 images were acquired in across-track direction at 2.5 m ground resolution, while the IKONOS image has a ground resolution of 1 m.
The sensor modeling, DTM/DSM generation and ortho-image generation were performed with our software SAT-PP, recently developed for the processing of high-resolution satellite imagery (Zhang & Gruen 2004, Poli et al. 2004, Gruen et al. 2005).
The orientation of the IKONOS mosaic was based on a 2D affine transformation, whereas the orientation of the SPOT scenes was based on a rational function model. From the camera model, the calibration data and the ephemeris contained in the metadata file, the software estimates the RPCs (Rational Polynomial Coefficients) for each image. It then applies a block adjustment to remove systematic errors in the sensor's exterior and interior orientation. The scenes were oriented with the help of some GCPs measured with GPS.
The DTM was then generated from the oriented SPOT stereo pair using the SAT-PP module for DTM/DSM generation. Two raster DTMs were interpolated from the original matching results (Fig. 3): a 20 m DTM for the whole area and a 5 m DTM for the area covered by the IKONOS image. Some manually measured breaklines near the Buddha cliff were also used. The matching algorithm combines the matching results of feature points, grid points and edges.
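The 2D affine orientation of the IKONOS mosaic is a small least-squares problem. Six parameters map image coordinates to ground coordinates, so at least three well-distributed GCPs are needed and any extra ones provide redundancy. The following is a minimal NumPy sketch under that assumption. The function names are hypothetical, and SAT-PP's actual adjustment (weighting, blunder detection, coordinate reduction) is not reproduced.

```python
import numpy as np

def fit_affine_2d(src, dst):
    """Least-squares 2D affine transformation from src (N x 2) to dst (N x 2).

    Model: X = a0 + a1*x + a2*y and Y = b0 + b1*x + b2*y (six parameters),
    so at least three non-collinear GCPs are required.
    Returns the (3 x 2) coefficient matrix and the (N x 2) residuals.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 3:
        raise ValueError("at least three GCPs are required")
    # The same design matrix [1, x, y] serves both ground coordinates
    A = np.column_stack([np.ones(len(src)), src[:, 0], src[:, 1]])
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)
    residuals = dst - A @ coeffs
    return coeffs, residuals

def apply_affine_2d(coeffs, pts):
    """Map image coordinates (N x 2) to ground coordinates with fitted coefficients."""
    pts = np.asarray(pts, dtype=float)
    A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])
    return A @ coeffs
```

`fit_affine_2d(gcp_image_xy, gcp_ground_xy)` returns the coefficients and the per-GCP residuals. The residuals are the natural first check for mis-identified points. With large map coordinates, reducing (centring) both coordinate sets before solving keeps the problem better conditioned.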
The SAT-PP matching algorithm is a modified version of the MPGC (Multi-Photo Geometrically Constrained) matching algorithm (Gruen 1985, Zhang & Gruen 2004) and can achieve sub-pixel accuracy for all the matched features.
For the photo-realistic visualization of the whole Bamiyan area, two ortho-images were generated: a 2.5 m resolution B/W ortho-image from the SPOT images and a 1 m resolution RGB ortho-image from the IKONOS image. The textured 3D model (rendered with ERDAS Virtual GIS) is shown in Figure 4. The figure presents two closer views of the IKONOS-textured 3D model, one of the Bamiyan cliff and one of the old Bamiyan city (the pyramid-shaped hill to the left).
Figure 3. The recovered 20 m DTM of the Bamiyan area displayed in color-coding mode (left), overlaid by the 5 m DTM (right) (colour plate).

3 3D MODELING OF THE ROCK CLIFF
For the reconstruction and modeling of the Bamiyan cliff (Fig. 5), a series of terrestrial images acquired with an analogue Rollei 6006 camera was used. Ca. 30 control points, measured with a total station and distributed along the rock cliff, served as reference. The images were digitized at 20 µm resolution and then oriented with a photogrammetric bundle adjustment. Manual measurements were then performed on stereo pairs to capture all the small details that an automated procedure would smooth out. The recovered point cloud was triangulated, edited and finally textured, as shown in Figure 6.
Because of the network configuration and the complex shape of the rock facade, the recovered geometric model is incomplete, in particular in the upper part. In some areas it was not possible to find corresponding features because of occlusions, different lighting conditions and shadows.
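The sub-pixel accuracy of least-squares matching comes from treating the shift between two image patches as a continuous parameter. Rather than testing a discrete grid of positions, the method estimates the shift by iteratively linearizing the intensity model. The sketch below is a one-dimensional, single-pair illustration in the spirit of adaptive least squares correlation (Gruen 1985), with one shift and a linear radiometric correction (h0, h1). It is a deliberate simplification, not the MPGC implementation in SAT-PP, which matches 2D patches across several images under geometric constraints. The function name `lsm_1d` is hypothetical.

```python
import numpy as np

def lsm_1d(template, search, x, n_iter=30, tol=1e-10):
    """Gauss-Newton estimate of shift s and radiometric terms (h0, h1) such that
    search(x) ~= h0 + h1 * template(x + s). Both signals are sampled on grid x."""
    grad = np.gradient(template, x)          # derivative of the template
    s, h0, h1 = 0.0, 0.0, 1.0                # start: no shift, identity radiometry
    for _ in range(n_iter):
        # Resample the template and its derivative at the shifted positions
        f_s = np.interp(x + s, x, template)
        df_s = np.interp(x + s, x, grad)
        residual = search - (h0 + h1 * f_s)
        # Jacobian columns: d/dh0, d/dh1, d/ds
        J = np.column_stack([np.ones_like(x), f_s, h1 * df_s])
        delta, *_ = np.linalg.lstsq(J, residual, rcond=None)
        h0 += delta[0]
        h1 += delta[1]
        s += delta[2]
        if np.max(np.abs(delta)) < tol:
            break
    return s, h0, h1
```

Because the shift is solved for in the least-squares sense, the estimate is not tied to the sampling grid. For a smooth, well-textured profile it recovers non-integer shifts, which is the 1D analogue of the sub-pixel accuracy claimed for the matcher.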
These gaps are not a major problem, because the cliff model is not meant to be used alone: in a next step it will be integrated into the DTM for visualization purposes.
Figure 4. Close view of the Bamiyan terrain model textured with an IKONOS ortho-image. Labeled features: empty niche of the Great Buddha; empty niche of the Small Buddha; rock cliff with Buddha niches; the new Bazaar; Shahr-i-Ghulghulah, the old Bamiyan city.
Figure 5. The Bamiyan cliff, approximately 1 km long and 100 m high.
Figure 6. Textured 3D model of the Bamiyan cliff, modeled with 30 images. The entire cliff (above) and two closer views of the niches (left: Big Buddha, right: Small Buddha).

4 3D MODELING OF THE GREAT BUDDHA AND ITS PRESENT EMPTY NICHE
The 3D computer reconstruction of the Great Buddha statue was performed on different image datasets using different algorithms (Gruen et al. 2004). Various 3D computer models of different quality were produced, mostly based on automated image measurements. However, in most cases the reconstructed 3D model lacked essential small features, such as the folds of the dress and some important edges of the niche. For the generation of a complete and detailed 3D model, manual photogrammetric measurements were therefore indispensable. They were performed along horizontal profiles at 20 cm intervals on three metric images, acquired in 1970 by Prof. Kostka (Kostka 1974) and scanned at 10 µm resolution.
The final 3D model of the Great Buddha (Fig. 7) was used for the generation of different physical models of the Great Buddha. In particular, a 1:25 scale model was produced for the Swiss pavilion of EXPO 2005 in Aichi, Japan.
The empty Buddha niches, by contrast, were modeled from five digital images acquired with a Sony Cybershot F707 during our field campaign in August 2003. The image size is 1920 × 2560 pixels, and the pixel size is ca. 3.4 µm.
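The quoted image format and pixel size fix the physical sensor size. Together with a focal length and an object distance, they also determine how much of the object one pixel covers. The sketch below shows this pinhole-camera arithmetic. The focal length and distance in the usage lines are hypothetical values chosen only for illustration, not the parameters of the 2003 campaign.

```python
def sensor_size_mm(width_px, height_px, pixel_size_um):
    """Physical sensor width and height (mm) from image size and pixel pitch."""
    return width_px * pixel_size_um / 1000.0, height_px * pixel_size_um / 1000.0

def object_sampling_distance_m(pixel_size_um, focal_length_mm, distance_m):
    """Size of one pixel projected onto a fronto-parallel object plane
    at the given distance (simple pinhole model, no distortion)."""
    return (pixel_size_um * 1e-6) * distance_m / (focal_length_mm * 1e-3)

# Image format as described in the text: 2560 x 1920 pixels, ca. 3.4 um pixels
w_mm, h_mm = sensor_size_mm(2560, 1920, 3.4)
# Hypothetical geometry: 10 mm focal length, 100 m to the niche wall
gsd = object_sampling_distance_m(3.4, 10.0, 100.0)
print(f"sensor ~{w_mm:.1f} x {h_mm:.1f} mm, one pixel ~{gsd * 100:.1f} cm on the object")
```

This gives a sensor of about 8.7 × 6.5 mm. Under the assumed geometry, one pixel covers about 3.4 cm on the wall. The real value scales linearly with the actual camera-to-object distance and inversely with the focal length.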
After the image orientation, three stereo models were set up. Points were measured manually along horizontal profiles, and the main edges were measured as breaklines. This produced a point cloud of ca. 12,000 points. The final textured 3D model is displayed in Figure 7.

5 MOSAICKING AND MAPPING OF THE FRESCOS
The niches of the Bamiyan Buddha statues were rich in paintings, which were partly destroyed earlier in history and ultimately during the explosions. The best way to document and visualize this lost art is to generate an accurate and photo-realistic image-based 3D model.
Figure 7. The 3D textured model of the Great Buddha of Bamiyan and its present empty niche.
In particular, the ceiling of the Big Buddha niche (approximately 15 m in diameter and 16 m deep) was rich in mural paintings of many different colors. They represented Buddha-like figures, brightly colored persons, ornaments, flowers and hanging curtains. From images acquired by tourists in the 1960s and 1970s, we created different mosaics of the paintings. We then used these mosaics for the photo-realistic texture mapping of the 3D model (Remondino & Niederoest 2004).

6 INTEGRATION OF MULTI-RESOLUTION IMAGE-BASED DATA
In recent years a large number of sites and objects have been digitally modeled with different tools, mainly for visualization and documentation. This trend has been driven by the availability and improvement of image and range sensors. It has also been driven by the increasing power of computers for storing, processing and rendering digital data.
The Bamiyan project combines multi-resolution and multi-temporal photogrammetric data, as summarized in Table 1. The geometric resolution of the recovered 3D data spans from 20 m (SPOT-5) to 5 cm (Buddha model), while the texture resolution ranges from 2.5 m (SPOT-5) to 2 mm (fresco).
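The resolution spread can be checked with a few lines of arithmetic. The sketch also converts each ratio into the number of factor-of-two steps that a power-of-two level-of-detail pyramid would need to span it. That conversion is an illustrative assumption about how such data might be organized, not a figure reported by the project.

```python
import math

# Coarsest and finest resolutions quoted in the text, in metres
GEOMETRY_M = {"SPOT-5 DTM": 20.0, "Buddha model": 0.05}
TEXTURE_M = {"SPOT-5 ortho-image": 2.5, "fresco mosaic": 0.002}

def resolution_factor(resolutions):
    """Ratio between the coarsest and the finest resolution in a dict."""
    values = resolutions.values()
    return max(values) / min(values)

def pyramid_levels(factor):
    """Factor-of-two steps needed to span `factor` (hypothetical LOD measure)."""
    return math.ceil(math.log2(factor))

for name, res in (("geometry", GEOMETRY_M), ("texture", TEXTURE_M)):
    f = resolution_factor(res)
    print(f"{name}: factor {f:.0f}, about {pyramid_levels(f)} halvings")
```

The factors come out as 400 for geometry and 1250 for texture. Under the power-of-two assumption, this corresponds to roughly 9 and 11 halvings respectively.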
The geometry resolutions thus differ by a factor of 400 and the texture resolutions by a factor of 1250. The whole triangulated surface model covers an area of ca. 49 × 38 km and contains approximately 35 million triangles, while the texture occupies ca. 2 GB. The fusion of the multi-resolution (and multi-temporal) data is a very complex and critical task. Currently no commercial software can handle all these kinds of data at the same time, mainly for the following reasons:
- the data combines 2.5D and 3D geometry, which limits the use of geodata visualization packages, although these are usually very powerful for large-site textured terrain models;
- the amount of data is too large for graphical rendering and animation packages, even though these can generally handle textured 3D data;
- the high-resolution texture information exceeds the memory capacity of most current graphics cards.
Rendering techniques are therefore needed that maximize the amount of visible data with an optimal use of the rendering power, while maintaining smooth motion during interactive navigation. Towards this goal, Borgeat et al. (2003) developed a multi-resolution representation and display method that integrates aspects of edge-contraction progressive meshes and classical discrete LOD techniques. It aims at multi-resolution rendering with minimal visual artifacts while displaying high-resolution, detailed scenes or 3D objects.

Table 1. Multi-resolution data (geometry and images) used in the Bamiyan project.
Source of data        Year            Image resolution (µm)   Geometry resolution (m)   Texture resolution (m)
IKONOS (a)            2001            -                       5                         1
Rollei (b)            2003            20                      1                         0.5
Sony (b)              2003            4                       0.5                       0.1
[Kostka, 1974] (b)    1970            10                      0.05                      0.01
Frescos (b)           1960s & 1970s   20                      N.A.                      0.002

7 TOURIST INFORMATION SYSTEM
The information recovered from the high-resolution satellite imagery was imported into GIS software (ArcView and ArcGIS) for further analysis, data visualization and the generation of topographic information.
UNESCO has also underlined the value of Geographic Information Systems in heritage management, because a GIS allows:
(1) historical and physical site documentation,
(2) the assessment of physical condition, cultural significance and administrative context,
(3) the preparation of conservation and management strategies, and
(4) the implementation, monitoring and evaluation of management policies (/culture/gis/index.html).
Furthermore, a GIS tool generates permanent records of heritage sites, including text documentation, virtual flyovers and 3D models.
The Bamiyan valley includes 8 protected locations, each identified by an area of interest and a buffer area (/pg.cfm?cid=31&id_site=208). All the areas were mapped and documented within a GIS. Man-made objects (e.g. streets and buildings) and rivers extracted from the IKONOS imagery were added as well. A total of 243 objects were extracted and then overlaid onto the recovered DTM and ortho-image (Fig. 8). Finally, a new plan of the Bamiyan area was produced from the contour lines generated from the DTM and from the extracted objects, since the previous plan had been made by the Russians in the 1970s.
Figure 8. Two views of the 3D model of the Bamiyan area with the extracted rivers and man-made structures (streets, houses and airport) (colour plate).

8 CONCLUSIONS
The Bamiyan project reported here is a complete image-based 3D modeling application that combines multi-resolution geometry and multi-temporal high-resolution images. The modeling of the whole cultural heritage site of Bamiyan required different types of sensors. It produced a detailed terrain model as well as 3D models of other objects.
The 3D data is now used for visualization, animation and documentation, and for the generation of a cultural and tourist information system.
Different commercial packages have been used separately for the photo-realistic rendering and visualization of the generated digital models. Managing and visualizing the whole dataset at once is still problematic, in particular for real-time rendering.
The 3D model of the Great Buddha has been used for the production of a 90-minute movie, "The Giant Buddha", which is planned to be shown in movie theaters late in 2005.

ACKNOWLEDGMENTS
The authors would like to thank Daniela Poli for her help in the satellite image acquisition. We also thank CNES for providing the SPOT-5/HRS images (www.spotimage.fr) at special conditions through the ISIS program (http://medias.obs-mip.fr/isis/?choix_lang=English), and Space Imaging () for providing an IKONOS scene free of charge. We also appreciate the contributions of Natalia Vassilieva in performing photogrammetric measurements, and the work done with Jana Niederoest on the mosaicking of the frescos.

REFERENCES
Borgeat, L., Fortin, P.-A. & Godin, G. 2003. A fast hybrid geomorphing LOD scheme. In Proc. of SIGGRAPH '03, Sketches and Applications, San Diego, CA, USA, 27-31 July (on CD-ROM).
Gruen, A. 1985. Adaptive Least Squares Correlation: A powerful Image Matching Technique. South African Journal of Photogrammetry, Remote Sensing and Cartography 14(3): 175-187.
Gruen, A., Remondino, F. & Zhang, L. 2004. Photogrammetric Reconstruction of the Great Buddha of Bamiyan, Afghanistan. The Photogrammetric Record 19(107): 177-199.
Gruen, A., Zhang, L. & Eisenbeiss, H. 2005. 3D Precision Processing of High-resolution Satellite Imagery. In Proc. of ASPRS Annual Meeting, Baltimore, MD, USA, 7-11 March (on CD-ROM).
Kostka, R. 1974. Die Stereophotogrammetrische Aufnahme des Grossen Buddha in Bamiyan. Afghanistan Journal 3(1): 65-74.
Poli, D., Zhang, L. & Gruen, A. 2004. SPOT-5/HRS Stereo Images Orientation and Automated DSM Generation. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 35, Part B1: 421-432.
Remondino, F. & Niederoest, J. 2004. Generation of high-resolution Mosaic for photo-realistic texture-mapping of cultural heritage 3D models. In Proc. 5th International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST), Brussels, Belgium, 6-10 December: 85-92.
Zhang, L. & Gruen, A. 2004. DSM Generation from Linear Array Imagery. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 35, Part B3: 128-133.
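Classic papers in image processing and computer vision
Preface: Recently, because of my work, I have come across many classic papers I had never heard of before. While marveling at how great these papers are, I also became keenly aware of how narrow my own horizons were.
I tried to find a compilation of classic computer vision papers online, but never found one.
Disappointed, I decided to compile one myself, in the hope that it will help fellow students in the CV field.
Since my own view is rather narrow, there are surely many omissions; consider this list a modest starting point that invites better contributions.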
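Before 1990
1990
1991
1992
1993
1994
1995
1996
1997
1998
In 1998 classic image processing and computer vision papers appeared in a sudden burst.
A new trend started roughly in this year.
Because of increasing competition, good algorithms were often published first at conferences to stake a claim, and then extended into journal papers a year or two later.
1999
2000
At the turn of the century, surveys of all kinds appeared.
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012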
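The reparameterization trick in the DDPM and DDIM algorithms
DDPM (Denoising Diffusion Probabilistic Models) and DDIM (Denoising Diffusion Implicit Models) are generative models based on diffusion processes. They have achieved many breakthrough results in computer vision, image generation and automatic sample generation.
The reparameterization trick is one of their key ingredients. This article analyzes and discusses it in depth for both DDPM and DDIM.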
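I. Overview of the DDPM and DDIM algorithms
1. Overview of DDPM
DDPM is a probabilistic generative model based on a diffusion process. It models the data distribution by learning to reverse a process that gradually diffuses the data into noise.
DDPM exploits the properties of Gaussian distributions. The forward diffusion adds Gaussian noise in small steps, so every intermediate state is Gaussian given the data. A neural network is then trained to undo these steps, and running the learned reverse steps generates image data.
The core idea of DDPM is to treat a data sample as the starting point of a diffusion trajectory. The forward process gradually turns the data into pure Gaussian noise, and generation simulates the learned reverse trajectory from noise back to data.
By modeling and estimating this diffusion process, DDPM captures the characteristics of the data distribution and can generate high-quality images.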
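The forward diffusion can be checked numerically. In the minimal NumPy sketch below, each step is x_t = sqrt(1 - β_t)·x_{t-1} + sqrt(β_t)·ε. The variance propagated step by step is compared with the closed form ᾱ_t·Var(x_0) + (1 - ᾱ_t), where ᾱ_t is the product of (1 - β_s) over s ≤ t. The linear schedule from 1e-4 to 0.02 over T = 1000 steps follows the original DDPM paper (Ho et al. 2020). It is an assumption here, since the text above does not specify a schedule.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T (linear schedule, assumed)."""
    return np.linspace(beta_start, beta_end, T)

def marginal_variance_stepwise(betas, var0=1.0):
    """Propagate Var(x_t) through x_t = sqrt(1 - b_t) x_{t-1} + sqrt(b_t) eps, one step at a time."""
    v = var0
    out = np.empty_like(betas)
    for i, b in enumerate(betas):
        v = (1.0 - b) * v + b          # independent Gaussian noise: variances add
        out[i] = v
    return out

def marginal_variance_closed_form(betas, var0=1.0):
    """Same quantity from alpha_bar_t = prod_{s<=t} (1 - b_s)."""
    alpha_bar = np.cumprod(1.0 - betas)
    return alpha_bar * var0 + (1.0 - alpha_bar)
```

With Var(x_0) = 1 the variance stays exactly 1 at every step, which is why this process is called variance-preserving. Under this schedule ᾱ_T falls below 10⁻⁴, so x_T is essentially pure Gaussian noise. Generation starts from that noise.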
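2. Overview of DDIM
DDIM is an implicit generative model built on the same diffusion framework.
Unlike DDPM, DDIM replaces the Markovian forward process with a non-Markovian one that has the same marginal distributions q(x_t | x_0). As a result, a network trained with the DDPM objective can be reused unchanged.
The core idea of DDIM is that, with this non-Markovian process, the sampling noise can be set to zero. Each reverse step then becomes a deterministic mapping from x_t to an earlier state, and many timesteps can be skipped.
Because sampling is deterministic, the initial noise x_T acts as a latent variable for the generated image. DDIM can also generate samples in far fewer steps than DDPM.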
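The deterministic DDIM update described above can be written in a few lines. This is a minimal NumPy sketch of the η = 0 sampler. It assumes a network that predicts the noise ε, represented here by a hypothetical `eps_model(x, t)` callable, and a precomputed array of ᾱ_t values. Each step first estimates x_0 from the predicted noise, then re-noises that estimate to an earlier timestep along the same noise direction, so intermediate timesteps can be skipped.

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) from timestep t to an earlier timestep.

    alpha_bar_prev = 1.0 means stepping all the way to x_0.
    """
    # Clean sample implied by the current noise estimate
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise the estimate to the earlier level along the same noise direction
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(eps_model, x_T, alpha_bar, timesteps):
    """Run DDIM over a decreasing list of timesteps (indices into alpha_bar)."""
    x = x_T
    for i, t in enumerate(timesteps):
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bar[t], ab_prev)
    return x
```

Passing a short, decreasing subsequence of timesteps, for example 50 out of 1000, is what lets DDIM sample much faster than DDPM's step-by-step ancestral sampler.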
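II. The reparameterization trick in DDPM and DDIM
1. The reparameterization trick in DDPM
The reparameterization trick is one of the keys to DDPM. It expresses random variables as differentiable functions of independent noise, which makes training and inference tractable.
Because every forward step adds Gaussian noise, the noisy sample at any timestep can be written in closed form as x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε with ε ~ N(0, I). Here ᾱ_t is the cumulative product of (1 − β_s) over the noise schedule up to step t. The network ε_θ(x_t, t) is then trained to predict the injected noise ε.
The core of the trick is to split sampling from a distribution into a deterministic transformation plus an independent random noise variable. The sampling step thereby becomes differentiable with respect to the quantities the transformation depends on.
With this formulation, DDPM can draw a training example for any timestep in a single step, without simulating the chain. The model is trained end to end with a simple mean-squared-error loss on the noise, which improves training efficiency.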
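The closed-form, reparameterized sampling and the resulting training objective can be sketched as follows. This minimal NumPy version assumes an ε-prediction network, represented here by a hypothetical `eps_model(x_t, t)` callable, and shows only the loss computation. In an automatic-differentiation framework, the same expression would be differentiated with respect to the network parameters.

```python
import numpy as np

def q_sample(x0, alpha_bar_t, eps):
    """Reparameterized draw from q(x_t | x_0) = N(sqrt(ab) x_0, (1 - ab) I).

    All randomness is in eps ~ N(0, I); given eps, x_t is a deterministic
    function of x_0, so any timestep can be sampled in one shot.
    """
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def simple_loss(eps_model, x0, t, alpha_bar, rng):
    """DDPM 'simple' objective: mean squared error between injected and predicted noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, alpha_bar[t], eps)
    return np.mean((eps - eps_model(x_t, t)) ** 2)
```

Because t can be drawn uniformly at random for each training example, training never needs to run the forward chain step by step.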
The importance of computers (English essay)
Title: The Importance of Computers
In today's digital age, the importance of computers cannot be overstated. From personal use to business operations and scientific research, computers have become an indispensable part of our lives. In this essay, we will explore the multifaceted significance of computers in various aspects of modern society.
Firstly, let us consider the realm of education. Computers have revolutionized the way we learn and acquire knowledge. With access to the internet, students can explore vast amounts of information, conduct research, and collaborate with peers from around the world. Educational software and online courses have made learning more engaging and accessible, catering to diverse learning styles and needs. Moreover, interactive multimedia resources enhance understanding and retention of complex concepts. Thus, computers play a pivotal role in modern education, empowering learners and educators alike.
Moving on to the realm of commerce and industry, computers are the backbone of operations in virtually every sector. From managing inventory and processing transactions to analyzing market trends and communicating with stakeholders, businesses rely on computers for efficiency and competitiveness. Moreover, e-commerce platforms have transformed the way goods and services are bought and sold, expanding market reach and streamlining transactions. Additionally, computer-aided design (CAD) and manufacturing (CAM) technologies have revolutionized product development and production processes, driving innovation and quality improvement. Therefore, computers are indispensable tools for driving economic growth and progress.
Furthermore, computers have revolutionized communication and social interaction. With the advent of email, social media, and instant messaging platforms, people can connect and communicate across vast distances instantaneously.
Social networking sites have facilitated the formation of virtual communities based on shared interests and affiliations, fostering connections and collaborations beyond geographical boundaries. Moreover, video conferencing and online collaboration tools have transformed the way teams work together, enabling remote work and global collaboration. Thus, computers have redefined the dynamics of human interaction, making the world more interconnected than ever before.
In the field of healthcare, computers play a crucial role in diagnosis, treatment, and research. Electronic health records (EHRs) streamline patient information management, ensuring continuity of care and facilitating data-driven decision-making. Medical imaging technologies, such as MRI and CT scans, rely on computer algorithms for image reconstruction and analysis, aiding in the detection and diagnosis of diseases. Moreover, computational modeling and simulation techniques enable researchers to study biological processes at the molecular level, leading to breakthroughs in drug discovery and personalized medicine. Therefore, computers are instrumental in advancing healthcare delivery and improving patient outcomes.
Lastly, computers have transformed entertainment and leisure activities. From streaming movies and music to playing video games and engaging in virtual reality experiences, computers provide endless entertainment options for people of all ages. Moreover, digital creativity tools empower individuals to express themselves through art, music, and multimedia projects, fostering creativity and self-expression. Additionally, online gaming communities and virtual worlds offer opportunities for socializing and collaboration in immersive digital environments. Thus, computers enrich our lives by providing avenues for relaxation, entertainment, and creative expression.
In conclusion, the importance of computers in modern society cannot be overstated.
From education and commerce to communication and healthcare, computers play a pivotal role in driving progress and innovation across various domains. As we continue to embrace technological advancements, it is essential to harness the power of computers responsibly and ethically for the betterment of humanity.
Guang-Zhong Yang's laboratory at the Hamlyn Centre
Predictive Cardiac Motion Modeling and Correction with PLSR
Predictive cardiac motion modeling and correction based on partial least squares regression (PLSR). PLSR is used to extract the intrinsic relationships between three-dimensional (3D) cardiac deformation due to respiration and multiple one-dimensional surface intensity traces at the chest or abdomen, which can be measured in real time. See IEEE TMI 23(10), 2004.
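PLSR suits this problem because the surface traces are few, noisy and strongly correlated. It regresses the deformation on a small number of latent components chosen to maximize the covariance between the two data sets. Below is a minimal NumPy sketch of the classical NIPALS algorithm for multi-output PLS (PLS2). It assumes that each row is one time frame, X holds the surface-trace samples and Y the parameters of the 3D deformation model. It is a generic textbook formulation, not the code used in the cited IEEE TMI work, and all names are hypothetical.

```python
import numpy as np

def pls_fit(X, Y, n_components, max_iter=500, tol=1e-12):
    """NIPALS PLS2. X: (n, p) predictors, Y: (n, q) responses.
    Returns the centring vectors and the regression matrix B (p, q)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean                # residual (deflated) matrices
    p, q = X.shape[1], Y.shape[1]
    W = np.zeros((p, n_components))              # X weights
    P = np.zeros((p, n_components))              # X loadings
    C = np.zeros((q, n_components))              # Y loadings
    for k in range(n_components):
        u = F[:, np.argmax(F.var(axis=0))]       # start from the most variable response
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)
            t = E @ w                            # X score
            c = F.T @ t / (t @ t)
            u_new = F @ c / (c @ c)              # Y score
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p_k = E.T @ t / (t @ t)
        E -= np.outer(t, p_k)                    # deflate X
        F -= np.outer(t, c)                      # deflate Y
        W[:, k], P[:, k], C[:, k] = w, p_k, c
    B = W @ np.linalg.solve(P.T @ W, C.T)        # B = W (P^T W)^-1 C^T
    return {"x_mean": x_mean, "y_mean": y_mean, "B": B}

def pls_predict(model, X):
    """Predict responses (e.g. deformation parameters) from new trace samples."""
    return (np.asarray(X, dtype=float) - model["x_mean"]) @ model["B"] + model["y_mean"]
```

In use, the model would be fitted on frames where both the traces and the imaged deformation are available. `pls_predict` would then estimate the deformation from the traces alone. The number of components is the main design choice. Too few underfit the motion, while using as many components as there are independent traces reduces PLSR to ordinary least squares.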
Myocardial Strain and Strain Rate Analysis
Virtual tagging with MR myocardial velocity mapping (IEEE TMI).
Strain rate analysis with constrained myocardial velocity restoration.
Review of methods for measuring intrinsic myocardial mechanics (JMRI).
Atheroma Imaging and Analysis
The use of selective volume excitation for high-resolution vessel wall imaging (JMRI, 2003;17(5):572-80).
3D morphological modeling of the arterial wall.
Feature-reduction-based atheroma classification.
Volume Selective Coronary Imaging
A locally focused MR imaging method for 3D zonal echo-planar coronary angiography using volume-selective RF excitation. Spatially variable resolution was used to delineate the coronary arteries and to reduce the effect of residual signals caused by the imperfect excitation profile of the RF pulse. Variable resolution enabled the derivation of basis functions whose spatial characteristics match regional object details. As a result, significantly fewer phase-encoded signal measurements were needed for image reconstruction.
Gatehouse PD, Keegan J, Yang GZ, Firmin DN. Magn Reson Med 2001 Nov;46(5):1031-6.
Yang GZ, Burger P, Gatehouse PD, Firmin DN. Magn Reson Med 1999;41:171-178.
Yang GZ, Gatehouse PD, Keegan J, Mohiaddin RH, Firmin DN. Magn Reson Med 1998;39:833-842.
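Velocity mapping measures the myocardial velocity field directly, so strain rate follows from its spatial derivatives. The strain-rate tensor is the symmetric part of the velocity gradient, D = ½(∇v + ∇vᵀ). The sketch below computes it for an in-plane 2D velocity field on a regular grid using finite differences (`numpy.gradient`). It assumes arrays indexed [row = y, column = x] with pixel spacings dx and dy. The constrained velocity restoration and virtual tagging steps listed above are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def strain_rate_2d(vx, vy, dx=1.0, dy=1.0):
    """Strain-rate tensor components for in-plane velocities on a grid indexed [y, x]."""
    dvx_dy, dvx_dx = np.gradient(vx, dy, dx)   # derivatives along rows (y), then columns (x)
    dvy_dy, dvy_dx = np.gradient(vy, dy, dx)
    exx = dvx_dx
    eyy = dvy_dy
    exy = 0.5 * (dvx_dy + dvy_dx)              # symmetric (shear) part of the gradient
    return exx, eyy, exy

def principal_strain_rates(exx, eyy, exy):
    """Eigenvalues of the symmetric 2x2 tensor at each pixel (largest first)."""
    mean = 0.5 * (exx + eyy)
    radius = np.sqrt((0.5 * (exx - eyy)) ** 2 + exy ** 2)
    return mean + radius, mean - radius
```

The antisymmetric part of the velocity gradient (rigid rotation) is discarded by construction. This is why strain rate, unlike raw velocity, reflects intrinsic myocardial deformation rather than whole-heart motion.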
3D modeling methods based on unordered images
Int J Comput Vis
DOI 10.1007/s11263-007-0107-3
Modeling the World from Internet Photo Collections
Noah Snavely · Steven M. Seitz · Richard Szeliski
Received: 30 January 2007 / Accepted: 31 October 2007
© Springer Science+Business Media, LLC 2007
N. Snavely · S.M. Seitz: University of Washington, Seattle, WA, USA, e-mail: snavely@
R. Szeliski: Microsoft Research, Redmond, WA, USA
Abstract There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like "Notre Dame" or "Trevi Fountain." This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world's well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community.
Keywords Structure from motion · 3D scene analysis · Internet imagery · Photo browsers · 3D navigation
1 Introduction
Most of the world's significant sites have been photographed under many different conditions, both from the ground and from the air. For example, a Google image search for "Notre Dame" returns over one million hits (as of September 2007), showing the cathedral from almost every conceivable viewing position and angle, different times of day and night, and changes in season, weather, and decade. Furthermore, entire cities are now being captured at street level and from a bird's-eye perspective (e.g., Windows Live Local¹,² and Google Streetview³), and from satellite or aerial views (e.g., Google⁴). The availability of such rich imagery of large parts of the earth's surface under many different viewing conditions presents
enormous opportunities, both in computer vision research and for practical applications. From the standpoint of shape modeling research, Internet imagery presents the ultimate data set, which should enable modeling a significant portion of the world's surface geometry at high resolution. As the largest, most diverse set of images ever assembled, Internet imagery provides deep insights into the space of natural images and a rich source of statistics and priors for modeling scene appearance. Furthermore, Internet imagery provides an ideal test bed for developing robust and general computer vision algorithms that can work effectively "in the wild." In turn, algorithms that operate effectively on such imagery will enable a host of important applications, ranging from 3D visualization, localization, communication (media sharing), and recognition, that go well beyond traditional computer vision problems and can have broad impacts for the population at large.
To date, this imagery is almost completely untapped and unexploited by computer vision researchers. A major reason is that the imagery is not in a form that is amenable to processing, at least by traditional methods: the images are unorganized, uncalibrated, with widely variable and uncontrolled illumination, resolution, and image quality. Developing computer vision techniques that can operate effectively with such imagery has been a major challenge for the research community. Within this scope, one key challenge is registration, i.e., figuring out correspondences between images, and how they relate to one another in a common 3D coordinate system (structure from motion). While a lot of progress has been made in these areas in the last two decades (Sect. 2), many challenging open problems remain.
¹ Windows Live Local.
² Windows Live Local - Virtual Earth Technology Preview, http://
³ Google Maps.
⁴ Google Maps.
In this paper we focus on the problem of geometrically registering Internet imagery and a number of applications that this
enables. As such, we first review the state of the art and then present some first steps towards solving this problem along with a visualization front-end that we call Photo Tourism (Snavely et al. 2006). We then present a set of open research problems for the field, including the creation of more efficient correspondence and reconstruction techniques for extremely large image data sets. This paper expands on the work originally presented in (Snavely et al. 2006) with many new reconstructions and visualizations of algorithm behavior across datasets, as well as a brief discussion of Photosynth, a Technology Preview by Microsoft Live Labs, based largely on (Snavely et al. 2006). We also present a more complete related work section and add a broad discussion of open research challenges for the field. Videos of our system, along with additional supplementary material, can be found on our Photo Tourism project Web site, .
2 Previous Work
The last two decades have seen a dramatic increase in the capabilities of 3D computer vision algorithms. These include advances in feature correspondence, structure from motion, and image-based modeling. Concurrently, image-based rendering techniques have been developed in the computer graphics community, and image browsing techniques have been developed for multimedia applications.
2.1 Feature Correspondence
Twenty years ago, the foundations of modern feature detection and matching techniques were being laid. Lucas and Kanade (1981) had developed a patch tracker based on two-dimensional image statistics, while Moravec (1983) introduced the concept of "corner-like" feature points. Förstner (1986) and then Harris and Stephens (1988) both proposed finding keypoints using measures based on eigenvalues of smoothed outer products of gradients, which are still widely used today. While these early techniques detected keypoints at a single scale, modern techniques use a quasi-continuous sampling of scale space to detect points invariant to changes in scale and
orientation (Lowe 2004; Mikolajczyk and Schmid 2004) and somewhat invariant to affine transformations (Baumberg 2000; Kadir and Brady 2001; Schaffalitzky and Zisserman 2002; Mikolajczyk et al. 2005). Unfortunately, early techniques relied on matching patches around the detected keypoints, which limited their range of applicability to scenes seen from similar viewpoints, e.g., for aerial photogrammetry applications (Hannah 1988). If features are being tracked from frame to frame, an affine extension of the basic Lucas-Kanade tracker has been shown to perform well (Shi and Tomasi 1994). However, for true wide baseline matching, i.e., the automatic matching of images taken from widely different views (Baumberg 2000; Schaffalitzky and Zisserman 2002; Strecha et al. 2003; Tuytelaars and Van Gool 2004; Matas et al. 2004), (weakly) affine-invariant feature descriptors must be used.

Mikolajczyk et al. (2005) review some recently developed view-invariant local image descriptors and experimentally compare their performance. In our own Photo Tourism research, we have been using Lowe’s Scale Invariant Feature Transform (SIFT) (Lowe 2004), which is widely used by others and is known to perform well over a reasonable range of viewpoint variation.

2.2 Structure from Motion

The late 1980s also saw the development of effective structure from motion techniques, which aim to simultaneously reconstruct the unknown 3D scene structure and camera positions and orientations from a set of feature correspondences. While Longuet-Higgins (1981) introduced a still widely used two-frame relative orientation technique in 1981, the development of multi-frame structure from motion techniques, including factorization methods (Tomasi and Kanade 1992) and global optimization techniques (Spetsakis and Aloimonos 1991; Szeliski and Kang 1994; Oliensis 1999), occurred quite a bit later.

More recently, related techniques from photogrammetry such as bundle adjustment (Triggs et al. 1999) (with related sparse matrix techniques, Szeliski and Kang 1994) have made their
way into computer vision and are now regarded as the gold standard for performing optimal 3D reconstruction from correspondences (Hartley and Zisserman 2004).

For situations where the camera calibration parameters are unknown, self-calibration techniques, which first estimate a projective reconstruction of the 3D world and then perform a metric upgrade, have proven to be successful (Pollefeys et al. 1999; Pollefeys and Van Gool 2002). In our own work (Sect. 4.2), we have found that the simpler approach of estimating each camera’s focal length as part of the bundle adjustment process seems to produce good results.

The SfM approach used in this paper is similar to that of Brown and Lowe (2005), with several modifications to improve robustness over a variety of data sets. These include initializing new cameras using pose estimation, to help avoid local minima; a different heuristic for selecting the initial two images for SfM; checking that reconstructed points are well-conditioned before adding them to the scene; and using focal length information from image EXIF tags. Schaffalitzky and Zisserman (2002) present another related technique for reconstructing unordered image sets, concentrating on efficiently matching interest points between images. Vergauwen and Van Gool have developed a similar approach (Vergauwen and Van Gool 2006) and are hosting a web-based reconstruction service for use in cultural heritage applications.5 Fitzgibbon and Zisserman (1998) and Nistér (2000) prefer a bottom-up approach, where small subsets of images are matched to each other and then merged in an agglomerative fashion into a complete 3D reconstruction.
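EXIF stores the focal length in millimetres, but SfM needs it in pixels. The conversion below is a standard one and is offered only as a sketch. It assumes the physical sensor (CCD) width is known, for example from a camera-model lookup table, which is hypothetical here. It also assumes that the image width and the sensor width refer to the same side of the sensor. The function name and the choice to return `None` when data is missing are illustrative, not taken from the paper.

```python
from typing import Optional


def exif_focal_to_pixels(
    focal_mm: Optional[float],
    image_width_px: int,
    ccd_width_mm: Optional[float],
) -> Optional[float]:
    """Convert an EXIF focal length (mm) into pixels.

    Assumes image_width_px and ccd_width_mm describe the same sensor side.
    Returns None when a value is missing or invalid, i.e. when no focal
    length initialization is available for this camera.
    """
    if focal_mm is None or ccd_width_mm is None:
        return None
    if focal_mm <= 0 or ccd_width_mm <= 0 or image_width_px <= 0:
        return None
    # f_px = f_mm * (pixels per millimetre along the image width)
    return focal_mm * image_width_px / ccd_width_mm


# A 10 mm lens on a 5 mm wide sensor, image 1000 pixels wide.
print(exif_focal_to_pixels(10.0, 1000, 5.0))  # 2000.0
```

EXIF values are sometimes inaccurate. An estimate like this would therefore only seed the focal length, which bundle adjustment then refines.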
While all of these approaches address the same SfM problem that we do, they were tested on much simpler datasets with more limited variation in imaging conditions. Our paper marks the first successful demonstration of SfM techniques applied to the kinds of real-world image sets found on Google and Flickr. For instance, a typical image set of ours contains photos from hundreds of different cameras, with differing zoom levels, resolutions, times of day or seasons, illumination, weather, and amounts of occlusion.

2.3 Image-Based Modeling

In recent years, computer vision techniques such as structure from motion and model-based reconstruction have gained traction in the computer graphics field under the name of image-based modeling (IBM). IBM is the process of creating three-dimensional models from a collection of input images (Debevec et al. 1996; Grzeszczuk 2002; Pollefeys et al. 2004).

One particular application of IBM has been the creation of large scale architectural models. Notable examples include the semi-automatic Façade system (Debevec et al. 1996), which was used to reconstruct compelling fly-throughs of the University of California Berkeley campus; automatic architecture reconstruction systems such as that of Dick et al. (2004); and the MIT City Scanning Project (Teller et al. 2003), which captured thousands of calibrated images from an instrumented rig to construct a 3D model of the MIT campus. There are also several ongoing academic and commercial projects focused on large-scale urban scene reconstruction. These efforts include the 4D Cities project (Schindler et al. 2007), which aims to create a spatial-temporal model of Atlanta from historical photographs; the Stanford CityBlock Project (Román et al. 2004), which uses video of city blocks to create multi-perspective strip images; and the UrbanScape project of Akbarzadeh et al. (2006).

5 Epoch3D Webservice, http://homes.esat.kuleuven.be/~visit3d/webservice/html/.
Our work differs from these previous approaches in that we only reconstruct a sparse 3D model of the world, since our emphasis is more on creating smooth 3D transitions between photographs than on interactively visualizing a 3D world.

2.4 Image-Based Rendering

The field of image-based rendering (IBR) is devoted to the problem of synthesizing new views of a scene from a set of input photographs. A forerunner to this field was the groundbreaking Aspen MovieMap project (Lippman 1980), in which thousands of images of Aspen, Colorado were captured from a moving car, registered to a street map of the city, and stored on laserdisc. A user interface enabled interactively moving through the images as a function of the desired path of the user. Additional features included a navigation map of the city overlaid on the image display, and the ability to touch any building in the current field of view and jump to a facade of that building. The system also allowed attaching metadata such as restaurant menus and historical images to individual buildings. Recently, several companies, such as Google6 and EveryScape7, have begun creating similar “surrogate travel” applications that can be viewed in a web browser. Our work can be seen as a way to automatically create MovieMaps from unorganized collections of images. (In contrast, the Aspen MovieMap involved a team of over a dozen people working over a few years.) A number of our visualization, navigation, and annotation capabilities are similar to those in the original MovieMap work, but in an improved and generalized form.

More recent work in IBR has focused on techniques for new view synthesis, e.g., (Chen and Williams 1993; McMillan and Bishop 1995; Gortler et al. 1996; Levoy and Hanrahan 1996; Seitz and Dyer 1996; Aliaga et al. 2003; Zitnick et al. 2004; Buehler et al. 2001). In terms of applications, Aliaga et al.’s (2003) Sea of Images work is perhaps closest to ours in its use of a large collection of images taken throughout an architectural space; the same authors
address the problem of computing consistent feature matches across multiple images for the purposes of IBR (Aliaga et al. 2003). However, our images are casually acquired by different photographers, rather than being taken on a fixed grid with a guided robot.

In contrast to most prior work in IBR, our objective is not to synthesize a photo-realistic view of the world from all viewpoints per se, but to browse a specific collection of photographs in a 3D spatial context that gives a sense of the geometry of the underlying scene. Our approach therefore uses an approximate plane-based view interpolation method and a non-photorealistic rendering of background scene structures. As such, we side-step the more challenging problems of reconstructing full surface models (Debevec et al. 1996; Teller et al. 2003), light fields (Gortler et al. 1996; Levoy and Hanrahan 1996), or pixel-accurate view interpolations (Chen and Williams 1993; McMillan and Bishop 1995; Seitz and Dyer 1996; Zitnick et al. 2004). The benefit of doing this is that we are able to operate robustly with input imagery that is beyond the scope of previous IBM and IBR techniques.

6 Google Maps, .
7 Everyscape, .

2.5 Image Browsing, Retrieval, and Annotation

There are many techniques and commercial products for browsing sets of photos, and much research on how people tend to organize photos, e.g., (Rodden and Wood 2003). Many of these techniques use metadata, such as keywords, photographer, or time, as a basis for photo organization (Cooper et al. 2003).

There has recently been growing interest in using geo-location information to facilitate photo browsing. In particular, the World-Wide Media Exchange (WWMX) (Toyama et al. 2003) arranges images on an interactive 2D map. PhotoCompas (Naaman et al. 2004) clusters images based on time and location. RealityFlythrough (McCurdy and Griswold 2005) uses interface ideas similar to ours for exploring video from camcorders instrumented with GPS and tilt sensors, and Kadobayashi and Tanaka (2005) present
an interface for retrieving images using proximity to a virtual camera. In Photowalker (Tanaka et al. 2002), a user can manually author a walkthrough of a scene by specifying transitions between pairs of images in a collection. In these systems, location is obtained from GPS or is manually specified. Because our approach does not require GPS or other instrumentation, it has the advantage of being applicable to existing image databases and photographs from the Internet. Furthermore, many of the navigation features of our approach exploit the computation of image feature correspondences and sparse 3D geometry, and therefore go beyond what has been possible in these previous location-based systems.

Many techniques also exist for the related task of retrieving images from a database. One particular system related to our work is Video Google (Sivic and Zisserman 2003) (not to be confused with Google’s own video search), which allows a user to select a query object in one frame of video and efficiently find that object in other frames. Our object-based navigation mode uses a similar idea, but extended to the 3D domain.

A number of researchers have studied techniques for automatic and semi-automatic image annotation, and annotation transfer in particular. The LOCALE system (Naaman et al. 2003) uses proximity to transfer labels between geo-referenced photographs. An advantage of the annotation capabilities of our system is that our feature correspondences enable transfer at much finer granularity; we can transfer annotations of specific objects and regions between images, taking into account occlusions and the motions of these objects under changes in viewpoint. This goal is similar to that of augmented reality (AR) approaches (e.g., Feiner et al.
1997), which also seek to annotate images. While most AR methods register a 3D computer-generated model to an image, we instead transfer 2D image annotations to other images. Generating annotation content is therefore much easier. (We can, in fact, import existing annotations from popular services like Flickr.) Annotation transfer has also been explored for video sequences (Irani and Anandan 1998).

Finally, Johansson and Cipolla (2002) have developed a system where a user can take a photograph, upload it to a server where it is compared to an image database, and receive location information. Our system also supports this application in addition to many other capabilities (visualization, navigation, annotation, etc.).

3 Overview

Our objective is to geometrically register large photo collections from the Internet and other sources, and to use the resulting 3D camera and scene information to facilitate a number of applications in visualization, localization, image browsing, and other areas. This section provides an overview of our approach and summarizes the rest of the paper.

The primary technical challenge is to robustly match and reconstruct 3D information from hundreds or thousands of images that exhibit large variations in viewpoint, illumination, weather conditions, resolution, etc., and may contain significant clutter and outliers. This kind of variation is what makes Internet imagery (i.e., images returned by Internet image search queries from sites such as Flickr and Google) so challenging to work with.

In tackling this problem, we take advantage of two recent breakthroughs in computer vision, namely feature matching and structure from motion, as reviewed in Sect. 2. The backbone of our work is a robust SfM approach that reconstructs 3D camera positions and sparse point geometry for large datasets and has yielded reconstructions for dozens of famous sites ranging from Notre Dame Cathedral to the Great Wall of China. Section 4 describes this approach in detail, as well as methods for aligning
reconstructions to satellite and map data to obtain geo-referenced camera positions and geometry.

One of the most exciting applications for these reconstructions is 3D scene visualization. However, the sparse points produced by SfM methods are by themselves very limited and do not directly produce compelling scene renderings. Nevertheless, we demonstrate that this sparse SfM-derived geometry and camera information, along with morphing and non-photorealistic rendering techniques, is sufficient to provide compelling view interpolations, as described in Sect. 5. Leveraging this capability, Sect. 6 describes a novel photo explorer interface for browsing large collections of photographs in which the user can virtually explore the 3D space by moving from one image to another.

Often, we are interested in learning more about the content of an image, e.g., “which statue is this?” or “when was this building constructed?” A great deal of annotated image content of this form already exists in guidebooks, maps, and Internet resources such as Wikipedia8 and Flickr. However, the image you may be viewing at any particular time (e.g., from your cell phone camera) may not have such annotations. A key feature of our system is the ability to transfer annotations automatically between images, so that information about an object in one image is propagated to all other images that contain the same object (Sect. 7).

Section 8 presents extensive results on 11 scenes, with visualizations and an analysis of the matching and reconstruction results for these scenes. We also briefly describe Photosynth, a related 3D image browsing tool developed by Microsoft Live Labs that is based on techniques from this paper, but also adds a number of interesting new elements.
Finally, we conclude with a set of research challenges for the community in Sect. 9.

4 Reconstructing Cameras and Sparse Geometry

The visualization and browsing components of our system require accurate information about the relative location, orientation, and intrinsic parameters such as focal lengths for each photograph in a collection, as well as sparse 3D scene geometry. A few features of our system require the absolute locations of the cameras, in a geo-referenced coordinate frame. Some of this information can be provided with GPS devices and electronic compasses, but the vast majority of existing photographs lack such information. Many digital cameras embed focal length and other information in the EXIF tags of image files. These values are useful for initialization, but are sometimes inaccurate.

In our system, we do not rely on the camera or any other piece of equipment to provide us with location, orientation, or geometry. Instead, we compute this information from the images themselves using computer vision techniques. We first detect feature points in each image, then match feature points between pairs of images, and finally run an iterative, robust SfM procedure to recover the camera parameters. Because SfM only estimates the relative position of each camera, and we are also interested in absolute coordinates (e.g., latitude and longitude), we use an interactive technique to register the recovered cameras to an overhead map. Each of these steps is described in the following subsections.

8 Wikipedia, .

4.1 Keypoint Detection and Matching

The first step is to find feature points in each image. We use the SIFT keypoint detector (Lowe 2004), because of its good invariance to image transformations. Other feature detectors could also potentially be used; several detectors are compared in the work of Mikolajczyk et al. (2005). In addition to the keypoint locations themselves, SIFT provides a local descriptor for each keypoint. A typical image contains several thousand SIFT keypoints.

Next, for each pair of
images, we match keypoint descriptors between the pair, using the approximate nearest neighbors (ANN) kd-tree package of Arya et al. (1998). To match keypoints between two images I and J, we create a kd-tree from the feature descriptors in J, then, for each feature in I, we find the nearest neighbor in J using the kd-tree. For efficiency, we use ANN’s priority search algorithm, limiting each query to visit a maximum of 200 bins in the tree. Rather than classifying false matches by thresholding the distance to the nearest neighbor, we use the ratio test described by Lowe (2004): for a feature descriptor in I, we find the two nearest neighbors in J, with distances d1 and d2, then accept the match if d1/d2 < 0.6. If more than one feature in I matches the same feature in J, we remove all of these matches, as some of them must be spurious.

After matching features for an image pair (I, J), we robustly estimate a fundamental matrix for the pair using RANSAC (Fischler and Bolles 1981). During each RANSAC iteration, we compute a candidate fundamental matrix using the eight-point algorithm (Hartley and Zisserman 2004), normalizing the problem to improve robustness to noise (Hartley 1997). We set the RANSAC outlier threshold to be 0.6% of the maximum image dimension, i.e., 0.006 max(image width, image height) (about six pixels for a 1024×768 image). The F-matrix returned by RANSAC is refined by running the Levenberg-Marquardt algorithm (Nocedal and Wright 1999) on the eight parameters of the F-matrix, minimizing errors for all the inliers to the F-matrix.
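The ratio test and the removal of many-to-one matches described above can be sketched as follows. This is a minimal illustration only. A brute-force nearest-neighbour search stands in for the ANN kd-tree and its 200-bin priority search, and descriptors are assumed to be plain numeric sequences. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def match_ratio_test(
    desc_i: Sequence[Sequence[float]],
    desc_j: Sequence[Sequence[float]],
    ratio: float = 0.6,
) -> Dict[int, int]:
    """Match descriptors of image I to image J using Lowe's ratio test.

    Brute-force search stands in for the ANN kd-tree. Returns a dict that
    maps a feature index in I to a feature index in J.
    """
    if len(desc_j) < 2:
        return {}  # the ratio test needs two neighbours in J
    candidates: Dict[int, int] = {}
    for i, d in enumerate(desc_i):
        # Distances from feature i to every feature in J, closest first.
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_j))
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up.
        if d2 > 0 and d1 / d2 < ratio:
            candidates[i] = j1
    # If several features in I claim the same feature in J, drop all of them.
    hits = Counter(candidates.values())
    return {i: j for i, j in candidates.items() if hits[j] == 1}


J = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
I = [(0.1, 0.0), (9.9, 0.0), (5.0, 5.0)]
print(match_ratio_test(I, J))  # {0: 0, 1: 1}
```

Feature 2 of I is equidistant from all of J's features, so its ratio is 1 and it is rejected. This is exactly the ambiguous case the ratio test is meant to filter out.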
Finally, we remove matches that are outliers to the recovered F-matrix using the above threshold. If the number of remaining matches is less than twenty, we remove all of the matches from consideration.

After finding a set of geometrically consistent matches between each image pair, we organize the matches into tracks, where a track is a connected set of matching keypoints across multiple images. If a track contains more than one keypoint in the same image, it is deemed inconsistent. We keep consistent tracks containing at least two keypoints for the next phase of the reconstruction procedure.

Fig. 1 Photo connectivity graph. This graph contains a node for each image in a set of photos of the Trevi Fountain, with an edge between each pair of photos with matching features. The size of a node is proportional to its degree. There are two dominant clusters corresponding to day (a) and night time (d) photos. Similar views of the facade cluster together in the center, while nodes in the periphery, e.g., (b) and (c), are more unusual (often close-up) views.

Once correspondences are found, we can construct an image connectivity graph, in which each image is a node and an edge exists between any pair of images with matching features. A visualization of an example connectivity graph for the Trevi Fountain is shown in Fig. 1. This graph embedding was created with the neato tool in the Graphviz toolkit.9 Neato represents the graph as a mass-spring system and solves for an embedding whose energy is a local minimum.

The image connectivity graph of this photo set has several distinct features. The large, dense cluster in the center of the graph consists of photos that are all fairly wide-angle, frontal, well-lit shots of the fountain (e.g., image (a)).
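Linking pairwise matches into tracks is a connected-components problem. The sketch below solves it with a union-find structure over (image, keypoint) nodes. It is an illustration under stated assumptions, not the paper's code. The input format is hypothetical: a dict from an image pair to its list of F-matrix-consistent keypoint matches. The "fewer than twenty matches" rule from the text is applied per pair before linking.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (image index, keypoint index)


def build_tracks(
    pair_matches: Dict[Tuple[int, int], Sequence[Tuple[int, int]]],
    min_matches: int = 20,
) -> List[List[Node]]:
    """Link pairwise keypoint matches into consistent multi-image tracks."""
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: Node, b: Node) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for (img_a, img_b), matches in pair_matches.items():
        if len(matches) < min_matches:
            continue  # pairs with too few matches are dropped entirely
        for kp_a, kp_b in matches:
            union((img_a, kp_a), (img_b, kp_b))

    # Group keypoints by connected component (one component = one track).
    components: Dict[Node, List[Node]] = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)

    tracks = []
    for nodes in components.values():
        images = [img for img, _ in nodes]
        # Inconsistent if the track holds two keypoints of the same image.
        if len(nodes) >= 2 and len(set(images)) == len(images):
            tracks.append(sorted(nodes))
    return sorted(tracks)


matches = {(0, 1): [(5, 7), (8, 2)], (1, 2): [(7, 3)], (0, 2): [(9, 3)]}
print(build_tracks(matches, min_matches=1))  # [[(0, 8), (1, 2)]]
```

In the example, keypoints 5 and 9 of image 0 both end up in the same component, so that track is discarded as inconsistent. Only the two-image track survives.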
Other images, including the “leaf” nodes (e.g., images (b) and (c)) and night time images (e.g., image (d)), are more loosely connected to this core set. Other connectivity graphs are shown in Figs. 9 and 10.

4.2 Structure from Motion

Next, we recover a set of camera parameters (e.g., rotation, translation, and focal length) for each image and a 3D location for each track. The recovered parameters should be consistent, in that the reprojection error, i.e., the sum of distances between the projections of each track and its corresponding image features, is minimized. This minimization problem can be formulated as a non-linear least squares problem (see Appendix 1) and solved using bundle adjustment. Algorithms for solving this non-linear problem, such as those described by Nocedal and Wright (1999), are only guaranteed to find local minima, and large-scale SfM problems are particularly prone to getting stuck in bad local minima, so it is important to provide good initial estimates of the parameters. Rather than estimating the parameters for all cameras and tracks at once, we take an incremental approach, adding in one camera at a time.

9 Graphviz - graph visualization software, /.

We begin by estimating the parameters of a single pair of cameras. This initial pair should have a large number of matches, but also have a large baseline, so that the initial two-frame reconstruction can be robustly estimated. We therefore choose the pair of images that has the largest number of matches, subject to the condition that those matches cannot be well-modeled by a single homography, to avoid degenerate cases such as coincident cameras. In particular, we find a homography between each pair of matching images using RANSAC with an outlier threshold of 0.4% of max(image width, image height), and store the percentage of feature matches that are inliers to the estimated homography. We select the initial image pair as that with the lowest percentage of inliers to the recovered homography, but with at least 100 matches. The camera parameters for this pair are
estimated using Nistér’s implementation of the five-point algorithm (Nistér 2004),10 then the tracks visible in the two images are triangulated. Finally, we do a two-frame bundle adjustment starting from this initialization.

Next, we add another camera to the optimization. We select the camera that observes the largest number of tracks whose 3D locations have already been estimated, and initialize the new camera’s extrinsic parameters using the direct linear transform (DLT) technique (Hartley and Zisserman 2004) inside a RANSAC procedure. For this RANSAC step, we use an outlier threshold of 0.4% of max(image width, image height). In addition to providing an estimate of the camera rotation and translation, the DLT technique returns an upper-triangular matrix K which can

10 We only choose the initial pair among pairs for which a focal length estimate is available for both cameras, and therefore a calibrated relative pose algorithm can be used.
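The initial-pair heuristic of Sect. 4.2, together with the focal-length condition in footnote 10, reduces to a filter followed by a minimum. The sketch below assumes that per-pair statistics have already been computed: the match count, the homography inlier fraction from RANSAC, and whether both images have a focal estimate. The `PairStats` record and the tie-break on match count are hypothetical and not stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PairStats:
    images: Tuple[int, int]
    num_matches: int                # geometrically consistent matches
    homography_inlier_ratio: float  # fraction of matches fit by one homography
    both_have_focal: bool           # focal estimate available for both images


def select_initial_pair(
    pairs: Iterable[PairStats], min_matches: int = 100
) -> Optional[Tuple[int, int]]:
    """Pick the well-matched pair that is least explained by a homography."""
    eligible = [
        p for p in pairs
        if p.num_matches >= min_matches and p.both_have_focal
    ]
    if not eligible:
        return None
    # Lowest homography inlier ratio wins; ties go to the pair with more
    # matches (this tie-break is an assumption).
    best = min(eligible, key=lambda p: (p.homography_inlier_ratio, -p.num_matches))
    return best.images


pairs = [
    PairStats((0, 1), 500, 0.90, True),   # many matches, but nearly a homography
    PairStats((1, 2), 150, 0.40, True),
    PairStats((2, 3), 80, 0.10, True),    # fewer than 100 matches
    PairStats((3, 4), 300, 0.20, False),  # no focal estimate for one image
]
print(select_initial_pair(pairs))  # (1, 2)
```

A low homography inlier ratio serves as a proxy for a wide baseline. Pairs that one homography explains almost entirely, such as views from nearly coincident cameras, are exactly the degenerate cases the text wants to avoid.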
Malicious code detection based on multi-channel image deep learning
Journal of Computer Applications, 2021, 41(4): 1142-1147 (2021-04-10)
ISSN 1001-9081  CODEN JYIIDU  http://
(*Corresponding author e-mail: hotpzs@)
CLC number: TP309  Document code: A

Malicious code detection based on multi-channel image deep learning
JIANG Kaolin, BAI Wei, ZHANG Lei, CHEN Jun, PAN Zhisong*, GUO Shize
(Command and Control Engineering College, Army Engineering University, Nanjing Jiangsu 210007, China)

Abstract: Existing deep learning-based malicious code detection methods suffer from weak deep-level feature extraction, relatively complex models and insufficient generalization. At the same time, code reuse is widespread among malicious samples of the same class, so their code has similar visual features, and this similarity can be exploited for malicious code detection. Therefore, a malicious code detection method based on multi-channel image visual features and AlexNet was proposed. In the method, the code to be detected was first converted into multi-channel images. AlexNet was then used to extract and classify the color texture features of the images, so as to detect possible malicious code. Meanwhile, multi-channel image feature extraction, Local Response Normalization (LRN) and other techniques were combined, which improved the generalization ability of the model while effectively reducing its complexity. Testing on the Malimg dataset after equalization showed that the average classification accuracy of the proposed method was 97.8%; compared with the VGGNet method, accuracy increased by 1.8% and detection efficiency increased by 60.2%. Experimental results show that the color texture features of multi-channel images reflect the class information of malicious code well, that the simple network structure of AlexNet effectively improves detection efficiency, and that local response normalization improves the generalization ability and detection performance of the model.

Key words: multi-channel image; color texture feature; malicious code; deep learning; Local Response Normalization (LRN)

Introduction

Malicious code has become one of the main sources of threats in cyberspace.
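The abstract credits local response normalization (LRN) with improving generalization, but the paper's LRN settings are not given here. The sketch below therefore applies the cross-channel formula from the original AlexNet paper, b_i = a_i / (k + α Σ_j a_j²)^β, to the activations at a single pixel. The sum runs over up to n neighbouring channels centred on i. It uses AlexNet's published constants k = 2, n = 5, α = 10^-4, β = 0.75 as assumed defaults. Some framework implementations scale α by 1/n, so constants are not always interchangeable between libraries.

```python
from typing import List, Sequence


def local_response_norm(
    activations: Sequence[float],
    n: int = 5,
    k: float = 2.0,
    alpha: float = 1e-4,
    beta: float = 0.75,
) -> List[float]:
    """Cross-channel LRN (AlexNet-style formula) at one spatial position.

    activations[i] is channel i's response at the pixel; each output is the
    input divided by a power of the summed squares of nearby channels.
    """
    num_channels = len(activations)
    half = n // 2
    out = []
    for i, a_i in enumerate(activations):
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        sq_sum = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        out.append(a_i / (k + alpha * sq_sum) ** beta)
    return out


print(local_response_norm([1.0, 2.0, 3.0]))
```

On a full feature map, the same function would be applied at every pixel. Strongly responding neighbouring channels shrink a channel's output, which is the "lateral inhibition" effect LRN is meant to provide.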
Image-based Facade Modeling
Jianxiong Xiao, Tian Fang, Ping Tan∗, Peng Zhao, Eyal Ofek†, Long Quan
The Hong Kong University of Science and Technology; ∗National University of Singapore; †Microsoft

Figure 1: A few façade modeling examples from the two sides of a street with 614 captured images: some input images in the bottom row, the recovered model rendered in the middle row, and three zoomed sections of the recovered model rendered in the top row.

Abstract

We propose in this paper a semi-automatic image-based approach to façade modeling that uses images captured along streets and relies on structure from motion to recover camera positions and point clouds automatically as the initial stage for modeling. We start by considering a building façade as a flat rectangular plane or a developable surface with an associated texture image composited from the multiple visible images. A façade is then decomposed and structured into a Directed Acyclic Graph of rectilinear elementary patches. The decomposition is carried out top-down by a recursive subdivision, followed by a bottom-up merging with the detection of the architectural bilateral symmetry and repetitive patterns. Each subdivided patch of the flat façade is augmented with a depth optimized using the 3D point cloud. Our system also allows easy user feedback in the 2D image space for the proposed decomposition and augmentation. Finally, our approach is demonstrated on a large number of façades from a variety of street-side images.

CR Categories: I.3.5 [Computer Graphics]: Computational geometry and object modeling - Modeling packages; I.4.5 [Image Processing and Computer Vision]: Reconstruction.

Keywords: Image-based modeling, building modeling, façade modeling, city modeling, photography.

1 Introduction

There is a strong demand for the photo-realistic modeling of cities for games, movies and map services such as Google Earth and Microsoft Virtual Earth. However, most work has been done on large-scale aerial photography-based city
modeling.When we zoom to ground level,the viewing experience is often disappoint-ing,with blurry models with few details.On the other hand,many potential applications require street-level representation of cities,where most of our daily activities take place.In term of spatial con-straints,the coverage of ground-level images is close-range.More data need to be captured and processed.This makes street-side modeling much more technically challenging.The current state of the art ranges from pure synthetic methods such as artificial synthesis of buildings based on grammar rules [M¨u ller et al.2006],3D scanning of street fac ¸ades [Fr¨u h and Zakhor 2003],to image-based approaches [Debevec et al.1996].M¨u ller et al.[2007]required manual assignment of depths to the fac ¸ade as they have only one image.However,we do have information from the reconstructed 3D points to automatically infer the critical depth of each primitive.Fr¨u h and Zakhor [2003]required tedious 3D scan-ning,while Debevec et al.[1996]proposed the method for a small set of images that cannot be scaled up well for large scale modelingACM Transaction on Graphics (TOG)Proceedings of SIGGRAPH Asia 2008Figure 2:Overview of the semi-automatic approach to image-based fac ¸ade modeling.of buildings.We propose a semi-automatic method to reconstruct 3D fac ¸ademodels of high visual quality from multiple ground-level street-view images.The key innovation of our approach is the intro-duction of a systematic and automatic decomposition scheme of fac ¸ades for both analysis and reconstruction.The decomposition is achieved through a recursive subdivision that preserves the archi-tectural structure to obtain a Directed Acyclic Graph representation of the fac ¸de by both top-down subdivision and bottom-up merging with local bilateral symmetries to handle repetitive patterns.This representation naturally encodes the architectural shape prior of a fac ¸ade and enables the depth of the fac ¸ade to be optimally com-puted 
on the surface and at the level of the subdivided regions.We also introduce a simple and intuitive user interface that assists the user to provide feedback on fac ¸ade partition.2Related workThere is a large amount of literature on fac ¸ade,building and archi-tectural modeling.We classify these studies as rule-based,image-based and vision-based modeling approaches.Rule-based methods.The procedural modeling of buildings specifies a set of rules along the lines of L-system.The methods in [Wonka et al.2003;M¨u ller et al.2006]are typical examples of procedural modeling.In general,procedural modeling needs expert specifications of the rules and may be limited in the realism of re-sulting models and their variations.Furthermore,it is very difficult to define the needed rules to generate exact existing buildings.Image-based methods.Image-based methods use images asguide to generate models of architectures interactively.Fac ¸ade de-veloped by Debevec et al.[1996]is a seminal work in this cate-gory.However,the required manual selection of features and the correspondence in different views is tedious,and cannot be scaled up well.M¨u ller et al.[2007]used the limited domain of regular fac ¸ades to highlight the importance of the windows in an architec-tural setting with one single image to create an impressive result of a building fac ¸ade while depth is manually assigned.Although,this technique is good for modeling regular buildings,it is limited to simple repetitive fac ¸ades and cannot be applicable to street-view data as in Figure 1.Oh et al.[2001]presented an interactive sys-tem to create models from a single image.They also manually as-signed the depth based on a painting metaphor.van den Hengel et al.[2007]used a sketching approach in one (or more)image.Although this method is quite general,it is also difficult to scale up for large-scale reconstruction due to the heavy manual interac-tion.There are also a few manual modeling solutions on the market,such as Adobe 
Canoma, RealViz ImageModeler, Eos Systems PhotoModeler, and The Pixel Farm PFTrack, all of which require tedious manual model parameterization and point correspondences.

Vision-based methods. Vision-based methods automatically reconstruct urban scenes from images. Typical examples include the work in [Snavely et al. 2006; Goesele et al. 2007] and [Cornelis et al. 2008]. Another is the dedicated urban modeling work pursued by the University of North Carolina at Chapel Hill and the University of Kentucky (UNC/UK) [Pollefeys et al. 2007], which produced meshes from dense stereo reconstruction. Proper modeling with man-made structural constraints from reconstructed point clouds and stereo data has not yet been addressed. Werner and Zisserman [2002] used line segments to reconstruct buildings. Dick et al. [2004] developed 3D architectural modeling from short image sequences. Their approach is Bayesian and model-based, but it relies on many specific architectural rules and model parameters. Lukas et al. [2006; 2008] developed a complete system for urban scene modeling based on aerial images. The results look good from the top view, but not from ground level. Our approach therefore complements their system by adding street-level detail. Früh and Zakhor [2003] also used a combination of aerial imagery, ground color, and LIDAR scans to construct models of façades. However, like stereo methods, their approach lacks a representation of the styles of man-made architecture. Agarwala et al. [2006] composed panoramas of roughly planar scenes without producing 3D models.

3 Overview

Our approach is schematized in Figure 2.

SFM. From the captured sequence of overlapping images, we first automatically compute the structure from motion to obtain a set of semi-dense 3D points and all camera positions. We then register the reconstruction with an existing approximate model of the buildings, which is often recovered from real images. Registration uses GPS data if provided, or is done manually if geo-registration information is not available.

Façade initialization. We initialize a building façade as a flat rectangular plane or a developable surface. This surface is obtained either automatically from the geo-registered approximate building model, or by manually marking up a line segment or a curve on the 3D points projected onto the ground plane. The texture image of the flat façade is computed from the multiple images in which the façade is visible. Occluding objects can be detected during texture composition thanks to the parallax between these images.

Façade decomposition. A façade is then systematically decomposed into a partition of rectangular patches, based on the horizontal and vertical lines detected in the texture image. The decomposition is carried out top-down by recursive subdivision, followed by bottom-up merging, with detection of architectural bilateral symmetry and repetitive patterns. The partition is finally structured into a Directed Acyclic Graph of rectilinear elementary patches. The user can also edit the partition simply by adding and removing horizontal and vertical lines.

Façade augmentation. Each subdivided patch of the flat façade is augmented with a depth value. This depth is obtained from the MAP estimation of a Markov Random Field whose data cost is defined by the 3D points from structure from motion.

Façade completion. The final façade geometry is automatically retextured from all input images.

Our main technical contribution is a systematic decomposition schema of the façade. It is structured as a Directed Acyclic Graph and implemented as a top-down recursive subdivision followed by bottom-up merging. This representation strongly embeds the architectural prior of façades and buildings into the different stages of modeling. The proposed optimization for façade depth is also unique in that it operates on the façade surface and at the super-pixel level of a whole subdivision region.

4 Image Collection

Image capturing. We use a camera that usually faces orthogonal to the
building façade and moves laterally along the street. The camera should preferably be held straight. Neighboring views should overlap sufficiently for feature correspondences to be computable. The density and accuracy of the reconstructed points vary with the distance between the camera and the objects, and with the distance between neighboring viewing positions.

Structure from motion. We first compute point correspondences and structure from motion for a given sequence of images. There are standard computer vision techniques for structure from motion [Hartley and Zisserman 2004]. We use the approach described in [Lhuillier and Quan 2005] to compute the camera poses and a semi-dense set of 3D points. We chose this technique because it has been shown to be robust and to provide sufficient point clouds for object modeling.

5 Façade Initialization

In this paper, we consider that a façade has a dominant planar structure. A façade is therefore a flat plane with a depth field on the plane. We also expect and assume that the depth variation within a simple façade is moderate. A real building façade with complex geometry and topology can therefore be broken down into multiple simple façades. A building is merely a collection of façades, and a street is a collection of buildings. The dominant plane of most façades is flat, but it can sometimes be curved as well. We therefore also allow the dominant surface structure to be any cylinder portion, or any developable surface that can be swept by a straight line, as illustrated in Figure 3. To ease the description, but without loss of generality, we use a flat façade in the remainder of the paper. For developable surfaces, all steps use the same methods as for flat façades, with trivial surface parameterizations.

Figure 3: A simple façade can be initialized from a flat rectangle (a), a cylindrical portion (b), or a developable surface (c).
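To make the "flat plane with a depth field" model concrete, here is a minimal sketch assuming NumPy. It maps façade coordinates to 3D points for two cases: a flat rectangle, and a vertical cylindrical portion swept by a vertical line. The function names, the z-up convention, the sign of the plane normal, and the arc-length parameterization of the cylinder are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def flat_facade_point(origin, u_axis, v_axis, u, v, depth):
    """Map façade coordinates (u, v) plus a depth offset to a 3D point.

    Assumes u_axis and v_axis are orthogonal directions spanning the façade plane;
    depth is measured along their cross product (this sign convention is an assumption)."""
    origin = np.asarray(origin, dtype=float)
    u_axis = np.asarray(u_axis, dtype=float) / np.linalg.norm(u_axis)
    v_axis = np.asarray(v_axis, dtype=float) / np.linalg.norm(v_axis)
    normal = np.cross(u_axis, v_axis)  # unit length because the axes are orthonormal
    return origin + u * u_axis + v * v_axis + depth * normal


def cylindrical_facade_point(center, radius, theta0, s, v, depth):
    """Map coordinates on a vertical cylinder portion to a 3D point (z is up).

    s is the arc length along the horizontal base curve, starting at angle theta0;
    v is the height; depth is an offset along the outward radial normal."""
    theta = theta0 + s / radius
    r = radius + depth
    cx, cy, cz = center
    return np.array([cx + r * np.cos(theta), cy + r * np.sin(theta), cz + v])
```

For example, `flat_facade_point([0, 0, 0], [1, 0, 0], [0, 0, 1], 2.0, 3.0, 0.5)` returns the point (2, -0.5, 3). In both cases the façade is unrolled into the same (u, v) grid, which is one way to read the remark that developable surfaces reuse the flat-façade methods.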
Some cylindrical façade examples are given in the experiments.

5.1 Initial Flat Rectangle

The reference system of the 3D reconstruction can be geo-registered using the GPS data of the camera, if available, or using an interactive technique. As illustrated in Figure 2, the façade modeling process can begin with an existing approximate model of the buildings. Such models are often reconstructed from aerial images, for example those publicly available from Google Earth and Microsoft Virtual Earth. If no such approximate model exists, the current implementation uses a simple manual process to segment the façades, based on the projections of the 3D points onto the ground floor. We draw a line segment or a curve on the ground to mark up a façade plane as a flat rectangle or a developable surface portion. The plane or surface position is automatically fitted to the 3D points, or manually adjusted if necessary.

5.2 Texture Composition

The geometry of the façade is initialized as a flat rectangle. Usually, a façade is too big to be entirely observable in one input image. We first compose a texture image for the entire rectangle of the façade from the input images. This process differs from image mosaicking because the images have parallax. The parallax helps remove undesired occluding objects that lie in front of the target façade, such as pedestrians, cars, trees, telegraph poles, and trash cans.

Algorithm 1 Photo Consistency Check for Occlusion Detection
Require: A set of N image patches P = {p_1, p_2, ..., p_N} corresponding to the projections {x_i} of the 3D point X.
Require: η ∈ [0, 1] to indicate when two patches are similar.
1: for all p_i ∈ P do
2:   s_i ← 0  ▷ accumulated similarity for p_i
3:   for all p_j ∈ P do
4:     s_ij ← NCC(p_i, p_j)
5:     if s_ij > η then s_i ← s_i + s_ij
6:     end if
7:   end for
8: end for
9: n̂ ← arg max_i s_i  ▷ n̂ is the patch with the best support
10: V ← ∅  ▷ V is the index set of visible projections
11: O ← ∅  ▷ O is the index set of occluded projections
12: for all p_i ∈ P do
13:   if s_{i n̂} > η then V ← V ∪ {i}
14:   else O ← O ∪ {i}
15:   end if
16: end for
17: return V and O
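Algorithm 1 can be transcribed into runnable Python as follows. This sketch assumes NumPy and equally sized grayscale patches. The function names and the default η = 0.8 are hypothetical choices. Returning 0 for a constant (zero-variance) patch is also our assumption, since NCC is undefined in that case.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (0.0 if either is constant)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def photo_consistency_check(patches, eta=0.8):
    """Algorithm 1: split projection indices into visible (V) and occluded (O) sets.

    patches[i] is the image patch around the projection x_i of the 3D point X."""
    n = len(patches)
    # Lines 1-8: pairwise NCC scores; each patch accumulates the scores above eta.
    s = np.array([[ncc(patches[i], patches[j]) for j in range(n)] for i in range(n)])
    support = np.where(s > eta, s, 0.0).sum(axis=1)
    # Line 9: the patch with the best support acts as the reference n_hat.
    n_hat = int(np.argmax(support))
    # Lines 10-16: a projection is visible if its patch is similar to the reference.
    visible = {i for i in range(n) if s[i, n_hat] > eta}
    occluded = set(range(n)) - visible
    return visible, occluded
```

Consider four nearly identical patches of the same façade point and a fifth patch that shows an occluder. The function puts indices 0 to 3 in V and the occluder's index in O, because the occluder correlates poorly with the best-supported patch.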
Furthermore, the position of the façade plane is known, unlike the unknown spatial positions in stereo algorithms. The photo-consistency constraint is therefore more efficient and robust for removing occluding objects, and it yields a better texture image than a pure mosaic.

Multi-view occlusion removal. As in many multi-view stereo methods, photo consistency is defined as follows. Consider a 3D point $X = (x, y, z, 1)'$ with color $c$. Suppose it has a projection $x_i = (u_i, v_i, 1)' = P_i X$ in the $i$-th camera $P_i$. Under the Lambertian surface assumption, the projection $x_i$ should then have the same color $c$. However, if the point is occluded by some other object in this camera, the color of the projection usually differs from $c$. Note that $c$ is unknown. Assume that point $X$ is visible from a set of cameras $I = \{P_i\}$ and occluded by some objects in the other cameras $I' = \{P_j\}$. The colors $c_i$ of the projections in $I$ should then equal $c$, while the colors $c_j$ of the projections in $I'$ may differ from it. Given a set of projection colors $\{c_k\}$, the task is to identify the set $O$ of occluded cameras.

Figure 4: Interactive texture refinement: (a) strokes drawn on the object to indicate removal; (b) the object is removed; (c) automatic inpainting; (d) green lines drawn to guide the structure; (e) a better result achieved with the guide lines.

In most situations, we can assume that point $X$ is visible from most of the cameras. Under this assumption, $c \approx \mathrm{median}_k \{c_k\}$. Given the estimated color $c$ of the 3D point, it is then easy to identify the occluded set $O$ from the distances of the projection colors to $c$. To improve robustness, image patches centered at the projections are used instead of single colors, with normalized cross-correlation (NCC) as the patch similarity metric. The details are presented in Algorithm 1. Assuming that the façade is almost planar, each pixel of the reference texture then corresponds to a point that lies on the flat
façade. Hence, for each pixel, we can identify whether it is occluded in a particular camera. For a given planar façade in space, all visible images are first sorted by how fronto-parallel they are with respect to the façade. An image is more fronto-parallel if the projected surface of the façade in the image is larger. The reference image is first warped from the most fronto-parallel image, and then from the less fronto-parallel ones according to the visibility of each point.

Inpainting. In each step, some regions of the reference texture image may still be left empty because of occluding objects. In a later step, if an empty region is not occluded and is visible from the new camera, the region is filled. In this multi-view inpainting, the occluded region is filled from each single camera in turn. At the end of the process, if some regions are still empty, a standard image inpainting technique fills them, either automatically [Criminisi et al. 2003] or interactively as described in Section 5.3. The cameras have already been adjusted to the image correspondences during the bundle adjustment of structure from motion. This simple mosaic without explicit blending therefore already produces visually pleasing results.

5.3 Interactive Refinement

As shown in Figure 4, a two-step interactive user interface is provided for refinement if the automatic texture composition result is not satisfactory. In the first step, the user can draw strokes to indicate which object or part of the texture is undesirable, as in Figure 4(a). The corresponding region is automatically extracted from the input strokes, as in Figure 4(b), using the method in [Li et al. 2004]. The removal operation means that the most fronto-parallel and photo-consistent texture selection from the result of Algorithm 1 is not what the user wants. For each pixel, the reference patch $\hat{n}$ from Line 9 of Algorithm 1 and the visible set $V$ must therefore be wrong. Hence, $P$ is updated to exclude $V$: $P \leftarrow O$. Then, if $P \neq \emptyset$, Algorithm 1 is run again. Otherwise, image inpainting [Criminisi et
al. 2003] is used for automatic inpainting, as in Figure 4(c). In the second step, if the automatic texture filling is poor, the user can manually specify important missing structural information. This is done by extending a few curves or line segments from the known into the unknown regions, as in Figure 4(d). Then, as in [Sun et al. 2005], image patches are synthesized along these user-specified curves in the unknown region. The patches are selected around the curves in the known region, and Loopy Belief Propagation finds the optimal ones. After the structure propagation is complete, the remaining unknown regions are filled using patch-based texture synthesis, as in Figure 4(e).

6 Façade Decomposition

By decomposing a façade, we try to describe the façade structure as well as possible by segmenting it into a minimal number of elements. The façades we consider inherit the natural horizontal and vertical directions by construction. As a first approximation, we can take all visible horizontal and vertical lines to construct an irregular partition of the façade plane into rectangles of various sizes. This partition captures the global rectilinear structure of the façades and buildings, and it keeps all discontinuities of the façade substructures. It usually over-segments the image into patches, but this over-segmentation has several advantages. The over-segmenting lines can also be regarded as auxiliary lines that regularize the compositional units of the façades and buildings.

Figure 5: Structure preserving subdivision. (a) Input; (b) structure; (c) weight; (d) subdivide; (e) merge. The hidden structure of the façade is extracted to form a grid in (b). These hypotheses are evaluated according to the edge support in (c), and the façade is recursively subdivided into several regions in (d). Since there is not enough support between Regions A, B, C, D, E, F, G, and H, they are all merged into one single region M in (e).
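The rectilinear partition described above and the recursive subdivision of Section 6.2 can be sketched in Python as follows. This is a simplified sketch under stated assumptions:

- It takes a binary edge map (for example, a Canny output) as a NumPy array.
- It restricts the Hough transform to horizontal and vertical directions, where each accumulator cell reduces to the edge count of one row or column.
- It always splits along the full width or height of the current region.

The thresholds `min_size` and `min_support` are hypothetical. Bilateral symmetry, region merging, and repetitive patterns are omitted.

```python
import numpy as np

def best_split(edges, top, bottom, left, right):
    """Strongest horizontal or vertical split line of the region rows [top, bottom), cols [left, right).

    With the Hough transform restricted to horizontal and vertical lines, each accumulator
    cell is simply the edge count of one row or one column of the region.
    Returns (support, axis, index) with axis in {"row", "col", None}."""
    region = edges[top:bottom, left:right]
    # Skip the first row/column: splitting there would leave one side empty.
    row_votes = region.sum(axis=1)[1:]
    col_votes = region.sum(axis=0)[1:]
    best = (0, None, None)
    if row_votes.size and row_votes.max() > best[0]:
        best = (int(row_votes.max()), "row", top + 1 + int(np.argmax(row_votes)))
    if col_votes.size and col_votes.max() > best[0]:
        best = (int(col_votes.max()), "col", left + 1 + int(np.argmax(col_votes)))
    return best


def subdivide(edges, top, bottom, left, right, min_size=4, min_support=5):
    """Recursively split a rectangle along its best-supported line; return the leaf rectangles.

    Recursion stops when the region is too small (height and width both below
    2 * min_size) or the best line has fewer than min_support edge pixels (smooth region)."""
    too_small = (bottom - top) < 2 * min_size and (right - left) < 2 * min_size
    support, axis, idx = (0, None, None) if too_small else best_split(edges, top, bottom, left, right)
    if axis is None or support < min_support:
        return [(top, bottom, left, right)]
    if axis == "row":  # D1 = rows [top, idx), D2 = rows [idx, bottom)
        return (subdivide(edges, top, idx, left, right, min_size, min_support)
                + subdivide(edges, idx, bottom, left, right, min_size, min_support))
    return (subdivide(edges, top, bottom, left, idx, min_size, min_support)
            + subdivide(edges, top, bottom, idx, right, min_size, min_support))
```

Take a 20 × 20 edge map with one full-width horizontal edge at row 10 and a vertical edge at column 5 in the upper half only. `subdivide(edges, 0, 20, 0, 20)` returns `[(0, 10, 0, 5), (0, 10, 5, 20), (10, 20, 0, 20)]`, as (top, bottom, left, right) tuples. The strongest line is chosen first, and the lower half stays whole because it contains no further supported line.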
Some "hidden" rectilinear structures introduced during the façade's construction can also be rediscovered by this over-segmentation process.

6.1 Hidden Structure Discovery

To discover the structure inside the façade, the edges of the reference texture image are first detected [Canny 1986]. The Hough transform [Duda and Hart 1972] is then applied to these edge maps to recover the lines. To improve robustness, the Hough transform is constrained to horizontal and vertical directions only, which covers most architectural façades. The detected lines form a grid that partitions the whole reference image. Taking the intersections of the Hough lines as endpoints, the grid consists of many non-overlapping short line segments, as in Figure 5(b). These line segments are the hypotheses for partitioning the façade. The Hough transform suits structure discovery because it extracts the hidden global information from the façade and aligns line segments to this hidden structure. However, some line segments in the grid may not be real boundaries between different regions. A weight $w_e$ is therefore defined for each line segment $e$, indicating the likelihood that the segment is a boundary between two different regions, as shown in Figure 5(c). This weight is the number of edge points from the Canny edge map covered by the line segment.

Remark on over-segmented partition. The current partition schema does depend on segmentation parameters. Importantly, however, a slightly over-segmented partition is usually not harmful for modeling. A perfect partition certainly eases the regularization of the façade augmentation by depth, presented in the next section. Nevertheless, an imperfect partition, particularly a slightly over-segmented one, does not affect the modeling results when the 3D points are dense and the optimization works well.

Figure 6: Merging support evaluation. (a) Edge weight support; (b) regional statistics support.

6.2 Recursive Subdivision

A region $D$ in the texture image is divided into two rectangular subregions $D_1$ and $D_2$, with $D = D_1 \cup D_2$, by the line segment $L$ that has the strongest support from the edge points. The subdivision procedure then continues recursively on $D_1$ and $D_2$. It stops when the target region $D$ is too small to be subdivided, or when no division hypothesis has enough support, i.e., region $D$ is very smooth.

Bilateral symmetry about a vertical axis may not hold for a whole façade, but it holds locally and can make the subdivision more robust. First, for each region $D$, we compute the NCC score $s_D$ between the two halves $D_1$ and $D_2$ obtained by dividing $D$ vertically at its center. If $s_D > \eta$, region $D$ is considered to have bilateral symmetry. In that case, the edge maps of $D_1$ and $D_2$ are averaged, and subdivision is done recursively on $D_1$ only. Finally, the subdivision of $D_1$ is reflected across the axis to become the subdivision of $D_2$, and the two are merged into the subdivision of $D$.

Recursive subdivision preserves the boundaries of man-made structural styles well. However, it may produce fragments that are unnecessary for depth computation and rendering, as in Figure 5(d). As a post-processing step, two neighboring leaf regions $A$ and $B$ are therefore merged into one region if there is not enough support $s_{AB}$ to separate them. The support $s_{AB}$ is defined as the strongest weight among all line segments on the border between $A$ and $B$: $s_{AB} = \max_e \{w_e\}$. However, line-segment weights offer only a local image statistic on the border. To improve robustness, a dual-information region statistic between $A$ and $B$ can be used more globally, as illustrated in Figure 6. Because $A$ and $B$ may differ in size, this region-statistic similarity is defined as follows. First, an axis is defined on the border between $A$ and $B$, and
region $B$ is mirrored about this axis to give a region $\overrightarrow{B}$. The overlapped region $A \cap \overrightarrow{B}$ consists of the pixels of $A$ whose locations lie inside $\overrightarrow{B}$. Similarly, $\overleftarrow{A} \cap B$ contains the pixels of $B$ whose locations lie inside $\overleftarrow{A}$. It is then mirrored about the same axis to become $\overrightarrow{\overleftarrow{A} \cap B}$. The normalized cross-correlation (NCC) between $A \cap \overrightarrow{B}$ and $\overrightarrow{\overleftarrow{A} \cap B}$ defines the regional similarity of $A$ and $B$. This way, only the symmetric parts of $A$ and $B$ are compared. It avoids the influence of far-away parts of the regions, which would arise if global statistics such as color histograms were used on regions of very different sizes. With a weighting parameter $\kappa$, the support $s_{AB}$ for separating two neighboring regions $A$ and $B$ is now defined as

$$s_{AB} = \max_e \{w_e\} - \kappa\, \mathrm{NCC}\left(A \cap \overrightarrow{B},\ \overrightarrow{\overleftarrow{A} \cap B}\right).$$

Note that the façade is represented as a binary recursive tree before merging and as a Directed Acyclic Graph (DAG) after region merging. The DAG representation natively supports Level of Detail rendering. When fine detail is demanded, the rendering engine can descend the rendering graph, expanding all detailed leaves and rendering them. Conversely, when less detail is needed, an intermediate node is rendered and all its descendants are pruned at rendering time.

Figure 7: A DAG for repetitive pattern representation. (a) Façade; (b) DAG.

6.3 Repetitive Pattern Representation

Repetitive patterns occur locally in many façades, and most of them are windows. [Müller et al. 2007] used a complicated technique to synchronize subdivisions between different windows. To save storage space and ease synchronization, our method maintains only one subdivision representation for each type of window. More precisely, a window template is first detected by a trained model [Berg et al. 2007] or indicated manually on the texture images. The templates are matched
across the reference texture image using NCC as the measure. If good matches exist, they are aligned to the horizontal or vertical direction by hierarchical clustering, and the Canny edge maps of these regions are averaged. During subdivision, each matched region is isolated by shrinking a bounding rectangle on the averaged edge maps until it snaps to strong edges. The isolated region is then treated as a single leaf region. The edges inside these isolated regions should not affect the global structure, so these edge points are excluded from the global subdivision procedure. Then, as in Figure 7, all matched leaf regions are linked to the root of a common subdivision DAG for that type of window, through 2D translation nodes for the pivot positions. Recursive subdivision is again executed on the averaged edge maps of all matched regions. To preserve photorealism, the textures of these regions are not shared; only the subdivision DAG and the respective depths are. Furthermore, to make the subdivision more robust, vertical bilateral symmetry is imposed as a hard constraint for windows.

6.4 Interactive Subdivision Refinement

In most situations, the automatic subdivision works satisfactorily. For further refinement of the subdivision layout, three line operations and two region operations are provided. The current automatic subdivision operates only in the horizontal and vertical directions, for robustness and simplicity. The fifth operator, "carve", lets the user manually sketch arbitrarily shaped objects, which appear less frequently, so that they are included in the façade representation.

Add. In an existing region, the user can sketch a stroke to indicate the partition, as in Figure 8(a). The edge points near the stroke are forced to become salient, so the subdivision engine can find the line segment and partition the region.

Delete. The user can sketch a zigzag stroke to cross out a line segment, as in Figure 8(b).

Change. The user can first delete the
partition line segments and then add a new line segment. Alternatively, the user can sketch a stroke directly: the line segment crossed by the stroke is deleted, and a new line segment is constructed accordingly, as in Figure 8(c). After the operation, all descendants with the target
Reconstruction-Based Models
Reconstruction-based models, a class of generative models, have gained significant attention in machine learning. These models aim to capture the underlying structure of observed data, and then use this information to generate new samples that resemble the original data. In this article, we explore the concept of reconstruction-based models, their applications, and the step-by-step process of using them.

1. Introduction to Reconstruction-Based Models

Reconstruction-based models are a class of generative models that learn the probability distribution of observed data, typically by learning to reconstruct their inputs. They then use this knowledge to generate new samples. These models are particularly useful when the underlying structure of the data is complex and difficult to model directly.

2. Applications of Reconstruction-Based Models

Reconstruction-based models find applications in various fields, including:

a. Image generation: these models can learn to generate realistic images by capturing the patterns and features present in a given dataset.
b. Anomaly detection: by learning the "normal" distribution of a dataset, reconstruction-based models can identify deviations, or anomalies. This is useful for detecting fraud or for spotting anomalies in medical diagnostics.
c. Data imputation: when data are missing, reconstruction-based models can fill in the gaps by generating plausible values from the observed data.

3. The Step-by-Step Process of Using Reconstruction-Based Models

Step 1: Data preprocessing. The first step is to preprocess the data. This involves cleaning the data, handling missing values, encoding categorical variables, and normalizing numerical features, so that the input is in a suitable format for the model.

Step 2: Designing the architecture. The next step is to design the architecture of the model. This typically involves selecting a suitable neural network architecture, such as a Variational Autoencoder (VAE) or a Generative Adversarial Network (GAN). The choice of architecture depends on the specific task and requirements.

Step 3: Training the model. Once the architecture is finalized, the model is trained on the input data. For reconstruction-based architectures such as the VAE, training optimizes the model's parameters to minimize the reconstruction error between the original data and the model's reconstructions. Adversarial models such as GANs instead optimize an adversarial objective. Optimization techniques such as gradient descent update the parameters iteratively.

Step 4: Evaluating the model. After training, the model's performance must be evaluated, for example by measuring the reconstruction error on a separate validation set. Subjective evaluations such as visual inspection, or domain-specific metrics, can also be used to assess the quality of the generated samples.

Step 5: Generating new samples. Once the model is trained and evaluated, it can generate new samples. By sampling from the learned probability distribution, the model produces new data instances that resemble the original dataset.

4. Advantages and Limitations of Reconstruction-Based Models

Reconstruction-based models offer several advantages, including their ability to capture complex patterns in data, handle missing values, and generate new samples. However, they also have limitations, such as difficulty in modeling high-dimensional data and a risk of overfitting when the model is too complex.

In conclusion, reconstruction-based models play a pivotal role in machine learning by learning the underlying probability distribution of data and generating new samples. They have a wide range of applications. Using them follows a step-by-step process: data preprocessing, architecture design, model training, evaluation, and sample generation. By leveraging reconstruction-based models, researchers and practitioners can gain insights into complex datasets and generate new data that closely resemble the original.
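The five steps can be illustrated end to end with the simplest reconstruction-based model, a linear autoencoder. The sketch below assumes NumPy and numerical data, and works as follows:

- It standardizes the features (Step 1).
- It fits the autoencoder in closed form via the SVD (Steps 2 and 3). With squared error, the optimal linear encoder and decoder span the top principal subspace.
- It uses per-sample reconstruction error for evaluation and anomaly detection (Step 4).
- It generates new samples by decoding latent codes drawn from a Gaussian fitted to the training codes (Step 5).

This is a simplified stand-in, not a VAE or GAN. All function names are hypothetical.

```python
import numpy as np

def standardize(x):
    """Step 1: zero-mean, unit-variance features; also returns the scaling for reuse."""
    mean, std = x.mean(axis=0), x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return (x - mean) / std, mean, std

def fit_linear_autoencoder(x, k):
    """Steps 2-3: a k-dimensional linear autoencoder fitted in closed form.

    For squared reconstruction error, the optimal linear encoder/decoder span the
    top-k principal subspace, so the leading right singular vectors give the weights."""
    center = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - center, full_matrices=False)
    return center, vt[:k].T  # decoder weights W (d x k); the encoder is W.T

def encode(x, center, w):
    return (x - center) @ w

def decode(codes, center, w):
    return codes @ w.T + center

def reconstruction_error(x, center, w):
    """Step 4: per-sample squared reconstruction error."""
    return ((x - decode(encode(x, center, w), center, w)) ** 2).sum(axis=1)

def generate(center, w, train_codes, n, rng):
    """Step 5: decode latent codes drawn from a Gaussian fitted to the training codes."""
    cov = np.atleast_2d(np.cov(train_codes, rowvar=False))
    codes = rng.multivariate_normal(train_codes.mean(axis=0), cov, size=n)
    return decode(codes, center, w)
```

Suppose the training data lie near a line and we use k = 1. A probe point on that line then reconstructs almost perfectly, while a point off the line gets a large error. Flagging any probe whose error exceeds the largest training error (a hypothetical threshold rule) therefore marks only the second point as an anomaly.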
Common Computer Vision Terms: Chinese-English Glossary

Part 1 (posted 2011-06-08 21:26)

人工智能: Artificial Intelligence
认知科学与神经科学: Cognitive Science and Neuroscience
图像处理: Image Processing
计算机图形学: Computer Graphics
模式识别: Pattern Recognition
图像表示: Image Representation
立体视觉与三维重建: Stereo Vision and 3D Reconstruction
物体(目标)识别: Object Recognition
运动检测与跟踪: Motion Detection and Tracking
边缘: edge
边缘检测: edge detection
区域: region
图像分割: image segmentation
轮廓与剪影: contour and silhouette
纹理: texture
纹理特征提取: texture feature extraction
颜色: color
局部特征: local features or blobs
尺度: scale
摄像机标定: Camera Calibration
立体匹配: stereo matching
图像配准: Image Registration
特征匹配: feature matching
物体识别: Object Recognition
人工标注: ground truth (manual annotation)
自动标注: Automatic Annotation
运动检测与跟踪: Motion Detection and Tracking
背景剪除: Background Subtraction
背景模型与更新: background modeling and update
运动跟踪: Motion Tracking
多目标跟踪: multi-target tracking
颜色空间: color space
色调: Hue
色饱和度: Saturation
明度: Value
颜色不变性: Color Constancy (human vision exhibits color constancy)
照明: illumination
反射模型: Reflectance Model
明暗分析: Shading Analysis
成像几何学与成像物理学: Imaging Geometry and Physics
全像摄像机: Omnidirectional Camera
激光扫描仪: Laser Scanner
透视投影: Perspective projection
正交投影: Orthographic projection
表面方向半球: Hemisphere of Directions
立体角: solid angle
透视缩小效应: foreshortening
辐射度: radiance
辐照度: irradiance
亮度: intensity
漫反射表面、Lambertian(朗伯)表面: diffuse surface, Lambertian surface
镜面: Specular Surfaces
漫反射率: diffuse reflectance
明暗模型: Shading Models
环境光照: ambient illumination
互反射: interreflection
反射图: Reflectance Map
纹理分析: Texture Analysis
元素: elements
基元: primitives
纹理分类: texture classification
从纹理中恢复图像: shape from texture
纹理合成: texture synthesis
图形绘制: graphics rendering
图像压缩: image compression
统计方法: statistical methods
结构方法: structural methods
基于模型的方法: model-based methods
分形: fractal
自相关性函数: autocorrelation function
熵: entropy
能量: energy
对比度: contrast
均匀度: homogeneity
上下文约束: contextual constraints
Gibbs随机场: Gibbs random field
边缘检测、跟踪、连接: edge detection, tracking, linking
LoG边缘检测算法(墨西哥草帽算子): LoG edge detector (Mexican hat operator); LoG = Laplacian of Gaussian
霍夫变换: Hough Transform
链码: chain code
B-样条: B-spline
有理B-样条: Rational B-spline
非均匀有理B-样条: Non-Uniform Rational B-Spline
控制点: control points
节点: knot points
基函数: basis function
控制点权值: control point weights
曲线拟合: curve fitting
逼近: approximation
回归: regression
主动轮廓: Active Contour Model, or Snake
图像二值化: image thresholding
连通成分: connected component
数学形态学: mathematical morphology
结构元: structuring elements
膨胀: dilation
腐蚀: erosion
开运算: opening
闭运算: closing
聚类: clustering
分裂合并方法: split-and-merge
区域邻接图: region adjacency graphs
四叉树: quadtree
区域生长: region growing
过分割: over-segmentation
分水岭: watershed
金字塔: pyramid
亚采样: sub-sampling
尺度空间: scale space
局部特征: local features
背景混淆: clutter
遮挡: occlusion
角点: corners
强纹理区域: strongly textured areas
二阶矩阵: second moment matrix
视觉词袋: bag of visual words
类内差异: intra-class variability
类间相似性: inter-class similarity
生成学习: generative learning
判别学习: discriminative learning
人脸检测: face detection
弱分类器: weak learners
集成分类器: ensemble classifier
被动测距传感: passive sensing
多视点: multiple views
稠密深度图: dense depth map
稀疏深度图: sparse depth map
视差: disparity
外极: epipolar
外极几何: epipolar geometry
校正: rectification
归一化相关: NCC, Normalized Cross-Correlation
平方差的和: SSD, Sum of Squared Differences
绝对值差的和: SAD, Sum of Absolute Differences
俯仰角: pitch
偏航角: yaw
扭转角: twist
高斯混合模型: Gaussian Mixture Model
运动场: motion field
光流: optical flow
贝叶斯跟踪: Bayesian tracking
粒子滤波: particle filters
颜色直方图: color histogram
尺度不变特征转换: SIFT, Scale-Invariant Feature Transform
孔径问题: aperture problem
Reality-capture 3D modeling workflow and methods (in English)
## Reality Capture 3D Modeling: Workflow and Methods

### Introduction

Reality capture 3D modeling is a technique for creating accurate, high-resolution 3D models of real-world objects and environments. By leveraging advanced imaging and data processing methods, reality capture technology can produce digital representations that are highly faithful to the original source.

### Workflow

The workflow for reality capture 3D modeling typically involves the following steps:

1. Data acquisition: capturing data about the target object or environment using methods such as photogrammetry, laser scanning, or structured light scanning.
2. Data processing: preprocessing the raw data to enhance image quality, remove noise, and align multiple scans.
3. Reconstruction: creating a 3D mesh from the processed data that represents the shape and geometry of the target object.
4. Texture mapping: adding textures to the 3D mesh using captured images or photographs.
5. Optimization: refining and optimizing the 3D model for specific applications or requirements.

### Methods

Several methods are used in reality capture 3D modeling:

1. Photogrammetry: uses overlapping photographs taken from different angles to reconstruct a 3D model. It is cost-effective and versatile, and suitable for modeling both small and large objects.
2. Laser scanning: uses a laser scanner that emits pulses of light and measures the time they take to bounce back. It provides highly accurate geometry data but can be time-consuming and expensive.
3. Structured light scanning: projects patterns of light onto the target object and analyzes their deformation to generate a 3D model. It is non-contact and can capture intricate details, but it requires specialized equipment.

### Applications

Reality capture 3D modeling has numerous applications across industries, including:

- Architecture and engineering: creating digital twins of buildings and structures for planning, design, and renovation.
- Manufacturing: reverse-engineering existing products, designing new prototypes, and improving production processes.
- Archaeology and cultural heritage preservation: documenting historical sites, artifacts, and cultural landmarks for preservation and research.
- Healthcare: planning and executing surgeries, creating custom prosthetics, and studying human anatomy.
- Entertainment and gaming: developing realistic models for movies, video games, and virtual environments.
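The laser-scanning method relies on a simple time-of-flight relation. The pulse travels to the surface and back, so the range is half the round-trip distance: d = c·t/2. The sketch below uses only the standard library. It computes that range, then converts it, together with the beam's angles, into a 3D point in the scanner's frame. It assumes propagation at the vacuum speed of light and a spherical angle convention (azimuth in the horizontal plane, elevation above it). Real scanners apply further corrections that are not modeled here, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second (exact, by definition of the metre)

def tof_range(round_trip_time_s):
    """Range to the target: the pulse covers the distance twice (out and back)."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

def scan_point(range_m, azimuth_rad, elevation_rad):
    """Convert a range and the beam direction to (x, y, z) in the scanner frame."""
    horizontal = range_m * math.cos(elevation_rad)
    return (horizontal * math.cos(azimuth_rad),
            horizontal * math.sin(azimuth_rad),
            range_m * math.sin(elevation_rad))
```

For instance, a round-trip time of about 66.7 ns corresponds to a range of about 10 m. Repeating `scan_point` over the scanner's sweep of angles yields the point cloud that the reconstruction step meshes.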
Multi-spectral Imaging Using LED Illuminations (Multispectral Imaging System)
2012 5th International Congress on Image and Signal Processing (CISP 2012)

Multi-spectral Imaging Using LED Illuminations

Hong-ning Li, Jie Feng, Wei-ping Yang, Liang Wang, Hai-bing Xu, Peng-fei Cao, Jian-jun Duan
School of Physics and Electronics, Yunnan Normal University
Kunming City, Yunnan Province, P.R. China, 650092

Abstract- We present a multi-spectral imaging method that uses five types of LED (Light Emitting Diode) light, compensated with a tungsten light, as the active illuminants, and a color camera to sample the spectral image. We discuss the imaging model of the multi-spectral imaging system and reconstruct the reflectance using the imaging system matrix derived from that model. The imaging method is simple and low-cost, and it requires no special equipment. We discuss the imaging accuracy and the stability of the spectral reflectance reconstruction algorithm through three cases of multi-spectral imaging: the Macbeth color checker, a color picture, and a leaf infected with powdery mildew. The results show that the multi-spectral imaging system can be applied in a variety of fields.

Keywords- multi-spectral imaging; reflectance reconstruction; image-based relighting; spectrum-based recognition

I. INTRODUCTION

Multi-spectral imaging is sensitive to spectral variations in surface reflectance that are often invisible to the human eye. It therefore supports an extremely wide range of research, from skin disease detection [1] and food contamination detection [2] to color measurement [3] and material classification [4]. Depending on the spectral separation technology they adopt, multi-spectral imaging systems fall into two categories: dispersive and interference-based [5]. When the required spectral precision is low, a dispersive multi-spectral imaging system is the better choice because it is easy to implement.
A typical dispersive multi-spectral imaging system [6] produces images in different spectral bands by modulating the light between the scene and the camera. It does this with broad- or narrow-band filters mounted in the incident optical path, and the spectral images are then processed to form the multi-spectral data cube of the scene. Because this technique changes the light entering the camera, it may cause image displacement or defocus. Recently, researchers have begun to develop multi-spectral imaging methods that modulate the spectrum of the illuminant instead [7, 8]. This approach has drawn growing attention as a complement to the traditional method.

This paper presents our multi-spectral imaging system based on LED (Light Emitting Diode) lighting. We discuss the mathematical model of the imaging system and derive the reflectance and color reconstruction algorithms. Finally, we show test results and applications of the spectral imaging system through three examples. We focus on a new method of multi-spectral imaging; in particular, we are interested in modeling, reflectance reconstruction and image-based rendering.

II. METHOD

The basic principle of our method is shown in Fig. 1. The objects in the scene are lit by a set of LED illuminations, and an RGB color camera captures images of the scene by collecting the light reflected from the objects. A computer controller carries out this synchronized process automatically. The multi-spectral image of the scene is obtained by stacking the images taken under the different LED light sources. The relit image and the spectral reflectance are then obtained by the relighting program and the spectral reflectance reconstruction method, respectively.

Figure 1. Illustration of our active multispectral imaging system. The object is lit with a set of distinct illuminations and a synchronized RGB camera captures the corresponding images.
By activating the illuminations one by one and processing the acquired images synchronously, we obtain a multi-spectral image of the scene.

A conventional multi-spectral imaging system modulates the incoming light using filters with different transmission spectra. This makes it very difficult to decouple the reflectance properties of the material from the incident light. It also causes side effects such as position mismatch, changes in imaging focal length and uneven brightness. In contrast, our system uses different LED light sources with known spectra and does not change the optical path between scene and camera, which significantly reduces the image preprocessing needed. In addition, the light sources are switched electrically, with no mechanical drive or complex optical configuration, which further improves imaging stability.

The method has the following problems:
(a) The different LED light sources cannot occupy exactly the same position, so the lighting conditions differ from one light source to another. To reduce this effect, we interleaved the different LED beads on a circuit board.
(b) LED colors are fixed, and it is hard to buy LEDs with an arbitrary desired spectral distribution.
(c) Because LED lighting is weak, the method is difficult to apply to large-scale scenes.

A. Imaging Model

The imaging process can be modeled as a radiometric transfer process. Assume further that the light source casts no shadows (coaxial illumination or a flat-world model). Then the radiometric response $I_{mn}$ measured at pixel $(x, y)$ in the $m$th channel under the $n$th illumination is given by

$$I_{mn} = \int s(\lambda)\, c_m(\lambda)\, p_n(\lambda)\, d\lambda \qquad (1)$$

where $s(\lambda)$ is the spectral reflectance of the scene point, $c_m(\lambda)$ is the spectral response of the camera's $m$th color channel, and $p_n(\lambda)$ is the spectral power distribution of the $n$th illumination.
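To see how Eq. 1 becomes the linear model used later, approximate the integral by a sum over a uniform wavelength grid. This is a minimal sketch that assumes all spectra share one grid with step `d_lambda`. The function names and toy spectra are illustrative assumptions, not the authors' code or measured data.

```python
from typing import List

Spectrum = List[float]  # samples on a shared, uniform wavelength grid


def channel_response(s: Spectrum, c_m: Spectrum, p_n: Spectrum, d_lambda: float) -> float:
    """Riemann-sum approximation of Eq. (1): I_mn = integral of s * c_m * p_n."""
    return sum(si * ci * pi for si, ci, pi in zip(s, c_m, p_n)) * d_lambda


def multiplexed_pixel(s: Spectrum, cameras: List[Spectrum], lights: List[Spectrum],
                      d_lambda: float) -> List[float]:
    """All M*N responses of one pixel, ordered illumination-major."""
    return [channel_response(s, c, p, d_lambda) for p in lights for c in cameras]


def system_matrix(cameras: List[Spectrum], lights: List[Spectrum],
                  d_lambda: float) -> List[List[float]]:
    """Rows c_m * p_n * d_lambda, so that multiplexed_pixel(s) equals F times s."""
    return [[ci * pi * d_lambda for ci, pi in zip(c, p)] for p in lights for c in cameras]


# Toy example: 4 wavelength samples, M = 2 camera channels, N = 3 illuminations.
s = [0.2, 0.4, 0.6, 0.8]
cameras = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.5, 1.0, 0.5]]
lights = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
I = multiplexed_pixel(s, cameras, lights, d_lambda=10.0)  # M*N = 6 values
F = system_matrix(cameras, lights, d_lambda=10.0)
```

The illumination-major row order is an arbitrary choice. Any fixed ordering works, provided the calibration and reconstruction steps use the same one.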
With M color channels and N illuminations, stacking the N images together gives a multiplexed multispectral image with MN channels. Eq. 1 also shows that, once the light source and camera spectral responses are given, the camera output is linear in the reflectance of the scene point.

The spectral reflectance $s(\lambda)$ is determined by the material itself and characterizes the object. Reconstructing the object's spectral reflectance from the multispectral image is therefore the primary task of a spectral imaging system. Furthermore, once the scene's spectral reflectance is known, the color image produced by any combination of camera and light source can be calculated.

B. Reflectance Reconstruction

Many imaging details, especially the photoelectric conversion process in the digital camera, are unknown. It is therefore still very difficult to derive the spectral reflectance of each pixel directly from Eq. 1, even though the spectrum of every light source and the spectral response of each camera channel are known. In addition, the LED spectra are broad enough to overlap one another, which also makes reflectance reconstruction difficult.

Generally, an imaging system can be treated as a linear system, meaning that the camera output is linear in the reflectance of the scene. In matrix form,

$$I = F S \qquad (2)$$

where $I$ is the column vector of output intensities for each channel. $F$ is the imaging system matrix, which combines the light sources and the camera spectral responses, and $S$ is the column vector of reflectance. Since a spectral reflectance function cannot be negative, the following condition must hold:

$$S_j \ge 0 \qquad (3)$$

In our multi-spectral imaging system, the reflectance $S$ of the samples is measured with standard equipment. The camera takes the color images under each LED illuminant, which makes $I$ known.
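The least-squares step of this subsection can be sketched in plain Python. The sketch uses the basis-function form that Eq. 5 leads to: estimate $\sigma$ from $I = F B \sigma$ as $(A^T A)^{-1} A^T I$ with $A = FB$, then set $s = B\sigma$.

It makes several assumptions:
- F is already known.
- The two-function basis is made up; it stands in for the Parkkinen Munsell basis.
- Clipping negative samples is only a crude illustration of Eq. 3, not the authors' procedure.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("normal matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def reconstruct(F: Matrix, B: Matrix, I: Vector, clip: bool = True):
    """Least-squares sigma for I = F B sigma, then s = B sigma.

    F: (MN x L) system matrix; B: (L x K) with basis functions as columns.
    clip=True zeroes negative samples, a crude way to respect Eq. (3).
    """
    A = matmul(F, B)
    At = transpose(A)
    sigma = solve(matmul(At, A), matvec(At, I))  # (A^T A)^{-1} A^T I
    s = matvec(B, sigma)
    return sigma, ([max(0.0, v) for v in s] if clip else s)


# Toy check: 3 channels, 4 wavelengths, K = 2 basis functions, no noise.
F = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.5, 1.0, 0.0], [0.0, 0.0, 0.5, 1.0]]
B = [[0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [0.5, -0.5]]
I = matvec(F, matvec(B, [0.6, 0.2]))
sigma_hat, s_hat = reconstruct(F, B, I)
```

Solving for the K coefficients instead of all L reflectance samples is what keeps $A^T A$ small and invertible when there are only MN = 18 measurements.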
Based on $I$ and $S$, the transformation $F$ of our imaging system can be deduced from Eq. 2. When $F^T F$ is invertible, the least-squares solution is

$$S = (F^T F)^{-1} F^T I \qquad (4)$$

However, the least-squares solution may yield a recovered spectral reflectance that is negative at some wavelengths, violating Eq. 3. With Eq. 4, the spectral reflectance of any pixel in the multi-spectral image can be estimated. This is an indirect reflectance reconstruction method. A set of calibration targets is first used to build the transform between camera signals and spectral reflectance factors. The camera signals of other targets can then be converted into spectral reflectance factors.

If the spectral curves are arbitrarily complex, the number of combinations of color channels and light sources must be large. In practice, however, the numbers of light sources and camera color channels are always limited, and we want as few combinations as possible while still meeting the accuracy requirements. To improve reconstruction accuracy, Parkkinen [9] presented a set of orthogonal spectral basis functions $b_k(\lambda)$. They were derived from a database of the measured spectral reflectances of 1257 Munsell color chips. It was found empirically [10] that this model gives fairly accurate spectral reconstructions for a wide range of real-world materials. The model can be written as

$$s(\lambda) = \sum_{k=1}^{K_s} \sigma_k\, b_k(\lambda) \qquad (5)$$

where $\sigma_k$ are scalar coefficients and $K_s$ is the number of model parameters. Substituting Eq. 5 into Eq. 1 gives a set of equations in $\sigma_k$ instead of $S$. Solving these by least squares helps improve reconstruction accuracy. Zhao and Berns [11] also described a similar method.

C. Image-Based Relighting

If the reflectance of every pixel in the scene is known, the image under any light with any camera can be calculated. This process is called light rendering or relighting [12].
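Relighting in the sense of this subsection is a weighted sum per pixel. This sketch assumes that reflectance, illuminant and color-matching functions are sampled on one uniform grid. The scaling to Y = 100 for a perfect white reflector is standard CIE colorimetric practice, but it is not part of Eq. 6. The curves below are placeholders, not CIE data; replace them with tabulated CIE 1931 color-matching functions and the measured or standard illuminant SPD.

```python
from typing import List, Sequence, Tuple

Spectrum = Sequence[float]


def relight_xyz(s: Spectrum, p: Spectrum, cmfs: Tuple[Spectrum, Spectrum, Spectrum],
                d_lambda: float, normalize: bool = True) -> List[float]:
    """Discretized Eq. (6) with c_m = (xbar, ybar, zbar); returns [X, Y, Z].

    normalize=True applies the usual CIE factor k = 100 / sum(p * ybar * d_lambda),
    so a perfect white reflector (s = 1 everywhere) gets Y = 100.
    """
    raw = [sum(si * pi * ci for si, pi, ci in zip(s, p, c)) * d_lambda for c in cmfs]
    if not normalize:
        return raw
    k = 100.0 / (sum(pi * yi for pi, yi in zip(p, cmfs[1])) * d_lambda)
    return [k * v for v in raw]


def relight_image(cube, p, cmfs, d_lambda):
    """Apply relight_xyz to every pixel of a reflectance cube (rows x cols x L)."""
    return [[relight_xyz(px, p, cmfs, d_lambda) for px in row] for row in cube]


# Placeholder 4-sample curves, for illustration only.
xbar = [0.3, 0.2, 0.8, 0.4]
ybar = [0.1, 0.6, 0.9, 0.2]
zbar = [1.2, 0.3, 0.0, 0.0]
illum = [0.8, 1.0, 1.0, 0.9]
white = relight_xyz([1.0] * 4, illum, (xbar, ybar, zbar), d_lambda=10.0)
grey = relight_xyz([0.5] * 4, illum, (xbar, ybar, zbar), d_lambda=10.0)
```

Changing only `illum` (for example, to D65, A, F7 or F4 SPD tables) relights the same reflectance cube under a different source.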
Calculating the imaging result from the multi-spectral data cube requires no optical setup and meets the requirements of high-fidelity color reproduction. The process can be described as

$$I_m = \int s(\lambda)\, c_m(\lambda)\, p(\lambda)\, d\lambda \qquad (6)$$

where $s(\lambda)$ is the spectral reflectance of a pixel in the multi-spectral image, $c_m(\lambda)$ is the spectral response of the camera's $m$th color channel, and $p(\lambda)$ is the spectral power distribution of the illumination. If $c_m(\lambda)$ are the color-matching functions of the CIE standard colorimetric observer, Eq. 6 outputs the corresponding tristimulus values XYZ.

Relighting is carried out in a numerical computing environment and needs no real light source, camera or optical imaging configuration. In this way, we can predict or analyze the lighting characteristics of ideal standard illuminants. We can also calculate how a scene would look under any light source whose spectral power distribution has been measured. For any camera whose spectral response is measured or given, its color performance can be evaluated directly from the relit image.

D. Multi-spectral Imaging Configuration

Our light-source-modulation multi-spectral imaging system adopts five types of LED and a tungsten light source. The five LED types are red, amber, green, white and blue clusters, whose spectral power distributions are shown in Fig. 2(a). The tungsten light compensates for the low LED output in the 650-780 nm band and improves reconstruction accuracy in that band. The LEDs are relatively narrow-band light sources, while the tungsten light is a wide-band source. Together they form a composite light source, which also makes reflectance reconstruction more difficult. To ensure stable illumination, constant-current sources were used to supply each type of LED and the tungsten light.
A relay array driven by a computer I/O board controlled these current sources. Under program control, the light sources are switched on one by one while the camera captures an image synchronously.

Figure 2. The spectral properties of our multi-spectral imaging system. (a) The spectra of the 6 types of light sources; (b) the spectral responses of the three color channels of the MVC3000 color camera.

A CMOS color camera, the MVC3000 made by Microview Science & Technology Co., was used to acquire images; its RGB spectral response curves are shown in Fig. 2(b). The camera provides an SDK (Software Development Kit), which let us write our own camera control module, and a USB interface for transferring image data to a computer. The system outputs a data cube with 18 channels, combining 6 groups of light sources with 3 spectral responses.

An opaque dark box holds the light sources and camera together. The observed object was placed on the bottom of the box, and the camera was fixed directly above it at the top of the box. The light sources were also fixed at the top, beside the camera, shining downward onto the object. The illumination angle was about 45 degrees, with 2 groups of light sources on each side of the camera, and the viewing angle was 0 degrees. This lighting geometry helps minimize shadows on the sample. A control program was written to drive the light-source relays, acquire the images synchronously and output the multi-spectral imaging results.

The reflectance reconstruction method was described above in Section II.B. The reflectance curves of 40 color patches, randomly selected from color checkers, were measured with a PR715 spectrometer. Their multi-spectral images were taken with our multi-spectral imaging device.
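The calibration data described above (measured reflectances of 40 patches plus their multi-spectral images) can also be used slightly differently. Instead of solving for F, one can fit a linear map directly from the MN camera signals to reflectance. This is a swapped-in variant for illustration; the paper's own route solves for F via Eq. 2 and then applies Eq. 4. The function names and toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("calibration signals are linearly dependent")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_signal_to_reflectance(signals: Matrix, reflectances: Matrix) -> Matrix:
    """Least-squares W (L x MN) minimizing sum_j ||s_j - W i_j||^2.

    signals: J rows of MN camera values; reflectances: J rows of L samples.
    Each row w_l of W solves G w_l = sum_j s_j[l] * i_j, with G = sum_j i_j i_j^T.
    """
    n = len(signals[0])
    G = [[sum(i[a] * i[b] for i in signals) for b in range(n)] for a in range(n)]
    W = []
    for l in range(len(reflectances[0])):
        rhs = [sum(s[l] * i[a] for s, i in zip(reflectances, signals)) for a in range(n)]
        W.append(solve(G, rhs))
    return W


def predict(W: Matrix, signal: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, signal)) for row in W]


# Toy check: MN = 2 signals, L = 3 wavelengths, J = 4 calibration patches.
W_true = [[0.5, 0.0], [0.25, 0.25], [0.0, 0.5]]
signals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
reflectances = [predict(W_true, i) for i in signals]
W = fit_signal_to_reflectance(signals, reflectances)
new_reflectance = predict(W, [1.0, 2.0])
```

One practical reason for this direction is the patch count it needs. It requires at least MN = 18 well-spread patches. Fitting F in I = FS by ordinary least squares instead needs the L x L matrix S S^T to be invertible, that is, at least L linearly independent patch spectra.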
The gray values of the multi-spectral images were compared with the reflectances of the color patches to solve for the imaging system matrix using Eq. 2. The spectral reflectance of any pixel was then calculated using Eq. 4.

III. EXPERIMENTAL RESULTS AND DISCUSSION

Focusing on two-dimensional objects, we tested our illuminant-modulation multi-spectral imaging system in three respects. First, we used the Macbeth color checker to test the accuracy of reflectance reconstruction and analyzed the reconstruction error. Second, we took the multi-spectral image of a painting and showed and analyzed the relit results. Finally, we marked out the diseased area of a leaf surface with a spectrum-based method, illustrating spectrum-based recognition.

A. Multi-spectral Imaging of the Macbeth Color Checker

We took the multi-spectral image of a Macbeth color checker with our imaging system; the images under the 6 light sources are shown in Fig. 3. Fig. 4 shows the reflectances of some color patches reconstructed from the multi-spectral image, and the measured and reconstructed reflectances are very close. This shows that the reflectance reconstruction algorithm is effective, and it lays the foundation for color reproduction and scene analysis based on spectral reflectance.

(a) Tungsten light (b) Red LED (c) Amber LED (d) White LED (e) Green LED (f) Blue LED
Figure 3. The Macbeth color checker images under the 6 light sources of our multi-spectral imaging device.

Second, we calculated the reconstructed color image of the Macbeth color checker under the standard D65 illuminant. We then compared it with an image taken by a digital camera; the results are shown in Fig. 5. The reconstructed image and the photograph, though not identical, are very close.

Finally, we randomly selected 15 pixels of color patch #9 to reconstruct the reflectance; the result is shown in Fig. 6(a).
Because of noise in the imaging process, the reconstructed reflectances of different pixels in the same color patch are not identical.

Figure 4. The reconstructed and measured reflectance of some color patches on a Macbeth Color Checker. The red dotted line is the reconstructed reflectance; the blue solid line is the measured reflectance.

(a) The reconstructed color image (b) The actual photographed color image
Figure 5. The reconstructed and the actual photographed color images of the Macbeth color checker under the D65 illuminant.

The reconstructed reflectance curves in the 670-780 nm band are relatively dispersed. The reflectance in this band is determined mainly by the wide-band tungsten light. We therefore conclude that, in reflectance reconstruction, the wide-band light source is more sensitive to noise than the narrow-band light sources: imaging noise easily changes the reconstructed reflectance within the spectral band of the wide-band source.

We calculated the L*a*b* color values of the 15 pixels under standard D65 illumination; Fig. 6(b) shows their distribution in L*a*b* color space. The maximum color difference among them is 8.97, so the color deviation caused by imaging noise is quite noticeable. This also indicates that reducing noise in the imaging process can improve the stability and accuracy of the system.

Figure 6. The spectral reflectance curves (a) and L*a*b* color distribution (b) of 15 pixels in color patch #9.

B. Color Picture Imaging and Relighting

We imaged a color picture [13] multi-spectrally and computed its reproduced color images under different illuminants for the CIE standard colorimetric observer. The results are shown in Fig. 7: (a) is under CIE standard illuminant D65, (b) under CIE standard illuminant A, and (c) and (d) under fluorescent lights F7 and F4, respectively.
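The paper does not state which color-difference formula produced the value 8.97. This sketch assumes the CIE76 ΔE*ab, the Euclidean distance in L*a*b*, with a D65 reference white and XYZ on the Y = 100 scale. Under those assumptions, the spread among a set of pixels is their largest pairwise difference.

```python
import itertools
import math
from typing import Iterable, Tuple

Triple = Tuple[float, float, float]
D65_WHITE: Triple = (95.047, 100.0, 108.883)  # CIE 1931 2-degree D65 white point, Y = 100


def _f(t: float) -> float:
    """CIE L*a*b* companding: cube root above (6/29)^3, linear segment below."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(xyz: Triple, white: Triple = D65_WHITE) -> Triple:
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e76(lab1: Triple, lab2: Triple) -> float:
    """CIE76 color difference: Euclidean distance in L*a*b*."""
    return math.dist(lab1, lab2)


def max_pairwise_delta_e(labs: Iterable[Triple]) -> float:
    """Largest CIE76 difference between any two colors in the set."""
    return max((delta_e76(a, b) for a, b in itertools.combinations(list(labs), 2)),
               default=0.0)


white_lab = xyz_to_lab(D65_WHITE)
spread = max_pairwise_delta_e([(50.0, 0.0, 0.0), (53.0, 4.0, 0.0), (50.0, 1.0, 1.0)])
```

Newer formulas such as CIEDE2000 weight the L*, C* and h components differently. Their values for the same pixels would therefore differ from CIE76.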
Fig. 7 shows that (a) is slightly bluish and (b) is reddish, which is broadly consistent with the character of the corresponding light sources.

(a) (b) (c) (d)
Figure 7. The relit results of a painting under the D65, A, F7 and F4 illuminants. (a) is under the D65 illuminant; (b) under the A illuminant; and (c) and (d) under fluorescent lights F7 and F4, respectively.

This case shows that our multi-spectral imaging system can fulfill the basic requirements of spectrum-based color research. The reflectance and/or color value of any pixel in the multi-spectral data cube can be reconstructed computationally, with results close to those of physical imaging.

C. Spectrum-based Analysis of Plant Leaf Disease

Fig. 8 shows the whole process of locating powdery mildew disease on a poinsettia leaf with a spectrum-based method [14]. Fig. 8(a) shows the synthesized color image of the leaf, and Fig. 8(b) the spectra derived from the multi-spectral data cube. Fig. 8(c) and Fig. 8(d) show the diseased and healthy areas, respectively.

(c) (d)
Figure 8. Spectrum-based recognition of powdery mildew disease on a poinsettia leaf. (a) is the synthesized color image, (b) the spectra of healthy leaf, diseased area and background, (c) the distribution of powdery mildew disease, and (d) the distribution of the healthy area.

The powdery mildew area in Fig. 8(c) and the healthy leaf area in Fig. 8(d) are complementary in image space. In Fig. 8(c), some weakly diseased spots scattered within the healthy area are hard to distinguish by eye, which shows that the method is very sensitive to the disease spectrum. Along the edge of the leaf, a slender strip of the background paper is incorrectly identified, but it can easily be eliminated by combining the result with the image morphology.
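Reference [14] is not reproduced here, so this sketch uses a swapped-in technique: one common spectrum-based rule, the spectral angle mapper (SAM). SAM assigns each pixel to the reference spectrum with the smallest angle, here healthy leaf, diseased area or background. The reference spectra, pixel values and labels below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def spectral_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two spectra; insensitive to overall brightness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding error


def classify(pixels: List[Sequence[float]], references: Dict[str, Sequence[float]]) -> List[str]:
    """Label each pixel spectrum with the name of the reference at the smallest angle."""
    return [min(references, key=lambda name: spectral_angle(px, references[name]))
            for px in pixels]


# Hypothetical 3-band reference spectra and pixels, for illustration only.
references = {
    "healthy": [0.1, 0.5, 0.2],
    "diseased": [0.4, 0.5, 0.45],
    "background": [0.8, 0.8, 0.8],
}
pixels = [[0.2, 1.0, 0.4], [0.8, 1.0, 0.9], [0.5, 0.5, 0.5]]
labels = classify(pixels, references)
```

Comparing each label with a class name gives binary masks comparable in spirit to Fig. 8(c) and 8(d). Because the angle ignores brightness, shading differences across the leaf do not change the label.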
This example demonstrates that our multi-spectral imaging system can be applied to spectrum-based analysis.

IV. CONCLUSION

We presented a multi-spectral imaging system using active LED illumination, which obtains multi-spectral images by changing the spectral power distribution of the incident light. The main advantage of our system is that it does not change the optical path between the camera and the objects in the scene. The imaging system matrix is derived by solving the imaging equation indirectly, and the spectral reflectance of the scene can be reconstructed from it.

Multi-spectral imaging of the Macbeth color checker showed that the reflectance reconstruction accuracy of our system is acceptable. In reflectance reconstruction, the wide-band light is more sensitive to imaging noise than the narrow-band light, which lowers the accuracy of the reconstructed reflectance.

By spectrally imaging a color picture and a leaf with powdery mildew disease, we also showed that our method supports color reproduction and spectrum-based analysis. The relit results are consistent with the character of the standard illuminants, and the method separates the diseased area from the leaf clearly and sensitively.

ACKNOWLEDGMENT

The authors gratefully acknowledge the support of the National Natural Science Foundation of China (Nos. 60968001, 61168003) and the Yunnan Province Natural Science Foundation (Nos. 2011FZ079, 2009CD047).

REFERENCES

[1] M. Sambongi, M. Igarashi, T. Obi, M. Yamaguchi, N. Ohyama, M. Kobayashi, Y. Sano, S. Yoshida, and K. Gono, “Analysis of spectral reflectance using normalization method from endoscopic spectroscopy system,” Optical Review, vol. 9, pp. 238–243, June 2002.
[2] J. C. Noordam, W. van den Broek, and L. Buydens,
“Detection and classification of latent defects and diseases on raw French fries with multispectral imaging,” Journal of the Science of Food and Agriculture, vol. 85(13), pp. 2249–2259, October 2005.
[3] H. Stokman, T. Gevers, and J. Koenderink, “Color measurement by imaging spectrometry,” Computer Vision and Image Understanding, vol. 79(2), pp. 236–249, August 2000.
[4] D. Slater and G. Healey, “Material classification for 3D objects in aerial hyperspectral images,” in Proc. Computer Vision and Pattern Recognition, vol. 2, pp. 268–273, August 1999.
[5] Y. Garini, I. T. Young, and G. McNamara, “Spectral imaging: principles and applications,” Cytometry, vol. A69, pp. 735–747, August 2006.
[6] P. J. Miller, “Use of tunable liquid crystal filters to link radiometric and photometric standards,” Metrologia, vol. 28, pp. 145–149, March 1991.
[7] E. John, D. Lau, and S. Leung, “LED multispectral imaging: reconstruction of reflectance spectra,” unpublished, 2008.
[8] C. Chi, H. Yoo, and M. Ben-Ezra, “Multi-spectral imaging by optimized wide band illumination,” International Journal of Computer Vision, vol. 86(2-3), pp. 140–151, February 2010.
[9] J. Parkkinen, J. Hallikainen, and T. Jaaskelainen, “Characteristic spectra of Munsell colors,” Journal of the Optical Society of America A, vol. 6(2), pp. 318–322, February 1989.
[10] M. H. Park, J. I. Lee, M. D. Grossberg, and S. K. Nayar, “Multispectral imaging using multiplexed illumination,” in Proc. ICCV, 2007.
[11] Y. Zhao and R. S. Berns, “Image-based spectral reflectance reconstruction using the matrix R method,” Color Research and Application, vol. 32(5), pp. 343–351, October 2007.
[12] M. Yamaguchi, T. Teraji, K. Ohsawa, T. Uchiyama, H. Motomura, Y. Murakami, and N. Ohyama, “Color image reproduction based on the multispectral and multiprimary imaging: experimental evaluation,” Proc. SPIE, vol. 4663, pp.
15–26, 2002.
[13] Q.-Z. You and R. Wang, “World of IMDAC,” The Press of Yunnan University, April 2003.
[14] H. Li, J. Feng, and W. Yang, “Spectrum-based method for quantitatively detecting diseases on cucumber leaf,” in Proc. CISP, 2011.
Multirate Signal Processing in Digital Communications: Chinese-English Translation (Partial)
Multirate Signal Processing Concepts in Digital Communications
Bojan Vrcelj
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
California Institute of Technology
Pasadena, California
2004
(Submitted June 2, 2003)

Abstract

Multirate systems are building blocks commonly used in digital signal processing (DSP). Their function is to alter the rate of discrete-time signals, which is achieved by adding or deleting a portion of the signal samples. Multirate systems play a central role in many areas of signal processing, such as filter bank theory and multiresolution theory. They are essential in various standard signal processing techniques such as signal analysis, denoising, compression and so forth. During the last decade, however, they have increasingly found applications in new and emerging areas of signal processing, as well as in several neighboring disciplines such as digital communications.

The main contribution of this thesis is a better understanding of multirate systems and their use in modern communication systems. To this end, we first study a property of linear systems appearing in certain multirate structures. This property is called biorthogonal partnership, a term introduced recently to meet the need for a descriptive name for such a class of filters. In the thesis we especially focus on two extensions of this simple idea: to vector signals (MIMO biorthogonal partners) and to nonintegral decimation ratios (fractional biorthogonal partners).

Some of the main results developed here pertain to a better understanding of the biorthogonal partner relationship. These include the conditions for the existence of stable biorthogonal partners and of finite impulse response (FIR) ones. A major result that we establish states that, under some generally mild conditions, MIMO and fractional biorthogonal partners exist.
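This sketch assumes the usual definition from this line of work. F(z) is a biorthogonal partner of H(z) with respect to M when the product G(z) = H(z)F(z) satisfies [G(z)]↓M = 1, that is, g(Mn) = δ(n). The check below is for causal FIR filters. It relaxes the condition to allow a pure delay, meaning a single unit sample anywhere in the decimated product; that relaxation is a convenience assumed here.

```python
from typing import List, Sequence


def convolve(h: Sequence[float], f: Sequence[float]) -> List[float]:
    """Coefficients of G(z) = H(z) F(z) for causal FIR filters."""
    g = [0.0] * (len(h) + len(f) - 1)
    for i, hi in enumerate(h):
        for j, fj in enumerate(f):
            g[i + j] += hi * fj
    return g


def is_biorthogonal_partner(h: Sequence[float], f: Sequence[float], M: int,
                            tol: float = 1e-12) -> bool:
    """True if [H(z) F(z)] decimated by M is a unit impulse (a pure delay allowed)."""
    d = convolve(h, f)[::M]  # the samples g(Mn)
    nonzero = [k for k, v in enumerate(d) if abs(v) > tol]
    return len(nonzero) == 1 and abs(d[nonzero[0]] - 1.0) <= tol


h = [1.0, 0.5, 0.25]
partner = is_biorthogonal_partner(h, [1.0, -0.5], M=2)  # g = [1, 0, 0, -0.125]
not_partner = is_biorthogonal_partner(h, [1.0], M=2)    # g(2) = 0.25, not 0
```

Only the samples of g(n) at multiples of M are constrained; the rest are free. This is one way to see why FIR partners need not be unique.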
Moreover, when they exist, FIR solutions are not unique. We develop a parameterization of the FIR solutions, which makes the search for the best partner in a given application analytically tractable. This proves very useful in the central application of biorthogonal partners, namely, channel equalization in digital communications with signal oversampling at the receiver. Sampling the received signal at a rate higher than that defined by the transmitter provides some flexibility in the design of the equalizer. A good channel equalizer in this context helps neutralize the distortion introduced by channel propagation, but not at the expense of amplifying the channel noise. This is the rationale behind the partner design problem, which we formulate and solve. We then compare the performance of such equalizers with several other equalization methods by computer simulation. These findings suggest that communication system performance can be improved at the expense of a higher implementation cost at the receiver.

In the aforementioned communication systems, multirate DSP provides additional degrees of freedom in the design of the receiver. Another important class of multirate structures is used at the transmitter to introduce redundancy into the data stream. This redundancy generally facilitates the equalization process by imposing a certain structure on the transmitted signal. If the channel is unknown, this procedure helps identify it; if the channel is ill-conditioned, additional redundancy helps avoid severe noise amplification at the receiver, and so forth. In the second part of the thesis, we focus on this second group of multirate systems, derive some of their properties and introduce certain improvements to the communication systems in question.

We first consider transmission systems that introduce redundancy in the form of a cyclic prefix.
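The cyclic-prefix property that DMT and OFDM rely on can be checked numerically. If the prefix is at least as long as the channel memory, discarding it at the receiver leaves the circular convolution of the block with the channel. The DFT then turns the channel into independent per-bin gains. This is a minimal sketch, assuming a known FIR channel and no noise; the function names are illustrative.

```python
import cmath
from typing import List, Sequence


def linear_convolve(x: Sequence[complex], h: Sequence[complex]) -> List[complex]:
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def dft(x: Sequence[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def transmit_block(x: Sequence[complex], h: Sequence[complex], cp_len: int) -> List[complex]:
    """Prepend a cyclic prefix, pass through the FIR channel h, then drop the prefix."""
    if not len(h) - 1 <= cp_len <= len(x):
        raise ValueError("cyclic prefix must cover the channel memory and fit in the block")
    tx = list(x[len(x) - cp_len:]) + list(x)  # last cp_len samples copied to the front
    rx = linear_convolve(tx, h)
    return rx[cp_len:cp_len + len(x)]


x = [1, 2, 3, 4]
h = [1, 0.5]
y = transmit_block(x, h, cp_len=1)  # equals the circular convolution of x and h
X, Y = dft(x), dft(y)
Hf = dft(list(h) + [0] * (len(x) - len(h)))  # channel response on the same DFT bins
```

After prefix removal, each DFT bin sees a single scalar gain Hf[m]. This is the division into nonoverlapping frequency bands that per-band precoding then exploits.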
Examples of such systems include the discrete multitone (DMT) and orthogonal frequency division multiplexing (OFDM) systems. Cyclic prefix insertion effectively divides the channel into a number of nonoverlapping frequency bands. We study signal precoding in such systems, which adjusts the signal properties to take full advantage of the channel and noise properties across different bands. Our ultimate goal is to improve overall system performance by minimizing the noise power at the receiver. A special case of our general solution corresponds to white channel noise, in which case the best precoder simply performs optimal power allocation.

Finally, we study a different class of communication systems with induced signal redundancy, namely, multiuser systems based on code division multiple access (CDMA). We specifically focus on a special class of CDMA systems called `a mutually orthogonal usercode receiver' (AMOUR). These systems use the transmission redundancy to facilitate user separation at the receiver regardless of the (different) communication channels. The method also guarantees the existence of zero-forcing equalizers irrespective of the channel zero locations. Even so, the performance of these equalizers can be further improved by exploiting the inherent flexibility in their design. We show how to find the best equalizer from the class of zero-forcing solutions. We then enlarge this class by employing alternative sampling strategies at the receiver. Our method retains the separability properties of AMOUR systems while improving their robustness in noisy environments.

Chapter 1 Introduction

The theory of multirate digital signal processing (DSP) has traditionally been applied in the contexts of filter banks [61], [13], [50] and wavelets [31], [72]. These play a very important role in signal decomposition, analysis, modeling and reconstruction.
Many areas of signal processing would be hard to envision without digital filter banks. This is especially true for audio, video and image compression, digital audio processing, signal denoising, and adaptive and statistical signal processing. However, multirate DSP has recently found increasing application in digital communications as well. Multirate building blocks are a crucial ingredient in many modern communication systems. Examples include discrete multitone (DMT), digital subscriber line (DSL) and orthogonal frequency division multiplexing (OFDM) systems, as well as general filter bank precoders, to name just a few. The interested reader is referred to numerous references on these subjects, such as [7]-[9], [17]-[18], [27], [30], [49], [64], [89], etc.

This thesis is a contribution to the further understanding of multirate systems and their significance in digital communications. To that end, we introduce some new signal processing concepts and investigate their properties. We also consider some important problems in communications, especially those that can be formulated using the multirate methodology. In this introductory chapter, we give a brief overview of multirate systems and introduce some identities, notation and terminology that will prove useful in the rest of the thesis. Every attempt is made to keep the text as self-contained as possible, and the introduction primarily serves this purpose. Some parts of the thesis, especially those covering the theory of biorthogonal partners and their extensions, treat the concepts rather extensively. By contrast, the material on applications of multirate theory in communication systems should be viewed as a contribution to a better understanding and by no means an exhaustive treatment of such systems.
For more comprehensive coverage, the reader is referred to a range of extensive texts on the subject, for example, [71], [18], [19], [39], [38], [53], etc.

1.1 Multirate systems

1.1.1 Basic building blocks

The signals of interest in digital signal processing are discrete sequences of real or complex numbers, denoted by x(n), y(n), etc. The sequence x(n) is often obtained by sampling a continuous-time signal $x_c(t)$. The majority of natural signals (like the audio signal reaching our ears or the optical signal reaching our eyes) are continuous-time. To process them with DSP techniques, however, they need to be sampled and converted to digital signals. This conversion also includes signal quantization, i.e., discretization in amplitude. In practice, however, it is safe to assume that the amplitude of x(n) can be any real or complex number.

Signal processing analysis is often simplified by considering the frequency-domain representation of signals and systems. Commonly used alternative representations of x(n) are its z-transform X(z) and its discrete-time Fourier transform $X(e^{j\omega})$. The z-transform is defined as $X(z) = \sum_{n=-\infty}^{\infty} x(n) z^{-n}$, and $X(e^{j\omega})$ is simply X(z) evaluated on the unit circle $z = e^{j\omega}$.

Multirate DSP systems are usually composed of three basic building blocks operating on a discrete-time signal x(n): the linear time-invariant (LTI) filter, the decimator and the expander. An LTI filter, like the one shown in Fig. 1.1, is characterized by its impulse response h(n), or equivalently by its z-transform H(z), also called the transfer function. Fig. 1.2 shows examples of the M-fold decimator and expander for M = 2. The rate of the signal at the output of an expander is M times higher than the rate at its input, while the converse is true for decimators. That is why systems containing expanders and decimators are called `multirate' systems.
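The time-domain rules behind the M-fold decimator and expander are short enough to state as code. This sketch works on finite causal sequences starting at n = 0. It also checks both z-domain relations numerically at an arbitrary test point: expansion gives X(z^M), and decimation gives the average of X over the M rotated M-th roots. The helper names are illustrative.

```python
import cmath
from typing import List, Sequence


def decimate(x: Sequence[complex], M: int) -> List[complex]:
    """M-fold decimator: y(n) = x(Mn)."""
    return list(x[::M])


def expand(x: Sequence[complex], M: int) -> List[complex]:
    """M-fold expander: y(n) = x(n/M) when M divides n, else 0 (no trailing zeros)."""
    if not x:
        return []
    y = [0] * ((len(x) - 1) * M + 1)
    y[::M] = list(x)
    return y


def z_transform(x: Sequence[complex], z: complex) -> complex:
    """X(z) = sum_n x(n) z^{-n} for a finite causal sequence starting at n = 0."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
M = 2
w = 0.9 * cmath.exp(0.3j)  # arbitrary test point; evaluate the decimator at z = w**M
# Expander: X_E(z) = X(z^M)
gap_expander = abs(z_transform(expand(x, M), w) - z_transform(x, w ** M))
# Decimator: X_D(z) = (1/M) sum_k X(z^{1/M} e^{-j 2 pi k / M}), with z^{1/M} = w
rhs = sum(z_transform(x, w * cmath.exp(-2j * cmath.pi * k / M)) for k in range(M)) / M
gap_decimator = abs(z_transform(decimate(x, M), w ** M) - rhs)
```

Because the expander keeps every input sample, decimating by M afterwards recovers the input exactly. The reverse order, decimating first and then expanding, discards the samples at indices that are not multiples of M.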
Fig. 1.2 demonstrates the behavior of the decimator and the expander in both the time and the frequency domains:

$$X_E(z) = [X(z)]_{\uparrow M} = X(z^M) \qquad (1.1)$$

for the M-fold expander, and

$$X_D(z) = [X(z)]_{\downarrow M} = \frac{1}{M} \sum_{k=0}^{M-1} X\!\left(z^{1/M} e^{-j2\pi k/M}\right) \qquad (1.2)$$

for the M-fold decimator.

The systems shown in Figs. 1.1 and 1.2 operate on scalar signals and are thus called single input-single output (SISO) systems. The extension to vector signals is rather straightforward: decimation and expansion are performed on each element separately. In block diagrams, the corresponding vector decimators and expanders are drawn as square boxes; Fig. 1.3 demonstrates this for vector expanders. LTI systems operating on vector signals are called multiple input-multiple output (MIMO) systems. They are characterized by a (possibly rectangular) matrix transfer function H(z).

1.1.2 Some multirate definitions and identities

Vector signals are sometimes obtained from the corresponding scalar signals by blocking. Conversely, scalar signals can be recovered from vector signals by unblocking. The blocking/unblocking operations can be defined using delay or advance chains [61], which leads to two similar definitions. One way of defining these operations is shown in Fig. 1.4; the other is obtained trivially by switching the delay and advance operators. Instead of drawing the complete delay/advance chain structure, we often use the simplified block notation as in Fig. 1.4. It is usually clear from the context which of the two definitions

Multirate Signal Processing in Digital Communications
Bojan Vrcelj
Doctoral dissertation, California Institute of Technology
Pasadena, California
2004 (submitted June 2, 2003)
Abstract
Multirate systems are commonly used in digital signal processing.
Lightweight deep learning network for accurate localization of optical images of components
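Optics and Precision Engineering, Vol. 31 No. 17, Sept. 2023
NIU Xiaoming1, ZENG Li1*, YANG Fei2, HE Guanghui1
(1. College of Mathematics and Statistics, Chongqing University, Chongqing 401331; 2. Changchun Champion Optics Co., Ltd., Changchun, Jilin 130000)
Abstract: Precise localization in optical images is an important step in improving the efficiency and quality of industrial production.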
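Traditional image-processing localization methods are affected by environmental factors such as illumination and noise, so in complex scenes their localization accuracy is low and they are easily disturbed. Classical deep learning networks are widely used in natural-scene object detection, industrial security inspection, grasping and defect detection. However, they need massive training data, rely on large and complex deep models, and produce redundant and imprecise detection boxes. These drawbacks prevent them from being applied directly to pixel-level precise localization of industrial components.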
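To address these problems, a lightweight deep learning network for pixel-level precise localization in optical images of components is constructed.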
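The network adopts an overall Encoder-Decoder architecture. The Encoder uses a cascade of three bottleneck levels, which reduces the number of feature-extraction parameters while substantially increasing the network's nonlinearity. Corresponding Encoder and Decoder feature layers are fused by concatenation. After the upsampling convolutions, the Decoder can therefore draw more high-resolution information from the Encoder and reconstruct the details of the original image more completely. Finally, a weighted Hausdorff distance relates the Decoder output layer to the localization coordinates.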
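This excerpt does not give the exact form of the weighted Hausdorff distance. One widely used formulation, from Ribera et al., "Locating Objects Without Bounding Boxes" (CVPR 2019), is sketched below in NumPy, and it is an assumption that the paper uses the same form. The loss compares a decoder probability map with a set of ground-truth points, so no bounding boxes are needed. The function name, the `alpha = -1` soft minimum and `eps` are illustrative choices.

```python
import numpy as np

def weighted_hausdorff(prob, gt_points, alpha=-1.0, eps=1e-6):
    """Weighted Hausdorff distance between a probability map and a point set.

    prob:      (H, W) per-pixel probabilities in [0, 1], e.g. a decoder output.
    gt_points: iterable of (row, col) ground-truth locations.
    alpha:     exponent of the generalized mean; alpha = -1 acts as a soft minimum.
    """
    H, W = prob.shape
    rows, cols = np.mgrid[0:H, 0:W]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # (H*W, 2)
    p = prob.ravel().astype(float)
    Y = np.asarray(gt_points, dtype=float).reshape(-1, 2)                  # (|Y|, 2)
    d = np.linalg.norm(pixels[:, None, :] - Y[None, :, :], axis=2)          # (H*W, |Y|)
    d_max = np.hypot(H, W)  # largest possible distance inside the image

    # Term 1: every confident pixel should lie close to some ground-truth point.
    term1 = np.sum(p * d.min(axis=1)) / (p.sum() + eps)

    # Term 2: every ground-truth point should have a confident pixel nearby.
    weighted = p[:, None] * d + (1.0 - p[:, None]) * d_max
    soft_min = np.mean((weighted + eps) ** alpha, axis=0) ** (1.0 / alpha)
    term2 = soft_min.mean()
    return term1 + term2
```

The loss approaches zero when the probability mass sits exactly on the target points and grows as the predicted peak drifts away. Because it is differentiable in `prob`, the same expression can be written in an autodiff framework and used directly as a training loss.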
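Experimental results show that the lightweight localization network has a parameter size of 57.4 kB. Its recognition rate at a localization error of at most 5 pixels is at least 99.5%. This essentially meets the industrial requirements for component localization: high localization precision, high accuracy and strong resistance to interference.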
CLC number: TP391.4  Document code: A  doi: 10.37188/OPE.20233117.2611
Lightweight deep learning network for accurate localization of optical image components
NIU Xiaoming1, ZENG Li1*, YANG Fei2, HE Guanghui1
(1. College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China; 2. Chang Chun Champion Optics Co., Ltd., Changchun 130000, China)
* Corresponding author, E-mail: drlizeng@
Article ID: 1004-924X(2023)17-2611-15. Received: 2023-06-05; revised: 2023-06-19. Funding: National Natural Science Foundation of China (No. 62076043); National Key Research and Development Program of China (No. 2020YFB2007001).
Abstract: Precise optical image localization is crucial for improving industrial production efficiency and quality. In complex scenes, traditional image processing and localization methods have low accuracy and are vulnerable to environmental factors such as lighting and noise. Classical deep learning networks have been widely applied in natural-scene object detection, industrial inspection, grasping, defect detection and other areas. Even so, applying them directly to pixel-level precise localization of industrial components remains challenging, because they require massive training data, use complex deep learning models, and produce redundant and imprecise detection boxes. To address these issues, this paper proposes a lightweight deep learning network for pixel-level accurate localization in optical images of components. The network adopts an Encoder–Decoder architecture. The Encoder incorporates a three-level bottleneck cascade that reduces the number of feature-extraction parameters while enhancing the network's nonlinearity. Corresponding Encoder and Decoder feature layers are fused by concatenation. After the upsampling convolutions, the Decoder therefore obtains more high-resolution information and reconstructs the original image details more completely. Finally, the weighted Hausdorff distance is used to relate the Decoder's output layer to the localization coordinates.
Experimental results demonstrate that the lightweight localization network model has a parameter size of 57.4 kB. The recognition rate for a localization error of at most 5 pixels is at least 99.5%. The proposed approach thus satisfies the requirements of high localization accuracy, high precision and strong anti-interference capability for industrial component localization.
Key words: machine vision; optical image; deep learning; precise localization; lightweight
1 Introduction
Machine vision localization [1] acquires the position and pose of an object with optical cameras or other sensors. It then processes the data algorithmically to localize and track the target object precisely.
7. Ruimin GUAN, Su YANG, Yuanyuan WANG
Symbol recognition in natural scenes by shape matching across multi-scale segmentations
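Textbooks:
1. Modern Signal Processing: Theory and Methods, Fudan University Press, 2003, Wang Yuanyuan
2. Signals and Communication Systems, Tsinghua University Press, 2007, Bao Wenliang, Wang Yuanyuan, Zhu Qian
3. Clinical Ultrasound Diagnostics, People's Medical Publishing House, 2010, co-author
Co-authored monographs:
1. Digestive Endoscopic Ultrasonography, Science Press, 1st edition 2006, 2nd edition 2011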
Weighted cross-correlation based variational optical flow for gastric flow analysis in ultrasonic videos
Medical Physics 2013,40(5):052901
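9. Yuan Zongliang, Wang Yuanyuan, Yu Jinhua, Chen Yaqing
Prostate boundary extraction from ultrasound images using a discrete contour point set method
Journal of Applied Sciences 2012, 30(1): 89-95
10. Bai Baodan, Wang Yuanyuan, Yang Cuiwei
Post-operative monitoring of atrial fibrillation based on recurrence complex networks
Chinese Journal of Scientific Instrument 2012, 33(4): 809-815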
Lecture Notes in Computer Science 2013,7423:59-68
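8. Xu Fuxing, Wang Liang, Wang Yuanyuan, Ding Chuanfan
Structure and performance of a mesh-electrode ion trap mass analyzer
Chinese Journal of Analytical Chemistry 2013, 41(5): 781-786
Optics and Precision Engineering 2011, 19(6): 1398-1405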
2. Tianjie LI, Yuanyuan WANG
Reconstruction of three-dimensional human bodies from a single dressed image by LeNet-5
DOI: 10.3785/j.issn.1008-973X.2021.01.018
XU Hao-can1,2, LI Ji-tuo1,2, LU Guo-dong1,2
(1. School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China; 2. Robotics Institute, Zhejiang University, Yuyao 315400, China)
Abstract: A LeNet-5-based method is proposed for recovering the 3D shape of a human body from a single dressed image. The method builds a mapping model between the frontal silhouette of the dressed body and the human shape space. This yields efficient and accurate 3D human modeling for applications that require an accurate body surface shape, such as virtual try-on. Public 3D human body datasets are augmented on the manifold with PGA, and the virtual bodies are dressed to build a dressed-body database. Information is extracted from the frontal projection images of dressed bodies. LeNet-5 then reconstructs the 3D body, constrained by body shape parameters and frontal and lateral contour information. Experiments show that the model can usually reconstruct a relatively high-precision 3D body from a single image of a person, for people wearing different styles of clothing.
CLC number: TP 399  Document code: A  Article ID: 1008-973X(2021)01-0153-09
Reconstruction of three-dimensional human bodies from single image by LeNet-5
Abstract: A novel human body modeling method that reconstructs three-dimensional (3D) human bodies from a single dressed human body image based on LeNet-5 was proposed. The method can reconstruct 3D human bodies accurately and efficiently, and the reconstruction results can potentially be used in applications that require precise surface shapes, such as virtual try-on systems. 3D human bodies collected from open datasets were selected and augmented on manifolds with PGA. A dressed human body database was established by dressing these 3D human bodies with virtual garments of various types and sizes. Feature descriptors were extracted from the frontal projected images of dressed human bodies. The corresponding 3D human bodies were constructed through LeNet-5 with the constraints of shape parameters as well as the frontal and lateral contours.
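The experimental results show that the model can reconstruct a high-precision 3D human body from a single dressed human body image for people wearing different styles of clothing.
Key words: three-dimensional human modeling; virtual try-on; data augmentation; dressed human body; deep learning

3D human body models are widely used in human-computer interaction, garment design, virtual try-on and other fields. Reconstructing high-precision 3D body models quickly and at low cost has long been a key research direction in computer graphics. Traditional 3D body reconstruction methods fall into two groups: methods based on multi-view data fusion and methods based on template deformation. By data source, multi-view fusion methods divide further into reconstruction from color data (RGB) [1], depth data (Depth) [2-4] and color-depth data (RGB-D) [5]. Common image feature descriptors such as SIFT, SURF and HOG are easily disturbed by external factors such as illumination. High-precision laser scanners entail high equipment and installation costs, and low-cost RGB-D cameras provide raw data of limited accuracy.

Template-deformation methods require a template body of known topology defined in advance. The frontal and lateral contours of the target body, key-point positions and similar information are extracted from RGB/RGB-D images [6-8] and used to drive the deformation of the template. Recovering a high-dimensional model from low-dimensional features is inherently ill-posed, so template-deformation methods generally cannot guarantee stable or accurate results.

With the wide adoption of the parametric body models SCAPE [9] and SMPL [10] and the rapid progress of deep learning, deep-learning-based body modeling is being used in more and more settings. Most approaches extract key points [11], contours [12-15] or heatmaps [12] from the image, or segment it semantically [11,16]. They then build a mapping from the input image to body shape and pose parameters by constraining the pose/shape parameters, the frontal contour, key-point positions and so on. Single-view data often cannot fully describe the body surface. In common datasets such as LSP [17], Human3.6M [18] and UP-3D [19], body poses and garment styles vary widely and backgrounds are complex and changeable. As a result, reconstructions tend to emphasize pose recovery, and their shape accuracy is generally low.

This paper proposes reconstructing a 3D body from a single dressed image with LeNet-5. The input is a binary image of a body in A-pose. A dressed-body dataset is constructed by augmenting the virtual bodies of a public dataset and dressing them in garments of different styles and sizes. The model is optimized iteratively with the body shape parameter error and the frontal/lateral contour errors as the loss, which gives fast, high-precision 3D body reconstruction.

Received: 2020-01-09. URL: /eng/article/2021/1008-973X/202101018.shtml
Funding: National Key Research and Development Program of China (2018YFB1700704); National Natural Science Foundation of China (61732015); Fundamental Research Funds for the Central Universities (2019QNA4001); Natural Science Foundation of Zhejiang Province (LY18F020004).
About the author: XU Hao-can (1993–), male, Ph.D. candidate, working on computer graphics. /0000-0002-1474-7039. E-mail: *****************.cn
Corresponding author: LI Ji-tuo, male, associate professor. /0000-0003-1343-5305. E-mail: ****************.cn
Journal of Zhejiang University (Engineering Science), Vol. 55 No. 1, Jan. 2021

1 System overview
As shown in Fig. 1, about 1 500 real female body samples are taken from a public dataset [20]. After the body pose is adjusted with SMPL, the data are augmented on the manifold with principal geodesic analysis (PGA). Marvelous Designer① is then used to dress the virtual bodies in garments of different styles and sizes, which builds the dressed-body database.

The body database augmented on the manifold is reduced with PCA, so each high-dimensional body is represented by a low-dimensional vector γ of PCA principal-axis coefficients. For each sample in the dressed-body database, a frontal contour image is obtained by projection and features are extracted from it. To learn the mapping between frontal contour images of dressed bodies and the body shape space more accurately, the reconstruction results are reprojected to obtain frontal and lateral contour images of the 3D body.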
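The PCA step that turns each body into a short coefficient vector γ can be sketched in NumPy. The sketch rests on three assumptions. Each body is flattened into one row of a matrix. The "95%" criterion of Section 2.2 is read as cumulative explained variance. Fitting is done with an SVD of the centered data. The function names are hypothetical, and the paper's own implementation is not shown in the source.

```python
import numpy as np

def fit_pca(X, energy=0.95):
    """Fit PCA to X (one flattened body per row) and keep the fewest principal
    axes whose cumulative explained variance reaches `energy`."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, energy)) + 1   # first index reaching `energy`
    return mean, Vt[:k]                               # principal axes as rows

def encode(body, mean, axes):
    """Coefficient vector gamma of one body."""
    return axes @ (body - mean)

def decode(gamma, mean, axes):
    """Body reconstructed from its coefficient vector."""
    return mean + axes.T @ gamma
```

For real meshes a row would hold the 3 × 6 890 vertex coordinates of an SMPL-topology body. The network then only has to regress the few entries of γ instead of every vertex.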
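The model is optimized iteratively with the error of the PCA coefficient vector and the frontal and lateral contour errors as the loss. The result is 3D body reconstruction from a single frontal image of a dressed body.

2 Construction of the dressed-body dataset
Existing dressed-body datasets such as Human3.6M [18], LSP [17] and UP-3D [19] contain rich pose information and complex, varied backgrounds. They suit pose recovery but make high-precision 3D body reconstruction difficult. To reduce the influence of pose and capture environment on the reconstruction, scanned bodies are used as the basis. SMPL represents body shape and adjusts the pose, and data augmentation is performed on the manifold. Marvelous Designer is used to dress the bodies, which produces a dressed-body dataset with similar poses and simple backgrounds.

2.1 Body pose adjustment
In computer graphics, body surface shape is usually represented by high-dimensional triangle meshes. To model how bodies of different shapes and poses deform into one another, the parametric statistical models SCAPE and SMPL have been increasingly adopted in recent years, SMPL being the more representative. SMPL is a vertex-based deformation model that represents the surface of different bodies in different poses. Its mesh deformation parameters are the shape parameters β and the non-rigid pose parameters θ. To reduce the influence of pose on the reconstruction and achieve high-precision modeling, all dressed-body data are adjusted to A-pose.

Scanned body data usually consist of tens or even hundreds of thousands of points, which is too high-dimensional, and their mesh topologies differ, so the mesh topology must first be unified [21]. With scanned bodies of unified topology as samples, SMPL estimates the shape parameters β_i and pose parameters θ_i of each body, and the pose is adjusted to A-pose. The shape parameters β_i are obtained by projecting the current mesh vertices onto the PCA principal axes in T-pose, the standard pose of the SMPL model. The current pose parameters θ_i are estimated by an equation from [10] that is not recoverable from the extracted text. Its symbols denote the mean body and the mean joint positions of the target body, the blend function W(·), the corresponding edges e of the current body and the target body, and the SMPL parameters ρ and ω trained on the body dataset.

Fig. 1 Pipeline of reconstruction of three-dimensional human bodies from single image (stages include PGA and PCA)
① Marvelous Designer official website: ./.

Each virtual body is represented by 6 890 mesh vertices. All scanned bodies are converted to A-pose through SMPL; β_i has dimension 10 and θ_i has dimension 72.

2.2 Dataset augmentation
The quantity and quality of the samples determine how well the model can fit. A sufficient number of samples with a reasonable distribution is necessary for a high-precision network model. The existing scanned-body database [20] contains only about 1 500 male and 1 500 female bodies. That is too few to train a stable model, so the dataset must be augmented. Male and female body shapes differ clearly; for example, at similar waist and hip girths, women generally have a larger chest girth than men. To build the statistical model more accurately and reconstruct more precise 3D bodies, separate datasets are built for male and female bodies and modeled independently with identical methods. Female bodies are used as the example below.

A common augmentation method is linear interpolation between existing bodies. However, the body shape space is a complex high-dimensional space, and a weighted sum of different bodies is not necessarily a valid body. To represent the shape space more faithfully and obtain augmented data closer to real bodies, the existing bodies are projected into a Riemannian space. A manifold structure is built with Lie groups, and the difference between bodies is measured by geodesic distance [22].

The deformation between bodies of unified topology can be regarded as the combination of the deformations of the individual triangles. Consider corresponding triangles $T_1 = [v_{12}-v_{11},\ v_{13}-v_{11}] \in \mathbb{R}^{3\times 2}$ and $T_2 = [v_{22}-v_{21},\ v_{23}-v_{21}] \in \mathbb{R}^{3\times 2}$ of bodies H_1 and H_2, where $[v_{11}, v_{12}, v_{13}]$ and $[v_{21}, v_{22}, v_{23}]$ are their vertices. Their deformation can be written as $T_1 = Q T_2$. The deformation matrix $Q \in \mathbb{R}^{3\times 3}$ has 9 unknowns, more than the 6 available equations. In Euclidean space this ill-posed problem is usually solved by adding extra constraints such as regularization terms, which inevitably introduces error. In Riemannian geometry, Q can instead be decomposed into a rotation matrix (3 degrees of freedom), a scale factor S (1 degree of freedom) and an in-plane deformation A (2 degrees of freedom) [22]. This gives 6 degrees of freedom in total, matching the number of equations. PGA is then used to interpolate the existing body data on the manifold. For the body dataset {H_i} on the manifold M, the mean body μ is obtained with an iterative algorithm [23]:

$$\mu = \arg\min_{H \in M} \sum_i d(H, H_i)^2$$

where d(·) is the geodesic distance, computed with the Frobenius norm ‖·‖_F. PCA is applied to the projections $\{\log(\mu^{-1} H_i)\}$ of {H_i} onto the tangent space $T_\mu M$ of M at μ to reduce their dimensionality.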
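The claim that $T_1 = Q T_2$ leaves Q underdetermined in Euclidean terms (9 unknowns, 6 equations) can be checked directly. The NumPy sketch below computes one least-squares solution with the pseudo-inverse. It then shows that adding $u n^\top$, where n is the normal of the source triangle, leaves $Q T_2$ unchanged. This illustrates the ambiguity only; it does not implement the Lie-group decomposition of [22], and the helper name `edge_matrix` is hypothetical.

```python
import numpy as np

def edge_matrix(v1, v2, v3):
    """T = [v2 - v1, v3 - v1] in R^{3x2}, the edge matrix used in Section 2.2."""
    return np.column_stack([v2 - v1, v3 - v1])

rng = np.random.default_rng(0)
T2 = edge_matrix(*rng.standard_normal((3, 3)))   # source triangle
T1 = edge_matrix(*rng.standard_normal((3, 3)))   # target triangle

# One solution of T1 = Q T2 (the minimum-norm one, via the pseudo-inverse).
Q = T1 @ np.linalg.pinv(T2)

# The triangle normal n satisfies n^T T2 = 0, so Q + u n^T maps T2 to T1 as well:
# the 6 equations leave 3 of the 9 entries of Q undetermined.
n = np.cross(T2[:, 0], T2[:, 1])
u = rng.standard_normal(3)
Q_other = Q + np.outer(u, n)
```

Euclidean methods resolve this freedom with a regularizer. The Lie-group decomposition instead uses exactly 6 parameters per triangle, so no extra constraint is needed.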
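As shown in Fig. 2, linear weighted interpolation of the PCA principal-axis coefficients yields 6 000 sets of body data in total. An interpolated body can be expressed by an equation (not recoverable from the extracted text) in which k_1 and k_2 are weight coefficients with k_1 + k_2 = 1.

All bodies in the database are in A-pose, but the pose of individual samples may still differ slightly. To represent the 3D body models in the database more reasonably, PCA is applied to the interpolated database. The model is represented by the eigenvectors of the leading eigenvalues that together account for 95% of the total, so the PCA coefficient vector γ_i has dimension 15.

In practice, users usually prefer a non-intrusive measurement with few additional requirements, made while they wear everyday clothes. Virtual garments of different styles and sizes therefore have to be added to the surface of the virtual bodies. As shown in Fig. 3, Marvelous Designer is used to add the garments and build the dressed-body database. More garment styles in the dataset improve the model's generalization, but they also raise the cost of building the dataset and make training harder. Balancing these factors, the dataset contains 1 500 real bodies, 1 500 interpolated bodies and 3 garment styles, about 9 000 dressed bodies in total.

Fig. 2 Human data augmentation
No. 1, XU Hao-can, et al.: Reconstruction of three-dimensional human bodies from single image by LeNet-5 [J]. Journal of Zhejiang University (Engineering Science), 2021, 55(1): 153–161.

3 3D body reconstruction based on LeNet-5
To reduce the influence of the capture environment and body pose on the reconstruction, a mapping is built from binary frontal contour images to the body shape space. The training data are frontal contour images of dressed bodies and the corresponding 3D bodies. To remove the effect of image translation on training, the body center is aligned with the image center before projection, giving translation-free frontal contour images. Features are extracted from these images, and the reconstruction is optimized iteratively under body shape parameter and frontal/lateral contour constraints.

3.1 Network structure and training parameters
The network takes the binary frontal contour image as input and outputs the PCA coefficient vector γ_i of the body model. It is optimized iteratively with the error of γ_i and the frontal and lateral contour errors as the loss. The LeNet-5 [24] model contains 3 convolutional layers, 2 pooling layers and 1 fully connected layer; the detailed structure is shown in Fig. 4.
1) Input layer: the input image is 256×256, and each pixel covers 8 mm×8 mm.
2) C1: convolutional layer, 5×5 filters, stride 2, 16 filters.
3) S2: subsampling/pooling layer; averaging over each 2×2 block gives 16 feature maps.
4) C3: convolutional layer, 5×5 filters, stride 2, 32 filters.
5) S4: subsampling/pooling layer; averaging over each 2×2 block gives 32 feature maps.
6) C5: convolutional layer, 5×5 filters, stride 2, 64 filters.
7) F6: fully connected layer.
8) Output layer: the vector γ of PCA coefficients.
During training, dropout with a rate of 0.5 is used to avoid overfitting and improve generalization [25]. The learning rate is 0.01, the batch size is 100, and batch normalization is added to speed up convergence. Seventy percent of the dressed-body samples form the training set, and the rest form the test and validation sets. The network is implemented in TensorFlow and Python and trained on a server with one Intel Xeon E5 CPU and four NVIDIA 2080Ti GPUs; training takes about 36 h.

3.2 Loss function
Single-view data often cannot fully describe the body surface. To improve performance and make the reconstruction visually closer to the real body, the loss combines the PCA coefficient vector error with the frontal and lateral shape errors. The model is optimized iteratively as follows.
1) Body shape parameter error L_γ, which compares the network output h with the ground-truth γ_i (equation not recoverable from the extracted text). Here N is the number of training samples, G_i is the frontal contour image of the i-th dressed body, and the remaining symbol denotes the parameters of LeNet-5.
2) Contour error. The 3D body model M(γ_i) is computed from the PCA coefficient vector γ_i and reprojected to obtain the frontal contour image $Y_i^{\mathrm{fro}}$ and the lateral contour image $Y_i^{\mathrm{lat}}$.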
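The frontal and lateral contour errors are computed with the image difference

$$f(I_1, I_2) = \sum_{x,y} \left| I_1(x,y) - I_2(x,y) \right|$$

Here f(I_1, I_2) is the difference between images I_1 and I_2, and $P_i^{\mathrm{fro}}$ and $P_i^{\mathrm{lat}}$ are the frontal and lateral projection images of the i-th body without garments. The frontal error L_f compares $Y_i^{\mathrm{fro}}$ with $P_i^{\mathrm{fro}}$, and the lateral error L_s compares $Y_i^{\mathrm{lat}}$ with $P_i^{\mathrm{lat}}$.

Combining the shape parameter error with the frontal and lateral contour errors, the total loss is

$$L_{\mathrm{total}} = L_\gamma + \phi L_f + \varphi L_s$$

where $\phi$ and $\varphi$ are the weight coefficients of the frontal and lateral projection errors, respectively.

Fig. 3 Garment simulation on human body surface
Fig. 4 Network structure of LeNet-5

4 Experimental results and analysis
Extensive experiments were carried out to assess the effectiveness and generalization of the method objectively. They comprise model comparisons and 3D body recovery from images. The model comparisons cover errors under different loss functions and comparisons with common deep-learning and non-deep-learning algorithms. Bodies are recovered both from virtual dressed-body images and from frontal images of real people. The virtual images are pinhole projections of 3D dressed-body models; because the true 3D body shape is known, they allow the method to be tested precisely. Reconstruction from images of real dressed people verifies how the method performs in practice.

4.1 Model comparison
A main contribution of this paper is optimizing the network with a loss made of the body shape parameter error and the frontal and lateral contour errors, which yields more accurate 3D reconstruction. To verify this, models were built with the same dressed-body database, network structure and training parameters, changing only the loss function, and the shape errors of their reconstructions were evaluated. In Table 1, e is the mean error. When only the PCA coefficient vector γ_i is used as the error term, the overall mean error is about 1.8 cm and the mean girth error (chest/waist/hip) exceeds 3 cm. That is insufficient for applications that demand accurate 3D body models. Contour constraints, especially the lateral contour constraint, clearly improve the fitting ability of the model and reduce the shape and girth errors of the reconstructed bodies.

Table 1 Reconstruction error with different loss functions (e/cm)
Loss function | Overall | Chest | Waist | Hip | Arm length | Leg length
$L_\gamma$ | 1.76 | 3.27 | 3.18 | 3.51 | 1.94 | 2.04
$L_\gamma + \phi L_f$ | 1.36 | 2.34 | 2.49 | 2.72 | 1.48 | 1.59
$L_\gamma + \varphi L_s$ | 1.31 | 2.36 | 2.23 | 2.66 | 1.41 | 1.62
$L_{\mathrm{total}}$ | 1.15 | 1.97 | 2.08 | 2.32 | 1.21 | 1.45

For an objective assessment, Table 2 lists the errors of several methods that recover 3D body models from a single image. Guan et al. [15] provide only 3 real body samples and report only chest and waist errors, so Table 2 gives only error ranges for that method. The methods of [12], [14] (Kanazawa et al.) and [16] are deep-learning-based. Their accuracy depends strongly on the training dataset, and the errors in Table 2 are all on Human3.6M. The proposed method is more accurate than these common models, which emphasize pose recovery. Compared with the traditional method [15], which needs manual interaction beforehand, it also gives a better user experience and its results depend less on the person being measured.

Table 2 Reconstruction error with different methods (e/cm)
Method | Overall | Chest | Waist | Hip
LeNet-5 | 1.15 | 1.97 | 2.08 | 2.32
Method of [15] | n/a | 0.1~4.5 | 0.6~3.4 | n/a
Method of [12] | 7.59 | n/a | n/a | n/a
Method of [14] | 5.68 | n/a | n/a | n/a
Method of [16] | 5.99 | n/a | n/a | n/a

4.2 3D reconstruction of virtual bodies
In practice, users find it hard to hold a precise pose. To evaluate the model more accurately and test its generalization, this experiment simulates real data by moderately changing the poses of dressed bodies in the database. As shown in Fig. 5, the angle between the arms and the torso was set to 15° and 45°, and the distance between the feet was adjusted moderately. The deformed bodies were dressed with Marvelous Designer, and their projected frontal contour images served as model input for shape prediction. In Table 3, P is the angle between the arms and the torso and L is the distance between the feet. The model generalizes well: within a reasonable range, slight pose changes do not noticeably affect the reconstruction.

Fig. 5 3D human bodies with different postures

To show the model's performance on bodies of different builds, several representative bodies were selected manually from the dataset. As shown in Fig. 6, the errors at the limbs and head are relatively larger, mainly because poses differ slightly between bodies.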
These experiments show that the proposed model achieves relatively high-precision 3D reconstruction for bodies of standard build as well as for heavier or slimmer bodies.

Table 3 Reconstruction error on different postures (e/cm)
Posture | Overall | Chest | Waist | Hip | Arm length | Leg length
P = 15° | 1.56 | 3.45 | 2.91 | 3.34 | 1.84 | 2.15
P = 25° | 1.26 | 2.06 | 2.24 | 2.29 | 1.41 | 1.53
P = 30° | 1.15 | 1.97 | 2.08 | 2.32 | 1.21 | 1.45
P = 35° | 1.19 | 1.95 | 2.17 | 2.57 | 1.15 | 1.57
P = 45° | 1.69 | 3.16 | 3.05 | 3.40 | 2.02 | 2.01
P = 90° | 2.68 | 4.52 | 5.05 | 5.13 | 3.17 | 3.64
L = 0 | 1.21 | 2.14 | 2.09 | 2.45 | 1.36 | 1.68

Fig. 6 3D human body reconstruction in different shapes: (a) input image; (b) reconstruction result; (c) reconstruction error (scale up to >15 mm)
Fig. 7 Preprocessing for frontal image (segmentation, alignment)

4.3 3D reconstruction of real bodies
The accuracy of a 3D model depends on how much the test data differ from the training samples. To demonstrate the method's practical value, experiments were run on real people in three groups, which evaluate how the subject's body, pose and garment style affect the reconstruction. During training, the inputs are frontal contour images with clean backgrounds, and the camera position is assumed fixed. In real applications, backgrounds are usually varied and complex and the camera position is hard to fix. The input image must therefore be preprocessed so that its camera position matches that of the training images. In other words, the image captured of a real person should be identical to the projection of the corresponding virtual body, assuming a high-precision 3D model of that person can be reconstructed. As shown in Fig. 7, the steps are as follows.
1) Background segmentation. The background is removed: the GrabCut algorithm [26] extracts the body from the original image, which is then binarized. Because segmentation is not the focus of this paper, all data were captured against a white background, which is easy to arrange in practice.
2) Center alignment. The mean point of the body is taken as the body center, and the image center is aligned with it.
3) Image scaling. From the height supplied by the user, the physical size of each pixel in the current image is computed. The body contour is then scaled so that each pixel covers 8 mm×8 mm.

To test the model on different bodies, several volunteers of different heights and builds were photographed from the front under the sample-acquisition requirements: A-pose, wearing long tops and long trousers or a short skirt (Figs. 8 and 9). Table 4 lists the key girth errors of the reconstructions. Comparing Table 4 with Tables 1 and 3 shows that real frontal images give girth errors close to those obtained from projected virtual bodies. This supports the effectiveness of the method in real applications.

Table 4 Reconstruction error on different bodies (e/cm)
Body | Garment | Chest | Waist | Hip | Arm length | Leg length
Body 1 | Long top and trousers | 2.35 | 2.08 | 1.14 | 1.96 | 0.91
Body 1 | Short skirt | 3.15 | 2.93 | 3.20 | 1.69 | 1.46
Body 2 | Long top and trousers | 1.05 | 0.81 | 1.02 | 0.36 | 1.34
Body 2 | Short skirt | 1.47 | 1.61 | 1.84 | 1.37 | 2.11
Body 3 | Long top and trousers | 1.54 | 1.72 | 3.03 | 2.35 | 1.71
Body 3 | Short skirt | 2.61 | 2.20 | 2.61 | 3.22 | 1.68

To test the influence of pose in a real environment, the volunteers randomly changed their arm pose and foot spacing, as in the experiment of Section 4.2. Frontal images in the different poses were captured, preprocessed and reconstructed. As shown in Fig. 10 and Table 5, the girth errors are close to the theoretical values of Section 4.2. This confirms that the model generalizes well: slight pose changes within a reasonable range do not noticeably affect reconstruction accuracy.

Table 5 Reconstruction error for real human bodies with different postures (e/cm)
Posture | Chest | Waist | Hip | Arm length | Leg length
P = 15° | 1.84 | 0.73 | 1.51 | 1.46 | 1.58
P = 25° | 1.62 | 0.88 | 0.67 | 0.93 | 1.61
P = 30° | 1.05 | 0.81 | 1.02 | 0.73 | 1.34
P = 35° | 0.91 | 1.20 | 1.53 | 0.36 | 0.95
P = 45° | 2.15 | 2.06 | 2.07 | 1.04 | 1.36
P = 90° | 2.16 | 3.84 | 4.21 | 5.13 | 4.38
L = 0 | 1.26 | 1.39 | 0.79 | 0.51 | 1.37

Fig. 8 3D human body reconstruction from images captured in front view, posture A, and long trousers
Fig. 9 3D human body reconstruction from images captured in front view, posture A, and short skirt
Fig. 10 Reconstruction from images of human bodies in different postures

To make the results visually closer to the input images, the poses of the 3D bodies in Fig. 10 were adjusted manually.

To improve the user experience and test the model's generalization, the influence of garment style on the reconstruction was evaluated. One participant wore garments of different styles and was photographed from the front as required (Fig. 11). Table 6 lists the girth errors of the reconstructions. With the current dataset, results are good when the participant's garment is the same as or similar to a style in the training set. When the garment style changes markedly, accuracy suffers. The main cause is the small number of garment styles in the dataset, and adding more styles should improve accuracy. This paper aims to learn, from the existing dataset, the statistical relationship between frontal images of dressed bodies and the body surface shape. Some cases remain hard: for example, a subject with a relatively heavy face and limbs and a slim torso who wears loose clothing. There the frontal image usually cannot reflect the surface shape accurately, and the reconstruction error will be relatively large.

Fig. 11 Reconstruction from images of one human body in different garments

Table 6 Reconstruction error for real human bodies with different garments (e/cm)
Garment style | Chest | Waist | Hip | Arm length | Leg length
Style 1 | 0.73 | 1.43 | 2.71 | 1.82 | 1.51
Style 2 | 2.65 | 2.81 | 1.97 | 1.60 | 0.82
Style 3 | 4.79 | 4.56 | 1.76 | 3.14 | 1.01

5 Conclusion
This paper proposes recovering a 3D human body from a single dressed image with LeNet-5. The goal is a high-precision 3D body model that meets virtual try-on requirements while reducing the influence of body pose, image background and garment style on accuracy. About 1 500 real female bodies were taken from a public dataset and augmented on the manifold with PGA. The virtual bodies were dressed in virtual garments of different styles and sizes to build a dressed-body dataset. Features are extracted from the frontal contour image, and body shape is predicted with a loss made of the PCA coefficient vector error and the frontal and lateral contour errors. Experiments show that the model usually gives satisfactory reconstructions for bodies of different builds in different garments. The main contributions are: 1) using the lateral body contour as a constraint, which improves model performance; 2) training on dressed-body samples with different garment styles and sizes, which reduces the influence of garment style and size on the reconstruction.

References:
[1] ALLDIECK T, MAGNOR M, XU W, et al. Detailed human avatars from monocular video [C]// International Conference on 3D Vision. Verona: IEEE, 2018: 98–109.
[2] TONG J, ZHOU J, LIU L, et al. Scanning 3D full human bodies using kinects [J]. IEEE Transactions on Visualization and Computer Graphics, 2012, 18(4): 643–650.
[3] CHEN G, LI J, WANG B, et al. Reconstructing 3D human models with a kinect [J]. Computer Animation and Virtual Worlds, 2016, 27(1): 72–85.
[4] CHEN G, LI J, ZENG J, et al. Optimizing human model reconstruction from RGB-D image based on skin detection [J]. Virtual Reality, 2016, 20(3): 159–172.
[5] WEISS A, HIRSHBERG D, BLACK M J. Home 3D body scans from noisy image and range data [C]// International Conference on Computer Vision. Barcelona: IEEE, 2011: 1951–1958.
[6] WANG C C L. Parameterization and parametric design of mannequins [J]. Computer-Aided Design, 2005, 37(1): 83–98.
[7] BAEK S Y, LEE K. Parametric human body shape modeling framework for human-centered product design [J]. Computer-Aided Design, 2012, 44(1): 56–67.
[8] HUANG J, KWOK T H, ZHOU C. Parametric design for human body modeling by wireframe-assisted deep learning [J]. Computer-Aided Design, 2019, 108: 19–29.
[9] ANGUELOV D, SRINIVASAN P, KOLLER D, et al. SCAPE: shape completion and animation of people [J]. ACM Transactions on Graphics, 2005, 24(3): 408–416.
[10] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model [J]. ACM Transactions on Graphics, 2015, 34(6): 248.
[11] POPA A I, ZANFIR M, SMINCHISESCU C. Deep multitask architecture for integrated 2D and 3D human sensing [C]// Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 6289–6298.
[12] PAVLAKOS G, ZHU L, ZHOU X, et al. Learning to estimate 3D human pose and shape from a single color image [C]// Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 459–468.
[13] JI Z, QI X, WANG Y, et al. Human body shape reconstruction from binary silhouette images [J]. Computer Aided Geometric Design, 2019, 71: 231–243.
[14] KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose [C]// Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7122–7131.
[15] GUAN P, WEISS A, BALAN A O, et al. Estimating human shape and pose from a single image [C]// International Conference on Computer Vision. Florida: IEEE, 2009: 1381–1388.
[16] OMRAN M, LASSNER C, PONS-MOLL G, et al. Neural body fitting: unifying deep learning and model based human pose and shape estimation [C]// International Conference on 3D Vision. Verona: IEEE, 2018: 484–494.
[17] JOHNSON S, EVERINGHAM M. Clustered pose and nonlinear appearance models for human pose estimation [C]// British Machine Vision Conference. Aberystwyth: BMVA, 2010: 5.
[18] IONESCU C, PAPAVA D, OLARU V, et al. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 36(7): 1325–1339.
[19] LASSNER C, ROMERO J, KIEFEL M, et al. Unite the people: closing the loop between 3D and 2D human representations [C]// Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 6050–6059.
[20] PISHCHULIN L, WUHRER S, HELTEN T, et al. Building statistical shape space for 3D human modeling [J]. Pattern Recognition, 2017, 67: 276–286.
[21] LI J, LU G. Customizing 3D garments based on volumetric deformation [J]. Computers in Industry, 2011, 62(7): 693–707.
[22] FREIFELD O, BLACK M J. Lie bodies: a manifold representation of 3D human shape [C]// European Conference on Computer Vision. Berlin: Springer, 2012: 1–14.
[23] FLETCHER P T, LU C, JOSHI S. Statistics of shape via principal geodesic analysis on Lie groups [C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Wisconsin: IEEE, 2003: 95–101.
[24] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324.
[25] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaptation of feature detectors [J]. Computer Science, 2012, 3(4): 212–223.
[26] ROTHER C, KOLMOGOROV V, BLAKE A. "GrabCut": interactive foreground extraction using iterated graph cuts [J]. ACM Transactions on Graphics, 2004, 23(3): 309–314.
Preface to "Computational Aesthetics: Computational-Science-Driven Measurement and Generation of Visual Aesthetics"
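Vol. 42 No. 22  FAN Jinsong et al.: Research and practice on 3D digital reconstruction and application of traditional ceramic artworks
models, and related virtual reality applications were developed.
As technology continues to develop, these digital contents will find increasingly broad applications in the future.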
Image super-resolution algorithm based on adaptive anchored neighborhood regression
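An image SR algorithm based on adaptive anchored neighborhood regression is therefore proposed, which computes the neighborhood matrices adaptively from the sample distribution. First, working on image patches, the K-means clustering algorithm groups the training samples into clusters. Next, the center of each cluster replaces the dictionary atoms when the corresponding neighborhood is computed. Finally, these neighborhoods are used to precompute the projection matrices from the low-resolution (LR) space to the high-resolution (HR) space.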
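The three steps above can be illustrated with a minimal NumPy sketch of anchored neighborhood regression using k-means anchors. Several details are assumptions, not taken from the paper. Features are stored column-wise. Anchors are plain Lloyd's k-means centers of the LR features. Each anchor's neighborhood is its `n_neighbors` nearest training samples. Each projection is the ridge-regression matrix $P_j = N_h (N_l^\top N_l + \lambda I)^{-1} N_l^\top$ used in ANR-style methods. All names and defaults are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the columns of X (d x n); returns d x k centers."""
    rng = np.random.default_rng(seed)
    centers = X[:, rng.choice(X.shape[1], size=k, replace=False)].copy()
    for _ in range(iters):
        # squared distance of every sample to every center: (n, k)
        d2 = ((X.T[:, None, :] - centers.T[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] > 0:          # keep the old center if a cluster empties
                centers[:, j] = members.mean(axis=1)
    return centers

def train_anchored_regressors(X_l, X_h, n_anchors=8, n_neighbors=32, lam=0.1):
    """Cluster LR features into anchors, then precompute one projection per anchor."""
    anchors = kmeans(X_l, n_anchors)
    projections = []
    for j in range(n_anchors):
        # neighborhood of anchor j: its n_neighbors closest LR training samples
        dist = np.linalg.norm(X_l - anchors[:, [j]], axis=0)
        idx = np.argsort(dist)[:n_neighbors]
        N_l, N_h = X_l[:, idx], X_h[:, idx]
        # P_j = N_h (N_l^T N_l + lam I)^{-1} N_l^T
        G = N_l.T @ N_l + lam * np.eye(len(idx))
        projections.append(N_h @ np.linalg.solve(G, N_l.T))
    return anchors, projections

def predict(y_l, anchors, projections):
    """Map one LR feature vector to HR with the projection of its nearest anchor."""
    j = np.linalg.norm(anchors - y_l[:, None], axis=0).argmin()
    return projections[j] @ y_l
```

At test time each LR patch feature is mapped with the projection of its nearest anchor. Because all projections are precomputed, reconstruction costs only a nearest-anchor search and one matrix-vector product per patch, which is the source of ANR's speed.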
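YE Shuang, YANG Xiaomin, YAN Binyu
(College of Electronics and Information Engineering, Sichuan University, Chengdu 610065) (*Corresponding author e-mail: yby@scu.edu.cn)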
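Abstract: Among dictionary-based image super-resolution (SR) algorithms, the anchored neighborhood regression (ANR) algorithm has attracted wide attention for its superior reconstruction speed and quality. However, the anchored neighborhood projection of ANR is unstable and therefore cannot adequately cover the mapping relationships of diverse patterns.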
Key words: image super-resolution; adaptive clustering; adaptive neighborhood; K-means clustering algorithm
CLC number: TP391.41
Document code: A
Image super-resolution algorithm based on adaptive anchored neighborhood regression
YE Shuang, YANG Xiaomin, YAN Binyu*
(College of Electronics and Information Engineering, Sichuan University, Chengdu Sichuan, 610065, China)
Abstract: Among the dictionary-based Super-Resolution (SR) algorithms, the Anchored Neighborhood Regression
Face restoration algorithm based on image patch similarity and completion generation
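SU Tingting; WANG Na
[Abstract] Imaging distance, the resolution of the imaging device and similar factors make it hard for an imaging system to capture the original scene without distortion. The result is deformation, blur, downsampling and noise. For restoring images degraded in this way, a face restoration method suited to low resolution and little prior knowledge is proposed. The distortion function of the restoration is built with the expected patch log likelihood (EPLL) framework based on image-patch similarity. The image is then restored through the completion-style generation process of a generative adversarial network. The proposed algorithm preserves the contours and visual characteristics of face images well at noise rates of 50% and higher. When restoring degraded images with 20% noise, its peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are clearly better than those of traditional patch-similarity-based algorithms.
[Journal] Science Technology and Engineering
[Year (Volume), Issue] 2019 (019) 013
[Pages] 6 (P171-176)
[Keywords] image restoration; image patch similarity; generative adversarial network; face restoration; image completion
[Authors] SU Tingting; WANG Na
[Affiliations] School of Cryptographic Engineering, Engineering University of PAP, Xi'an 710086; Department of Basic Sciences, Engineering University of PAP, Xi'an 710086
[Language] Chinese
[CLC] TP391.413
During image acquisition, imaging distance, device resolution and similar factors make it hard for the imaging system to capture the original scene without distortion. The acquired image is usually affected by deformation, blur, downsampling, noise and other factors, which lower its quality.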
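Increasing the spatial resolution of images and improving image quality has therefore long been a pressing problem in imaging technology [1].
Image restoration aims to mitigate, to some extent, the effects of the various disturbances in the imaging process. The usual approach models the degraded image as the original image convolved with a point spread function (PSF), plus noise. Depending on whether the PSF is known, methods are divided into conventional non-blind restoration and blind restoration.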
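The degradation model just described (original image convolved with a PSF, plus noise) can be simulated in NumPy, together with the PSNR metric used to compare restorations. This is an illustrative sketch under stated assumptions: circular convolution via the FFT with the PSF anchored at the array origin, additive white Gaussian noise, and intensities scaled to [0, 1]. The function names are hypothetical.

```python
import numpy as np

def degrade(f, psf, noise_std=0.0, seed=0):
    """Simulate g = h * f + n with circular convolution done in the Fourier domain."""
    H = np.fft.fft2(psf, s=f.shape)              # zero-pad the PSF to the image size
    g = np.real(np.fft.ifft2(np.fft.fft2(f) * H))
    rng = np.random.default_rng(seed)
    return g + noise_std * rng.standard_normal(f.shape)

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

With a known PSF, non-blind methods can undo the multiplication by `H` in the frequency domain, usually with regularization because `H` is small at high frequencies. Blind methods must also estimate `psf`.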
Modeling Based Image Reconstruction in Time-Resolved Contrast-Enhanced Magnetic Resonance Angiography (CE-MRA)
F.T.A.W. Wajer, J. van den Brink, M. Fuderer, D. van Ormondt, J.A.C. van Osch and R. de Beer
Delft University of Technology, Department of Applied Physics, P.O. Box 5046, 2600 GA Delft, The Netherlands
Phone: +31(0)152786394  Fax: +31(0)152783251  E-mail: beer@si.tn.tudelft.nl
Philips Medical Systems, Best, The Netherlands
Abstract: We have worked on the measurement protocol and related image reconstruction of 4D data sets in the field of time-resolved Contrast-Enhanced Magnetic Resonance Angiography (CE-MRA). The method aims at improving the interpolation of sparsely sampled k-space data. It is based on exploiting prior knowledge of the time courses (TCs) of the image pixels when monitoring the uptake of contrast agents in the blood vessels. The result is compared with an existing approach called 3D-TRICKS.
Keywords: Magnetic resonance angiography, contrast agent, image reconstruction, scan-time reduction, modeling of pixel time courses.
I. INTRODUCTION
We recently published a method for reducing the scan time in three-dimensional (3D) time-resolved Contrast-Enhanced Magnetic Resonance Angiography (CE-MRA) [1]. CE-MRA is a 3D Magnetic Resonance Imaging (3D-MRI) technique for visualizing the vascular system. It is based on using the passage of a contrast agent …
Fig. 2. Division of 3D k-space into four blocks labeled A, B, C and D, as proposed in 3D-TRICKS [2].
II. METHODS
A. Introduction
In 3D-MRI the data values in k-space are acquired along trajectories, which for Cartesian sampling distributions are equidistant straight lines. In Figure 1 these Cartesian trajectories are depicted as arrows along one k-space direction.
Several such trajectories are required to fill k-space. The total measurement time for one 3D k-space equals the number of trajectories times the repetition time (TR). Typical values are … and TR = 10 ms, resulting in a measurement time of about 20 s. Clearly, 20 s is too long to measure a series of 3D k-spaces during the passage of a contrast agent. A straightforward way to reduce the scan time is to omit the measurement of a number of trajectories. The next subsection describes how this is done in a sampling strategy called 3D-TRICKS.
Fig. 3. 3D-TRICKS measurement protocol [2]. At each sampling time point only one block A, B, C or D of k-space is measured. The sequence DACABA is repeated. The empty boxes must be interpolated from the measured ones prior to image reconstruction.
Fig. 4. TCs of three k-coordinates in the neighbourhood of the 3D k-space origin. (top) Contributions from all pixels. (bottom) No contributions from pixels outside the blood vessels.
B. 3D-TRICKS
In three-Dimensional Time-Resolved Imaging of Contrast KineticS (3D-TRICKS), MRI scan time is reduced by measuring only parts of 3D k-space. To avoid losing spatial resolution, previously measured k-space parts in the 4D data set are borrowed. The original 3D-TRICKS approach [2] consists of the following steps:
- "Subdivision" of 3D k-space into four blocks, labeled A, B, C and D (see Figure 2).
- Measurement of only "one single block" at each time point, following the sequence shown in Figure 3.
- Estimation of the data values in the empty blocks; in 3D-TRICKS this is done by "linear interpolation" between the acquired blocks.
- 3D FFT-based image reconstruction at each time point.
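The 3D-TRICKS acquisition order and its gap filling can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes one complex k-space sample per block and uses the repeated DACABA order of Fig. 3. Missing frames of a block are filled by linear interpolation with `numpy.interp`, applied separately to the real and imaginary parts. The function names are hypothetical.

```python
import numpy as np

PATTERN = "DACABA"  # block acquired at successive time frames, repeated (Fig. 3)

def tricks_schedule(num_frames):
    """Block of k-space ('A'..'D') acquired at each time frame."""
    return [PATTERN[t % len(PATTERN)] for t in range(num_frames)]

def fill_block_timecourse(block, schedule, measured):
    """Linearly interpolate one k-space sample of `block` to every frame.

    measured maps frame index -> (possibly complex) value for the frames in which
    `block` was acquired. Before the first and after the last acquisition,
    np.interp holds the end values constant.
    """
    frame_list = [t for t, b in enumerate(schedule) if b == block]
    values = np.array([measured[t] for t in frame_list], dtype=complex)
    frames = np.array(frame_list)
    t_all = np.arange(len(schedule))
    return (np.interp(t_all, frames, values.real)
            + 1j * np.interp(t_all, frames, values.imag))
```

For example, with 18 frames the B block is acquired only at frames 4, 10 and 16, whereas A is acquired at every odd frame. This is the five-times-larger gap the next paragraph points out, and it is why linear interpolation is much cruder for B, C and D than for A.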
The 3D-TRICKS measurement protocol of Figure 3 shows that the inner blocks A, at the center of k-space, are measured most often. It also shows that repeating the sequence DACABA leaves a five times larger empty gap between successive B, C and D blocks than between successive A blocks. We anticipated that estimating the data values of the empty outer blocks by linear interpolation might be problematic. In the next subsection we propose an alternative way of dealing with the missing data. It is based on exploiting the behaviour of the individual pixels in the time direction, i.e., the pixel time courses (TCs).
Fig. 5. Fit of a sum of two gamma-variate functions (noiseless curves) to a TC of a pixel in an artery (left peak) and a vein (right peak).
C. Modeling by exploiting time behaviour
Since CE-MRA deals with "time series" of 3D k-spaces and related 3D images, it is natural to ask whether the behaviour as a function of time can be included in the sampling and image reconstruction strategy. The first decision to be made is in which domain to exploit the time behaviour: do we look at k-space or at image space as a function of time?
Fig. 6. Block scheme of the sampling and image reconstruction steps for time-resolved 4D CE-MRA: subdivision of 3D k-space into 3D-TRICKS blocks; measurement of a single block at each time point; zero filling in the truncation dimension to the full number of points; 3D FFT; interpolation of the TCs of the A images; interpolation of the TCs of the B, C and D images by imposing the TCs of A; adding the A, B, C and D images.
In principle, because the Fourier transform is linear [3], it should make no difference whether the TCs are observed via k-space coordinates or via image-space pixels. In real-world CE-MRA data sets, however, there is a complication. The image-domain "background", i.e., the pixels outside the blood vessels, contributes to the time behaviour in k-space. This is illustrated in Figure 4, where
we show some TCs of a 4D CE-MRA data set as depicted via the center of k-space. Details about the data set are given in the section Results and Discussion. When explaining Figure 4 we assume for a moment that the k-space is obtained from the image domain by a reverse Fourier transform. In doing so, the top part of Figure 4 shows some TCs “with the contributions from all pixels”. In the bottom part of the figure, on the other hand, the TCs of the same k-coordinates are shown, but now “without the contributions from the background pixels”. It can be seen that a somewhat clearer contrast-agent uptake is then visible.
In Figure 5 we show two TCs as obtained via the image domain. One pixel is located in an artery and the other in a vein. The time behaviour is now so pronounced that the TCs could even be modeled by a mathematical function (in the example by a sum of two gamma-variates [4]).
Because of the observations just described, we have decided to exploit the time behaviour solely via the image domain. In doing so we have defined a sequence of sampling and image reconstruction steps, as visualized in the block scheme of Figure 6. Concerning the block scheme the following can be said. Since the A blocks are measured most (see Figure 3), it seems a logical step to interpolate the TCs of the A images in some way and subsequently impose that time behaviour on the B, C and D images. The latter amounts to solving the equation (only shown for B):
(1)
Once the two linear parameters have been obtained, they can be used to calculate the B images at the other times:
(2)
Hence an important step in the method is that the missing time points of the A images must be estimated in some way.
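The interpolation step described above can be sketched for a single pixel in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. As a hypothesis rather than the paper's Equation (1), the sketch assumes that the two linear parameters act per pixel as an affine model, B(t) ≈ alpha·A(t) + beta, fitted by least squares at the time points where B was measured. The DACABA schedule follows Figure 3; the function names and the toy uptake curve (a single gamma-variate, whereas Figure 5 uses a sum of two) are hypothetical.

```python
import numpy as np

def dacaba_schedule(n_frames):
    """Block measured at each time point, repeating the DACABA sequence (Figure 3)."""
    pattern = "DACABA"
    return [pattern[i % len(pattern)] for i in range(n_frames)]

def gamma_variate(t, k, t0, a, b):
    """Gamma-variate k*(t - t0)**a * exp(-(t - t0)/b) for t > t0, zero before t0."""
    s = np.clip(t - t0, 0.0, None)
    return k * s ** a * np.exp(-s / b)

def linear_interp_tc(times, sampled_idx, values):
    """3D-TRICKS-style linear interpolation of one pixel TC onto all time points.
    np.interp holds the first/last sample constant outside the sampled range."""
    return np.interp(times, times[sampled_idx], values)

def impose_a_on_b(tc_a_full, b_idx, tc_b_sampled):
    """ASSUMED model (hypothetical): tc_b(t) ~ alpha * tc_a(t) + beta for one pixel.
    Fit alpha and beta by least squares at the B sample times, then predict B
    at every time point from the (interpolated) A time course."""
    design = np.column_stack([tc_a_full[b_idx], np.ones(len(b_idx))])
    (alpha, beta), *_ = np.linalg.lstsq(design, tc_b_sampled, rcond=None)
    return alpha * tc_a_full + beta, alpha, beta

# Toy pixel: 39 frames, 1.6 s apart (about 62 s, as for the real-world data set).
times = np.arange(39) * 1.6
true_a = gamma_variate(times, k=0.01, t0=0.0, a=2.0, b=5.0)   # hypothetical A part
true_b = 0.3 * true_a + 0.05                                   # hypothetical B part

sched = dacaba_schedule(len(times))
a_idx = np.array([i for i, blk in enumerate(sched) if blk == "A"])
b_idx = np.array([i for i, blk in enumerate(sched) if blk == "B"])

tc_a = linear_interp_tc(times, a_idx, true_a[a_idx])          # A: every 2nd frame
tc_b_linear = linear_interp_tc(times, b_idx, true_b[b_idx])   # 3D-TRICKS for B
tc_b_model, alpha, beta = impose_a_on_b(tc_a, b_idx, true_b[b_idx])

print("max error, linear interpolation:", np.abs(tc_b_linear - true_b).max())
print("max error, imposed A behaviour:", np.abs(tc_b_model - true_b).max())
```

In this toy case the B part is, by construction, an affine copy of the A part, so the curve obtained by imposing the densely sampled A time course stays much closer to the true B curve than linear interpolation across the six-frame B gap. In a full reconstruction the same fit would be repeated for every pixel and for the C and D images; when a block's contribution has a different shape from that of A, the affine assumption no longer holds.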
We have found that the number of A points is sufficient to allow “linear interpolation”, as was proposed in the original 3D-TRICKS method [2].
Fig. 7. The TC of a pixel obtained (solid) using all data, (dashed) using 3D-TRICKS and (dotted-solid) imposing the time behaviour of the A part on that of the B part.
In order to illustrate that imposing the time behaviour of pixels can yield an improvement, we compare in Figure 7 the TC of a pixel as realized in three ways. The solid line is the true TC, obtained by using all data of the real-world 4D data set mentioned earlier. The dashed line results from simulating the 3D-TRICKS approach, that is to say, after reducing the data set to the 3D-TRICKS blocks and performing linear interpolation. Finally, the dotted-solid line is the result of imposing the TC of the A part on that of the B part. It is clear that the latter is much closer to the true TC than the linearly interpolated one.
III. RESULTS AND DISCUSSION
A. Simulation of a 4D CE-MRA data set
In order to establish whether the approach described in the previous section really works, we have applied the method to a “simulated” 4D CE-MRA data set. That is to say, we simulated the 3D vascular system by sets of arteries and veins, as shown in the top of Figure 8, and, in addition, we simulated the uptake of a contrast agent by the TCs shown in the bottom of that figure. Subsequently, we applied the protocol described by the block scheme of Figure 6.
Fig. 8. Simulation of a 4D CE-MRA data set. (top) Cross-section through the 3D image domain. (bottom) The simulated TC of an artery (solid) and a vein (dashed).
In Figure 9 we present the results of 3D-TRICKS and our approach, as obtained for the simulated 4D CE-MRA data set. Along the vertical axis the so-called “norm” is displayed, which is the square root of the sum of the squared differences between the absolute pixel values of the reconstructed image and the true image. The latter is known, of course, because we are dealing with a simulated image. The horizontal axis concerns
the first 40 seconds of the time domain of the contrast-agent uptake (the total time is about 62 seconds). Figure 9 demonstrates that a 4D CE-MRA data set measured according to the 3D-TRICKS protocol can indeed benefit from our interpolation approach for the outer k-space blocks. In the figure, most of the norm points for 3D-TRICKS are considerably higher (larger differences with the true image) than the points for the new approach. Summing the norms over all time points and taking the 3D-TRICKS result as the 100% reference, the sum for our method is 42% of that of 3D-TRICKS.
Fig. 9. Performance of 3D-TRICKS and the new method for the simulated 4D CE-MRA data set. Along the vertical axis the “norm” (see text) of the differences between the absolute pixel values of the reconstructed image and the true image is displayed. The horizontal axis concerns the first 40 seconds of the time domain. (dashed) Result of 3D-TRICKS. (solid) Result of the new method.
Fig. 10. Performance of 3D-TRICKS and the new method for the real-world 4D CE-MRA data set. (dashed) Result of 3D-TRICKS. (solid) Result of the new method.
B. A real-world 4D CE-MRA data set
The next step is to apply the new approach to a real-world 4D CE-MRA data set. To that end we used a measurement of the vascular system of the human neck. The size of the 4D acquisition matrix was …; for the complex-valued k-space this amounts to a data set of about 87 MB, taking into account that the data values are stored in double precision. To increase the temporal resolution of the measurement, the SENSitivity Encoding (SENSE) method was employed. This method realizes a reduced-sampling strategy by using an array of receiver coils [5]. In this way, an inter-frame time of 1.6 s could be achieved for the data set at hand, which means a total measurement time of about 62 s. This time is long enough to visualize the enhancement of both the arteries and the veins. In order to handle this kind of data set we have worked with a PC containing a 1.1 GHz AMD Athlon CPU and having
a 1.5 GB SDRAM memory. Furthermore, the hard-disk capacity is 150 GB and the system can run either Red Hat Linux or Windows NT. When composing this system we assumed that it must be capable of accessing roughly three times the size of the resulting image-domain matrix during image reconstruction. A typical case would be, for instance, a … reconstruction matrix (assuming short integers for the pixel values).
Fig. 11. Performance for the real-world 4D CE-MRA data set after introducing true 3D rectangular k-space blocks and optimizing the sampling sequence of the blocks. (dashed) Result of 3D-TRICKS. (solid) Result of the new method. (plus) Result of a mixed approach (see text).
In Figure 10 the performance of 3D-TRICKS and the new method is compared for the real-world 4D CE-MRA data set. It can be seen that the improvement of the new interpolation method is now present only for a few time points around 13 seconds. This is the moment at which the contrast agent starts to pass through the arteries (see again Figure 5).
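The error measure used in Figures 9 to 11, and the summed-norm percentages quoted in the text, can be sketched as follows. This is a minimal NumPy illustration, assuming that the reconstructed and the true image series are stored as arrays of shape (T, nx, ny, nz); the function names and the toy arrays are hypothetical.

```python
import numpy as np

def norm_per_time(recon, truth):
    """Per-time-point norm: square root of the sum of squared differences between
    the absolute pixel values of the reconstructed and the true image.
    Arrays are assumed to have shape (T, nx, ny, nz); complex values are allowed."""
    diff = np.abs(recon) - np.abs(truth)
    return np.sqrt((diff ** 2).reshape(diff.shape[0], -1).sum(axis=1))

def summed_norm_percent(norms_method, norms_reference):
    """Sum of the norms over all time points, as a percentage of a reference method."""
    return 100.0 * np.sum(norms_method) / np.sum(norms_reference)

# Toy data: 3 time points of a 2x2x2 image; every pixel of the "reference"
# reconstruction is off by 0.2 in magnitude, every pixel of the "new" one by 0.1.
truth = np.ones((3, 2, 2, 2), dtype=complex)
recon_reference = truth * 1.2
recon_new = truth * 1.1

norms_ref = norm_per_time(recon_reference, truth)
norms_new = norm_per_time(recon_new, truth)
print(norms_ref, norms_new)
print(summed_norm_percent(norms_new, norms_ref))  # about 50 (%)
```

Because the norm compares absolute pixel values, phase differences between the reconstruction and the truth do not contribute to it. The toy numbers above are illustrative only and are not taken from the measurements.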
For most of the other time points the performance of 3D-TRICKS is even slightly better. Summing the norms over all time points and taking the 3D-TRICKS result as the 100% reference, the sum for our method is 102% of that of 3D-TRICKS. In the next subsection we describe how we have tried to improve this somewhat disappointing result for the real-world data set.
C. An improvement of the new interpolation approach
The performance results shown in Figure 10 were obtained by using the original 3D-TRICKS k-space blocks (Figure 2) and measurement protocol (Figure 3) [2]. In order to find out whether we could improve the results for the real-world data set, we have varied some parameters of the approach:
- The A, B, C and D k-space regions were changed into true 3D rectangular blocks (instead of blocks dividing only a single k-space dimension; see again Figure 2).
- The order of sampling the A, B, C and D blocks was varied, leading to an optimum sequence for the current real-world data set. In fact, more B blocks were introduced at the expense of the C and D blocks.
Fig. 12. TCs of the original real-world CE-MRA data set, separated into the contributions from the A blocks (solid), the B blocks (dashed), the C blocks (plus-solid) and the D blocks (dotted-solid). (top) Pixel (55,49,7) (in an artery). (bottom) Pixel (70,56,7) (in between an artery and a vein).
Dividing the k-space into true 3D rectangular blocks introduced some minor improvement for both 3D-TRICKS and our approach. Measuring more B blocks yielded a larger improvement, however more for 3D-TRICKS than for our interpolation method (see Figure 11). This led us to suspect that our assumption about the time behaviour of the A, B, C and D blocks does not hold for all pixels. Indeed, after visualizing many pixel TCs we could demonstrate that for certain pixels the contributions from the various k-space blocks may have different shapes. This is illustrated in Figure 12. In order to account for these different time behaviours, we finally applied a
mixed interpolation approach:
- Both the A blocks and the D blocks are linearly interpolated (i.e. according to 3D-TRICKS). For A this was already the case. For D this was decided because these blocks represent only a small signal strength (see Figure 12).
- The B and C blocks are interpolated either according to 3D-TRICKS or according to our approach, depending on the kind of time behaviour (see Figure 13).
Fig. 13. TCs of pixel (55,49,7) (in an artery; see also the top of Figure 12). The curves are from the full, original data set (dotted-solid), from the 3D-TRICKS result (dashed) and from the model-based result (solid). (top) The B part. (bottom) The C part. It is clear that in this example the B part should be interpolated according to 3D-TRICKS and the C part according to our approach.
The result of this mixed approach is shown in Figure 11 as the plus points. For a number of time points they represent the lowest norm, in other words the best performance. However, when summing the norms over all time points the differences are rather small. Taking the original 3D-TRICKS approach of Figure 10 as the 100% reference, the sums for our original approach, the new 3D-TRICKS and the mixed approach are 88%, 86% and 85%, respectively.
IV. CONCLUSIONS
We have worked on improving the measurement protocol and the related image reconstruction method for sampling 4D data sets in the field of time-resolved Contrast-Enhanced Magnetic Resonance Angiography (CE-MRA).
The method takes the 3D-TRICKS approach as a starting point and aims at improving the interpolation of sparsely sampled outer k-space blocks. This is realized by employing prior knowledge delivered by the pixel time courses (TCs). Compared to the original 3D-TRICKS protocol [2] we can conclude that:
- For a simulated CE-MRA data set the new interpolation approach works rather well. A gain in image reconstruction performance of about 58% is obtained.
- Dividing the k-space into true 3D blocks, instead of blocks that divide only a single dimension, delivers a minor gain in performance.
- For the real-world CE-MRA data set a larger gain in performance is obtained by collecting more inner k-space blocks than outer blocks as a function of time. The 3D-TRICKS method benefits more from this than our method. Compared to the original 3D-TRICKS measurement protocol, the new 3D-TRICKS gains about 14% and our new interpolation approach 12%.
- The assumption that outer k-space blocks deliver the same kind of time contribution to the pixel TCs as inner blocks is not always met. This can be the case, for example, for pixels near blood-vessel walls and for pixels in between neighbouring arteries and veins.
- An improvement of our interpolation method is obtained by carrying out a mixed approach for the interpolation of the sparsely sampled k-space blocks. That is to say, some blocks are linearly interpolated (i.e. according to 3D-TRICKS) and others are interpolated according to our approach (i.e. imposing the TCs of inner blocks on those of the outer ones). Compared to the original 3D-TRICKS, this yielded a gain in performance of about 15%.
ACKNOWLEDGEMENTS
This work is supported by the Dutch Technology Foundation (STW, project DTN4967).
REFERENCES
[1] Wajer F.T.A.W.; Fuderer M.; van der Brink J.; van Ormondt D.; de Beer R. Image Reconstruction With Time-Resolved Contrast. Proc. ProRISC 2001, Veldhoven, the Netherlands, pp. 714-720. On CD.
[2] Korosec F.R.; Frayne R.; Grist T.M.; Mistretta C.A. Time-Resolved Contrast-Enhanced 3D MR Angiography. Magn. Reson. Med. 36:345-351; 1996.
[3] Brigham E.O. The Fast Fourier Transform. Englewood Cliffs: Prentice-Hall; 1974.
[4] Fain S.B.; Riederer S.J.; Bernstein M.A.; Huston J. Theoretical Limits of Spatial Resolution in Elliptical-Centric Contrast-Enhanced 3D-MRA. Magn. Reson. Med. 42:1106-1116; 1999.
[5] Pruessmann K.P.; Weiger M.; Scheidegger M.B.; Boesiger P. SENSE: Sensitivity Encoding for Fast MRI. Magn. Reson. Med. 42:952-962; 1999.