Texture Synthesis by Non-parametric Sampling
Multi-scale structural similarity for image quality assessment
MULTI-SCALE STRUCTURAL SIMILARITY FOR IMAGE QUALITY ASSESSMENT

Zhou Wang¹, Eero P. Simoncelli¹ and Alan C. Bovik² (Invited Paper)
¹ Center for Neural Sci. and Courant Inst. of Math. Sci., New York Univ., New York, NY 10003
² Dept. of Electrical and Computer Engineering, Univ. of Texas at Austin, Austin, TX 78712
Email: zhouwang@, eero.simoncelli@, bovik@

ABSTRACT

The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.

1. INTRODUCTION

Objective image quality assessment research aims to design quality measures that can automatically predict perceived image quality. These quality measures play important roles in a broad range of applications such as image acquisition, compression, communication, restoration, enhancement, analysis, display, printing and watermarking. The most widely used full-reference image quality and distortion assessment algorithms are peak signal-to-noise ratio (PSNR) and mean squared error (MSE), which do not correlate well with perceived quality (e.g., [1]–[6]).

Traditional perceptual image quality assessment methods are based on a bottom-up approach which attempts to simulate the functionality of the relevant early human visual system (HVS) components. These methods usually involve 1) a preprocessing process that may include image alignment, point-wise nonlinear transform, low-pass filtering that simulates eye optics, and color space transformation; 2) a channel decomposition process that transforms the image signals into different spatial frequency as well as orientation selective subbands; 3) an error normalization process that weights the error signal in each subband by incorporating the variation of visual sensitivity in different subbands, and the variation of visual error sensitivity caused by intra- or inter-channel neighboring transform coefficients; and 4) an error pooling process that combines the error signals in different subbands into a single quality/distortion value. While these bottom-up approaches can conveniently make use of many known psychophysical features of the HVS, it is important to recognize their limitations. In particular, the HVS is a complex and highly non-linear system and the complexity of natural images is also very significant, but most models of early vision are based on linear or quasi-linear operators that have been characterized using restricted and simplistic stimuli. Thus, these approaches must rely on a number of strong assumptions and generalizations [4], [5]. Furthermore, as the number of HVS features has increased, the resulting quality assessment systems have become too complicated to work with in real-world applications, especially for algorithm optimization purposes.

Structural similarity provides an alternative and complementary approach to the problem of image quality assessment [3]–[6]. It is based on a top-down assumption that the HVS is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity should be a good approximation of perceived image quality.
It has been shown that a simple implementation of this methodology, namely the structural similarity (SSIM) index [5], can outperform state-of-the-art perceptual image quality metrics. However, the SSIM index algorithm introduced in [5] is a single-scale approach. We consider this a drawback of the method because the right scale depends on viewing conditions (e.g., display resolution and viewing distance). In this paper, we propose a multi-scale structural similarity method and introduce a novel image synthesis-based approach to calibrate the parameters that weight the relative importance between different scales.

2. SINGLE-SCALE STRUCTURAL SIMILARITY

Let x = {x_i | i = 1, 2, ..., N} and y = {y_i | i = 1, 2, ..., N} be two discrete non-negative signals that have been aligned with each other (e.g., two image patches extracted from the same spatial location of two images being compared), and let \mu_x, \sigma_x^2 and \sigma_{xy} be the mean of x, the variance of x, and the covariance of x and y, respectively. Approximately, \mu_x and \sigma_x can be viewed as estimates of the luminance and contrast of x, and \sigma_{xy} measures the tendency of x and y to vary together, thus an indication of structural similarity. In [5], the luminance, contrast and structure comparison measures were given as follows:

l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},  (1)

c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},  (2)

s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3},  (3)

where C_1, C_2 and C_3 are small constants given by

C_1 = (K_1 L)^2,  C_2 = (K_2 L)^2,  C_3 = C_2 / 2,  (4)

respectively. L is the dynamic range of the pixel values (L = 255 for 8 bits/pixel gray scale images), and K_1 \ll 1 and K_2 \ll 1 are two scalar constants. The general form of the Structural SIMilarity (SSIM) index between signals x and y is defined as:

SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma},  (5)

where \alpha, \beta and \gamma are parameters that define the relative importance of the three components. Specifically, we set \alpha = \beta = \gamma = 1, and the resulting SSIM index is given by

SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},  (6)

which satisfies the following conditions: 1. symmetry: SSIM(x, y) = SSIM(y, x); 2. boundedness: SSIM(x, y) ≤ 1; 3. unique maximum: SSIM(x, y) = 1 if and only if x = y.

The universal image quality index proposed in [3] corresponds to the case C_1 = C_2 = 0, and is therefore a special case of (6). The drawback of that parameter setting is that when the denominator of Eq. (6) is close to 0, the resulting measurement becomes unstable. This problem was solved successfully in [5] by adding the two small constants C_1 and C_2 (calculated by setting K_1 = 0.01 and K_2 = 0.03, respectively, in Eq. (4)).

We apply the SSIM indexing algorithm for image quality assessment using a sliding window approach. The window moves pixel-by-pixel across the whole image space. At each step, the SSIM index is calculated within the local window. If one of the images being compared is considered to have perfect quality, then the resulting SSIM index map can be viewed as the quality map of the other (distorted) image. Instead of using an 8×8 square window as in [3], a smooth windowing approach is used for local statistics to avoid "blocking artifacts" in the quality map [5]. Finally, the mean SSIM index of the quality map is used to evaluate the overall image quality.
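To make Eq. (6) concrete, here is a minimal NumPy sketch of the single-scale SSIM index for two aligned grayscale patches. The function name and the use of plain global patch statistics (rather than the smooth sliding window of [5]) are simplifying assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def ssim_index(x, y, L=255, K1=0.01, K2=0.03):
    """Single-scale SSIM of two aligned patches, per Eq. (6)."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2          # Eq. (4)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

In the paper these statistics are computed inside a smooth window sliding over the image, and the mean of the resulting SSIM map gives the overall quality score; the patch-level formula above is the core of that computation.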
3. MULTI-SCALE STRUCTURAL SIMILARITY

3.1. Multi-scale SSIM index

The perceivability of image details depends on the sampling density of the image signal, the distance from the image plane to the observer, and the perceptual capability of the observer's visual system. In practice, the subjective evaluation of a given image varies when these factors vary. A single-scale method as described in the previous section may be appropriate only for specific settings. A multi-scale method is a convenient way to incorporate image details at different resolutions.

We propose a multi-scale SSIM method for image quality assessment whose system diagram is illustrated in Fig. 1.

Fig. 1. Multi-scale structural similarity measurement system. L: low-pass filtering; 2↓: downsampling by 2.

Taking the reference and distorted image signals as the input, the system iteratively applies a low-pass filter and downsamples the filtered image by a factor of 2. We index the original image as Scale 1, and the highest scale as Scale M, which is obtained after M − 1 iterations. At the j-th scale, the contrast comparison (2) and the structure comparison (3) are calculated and denoted as c_j(x, y) and s_j(x, y), respectively. The luminance comparison (1) is computed only at Scale M and is denoted as l_M(x, y). The overall SSIM evaluation is obtained by combining the measurements at different scales using

SSIM(x, y) = [l_M(x, y)]^{\alpha_M} \cdot \prod_{j=1}^{M} [c_j(x, y)]^{\beta_j} [s_j(x, y)]^{\gamma_j}.  (7)

Similar to (5), the exponents \alpha_M, \beta_j and \gamma_j are used to adjust the relative importance of the different components. This multi-scale SSIM index definition satisfies the three conditions given in the last section. It also includes the single-scale method as a special case. In particular, a single-scale implementation for Scale M applies the iterative filtering and downsampling procedure up to Scale M, and only the exponents \alpha_M, \beta_M and \gamma_M are given non-zero values.

To simplify parameter selection, we let \alpha_j = \beta_j = \gamma_j for all j. In addition, we normalize the cross-scale settings such that \sum_{j=1}^{M} \gamma_j = 1. This makes different parameter settings (including all single-scale and multi-scale settings) comparable. The remaining job is to determine the relative values across different scales. Conceptually, this should be related to the contrast sensitivity function (CSF) of the HVS [7], which states that human visual sensitivity peaks at middle frequencies (around 4 cycles per degree of visual angle) and decreases along both the high- and low-frequency directions. However, the CSF cannot be directly used to derive the parameters in our system because it is typically measured at the visibility threshold level using simplified stimuli (sinusoids), whereas our purpose is to compare the quality of complex structured images at visible distortion levels.
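The pipeline of Fig. 1 combined with Eq. (7) can be sketched as follows. The 2×2 average-pooling stand-in for the low-pass filter and the use of whole-image statistics instead of a sliding window are assumptions made to keep the sketch short; the weights default to the values calibrated in Section 3.2.

```python
import numpy as np

def downsample2(img):
    """Low-pass by 2x2 averaging, then decimate by 2 (a simple stand-in
    for the low-pass filter 'L' followed by the '2 down' stage of Fig. 1)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ms_ssim(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333),
            L=255, K1=0.01, K2=0.03):
    """Multi-scale SSIM per Eq. (7), with alpha_j = beta_j = gamma_j."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    score = 1.0
    for j, w in enumerate(weights, start=1):
        mu_x, mu_y = x.mean(), y.mean()
        sx, sy = x.std(), y.std()
        cov = ((x - mu_x) * (y - mu_y)).mean()
        c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)   # Eq. (2)
        s = (cov + C3) / (sx * sy + C3)                     # Eq. (3)
        score *= (c * s) ** w
        if j == len(weights):
            # Luminance term is applied only at the coarsest scale M, Eq. (1).
            l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
            score *= l ** w
        else:
            x, y = downsample2(x), downsample2(y)
    return score
```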
3.2. Cross-scale calibration

We use an image synthesis approach to calibrate the relative importance of different scales. In previous work, the idea of synthesizing images for subjective testing has been employed by the "synthesis-by-analysis" methods of assessing statistical texture models, in which the model is used to generate a texture with statistics matching an original texture, and a human subject then judges the similarity of the two textures [8]–[11]. A similar approach has also been used qualitatively in demonstrating quality metrics in [5], [12], though quantitative subjective tests were not conducted. These synthesis methods provide a powerful and efficient means of testing a model, and have the added benefit that the resulting images suggest improvements that might be made to the model [11].

Fig. 2. Demonstration of the image synthesis approach for cross-scale calibration. Images in the same row have the same MSE. Images in the same column have distortions only in one specific scale. Each subject was asked to select a set of images (one from each scale) having equal quality. As an example, one subject chose the marked images.

For a given original 8 bits/pixel gray scale test image, we synthesize a table of distorted images (as exemplified by Fig. 2), where each entry in the table is an image associated with a specific distortion level (defined by MSE) and a specific scale. Each of the distorted images is created using an iterative procedure, in which the initial image is generated by randomly adding white Gaussian noise to the original image, and the iterative process employs a constrained gradient descent algorithm to search for the worst images in terms of the SSIM measure while constraining the MSE to be fixed and restricting the distortions to occur only in the specified scale. We use 5 scales and 12 distortion levels (ranging from 2^3 to 2^14) in our experiment, resulting in a total of 60 images, as demonstrated in Fig. 2. Although the images in each row have the same MSE with respect to the original image, their visual quality is significantly different. Thus the distortions at different scales are of very different importance in terms of perceived image quality. We employ 10 original 64×64 images with different types of content (human faces, natural scenes, plants, man-made objects, etc.) in our experiment to create 10 sets of distorted images (a total of 600 distorted images).

We gathered data for 8 subjects, including one of the authors. The other subjects had general knowledge of human vision but did not know the detailed purpose of the study. Each subject was shown the 10 sets of test images, one set at a time. The viewing distance was fixed to 32 pixels per degree of visual angle. The subject was asked to compare the quality of the images across scales and to select one image from each of the five scales (shown as columns in Fig. 2) that the subject believed to have the same quality. For example, one subject chose the images marked in Fig. 2 as having equal quality. The positions of the selected images in each scale were recorded and averaged over all test images and all subjects. In general, the subjects agreed with each other on each image more than they agreed with themselves across different images. These test results were normalized (to sum to one) and used to calculate the exponents in Eq. (7). The resulting parameters are \beta_1 = \gamma_1 = 0.0448, \beta_2 = \gamma_2 = 0.2856, \beta_3 = \gamma_3 = 0.3001, \beta_4 = \gamma_4 = 0.2363, and \alpha_5 = \beta_5 = \gamma_5 = 0.1333, respectively.

4. TEST RESULTS

We test a number of image quality assessment algorithms using the LIVE database (available at [13]), which includes 344 JPEG and JPEG2000 compressed images (typically 768×512 or similar size). The bit rate ranges from 0.028 to 3.150 bits/pixel, which allows the test images to cover a wide quality range, from indistinguishable from the original image to highly distorted. The mean opinion score (MOS) of each image is obtained by averaging 13–25 subjective scores given by a group of human observers. Eight image quality assessment models are compared, including PSNR, the Sarnoff model (JNDmetrix 8.0 [14]), the single-scale SSIM index with M equal to 1 through 5, and the proposed multi-scale SSIM index approach.

The scatter plots of MOS versus model predictions are shown in Fig. 3, where each point represents one test image, with its vertical and horizontal axes representing its MOS and the given objective quality score, respectively. To provide a quantitative performance evaluation, we use the logistic function adopted in the video quality experts group (VQEG) Phase I FR-TV test [15] to provide a non-linear mapping between the objective and subjective scores. After the non-linear mapping, the linear correlation coefficient (CC), the mean absolute error (MAE), and the root mean squared error (RMS) between the subjective and objective scores are calculated as measures of prediction accuracy. The prediction consistency is quantified using the outlier ratio (OR), which is defined as the percentage of predictions outside the range of ±2 times the standard deviation. Finally, the prediction monotonicity is measured using the Spearman rank-order correlation coefficient (ROCC). Readers can refer to [15] for a more detailed description of these measures.
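As a sketch of this evaluation protocol, the code below fits a logistic mapping from objective scores to MOS and then computes CC, MAE, RMS and ROCC. The 4-parameter logistic form and the array names are assumptions made here for illustration; VQEG [15] specifies the exact fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic(x, a, b, c, d):
    # A common 4-parameter logistic for mapping objective scores to MOS.
    return a / (1.0 + np.exp(-(x - c) / b)) + d

def evaluate(mos, scores):
    p0 = [mos.max() - mos.min(), scores.std(), scores.mean(), mos.min()]
    params, _ = curve_fit(logistic, scores, mos, p0=p0, maxfev=10000)
    pred = logistic(scores, *params)
    cc = pearsonr(mos, pred)[0]          # prediction accuracy (after mapping)
    rocc = spearmanr(mos, scores)[0]     # prediction monotonicity
    mae = np.abs(mos - pred).mean()
    rms = np.sqrt(((mos - pred) ** 2).mean())
    return cc, rocc, mae, rms
```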
Table 1. Performance comparison of image quality assessment models on the LIVE JPEG/JPEG2000 database [13]. SS-SSIM: single-scale SSIM; MS-SSIM: multi-scale SSIM; CC: non-linear regression correlation coefficient; ROCC: Spearman rank-order correlation coefficient; MAE: mean absolute error; RMS: root mean squared error; OR: outlier ratio.

Model           CC      ROCC    MAE     RMS     OR (%)
PSNR            0.905   0.901   6.53    8.45    15.7
Sarnoff         0.956   0.947   4.66    5.81    3.20
SS-SSIM (M=1)   0.949   0.945   4.96    6.25    6.98
SS-SSIM (M=2)   0.963   0.959   4.21    5.38    2.62
SS-SSIM (M=3)   0.958   0.956   4.53    5.67    2.91
SS-SSIM (M=4)   0.948   0.946   4.99    6.31    5.81
SS-SSIM (M=5)   0.938   0.936   5.55    6.88    7.85
MS-SSIM         0.969   0.966   3.86    4.91    1.16

The evaluation results for all the models being compared are given in Table 1. From both the scatter plots and the quantitative evaluation results, we see that the performance of the single-scale SSIM model varies with scale, and the best performance is given by the case M = 2. It can also be observed that the single-scale model tends to supply higher scores as the scale increases. This is not surprising because image coding techniques such as JPEG and JPEG2000 usually compress fine-scale details to a much higher degree than coarse-scale structures, and thus the distorted image "looks" more similar to the original image when evaluated at larger scales. Finally, on every one of the objective evaluation criteria, the multi-scale SSIM model outperforms all the other models, including the best single-scale SSIM model, suggesting a meaningful balance between scales.

5. DISCUSSIONS

We propose a multi-scale structural similarity approach for image quality assessment, which provides more flexibility than single-scale approaches in incorporating the variations of image resolution and viewing conditions. Experiments show that with appropriate parameter settings, the multi-scale method outperforms the best single-scale SSIM model as well as state-of-the-art image quality metrics.

In the development of top-down image quality models (such as structural similarity based algorithms), one of the most challenging problems is to calibrate the model parameters, which are rather "abstract" and cannot be directly derived from simple-stimulus subjective experiments as in the bottom-up models. In this paper, we used an image synthesis approach to calibrate the parameters that define the relative importance between scales. The improvement from single-scale to multi-scale methods observed in our tests suggests the usefulness of this novel approach. However, this approach is still rather crude. We are working on developing it into a more systematic approach that can potentially be employed in a much broader range of applications.

6. REFERENCES

[1] A. M. Eskicioglu and P. S. Fisher, "Image quality measures and their performance," IEEE Trans. Communications, vol. 43, pp. 2959–2965, Dec. 1995.
[2] T. N. Pappas and R. J. Safranek, "Perceptual criteria for image quality evaluation," in Handbook of Image and Video Processing (A. Bovik, ed.), Academic Press, 2000.
[3] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, pp. 81–84, Mar. 2002.
[4] Z. Wang, H. R. Sheikh, and A. C. Bovik, "Objective video quality assessment," in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041–1078, CRC Press, Sept. 2003.
[5] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error measurement to structural similarity," IEEE Trans. Image Processing, vol. 13, Jan. 2004.
[6] Z. Wang, L. Lu, and A. C. Bovik, "Video quality assessment based on structural distortion measurement," Signal Processing: Image Communication, special issue on objective video quality metrics, vol. 19, Jan. 2004.
[7] B. A. Wandell, Foundations of Vision. Sinauer Associates, Inc., 1995.
[8] O. D. Faugeras and W. K. Pratt, "Decorrelation methods of texture feature extraction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 2, no. 4, pp. 323–332, 1980.
[9] A. Gagalowicz, "A new method for texture fields synthesis: Some applications to the study of human vision," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 3, no. 5, pp. 520–533, 1981.
[10] D. Heeger and J. Bergen, "Pyramid-based texture analysis/synthesis," in Proc. ACM SIGGRAPH, pp. 229–238, Association for Computing Machinery, August 1995.
[11] J. Portilla and E. P. Simoncelli, "A parametric texture model based on joint statistics of complex wavelet coefficients," Int'l J. Computer Vision, vol. 40, pp. 49–71, Dec. 2000.
[12] P. C. Teo and D. J. Heeger, "Perceptual image distortion," in Proc. SPIE, vol. 2179, pp. 127–141, 1994.
[13] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, "Image and video quality assessment research at LIVE," /research/quality/.
[14] Sarnoff Corporation, "JNDmetrix Technology," http:///products_services/video_vision/jndmetrix/.
[15] VQEG, "Final report from the video quality experts group on the validation of objective models of video quality assessment," Mar. 2000, /.

Fig. 3. Scatter plots of MOS versus model predictions. Each sample point represents one test image in the LIVE JPEG/JPEG2000 image database [13]. (a) PSNR; (b) Sarnoff model; (c)–(g) single-scale SSIM method for M = 1, 2, 3, 4 and 5, respectively; (h) multi-scale SSIM method.
Pose and Illumination Invariant Face Recognition Based on 3D Face Reconstruction
ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@
Journal of Software, Vol. 17, No. 3, March 2006, pp. 525−534. DOI: 10.1360/jos170525. Tel/Fax: +86-10-62562563
© 2006 by Journal of Software. All rights reserved.

Pose and Illumination Invariant Face Recognition Based on 3D Face Reconstruction (基于3D人脸重建的光照、姿态不变人脸识别)*

CHAI Xiu-Juan¹⁺, SHAN Shi-Guang², QING Lai-Yun², CHEN Xi-Lin², GAO Wen¹,²
¹ (Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
² (ICT-ISVISION Joint R&D Laboratory for Face Recognition, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, China)
⁺ Corresponding author: Phn: +86-10-58858300 ext 314, Fax: +86-10-58858301, E-mail: xjchai@, /

Chai XJ, Shan SG, Qing LY, Chen XL, Gao W. Pose and illumination invariant face recognition based on 3D face reconstruction. Journal of Software, 2006, 17(3): 525−534. /1000-9825/17/525.htm

Abstract: Pose and illumination changes from picture to picture are two main barriers toward fully automatic face recognition. In this paper, a novel method to handle both pose and lighting conditions simultaneously is proposed, which calibrates the pose and lighting to a predefined reference condition through an illumination invariant 3D face reconstruction. First, some located facial landmarks and a priori statistical deformable 3D model are used to recover an elaborate 3D shape. Based on the recovered 3D shape, the "texture image" calibrated to a standard illumination is generated by the spherical harmonics ratio image, and finally the illumination independent 3D face is reconstructed completely. The proposed method combines the strength of the statistical deformable model in describing shape information with the compact representation of illumination in the spherical frequency space, and handles both pose and illumination variation simultaneously. The algorithm can be used to synthesize virtual views of a given face image and to enhance the performance of face recognition.
Experimental results on the CMU PIE database show that this method can significantly improve the accuracy of existing face recognition methods when pose and illumination are inconsistent between gallery and probe sets.

Key words: face recognition; 3D face reconstruction; statistical deformable model; spherical harmonics; ratio image

* Supported by the National Natural Science Foundation of China under Grant No. 60332010; the "100 Talents Program" of the CAS; the Shanghai Municipal Sciences and Technology Committee of China under Grant No. 03DZ15013; and ISVISION Technologies Co., Ltd.
Received 2005-05-16; Accepted 2005-07-11

摘要 (Chinese abstract, translated): The differences in pose and illumination between the face image to be matched and the stored prototype images are the two main bottlenecks of automatic face recognition, and existing solutions can usually handle only one of the two, not both at once. This paper proposes a method that corrects pose and illumination variations in a face image simultaneously: through an illumination-invariant 3D face reconstruction, both pose and lighting are normalized to predefined standard conditions. First, a prior statistical deformable model is combined with several key points on the face image to recover a relatively fine 3D face shape. Based on this reconstructed 3D shape, the lighting attributes of the input image are then estimated by the spherical harmonic ratio image method and the illumination-independent texture information of the input image is extracted, so that the illumination-independent 3D face is fully reconstructed. A virtual view of the input face image under the standard pose and illumination is generated for the final classification, so that illumination and pose are handled at the same time. Experiments on the CMU PIE database show that this method can, to a large extent, improve the correctness of existing face recognition methods when the pose and illumination of the gallery and probe images are inconsistent.
CLC number: TP393; Document code: A

Face recognition has broad application prospects in security, finance, law, human-computer interaction and other fields, and has therefore attracted wide attention from researchers. After nearly 40 years of development, recognition rates for frontal face images with neutral expression under uniform illumination are already very high [1]. In more complex situations, however, the recognition performance of most existing systems is strongly affected by pose and illumination variation: when the pose or lighting of a face changes, the appearance of the face image also changes greatly, so the commonly used 2D appearance-based methods fail. Although some 2D methods (such as multi-view approaches [2]) can to some extent address pose or illumination variation, we believe that improving the appearance of the 2D image based on 3D information is the most essential way to solve the pose and illumination problems.

In the early research stage, for both the pose problem and the illumination problem, the dominant idea was to describe the appearance of face images with low-dimensional subspaces. Eigenfaces [2] and Fisherfaces [3] learn empirical low-dimensional pose or illumination spaces of faces by statistical learning. Such methods are easy to implement and fairly accurate, but when the imaging conditions of the test images differ from those of the training set, their performance degrades severely. The Fisher Light-Fields algorithm proposed by Gross et al. [4] estimates the eigen light field of the head corresponding to the gallery or probe images and uses the eigen light-field coefficients as the feature set for the final recognition. Extending this work, Zhou proposed the Illuminating Light Field method [5], in which a Lambertian reflectance model governs the illumination variation; it generalizes to new lighting conditions better than Fisher Light Fields. However, the algorithm needs multiple training images under several poses and several lighting conditions for modeling, which is hard to satisfy in most practical applications.

The effects of pose and illumination variation on face images are ultimately tied to the 3D structure of the face: if the 3D shape and surface reflectance of the face are known or can be estimated, the pose and illumination problems become much easier to solve. Some model-based methods therefore try to separate, from face images, the intrinsic attributes of the face (3D shape and surface reflectance) from the extrinsic imaging conditions (lighting, rendering conditions, camera parameters, etc.), so that the influence of the extrinsic conditions can be removed and accurate classification achieved using only the intrinsic attributes. The best-known such methods are the illumination cone, symmetric SFS, and the 3D morphable model.

Georghiades proposed the illumination cone method [6] for face recognition under varying illumination and pose. Given several (at least three) input images of the same pose under different lighting, it estimates the 3D information of the input face. It is essentially a variant of traditional photometric stereo: lighting, 3D face shape and surface albedo are estimated iteratively via SVD, with prior knowledge of the 3D face shape distribution (symmetry, the nose being the nearest point, etc.) used as constraints for solving the 3D shape. In practice this method needs at least 7 images of each subject under different lighting, which is too demanding for most applications and thus hard to use in practice.

Unlike the illumination cone, which relies on photometric stereo, Zhao proposed reconstructing the 3D face shape by shape from shading (SFS); building on traditional SFS and exploiting the prior knowledge of facial symmetry, he proposed the SSFS method (symmetric SFS) [7]. It needs only a single input face image, but it requires other methods to estimate the illumination of the input image as well as an exact symmetry axis, which increases the practical difficulty.

To date, the most successful pose- and illumination-invariant face recognition is the 3D morphable model method [8]. It statistically models the 3D shape and texture (surface reflectance) of faces separately by principal component analysis, and on that basis builds a complex imaging model involving shape and texture statistical parameters, Phong model parameters, lighting parameters, intrinsic and extrinsic camera parameters, rendering parameters, and so on; an analysis-by-synthesis optimization then estimates all of these parameters, and the recovered shape and texture statistical parameters of the input face are used for the final classification. This morphing method was used in FRVT2002 [1]. Unfortunately, it must solve a complex continuous optimization over several hundred parameters; the iterative optimization takes a great deal of computation time, the whole fitting process requiring about 4.5 minutes on a workstation with a 2 GHz P4 processor, which is unacceptable for most practical systems.

Based on the above analysis, we consider the statistical modeling used in the 3D morphable model to be the best way to exploit prior 3D knowledge of faces, so this paper adopts a similar modeling approach. The differences are that, to avoid the overly complex parameter optimization of the 3D morphable model, we do not use a dense statistical model but only a sparse statistical model of key feature points; and we do not use a complex imaging model, but the more convenient and practical spherical harmonic ratio image method to estimate illumination and remove its influence. These measures greatly reduce the computational complexity, so that the whole process completes within 1 second (on a P4 3.2 GHz machine). Of course, compared with the 3D morphable model, the accuracy of the 3D face information reconstructed by our algorithm is considerably lower; but note that our goal is face recognition insensitive to pose and illumination variation, not precise 3D reconstruction, and most quasi-frontal face recognition systems tolerate small illumination and pose variations, so the trade-off made here between accuracy and speed is reasonable, as our experiments also show.

Section 1 describes the proposed pose- and illumination-invariant face recognition method in detail: Section 1.1 introduces the 3D shape reconstruction algorithm based on the 3D sparse deformable model, and Section 1.2 describes the illumination-invariant texture generation based on spherical harmonic ratio images. Some synthesis results of the algorithm and the experimental results of pose- and illumination-invariant face recognition are given in Section 2. Conclusions are drawn at the end.
1 Pose and Illumination Invariant Face Recognition

The overall framework of the proposed illumination- and pose-invariant face recognition system is shown in Fig. 1. First, for a given face image, face detection and eye-center localization are performed, a rough pose class is obtained with a view-based subspace method, and the sparse key feature points of the face image are then located using the ASM or ESL algorithm [9]. Based on the 3D sparse deformable model, combined with the 2D shape information of the face image, the 3D shape corresponding to the input face is reconstructed. With this person-specific 3D shape, the spherical harmonic ratio image method relights the input face image to a predefined reference illumination, generating an illumination-invariant texture image; this amounts to 3D face reconstruction under the reference illumination. The pose- and illumination-normalized face image can then serve as the input of an ordinary frontal face recognition system and be matched against the gallery images to obtain the recognition result. Our algorithm can therefore also be viewed as a preprocessing step for any face recognition system.

Fig.1 The framework of pose and illumination calibration for face recognition

Below we describe the two key problems in this framework: 3D face shape reconstruction based on the 3D sparse deformable model, and illumination-invariant texture generation based on spherical harmonic ratio images.

1.1 3D face shape reconstruction from a single image

Recovering the 3D structure of the specific person from a single face image is the most direct and essential way to solve the pose problem. Without any assumptions, however, recovering 3D shape from a single face image is an ill-posed problem; it is explicitly pointed out in [10] that the minimum number of images required to reconstruct a 3D face is three. To overcome the need for multiple images, we perform principal component analysis (PCA) on the shape vectors of a 3D face dataset and build a statistical deformable model, which encodes the prior knowledge of the 3D shape of the face class.

1.1.1 Building the sparse statistical model

We use the 100 laser-scanned 3D faces of the USF Human ID 3-D database as the training set for the statistical deformable model [11]. All faces are normalized to a common standard orientation and position. Each face shape is represented by 75,972 vertices, uniformly downsampled to 8,955 vertices to simplify computation. In our algorithm, shape and texture are treated separately with different strategies: we consider the pose of a face to be related only to the positions of a few key feature points, not to image intensity. Treating shape and texture separately avoids a complex optimization process and saves computation time. We first introduce the construction of the sparse statistical model.

Concatenating the X, Y, Z coordinates of the n vertices of a face yields the shape vector describing its 3D shape:

S = (X_1, Y_1, Z_1, ..., X_n, Y_n, Z_n)^T \in R^{3n}.

Suppose the number of training 3D faces is m, with shape vectors S_i, i = 1, ..., m, all registered in scale. Any new 3D face shape vector S can then be represented as a linear combination of the training shape vectors, S = \sum_{i=1}^{m} w_i S_i. Since all face shapes are globally similar and differ only by small variations, and since PCA is suited to capturing the principal modes of variation of a class of vectors while filtering out measurement noise, we model the 3D training shapes with PCA. Sorting the eigenvalues of the covariance matrix in descending order and selecting the first d (d ≤ m − 1) shape eigenvectors with the largest eigenvalues as the projection matrix P gives the statistical deformable model:

S = \bar{S} + P\alpha,  (1)

where \bar{S} is the mean shape vector and \alpha is the coefficient vector corresponding to the d-dimensional projection matrix P.

When the face rotates in pose, the equation above extends as follows. For any vertex D = (x, y, z)^T of the 3D shape, the PCA relation gives R D = R \bar{D} + R P_D \alpha, where R is the 3×3 rotation matrix characterized by three rotation angles, \bar{D} is the vertex of \bar{S} corresponding to D, and P_D is the 3×d submatrix of P corresponding to vertex D. The same rotation transform therefore also holds for the whole 3D shape vector. We introduce the notation V^R for the operation of rotating every vertex of a 3D shape vector V by the rotation matrix R, and P^R for applying the same rotation to each 3D eigenvector column of P. Extending Eq. (1) to multiple poses then gives

S^R = \bar{S}^R + P^R \alpha.  (2)

With this pose-extended statistical deformable model, a 3D face shape under any pose can be represented by the PCA model of the training shapes transformed to the same pose. But given only one face image, how to use the shape prior for 3D reconstruction is still a problem. We therefore further propose the 3D sparse deformable model (3D SDM), whose purpose is to establish the correspondence between the key-feature-point vector of the input image and the 3D sparse deformable model, and thereby to optimize the coefficient vector of the 3D SDM; we take this coefficient vector to be the coefficient vector of the dense statistical deformable model as well, so that the dense 3D shape of the given face can be easily reconstructed.

Analogously to the 3D shape representation, the X, Y coordinates of the k key points on the 2D face image are concatenated into S_I. Each 2D key point corresponds to exactly one fixed, labeled vertex of the 3D shape vector. Following this correspondence, the 3D points corresponding to the 2D key points are extracted from the 3D shape and concatenated, denoted S'. Applying the same treatment to the mean shape and the projection matrix yields the sparse \bar{S}' and P'. This gives the sparse version of the 3D deformable model, S' = \bar{S}' + P'\alpha, and its multi-pose extension, the 3D sparse deformable model (3D SDM):

(S')^R = (\bar{S}')^R + (P')^R \alpha.
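A minimal NumPy sketch of building the PCA shape model of Eq. (1) and of the rotation and sparse-extraction operators used above; the function names, the SVD route to the eigenvectors, and the data layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def build_model(shapes, d):
    """shapes: m x 3n matrix, one aligned 3D shape (X1,Y1,Z1,...) per row."""
    mean = shapes.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are covariance eigenvectors.
    _, _, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    P = Vt[:d].T                       # 3n x d projection matrix of Eq. (1)
    return mean, P

def rotate_shape(S, R):
    """Apply the 3x3 rotation R to every (x, y, z) vertex: V -> V^R."""
    return (S.reshape(-1, 3) @ R.T).reshape(-1)

def sparse_rows(landmark_ids):
    """Row indices selecting the 3D vertices that correspond to 2D key points
    (used to extract S', mean shape, and P' from the dense model)."""
    return np.ravel([[3 * i, 3 * i + 1, 3 * i + 2] for i in landmark_ids])
```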
1.1.2 3D reconstruction based on the SDM

Even with statistical prior knowledge of 3D shape, recovering the 3D shape from a single 2D image remains difficult. Our strategy is based on the ability of PCA to reconstruct the whole from a part: we obtain the PCA reconstruction coefficients from the key-point information of the input image and the corresponding 3D sparse deformable model, and finally project the optimized coefficient vector onto the complete PCA model to obtain the dense 3D shape vector of the face in the input image.

From the sparse 3D shape S' we extract the X, Y coordinates to form its 2D shape component, denoted S'_f; the subscript f denotes extracting, in order, the X, Y coordinates of a 3D shape vector V to form a 2D shape vector V_f. Since S'_f can be regarded as part of the 3D shape, the following still holds approximately: S'_f = (\bar{S}')_f + (P')_f \alpha, where (\bar{S}')_f and (P')_f are the corresponding 2D key-point extractions from the sparse 3D mean shape \bar{S}' and projection matrix P'. Likewise, (S')^R_f denotes the 2D shape vector formed by the X, Y coordinates of the sparse 3D shape vector S' after transformation by the rotation matrix R. For a 3D shape vector under an arbitrary pose, the vector concatenating the X, Y components of its sparse key points therefore satisfies

(S')^R_f = (\bar{S}')^R_f + (P')^R_f \alpha.  (3)

Our goal is to reconstruct the complete 3D shape through the shape coefficient vector \alpha, which for the specific person is solved as

\alpha = [(P')^R_f]^{+} [(S')^R_f − (\bar{S}')^R_f],  (4)

where [(P')^R_f]^{+} is the pseudo-inverse, computed as [(P')^R_f]^{+} = ([(P')^R_f]^T (P')^R_f)^{-1} [(P')^R_f]^T. This coefficient vector is obtained from only part of the PCA components, but we take it to be the coefficient vector of the complete PCA model as well. The key problem is therefore to compute (S')^R_f from the image key-feature-point vector S_I. The relation between S_I and (S')^R_f can be written

S_I = c (S')^R_f + T.  (5)

We characterize the rotation matrix R by the three rotation angles about the x, y, z axes. Using five key feature points on the face image and the corresponding five key 3D points on the model \bar{S}, concatenated into the vectors S_{I−5} and \bar{S}'_5 respectively, the three rotation-angle parameters can be computed by projection. The five feature points used to compute the rotation pose parameters are the left and right eye centers, the nose tip, and the left and right mouth corners. The iterative optimization proceeds as follows:

A. Using the five selected key feature points, establish the projection relation between the points on the input face image and those of the corresponding 3D shape, S_{I−5} = s Q R \bar{S}'_5 + (t_x, t_y)^T, and solve the resulting system of equations for the parameters of the rotation matrix R. Here Q = ((1, 0, 0), (0, 1, 0)) is the 2×3 projection matrix, which keeps only the x, y information for the computation; s is a scale factor, and t_x, t_y are the translation components in the x and y directions.

B. Compute the translation vector T and the scale factor c between the two vectors S_I and (S')^R_f from

c = \frac{\sum_{i=1}^{k} [(X_i^I − ∇X_0) X_i + (Y_i^I − ∇Y_0) Y_i]}{\sum_{i=1}^{k} (X_i^2 + Y_i^2)},

then update the translation components

∇X = \frac{1}{k} \sum_{i=1}^{k} (X_i^I − c X_i),  ∇Y = \frac{1}{k} \sum_{i=1}^{k} (Y_i^I − c Y_i),

and form the translation vector T = (∇X, ∇Y, ..., ∇X, ∇Y)^T. Here ∇X_0 and ∇Y_0 are the values of ∇X and ∇Y obtained in the previous iteration, set to t_x and t_y from step A in the first iteration; X_i^I and Y_i^I are the coordinates of the feature points in the input face image, and X_i and Y_i are the x, y components of the corresponding points of (S')^R_f. Repeating step B two or three times yields suitable translation and scale factors.

C. With T and c from step B, update (S')^R_f through Eq. (5).

D. With the new (S')^R_f, compute the shape coefficient vector \alpha from Eq. (4).

E. Reconstruct the sparse 3D face shape of the specific person from S' = \bar{S}' + P'\alpha.

F. Repeat steps A–E until the shape coefficient vector converges.

Finally, from the dense statistical deformable model (Eq. (1)), we reconstruct the dense 3D shape corresponding to the input face. To obtain a more refined 3D shape, the 3D shape vertices are further adjusted according to the coordinates of the feature points on the given 2D image. Once the fine 3D shape of the face is available, pose normalization is achieved simply by rotating the 3D face model together with the texture image of the specific person.

1.2 Illumination-invariant texture generation based on spherical harmonic ratio images

With the 3D shape and pose parameters recovered in the previous section, the face region can be extracted directly from the given 2D face image. The extracted region is not yet a true texture image, however: it changes as the illumination changes. Although directly recovering the intrinsic texture of the person is an intuitive idea, this paper does not recover the texture directly; instead it adopts a de-lighting strategy to remove the influence of illumination, correcting the lighting of the extracted face region to a reference condition and thus forming an illumination-invariant texture image [12]. Finally, this texture under the standard reference illumination can be rendered onto the 3D face shape, reconstructing the complete illumination-independent 3D face.

Since the reflectance equation can be regarded as a convolution, it is natural to analyze it in the frequency domain. For spherical harmonics, Basri et al. [13] proved that most of the energy of the irradiance is confined to the three lowest-order bands; in the frequency domain,

E(\alpha, \beta) = \sum_{l=0}^{\infty} \sum_{m=−l}^{l} A_l L_{lm} Y_{lm}(\alpha, \beta) ≈ \sum_{l=0}^{2} \sum_{m=−l}^{l} A_l L_{lm} Y_{lm}(\alpha, \beta),  (6)

where A_0 = \pi, A_1 = 2\pi/3, A_2 = \pi/4 [13] are the spherical harmonic coefficients of Lambertian reflectance, L_{lm} are the coefficients of the incident light, and Y_{lm} are the spherical harmonic functions. Given a texture image I, for every pixel (x, y) the relation I(x, y) = \rho(x, y) E(\alpha(x, y), \beta(x, y)) holds almost everywhere, where the normal direction (\alpha(x, y), \beta(x, y)) is computed from the 3D face shape. Assuming the facial albedo \rho is constant, let E_{lm} = A_l Y_{lm} denote the harmonic images and let E be the 9×n matrix of the E_{lm}, where n is the total number of pixels of the texture image. The coefficients of the lighting condition L are then obtained as the least-squares solution

\hat{L} = \arg\min_L \|\rho E L − I\|.  (7)

Once the lighting condition of the given image has been estimated, relighting it to the standard illumination is straightforward. For a point P at (x, y) on the image with normal (\alpha, \beta) and albedo \rho(x, y), the intensities at P in the original image and after illumination correction are, respectively,

I_{org}(x, y) = \rho(x, y) \sum_{l=0}^{2} \sum_{m=−l}^{l} A_l L_{lm} Y_{lm}(\alpha, \beta),
I_{can}(x, y) = \rho(x, y) \sum_{l=0}^{2} \sum_{m=−l}^{l} A_l \hat{L}^{can}_{lm} Y_{lm}(\alpha, \beta).  (8)

The ratio image between the two illuminations is defined as

R_{can}(x, y) = \frac{I_{can}(x, y)}{I_{org}(x, y)} = \frac{\sum_{l=0}^{2} \sum_{m=−l}^{l} A_l \hat{L}^{can}_{lm} Y_{lm}(\alpha, \beta)}{\sum_{l=0}^{2} \sum_{m=−l}^{l} A_l L_{lm} Y_{lm}(\alpha, \beta)},  (9)

in which the albedo cancels. Since the lighting condition of the given image and the reference lighting condition are both determined, the ratio image of the reference illumination relative to the original illumination is determined for the given person. From the original image and the ratio image, the illumination-corrected texture image is then computed as

I_{can}(x, y) = R_{can}(x, y) × I_{org}(x, y).  (10)

With the fine 3D shape and the de-lighted texture, we have reconstructed, from a single input non-frontal image under arbitrary illumination, the person-specific 3D face under the standard illumination. Points invisible in the texture image are filled by interpolation.
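The de-lighting step of Eqs. (6)–(10) can be sketched as follows: estimate the nine lighting coefficients by least squares and relight through the ratio image. The explicit order-2 real spherical harmonic basis and the unit-albedo simplification follow the constant-ρ assumption in the text; the function names and the numerical guard are illustrative, not the authors' code.

```python
import numpy as np

def harmonic_basis(normals):
    """9 x n matrix of order-0..2 real spherical harmonics of the unit surface
    normals (n x 3), scaled by the Lambertian coefficients A_l of Eq. (6)."""
    x, y, z = normals.T
    A0, A1, A2 = np.pi, 2 * np.pi / 3, np.pi / 4
    Y = np.stack([
        np.full_like(x, 0.282095),                       # Y00
        0.488603 * y, 0.488603 * z, 0.488603 * x,        # Y1-1, Y10, Y11
        1.092548 * x * y, 1.092548 * y * z,              # Y2-2, Y2-1
        0.315392 * (3 * z ** 2 - 1),                     # Y20
        1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),  # Y21, Y22
    ])
    return Y * np.array([A0, A1, A1, A1, A2, A2, A2, A2, A2])[:, None]

def relight(I_org, normals, L_can):
    """Relight pixel intensities I_org (length n) to canonical lighting L_can
    (length 9) via the ratio image of Eqs. (9)-(10); the albedo cancels."""
    E = harmonic_basis(normals)                             # harmonic images
    L_org, *_ = np.linalg.lstsq(E.T, I_org, rcond=None)     # Eq. (7)
    ratio = (E.T @ L_can) / np.maximum(E.T @ L_org, 1e-6)   # Eq. (9)
    return ratio * I_org                                    # Eq. (10)
```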
2 Experiments and Analysis

In this section, we evaluate the performance of the proposed pose and illumination normalization through pose- and illumination-invariant face recognition. For a non-frontal face image under arbitrary illumination, its illumination-independent 3D face is reconstructed: whatever the lighting condition of the input image, the texture of the final reconstructed 3D face is corrected to the predefined reference illumination. Pose normalization is achieved by rotating the 3D face to the predefined standard pose. The normalized face image is then used as the input of an ordinary face recognition system.

2.1 Experimental results of pose-invariant face recognition

First we test the case where only pose varies, on four pose subsets of the CMU PIE database [14]: pose set 05 (turned 22.5° right), pose set 29 (22.5° left), pose set 37 (45° right) and pose set 11 (45° left). The gallery images come from pose set 27 and are all frontal. Since this section tests only pose variation, the lighting of the given images is assumed to be balanced, so no special illumination processing is needed: the face region extracted from the image approximates the texture under the standard illumination, and 3D face reconstruction then performs the pose normalization.

The recognition method is Gabor PCA plus LDA, similar in spirit to GFC [15]. The training images are selected from the CAS-PEAL database [16]: 300 persons, with on average 6 pose images and 10 frontal images per person. Because the capture conditions of CAS-PEAL are far from those of CMU PIE, any influence on the recognition results from similarity between training and test conditions is avoided. In these experiments the feature points are labeled manually.

For face recognition via pose normalization, the most important requirement is that the corrected view closely resemble the prototype views in the gallery. Fig. 2 shows some pose-normalization results based on 3D face reconstruction for visual evaluation. The first row shows the original masked images under the four non-frontal poses, and the second row the corresponding pose-normalized images; the column to the right of the corrected views shows the gallery image from pose set 27 as a reference. The corrected faces for all four poses are very similar to the original frontal gallery faces. Moreover, on a P4 3.2 GHz machine, a complete face reconstruction with our method takes only 1 second; compared with the 3D morphable model method of [8], the computational complexity is greatly reduced, one fitting of the 3D morphable model taking about 4.5 minutes on a workstation with a 2 GHz P4 processor [8].

Fig.2 The pose normalized images

We pose-normalize the images of these four pose subsets and then perform face recognition; the recognition results are listed in Table 1. After normalization, the cumulative recognition rates on the four pose sets improve substantially over recognition with the original face images, with the rank-1 recognition rate averaging 94.85%. We also compare this result with the eigen light field method, which can be used to address the pose and illumination problems in face recognition [17] and usually adopts one of two normalization strategies, 3-point and multi-point normalization. Compared with the results of both strategies, our recognition rates are higher (see Table 1). The experimental results show that pose normalization based on 3D face reconstruction performs well; with much less computation time, pose-invariant recognition based on our method still achieves a satisfying recognition rate.

Since the feature points used for 3D face reconstruction here are labeled manually, while all multi-pose landmark localization methods inevitably contain errors, we measure the robustness of our algorithm to landmark localization error by adding Gaussian noise to the manually labeled feature points. Based on the perturbed landmark positions, we redo the 3D face reconstruction, pose normalization and subsequent recognition, to verify the influence of localization error on pose-invariant recognition with our strategy. To minimize the influence of other factors and precisely analyze the effect of localization error on the recognition results, a simple correlation measure is used for recognition. We added five groups of noise to the manually labeled key feature points, with zero mean and variances of 1.0, 1.5, 2.0, 2.5 and 3.0. The results of pose-invariant face recognition with Gaussian noise added to the landmark localization are given in Table 2. We find that perturbing the localization results by these different amounts of Gaussian noise does not cause large changes in the recognition results.
CV People List
CV Person 4: Matthew Turk. Graduated from MIT. Most influential work: face recognition. With Alex Pentland he published "Eigenfaces for Face Recognition" in 1991, the first work to introduce PCA (Principal Component Analysis) into face recognition; it is the earliest and most classic face recognition method, has been widely implemented, and is available open-source in OpenCV. Homepage: /~mturk/
CV Person 16: William T. Freeman. Graduated from MIT. Research areas: machine learning applied to computer vision, Bayesian models of visual perception, computational photography. Most influential work: image texture synthesis. Alexei Efros and Freeman published "Image Quilting for Texture Synthesis and Transfer" at SIGGRAPH 2001; its idea is to take small patches from a known image and stitch them together like a mosaic to form a new image. This algorithm is a classic among classics in image texture synthesis. Homepage: /billf/
CV Person 17: Fei-Fei Li. Graduated from Caltech; advisor: Pietro Perona. Research areas: Object Bank, scene classification, ImageNet, etc. Most influential work: image recognition. She built Caltech101/256, the standard benchmark datasets of the image recognition field, and is a driving force behind bag-of-words methods. Homepage: /~feifeili/
CV Person 8: Michal Irani. Graduated from the Hebrew University. Most influential work: super-resolution. With Peleg, she published "Improving Resolution by Image Registration" in Graphical Models and Image Processing in 1991, proposing an iterative, back-projection approach to image magnification; it is the most classic algorithm in image super-resolution. The productized image clarity-enhancement algorithm I implemented at my company drew on the ideas of this algorithm, haha. Homepage: http://www.wisdom.weizmann.ac.il/~irani/
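As a rough illustration of that iterative back-projection idea (a sketch of the general scheme under an assumed box-blur-and-decimate imaging model and a nearest-neighbor back-projection kernel; not Irani and Peleg's exact formulation):

```python
import numpy as np

def degrade(hr, s):
    """Imaging model: s x s box blur + decimation (an assumed stand-in)."""
    h, w = hr.shape[0] // s * s, hr.shape[1] // s * s
    return hr[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(lr, s):
    """Nearest-neighbor upsampling as a simple back-projection kernel."""
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)

def iterative_back_projection(lr, s=2, n_iters=20, step=1.0):
    """Estimate an HR image whose simulated LR version matches the input."""
    hr = upsample(lr, s).astype(np.float64)      # initial guess
    for _ in range(n_iters):
        err = lr - degrade(hr, s)                # residual in the LR domain
        hr += step * upsample(err, s)            # back-project the residual
    return hr
```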
A New Texture Analysis and Synthesis Method Based on Almost Periodic Characters
一种基于概周期性的纹理图象分析及合成方法 (A New Texture Analysis and Synthesis Method Based on Almost Periodic Characters)

Pan Lulu, Ye Zhenglin, Meng Fan
(Department of Applied Mathematics, Northwest Polytechnical University, Xi'an 710072)
E-mail: lulupan_*************.cn

Abstract: This paper analyzes the current situation of patch-based sampling texture synthesis: synthesis takes too long and the quality is often unsatisfactory. It was found that for a certain type of texture with visual repeatability, the texture's own properties can be used to accelerate synthesis and improve its quality. By studying the characteristics of this type of texture, the concept of the almost-periodic texture is introduced; its almost-periodicity is measured by the gray-level distance of fixed-scale sliding windows, and synthesis then proceeds with a zero-search technique. Experiments show that for almost-periodic textures the proposed algorithm greatly increases synthesis speed, while the output quality is equal to or even better than that of traditional methods.

Keywords: texture analysis, texture synthesis, almost periodic property, distance of gray scale, zero-search
Article ID: 1002-8331-(2006)24-0059-03; Document code: A; CLC number: TP391

1 Introduction

Texture is a frequently used concept in computer vision and realistic modeling, with great application value. Example-based texture synthesis is a technique developed in recent years; it not only overcomes problems of traditional texture techniques, but the ideas involved also offer guidance for research in related fields, so it has attracted growing attention and become a research focus of computer graphics, computer vision and image processing. Example-based texture synthesis can also be applied to image compression and transmission and to filling in damaged images; applying its ideas to computer animation, a short video clip can generate non-repeating video animation of arbitrary temporal length. The technique therefore has strong application prospects not only in large-scale scene generation, photorealistic rendering and non-photorealistic rendering (NPR), but also in image editing, data compression, fast network transmission and video.

Currently popular methods take the MRF as the mathematical model of the texture image and synthesize by feature matching. At ICCV 1999, Efros and Leung first proposed pixel-based texture synthesis [1]: under the MRF model, the output is initialized with a randomly chosen texture block and then matched point by point, building a candidate set from which pixels are drawn at random. The pixel-based method works well for structured textures, but because it searches the entire sample image for every pixel, synthesis is very slow. At SIGGRAPH 2001, Efros and Freeman proposed patch-based texture synthesis [2], which greatly accelerates synthesis while remaining effective for many structured textures; but every step still performs a global search, so the speed is still unsatisfactory. Various accelerations that optimize the search process have since been proposed [3,5,6]. In March 2001, Lin Liang and colleagues at Microsoft Research Asia proposed an algorithm very similar to Efros's patch-based method; also based on the MRF model, it searches the sample image for the best matching block by a patch-boundary matching criterion, and accelerates the search with a quadtree pyramid, optimized kd-trees and principal component analysis, achieving real-time synthesis. Wei and Levoy further extended this approach to polyhedral surfaces.

The main factor limiting the speed of patch-based methods is the computational cost of feature matching; the main factor limiting quality is that the structure of the texture itself is not considered. In our research we found that for a certain class of textures, their intrinsic properties make the required matching precision very low, or even make matching unnecessary. For these textures one can use low-precision search, or even zero search, thereby greatly accelerating synthesis and improving quality. This paper characterizes this type of texture, identifies it by color-histogram matching of fixed-scale sliding windows, extracts its periodic texture element, and then applies zero search for fast synthesis. Experiments show that, compared with global search, the visual quality is not noticeably different and for some images is even better than existing methods, while the synthesis speed is clearly superior.

2 Feature matching in patch-based methods

A ubiquitous problem in image processing and computer graphics is search, and patch-based methods spend most of their time in the matching search. The non-parametric local conditional density function is

p(I_R | ∂I_R) = \sum_i \omega_i \, \delta(I_R − I_{R_i}),

where I_R is the image block to be matched in the input texture, ∂I_R is its boundary, the I_{R_i} are the blocks in the input texture whose boundaries match ∂I_R, δ is Dirac's delta function, and the ω_i are weight factors. The MRF model assumes that the content of a pixel is determined only by the content of a finite neighborhood of that pixel, which basically agrees with the nature of texture. During synthesis, therefore, only the boundary region needs matching, and every matching block can enter the synthesized image as a candidate. A match is defined by

d(I_{R_1}, I_{R_2}) < ε,

where d(I_{R_1}, I_{R_2}) is the distance between two image blocks and ε is a prescribed threshold. The block distance is usually taken as the mean-squared-error distance between the block boundaries. To guarantee that a chosen block has features consistent with its surroundings in the synthesized texture, the matching must be computed against all blocks of the sample image, which takes a long time; the various acceleration algorithms prune the search range from different angles to speed up synthesis. In our research, however, we found that some textures have strongly repetitive features, which this paper calls almost-periodicity. In an almost-periodic texture image, texture blocks of an appropriate scale share repeated visual information, so synthesis can use very low-precision search or omit the search entirely, obtaining very high synthesis speed and good quality.

3 Analysis of almost-periodic texture images

3.1 Definition of the almost-periodic texture image

A basic manifestation of almost-periodicity is the stability of the statistical features of sub-images at arbitrary positions and a fixed scale. One can therefore extract and compare the features of local sub-images to detect the stability of the period, and thereby measure the almost-periodicity of a texture. Strictly, this almost-periodicity of image features is defined as follows.

Let S be a planar rectangular domain {(x, y) | x ∈ [a, b], y ∈ [c, d]}, and let f = f(x, y) be a function on S representing some feature attribute of every point of S, called the point feature function. Let G_{x,y}^{h,l}(S) denote the windowing operator whose input is any rectangular domain S of the 2D plane and whose output is the sub-window S′ of S with top-left corner at the local coordinates (x, y) and size h × l. Let ε be the feature fluctuation threshold.

If there exist L_1, L_2 > 0 such that for all x ∈ [a, b], y ∈ [c, d] with x + L_1 ∈ [a, b] and y + L_2 ∈ [c, d] we have f(x + L_1, y + L_2) = f(x, y), then S is strictly periodic; L_1 is called a horizontal period and L_2 a vertical period of S. If S has minimal horizontal and vertical periods, they are called the fundamental horizontal and vertical periods, or simply the periods, of S.

If there exist L_1, L_2 > 0 such that for all such x, y we have |f(x + L_1, y + L_2) − f(x, y)| < ε, then S is almost periodic; L_1 is called a horizontal ε-almost-period and L_2 a vertical ε-almost-period of S. If S has minimal horizontal and vertical almost-periods, they are called the fundamental horizontal and vertical almost-periods, or simply the almost-periods.

3.2 Period extraction for almost-periodic texture images

Consider applying the above definition to images. Many features of a texture image could be extracted (for example, the parameters of a local MRF model); for computational simplicity, this paper uses gray values based on the color histogram as the feature, and experiments confirm that this feature recovers the period of the texture image well. For the sub-image window, to guarantee that enough information is included, we use a window whose width and height are each 1/2 of those of the input image.

Let A be a grayscale texture image; a color image is first converted to grayscale by

Gray = 0.299 R + 0.587 G + 0.114 B,

where R, G, B are the color components and Gray is the gray value.

Considering the influence of noise on period extraction, the grayscale texture image is first denoised. Common methods include linear filtering, median filtering and adaptive filtering; since median filtering preserves boundary information well, we adopt it here. A 3×3 filter window is slid over A, the gray values x_1, ..., x_9 of the 9 pixels inside the window are sorted, and the median is taken as the output gray value of the window's center pixel. Denote the denoised image A′. Extensive experiments show that although median filtering protects boundaries well, it sometimes loses fine lines and small target regions in the image. To retain as much of the needed information as possible, we apply the edge operator composed of the eight convolution kernels shown in Fig. 1 to A′ for edge detection: every point of the image is convolved with the eight masks, each mask responding maximally to one particular edge direction, and the maximum over all eight directions is taken as the output of the edge magnitude image. Denote the output image A″.

Fig.1 Edge operators (eight 3×3 compass convolution kernels with entries +5, −3 and 0)

To find what distinguishes almost-periodic texture images from general ones, we ran discrimination tests on many images of both kinds: choose an initial sliding-window image block, slide the window pixel by pixel horizontally and vertically, and compute the distances between all sub-blocks of the image with the same scale as the sliding window, using the ordinary Euclidean distance between blocks. As Fig. 2 shows, the Euclidean-distance statistics of the blocks of an almost-periodic texture display clear periodicity, while those of a general texture are irregular.

Fig.2 (a) An almost-periodic texture image and (b) the Euclidean-distance statistics of its blocks, showing clear periodicity; (c) a general texture image and (d) its irregular distance statistics

Let L_1 and L_2 denote the horizontal and vertical almost-periods of the almost-periodic texture image.
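A small sketch of this period-extraction step: slide a half-size window across the image and take the shift with minimal gray-level distance as the almost-period. Reducing the period estimate to an argmin over the distance curve is an illustrative simplification of the procedure described above, and the names are assumptions.

```python
import numpy as np

def horizontal_period(A, max_shift=None):
    """Estimate the horizontal almost-period of grayscale image A by comparing
    a half-size reference window with windows shifted along x."""
    H, W = A.shape
    h, w = H // 2, W // 2                 # half-size sub-window (paper's choice)
    ref = A[:h, :w].astype(np.float64)
    max_shift = max_shift or (W - w)
    d = np.array([np.sqrt(((A[:h, s:s + w] - ref) ** 2).mean())
                  for s in range(1, max_shift)])
    return 1 + int(np.argmin(d))          # shift with minimal gray distance
```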
4 Synthesis of almost-periodic texture images

Having computed the horizontal and vertical almost-periods of an almost-periodic texture image, and following the definition given above, the image can be viewed as a tiling of period blocks. If the top-left pixel of the initial sliding window is (x_0, y_0), the set of period blocks can be written

P = { G_{x_0+\lambda_1 L_1,\, y_0+\lambda_2 L_2}^{L_1, L_2}(S) \mid \lambda_1, \lambda_2 \in Z,\; x_0+\lambda_1 L_1 \in [a, b],\; y_0+\lambda_2 L_2 \in [c, d] },

where G is the windowing operator defined above. For almost-periodic texture images, this paper uses zero-search random matching: build a candidate library of period blocks, draw one block at random each time, paste it in raster-scan order, and feather-blend the overlap regions between adjacent blocks. The results show that in synthesis quality our output is not noticeably different from that of traditional methods, and for some images is even better, while the zero-search technique fundamentally accelerates synthesis.
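A sketch of this zero-search synthesis loop under the definitions above: period blocks are collected at period offsets, pasted in raster order, and blended with a linear feather over a small overlap. The overlap width and the blending weights are illustrative choices that the paper does not fix.

```python
import numpy as np

def synthesize(A, L1, L2, out_w, out_h, x0=0, y0=0, overlap=4, seed=0):
    """Zero-search synthesis: randomly tile period blocks with feathered seams."""
    rng = np.random.default_rng(seed)
    # Candidate library: all complete period blocks at period offsets.
    blocks = [A[y:y + L2, x:x + L1].astype(np.float64)
              for y in range(y0, A.shape[0] - L2 + 1, L2)
              for x in range(x0, A.shape[1] - L1 + 1, L1)]
    out = np.zeros((out_h, out_w))
    step_x, step_y = L1 - overlap, L2 - overlap
    wx = np.minimum(np.arange(L1) / overlap, 1.0)        # horizontal feather
    wy = np.minimum(np.arange(L2) / overlap, 1.0)[:, None]  # vertical feather
    for y in range(0, out_h, step_y):
        for x in range(0, out_w, step_x):
            b = blocks[rng.integers(len(blocks))]        # zero search: random pick
            h, w = min(L2, out_h - y), min(L1, out_w - x)
            a = (wx[:w] if x else 1.0) * (wy[:h] if y else 1.0)
            out[y:y + h, x:x + w] = (1 - a) * out[y:y + h, x:x + w] + a * b[:h, :w]
    return out
```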
5 Synthesis results

Fig. 3 compares synthesis results. The left column shows the input sample textures, the middle column the textures synthesized with the zero-search method of this paper, and the right column the textures synthesized by global search following [4]. In synthesis quality, our results show no obvious difference from the traditional method and for some images are even better; because of the zero-search technique, synthesis is fundamentally faster.

Fig.3 Comparison of almost-periodic texture synthesis results

6 Problems and further discussion

This paper proposes a new texture synthesis algorithm from a new angle. Abandoning the MRF model commonly used in traditional methods, it introduces the concept of the almost-periodic texture image, views such a texture as a collection of periodic texture blocks, and exploits the repetitiveness of its visual information for fast, high-quality synthesis. Our analysis and synthesis of almost-periodic textures only touch the tip of the iceberg, and many challenging problems remain: when the periodic texture elements are of inconsistent scale, the current algorithm fails; and for textures with much boundary detail in the periodic elements, synthesis can produce mismatched boundaries. Handling these problems is the direction of our future work. (Received December 2005)

References
1. Alexei A. Efros, Thomas K. Leung. Texture Synthesis by Non-parametric Sampling. In: IEEE International Conference on Computer Vision, Corfu, Greece, 1999-09
2. Alexei A. Efros, William T. Freeman. Image Quilting for Texture Synthesis and Transfer. In: Proceedings of SIGGRAPH 2001, Los Angeles, California, 2001-08
3. Texture Synthesis by Fixed Neighborhood Searching [D]. Dissertation, Stanford University, 2001-11
4. Lin Liang, Ce Liu, Ying-Qing Xu et al. Real-Time Texture Synthesis by Patch-Based Sampling. Microsoft Research China, 2001-03
5. Markus Multrus. Texture Synthesis Using Patch Based Sampling. A Student Research Project, 2003-04
6. Li-Yi Wei, Marc Levoy. Fast Texture Synthesis Using Tree-structured Vector Quantization. In: Proceedings of SIGGRAPH 2000, Stanford University, 2000
7. Li-Yi Wei, Marc Levoy. Texture Synthesis Over Arbitrary Manifold Surfaces. In: Computer Graphics Proceedings, Annual Conference Series, 2001-08
8. Yanxi Liu, Wen-Chieh Lin, James Hays. Near-Regular Texture Analysis and Manipulation. In: Proceedings of SIGGRAPH 2004, Carnegie Mellon University, 2004
PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing
PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing

Connelly Barnes¹, Eli Shechtman²,³, Adam Finkelstein¹, Dan B Goldman²
¹ Princeton University  ² Adobe Systems  ³ University of Washington

Figure 1: Structural image editing. Left to right: (a) the original image; (b) a hole is marked (magenta) and we use line constraints (red/green/blue) to improve the continuity of the roofline; (c) the hole is filled in; (d) user-supplied line constraints for retargeting; (e) retargeting using constraints eliminates two columns automatically; and (f) user translates the roof upward using reshuffling.

Abstract

This paper presents interactive image editing tools using a new randomized algorithm for quickly finding approximate nearest-neighbor matches between image patches. Previous research in graphics and vision has leveraged such nearest-neighbor searches to provide a variety of high-level digital image editing tools. However, the cost of computing a field of such matches for an entire image has eluded previous efforts to provide interactive performance. Our algorithm offers substantial performance improvements over the previous state of the art (20-100x), enabling its use in interactive editing tools. The key insights driving the algorithm are that some good patch matches can be found via random sampling, and that natural coherence in the imagery allows us to propagate such matches quickly to surrounding areas. We offer theoretical analysis of the convergence properties of the algorithm, as well as empirical and practical evidence for its high quality and performance. This one simple algorithm forms the basis for a variety of tools – image retargeting, completion and reshuffling – that can be used together in the context of a high-level image editing application. Finally, we propose additional intuitive constraints on the synthesis process that offer the user a level of control unavailable in previous methods.

CR Categories: I.3.6 [Computing Methodologies]: Computer Graphics—Methodology and Techniques; I.4.9 [Computing Methodologies]: Image Processing and Computer Vision—Applications

Keywords: Approximate nearest neighbor, patch-based synthesis, image editing, completion, retargeting, reshuffling

1 Introduction

As digital and computational photography have matured, researchers have developed methods for high-level editing of digital photographs and video to meet a set of desired goals. For example, recent algorithms for image retargeting allow images to be resized to a new aspect ratio – the computer automatically produces a good likeness of the contents of the original image but with new dimensions [Rubinstein et al. 2008; Wang et al. 2008]. Other algorithms for image completion let a user simply erase an unwanted portion of an image, and the computer automatically synthesizes a fill region that plausibly matches the remainder of the image [Criminisi et al. 2003; Komodakis and Tziritas 2007]. Image reshuffling algorithms make it possible to grab portions of the image and move them around – the computer automatically synthesizes the remainder of the image so as to resemble the original while respecting the moved regions [Simakov et al. 2008; Cho et al. 2008].

In each of these scenarios, user interaction is essential, for several reasons: First, these algorithms sometimes require user intervention to obtain the best results. Retargeting algorithms, for example, sometimes provide user controls to specify that one or more regions (e.g., faces) should be left relatively unaltered.
Likewise, the best completion algorithms offer tools to guide the result by providing hints for the computer [Sun et al. 2005]. These methods provide such controls because the user is attempting to optimize a set of goals that are known to him and not to the computer. Second, the user often cannot even articulate these goals a priori. The artistic process of creating the desired image demands the use of trial and error, as the user seeks to optimize the result with respect to personal criteria specific to the image under consideration.

The role of interactivity in the artistic process implies two properties for the ideal image editing framework: (1) the toolset must provide the flexibility to perform a wide variety of seamless editing operations for users to explore their ideas; and (2) the performance of these tools must be fast enough that the user quickly sees intermediate results in the process of trial and error. Most high-level editing approaches meet only one of these criteria. For example, one family of algorithms known loosely as non-parametric patch sampling has been shown to perform a range of editing tasks while meeting the first criterion – flexibility [Hertzmann et al. 2001; Wexler et al. 2007; Simakov et al. 2008]. These methods are based on small (e.g. 7x7) densely sampled patches at multiple scales, and are able to synthesize both texture and complex image structures that qualitatively resemble the input imagery. Because of their ability to preserve structures, we call this class of techniques structural image editing. Unfortunately, until now these methods have failed the second criterion – they are far too slow for interactive use on all but the smallest images. However, in this paper we will describe an algorithm that accelerates such methods by at least an order of magnitude, making it possible to apply them in an interactive structural image editing framework.

To understand this algorithm, we must consider the common components of these methods: The core element of nonparametric patch sampling methods is a repeated search of all patches in one image region for the most similar patch in another image region. In other words, given images or regions A and B, find for every patch in A the nearest neighbor in B under a patch distance metric such as L_p. We call this mapping the Nearest-Neighbor Field (NNF), illustrated schematically in the inset figure. Approaching this problem with a naïve brute force search is expensive – O(mM²) for image regions and patches of size M and m pixels, respectively. Even using acceleration methods such as approximate nearest neighbors [Mount and Arya 1997] and dimensionality reduction, this search step remains the bottleneck of non-parametric patch sampling methods, preventing them from attaining interactive speeds. Furthermore, these tree-based acceleration structures use memory on the order of O(M) or higher with relatively large constants, limiting their application for high resolution imagery.
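For reference, the naïve O(mM²) baseline that the paper sets out to replace can be written in a few lines (a deliberately simple sketch; the array names and the squared-L2 patch distance are assumptions):

```python
import numpy as np

def nnf_brute_force(A, B, p=7):
    """For every p x p patch in A, exhaustively find the most similar patch
    in B under squared-L2 distance. O(m M^2): the cost PatchMatch avoids."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    hA, wA = A.shape[0] - p + 1, A.shape[1] - p + 1
    hB, wB = B.shape[0] - p + 1, B.shape[1] - p + 1
    f = np.zeros((hA, wA, 2), dtype=np.int32)   # offset field f(a) = b - a
    for y in range(hA):
        for x in range(wA):
            pa = A[y:y + p, x:x + p]
            best, arg = np.inf, (0, 0)
            for by in range(hB):
                for bx in range(wB):
                    d = ((pa - B[by:by + p, bx:bx + p]) ** 2).sum()
                    if d < best:
                        best, arg = d, (by - y, bx - x)
            f[y, x] = arg
    return f
```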
To efficiently compute approximate nearest-neighbor fields, our new algorithm relies on three key observations about the problem:

Dimensionality of offset space. First, although the dimensionality of the patch space is large (m dimensions), it is sparsely populated (O(M) patches). Many previous methods have accelerated the nearest neighbor search by attacking the dimensionality of the patch space using tree structures (e.g., kd-tree, which can search in O(mM log M) time) and dimensionality reduction methods (e.g., PCA). In contrast, our algorithm searches in the 2-D space of possible patch offsets, achieving greater speed and memory efficiency.

Natural structure of images. Second, the usual independent search for each pixel ignores the natural structure in images. In patch-sampling synthesis algorithms, the output typically contains large contiguous chunks of data from the input (as observed by Ashikhmin [2001]). Thus we can improve efficiency by performing searches for adjacent pixels in an interdependent manner.

The law of large numbers. Finally, whereas any one random choice of patch assignment is very unlikely to be a good guess, some nontrivial fraction of a large field of random assignments will likely be good guesses. As this field grows larger, the chance that no patch will have a correct offset becomes vanishingly small.

Based on these three observations we offer a randomized algorithm for computing approximate NNFs using incremental updates (Section 3). The algorithm begins with an initial guess, which may be derived from prior information or may simply be a random field. The iterative process consists of two phases: propagation, in which coherence is used to disseminate good solutions to adjacent pixels in the field; and random search, in which the current offset vector is perturbed by multiple scales of random offsets. We show both theoretically and empirically that the algorithm has good convergence properties for tested imagery up to 2MP, and our CPU implementation shows speedups of 20-100 times versus kd-trees with PCA. Moreover, we propose a GPU implementation that is roughly 7 times faster than the CPU version for similar image sizes. Our algorithm requires very little extra memory beyond the original image, unlike previous algorithms that build auxiliary data structures to accelerate the search. Using typical settings of our algorithm's parameters, the runtime is O(mM log M) and the memory usage is O(M). Although this is the same asymptotic time and memory as the most efficient tree-based acceleration techniques, the leading constants are substantially smaller.

In Section 4, we demonstrate the application of this algorithm in the context of a structural image editing program with three modes of interactive editing: image retargeting, image completion and image reshuffling. The system includes a set of tools that offer additional control over previous methods by allowing the user to constrain the synthesis process in an intuitive and interactive way (Figure 1).

The contributions of our work include a fast randomized approximation algorithm for computing the nearest-neighbor field between two disjoint image regions; an application of this algorithm within a structural image editing framework that enables high-quality interactive image retargeting, image completion, and image reshuffling; and a set of intuitive interactive controls used to constrain the optimization process to obtain desired creative results.

2 Related work

Patch-based sampling methods have become a popular tool for image and video synthesis and analysis. Applications include texture synthesis, image and video completion, summarization and retargeting, image recomposition and editing, image stitching and collages, new view synthesis, noise removal and more. We will next review some of these applications and discuss the common search techniques that they use as well as their degree of interactivity.
Texture synthesis and completion. Efros and Leung [1999] introduced a simple non-parametric texture synthesis method that outperformed many previous model based methods by sampling patches from a texture example and pasting them in the synthesized image. Further improvements modify the search and sampling approaches for better structure preservation [Wei and Levoy 2000; Ashikhmin 2001; Liang et al. 2001; Efros and Freeman 2001; Kwatra et al. 2003; Criminisi et al. 2003; Drori et al. 2003]. The greedy fill-in order of these algorithms sometimes introduces inconsistencies when completing large holes with complex structures, but Wexler et al. [2007] formulated the completion problem as a global optimization, thus obtaining more globally consistent completions of large missing regions. This iterative multi-scale optimization algorithm repeatedly searches for nearest neighbor patches for all hole pixels in parallel. Although their original implementation was typically slow (a few minutes for images smaller than 1MP), our algorithm makes this technique applicable to much larger images at interactive rates. Patch optimization based approaches have now become common practice in texture synthesis [Kwatra et al. 2005; Kopf et al. 2007; Wei et al. 2008]. In that domain, Lefebvre and Hoppe [2005] have used related parallel update schemes and even demonstrated real-time GPU based implementations. Komodakis and Tziritas [2007] proposed another global optimization formulation for image completion using Loopy Belief Propagation with an adaptive priority messaging scheme. Although this method produces excellent results, it is still relatively slow and has only been demonstrated on small images.

Nearest neighbor search methods. The high synthesis quality of patch optimization methods comes at the expense of more search iterations, which is the clear complexity bottleneck in all of these methods. Moreover, whereas in texture synthesis the texture example is usually a small image, in other applications such as patch-based completion, retargeting and reshuffling, the input image is typically much larger, so the search problem is even more critical. Various speedups for this search have been proposed, generally involving tree structures such as TSVQ [Wei and Levoy 2000], kd-trees [Hertzmann et al. 2001; Wexler et al. 2007; Kopf et al. 2007], and VP-trees [Kumar et al. 2008], each of which supports both exact and approximate search (ANN). In synthesis applications, approximate search is often used in conjunction with dimensionality reduction techniques such as PCA [Hertzmann et al. 2001; Lefebvre and Hoppe 2005; Kopf et al. 2007], because ANN methods are much more time- and memory-efficient in low dimensions. Ashikhmin [2001] proposed a local propagation technique exploiting local coherence in the synthesis process by limiting the search space for a patch to the source locations of its neighbors in the exemplar texture. Our propagation search step is inspired by the same coherence assumption. The k-coherence technique [Tong et al. 2002] combines the propagation idea with a precomputation stage in which the k nearest neighbors of each patch are cached, and later searches take advantage of these precomputed sets. Although this accelerates the search phase, k-coherence still requires a full nearest-neighbor search for all pixels in the input, and has only been demonstrated in the context of texture synthesis.
It assumes that the initial offsets are close enough that it suffices to search only a small number of nearest neighbors. This may be true for small pure texture inputs, but we found that for large complex images our random search phase is required to escape local minima. In this work we compare the speed and memory usage of our algorithm against kd-trees with dimensionality reduction, and we show that it is at least an order of magnitude faster than the best competing combination (ANN+PCA) and uses significantly less memory. Our algorithm also provides more generality than kd-trees because it can be applied with arbitrary distance metrics, and easily modified to enable local interactions such as constrained completion.

Control and interactivity. One advantage of patch sampling schemes is that they offer a great deal of fine-scale control. For example, in texture synthesis, the method of Ashikhmin [2001] gives the user control over the process by initializing the output pixels with desired colors. The image analogies framework of Hertzmann et al. [2001] uses auxiliary images as "guiding layers," enabling a variety of effects including super-resolution, texture transfer, artistic filters, and texture-by-numbers. In the field of image completion, impressive guided filling results were shown by annotating structures that cross both inside and outside the missing region [Sun et al. 2005]. Lines are filled first using Belief Propagation, and then texture synthesis is applied for the other regions, but the overall run-time is on the order of minutes for a half MP image. Our system provides similar user annotations, for lines and other region constraints, but treats all regions in a unified iterative process at interactive rates. Fang and Hart [2007] demonstrated a tool to deform image feature curves while preserving textures that allows finer adjustments than our editing tools, but not at interactive rates. Pavic et al. [2006] presented an interactive completion system based on large fragments that allows the user to define the local 3D perspective to properly warp the fragments before correlating and pasting them. Although their system interactively pastes each individual fragment, the user must still manually click on each completion region, so the overall process can still be tedious.

Image retargeting. Many methods of image retargeting have applied warping or cropping, using some metric of saliency to avoid deforming important image regions [Liu and Gleicher 2005; Setlur et al. 2005; Wolf et al. 2007; Wang et al. 2008]. Seam carving [Avidan and Shamir 2007; Rubinstein et al. 2008] uses a simple greedy approach to prioritize seams in an image that can safely be removed in retargeting. Although seam carving is fast, it does not preserve structures well, and offers only limited control over the results.
Simakov et al. [2008] proposed framing the problem of image and video retargeting as a maximization of bidirectional similarity between small patches in the original and output images, and a similar objective function and optimization algorithm was independently proposed by Wei et al. [2008] as a method to create texture summaries for faster synthesis. Unfortunately, the approach of Simakov et al. is extremely slow compared to seam carving. Our constrained retargeting and image reshuffling applications employ the same objective function and iterative algorithm as Simakov et al., using our new nearest-neighbor algorithm to obtain interactive speeds. In each of these previous methods, the principal method of user control is the ability to define and protect important regions from distortion. In contrast, our system integrates specific user-directable constraints in the retargeting process to explicitly protect lines from bending or breaking, restrict user-defined regions to specific transformations such as uniform or non-uniform scaling, and fix lines or objects to specific output locations.

Figure 2: Phases of the randomized nearest neighbor algorithm: (a) patches initially have random assignments; (b) the blue patch checks above/green and left/red neighbors to see if they will improve the blue mapping, propagating good matches; (c) the patch searches randomly for improvements in concentric neighborhoods.

Image "reshuffling" is the rearrangement of content within an image, according to user input, without precise mattes. Reshuffling was demonstrated simultaneously by Simakov et al. [2008] and by Cho et al. [2008], who used larger image patches and Belief Propagation in an MRF formulation. Reshuffling requires the minimization of a global error function, as objects may move significant distances, and greedy algorithms will introduce large artifacts. In contrast to all previous work, our reshuffling method is fully interactive. As this task might be particularly hard and badly constrained, these algorithms do not always produce the expected result. Therefore interactivity is essential, as it allows the user to preserve some semantically important structures from being reshuffled, and to quickly choose the best result among alternatives.

3 Approximate nearest-neighbor algorithm

The core of our system is the algorithm for computing patch correspondences. We define a nearest-neighbor field (NNF) as a function f: A -> R^2 of offsets, defined over all possible patch coordinates (locations of patch centers) in image A, for some distance function of two patches D. Given patch coordinate a in image A and its corresponding nearest neighbor b in image B, f(a) is simply b - a. We refer to the values of f as offsets, and they are stored in an array whose dimensions are those of A. This section presents a randomized algorithm for computing an approximate NNF. As a reminder, the key insights that motivate this algorithm are that we search in the space of possible offsets, that adjacent offsets search cooperatively, and that even a random offset is likely to be a good guess for many patches over a large image. The algorithm has three main components, illustrated in Figure 2.
Initially, the nearest-neighbor field is filled with either random offsets or some prior information. Next, an iterative update process is applied to the NNF, in which good patch offsets are propagated to adjacent pixels, followed by random search in the neighborhood of the best offset found so far. Sections 3.1 and 3.2 describe these steps in more detail.

Figure 3: Illustration of convergence. (a) The top image is reconstructed using only patches from the bottom image. (b) Above: the reconstruction by the patch "voting" described in Section 4; below: a random initial offset field, with magnitude visualized as saturation and angle visualized as hue. (c) 1/4 of the way through the first iteration, high-quality offsets have been propagated in the region above the current scan line (denoted with the horizontal bar). (d) 3/4 of the way through the first iteration. (e) First iteration complete. (f) Two iterations. (g) After 5 iterations, almost all patches have stopped changing. The tiny orange flowers only find good correspondences in the later iterations.

3.1 Initialization

The nearest-neighbor field can be initialized either by assigning random values to the field, or by using prior information. When initializing with random offsets, we use independent uniform samples across the full range of image B. In applications described in Section 4, we use a coarse-to-fine gradual resizing process, so we have the option to use an initial guess upscaled from the previous level in the pyramid. However, if we use only this initial guess, the algorithm can sometimes get trapped in suboptimal local minima. To retain the quality of this prior but still preserve some ability to escape from such minima, we perform a few early iterations of the algorithm using a random initialization, then merge with the upsampled initialization only at patches where D is smaller, and then perform the remaining iterations.

3.2 Iteration

After initialization, we perform an iterative process of improving the NNF. Each iteration of the algorithm proceeds as follows: offsets are examined in scan order (from left to right, top to bottom), and each undergoes propagation followed by random search. These operations are interleaved at the patch level: if P_j and S_j denote, respectively, propagation and random search at patch j, then we proceed in the order: P_1, S_1, P_2, S_2, ..., P_n, S_n.

Propagation. We attempt to improve f(x, y) using the known offsets of f(x - 1, y) and f(x, y - 1), assuming that the patch offsets are likely to be the same. For example, if there is a good mapping at (x - 1, y), we try to use the translation of that mapping one pixel to the right for our mapping at (x, y). Let D(v) denote the patch distance (error) between the patch at (x, y) in A and patch (x, y) + v in B. We take the new value for f(x, y) to be the arg min of {D(f(x, y)), D(f(x - 1, y)), D(f(x, y - 1))}. The effect is that if (x, y) has a correct mapping and is in a coherent region R, then all of R below and to the right of (x, y) will be filled with the correct mapping. Moreover, on even iterations we propagate information up and left by examining offsets in reverse scan order, using f(x + 1, y) and f(x, y + 1) as our candidate offsets.

Random search. Let v_0 = f(x, y). We attempt to improve f(x, y) by testing a sequence of candidate offsets at an exponentially decreasing distance from v_0:

    u_i = v_0 + w a^i R_i    (1)

where R_i is a uniform random in [-1, 1] x [-1, 1], w is a large maximum search "radius", and a is a fixed ratio between search window sizes. We examine patches for i = 0, 1, 2, ... until the current search radius w a^i is below 1 pixel. In our applications w is the maximum image dimension, and a = 1/2, except where noted. Note the search window must be clamped to the bounds of B.
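To make the initialization, propagation, and random-search steps concrete, the following is a minimal single-iteration sketch in Python/NumPy. This is not the authors' implementation: the array layout, the plain SSD patch distance, and the scan-order handling are illustrative choices of ours, and patches are indexed by their top-left corners on grayscale images for simplicity.

    import numpy as np

    def patch_dist(A, B, ax, ay, bx, by, p=7):
        """SSD between the p x p patches with top-left corners (ax, ay) in A and (bx, by) in B."""
        da = A[ay:ay + p, ax:ax + p].astype(np.float64)
        db = B[by:by + p, bx:bx + p].astype(np.float64)
        return np.sum((da - db) ** 2)

    def improve(f, dist, A, B, x, y, cand, p):
        """Adopt candidate offset `cand` at (x, y) if it lowers the patch distance."""
        bx, by = x + cand[0], y + cand[1]
        if 0 <= bx <= B.shape[1] - p and 0 <= by <= B.shape[0] - p:
            d = patch_dist(A, B, x, y, bx, by, p)
            if d < dist[y, x]:
                f[y, x] = cand
                dist[y, x] = d

    def init_nnf(A, B, p=7):
        """Random initialization: independent uniform offsets into B."""
        h, w = A.shape[0] - p + 1, A.shape[1] - p + 1
        f = np.zeros((h, w, 2), dtype=int)
        dist = np.empty((h, w))
        for y in range(h):
            for x in range(w):
                bx = np.random.randint(0, B.shape[1] - p + 1)
                by = np.random.randint(0, B.shape[0] - p + 1)
                f[y, x] = (bx - x, by - y)          # offset = b - a
                dist[y, x] = patch_dist(A, B, x, y, bx, by, p)
        return f, dist

    def iterate(f, dist, A, B, even, p=7, alpha=0.5):
        """One NNF improvement pass: scan order on even iterations, reverse on odd."""
        h, w = dist.shape
        step = 1 if even else -1
        ys = range(h) if even else range(h - 1, -1, -1)
        xs = range(w) if even else range(w - 1, -1, -1)
        for y in ys:
            for x in xs:
                # Propagation: try offsets of the already-visited neighbors
                # (left/up on even iterations, right/down on odd ones).
                for dx, dy in ((-step, 0), (0, -step)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h:
                        improve(f, dist, A, B, x, y, f[ny, nx], p)
                # Random search: radii w_max * alpha^i, shrinking until < 1 pixel.
                radius = max(B.shape)
                while radius >= 1:
                    r = np.random.uniform(-1, 1, 2)
                    cand = (f[y, x] + radius * r).astype(int)
                    improve(f, dist, A, B, x, y, cand, p)
                    radius *= alpha

A full run under this sketch is simply init_nnf followed by a few alternating-direction calls to iterate, matching the 4-5 iterations reported below.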
Halting criteria. Although different criteria for halting may be used depending on the application, in practice we have found it works well to iterate a fixed number of times. All the results shown here were computed with 4-5 iterations total, after which the NNF has almost always converged. Convergence is illustrated in Figure 3 and in the accompanying video.

Efficiency. The efficiency of this naive approach can be improved in a few ways. In the propagation and random search phases, when attempting to improve an offset f(v) with a candidate offset u, one can do early termination if a partial sum for D(u) exceeds the current known distance D(f(v)). Also, in the propagation stage, when using square patches of side length p and an L_q norm, the change in distance can be computed incrementally in O(p) rather than O(p^2) time, by noting redundant terms in the summation over the overlap region. However, this incurs additional memory overhead to store the current best distances D(f(x, y)).

GPU implementation. The editing system to be described in Section 4 relies on a CPU implementation of the NNF estimation algorithm, but we have also prototyped a fully parallelized variant on the GPU. To do so, we alternate between iterations of random search and propagation, where each stage addresses the entire offset field in parallel. Although propagation is inherently a serial operation, we adapt the jump flood scheme of Rong and Tan [2006] to perform propagation over several iterations. Whereas our CPU version is capable of propagating information all the way across a scanline, we find that in practice long propagations are not needed, and a maximum jump distance of 8 suffices. We also use only 4 neighbors at each jump distance, rather than the 8 neighbors proposed by Rong and Tan. With similar approximation accuracy, the GPU algorithm is roughly 7x faster than the CPU algorithm, on a GeForce 8800 GTS card.
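The early-termination optimization described above can be illustrated with a small sketch (again our own illustration, not the paper's code): the partial SSD is compared against the incumbent distance after each patch row, so a hopeless candidate is abandoned before the full O(p^2) sum is computed.

    import numpy as np

    def patch_dist_early(A, B, ax, ay, bx, by, best, p=7):
        """SSD between two p x p patches, aborting as soon as the partial
        sum exceeds `best` (the current best distance for this pixel)."""
        total = 0.0
        for row in range(p):
            da = A[ay + row, ax:ax + p].astype(np.float64)
            db = B[by + row, bx:bx + p].astype(np.float64)
            total += np.sum((da - db) ** 2)
            if total >= best:      # candidate cannot win; stop early
                return np.inf
        return total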
3.3 Analysis for a synthetic example

Our iterative algorithm converges to the exact NNF in the limit. Here we offer a theoretical analysis for this convergence, showing that it converges most rapidly in the first few iterations with high probability. Moreover, we show that in the common case where only approximate patch matches are required, the algorithm converges even faster. Thus our algorithm is best employed as an approximation algorithm, by limiting computation to a small number of iterations. We start by analyzing the convergence to the exact nearest-neighbor field and then extend this analysis to the more useful case of convergence to an approximate solution.

Assume A and B have equal size (M pixels) and that random initialization is used. Although the odds of any one location being assigned the best offset in this initial guess are vanishingly small (1/M), the odds of at least one offset being correctly assigned are quite good: 1 - (1 - 1/M)^M, or approximately 1 - 1/e for large M. Because the random search is quite dense in small local regions, we can also consider a "correct" assignment to be any assignment within a small neighborhood of size C pixels around the correct offset. Such offsets will be corrected in about one iteration of random search. The odds that at least one offset is assigned in such a neighborhood are excellent: 1 - (1 - C/M)^M, or for large M, 1 - exp(-C).

Now we consider a challenging synthetic test case for our algorithm: a distinctive region R of size m pixels lies at two different locations in an otherwise uniform pair of images A and B (shown inset). This image is a hard case because the background offers no information about where the offsets for the distinctive region may be found. Patches in the uniform background can match a large number of other identical patches, which are found by random guesses in one iteration with very high probability, so we consider convergence only for the distinct region R. If any one offset in the distinct region R is within the neighborhood C of the correct offset, then we assume that after a small number of iterations, due to the density of random search in small local regions (mentioned previously), all of R will be correct via propagation (for notational simplicity assume this is instantaneous). Now suppose R has not yet converged. Consider the random searches performed by our algorithm at the maximum scale w. The random search iterations at scale w independently sample the image B, and the probability p that any of these samples lands within the neighborhood C of the correct offset is

    p = 1 - (1 - C/M)^m    (2)

Before doing any iterations, the probability of convergence is p. The probability that we did not converge on iterations 0, 1, ..., t - 1 and converge on iteration t is p(1 - p)^t. The probabilities thus form a geometric distribution, and the expected time of convergence is <t> = 1/p - 1. To simplify, let the relative feature size be g = m/M, then take the limit as resolution M becomes large:

    <t> = [1 - (1 - C/M)^(gM)]^(-1) - 1    (3)

    lim_{M -> inf} <t> = [1 - exp(-Cg)]^(-1) - 1    (4)

By Taylor expansion for small g, <t> = (Cg)^(-1) - 1/2 = M/(Cm) - 1/2. That is, our expected number of iterations to convergence remains constant for large image resolutions and a small feature size m relative to image resolution M. We performed simulations for images of resolution M from 0.1 to 2 megapixels that confirm this model. For example, we find that for an m = 20^2 region the algorithm converges with very high probability after 5 iterations for an M = 2000^2 image.

The above test case is hard, but not the worst one for exact matching. The worst case for exact matching is when image B consists of a highly repetitive texture with many distractors similar to the distinct feature in A. The offset might then get "trapped" by one of the distractors, and the effective neighborhood region size C might be decreased to 1 (i.e., only the exact match can pull the solution out of the distractor during random search). However, in practice, for many image analysis and synthesis applications such as the ones we show in this paper, finding an approximate match (in terms of patch similarity) will not cause any noticeable difference. The chances of finding a successful approximate match are actually higher when many similar distractors are present, since each distractor is itself an approximate match. If we assume there are Q distractors in image B that are similar to the exact match up to some small threshold, where each distractor has approximately the same neighborhood region C, then following the above analysis the expected number of iterations for convergence is reduced to M/(QCm) - 0.5.
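As a quick sanity check of this model, the expected iteration count can be computed directly from equation (2) and compared against the Taylor approximation (a worked example with illustrative numbers; the neighborhood size C below is an assumed value, not a constant from the paper):

    def expected_iterations(M, m, C, Q=1):
        """Expected iterations until some offset in the distinct region R
        lands within the size-C neighborhood of the correct offset,
        assuming Q equally good approximate matches (Q=1: exact analysis)."""
        p = 1.0 - (1.0 - Q * C / M) ** m
        return 1.0 / p - 1.0

    # Example: for small m/M the exact expectation should agree with the
    # closed form M/(Q*C*m) - 1/2 derived above.
    M, m, C = 2000 ** 2, 20 ** 2, 1000   # C is an assumed neighborhood size
    print(expected_iterations(M, m, C))  # exact: 1/p - 1  (about 9.51)
    print(M / (C * m) - 0.5)             # Taylor approximation (9.5)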
3.4 Analysis for real-world images

Here we analyze the approximations made by our algorithm on real-world images. To assess how our algorithm addresses different degrees of visual similarity between the input and output images, we performed error analysis on datasets consisting of pairs of images spanning a broad range of visual similarities. These included inputs and outputs of our editing operations (very similar), stereo pairs[1] and consecutive video frames (somewhat similar), images from the same class in the Caltech-256 dataset[2] (less similar), and pairs of unrelated images. Some of these were also analyzed at multiple resolutions (0.1 to 0.35 MP) and patch sizes (4x4 to 14x14). Our algorithm and the ANN+PCA kd-tree were both run on each pair, and compared to ground truth (computed by exact NN). Note that because precomputation time is significant for our applications, we use a single PCA projection to reduce the dimensionality of the input data, unlike Kumar et al. [2008], who compute eigenvectors for different PCA projections at each node of the kd-tree. Because each algorithm has tunable parameters, we also varied these parameters to obtain a range of approximation errors.

We quantify the error for each dataset as the mean and 95th percentile of the per-patch difference between the algorithm's RMS patch distance and the ground truth RMS patch distance. For 5 iterations of our algorithm, we find that mean errors are between 0.2 and 0.5 gray levels for similar images, and between 0.6 and 1.5 gray levels for dissimilar images (out of 256 possible gray levels). At the 95th percentile, errors are from 0.5 to 2.5 gray levels for similar images, and 0.9 to 6.0 gray levels for dissimilar images.

Our algorithm is both substantially faster than kd-tree and uses substantially less memory over a wide range of parameter settings. For the 7x7 patch sizes used for most results in the paper, we find our algorithm is typically 20x to 100x faster, and uses about 20x less memory than kd-tree, regardless of resolution. Table 1 shows a comparison of average time and memory use for our algorithm vs. ANN kd-trees for a typical input: the pairs shown in Figure 3. The rest of our datasets give similar results. To fairly compare running time, we adjusted ANN kd-tree parameters to obtain a mean approximation error equal to our algorithm after 5 iterations.

Table 1: Running time and memory comparison for the input shown in Figure 3. We compare our algorithm against a method commonly used for patch-based search: kd-tree with approximate nearest neighbor matching. Our algorithm uses n = 5 iterations. The parameters for kd-tree have been adjusted to provide equal mean error to our algorithm.

    Megapixels | Time [s]: Ours | Time [s]: kd-tree | Memory [MB]: Ours | Memory [MB]: kd-tree
    0.1        | 0.68           | 15.2              | 1.7               | 33.9
    0.2        | 1.54           | 37.2              | 3.4               | 68.9
    0.35       | 2.65           | 87.7              | 5.6               | 118.3

The errors and speedups obtained are a function of the patch size and image resolution. For smaller patches, we obtain smaller speedups (7x to 35x for 4x4 patches), and our algorithm has higher error values. Conversely, larger patches give higher speedups (300

[1] .../stereo/data/scenes2006/
[2] .../Image_Datasets/Caltech256/
Image Stylization Based on Nonlinear Filtering and Texture Transfer
Image Stylization Based on Nonlinear Filtering and Texture Transfer

Chapter 1: Introduction
- Background of image stylization techniques
- The importance of image stylization in various fields
- The aim of the paper and the research questions

Chapter 2: Literature Review
- The concept of image stylization and its types
- Nonlinear filtering and texture transfer techniques in image stylization
- Relevant studies and research in the field
- A critical analysis of previous works

Chapter 3: Nonlinear Filtering for Image Stylization
- An overview of nonlinear filtering
- Implementation of nonlinear filtering in image stylization
- Advantages and limitations of using nonlinear filtering in image stylization
- Examples of image stylization using nonlinear filtering

Chapter 4: Texture Transfer for Image Stylization
- Understanding texture transfer in image stylization
- Texture transfer techniques and their applications in image stylization
- Advantages and limitations of using texture transfer in image stylization
- Examples of image stylization using texture transfer techniques

Chapter 5: Hybrid Approach: Nonlinear Filtering and Texture Transfer
- Overview of the hybrid approach
- Combining nonlinear filtering and texture transfer for image stylization
- The advantages of the hybrid approach over individual techniques
- Examples of image stylization using the hybrid approach

Chapter 6: Conclusion and Future Work
- Summary of the findings
- Contributions of the study
- Limitations and future directions for research in image stylization
- The significance of the study in image processing and related fields

Chapter 1: Introduction

Image stylization has become an increasingly important topic in the field of computer vision and graphics. Image stylization refers to the process of transforming an input image into a new image with a certain artistic or visual effect. This transformation can be achieved using various techniques, such as nonlinear filtering and texture transfer.

Image stylization techniques have gained significant attention from both academia and industry due to their diverse applications in fields such as entertainment, advertising, and design. For example, image stylization can be used in digital media to create unique visual effects for films, animation, and video games. Similarly, stylized images can be used in advertising campaigns to attract customers by creating visually appealing advertisements.

This paper aims to provide a comprehensive overview of image stylization techniques, with a particular focus on nonlinear filtering and texture transfer. Specifically, this paper aims to answer the following research questions:

1. What is image stylization, and why is it important in various fields?
2. What are the different types of image stylization techniques, and how do they compare in terms of effectiveness and efficiency?
3. How can image stylization be achieved using nonlinear filtering and texture transfer techniques?
4. What are the advantages and limitations of using nonlinear filtering and texture transfer in image stylization?
5. What is the hybrid approach to image stylization, and how does it compare to individual techniques in terms of results and efficiency?

The remainder of this paper is organized as follows: Chapter 2 provides a review of related literature on image stylization, including a definition of the concept and a critical analysis of previous works. Chapter 3 covers the use of nonlinear filtering techniques for image stylization, including an explanation of the technique and its advantages and limitations.
Chapter 4 discusses the use of texture transfer techniques in image stylization, including the different types of texture transfer techniques and their applications. Chapter 5 provides an overview of the hybrid approach to image stylization, including its advantages over individual techniques and examples of its use. Finally, Chapter 6 summarizes the findings of this study and outlines its contributions, limitations, and future directions for research.

Chapter 2: Literature Review

In this chapter, we provide a review of related literature on image stylization. We first define the concept of image stylization and discuss its importance in various fields. We then critically analyze previous works on image stylization, including their strengths and limitations.

2.1 Definition of Image Stylization and Importance

Image stylization refers to the process of transforming an input image into a new image with a certain artistic or visual effect. Image stylization techniques aim to enhance the aesthetic or emotional impact of an image by altering its visual characteristics such as color, texture, and shape. This process can be achieved using various techniques, such as nonlinear filtering and texture transfer.

Image stylization is important in various fields, such as entertainment, advertising, and design. In the entertainment industry, image stylization can be used to create unique visual effects for films, animation, and video games. For example, the use of stylized images in video games can enhance their artistic appeal and create a unique narrative experience for the player. Similarly, stylized images can be used in advertising campaigns to attract customers by creating visually appealing advertisements. In the design industry, image stylization can be used to create unique designs for products, packaging, and branding.

2.2 Review of Previous Works on Image Stylization

The previous works on image stylization can be categorized into two main types: non-photorealistic rendering (NPR) and photorealistic rendering (PR). NPR techniques aim to create stylized images that depart from traditional photorealistic rendering by emphasizing the artistic or emotional aspects of the image. In contrast, PR techniques aim to create images that are as close as possible to the original image, using realistic lighting, textures, and colors.

Some of the early works on image stylization focused on NPR techniques. Gooch et al. (1998) introduced an NPR technique that uses color and tone to stylize images based on the color transitions in the original image. Another NPR technique that gained popularity is the painterly rendering technique (Hertzmann et al., 2001), which generates stylized images that appear to be painted by an artist. This technique uses brushstrokes and other artistic elements to create a stylized image.

More recently, researchers have focused on PR techniques for image stylization. Yang et al. (2017) proposed a technique that uses deep convolutional neural networks (CNNs) to learn style features from a set of reference images and transfer them to a target image. This style transfer technique has gained significant popularity due to its ability to create photorealistic stylized images. While both NPR and PR techniques have their advantages and limitations, the choice of technique depends on the specific application and the desired outcome.
NPR techniques are ideal for creating stylized images that emphasize the artistic and emotional aspects of an image, while PR techniques are more suitable for applications that require realistic rendering of visual information.

In summary, previous works on image stylization have provided various techniques for generating stylized images, ranging from NPR techniques to PR techniques driven by deep learning. Each technique has its strengths and limitations, and the choice of technique depends on the specific application and goals of the user. In the next chapter, we focus on nonlinear filtering techniques for image stylization.

Chapter 3: Nonlinear Filtering Techniques for Image Stylization

3.1 Introduction

Nonlinear filtering techniques are commonly used in image processing and computer vision for various tasks, such as denoising, edge detection, and feature extraction. Nonlinear filters are designed to suppress noise and preserve edges in an image by using a nonlinear function to modify the image intensity values.

In recent years, nonlinear filtering techniques have been used for image stylization by modifying the intensity values of an image to achieve a desired visual effect. In this chapter, we provide an overview of nonlinear filtering techniques for image stylization, including their strengths and limitations.

3.2 Nonlinear Filtering Techniques for Image Stylization

One of the most commonly used nonlinear filters for image stylization is the bilateral filter (BF) (Tomasi and Manduchi, 1998). The BF is a spatial-domain filter that uses both spatial and range information to smooth an image while preserving edges: it removes high-frequency noise while preserving edge information by weighting the filter coefficients based on both the distance between pixels and the difference in their intensity values.

The BF has been used for various image stylization tasks, such as tone mapping (Durand and Dorsey, 2002), detail enhancement (Fattal, 2007), and texture transfer (Kwatra et al., 2005). For example, to achieve a painterly effect, an image can be stylized by decomposing it into multiple scales using a Laplacian pyramid, applying the BF to each scale, and then recomposing the stylized image (Hertzmann et al., 2001). This technique produces a stylized image that resembles a painting, with brushstrokes and other artistic elements.

Another nonlinear filter that has been used for image stylization is the anisotropic diffusion (AD) filter (Perona and Malik, 1990). The AD filter is a diffusion-based filter that smooths an image while preserving and enhancing edges, by diffusing the image intensity values according to the local gradient information.

The AD filter has been used for various image stylization tasks, such as cartoonization (Kyprianidis et al., 2013) and stylization of medical images (Sarkar et al., 2014). For example, the AD filter can be used for cartoonization by applying it to a gradient map of the input image, followed by thresholding and color quantization to produce a cartoon-like stylized image.

However, nonlinear filtering techniques have their limitations. For example, they can produce artifacts such as halos and oversmoothing in the stylized images. Additionally, they can be computationally expensive, especially for large images or high-dimensional data.
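As a concrete illustration of this family of techniques, the following is a minimal cartoon-style stylization sketch using OpenCV's bilateral filter (our own illustrative pipeline, not taken from the works cited above; the function names are standard OpenCV, but the parameter values and file paths are arbitrary placeholder choices):

    import cv2

    def cartoonize(path_in, path_out, n_passes=5):
        """Cartoon-like stylization: repeated bilateral filtering flattens
        color regions while keeping edges, then a binary edge mask is
        overlaid to suggest drawn outlines."""
        img = cv2.imread(path_in)
        # Several small bilateral filters approximate one large one cheaply.
        smooth = img
        for _ in range(n_passes):
            smooth = cv2.bilateralFilter(smooth, d=9, sigmaColor=30, sigmaSpace=7)
        # Edge mask from the luminance channel.
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray = cv2.medianBlur(gray, 7)
        edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                      cv2.THRESH_BINARY, blockSize=9, C=2)
        # Keep smoothed colors only where the edge mask is white,
        # so edge pixels come out black like pen strokes.
        cartoon = cv2.bitwise_and(smooth, smooth, mask=edges)
        cv2.imwrite(path_out, cartoon)

    # cartoonize("input.jpg", "cartoon.jpg")  # paths are placeholders

The design point this sketch illustrates is the one made in the text: the nonlinear (edge-preserving) filter does the stylistic abstraction, and the number of filter passes and the range/spatial sigmas are the knobs that trade smoothing strength against halo and oversmoothing artifacts.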
Despite these limitations, nonlinear filtering techniques remain popular for image stylization due to their ability to produce visually appealing stylized images.

3.3 Comparison of Nonlinear Filtering Techniques

The choice of nonlinear filtering technique for image stylization depends on the specific application and the desired visual effect. While the BF and AD filters are among the most commonly used nonlinear filters for image stylization, other filters such as the guided filter (He et al., 2010) and the joint bilateral filter (JBF) (Paris et al., 2007) have also been used for image stylization.

The guided filter is a spatial-domain filter that filters a target image under the structure of a separate guidance image, producing output whose edges follow the guidance image. The guided filter has been used for various image stylization tasks, such as detail enhancement, texture transfer, and stylization of low-light images.

The JBF is a spatial-domain filter that uses the joint range information of two different images to produce a filtered image. The JBF has been used for various image stylization tasks, such as style transfer and texture transfer.

Each nonlinear filtering technique has its strengths and limitations, and the choice of technique depends on the specific application and the desired visual effect.

3.4 Conclusion

Nonlinear filtering techniques are commonly used for various image processing tasks, such as denoising, edge detection, and feature extraction. In recent years, nonlinear filtering techniques have also been used for image stylization by modifying the intensity values of an image to achieve a desired visual effect.

The BF and AD filters are among the most commonly used nonlinear filters for image stylization, but other filters such as the guided filter and the JBF have also been used. The choice of nonlinear filtering technique for image stylization depends on the specific application and the desired visual effect. While nonlinear filtering techniques have their limitations, they remain popular for image stylization due to their ability to produce visually appealing stylized images.

Chapter 4: Deep Learning Techniques for Image Stylization

4.1 Introduction

In recent years, deep learning techniques have gained popularity in image processing and computer vision, particularly for tasks such as object recognition, image segmentation, and image restoration. Deep learning techniques such as convolutional neural networks (CNNs) have also been applied to image stylization, allowing for the creation of realistic and visually appealing stylized images.

In this chapter, we provide an overview of deep learning techniques for image stylization, including their strengths and limitations.

4.2 Deep Learning Techniques for Image Stylization

Deep learning techniques for image stylization typically use a feedforward deep neural network architecture, where the network takes an input image and produces a stylized output image. The network learns a mapping between the input and output images through a training process, where pairs of input and output images are used to train the network.

One of the most commonly used deep learning techniques for image stylization is neural style transfer (Gatys et al., 2015).
Neural style transfer uses CNNs to separate the content and style information in two images, allowing for the transfer of style from one image to another while preserving the content information.

Neural style transfer has been used for various image stylization tasks, such as artistic style transfer (Gatys et al., 2016), photo-realistic style transfer (Luan et al., 2017), and 3D model stylization (Chen et al., 2019). For example, to achieve artistic style transfer, a neural network is trained using a content image and a style image, with the output image produced by combining the content information from the content image and the style information from the style image.

Another deep learning technique that has been used for image stylization is the generative adversarial network (GAN) (Goodfellow et al., 2014). GANs are a type of deep neural network architecture in which two neural networks, a generator and a discriminator, are trained simultaneously.

GANs have been used for various image stylization tasks, such as unpaired image-to-image translation (CycleGAN) (Zhu et al., 2017) and photorealistic image synthesis (PGGAN) (Karras et al., 2017). For example, CycleGAN can be used for unpaired image-to-image translation, where the network is trained to learn a mapping between images from two different domains, such as summer and winter landscapes.

Deep learning techniques for image stylization have several advantages over traditional methods such as nonlinear filtering techniques. They can produce realistic and visually appealing stylized images, and can effectively capture complex relationships between input and output images. However, deep learning techniques can have high computational requirements and may require large amounts of training data to achieve good results.

4.3 Comparison of Deep Learning Techniques

The choice of deep learning technique for image stylization depends on the specific application and the desired visual effect. While neural style transfer and GANs are among the most commonly used deep learning techniques for image stylization, other techniques such as neural color transfer (Xiong et al., 2017) and deep photo style transfer (Luan et al., 2017) have also been used.

Neural color transfer is a deep learning technique that transfers the color distribution from a reference image to a target image, while preserving the spatial details of the target image. Neural color transfer has been used for various image stylization tasks, such as colorization and recoloring of images.

Deep photo style transfer is a deep learning technique that transfers the style information from a style image to a target photograph, while preserving the key features of the target photograph. Deep photo style transfer has been used to produce photorealistic stylized images that resemble paintings or other artistic styles.

Each deep learning technique for image stylization has its strengths and limitations, and the choice of technique depends on the specific application and the desired visual effect.
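To make the content/style separation concrete, here is a minimal sketch of the Gram-matrix style loss at the heart of neural style transfer (a simplified PyTorch illustration, not the implementation from the cited works; the feature maps are assumed to come from some pretrained CNN, and the loss weights are arbitrary):

    import torch

    def gram_matrix(features):
        """Gram matrix of a feature map: channel-by-channel correlations
        that discard spatial layout and thus capture 'style'."""
        b, c, h, w = features.shape
        f = features.view(b, c, h * w)
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

    def style_transfer_loss(gen_feats, content_feats, style_feats,
                            content_weight=1.0, style_weight=1e4):
        """Weighted sum of a content loss (feature differences at the
        deepest layer) and a style loss (Gram differences across layers).
        Each argument is a list of feature maps from the same layers."""
        content_loss = torch.nn.functional.mse_loss(gen_feats[-1],
                                                    content_feats[-1])
        style_loss = sum(
            torch.nn.functional.mse_loss(gram_matrix(g), gram_matrix(s))
            for g, s in zip(gen_feats, style_feats))
        return content_weight * content_loss + style_weight * style_loss

In the classic formulation, this loss is minimized by gradient descent directly on the pixels of the generated image; feedforward variants instead train a network to approximate that optimization in a single pass.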
4.4 Conclusion

Deep learning techniques have become popular in image processing and computer vision, and have also been applied to image stylization. Neural style transfer and GANs are among the most commonly used deep learning techniques for image stylization, but other techniques such as neural color transfer and deep photo style transfer have also been used.

Deep learning techniques for image stylization can produce realistic and visually appealing stylized images, but may have high computational requirements and require large amounts of training data. The choice of technique depends on the specific application and the desired visual effect. Despite their limitations, deep learning techniques are expected to continue to play an important role in image stylization in the years to come.

Chapter 5: Applications of Deep Learning for Image Stylization

5.1 Introduction

Deep learning techniques for image stylization have practical applications in various fields, such as marketing, art, and entertainment. In this chapter, we discuss the applications of deep learning for image stylization, including their potential advantages and limitations.

5.2 Marketing

Deep learning techniques for image stylization have potential applications in marketing, particularly in the fields of advertising and product design. With the increasing amount of visual content online, it has become challenging to attract and retain the attention of potential customers with standard advertising techniques.

One potential use of deep learning techniques for image stylization in marketing is the creation of visually appealing product images. For example, a deep learning model can be trained to generate stylized product images that showcase the product's unique features in an eye-catching and compelling way.

Additionally, deep learning techniques for image stylization can be used to personalize marketing campaigns. For example, a deep learning model can be trained to generate personalized images for each customer based on their preferences and past purchasing behavior.

While deep learning techniques for image stylization have potential benefits in marketing, it is important to consider the potential ethical implications of using personalized images and advertising techniques that manipulate consumer perception.

5.3 Art

Deep learning techniques for image stylization have also been applied in the field of art, allowing artists to create new and unique visual styles that were previously difficult to achieve with traditional techniques.

One potential use of deep learning techniques for image stylization in art is as a tool for artists to generate new and experimental styles that challenge traditional artistic boundaries.
For example, an artist can use a deep learning model to generate stylized images based on a set of abstract features or constraints. Additionally, deep learning techniques for image stylization can be used to create interactive art installations that respond to the environment and audience behavior in real time.

While deep learning techniques for image stylization have potential in art, it is important to consider the potential impact on the traditional creative process, as well as the ethical implications of the use of AI-generated art.

5.4 Entertainment

Deep learning techniques for image stylization have also found applications in the entertainment industry, particularly in the fields of movies and video games.

One potential use of deep learning techniques for image stylization in movies is the creation of realistic visual effects, such as explosions, fire, and water, that were previously difficult to achieve with traditional techniques.

Additionally, deep learning techniques for image stylization can be used in video game development to generate new and innovative visual styles, as well as to improve the realism and visual quality of the game.

While deep learning techniques for image stylization have potential in entertainment, it is important to consider the potential impact on the traditional creative process and the potential limitations of using AI-generated content in creating immersive experiences.

5.5 Conclusion

Deep learning techniques for image stylization have practical applications in various fields, including marketing, art, and entertainment. In marketing, deep learning techniques can be used to create visually appealing product images and personalized advertising. In art, deep learning techniques can be used to generate new and experimental visual styles. In entertainment, deep learning techniques can be used to create realistic visual effects and improve the realism of video games.

While deep learning techniques for image stylization have potential benefits, it is important to consider the potential impact on the traditional creative process and the ethical implications of using AI-generated content. As such, it is important for researchers, practitioners, and policymakers to carefully consider these implications and limitations when developing and deploying deep learning techniques for image stylization.
Image Inpainting Based on a Simplified TV Model
Image Inpainting Based on a Simplified TV Model

He Wenxi; Ye Kuntao

Abstract: Starting from the total variation (TV) method for repairing damaged grayscale images, a simplified TV model is proposed. According to the actual situation of the region to be repaired, the model simplifies the TV equation to different degrees, yielding lower-complexity partial differential equations that are easier to implement. Four images with different kinds of damaged regions were selected for inpainting experiments. The experiments show that, whether unwanted text must be removed or undesirable scratches repaired, using a different equation for each kind of region makes inpainting noticeably faster while keeping comparable subjective quality; the repairs are hard to notice and the results look natural.

Journal: Journal of Jiangxi University of Science and Technology
Year (Volume), Issue: 2012 (033) 005
Pages: 4 (P66-68, 73)
Keywords: image inpainting; TV model; partial differential equations
Authors: He Wenxi; Ye Kuntao
Affiliation: School of Science, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China
Language: Chinese
CLC number: TP319.41

Image inpainting means inferring the information of an unknown region (the region to be repaired) from the information of the known regions; the neighborhood of the unknown region is generally assumed to be highly correlated with it. Inpainting used to depend entirely on manual restoration by experienced artists, which was time-consuming, laborious, and rarely seamless. Since inpainting has great application value in film and television special effects, restoration of precious historical documents, removal of unwanted objects, and so on, the field deserves in-depth study.

In 2000, Bertalmio, Sapiro, Caselles, Ballester and colleagues [1-5] first introduced third-order partial differential equations into image inpainting (the BSCB model). Their algorithm transports information into the region to be repaired along the direction perpendicular to the gradient, and is rather time-consuming. New algorithms kept appearing afterwards: Oliveira et al. [6] filter with a Gaussian convolution kernel, which increases speed but whose results are still not ideal. The exemplar- and texture-based inpainting ideas of Criminisi and Efros [7-11] also positively influenced later researchers. Chan and Shen and co-workers [12-14] proposed the curvature-driven diffusion (CDD) model and the total variation (TV) model; both repair well, but are still slow. Many other inpainting models exist, for example high-dimensional interpolation with radial basis functions (RBF) [15] and inpainting based on wavelet-coefficient correlations [16].

This paper proposes a simplified TV model. The model simplifies the inpainting equation according to the actual situation of the region to be repaired, lowering the computational complexity and thereby increasing the repair speed. The method can also be used to repair near-infrared images. Experiments show that this model repairs digital images not only faster, but also well.

The inpainting principle is shown in Fig. 1, where Omega is the region with missing information, dOmega is the boundary between the damaged region and the intact region, and G denotes a neighboring region containing Omega. The figure shows that inpainting proceeds layer by layer from the outside inward.

TV-model-based inpainting minimizes a functional R(u), where u is the gray value of a pixel, grad(u) is its gradient, and r(.) is a positive-valued function. (The equation images were lost in the source; the equations below are restated in the standard form of the TV inpainting model.)

    R(u) = integral_G r(|grad u|) dx dy    (1)

Adding the fidelity constraint with a Lagrange multiplier gives the constrained functional

    J(u) = integral_G r(|grad u|) dx dy + (lambda/2) integral_{G\Omega} (u - u0)^2 dx dy    (2)

which finally yields an inpainting equation that includes denoising:

    du/dt = div( r'(|grad u|) grad u / |grad u| ) - lambda (u - u0)    (3)

This paper modifies the above model. First, denoising is not considered, so the second term vanishes and the equation becomes

    du/dt = div( grad u / |grad u| )    (4)

which can be expanded, by the product rule, as

    du/dt = (1/|grad u|) Laplacian(u) + grad(1/|grad u|) . grad u    (5)

It is well known that when iterating in a computer program the time cost is dominated by multiplications and divisions, so reducing their number saves time. Image inpainting has two main concerns: good subjective quality, and speed, which matters for real-time repair. With this in mind, it is worthwhile to simplify the model according to the pixel distribution of the region to be repaired.

Let the region to be repaired together with its neighborhood be G, and let (a, b) be the position of any pixel in it, i.e. G = {(a, b) : a in [a1, a2], b in [b1, b2]}. If at each pixel the reciprocal of the gradient of the gray value is large but varies little, only the first term need be kept, and the equation takes the simple form

    du/dt = (1/|grad u|) Laplacian(u)    (6)

Similarly, if the reciprocal of the gradient is a constant c throughout G, the equation becomes the uniform heat-conduction equation

    du/dt = c Laplacian(u)    (7)

If instead the gradient in G is large and the divergence small, the first term is negligible and the equation becomes

    du/dt = grad(1/|grad u|) . grad u    (8)

And if the two terms contribute about equally, taking one term multiplied by 2 suffices:

    du/dt = (2/|grad u|) Laplacian(u)    (9)

The discretization of these equations has been discussed in an earlier paper [17] and is not repeated here.

The experiments were run in Matlab 7.0 on a PC (Pentium 4, 2.94 GHz, 512 MB RAM), with a discrete time step of 0.01. A uniform region (Fig. 2(b)) can be repaired with equation (7) (c taken as 1); the result is shown in Fig. 2(d). If the reciprocal of the gradient is fairly large but varies little (Fig. 3(a)), equation (6) can be used; the result is shown in Fig. 3(c). If the two terms contribute equally (Fig. 4(b)), equation (9) can be used; the result is shown in Fig. 4(d). In the general case (Fig. 5(b)), equation (5) or (4) is used; the result is shown in Fig. 5(c).

To compare repair quality objectively, the ISNR criterion is introduced, defined as

    ISNR = 10 log10 [ sum (I(i,j) - J(i,j))^2 / sum (I(i,j) - I^(i,j))^2 ]

where I(i,j), J(i,j) and I^(i,j) denote the gray values of the original, the damaged, and the repaired images respectively. Fig. 3(a) has no original image, so its ISNR cannot be computed; for the other repaired images the ISNR reaches 19, and the method in this paper repairs faster.

From the results for Fig. 2(b), Fig. 3(a) and Fig. 4(b), the following conclusion can be drawn: judging only by subjective quality, the improved TV model proposed here and the TV model of Chan and Shen differ little; with a limited number of iterations the original TV model is slightly better. What matters in inpainting, however, is not the iteration count but the repair speed. Future work will mainly build a mapping between pixels so that all pixels of the region to be repaired can be repaired in parallel. The ultimate goal is a fast-inpainting artificial neural network capable of repairing damaged images in real time.
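To illustrate the kind of iteration the paper describes, here is a minimal explicit finite-difference sketch of TV inpainting, du/dt = div(grad u / |grad u|), applied only inside the damaged mask (our own illustrative NumPy code, not the authors' Matlab implementation; the time step and iteration count are arbitrary):

    import numpy as np

    def tv_inpaint(u, mask, dt=0.01, n_iter=2000, eps=1e-6):
        """Explicit TV inpainting: evolve u by the curvature flow
        div(grad u / |grad u|) on masked (damaged) pixels only.
        `u` is a float grayscale image; `mask` is True where damaged."""
        u = u.astype(np.float64).copy()
        for _ in range(n_iter):
            # Central differences for the first derivatives.
            ux = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / 2.0
            uy = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / 2.0
            # Second derivatives.
            uxx = np.roll(u, -1, axis=1) - 2 * u + np.roll(u, 1, axis=1)
            uyy = np.roll(u, -1, axis=0) - 2 * u + np.roll(u, 1, axis=0)
            uxy = (np.roll(np.roll(u, -1, axis=1), -1, axis=0)
                   - np.roll(np.roll(u, -1, axis=1), 1, axis=0)
                   - np.roll(np.roll(u, 1, axis=1), -1, axis=0)
                   + np.roll(np.roll(u, 1, axis=1), 1, axis=0)) / 4.0
            # Curvature: the expanded form of div(grad u / |grad u|).
            grad_sq = ux ** 2 + uy ** 2
            curv = (uxx * uy ** 2 - 2 * ux * uy * uxy + uyy * ux ** 2) \
                   / (grad_sq + eps) ** 1.5
            # Update damaged pixels only; known pixels stay fixed.
            u[mask] += dt * curv[mask]
        return u

The simplified equations (6)-(9) correspond to replacing `curv` above with a cheaper expression for regions where the stated assumptions hold, which is exactly where the paper's speedup comes from.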
References

[1] Bertalmio M, Sapiro G, Caselles V, et al. Image inpainting [C]// International Conference on Computer Graphics and Interactive Techniques. New York: ACM Press, 2000: 417-424.
[2] Bertalmio M, Bertozzi A L, Sapiro G. Navier-Stokes, fluid dynamics, and image and video inpainting [C]// IEEE Computer Vision and Pattern Recognition, 2001: 355-362.
[3] Bertalmio M, Vese L, Sapiro G, et al. Simultaneous structure and texture image inpainting [J]. IEEE Transactions on Image Processing, 2003, 12(8): 882-889.
[4] Ballester C, Caselles V, Verdera J. A variational model for filling-in gray level and color images [C]// Proc. ICCV, 2001: 10-16.
[5] Ballester C, Bertalmio M, Caselles V, et al. Filling-in by joint interpolation of vector fields and gray levels [J]. IEEE Trans. Image Process., 2001, 10(8): 1200-1211.
[6] Oliveira M M, Bowen B, McKenna R. Fast digital image inpainting [C]// International Conference on Visualization, Imaging and Image Processing, 2001: 261-266.
[7] Criminisi A, Pérez P, Toyama K. Region filling and object removal by exemplar-based image inpainting [J]. IEEE Transactions on Image Processing, 2004, 13(9): 1200-1212.
[8] Criminisi A, Pérez P, Toyama K. Object removal by exemplar-based inpainting [C]// Proceedings of Conference on Computer Vision and Pattern Recognition, USA, 2003.
[9] Efros A A, Leung T K. Texture synthesis by non-parametric sampling [C]// IEEE International Conference on Computer Vision, 1999: 1033-1038.
[10] Efros A A, Freeman W T. Image quilting for texture synthesis and transfer [C]// Proceedings of SIGGRAPH, 2001: 341-346.
[11] Heeger D J, Bergen J R. Pyramid-based texture analysis/synthesis [C]// SIGGRAPH 95, 1995: 229-238.
[12] Chan T F, Shen J H. Non-texture inpainting by curvature-driven diffusions (CDD) [J]. J. Visual Comm. Image Rep., 2001, 12(4): 436-449.
[13] Chan T F, Shen J H. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods [M]. Philadelphia: SIAM, 2005.
[14] Shen J H, Chan T F. Variational restoration of nonflat image features: models and algorithms [J]. SIAM J. Appl. Math., 2000, 61(4): 1338-1361.
[15] 周廷方, 汤锋, 王进, et al. Image inpainting with radial basis functions [J]. Journal of Image and Graphics, 2004, 9(10): 1190-1196.
[16] 何凯, 梁然, 张涛. A fast inpainting algorithm for texture images based on wavelet-coefficient correlation [J]. Journal of Tianjin University, 2010, 43(12): 1093-1097.
[17] 廉小丽, 徐中宇, 冯丽丽, et al. A new PDE-based image inpainting method [J]. Computer Engineering, 2009, 35(6): 234-236.
Zemax Tutorial: Error Messages
Chapter 14: Error Messages

In ZEMAX, how should one handle the message "ERROR 58: Key driver not be installed"?
Answer: Install the driver appropriate for your computer's operating system from the DRIVERS directory on the CD; this resolves the problem.
In ZEMAX, when using the MTF analysis feature, what causes the message "ERROR 921: SAMPLING TOO LOW, DATA INACCURATE!" to appear on the plot?
Answer: The sampling rate you set is too low; simply increase the sampling rate.
As shown in the figure below: in ZEMAX, why does the error "Non-sequential surfaces must be followed by a standard surface type." appear when building an optical system with NSC with ports?
Answer: When you build your optical system with NSC with ports, in addition to the NSC surface you must insert one additional standard surface after the surface where the NSC resides. Only with both surfaces can the NSC boundary be completely described, i.e. the entry port and the exit port.
Why does the error "CANNOT COMPUTE RELATIVE ILLUMINATION" appear on the plot when using the Relative Illumination feature in ZEMAX?
Answer: Your system setup violates the definition of relative illumination. For the definition, see the ZEMAX manual, Chapter 7, ANALYSIS MENU / Illumination / Relative Illumination, which states: "For systems with very high amounts of vignetting, or for systems with non-linear cosine space aberrations that would violate the assumptions of the computation, the relative illumination cannot be calculated, and an error message will be displayed." If your system uses heavy vignetting or its aberrations are too large, these assumptions are violated, so you may need to modify your system.

The figure below shows an error message that appeared while installing ZEMAX. Why does this problem occur? Is the problem with the KeyPro or with the PC?
Texture synthesis by non-parametric sampling
IEEE International Conference on Computer Vision, Corfu, Greece, September 1999

Texture Synthesis by Non-parametric Sampling
Alexei A. Efros and Thomas K. Leung
Computer Science Division
University of California, Berkeley
Berkeley, CA 94720-1776, U.S.A.
efros,leungt@

Abstract

A non-parametric method for texture synthesis is proposed. The texture synthesis process grows a new image outward from an initial seed, one pixel at a time. A Markov random field model is assumed, and the conditional distribution of a pixel given all its neighbors synthesized so far is estimated by querying the sample image and finding all similar neighborhoods. The degree of randomness is controlled by a single perceptually intuitive parameter. The method aims at preserving as much local structure as possible and produces good results for a wide variety of synthetic and real-world textures.

1. Introduction

Texture synthesis has been an active research topic in computer vision both as a way to verify texture analysis methods, as well as in its own right. Potential applications of a successful texture synthesis algorithm are broad, including occlusion fill-in, lossy image and video compression, foreground removal, etc.

The problem of texture synthesis can be formulated as follows: let us define texture as some visual pattern on an infinite 2-D plane which, at some scale, has a stationary distribution. Given a finite sample from some texture (an image), the goal is to synthesize other samples from the same texture. Without additional assumptions this problem is clearly ill-posed since a given texture sample could have been drawn from an infinite number of different textures. The usual assumption is that the sample is large enough that it somehow captures the stationarity of the texture and that the (approximate) scale of the texture elements (texels) is known.

Textures have been traditionally classified as either regular (consisting of repeated texels) or stochastic (without explicit texels). However, almost all real-world textures lie somewhere in between these two extremes and should be captured with a single model. In this paper we have chosen a statistical non-parametric model based on the assumption of spatial locality. The result is a very simple texture synthesis algorithm that works well on a wide range of textures and is especially well-suited for constrained synthesis problems (hole-filling).

1.1. Previous work

Most recent approaches have posed texture synthesis in a statistical setting as a problem of sampling from a probability distribution. Zhu et al. [12] model texture as a Markov Random Field and use Gibbs sampling for synthesis. Unfortunately, Gibbs sampling is notoriously slow and in fact it is not possible to assess when it has converged. Heeger and Bergen [6] try to coerce a random noise image into a texture sample by matching the filter response histograms at different spatial scales. While this technique works well on highly stochastic textures, the histograms are not powerful enough to represent more structured texture patterns such as bricks.

De Bonet [1] also uses a multi-resolution filter-based approach in which a texture patch at a finer scale is conditioned on its "parents" at the coarser scales. The algorithm works by taking the input texture sample and randomizing it in such a way as to preserve these inter-scale dependencies. This method can successfully synthesize a wide range of textures although the randomness parameter seems to exhibit perceptually correct behavior only on largely stochastic textures. Another drawback of this method is the way
texture images larger than the input are generated. The input texture sample is simply replicated to fill the desired dimensions before the synthesis process, implicitly assuming that all textures are tilable, which is clearly not correct.

The latest work in texture synthesis by Simoncelli and Portilla [9, 11] is based on first and second order properties of joint wavelet coefficients and provides impressive results. It can capture both stochastic and repeated textures quite well, but still fails to reproduce high frequency information on some highly structured patterns.

1.2. Our Approach

In his 1948 article, A Mathematical Theory of Communication [10], Claude Shannon mentioned an interesting way of producing English-sounding written text using n-grams. The idea is to model language as a generalized Markov chain: a set of n consecutive letters (or words) make up an n-gram and completely determine the probability distribution of the next letter (or word). Using a large sample of the language (e.g., a book) one can build probability tables for each n-gram. One can then repeatedly sample from this Markov chain to produce English-sounding text. This is the basis for an early computer program called MARK V. SHANEY, popularized by an article in Scientific American [4], and famous for such pearls as: "I spent an interesting evening recently with a grain of salt".

This paper relates to an earlier work by Popat and Picard [8] in trying to extend this idea to two dimensions. The three main challenges in this endeavor are: 1) how to define a unit of synthesis (a letter) and its context (n-gram) for texture, 2) how to construct a probability distribution, and 3) how to linearize the synthesis process in 2D.

Our algorithm "grows" texture, pixel by pixel, outwards from an initial seed. We chose a single pixel p as our unit of synthesis so that our model could capture as much high frequency information as possible. All previously synthesized pixels in a square window around p (weighted to emphasize local structure) are used as the context. To proceed with synthesis we need probability tables for the distribution of p, given all possible contexts. However, while for text these tables are (usually) of manageable size, in our texture setting constructing them explicitly is out of the question. An approximation can be obtained using various clustering techniques, but we choose not to construct a model at all. Instead, for each new context, the sample image is queried and the distribution of p is constructed as a histogram of all possible values that occurred in the sample image, as shown on Figure 1. The non-parametric sampling technique, although simple, is very powerful at capturing statistical processes for which a good model hasn't been found.

2. The Algorithm

In this work we model texture as a Markov Random Field (MRF). That is, we assume that the probability distribution of brightness values for a pixel given the brightness values of its spatial neighborhood is independent of the rest of the image. The neighborhood of a pixel is modeled as a square window around that pixel. The size of the window is a free parameter that specifies how stochastic the user believes this texture to be. More specifically, if the texture is presumed to be mainly regular at high spatial frequencies and mainly stochastic at low spatial frequencies, the size of the window should be on the scale of the biggest regular feature.

Figure 1: Algorithm overview: given a sample image (left), a new image is being synthesized one pixel at a time (right). To synthesize a pixel, the algorithm first finds all neighborhoods in the sample image (boxes on the
left) that are similar to the pixel's neighborhood (box on the right) and then randomly chooses one neighborhood and takes its center to be the newly synthesized pixel.

2.1. Synthesizing one pixel

Let I be an image that is being synthesized from a texture sample image I_smp, a subset of I_real, where I_real is the real infinite texture. Let p be a pixel in I and let w(p) be a square image patch of width w centered at p. Let d(w1, w2) denote some perceptual distance between two patches. Let us assume for the moment that all pixels in I except for p are known. To synthesize the value of p we first construct an approximation to the conditional probability distribution P(p | w(p)) and then sample from it.

Based on our MRF model we assume that p is independent of the rest of I given w(p). If we define a set Omega(p) containing all occurrences of w(p) in I_real, then the conditional pdf of p can be estimated with a histogram of all center pixel values in Omega(p). Unfortunately, we are only given I_smp, a finite sample from I_real, which means there might not be any matches for w(p) in I_smp. Thus we must use a heuristic which will let us find a plausible Omega'(p), approximating Omega(p), to sample from. In our implementation, a variation of the nearest neighbor technique is used: the closest match w_best = argmin_w d(w(p), w) in I_smp is found, and all image patches w with d(w(p), w) < (1 + eps) d(w(p), w_best) are included in Omega'(p), where eps = 0.1 for us. The center pixel values of patches in Omega'(p) give us a histogram for p, which can then be sampled, either uniformly or weighted by d.

Now it only remains to find a suitable distance d. One choice is a normalized sum of squared differences metric d_SSD. However, this metric gives the same weight to any mismatched pixel, whether near the center or at the edge of the window. Since we would like to preserve the local structure of the texture as much as possible, the error for nearby pixels should be greater than for pixels far away. To achieve this effect we set d = d_SSD * G, where G is a two-dimensional Gaussian kernel.

Figure 2: Results: given a sample image (left), the algorithm synthesized four new images with neighborhood windows of width 5, 11, 15, and 23 pixels respectively. Notice how perceptually intuitively the window size corresponds to the degree of randomness in the resulting textures. Input images are: (a) synthetic rings, (b) Brodatz texture D11, (c) brick wall.

2.2. Synthesizing texture

In the previous section we have discussed a method of synthesizing a pixel when its neighborhood pixels are already known. Unfortunately, this method cannot be used for synthesizing the entire texture or even for hole-filling (unless the hole is just one pixel) since for any pixel the values of only some of its neighborhood pixels will be known. The correct solution would be to consider the joint probability of all pixels together, but this is intractable for images of realistic size.

Instead, a Shannon-inspired heuristic is proposed, where the texture is grown in layers outward from a 3-by-3 seed taken randomly from the sample image (in case of hole-filling, the synthesis proceeds from the edges of the hole). Now for any point p to be synthesized only some of the pixel values in w(p) are known (i.e. have already been synthesized). Thus the pixel synthesis algorithm must be modified to handle unknown neighborhood pixel values. This can be easily done by only matching on the known values in w(p) and normalizing the error by the total number of known pixels when computing the conditional pdf for p. This heuristic does not guarantee that the pdf for p will stay valid as the rest of w(p) is filled in. However, it appears to be a good approximation in practice. One can also treat this as an initialization step for an iterative approach such as Gibbs sampling. However, our trials have shown that Gibbs sampling produced very little improvement for most textures. This lack of improvement indicates that the heuristic indeed provides a good approximation to the desired conditional pdf.
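The per-pixel step above can be summarized in a short sketch (a simplified NumPy illustration of the scheme just described, not the authors' code; it assumes a grayscale image, at least one known neighbor in the window, ignores image borders for brevity, and the Gaussian width is an assumed free parameter):

    import numpy as np

    def synthesize_pixel(sample, window, mask, eps=0.1, sigma=None):
        """Pick a value for the center pixel of `window` (w x w, some
        entries unknown) by sampling from similar neighborhoods in
        `sample`. `mask` is True where window pixels are already known;
        unknown entries of `window` may hold any value."""
        w = window.shape[0]
        half = w // 2
        if sigma is None:
            sigma = w / 4.0              # assumed Gaussian width
        ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
        gauss = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
        weights = gauss * mask           # zero weight on unknown pixels
        weights /= weights.sum()         # normalize by known support

        H, W = sample.shape
        dists, centers = [], []
        for y in range(half, H - half):
            for x in range(half, W - half):
                patch = sample[y - half:y + half + 1, x - half:x + half + 1]
                # Gaussian-weighted SSD over the known pixels only.
                dists.append(np.sum(weights * (patch - window) ** 2))
                centers.append(sample[y, x])
        dists = np.asarray(dists)
        # Keep all candidates within (1 + eps) of the best match...
        ok = dists <= (1.0 + eps) * dists.min()
        # ...and sample one of their centers uniformly.
        return np.random.choice(np.asarray(centers)[ok])

Growing a whole texture is then a matter of repeatedly calling this routine on the unfilled pixel with the most known neighbors, outward from the seed, as described above.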
3. Results

Our algorithm produces good results for a wide range of textures. The only parameter set by the user is the width w of the context window. This parameter appears to intuitively correspond to the human perception of randomness for most textures. As an example, the image with rings on Figure 2a has been synthesized several times while increasing w. In the first synthesized image the context window is not big enough to capture the structure of the ring so only the notion of curved segments is preserved. In the next image, the context captures the whole ring, but knows nothing of inter-ring distances, producing a Poisson process pattern. In the third image we see rings getting away from each other (so called Poisson process with repulsion), and finally in the last image the inter-ring structure is within the reach of the window as the pattern becomes almost purely structured.

Figure 3 shows synthesis examples done on real-world textures. Examples of constrained synthesis are shown on Figure 4.

Figure 3: Texture synthesis on real-world textures: (a) and (c) are original images, (b) and (d) are synthesized. (a) images D1, D3, D18, and D20 from Brodatz collection [2], (c) granite, bread, wood, and text (a homage to Shannon) images.

Figure 4: Examples of constrained texture synthesis. The synthesis process fills in the black regions.

The black regions in each image are filled in by sampling from that same image. A comparison with De Bonet [1] at varying randomness settings is shown on Figure 7 using texture 161 from his web site.

4. Limitations and Future Work

As with most texture synthesis procedures, only frontal-parallel textures are handled. However, it is possible to use Shape-from-Texture techniques [5, 7] to pre-warp an image into frontal-parallel position before synthesis and post-warp afterwards.

One problem of our algorithm is its tendency for some textures to occasionally "slip" into a wrong part of the search space and start growing garbage (Figure 5a), or get locked onto one place in the sample image and produce verbatim copies of the original (Figure 5b). These problems occur when the texture sample contains too many different types of texels (or the same texels but differently illuminated), making it hard to find close matches for the neighborhood context window. These problems can usually be eliminated by providing a bigger sample image. We have also used growing with limited backtracking as a solution.

Figure 5: Failure examples. Sometimes the growing algorithm "slips" into a wrong part of the search space and starts growing garbage (a), or gets stuck at a particular place in the sample image and starts verbatim copying (b).

In the future we plan to study automatic window-size selection, including non-square windows for elongated textures. We are also currently investigating the use of texels as opposed to pixels as the basic unit of synthesis (similar to moving from letters to words in Shannon's setting). This is akin to putting together a jigsaw puzzle where each piece has a different shape and only a few can fit together. Currently, the algorithm is quite slow but we are working on ways to make it more efficient.

5. Applications

Apart from letting us gain a better understanding of texture models, texture synthesis can also be used as a tool for solving several practical problems in computer vision, graphics, and image processing. Our method is particularly versatile because it does not place any constraints on the
Moreover, our method is designed to preserve local image structure, such as continuing straight lines, so there are no visual discontinuities between the original hole outline and the newly synthesized patch.

For example, capturing a 3D scene from several camera views will likely result in some regions being occluded from all cameras [3]. Instead of letting them appear as black holes in a reconstruction, a localized constrained texture synthesis can be performed to fill in the missing information from the surrounding region. As another example, consider the problem of boundary handling when performing a convolution on an image. Several methods exist, such as zero-fill, tiling, and reflection, but all of them may introduce discontinuities not present in the original image. In many cases, texture synthesis can be used to extrapolate the image by sampling from itself, as shown in Figure 6.

Figure 6. The texture synthesis algorithm is applied to a real image (left), extrapolating it using itself as a model, to result in a larger image (right) that, for this particular image, looks quite plausible. This technique can be used in convolutions to extend filter support at image boundaries.

Figure 7. Texture synthesized from a sample image with our method compared to [1] at decreasing degrees of randomness.

The constrained synthesis process can be further enhanced by using image segmentation to find the exact sampling region boundaries. A small patch of each region can then be stored together with the region boundaries as a lossy compression technique, with texture synthesis being used to restore each region separately. If a figure/ground segmentation is possible and the background is texture-like, then foreground removal can be done by synthesizing the background into the foreground segment.

Our algorithm can also easily be applied to motion synthesis, such as ocean waves, rolling clouds, or burning fire, by a trivial extension to 3D.

Acknowledgments: We would like to thank Alex Berg, Elizaveta Levina, and Yair Weiss for many helpful discussions and comments. This work has been supported by an NSF Graduate Fellowship to AE, a Berkeley Fellowship to TL, ONR MURI grant FDN00014-96-1-1200, and California MICRO grant 98-096.

References
[1] J. S. De Bonet. Multiresolution sampling procedure for analysis and synthesis of texture images. In SIGGRAPH '97, pages 361-368, 1997.
[2] P. Brodatz. Textures. Dover, New York, 1966.
[3] P. E. Debevec, C. J. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH '96, pages 11-20, August 1996.
[4] A. K. Dewdney. A potpourri of programmed prose and prosody. Scientific American, 122-TK, June 1989.
[5] J. Garding. Surface orientation and curvature from differential texture distortion. In ICCV, pages 733-739, 1995.
[6] D. J. Heeger and J. R. Bergen. Pyramid-based texture analysis/synthesis. In SIGGRAPH '95, pages 229-238, 1995.
[7] J. Malik and R. Rosenholtz. Computing local surface orientation and shape from texture for curved surfaces. International Journal of Computer Vision, 23(2):149-168, 1997.
[8] K. Popat and R. W. Picard. Novel cluster-based probability model for texture synthesis, classification, and compression. In Proc. SPIE Visual Comm. and Image Processing, 1993.
[9] J. Portilla and E. P. Simoncelli. Texture representation and synthesis using correlation of complex wavelet coefficient magnitudes. TR 54, CSIC, Madrid, April 1999.
[10] C. E. Shannon. A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 1948.
[11] E. P. Simoncelli and J. Portilla. Texture characterization via joint statistics of wavelet coefficient magnitudes. In Proc. 5th Int'l Conf. on Image Processing, Chicago, IL, 1998.
[12] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and maximum entropy (FRAME). International Journal of Computer Vision, 27(2):1-20, March/April 1998.
Research on Visual Inspection Algorithms for Defects in Textured Objects -- Outstanding Graduate Thesis
Abstract
In today's highly competitive industrial automation, machine vision plays a pivotal role in product quality control, and its application to defect inspection has become increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient, and safer. Textured objects are ubiquitous in industrial production: substrates used in semiconductor assembly and packaging, light-emitting diodes, printed circuit boards in modern electronic systems, and cloth and fabrics in the textile industry can all be regarded as objects with textured features. This thesis is devoted to defect-inspection techniques for textured objects, aiming to provide efficient and reliable inspection algorithms for their automated inspection.

Texture is an important feature for describing image content, and texture analysis has been successfully applied to texture segmentation and classification. This work proposes a defect-inspection algorithm based on texture analysis and reference comparison. The algorithm tolerates image-registration errors caused by object deformation and is robust to the influence of texture. It is designed to provide rich and physically meaningful descriptions of the detected defect regions, such as their size, shape, brightness contrast, and spatial distribution. Moreover, when a reference image is available, the algorithm can be used to inspect both homogeneously and non-homogeneously textured objects, and it also achieves good results on non-textured objects.

Throughout the inspection process we adopt steerable-pyramid texture analysis and reconstruction. Unlike traditional wavelet texture analysis, we add a tolerance-control algorithm in the wavelet domain to handle object deformation and texture influence, achieving robustness to both. Finally, steerable-pyramid reconstruction guarantees that the physical meaning of the defect regions is recovered accurately. In the experimental stage we inspected a series of images of practical value. The results show that the proposed defect-inspection algorithm for textured objects is efficient and easy to implement.

Keywords: defect detection; texture; object deformation; steerable pyramid; reconstruction
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
An image-inpainting survey by leading researchers abroad
Inpainting
Marcelo Bertalmío, Vicent Caselles, Simon Masnou, Guillermo Sapiro

Synonyms: Disocclusion; Completion; Filling-in; Error concealment
Related Concepts: Texture synthesis

Definition
Given an image and a region Ω inside it, the inpainting problem consists in modifying the image values of the pixels in Ω so that this region does not stand out with respect to its surroundings. The purpose of inpainting might be to restore damaged portions of an image (e.g., an old photograph where folds and scratches have left image gaps) or to remove unwanted elements present in the image (e.g., a microphone appearing in a film frame). See Figure 1. The region Ω is always given by the user, so the localization of Ω is not part of the inpainting problem. Almost all inpainting algorithms treat Ω as a hard constraint, whereas some methods allow some relaxing of the boundaries of Ω. This definition, given for a single-image problem, extends naturally to the multi-image case; this entry therefore covers both image and video inpainting. What is not considered in this text is surface inpainting (e.g., how to fill holes in 3D scans), although this problem has been addressed in the literature.

Fig. 1. The inpainting problem. Left: original image. Middle: inpainting mask Ω, in black. Right: an inpainting result. Figure taken from [20].

Background
The term inpainting comes from art restoration, where it is also called retouching. Medieval artwork started to be restored as early as the Renaissance, the motives being often as much to bring medieval pictures "up to date" as to fill in any gaps. The need to retouch the image in an unobtrusive way extended naturally from paintings to photography and film. The purposes remained the same: to revert deterioration (e.g., scratches and dust spots in film), or to add or remove elements (e.g., the infamous "airbrushing" of political enemies in Stalin-era U.S.S.R.). In the digital domain, the inpainting problem first appeared under the name "error concealment" in telecommunications, where the need was to fill in image blocks that had been lost during data transmission. One of the first works to address automatic inpainting in a general setting dubbed it "image disocclusion," since it treated the image gap as an occluding object that had to be removed, and the image underneath would be the restoration result. Popular terms used to denote inpainting algorithms are also "image completion" and "image fill-in".

Application
The extensive literature on digital image inpainting may be roughly grouped into three categories: patch-based, sparse, and PDE/variational methods.
From texture synthesis to patch-based inpainting
Efros and Leung [14] proposed a method that, although initially intended for texture synthesis, has proven most effective for the inpainting problem. The image gap is filled in recursively, inwards from the gap boundary: each "empty" pixel P at the boundary is filled with the value of the pixel Q (lying outside the image gap, i.e., Q is a pixel with valid information) such that the neighborhood Ψ(Q) of Q (a square patch centered at Q) is most similar to the (available) neighborhood Ψ(P) of P. Formally, this can be expressed as an optimization problem:

    Output(P) = Value(Q),  P ∈ Ω,  Q ∉ Ω,  Q = arg min d(Ψ(P), Ψ(Q)),   (1)

where d(Ψ(P), Ψ(Q)) is the Sum of Squared Differences (SSD) among the patches Ψ(P) and Ψ(Q) (considering only available pixels):

    d(Ψ1, Ψ2) = Σ_{i,j} |Ψ1(i, j) − Ψ2(i, j)|²,   (2)

and the indices i, j span the extent of the patches (e.g., if Ψ is an 11×11 patch then 0 ≤ i, j ≤ 10). Once P is filled in, the algorithm marches on to the next pixel at the boundary of the gap, never going back to P (whose value is, therefore, not altered again). See Figure 2 for an overview of the algorithm and Figure 3 for an example of the outputs it can achieve. The results are really impressive for a wide range of images. The main shortcomings of this algorithm are its computational cost, the selection of the neighborhood size (which in the original paper is a global user-selected parameter, but which should change locally depending on image content), the filling order (which may create disconnected boundaries for some objects), and the fact that it cannot deal well with image perspective (it was intended to synthesize frontal textures, hence neighborhoods are always compared at the same size and orientation). Also, results are poor if the image gap is very large and disperse (e.g., an image where 80% of the pixels have been lost due to random salt-and-pepper noise).

Fig. 2. Efros and Leung's algorithm overview (figure taken from [14]). Given a sample texture image (left), a new image is being synthesized one pixel at a time (right). To synthesize a pixel, the algorithm first finds all neighborhoods in the sample image (boxes on the left) that are similar to the pixel's neighborhood (box on the right) and then randomly chooses one neighborhood and takes its center to be the newly synthesized pixel.

Criminisi et al. [12] improved on this work in two aspects. Firstly, they changed the filling order from the original "onion-peel" fashion to a priority scheme where empty pixels at the edge of an image object have higher priority than empty pixels on flat regions. Thus, they are able to correctly inpaint straight object boundaries which could otherwise have ended up disconnected with the original formulation. See Figure 4. Secondly, they copy entire patches instead of single pixels, so this method is considerably faster. Several shortcomings remain, though, like the inability to deal with perspective and the need to manually select the neighborhood size (here there are two sizes to set, one for the patch to compare with and another for the patch to copy from). Also, objects with curved boundaries may not be inpainted correctly.

Ashikhmin [2] contributed as well to improving on the original method of Efros and Leung [14]. With the idea of reducing the computational cost of the procedure, he proposed to look for the best candidate Q, whose value is copied to the empty pixel P, not by searching the whole image but only among the candidates generated by the neighbors of P which have already been inpainted. See Figure 5. The speed-up achieved with this simple technique is considerable, and there is also a very positive effect regarding the visual quality of the output.
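The boundary-inwards fill of equations (1) and (2) can be sketched in a few dozen lines of Python. The sketch below assumes a grayscale image and a boolean mask (True inside Ω); it is deliberately brute-force (full search over all valid source patches per pixel), and the helper names are hypothetical.

```python
import numpy as np

def inpaint(image, omega, w=11):
    """Greedy boundary-inwards fill in the spirit of equations (1)-(2).
    image: 2-D float array; omega: bool array, True on the gap; w: odd patch width."""
    img = image.astype(float).copy()
    gap = omega.copy()
    r = w // 2
    H, W = img.shape
    # candidate source patch centers Q: patches fully outside the gap
    src = [(i, j) for i in range(r, H - r) for j in range(r, W - r)
           if not omega[i - r:i + r + 1, j - r:j + r + 1].any()]
    while gap.any():
        filled_any = False
        for y, x in zip(*np.nonzero(gap)):
            if y < r or x < r or y >= H - r or x >= W - r:
                gap[y, x] = False              # crude handling of the image border
                filled_any = True
                continue
            known = ~gap[y - r:y + r + 1, x - r:x + r + 1]
            if not known.any():
                continue                       # deep inside the gap: fill on a later pass
            psi_p = img[y - r:y + r + 1, x - r:x + r + 1]
            best_d, best_v = np.inf, None
            for i, j in src:                   # equation (1): arg min over Q
                psi_q = img[i - r:i + r + 1, j - r:j + r + 1]
                d = np.sum(((psi_q - psi_p) ** 2)[known])  # equation (2), available pixels only
                if d < best_d:
                    best_d, best_v = d, psi_q[r, r]
            img[y, x] = best_v                 # Output(P) = Value(Q); P is never revisited
            gap[y, x] = False
            filled_any = True
        if not filled_any:
            break                              # nothing fillable (e.g., no valid sources)
    return img
```

The Criminisi et al. variant would replace the scan order of the while loop with a priority queue driven by confidence and edge-strength terms, and copy whole patches rather than single pixels.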
Other methods reduce the search space and computational cost involved in the candidate patch search by organizing image patches in tree structures, reducing the dimensionality of the patches with techniques like Principal Component Analysis (PCA), or using randomized approaches.

Fig. 3. Left: original image, inpainting mask Ω in black. Right: inpainting result obtained with Efros and Leung's algorithm; images taken from their paper [14].

While most image inpainting methods attempt to be fully automatic (aside from the manual setting of some parameters), there are user-assisted methods that provide remarkable results with just a little input from the user. In the work by Sun et al. [27] the user must specify curves in the unknown region corresponding to relevant object boundaries. Patch synthesis is performed along these curves inside the image gap, by copying from patches that lie on the segments of these curves which are outside the gap, in the "known" region. Once these curves are completed, in a process which the authors call structure propagation, the remaining empty pixels are inpainted using a technique like the one by Ashikhmin [2] with priorities as in Criminisi et al. [12]. Barnes et al. [5] accelerate this method and make it interactive, by employing randomized searches and combining into one step the structure propagation and texture synthesis processes of Sun et al. [27].

The role of sparsity
After the introduction of patch-based methods for texture synthesis by Efros and Leung [14], and image inpainting by Criminisi et al. [12], it became clear that the patches of an image provide a good dictionary to express other parts of the image. This idea has been successfully applied to other areas of image processing, e.g., denoising and segmentation.

More general sparse image representations using dictionaries have proven their efficiency in the context of inpainting. For instance, using overcomplete dictionaries adapted to the representation of image geometry and texture, Elad et al. [15] proposed an image decomposition model with sparse coefficients for the geometry and texture components of the image, and showed that the model can be easily adapted for image inpainting. A further description of this model follows.

Fig. 4. Left: original image. Right: inpainting result obtained with the algorithm of Criminisi et al. [12]; images taken from their paper.

Let u be an image represented as a vector in R^N. Let the matrices D_g, D_t of sizes N × k_g and N × k_t represent two dictionaries adapted to geometry and texture, respectively. If α_g ∈ R^{k_g} and α_t ∈ R^{k_t} represent the geometry and texture coefficients, then u = D_g α_g + D_t α_t represents the image decomposition using the dictionaries collected in D_g and D_t. A sparse image representation is obtained by minimizing

    min_{(α_g, α_t): u = D_g α_g + D_t α_t} ||α_g||_p + ||α_t||_p,   (3)

where p = 0, 1. Although the case p = 0 represents the sparseness measure (i.e., the number of non-zero coordinates), it leads to a non-convex optimization problem whose minimization is more complex. The case p = 1 yields a convex and tractable optimization problem that also leads to sparseness. Introducing the constraint by penalization (thus, in practice, relaxing it) and regularizing the geometric part of the decomposition with a total variation semi-norm penalization, Elad et al. [15] propose the variational model:

    min_{(α_g, α_t)} ||α_g||_1 + ||α_t||_1 + λ ||u − D_g α_g − D_t α_t||²_2 + γ TV(D_g α_g),   (4)

where TV denotes the total variation and λ, γ > 0. This model can be easily adapted to a model for image inpainting. Observe that u − D_g α_g − D_t α_t can be interpreted as the noise component of the image, and λ is a penalization parameter that depends inversely on the noise power.
Fig. 5. Ashikhmin's texture synthesis method (figure taken from [2]). Each pixel in the current L-shaped neighborhood generates a shifted candidate pixel (black) according to its original position (hatched) in the input texture. The best pixel is chosen among these candidates only. Several different pixels in the current neighborhood can generate the same candidate.

The inpainting mask can then be interpreted as a region where the noise is very large (infinite). Thus, if M = 0 and M = 1 identify the inpainting mask and the known part of the image, respectively, then the extension of (4) to inpainting can be written as

    min_{(α_g, α_t)} ||α_g||_1 + ||α_t||_1 + λ ||M (u − D_g α_g − D_t α_t)||²_2 + γ TV(D_g α_g).   (5)

Writing the energy in (5) using u_g := D_g α_g, u_t := D_t α_t as unknown variables, it can be observed that α_g = D⁺_g u_g + r_g, α_t = D⁺_t u_t + r_t, where D⁺_g, D⁺_t denote the corresponding pseudoinverse matrices and r_g, r_t are in the null spaces of D_g and D_t, respectively. Assuming for simplicity, as in Elad et al. [15], that r_g = 0, r_t = 0, the model (5) can be written as

    min_{(u_g, u_t)} ||D⁺_g u_g||_1 + ||D⁺_t u_t||_1 + λ ||M (u − u_g − u_t)||²_2 + γ TV(u_g).   (6)

This simplified model is justified in Elad et al. [15] for several reasons: it is an upper bound for (5), it is easier to solve, it provides good results, it has a Bayesian interpretation, and it is equivalent to (5) if D_g and D_t are non-singular, or when using the ℓ2 norm in place of the ℓ1 norm. The model has nice features, since it permits the use of dictionaries adapted to geometry and texture, treats inpainting as missing samples, and includes the sparsity model through ℓ1 norms that are easy to solve.

This framework has been adapted to the use of dictionaries of patches and has been extended in several directions, like image denoising and filling-in of missing pixels (Aharon et al. [1]), color image denoising, demosaicing, and inpainting of small holes (Mairal et al. [21]), and further extended to deal with multiscale dictionaries and to cover the case of video sequences in Mairal et al. [22]. To give a brief review of this model some notation is required. Image patches are squares of size n = √n × √n. Let D be a dictionary of patches represented by a matrix of size n × k, where the elements of the dictionary are the columns of D. If α ∈ R^k is a vector of coefficients, then Dα represents the patch obtained by a linear combination of the columns of D. Given an image v(i, j), i, j ∈ {1, ..., N}, the purpose is to find a dictionary D̂, an image û, and coefficients α̂ = {α̂_{i,j} ∈ R^k : i, j ∈ {1, ..., N}} which minimize the energy

    min_{(α, D, u)} λ ||v − u||² + Σ_{i,j=1}^{N} μ_{i,j} ||α_{i,j}||_0 + Σ_{i,j=1}^{N} ||D α_{i,j} − R_{i,j} u||²,   (7)

where R_{i,j} u denotes the patch of u centered at (i, j) (dismissing boundary effects), and μ_{i,j} are positive weights. The solution of the non-convex problem (7) is obtained using an alternating minimization: a sparse-coding step where one computes α_{i,j} knowing the dictionary D for all i, j; a dictionary update using a sequence of rank-one approximation problems to update each column of D (Aharon, Elad, and Bruckstein [1]); and a final reconstruction step given by the solution of

    min_u λ ||v − u||² + Σ_{i,j=1}^{N} ||D̂ α̂_{i,j} − R_{i,j} u||².   (8)

Again, the inpainting problem can be considered as a case of non-homogeneous noise. Defining for each pixel (i, j) a coefficient β_{i,j} inversely proportional to the noise variance, a value of β_{i,j} = 0 may be taken for each pixel in the inpainting mask. Then the inpainting problem can be formulated as

    min_{(α, D, u)} λ ||β ⊗ (v − u)||² + Σ_{i,j=1}^{N} μ_{i,j} ||α_{i,j}||_0 + Σ_{i,j=1}^{N} ||(R_{i,j} β) ⊗ (D α_{i,j} − R_{i,j} u)||²,   (9)

where β = (β_{i,j})_{i,j=1}^{N}, and ⊗ denotes the elementwise multiplication between two vectors. With suitable adaptations, this model has been applied to inpainting (of relatively small holes), to interpolation from sparse irregular samples and super-resolution, to image denoising, demosaicing of color images, and video denoising and inpainting, obtaining excellent results; see Mairal et al. [22].
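To illustrate how the β weights turn inpainting into masked sparse coding, here is a toy per-patch solver: a greedy orthogonal matching pursuit run only on the rows of D that correspond to known pixels (β = 1), with the missing pixels (β = 0) reconstructed from the recovered code. The random dictionary is purely for demonstration; in the cited work it would be learned (e.g., by K-SVD) or adapted to geometry and texture.

```python
import numpy as np

def omp_masked(D, y, mask, n_atoms=5):
    """Greedy sparse code of patch y over dictionary D (columns = atoms),
    fitting only the rows where mask is True (beta = 1; beta = 0 on the hole)."""
    Dm, ym = D[mask], y[mask]
    residual, support = ym.copy(), []
    for _ in range(n_atoms):
        corr = np.abs(Dm.T @ residual)
        corr[support] = -np.inf              # do not pick an atom twice
        support.append(int(np.argmax(corr)))
        A = Dm[:, support]
        coef, *_ = np.linalg.lstsq(A, ym, rcond=None)
        residual = ym - A @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(0)
n, k = 64, 256                               # 8x8 patches, 256 atoms
D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
patch = rng.standard_normal(n)               # stand-in for a patch R_ij(v)
mask = rng.random(n) > 0.3                   # roughly 30% of pixels missing
alpha = omp_masked(D, patch, mask)
restored = D @ alpha                         # estimate of the complete patch
patch_filled = np.where(mask, patch, restored)
```

Averaging such per-patch reconstructions over all overlapping patches corresponds to the final step (8) of the alternating scheme.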
PDEs and variational approaches
All the methods mentioned so far are based on the same principle: a missing/corrupted part of an image can be well synthesized by suitably sampling and copying uncorrupted patches (taken either from the image itself or built from a dictionary). A very different point of view underlies many contributions involving either a variational principle, through a minimization process, or a (not necessarily variational) partial differential equation (PDE).

An early interpolation method that applies to inpainting is due to Ogden, Adelson, Bergen, and Burt [24]. Starting from an initial image, a Gaussian filtering is built by iterated convolution and subsampling. Then a given inpainting domain can be filled in by successive linear interpolations, downsampling, and upsampling at different levels of the Gaussian pyramid. The efficiency of such an approach is illustrated in Figure 6.

Fig. 6. An inpainting experiment taken from Ogden et al. [24]. The method uses a Gaussian pyramid and a series of linear interpolations, downsampling, and upsampling.

Masnou and Morel proposed in [23] to interpolate a gray-valued image by extending its isophotes (the lines of constant intensity) in the inpainting domain. This approach is very much in the spirit of early works by Kanizsa, Ullman, Horn, Mumford, and Nitzberg to model the ability of the visual system to complete edges in an occlusion or visual-illusion context. This is illustrated in Figure 7. The general completion process involves complicated phenomena that cannot be easily and univocally modeled. However, experimental results show that, in simple occlusion situations, it is reasonable to argue that the brain extrapolates broken edges using elastica-type curves, i.e., curves that join two given points with prescribed tangents at these points, a total length lower than a given L, and minimal Euler elastica energy ∫ |κ(s)|² ds, with s the curve arc-length and κ the curvature.

The model by Masnou and Morel [23] generalizes this principle to the isophotes of a gray-valued image. More precisely, denoting Ω̃ a domain slightly larger than Ω, it is proposed in [23] to extrapolate the isophotes of an image u, known outside Ω and valued in [m, M], by a collection of curves {γ_t}_{t ∈ [m, M]} with no mutual crossings, that coincide with the isophotes of u on Ω̃ \ Ω and that minimize the energy

    ∫_m^M ∫_{γ_t} (α + β |κ_{γ_t}|^p) ds dt.   (10)

Here α, β are two context-dependent parameters. This energy penalizes a generalized Euler elastica energy, with curvature to the power p > 1 instead of 2, of all extrapolation curves γ_t, t ∈ [m, M].

Fig. 7. Amodal completion: the visual system automatically completes the broken edge in the left figure. The middle figure illustrates that, here, no global symmetry process is involved: in both figures, the same edge is synthesized. In such a simple situation, the interpolated curve can be modeled as a Euler elastica, i.e., a curve with clamped points and tangents at its extremities, and with minimal oscillations.

An inpainting algorithm based on the minimization of (10) in the case p = 1 is proposed by Masnou and Morel in [23]. A globally minimal solution is computed using a dynamic programming approach that reduces the algorithmic complexity.
The algorithm handles only simply connected domains, i.e., those with no holes. In order to deal with color images, RGB images are turned into a luma/chrominance representation, e.g., YCrCb or Lab, and each channel is processed independently. The reconstruction process is illustrated in Figure 8.

The word inpainting, in the image processing context, was coined first by Bertalmío, Sapiro, Caselles, and Ballester in [7], where a PDE model is proposed in the very spirit of real painting restoration. More precisely, being u a gray-valued image to be inpainted in Ω, a time-stepping method for the transport-like equation

    u_t = ∇⊥u · ∇Δu   in Ω,   (11)
    u given in Ω^c,

is combined with anisotropic diffusion steps that are interleaved for stabilization, using the following diffusion model

    u_t = φ(x) |∇u| ∇ · (∇u / |∇u|),   (12)

where φ is a smooth cut-off function that forces the equation to act only in Ω, and ∇ · (∇u/|∇u|) is the curvature along isophotes. This diffusion equation, which has been widely used for denoising an image while preserving its edges, compensates for any shock possibly created by the transport-like equation.

Fig. 8. 8(a) is the original image and 8(b) the image with occlusions in white. The luminance channel is shown in Figure 8(c). A few isophotes are drawn in Figure 8(d) and their reconstruction by the algorithm of Masnou and Morel [23] is given in Figure 8(e). Applying the same method to the luminance, hue, and saturation channels yields the final result of Figure 8(f).

What is the meaning of Equation (11)? Following Bertalmío et al. [7], Δu is a measure of image smoothness, and stationary points for the equation are images for which Δu is constant along the isophotes induced by the vector field ∇⊥u. Equation (11) is not explicitly a transport equation for Δu, but, in the equivalent form,

    u_t = −∇⊥Δu · ∇u,   (13)

it is a transport equation for u being convected by the field ∇⊥Δu. Following Bornemann and März [9], this field is in the direction of the level lines of Δu, which are related to the Marr-Hildreth edges. Indeed, the zero crossings of (a convolved version of) Δu are the classical characterization of edges in the celebrated model of Marr and Hildreth. In other words, as in real painting restoration, the approach of Bertalmío et al. [7] consists in conveying the image intensities along the direction of the edges, from the boundary of the inpainting domain Ω towards the interior. The efficiency of such an approach is illustrated in Figure 9. From a numerical viewpoint, the transport equation and the anisotropic diffusion can be implemented with classical finite difference schemes. For color images, the coupled system can be applied independently to each channel of any classical luma/chrominance representation. There is no restriction on the topology of the inpainting domain.

Fig. 9. An experiment taken from Bertalmío et al. [7]. Left: original image. Middle: a user-defined mask. Right: the result with the algorithm of [7].

Another perspective on this model is provided by Bertalmío, Bertozzi, and Sapiro in [6], where connections with the classical Navier-Stokes equation of fluid dynamics are shown. Indeed, the steady-state equation of Bertalmío et al. [7], ∇⊥u · ∇Δu = 0, is exactly the equation satisfied by steady-state inviscid flows in the two-dimensional incompressible Navier-Stokes model. Although the anisotropic diffusion equation (12) is not the exact counterpart of the viscous diffusion term used in the Navier-Stokes model for incompressible and Newtonian flows, a lot of the numerical knowledge of fluid mechanics seems to be adaptable to the design of stable and efficient schemes for inpainting. Results in this direction are shown in Bertalmío, Bertozzi, and Sapiro [6].
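A bare-bones finite-difference sketch of one explicit iteration of the transport form (13) on a grayscale array is given below. It uses central differences and a small time step, and it omits the interleaved anisotropic diffusion (12) as well as the upwinding and limiting a robust implementation would need; it is an illustration, not the authors' scheme.

```python
import numpy as np

def inpaint_step(u, omega, dt=0.1):
    """One explicit step of equation (13), u_t = -grad_perp(Lap u) . grad(u),
    applied only inside the inpainting mask omega (True on the hole)."""
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)  # 5-point Laplacian
    ly, lx = np.gradient(lap)     # np.gradient returns (d/dy, d/dx)
    uy, ux = np.gradient(u)
    ut = ly * ux - lx * uy        # = -grad_perp(lap) . grad(u), with grad_perp v = (-v_y, v_x)
    return np.where(omega, u + dt * ut, u)   # boundary values in Omega^c stay fixed
```

In practice a few such transport steps are alternated with diffusion steps of (12), iterating until the image inside Ω reaches an approximate steady state.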
Chan and Shen propose in [11] a denoising/inpainting first-order model based on the joint minimization of a quadratic fidelity term outside Ω and a total variation criterion in Ω, i.e., the joint energy

    ∫_A |∇u| dx + λ ∫_{A\Ω} |u − u_0|² dx,

with A ⊃⊃ Ω the image domain and λ a Lagrange multiplier. The existence of solutions to this problem follows easily from the properties of functions of bounded variation. As for the implementation, Chan and Shen look for critical points of the energy using a Gauss-Jacobi iteration scheme for the linear system associated with an approximation of the Euler-Lagrange equation by finite differences. More recent approaches to the minimization of total variation with subpixel accuracy should nowadays be preferred. From the phenomenological point of view, the model of Chan and Shen [11] yields inpainting candidates with the smallest possible isophotes. It is therefore more suitable for thin or sparse domains. An illustration of the model's performance is given in Figure 10.

Fig. 10. An experiment taken from Chan and Shen [11]. Left: original image. Right: after denoising and removal of text.
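Before moving on to curvature-penalizing models, here is a minimal gradient-descent sketch of the Chan-Shen energy, assuming a grayscale image u0 and a boolean mask (True in Ω). It uses an ε-regularized TV gradient (the curvature term) rather than the Gauss-Jacobi scheme of the paper, so it should be read as a schematic of the energy, not their solver.

```python
import numpy as np

def tv_inpaint(u0, omega, lam=50.0, dt=0.01, eps=1e-3, n_iter=2000):
    """Descend  integral |grad u| dx + lam * integral_{outside Omega} (u - u0)^2 dx."""
    u = u0.astype(float).copy()
    for _ in range(n_iter):
        uy, ux = np.gradient(u)
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps ** 2)   # smoothed |grad u|
        py, px = uy / mag, ux / mag
        div = np.gradient(py, axis=0) + np.gradient(px, axis=1)  # curvature div(grad u/|grad u|)
        fidelity = np.where(omega, 0.0, 2.0 * lam * (u - u0))    # fidelity only outside the hole
        u += dt * (div - fidelity)
    return u
```

Because the descent shortens isophotes, the steady state connects level lines across Ω by the straightest possible paths, which matches the remark that the model suits thin or sparse domains.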
Turning back to the criterion (10), a similar penalization on Ω̃ of both the length and the curvature of all isophotes of an image u yields two equivalent forms, in the case where u is smooth enough (see Masnou and Morel [23]):

    ∫_{−∞}^{+∞} ∫_{{u=t} ∩ Ω̃} (α + β|κ|^p) ds dt = ∫_{Ω̃} |∇u| (α + β |∇ · (∇u/|∇u|)|^p) dx.   (14)

There have been various contributions to the numerical approximation of critical points for this criterion. A fourth-order time-stepping method is proposed by Chan, Kang, and Shen in [10], based on the approximation of the Euler-Lagrange equation, for the case p = 2, using upwind finite differences and a min-mod formula for estimating the curvature. Such a high-order evolution method suffers from well-known stability and convergence issues that are difficult to handle.

A model slightly different from (14) is tackled by Ballester, Bertalmío, Caselles, Sapiro, and Verdera in [4] using a relaxation approach. The key idea is to replace the second-order term ∇ · (∇u/|∇u|) with a first-order term depending on an auxiliary variable. More precisely, Ballester et al. study in [4] the minimization of

    ∫_{Ω̃} |∇ · θ|^p (a + b |∇(k ∗ u)|) dx + α ∫_{Ω̃} (|∇u| − θ · ∇u) dx,

under the constraint that θ is a vector field with subunit modulus and prescribed normal component on the boundary of Ω̃, and u takes values in the same range as in Ω^c. Clearly, θ plays the role of ∇u/|∇u|, but the new criterion is much less singular. As for k, it is a regularizing kernel introduced for technical reasons in order to ensure the existence of a minimizing couple (u, θ). The main difference between the new relaxed criterion and (14), besides singularity, is the term ∫_{Ω̃} |∇ · θ|^p, which is more restrictive, despite the relaxation, than ∫_{Ω̃} |∇u| |∇ · (∇u/|∇u|)|^p dx. However, the new model has a nice property: a gradient descent with respect to (u, θ) can be easily computed and yields two coupled second-order equations whose numerical approximation is standard. Results obtained with this model are shown in Figure 11.

Fig. 11. Two inpainting results obtained with the model proposed by Ballester et al. [4]. Observe in particular how curved edges are restored.

The Mumford-Shah-Euler model by Esedoglu and Shen [17] is also variational. It combines the celebrated Mumford-Shah segmentation model for images and Euler's elastica model for curves: denoting u a piecewise weakly smooth function, that is, a function with integrable squared gradient out of a discontinuity set K ⊂ Ω̃, the proposed criterion reads

    ∫_{Ω̃\K} |∇u|² dx + ∫_K (α + βκ²) ds.

Two numerical approaches to the minimization of such a criterion are discussed in Esedoglu and Shen [17]: first, a level-set approach based on the representation of K as the zero-level set of a sequence of smooth functions that concentrate, together with the explicit derivation, using finite differences, of the Euler-Lagrange equations associated with the criterion; the second method addressed by Esedoglu and Shen is a Γ-convergence approach based on a result originally conjectured by De Giorgi and recently proved by Schätzle. In both cases, the final system of discrete equations is of order four, facing again difficult issues of convergence and stability. More recently, following the work of Grzibovskis and Heintz on the Willmore flow, Esedoglu, Ruuth, and Tsai [16] have addressed the numerical flow associated with the Mumford-Shah-Euler model using a promising convolution/thresholding method that is much easier to handle than the previous approaches.

Tschumperlé proposes in [28] an efficient second-order anisotropic diffusion model for multi-valued image regularization and inpainting. Given an R^N-valued image u known outside Ω, and starting from an initial rough inpainting obtained by straightforward advection of boundary values, the pixels in the inpainting domain are iteratively updated according to a finite-difference approximation to the equations

    ∂u_i/∂t = trace(T ∇²u_i),   i ∈ {1, ..., N}.

Here, T is the tensor field defined as

    T = (1 + λ_min + λ_max)^{−α1} v_min ⊗ v_min + (1 + λ_min + λ_max)^{−α2} v_max ⊗ v_max,

with 0 < α1 << α2, and λ_min, λ_max, v_min, v_max are the eigenvalues and eigenvectors, respectively, of G_σ ∗ Σ_{i=1}^{N} ∇u_i ⊗ ∇u_i, G_σ being a smoothing kernel and Σ_{i=1}^{N} ∇u_i ⊗ ∇u_i the classical structure tensor, which is known to represent the local geometry of u well. Figure 12 reproduces an experiment taken from Tschumperlé [28].

Fig. 12. An inpainting experiment (the middle image is the mask defined by the user) taken from Tschumperlé [28].
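The sketch below shows one update of this trace-based diffusion for a single-channel image (N = 1), using scipy's Gaussian filter for G_σ; the parameter values and the Hessian via repeated np.gradient calls are illustrative assumptions rather than the scheme of [28].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tschumperle_step(u, omega, dt=0.05, a1=0.5, a2=1.2, sigma=1.5):
    """u_t = trace(T * Hessian(u)) inside the mask, with T from the structure tensor."""
    uy, ux = np.gradient(u)
    # smoothed structure tensor G_sigma * (grad u  grad u^T)
    jxx = gaussian_filter(ux * ux, sigma)
    jxy = gaussian_filter(ux * uy, sigma)
    jyy = gaussian_filter(uy * uy, sigma)
    # closed-form eigen-decomposition of the 2x2 symmetric tensor per pixel
    tr, det = jxx + jyy, jxx * jyy - jxy ** 2
    disc = np.sqrt(np.maximum(tr ** 2 / 4.0 - det, 0.0))
    lmax, lmin = tr / 2.0 + disc, tr / 2.0 - disc
    vx, vy = jxy, lmax - jxx                 # eigenvector for lmax (vmax)
    norm = np.sqrt(vx ** 2 + vy ** 2) + 1e-12
    vx, vy = vx / norm, vy / norm            # vmin is its 90-degree rotation (-vy, vx)
    c1 = (1.0 + lmin + lmax) ** (-a1)        # strong smoothing along vmin (isophotes)
    c2 = (1.0 + lmin + lmax) ** (-a2)        # weak smoothing along vmax (across edges)
    txx = c1 * vy * vy + c2 * vx * vx        # T = c1 vmin vmin^T + c2 vmax vmax^T
    txy = (c2 - c1) * vx * vy
    tyy = c1 * vx * vx + c2 * vy * vy
    uyy, uyx = np.gradient(uy)               # Hessian of u
    uxy, uxx = np.gradient(ux)
    ut = txx * uxx + txy * (uxy + uyx) + tyy * uyy   # trace(T H)
    return np.where(omega, u + dt * ut, u)
```

Iterating this step diffuses strongly along the local isophote direction and weakly across edges, which is what lets the model close contours inside the hole.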
The approach of Auroux and Masmoudi in [3] uses the PDE techniques that have been developed for the inverse conductivity problem in the context of crack detection. The link with inpainting is the following: missing edges are modeled as cracks, and the image is assumed to be smooth out of these cracks. Given a crack, two inpainting candidates can be obtained as the solutions of the Laplace equation with a Neumann condition along the crack and either a Dirichlet or a Neumann condition on the domain's boundary. The optimal cracks are those for which the two candidates are the most similar in quadratic norm, and they can be found through topological analysis, i.e., they correspond to the set of points where putting a crack most decreases the quadratic difference. Both the localization of the cracks and the associated piecewise smooth inpainting solutions can be found using fast and simple finite difference schemes.

Finally, Bornemann and März propose in [9] a first-order model to advect the image information along the integral curves of a coherence vector field that extends into Ω the dominant directions of the image gradient. This coherence field is explicitly defined, at every point, as the normalized eigenvector associated with the minimal eigenvalue of a smoothed structure tensor whose computation carefully avoids boundary biases in the vicinity of ∂Ω. Denoting c the coherence field, Bornemann and März show that the equation c · ∇u = 0 with Dirichlet boundary constraint can be obtained as the vanishing-viscosity limit of an efficient fast-marching scheme: the pixels in Ω are synthesized one at a time, according to their distance to the boundary. The new value at a pixel p is a linear combination of both known and previously generated values in a neighborhood of p. The key ingredient of the method is the explicit definition of the linear weights according to the coherence field c. Although the Bornemann-März model requires a careful tuning of four parameters, it is much faster than the PDE approaches mentioned so far, and it performs very well, as illustrated in Figure 13.

Fig. 13. An inpainting experiment taken from Bornemann and März [9], with a reported computation time of 0.4 sec.

Combining and extending PDEs and patch models
In general, most PDE/variational methods presented so far perform well for inpainting either thin or sparsely distributed domains. However, there is a common drawback to all these methods: they are unable to restore texture properly, and this is particularly visible on large inpainting domains, as for instance in the inpainting result of Figure 12, where the diffusion method is not able to recover the parrot's texture. On the other hand, patch-based methods are not able to handle sparse inpainting domains like the one in Figure 14, where no valid square patch can be found that does not reduce to a point. On the contrary, most PDE/variational methods remain applicable in such a situation, as in Figure 14, where the model proposed by Masnou and Morel [23] yields the result shown there.
Principles of Speech Synthesis
Speech synthesis is a technology that converts text into speech.
Its principle is text-to-speech (TTS) conversion: the input text is transformed into an audio output, so that the computer can produce simulated speech.
The main technical steps involved in speech synthesis are as follows (a toy end-to-end sketch follows the list):
1. Text analysis: the system first analyzes the input text, parsing its sentences and words and interpreting their semantics and syntax. This step helps the system understand the input accurately and lays the groundwork for the subsequent audio synthesis.
2. Phoneme conversion: a phoneme is the smallest unit of pronunciation in speech. Each word in the text is converted into its corresponding phoneme sequence. Converting text into phonemes improves the accuracy and naturalness of the synthesized speech. Phoneme conversion is usually based on a speech database or a statistical model.
3. Tone and intonation processing: during synthesis, tone and intonation play an important role in conveying meaning and emotion. The system assigns an appropriate tone and intonation pattern to each phoneme in the text, making the synthesized speech more lively and natural.
4. Audio synthesis: based on the text and phoneme information, the system converts them into a speech waveform. Audio synthesis can use several methods, including concatenative synthesis, rule-based synthesis, and statistical parametric synthesis. These methods differ in accuracy, naturalness, and flexibility.
5. Post-processing: the synthesized waveform may undergo post-processing to improve the result. These post-processing techniques can remove noise, adjust volume, and enhance the clarity and naturalness of the speech.
Finally, the synthesized speech is delivered to the user, allowing the computer to interact by simulating human speech and creating a natural, fluent conversational experience.
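As a toy illustration of steps 2 and 4, the sketch below maps words to phonemes with a hypothetical mini-lexicon and "renders" each phoneme as a fixed-pitch tone before concatenating with a short crossfade. The LEXICON and PITCH tables are invented stand-ins; a real system would use a full pronunciation dictionary and recorded units or a statistical acoustic model.

```python
import numpy as np

SR = 16000                                   # sample rate in Hz

# hypothetical mini-lexicon: word -> phoneme sequence (step 2)
LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
# stand-in acoustic inventory: phoneme -> tone frequency in Hz
PITCH = {"HH": 180, "AH": 220, "L": 260, "OW": 300, "W": 340, "ER": 380, "D": 420}

def phoneme_wave(ph, dur=0.12):
    """Render one phoneme as a sine tone (a placeholder for a recorded unit)."""
    t = np.arange(int(SR * dur)) / SR
    return 0.3 * np.sin(2 * np.pi * PITCH[ph] * t)

def synthesize(text, fade=0.01):
    """Text analysis (step 1) reduced to lowercasing and splitting;
    concatenative waveform generation (step 4) with linear crossfades."""
    phones = [p for w in text.lower().split() for p in LEXICON.get(w, [])]
    n_fade = int(SR * fade)
    ramp = np.linspace(0.0, 1.0, n_fade)
    out = np.zeros(0)
    for ph in phones:
        unit = phoneme_wave(ph)
        if out.size >= n_fade:               # overlap-add the crossfade region
            out[-n_fade:] = out[-n_fade:] * (1 - ramp) + unit[:n_fade] * ramp
            out = np.concatenate([out, unit[n_fade:]])
        else:
            out = np.concatenate([out, unit])
    return out

waveform = synthesize("hello world")         # ndarray ready to be written as a WAV file
```

The crossfade stands in for the unit-selection smoothing of step 5; intonation (step 3) would modulate the per-unit pitch contour instead of using a fixed tone.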
Advanced Soft Shadow Mapping Techniques: Louis Bavoil, NVIDIA Developer Technology notes
Using Bilinear PCF with DX10
! CSMs and ESMs also have this limitation
! Shadows look bad when blurring a shadow map into which not everything has been rendered
VSM Light Bleeding
! Example: two quads floating above a ground plane
! VSMs approximate the depth values in the filter kernel by a Gaussian distribution of mean μ and variance σ²

PCF Self-Shadowing Solutions
! Use a depth gradient = float2(dz/du, dz/dv) [Schuler06] [Isidoro06]
! Make the comparison depth d follow the receiver's tangent plane: d = d0 + dot(uv_offset, gradient)
! Without this, false occlusion (z < d, with z the stored shadow-map depth at the tap and d the comparison depth) appears at points P on sloped receivers such as a ground plane, and the depth bias would have to increase with the filter radius
! Alternative: render midpoints into the shadow map, with midpoint z = (z0 + z1) / 2
! Requires two rasterization passes
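A CPU-side sketch of the gradient-biased PCF idea for a single lookup is shown below, assuming the shadow map is a 2-D array indexed in texels and the center tap lies away from the borders; the per-tap comparison depth follows the receiver's tangent plane via d = d0 + dot(uv_offset, gradient). This is an illustration of the technique, not vendor code.

```python
import numpy as np

def pcf_gradient(shadow_map, uv, d0, gradient, radius=1):
    """Percentage-closer filtering with a tangent-plane depth per tap.
    shadow_map : 2-D array of stored light-space depths (texel-indexed)
    uv         : (row, col) of the center tap, at least `radius` from borders
    d0         : receiver depth at the center tap
    gradient   : (dz/drow, dz/dcol), the receiver's depth gradient
    Returns the fraction of taps that are lit."""
    lit, taps = 0.0, 0
    g = np.asarray(gradient, dtype=float)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            offset = np.array([dr, dc], dtype=float)
            d = d0 + offset @ g                       # d = d0 + dot(uv_offset, gradient)
            z = shadow_map[uv[0] + dr, uv[1] + dc]    # stored depth at this tap
            lit += float(z >= d)                      # no false self-occlusion on slopes
            taps += 1
    return lit / taps
```

With a flat gradient of (0, 0) this degenerates to plain PCF, which is exactly the case where a large constant depth bias would be needed on sloped receivers.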
Simulating Soft Shadows with Graphics Hardware
Simulating Soft Shadows with Graphics Hardware
Paul S. Heckbert and Michael Herf
January 15, 1997
CMU-CS-97-104
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
email: ph@, herf+@
World Wide Web: /ph

This paper was written in April 1996. An abbreviated version appeared in [Michael Herf and Paul S. Heckbert, Fast Soft Shadows, Visual Proceedings, SIGGRAPH 96, Aug. 1996, p. 145].

Abstract
This paper describes an algorithm for simulating soft shadows at interactive rates using graphics hardware. On current graphics workstations, the technique can calculate the soft shadows cast by moving, complex objects onto multiple planar surfaces in about a second. In a static, diffuse scene, these high-quality shadows can then be displayed at 30 Hz, independent of the number and size of the light sources.

For a diffuse scene, the method precomputes a radiance texture that captures the shadows and other brightness variations on each polygon. The texture for each polygon is computed by creating registered projections of the scene onto the polygon from multiple sample points on each light source, and averaging the resulting hard shadow images to compute a soft shadow image. After this precomputation, soft shadows in a static scene can be displayed in real time with simple texture mapping of the radiance textures. All pixel operations employed by the algorithm are supported in hardware by existing graphics workstations. The technique can be generalized for the simulation of shadows on specular surfaces.

This work was supported by NSF Young Investigator award CCR-9357763. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of NSF or the U.S. Government.

Keywords: penumbra, texture mapping, graphics workstation, interaction, real-time, SGI Reality Engine.

1 Introduction
Shadows are both an important visual cue for the perception of spatial relationships and an essential component of realistic images. Shadows differ according to the type of light source causing them: point light sources yield hard shadows, while linear and area (also known as extended) light sources generally yield soft shadows with an umbra (fully shadowed region) and penumbra (partially shadowed region).

The real world contains mostly soft shadows due to the finite size of sky light, the sun, and light bulbs, yet most computer graphics rendering software simulates only hard shadows, if it simulates shadows at all. Excessive sharpness of shadow edges is often a telltale sign that a picture is computer generated.

Shadows are even less commonly simulated with hardware rendering. Current graphics workstations, such as Silicon Graphics (SGI) and Hewlett Packard (HP) machines, provide z-buffer hardware that supports real-time rendering of fairly complex scenes.
Such machines are wonderful tools for computer-aided design and visualization. Shadows are seldom simulated on such machines, however, because existing algorithms are not general enough, or they require too much time or memory. The shadow algorithms most suitable for interaction on graphics workstations have a cost per frame proportional to the number of point light sources. While such algorithms are practical for one or two light sources, they are impractical for a large number of sources or the approximation of extended sources.

We present here a new algorithm that computes the soft shadows due to extended light sources. The algorithm exploits graphics hardware for fast projective (perspective) transformation, clipping, scan conversion, texture mapping, visibility testing, and image averaging. The hardware is used both to compute the shading on the surfaces and to display it, using texture mapping. For diffuse scenes, the shading is computed in a preprocessing step whose cost is proportional to the number of light source samples, but while the scene is static, it can be redisplayed in time independent of the number of light sources. The method is also useful for simulating the hard shadows due to a large number of point sources. The memory requirements of the algorithm are also independent of the number of light source samples.

1.1 The Idea
For diffuse scenes, our method works by precomputing, for each polygon in the scene, a radiance texture [12, 14] that records the color (outgoing radiance) at each point in the polygon. In a diffuse scene, the radiance at each surface point is view-independent, so it can be precomputed and reused until the scene geometry changes. This radiance texture is analogous to the mesh of radiosity values computed in a radiosity algorithm. Unlike a radiosity algorithm, however, our algorithm can compute this texture almost entirely in hardware.

The key idea is to use graphics hardware to determine visibility and calculate shading, that is, to determine which portions of a surface are occluded with respect to a given extended light source, and how brightly they are lit. In order to simulate extended light sources, we approximate them with a number of light sample points, and we do visibility tests between a given surface point and each light sample. To keep as many operations in hardware as possible, however, we do not use a hemicube [7] to determine visibility.
Instead, to compute the shadows for a single polygon, we render the scene into a scratch buffer, with all polygons except the one being shaded appropriately blackened, using a special projective projection from the point of view of each light sample. These views are registered so that corresponding pixels map to identical points on the polygon. When the resulting hard shadow images are averaged, a soft shadow image results (Figure 1). This image is then used directly as a texture on the polygon in order to simulate shadows correctly. The textures so computed are used for real-time display until the scene geometry changes.

In the remainder of the paper, we summarize previous shadow algorithms, we present our method for diffuse scenes in more detail, we discuss generalizations to scenes with specular and general reflectance, we present our implementation and results, and we offer some concluding remarks.

2 Previous Work
2.1 Shadow Algorithms
Woo et al. surveyed a number of shadow algorithms [19]. Here we summarize soft shadow methods and methods that run at interactive rates. Shadow algorithms can be divided into three categories: those that compute everything on the fly, those that precompute just visibility, and those that precompute shading.

Computation on the Fly. Simple ray tracing computes everything on the fly. Shadows are computed on a point-by-point basis by tracing rays between the surface point and a point on each light source to check for occluders. Soft shadows can be simulated by tracing rays to a number of points distributed across the light source [8].

The shadow volume approach is another method for computing shadows on the fly. With this method, one constructs imaginary surfaces that bound the shadowed volume of space with respect to each point light source. Determining if a point is in shadow then reduces to point-in-volume testing. Brotman and Badler used an extended z-buffer algorithm with linked lists at each pixel to support soft shadows using this approach [4].

The shadow volume method has also been used in two hardware implementations. Fuchs et al. used the pixel processors of the Pixel Planes machine to simulate hard shadows in real-time [10]. Heidmann used the stencil buffer in advanced SGI machines [13].
With Heidmann's algorithm, the scene must be rendered through the stencil created from each light source, so the cost per frame is proportional to the number of light sources times the number of polygons. On 1991 hardware, soft shadows in a fairly simple scene required several seconds with his algorithm. His method appears to be one of the algorithms best suited to interactive use on widely available graphics hardware. We would prefer, however, an algorithm whose cost is sublinear in the number of light sources.

A simple, brute-force approach, good for casting shadows of objects onto a plane, is to find the projective transformation that projects objects from a point light onto a plane, and to use it to draw each squashed, blackened object on top of the plane [3], [15, p. 401]. This algorithm effectively multiplies the number of objects in the scene by the number of light sources times the number of receiver polygons onto which shadows are being cast, however, so it is typically practical only for very small numbers of light sources and receivers. Another problem with this method is that occluders behind the receiver will cast erroneous shadows, unless extra clipping is done.

Figure 1: Hard shadow images from a 2×2 grid of sample points on a light source.

Figure 2: Left: scene with square light source (foreground), triangular occluder (center), and rectangular receiver (background), with shadows on the receiver. Center: approximate soft shadows resulting from a 2×2 grid of sample points; the average of the four hard shadow images in Figure 1. Right: correct soft shadow image (generated with 16×16 sampling). This image is used as the texture on the receiver at left.

Precomputation of Visibility. Instead of computing visibility on the fly, one can precompute visibility from the point of view of each light source. The z-buffer shadow algorithm uses two (or more) passes of z-buffer rendering, first from the light sources, and then from the eye [18]. The z-buffers from the light views are used in the final pass to determine if a given 3-D point is illuminated with respect to each light source. The transformation of points from one coordinate system to another can be accelerated using texture mapping hardware [17]. This latter method, by Segal et al., achieves real-time rates, and is the other leading method for interactive shadows.

Soft shadows can be generated on a graphics workstation by rendering the scene multiple times, using different points on the extended light source, and averaging the resulting images using accumulation buffer hardware [11].

A variation of the shadow volume approach is to intersect these volumes with surfaces in the scene to precompute the umbra and penumbra regions on each surface [16]. During the final rendering pass, illumination integrals are evaluated at a sparse sampling of pixels.

Precomputation of Shading. Precomputation can be taken further, computing not just visibility but also shading. This is most relevant to diffuse scenes, since their shading is view-independent. Some of these methods compute visibility continuously, while others compute it discretely.

Several researchers have explored continuous visibility methods for soft shadow computation and radiosity mesh generation. With this approach, surfaces are subdivided into fully lit, penumbra, and umbra regions by splitting along lines or curves where visibility changes. In Chin and Feiner's soft shadow method, polygons are split using BSP trees, and these sub-polygons are then pre-shaded [6]. They achieved rendering times of under a minute for simple scenes. Drettakis and Fiume used more sophisticated computational geometry techniques to precompute their subdivision, and reported rendering times of several seconds [9].
Most radiosity methods discretize each surface into a mesh of elements and then use discrete methods such as ray tracing or hemicubes to compute visibility. The hemicube method computes visibility from a light source point to an entire hemisphere by projecting the scene onto a half-cube [7]. Much of this computation can be done in hardware. Radiosity meshes typically do not resolve shadows well, however. Typical artifacts are Mach bands along the mesh element boundaries and excessively blurry shadows. Most radiosity methods are not fast enough to support interactive changes to the geometry, however. Chen's incremental radiosity method is an exception [5].

Our own method can be categorized next to hemicube radiosity methods, since it also precomputes visibility discretely. Its technique for computing visibility also has parallels to the method of flattening objects to a plane.

2.2 Graphics Hardware
Current graphics hardware, such as the Silicon Graphics Reality Engine [1], can projective-transform, clip, shade, scan convert, and texture tens of thousands of polygons in real-time (in 1/30 sec.). We would like to exploit the speed of this hardware to simulate soft shadows.

Typically, such hardware supports arbitrary 4×4 homogeneous transformations of planar polygons, clipping to any truncated pyramidal frustum (right or oblique), and scan conversion with z-buffering or overwriting. On SGI machines, Phong shading (once per pixel) is not possible, but faceted shading (once per polygon) and Gouraud shading (once per vertex) are supported. Phong shading can be simulated by splitting polygons into small pieces on input. A common, general form for hardware-supported illumination is diffuse reflection from multiple point spotlight sources, with a texture-mapped reflectance function and attenuation:

    L(x) = L_a + ρ(x) ∫ L(x_l) v(x, x_l) (cos⁺θ cos⁺θ_l / π r²) da_l,

where, as shown in Figure 3, x is a 3-D point on a reflective surface and x_l is a point on a light source, θ is the polar angle (angle from the normal) at x, θ_l is the angle at x_l, r is the distance between x and x_l (θ, θ_l, and r are all functions of x and x_l), L is the outgoing radiance at point x for a given color channel, due to either emission or reflection, L_a is the ambient radiance, ρ is the reflectance, v is a Boolean visibility function that equals 1 if point x_l is visible from point x, else 0, cos⁺ = max(cos, 0) for backface testing, and the integral is over all points on all light sources, with respect to da_l, an infinitesimal area on a light source.

The inputs to the problem are the geometry, the reflectance ρ, the emitted radiance on all light sources, and the ambient radiance L_a; the output is the reflected radiance function L.

Figure 3: Geometry for direct illumination. The radiance at point x on the receiver is being calculated by summing the contributions from a set of point light sources at x_li on light l.

3.1 Approximating Extended Light Sources
Although such integrals can be solved in closed form for planar surfaces with no occlusion (v ≡ 1), the complexity of the visibility function makes these integrals intractable in the general case. We can compute approximations to the integral, however, by replacing each extended light source by a set of point light sources:

    L̃(x_l) = Σ_{i=1}^{n_l} a_li δ(x_l − x_li) L(x_li),

where δ is a 3-D Dirac delta function, x_li is sample point i on light source l, and a_li is the area associated with this sample point.
Typically, each sample on a light source has equal area: a_li = A_l / n_l, where A_l is the area of light source l. With this approximation, the radiance of a reflective surface point can be computed by summing the contributions over all sample points on all light sources:

    L(x) = L_a + ρ(x) Σ_l Σ_i a_li L(x_li) v(x, x_li) (cos⁺θ cos⁺θ_li / π r_li²).   (2)

Each term in the inner summation can be regarded as a hard shadow image resulting from a point light source at x_li, where x is a function of screen position. Such a summand consists of the product of three factors. The first one, which is an area times the reflectance of the receiving polygon, can be calculated in software. The second factor is the cosine of the angle on the receiver, times the cosine of the angle on the light source, times the radiance of the light source, divided by π r². This can be computed in hardware by rendering the receiver polygon with a single spotlight at x_li turned on, using a spotlight exponent of 1 and quadratic attenuation. On machines that do not support Phong shading, we will have to finely subdivide the polygon. The third factor is visibility between a point on a light source and each point on the receiver. Visibility can be computed by projecting all polygons between light source point x_li and the receiver onto the receiver.

Figure 4: Pyramid with parallelogram base. Faces of the pyramid are marked with their plane equations.

We want to simulate soft shadows as quickly as possible. To take full advantage of the hardware, we can precompute the shading for each polygon using the formula above, and then display views of the scene from moving viewpoints using real-time texture mapping and z-buffering.

To compute soft shadow textures, we need to generate a number of hard shadow images and then average them. If these hard shadow images are not registered (they would not be, using hemicubes), then it would be necessary to resample them so that corresponding pixels in each hard shadow image map to the same surface point in 3-D. This would be very slow. A faster alternative is to choose the transformation for each projection so that the hard shadow images are perfectly registered with each other.

For planar receiver surfaces, this is easily accomplished by exploiting the capabilities of projective transformations. If we fit a parallelogram around the receiver surface of interest, and then construct a pyramid with this as its base and the light point as its apex, there is a 4×4 homogeneous transformation that will map such a pyramid into an axis-aligned box, as described shortly.

The hard shadow image due to sample point x_li on light l is created by loading this special transformation matrix and rendering the receiver polygon. The polygon is illuminated by the ambient light plus a single point light source at x_li, using Phong shading or a good approximation to it. The visibility function is then computed by rendering the remainder of the scene with all surfaces shaded as if they were the receiver illuminated by ambient light: (ρ_r L_ar, ρ_g L_ag, ρ_b L_ab). This is most quickly done with z-buffering off, and clipping to a pyramid with the receiver polygon as its base. Drawing each polygon with an unsorted painter's algorithm suffices here because all polygons are the same color, and after clipping, the only polygon fragments remaining will lie between the light source and the receiver, so they all cast shadows on the receiver. To compute the weighted average of the hard shadow images so created, we use the accumulation buffer.

3.3 Projective Transformation of a Pyramid to a Box
We want a projective (perspective) transformation that maps a pyramid with parallelogram base into a rectangular parallelepiped.
The pyramid lies in object space, with coordinates (x_o, y_o, z_o). It has apex a, and its parallelogram base has one vertex at b and edge vectors e_x and e_y (bold lower case denotes a 3-D point or vector). The parallelepiped lies in what we will call unit screen space, with coordinates (x_u, y_u, z_u). Viewed from the apex, the left and right sides of the pyramid map to the parallel planes x_u = 0 and x_u = 1, the bottom and top map to y_u = 0 and y_u = 1, and the base plane and a plane parallel to it through the apex map to z_u = 1 and z_u = ∞, respectively. See Figure 4.

A 4×4 homogeneous matrix M effecting this transformation can be derived from these conditions. It will have the form:

    M = | m00 m01 m02 m03 |
        | m10 m11 m12 m13 |
        |  0   0   0   1  |
        | m30 m31 m32 m33 |

and the homogeneous transformation and homogeneous division to transform object space to unit screen space are:

    (x̄, ȳ, 1, w̄)ᵀ = M (x_o, y_o, z_o, 1)ᵀ   and   (x_u, y_u, z_u) = (x̄/w̄, ȳ/w̄, 1/w̄).

The third row of matrix M takes this simple form because a constant z_u value is desired on the base plane. The homogeneous screen coordinates x̄, ȳ, and w̄ are each affine functions of x_o, y_o, and z_o (that is, linear plus translation). The constraints above specify the value of each of the three coordinates at four points in space, just enough to uniquely determine the twelve unknowns in M.

The w̄ coordinate, for example, has value 1 at the points b, b + e_x, and b + e_y, and value 0 at a. Therefore, the vector n_w = e_y × e_x is normal to any plane of constant w̄, thus fixing the first three elements of the last row of the matrix within a scale factor: (m30, m31, m32) = α_w n_wᵀ. Setting w̄ to 0 at a and 1 at b constrains m33 = −α_w n_w · a and α_w = 1/(n_w · w), where w = b − a. The first two rows of M can be derived similarly (see Figure 4). The result is:

    M = | α_x n_xᵀ   −α_x n_x · a |
        | α_y n_yᵀ   −α_y n_y · a |
        |  0  0  0        1       |
        | α_w n_wᵀ   −α_w n_w · a |

where

    n_x = e_y × w,   n_y = w × e_x,   n_w = e_y × e_x,

and

    α_x = 1/(n_x · e_x),   α_y = 1/(n_y · e_y),   α_w = 1/(n_w · w).

Blinn [3] uses a related projective transformation for the generation of shadows on a plane, but his is a projection (it collapses 3-D to 2-D), while ours is 3-D to 3-D. We use the third dimension for clipping.

3.4 Using the Transformation
To use this transformation in our shadow algorithm, we first fit a parallelogram around the receiver polygon. If the receiver is a rectangle or other parallelogram, the fit is exact; if the receiver is a triangle, then we fit the triangle into the lower left triangle of the parallelogram; and for more general polygons with four or more sides, a simple 2-D bounding box in the plane of the polygon can be used. It is possible to go further with projective transformations, mapping arbitrary planar quadrilaterals into squares (using the homogeneous texture transformation matrix of OpenGL, for example).
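The matrix construction above transcribes directly into NumPy; the sketch below follows the same n and α definitions and is offered as an illustration of the derivation rather than production code.

```python
import numpy as np

def pyramid_to_box(a, b, ex, ey):
    """4x4 homogeneous map taking the pyramid (apex a, parallelogram base
    b, b+ex, b+ey) to the unit box of 'unit screen space'."""
    a, b, ex, ey = map(np.asarray, (a, b, ex, ey))
    w = b - a                                    # apex-to-base vector
    nx = np.cross(ey, w)                         # normal of constant-x_u planes
    ny = np.cross(w, ex)                         # normal of constant-y_u planes
    nw = np.cross(ey, ex)                        # normal of constant-w planes
    ax_ = 1.0 / np.dot(nx, ex)
    ay_ = 1.0 / np.dot(ny, ey)
    aw_ = 1.0 / np.dot(nw, w)
    M = np.zeros((4, 4))
    M[0, :3], M[0, 3] = ax_ * nx, -ax_ * np.dot(nx, a)
    M[1, :3], M[1, 3] = ay_ * ny, -ay_ * np.dot(ny, a)
    M[2, 3] = 1.0                                # third row is (0, 0, 0, 1)
    M[3, :3], M[3, 3] = aw_ * nw, -aw_ * np.dot(nw, a)
    return M

def to_unit_screen(M, p):
    """Apply M and the homogeneous division: (x_u, y_u, z_u) = (xb, yb, 1)/wb."""
    xb, yb, one, wb = M @ np.append(np.asarray(p, float), 1.0)
    return np.array([xb / wb, yb / wb, one / wb])
```

A quick sanity check: to_unit_screen(M, b) gives (0, 0, 1), the base corner; points approaching the apex a send w̄ to 0 and hence z_u to infinity, exactly as the constraints require.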
We assume for simplicity, however, that the transformation between texture space (the screen space in these light source projections) and object space is affine, and so we restrict ourselves to parallelograms.

3.5 Soft Shadow Algorithm for Diffuse Scenes
To precompute soft shadow radiance textures:

    turn off z-buffering
    for each receiver polygon R
        choose resolution for R's texture (s_x by s_y pixels)
        clear accumulator image of s_x by s_y pixels to black
        create temporary image of s_x by s_y pixels
        for each light source l
            first backface test: if l is entirely behind R
                or R is entirely behind l, then skip to next l
            for each sample point i on light source l
                second backface test: if x_li is behind R, then skip to next i
                compute transformation matrix M, where a = x_li
                    and the base parallelogram fits tightly around R
                set current transformation matrix to scale(s_x, s_y, 1) M
                set clipping planes to z_u,near = 1 and z_u,far = big
                draw R with illumination from x_li only,
                    as described in equation (2), into temp image
                for each other object in scene
                    draw object with ambient color into temp image
                add temp image into accumulator image, weighted by
                    the first (software-computed) factor of equation (2)
        save accumulator image as texture for polygon R

A hard shadow image is computed in each iteration of the inner loop. These are averaged together to compute a soft shadow image, which is used as a radiance texture. Note that objects casting shadows need not be polygonal; any object that can be quickly scan-converted will work well.

To display a static scene from moving viewpoints, simply:

    turn on z-buffering
    for each object in scene
        if object receives shadows, draw it textured but without illumination
        else draw object with illumination

3.6 Backface Testing
The cases where cos⁺θ cos⁺θ_l = 0 can be optimized using backface testing. To test whether polygon p is behind polygon q, compute the signed distances from the plane of polygon q to each of the vertices of p (signed positive on the front of q and negative on the back). If they are all positive, then p is entirely in front of q; if they are all nonpositive, p is entirely in back; otherwise, part of p is in front of q and part is in back.

To test whether the apex a of the pyramid is behind the receiver that defines the base plane, simply test whether n_w · w ≤ 0.

The above checks will ensure that cos θ > 0 at every point on the receiver, but there is still the possibility that cos θ_l ≤ 0 on portions of the receiver (i.e., that the receiver is only partially illuminated by the light source). This final case should be handled at the polygon level or pixel level when shading the receiver in the algorithm above. Phong shading, or a good approximation to it, is needed here.

3.7 Sampling Extended Light Sources
The set of samples used on each light source greatly influences the speed and quality of the results. Too few samples, or a poorly chosen sample distribution, result in penumbrae that appear stepped, not continuous. If too many samples are used, however, the simulation runs too slowly.

If a uniform grid of sample points is used, the stepping is much more pronounced in some cases. For example, if a uniform grid of s×s samples is used on a parallelogram light source, an occluder edge coplanar with one of the light source edges will cause s big steps, while an occluder edge in general position will cause s² small steps. Stochastic sampling [8] with the same number of samples yields smoother penumbrae than a uniform grid, because the steps no longer coincide. We use a jittered uniform grid because it gives good results and is very easy to compute.

Using a fixed number of samples on each light source is inefficient. Fine sampling of a light source is most important when the light source subtends a large solid angle from the point of view of the receiver, since that is when the penumbra is widest and stepping artifacts would be most visible.
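To make the sampling and averaging concrete, the sketch below builds a jittered n×n grid of sample points on a parallelogram light source and averages the hard shadow images they produce. The hard_shadow_image callback is a hypothetical stand-in for the registered hardware rendering pass described above.

```python
import numpy as np

def jittered_samples(origin, ex, ey, n, rng):
    """Jittered uniform n x n grid on a parallelogram light: origin + s*ex + t*ey."""
    pts = []
    for i in range(n):
        for j in range(n):
            s = (i + rng.random()) / n       # jitter within each grid cell
            t = (j + rng.random()) / n
            pts.append(origin + s * ex + t * ey)
    return pts

def soft_shadow_texture(hard_shadow_image, light, n=4, seed=0):
    """Average registered hard shadow images over all jittered light samples.
    hard_shadow_image(point) -> 2-D float array in [0, 1] (a stand-in for the
    hardware pass); light = (origin, ex, ey) describing the parallelogram."""
    rng = np.random.default_rng(seed)
    pts = jittered_samples(*light, n, rng)
    acc = None
    for p in pts:                            # accumulation-buffer style averaging
        img = hard_shadow_image(p)
        acc = img if acc is None else acc + img
    return acc / len(pts)
```

Because each cell of the grid is jittered independently, the s or s² steps of a regular grid no longer coincide, which is exactly why the text prefers a jittered uniform grid.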
Using a fixed number of samples on each light source is inefficient. Fine sampling of a light source is most important when the light source subtends a large solid angle from the point of view of the receiver, since that is when the penumbra is widest and stepping artifacts would be most visible. A good approach is to choose the light source sample resolution such that the solid angle subtended by the light source area associated with each sample is below a user-specified threshold.

The algorithm can easily handle diffuse (non-directional) light sources whose outgoing radiance varies with position, such as stained glass windows. For such light sources, importance sampling might be preferable: concentration of samples in the regions of the light source with highest radiance.

3.8 Texture Resolution

The resolution of the shadow texture should be roughly equal to the resolution at which it will be viewed (one texture pixel mapping to one screen pixel); lower resolution results in visible artifacts such as blocky shadows, and higher resolution is wasteful of time and memory. In the absence of information about probable views, a reasonable technique is to set the number of pixels on a polygon's texture, in each dimension, proportional to its size in world space, using a "desired pixel size" parameter. With this scheme, the required texture memory, in pixels, will be the total world space surface area of all polygons in the scene divided by the square of the desired pixel size. Texture memory for triangles can be further optimized by packing the textures for two triangles into one rectangular texture block. If there are too many polygons in the scene, or the desired pixel size is too small, the texture memory could be exceeded, causing paging of texture memory and slow performance.

Radiance textures can be antialiased by supersampling: generating the hard and initial soft shadow images at several times the desired resolution, and then filtering and downsampling the images before creating textures. Textured surfaces should be rendered with good texture filtering.

Some polygons will contain penumbral regions with respect to a light source, and will require high texture resolution, but others will be either totally shadowed (umbral) or totally illuminated by each light source, and will have very smooth radiance functions. Sometimes these functions will be so smooth that they can be adequately approximated by a single Gouraud shaded polygon. This optimization saves significant texture memory and speeds display. This idea can be carried further, replacing the textured planar polygon with a mesh of coplanar Gouraud shaded triangles. For complex shadow patterns and radiance functions, however, textures may render faster than the corresponding Gouraud approximation, depending on the relative speed of texture mapping and Gouraud-shaded triangle drawing, and the number of triangles required to achieve a good approximation.
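The sizing rule at the start of this section amounts to one computation per texture dimension. A minimal sketch, with the clamp to at least one pixel added as an assumption:

```python
import math

def texture_size(world_w, world_h, desired_pixel_size):
    """Pixels per dimension proportional to the polygon's world-space size,
    per the 'desired pixel size' rule described above. The clamp to a
    minimum of 1 pixel is an implementation detail, not from the text."""
    sx = max(1, math.ceil(world_w / desired_pixel_size))
    sy = max(1, math.ceil(world_h / desired_pixel_size))
    return sx, sy
```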
3.9 Complexity

We now analyze the expected complexity of our algorithm (worst-case costs are not likely to be observed in practice, so we do not discuss them here). Although more sophisticated schemes are possible, we will assume for the purposes of analysis that the same set of light samples is used for shadowing all polygons. Suppose we have a scene with s surfaces (polygons), a total of n light source samples, a total of t radiance texture pixels, and output images rendered with p pixels. We assume the depth complexity of the scene (the average number of surfaces intersecting a ray) is bounded, and that s and t are roughly linearly related. The average number of texture pixels per polygon is t/s.

Figure 5: Shadows are computed on a plane and projected onto the receiving object at right.

With our technique, preprocessing renders the scene ns times. A painter's algorithm rendering of s polygons into an image of t/s pixels takes O(s + t/s) time for scenes of bounded depth complexity. The total preprocessing time is thus O(ns^2 + nt), and the required texture memory is O(t). Display requires only z-buffered texture mapping of s polygons to an image of p pixels, for a time cost of O(s + p). The memory for the z-buffer and output image is O(p).

Our display algorithm is very fast for complex scenes. Its cost is independent of the number of light source samples used, and also independent of the number of texture pixels (assuming no texture paging).

For scenes of low or moderate complexity, our preprocessing algorithm is fast because all of its pixel operations can be done in hardware. For very complex scenes, however, our preprocessing algorithm becomes impractical because it is quadratic in s. In such cases, performance can be improved by calculating shadows only on a small number of surfaces in the scene (e.g. floor, walls, and other large, important surfaces), thereby reducing the cost to O(n s s_t), where s_t is the number of textured polygons.

In an interactive setting, a progressive refinement of images can be used, in which hard shadows on a small number of polygons (precomputation with n = 1, s_t small) are rendered while the user is moving objects with the mouse, a full solution (precomputation with n large, s_t large) is computed when they complete a movement, and then top-speed rendering (display with texture mapping) is used as the viewer moves through the scene.

More fundamentally, the quadratic cost can be reduced using more intelligent data structures. Because the angle of view of most of the shadow projection pyramids is narrow, only a small fraction of the polygons in a scene shadow a given polygon, on average. Using spatial data structures, entire objects can be culled with a few quick tests [2], obviating transformation and clipping of most of the scene, and speeding the rendering of each hard shadow image from O(s) to roughly O(s^(1/3)).

An alternative optimization, which would make the algorithm more practical for the generation of shadows on complex curved or many-faceted objects, is to approximate a receiving object with a plane, compute shadows on this plane, and then project the shadows onto the object (figure 5). This has the advantage of replacing many renderings with a single rendering, but its disadvantage is that self-shadowing of concave objects is not simulated.

3.10 Comparison to Other Algorithms

We can compare the complexity of our algorithm to other algorithms capable of simulating soft shadows at near-interactive rates.
The main alternatives are the stencil buffer technique by Heidmann, the z-buffer method by Segal et al., and hardware hemicube-based radiosity algorithms. The stencil buffer technique renders the scene once for each light source sample, so its per-frame cost grows linearly in n, making it difficult to support soft shadows in real time. With the z-buffer shadow algorithm, the preprocessing time is acceptable, but the memory cost and display time cost also grow linearly in n. This makes the algorithm awkward for many point light sources or for extended light sources with many samples (large n). When soft shadows are desired, our approach appears to yield faster walkthroughs than either of these two methods, because our display process is so fast.

Among current radiosity algorithms, progressive radiosity using hardware hemicubes is probably the fastest method for complex scenes. With progressive radiosity, however, very high resolution hemicubes and many elements are needed to get good shadows. While progressive radiosity may be a better approach for shadow generation in very complex scenes (very large s), it appears slower than our technique for scenes of moderate complexity, because every pixel-level operation in our algorithm can be done in hardware, but this is not the case with hemicubes, since the process of summing differential form factors while reading out of the hemicube must be done in software [7].

4 Scenes with General Reflectance

Shadows on specular surfaces, or surfaces with more general reflectance, can be simulated with a generalization of the diffuse algorithm, but not without added time and memory costs.

Shadows from a single point light source are easily simulated by placing just the visibility function in texture memory, creating a Boolean shadow texture, and computing the remaining local illumination factors at vertices only. This method costs O(s^2 + t) for precomputation, and O(s + p) for display.

Shadows from multiple point light sources can also be simulated. After precomputing a shadow texture for each polygon when illuminated with each light source, the total illumination due to l light sources can be calculated by rendering the scene l times with each of these sets of shadow textures, compositing the final image using blending or with the accumulation buffer. The cost of this method is lt one-bit texture pixels and l times the display time.

Generalizing this method to extended light sources in the case of general reflectance is more difficult, as the computation involves the integration of light from polygonal light sources weighted by the bidirectional reflectance distribution functions (BRDFs). Specular BRDFs are spiky, so careful integration is required or the highlights will betray the point sampling of the light sources. We believe, however, that with careful light sampling and numerical integration of the BRDFs, soft shadows on surfaces with general reflectance could be displayed with memory and time costs that grow roughly linearly in the number of light samples.

5 Implementation

We implemented our diffuse algorithm using the OpenGL subroutine library, running with the IRIX 5.3 operating system on an SGI Crimson with a 100 MHz MIPS R4000 processor and Reality Engine graphics. This machine has hardware for texture mapping and an accumulation buffer with 24 bits per channel. The implementation is fairly simple, since OpenGL supports loading of arbitrary 4x4 matrices, and we intentionally cast our …
A Fast Adaptive Image Completion Algorithm Based on Texture Synthesis

CHEN Zhi-xiong (School of Information Science and Technology, Xiamen University, Xiamen, Fujian 360005, China)

Abstract: Image inpainting is a technique for repairing damaged regions of an image, and it has a wide range of applications. This paper studies a fast adaptive image completion algorithm based on texture synthesis. The algorithm addresses the search range used to find matching texture blocks, the priority order in which texture blocks are repaired, the adaptive selection of texture block size, and the removal of blocking artifacts. By fully exploiting the local characteristics of the image, it improves the visual quality of image completion and accelerates both image completion and texture synthesis.

Keywords: texture synthesis; image completion; adaptive

1 Introduction

Digital image inpainting is a new research focus in the field of image engineering. Its goal is to detect damaged or occluded regions of an image and to restore them automatically from the surrounding valid information, producing a visually satisfying result.
An Image Restoration Algorithm for Data Matrix Codes Based on Sparse Representation

CHEN Qing-ran, XU Yi-bao, LI Xin-hua (Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei, Anhui 230601, China)

Abstract: Sparse representation theory has become a hot topic in the field of image processing because of its simple modeling, high robustness and strong resistance to interference, and its application to image restoration is a new research direction in image processing. For the occluded, unidentifiable two-dimensional code images that often appear in industrial settings, we propose an image restoration algorithm based on block clustering under a sparse representation model. Guided by the valid information in the image to be repaired, the image is segmented into blocks with a fixed pixel overlap, and Euclidean distance is used to match the blocks. Image blocks with similar structure are clustered into structure groups, which serve as the basic unit of the sparse representation, and a dictionary is learned quickly from the estimate of each structure group. The L1-norm minimization problem of the group sparse representation model is solved with split iteration and an optimized gradient algorithm, improving the robustness of the restoration algorithm. Experiments show that the proposed algorithm can repair Data Matrix code images damaged by occlusion, scratches or pixel loss, greatly improving the barcode recognition rate.

Journal: Computer Technology and Development
Year (volume), issue: 2018, 28(1)
Pages: 5 (pp. 60-63, 68)
Keywords: Data Matrix code; image restoration; block clustering; sparse representation
Language: Chinese
CLC number: TP301

0 Introduction

With the rapid development of information technology and automatic identification technology, two-dimensional codes have been widely applied.
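As a rough illustration of the block-matching step the abstract describes (fixed-overlap blocks compared by Euclidean distance and clustered into structure groups), here is a minimal sketch; the block size, step, and group size k are assumed parameters, not values from the paper:

```python
import numpy as np

def structure_group(image, ref_xy, block=8, step=4, k=16):
    """Collect the k blocks of `image` most similar (Euclidean distance)
    to the reference block at ref_xy, forming one 'structure group'.
    The reference block itself is included (distance 0)."""
    y0, x0 = ref_xy
    ref = image[y0:y0 + block, x0:x0 + block].astype(float).ravel()
    h, w = image.shape
    candidates = []
    for y in range(0, h - block + 1, step):        # overlapping blocks
        for x in range(0, w - block + 1, step):
            patch = image[y:y + block, x:x + block].astype(float).ravel()
            candidates.append((np.linalg.norm(patch - ref), y, x))
    candidates.sort(key=lambda c: c[0])            # nearest blocks first
    return [(y, x) for _, y, x in candidates[:k]]
```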
LMS Solutions: a complete portfolio for composite materials (Introduction)
Julien SIMON – Business Development Manager LMS Samtech Division
Automotive
Simulation challenges
Mechanical Industry
Simulate the global behavior of composite structures that must deliver performance equivalent or superior to existing components, with reduced mass and acceptable cost
Introduction
Composite materials:
engineered or naturally occurring materials made from two or more constituent materials
First composite materials: wood, cob, bows
Why?
Intra-laminar & Inter-laminar
Ride & Handling Ride Comfort
• Highly Non-linear Subsystems
Coupling to Dedicated Simulation Applications for Composite Material/Laminate Properties
Texture Synthesis by Non-parametric Sampling
Randomness Parameter
More Synthesis Results
Increasing window size
Brodatz Results
Applications
• Occlusion fill-in
– for 3D reconstruction
• region-based image and video compression
– a small sample of textured region is stored
• Texturing non-developable objects
– Now let’s try this in 2D...
Synthesizing One Pixel
[Figure: a pixel p in the generated image, with its neighbourhood window matched against an infinite sample image (SAMPLE)]
– Assuming Markov property, what is the conditional probability distribution of p, given the neighbourhood window?
– If no close match can be found, the pixel is not synthesized until the end
• Using Gaussian-weighted SSD is very important
– to make sure the new pixel agrees with its closest neighbors
• Our method:
– Texture is “grown” one pixel at a time
– conditional pdf of a pixel given its neighbors synthesized thus far is computed directly from the sample image
– "One morning I shot an elephant in my arms and kissed him.”
– "I spent an interesting evening recently with a grain of salt"
• Notice how well local structure is preserved!
• Texture analysis: how to capture the essence of texture?
• Need to model the whole spectrum: from repeated to stochastic texture
• This problem is at intersection of vision, graphics, statistics, and image compression
– Approximates reduction to a smaller neighborhood window if data is too sparse
Randomness Parameter
More Synthesis Results
Increasing window size
Brodatz Results
wire
More Brodatz Results
french canvas
raffia weave
More Results
wood
granite
More Results
white bread
brick wall
Constrained Synthesis
Visual Comparison
Synthetic tileable texture
[DeBonet, ‘97]
Simple tiling
Our approach
Failure Cases
Growing garbage
Verbatim copying
Homage to Shannon
Constrained Text Synthesis
Texture Synthesis by Non-parametric Sampling
Alexei Efros and Thomas Leung UC Berkeley
– Assuming the Markov property, what is the conditional probability distribution of p, given the neighbourhood window?
– Instead of constructing a model, let’s directly search the input image for all such neighbourhoods to produce a histogram for p
Growing Texture
– Starting from the initial configuration, we “grow” the texture one pixel at a time
– The size of the neighbourhood window is a parameter that specifies how stochastic the user believes this texture to be
– So we find the best match using SSD error (weighted by a Gaussian to emphasize local structure), and take all samples within some distance from that match
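A sketch of this matching step, assuming a greyscale numpy sample image. The Gaussian width w/6.4 follows the accompanying paper, while the eps tolerance used to collect "all samples within some distance" of the best match is an assumed parameter:

```python
import numpy as np

def gaussian_kernel(w, sigma):
    """2-D Gaussian weights over a w x w window (w odd)."""
    ax = np.arange(w) - w // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return np.outer(g, g)

def synthesize_pixel(sample, nbhd, known, eps=0.1, rng=None):
    """Gaussian-weighted SSD matching for one pixel. `nbhd` is the w x w
    neighbourhood around the pixel being synthesized, `known` a boolean
    mask of its already-synthesized entries. All sample windows within
    (1 + eps) of the best match are kept; one is chosen at random."""
    rng = rng or np.random.default_rng()
    w = nbhd.shape[0]
    weights = gaussian_kernel(w, sigma=w / 6.4) * known  # ignore unknown pixels
    weights /= weights.sum()
    sh, sw = sample.shape
    errs, centers = [], []
    for y in range(sh - w + 1):
        for x in range(sw - w + 1):
            win = sample[y:y + w, x:x + w].astype(float)
            errs.append(np.sum(weights * (win - nbhd) ** 2))
            centers.append((y + w // 2, x + w // 2))
    errs = np.asarray(errs)
    close = np.flatnonzero(errs <= errs.min() * (1.0 + eps))
    cy, cx = centers[rng.choice(close)]
    return sample[cy, cx]            # value assigned to the new pixel p
```

Picking randomly among the near-best matches, rather than always taking the single best, is what produces the sampled histogram behaviour described on the previous slides.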
Texture Synthesis by Non-parametric Sampling
Alexei Efros and Thomas Leung UC Berkeley
Goal of Texture Synthesis
[Figure: an input image, viewed as a finite sample of a true (infinite) texture, is fed to SYNTHESIS to produce a new generated image]
– Alex Berg
– Elizaveta Levina
– Jitendra Malik
– Yair Weiss
• Funding agencies
– NSF Graduate Fellowship
– Berkeley Fellowship
– ONR MURI
– California MICRO
• Given a finite sample of some texture, the goal is to synthesize other samples from that same texture.
– The sample needs to be "large enough"
The Challenge
repeated stochastic
Both?
Some Previous Work
– multi-scale filter response histogram matching [Heeger and Bergen,’95]
– sampling from conditional distribution over multiple scales [DeBonet,’97]
– growing texture directly on surface
• Motion synthesis
Texturing a sphere
[Figure: a 2-D sample image used to texture a 3-D sphere]
Image Extrapolation
Summary
• Advantages:
– conceptually simple
– models a wide range of real-world textures
– naturally does hole-filling
– To grow from scratch, we use a random 3x3 patch from input image as seed
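A minimal sketch of this seeding step, with the output size and centre placement as assumptions:

```python
import numpy as np

def seed_output(sample, out_shape, rng=None):
    """Copy a random 3x3 patch of the sample image into the centre of an
    empty output image, returning the image and a mask of filled pixels."""
    rng = rng or np.random.default_rng()
    out = np.zeros(out_shape, dtype=sample.dtype)
    filled = np.zeros(out_shape, dtype=bool)
    y = int(rng.integers(sample.shape[0] - 2))   # top-left of a 3x3 patch
    x = int(rng.integers(sample.shape[1] - 2))
    cy, cx = out_shape[0] // 2 - 1, out_shape[1] // 2 - 1
    out[cy:cy + 3, cx:cx + 3] = sample[y:y + 3, x:x + 3]
    filled[cy:cy + 3, cx:cx + 3] = True
    return out, filled
```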
Some Details
• Growing is in “onion skin” order
– Within each “layer”, pixels with most neighbors are synthesized first
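The "onion skin" ordering can be sketched as follows; the wrap-around neighbour counting via np.roll is a simplification for brevity (a real implementation would pad the borders), not part of the method:

```python
import numpy as np

def next_layer(filled):
    """Among unfilled pixels touching the filled region, return pixel
    coordinates sorted so those with the most already-synthesized
    neighbours come first."""
    # Count filled neighbours in each pixel's 3x3 window (centre excluded).
    counts = sum(np.roll(np.roll(filled.astype(int), dy, 0), dx, 1)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) - filled
    border = (~filled) & (counts > 0)        # the next onion-skin layer
    ys, xs = np.nonzero(border)
    order = np.argsort(-counts[ys, xs])      # most known neighbours first
    return list(zip(ys[order].tolist(), xs[order].tolist()))
```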
– filter histograms with Gibbs sampling [Zhu et al,’98]
– matching 1st and 2nd order properties of wavelet coefficients [Simoncelli and Portilla,’98]
– N-gram language model [Shannon,’48]
– To synthesize p, just pick one match at random
Really Synthesizing One Pixel
[Figure: a pixel p in the generated image, matched against a finite sample image (SAMPLE)]
– However, since our sample image is finite, an exact neighbourhood match might not be present
• Disadvantages:
– it's greedy
– it's slow
– it's a heuristic
• Not an answer to texture analysis, but hopefully some inspiration!
Acknowledgments
• Thanks to:
Motivation from Language
• [Shannon,’48] proposed a way to generate English-looking text using N-grams:
– Assume a generalized Markov model
– Use a large text to compute probability distributions
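A minimal sketch of Shannon-style N-gram synthesis as described on this slide, operating on characters; the restart-on-dead-end rule is an assumption:

```python
import random
from collections import defaultdict

def ngram_synthesize(text, n=3, length=200, seed=None):
    """Estimate P(next char | previous n-1 chars) from a large text as an
    empirical histogram, then sample the chain character by character."""
    random.seed(seed)
    model = defaultdict(list)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        model[context].append(nxt)           # empirical distribution
    context = text[:n - 1]
    out = list(context)
    for _ in range(length):
        choices = model.get(context)
        if not choices:                      # dead end: restart the chain
            context = text[:n - 1]
            continue
        ch = random.choice(choices)          # sample from the histogram
        out.append(ch)
        context = context[1:] + ch
    return "".join(out)
```

Replacing characters with pixels and the (n-1)-character context with a 2-D neighbourhood window gives exactly the texture-synthesis analogy the deck draws.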