ICA and Gabor Representation for Facial Expression Recognition

Face Recognition / Facial Digital Image Processing: Chinese-English Foreign-Literature Translation for a Graduation Thesis (High-Quality Human Translation, Original Text with Source)

Face-recognition-related literature translation, hand-translated, with original source. The original text is from Thomas David Heseltine BSc. Hons., The University of York, Department of Computer Science, for the qualification of PhD, September 2005: "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".

4 Two-dimensional Face Recognition

4.1 Feature Localization

Before discussing the methods of comparing two facial images we now take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).

The eye localisation method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of the face recognition accuracy and not a product of the performance of the eye localisation routine, all image alignments are manually checked and any errors corrected, prior to testing and evaluation.

We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned images of faces is taken, and each image cropped to an area around both eyes. The average image is calculated and used as a template.

Figure 4-1 - The average eyes. Used as a template for eye detection.

Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background, although this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).

A window is passed over the test images and the absolute difference taken to that of the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.

This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.

Eye localisation is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful, and those in which eye detection failed. Taking the set of successful localisations we compute the average distance from the eye template (Figure 4-2 top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect.
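To make the template scheme concrete, here is a minimal Python sketch of the detection step (assuming OpenCV and NumPy; the function names are ours, and cv2.matchTemplate with TM_SQDIFF uses squared rather than absolute differences, which serves the same purpose):

    import cv2
    import numpy as np

    def build_eye_template(aligned_eye_crops):
        # Average a set of manually pre-aligned eye-region crops
        # (greyscale, identical size) into a single template.
        return np.mean(np.stack(aligned_eye_crops), axis=0).astype(np.float32)

    def locate_eye_region(image, template):
        # Slide the template over the image; the window with the lowest
        # difference score is taken as the region containing both eyes.
        scores = cv2.matchTemplate(image.astype(np.float32), template,
                                   cv2.TM_SQDIFF)
        _, _, min_loc, _ = cv2.minMaxLoc(scores)
        x, y = min_loc
        h, w = template.shape
        return (x, y, w, h)   # top-left corner plus template size

The same call with a smaller single-eye template then refines each individual eye position.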
However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.

Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and failed detections (bottom), showing credible variance due to mis-detected features.

In the lower image (Figure 4-2 bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.

Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.

4.2 The Direct Correlation Approach

We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term "direct correlation" to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not infer that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson's correlation, it can be considered a scale- and translation-sensitive form of image correlation), as this persists with the contrast made between image space and subspace approaches in later sections.

Firstly, all facial images must be aligned such that the eye centres are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each corresponding vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and, again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision:

d = ||q - g||;  d <= threshold => accept;  d > threshold => reject.  (Equ.
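In code, the whole direct correlation verifier is only a few lines (a sketch assuming NumPy; the 65-by-82 image size comes from the text, and the threshold is application-dependent):

    import numpy as np

    def verify(query_img, gallery_img, threshold):
        # Equ. 4-1: flatten each aligned 65x82 greyscale image into a
        # 5330-element vector and threshold the Euclidean distance.
        q = query_img.astype(np.float32).ravel()
        g = gallery_img.astype(np.float32).ravel()
        d = np.linalg.norm(q - g)
        return d <= threshold   # True = accept the claimed identity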
4-1)

4.2.1 Verification Tests

The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system's ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (i.e. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experimentations in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher's Linear Discriminant to analyse individual subspace components in section 7 and the identification test for the final evaluations described in section 8.

The verification test measures a system's ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps as a swipe card, proximity chip or PIN number. This number is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image) and compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.

The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large such that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.

However, it should also be noted that if the results are to be representative of system performance in a real-world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition, which may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of 'difficult' images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced. Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB).
This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.

Every image is compared with every other image, no image is compared with itself, and no pair is compared more than once (we assume that the relationship is symmetrical). Once two images have been compared, producing a similarity score, the ground truth is used to determine if the images are of the same person or different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people, and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as either a false acceptance or false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.

    For IndexA = 0 to length(TestSet)
        For IndexB = IndexA+1 to length(TestSet)
            Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
            If IndexA and IndexB are the same person
                Append Score to AcceptScoresList
            Else
                Append Score to RejectScoresList

    For Threshold = Minimum Score to Maximum Score:
        FalseAcceptCount, FalseRejectCount = 0
        For each Score in RejectScoresList
            If Score <= Threshold
                Increase FalseAcceptCount
        For each Score in AcceptScoresList
            If Score > Threshold
                Increase FalseRejectCount
        FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
        FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
        Add plot to error curve at (FalseRejectRate, FalseAcceptRate)

These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional (FAR, FRR) pair, which when plotted on a graph produces the error rate curve shown below.

Figure 4-5 - Example error rate curve produced by the verification test (false acceptance rate, in %, against false rejection rate).

The equal error rate (EER) can be seen as the point at which FAR is equal to FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system and allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real-world application: it is unlikely that any real system would use a threshold value such that the percentage of false acceptances were equal to the percentage of false rejections.
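The pseudocode above translates directly into a runnable form. The sketch below (our Python rendering, assuming NumPy and that the two score lists have already been collected) sweeps the threshold over all observed scores and returns the (FAR, FRR) pairs plus the equal error rate:

    import numpy as np

    def error_curve(accept_scores, reject_scores):
        # accept_scores: distances from same-person image pairs
        # reject_scores: distances from different-person image pairs
        accept = np.asarray(accept_scores, dtype=np.float64)
        reject = np.asarray(reject_scores, dtype=np.float64)
        curve = []
        for t in np.unique(np.concatenate([accept, reject])):
            far = np.mean(reject <= t)   # impostor pairs wrongly accepted
            frr = np.mean(accept > t)    # genuine pairs wrongly rejected
            curve.append((t, far, frr))
        # EER: the sweep point where FAR and FRR are closest to equal
        _, far, frr = min(curve, key=lambda c: abs(c[1] - c[2]))
        return curve, (far + frr) / 2.0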
Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections, unwilling to tolerate intruders at the cost of inconvenient access denials. Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.

There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves also used in such experiments. Both graphs are simply two visualisations of the same results: the ROC format uses the true acceptance rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and FAR as functions of the threshold value. This presentation format provides a reference to determine the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves intersect.

Figure 4-6 - Example error rate curves as functions of the score threshold.

The fluctuation of these error curves due to noise and other errors is dependent on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will result in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840 verification operations, hence a drop of 1% EER represents an additional 2588 correct decisions, whereas the quality of a single image could cause the EER to fluctuate by up to 0.28%.

4.2.2 Results

As a simple experiment to test the direct correlation method, we apply the technique described above to a test set of 720 images of 60 different people, taken from the AR Face Database [39]. Every image is compared with every other image in the test set to produce a likeness score, providing 258,840 verification operations from which to calculate false acceptance rates and false rejection rates. The error curve produced is shown in Figure 4-7.

Figure 4-7 - Error rate curve produced by the direct correlation method using no image preprocessing.

We see that an EER of 25.1% is produced, meaning that at the EER threshold approximately one quarter of all verification operations carried out resulted in an incorrect classification. There are a number of well-known reasons for this poor level of accuracy. Tiny changes in lighting, expression or head orientation cause the location in image space to change dramatically. Images in face space are moved far apart due to these image capture conditions, despite being of the same person's face. The distance between images of different people becomes smaller than the area of face space covered by images of the same person, and hence false acceptances and false rejections occur frequently. Other disadvantages include the large amount of storage necessary for holding many face images and the intensive processing required for each comparison, making this method unsuitable for applications applied to a large database. In section 4.3 we explore the eigenface method, which attempts to address some of these issues.

An English Essay on Artificial Intelligence and Its Development

Artificial Intelligence (AI) has been a rapidly evolving field with profound implications for society, economy, and technology. Here's an essay on AI and its development:

Introduction to Artificial Intelligence
Artificial Intelligence refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term was coined in 1956 at a conference at Dartmouth College, and since then, AI has been a subject of fascination and research.

Historical Development
The development of AI can be traced back to the 1950s with the advent of the first AI program, the Logic Theorist, developed by Allen Newell and Herbert A. Simon. This was followed by the development of the General Problem Solver and the creation of the first AI laboratory at MIT. However, the field faced a period of stagnation in the 1970s, known as the AI winter, due to a lack of funding and overestimation of AI capabilities.

Renaissance of AI
The field saw a resurgence in the 1990s with the introduction of machine learning, a subset of AI that focuses on the development of algorithms that can learn from and make predictions or decisions based on data. The availability of big data, advancements in computational power, and the development of new algorithms have all contributed to this renaissance.

Current State of AI
Today, AI is pervasive in various sectors, from healthcare, where it assists in diagnosing diseases, to finance, where it is used for fraud detection and algorithmic trading. In the consumer market, AI is evident in virtual assistants like Siri and Alexa, which can perform tasks and answer questions through natural language processing.

Machine Learning and Deep Learning
Machine learning, a core component of AI, has further evolved with the advent of deep learning, which uses neural networks with many layers to analyze complex patterns in data. This has led to significant advancements in image and speech recognition, as well as natural language processing.

Ethical Considerations and Challenges
Despite the benefits, AI development has raised ethical concerns, such as privacy issues, the potential for job displacement, and the need for transparency in AI decision-making processes. There is also a debate on the potential risks of AI becoming too powerful and the need for regulation to ensure its safe and beneficial use.

Future Prospects
The future of AI is promising, with ongoing research into areas such as autonomous vehicles, advanced robotics, and personalized AI assistants. However, it is crucial to address the ethical and societal implications to ensure that AI development aligns with human values and contributes positively to society.

Conclusion
Artificial Intelligence is a transformative technology that continues to push the boundaries of what machines can do. As it develops, it is essential to foster a multidisciplinary approach that includes technologists, ethicists, and policymakers to guide its responsible and beneficial integration into all aspects of life.

Sparse-Representation Face Recognition Based on Blocked Gabor Wavelet Energy Sub-Bands

Journal of Yanshan University (燕山大学学报), Vol. 37, No. 1, January 2013
Article ID: 1007-791X(2013)01-0068-07
Abstract: Face recognition based on sparse-representation classification usually extracts holistic features such as Eigenfaces, Randomfaces and Fisherfaces, ignoring the advantage of local features in overcoming illumination and expression variations. To address this problem, this paper proposes a sparse-representation face recognition algorithm based on blocked Gabor wavelet energy sub-bands. First, the face image is transformed with Gabor wavelets at different scales and orientations, and each resulting energy sub-band is partitioned into blocks; the energy information of the sub-blocks is then fused to form the sub-band's feature vector, and the feature vectors of all energy sub-bands are fused into an enhanced Gabor feature vector, which is finally applied to sparse-representation face recognition. Experimental results show that the algorithm is robust to illumination and expression variations.
Keywords: face recognition; image blocking; Gabor wavelet; sparse representation
CLC number: TP391.4; Document code: A; DOI: 10.3969/j.issn.1007-791X.2013.01.012
[...] have achieved encouraging results in the fields of face recognition and texture classification.
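As a rough illustration of the feature pipeline described in the abstract above, the sketch below (assuming OpenCV and NumPy; the kernel sizes, 4x4 block grid and sigma/lambda settings are our guesses, not the paper's) builds a small Gabor filter bank, blocks each magnitude ("energy") sub-band, and fuses the block energies into one feature vector:

    import cv2
    import numpy as np

    def gabor_block_energy(img, ksizes=(7, 11, 15, 19, 23),
                           n_orient=8, grid=(4, 4)):
        # One energy sub-band per (scale, orientation); each sub-band is
        # split into blocks whose energies are concatenated into the
        # enhanced Gabor feature vector used for sparse representation.
        img = img.astype(np.float32)
        feats = []
        for ksize in ksizes:
            for k in range(n_orient):
                kern = cv2.getGaborKernel((ksize, ksize), sigma=ksize / 3.0,
                                          theta=k * np.pi / n_orient,
                                          lambd=ksize / 2.0, gamma=1.0, psi=0)
                mag = np.abs(cv2.filter2D(img, cv2.CV_32F, kern))
                for rows in np.array_split(mag, grid[0], axis=0):
                    for block in np.array_split(rows, grid[1], axis=1):
                        feats.append(float(np.sum(block ** 2)))  # block energy
        return np.asarray(feats, dtype=np.float32)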

Image-based Facade Modeling

Image-based Façade Modeling
Jianxiong Xiao, Tian Fang, Ping Tan*, Peng Zhao, Eyal Ofek†, Long Quan
The Hong Kong University of Science and Technology; *National University of Singapore; †Microsoft
(ACM Transactions on Graphics (TOG), Proceedings of SIGGRAPH Asia 2008)

Figure 1: A few façade modeling examples from the two sides of a street with 614 captured images: some input images in the bottom row, the recovered model rendered in the middle row, and three zoomed sections of the recovered model rendered in the top row.

Abstract
We propose in this paper a semi-automatic image-based approach to façade modeling that uses images captured along streets and relies on structure from motion to recover camera positions and point clouds automatically as the initial stage for modeling. We start by considering a building façade as a flat rectangular plane or a developable surface with an associated texture image composited from the multiple visible images. A façade is then decomposed and structured into a Directed Acyclic Graph of rectilinear elementary patches. The decomposition is carried out top-down by a recursive subdivision, followed by a bottom-up merging with the detection of architectural bilateral symmetry and repetitive patterns. Each subdivided patch of the flat façade is augmented with a depth optimized using the 3D point cloud. Our system also allows for easy user feedback in the 2D image space on the proposed decomposition and augmentation. Finally, our approach is demonstrated on a large number of façades from a variety of street-side images.

CR Categories: I.3.5 [Computer Graphics]: Computational geometry and object modeling—Modeling packages; I.4.5 [Image Processing and Computer Vision]: Reconstruction.

Keywords: image-based modeling, building modeling, façade modeling, city modeling, photography.

1 Introduction
There is a strong demand for the photo-realistic modeling of cities for games, movies and map services such as Google Earth and Microsoft Virtual Earth. However, most work has been done on large-scale aerial-photography-based city modeling. When we zoom to ground level, the viewing experience is often disappointing, with blurry models and few details. On the other hand, many potential applications require a street-level representation of cities, where most of our daily activities take place. In terms of spatial constraints, the coverage of ground-level images is close-range; more data need to be captured and processed. This makes street-side modeling much more technically challenging.

The current state of the art ranges from pure synthetic methods such as artificial synthesis of buildings based on grammar rules [Müller et al. 2006], through 3D scanning of street façades [Früh and Zakhor 2003], to image-based approaches [Debevec et al. 1996]. Müller et al. [2007] required manual assignment of depths to the façade as they had only one image; in contrast, we have information from the reconstructed 3D points to automatically infer the critical depth of each primitive. Früh and Zakhor [2003] required tedious 3D scanning, while Debevec et al. [1996] proposed a method for a small set of images that cannot be scaled up well for large-scale modeling of buildings.

Figure 2: Overview of the semi-automatic approach to image-based façade modeling.

We propose a semi-automatic method to reconstruct 3D façade models of high visual quality from multiple ground-level street-view images. The key innovation of our approach is the introduction of a systematic and automatic decomposition scheme of façades for both analysis and
reconstruction. The decomposition is achieved through a recursive subdivision that preserves the architectural structure, obtaining a Directed Acyclic Graph representation of the façade by both top-down subdivision and bottom-up merging with local bilateral symmetries to handle repetitive patterns. This representation naturally encodes the architectural shape prior of a façade and enables the depth of the façade to be optimally computed on the surface and at the level of the subdivided regions. We also introduce a simple and intuitive user interface that assists the user in providing feedback on the façade partition.

2 Related work
There is a large amount of literature on façade, building and architectural modeling. We classify these studies as rule-based, image-based and vision-based modeling approaches.

Rule-based methods. Procedural modeling of buildings specifies a set of rules along the lines of an L-system. The methods in [Wonka et al. 2003; Müller et al. 2006] are typical examples of procedural modeling. In general, procedural modeling needs expert specification of the rules and may be limited in the realism of the resulting models and their variations. Furthermore, it is very difficult to define the rules needed to generate exact existing buildings.

Image-based methods. Image-based methods use images as guides to generate models of architecture interactively. Façade, developed by Debevec et al. [1996], is a seminal work in this category. However, the required manual selection of features and correspondences in different views is tedious and cannot be scaled up well. Müller et al. [2007] used the limited domain of regular façades to highlight the importance of the windows in an architectural setting, creating an impressive result for a building façade from one single image while depth is manually assigned. Although this technique is good for modeling regular buildings, it is limited to simple repetitive façades and is not applicable to street-view data as in Figure 1. Oh et al. [2001] presented an interactive system to create models from a single image; they also manually assigned depth based on a painting metaphor. Van den Hengel et al. [2007] used a sketching approach in one (or more) images. Although this method is quite general, it is also difficult to scale up for large-scale reconstruction due to the heavy manual interaction. There are also a few manual modeling solutions on the market, such as Adobe Canoma, RealViz ImageModeler, Eos Systems PhotoModeler and The Pixel Farm PFTrack, all of which require tedious manual model parameterization and point correspondences.

Vision-based methods. Vision-based methods automatically reconstruct urban scenes from images. Typical examples are the work in [Snavely et al. 2006; Goesele et al. 2007], [Cornelis et al. 2008] and the dedicated urban modeling work pursued by the University of North Carolina at Chapel Hill and the University of Kentucky (UNC/UK) [Pollefeys et al. 2007], which resulted in meshes from dense stereo reconstruction. Proper modeling with man-made structural constraints from reconstructed point clouds and stereo data has not yet been addressed. Werner and Zisserman [2002] used line segments to reconstruct buildings. Dick et al. [2004] developed 3D architectural modeling from short image sequences; the approach is Bayesian and model-based, but it relies on many specific architectural rules and model parameters. Lukas et al. [2006; 2008] developed a complete system of urban scene modeling based on aerial images; the result looks good from the top view, but not from the
ground level. Our approach is therefore complementary to their system in that street-level details are added. Früh and Zakhor [2003] also used a combination of aerial imagery, ground color and LIDAR scans to construct models of façades; however, like stereo methods, it suffers from the lack of representation of the styles in man-made architecture. Agarwala et al. [2006] composed panoramas of roughly planar scenes without producing 3D models.

3 Overview
Our approach is schematized in Figure 2.

SFM. From the captured sequence of overlapping images, we first automatically compute the structure from motion to obtain a set of semi-dense 3D points and all camera positions. We then register the reconstruction with an existing approximate model of the buildings (often recovered from aerial images) using GPS data if provided, or manually if geo-registration information is not available.

Façade initialization. We start a building façade as a flat rectangular plane or a developable surface, obtained either automatically from the geo-registered approximate building model or by manually marking up a line segment or a curve on the 3D points projected onto the ground plane. The texture image of the flat façade is computed from the multiple visible images of the façade. The detection of occluding objects in the texture composition is possible thanks to the multiple images with parallaxes.

Façade decomposition. A façade is then systematically decomposed into a partition of rectangular patches based on the horizontal and vertical lines detected in the texture image. The decomposition is carried out top-down by a recursive subdivision, followed by a bottom-up merging, with detection of the architectural bilateral symmetry and repetitive patterns. The partition is finally structured into a Directed Acyclic Graph of rectilinear elementary patches. We also allow the user to edit the partition by simply adding and removing horizontal and vertical lines.

Façade augmentation. Each subdivided patch of the flat façade is augmented with the depth obtained from the MAP estimation of a Markov Random Field, with the data cost defined by the 3D points from the structure from motion.

Façade completion. The final façade geometry is automatically re-textured from all input images.

Our main technical contribution is the introduction of a systematic decomposition schema of the façade, structured into a Directed Acyclic Graph and implemented as a top-down recursive subdivision and bottom-up merging. This representation strongly embeds the architectural prior of the façades and buildings into the different stages of modeling. The proposed optimization for façade depth is also unique in that it operates on the façade surface and at the super-pixel level of a whole subdivision region.

4 Image Collection
Image capturing. We use a camera that usually faces orthogonal to the building façade and moves laterally along the streets. The camera should preferably be held straight, and the two neighboring views should have sufficient overlap to make the feature correspondences computable. The density and the accuracy of the reconstructed points vary, depending on the distance between the camera and the objects, and the distance between the neighboring viewing positions.

Structure from motion. We first compute point correspondences and structure from motion for a given sequence of images. There are standard computer vision techniques for structure from motion [Hartley and Zisserman 2004]. We use the approach described in [Lhuillier and
Quan 2005] to compute the camera poses and a semi-dense set of 3D point clouds in space. This technique is used because it has been shown to be robust and capable of providing sufficient point clouds for object modeling purposes.

Figure 3: A simple façade can be initialized from (a) a flat rectangle, (b) a cylindrical portion or (c) a developable surface.

5 Façade Initialization
In this paper, we consider that a façade has a dominant planar structure; a façade is therefore a flat plane with a depth field on the plane. We also expect and assume that the depth variation within a simple façade is moderate. A real building façade having complex geometry and topology can therefore be broken down into multiple simple façades: a building is merely a collection of façades, and a street is a collection of buildings. The dominant plane of the majority of façades is flat, but it can sometimes be curved as well, so we also allow the dominant surface structure to be any cylinder portion or any developable surface that can be swept by a straight line, as illustrated in Figure 3. To ease the description, but without loss of generality, we use a flat façade in the remainder of the paper. For a developable surface, the same methods as for flat façades are used in all steps, with trivial surface parameterizations. Some cylindrical façade examples are given in the experiments.

Algorithm 1: Photo Consistency Check for Occlusion Detection
Require: a set of N image patches P = {p1, p2, ..., pN} corresponding to the projections {x_i} of the 3D point X.
Require: η in [0,1] to indicate when two patches are similar.
 1: for all p_i in P do
 2:   s_i ← 0                         // accumulated similarity for p_i
 3:   for all p_j in P do
 4:     s_ij ← NCC(p_i, p_j)
 5:     if s_ij > η then s_i ← s_i + s_ij
 6:     end if
 7:   end for
 8: end for
 9: n̂ ← argmax_i s_i                 // n̂ is the patch with best support
10: V ← ∅                            // V is the index set with visible projection
11: O ← ∅                            // O is the index set with occluded projection
12: for all p_i in P do
13:   if s_{i,n̂} > η then V ← V ∪ {i}
14:   else O ← O ∪ {i}
15:   end if
16: end for
17: return V and O

5.1 Initial Flat Rectangle
The reference system of the 3D reconstruction can be geo-registered using GPS data of the camera if available, or using an interactive technique. As illustrated in Figure 2, the façade modeling process can begin with an existing approximate model of the buildings, often reconstructed from aerial images such as those publicly available from Google Earth and Microsoft Virtual Earth. Alternatively, if no such approximate model exists, a simple manual process in the current implementation is used to segment the façades, based on the projections of the 3D points onto the ground plane. We draw a line segment or a curve on the ground to mark up a façade plane as a flat rectangle or a developable surface portion. The plane or surface position is automatically fitted to the 3D points, or manually adjusted if necessary.

5.2 Texture Composition
The geometry of the façade is initialized as a flat rectangle. Usually, a façade is too big to be entirely observable in one input image. We first compose a texture image for the entire rectangle of the façade from the input images. This process is different from image mosaicing, as the images have parallax, which is helpful for removing undesired occluding objects, such as pedestrians, cars, trees, telegraph poles and trash cans, that lie in front of the target façade. Furthermore, the façade plane position is known, compared with an unknown spatial position in stereo algorithms. Hence, the photo consistency constraint is more efficient and robust for occluding object removal, yielding a better texture image than a pure mosaic.
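A compact Python rendering of Algorithm 1 might look as follows (a sketch assuming NumPy; the patch extraction around each projection and the value of η are left to the caller):

    import numpy as np

    def ncc(a, b):
        # Normalized cross correlation between two equal-size patches.
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return float(np.mean(a * b))

    def photo_consistency_check(patches, eta):
        # patches: per-camera image patches around the projections of one
        # 3D point X. Returns (V, O): visible and occluded camera indices.
        n = len(patches)
        s = np.array([[ncc(patches[i], patches[j]) for j in range(n)]
                      for i in range(n)])
        support = np.where(s > eta, s, 0.0).sum(axis=1)  # lines 1-8
        n_hat = int(np.argmax(support))                  # line 9
        V = [i for i in range(n) if s[i, n_hat] > eta]   # lines 12-16
        O = [i for i in range(n) if s[i, n_hat] <= eta]
        return V, O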
Multi-view occlusion removal. As in many multiple-view stereo methods, photo consistency is defined as follows. Consider a 3D point X = (x, y, z, 1)' with color c. If it has a projection x_i = (u_i, v_i, 1)' = P_i X in the i-th camera P_i, then under the Lambertian surface assumption the projection x_i should also have the same color c. However, if the point is occluded by some other object in this camera, the color of the projection is usually not the same as c. Note that c is unknown. Assume that point X is visible from multiple cameras, I = {P_i}, and occluded by some objects in the other cameras, I' = {P_j}. Then the color c_i of the projections in I should be the same as c, while it may differ from the color c_j of projections in I'. Now, given a set of projection colors {c_k}, the task is to identify the set, O, of the occluded cameras. In most situations, we can assume that point X is visible from most of the cameras. Under this assumption, we have c ≈ median_k{c_k}. Given the estimated color c of the 3D point, it is now very easy to identify the occluded set, O, according to the distances to c. To improve robustness, instead of a single color, image patches centered at the projections are used, and patch similarity, normalized cross correlation (NCC), is used as the metric. The details are presented in Algorithm 1. In this way, with the assumption that the façade is almost planar, each pixel of the reference texture corresponds to a point that lies on the flat façade; hence, for each pixel, we can identify whether it is occluded in a particular camera. Now, for a given planar façade in space, all visible images are first sorted according to their fronto-parallelism with respect to the given façade. An image is said to be more fronto-parallel if the projected surface of the façade in the image is larger. The reference image is first warped from the most fronto-parallel image, then from the lesser ones according to the visibility of the point.

Figure 4: Interactive texture refinement: (a) strokes drawn on the object to indicate removal; (b) the object is removed; (c) automatic inpainting; (d) some green lines drawn to guide the structure; (e) better result achieved with the guide lines.

Inpainting. In each step, due to the existence of occluding objects, some regions of the reference texture image may still be left empty. In a later step, if an empty region is not occluded and is visible from the new camera, the region is filled. In this way of multi-view inpainting, the occluded region is filled from each single camera. At the end of the process, if some regions are still empty, a normal image inpainting technique is used to fill them, either automatically [Criminisi et al. 2003] or interactively as described in Section 5.3. Since we have adjusted the cameras according to the image correspondences during bundle adjustment of structure from motion, this simple mosaic without explicit blending can already produce very visually pleasing results.

5.3 Interactive Refinement
As shown in Figure 4, if the automatic texture composition result is not satisfactory, a two-step interactive user interface is provided for refinement. In the first step, the user can draw strokes to indicate which object or part of the texture is undesirable, as in Figure 4(a). The corresponding region is automatically extracted based on the input strokes, as in Figure 4(b), using the method in [Li et al. 2004]. The removal operation can be interpreted as meaning that the most fronto-parallel and photo-consistent texture selection, from the result of Algorithm 1, is not what the user
wants. For each such pixel, the best-support patch n̂ from Line 9 of Algorithm 1 and the visible set V must have been wrong. Hence, P is updated to exclude V: P ← O. Then, if P ≠ ∅, Algorithm 1 is run again; otherwise, image inpainting [Criminisi et al. 2003] is used for automatic inpainting, as in Figure 4(c). In the second step, if the automatic texture filling is poor, the user can manually specify important missing structural information by extending a few curves or line segments from the known to the unknown regions, as in Figure 4(d). Then, as in [Sun et al. 2005], image patches are synthesized along these user-specified curves in the unknown region, using patches selected around the curves in the known region by Loopy Belief Propagation to find the optimal patches. After completing the structural propagation, the remaining unknown regions are filled using patch-based texture synthesis, as in Figure 4(e).

Figure 5: Structure-preserving subdivision: (a) input; (b) the hidden structure of the façade is extracted to form a grid; (c) the hypotheses are evaluated according to the edge support; (d) the façade is recursively subdivided into several regions; (e) since there is not enough support between regions A, B, C, D, E, F, G, H, they are all merged into one single region M.

6 Façade Decomposition
By decomposing a façade, we try to best describe the façade structure by segmenting it into a minimal number of elements. The façades that we are considering inherit natural horizontal and vertical directions by construction. In a first approximation, we may take all visible horizontal and vertical lines to construct an irregular partition of the façade plane into rectangles of various sizes. This partition captures the global rectilinear structure of the façades and buildings and also keeps all discontinuities of the façade substructures. It usually gives an over-segmentation of the image into patches, but this over-segmentation has several advantages: the over-segmenting lines can be regarded as auxiliary lines that regularize the compositional units of the façades and buildings, and some 'hidden' rectilinear structures of the façade introduced during construction can be rediscovered by this over-segmentation process.

6.1 Hidden Structure Discovery
To discover the structure inside the façade, the edges of the reference texture image are first detected [Canny 1986]. With such edge maps, a Hough transform [Duda and Hart 1972] is used to recover the lines.
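In a modern toolchain this step is a few library calls. The sketch below (assuming OpenCV; all thresholds are illustrative, and we use the probabilistic Hough variant rather than the paper's exact formulation) keeps only near-horizontal and near-vertical segments:

    import cv2
    import numpy as np

    def detect_grid_segments(texture_gray, tol_deg=2.0):
        # Edge detection followed by a Hough transform restricted, by
        # angle filtering, to the horizontal and vertical directions.
        edges = cv2.Canny(texture_gray, 50, 150)
        segs = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=80, minLineLength=40, maxLineGap=5)
        horiz, vert = [], []
        if segs is not None:
            for x1, y1, x2, y2 in segs[:, 0]:
                ang = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
                if ang < tol_deg or ang > 180.0 - tol_deg:
                    horiz.append((x1, y1, x2, y2))
                elif abs(ang - 90.0) < tol_deg:
                    vert.append((x1, y1, x2, y2))
        return horiz, vert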
To improve robustness, the direction of the Hough transform is constrained to only horizontal and vertical, which suits most architectural façades. The detected lines now form a grid that partitions the whole reference image, and this grid contains many non-overlapping short line segments obtained by taking intersections of Hough lines as endpoints, as in Figure 5(b). These line segments are now the hypotheses for partitioning the façade. The Hough transformation is good for structure discovery since it can extract hidden global information from the façade and align line segments to this hidden structure. However, some line segments in the formed grid may not really be a partition boundary between different regions. Hence, a weight, w_e, is defined for each line segment, e, to indicate the likelihood that this line segment is a boundary between two different regions, as shown in Figure 5(c). This weight is computed as the number of edge points from the Canny edge map covered by the line segment.

Remark on over-segmented partition. It is true that the current partition schema is subject to segmentation parameters, but it is important to note that a slightly over-segmented partition is usually not harmful for the purpose of modeling. A perfect partition certainly eases the regularization of the façade augmentation by depth, as presented in the next section. Nevertheless, an imperfect, particularly a slightly over-segmented, partition does not affect the modeling results when the 3D points are dense and the optimization works well.

Figure 6: Merging support evaluation: (a) edge weight support; (b) regional statistics support.

6.2 Recursive Subdivision
Given a region, D, in the texture image, it is divided into two rectangular subregions, D1 and D2, such that D = D1 ∪ D2, by the line segment L with the strongest support from the edge points. After D is subdivided into two separate regions, the subdivision procedure continues on the two regions D1 and D2 recursively. The recursive subdivision procedure stops if either the target region D is too small to be subdivided, or there is not enough support for a division hypothesis, i.e., region D is very smooth.

For a façade, bilateral symmetry about a vertical axis may not exist for the whole façade, but it often exists locally and can be used for more robust subdivision. First, for each region D, the NCC score, s_D, of the two halves D1 and D2, vertically divided at the center of D, is computed. If s_D > η, region D is considered to have bilateral symmetry. Then the edge maps of D1 and D2 are averaged, and subdivision is done recursively on D1 only. Finally, the subdivision of D1 is reflected across the axis to become the subdivision of D2, and the two subdivisions are merged into the subdivision of D.

Recursive subdivision preserves the boundaries of man-made structural styles well. However, it may produce some unnecessary fragments for depth computation and rendering, as in Figure 5(d). Hence, as post-processing, if two neighboring leaf subdivision regions, A and B, do not have enough support, s_AB, to separate them, they are merged into one region. The support, s_AB, to separate two neighboring regions, A and B, is defined as the strongest weight of all the line segments on the border between A and B: s_AB = max_e{w_e}. However, the weights of line segments can only offer a local image statistic on the border. To improve robustness, a dual region statistic between A and B can be used more globally. As in Figure 6, since regions A and B may not have the same size, this region statistic similarity is defined as
follows: First, an axis is defined on the border between A and B, and region B is mirrored across this axis to give a region B'. The overlapped region A ∩ B' is defined to be the pixels from A with locations inside B'. In a similar way, A' ∩ B contains the pixels from B with locations inside the mirrored region A', and it is then mirrored across the same axis to become (A' ∩ B)'. The normalized cross correlation (NCC) between A ∩ B' and (A' ∩ B)' is used to define the regional similarity of A and B. In this way, only the symmetric parts of A and B are used for region comparison, avoiding the effect of the far-away parts of the regions, which would matter if the sizes of A and B were dramatically different and global statistics, such as a color histogram, were used. Weighted by a parameter κ, the support, s_AB, to separate two neighboring regions, A and B, is now defined as

s_AB = max_e{w_e} − κ · NCC(A ∩ B', (A' ∩ B)').

Note that the representation of the façade is a binary recursive tree before merging and a Directed Acyclic Graph (DAG) after region merging. The DAG representation innately supports the level-of-detail rendering technique: when great detail is demanded, the rendering engine goes down the rendering graph to expand all detailed leaves and render them correspondingly; vice versa, the intermediate node is rendered and all its descendants are pruned at rendering time.

Figure 7: A DAG for repetitive pattern representation: (a) façade; (b) DAG.

6.3 Repetitive Pattern Representation
Repetitive patterns exist locally in many façades, and most of them are windows. [Müller et al. 2007] used a complicated technique for the synchronization of subdivisions between different windows. To save storage space and to ease the synchronization task, our method maintains only one subdivision representation for each type of window. Precisely, a window template is first detected by a trained model [Berg et al. 2007] or manually indicated on the texture images. The templates are matched across the reference texture image using NCC as the measurement. If good matches exist, they are aligned to the horizontal or vertical direction by hierarchical clustering, and the Canny edge maps of these regions are averaged. During the subdivision, each matched region is isolated by shrinking a bounding rectangle on the average edge maps until it snaps to strong edges, and it is regarded as a whole leaf region. The edges inside these isolated regions should not affect the global structure, and hence these edge points are not used during the global subdivision procedure. Then, as in Figure 7, all the matched leaf regions are linked to the root of a common subdivision DAG for that type of window, by introducing 2D translation nodes for the pivot positions. Recursive subdivision is again executed on the average edge maps of all matched regions. To preserve photo-realism, the textures in these regions are not shared; only the subdivision DAG and the respective depths are shared. Furthermore, to improve the robustness of the subdivision, vertical bilateral symmetry is taken as a hard constraint for windows.

6.4 Interactive Subdivision Refinement
In most situations, the automatic subdivision works satisfactorily. If the user wants to refine the subdivision layout further, three line operations and two region operations are provided. The current automatic subdivision operates in the horizontal and vertical directions for robustness and simplicity; the fifth 'carve' operator
allows the user to sketch arbitrarily shaped objects manually, which appear less frequently, to be included in the façade representation.

Add: In an existing region, the user can sketch a stroke to indicate the partition, as in Figure 8(a). The edge points near the stroke are forced to become salient, and hence the subdivision engine can recover the line segment and partition the region.

Delete: The user can sketch a zigzag stroke to cross out a line segment, as in Figure 8(b).

Change: The user can first delete the partition line segments and then add a new line segment. Alternatively, the user can directly sketch a stroke; the line segment crossed by the stroke will then be deleted and a new line segment constructed accordingly, as in Figure 8(c). After the operation, all descendants with the target
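To make the merging support of Section 6.2 concrete, here is a rough sketch (our simplification, assuming NumPy, a vertical border axis and equal-height rectangular strips; κ and the border edge weights come from the hidden-structure step):

    import numpy as np

    def merge_support(border_edge_weights, region_a, region_b, kappa=0.5):
        # s_AB = max_e{w_e} - kappa * NCC over the symmetric overlap of
        # A and B, where B is mirrored across the shared vertical axis.
        w = max(border_edge_weights)
        width = min(region_a.shape[1], region_b.shape[1])
        a = region_a[:, -width:].astype(np.float64)          # A's side of the axis
        b = region_b[:, :width][:, ::-1].astype(np.float64)  # B mirrored
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return w - kappa * float(np.mean(a * b))

A low s_AB means neither the border edges nor the region statistics justify keeping A and B apart, so the two leaves are merged.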

GaussianFace ("Gaussian face") for Face Recognition

{lc013, xtang}@.hk
arXiv:1404.3840v3 [cs.CV] 20 Dec 2014
Abstract
Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on the Discriminative Gaussian Process Latent Variable Model, named GaussianFace, to enrich the diversity of training data. In comparison to existing methods, our model exploits additional data from multiple source-domains to improve the generalization performance of face verification in an unknown target-domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalizing to an unseen domain. Specifically, our algorithm achieves an impressive accuracy of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark [23]. For the first time, human-level performance in face verification (97.53%) [28] on LFW is surpassed.

A New Face Recognition Method Based on a Combination of ICA and LDA


Abstract: Feature extraction is a hot topic in the field of pattern recognition research. This paper proposes a feature extraction method based on independent component analysis (ICA) and linear discriminant analysis (LDA). The method introduces the concept of the null space, points out the shortcomings of previous algorithms, and gives a complete combined ICA-LDA algorithm. Experiments on the ORL and Yale face databases demonstrate the effectiveness of the method.
Keywords: face recognition, feature extraction, independent component analysis, linear discriminant analysis, null space
[...] the source signals. Its basic idea is as follows: suppose that M observed signals x_i (i = 1, 2, ..., M) have been obtained; each observed
A New Method of Fusion of ICA and LDA for Face Recognition
ZHENG Yu-Jie, YU Dong-Jun, YANG Jing-Yu, WU Xiao-Jun, WANG Wei-Dong
(Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094)
(School of Electronics and Information, Jiangsu University of Science and Technology, Zhenjiang 212003)

Abstract: ICA can extract features from higher-order statistics, and LDA can extract features which are useful for classification. However, the performance of ICA methods combined with LDA was worse than reported previously. In this paper, we point out the weaknesses of the previous methods, and a new method of feature extraction based on ICA and LDA with the concept of null space is proposed. Experimental results carried out on face databases demonstrate the effectiveness of the proposed method.
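The general ICA-plus-LDA pipeline the abstract refers to can be sketched with standard tools (assuming scikit-learn; the component count and the 1-NN classifier are illustrative choices, and this plain pipeline does not implement the paper's null-space refinement):

    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier

    def ica_lda_classify(X_train, y_train, X_test, n_components=60):
        # Rows of X_* are vectorised face images. ICA extracts
        # higher-order-statistics features; LDA then finds the
        # discriminant projection used for nearest-neighbour matching.
        ica = FastICA(n_components=n_components, random_state=0,
                      max_iter=1000)
        s_train = ica.fit_transform(X_train)
        s_test = ica.transform(X_test)
        lda = LinearDiscriminantAnalysis()
        f_train = lda.fit_transform(s_train, y_train)
        f_test = lda.transform(s_test)
        knn = KNeighborsClassifier(n_neighbors=1).fit(f_train, y_train)
        return knn.predict(f_test)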

A Face Recognition Algorithm Fusing Gabor and Local Tangent Space Alignment

Computer Engineering and Applications (计算机工程与应用)

A Face Recognition Algorithm Fusing Gabor and Local Tangent Space Alignment
CHENG Kun, SHU Qin, LUO Wei
(College of Electrical Engineering and Information, Sichuan University, Chengdu 610065, China)

Abstract (fragment): ... for dimensionality reduction; principal component analysis (PCA) and linear discriminant analysis (LDA) are also introduced into the algorithm to determine the optimal projection subspace for classification and recognition with a nearest-neighbour classifier. Experiments on the ORL face database demonstrate the effectiveness of the algorithm, and the features extracted with Gabor wavelets show good robustness to variations in illumination and expression.
Keywords: Gabor wavelet; local tangent space alignment; principal component analysis; linear discriminant analysis
Document code: A; CLC number: TP391

English abstract (fragments): ... To solve the problems, Partitional Local Tangent Space Alignment (PLTSA) is introduced ... to operate on the Gabor Magnitude Features (GMF) to extract the sub-manifolds; at the same time, PCA and LDA are in[troduced] ...
Key words: Gabor wavelet; Partitional Local Tangent Space Alignment (PLTSA); Principal Component Analysis (PCA) ...
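A rough scikit-learn sketch of such a pipeline (LTSA via LocallyLinearEmbedding; the neighbour count, target dimension and 1-NN classifier are our placeholder choices, and the partitioned PLTSA variant is not reproduced):

    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier

    def gabor_ltsa_model(gabor_features, labels, n_neighbors=12, n_dim=10):
        # gabor_features: one Gabor magnitude feature vector per image.
        # LTSA maps the features onto their low-dimensional sub-manifold;
        # PCA and LDA then fix the projection subspace in which a 1-NN
        # classifier performs the final recognition.
        ltsa = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                      n_components=n_dim, method='ltsa')
        embedded = ltsa.fit_transform(gabor_features)
        reduced = PCA(n_components=min(n_dim, embedded.shape[1]))
        reduced = reduced.fit_transform(embedded)
        lda = LinearDiscriminantAnalysis()
        projected = lda.fit_transform(reduced, labels)
        return KNeighborsClassifier(n_neighbors=1).fit(projected, labels)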

A Facial Expression Recognition Method Combining Gabor Transform and FastICA

[...] the algorithm's resistance to noise, as well as further improving the generalization ability and effectiveness of the algorithm.
The average recognition rate of Scheme 2 is somewhat lower than that of Scheme 1. This is because in Scheme 2 the individuals in the training set and the test set are completely different, so the similarity between faces of the same person cannot be exploited to aid discrimination, and only the expression features themselves can be used. Even when different individuals display the same expression, their expression features still differ.

A Face Recognition Algorithm Combining Gabor Wavelets and Local Binary Patterns

A Face Recognition Algorithm Combining Gabor Wavelets and Local Binary Patterns
CHEN Pei-zhi, CHEN Shui-li, CHEN Guo-long
(1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, China; 2. School of Science, Jimei University, Xiamen, Fujian 361021, China; 3. Research Center of Image Information Engineering Technology, Jimei University, Xiamen, Fujian 361021, China)

[Abstract] To address the drawbacks of the traditional Gabor wavelet approach, namely high extracted-feature dimensionality and long recognition time, the way the Gabor wavelets are used is improved. First, the magnitudes of the Gabor wavelets are multiplied directly with the face image to obtain Gabor images; next, the local binary pattern is used to obtain texture images; then the histogram information of the texture images is extracted and used as the features of the face image; finally, a support vector machine is used as the classifier, achieving a 95.0% recognition rate on the unpreprocessed ORL face database, with an average recognition time of 0.14 s per face image.

... The feature dimensionality obtained by the histogram sequence is extremely high, so there is a large amount of feature redundancy, consuming considerable storage space and recognition time. This paper combines Gabor wavelets with LBP in search of a new feature extraction method that reduces the feature dimensionality. Finally, a support vector machine (SVM) with a radial basis kernel function is adopted as the classifier.

... (Local Binary Pattern, hereafter LBP). Gabor wavelets are convolved with the face image through a bank of multi-scale, multi-orientation filters, and the extracted features can reflect the characteristics of the visual neurons of the human brain. As a texture description method, LBP can effectively ...
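A loose sketch of the Gabor-plus-LBP feature extraction and SVM classification described here (assuming OpenCV, scikit-image and scikit-learn; the filter settings and histogram binning are ours, and we apply LBP to the Gabor magnitude responses, one common reading of the combination):

    import cv2
    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.svm import SVC

    def gabor_lbp_features(img, ksizes=(9, 15, 21), n_orient=4):
        # For each Gabor response: LBP-encode it into a texture image,
        # keep only the histogram, and concatenate the histograms.
        img = img.astype(np.float32)
        hists = []
        for ksize in ksizes:
            for k in range(n_orient):
                kern = cv2.getGaborKernel((ksize, ksize), ksize / 3.0,
                                          k * np.pi / n_orient,
                                          ksize / 2.0, 1.0, 0)
                mag = np.abs(cv2.filter2D(img, cv2.CV_32F, kern))
                texture = local_binary_pattern(mag, P=8, R=1,
                                               method='uniform')
                h, _ = np.histogram(texture, bins=10, range=(0, 10))
                hists.append(h / (h.sum() + 1e-8))
        return np.concatenate(hists)

    # train_X = np.stack([gabor_lbp_features(i) for i in train_images])
    # clf = SVC(kernel='rbf').fit(train_X, train_labels)  # RBF-kernel SVM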

Face Description and Identification Based on Histogram Sequences of Local Gabor Binary Patterns (Zhang Wen-Chao et al.)

ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@
Journal of Software, Vol. 17, No. 12, December 2006, pp. 2508-2517. DOI: 10.1360/jos172508. Tel/Fax: +86-10-62562563
© 2006 by Journal of Software. All rights reserved.

Histogram Sequence of Local Gabor Binary Patterns for Face Description and Identification (基于局部Gabor变化直方图序列的人脸描述与识别)
ZHANG Wen-Chao(1+), SHAN Shi-Guang(2), ZHANG Hong-Ming(1), CHEN Jie(1), CHEN Xi-Lin(2), GAO Wen(1,2)
1 (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
2 (Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, China)
+ Corresponding author: Phn: +86-10-58858300 ext 316, Fax: +86-10-58858301, E-mail: wczhang@

Zhang WC, Shan SG, Zhang HM, Chen J, Chen XL, Gao W. Histogram sequence of local Gabor binary pattern for face description and identification. Journal of Software, 2006, 17(12): 2508-2517. .cn/1000-9825/17/2508.htm

Abstract: In this paper, a method for face description and recognition is proposed which extracts the histogram sequence of local Gabor binary patterns (HSLGBP) from the magnitudes of Gabor coefficients, since the Gabor feature is robust to illumination and expression variations and has been successfully used in the face recognition area. First, the proposed method decomposes the normalized face image by convolving it with multi-scale and multi-orientation Gabor filters to extract the corresponding Gabor magnitude maps (GMMs). Then, the local binary pattern (LBP) operator is applied to each GMM to extract the local neighbor patterns. Finally, the input face image is described by the histogram sequence extracted from all these region patterns. By combining the Gabor transform, LBP and spatial histograms, the proposed method is robust to illumination, expression and misalignment. In addition, this face modeling method does not need a training set for statistical learning, and thus avoids the generalizability problem. Moreover, how to combine statistical methods at the classification stage is discussed, and a statistical Fisher-weighted HSLGBP matching method is proposed.
The results compared with the published results on FERET facedatabase of changing illumination, expression and aging verify the validity of the proposed method.Key words: face recognition; Gabor filter; local binary pattern (LBP); histogram摘 要: 提出了一种在Gabor变换幅值域内提取局部变化模式空间直方图序列(histogram sequence of localGabor binary patterns,简称HSLGBP)的人脸描述及其识别方法.鉴于Gabor特征对光照、表情等变化比较鲁棒,并已在人脸识别领域得到成功应用,首先对归一化的人脸图像进行多方向、多分辨率Gabor小波滤波,并提取其对应不同方向、不同尺度的多个Gabor幅值域图谱(Gabor magnitude map,简称GMM),然后在每个GMM上采用* Supported by the National Natural Science Foundation of China under Grant Nos.60332010, 60673091 (国家自然科学基金); the“100 Talents Program” of CAS (中国科学院“百人计划”); the ISVISION Technologies Co., Ltd (上海银晨智能识别科技有限公司资助项目)Received 2005-07-13; Accepted 2005-12-31张文超等:基于局部Gabor 变化直方图序列的人脸描述与识别2509局部二值模式(local binary pattern,简称LBP)算子抽取局部邻域关系模式,最后由这些模式的区域直方图形成的序列来描述人脸.Gabor变换、LBP、空间区域直方图的采用使得该方法对光照变化、表情变化、误配准等具有良好的鲁棒性.而且,这种人脸建模方法不需要基于训练集合进行统计学习,因而不存在推广性问题.同时,进一步探讨了如何在分类器设计阶段与统计方法进行结合的问题,提出了统计Fisher加权的HSLGBP匹配方法.在通过FERET人脸库光照、表情和时间变化测试集上与已发表的实验结果进行对比,充分验证了该方法的有效性.关键词: 人脸识别;Gabor滤波器;局部二值模式(LBP);直方图中图法分类号: TP391文献标识码: A作为模式识别、图像分析与理解等领域的典型研究课题,人脸识别不仅在理论上具有重要价值,而且在安全、金融等领域具有重要的应用前景,因此在学术界和产业界都受到了广泛的关注,目前已经出现了一些实用的商业系统.然而,由于图像采集条件和人脸自身属性的变化,例如图像采集时的光照、视角、摄像设备的变化,人脸的表情、化妆、年龄变化等,都可能使得同一人的不同照片表观差别很大,造成识别上的困难.因此,提高人脸识别系统对这些变化的鲁棒性成为该领域研究人员的重要目标之一[1,2].为实现鲁棒的识别,可以从人脸建模、分类器设计等不同角度入手:前者试图从寻求对各种外界条件导致的图像变化具有不变性的人脸描述入手;而后者则试图使得分类器对特征的变异有足够的鲁棒性.另外,也可以同时从两个角度入手解决问题.在人脸识别的早期,基于表观(appearance)的人脸识别方法往往直接采用图像灰度作为特征进行分类或者特征选择的基础,如Correlation[3],Eigenfaces[4],Fisherfaces[5]等.而近年来,对图像灰度进行多尺度、多方向的Gabor小波变换逐渐成为主流的思路之一,这主要是因为Gabor小波可以很好地模拟大脑皮层中单细胞感受野的轮廓,捕捉突出的视觉属性,例如空间定位、方向选择等[6].特别是Gabor小波可以提取图像特定区域内的多尺度、多方向空间频率特征,像显微镜一样放大灰度的变化,这样,人脸图像中的眼睛、鼻子和嘴以及其他一些局部特征被放大.因此,采用Gabor变换来处理人脸图像,可以增强一些关键特征,以区分不同人脸图像.Gabor小波也因此在人脸识别中得到了广泛的应用,如弹性图匹配[7]、基于Gabor特征的Fisher判别[8]、基于AdaBoost 的Gabor特征分类[9]等.但是,由于多尺度、多方向的Gabor分解使得数据的维数大量增加,尤其当图像尺寸偏大时更为严重,为避免维数灾难问题必须进行降维.弹性图匹配方法仅对人脸图像中部分关键特征点进行Gabor变换,并将人脸描述为以这些特征点位置为顶点、以其Gabor变换系数为顶点属性、以其关键点位置关系为边属性的属性图,从而将人脸识别问题转化为图匹配问题[7],但该方法对特征点的选择和配准有较高的要求.直接降维的方法是在Gabor变换系数下采样,然后采用Fisher判别分析方法进一步提取特征[8],但简单下采样很可能造成一些重要信息的丢失.针对该问题,文献[9]提出了一种基于AdaBoost对高维Gabor特征进行选择的降维方法,采用机器学习的方法更加客观地利用Gabor特征,从而能在有效降维的同时提高识别性能.上述方法在一些人脸库上取得了很好的识别结果,但由于它们采用了统计或者学习的策略,因此都不可避免地遇到推广性的问题,即算法性能在很大程度上依赖于测试集和训练集之间数据分布的相似程度.尽管统计学习理论以该问题为核心进行了深入的理论探讨[10],但在模式识别算法应用实践上,该问题仍然很棘手,尤其是对很多人脸识别实用系统而言,往往不可能获得待识别人的多个样本,这就意味着不可能对它们进行有效的训练.例如,在护照、驾照、身份证等的验证、大规模人脸图像库上的照片检索等应用中,每个人都只提供了单幅图像,很难进行针对性的训练.解决该问题的可能思路有两条:其一是仍然采用统计、学习策略,寄希望于其泛化能力;其二则是考虑非统计学习的策略,转向采用直接的模型匹配方法.近年来,基于局部二值模式(local binary pattern,简称LBP)[11]的人脸识别方法受到人们的关注,该方法来源于纹理分析领域.它首先计算图像中每个像素与其局部邻域点在亮度上的序关系,然后对二值序关系进行编码形成局部二值模式,最后采用多区域直方图作为图像的特征描述.该方法在FERET人脸图像数据库上取得了很好的识别性能.LBP方法本质上提取的是图像边缘、角点等局部变化特征,它们对于区分不同的人脸是很重要的.但是,边缘特征有方向性和尺度差异,角点特征也同样有不同的尺度,或者说边缘、角点等图像特征的方向性和尺度包2510 Journal of Software 软件学报 V ol.17, No.12, December 2006 含了更多的可以区分不同人脸的细节信息,而原始的LBP 算子却不能提取这些特征.基于此,本文提出了一种称为局部Gabor 二值模式直方图序列的人脸描述及其相应的识别方法(histogram sequence of local Gabor binary patterns,简称HSLGBP).该方法首先利用Gabor 变换提取多方向、多尺度的局部图像特征,然后应用LBP 算子对这些特征进行编码,最后采用空间直方图进行人脸建模,人脸的分类识别则通过直方图匹配来实现.在此基础上,本文还进一步考察了基于Fisher 准则对不同区域加权的直方图匹配策略,以进一步提高识别精度.与直接从图像的灰度计算LBP 空间直方图相比,Gabor 变换和LBP 的结合提取了更多方向、更多尺度的局部特征,从而有效地增强了空间直方图的表示能力,我们在FERET 等人脸库上的实验有力地表明了这一点.尤其是使用加权HSLGBP 在FERET 人脸库上的表情、光照和时间变化几个测试集上均取得了目前已知的最好结果.这进一步表明HSLGBP 对人脸图像条件的变化是鲁棒的,而且具有很好的判别能力.1 基于HSLGBP 的人脸描述方法本文提出的HSLGBP 人脸描述提取过程如图1所示,主要包括以下步骤:(1) 根据手工标定人眼位置对输入人脸图像归一化;(2) 将40个不同尺度、不同方向的Gabor 滤波器分别与归一化图像卷积,取每个卷积结果的幅值部分,这里称为Gabor 幅值图谱(Gabor magnitude map,简称GMM);(3) 对每个GMM 应用LBP 运算得到局部Gabor 二值模式(local Gabor binary 
pattern,简称LGBP)图谱;(4) 将每个LGBP 图谱划分为互不重叠且具有特定大小的多个矩形区域,对每个区域计算直方图;(5) 将GMM 的所有区域的直方图串接为一个直方图序列,作为人脸图像的描述.下面详细阐述这一过程.Fig.1 The framework to extract the proposed HSLGBP face representation图1 本文提出的HSLGBP 人脸描述提取方法框架1.1 Gabor 幅值图谱考虑到如前所述Gabor 变换的优良特性,我们使用Gabor 滤波器来分解输入的人脸图像.本文使用的Gabor 滤波器如式(1)所示[12]. []2222,,2,222,e e e )(σσνµνµνµνµσψ−⎟⎠⎞⎜⎝⎛−−=z ik z k k z (1) 其中,z =(x ,y );||⋅||表示范数运算;µφννµi k k e,=,ννf k k max =,8µφµπ=,µ和ν分别表示Gabor 滤波器的方向和尺度. 人脸图像的Gabor 特征由人脸图像和Gabor 滤波器的卷积得到.令f (x ,y )表示人脸图像的灰度分布,那么f (x ,y )和Gabor 滤波器的卷积可定义为)(),(),,,(,z y x f y x G µνψµν∗= (2) 其中,‘*’表示卷积运算.这样,由一系列的}4,...,1,0{∈ν和µ∈{0,1,…,7}即可得到人脸图像的多层Gabor 滤波器分 解表示.1.2 局部Gabor 二值模式Gabor 变换得到的是由实部和虚部组成的复数,包含幅值和相位谱.其中,相位谱随着空间位置呈周期性变化[7],因而通常认为不适合作为人脸特征.而幅值的变化相对平滑而稳定,因此,本文仅用变换后的幅值作为人脸Normalized imageGMMs …………LGBPs Original image Gabor ……张文超 等:基于局部Gabor 变化直方图序列的人脸描述与识别 2511 特征的描述.由于与40个Gabor 滤波器进行卷积,Gabor 特征的维数相对于原始图像维数急剧增加,必须进行降维.常用的降维方法有两种:一种是首先进行均匀下采样,然后进行特征提取(如主成分分析)[8],但下采样可能会丢失一些重要的判别特征;另一种方法,如弹性图匹配[7],只对人脸图像上选择的一些特征点做Gabor 变换,但这样既对特征点的定位要求非常高,又对特征点的选择敏感.本文采用LBP 算子来编码Gabor 幅值的邻域变化,并用直方图分析LBP 编码后的局部变化属性,这样直接对GMM 分析,避免了因下采样造成的信息损失,同时又通过直方图统计达到了降维的目的.近年来,LBP 算子在人脸识别领域中受到关注[11,13,14],该算子[15]对图像每个像素f c 的8邻域采样,每个采样点f p (p =0,1,…,7)与中心像素f c 做灰度值二值化运算S (f p −f c ).⎩⎨⎧<≥=−c p c p c p f f f f f f S ,0,1() (3) 其中f c 表示该中心像素的灰度值;f p 表示采样点的灰度值.然后,通过对每个采样点赋予不同的权系数2p 来计算该f c 的LBP 值,p p c p f f S LBP 2)(7∑=−= (4)LBP 运算刻画了局部图像纹理的空间结构.本文对Gabor 幅值进行LBP 运算,简记为LGBP.p p c p y x G y x G S LGBP 2)),,,(),,,((70∑=−=νµνµ(5)1.3 LGBP 空间区域直方图序列(HSLGBP )人脸表观由于表情、时间等变化呈现多样性,带来识别上的困难.基于全局的人脸表示方法对这种面部的局部变化不鲁棒,而基于区域的分析通常会较好地解决这一问题[16].基于此,我们将人脸图像划分为多个区域进行分析,即将LGBP 划分为多个不相交的矩形区域,并用直方图刻画每个区域的分布属性.这样,高维LGBP 变为低维的直方图,人脸图像用所有直方图串接而成的直方图序列描述.我们采用直方图对LGBP 进行统计,同时通过选择合适的直方图Bin 数达到对LGBP 降维的目的.灰度范围是[0,L −1]的图像f (x ,y )的直方图可定义为∑−===y x i L i i y x f I ,1,...,1,0 },),({h(6)其中i 表示第i 个灰度级;h i 是具有第i 级灰度的像素的数目,并且⎩⎨⎧=false is ,0true is ,1}{A A A I (7) 将每个LGBP 图谱划分为m 个区域R 0,R 1,…,R m −1,根据式(8)从每个区域提取直方图:7,...,1,0 ;4,...,1,0 ;1,...,1,0 ;1,...,1,0 },),{()},,,({),(,,,==−=−=∈=∑∈µννµµνm j L i R y x I y x LGBP I Ηj R y x i R j j (8)将所有的直方图串接为一个序列ℜ作为最终的人脸描述,本文称其为HSLGBP. 
),...,,...,,,...,(1,7,41,1,00,1,01,0,00,0,0−−−=ℜm m m ΗΗΗΗΗ(9) 1.4 HSLGBP 描述方法的性质分析HSLGBP 人脸描述方法计算过程简单、直接,无需训练集,且具有对表情、老化、误配准等各种变化鲁棒的特点,分析如下:1) 对各种图像变化的鲁棒性分析表情和时间的变化主要表现在面部的局部区域,如吃惊会使眼睛和嘴发生较大变化、皱纹出现在额头和眼角等.基于全局的人脸描述通常对面部的局部变化不鲁棒,但基于局部的人脸描述可以较好地解决这个问题[16].由于本文采用基于区域分析的人脸描述方法,同时直方图序列保留了人脸图像的空间结构信息,因而在保留对人脸图像的整体描述的同时对面部的局部变化也会鲁棒.2512Journal of Software 软件学报 V ol.17, No.12, December 20062) 对图像误配准的鲁棒性分析 由于特征点定位不精确带来的图像误配准问题往往会引起人脸描述特征的较大“变异”,从而影响最终人脸识别系统的分类识别性能[17].而HSLGBP 人脸描述方法采用的Gabor 特征提取方法,其提取的是局部图像特征,即每个Gabor 特征都是原图像内一定区域范围内的若干个像素共同作用的结果,而不只是单一像素的性质,因此,Gabor 幅值变化更加平缓,即图像中一定范围内的Gabor 卷积幅值相差不大.这样,当图像中的特征定位出现一定偏差时,对Gabor 变换的幅值结果影响不大.同时,后端我们进一步采用了描述局部变化的区域直方图,更提高了HSLGBP 人脸描述方法对图像误配准的鲁棒性.3) 无须训练,具有良好的推广性主流的基于统计或者学习策略的人脸识别方法尽管取得了一定的成功,但是,该类方法需要训练数据用于人脸建模,因而存在推广性的问题,即由于测试数据分布与训练数据的分布存在差异,尤其当其分布差异较大时,识别性能下降很大.由于测试数据的未知性,通过扩大训练集的方式并不能很好地解决泛化能力的问题.而基于HSLGBP 的人脸建模不需要训练集进行训练,且基于HSLGBP 的匹配也无须训练,因而避免了基于统计学习的人脸建模中的推广能力的问题.2 基于HSLGBP 的人脸识别方法由于HSLGBP 是由多个直方图组成的序列,与传统意义上的“人脸描述特征”大相径庭,不能采用欧式距离、余弦夹角之类的相似性度量方法,所以,本文采用直方图的交作为两个直方图序列之间的匹配准则.另一方面,构成HSLGBP 的每个直方图是从不同的面部区域中提取获得,而不同的面部区域对分类识别的贡献显然应该是不同的,基于此,本文还提出了一种基于Fisher 判据加权的HSLGBP 匹配策略,通过对不同的区域直方图赋予不同的权值来提高识别的精度.2.1 基于直方图交的HSLGBP 匹配直方图交是一种简单而有效的直方图相似度度量方法,其计算方法为∑==L i i i h h H H 12121),min(),(Ψ (10) 其中,H 1和H 2表示两个直方图;L 是直方图Bin 的数目.这种度量方式是以两直方图H 1和H 2之间的相同部分的大小来衡量两直方图之间相似性的强弱的.使用这种度量方式,基于HSLGBP 的人脸描述方法的样本间的相似度可用式(11)计算.∑∑∑==−==ℜℜ4070102,,1,,21),(),(νµm r r µνr µνH H S Ψ(11)2.2 Fisher 准则加权的HSLGBP 匹配如前所述,表情和时间变化对面部的特定区域产生影响.并且,人脸图像中通常作为关键特征的区域,如眼睛、嘴角等本身具有较强的判别能力,而额头、面颊等区域则包含较少的判别信息.因此,对人脸的不同区域赋予不同的权值以反映各个区域的变化规律,将对识别产生积极的影响.据此分析,当进行HSLGBP 匹配时,我们对不同的空间区域直方图赋予不同的权值.这时,相似度计算公式(11)可重写为式(12)的形式:∑∑∑==−==ℜℜ′4070102,,1,,,,21),(),(νµm r r µνr µνr µνH H W S Ψ (12)这里,r µνW ,,是第g 个LGBP 图谱的第r 个区域的权值.本文基于Fisher 线性判别分析,即类内散度尽可能小而类 间散度尽可能大来分析每个区域的权值.但由于人脸识别问题是一个多类分类问题,为应用Fisher 线性判别准则,本文借鉴文献[18]提出的将多类问题转化为两类问题的“类内差”空间和“类间差”空间的思想,对于LGBP 图谱中的每个区域,同一类样本的不同样本间的相似度形成“类内相似度”空间,不同类样本的样本间的相似度形成“类间相似度”空间.张文超 等:基于局部Gabor 变化直方图序列的人脸描述与识别2513假设对于C 类问题,由上述转化可得:类内相似度均值为 ∑∑∑==−=−=C i N k k j k i r µνj i r µνi i r W i H H N N C m 1211),(,,),(,,),,(),()1(21Ψµν (13) 其中,N i 是第i 类样本的样本数;),(,,j i r µνH 表示从第i 类的第j 个样本的第),(µν个LGBP 图谱的第r 个区域提取的 直方图.则类内相似度方差为∑∑∑==−=−=C i N k k j r µνW k i r µνj i r µνr µνW i m H H S 12112),,(),(,,),(,,2),,()),((Ψ (14) 类间相似度均值为()∑∑∑∑−=+===−=11111),(,,),(,,),,(,1)1(2C i C i j N k N l l j r µνk i r µνj i r µνB i jH H N N C C m Ψ (15) 类间相似度方差为∑∑∑∑−=+===−=111112),,(),(,,),(,,2),,()),((C i C i j N k N l r B l j r k i r r B i j m H H S µνµνµνµνΨ (16)最终,得到r µνW ,, 2),,(2),,(2),,(),,(,,)(r µνB r µνW r µνB r µνW r µνS S m m W +−= (17)本文应用上述两种度量方式作为不同直方图序列的相似度度量准则,使用最近邻分类准则进行分类.3 对比实验与分析为了评测本文所提方法的性能,我们使用人脸识别研究领域中广泛应用的FERET 人脸库[2]进行了测试.这是一个规模比较大的人脸数据库,且提供了指定的训练集、原型集(gallery)和测试集(probe).其中,训练集由429人的1 002幅中性表情和表情变化的图像构成.原型集是由1 196人的每人一幅正面图像构成.4个测试集分别是:fb,fc,Dup.I 和Dup.II.其中,fb 含有1 195幅与原型集同时采集,并且与原型集图像光照相同的表情变化图像;fc 中含有194幅与原型集图像采集光照条件不同的人脸图像;在Dup.I 和Dup.II 中,分别有722幅和234幅图像,采集时间距原型集分别为一个月和一年左右.3.1 评测比对基准算法为了评价本文提出方法的有效性,我们选择了当前主流的人脸识别方法Fisherfaces,基于Gabor 特征的Fisher 判别作为基准比对算法.同时,与FERET’97评测以及文献[11]在FERET 测试集上的最好结果做比对.3.1.1 Fisherfaces [5]Fisherfaces 利用Fisher 判别准则对数据进行变换,即使得变换后的数据类内散度尽可能小,而类间散度尽可能大.Fisher 线性判别分析是寻找变换矩阵W ,该矩阵是由b wS S T 1−=的按特征值排序的特征向量构成的,其中,S b 是类间散度矩阵;S w 是总类内散度矩阵.原始数据x 在变换矩阵W 上投影形成新的特征y ,即y =W T x .测试集数据与原型集数据分别经过上述变换,然后应用最近邻进行分类.3.1.2 基于Gabor 特征的Fisher 判别[8]该方法首先对图像进行多尺度、多方向的Gabor 变换,然后对所得高维数据进行均匀下采样,由于下采样的结果也不能满足类内散度矩阵满秩的条件,所以运用主成分分析对下采样的结果进一步降维,然后进行Fisher 
线性判别分析,本文将其简记为GFC.由于该方法与本文所提出方法的初始特征相同,选择该方法作比对可以更好地衡量本文所提出方法的性能.同时,也可以从该方法的结果看出基于图像的灰度特征(Fisherfaces)与基于Gabor 特征的识别结果的差异.2514Journal of Software 软件学报 V ol.17, No.12, December 20063.1.3 FERET’97[2]评测最好结果 FERET’97[2]评测是对人脸识别具有重要意义的一个评测.首先,它提供了一个公共的人脸数据库,使得不同的人脸识别算法可以此为比对基准;其次,该数据库人脸图像的变化对算法的鲁棒性能提出了很高的要求.通过这个测试,促进了人脸识别向实际应用的发展.其规定的训练集、原型集和测试集也是研究人员测试算法性能常选择的数据库.因此,这个数据库的测试结果也可作为不同算法性能的比对依据.3.1.4 文献[11]发表的最好测试结果文献[11]中的结果与文献[2]中的最好结果相比,除fc 集合以外,均有一定程度的提高.为了评价本文方法与基于图像灰度的LBP 方法的性能差异,我们选择了文献[11]中的最好结果做比对.3.2 精确配准实验3.2.1 区域窗口大小和直方图bin 对识别率的影响在本文提出的方法中,区域的大小会对识别性能有一定的影响:如果区域过大,其极端情况即原图像大小,这样无法体现局部区域分析的优势;如果区域过小,其极端情况是像素级分析,这样将对图像的配准敏感性增强.同时,对灰度级进行不同程度的量化也会对识别性能有一定的影响.为观察这些变化对识别性能的影响,我们从FERET 人脸库的测试集中选出50个测试图像,并在Gallery 中选出其相应的图像,进行识别实验.依据已有的研究成果[19,20],根据FERET 协议给定的眼睛位置,将图像归一化为80×88像素(可参见图3给出人脸图像归一化示例).图2给出了变化区域窗口大小和对灰度级进行量化的实验结果.Fig.2 The rank-1 recognition rate of different sizes of region and histogram bin图2 不同区域大小和直方图Bin 时的首选识别率从图2结果可知,当区域窗口相对较小时,由于保留的图像结构信息越多,识别性能越好.但同时,由于区域越小对图像配准和表情等变化越敏感,因此,窗口大小为4×4时的识别率低于4×8;而当窗口变大时,结构信息保留的越少,识别率呈下降趋势.另外,由对灰度级量化的实验结果可知,尽可能地保留灰度级数目可以提高识别性能,这是由于尽可能地逼进原始特征的缘故.但是,这样也会使数据的维数过高.从图2可见,当区域窗口较小时,对灰度级进行量化对识别性能影响不是很大.3.2.2 对表情、光照和时间变化的鲁棒性测试为测试本文方法的鲁棒性,我们分别在表情、光照和时间变化的4个集合上做了测试.为保留更多的人脸图像空间结构信息,同时使特征维数较低,我们选择的区域窗口大小为4×8像素,灰度级量化为16级.几种方法的实验结果见表1. Table 1 The rank-1 recognition rates of different algorithms for the FERET probe sets表1 不同算法在FERET 人脸测试集上的首选识别率Method fb fc Dup.I Dup.II GFC 0.95 0.84 0.67 0.61 Best results of FERET’97[2]0.96 0.82 0.59 0.52 Best results of Ref.[11] 0.97 0.79 0.66 0.64HSLGBP 0.94 0.97 0.68 0.53Weighted HSLGBP 0.98 0.97 0.74 0.711.00.90.80.70.60.50.40.30.20.10.0Window sizeR e c o g n i t i o n r a t e张文超等:基于局部Gabor 变化直方图序列的人脸描述与识别2515由实验结果可以看出,在fb测试集上,几种方法的结果比较接近,而且识别率均在94%以上.由此可见,当测试图像与原型集图像采集时间相同时,尽管表情发生变化,但识别依然相对容易.从加权HSLGBP和HSLGBP测试结果可见,对不同的区域赋予不同的权值可以提高识别性能,并且其结果也超过了其他几种算法的结果.在fc上的实验结果相差比较显著:Fisherfaces的识别率为73%;GFC的识别率为84%.尽管两者都是基于Fisher判别分析的,但由于fc是相对于原型集发生光照变化的图像集合,而训练集中没有光照变化的图像,因此,Fisherfaces显现了推广性的问题;而GFC的识别率之所以高,是因为Gabor特征对光照变化比较鲁棒的原因.基于非统计学习的文献[11]的结果也较差,可见前文分析的基于图像的LBP的人脸识别方法具有一定的局限性.而本文HSLGBP和加权HSLGBP的识别率均为97%,一方面是由于Gabor滤波对光照比较鲁棒,另一方面是因为光照对图像的影响也一定程度地体现在对区域的影响上,例如光照变鼻部影响较大,因此,基于Gabor滤波与区域分析的联合作用使得本文的方法在fc上取得好的测试结果.在Dup.I和Dup.II上,识别结果都有所下降,且Dup.II的结果更差.可见测试集图像的采集与原型集图像采集时间间隔越久,人脸图像的变化越大,尤其是局部的变化,这样对识别的影响也就越大.其中在Dup.II上, Fisherfaces仅为31%;而GFC为61%.这一方面是由于训练集中的样本都是和原型集图像同一时间采集,即没有时间变化的样本;另一方面也说明Fisherfaces和GFC这种基于全局的分析方法不能很好地表示图像的细节变化.而基于区域的分析,如文献[11]结果为64%;加权HSLGBP的结果为71%.即可说明基于区域的分析对时间变化具有较好的鲁棒性.此外,在Dup.II上,加权HSLGBP比HSLGBP识别结果高18%,可以看出,对不同区域赋予不同的权值,可以有效地提高识别性能.值得指出的是:在fc测试中,加权的HSLGBP比HSLGBP没有提高,我们认为这是由于光照的变化不具有如表情、时间的变化对人脸不同区域影响的规律性(如表情、时间的变化分别特定的区域产生影响).从测试集为时间变化的图像的测试结果也可以看出,随着时间的变化,尽管人脸的总体表观变化不会很大,但是局部细节的变化对识别的影响还是很大的.3.3 配准鲁棒测试对于实用的人脸识别系统而言,面部特征的精确定位是一个困难的问题.首先,面部特征定位算法在光照、表情甚至遮挡的情况下,会出现一定的偏差;另外,由于图像的成像质量对特征定位也有一定的影响;再者,在用户不配合的非限定条件应用系统中,自动提取的面部特征点往往会有很大的偏差[17].因此,要求人脸识别算法本身对误配准具有较好的鲁棒性.为测试本文提出方法对误配准的鲁棒性,我们采用对手工标定的人眼位置加入随机高斯噪声的策略,获得具有配准误差的人脸测试图像.同时,为观察该方法在其他人脸库上的性能,我们在AR人脸库上做了实验.AR人脸数据库[21]由126人的3 200多幅正面人脸图像组成.其中每个人26幅图像,分两个阶段采集,时间间隔为两周.每阶段采集13幅具有不同表情、光照和遮挡的图像.具体实现过程如下:首先,手工标定人脸图像中双眼位置,如图3(a)所示;然后,按照双眼位置将图像归一化为具有相同大小且眼睛位置相同的图像.如果按照精确定位归一化人脸图像,将得到如图3(b)上图所示的图像;如果在归一化时对精确定位加入高斯随机噪声,则得到如图3(b)下图所示的误配准图像.本文实验采用加入均值为0、均方差从0.2~2.4的高斯随机噪声.为说明本文方法的有效性,我们使用Fisherfaces和GFC作为评测基准.测试结果如图4所示.由实验结果可见:HSLGBP随着配准误差的增加,识别精度下降缓慢;而GFC和Fisherfaces下降都比较快. 
Fisherfaces在配准误差的标准差大于0.8后识别结果变化较小,是因为本身识别率已经很低.由此可见:本文提出的方法对配准误差鲁棒,当存在配准误差时仍表现出了良好的识别性能.4 结论及后续工作本文提出了一种新的基于局部Gabor变化直方图序列的人脸描述与识别方法(HSLGBP),该方法首先采用Gabor小波滤波器提取人脸图像中各种方向、各种尺度的局部细节变化特征,然后进一步对这些特征进行局部二值编码,并使用局部空间直方图来描述人脸,最终通过直方图匹配完成人脸识别.。
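A compact sketch of the matching side of the method — histogram intersection (Eq. (10)), its weighted sum over regions (Eq. (12)), and the Fisher weights (Eq. (17)) — assuming the histogram sequences are stored as NumPy arrays of shape (40, m, L); this is an illustration, not the authors' implementation:

```python
import numpy as np

def hist_intersection(h1, h2):
    # Eq.(10): similarity of two histograms = sum of bin-wise minima
    return np.minimum(h1, h2).sum()

def hslgbp_similarity(r1, r2, w=None):
    # r1, r2: histogram sequences of two faces, shape (40, m, L)
    # (40 Gabor magnitude maps x m regions x L bins), as in Eqs.(9)-(12)
    s = np.minimum(r1, r2).sum(axis=-1)        # per-map, per-region intersections
    return (s * w).sum() if w is not None else s.sum()

def fisher_weights(intra, inter):
    # intra: within-class similarity samples per (map, region), shape (40, m, n_w)
    # inter: between-class similarity samples per (map, region), shape (40, m, n_b)
    mw, mb = intra.mean(-1), inter.mean(-1)
    sw2 = ((intra - mw[..., None]) ** 2).sum(-1)   # Eq.(14)
    sb2 = ((inter - mb[..., None]) ** 2).sum(-1)   # Eq.(16)
    return (mw - mb) ** 2 / (sw2 + sb2)            # Eq.(17)
```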

Face Recognition Based on Gabor Wavelets and Two-Dimensional Principal Component Analysis (2DPCA)

Abstract

This paper proposes a face recognition method based on Gabor wavelets and two-dimensional principal component analysis (2DPCA). First, the face image is Gabor-transformed and the coefficients of the Gabor transform are taken as the feature vector of the face image; then 2DPCA is used to reduce the dimensionality of the resulting face-image features, and the nearest-neighbor rule is used for classification; finally, using the AT&T face database, the method based on Gabor wavelets and 2DPCA is compared by simulation with a method based on Gabor wavelets and PCA. The simulation experiments show that the method based on Gabor wavelets and 2DPCA has the better recognition performance for face images.

MA Xiao-yan, YANG Guo-sheng, FAN Qiu-feng, WANG Ying-jun
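The 2DPCA step described in the abstract operates on image matrices directly rather than on vectorized images: it eigendecomposes the image covariance matrix G = (1/N) Σᵢ (Aᵢ − Ā)ᵀ(Aᵢ − Ā) and projects each image onto the leading eigenvectors. A minimal sketch — the component count and the Frobenius-distance form of the nearest-neighbor rule are common choices assumed here, not taken from the paper:

```python
import numpy as np

def two_d_pca(images, d=8):
    """2DPCA: `images` is an (N, h, w) stack; returns the (w, d) projection matrix."""
    mean = images.mean(axis=0)
    G = sum((A - mean).T @ (A - mean) for A in images) / len(images)
    vals, vecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    return vecs[:, ::-1][:, :d]             # d leading eigenvectors as columns

def features(images, X):
    """Project every image: each feature is an (h, d) matrix Y = A X."""
    return np.stack([A @ X for A in images])

def classify(probe_feat, gallery_feats, labels):
    """Nearest neighbor on the Frobenius distance between feature matrices."""
    dists = [np.linalg.norm(probe_feat - g) for g in gallery_feats]
    return labels[int(np.argmin(dists))]
```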

Non-Allelic Genes

Overview. Non-allelic genes are genes located at different loci; they stand in contrast to alleles, which are the alternative variants of a gene found at one and the same locus. At any given locus, several different variants (alleles) can exist. Each individual inherits a pair of alleles, and a given pair of alleles may lead to different phenotypic expression. The existence of non-allelic genes makes genetic research more complex, because different genes and their alleles can affect an individual's phenotype in different ways.

Background. In biology, a locus is a specific position on a chromosome; the gene at that position determines how a particular trait is expressed. Each locus can carry several different alleles, an allele being one of the different gene variants at that locus. Every individual inherits a pair of alleles, and the particular combination of this pair determines the individual's phenotype. However, the alleles at different loci do not all produce the same kind of phenotypic effect.

Effects. Different alleles can influence the phenotype in different ways. Some show a dominant effect: when an individual inherits one mutant allele, the mutant phenotype is expressed even if a normal allele is inherited alongside it. Conversely, some show a recessive effect: the mutant phenotype appears only when the individual inherits two mutant alleles. Besides dominance and recessiveness, two further kinds of phenotypic effect can occur. One is codominance: when an individual inherits two different mutant alleles, the phenotype shows a new characteristic that neither mutant allele alone could produce. The other is partial (incomplete) dominance: when an individual inherits two different mutant alleles, the phenotype lies between the phenotypes of the two single mutants.

Recombination and non-allelic genes. Recombination is the process by which two different chromosomes exchange parts of their gene sequences. During recombination, the combinations of non-allelic genes can change, so that new combinations of alleles are formed. This makes the phenotypic effects of non-allelic genes still more complex, because a new combination may join together the effects of genes at different loci.

Importance. Non-allelic genes play an important role in the adaptability and diversity of organisms. By studying the various combinations of alleles, one can better understand the relationship between genes and phenotypes and reveal the importance of genetic variation for a species' adaptation to its environment.

Summary. Non-allelic genes are genes at different loci; together with the alleles at each locus, their combinations underlie the diversity of phenotypes.

Structural Geochemical Prospecting Methods and Their Application

CHEN Hang-hua
(No. 940 Branch of the Nonferrous Metals Geological Bureau of Guangdong Province, Qingyuan 511520, China)

Abstract: Under the new development situation in China, geological and mineral exploration technology has kept developing and innovating; the structural (tectonic) geochemical prospecting method in particular is now widely used. It integrates geochemical and structural-geological theory and applies them to practical prospecting work; combining theory with practice has resolved many past difficulties in mineral exploration and made prospecting more efficient and accurate. This paper starts from the working principle of structural geochemical prospecting, reviews the development and significance of the method, and finally analyzes its application in detail through concrete cases.

Keywords: tectonic geochemistry; prospecting method; practical application; prediction of concealed ore bodies
CLC number: P632; Document code: A; Article ID: 1002-5065(2021)04-0070-2

Structural geochemical prospecting is a new prospecting method that combines geochemistry with structural geology, studying at the same time the composition of geological structures and the inherent laws governing the activation, migration and movement of geochemical elements.

Research on Facial Expression Feature Extraction Based on Gabor Wavelets and FastICA

LI Lie-xiong

(...) expressions; much research has addressed them. For example, Chen [1] used an AAM model combined with rough-set theory, with an SVM as the classification method, and obtained an overall recognition rate of 83.70%; Zhao et al. [2] used Gabor-wavelet + LBP feature extraction and ...

In Eq. (2), k_{μ,ν} = k_ν e^{iφ_μ}, where k_ν = k_max/f^ν and φ_μ = πμ/8. Here k_max is the center frequency of the band-pass filter at the highest frequency, and f is a spacing factor that limits the distance between kernel functions in the frequency domain; the usual choices are f = √2, σ = 2π and k_max = π/2. Eq. (1) describes a sinusoidal plane wave modulated by a Gaussian envelope, and the modulus ‖k_{μ,ν}‖ controls the width of the Gaussian window; varying μ and ν yields the multi-scale, multi-orientation Gabor filter bank.

The Gabor wavelet feature representation of a face image is obtained simply by convolving the image with the Gabor wavelet kernel functions: if I(z) denotes the gray-level distribution of an image, its Gabor feature representation is the convolution O_{μ,ν}(z) = I(z) * ψ_{μ,ν}(z).

FastICA [5] is a fast ICA method proposed and developed by Hyvärinen et al. at Helsinki University of Technology, Finland. It is based on the principle of non-Gaussianity maximization and takes an approximation of negentropy as its contrast function.

About the author: LI Lie-xiong (1984– ), male, lecturer; research interest: pattern recognition.
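As a sketch of how the Gabor + FastICA pipeline described above might look in practice, the following uses scikit-learn's FastICA on a stand-in feature matrix; the array shapes and component count are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Stand-in for the Gabor-magnitude feature matrix: one row per face image.
# In practice each row would come from convolving an expression image with
# the multi-scale, multi-orientation kernel bank described above.
rng = np.random.default_rng(0)
X = np.abs(rng.laplace(size=(213, 4800)))   # 213 images, 4800 Gabor features

ica = FastICA(n_components=40, max_iter=1000, tol=1e-3)
S = ica.fit_transform(X)    # statistically independent expression features
A = ica.mixing_             # estimated mixing matrix (4800 x 40)
```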

FR_ICA1

Face Recognition by IndependentComponent AnalysisMarian Stewart Bartlett,Member,IEEE,Javier R.Movellan,Member,IEEE,and Terrence J.Sejnowski,Fellow,IEEEAbstract—A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images.Principal compo-nent analysis(PCA)is a popular example of such methods.The basis images found by PCA depend only on pairwise relationships between pixels in the image database.In a task such as face recognition,in which important information may be contained in the high-order relationships among pixels,it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics.Independent component analysis (ICA),a generalization of PCA,is one such method.We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons.ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes,and a second which treated the pixels as random variables and the images as outcomes.The first architecture found spatially local basis images for the faces.The second architecture produced a factorial face code.Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression.A classifier that combined the two ICA representations gave the best performance.Index Terms—Eigenfaces,face recognition,independent com-ponent analysis(ICA),principal component analysis(PCA), unsupervised learning.I.I NTRODUCTIONR EDUNDANCY in the sensory input contains structural in-formation about the environment.Barlow has argued that such redundancy provides knowledge[5]and that the role of the sensory system is to develop factorial representations in which these dependencies are separated into independent componentsManuscript received May21,2001;revised May8,2002.This work was supported by University of California Digital Media Innovation Program D00-10084,the National Science Foundation under Grants0086107and IIT-0223052,the National Research Service Award MH-12417-02,the Lawrence Livermore National Laboratories ISCR agreement B291528,and the Howard Hughes Medical Institute.An abbreviated version of this paper appears in Proceedings of the SPIE Symposium on Electronic Imaging:Science and Technology;Human Vision and Electronic Imaging III,Vol.3299,B.Rogowitz and T.Pappas,Eds.,1998.Portions of this paper use the FERET database of facial images,collected under the FERET program of the Army Research Laboratory.The authors are with the University of California-San Diego,La Jolla, CA92093-0523USA(e-mail:marni@;javier@; terry@).T.J.Sejnowski is also with the Howard Hughes Medical Institute at the Salk Institute,La Jolla,CA92037USA.Digital Object Identifier10.1109/TNN.2002.804287(ICs).Barlow also argued that such representations are advan-tageous for encoding complex objects that are characterized by high-order dependencies.Atick and Redlich have also argued for such representations as a general coding strategy for the vi-sual system[3].Principal component analysis(PCA)is a popular unsuper-vised statistical method to find useful image representations. 
Consider a set of face images: ICA can treat either the images as random variables and the pixels as outcomes, or the pixels as random variables and the images as outcomes.¹ (¹Matlab code for the ICA representations is available at /~marni.)

Face recognition performance was tested using the FERET database [52]. Face recognition performances using the ICA representations were benchmarked by comparing them to performances using PCA, which is equivalent to the "eigenfaces" representation [51], [57]. The two ICA representations were then combined in a single classifier.

II. ICA

There are a number of algorithms for performing ICA [11], [13], [14], [25]. We chose the infomax algorithm proposed by Bell and Sejnowski [11], which was derived from the principle of optimal information transfer in neurons with sigmoidal transfer functions [27]. The algorithm is motivated as follows. Let X be an n-dimensional random vector of inputs, W an invertible n × n matrix, U = WX, and Y = f(U) an n-dimensional random variable representing the outputs of n neurons. Each component of f = (f_1, ..., f_n) is an invertible squashing function, mapping real numbers into the [0, 1] interval; a typical choice is the logistic function

  f_i(u) = 1/(1 + e^{−u})   (1)

The U_i variables are linear combinations of inputs and can be interpreted as presynaptic activations of n neurons; the Y_i can be interpreted as postsynaptic activation rates. The goal in Bell and Sejnowski's algorithm is to maximize the mutual information between the environment X and the output of the neural network Y, which is achieved by performing gradient ascent on the entropy of the output with respect to the weight matrix W. The gradient update rule for the weight matrix W is

  ΔW = (I + Y′Uᵀ)W   (2)

where Y′_i is the ratio between the second and first partial derivatives of the activation function with respect to U_i. For the logistic transfer function this ratio is 1 − 2Y_i, resulting in the following learning rule [12]:

  ΔW = (I + (1 − 2Y)Uᵀ)W   (3)

When there are multiple inputs and outputs, maximizing the joint entropy of the output Y encourages the individual outputs to move toward statistical independence. When the form of the nonlinear transfer function f matches the cumulative distribution functions of the sources, maximizing the joint entropy of Y also minimizes the mutual information between the individual outputs [12], [42]. In practice, the logistic transfer function has been found sufficient to separate mixtures of natural signals with sparse distributions, including sound sources [11].

The algorithm is speeded up by including a "sphering" step prior to learning [12]. The row means of X are subtracted, and then X is passed through the whitening matrix W_Z, twice the inverse square root of the covariance matrix:

  W_Z = 2⟨XXᵀ⟩^{−1/2}   (4)

This removes the first- and second-order statistics of the data: both the mean and the covariances are set to zero and the variances are equalized. When the inputs to ICA are the "sphered" data, the full transform matrix is the product W_I = W W_Z of the sphering matrix and the matrix learned by ICA. For the generative model of the data X = AS, the matrix A = W_I^{−1}, the inverse of the weight matrix in Bell and Sejnowski's algorithm, can be interpreted as the source mixing matrix, and the variables U = W_I X can be interpreted as the maximum-likelihood (ML) estimates of the sources that generated the data.

A. ICA and Other Statistical Techniques

ICA and PCA: PCA can be derived as a special case of ICA which uses Gaussian source models. In such a case the mixing matrix is unidentifiable in the sense that there is an infinite number of equally good ML solutions. Among all possible ML solutions, PCA chooses an orthogonal matrix which is optimal in the following sense: 1) regardless of the distribution of the data, U_1 is the linear combination of inputs that allows optimal linear reconstruction of the input in the mean-square sense; and 2) given U_1, ..., U_{k−1}, U_k allows optimal linear reconstruction among the class of linear combinations which are uncorrelated with U_1, ..., U_{k−1}. If the sources are Gaussian, the likelihood of the data depends only on first- and second-order statistics (the covariance matrix); in PCA, the rows of W are, in fact, the eigenvectors of the covariance matrix of the data.

Second-order statistics capture the amplitude spectrum of images but not their phase spectrum, which is captured by the higher-order statistics. For a given sample of natural images, we can scramble their phase spectrum while maintaining their power spectrum. This will dramatically alter the appearance of the images but will not change their second-order statistics. The phase spectrum, not the power spectrum, contains the structural information in images that drives human perception. For example, as illustrated in Fig. 1, a face image synthesized from the amplitude spectrum of face A and the phase spectrum of face B will be perceived as an image of face B [45], [53]. The fact that PCA is only sensitive to the power spectrum of images suggests that it might not be particularly well suited for representing natural images. The assumption of Gaussian sources implicit in PCA makes it inadequate when the true sources are non-Gaussian.
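As a concrete illustration of the infomax learning rule and sphering step from Section II above, here is a minimal NumPy sketch. It is a toy re-implementation for illustration — the learning rate, iteration count and absence of annealing are arbitrary choices, not the authors' original code:

```python
import numpy as np

def infomax_ica(X, lr=0.001, n_iter=2000):
    """Infomax ICA with a logistic nonlinearity; X is (n, T) raw data."""
    n, T = X.shape
    # Sphering: remove row means, then whiten with W_Z = 2 * Cov(X)^(-1/2), Eq.(4).
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Wz = 2.0 * E @ np.diag(d ** -0.5) @ E.T
    X = Wz @ X
    W = np.eye(n)
    for _ in range(n_iter):
        U = W @ X
        Y = 1.0 / (1.0 + np.exp(-U))                  # logistic outputs
        # Update rule (3): dW = (I + (1 - 2Y) U^T) W, averaged over samples.
        W += lr * (np.eye(n) + (1.0 - 2.0 * Y) @ U.T / T) @ W
    return W @ Wz                                     # full transform W_I = W W_Z

# Toy demo: unmix two super-Gaussian (Laplacian) sources.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
W_full = infomax_ica(A @ S)
U = W_full @ (A @ S)   # rows approximate the sources up to scale and permutation
```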
In particular, it has been empirically observed that many natural signals, including speech, natural images, and EEG, are better described as linear combinations of sources with long-tailed distributions [11], [19]. These sources are called "high-kurtosis," "sparse," or "super-Gaussian" sources. Logistic random variables are a special case of sparse source models. When sparse source models are appropriate, ICA has the following potential advantages over PCA: 1) it provides a better probabilistic model of the data, which better identifies where the data concentrate in n-dimensional space; 2) it uniquely identifies the mixing matrix; 3) it finds a not-necessarily orthogonal basis which may reconstruct the data better than PCA in the presence of noise; and 4) it is sensitive to high-order statistics in the data, not just the covariance matrix.

Fig. 2 illustrates these points with an example. The figure shows samples from a three-dimensional (3-D) distribution constructed by linearly mixing two high-kurtosis sources, together with the basis vectors found by PCA and by ICA on this problem. Since the three ICA basis vectors are nonorthogonal, they change the relative distance between data points. This change in metric may be potentially useful for classification algorithms, like nearest neighbor, that make decisions based on relative distances between points. The ICA basis also alters the angles between data points, which affects similarity measures such as cosines. Moreover, if an undercomplete basis set is chosen, PCA and ICA may span different subspaces; in Fig. 2, when only two dimensions are selected, PCA and ICA choose different subspaces.

The metric induced by ICA is superior to PCA in the sense that it may provide a representation more robust to the effect of noise [42]. It is, therefore, possible for ICA to be better than PCA for reconstruction in noisy or limited-precision environments. For example, in the problem presented in Fig. 2, we found that if only 12 bits are allowed to represent the PCA and ICA coefficients, linear reconstructions based on ICA are 3 dB better than reconstructions based on PCA (the noise power is reduced by more than half). A similar result was obtained for PCA and ICA subspaces: if only four bits are allowed to represent the first two PCA and ICA coefficients, ICA reconstructions are 3 dB better than PCA reconstructions. In some problems, one can think of the actual inputs as noisy versions of some canonical inputs — for example, variations in lighting and expression can be seen as noisy versions of the canonical image of a person — and input representations that are robust to noise may potentially give us representations that better reflect the data.

Fig. 1. (left) Two face images. (center) The two faces with scrambled phase. (right) Reconstructions with the amplitude of the original face and the phase of the other face. Face images are from the FERET face database, reprinted with permission from J. Phillips.

When the source models are sparse, ICA is closely related to the so-called nonorthogonal "rotation" methods in PCA and factor analysis. The goal of these rotation methods is to find directions with high concentrations of data, something very similar to what ICA does when the sources are sparse. In such cases, ICA can be seen as a theoretically sound probabilistic method to find interesting nonorthogonal "rotations."

ICA and Cluster Analysis: Cluster analysis is a technique for finding regions in n-dimensional space with large concentrations of data. When the sources are sparse, ICA finds directions along which significant concentrations of data points are observed; it can thus be seen as a form of cluster analysis that emphasizes optimal directions rather than specific locations of high data density.

Fig. 2. (top) Example 3-D data distribution and corresponding PC and IC axes. Each axis is a column of the mixing matrix W found by PCA or ICA. Note the PC axes are orthogonal while the IC axes are not. If only two components are allowed, ICA chooses a different subspace than PCA. (bottom left) Distribution of the first PCA coordinates of the data. (bottom right) Distribution of the first ICA coordinates of the data. Note that since the ICA axes are nonorthogonal, relative distances between points are different in PCA than in ICA, as are the angles between points.

III. TWO ARCHITECTURES FOR PERFORMING ICA ON IMAGES

Let X be a data matrix with n_r rows and n_c columns. We can think of each column of X as the outcome (an independent trial) of a random experiment, and of the ith row of X as the specific value taken by a random variable X_i across n_c independent trials. This defines an empirical probability distribution in which each column of X is given equal probability mass, and independence is defined with respect to that distribution. For example, we say that rows i and j of X are independent if it is not possible to predict the values taken by X_j across columns from the corresponding values taken by X_i.

Our goal in this paper is to find a good set of basis images to represent a database of faces. We organize each image in the database as a long vector with as many dimensions as the number of pixels in the image. There are at least two ways in which ICA can be applied to this problem:

1) We can organize our database into a matrix X where each row vector is a different image. In this approach, images are random variables and pixels are trials, so it makes sense to talk about independence of images or functions of images: two images i and j are independent if, when moving across pixels, it is not possible to predict the value taken by a pixel on image j based on the value taken by the same pixel on image i.

2) We can transpose X and organize our data so that images are in the columns of X. In this approach, pixels are random variables and images are trials, so it makes sense to talk about independence of pixels or functions of pixels: pixels i and j would be independent if, when moving across the entire set of images, it is not possible to predict the value taken by pixel i based on the corresponding value taken by pixel j on the same image.

Fig. 4. Image synthesis model for Architecture I. To find a set of IC images, the images in X are considered to be a linear combination of statistically independent basis images, S, where A is an unknown mixing matrix. The basis images were estimated as the learned ICA output U.

Fig. 5. Image synthesis model for Architecture II, based on [43] and [44]. Each image in the dataset was considered to be a linear combination of underlying basis images in the matrix A. The basis images were each associated with a set of independent "causes," given by a vector of coefficients in S. The basis images were estimated by A = W_I^{−1}, where W_I is the learned ICA weight matrix.

The face images employed were a subset of the FERET face database [52], with several frontal-view images per individual. The training set was comprised of 50% neutral-expression images and 50% change-of-expression images. The algorithms were tested for recognition under three different conditions: same session, different expression; different day, same expression; and different day, different expression (see Table I). Coordinates for eye and mouth locations were provided with the FERET database. These coordinates were used to center the face images, and then crop and scale them to 60 × 50 pixels. For Architecture I, the image set was organized as a data matrix X
discriminability ratio,r (4).to 0.0001.Training took 90minutes on a Dec Alpha 2100a.Fol-lowing training,a set of statistically independent source images were contained in the rows of the outputmatrixfound by ICA represents a cluster of pixelsthat have similar behavior across images.Each row oftheby the nearest neighbor algorithm,using cosinesas the similarity measure.Coefficient vectors in each test set were assigned the class label of the coefficient vector in the training set that was most similar as evaluated by the cosine of the angle betweenthem.Fig.9.First25PC axes of the image set(columns of P),ordered left to right, top to bottom,by the magnitude of the corresponding eigenvalue.In experiments to date,ICA performs significantly better using cosines rather than Euclidean distance as the similarity measure,whereas PCA performs the same for both.A cosine similarity measure is equivalent to length-normalizing the vectors prior to measuring Euclidean distance when doing nearestneighbor,be the overall meanof acoefficient.For both the PCA and ICA representations,we calculated theratio of between-class to within-classvariabilityclassmeans,andwere calculated separately for each test set,ex-cluding the test images from the analysis.Both the PCA and ICAcoefficients were then ordered by the magnitudeofFig.11.Selection of components by class discriminability,Architecture II. Top:Discriminability of the ICA coefficients(solid lines)and discriminability of the PCA components(dotted lines)for the three test ponents were sorted by the magnitude of r.Bottom:Improvement in face recognition performance for the ICA and PCA representations using subsets of components selected by the class discriminability r.The improvement is indicated by the gray segments at the top of the bars.Face classification performance was compared usingthe,so thatrows represent different pixels and columns represent differentimages.[See(Fig.3right)].This corresponds to treating thecolumnsof.Eachcolumnof(Fig.12).ICA attempts tomake theoutputs,comprised the columns ofthe input data matrix,where each coefficient had zero mean.The Architecture II representation for the training images wastherefore contained in the columnsofwas200for each face image,consisting of the outputsof each of the ICA filters.7The architecture II representation fortest images was obtained in the columnsof.A sample of the basis images is shown6Here,each pixel has zero mean.7An image filter f(x)is defined as f(x)=w1x.Fig.13.Basis images for the ICA-factorial representation(columns of AAA(a)(b)Fig.16.Pairwise mutual information.(a)Mean mutual information between basis images.Mutual information was measured between pairs of gray-level images,PC images,and independent basis images obtained by Architecture I.(b)Mean mutual information between coding variables.Mutual information was measured between pairs of image pixels in gray-level images,PCA coefficients, and ICA coefficients obtained by Architecture II.obtained85%,56%,and44%correct,respectively.Again,as found for200separated components,selection of subsets of components by class discriminability improved the performance of ICA1to86%,78%,and65%,respectively,and had little ef-fect on the performances with the PCA and ICA2representa-tions.This suggests that the results were not simply an artifact due to small sample size.VI.E XAMINATION OF THE ICA R EPRESENTATIONSA.Mutual InformationA measure of the statistical dependencies of the face repre-sentations was obtained by calculating the mean mutual infor-mation 
between pairs of50basis images.Mutual information was calculatedas(18)where.Again,therewere considerable high-order dependencies remaining in thePCA representation that were reduced by more than50%by theinformation maximization algorithm.The ICA representationsobtained in these simulations are most accurately described notas“independent,”but as“redundancy reduced,”where the re-dundancy is less than half that in the PC representation.B.SparsenessField[19]has argued that sparse distributed representationsare advantageous for coding visual stimuli.Sparse representa-tions are characterized by highly kurtotic response distributions,in which a large concentration of values are near zero,with rareoccurrences of large positive or negative values in the tails.Insuch a code,the redundancy of the input is transformed intothe redundancy of the response patterns of the the individualoutputs.Maximizing sparseness without loss of information isequivalent to the minimum entropy codes discussed by Barlow[5].8Given the relationship between sparse codes and minimumentropy,the advantages for sparse codes as outlined by Field[19]mirror the arguments for independence presented byBarlow[5].Codes that minimize the number of active neuronscan be useful in the detection of suspicious coincidences.Because a nonzero response of each unit is relatively rare,high-order relations become increasingly rare,and therefore,more informative when they are present in the stimulus.Field8Information maximization is consistent with minimum entropy coding.Bymaximizing the joint entropy of the output,the entropies of the individual out-puts tend to be minimized.Fig.18.Recognition successes and failures.{left)Two face image pairs which both ICA algorithms correctly recognized.(right)Two face image pairs that were misidentified by both ICA algorithms.Images from the FERET face database were reprinted with permission from J.Phillips.contrasts this with a compact code such as PCs,in which a few units have a relatively high probability of response,and there-fore,high-order combinations among this group are relatively common.In a sparse distributed code,different objects are rep-resented by which units are active,rather than by how much they are active.These representations have an added advantage in signal-to-noise,since one need only determine which units are active without regard to the precise level of activity.An ad-ditional advantage of sparse coding for face representations is storage in associative memory works with sparse inputs can store more memories and provide more effective re-trieval with partial information[10],[47].The probability densities for the values of the coefficients of the two ICA representations and the PCA representation are shown in Fig.17.The sparseness of the face representations were examined by measuring the kurtosis of the distributions. 
Kurtosis is defined as the ratio of the fourth moment of the dis-tribution to the square of the second moment,normalized to zero for the Gaussian distribution by subtracting3kurtosis,,,correspond to the similaritymeasure.Performance of the combined classifier is shown in Fig.19.Thecombined classifier improved performance to91.0%,88.9%,and81.0%for the three test cases,respectively.The difference inperformance between the combined ICA classifier and PCA wassignificant for all three test sets(,,;auditory signals into independent sound sources.Under this ar-chitecture,ICA found a basis set of statistically independent im-ages.The images in this basis set were sparse and localized in space,resembling facial features.Architecture II treated pixels as random variables and images as random trials.Under this ar-chitecture,the image coefficients were approximately indepen-dent,resulting in a factorial face code.Both ICA representations outperformed the“eigenface”rep-resentation[57],which was based on PC analysis,for recog-nizing images of faces sampled on a different day from the training images.A classifier that combined the two ICA rep-resentations outperformed eigenfaces on all test sets.Since ICA allows the basis images to be nonorthogonal,the angles and dis-tances between images differ between ICA and PCA.Moreover, when subsets of axes are selected,ICA defines a different sub-space than PCA.We found that when selecting axes according to the criterion of class discriminability,ICA-defined subspaces encoded more information about facial identity than PCA-de-fined subspaces.ICA representations are designed to maximize information transmission in the presence of noise and,thus,they may be more robust to variations such as lighting conditions,changes in hair,make-up,and facial expression,which can be considered forms of noise with respect to the main source of information in our face database:the person’s identity.The robust recogni-tion across different days is particularly encouraging,since most applications of automated face recognition contain the noise in-herent to identifying images collected on a different day from the sample images.The purpose of the comparison in this paper was to examine ICA and PCA-based representations under identical conditions.A number of methods have been presented for enhancing recognition performance with eigenfaces(e.g.,[41]and[51]). 
ICA representations can be used in place of eigenfaces in these techniques.It is an open question as to whether these techniques would enhance performance with PCA and ICA equally,or whether there would be interactions between the type of enhancement and the representation.A number of research groups have independently tested the ICA representations presented here and in[9].Liu and Wech-sler[35],and Yuen and Lai[61]both supported our findings that ICA outperformed PCA.Moghaddam[41]employed Euclidean distance as the similarity measure instead of cosines.Consistent with our findings,there was no significant difference between PCA and ICA using Euclidean distance as the similarity mea-sure.Cosines were not tested in that paper.A thorough compar-ison of ICA and PCA using a large set of similarity measures was recently conducted in[17],and supported the advantage of ICA for face recognition.In Section V,ICA provided a set of statistically independent coefficients for coding the images.It has been argued that such a factorial code is advantageous for encoding complex objects that are characterized by high-order combinations of features, since the prior probability of any combination of features can be obtained from their individual probabilities[2],[5].According to the arguments of both Field[19]and Barlow[5],the ICA-fac-torial representation(Architecture II)is a more optimal object representation than the Architecture I representation given its sparse,factorial properties.Due to the difference in architec-ture,the ICA-factorial representation always had fewer training samples to estimate the same number of free parameters as the Architecture I representation.Fig.16shows that the residual de-pendencies in the ICA-factorial representation were higher than in the Architecture I representation.The ICA-factorial repre-sentation may prove to have a greater advantage given a much larger training set of images.Indeed,this prediction has born out in recent experiments with a larger set of FERET face im-ages[17].It also is possible that the factorial code representa-tion may prove advantageous with more powerful recognition engines than nearest neighbor on cosines,such as a Bayesian classifier.An image set containing many more frontal view im-ages of each subject collected on different days will be needed to test that hypothesis.In this paper,the number of sources was controlled by re-ducing the dimensionality of the data through PCA prior to per-forming ICA.There are two limitations to this approach[55]. The first is the reverse dimensionality problem.It may not be possible to linearly separate the independent sources in smaller subspaces.Since we retained200dimensions,this may not have been a serious limitation of this implementation.Second,it may not be desirable to throw away subspaces of the data with low power such as the higher PCs.Although low in power,these sub-spaces may contain ICs,and the property of the data we seek is independence,not amplitude.Techniques have been proposed for separating sources on projection planes without discarding any ICs of the data[55].Techniques for estimating the number of ICs in a dataset have also recently been proposed[26],[40]. 
The information maximization algorithm employed to per-form ICA in this paper assumed that the underlying“causes”of the pixel gray-levels in face images had a super-Gaussian (peaky)response distribution.Many natural signals,such as sound sources,have been shown to have a super-Gaussian distribution[11].We employed a logistic source model which has shown in practice to be sufficient to separate natural signals with super-Gaussian distributions[11].The under-lying“causes”of the pixel gray-levels in the face images are unknown,and it is possible that better results could have been obtained with other source models.In particular,any sub-Gaussian sources would have remained mixed.Methods for separating sub-Gaussian sources through information maximization have been developed[30].A future direction of this research is to examine sub-Gaussian components of face images.The information maximization algorithm employed in this work also assumed that the pixel values in face images were generated from a linear mixing process.This linear approxima-tion has been shown to hold true for the effect of lighting on face images[21].Other influences,such as changes in pose and ex-pression may be linearly approximated only to a limited extent. Nonlinear ICA in the absence of prior constraints is an ill-condi-tioned problem,but some progress has been made by assuming a linear mixing process followed by parametric nonlinear func-tions[31],[59].An algorithm for nonlinear ICA based on kernel methods has also recently been presented[4].Kernel methods have already shown to improve face recognition performance。

Effects of Subchronic Aluminum Exposure on Learning and Memory and on β-Amyloid Protein Expression in the Cerebral Cortex of Rats

LIANG Rui-feng, LI Wei-qing, NIU Qiao

[Abstract] Objective: To study the effects of aluminum on learning and memory and on the expression of β-amyloid protein (Aβ) in rat brain tissue. Methods: Thirty-two healthy, clean-grade male Sprague-Dawley (SD) rats were randomly divided by body mass into four groups: a control group and groups exposed to aluminum maltolate [Al(mal)3] at 0.27, 0.54 and 1.08 mg/kg (n = 8 each). Results: Compared with the control group, the mean escape latency increased in the 1.08 mg/kg Al(mal)3 group, and the time needed to find the platform increased in the 0.54 and 1.08 mg/kg Al(mal)3 groups (P < 0.05). Along with the higher Al(mal)3 dosage, the expression of Aβ1-40 protein decreased in the 0.54 and 1.08 mg/kg Al(mal)3-exposed groups, while the expression of Aβ1-42 protein increased, compared with the control group (P < 0.05). Conclusion: Subchronic aluminum exposure impairs learning and memory in rats and alters Aβ expression in the cerebral cortex.

A PhD Student Publishes a Paper in Information Fusion

Information Fusion: Enhancing Decision-Making through the Integration of Data and Knowledge

Introduction: Information fusion, also known as data fusion or knowledge fusion, is a rapidly evolving field in the realm of decision-making. It involves the integration and analysis of data and knowledge from various sources to generate meaningful and accurate information. In this article, we will delve into the concept of information fusion, explore its key components, discuss its application in different domains, and highlight its significance in enhancing decision-making processes.

1. What is Information Fusion?
Information fusion is the process of combining data and knowledge from multiple sources to provide a comprehensive and accurate representation of reality. The goal is to overcome the limitations inherent in individual sources and derive improved insights and predictions. By assimilating diverse information, information fusion enhances situational awareness, reduces uncertainty, and enables intelligent decision-making.

2. Key Components of Information Fusion:
a. Data Sources: Information fusion relies on various data sources, which can include sensors, databases, social media feeds, and expert opinions. These sources provide different types of data, such as text, images, audio, and numerical measurements.
b. Data Processing: Once data is collected, it needs to be processed to extract relevant features and patterns. This step involves data cleaning, transformation, normalization, and aggregation to ensure compatibility and consistency.
c. Information Extraction: Extracting relevant information is a crucial step in information fusion. This includes identifying and capturing the crucial aspects of the data, filtering out noise, and transforming data into knowledge.
d. Knowledge Representation: The extracted information needs to be represented in a meaningful way for integration and analysis. Common methods include ontologies, semantic networks, and knowledge graphs.
e. Fusion Algorithms: To integrate the information from various sources, fusion algorithms are employed. These algorithms can be rule-based, model-based, or data-driven, and they combine multiple pieces of information to generate a unified and coherent representation.
f. Decision-Making Processes: The ultimate goal of information fusion is to enhance decision-making. This requires the fusion of information with domain knowledge and decision models to generate insights, predictions, and recommendations.

3. Applications of Information Fusion:
a. Defense and Security: Information fusion plays a critical role in defense and security applications, where it improves intelligence analysis, surveillance, threat detection, and situational awareness. By integrating information from multiple sources, such as radars, satellites, drones, and human intelligence, it enables effective decision-making in complex and dynamic situations.
b. Health Monitoring: In healthcare, information fusion is used to monitor patient health, combine data from different medical devices, and provide real-time decision support to medical professionals. By fusing data from wearables, electronic medical records, and physiological sensors, it enables early detection of health anomalies and improves patient care.
c. Smart Cities: Information fusion offers enormous potential for the development of smart cities. By integrating data from multiple urban systems, such as transportation, energy, and public safety, it enables efficient resource allocation, traffic management, and emergency response. This improves the overall quality of life for citizens.
d. Financial Markets: In the financial sector, information fusion helps in the analysis of large-scale and diverse datasets. By integrating data from various sources, such as stock exchanges, news feeds, and social media mentions, it enables better prediction of market trends, risk assessment, and investment decision-making.

4. Significance of Information Fusion:
a. Enhanced Decision-Making: Information fusion enables decision-makers to obtain comprehensive and accurate information, reducing uncertainty and improving the quality of decisions.
b. Improved Situational Awareness: By integrating data from multiple sources, information fusion enhances situational awareness, enabling timely and informed responses to dynamic and complex situations.
c. Risk Reduction: By combining information from diverse sources, information fusion improves risk assessment capabilities, enabling proactive and preventive measures.
d. Resource Optimization: Information fusion facilitates the efficient utilization of resources by providing a holistic view of the environment and enabling optimization of resource allocation.

Conclusion: In conclusion, information fusion is a powerful approach to enhance decision-making by integrating data and knowledge from multiple sources. Its key components, including data sources, processing, extraction, knowledge representation, fusion algorithms, and decision-making processes, together create a comprehensive framework for generating meaningful insights. By applying information fusion in various domains, such as defense, healthcare, smart cities, and financial markets, we can maximize the potential of diverse information sources to achieve improved outcomes.
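As a toy illustration of the "fusion algorithms" component described above, the snippet below applies inverse-variance weighting — one classic model-based rule for combining independent estimates of the same quantity. The sensor names and numbers are made up for the example:

```python
import numpy as np

def fuse_estimates(means, variances):
    """Inverse-variance weighted fusion of independent estimates."""
    means = np.asarray(means, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # weight = 1 / variance
    fused_mean = np.sum(w * means) / np.sum(w)
    fused_var = 1.0 / np.sum(w)                    # fused estimate is tighter
    return fused_mean, fused_var

# e.g. a radar range of 101.2 (variance 4.0) fused with a lidar range of
# 99.8 (variance 1.0) yields an estimate dominated by the better sensor.
print(fuse_estimates([101.2, 99.8], [4.0, 1.0]))   # -> (100.08, 0.8)
```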

An Introduction to the ICA Algorithm

An ICA-Based Recognition Algorithm

Introduction. In pattern recognition, merely obtaining the raw data of the target to be recognized is not enough: the essential information latent in the raw data must be extracted. The raw data of the target are usually very large in volume and lie in a high-dimensional space; classifying directly on the raw data is computationally expensive and degrades classifier performance. For effective classification, the raw data must be mapped into a low-dimensional space in which the information that best reflects the nature of the target is extracted. The feature-extraction methods in common use are principal component analysis (PCA) and independent component analysis (ICA).

(1) PCA (Principal Component Analysis) is an optimal transform in the minimum-mean-square sense. Its goal is to remove the correlation between the components of the input random vector and to expose the properties hidden in the raw data. Its strengths are data compression and the dimensionality reduction of multivariate data. However, PCA computes with second-order statistical information only, without considering the higher-order statistics of the signal, so higher-order redundant information may still remain among the transformed data. [Refs. 1, 2]

(2) ICA (Independent Component Analysis) is a new signal-processing method proposed by Jutten and Hérault in the 1990s. Its purpose is to decompose the observed data linearly into statistically independent components. From the standpoint of statistical analysis, ICA and PCA both belong to multivariate data analysis, but the components obtained by ICA are not merely decorrelated: they are mutually statistically independent, and non-Gaussian. ICA can therefore reveal the essential structure of the data more completely. Its important advances over traditional methods have made ICA an increasingly promising tool in signal processing, and it has been widely applied in pattern recognition, signal denoising, image processing and many other fields. [Refs. 3, 4, 5]

Principle [Refs. 6, 7, 8] — the steps of ICA:

1. Centering: the main purpose of centering is to remove the mean from the observed data.

2. Whitening: the main purpose of whitening is to remove correlation from the data. Whitening greatly simplifies the subsequent computation and can also compress the data. Whitening is usually carried out by eigenvalue decomposition.

3. The ICA criterion: in designing an ICA algorithm, the most practical difficulty is how to verify reliably the independence between the source-signal components.
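The centering and whitening steps can be written down directly; a minimal sketch using the eigenvalue decomposition mentioned above (variable names are illustrative):

```python
import numpy as np

def center_and_whiten(X):
    """Steps 1-2 above: X is (n_samples, n_features) observed data."""
    Xc = X - X.mean(axis=0)                    # 1. centering: remove the mean
    C = np.cov(Xc, rowvar=False)               # covariance of the centred data
    vals, vecs = np.linalg.eigh(C)             # 2. eigenvalue decomposition
    keep = vals > 1e-10                        # guard against null directions
    W = vecs[:, keep] / np.sqrt(vals[keep])    # whitening matrix V * L^(-1/2)
    Z = Xc @ W                                 # whitened data: cov(Z) = I
    return Z, W
```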

Method for Face Description and Recognition by Local Multi-Channel Gabor Filters and ICA

GAO Tao, HE Ming-yi, BAI Lin
(Shaanxi Province Key Laboratory of Information Acquisition and Processing, School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China)

Application Research of Computers (计算机应用研究), Vol. 25, No. 11, November 2008

Local Gabor magnitude maps (LGMM) are first constructed and concatenated into a high-dimensional feature vector for each face; the dimensionality of these vectors is then reduced by means of principal component analysis (PCA); finally, ICA is applied to the reduced vectors to obtain the face description used for recognition.

Shape-Memory Materials Aid Reconstructive Orthopedic Surgery

Ai Lian

[Journal] Dual-Use Technologies & Products (《军民两用技术与产品》)
[Year (Volume), Issue] 2008 (000) 001
[Abstract] According to an announcement by the US MedShape Solutions research institute, shape-memory polymers and alloys can be molded to the shape of human bone and tissue and used in many types of structural reconstructive surgery. A product called ShapeLoc has already been used in knee surgery. In conventional surgery, a hole is first drilled in the bone and a plastic or metal filler is placed in the hole; with the ShapeLoc shape-memory polymer, an incision is made along the direction of the muscle tendon and the polymer is implanted, and the implanted material conforms to the tendon very well.
[Pages] 1 (P21)
[Author] Ai Lian
[Language] Chinese
[CLC number] TG139.6

ICA AND GABOR REPRESENTATION FOR FACIAL EXPRESSION RECOGNITION

I. Buciu, C. Kotropoulos and I. Pitas
Department of Informatics, Aristotle University of Thessaloniki
Box 451, GR-54124 Thessaloniki, Greece, {nelu,costas,pitas}@zeus.csd.auth.gr

ABSTRACT

Two hybrid systems for classifying seven categories of human facial expression are proposed. The first system combines independent component analysis (ICA) and support vector machines (SVMs). The original face image database is decomposed into linear combinations of several basis images, where the corresponding coefficients of these combinations are fed into SVMs instead of an original feature vector comprised of grayscale image pixel values. The classification accuracy of this system is compared against that of baseline techniques that combine ICA with either two-class cosine similarity classifiers or two-class maximum correlation classifiers, when we classify facial expressions into these seven classes. We found that ICA decomposition combined with SVMs outperforms the aforementioned baseline classifiers. The second system proposed operates in two steps: first, a set of Gabor wavelets (GWs) is applied to the original face image database and, second, the new features obtained are classified by using either SVMs or cosine similarity classifiers or a maximum correlation classifier. The best facial expression recognition rate is achieved when Gabor wavelets are combined with SVMs.

This work was supported by the European Union Research Training Network "Multi-modal Human-Computer Interaction" (HPRN-CT-2000-00111).

1. INTRODUCTION

From the perspective of psychology and anthropology, the evolution of the human brain is reflected by human language, which represents a major milestone in human evolution. Language alone, however, does not seem to be sufficient for successful social (human-to-human) interaction. Therefore, the evolution of a nonverbal signaling system, such as the facial expression mechanism, has captured increased attention in psychology and anthropology for a better understanding of the social context [1]. Unlike human-to-human interaction, which takes facial expression into consideration, human-computer interaction systems that use facial expression analysis have been introduced only recently [2], [3], [4] and [5]. Reliable facial expression modeling and, particularly, human emotion recognition are challenging tasks, since there is no pure emotion: a particular emotion is rather a complex combination of several facial expressions. Moreover, emotions can vary in intensity, which makes emotion recognition even more difficult. Basically, there are two types of approach to facial expression recognition: appearance-based methods and geometric feature-based methods. In the first, the fiducial points of the face are selected either manually [6] or automatically [7]. The face images are convolved with Gabor filters, and the responses extracted from the face images at the fiducial points form vectors that are further used for classification. Alternatively, the Gabor filters can be applied to the entire face image instead of specific face regions. In the geometric feature-based methods, the positions of a set of fiducial points in a face form a feature vector that represents the face geometry. Although the appearance-based methods (especially Gabor wavelets) seem to yield a reasonable recognition rate, the highest recognition rate is obtained when these two main approaches are combined [6], [8]. Several other techniques for the recognition of 6 single upper-face action units (AUs) and 6 lower-face AUs are described in [4]. These works suggest that the GW-based method can achieve high recognition accuracy for facial expression classification. Similar results are obtained by using ICA. The paper is organized as follows. After a brief image database description in Section 2, independent component analysis is described in Section 3. In Section 4 we present the Gabor image representation. The ICA/SVM and GW/SVM systems, the training and testing procedures, and the experiments conducted are described in Section 5. Conclusions are drawn in Section 6.

2. DATA SET DESCRIPTION

The database we used in our experiments contains N = 213 images of Japanese female facial expressions (JAFFE) [9]. Ten expressers posed 3 or 4 examples of each of the 7 basic facial expressions (anger, disgust, fear, happiness, neutral, sadness, surprise), yielding a total of 213 images of facial expressions. Each original image has been manually cropped and aligned with respect to the upper left corner. The cropped image size is 160 × 120. Each cropped image has been downsampled by a factor of 2, yielding an image of 80 × 60 pixels.
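A minimal sketch of that preprocessing step, assuming each image is a NumPy array and that the upper-left crop origin for alignment is known (the function and parameter names are illustrative):

```python
import numpy as np

def preprocess_jaffe(img: np.ndarray, origin=(0, 0)) -> np.ndarray:
    """Crop a 160x120 face region aligned at `origin`, then downsample by 2."""
    r, c = origin
    face = img[r:r + 160, c:c + 120]           # manually aligned 160 x 120 crop
    return face[::2, ::2].astype(np.float32)   # factor-2 downsample -> 80 x 60
```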