Face Segmentation For Identification Using Hidden Markov Models (Ferdinando Samaria)
Translated face recognition literature (original and translation), with source attribution. The source text is from Thomas David Heseltine BSc. Hons., The University of York, Department of Computer Science, for the qualification of PhD, September 2005: "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".

4 Two-dimensional Face Recognition

4.1 Feature Localization

Before discussing the methods of comparing two facial images we now take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).

The eye localisation method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of the face recognition accuracy and not a product of the performance of the eye localisation routine, all image alignments are manually checked and any errors corrected, prior to testing and evaluation.

We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned images of faces is taken, and each image cropped to an area around both eyes. The average image is calculated and used as a template.

Figure 4-1 - The average eyes. Used as a template for eye detection.

Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. This method is, however, highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).

A window is passed over the test images and the absolute difference taken to that of the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.

This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.

Eye localisation is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful, and those in which eye detection failed. Taking the set of successful localisations we compute the average distance from the eye template (Figure 4-2, top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect.
However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.

Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and failed detections (bottom), showing credible variance due to mis-detected features.

In the lower image (Figure 4-2, bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.

Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.

4.2 The Direct Correlation Approach

We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term "direct correlation" to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not infer that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson's correlation, it can be considered a scale- and translation-sensitive form of image correlation), as this is consistent with the contrast made between image space and subspace approaches in later sections.

Firstly, all facial images must be aligned such that the eye centres are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each such vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision:
d = ||q - g||;  d ≤ threshold ⇒ accept, d > threshold ⇒ reject.  (Equ. 4-1)

4.2.1 Verification Tests

The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system's ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (i.e. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experimentation in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher's Linear Discriminant to analyse individual subspace components in section 7 and the identification test for the final evaluations described in section 8.

The verification test measures a system's ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps as a swipe card, proximity chip or PIN number. This number is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image) and compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.

The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large such that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.

However, it should also be noted that if the results are to be representative of system performance in a real-world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition, which may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of 'difficult' images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced. Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB).
This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.

Every image is compared with every other image; no image is compared with itself and no pair is compared more than once (we assume that the relationship is symmetrical). Once two images have been compared, producing a similarity score, the ground truth is used to determine if the images are of the same person or different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as either a false acceptance or false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.

    For IndexA = 0 to length(TestSet):
        For IndexB = IndexA + 1 to length(TestSet):
            Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
            If TestSet[IndexA] and TestSet[IndexB] are the same person:
                Append Score to AcceptScoresList
            Else:
                Append Score to RejectScoresList

    For Threshold = Minimum Score to Maximum Score:
        FalseAcceptCount = 0
        FalseRejectCount = 0
        For each Score in RejectScoresList:
            If Score <= Threshold:
                Increase FalseAcceptCount
        For each Score in AcceptScoresList:
            If Score > Threshold:
                Increase FalseRejectCount
        FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
        FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
        Add point to error curve at (FalseRejectRate, FalseAcceptRate)

These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional (FAR, FRR) pair, which when plotted on a graph produces the error rate curve shown below.

Figure 4-5 - Example error rate curve produced by the verification test (FRR against FAR / %).

The equal error rate (EER) can be seen as the point at which FAR is equal to FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system and allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real-world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances were equal to the percentage of false rejections.
Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections, being unwilling to tolerate intruders at the cost of inconvenient access denials. Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.

There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves, also used in such experiments. Both graphs are simply two visualisations of the same results, in that the ROC format uses the True Acceptance Rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and FAR as functions of the threshold value. This presentation format provides a reference to determine the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves intersect.

Figure 4-6 - Example error rate curve as a function of the score threshold.

The fluctuation of these error curves due to noise and other errors is dependent on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will result in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840 verification operations, hence a drop of 1% EER represents an additional 2588 correct decisions, whereas the quality of a single image could cause the EER to fluctuate by up to 0.28%.

4.2.2 Results

As a simple experiment to test the direct correlation method, we apply the technique described above to a test set of 720 images of 60 different people, taken from the AR Face Database [39]. Every image is compared with every other image in the test set to produce a likeness score, providing 258,840 verification operations from which to calculate false acceptance rates and false rejection rates. The error curve produced is shown in Figure 4-7.

Figure 4-7 - Error rate curve produced by the direct correlation method using no image preprocessing.

We see that an EER of 25.1% is produced, meaning that at the EER threshold approximately one quarter of all verification operations carried out resulted in an incorrect classification. There are a number of well-known reasons for this poor level of accuracy. Tiny changes in lighting, expression or head orientation cause the location in image space to change dramatically. Images in face space are moved far apart due to these image capture conditions, despite being of the same person's face. The distance between images of different people becomes smaller than the area of face space covered by images of the same person, and hence false acceptances and false rejections occur frequently. Other disadvantages include the large amount of storage necessary for holding many face images and the intensive processing required for each comparison, making this method unsuitable for applications applied to a large database. In section 4.3 we explore the eigenface method, which attempts to address some of these issues.
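As a rough illustration of the direct correlation pipeline just described, the sketch below computes Euclidean-distance scores (Equ. 4-1) and sweeps the threshold to estimate the EER. It is a minimal reading of the text, not Heseltine's code; image vectors are assumed to be already aligned and flattened.

    import numpy as np

    def compare_faces(q, g):
        # Direct correlation: Euclidean distance between raw pixel vectors.
        return np.linalg.norm(q.astype(float) - g.astype(float))

    def rates(accept_scores, reject_scores, threshold):
        far = np.mean(np.asarray(reject_scores) <= threshold)  # false acceptances
        frr = np.mean(np.asarray(accept_scores) > threshold)   # false rejections
        return far, frr

    def equal_error_rate(accept_scores, reject_scores):
        # Sweep the threshold through the full score range; the EER is where FAR = FRR.
        best_gap, eer = np.inf, None
        for t in np.unique(np.concatenate([accept_scores, reject_scores])):
            far, frr = rates(accept_scores, reject_scores, t)
            if abs(far - frr) < best_gap:
                best_gap, eer = abs(far - frr), (far + frr) / 2.0
        return eer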
Research on Facial Feature Extraction Technology (FRT) - Graduation Project
Face Recognition Technology (FRT) - graduation project (thesis) for the Computer Science and Technology major: Research on Facial Feature Extraction Technology.

Abstract: Face recognition technology (FRT) is an important research direction in today's fields of pattern recognition and artificial intelligence.
Although research on face recognition has a long history and many different face recognition techniques exist, no single technique is yet generally accepted as both fast and effective, because the human face is a complex pattern that is easily affected by expression, skin colour and clothing. This paper discusses some common methods in face recognition technology, analyses the existing methods of face detection and localisation, facial feature extraction and face recognition, and concludes with an outlook on the future development and applications of face recognition.
Keywords: face recognition, feature localisation, feature extraction

ABSTRACT

Nowadays face recognition technology (FRT) is a hot issue in the field of pattern recognition and artificial intelligence. Although this research already has a long history and many different recognition methods have been proposed, there is still no effective method with low cost and high precision. The human face is a complex pattern and is easily affected by expression, complexion and clothes. In this paper, some general research directions are discussed, including methods of face detection and location, feature abstraction, and face recognition. Then we analyse and forecast face recognition's applications and prospects.

Keywords: Face Recognition Technology, Face location, Features abstraction

Chapter 1: Introduction

Face recognition refers to determining whether a face is present in an input image or video; if so, the position and size of each face and the locations of the main facial organs are then returned.
Face Detection - EE368 Final Project (11-page PPT document)
Suspicious rectangles
Face samples (16x16 pixels), 12 eigenfaces
Eigenface method (2)
Image segmentation
Segmentation based on connected areas - many face candidate rectangles - put strict restrictions on them - Face!!
Eigenface method (1)
Thank you!
240x320x1
Image Segmentation
Yes Face!!
No
Edge-Based Preprocessing
Decision based on ratios, sizes
Eigenface method - Face!!
Skin colors?
- Fortunately, skin colors form a cluster in the YCbCr color space - Thus, we can approximate the skin color region with several lines.
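A minimal sketch of the skin-colour gating these slides describe, assuming illustrative Cb/Cr bounds in place of the deck's fitted lines (the exact limits depend on the training images):

    import cv2
    import numpy as np

    def skin_mask(bgr_image):
        # Skin tones cluster in the Cb/Cr plane of the YCbCr space, so a box
        # (or a few half-planes) around that cluster isolates candidate pixels.
        ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
        lower = np.array([0, 133, 77], dtype=np.uint8)    # Y, Cr, Cb -- assumed bounds
        upper = np.array([255, 173, 127], dtype=np.uint8)
        return cv2.inRange(ycrcb, lower, upper)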
Face Detection
- EE368 Final Project -
Stanford University, Dept. of EE Taesang Yoo, Youngjae Kim (Group 7)
Algorithm Outline
Research on a Face Detection Algorithm Based on Skin-Colour Segmentation
LIU Zhengguang, LIU Jie (Department of Automation, Tianjin University, Tianjin 300072)

Abstract: This paper introduces the Boosted Cascade face detection algorithm, currently the fastest algorithm in the face detection field.
The algorithm does not take skin colour into account when detecting faces, so during recognition some regions that could be quickly ruled out using skin-colour information are not eliminated by the Boosted Cascade algorithm.
To address this deficiency an improved algorithm is proposed: the Boosted Cascade algorithm is used to detect candidate face regions, which are then verified against a facial skin-colour model; if the proportion of pixels in a candidate region that fit the skin-colour model reaches a given threshold, the region is accepted as a face, otherwise it is rejected.
The improved algorithm effectively increases detection accuracy and reduces the chance of detection errors, improving detection efficiency without affecting recognition speed.
Keywords: face detection; Boosted Cascade algorithm; skin-colour segmentation

Research on Face Detection Algorithm Based on Complexional Segmentation
LIU Zhengguang, LIU Jie (Department of Automation, Tianjin University, Tianjin 300072)

[Abstract] This paper presents the fastest face detection algorithm to date: the Boosted Cascade face detection algorithm. Since the Boosted Cascade algorithm does not take complexion into consideration, it sometimes makes mistakes that could easily be avoided by adding complexional information to the algorithm. To solve this problem, this paper presents an improvement to the Boosted Cascade algorithm: the possible face areas are detected with the Boosted Cascade algorithm, and a complexional segmentation model is used to verify the result of recognition. If the degree to which the pixels of a possible face area accord with the complexional model exceeds a certain number, the area is accepted; otherwise it is abandoned. The improved algorithm has high detection accuracy and improves detection efficiency without slowing down the speed.
[Key words] Face detection; Boosted Cascade algorithm; Complexional segmentation

Computer Engineering, Vol. 33, No. 4, February 2007. Artificial Intelligence and Recognition Technology. Article ID: 1000-3428(2007)04-0179-03. Document code: A. CLC number: TP391.41.

Among the many identity verification technologies, biometric recognition is widely regarded as one of the recognition technologies with the greatest application potential.
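A sketch of the two-stage scheme the paper describes, using OpenCV's stock Viola-Jones (boosted cascade) detector followed by a skin-ratio check. The cascade file ships with OpenCV; the YCbCr bounds and the 0.4 acceptance ratio are illustrative assumptions standing in for the paper's trained model and threshold.

    import cv2
    import numpy as np

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(bgr, skin_ratio_threshold=0.4):
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
        candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
        skin = cv2.inRange(cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb),
                           (0, 133, 77), (255, 173, 127))
        accepted = []
        for (x, y, w, h) in candidates:
            # Verify each candidate: accept only if enough pixels look like skin.
            ratio = np.count_nonzero(skin[y:y + h, x:x + w]) / float(w * h)
            if ratio >= skin_ratio_threshold:
                accepted.append((x, y, w, h))
        return accepted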
Discriminative Regions for Human Face Detection
ACCV2002: The 5th Asian Conference on Computer Vision, 23-25 January 2002, Melbourne, Australia.

Discriminative Regions for Human Face Detection*

J. Matas(1,2), P. Bílek(1), M. Hamouz(2), and J. Kittler(2)
1: Center for Machine Perception, Czech Technical University, {bilek,matas}@cmp.felk.cvut.cz
2: Centre for Vision, Speech, and Signal Processing, University of Surrey, {m.hamouz,j.kittler}@

* This research was supported by the Czech Ministry of Education under Research Programme MSM 210000012 Transdisciplinary Biomedical Engineering Research and by European Commission IST-1999-11159 project BANCA.

Abstract

We propose a robust method for face detection based on the assumption that a face can be represented by arrangements of automatically detectable discriminative regions. The appearance of the face is modelled statistically in terms of local photometric information and the spatial relationship of the discriminative regions. The spatial relationship between these regions serves mainly as preliminary evidence for the hypothesis that a face is present in a particular position. The final decision is carried out using the complete information from the whole image patch. The results are very promising.

1 Introduction

Detection and recognition of objects is the most difficult task in computer vision. In many papers object detection and object recognition are considered as distinct problems, treated separately and under different names, e.g. object localisation (detection) and recognition. In our approach, localisation of an object of a given class is a natural generalisation of object recognition. In the terminology that we introduce, object detection is understood to mean the recognition of an object's class, while object recognition implies distinguishing between specific objects from one class. Accordingly, an object class, or category, is a set of objects with similar local surface properties and global geometry. In this paper we focus on object detection; in particular, we address the problem of face localisation.

The main idea of this paper is based on the premise that objects in a class can be represented by arrangements of automatically detectable discriminative regions. Discriminative regions are distinguished regions exhibiting properties important for object detection and recognition. Distinguished regions are "local parts" of the object surface, the appearance of which is stable over a wide range of views and illumination conditions. Instances of the category are represented by a statistical model of appearance of local patches defined in terms of discriminative regions and by their relationship. Such a local model of objects has a number of attractive properties, e.g. robustness to partial occlusion and simpler illumination compensation in comparison with global models.

Superficially, the framework seems to be no more than a local appearance-based method. The main difference is the focus in our work on the selection of regions where appearance is modelled. Detectors of such regions are built during the learning phase. In the detection stage, multiple detectors of discriminative regions process the image. Detection is then posed as a combinatorial optimisation problem. Details of the scheme are presented in Section 3. Before that, previous work is revised in Section 2. Experiments in detecting human faces based on the proposed framework are described in Section 4. Possible refinements of the general framework are discussed in Section 5. The main contributions of this paper are summarised in Section 6.

2 Previous Work

Many early object recognition systems were based on two basic approaches:
• Template matching - one or more filters (templates), representing each object, are applied to a part of the image, and from their responses the degree of similarity between the templates and the image is deduced.

• Measuring geometric features - geometric measurements (distance, angle, ...) between features are obtained, and different objects are characterised by different constraints imposed on the measurements.

It was shown by Brunelli et al. [3] that template matching outperforms measuring geometric features, since the approach exploits more information extracted from the image. Although template matching works well for some types of patterns, complex solutions are needed to cope with non-rigid objects, illumination variations or geometrical transformations due to different camera projections.

Both approaches, template matching and measuring geometric constraints, can be combined together to reduce their respective disadvantages. Brunelli et al. [3] showed that a face detector consisting of individual features linked together with crude geometry constraints has better performance than a detector based on whole-face template matching.

Yuille [20] proposed the use of deformable templates to be fitted to contrast profiles by the gradient descent of a suitable energy function. A similar approach was proposed by Lades et al. [9] and Wiskott et al. [19]. They developed a recognition method based on deformable meshes. The mesh (representing an object or object class) is overlaid over the image and adjusted to obtain the best match between the node descriptors and the image. The likelihood of a match is computed from the extent of mesh deformation.

Schmid et al. [14, 17] proposed detectors based on local jets. The robustness is achieved by using spatial constraints between locally detected features. The spatial constraints are represented by angle and length ratios, which are supposed to be Gaussian variables, each with their own mean and standard deviation.

Burl et al. [4, 5, 6] introduced a principled framework for representing possible deformations of objects using probabilistic shape models. The objects are again represented as constellations of rigid features (parts). The features are characterised photometrically. The variability of constellations is represented by a joint probability density function. A similar approach is used by Mohan et al. [13] for the detection of human bodies. The local parts are again recognised by detectors based on photometric information. The geometric constraints on mutual positions of the local parts in the image are defined heuristically.

All the above-mentioned methods make decisions about the presence or absence of the object in the image only from geometric constraints. Our proposed method shares the same framework, but in our work the local feature detector and geometric constraints define only a set of possible locations of the object in the image. The final decision is made using photometric information, where the parts of the object between the local features are taken into account as well.

There are other differences between our approach and the approach of Schmid [17] or Burl [4, 6]. A coordinate system is introduced for each object from the object class.
This allows us to tackle the problem of selecting distinctive and well-localisable features in a natural way, whereas in the case of Schmid's approach, detectable regions were selected heuristically and a model was built from such selected features. Even though Weber [18] used automatic feature selection, this was not carried out in an object-normalised space (as in our approach), and consequently no requirements on the spatial stability of features were specified. The relative spatial stability of discriminative regions used in our method facilitates a natural affine-invariant way of verifying the presence of a face in the image using correspondences between points in the normalised object space and the image, as will be discussed in detail further on.

3 Method Outline

Object detection is performed in three stages. First, the discriminative region detectors are applied to the image, and thus a set of candidate locations is obtained. In the second stage, the possible constellations (hypotheses) of discriminative regions are formed. In the third stage the likelihood of each hypothesis is computed. The best hypotheses are verified using the photometric information content from the test image. For algorithmic details see Section 4.3.

In the following sections we define several terms used in object recognition in a more formal way. The main aim of these sections is to unify different approaches and different taxonomies found in the literature.

3.1 Object Classes

For our purposes, we define an object class as a collection of objects which share characteristic features, i.e. objects are composed of several local parts and these parts are in a specific spatial relationship. We assume the local parts are detectable in the image directly, and the possible arrangements of the local parts are given by geometrical constraints. The geometrical constraints should be invariant with respect to a predefined group of transformations. Under this assumption, the task of discrimination between two classes can be reduced to measuring the differences between local parts and their geometrical relationships.

3.2 Discriminative Regions

Imagine you are presented with two images depicting objects from one class. You are asked to mark corresponding points in the image pair. We would argue that, unless distinguished regions are present in the two images, the task is extremely hard. Two views of a white featureless wall, a patch of grass, sea surface or an ant hill might be good examples.
However, on most objects, we find surface patches that can be separated from their surroundings and are detectable over a wide range of views. Before proceeding further, we give a more formal definition of a distinguished region:

Definition 1. A Distinguished Region (DR) is any subset of an image that is a projection of a part of a scene (an object) possessing a distinguishing property allowing its detection (segmentation, figure-ground separation) over a range of viewing and illumination conditions.

In other words, the DR detection must be repeatable and stable w.r.t. viewpoint and illumination changes. DRs are referred to in the literature as 'interest points' [7], 'features' [1] or 'invariant regions' [16]. Note that we do not require DRs to have some transformation-invariant property that is unique in the image. If a DR possessed such a property, finding its corresponding DR in another image would be greatly simplified. To increase the likelihood of this happening, DRs can be equipped with a characterisation computed on associated measurement regions:

Definition 2. A Measurement Region (MR) is any subset of an image defined by a transformation-invariant construction (projective, affine, similarity invariant) from one or more (in case of grouping) regions.

The separation of the concepts of DRs and MRs is important and not made explicit in the literature. Since DRs are projections of the same part of an object in both views and MRs are defined in a transformation-invariant manner, they are quasi viewpoint invariant. Besides the simplest and most common case where the MR is the DR itself, an MR may be constructed for example as a convex hull of a DR, a fitted ellipse (affinely invariant [16]), a line segment between a pair of interest points [15] or any region defined in DR-derived coordinates. Of course, invariant measurements from a single or even multiple MRs associated with a DR will not guarantee a unique match on e.g. repetitive patterns. However, often a DR characterisation by invariants computed on an MR might be unique or almost unique.

Note that any set of pixels, not necessarily continuous, can possess a distinguishing property. Many perceptual grouping processes detect such arrangements, e.g. a set of (unconnected) edges lying along a straight line forms a DR of maximum edge density. The property is viewpoint quasi-invariant and detectable by the Hough Transform. The 'distinguished pixel set' [10] would be a more precise term, but it is cumbersome.

The definition of 'local part' (sometimes also called 'feature', 'object component' etc.) is very vague in the recent literature. For our purpose it is important to define it more precisely. In the following discussion we will use the term 'discriminative region' instead of 'local part'. In this way, we would like to emphasise the difference between our definition of discriminative region and the usual sense of local part (a discriminative region is a local part with special properties important for its detection and recognition).

Definition 3. A Discriminative Region is any subset of an image defined by discriminative descriptors computed on a measurement region.

Discriminative descriptors have to have the following properties:

• Stability under change of imaging conditions. A discriminative region must be detectable over a wide range of imaging conditions (viewpoint, illumination). This property is guaranteed by the definition of a DR.

• Good intra-category localisation. The variation in the position of the discriminative region in the object coordinate system should be small for different objects in the same category.

• Uniqueness.
A small number of similar discriminative regions should be present in the image of both object and background.

• High incidence. The discriminative region should be detectable in a high proportion of objects from the same category.

Note that there exists a trade-off between the ability to localise objects and the ability to discriminate between them. A very discriminative part can be a strong cue, even if it appears in an arbitrary location on the surface of the object. On the other hand, a less discriminative part can only contribute information if it occurs in a stable spatial relationship relative to other parts.

3.3 Combining Evidence

This is a rather important stage of the detection process, which significantly influences the overall performance of the system and makes it robust with respect to arbitrary geometrical transformations. The combination of evidence coming from the detected discriminative regions is carried out in a novel way, significantly different from the approaches of Schmid et al. [14, 17] or Burl et al. [4, 5, 6].

In most approaches, a shape model is built over the placement of particular discriminative regions. If an admissible configuration of these regions is found in an image, an instance of the object in the image is hypothesised. This means that all the information conveyed by the area that lies between the detected discriminative regions is discarded. If you imagine a collage consisting of one eye, a nostril and a mouth corner placed in a reasonable manner on a black background, this will still be detected as a face, since no other parts of the image are needed to accept the 'face-present' hypothesis.

In our approach the geometrical constraints are modelled probabilistically in terms of spatial coordinates of discriminative regions. But these geometrical constraints are used only to define possible positions (hypotheses) of the object in the image. The final decision about object presence in the image is deduced from the photometric information content of the original image.

4 Experiment

We have carried out an experiment on face localisation [2] with the XM2VTS database [11]. In order to verify the correctness of our localisation framework, several simplifications to the general scheme were made. In the experiment the discriminative regions were semi-automatically defined as the eye corners, the eye centres, the nostrils and the mouth corners.

4.1 Detector of discriminative regions

As a distinguished region detector we use the improved Harris corner detector [8]. Our implementation [2] of the detector is relatively insensitive to illumination changes, since the threshold is computed automatically from the neighbourhood of the interest point. Such a corner detector is not generally invariant to scale change, but we solve this problem by searching for interest points through several scales. We have observed [2] that the distribution of interest points coincides with the manually labelled points. This means these points should define discriminative regions (here we suppose that humans often identify interest points as the most discriminative parts of an object). Further, we have assumed that all potential in-plane face rotations and differences in face scale are covered by the training database.

The MRs were defined very simply, as rectangular regions with centres at the interest points. We select ten positions (the left eye centre, the right eye centre, the right left-eye corner, the left left-eye corner, the right right-eye corner, the left right-eye corner, the left nostril, the right nostril, the left mouth corner, the right mouth corner), which we further denote as regions 1-10.
All properties of a discriminative region are then determined by the size of the region. As a descriptor of a region we use the normalised colour information of all points contained in the region.

Each region was modelled by a uni-modal Gaussian in a low-dimensional subspace, and the hypothesis of whether a sample belongs to the class of faces is decided from the distance of this sample from the mean for a given region. The distance from the mean is measured as a sum of the in-subspace (DISS) and from-subspace (DFSS) distances (Moghaddam et al. [12]).

4.2 Combining Evidence

The proposed method is based on finding the correspondences between generic face features (referred to as discriminative regions) that lie in the face space and the face features detected in an image. This correspondence is then used to estimate the transformation that a generic face possibly underwent. So far, a correspondence of three points was used to estimate a four- or six-parametric affine transformation.

When the transformation from the face space to the image space is determined, the verification of a 'face-present' hypothesis becomes an easy task. An inverse transformation (i.e. a transformation from the image space into the face space) is found, and the image patch (containing the three points of correspondence) is transformed into the face space. The decision of whether the 'face-present' hypothesis holds or not is carried out in the face space, where all the variations introduced by the geometrical transformation (so far only an affine transformation is assumed to be the admissible transformation that a generic face can undergo) are compensated, or at least reduced to a negligible extent. The distance from a generic face class [12] is computed for the transformed patch, and a threshold is used to determine whether the patch is from the face class or not.

Moreover, many possible face patches do not necessarily have to be verified, since certain constraints can be put on the estimated transformation. Imagine, for instance, that the only feasible transformations a face can undergo are scaling from 50% to 150% of the original size in the face space and rotations up to 30 degrees. This is quite a reasonable limitation which will cause most of the correspondences to be discarded without doing a costly verification in the face space (in our experiments the pruning reached about 70%). In the case of the six-parametric affine transform, both shear and anisotropic scale are incorporated as admissible transformations.

4.3 Algorithm summary

Algorithm 1: Detection of human faces

1. Detection of the distinguished regions. For each image from the test set, detect the distinguished regions using the illumination-invariant version of the Harris detector.

2. Detection of the discriminative regions. For each detected distinguished region, determine to which class the region belongs, using the PCA-based classifier in the colour space, from among the ten discriminative region classes (practically, the eye corners, the eye centres, the nostrils and the mouth corners). The distinguished regions that do not belong to any of the predefined classes are discarded.

3. Combination of evidence.

• Compute the estimate of the transformation from the image space to the face space using the correspondences between the three points in the face space and in the image space.

• Decompose this transformation into rotation, scale, translation and possibly shear, and test whether these parameters lie within predefined constraints, i.e. make the decision whether the transformation is admissible or not.
• If the transformation derived from the correspondences is admissible, transform the image patch that is defined by the transformation of the face outline into the face space.

4. Verification. Verify the 'face-present' hypothesis using a PCA-based classifier.

4.4 Results

Results of the discriminative region detectors are summarised in Table 1. Note that since the classifier is very simple, the performance is not very high. However, even with such a simple detector of discriminative regions the system is capable of detecting faces with very low error, since only a small number of successfully detected discriminative regions is needed (in our case only 3).

Several extensive experiments were conducted. Image patches were declared as 'face' when their Mahalanobis-distance-based score lay below a certain threshold. 200 images from the XM2VTS database were used for training a grayscale classifier based on the Moghaddam method [12], as mentioned earlier. The detection rate reached 98% in the case of the XM2VTS database; see Figure 1 for examples. Faces in several images containing cluttered backgrounds were successfully detected, as shown in Figure 2.

Table 1. Performance of discriminative region detectors

                false negative        false positive
                %         #           %         #
    Region 1    31.89     191         72.26     3831
    Region 2    10.68     64          37.88     1342
    Region 3    57.76     346         33.03     433
    Region 4    54.92     329         19.85     218
    Region 5    15.03     90          22.34     538
    Region 6    13.69     82          62.33     3260
    Region 7    15.53     93          4.00      78
    Region 8    12.52     75          5.07      104
    Region 9    48.75     292         6.27      70
    Region 10   33.56     201         14.90     233

Figure 1. Experiment results (correctly detected; false rejections).

5 Discussion and Future Work

We proposed a method for face detection using discriminative regions. The detector performance is very good for the case when the general face detection problem is constrained by assuming a particular camera and pose position. We also assumed that the parts that appear distinctive to the human observer will also be discriminative, and therefore the discriminative regions were selected manually. In general, the correlation between distinctiveness and discriminativeness cannot necessarily be assumed, and therefore the discriminative regions should be 'learned' from the training images. The training problem was addressed in this paper only partially. As an alternative, the method proposed by Weber et al. [18] can be exploited.

The admissible transformation which a face can undergo has so far been restricted to an affine transformation. Nevertheless, the results showed, even in such a simple case, that high detection performance can be achieved. Future modifications will involve the employment of more complex transformations (such as general non-rigid transformations). The PCA-based classification can be replaced by more powerful classifiers, such as Neural Networks or Support Vector Machines.
experiments demonstrated that the pro-posed method is able to solve a rather difficult problem incomputer vision.Moreover we showed that even simplerecognition methods(with a limited capability when usedalone)can be configured to create powerful framework ableto tackle such a difficult task as face detection.References[1] A.Baumberg.Reliable feature matching across widely sepa-rated views.In Proc.of Computer Vision and Pattern Recog-nition,pages I:774–781,2000.[2]P.B´ılek,J.Matas,M.Hamouz,and J.Kittler.Detection ofhuman faces from discriminative regions.Technical ReportVSSP–TR–2/2001,Department of Electronic&ElectricalEngineering,University of Surrey,2001.[3]R.Brunelli and T.Poggio.Face recognition:Features vs.templates.IEEE Trans.on Pattern Analysis and MachineIntelligence,15(10):1042–1053,1993.[4]M.C.Burl,T.K.Leung,and P.Perona.Face localizationvia shape statistics.In Proc.of International Workshop onAutomatic Face and Gesture Recognition,pages154–159,1995.[5]M.C.Burl and P.Perona.Recognition of planar objectclasses.In Proc.of Computer Vision and Pattern Recog-nition,pages223–230,1996.[6]M.C.Burl,M.Weber,and P.Perona.A Probabilistic ap-proach to object recognition using local photometry abdglobal Geometry.In Proc.of European Conference on Com-puter Vision,pages628–641,1998.[7]Y.Dufournaud,C.Schmid,and R.Horaud.Matching im-ages with different resolutions.In Proc.of Computer Visionand Pattern Recognition,pages I:612–618,2000.[8] C.J.Harris and M.Stephens.A combined corner and edgedetector.In Proc.of Alvey Vision Conference,pages147–151,1988.[9]des,J. C.V orbr¨u ggen,J.Buhmann,nge,C.von der Malsburg,R.P.W¨u rtz,and W.Konen.Distrotioninvariant object recognition in the dynamic link architecture.IEEE Trans.on Pattern Analysis and Machine Intelligence,42(3):300–310,1993.[10]J.Matas,M.Urban,and T.Pajdla.Unifying view for wide-baseline stereo.In B.Likar,editor,puter Vi-sion Winter Workshop,pages214–222,Ljubljana,Sloveni,February2001.Slovenian Pattern Recorgnition Society.[11]K.Messer,J.Matas,J.Kittler,J.Luettin,and G.Maitre.XM2VTSDB:The extended M2VTS database.In R.Chel-lapa,editor,Second International Conference on Audio andVideo-based Biometric Person Authentication,pages72–77,Washington,USA,March1999.University of Maryland.[12] B.Moghaddam and A.Pentland.Probabilistic visual learn-ing for object detection.In Proc.of International Confer-ence on Computer Vision,pages786–793,1995.[13] A.Mohan,C.Papageorgiou,and T.Poggio.Example-basedobject detection in images by components.IEEE Trans.onPattern Analysis and Machine Intelligence,23(4):349–361,2001.[14] C.Schmid and R.Mohr.Local grayvalue invariants for im-age retrieval.IEEE Trans.on Pattern Analysis and MachineIntelligence,19(5):530–535,1997.[15] D.Tell and S.Carlsson.Wide baseline point matching usingaffine invariants computed from intensity profiles.In Proc.of European Conference on Computer Vision,pages754–760,2000.[16]T.Tuytelaars and L.van Gool.Wide baseline stereo match-ing based on local,affinely invariant regions.In Proc.ofBritish Machine Vision Conference,pages412–422,2000.[17]V.V ogelhuber and C.Schmid.Face detection based ongeneric local descriptors and spatial constraints.In Proc.of International Conference on Computer Vision,pagesI:1084–1087,2000.[18]M.Weber,M.Welling,and P.Perona.Unsupervised learn-ing of models for recognition.In Proc.of European Confer-ence on Computer Vision,pages18–32,2000.[19]L.Wiskott,J.-M.Fellous,N.Kr¨u ger,and C.von der Mals-burg.Face recognition by elastic bunch graph matching.IEEE Trans.on Pattern Analysis 
[20] A. L. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3(1):59-70, 1991.
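The admissibility test in step 3 of Algorithm 1 above can be made concrete with a short sketch: estimate the affine map from three face-space/image correspondences, decompose its linear part, and check the example limits quoted in Section 4.2 (50-150% scale, rotation up to 30 degrees). This is an illustrative reading of the step, not the authors' code.

    import numpy as np
    import cv2

    def admissible(face_pts, img_pts, scale_range=(0.5, 1.5), max_rot_deg=30.0):
        # Affine transform from exactly three point correspondences.
        M = cv2.getAffineTransform(np.float32(face_pts), np.float32(img_pts))
        A = M[:, :2]                               # linear part (rotation/scale/shear)
        scale = np.sqrt(abs(np.linalg.det(A)))    # isotropic scale estimate
        rot = np.degrees(np.arctan2(A[1, 0], A[0, 0]))
        ok = scale_range[0] <= scale <= scale_range[1] and abs(rot) <= max_rot_deg
        return ok, M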
An English essay on facial recognition technology
Facial recognition technology, also known as face recognition technology, is a biometric technology that analyzes and identifies individuals based on their facial features. It has gained significant attention and widespread use in recent years due to its potential applications in various fields. In this article, we will explore the concept of facial recognition technology, its benefits, and potential concerns.

Firstly, let's delve into how facial recognition technology works. It involves capturing an individual's facial image or video and analyzing it using various algorithms. These algorithms extract unique facial features such as the distance between the eyes, the shape of the nose, and the contour of the face. These features are then converted into a numerical code, commonly referred to as a faceprint or facial template. This code is compared against a database of known faces to identify the individual.

One of the key benefits of facial recognition technology is its potential to enhance security and safety. It can be used in surveillance systems to identify and track individuals in real time, aiding in the prevention and investigation of criminal activities. For example, it can help law enforcement agencies in identifying suspects or locating missing persons. Additionally, facial recognition technology can be integrated into access control systems, replacing traditional methods such as ID cards or passwords, providing a more secure and convenient way of authentication.

Moreover, facial recognition technology has found applications in various industries. In the retail sector, it can be used to analyze customer demographics and behavior, helping businesses tailor their marketing strategies and improve customer experiences. In the healthcare industry, it can assist in patient identification, reducing the risk of medical errors and improving the efficiency of healthcare delivery. Furthermore, facial recognition technology has been utilized in the entertainment industry for personalized experiences, such as unlocking exclusive content or customizing avatars in video games.

However, despite its potential benefits, facial recognition technology also raises concerns regarding privacy and ethical implications. The collection and storage of facial data raise questions about data protection and potential misuse. There are concerns that the technology may be used for mass surveillance, infringing on individuals' right to privacy. Additionally, there have been cases of misidentification or bias in facial recognition systems, particularly when it comes to individuals from diverse racial or ethnic backgrounds. This highlights the importance of ensuring the accuracy and fairness of these systems through rigorous testing and regulation.

In conclusion, facial recognition technology holds great promise in various domains, ranging from security to personalized experiences. Its ability to analyze and identify individuals based on their facial features has revolutionized many industries. However, it is crucial to address the concerns surrounding privacy and ethical considerations to ensure its responsible and beneficial use. Striking a balance between innovation and safeguarding individual rights will be key in harnessing the full potential of facial recognition technology.
Image Processing Technology in Face Recognition
The face, as an important biometric feature that is highly universal and can be captured without contact, is increasingly used for identity verification.
Face recognition refers specifically to computer techniques that verify identity by analysing and comparing visual facial feature information.
Face recognition technology is widely applicable: it can be used in security verification systems, medicine, archive management, bank and customs surveillance systems, automatic access control systems, and more [1].
Compared with identification methods based on other biometric features such as fingerprints or irises, face recognition is friendlier, more convenient and more unobtrusive.
Because of its vast application prospects and its unparalleled advantages, face recognition has become a hot topic in pattern recognition and artificial intelligence.
Image preprocessing is an important step in the face recognition process.
Because acquisition environments differ, input images often suffer from noise and insufficient contrast.
To ensure consistency in the size, position and image quality of faces across images, the images must be preprocessed.
1. Basic content and process of face recognition

Face recognition can generally be described as follows: given a still or moving image, use an existing face database to identify one or more persons in the image.
Broadly speaking, the research content covers the following five aspects:

(1) Face detection: detecting the presence of faces in various scenes and determining their positions.
This task is mainly affected by illumination, noise, head tilt and various kinds of occlusion.
(2) Face representation: determining how to describe the detected faces and the known faces in the database.
Common representations include geometric features (e.g. Euclidean distances, curvatures, angles), algebraic features (e.g. matrix eigenvectors), fixed feature templates, eigenfaces, and moiré patterns.
(3) Face identification: what is usually meant by face recognition, i.e. comparing the face to be recognised with the known faces in the database and returning the relevant information.
The core of this process is the choice of an appropriate face representation and matching strategy.
(4) Expression analysis: analysing the expression of the face to be recognised and classifying it.
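The preprocessing step stressed above usually normalises geometry and contrast before any recognition stage. A minimal sketch with OpenCV; the target size is an assumption:

    import cv2

    def preprocess(bgr_image, size=(92, 112)):
        # Grayscale conversion, geometric normalisation to a fixed size, and
        # histogram equalisation to compensate for poor contrast and lighting.
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, size)
        return cv2.equalizeHist(gray)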
Face recognition: translated foreign literature and references
Face recognition: translated foreign literature and references (the document contains the English original and the Chinese translation).

Translation: A PCA-Based Method for Real-Time Face Detection and Tracking

Abstract: This paper presents a method for real-time face detection and tracking against complex backgrounds.
The method is based on principal component analysis (PCA).
To detect a face, we first use a skin-colour model together with motion information (such as posture, gestures and eye colour).
Then PCA is applied to these candidate regions to determine the true position of the face.
Face tracking is based on the Euclidean distance, in feature space, between the previously tracked face and the most recently detected face.
The camera controller used for face tracking works by keeping the detected face region at the centre of the screen using a pan/tilt platform.
The method can also be extended to other systems, such as teleconferencing and intruder detection systems.
1. Introduction

Video signal processing has many applications, for example teleconferencing for visual communication and lip-reading systems for the disabled.
In many of the systems mentioned above, face detection and tracking are indispensable components.
This paper concerns real-time tracking of face regions [1-3].
In general, tracking methods can be divided into two categories according to the tracking perspective.
Some divide face tracking into recognition-based tracking and motion-based tracking, while others divide it into edge-based tracking and region-based tracking [4].
Recognition-based tracking is built directly on object recognition techniques, and the performance of the tracking system is limited by the efficiency of the recognition method.
Motion-based tracking relies on motion detection techniques, which can be divided into optical-flow methods and motion-energy methods.
Edge-based methods track the edges in an image sequence, which are usually the boundaries of the main objects.
However, because the tracked object must exhibit clear edge changes under the prevailing colour and illumination conditions, these methods suffer when colour and illumination vary.
Furthermore, when the background of an image contains prominent edges, it is difficult for such methods to provide reliable results.
Much of the current literature on this class of methods derives from the work of Kass et al. on snakes (active contour models) [5].
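A minimal sketch of the PCA-based matching this translation describes: project faces into a PCA feature space and declare track continuity when the Euclidean distance between the previous and current detections is small. The component count, threshold, and training matrix are placeholders, not values from the paper.

    import numpy as np
    from sklearn.decomposition import PCA

    def build_feature_space(training_faces, k=20):
        # training_faces: array with one flattened, aligned face per row
        # (placeholder data, supplied by the caller).
        return PCA(n_components=k).fit(training_faces)

    def same_track(pca, prev_face, new_face, threshold=10.0):
        # Euclidean distance in feature space between the previously tracked
        # face and the newly detected face decides whether they match.
        f = pca.transform(np.vstack([prev_face, new_face]))
        return np.linalg.norm(f[0] - f[1]) < threshold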
Face recognition foreign-language literature
Method of Face Recognition Based on Red-Black Wavelet Transform and PCA

Yuqing He, Huan He, and Hongying Yang
Department of Opto-Electronic Engineering, Beijing Institute of Technology, Beijing, P.R. China, 100081
20701170@…cn

Abstract.
With the development of man-machine interfaces and recognition technology, face recognition has become one of the most important research topics in the domain of biometric feature recognition. Nowadays, PCA (Principal Components Analysis) has been applied to recognition on many face databases and has achieved good results. However, PCA has its limitations: a large volume of computation and low discriminative ability.
In view of these limitations, this paper puts forward a face recognition method based on the red-black wavelet transform and PCA. An improved histogram equalization is used for image pre-processing in order to compensate for illumination. Then the red-black wavelet sub-band that contains the information of the original image is used to extract features and perform matching.
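The red-black wavelet transform of the abstract is not available in standard libraries; as a stand-in that illustrates the same idea (feed a low-frequency wavelet sub-band to PCA instead of raw pixels), the sketch below uses a conventional 2-D Haar DWT via PyWavelets after histogram equalization.

    import cv2
    import pywt

    def wavelet_feature(gray_image):
        # Illumination compensation, then a single-level 2-D DWT; the LL
        # (approximation) sub-band retains most of the image information
        # at a quarter of the size, shrinking the PCA input.
        eq = cv2.equalizeHist(gray_image)
        ll, (lh, hl, hh) = pywt.dwt2(eq.astype(float), "haar")
        return ll.ravel()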
Face Detection - EE368 Final Project (12-page PPT document)
960x1280x3
Resize
240x320x3
Detect Skin-colors
- Reduced the false alarm rate using two filters
- Out of 163 faces in the 7 training images, 90% were detected with a false alarm rate of 5%
- Dependent on the training set through assumptions made from it (e.g. image size, number of faces, colors)
- Edge-based preprocessing speeds up the process by enabling the eigenface method to skip blank regions
Detection result
Training_1.jpg
Conclusions
- A combination of skin color method & eigenface method
References
- Skin color detection: Face Detection in Color Images using Wavelet Packet
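The edge-based pre-filter mentioned in the slides above can be sketched in a few lines: windows whose edge density is too low to plausibly contain a face are skipped, so the costlier eigenface test runs only on promising regions. The Canny thresholds and density cut-off are illustrative assumptions, not the group's values.

    import cv2
    import numpy as np

    def worth_testing(gray_window, min_edge_fraction=0.05):
        # Skip blank regions: a real face produces a minimum density of edges.
        edges = cv2.Canny(gray_window, 100, 200)
        return np.count_nonzero(edges) / edges.size >= min_edge_fraction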
Human Face Recognition Technology
• The application of face recognition
• The face recognition market is still in its cultivation stage, but its room for development is very wide, and it has attracted great attention in recent years. As a new biometric technology, face recognition has unique advantages in application compared with iris recognition, fingerprint scanning and palm scanning. It can be widely used in life enhancement, security, identity verification, criminal investigation and many other aspects of life.
• The basic idea of the eigenface method: after the KL transform of the high-dimensional image space, a group of new orthogonal basis vectors is obtained, and these serve as the features for face recognition.
• The advantage of Human Face Recognition
The advantage of face recognition lies in its naturalness and in the fact that it can be used without the measured individual perceiving it.
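Written out, the KL-transform (eigenface) idea summarised in the slides above is the standard PCA derivation; this is a generic sketch rather than the slide author's notation. With training face vectors x_1, ..., x_M:

    \mu = \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad
    C = \frac{1}{M}\sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^{\top}

    C\,u_k = \lambda_k u_k, \qquad k = 1, \dots, K

The eigenvectors u_k of the covariance matrix C form the new orthogonal basis (the "eigenfaces"); a face x is represented by its projection y = U^T (x - \mu), where the columns of U are the K leading eigenvectors.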
Face Detection and Segmentation Method Based on Mask R-CNN
Computer Engineering, Vol. 46, No. 6, June 2020. Graphics and Image Processing. Article ID: 1000-3428(2020)06-0274-07. Document code: A. CLC number: TP391.

Face Detection and Segmentation Method Based on Mask R-CNN

LIN Kaihan, ZHAO Huimin, LU Jujian, ZHAN Jin, LIU Xiaoyong, CHEN Rongjun
(School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China)

Abstract: Existing mainstream face detection algorithms do not perform pixel-level segmentation, so the extracted face features carry noise and detection accuracy is unsatisfactory. To address this problem, a face detection and segmentation method based on Mask R-CNN is proposed.
ResNet-101 combined with an RPN is used to generate candidate regions, and RoIAlign is then used for pixel-level feature point localisation, aiming at accurate localisation.
A fully convolutional network then generates the corresponding binary face mask, achieving segmentation between face and background in the image.
In addition, a face dataset with segmentation annotations is constructed to train the model.
Experimental results on commonly used face detection datasets show that the method achieves good face detection performance and, alongside accurate detection, realises pixel-level segmentation of face information.
Keywords: face detection; Mask R-CNN algorithm; instance segmentation; RoIAlign algorithm; fully convolutional network

Chinese citation format: 林凯瀚, 赵慧民, 吕巨建, 等. 基于Mask R-CNN的人脸检测与分割方法[J]. 计算机工程, 2020, 46(6): 274-280.
English citation format: LIN Kaihan, ZHAO Huimin, LU Jujian, et al. Face detection and segmentation method based on Mask R-CNN [J]. Computer Engineering, 2020, 46(6): 274-280.

[Abstract] Face detection is an important research direction in computer vision and information security, which has been widely studied over the past few decades. In the traditional face detection method, there is no pixel-level segmentation process, which leads to the problem of face features with noise and unsatisfactory detection accuracy. In order to overcome this shortcoming, a face detection and segmentation method based on Mask R-CNN is proposed in this paper. In this method, ResNet-101 combined with an RPN is used to generate RoIs, and RoIAlign faithfully retains the exact spatial locations; the corresponding binary face mask is generated through a Fully Convolutional Network. In order to train the model, this paper constructs a face dataset with segmentation annotation information. The experimental results on well-known face detection datasets show that the proposed method has a better face detection effect and can achieve pixel-level face information segmentation at the same time.
[Keywords] face detection; Mask R-CNN algorithm; instance segmentation; RoIAlign algorithm; Fully Convolutional Network (FCN)
DOI: 10.19678/j.issn.1000-3428.0054566

0 Overview

Face detection is an important research direction in computer vision and information security, and an important branch of object detection technology; it carries significant research and application value.
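For orientation, the inference side of such a pipeline can be sketched with torchvision's off-the-shelf Mask R-CNN. Note the assumptions: the paper uses ResNet-101 with an RPN and RoIAlign and trains on its own face dataset, while torchvision ships a ResNet-50-FPN model with COCO weights, used here purely as a stand-in.

    import torch
    import torchvision

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    def detect_and_segment(image_tensor, score_threshold=0.7):
        # image_tensor: float tensor of shape (3, H, W) scaled to [0, 1].
        with torch.no_grad():
            out = model([image_tensor])[0]
        keep = out["scores"] > score_threshold
        # The mask head (an FCN) outputs soft masks; thresholding at 0.5
        # yields the binary masks described in the abstract.
        masks = out["masks"][keep, 0] > 0.5
        return out["boxes"][keep], masks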
The Application of Facial Recognition in China (面部识别在中国的应用, English essay)

Facial recognition technology, a cutting-edge biometric technology, has been experiencing rapid development and widespread application in China. Leveraging advances in artificial intelligence and machine learning, this technology has become an integral part of daily life, revolutionizing various industries and sectors.

In the realm of security, facial recognition has become a powerful tool in the hands of law enforcement agencies. Police forces across the country are using this technology to identify criminal suspects, track fugitives, and monitor public places for suspicious activities. This not only enhances the efficiency of law enforcement but also improves public safety.

The retail industry has also been revolutionized by facial recognition. Stores are now able to recognize their customers and provide personalized shopping experiences. This technology can identify a customer's preferences and buying habits, enabling retailers to offer targeted discounts and recommendations. Furthermore, it can also help in preventing shoplifting by identifying known thieves.

Financial institutions have also embraced facial recognition technology. Banks and other financial institutions are using this technology to authenticate customers and prevent fraud. By comparing a customer's face with their stored biometric data, these institutions can ensure that only the rightful owner can access their accounts.

In addition to these industries, facial recognition technology is also finding its way into our daily lives. Smartphones and other electronic devices now come with facial unlock features, making it easier and more convenient for users to unlock their devices. The technology is also being used in airports, railway stations, and other public places to facilitate fast and efficient check-in and identification processes.

Despite its widespread application, facial recognition technology in China has also raised concerns regarding privacy and ethical issues. There have been reports of misuse of this technology, such as the unauthorized collection and sale of biometric data. To address these concerns, the Chinese government has been working on regulating the use of facial recognition technology, ensuring that it is used ethically and within legal limits.

In conclusion, facial recognition technology has brought about significant changes in China, revolutionizing various industries and enhancing public safety. However, it is crucial to address the privacy and ethical issues associated with this technology to ensure its responsible and sustainable use.
Remove Unknown Face Occlusion by Fuzzy Principal Component Analysis (基于模糊主分量分析的脸部未知遮挡物去除)

Zhi-ming Wang (1, 2), Jian-hua Tao (2)
1: School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China
2: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China

Abstract: This paper proposes an iterative face occlusion removal algorithm based on accumulative error compensation and fuzzy principal component analysis (FPCA). The originality of this work is twofold. First, instead of the successive error, the normalized accumulated absolute error is used as the image fusion weight in recursive error compensation. Second, gappy PCA with a bi-valued mask is extended to fuzzy PCA with a continuous mask between 0 and 1. The value of the fuzzy mask vector is also defined by the normalized accumulated error, which indicates the probability that a face region is occluded. Experimental results show that the new reconstruction algorithm can effectively remove unknown occlusions of various shapes, and outperforms the classical iterative PCA based algorithm.
Keywords: face occlusion; face reconstruction; principal component analysis (PCA); gappy principal component analysis (GPCA); fuzzy principal component analysis (FPCA)
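The recursion described in the abstract is easy to prototype. The code below is a minimal, hedged rendering of the general idea (iterative PCA reconstruction with a continuous mask updated from accumulated error); the exact weighting scheme of the published algorithm is not reproduced here.

# Sketch: iterative occlusion removal with a fuzzy (continuous, 0..1) mask.
# Assumes a PCA basis U (D x K) and mean mu (D,) trained on clean faces; the
# mask-update rule is a simplified stand-in for the paper's normalized
# accumulated absolute error weighting.
import numpy as np

def remove_occlusion(x, U, mu, n_iter=20):
    mask = np.ones_like(x)                 # 1 = trusted pixel, 0 = occluded
    acc_err = np.zeros_like(x)             # accumulated absolute error
    y = x.copy()
    for _ in range(n_iter):
        recon = U @ (U.T @ (y - mu)) + mu  # PCA reconstruction of current image
        acc_err += np.abs(x - recon)
        # Fuzzy mask: pixels with large accumulated error are likely occluded.
        mask = 1.0 - acc_err / (acc_err.max() + 1e-9)
        # Error compensation: trust the input where the mask is high,
        # the reconstruction where it is low.
        y = mask * x + (1.0 - mask) * recon
    return y, mask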
Localization of human faces fusing color segmentation and depth from stereo
Francesc Moreno, Juan Andrade-Cetto, and Alberto Sanfeliu
Institut de Robòtica i Informàtica Industrial, UPC-CSIC, Llorens i Artigas 4-6, Edifici U, 2a pl., Barcelona 08028, Spain
8th IEEE International Conference on Emerging Technologies and Factory Automation Antibes, October 2001, pp. 527-535.
I. INTRODUCTION
The ability to recognize a human face or a facial expression is of great importance for the interaction of computers and robots with humans. At the Institut de Robòtica i Informàtica Industrial, UPC-CSIC, we are interested in providing our mobile platform Marco [1] with the ability to recognize people. Some results from our group on the recognition of human faces with the aid of a computer vision system are reported in [2]. However, this technique does not address the problem of locating faces in the scene prior to recognition. For this reason, we present now a method for the localization of faces that complements our recognition module.

The method is tailored for use in factory automation applications where the detection and localization of humans is necessary for the completion or interruption of a particular task. This is particularly useful in robotic workcells that require automatic safety precautions, such as speed reduction or sound warnings when a human operator approaches the workspace, or immediate motion interruption if the operator interferes with robot motion. These systems are tailored to multirobot workcells where motion sensors cannot accurately estimate human presence. Another application field is that of human-machine interaction. It is desirable for a mobile service robot to be able to modify its behavior with respect to its interaction with people. Such is the case of surveillance systems or mobile delivery units that must modify their trajectory in the presence of humans, and ultimately be able to recognize different people and behave accordingly.

When no restrictions are imposed on the input images, human face localization can be a challenging task. Apart from scale variation and position uncertainty, other artifacts make this problem difficult, including a priori ignorance of the pose of the face in the image (frontal, sideways, or nodded), occlusions of the face by other objects, and lighting conditions that may change the position of the skin color in the color space. Complex backgrounds can also lead to the inference of false head shapes. Many approaches have been proposed for the detection and localization of human faces; a survey on face detection methods can be found in [3]. The reader should take into account the
PCA Face Recognition Lab Report (PCA人脸识别实验报告)
Assignment 1: Face Recognition using PCA
Introduction to Biometrics
Xun Huang, 0827121, TU/e, x.huang@student.tue.nl
2012/10/8

1. Introduction
This assignment builds a simple recognition system using the standard Principal Component Analysis (PCA) method to identify face images and non-face images. The test set is then used to analyze the performance of the PCA approach for varying feature sizes.

2. Initialization: reading the training set
The 'FaceData' set contains images of 40 persons, with 10 images of each. The first step is to extract the first 5 images of each person as the training set. Following the PCA method, each 2D image is reshaped into a 1D vector; column vectors are preferred to row vectors here. This produces an R x 200 matrix, with R = 56 * 46, the height and width of each image.

3. Plot the first 20 eigenfaces for the PCA approach
Figure 1: Eigenfaces for standard PCA.

4. Reconstruct a sample image from the test set using PCA features of given sizes
The chosen PCA feature sizes are [2, 5, 10, 20, 40, 60, 100, 150, 200, 400, 1000, 2000], and the chosen sample image is (12, 10); see Figure 2. A non-face image, the Eiffel Tower, is reconstructed for comparison; see Figure 3.
Figure 2: Reconstruction of the sample face image (12, 10).
Figure 3: Reconstruction of a non-face image.
Comment: In Figure 2, the rough outline of the subject's head emerges from about 100 features, and his hair style is clear from 600. In Figure 3, the image only begins to look like a tower after 1000 features; before that it resembles little more than a man's nose. We can conclude that a face-trained basis reconstructs (and hence identifies) face images much better than non-face images.

5. Analysis of the performance of the PCA method
The total variance explained by selecting the K largest eigenvectors is computed as
    variance(K) = (sum of the K largest eigenvalues) / (sum of all eigenvalues)
Figure 4: Computing the rank-1 identification accuracy.
Figure 5: Total variance explained.

Number of eigenfaces | Variance explained | Rank-1 recognition rate
   2 | 0.332850 | 0.3200
   8 | 0.614053 | 0.8050
  10 | 0.654791 | 0.8400
  15 | 0.722345 | 0.8400
  20 | 0.767350 | 0.8650
  27 | 0.810986 | 0.8750
  50 | 0.892373 | 0.8900
  90 | 0.952278 | 0.8900
 120 | 0.974525 | 0.9000
 150 | 0.988557 | 0.9050
 200 | 1.000000 | 0.9050
 250 | 1.000000 | 0.9050
 500 | 1.000000 | 0.9050
1000 | 1.000000 | 0.9050
1500 | 1.000000 | 0.9050
2000 | 1.000000 | 0.9050
Table 1: Performance of the PCA based face recognizer with respect to feature dimensionality.

6. Summary
The table shows that when the number of eigenfaces reaches 8, the recognition rate rises rapidly to around 0.8, then keeps increasing gradually with the number of eigenfaces, stopping at 0.9050 once 150 eigenfaces are used. When the number of eigenfaces reaches 200, the variance explained reaches 1: all subsequent eigenvalues are zero (with 200 training images the data spans at most 200 directions), so the first 200 eigenvectors contain all the information needed for this experiment.
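A compact NumPy rendering of the report's pipeline may help make the steps concrete. The array shapes follow the report (56x46 images, 200 training vectors), but the loading of 'FaceData' and the file layout are assumptions and are not shown.

# Sketch: eigenfaces via PCA, reconstruction with K components, and variance
# explained. `faces` is assumed to be a (200, 56*46) array of flattened
# training images.
import numpy as np

def fit_pca(faces):
    mu = faces.mean(axis=0)
    X = faces - mu
    # SVD of the centered data: the rows of Vt are the eigenfaces.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (len(faces) - 1)      # eigenvalues of the covariance
    return mu, Vt, eigvals

def reconstruct(x, mu, Vt, K):
    """Project x onto the first K eigenfaces and back-project."""
    coeffs = Vt[:K] @ (x - mu)
    return mu + Vt[:K].T @ coeffs

def variance_explained(eigvals, K):
    return eigvals[:K].sum() / eigvals.sum()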
Face Detection, Localization and Tracking in Video (视频中的人脸检测定位与跟踪识别) (1)

Hua Jian, Zhang Xiang, Gong Xiaobiao (School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China)

Abstract: Face detection, positioning and tracking, as an important biometric recognition technique, is very widely applied. Many methods exist; in order to localize human faces accurately in color images captured from video, a face localization algorithm based on a skin-color model and skin-color segmentation is presented. First, the skin-color model is built. Then, after binarization with an adaptive threshold, skin-color segmentation removes the non-face regions from the color image. Finally, the human face is localized by using the characteristics of the eyes. Experiments show that the algorithm is effective at localizing frontal faces, and faces turned through an angle, in color images with complex backgrounds.
Keywords: face detection and tracking; skin-color modelling; binarization

Contents
Chapter 1: Introduction
  1.1 Research background and significance
  1.2 Research status at home and abroad
  1.3 Difficulties in face detection and tracking
  1.4 Main research content and chapter organization
Chapter 2: Main methods of face detection and tracking
  2.1 Face detection methods
  2.2 Skin-color based detection methods
    2.2.1 The RGB model
    2.2.2 The YCbCr (YUV) format
    2.2.3 The HSV (hue/saturation/value) model
  2.3 Heuristic-model based methods
    2.3.1 Knowledge-based methods
    2.3.2 Local-feature based methods
    2.3.3 Template-based methods
    2.3.4 Statistical-model based methods
  2.4 Face tracking methods
    2.4.1 Face tracking based on feature detection
    2.4.2 Model-based face tracking
  2.5 Chapter summary
Chapter 3: Skin-color-model based face detection in single images
  3.1 Skin-color based face localization
  3.2 Conversion from RGB to the YCrCb color model
  3.3 The face skin-color model and binarization
  3.4 Post-processing
  3.5 Face localization
  3.6 Chapter summary
Chapter 4: Skin-color-model based face detection in video
  4.1 Algorithm flow
  4.2 Image differencing for moving-target extraction
  4.3 Model building and light compensation
  4.4 Eye feature detection
  4.5 Chapter summary
Chapter 5: Conclusion
References

Chapter 1: Introduction
1.1 Research background and significance
In recent years, with the rapid development of computer technology and digital signal processing, cameras are used to acquire images of the environment and convert them into digital signals, and computers carry out the entire process of visual information processing; this is the origin of computer vision technology.
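The chapter 3 pipeline of the thesis (RGB to YCbCr conversion, skin-color thresholding, binarization) can be sketched as follows. The Cb/Cr box used here is a range commonly quoted in the skin-segmentation literature and is an assumption on my part; the thesis itself derives an adaptive threshold rather than fixed bounds.

# Sketch: skin-color segmentation in YCbCr space producing a binary mask.
# The threshold box (77 <= Cb <= 127, 133 <= Cr <= 173) is a commonly quoted
# range, assumed here; the thesis uses an adaptive threshold instead.
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion (channel-last array)."""
    rgb = rgb.astype(float)
    y  =  0.299    * rgb[..., 0] + 0.587    * rgb[..., 1] + 0.114    * rgb[..., 2]
    cb = -0.168736 * rgb[..., 0] - 0.331264 * rgb[..., 1] + 0.5      * rgb[..., 2] + 128
    cr =  0.5      * rgb[..., 0] - 0.418688 * rgb[..., 1] - 0.081312 * rgb[..., 2] + 128
    return y, cb, cr

def skin_mask(rgb):
    """Binary mask of likely skin pixels; non-face regions can then be removed."""
    _, cb, cr = rgb_to_ycbcr(rgb)
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)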
Face Segmentation For Identification Using Hidden Markov Models

Ferdinando Samaria (fs@)
Cambridge University Engineering Department, Trumpington St., Cambridge CB2 1PZ, United Kingdom, tel. +44 223 332752
Olivetti Research Ltd, Old Addenbrookes Site, 24a Trumpington St., Cambridge CB2 1QA, United Kingdom, tel. +44 223 343000

Abstract
This paper details work done on face processing using a novel approach involving Hidden Markov Models (HMMs). Experimental results from earlier work [14] indicated that left-to-right models making use of structural information yield better feature extraction than ergodic models. This paper illustrates how these hybrid models can be used to extract facial bands and automatically segment a face image into meaningful regions, showing the benefits of the simultaneous use of statistical and structural information. It is shown how the segmented data can be used to identify different subjects. Successful segmentation and identification of face images was obtained, even when facial details (with/without glasses, smiling/non-smiling, open/closed eyes) were varied. Some experiments with a simple left-to-right model are presented to support the plausibility of this approach. Finally, present and future directions of research work using these models are indicated.

1 INTRODUCTION
Most of our social behaviour is dependent on the correct identification of the people surrounding us. Humans are generally able to infer information regarding sex, age and expression and use it reliably to identify faces. We are able to identify faces even when they are distorted (as in a caricature) or coarsely quantised, when they have occluded details and sometimes even when they have been inverted [2]. This process is here called face identification [13], where other authors in the literature have often used the term face recognition [1].
Faces play a fundamental role in social interactions, and substantial research effort has gone into trying to understand how to build a successful model for face identification, both by psychologists and information scientists. Apart from its relevance to research into artificial intelligence, the importance of face identification stems from its numerous potential applications. A successful system could be used for building and workstation security [3], criminal identification, credit card verification and video-mail retrieval [5].
This paper presents a new architecture to describe facial features that uses continuous density Hidden Markov Models [11]. Face images are automatically segmented into horizontal regions, each of which is represented in the model by a "facial band" [13]. Faces are treated as two-dimensional objects and the segmentation is performed by extracting statistical facial features. However, by making simultaneous use of structural information (yielding a hybrid model), these statistical features are made to correspond to features as understood by humans. The following sections present the proposed approach with a general overview of HMMs and some experimental results. The paper concludes with an indication of the strengths of the approach and areas of future research work.

2 HMM BASED APPROACH
A new method is proposed to automatically extract facial features from a set of training data (database) and successively use them to identify test images. This is achieved using a particular context-dependent classifier [16] based on HMMs. HMMs are a set of statistical models used to characterise the statistical properties of a signal. Rabiner [11] provides an extensive and complete tutorial on HMMs.

2.1 Introducing HMMs
The elements of an HMM can be formally defined by specifying the following parameters:
- N = |S| is the number of states in the model, where S = {s_1, s_2, ..., s_N} is the set of possible states. The state of the model at time t is given by q_t in S, 1 <= t <= T, where T is the length of the observation sequence (number of frames).
- M = |V| is the number of different observation symbols, where V = {v_1, v_2, ..., v_M} is the set of all possible observation symbols (also called the code book of the model). The observation symbol at time t is given by o_t in V, 1 <= t <= T.
- A = {a_ij} is the state transition probability matrix, where a_ij = Pr[q_{t+1} = s_j | q_t = s_i] for 1 <= i, j <= N, with a_ij >= 0 and sum_{j=1..N} a_ij = 1 for every i.
- B = {b_j(k)} is the observation symbol probability matrix, where b_j(k) = Pr[o_t = v_k | q_t = s_j] for 1 <= j <= N, 1 <= k <= M.
- Pi = {pi_i} is the initial state distribution, where pi_i = Pr[q_1 = s_i] for 1 <= i <= N.
Using shorthand notation, an HMM is defined as lambda = (A, B, Pi).
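As a concrete companion to these definitions, the following minimal sketch (not from the paper) evaluates Pr(O | lambda) for a discrete HMM with the forward algorithm; the toy numbers at the end are made up for illustration.

# Sketch: discrete HMM evaluation with the forward algorithm.
# A, B, pi follow the definitions above: A is N x N, B is N x M, pi has
# length N; obs is a sequence of symbol indices o_t in 0..M-1.
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Pr(O | lambda), computed in O(N^2 T) instead of O(2T N^T)."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # induction step of the recursion
    return alpha.sum()                    # sum over final states

# Tiny usage example (N = 2 states, M = 2 symbols, made-up parameters):
A  = np.array([[0.7, 0.3], [0.0, 1.0]])  # left-to-right transitions
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([1.0, 0.0])                # always start in the first state
print(forward_likelihood(A, B, pi, [0, 0, 1]))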
HMMs provide a way of modeling the statistical properties of one-dimensional (1D) signals. The 1D nature of certain signals, for example speech [10], is suited to analysis by HMM: the probability of an observation sequence given a model is computed using the forward-backward algorithm, which reduces the order of computation from 2T N^T to N^2 T [11]. Images, however, are two-dimensional (2D). No direct equivalent of 1D HMMs exists for 2D signals, where a fully connected 2D HMM would lead to an NP-complete problem [9]. Attempts have been made to use multi-model representations that give pseudo 2D structures [8]. In this paper, traditional 1D HMMs will be used. This poses the problem of extracting a meaningful 1D sequence from a 2D image. One solution is to consider either a temporal or a spatial sequence: these two methods were discussed in [12], where it was concluded that spatial sequences of windowed data give more meaningful models. Initial results [14] showed that a left-to-right HMM with top-to-bottom line block sampling gives physiologically significant features. The following section details how these models can be built.

2.2 Preparing Face Images For HMM Analysis
Figure 1: Sampling Technique [an X by Y image is scanned top to bottom by line blocks of height L with overlap M]
A training set of different face images is collected for each subject. The sampling technique described in figure 1 converts each image into a sequence of column vectors O = (o_1, o_2, ..., o_T), which are spatially ordered in a top-to-bottom sense. Each vector o_i represents the intensity level values of all the pixels contained in the corresponding line block; the number of observation vectors is T = (Y - L)/(L - M) + 1. Using this technique each image is sampled into a sequence of overlapping line blocks. The overlapping allows the model to capture significant features independently of their vertical position (a disjoint partitioning of the image could result in features occurring across block boundaries being truncated).
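This sampling step is straightforward to express in code. A sketch, using the parameter names X, Y, L and M of figure 1 (NumPy assumed; not the paper's own implementation):

# Sketch: convert a Y x X grey-level image into a top-to-bottom sequence of
# overlapping line blocks, each flattened into a vector o_1 .. o_T.
import numpy as np

def sample_line_blocks(image, L=16, M=12):
    """Return a (T, X*L) observation sequence; T = (Y - L)//(L - M) + 1."""
    Y, X = image.shape
    step = L - M                          # lines advanced per block
    T = (Y - L) // step + 1
    blocks = [image[t*step : t*step + L].ravel() for t in range(T)]
    return np.array(blocks, dtype=float)

# E.g. a 224-line image with L=16, M=12 yields T = (224-16)//4 + 1 = 53 blocks.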
2.3 Training The Models
Figure 2: Facial Regions For 5-state left-to-right HMM [states 1-5: forehead, eyes, nose, mouth, chin; self-transitions a_11 ... a_55 and forward transitions a_12, a_23, a_34, a_45]
Assuming that each face is in an upright, frontal position, feature regions occur in a predictable order, for example eyes following forehead, nose following eyes, and so on. This ordering suggests the use of a left-to-right HMM (see figure 2 for a 5-state model), where only transitions between adjacent states in a left-to-right manner are allowed. The HMM will then segment the image into statistically similar regions, each of which will be represented by a facial band [13]. Each facial band corresponds to one state in the model and is represented by the mean of that state. One model is trained for each subject in the database, yielding a left-to-right HMM
lambda^(k) = (A^(k), B^(k), Pi^(k)), 1 <= k <= F,
where F is the total number of different subjects in the database. The parameters of the trained HMM have meanings as follows:
- A^(k) measures the probability of going from one facial band to another. After training, A^(k) will record the occurrences of transitions from one band to another across the face, and the thickness of the various bands (this can be seen by considering the a_ii^(k) terms, which represent the probability of staying in the same facial band).
- B^(k) measures the probability of observing a feature vector, given that one is looking at a particular facial band. After training, B^(k) will record the feature vector distribution per subject across the various bands.
- Pi: all observation sequences start in the top band and therefore, with the present model, this parameter does not provide any useful discriminating information (pi_1 = 1 and pi_i = 0 for 1 < i <= N).
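A hedged sketch of the per-subject training described here, using the third-party hmmlearn library as a stand-in for the HTK toolkit that the paper actually uses; the left-to-right constraint is imposed by fixing the start distribution and zeroing disallowed transitions (zeros are preserved by Baum-Welch re-estimation).

# Sketch: train one left-to-right continuous-density HMM per subject.
# Sequences come from sample_line_blocks above; hmmlearn is an assumed
# substitute for HTK, not the paper's setup.
import numpy as np
from hmmlearn import hmm

def train_subject_model(sequences, n_states=5):
    """sequences: list of (T, D) observation arrays for one subject."""
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20, init_params="mc", params="mct")
    # Left-to-right structure: start in state 0; only self or next-state moves.
    model.startprob_ = np.eye(n_states)[0]
    transmat = np.zeros((n_states, n_states))
    for i in range(n_states):
        transmat[i, i] = 0.5
        transmat[i, min(i + 1, n_states - 1)] += 0.5
    model.transmat_ = transmat
    X = np.concatenate(sequences)
    model.fit(X, lengths=[len(s) for s in sequences])
    return model

# Viterbi alignment then yields the facial-band segmentation of an image:
# bands = model.predict(sample_line_blocks(image))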
5shows the segmented training data for 4other subjects.The seg-mented regions are again consistent with one another,even when facial details di er greatly (as in the case of the subject with and without glasses).Figure 5:Segmented Training Data For Other SubjectsThe full database comprised of 20distinct subjects.In order to identify a test image,each was converted into a 1D observation sequence using the samplingM O 3.2Using The Models For Identi cation1st 2nd 1st 2nd 1st 1st 2nd 2nd2()2k Thetotal number of frames was =53T technique previously illustrated.The observation sequence was matched against each of the 20models in the database and the model likelihoods were computed:[]120The highest match was chosen and the person corresponding to the chosen model revealed the identity of the subject in the unknown image.Figure 6:Test Results For Unseen ImagesThe 100training images (5pictures of 20di erent subjects)were tested and correctly identi ed.The average log-likelihood per frame for the correct matches varied between 12600and 13700.Other experiments were carried out on im-ages that had not been seen during the training phase.Figure 6shows four such test images and the segmentation obtained with the best and second best match.The average log-likelihood per frame for the best match (which was correct for all 4cases)varied between 13500and 15000[13].Other experiments showed that various images not containing a face had scores below 18500[13].The hand-drawn head depicted in gure 7had a best match score of approx-imately 15500.The segmentation results re ect quite accurately the model states.The eye band is located correctly and the score indicates that this method could be used to test the presence of a face-shaped object in an image.Some of the test images in gure 6di er considerably from the training images and the successful results obtained indicate that the model exhibits a certain ro-bustness to feature variation.The segmentation obtained for these images is also consistent with that of the training data,even when facial details di er substan-tially.These results suggest that similar lighting conditions are not an essentialP r ;k ;;;;;;j ÀÀÀÀÀÀO2nd1stFigure7:Segmentation Of Hand-drawn Faceconstraint for the model to work successfully.If shown to hold for a larger database and test set,these results would constitute a strong advantage for the HMM based approach over other more traditional approaches.Recent work on human action recognition[19]also reports successful results using an HMM based method ap-plied to images.The approach proposed in this paper shows that HMMs provide a new method for automatic face segmentation and image classi cation.The statistical features ob-tained by the HMM have been shown to correspond to physical features as under-stood by humans,when structural information is used to build the model.Other methods for feature extraction,such as Arti cial Neural Networks[6],deformable templates[21]or snakes[7],have usually required considerable initial guidance. 
Successful identi cation results were obtained with relatively low constraints on the image data.Small orientation changes,non-homogeneous lighting and local feature variation(with/without glasses,smiling/non-smiling,open/closed eyes) are dealt with automatically.Template-based models have generally su ered from such variations.It was also shown that robust face segmentation was obtained even for images substantially di erent from the training images.Experiments have started to compare this approach with other purely statistical methods:ini-tial results based on a system as described in[17]with only one training image per subject indicate that a better identi cation performance is achieved with the HMM based approach.The classi er always outputs the best matching model from the database.The initial results investigated in this paper seem to indicate that the log-likelihood values are su ciently discriminate to enable the presence of a face in the image to be detected.Further vertical segmentation of each horizontal band will be investigated.This approach may yield more accurate feature location:for example,segmenting the eye band into5further vertical regions might locate the eyes.Initial experiments with very limited training data have already shown encouraging results.Model likelihood can be obtained using a pseudo2D structure as described in[8].This 4CONCLUSIONSkind of representation makes more use of 2D information and will be compared with the purely 1D approach presented in this paper.Future work will investigate the possibility of enhancing the facial bands using standard image processing techniques.Facial bands are e ectively blurred images obtained by averaging all those line blocks which exhibit similar statistical prop-erties.Image enhancement techniques [4]applied to facial band images may yield more suitable state mean vectors for segmentation and classi cation.At present the image analysis is carried out in the space domain.There is evidence [18]that frequency or frequency/space representations may yield better data separability and approaches will be investigated to process the face bands in the frequency domain.This work is supported by a Trinity College Internal Graduate Studentship and an Olivetti Research Ltd CASE award.Their support is gratefully acknowledged.Thanks also to many colleagues in Cambridge:Andy Hopper,Frazer Bennett and Andy Harter of Olivetti Research,for useful discussions and for the image capture software;Steve Young and the Speech Group at CUED for the HMM software;Owen Jones and Gabor Megezy of the Department of Pure Mathematics and Mathematical Statistics,Barney Pell of the Computer Lab,and Steve Hodges and Tat-Jen Cham of CUED for the many ideas discussed stly,I wish to remember the late Prof.Frank Fallside,whose ingenuity illuminated my work on so many occasions.[1]urence Erlbaum Associates,1988.[2]R.Diamond and S.Carey.Why faces are and are not special:an e ect ofexpertise.,115,2:107{117,1986.[3]R.Gallery and T.I.P.Trew.An architecture for face classi cation.,2:1{5,1992.[4]R.C.Gonzalez and P.Wintz..Addison-Wesley,second edition,1987.[5]A.Hopper.Digital video on computer workstations.,1992.[6]R.A.Hutchinson and parison of neural networks and con-ventional techniques for feature location in facial images.,1989.Recognising Faces Journal of Experimantal Psychology:General IEE Collo-quium on `Machine Storage and Recognition of Faces',Digest No:1992/017Digital Image Processing Proceedings of Euro-graphics`92IEE International Conference on Arti cial 
Neural Networks,Conf.Publication Number 313AcknowledgementsReferences[7]M.Kass,A.Witkin,and D.Terzopoulos.Snakes:Active contour models.,pages 259{268,1987.[8]S.Kuo and O.E.Agazzi.Machine vision for keyword spotting using pseudo2d hidden markov models.,V:81{84,1993.[9]E.Levin and R.Pieraccini.Dynamic planar warping for optical characterrecognition.,III:149{152,1992.[10]T.W.Parsons..McGraw-Hill,1986.[11]L.R.Rabiner.A tutorial on hidden markov models and selected applicationsin speech recognition.,77,2:257{286,January 1989.[12]F.Samaria..1st Year Report,Cambridge University Engineering Department,1992.[13]F.Samaria and F.Fallside.Automated face identi cation using hiddenmarkov models.In .The Japan Society of mechanical Engineers,1993.[14]F.Samaria and F.Fallside.Face identi cation and feature extraction usinghidden markov models.In G.Vernazza,editor,.Elsevier,1993.[15]J.Shepherd,G.Davies,and H.Ellis.Studies of cue saliency.In G.Davies,H.Ellis,and J.Shepherd,editors,,pages 105{131.Academic Press,1981.[16]C.W.Therrien..Wiley,1989.[17]M.Turk and A.Pentland.Eigenfaces for recognition.,3,1:71{86,1991.[18]H.Wechsler..Academic Press,San Diego CA,1990.[19]J.Yamato,J.Ohya,and K.Ishii.Recognizing human action in time-sequentialimages using hidden markov model.,pages 379{385,1992.[20]S.J.Young..Reference Manual,Cambridge University Engineering Department,1992.[21]A.L.Yuille,P.W.Hallinan,and D.S.Cohen.Feature extraction from faces us-ing deformable templates.,8,2:99{11,1992.ICCV Proceedings of ICASSP`93Proceedings of ICASSP`92Voice and speech processing Proceedings of the IEEE Face Identi cation using Hidden Markov Models Proceedings of the International Conference on Advanced Mechatronics Image Processing:Theory and Applications Perceiving and remembering faces Decision estimation and classi cation Journal of CognitiveNeuroscience Computational Vision Proceedings of CVPR`92HTK:Hidden Markov Model Toolkit V1.3International Journal of Computer Vision。