[2004 ECCV] Face Recognition with Local Binary Patterns
Face Recognition Papers: Chinese-English Literature Translation
Face recognition papers, Chinese-English appendix (original text and translation). The original text is taken from Thomas David Heseltine BSc. (Hons), The University of York, Department of Computer Science, for the qualification of PhD, September 2005: "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".

4 Two-dimensional Face Recognition

4.1 Feature Localization

Before discussing the methods of comparing two facial images, we take a brief look at the preliminary process of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door-access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. We therefore discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).

The eye localisation method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of face recognition accuracy and not a product of the performance of the eye localisation routine, all image alignments are manually checked and any errors corrected prior to testing and evaluation.

We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned face images is taken, and each image is cropped to an area around both eyes. The average image is calculated and used as a template.

Figure 4-1 - The average eyes, used as a template for eye detection.

Both eyes are included in a single template, rather than searching for each eye individually in turn, because the characteristic symmetry of the eyes on either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. However, this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye sockets, but the area of skin below the eyes helps to distinguish the eyes from the eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).

A window is passed over the test images and the absolute difference is taken to the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.
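As an illustration only (this code is not from the thesis), the exhaustive window search described above can be sketched in NumPy as follows. The image and template are assumed to be greyscale arrays already loaded by the caller, and the optional per-pixel weights array anticipates the weighting scheme introduced next.

    import numpy as np

    def locate_eyes(image, template, weights=None):
        # Return the top-left corner of the window most similar to the template.
        # Similarity is the (optionally weighted) sum of absolute differences,
        # so the lowest score wins, as described in the text.
        ih, iw = image.shape
        th, tw = template.shape
        if weights is None:
            weights = np.ones_like(template, dtype=float)
        best_score, best_pos = np.inf, (0, 0)
        for y in range(ih - th + 1):
            for x in range(iw - tw + 1):
                window = image[y:y + th, x:x + tw].astype(float)
                score = np.sum(weights * np.abs(window - template))
                if score < best_score:
                    best_score, best_pos = score, (y, x)
        return best_pos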
This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.

Eye localisation is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful and those in which eye detection failed. Taking the set of successful localisations, we compute the average distance from the eye template (Figure 4-2, top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect. However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.

Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and failed detections (bottom), showing credible variance due to mis-detected features.

In the lower image (Figure 4-2, bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector, as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.

Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.

4.2 The Direct Correlation Approach

We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term 'Direct Correlation' to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not imply that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (it is inversely related to Pearson's correlation and can be considered a scale- and translation-sensitive form of image correlation), as this persists with the contrast made between image space and subspace approaches in later sections.

Firstly, all facial images must be aligned such that the eye centres are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each such vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and, again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and the gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision:

d = ||q - g||,   where d <= threshold gives accept and d > threshold gives reject.   (Equ. 4-1)
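A minimal sketch of Equ. 4-1, assuming the two images are already aligned, cropped greyscale arrays of equal size (the function names and the threshold value are illustrative, not taken from the thesis):

    import numpy as np

    def direct_correlation_distance(query_img, gallery_img):
        # Euclidean distance between two aligned greyscale face images (Equ. 4-1).
        q = query_img.astype(float).ravel()    # 65x82 image -> 5330-element vector
        g = gallery_img.astype(float).ravel()
        return np.linalg.norm(q - g)

    def verify(query_img, gallery_img, threshold):
        # Accept the claimed identity if the distance falls below the threshold.
        return direct_correlation_distance(query_img, gallery_img) <= threshold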
4.2.1 Verification Tests

The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system's ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (i.e. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experimentation in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher's Linear Discriminant to analyse individual subspace components in section 7 and the identification test for the final evaluations described in section 8.

The verification test measures a system's ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or a rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps as a swipe card, proximity chip or PIN number. This number is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image), which is compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.

The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.

However, it should also be noted that if the results are to be representative of system performance in a real-world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition which may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of 'difficult' images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced. Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB).
This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.

Every image is compared with every other image; no image is compared with itself and no pair is compared more than once (we assume that the relationship is symmetrical). Once two images have been compared, producing a similarity score, the ground truth is used to determine whether the images are of the same person or of different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people, and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as either a false acceptance or a false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.

    For IndexA = 0 to length(TestSet)
        For IndexB = IndexA+1 to length(TestSet)
            Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
            If IndexA and IndexB are the same person
                Append Score to AcceptScoresList
            Else
                Append Score to RejectScoresList

    For Threshold = Minimum Score to Maximum Score:
        FalseAcceptCount, FalseRejectCount = 0
        For each Score in RejectScoresList
            If Score <= Threshold
                Increase FalseAcceptCount
        For each Score in AcceptScoresList
            If Score > Threshold
                Increase FalseRejectCount
        FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
        FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
        Add plot to error curve at (FalseRejectRate, FalseAcceptRate)
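For illustration (not part of the thesis), the same threshold sweep can be written compactly in NumPy. Here accept_scores and reject_scores are assumed to be the two score lists built by the pseudocode above, and the equal error rate is read off where the two error rates cross:

    import numpy as np

    def error_rates(accept_scores, reject_scores, num_thresholds=200):
        # Sweep the threshold over the score range and return (threshold, FAR, FRR) arrays.
        accept_scores = np.asarray(accept_scores, dtype=float)
        reject_scores = np.asarray(reject_scores, dtype=float)
        all_scores = np.concatenate([accept_scores, reject_scores])
        thresholds = np.linspace(all_scores.min(), all_scores.max(), num_thresholds)
        far = np.array([(reject_scores <= t).mean() for t in thresholds])  # different people accepted
        frr = np.array([(accept_scores > t).mean() for t in thresholds])   # same person rejected
        return thresholds, far, frr

    def equal_error_rate(far, frr):
        # Approximate EER as the point where FAR and FRR are closest to each other.
        i = np.argmin(np.abs(far - frr))
        return (far[i] + frr[i]) / 2.0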
These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or the FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional FAR/FRR pair, which, when plotted on a graph, produces the error rate curve shown below.

Figure 4-5 - Example error rate curve produced by the verification test.

The equal error rate (EER) can be seen as the point at which the FAR is equal to the FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system and allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real-world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances were equal to the percentage of false rejections. Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections, being unwilling to tolerate intruders at the cost of inconvenient access denials. Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.

There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves also used in such experiments. Both graphs are simply two visualisations of the same results, in that the ROC format uses the True Acceptance Rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and the FAR as functions of the threshold value. This presentation format provides a reference to determine the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves intersect.

Figure 4-6 - Example error rate curve as a function of the score threshold.

The fluctuation of these error curves due to noise and other errors is dependent on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will result in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840 verification operations, hence a drop of 1% EER represents an additional 2,588 correct decisions, whereas the quality of a single image could cause the EER to fluctuate by up to 0.28%.

4.2.2 Results

As a simple experiment to test the direct correlation method, we apply the technique described above to a test set of 720 images of 60 different people, taken from the AR Face Database [39]. Every image is compared with every other image in the test set to produce a likeness score, providing 258,840 verification operations from which to calculate false acceptance rates and false rejection rates. The error curve produced is shown in Figure 4-7.

Figure 4-7 - Error rate curve produced by the direct correlation method using no image preprocessing.

We see that an EER of 25.1% is produced, meaning that at the EER threshold approximately one quarter of all verification operations carried out resulted in an incorrect classification. There are a number of well-known reasons for this poor level of accuracy. Tiny changes in lighting, expression or head orientation cause the location in image space to change dramatically. Images are moved far apart in image space due to these image capture conditions, despite being of the same person's face. The distance between images of different people becomes smaller than the area of image space covered by images of the same person, and hence false acceptances and false rejections occur frequently. Other disadvantages include the large amount of storage necessary for holding many face images and the intensive processing required for each comparison, making this method unsuitable for applications applied to a large database. In section 4.3 we explore the eigenface method, which attempts to address some of these issues.
Robust Face Recognition via Sparse Representation
Robust Face Recognition via Sparse Representation -- A Q&A about the recent advances in face recognition and how to protect your facial identity

Allen Y. Yang (yang@), Department of EECS, UC Berkeley, July 21, 2008

Q: What is this technique all about?

A: The technique, called robust face recognition via sparse representation, provides a new solution for using a computer program to classify human identity from frontal facial images, i.e., the well-known problem of face recognition.

Face recognition has been one of the most extensively studied problems in the areas of artificial intelligence and computer vision. Its applications include human-computer interaction, multimedia data compression, and security, to name a few. The significance of face recognition is also highlighted by the contrast between humans' high accuracy in recognizing face images under various conditions and computers' historically poor accuracy.

This technique proposes a highly accurate recognition framework. Extensive experiments have shown that, for the first time, the method can achieve recognition accuracy similar to human vision. In some cases, the method has outperformed what human vision can achieve in face recognition.

Q: Who are the authors of this technique?

A: The technique was developed in 2007 by Mr. John Wright, Dr. Allen Y. Yang, Dr. S. Shankar Sastry, and Dr. Yi Ma. The technique is jointly owned by the University of Illinois and the University of California, Berkeley. A provisional US patent was filed in 2008. The technique is also being published in the IEEE Transactions on Pattern Analysis and Machine Intelligence [Wright 2008].

Q: Why is face recognition difficult for computers?

A: There are several issues that have historically hindered the improvement of face recognition in computer science.

1. High dimensionality; namely, the data size of face images is large.

When we take a picture of a face, the face image under certain color metrics will be stored as an image file on a computer, e.g., the image shown in Figure 1. Because the human brain is a massively parallel processor, it can quickly process a 2-D image and match it against the other images learned in the past. However, modern computer algorithms can only process 2-D images sequentially, meaning they can only process an image pixel by pixel. Hence, although the image file usually takes less than 100 KBytes of storage, if we treat each image as a sample point, it sits in a space of 10,000 to 100,000 dimensions (each pixel owns an individual dimension). Any pattern recognition problem in a high-dimensional space (more than 100 dimensions) is known to be difficult in the literature.

Fig. 1. A frontal face image from the AR database [Martinez 1998]. The size of a JPEG file for this image is typically about 60 KBytes.

2. The number of identities to classify is high.

To make the situation worse, an adult human being can learn to recognize thousands if not tens of thousands of different human faces over the span of his/her life. To ask a computer to match a similar ability, it has to first store tens of thousands of learned face images, which in the literature are called the training images. Then, using whatever algorithm, the computer has to process the massive data and quickly identify the correct person from a new face image, which is called the test image.

Fig. 2. An ensemble of 28 individuals in the Yale B database [Lee 2005]. A typical face recognition system needs to recognize 10-100 times more individuals.
Arguably an adult can recognize thousands of times more individuals in daily life.

Combining the above two problems, we are solving a pattern recognition problem that requires carefully partitioning a high-dimensional data space into thousands of domains, where each domain represents the possible appearance of an individual's face images.

3. Face recognition has to be performed under various real-world conditions.

When you walk into a drug store to take a passport photo, you are usually asked to pose with a frontal, neutral expression in order to qualify for a good passport photo. The store associate will also control the photo resolution, background, and lighting condition by using a uniform color screen and a flash light. However, in the real world, a computer program is asked to identify humans without all of the above constraints. Although past solutions exist that achieve recognition under very limited relaxation of the constraints, to this day none of the algorithms can answer all of the possible challenges, including the technique we present here.

To further motivate the issue, human vision can accurately recognize learned human faces under different expressions, backgrounds, poses, and resolutions [Sinha 2006]. With professional training, humans can also identify face images with facial disguise. Figure 3 demonstrates this ability using images of Abraham Lincoln.

Fig. 3. Images of Abraham Lincoln under various conditions (available online). Arguably humans can recognize the identity of Lincoln from each of these images.

A natural question arises: do we simply ask too much of a computer algorithm? For some applications, such as security check-points, we can mandate that individuals pose with a frontal, neutral face in order to be identified. However, in most other applications this requirement is simply not practical. For example, we may want to search our photo albums to find all the images that contain our best friends under normal indoor/outdoor conditions, or we may need to identify a criminal suspect from a murky, low-resolution hidden camera, where the suspect would naturally try to disguise his identity. Therefore, the study of recognizing human faces under real-world conditions is motivated not only by pure scientific rigor, but also by urgent demands from practical applications.

Q: What is the novelty of this technique? Why is the method related to sparse representation?

A: The method is built on a novel pattern recognition framework, which relies on a scientific concept called sparse representation. In fact, sparse representation is not a new topic in many scientific areas. Particularly in human perception, scientists have discovered that accurate low-level and mid-level visual perception is a result of sparse representation of visual patterns using highly redundant visual neurons [Olshausen 1997, Serre 2006].

Without diving into technical detail, let us consider an analogy. Assume that a normal individual, Tom, is very good at identifying different types of fruit juice, such as orange juice, apple juice, lemon juice, and grape juice. Now he is asked to identify the ingredients of a fruit punch, which contains an unknown mixture of drinks. Tom discovers that when the ingredients of the punch are highly concentrated on a single type of juice (e.g., 95% orange juice), he has no difficulty in identifying the dominant ingredient. On the other hand, when the punch is a largely even mixture of multiple drinks (e.g., 33% orange, 33% apple, and 33% grape), he has the most difficulty in identifying the individual ingredients.
In this example, a fruit punch drink can be represented as a sum of the amounts of the individual fruit drinks. We say such a representation is sparse if the majority of the juice comes from a single fruit type; conversely, we say the representation is not sparse. Clearly, in this example, sparse representation leads to easier and more accurate recognition than non-sparse representation.

The human brain turns out to be an excellent machine for calculating sparse representations from biological sensors. In face recognition, when a new image is presented in front of the eyes, the visual cortex immediately calculates a representation of the face image based on all the prior face images it remembers from the past. However, such a representation is believed to be sparse in the human visual cortex. For example, although Tom remembers thousands of individuals, when he is given a photo of his friend Jerry, he will assert that the photo is an image of Jerry. His perception does not attempt to calculate the similarity of Jerry's photo with all the images of other individuals. On the other hand, with the help of image-editing software such as Photoshop, an engineer can now seamlessly combine facial features from multiple individuals into a single new image. In this case, a typical human would assert that he/she cannot recognize the new image, rather than analytically calculating the percentages of similarity with multiple individuals (e.g., 33% Tom, 33% Jerry, 33% Tyke) [Sinha 2006].

Q: What are the conditions that the technique applies to?

A: Currently, the technique has been successfully demonstrated to classify frontal face images under different expressions, lighting conditions, and resolutions, and with severe facial disguise and image distortion. We believe it is one of the most comprehensive solutions in face recognition, and definitely one of the most accurate. Further study is required to establish a relation, if any, between sparse representation and face images with pose variations.

Q: More technically, how does the algorithm estimate a sparse representation using face images? Why do the other methods fail in this respect?

A: This technique has demonstrated the first solution in the literature to explicitly calculate sparse representations for the purpose of image-based pattern recognition. It is hard to say that the other extant methods have failed in this respect; previously, investigators simply did not realize the importance of sparse representation in human vision and computer vision for the purpose of classification. For example, a well-known solution to face recognition is the nearest-neighbor method. It compares the similarity between a test image and each individual training image separately. Figure 4 shows an illustration of this similarity measurement. The nearest-neighbor method identifies the test image with the training image that is most similar to it; hence the name. We can easily observe that the representation estimated in this way is not sparse, because a single face image can be similar to multiple images in terms of its RGB pixel values. Therefore, an accurate classification based on this type of metric is known to be difficult.

Fig. 4. A similarity metric (the y-axis) between a test face image and about 1200 training images. The smaller the metric value, the more similar the two images.
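As an illustration only (not code from the paper), the nearest-neighbor baseline just described amounts to a few lines of NumPy; train_images is assumed to be an n-by-d matrix of flattened training images with one label per row:

    import numpy as np

    def nearest_neighbor_identity(test_image, train_images, train_labels):
        # Assign the label of the single most similar training image (smallest distance).
        diffs = train_images - test_image.ravel()      # compare against every training image
        distances = np.linalg.norm(diffs, axis=1)      # one distance per training image
        return train_labels[int(np.argmin(distances))]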
Our technique abandons the conventional wisdom of comparing the similarity between the test image and individual training images or individual training classes. Rather, the algorithm attempts to calculate a representation of the input image with respect to all available training images as a whole. Furthermore, the method imposes one extra constraint: the optimal representation should use the smallest number of training images. Hence, the majority of the coefficients in the representation should be zero, and the representation is sparse (as shown in Figure 5).

Fig. 5. An estimate of the sparse representation of a test image with respect to about 1200 training images. The dominant coefficients in the representation correspond to the training images with the same identity as the input image. In this example, the recognition is based on downgraded 12-by-10 low-resolution images. Yet the algorithm can correctly identify the input image as Subject 1.

Q: How does the technique handle severe facial disguise in the image?

A: Facial disguise and image distortion pose one of the biggest challenges affecting the accuracy of face recognition. The types of distortion that can be applied to face images are manifold; Figure 6 shows some examples.

Fig. 6. Examples of image distortion on face images. Some of the cases are beyond a human's ability to perform reliable recognition.

One of the notable advantages of the sparse representation framework is that the problem of compensating for image distortion, combined with face recognition, can be rigorously reformulated under the same framework. In this case, a distorted face image presents two types of sparsity: one representing the locations of the distorted pixels in the image, and the other representing the identity of the subject as before. Our technique has been shown to be able to handle and eliminate all of the image distortions in Figure 6 while maintaining high accuracy. In the following, we present an example to illustrate a simplified solution for one type of distortion. For more detail, please refer to our paper [Wright 2008].

Figure 7 demonstrates the process of recognizing a face image with severe facial disguise by sunglasses. The algorithm first partitions the test image (left) into eight local regions and individually recovers a sparse representation per region. Notice that, with the sunglasses occluding the eye regions, the corresponding representations from these regions do not provide correct classification. However, when we look at the overall classification result over all regions, the non-occluded regions provide a high consensus for the image to be classified as Subject 1 (as shown in red circles in the figure). Therefore, the algorithm simultaneously recovers the subject identity and the facial regions that are being disguised.

Fig. 7. Solving for a part-based sparse representation using local face regions. Left: test image. Right: estimates of the sparse representations and the corresponding classifications in the titles. The red circle identifies the correct classification.
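To make the earlier description concrete, the following is a minimal sketch of sparse-representation classification, not the authors' implementation: the test image is expressed as a sparse combination of all training images via an l1-regularized fit (scikit-learn's Lasso stands in here for the l1-minimization routine used in the paper), and the identity whose training images yield the smallest reconstruction residual wins. All names are illustrative.

    import numpy as np
    from sklearn.linear_model import Lasso   # stands in for the paper's l1-minimization routine

    def src_classify(test_image, train_images, train_labels, alpha=0.001):
        # A sketch of sparse-representation classification, not the authors' code.
        # train_images: n-by-d matrix of flattened training images, one row per image.
        # train_labels: length-n sequence of subject identities.
        labels = np.asarray(train_labels)
        A = train_images.astype(float).T                   # one column per training image
        A /= np.linalg.norm(A, axis=0, keepdims=True)      # normalize each column
        y = test_image.astype(float).ravel()
        y /= np.linalg.norm(y)

        # Seek a sparse coefficient vector x with A @ x close to y (l1-regularized fit).
        x = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A, y).coef_

        # Assign the identity whose own training images best reconstruct the test image.
        best_label, best_residual = None, np.inf
        for label in np.unique(labels):
            coeffs = np.where(labels == label, x, 0.0)     # keep only this subject's coefficients
            residual = np.linalg.norm(y - A @ coeffs)
            if residual < best_residual:
                best_label, best_residual = label, residual
        return best_label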
Q: What is the quantitative performance of this technique?

A: Most of the representative results from our extensive experiments have been documented in our paper [Wright 2008]. The experiments were based on two established face recognition databases, namely the Extended Yale B database [Lee 2005] and the AR database [Martinez 1998]. In the following, we highlight some of the notable results. On the Extended Yale B database, the algorithm achieved 92.1% accuracy using 12-by-10 resolution images, 93.7% using single-eye-region images, and 98.3% using mouth-region images. On the AR database, the algorithm achieves 97.5% accuracy on face images with sunglasses disguise, and 93.5% with scarf disguise.

Q: Does the estimation of sparse representation cost more computation and time compared to other methods?

A: The complexity and speed of an algorithm are important to the extent that they do not hinder the application of the algorithm to real-world problems. Our technique uses some of the best-studied numerical routines in the literature, namely l1 minimization. These routines belong to a family of optimization algorithms called convex optimization, which are known to be extremely efficient to solve on a computer. In addition, considering the rapid growth of the technology for producing advanced microprocessors today, we do not believe there is any significant risk in implementing a real-time commercial system based on this technique.

Q: With this type of highly accurate face recognition algorithm available, is it becoming more and more difficult to protect biometric information and personal privacy in urban environments and on the Internet?

A: Believe it or not, a government agency, a company, or even a total stranger can capture and permanently log your biometric identity, including your facial identity, much more easily than you can imagine. Based on a Time magazine report [Grose 2008], a resident living or working in London will likely be captured on camera 300 times per day! One can believe that people living in other western metropolitan cities are enjoying similar "free services." If you prefer to stay indoors and blog on the Internet, your public photo albums can be easily accessed via unprotected websites, and have probably been permanently logged by search engines such as Google and Yahoo!.

With the ubiquitous camera technologies of today, completely preventing your facial identity from being obtained by others is difficult, unless you never step into a downtown area of a big city and never apply for a driver's license. However, there are ways to prevent illegal and involuntary access to your facial identity, especially on the Internet. One simple step that everyone can take to stop a third party from exploiting your face images online is to prevent these images from being linked to your identity. Any classification system needs a set of training images to learn the possible appearance of your face. If you put your personal photos on a public website and frequently give away the names of the people in the photos, over time a search engine will be able to link the identities of the people with the face images in those photos. Therefore, to prevent an unauthorized party from "crawling" your website and sifting through this valuable private information, you should put such photo websites under password protection. Do not make a large amount of personal images available online without consent while also providing the names of the people on the same website.

Previously we have mentioned many notable applications that involve face recognition. The technology, if properly utilized, can also revolutionize the IT industry to better protect personal privacy. For example, an assembly factory can install a network of cameras to improve the safety of the assembly line, but at the same time blur out the facial images of the workers in the surveillance videos.
A cellphone user who is doing teleconferencing can activate a face recognition function to track only his/her facial movements and exclude other people in the background from being transmitted to the other party.

All in all, face recognition is a rigorous scientific study. Its sole purpose is to hypothesize, model, and reproduce the image-based recognition process with accuracy comparable or even superior to human perception. The scope of its final extension and impact on our society will rest on the shoulders of the government, the industry, and each individual end user.

References

[Grose 2008] T. Grose. When surveillance cameras talk. Time (online), Feb. 11, 2008.
[Lee 2005] K. Lee et al. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, 2005.
[Martinez 1998] A. Martinez and R. Benavente. The AR face database. CVC Tech Report No. 24, 1998.
[Olshausen 1997] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, vol. 37, 1997.
[Serre 2006] T. Serre. Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines. PhD dissertation, MIT, 2006.
[Sinha 2006] P. Sinha et al. Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, vol. 94, no. 11, November 2006.
[Wright 2008] J. Wright et al. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008 (in press).
Discriminative Regions for Human Face Detection
ACCV2002: The 5th Asian Conference on Computer Vision, 23-25 January 2002, Melbourne, Australia.

Discriminative Regions for Human Face Detection*

J. Matas(1,2), P. Bílek(1), M. Hamouz(2), and J. Kittler(2)
(1) Center for Machine Perception, Czech Technical University, {bilek,matas}@cmp.felk.cvut.cz
(2) Centre for Vision, Speech, and Signal Processing, University of Surrey, {m.hamouz,j.kittler}@

* This research was supported by the Czech Ministry of Education under Research Programme MSM 210000012 Transdisciplinary Biomedical Engineering Research and by European Commission IST-1999-11159 project BANCA.

Abstract

We propose a robust method for face detection based on the assumption that a face can be represented by arrangements of automatically detectable discriminative regions. The appearance of the face is modelled statistically in terms of local photometric information and the spatial relationships of the discriminative regions. The spatial relationships between these regions serve mainly as preliminary evidence for the hypothesis that a face is present in a particular position. The final decision is carried out using the complete information from the whole image patch. The results are very promising.

1 Introduction

Detection and recognition of objects is the most difficult task in computer vision. In many papers object detection and object recognition are considered as distinct problems, treated separately and under different names, e.g. object localisation (detection) and recognition. In our approach, localisation of an object of a given class is a natural generalisation of object recognition. In the terminology that we introduce, object detection is understood to mean the recognition of an object's class, while object recognition implies distinguishing between specific objects from one class. Accordingly, an object class, or category, is a set of objects with similar local surface properties and global geometry. In this paper we focus on object detection; in particular, we address the problem of face localisation.

The main idea of this paper is based on the premise that objects in a class can be represented by arrangements of automatically detectable discriminative regions. Discriminative regions are distinguished regions exhibiting properties important for object detection and recognition. Distinguished regions are "local parts" of the object surface whose appearance is stable over a wide range of views and illumination conditions. Instances of the category are represented by a statistical model of the appearance of local patches defined in terms of discriminative regions, and by their relationships. Such a local model of objects has a number of attractive properties, e.g. robustness to partial occlusion and simpler illumination compensation in comparison with global models.

Superficially, the framework seems to be no more than a local appearance-based method. The main difference is the focus in our work on the selection of regions where appearance is modelled. Detectors of such regions are built during the learning phase. In the detection stage, multiple detectors of discriminative regions process the image. Detection is then posed as a combinatorial optimisation problem. Details of the scheme are presented in Section 3. Before that, previous work is reviewed in Section 2. Experiments in detecting human faces based on the proposed framework are described in Section 4. Possible refinements of the general framework are discussed in Section 5. The main contributions of this paper are summarised in Section 6.

2 Previous Work

Many early object recognition systems were based on
two basic approaches:

- template matching: one or more filters (templates), representing each object, are applied to a part of the image, and from their responses the degree of similarity between the templates and the image is deduced.

- measuring geometric features: geometric measurements (distance, angle, ...) between features are obtained, and different objects are characterised by different constraints imposed on the measurements.

It was shown by Brunelli et al. [3] that template matching outperforms measuring geometric features, since the approach exploits more information extracted from the image. Although template matching works well for some types of patterns, complex solutions are needed to cope with non-rigid objects, illumination variations or geometrical transformations due to different camera projections.

Both approaches, template matching and measuring geometric constraints, can be combined to reduce their respective disadvantages. Brunelli et al. [3] showed that a face detector consisting of individual features linked together with crude geometry constraints has better performance than a detector based on "whole-face" template matching.

Yuille [20] proposed the use of deformable templates fitted to contrast profiles by gradient descent of a suitable energy function. A similar approach was proposed by Lades et al. [9] and Wiskott et al. [19]. They developed a recognition method based on deformable meshes. The mesh (representing an object or object class) is overlaid on the image and adjusted to obtain the best match between the node descriptors and the image. The likelihood of a match is computed from the extent of mesh deformation.

Schmid et al. [14, 17] proposed detectors based on local jets. Robustness is achieved by using spatial constraints between locally detected features. The spatial constraints are represented by angle and length ratios, which are assumed to be Gaussian variables, each with its own mean and standard deviation.

Burl et al. [4, 5, 6] introduced a principled framework for representing possible deformations of objects using probabilistic shape models. The objects are again represented as constellations of rigid features (parts). The features are characterised photometrically. The variability of constellations is represented by a joint probability density function. A similar approach is used by Mohan et al. [13] for the detection of human bodies. The local parts are again recognised by detectors based on photometric information. The geometric constraints on the mutual positions of the local parts in the image are defined heuristically.

All the above-mentioned methods make decisions about the presence or absence of the object in the image only from geometric constraints. Our proposed method shares the same framework, but in our work the local feature detectors and geometric constraints define only a set of possible locations of the object in the image. The final decision is made using photometric information, where the parts of the object between the local features are taken into account as well.

There are other differences between our approach and the approach of Schmid [17] or Burl [4, 6]. A coordinate system is introduced for each object from the object class.
This allows us to tackle the problem of selecting distinctive and well-localisable features in a natural way, whereas in the case of Schmid's approach, detectable regions were selected heuristically and a model was built from the selected features. Even though Weber [18] used automatic feature selection, this was not carried out in an object-normalised space (as in our approach), and consequently no requirements on the spatial stability of features were specified. The relative spatial stability of the discriminative regions used in our method facilitates a natural affine-invariant way of verifying the presence of a face in the image, using correspondences between points in the normalised object space and the image, as will be discussed in detail further on.

3 Method Outline

Object detection is performed in three stages. First, the discriminative region detectors are applied to the image, and thus a set of candidate locations is obtained. In the second stage, the possible constellations (hypotheses) of discriminative regions are formed. In the third stage the likelihood of each hypothesis is computed. The best hypotheses are verified using the photometric information content of the test image. For algorithmic details see Section 4.3.

In the following sections we define several terms used in object recognition in a more formal way. The main aim of these sections is to unify different approaches and different taxonomies found in the literature.

3.1 Object Classes

For our purposes, we define an object class as a collection of objects which share characteristic features, i.e. the objects are composed of several local parts and these parts are in a specific spatial relationship. We assume the local parts are detectable in the image directly and the possible arrangements of the local parts are given by geometrical constraints. The geometrical constraints should be invariant with respect to a predefined group of transformations. Under this assumption, the task of discriminating between two classes can be reduced to measuring the differences between local parts and their geometrical relationships.

3.2 Discriminative Regions

Imagine you are presented with two images depicting objects from one class. You are asked to mark corresponding points in the image pair. We would argue that, unless distinguished regions are present in the two images, the task is extremely hard. Two views of a white featureless wall, a patch of grass, the sea surface or an ant hill might be good examples.
However, on most objects we find surface patches that can be separated from their surroundings and are detectable over a wide range of views. Before proceeding further, we give a more formal definition of a distinguished region:

Definition 1. A Distinguished Region (DR) is any subset of an image that is a projection of a part of a scene (an object) possessing a distinguishing property allowing its detection (segmentation, figure-ground separation) over a range of viewing and illumination conditions.

In other words, DR detection must be repeatable and stable with respect to viewpoint and illumination changes. DRs are referred to in the literature as 'interest points' [7], 'features' [1] or 'invariant regions' [16]. Note that we do not require DRs to have some transformation-invariant property that is unique in the image. If a DR possessed such a property, finding its corresponding DR in another image would be greatly simplified. To increase the likelihood of this happening, DRs can be equipped with a characterisation computed on associated measurement regions:

Definition 2. A Measurement Region (MR) is any subset of an image defined by a transformation-invariant construction (projective, affine, similarity invariant) from one or more (in case of grouping) regions.

The separation of the concepts of DRs and MRs is important and not made explicit in the literature. Since DRs are projections of the same part of an object in both views, and MRs are defined in a transformation-invariant manner, they are quasi viewpoint invariant. Besides the simplest and most common case where the MR is the DR itself, an MR may be constructed, for example, as the convex hull of a DR, a fitted ellipse (affinely invariant, [16]), a line segment between a pair of interest points [15] or any region defined in DR-derived coordinates. Of course, invariant measurements from a single or even multiple MRs associated with a DR will not guarantee a unique match on, e.g., repetitive patterns. However, a DR characterisation by invariants computed on an MR will often be unique or almost unique.

Note that any set of pixels, not necessarily continuous, can possess a distinguishing property. Many perceptual grouping processes detect such arrangements, e.g. a set of (unconnected) edges lying along a straight line forms a DR of maximum edge density. The property is viewpoint quasi-invariant and detectable by the Hough Transform. The 'distinguished pixel set' [10] would be a more precise term, but it is cumbersome.

The definition of "local part" (sometimes also called "feature", "object component" etc.) is very vague in the recent literature. For our purposes it is important to define it more precisely. In the following discussion we will use the term "discriminative region" instead of "local part". In this way, we would like to emphasise the difference between our definition of a discriminative region and the usual sense of a local part (a discriminative region is a local part with special properties important for its detection and recognition).

Definition 3. A Discriminative Region is any subset of an image defined by discriminative descriptors computed on a measurement region.

Discriminative descriptors have to have the following properties:

- Stability under change of imaging conditions. A discriminative region must be detectable over a wide range of imaging conditions (viewpoint, illumination). This property is guaranteed by the definition of a DR.

- Good intra-category localisation. The variation in the position of the discriminative region in the object coordinate system should be small for different objects in the same category.
- Uniqueness. A small number of similar discriminative regions should be present in the image of both the object and the background.

- High incidence. The discriminative region should be detectable in a high proportion of objects from the same category.

Note that there exists a trade-off between the ability to localise objects and the ability to discriminate between them. A very discriminative part can be a strong cue even if it appears in an arbitrary location on the surface of the object. On the other hand, a less discriminative part can only contribute information if it occurs in a stable spatial relationship relative to other parts.

3.3 Combining Evidence

This is a rather important stage of the detection process, which significantly influences the overall performance of the system and makes it robust with respect to arbitrary geometrical transformations. The combination of evidence coming from the detected discriminative regions is carried out in a novel way, significantly different from the approaches of Schmid et al. [14, 17] or Burl et al. [4, 5, 6].

In most approaches, a shape model is built over the placement of particular discriminative regions. If an admissible configuration of these regions is found in an image, an instance of the object in the image is hypothesised. This means that all the information conveyed by the area that lies between the detected discriminative regions is discarded. If you imagine a collage consisting of one eye, a nostril and a mouth corner placed in a reasonable manner on a black background, this will still be detected as a face, since no other parts of the image are needed to accept the "face-present" hypothesis.

In our approach the geometrical constraints are modelled probabilistically in terms of the spatial coordinates of the discriminative regions. But these geometrical constraints are used only to define possible positions (hypotheses) of the object in the image. The final decision about object presence in the image is deduced from the photometric information content of the original image.

4 Experiment

We have carried out an experiment on face localisation [2] with the XM2VTS database [11]. In order to verify the correctness of our localisation framework, several simplifications to the general scheme were made. In the experiment the discriminative regions were semi-automatically defined as the eye corners, the eye centres, the nostrils and the mouth corners.

4.1 Detector of discriminative regions

As a distinguished region detector we use the improved Harris corner detector [8]. Our implementation [2] of the detector is relatively insensitive to illumination changes, since the threshold is computed automatically from the neighborhood of the interest point. Such a corner detector is not generally invariant to scale change, but we solve this problem by searching for interest points through several scales.
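As an illustration only (not the authors' implementation), the standard Harris corner response that such a detector builds on can be sketched as follows; the automatic thresholding and the search over several scales described above would be layered on top of this.

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def harris_response(image, sigma=1.5, k=0.04):
        # Harris corner response R = det(M) - k * trace(M)^2 per pixel.
        img = image.astype(float)
        ix = sobel(img, axis=1)                      # horizontal gradient
        iy = sobel(img, axis=0)                      # vertical gradient
        # Second-moment matrix entries, smoothed over a Gaussian window.
        ixx = gaussian_filter(ix * ix, sigma)
        iyy = gaussian_filter(iy * iy, sigma)
        ixy = gaussian_filter(ix * iy, sigma)
        det = ixx * iyy - ixy * ixy
        trace = ixx + iyy
        return det - k * trace * trace

Interest points would then be taken as local maxima of this response; in the paper the threshold is derived from the local neighbourhood rather than fixed globally, and the search is repeated over several scales.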
We have observed [2] that the distribution of interest points coincides with the manually labelled points. This suggests that these points define discriminative regions (here we suppose that humans often identify interest points as the most discriminative parts of an object). Further, we have assumed that all potential in-plane face rotations and differences in face scale are covered by the training database.

The MRs were defined very simply, as rectangular regions centred at the interest points. We select ten positions (the left eye centre, the right eye centre, the right corner of the left eye, the left corner of the left eye, the right corner of the right eye, the left corner of the right eye, the left nostril, the right nostril, the left mouth corner, the right mouth corner), which we further denote as regions 1-10. All properties of a discriminative region are then determined by the size of the region. As the descriptor of a region we use the normalised colour information of all points contained in the region.

Each region was modelled by a uni-modal Gaussian in a low-dimensional subspace, and the hypothesis of whether a sample belongs to the class of faces is decided from the distance of this sample from the mean for a given region. The distance from the mean is measured as the sum of the in-subspace (DISS) and from-subspace (DFSS) distances (Moghaddam et al. [12]).

4.2 Combining Evidence

The proposed method is based on finding correspondences between generic face features (referred to as discriminative regions) that lie in the face space and the face features detected in an image. This correspondence is then used to estimate the transformation that a generic face may have undergone. So far, the correspondence of three points has been used to estimate a four- or six-parameter affine transformation.

When the transformation from the face space to the image space is determined, the verification of a "face-present" hypothesis becomes an easy task. An inverse transformation (i.e. a transformation from the image space into the face space) is found and the image patch (containing the three points of correspondence) is transformed into the face space. The decision of whether the "face-present" hypothesis holds or not is carried out in the face space, where all the variations introduced by the geometrical transformation (so far only an affine transformation is assumed to be the admissible transformation that a generic face can undergo) are compensated for (or at least reduced to a negligible extent). The distance from a generic face class [12] is computed for the transformed patch and a threshold is used to determine whether the patch belongs to the face class or not.

Moreover, many possible face patches do not necessarily have to be verified, since certain constraints can be put on the estimated transformation. Imagine, for instance, that the only feasible transformations a face can undergo are scaling from 50% to 150% of the original size in the face space and rotations of up to 30 degrees. This is quite a reasonable limitation, which causes most of the correspondences to be discarded without a costly verification in the face space (in our experiments the pruning reached about 70%). In the case of the six-parameter affine transform, both shear and anisotropic scale are incorporated into the admissible transformation.
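For illustration only (not the authors' code), estimating the affine map from three point correspondences and warping the hypothesised patch back into face space can be sketched as follows; the subsequent PCA-based face/non-face decision depends on the trained model [12] and is therefore omitted.

    import numpy as np

    def estimate_affine(face_pts, image_pts):
        # Solve for A (2x2) and t (2,) such that image_pts[i] ~= A @ face_pts[i] + t,
        # i.e. the six-parameter affine map from face space to image space.
        # face_pts, image_pts: 3x2 arrays of corresponding (x, y) points.
        src = np.hstack([face_pts.astype(float), np.ones((3, 1))])   # rows: [x, y, 1]
        params = np.linalg.solve(src, image_pts.astype(float))       # 3x2 parameter matrix
        A = params[:2].T
        t = params[2]
        return A, t

    def warp_to_face_space(image, A, t, patch_shape):
        # Resample the image into a normalised face-space patch
        # (nearest-neighbour sampling, purely for illustration).
        h, w = patch_shape
        patch = np.zeros((h, w), dtype=float)
        for y in range(h):
            for x in range(w):
                ix, iy = A @ np.array([x, y], dtype=float) + t   # face-space pixel -> image position
                ix, iy = int(round(ix)), int(round(iy))
                if 0 <= iy < image.shape[0] and 0 <= ix < image.shape[1]:
                    patch[y, x] = image[iy, ix]
        return patch

The warped patch would then be scored by its distance from the generic face class (the DISS plus DFSS distance of [12]) to accept or reject the "face-present" hypothesis, as described above.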
4.3 Algorithm summary

Algorithm 1: Detection of human faces

1. Detection of the distinguished regions. For each image from the test set, detect the distinguished regions using the illumination-invariant version of the Harris detector.

2. Detection of the discriminative regions. For each detected distinguished region, determine to which of the ten discriminative region classes (the eye corners, the eye centres, the nostrils and the mouth corners) the region belongs, using the PCA-based classifier in the colour space. Distinguished regions that do not belong to any of the predefined classes are discarded.

3. Combination of evidence.
   - Compute the estimate of the transformation from the image space to the face space using the correspondences between the three points in the face space and in the image space.
   - Decompose this transformation into rotation, scale, translation and possibly shear, and test whether these parameters lie within predefined constraints, i.e. decide whether the transformation is admissible or not.
   - If the transformation derived from the correspondences is admissible, transform the image patch that is defined by the transformation of the face outline into the face space.

4. Verification. Verify the "face-present" hypothesis using a PCA-based classifier.

4.4 Results

The results of the discriminative region detectors are summarised in Table 1. Note that since the classifier is very simple, its performance is not very high. However, even with such a simple detector of discriminative regions the system is capable of detecting faces with very low error, since only a small number of successfully detected discriminative regions is needed (in our case only 3).

Several extensive experiments were conducted. Image patches were declared as "face" when their Mahalanobis-distance-based score lay below a certain threshold. 200 images from the XM2VTS database were used for training a greyscale classifier based on the Moghaddam method [12], as mentioned earlier. The detection rate reached 98% in the case of the XM2VTS database; see Figure 1 for examples. Faces in several images containing cluttered background were successfully detected, as shown in Figure 2.

Table 1. Performance of discriminative region detectors.

              False negative         False positive
              %         #            %         #
Region 1      31.89     191          72.26     3831
Region 2      10.68     64           37.88     1342
Region 3      57.76     346          33.03     433
Region 4      54.92     329          19.85     218
Region 5      15.03     90           22.34     538
Region 6      13.69     82           62.33     3260
Region 7      15.53     93           4.00      78
Region 8      12.52     75           5.07      104
Region 9      48.75     292          6.27      70
Region 10     33.56     201          14.90     233

Figure 1. Experiment results: correctly detected faces and false rejections.

Figure 2. Experiments with cluttered background.

5 Discussion and Future Work

We proposed a method for face detection using discriminative regions. The detector performance is very good for the case when the general face detection problem is constrained by assuming a particular camera and pose position. We also assumed that the parts that appear distinctive to the human observer will also be discriminative, and therefore the discriminative regions were selected manually. In general, the correlation between distinctiveness and discriminativeness cannot necessarily be assumed, and therefore the discriminative regions should be "learned" from the training images. The training problem was addressed in this paper only partially. As an alternative, the method proposed by Weber et al. [18] can be exploited.

The admissible transformation which a face can undergo has so far been restricted to an affine transformation. Nevertheless, the results showed that even in such a simple case high detection performance can be achieved. Future modifications will involve the employment of more complex transformations (such as general non-rigid transformations). The PCA-based classification can be replaced by more powerful classifiers, such as Neural Networks or Support Vector Machines.

6 Conclusion

In this paper, a novel framework for face detection was proposed. The framework is based on the idea that most real objects can be decomposed into a collection of local parts tied by geometrical constraints imposed on their spatial arrangement. By exploiting this fact, face detection can be treated as recognition of local image patches (photometric information) in a given configuration (geometric constraints). In our approach, discriminative regions serve as preliminary evidence, reducing the search time dramatically. This evidence is utilised to generate a normalised version of the image patch, which is then used for the verification of the "face-present" hypothesis.

The proposed method was applied to the problem of face detection. The results of extensive experiments are very promising.
experiments demonstrated that the pro-posed method is able to solve a rather difficult problem incomputer vision.Moreover we showed that even simplerecognition methods(with a limited capability when usedalone)can be configured to create powerful framework ableto tackle such a difficult task as face detection.References[1] A.Baumberg.Reliable feature matching across widely sepa-rated views.In Proc.of Computer Vision and Pattern Recog-nition,pages I:774–781,2000.[2]P.B´ılek,J.Matas,M.Hamouz,and J.Kittler.Detection ofhuman faces from discriminative regions.Technical ReportVSSP–TR–2/2001,Department of Electronic&ElectricalEngineering,University of Surrey,2001.[3]R.Brunelli and T.Poggio.Face recognition:Features vs.templates.IEEE Trans.on Pattern Analysis and MachineIntelligence,15(10):1042–1053,1993.[4]M.C.Burl,T.K.Leung,and P.Perona.Face localizationvia shape statistics.In Proc.of International Workshop onAutomatic Face and Gesture Recognition,pages154–159,1995.[5]M.C.Burl and P.Perona.Recognition of planar objectclasses.In Proc.of Computer Vision and Pattern Recog-nition,pages223–230,1996.[6]M.C.Burl,M.Weber,and P.Perona.A Probabilistic ap-proach to object recognition using local photometry abdglobal Geometry.In Proc.of European Conference on Com-puter Vision,pages628–641,1998.[7]Y.Dufournaud,C.Schmid,and R.Horaud.Matching im-ages with different resolutions.In Proc.of Computer Visionand Pattern Recognition,pages I:612–618,2000.[8] C.J.Harris and M.Stephens.A combined corner and edgedetector.In Proc.of Alvey Vision Conference,pages147–151,1988.[9]des,J. C.V orbr¨u ggen,J.Buhmann,nge,C.von der Malsburg,R.P.W¨u rtz,and W.Konen.Distrotioninvariant object recognition in the dynamic link architecture.IEEE Trans.on Pattern Analysis and Machine Intelligence,42(3):300–310,1993.[10]J.Matas,M.Urban,and T.Pajdla.Unifying view for wide-baseline stereo.In B.Likar,editor,puter Vi-sion Winter Workshop,pages214–222,Ljubljana,Sloveni,February2001.Slovenian Pattern Recorgnition Society.[11]K.Messer,J.Matas,J.Kittler,J.Luettin,and G.Maitre.XM2VTSDB:The extended M2VTS database.In R.Chel-lapa,editor,Second International Conference on Audio andVideo-based Biometric Person Authentication,pages72–77,Washington,USA,March1999.University of Maryland.[12] B.Moghaddam and A.Pentland.Probabilistic visual learn-ing for object detection.In Proc.of International Confer-ence on Computer Vision,pages786–793,1995.[13] A.Mohan,C.Papageorgiou,and T.Poggio.Example-basedobject detection in images by components.IEEE Trans.onPattern Analysis and Machine Intelligence,23(4):349–361,2001.[14] C.Schmid and R.Mohr.Local grayvalue invariants for im-age retrieval.IEEE Trans.on Pattern Analysis and MachineIntelligence,19(5):530–535,1997.[15] D.Tell and S.Carlsson.Wide baseline point matching usingaffine invariants computed from intensity profiles.In Proc.of European Conference on Computer Vision,pages754–760,2000.[16]T.Tuytelaars and L.van Gool.Wide baseline stereo match-ing based on local,affinely invariant regions.In Proc.ofBritish Machine Vision Conference,pages412–422,2000.[17]V.V ogelhuber and C.Schmid.Face detection based ongeneric local descriptors and spatial constraints.In Proc.of International Conference on Computer Vision,pagesI:1084–1087,2000.[18]M.Weber,M.Welling,and P.Perona.Unsupervised learn-ing of models for recognition.In Proc.of European Confer-ence on Computer Vision,pages18–32,2000.[19]L.Wiskott,J.-M.Fellous,N.Kr¨u ger,and C.von der Mals-burg.Face recognition by elastic bunch graph matching.IEEE Trans.on Pattern Analysis 
and Machine Intelligence, 19(7):775–779, 1997. [20] A. L. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3(1):59–70, 1991.
CVPR & ML Top 30 list (by citation count)
榜单Top 30如下,欢迎拾遗补缺:[1]Rapid Object Detection using a Boosted Cascade of Simple Features (Citations: 3296,PER=299.64)Paul A. Viola, Michael J. Jones @CVPR , vol. 1, pp. 511-518, 2001[2]Histograms of Oriented Gradients for Human Detection (Citations: 1704, PER=243.43)Navneet Dalal, Bill Triggs @CVPR , vol. 1, pp. 886-893, 2005[3]SURF: Speeded-Up Robust Features (Citations: 1054, PER=175.67)Herbert Bay, Tinne Tuytelaars, Luc J. Van Gool @ECCV , pp. 404-417, 2006[4]Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural SceneCategories (Citations: 873, PER=145.5)Svetlana Lazebnik, Cordelia Schmid, Jean Ponce @CVPR , vol. 2, pp. 2169-2178, 2006[5]Object Class Recognition by Unsupervised Scale-Invariant Learning (Citations: 1071,PER=119)Robert Fergus, Pietro Perona, Andrew Zisserman @CVPR , vol. 2, pp. 264-271, 2003[6]Robust Real-Time Face Detection (Citations: 1092, PER=99.27)Paul A. Viola, Michael J. Jones @ ICCV , 2001[7]A Bayesian hierarchical model for learning natural scene categories (Citations: 677,PER=96.71)Fei-Fei Li, Pietro Perona @CVPR , vol. 2, pp. 524-531, 2005[8]Scalable Recognition with a Vocabulary Tree (Citations: 570, PER=95)David Nistér, Henrik Stewénius @CVPR , vol. 2, pp. 2161-2168, 2006[9]Real-Time Tracking of Non-Rigid Objects Using Mean Shift (Citations: 1132,PER=94.33)Dorin Comaniciu, Visvanathan Ramesh, Peter Meer @CVPR , vol. 2, pp. 2142-149vol.2, 2000[10]Visual Categorization with Bags of Keypoints (Citations: 745, PER=93.13)Gabriella Csurka, Christopher R. Dance, Lixin Fan, etc @ECCV , 2004[11]Video Google: A Text Retrieval Approach to Object Matching in Videos (Citations:790, PER=87.78)Josef Sivic, Andrew Zisserman @ ICCV , pp. 1470-1477, 2003[12]What Energy Functions Can Be Minimized via Graph Cuts? (Citations: 842, PER=84.2)Vladimir Kolmogorov, Ramin Zabih @ECCV , pp. 65-81, 2002[13]Overview of the Face Recognition Grand Challenge (Citations: 578, PER=82.57)P. Jonathon Phillips, Patrick J. Flynn, W. Todd Scruggs, etc @CVPR , vol. 1, pp.947-954, 2005[14]Robust wide baseline stereo from maximally stable extremal regions (Citations: 810,PER=81)Jiri Matas, Ondrej Chum, Martin Urban, etc @BMVC , vol. 1, 2002[15]PCA-SIFT: A More Distinctive Representation for Local Image Descriptors (Citations:639, PER=79.88)Yan Ke, Rahul Sukthankar @CVPR , vol. 2, pp. 506-513, 2004[16]Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects inND Images (Citations: 796, PER=72.36)Yuri Y. Boykov, Marie-pierre Jolly @ ICCV , pp. 105-112, 2001[17]An extended set of Haar-like features for rapid object detection (Citations: 710,PER=71)Rainer Lienhart, Jochen Maydt @ICIP , vol. 1, pp. 900-903, 2002[18]A Database of Human Segmented Natural Images and its Application to EvaluatingSegmentation Algorithms and Measuring Ecological Statistics (Citations: 750,PER=68.18)David R. Martin, Charless Fowlkes, Doron Tal, etc @ ICCV , pp. 416-425, 2001 [19]Detecting Pedestrians Using Patterns of Motion and Appearance (Citations: 584,PER=64.89)Paul A. Viola, Michael J. Jones, Daniel Snow @ ICCV , pp. 734-741, 2003[20]Object Recognition as Machine Translation: Learning a Lexicon for a Fixed ImageVocabulary (Citations: 603, PER=60.3)Pinar Duygulu, Kobus Barnard, João F. G. De Freitas, etc @ECCV , pp. 97-112, 2002 [21]Real-Time Simultaneous Localisation and Mapping with a Single Camera (Citations:527, PER=58.56)Andrew J. Davison @ ICCV , pp. 
1403-1410, 2003[22]Recognizing Human Actions: A Local SVM Approach (Citations: 440, PER=55)Christian Schüldt, Ivan Laptev, Barbara Caputo @ICPR , pp. 32-36, 2004[23]Actions as Space-Time Shapes (Citations: 379, PER=54.14)Moshe Blank, Lena Gorelick, Eli Shechtman, etc @ ICCV , vol. 2, pp. 1395-1402, 2005 [24]A Discriminatively Trained, Multiscale, Deformable Part Model (Citations: 215,PER=53.75)Pedro F. Felzenszwalb, David A. Mcallester, Deva Ramanan @CVPR , pp. 1-8, 2008 [25]Non-parametric Model for Background Subtraction (Citations: 642, PER=53.5)Ahmed M. Elgammal, David Harwood, Larry S. Davis @ECCV , pp. 751-767, 2000 [26]A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms(Citations: 318, PER=53)Steven M. Seitz, Brian Curless, James Diebel, etc @CVPR , vol. 1, pp. 519-528, 2006 [27]Comprehensive Database for Facial Expression Analysis (Citations: 636, PER=53)Takeo Kanade, Yingli Tian, Jeffrey F. Cohn @FG , pp. 46-53, 2000[28]Learning Realistic Human Actions from Movies (Citations: 211, PER=52.75)Ivan Laptev, Marcin Marszalek, Cordelia Schmid, etc @CVPR , pp. 1-8, 2008 [29]Object Retrieval with Large Vocabularies and Fast Spatial Matching (Citations: 258,PER=51.6)James Philbin, Ondrej Chum, Michael Isard, etc @CVPR , 2007[30]Statistical Shape Influence in Geodesic Active Contours (Citations: 616, PER=51.33)Michael E. Leventon, W. Eric L. Grimson, Olivier D. Faugeras @CVPR , vol. 1, pp.1316-1323, 2000。
Profile-Based 3D Face Registration and Recognition
Profile-Based3D Face Registrationand RecognitionChao Li1and Armando Barreto1,21Electrical and Computer Engineering Department,Florida International University,33174Miami,USA{cli006,barretoa}@2Biomedical Engineering Department,Florida International University,33174Miami,USAAbstract.With the rapid development of3D imaging technology,facerecognition using3D range data has become another alternative in thefield of biometrics.Unlike face recognition using2D intensity images,which has been studied intensively by many researchers since the1960’s,3D range data records the exact geometry of a person and it is invariantwith respect to illumination changes of the environment and orientationchanges of the person.This paper proposes a new algorithm to registerand identify3D range faces.Profiles and contours are extracted for thematching of a probe face with available gallery faces.Different combina-tions of profiles are tried for the purpose of face recognition using a setof27subjects.Our results show that the central vertical profile is one ofthe most powerful profiles to characterize individual faces and that thecontour is also a potentially useful feature for face recognition.Keywords:2D,3D,biometrics,contour,face,intensity,moment,profile,range,recognition,registration.1IntroductionFace recognition has been widely studied during the last two decades.It is a branch of biometrics,which studies the process of automatically associating an identity with an individual by means of some inherent personal characteristics [1].Biometric characteristics include something that a person is or produces.Ex-amples of the former arefingerprints,the iris,the face,the hand/finger geometry or the palm print,etc.The latter include voice,handwriting,signature,etc.[2]. Compared with other biometric characteristics,the face is considered to be the most immediate and transparent biometric modality for physical authentication applications.Despite its intrinsic complexity,face-based authentication still re-mains of particular interest because it is perceived psychologically and physically as noninvasive.Significant motivations for its use include the following[2]:–Face recognition is a modality that humans largely depend on to authenticate other humans.–Face recognition is a modality that requires no or only weak cooperation to be useful.–Face authentication can be advantageously included in multimodal systems, not only for authentication purposes but also to confirm the aliveness of the signal source offingerprints,voice,etc.The definition of face recognition was formulated in[3]as:“Given an image of a scene,identify one or more persons in the scene using a stored database of faces.”This is called the‘one to many’problem or identification problem in face recognition.Another kind of problem is‘one to one’,i.e.,the authentication problem.This kind of problem is to determine whether the input face of a person is really the person he or she claims to be or not.In this paper,we deal with face recognition in thefirst scenario.The potentialfield of the application of face recognition is very wide,mostly in areas such as authentication,security and access control,which include the physical access control and logical access control.Especially in recent years,anti-terrorism has been a big issue throughout the world.Face recognition will play a more and more important role in its efforts.In the last ten years,most of the research work in the area of face recogni-tion used two-dimensional images,that is,gray level images taken by a 
camera. Many new techniques emerged in thisfield and achieved good recognition rates.A number of these techniques are outlined in survey publications,such as[5]. However,most of the2D face recognition systems are sensitive to the illumina-tion changes or orientation changes of the subjects.All these problems result from the incomplete information contained in a2D image about a face.On the other hand,a3D scan of a subject’s face has complete geometric information about the face,even including texture information,in the case of some scanners. It is believed that,on average,3D face recognition methods will achieve higher recognition rates than their2D counterparts.With the rapid development of3D imaging technology,3D face recognition will attract more and more attention.In[6],Bowyer provides a survey of3D face recognition technology.Some of the techniques are derived from2D face recognition,such as Principal Com-ponent Analysis(PCA)used in[7,8]to extract features from faces.Some of the techniques are unique to3D face recognition,such as the geometry match-ing method in[9],the profile matching proposed in[10,11]and the isometric transformation method presented in[4].This paper outlines a new algorithm used to register3D face images auto-matically.Specific profiles are defined in the registered faces and these are used for matching against the faces on a database including27subjects.The impact of using different types of profiles for matching is studied.Also the possibility of using the contour of a face as a feature for face recognition is explored.The structure of the paper is as follows:Section2describes the database used for this research.Section3presents the registration algorithm and Section 4outlines the matching procedure using different profiles and contours and gives the results of the experiments.Section5is the conclusion.23D Face DatabaseUnlike2D face recognition research,for which there are numerous databases available in the Internet,there are only a few3D face databases available to researchers.Examples are the Biometrics Database from the University of Notre Dame[12]and the University of South Florida(USF)face database[13].In our experiment,the USF database is used.The USF database of human3D face images is maintained by researchers in the department of Computer Science at the University of South Florida,and sponsored by the Defense Advanced Research Projects Agency(DARPA).The USF database has a total number of111subjects(74male;37female).All subjects have a neutral facial expression.Some of the subjects were scanned multiple times.In our experiment,the3D faces of the subjects who were scanned multiple times are considered,so that one scan can be used as a gallery image, i.e.,one of the faces that are assumed to be prerecorded,and the remaining scans from the same subject can be used as probe images,i.e.,faces to be identified.A subset of27subjects is used in this research,with27faces in the gallery and 27scans to be identified(probe faces).Fig.1.Rendered3D face image(Left)and triangulated mesh3D face image(Right) The3D scans in the USF database were acquired using a Cyberware3030 scanner.This scanner incorporates a rugged,self-contained optical range-finding system,whose dynamic range accommodates varying lighting conditions and surface properties[14].The faces in the database were converted into Stereolitography(STL)format. 
Each face has an average of18,000vertices and36,000triangles.Figure1shows a face from the database in its rendered and triangulated mesh forms.3Registration and PreprocessingIn3D face recognition,registration is a key pre-processing step.Registering may be crucial to the efficiency of some matching methods.Earlier work used Princi-pal Curvature and Gaussian Curvature to segment the face surface and register it,such as the methods in[9,10,15].The disadvantage of using curvatures to register faces is that this process is very computationally intensive and requires very accurate range data[16].Another method often used involves choosing several user-selected landmark locations on the face,such as the tip of the nose,the inner and outer corners of the eyes,etc.,and then using the affine transformation to register the face to a standard position[7,8,11].A third method performs registration by using moments.The matrix(Equa-tion1)constituted by the six second moments of the face surface:m200,m020, m002,m110,m101,m011,contains the rotational information of the face[17].M=⎡⎣m200m110m101m110m020m011m101m011m002⎤⎦(1)U∆U =SV D(M)(2) By applying the Singular Value Decomposition(Equation2),the unitary matrix U represents the rotation and the diagonal matrix∆represents the scale, for the three axes.U can be used as an affine transformation matrix on the original face surface.The problem with this method is that during repeated scans for the same subject,besides the changes in the face area,there are also some changes outside the face area,such as the different caps worn by the subjects during the scanning process(Fig.1).These additional changes will also impact the registration of the face surface,causing the registration for different instances of the same subject not to be the same.This limitation constrains this approach to only the early stages of registration.Figure2is an example of a scanned face rendered in a Cartesian coordinate system,with the X axis corresponding to the depth direction of the face,the Y axis corresponding to the length direction of the face and the Z axis correspond-ing to the width direction of the face.In the registration process,we assume that each subject kept his head upright during scanning,so that the face orientation around the X axis does not need to be corrected,but the orientation changes in the Y and Z axes need to be compensated for.The registration algorithm proposed does not require user-defined landmark locations and can be done automatically.First,the tip of the nose is found by looking for the point with the maximum value in the X direction.Then a‘cutting plane’,parallel to the XZ plane is set to contain the tip of the nose(Fig.3).The intersection of this cutting plane with the face defines the horizontal profile curve.In effect,the result is a discretized curve with a spacing of1.8mm between samples(Fig.4).Fig.2.Face surface in a Cartesian co-ordinate system(the units in the threeaxes aremm)Fig.3.Illustration of the extraction of the horizontalprofileFig.4.Discrete horizontal profile before registrationA trilinear interpolation method is used to find the value of each point in this profile.(Fig 5).The point P is in the YZ plane.P’is the intersection between the triangle ABC and the straight line PP’,which is normal to the YZ plane.The length of PP’is the profile value corresponding to point P.Next,the following cost function is minimized with respect to α,where I is the index of the maximum point of X.Fig.5.Trilinear interpolation to get exact values of 
profileelevationsFig.6.Horizontal profile after registration around Y axisE =15i =1[(X (I +i )−X (I −i )]2(3)For every α,the affine transformation is applied to the face surface using the following transformation matrix,and the horizontal profile is found,as illustrated before.T =⎡⎣cos α0−sin α010sin α0cos α⎤⎦(4)α=arg{min[15i=1[(X(I+i)−X(I−i)]2]}(5)Thefinal value ofαrepresents the orientation change around the Y axis required for the registration.Figure6shows the horizontal profile seen in Figure4,after the Y axis ad-justment has been performed:Typically,a rotational adjustment around the Z axis will also be required. Analogous to Figure3,Figure7shows the intersection of the face surface with a cutting plane,which is parallel to the XY plane and passes through the tip of the nose.This intersection is the central vertical profile.Similar to Figure4, Figure8shows the discretized central vertical profile,before adjustment.Fig.7.Illustration of extraction of central vertical profileThe cost function to be minimized in this case is the following,E=abs(X(I−50)−X(I+40))(6) Minimization is with respect toα.I is the index of profile point with the largest value of X.α=arg{[min(abs(X(I−50)−X(I+40))]}(7) For everyα,the affine transformation is applied,using the following trans-formation matrix.Fig.8.Discretized central profile beforeregistrationFig.9.Central vertical profile after reg-istrationFig.10.Mesh plot of the range image(left)and gray level image plot of range data(right)T=⎡⎣sinα0cosαcosα0−sinα010⎤⎦(8)The aim is to equalize the X coordinates of two critical points contained in the central vertical profile:the end point on the forehead side and the end point on the chin side.Figure9is the central vertical profile after adjustment around the Z axis.To complete the registration process,a grid of91by81points is prepared that corresponds to pairs of(y,z)coordinates.The point(51,41)of the grid is made to coincide with the tip of the nose in the adjusted face surface.This grid assumes a spacing of1.8mm in both the Y and Z directions,with91points in the length direction and81points in the width direction.The value associated to each point in the grid is the distance between the point in the face surface and the corresponding location on the YZ plane,calculated by trilinear interpolation (Fig.5).The values are offset so that the value corresponding to the tip ofthe nose is normalized to100mm.Values below20mm in the grid area are thresholded to20mm.Figure10is a Matlab mesh plot of the resulting grid,and a gray level plot of the same range image.4Recognition Experiments and ResultsFor the experiments described here,a gallery database of27range images of 27subjects(one for each subject)and a probe database of27different scans of the same27subject were used.The time interval between the acquisition of the gallery image and the corresponding probe image for a given subject ranges from several months to one year.The use of profile matching as a means for face recognition is a very intuitive idea that has been proposed in the past.In[10,11,18,19],different researchers explored the profile matching method in different ways.In our research,because the range image has already been obtained,profile extraction is simple.We have,in fact,tested the efficiency of several potential profile combinations used for identification.Besides profiles,the contour of a face was also tested for its potential applicability for face recognition.In our experiment,a frontal contour defined30mm behind the tip of the nose was 
extracted for each scan.Although in computing the distance or dissimilarity between profiles,some researchers[19] used the Hausdoffdistance,we found that the Euclidean distance is suitable for the context of our experiment.The following six different feature combinations and direct range image match-ing variations were tested with the experimental data described above:(a)Central vertical profile alone.(b)Central horizontal profile alone.(c)Contour,which is30mm behind the tip of the nose.(d)Central vertical profile and two horizontal profiles.The two horizontal profiles are defined at18mm and36mm above the tip of the nose.The distance between central profiles is given the weight of0.7;the two horizontal profile distances are given the weight of0.15each,towards the overall matching score for identification.(e)Central vertical profile and two more vertical profiles,one passing18mm to the left of the central profile,the other passing18mm to the right of the central profile.The distance between central profiles is given the weight of0.7; the other two vertical profile distances are given a weight of0.15each.(f)Using the entire range image.From the results in Figure11,we can see that scheme(a),i.e.,matching the central vertical profile alone,has the highest recognition rate.On the other hand,using the whole range image for matching yields the lowest recognition rate.Because the probe image was taken several months to one year after the gallery image was taken,we have sufficient reason to assume there were changesin the face for every subject.The high recognition rate using the central vertical profile suggests that this profile has the most distinctive properties between different subjects and is the most consistent through time for the same subject. These observations concur with a similar analysis,presented in[11].Besides the central vertical profile,the contour of a face also shows its potential as a feature to be used in face recognition.5ConclusionIn this paper,a new registration algorithm for3D face range data was proposed. This algorithm is valid under some constrains;i.e.,only orientation changes along the width direction and length direction of the face need to be compensated.But this algorithm can also be extended to register arbitrarily oriented face surfaces in3D space,combined with simple registration algorithms that use the six second moments of the face surface.Also in this paper,face identification based on profile matching was explored. Different combinations of profiles for matching were compared.It was found that the central vertical profile is the feature that best represented the intrinsic characteristics of each face and had the highest identification value among all the profile combinations tested.The contour of a face also has the potential to be used as one of the features in face recognition.AcknowledgmentsThis work was sponsored by NSF grants IIS-0308155and HRD-0317692.The participation of Mr.Chao Li in this research was made possible through the support of his Florida International University Presidential Fellowship. References1. A.K.Jian,R.Bolle and S.Pankanti,Biometrics-Personal Identification in NetworkedSociety.1999,Norwell,MA:Kluwer2.J.Ortega-Garcia,J.Bigun,D.Reynolds and J.Gonzales-Rodriguez,Authticationgets personal with biometrics.IEEE Signal Processing,2004.21(No.2):p.50-613.R.Chellappa,C.Wilson and S.Sirohey,Human and Machine Recognition of Faces:A Survey.Proceedings of the IEEE,1995.83(5):p.705-7404. 
A.Bronstein,M.Bronstein and R.Kimmel,Expression-invariant3D face recogni-tion.In the Proceedings of Audio and video-based Biometric Person Authentication (AVBPA),2003:p.62-695.W.Zhao,R.Chellappa and A.Rosenfeld,Face recognition:a literature survey.ACMComputing Survey,2003.35:p.399-4586.K.Bowyer,K.Chang and P.Flynn,A Survey of Approaches to3D and Multi-Modal3D+2D Face Recognition.In Proceedings of IEEE International Conference on Pattern Recognition.2004:p.358-3617.K.Chang,K.Bowyer and P.Flynn,Multimodal2D and3D biometrics for facerecognition.In the Proceedings of ACM workshop on Analysis and Modeling of Faces and Gestures.2003:p.25-328. C.Hesher,A.Srivastava and G.Erlebacher,A novel technique for face recognitionusing range images.In the Proceedings of Seventh International Symposium on Signal Processing and Its Application.20039.G.Gordon,Face recognition based on depth maps and surface curvature.In Geo-metric Methods in Computer Vision,SPIE.July1991:p.1-1210.J.Y.Cartoux,Preste and M.Richetin,Face authentication or recognition byprofile extraction from range images.In the Proceedings of the Workshop on Interp.of3D Scenes.1989:p.194-19911.T.Nagamine,T.Uemura and I.Masuda,3D facial image analysis for human iden-tification.In the Proceedings of Internaitonal Conference on Pattern Recognition (ICPR,1992):p.324-32712.K.Bowyer,University of Notre Dame Biometrics Database./%7Ecvrl/UNDBiometricsDatabase.html13.K.Bowyer,S.Sarkar,USF3D Face Database.f.edu/HumanID/2001.15.H.Tanaka,T.Ikeda,Curvature-based face surface recognition using sphericalcorrelation-principal directions for curved object recognition.In the Proceedings of the13th International Conference on Pattern Recognition,1996:p.638-642 16.Y.Wu,G.Pan and Z.Wu,Face Authentication Based on Multiple Profiles Ex-tracted from Range Data.In the Proceedings of4th International Conference on audio-and video-based biometric person authentication.2003:p.515-52217.M.Elad,A.Tal and S.Ar,Content Based Retrieval of VRML Objects-An Iterativeand Interactive Approach.In the Proceedings of the6th Eurographics Workshop in Multimedia.2001.18. C.Beumier,C.Acheroy,Face verification from3d and grey level clues.PatternRecognition Letters,2001.22(12):p.1321-132919.G.Pan,Y.Wu and Z.Wu,Investigating Profile Extraction from Range Data for3D Face Recognition.In the Proceedings of2003IEEE International Conferrence on Systems Man and Cybernetics:p.1396-1399。
Research on Visual Inspection Algorithms for Defects in Textured Objects (graduate thesis)
Abstract
In highly competitive, automated industrial production, machine vision plays a decisive role in product quality control, and its use in defect inspection has become increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient and safer. Textured objects are ubiquitous in industrial production: substrates used for semiconductor assembly and packaging, light-emitting diodes, printed circuit boards in modern electronic systems, and the cloth and fabrics of the textile industry can all be regarded as objects with texture features. This thesis focuses on defect inspection for textured objects and aims to provide efficient and reliable algorithms for their automated inspection. Texture is an important feature for describing image content, and texture analysis has been applied successfully to texture segmentation and classification. This work proposes a defect inspection algorithm based on texture analysis and reference comparison. The algorithm tolerates image registration errors caused by object deformation and is robust to the influence of texture. It is designed to give the detected defect regions rich physical meaning, such as their size, shape, brightness contrast and spatial distribution. When a reference image is available, the algorithm can be used to inspect both homogeneously and non-homogeneously textured objects, and it also performs well on non-textured objects. Throughout the inspection process we use steerable-pyramid texture analysis and reconstruction. Unlike conventional wavelet texture analysis, we add a tolerance-control step in the wavelet domain to handle object deformation and texture influence, and the final steerable-pyramid reconstruction preserves the physical meaning of the defect regions. In the experiments we inspected a series of images of practical value; the results show that the proposed algorithm is efficient and easy to implement.
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
Face Recognition (PPT slides, English)
Fundamentals
Step 2) Feature extraction for the training set (database) and, at the same time, for the input image.
Feature extraction provides discriminative information. In the example pictures, a birthmark under the right eye is enough to tell that the two images show the same person.
A computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source.
Processing Flow
Application
Paying with face recognition
Alibaba Group founder Jack Ma showed off the technology Sunday during a CeBIT event that would seamlessly scan users’ faces via their smartphones to verify mobile payments. The technology, called “Smile to Pay,” is being developed
Application
Face Recognition Access Control System
Face Recognition access control: whenever someone wishes to enter a building, FaceGate verifies the person's entry code or card, then compares his face with its stored "key". If he is registered as authorized, he is allowed to enter; access is denied to anyone whose face does not match.
Face Recognition: Translated Foreign-Language Reference
Face recognition translated foreign-language reference (the document contains the English original and the Chinese translation side by side). Translation: A PCA-Based Method for Real-Time Face Detection and Tracking. Abstract: This paper presents a method for real-time face detection and tracking under complex background conditions.
The method is based on principal component analysis (PCA).
To detect faces, a skin-colour model and some motion information (e.g. posture, gesture, eye-colour cues) are used first.
PCA is then applied to these candidate regions to determine the true position of the face.
Face tracking is based on the Euclidean distance, measured in feature space, between the previously tracked face and the most recently detected face.
The camera controller used for face tracking works by driving a pan/tilt platform so that the detected face region is kept at the centre of the screen.
The method can also be extended to other systems, such as teleconferencing and intruder-detection systems.
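The abstract above tracks a face by the Euclidean distance, in a PCA feature space, between the previously tracked face and a newly detected one. The sketch below is an illustrative reconstruction of that idea, not the authors' code: the face patches, the number of components and the distance threshold are all assumptions, and patches are assumed to be cropped and resized to a fixed size beforehand.

```python
import numpy as np

def pca_basis(training_patches, n_components=20):
    """Learn an eigenface basis from training face patches (n_patches, H, W);
    assumes at least n_components patches are available."""
    X = training_patches.reshape(len(training_patches), -1).astype(float)
    mean = X.mean(axis=0)
    # Principal directions via SVD of the centred data matrix.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(patch, mean, basis):
    """Project one face patch into the PCA feature space."""
    return basis @ (patch.reshape(-1).astype(float) - mean)

def same_face(tracked_patch, detected_patch, mean, basis, threshold=2500.0):
    """Compare the previously tracked face with a newly detected face by the
    Euclidean distance between their feature-space projections.
    The threshold value is an arbitrary placeholder."""
    d = np.linalg.norm(project(tracked_patch, mean, basis) -
                       project(detected_patch, mean, basis))
    return d < threshold, d
```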
1. Introduction. Video signal processing has many applications, for example teleconferencing for visual communication and lip-reading systems for the disabled.
In many of the systems mentioned above, face detection and tracking are indispensable components.
This paper is concerned with real-time tracking of the face region [1-3].
In general, tracking methods can be divided into two categories according to the perspective taken.
Some authors divide face tracking into recognition-based tracking and motion-based tracking, while others divide it into edge-based and region-based tracking [4].
Recognition-based tracking builds directly on object recognition techniques, so the tracker's performance is limited by the efficiency of the recognition method.
Motion-based tracking relies on motion detection techniques, which can be divided into optical-flow methods and motion-energy methods.
Edge-based methods track the edges of an image sequence, which are usually the boundaries of the main objects.
However, because the tracked object must show clear edge changes under the prevailing colour and lighting conditions, these methods suffer from colour and illumination variation.
Moreover, when the image background contains strong edges, it is difficult for such methods to provide reliable results.
Much of the current literature on this class of methods stems from the work of Kass et al. on snakes (active contour models) [5].
3D Face Recognition Based on Geometric Features and Depth Data
3D Face Recognition Based on Geometric Features and Depth Data. Authors: Chen Lisheng, Wang Binbin. Source: Computer Knowledge and Technology, No. 8, 2013. Abstract: A face recognition method based on multi-feature fusion of 3D point-cloud data is proposed.
Depth information is used to extract the central (mid-sagittal) profile of the face and the transverse profile through the nose tip; curvature analysis locates the facial key points, and, for rigid regions of the face such as the nose, a 13-dimensional feature vector of four types (curvature, distance, volume and angle) is selected and computed as the 3D geometric feature.
Depth-image features are extracted and recognized by combining LBP with Fisherfaces.
On the 3DFACE-XMU and ZJU-3DFED databases the method is compared with single methods such as PCA and LBP, and the recognition performance improves markedly.
Keywords: 3D face recognition; geometric features; depth image; LBP operator; Fisherface. CLC number: TP391; Document code: A; Article ID: 1009-3044(2013)08-1864-05. 1 Overview. After more than half a century of development, face recognition based on 2D images has achieved substantial results.
With the introduction of operators such as LBP [1] and Gabor [2], and the application of subspace methods to face recognition, the field entered a period of rapid development.
However, studies show [3] that, limited by the form of data they use, 2D methods are inevitably affected by the environment (illumination, background, etc.) and by the face itself (pose, expression, etc.).
For this reason, members of our group, Lai Haibin [4] and Liu Danhua [5], obtained 3D facial point-cloud data with good representational power using binocular stereo vision.
Building on that work, this paper studies point-cloud-based 3D face recognition.
3D face recognition based on geometric features and based on depth images are studied separately.
For the geometric features, the central profile and the transverse profile through the nose tip are taken as the main objects of study.
The central profile contains the silhouette of the forehead, nose, mouth and chin, presenting the outlines of the most prominent facial organs.
The transverse profile through the nose tip contains the nose-wing and nose-tip points and effectively expresses the nose width, the angle formed by the nose tip and nose wings, and similar information.
Thirteen feature points are located on these two profiles, and the geometric features are computed from the relations between them.
The LBP operator is used to extract texture features from the facial depth image.
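The excerpt above describes the depth-map texture feature as an LBP code (later combined with Fisherfaces for recognition). The snippet below is a minimal 8-neighbour, radius-1 LBP sketch over a depth image using numpy; it is illustrative only and omits the uniform-pattern and block-histogram variants the authors may have used.

```python
import numpy as np

def lbp_8_1(depth):
    """Basic 8-neighbour, radius-1 LBP code for each interior pixel of a 2-D depth map."""
    d = np.asarray(depth, dtype=float)
    c = d[1:-1, 1:-1]                                  # centre pixels
    # Neighbours in a fixed clockwise order, each compared against the centre.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = d[1 + dy: d.shape[0] - 1 + dy, 1 + dx: d.shape[1] - 1 + dx]
        code += (neigh >= c).astype(int) << bit        # set one bit per neighbour
    return code

def lbp_histogram(depth, bins=256):
    """Normalised LBP histogram, usable as a texture feature vector for a depth image."""
    h, _ = np.histogram(lbp_8_1(depth), bins=bins, range=(0, bins))
    return h / max(h.sum(), 1)
```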
Key points of facial recognition algorithms
Facial recognition algorithms leverage computer vision and machine learning techniques to detect, analyze, and map facial features. These algorithms are often utilized in security, surveillance, and access control systems, as well as in consumer applications such as photo tagging and social media filters.

The process of facial recognition typically involves the following steps:

1. Face Detection: the algorithm identifies the presence of a face in an image or video frame.
2. Feature Extraction: key facial features, such as the eyes, nose, mouth, and other unique characteristics, are extracted and mapped.
3. Feature Representation: the extracted features are converted into a numerical representation that can be easily compared and processed.
4. Matching: the feature representation of an unknown face is compared to a database of known faces to identify potential matches (see the matching sketch after this passage).

Types of facial recognition algorithms:
- Local Binary Patterns (LBP): LBP algorithms analyze the texture of facial features by comparing the brightness values of adjacent pixels.
- Scale Invariant Feature Transform (SIFT): SIFT algorithms detect and describe key points in an image, which are then used for matching.
- Histogram of Oriented Gradients (HOG): HOG algorithms create histograms of the gradients of image pixels, which are then used to extract features.
- Convolutional Neural Networks (CNNs): CNNs are deep learning algorithms that have achieved state-of-the-art performance in facial recognition tasks.

Accuracy and limitations: the accuracy of facial recognition algorithms depends on factors such as the quality of the input image, the algorithm itself, and the size and diversity of the training dataset. While facial recognition algorithms have made significant progress in recent years, they are not yet foolproof and can be susceptible to errors caused by factors such as facial expressions, lighting conditions, and aging.

Ethical considerations: the use of facial recognition algorithms raises ethical concerns related to privacy, surveillance, and bias. It is crucial to ensure that these algorithms are used in a responsible and transparent manner, with appropriate safeguards in place to protect individual rights and freedoms.
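Steps 3 and 4 above (feature representation and matching) amount to comparing a numerical feature vector against a gallery of enrolled vectors. The sketch below shows one common way to do this, cosine-similarity nearest-neighbour search with a rejection threshold; the feature extractor is left abstract and the threshold is an illustrative placeholder, not a value from the text.

```python
import numpy as np

def identify(probe_vec, gallery_vecs, gallery_ids, accept_threshold=0.6):
    """Match a probe feature vector against enrolled gallery vectors.

    gallery_vecs: (N, D) array of enrolled feature vectors.
    gallery_ids:  list of N identity labels.
    Returns (identity or None if rejected, best cosine similarity).
    """
    g = np.asarray(gallery_vecs, dtype=float)
    p = np.asarray(probe_vec, dtype=float)
    # Cosine similarity between the probe and every gallery vector.
    sims = (g @ p) / (np.linalg.norm(g, axis=1) * np.linalg.norm(p) + 1e-12)
    best = int(np.argmax(sims))
    if sims[best] < accept_threshold:        # reject unknown faces
        return None, float(sims[best])
    return gallery_ids[best], float(sims[best])
```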
Face Recognition: Foreign-Language Literature
Method of Face Recognition Based on Red-Black Wavelet Transform and PCA
Yuqing He, Huan He, and Hongying Yang
Department of Opto-Electronic Engineering, Beijing Institute of Technology, Beijing, P.R. China, 100081; 20701170@…cn
Abstract. With the development of the man-machine interface and of recognition technology, face recognition has become one of the most important research topics in biometric recognition. PCA (Principal Component Analysis) has been applied to recognition on many face databases and has achieved good results. However, PCA has its limitations: a large amount of computation and low discriminative ability.
In view of these limitations, this paper puts forward a face recognition method based on the red-black wavelet transform and PCA. An improved histogram equalization is used for image pre-processing in order to compensate for illumination. Then the red-black wavelet sub-band that contains the information of the original image is used to extract features and perform matching.
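The abstract mentions an "improved histogram equalization" for illumination compensation before the wavelet/PCA steps; the improved variant is not specified here, so the snippet below only shows plain global histogram equalization of an 8-bit grayscale face image with numpy, as a stand-in for that preprocessing step.

```python
import numpy as np

def equalize_histogram(gray):
    """Plain global histogram equalization of an 8-bit grayscale image."""
    img = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(cdf)][0]                 # first non-zero CDF value
    # Map each grey level so the output histogram is (approximately) flat.
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255).astype(np.uint8)
    return lut[img]
```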
Face Recognition Based on PCA and ICA

1 Face recognition with PCA and ICA

1.1 PCA face recognition. PCA is a common method for estimating the parameters of a linear model. The basic idea is to transform the original regression variables into another set of variables, the so-called "principal components", keep a subset of the important components as the new variables (discarding the unimportant ones), and finally estimate the model parameters of the selected components by least squares. When PCA is used for face recognition, the analysis is based on second-order statistics. To perform face recognition with PCA, the training images are first standardised, giving the eigenface basis

$U = [U_1 \dots U_N]$   (3)

and the projection of the training image set $X$ onto the eigenface subspace is

$\rho_{PCA} = W_{eig}\, X^{T}$   (4)

Thus the subspace determined by the first $m$ principal axes reconstructs the original data as faithfully as possible. Since the purpose of introducing principal components is to reduce dimensionality, how large should $m$ be? It can be chosen from the contribution rate or the cumulative contribution rate:

$R = \lambda_j \big/ \sum_{i=1}^{N}\lambda_i \quad\text{or}\quad R = \sum_{j=1}^{m}\lambda_j \big/ \sum_{i=1}^{N}\lambda_i$   (5)

In face recognition, much of the important information is contained in the higher-order statistics. ICA is a decorrelating multivariate data-processing method based on higher-order statistics. Its basic idea is to represent a set of random variables with basis functions whose components are assumed to be statistically independent, or as independent as possible [1]. For face recognition with ICA, the training image set $X$ is regarded as a linear combination of statistically independent basis images $S$ and an invertible mixing matrix $A$:

$X = A S$   (8)

The goal of ICA is to find the mixing matrix $A$, or the separating matrix $W$, such that

$I = W X = W A S,\qquad A = W^{-1}$   (9)

where $I$ is the estimate of the independent basis images $S$. ICA face recognition therefore estimates the mixing matrix $A$ (or the separating matrix $W$) from the input images, as shown in Fig. 3. This work uses the fast fixed-point algorithm given by Hyvärinen [5].
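Equations (3)-(9) above describe projecting face images onto an eigenface basis, choosing the number of components by the cumulative contribution rate, and estimating the ICA separating matrix with Hyvärinen's fixed-point algorithm (FastICA). The sketch below reproduces these steps with scikit-learn under assumed data shapes; here ICA is run on the PCA-reduced data, a common practical choice, so it is illustrative rather than the paper's implementation.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# X: training image set, one flattened face per row; random placeholder data for illustration.
X = np.random.rand(100, 64 * 64)

# Choose m by the cumulative contribution rate R = sum_{j<=m} lambda_j / sum_i lambda_i (Eq. 5).
pca_full = PCA().fit(X)
m = int(np.searchsorted(np.cumsum(pca_full.explained_variance_ratio_), 0.95) + 1)

# Eigenface projection onto the first m principal axes (Eqs. 3-4).
pca = PCA(n_components=m).fit(X)
rho = pca.transform(X)

# ICA: X ~ A S, with I = W X the estimated independent basis images (Eqs. 8-9).
ica = FastICA(n_components=min(m, 20), random_state=0)
sources = ica.fit_transform(rho)     # estimated independent components
W = ica.components_                  # separating (unmixing) matrix
A = ica.mixing_                      # estimated mixing matrix
```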
Face Recognition (PPT)
INTRODUCTION
Two challenging factors: illumination variation and head-pose change. Illumination variation changes the profile (histogram) of the image intensity distribution. Pose variation changes the image projection transform, resulting in a different face image; for instance, a profile view differs widely from a frontal view.
$\hat{D}(x,y) = \begin{cases} 255, & D(x,y) \ge T_D \\ 0, & D(x,y) < T_D \end{cases}$
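The reconstructed equation above binarizes a map D at the threshold T_D (the slide text surrounding it is not preserved). As a minimal illustration, under the assumption that D is a numeric array:

```python
import numpy as np

def binarize(D, T_D):
    """Return 255 where D >= T_D and 0 elsewhere, as in the equation above."""
    return np.where(np.asarray(D) >= T_D, 255, 0).astype(np.uint8)
```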
Face Detection and Facial Landmark Extraction
Face skin is extracted based on a similarity measure using (hue, saturation) colour attributes.
$D_{3m\times n} = U_{3m\times n}\, D_{n\times n}\, V_{n\times n}^{T}$   (SVD; non-singular case: rank(D) = 4)
$M^{(P)} = U_4\, D_4$
$X^{(P)} = D_4\, V_4^{T}$
Camera Matrix and 3D Face Model Estimation by Multi-View Factorization
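The factorization on the slides above truncates the SVD of the measurement matrix D to rank 4 and splits it into a motion/camera factor and a shape factor. The snippet below shows the generic rank-4 truncation with numpy; how the diagonal factor D_4 is absorbed into the two sides (here split symmetrically via its square root) is a convention choice and may differ from the original slides.

```python
import numpy as np

def rank4_factorize(D):
    """Truncate the SVD of a measurement matrix D to rank 4 and split it into
    a (3m x 4) motion-like factor M and a (4 x n) shape-like factor X, with D ~ M @ X."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    U4, s4, Vt4 = U[:, :4], s[:4], Vt[:4, :]
    sqrt_D4 = np.diag(np.sqrt(s4))
    M = U4 @ sqrt_D4          # camera / motion factor
    X = sqrt_D4 @ Vt4         # 3-D structure factor
    return M, X
```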
Face feature clustering algorithms
Face feature clustering algorithms are widely used in various applications, such as face recognition, emotion analysis, and facial expression detection. The goal of these algorithms is to group similar face features together based on their similarities and dissimilarities, which allows for efficient face feature representation and analysis.

One commonly used algorithm for face feature clustering is the k-means algorithm. This algorithm aims to partition the face features into k clusters, where each cluster represents a group of similar face features. The algorithm starts by randomly selecting k initial cluster centroids and then iteratively assigns each face feature to the nearest centroid. After each assignment, the centroids are updated based on the mean of the assigned face features. This process continues until convergence, where the face features are assigned to their final clusters.

Another popular algorithm for face feature clustering is hierarchical clustering. This algorithm builds a hierarchy of clusters by successively merging or splitting clusters based on their similarities. It starts with each face feature as a separate cluster and then merges the closest pair of clusters at each step. This process continues until all face features are in a single cluster or until a predefined stopping criterion is met. The resulting hierarchy of clusters can be represented as a dendrogram, which provides insight into the relationships between different clusters.

In addition to these algorithms, there are other, more advanced techniques for face feature clustering, such as spectral clustering, density-based clustering, and fuzzy clustering. These techniques offer different advantages and can be chosen based on the specific requirements of the application.
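The passage above describes grouping face feature vectors with k-means (random initial centroids, assign-to-nearest, recompute means until convergence). The sketch below runs exactly that loop with scikit-learn's KMeans on assumed feature vectors; the embedding source, the number of clusters and the data are placeholders for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# features: one face feature vector per row; random placeholders stand in for real embeddings.
features = np.random.rand(200, 128)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_                 # cluster assignment for each face feature
centroids = kmeans.cluster_centers_     # one centroid per cluster

# Faces whose features fall in the same cluster are treated as similar.
cluster_0_members = np.flatnonzero(labels == 0)
```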
3D FACE RECOGNITION FOR BIOMETRIC APPLICATIONS
3D FACE RECOGNITION FOR BIOMETRIC APPLICATIONSL.Akarun,B.G¨o kberk,A.A.SalahDepartment of Computer EngineeringBo˘g azic¸i University,Bebek,˙Istanbul,Turkeyphone:+(90)2123597183,fax:+(90)2122872461e-mail:{akarun,gokberk,salah}@.trABSTRACTFace recognition(FR)is the preferred mode of identity recognition by humans:It is natural,robust and unintrusive.However,auto-matic FR techniques have failed to match up to expectations:Vari-ations in pose,illumination and expression limit the performance of2D FR techniques.In recent years,3D FR has shown promise to overcome these challanges.With the availability of cheaper ac-quisition methods,3D face recognition can be a way out of these problems,both as a stand-alone method,or as a supplement to2D face recognition.We review the relevant work on3D face recogni-tion here,and discuss merits of different representations and recog-nition algorithms.1.INTRODUCTIONRecent developments in computer technology and the call for better security applications have brought biometrics into focus.A bio-metric is a physical property;it cannot be forgotten or mislaid like a password,and it has the potential to identify a person in very different settings:a criminal entering an airport,an unconscious patient without documents for identification,an authorized person accessing a highly-secured system.Be it for purposes of security or human–computer interaction,there is wide application to robust biometrics.Two different scenarios are of primary importance.In the veri-fication(authentication)scenario,the person claims to be someone, and this claim is verified by ensuring the provided biometric is suf-ficiently close to the data stored for that person.In the more diffi-cult recognition scenario,the person is searched in a database.The database can be small(e.g.criminals on the wanted list)or large (e.g.photos on registered ID cards).The unobtrusive search for a number of people is called screening.The signature and handwriting have been the oldest biometrics, used in the verification of authentication of documents.Face image and thefingerprint also have a long history,and are still kept by police departments all over the world.More recently,voice,gait, retina and iris scans,hand print,and3D face information are con-sidered for biometrics.Each of these have different merits,and applicability.When deploying a biometrics based system,we con-sider its accuracy,cost,ease of use,ease of development,whether it allows integration with other systems,and the ethical consequences of its use.Two other criteria are susceptibility to spoofing(faking an identity)in a verification setting,and susceptibility to evasion (hiding an identity)in a recognition setting.The purpose of the present study is to discuss the merits and drawbacks of3D face information as a biometric,and review the state of the art in3D face recognition.Two things make face recog-nition especially attractive for our consideration.The acquisition of the face information is easy and non-intrusive,as opposed to iris and retina scans.This is important if the system is going to be used frequently,and by a large number of users.The second point is the relatively low privacy of the information;we expose our faces constantly,and if the stored information is compromised,it does not lend itself to improper use like signatures andfingerprints would.The drawbacks of3D face recognition include high cost and decreased ease-of-use for laser sensors,low accuracy for other acquisiton types,and the lack of sufficiently powerful algorithms. 
Figure.1presents a summary of different biometrics and their rela-tivestrengths.Figure1:Biometrics and their relative strengths.Although2D and 3D face recognition are not as accurate as iris scans,their ease of use and lower cost makes them a preferable choice for some scenarios.3D face recognition represents an improvement over2D face recognition in some respects.Recognition of faces from still im-ages is a difficult problem,because the illumination,pose and ex-pression changes in the images create great statistical differences and the identity of the face itself becomes shadowed by these fac-tors.Humans are very capable in this modality,precisely because they learn to deal with these variations.3D face recognition has the potential to overcome feature localization,pose and illumination problems,and it can be used in conjunction with2D systems.In the next section we review the current research on3D face recognition.We focus on different representations of3D informa-tion,and the fusion of different sources of information.We con-clude by a discussion of the future of3D face recognition.2.STATE OF THE ART IN3D FACE RECOGNITION 2.13D Acquisition and PreprocessingWe distinguish between a number of range data acquisition tech-niques.In the stereo acquisition technique,two or more cameras that are positioned and calibrated are employed to acquire simul-taneous snapshots of the subject.The depth information for each point can be computed from geometrical models and by solving a correspondence problem.This method has the lowest cost and high-est ease of use.The structural light technique involves a light pat-tern projected on the face,where the distortion of the pattern reveals depth information.This setup is relatively fast,cheap,and allows a single standard camera to produce3D and texture information.The last technique employs a laser sensor,which is typically more accu-rate,but also more expensive and slower to use.The acquisition of a single3D head scan can take more than30seconds,a restricting factor for the deployment of laser-based systems.3D information needs to be preprocessed after acquisition.De-pending on the type of sensor,there might be holes and spikes(ar-tifacts)in the range data.Eyes and hair will not reflect the light appropriately,and the structured light approaches will have trouble correctly registering those portions.Illumination still effects the3D acquisition,unless accurate laser scanners are employed[7].For patching the holes,missing points can befilled by inter-polation or by looking at the other side of the face[22,33].Gaus-sian smoothing and linear interpolation are used for both texture and range information[1,8,10,13,15,22,24,30].Clutter is usually re-moved manually[6,8,15,24,18,21,29,30]and sometimes parts of the data are completely omitted where the acquisition leads to noise levels that cannot be coped with algorithmically[10,21,35]. 
To help distance calculation,the mesh representations can be regu-larized[16,36],or a voxel discretization can be used[2].Most of the algorithms start by aligning the faces,either by their centres of mass[8,29],nose tip[15,18,19,22,26,30],the eyes[13,17],or byfitting a plane to the face and aligning it with that of the camera[2].Registration of the images is important for all local similarity metrics.The key idea in registration is to define the similarity metric and the set of possible transformations.The sim-ilarity is measured by point-to-point or point-to-surface distances, or cross correlation between more complex features.The rigid transformation of a3D object involves a3D rotation and translation,but the nonlinearity of the problem calls for iterative methods[11].The most frequently used([16,19,21,22,27,29]) registration technique is the Iterative Closest Point(ICP)algo-rithm[3].Warping and deforming the models(non–rigid regis-tration)for better alignment helps co-locating the landmarks.An important method is the Thin Plate Spline(TPS)algorithm,which establishes perfect correspondence[16,20].One should however keep in mind that the deformation may be detrimental to the recog-nition performance,as discriminatory information is lost propor-tional to the number of anchor points.Lu and Jain also distin-guish between inter-subject and intra-subject deformations,which is found useful for classification[20].Landmark locations used in registration are either found man-ually[6,10,17,19,21,25,33]or automatically[12,16,37].The correct localization of the landmarks is crucial to many algorithms, and it is usually not possible to judge the sensitivity of an algorithm to localization errors from its description.Nevertheless,the auto-matic landmark localization remains an unsolved problem.2.23D Recognition AlgorithmsWe summarize relevant work in3D face recognition.We have clas-sified each work according to the primary representation used in the recognition algorithm,much in the spirit of[7].Table3summarizes the recent work on3D and2D+3D face recognition.2.2.1Curvatures and Surface FeaturesIn one of the early3D face papers,Gordon proposed a curvature-based method for face recognition from3D data,kept in a cylin-drical coordinate system[13].Since the curvatures involve second derivatives,they are very sensitive to noise.An adaptive Gaussian smoothing is applied so as not to destroy curvature information. In[31]principal directions of curvatures are used.The advantage of these over surface normals is that they are applicable to free-form surfaces.Moreno et al.extracted a number of features from 3D data,and found that curvature and line features perform better than area features[24].In[14],the authors have compared differ-ent representations on the3D RMA dataset:point clouds,surface normals,shape-index values,depth images,and facial profile sets. 
Surface normals are reported to be more discriminative than others, and LDA is found very useful in extracting discriminative features.2.2.2Point Clouds and MeshesPoint cloud is the most primitive3D representation for faces,and it is difficult to work with.Achermann and Bunke employ Haus-dorff distance for matching the point clouds[2].They use a voxel discretization to speed up matching,but it causes some informa-tion o et al.discard matched points with large distances as noise[17].When the data are in point cloud representation,ICP is the most widely used registration technique.The similarity of two point sets that is calculated at each iteration of the ICP algorithm is fre-quently used in point cloud-based face recognizers.Medioni and Waupotitsch present an authentication system that acquires the3D image of the subject with two calibrated cameras[23]and ICP al-gorithm is used to define similarity between two face meshes.Lu et e a hybrid-ICP based registration using Besl’s method and Chen’s method successively[19].The base mesh is also used for alignment in[36],where features are extracted from around land-mark points,and nearest neighbour after PCA is used for recogni-tion.Lu and Jain also use ICP for rigid deformations,but they also propose to use TPS for intra-subject and inter-subject nonrigid de-formations,with the purpose of handling expression variations[20]. Deformation analysis and combination with appearance based clas-sifiers both increase the recognition accuracy.In a similar study,˙Irfano˘g lu et al.have used ICP to automat-ically locate facial landmarks in a coarse alignment step,and then warp faces using TPS algorithm to establish dense point-to-point correspondences[16].The use of an average face model signifi-cantly reduces the complexity of similarity calculation and point-cloud representation of registered faces are more suitable for recog-nition then depth image-based methods,point signatures,and im-plicit polynomial-based representation techniques.In a follow-up study,G¨o kberk et al.have analyzed the effect of registration meth-ods on the classification accuracy[14].To inspect side effects of warping on discrimination an ICP-based approximate dense regis-tration algorithm is designed that allows only rotation and transla-tion transformations.Experimental results confirmed that ICP with-out warping leads to better recognition accuracy1.Table.1sum-marizes the classification accuracies of different feature extractors for both TPS-based and ICP-based registration algorithms on the 3D RMA dataset.Improvement is visible for all feature extraction methods,except the shape-index.Table1:Average classification accuracies(and standard deviations) of different face recognizers for1)TPS warping-based and2)ICP-based face representation techniques.TPS ICPPoint Cloud92.95±1.0196.48±2.02Surface Normals97.72±0.4699.17±0.87Shape Index90.26±2.2188.91±1.07Depth PCA45.39±2.1550.78±1.10Depth LDA75.03±2.8796.27±0.93Central Profile60.48±3.7882.49±1.34Profile Set81.14±2.0994.30±1.552.2.3Depth MapDepth maps are usually used in conjunction with subspace meth-ods,although most of the existing2D techniques are suitable for processing the depth maps.The depth map construction consists of selecting a viewpoint,and smoothing the sampled depth values. In[15],PCA and ICA were compared on the depth maps.ICA was found to perform better,but PCA degraded more gracefully with declining numbers of training samples.In Srivastava et al. 
the set of all k-dimensional subspaces of the data space is searched with a MCMC simulated annealing algorithm for the optimal linear subspace[30].The optimal subspace method performs better than PCA,LDA or ICA.Achermann at pare an eigenface method with a5-state left-right HMM on a database of depth maps[1]. They show that the eigenface method outperforms the HMM,and 1In[32]texture was found to be more informative than depth;ourfind-ings point out to warping as a possible reason.the smoothing effects the eigenface method positively,while its ef-fect on the HMM is detrimental.The3D data are usually more suitable for alignment,and should be preferred if available.In Lee et al.the3D image is thresholded after alignment to obtain the depth map,and a number of small windows are sampled from around the nose[18].The statistical features extracted from these windows are used in recognition. 2.2.4ProfileThe most important problem for the profile-based schemes is the extraction of the profile.In an early paper Cartoux et e an iterative scheme tofind the symmetry plane that cuts the face into two similar parts[9].The nose tip and a second point are used to extract the profiles.Nagamine et e various heuristics tofind feature points and align the faces by looking at the symmetry[25]. Then the faces are intersected with different kinds of planes(verti-cal,horizontal or cylindrical around the nose tip),and the intersec-tion curve is used in recognition.Vertical planes around±20mm.of the central region and selecting a cylinder with20−30mm.radius around the nose(crossing the inner corners of the eyes)produced the best results.In[4],Beumier and Acheroy detail the acquisition of the popular3D RMA dataset with structural light and report pro-file based recognition results.In addition to the central profile,they use the average of two lateral profiles in recognition.Once the profiles are obtained,there are several ways of match-ing them.In[9],corresponding points of two profiles are selected to maximize a matching coefficient that uses the curvature on the profile curve.Then a correlation coefficient and the mean quadratic distance is calculated between the coordinates of the aligned profile curves,as two alternative measures.In[4],the area between the profile curves is used.In[14]distances calculated with L1norm, L2norm,and generalized Hausdorff distance were compared for aligned profiles,and the L1norm is found to perform better.2.2.5Analysis by SynthesisIn[6]the analysis-by-synthesis approach that uses morphable mod-els is detailed.A morphable model is defined as a convex combi-nation of shape and texture vectors of a number of samples that are placed in dense correspondence.A single3D model face is used to render an image similar to the test image,which leads to the es-timation of viewpoint parameters(pose angles,3D translation,fo-cal length of the camera),illumination parameters(ambient and di-rected light intensities,direction angles of the light,colour contrast, gains and offsets of the colour channels),and deformation parame-ters(shape and texture).In[22]a system is proposed to work with 2D colour images and corresponding3D depth maps.The idea is to synthesize a pose and illumination corrected image pair for recog-nition.The depth images performed significantly better(by4-7per cent)than colour images,and the combination increased the accu-racy as well(by1-2per cent).Pose correction is found to be more important than illumination correction.2.2.6Combinations of RepresentationsMost of the work that 
uses3D face data use a combination of rep-resentations.The enriched variety of features,when combined with classifiers with different statistical properties,produce more accu-rate and more robust results.In Tsutsumi et al.surface normals and intensities are concatenated to form a single feature vector,and the dimensionality is reduced with PCA[34].In[35],the3D data are described by point signatures,and the2D data by Gabor wavelet responses,respectively.3D intensities and texture were combined to form the4D representation in[29].Bronstein et al.point out to the non-rigid nature of the face,and to the necessity of using a suit-able similarity metric that takes this deformability into account[8]. For this purpose,they use multi-dimensional scaling projection al-gorithm for both shape and texture information.Apart from techniques that fuse the representations at the fea-ture level,there are a number of systems that employ combination at the decision level.Chang et al.propose in[10]to use Maha-lanobis distance-based nearest-neighbor classifiers on the2D inten-sity and3D range images separately,and fuse the decisions with a rank-based approach at the decision level.In[32]the depth map and colour maps(one for each YUV channel)are projected via PCA and the distances in four subspaces are combined by multiplication. In[33]the depth map and the intensity image are processed with embedded HMMs separately,and weighted score summation is pro-posed for the combination.In[21],Lu and Jain combine texture (LDA)and surface(point-to-plane distance)with weighted voting, but only the difficult samples are classified via the combined sys-tem.Profiles are also used in conjunction with other features.In[5], 3D central and lateral profiles,gray level central and lateral profiles were evaluated separately,and then fused with Fisher’s method. In[26]a surface-based recognizer and a profile-based recognizer are combined at the decision level.Surface-matcher’s similarity is based on a point cloud distance approach,and profile similarity is calculated using Hausdorff distance.In[27],a number of methods are tested on the depth map(Eigenface,Fisherface,and kernel Fish-erface),and the depth map expert is fused with three profile experts with Max,Min,Sum,Product,Median and Majority V ote rules,out of which the Sum rule was selected.G¨o kberk et al.have proposed two combination schemes that use3D facial shape information[14].In thefirst scheme,called parallel fusion,different pattern classifiers are trained using differ-ent features such as point clouds,surface normals,facial profiles, and PCA/LDA of depth images.The outputs of these pattern classi-fiers are merged using a rank-based decision level fusion algorithm. 
As combination rules, consensus voting, a non-linear variation of a rank-sum method, and a highest rank majority method are used. Table 2 shows the recognition accuracies of the individual pattern recognizers together with the accuracies of the parallel ensemble methods for the 3D RMA dataset. It is seen that while the best individual pattern classifier (Depth-LDA) can accurately classify 96.27 per cent of the test examples, a non-linear rank-sum fusion of the Depth-LDA, surface normal, and point cloud classifiers improves the accuracy to 99.07 per cent. Paired t-test results indicate that all of the accuracies of the parallel fusion schemes are statistically better than the individual classifiers' performances. The second scheme is called serial fusion, where the class outputs of a filtering first classifier are passed to a second, more complex classifier. The ranked output lists of these classifiers are fused. The first classifier in the pipeline should be fast and accurate; therefore a point cloud-based pattern classifier was selected. As the second classifier, Depth-LDA was chosen because of its discriminatory power. This system has 98.14 per cent recognition accuracy, significantly better than the single best classifier.

Table 2: Classification accuracies of single face classifiers (top part) and the combined classifiers (bottom part).

| Pattern classifier | Dimensionality | Acc. (%) |
|---|---|---|
| Point Cloud | 3,389 × 3 | 95.96 |
| Surface Normals | 3,389 × 3 | 95.54 |
| Depth PCA | 300 | 50.78 |
| Depth LDA | 30 | 96.27 |
| Profile Set | 1,557 | 94.30 |

| Combined classifier | Pattern classifiers | Acc. (%) |
|---|---|---|
| Consensus Voting | LDA, PC, SN | 98.76 |
| Nonlinear Rank-Sum | Profile, LDA, SN | 99.07 |
| Highest Rank Majority | Profile, LDA, SN, PC | 98.13 |
| Serial Fusion | PC, LDA | 98.14 |

3. CONCLUSIONS

There are a number of questions 3D face recognition research needs to address. In acquisition, the accuracy of cheaper and less intrusive systems needs to be improved, and temporal sequences should be considered. For registration, automatic landmark localization, artifact removal, scaling, and the elimination of errors due to occlusions, glasses, beard, etc. need to be worked out. Ways of deforming the face without losing discriminative information might be beneficial.

It is obvious that information fusion is the future of 3D face recognition. There are many ways of representing and combining texture and shape information. We also distinguish between local and configural processing, where the ideal face recognizer makes use of both. For realistic systems, single-training-instance cases should be considered, which is a great hurdle for some of the more successful discriminative algorithms. Publicly available 3D datasets are necessary to encourage further research on these topics.

REFERENCES

[1] Achermann, B., X. Jiang, H. Bunke, "Face recognition using range images," in Proc. Int. Conf. on Virtual Systems and MultiMedia, pp. 129-136, 1997.
[2] Achermann, B., H. Bunke, "Classifying range images of human faces with Hausdorff distance," in Proc. ICPR, pp. 809-813, 2000.
[3] Besl, P., N. McKay, "A Method for Registration of 3-D Shapes," IEEE Trans. PAMI, vol. 14, no. 2, pp. 239-256, 1992.
[4] Beumier, C., M. Acheroy, "Automatic 3D face authentication," Image and Vision Computing, vol. 18, no. 4, pp. 315-321, 2000.
[5] Beumier, C., M. Acheroy, "Face verification from 3D and grey level cues," Pattern Recognition Letters, vol. 22, pp. 1321-1329, 2001.
[6] Blanz, V., T. Vetter, "Face Recognition Based on Fitting a 3D Morphable Model," IEEE Trans. PAMI, vol. 25, no. 9, pp. 1063-1074, 2003.
[7] Bowyer, K., K. Chang, P. Flynn, "A survey of multi-modal 2D+3D face recognition," Technical Report, Notre Dame Department of Computer Science and Engineering, 2004.
[8] Bronstein, A. M., M. M. Bronstein, R. Kimmel, "Expression-invariant 3D face recognition," in J. Kittler, M. S. Nixon (eds.), Audio- and Video-Based Person Authentication, pp. 62-70, 2003.
[9] Cartoux, J. Y., Preste, M. Richetin, "Face authentication or recognition by profile extraction from range images," in Proc. of the Workshop on Interpretation of 3D Scenes, pp. 194-199, 1989.
[10] Chang, K., K. Bowyer, P. Flynn, "Multi-modal 2D and 3D biometrics for face recognition," in Proc. IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, 2003.
[11] Chen, Y., G. Medioni, "Object Modeling by Registration of Multiple Range Images," Image and Vision Computing, vol. 10, no. 3, pp. 145-155, 1992.
[12] Colbry, D., X. Lu, A. Jain, G. Stockman, "3D face feature extraction for recognition," Technical Report MSU-CSE-04-39, Computer Science and Engineering, Michigan State University, 2004.
[13] Gordon, G., "Face recognition based on depth and curvature features," in SPIE Proc.: Geometric Methods in Computer Vision, vol. 1570, pp. 234-247, 1991.
[14] Gökberk, B., A. A. Salah, L. Akarun, "Rank-based Decision Fusion for 3D Shape-based Face Recognition," submitted for publication.
[15] Hesher, C., A. Srivastava, G. Erlebacher, "A novel technique for face recognition using range imaging," in Proc. 7th Int. Symposium on Signal Processing and Its Applications, pp. 201-204, 2003.
[16] İrfanoğlu, M. O., B. Gökberk, L. Akarun, "3D Shape-Based Face Recognition Using Automatically Registered Facial Surfaces," in Proc. ICPR, vol. 4, pp. 183-186, 2004.
[17] Lao, S., Y. Sumi, M. Kawade, F. Tomita, "3D template matching for pose invariant face recognition using 3D facial model built with iso-luminance line based stereo vision," in Proc. ICPR, vol. 2, pp. 911-916, 2000.
[18] Lee, Y., K. Park, J. Shim, T. Yi, "3D face recognition using statistical multiple features for the local depth information," in Proc. ICVI, 2003.
[19] Lu, X., D. Colbry, A. K. Jain, "Three-Dimensional Model Based Face Recognition," in Proc. ICPR, 2004.
[20] Lu, X., A. K. Jain, "Deformation Analysis for 3D Face Matching," to appear in Proc. IEEE WACV, 2005.
[21] Lu, X., A. K. Jain, "Integrating Range and Texture Information for 3D Face Recognition," to appear in Proc. IEEE WACV, 2005.
[22] Malassiotis, S., M. G. Strintzis, "Pose And Illumination Compensation For 3D Face Recognition," in Proc. ICIP, 2004.
[23] Medioni, G., R. Waupotitsch, "Face recognition and modeling in 3D," IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, pp. 232-233, 2003.
[24] Moreno, A. B., Á. Sánchez, J. F. Vélez, F. J. Díaz, "Face recognition using 3D surface-extracted descriptors," in Proc. IMVIPC, 2003.
[25] Nagamine, T., T. Uemura, I. Masuda, "3D facial image analysis for human identification," in Proc. ICPR, pp. 324-327, 1992.
[26] Pan, G., Y. Wu, Z. Wu, W. Liu, "3D Face recognition by profile and surface matching," in Proc. IJCNN, vol. 3, pp. 2169-2174, 2003.
[27] Pan, G., Z. Wu, "3D Face Recognition From Range Data," submitted to Int. Journal of Image and Graphics, 2004.
[28] Pankanti, S., R. M. Bolle, A. Jain, "Biometrics: The Future of Identification," IEEE Computer, pp. 46-49, 2000.
[29] Papatheodorou, T., D. Rueckert, "Evaluation of automatic 4D face recognition using surface and texture registration," in Proc. AFGR, pp. 321-326, 2004.
[30] Srivastava, A., X. Liu, C. Hesher, "Face Recognition Using Optimal Linear Components of Range Images," submitted for publication, 2003.
[31] Tanaka, H., M. Ikeda, H. Chiaki, "Curvature-based face surface recognition using spherical correlation," in Proc. ICFG, pp. 372-377, 1998.
[32] Tsalakanidou, F., D. Tzovaras, M. Strinzis, "Use of depth and colour eigenfaces for face recognition," Pattern Recognition Letters, vol. 24, pp. 1427-1435, 2003.
[33] Tsalakanidou, F., S. Malassiotis, M. Strinzis, "Integration of 2D and 3D images for enhanced face authentication," in Proc. AFGR, pp. 266-271, 2004.
[34] Tsutsumi, S., S. Kikuchi, M. Nakajima, "Face identification using a 3D gray-scale image - a method for lessening restrictions on facial directions," in Proc. AFGR, pp. 306-311, 1998.
[35] Wang, Y., C. Chua, Y. Ho, "Facial feature detection and face recognition from 2D and 3D images," Pattern Recognition Letters, vol. 23, pp. 1191-1202, 2002.
[36] Xu, C., Y. Wang, T. Tan, L. Quan, "Automatic 3D face recognition combining global geometric features with local shape variation information," in Proc. AFGR, pp. 308-313, 2004.
[37] Yacoob, Y., L. S. Davis, "Labeling of human face components from range data," CVGIP: Image Understanding, vol. 60, no. 2, pp. 168-178, 1994.

Table 3: Overview of 3D face recognition systems

| Group | Representation | Database | Algorithm | Notes |
|---|---|---|---|---|
| Gordon [13] | curvatures | 26 training, 24 test | Euclidean nearest neighbour | Curvatures can be used for feature detection but they are sensitive to smoothing. |
| Tanaka et al. [31] | curvature based EGI | NRCC | Fisher's spherical correlation | Use principal curvatures instead of surface normals for non-polyhedral objects. |
| Moreno et al. [24] | curvature, line, region features | 7 img. × 60 persons | Euclidean nearest neighbour | Angle, distance and curvature features work better than area based features. |
| Achermann and Bunke [2] | point cloud | 120 training, 120 test | Hausdorff nearest neighbour | Hausdorff distance can be speeded up by voxel discretization. |
| Lao et al. [17] | curve segments | 36 img. × 10 persons | Euclidean nearest neighbour | Points with bad correspondence are not used in distance calculation. |
| Medioni and Waupotitsch [23] | mesh | 7 img. × 100 persons | normalized cross-correlation | After alignment, a distance map is found. Statistics of the map are used in similarity. |
| İrfanoğlu et al. [16] | point cloud | 3D RMA | point set difference (PSD) | ICP used to align point clouds with a base mesh. PSD outperforms PCA on depth map. |
| Lu et al. [19] | mesh | 90 training, 113 test | hybrid ICP and cross-correlation | ICP distances and shape index based correlation can be usefully combined. |
| Xu et al. [36] | regular mesh | 3D RMA | feature extraction, PCA and NN | Feature derivation + PCA around landmarks worked better than aligned mesh distances. |
| Lu and Jain [20] | deformation points | 500 training, 196 test | ICP + TPS, nearest neighbour | Distinguishing between inter-subject and intra-subject deformations helps recognition. |
| Achermann et al. [1] | depth map | 120 training, 120 test | eigenface vs. HMM | Eigenface outperforms HMM. Smoothing is good for eigenface, bad for HMM. |
| Hesher et al. [15] | mesh | FSU | ICA or PCA + nearest neighbour | ICA outperforms PCA; PCA degrades more gracefully as training samples are decreased. |
| Lee et al. [18] | depth map | 2 img. × 35 persons | feature extraction + nearest neighbour | Mean and variance of depth from windows around the nose are used as features. |
| Srivastava et al. [30] | depth map | 6 img. × 67 persons | subspace projection + SVM | Optimal subspace found with MCMC simulated annealing outperforms PCA, ICA and LDA. |
| Cartoux et al. [9] | profile | 3/4 img. × 5 persons | curvature based nearest neighbour | High quality images needed for principal curvatures. |
| Nagamine et al. [25] | vertical, horizontal, circular profiles | 10 img. × 16 persons | Euclidean nearest neighbour | Central vertical profile and circular sections touching eye corners are most informative. |
| Beumier and Acheroy [4] | vertical profiles | 3D RMA | area based nearest neighbour | Central profile and mean lateral profiles are fused by averaging. |
| Blanz and Vetter [6] | 2D + viewpoint parameters | CMU-PIE, FERET | analysis by synthesis | Using a generic 3D model, 2D viewpoint parameters are found. |
| Malassiotis and Strinzis [22] | texture + depth map | 110 img. × 20 persons | embedded HMM + fusion | Depth is better than colour, fusion is best. Pose correction is better than illumination correction. |
| Tsutsumi et al. [34] | texture + depth map | 35 img. × 24 persons | concatenated features + PCA | Adding perturbed versions of training images reduces sensitivity of PCA. |
| Beumier and Acheroy [5] | 2D and 3D vertical profiles | 3D RMA | nearest neighbour + fusion | Combination of 2D and 3D helps. Temporal fusion (snapshots taken in time) helps too. |
| Wang et al. [35] | point signature + Gabor features | 6 img. × 50 persons | concatenation after PCA + SVM | Omit 3D info from the eyes, eyebrows (missing elements) and mouth (expression sensitivity). |
| Bronstein et al. [8] | texture + depth map | 157 persons | concatenation after PCA + nearest neighbour | Bending-invariant canonical representation is robust to facial expressions. |
| Chang et al. [10] | texture + depth map | 278 training, 166 test | Mahalanobis based nearest neighbour + fusion | Pose correction through 3D is not better than rotation-corrected 2D. |
| Pan et al. [26] | profile + point cloud | 3D RMA | ICP + Hausdorff + fusion | Surface and profile combined usefully. Discard worst points (10 per cent) during registration. |
| Tsalakanidou et al. [32] | texture + depth map | XM2VTS | nearest neighbour + fusion | Fusion of frontal colour and depth images with colour faces from profile. |
| Tsalakanidou et al. [33] | texture + depth map | 60 img. × 50 persons | embedded HMM + fusion | Appropriately processed texture is more informative than warped depth maps. |
| Pan and Wu [27] | depth map + profile | 6 img. × 120 persons | (kernel) Fisherface + Eigenface + fusion | Sum rule is preferred to max, min, product, median and majority vote for fusion. |
| Papatheodorou and Rueckert [29] | dense mesh + texture | 12 img. × 62 persons | nearest neighbour + fusion | 3D helps 2D especially for profile views. Texture has small relative weight. |
| Lu and Jain [21] | mesh + texture | 598 test scans | ICP (3D), LDA (2D) + fusion | Difficult samples are evaluated by the combined scheme. |
| Gökberk et al. [14] | surface normals, profiles, depth map, point cloud | 3D RMA | PCA, LDA, nearest neighbour, rank based fusion | Best single classifier is depth-LDA. Combining it with surface normals and profiles with nonlinear rank sum increases accuracy. |
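As a purely illustrative aside (not code from the survey), the rank-based decision-level fusion rules discussed above, such as consensus voting and rank-sum style combination, can be sketched in a few lines. The classifier rank lists and identity labels below are hypothetical, and Borda-count scoring is only one common way to realise such a rule.

```python
# Illustrative rank-based decision-level fusion (plain Python; inputs are hypothetical).
def borda_fusion(rank_lists, top_n=None):
    """Each rank list orders candidate identities from best to worst for one classifier.
    An identity's score is the sum of its reversed ranks over all classifiers, i.e. a
    consensus-voting / rank-sum style combination rule."""
    scores = {}
    for ranks in rank_lists:
        n = len(ranks)
        for position, identity in enumerate(ranks):
            scores[identity] = scores.get(identity, 0) + (n - position)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_n] if top_n else fused

# Example: three classifiers rank five enrolled subjects.
lists = [["s3", "s1", "s4", "s2", "s5"],
         ["s3", "s4", "s1", "s5", "s2"],
         ["s1", "s3", "s2", "s4", "s5"]]
print(borda_fusion(lists))  # "s3" wins the consensus
```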
Analysis of the Robustness to Mis-alignment of Gabor-Feature-Based Discriminant Analysis Face Recognition

Evaluation of Gabor Features for Face Recognition from the Angle of Robustness to Mis-alignment

Chang Yizheng¹, Shan Shiguang², Gao Wen¹, Cao Bo², Yang Peng²

Abstract: In the field of face recognition, the Gabor-feature-based face representation is regarded as an ideal facial feature representation because of the high rank-one recognition rates it achieves in applications. This paper evaluates Gabor features for face recognition from a new angle, their robustness to mis-alignment, using a novel quantitative evaluation method that combines alignment precision with recognition rate. The experiments show that, compared with grey-level intensity features, Gabor features not only give high recognition rates under precise alignment, but are also more robust to the image variation caused by imprecise localization of facial features, which further supports the feasibility of the Gabor representation.

Keywords: face recognition, mis-alignment, Gabor wavelet features
Article number: 1002-8331-(2005)05-0056-04   Document code: A   CLC number: TP391

Fisherface [2] is one of the most successful face recognition techniques. Tests of Fisherface on many face databases show that the method achieves excellent recognition performance when the facial features are manually and precisely aligned. In practical application systems, however, the results are quite different. Most of the recognition errors turn out to come from a deviation of one or two pixels when locating the eye centres; that is, the performance drop is caused by imprecise feature alignment, and in practical applications a mis-alignment of one or two pixels is almost inevitable.

Here ∂ = (P, P*), where P is the feature point position given by the automatic feature localization algorithm.

In the system, the Gabor kernels are described as follows:

$$\psi_{u,v}(z) = \frac{\|k_{u,v}\|^2}{\sigma^2}\,\exp\!\left(-\frac{\|k_{u,v}\|^2\|z\|^2}{2\sigma^2}\right)\left[\exp(i\,k_{u,v}\cdot z) - \exp(-\sigma^2/2)\right]$$
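The Gabor kernel given above is typically turned into features by convolving the image with a bank of kernels at several orientations and scales and sampling the magnitude responses. The following sketch is illustrative only: the parameter values (sigma = 2π, k_max = π/2, spacing factor √2, 8 orientations and 5 scales) follow a common convention rather than this paper, and `points` is a hypothetical list of facial feature locations.

```python
# Illustrative Gabor feature extraction (numpy/scipy assumed; parameters are conventions,
# not values taken from the paper above).
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(u, v, size=33, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Kernel psi_{u,v}: orientation index u in 0..7, scale index v in 0..4."""
    k = (k_max / f ** v) * np.exp(1j * u * np.pi / 8)        # wave vector k_{u,v}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2, z2 = np.abs(k) ** 2, x ** 2 + y ** 2
    return (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2)) * \
           (np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2))

def gabor_features(img, points):
    """Magnitudes of the 40 kernel responses sampled at the given (row, col) points."""
    feats = []
    for v in range(5):
        for u in range(8):
            kern = gabor_kernel(u, v)
            resp = convolve2d(img, kern.real, mode="same") \
                 + 1j * convolve2d(img, kern.imag, mode="same")
            feats.extend(abs(resp[r, c]) for r, c in points)
    return np.array(feats)
```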
Applications of Facial Recognition in China (English essay)

Facial recognition technology, a cutting-edge biometric technology, has been experiencing rapid development and widespread application in China. Leveraging advances in artificial intelligence and machine learning, this technology has become an integral part of daily life, revolutionizing various industries and sectors.

In the realm of security, facial recognition has become a powerful tool in the hands of law enforcement agencies. Police forces across the country are using this technology to identify criminal suspects, track fugitives, and monitor public places for suspicious activities. This not only enhances the efficiency of law enforcement but also improves public safety.

The retail industry has also been revolutionized by facial recognition. Stores are now able to recognize their customers and provide personalized shopping experiences. This technology can identify a customer's preferences and buying habits, enabling retailers to offer targeted discounts and recommendations. Furthermore, it can also help in preventing shoplifting by identifying known thieves.

Financial institutions have also embraced facial recognition technology. Banks and other financial institutions are using this technology to authenticate customers and prevent fraud. By comparing a customer's face with their stored biometric data, these institutions can ensure that only the rightful owner can access their accounts.

In addition to these industries, facial recognition technology is also finding its way into our daily lives. Smartphones and other electronic devices now come with facial unlock features, making it easier and more convenient for users to unlock their devices. This technology is also being used in airports, railway stations, and other public places to facilitate fast and efficient check-in and identification processes.

Despite its widespread application, facial recognition technology in China has also raised concerns regarding privacy and ethical issues. There have been reports of misuse of this technology, such as the unauthorized collection and sale of biometric data. To address these concerns, the Chinese government has been working on regulating the use of facial recognition technology, ensuring that it is used ethically and within legal limits.

In conclusion, facial recognition technology has brought about significant changes in China, revolutionizing various industries and enhancing public safety. However, it is crucial to address the privacy and ethical issues associated with this technology to ensure its responsible and sustainable use.
Research on GAN-Based Face Generation and Its Detection Techniques

Wu Chunsheng, Tong Hui, Fan Xiaoming (Beijing Police College, Beijing 102202)

Abstract: With the breakthrough progress of AIGC, content generation technology has become a focus of public attention. This article concentrates on GAN-based face generation techniques and the corresponding detection methods. It first introduces the principle and basic architecture of GANs, and then describes the technical patterns in which GANs are used for face generation. In particular, it surveys the GAN-based technical frameworks for semantic face generation, covering both the development of semantic face generation and its GAN implementations.

1 Research Background

In recent years, AIGC (AI Generated Content) technology has become a new growth point of artificial intelligence. In 2023, AIGC opened an era of human–machine symbiosis; the success of ChatGPT in particular filled society with expectations for its application prospects. However, the use of AIGC also carries many risks. The main concern is that AI can generate fake content that never existed, or tamper with existing real content, so convincingly that the fakes pass for genuine, which weakens people's ability to judge false information. For example, in 2020 MIT used deepfake technology to produce and release a video of the U.S. president announcing that the moon landing programme had failed; the voice and facial expressions in the video closely reproduced Nixon's real characteristics and successfully deceived non-experts. Humans have judgement, but AI does not: what AI generates depends entirely on how the user guides it. If a user maliciously steers the technology, the risky content that AI generates, such as violence or extreme hatred, can create serious dangers. Research on the relevant generation techniques and their detection has therefore become a new topic in the field of information security.

This paper takes AIGC image generation techniques as its target and analyses existing face generation techniques built on Generative Adversarial Networks (GANs). While explaining the basic principle of GANs, it sets out the existing technical system and the main methods for portrait generation. It also surveys the current mainstream techniques for face forgery detection and, based on experimental results, analyses the problems of these detection techniques and the directions for further research.

2 Basic Principle of GANs

GANs were first proposed by Goodfellow et al. [1] in 2014.
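To make the adversarial principle concrete, the following is a minimal, hedged training-step sketch assuming PyTorch; the network sizes, learning rates and data dimensions are placeholders and do not correspond to any specific face-generation model discussed in the article.

```python
# Minimal GAN training sketch (PyTorch assumed; sizes and hyper-parameters illustrative).
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 64 * 64              # e.g. 64x64 grey-level images, flattened

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())       # generator G(z)
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())           # discriminator D(x)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    b = real_batch.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # 1) Discriminator: push D(x) towards 1 for real images, 0 for generated ones.
    fake = G(torch.randn(b, latent_dim)).detach()
    loss_d = bce(D(real_batch), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator: fool the discriminator, i.e. push D(G(z)) towards 1.
    loss_g = bce(D(G(torch.randn(b, latent_dim))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```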
New Biometric Approaches for Improved Person Identification Using Facial Detection (IJIGSP-V4-N8-6)
I.J. Image, Graphics and Signal Processing, 2012, 8, 43-49Published Online August 2012 in MECS (/)DOI: 10.5815/ijigsp.2012.08.06New Biometric Approaches for Improved Person Identification Using Facial Detection1 V.K. NARENDIRA KUMAR &2 Dr. B. SRINIVASAN1 Assistant Professor, Department of Information Technology,2Associate Professor, PG & Research Department of Computer Science,Gobi Arts & Science College (Autonomous),Gobichettipalayam – 638 453, Erode District, Tamil Nadu, India.Email ID: 1kumarmcagobi@, 2 srinivasan_gasc@Abstract—Biometrics is measurable characteristics specific to an individual. Face detection has diverse applications especially as an identification solution which can meet the crying needs in security areas. While traditionally 2D images of faces have been used, 3D scans that contain both 3D data and registered color are becoming easier to acquire. Before 3D face images can be used to identify an individual, they require some form of initial alignment information, typically based on facial feature locations. We follow this by a discussion of the algorithms performance when constrained to frontal images and an analysis of its performance on a more complex dataset with significant head pose variation using 3D face data for detection provides a promising route to improved performance.Index Terms—Biometrics, Face, Face Sensor, Feature Extraction, Template MatchingI. I NTRODUCTIONBiometrics is measurable characteristics of an individual used to identify him or her. If a biometric identification system had been in place prior to September 11, the tragedy might have been avoided as several of the terrorists involved were already on government watch lists of suspected terrorists. The need to be able to automate the identification of individuals will become increasingly important in the coming years; watch lists are increasing in size and it is no longer realistic to expect human immigration agents to be able to keep up to date with the large number of people on these lists. This supports the need for the development of working biometrics. Biometric systems can function in verification or identification modes depending on their intended use. In a verification task, a person presents an identity claim to the system and the system only needs to verify the claim. In an identification task, an unknown individual presents himself or herself to the system, and it must identify them. In general, there are three approaches to authentication. In order of least secure and least convenient to most secure and most convenient, they are:1.Something you have - card, token, key.2.Something you know- PIN, password.3.Something you are - a biometric [1].The human face plays an irreplaceable role in biometrics technology due to some of its unique characteristics. First, most cameras are non-invasive; therefore face verification systems are one of the most publicly acceptable verification technologies in use. Another advantage is that face detection systems can work mostly without the cooperation of the user concerned, which is therefore very convenient for the general users.A. Face DetectionThe first task needed after the capture of an image is an initial alignment. The features commonly used to identify the orientation and location of the face is the eyes, nose, and mouth. This approach is the standard used on most facial biometric algorithms. After this stage, processing varies based on whether the application is identification or verification. 
Identification is the process of determining who someone is. Verification only needs to confirm that a subject is the person they claim to be [9]. In identification, the system compares the captured image (probe) to the gallery. The type of comparisons made depends both on the biometric used and on the matching algorithm in question. After the comparison, the system returns a rank ordering of identities.B. Face VerificationThe face verification compares features from the captured image (probe) to those belonging to the subject of the identity claim. After the comparison, the system returns a confidence score for verification. If this score is above a certain threshold, the system verifies the individual s identity. By varying this threshold, the tradeoff between the number of false accepts (percentage of the time the system will wrongly verify a different person) and false rejections (percentage of time it will reject the correct person) may be adjusted to balance ease of use with security.II. F ACIAL R ECOGNITIONFacial recognition records the spatial geometry of distinguishing features of the face. Different vendors usedifferent methods of facial recognition, however, all focus on measures of key features of the face. Because a person’s face can be captured by a camera from some distance away, facial recognition has a clandestine or covert capability (i.e. the subject does not necessarily know he has been observed). For this reason, facial recognition has been used in projects to identify card counters or other undesirables in casinos, shoplifters in stores, criminals and terrorists in urban areas [2]. A. 2D Face RecognitionMultiple local regions patches to perform 2D face recognition in the presence of expressions and occlusion. The motivation for this is that different facial expressions influence different parts of the face more than others. Algorithm addresses this belief by weighting areas that are less affected by the current displayed emotion more heavily. Reported results show that up to one-sixth of the face can be occluded without a loss in recognition, and one-third of the face can be occluded with a minimal loss [3].B. 3D Face RecognitionThe ICP algorithm to align 3D meshes containing face geometry. Their algorithm is based on four main steps: feature point detection in the probe images, rough alignment of probe to gallery by moving the probe centric to match, iterative adjustment based on the closest point matching (ICP), and using known points (i.e. the eyes, tip of the nose and the mouth) to verify the match. Once thisprocess is run, the ICP algorithm reports an average root mean-square distance that represents the separation between the gallery and probe meshes (i.e. the quality of the match) [3]. After running this process against their database of images with one gallery image and probe image per subject, they achieved a 95.6% rank one recognition rate with 108 images.C. Biometric System ModulesEnrollment Unit: The enrollment module registers individuals into the biometric system database. During this phase, a biometric reader scans the individual’s biometric characteristic to produce its digital representation (see figure 1).Feature Extraction Unit: This module processes the input sample to generate a compact representation called the template, which is then stored in a central database or a smartcard issued to the individual.Matching Unit: This module compares the current input with the template. 
If the system performs identity verification, it compares the new characteristics to the user’s master template and produces a score or match value (one to one matching). A system performing identification matches the new characteristics against the master templates of many users resulting in multiple match values (one too many matching).Decision Maker: This module accepts or rejects the user based on a security threshold and matching score [1].Figure 1: Block diagrams of Enrollment, Verification, and Identification tasks are shown using the four main modules of a biometric system.User InterfaceVERIFICATIONUser InterfaceIDENTIFICATIONIII. F ACE S ENSORFace sensor using for both environment and the object to be scanned affect scanner accuracy and impose limitations on scanning. The material, geometry, or many other factors in the scanned object can cause decreases in accuracy or prevent successful scanning entirely. For example, secular materials are difficult to scan with lasers and discontinuous surfaces can yield scanning errors around edges. For this reason, it is difficult to define a single measurement to characterize accuracy. By focusing on a specific type of object (in our case, faces), we can provide a more detailed analysis for the scanner than is provided by most of the manufacturers. Further, faces themselves are rather complex with nontrivial geometry. As a result, they are a good test object to consider for different applications as well.When attempting to determine the best scanner to use in a face recognition application, there are several factors to consider. There are accuracy levels in a scanner that are not necessary because faces can change more than that due to aging, weight, and expression. However, we will not explore that limit here since the majority of the scanners have accuracies in approximately tenths of a millimeter [6].A. Scanner AssessmentIn order to determine which of these devices is best for our application we also analyzed the accuracy of the points provided. While some of the vendors do make accuracy claims, attempting to compare those claims for our purposes may not be possible since most companies tend to test on the accuracy of sampling planar objects; this is not helpful here since faces are not planar. When examining the accuracy we needed a series of reference faces to use for the comparisons. However, using human faces is problematic because the human face is deformable and hence cannot serve as ground truth. No matter how much a person may try to keep the same face from one minute to the next, it can change significantly. Therefore, we constructed 3D face masks from real subjects in order to test the scanners. Since we were manufacturing the synthetic faces, we knew the ground truth values for them. In order to get the most realistic accuracy values possible, we produced ten faces consisting of five male and five female subjects.B. Konica Minolta Vivid 910We ultimately decided to use the Konica Minolta Vivid 910 for all of our data collection. It provides a significantly higher level of detail than any of the other scanners we examined while still maintaining an accuracy level essentially tied with that of the Qlonerator. While the Qlonerator does provide better ear to ear facial coverage, it requires a significant offset between scanning pods which may not always be available. The Minolta provides data in a grid structure. 
The points in focus provide valid range data, while points out of focal range do not provide range data allowing for easy segmentation of the foreground and background. We show an example of the difference between valid and invalid range pixels where red are invalid range pixels and blue are validrange pixels.Figure 2: Vivid 910 by Konica MinoltaThe Minolta provides a 640 X 480 grid of points with color and registered range. The scanner is eye safe provided the subject does not circumvent the built in safety features. As mentioned above the scanner works well in a laboratory setting but will not work in direct sunlight or unusually bright light. While this scanner may not be the universal scanner ideal for a deployment situation, it captures the best data for our needs.IV. F ACIAL F EATURE D ETECTIONWe approach for locating the nose tip in 3D facial data.A hierarchical filtering scheme combining two ―rules‖ to extract the points that distinguish the nose from other salient points. The first rule states that the nose tip will be the highest point in a certain direction that is determined by finding the normal’s on the face [4]. This rule eliminates many points, leaving a limited number of candidate points (the chin, the forehead, the cheeks, hair, etc.). The next rule attempts to model the cap-like shape on the nose tip itself. Each candidate point is characterized by a feature vector containing the mean and variance of its neighboring points. The vectors are projected into mean-variance space and a Support Vector Machine (SVM) is used to determine the boundary between nose tips and non-nose tips. The authors note that this rule also is challenged by wrinkles, clothing, or other cap-like areas on the face. The authors use three databases to test their algorithm. The largest database, the 3D Pose and Expression Face Models (3DPEF), contains 300 images of 30 subjects with small amounts of changes in pitch, yaw, and roll and a 99.3% nose detection rate is reported.A. Knowledge Based MethodsThe classical work in this category is the multiple-rule based method. The main problem with knowledge-based methods is the difficulty of transforming human knowledge into rules described in computer languages, especially for 3-D rotated faces in different poses.B. Template MatchingWe proposed a mosaic Gravity-Center Templatematching method. It can be observed that the maincomponents of an upright human face, such as doubleeyebrows, double eyes, nose bottom and mouth; almostall orient in a horizontal direction and that the verticalscale of the features are approximately equal.C. Invariant Feature MethodsThere are many works using various invariant featuresincluding gray values, edges, textures, color or acombination of these features. Among them, color is mostwidely used for face detection [5]. However, colorinformation is not enough to correctly locate faces, although non-upright and non-frontal faces can be easilydetected as candidates. It is therefore usually combinedwith other features such as edges or textures.D. Object and Face IndexingFace indexing method to reduce the search space of adatabase by placing images in different bins based on the subject’s hand geometry and written signature. First,feature vectors are found by taking various measurementson each type of image. Once the feature vectors arecalculated, we use the k-means clustering algorithm tocluster images [8]. We uses database representing 50users, each having 5 training images and 5 testing imagesfor a total of 500 images. 
They are able to reduce thesearch space to 5% of the original size while not affectingthe false reject rate (FRR). Using a point matchingsystem, their technique calculates the similarity betweentwo objects and a nearest neighbor method is used todetermine the closest object prototype.V. S YSTEM D ESIGNSystem design is a transition from a user-orienteddocument to a document oriented to programmers ordatabase personnel. It goes through logical and physicaldesign with emphasis on the following:Preparing input/output specifications.Preparing security and control specifications.Specifying the implementation plan.Preparing a logical design walkthrough before implementation.As a biometric, facial recognition is a form of computer vision that uses faces to attempt to identify a person or verify a person’s claimed identity. Regardless of specific method used, facial recognition is accomplished in a five step process [7].1. First, an image of the face is acquired. This acquisitioncan be accomplished by digitally scanning an existingphotograph or by using an electro-optical camera toacquire a live picture of a subject. As video is a rapidsequence of individual still images, it can also be used asa source of facial images.2. Second, software is employed to detect the location ofany faces in the acquired image. This task is difficult, and often generalized patterns of what a face ―looks like‖ (two eyes and a mouth set in an oval shape) are employed to pick out the faces.3. Third, once the facial detection software has targeted a face, it can be analyzed. As noted in slide three, facial recognition analyzes the spatial geometry of distinguishing features of the face. Template generation is the result of the feature extraction process. A template is a reduced set of data that represents the unique features of an enrollee’s face.4. The fourth step is to compare the template generated in step three with those in a database of known faces. In an identification application, this process yields scores that indicate how closely the generated template matches each of those in the database. In a verification application, the generated template is only compared with one template in the database – that of the claimed identity.5. The final step is determining whether any scores produced in step four are high enough to declare a match. The rules governing the declaration of a match are often configurable by the end user, so that he or she can determine how the facial recognition system should behave based on security and operational considerations.VI.I MPLEMENTATION OF S YSTEMIn order to implement this new biometric approaches for improved person identification using facial detection efficiently, program is used. This program could speed up the development of this system because it has facilities to draw forms and to add library easily [6].A. Face Detection AlgorithmAutomatic facial feature detection algorithms tested and designed for algorithms in 2D or 3D image. Facial feature detection algorithms operating on 2D color and grayscale images exist and are able to identify the eyes and mouth somewhat reliably. Examples of current methods for identifying facial features use Eigen features, deformable templates, Gabor wavelet filters, colormanipulation methods, Edge Holistic, graph matching, etc.B. Feature ExtractionEigen Face/ Eigen Feature methods on faces utilize a mathematical method knows as Principal Component Analysis to simplify the representation of more complex data based upon a training set. 
This simpler representation is a vector, which is typically then used to search for the nearest neighbor vector in a gallery to identify who the person is most likely to be. Performance for this varies widely and this type of biometric can be used in different forms for almost any distinguishing feature including face as well as being applicable for different types of measurements including 2D, 3D, or infrared images. This method is not limited to biometrics either and can be applied to generic object identification.C. DatasetsWe proposed a method using face curvature to identify facial features. Similar to, this shape based approach identifies facial features, but does identify more facial features than. The additional facial features are needed because to use these facial features not just for an alignment but uses information about these features (such as eye width) as recognition metric as well. Her method worked well, however it was tested on an extremely small database of 24 range scans which is too small to accurately assess performance on much larger real world datasets.D. Data AcquisitionThe Minolta scanner uses triangulation with a laser stripe projector to build a 3D model of the face from a sequence of profiles. Both color (r, g, b) and 3D location (x, y, z) coordinates are captured, but not perfectly simultaneously, and the laser stripe requires a few seconds to cross the face. The resolution on the Minolta camera is 640x480, yielding a maximum of 300,000 possible sample points. The number of 3D points on a frontal image of the face taken by the Minolta camera is typically around 112,000, and depends on the lens used as well as standoff. Additional vertices arise from hair, clothing, and background objects [7]. Example images from this sensor can be seen in Figure 3.(a)(b)(c) (d)Figure 3: Examples of images captured with the Vivid 910 by Minolta (a and c) 3D shape data for two different subjects (b and d) associated2D color texture information.VII. T ESTING B IOMETRIC S YSTEMAll biometric tests are accuracy based. A summary of the more common of these tests is described below: Acceptance Testing:The process of determining whether an implementation satisfies acceptance criteria and enables the user to determine whether or not to accept the implementation. This includes the planning and execution of several kinds of tests (e.q., functionality, quality, and speed performance testing) that demonstrate that the implementation satisfies the user requirements. Interoperability Testing:The testing of one implementation (product, system) with another to establish that they can work together properly. Performance Testing:Measures the performance characteristics of an Implementation Under Test (IUT) such as its throughput, responsiveness, etc., under various conditions.Robustness Testing:The process of determining how well an implementation processes data which contains errors.VIII. E XPERIMENTAL R ESULTSThe samples used for evaluation of the framework were organized as one gallery and three probe databases. The gallery database has 30 neutral faces, one for each subject, recorded in the first data acquisition session. 
Three probe sets are formed as follows:Probe set 1: 30 neutral faces acquired in the second session.Probe set 2: 30 smiling faces acquired in the second session.Probe set 3: 60 faces, (probe set 1 and probe set 2 together).The validation experiments were organized as follows: Experiment 1: Testing the neutral and smiling recognition modules separately1.1 Neutral face recognition: probe set 1. (Neutral facerecognition module used.)1.2 Neutral face recognition: probe set2. (Neutral facerecognition module used.)1.3 Smiling face recognition: probe set2. (Smiling facerecognition module used.)Experiment 2: Testing a practical scenario2.1 Neutral face recognition module used alone: probe set3 is used2.2 Integrated expression and face recognition: probe set3 is used. (Linear discriminate classifier is used forexpression recognition.)2.3 Integrated expression and face recognition: probe set3 is used. (Support vector machine is used forexpression recognition.)Experiment 1 tested one of the basic assumptions behind the framework proposed. It was expected that a system designed to recognize neutral faces would be successful with faces that are indeed neutral, but it may achieve much less success when dealing with faces displaying an expression, (e.g., smiling faces). These expectations were confirmed by the high rank-one recognition (97%) achieved by the neutral face recognition module for neutral faces (probe set 1) in sub-experiment 1, and the much lower rank-one recognition rate (57%) achieved by this same module for smiling faces (probe set 2), in sub-experiment 2. In contrast, the third sub-experiment confirmed that a module that has been specifically developed for the identification of individuals from smiling probe images (probe set 2) is clearly more successful in this task (80% rank-one recognition).Experiment 2 simulated a more realistic scenario, in which the expression in the subject is not controlled. Accordingly, for all three sub-experiments in Experiment 2 we used the comprehensive probe set 3, including one neutral range image and one smiling range image from each of the subjects. In the first sub-experiment we observe the kind of results that could be expected when these 60 probe images are processed by a ―standard‖ neutral face recognition module alone. It was observed that with a mix of neutral and smiling faces this simple system only achieves 77% rank-one face recognition. This result highlights the need to account for the possibility of a non-neutral expression in 3D face recognition systems. On the other hand, in sub-experiments two and three we apply the same mixed set of images (Probe set 3) through the complete process depicted in our proposed framework. The right-most four columns whether using the linear discriminate analysis classifier or the support vector machine for the initial expression sorting, the rank-one face recognition levels achieved by the overall system are higher (87%, 85%).IX. C ONCLUSIONSThe work represents an attempt to acknowledge and account for the presence on face detection, towards their improved identification. We have focused on models of identify, features extraction and classification of the face authentication problem. Major challenges and their corresponding solutions are discussed. Some commercial systems available in the industry market are introduced briefly along with the face recognition. Classification is a step which must be complemented with feature extraction in order to demonstrate detection accuracy and performances. 
Its use has been successful with little to no exception, and face detection will prove to be a widely used security measure in the future. In the meantime, we, as a society, have time to decide how we want to use this new technology. By implementing reasonable safeguards, we can harness the power of the technology to maximize its public safety benefits while minimizing the intrusion on individual privacy.R EFERENCES[1] A.K.Jian, R.Bolle, S.Pankanti(Eds), ―Biometrics-personal identification in networked society‖ 1999, Norwell, MA: Kluwer.[2] C.Hesher, A.Srivastava, G.Erlebacher, ―A noveltechnique for face recognition using range images‖ in the Proceedings of Seventh International Symposium on Signal Processing and Its Application, 2003. [3] K. Bowyer, K.Chang, P. Flynn, ―A survey ofapproaches to 3D and multi-modal 3D+ 2D face recogni tion‖ in IEEE International Conference on Pattern Recognition, 2004: pages 358-361.[4] P.Ekman, W. Friesen, ―Constants across cultures inthe face and emotion,‖ in Jounal of Personality and Social Psychology, 1971. 17(2): pages 124-129. [5] C.Li, A.Barreto, ―Prof ile-Based 3D Face Registrationand Recognition‖. in Lecture Notes on Computer Science, 2005. 3506: pages 484-494.[6] C.Li, A.Barreto, J.Zhai, and C.Chin, ―Exploring FaceRecognition Using 3D Profiles and Contours‖ in the Proceedings of IEEE Southeast on 2005: pages 576-579.[7] C. Garcia and M. Delakis, ―Convolutional face finder:A neural architecture for fast and robust facedetection,‖ IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1408–1423, Nov. 2004.[8] R. Wang, J. Chen, S. Yan, S. Shan, X. Chen, and W.Gao, ―Face detection based on the manifold,‖ in Audio- and Video-Based Biometric Person Authentication. Berlin, Germany: Springer-Verlag, 2005, pp. 208–218.[9] R. Osadchy, M. Miller, and Y. LeCun, ―Synergisticface detection and pose estimation with energy-based model,‖ in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2005,pp. 1017–1024.AUTHOR PROFILEMr. V.K. NARENDIRA KUMARM.C.A., M.Phil., Assistant Professor,Department of InformationTechnology, Gobi Arts & ScienceCollege (Autonomous),Gobichettipalayam –638 453, ErodeDistrict, Tamil Nadu, India. Hereceived his M.Phil Degree inComputer Science from Bharathiar University in 2007.He has authored or co-authored more than 59 technicalpapers and conference presentations. He is an editorial board member for several scientific journals. His research interests are focused on Internet Security,Biometrics, Advanced Networking, Visual Human-Computer Interaction, and Multiple BiometricsTechnologies.Dr. B. SRINIVASAN M.C.A.,M.Phil., M.B.A., Ph.D., AssociateProfessor, PG & Research Departmentof Computer Science, Gobi Arts &Science College (Autonomous),Gobichettipalayam –638 453, ErodeDistrict, Tamil Nadu, India. Hereceived his Ph.D. Degree inComputer Science from Vinayaka Missions University in11.11.2010. He has authored or co-authored more than 70technical papers and conference presentations. He is areviewer for several scientific e-journals. His research interests include automated biometrics, computer networking, Internet security, and performance evaluation.。
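As a concrete illustration of the eigenface (PCA) feature extraction and nearest-neighbour matching pipeline outlined in Sections VI and VII above, the following numpy sketch shows the general technique. It is not the authors' implementation; the gallery matrix, identity labels and number of components are hypothetical inputs.

```python
# Hedged eigenface + nearest-neighbour sketch (numpy; inputs are hypothetical).
import numpy as np

def train_pca(gallery, n_components=50):
    """gallery: (n_images, n_pixels) matrix of vectorised, aligned face images."""
    mean = gallery.mean(axis=0)
    centred = gallery - mean
    # Eigenvectors of the small n x n matrix, mapped back to pixel space.
    evals, evecs = np.linalg.eigh(centred @ centred.T)
    order = np.argsort(evals)[::-1][:n_components]
    basis = centred.T @ evecs[:, order]
    basis /= np.linalg.norm(basis, axis=0)            # unit-length eigenfaces
    return mean, basis

def project(faces, mean, basis):
    return (faces - mean) @ basis                      # PCA feature vectors

def identify(probe, gallery_feats, gallery_ids, mean, basis):
    d = np.linalg.norm(gallery_feats - project(probe[None, :], mean, basis), axis=1)
    return gallery_ids[int(np.argmin(d))]              # nearest-neighbour identity
```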
Face Recognition with Local Binary Patterns

Timo Ahonen, Abdenour Hadid, and Matti Pietikäinen
Machine Vision Group, Infotech Oulu, PO Box 4500, FIN-90014 University of Oulu, Finland
{tahonen,hadid,mkp}@ee.oulu.fi, http://www.ee.oulu.fi/mvg/

Abstract. In this work, we present a novel approach to face recognition which considers both shape and texture information to represent face images. The face area is first divided into small regions from which Local Binary Pattern (LBP) histograms are extracted and concatenated into a single, spatially enhanced feature histogram efficiently representing the face image. The recognition is performed using a nearest neighbour classifier in the computed feature space with Chi square as a dissimilarity measure. Extensive experiments clearly show the superiority of the proposed scheme over all considered methods (PCA, Bayesian Intra/extrapersonal Classifier and Elastic Bunch Graph Matching) on FERET tests which include testing the robustness of the method against different facial expressions, lighting and aging of the subjects. In addition to its efficiency, the simplicity of the proposed method allows for very fast feature extraction.

1 Introduction

The availability of numerous commercial face recognition systems [1] attests to the significant progress achieved in the research field [2]. Despite these achievements, face recognition continues to be an active topic in computer vision research. This is due to the fact that current systems perform well under relatively controlled environments but tend to suffer when variations in different factors (such as pose, illumination etc.) are present. Therefore, the goal of the ongoing research is to increase the robustness of the systems against different factors. Ideally, we aim to develop a face recognition system which mimics the remarkable capabilities of human visual perception. Before attempting to reach such a goal, one needs to continuously learn the strengths and weaknesses of the proposed techniques in order to determine new directions for future improvements. To facilitate this task, the FERET database and evaluation methodology have been created [3]. The main goal of FERET is to compare different face recognition algorithms on a common and large database and evaluate their performance against different factors such as facial expression, illumination changes and aging (time between the acquisition date of the training image and the image presented to the algorithm).

Among the major approaches developed for face recognition are Principal Component Analysis (PCA) [4], Linear Discriminant Analysis (LDA) [5] and Elastic Bunch Graph Matching (EBGM) [6]. PCA is commonly referred to as the "eigenface" method. It computes a reduced set of orthogonal basis vectors or eigenfaces of the training face images. A new face image can be approximated by a weighted sum of these eigenfaces. PCA provides an optimal linear transformation from the original image space to an orthogonal eigenspace with reduced dimensionality in the sense of least mean squared reconstruction error. LDA seeks to find a linear transformation by maximising the between-class variance and minimising the within-class variance. In the EBGM algorithm, faces are represented as graphs, with nodes positioned at fiducial points and edges labelled with distance vectors. Each node contains a set of Gabor wavelet coefficients, known as a jet. Thus, the geometry of the face is encoded by the
edges while the grey value distribution (texture) is encoded by the jets. The identification of a new face consists of determining, among the constructed graphs, the one which maximises the graph similarity function. Another proposed approach to face recognition is the Bayesian Intra/extrapersonal Classifier (BIC) [7], which uses Bayesian decision theory to divide the difference vectors between pairs of face images into two classes: one representing intrapersonal differences (i.e. differences in a pair of images representing the same person) and the other extrapersonal differences.

In this work, we introduce a new approach for face recognition which considers both shape and texture information to represent the face images. As opposed to the EBGM approach, a straightforward extraction of the face feature vector (histogram) is adopted in our algorithm. The face image is first divided into small regions from which the Local Binary Pattern (LBP) features [8, 9] are extracted and concatenated into a single feature histogram efficiently representing the face image. The textures of the facial regions are locally encoded by the LBP patterns while the whole shape of the face is recovered by the construction of the face feature histogram. The idea behind using the LBP features is that the face images can be seen as a composition of micro-patterns which are invariant with respect to monotonic grey scale transformations. Combining these micro-patterns, a global description of the face image is obtained.

2 Face Description with Local Binary Patterns

The original LBP operator, introduced by Ojala et al. [9], is a powerful means of texture description. The operator labels the pixels of an image by thresholding the 3x3-neighbourhood of each pixel with the center value and considering the result as a binary number. Then the histogram of the labels can be used as a texture descriptor. See Figure 1 for an illustration of the basic LBP operator. Later the operator was extended to use neighbourhoods of different sizes [8].
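As an illustration of the basic operator (not code from the paper), the following numpy sketch labels every non-border pixel with its 8-bit LBP code; the clockwise bit ordering of the neighbours is a convention choice.

```python
# Minimal sketch of the basic 3x3 LBP operator (numpy assumed).
import numpy as np

def basic_lbp(image):
    """Label every non-border pixel with its 8-bit LBP code."""
    img = np.asarray(image, dtype=np.int32)
    center = img[1:-1, 1:-1]
    # 8 neighbours, ordered clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes += (neighbour >= center).astype(np.int32) << bit
    return codes   # values 0..255; their histogram is the texture descriptor
```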
Using circular neighbourhoods and bilinearly interpolating the pixel values allows any radius and number of pixels in the neighbourhood. For neighbourhoods we will use the notation (P, R), which means P sampling points on a circle of radius R. See Figure 2 for an example of the circular (8, 2) neighbourhood.

Fig. 1. The basic LBP operator (example: binary 11010011, decimal 211).
Fig. 2. The circular (8, 2) neighbourhood. The pixel values are bilinearly interpolated whenever the sampling point is not in the center of a pixel.

Another extension to the original operator uses so-called uniform patterns [8]. A Local Binary Pattern is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular. For example, 00000000, 00011110 and 10000011 are uniform patterns. Ojala et al. noticed that in their experiments with texture images, uniform patterns account for a bit less than 90% of all patterns when using the (8, 1) neighbourhood and for around 70% in the (16, 2) neighbourhood.

We use the following notation for the LBP operator: $LBP^{u2}_{P,R}$. The subscript represents using the operator in a (P, R) neighbourhood. The superscript u2 stands for using only uniform patterns and labelling all remaining patterns with a single label. A histogram of the labeled image $f_l(x,y)$ can be defined as

$$H_i = \sum_{x,y} I\{f_l(x,y) = i\}, \qquad i = 0, \dots, n-1, \tag{1}$$

in which n is the number of different labels produced by the LBP operator and

$$I\{A\} = \begin{cases} 1, & A \text{ is true} \\ 0, & A \text{ is false.} \end{cases}$$

This histogram contains information about the distribution of the local micropatterns, such as edges, spots and flat areas, over the whole image. For efficient face representation, one should also retain spatial information. For this purpose, the image is divided into regions $R_0, R_1, \dots, R_{m-1}$ (see Figure 5(a)) and the spatially enhanced histogram is defined as

$$H_{i,j} = \sum_{x,y} I\{f_l(x,y) = i\}\, I\{(x,y) \in R_j\}, \qquad i = 0, \dots, n-1,\; j = 0, \dots, m-1. \tag{2}$$

In this histogram, we effectively have a description of the face on three different levels of locality: the labels for the histogram contain information about the patterns on a pixel level, the labels are summed over a small region to produce information on a regional level, and the regional histograms are concatenated to build a global description of the face.

From the pattern classification point of view, a usual problem in face recognition is having a plethora of classes and only a few, possibly only one, training sample(s) per class. For this reason, more sophisticated classifiers are not needed and a nearest-neighbour classifier is used. Several possible dissimilarity measures have been proposed for histograms:

- Histogram intersection: $D(S, M) = \sum_i \min(S_i, M_i)$  (3)
- Log-likelihood statistic: $L(S, M) = -\sum_i S_i \log M_i$  (4)
- Chi square statistic ($\chi^2$): $\chi^2(S, M) = \sum_i \dfrac{(S_i - M_i)^2}{S_i + M_i}$  (5)

All of these measures can be extended to the spatially enhanced histogram by simply summing over i and j.

When the image has been divided into regions, it can be expected that some of the regions contain more useful information than others in terms of distinguishing between people. For example, eyes seem to be an important cue in human face recognition [2, 10]. To take advantage of this, a weight can be set for each region based on the importance of the information it contains. For example, the weighted $\chi^2$ statistic becomes

$$\chi^2_w(S, M) = \sum_{i,j} w_j \frac{(S_{i,j} - M_{i,j})^2}{S_{i,j} + M_{i,j}}, \tag{6}$$

in which $w_j$ is the weight for region j.
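The spatially enhanced histogram of Eq. (2) and the weighted chi-square distance of Eq. (6) can be sketched as follows. This is illustrative numpy code, not the authors' implementation; the 7x7 grid matches the 49-window division used later in the paper, while the label count and region weights are left as parameters.

```python
# Sketch of the spatially enhanced histogram (Eq. 2) and weighted chi-square (Eq. 6).
import numpy as np

def spatial_histogram(label_image, n_labels, grid=(7, 7)):
    """One LBP-label histogram per window; regions R_0..R_{m-1} on a k*k grid."""
    hists = []
    for rows in np.array_split(label_image, grid[0], axis=0):
        for region in np.array_split(rows, grid[1], axis=1):
            h, _ = np.histogram(region, bins=np.arange(n_labels + 1))
            hists.append(h)
    return np.array(hists, dtype=float)   # row j = region, column i = label: H_{i,j}

def weighted_chi_square(S, M, weights):
    """chi^2_w(S, M) = sum_j w_j * sum_i (S_ij - M_ij)^2 / (S_ij + M_ij)."""
    S, M = np.asarray(S, dtype=float), np.asarray(M, dtype=float)
    num, den = (S - M) ** 2, S + M
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.dot(weights, terms.sum(axis=1)))
```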
3 Experimental Design

The CSU Face Identification Evaluation System [11] was utilised to test the performance of the proposed algorithm. The system follows the procedure of the FERET test for semi-automatic face recognition algorithms [12] with slight modifications. The system uses the full-frontal face images from the FERET database and works as follows (see Figure 3):

1. The system preprocesses the images. The images are registered using eye coordinates and cropped with an elliptical mask to exclude non-face area from the image. After this, the grey histogram over the non-masked area is equalised.
2. If needed, the algorithm is trained using a subset of the images.
3. The preprocessed images are fed into the experimental algorithm, which outputs a distance matrix containing the distance between each pair of images.
4. Using the distance matrix and different settings for gallery and probe image sets, the system calculates rank curves for the system. These can be calculated for prespecified gallery and probe image sets or by choosing random permutations of one large set as probe and gallery sets and calculating the average performance. The advantage of the prior method is that it is easy to measure the performance of the algorithm under certain challenges (e.g. different lighting conditions), whereas the latter is more reliable statistically.

Fig. 3. The parts of the CSU face recognition system.

The CSU system uses the same gallery and probe image sets that were used in the original FERET test. Each set contains at most one image per person. These sets are:

- fa set, used as a gallery set, contains frontal images of 1196 people.
- fb set (1195 images). The subjects were asked for an alternative facial expression than in the fa photograph.
- fc set (194 images). The photos were taken under different lighting conditions.
- dup I set (722 images). The photos were taken later in time.
- dup II set (234 images). This is a subset of the dup I set containing those images that were taken at least a year after the corresponding gallery image.

In this paper, we use two statistics produced by the permutation tool: the mean recognition rate with a 95% confidence interval and the probability of one algorithm outperforming another [13]. The image list used by the tool (list640.srt in the CSU Face Identification Evaluation System package) contains 4 images of each of the 160 subjects. One image of every subject is selected to the gallery set and another image to the probe set on each permutation. The number of permutations is 10000.

The CSU system comes with implementations of the PCA, LDA, Bayesian intra/extrapersonal (BIC) and Elastic Bunch Graph Matching (EBGM) face recognition algorithms. We include the results obtained with PCA, BIC and EBGM here for comparison. Two decision rules can be used with the BIC classifier: Maximum A Posteriori (MAP) or Maximum Likelihood (ML); we include here the results obtained with MAP.

There are some parameters that can be chosen to optimise the performance of the proposed algorithm. The first one is choosing the LBP operator. Choosing an operator that produces a large amount of different labels makes the histogram long and thus calculating the distance matrix gets slow. Using a small number of labels makes the feature vector shorter but also means losing more information. A small radius of the operator makes the information encoded in the histogram more local. The number of labels for a neighbourhood of 8 pixels is 256 for standard LBP and 59 for $LBP^{u2}$. For the 16-neighbourhood the numbers are 65536 and 243, respectively. The usage of uniform patterns is motivated by the fact that most patterns in facial images are uniform: we found out that in the preprocessed FERET images, 79.3% of all the patterns produced by the $LBP_{16,2}$ operator are uniform.
Another parameter is the division of the images into regions R_0, ..., R_{m-1}. The length of the feature vector becomes B = m B_r, in which m is the number of regions and B_r is the length of one LBP histogram. A large number of small regions produces long feature vectors causing high memory consumption and slow classification, whereas using large regions causes more spatial information to be lost. We chose to divide the image with a grid into k x k equally sized rectangular regions (windows). See Figure 5(a) for an example of a preprocessed facial image divided into 49 windows.

4 Results

To assess the performance of the three proposed distance measures, we chose to use two different LBP operators in windows of varying size. We calculated the distance matrices for each of the different settings and used the permutation tool to calculate the probabilities of the measures outperforming each other. The results are in Table 1.

Table 1. The performance of the histogram intersection, log-likelihood and χ² dissimilarity measures using different window sizes and LBP operators.

Operator           Window size   P(HI > LL)   P(χ² > HI)   P(χ² > LL)
LBP^{u2}_{8,1}     18x21         1.000        0.714        1.000
LBP^{u2}_{8,1}     21x25         1.000        0.609        1.000
LBP^{u2}_{8,1}     26x30         0.309        0.806        0.587
LBP^{u2}_{16,2}    18x21         1.000        0.850        1.000
LBP^{u2}_{16,2}    21x25         1.000        0.874        1.000
LBP^{u2}_{16,2}    26x30         1.000        0.918        1.000
LBP^{u2}_{16,2}    32x37         1.000        0.933        1.000
LBP^{u2}_{16,2}    43x50         0.085        0.897        0.418

From the statistical hypothesis testing point of view, it cannot be said that any of the metrics would be the best one with a high (> 0.95) probability. However, the histogram intersection and χ² measures are clearly better than the log-likelihood when the average number of labels per histogram bin is low, but the log-likelihood performs better when this number increases. The log-likelihood measure has been preferred for texture images [8], but because of its poor performance on small windows in our experiments it is not appealing for face recognition. The χ² measure performs slightly better than histogram intersection, so we chose to use it despite the simplicity of the histogram intersection.

When looking for the optimal window size and LBP operator, we noticed that the LBP representation is quite robust with respect to the selection of these parameters. Changes in the parameters may cause big differences in the length of the feature vector, but the overall performance is not necessarily affected significantly. For example, changing from LBP^{u2}_{16,2} in 18x21-sized windows to LBP^{u2}_{8,2} in 21x25-sized windows drops the histogram length from 11907 to 2124, while the mean recognition rate reduces from 76.9% to 73.8%.

The mean recognition rates for LBP^{u2}_{16,2}, LBP^{u2}_{8,2} and LBP^{u2}_{8,1} as a function of the window size are plotted in Figure 4. The original 130x150 pixel image was divided into k x k windows, k = 4, 5, ..., 11, 13, 16, resulting in window sizes from 32x37 to 8x9. The five smallest windows were not tested using the LBP^{u2}_{16,2} operator because of the high dimension of the feature vector that would have been produced. As expected, a larger window size induces a decreased recognition rate because of the loss of spatial information. The LBP^{u2}_{8,2} operator in 18x21-pixel windows was selected since it is a good trade-off between recognition performance and feature vector length.

Fig. 4. The mean recognition rate for three LBP operators as a function of the window size.
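The histogram lengths quoted above follow directly from B = m B_r. As a quick arithmetic check (our own snippet; the 130x150 image size and the grid dimensions k are taken from the text above):

```python
IMG_W, IMG_H = 130, 150   # preprocessed image size used in the experiments

def window_size(k):
    """Approximate window size when the image is divided into a k x k grid."""
    return IMG_W // k, IMG_H // k

def feature_length(k, n_labels):
    """Spatially enhanced histogram length: B = m * B_r with m = k * k windows."""
    return k * k * n_labels

print(window_size(7), feature_length(7, 243))  # (18, 21) 11907 -- LBP^{u2}_{16,2} in 18x21 windows
print(window_size(6), feature_length(6, 59))   # (21, 25)  2124 -- LBP^{u2}_{8,2}  in 21x25 windows
```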
To find the weights w_j for the weighted χ² statistic (Equation 6), the following procedure was adopted: a training set was classified using only one of the 18x21 windows at a time. The recognition rates of corresponding windows on the left and right half of the face were averaged. Then the windows whose rate lay below the 0.2 percentile of the rates got weight 0, and the windows whose rate lay above the 0.8 and 0.9 percentiles got weights 2.0 and 4.0, respectively. The other windows got weight 1.0.

The CSU system comes with two training sets, the standard FERET training set and the CSU training set. As shown in Table 2, these sets are basically subsets of the fa, fb and dup I sets. Since illumination changes pose a major challenge to most face recognition algorithms and none of the images in the fc set were included in the standard training sets, we defined a third training set, called the subfc training set, which contains half of the fc set (subjects 1013–1109).

Table 2. Number of images in common between different training and testing sets.

CSU standard   396    0    0   99    0   501
subfc           97    0   97    0    0   194

The permutation tool was used to compare the weights computed from the different training sets. The weights obtained using the FERET standard set gave an average recognition rate of 0.80, the CSU standard set 0.78 and the subfc set 0.81. The pairwise comparison showed that the weights obtained with the subfc set are likely to be better than the others (P(subfc > FERET) = 0.66 and P(subfc > CSU) = 0.88).

The weights computed using the subfc set are illustrated in Figure 5(b). The weights were selected without utilising an actual optimisation procedure and thus they are probably not optimal. Despite that, in comparison with the nonweighted method, we got an improvement both in the processing time (see Table 3) and in the recognition rate (P(weighted > nonweighted) = 0.976).

Fig. 5. (a) An example of a facial image divided into 7x7 windows. (b) The weights set for the weighted χ² dissimilarity measure. Black squares indicate weight 0.0, dark grey 1.0, light grey 2.0 and white 4.0.

Table 3. Processing times of weighted and nonweighted LBP on a 1800 MHz AMD Athlon running Linux. Note that processing the FERET images (last column) includes heavy disk operations, most notably writing the distance matrix of about 400 MB to disk.

Type of LBP    Feature ext. (ms/image)   Distance calc. (µs/pair)   Processing FERET images (s)
Weighted       3.49                      46.6                       1046
Nonweighted    4.14                      58.6                       1285

The image set which was used to determine the weights overlaps with the fc set. To avoid biased results, we preserved the other half of the fc set (subjects 1110–1206) as a validation set. Introducing the weights increased the recognition rate for the training set from 0.49 to 0.81 and for the validation set from 0.52 to 0.77. The improvement is slightly higher for the training set, but the significant improvement for the validation set implies that the calculated weights generalize well outside the training set.

The final recognition results for the proposed method are shown in Table 4 and the rank curves are plotted in Figures 6(a)–(d). LBP clearly outperforms the control algorithms on all the FERET test sets and in the statistical test. It should be noted that the CSU implementations of the algorithms whose results we included here do not achieve the same figures as in the original FERET test due to some modifications in the experimental setup, as mentioned in [11]. The results of the original FERET test can be found in [12].
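One way to read the weighting rule described above is sketched below (our own interpretation: we treat the quoted percentiles as quantiles of the per-window rates and leave tie handling to NumPy; the authors do not spell out these details):

```python
import numpy as np

def window_weights(rates):
    """Turn per-window recognition rates (a k x k array, already averaged over the
    left/right mirror windows) into the 0 / 1 / 2 / 4 weights used in Eq. (6)."""
    rates = np.asarray(rates, dtype=np.float64)
    q20, q80, q90 = np.quantile(rates, [0.2, 0.8, 0.9])
    weights = np.ones_like(rates)       # default weight 1.0
    weights[rates < q20] = 0.0          # least discriminative windows are ignored
    weights[rates >= q80] = 2.0         # most discriminative windows are emphasised
    weights[rates >= q90] = 4.0
    return weights
```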
Table 4. The recognition rates of the LBP and comparison algorithms for the FERET probe sets and the mean recognition rate of the permutation test with a 95% confidence interval.

Method             fb     fc     dup I   dup II   lower   mean   upper
LBP, weighted      0.97   0.79   0.66    0.64     0.76    0.81   0.85
LBP, nonweighted   0.93   0.51   0.61    0.50     0.71    0.76   0.81
PCA, MahCosine     0.85   0.65   0.44    0.22     0.66    0.72   0.78
Bayesian, MAP      0.82   0.37   0.52    0.32     0.67    0.72   0.78
EBGM Optimal       0.90   0.42   0.46    0.24     0.61    0.66   0.71

Fig. 6. (a), (b), (c) Rank curves for the fb, fc and dup I probe sets (from top to bottom); (d) rank curve for the dup II probe set.

Additionally, to gain knowledge about the robustness of our method against slight variations of pose angle and alignment, we tested our approach on the ORL face database (Olivetti Research Laboratory, Cambridge) [14]. The database contains 10 different images of 40 distinct subjects (individuals). Some images were taken at different times for some people. There are variations in facial expression (open/closed eyes, smiling/non-smiling), facial details (glasses/no glasses) and scale (variation of up to about 10%). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to about 20 degrees. The images are grey scale with a resolution of 92x112.

Randomly selecting 5 images for the gallery set and the other 5 for the probe set, the preliminary experiments give an average recognition rate of 0.98 with a standard deviation of 0.012 over 100 random permutations, using LBP^{u2}_{16,2}, a window size of 30x37 and χ² as the dissimilarity measure. Window weights were not used. Note that no registration or preprocessing was applied to the images. The good results indicate that our approach is also relatively robust with respect to alignment. However, because of the lack of a standardised protocol for evaluating and comparing systems on the ORL database, it is difficult to include here a fair comparison with other approaches that have been tested using ORL.
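The random gallery/probe evaluation described above can be sketched as follows (our own code, assuming a precomputed dissimilarity matrix `dist` between all images and a vector `labels` of subject identities; this is not the CSU permutation tool):

```python
import numpy as np

def random_split_recognition(dist, labels, n_gallery=5, n_runs=100, seed=None):
    """Mean and standard deviation of the rank-1 recognition rate over random
    gallery/probe splits, one split per run (ORL-style protocol)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    rates = []
    for _ in range(n_runs):
        gallery, probe = [], []
        for subject in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == subject))
            gallery.extend(idx[:n_gallery])   # e.g. 5 images per subject
            probe.extend(idx[n_gallery:])     # the remaining images
        gallery, probe = np.array(gallery), np.array(probe)
        # nearest-neighbour decision: each probe is assigned the subject of its
        # closest gallery image
        nearest = gallery[np.argmin(dist[np.ix_(probe, gallery)], axis=1)]
        rates.append(np.mean(labels[nearest] == labels[probe]))
    return float(np.mean(rates)), float(np.std(rates))
```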
5 Discussion and Conclusion

Face images can be seen as a composition of micro-patterns which can be well described by LBP. We exploited this observation and proposed a simple and efficient representation for face recognition. In our approach, a face image is first divided into several blocks (facial regions) from which we extract local binary patterns and construct a global feature histogram that represents both the statistics of the facial micro-patterns and their spatial locations. Then, face recognition is performed using a nearest-neighbour classifier in the computed feature space with χ² as the dissimilarity measure. The proposed face representation can be easily extracted in a single scan through the image, without any complex analysis as in the EBGM algorithm.

We implemented the proposed approach and compared it against well-known methods such as PCA, EBGM and BIC. To achieve a fair comparison, we considered the FERET face database and protocol, which are a de facto standard in face recognition research. In addition, we adopted the normalisation steps and the implementations of the different algorithms (PCA, EBGM and BIC) from the CSU face identification evaluation system. Reporting our results in such a way not only makes the comparative study fair but also offers the research community new performances to which they are invited to compare their results.

The experimental results clearly show that the LBP-based method outperforms the other approaches on all probe sets (fb, fc, dup I and dup II). For instance, our method achieved a recognition rate of 97% in the case of recognising faces under different facial expressions (fb set), while the best performance among the tested methods did not exceed 90%. Under different lighting conditions (fc set), the LBP-based approach also achieved the best performance, with a recognition rate of 79% against 65%, 37% and 42% for PCA, BIC and EBGM, respectively. The relatively poor results on the fc set confirm that illumination change is still a challenge to face recognition. Additionally, recognising duplicate faces (when the photos are taken later in time) is another challenge, although our proposed method performed better than the others.

To assess the performance of the LBP-based method on different datasets, we also considered the ORL face database. The experiments not only confirmed the validity of our approach, but also demonstrated its relative robustness against changes in alignment.

Analyzing the different parameters in extracting the face representation, we noticed a relative insensitivity to the choice of the LBP operator and region size. This is an interesting result, since the other considered approaches are more sensitive to their free parameters. This means that only simple calculations are needed for the LBP description, while some other methods use exhaustive training to find their optimal parameters.

In deriving the face representation, we divided the face image into several regions. We used only rectangular regions, each of the same size, but other divisions are also possible, as regions of different sizes and shapes could be used. To improve our system, we analyzed the importance of each region. This is motivated by the psychophysical findings which indicate that some facial features (such as the eyes) play a more important role in face recognition than other features (such as the nose). Thus we calculated and assigned weights from 0 to 4 to the regions (see Figure 5(b)). Although this kind of simple approach was adopted to compute the weights, improvements were still obtained. We are currently investigating approaches for dividing the image into regions and finding more optimal weights for them.

Although we clearly showed the simplicity of LBP-based face representation extraction and its robustness with respect to facial expression, aging, illumination and alignment, some improvements are still possible. For instance, one drawback of our approach lies in the length of the feature vector which is used for the face representation. Indeed, using a feature vector of length 2301 slows down the recognition speed, especially for very large face databases. A possible direction is to apply dimensionality reduction to the face feature vectors. However, due to the good results we have obtained, we expect that the methodology presented here is applicable to several other object recognition tasks as well.
Acknowledgements. This research was supported in part by the Academy of Finland.

References

1. Phillips, P., Grother, P., Micheals, R.J., Blackburn, D.M., Tabassi, E., Bone, J.M.: Face recognition vendor test 2002 results. Technical report (2003)
2. Zhao, W., Chellappa, R., Rosenfeld, A., Phillips, P.J.: Face recognition: a literature survey. Technical Report CAR-TR-948, Center for Automation Research, University of Maryland (2002)
3. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing 16 (1998) 295–306
4. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1991) 71–86
5. Etemad, K., Chellappa, R.: Discriminant analysis for recognition of human face images. Journal of the Optical Society of America 14 (1997) 1724–1733
6. Wiskott, L., Fellous, J.M., Kuiger, N., von der Malsburg, C.: Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 775–779
7. Moghaddam, B., Nastar, C., Pentland, A.: A Bayesian similarity measure for direct image matching. In: 13th International Conference on Pattern Recognition. (1996) II:350–358
8. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 971–987
9. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29 (1996) 51–59
10. Gong, S., McKenna, S.J., Psarrou, A.: Dynamic Vision: From Images to Face Recognition. Imperial College Press, London (2000)
11. Bolme, D.S., Beveridge, J.R., Teixeira, M., Draper, B.A.: The CSU face identification evaluation system: Its purpose, features and structure. In: Third International Conference on Computer Vision Systems. (2003) 304–311
12. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1090–1104
13. Beveridge, J.R., She, K., Draper, B.A., Givens, G.H.: A nonparametric statistical comparison of principal component and linear discriminant subspaces for face recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. (2001) I:535–542
14. Samaria, F.S., Harter, A.C.: Parameterisation of a stochastic model for human face identification. In: IEEE Workshop on Applications of Computer Vision. (1994) 138–142