Face Recognition Paper Translation: Chinese-English Appendix (Original Text and Translation)

Source of the translated original: Thomas David Heseltine, BSc. Hons., The University of York, Department of Computer Science, for the Qualification of PhD, September 2005, "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".

4 Two-dimensional Face Recognition

4.1 Feature Localization

Before discussing the methods of comparing two facial images, we take a brief look at the preliminary process of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application, if the position of the face within the image is known beforehand (for example, a cooperative subject in a door access system), the face detection stage can often be skipped, as the region of interest is already known. We therefore discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).

The eye localisation method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of the face recognition accuracy and not a product of the performance of the eye localisation routine, all image alignments are manually checked and any errors corrected prior to testing and evaluation.

We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned face images is taken, and each image is cropped to an area around both eyes. The average image is calculated and used as a template.

Figure 4-1 - The average eyes. Used as a template for eye detection.

Both eyes are included in a single template, rather than searching for each eye individually in turn, as the characteristic symmetry of the eyes on either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. However, this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye sockets, but the area of skin below the eyes helps to distinguish the eyes from the eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).

A window is passed over the test images, and the absolute difference to the average eye image shown above is taken. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.

This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes at all. However, we are able to improve performance by including a weighting scheme.

Eye localisation is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful, and those in which eye detection failed. Taking the set of successful localisations, we compute the average distance from the eye template (Figure 4-2 top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect.
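A minimal Python sketch of the template search just described is shown below. It assumes NumPy, equally sized greyscale crops, and a plain sum of absolute differences as the error. The function names and the `eps` guard are hypothetical. The optional `weights` argument covers the weighting scheme described next, where the weights are the mean distance of failed detections divided by the mean distance of successful ones. The search is exhaustive for clarity rather than speed, and the refinement step with separate left- and right-eye templates is not shown.

```python
import numpy as np

def build_eye_template(eye_patches):
    """Average a list of manually aligned, equally sized eye-region crops."""
    return np.mean(np.stack(eye_patches).astype(float), axis=0)

def locate_eyes(image, template, weights=None):
    """Slide the template over the image and return ((row, col), score) for the
    top-left corner with the lowest (optionally weighted) sum of absolute differences."""
    img = np.asarray(image, dtype=float)
    th, tw = template.shape
    if weights is None:
        weights = np.ones_like(template)
    best_score, best_pos = np.inf, None
    for r in range(img.shape[0] - th + 1):
        for c in range(img.shape[1] - tw + 1):
            window = img[r:r + th, c:c + tw]
            score = np.sum(weights * np.abs(window - template))
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

def template_weights(success_patches, failure_patches, template, eps=1e-6):
    """Per-pixel weights: mean |patch - template| over failed detections divided
    by the same mean over successful detections (eps avoids division by zero)."""
    succ = np.mean([np.abs(np.asarray(p, float) - template) for p in success_patches], axis=0)
    fail = np.mean([np.abs(np.asarray(p, float) - template) for p in failure_patches], axis=0)
    return fail / (succ + eps)
```

Pixels that differ strongly in failed matches (such as the pupils) but vary little in successful ones receive large weights, so they dominate the summed error.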
However, bright points do occur near the whites of the eyes in this average distance image, suggesting that this area is often inconsistent, varying greatly from the average eye template.

Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and for failed detections (bottom), showing credible variance due to mis-detected features.

In the lower image (Figure 4-2 bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector, as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.

Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.

4.2 The Direct Correlation Approach

We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), which involves the direct comparison of pixel intensity values taken from facial images. We use the term 'Direct Correlation' to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used.
Therefore, we do not imply that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (it is inversely related to Pearson's correlation and can be considered a scale- and translation-sensitive form of image correlation), as this maintains the contrast made between image space and subspace approaches in later sections.

Firstly, all facial images must be aligned such that the eye centres are located at two specified pixel coordinates, and the image is cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each such vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and, again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and the gallery image g) gives an indication of similarity. A threshold is then applied to make the final verification decision:

$$ d = \lVert q - g \rVert, \qquad d \le \text{threshold} \Rightarrow \text{accept}, \qquad d > \text{threshold} \Rightarrow \text{reject} \qquad \text{(Equ. 4-1)} $$

4.2.1 Verification Tests

The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system's ability to perform these tasks, a variety of evaluation methodologies have arisen.
Some of these analysis methods simulate a specific mode of operation (e.g. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experiments in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher's Linear Discriminant to analyse individual subspace components in section 7 and the identification test for the final evaluations described in section 8.

The verification test measures a system's ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or a rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject presents some form of identification at a point of entry, perhaps a swipe card, proximity chip or PIN. This identifier is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image), which is compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.

The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test, we must first define our test set of face images.
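The single accept/reject decision of Equ. 4-1 can be sketched as follows. The sketch assumes NumPy arrays of aligned greyscale images stored as 82 rows by 65 columns, so that each image flattens to 5330 elements. The names `to_vector`, `compare_faces`, `verify` and `zscore` are illustrative, with `compare_faces` playing the role of the CompareFaces call used in the verification test. The `zscore` helper illustrates why Euclidean distance is inversely related to Pearson's correlation: for two vectors of length n, each standardised to zero mean and unit (population) variance, ||a - b||^2 = 2n(1 - r).

```python
import numpy as np

IMAGE_SHAPE = (82, 65)  # assumed: 82 rows x 65 columns, i.e. a 65-by-82 pixel image

def to_vector(image):
    """Flatten an aligned greyscale face image into a 1-D vector of intensities."""
    return np.asarray(image, dtype=float).ravel()

def compare_faces(face_a, face_b):
    """Direct correlation distance: Euclidean distance between the pixel vectors."""
    return float(np.linalg.norm(to_vector(face_a) - to_vector(face_b)))

def verify(query, gallery, threshold):
    """Equ. 4-1: accept (True) if d <= threshold, otherwise reject (False)."""
    return compare_faces(query, gallery) <= threshold

def zscore(v):
    """Standardise a vector to zero mean and unit population variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()
```

Because the distance uses raw intensities, a uniform brightness change alone moves an image far from its match. For example, adding 1.0 to every pixel shifts the distance by sqrt(5330), which is about 73.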
Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is large enough for statistical anomalies to become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.

However, it should also be noted that if the results are to be representative of system performance in a real-world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition that may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of 'difficult' images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced.

Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB). This call compares two facial images and returns a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.

Every image is compared with every other image; no image is compared with itself, and no pair is compared more than once (we assume that the relationship is symmetrical).
Once two images have been compared, producing a distance score, the ground truth is used to determine whether the images are of the same person or of different people. In practical tests, this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people, and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by applying a threshold. Any incorrect decision is recorded as either a false acceptance or a false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.

For IndexA = 0 to length(TestSet) - 1
    For IndexB = IndexA + 1 to length(TestSet) - 1
        Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
        If IndexA and IndexB are the same person
            Append Score to AcceptScoresList
        Else
            Append Score to RejectScoresList

For Threshold = Minimum Score to Maximum Score:
    FalseAcceptCount, FalseRejectCount = 0
    For each Score in RejectScoresList
        If Score <= Threshold
            Increase FalseAcceptCount
    For each Score in AcceptScoresList
        If Score > Threshold
            Increase FalseRejectCount
    FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
    FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
    Add plot to error curve at (FalseRejectRate, FalseAcceptRate)

These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or the FRR (by altering the threshold value) will inevitably increase the other.
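A runnable Python version of the pseudocode above is given below. It assumes NumPy, a list of images with a parallel list of person identifiers (standing in for the filename-encoded ground truth), and any `compare_faces` function that returns a distance. The function names are illustrative. FAR is normalised by the number of different-person comparisons and FRR by the number of same-person comparisons.

```python
import numpy as np

def verification_test(test_set, person_ids, compare_faces):
    """Compare every unordered pair once and split the distance scores by ground truth."""
    accept_scores, reject_scores = [], []  # same person / different people
    n = len(test_set)
    for i in range(n):
        for j in range(i + 1, n):  # no self-comparisons, no repeated pairs
            score = compare_faces(test_set[i], test_set[j])
            if person_ids[i] == person_ids[j]:
                accept_scores.append(score)
            else:
                reject_scores.append(score)
    return np.array(accept_scores), np.array(reject_scores)

def error_curve(accept_scores, reject_scores):
    """Sweep the threshold over every observed score; return thresholds, FAR, FRR."""
    thresholds = np.unique(np.concatenate([accept_scores, reject_scores]))
    # FAR: fraction of different-person scores accepted (score <= threshold).
    far = np.array([np.mean(reject_scores <= t) for t in thresholds])
    # FRR: fraction of same-person scores rejected (score > threshold).
    frr = np.array([np.mean(accept_scores > t) for t in thresholds])
    return thresholds, far, frr
```

Sweeping only the distinct observed scores traces the same curve as stepping from the minimum to the maximum score, because both rates change only at those values.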
Because reducing one error rate increases the other, we describe the full operating range of a particular system by varying the threshold value through the entire range of scores produced. Each threshold value produces an additional FAR, FRR pair, and plotting these pairs on a graph produces the error rate curve shown below.

Figure 4-5 - Example error rate curve produced by the verification test.

The equal error rate (EER) is the point at which the FAR is equal to the FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system, and it allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real-world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances was equal to the percentage of false rejections. Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections, because such systems are unwilling to tolerate intruders even at the cost of inconvenient access denials. Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. We should therefore bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.

There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves also used in such experiments. Both graphs are simply two visualisations of the same results: the ROC format uses the true acceptance rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and FAR as functions of the threshold value.
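Given the FAR and FRR values from a threshold sweep, the EER and the ROC form described above can be read off as in this sketch. It assumes NumPy and picks the sampled threshold where |FAR - FRR| is smallest, then averages the two rates at that threshold. This is an approximation when the curves do not cross exactly at a sampled threshold; interpolating between neighbouring points would be a reasonable refinement. The function names are hypothetical.

```python
import numpy as np

def equal_error_rate(thresholds, far, frr):
    """Approximate EER: the sampled threshold where FAR and FRR are closest,
    with the EER taken as the mean of the two rates at that threshold."""
    far = np.asarray(far, dtype=float)
    frr = np.asarray(frr, dtype=float)
    k = int(np.argmin(np.abs(far - frr)))
    return thresholds[k], (far[k] + frr[k]) / 2.0

def to_roc(far, frr):
    """ROC form of the same results: TAR = 1.0 - FRR plotted against FAR."""
    return np.asarray(far, dtype=float), 1.0 - np.asarray(frr, dtype=float)
```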
Plotting both the FRR and the FAR against the threshold provides a reference for determining the threshold value needed to achieve a specific FRR and FAR. The EER is the point where the two curves intersect.

Figure 4-6 - Example error rate curve as a function of the score threshold.

The fluctuation of these error curves due to noise and other errors depends on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will result in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840 verification operations. A drop of 1% in the EER therefore represents an additional 2588 correct decisions, whereas the quality of a single image could cause the EER to fluctuate by up to …

4.2.2 Results

As a simple experiment to test the direct correlation method, we apply the technique described above to a test set of 720 images of 60 different people, taken from the AR Face Database [39]. Every image is compared with every other image in the test set to produce a likeness score, providing 258,840 verification operations from which to calculate false acceptance rates and false rejection rates. The error curve produced is shown in Figure 4-7.

Figure 4-7 - Error rate curve produced by the direct correlation method using no image preprocessing.

We see that an EER of 25.1% is produced, meaning that at the EER threshold approximately one quarter of all verification operations carried out resulted in an incorrect classification. There are a number of well-known reasons for this poor level of accuracy. Tiny changes in lighting, expression or head orientation cause the location in image space to change dramatically. Images in face space are moved far apart by these image capture conditions, despite being of the same person's face.
The distance between images of different people becomes smaller than the area of face space covered by images of the same person, and hence false acceptances and false rejections occur frequently. Other disadvantages include the large amount of storage necessary for holding many face images and the intensive processing required for each comparison, which make this method unsuitable for applications involving a large database. In section 4.3 we explore the eigenface method, which attempts to address some of these issues.
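The comparison counts quoted in this section follow from comparing every unordered pair of distinct images exactly once. The short calculation below reproduces the 258,840 verification operations for 720 images and the roughly 2588 decisions that 1% of EER represents. It also counts how many of those comparisons involve any one image, which is why a single poor image can visibly shift the curves.

```python
n_images = 720
n_comparisons = n_images * (n_images - 1) // 2   # each unordered pair once
decisions_per_percent = n_comparisons / 100       # decisions that 1% of EER represents
comparisons_per_image = n_images - 1              # pairs that involve any one image

print(n_comparisons, round(decisions_per_percent), comparisons_per_image)
# prints: 258840 2588 719
```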

Robust Face Recognition via Sparse Representation: A Q&A about the recent advances in face recognition and how to protect your facial identity

Allen Y. Yang (yang@)
Department of EECS, UC Berkeley
July 21, 2008

Q: What is this technique all about?

A: The technique, called robust face recognition via sparse representation, provides a new way for a computer program to classify human identity from frontal facial images, i.e., the well-known problem of face recognition.

Face recognition has been one of the most extensively studied problems in the area of artificial intelligence and computer vision. Its applications include human-computer interaction, multimedia data compression, and security, to name a few. The significance of face recognition is also highlighted by the contrast between humans' high accuracy in recognizing face images under various conditions and computers' historically poor accuracy.

This technique proposes a highly accurate recognition framework. Extensive experiments have shown, for the first time, that the method can achieve recognition accuracy similar to that of human vision. In some cases, the method has outperformed human vision in face recognition.

Q: Who are the authors of this technique?

A: The technique was developed in 2007 by Mr. John Wright, Dr. Allen Y. Yang, Dr. S. Shankar Sastry, and Dr. Yi Ma.

The technique is jointly owned by the University of Illinois and the University of California, Berkeley. A provisional US patent was filed in 2008.
The technique is also being published in the IEEE Transactions on Pattern Analysis and Machine Intelligence [Wright 2008].

Q: Why is face recognition difficult for computers?

A: There are several issues that have historically hindered the improvement of face recognition in computer science.

1. High dimensionality, namely, the data size is large for face images.

When we take a picture of a face, the face image, encoded in some color format, is stored as an image file on a computer (e.g., the image shown in Figure 1). Because the human brain is a massively parallel processor, it can quickly process a 2-D image and match it with other images learned in the past. However, modern computer algorithms can only process 2-D images sequentially, meaning they process an image pixel by pixel. Hence, although the image file usually takes less than 100 KB to store on a computer, if we treat each image as a sample point, it sits in a space of 10 K to 100 K or more dimensions (that is, each pixel owns an individual dimension). Any pattern recognition problem in a high-dimensional space (>100 D) is known in the literature to be difficult.

Fig. 1. A frontal face image from the AR database [Martinez 1998]. The size of a JPEG file for this image is typically about 60 KB.

2. The number of identities to classify is high.

To make the situation worse, an adult human being can learn to recognize thousands, if not tens of thousands, of different human faces over the span of his/her life. For a computer to match this ability, it must first store tens of thousands of learned face images, which in the literature are called the training images. Then, using whatever algorithm, the computer has to process the massive data and quickly identify the correct person from a new face image, which is called the test image.

Fig. 2. An ensemble of 28 individuals in the Yale B database [Lee 2005]. A typical face recognition system needs to recognize 10-100 times more individuals.
Arguably an adult can recognize thousands of times more individuals in daily life.

Combining the above two problems, we are solving a pattern recognition problem that must carefully partition a high-dimensional data space into thousands of domains, each domain representing the possible appearance of one individual's face images.

3. Face recognition has to be performed under various real-world conditions.

When you walk into a drug store to take a passport photo, you would usually be asked to pose frontally with a neutral expression in order to qualify for a good passport photo. The store associate will also control the photo resolution, background, and lighting conditions by using a uniform color screen and flash lighting. However, in the real world, a computer program is asked to identify humans without any of the above constraints. Although past solutions achieve recognition under a very limited relaxation of the constraints, to this day none of the algorithms can answer all the possible challenges, including the technique we present.

To further motivate the issue, human vision can accurately recognize learned human faces under different expressions, backgrounds, poses, and resolutions [Sinha 2006]. With professional training, humans can also identify face images with facial disguise. Figure 3 demonstrates this ability using images of Abraham Lincoln.

Fig. 3. Images of Abraham Lincoln under various conditions (available online). Arguably humans can recognize the identity of Lincoln from each of these images.

A natural question arises: do we simply ask too much of a computer algorithm? For some applications, such as security checkpoints, we can mandate that individuals pose with a frontal, neutral face in order to be identified. However, in most other applications, this requirement is simply not practical.
For example, we may want to search our photo albums to find all the images that contain our best friends under normal indoor/outdoor conditions. Or we may need to identify a criminal suspect, who would naturally try to disguise his identity, from murky, low-resolution hidden-camera footage. Therefore, the study of recognizing human faces under real-world conditions is motivated not only by pure scientific rigor, but also by urgent demands from practical applications.

Q: What is the novelty of this technique? Why is the method related to sparse representation?

A: The method is built on a novel pattern recognition framework, which relies on a scientific concept called sparse representation. In fact, sparse representation is not a new topic in many scientific areas. Particularly in human perception, scientists have discovered that accurate low-level and mid-level visual perceptions are a result of sparse representation of visual patterns using highly redundant visual neurons [Olshausen 1997, Serre 2006].

Without diving into technical detail, let us consider an analogy. Assume that a normal individual, Tom, is very good at identifying different types of fruit juice such as orange juice, apple juice, lemon juice, and grape juice. Now he is asked to identify the ingredients of a fruit punch, which contains an unknown mixture of drinks. Tom discovers that when the ingredients of the punch are highly concentrated in a single type of juice (e.g., 95% orange juice), he has no difficulty in identifying the dominant ingredient. On the other hand, when the punch is a largely even mixture of multiple drinks (e.g., 33% orange, 33% apple, and 33% grape), he has the most difficulty in identifying the individual ingredients. In this example, a fruit punch can be represented as a sum of the amounts of individual fruit drinks. We say such a representation is sparse if the majority of the juice comes from a single fruit type. Otherwise, we say the representation is not sparse.
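To make the fruit punch analogy concrete, the sketch below scores how concentrated a mixture is using a simple, hypothetical index. The index is 1 when everything comes from one ingredient and 0 for a perfectly even mix over all k ingredients. The function name and the normalisation are illustrative choices, not part of the technique described here.

```python
def concentration(amounts):
    """Hypothetical index: 1.0 if the whole mixture comes from one ingredient,
    0.0 if it is spread evenly over all k ingredients."""
    k = len(amounts)
    total = sum(amounts)
    return (k * max(amounts) / total - 1) / (k - 1)

# Shares of (orange, apple, lemon, grape) in the two punches from the example.
mostly_orange = [95, 5, 0, 0]
even_mix = [33, 33, 0, 33]
print(round(concentration(mostly_orange), 3), round(concentration(even_mix), 3))
# prints: 0.933 0.111
```

The mostly-orange punch scores close to 1 (sparse, easy to identify), and the even mix scores close to 0 (not sparse, hard to identify).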
Clearly, in this example, sparse representation leads to easier and more accurate recognition than nonsparse representation.

The human brain turns out to be an excellent machine for calculating sparse representations from biological sensors. In face recognition, when a new image is presented in front of the eyes, the visual cortex immediately calculates a representation of the face image based on all the prior face images it remembers from the past. However, this representation is believed to be sparse in the human visual cortex. For example, although Tom remembers thousands of individuals, when he is given a photo of his friend Jerry, he will assert that the photo is an image of Jerry. His perception does not attempt to calculate the similarity of Jerry's photo with all the images of other individuals. On the other hand, with the help of image-editing software such as Photoshop, an engineer can now seamlessly combine facial features from multiple individuals into a single new image. In this case, a typical human would assert that he/she cannot recognize the new image, rather than analytically calculating the percentage of similarity with multiple individuals (e.g., 33% Tom, 33% Jerry, 33% Tyke) [Sinha 2006].

Q: What conditions does the technique apply to?

A: Currently, the technique has been successfully demonstrated to classify frontal face images under different expressions, lighting conditions, and resolutions, and under severe facial disguise and image distortion. We believe it is one of the most comprehensive solutions in face recognition, and definitely one of the most accurate.

Further study is required to establish a relation, if any, between sparse representation and face images with pose variations.

Q: More technically, how does the algorithm estimate a sparse representation using face images?
Why do the other methods fail in this respect?

A: This technique is the first solution in the literature to explicitly calculate a sparse representation for the purpose of image-based pattern recognition. It is hard to say that the other existing methods have failed in this respect. Why? Simply because previous investigators did not realize the importance of sparse representation in human vision and computer vision for the purpose of classification. For example, a well-known solution to face recognition is the nearest-neighbor method. It compares the test image with each individual training image separately. Figure 4 shows an illustration of the similarity measurement. The nearest-neighbor method identifies the test image with the training image that is most similar to it; hence the name nearest neighbor. We can easily observe that the representation so estimated is not sparse. This is because a single face image can be similar to multiple images in terms of its RGB pixel values. Therefore, an accurate classification based on this type of metric is known to be difficult.

Fig. 4. A similarity metric (the y-axis) between a test face image and about 1200 training images. The smaller the metric value, the more similar the two images.

Our technique abandons the conventional wisdom of comparing the test image with individual training images or individual training classes. Rather, the algorithm attempts to calculate a representation of the input image w.r.t. all available training images as a whole. Furthermore, the method imposes one extra constraint: the optimal representation should use the smallest number of training images. Hence, the majority of the coefficients in the representation should be zero, and the representation is sparse (as shown in Figure 5).

Fig. 5. An estimation of sparse representation w.r.t. a test image and about 1200 training images.
The dominant coefficients in the representation correspond to the training images with the same identity as the input image. In this example, the recognition is based on downsampled 12-by-10 low-resolution images. Yet the algorithm can correctly identify the input image as Subject 1.

Q: How does the technique handle severe facial disguise in the image?

A: Facial disguise and image distortion pose one of the biggest challenges to the accuracy of face recognition. The types of distortion that can be applied to face images are manifold. Figure 6 shows some examples.

Fig. 6. Examples of image distortion on face images. Some of the cases are beyond humans' ability to perform reliable recognition.

One of the notable advantages of the sparse representation framework is that compensating for image distortion and recognizing the face can be rigorously reformulated as a single problem under the same framework. In this case, a distorted face image presents two types of sparsity: one representing the location of the distorted pixels in the image, and the other representing the identity of the subject as before. Our technique has been shown to handle and eliminate all the image distortions in Figure 6 while maintaining high accuracy. In the following, we present an example to illustrate a simplified solution for one type of distortion. For more detail, please refer to our paper [Wright 2008].

Figure 7 demonstrates how an algorithm recognizes a face image with severe facial disguise by sunglasses. The algorithm first partitions the test image on the left into eight local regions, and recovers a sparse representation for each region individually. Notice that, with the sunglasses occluding the eye regions, the corresponding representations from these regions do not provide correct classification.
However, when we look at the overall classification result over all regions, the non-occluded regions provide a high consensus for the image to be classified as Subject 1 (as shown by the red circles in the figure). Therefore, the algorithm simultaneously recovers the subject identity and the facial regions that are disguised.

Fig. 7. Solving for a part-based sparse representation using local face regions. Left: Test image. Right: Estimation of the sparse representation and the corresponding classification on the tiles. The red circle identifies the correct classification.

Q: What is the quantitative performance of this technique?

A: Most of the representative results from our extensive experiments have been documented in our paper [Wright 2008]. The experiments were based on two established face recognition databases, namely, the Extended Yale B database [Lee 2005] and the AR database [Martinez 1998].

In the following, we highlight some of the notable results. On the Extended Yale B database, the algorithm achieved 92.1% accuracy using 12-by-10 resolution images, 93.7% using single-eye-region images, and 98.3% using mouth-region images. On the AR database, the algorithm achieved 97.5% accuracy on face images with sunglasses disguise, and 93.5% with scarf disguise.

Q: Does the estimation of sparse representation cost more computation and time compared to other methods?

A: The complexity and speed of an algorithm matter to the extent that they could hinder the application of the algorithm to real-world problems. Our technique uses some of the best-studied numerical routines in the literature, namely, L1-minimization. These routines belong to a family of optimization algorithms called convex optimization, which are known to be solvable very efficiently on a computer.
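Below is a rough sketch of the classification idea. It assumes NumPy, with the training images stacked as columns of a matrix A. The test image y is represented as A x with as few non-zero coefficients as possible, and the identity is taken to be the class whose coefficients alone reconstruct y best. The answer above names L1-minimization; to keep the example short and self-contained, this sketch swaps in a Lasso relaxation (minimise 0.5||Ax - y||^2 + lam||x||_1) solved by iterative soft-thresholding (ISTA). The class-wise residual rule, `lam`, `n_iter` and the function names are assumptions for illustration rather than the authors' exact formulation. The sketch also does not model the distortion term described for disguised images.

```python
import numpy as np

def ista_lasso(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding
    return x

def src_classify(train, labels, test, lam=0.01):
    """train: (d, n) array whose columns are vectorised training images;
    labels: length-n identities; test: length-d vectorised test image."""
    A = train / np.linalg.norm(train, axis=0)  # unit-norm columns
    y = test / np.linalg.norm(test)
    x = ista_lasso(A, y, lam)
    labels = np.asarray(labels)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ x_c)
    return min(residuals, key=residuals.get), x
```

With 12-by-10 images, as in the Figure 5 example, each column of `train` would hold 120 pixel values.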
In addition, given how rapidly microprocessor technology is advancing, we see no significant obstacle to implementing a real-time commercial system based on this technique.

Q: With such highly accurate face recognition algorithms available, is it becoming increasingly difficult to protect biometric information and personal privacy in urban environments and on the Internet?

A: Believe it or not, a government agency, a company, or even a total stranger can capture and permanently log your biometric identity, including your facial identity, much more easily than you might imagine. According to a Time magazine report [Grose 2008], a person living or working in London will likely be captured on camera 300 times per day! One can assume that people in other Western metropolitan cities enjoy similar “free services.” If you prefer to stay indoors and blog on the Internet, your public photo albums can easily be accessed on unprotected websites, and they have probably been permanently logged by search engines such as Google and Yahoo!.

With today's ubiquitous camera technology, it is difficult to completely prevent others from obtaining your facial identity, unless you never step into the downtown area of a big city and never apply for a driver's license. However, there are ways to prevent illegal and involuntary access to your facial identity, especially on the Internet. One simple step that everyone can take to stop a third party from exploiting their face images online is to prevent these images from being linked to their identity. Any classification system needs a set of training images to learn the possible appearances of your face. If you put personal photos on your public website and frequently give away the names of the people in them, a search engine will eventually be able to link those names to the faces in the photos.
Therefore, to prevent an unauthorized party from “crawling” your website and sifting through valuable private information, you should put such photo websites behind password protection. Avoid making a large number of personal images available online without consent while also giving the names of the people in them on the same website.

We mentioned many notable applications of face recognition earlier. If properly used, the technology can also change the IT industry so that it better protects personal privacy. For example, an assembly factory can install a network of cameras to improve the safety of its assembly line while blurring out the workers' faces in the surveillance videos. A cellphone user in a video conference can activate a face recognition function that tracks only his or her own facial movements and excludes other people in the background from being transmitted to the other party. All in all, face recognition is a rigorous scientific field. Its sole purpose is to hypothesize, model, and reproduce the image-based recognition process with accuracy comparable or even superior to human perception. How far the technology extends, and what impact it has on our society, will rest on the shoulders of governments, industry, and individual end users.

References
[Grose 2008] T. Grose. When surveillance cameras talk. Time (online), Feb. 11, 2008.
[Lee 2005] K. Lee et al. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, 2005.
[Martinez 1998] A. Martinez and R. Benavente. The AR face database. CVC Tech Report No. 24, 1998.
[Olshausen 1997] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, vol. 37, 1997.
[Serre 2006] T. Serre. Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines. PhD dissertation, MIT, 2006.
[Sinha 2006] P. Sinha et al. Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, vol. 94, no. 11, November 2006.
[Wright 2008] J. Wright et al. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008 (in press).

ACCV2002: The 5th Asian Conference on Computer Vision, 23–25 January 2002, Melbourne, Australia.

Discriminative Regions for Human Face Detection∗

J. Matas (1,2), P. Bílek (1), M. Hamouz (2), and J. Kittler (2)
(1) Center for Machine Perception, Czech Technical University, {bilek,matas}@cmp.felk.cvut.cz
(2) Centre for Vision, Speech, and Signal Processing, University of Surrey, {m.hamouz,j.kittler}@

Abstract
We propose a robust method for face detection based on the assumption that a face can be represented by arrangements of automatically detectable discriminative regions. The appearance of a face is modelled statistically in terms of local photometric information and the spatial relationship of the discriminative regions. The spatial relationship between these regions serves mainly as preliminary evidence for the hypothesis that a face is present at a particular position. The final decision is made using the complete information from the whole image patch. The results are very promising.

1 Introduction
Detection and recognition of objects are among the most difficult tasks in computer vision. In many papers, object detection and object recognition are considered distinct problems and are treated separately under different names, e.g. object localisation (detection) and recognition. In our approach, localisation of an object of a given class is a natural generalisation of object recognition. In the terminology we introduce, object detection means recognising an object's class, while object recognition means distinguishing between specific objects of one class. Accordingly, an object class, or category, is a set of objects with similar local surface properties and global geometry.
In this paper we focus on object detection. In particular, we address the problem of face localisation.

The main idea of this paper is based on the premise that objects in a class can be represented by arrangements of automatically detectable discriminative regions. Discriminative regions are distinguished regions that exhibit properties important for object detection and recognition. Distinguished regions are "local parts" of the object surface whose appearance is stable over a wide range of views and illumination conditions. Instances of the category are represented by a statistical model of the appearance of local patches, defined in terms of discriminative regions, and by the relationships between those regions. Such a local model of objects has a number of attractive properties compared with global models, e.g. robustness to partial occlusion and simpler illumination compensation.

Superficially, the framework seems to be no more than a local appearance-based method. The main difference is that our work focuses on selecting the regions where appearance is modelled. Detectors of such regions are built during the learning phase. In the detection stage, multiple detectors of discriminative regions process the image, and detection is then posed as a combinatorial optimisation problem. Details of the scheme are presented in Section 3. Before that, previous work is reviewed in Section 2. Experiments in detecting human faces based on the proposed framework are described in Section 4. Possible refinements of the general framework are discussed in Section 5. The main contributions of this paper are summarised in Section 6.

∗ This research was supported by the Czech Ministry of Education under Research Programme MSM210000012 Transdisciplinary Biomedical Engineering Research and by European Commission IST-1999-11159 project BANCA.

2 Previous Work
Many early object recognition systems were based on two basic approaches:
• template matching: one or more filters (templates) representing each object are applied to a part of the image, and the degree of similarity between the templates and the image is deduced from their responses;
• measuring geometric features: geometric measurements (distance, angle, ...) between features are obtained, and different objects are characterised by different constraints imposed on the measurements.

Brunelli et al. [3] showed that template matching outperforms measuring geometric features, because it exploits more of the information extracted from the image. Although template matching works well for some types of patterns, more complex solutions are required to cope with non-rigid objects, illumination variations, or geometric transformations due to different camera projections.

The two approaches, template matching and measuring geometric constraints, can be combined to reduce their respective disadvantages. Brunelli et al. [3] showed that a face detector consisting of individual features linked together by crude geometric constraints performs better than a detector based on "whole-face" template matching.

Yuille [20] proposed deformable templates that are fitted to contrast profiles by gradient descent on a suitable energy function. Lades et al. [9] and Wiskott et al. [19] proposed a similar approach and developed a recognition method based on deformable meshes. The mesh (representing an object or an object's class) is overlaid on the image and adjusted to obtain the best match between the node descriptors and the image. The likelihood of a match is computed from the extent of mesh deformation.

Schmid et al. [14,17] proposed detectors based on local jets. Robustness is achieved by using spatial constraints between locally detected features. The spatial constraints are represented by angles and length ratios, which are assumed to be Gaussian variables, each with its own mean and standard deviation.

Burl et al. [4,5,6] introduced a principled framework for representing possible deformations of objects using probabilistic shape models. The objects are again represented as constellations of rigid features (parts). The features are characterised photometrically, and the variability of constellations is represented by a joint probability density function.

Mohan et al. [13] use a similar approach to detect human bodies. The local parts are again recognised by detectors based on photometric information, and the geometric constraints on the mutual positions of the local parts in the image are defined heuristically.

All of the methods mentioned above decide whether the object is present in the image from geometric constraints alone. Our proposed method shares the same framework, but in our work the local feature detectors and geometric constraints only define a set of possible locations of the object in the image. The final decision is made using photometric information, which also takes into account the parts of the object between the local features.

Our approach also differs from those of Schmid [17] and Burl [4,6] in other ways. A coordinate system is introduced for each object in the object class.
This allows us to tackle the problem of selecting distinctive and well-localisable features in a natural way. In Schmid's approach, by contrast, detectable regions were selected heuristically and a model was built from the selected features. Weber [18] did use automatic feature selection, but it was not carried out in an object-normalised space (as in our approach), so no requirements on the spatial stability of features were specified. The relative spatial stability of the discriminative regions used in our method provides a natural, affine-invariant way of verifying the presence of a face in the image. Verification uses correspondences between points in the normalised object space and the image, as discussed in detail below.

3 Method Outline
Object detection is performed in three stages. First, the discriminative region detectors are applied to the image, yielding a set of candidate locations. In the second stage, the possible constellations (hypotheses) of discriminative regions are formed. In the third stage, the likelihood of each hypothesis is computed, and the best hypotheses are verified using the photometric information content of the test image. For algorithmic details see Section 4.3.

In the following sections we define several terms used in object recognition more formally. The main aim of these sections is to unify the different approaches and terminologies in the literature.

3.1 Object Classes
For our purposes, we define an object class as a collection of objects that share characteristic features, i.e. the objects are composed of several local parts, and these parts are in a specific spatial relationship. We assume that the local parts are directly detectable in the image and that their possible arrangements are given by geometric constraints. The geometric constraints should be invariant with respect to a predefined group of transformations.
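Here is a concrete example of a geometric constraint that is invariant to a predefined transformation group. The group is the similarity group (rotation, uniform scale, and translation), and the measurements are the angle and length ratio of a point triple, as Section 2 attributes to Schmid et al. [14,17]. This is an illustration only, not the measurement Matas et al. use, since their method works with affine correspondences. The point triple, the Gaussian mean and standard deviation, and the helper names are hypothetical.

```python
import math


def angle_and_ratio(p0, p1, p2):
    """Angle at p0 (radians) between p0->p1 and p0->p2, and the ratio |p0p2| / |p0p1|."""
    v1 = (p1[0] - p0[0], p1[1] - p0[1])
    v2 = (p2[0] - p0[0], p2[1] - p0[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, cos_a))), n2 / n1


def gaussian_consistent(value, mean, std, k=2.0):
    """Accept a measurement that lies within k standard deviations of its Gaussian model."""
    return abs(value - mean) <= k * std


def similarity(p, scale, theta, tx, ty):
    """Apply a rotation by theta, a uniform scaling, and a translation to point p."""
    x, y = p
    return (scale * (math.cos(theta) * x - math.sin(theta) * y) + tx,
            scale * (math.sin(theta) * x + math.cos(theta) * y) + ty)


# Hypothetical triple: left eye, right eye, mouth centre.
triple = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
moved = [similarity(p, 2.5, 0.7, 10.0, -4.0) for p in triple]
print(angle_and_ratio(*triple), angle_and_ratio(*moved))
```

The two printed pairs agree up to rounding. Rotating, scaling, or shifting the triple leaves the angle and ratio unchanged, which is why such measurements can be modelled by fixed Gaussians.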
Under these assumptions about local parts and their geometric constraints, the task of discriminating between two classes can be reduced to measuring the differences between local parts and their geometric relationships.

3.2 Discriminative Regions
Imagine you are presented with two images depicting objects from one class and are asked to mark corresponding points in the image pair. We would argue that the task is extremely hard unless distinguished regions are present in the two images. Two views of a white featureless wall, a patch of grass, a sea surface, or an ant hill are good examples. However, on most objects we find surface patches that can be separated from their surroundings and are detectable over a wide range of views. Before proceeding further, we give a more formal definition of a distinguished region:

Definition 1. A Distinguished Region (DR) is any subset of an image that is a projection of a part of a scene (an object) possessing a distinguishing property that allows its detection (segmentation, figure-ground separation) over a range of viewing and illumination conditions.

In other words, DR detection must be repeatable and stable with respect to viewpoint and illumination changes. DRs are referred to in the literature as 'interest points' [7], 'features' [1], or 'invariant regions' [16]. Note that we do not require DRs to have a transformation-invariant property that is unique in the image. If a DR possessed such a property, finding its corresponding DR in another image would be greatly simplified. To make this more likely, DRs can be equipped with a characterisation computed on associated measurement regions:

Definition 2. A Measurement Region (MR) is any subset of an image defined by a transformation-invariant construction (projective, affine, or similarity invariant) from one or more regions (more than one in the case of grouping).

The distinction between DRs and MRs is important, yet the literature does not make it explicit. DRs are projections of the same part of an object in both views, and MRs are defined in a transformation-invariant manner, so MRs are quasi-viewpoint-invariant. In the simplest and most common case, the MR is the DR itself. An MR may also be constructed, for example, as the convex hull of a DR, a fitted ellipse (affinely invariant [16]), a line segment between a pair of interest points [15], or any region defined in DR-derived coordinates. Of course, invariant measurements from one or even several MRs associated with a DR will not guarantee a unique match on, e.g., repetitive patterns. However, a DR characterisation by invariants computed on MRs is often unique or almost unique.

Note that any set of pixels, not necessarily contiguous, can possess a distinguishing property. Many perceptual grouping processes detect such arrangements. For example, a set of (unconnected) edges lying along a straight line forms a DR of maximum edge density. This property is viewpoint quasi-invariant and detectable by the Hough transform. 'Distinguished pixel set' [10] would be a more precise term, but it is cumbersome.

The definition of "local part" (sometimes also called "feature", "object component", etc.) is very vague in the recent literature, and for our purposes it must be defined more precisely. In the following discussion we use the term "discriminative region" instead of "local part". This emphasises the difference between our definition of a discriminative region and the usual sense of a local part: a discriminative region is a local part with special properties important for its detection and recognition.

Definition 3. A Discriminative Region is any subset of an image defined by discriminative descriptors computed on a measurement region. Discriminative descriptors must have the following properties:
• Stability under changes of imaging conditions. A discriminative region must be detectable over a wide range of imaging conditions (viewpoint, illumination). This property is guaranteed by the definition of a DR.
• Good intra-category localisation. The variation in the position of the discriminative region in the object coordinate system should be small for different objects in the same category.
• Uniqueness. Only a small number of similar discriminative regions should be present in the image, on both the object and the background.
• High incidence. The discriminative region should be detectable in a high proportion of objects from the same category.

Note that there is a trade-off between the ability to localise objects and the ability to discriminate between them. A very discriminative part can be a strong cue even if it appears at an arbitrary location on the surface of the object. A less discriminative part, on the other hand, can only contribute information if it occurs in a stable spatial relationship relative to other parts.

3.3 Combining Evidence
This stage of the detection process is important. It significantly influences the overall performance of the system and makes the system robust to arbitrary geometric transformations. We combine the evidence from the detected discriminative regions in a novel way that differs significantly from the approaches of Schmid et al. [14,17] and Burl et al. [4,5,6].

In most approaches, a shape model is built over the placement of particular discriminative regions. If an admissible configuration of these regions is found in an image, an instance of the object is hypothesised to be present. This means that all the information conveyed by the area between the detected discriminative regions is discarded. Imagine a collage consisting of one eye, a nostril, and a mouth corner placed in a plausible arrangement on a black background. It would still be detected as a face, since no other parts of the image are needed to accept the "face-present" hypothesis.

In our approach, the geometric constraints are modelled probabilistically in terms of the spatial coordinates of discriminative regions. However, these geometric constraints are used only to define possible positions (hypotheses) of the object in the image. The final decision about whether the object is present is deduced from the photometric information content of the original image.

4 Experiment
We carried out the face localisation experiment [2] on the XM2VTS database [11]. To verify the correctness of our localisation framework, we made several simplifications to the general scheme. In the experiment, the discriminative regions were semi-automatically defined as the eye corners, the eye centres, the nostrils, and the mouth corners.

4.1 Detector of discriminative regions
We use the improved Harris corner detector [8] as the distinguished region detector. Our implementation [2] of the detector is relatively insensitive to illumination changes, because the threshold is computed automatically from the neighbourhood of the interest point. Such a corner detector is not, in general, invariant to scale changes, so we search for interest points at several scales. We have observed [2] that the distribution of interest points coincides with the manually labelled points. This suggests that these points can define discriminative regions (here we assume that humans often identify interest points as the most discriminative parts of an object). We further assume that the training database covers all potential in-plane face rotations and differences in face scale.

The MRs were defined very simply, as rectangular regions centred at the interest points. We selected ten positions, which we denote below as regions 1–10:
• the left eye centre,
• the right eye centre,
• the right left-eye corner,
• the left left-eye corner,
• the right right-eye corner,
• the left right-eye corner,
• the left nostril,
• the right nostril,
• the left mouth corner,
• the right mouth corner.

All properties of a discriminative region are then determined by the size of the region. As the descriptor of a region, we use the normalised colour information of all points in the region. Each region was modelled by a uni-modal Gaussian in a low-dimensional subspace. Whether a sample belongs to the face class is decided from the distance between the sample and the mean for the given region. This distance is measured as the sum of the distance in subspace (DISS) and the distance from subspace (DFSS) (Moghaddam et al. [12]).

4.2 Combining Evidence
The proposed method is based on finding correspondences between generic face features that lie in the face space (the discriminative regions) and the face features detected in an image. These correspondences are then used to estimate the transformation that a generic face may have undergone. So far, we have used the correspondence of three points to estimate a four- or six-parameter affine transformation.

Once the transformation from the face space to the image space has been determined, verifying a "face-present" hypothesis becomes easy. The inverse transformation (from the image space into the face space) is computed, and the image patch containing the three points of correspondence is transformed into the face space. The decision about the "face-present" hypothesis is then made in the face space. There, all the variations introduced by the geometric transformation are compensated, or at least reduced to a negligible extent. So far, only affine transformations are assumed to be admissible transformations of a generic face.
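The correspondence-based estimation just described can be sketched in Python for the six-parameter case. This is a minimal sketch under stated assumptions, not the authors' code. It works in three steps:
• It solves for the exact affine map from three non-collinear point pairs using Cramer's rule.
• It decomposes the 2-by-2 linear part into rotation, two scales, and a shear. This is one of several possible decompositions.
• It tests the resulting parameters against limits.

The 50% to 150% scale and 30-degree rotation limits follow the example the paper gives for admissible transformations. The shear limit of 0.2 and all function names are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve m @ v = rhs for a 3x3 system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(_det3(mc) / d)
    return solution


def affine_from_3(src, dst):
    """Six-parameter affine (a, b, c, d, tx, ty) mapping three src points onto dst.

    u = a*x + b*y + tx,  v = c*x + d*y + ty
    """
    m = [[x, y, 1.0] for x, y in src]
    a, b, tx = _solve3(m, [u for u, _ in dst])
    c, d, ty = _solve3(m, [v for _, v in dst])
    return a, b, c, d, tx, ty


def decompose(params):
    """Write [[a, b], [c, d]] as R(angle) @ [[sx, shear], [0, sy]]; angle in degrees."""
    a, b, c, d, _, _ = params
    angle = math.atan2(c, a)
    sx = math.hypot(a, c)
    shear = math.cos(angle) * b + math.sin(angle) * d
    sy = (a * d - b * c) / sx                  # det = sx * sy; negative means a reflection
    return sx, sy, shear, math.degrees(angle)


def is_admissible(params, min_scale=0.5, max_scale=1.5, max_angle=30.0, max_shear=0.2):
    """Reject transformations outside the scale, rotation, and (hypothetical) shear limits."""
    sx, sy, shear, angle = decompose(params)
    return (min_scale <= sx <= max_scale and min_scale <= sy <= max_scale
            and abs(angle) <= max_angle and abs(shear) <= max_shear)
```

Here `src` holds face-space points and `dst` the matching image points, so the result is the transformation the generic face underwent. Its inverse would map the image patch back into face space for verification. Testing the decomposed parameters first is what allows most correspondences to be discarded cheaply.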
The distance from a generic face class [12] is computed for the transformed patch, and a threshold determines whether the patch belongs to the face class. Moreover, many candidate face patches need not be verified at all, because constraints can be placed on the estimated transformation. Imagine, for instance, that the only feasible transformations of a face are scaling from 50% to 150% of its original size in the face space and rotations of up to 30 degrees. This is quite a reasonable limitation, and it causes most of the correspondences to be discarded without a costly verification in the face space. In our experiments, pruning removed about 70% of them. In the case of the six-parameter affine transform, both shear and anisotropic scaling are included as admissible transformations.

4.3 Algorithm summary
Algorithm 1: Detection of human faces
1. Detection of the distinguished regions. For each image from the test set, detect the distinguished regions using the illumination-invariant version of the Harris detector.
2. Detection of the discriminative regions. Use the PCA-based classifier in colour space to assign each detected distinguished region to one of the ten discriminative region classes (in practice, the eye corners, the eye centres, the nostrils, and the mouth corners). Distinguished regions that do not belong to any of the predefined classes are discarded.
3. Combination of evidence.
• Compute the estimate of the transformation from the image space to the face space using the correspondences between the three points in the face space and in the image space.
• Decompose this transformation into rotation, scale, translation, and possibly shear. Test whether these parameters lie within predefined constraints, i.e. decide whether the transformation is admissible.
• If the transformation derived from the correspondences is admissible, transform into the face space the image patch defined by the transformed face outline.
4. Verification. Verify the "face present" hypothesis using a PCA-based classifier.

4.4 Results
Table 1 summarises the results of the discriminative region detectors. Because the classifier is very simple, its performance is not very high. However, even with such a simple detector of discriminative regions, the system can detect faces with very low error, since only a small number of successfully detected discriminative regions (in our case, only three) is needed.

Table 1. Performance of discriminative region detectors
Region      False negative (%)   False negative (#)   False positive (%)   False positive (#)
Region 1    31.89                191                  72.26                3831
Region 2    10.68                64                   37.88                1342
Region 3    57.76                346                  33.03                433
Region 4    54.92                329                  19.85                218
Region 5    15.03                90                   22.34                538
Region 6    13.69                82                   62.33                3260
Region 7    15.53                93                   4.00                 78
Region 8    12.52                75                   5.07                 104
Region 9    48.75                292                  6.27                 70
Region 10   33.56                201                  14.90                233

Several extensive experiments were conducted. Image patches were declared "face" when their score, based on the Mahalanobis distance, lay below a certain threshold. As mentioned earlier, 200 images from the XM2VTS database were used to train a grayscale classifier based on the Moghaddam method [12]. The detection rate on the XM2VTS database reached 98% (see Fig. 1 for examples). Faces in several images with cluttered backgrounds were also successfully detected, as shown in Fig. 2.

Figure 1. Experiment results (panels: correctly detected; false rejections).

Figure 2. Experiments with cluttered background.

5 Discussion and Future Work
We proposed a method for face detection using discriminative regions. The detector performs very well when the general face detection problem is constrained by assuming a particular camera and pose. We also assumed that parts that appear distinctive to a human observer will also be discriminative, and so we selected the discriminative regions manually. In general, however, distinctiveness cannot be assumed to correlate with discriminativeness, so the discriminative regions should be "learned" from the training images. This paper addressed the training problem only partially. As an alternative, the method proposed by Weber et al. [18] could be used.

The admissible transformations of a face have so far been restricted to affine transformations. Nevertheless, the results showed that high detection performance can be achieved even in this simple case. Future modifications will use more complex transformations, such as general non-rigid transformations. The PCA-based classification can be replaced by more powerful classifiers, such as neural networks or support vector machines.

6 Conclusion
In this paper, we proposed a novel framework for face detection. The framework is based on the idea that most real objects can be decomposed into a collection of local parts tied together by geometric constraints on their spatial arrangement. By exploiting this fact, face detection can be treated as the recognition of local image patches (photometric information) in a given configuration (geometric constraints). In our approach, discriminative regions serve as preliminary evidence that dramatically reduces the search time. This evidence is used to generate a normalised version of the image patch, which is then used to verify the "face present" hypothesis.

We applied the proposed method to the problem of face detection, and the results of extensive experiments are very promising. The experiments demonstrated that the method can solve a rather difficult problem in computer vision. Moreover, we showed that even simple recognition methods, whose capability is limited when used alone, can be combined into a powerful framework able to tackle a task as difficult as face detection.

References
[1] A. Baumberg. Reliable feature matching across widely separated views. In Proc. of Computer Vision and Pattern Recognition, pages I:774–781, 2000.
[2] P. Bílek, J. Matas, M. Hamouz, and J. Kittler. Detection of human faces from discriminative regions. Technical Report VSSP-TR-2/2001, Department of Electronic & Electrical Engineering, University of Surrey, 2001.
[3] R. Brunelli and T. Poggio. Face recognition: Features vs. templates. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(10):1042–1053, 1993.
[4] M. C. Burl, T. K. Leung, and P. Perona. Face localization via shape statistics. In Proc. of International Workshop on Automatic Face and Gesture Recognition, pages 154–159, 1995.
[5] M. C. Burl and P. Perona. Recognition of planar object classes. In Proc. of Computer Vision and Pattern Recognition, pages 223–230, 1996.
[6] M. C. Burl, M. Weber, and P. Perona. A probabilistic approach to object recognition using local photometry and global geometry. In Proc. of European Conference on Computer Vision, pages 628–641, 1998.
[7] Y. Dufournaud, C. Schmid, and R. Horaud. Matching images with different resolutions. In Proc. of Computer Vision and Pattern Recognition, pages I:612–618, 2000.
[8] C. J. Harris and M. Stephens. A combined corner and edge detector. In Proc. of Alvey Vision Conference, pages 147–151, 1988.
[9] M. Lades, J. C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R. P. Würtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. on Computers, 42(3):300–311, 1993.
[10] J. Matas, M. Urban, and T. Pajdla. Unifying view for wide-baseline stereo. In B. Likar, editor, Computer Vision Winter Workshop, pages 214–222, Ljubljana, Slovenia, February 2001. Slovenian Pattern Recognition Society.
[11] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre. XM2VTSDB: The extended M2VTS database. In R. Chellappa, editor, Second International Conference on Audio and Video-based Biometric Person Authentication, pages 72–77, Washington, USA, March 1999. University of Maryland.
[12] B. Moghaddam and A. Pentland. Probabilistic visual learning for object detection. In Proc. of International Conference on Computer Vision, pages 786–793, 1995.
[13] A. Mohan, C. Papageorgiou, and T. Poggio. Example-based object detection in images by components. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(4):349–361, 2001.
[14] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(5):530–535, 1997.
[15] D. Tell and S. Carlsson. Wide baseline point matching using affine invariants computed from intensity profiles. In Proc. of European Conference on Computer Vision, pages 754–760, 2000.
[16] T. Tuytelaars and L. van Gool. Wide baseline stereo matching based on local, affinely invariant regions. In Proc. of British Machine Vision Conference, pages 412–422, 2000.
[17] V. Vogelhuber and C. Schmid. Face detection based on generic local descriptors and spatial constraints. In Proc. of International Conference on Computer Vision, pages I:1084–1087, 2000.
[18] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition. In Proc. of European Conference on Computer Vision, pages 18–32, 2000.
[19] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(7):775–779, 1997.
[20] A. L. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3(1):59–70, 1991.

The Top 30 list follows (PER = citations per year); additions and corrections are welcome:
[1] Rapid Object Detection using a Boosted Cascade of Simple Features (Citations: 3296, PER=299.64) Paul A. Viola, Michael J. Jones @CVPR, vol. 1, pp. 511-518, 2001
[2] Histograms of Oriented Gradients for Human Detection (Citations: 1704, PER=243.43) Navneet Dalal, Bill Triggs @CVPR, vol. 1, pp. 886-893, 2005
[3] SURF: Speeded-Up Robust Features (Citations: 1054, PER=175.67) Herbert Bay, Tinne Tuytelaars, Luc J. Van Gool @ECCV, pp. 404-417, 2006
[4] Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Citations: 873, PER=145.5) Svetlana Lazebnik, Cordelia Schmid, Jean Ponce @CVPR, vol. 2, pp. 2169-2178, 2006
[5] Object Class Recognition by Unsupervised Scale-Invariant Learning (Citations: 1071, PER=119) Robert Fergus, Pietro Perona, Andrew Zisserman @CVPR, vol. 2, pp. 264-271, 2003
[6] Robust Real-Time Face Detection (Citations: 1092, PER=99.27) Paul A. Viola, Michael J. Jones @ICCV, 2001
[7] A Bayesian hierarchical model for learning natural scene categories (Citations: 677, PER=96.71) Fei-Fei Li, Pietro Perona @CVPR, vol. 2, pp. 524-531, 2005
[8] Scalable Recognition with a Vocabulary Tree (Citations: 570, PER=95) David Nistér, Henrik Stewénius @CVPR, vol. 2, pp. 2161-2168, 2006
[9] Real-Time Tracking of Non-Rigid Objects Using Mean Shift (Citations: 1132, PER=94.33) Dorin Comaniciu, Visvanathan Ramesh, Peter Meer @CVPR, vol. 2, pp. 142-149, 2000
[10] Visual Categorization with Bags of Keypoints (Citations: 745, PER=93.13) Gabriella Csurka, Christopher R. Dance, Lixin Fan, et al. @ECCV, 2004
[11] Video Google: A Text Retrieval Approach to Object Matching in Videos (Citations: 790, PER=87.78) Josef Sivic, Andrew Zisserman @ICCV, pp. 1470-1477, 2003
[12] What Energy Functions Can Be Minimized via Graph Cuts? (Citations: 842, PER=84.2) Vladimir Kolmogorov, Ramin Zabih @ECCV, pp. 65-81, 2002
[13] Overview of the Face Recognition Grand Challenge (Citations: 578, PER=82.57) P. Jonathon Phillips, Patrick J. Flynn, W. Todd Scruggs, et al. @CVPR, vol. 1, pp. 947-954, 2005
[14] Robust wide baseline stereo from maximally stable extremal regions (Citations: 810, PER=81) Jiri Matas, Ondrej Chum, Martin Urban, et al. @BMVC, vol. 1, 2002
[15] PCA-SIFT: A More Distinctive Representation for Local Image Descriptors (Citations: 639, PER=79.88) Yan Ke, Rahul Sukthankar @CVPR, vol. 2, pp. 506-513, 2004
[16] Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images (Citations: 796, PER=72.36) Yuri Y. Boykov, Marie-Pierre Jolly @ICCV, pp. 105-112, 2001
[17] An extended set of Haar-like features for rapid object detection (Citations: 710, PER=71) Rainer Lienhart, Jochen Maydt @ICIP, vol. 1, pp. 900-903, 2002
[18] A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics (Citations: 750, PER=68.18) David R. Martin, Charless Fowlkes, Doron Tal, et al. @ICCV, pp. 416-425, 2001
[19] Detecting Pedestrians Using Patterns of Motion and Appearance (Citations: 584, PER=64.89) Paul A. Viola, Michael J. Jones, Daniel Snow @ICCV, pp. 734-741, 2003
[20] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary (Citations: 603, PER=60.3) Pinar Duygulu, Kobus Barnard, João F. G. De Freitas, et al. @ECCV, pp. 97-112, 2002
[21] Real-Time Simultaneous Localisation and Mapping with a Single Camera (Citations: 527, PER=58.56) Andrew J. Davison @ICCV, pp. 1403-1410, 2003
[22] Recognizing Human Actions: A Local SVM Approach (Citations: 440, PER=55) Christian Schüldt, Ivan Laptev, Barbara Caputo @ICPR, pp. 32-36, 2004
[23] Actions as Space-Time Shapes (Citations: 379, PER=54.14) Moshe Blank, Lena Gorelick, Eli Shechtman, et al. @ICCV, vol. 2, pp. 1395-1402, 2005
[24] A Discriminatively Trained, Multiscale, Deformable Part Model (Citations: 215, PER=53.75) Pedro F. Felzenszwalb, David A. McAllester, Deva Ramanan @CVPR, pp. 1-8, 2008
[25] Non-parametric Model for Background Subtraction (Citations: 642, PER=53.5) Ahmed M. Elgammal, David Harwood, Larry S. Davis @ECCV, pp. 751-767, 2000
[26] A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms (Citations: 318, PER=53) Steven M. Seitz, Brian Curless, James Diebel, et al. @CVPR, vol. 1, pp. 519-528, 2006
[27] Comprehensive Database for Facial Expression Analysis (Citations: 636, PER=53) Takeo Kanade, Yingli Tian, Jeffrey F. Cohn @FG, pp. 46-53, 2000
[28] Learning Realistic Human Actions from Movies (Citations: 211, PER=52.75) Ivan Laptev, Marcin Marszalek, Cordelia Schmid, et al. @CVPR, pp. 1-8, 2008
[29] Object Retrieval with Large Vocabularies and Fast Spatial Matching (Citations: 258, PER=51.6) James Philbin, Ondrej Chum, Michael Isard, et al. @CVPR, 2007
[30] Statistical Shape Influence in Geodesic Active Contours (Citations: 616, PER=51.33) Michael E. Leventon, W. Eric L. Grimson, Olivier D. Faugeras @CVPR, vol. 1, pp. 1316-1323, 2000

Profile-Based 3D Face Registration and Recognition
Chao Li¹ and Armando Barreto¹,²
¹ Electrical and Computer Engineering Department, Florida International University, 33174 Miami, USA
{cli006,barretoa}@
² Biomedical Engineering Department, Florida International University, 33174 Miami, USA
Abstract. With the rapid development of 3D imaging technology, face recognition using 3D range data has become another alternative in the field of biometrics. Unlike face recognition using 2D intensity images, which has been studied intensively by many researchers since the 1960s, 3D range data records the exact geometry of a person and is invariant with respect to illumination changes of the environment and orientation changes of the person. This paper proposes a new algorithm to register and identify 3D range faces. Profiles and contours are extracted for the matching of a probe face with available gallery faces. Different combinations of profiles are tried for the purpose of face recognition using a set of 27 subjects. Our results show that the central vertical profile is one of the most powerful profiles to characterize individual faces and that the contour is also a potentially useful feature for face recognition.
Keywords: 2D, 3D, biometrics, contour, face, intensity, moment, profile, range, recognition, registration.
1 Introduction
Face recognition has been widely studied during the last two decades. It is a branch of biometrics, which studies the process of automatically associating an identity with an individual by means of some inherent personal characteristics [1]. Biometric characteristics include something that a person is or produces. Examples of the former are fingerprints, the iris, the face, the hand/finger geometry or the palm print, etc. The latter include voice, handwriting, signature, etc. [2].
Compared with other biometric characteristics, the face is considered to be the most immediate and transparent biometric modality for physical authentication applications. Despite its intrinsic complexity, face-based authentication still remains of particular interest because it is perceived psychologically and physically as noninvasive. Significant motivations for its use include the following [2]:
– Face recognition is a modality that humans largely depend on to authenticate other humans.
– Face recognition is a modality that requires no or only weak cooperation to be useful.
– Face authentication can be advantageously included in multimodal systems, not only for authentication purposes but also to confirm the aliveness of the signal source of fingerprints, voice, etc.
The definition of face recognition was formulated in [3] as: “Given an image of a scene, identify one or more persons in the scene using a stored database of faces.” This is called the ‘one to many’ problem, or identification problem, in face recognition. Another kind of problem is ‘one to one’, i.e., the authentication problem: determining whether the input face really belongs to the person he or she claims to be. In this paper, we deal with face recognition in the first scenario.
The potential field of application of face recognition is very wide, mostly in areas such as authentication, security and access control, which include physical access control and logical access control. Especially in recent years, anti-terrorism has been a major issue throughout the world, and face recognition will play an increasingly important role in these efforts.
In the last ten years, most of the research work in the area of face recognition used two-dimensional images, that is, gray level images taken by a camera. Many new techniques emerged in this field and achieved good recognition rates. A number of these techniques are outlined in survey publications, such as [5].
However, most 2D face recognition systems are sensitive to illumination changes or orientation changes of the subjects. All these problems result from the incomplete information about a face contained in a 2D image. On the other hand, a 3D scan of a subject’s face has complete geometric information about the face, even including texture information in the case of some scanners. It is believed that, on average, 3D face recognition methods will achieve higher recognition rates than their 2D counterparts. With the rapid development of 3D imaging technology, 3D face recognition will attract more and more attention. In [6], Bowyer provides a survey of 3D face recognition technology. Some of the techniques are derived from 2D face recognition, such as Principal Component Analysis (PCA), used in [7, 8] to extract features from faces. Some of the techniques are unique to 3D face recognition, such as the geometry matching method in [9], the profile matching proposed in [10, 11] and the isometric transformation method presented in [4].
This paper outlines a new algorithm used to register 3D face images automatically. Specific profiles are defined in the registered faces and these are used for matching against the faces in a database including 27 subjects. The impact of using different types of profiles for matching is studied. The possibility of using the contour of a face as a feature for face recognition is also explored.
The structure of the paper is as follows: Section 2 describes the database used for this research. Section 3 presents the registration algorithm, and Section 4 outlines the matching procedure using different profiles and contours and gives the results of the experiments. Section 5 is the conclusion.
2 3D Face Database
Unlike 2D face recognition research, for which there are numerous databases available on the Internet, there are only a few 3D face databases available to researchers. Examples are the Biometrics Database from the University of Notre Dame [12] and the University of South Florida (USF) face
database [13]. In our experiment, the USF database is used. The USF database of human 3D face images is maintained by researchers in the Department of Computer Science at the University of South Florida, and sponsored by the Defense Advanced Research Projects Agency (DARPA). The USF database has a total of 111 subjects (74 male; 37 female). All subjects have a neutral facial expression. Some of the subjects were scanned multiple times. In our experiment, the 3D faces of the subjects who were scanned multiple times are considered, so that one scan can be used as a gallery image, i.e., one of the faces that are assumed to be prerecorded, and the remaining scans from the same subject can be used as probe images, i.e., faces to be identified. A subset of 27 subjects is used in this research, with 27 faces in the gallery and 27 scans to be identified (probe faces).
Fig. 1. Rendered 3D face image (left) and triangulated mesh 3D face image (right)
The 3D scans in the USF database were acquired using a Cyberware 3030 scanner. This scanner incorporates a rugged, self-contained optical range-finding system, whose dynamic range accommodates varying lighting conditions and surface properties [14]. The faces in the database were converted into Stereolithography (STL) format.
Each face has an average of 18,000 vertices and 36,000 triangles. Figure 1 shows a face from the database in its rendered and triangulated mesh forms.
3 Registration and Preprocessing
In 3D face recognition, registration is a key pre-processing step. Registration may be crucial to the efficiency of some matching methods. Earlier work used principal curvature and Gaussian curvature to segment the face surface and register it, such as the methods in [9, 10, 15]. The disadvantage of using curvatures to register faces is that this process is very computationally intensive and requires very accurate range data [16]. Another method often used involves choosing several user-selected landmark locations on the face, such as the tip of the nose and the inner and outer corners of the eyes, and then using an affine transformation to register the face to a standard position [7, 8, 11].
A third method performs registration by using moments. The matrix in Equation (1), constituted by the six second moments of the face surface, $m_{200}$, $m_{020}$, $m_{002}$, $m_{110}$, $m_{101}$ and $m_{011}$, contains the rotational information of the face [17].
$$M = \begin{bmatrix} m_{200} & m_{110} & m_{101} \\ m_{110} & m_{020} & m_{011} \\ m_{101} & m_{011} & m_{002} \end{bmatrix} \tag{1}$$
$$U \Delta U^{T} = \mathrm{SVD}(M) \tag{2}$$
By applying the Singular Value Decomposition (Equation 2), the unitary matrix $U$ represents the rotation and the diagonal matrix $\Delta$ represents the scale for the three axes. $U$ can be used as an affine transformation matrix on the original face surface. The problem with this method is that during repeated scans of the same subject, besides the changes in the face area, there are also some changes outside the face area, such as the different caps worn by the subjects during the scanning process (Fig. 1). These additional changes also affect the registration of the face surface, so that the registrations of different instances of the same subject are not the same. This limitation restricts this approach to the early stages of registration.
Figure 2 is an example of a scanned face rendered in a Cartesian coordinate system, with the X axis corresponding to the
depth direction of the face, the Y axis corresponding to the length direction of the face and the Z axis corresponding to the width direction of the face. In the registration process, we assume that each subject kept his head upright during scanning, so that the face orientation around the X axis does not need to be corrected, but the orientation changes around the Y and Z axes need to be compensated for.
The proposed registration algorithm does not require user-defined landmark locations and can be done automatically. First, the tip of the nose is found by looking for the point with the maximum value in the X direction. Then a ‘cutting plane’ parallel to the XZ plane is set to contain the tip of the nose (Fig. 3). The intersection of this cutting plane with the face defines the horizontal profile curve. In effect, the result is a discretized curve with a spacing of 1.8 mm between samples (Fig. 4).
Fig. 2. Face surface in a Cartesian coordinate system (the units on the three axes are mm)
Fig. 3. Illustration of the extraction of the horizontal profile
Fig. 4. Discrete horizontal profile before registration
A trilinear interpolation method is used to find the value of each point in this profile (Fig. 5). The point P is in the YZ plane. P′ is the intersection between the triangle ABC and the straight line PP′, which is normal to the YZ plane. The length of PP′ is the profile value corresponding to point P.
Fig. 5. Trilinear interpolation to get exact values of profile elevations
Fig. 6. Horizontal profile after registration around the Y axis
Next, the following cost function is minimized with respect to $\alpha$, where $I$ is the index of the maximum point of $X$:
$$E = \sum_{i=1}^{15} \left[ X(I+i) - X(I-i) \right]^2 \tag{3}$$
For every $\alpha$, the affine transformation is applied to the face surface using the following transformation matrix, and the horizontal profile is found as illustrated before:
$$T = \begin{bmatrix} \cos\alpha & 0 & -\sin\alpha \\ 0 & 1 & 0 \\ \sin\alpha & 0 & \cos\alpha \end{bmatrix} \tag{4}$$
$$\alpha = \arg\min_{\alpha} \sum_{i=1}^{15} \left[ X(I+i) - X(I-i) \right]^2 \tag{5}$$
The final value of $\alpha$ represents the orientation change around the Y axis required for the
registration. Figure 6 shows the horizontal profile seen in Figure 4 after the Y-axis adjustment has been performed.
Typically, a rotational adjustment around the Z axis will also be required. Analogous to Figure 3, Figure 7 shows the intersection of the face surface with a cutting plane which is parallel to the XY plane and passes through the tip of the nose. This intersection is the central vertical profile. Similar to Figure 4, Figure 8 shows the discretized central vertical profile before adjustment.
Fig. 7. Illustration of the extraction of the central vertical profile
The cost function to be minimized in this case is the following:
$$E = \left| X(I-50) - X(I+40) \right| \tag{6}$$
Minimization is with respect to $\alpha$. $I$ is the index of the profile point with the largest value of $X$.
$$\alpha = \arg\min_{\alpha} \left| X(I-50) - X(I+40) \right| \tag{7}$$
For every $\alpha$, the affine transformation is applied, using the following transformation matrix:
$$T = \begin{bmatrix} \sin\alpha & 0 & \cos\alpha \\ \cos\alpha & 0 & -\sin\alpha \\ 0 & 1 & 0 \end{bmatrix} \tag{8}$$
The aim is to equalize the X coordinates of two critical points contained in the central vertical profile: the end point on the forehead side and the end point on the chin side. Figure 9 is the central vertical profile after adjustment around the Z axis.
Fig. 8. Discretized central profile before registration
Fig. 9. Central vertical profile after registration
Fig. 10. Mesh plot of the range image (left) and gray level image plot of range data (right)
To complete the registration process, a grid of 91 by 81 points is prepared that corresponds to pairs of (y, z) coordinates. The point (51, 41) of the grid is made to coincide with the tip of the nose in the adjusted face surface. This grid assumes a spacing of 1.8 mm in both the Y and Z directions, with 91 points in the length direction and 81 points in the width direction. The value associated with each point in the grid is the distance between the point on the face surface and the corresponding location on the YZ plane, calculated by trilinear interpolation (Fig. 5). The values are offset so that the value corresponding to the tip of the nose is normalized to 100 mm. Values
below 20 mm in the grid area are thresholded to 20 mm. Figure 10 shows a Matlab mesh plot of the resulting grid and a gray level plot of the same range image.
4 Recognition Experiments and Results
For the experiments described here, a gallery database of 27 range images of 27 subjects (one for each subject) and a probe database of 27 different scans of the same 27 subjects were used. The time interval between the acquisition of the gallery image and the corresponding probe image for a given subject ranges from several months to one year.
The use of profile matching as a means for face recognition is a very intuitive idea that has been proposed in the past. In [10, 11, 18, 19], different researchers explored the profile matching method in different ways. In our research, because the range image has already been obtained, profile extraction is simple. We have, in fact, tested the efficiency of several potential profile combinations used for identification. Besides profiles, the contour of a face was also tested for its potential applicability to face recognition. In our experiment, a frontal contour defined 30 mm behind the tip of the nose was extracted for each scan. Although some researchers [19] used the Hausdorff distance to compute the distance or dissimilarity between profiles, we found that the Euclidean distance is suitable for the context of our experiment.
The following six different feature combinations and direct range image matching variations were tested with the experimental data described above:
(a) Central vertical profile alone.
(b) Central horizontal profile alone.
(c) Contour, which is 30 mm behind the tip of the nose.
(d) Central vertical profile and two horizontal profiles. The two horizontal profiles are defined at 18 mm and 36 mm above the tip of the nose. The distance between central profiles is given a weight of 0.7; the two horizontal profile distances are given a weight of 0.15 each, towards the overall matching score for identification.
(e) Central vertical profile and two more vertical profiles, one
passing 18 mm to the left of the central profile, the other passing 18 mm to the right of the central profile. The distance between central profiles is given a weight of 0.7; the other two vertical profile distances are given a weight of 0.15 each.
(f) Using the entire range image.
From the results in Figure 11, we can see that scheme (a), i.e., matching the central vertical profile alone, has the highest recognition rate. On the other hand, using the whole range image for matching yields the lowest recognition rate. Because the probe image was taken several months to one year after the gallery image, we have sufficient reason to assume there were changes in the face of every subject. The high recognition rate using the central vertical profile suggests that this profile has the most distinctive properties between different subjects and is the most consistent through time for the same subject. These observations concur with a similar analysis presented in [11]. Besides the central vertical profile, the contour of a face also shows its potential as a feature to be used in face recognition.
5 Conclusion
In this paper, a new registration algorithm for 3D face range data was proposed. This algorithm is valid under some constraints, i.e., only orientation changes along the width direction and the length direction of the face need to be compensated. However, this algorithm can also be extended to register arbitrarily oriented face surfaces in 3D space when combined with simple registration algorithms that use the six second moments of the face surface.
Also in this paper, face identification based on profile matching was explored.
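As a rough illustration of scheme (d) and the Euclidean-distance matching described in Section 4, the Python sketch below combines per-profile Euclidean distances with the 0.7/0.15/0.15 weights and picks the nearest gallery face (one-to-many identification). It assumes every profile has already been registered and resampled at the same positions. The function names are hypothetical, and the random data is a synthetic stand-in, not the USF scans.

```python
import numpy as np

# Scheme (d): weights for [central vertical, horizontal at 18 mm, horizontal at 36 mm]
WEIGHTS = (0.7, 0.15, 0.15)


def profile_distance(a, b):
    """Euclidean distance between two profiles sampled at the same positions."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def combined_distance(probe, candidate, weights=WEIGHTS):
    """Weighted sum of per-profile Euclidean distances (lower means more similar)."""
    return sum(w * profile_distance(p, c) for w, p, c in zip(weights, probe, candidate))


def identify(probe, gallery, weights=WEIGHTS):
    """One-to-many identification: ID of the gallery face with the smallest distance."""
    return min(gallery, key=lambda sid: combined_distance(probe, gallery[sid], weights))


def rank1_rate(probes, gallery, weights=WEIGHTS):
    """Fraction of probes whose best gallery match carries the same ID."""
    hits = sum(identify(profiles, gallery, weights) == sid for sid, profiles in probes.items())
    return hits / len(probes)


# Synthetic stand-in data: 27 subjects, 3 profiles of 60 samples each,
# probes are the gallery profiles plus small noise.
rng = np.random.default_rng(0)
gallery = {f"s{k:02d}": [rng.normal(size=60) for _ in range(3)] for k in range(27)}
probes = {sid: [p + 0.1 * rng.normal(size=60) for p in profs] for sid, profs in gallery.items()}
print(rank1_rate(probes, gallery))
```

Scheme (a) corresponds to the weights `(1.0,)` with a single central profile per face, since `zip` stops at the shortest sequence.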
Different combinations of profiles for matching were compared. It was found that the central vertical profile is the feature that best represented the intrinsic characteristics of each face and had the highest identification value among all the profile combinations tested. The contour of a face also has the potential to be used as one of the features in face recognition.
Acknowledgments
This work was sponsored by NSF grants IIS-0308155 and HRD-0317692. The participation of Mr. Chao Li in this research was made possible through the support of his Florida International University Presidential Fellowship.
References
1. A. K. Jain, R. Bolle and S. Pankanti, Biometrics: Personal Identification in Networked Society. 1999, Norwell, MA: Kluwer.
2. J. Ortega-Garcia, J. Bigun, D. Reynolds and J. Gonzalez-Rodriguez, Authentication gets personal with biometrics. IEEE Signal Processing Magazine, 2004. 21(2): p. 50-61.
3. R. Chellappa, C. Wilson and S. Sirohey, Human and Machine Recognition of Faces: A Survey. Proceedings of the IEEE, 1995. 83(5): p. 705-740.
4. A. Bronstein, M. Bronstein and R. Kimmel, Expression-invariant 3D face recognition. In Proceedings of Audio- and Video-Based Biometric Person Authentication (AVBPA), 2003: p. 62-69.
5. W. Zhao, R. Chellappa and A. Rosenfeld, Face recognition: a literature survey. ACM Computing Surveys, 2003. 35: p. 399-458.
6. K. Bowyer, K. Chang and P. Flynn, A Survey of Approaches to 3D and Multi-Modal 3D+2D Face Recognition. In Proceedings of IEEE International Conference on Pattern Recognition, 2004: p. 358-361.
7. K. Chang, K. Bowyer and P. Flynn, Multimodal 2D and 3D biometrics for face recognition. In Proceedings of ACM Workshop on Analysis and Modeling of Faces and Gestures, 2003: p. 25-32.
8. C. Hesher, A. Srivastava and G. Erlebacher, A novel technique for face recognition using range images. In Proceedings of Seventh International Symposium on Signal Processing and Its Applications, 2003.
9. G. Gordon, Face recognition based on depth maps and surface curvature. In Geometric Methods in Computer Vision, SPIE, July 1991: p. 1-12.
10. J. Y. Cartoux, J. T. Lapresté and M. Richetin, Face authentication or recognition by profile extraction from range images. In Proceedings of the Workshop on Interpretation of 3D Scenes, 1989: p. 194-199.
11. T. Nagamine, T. Uemura and I. Masuda, 3D facial image analysis for human identification. In Proceedings of International Conference on Pattern Recognition (ICPR), 1992: p. 324-327.
12. K. Bowyer, University of Notre Dame Biometrics Database. /%7Ecvrl/UNDBiometricsDatabase.html
13. K. Bowyer, S. Sarkar, USF 3D Face Database. f.edu/HumanID/2001.
15. H. Tanaka, T. Ikeda, Curvature-based face surface recognition using spherical correlation: principal directions for curved object recognition. In Proceedings of the 13th International Conference on Pattern Recognition, 1996: p. 638-642.
16. Y. Wu, G. Pan and Z. Wu, Face Authentication Based on Multiple Profiles Extracted from Range Data. In Proceedings of 4th International Conference on Audio- and Video-Based Biometric Person Authentication, 2003: p. 515-522.
17. M. Elad, A. Tal and S. Ar, Content Based Retrieval of VRML Objects: An Iterative and Interactive Approach. In Proceedings of the 6th Eurographics Workshop on Multimedia, 2001.
18. C. Beumier, M. Acheroy, Face verification from 3D and grey level clues. Pattern Recognition Letters, 2001. 22(12): p. 1321-1329.
19. G. Pan, Y. Wu and Z. Wu, Investigating Profile Extraction from Range Data for 3D Face Recognition. In Proceedings of 2003 IEEE International Conference on Systems, Man and Cybernetics: p. 1396-1399.
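The Y-axis registration step of Section 3 (Eqs. (3) to (5)) can be sketched in Python as a grid search over α. This is a simplified, hypothetical version, not the authors' code: it works on an already extracted, densely sampled horizontal profile rather than on the mesh, uses linear interpolation instead of the triangle-based interpolation of Fig. 5, and centres the 15 samples on each side (1.8 mm apart) on the sample with maximum depth X. The parabola used as input is synthetic.

```python
import numpy as np


def rotate_about_y(x, z, alpha):
    """Rotation of Eq. (4) restricted to the XZ plane (y is unchanged).

    x is depth and z is width, following the paper's axis convention.
    """
    c, s = np.cos(alpha), np.sin(alpha)
    return c * x - s * z, s * x + c * z


def symmetry_cost(x, z, n=15, step=1.8):
    """Eq. (3): E = sum_{i=1..n} [X(I+i) - X(I-i)]^2.

    Assumption: X is resampled by linear interpolation every `step` mm on
    either side of the sample with maximum depth (the nose tip, index I).
    """
    order = np.argsort(z)
    z, x = z[order], x[order]
    z_tip = z[np.argmax(x)]
    offsets = step * np.arange(1, n + 1)
    right = np.interp(z_tip + offsets, z, x)
    left = np.interp(z_tip - offsets, z, x)
    return float(np.sum((right - left) ** 2))


def register_about_y(x, z, alphas):
    """Eq. (5): grid search for the alpha that minimises the symmetry cost."""
    costs = [symmetry_cost(*rotate_about_y(x, z, a)) for a in alphas]
    return float(alphas[int(np.argmin(costs))])


z = np.linspace(-40.0, 40.0, 801)        # width axis, 0.1 mm spacing
x = 50.0 - 0.02 * z**2                    # synthetic, symmetric "nose" profile
x_t, z_t = rotate_about_y(x, z, 0.1)      # simulate a head turned by 0.1 rad
alpha = register_about_y(x_t, z_t, np.linspace(-0.25, 0.25, 501))
print(round(alpha, 3))                    # close to -0.1, undoing the turn
```

The Z-axis step of Eqs. (6) and (7) follows the same pattern with a different cost, so a finer grid or a scalar optimizer could replace the brute-force search once the cost is known to be well behaved.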



Abstract
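In the highly competitive setting of automated industrial production, machine vision plays a pivotal role in safeguarding product quality, and its use in defect detection is becoming increasingly widespread. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient and safer. Textured objects are common in industrial production: the substrates and light-emitting diodes used in semiconductor assembly and packaging, the printed circuit boards in modern electronic systems, and the cloth and fabrics of the textile industry can all be regarded as objects with texture features. This thesis focuses on defect detection techniques for textured objects, with the aim of providing efficient and reliable detection algorithms for their automated inspection.
Texture is an important feature for describing image content, and texture analysis has been applied successfully to texture segmentation and texture classification. This study proposes a defect detection algorithm based on texture analysis and comparison with a reference. The algorithm tolerates image registration errors caused by object distortion and is robust to the influence of texture. It is designed to give the detected defect regions rich and meaningful physical attributes, such as their size, shape, brightness contrast and spatial distribution. Moreover, when a reference image is available, the algorithm can inspect both homogeneously and non-homogeneously textured objects, and it also performs well on non-textured objects.
Throughout the detection process, we use steerable-pyramid texture analysis and reconstruction. Unlike conventional wavelet-based texture analysis, we add tolerance-control algorithms in the wavelet domain that handle object distortion and texture influence, so that the method tolerates object distortion and is robust to texture. Finally, reconstruction from the steerable pyramid ensures that the physical attributes of the defect regions are recovered accurately. In the experiments, we tested a series of images with practical application value. The results show that the proposed defect detection algorithm for textured objects is efficient and easy to implement.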
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
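The abstract describes detection by comparison with a reference image in a way that tolerates small misregistration. The Python sketch below illustrates only that idea, in the pixel domain: each test pixel is compared with the reference over a small window of shifts and the smallest difference is kept, then large residuals are reported as defects with a few region statistics. It is a simplified stand-in, not the thesis's steerable-pyramid method; `max_shift`, `threshold`, the function names and the stripe texture are assumptions.

```python
import numpy as np


def tolerant_difference(test, reference, max_shift=1):
    """Per-pixel minimum |test - reference| over reference shifts of up to
    `max_shift` pixels in each direction, so small misregistrations are ignored."""
    test = np.asarray(test, dtype=float)
    ref = np.pad(np.asarray(reference, dtype=float), max_shift, mode="edge")
    h, w = test.shape
    best = np.full(test.shape, np.inf)
    for dy in range(2 * max_shift + 1):
        for dx in range(2 * max_shift + 1):
            shifted = ref[dy:dy + h, dx:dx + w]
            best = np.minimum(best, np.abs(test - shifted))
    return best


def detect_defects(test, reference, threshold, max_shift=1):
    """Binary defect mask plus simple region statistics (size, extent, contrast)."""
    diff = tolerant_difference(test, reference, max_shift)
    mask = diff > threshold
    stats = {"area": int(mask.sum())}
    if stats["area"]:
        rows, cols = np.nonzero(mask)
        stats["bbox"] = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
        stats["mean_contrast"] = float(diff[mask].mean())
    return mask, stats


# Synthetic stripe texture; the test image is shifted by one pixel and has a flaw.
big = np.tile(((np.arange(66) // 4) % 2) * 200.0, (64, 1))
reference = big[:, 1:65]
test = big[:, :64].copy()
test[20:26, 30:36] = 100.0
mask, stats = detect_defects(test, reference, threshold=50)
print(stats)
```

The one-pixel shift between `test` and `reference` produces no false alarms here, while a plain `abs(test - reference)` would flag every stripe edge.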



Step 2) Feature extraction for the training set (database) and, at the same time, for the input image
Feature extraction can provide effective information. In those pictures, for example, a birthmark under the right eye helps establish that they show the same person.
A computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source.
Processing Flow
Face recognition for payment
On Sunday, during a CeBIT event, Alibaba Group founder Jack Ma showed off a technology that would seamlessly scan users’ faces via their smartphones to verify mobile payments. The technology, called “Smile to Pay,” is being developed
Face Recognition Access Control System
Face recognition access: whenever someone wishes to enter a building, FaceGate verifies the person’s entry code or card, then compares his or her face with its stored “key.” If they match, it registers the person as authorized and allows him or her to enter the building. Access is denied to anyone whose face does not match.
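The access-control flow above is a one-to-one (verification) check: the entry code or card selects a single stored template, and the live face is compared only with that template. A minimal Python sketch, assuming faces have already been turned into fixed-length feature vectors by some extractor; `verify_access`, the card IDs, the vectors and the threshold are all hypothetical, not FaceGate's actual interface.

```python
import numpy as np


def verify_access(probe_features, enrolled_templates, claimed_id, threshold):
    """One-to-one check (hypothetical API):
    1. look up the template stored for the ID read from the code or card;
    2. compare the probe face features with that single template;
    3. grant access only if the Euclidean distance is within `threshold`.
    """
    template = enrolled_templates.get(claimed_id)
    if template is None:            # unknown code or card: deny
        return False
    probe = np.asarray(probe_features, dtype=float)
    distance = np.linalg.norm(probe - np.asarray(template, dtype=float))
    return bool(distance <= threshold)


enrolled = {"card-001": [0.10, 0.80, 0.30], "card-002": [0.90, 0.20, 0.50]}
print(verify_access([0.12, 0.79, 0.31], enrolled, "card-001", threshold=0.1))  # True
print(verify_access([0.12, 0.79, 0.31], enrolled, "card-002", threshold=0.1))  # False
```

Because only one template is compared, the cost does not grow with the number of enrolled people, which is the practical difference from one-to-many identification.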
















Motion-based tracking relies on motion detection techniques, which can be divided into optical-flow methods and motion-energy methods.
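Of the two families named above, the motion-energy approach is the simpler to sketch. The Python example below builds a binary motion-energy image by frame differencing: pixels whose intensity changes by more than a threshold between consecutive frames are marked, and the union over the sequence gives the region swept by the moving object, whose bounding box can seed a tracker. The threshold and function names are assumptions; optical-flow methods are not shown.

```python
import numpy as np


def motion_energy_image(frames, threshold):
    """Binary motion-energy image: pixels whose intensity changed by more than
    `threshold` between any two consecutive frames of the sequence."""
    frames = np.asarray(frames, dtype=float)      # shape (T, H, W)
    diffs = np.abs(np.diff(frames, axis=0))       # |I_t - I_(t-1)|, shape (T-1, H, W)
    return (diffs > threshold).any(axis=0)        # union of per-step motion masks


def motion_bbox(mask):
    """Bounding box (row0, col0, row1, col1) of the moving region, or None."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())


# A bright 2x2 square moving one pixel to the right per frame.
frames = np.zeros((3, 10, 10))
for t in range(3):
    frames[t, 4:6, t:t + 2] = 255.0
mei = motion_energy_image(frames, threshold=30)
print(int(mei.sum()), motion_bbox(mei))   # 8 (4, 0, 5, 3)
```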




Many of the methods of this type found in the current literature stem from the work of Kass et al. on snakes (active contour models) [5].







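Keywords: 3D face recognition; geometric features; depth image; LBP operator; FisherFace. CLC number: TP391; Document code: A; Article ID: 1009-3044(2013)08-1864-05
1 Overview
After more than half a century of development, face recognition algorithms based on 2D images have achieved considerable research results.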













Key points of facial recognition algorithms
English answer:
Facial recognition algorithms leverage computer vision and machine learning techniques to detect, analyze, and map facial features. These algorithms are often used in security, surveillance, and access control systems, as well as in consumer applications such as photo tagging and social media filters.
The process of facial recognition typically involves the following steps:
1. Face Detection: The algorithm identifies the presence of a face in an image or video frame.
2. Feature Extraction: Key facial features, such as the eyes, nose, mouth, and other unique characteristics, are extracted and mapped.
3. Feature Representation: The extracted features are converted into a numerical representation that can be easily compared and processed.
4. Matching: The feature representation of an unknown face is compared to a database of known faces to identify potential matches.
Types of Facial Recognition Algorithms:
There are several different types of facial recognition algorithms, including:
Local Binary Patterns (LBP): LBP algorithms analyze the texture of facial features by comparing the brightness value of each pixel with those of its neighbors.
Scale Invariant Feature Transform (SIFT): SIFT algorithms detect and describe key points in an image, which are then used for matching.
Histogram of Oriented Gradients (HOG): HOG algorithms build histograms of gradient orientations over local image regions, which are then used as features.
Convolutional Neural Networks (CNNs): CNNs are deep learning algorithms that have achieved state-of-the-art performance in facial recognition tasks.
Accuracy and Limitations:
The accuracy of facial recognition algorithms depends on factors such as the quality of the input image, the algorithm itself, and the size and diversity of the training dataset.
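The LBP entry in the list of algorithm types can be made concrete with the basic 3×3 operator. In this Python sketch (the exact variant is an assumption, since the text does not specify one), each of the eight neighbours sets a bit when it is at least as bright as the centre pixel, bits are read clockwise from the top-left neighbour, and a normalised 256-bin histogram of the codes serves as a texture descriptor. Practical face systems usually compute such histograms per image block and concatenate them.

```python
import numpy as np

# Neighbour offsets read clockwise from the top-left; the first is the most significant bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Basic 3x3 LBP code for every interior pixel of a grayscale image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= centre).astype(np.uint8) << (7 - bit)
    return codes


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes (a texture descriptor)."""
    codes = lbp_image(img)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


patch = np.array([[9, 0, 0], [0, 5, 0], [0, 0, 0]])
print(lbp_image(patch))   # [[128]]: only the top-left neighbour is >= the centre
```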
While facial recognition algorithms have made significant progress in recent years, they are not yet foolproof and can be susceptible to errors caused by factors such as facial expressions, lighting conditions, and aging.
Ethical Considerations:
The use of facial recognition algorithms raises ethical concerns related to privacy, surveillance, and bias. It is crucial to ensure that these algorithms are used in a responsible and transparent manner, with appropriate safeguards in place to protect individual rights and freedoms.
Chinese answer: Key points of facial recognition algorithms.



Method of Face Recognition Based on Red-Black Wavelet Transform and PCA
Yuqing He, Huan He, and Hongying Yang
Department of Opto-Electronic Engineering, Beijing Institute of Technology, Beijing, P.R. China, 100081
20701170@


With the development of man-machine interfaces and recognition technology, face recognition has become one of the most important research topics in the field of biometric recognition. PCA (Principal Component Analysis) has now been applied to recognition on many face databases and has achieved good results. However, PCA has its limitations: a heavy computational load and low discriminative ability.

In view of these limitations, this paper puts forward a face recognition method based on the red-black wavelet transform and PCA. An improved histogram equalization is used for image preprocessing to compensate for illumination. Then, the red-black wavelet sub-band that contains the information of the original image is used to extract features and perform matching.
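The improved histogram equalization is not specified in this excerpt, so the Python sketch below shows only standard global histogram equalization for an 8-bit grayscale image, the baseline such an improvement would start from: the cumulative histogram is rescaled into a lookup table that spreads the occupied intensities over 0 to 255. The function name is an assumption.

```python
import numpy as np


def equalize_histogram(img):
    """Standard global histogram equalization for an 8-bit grayscale image."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]   # count of the darkest occupied level
    if cdf[-1] == cdf_min:                 # constant image: nothing to stretch
        return img.copy()
    # Map each gray level through the normalised cumulative histogram.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


face = np.array([[50, 50], [100, 150]], dtype=np.uint8)
print(equalize_histogram(face))
```

A single global lookup table cannot correct illumination that varies across the face, which is one reason variants such as block-wise or adaptive equalization are often preferred in face preprocessing.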



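, $$U = [U_1, \ldots, U_N] \tag{3}$$
Then the projection of the training image set $X$ onto the eigenface subspace is:
$$\rho_{PCA} = W_{eig} X^{T} \tag{4}$$
Therefore, the subspace determined by the first $m$ principal axes can reconstruct the original data as faithfully as possible. The purpose of introducing principal components is to reduce dimensionality, so how large should $m$ be? $m$ can be determined from the contribution rate or the cumulative contribution rate: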
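The formula that should follow the colon is missing from this excerpt. A common definition of the cumulative contribution rate is $\eta(m) = \sum_{i=1}^{m}\lambda_i / \sum_{i=1}^{N}\lambda_i$, where $\lambda_i$ are the covariance eigenvalues in descending order, and $m$ is chosen as the smallest value for which $\eta(m)$ reaches a preset threshold. The Python sketch below assumes this definition and a 0.95 threshold; it stores one image per row, so the projection is written as $(X - \bar{x})W$ rather than in the $W_{eig} X^{T}$ form of Eq. (4). Function names are hypothetical.

```python
import numpy as np


def pca_with_contribution(X, rate=0.95):
    """Eigenface-style PCA that chooses m by cumulative contribution rate.

    X: (n_samples, n_features), one flattened face image per row.
    Returns the mean face, the first m principal axes (as columns) and m.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Squared singular values of the centred data are proportional to the
    # covariance eigenvalues, already sorted in descending order.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s ** 2
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m = min(int(np.searchsorted(cumulative, rate)) + 1, len(s))
    return mean, Vt[:m].T, m


def project(X, mean, W):
    """Coordinates of each (centred) sample in the subspace spanned by W."""
    return (np.asarray(X, dtype=float) - mean) @ W


# Synthetic data whose variance is concentrated in two directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1])
mean, W, m = pca_with_contribution(X, rate=0.95)
print(m, W.shape, project(X, mean, W).shape)
```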
1 Face Recognition Based on PCA and ICA
