Face recognition: A hybrid neural network approach
Commonly used feature extraction methods include Local Binary Patterns (LBP), Principal Component Analysis (PCA), and deep learning methods.
Commonly used classifiers include Support Vector Machines (SVM), Nearest Neighbor (NN), and Deep Neural Networks (DNN).
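To make the first of these feature extractors concrete, the sketch below computes the LBP code of the centre pixel of a 3×3 grayscale patch: each of the eight neighbours contributes a 1 bit when it is at least as bright as the centre. The function name `lbp_code`, the clockwise neighbour order starting at the top-left, and the sample patch are assumptions chosen for illustration; LBP variants in the literature differ in ordering, radius, and thresholding.

```python
# Clockwise neighbour offsets (row, col), starting at the top-left corner.
# This ordering is an illustrative assumption; other orderings are common.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(patch):
    """Return the 8-bit LBP code for the centre of a 3x3 grayscale patch."""
    centre = patch[1][1]
    code = 0
    for d_row, d_col in NEIGHBOUR_OFFSETS:
        neighbour = patch[1 + d_row][1 + d_col]
        bit = 1 if neighbour >= centre else 0
        # The first neighbour visited becomes the most significant bit.
        code = (code << 1) | bit
    return code


# Hypothetical sample patch of gray levels.
SAMPLE_PATCH = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]

if __name__ == "__main__":
    print(lbp_code(SAMPLE_PATCH))
```

In a full pipeline the codes of all pixels in a region would typically be collected into a histogram, and that histogram would serve as the feature vector passed to a classifier such as an SVM or nearest-neighbour matcher.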
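II. Application areas of multimodal fusion technology
1. Security: Multimodal fusion technology is widely used in security protection.
2. Access management: Multimodal fusion technology also plays an important role in access management.
3. Financial payments: Multimodal fusion technology can be used for identity verification in financial payments.
4. Smart homes: Multimodal fusion technology has great application potential in smart homes.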
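1. "Face Recognition: A Literature Survey", by Rabia Jafri, Shehzad Tanveer, and Mubashir Ahmad. This survey reviews research in face recognition, including face detection, feature extraction, feature matching, and the performance evaluation of face recognition systems.
2. "Deep Face Recognition: A Survey", by Mei Wang and Weihong Deng. This survey focuses on the application of deep learning to face recognition. It describes in detail convolutional neural networks (CNNs) and their use in facial feature learning and face recognition.
3. "A Survey on Face Recognition: Advances and Challenges", by Anil K. Jain, Arun Ross, and Prabhakar. This survey reviews the advances and challenges in face recognition technology.
4. "Face Recognition Across Age Progression: A Comprehensive Survey", by Weihong Deng, Jiani Hu, and Jun Guo. This survey focuses on the problem of face recognition across age variation.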
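The core question is: how does the brain encode the image of a face? On June 1, Children's Day in China, The New York Times reported on an article published that Thursday in the journal Cell by two California Institute of Technology biologists, Le Chang and Doris Y. Tsao. According to the report, the Caltech team knows exactly which aspects of a face trigger the cells and how facial features are encoded.
Brad Duchaine, a face recognition expert at Dartmouth, said: "Cracking the face code would certainly be a big deal."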
English essay: Applications of facial recognition in China

Facial recognition technology has been rapidly advancing in recent years, and China has emerged as a global leader in the development and implementation of this innovative technology. China's vast population, coupled with its ambitious plans to build a comprehensive surveillance system, has made facial recognition a crucial component of the country's technological landscape. This essay will explore the various applications of facial recognition in China, its benefits, and the ethical concerns surrounding its use.

One of the primary applications of facial recognition in China is its integration into the country's extensive surveillance network. China has been investing heavily in building a nationwide network of surveillance cameras, with estimates suggesting that the country has over 200 million surveillance cameras installed, making it the world's largest video surveillance system. Facial recognition technology is used to identify and track individuals as they move through public spaces, providing the government with a powerful tool for monitoring and controlling its citizens.

The Chinese government has justified the use of facial recognition by claiming that it enhances public safety and security. The technology has been employed to identify and apprehend criminals, as well as to monitor the movements of individuals deemed to be potential threats to social stability. For example, the government has used facial recognition to track and monitor the Uyghur minority population in the Xinjiang region, a practice that has been widely criticized by human rights organizations as a violation of individual privacy and a form of ethnic discrimination.

In addition to its use in surveillance, facial recognition technology has also been integrated into various other aspects of daily life in China. The technology is widely used in mobile payment systems, allowing users to authenticate their identity and make payments using their facial features. This has led to a significant increase in the adoption of mobile payment platforms, such as Alipay and WeChat Pay, which have become ubiquitous in the country.

Furthermore, facial recognition has been implemented in various public services, such as accessing public transportation, entering office buildings, and even checking into hotels. This has led to increased efficiency and convenience for users, but it has also raised concerns about the potential for abuse and the erosion of personal privacy.

One of the most controversial applications of facial recognition in China is its use in the country's social credit system. The social credit system is a government-run initiative that aims to monitor and assess the behavior of Chinese citizens, with the goal of incentivizing "good" behavior and punishing "bad" behavior. Facial recognition is used to identify individuals and track their activities, which can then be used to assign them a social credit score. This score can have significant consequences, affecting an individual's access to various public services and opportunities.

The use of facial recognition in China's social credit system has been widely criticized by human rights organizations and international observers. They argue that the system represents a significant threat to individual privacy and civil liberties, as it gives the government unprecedented power to monitor and control its citizens.

Despite these concerns, the Chinese government has continued to invest heavily in the development and deployment of facial recognition technology. The country has become a global leader in this field, with Chinese companies such as Hikvision, Dahua, and SenseTime emerging as major players in the global facial recognition market.

The rapid advancement of facial recognition technology in China has also raised concerns about the potential for abuse and the erosion of individual privacy. There are fears that the technology could be used to suppress dissent, target minority groups, and create a highly invasive surveillance state. Moreover, the lack of robust privacy protections and oversight mechanisms in China has exacerbated these concerns.

In response to these concerns, the Chinese government has attempted to address some of the ethical issues surrounding the use of facial recognition. For example, the government has introduced regulations that require companies to obtain user consent before collecting and using facial recognition data. Additionally, the government has established guidelines for the ethical use of facial recognition technology, which include measures to protect individual privacy and prevent discrimination.

However, critics argue that these measures are largely inadequate and that the Chinese government's commitment to protecting individual privacy is questionable. They point to the government's continued use of facial recognition for surveillance and social control purposes as evidence of its prioritization of national security over individual rights.

In conclusion, the application of facial recognition technology in China is a complex and multifaceted issue. While the technology has brought about increased efficiency and convenience in various aspects of daily life, it has also raised significant ethical concerns about the potential for abuse and the erosion of individual privacy. As China continues to push the boundaries of this technology, it will be crucial for the government to strike a delicate balance between national security and individual rights, and to implement robust safeguards and oversight mechanisms to ensure the ethical and responsible use of facial recognition technology.
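1. Introduction to the FaceRecognition library
2. The classification algorithm of the FaceRecognition library
3. Applications of the FaceRecognition library's classification algorithm
[Introduction to the FaceRecognition library]
The FaceRecognition library is an open-source Python library for face recognition and face classification tasks.
It is built on the dlib library and uses a HOG feature extractor together with a support vector machine (SVM) for face classification.
[The classification algorithm of the FaceRecognition library]
The classification algorithm used with the FaceRecognition library is the support vector machine (SVM).
Here the SVM classifies face images, assigning each image to a category according to its features.
[Applications of the FaceRecognition library's classification algorithm]
The classification algorithm can be applied in many scenarios, such as face recognition access control systems, attendance systems, and snapshot capture systems.
In these systems, the library recognizes a face image and assigns it to a category using a pre-trained model, which enables the different functions.
The FaceRecognition library is a capable face recognition library, and the SVM classification algorithm used with it can classify face images accurately.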
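The SVM classification step described above can be sketched on toy data. The example below is not the FaceRecognition library's own code: it is a minimal linear SVM trained with sub-gradient descent on the hinge loss, written in plain Python so that it runs without dependencies. The function names, the hyperparameters, and the 2-D `FEATURES` (which stand in for real, much higher-dimensional face descriptors of two enrolled people) are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Train a linear SVM (hinge loss + L2 penalty) by sub-gradient descent.

    samples: list of feature tuples; labels: +1 or -1 per sample.
    Returns (weights, bias).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # L2 regularisation shrinks the weights on every step.
            weights = [w * (1.0 - lr * reg) for w in weights]
            # Hinge loss is active only when the sample is inside the margin.
            if y * score < 1.0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(model, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical 2-D descriptors: label +1 is person A, label -1 is person B.
FEATURES = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
            (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]
MODEL = train_linear_svm(FEATURES, LABELS)

if __name__ == "__main__":
    print(predict(MODEL, (2.5, 2.5)), predict(MODEL, (-2.5, -2.5)))
```

An access control or attendance system would apply the same decision to descriptors extracted from a live camera frame; in practice a tested SVM implementation from an established machine learning package would be preferable to this hand-written loop.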
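1. Asian face image database: The Asian face image database is a highly representative database that provides ample sample data for research on recognizing Asian faces.
2. Financial security: Asia's financial systems are highly developed, but financial security remains a persistent concern.
3. Face payment: Face payment is one of the more popular application scenarios for Asian facial recognition technology, especially in the Chinese market.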
Computer Science and Application, 2023, 13(3), 301-310. Published Online March 2023 in Hans. https:///journal/csa https:///10.12677/csa.2023.133029

HSANet: Hybrid Self-Attention Network Recognition Facial Micro Plastic Method

Pazilaiti Nuermaiti, Gulinazi Ailimujiang*
School of Network Security and Information Technology, Yili Normal University, Yining Xinjiang
Received: Feb. 5th, 2023; accepted: Mar. 3rd, 2023; published: Mar. 14th, 2023
*Corresponding author.

Abstract
Micro plastic surgery poses new challenges for face recognition technology in everyday use: because facial features change considerably, the rate at which the original face is correctly recognized is low. To address this, the experiment proposes a hybrid self-attention block structure for recognizing faces whose features have changed. For this purpose, a small-sample image dataset covering 26 classes of micro plastic surgery was built.

Keywords
Convolutional neural network, residual network, bottleneck block, self-attention, hybrid self-attention network
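Facial recognition technology in artificial intelligence
I. Introduction
Artificial intelligence (AI) is one of the most closely watched and widely discussed technologies in the world today, and its applications already span many industries.
III. Application areas of facial recognition technology
1. Security: Facial recognition is most widely applied in the security field.
2. Finance: In finance, facial recognition is applied mainly to fraud prevention, identity verification, and automated account opening.
3. Education: With the rapid growth of online and distance education, maintaining classroom discipline and preventing cheating in examinations have become a focus of attention.
4. Tourism: In tourism, facial recognition is used mainly for security management, visitor flow statistics, and personal itinerary management at tourist attractions, airports, and stations.
IV. Challenges facing facial recognition technology
1. Insufficient datasets: Training facial recognition models requires large numbers of face images, and existing datasets are often too small to support accurate models.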
Face Recognition
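2. LFW
Labeled Faces in the Wild is a well-known face image collection in face recognition research. Its images were collected from Yahoo! News: 13,233 images of 5,749 people, of whom 1,680 have two or more images and 4,069 have only one. Most images were found by the Viola-Jones face detector and then cropped to a fixed size; a small number were recovered manually from false positives.
3. FDDB
FDDB, short for Face Detection Data Set and Benchmark, is a public database maintained by the computer science department of the University of Massachusetts. It gives researchers worldwide a standard evaluation platform for face detection, covering faces in a wide range of poses in natural environments. As one of the most authoritative face detection benchmarks, FDDB uses 2,845 images containing 5,171 faces from the Faces in the Wild database as its test set, and results on its published evaluation set are taken to represent the state of the art in face detection.
4. 300-W
Facial landmark localization.
5. FRVT
The Face Recognition Vendor Test, established by the U.S. National Institute of Standards and Technology (NIST).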
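➢ Normalization
2 Grayscale conversion
Grayscale conversion is the process of turning a color image into a grayscale image. The color of each pixel in a color image is determined by its R, G, and B components, each of which can take values from 0 to 255, so the range of possible pixel colors is very large. A grayscale image is a special color image in which the R, G, and B components are equal, and working with it greatly reduces the amount of subsequent computation.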
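The conversion can be sketched in a few lines. The source does not say how the three components are combined, so the weighted sum below (the widely used luma weights 0.299, 0.587, 0.114) is an assumption, as are the function names; a plain average of R, G, and B is another common choice.

```python
def rgb_to_gray(pixel):
    """Map one (R, G, B) pixel, each component in 0-255, to a gray level 0-255.

    Uses the common luma weights 0.299 / 0.587 / 0.114 (an assumed choice).
    """
    red, green, blue = pixel
    return int(round(0.299 * red + 0.587 * green + 0.114 * blue))


def image_to_gray(image):
    """Convert a color image, given as rows of (R, G, B) tuples, to gray levels."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    tiny_image = [[(255, 255, 255), (0, 0, 0)],
                  [(255, 0, 0), (0, 0, 255)]]
    print(image_to_gray(tiny_image))
```

Each pixel shrinks from three values to one, which is where the reduction in later computation comes from.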
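Face images: Preprocessing
Preprocessing is an important step in the face recognition process. Because acquisition environments differ, input images may be affected by lighting and occlusion, so the captured samples can be defective.
2 Image preprocessing
➢ Grayscale conversion
➢ Geometric transformation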
Artificial Intelligence && Face Recognition
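Face recognition builds on computer image processing and biometric identification: it extracts facial feature information from images or video and compares it with known faces to establish each person's identity. It brings together artificial intelligence, machine learning, model theory, video and image processing, and other specialized technologies.
01 Face recognition: Applications
1 Application scenarios
At present, identity is mainly checked by scanning or photocopying the ID card and manually comparing the person with the card photo. The scan or photocopy serves only as a record and cannot effectively verify whether the card is genuine. To ensure that a genuine ID card is being used, some technical means is needed to check the card that the person presents.
School dormitories: entry by face scan. E-commerce sites: payment by face scan.
4 Face recognition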
Introduction to multimodal biometric technology in face recognition
1. Introduction
Face recognition, as an important biometric technology, is widely used in security, finance, healthcare, and other fields.
15 English-language references on face recognition

English answer:

1. Title: A Survey on Face Recognition Algorithms.
Abstract: Face recognition is a challenging task in computer vision due to variations in illumination, pose, expression, and occlusion. This survey provides a comprehensive overview of the state-of-the-art face recognition algorithms, including traditional methods like Eigenfaces and Fisherfaces, and deep learning-based methods such as Convolutional Neural Networks (CNNs).

2. Title: Face Recognition using Deep Learning: A Literature Review.
Abstract: Deep learning has revolutionized the field of face recognition, leading to significant improvements in accuracy and robustness. This literature review presents an in-depth analysis of various deep learning architectures and techniques used for face recognition, highlighting their strengths and limitations.

3. Title: Real-Time Face Recognition: A Comprehensive Review.
Abstract: Real-time face recognition is essential for various applications such as surveillance, access control, and biometrics. This review surveys the recent advances in real-time face recognition algorithms, with a focus on computational efficiency, accuracy, and scalability.

4. Title: Facial Expression Recognition: A Comprehensive Survey.
Abstract: Facial expression recognition plays a significant role in human-computer interaction and emotion analysis. This survey presents a comprehensive overview of facial expression recognition techniques, including traditional approaches and deep learning-based methods.

5. Title: Age Estimation from Facial Images: A Review.
Abstract: Age estimation from facial images has applications in various fields, such as law enforcement, forensics, and healthcare. This review surveys the existing age estimation methods, including both supervised and unsupervised learning approaches.

6. Title: Face Detection: A Literature Review.
Abstract: Face detection is a fundamental task in computer vision, serving as a prerequisite for face recognition and other facial analysis applications. This review presents an overview of face detection techniques, from traditional methods to deep learning-based approaches.

7. Title: Gender Classification from Facial Images: A Survey.
Abstract: Gender classification from facial images is a widely studied problem with applications in gender-specific marketing, surveillance, and security. This survey provides an overview of gender classification methods, including both traditional and deep learning-based approaches.

8. Title: Facial Keypoint Detection: A Comprehensive Review.
Abstract: Facial keypoint detection is a crucial step in face analysis, providing valuable information about facial structure. This review surveys facial keypoint detection methods, including traditional approaches and deep learning-based algorithms.

9. Title: Face Tracking: A Survey.
Abstract: Face tracking is vital for real-time applications such as video surveillance and facial animation. This survey presents an overview of face tracking techniques, including both model-based and feature-based approaches.

10. Title: Facial Emotion Analysis: A Literature Review.
Abstract: Facial emotion analysis has become increasingly important in various applications, including affective computing, human-computer interaction, and surveillance. This literature review provides a comprehensive overview of facial emotion analysis techniques, from traditional methods to deep learning-based approaches.

11. Title: Deep Learning for Face Recognition: A Comprehensive Guide.
Abstract: Deep learning has emerged as a powerful technique for face recognition, achieving state-of-the-art results. This guide provides a comprehensive overview of deep learning architectures and techniques used for face recognition, including Convolutional Neural Networks (CNNs) and Deep Residual Networks (ResNets).

12. Title: Face Recognition with Transfer Learning: A Survey.
Abstract: Transfer learning has become a popular technique for accelerating the training of deep learning models. This survey presents an overview of transfer learning approaches used for face recognition, highlighting their advantages and limitations.

13. Title: Domain Adaptation for Face Recognition: A Comprehensive Review.
Abstract: Domain adaptation is essential for adapting face recognition models to new domains with different characteristics. This review surveys various domain adaptation techniques used for face recognition, including adversarial learning and self-supervised learning.

14. Title: Privacy-Preserving Face Recognition: A Comprehensive Guide.
Abstract: Privacy concerns have arisen with the widespread use of face recognition technology. This guide provides an overview of privacy-preserving face recognition techniques, including anonymization, encryption, and differential privacy.

15. Title: The Ethical and Social Implications of Face Recognition Technology.
Abstract: The use of face recognition technology has raised ethical and social concerns. This paper explores the potential risks and benefits of face recognition technology, and discusses the implications for society.

Chinese answer:

1. Title: A Survey on Face Recognition Algorithms.
Hybrid Deep Learning for Face Verification

Yi Sun¹, Xiaogang Wang²,³, Xiaoou Tang¹,³
¹ Department of Information Engineering, The Chinese University of Hong Kong
² Department of Electronic Engineering, The Chinese University of Hong Kong
³ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
sy011@.hk xgwang@.hk xtang@.hk

Abstract

This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.

1. Introduction

Face recognition has been extensively studied in recent decades [29, 28, 30, 1, 16, 5, 33, 12, 6, 3, 7, 25, 34]. This paper addresses the key challenge of computing the similarity of two face images given their large intra-personal variations in poses, illuminations, expressions, ages, makeups, and occlusions. It becomes more difficult when faces to be compared are acquired in the wild.
We focus on the task of face verification, which aims to determine whether two face images belong to the same identity.

Figure 1: The hybrid ConvNet-RBM model. Solid and hollow arrows show forward and back propagation directions.

Existing methods generally address the problem in two steps: feature extraction and recognition. In the feature extraction stage, a variety of hand-crafted features are used [10, 22, 20, 6]. Although some learning-based feature extraction approaches are proposed, their optimization targets are not directly related to face identity [5, 13]. Therefore, the features extracted encode intra-personal variations. More importantly, existing approaches extract features from each image separately and compare them at later stages [8, 16, 3, 4]. Some important correlations between the two compared images have been lost at the feature extraction stage.

At the recognition stage, classifiers such as SVM are used to classify two face images as having the same identity or not [5, 24, 13], or other models are employed to compute the similarities of two face images [10, 22, 12, 6, 7, 25]. The purpose of these models is to separate inter-personal variations and intra-personal variations. However, all of these models have been shown to have shallow structures [2]. To handle large-scale data with complex distributions, a large amount of over-completed features may need to be extracted from the face [12, 7, 25]. Moreover, since the feature extraction stage and the recognition stage are separate, they cannot be jointly optimized. Once useful information is lost in feature extraction, it cannot be recovered in recognition.
On the other hand, without the guidance of recognition, the best way to design feature descriptors to capture identity information is not clear.

All of the issues discussed above motivate us to learn a hybrid deep network to compute face similarities. A high-level illustration of our model is shown in Figure 1. Our model has several unique features, as outlined below.

(1) It directly learns visual features from raw pixels under the supervision of face identities. Instead of extracting features from each face image separately, the model jointly extracts relational visual features from two face images in comparison. In our model, such relational features are first locally extracted with the automatically learned filter pairs (pairs of filters convolving with the two face images respectively as shown in Figure 1), and then further processed through multiple layers of the deep convolutional networks (ConvNets) to extract high-level and global features. The extracted features are effective for computing the identity similarities of face images.

(2) Considering the regular structures of faces, the deep ConvNets in our model locally share weights in higher convolutional layers, such that different mid- or high-level features are extracted from different face regions, which is contrary to conventional ConvNet structures [18], and can greatly improve their fitting and generalization capabilities.

(3) The deep and wide architecture of our hybrid network can handle large-scale face data with complex distributions.
The deep ConvNets in our network have four convolutional layers (followed by max-pooling) and two fully-connected layers. In addition, multiple groups of ConvNets are constructed to achieve good robustness and characterize face similarities from different aspects. Predictions from multiple ConvNet groups are pooled hierarchically and then associated by the top-layer RBM for the final inference.

(4) The feature extraction and recognition stages are unified under a single network architecture. The parameters of the entire pipeline (weights and biases in all the layers) are jointly optimized for the target of face verification.

2. Related work

All existing methods for face verification start by extracting features from two faces in comparison separately. A variety of low-level features are commonly used [27, 10, 22, 33, 20, 6], including the hand-crafted features like LBP [23] and its variants [32], SIFT [21], Gabor [31] and the learned LE features [5]. Some methods generated mid-level features [24, 13] with variants of convolutional deep belief networks (CDBN) [19] or ConvNets [18]. They are not learned with the supervision of identity matching. Thus variations other than identity are encoded in the features, such as poses, illumination, and expressions, which constitute the main impediment to face recognition.

Many face recognition models are shallow structures, and need high-dimensional over-completed feature representations to learn the complex mappings from pairs of noisy features to face similarities [12, 7, 25]; otherwise, the models may suffer from inferior performance. Many methods [5, 24, 13] used linear SVM to make the same-or-different verification decisions. Li et al. [20] and Chen et al. [6, 7] factorized the face images as identity variations plus variations within the same identity, and assumed each factor as a Gaussian distribution for closed form solutions.
Huang et al. [12] and Simonyan et al. [25] learn linear transformations via metric learning.

Some methods further learn high-level features based on low-level hand-crafted features [16, 3, 4]. They are outputs of classifiers that are trained to distinguish faces of different people. All these methods extract features from a single face separately, and the comparison of two face images is deferred to the later recognition stage. Some identity information may have been lost in the feature extraction stage, and it cannot be retrieved in the recognition stage, since the two stages are separated in the existing methods. To avoid the potential information loss and make a reliable decision, a large amount of high-level feature extractors may need to be trained [3, 4].

There are a few methods that also used deep models for face verification [8, 24, 13], but extracted features independently from each face. Thus relations between the two faces are not modeled at their feature extraction stages. In [34], face images under various poses and lighting conditions were transformed to a canonical view with a convolutional neural network. Then features are extracted from the transformed images. In contrast, we deal with face pairs directly by extracting relational visual features from the two compared faces. The top layer RBM in our model is similar to that of the deep belief net (DBN) proposed by Hinton and Osindero [11]. However, we use ConvNets instead of a stack of RBMs in the lower layers to take the local correlation in images into consideration. Averaging the results of multiple ConvNets has been shown to be an effective way of improving performance [9, 15], while we will show that our hybrid structure is significantly better than the simple averaging scheme. Moreover, unlike most existing face recognition pipelines, in which each stage is optimized independently, our hybrid ConvNet-RBM model is jointly optimized after pre-training each part separately, which further enhances its performance.

3. The hybrid ConvNet-RBM model

3.1. Architecture overview

We detect the two eye centers and mouth center with the facial point detection method proposed by Sun et al. [26]. Faces are aligned by similarity transformation according to the three points. Figure 2 is an overview of our hybrid ConvNet-RBM model, which is a cascade of deep ConvNet groups, two levels of average pooling, and Classification RBM.

Figure 2: Architecture of the hybrid ConvNet-RBM model. Neuron (or feature) number is marked beside each layer.

Figure 3: The structure of one ConvNet. The map numbers and dimensions of the input layer and all the convolutional and max-pooling layers are illustrated as the length, width, and height of cuboids. The 3D convolution kernel sizes of the convolutional layers and the pooling region sizes of the max-pooling layers are shown as the small cuboids and squares inside the large cuboids of maps respectively. Neuron numbers of other layers are marked beside each layer.

The lower part of our hybrid model contains 12 groups, each of which contains five ConvNets. Figure 3 shows the structure of one ConvNet. Each ConvNet takes a pair of aligned face regions as input. Its four convolutional layers (followed by max-pooling) extract the relational features hierarchically. Finally, the extracted features pass a fully connected layer and are fully connected to a single neuron in layer L0 (shown in Figure 2), which indicates whether the two regions belong to the same person. The input region pairs for ConvNets in different groups differ in terms of region ranges and color channels (shown in Figure 4) to make their predictions complementary. When the size of the input regions changes in different groups, the map sizes in the following layers of the ConvNets will change accordingly. Although ConvNets in the same group take the same kind of region pair as input, they are different in that they are trained with different bootstraps of the training data (Section 4.1). Each input region pair generates eight modes by exchanging the two regions and
horizontally flipping each region (shown in Figure 5). When the eight modes (shown as M1-M8 in Figure 2) are input to the same ConvNet, eight outputs are obtained. Layer L0 contains the outputs of all the 5 × 12 ConvNets and therefore has 8 × 5 × 12 neurons. The purpose of bootstrapping and data augmentation is to achieve robustness of predictions.

Figure 4: Twelve face regions used in our network. P1-P4 are global regions covering the whole face, of size 39 × 31. P1 and P2 (P3 and P4) differ slightly in the ranges of regions. P5-P12 are local regions covering different face parts, of size 31 × 47. P1, P2, and P5-P8 are in color. P3, P4, and P9-P12 are in gray values.

Figure 5: 8 possible modes for a pair of face regions.

The group prediction is given by two levels of average pooling of ConvNet predictions. Layer L1 (with 5 × 12 neurons) is formed by averaging the eight predictions of the same ConvNet from eight different input modes. Layer L2 (with 12 neurons) is formed by averaging the five neurons in L1 associated with the same group. The prediction variance is greatly reduced after average pooling.

The top layer of our model in Figure 2 is a Classification RBM [17]. It merges the 12 group outputs in L2 to give the final prediction. The RBM has two outputs that indicate the probability distribution over the two classes; that is, whether they are the same person.

The large number of deep ConvNets means that our model has a high capacity. Directly optimizing the whole network would lead to severe over-fitting. Therefore, we first train each ConvNet separately. Then, by fixing all the ConvNets, the RBM is trained. All the ConvNets and the RBM are trained under supervision with the aim of predicting whether two faces in comparison belong to the same person. These two steps initialize the model to be near a good local minimum. Finally, the whole network is fine-tuned by back-propagating errors from the top-layer RBM to all the lower-layer ConvNets.

3.2. Deep ConvNets

A pair of gray regions forms two input maps of a ConvNet (Figure 5), while a pair of color regions forms six input maps, replacing
each gray map with three maps from RGB channels. The input regions are stacked into multiple maps instead of being concatenated to form one map, which enables the ConvNet to model the relations between the two regions from the first convolutional stage.

Our deep ConvNets contain four convolutional layers (followed by max-pooling). The operation in each convolutional layer can be expressed as

$$y_j^r = \max\Big(0,\; b_j^r + \sum_i k_{ij}^r * x_i^r\Big), \qquad (1)$$

where $*$ denotes convolution, $x_i$ and $y_j$ are the $i$-th input map and the $j$-th output map respectively, $k_{ij}$ is the convolution kernel (filter) connecting the $i$-th input map and the $j$-th output map, and $b_j$ is the bias for the $j$-th output map. $\max(0, \cdot)$ is the non-linear activation function, and is operated element-wise. Neurons with such non-linearities are called rectified linear units [15]. Moreover, weights of neurons (including convolution kernels and biases) in the same map in higher convolutional layers are locally shared. $r$ indicates a local region where weights are shared. Since faces are structured objects, locally sharing weights in higher layers allows the network to learn different high-level features at different locations. We find that sharing in this way can significantly improve the fitting and generalization abilities of the network. The idea of locally sharing weights was proposed by Huang et al. [13]. However, their model is much shallower than ours and the gained improvement is small.

Since each stage extracts features from all the maps in the previous stage, relations between the two face regions are modeled; see Figure 6 for examples. As the network goes deeper, more global and higher-level relations between the two regions are modeled. These high-level relational features make it possible for the top layer neurons in ConvNets to predict the high-level concept of whether the two input regions come from the same person. The network output is a two-way softmax,

$$y_i = \frac{\exp(x_i)}{\sum_{j=1}^{2} \exp(x_j)} \quad \text{for } i = 1, 2,$$

where $x_i$ is the total input to an output neuron $i$, and $y_i$ is its output. It represents a probability distribution over the two classes (being the same person or not). Such a probability distribution makes it valid to directly average multiple ConvNet outputs without scaling. The ConvNets are trained by minimizing $-\log y_t$, where $t \in \{1, 2\}$ denotes the target class. The loss is minimized by stochastic gradient descent, where the gradient is calculated by back-propagation.

Figure 6: Examples of the learned 4 × 4 filter pairs of the first convolutional layer of ConvNets taking color (line 1) and gray (line 2) input region pairs, respectively. The upper and lower filters in each pair convolve with the two face regions in comparison, respectively, and the results are added. For filter pairs in which one filter varies greatly while the other remains near uniform (column 1, 2), features are extracted from the two input regions separately. For those pairs in which both filters vary greatly, some kind of relations between the two input regions are extracted. Among the latter, some pairs extract simple relations such as addition (column 5) or subtraction (column 6), while others extract more complex relations (column 6, 7). Interestingly, we find that filters in some filter pairs are nearly the same as those in some others, except that the order of the two filters is inversed (columns 1-4). This makes sense since face similarities should be invariant with the order of the two face regions in comparison.

3.3. Classification RBM

Classification RBM models the joint distribution between its output neurons $y$ (one out of $C$ classes), input neurons $x$ (binary), and hidden neurons $h$ (binary), as $p(y, x, h) \propto e^{-E(y, x, h)}$, where $E(y, x, h) = -h^\top W x - h^\top U y - b^\top x - c^\top h - d^\top y$. Given input $x$, the conditional probability of its output $y$ can be explicitly expressed as

$$p(y_c \mid x) = \frac{e^{d_c} \prod_j \Big(1 + e^{c_j + U_{jc} + \sum_k W_{jk} x_k}\Big)}{\sum_i e^{d_i} \prod_j \Big(1 + e^{c_j + U_{ji} + \sum_k W_{jk} x_k}\Big)}, \qquad (2)$$

where $c$ indicates the $c$-th class. We discriminatively train the Classification RBM by minimizing the negative log probability of the target class $t$ given input $x$; that is, minimizing $-\log p(y_t \mid x)$. The target can be optimized by computing the exact gradient $-\frac{\partial \log p(y_t \mid x)}{\partial \theta}$, where $\theta \in \{W, U, b, c, d\}$ are RBM parameters to be learned.

3.4. Fine-tuning the entire network

Let $N$ and $M$ be the number of groups and the number of ConvNets in each group, respectively, and $C_n^m(\cdot)$ be the input-output mapping for the $m$-th ConvNet in the $n$-th group. Since the two outputs of the ConvNet represent a probability distribution (summed to 1), when one output is known, the other output contains no additional information. So the hybrid model (and the mapping) only keeps the first output from the ConvNet. Let $\{I_n^k\}_{k=1}^{K}$ be the $K$ possible input modes formed by a pair of face regions of group $n$. Then the $n$-th ConvNet group prediction can be expressed as

$$x_n = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{K} \sum_{k=1}^{K} C_n^m(I_n^k), \qquad (3)$$

where the inner and outer sums are over different input modes (level 1 pooling) and different ConvNets (level 2 pooling), respectively. Given the $N$ group predictions $\{x_n\}_{n=1}^{N}$, the final prediction by RBM is $\max_{c \in \{1, 2\}} \{p(y_c \mid x)\}$, where $p(y_c \mid x)$ is defined in Eq. (2).

After separately training each ConvNet and the RBM to derive a good initialization, error is back-propagated from the RBM to all groups of ConvNets and the whole model is fine-tuned. Let $L(x) = -\log p(y_t \mid x)$ be the RBM loss function, and $\alpha_n^m$ be the parameters for the $m$-th ConvNet in the $n$-th group. The gradient of the loss w.r.t. $\alpha_n^m$ is

$$\frac{\partial L}{\partial \alpha_n^m} = \frac{\partial L}{\partial x_n} \frac{\partial x_n}{\partial \alpha_n^m} = \frac{1}{MK} \frac{\partial L}{\partial x_n} \sum_{k=1}^{K} \frac{\partial C_n^m(I_n^k)}{\partial \alpha_n^m}. \qquad (4)$$

$\frac{\partial L}{\partial x_n}$ can be calculated by the closed form expression of $p(y_t \mid x)$ (Eq. (2)), and $\frac{\partial C_n^m(I_n^k)}{\partial \alpha_n^m}$ can be calculated using the back-propagation algorithm in the ConvNet.

4. Experiments

We evaluate our algorithm on LFW [14], which has been used extensively to evaluate algorithms of face verification in the wild. We conduct evaluation under two different settings: (1) 10-fold cross validation under the unrestricted protocol of LFW without using extra data to train the model, and (2) cross-dataset validation in which external data
exclusive to LFW is used for training. The former shows the performance with a limited amount of training data, while the latter shows the generalization ability across different datasets. Section 4.1 explains the experimental settings in detail, section 4.2 validates various aspects of model design, and section 4.3 compares our results with state-of-the-art results in the literature.

4.1. Experiment settings
LFW is divided into 10 folds of mutually exclusive people sets. For the unrestricted setting, performance is evaluated using 10-fold cross-validation. Each time one fold is used for testing and the other nine for training. Results averaged over the 10 folds are reported. The 600 testing pairs in each fold are predefined by LFW and fixed, whereas training pairs can be generated using the identity information in the other nine folds and the number is not limited. This is referred to as the LFW training settings.

For the cross-dataset setting, we use outside data exclusive to LFW for training. PubFig [16] and WDRef [6] are two large datasets other than LFW with faces in the wild. However, PubFig only contains 200 people, thus the identity variation is quite limited, while the images in WDRef are not publicly available. Accordingly, we created a new dataset, called the Celebrity Faces dataset (CelebFaces). It contains 87,628 face images of 5,436 celebrities from the web, and was assembled by first collecting the celebrity names that do not exist in LFW to avoid any overlap, then searching for the face images for each name on the web. To conduct cross-dataset testing, the model is trained on CelebFaces and tested on the predefined 6,000 test pairs in LFW. We will refer to this setting as the CelebFaces training settings.

For both settings, we randomly choose 80% of the people from the training data to train the deep ConvNets, and use the remaining 20% of the people to train the top-layer RBM and fine-tune the entire model. The positive training pairs are randomly formed such that on average each face image appears in $k = 6$ (3) positive pairs for
LFW (CelebFaces) dataset, unless a person does not have enough training images. Given a fixed number of training images, generating more training pairs provides minimal assistance. Negative training pairs are also randomly generated and their number is the same as the number of positive training pairs. In this way, we generate approximately 40,000 (240,000) training pairs for the ConvNets and 8,000 (50,000) training pairs for the RBM and fine-tuning for the LFW (CelebFaces) training dataset. This random process for generating training data is repeated for each ConvNet so that multiple different ConvNets are trained in each group.

A separate validation dataset is needed during training to avoid overfitting. After each training epoch¹, we observe the errors on the validation dataset and select the model that provides the lowest validation error. We randomly select 100 people from the training people to generate the validation data. The free parameters in training (the learning rate and its decreasing rate) are selected using view 1 of LFW² and are fixed in all the experiments. We report both the average accuracy and the ROC curve. The average accuracy is defined as the percentage of correctly classified face pairs. We assign each face pair to the class with the higher probability without further learning a threshold for the final classification.

¹ One training epoch is a single pass of all the training samples.
² View 1 is provided by LFW for algorithm development and parameter selection without over-fitting the test data [14].

4.2. Investigation on model design
Local weight sharing. Our ConvNets locally share weights in the last two convolutional layers. In the second last convolutional layer, maps are evenly divided into 2×2 regions, and weights are shared among neurons in each region. In the last convolutional layer, weights are independent for each neuron. We compare our ConvNets (referred to as S1) with the conventional ConvNets (referred to as S2), where weights in all the convolutional layers are globally shared, on both training errors and test accuracies. Figure 7 and Table 1 show the better fitting and generalization abilities of our ConvNets (S1), where locally sharing weights improved the group P1 (we will refer to each group as the type of regions used (Figure 4)) prediction accuracies by approximately 2% for both the LFW and CelebFaces training settings. The same conclusion holds for ConvNets in other groups.

Figure 7: Average training set failure rates with respect to the number of training epochs for ConvNets in group P1 with the local (S1) or global (S2) weight-sharing schemes for the LFW and CelebFaces training settings.

                     L0 (%)   L1 (%)   L2 (%)
S1 for LFW           84.78    86.54    88.78
S2 for LFW           83.54    85.28    86.78
S1 for CelebFaces    87.71    88.71    89.60
S2 for CelebFaces    85.65    86.61    87.72

Table 1: Average testing accuracies for ConvNets in group P1 with the local (S1) or global (S2) weight sharing schemes for the LFW and CelebFaces training settings. L0-L2 refer to the three layers shown in Figure 2. L2 is the final group predictions.

Two-level average pooling in ConvNet groups. The ConvNet group predictions are derived from two levels of average pooling as described in Section 3.1. Figure 8 shows that the performance is consistently improved after each level of average pooling (from L0 to L2) under the LFW training settings. The accuracy increases over 3% on average after the two levels of pooling (L2 compared to L0). The same conclusion holds for the CelebFaces training settings.

Figure 8: ConvNet prediction accuracies for each group averaged over the 10-fold LFW training settings. L0-L2 refer to the three layers shown in Figure 2.

Complementarity of group predictions. We validate that the pooled group predictions are complementary. Given the 12 group predictions (referred to as features), we employ a greedy feature selection algorithm. Each time, a feature is added to the feature set, in such a way that the RBM trained on these features provides the highest accuracy on the validation set. The increase of the RBM prediction accuracies would indicate that complementary information is contained in the added features. In this experiment, the ConvNets are pre-trained and their weights are fixed without jointly fine-tuning the whole network. The experiment is repeated five times, with the training samples for the RBM randomly generated each time. The averaged test results are reported. Figure 9 shows that performance is consistently improved when more features are added. So all the group predictions contain additional information.

Figure 9: Average RBM prediction accuracies with respect to the number of features selected for the LFW and CelebFaces training settings. The accuracy is consistently improved with the increase of feature numbers.

Top-layer RBM and fine-tuning. Since different groups observe different kinds of regions, each group may be good at judging particular kinds of face pairs differently. Continuing to average group predictions may smooth out the patterns in different group predictions. Instead, we let the top-layer RBM in our model learn such patterns. Then the whole model is fine-tuned to jointly optimize all the parts. Moreover, we find that the performance can be further enhanced by averaging five different hybrid ConvNet-RBM models. This is achieved by first training five RBMs (each with a different set of randomly generated training data) with the weights of ConvNets pre-trained and fixed, and then fine-tuning each of the whole ConvNet-RBM networks separately. The results are summarized in Table 2. Interestingly, though directly averaging the 12 group predictions (group averaging) is suboptimal, it still improves the best prediction results of a single group (best single group). We achieved our best results with the averaging of five hybrid ConvNet-RBM model predictions (model averaging).

                     LFW (%)   CelebFaces (%)
Best single group    88.78     89.70
Group averaging      89.97     90.18
RBM fix              90.93     91.26
Fine-tuning          91.38     92.23
Model averaging      91.75     92.52

Table 2: Accuracies of the best prediction results with a single group (best single group), directly averaging the group predictions (group averaging), training a top layer RBM while fixing the weights of ConvNets (RBM fix), fine-tuning the whole hybrid ConvNet-RBM model (fine-tuning), and averaging the predictions of the five hybrid ConvNet-RBM models (model averaging), for LFW and CelebFaces training settings respectively.

4.3. Method comparison
We compare our best results on LFW with the state-of-the-art methods in accuracies (Table 3 and 4) and ROC curves (Figure 10 and 11) respectively. Table 3 and Figure 10 are comparisons of methods that follow the LFW unrestricted protocol without using outside data to train the model. Table 4 and Figure 11 report the results when training data outside LFW is allowed. Methods marked with * were published after the submission of this paper. Our ConvNet-RBM model achieves the third best performance in both settings. Although Tom-vs-Pete [3], high-dim LBP [7], and Fisher vector faces [25] have better accuracy than our method, there are two important factors to be considered. First, all the three methods used stronger alignment than ours: 95 points in [3], 27 points in [7], and 9 points in [25], while we only use three points for alignment.
Berg and Belhumeur [3] reported 90.47% accuracy with three-point (the eyes and mouth) alignment. Chen et al. [7] reported a 6%∼7% accuracy drop if five-point alignment and single-scale patches are used. Second, all the three methods used hand-crafted features (SIFT or LBP) as their base features, while we learn features from raw pixels. The base features used in [7] and [25] are densely sampled on landmarks or grids with many different scales and the dimension is particularly high (100K LBP features in [7] and 1.7M SIFT features in [25]).

Method                        Accuracy (%)
PLDA [20]                     90.07
Joint Bayesian [6]            90.90
Fisher vector faces [25]*     93.03
High-dim LBP [7]*             93.18
ConvNet-RBM                   91.75

Table 3: Accuracy comparison of our hybrid ConvNet-RBM model and the state-of-the-art methods under the LFW unrestricted protocol.

Method                        Accuracy (%)
Associate-predict [33]        90.57
Joint Bayesian [6]            92.4
Tom-vs-Pete classifiers [3]   93.30
High-dim LBP [7]*             95.17
ConvNet-RBM                   92.52

Table 4: Accuracy comparison of our hybrid ConvNet-RBM model and the state-of-the-art methods that rely on outside training data.

Figure 10: ROC comparison of our hybrid ConvNet-RBM model and the state-of-the-art methods under the LFW unrestricted protocol.

5. Conclusion
This paper has proposed a new hybrid ConvNet-RBM model for face verification. The model learns directly and jointly extracts relational visual features from face pairs under the supervision of face identities. Both feature extraction and recognition stages are unified under a single deep network architecture and all the components are jointly optimized for the target of face verification. It achieved competitive face verification performance on LFW.

6. Acknowledgement
This work is supported by the General Research Fund sponsored by the Research Grants Council of the Hong Kong SAR (Project No. CUHK 416312 and CUHK 416510) and Guangdong Innovative Research Team Program (No. 201001D010*******).

References
[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition.
English Essay: The Application of Facial Recognition in China

Facial recognition technology, a cutting-edge biometric technology, has been experiencing rapid development and widespread application in China. Leveraging advances in artificial intelligence and machine learning, this technology has become an integral part of daily life, revolutionizing various industries and sectors.

In the realm of security, facial recognition has become a powerful tool in the hands of law enforcement agencies. Police forces across the country are using this technology to identify criminal suspects, track fugitives, and monitor public places for suspicious activities. This not only enhances the efficiency of law enforcement but also improves public safety.

The retail industry has also been revolutionized by facial recognition. Stores are now able to recognize their customers and provide personalized shopping experiences. This technology can identify a customer's preferences and buying habits, enabling retailers to offer targeted discounts and recommendations. Furthermore, it can also help in preventing shoplifting by identifying known thieves.

Financial institutions have also embraced facial recognition technology. Banks and other financial institutions are using this technology to authenticate customers and prevent fraud. By comparing a customer's face with their stored biometric data, these institutions can ensure that only the rightful owner can access their accounts.

In addition to these industries, facial recognition technology is also finding its way into our daily lives. Smartphones and other electronic devices now come with facial unlock features, making it easier and more convenient for users to unlock their devices. This technology is also being used in airports, railway stations, and other public places to facilitate fast and efficient check-in and identification processes.

Despite its widespread application, facial recognition technology in China has also raised concerns regarding privacy and ethical issues.
There have been reports of misuse of this technology, such as the unauthorized collection and sale of biometric data. To address these concerns, the Chinese government has been working on regulating the use of facial recognition technology, ensuring that it is used ethically and within legal limits.

In conclusion, facial recognition technology has brought about significant changes in China, revolutionizing various industries and enhancing public safety. However, it is crucial to address the privacy and ethical issues associated with this technology to ensure its responsible and sustainable use.
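1. Face detection: The face_recognition library uses the HOG (Histogram of Oriented Gradients) algorithm for face detection.
To align faces, the face_recognition library uses the Orthogonal Procrustes Analysis algorithm from the dlib library.
3. Face feature extraction: The face_recognition library follows the deep learning approach and uses a pre-trained convolutional neural network (CNN) to extract face features.
This model maps a face image to a 128-dimensional feature vector, which is called the face embedding or face feature vector.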
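As a rough illustration of how such 128-dimensional embeddings are typically compared, the sketch below measures the Euclidean distance between two vectors and accepts a match when the distance falls under a tolerance. The vectors are synthetic stand-ins rather than real outputs of the library, the helper names are hypothetical, and the 0.6 threshold is an assumption borrowed from the default `tolerance` of `face_recognition.compare_faces`.

```python
import math

EMBEDDING_SIZE = 128  # dimensionality of the face embedding described above


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(known, candidate, tolerance=0.6):
    """Hypothetical helper: treat two embeddings as a match if they are close."""
    return euclidean_distance(known, candidate) <= tolerance


# Synthetic embeddings (assumptions for illustration, not real model outputs).
known = [0.10] * EMBEDDING_SIZE
same_face = [0.11] * EMBEDDING_SIZE   # small perturbation of `known`
other_face = [0.20] * EMBEDDING_SIZE  # clearly different vector

print(is_same_person(known, same_face))
print(is_same_person(known, other_face))
```

With the actual library, the embeddings would instead come from `face_recognition.face_encodings` applied to an image loaded with `face_recognition.load_image_file`; a lower tolerance makes the comparison stricter.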
Face Recognition: A Hybrid Neural Network Approach
Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, Andrew D. Back
NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
Electrical and Computer Engineering, University of Queensland, St. Lucia, Australia
Technical Report UMIACS-TR-96-16 and CS-TR-3608
Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742

Abstract
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult [42]. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loève transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional network. The Karhunen-Loève transform performs almost as well (5.3% error versus 3.8%). The multi-layer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach [42] on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze computational complexity and discuss how new classes could be added to the trained recognizer.

Keywords: Convolutional Networks, Hybrid Systems, Face Recognition, Self-Organizing Map

1 Introduction
The requirement for reliable personal identification in computerized access control has resulted in an increased interest in biometrics. Biometrics being investigated include fingerprints [4], speech [7], signature dynamics [36], and face recognition [8]. Sales of identity verification products exceed $100 million [29]. Face recognition has the benefit of being a passive, non-intrusive system for verifying personal identity. The techniques used in the best face recognition systems may depend on the application of the system. We can identify at least two broad categories of face recognition systems:

1. We want to find a person within a large database of faces (e.g. in a police database). These systems typically return a list of the most likely people in the database [34]. Often only one image is available per person. It is usually not necessary for recognition to be done in real-time.
2. We want to identify particular people in real-time (e.g. in a security monitoring system, location tracking system, etc.), or we want to allow access to a group of people and deny access to all others (e.g. access to a building, computer, etc.) [8]. Multiple images per person are often available for training and real-time recognition is required.

In this paper, we are primarily interested in the second case. We are interested in recognition with varying facial detail, expression, pose, etc. We do not consider invariance to high degrees of rotation or scaling; we assume that a minimal preprocessing stage is available if required. We are interested in rapid classification and hence we do not assume that time is available for extensive preprocessing and normalization. Good algorithms for locating faces in images can be found in [42, 40, 37].

The remainder of this paper is organized as follows. The data we used is presented in section 2 and related work with this and other databases is discussed in section 3. The components and details of our system are described in sections 4 and 5 respectively. We present and discuss our results in sections 6 and 7. Computational complexity is considered in section 8 and we draw conclusions in section 10.

2 Data
We have used the ORL database which contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. There are 10 different images of 40 distinct subjects. For some of the subjects, the images were taken at different times. There are variations in facial expression (open/closed eyes, smiling/non-smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an up-right, frontal position, with tolerance for some tilting and rotation of up to about 20 degrees. There is some variation in scale of up to about 10%.
Thumbnails of all of the images are shown in figure 1 and a larger set of images for one subject is shown in figure 2. The images are greyscale with a resolution of 92x112.

Figure 1: The ORL face database. There are 10 images each of the 40 subjects.
Figure 2: The set of 10 images for one subject. Considerable variation can be seen.

3 Related Work
3.1 Geometrical Features
Many people have explored geometrical feature based methods for face recognition. Kanade [18] presented an automatic feature extraction method based on ratios of distances and reported a recognition rate of between 45-75% with a database of 20 people. Brunelli and Poggio [6] compute a set of geometrical features such as nose width and length, mouth position, and chin shape. They report a 90% recognition rate on a database of 47 people. However, they show that a simple template matching scheme provides 100% recognition for the same database. Cox et al. [9] have recently introduced a mixture-distance technique which achieves a recognition rate of 95% using a query database of 95 images from a total of 685 individuals. Each face is represented by 30 manually extracted distances.

Systems which employ precisely measured distances between features may be most useful for finding possible matches in a large mugshot database. For other applications, automatic identification of these points would be required, and the resulting system would be dependent on the accuracy of the feature location algorithm. Current algorithms for automatic location of feature points do not consistently provide a high degree of accuracy [41].

3.2 Eigenfaces
High-level recognition tasks are typically modeled with many stages of processing as in the Marr paradigm of progressing from images to surfaces to three-dimensional models to matched models [28]. However, Turk and Pentland [42] argue that it is likely that there is also a recognition process based on low-level, two-dimensional image processing. Their argument is based on the early development and extreme rapidity of face recognition in humans, and on physiological experiments in monkey cortex which claim to have isolated neurons that respond selectively to faces [35]. However, it is not clear that these experiments exclude the sole operation of the Marr paradigm.

Turk and Pentland [42] present a face recognition scheme in which face images are projected onto the principal components of the original set of training images. The resulting eigenfaces are classified by comparison with known individuals. The linear principal components technique assumes that the faces lie in a lower dimensional space, and hence the sum or average of two faces should also be a face. Clearly this is not true when principal components is applied to an entire face [17].

Turk and Pentland present results on a database of 16 subjects with various head orientation, scaling, and lighting. Their images appear identical otherwise with little variation in facial expression, facial details, pose, etc. For lighting, orientation, and scale variation their system achieves 96%, 85% and 64% correct classification respectively. Scale is renormalized to the eigenface size based on an estimate of the head size. The middle of the faces is accentuated, reducing any negative effect of changing hairstyle and backgrounds.
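The projection-and-compare idea behind eigenfaces can be sketched in a few lines. The example below is only an illustration under stated assumptions: the "images" are toy 4-pixel vectors, the function names are hypothetical, and the principal components are found by power iteration with deflation, a simple stand-in for the eigen decomposition rather than the procedure used by Turk and Pentland.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def principal_components(centered, k, iterations=200):
    """Top-k principal directions via power iteration with deflation."""
    data = [row[:] for row in centered]
    dim = len(data[0])
    components = []
    for _ in range(k):
        start = [float(j + 1) for j in range(dim)]  # arbitrary non-zero start
        scale = math.sqrt(dot(start, start))
        v = [x / scale for x in start]
        for _step in range(iterations):
            # w = X^T (X v): multiply by the unnormalized covariance matrix
            scores = [dot(row, v) for row in data]
            w = [sum(s * row[j] for s, row in zip(scores, data)) for j in range(dim)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflation: remove the found direction from every sample.
        data = [[x - dot(row, v) * vj for x, vj in zip(row, v)] for row in data]
    return components


def project(face, mean, components):
    """Coordinates of a face in the space spanned by the components."""
    centered = [x - m for x, m in zip(face, mean)]
    return [dot(centered, c) for c in components]


def classify(query, train_faces, train_labels, k=2):
    """Nearest neighbour in the projected (eigenface) space."""
    mean = mean_vector(train_faces)
    centered = [[x - m for x, m in zip(f, mean)] for f in train_faces]
    components = principal_components(centered, k)
    q = project(query, mean, components)
    best_label, best_dist = None, float("inf")
    for face, label in zip(train_faces, train_labels):
        p = project(face, mean, components)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy data: two "people", two 4-pixel images each (made up for illustration).
faces = [[10, 10, 0, 0], [9, 11, 1, 0], [0, 0, 10, 10], [1, 0, 9, 11]]
labels = ["A", "A", "B", "B"]
print(classify([8, 9, 1, 1], faces, labels))
```

Because every pixel contributes to each projection, a change in lighting or alignment shifts the coordinates of the whole face, which is consistent with the sensitivity to scale and orientation reported above.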
In Pentland et al. [34, 33] good results are reported on a large database (95% recognition of 200 people from a database of 3,000). It is difficult to draw broad conclusions as many of the images of the same people look very similar, and the database has accurate registration and alignment [30]. In Moghaddam and Pentland [30], very good results are reported with the FERET database: only one mistake was made in classifying 150 frontal view images. The system used extensive preprocessing for head location, feature detection, and normalization for the geometry of the face, translation, lighting, contrast, rotation, and scale.

In summary, it appears that eigenfaces is a fast, simple, and practical algorithm that may be limited due to the requirement that there is a high degree of correlation between the pixel intensities of the training and test images. This limitation has been addressed by using extensive preprocessing to normalize the images.

3.3 Template Matching
Template matching methods such as [6] operate by performing direct correlation of image segments. Template matching is only effective when the query images have the same scale, orientation, and illumination as the training images [9].

3.4 Neural Network Approaches
Much of the present literature on face recognition with neural networks presents results with only a small number of classes (often below 20). For example, in [10] the first 50 principal components of the images are extracted and reduced to 5 dimensions using an autoassociative neural network. The resulting representation is classified using a standard multi-layer perceptron. Good results are reported but the database is quite simple: the pictures are manually aligned and there is no lighting variation, rotation, or tilting. There are 20 people in the database.

3.5 The ORL Database
In [38] a HMM-based approach is used for classification of the ORL database images. The best model resulted in a 13% error rate. Samaria also performed extensive tests using the popular eigenfaces algorithm [42] on the ORL database and reported a best error rate of around 10% when the number of eigenfaces was between 175 and 199. We implemented the eigenfaces algorithm and also observed around 10% error. In [39] Samaria extends the top-down HMM of [38] with pseudo two-dimensional HMMs. The error rate reduces to 5% at the expense of high computational complexity: a single classification takes four minutes on a Sun Sparc II. Samaria notes that although an increased recognition rate was achieved the segmentation obtained with the pseudo two-dimensional HMMs appeared quite erratic. Samaria uses the same training and test set sizes as we do (200 training images and 200 test images with no overlap between the two sets). The 5% error rate is the best error rate previously reported for the ORL database that we are aware of.

4 System Components
4.1 Overview
In the following sections we introduce the techniques which form the components of our system and describe our motivation for using them. Briefly, we explore the use of local image sampling and a technique for partial lighting invariance, a self-organizing map (SOM) for projection of the texture representation into a quantized lower dimensional space, the Karhunen-Loève (KL) transform for comparison with the self-organizing map, a convolutional network (CN) for partial translation and deformation invariance, and a multi-layer perceptron (MLP) for comparison with the convolutional network.

4.2 Local Image Sampling
We have evaluated two different methods of representing local image samples. In each method a window is scanned over the image as shown in figure 3.

1. The first method simply creates a vector from a local window on the image using the intensity values at each point in the window. Let $x_{ij}$ be the intensity at the $i$-th column and the $j$-th row of the given image. If the local window is a square of sides $2W+1$ long, centered on $x_{ij}$, then the vector associated with this window is simply $[x_{i-W,j-W}, x_{i-W,j-W+1}, \ldots, x_{ij}, \ldots, x_{i+W,j+W-1}, x_{i+W,j+W}]$.
2. The second method creates a representation of the local sample by forming a vector out of a) the intensity of the center pixel, and b) the difference in intensity between the center pixel and all other pixels within the square window. The vector is given by $[x_{ij} - x_{i-W,j-W}, x_{ij} - x_{i-W,j-W+1}, \ldots, w_{ij} x_{ij}, \ldots, x_{ij} - x_{i+W,j+W-1}, x_{ij} - x_{i+W,j+W}]$. The resulting representation becomes partially invariant to variations in intensity of the complete sample. The degree of invariance can be modified by adjusting the weight $w_{ij}$ connected to the central intensity component.

Figure 3: A depiction of the local image sampling process. A window is stepped over the image and a vector is created at each location.

4.3 The Self-Organizing Map
4.3.1 Introduction
Maps are an important part of both natural and artificial neural information processing systems [2]. Examples of maps in the nervous system are retinotopic maps in the visual cortex [32], tonotopic maps in the auditory cortex [19], and maps from the skin onto the somatosensoric cortex [31]. The self-organizing map, or SOM, introduced by Teuvo Kohonen [21, 20] is an unsupervised learning process which learns the distribution of a set of patterns without any class information. A pattern is projected from an input space to a position in the map: information is coded as the location of an activated node. The SOM is unlike most classification or clustering techniques in that it provides a topological ordering of the classes. Similarity in input patterns is preserved in the output of the process. The topological preservation of the SOM process makes it especially useful in the classification of data which includes a large number of classes. In the local image sample classification, for example, there may be a very large number of classes in which the transition from one class to the next is practically continuous (making it difficult to define hard class boundaries).

4.3.2 Algorithm
We give a brief description of the SOM algorithm; for more details see [21]. The SOM defines a mapping from an input space onto a topologically ordered set of nodes, usually in a lower dimensional space.
An example of a two-dimensional SOM is shown infigure4.A reference vector in the input space,,is assigned to each node in the SOM.During training,each input,,is compared to all of the,obtaining the location of the closest match().The input point is mapped to this location in the SOM.Nodes in the SOM are updated according to:(1)where is the time during learning and is the neighborhood function,a smoothing kernel which is maximum ually,,where and represent the location of the nodes in the SOM output space.is the node with the closest weight vector to the input sample and ranges over all nodes.approaches0as increases and also as approaches.A widely applied neighborhood function is:Figure4:A two-dimensional SOM showing a square neighborhood function which starts as and reduces in size to over time.2.Each learning pass requires computation of the distance of the current sample to all nodes in thenetwork,which is.However,this may be reduced to using a hierarchy of networks which is created from the above node doubling strategy5.4.4Karhunen-Lo`e ve TransformThe optimal linear method6for reducing redundancy in a dataset is the Karhunen-Lo`e ve(KL)transform or eigenvector expansion via Principle Components Analysis(PCA)[12].PCA generates a set of orthogonal axes of projections known as the principal components,or the eigenvectors,of the input data distribution in the order of decreasing variance.The KL transform is a well known statistical method for feature extraction and multivariate data projection and has been used widely in pattern recognition,signal processing,image processing,and data analysis.Points in an-dimensional input space are projected into an-dimensional space,.We use the KL transform for comparison with the SOM in the dimensionality reduction of the local image samples.The use of the KL transform here is not the same as in the eigenfaces approach because we operate on small local image samples as opposed to the entire images.The KL technique is 
fundamentally different to the SOM method, as it assumes the images are sufficiently described by second order statistics, while the SOM is an attempt to approximate the probability density as shown in Kohonen [21].

4.5 Convolutional Networks

Theoretically, we should be able to train a large enough multi-layer perceptron neural network to perform any required mapping [14], including that required to perfectly distinguish the classes in face recognition. However, in practice, such a system is unable to form the required features in order to generalize to unseen inputs (the class of functions which can perfectly classify the training data is too large and it is not easy to constrain the solution to the subset of this class which exhibits good generalization). In other words, the problem is ill-posed: there are not enough training points in the space created by the input images to allow accurate approximation of class probabilities throughout the input space. Additionally, there is no invariance to translation or local deformation of the images [23].

Convolutional networks (CN) incorporate constraints and achieve some degree of shift and deformation invariance using three ideas: local receptive fields, shared weights, and spatial subsampling. The use of shared weights also reduces the number of parameters in the system, aiding generalization. Convolutional networks have been successfully applied to character recognition [24, 22, 23, 5, 3].

A typical convolutional network for recognizing characters is shown in figure 5 [24]. The network consists of a set of layers, each of which contains one or more planes. Approximately centered and normalized images enter at the input layer. Each unit in a plane receives input from a small neighborhood in the planes of the previous layer. The idea of connecting units to local receptive fields dates back to the 1960s with the perceptron and Hubel and Wiesel's [15] discovery of locally sensitive, orientation-selective neurons in the cat's visual system [23]. The weights
forming the receptive field for a plane are forced to be equal at all points in the plane. Each plane can be considered as a feature map which has a fixed feature detector that is convolved with a local window which is scanned over the planes in the previous layer. Multiple planes are usually used in each layer so that multiple features can be detected. These layers are called convolutional layers. Once a feature has been detected, its exact location is less important. Hence, the convolutional layers are typically followed by another layer which does a local averaging and subsampling operation (e.g. for a subsampling factor of 2: $y_{ij} = (x_{2i,2j} + x_{2i+1,2j} + x_{2i,2j+1} + x_{2i+1,2j+1})/4$, where $y_{ij}$ is the output of a subsampling plane at position $i,j$ and $x_{ij}$ is the output of the same plane in the previous layer). The network is trained with the usual backpropagation gradient-descent procedure [13].

Figure 5: A typical convolutional network for recognizing characters.

5 System Details

The system we have used for face recognition is a combination of the preceding parts. A high-level block diagram is shown in figure 6, and figure 7 shows a breakdown of the various subsystems that we experimented with or discuss.

Figure 6: A high-level block diagram of the system we have used for face recognition.

Figure 7: A diagram of the system we have used for face recognition showing alternative methods which we consider in this paper. We present results with either a self-organizing map or the Karhunen-Loève transform used for dimensionality reduction, and either a convolutional neural network or a multi-layer perceptron for classification. We consider the possibility of replacing the final classification stage in the convolutional neural network with a nearest-neighbor or related classifier. A complete recognizer consists of only one path through the diagram.

Our system works as follows (we give complete details of dimensions etc. later):

1. For the images in the training set, a fixed size window (e.g. 5x5) is stepped over the entire image as shown in figure 3 and local image samples are extracted at each
step. At each step the window is moved by 4 pixels.
2. A self-organizing map (e.g. with three dimensions and five nodes per dimension, $5^3 = 125$ total nodes) is trained on the vectors from the previous stage. The SOM quantizes the 25-dimensional input vectors into 125 topologically ordered values. The three dimensions of the SOM can be thought of as three features. We also experimented with replacing the SOM with the Karhunen-Loève transform. In this case, the KL transform projects the vectors in the 25-dimensional space into a 3-dimensional space.
3. The same window as in the first step is stepped over all of the images in the training and test sets. The local image samples are passed through the SOM at each step, thereby creating new training and test sets in the output space created by the self-organizing map. (Each input image is now represented by 3 maps, each of which corresponds to a dimension in the SOM. The size of these maps is equal to the size of the input image (92x112) divided by the step size (for a step size of 4, the maps are 23x28).)
4. A convolutional neural network is trained on the newly created training set. We also experimented with training a standard multi-layer perceptron for comparison.

5.1 Simulation Details

In this section we give the details of one of the best performing systems.

For the SOM, training is split into two phases as recommended by Kohonen [21]: an ordering phase and a fine-adjustment phase. 100,000 updates are performed in the first phase, and 50,000 in the second. In the first phase, the neighborhood radius starts at two-thirds of the size of the map and reduces linearly to 1.
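The linear radius reduction just described can be sketched as a small helper. This is only an illustration under assumptions: the function name `linear_radius` is hypothetical, and "the size of the map" is taken to mean the number of nodes per dimension (5 in the example configuration given earlier), which the text does not state explicitly.

```python
def linear_radius(n, total_updates, start, end=1.0):
    """Neighborhood radius after n of total_updates updates,
    falling linearly from start to end."""
    frac = n / total_updates
    return start + (end - start) * frac


MAP_SIZE = 5                  # assumption: nodes per dimension of the SOM
ORDERING_UPDATES = 100_000    # number of updates in the first phase
start_radius = 2.0 * MAP_SIZE / 3.0   # two-thirds of the map size

# Radius at the start, midpoint, and end of the ordering phase.
radii = [linear_radius(n, ORDERING_UPDATES, start_radius)
         for n in (0, 50_000, 100_000)]
```

Under these assumptions the radius begins at about 3.33 nodes and shrinks steadily to 1 by the last update of the phase, so early updates reorganize large regions of the map and later ones only adjust near neighbors.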
The learning rate during this phase is a function of the current update number and the total number of updates. In the second phase, the neighborhood radius starts at 2 and is reduced to 1. The learning rate during this phase follows its own schedule.

The convolutional network contained five layers excluding the input layer. A confidence measure was calculated for each classification from the maximum output and the second maximum output (for outputs which have been transformed using the softmax transformation $y_i = \exp(u_i) / \sum_j \exp(u_j)$, where $u_i$ are the original outputs).

Footnote 7: This helps avoid saturating the sigmoid function. If targets were set to the asymptotes of the sigmoid this would tend to: a) drive the weights to infinity, b) cause outlier data to produce very large gradients due to the large weights, and c) produce binary outputs even when incorrect, leading to decreased reliability of the confidence measure.

Footnote 8: Relatively high learning rates are typically used in order to help avoid slow convergence and local minima. However, a constant learning rate results in significant parameter and performance fluctuation during the entire training cycle, such that the performance of the network can alter significantly from the beginning to the end of the final epoch. Moody and Darken have proposed "search then converge" learning rate schedules. We have found that these schedules still result in considerable parameter fluctuation and hence we have added another term to further reduce the learning rate over the final epochs. We have found the use of learning rate schedules to improve performance considerably.

[Table: dimensions of the convolutional network. Recoverable column headers: Layer, Units, y, Receptive field x, Percentage. Cell values not recoverable.]

[Table fragment: Error rate 4.33%]

Footnote 9: We ran multiple simulations in each experiment where we varied the selection of the training and test images (out of the total number of possible selections) and the random seed used to initialize the weights in the convolutional neural network.

Figure 9: The error rate as a function of the number of classes (x-axis: number of classes; y-axis: test error %). We did not modify the network
from that used for the 40 class case.

SOM Dimension   2       4
Error rate      6.75%   5.83%

Table 3: Error rate of the face recognition system with varying number of dimensions in the self-organizing map. Each result given is the average of three simulations.

Figure 10: The error rate as a function of the number of dimensions in the SOM (x-axis: SOM dimensions; y-axis: test error %).

SOM Size: 5, 7
Error rate: 8.5%, 6.0%, 3.83%

Table 4: Error rate of the face recognition system with varying number of nodes per dimension in the self-organizing map. Each result given is the average of three simulations.

Figure 11: The error rate as a function of the number of nodes per dimension in the SOM (x-axis: SOM nodes per dimension; y-axis: test error %).

4. Variation of the texture extraction algorithm: table 5 shows the result of using the two local image sample representations described earlier. We found that using the original intensity values gave the best performance. We tried altering the weight assigned to the central intensity value in the alternative representation but were unable to improve the results.

Input type      Differences w/ base intensity
Error rate      7.17%

Table 5: Error rate of the face recognition system with varying image sample representation. Each result is the average of three simulations.

5. Substituting the SOM with the KL transform: table 6 shows the results of replacing the self-organizing map with the Karhunen-Loève transform. We tried using the first one, two, or three eigenvectors for projection. Surprisingly, the system performed best with only 1 eigenvector. The best SOM parameters we tried produced slightly better performance. The quantization inherent in the SOM could provide a degree of invariance to minor image sample differences, and quantization of the PCA projections may improve performance.

Dimensionality reduction   SOM
Error rate                 3.83%

Table 6: Error rate of the face recognition system with linear PCA and SOM feature extraction mechanisms. Each result is the average of three simulations.

6. Replacing the CN with an MLP: table 7 shows
the results of replacing the convolutional network with a multi-layer perceptron. Performance is very poor, as we expect due to the loss of shift and deformation invariance. We tried a number of different hidden layer sizes for the multi-layer perceptron in the range 20 to 100. Note that the best performing KL parameters were used while the best performing SOM parameters were not.

        SOM
MLP     39.6%
CN      3.83%

Table 7: Error rate comparison of the various feature extraction and classification methods. Each result is the average of three simulations.

7. The tradeoff between rejection threshold and recognition accuracy: Figure 12 shows a histogram of the recognizer's confidence for the cases when the classifier is correct and when it is wrong, for one of the best performing systems. From this graph we expect that classification performance will increase significantly if we reject cases below a certain confidence threshold. Figure 13 shows the system performance as the rejection threshold is increased. We can see that by rejecting examples with low confidence we can significantly increase the classification performance of the system. If we consider a system which used a video camera to take a number of pictures over a short period, we could expect that a high performance would be attainable with an appropriate rejection threshold.
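The rejection scheme in item 7 can be illustrated with a short sketch. The exact confidence formula is not reproduced in this text, so the code assumes one plausible measure built from the two quantities the paper names, the maximum and second maximum softmax outputs: `y_m * (y_m - y_2m)`. That formula, the function names, and the threshold value are all assumptions made for illustration.

```python
import math


def softmax(u):
    """Softmax transformation of raw outputs (shifted by the max for stability)."""
    top = max(u)
    exps = [math.exp(v - top) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(outputs):
    """Assumed confidence measure: y_m * (y_m - y_2m), where y_m is the
    largest output and y_2m is the second largest."""
    y_m, y_2m = sorted(outputs, reverse=True)[:2]
    return y_m * (y_m - y_2m)


def classify_with_reject(raw_outputs, threshold):
    """Return the index of the winning class, or None if the confidence
    falls below the rejection threshold."""
    y = softmax(raw_outputs)
    label = max(range(len(y)), key=y.__getitem__)
    return label if confidence(y) >= threshold else None


# A clear winner is accepted; two nearly tied classes are rejected.
confident = classify_with_reject([5.0, 0.0, 0.0], threshold=0.5)
ambiguous = classify_with_reject([1.0, 0.9, 0.0], threshold=0.5)
```

Raising `threshold` rejects more samples and, if low confidence correlates with errors as the histogram suggests, raises accuracy on the samples that remain. In a video setting, rejected frames could simply be skipped in favor of later ones.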

Figure 12: A histogram depicting the confidence of the classifier when it turns out to be correct, and the confidence when it is wrong (x-axis: confidence; series: confidence when wrong, confidence when correct). The graph suggests that we can improve classification performance considerably by rejecting cases where the classifier has a low confidence.

Figure 13: The test set classification performance as a function of the percentage of samples rejected (x-axis: reject percentage; y-axis: percent correct). Classification performance can be improved significantly by rejecting cases with low confidence.

8. Comparison with other known results on the same database: Table 8 shows a summary of the performance of the systems for which we have results using the ORL database. In this case, we used a SOM quantization level of 8. Our system is the best performing system (footnote 10) and performs recognition roughly 500 times faster than the second best performing system, the pseudo 2D-HMMs of Samaria. Figure 14 shows the images which were incorrectly classified for one of the best performing systems.

System            Classification time
Top-down HMM      n/a
Eigenfaces        n/a
Pseudo 2D-HMM     240 seconds
SOM+CN            0.5 seconds

Table 8: Error rate of the various systems.
Table notes: On a Sun Sparc II. On an SGI Indy MIPS R4600 100 MHz system.

9. Variation of the number of training images per person. Table 9 shows the results of varying the number of images per class used in the training set from 1 to 5 for PCA+CN, SOM+CN and also for the eigenfaces algorithm. We implemented two versions of the eigenfaces algorithm. The first version creates vectors for each class in the training set by averaging the results of the eigenface representation over all images for the same person. This corresponds to the algorithm as described by Turk and Pentland [42]. However, we found that using separate training vectors for each training image resulted in better performance. We found that using between 40 and 100 eigenfaces resulted in
similar performance. We can see that the PCA+CN and SOM+CN methods are both superior to the