Fingerprint (English)
Pattern Recognition 32 (1999) 877-889

Fingerprint minutiae extraction from skeletonized binary images

Alessandro Farina, Zsolt M. Kovács-Vajna*, Alberto Leone
D.E.I.S., University of Bologna, Viale Risorgimento 2, I-40136 Bologna, Italy
* Corresponding author. Fax: (+39) 516443073; e-mail: zkovacs@deis.unibo.it
Received 26 June 1997; in revised form 25 June 1998

Abstract

Fingerprint comparison is usually based on minutiae matching. The minutiae considered in automatic identification systems are normally ridge bifurcations and terminations. In this paper we present a set of algorithms for the extraction of fingerprint minutiae from skeletonized binary images. The goal of the present work is the extraction of the real 40-60 minutiae of a fingerprint image from the 2000-3000 contained in typical skeletonized and binarized images. Besides classical methodologies for minutiae filtering, a new approach is proposed for bridge cleaning based on ridge positions instead of classical methods based on directional maps. Finally, two novel criteria and related algorithms are introduced for validating the endpoints and bifurcations. Statistical analysis of the results obtained by the proposed approach shows efficient reduction of spurious minutiae. The use of the fingerprint minutiae extraction algorithms has also been considered in a fingerprint identification system in terms of timing and false reject or acceptance rates. The presented minutiae extraction algorithm performs correctly in dirty areas and on the background as well, making computationally expensive segmentation algorithms unnecessary. The results are confirmed by visual inspections of validated minutiae of the NIST SDB4 reference fingerprint image database. © 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Image processing; Binary image; Post-processing; Fingerprint; Minutiae; NIST SDB4

1. Introduction

Research in the design of automatic fingerprint identification systems (AFIS) is currently focusing on fingerprint classification into five classes [1] and on fingerprint matching, whose goal is the identification of a person by means of the fingerprints. If the stored image database is huge, the identification process needs both the classification and the matching stages. In this paper we focus our attention on a part of the matching stage, which is usually based on preprocessing, feature extraction and matching algorithms.

There are several useful features in fingerprints, related to ridge topology, called minutiae. The American National Standards Institute has proposed a classification of minutiae into four main groups: terminations, bifurcations, crossovers and undetermined [2]. The most important minutiae types are terminations and bifurcations.

In a fingerprint identification system the fingerprint image is obtained by a sensor or camera and the matching process starts with image filtering. The minutiae are extracted and they are compared with those of the already stored images in order to establish the correspondence. The minutiae extraction can be based on gray-scale [3] or binary images.

In this paper we focus our attention on a binary image-based technique, where we assume that the fingerprint image has been either acquired directly in binary form or binarized from a gray-scale image. We assume also that the image has been skeletonized. The problem in this methodology comes from the fact that a minutia in the skeleton image does not always correspond to a meaningful point in the real image.
In fact, there are more than a thousand apparent ending or bifurcation points, while the real minutiae number fewer than 100. Such behavior arises as a consequence of under-inking, over-inking, wrinkles, scars and excessively worn prints, and, therefore, spurious minutiae can appear in the skeleton image after pre-processing. In order to simplify or preserve the reliability of the AFIS and to lower computational costs, post-processing of the skeleton image is necessary to reduce the number of spurious minutiae.

Several approaches have already been proposed in the last few years to enhance skeletonized images. The AFIS proposed by Stosz and Alyea [4] uses pore positions coupled to other minutiae extracted from live-scanned images: the quality of the valley skeleton is first improved by analyzing and extracting segments that represent pores (cleaning); then syntactic processing is used to remove undesirable artifacts: two disconnected ridges are connected if their distance is less than a given threshold and the endpoint directions are almost the same; wrinkles are detected by analyzing information on neighboring branch points.

Chen and Kuo [5] adopt a three-step false minutiae identification process: (i) ridge breaks are repaired using the ridge directions close to the minutiae; (ii) minutiae associated with short ridges are dropped; (iii) crowded minutiae in a noisy region are dropped. Malleswara's methods [6] are used to eliminate those false minutiae which are caused by noise or imperfect image processing.

A post-processing algorithm applied to binarized, not thinned, images is proposed by Fitz and Green [7]. These techniques are employed to remove small holes, breaks and lines in and along the ridges, and they are implemented by convolving morphological operators with the image.

Spikes caused by the skeletonizing process can be removed by the adaptive morphological filter proposed by Ratha et al. [8]. Minutiae are detected by counting the BLACK neighbors in a 3x3 window. False endpoints and bifurcations are removed by means of three heuristic criteria: (i) two endpoints with the same orientation and whose distance is below a threshold are deleted (ridge break elimination); (ii) an endpoint connected to a bifurcation and below a certain threshold distance is removed (spike elimination); (iii) minutiae below a certain distance from the boundary of the foreground region are cancelled (boundary effect treatment).

The duality between the valley image and the ridge image has been exploited by Hung [9], considering that bifurcations and bridges in one of the two images correspond to endings and breaks, respectively, in the dual image. Therefore, the same algorithm can be applied both to valley and ridge images to remove ridge breaks and bridges. A graph is defined for both images by assigning a vertex to each endpoint and bifurcation and by assigning an edge to each ridge. Each edge is characterized by the length of the corresponding ridge, while the degree of a vertex is given by the number of converging edges.
Spurs, holes, bridges and breaks are then removed by considering some properties of the ridge graph. The same algorithm can mostly be applied to the valley graph to remove ridge breaks, crossovers and other particular configurations.

Xiao and Raafat [10] assume that fingerprints have already been pre-processed and skeletonized. They propose a post-processing based on a combined statistical and structural approach. Fingerprint minutiae are characterized by a set of attributes such as the ridge length, the direction for endpoints and bifurcations, and the angle between two minutiae. Moreover, each endpoint or bifurcation is also characterized by the number of "facing" minutiae in its neighborhood. The neighborhood area depends on the local distance between ridges. Finally, the number of "connected" minutiae is evaluated. The post-processing algorithm connects facing endpoints, removes bifurcations facing endpoints or other bifurcations and then connects the newly generated endpoints, and finally removes spurs, bridges, triangle structures and ladder structures. Unfortunately, it is not possible to assign a "global" ridge distance to the whole image, because it varies by up to 300% of its local value in a typical fingerprint. Several methodologies for computing the local ridge distance map may be found in the literature [11].

The algorithm proposed in this paper works on the skeleton image obtained from a binarized version of the fingerprint image. Standard algorithms are used to remove close minutiae in noisy areas. Ridge repair is performed on the basis of the directions of the two ridge pieces to be connected and the direction of the segment which should rebuild the broken ridge, while previous works suggested the use of structural and statistical considerations [10] or analysis of the distance between the two broken ridges [8]. The novel approach proposed to remove bridges is based on the ridge positions instead of evaluating CPU-consuming direction maps [9]. Short ridges are removed on the basis of the relationship between ridge length and the average distance between ridges, instead of considering the average ridge length. Island filtering (already proposed in the literature) is performed in a more efficient way, since it is not based on a graph construction to detect closed paths. Finally, two new topological validation algorithms are presented to classify reliable endpoints and bifurcations: minutiae are removed if topological requirements are not satisfied, classified as less reliable if the requirements are not fully satisfied, and considered highly reliable otherwise. Less reliable minutiae can be taken into account if the number of highly reliable minutiae cannot guarantee the required confidence level of the identification. Simulation results demonstrate the effectiveness of the proposed algorithm and display a sort of implicit segmentation that removes all minutiae related to regions outside the fingerprint.

[Fig. 1. Generic structure of an automatic fingerprint identification system and detail of the "Minutiae Extraction" block.]

2. The proposed algorithm

The minutiae extraction methodology proposed in this work is suitable for an AFIS like the one sketched in Fig. 1. We assume that the skeleton image has been defined and a local ridge distance map has been derived. The local ridge distance map defines the average ridge distance in each region of the image [11]. This is necessary because the ridge distance varies over the image and it is one of the most important reference parameters in the image, both for filter design and minutiae extraction.
The algorithm performs the following steps, according to the sequence detailed in Fig. 1.

2.1. Pixel codification

The codification stage performs minutiae classification and removal of unclassified configurations. The skeleton image is processed in order to obtain a codified image where the value of each pixel represents the number of outgoing branches at the corresponding pixel of the skeleton image [3].

2.2. Pre-filtering

Low-quality areas are often present in the original scanned fingerprint image. Binarizing and thinning algorithms usually produce a large number of spurious high-density minutiae in these sections of the fingerprint, with obvious difficulties for the identification algorithms. Pre-filtering is necessary to remove as many spurious minutiae as possible without reducing, if possible, the number of details useful for the identification. A first pre-filtering has already been performed in the codification phase, removing isolated points and blobs. The pre-filtering algorithm scans the codified image in the usual way (row by row from top-left to bottom-right) and acts according to the type of the minutia configuration. In this phase an endpoint next to another minutia is deleted. Adjacent bifurcations are removed, as well as those with one or more crosses detected in their neighborhood. Crosses are also removed if another cross is detected in their neighborhood.

2.3. Skeleton enhancement

Skeleton enhancement allows for ridge repair by connecting endpoints identified as ridge breaks, and for the elimination of bridges, spurs and short ridges. Ridge breaks are caused by insufficient ink, wrinkles and scars. Two facing endpoints can be connected, repairing the ridge, if they are assumed to belong to the same ridge. The ridge repair algorithm works in three steps: search for facing endpoints, best endpoint selection and ridge reconstruction.

[Fig. 2. Bridge and processed slide.] [Fig. 3. Spur and processed slide.]

2.4. Minutiae invalidation

Minutiae originated by bridges and spurs are invalidated, once these configurations have been recognized. Close minutiae are also removed.

2.4.1. Elimination of bridges and spurs

Bridges and spurs give rise to false minutiae and they must be removed, as in the examples shown in Figs. 2a and 3a. It is important to apply this algorithm before the close minutiae are searched for, in order to keep as many valid minutiae as possible. A novel method is proposed to detect and clear bridges and spurs. In previous works, bridges and spurs are detected using a local dominant directional map [9]. The novel algorithm proposed in this work uses a visual consideration: the bifurcations found where these two spurious patterns occur are different from real bifurcations. In these "spurious" bifurcations only two branches are aligned, while the direction of the third branch is generally different. In bridges, the third branch is almost orthogonal to the other two.

The absolute value of the scalar product between branch unit vectors is used to estimate the alignment or orthogonality of two branches. In this implementation one orthogonality threshold t_orth and two alignment thresholds t_al1 and t_al2 have been used. Two directions are considered orthogonal if the absolute value of the scalar product is lower than t_orth = 0.55, i.e. when the angle is greater than 56°. Two directions are considered aligned if the absolute value of the scalar product is greater than t_al1 = 0.85 (in step 1e of the algorithm) or t_al2 = 0.80 (in step 2a), corresponding to angles of less than 32° and 37°, respectively.
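This scalar-product test is easy to state in code. The following is a minimal sketch (our illustration, not the authors' implementation), assuming the three branch directions have already been estimated as 2-D vectors; the algorithm below then applies the test branch by branch:

```python
import numpy as np

# Thresholds quoted in the text: |u.v| < 0.55 -> orthogonal (angle > 56 deg),
# |u.v| > 0.85 (step 1e) or 0.80 (step 2a) -> aligned.
T_ORTH, T_AL1, T_AL2 = 0.55, 0.85, 0.80

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def classify_bifurcation(b1, b2, b3, align_th=T_AL1):
    """Return the index of the 'odd' branch if the other two are aligned,
    plus whether that odd branch is roughly orthogonal to them; None otherwise."""
    branches = [unit(b1), unit(b2), unit(b3)]
    for i in range(3):
        u, v = [branches[j] for j in range(3) if j != i]
        if abs(np.dot(u, v)) > align_th:           # two branches aligned
            third = branches[i]
            orth = abs(np.dot(third, u)) < T_ORTH  # third roughly orthogonal?
            return i, orth
    return None

# Example: two nearly collinear branches and one near-perpendicular branch
print(classify_bifurcation((1, 0), (-1, 0.05), (0.1, 1)))  # -> (2, True)
```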
Algorithm: Elimination of bridges and spurs. For each bifurcation:

(1) if there are at least N_lr = ρ points in each branch, where ρ is the local ridge distance in pixels:
(a) estimate the direction of each branch by linear regression using at least N_lr points;
(b) if two branches are aligned and the third is almost orthogonal: search for another minutia along the orthogonal branch in the first N_bs = 5ρ/6 points;
(c) if another minutia is found: remove the bridge or spur;
(d) if two branches are aligned, but the third is not orthogonal: search for an endpoint in the first 3ρ/2 points;
(e) if the endpoint is found: remove the spur;
(2) if only two branches contain at least N_lr points:
(a) if these two branches are aligned: search along the third one and remove this short bridge or spur.

2.4.2. Elimination of close minutiae

The algorithm first removes the close endpoints and then the other close minutiae. Any endpoint found in a small square area is removed. The minimum acceptable distance between two endpoints is given by N_ce = ρ/2 pixels, where ρ is the local ridge distance. The search region is a square of side 2N_ce + 1 centered on the current endpoint. There are fewer than 100 minutiae in a good-quality rolled fingerprint image and their relative distance is rarely less than the local ridge distance; therefore, too-close minutiae on the same branch are removed.
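As an illustration of the close-endpoint rule, here is a small sketch under our own data-layout assumptions (minutiae given as (x, y) pixel coordinates and a single local ridge distance rho for the region); the paper leaves open whether the scanned endpoint itself is also dropped, and this sketch removes both members of a close pair:

```python
def remove_close_endpoints(endpoints, rho):
    """Drop endpoints that fall inside another endpoint's square
    search window of side 2*N_ce + 1, with N_ce = rho / 2."""
    n_ce = rho / 2.0
    keep = []
    removed = set()
    for i, (xi, yi) in enumerate(endpoints):
        if i in removed:
            continue
        for j in range(i + 1, len(endpoints)):
            xj, yj = endpoints[j]
            # Chebyshev distance: does (xj, yj) lie inside the square window?
            if max(abs(xi - xj), abs(yi - yj)) <= n_ce:
                removed.add(i); removed.add(j)
        if i not in removed:
            keep.append((xi, yi))
    return keep

print(remove_close_endpoints([(10, 10), (12, 11), (40, 40)], rho=8))
# -> [(40, 40)]: the first two lie in each other's window
```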
2.5. Topological validation

The topological validation algorithms are based on neighboring ridge topology. Island removal: bifurcations belonging to a short closed path are invalidated. Bifurcation and endpoint validation: the neighboring ridge layout is considered to verify the existence of the correct pattern. These two novel algorithms introduce reliable and less reliable minutiae codes.

2.5.1. Island elimination

Closed paths are commonly referred to as islands. Even if this particular minutia can occur in a common fingerprint, islands are often generated in the skeleton image by noisy areas and where large irregular ridges are thinned, as shown in Fig. 4, giving rise to false minutiae. The term island is usually employed for two facing bifurcations which form a closed path. It may be noted that this is the most frequent situation only in high-quality images. We also observed closed paths generated by the interaction of three or more bifurcations and crosses, as in the example of Fig. 5. An island can also be generated by a single bifurcation where two branches flow into one another.

[Fig. 4. Islands caused by large ridges.] [Fig. 5. Three bifurcations belonging to an island.]

2.5.2. Bifurcation validation

Fig. 6 shows the topology of a typical valid bifurcation. In fact, with the exception of bifurcations located near the core or delta of the fingerprint, ridges near the bifurcation run parallel to the outgoing branches. Different bifurcation structures can be observed in the fingerprint margins or in textured areas. Therefore, the above property is suitable for removing isolated bifurcations in meaningless areas. The proposed validation algorithm can be divided into two phases: the first is necessary to distinguish between a valid and an invalid bifurcation; the second is required to find a reliable bifurcation. The first phase checks the existence of lateral ridges for each branch departing from the bifurcation (Fig. 7). The second phase checks the geometrical properties around the bifurcation (Fig. 8).

[Fig. 6. Common bifurcation topology.] [Fig. 7. Bifurcation validation.] [Fig. 8. Reliability determination.]

Algorithm: Bifurcation validation. For each valid bifurcation:

(1) compute the three branch directions on 3ρ/2 points;
(2) if a minutia is found within ρ points: invalidate this bifurcation and go further;
(3) move along each branch for ρ/2 points and verify that a lateral ridge is present within 3ρ/2 points (see Fig. 7);
(4) if a lateral ridge is not found: invalidate this bifurcation and go further;
(5) if lateral ridges are found: mark the bifurcation as a less reliable bifurcation;
(6) define a rectangular area ABCD (see Fig. 8) with AB = 3ρ/2 and AD = 4ρ;
(7) move along the lateral ridges (see Fig. 8) from the left intersections with the rectangle, P0 and P1, to the right intersections, P2 and P3;
(8) if an endpoint is found: invalidate this bifurcation and go further;
(9) if P2 and P3 are reached: mark the bifurcation as reliable.
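A rough sketch of the lateral-ridge check in step 3, under our own assumptions about the data layout (the skeleton as a boolean NumPy array indexed [row, column] and the branch direction as a 2-D vector); a full implementation would first mask out the pixels of the branch being followed:

```python
import numpy as np

def has_lateral_ridge(skel, pt, direction, rho):
    """Probe perpendicular to `direction` from point `pt` = (x, y) and
    report whether another ridge pixel lies within 3*rho/2 pixels."""
    d = np.asarray(direction, float)
    d /= np.linalg.norm(d)
    normal = np.array([-d[1], d[0]])          # perpendicular direction
    h, w = skel.shape
    for sign in (+1, -1):                     # look on both sides
        for step in range(1, int(3 * rho / 2) + 1):
            x, y = (np.asarray(pt, float) + sign * step * normal).round().astype(int)
            if 0 <= y < h and 0 <= x < w and skel[y, x]:
                return True
    return False
```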
2.5.3. Endpoint validation

The distance between two adjacent ridges remains almost constant where no other minutiae exist in the valley between the two ridges. The same consideration applies to the dual representation of the fingerprint, analyzing the valley behavior. This regularity is perturbed by the presence of an endpoint, when a ridge is interrupted, or by its dual structure, a bifurcation, when a valley ends. Fig. 9 shows the typical ridge behavior around an endpoint: neighboring ridges close up toward each other when the ridge between them is interrupted. It can be noted how the distance between adjacent ridges remains almost constant. The proposed endpoint validation algorithm is based on this visual consideration and removes most of the spurious endpoints, including those originated by fingerprint borders.

[Fig. 9. Common endpoint topology.] [Fig. 10. Rectangular area.]

Algorithm: Endpoint validation. For each valid endpoint:

(1) evaluate the endpoint direction, defined as the direction of the broken ridge over N_ep = ρ points;
(2) if the direction cannot be evaluated because of the existence of a minutia within N_ep points: invalidate the endpoint and go further;
(3) move along the endpoint ridge for ρ/2 points;
(4) search orthogonally to the endpoint ridge direction for the neighboring ridges;
(5) if the ridges are not intercepted within 3ρ/2 points: invalidate the endpoint and go further;
(6) define a rectangular area ABCD (see Fig. 10) with AB = 3ρ/2 and AD = 3ρ;
(7) scan the lateral ridges from P0 and P1 to the exit from the rectangle (points P2 and P3) or to a minutia occurrence;
(8) if a lateral ridge crosses the rectangle in DC or AB, while the other crosses the rectangle in BC: mark the endpoint as a less reliable endpoint;
(9) if a minutia is found: invalidate the endpoint and go further;
(10) if both lateral ridges cross the rectangle in BC:
(a) if the lateral ridges are not convergent (as defined in Section 2.5.4): mark the endpoint as a less reliable endpoint, else mark the endpoint as a highly reliable endpoint;
(b) if a ridge is detected between P2 and P3: invalidate the endpoint.

2.5.4. Ridge convergence

The points of the lateral ridges contained in the rectangle ABCD are used to estimate their directions. If u1 and u2 are the unit vectors of the estimated directions, the degree of convergence is estimated by the vertical component of the vector product:

C = (u1 × u2) · k     (1)

where k is the unit vector of the axis normal to the image plane. Lateral ridges are considered convergent if the coefficient C ≥ C_min = 0.085, corresponding to an angle of about 5°. If the ridges are almost parallel, we have C ≈ 0, while C < 0 for divergent ridges. If an endpoint does not satisfy the convergence condition it is marked as less reliable, but it is not invalidated, since the lateral ridges of valid endpoints can be almost parallel and characterized by very slow convergence.
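Eq. (1) reduces to the z-component of a 2-D cross product. A minimal sketch (our illustration; the sign convention assumes standard mathematical axes, with u1 the upper lateral ridge and u2 the lower one):

```python
import numpy as np

C_MIN = 0.085  # convergence threshold from the paper (angle of about 5 degrees)

def convergence(u1, u2):
    """Vertical component of the cross product of the two lateral-ridge
    direction estimates: C > 0 converging, C ~ 0 parallel, C < 0 diverging."""
    u1 = np.asarray(u1, float); u1 /= np.linalg.norm(u1)
    u2 = np.asarray(u2, float); u2 /= np.linalg.norm(u2)
    return u1[0] * u2[1] - u1[1] * u2[0]      # z-component of u1 x u2

# Upper ridge tilting slightly down, lower ridge tilting slightly up:
# the ridges slowly close toward each other.
print(convergence((1, -0.05), (1, 0.05)) >= C_MIN)  # -> True
```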
3. Results

The proposed minutiae extraction procedure has been tested on 500 fingerprint images from the NIST Special Database 4 [12] (figs_0 data). Fingerprint images are stored as 512x512 pixel, 8-bit gray-scale images. The database is organized to guarantee the existence of several different fingerprint topologies and different quality images.

3.1. Statistical results

The performance evaluation of this kind of algorithm is not a well-defined task. However, qualitative considerations can be based on a contextual visual check. The goal of post-processing is to minimize the number of spurious minutiae and to maximize the number of reliable minutiae, to allow reliable fingerprint matching.

Statistical analysis has been carried out on three different initial image conditions: (i) raw image; (ii) segmented image: boundary areas without information have been removed, then the images have been binarized and thinned; (iii) segmented and filtered image: the segmented image has also been improved by non-linear enhancement [13].

The minutiae extraction algorithms applied to the three different sets of test images produce the results shown in the rows of Table 1, where the average and standard deviation of the number of minutiae in the original images are shown in columns 2 and 3, respectively, while the average and standard deviation estimated after the application of the proposed algorithm are shown in columns 4 and 5. Column 4 also reports, in round brackets, the average number of less reliable minutiae. The last column reports the percentage of minutiae removed by the algorithm.

Table 1. Reduction of the number of minutiae (average and standard deviation) after application of the proposed algorithm

Preliminary processing     | Initial mean | Initial std | Final mean | Final std | Reduction factor
No processing              | 2933         | 1114        | 49 (29)    | 17        | 98.33%
Segmentation               | 1845         | 703         | 47 (29)    | 17        | 97.45%
Segmentation and filtering | 722          | 363         | 70 (54)    | 15        | 90.30%

Obviously, segmented and filtered images contain a lower initial number of spurious minutiae, because the image quality is higher. Even for this class of images, the number of minutiae is reduced by an order of magnitude. It must also be noted how the standard deviation is reduced. Therefore, the number of valid minutiae is "more uniform" in the post-processed image set, which reflects the real minutiae distribution in fingerprints.

Better results are obtained for raw and segmented-only images. It should be noted that a lower number of minutiae have been validated in these lower-quality images. This, together with the visual check, highlights an implicit segmentation property of the proposed algorithms. Furthermore, the algorithms eliminate minutiae located in noisy inner fingerprint image regions.

Table 2 gives the statistics related to segmented and filtered images, reporting the outcome of the application of each step of the proposed algorithm in each row: columns 2 and 3 show the average number of endpoints and bifurcations after application of the algorithm specified in column 1. The number of less reliable minutiae is reported in brackets. The last three columns show the endpoint, bifurcation and total reduction factors after application of each algorithm (numbers in brackets show the reduction factors including less reliable minutiae).

Table 2. Minutiae average number and relative reduction

Algorithm       | End.   | Bif.    | Endpoint red. (%) | Bifurcation red. (%) | Total red. (%)
Pre-filtering   | 379    | 266     | -                 | -                    | -
Ridge breaks    | 311    | 266     | 17.94             | 0.00                 | 10.54
Bridge and spur | 264    | 124     | 15.11             | 53.38                | 32.76
Short ridge     | 217    | 124     | 17.80             | 0.00                 | 12.11
Close endpoint  | 170    | 124     | 21.66             | 0.00                 | 13.78
Close minutiae  | 141    | 67      | 17.06             | 45.97                | 29.25
Island          | 141    | 55      | 0.00              | 17.91                | 5.77
Validate bif.   | 141    | 34 (10) | 0.00              | 38.18 (56.36)        | 10.71 (15.82)
Validate ep.    | 36 (6) | 34 (10) | 74.47 (78.72)     | 0.00                 | 60.00 (67.27)

The algorithm which most reduces the endpoint number is the topological endpoint validation, since it takes into account image boundaries. For the bifurcation number, bridge and spur elimination and close minutiae elimination are the most effective.
This is due to low-quality image areas where the minutiae density is very high.

The statistics for raw or segmented-only images display a major impact of the algorithms on close minutiae and topological validation. This is due to the higher number of spurious minutiae present in the raw or segmented-only images. This result also shows that the proposed algorithm performs an implicit image segmentation, since the final number of minutiae obtained from raw images turns out to be comparable to that obtained from high-quality pre-processed images.

[Fig. 11. Skeleton with minutiae points marked: (a) before post-processing; (b) after post-processing.] [Fig. 12. Extracted minutiae points superimposed on the gray-level image. False minutiae are marked by a square.]

3.2. Algorithm performance

The algorithms have been executed on a SUN SPARCstation 20, with the total execution time depending on the initial image treatment as shown in Table 3. For each algorithm the following statistics have been evaluated on the 500 test images: average execution time and standard deviation of the execution time.

Table 3. Total execution time (s)

Preventive action          | Average | Standard deviation
No action                  | 1.706   | 0.137
Segmentation               | 1.661   | 0.121
Segmentation and filtering | 1.497   | 0.087

The execution times of each step of the extraction procedure (relative to segmented and filtered images) are reported in Table 4. The normalized average execution time is also reported in column 4. Normalization is obtained by dividing the execution time by the number of minutiae processed by the corresponding algorithm.

Table 4. Algorithm execution time (segmented and filtered images)

Algorithm        | Execution time | Standard deviation | Normalized exec. time
Codification     | 0.051          | 0.009              | 0.000087
Pre-filtering    | 0.040          | 0.009              | 0.000068
Ridge breaks     | 0.089          | 0.029              | 0.000247
Bridge and spurs | 0.342          | 0.033              | 0.001702
Short ridges     | 0.158          | 0.011              | 0.000712
Close endpoint   | 0.147          | 0.010              | 0.000800
Close minutiae   | 0.153          | 0.010              | 0.000613
Island           | 0.160          | 0.014              | 0.002692
Bif. valid.      | 0.178          | 0.013              | 0.003624
Endp. valid.     | 0.179          | 0.014              | 0.001403

The bridge and spur elimination algorithms are the most CPU-expensive, but this is due to the order of application of the different algorithms. If normalized times are considered, the island elimination and the bifurcation topological validation turn out to be computationally more intensive. The sequence of the different sub-algorithms has been adjusted to optimize the final result. However, since the most CPU-expensive steps are performed when the number of minutiae has already been significantly reduced, execution times appear to be almost equally distributed among the different procedures.

3.3. Minutiae extraction algorithms in an AFIS

The effect of the minutiae extraction algorithm set has been investigated in terms of false acceptance and false reject rates. The AFIS where the algorithms are applied has the structure reported in Fig. 1. The last operation, before the answer is generated, is a validation stage where the fingerprint regions between matching minutiae are checked. This final inter-minutiae check is very strict and is able to discard matchings where a number of false minutiae generate a valid matching configuration. Thanks to this stage, the false acceptance rate does not increase if the minutiae are not completely filtered. The false reject rate does not increase by adding false points to the correct minutiae. These considerations have been
ch1: Introduction to Patterns and Pattern Recognition

Exercises

1. Briefly describe the relationships among the concepts of sample, pattern and pattern class.

2. Briefly describe the main components of a pattern recognition system.

Supervised and unsupervised learning

Supervised learning: the categories to be separated are known, and the class labels of the training samples are known.
Unsupervised learning: no class labels are available; the classification of the samples is usually accomplished with clustering methods.
Semi-supervised learning: only part of the samples carry class labels; the goal is the same as in supervised classification.

Supervised and unsupervised learning

[Figure: illustration of ground-cover types.]

Pattern: a description of the features and information possessed by the object under study; the abstraction of a class of objects is also called the pattern of that class.

Pattern recognition: the process of determining the class attribution (pattern) of a sample, i.e., assigning a sample to one of several types.

1.1 Concepts of pattern and pattern recognition

Humans possess a strong pattern recognition ability and are performing pattern recognition tasks all the time. Pattern recognition is a mapping from samples to classes.

Pattern Recognition

Chapter 1: Introduction
Wuhan University of Technology, School of Science
wanwanyuan@

Chapter 1: Introduction to Pattern Recognition

1. Concepts of pattern and pattern recognition
2. Pattern recognition systems
3. Applications of pattern recognition
4. Methods of pattern recognition

1.1 Concepts of pattern and pattern recognition

Sample: an individual of the objects under study, e.g., a patient's cells, a Chinese character, a picture, or a video clip.
A pattern recognition approach to the detection of complex edges
* Corresponding author. Email: dori@ie.technion.ac.il
1. Introduction
Physical edges are fundamental descriptions of physical objects, as they relate to transitions in surface orientation or texture. Edge detection is the identification of the intensity changes corresponding to the underlying physical changes. Detecting edges in a radiograph is a first step in taking measurements. Edges in radiographs differ from "conventional" edges, because X-rays, unlike visible light, are only partially absorbed by the object they hit. The physical properties of the object and the width the X-rays must traverse before hitting the photographic film determine the brightness level of each point in the radiograph. Edges in radiographs therefore have a pattern of edge function that is more complex than the step function that models edges in ordinary images. To achieve radiograph understanding, computer systems must relate the raw input data to the physical structure that causes it, i.e., the object being radiated.

Davis (1975) provides a survey of edge detection techniques prior to 1975. One of these methods is the "gradient" operator

|g(i, j) − g(i+1, j+1)| + |g(i, j+1) − g(i+1, j)|,

proposed by Roberts (Davis,
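For concreteness, the quoted Roberts "gradient" operator can be applied to a whole image with array slicing; this sketch is our illustration, not code from the survey:

```python
import numpy as np

def roberts_gradient(g):
    """Roberts 'cross' gradient magnitude (L1 form), as quoted above:
    |g(i,j) - g(i+1,j+1)| + |g(i,j+1) - g(i+1,j)|."""
    g = np.asarray(g, dtype=float)
    return (np.abs(g[:-1, :-1] - g[1:, 1:]) +
            np.abs(g[:-1, 1:] - g[1:, :-1]))

# A vertical step edge: the response is largest along the transition
img = np.array([[0, 0, 9, 9]] * 4)
print(roberts_gradient(img))
```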
Pattern_Recognition
Squared Euclidean distances d² from each point to the cluster centers 1: (2, 10), 2: (6, 6) and 3: (1.5, 3.5):

Point      | d² to center 1 | d² to center 2 | d² to center 3
A1 (2, 10) | 0              | 32             | 0.5² + 6.5² = 42.5
A2 (2, 5)  | 25             | 17             | 0.5² + 1.5² = 2.5
A3 (8, 4)  | 36 + 36 = 72   | 8              | 6.5² + 0.5² = 42.5
B1 (5, 8)  | 9 + 4 = 13     | 5              | 3.5² + 4.5² = 32.5
B2 (7, 5)  | 25 + 25 = 50   | 2              | 5.5² + 1.5² = 32.5
B3 (6, 4)  | 16 + 36 = 52   | 4              | 4.5² + 0.5² = 20.5
C1 (1, 2)  | 1 + 64 = 65    | 41             | 0.5² + 1.5² = 2.5
C2 (4, 9)  | 4 + 1 = 5      | 13             | 2.5² + 5.5² = 36.5
Neural networks

• Massively parallel computation
• Learning, generalization, adaptivity, fault tolerance, distributed representation and computation
• Advantages: can effectively solve some complex nonlinear problems
• Disadvantages: lacks an effective learning theory
Applications of pattern recognition

• Text classification
• Document image analysis
• Industrial automation
• Data mining
• Multimedia database retrieval
• Biometric recognition
• Speech recognition
• Bioinformatics
• Remote sensing
• ...
Squared Euclidean distances d² from each point to the initial centers A1 (2, 10), B1 (5, 8) and C1 (1, 2):

Point      | d² to A1     | d² to B1     | d² to C1
A1 (2, 10) | 0            | 9 + 4 = 13   | 1 + 64 = 65
A2 (2, 5)  | 25           | 9 + 9 = 18   | 1 + 9 = 10
A3 (8, 4)  | 36 + 36 = 72 | 9 + 16 = 25  | 49 + 4 = 53
B1 (5, 8)  | 9 + 4 = 13   | 0            | 16 + 36 = 52
B2 (7, 5)  | 25 + 25 = 50 | 4 + 9 = 13   | 36 + 9 = 45
B3 (6, 4)  | 16 + 36 = 52 | 1 + 16 = 17  | 25 + 4 = 29
C1 (1, 2)  | 1 + 64 = 65  | 16 + 36 = 52 | 0
C2 (4, 9)  | 4 + 1 = 5    | 1 + 1 = 2    | 9 + 49 = 58
Assigning each point to its nearest center gives the clusters and new centers:

Cluster 1: A1; center (2, 10)
Cluster 2: A3, B1, B2, B3, C2; center (6, 6)
Cluster 3: A2, C1; center (1.5, 3.5)

Second iteration: the centers are 1: (2, 10), 2: (6, 6), 3: (1.5, 3.5).
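The two iterations above can be reproduced with a few lines of k-means; this sketch is our own reconstruction of the worked example (points and initial centers A1, B1, C1 as given):

```python
import numpy as np

pts = {'A1': (2, 10), 'A2': (2, 5), 'A3': (8, 4), 'B1': (5, 8),
       'B2': (7, 5), 'B3': (6, 4), 'C1': (1, 2), 'C2': (4, 9)}

def kmeans(points, centers, iters=2):
    names = list(points)
    X = np.array([points[n] for n in names], float)
    C = np.array(centers, float)
    for it in range(iters):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # squared distances
        assign = d2.argmin(1)                                # nearest center
        C = np.array([X[assign == k].mean(0) for k in range(len(C))])
        clusters = {k + 1: [n for n, a in zip(names, assign) if a == k]
                    for k in range(len(C))}
        print(f'iteration {it + 1}: {clusters}  new centers: {C.round(2)}')
    return C

kmeans(pts, centers=[pts['A1'], pts['B1'], pts['C1']])
# iteration 1 recovers clusters {A1}, {A3,B1,B2,B3,C2}, {A2,C1}
# and the new centers (2,10), (6,6), (1.5,3.5) shown above
```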
• Decision
• Learning
• Generalization (universality, extension, abstraction)
Translation 2
Bayesian face recognition

Baback Moghaddam*, Tony Jebara, Alex Pentland

Mitsubishi Electric Research Laboratory, 201 Broadway, 8th floor, Cambridge, MA 02139, USA
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Received 15 January 1999; received in revised form 28 July 1999; accepted 28 July 1999
discriminant analysis (LDA) as used by Etemad and Chellappa [16], the "Fisherface" technique of Belhumeur et al. [17], hierarchical discriminants used by Swets and Weng [18] and "evolutionary pursuit" of optimal subspaces by Liu and Wechsler [19], all of which have proved equally (if not more) powerful than standard "eigenfaces". Eigenspace techniques have also been applied to modeling the shape (as opposed to texture) of the face. Eigenspace coding of shape-normalized or "shape-free" faces, as suggested by Craw and Cameron [20], is now a standard pre-processing technique which can enhance performance when used in conjunction with shape information [21]. Lanitis et al. [22] have developed an automatic face-processing system with subspace models of both the shape and texture components, which can be used for recognition as well as expression, gender and pose classification. Additionally, subspace analysis has also been used for robust face detection [12,14,23], nonlinear facial interpolation [24], as well as visual learning for general object recognition [13,25,26].
Pattern Recognition (Chinese and English)
The Science of Pattern Recognition: Achievements and Perspectives

Robert P.W. Duin (1) and Elżbieta Pękalska (2)
(1) ICT group, Faculty of Electr. Eng., Mathematics and Computer Science, Delft University of Technology, The Netherlands, r.duin@
(2) School of Computer Science, University of Manchester, United Kingdom, pekalska@

Summary. Automatic pattern recognition is usually considered as an engineering area which focusses on the development and evaluation of systems that imitate or assist humans in their ability of recognizing patterns. It may, however, also be considered as a science that studies the faculty of human beings (and possibly other biological systems) to discover, distinguish and characterize patterns in their environment and accordingly identify new observations. The engineering approach to pattern recognition is in this view an attempt to build systems that simulate this phenomenon. By doing that, scientific understanding is gained of what is needed in order to recognize patterns in general. Like in any science, understanding can be built from different, sometimes even opposite viewpoints. We will therefore introduce the main approaches to the science of pattern recognition as two dichotomies of complementary scenarios. They give rise to four different schools, roughly defined under the terms of expert systems, neural networks, structural pattern recognition and statistical pattern recognition. We will briefly describe what has been achieved by these schools, what is common and what is specific, which limitations are encountered and which perspectives arise for the future. Finally, we will focus on the challenges facing pattern recognition in the decennia to come. They mainly deal with weaker assumptions of the models, to make the corresponding procedures for learning and recognition wider applicable. In addition, new formalisms need to be developed.

Introduction

We are very familiar with the human ability of pattern recognition. Since our early years we have been able to recognize voices, faces, animals, fruits or inanimate objects. Before the speaking faculty is developed, an object like a ball is recognized, even if it barely resembles the balls seen before. So, except for the memory, the skills of abstraction and generalization are essential to find our way in the world. In later years we are able to deal with much more complex patterns that may not directly be based on sensorial observations. For example, we can observe the underlying theme in a discussion or subtle patterns in human relations. The latter may become apparent, e.g. only by listening to somebody's complaints about his personal problems at work that again occur in a completely new job. Without a direct participation in the events, we are able to see both analogy and similarity in examples as complex as social interaction between people. Here, we learn to distinguish the pattern from just two examples.

The pattern recognition ability may also be found in other biological systems: the cat knows the way home, the dog recognizes his boss from the footsteps and the bee finds the delicious flower. In these examples a direct connection can be made to sensory experiences. Memory alone is insufficient; an important role is that of generalization from observations which are similar, although not identical to the previous ones. A scientific challenge is to find out how this may work.

Scientific questions may be approached by building models and, more explicitly, by creating simulators, i.e.
artificial systems that roughly exhibit the same phenomenon as the object under study. Understanding will be gained while constructing such a system and evaluating it with respect to the real object. Such systems may be used to replace the original ones and may even improve some of their properties. On the other hand, they may also perform worse in other aspects. For instance, planes fly faster than birds but are far from being autonomous. We should realize, however, that what is studied in this case may not be the bird itself, but more importantly, the ability to fly. Much can be learned about flying in an attempt to imitate the bird, but also when differentiating from its exact behavior or appearance. By constructing fixed wings instead of freely movable ones, the insight in how to fly grows. Finally, there are engineering aspects that may gradually deviate from the original scientific question. These are concerned with how to fly for a long time, with heavy loads, or by making less noise, and slowly shift the point of attention to other domains of knowledge.

The above shows that a distinction can be made between the scientific study of pattern recognition as the ability to abstract and generalize from observations and the applied technical area of the design of artificial pattern recognition devices, without neglecting the fact that they may highly profit from each other. Note that patterns can be distinguished on many levels, starting from simple characteristics of structural elements like strokes, through features of an individual, towards a set of qualities in a group of individuals, to a composite of traits of concepts and their possible generalizations. A pattern may also denote a single individual as a representative for its population, model or concept. Pattern recognition deals, therefore, with patterns, regularities, characteristics or qualities that can be discussed on a low level of sensory measurements (such as pixels in an image) as well as on a high level of the derived and meaningful concepts (such as faces in images). In this work, we will focus on the scientific aspects, i.e. what we know about the way pattern recognition works and, especially, what can be learned from our attempts to build artificial recognition devices.

A number of authors have already discussed the science of pattern recognition based on their simulation and modeling attempts. One of the first, in the beginning of the sixties, was Sayre [64], who presented a philosophical study on perception, pattern recognition and classification. He made clear that classification is a task that can be fulfilled with some success, but recognition either happens or not. We can stimulate the recognition by focussing on some aspects of the question. Although we cannot set out to fully recognize an individual, we can at least start to classify objects on demand. The way Sayre distinguishes between recognition and classification is related to the two subfields discussed in traditional texts on pattern recognition, namely unsupervised and supervised learning. They fulfill two complementary tasks. They act as automatic tools in the hand of a scientist who sets out to find the regularities in nature. Unsupervised learning (also related to exploratory analysis or cluster analysis) gives the scientist an automatic system to indicate the presence of yet unspecified patterns (regularities) in the observations. They have to be confirmed (verified) by him.
Here, in the terms of Sayre, a pattern is recognized. Supervised learning is an automatic system that verifies (confirms) the patterns described by the scientist, based on a representation defined by him. This is done by an automatic classification followed by an evaluation.

In spite of Sayre's discussion, the concepts of pattern recognition and classification are still frequently mixed up. In our discussion, classification is a significant component of the pattern recognition system, but unsupervised learning may also play a role there. Typically, such a system is first presented with a set of known objects, the training set, in some convenient representation. Learning relies on finding the data descriptions such that the system can correctly characterize, identify or classify novel examples. After appropriate preprocessing and adaptations, various mechanisms are employed to train the entire system well. Numerous models and techniques are used and their performances are evaluated and compared by suitable criteria. If the final goal is prediction, the findings are validated by applying the best model to unseen data. If the final goal is characterization, the findings may be validated by complexity of organization (relations between objects) as well as by interpretability of the results.

Fig. 1 shows the three main stages of pattern recognition systems: Representation, Generalization and Evaluation, and an intermediate stage of Adaptation [20]. The system is trained and evaluated by a set of examples, the Design Set. The components are:

• Design Set. It is used both for training and validating the system. Given the background knowledge, this set has to be chosen such that it is representative for the set of objects to be recognized by the trained system. There are various approaches how to split it into suitable subsets for training, validation and testing. See e.g. [22, 32, 62, 77] for details.

• Representation. Real-world objects have to be represented in a formal way in order to be analyzed and compared by mechanical means such as a computer. Moreover, the observations derived from the sensors or other formal representations have to be integrated with the existing, explicitly formulated knowledge, either on the objects themselves or on the class they may belong to. The issue of representation is an essential aspect of pattern recognition and is different from classification. It largely influences the success of the stages to come.

• Adaptation. It is an intermediate stage between Representation and Generalization, in which representations, learning methodology or problem statement are adapted or extended in order to enhance the final recognition. This step may be neglected as being transparent, but its role is essential. It may reduce or simplify the representation, or it may enrich it by emphasizing particular aspects, e.g. by a nonlinear transformation of features that simplifies the next stage. Background knowledge may appropriately be (re)formulated and incorporated into a representation. If needed, additional representations may be considered to reflect other aspects of the problem. Exploratory data analysis (unsupervised learning) may be used to guide the choice of suitable learning strategies.

• Generalization or Inference. In this stage we learn a concept from a training set, the set of known and appropriately represented examples, in such a way that predictions can be made on some unknown properties of new examples.
We either generalize towards a concept or infer a set of general rules that describe the qualities of the training data. The most common property is the class or pattern it belongs to, which is the above-mentioned classification task.

• Evaluation. In this stage we estimate how our system performs on known training and validation data while training the entire system. If the results are unsatisfactory, then the previous steps have to be reconsidered.

Different disciplines emphasize or just exclusively study different parts of this system. For instance, perception and computer vision deal mainly with the representation aspects [21], while books on artificial neural networks [62], machine learning [4, 53] and pattern classification [15] are usually restricted to generalization. It should be noted that these and other studies with the words "pattern" and "recognition" in the title often almost entirely neglect the issue of representation. We think, however, that the main goal of the field of pattern recognition is to study generalization in relation to representation [20].

In the context of representations, and especially images, generalization has been thoroughly studied by Grenander [36]. What is very specific and worthwhile is that he deals with infinite representations (say, unsampled images), thereby avoiding the frequently returning discussions on dimensionality and directly focussing on a high, abstract level of pattern learning. We like to mention two other scientists that present very general discussions on the pattern recognition system: Watanabe [75] and Goldfarb [31, 32]. They both emphasize the structural approach to pattern recognition that we will discuss later on. Here objects are represented in a form that focusses on their structure. A generalization over such structural representations is very difficult if one aims to learn the concept, i.e. the underlying, often implicit definition of a pattern class that is able to generate possible realizations. Goldfarb argues that traditionally used numeric representations are inadequate and that an entirely new, structural representation is necessary. We judge his research program as very ambitious, as he wants to learn the (generalized) structure of the concept from the structures of the examples. He thereby aims to make explicit what usually stays implicit. We admit that a way like his has to be followed if one ever wishes to reach more in concept learning than the ability to name the right class with a high probability, without having built a proper understanding.
HandwrittenDigit...
Handwritten Digit Recognition Based on Principal Component Analysis and Support Vector Machines

Rui Li and Shiqing Zhang
School of Physics and Electronic Engineering, Taizhou University, 318000 Taizhou, China
{lirui,zhangshiqing}@

Abstract. Handwritten digit recognition has always been a challenging task in the pattern recognition area. In this paper we explore the performance of support vector machines (SVM) and principal component analysis (PCA) on handwritten digit recognition. The performance of SVM on the handwritten digit recognition task is compared with three typical classification methods, i.e., linear discriminant classifiers (LDC), the nearest neighbor (1-NN), and the back-propagation neural network (BPNN). The experimental results on the popular MNIST database indicate that SVM gets the best performance with an accuracy of 89.7% with 10-dimensional embedded features, outperforming the other used methods.

Keywords: Handwritten digit recognition, Principal component analysis, Support vector machines.

1 Introduction

Handwritten digit recognition is an active topic in the pattern recognition area due to its important applications to optical character recognition, postal mail sorting, bank check processing, form data entry, and so on. The performance of character recognition largely depends on the feature extraction approach and the classifier learning scheme. For feature extraction in character recognition, various approaches, such as stroke direction features, statistical features and local structural features, have been presented [1, 2]. Following feature extraction, it is usually necessary to reduce the dimensionality of the features, since the original features are high-dimensional. Principal component analysis [3] is a fundamental multivariate data analysis method, widely used for reducing the dimensionality of an existing data set while extracting the important information. The task of classification is to partition the feature space into regions corresponding to source classes, or to assign class confidences to each location in the feature space. At present, representative statistical learning techniques [4], including linear discriminant classifiers (LDC) and the nearest neighbor (1-NN), as well as neural networks [5], have been widely used for handwritten digit recognition. Support vector machines (SVM) [6] became a popular classification tool due to their strong generalization capability, and have been successfully employed in various real-world applications. In the present study we employ PCA to extract low-dimensional embedded data representations and explore the performance of SVM for handwritten digit recognition.

2 Principal Component Analysis

Principal component analysis (PCA) [3] is a basis transformation that diagonalizes an estimate of the covariance matrix of the data set. PCA can be applied to represent the input digit images by projecting them onto a low-dimensional space constituted by a small number of basis images, derived by finding the most significant eigenvectors of the covariance matrix. In order to find a linear mapping $M$ which maximizes the objective function $\mathrm{trace}(M^T \mathrm{cov}(X) M)$, PCA solves the following eigenproblem:

$$\mathrm{cov}(X)\, M = \lambda M \qquad (1)$$

where $\mathrm{cov}(X)$ is the sample covariance matrix of the data $X$. The $d$ principal eigenvectors of the covariance matrix form the linear mapping $M$, and the low-dimensional data representations are then computed as $Y = XM$.
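A minimal sketch of this eigendecomposition route to PCA (our illustration, not the authors' code); np.linalg.eigh is used because the covariance matrix is symmetric:

```python
import numpy as np

def pca(X, d):
    """Project X (n_samples x n_features) onto its d principal components:
    solve cov(X) m = lambda m and keep the d leading eigenvectors."""
    Xc = X - X.mean(axis=0)                    # center the data
    C = np.cov(Xc, rowvar=False)               # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)             # symmetric eigendecomposition
    M = vecs[:, np.argsort(vals)[::-1][:d]]    # d leading eigenvectors
    return Xc @ M                              # Y = X M

X = np.random.default_rng(0).normal(size=(100, 784))
print(pca(X, 10).shape)   # -> (100, 10)
```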
3 Support Vector Machines

Support vector machines (SVM) [6] are based on the statistical learning theory of structural risk minimization and quadratic programming optimization. The main idea is to transform the input vectors into a higher-dimensional space by a nonlinear transform, in which an optimal separating hyperplane can be found. Given a training data set $\{(x_1, y_1), \dots, (x_l, y_l)\}$, $y_i \in \{-1, +1\}$, a nonlinear transform $z = \Phi(x)$ is used to make the training data linearly separable. A weight $w$ and offset $b$ satisfying the following criteria are sought:

$$\begin{cases} w^T z_i + b \ge 1, & y_i = 1 \\ w^T z_i + b \le -1, & y_i = -1 \end{cases} \qquad (2)$$

The above procedure can be summarized as:

$$\min_{w,b} \ \Phi(w) = \tfrac{1}{2} w^T w \qquad (3)$$

subject to $y_i (w^T z_i + b) \ge 1$, $i = 1, 2, \dots, n$. If the sample data are not linearly separable, the following function should be minimized instead:

$$\Phi(w) = \tfrac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \qquad (4)$$

where $\xi_i$ can be understood as the classification error and $C$ is the penalty parameter for this term. Using the Lagrange method, the optimal weight is $w_0 = \sum_{i=1}^{l} \lambda_i y_i z_i$ and the decision function becomes

$$f = \mathrm{sgn}\Big[\sum_{i=1}^{l} \lambda_i y_i z_i^T z + b_0\Big] \qquad (5)$$

From functional theory, a non-negative symmetric function $K(u, v)$ uniquely defines a Hilbert space $H$, where $K$ is the reproducing kernel of the space $H$:

$$K(u, v) = \sum_i \alpha_i \varphi_i(u) \varphi_i(v) \qquad (6)$$

This stands for an inner product in a feature space:

$$z^T z_i = \Phi(x)^T \Phi(x_i) = K(x, x_i) \qquad (7)$$

Then the decision function can be written as:

$$f = \mathrm{sgn}\Big[\sum_{i=1}^{l} \lambda_i y_i K(x, x_i) + b\Big] \qquad (8)$$

The development of an SVM classification model depends on the selection of the kernel function. There are several kernel functions, such as linear, polynomial, radial basis function (RBF) and sigmoid kernels, that can be used in SVM models.

4 Experiment Study

4.1 MNIST Database

The popular MNIST database of handwritten digits, which has been widely used for the evaluation of classification and machine learning algorithms, is used for our experiments. The MNIST database, available from the web site /exdb/mnist, has a training set of 60000 examples and a test set of 10000 examples. It is a subset of a larger set available from NIST. The original black and white images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field. In our experiments, for computational simplicity we randomly selected 3000 training samples and 1000 testing samples for handwritten digit recognition. Some samples from the MNIST database are shown in Fig. 1.

4.2 Experimental Results and Analysis

To verify the performance of SVM on the handwritten digit recognition task, three typical methods, i.e., linear discriminant classifiers (LDC), the nearest neighbor (1-NN) and the back-propagation neural network (BPNN) as a representative neural network, were used for comparison with SVM. For the BPNN method, the number of hidden layer nodes is 30. We employed the LIBSVM package, available at .tw/cjlin/libsvm, to implement the SVM algorithm with an RBF kernel, kernel parameter optimization, and the one-versus-one strategy for the multi-class classification problem. The RBF kernel was used for its better performance compared with other kernels.
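The paper uses the LIBSVM package directly. An equivalent PCA-plus-RBF-SVM setup in scikit-learn (our substitution; the MNIST loading is stubbed with random arrays, and the grid of C and gamma values is illustrative) looks like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

# Stand-in for the 3000/1000 MNIST split used in the paper
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(300, 784)), rng.integers(0, 10, 300)
X_test, y_test = rng.normal(size=(100, 784)), rng.integers(0, 10, 100)

# 10-D PCA features + RBF-kernel SVM; SVC handles multi-class
# classification with a one-versus-one strategy internally,
# matching the strategy named in the paper.
model = make_pipeline(PCA(n_components=10), SVC(kernel='rbf'))
search = GridSearchCV(model, {'svc__C': [1, 10, 100],
                              'svc__gamma': ['scale', 0.01]}, cv=3)
search.fit(X_train, y_train)
print('test accuracy:', search.score(X_test, y_test))
```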
For simplicity, the feature dimension of the original grey image features (28x28 = 784) is reduced to 10 as an illustration for evaluating the performance of SVM.

[Fig. 1. Some samples from the MNIST database.]

Table 1. Handwritten digit recognition results with 10-dimensional embedded features

Methods      | LDC  | 1-NN | BPNN | SVM
Accuracy (%) | 77.6 | 82.7 | 84.8 | 89.7

Table 2. Confusion matrix of handwritten digit recognition results with SVM

Digits | 0  | 1   | 2  | 3   | 4  | 5  | 6  | 7   | 8  | 9
0      | 90 | 0   | 0  | 0   | 0  | 5  | 0  | 0   | 1  | 0
1      | 0  | 110 | 1  | 0   | 0  | 0  | 0  | 0   | 4  | 0
2      | 0  | 0   | 84 | 1   | 2  | 0  | 1  | 1   | 0  | 0
3      | 0  | 0   | 5  | 108 | 0  | 3  | 0  | 0   | 7  | 0
4      | 0  | 0   | 3  | 0   | 69 | 1  | 1  | 0   | 1  | 12
5      | 2  | 0   | 3  | 2   | 2  | 88 | 0  | 0   | 1  | 1
6      | 2  | 0   | 0  | 0   | 1  | 0  | 84 | 0   | 1  | 0
7      | 0  | 2   | 1  | 0   | 1  | 0  | 0  | 101 | 0  | 6
8      | 2  | 0   | 2  | 2   | 0  | 4  | 1  | 0   | 75 | 3
9      | 0  | 2   | 0  | 0   | 6  | 1  | 0  | 2   | 4  | 88

Table 1 presents the recognition results of the four classification methods, LDC, 1-NN, BPNN and SVM. From the results in Table 1 we can observe that SVM performs best and achieves the highest accuracy of 89.7% with 10-dimensional embedded features, followed by BPNN, 1-NN and LDC. This demonstrates that SVM has the best generalization ability among the four classification methods used. The recognition accuracies for BPNN, 1-NN and LDC are 84.8%, 82.7% and 77.6%, respectively.

To further explore the recognition results for the different handwritten digits with SVM, the confusion matrix of the recognition results is presented in Table 2. As shown in Table 2, three digits, i.e., "1", "3" and "7", are discriminated well, while the other digits are classified more poorly.

5 Conclusions

In this paper, we performed dimension reduction with PCA on the grey digit image features and explored the performance of four different classification methods, i.e., LDC, 1-NN, BPNN and SVM, for handwritten digit recognition on the popular MNIST database. The experimental results on the MNIST database demonstrate that SVM achieves the best performance, with an accuracy of 89.7% with 10-dimensional reduced features, due to its good generalization ability. In future work, it would be interesting to study the performance of more advanced dimensionality reduction techniques than PCA on handwritten digit recognition.

Acknowledgments. This work is supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1111058).

References
1. Trier, O.D., Jain, A.K., Taxt, T.: Feature extraction methods for character recognition - a survey. Pattern Recognition 29(4), 641-662 (1996)
2. Lauer, F., Suen, C.Y., Bloch, G.: A trainable feature extractor for handwritten digit recognition. Pattern Recognition 40(6), 1816-1824 (2007)
3. Partridge, M., Calvo, R.: Fast dimensionality reduction and simple PCA. Intelligent Data Analysis 2(3), 292-298 (1998)
4. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 4-37 (2000)
5. Kang, M., Palmer-Brown, D.: A modal learning adaptive function neural network applied to handwritten digit recognition. Information Sciences 178(20), 3802-3812 (2008)
6. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2000)
Hierarchical threshold secret image sharing
A hierarchical threshold secret image sharing

Cheng Guo (a), Chin-Chen Chang (b,c,*), Chuan Qin (b)
(a) Department of Computer Science, National Tsing-Hua University, Hsinchu 30013, Taiwan, ROC
(b) Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Rd., Seatwen, Taichung 40724, Taiwan, ROC
(c) Department of Biomedical Imaging and Radiological Science, Chinese Medical University, Taichung 40402, Taiwan, ROC
* Corresponding author at: Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Rd., Seatwen, Taichung 40724, Taiwan, ROC. Tel.: +886 4 24517250 x3790; fax: +886 4 27066495. E-mail addresses: guo8016@ (C. Guo), alan3c@ (C.-C. Chang), qin@ (C. Qin).

Article history: Received 31 March 2011; available online 1 October 2011. Communicated by S. Sarkar. doi:10.1016/j.patrec.2011.09.030

Keywords: Hierarchical threshold; Distortion-free; Secret image sharing; Access structure

Abstract. In the traditional secret image sharing schemes, the shadow images are generated by embedding the secret data into the cover image such that a sufficient number of shadow images can cooperate to reconstruct the secret image. In the process of reconstruction, each shadow image plays an equivalent role. However, a general threshold access structure could have other useful properties for the application. In this paper, we consider the problem of secret shadow images with a hierarchical threshold structure, employing Tassa's hierarchical secret sharing to propose a hierarchical threshold secret image sharing scheme. In our scheme, the shadow images are partitioned into several levels, and the threshold access structure is determined by a sequence of threshold requirements. If and only if the shadow images involved satisfy the threshold requirements can the secret image be reconstructed without distortion. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

A secret sharing scheme is a technique for sharing a secret among a group of participants. The secret data is divided into several pieces, called secret shadows, which are distributed to the participants. If enough participants cooperate and pool their secret shadows together, the secret data can be reconstructed.

In 1979, the first (t, n) threshold secret sharing schemes were proposed by Shamir (1979) and Blakley (1979), based on Lagrange interpolation and linear projective geometry, respectively. In 1995, based on the concept of threshold secret sharing, Naor and Shamir (1995) introduced visual cryptology, or the visual secret sharing scheme (VSS scheme). In the (t, n) VSS schemes (Yang, 2004; Wang et al., 2007; Chang et al., 2009; Lin and Wang, 2010), the secret data is an image comprised of black and white pixels that is encoded into n shadow images, and the secret image can be reconstructed only by stacking t of the shadow images; no information about the secret image can be obtained from t − 1 or fewer shadow images. However, in this kind of visual secret sharing scheme, the shadow images are meaningless, which tends to call attention to them. In 2003, Thien and Lin (2003) utilized the steganography approach to embed the secret image into a cover image to generate the shadow images. In their scheme, the shadow images are meaningful and the distortion between the cover image and the shadow images is imperceptible. Since then, secret image sharing schemes (Lin and Tsai, 2004; Wu et al., 2004; Yang et al., 2007; Chang et al., 2008; Zhao et al., 2009; Lin et al., 2009; Eslami et al., 2010; Lin and Chan, 2010) have been extensively developed to meet the requirements of our daily lives.

In 2004, Lin and Tsai (2004) proposed a secret image sharing scheme with steganography and authentication, in which the shadow images are meaningful, while the reconstructed secret image is slightly distorted. In 2007, Yang et al. (2007) improved Lin and Tsai's scheme by making it possible to restore a distortion-free secret image, but their scheme reduces the visual quality of the shadow images and increases the risk of their being deceived by malicious intruders.
making it possible to restore a distortion-free secret image, but their scheme reduces the visual quality of the shadow images and increases the risk of their being deceived by malicious intruders. In 2009, Lin et al. (2009) employed the modulus operator to embed the secret image into a cover image, where each participant can obtain a meaningful shadow image with high visual quality, the authorized participants can detect intruders, and the secret image and the cover image can be recovered losslessly. In 2010, Lin and Chan (2010) proposed a new secret sharing scheme that achieves an excellent combination of embedding capacity and visual quality of the shadow images. In addition, their scheme can reconstruct the secret image and the cover image without distortion.

However, in all of these schemes, all shadow images are equal in terms of privileges and authority in the process of reconstructing the secret image. In this paper, we consider a special situation in which the shadow images may not be equal. We achieve a hierarchical threshold access structure by introducing Tassa's threshold secret sharing scheme, and we present a novel hierarchical threshold secret image sharing scheme.

Secret sharing schemes have been extensively studied, and secret image sharing and its many variations form an important research direction. In 2007, Tassa (2007) proposed a hierarchical threshold secret sharing scheme based on Birkhoff interpolation. In his scheme, the secret is shared by a set of participants partitioned into several levels, and the secret data can be reconstructed by satisfying a sequence of threshold requirements (e.g., there are at least t_0 participants from the highest level, as well as at least t_1 > t_0 participants from the two highest levels, and so forth). There are many real-life examples of hierarchical threshold schemes. Consider the following example. According to a graduate school's policy, a graduate who wants to apply for a postgraduate position must have letters of recommendation. Assume that the policy is that the candidate must have at least two recommendations from professors and at least five recommendations from a combination of professors and associate professors. In this scenario, the professor is the highest level, the associate professor is the second-highest level, the corresponding threshold values are t_0 = 2 and t_1 = 5, and the shadows of a higher level can substitute for those of a lower level. In this example, recommendations from two professors and three associate professors, three professors and two associate professors, four professors and one associate professor, or five professors are all acceptable. The same situation also appears in the sharing of secret images.
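This sequence-of-thresholds rule is easy to express in code. Below is a minimal Python sketch of the hierarchical access check using the recommendation-letter example just given; the function name and data layout are my own, introduced only for illustration.

```python
# Checks the sequence-of-thresholds rule described above (illustrative helper).
def satisfies_hierarchy(levels, thresholds, shares):
    """levels: list of sets of participant ids, highest level first;
    thresholds: increasing ints t_0 <= t_1 <= ...; shares: set of ids present."""
    seen = set()
    for level, t in zip(levels, thresholds):
        seen |= level                      # union of the i highest levels
        if len(shares & seen) < t:         # requires |V ∩ (P_0 ∪ ... ∪ P_i)| >= t_i
            return False
    return True

professors = {"p1", "p2", "p3", "p4", "p5"}
associates = {"a1", "a2", "a3", "a4"}
levels, thresholds = [professors, associates], [2, 5]

print(satisfies_hierarchy(levels, thresholds, {"p1", "p2", "a1", "a2", "a3"}))  # True
print(satisfies_hierarchy(levels, thresholds, {"p1", "a1", "a2", "a3", "a4"}))  # False: one professor only
```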
Inspired by the hierarchical secret sharing scheme, we construct a hierarchical threshold secret image sharing scheme.

The existing secret image sharing schemes (Lin and Tsai, 2004; Wu et al., 2004; Yang et al., 2007; Chang et al., 2008; Zhao et al., 2009; Lin et al., 2009; Eslami et al., 2010; Lin and Chan, 2010) have tried to improve the visual quality of the shadow images so that the suspicions of malicious intruders will not be aroused. Two of the most popular steganographic methods are least significant bit (LSB) replacement and the modulus operation. Shamir's (t, n) threshold scheme is an ingenious method with which to share secret data among n participants. Traditional secret image sharing schemes (Lin et al., 2009; Lin and Chan, 2010) usually transform the secret image pixels into a base-m representation, $s_1, s_2, \ldots, s_{M_S \times N_S \times \lceil \log_m 255 \rceil}$, where $M_S \times N_S$ denotes the size of the secret image, and then construct a Lagrange interpolation polynomial f(x) using these values as the polynomial coefficients. In order to make the shadow images meaningful for the purposes of camouflage, that is, to diminish the distortion of the shadow images, they utilize the modulus operator to embed the secret data into the pixels of the cover image.

However, in Tassa's scheme, the hierarchical threshold access structure works only if the modulus p is far greater than the threshold t, so it is difficult to combine existing secret image sharing schemes based on the modulus operation directly with Tassa's hierarchical threshold access structure. Therefore, how to combine the hierarchical threshold access structure proposed by Tassa (2007) with steganographic methods is our scheme's main challenge. We also need to guarantee that the secret image can be reconstructed losslessly.

To the best of our knowledge, no hierarchical secret image sharing schemes have been proposed in the literature to date. Since we believe that the application of secret image sharing in groups with a hierarchical structure has good prospects, we provide a novel hierarchical threshold secret image sharing scheme. In our scheme, the n shadow images generated from the secret image and the cover image are partitioned into several levels such that each level has a certain number of shadow images and a corresponding threshold. The secret image can be reconstructed only by meeting a sequence of threshold requirements.

This novel characteristic is not available in the existing mechanisms, so the proposed scheme has the potential to work in many applications. In addition to the unique hierarchical threshold characteristic, our proposed scheme has three key properties:

1. The secret image can be retrieved losslessly.
2. The scheme solves the problems of overflow and underflow.
3. Unlike traditional secret image sharing schemes, in which the embedding capacity is proportional to t, the capacity of the embedded secret data is stable and large.

2. Review of Tassa's hierarchical threshold secret sharing scheme

In hierarchical threshold secret sharing, the set of participants is partitioned into levels P_0, P_1, ..., P_m, and the access structure is determined by a sequence of threshold requirements t_0, t_1, ..., t_m according to the hierarchy. Tassa's (2007) method for hierarchical threshold secret sharing is based on Birkhoff interpolation. In this section, we briefly introduce Tassa's hierarchical threshold secret sharing scheme. Assume that there are n participants and one dealer responsible for generating the secret shadows and
distributing them to the participants.

1. The dealer generates a polynomial F(x) of degree at most t_m - 1 over GF(q),
$$F(x) = S + a_1 x + a_2 x^2 + \cdots + a_{t_m - 1} x^{t_m - 1},$$
where S is the shared secret data.
2. An element i in GF(q) is assigned to participant i, for all 1 <= i <= n.
3. For any level j, each participant i from the j-th level receives the secret shadow $F^{(t_{j-1})}(i)$, where $F^{(t_{j-1})}(\cdot)$ is the $(t_{j-1})$-th derivative of F(x), with the convention $t_{-1} = 0$.
4. In the reconstruction phase, the participants cooperate to reconstruct the shared secret data using Birkhoff interpolation.

3. The proposed scheme

Given a shared secret image S and a cover image O, the dealer generates n shadow images p_i, for i = 1, 2, ..., n. In the proposed scheme, the n shadow images do not have equal status; instead, the secret image is shared among n shadow images that are partitioned into several levels. Based on Tassa's definition (Tassa, 2007), we define hierarchical secret image sharing as follows:

Definition 1. Let P be a set of n shadow images and assume that P is composed of levels; that is, $P = \bigcup_{i=0}^{m} P_i$, where $P_i \cap P_j = \emptyset$ for all $i \neq j$ and $i, j \in [0, m]$. Let $t = \{t_i\}_{i=0}^{m}$ be a monotonically increasing sequence of integers. Then the (t, n) hierarchical threshold access structure is
$$\Gamma = \left\{ V \subseteq P : \left| V \cap \left( \bigcup_{j=0}^{i} P_j \right) \right| \ge t_i, \ \forall i \in \{0, 1, \ldots, m\} \right\}.$$

3.1. Initialization procedure

First, the dealer constructs n secret shadow images and divides them into (m + 1) levels P = {P_0, P_1, ..., P_m} according to the real-life situation. Then the dealer sets a sequence of threshold values {t_0, t_1, ..., t_m}, 0 < t_0 < t_1 < ... < t_m, where t = t_m is the overall number of shadow images required for recovery of the secret image, and assumes that
$$p_1, p_2, \ldots, p_{l_0} \in P_0; \quad p_{l_0+1}, p_{l_0+2}, \ldots, p_{l_1} \in P_1; \quad \ldots; \quad p_{l_{m-1}+1}, p_{l_{m-1}+2}, \ldots, p_n \in P_m,$$
where p_i, 0 <= i <= n, denotes the i-th shadow image and P_i, 0 <= i <= m, denotes the set of shadow images of the i-th level. The secret image can be reconstructed by satisfying a sequence of threshold requirements, for example that there are at least t_0 secret shadow images from the highest level as well as at least t_1 > t_0 secret shadow images from the two highest levels, and so forth.

Assume that the cover image O has M x N pixels, O = {o_i | i = 1, 2, ..., (M x N)}, and the secret image S has M_S x N_S pixels.

Step 1: The dealer selects a large prime modulus p.
Step 2: The dealer obtains all pixels of the secret image S, denoted as S = {s_j | j = 1, 2, ..., (M_S x N_S)}, where s_j is in [0, 255].

3.2. Secret image sharing procedure

The procedure consists of two phases: (1) the sharing phase, and (2) the embedding phase.

3.2.1. Sharing phase

Without loss of generality, assume that we want to embed s_0, s_1, s_2, ..., s_{t-1} into the cover image to generate n shadow images using a hierarchical access structure. The dealer performs the following steps:

Step 1. Construct a (t - 1)-th degree polynomial
$$F(x) = s_0 + s_1 x + \cdots + s_{t-1} x^{t-1} \bmod p,$$
where p is a large prime, t = t_m, and s_0, s_1, s_2, ..., s_{t-1} denote the pixel values of the shared secret image.
Step 2. For the first-level shadow images, utilize the (t - 1)-th degree polynomial F(x) to generate the shadow images. The shadow images of the other levels are processed in the following manner: the shadow images of the i-th level in the hierarchy are generated using the polynomial $F^{(t_{i-1})}(x)$, the $(t_{i-1})$-th derivative of F(x).

For example, suppose there are three levels of shadow images, P = P_0 ∪ P_1 ∪ P_2. Assume that the threshold sequence requirements are t_0 = 2, t_1 = 4 and t_2 = 7; that is, the secret image can be reconstructed if and only if there are at least seven shadow images, of which at least four are from P_0 ∪ P_1, and at least two are
from P_0. In this example, we should construct a 6th-degree polynomial
$$F(x) = s_0 + s_1 x + \cdots + s_6 x^6 \bmod p.$$
First, the dealer utilizes F(x) to generate the shadow images that belong to P_0. Since t_0 = 2, the second-level shadow images are generated using the polynomial F''(x), and since t_1 = 4, the shadow images of the lowest level are computed using the polynomial $F^{(4)}(x)$.

3.2.2. Embedding phase

In order to diminish the distortion of the shadow images, most existing secret image sharing schemes have utilized the modulus operator to embed the secret image data into the pixels of the cover image. However, in the proposed scheme, in keeping with Tassa's scheme, we calculate the shadow images in a finite field of size p, where p is a large prime, so we need to develop a new method to embed the shadow data into the cover image.

Lin and Chan's scheme (Lin and Chan, 2010) formed a camouflaged pixel using
$$Q_i = \lfloor o_i / k \rfloor \times k, \qquad q_i = Q_i + y_i, \qquad (1)$$
where Q_i is the quantized value of o_i and q_i represents the i-th camouflaged pixel.

Inspired by Lin and Chan's scheme, we also use a quantization operation to embed the secret data. However, in Lin and Chan's scheme, all operations are in a field modulo a small prime r, such as 5, 7, or 11, so y_i can be directly embedded into a pixel of the cover image without causing a large distortion. In our scheme, the modulus p must be far greater than the threshold t, so we obtain a large integer y_i = F(i) by feeding an integer i, i in [1, n], into F(x), and we need r pixels of the shadow image to represent the shadow data y_i. In traditional secret image sharing schemes, the dealer feeds a secret key or a unique ID_i into the polynomial F(x) to obtain y_i, and, to facilitate embedding y_i into the shadow image, the polynomial F(x) can be reduced modulo a small prime. In the proposed scheme, however, the polynomial F(x) must be reduced modulo a large prime, so we need more pixels to represent y_i.
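To make the sharing phase concrete, the following is a minimal Python sketch of shadow-value generation under the example thresholds above. The prime, pixel values, and participant indices are illustrative assumptions, not values from the paper.

```python
# Sketch of Tassa-style hierarchical shadow generation (illustrative values).
def poly_eval(coeffs, x, p):
    """Evaluate F(x) = sum(coeffs[k] * x^k) mod p, with coeffs[0] = s_0."""
    acc = 0
    for c in reversed(coeffs):               # Horner's rule
        acc = (acc * x + c) % p
    return acc

def poly_derivative(coeffs, p):
    """Formal derivative of the polynomial, coefficients mod p."""
    return [(k * c) % p for k, c in enumerate(coeffs)][1:]

p = 1_000_003                                 # assumed large prime, p >> t
secret_pixels = [12, 255, 7, 99, 0, 31, 200]  # assumed s_0..s_6 -> 6th-degree F(x)

F = secret_pixels
F2 = poly_derivative(poly_derivative(F, p), p)   # F''   (level P_1, since t_0 = 2)
F4 = poly_derivative(poly_derivative(F2, p), p)  # F^(4) (level P_2, since t_1 = 4)

# The shadow value y_i for participant i depends on its level's polynomial:
y_level0 = poly_eval(F,  1, p)   # e.g. participant 1 in P_0
y_level1 = poly_eval(F2, 4, p)   # e.g. participant 4 in P_1
y_level2 = poly_eval(F4, 7, p)   # e.g. participant 7 in P_2
print(y_level0, y_level1, y_level2)
```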
Obviously, the larger y_i is, the more pixels are needed to represent it. Therefore, in order to minimize y_i, we feed a series of integers i, for i = 1, 2, ..., n, into F(x) instead of feeding ID_i. In order to guarantee that r pixels are sufficient to represent y_i, we consider the maximum possible shared secret data and assume that all s_i = 255, for i = 1, 2, ..., n.

We first describe how to generate the highest-level shadow images. Assume that the highest level P_0 includes l_0 shadow images p_1, p_2, ..., p_{l_0}, and that the selected r camouflage pixels in the cover image O are o_i, o_{i+1}, ..., o_{i+r-1}. In the embedding phase, we perform the following steps:

Step 1. Assume that we want to generate the shadow image p_i, i in [1, l_0]. The dealer first computes y_i by feeding i into F(x).
Step 2. We utilize Lin and Chan's method (2010) to generate and camouflage the shadow images. First, we convert the secret data y_i into a small-base (here base-5) notational system. In Lin and Chan's scheme (2010), Eq. (1) may lead to an overflow situation; therefore, we must ensure that $\lfloor o_i / k \rfloor \times k + r \le 255$. Meanwhile, the parameters (k, r) also affect the quality of the shadow images and the embedding capacity. The greater the value of r, the larger the embedding capacity; however, a large r may widen the gap between adjacent pixel values, especially for smooth images, so the greater the value of r, the less smooth the shadow image. Regarding different cover images: if the cover image is smooth, we should select a small r; on the contrary, if the cover image is rich in texture, we can select a large r to improve the embedding capacity. To simplify the proposed method, in this paper we let k = 10 and transform y_i into its base-5 representation. For instance, if y_i = 1304, we obtain y_i = (2, 0, 2, 0, 4)_5. The pair (10, 5) effectively avoids the overflow problem, since $\lfloor 255/10 \rfloor \times 10 + 5 \le 255$. We need $r = \lceil \log_5 F(l_0) \rceil$ pixels of the shadow image to represent y_i. For the i-th level, r_i can be computed by
$$r_i = \lceil \log_5 F^{(t_{i-1})}(l_i) \rceil,$$
where $F^{(t_{i-1})}(l_i)$ denotes the $(t_{i-1})$-th derivative of F(x) evaluated at l_i.
Step 3. Without loss of generality, assume that we use r pixels p_{i1}, p_{i2}, ..., p_{ir} of the shadow image p_i to represent y_i as follows:
$$p_{i1} = \lfloor o_i / 10 \rfloor \times 10 + y_{i1}, \quad p_{i2} = \lfloor o_{i+1} / 10 \rfloor \times 10 + y_{i2}, \quad \ldots, \quad p_{ir} = \lfloor o_{i+r-1} / 10 \rfloor \times 10 + y_{ir}, \qquad (2)$$
where each y_{ij}, j = 1, 2, ..., r, denotes a digit of y_i's base-5 representation.
Step 4. By repeating Steps 1-3, the dealer can camouflage all secret data y_i into the cover pixels, and by feeding i, for i = 1, 2, ..., l_0, into F(x), the dealer obtains the first-level shadow images.

As for the shadow images at the second-highest level, since the threshold values are {t_0, t_1, ..., t_m}, the dealer uses the polynomial $F^{(t_0)}(x)$ to generate the shadow data y_i, and so forth: the shadow images of the i-th level in the hierarchy are computed using the polynomial $F^{(t_{i-1})}(x)$.

The generation process for the other levels of shadow images is the same as that of the highest level except that different polynomials are used. Repeat the above steps until all shadow images of the various levels are generated. Fig. 1 displays the flowchart of the secret image sharing scheme.

3.3. Secret image retrieving procedure

In traditional secret image sharing schemes, given any t shadow images, the shared secret image can be reconstructed. In our scheme, according to Tassa's threshold access structure, the given shadow images must satisfy a sequence of threshold requirements. In order to extract the secret digits, the polynomial F(x) must be reconstructed by retrieving the shadow data y_i from the shadow images p_i. The same
method is used for the different levels of shadow images. The details are as follows:

Step 1. Compute y_i by
$$y_i = y_{i1} \,\|\, y_{i2} \,\|\, \cdots \,\|\, y_{ir}, \qquad (3)$$
where $y_{ij} = p_{ij} \bmod 5$, for j = 1, 2, ..., r, and p_{ij} denotes the j-th pixel value of the i-th shadow image, for i = 1, 2, ..., t_m.
Step 2. Collect enough pairs (i, y_i) to satisfy the hierarchical threshold access structure and employ Birkhoff interpolation to reconstruct the (t - 1)-degree polynomial F(x).
Step 3. Extract the corresponding t coefficients s_0, s_1, s_2, ..., s_{t-1}.
Step 4. Repeat Steps 1-3 until all secret data are extracted.
Step 5. Reconstruct the secret image.

4. Experimental results and analysis

This section describes some experimental results in order to demonstrate the characteristics of the proposed scheme.

We perform experiments for n = 10. A secret image is shared into ten shadow images, and the ten shadow images are partitioned into three levels. Assume that the first (highest) level has three shadow images, the second level has three shadow images, and the third (lowest) level has four shadow images. Assume a sequence of threshold requirements t = (t_0, t_1, t_2) = (2, 4, 7); that is, the secret image can be reconstructed if and only if a subset of shadow images has at least seven shadow images, of which at least four are from the first two levels and at least two are from the first level.

4.1. Simulation results

The peak signal-to-noise ratio (PSNR), defined in Eq. (4), is used to measure the distortion of the shadow images after the secret data have been embedded into the cover image.

Fig. 1. The diagram of the secret image sharing scheme.

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \mathrm{dB}. \qquad (4)$$

The mean square error (MSE) between the cover image pixels and the shadow image is defined as
$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{j=1}^{M \times N} (p_j - p'_j)^2,$$
where p_j is the original pixel value and p'_j is the corresponding pixel value of the shadow image.

We use fifteen 512 x 512 grayscale images as the test images, as shown in Fig. 2, and the secret image Airplane is set to 256 x 256 pixels, as shown in Fig. 3. Table 1 displays the PSNR values of the shadow images generated using the proposed scheme. Although the pixel quality of the shadow images in the proposed scheme is slightly lower than in existing secret image sharing methods, no matter what we use as the cover image, the PSNR values of the shadow images always maintain a steady level within a narrow range. Furthermore, we obtain a new access structure that suits many applications, and the distortion between the shadow images and the cover image is imperceptible to visual perception.

In order to demonstrate the visual quality of the shadow images, we use Peppers as the cover image with size 512 x 512 pixels and Airplane as the secret image with size 256 x 256 pixels. If the shadow images involved meet the hierarchical threshold access structure, our method can reconstruct the secret image without distortion.
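The embedding and extraction rules of Eqs. (1)-(3) are simple enough to verify in a few lines. Below is a minimal Python round-trip sketch with the paper's parameters k = 10 and base-5 digits; the cover pixels and shadow value are made-up examples.

```python
# Round-trip check of the quantization embedding (k = 10, base-5 digits).
def to_base5(y, r):
    """Most-significant-first base-5 digits of y, padded to r digits."""
    digits = []
    for _ in range(r):
        digits.append(y % 5)
        y //= 5
    return digits[::-1]

def embed(cover_pixels, y, r):
    """Eq. (2): p_ij = floor(o_j / 10) * 10 + y_ij; needs len(cover_pixels) == r."""
    return [(o // 10) * 10 + d for o, d in zip(cover_pixels, to_base5(y, r))]

def extract(shadow_pixels):
    """Eq. (3): y_ij = p_ij mod 5, digits concatenated back into y."""
    y = 0
    for q in shadow_pixels:
        y = y * 5 + (q % 5)
    return y

cover = [201, 57, 130, 244, 18]      # assumed cover pixels o_i..o_{i+4}
y_i = 1304                           # assumed shadow value F(i) mod p
shadow = embed(cover, y_i, r=5)
assert extract(shadow) == y_i        # lossless recovery
assert all(q <= 255 for q in shadow) # (10, 5) avoids overflow: 250 + 4 <= 255
print(shadow)                        # [202, 50, 132, 240, 14]
```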
Fig. 4(a) and (b) show the cover image and the extracted secret image, respectively. Fig. 5(a)-(j) display ten shadow images of Peppers partitioned into three levels. Since the distortion between the cover image and the shadow images is slight, we can successfully conceal the existence of the embedded secret image data from intruders.

Fig. 2. The test images: (a) Bird, (b) Woman, (c) Lake, (d) Man, (e) Tiffany, (f) Peppers, (g) Lena, (h) Fruits, (i) Baboon, (j) Airplane, (k) Couple, (l) Crowd, (m) Cameraman, (n) Boat, (o) House.
Fig. 3. The secret image.

5. Discussion

Progressive visual secret sharing mechanisms (Fang, 2008; Huang et al., 2010) have characteristics similar to our proposed scheme. Progressive visual secret sharing can be utilized to reconstruct the shared secret image gradually by superimposing more and more shadow images: by increasing the number of shadow images being stacked, the details of the shared secret image are revealed progressively. In our scheme, if we use the constant term and all coefficients of the polynomial F(x) to hide the secret data, the proposed scheme can also achieve a progressive effect. Meanwhile, our scheme has a hierarchical threshold feature: when the shadow images involved meet a given level's threshold, the recovery of the shared secret image becomes clearer. Furthermore, if we use only t_0 coefficients of F(x) to hide the secret data, the proposed scheme achieves an ideal hierarchical threshold access structure: if the shadow images involved cannot satisfy the hierarchical threshold requirement, they cannot obtain anything about the secret image. Progressive visual secret sharing is an important mechanism for transmission applications, while hierarchical threshold secret image sharing provides a hierarchical threshold access structure for secret image sharing. To the best of our knowledge, our proposed scheme has a unique hierarchical threshold characteristic compared with the existing secret image sharing schemes.

Table 2 compares the functionality of the proposed scheme with that of related schemes. As presented in Table 2, the new method satisfies the camouflage purpose and provides satisfactory quality of the shadow images. Meanwhile, the secret image can be reconstructed losslessly, and the proposed secret image sharing scheme provides a hierarchical threshold access structure. The new mechanism allows the participants to be partitioned into several levels, and the access structure is then determined by a sequence of threshold requirements. In comparison with the traditional secret image sharing schemes (Yang et al., 2007; Chang et al., 2008; Lin et al., 2009; Lin and Chan, 2010), the proposed hierarchical threshold secret image sharing scheme cannot recover the cover image, and the quality of the shadow images needs to be improved.

In the proposed experiment, the ten shadow images are partitioned into three levels, and the corresponding thresholds are t_0 = 2, t_1 = 4, t_2 = 7. We compute the secret image data y_i embedded into the shadow images by using a (t_2 - 1)-th degree polynomial F(x). Since y_i is a large integer, we need r pixels of the shadow image to represent y_i, so the parameter r is important in order to maximize the secret capacity. In our experiment, three different r values correspond to the three levels of shadow images.

Table 1. The PSNR values (dB) of the shadow images for the test images, n = 10, t_0 = 2, t_1 = 4, t_2 = 7.

Test image   First level (1-3)       Second level (4-6)      Third level (7-10)
Bird         36.87  37.31  37.58     37.94  38.02  38.17     37.99  38.06  38.10  38.14
Woman        38.29  38.72  38.99     39.33  39.42  39.57     39.40  39.45  39.51  39.54
Lake         37.94  38.39  38.67     39.01  39.10  39.26     39.08  39.14  39.19  39.22
Man          37.73  38.17  38.42     38.76  38.86  39.01     38.83  38.87  38.94  38.97
Tiffany      36.37  36.81  37.08     37.43  37.52  37.67     37.49  37.55  37.59  37.63

Fig. 4. The cover image and the extracted secret image.

The three r values are
$$r_0 = \lceil \log_5 F(l_0) \rceil, \quad r_1 = \lceil \log_5 F''(l_1) \rceil, \quad r_2 = \lceil \log_5 F^{(4)}(l_2) \rceil.$$

In this example, the embedding capacity (in pixels) can be computed as
$$\mathrm{Capacity} = \frac{512 \times 512}{\max\{r_0, r_1, r_2\}} \times t_2.$$

Fig. 5. The results for Peppers, n = 10, t_0 = 2, t_1 = 4, t_2 = 7: (a)-(c) shadows from the first level (PSNR = 37.32, 37.76, 38.03 dB); (d)-(f) shadows from the second level (PSNR = 38.36, 38.45, 38.60 dB); (g)-(j) shadows from the third level (PSNR = 38.42, 38.49, 38.53, 38.56 dB).

Table 2. Comparisons of the related secret image sharing schemes.

Functionality             Yang et al. (2007)  Chang et al. (2008)  Lin et al. (2009)   Lin and Chan (2010)          Ours
Hierarchical threshold    No                  No                   No                  No                           Yes
Meaningful shadow images  Yes                 Yes                  Yes                 Yes                          Yes
Quality of shadow images  40 dB               40 dB                43 dB               42 dB                        38 dB
Lossless secret image     Yes                 Yes                  Yes                 Yes                          Yes
Lossless cover image      No                  No                   Yes                 Yes                          No
Maximum capacity          M x N / 4           M x N / 4            (t-3) x M x N / 3   (t-1) x M x N / ceil(log_r 255)   floor(M x N / max{r_i}) x t_m

Table 3. The maximum capacity under different n and t_i.

n    Level sizes (1, 2, 3)   Thresholds (t_0, t_1, t_2)   Capacity (pixels)
7    2, 2, 3                 1, 2, 4                      174762
8    2, 2, 4                 1, 2, 4                      174762
10   3, 3, 4                 2, 4, 7                      166818
10   2, 4, 4                 1, 3, 5                      163940
12   2, 4, 6                 1, 3, 6                      157286
14   2, 6, 6                 1, 4, 8                      161318
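A quick numeric sanity check of the capacity formula and the PSNR definition, as a small Python sketch. The max{r_i} value below is an assumption chosen to reproduce the third row of Table 3, not a figure stated by the paper.

```python
import math

def capacity(width, height, max_r, t_m):
    """Embedding capacity in pixels: (width * height / max r_i) * t_m, floored."""
    return (width * height * t_m) // max_r

def psnr(mse):
    """Eq. (4): PSNR = 10 * log10(255^2 / MSE), in dB."""
    return 10 * math.log10(255**2 / mse)

# With max{r_i} = 11 (assumed) and t_2 = 7, the formula reproduces Table 3:
print(capacity(512, 512, 11, 7))    # -> 166818
# A shadow image with MSE ~ 10.6 corresponds to roughly 37.9 dB:
print(round(psnr(10.6), 1))
```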
Pattern Recognition
Classification by application domain
Image recognition: chromosome classification, remote-sensing image recognition
Face recognition
Character recognition: printed and handwritten text (Chinese and foreign)
Digit recognition: printed and handwritten digits 0-9; a typical example is handwritten postal digit recognition
Fingerprint recognition
Palmprint recognition
Speech recognition
1.1.3 Chinese journals that publish pattern recognition research results
Feature extraction and selection
Goal: from the raw data, obtain the features that best reflect the essence of the classification problem.
Feature formation: derive from the raw data, by various means, a set of features that reflect the classification problem (data normalization is sometimes required).
Feature selection: from the available features, select those most beneficial for classification.
Feature extraction: reduce the number of features through mathematical transformations.
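As a concrete illustration of "reducing the number of features through a mathematical transformation", here is a minimal PCA sketch in Python with NumPy; the data is synthetic and the choice of PCA is my own example, one of several transformations the slides could mean.

```python
# Feature extraction by PCA: project raw features onto the top principal components.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # 100 samples, 5 raw features
X = X - X.mean(axis=0)                    # center the data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                     # keep the 2 strongest directions
X_reduced = X @ Vt[:k].T                  # 100 x 2 extracted features
print(X_reduced.shape, (s[:k]**2) / (s**2).sum())  # explained-variance ratio
```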
Measurement space: the space formed by the raw measurement data. Feature space: the space in which pattern classification is performed; a pattern (sample) is represented as a point in the feature space.
Basic structure of a pattern recognition system
Main components of a pattern recognition system: data acquisition, preprocessing, feature extraction and selection, and classification decision.
[Diagram: Training process: data acquisition -> preprocessing -> feature extraction and selection -> classifier design. Decision process: data acquisition -> preprocessing -> feature extraction and selection -> classification decision.]
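As an illustration of this train-then-decide structure, here is a minimal Python sketch using scikit-learn; the synthetic data and the choice of a nearest-centroid classifier are assumptions for demonstration only.

```python
# Minimal pattern recognition pipeline: preprocessing -> feature selection -> classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)               # stand-in for data acquisition
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("preprocess", StandardScaler()),                    # preprocessing
    ("select", SelectKBest(f_classif, k=4)),             # feature extraction/selection
    ("classify", NearestCentroid()),                     # classifier design
])
pipeline.fit(X_train, y_train)                           # training process
print("decision-process accuracy:", pipeline.score(X_test, y_test))
```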
Notes
1. This system structure is suitable for statistical pattern recognition, fuzzy pattern recognition, and the supervised methods of artificial neural networks.
Topics of concern in pattern recognition
1. Feature selection and extraction; 2. Classifier design; 3. Classification decision rules.
1.3 Some basic problems in pattern recognition
1.3.1 Representation of patterns (samples)
• Classification decision: feed the features into the decision classifier.
Evaluation process for a pattern classifier
• Data collection • Feature selection • Model selection • Training and testing • Computation of results and complexity analysis, with feedback
• Classifier design: the main function of classifier design is to determine decision rules through training so that classification according to these rules yields the lowest error rate; the decision rules are compiled into a standard library.
• Classification decision: classify the object to be recognized in the feature space.
An example of the pattern recognition process
• Sorting fish by species on a conveyor belt using optical sensors: sea bass (Seabass) versus salmon (Salmon).
Pattern Recognition
Introduction
Course audience: a foundational professional course for master's students in the pattern recognition discipline.
Chapter 1: Introduction to Pattern Recognition
Disciplines related to pattern recognition
• Statistics • Probability theory • Linear algebra (matrix computation) • Formal languages
• Machine learning • Artificial intelligence • Image processing • Computer vision
Teaching approach
Focus on the basic concepts, methods, and algorithmic principles of pattern recognition; closely integrate theory with practice; teach through examples, showing how to apply the material in real applications; avoid excessive and tedious mathematical derivations.
The relationship between pattern recognition, image recognition, and image processing: pattern recognition simulates certain human functions. Simulating human vision: computer + optical system. Simulating human hearing: computer + acoustic sensors. Simulating human smell and touch: computer + sensors.
A novel finger and hand pose estimation technique for real-time hand gesture recognition

Pattern Recognition 49 (2016) 102-114
Yimin Zhou (a,*,1), Guolai Jiang (a,b,1), Yaorong Lin (b)
(a) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
(b) School of Electronic and Information Engineering, South China University of Technology, China
1 The first author and second author contributed equally to the paper.

Article history: Received 17 March 2014. Received in revised form 8 August 2014. Accepted 29 July 2015. Available online 8 August 2015.
Keywords: Computer vision; Finger modelling; Salient hand edge; Convolution operator; Real-time hand gesture recognition

Abstract: This paper presents a high-level hand feature extraction method for real-time gesture recognition. Firstly, the fingers are modelled as cylindrical objects due to their parallel edge feature. Then a novel algorithm is proposed to directly extract fingers from salient hand edges. Considering the hand's geometrical characteristics, the hand posture is segmented and described based on the finger positions, palm center location and wrist position. A weighted radial projection algorithm with the origin at the wrist position is applied to localize each finger. The developed system can extract not only extensional fingers but also flexional fingers with high accuracy. Furthermore, hand rotation and finger angle variation have no effect on the algorithm performance. The orientation of the gesture can be calculated without the aid of the arm direction, and it is not disturbed by the bare arm area. Experiments have been performed to demonstrate that the proposed method can directly extract high-level hand features and estimate hand poses in real-time. (c) 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Hand gesture recognition based on computer vision technology has received great interest recently, due to its natural human-computer interaction characteristics. Hand gestures are generally composed of different hand postures and their motions. However, the human hand is an articulated object with over 20 degrees of freedom (DOF) [12], and many self-occlusions occur in its projections. Moreover, hand motion is often too fast and complicated compared with current computer image processing speed. Therefore, real-time hand posture estimation is still a challenging research topic, involving multi-disciplinary work including pattern recognition, image processing, computer vision, artificial intelligence and machine learning.

In the history of human-machine interaction, keyboard input with character text output and mouse input with graphic window display are the main traditional interaction forms. With the development of computer techniques, human-machine interaction via hand posture plays an important role in three-dimensional virtual environments. Many methods have been developed for hand pose recognition [3,4,10,18,24,29].

A general framework for vision-based hand gesture recognition is illustrated in Fig. 1. Firstly, the hand is located and segmented from the input image, which can be achieved via skin-color based segmentation methods [27,31] or direct object recognition algorithms. The second step is to extract useful features for static hand posture and motion identification. Then the gesture can be identified via feature matching.
Finally, different human-machine interactions can be applied based on the successful hand gesture recognition.

There are a lot of constraints and difficulties in accurate hand gesture recognition from images, since the human hand is an object with complex and versatile shapes [25]. Firstly, unlike objects with less remarkable deformation such as the human face, the human hand possesses over 20 degrees of freedom, plus variations in gesture location and rotation, which makes hand posture estimation extremely difficult. Evidence shows that at least 6-dimensional information is required for basic hand gesture estimation. Occlusion also increases the difficulty of pose recognition: since the involved hand gesture images are usually two-dimensional, some key parts of the hand may be occluded in the planar projection due to the varying heights of the hand shape.

Besides, the impact of complex environments on the broadly applied vision-based hand gesture recognition techniques has to be considered. Factors such as lighting variation and complex backgrounds make hand gesture segmentation more difficult. Up to now, there is no unified definition for dynamic hand gesture recognition, which is also an unsolved problem in accommodating human habits and facilitating computer recognition. It should be noted that the human hand presents a deformable shape in front of a camera due to its own characteristics. The extraction of a hand image has to be executed in real-time, independent of the users and the device. Human hand motion can reach speeds of up to 5 m/s for translation and 300 degrees/s for rotation. The sampling frequency of a digital camera is about 30-60 Hz, which can blur the collected images, with a negative impact on further identification. On the other hand, with the hand gesture module added to the system, the number of frames per second the computer can process is even lower, which puts more pressure on the relatively low sampling speed. Moreover, a large amount of data has to be handled in a computer vision system, especially for highly complex, versatile objects. Under current computer hardware conditions, many high-precision recognition algorithms are difficult to operate in real-time.

Our developed algorithm focuses on single-camera based real-time hand gesture recognition. Some assumptions are made without loss of generality: (a) the background is not too complex, without large areas of skin-color disturbance; (b) the lighting is neither too dark nor too bright; (c) the palm faces the camera at a distance within 0.5 m. These three limitations are not difficult to realize in actual application scenarios.

Firstly, a new finger detection algorithm is proposed. Compared to previous finger detection algorithms, the developed algorithm does not depend on fingertip features but extracts fingers directly from the main edges of the whole fingers.
Considering that each finger has two main "parallel" edges, a finger is determined from the convolution of the salient hand edge image with a specific operator G. The algorithm can extract not only extensional fingers but also flexional fingers with high accuracy, which is the basis for complete high-level hand pose feature extraction. After the finger central areas have been obtained, the center and orientation of the hand gesture can be calculated. During this procedure, a novel high-level gesture feature extraction algorithm is developed. Through a weighted radial projection algorithm, the gesture feature sequence can be extracted and the fingers can be localized from the local maxima of the angular projection; thus the gesture can be estimated directly in real-time.

The remainder of the paper is organized as follows. Section 2 describes the hand gesture recognition procedure and generally used methods. The finger extraction algorithm based on parallel edge characteristics is introduced in Section 3, where the salient hand image is also obtained. The specific operator G and the threshold are explained in detail in Section 4. High-level hand feature extraction through convolution is demonstrated in Section 5. Experiments in different scenarios are performed to prove the effectiveness of the proposed algorithm in Section 6. Conclusions and future work are given in Section 7.

2. Methods of hand gesture recognition based on computer vision

2.1. Hand modelling

Hand posture modelling plays a key role in the whole hand gesture recognition system. The selection of the hand model depends on the actual application environment. Hand models can be categorized as appearance models and 3D models. Generally used hand gesture models are demonstrated in Fig. 2.

Fig. 2. Hand gesture models with different complexities: (a) 3D strip model; (b) 3D surface model; (c) paper model [36]; (d) gesture silhouette; and (e) gesture contour.

A 3D hand gesture model considers the geometrical structure, using histograms or hyperquadric surfaces to approximate finger joints and the palm. The model parameters can be estimated from a single image or several images. However, 3D model based gesture modelling has quite a high computational complexity, and too much linearization and approximation cause unreliable parameter estimation. Appearance-based gesture models, by contrast, are built from appearance characteristics, which have the advantages of a smaller computational load and faster processing speed. Silhouette, contour and paper models can reflect only partial hand gesture characteristics. In this paper, based on the simplified paper gesture model [36], a new gesture model is proposed in which each finger is represented by extension and flexion states, considering gesture completeness and real-time recognition requirements.

Many hand pose recognition methods use skin-color based detection and take geometrical features for hand modelling. Hand pose estimation from 2D to 3D using multi-viewpoint silhouette images is described in [35]. In recent years, 3D sensors such as binocular cameras, Kinect and Leap Motion have been applied to hand gesture recognition with good performance [5]. However, such recognition has limitations, since 3D sensors are not always available in many systems, e.g., Google Glass.

2.2. Description of hand gesture features

Feature extraction and matching are the most important components in a vision-based hand posture recognition system. In the early stages of hand gesture recognition, colored gloves or labeling methods were usually chosen to strengthen the features in different parts of the hand for extraction and recognition.
Mechanical gloves can be used to capture hand motion; however, they are rigid, allow only certain free movements, and are relatively expensive [23]. Compared with hand recognition methods that need the additional assistance of a data glove or other devices, computer-vision based hand gesture recognition needs little or no additional equipment, which is more adaptable and has bright application prospects. A real-time algorithm to track and recognize hand gestures for video games is described in [23]; only four gestures can be recognized, so it has no generality. Here, hand gesture images without any markers are discussed for feature extraction.

Fig. 1. The general framework of computer based hand posture recognition.

The generally used image features for hand gesture recognition can be divided into two categories, low-level and high-level, as shown in Fig. 3. Low-level features, such as edges, edge orientation, histogram of oriented gradients (HOG), contour/silhouette and Haar features, are basic computer image characteristics and can be extracted conveniently. However, in actual applications, due to the diversity of hand motions, even for the same gesture a subtle variation in finger angle can result in a large difference in the image. With rotational changes in hand gesture, it is much more difficult to recognize gestures by direct low-level feature matching.

Since skin color is a distinctive cue of hands that is invariant to scale and rotation, it is regarded as one of the key features. Skin color segmentation is widely used for hand localization [16,31]. Skin detection is normally achieved by Bayesian decision theory, Bayesian classifier models and training images [22]. Edge is another common feature for model-based matching [36]. Histograms of oriented gradients have been implemented in [30]. Combinations of multiple features can improve the accuracy and robustness of the algorithm [8].

High-level gesture features such as fingertip position, finger location and gesture orientation are related to the hand structure, which has a direct relationship to hand recognition. Therefore, they can be easily matched for various gesture recognition tasks in real-time. However, this type of feature is generally difficult to extract accurately.

In [1], fingertips were located from probabilistic models. The detected edge segments of monochrome images are computed by the Hough transform for fingertip detection, but light and brightness seriously affect the quality of the processed images and the detection result. Fingertip detection and finger type determination are studied with a model-based method in [5], which is only applicable to static hand pose recognition. In [26], fingertips are found by fingertip masks considering their characteristics, and they can be located via feature matching; however, objects that share similar fingertip shapes can result in misjudgments.

Hand postures can be recognized through the geometric features and external shapes of the palm and fingers [4], which proposes a prediction model for hand postures; the measurement error can be large, however, because of the complexity of hand gestures and the diversity of hand motions. In [3], the palm and fingers were detected by skin-colored blob and ridge features. In [11], a finger detection method using grayscale morphology and blob analysis is described, which can be used for flexional finger detection.
In [9,13], high-level hand features were extracted by analyzing the hand contour.

2.3. Methods of hand gesture segmentation

Fast and accurate hand segmentation from image sequences is fundamental for gesture recognition and has a direct impact on the subsequent gesture tracking, feature extraction and final recognition performance. Many geometrical characteristics can be used to detect hand presence in image sequences via projection, such as contour, fingertip and finger orientation [26]. Other non-geometrical features, e.g., color [2], stripes [34] and motion, can also be used for hand detection. Due to complex backgrounds, unpredictable environmental factors and diversified hand shapes, hand gesture segmentation is still an open issue.

Typical methods for hand segmentation are summarized as follows. Increasing constraints and building hand gesture shape databases are commonly used for segmentation. Black or white walls and dark cloth can be applied to simplify the backgrounds. Besides, particular colored gloves can be worn to separate the hand from the background through an emphasized front view. Although these kinds of methods have good performance, they add more limitations at the cost of freedom. A database can be built to collect hand sample images at any moment, with different positions and scales, for hand segmentation through matching. It is a time-consuming process, though, and the completeness of the database can never be achieved, so it has to be updated all the time.

Methods of contour tracking include snake-model based segmentation [17], which can track deformation and non-rigid movement effectively so as to segment the hand gesture from the background. The differential method [20] and its improved algorithms realize segmentation by subtracting the background image from the object image. It has a fatal defect: the camera has to be fixed and the background must be kept invariant during background and hand image extraction.

Skin color, one of the most remarkable surface features of the human body, is often applied in gesture segmentation [31]. However, adopting this feature alone is easily affected by ambient environmental variations, especially when a large area of skin-color disturbance is in the background, e.g., a hand gesture overlapped by a human face. Motion is another remarkable and easily extracted feature in gesture images. The combination of these two features has become more and more popular in recent years [15].

Depth information (the distance between the object and the camera) can also be used for background elimination and segmentation, since human hands are usually the closest objects to the camera. Currently, the commonly used depth cameras are the Swiss Ranger 4000 from Mesa Imaging, the CamCube 2.0 from PMD Technologies, the Kinect from Microsoft and the depth camera from PrimeSense.

2.4. Methods of gesture recognition

2.4.1. Methods of static gesture recognition

Methods of static gesture recognition can be classified into several categories:

(1) Edge feature based matching: Gesture recognition based on this type of feature is realized through the calculated relationship between data sets of the features and samples to seek the best match [22,28,33,36]. Although feature extraction is relatively simple and adaptable to complex backgrounds and lighting, the data-based matching algorithm is quite complicated, with a heavy computational load and time cost.
A large number of templates must be prepared to identify different gestures.

(2) Gesture silhouette based matching: A gesture silhouette is normally denoted as a binary image of the segmented gesture. In [20], matching is computed through the size of the overlapped area between the template and the silhouette. The Zernike matrix of the images is used to cope with gesture rotation [14], and a feature set is developed for matching. The disadvantages of this type of method are that not all gestures can be identified from the silhouette alone, and accurate hand segmentation is required.

(3) Haar-like feature based recognition: The Haar-like feature based Adaboost recognition algorithm has achieved good performance in face recognition [21]. This method can be used for hand detection [37] and simple gesture recognition [10]. Experiments demonstrate that the method can recognize specific gestures in real-time under complex background environments. However, the Haar-like feature based algorithm has high requirements on the consistency of the processed objects, whereas hand gestures have diversified shape variations. Currently, this type of method can only be applied to predefined static gesture recognition.

(4) External contour based recognition: The external contour is an important gesture feature. Generally speaking, different gestures have different external contours. The curvature of the external contour varies at different positions of a hand (e.g., curvature is large at the fingertips). In [9], curvature is analyzed for CSS (Curvature Scale Space) feature extraction to recognize gestures. High-level features such as fingertips, finger roots and joints can be extracted from contour analysis [13]. A feature sequence constructed from the distances between the contour points and the center is used for gesture recognition [32]. This type of method is adaptable to angle variations between fingers but depends on the performance of the segmentation.

(5) Finger feature based recognition: The finger is the most widely applied high-level feature in hand pose recognition, since the locations and states of the fingers embody the most intuitive characteristics of different gestures. Once the finger positions, finger states and hand center are located, simple hand gestures can be determined directly. Several fingertip recognition algorithms are compared in [6]. In [26], a circular fingertip template is used to find fingertip locations and track motion. Combined with skin color features, blob and ridge features are used to recognize the palm and fingers [3]. However, only extensional fingers can be recognized via this type of method.

2.4.2. Methods of motion gesture recognition

Time-domain models are normally adopted for motion gesture recognition, including HMM (Hidden Markov Model) and DTW (Dynamic Time Warping) based methods:

(1) HMM-based methods: The HMM has achieved good performance in voice recognition and has been applied to gesture recognition as well. Different motion gesture sequences are modelled via HMMs, and each gesture is related to one HMM process. HMM-based methods realize recognition through feature matching at each moment, and the training process is a dynamic programming (DP) process. This method provides time-scale invariance and keeps the gestures in time sequence.
However, the training process is time consuming, and the selection of the topology structure is determined by expert experience, e.g., trial and error is used to determine the number of hidden states and transition states.

(2) DTW-based methods: DTW is widely used in simple tracking recognition, matching features at each moment through the difference between the observed gestures and standard gestures. HMM and DTW are essentially dynamic programming processes, and DTW can be seen as a simplified version of HMM. DTW-based recognition is limited to small-vocabulary applications.

In summary, methods of hand modelling, gesture segmentation and feature extraction have been discussed, and the most used hand gesture recognition methods illustrated. The following sections introduce the proposed algorithm for real-time hand gesture recognition in detail.

3. Finger extraction algorithm based on parallel edge features

The most notable parts of a hand for differentiating it from other skin objects, e.g., the human face or arm, are the fingers. As is known, finger feature extraction and matching have great significance in hand segmentation. The contour can be extracted from the silhouette of a hand region as the commonly used feature for hand recognition. Due to nearly 30 degrees of freedom in hand motion, hand image extraction is executed regarding the hand as a whole; moreover, the arm should be eliminated. It should be noted that occlusion among the four fingers (except the thumb) can frequently occur, especially for flexional fingers.

To solve these problems associated with hand image extraction, a model-based approach for finger extraction is developed in this paper. It obviates finger joint localization in hand motion and extracts finger features from the silhouette of the segmented hand region. In complex background circumstances, models with a fixed threshold can result in false detection or detection failure. However, a fixed-threshold color model is still selected for segmentation in this paper because of its simplicity, low computational load and invariance with regard to the various hand shapes. The threshold is predefined to accommodate general human hand sizes. The selected pixels are transformed from RGB space to YCbCr space for segmentation. Finger extraction is explained in detail below.

3.1. Salient hand gesture edge extraction

3.1.1. Finger modelling

Combining easily extracted low-level features such as skin, edges and the external contour, a novel finger extraction algorithm is proposed based on the approximately parallel appearance of finger edges. It can detect the states of extensional and flexional fingers accurately, and can also be used for further high-level feature extraction. It is known that fingers are cylindrical objects with a nearly constant diameter from root to tip. In human hand motion, it is almost impossible to move the distal interphalangeal (DIP) joint without moving the adjacent proximal interphalangeal (PIP) joint without external force assistance, and vice versa. Therefore, there is almost a linear relationship between these two types of joints, and the finger description can be simplified accordingly.

Firstly, each finger is modelled by its edges, as illustrated in Fig.
4(a). $C_{f_i}$, the boundary of the i-th finger, is the composition of arc edges ($C_{t_i}$, fingertip or joints) and a pair of parallel edges ($C_{e_i} \cup C'_{e_i}$, finger body), described as
$$C_{f_i} = (C_{e_i} \cup C'_{e_i}) \cup \sum_{j=1,2} C_{t_i,j} \qquad (1)$$
where the arc edge $C_{t_i,j}$ denotes the finger in either the extensional (j = 1) or flexional (j = 2) state (see the two green circles in Fig. 4(b)).

The finger center line (FCL), $C_{FCL_i}$, is introduced as the center line of the parallel finger edges to represent the main finger body. The distance between the finger edges is defined as 2d, which is the averaged diameter of all the fingers. The fingertip/joint center $O_{t_i}$ is located at the end of the FCL, and it is also the center of the arc curve $C_{t_i}$. The finger central area along $C_{FCL_i}$ will be extracted for finger detection. Compared with many algorithms based on fingertip features [26], the proposed method is more reliable and can also detect flexional fingers successfully.

3.1.2. Structure of hand gesture edges

The remarkable hand edge $C_{hand}$ of a gesture provides a concise and clear label for different hand gestures. Considering the hand structure characteristics and the assumed constraints, $C_{hand}$ consists of the following curves:
$$C_{hand} = \sum_{i=1}^{5} C_{f_i} + C_p + C_n \qquad (2)$$
where $C_p$ is the palm/arm edge curve and $C_n$ is the noise edge curve. The diagram of the hand edge is shown in Fig. 4(b). Finger edges and palm/arm edges have a direct relationship with the gesture structure; they are the main parts of the hand gesture edges and have to be detected as fully as possible. Edge curves formed by palmprints, skin color variation and skin wrinkles are noise, which has no connection with the gesture structure and should be eliminated completely.

3.1.3. Extracting the salient hand gesture edge

For a hand gesture image with a complex background, a skin-color based segmentation algorithm is selected for initial segmentation. A morphology filter is then used to extract the hand area mask and the gesture contour $I_{contour}(x, y)$. The gray image $I_{gray}(x, y)$ of the gesture is obtained at the same time, where the arm area might be included. The Canny edge detection algorithm [7] can extract most of the remarkable gesture edges. However, the detection results also contain some noisy edges formed by wrinkles and reflection from the hand surface, which should be separated from the finger edges.

The salient hand edge image $I_{edge}(x, y)$ is mainly made up of the hand contour, which includes the boundaries of extensional fingers, flexional fingers and the arm/palm. Adduction (approximation) and abduction (separation) movements of the fingers can be referenced to finger III (the middle finger), which moves only slightly without external force disturbance. When the fingers are in extensional states, they are free to carry out adduction and abduction movements, whose edges are easily obtained. When the fingers are clenched into a fist or form flexional states such as the 'six' number gesture, as shown in Fig. 4, an obvious ravine is formed in the appressed parts, with lower gray values in the related pixels. Based on this characteristic, most noisy edges can be eliminated.

Fig. 5. The diagram of the salient hand edge extraction procedure ($I_{gray}$, $I_{contour}$, $I_{canny}$ and $I_{black}$ combined via AND/OR into $I_{edge}$).

The procedure for extracting the salient hand edge is depicted in Fig. 5. One of the hand postures shown in Fig. 4(b) is used as an example, and its grayscale hand image can be seen in Fig. 5(a). The steps of $I_{edge}(x, y)$ extraction are summarized as follows:

1.
Extract the grayscale hand image $I_{gray}(x, y)$ (see Fig. 5(a)) and the hand contour image $I_{contour}(x, y)$ (see Fig. 5(b)) from the source color image using the skin color segmentation method in [16].
2. Extract the Canny edge image $I_{canny}(x, y)$ (see Fig. 5(c)) from $I_{gray}(x, y)$ [7].
3. Apply the predefined threshold $Th_{black}$ to the grayscale hand image; the obtained $I_{black}(x, y)$ (see Fig. 5(d)) is
$$I_{black}(x, y) = \begin{cases} 1, & I_{gray}(x, y) < Th_{black} \\ 0, & I_{gray}(x, y) \ge Th_{black} \end{cases} \qquad (3)$$
The boundaries of the flexional fingers are extracted from the overlapped area, i.e., $I_{canny}(x, y) \cap I_{black}(x, y)$.
4. The salient hand edge image $I_{edge}(x, y)$ is then obtained:
$$I_{edge}(x, y) = \left( I_{black}(x, y) \cap I_{canny}(x, y) \right) \cup I_{contour}(x, y) \qquad (4)$$
The curve denoted by the binary image $I_{edge}(x, y)$ (shown in Fig. 5(e)) is the remarkable edge $C_{hand}$.

3.2. Finger extraction via parallel edge features

Fig. 4. The diagram of the finger edge model: (a) the finger edge model (finger parallel edges $C_{e_i}$, $C'_{e_i}$; fingertip/joint curve $C_{t_i}$; finger center line (FCL); fingertip/joint center $O_{t_i}$; finger width 2d); (b) the hand gesture edges (fingertip or finger joint edges, palm/arm edge $C_p$, noise edge $C_n$, extensional and flexional fingers). (For interpretation of the references to color in this figure, the reader is referred to the web version of this paper.)
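The four steps above map almost directly onto OpenCV primitives. The following Python sketch is a rough reconstruction under stated assumptions: the YCrCb skin range, Canny thresholds, and Th_black value are illustrative guesses, not the paper's tuned parameters, and the input file name is hypothetical.

```python
# Sketch of salient hand edge extraction (Eqs. (3)-(4)); parameters are assumed.
import cv2
import numpy as np

img = cv2.imread("hand.jpg")                       # hypothetical input frame
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

# Step 1: skin-color segmentation in YCrCb (range is a common heuristic, not [16]).
skin_mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
skin_mask = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
i_gray = cv2.bitwise_and(gray, gray, mask=skin_mask)        # I_gray on the hand area
contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
i_contour = np.zeros_like(gray)
cv2.drawContours(i_contour, contours, -1, 255, 1)           # I_contour

# Step 2: Canny edges of the grayscale hand image.
i_canny = cv2.Canny(i_gray, 50, 150)

# Step 3: dark-ravine mask, Eq. (3); Th_black = 60 is an assumed threshold.
i_black = ((i_gray < 60) & (skin_mask > 0)).astype(np.uint8) * 255

# Step 4: Eq. (4): I_edge = (I_black AND I_canny) OR I_contour.
i_edge = cv2.bitwise_or(cv2.bitwise_and(i_black, i_canny), i_contour)
cv2.imwrite("salient_edges.png", i_edge)
```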
Pattern recognition essay: Face recognition based on sparse representation

Abstract: Sparse representation of images is very suitable for image processing, but the computational burden of the sparse decomposition process is huge. A new fast algorithm is presented based on Matching Pursuit (MP) image sparse decomposition; first, Genetic Algorithms (GA) are applied to search the dictionary of atoms efficiently for the best atom at each step of MP. Face recognition is a classic pattern recognition problem. In recent years, inspired by the theory of compressed sensing, sparse-representation-based face recognition technology has been widely studied. Face recognition based on sparse representation constructs a dictionary from the training images, obtains the sparsest linear combination coefficients of the test image by solving an underdetermined equation, and then uses these coefficients to classify the image.

Keywords: image processing; sparse representation; sparse decomposition; Matching Pursuit; Genetic Algorithms

0. Introduction

Face recognition technology is currently developing rapidly, especially static face detection and recognition; a great deal of research has addressed face feature extraction and multi-pose face recognition. However, in more complex settings, such as facial expression recognition, illumination compensation and illumination modelling, and the handling of age variation, effective methods for fusing the various kinds of test data are still lacking.

Face recognition includes three steps: face detection, face feature extraction, and face recognition/verification. Some authors extend these three steps by adding normalization and correction at the beginning, and classification and management at the end.

Research on face recognition started in the late 1960s [2] and has experienced 40 years of development, roughly divided into three stages.

The first stage, the initial stage, lasted from the 1960s to the late 1980s. The main techniques adopted at that time were face recognition methods based on geometric structural features, studied as a general pattern recognition problem.
Representative figures of this generation include Bledsoe, Goldstein, Harmon, and Takeo Kanade. At that time almost all recognition processes relied on manual operation, and the results found their way into very few important practical applications; there was basically no practical application.

The second stage, from the 1970s to the 1980s, is the exploration stage. During this period engineers, as well as neuroscientists and psychologists, were drawn into the field. The former mainly explored the possibility of automatic face recognition through the perception mechanisms of the human brain; although some of the resulting theory had defects and was partial in nature, it provided important theoretical guidance for the personnel designing and implementing engineering algorithms and systems.

The third stage, from the 1990s to the present, is the stage of rapid development. The rapid development of computer vision, pattern recognition, and computer image processing technology drove the rapid development of face recognition. Governments also financed face recognition research heavily and achieved fruitful results. Eigenface and Fisherface are the two most representative and most significant achievements of this period; these face recognition algorithms have become baseline algorithms and industrial standards for face recognition.

1 Sparse representation

In mathematical form, the face recognition problem is represented as Y = AX, where Y is the m-dimensional natural signal, A is a predefined dictionary (basis), and X is the n-dimensional sparse representation of the signal under the predefined basis. Given the original signal, solving for its sparse representation under the predefined basis is a sparse coding problem, with the following two formulations [1-3]:

Sparsity-constrained sparse coding:

argmin_X ||AX − Y||_2^2   subject to   ||X||_0 ≤ K

Error-constrained sparse coding:

argmin_X ||X||_0   subject to   ||AX − Y||_2^2 ≤ ε

where X is the sparse representation coefficient of the original signal Y under the predefined basis, ε is the error tolerance, K is the sparsity threshold, and ||·||_0 denotes the l_0 norm of a vector, i.e., the number of its nonzero elements.

Sparse coding and compressed-sensing signal reconstruction are closely related: under suitable conditions, the minimum l_1-norm solution can reconstruct the signal well.
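The MP decomposition that the paper accelerates with a genetic algorithm can be sketched as follows. This is a minimal sketch, not the paper's implementation: the GA search for the best atom is replaced here by an exhaustive argmax over the dictionary, and the dictionary D is assumed to have unit-norm columns.

import numpy as np

def matching_pursuit(y, D, n_atoms):
    """Greedy MP: at each step, pick the dictionary atom with the
    largest correlation with the current residual.
    y: (m,) signal; D: (m, n) dictionary with unit-norm columns."""
    residual = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = np.argmax(np.abs(corr))    # best atom (GA would search here)
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]  # remove the atom's contribution
    return coeffs

The cost of each step is dominated by the correlation D.T @ residual over all atoms, which is exactly the search the GA is meant to speed up.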
Pattern Recognition (lecture slides)

Decision Trees

[Figure: an example decision tree for character recognition. Internal nodes test scalar features such as #holes (0, 1, or 2), moment of inertia (< t or ≥ t), #strokes (0, 1, 2, or 4), and best axis direction (e.g., 0, 60, or 90 degrees); the leaves assign character classes such as '/', '1', '0', '-', 'x', 'w', 'A', '8', and 'B'.]
Decision Tree Characteristics
Information Gain
The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A has value v.
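A direct transcription of this formula into Python; the dict-of-attributes data layout is an illustrative assumption.

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """examples: list of dicts mapping attribute names to discrete values."""
    n = len(labels)
    by_value = {}
    for ex, lab in zip(examples, labels):
        by_value.setdefault(ex[attribute], []).append(lab)
    # Weighted entropy of the partition induced by the attribute.
    remainder = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - remainder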
Classification using nearest class mean: compute the Euclidean distance between feature vector X and the mean of each class, then choose the closest class if it is close enough (reject otherwise).
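A minimal sketch of this classifier; the dict-of-means layout and the reject threshold value are illustrative assumptions.

import numpy as np

def nearest_class_mean(x, class_means, reject_threshold):
    """Assign x to the class with the nearest mean in Euclidean
    distance, or return -1 (reject) if even the nearest mean is
    farther away than reject_threshold."""
    dists = {c: np.linalg.norm(x - m) for c, m in class_means.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] <= reject_threshold else -1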
In structural pattern recognition, the data is converted to a discrete structure (such as a grammar or a graph) and the techniques are related to computer science subjects (such as parsing and graph matching).
Pattern recognition format (original practical edition)

Contents: 1. The definition and importance of pattern recognition; 2. The basic tasks and methods of pattern recognition; 3. Application fields of pattern recognition; 4. The development of pattern recognition in China.

1. The definition and importance of pattern recognition

Pattern recognition, as the name implies, is the process of recognizing patterns: using computer systems, artificial intelligence techniques, and other means to process input signals, data, or images, analyze their underlying regularities, and thereby identify specific patterns or features. Pattern recognition has important application value in information processing, automation, artificial intelligence, and related fields.

2. The basic tasks and methods of pattern recognition

The basic task of pattern recognition is to extract features from input signals, data, or images and to classify, recognize, or describe them. The main methods include:
1. Data preprocessing: cleaning and normalizing the input data to improve data quality.
2. Feature extraction: extracting representative features from the input data as the basis for classification or recognition.
3. Classification: assigning data to different categories according to the extracted features.
4. Recognition: determining the category of the input data by comparing it with known patterns.
5. Description: describing the features of the input data so that people can understand and analyze them.

3. Application fields of pattern recognition

Pattern recognition is widely applied in many fields, for example:
1. Image recognition: in computer vision, pattern recognition techniques are used for tasks such as image recognition and object detection.
2. Speech recognition: converting human speech signals into text is the main task of speech recognition.
3. Natural language processing: pattern recognition is also widely used here, for example in text classification and sentiment analysis.
4. Medical diagnosis: pattern recognition can assist doctors in diagnosing diseases and improve diagnostic accuracy.

4. The development of pattern recognition in China

China has achieved a series of research results and applications in pattern recognition, such as face recognition and speech recognition technology.
Research on data classification algorithms based on Pattern Recognition

In today's era of data explosion, every industry faces the problem of processing massive amounts of data, and classifying data quickly and accurately has become an urgent problem. Pattern Recognition technology can effectively improve the accuracy of data classification. This article studies data classification algorithms based on Pattern Recognition.

1. Introduction to Pattern Recognition technology

Pattern Recognition technology is an application of machine learning. Its main function is to automatically recognize, classify, and predict data. It is widely applied, for example in face recognition, handwritten character recognition, speech recognition, and text classification.

2. Types of data classification algorithms based on Pattern Recognition

1. Decision tree algorithm: a very common classification algorithm. Its principle is to branch on data features and build a decision tree; through recursion, the data set at each branch becomes purer until it can no longer be split, yielding the leaf nodes. Decision trees are widely used in practice, for example in spam filtering and medical diagnosis.
2. Bayesian classification algorithm: trains on a data set to obtain the feature probabilities of each class; when new data arrives, it computes the probability that the data belongs to each class and selects the class with the largest probability as the classification result (a minimal sketch follows at the end of this section). Bayesian classification is a common method in review analysis and spam filtering.
3. AdaBoost algorithm: an ensemble learning algorithm that combines multiple weak classifiers into one strong classifier, improving classification accuracy through iterative training. AdaBoost has been widely applied to face detection and text classification.
4. Support vector machine algorithm: a binary classification algorithm that seeks an optimal hyperplane separating the data into two classes. Support vector machines offer good classification performance, strong generalization ability, and effective operation in high-dimensional spaces; they have advantages in areas such as biometric recognition and network intrusion detection.

3. Practical applications of data classification algorithms based on Pattern Recognition

In practice, data classification algorithms based on Pattern Recognition have already been widely applied.
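As an illustration of the Bayesian classifier described in item 2 above, here is a minimal Gaussian naive Bayes sketch. The Gaussian feature model is an assumption for concreteness; the passage does not fix a particular likelihood model.

import numpy as np

class GaussianNB:
    """Estimate per-class feature means/variances and priors from the
    training set, then assign a new sample to the class with the
    highest (log) posterior probability."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.stats = {}
        for c in self.classes:
            Xc = X[y == c]
            # Small variance floor avoids division by zero.
            self.stats[c] = (Xc.mean(0), Xc.var(0) + 1e-9, len(Xc) / len(X))
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            mu, var, prior = self.stats[c]
            loglik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
            scores.append(loglik.sum(axis=1) + np.log(prior))
        return self.classes[np.argmax(scores, axis=0)]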
Pattern Recognition lecture slides, Chapter 2 (Bayesian decision theory), parts 3-5
That is, Σ_i = σ²I = diag(σ², …, σ²): only the variances are nonzero, and all covariances are zero.
Discriminant function:
g_i(x) = −(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i) − (n/2) ln 2π − (1/2) ln|Σ_i| + ln P(ω_i)

Because Σ_i = σ²I, we have Σ_i^{−1} = (1/σ²)I and |Σ_i| = σ^{2n}, so the terms −(n/2) ln 2π and −(1/2) ln|Σ_i| are the same for every i and have no influence on classification. Therefore

g_i(x) = −(1/(2σ²)) (x − μ_i)^T (x − μ_i) + ln P(ω_i)
Discussion: for the two-class case ω_1, ω_2, as illustrated in the figure:
(a) When Σ_i ≠ σ²I, the surfaces of equal probability density are ellipsoids whose major axes are determined by the eigenvalues of Σ_i. (b) Since the dot product of W with (x − x_0) is 0, W is orthogonal to (x − x_0), and the decision surface H passes through the point x_0.
(c) Since W = Σ^{−1}(μ_i − μ_j), W is in general not parallel to (μ_i − μ_j), so H is not perpendicular to the line joining the means. (d) If the prior probabilities of the classes are equal, then x_0 = (μ_i + μ_j)/2 and H passes through the midpoint of the line joining the means; otherwise H moves away from the class with the larger prior probability.
Σ = E[(x − μ)(x − μ)^T], a matrix whose (j, k) entry is E[(x_j − μ_j)(x_k − μ_k)] = σ_{jk}², i.e.

Σ = [ E[(x_1 − μ_1)(x_1 − μ_1)] … E[(x_1 − μ_1)(x_n − μ_n)]; … ; E[(x_n − μ_n)(x_1 − μ_1)] … E[(x_n − μ_n)(x_n − μ_n)] ] = [ σ_11² … σ_1n²; … ; σ_n1² … σ_nn² ]
For an unknown x, subtract each class mean from x and assign x to the nearest class: this is the minimum-distance classifier.
Expanding (x − μ_i)^T Σ^{−1} (x − μ_i), the term x^T Σ^{−1} x is independent of i and can be dropped. Then g_i(x) = W_i^T x + w_{i0}, a linear function, where W_i = Σ^{−1} μ_i and w_{i0} = −(1/2) μ_i^T Σ^{−1} μ_i + ln P(ω_i).
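A minimal sketch of this linear discriminant for the shared-covariance Gaussian case just derived; the list-of-means data layout is an illustrative assumption, not from the slides.

import numpy as np

def classify_linear(x, means, cov, priors):
    """Evaluate g_i(x) = W_i^T x + w_i0 with W_i = inv(Sigma) mu_i and
    w_i0 = -0.5 mu_i^T inv(Sigma) mu_i + ln P(w_i); return the index
    of the class with the largest discriminant value."""
    inv_cov = np.linalg.inv(cov)
    scores = []
    for mu, p in zip(means, priors):
        W = inv_cov @ mu
        w0 = -0.5 * mu @ inv_cov @ mu + np.log(p)
        scores.append(W @ x + w0)
    return int(np.argmax(scores))

With Σ = σ²I and equal priors, the rule reduces to the minimum-distance classifier described above.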
Pattern Recognition Mechanisms
PONTIFICIAE ACADEMIAE SCIENTIARVM SCRIPTA VARIA 54. Study Week on Pattern Recognition Mechanisms, April 25-29, 1983. Edited by Carlos Chagas, Ricardo Gattass, Charles Gross. Ex Aedibvs Academicis in Civitate Vaticana, MCMLXXXV.

THE ANALYSIS OF MOVING VISUAL PATTERNS

J. Anthony Movshon*, Edward H. Adelson**, Martin S. Gizzi* and William T. Newsome***

* Department of Psychology, New York University, New York, NY 10003, USA.
** Present address: RCA David Sarnoff Research Laboratories, Princeton, NJ 08540, USA.
*** Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD 20205, USA. Present address: Department of Neurobiology and Behavior, State University of New York, Stony Brook, NY 11794, USA.

INTRODUCTION

There is abundant evidence that the orientation of contours is a feature of considerable importance to the visual system. Both psychophysical and electrophysiological studies suggest that the retinal image is treated relatively early in the visual process by orientationally-tuned spatial filters (see Hubel and Wiesel, 1962; Campbell and Kulikowski, 1966, among many others). Orientational filtering undoubtedly plays a role in the analysis of the structure of a visual pattern, but the visual system has other tasks, most obviously that of extracting information about the motion of objects. A simple analysis reveals that separating a two-dimensional image into its one-dimensional (that is, oriented) components presents problems for a system concerned with extracting object motion. Here we outline the problem, propose a novel formal solution to it, and consider the applications of this solution to a variety of perceptual and electrophysiological phenomena.

The ambiguity of motion of one-dimensional patterns. The motion of a single extended contour does not by itself allow one to determine the motion of the surface containing that contour. The problem is illustrated in Fig. 1. The three sections of the figure each show a surface containing an oblique grating in motion behind a circular aperture. In Fig. 1A the surface moves up and to the left; in Fig. 1B it moves up; in Fig. 1C, it moves to the left.

Fig. 1. Three different motions that produce the same physical stimulus.

Note that in all three cases the appearance of the moving grating, as seen through the window, is identical: the bars appear to move up and to the left, normal to their own orientation, as if produced by the arrangement shown in Fig. 1A. The fact that a single stimulus can have many interpretations derives from the structure of the stimulus rather than from any quirk of the visual system. Any motion parallel to a grating's bars is invisible, and only motion normal to the bars can be detected. Thus, there will always be a family of real motions in two dimensions that can give rise to the same motion of an isolated contour or grating (Wohlgemuth, 1911; Wallach, 1935; Fennema and Thompson, 1979; Marr and Ullman, 1981).

We must distinguish at the outset between what we term one-dimensional (1-D) and two-dimensional (2-D) patterns. A 1-D pattern is one like an extended grating, edge, or bar: it is uniform along one axis. In general, such a pattern would have to extend infinitely along its axis to be truly 1-D, but for the present purposes it is sufficient that the pattern extend beyond the borders of the receptive field of a neuron being studied, or beyond the edge of a viewing aperture.
The essential property is that, when a 1-D pattern is moved parallel to its own orientation, its appearance does not change. By convention (and in agreement with its appearance), we will represent the "primary" motion of a 1-D pattern as having the velocity normal to its orientation. 2-D patterns are not invariant with translation along any single axis; they include random dot fields, plaids, and natural scenes. Such patterns change no matter how they are moved, and their motion is not ambiguous in the same way as the motion of a 1-D pattern is.

In this paper we are concerned only with uniform linear motion. For certain other kinds of motion (e.g. rotation or curvilinear motion, or motion in depth), analogous ambiguities exist and can be described and solved in a manner similar to the one we present here (but see also Hildreth, 1983).

The disambiguation of motion. If the motion of a 1-D pattern such as an edge is ambiguous, how is it possible to determine the motion of an object at all? It turns out that, although a single moving contour cannot offer a unique solution, two moving contours (which belong to the same object) can, as long as they are not parallel. As Fig. 1 shows, there is a family of motions consistent with a given 1-D stimulus. Naturally, this is also true of the 1-D elements of a 2-D stimulus. Consider the diamonds shown in Fig. 2A. The left-hand diamond moves to the right; the right-hand diamond moves down. Note that in both cases, in the local region indicated on each diamond by the small circle, the border moves downward and to the right. The moving edge in Fig. 2B, which could represent a magnified view of the circled regions of the diamonds' borders in Fig. 2A, can be generated by any of the motions shown by the arrows. Motion parallel to the edge is not visible, so all motions that have the same component of motion normal to the edge are possible candidates for the "true" motion giving rise to the observed motion of the edge. We may map this set of possible motions as a locus in "velocity space", as shown in Fig. 2B. Velocities are plotted as vectors in polar coordinates, starting at the origin. The length of the vector corresponds to the speed of the motion, and the angle corresponds to the direction. As shown in Fig. 2B, the locus of motions consistent with a given 1-D stimulus maps to a line in velocity space. The line is perpendicular to the primary vector representing the motion normal to the 1-D pattern.

Fig. 2. A. Two moving diamonds. The local regions circled on each diamond's border have identical motions. B. A single moving contour, with the representation of its possible motions in a polar "velocity space", in which each vector represents a possible direction and speed. C. The solution to the ambiguity of one-dimensional motion based on an intersection of constraints. Each border's motion establishes a family of possible motions; the single intersection of these two families represents the only possible motion for a single object containing both contours.

It now becomes clear how one may unambiguously assign a velocity to a 2-D pattern, given knowledge only of the motion of its 1-D components. Consider, for example, the diamond moving rightward in Fig. 2C. One edge (viewed in isolation) moves up and to the right; the other moves down and to the right. In velocity space the two edges set up two lines of possible motions.
Only a single point in velocity space is consistent with both—namely, the point of their intersection, which corresponds to a pure rightward motion (Fennema and Thompson, 1979; Horn and Schunck, 1981; Adelson and Movshon, 1982).

There are, of course, other ways of combining vectors. For example, one might argue that a simple vector sum would do just as well as the more complex "intersection of constraints" just described. Indeed a vector sum happens to give the correct answer for the diamond of Fig. 2C, but this is only by chance. Consider, for example, the triangle of Fig. 3, which moves straight to the right. The velocities normal to the edges all have a downward component. Thus, when they are summed, the resultant itself goes down and to the right, instead of straight to the right. On the other hand, applying the intersecting constraints principle leads to the correct solution of a pure rightward motion, as shown in the lower part of Fig. 3.

Fig. 3. An illustration of the inadequacy of vector summation as a solution to motion ambiguity. All three primary motions of the triangle's borders have a downward component, but the true motion is directly to the right, as given by the intersection of constraints.

The solution to motion ambiguity just described is purely formal, and does not imply a particular model of how the visual system actually establishes the motion of objects. In the case of the triangle, there are a number of strategies, such as tracking the motion of the corners, which would not give ambiguous results. But while alternate solutions exist in particular cases, the ambiguity inherent in 1-D motion remains a constant problem when we try to understand how the visual system analyzes motion. 1-D stimuli such as bars and gratings are among the most important stimuli used in studying motion mechanisms. Moreover, the visual system itself seems to analyze the world via orientation selective neurons or channels, which necessarily discard information along one axis in favor of another. In this chapter, we consider some issues this analysis raises in the perception of motion, and describe a series of psychophysical and physiological experiments that address these questions.

Stimuli

We used two kinds of stimuli in our experiments: sine wave gratings and sine wave plaids. The sine wave grating is our 1-D stimulus, and is therefore mathematically ambiguous in its motion. A moving grating can be diagrammed as occupying a line in velocity space, as shown in Fig. 4A. A pair of sine wave gratings, when crossed, produce the "plaid" pattern of Fig. 4B. In this case, there is no ambiguity about the motion of the whole pattern, since the two families of possible velocities (shown by the dotted lines) intersect at a single point.

Fig. 4. A single grating (A) and a 90 deg plaid (B), and the representation of their motions in velocity space. Both patterns move directly to the right, but have different orientations and 1-D motions. The dashed lines indicate the families of possible motions for each component.
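For a plaid like that of Fig. 4B, the intersection-of-constraints computation reduces to solving a 2×2 linear system: each component grating, with unit normal n_i and observed normal speed s_i, constrains the pattern velocity v by n_i · v = s_i. A minimal numpy sketch, not part of the chapter, follows; the angles here measure each grating's normal direction.

import numpy as np

def pattern_velocity(theta1, s1, theta2, s2):
    """Solve n_i . v = s_i for the 2-D pattern velocity v, where
    n_i = (cos theta_i, sin theta_i) is the unit normal of grating i
    and s_i its speed normal to its own bars."""
    n = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    # Singular if the gratings are parallel: the constraint lines
    # coincide or never meet, and no unique pattern motion exists.
    return np.linalg.solve(n, np.array([s1, s2]))

# Example: a 120 deg plaid whose components drift at 3 deg/sec,
# 60 deg either side of rightward, moves rightward at 6 deg/sec.
v = pattern_velocity(np.deg2rad(60), 3.0, np.deg2rad(-60), 3.0)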
These stimuli have some advantages for experimentation over more conventional patterns like single contours and geometric figures. For one thing, all of our stimuli were identical in spatial extent, and uniformly stimulated the entire retinal region they covered. This sidesteps the issue, which arises in considering stimuli like the diamond of Fig. 2, of how the identification of spatially separate moving borders with a common object takes place. Moreover, the plaid patterns were the literal physical sum of the grating patterns, which makes superposition models particularly simple to evaluate.

These stimuli were generated by a PDP11 computer on the face of a display oscilloscope, using modifications of methods that are well-established (Movshon et al., 1978). Gratings were generated by modulating the luminance of a uniform raster (125 frames/sec, 550 lines/frame) with appropriately timed and shaped signals. The orientation of the raster could be changed between frames, permitting the presentation of superimposed moving gratings on alternate frames. Plaid patterns were generated by this interleaving method at the cost of reducing the effective frame rate of each component of the display. The spatial frequency, drift rate, contrast and spatial extent of the test patterns were determined by the computer.

The same computer was responsible for organizing the series of experimental presentations and collecting the data, using methods detailed elsewhere (Movshon et al., 1978; Arditi et al., 1981). In psychophysical studies, subjects' responses were normally yes-no decisions concerning some aspects of the immediately preceding display; in electrophysiological experiments, the computer collected standard pulses triggered by each action potential and assembled them into conventional averaged response histograms. In both kinds of experiment, all of the stimuli in an experimental series were presented in a randomly shuffled sequence to reduce the effects of response variability.

PSYCHOPHYSICAL STUDIES

When presented with a pair of crossed gratings in motion, the visual system usually chooses the percept of a plaid in coherent motion, rather than the equally consistent percept of two gratings sliding over one another. Informal preliminary observations suggested to us that the likelihood that two gratings would phenomenally cohere was determined by various features of the gratings. We decided to examine the mechanisms that underlie this percept of coherent motion. We first established the conditions that produce or prohibit coherence, and then used masking and adaptation techniques to test the hypothesis that the mechanisms responsible for coherence represent a later and different stage of motion processing than the mechanisms responsible for the detection of simple moving patterns.

The conditions for coherence

We quickly found that the likelihood that a pair of gratings would cohere depended critically on the similarity between them. The first and most obvious dimension we examined was contrast, and the results of these experiments led to the methodology that we used for subsequent studies (Adelson and Movshon, 1982). Figure 5A shows the results of an experiment on the effect of contrast.

The two gratings were of 1.5 and 2.0 c/deg, and they moved at an angle of 120 deg to one another with a speed of 3 deg/sec. The contrast of the lower-frequency grating was fixed at 0.3, and that of the other was varied from trial to trial. The absolute orientations and directions of the two gratings were varied randomly from trial to trial. We performed two experiments in this situation. In the first (results given by open symbols), we asked the subject to indicate whether the second grating was detectable in the display.
For this sequence, 14% of the trials were blank, containing only one grating, and the probability that the observer signaled the presence of the second grating in this case was about 0.05 (half-symbol on the ordinate). As the contrast was increased, the probability that the observer detected the grating increased rapidly and monotonically, so that his performance was perfect by a contrast of about 0.008. In the second experimental series (results given by filled symbols), we showed the same family of 120 deg plaids, but now asked the subject to indicate whether the two gratings moved coherently, as a single plaid, or slid incoherently across one another. This judgment is, of course, criterion-dependent, and naive subjects often required several practice sessions before they gave stable data. It was also especially important to maintain stable fixation on the mark at the center of the display, since coherence seems to depend strongly on retinal speed. The data show that as the contrast increased, the likelihood of a coherence judgment also increased. It is clear, however, that there was a considerable range of contrasts (between about 0.01 and 0.07) over which the two gratings were clearly visible, but failed to cohere. As the contrast of the weaker grating was increased (i.e. made closer in contrast to the "standard" grating), the probability of coherence increased. Because of the monotonicity of this kind of data, it is possible to define a "coherence threshold", as the contrast of the weaker grating that produces a 50% probability of coherence. In subsequent experiments, we measured this coherence threshold for various combinations of gratings using a staircase technique.

Figure 5B shows the results of two experiments that tested the dependence of coherence on the relative spatial frequency of the test gratings (Adelson and Movshon, 1982). In these, the spatial frequency of the "standard" grating was set at 1.2 (open arrow and symbols) or 2.2 c/deg (filled arrow and symbols), and the coherence threshold measured for a variety of test spatial frequencies.

Fig. 5. Two experiments on perceptual coherence. A. The effect of contrast on coherence. The two curves show the subject's probability of detecting the second grating (open symbols), and of seeing coherent motion (filled symbols). See text for details. B. The effect of spatial frequency on coherence. The standard grating was of 1.2 (open symbols and arrow) or 2.2 c/deg (filled symbols and arrow), and the data represent the coherence thresholds for a number of gratings of different spatial frequencies. See text for details. From Adelson and Movshon (1982).

The two gratings were separated in direction by 120 deg, and their absolute orientation and direction were again varied randomly from trial to trial. The speed of all test gratings was fixed at 3 deg/sec. It is clear that the relative spatial frequency of the gratings importantly influenced coherence: when the test and standard gratings were of similar spatial frequency, the coherence threshold was low, but when they were made more than about a factor of two different, threshold rose sharply. The coherence threshold when the two gratings were of the same spatial frequency was about 0.7 log units higher than the detection threshold.

We performed a variety of experiments conceptually similar to these, investigating the effects of the angle between the gratings, their relative speeds, and also the effects of the absolute speeds and spatial frequencies of the gratings.
In general, coherence threshold rises as the angle between the gratings is made larger, as their speeds increase, and as the spatial frequency increases, although this latter effect is rather weaker than the others. Under ideal conditions (identical spatial frequencies, low speeds, and a modest angle), the coherence threshold approaches detection threshold so closely as to make the measurements problematic, since coherence is difficult to judge when the observer is not even certain that the second grating is visible.

Models for the perception of coherent pattern motion

The experiments just described gave us a base from which to construct models for various aspects of motion perception. One of the striking features of coherent motion perception is its spatial frequency tuning: two gratings cohere into a moving pattern only if they are of similar spatial frequencies (Fig. 5B). This suggests that the visual system imposes a band-pass spatial filtering on the stimulus before extracting the coherent percept. The filtering could be isotropic—such as the filtering imposed by mechanisms with circularly symmetric receptive fields (e.g. retinal ganglion cells). It could also be oriented—such as the filtering imposed by mechanisms with elongated receptive fields (e.g. cortical simple cells). We consider two models, schematically outlined in Fig. 6.

Model 1: analyzing motion without orientational filtering. The first scheme (Fig. 6A) passes the image through a set of non-oriented bandpass channels. The outputs of these stages are sent to a motion analysis system, which might track salient features such as local peaks, or might perform a cross-correlation between successive views (e.g. Reichardt, 1957; van Santen and Sperling, 1983). This analysis must proceed in parallel in several spatial frequency bands, schematically indicated by the small and large symbols in Fig. 6A. After the determination of motion direction has proceeded within each spatial frequency band, the results are combined (in an unspecified way) to give the final motion percept.

Fig. 6. Two models of the mechanisms underlying perceptual coherence. See text for discussion.

The results shown in Fig. 5B would come about in the following way: when two gratings are of similar spatial frequency, they would both pass the same spatial filter, and so would produce strong local peaks and troughs where their bars crossed. Thus, a feature tracker or a cross-correlator would be able unambiguously to assign a single motion to the whole pattern. If, on the other hand, the two gratings were of different spatial frequencies, they would not pass the same filter, and so would not produce, in the output of any filter, local peaks and troughs that could be tracked. Indeed, within each frequency band, it would be as though there were only a single grating present, and the familiar problem of motion ambiguity would cause this grating to appear to move normal to its own orientation—the motion extraction stages would operate in their default mode, with only 1-D patterns to process. Thus, two separate motions would be seen, rather than a single coherent one. This model, incidentally, bears a close resemblance to one put forward by Marr and Poggio (1979) for stereopsis.

Model 2: analyzing motion after orientational filtering. An alternate scheme (Fig. 6B) would begin by filtering the image with orientation-selective mechanisms (shown as bars), similar to those commonly associated with cortical neurons or psychophysical channels.
The outputs of these mechanisms would then pass to motion analyzers, which would not need to track localizable features, because they only provide information about the motion normal to their own orientation (bars with arrows). As we will see below, motion-sensitive cells in striate cortex behave in this way. But here, of course, arises the problem of motion ambiguity—how does one determine the motion of the pattern as a whole, given the velocities of its oriented components? There are several ways in which this problem can be solved, but they are all formally equivalent to the "intersection of constraints" scheme we outlined at the start of this chapter. This might be implemented in neural terms by combining the signals from several appropriately distributed 1-D motion detectors by circuitry similar to a logical "and" or a conjunction detector, requiring the simultaneous activation of several 1-D analyzers before the second-stage 2-D analyzers would respond. The combination rule here corresponds to a cosinusoidal relationship between component velocity and direction; since this relationship maps to a circle in the polar velocity space, we symbolize the second-stage analyzers by these circles. As in model 1, this analysis must take place in parallel in several frequency bands, two of which are symbolized by the small and large symbols in Fig. 6B.

The question of empirical interest is whether the visual system begins with oriented motion channels, and deals with the ambiguity problem later, or begins analyzing motion before orientation in order to avoid the ambiguity problem. Almost all of the psychophysics and physiology available points to the prevalence of oriented filtering at early stages in the visual system, and it would be surprising to find that the task of extracting pattern motion used mechanisms very different from those inferred in other experiments. Yet, on the other hand, it appears that early oriented filtering makes the task needlessly difficult. If the first stage were non-oriented, there should be no problem in finding these local features and using them to infer the pattern's motion.

Affecting coherence with one-dimensional noise. To study the role of orientation selectivity in coherent motion perception, we combined sine wave plaid stimuli with one-dimensional dynamic random noise, which appears as a rapidly and randomly moving pattern of parallel stripes of various widths. This noise pattern masks the gratings that compose the moving plaid (e.g. Stromeyer and Julesz, 1972). If coherence depends on the outputs of oriented analyzers, then noise masking should elevate coherence threshold more strongly when the mask is oriented parallel to one of the gratings than when it is oriented differently from either. If, on the other hand, the process involves non-oriented filtering, then the orientation of the noise mask should not matter. Only the noise energy within the frequency band of interest, and not its orientation, should have effects on coherence. Our observations of the effects of one-dimensional noise on the threshold for coherence unambiguously demonstrate an orientation dependence in the masking. If the orientation of the noise pattern is within about 20 deg of the orientation of either component of the plaid, the pattern's coherence is reduced in a manner that seems consistent with the reduction in the apparent contrast of the component masked by the noise.
If, on the other hand, the noise orientation is different from that of the components, even if it is normal to the direction of pattern motion, little or no effect on coherence is observed. We conclude from these observations that the mechanisms responsible for the phenomenal coherence of moving plaids belong to a pathway which, at some point, passes through a stage of orientation selective spatial analysis.

The effects of adaptation on coherence

As we have seen, the apparent direction of a pattern's motion can be quite different from the motions of the components that comprise it. We suggested earlier that pattern motion might be extracted in two distinct stages. The first stage is presumably revealed by the many orientationally-selective effects seen in experiments on the detection of moving gratings (e.g. Sekuler et al., 1968; Sharpe and Tolhurst, 1973). The second stage, involving further analysis of complex 2-D motions, reveals itself in our experiments on the coherence of plaids. If these stages are really distinct, it might be possible to affect them differentially in adaptation experiments. That is to say, it should be component motion, rather than pattern motion, that elevates detection threshold, whereas it should be pattern motion, rather than component motion, that affects coherence phenomena. We have presented some preliminary data suggesting that this is the case (Adelson and Movshon, 1981).

It is well established that adapting to a moving grating elevates threshold for the detection of a similar grating moving in the same direction (Sekuler and Ganz, 1963). This adaptation is both direction and orientation selective: an oblique drifting grating has little or no effect on the threshold of a vertical grating (Sharpe and Tolhurst, 1973). Suppose now that we combine two oblique gratings into a plaid, so that the plaid appears to move directly to the right. Suppose further that the oblique gratings have been chosen so that they cause no threshold elevation of a vertical grating (moving rightward), when presented alone. If adaptation is caused by the motion of the components, then threshold for the vertical grating should remain unchanged. If adaptation is caused by the coherent motion of the pattern as a whole, then threshold should be elevated, since the plaid adapting pattern, like the test grating, moves directly rightward. Similarly, the effect on the detection of a rightward moving plaid of adaptation to a vertical, rightward moving grating may be assessed.

Figure 7 shows threshold elevation data for four different test-adapt combinations of this sort. All the stimuli in the experiment moved directly to the right at a constant speed of 1.5 deg/sec. Two kinds of stimuli were employed: single vertical gratings (spatial frequency 3 c/deg), and 120 deg plaids whose component gratings (oriented plus and minus 60 deg from vertical) had a spatial frequency of 3 c/deg. Thus all stimuli were identical in direction and speed of movement, but the orientational components of the plaids and gratings differed by 60 deg. We examined the elevation of contrast threshold for each kind of test stimulus following adaptation by each kind of adapting stimulus; the adapting stimuli were all of high contrast (0.5), and thresholds were measured by the method of adjustment. We tested for threshold elevation both in the adapted and unadapted directions. Inspection of Fig. 7 reveals that the results of these experiments conformed closely to the expectations of a model involving orientation selectivity.
The detection threshold for a plaid or grating pattern could be strongly elevated in a directionally-selective manner following adaptation to a similar pattern, but was only slightly changed after adaptation to a different pattern. This result is in line with the ample evidence in the literature concerning the orientation and direction selectivity of the threshold elevation aftereffect (Blakemore and Campbell, 1969; Blakemore and Nachmias, 1971; Sharpe and Tolhurst, 1973), and suggests that the 2-D motion of patterns is not encoded at the level of visual processing where these effects are expressed. There is some reason to suppose that threshold elevation effects of this kind are mediated by …
Principles of Pattern Recognition: concepts, systems, feature selection and features

§1.1 Basic concepts of pattern recognition

I. Broad definitions
1. Pattern: a description of an objective thing; a perfect example that can be imitated.
2. Pattern recognition: by the philosophical definition, the process in which "external information reaches the sense organs and is converted into meaningful sensory experience". Examples: recognizing hot water, handwriting, and so on.

II. Narrow definitions
1. Pattern: a quantitative or structural description of certain objects of interest. A pattern class is a set of patterns that share certain common properties.
2. Pattern recognition: the study of automatic techniques by which a computer automatically (or with as little human intervention as possible) assigns the patterns to be recognized to their respective pattern classes.
Note: the narrow concept of "pattern" is a description of an object, whether the object is to be recognized or already known; the broad concept of "pattern" refers to "a perfect example used for imitation".

III. Related computer technology
1. Present-day computers are built on the von Neumann architecture. In 1946, the Hungarian-American mathematician John von Neumann proposed the basic design of computer organization and operation: digital computers use the binary number system, and the computer executes instructions in program order, i.e. the "stored program" concept. In 1949, the first von Neumann computer was built. In 1956, the first artificial intelligence workshop was held in the United States.
2. Fifth-generation, artificial-intelligence computers. Their essential difference is that the main function rises from information processing to knowledge processing (learning, association, reasoning, explaining problems), giving the computer some human-like intelligence. Development work began in the 1980s, and no consensus has yet been reached. Several possible directions: neural-network computers, which simulate the thinking of the human brain; biological computers, which use bioengineering techniques and protein molecules as chips; and optical computers, which use light as the information carrier and process information by processing light.

IV. The purpose of researching and developing pattern recognition
To improve the perceptual ability of computers and thereby greatly expand their range of applications.

§1.2 Pattern recognition systems

I. A simple example, to build intuition
Taking cancer cell recognition as an example, we go through the whole process of machine recognition.
1. Information input and data acquisition: the microscopic cell image is converted into a digitized cell image, in which each pixel value reflects the optical density; this is also called a gray-level digital image.
Pattern Recognition 33 (2000) 1575-1584

Unsupervised segmentation using a self-organizing map and a noise model estimation in sonar imagery

K.C. Yao*, M. Mignotte, C. Collet, P. Galerne, G. Burel

Laboratoire GTS (Groupe de Traitement du Signal), Ecole Navale, BP 600, 29 240 Brest Naval, France
Université de Bretagne Occidentale, LEST-UMR CNRS 6616, 6 av. Le Gorgeu, BP 809, 29 285 Brest, France

Received 9 January 1998; received in revised form 11 June 1999; accepted 11 June 1999

* Corresponding author. Tel.: 00-33-298-234-018; fax: 00-33-298-233-857. E-mail address: yao@poseidon.ecole-navale.fr (K.C. Yao).

0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00135-1

Abstract

This work deals with unsupervised sonar image segmentation. We present a new estimation and segmentation procedure on images provided by a high-resolution sonar. The sonar image is segmented into two kinds of regions: shadow (corresponding to a lack of acoustic reverberation behind each object lying on the seabed) and reverberation (due to the reflection of the acoustic wave on the seabed and on the objects). The unsupervised contextual method we propose is defined as a two-step process. Firstly, iterative conditional estimation is used for the estimation step in order to estimate the noise model parameters and to accurately obtain the proportion of each class in the maximum likelihood sense. Then, the learning of a Kohonen self-organizing map (SOM) is performed directly on the input image to approximate the discriminating functions, i.e. the contextual distribution function of the grey levels. Secondly, the previously estimated proportion, the contextual information and the Kohonen SOM, after learning, are then used in the segmentation step in order to classify each pixel on the input image. This technique has been successfully applied to real sonar images, and is compatible with an automatic processing of massive amounts of data. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

1. Introduction

Classification methods for an object located on the seafloor (such as wrecks, rocks, man-made objects, and so on) are generally based on the extraction and the identification of its associated cast shadow [1]. Thus, before any classification step, one must segment the sonar image between shadow areas and reverberation areas. In fact, the sea-bottom reverberation and the echo are considered as a single class. Unfortunately, sonar images contain speckle noise [2] which affects any simple segmentation scheme such as a maximum likelihood (ML) segmentation. In this simple case, each pixel is classified only from its associated grey level intensity. In order to face speckle noise and to obtain an accurate segmentation map, a solution consists in taking into account the contextual information, i.e. the class of the neighborhood pixels. This can be done using Markov random field (MRF) models [3], and this is why the Markovian assumption has been proposed in sonar imagery [4]. In this global Bayesian method, pixels are classified
using the whole information contained in the observed image simultaneously. Nevertheless, a simple spatial MRF model has a limited ability to describe properties on a large scale, and may not be sufficient to ensure the regularization process of the set of labels when the sonar image contains high speckle noise. Such a model can be improved by using a large spatial neighborhood for each pixel [5], or a causal scale and spatial neighborhood [6], but this rapidly increases the complexity of the segmentation algorithms and of the parameter estimation procedure required to make this segmentation unsupervised. Besides, the segmentation and the estimation procedure with such an a priori model require a lot of computing time. Moreover, the use of such a global method does not allow one to take into account the noise correlation on the sonar image [7].

An alternate approach, adopted here, uses a local method, i.e. takes into account the grey levels of the neighborhood pixels. In this scheme, each pixel is classified from information contained in its neighborhood. This method, which allows the noise correlation to be taken into account, is divided in two main steps: the model parameter estimation [8] and the segmentation algorithm, which is fed with the previously estimated parameters. In this paper, we adopt for the parameter estimation step an iterative method called iterative conditional estimation (ICE) [9] in order to estimate, in the ML sense, the noise model parameters and especially the proportion of each class (shadow and reverberation). This is followed by the training of a competitive neural network, a Kohonen self-organizing map (SOM) [10], in order to approximate the discriminating function (i.e. the contextual distribution function of the grey levels). For the segmentation step, we develop a contextual segmentation algorithm exploiting efficiently the previously estimated parameters, the input sonar image, and the topology of the resulting Kohonen SOM.

This paper is organized as follows. In Section 2, we detail the parameter estimation step based on the ICE procedure in Section 2.1, and the training step of the SOM in Section 2.2. Section 3 presents the segmentation step. Experimental results both on real scenes and synthetic sonar images are presented in Section 3.3, where we compare the results obtained with the proposed scheme, an ML segmentation and a classical monoscale Markovian segmentation. Then a conclusion is drawn in Section 4.
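To make the SOM training step concrete, here is a minimal sketch of Kohonen learning on grey-level context vectors. The map size, decay schedules, and the choice of context vector (a flattened neighborhood of grey levels around each pixel) are assumptions for illustration, not the configuration used in the paper.

import numpy as np

def train_som(samples, n_units=16, n_iter=5000, seed=0):
    """Train a 1-D Kohonen self-organizing map.
    samples: (N, d) array of context vectors, e.g. the grey levels of
    a small neighborhood around each pixel (an assumed configuration)."""
    rng = np.random.default_rng(seed)
    # Initialize the units from randomly drawn training samples.
    weights = samples[rng.integers(0, len(samples), n_units)].astype(float)
    for t in range(n_iter):
        x = samples[rng.integers(0, len(samples))]
        # Best-matching unit: nearest weight vector to the sample.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Learning rate and neighborhood radius both decay over time.
        lr = 0.5 * (1 - t / n_iter)
        sigma = max(1.0, (n_units / 2) * (1 - t / n_iter))
        dist = np.abs(np.arange(n_units) - bmu)
        h = np.exp(-(dist ** 2) / (2 * sigma ** 2))
        # Pull the BMU and its map neighbors toward the sample.
        weights += lr * h[:, None] * (x - weights)
    return weights

After training, each unit's weight vector approximates a mode of the contextual grey-level distribution, and pixels can be classified by mapping their context vector to the best-matching unit and the class label attached to it.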