[1.2] D. Marr, Vision A Computational Investigation into the Human Representation


Computer Vision (English PPT)

At the same time, the MIT AI Laboratory attracted many well-known scholars from around the world to take part in machine vision research, which covered the theory of machine vision, its algorithms, and system design.
Its mainstream research is divided into three stages:
Stage 1: Research on basic visual methods, taking the model world as the main object of study;
Stage 2: Research on visual models based on computational theory;
the other is to reconstruct three-dimensional objects from their two-dimensional projection images.
History of computer vision
1950s: In this period, statistical pattern recognition was the technique most widely applied in computer vision. It focused mainly on the analysis and identification of two-dimensional images, for example optical character recognition, inspection of workpiece surfaces, and the analysis and interpretation of aerial images.

MARR Active vision model


MARR: Active vision model

Lubov N. Podladchikova (1), Valentina I. Gusakova (1), Dmitry G. Shaposhnikov (1), Alain Faure (2), Alexander V. Golovan (3), and Natalia A. Shevtsova (4)

(1) A.B. Kogan Research Institute for Neurocybernetics, RSU, Russia
(2) LACOS, Le Havre University, France
(3) MBTI, George Mason University, VA, USA
(4) UMIACS, University of Maryland, MD, USA

ABSTRACT

A biologically plausible active vision model for Multiresolutional Attentional Representation and Recognition (MARR) was developed earlier [13,18,19]. The model is based on the scanpath theory of Noton and Stark [17] and provides invariant recognition of gray-level images. This paper considers the algorithm for automatic formation of the image viewing trajectory in the MARR model, the results of psychophysical experiments, and possible applications of the model. The algorithm imitates the scanpath formed by a human operator. Several propositions about possible mechanisms for the consecutive selection of fixation points in human visual perception, inspired by computer simulation results and known psychophysical data, have been tested and confirmed in our psychophysical experiments. In particular, we have found that a gaze switch may be directed (i) to a peripheral part of the visual field that contains an edge oriented orthogonally to the edge at the point of fixation, and (ii) to a peripheral part of the visual field that contains crossing edges. Our experimental results have been used to optimize the automatic image viewing algorithm in the MARR model. The modified model is able to recognize complex real-world images invariantly with respect to scale, shift, rotation, illumination conditions and, in part, point of view, and can be used to solve some robot vision tasks.

Keywords: active vision, invariant recognition, scanpath

1. INTRODUCTION

At present, several biologically plausible approaches based on space-variant resolution and intelligent gaze control, such as Smart Sensing [7], Active Vision [1,2,4], What-Where [6,8], and Foveal Systems [3,5,10,11,14,18,19,21,25], are developing rapidly in computer and robot vision. Most of the systems built on these approaches imitate two interconnected properties of biological vision: space-variant resolution from the fovea to the retinal periphery, and foveation at the most informative visual objects. Such systems are considered a foveal alternative [3] to conventional artificial visual systems based on space-invariant resolution, and it is widely accepted that they may solve the main problems of visual information processing in the real world. The main computational advantages of foveal vision for artificial visual systems are a sharp decrease in the volume of information that must be processed at high resolution and the possibility of invariant image recognition [3,5,6,13,14,16,20]. It has been shown that a space-variant architecture can yield up to four orders of magnitude of information compression in complex visual tasks [20], because the input window with space-variant resolution is fixated only on selected fragments of the image (or scene) during viewing, which greatly reduces the volume of information processed.

Further author information:
L.N.P. (correspondence), V.I.G. and D.G.Sh.: E-mail: w516@krinc.rnd.runnet.ru; WWW: /`rybak/nisms/html; Telephone: (78632) 28-07-44; Fax: (78632) 28-09-93.
A.F.: E-mail: faure@iut.univ-lehavre.fr; WWW: http://gunthar.iut.univ-lehavre.fr; Telephone: (33) 23279720; Fax: (33) 235304968.
A.V.G.: E-mail: agolovan@; Telephone: (703) 993-4322; Fax: (703) 993-1002.
N.A.Sh.: E-mail: natalia@; Telephone: (301) 405-2686; Fax: (301) 405-6707.

418 / SPIE Vol. 3208 0-8194-2640-7/97/$6.00

One of the systems developed within the foveal approach is the biologically plausible active vision model for Multiresolutional Attentional Representation and Recognition of gray-level images (MARR) [13,18,19]. In the MARR model, the input (attention) window represents and processes visual information with space-variant resolution from its center to its periphery, as in other foveal systems. The MARR model is also biologically plausible in other respects. In particular, it is based on the scanpath theory of Noton and Stark [17] and takes into account the fact that, during visual perception and recognition, human eyes actively perform problem-oriented image processing through successive fixations at the most informative fragments of the image under the control of visual attention [19]. The specific properties that distinguish the MARR model from other foveal systems are as follows:
(i) a special structure and algorithm of the input window that provide invariant encoding of a set of oriented edges within the input window (Fig. 1);
(ii) special algorithms for the viewing scanpath (the trajectory of input window movements) that provide invariant representation of the spatial relationships between the viewed image fragments;
(iii) topological similarity of the image viewing trajectories during memorizing and recognition.

FIG. 1. Structure of the input window in the MARR model [Fig. 2 in ref. 19]. The areas of different resolution (central circle: "fovea") are separated by circles. The context points are located at the intersections of sixteen radiating lines and three concentric circles, each at a different resolution level. XOY is the absolute coordinate system.
The relative coordinate system is attached to the basic edge at the center of the input window.

The structure of the input window makes it possible to determine edge orientations in parallel in the basic region (at the center) and in 48 context regions (at the intersections of 16 radiating lines and 3 circles) at various resolution levels for each fixation point (see Fig. 1). In addition, the resolution level in the basic region may be chosen depending on the parameters of the object fragment at the given fixation point.

During image viewing, the next fixation of the input window is selected from the 48 context areas, which are considered potential targets for a shift of the input window. Each shift of the input window from the n-th fixation point is invariantly represented by the parameters of the m-th context point that is selected as the next, (n+1)-th, fixation point. The parameters are defined with respect to the relative coordinate system of the current n-th image fragment, which links features across fixations during such a shift of the input window.

The above properties of the MARR system provide substantial compression of the input information and yield an invariant representation of the fragments and of the image as a whole. The invariant descriptions of fragments are 49-dimensional vectors of edge parameters identified at each fixation point. The invariant representation of an object as a whole is formed sequentially during image viewing. The MARR model, with the image viewing trajectory formed by an operator, was tested in computer simulation. It was shown that gray-level images can be recognized by means of the MARR model invariantly with respect to scale, shift, rotation, and noise [19].

One of the most important problems of biological and artificial foveal vision is the study of the mechanisms by which the next fixation points are chosen and the development of efficient computational algorithms for them.
This problem has been studied extensively by means of neurobiological and mathematical modeling methods [5,9,12,13,15,19,20,22,23], but many aspects of it are still far from an effective solution.

In the present paper, an algorithm for automatic formation of the image viewing trajectory in the MARR model, the results of psychophysical experiments, and possible applications of the model are considered.

2. ALGORITHM OF AUTOMATIC IMAGE VIEWING TRAJECTORY FORMATION

In the MARR model developed earlier, the sequential fixation points of the input window are selected by an operator. A formal criterion for this selection is clearly needed if the MARR model is to process real-world images. Because the mechanisms of human visual search include many interconnected components whose contributions are still far from fully understood, we have approached this problem by developing automatic algorithms for next fixation point selection and viewing trajectory formation that imitate the scanpath formed by an operator. In a sense, the viewing trajectory formed by an operator may be considered a template determined by image properties and by attention and cognitive mechanisms. It is known that different subjects fix their gaze on a number of the same image fragments while viewing the same pictures. Using the operator's viewing trajectory as a template during development of the automatic algorithm therefore allows us to obtain a formal description of scanpath formation based on image properties.

The development of the automatic algorithm of viewing trajectory formation included several stages:

Stage 1. Formalization, in a first approximation, of the criterion used by the operator to select the next fixation point.

A set of computer experiments was performed with the MARR model on one test face image (Fig. 2).
At each fixation point of the input window, the operator selected the next fixation point from the 48 context points of the current image fragment by comparing the features of the input image fragment with the set of oriented edges identified by the MARR model at different resolution levels, and according to a subjective estimate of how important the fragment features are for representing the image. The results of these experiments, performed by 3 operators processing the same image, are formalized as follows:

$$F(x,y,t) = k_1 \cdot Em(x,y) + k_2 \cdot Or(x,y) + k_3 \cdot D(x,y) + k_4 \cdot P(x,y,t) + k_5 \cdot L(x,y) \qquad (1)$$

where
F(x,y,t) is the function that determines the selection of the next fixation point;
Em(x,y) is the averaged contrast of the oriented edges identified around the given context point at different resolution levels;
Or(x,y) is the number of oriented edges;
D(x,y) is the difference between the edge orientations in the basic and context areas of the input window;
P(x,y,t) is a potential function describing the priority of the point (x,y);
L(x,y) is the resolution level.

According to (1), the context point where F(x,y,t) is maximal is selected as the target for the next fixation of the input window.

Stage 2. Optimization of the automatic algorithm.

The contribution of the parameters in (1) was estimated in numerical experiments. The coefficients k1, k2, ..., k5 in (1) are chosen so that the automatic viewing trajectory resembles the one formed by the operator when processing the same image. The resemblance of these trajectories is estimated as follows:

$$S(r) = \sum_{i=1}^{m} \left[ \left( X_c^i - X_a^i(r) \right)^2 + \left( Y_c^i - Y_a^i(r) \right)^2 \right] \qquad (2)$$

where
r is the parameter vector (r1, r2, r3, r4, r5, r6);
(X_c^i, Y_c^i) is the i-th point of the trajectory formed by the operator;
(X_a^i, Y_a^i) is the i-th point of the trajectory formed by the automatic algorithm.

It was shown that there is a range of parameter values for which the automatically formed trajectory corresponds to the one formed by the operator (compare Fig. 2a and 2b). Moreover, the contribution of some components in (1) may be minimized. In particular, the image viewing trajectory presented in Fig. 2b was obtained from (1) with the following coefficient values: k1 = k2 = k3 = 1, k4 = k5 = 0. Thus, according to the results of the numerical experiments, formula (1) can be simplified.

FIG. 2. Image viewing trajectories formed by the operator (a) and by the automatic algorithm (b).

Stage 3. Testing of the automatic algorithm on various face images.

The automatic algorithm of viewing trajectory formation in memorizing mode, described by (1) with the coefficient values chosen above, was applied to various face images. The MARR model successfully recognized the test face images using the trajectories formed by (1), without correction of their parameters by an operator, and retained its invariant recognition properties under some transformations of the initial images.

A series of propositions about possible mechanisms for the consecutive selection of fixation points in human visual perception has been postulated, inspired by the results of the computer experiments performed during development of the automatic algorithm and by some known neurobiological data (see below):

1. There exists a critical value of the difference between the orientations of linear segments located in the foveal and peripheral visual fields. This value may be important for switching the view to a certain fragment of an image.
2. Angles and intersections of oriented elements act as semantic features that trigger a shift of gaze.
3. There exist discrete centers of visual information processing in the peripheral visual field (in the retino-cortical system) that may be attraction areas during foveation at the most informative image fragments.

3. THE RESULTS OF PSYCHOPHYSICAL EXPERIMENTS

Related quantitative psychophysical data might allow us to optimize the above description and to estimate more exactly the contribution of various image parameters. However, the known data [9,12,17,22,23,24] cannot be formalized for our purposes. The known experimental data on the properties of peripheral visual signals that trigger a gaze shift can be summarized as follows. The mechanisms of viewing eye movements include an estimation of the physical and semantic properties of visual objects and are controlled by attention and cognitive mechanisms. A priority image fragment at the periphery of the visual field, selected for foveation, is determined as the most informative image point [24]. As for the physical properties of stationary peripheral stimuli, it has been shown that the gaze is mainly shifted to parts of the visual field where elements of maximal brightness, contrast, and size are concentrated. In addition, peripheral stimuli that trigger a gaze shift must be located at some distance from the current fixation point. Finally, it has been shown that, during viewing of complex images, human eyes fixate mainly on semantically important fragments and on image contours [24].

Taking the above problems into account, we carried out several series of computer-aided psychophysical experiments to obtain some quantitative estimates. First of all, these experiments were intended to verify our propositions.

The lower nasal quadrant of the visual field was tested monocularly (right eye) in 7 volunteers. Each experimental trial was repeated at least 7 times for every subject. The stimuli were stationary bars and crossings of various width, length, brightness, orientation, duration, and position.

Let us consider the results of two sets of experiments (Fig. 3). In these experiments, the subject fixed the gaze on the "foveal" bar presented in the top right corner of the display, which was located 30 cm in front of the subject's eyes.
Then two peripheral test stimuli were presented simultaneously (duration 200 msec) and masked by spot stimuli of the same brightness and diameter (duration 10 msec). The subject had to shift his gaze from the foveal bar to one of the peripheral stimuli. Two kinds of control experiments were carried out (Fig. 3b and 3c). This made it possible to minimize the contribution of differences in the position of the peripheral stimuli and to estimate the selectivity of the gaze shift to orientation (set 1 experiments) or to bar crossings (set 2 experiments). The results of the two series of experiments are presented in Table 1. There is a preference for gaze shifts to peripheral bars oriented orthogonally to the foveal one (set 1) and, to a greater extent, to crossings of bars as compared with orthogonal bars (set 2). These results agree with propositions 1 and 2 (see above) and indicate the existence of some "semantic" priority in selecting the next fixation points during image viewing.

Preliminary results of psychophysical experiments aimed at verifying proposition 3 have been obtained as well. Brightness sensitivity, acuity of perception, and orientation selectivity were tested in parts of the visual field separated by 2 degrees from each other. The results show that, besides the known exponential change of the tested parameters of visual perception in the direction from the fovea to the retinal periphery, there exist local microareas in the visual field where these parameters differ sharply from those of neighboring microareas. In some cases, these microareas are grouped and form periodically arranged spatial structures with respect to the fovea.

The results of the psychophysical experiments allow us to optimize the description (1) for the automatic algorithm of image viewing formation in the memorizing mode.

FIG. 3. Scheme of two sets of psychophysical experiments.

Table 1. Estimation of preferable peripheral stimuli in two sets of psychophysical experiments.

| Compared peripheral stimuli | Collinear and orthogonal bars (set 1) | Orthogonal bars and crossings (set 2) |
| Preferable stimulus | Orthogonal bar | Crossings |
| Preference, % of trials | 59.5 | 68.9 |
| Total number of trials, n | 121 | 132 |

4. APPLICATION OF THE MARR MODEL TO THE SOLUTION OF ROBOT VISION TASKS

The modified MARR model has been tested in some mobile robot vision tasks. In particular, a set of real video images acquired while the robot moved in a laboratory environment (see Fig. 4) was processed by means of the MARR model. The computer simulation showed that the MARR system is able to recognize complex gray-level real video images invariantly with respect to shift, rotation, scale, noise, illumination conditions and, in part, point of view. In addition, the results indicate that mobile robot visual tasks, such as the identification of areas free for robot motion and scene segmentation into foreground and background objects, can be solved effectively by means of the MARR system.

FIG. 4. Real video images of the mobile robot environment.

In the future, the problems of object classification and learning in the MARR system, as well as integration of this system into the LACOS mobile robot control system, should be studied.

REFERENCES

1. Aloimonos J., Weiss I., Bandyopadhyay A. Active vision. In: Proc. of the Int. Joint Conf. on Computer Vision. London. 1987. pp. 35-54.
2. Ballard D.H. Cortical connections and parallel processing: Structure and function. The Behavioral and Brain Sciences. 1986. 9. pp. 67-120.
3. Bandera C., Scott P.D. Foveal machine vision systems. IEEE International Conference on Systems, Man and Cybernetics. Conference Proceedings (Cat. No. 89CH2809-2). 1989. vol. 2. pp. 596-599.
4. Bessarabov I., Gavriley Y., Samarin A. Visual Perception and Selective Image Analysis. In: Proc. of the Second International Symposium of Neuroinformatics and Neurocomputers, Rostov-on-Don, Russia. 1995. pp. 46-53.
5. Bolle R.M., Califano A., Kjeldsen R. Data and model driven foveation. IBM Tech. Report. 1989. pp. 1-27.
6. Buhmann J., Lange J., von der Malsburg C. Distortion invariant object recognition by matching hierarchically labeled graphs. In: Proc. Int. Joint Conf. Neural Networks. 1989. vol. 1. pp. 155-159.
7. Burt P.J. Attention mechanisms for vision in a dynamic world. In: Proc. 9th Int. Conf. on Pattern Recognition, Roma. 1988. pp. 977-987.
8. Carpenter G.A., Grossberg S., Lesher G.W. A what-and-where neural network for invariant image preprocessing. In: Proc. of IJCNN'92, Baltimore. 1992. vol. 3. pp. 303-308.
9. Epelboim J., Steinman R.M., Kowler E., Edwards M., Pizlo Z., Erkelens C.J., Collewijn H. The Function of Visual Search and Memory in Sequential Looking Tasks. Vision Res. 1995. vol. 35, no. 23/24. pp. 3401-3422.
10. ESPRIT project "Vision as Process", BR 3038/BR 7108.
11. Fujita M. Filling-in by Foveal Vision. In: Proc. of the Int. Joint Conf. on Neural Networks (IJCNN'93), Nagoya. 1993. vol. 1. pp. 211-214.
12. Giefing G., Janssen H., Mallot H. Saccadic object recognition with an active vision system. In: Proc. 11th IAPR Int. Conf. on Pattern Recognition. Conference A: Computer Vision and Applications. vol. 1. pp. 664-671.
13. Gusakova V.I., Rybak I.A., Podladchikova L.N., Golovan A.V., Shevtsova N.A. Formation of object representation for invariant image recognition using specific trajectories of viewing. Proc. SPIE. 1993. vol. 2051. pp. 495-500.
14. Hecht-Nielsen R., Zhou Y.T. VARTAC: A Foveal Active Vision ART System. Neural Networks. 1995. v. 4, N 7/8. pp. 1309-1321.
15. Kuromiya A., Matsushita S., Katoh T., Ohnishi N., Sugie N. An active vision system with associative memory can recognize overlapping patterns by shifting the fixation-point. In: Proc. of the Int. Joint Conf. on Neural Networks (IJCNN'93), Nagoya. 1993. vol. 1. pp. 185-188.
16. Messner R.A., Szu H.H. An image processing architecture for real time generation of scale and rotation invariant patterns. Computer Vision, Graphics, and Image Processing. 1985. vol. 31. pp. 50-60.
17. Noton D., Stark L. Scanpaths in eye movements during pattern recognition. Science. 1971. vol. 171. pp. 72-75.
18. Rybak I.A., Golovan A.V., Gusakova V.I., Shevtsova N.A., Podladchikova L.N. A neural network system for active visual perception and recognition. Neural Network World. 1991. 1. 4. pp. 245-250.
19. Rybak I.A., Gusakova V.I., Golovan A.V., Podladchikova L.N., Shevtsova N.A. A Model of Attention-Guided Invariant Visual Recognition. Submitted to Vision Research, 1997.
20. Schwartz E. Algorithms and Hardware for the Application of Space-Variant Active Vision to High Performance Machine Vision. Proc. Vision, Recognition, Action: Neural Models of Mind and Machine. May 29-31, 1997, Boston, USA.
21. Shevtsova N.A., Faure A., Klepatch A.A., Podladchikova L.N., Golovan A., Rybak I.A. Model of foveal visual preprocessor. Proc. of the Intel. Robots and Computer Vision XIV Conf., 23-26 October 1995. Proc. SPIE Vol. 2588. pp. 588-596.
22. Swain M.J., Kahn R.E., Ballard D.H. Low resolution cues for guiding saccadic eye movements. In: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. 92CH3168-2). 1992. pp. 737-40.
23. Wolfe J.M. Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review. 1994. 1 (2). pp. 202-238.
24. Yarbus A.L. Eye movements and vision. Plenum Press. New York, 1967.
25. Zeevi Y.Y., Ginosar R. Neural Computers for Foveating Vision Systems. In: R. Eckmiller, editor, Advanced Neural Computers. Elsevier Science Publishers B.V., North-Holland, 1990. pp. 323-330.
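As an illustration of the MARR paper above, the input-window geometry (48 context points on 16 radiating lines and 3 circles), the selection rule of equation (1), and the trajectory measure of equation (2) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the circle radii, and the idea of passing the five feature values as a precomputed tuple are hypothetical, since the paper does not say how Em, Or, D, P, and L are computed. The default coefficients k1 = k2 = k3 = 1, k4 = k5 = 0 are the values the paper reports for Fig. 2b.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def context_points(center: Point,
                   radii: Sequence[float] = (1.0, 2.0, 4.0),  # assumed radii
                   n_rays: int = 16) -> List[Point]:
    """Intersections of n_rays radiating lines with the concentric circles."""
    cx, cy = center
    return [
        (cx + r * math.cos(2 * math.pi * j / n_rays),
         cy + r * math.sin(2 * math.pi * j / n_rays))
        for r in radii
        for j in range(n_rays)
    ]


def fixation_score(em: float, orient: float, diff: float,
                   prior: float, level: float,
                   k: Sequence[float] = (1.0, 1.0, 1.0, 0.0, 0.0)) -> float:
    """Weighted sum of equation (1) for a single context point."""
    k1, k2, k3, k4, k5 = k
    return k1 * em + k2 * orient + k3 * diff + k4 * prior + k5 * level


def next_fixation(features: Sequence[Sequence[float]],
                  k: Sequence[float] = (1.0, 1.0, 1.0, 0.0, 0.0)) -> int:
    """Index of the context point whose score F is maximal.

    features[m] holds (Em, Or, D, P, L) for context point m; how these
    values are measured from the image is outside this sketch.
    """
    scores = [fixation_score(*f, k=k) for f in features]
    return max(range(len(scores)), key=scores.__getitem__)


def trajectory_distance(operator_path: Sequence[Point],
                        automatic_path: Sequence[Point]) -> float:
    """Equation (2): sum of squared distances between corresponding points."""
    if len(operator_path) != len(automatic_path):
        raise ValueError("trajectories must have the same number of points")
    return sum((xc - xa) ** 2 + (yc - ya) ** 2
               for (xc, yc), (xa, ya) in zip(operator_path, automatic_path))
```

In this reading, tuning the coefficients amounts to searching for the vector k that makes `trajectory_distance` between the operator's path and the automatically generated path as small as possible; the search procedure itself is not described in the paper and is left out here.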

Some original papers on computer vision published in Nature and Science


1. D. Marr; T. Poggio. Cooperative Computation of Stereo Disparity. Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287. This is the seminal paper of Marr's computer vision framework; to this day, most computer vision work is done within that framework.

2. Longuet-Higgins H. C. A computer algorithm for reconstructing a scene from two projections [J]. Nature, 1981, 293: 133-135. This paper laid the foundation for three-dimensional reconstruction in computer vision. It is also known as the "eight-point algorithm" and kept three-dimensional reconstruction a hot topic in computer vision for more than 20 years.

3. H. Bülthoff*, J. Little & T. Poggio. A parallel algorithm for real-time computation of optical flow. Nature 337, 549-553 (09 February 1989). The original work on a real-time parallel algorithm for optical flow. Link: /nature/journal/v337/n6207/abs/337549a0.html

4. Hurlbert, A., and Poggio, T. 1986. Visual information: Do computers need attention? Nature 321(12).

5. Dov Sagi* & Bela Julesz. Enhanced detection in the aperture of focal attention during simple discrimination tasks. Nature 321, 693-695 (12 June 1986).

6. Gad Geiger; Tomaso Poggio. Science, New Series, Vol. 190, No. 4213 (Oct. 31, 1975), pp. 479-480.

7. Gad Geiger; Tomaso Poggio. The Müller-Lyer Figure and the Fly. Science, New Series, Vol. 190, No. 4213 (Oct. 31, 1975), pp. 479-480.

8. P. Sinha and T. Poggio. Role of Learning in Three-dimensional Form Perception. Nature, Vol. 384, No. 6608, 460-463, 1996.

9. Hubel DH, Wiesel TN. Cells sensitive to binocular depth in area 18 of the macaque monkey cortex. Nature, 1970, 225: 41-42.

10. Livingstone M and Hubel D. Segregation of form, color, movement and depth: Anatomy, physiology and perception. Science, 1988, 240: 740-749. Cited 1372 times; an original paper on the mechanisms of stereoscopic vision in the eye.

Books on Visual Perception


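Visual perception is the ability of humans to acquire and process information about the external world through the visual system. It is not only the basis of everyday life but also an important subject of scientific research. The following books on visual perception can serve as references:

1. "The Visual Brain in Action" - A. David Milner, Melvyn A. Goodale. This book describes in detail how the brain processes visual information and turns it into actual behavior and decisions. It discusses the basic principles of visual perception and the functions of the different brain regions involved in vision.

2. "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information" - David Marr. This is one of the classic works in the field of computer vision. The author examines visual perception from the perspective of computer science and proposes a framework for analyzing and understanding visual processes.

3. "Perception and Imaging: Photography – A Way of Seeing" - Richard D. Zakia. This book covers the relationship between the art of photography and visual perception. By explaining photographic techniques and the basic principles of visual perception, the author helps readers sharpen their visual perception and appreciate and understand photographs from a new perspective.

4. "Sensation and Perception" - E. Bruce Goldstein. This textbook systematically introduces the principles and processes of sensation and perception. It covers all aspects of visual perception, including the visual nervous system, color, and depth perception, and provides experiments and case studies to support the theoretical analysis.

5. "Visual Intelligence: How We Create What We See" - Donald D. Hoffman. This book explores how the visual system organizes and interprets visual information, and how we create and understand our visual world through perception and inference. It discusses the evolution and development of visual perception and its relationship to language, consciousness, and reason.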

Computer Vision (Assignment)


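Vision is an integral part of the intelligent and autonomous systems used in application areas such as manufacturing, inspection, document analysis, medical diagnosis, and the military. Because of its importance, some advanced countries, such as the United States, list computer vision research among the major fundamental problems in science and engineering that have a broad impact on the economy and on science, the so-called grand challenges.

The challenge of computer vision is to give computers and robots visual abilities comparable to those of humans. Machine vision requires image signals, texture and color modeling, geometric processing and reasoning, and object modeling. A capable vision system should integrate all of these processes tightly.

As a discipline, computer vision began in the early 1960s, but many important advances in its basic research were made in the 1980s. Computer vision has now become a mature discipline distinct from related fields such as artificial intelligence, image processing, and pattern recognition.

Computer vision is closely related to human vision, and a correct understanding of human vision is very helpful for computer vision research. We therefore introduce human vision first.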


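Human vision

Sensation is the window through which the human brain is connected to the surrounding world; its task is to recognize the surrounding objects and to report the relationships between them. Our thinking is based on our knowledge of the objective world and environment, and sensation is the bridge between the objective world and that knowledge; it lets our thinking establish a correspondence with the surrounding world.

Vision is the most important human sense and the main source of sensory input. 80% of the information people take in from the outside world comes from vision.

Humans have many senses, but the ones that mainly affect human intelligence are vision and hearing. Taste and smell are rich and varied, but people rarely think about them. In vision and hearing, shape, color, motion, sound, and so on are easily combined into well-defined, highly complex and varied organizational structures in space and time. These two senses have therefore become a very suitable medium and environment in which intellectual activity can be exercised and take effect.

However, sounds that a person hears need to be linked to other sensory material in order to have meaning. Vision is different: it is a highly articulate medium that provides rich information about the objects and events of the outside world. It is therefore one of the most basic tools of thought.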

Introduction to Cognitive Theories of Pattern Recognition


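A Brief Introduction to Ten Cognitive Theories of Pattern Recognition

Introduction

When people perceive a scene, they often look for the similarities and differences between it and other things and classify it according to the purpose at hand. This capacity of the human brain constitutes the ability to form patterns and to recognize them. A pattern is a stimulus structure formed by a number of elements or components in certain relationships; in other words, a pattern is a combination of stimuli. When a person can determine what a perceived pattern is and distinguish it from other patterns, that is pattern recognition.

For example, suppose someone wants to sort a large batch of pictures into five types (people, animals, landscapes, buildings, and other) and store them separately. These five types are five classes, that is, five different patterns, and the process of classification is called pattern recognition. Patterns may be simple or complex, and a complex pattern is often composed of several sub-patterns.

The cognitive psychologist Simon held that "when people solve mathematical problems, most of them do so through pattern recognition: they first identify which class the problem in front of them belongs to, and then use this as an index to retrieve the corresponding knowledge from memory. This is pattern recognition."

We care about cognitive theories of pattern recognition because they are the source of ideas for building mathematical models of image (scene) understanding. For example, the traditional pattern recognition theories have been divided into five categories: the template matching model, the prototype matching model, the feature analysis model, the structural description model, and the Fourier model. Nearly all of the main mathematical methods used in image understanding today derive from these five traditional theories or are variants of them.

The pattern recognition theories newly proposed over the past twenty-odd years have also been divided into five: the computational theory of vision, the feature integration theory of attention, the recognition-by-components theory, the interactive activation theory, and the topological theory of vision. Among them, Marr's computational theory of vision is the mainstream theory of current computer (robot) vision. The other theories are also treated by many researchers as sources of innovation.

However, every one of the above theories is one-sided to a greater or lesser degree, and so far no convincing, generally accepted theory of pattern recognition has emerged. This is the fundamental factor limiting the development of mathematical models for image recognition (computer vision). Each of these theories is introduced in turn below.

The template matching model (traditional pattern recognition theory 1)

This model was first proposed for pattern recognition by machines and was later used to explain human pattern recognition. Its core idea is that human long-term memory stores a large number of miniature copies of external patterns, formed through past experience.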
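The template matching idea described above can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not a model taken from the text: patterns are tiny grids of `#` and `.` characters, the stored "miniature copies" are a dictionary of such grids, similarity is simply the fraction of matching cells, and the names `match_score`, `recognize`, and the 0.8 threshold are all hypothetical.

```python
from typing import Dict, List, Optional

Pattern = List[str]  # rows of '#' (filled) and '.' (empty) cells


def match_score(stimulus: Pattern, template: Pattern) -> float:
    """Fraction of cells on which the stimulus and the template agree."""
    same_shape = len(stimulus) == len(template) and all(
        len(s_row) == len(t_row) for s_row, t_row in zip(stimulus, template)
    )
    if not same_shape:
        raise ValueError("stimulus and template must have the same shape")
    cells = [(s, t)
             for s_row, t_row in zip(stimulus, template)
             for s, t in zip(s_row, t_row)]
    return sum(s == t for s, t in cells) / len(cells)


def recognize(stimulus: Pattern,
              templates: Dict[str, Pattern],
              threshold: float = 0.8) -> Optional[str]:
    """Name of the best-matching stored template, or None if none fits well."""
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = match_score(stimulus, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Stored "miniature copies" of two letter shapes.
TEMPLATES = {
    "T": ["###", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
}

# A T with one corrupted cell still matches the stored T (8 of 9 cells).
noisy_t = ["###", ".#.", "##."]
print(recognize(noisy_t, TEMPLATES))
```

The sketch also makes the usual weakness of pure template matching visible: a stimulus that is shifted, scaled, or rotated relative to the stored copy would score poorly even though a person would still recognize it.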

Multimedia Applications).


Computational Semiotics and the Processing of Moving Images

Bibliography

Adshead (1988 Ed.). Janet Adshead, Dance Analysis: theory and practice. London: Dance Books.
Agosti and Smeaton (1996 Eds.). Maristella Agosti and Alan F. Smeaton, Information Retrieval and Hypertext. Boston: Kluwer Academic Publishers.
Ahmad, Salway and Lansdale (1998). Khurshid Ahmad, Andrew Salway and Janet Lansdale, ‘(An)notating Dance: Multimedia Storage and Retrieval’, to appear in the Proceedings of ICCIMA ’98 (International Conference on Computational Intelligence and Multimedia Applications).
Ahmad and Thiopoulos (1995). Khurshid Ahmad and Constantin Thiopoulos, ‘Terminological storage and filtering of unstructured multimedia information’. In (eds) Yuichiro Anzai, Katsuhiko Ogawa & Hirohiko Mori. Symbiosis of Human and Artifact: Future Computing and Design for Human-Computer Interaction. Amsterdam: Elsevier Science Publications B.V. pp 241-247.
Azarbayejani, Wren and Pentland (1996). Ali Azarbayejani, Christopher Wren and Alex Pentland, ‘Real-Time 3-D Tracking of the Human Body’, MIT Media Lab Perceptual Computing Section Technical Report, No. 374.
Barthes (1977). Roland Barthes, Image, Music, Text. London: Fontana Press.
Birdwhistell (1971). Ray L. Birdwhistell, Kinesics and Context: essays on body-motion communication. London: Penguin.
Bizzi et al. (1995). Emilio Bizzi, Simon F. Giszter, Eric Loeb, Ferdinando A. Mussa-Ivaldi and Philippe Saltiel, ‘Modular Organization of Motor Behaviour in the Frog’s Spinal Cord’, in: Trends in Neurosciences, 18 (10), 442-446.
Bobick (1997). Aaron F. Bobick, ‘Movement, Activity, and Action: The Role of Knowledge in the Perception of Motion’, MIT Media Lab Perc. Comp. Section Technical Report 413.
Bøgh Andersen (1990). Peter Bøgh Andersen, A Theory of Computer Semiotics. Cambridge: Cambridge University Press.
Bøgh Andersen, Holmqvist, Jensen (1993 Eds.). Peter Bøgh Andersen, Berit Holmqvist, Jens Jensen, The Computer as Medium. Cambridge: Cambridge University Press.
Bove (1996). V. Michael Bove, ‘Multimedia based on object models: Some whys and hows’, in: IBM Systems Journal 35 (3&4), 337-348.
Brachman and Levesque (1985 Eds.). Ronald J. Brachman and Hector J. Levesque, Readings in Knowledge Representation. Los Altos, CA: Morgan Kaufmann.
Calvert et al. (1993). Tom Calvert, Armin Bruderlin, John Dill, Thecla Schiphorst and Chris Welman, ‘Desktop Animation of Multiple Human Figures’, in: IEEE Computer Graphics and Applications, May 1993, 18-26.
Campbell et al. (1996). Lee Campbell, David Becker, Ali Azarbayejani, Aaron Bobick and Alex Pentland, ‘Invariant Features for 3-D Gesture Recognition’, MIT Media Lab Perceptual Computing Section Technical Report, No. 379.
Chang and Smith (1997). Shih-Fu Chang and John R. Smith, ‘Finding Images / Video in Large Archives’, in: D-Lib Magazine, Feb 1997.
Christel et al. (1996). Michael Christel, Scott Stevens, Takeo Kanade, Michael Mauldin, Raj Reddy and Howard Wactlar, ‘Techniques for the Creation and Exploration of Digital Video Libraries’, in: Borko Furht (1996 Ed.) Multimedia Tools and Applications, 283-327. Boston: Kluwer Academic Publishers.
Christel, Winkler and Taylor (1997). Michael Christel, David Winkler and C. Roy Taylor, ‘Multimedia Abstractions for a Digital Video Library’, in: Proc. ACM Digital Libraries ‘97.
Churchland (1992). Paul M. Churchland, ‘Some Reductive Strategies in Cognitive Neurobiology’, in: A Neurocomputational Perspective: the nature of mind and the structure of science, 77-110, Cambridge MA and London: The MIT Press.
Crangle and Suppes (1994). Colleen Crangle and Patrick Suppes, Language and Learning for Robots, Stanford CA: CSLI Publications.
de Marinis (1993). Marco de Marinis, The Semiotics of Performance. Bloomington and Indianapolis: Indiana University Press.
Dittrich (1993). Winand H. Dittrich, ‘Action Categories and the Perception of Biological Motion’, in: Perception 22, 15-22.
Eco (1979). Umberto Eco, The Role of the Reader. Bloomington: Indiana University Press.
Eco (1979/1976). Umberto Eco, A Theory of Semiotics. 1st Midland Book Edition, Bloomington, London: Indiana University Press.
Eco (1992). Umberto Eco (with Richard Rorty, Jonathan Culler and Christine Brooke-Rose; edited by Stefan Collini), Interpretation and Overinterpretation. Cambridge: CUP.
Elam (1980). Keir Elam, The Semiotics of Theatre and Drama. London and New York: Routledge.
Enoka (1994). Roger M. Enoka, Neuromechanical Basis of Kinesiology. 2nd Edition, Champaign, IL: Human Kinetics.
Ericsson and Simon (1993). K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data. Cambridge MA and London: The MIT Press.
Ford and Bradshaw (1993 Eds.). Kenneth M. Ford and Jeffrey M. Bradshaw. Knowledge Acquisition as Modeling (Special Issues of ‘International Journal of Intelligent Systems’ 8 1/2), New York: John Wiley.
Ford et al. (1991). Kenneth M. Ford, Frederick E. Petry, Jack R. Adams-Webber and Paul J. Chang, ‘An Approach to Knowledge Acquisition Based on the Structure of Personal Construct Systems’, in: IEEE Trans. on Knowledge and Data Eng. 3(1), 78-88.
Furht, Smoliar and Zhang (1995). Borko Furht, Stephen W. Smoliar and HongJiang Zhang, Video and Image Processing in Multimedia Systems, Boston: Kluwer Academic.
Gaines and Shaw (1995). Brian R. Gaines and Mildred L.G. Shaw, ‘Concept Maps as Hypermedia Components’, in: Int. J. of Human-Computer Studies.
Gombrich (1994/1977). E.H. Gombrich, A study in the psychology of pictorial representation. 5th Edition, London: Phaidon Press.
Gombrich (1994/1982). E.H. Gombrich, The Image and the Eye: further studies in the psychology of pictorial representation. London: Phaidon Press.
Gonzalez (1997). Ruben Gonzalez, ‘Hypermedia Data Modeling, Coding, and Semiotics’, in: Proc. IEEE 85 (7), July 1997.
Goodman (1976). Nelson Goodman, Languages of Art: an approach to a theory of symbols. 2nd Edition, Indianapolis: Hackett Publishing Company.
Hampapur, Jain and Weymouth (1996). Arun Hampapur, Ramesh Jain and Terry E. Weymouth, ‘Production Model Based Digital Video Segmentation’, in: Borko Furht (1996 ed.) Multimedia Tools and Applications, 111-153. Boston: Kluwer Academic Publishers.
Hermes et al. (1995). Thorsten Hermes, Christoph Klauck, Jutta Kreyß and Jinyou Zhang, ‘Knowledge-based Information Retrieval’, in: Proc. Storage and Retrieval for Image and Video Databases III, SPIE - Int. Soc. of Optical Engineering, Feb 1995, San Jose: CA, vol. 24240, 394-430.
Hess-Lüttich (1982 Ed.). Ernest W.B. Hess-Lüttich, Multimedial Communication. Volume 1: Semiotic Problems of its Notation. Tübingen: Narr.
Hodgins (1992). Paul Hodgins, Relationships Between Score and Choreography in Twentieth-Century Dance: music, movement and metaphor, Dyfed Wales: The Edwin Mellen Press.
Hutchinson Guest (1984). Ann Hutchinson Guest, Dance Notation: the process of recording movement on paper. London: Dance Books.
Innis (1986 Ed.). Robert E. Innis, Semiotics: an introductory anthology. London: Hutchinson & Co.
Jin et al. (1996). Jesse S. Jin, Ruth Kurniawati and Guangyu Zu, ‘A Scheme for Intelligent Image Retrieval in Multimedia Databases’, in: J. of Vis. Comm. and Image Rep. 7 (4), 369-377.
Kelly (1955). George A. Kelly, The Psychology of Personal Constructs. New York: W.W. Norton.
Kelly (1968). George A. Kelly, ‘A Mathematical Approach to Psychology’, in: Clinical Psychology and Personality: the selected papers of George Kelly, New York: John Wiley.
Kreighbaum and Barthels (1990). Ellen Kreighbaum and Katharine M. Barthels, Biomechanics. 3rd Edition, New York: Macmillan.
MacDonald and McGurk (1978). John MacDonald and Harry McGurk, ‘Visual Influences on Speech Perception Processes’, in: Perception and Psychophysics 24 (3), 253-257.
Mackrell (1997). Judith Mackrell, Reading Dance. London: Michael Joseph.
Marr (1982). David Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco: W.H. Freeman.
Maybury (1993 Ed.). Mark Maybury, Intelligent Multimedia Interfaces, Menlo Park CA: AAAI Press / The MIT Press.
McKevitt (1995 Ed.). Paul McKevitt, Integration of Natural Language and Vision Processing: Intelligent Multimedia, Special Edition of Artificial Intelligence Review, 9 (2-3).
Metz (1974). Christian Metz, Film Language: a semiotics of the cinema. New York: Oxford University Press.
Nöth (1995). Winfried Nöth, Handbook of Semiotics. 1st paperback edition, Bloomington and Indianapolis: Indiana University Press.
Nwosu, Thuraisingham & Berra (1996, Eds.). Kingsley C. Nwosu, Bhavani Thuraisingham and P. Bruce Berra. Multimedia Database Systems: Design and Implementation Strategies. Dordrecht: Kluwer Academic Publishers.
Panofsky (1962/1939). Erwin Panofsky, Studies in Iconology. New York: Harper.
Panofsky (1970/1955). Erwin Panofsky, Meaning in the Visual Arts. Harmondsworth: Penguin.
Picard (1995). Rosalind W. Picard, ‘Towards a Visual Thesaurus’, M.I.T. Media Lab Perceptual Computing Section Technical Report No. 358. (To appear Springer Verlag Workshops in Computing, MIRO 95, Invited Paper, Glasgow, Sept. 95).
Raibert (1986). M.H. Raibert, Legged Robots that Balance, Cambridge MA: The MIT Press.
Rakow, Neuhold & Löhr (1995). Thomas C. Rakow, Erich J. Neuhold and Michael Löhr, ‘Multimedia Database Systems - the Notions and the Issues’ in Georg Lausen (Ed.): Datenbanksysteme in Büro, Technik und Wissenschaft (BTW), GI-Fachtagung, Dresden, März 1995, S. 1-29. Springer, Reihe Informatik Aktuell, Berlin 1995.
Rich and Knight (1991). Elaine Rich and Kevin Knight, Artificial Intelligence. 2nd Edition, New York: McGraw-Hill Inc.
Roy and Pentland (1997).
Deb Roy and Alex Pentland, ‘Multimodal Adaptive Interfaces’, MIT Media Lab Technical Report.Sebok (1994 Ed.). Thomas A. Sebok, An Encyclopedic Dictionary of Semiotics.2nd Edition, Berlin and New York: Mouton de Gruyter.Sebok (1994). Thomas A. Sebok, Signs: an introduction to semiotics. Toronto: University of Toronto Press.Shaw and Gaines (1992). Mildred L. G. Shaw and Brian R. Gaines, ‘Kelly’s “Geometry of Psychological Space” and Its Significance for Cognitive Modeling’, in: The New Psychologist, Oct. 1992, 23-31.Smith and Chang (1996). John R. Smith and Shih-Fu Chang, ‘VisualSEEK: a fully automated content-based image query system’, in: Proc. ACM Multimedia ‘96.Song and Waldron (1989). S. Song and K. J. Waldron, Machines that Walk, Cambridge MA: The MIT Press.Sowa (1984). John Sowa, Conceptual Structures: information processing in mind and machine. Reading, MA: Addison-Wesley.Sowa (1991 Ed.). John Sowa, Principles of Semantic Networks: explorations in the representation of knowledge. San Mateo, CA: Morgan Kaufman.Subrahmanian & Jajodia (1996, Eds.). V.S. Subrahmanian and Sushil Jajodia, Multimedia Database Systems: Issues and Research Directions. Berlin, Heidelberg, New York: Springer-Verlag.Torres and Kunt (1996 Eds.). Luis Torres and Murat Kunt, Video Coding: the second generation approach. Boston: Kluwer.van Doorn and Koenderink (1983). Andrea J. van Doorn and Jan J. Koenderink, ‘The Structure of the Human Motion Detection System’, in: IEEE Trans. System, Man and Cybernetics 13 (5), 916-922.Wilson, Bobick and Cassell (1996). Andrew D. Wilson, Aaron F. Bobick and Justine Cassell,‘Recovering the Temporal Structure of Natural Gesture’, MIT Media Lab Perceptual Computing Section Technical Report, No. 388.Winter (1990). David A. Winter, Biomechanics and Motor Control of Human Movement, New York: John Wiley.。

Granular Computing

Y.Y. Yao
Department of Computer Science, University of Regina
Regina, Saskatchewan, Canada S4S 0A2
E-mail: yyao@cs.uregina.ca, http://www.cs.uregina.ca/~yyao

Abstract

The basic ideas and principles of granular computing (GrC) have been studied, explicitly or implicitly, in many fields in isolation. With the recent renewed and fast growing interest, it is time to extract the commonality from a diversity of fields and to study systematically and formally the domain independent principles of granular computing in a unified model. A framework of granular computing can be established by applying its own principles. We examine such a framework from two perspectives, granular computing as structured thinking and as structured problem solving. From the philosophical perspective, or at the conceptual level, granular computing focuses on structured thinking based on multiple levels of granularity. The implementation of such a philosophy at the application level deals with structured problem solving.

Keywords: Granularity, granule, level, hierarchy, structured thinking, structured problem solving

1. Introduction

Human problem solving involves the perception, abstraction, representation and understanding of real world problems, as well as their solutions, at different levels of granularity [4, 6, 23, 28, 32-35]. The consideration of granularity is motivated by the practical needs for simplification, clarity, low cost, approximation, and tolerance of uncertainty [32]. As an emerging field of study, granular computing attempts to formally investigate and model the family of granule-oriented problem solving methods and information processing paradigms [14, 23, 28].

Ever since the introduction of the term “granular computing (GrC)” by T.Y. Lin in 1997 [8, 32], we have witnessed a rapid development of and a fast growing interest in the topic [2, 5, 8-10, 13, 14, 16-20, 22-31, 33, 35, 37]. Many models and methods of granular computing have been proposed and studied.
From the wide spectrum of current research, one can easily make several observations. There is no general agreement about what granular computing is, nor is there a unified model [36]. Many studies concentrate on concrete models in particular contexts, and hence capture only limited aspects of granular computing. Consequently, the potential applicability and usefulness of granular computing are not well perceived and appreciated.

The studies of concrete models and methods are important for the development of a field in its early stage. It is equally important, if not more so, to study a general theory that avoids the constraints of a concrete model.

The basic notions and principles of granular computing, though under different names, have in fact appeared in many related fields, such as programming, artificial intelligence, divide and conquer, interval computing, quantization, data compression, chunking, cluster analysis, rough set theory, quotient space theory, belief functions, machine learning, databases, and many others [8, 23, 28, 32, 33]. However, granular computing has not been fully explored in its own right. It is time to extract the commonality from these diverse fields and to study systematically and formally the domain independent principles of granular computing in a unified and well-formulated framework.

In this paper, we study high level and qualitative characteristics of a theory of granular computing. A general domain independent framework is presented, in which basic issues are examined.

2. Perspectives of Granular Computing

It may be difficult, if not impossible, to give a formal, precise and uncontroversial definition of granular computing. Nevertheless, one can still extract the fundamental elements from human problem solving experiences and methods. There are basic principles, techniques and methodologies that are commonly used in most types of problem solving.
Granular computing, therefore, focuses on problem solving based on the commonsense concepts of granule, granulated view, granularity, and hierarchy. They are interpreted as abstraction, generalization, clustering, levels of abstraction, levels of detail, and so on in various domains. We view granular computing as a study of a general theory of problem solving based on different levels of granularity and detail [28].

Granular computing can be studied by applying its own principles and ideas. It can be investigated at different levels or from different perspectives by focusing on its philosophical foundations, basic components, fundamental issues, and general principles. The philosophical level concerns structured thinking, and the application level deals with principles of structured problem solving. While structured thinking provides guidelines and leads naturally to structured problem solving, structured problem solving implements the philosophy of structured thinking.

The philosophy of thinking in terms of levels of granularity, and its implementation in more concrete models, would result in disciplined procedures that help to avoid errors and to save time in solving a wide range of complex problems.

3. Basic Components of Granular Computing

In modeling granular computing, we focus on three basic components and their interactions.

3.1. Granules

A granule may be interpreted as one of the numerous small particles forming a larger unit. Collectively, the granules provide a representation of the unit with respect to a particular level of granularity. That is, a granule may be considered as a localized view or a specific aspect of a large unit.

Granules are regarded as the primitive notion of granular computing. The physical meaning of a granule becomes clearer when dealing with more concrete models. For example, in a set-theoretic setting, such as rough sets, quotient space theory and cluster analysis, a granule may be interpreted as a subset of a universal set [12, 13, 34, 35].
In planning, a granule can be a sub-plan [6]. In programming, a granule can be a program module [7]. For the conceptual formulation of granular computing, we do not attempt to interpret the notion of granules based on more intuitive, but rather restrictive, concepts. We focus on some fundamental issues based on this weak view of granules.

The size of a granule is considered as a basic property. Intuitively, the size may be interpreted as the degree of abstraction, concreteness, or detail. In the set-theoretic setting, the size of a granule can be the cardinality of the granule.

Connections and relationships between granules can be represented by binary relations. In concrete models, they may be interpreted as dependency, closeness, or overlapping. For example, based on the notion of size, one can define an order relation on granules. Depending on the particular context, the relation may be interpreted as “greater than or equal to”, “more abstract than”, or “coarser than”. The order relation may be reflexive and transitive, but not symmetric. The order relation is particularly useful in studying connections between granules in different levels.

One can define operations on granules, such as combining many granules to form a new granule or decomposing a granule into many granules. The operations on granules must be consistent with the binary relations on the granules. For example, the combined granule should be more abstract than its components. The sizes of granules, the relations between granules, and the operations on granules provide the essential ingredients for developing a theory of granular computing.

3.2. Granulated views and levels

In his work on vision, Marr convincingly made the point that a full understanding of an information processing system involves explanations at various levels [11]. The three levels considered are the computational, algorithmic, and implementational.
The computational level describes the information processing problem to be solved by the system. The algorithmic level describes the steps that need to be carried out to solve the problem. The implementational level deals with the physical realization of the system. Although there does not exist a general agreement on the interpretations and the exact number of levels, it is commonly accepted that the notion of levels is an important one in computer science [3].

Foster critically reviewed and systematically compared various definitions and interpretations of the notion of levels [3]. Three basic issues, namely, the definition of levels, the number of levels, and the relationship between levels, are clarified. Levels are considered simply as descriptions or points of view, often for the purpose of explanation. The number of levels is not fixed, but depends on the context and the purpose of description or explanation.

A multi-layered theory of levels captures two senses of abstraction. One is abstraction in terms of concreteness and is represented by planes along the dimension from top to bottom. The other is abstraction in terms of the amount of detail and can be modeled along another dimension, from less detail to more detail on the same plane.

By viewing a level as a description or a point of view, one can immediately apply it as a basic notion to model granular computing. In order to emphasize the context of granular computing, we also refer to a level as a granulated view. A level consists of entities called granules whose properties characterize and describe the subject matter of study, such as a real world problem, a theory, a design, a plan, a program, or an information processing system. Granules are formed with respect to a particular degree of granularity or detail. Granules in a level are defined and formed within a particular context and are related to granules in other levels.

There are two types of information and knowledge encoded in a level.
A granule captures a particular aspect, and collectively, all granules in the level provide a granulated view. The granularity of a level refers to the collective properties of granules in a level with respect to their sizes. The granularity is reflected by the sizes of all granules involved.

3.3. Hierarchies

Granules in different levels are linked by the order relations and operations on granules. The order relation on granules can be extended to granulated views (levels). A level is above another level if each granule in the former level is ordered before a granule in the latter level, and each granule in the latter level is ordered after a granule in the former level, under the order relation. The ordering of levels can be described by the notion of hierarchy.

The theory of hierarchy provides a multi-layered framework based on levels. Mathematically, a hierarchy may be viewed as a partially ordered set [1]. For the study of granular computing, the elements of the ordered set are interpreted as hierarchical levels or granulated views. The ordering of levels in a hierarchy is based on criteria that are related to the order relations on granules. A higher level may provide a constraint to and/or context of a lower level, and may contain and be made of lower levels. Depending on the context, a hierarchy may consist of levels of interpretation, levels of abstraction, levels of organization, levels of observation, and levels of detail. A hierarchy represents relationships between different granulated views, and explicitly shows the structure of granulation.

A granule in a higher level can be decomposed into many granules in a lower level, and conversely many granules in a lower level can be combined into one granule in a higher level. A granule in a lower level may be a more detailed description of a granule in a higher level with added information.
In the other direction, a granule in a higher level is a coarse-grained description of a granule in a lower level, obtained by omitting irrelevant details.

3.4. Granular structures

With the introduction of the three components, one can examine three types of structures for modeling their interactions. They are the internal structure of a granule, the collective structure of all the granules (i.e., the internal structure of a granulated view or level), and the overall structure of all levels.

Although a granule is normally considered as a whole instead of many sub-granules at a given level, its internal structure needs to be examined. The internal structure of a granule provides a proper description, interpretation, and characterization of the granule. A granule may have a complex structure itself. For example, the internal structure of a granule may be a hierarchy consisting of many levels. The internal structure is also useful in establishing linkage among granules in different levels.

All granules in a level may collectively show a certain structure. This is the internal structure of a granulated view. Granules in a level, although they may be relatively independent, are somehow related to a certain degree. This stems from the fact that they together form a granulated view. On the other hand, it is expected that in many situations the relationships between different granules are much weaker. The internal structure of a level is only meaningful if all the granules in the level are considered together.

A hierarchy represents the overall structure of all levels. In a hierarchy, both the internal structure of granules and the internal structure of granulated views are reflected, to some degree, by the order relations. In a hierarchy, not every pair of granulated views can be compared based on the order relation. In a special case, the hierarchy is a tree.

The three structures as a whole are referred to as the granular structure. One can establish more connections between the three structures.
For example, granules in a higher level may have greater integrity and higher bond strength than those in a lower level. The structures need to be fully explored to establish a basis of granular computing.

3.5. A partition model

The three basic components of granular computing can be easily illustrated by a concrete model known as the partition model of granular computing [28], which is based on rough set theory [12, 13] and quotient space theory [34, 35].

A central notion of the partition model is that of equivalence relations. In rough set theory, an equivalence relation on a set of objects can be concretely defined in an information table based on their values on a finite set of attributes [12, 31]. Two objects are equivalent if they have exactly the same values on a set of attributes.

An equivalence relation divides a universal set into a family of pair-wise disjoint subsets, called the partition of the universe. A granule of a partition model is therefore an equivalence class defined by an equivalence relation. The internal structure of an equivalence class is captured by the same values of some attributes. A granulated view is the partition induced by an equivalence relation, and its structure is defined by the properties of the partition. Different equivalence relations can be ordered based on set inclusion, which leads to a hierarchy of partitions. In an information table, we only consider partitions generated by different subsets of attributes. The overall hierarchical structure is therefore induced by subsets of attributes.

The partition model may be viewed as a special case of cluster analysis. Following the same argument, one can easily find the correspondence between basic components of granular computing and its structures in cluster analysis. In general, given any concrete model of granular computing, we can easily find the corresponding components and structures.

4. Basic Issues of Granular Computing

The discussions of this section summarize and extend the preliminary results reported in [23, 28]. The list of issues discussed should not be viewed as a complete one. It can only be viewed as a set of representatives. Based on the principles of granular computing, these issues may also be studied at different levels of detail.

Granular computing may be studied based on two related issues, i.e., granulation and computation [23, 28]. The former deals with the construction, interpretation, and representation of the three basic components, and the latter deals with computing and reasoning with granules and granular structures.

Studies of granular computing cover two perspectives, namely, the algorithmic and the semantic [23, 28]. The algorithmic study concerns the procedures for constructing granules and related computation, and the semantic study concerns the interpretation and physical meaningfulness of various algorithms. Studies from both aspects are necessary and important. The results from the semantic study may provide not only interpretations and justifications for a particular granular computing model, but also guidelines that prevent possible misuses of the model. The results from the algorithmic study may lead to efficient and effective granular computing methods and tools.

4.1. Granulation

Granulation involves the construction of the three basic components: granules, granulated views and hierarchies. Two basic operations are the top-down decomposition of large granules into smaller granules, and the bottom-up combination of smaller granules into larger granules.

The notion of granulation can be studied in many different contexts. The granulation of a problem, a theory, or a universe, particularly the semantics of granulation, is domain and application dependent. Nevertheless, one can still identify some domain independent issues.
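As a concrete illustration of the two granulation operations, the following Python sketch uses the partition model of Section 3.5. The information table, its attribute names, and the helper functions partition and is_finer are hypothetical and introduced only for this example; they are not taken from [28]. Under these assumptions, adding an attribute decomposes granules into smaller ones, and dropping an attribute combines them into larger ones.

```python
from collections import defaultdict


def partition(table, attributes):
    """Build the granulated view induced by a subset of attributes.

    Two objects fall into the same granule (equivalence class) when they
    have exactly the same values on every attribute in `attributes`.
    """
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj)
    return {frozenset(block) for block in blocks.values()}


def is_finer(p, q):
    """Return True if every granule of partition p lies inside a granule of q."""
    return all(any(g <= h for h in q) for g in p)


# Hypothetical information table: object ids mapped to attribute values.
TABLE = {
    "o1": {"colour": "red", "size": "small"},
    "o2": {"colour": "red", "size": "large"},
    "o3": {"colour": "blue", "size": "small"},
    "o4": {"colour": "red", "size": "small"},
}

# Bottom-up combination: using fewer attributes gives larger granules.
coarse = partition(TABLE, ["colour"])

# Top-down decomposition: adding an attribute splits granules further.
fine = partition(TABLE, ["colour", "size"])

# The finer view refines the coarser one, but not the other way round.
print(is_finer(fine, coarse), is_finer(coarse, fine))
```

In this sketch the empty attribute set yields the coarsest granulated view, a single granule containing every object, so the subsets of attributes induce a hierarchy of partitions ordered by refinement.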
For clarity, some of these issues are discussed in the set-theoretic setting. In the set-theoretic setting, a granule may be viewed as a subset of the universe, which may be either fuzzy or crisp. A family of granules containing every object in the universe is called a granulated view of the universe. A granulated view may consist of a family of either disjoint or overlapping granules. There are many granulated views of the same universe. Different views of the universe can be linked together, and a hierarchy of granulated views can be established.

Granulation criteria. A granulation criterion deals with the semantic issues and addresses the question of why two objects are put into the same granule. It is domain specific and relies on the available knowledge. In many situations, objects are grouped together based on their relationships, such as indistinguishability, similarity, proximity, or functionality [32]. One needs to build models to provide both semantical and operational interpretations of these notions. They enable us to formally and precisely define the various notions involved, and to systematically study the meanings and rationale of a granulation criterion.

Granulation methods. From the algorithmic aspect, a granulation method addresses the problem of how to put two objects into the same granule. It is necessary to develop algorithms for constructing granules and granulated views efficiently based on a granulation criterion.

Representation/description. The next issue is the interpretation of the results of a granulation method, i.e., the granular structures. Once granules are constructed, it is necessary to describe, name and label them using certain languages. One may assign a name to a granule such that an element in the granule is an instance of the named category. One may also provide a formal description of objects in the same granule.
By pooling the representations of granules, one can obtain the overall representation of a granulated view.

Qualitative and quantitative characterization. One can associate quantitative measures to the three components: granules, granulated views, and hierarchies. The measures should reflect and be consistent with the three structures: the internal structure of a granule, the collective structure of a granulated view, and the overall structure of a hierarchy.

4.2. Computing with granules

Computing and reasoning with granules explore the three types of structures. They can be similarly studied from both the semantic and algorithmic perspectives. One needs to design and interpret various methods based on the interpretation of granules and relationships between granules, as well as to define and interpret operations of granular computing.

Mappings. The connections between different levels of granulation can be described by mappings. At each level of the hierarchy, a problem is represented with respect to the granularity of the level. The mapping links different representations of the same problem at different levels of detail. In general, one can classify and study different types of granulations by focusing on the properties of the mappings.

Granularity conversion. A basic task of granular computing is to change views with respect to different levels of granularity. As we move from one level of detail to another, we need to convert the representation of a problem accordingly. A move to a more detailed view may reveal information that otherwise cannot be seen, and a move to a simpler view can improve the high level understanding by omitting irrelevant details of the problem.

Operators. Operators can precisely define the conversion of granularity in different levels. They serve as the basic building blocks of granular computing. There are at least two types of operators that can be defined. One type deals with the shift from a fine granularity to a coarse granularity.
A characteristic of such an operator is that it will discard certain details, which makes distinct objects no longer differentiable. Depending on the context, many interpretations and definitions are available, such as abstraction, simplification, generalization, coarsening, zooming-out, and so on. The other type deals with the change from a coarse granularity to a fine granularity. A characteristic of such an operator is that it will provide more details, so that a group of objects can be further classified. Such operators can be defined and interpreted differently, such as articulation, specification, expanding, refining, zooming-in, and so on.

Property preservation. Granulation allows different representations of the same problem at different levels of detail. It is naturally expected that the same problem must be consistently represented. Granulation and its related computing methods are meaningful only if they preserve certain desired properties. For example, Zhang and Zhang studied the “false-preserving” property, which states that if a coarse-grained space has no solution for a problem then the original fine-grained space has no solution [34, 35]. Such a property can be exploited to improve the efficiency of problem solving: when the coarse-grained space has no solution, a more detailed study in the fine-grained space can be eliminated. One may require that the structure of a solution in a coarse-grained space is similar to the solution in a fine-grained space. Such a property is used in top-down problem solving techniques. More specifically, one starts with a sketched solution and successively refines it into a full solution. In the context of hierarchical planning, one may impose similar properties, such as the upward solution property, the downward solution property, monotonicity, etc. [6].

4.3. The rough set model

As an illustration, we discuss the basic issues of granular computing based on the results from rough set theory.
Many applications of rough set theory are based on the exploration of those issues.

Granulation. The granulation criterion is an equivalence relation on a set of objects, which is concretely defined in an information table based on the values of a set of attributes. The granulation method is simply the collection of equivalent objects. One associates a formula to each equivalence class, which provides a formal description of the equivalence class. One also associates quantitative measures to equivalence classes and the partition induced by the equivalence relation.

Computing with granules. Many of the applications of rough set theory can be viewed as concrete examples of computing with granules. With respect to an information table, mappings between different granulated views are in fact defined by different subsets of attributes. The conversion of granularity is achieved by adding or deleting attributes. The rough set approximation operators are granularity conversion operators.

An important application of rough set theory is to learn classification rules [12, 21]. One of the important steps is to find a reduct of attributes, i.e., a set of individually necessary and collectively sufficient attributes that provide the correct classification [12, 21]. Conceptually, this can be easily modeled as searching the partition hierarchy defined by all subsets of attributes. Even in this simple search process, we have to deal with the issues discussed earlier. The mappings between levels direct the search; granularity conversion and property preserving principles govern the quality of the searched granulated views; and the operators can be used to define the quality of each decision rule.

5. Conclusion

By explicitly introducing an umbrella term of granular computing, one can explore, organize and unify the divergent concepts, theories, and applications into a well-formulated and unified theory of problem solving.
It is time to move from studies of particular methods and concrete models of granular computing to a more abstract level. One needs to study its basic philosophy and principles, and to build a more general framework. This paper may be viewed as a step toward this goal.

Although this paper does not cover all aspects of a complete model of granular computing, the results are useful in building a concrete model in which one can examine specific techniques and issues of granular computing in the context of particular applications.

The notions of granules, granulated views (levels) and hierarchies are sufficient for us to discuss the basic issues of granular computing. The sizes of granules, the granular structures, and the operations on granules provide the essential ingredients for the development of a theory of granular computing.

References

1. Ahl, V. and Allen, T.F.H. (1996) Hierarchy Theory: A Vision, Vocabulary and Epistemology, Columbia University Press.
2. Bargiela, A. and Pedrycz, W. (2002) Granular Computing: an Introduction, Kluwer Academic Publishers, Boston.
3. Foster, C.L. (1992) Algorithms, Abstraction and Implementation: Levels of Detail in Cognitive Science, Academic Press, London.
4. Hobbs, J.R. (1985) Granularity, Proceedings of the 9th International Joint Conference on Artificial Intelligence, 432-435.
5. Inuiguchi, M., Hirano, S. and Tsumoto, S. (Eds.) (2003) Rough Set Theory and Granular Computing, Springer, Berlin.
6. Knoblock, C.A. (1993) Generating Abstraction Hierarchies: an Automated Approach to Reducing Search in Planning, Kluwer Academic Publishers, Boston.
7. Ledgard, H.F., Hueras, J.F. and Nagin, P.A. (1979) PASCAL with Style: Programming Proverbs, Hayden Book Company, Inc., Rochelle Park, New Jersey.
8. Lin, T.Y. (1997) Granular computing, announcement of the BISC Special Interest Group on Granular Computing.
9. Lin, T.Y. (2003) Granular computing, LNCS 2639, Springer, Berlin, 16-24.
10. Lin, T.Y., Yao, Y.Y. and Zadeh, L.A. (Eds.) (2002) Data Mining, Rough Sets and Granular Computing, Physica-Verlag, Heidelberg.
11. Marr, D. (1982) Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, W.H. Freeman and Company, New York.
12. Pawlak, Z. (1991) Rough Sets, Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht.
13. Pawlak, Z. (1998) Granularity of knowledge, indiscernibility and rough sets, Proceedings of 1998 IEEE International Conference on Fuzzy Systems, 106-110.
14. Pedrycz, W. (Ed.) (2001) Granular Computing: an Emerging Paradigm, Physica-Verlag, Heidelberg.
15. Peikoff, L. (1981) Objectivism: the Philosophy of Ayn Rand, Dutton, New York.
16. Peters, J.F., Pawlak, Z. and Skowron, A. (2002) A rough set approach to measuring information granules, Proceedings of COMPSAC 2002, 1135-1139.
17. Polkowski, L. and Skowron, A. (1998) Towards adaptive calculus of granules, Proceedings of 1998 IEEE International Conference on Fuzzy Systems, 111-116.
18. Skowron, A. (2001) Toward intelligent systems: calculi of information granules, Bulletin of International Rough Set Society, 5, 9-30.
19. Skowron, A. and Stepaniuk, J. (2001) Information granules: towards foundations of granular computing, International Journal of Intelligent Systems, 16, 57-85.
20. Wang, G., Liu, Q., Yao, Y.Y. and Skowron, A. (Eds.) (2003) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, LNCS 2639, Springer, Berlin.
21. Wang, J. (2002) Rough sets and their applications in data mining, in: Fuzzy Logic and Soft Computing, Chen, G., Ying, M. and Cai, K.-Y. (Eds.), Kluwer Academic Publishers, Boston.
22. Yao, Y.Y. (1999) Granular computing using neighborhood systems, in: Advances in Soft Computing: Engineering Design and Manufacturing, Roy, R., Furuhashi, T., and Chawdhry, P.K. (Eds.), Springer-Verlag, London, 539-553.
23. Yao, Y.Y. (2000) Granular computing: basic issues and possible solutions, Proceedings of the 5th Joint Conference on Information Sciences, 186-189.
24. Yao, Y.Y. (2001) Information granulation and rough set approximation, International Journal of Intelligent Systems, 16, 87-104.
25. Yao, Y.Y. (2001) Modeling data mining with granular computing, Proceedings of COMPSAC 2001, 638-643.
26. Yao, Y.Y. (2003) Information granulation and approximation in a decision-theoretical model of rough sets, in: Rough-Neural Computing: Techniques for Computing with Words, Pal, S.K., Polkowski, L., and Skowron, A. (Eds.), Springer, Berlin, 491-518.
27. Yao, Y.Y. (2003) Granular computing for the design of information retrieval support systems, in: Information Retrieval and Clustering, Wu, W., Xiong, H. and Shekhar, S. (Eds.), Kluwer Academic Publishers 299.
28. Yao, Y.Y. (2004) A partition model of granular computing, LNCS Transactions on Rough Sets, to appear.
29. Yao, Y.Y. and Liau, C.-J. (2002) A generalized decision logic language for granular computing, Proceedings of FUZZ-IEEE'02, 1092-1097.
30. Yao, Y.Y., Liau, C.-J. and Zhong, N. (2003) Granular computing based on rough sets, quotient space theory, and belief functions, Proceedings of ISMIS'03, 152-159.
31. Yao, Y.Y. and Zhong, N. (2002) Granular computing using information tables, in: Data Mining, Rough Sets and Granular Computing, Lin, T.Y., Yao, Y.Y. and Zadeh, L.A. (Eds.), Physica-Verlag, Heidelberg, 102-124.
32. Zadeh, L.A. (1997) Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems, 90, 111-127.
33. Zadeh, L.A. (1998) Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems, Soft Computing, 2, 23-25.
34. Zhang, B. and Zhang, L. (1992) Theory and Applications of Problem Solving, North-Holland, Amsterdam.
35. Zhang, L. and Zhang, B. (2003) The quotient space theory of problem solving, LNCS 2639, 11-15.
36. Zhao, M. (2004) Data Description based on Reduct Theory, Ph.D. Dissertation, Institute of Automation, Chinese Academy of Sciences.
37. Zhong, N., Skowron, A.
and Ohsuga S. (Eds.) (1999)New Directions in Rough Sets, Data Mining, andGranular-Soft Computing, LN AI 1711, Springer,。
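
The granulation and reduct search described in the rough set discussion above can be illustrated with a small sketch. It is not taken from the paper: the toy information table, the attribute names, and the helper names (`partition`, `approximations`, `classifies`, `reducts`) are all assumptions made for illustration, and the exhaustive subset search only suits tiny, consistent tables.

```python
from collections import defaultdict
from itertools import combinations


def partition(table, attrs):
    """Granulate: group objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return {frozenset(c) for c in classes.values()}


def approximations(table, attrs, target):
    """Lower and upper approximations of target under the partition by attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:      # block lies entirely inside the target set
            lower |= block
        if block & target:       # block overlaps the target set
            upper |= block
    return lower, upper


def classifies(table, attrs, decision):
    """True if every equivalence class carries a single decision value."""
    return all(len({table[o][decision] for o in block}) == 1
               for block in partition(table, attrs))


def reducts(table, cond_attrs, decision):
    """Minimal attribute subsets that still classify correctly (brute force)."""
    found = []
    for k in range(1, len(cond_attrs) + 1):
        for subset in combinations(cond_attrs, k):
            if any(set(r) <= set(subset) for r in found):
                continue         # a smaller sufficient subset already exists
            if classifies(table, subset, decision):
                found.append(subset)
    return found


# Hypothetical information table: four objects, three condition attributes.
table = {
    "o1": {"color": "red", "size": "big", "shape": "round", "label": "yes"},
    "o2": {"color": "red", "size": "small", "shape": "round", "label": "no"},
    "o3": {"color": "blue", "size": "big", "shape": "square", "label": "yes"},
    "o4": {"color": "blue", "size": "small", "shape": "square", "label": "no"},
}
```

Adding or deleting attributes in the `attrs` argument moves between finer and coarser partitions, which is the granularity conversion the text refers to; searching subsets in order of size is one simple way to walk the partition hierarchy.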

Computer Vision Technology

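Development of Computer Vision (II)
➢ Mid-1970s
– The MIT Artificial Intelligence (AI) Laboratory formally launched a "computer vision" (Machine Vision) course
– Research was carried out on the theory, algorithms, and system design of computer vision
➢ Late 1970s
– Prof. David Marr proposed the theory of computational vision, which in the 1980s became a very important theoretical framework in computer vision research
… color and other basic features of the scene
– Also covers various image transforms (e.g., rectification), image texture detection, image motion detection, and so on
Five Major Topics of Computer Vision Research (II)
➢ Middle-level vision
– Task: recover 2.5-D information about the scene, such as depth, surface normal direction, and contours
– Approaches: • Stereo vision • Range imaging (rangefinder) • Motion estimation • Shape-from-X methods: occlusion, shading, texture, etc.
– Also studied: • System calibration • Imaging models of the system, etc.
Five Major Topics of Computer Vision Research (III)
➢ High-level vision
– Task: in an object-centered coordinate system, on the basis of the original input image, the basic image features, and the 2.5-D sketch: • Recover the complete 3-D map of the object • Build a 3-D description of the object • Recognize 3-D objects • Determine the position and orientation of the object
Five Major Topics of Computer Vision Research (V)
– Information recovery is carried out by computer vision
➢ Computer graphics
– Computer graphics: generates images from geometric primitives such as lines, circles, and free-form surfaces → image synthesis
– Computer vision: estimates geometric primitives and other features from images → the inverse process
Relationship to Other Disciplines (II)
➢ Pattern recognition
– Pattern: the common features that distinguish one class of things from others.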

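Learning & Innovation - Tsinghua National Laboratory for Information Science and Technology

Feature space that can be detected robustly
Feature space with semantic meaning
Tens of bytes
2,000 bytes - 50% (6464)-dimensional features
Borrowing from the visual mechanisms of the human brain - MIT (1)
• Final representation: 2D contours (2D-sketches)
• Image primitives: line segments of different orientations • Configuration: HMAX: hierarchical sum-max operations
Computer (Science and) Technology
• Hardware (arithmetic and control units, memory, peripherals) - 46
• Software (system software, application software) - 75
• Computer networks - 89: computers gradually fade from people's view
Computer Science (and Technology)
Computer science is the science that studies computing. Suggested name: Computing Science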
Datalogy (1969)
Computics (1995)
Bottom-up vs. Top-down
Top-down feedback
Prior knowledge, annotated knowledge, contextual knowledge
Top-down feedback
Local connection Data-driven
Data-driven vs. Structure-analysis
Multilayer neural network model with feedback
G. E. Hinton et al., The "wake-sleep" algorithm for unsupervised neural networks, Science, vol. 268, 26 May 1995, pp. 1158-1161
Perceptual space representation
RBM: Restricted Boltzmann Machine

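Current Status and Trends of the International Machine Vision Industry

I. Market Size of the International Machine Vision Industry

1. History of the industry

The concept of machine vision originated in the 1960s, and its first applications came from the development of "robots". The earliest vision-based machine systems first acquired and processed images with a vision system, and then controlled the machine's motion by computing an estimate of the target's position. In 1979 the concept of visual servoing (Visual Servo) was proposed: visual information can be used as continuous feedback to improve the accuracy of vision-based positioning or tracking.

1950s: work concentrated on simple analysis and recognition of two-dimensional images, such as character recognition and the analysis and interpretation of workpiece surfaces, micrographs, and aerial images.

1960s: Roberts at MIT (Massachusetts Institute of Technology) used computer programs to extract the three-dimensional structure of polyhedra such as cubes, wedges, and prisms from digital images, and to describe the shapes of objects and the spatial relationships between them. His work opened up research on three-dimensional computer vision aimed at understanding three-dimensional scenes.

1970s: the first reasonably complete theory of vision was proposed, and some applied vision systems had already appeared. In the mid-1970s, the MIT Artificial Intelligence Laboratory formally launched a "Machine Vision" course. In 1973 the MIT AI Lab attracted many well-known scholars from around the world to take part in research on vision theory, algorithms, and system design; Professor D. Marr was one of them. In 1973 he was invited to lead a research group at the MIT AI Lab made up mainly of doctoral students, and in 1977 he proposed the computational theory of vision (Vision Computational Theory), which in the 1980s became a very important theoretical framework in the field of computer vision.

Mid-1980s: computer vision flourished, with new concepts, methods, and theories emerging continually. An early Chinese-language publication that formally introduced computer vision: "Computer Vision: An Emerging Research Field," Computer Applications and Software (计算机应用与软件), 1984, No. 3.

Mid-1990s: a period of deeper development and wide application.

2. Analysis of current applications

With advances in microprocessors and semiconductor technology, together with rising labor costs and the demand for high-quality products, machine vision outside China entered a period of rapid growth in the 1990s and came into wide use in industrial control.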

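A History of Computer Vision

The ultimate goal of computer vision is to let computers observe and understand the world through vision as humans do, with the ability to adapt autonomously to changing environments. A brief history of computer vision follows.

Stage 1: in the 1950s, computer vision was still part of pattern recognition, and the main work was the analysis and recognition of two-dimensional images. Pattern recognition itself was not yet an independent discipline at that time; it only became one in the 1960s.

Stage 2: in the mid-1960s, Roberts's work opened up research on three-dimensional computer vision aimed at understanding three-dimensional scenes. Roberts's system could extract line drawings of polyhedra from two-dimensional digital images and, using known polyhedron models, work out the true three-dimensional position of the objects corresponding to the line drawings. Roberts's work was a great inspiration and ushered computer vision into a period of vigorous growth.

Stage 3: in the 1970s, David Marr made a major mark on the history of computer vision by proposing the first reasonably complete framework for a vision system: the computational theory of vision. He regarded vision as a complex information-processing task with different representations of information and different levels of processing, whose ultimate purpose is for the computer to produce a description of the external world. He therefore proposed three levels of analysis: the computational theory level, the representation and algorithm level, and the implementation level. He also proposed a bottom-up visual processing framework with no feedback. He held that the main job of vision is to recover the three-dimensional shape of objects, and divided this process into three stages:

1. Primal sketch. The primal sketch is made up of basic geometric elements in the two-dimensional image, such as edge points, straight lines, curves, and vertices.

2. 2.5-dimensional sketch. In a viewer-centered coordinate system, the depth, surface normal direction, contours, and so on of the visible parts of the scene are recovered from the input image and the primal sketch. This information includes depth but is not a true three-dimensional representation of the object, hence the name 2.5-D sketch (the missing part is the back of the object or the parts that are occluded).

3. 3-dimensional model. In an object-centered coordinate system, three-dimensional objects are recovered, represented, and recognized from the input image, the primal sketch, and the 2.5-D sketch.

Marr's theory gave computer vision research many valuable philosophical ideas and research methods, and also created many starting points for research in the field.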

[1.2] D. Marr, Vision A Computational Investigation into the Human Representation


CPSC 505 References, Handout 2

Items marked † will be on reserve for CPSC 505 in the CICSR/CS Reading Room.

1 Textbooks

Horn [1.1] is highly recommended. In the past, it has been required in CPSC 505. Despite its vintage, it remains the best textbook available. Marr [1.2] is an essential reference for anyone intending to do research in computational vision. Ballard and Brown [1.3] was required in CPSC 505 prior to the publication of [1.1]. Other texts are suggested to provide different perspectives. Faugeras [1.8] and Kanatani [1.9], for example, provide a geometric viewpoint. Duda and Hart [1.10] is a classic still worth reading.

[1.1] B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, 1986. †
[1.2] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco, CA: W. H. Freeman, 1982. †
[1.3] D. H. Ballard and C. M. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982. †
[1.4] M. D. Levine, Vision in Man and Machine. New York, NY: McGraw-Hill, 1985.
[1.5] A. Rosenfeld and A. C. Kak, Digital Picture Processing (2nd edition). New York, NY: Academic Press, 1982.
[1.6] R. Nevatia, Machine Perception. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[1.7] W. K. Pratt, Digital Image Processing (2nd edition). New York, NY: John Wiley & Sons, 1991.
[1.8] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press, 1993.
[1.9] K. Kanatani, Group-Theoretical Methods in Image Understanding. New York, NY: Springer-Verlag, 1990.
[1.10] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York, NY: John Wiley & Sons, 1973.

2 Edited Collections

Publishing collections of papers that share a theme has become quite popular. Typically, the papers already have appeared elsewhere. The list of edited collections is given in reverse chronological order. The last three contain many of the classic computer vision papers.

[2.1] L. Wolff, S. Shafer, and G. Healey, eds., Physics-Based Vision: Principles and Practice (Vol I: Radiometry). Boston, MA: Jones and Bartlett Publishers, Inc., 1992. †
[2.2] G. Healey, S. Shafer, and L. Wolff, eds., Physics-Based Vision: Principles and Practice (Vol II: Color). Boston, MA: Jones and Bartlett Publishers, Inc., 1992. †
[2.3] L. Wolff, S. Shafer, and G. Healey, eds., Physics-Based Vision: Principles and Practice (Vol III: Shape Recovery). Boston, MA: Jones and Bartlett Publishers, Inc., 1992. †
[2.4] R. Kasturi and R. C. Jain, eds., Computer Vision: Principles (Vol. 1). Los Alamitos, CA: IEEE Computer Society Press, 1991.
[2.5] R. Kasturi and R. C. Jain, eds., Computer Vision: Advances and Applications (Vol. 2). Los Alamitos, CA: IEEE Computer Society Press, 1991.
[2.6] B. K. P. Horn and M. J. Brooks, eds., Shape from Shading. Cambridge, MA: MIT Press, 1989.
[2.7] M. A. Fischler and O. Firschein, eds., Readings in Computer Vision. Los Altos, CA: Morgan Kaufmann Publishers, Inc., 1987. †
[2.8] M. Brady, ed., Computer Vision. New York, NY: North-Holland, 1981. †
[2.9] A. R. Hanson and E. M. Riseman, eds., Computer Vision Systems. New York, NY: Academic Press, 1978.
[2.10] P. H. Winston, ed., The Psychology of Computer Vision. New York, NY: McGraw-Hill, 1975. †

3 Survey papers

Occasionally, survey papers are published. They circumscribe the field from the particular perspective of the authors. In recent years, computer vision has evolved to the point where no single survey paper can adequately cover the breadth of topics required. The recently published Encyclopedia of AI [3.1] contains a number of entries related to vision, some of which are further referenced here. Again, the list is given in reverse chronological order.

[3.1] S. C. Shapiro, ed., Encyclopedia of Artificial Intelligence (2nd Edition). New York, NY: John Wiley & Sons, 1992.
[3.2] J. Aloimonos and A. Rosenfeld, "Visual recovery," in [3.1], pp. 1664–1687, 1992. †
[3.3] T. Pavlidis, "Image analysis," Annual Review of Computer Science, vol. 3, pp. 121–146, 1988. †
[3.4] E. C. Hildreth and J. M. Hollerbach, "Artificial intelligence: computational approach to vision and motor control," in Handbook of Physiology – The Nervous System V (F. Plum, ed.), pp. 605–642, American Physiological Society, 1985. †
[3.5] M. Brady, "Artificial intelligence and robotics," Artificial Intelligence, vol. 26, pp. 79–121, 1985. †
[3.6] T. O. Binford, "Survey of model-based image analysis systems," International Journal of Robotics Research, vol. 1, no. 1, pp. 18–64, 1982. (Reprinted in [2.4]). †
[3.7] M. Brady, "Computational approaches to image understanding," ACM Computing Surveys, vol. 14, pp. 3–72, 1982. †
[3.8] H. G. Barrow and J. M. Tenenbaum, "Computational vision," Proceedings of the IEEE, vol. 69, pp. 572–595, 1981. †

4 Binary image processing

Material on binary image processing has been deleted from CPSC 505. The material deleted is covered in Chapters 3–4 of the text [1.1]. The references retained here expand on the material covered in the text, as traditionally was done in CPSC 505. The focus is on the issues of connectivity and parallelism in (binary) image processing. Reference [4.5] is an example of early work done at Hitachi. It is foundational to modern work based on morphology. Reference [4.6] is the basis of a Toshiba product that first successfully read handwritten mail zip codes in the early 1970's. Minsky and Papert's Perceptrons was first published in 1969. It was republished in 1988 as an "expanded edition" [4.8]. The expansion consists of a new chapter, "Epilogue: The New Connectionism." This is recommended to enthusiasts of a connectionist (i.e., neural network) approach to image analysis.

[4.1] K. Preston Jr., "Feature extraction by Golay hexagonal pattern transformations," IEEE Transactions on Computers, vol. 20, pp. 1007–1014, 1971. †
[4.2] S. B. Gray, "Local properties of binary images in two dimensions," IEEE Transactions on Computers, vol. 20, pp. 551–561, 1971. †
[4.3] M. J. E. Golay, "Hexagonal parallel pattern transformations," IEEE Transactions on Computers, vol. 18, pp. 733–740, 1969. †
[4.4] M. Ingram and K. Preston Jr., "Automatic analysis of blood cells," Scientific American, vol. 223, pp. 72–82, November, 1970. †
[4.5] M. Ejiri, T. Uno, M. Mese, and S. Ikeda, "A process for detecting defects in complicated patterns," Computer Graphics and Image Processing, vol. 2, pp. 326–339, 1973. †
[4.6] K. Mori, H. Genchi, S. Watanabe, and S. Katsuragi, "Microprogram controlled pattern processing in a handwritten mail reader-sorter," Pattern Recognition, vol. 2, pp. 175–185, 1970. †
[4.7] A. Rosenfeld, "Connectivity in digital pictures," Journal of the Association for Computing Machinery, vol. 17, pp. 146–160, 1970. †
[4.8] M. L. Minsky and S. A. Papert, Perceptrons, Expanded Edition. Cambridge, MA: MIT Press, 1988.

5 Fourier transform and sampling theory

Reference [5.1] is a useful survey and includes an extensive bibliography. Reference [5.2] is a classic text that links Fourier analysis to optics. References [5.3] and [5.4] both are seminal papers dating from the 1940's. Both also are very readable. References [5.5] and [5.6] have been used as the basis for problem sets (and exam questions).

[5.1] A. J. Jerri, "The Shannon sampling theorem - its various extensions and applications: a tutorial review," Proceedings of the IEEE, vol. 65, pp. 1565–1596, 1977. †
[5.2] J. W. Goodman, Introduction to Fourier Optics. New York, NY: McGraw-Hill, 1968.
[5.3] C. E. Shannon, "Communication in the presence of noise," Proc. IRE, vol. 37, pp. 10–21, 1949. †
[5.4] D. Gabor, "Theory of communication," J. Institute of Electrical Engineers, vol. 93, pp. 429–457, 1946. †
[5.5] J. A. Parker, R. V. Kenyon, and D. E. Troxel, "Comparison of interpolating methods for image resampling," IEEE Transactions on Medical Imaging, vol. 2, pp. 31–39, 1983. †
[5.6] S. Shlien, "Geometric correction, registration and resampling of Landsat imagery," Canadian Journal of Remote Sensing, vol. 5, pp. 74–89, 1979. †

6 Edge detection

The Marr–Hildreth theory of edge detection is described in [1.2], [6.1] and [6.2]. (Both [6.1] and [6.2] are key references, the former concentrating on theory and the latter on implementation.) Reference [6.3] is a survey written years later. The definitive reference for Canny edge detection is [6.4]. Deriche's extensions are described in [6.7] and [6.8]. Torre and Poggio [6.5] provide a conceptual framework for edge detection that also serves as an important link to topics that follow. Finally, Blake and Zisserman step outside the domain of linear operators to introduce their Graduated Non-Convexity (GNC) algorithm.

[6.1] D. Marr and E. C. Hildreth, "Theory of edge detection," Proc. R. Soc. Lond. B, vol. 207, pp. 187–217, 1980. (Reprinted in [2.4]). †
[6.2] E. C. Hildreth, "The detection of intensity changes by computer and biological vision systems," Computer Vision Graphics and Image Processing, vol. 22, pp. 1–27, 1983. †
[6.3] E. Hildreth, "Edge detection and local feature detection," in [3.1], pp. 422–434, 1992. †
[6.4] J. F. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679–698, 1986. (Reprinted in [2.4], [2.7]). †
[6.5] V. Torre and T. A. Poggio, "On edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 147–163, 1986. †
[6.6] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: MIT Press, 1987. †
[6.7] R. Deriche, "Using Canny's criteria to derive a recursively implemented optimal edge detector," International Journal of Computer Vision, vol. 1, pp. 167–187, 1987. †
[6.8] R. Deriche, "Fast algorithms for low-level vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 78–87, 1990. †

7 Shape from shading and photometric stereo

Twice, Horn has published new work on shape from shading shortly after publishing a book. Reference [7.3] followed [2.6] and reference [7.5] (included in [2.6]) followed [1.1].

[7.1] R. J. Woodham, "Gradient and curvature from the photometric stereo method including local confidence estimation," Journal of the Optical Society of America, A, (in press), 1994. †
[7.2] S. K. Nayar, K. Ikeuchi, and T. Kanade, "Shape from interreflections," International Journal of Computer Vision, vol. 6, no. 3, pp. 173–195, 1991. (Reprinted in [2.3]). †
[7.3] B. K. P. Horn, "Height and gradient from shading," International Journal of Computer Vision, vol. 5, pp. 37–75, 1990. (Reprinted in [2.3]). †
[7.4] R. T. Frankot and R. Chellappa, "A method for enforcing integrability in shape from shading algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 439–451, 1988. (Reprinted in [2.6]). †
[7.5] B. K. P. Horn and M. J. Brooks, "The variational approach to shape from shading," Computer Vision Graphics and Image Processing, vol. 33, pp. 174–208, 1986. (Reprinted in [2.6]). †
[7.6] K. Ikeuchi and B. K. P. Horn, "Numerical shape from shading and occluding boundaries," Artificial Intelligence, vol. 17, pp. 141–184, 1981. (Reprinted in [2.6], [2.8]). †
[7.7] R. J. Woodham, "Analysing images of curved surfaces," Artificial Intelligence, vol. 17, pp. 117–140, 1981. (Reprinted in [2.8]). †
[7.8] R. J. Woodham, "Photometric method for determining surface orientation from multiple images," Optical Engineering, vol. 19, pp. 139–144, 1980. (Reprinted in [2.3], [2.4], [2.6]). †
[7.9] B. K. P. Horn and R. W. Sjoberg, "Calculating the reflectance map," Applied Optics, vol. 18, pp. 1770–1779, 1979. (Reprinted in [2.1], [2.6]). †
[7.10] B. K. P. Horn, "Understanding image intensities," Artificial Intelligence, vol. 8, pp. 201–231, 1977. (Reprinted in [2.1], [2.7]). †
[7.11] B. K. P. Horn, "Obtaining shape from shading information," in [2.10], pp. 115–155, 1975. (Reprinted in [2.6]). †

8 Optical flow and the 2-D motion field

Motion analysis is surveyed in [8.1]. Horn and Schunck [8.2] is the classic reference to optical flow. Subsequent work has both explored the basic problem formulation and suggested alternative ways to combine local evidence. References [8.3], [8.4], [8.5] and [8.6] are important contributions. Other radiometric factors can be significant, as shown in [8.7]. The idea of using multiple light sources is explored in [8.8]. Hildreth [8.9] explored the motion of contours using a methodology that is similar to [8.2] and that can be seen, in hindsight, to be another example of regularization.

[8.1] T. S. Huang, "Visual motion analysis," in [3.1], pp. 1638–1655, 1992. †
[8.2] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185–203, 1981. (Reprinted in [2.3], [2.4], [2.8]). †
[8.3] J. K. Kearney, W. B. Thompson, and D. L. Boley, "Optical flow estimation: an error analysis of gradient-based methods with local optimization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 229–244, 1987. †
[8.4] B. G. Schunck, "Image flow: fundamentals and algorithms," in Motion Understanding: Robot and Human Vision (W. N. Martin and J. K. Aggarwal, eds.), pp. 23–80, Boston, MA: Kluwer Academic Publishers, 1988. †
[8.5] A. Verri and T. Poggio, "Motion field and optical flow: qualitative properties," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 490–498, 1989. (Reprinted in [2.3], [2.5]). †
[8.6] A. Verri, F. Girosi, and V. Torre, "The mathematical properties of the two-dimensional motion field: from singular points to motion parameters," Journal of the Optical Society of America, A, vol. 6, no. 5, pp. 698–712, 1989. †
[8.7] A. P. Pentland, "Photometric motion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 879–890, 1991. (Reprinted in [2.3]). †
[8.8] R. J. Woodham, "Multiple light source optical flow," in Proc. 3rd International Conference on Computer Vision, (Osaka, Japan), pp. 42–46, 1990. (Reprinted in [2.3]). †
[8.9] E. C. Hildreth, The Measurement of Visual Motion. Cambridge, MA: MIT Press, 1984.

9 Mathematical Tools

The classic references for regularization, translated from the Soviet literature, are [9.1] and [9.2]. The most commonly cited text is [9.3]. Poggio was the first to note the connection between regularization and computational vision. Reference [9.4] documents the insight. Algorithmic support for regularization is less common. Stochastic processes seem related. References [9.5] and [9.7] are central. (Reference [9.7] is not for the easily discouraged. Indeed, [2.7] provides [9.6] as its own "guide to the reader".) Reference [9.8] is an example of recent work that establishes a connection to Markov Random Fields (MRF's). There also is new interest in connections to "robust statistics," as evidenced in [9.9] and [9.10].

[9.1] A. N. Tikhonov, "Regularization of incorrectly posed problems," Soviet Math. Dokl., vol. 4, pp. 1624–1627, 1963. †
[9.2] A. N. Tikhonov, "Solution of incorrectly formulated problems and the regularization method," Soviet Math. Dokl., vol. 4, pp. 1035–1038, 1963. †
[9.3] A. N. Tikhonov and V. Y. Arsenin, Solutions of ill-posed problems. Washington, DC: V. H. Winston & Sons, 1977.
[9.4] T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory," Nature, vol. 317, pp. 314–319, 1985. (Reprinted in [2.7]). †
[9.5] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983. (Reprinted in [2.7]). †
[9.6] G. B. Smith, "Preface to S. Geman and D. Geman: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," in [2.7], pp. 562–563, 1987. †
[9.7] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721–741, 1984. (Reprinted in [2.7]). †
[9.8] D. Geiger and F. Girosi, "Parallel and deterministic algorithms from MRF's: surface reconstruction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 5, pp. 401–412, 1991. †
[9.9] F. Girosi, "Models of noise and robust estimation," AI-Memo-1287, MIT AI Laboratory, Cambridge, MA, 1991. †
[9.10] M. J. Black and A. Rangarajan, "The outlier process: unifying line processes and robust statistics," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, (Seattle, WA), pp. 15–22, 1994. †

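Cognitive Psychology: An Introduction in English

Pause
Positioning of Cognitive Psychology
• Three levels of knowing:
• 1. Episteme: humanity's general knowledge of the world
• 2. Cognition: an individual's general knowledge of the world
• 3. Know, Understand: an individual's knowledge of particular events
• List representative disciplines that study knowing at each of the three levels. • Cognitive psychology as a whole sits between 2 and 3 and strives to influence 1.
Intellectual Makeup of Cognitive Psychology
• Two questions form the foundation of cognitive psychology:
– 1. How do humans come to think? (Piaget) – 2. How can machines be made to think? (Turing)
• The intersection of these two questions is called cognitive science.
• Branches after the intersection:
– 1. Cognitive psychology: answers the question about humans – 2. Computer science (artificial intelligence): answers the question about machines.
• Question: which discipline provides the most central support at the intersection?
Reference books
• 4. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information • David Marr • The MIT Press
– 3 English-language papers
• Classic source papers, field surveys, recent advances
• Assessment
– Coursework 40% (class notes and questions, outside reading and reports, or participation in experiments) – Exam 60%
Textbooks
• 1. Cognitive Psychology (《认知心理学》) • Wang Su, Wang Ansheng • Peking University Press
Textbooks
• 2. Cognitive Psychology: Theory, Experiments and Applications (《认知心理学:理论、实验和应用》) • Shao Zhifang • Shanghai Educational Publishing House
Reference books
• 1. Cognitive Psychology (《认知心理学》) • Eysenck, Keane • East China Normal University Press
Cognitive Psychology
Ya-Jun Zhao yajunzhaois@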

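David Marr's Contributions to Computer Vision

David Marr was born on 19 January 1945. He studied at the University of Cambridge, earning a master's degree in mathematics and a doctorate in neurophysiology, and also received rigorous training in neuroanatomy, psychology, and biochemistry. In Britain he worked on theories of the neocortex, the hippocampus, and especially the cerebellum. In 1974 he visited the United States and, at the invitation of Professor M. Minsky, stayed at the Massachusetts Institute of Technology to carry out research on perception and memory. Starting from the standpoint of computer science and bringing together mathematics, psychophysics, and neurophysiology, he founded the computational theory of human vision, giving vision research a new look.

Marr died of illness in Boston on 17 November 1980, aged 35. His theory was carried on, enriched, and developed by the research group he founded, made up mainly of doctoral students, and was summarized by his students in a book on computer vision, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (ISBN 0-7167-1567-8), published after his death. The 1981 special issue on computer vision in volume 17 of the journal Artificial Intelligence clearly shows that this theory had already had an enormous influence.

Since the 1970s, as cognitive psychology itself developed, its research on pattern recognition underwent some important changes in orientation. Some cognitive psychologists continued to work from the physical symbol system hypothesis, exploring the characteristics of pattern recognition in computers and in humans; others turned to neural-network ideas to study pattern recognition. The models introduced below are theoretical models that have been influential over the past decade or so.

The computational theory of vision was proposed by David Marr in the 1970s. His major work, Vision (published in Chinese as 《视觉计算理论》), appeared in 1982. David Marr laid the foundations of the field known as computational vision, which comprises two areas: computer vision and computational neuroscience.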

Marr-Vision


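Chapter 5: Representing Object Shape for Recognition
5.1 3-D model representation
A canonical 3-D model representation is based on the axes of geometric symmetry of the object's shape. This representation can be used to describe generalized cones. Stick figures (skeleton drawings)
Modular representation of three-dimensional objects
1) The principal axis of the model 2) The principal axes of the object's main component parts, their spatial arrangement along the model's principal axis, and their sizes. 3) The names of the object's component parts.
Example: human, arm, forearm, hand
Rigidity constraint: from three distinct views (projected images) of four non-coplanar points on a rigid object, the 3-D structure of these four points can be computed. Rigidity assumption: any set of 2-D points that can be interpreted as the projection of a set of points on a rigid body moving in space is interpreted that way by the human eye. Drawback: not robust enough.
3.3 Shape contours
Occluding contours: these arise from discontinuities in object depth and usually correspond to the object's outline in the two-dimensional image. Surface orientation contours: contours produced by abrupt changes in surface orientation.
Occluding contours
Four assumptions that underlie our interpretation of contours: 1) Each distinct point on the contour generator projects to a distinct point on the image contour. 2) Neighbouring points on the contour generator project to neighbouring points on the image contour. 3) The contour generator lies in a plane in space. 4) The object being viewed is a generalized cone.
Illumination
Shape from shading studies how to recover the slant of an object's surface from image brightness values. Gradient space: the slant of a surface can be expressed by two parameters, p and q.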
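
To make the gradient-space parameters (p, q) concrete, here is a minimal sketch of how they relate surface orientation to image brightness. It assumes the common convention p = ∂z/∂x, q = ∂z/∂y with the viewer along the z axis, and a Lambertian surface lit by a distant source whose direction is given by the gradient (ps, qs); neither assumption is stated on the slide, and the function names are hypothetical.

```python
import math


def surface_normal(p, q):
    """Unit normal of a surface patch with gradient (p, q) = (dz/dx, dz/dy)."""
    n = math.sqrt(1.0 + p * p + q * q)
    return (-p / n, -q / n, 1.0 / n)


def lambertian_reflectance(p, q, ps, qs):
    """Brightness of a Lambertian patch with gradient (p, q).

    (ps, qs) is the gradient of a surface patch facing the light source,
    so brightness is the cosine of the angle between the two normals.
    Patches turned away from the source are clipped to zero.
    """
    num = 1.0 + p * ps + q * qs
    den = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + ps * ps + qs * qs)
    return max(0.0, num / den)
```

Shape from shading runs this map in reverse: one brightness value constrains (p, q) only to a curve in gradient space, which is why extra constraints such as smoothness or multiple light sources are needed to pin down the surface.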

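2.1 Zero-crossings
To detect intensity changes, a filter should have the following properties: 1) it is a differential operator; 2) it can be tuned to any scale. $\nabla^2 G$ is the optimal filter with these properties, where
$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}, \qquad G(x, y) = e^{-\frac{x^2 + y^2}{2\pi\sigma^2}}$
Why choose $\nabla^2 G$
The Gaussian is smooth and localized in both the spatial domain and the frequency domain. $\nabla^2$ is an isotropic operator, which reduces the amount of computation.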
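1 Introduction

• Early literature: Olsztyn, Joseph T., et al. Application of computer vision to a simulated assembly task. American Society of Mechanical Engineers, 1973
The MIT AI Lab attracted many well-known scholars from around the world to take part in research on vision theory, algorithms, and system design …
… free-form surfaces, to generate images; it plays an important role in visualization and virtual reality. Computer vision solves exactly the opposite problem: estimating geometric primitives and other features from images. Computer graphics therefore belongs to image synthesis, and computer vision to image analysis (image recognition/understanding).
3. Pattern recognition: a pattern generally refers to what distinguishes one class of things from other …
1.2.5 Applications of vision measurement technology
• Product measurement: vision-based 3-D coordinate measuring machines • Reverse engineering: reverse engineering of free-form surfaces based on 3-D dimensional measurement
• Quality inspection: ……. • Robot navigation: recovery of 3-D scene information, self-localization, and pose estimation
1.2.6 Development trends in vision measurement technology
• Online real-time inspection: use dedicated hardware to implement environment-independent processing algorithms • Intelligent inspection: use many sensors to obtain measurement information, derive the required measurement results, and control the machining process
Computer vision, also called machine vision, can perform the functions of the human visual system in understanding the external world and carrying out various measurements and judgments. Computer vision uses computers to process captured images or video in order to perceive, recognize, and understand three-dimensional scenes of the objective world.
Why develop computer vision
• It increases the flexibility and degree of automation of production • It extends the range of applications of human vision, offering reliability and a wider spectral sensitivity range • It is a challenging and important research area in both engineering and science • The United States lists computer vision research among the major fundamental problems in science and engineering with broad impact on the economy and science, the so-called grand challenges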

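Vision Sensing Technology

7.2.2 Working principle of charge-coupled imaging devices
Figure 7-2(a) shows an empty potential well. With no inversion layer, the depth of the potential well is proportional to U_G. As shown in Figure 7-2(b), when inversion-layer charge fills the well, the surface potential shrinks. As shown in Figure 7-2(c), as the inversion-layer charge concentration continues to rise, the well fills further; at this point the surface can no longer hold the excess electrons, and the electrons "overflow".
The accuracy problem
Traditional computer vision research focuses on qualitative recognition and understanding of 3-D scenes, with little or no quantitative accuracy analysis. Vision sensing measurement takes computer vision as its theoretical basis and combines it with the theory of precision measurement and testing to solve measurement problems in engineering applications; it requires reliable measurement of the measured object while meeting a given accuracy.
7.1.3 Development of vision sensing technology
Vision sensing is applied to measurement in many ways. One major research area is vision-based measurement of geometric quantities (vision measurement), especially 3-D coordinate and dimensional measurement, which has wide application in modern industrial manufacturing.
7.1.1 Biological vision and machine vision
With advances in information processing theory, electronic devices, and computer technology, people have tried to capture images of a scene with a camera, convert them into digital signals that a computer can process, and have a computer platform process the visual information. This gave rise to a new discipline: computer vision.
… considerations of engineering application, computer vision can treat vision sensing (information acquisition), visual information processing, understanding, and cognition as separate stages. This both simplifies the complex interacting architecture of a biological-like vision system and makes implementation on existing computer platforms easier. Applying computer vision to engineering gave rise to a new discipline: machine vision.
7.1.2 Marr's theory of computer vision
Marr divided the visual process into three stages:
image -> primal sketch -> 2.5-D sketch -> 3-D representation
Stage 1, called early vision, obtains the primal sketch from the input 2-D image. Stage 2, called intermediate vision, obtains the 2.5-D sketch from the primal sketch. Stage 3, called late vision, obtains a 3-D representation of the scene from the input image, the primal sketch, and the 2.5-D sketch.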
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

CPSC505References Handout2 Items marked†will be on reserve for CPSC505in the CICSR/CS Reading Room.1TextbooksHorn[1.1]is highly recommended.In the past,it has been required in CPSC505.Despite its vintage,it remains the best textbook available. Marr[1.2]is an essential reference for anyone intending to do research in computational vision.Ballard and Brown[1.3]was required in CPSC505 prior to the publication of[1.1].Other texts are suggested to provide differ-ent perspectives.Faugeras[1.8]and Kanatani[1.9],for example,provide a geometric viewpoint.Duda and Hart[1.10]is a classic still worth reading.[1.1]B.K.P.Horn,Robot Vision.Cambridge,MA:MIT Press,1986.†[1.2]D.Marr,Vision:A Computational Investigation into the Human Rep-resentation and Processing of Visual Information.San Francisco,CA: W.H.Freeman,1982.†[1.3]D.H.Ballard and C.M.Brown,Computer Vision.Englewood Cliffs,NJ:Prentice-Hall,1982.†[1.4]M.D.Levine,Vision in Man and Machine.New York,NY:McGraw-Hill,1985.[1.5]A.Rosenfeld and A.C.Kak,Digital Picture Processing(2nd edition).New York,NY:Academic Press,1982.[1.6]R.Nevatia,Machine Perception.Englewood Cliffs,NJ:Prentice-Hall,1982.[1.7]W.K.Pratt,Digital Image Processing(2nd edition).New York,NY:John Wiley&Sons,1991.[1.8]O.Faugeras,Three-Dimensional Computer Vision.Cambridge,MA:MIT Press,1993.[1.9]K.Kanatani,Group-Theoretical Methods in Image Understanding.New York,NY:Springer-Verlag,1990.Handout22 [1.10]R.O.Duda and P.E.Hart,Pattern Classification and Scene Analysis.New York,NY:John Wiley&Sons,1973.2Edited CollectionsPublishing collections of papers that share a theme has become quite popular. 
Typically,the papers already have appeared elsewhere.The list of edited collections is given in reverse chronological order.The last three contain many of the classic computer vision papers.[2.1]L.Wolff,S.Shafer,and G.Healey,eds.,Physics-Based Vision:Princi-ples and Practice(Vol I:Radiometry).Boston,MA:Jones and Bartlett Publishers,Inc.,1992.†[2.2]G.Healey,S.Shafer,and L.Wolff,eds.,Physics-Based Vision:Prin-ciples and Practice(Vol II:Color).Boston,MA:Jones and Bartlett Publishers,Inc.,1992.†[2.3]L.Wolff,S.Shafer,and G.Healey,eds.,Physics-Based Vision:Prin-ciples and Practice(Vol III:Shape Recovery).Boston,MA:Jones and Bartlett Publishers,Inc.,1992.†[2.4]R.Kasturi and R.C.Jain,eds.,Computer Vision:Principles(Vol.1).Los Alamitos,CA:IEEE Computer Society Press,1991.[2.5]R.Kasturi and R.C.Jain,eds.,Computer Vision:Advances andApplications(Vol.2).Los Alamitos,CA:IEEE Computer Society Press,1991.[2.6]B.K.P.Horn and M.J.Brooks,eds.,Shape from Shading.Cambridge,MA:MIT Press,1989.[2.7]M.A.Fischler and O.Firschein,eds.,Readings in Computer Vision.Los Altos,CA:Morgan Kaufmann Publishers,Inc.,1987.†[2.8]M.Brady,ed.,Computer Vision.New York,NY:North-Holland,1981.†[2.9]A.R.Hanson and E.M.Riseman,eds.,Computer Vision Systems.New York,NY:Academic Press,1978.[2.10]P.H.Winston,ed.,The Psychology of Computer Vision.New York,NY:McGraw-Hill,1975.†Handout23 3Survey papersOccasionally,survey papers are published.They circumscribe thefield from the particular perspective of the authors.In recent years,computer vision has evolved to the point where no single survey paper can adequately cover the breadth of topics required.The recently published Encyclopedia of AI[3.1] contains a number of entries related to vision,some of which are further referenced here.Again,the list is given in reverse chronological order. 
[3.1] S. C. Shapiro, ed., Encyclopedia of Artificial Intelligence (2nd Edition). New York, NY: John Wiley & Sons, 1992.
[3.2] J. Aloimonos and A. Rosenfeld, “Visual recovery,” in [3.1], pp. 1664–1687, 1992.†
[3.3] T. Pavlidis, “Image analysis,” Annual Review of Computer Science, vol. 3, pp. 121–146, 1988.†
[3.4] E. C. Hildreth and J. M. Hollerbach, “Artificial intelligence: computational approach to vision and motor control,” in Handbook of Physiology – The Nervous System V (F. Plum, ed.), pp. 605–642, American Physiological Society, 1985.†
[3.5] M. Brady, “Artificial intelligence and robotics,” Artificial Intelligence, vol. 26, pp. 79–121, 1985.†
[3.6] T. O. Binford, “Survey of model-based image analysis systems,” International Journal of Robotics Research, vol. 1, no. 1, pp. 18–64, 1982. (Reprinted in [2.4]).†
[3.7] M. Brady, “Computational approaches to image understanding,” ACM Computing Surveys, vol. 14, pp. 3–72, 1982.†
[3.8] H. G. Barrow and J. M. Tenenbaum, “Computational vision,” Proceedings of the IEEE, vol. 69, pp. 572–595, 1981.†

4 Binary image processing

Material on binary image processing has been deleted from CPSC 505. The material deleted is covered in Chapters 3–4 of the text [1.1]. The references retained here expand on the material covered in the text, as traditionally was done in CPSC 505. The focus is on the issues of connectivity and parallelism in (binary) image processing. Reference [4.5] is an example of early work done at Hitachi. It is foundational to modern work based on morphology. Reference [4.6] is the basis of a Toshiba product that first successfully read handwritten mail zip codes in the early 1970’s. Minsky and Papert’s Perceptrons was first published in 1969. It was republished in 1988 as an “expanded edition” [4.8]. The expansion consists of a new chapter, “Epilogue: The New Connectionism.” This is recommended to enthusiasts of a connectionist (i.e., neural network) approach to image analysis.
[4.1] K. Preston Jr., “Feature extraction by Golay hexagonal pattern transformations,” IEEE Transactions on Computers, vol. 20, pp. 1007–1014, 1971.†
[4.2] S. B. Gray, “Local properties of binary images in two dimensions,” IEEE Transactions on Computers, vol. 20, pp. 551–561, 1971.†
[4.3] M. J. E. Golay, “Hexagonal parallel pattern transformations,” IEEE Transactions on Computers, vol. 18, pp. 733–740, 1969.†
[4.4] M. Ingram and K. Preston Jr., “Automatic analysis of blood cells,” Scientific American, vol. 223, pp. 72–82, November, 1970.†
[4.5] M. Ejiri, T. Uno, M. Mese, and S. Ikeda, “A process for detecting defects in complicated patterns,” Computer Graphics and Image Processing, vol. 2, pp. 326–339, 1973.†
[4.6] K. Mori, H. Genchi, S. Watanabe, and S. Katsuragi, “Microprogram controlled pattern processing in a handwritten mail reader-sorter,” Pattern Recognition, vol. 2, pp. 175–185, 1970.†
[4.7] A. Rosenfeld, “Connectivity in digital pictures,” Journal of the Association for Computing Machinery, vol. 17, pp. 146–160, 1970.†
[4.8] M. L. Minsky and S. A. Papert, Perceptrons, Expanded Edition. Cambridge, MA: MIT Press, 1988.

5 Fourier transform and sampling theory

Reference [5.1] is a useful survey and includes an extensive bibliography.
Reference [5.2] is a classic text that links Fourier analysis to optics. References [5.3] and [5.4] both are seminal papers dating from the 1940’s. Both also are very readable. References [5.5] and [5.6] have been used as the basis for problem sets (and exam questions).
[5.1] A. J. Jerri, “The Shannon sampling theorem – its various extensions and applications: a tutorial review,” Proceedings of the IEEE, vol. 65, pp. 1565–1596, 1977.†
[5.2] J. W. Goodman, Introduction to Fourier Optics. New York, NY: McGraw-Hill, 1968.
[5.3] C. E. Shannon, “Communication in the presence of noise,” Proc. IRE, vol. 37, pp. 10–21, 1949.†
[5.4] D. Gabor, “Theory of communication,” J. Institute of Electrical Engineers, vol. 93, pp. 429–457, 1946.†
[5.5] J. A. Parker, R. V. Kenyon, and D. E. Troxel, “Comparison of interpolating methods for image resampling,” IEEE Transactions on Medical Imaging, vol. 2, pp. 31–39, 1983.†
[5.6] S. Shlien, “Geometric correction, registration and resampling of Landsat imagery,” Canadian Journal of Remote Sensing, vol. 5, pp. 74–89, 1979.†

6 Edge detection

The Marr–Hildreth theory of edge detection is described in [1.2], [6.1] and [6.2]. (Both [6.1] and [6.2] are key references, the former concentrating on theory and the latter on implementation). Reference [6.3] is a survey written years later. The definitive reference for Canny edge detection is [6.4].
Deriche’s extensions are described in [6.7] and [6.8]. Torre and Poggio [6.5] provide a conceptual framework for edge detection that also serves as an important link to topics that follow. Finally, Blake and Zisserman [6.6] step outside the domain of linear operators to introduce their Graduated Non-Convexity (GNC) algorithm.
[6.1] D. Marr and E. C. Hildreth, “Theory of edge detection,” Proc. R. Soc. Lond. B, vol. 207, pp. 187–217, 1980. (Reprinted in [2.4]).†
[6.2] E. C. Hildreth, “The detection of intensity changes by computer and biological vision systems,” Computer Vision Graphics and Image Processing, vol. 22, pp. 1–27, 1983.†
[6.3] E. Hildreth, “Edge detection and local feature detection,” in [3.1], pp. 422–434, 1992.†
[6.4] J. F. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679–698, 1986. (Reprinted in [2.4], [2.7]).†
[6.5] V. Torre and T. A. Poggio, “On edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 147–163, 1986.†
[6.6] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: MIT Press, 1987.†
[6.7] R. Deriche, “Using Canny’s criteria to derive a recursively implemented optimal edge detector,” International Journal of Computer Vision, vol. 1, pp. 167–187, 1987.†
[6.8] R. Deriche, “Fast algorithms for low-level vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 78–87, 1990.†

7 Shape from shading and photometric stereo

Twice, Horn has published new work on shape from shading shortly after publishing a book. Reference [7.3] followed [2.6] and reference [7.5] (included in [2.6]) followed [1.1].
[7.1] R. J. Woodham, “Gradient and curvature from the photometric stereo method including local confidence estimation,” Journal of the Optical Society of America, A, (in press), 1994.†
[7.2] S. K. Nayar, K. Ikeuchi, and T. Kanade, “Shape from interreflections,” International Journal of Computer Vision, vol. 6, no. 3, pp. 173–195, 1991. (Reprinted in [2.3]).†
[7.3] B. K. P. Horn, “Height and gradient from shading,” International Journal of Computer Vision, vol. 5, pp. 37–75, 1990. (Reprinted in [2.3]).†
[7.4] R. T. Frankot and R. Chellappa, “A method for enforcing integrability in shape from shading algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 439–451, 1988. (Reprinted in [2.6]).†
[7.5] B. K. P. Horn and M. J. Brooks, “The variational approach to shape from shading,” Computer Vision Graphics and Image Processing, vol. 33, pp. 174–208, 1986. (Reprinted in [2.6]).†
[7.6] K. Ikeuchi and B. K. P. Horn, “Numerical shape from shading and occluding boundaries,” Artificial Intelligence, vol. 17, pp. 141–184, 1981. (Reprinted in [2.6], [2.8]).†
[7.7] R. J. Woodham, “Analysing images of curved surfaces,” Artificial Intelligence, vol. 17, pp. 117–140, 1981. (Reprinted in [2.8]).†
[7.8] R. J. Woodham, “Photometric method for determining surface orientation from multiple images,” Optical Engineering, vol. 19, pp. 139–144, 1980. (Reprinted in [2.3], [2.4], [2.6]).†
[7.9] B. K. P. Horn and R. W. Sjoberg, “Calculating the reflectance map,” Applied Optics, vol. 18, pp. 1770–1779, 1979. (Reprinted in [2.1], [2.6]).†
[7.10] B. K. P. Horn, “Understanding image intensities,” Artificial Intelligence, vol. 8, pp. 201–231, 1977. (Reprinted in [2.1], [2.7]).†
[7.11] B. K. P. Horn, “Obtaining shape from shading information,” in [2.10], pp. 115–155, 1975. (Reprinted in [2.6]).†

8 Optical flow and the 2-D motion field

Motion analysis is surveyed in [8.1]. Horn and Schunck [8.2] is the classic reference to optical flow. Subsequent work has both explored the basic problem formulation and suggested alternative ways to combine local evidence.
References [8.3], [8.4], [8.5] and [8.6] are important contributions. Other radiometric factors can be significant, as shown in [8.7]. The idea of using multiple light sources is explored in [8.8]. Hildreth [8.9] explored the motion of contours using a methodology that is similar to [8.2] and that can be seen, in hindsight, to be another example of regularization.
[8.1] T. S. Huang, “Visual motion analysis,” in [3.1], pp. 1638–1655, 1992.†
[8.2] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, pp. 185–203, 1981. (Reprinted in [2.3], [2.4], [2.8]).†
[8.3] J. K. Kearney, W. B. Thompson, and D. L. Boley, “Optical flow estimation: an error analysis of gradient-based methods with local optimization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 229–244, 1987.†
[8.4] B. G. Schunck, “Image flow: fundamentals and algorithms,” in Motion Understanding: Robot and Human Vision (W. N. Martin and J. K. Aggarwal, eds.), pp. 23–80, Boston, MA: Kluwer Academic Publishers, 1988.†
[8.5] A. Verri and T. Poggio, “Motion field and optical flow: qualitative properties,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 490–498, 1989. (Reprinted in [2.3], [2.5]).†
[8.6] A. Verri, F. Girosi, and V. Torre, “The mathematical properties of the two-dimensional motion field: from singular points to motion parameters,” Journal of the Optical Society of America, A, vol. 6, no. 5, pp. 698–712, 1989.†
[8.7] A. P. Pentland, “Photometric motion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 879–890, 1991. (Reprinted in [2.3]).†
[8.8] R. J. Woodham, “Multiple light source optical flow,” in Proc. 3rd International Conference on Computer Vision, (Osaka, Japan), pp. 42–46, 1990. (Reprinted in [2.3]).†
[8.9] E. C. Hildreth, The Measurement of Visual Motion. Cambridge, MA: MIT Press, 1984.

9 Mathematical Tools

The classic references for regularization, translated from the Soviet literature, are [9.1] and [9.2]. The most commonly cited text is [9.3]. Poggio was the first to note the connection between regularization and computational vision. Reference [9.4] documents the insight. Algorithmic support for regularization is less common. Stochastic processes seem related. References [9.5] and [9.7] are central. (Reference [9.7] is not for the easily discouraged. Indeed, [2.7] provides [9.6] as its own “guide to the reader”). Reference [9.8] is an example of recent work that establishes a connection to Markov Random Fields (MRF’s). There also is new interest in connections to “robust statistics,” as evidenced in [9.9] and [9.10].
[9.1] A. N. Tikhonov, “Regularization of incorrectly posed problems,” Soviet Math. Dokl., vol. 4, pp. 1624–1627, 1963.†
[9.2] A. N. Tikhonov, “Solution of incorrectly formulated problems and the regularization method,” Soviet Math. Dokl., vol. 4, pp. 1035–1038, 1963.†
[9.3] A. N. Tikhonov and V. Y. Arsenin, Solutions of ill-posed problems. Washington, DC: V. H. Winston & Sons, 1977.
[9.4] T. Poggio, V. Torre, and C. Koch, “Computational vision and regularization theory,” Nature, vol. 317, pp. 314–319, 1985. (Reprinted in [2.7]).†
[9.5] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671–680, 1983. (Reprinted in [2.7]).†
[9.6] G. B. Smith, “Preface to S. Geman and D. Geman: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” in [2.7], pp. 562–563, 1987.†
[9.7] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721–741, 1984. (Reprinted in [2.7]).†
[9.8] D. Geiger and F. Girosi, “Parallel and deterministic algorithms from MRF’s: surface reconstruction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 5, pp. 401–412, 1991.†
[9.9] F. Girosi, “Models of noise and robust estimation,” AI-Memo-1287, MIT AI Laboratory, Cambridge, MA, 1991.†
[9.10] M. J. Black and A. Rangarajan, “The outlier process: unifying line processes and robust statistics,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, (Seattle, WA), pp. 15–22, 1994.†
