Ajanthan_Iteratively_Reweighted_Graph_2015_CVPR_paper

合集下载

集成学习综述

集成学习综述梁英毅摘要机器学习方法在生产、科研和生活中有着广泛应用，而集成学习则是机器学习的首要热门方向[1]。

集成学习是使用一系列学习器进行学习，并使用某种规则把各个学习结果进行整合从而获得比单个学习器更好的学习效果的一种机器学习方法。

本文对集成学习的概念以及一些主要的集成学习方法进行简介，以便于进行进一步的研究。

一、引言机器学习是计算机科学中研究怎么让机器具有学习能力的分支，[2]把机器学习的目标归纳为“给出关于如何进行学习的严格的、计算上具体的、合理的说明”。

[3]指出四类问题的解决对于人类来说是困难的甚至不可能的，从而说明机器学习的必要性。

目前，机器学习方法已经在科学研究、语音识别、人脸识别、手写识别、数据挖掘、医疗诊断、游戏等等领域之中得到应用[1, 4]。

随着机器学习方法的普及，机器学习方面的研究也越来越热门，目前来说机器学习的研究主要分为四个大方向[1]： a) 通过集成学习方法提高学习精度；b) 扩大学习规模；c) 强化学习；d) 学习复杂的随机模型；有关Machine Learning 的进一步介绍请参考[5, 1,3, 4, 6]。

本文的目的是对集成学习的各种方法进行综述，以了解当前集成学习方面的进展和问题。

本文以下内容组织如下：第二节首先介绍集成学习；第三节对一些常见的集成学习方法进行简单介绍；第四节给出一些关于集成学习的分析方法和分析结果。

二、集成学习简介1、分类问题分类问题属于概念学习的范畴。

分类问题是集成学习的基本研究问题，简单来说就是把一系列实例根据某种规则进行分类，这实际上是要寻找某个函数)(x f y =，使得对于一个给定的实例x ，找出正确的分类。

机器学习中的解决思路是通过某种学习方法在假设空间中找出一个足够好的函数来近似，这个近似函数就叫做分类器[7]。

y h f h2、什么是集成学习传统的机器学习方法是在一个由各种可能的函数构成的空间（称为“假设空间”）中寻找一个最接近实际分类函数的分类器h [6]。

GSPBOX_-Atoolboxforsignalprocessingongraphs_

GSPBOX_-Atoolboxforsignalprocessingongraphs_GSPBOX:A toolbox for signal processing on graphsNathanael Perraudin,Johan Paratte,David Shuman,Lionel Martin Vassilis Kalofolias,Pierre Vandergheynst and David K.HammondMarch 16,2016AbstractThis document introduces the Graph Signal Processing Toolbox (GSPBox)a framework that can be used to tackle graph related problems with a signal processing approach.It explains the structure and the organization of this software.It also contains a general description of the important modules.1Toolbox organizationIn this document,we brie?y describe the different modules available in the toolbox.For each of them,the main functions are brie?y described.This chapter should help making the connection between the theoretical concepts introduced in [7,9,6]and the technical documentation provided with the toolbox.We highly recommend to read this document and the tutorial before using the toolbox.The documentation,the tutorials and other resources are available on-line 1.The toolbox has ?rst been implemented in MATLAB but a port to Python,called the PyGSP,has been made recently.As of the time of writing of this document,not all the functionalities have been ported to Python,but the main modules are already available.In the following,functions pre?xed by [M]:refer to the MATLAB implementation and the ones pre?xed with [P]:refer to the Python implementation. 1.1General structure of the toolbox (MATLAB)The general design of the GSPBox focuses around the graph object [7],a MATLAB structure containing the necessary infor-mations to use most of the algorithms.By default,only a few attributes are available (see section 2),allowing only the use of a subset of functions.In order to enable the use of more algorithms,additional ?elds can be added to the graph structure.For example,the following line will compute the graph Fourier basis enabling exact ?ltering operations.1G =gsp_compute_fourier_basis(G);Ideally,this operation should be done on the ?y when exact ?ltering is required.Unfortunately,the lack of well de?ned class paradigm in MATLAB makes it too complicated to be implemented.Luckily,the above formulation prevents any unnecessary data copy of the data contained in the structure G .In order to avoid name con?icts,all functions in the GSPBox start with [M]:gsp_.A second important convention is that all functions applying a graph algorithm on a graph signal takes the graph as ?rst argument.For example,the graph Fourier transform of the vector f is computed by1fhat =gsp_gft(G,f);1Seehttps://lts2.epfl.ch/gsp/doc/for MATLAB and https://lts2.epfl.ch/pygsp for Python.The full documentation is also avail-able in a single document:https://lts2.epfl.ch/gsp/gspbox.pdf1a r X i v :1408.5781v 2 [c s .I T ] 15 M a r 2016The graph operators are described in section4.Filtering a signal on a graph is also a linear operation.However,since the design of special?lters(kernels)is important,they are regrouped in a dedicated module(see section5).The toolbox contains two additional important modules.The optimization module contains proximal operators,projections and solvers compatible with the UNLocBoX[5](see section6).These functions facilitate the de?nition of convex optimization problems using graphs.Finally,section??is composed of well known graph machine learning algorithms.1.2General structure of the toolbox(Python)The structure of the Python toolbox follows closely the MATLAB one.The major difference comes from the fact that the Python implementation is object-oriented and thus allows for a natural use of instances of the graph object.For example the equivalent of the MATLAB call:1G=gsp_estimate_lmax(G);can be achieved using a simple method call on the graph object:1G.estimate_lmax()Moreover,the use of class for the"graph object"allows to compute additional graph attributes on the?y,making the code clearer as its MATLAB equivalent.Note though that functionalities are grouped into different modules(one per section below) and that several functions that work on graphs have to be called directly from the modules.For example,one should write:1layers=pygsp.operators.kron_pyramid(G,levels)This is the case as soon as the graph is the structure on which the action has to be performed and not our principal focus.In a similar way to the MATLAB implementation using the UNLocBoX for the convex optimization routines,the Python implementation uses the PyUNLocBoX,which is the Python port of the UNLocBoX. 2GraphsThe GSPBox is constructed around one main object:the graph.It is implemented as a structure in Matlab and as a class in Python.It stores the nodes,the edges and other attributes related to the graph.In the implementation,a graph is fully de?ned by the weight matrix W,which is the main and only required attribute.Since most graph structures are far from fully connected, W is implemented as a sparse matrix.From the weight matrix a Laplacian matrix is computed and stored as an attribute of the graph object.Different other attributes are available such as plotting attributes,vertex coordinates,the degree matrix,the number of vertices and edges.The list of all attributes is given in table1.2Attribute Format Data type DescriptionMandatory?eldsW N x N sparse matrix double Weight matrix WL N x N sparse matrix double Laplacian matrixd N x1vector double The diagonal of the degree matrixN scalar integer Number of verticesNe scalar integer Number of edgesplotting[M]:structure[P]:dict none Plotting parameterstype text string Name,type or short descriptiondirected scalar[M]:logical[P]:boolean State if the graph is directed or notlap_type text string Laplacian typeOptional?eldsA N x N sparse matrix[M]:logical[P]:boolean Adjacency matrixcoords N x2or N x3matrix double Vectors of coordinates in2D or3D.lmax scalar double Exact or estimated maximum eigenvalue U N x N matrix double Matrix of eigenvectorse N x1vector double Vector of eigenvaluesmu scalar double Graph coherenceTable1:Attributes of the graph objectThe easiest way to create a graph is the[M]:gsp_graph[P]:pygsp.graphs.Graph function which takes the weight matrix as input.This function initializes a graph structure by creating the graph Laplacian and other useful attributes.Note that by default the toolbox uses the combinatorial de?nition of the Laplacian operator.Other Laplacians can be computed using the[M]:gsp_create_laplacian[P]:pygsp.gutils.create_laplacian function.Please note that almost all functions are dependent of the Laplacian de?nition.As a result,it is important to select the correct de?nition at? rst.Many particular graphs are also available using helper functions such as:ring,path,comet,swiss roll,airfoil or two moons. In addition,functions are provided for usual non-deterministic graphs suchas:Erdos-Renyi,community,Stochastic Block Model or sensor networks graphs.Nearest Neighbors(NN)graphs form a class which is used in many applications and can be constructed from a set of points (or point cloud)using the[M]:gsp_nn_graph[P]:pygsp.graphs.NNGraph function.The function is highly tunable and can handle very large sets of points using FLANN[3].Two particular cases of NN graphs have their dedicated helper functions:3D point clouds and image patch-graphs.An example of the former can be seen in thefunction[M]:gsp_bunny[P]:pygsp.graphs.Bunny.As for the second,a graph can be created from an image by connecting similar patches of pixels together.The function[M]:gsp_patch_graph creates this graph.Parameters allow the resulting graph to vary between local and non-local and to use different distance functions [12,4].A few examples of the graphs are displayed in Figure1.3PlottingAs in many other domains,visualization is very important in graph signal processing.The most basic operation is to visualize graphs.This can be achieved using a call to thefunction[M]:gsp_plot_graph[P]:pygsp.plotting.plot_graph. In order to be displayable,a graph needs to have2D(or3D)coordinates(which is a?eld of the graph object).Some graphs do not possess default coordinates(e.g.Erdos-Renyi).The toolbox also contains routines to plot signals living on graphs.The function dedicated to this task is[M]:gsp_plot_ signal[P]:pygsp.plotting.plot_signal.For now,only1D signals are supported.By default,the value of the signal is displayed using a color coding,but bars can be displayed by passing parameters.3Figure 1:Examples of classical graphs :two moons (top left),community (top right),airfoil (bottom left)and sensor network (bottom right).The third visualization helper is a function to plot ?lters (in the spectral domain)which is called [M]:gsp_plot_filter [P]:pygsp.plotting.plot_filter .It also supports ?lter-banks and allows to automatically inspect the related frames.The results obtained using these three plotting functions are visible in Fig.2.4OperatorsThe module operators contains basics spectral graph functions such as Fourier transform,localization,gradient,divergence or pyramid decomposition.Since all operator are based on the Laplacian de? nition,the necessary underlying objects (attributes)are all stored into a single object:the graph.As a ?rst example,the graph Fourier transform [M]:gsp_gft [P]:pygsp.operators.gft requires the Fourier basis.This attribute can be computed with the function [M]:gsp_compute_fourier_basis[P]:/doc/c09ff3e90342a8956bec0975f46527d3240ca692.html pute_fourier_basis [9]that adds the ?elds U ,e and lmax to the graph structure.As a second example,since the gradient and divergence operate on the edges of the graph,a search on the edge matrix is needed to enable the use of these operators.It can be done with the routines [M]:gsp_adj2vec[P]:pygsp.operators.adj2vec .These operations take time and should4Figure 2:Visualization of graph and signals using plotting functions.NameEdge derivativefe (i,j )Laplacian matrix (operator)Available Undirected graph Combinatorial LaplacianW (i,j )(f (j )?f (i ))D ?WV Normalized Laplacian W (i,j ) f (j )√d (j )f (i )√d (i )D ?12(D ?W )D ?12V Directed graph Combinatorial LaplacianW (i,j )(f (j )?f (i ))12(D ++D ??W ?W ?)V Degree normalized Laplacian W (i,j ) f (j )√d ?(j )?f (i )√d +(i )I ?12D ?12+[W +W ?]D ?12V Distribution normalized Laplacianπ(i ) p (i,j )π(j )f (j )? p (i,j )π(i )f (i )12 Π12PΠ12+Π?12P ?Π12 VTable 2:Different de?nitions for graph Laplacian operator and their associated edge derivative.(For directed graph,d +,D +and d ?,D ?de?ne the out degree and in-degree of a node.π,Πis the stationary distribution of the graph and P is a normalized weight matrix W .For sake of clarity,exact de?nition of those quantities are not given here,but can be found in [14].)be performed only once.In MATLAB,these functions are called explicitly by the user beforehand.However,in Python they are automatically called when needed and the result stored as an attribute. The module operator also includes a Multi-scale Pyramid Transform for graph signals [6].Again,it works in two steps.Firstthe pyramid is precomputed with [M]:gsp_graph_multiresolution [P]:pygsp.operators.graph_multiresolution .Second the decomposition of a signal is performed with [M]:gsp_pyramid_analysis [P]:pygsp.operators.pyramid_analysis .The reconstruction uses [M]:gsp_pyramid_synthesis [P]:pygsp.operators.pyramid_synthesis .The Laplacian is a special operator stored as a sparse matrix in the ?eld L of the graph.Table 2summarizes the available de?nitions.We are planning to implement additional ones.5FiltersFilters are a special kind of linear operators that are so prominent in the toolbox that they deserve their own module [9,7,2,8,2].A ?lter is simply an anonymous function (in MATLAB)or a lambda function (in Python)acting element-by-element on the input.In MATLAB,a ?lter-bank is created simply by gathering these functions together into a cell array.For example,you would write:51%g(x)=x^2+sin(x)2g=@(x)x.^2+sin(x);3%h(x)=exp(-x)4h=@(x)exp(-x);5%Filterbank composed of g and h6fb={g,h};The toolbox contains many prede?ned design of?lter.They all start with[M]:gsp_design_in MATLAB and are in the module[P]:pygsp.filters in Python.Once a?lter(or a?lter-bank)is created,it can be applied to a signal with[M]: gsp_filter_analysis in MATLAB and a call to the method[P]:analysis of the?lter object in Python.Note that the toolbox uses accelerated algorithms to scale almost linearly with the number of sample[11].The available type of?lter design of the GSPBox can be classi?ed as:Wavelets(Filters are scaled version of a mother window)Gabor(Filters are shifted version of a mother window)Low passlter(Filters to de-noise a signal)High pass/Low pass separationlterbank(tight frame of2lters to separate the high frequencies from the low ones.No energy is lost in the process)Additionally,to adapt the?lter to the graph eigen-distribution,the warping function[M]:gsp_design_warped_translates [P]:pygsp.filters.WarpedTranslates can be used[10].6UNLocBoX BindingThis module contains special wrappers for the UNLocBoX[5].It allows to solve convex problems containing graph terms very easily[13,15,14,1].For example,the proximal operator of the graph TV norm is given by[M]:gsp_prox_tv.The optimization module contains also some prede?ned problems such as graph basis pursuit in[M]:gsp_solve_l1or wavelet de-noising in[M]:gsp_wavelet_dn.There is still active work on this module so it is expected to grow rapidly in the future releases of the toolbox.7Toolbox conventions7.1General conventionsAs much as possible,all small letters are used for vectors(or vector stacked into a matrix)and capital are reserved for matrices.A notable exception is the creation of nearest neighbors graphs.A variable should never have the same name as an already existing function in MATLAB or Python respectively.This makes the code easier to read and less prone to errors.This is a best coding practice in general,but since both languages allow the override of built-in functions,a special care is needed.All function names should be lowercase.This avoids a lot of confusion because some computer architectures respect upper/lower casing and others do not.As much as possible,functions are named after the action they perform,rather than the algorithm they use,or the person who invented it.No global variables.Global variables makes it harder to debug and the code is harder to parallelize.67.2MATLABAll function start by gsp_.The graph structure is always therst argument in the function call.Filters are always second.Finally,optional parameter are last.In the toolbox,we do use any argument helper functions.As a result,optional argument are generally stacked into a graph structure named param.If a transform works on a matrix,it will per default work along the columns.This is a standard in Matlab(fft does this, among many other functions).Function names are traditionally written in uppercase in MATLAB documentation.7.3PythonAll functions should be part of a module,there should be no call directly from pygsp([P]:pygsp.my_function).Inside a given module,functionalities can be further split in differentles regrouping those that are used in the same context.MATLAB’s matrix operations are sometimes ported in a different way that preserves the efciency of the code.When matrix operations are necessary,they are all performed through the numpy and scipy libraries.Since Python does not come with a plotting library,we support both matplotlib and pyqtgraph.One should install the required libraries on his own.If both are correctly installed,then pyqtgraph is favoured unless speci?cally speci?ed. AcknowledgementsWe would like to thanks all coding authors of the GSPBOX.The toolbox was ported in Python by Basile Chatillon,Alexandre Lafaye and Nicolas Rod.The toolbox was also improved by Nauman Shahid and Yann Sch?nenberger.References[1]M.Belkin,P.Niyogi,and V.Sindhwani.Manifold regularization:A geometric framework for learning from labeled and unlabeledexamples.The Journal of Machine Learning Research,7:2399–2434,2006.[2] D.K.Hammond,P.Vandergheynst,and R.Gribonval.Wavelets on graphs via spectral graph theory.Applied and ComputationalHarmonic Analysis,30(2):129–150,2011.[3]M.Muja and D.G.Lowe.Scalable nearest neighbor algorithms for high dimensional data.Pattern Analysis and Machine Intelligence,IEEE Transactions on,36,2014.[4]S.K.Narang,Y.H.Chao,and A.Ortega.Graph-wavelet?lterbanks for edge-aware image processing.In Statistical Signal ProcessingWorkshop(SSP),2012IEEE,pages141–144.IEEE,2012.[5]N.Perraudin,D.Shuman,G.Puy,and P.Vandergheynst.UNLocBoX A matlab convex optimization toolbox using proximal splittingmethods.ArXiv e-prints,Feb.2014.[6] D.I.Shuman,M.J.Faraji,and P.Vandergheynst.A multiscale pyramid transform for graph signals.arXiv preprint arXiv:1308.4942,2013.[7] D.I.Shuman,S.K.Narang,P.Frossard,A.Ortega,and P.Vandergheynst.The emerging?eld of signal processing on graphs:Extendinghigh-dimensional data analysis to networks and other irregular domains.Signal Processing Magazine,IEEE,30(3):83–98,2013.7[8] D.I.Shuman,B.Ricaud,and P.Vandergheynst.A windowed graph Fourier transform.Statistical Signal Processing Workshop(SSP),2012IEEE,pages133–136,2012.[9] D.I.Shuman,B.Ricaud,and P.Vandergheynst.Vertex-frequency analysis on graphs.arXiv preprint arXiv:1307.5708,2013.[10] D.I.Shuman,C.Wiesmeyr,N.Holighaus,and P.Vandergheynst.Spectrum-adapted tight graph wavelet and vertex-frequency frames.arXiv preprint arXiv:1311.0897,2013.[11] A.Susnjara,N.Perraudin,D.Kressner,and P.Vandergheynst.Accelerated?ltering on graphs using lanczos method.arXiv preprintarXiv:1509.04537,2015.[12] F.Zhang and E.R.Hancock.Graph spectral image smoothing using the heat kernel.Pattern Recognition,41(11):3328–3342,2008.[13] D.Zhou,O.Bousquet,/doc/c09ff3e90342a8956bec0975f46527d3240ca692.html l,J.Weston,and B.Sch?lkopf.Learning with local and global consistency.Advances in neural informationprocessing systems,16(16):321–328,2004.[14] D.Zhou,J.Huang,and B.Sch?lkopf.Learning from labeled and unlabeled data on a directed graph.In the22nd international conference,pages1036–1043,New York,New York,USA,2005.ACM Press.[15] D.Zhou and B.Sch?lkopf.A regularization framework for learning from graph data.2004.8。

A Comprehensive Survey of Multiagent Reinforcement Learning

156
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 38, NO. 2, MARCH 2008
A Comprehensive Survey of Multiagent ReinfoN
A
MULTIAGENT system [1] can be deﬁned as a group of autonomous, interacting entities sharing a common environment, which they perceive with sensors and upon which they act with actuators [2]. Multiagent systems are ﬁnding applications in a wide variety of domains including robotic teams, distributed control, resource management, collaborative decision support systems, data mining, etc. [3], [4]. They may arise as the most natural way of looking at the system, or may provide an alternative perspective on systems that are originally regarded as centralized. For instance, in robotic teams, the control authority is naturally distributed among the robots [4]. In resource management, while resources can be managed by a central authority, identifying each resource with an agent may provide a helpful, distributed perspective on the system [5].

Gradient-based learning applied to document recognition

Gradient-Based Learning Appliedto Document RecognitionYANN LECUN,MEMBER,IEEE,L´EON BOTTOU,YOSHUA BENGIO,AND PATRICK HAFFNER Invited PaperMultilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique.Given an appropriate network architecture,gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns,such as handwritten characters,with minimal preprocessing.This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task.Convolutional neural networks,which are speciﬁcally designed to deal with the variability of two dimensional(2-D)shapes,are shown to outperform all other techniques.Real-life document recognition systems are composed of multiple modules includingﬁeld extraction,segmentation,recognition, and language modeling.A new learning paradigm,called graph transformer networks(GTN’s),allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure.Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training,and theﬂexibility of graph transformer networks.A graph transformer network for reading a bank check is also described.It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks.It is deployed commercially and reads several million checks per day. Keywords—Convolutional neural networks,document recog-nition,ﬁnite state transducers,gradient-based learning,graphtransformer networks,machine learning,neural networks,optical character recognition(OCR).N OMENCLATUREGT Graph transformer.GTN Graph transformer network.HMM Hidden Markov model.HOS Heuristic oversegmentation.K-NN K-nearest neighbor.Manuscript received November1,1997;revised April17,1998.Y.LeCun,L.Bottou,and P.Haffner are with the Speech and Image Processing Services Research Laboratory,AT&T Labs-Research,Red Bank,NJ07701USA.Y.Bengio is with the D´e partement d’Informatique et de Recherche Op´e rationelle,Universit´e de Montr´e al,Montr´e al,Qu´e bec H3C3J7Canada. Publisher Item Identiﬁer S0018-9219(98)07863-3.NN Neural network.OCR Optical character recognition.PCA Principal component analysis.RBF Radial basis function.RS-SVM Reduced-set support vector method. SDNN Space displacement neural network.SVM Support vector method.TDNN Time delay neural network.V-SVM Virtual support vector method.I.I NTRODUCTIONOver the last several years,machine learning techniques, particularly when applied to NN’s,have played an increas-ingly important role in the design of pattern recognition systems.In fact,it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition. The main message of this paper is that better pattern recognition systems can be built by relying more on auto-matic learning and less on hand-designed heuristics.This is made possible by recent progress in machine learning and computer ing character recognition as a case study,we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel ing document understanding as a case study,we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a uniﬁed and well-principled design paradigm,called GTN’s,which allows training all the modules to optimize a global performance criterion.Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech,glyphs,or other types of patterns,make it almost impossible to build an accurate recognition system entirely by hand.Consequently,most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms.The usual method0018–9219/98$10.00©1998IEEE2278PROCEEDINGS OF THE IEEE,VOL.86,NO.11,NOVEMBER1998Fig.1.Traditional pattern recognition is performed with two modules:aﬁxed feature extractor and a trainable classiﬁer.of recognizing individual patterns consists in dividing the system into two main modules shown in Fig.1.Theﬁrst module,called the feature extractor,transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that:1)can be easily matched or compared and2)are relatively invariant with respect to transformations and distortions of the input pat-terns that do not change their nature.The feature extractor contains most of the prior knowledge and is rather speciﬁc to the task.It is also the focus of most of the design effort, because it is often entirely hand crafted.The classiﬁer, on the other hand,is often general purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which,unfortunately, must be redone for each new problem.A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.Historically,the need for appropriate feature extractors was due to the fact that the learning techniques used by the classiﬁers were limited to low-dimensional spaces with easily separable classes[1].A combination of three factors has changed this vision over the last decade.First, the availability of low-cost machines with fast arithmetic units allows for reliance on more brute-force“numerical”methods than on algorithmic reﬁnements.Second,the avail-ability of large databases for problems with a large market and wide interest,such as handwriting recognition,has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems. The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets.It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets.As evidence of this fact,a large proportion of modern commercial OCR systems use some form of multilayer NN trained with back propagation.In this study,we consider the tasks of handwritten character recognition(Sections I and II)and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition(Section III). While more automatic learning is beneﬁcial,no learning technique can succeed without a minimal amount of prior knowledge about the task.In the case of multilayer NN’s, a good way to incorporate knowledge is to tailor its archi-tecture to the task.Convolutional NN’s[2],introduced in Section II,are an example of specialized NN architectures which incorporate knowledge about the invariances of two-dimensional(2-D)shapes by using local connection patterns and by imposing constraints on the weights.A comparison of several methods for isolated handwritten digit recogni-tion is presented in Section III.To go from the recognition of individual characters to the recognition of words and sentences in documents,the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV.Recognizing variable-length objects such as handwritten words using multimodule systems is best done if the modules manipulate directed graphs.This leads to the concept of trainable GTN,also introduced in Section IV. Section V describes the now classical method of HOS for recognizing words or other character strings.Discriminative and nondiscriminative gradient-based techniques for train-ing a recognizer at the word level without requiring manual segmentation and labeling are presented in Section VI. Section VII presents the promising space-displacement NN approach that eliminates the need for segmentation heuris-tics by scanning a recognizer at all possible locations on the input.In Section VIII,it is shown that trainable GTN’s can be formulated as multiple generalized transductions based on a general graph composition algorithm.The connections between GTN’s and HMM’s,commonly used in speech recognition,is also treated.Section IX describes a globally trained GTN system for recognizing handwriting entered in a pen computer.This problem is known as “online”handwriting recognition since the machine must produce immediate feedback as the user writes.The core of the system is a convolutional NN.The results clearly demonstrate the advantages of training a recognizer at the word level,rather than training it on presegmented, hand-labeled,isolated characters.Section X describes a complete GTN-based system for reading handwritten and machine-printed bank checks.The core of the system is the convolutional NN called LeNet-5,which is described in Section II.This system is in commercial use in the NCR Corporation line of check recognition systems for the banking industry.It is reading millions of checks per month in several banks across the United States.A.Learning from DataThere are several approaches to automatic machine learn-ing,but one of the most successful approaches,popularized in recent years by the NN community,can be called“nu-merical”or gradient-based learning.The learning machine computes afunction th input pattern,andtheoutputthatminimizesand the error rate on the trainingset decreases with the number of training samplesapproximatelyasis the number of trainingsamples,is a number between0.5and1.0,andincreases,decreases.Therefore,when increasing thecapacitythat achieves the lowest generalizationerror Mostlearning algorithms attempt tominimize as well assome estimate of the gap.A formal version of this is calledstructural risk minimization[6],[7],and it is based on deﬁn-ing a sequence of learning machines of increasing capacity,corresponding to a sequence of subsets of the parameterspace such that each subset is a superset of the previoussubset.In practical terms,structural risk minimization isimplemented byminimizingisaconstant.that belong to high-capacity subsets ofthe parameter space.Minimizingis a real-valuedvector,with respect towhichis iteratively adjusted asfollows:is updated on the basis of a singlesampleof several layers of processing,i.e.,the back-propagation algorithm.The third event was the demonstration that the back-propagation procedure applied to multilayer NN’s with sigmoidal units can solve complicated learning tasks. The basic idea of back propagation is that gradients can be computed efﬁciently by propagation from the output to the input.This idea was described in the control theory literature of the early1960’s[16],but its application to ma-chine learning was not generally realized then.Interestingly, the early derivations of back propagation in the context of NN learning did not use gradients but“virtual targets”for units in intermediate layers[17],[18],or minimal disturbance arguments[19].The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back propagation[20]and for deriving generalizations of back propagation to recurrent networks[21]and networks of heterogeneous modules[22].A simple derivation for generic multilayer systems is given in Section I-E.The fact that local minima do not seem to be a problem for multilayer NN’s is somewhat of a theoretical mystery. It is conjectured that if the network is oversized for the task(as is usually the case in practice),the presence of “extra dimensions”in parameter space reduces the risk of unattainable regions.Back propagation is by far the most widely used neural-network learning algorithm,and probably the most widely used learning algorithm of any form.D.Learning in Real Handwriting Recognition Systems Isolated handwritten character recognition has been ex-tensively studied in the literature(see[23]and[24]for reviews),and it was one of the early successful applications of NN’s[25].Comparative experiments on recognition of individual handwritten digits are reported in Section III. They show that NN’s trained with gradient-based learning perform better than all other methods tested here on the same data.The best NN’s,called convolutional networks, are designed to learn to extract relevant features directly from pixel images(see Section II).One of the most difﬁcult problems in handwriting recog-nition,however,is not only to recognize individual charac-ters,but also to separate out characters from their neighbors within the word or sentence,a process known as seg-mentation.The technique for doing this that has become the“standard”is called HOS.It consists of generating a large number of potential cuts between characters using heuristic image processing techniques,and subsequently selecting the best combination of cuts based on scores given for each candidate character by the recognizer.In such a model,the accuracy of the system depends upon the quality of the cuts generated by the heuristics,and on the ability of the recognizer to distinguish correctly segmented characters from pieces of characters,multiple characters, or otherwise incorrectly segmented characters.Training a recognizer to perform this task poses a major challenge because of the difﬁculty in creating a labeled database of incorrectly segmented characters.The simplest solution consists of running the images of character strings through the segmenter and then manually labeling all the character hypotheses.Unfortunately,not only is this an extremely tedious and costly task,it is also difﬁcult to do the labeling consistently.For example,should the right half of a cut-up four be labeled as a one or as a noncharacter?Should the right half of a cut-up eight be labeled as a three?Theﬁrst solution,described in Section V,consists of training the system at the level of whole strings of char-acters rather than at the character level.The notion of gradient-based learning can be used for this purpose.The system is trained to minimize an overall loss function which measures the probability of an erroneous answer.Section V explores various ways to ensure that the loss function is differentiable and therefore lends itself to the use of gradient-based learning methods.Section V introduces the use of directed acyclic graphs whose arcs carry numerical information as a way to represent the alternative hypotheses and introduces the idea of GTN.The second solution,described in Section VII,is to eliminate segmentation altogether.The idea is to sweep the recognizer over every possible location on the input image,and to rely on the“character spotting”property of the recognizer,i.e.,its ability to correctly recognize a well-centered character in its inputﬁeld,even in the presence of other characters besides it,while rejecting images containing no centered characters[26],[27].The sequence of recognizer outputs obtained by sweeping the recognizer over the input is then fed to a GTN that takes linguistic constraints into account andﬁnally extracts the most likely interpretation.This GTN is somewhat similar to HMM’s,which makes the approach reminiscent of the classical speech recognition[28],[29].While this technique would be quite expensive in the general case,the use of convolutional NN’s makes it particularly attractive because it allows signiﬁcant savings in computational cost.E.Globally Trainable SystemsAs stated earlier,most practical pattern recognition sys-tems are composed of multiple modules.For example,a document recognition system is composed of aﬁeld loca-tor(which extracts regions of interest),aﬁeld segmenter (which cuts the input image into images of candidate characters),a recognizer(which classiﬁes and scores each candidate character),and a contextual postprocessor,gen-erally based on a stochastic grammar(which selects the best grammatically correct answer from the hypotheses generated by the recognizer).In most cases,the information carried from module to module is best represented as graphs with numerical information attached to the arcs. For example,the output of the recognizer module can be represented as an acyclic graph where each arc contains the label and the score of a candidate character,and where each path represents an alternative interpretation of the input string.Typically,each module is manually optimized,or sometimes trained,outside of its context.For example,the character recognizer would be trained on labeled images of presegmented characters.Then the complete system isLECUN et al.:GRADIENT-BASED LEARNING APPLIED TO DOCUMENT RECOGNITION2281assembled,and a subset of the parameters of the modules is manually adjusted to maximize the overall performance. This last step is extremely tedious,time consuming,and almost certainly suboptimal.A better alternative would be to somehow train the entire system so as to minimize a global error measure such as the probability of character misclassiﬁcations at the document level.Ideally,we would want toﬁnd a good minimum of this global loss function with respect to all theparameters in the system.If the loss functionusing gradient-based learning.However,at ﬁrst glance,it appears that the sheer size and complexity of the system would make this intractable.To ensure that the global loss functionwithrespect towith respect toFig.2.Architecture of LeNet-5,a convolutional NN,here used for digits recognition.Each plane is a feature map,i.e.,a set of units whose weights are constrained to be identical.or other2-D or one-dimensional(1-D)signals,must be approximately size normalized and centered in the input ﬁeld.Unfortunately,no such preprocessing can be perfect: handwriting is often normalized at the word level,which can cause size,slant,and position variations for individual characters.This,combined with variability in writing style, will cause variations in the position of distinctive features in input objects.In principle,a fully connected network of sufﬁcient size could learn to produce outputs that are invari-ant with respect to such variations.However,learning such a task would probably result in multiple units with similar weight patterns positioned at various locations in the input so as to detect distinctive features wherever they appear on the input.Learning these weight conﬁgurations requires a very large number of training instances to cover the space of possible variations.In convolutional networks,as described below,shift invariance is automatically obtained by forcing the replication of weight conﬁgurations across space. Secondly,a deﬁciency of fully connected architectures is that the topology of the input is entirely ignored.The input variables can be presented in any(ﬁxed)order without af-fecting the outcome of the training.On the contrary,images (or time-frequency representations of speech)have a strong 2-D local structure:variables(or pixels)that are spatially or temporally nearby are highly correlated.Local correlations are the reasons for the well-known advantages of extracting and combining local features before recognizing spatial or temporal objects,because conﬁgurations of neighboring variables can be classiﬁed into a small number of categories (e.g.,edges,corners,etc.).Convolutional networks force the extraction of local features by restricting the receptive ﬁelds of hidden units to be local.A.Convolutional NetworksConvolutional networks combine three architectural ideas to ensure some degree of shift,scale,and distortion in-variance:1)local receptiveﬁelds;2)shared weights(or weight replication);and3)spatial or temporal subsampling.A typical convolutional network for recognizing characters, dubbed LeNet-5,is shown in Fig.2.The input plane receives images of characters that are approximately size normalized and centered.Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer.The idea of connecting units to local receptiveﬁelds on the input goes back to the perceptron in the early1960’s,and it was almost simultaneous with Hubel and Wiesel’s discovery of locally sensitive,orientation-selective neurons in the cat’s visual system[30].Local connections have been used many times in neural models of visual learning[2],[18],[31]–[34].With local receptive ﬁelds neurons can extract elementary visual features such as oriented edges,endpoints,corners(or similar features in other signals such as speech spectrograms).These features are then combined by the subsequent layers in order to detect higher order features.As stated earlier,distortions or shifts of the input can cause the position of salient features to vary.In addition,elementary feature detectors that are useful on one part of the image are likely to be useful across the entire image.This knowledge can be applied by forcing a set of units,whose receptiveﬁelds are located at different places on the image,to have identical weight vectors[15], [32],[34].Units in a layer are organized in planes within which all the units share the same set of weights.The set of outputs of the units in such a plane is called a feature map. Units in a feature map are all constrained to perform the same operation on different parts of the image.A complete convolutional layer is composed of several feature maps (with different weight vectors),so that multiple features can be extracted at each location.A concrete example of this is theﬁrst layer of LeNet-5shown in Fig.2.Units in theﬁrst hidden layer of LeNet-5are organized in six planes,each of which is a feature map.A unit in a feature map has25inputs connected to a5case of LeNet-5,at each input location six different types of features are extracted by six units in identical locations in the six feature maps.A sequential implementation of a feature map would scan the input image with a single unit that has a local receptive ﬁeld and store the states of this unit at corresponding locations in the feature map.This operation is equivalent to a convolution,followed by an additive bias and squashing function,hence the name convolutional network.The kernel of the convolution is theOnce a feature has been detected,its exact location becomes less important.Only its approximate position relative to other features is relevant.For example,once we know that the input image contains the endpoint of a roughly horizontal segment in the upper left area,a corner in the upper right area,and the endpoint of a roughly vertical segment in the lower portion of the image,we can tell the input image is a seven.Not only is the precise position of each of those features irrelevant for identifying the pattern,it is potentially harmful because the positions are likely to vary for different instances of the character.A simple way to reduce the precision with which the position of distinctive features are encoded in a feature map is to reduce the spatial resolution of the feature map.This can be achieved with a so-called subsampling layer,which performs a local averaging and a subsampling,thereby reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions.The second hidden layer of LeNet-5is a subsampling layer.This layer comprises six feature maps,one for each feature map in the previous layer.The receptive ﬁeld of each unit is a 232p i x e l i m a g e .T h i s i s s i g n i ﬁc a n tt h e l a r g e s t c h a r a c t e r i n t h e d a t a b a s e (a t28ﬁe l d ).T h e r e a s o n i s t h a t i t it h a t p o t e n t i a l d i s t i n c t i v e f e a t u r e s s u c h o r c o r n e r c a n a p p e a r i n t h e c e n t e r o f t h o f t h e h i g h e s t l e v e l f e a t u r e d e t e c t o r s .o f c e n t e r s o f t h e r e c e p t i v e ﬁe l d s o f t h e l a y e r (C 3,s e e b e l o w )f o r m a 2032i n p u t .T h e v a l u e s o f t h e i n p u t p i x e l s o t h a t t h e b a c k g r o u n d l e v e l (w h i t e )c o ro fa n d t h e f o r e g r o u n d (b l ac k )c o r r e s p T h i s m a k e s t h e m e a n i n p u t r o u g h l y z e r o r o u g h l y o n e ,w h i c h a c c e l e r a t e s l e a r n i n g I n t h e f o l l o w i n g ,c o n v o l u t i o n a l l a y e r s u b s a m p l i n g l a y e r s a r e l a b e l ed S x ,a n d l a ye r s a r e l a b e l e d F x ,w h e r e x i s t h e l a y L a y e r C 1i s a c o n v o l u t i o n a l l a y e r w i t h E a c h u n i t i n e a c hf e a t u r e m a p i s c o n n e c t28w h i c h p r e v e n t s c o n n e c t i o n f r o m t h e i n p t h e b o u n d a r y .C 1c o n t a i n s 156t r a i n a b l 122304c o n n e c t i o n s .L a y e r S 2i s a s u b s a m p l i n g l a y e r w i t h s i s i z e 142n e i g h b o r h o o d i n t h e c o r r e s p o n d i n g f T h e f o u r i n p u t s t o a u n i t i n S 2a r e a d d e d ,2284P R O C E E D I N G S O F T H E I E E E ,V O L .86,N O .11,N O VTable 1Each Column Indicates Which Feature Map in S2Are Combined by the Units in a Particular Feature Map ofC3a trainable coefﬁcient,and then added to a trainable bias.The result is passed through a sigmoidal function.The25neighborhoods at identical locations in a subset of S2’s feature maps.Table 1shows the set of S2feature maps combined by each C3feature map.Why not connect every S2feature map to every C3feature map?The reason is twofold.First,a noncomplete connection scheme keeps the number of connections within reasonable bounds.More importantly,it forces a break of symmetry in the network.Different feature maps are forced to extract dif-ferent (hopefully complementary)features because they get different sets of inputs.The rationale behind the connection scheme in Table 1is the following.The ﬁrst six C3feature maps take inputs from every contiguous subsets of three feature maps in S2.The next six take input from every contiguous subset of four.The next three take input from some discontinuous subsets of four.Finally,the last one takes input from all S2feature yer C3has 1516trainable parameters and 156000connections.Layer S4is a subsampling layer with 16feature maps of size52neighborhood in the corresponding feature map in C3,in a similar way as C1and yer S4has 32trainable parameters and 2000connections.Layer C5is a convolutional layer with 120feature maps.Each unit is connected to a55,the size of C5’s feature maps is11.This process of dynamically increasing thesize of a convolutional network is described in Section yer C5has 48120trainable connections.Layer F6contains 84units (the reason for this number comes from the design of the output layer,explained below)and is fully connected to C5.It has 10164trainable parameters.As in classical NN’s,units in layers up to F6compute a dot product between their input vector and their weight vector,to which a bias is added.This weighted sum,denotedforunit (6)wheredeterminesits slope at the origin.Thefunctionis chosen to be1.7159.The rationale for this choice of a squashing function is given in Appendix A.Finally,the output layer is composed of Euclidean RBF units,one for each class,with 84inputs each.The outputs of each RBFunit(7)In other words,each output RBF unit computes the Eu-clidean distance between its input vector and its parameter vector.The further away the input is from the parameter vector,the larger the RBF output.The output of a particular RBF can be interpreted as a penalty term measuring the ﬁt between the input pattern and a model of the class associated with the RBF.In probabilistic terms,the RBF output can be interpreted as the unnormalized negative log-likelihood of a Gaussian distribution in the space of conﬁgurations of layer F6.Given an input pattern,the loss function should be designed so as to get the conﬁguration of F6as close as possible to the parameter vector of the RBF that corresponds to the pattern’s desired class.The parameter vectors of these units were chosen by hand and kept ﬁxed (at least initially).The components of thoseparameters vectors were set to1.While they could have been chosen at random with equal probabilities for1,or even chosen to form an error correctingcode as suggested by [47],they were instead designed to represent a stylized image of the corresponding character class drawn on a7。

Journal of Loss Prevention in the Process Industries

HAZOP e Local approach in the Mexican oil &gas industryM.Pérez-Marín a ,M.A.Rodríguez-Toral b ,*aInstituto Mexicano del Petróleo,Dirección de Seguridad y Medio Ambiente,Eje Central Lázaro Cárdenas Norte No.152,07730México,D.F.,Mexicob PEMEX,Dirección Corporativa de Operaciones,Gerencia de Análisis de Inversiones,Torre Ejecutiva,Piso 12,Av.Marina Nacional No.329,11311México,D.F.,Mexicoa r t i c l e i n f oArticle history:Received 3September 2012Received in revised form 26March 2013Accepted 27March 2013Keywords:HAZOPRisk acceptance criteria Oil &gasa b s t r a c tHAZOP (Hazard and Operability)studies began about 40years ago,when the Process Industry and complexity of its operations start to massively grow in different parts of the world.HAZOP has been successfully applied in Process Systems hazard identi ﬁcation by operators,design engineers and consulting ﬁrms.Nevertheless,after a few decades since its ﬁrst applications,HAZOP studies are not truly standard in worldwide industrial practice.It is common to ﬁnd differences in its execution and results format.The aim of this paper is to show that in the Mexican case at National level in the oil and gas industry,there exist an explicit acceptance risk criteria,thus impacting the risk scenarios prioritizing process.Although HAZOP studies in the Mexican oil &gas industry,based on PEMEX corporate standard has precise acceptance criteria,it is not a signi ﬁcant difference in HAZOP applied elsewhere,but has the advantage of being fully transparent in terms of what a local industry is willing to accept as the level of risk acceptance criteria,also helps to gain an understanding of the degree of HAZOP applications in the Mexican oil &gas sector.Contrary to this in HAZOP ISO standard,risk acceptance criteria is not speci ﬁed and it only mentions that HAZOP can consider scenarios ranking.The paper concludes indicating major implications of risk ranking in HAZOP,whether before or after safeguards identi ﬁcation.Ó2013Elsevier Ltd.All rights reserved.1.IntroductionHAZOP (Hazard and Operability)studies appeared in systematic way about 40years ago (Lawley,1974)where a multidisciplinary group uses keywords on Process variables to ﬁnd potential hazards and operability troubles (Mannan,2012,pp.8-31).The basic prin-ciple is to have a full process description and to ask in each node what deviations to the design purpose can occur,what causes produce them,and what consequences can be presented.This is done systematically by applying the guide words:Not ,More than ,Less than ,etc.as to generate a list of potential failures in equipment and process components.The objective of this paper is to show that in the Mexican case at National level in the oil and gas industry,there is an explicit acceptance risk criteria,thus impacting the risk scenarios priori-tizing process.Although HAZOP methodology in the Mexican oil &gas industry,based on PEMEX corporate standard has precise acceptance criteria,it is not a signi ﬁcant difference in HAZOP studies applied elsewhere,but has the advantage of being fullytransparent in terms of what a local industry is willing to accept as the level of risk acceptance criteria,also helps to gain an under-standing of the degree of HAZOP applications in the Mexican oil &gas sector.Contrary to this in HAZOP ISO standard (ISO,2000),risk acceptance criteria is not speci ﬁed and it only mentions that HAZOP can consider scenarios ranking.The paper concludes indicating major implications of risk prioritizing in HAZOP,whether before or after safeguards identi ﬁcation.2.Previous workHAZOP studies include from original ICI method with required actions only,to current applications based on computerized documentation,registering design intentions at nodes,guide words,causes,deviations,consequences,safeguards,cause fre-quencies,loss contention impact,risk reduction factors,scenarios analysis,ﬁnding analysis and many combinations among them.In the open literature there have been reported interesting and signi ﬁcant studies about HAZOP,like HAZOP and HAZAN differences (Gujar,1996)where HAZOP was identi ﬁed as qualitative hazard identi ﬁcation technique,while HAZAN was considered for the quantitative risk determination.This difference is not strictly valid today,since there are now companies using HAZOP with risk analysis*Corresponding author.Tel.:þ525519442500x57043.E-mail addresses:mpmarin@imp.mx (M.Pérez-Marín),miguel.angel.rodriguezt@ ,matoral09@ (M.A.Rodríguez-Toral).Contents lists available at SciVerse ScienceDirectJournal of Loss Prevention in the Process Industriesjou rn al homepage :/locate/jlp0950-4230/$e see front matter Ó2013Elsevier Ltd.All rights reserved./10.1016/j.jlp.2013.03.008Journal of Loss Prevention in the Process Industries 26(2013)936e 940and its acceptance criteria(Goyal&Kugan,2012).Other approaches include HAZOP execution optimization(Khan,1997);the use of intelligent systems to automate HAZOP(Venkatasubramanian,Zhao, &Viswanathan,2000);the integration of HAZOP with Fault Tree Analysis(FTA)and with Event Tree Analysis(ETA)(Kuo,Hsu,& Chang,1997).According to CCPS(2001)any qualitative method for hazard evaluation applied to identify scenarios in terms of their initial causes,events sequence,consequences and safeguards,can beextended to register Layer of Protection Analysis(LOPA).Since HAZOP scenarios report are presented typically in tabular form there can be added columns considering the frequency in terms of order of magnitude and the probability of occurrence identiﬁed in LOPA.There should be identiﬁed the Independent and the non-Independent Protection Layers,IPL and non-IPL respec-tively.Then the Probability of Failure on Demand(PFDs)for IPL and for non-IPL can be included as well as IPL integrity.Another approach consists of a combination of HAZOP/LOPA analysis including risk magnitude to rank risk reduction actions (Johnson,2010),a general method is shown,without emphasizing in any particular application.An extended HAZOP/LOPA analysis for Safety Integrity Level(SIL)is presented there,showing the quan-titative beneﬁt of applying risk reduction measures.In this way one scenario can be compared with tolerable risk criteria besides of being able to compare each scenario according to its risk value.A recent review paper has reported variations of HAZOP methodology for several applications including batch processes, laboratory operations,mechanical operations and programmable electronic systems(PES)among others(Dunjó,Fthenakis,Vílchez, &Arnaldos,2010).Wide and important contributions to HAZOP knowledge have been reported in the open literature that have promoted usage and knowledge of HAZOP studies.However,even though there is available the IEC standard on HAZOP studies,IEC-61882:2001there is not a worldwide agreement on HAZOP methodology and there-fore there exist a great variety of approaches for HAZOP studies.At international level there exist an ample number of ap-proaches in HAZOP studies;even though the best advanced prac-tices have been taken by several expert groups around the world, there is not uniformity among different consulting companies or industry internal expert groups(Goyal&Kugan,2012).The Mexican case is not the exception about this,but in the local oil and gas industry there exist a national PEMEX corporate standard that is speciﬁc in HAZOP application,it includes ranking risk scenarios (PEMEX,2008),qualitative hazard ranking,as well as the two ap-proaches recognized in HAZOP,Cause by Cause(CÂC)and Devia-tion by Deviation(DÂD).Published work including risk criteria include approaches in countries from the Americas,Europe and Asia(CCPS,2009),but nothing about Mexico has been reported.3.HAZOP variationsIn the technical literature there is no consensus in the HAZOP studies procedure,from the several differences it is consider that the more important are the variations according to:(DÂD)or (CÂC).Table1shows HAZOP variations,where(CQÂCQ)means Consequence by Consequence analysis.The implications of choosing(CÂC)are that in this approach there are obtained unique relationships of Consequences,Safeguards and Recommendations,for each speciﬁc Cause of a given Deviation. For(DÂD),all Causes,Consequences,Safeguards and Recommenda-tions are related only to one particular Deviation,thus producing that not all Causes appear to produce all the Consequences.In practice HAZOP approach(DÂD)can optimize analysis time development.However,its drawback comes when HAZOP includes risk ranking since it cannot be determined easily which Cause to consider in probability assignment.In choosing(CÂC)HAZOP there is no such a problem,although it may take more time on the analysis.The HAZOP team leader should agree HAZOP approach with customer and communicate this to the HAZOP team.In our experience factors to consider when choosing HAZOP approach are:1.If HAZOP will be followed by Layers of Protection Analysis(LOPA)for Safety Integrity Level(SIL)selection,then choose (CÂC).2.If HAZOP is going to be the only hazard identiﬁcation study,it isworth to make it with major detail using(CÂC).3.If HAZOP is part of an environmental risk study that requires aConsequence analysis,then use(DÂD).4.If HAZOP is going to be done with limited time or becauseHAZOP team cannot spend too much time in the analysis,then use(DÂD).Although this is not desirable since may compro-mise process safety.Regarding risk ranking in HAZOP,looking at IEC standard(IEC, 2001)it is found that HAZOP studies there are(DÂD)it refers to (IEC,1995)in considering deviation ranking in accordance to their severity or on their relative risk.One advantage of risk ranking is that presentation of HAZOP results is very convenient,in particular when informing the management on the recommendations to be followedﬁrst or with higher priority as a function of risk evaluated by the HAZOP team regarding associated Cause with a given recommendation.Tables2and3are shown as illustrative example of the convenience of event risk ranking under HAZOP,showing no risk ranking in Table2and risk ranking in Table3.When HAZOP presents a list of recommendations without ranking,the management can focus to recommendations with perhaps the lower resource needs and not necessarily the ones with higher risk.Table1Main approaches in HAZOP studies.Source HAZOP approach(Crowl&Louvar,2011)(DÂD)(ABS,2004)(CÂC)&(DÂD)(Hyatt,2003)(CÂC),(DÂD)&(CQÂCQ) (IEC,2001)(DÂD)(CCPS,2008);(Crawley,Preston,& Tyler,2008)(DÂD),(CÂC)Table2HAZOP recommendations without risk ranking.DescriptionRecommendation1Recommendation2Recommendation3Recommendation4Recommendation5Table3HAZOP recommendations with risk ranking.Scenario risk DescriptionHigh Recommendation2High Recommendation5Medium Recommendation3Low Recommendation1Low Recommendation4M.Pérez-Marín,M.A.Rodríguez-Toral/Journal of Loss Prevention in the Process Industries26(2013)936e940937As can be seen in Tables 2and 3,for the management there will be more important to know HAZOP results as in Table 3,in order to take decisions on planning response according to ranking risk.4.HAZOP standard for the Mexican oil &gas industryLooking at the worldwide recognized guidelines for hazard identi ﬁcation (ISO,2000)there is mentioned that when consid-ering scenarios qualitative risk assignment,one may use risk matrix for comparing the importance of risk reduction measures of the different options,but there is not a speci ﬁc risk matrix with risk values to consider.In Mexico there exist two national standards were tolerable and intolerable risk is de ﬁned,one is the Mexican National Standard NOM-028(NOM,2005)and the other is PEMEX corporate standard NRF-018(PEMEX,2008).In both Mexican standards the matrix form is considered for relating frequency and consequences.Fig.1shows the risk matrix in (NOM,2005),nomenclature regarding letters in this matrix is described in Tables 4e 6.It can be mentioned that risk matrix in (NOM,2005)is optional for risk management in local chemical process plants.For Mexican oil &gas industry,there exist a PEMEX corporate standard (NRF),Fig.2,shows the corresponding risk matrix (PEMEX,2008).Nomenclature regarding letters in this matrix is described in Tables 7e 9for risk concerning the community.It is important to mention that PEMEX corporate standard considers environmental risks,business risks,and corporate image risks.These are not shown here for space limitations.The Mexican National Standard (NOM)as being of general applicability gives the possibility for single entities (like PEMEX)to determine its own risk criteria as this company opted to do.PEMEX risk matrix can be converted to NOM ’s by category ’s grouping infrequency categories,thus giving same ﬂexibility,but with risk speci ﬁc for local industry acceptance risk criteria.One principal consideration in ranking risk is to de ﬁne if ranking is done before safeguards de ﬁnition or after.This de ﬁnition is relevant in:HAZOP kick-off presentation by HAZOP leader,explaining im-plications of risk ranking.HAZOP schedule de ﬁnition.Risk ranking at this point takes shorter time since time is not consumed in estimating risk reduction for each safeguard.If after HAZOP a LOPA is going to be done,then it should be advisable to request that HAZOP leader considers risk ranking before safeguards de ﬁnition,since LOPA has established rules in de ﬁning which safeguards are protections and the given risk reduction.Otherwise if for time or resource limitations HAZOP is not going to be followed by LOPA,then HAZOP should consider risk ranking after safeguards de ﬁnition.Therefore,the HAZOP leader should explain to the HAZOP team at the kick-off meeting a concise explanation of necessary considerations to identify safeguards having criteria to distinguish them as Independent Protection Layers (IPL)as well as the risk reduction provided by each IPL.In HAZOP report there should be make clear all assumptions and credits given to the Protections identi ﬁed by the HAZOP team.Figs.3and 4,shows a vision of both kinds of HAZOP reports:For the case of risk ranking before and after safeguards de ﬁnition.In Figs.3Fig.1.Risk matrix in (NOM,2005).Table 5Probability description (Y -axis of matrix in Fig.1)(NOM,2005).Frequency Frequency quantitative criteria L41in 10years L31in 100years L21in 1000years L1<1in 1000yearsTable 6Risk description (within matrix in Fig.1)(NOM,2005).Risk level Risk qualitative descriptionA Intolerable:risk must be reduced.B Undesirable:risk reduction required or a more rigorous risk estimation.C Tolerable risk:risk reduction is needed.DTolerable risk:risk reduction not needed.Fig.2.Risk matrix as in (PEMEX,2008).Table 7Probability description (Y -axis of matrix in Fig.2)(PEMEX,2008).Frequency Occurrence criteria Category Type Quantitative QualitativeHighF4>10À1>1in 10yearsEvent can be presented within the next 10years.Medium F310À1À10À21in 10years e 1in 100years It can occur at least once in facility lifetime.LowF210À2À10À31in 100years e 1in 1000years Possible,it has never occurred in the facility,but probably ithas occurred in a similar facility.Remote F1<10À3<1in 1000years Virtually impossible.It is norealistic its occurrence.Table 4Consequences description (X -axis of matrix in Fig.1)(NOM,2005).Consequences Consequence quantitative criteriaC4One or more fatalities (on site).Injuries or fatalities in the community (off-site).C3Permanent damage in a speci ﬁc Process or construction area.Several disability accidents or hospitalization.C2One disability accident.Multiple injuries.C1One injured.Emergency response without injuries.M.Pérez-Marín,M.A.Rodríguez-Toral /Journal of Loss Prevention in the Process Industries 26(2013)936e 940938and4“F”means frequency,C means consequence and R is risk as a function of“F”and“C”.One disadvantage of risk ranking before safeguards deﬁnition is that resulting risks usually are found to be High,Intolerable or Unacceptable.This makes difﬁcult the decision to be made by the management on what recommendations should be carried outﬁrst and which can wait.One advantage in risk ranking after safeguards deﬁnition is that it allows to show the management the risk scenario fully classiﬁed, without any tendency for identifying most risk as High(Intolerable or Unacceptable).In this way,the management will have a good description on which scenario need prompt attention and thus take risk to tolerable levels.There is commercial software for HAZOP methodology,but it normally requires the user to use his/her risk matrix,since risk matrix deﬁnition represents an extensive knowledge,resources and consensus to be recognized.The Mexican case is worldwide unique in HAZOP methodology, since it uses an agreed and recognized risk matrix and risk priori-tizing criteria according to local culture and risk understanding for the oil&gas sector.The risk matrix with corresponding risk levels took into account political,economical and ethic values.Advantages in using risk matrix in HAZOP are:they are easy to understand and to apply;once they are established and recognized they are of low cost;they allow risk ranking,thus helping risk reduction requirements and limitations.However,some disad-vantages in risk matrix use are:it may sometimes be difﬁcult to separate frequency categories,for instance it may not be easy to separate low from remote in Table7.The risk matrix subdivision may have important uncertainties,because there are qualitative considerations in its deﬁnition.Thus,it may be advantageous to update Pemex corporate HAZOP standard(PEMEX,2008)to consider a6Â6matrix instead of the current4Â4matrix.5.ConclusionsHAZOP studies are not a simple procedure application that as-sures safe Process systems on its own.It is part of a global design cycle.Thus,it is necessary to establish beforehand the HAZOP study scope that should include at least:methodology,type(CÂC,DÂD, etc.)report format,acceptance risk criteria and expected results.Mexico belongs to the reduced number of places where accep-tance risk criteria has been explicitly deﬁned for HAZOP studies at national level.ReferencesABS.(2004).Process safety institute.Course103“Process hazard analysis leader training,using the HAZOP and what-if/checklist techniques”.Houston TX:Amer-ican Bureau of Shipping.CCPS(Center for Chemical Process Safety).(2001).Layer of protection analysis: Simpliﬁed process risk assessment.New York,USA:AIChE.CCPS(Center for Chemical Process Safety).(2008).Guidelines for hazard evaluation procedures(3rd ed.).New York,USA:AIChE/John Wiley&Sons.CCPS(Center for Chemical Process Safety).(2009).Guidelines for Developing Quan-titative Safety Risk Criteria,Appendix B.Survey of worldwide risk criteria appli-cations.New York,USA:AIChE.Crawley,F.,Preston,M.,&Tyler,B.(2008).HAZOP:Guide to best practice(2nd ed.).UK:Institution of Chemical Engineers.Crowl,D.A.,&Louvar,J.F.(2011).Chemical process safety,fundamentals with ap-plications(3rd ed.).New Jersey,USA:Prentice Hall.Table8Consequences description(X-axis of matrix in Fig.2)(PEMEX,2008).Event type and consequence categoryEffect:Minor C1Moderate C2Serious C3Catastrophic C4 To peopleNeighbors Health and Safety.No impact on publichealth and safety.Neighborhood alert;potentialimpact to public health and safety.Evacuation;Minor injuries or moderateconsequence on public health and safety;side-effects cost between5and10millionMX$(0.38e0.76million US$).Evacuation;injured people;one ormore fatalities;sever consequenceon public health and safety;injuriesand side-consequence cost over10million MX$(0.76million US$).Health and Safetyof employees,serviceproviders/contractors.No injuries;ﬁrst aid.Medical treatment;Minor injurieswithout disability to work;reversible health treatment.Hospitalization;multiple injured people;total or partial disability;moderate healthtreatment.One o more fatalities;Severe injurieswith irreversible damages;permanenttotal or partial incapacity.Table9Risk description(within matrix in Fig.2)(PEMEX,2008).Risk level Risk description Risk qualitative descriptionA Intolerable Risk requires immediate action;cost should not be a limitation and doing nothing is not an acceptable option.Risk with level“A”represents an emergency situation and there should be implements with immediate temporary controls.Risk mitigation should bedone by engineered controls and/or human factors until Risk is reduced to type“C”or preferably to type“D”in less than90days.B Undesirable Risk should be reduced and there should be additional investigation.However,corrective actions should be taken within the next90days.If solution takes longer there should be installed on-site immediate temporary controls for risk reduction.C Acceptablewith control Signiﬁcant risk,but can be compensated with corrective actions during programmed facilities shutdown,to avoid interruption of work plans and extra-costs.Solutions measures to solve riskﬁndings should be done within18months.Mitigation actions should focus operations discipline and protection systems reliability.D ReasonablyacceptableRisk requires control,but it is of low impact and its attention can be carried out along with other operations improvements.Fig.3.Risk ranking before safeguard deﬁnition.Fig.4.Risk ranking after safeguards deﬁnition.M.Pérez-Marín,M.A.Rodríguez-Toral/Journal of Loss Prevention in the Process Industries26(2013)936e940939Dunjó,J.,Fthenakis,V.,Vílchez,J.A.,&Arnaldos,J.(2010).Hazard and opera-bility(HAZOP)analysis.A literature review.Journal of Hazardous Materials, 173,19e32.Goyal,R.K.,&Kugan,S.(2012).Hazard and operability studies(HAZOP)e best practices adopted by BAPCO(Barahin Petroleum Company).In Presented at SPE middle east health,safety,security and environment conference and exhibition.Abu Dhabi,UAE.2e4April.Gujar,A.M.(1996).Myths of HAZOP and HAZAN.Journal of Loss Prevention in the Process Industry,9(6),357e361.Hyatt,N.(2003).Guidelines for process hazards analysis,hazards identiﬁcation and risk analysis(pp.6-7e6-9).Ontario,Canada:CRC Press.IEC.(1995).IEC60300-3-9:1995.Risk management.Guide to risk analysis of techno-logical systems.Dependability management e Part3:Application guide e Section 9:Risk analysis of technological systems.Geneva:International Electrotechnical Commission.IEC.(2001).IEC61882.Hazard and operability studies(HAZOP studies)e Application guide.Geneva:International Electrotechnical Commission.ISO.(2000).ISO17776.Guidelines on tools and techniques for hazard identiﬁcation and risk assessment.Geneva:International Organization for Standardization.Johnson,R.W.(2010).Beyond-compliance uses of HAZOP/LOPA studies.Journal of Loss Prevention in the Process Industries,23(6),727e733.Khan,F.I.(1997).OptHAZOP-effective and optimum approach for HAZOP study.Journal of Loss Prevention in the Process Industry,10(3),191e204.Kuo,D.H.,Hsu,D.S.,&Chang,C.T.(1997).A prototype for integrating automatic fault tree/event tree/HAZOP puters&Chemical Engineering,21(9e10),S923e S928.Lawley,H.G.(1974).Operability studies and hazard analysis.Chemical Engineering Progress,70(4),45e56.Mannan,S.(2012).Lee’s loss prevention in the process industries.Hazard identiﬁca-tion,assessment and control,Vol.1,3rd ed.,Elsevier,(pp.8e31).NOM.(2005).NOM-028-STPS-2004.Mexican National standard:“Norma Oﬁcial Mexicana”.In Organización del trabajo-Seguridad en los procesos de sustancias químicas:(in Spanish),published in January2005.PEMEX.(2008).Corporate Standard:“Norma de Referencia NRF-018-PEMEX-2007“Estudios de Riesgo”(in Spanish),published in January2008. Venkatasubramanian,V.,Zhao,J.,&Viswanathan,S.(2000).Intelligent systems for HAZOP analysis of complex process puters&Chemical Engineering, 24(9e10),2291e2302.M.Pérez-Marín,M.A.Rodríguez-Toral/Journal of Loss Prevention in the Process Industries26(2013)936e940 940。

模拟ai英文面试题目及答案

模拟ai英文面试题目及答案模拟AI英文面试题目及答案1. 题目: What is the difference between a neural network anda deep learning model?答案: A neural network is a set of algorithms modeled loosely after the human brain that are designed to recognize patterns. A deep learning model is a neural network with multiple layers, allowing it to learn more complex patterns and features from data.2. 题目: Explain the concept of 'overfitting' in machine learning.答案: Overfitting occurs when a machine learning model learns the training data too well, including its noise and outliers, resulting in poor generalization to new, unseen data.3. 题目: What is the role of a 'bias' in an AI model?答案: Bias in an AI model refers to the systematic errors introduced by the model during the learning process. It can be due to the choice of model, the training data, or the algorithm's assumptions, and it can lead to unfair or inaccurate predictions.4. 题目: Describe the importance of data preprocessing in AI.答案: Data preprocessing is crucial in AI as it involves cleaning, transforming, and reducing the data to a suitableformat for the model to learn effectively. Proper preprocessing can significantly improve the performance of AI models by ensuring that the input data is relevant, accurate, and free from noise.5. 题目: How does reinforcement learning differ from supervised learning?答案: Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize a reward signal. It differs from supervised learning, where the model learns from labeled data to predict outcomes based on input features.6. 题目: What is the purpose of a 'convolutional neural network' (CNN)?答案: A convolutional neural network (CNN) is a type of deep learning model that is particularly effective for processing data with a grid-like topology, such as images. CNNs use convolutional layers to automatically and adaptively learn spatial hierarchies of features from input images.7. 题目: Explain the concept of 'feature extraction' in AI.答案: Feature extraction in AI is the process of identifying and extracting relevant pieces of information from the raw data. It is a crucial step in many machine learning algorithms, as it helps to reduce the dimensionality of the data and to focus on the most informative aspects that can be used to make predictions or classifications.8. 题目: What is the significance of 'gradient descent' in training AI models?答案: Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In the context of AI, it is used to minimize the loss function of a model, thus refining the model's parameters to improve its accuracy.9. 题目: How does 'transfer learning' work in AI?答案: Transfer learning is a technique where a pre-trained model is used as the starting point for learning a new task. It leverages the knowledge gained from one problem to improve performance on a different but related problem, reducing the need for large amounts of labeled data and computational resources.10. 题目: What is the role of 'regularization' in preventing overfitting?答案: Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, which discourages overly complex models. It helps to control the model's capacity, forcing it to generalize better to new data by not fitting too closely to the training data.。

联合训练生成对抗网络的半监督分类方法

光学精密工程Optics and Precision Engineering第 29 卷第 5 期2021年5月Vol. 29 No. 5May 2021文章编号 1004-924X( 2021)05-1127-09联合训练生成对抗网络的半监督分类方法徐哲，耿杰*，蒋雯，张卓，曾庆捷(西北工业大学电子信息学院，西安710072)摘要：深度神经网络需要大量数据进行监督训练学习，而实际应用中往往难以获取大量标签数据°半监督学习可以减小深度网络对标签数据的依赖，基于半监督学习的生成对抗网络可以提升分类效果,旦仍存在训练不稳定的问题°为进一步提高网络的分类精度并解决网络训练不稳定的问题，本文提出一种基于联合训练生成对抗网络的半监督分类方法，通过两个判别器的联合训练来消除单个判别器的分布误差，同时选取无标签数据中置信度高的样本来扩充标签数据集，提高半监督分类精度并提升网络模型的泛化能力°在CIFAR -10和SVHN 数据集上的实验结果表明，本文方法在不同数量的标签数据下都获得更好的分类精度°当标签数量为2 000时，在CIFAR -10数据集上分类精度可达80.36% ;当标签数量为10时，相比于现有的半监督方法，分类精度提升了约5%°在一定程度上解决了 GAN 网络在小样本条件下的过拟合问题°关键词：生成对抗网络；半监督学习；图像分类；深度学习中图分类号:TP391文献标识码：Adoi ：10. 37188/OPE. 20212905.1127Co -training generative adversarial networks forsemi -supervised classification methodXU Zhe , GENG Jie * , JIANG Wen , ZHANG Zhuo , ZENG Qing -jie(School of E lectronics and Information , Northwestern Polytechnical University , Xian 710072, China )* Corresponding author , E -mail ： gengjie@nwpu. edu. cnAbstract ： Deep neural networks require a large amount of data for supervised learning ； however , it is dif ficult to obtain enough labeled data in practical applications. Semi -supervised learning can train deep neuralnetworks with limited samples. Semi -supervised generative adversarial networks can yield superior classifi cation performance ； however ， they are unstable during training in classical networks. To further improve the classification accuracy and solve the problem of training instability for networks ， we propose a semi -su pervised classification model called co -training generative adversarial networks ( CT -GAN ) for image clas sification. In the proposed model ， co -training of two discriminators is applied to eliminate the distribution error of a single discriminator and unlabeled samples with higher confidence are selected to expand thetraining set , which can be utilized for semi -supervised classification and enhance the generalization of deep networks. Experimental results on the CIFAR -10 dataset and the SVHN dataset showed that the pro posed method achieved better classification accuracies with different numbers of labeled data. The classifi cation accuracy was 80. 36% with 2000 labeled data on the CIFAR -10 dataset , whereas it improved by收稿日期：2020-11-04；修订日期:2021-01-04.基金项目:装备预研领域基金资助项目(No. 61400010304);国家自然科学基金资助项目(No. 61901376)1128光学精密工程第29卷about5%compared with the existing semi-supervised method with10labeled data.To a certain extent, the problem of GAN overfitting under a few sample conditions is solved.Key words：generative adversarial networks；semi-supervised learning；image classification；deep learning1引言图像分类作为计算机视觉领域最基础的任务之一，主要通过提取原始图像的特征并根据特征学习进行分类［11o传统的特征提取方法主要是对图像的颜色、纹理、局部特征等图像表层特征进行处理实现的，例如尺度不变特征变换法［21,方向梯度法［31以及局部二值法［41等。

Adaptive tracking control of uncertain MIMO nonlinear systems with input constraints

article
info
abstract
In this paper, adaptive tracking control is proposed for a class of uncertain multi-input and multi-output nonlinear systems with non-symmetric input constraints. The auxiliary design system is introduced to analyze the effect of input constraints, and its states are used to adaptive tracking control design. The spectral radius of the control coefficient matrix is used to relax the nonsingular assumption of the control coefficient matrix. Subsequently, the constrained adaptive control is presented, where command filters are adopted to implement the emulate of actuator physical constraints on the control law and virtual control laws and avoid the tedious analytic computations of time derivatives of virtual control laws in the backstepping procedure. Under the proposed control techniques, the closed-loop semi-global uniformly ultimate bounded stability is achieved via Lyapunov synthesis. Finally, simulation studies are presented to illustrate the effectiveness of the proposed adaptive tracking control. © 2011 Elsevier Ltd. All rights reserved.

Projected gradient methods for non-negative matrix factorization

1
Introduction
Non-negative matrix factorization (NMF) (Paatero and Tapper, 1994; Lee and Seung, 1999) is useful for ﬁnding representations of non-negative data. Given an n × m data matrix V with Vij ≥ 0 and a pre-speciﬁed positive integer r < min(n, m), NMF ﬁnds two non-negative matrices W ∈ Rn×r and H ∈ Rr×m such that V ≈ W H. If each column of V represents an object, NMF approximates it by a linear combination of r “basis” columns in W . NMF has been applied to many areas such as ﬁnding basis vectors of images (Lee and Seung, 1999), document clustering (Xu et al., 2003), molecular pattern discovery (Brunet et al., 2004), etc. Donoho and 1
Stodden (2004) have addressed the theoretical issues associated with the NMF approach. The conventional approach to ﬁnd W and H is by minimizing the diﬀerence between V and W H : min

Graph Regularized Nonnegative Matrix

Ç
1 INTRODUCTION
HE
techniques for matrix factorization have become popular in recent years for data representation. In many problems in information retrieval, computer vision, and pattern recognition, the input data matrix is of very high dimension. This makes learning from example infeasible [15]. One then hopes to find two or more lower dimensional matrices whose product provides a good approximation to the original one. The canonical matrix factorization techniques include LU decomposition, QR decomposition, vector quantization, and Singular Value Decomposition (SVD). SVD is one of the most frequently used matrix factorization techniques. A singular value decomposition of an M Â N matrix X has the following form: X ¼ UÆVT ; where U is an M Â M orthogonal matrix, V is an N Â N orthogonal matrix, and Æ is an M Â N diagonal matrix with Æij ¼ 0 if i 6¼ j and Æii ! 0. The quantities Æii are called the singular values of X, and the columns of U and V are called

Finding community structure in networks using the eigenvectors of matrices

Finding community structure in networks using the eigenvectors of matrices
M. E. J. Newman
Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109–1040
We consider the problem of detecting communities or modules in networks, groups of vertices with a higher-than-average density of edges connecting them. Previous work indicates that a robust approach to this problem is the maximization of the beneﬁt function known as “modularity” over possible divisions of a network. Here we show that this maximization process can be written in terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations. This result leads us to a number of possible algorithms for detecting community structure, as well as several other results, including a spectral measure of bipartite structure in neteasure that identiﬁes those vertices that occupy central positions within the communities to which they belong. The algorithms and measures proposed are illustrated with applications to a variety of real-world complex networks.

co-segmentation

Co-segmentation
Report
协同分割定义
图像协同分割是指从包含共同对象(往往也称为前景)的图像组中分割出共同对象的问题
基于MRF的协同分割
MRF能量函数公式：
基于MRF协同分割对比
基于主动轮廓的协同分割
基本能量函数公式：其中, 第 1 项描述曲线的长度, 第 2 项描述曲线所围成区域的面积, 第 3 项描述前景的一致性, 第 4 项描述背景的相似性
基于主动轮廓协同分割对比
基于聚类的协同分割
基于聚类协同分割过程示意图向图的协同分割示意图基于有向图协同分割分析
协同分割前景
图像协同分割结合了无监督分割和有监督分割的特点. 在大数据时代背景下, 随着越来越多大规模图像数据集的出现以及网络照片流的应用越来越广泛, 单幅图像的分割在效率和效果方面已经不能满足需求. 而协同分割则能充分利用相关图像之间的信息进行图像分割, 为图像分割提供了较有前景的解决方案。
协同分割主要问题
• 协同分割模型构建、选择以及自适应问题 • 协同分割模型的优化问题 • 局部区域的一致性衡量问题 • 中高层语义下共同对象信息的挖掘问题 • 特定应用下的协同分割问题
参考文献
[1] Rother C, Minka T, Blake A, et al. Cosegmentation of image pairs by histogram matching-incorporating a global constraint into mrfs[C] //Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2006, 1: 993-1000 [2] Zhu H Y, Meng F M, Cai J F, et al. Beyond pixels: a comprehensive survey from bottom-up to semantic image segmentation and cosegmentation[J]. Journal of Visual Communication and Image Representation, 2016, 34: 12-27 [3] Vicente S, Kolmogorov V, Rother C. Cosegmentation revisited: Models and optimization[M] //Lecture Notes in Computer Science. Heidelberg: Springer, 2010, 6312: 465479 [4] Mukherjee L, Singh V, Dyer C R. Half-integrality based algorithms for cosegmentation of images[C] //Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2009: 2028-2035 [5] Hochbaum D S, Singh V. An efficient algorithm for Co-segmentation[C] //Proceedings of the 12th IEEE International Conference on Computer Vision. Los Alamitos: IEEE Computer Society Press, 2009: 269-276 [6] Rubio J C, Serrat J, Ló pez A, et al. Unsupervised co-segmentation through region matching[C] //Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2012: 749-756 [7] Collins M D, Xu J, Grady L, et al. Random walks based multiimage segmentation: Quasiconvexity results and GPU-based solutions[C] //Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2012: 1656-1663 [8] Kim G, Xing E P, Li F F, et al. Distributed cosegmentation via submodular optimization on anisotropic diffusion[C] //Proceedings of IEEE International Conference on Computer Vision. Los Alamitos: IEEE Computer Society Press, 2011: 169-176 [9] Kichenassamy S, Kumar A, Olver P, et al. Conformal curvature flows: From phase transitions to active vision[J]. Archive for Rational Mechanics and Analysis, 1996, 134(3): 275-301 [10] Caselles V, Kimmel R, Sapiro G. Geodesic active contours[J]. International Journal of Computer Vision, 1997, 22(1): 61-79 [11] Meng F M, Li H L, Liu G H. Image co-segmentation via active contours[C] //Proceedings of IEEE International Symposium on Circuits and Systems. Los Alamitos: IEEE Computer Society Press, 2012: 2773-2776 [13] Ali S, Madabhushi A. An integrated region-, boundary-, shape-based active contour for multiple object overlap resolution in histological imagery[J]. IEEE Transactions on Medical Imaging, 2012, 31(7): 1448-1460 [14] He N, Zhang P. Varitional level set image segmentation method based on boundary and region information[J]. Acta Electronica Sinica, 2009, 37(10): 2215-2219 [15] Joulin A, Bach F, Ponce J. Discriminative clustering for image co-segmentation[C] //Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2010: 1943-1950 [16] Meng F M, Li H L, Liu G H, et al. Object co-segmentation based on shortest path algorithm and saliency model[J]. IEEE Transactions on Multimedia, 2012, 14(5):1429-1441 [17] Meng F M, Li H L, Liu G H. A new co-saliency model via pairwise constraint graph matching[C] //Proceedings of International Symposium on Intelligent Signal Processing and Communications Systems. Los Alamitos: IEEE Computer Society Press, 2012: 781-786

THE INTERNATIONAL JOURNAL OF MEDICAL ROBOTICS AND COMPUTER ASSISTED SURGERY Int J Med Robot

Introduction
Computer-assisted surgery (CAS) is a methodology that translates into accurate and reliable image-to-surgical space guidance. Neurosurgery is a very complex procedure and the surgeon has to integrate multi-modal data to produce an optimal surgical plan. Often the lesion of interest is surrounded by vital structures, such as the motor cortex, temporal cortex, vision and audio sensors, etc., and has irregular conﬁgurations. Slight damage to such eloquent brain structures can severely impair the patient (1,2). CASMIL, an imageguided neurosurgery toolkit, is being developed to produce optimum plans resulting in minimally invasive surgeries. This system has many innovative features needed by neurosurgeons that are not available in other academic and commercial systems. CASMIL is an integration of various vital modules, such as rigid and non-rigid co-registration (image–image, image–atlas and

Recurrent extensions of self–similar Markov processes and Cramer’s condition

Universit´e s de Paris6&Paris7-CNRS(UMR7599)PR´EPUBLICATIONS DU LABORATOIRE DE PROBABILIT´ES&MOD`ELES AL´EATOIRES4,place Jussieu-Case188-75252Paris cedex05http://www.proba.jussieu.frRecurrent extensions of self-similarMarkov processes and Cramer’s conditionV.RIVEROJUILLET2003Pr´e publication n o838Laboratoire de Probabilit´e s et Mod`e les Al´e atoires,CNRS-UMR7599, Universit´e Paris VI&Universit´e Paris VII,4,place Jussieu,Case188,F-75252Paris Cedex05.Recurrent extensions of self–similar Markov processes and Cramer’s conditionV´ıctor RIVERO∗†July4,2003AbstractLetξbe a real valued L´e vy process that drifts to−∞and satisﬁes Cramer’s condition,and X a self–similar Markov process associated toξvia Lamperti’s[22]transformation.In this case,X has0as a trap and fulﬁlls the assumptions of Vuolle-Apiala[34].We deduce from[34]thatthere exists a unique excursion measure n,compatible with the semigroup of X and such thatn(X0+>0)=0.Here,we give a precise description of n via its associated entrance law.To thatend,we construct a self–similar process X ,which can be viewed as X conditioned to never hit0,and then we construct n in a similar way like the Brownian excursion measure is constructedvia the law of a Bessel(3)process.An alternative description of n is given by specifying the lawof the excursion process conditioned to have a given length.We establish some duality relationsfrom which we determine the image under time reversal of n.Key words.Self–similar Markov process,description of excursion measures,weak dual-ity,L´e vy processes.A.M.S.Classiﬁcation.60J25(60G18).1IntroductionLet X=(X t,t≥0)be a strong Markov process with values in[0,∞[and for x≥0,denote by P x its law starting from x.Assume that X fulﬁlls the scaling property:thereexists someα>0such thatthe law of(cX tc−1/α,t≥0)under P x is P cx,(1) for any x≥0and c>0.Such processes were introduced by Lamperti[22]under the nameof semi–stable processes,nowadays they are calledα–self–similar Markov processes.Werefer to Embrechts and Maejima[14]for a recent account on self–similar processes.Lamperti established that for eachﬁxedα>0,there exists a one to one correspondence betweenα–self–similar Markov processes on]0,∞[and L´e vy processes that we next sketch.Let(D,D)be the space of c`a dl`a g pathsω:[0,∞[→]−∞,∞[endowed with theσ–algebra ∗Research supported by a grant from CONACYT(National Council of science and technology Mexico).†Laboratoire de Probabilit´e s et Mod`e les Al´e atoires,Universit´e Pierre et Marie Curie;175,rue du Chevaleret, F-75013Paris,France.mail:rivero@ccr.jussieu.fr121INTRODUCTION generated by the coordinate maps and the naturalﬁltration(D t,t≥0).Let P be aprobability measure on D such that under P the coordinate processξis a L´e vy processthat drifts to−∞,i.e.lim s→∞ξs=−∞.Set for t≥0τ(t)=inf{s>0, s0eξr/αdr>t},with the usual convention that inf{∅}=∞.For an arbitrary x>0,let P x be thedistribution on D+={ω:[0,∞[→[0,∞[c`a dl`a g},of the time–changed processx exp ξτ(tx−1/α) ,t≥0,where the above quantity is assumed to be0whenτ(tx−1/α)=∞.We agree that P0isthe law of the process identical to0.Classical results on time change yields that under(P x,x≥0)the process X is Markovian with respect to theﬁltration(G t=Dτ(t),t≥0).Furthermore,X has the scaling property(1).Thus,X is a self–similar Markov process on[0,∞[having0as trap or absorbing point.Conversely,any self–similar Markov processthat has0as a trap can be constructed in this way,cf.[22].Let T0be theﬁrst hitting time of0for X,i.e.T0=inf{t>0:X t=0}.It should be clear that the distribution of T0under P x is the same as that of x1/αI underP,with I the so–called L´e vy exponential functional associated toξandα,that isI= ∞0exp{ξs/α}ds.(2)Sinceξdrifts to−∞we have that I<∞,P–a.s.and as a consequence P x(T0<∞)=1for all x>0.Denote P t and V q the semigroup and resolvent for the process X killed attime T0,say(X,T0),P t f(x)=E x(f(X t),t<T0),x>0,V q f(x)= ∞0e−qt P t f(x)dt,x>0,for measurable functions f non–negative or bounded.It is customary to refer to(X,T0)as the minimal process.Given that the former construction enables to describe the behavior of the self–similar Markov process X until itsﬁrst hitting time of0,Lamperti[22]raised the following ques-tion:What are the self–similar Markov processes X on[0,∞[which behave like(X,T0) up to the time T0?Lamperti solved this problem in the case when the minimal processis a Brownian motion killed at0.Then Vuolle-Apiala[34]tackled this problem using theexcursion theory for Markov processes and assuming that the following hypotheses hold.There existsκ>0such that(H1-a)the limitlim x→0E x(1−e−T0)xκ,exists and is strictly positive;3(H1-b)the limitlimx→0V q f(x) x,exists for all f∈C K]0,∞[and is strictly positive for some such functions,with C K]0,∞[={f:R→R,continuous and with compact support on]0,∞[}.The main result in[34]is the existence of an unique entrance law(n s,s>0)such thatlims→0n s B c=0,for every neighborhood B of0and∞0e−s n s1ds=1.This entrance law is determined by its q–potential by the formula∞0e−qs n s fds=lim x→0V q f(x)E x(1−e−T0),q>0,(3) for f∈C K]0,∞[.Then,using the results of Blumenthal[7],Vuolle-Apiala proved that associated to the entrance law(n s,s>0)there exists a unique recurrent Markov process X having the scaling property(1)which is an extension of the minimal process(X,T0), that is X killed at time T0is equivalent to(X,T0)and0is a recurrent regular state for X,i.e. P x(T0<∞)=1,∀x>0, P0(T0=0)=1,with P the law on D+of X.Furthermore,we know from[7]that there exists a unique excursion measure say n,on(D+,G∞)compatible with the semigroup P t such that its associated entrance law is(n s,s>0);the property lim s→0n s B c=0,for any B neigh-borhood of0is equivalent to n(X0+>0)=0,that is the process leaves0continuously under n.Then the excursion measure n is the unique excursion measure having the prop-erties n(X0+>0)=0and n(1−e−T0)=1.See subsection2.1for the deﬁnitions.Theﬁrst aim of this paper is provide a more explicit description of the excursion measure n and its associated entrance law(n s,s>0).To that purpose,we shall mimic a well known construction of the Brownian excursion measure via the Bessel(3)process that we next sketch for ease of reference.Let P(respectively R)be a probability measure on (D+,G∞)under which the coordinate process is a Brownian motion killed at0(respectively a Bessel(3)process).The probability measure R appears as the law of the Brownian motion conditioned to never hit0.More precisely,for u>0,x>0limt→∞P x(A|T0>t)=R x(A),for any A∈G u,see e.g.McKean[23].Moreover,the function h(x)=x−1,x>0is excessive for the semigroup of the Bessel(3)process and its h–transform is the semigroup of the Brownian motion killed at0.Let n be the h–transform of R0via the function h(x)=x−1,i.e.n is the unique measure on(D+,G∞)with support on{T0>0}such that under n the coordinate process is Markovian with semigroup that of the Brownian motion killed at0,and for every stopping time T in G t and any G T–measurable variableF T,n(F T,T<T0)=R0(F T1X T).41INTRODUCTION Then the measure n is a multiple of the Itˆo’s excursion measure for Brownian motion,seee.g.Imhof[20]§4.In order to carry out this program we will make the following hypotheses on the L´e vy processξ.(H2-a)ξis not arithmetic,i.e.the state space is not a subgroup of k Z for any real number k;(H2-b)There existsθ>0such that E(eθξ1)=1;(H2-c)E(ξ+1eθξ1)<∞,with a+=a∨0.The condition(H2-c)can be stated in terms of the L´e vy measure ofξas(H2-c’) {x>1}xeθxΠ(dx)<∞;cf.Sato[32]Theorem25.3.Such hypotheses are satisﬁed by a wide class of L´e vy processes,in particular by those associated with self–similar diﬀusions and stable processes.In thesequel we will refer to these hypotheses as the(H2)hypotheses,unless otherwise stated.The condition(H2-b)is the so–called Cramer’s condition for the L´e vy processξand forcesξto drifts to−∞or equivalently E(ξ1)<0.Cramer’s condition enable us toconstruct a law P on D,such that under P the coordinate processξ ,is a L´e vy processthat drifts to∞and P |D t=eθξt P|D t.Then,we will show that the self–similar Markovprocess X associated to the L´e vy processξ plays the rˆo le of a Bessel(3)process in ourconstruction of the excursion measure n.The rest of this paper is organized as follows.In Subsection2.1we recall the Itˆo’s program as settled by Blumenthal[7].The excursion measure n that interests us is theonly excursion measure having the property n(X0+>0)=0.Nevertheless,this is notthe only excursion measure compatible with the semigroup of the minimal process,that iswhy in Subsection2.2we review some properties that should be satisﬁed by any excursionmeasure corresponding to a self–similar extension of the minimal process.There we alsoobtain necessary and suﬃcient conditions for the existence of an excursion measure n jsuch that n j(X0=0)=0,which are valid for any self–similar Markov process having0asa trap.In Subsection2.3we construct a self–similar Markov process X which is related to(X,T0)in an analogue way like the Bessel(3)process does to Brownian motion killed at0,prove that the conditions(H1)are satisﬁed under the hypothesis(H2),give a more explicitexpression for the limit in equation(3)and that the hypothesis(H1)imply the conditions(H2-b,c).Next,in Section3we give our main description of the excursion measure n andgive an answer to the question raised by Lamperti that can be sketched as follows:givena L´e vy processξsatisfying the hypotheses(H2),then anα–self–similar Markov processX associated toξ,admits a recurrent extension that leaves0continuously a.s.if andonly if0<αθ<1.The purpose of Section4is to give an alternative description of themeasure n by determining the law of the excursion process conditioned by its length,forBrownian motion this corresponds to the description of the Itˆo excursion measure via thelaw of a Bessel(3)bridge.In Section5we study some duality relations for the minimalprocess and in particular we determine the image under time reversal of n.Finally,in theAppendix A we establish that the extensions of any two minimal processes which are inweak duality still are in weak duality as could be expected.Last,the development of this work is largely based on the theory of h–transforms of Doob,cf.Sharpe[33]or Walsh[35],which will be used without further reference.5 2Preliminaries andﬁrst resultsThis section contains several parts.In theﬁrst one,we recall the Itˆo’s program and the results in Blumenthal[7].The purpose of Subsection2.2is study the excursion measures compatible with the semigroup of the minimal process(X,T0).Finally,in Subsection2.3we establish the existence of a self–similar Markov process X which bears the same relation to the minimal process(X,T0)as the Bessel(3)process does to Brownian motion killed at0.The results in Subsections2.1and2.2do not require hypotheses(H2).2.1Some general facts on recurrent extensions of Markov processesA measure n on(D+,G∞)having inﬁnite mass is called a pseudo excursion measure com-patible with the semigroup P t if the following are satisﬁed:(i)n is carried by{ω∈D+|T0(ω)>0and X t(ω)=0,∀t≥T0};(ii)for every bounded G∞–measurable H and each t>0andΛ∈G tn(H◦θt,Λ∩{t<T0})=n(E X t(H),Λ∩{t<T0}),whereθt denotes the shift operator.If moreover(iii)n(1−e−T0)<∞;we will say that n is an excursion measure.A normalized excursion measure n is an excursion measure n such that n(1−e−T0)=1.The rˆo le played by condition(iii)will be explained below.The entrance law associated to a pseudo excursion measure n is deﬁned byn s(dy):=n(X s∈dy,s<T0),s>0.A partial converse holds:given an entrance law(n s,s>0)such that∞0(1−e−s)dn s1<∞,there exists a unique excursion measure n such that its associated entrance law is(n s,s> 0),see e.g.[7].It is well known in the theory of Markov process that a way to construct recurrent extensions of a Markov process is the Itˆo’s program or pathwise approach that can be described as follows.Assume that there exists an excursion measure n compatible withthe semigroup of the minimal process P t.Realize a Poisson point process∆=(∆s,s>0)on D+with characteristic measure n.Thus each atom∆s is a path and T0(∆s)denotesits lifetime,i.e.T0(∆s)=inf{t>0:∆s(t)=0}.Setσt= s≤t T0(∆s),t≥0.62PRELIMINARIES AND FIRST RESULTS Since n(1−e−T0)<∞,σt<∞a.s.for every t>0.It follows that the processσ=(σt,t≥0)is an increasing c`a dl`a g process with stationary and independent increments,i.e.a subordinator.Its law is characterized by its Laplace exponentφ,deﬁned byE(e−λσ1)=e−φ(λ),λ>0,andφ(λ)can be expressed thanks to the L´e vy–Kintchine’s formula asφ(λ)= ]0,∞[(1−e−λs)ν(ds),withνa measure such that s∧1ν(ds)<∞,called the L´e vy measure ofσ;see e.g.Bertoin[1]§3for background.An application of the exponential formula for Poissonpoint process givesE(e−λσ1)=e−n(1−e−λT0),λ>0,i.e.φ(λ)=n(1−e−λT0)and the tail of the L´e vy measure is given byν[s,∞[=n(s<T0)=n s1,s>0.Observe that if we assumeφ(1)=n(1−e−T0)=1thenφis uniquely determined.Sincen has inﬁnite mass,σt is strictly increasing in t.Let L t be the local time at0,i.e.thecontinuous inverse ofσL t=inf{r>0:σr>t}=inf{r>0:σr≥t}.Deﬁne a process( X t,t≥0)as follows.For t≥0,let L t=s,thenσs−≤t≤σs,setX t= ∆s(t−σs−)ifσs−<σs0ifσs−=σs or s=0.(4)That the process so constructed is a Markov process has been established in all its gene-rality by Salisbury[30,31]and under some regularity hypotheses on the semigroup of theminimal process by Blumenthal[7].See also Rogers[29]for its analytical counterpart.Inour setting the hypotheses in[7]are satisﬁed as it is stated in the following lemma.Lemma1.Let C0]0,∞[,be the space of continuous functions on]0,∞[vanishing at0and∞.(i)if f∈C0]0,∞[,then P t f∈C0]0,∞[and P t f→f uniformly as t→0.(ii)E x(e−qT0)is continuous in x for each q>0andlim x→0E x(e−T0)=1and limx→∞E x(e−T0)=0.A proof to this Lemma can be found in[34]pp.549–550.Then we have from[7]that X is a Markov process with Feller semigroup and its resolvent{U q,q>0}satisﬁesU q f(x)=V q f(x)+E x(e−qT0)U q f(0),x>0,for f∈C b(R+)={f:R+→R,continuous and bounded}.That is X is an extension of the minimal process.Furthermore,if{X t,t≥0}is a Markov process extending the minimal one with Itˆo excursion measure n and local time at0,say{L t,t≥0},such thatE ( ∞0e−s dL s)=1,2.2Some properties of excursion measures for self–similar Markov process7 where E is the law for X .Then the process X and X are equivalent and the Itˆo’s excursion measure for X is n.Thus,the results in[7]establish a one to one relation between excursion measures and recurrent extensions of Markov process.Given an excursion measure n we will say that theassociated extension of the minimal process leaves0continuously a.s.if n(X0+>0)=0or equivalently,in terms of its entrance law,lim s→0n s(B c)=0for every neighborhood Bof0,see e.g.[7];if n is such that n(X0+=0)=0,we will say that the extension leaves0by jumps a.s.The latter condition on n is equivalent to the existence of a jumping–inmeasureη,that isηis aσ–ﬁnite measure on]0,∞[such that the entrance law associatedto n can be expressed asn s f=n(f(X s),s<T0)= ]0,∞[η(dx)P s f(x),s>0,for every f∈C b(R+),cf.Meyer[25].Finally,observe that if n is a pseudo excursion measure that does not satisﬁes the condition(iii),one can still realize a Poisson point process of excursions on(D+,G∞)withcharacteristic measure n but we can not form a process extending the minimal one by sticking together the excursions because the sum of lengths s≤t T0(Y s),is inﬁnite P-a.s.for every t>0.2.2Some properties of excursion measures for self–similarMarkov processNext,we deduce necessary and suﬃcient conditions that must be satisﬁed by an excursionmeasure in order that the associated recurrent extension of the minimal process to be self–similar.For c∈R,let H c be the dilatation H c f(x)=f(cx).Lemma2.Let n be an excursion measure and X the associated recurrent extension ofthe minimal process.The following are equivalent(i)The process X has the scaling property(ii)there existsγ∈]0,1[such that for any c>0,n( T00e−qs f(X s)ds)=c(1−γ)/αn( T00e−(qc1/αs)H c f(X s)ds),for f∈C b(R+).(iii)there existsγ∈]0,1[such that for any c>0,n s f=c−γ/αn s/c1/αH c f for all s>0,for f∈C b(R+).Remark If one of the conditions(i–iii)in the previous Lemma holds,then the subordi-natorσwhich is the inverse local time of X,is a stable subordinator of parameterγ,withγdetermined in the condition(ii)or(iii).Proof.(ii)⇐⇒(iii)is straightforward.82PRELIMINARIES AND FIRST RESULTS(i)⇒(ii).Suppose that there exists an excursion measure n such that the associated recurrent extension X has the scaling property(1).Let M be the random set of zeros for the process X,i.e.M={t≥0| X(t)=0}.By construction M is the closed range of the subordinatorσ=(σt,t≥0),that is M is a regenerative set.The recurrence of X implies that M is unbounded a.s.By the scaling property for X we have thatM=d c M,for each c>0,that is M is self–similar.Thus the subordinator should have the scaling property andsince the only L´e vy processes that have the scaling property are the stable processes itfollows thatσis a subordinator stable of parameterγfor someγ∈]0,1[or in terms of itsLaplace exponentφ(λ)=n(1−e−λT0)=λγ,λ>0.Recall that the scaling property forthe extension can be stated in terms of its resolvent by saying that for any c>0,U q f(x)=c1/αU qc1/αH c f(x/c),for all x≥0,(5) for f∈C b(R+).Using the compensation formula for Poisson point processes we get thatU q f(0)=n( T00e−qs f(X s)ds)n(1−e0),(6)From equation(5)we have that the measure n should be such thatn( T00e−qs f(X s)ds) n(1−e0)=c1/αn( T00e−qc1/αs H c f(X s)ds)n(1−e−qc1/αT0),and therefore we conclude thatn( T00e−qs f(X s)ds)=c(1−γ)/αn( T00e−(qc1/αs)H c f(X s)ds).(ii)⇒(i).The scaling property of X is obtained by means of(5).In fact,the only thing that should be veriﬁed is that equation(5)holds for x=0,since we have the identity U q f(x)=V q f(x)+E x(e−qT0)U q f(0),x>0,and the scaling property of the minimal process stated in terms of its resolvent V q,i.e.V q f(x)=c1/αV qc1/αH c f(x/c),x>0,c>0,q>0.Indeed,by construction it follows that the formula(6)holds and the hypothesis(ii)implies that n(1−e−qT0)=qγ,q>0;the conclusion is immediate.In the following proposition we give a description of the sojourn measure of X and a necessary condition for the existence of a excursion measure n such that one of the conditions in the Lemma2holds.Lemma3.Let n be a normalized excursion measure and X the associated extension of the minimal process(X,T0).Assume that one of the conditions(i–iii)in Lemma2holds. Thenn( T001{X s∈dy}ds)=Cα,γy(1−α−γ)/αdy,y>0,withγdetermined in(ii)of Lemma2and Cα,γ∈]0,∞[a constant.As a consequence, E(I−(1−γ))<∞and Cα,γ=(αE(I−(1−γ))Γ(1−γ))−1,where I denote the exponential functional(2).2.2Some properties of excursion measures for self–similar Markov process9Proof.Recall that the sojourn measuren( T001X s∈dy ds)= ∞0n s(dy)ds,is aσ–ﬁnite measure on]0,∞[and is the unique excessive measure for the semigroup of the process X,see e.g.Dellacherie et al.[12]XIX.46.Next,using the result(iii)inLemma2and the Fubini’s Theorem we obtain the following representation of the sojournmeasure,for f≥0measurable∞0n s fds= ∞0s−γn1(H sαf)ds= n1(dz) ∞0s−γf(sαz)ds=Cα,γ ∞0u(1−α−γ)/αf(u)du,with0<Cα,γ=α−1 n1(dz)z−(1−γ)/α<∞.This proves theﬁrst part of the claimedresult.We now prove that E(I−(1−γ))<∞.On the one hand,the functionϕ(x)=E x(e−T0)is integrable with respect to the sojourn measure.To see this,use the Markovproperty under n,to obtainn( T00ϕ(X s)ds)= ∞0n(ϕ(X s),s<T0)ds= ∞0n(e−T0◦θs,s<T0)ds= ∞0n(e−(T0−s),s<T0)ds=n(1−e−T0)=1.On the other hand,using the representation of the sojourn measure,Fubini’s Theoremand the scaling property we have thatCα,γ ∞0E y(e−T0)y(1−α−γ)/αdy=Cα,γ ∞0E(e−y1/αI)y(1−α−γ)/αdy=Cα,γαE(I−(1−γ))Γ(1−γ).Therefore,E(I−(1−γ))<∞and Cα,γ=(αE(I−(1−γ))Γ(1−γ))−1.We next study the extensions X that leave0a.s.by ing only the scaling property(1)it can be veriﬁed that the only possible jumping–in measures such that theassociated excursion measure satisﬁes(ii)in Lemma2should be of the typeη(dx)=bα,βx−(1+β)dx,x>0,0<αβ<1,with a constant bα,β>0,depending onαandβ,cf.[34].This being said we can state anelementary but satisfactory result on the existence of extensions of the minimal processthat leaves0by jumps a.s.102PRELIMINARIES AND FIRST RESULTSProposition 1.Let β∈]0,1/α[.The following are equivalent(i)E (I αβ)<∞,(ii)The pseudo excursion measure n j =P η,based on the jumping–in measure η(dx )=x −(1+β)dx,x >0,is an excursion measure,(iii)the minimal process (X,T 0)admits an extension X,that is a self–similar recurrent Markov process and leaves 0by jumps a.s.according to the jumping–in measureη(dx )=b α,βx −(1+β)dx,with b α,β=β/E (I αβ)Γ(1−αβ).If one of these conditions holds then γin (ii)in Lemma 2is equal to αβ.The condition (i)in Proposition 1is easily veriﬁed under weak technical assumptions.Namely,if we assume the hypothesis (H2)the aforementioned condition is veriﬁed for every β∈]0,(1/α)∧θ[;this will we deduced from Lemma 4below.On the other hand,that condition is veriﬁed in other settings as can be viewed in the following example.Example 1(Generalized self–similar saw tooth processes).Let α>0,ζa sub-ordinator such that E (ζ1)<∞,and X the α–self–similar process associated to the L´e vy process ξ=−ζ.Then ξdrifts to −∞,X has a ﬁnite lifetime T 0and X decreases from its starting point until the time T 0,when it is absorbed at 0.Furthermore,it was proved by Carmona et al.[10]that the L´e vy exponential functional I = ∞0exp {−ζs /α}ds,has ﬁnite integral moments of all orders.It follows that the condition (i)in Proposition 1is satisﬁed by every β∈]0,1/α[.Thus for each β∈]0,1/α[the α–self–similar extension X that leaves 0by jumps according to the jumping–in measure in (iii)of Proposition 1,is a process having sample paths that looks like a saw with “rough”tooths.These are all the possible extensions of X,that is,it is impossible to construct an excursion measure such that its associated extension of (X,T 0)leaves 0continuously a.s.since we know that the process X decreases to 0.Proof of Proposition 1.Let η(dx )=x −(1+β)dx,x >0and n j be the pseudo excursion measure n j =P η.By deﬁnition the entrance law associated to n j is n j s f =∞0dx x −(1+β)P s f (x ),s >0.Thus the only thing that should be veriﬁed by n j to be an excursion measure is that n j (1−e −T 0)<∞.This follows from the elementary calculation ∞0dx x −(1+β)E x (1−e −T 0)= ∞0dx x −(1+β)E (1−e −x 1/αI )=αEdy y −αβ−1(1−e −yI ) =E (I αβ)Γ(1−αβ)β.That is,n j (1−e −T 0)<∞if and only if E (I αβ)<∞,which proves the equivalence between the assertions in (i)and (ii).If (ii)holds it follows from the results in [7]and the Lemma 2that associated to the normalized excursion measure n j =b α,βP ηthere exists a unique extension of the minimal process (X,T 0)that is a self–similar Markov process and that leaves 0by jumps according to the jumping–in measure b α,βx −(1+β)dx,x >0,whichestablish (iii).Conversely,if (iii)holds the Itˆo ’s excursion measure of X,is n j =b α,βP ηand the statement in (ii)follows.2.3The process X analogue to the Bessel(3)process112.3The process X analogue to the Bessel(3)processHere we shall establish the existence of a self–similar Markov process X that can beviewed as the self–similar Markov process(X,T0)conditioned to never hit0.In the case(X,T0)is a Brownian motion killed at0,X corresponds to the Bessel(3)process.Tothat end,we next recall some facts on L´e vy processes and density transformations anddeduce some consequence for self–similar Markov processes.We assume henceforth(H2).The law of a L´e vy processξ,is characterized by a functionΨ:R→C,deﬁned by the relationE(e iuξ1)=exp{−Ψ(u)},u∈R.The functionΨis called the characteristic exponent of the L´e vy processξand can beexpressed thank to the L´e vy–Khintchine’s formula asΨ(u)=iau+σ2u22+ R(1−e iux+iux1{|x|<1})Π(dx),whereΠis a measure on R\{0}such that (|x|2∧1)Π(dx)<∞.The measureΠis called the L´e vy measure,a the drift andσ2the Gaussian coeﬃcient ofξ.Conditions(H2-b,c) imply that the L´e vy exponent ofξadmits an analytic extension to the complex strip I(z)∈[−θ,0].Thus we can deﬁne a functionψ:[0,θ]→R byE(eλξ1)=eψ(λ)andψ(λ)=−Ψ(−iλ),0≤λ≤θ.Holder’s inequality implies thatψis a convex function and thatθis the unique solution to the equationψ(λ)=0forλ>0.Furthermore,the function h(x)=eθx is invariant for the semigroup ofξ.Let P be the h–transform of P via the invariant function h(x)=eθx.That is,the measure P is the unique measure on(D,D)such that for everyﬁnite D t-stopping time T and each A∈D TP (A)=P(eθξT A).Under P the process(ξt,t≥0)still is a L´e vy process,with characteristic exponentΨ (u)=Ψ(u−iθ),u∈R,and drifts to∞,more precisely,0<m :=E (ξ1)=ψ (θ−)<∞.See e.g.Sato[32]§33,for a proof of these facts and more about this change of measure.Let P x denote the law on D+of the self–similar Markov process started at x>0 associated to the L´e vy processξ via Lamperti’s transformation.In the sequel it will be implicit that the superscript refers to the measures P or P .We now establish a relation between the probability measures P and P analogue to that between the law of a Brownian motion killed at0and the law of a Bessel(3)process,see e.g.McKean[23]. Informally,the law P x can be interpreted as the law under P x of X conditioned to never hit0.Proposition2.(i)Let x>0arbitrary,we have that P x is the unique measure such that for every T stopping time in G t we haveP x(A)=x−θP x(A XθT,T<T0),for any A∈G T.In particular,the function h∗:[0,∞[→[0,∞[deﬁned by h∗(x)=xθis invariant for the semigroup P t.122PRELIMINARIES AND FIRST RESULTS(ii)For every x>0and t>0we haveP x(A)=lims→∞P x(A|T0>s),for any A∈G t.The proof of(i)in Proposition2is a straightforward consequence of the fact that P is the h–transform of P and that for every T stopping time in G t we have thatτ(T)isan stopping time in F t.To prove(ii)in Proposition2we need the following lemma thatprovides us a tail estimation for the law of the L´e vy exponential functional I associatedtoξas deﬁned in(2).Lemma4.Under the conditions(H2)we have thatlimt→∞tαθP(I>t)=C,where0<C=αm tαθ−1(P(I>t)−P(eξ 1I>t))dt<∞,withξ 1=dξ1and independent of I.If0<αθ<1,thenC=αmE(I−(1−αθ)).Two proofs of this result have been given in a slight restrictive setting by Mejane[24]. However,one of its proofs can be extended to our case and in fact it is an easy consequence of a result on random equations originally due to Kesten[21]who in turn uses a diﬃcult result on random matrices.A simpler proof of Kesten’s result was given in Goldie[19]. Sketch of proof of Lemma4.It is straightforward that the L´e vy exponential functional I satisﬁes the equation in lawI=d 10eξs/αds+eξ1/αI =Q+MI ,with I the L´e vy exponential functional associated toξ ={ξ t=ξ1+t−ξ1,t≥0},a L´e vy process independent of F1and with the same distribution asξ.Thus,according to[21] if the conditions(i–iv)below are satisﬁed then there exists a strictly positive constant C such thatlimt→∞tαθP(I>t)=C.The hypotheses of Kesten’s Theorem are(i)M is not arithmetic(ii)E(Mαθ)=1,(iii)E(Mαθln+(M))<∞,(iv)E(Qαθ)<∞.。

智能科学与技术导论课件第4章

预处理生成的特征可以仍然用数值来表示，也可以用拓扑关系、逻辑结构等其它形式来表示，分别适用于不同的模式识别方法。
4.1 模式识别概述
4.1.4 模式识别原理与过程
3.特征提取和选择
从大量的特征中选取出对分类最有效的有限特征，降低模式识别过程的计算复杂度，提高分类准确性，是特征提取和选择环节的主要任务，目的都是为了降低特征的维度，提高所选取的特征对分类的有效性。
4.1 模式识别概述
4.1.2 模式识别的基本概念
3.有监督学习与无监督学习
模式识别的核心是分类器，在已经确定分类器模型和样本特征的前提下，分类器通过某些算法找到自身最优参数的过程，称为分类器的训练，也称为分类器的“学习”。
根据训练样本集是否有类别标签，可以分为有监督学习和无监督学习。（1）有监督学习
1936年，英国学者Ronald Aylmer Fisher提出统计分类理论，奠定了统计模式识别的基础。 1960年，美国学者Frank Rosenblatt提出了感知机。 60年代，L.A.Zadeh（乍得）提出了模糊集理论，基于模糊数学理论的模糊模式识别方法得以发展和应用。
4.1 模式识别概述
由于过分追求训练样本集中样本的分类的正确性，从而导致的分类器泛化能力降低，称为分类器训练过程中“过拟合”。
4.1 模式识别概述
4.1.3 模式识别的基本方法
1.统计模式识别
统计模式识别原理： 1）根据待识别对象所包含的原始数据信息，从中提取出若干能够反映该类对象某方面性质的相应特征参数，并根据识别的实际需要从中选择一些参数的组合作为一个特征向量。 2）依据某种相似性测度，设计一个能够对该向量组表示的模式进行区分的分类器，就可把特征向量相似的对象分为一类。统计模式识别是主流的模式识别方法，其将样本转换成多维特征空间中的点，再根据样本的特征取值情况和样本集的特征值分布情况确定分类决策规则。其主要的理论基础包括概率论和数理统计；主要方法包括线性分类、非线性分类、Bayes分类器、统计聚类算法等。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

As mentioned above, our algorithm is inspired by the IRLS method. Recently, several methods similarly motivated by the IRLS have been proposed to minimize different objective functions. For instance, in [2], the Lq norm (for 1 ≤ q < 2) was minimized by iteratively minimizing a weighted L2 cost function. In [16], an iterated L1 algorithm was introduced to optimize non-convex functions that are the sum of convex data terms and concave smoothness terms. More recently, a general formulation (not restricted to weighted L2 or L1 minimization) was studied, together with the conditions under which such iteratively reweighted algorithms ensure the cost to decrease [1]. In the next section, we propose an extension of this formulation that will later allow us to tackle the case of multi-label MRFs.
Abstract
While widely acknowledged as highly effective in computer vision, multi-label MRFs with non-convex priors are difﬁcult to optimize. To tackle this, we introduce an algorithm that iteratively approximates the original energy with an appropriately weighted surrogate energy that is easier to minimize. Our algorithm guarantees that the original energy decreases at each iteration. In particular, we consider the scenario where the global minimizer of the weighted surrogate energy can be obtained by a multi-label graph cut algorithm, and show that our algorithm then lets us handle of large variety of non-convex priors. We demonstrate the beneﬁts of our method over state-of-the-art MRF energy minimization techniques on stereo and inpainting problems.
1. Introduction
In this paper, we introduce an algorithm to minimize the energy of multi-label Markov Random Fields (MRFs) with non-convex edge priors. In general, minimizing a multilabel MRF energy function is NP-hard. While in rare cases a globally optimal solution can be obtained in polynomial time, e.g., in the presence of convex priors [10], in most scenarios one has to rely on an approximate algorithm. Even though graph-cut-based algorithms [6] have proven successful for speciﬁc problems (e.g., metric priors), there does not seem to be a single algorithm that performs well with different non-convex priors such as the truncated quadratic, the Cauchy function and the corrupted Gaussian. Here, we propose to ﬁll this gap and introduce an iterative graph-cut-based algorithm to minimize multi-label MRF energies with a certain class of non-convex priors. Our algorithm iteratively minimizes a weighted surrogate energy function that is easier to optimize, with weights computed from the solution at the previous iteration. We show that, under suitable conditions on the non-convex priors, and as long as the weighted surrogate energy can be decreased, our approach guarantees that the true energy de-
creases at each iteration. More speciﬁcally, we consider MRF energies with arbitrary data terms and where the non-convex priors are concave functions of some convex priors over pairs of nodes. In this scenario, and when the label set is linearly ordered, the solution at each iteration of our algorithm can be obtained by applying the multi-label graph cut algorithm of [10]. Since the resulting solution is optimal, our algorithm guarantees that our MRF energy decreases. In fact, our method is inspired by the Iteratively Reweighted Least Squares (IRLS) algorithm which is wellknown for continuous optimization. To the best of our knowledge, this is the ﬁrst time that such a technique is transposed to the MRF optimization scenario. We demonstrate the effectiveness of our algorithm on the problems of stereo correspondence estimation and image inpainting. Our experimental evaluation shows that our method consistently outperforms other state-of-the-art graph-cut-based algorithms [6, 24], and, in most scenarios, yields lower energy values than TRW-S [13], which was shown to be one of the best-performing multi-label approximate energy minimization methods [22, 12].raph Cut for Multi-label MRFs with Non-convex Priors
Thalaiyasingam Ajanthan, Richard Hartley, Mathieu Salzmann, and Hongdong Li Australian National University & NICTA Canberra, Australia
1.1. Related work
Over the years, two different types of approximate MRF energy minimization methods have been proposed. The ﬁrst class of such methods consists of move-making techniques that were inspired by the success of the graph cut algorithm at solving binary problems in computer vision. These techniques include α-expansion, α-β swap [6] and multi-label moves [24, 23, 11]. The core idea of these methods is to reduce the original multi-label problem to a sequence of binary graph cut problems. Each graph cut problem can then be solved either optimally by the max-ﬂow algorithm [5] if the resulting binary energy is submodular, or approximately via a roof dual technique [4] otherwise. The second type of approximate energy minimization methods consists of message passing algorithms, such as belief propagation (BP) [8], tree-reweighted message passing (TRW) [25, 13] and the dual decomposition-based approach of [14], which TRW is a special case of.