Semi-supervised learning of hierarchical latent trait models for data visualization
参考文献的标准格式1
参考文献著录格式标准规范参考文献应引作者亲自阅过的近年的主要文献,一般文章不超过5篇,评述、综述性文章不超过20篇。
未公开发表的资料请勿引用,但可作脚注处理。
参考文献列于文后,正文中也须用上角标标出引用文献的序号。
参考文献用“顺序编码制”,即各篇文献按其在正文中的标注序号依次列出。
参考文献条目著录:个人著者采用姓在前,名在后的著录格式。
作者3人以下全部著录,4人以上只著录前3人,之间加“,”,后加“等” 或“etal”。
1.参考文献类型及标识2.文后参考文献编排格式(1)专著、论文集、学位论文、报告[序号]作者.文献题名[文献类型标识] .出版地:出版者.出版年,起止页码(任选).范例:[1]杨浩滨.食品微生物学[M].北京:北京农业大学出版社,1995,28-30.(2) 期刊文章[序号]主要责任者.文献题名[J].刊名,年,卷(期):起止页码.范例:[5]肖凯军.大豆分离蛋白的酶法改性[J]. 食品科学,1995,16(9):30-34.范例:[7]OUJP,YOSHIDA O,SOONGT, etal. Recent advance in research on applications energy dissipation systems[J]. Earthquake Eng , 1997,38(3): 358-361.(3) 论文集[序号]作者.文献题名[A].编者.论文题名[C].出版地:出版者,出版年,起止页码.范例:[8]瞿秋白.现代文明的问题与社会主义[A] .罗荣渠.从西化到现代化[C].北京:北京大学出版社,1990,121-133.(4) 报纸文章[序号]作者.文献题名[N].报纸名,出版日期(版次).范例:[10] 胡鞍钢. 中国能够实现粮食自给目标[N].联合早报,1994,10.(5) 国际、国家标准[序号]标准编号,标准名称[S].范例:[11]GB/T 16159-1996 ,汉语拼音正词法基本规则[S]. (6) 专利[序号]专利所有者.专利题名[P].专利国别:专利号,出版日期.范例:[12]姜锡洲. 一种温热外敷药制备方案[P].中国专利:881056073,1989,07,26.(7) 电子文献[序号]作者.电子文献题名[电子文献及载体类型标名].电子文献的出处或可获得地址,发表或更新日期/引用日期(任选).范例:[13]王明亮.关于中国学术期刊标准化数据库系统工程的进展[EB OL]. , 1998,08,16/1998,10,04.(8)各种未定义类型的文献[序号]主要责任者.文献题名.出版地:出版者,出版年.范例:[15]张永禄. 唐代长安词典[Z].西安:陕西人民出版社,1980. 毕业论文参考文献规范格式专著-M;论文集-C;报刊-N;期刊文章-J;学位论文-D;报告-R;专著或论文集中析出的文献-A;标准-S;专利- P;对于不属于上述的文献类型,采用字母“Z”标识。
半监督学习综述
缺点:误差也同时会自我迭代放大。
返回
19
协同训练(Co-training)
1998 年,Blum 和 Mitchell[11]提出了协同训练 方法。如图 3 所示,协同训练方法的基本训练过程 为:在有类标签的样本的两个不同视图(View)上分 别训练,得到两个不同的学习机,然后用这两个学 习机预测无类标签的样例的类标签,每个学习机选 择标记结果置信度最高的样例和它们的类标签加 入另一个学习机的有类标签的样本集中。这个过程 反复迭代进行,直到满足停止条件。这个方法需要 满足两个假设条件:(1)视图充分冗余(Sufficient and Redundant)假设,即给定足够数量的有类标签的样 本,基于每个视图都能通过训练得到性能很好的学 习机;(2)条件独立假设,即每个视图的类标签都条 件独立于另一视图给定的类标签。
17
半监督学习的主要方法
半监督学习算法按照不同的模型假设,可以大致将现 有的半监督学习算法分为五类:
自学习(Self-training) 基于生成模型的方法(EM with generative mixture
models) 协同训练(Co-training) 直推式支持向量机 (Transductive Support Vector
1
模式识别
模式识别(英语:Pattern Recognition),就是通过 计算机用数学技术方法来研究模式的自动处理和判读。 我们把环境与客体统称为“模式”。随着计算机技术 的发展,人类有可能研究复杂的信息处理过程。信息 处理过程的一个重要形式是生命体对环境及客体的识 别。对人类来说,特别重要的是对光学信息(通过视 觉器官来获得)和声学信息(通过听觉器官来获得) 的识别。这是模式识别的两个重要方面。市场上可见 到的代表性产品有光学字符识别、语音识别系统
Semi-Supervised Learning by
Under consideration for publication in Knowledge and InformationSystemsSemi-Supervised Learning by DisagreementZhi-Hua Zhou and Ming LiNational Key Laboratory for Novel Software TechnologyNanjing University,Nanjing210093,ChinaAbstract.In many real-world tasks there are abundant unlabeled examples but the number of labeled training examples is limited,because labeling the examples requires human efforts and expertise.So,semi-supervised learning which tries to exploit unla-beled examples to improve learning performance has become a hot topic.Disagreement-based semi-supervised learning is an interesting paradigm,where multiple learners are trained for the task and the disagreements among the learners are exploited during the semi-supervised learning process.This survey article provides an introduction to research advances in this paradigm.Keywords:Machine Learning;Data Mining;Semi-Supervised Learning;Disagreement-Based Semi-Supervised Learning1.IntroductionIn traditional supervised learning,hypotheses are learned from a large number of training examples.Each training example has a label which indicates the desired output of the event described by the example.In classification,the label indicates the category into which the corresponding example falls into;in regression,the label is a real-valued output such as temperature,height,price,etc.Advances in data collection and storage technology enable the easy accu-mulation of a large amount of training instances without labels in many real-world applications.Assigning labels to those unlabeled examples is expensive because the labeling process requires human efforts and expertise.For example, in computer-aided medical diagnosis,a large number of X-ray images can be obtained from routine examination,yet it is difficult to ask physicians to markReceived October16,2008Revised March16,2009Accepted April03,20092Z.-H.Zhou and M.Li all focuses in all images.If we use traditional supervised learning techniques to build a diagnosis system,then only a small portion of training data,on which the focuses have been marked,are useful.Due to the limited amount of labeled training examples,it may be difficult to get a strong diagnosis system.Then,a question arises:Can we leverage the abundant unlabeled training examples with a few labeled training examples to generate a strong hypothesis?Roughly speak-ing,there are three major techniques for this purpose[82],i.e.,semi-supervised learning,transductive learning and active learning.Semi-supervised learning[21,92]deals with methods for automatically ex-ploiting unlabeled data in addition to labeled data to improve learning per-formance,where no human intervention is assumed.Transductive learning is a cousin of semi-supervised learning,which also tries to exploit unlabeled data au-tomatically.The main difference between them lies in the different assumptions on the test data.Transductive learning takes a“close-world”assumption,i.e., the test data set is known in advance and the goal of learning is to optimize the generalization ability on this test data set,while the unlabeled examples are exactly the test examples.Semi-supervised learning takes an“open-world”assumption,i.e.,the test data set is not known and the unlabeled examples are not necessary test examples.In fact,the idea of transductive learning originated from statistical learning theory[69].Vapnik[69]believed that one often wants to make predictions on test examples at hand instead of on all potential examples, while inductive learning that seeks the best hypothesis over the whole distri-bution is a problem more difficult than what is actually needed;we should not try to solve a problem by solving a more difficult intermediate problem,and so, transductive learning is more appropriate than inductive learning.Up to now there is still a debate in the machine learning community on this learning phi-losophy.Nevertheless,it is well recognized that transductive learning provides an important insight into the exploitation of unlabeled data.Active learning deals with methods that assume that the learner has some control over the input space.In exploiting unlabeled data,it requires an oracle, such as a human expert,from which the ground-truth labels of instances can be queried.The goal of active learning is to minimize the number of queries for building a strong learner.Here,the key is to select those unlabeled examples where the labeling will convey the most helpful information to the learner.There are two major schemes,i.e.,uncertainty sampling and committee-based sampling. Approaches of the former train a single learner and then query the unlabeled example on which the learner is least confident[45];approaches of the latter generate multiple learners and then query the unlabeled example on which the learners disagree to the most[1,63].In this survey article,we will introduce an interesting and important semi-supervised learning paradigm,i.e.,disagreement-based semi-supervised learning. This line of research started from Blum&Mitchell’s seminal paper on co-training [13]1.Different relevant approaches have been developed with different names, and recently the name disagreement-based semi-supervised learning was coined [83]to reflect the fact that they are actually in the same family,and the key for the learning process to proceed is to maintain a large disagreement between base learners.Although transductive learning or active learning may be involved in some place,we will not talk more on them.In the following we will start by a 1This seminal paper won the“ten years best paper award”at ICML’08.Semi-Supervised Learning by Disagreement3 brief introduction to semi-supervised learning,and then we will go to the main theme to introduce representative disagreement-based semi-supervised learning approaches,theoretical foundations,and some applications to real-world tasks.2.Semi-Supervised LearningIn semi-supervised learning,a labeled training data set L={(x1,y1),(x2,y2),...,(x|L|,y|L|)}and an unlabeled training data set U={x 1,x 2,...,x|U|}are pre-sented to the learning algorithm to construct a function f:X→Y for predicting the labels of unseen instances,where X and Y are respectively the input space and output space,x i,x j∈X(i=1,2,...,|L|,j=1,2,...,|U|)are d-dimensional feature vectors drawn from X,and y i∈Y is the label of x i;usually|L| |U|.It is well-known that semi-supervised learning originated from[64].In fact, some straightforward use of unlabeled examples appeared even earlier[40,50, 52,53,57].Due to the difficulties in incorporating unlabeled data directly into conventional supervised learning methods(e.g.,BP neural networks)and the lack of a clear understanding of the value of unlabeled data in the learning process, the study of semi-supervised learning attracted attention only after the middle of1990s.As the demand for automatic exploitation of unlabeled data increases and the value of unlabeled data was disclosed by some early analyses[54,78], semi-supervised learning has become a hot topic.Most early studies did not provide insight or explanation to the reason why unlabeled data can be benefiler and Uyar[54]provided possibly the first explanation to the usefulness of unlabeled data from the perspective of data distribution estimation.They assumed that the data came from a Gaussian mixture model with L mixture components,i.e.,f(x|θ)=Ll=1αl f(x|θl),(1)whereαl is the mixture coefficient satisfying Ll=1αl=1,whileθ={θl}arethe model parameters.In this case,label c i can be considered a random variable C whose distribution P(c i|x i,m i)is determined by the mixture component m i and the feature vector x i.The optimal classification rule for this model is the MAP(maximum a posterior)criterion,that is,h(x)=arg maxkjP(c i=k|m i=j,x i)P(m i=j|x i),(2)whereP(m i=j|x i)=αj f(x i|θj)Ll=1αl f(x i|θl).(3)Thus,the objective of learning is accomplished by estimating the terms P(c i=k|m i=j,x i)and P(m i=j|x i)from the training data.It can be found that only the estimate of thefirst probability involves the class label.So,unla-beled examples can be used to improve the estimate of the second probability,and hence improve the performance of the learned ter,Zhang and Oles4Z.-H.Zhou and M.Li [78]analyzed the value of unlabeled data for parametric models.They suggested that if a parametric model can be decomposed as P(x,y|θ)=P(y|x,θ)P(x|θ), the use of unlabeled examples can help to reach a better estimate of the model parameters.There are two basic assumptions in semi-supervised learning,that is,the cluster assumption and the manifold assumption.The former assumes that data with similar inputs should have similar class labels;the latter assumes that data with similar inputs should have similar outputs.The cluster assumption concerns classification,while the manifold assumption can also be applied to tasks other than classification.In some sense,the manifold assumption is a generalization of the cluster assumption.These assumptions are closely related to the idea of low density separation,which has been taken by many semi-supervised learning algorithms.No matter which assumption is taken,the common underlying belief is that the unlabeled data provide some helpful information on the ground-truth data distribution.So,a key of semi-supervised learning is to exploit the distributional information disclosed by unlabeled examples.Many semi-supervised learning algorithms have been developed.Roughly speaking,they can be categorized into four categories,i.e.,generative meth-ods[54,56,64],S3VMs(Semi-Supervised Support Vector Machines)[22,37,42,44], graph-based methods[7–9,80,93],and disagreement-based methods[13,16,36, 48,55,85,88,89,91].In generative approaches,both labeled and unlabeled examples are assumed to be generated by the same parametric model.Thus,the model parameters directly link unlabeled examples and the learning objective.Methods in this category usually treat the labels of the unlabeled data as missing values of model parameters,and employ the EM(expectation-maximization)algorithm[29]to conduct maximum likelihood estimation of the model parameters.The methods differ from each other by the generative models used tofit the data,for example, mixture of Gaussian[64],mixture of experts[54],Na¨ıve Bayes[56],etc.The generative methods are simple and easy to implement,and may achieve better performance than discriminative models when learning with a very small number of labeled examples.However,methods in this category suffer from a serious deficiency.That is,when the model assumption is incorrect,fitting the model using a large number of unlabeled data will result in performance degradation [23,26].Thus,in order to make it effective in real-world applications,one needs to determine the correct generative model to use based on domain knowledge. There are also attempts of combining advantages of generative and discriminative approaches[4,33].S3VMs try to use unlabeled data to adjust the decision boundary learned from the small number of labeled examples,such that it goes through the less dense region while keeping the labeled data being correctly classified.Joachims [42]proposed TSVM(Transductive Support Vector Machine).This algorithm firstly initiates an SVM using labeled examples and assigns potential labels to unlabeled data.Then,it iteratively maximizes the margin over both labeled and unlabeled data with their potential labels byflipping the labels of the unlabeled examples on different sides of the decision boundary.An optimal solution is reached when the decision boundary not only classifies the labeled data as accu-rate as possible but also avoids going through the high density region.Chapelle and Zien[22]derived a special graph kernel using the low density separation cri-terion,and employed gradient descent to solve the SVM optimization problem. The non-convexity of the loss function of TSVM leads to the fact that there areSemi-Supervised Learning by Disagreement5 many local optima.Many studies tried to reduce the negative influence caused by the non-convexity.Typical methods include:employing a continuation approach, which begins by minimizing an easy convex objective function and sequentially deforms it to the non-convex loss function of TSVM[20];employing a determin-istic annealing approach,which decomposes the original optimization problem into a series of convex optimization problems,from easy to hard,and solves them sequentially[65,66];employing the convex-concave procedure(CCCP)[77] to directly optimize the non-convex loss function[25],etc.Thefirst graph-based semi-supervised learning method is possibly[11].Blum and Chawla[11]constructed a graph whose nodes are the training examples (both labeled and unlabeled)and the edges between nodes reflect certain re-lation,such as similarity,between the corresponding examples.Based on the graph,the semi-supervised learning problem can be addressed by seeking the minimum cut of the graph such that nodes in each connected component have the same ter,Blum et al.[12]disturbed the graph with some random-ness and produced a“soft”minimum cut using majority voting.Note that the predictive function in[11]and[12]is discrete,i.e.,the prediction on unlabeled ex-amples should be one of the possible labels.Zhu et al.[93]extended the discrete prediction function to continuous case.They modelled the distribution of the prediction function over the graph with Gaussian randomfields and analytically showed that the prediction function with the lowest energy should have the har-monic property.They designed a label propagation strategy over the graph using such a harmonic property.Zhou et al.[80]defined a quadratic loss of the predic-tion function over both the labeled and unlabeled data,and used a normalized graph Laplacian as the regularizer.They provided an iterative label propagation method yielding the same solution of the regularized loss function.Belkin and Niyogi[7]assumed that the data are distributed on a Riemannian manifold,and used the discrete spectrum and its eigenfunction of a nearest neighbor graph to reform the learning problem to interpolate over the data points in Hillbert space. Then,Belkin et al.[8,9]further extended the idea of manifold learning in semi-supervised learning scenario,and proposed manifold regularization framework in Reproducing Kernel Hillbert Space(RKHS).This framework directly exploits the local smoothness assumption to regularize the loss function defined over the labeled training examples such that the learned prediction function is biased to give similar output to the examples in a local region.Sindhwani et al.[67] embedded the manifold regularization into a semi-supervised kernel defined over the overall input space.They modified the original RKHS by changing the norm while keeping the same function space.This leads to a new RKHS,in which learning supervised kernel machines with only the labeled data is equivalent to a certain manifold regularization over both labeled and unlabeled data in original input space.Most of the previous studies on graph-based semi-supervised learning usually focus on how to conduct semi-supervised learning over a given graph.It is note-worthy that how to construct a graph which reflects the essential relationship between examples is a key that will seriously affect the learning performance. Although the graph construction might favor certain domain knowledge,some re-searchers have attempted to construct graphs of high quality using some domain-knowledge-independent properties.Carreira-Perpinan and Zemel[19]generated multiple minimum spanning trees based on perturbation to construct a robust graph.Wang and Zhang[70]used the idea of LLE[60]that instances can be reconstructed by their neighbors to obtain weights over the edges in the graph.6Z.-H.Zhou and M.Li Zhang and Lee[79]selected a better RBF bandwidth to minimize the predic-tive error on labeled data using cross validation.Hein and Maier[39]attempted to remove noisy data and hence obtained a better graph.Note that,although graph-based semi-supervised learning approaches have been used in many ap-plications,they suffer seriously from poor scalability.This deficiency has been noticed and some efforts have been devoted to this topic[34,76,94].Recently, Goldberg et al.[35]proposed an online manifold regularization framework as well as efficient solutions,which improves the applicability of manifold regularization to large-scale and real-time problems.The name disagreement-based semi-supervised learning was coined recently by Zhou[83],but this line of research started from Blum&Mitchell’s seminal work[13].In those approaches,multiple learners are trained for the same task and the disagreements among the learners are exploited during the learning process. Here,unlabeled data serve as a kind of“platform”for information exchange.If one learner is much more confident on a disagreed unlabeled example than other learner(s),then this learner will teach other(s)with this example;if all learners are comparably confident on a disagreed unlabeled example,then this example may be selected for query.Since methods in this category do not suffer from the model assumption violation,nor the non-convexity of the loss function,nor the poor scalability of the learning algorithms,disagreement-based semi-supervised learning has become an important learning paradigm.In the following sections, we will review studies of this paradigm in more detail.3.Disagreement-Based Semi-Supervised LearningA key of disagreement-based semi-supervised learning is to generate multiple learners,let them collaborate to exploit unlabeled examples,and maintain a large disagreement between the base learners.In this section,we roughly classify existing disagreement-based semi-supervised learning techniques into three cat-egories,that is,learning with multiple views,learning with single view multiple classifiers,and learning with single view multiple regressors.3.1.Learning with Multiple ViewsIn some applications,the data set has several disjoint subsets of attributes(each subset is called as a view).For example,the web page classification task has two views,i.e.,the texts appearing on the web page itself and the anchor text attached to hyper-links pointing to this page[13].Naturally,we can generate multiple learners with these multiple views and then use the multiple learn-ers to start disagreement-based semi-supervised learning.Note that there were abundant research on multi-view learning,yet a lot of work was irrelevant to semi-supervised learning,and so they are not mentioned in this section.Thefirst algorithm of this paradigm is the co-training algorithm proposed by Blum and Mitchell[13].They assumed that the data has two sufficient and redundant views(i.e.,attribute sets),where each view is sufficient for training a strong learner and the views are conditionally independent to each other given the class label.The co-training procedure,which is illustrated in Fig.1,is rather simple. In co-training,each learner is generated using the original labeled data.Then,Semi-Supervised Learning by Disagreement7Fig.1.An illustration of the co-training procedureeach learner will select and label some high-confident unlabeled examples for its peer ter,the learners will be refined using the newly labeled examples provided by its peer.With such a process,when two learners disagree on an unlabeled example,the learner which misclassifies this example will be taught by its peer.The whole process will repeat until no learner changes or a pre-set number of learning rounds has been executed.Blum and Mitchell[13]analyzed the effectiveness of the co-training algorithm,and showed that co-training can effectively exploit unlabeled data to improve the generalization ability,given that the training data are described by sufficient and redundant views which are conditionally independent to each other given the class label.Another famous multi-view semi-supervised learning algorithm,co-EM[55], combines multi-view learning with the probabilistic EM approach.This algo-rithm requires the base learners be capable of estimating class probabilities, and so na¨ıve Bayes classifiers are generally used.Through casting linear classi-fiers into a probabilistic framework,Brefeld and Scheffer[15]replaced the na¨ıve Bayes classifiers by support vector machines.The co-EM algorithm has also been applied to unsupervised clustering[10].Brefeld et al.[14]tried to construct a hidden markov perceptron[3]in each of the two views,where the two hidden markov perceptrons were updated according to the heuristic that,if the two perceptrons disagree on an unlabled example then each perceptron is moved towards that of its peer view.Brefeld et al.[14]did not mention how to extend this method to more than two views,but it could be able to move the perceptrons towards their median peer view when they disagree, according to the essential of their heuristics.However,the convergence of the process has not been proved even for two-view case.Brefeld and Scheffer[16] extended SVM-2K[32],a supervised co-SVM that minimizes the training error as well as the disagreement between the two views,to semi-supervised learning and applied it to several tasks involving structured output variables such as multi-class classification,label sequence learning and natural language parsing.In real-world applications,when the data has two views,it is rarely that the two views are conditionally independent given the class label.Even a weak8Z.-H.Zhou and M.Li conditional independence[2]is difficult to be met in practice.In fact,the as-sumption of sufficient and redundant views,which are conditionally independent to each other given the class label,is so strong that when it holds,a single labeled training example is able to launch a successful semi-supervised learning[91].Zhou et al.[91]effectively exploited the“compatibility”of the two views to turn some unlabeled examples into labeled ones.Specifically,given two suffi-cient and redundant views v1and v2(in this case,an instance is represented byx=(x(v1),x(v2))),a prediction function f vi is learned from each view respec-tively.Since the two views are sufficient,the learned prediction functions satisfyf v1(x(v1))=f v2(x(v2))=y,where y is the ground-truth label of x.Intuitively,some projections in these two views should have strong correlation with the ground-truth.For either view,there should exist at least one projection which is correlated strongly with the ground-truth,since otherwise this view could not be sufficient.Since the two sufficient views are conditionally independent given the class label,the most strongly correlated pair of projections should be in accor-dance with the ground-truth.Thus,if such highly correlated projections of these two views can be identified,they can help induce the labels of some unlabeled ex-amples.With those additional labeled examples,two learners can be generated, and then they can be improved using the standard co-training routine,i.e.,if the learners disagree on an unlabeled example,the learner which misclassifies this example will be taught by its peer.To identify the correlated projections,Zhou et al.[91]employed kernel canonical component analysis(KCCA)[38]tofind two sets of basis vectors in the feature space,one for each view,such that after projecting the two views onto the corresponding sets of basis vectors,the corre-lation between the projected views is maximized.Here the correlation strength (λ)of the projections are also given by KCCA.Instead of considering only the highest correlated projections,they used top m projections with the m-highest correlation strength.Finally,by linearly combining similarity in each projection, they computed confidenceρi of each unlabeled example x i as being with the same label as that of the single labeled positive example x0,as shown in Eq.4,ρi=mj=1λj sim i,j,(4)wheresim i,j=exp−d2P j(x(v1)i),P j(x(v1))+exp−d2P j(x(v2)i),P j(x(v2))(5)and d(a,b)measures the Euclidian distance between a and b.Thus,several unlabeled examples with the highest and lowest confidence values can be picked out,respectively,and used as extra positive and negative examples.Based on this augmented labeled training set,standard co-training can be employed for semi-supervised learning.Again,intuitively,when the two learners disagree on an unlabeled example,the learner which misclassifies this example will be taught by its peer.Such kind of method has been applied to content-based image retrieval[91]where there is only one example image in the first round of query.Semi-Supervised Learning by Disagreement9 3.2.Learning with Single View Multiple ClassifiersIn most real-world applications the data sets have only one attribute set rather than two.So the effectiveness and usefulness of the standard co-training is lim-ited.To take advantage of the interaction between learners when exploiting un-labeled data,methods that do not rely on the existence of two views have been developed.A straightforward way to tackle this problem is to partition the attribute sets into two disjoint sets,and conduct standard co-training based on the manually generated views.Nigam and Ghani[55]empirically studied the performance of standard co-training algorithm in this case.The experimental results suggested that when the attribute set is sufficiently large,randomly splitting the attributes and then conducting standard co-training may lead to a good performance.How-ever,many applications are not described by a large number of attribute,and co-training on randomly partitioned views is not always effective.Thus,a better way is to design single-view methods that can exploit the interaction between multiple learners rather than tailoring the data sets for standard two-view co-training.Goldman and Zhou[36]proposed a method that does not rely on two views. They employed different learning algorithms to train the two classifiers,respec-tively.It is required that each classifier is able to partition the instance space into a number of equivalence classes.In order to identify which unlabeled ex-ample to label,and to decide how to make the prediction when two classifiers disagree,ten-fold cross validations are executed such that the confidences of the two classifiers as well as the confidences of the equivalence classes that contain the concerned instance can be ter,this idea was extended to in-volving more learning algorithms[81].Note that although[36]does not rely on the existence of two views,it requires special learning algorithms to construct classifiers.This prevents its application to other kinds of learning algorithms.Zhou and Li[88]proposed the tri-training method,which requires neither the existence of two views nor special learning algorithms,thus it can be applied to more real-world problems.In contrast to the previous studies[13,36,55],tri-training attempts to exploit unlabeled data using three classifiers.Such a setting tackles the problem of determining how to efficiently select most confidently predicted unlabeled examples to label and producesfinal hypothesis.Note that the essential of tri-training is extensible for more than three classifiers,which will be introduced later.The use of more classifiers also provides a chance to employ ensemble learning techniques[84]to improve the performance of semi-supervise learning.Generally,tri-training works in the following way.First,three classifiers are initially trained from the original labeled data.Unlike[36],tri-training uses the same learning algorithm(e.g.,C4.5decision tree)to generate the three classifiers. In order to make the three classifiers diverse,the original labeled example set is bootstrap sampled[31]to produce three perturbed training sets,on each of which a classifier is then generated.The generation of the initial classifiers is similar to training an ensemble from the labeled example set using Bagging[17].Then, intuitively,in each tri-training round,if two classifiers agree on the labeling of an unlabeled example while the third one disagrees,then these two classifiers will teach the third classifier on this example.Finally,three classifiers are combined by majority voting.Note that the“majority teaches minority”strategy serves as an implicit confidence measurement,which avoids the use of complicated time-。
半监督深度学习图像分类方法研究综述
半监督深度学习图像分类方法研究综述吕昊远+,俞璐,周星宇,邓祥陆军工程大学通信工程学院,南京210007+通信作者E-mail:*******************摘要:作为人工智能领域近十年来最受关注的技术之一,深度学习在诸多应用中取得了优异的效果,但目前的学习策略严重依赖大量的有标记数据。
在许多实际问题中,获得众多有标记的训练数据并不可行,因此加大了模型的训练难度,但容易获得大量无标记的数据。
半监督学习充分利用无标记数据,提供了在有限标记数据条件下提高模型性能的解决思路和有效方法,在图像分类任务中达到了很高的识别精准度。
首先对于半监督学习进行概述,然后介绍了分类算法中常用的基本思想,重点对近年来基于半监督深度学习框架的图像分类方法,包括多视图训练、一致性正则、多样混合和半监督生成对抗网络进行全面的综述,总结多种方法共有的技术,分析比较不同方法的实验效果差异,最后思考当前存在的问题并展望未来可行的研究方向。
关键词:半监督深度学习;多视图训练;一致性正则;多样混合;半监督生成对抗网络文献标志码:A中图分类号:TP391.4Review of Semi-supervised Deep Learning Image Classification MethodsLYU Haoyuan +,YU Lu,ZHOU Xingyu,DENG XiangCollege of Communication Engineering,Army Engineering University of PLA,Nanjing 210007,ChinaAbstract:As one of the most concerned technologies in the field of artificial intelligence in recent ten years,deep learning has achieved excellent results in many applications,but the current learning strategies rely heavily on a large number of labeled data.In many practical problems,it is not feasible to obtain a large number of labeled training data,so it increases the training difficulty of the model.But it is easy to obtain a large number of unlabeled data.Semi-supervised learning makes full use of unlabeled data,provides solutions and effective methods to improve the performance of the model under the condition of limited labeled data,and achieves high recognition accuracy in the task of image classification.This paper first gives an overview of semi-supervised learning,and then introduces the basic ideas commonly used in classification algorithms.It focuses on the comprehensive review of image classification methods based on semi-supervised deep learning framework in recent years,including multi-view training,consistency regularization,diversity mixing and semi-supervised generative adversarial networks.It summarizes the common technologies of various methods,analyzes and compares the differences of experimental results of different methods.Finally,this paper thinks about the existing problems and looks forward to the feasible research direction in the future.Key words:semi-supervised deep learning;multi-view training;consistency regularization;diversity mixing;semi-supervised generative adversarial networks计算机科学与探索1673-9418/2021/15(06)-1038-11doi:10.3778/j.issn.1673-9418.2011020基金项目:国家自然科学基金(61702543)。
semi-supervised learning半监督学习
Semi-supervised clustering
65
6
Concept of semi-supervised learning
Unsupervised learning
Learning from data without label, e.g., clustering
data
65
2
Is data enough?
• Big data era, obtaining data is getting easier and easier.
P2P网络流 量分类
65
27
Other disagree-based algorithms
• TriTrain TriTrain is a semi-supervised algorithm, which iteratively refines each of the three component classifiers , and finally combines their prediction via majority voting. • CoForest CoForest is a semi-supervised algorithm, which exploits the power of ensemble learning and large amount of unlabeled data available to produce hypothesis with better performance. • COREG COREG is a co-training style semi-supervised regression algorithm, which employs two k-NN regressors using different distance metrics to select the most confidently labeled unlabeled examples for each other.
基于深度CNN和极限学习机相结合的实时文档分类
基于深度CNN和极限学习机相结合的实时文档分类闫河;王鹏;董莺艳;罗成;李焕【摘要】提出一种文档图像实时分类训练和测试的方法.在实际应用中,数据训练的精确性和高效性在文档图像识别中起着关键的作用.现有的深度学习方法不能满足此要求,因为需要大量的时间用于训练和微调深层次的网络架构.针对此问题,提出一种基于计算机视觉的新方法:第一阶段训练深度网络,作为特征提取器;第二阶段用极限学习机(ELM)用于分类.该方法的性能优于目前最先进的基于深度学习的相关方法,在Tobacco-3482数据集上的最终准确率为83.45%.与之前基于卷积神经网络(CNN)的方法相比,相对误差降低了26%.ELM的训练时间仅为1.156秒,对2 482张图像的整体预测时间是3.083秒.因此,该文档分类方法适合于大规模实时应用.【期刊名称】《计算机应用与软件》【年(卷),期】2019(036)003【总页数】6页(P174-179)【关键词】文档图像分类;CNN;迁移学习【作者】闫河;王鹏;董莺艳;罗成;李焕【作者单位】重庆理工大学计算机科学与工程学院重庆401320;重庆理工大学两江人工智能学院重庆401147;重庆理工大学计算机科学与工程学院重庆401320;重庆理工大学计算机科学与工程学院重庆401320;重庆理工大学计算机科学与工程学院重庆401320;重庆理工大学计算机科学与工程学院重庆401320【正文语种】中文【中图分类】TP391.410 引言如今,商业文件(见图1)通常由文档分析系统(DAS)进行处理,以减少工作人员的工作量。
DAS的一项重要任务是对文档进行分类,即确定文档所指的业务流程的类型。
典型的文档类是发票、地址变更或索赔等。
文档分类方法可分为基于图像[1-6]和基于内容的方法[7-8]。
DAS选取哪一种方法更合适,通常取决于用户处理的文档。
像通常的字母一样,自由格式的文档通常需要基于内容的分类,而在不同布局中包含相同文本的表单则可以通过基于图像的方法来区分。
211018237_改进YOLOv5+DeepSort的行人跟踪算法
现代电子技术Modern Electronics Technique2023年4月1日第46卷第7期Apr.2023Vol.46No.70引言近年来,计算机视觉在行人多目标跟踪[1]领域得到广泛的应用,该技术的发展有助于视频监视、辅助驾驶、人机交互等[2⁃4]。
行人多目标跟踪算法大多基于Trace⁃By⁃Detection [5⁃7],即检测和跟踪,通过检测器将结果传输到跟踪器中,进行卡尔曼预测和匈牙利算法匹配,得到匹配轨迹,输出跟踪结果。
文献[8]提出了卷积块注意模块(Convolutional BlockAttention Module ,CBAM ),它可以嵌入任何卷积网络架构中,更加关注重要特征、抑制次要特征,有效细化了中间特征。
文献[9]介绍了一种全新的损失函数SIoU ,其中考虑到回归之间的向量角度,重新定义了惩罚指标,能有效提高训练速度和推理准确性。
文献[10]利用简单在线和实时跟踪Sort 算法,使用卡尔曼滤波器与匈牙利算法相结合的思想,在线性环境下取得了不错的效果,但在非线性环境下,跟踪效果较差。
文献[11]在Sort 的基础上提出了DeepSort 算法,较之前算法引入了外观信息,改进YOLOv5+DeepSort 的行人跟踪算法韩晓冰,王雨田,黄综浏,张玮良(西安科技大学通信与信息工程学院,陕西西安710000)摘要:针对复杂环境道路行人跟踪易发生身份丢失、切换的问题,提出一种改进的YOLOv5检测并结合DeepSort 跟踪算法。
检测阶段,融合注意力模块CBAM 与YOLOv5颈部网络增强对行人特征的提取;用SIoU 边界框损失函数代替CIoU 边界框损失函数,加速边界框回归的同时提高准确定位度。
跟踪阶段,改进DeepSort 利用拓展卡尔曼滤波器对非线性环境行人位置进行预测,通过匈牙利算法匹配预测和检测轨迹,优化复杂环境下行人身份切换频繁的问题。
最后连接改进后的YOLOv5与DeepSort 算法,对MOT⁃16数据集进行检测跟踪。
联合训练生成对抗网络的半监督分类方法
光学 精密工程Optics and Precision Engineering第 29 卷 第 5 期2021年5月Vol. 29 No. 5May 2021文章编号 1004-924X( 2021)05-1127-09联合训练生成对抗网络的半监督分类方法徐哲,耿杰*,蒋雯,张卓,曾庆捷(西北工业大学电子信息学院,西安710072)摘要:深度神经网络需要大量数据进行监督训练学习,而实际应用中往往难以获取大量标签数据°半监督学习可以减小深度网络对标签数据的依赖,基于半监督学习的生成对抗网络可以提升分类效果,旦仍存在训练不稳定的问题°为进一步提高网络的分类精度并解决网络训练不稳定的问题,本文提出一种基于联合训练生成对抗网络的半监督分类方法,通 过两个判别器的联合训练来消除单个判别器的分布误差,同时选取无标签数据中置信度高的样本来扩充标签数据集,提高半监督分类精度并提升网络模型的泛化能力°在CIFAR -10和SVHN 数据集上的实验结果表明,本文方法在不同数量的标签数据下都获得更好的分类精度°当标签数量为2 000时,在CIFAR -10数据集上分类精度可达80.36% ;当标签 数量为10时,相比于现有的半监督方法,分类精度提升了约5%°在一定程度上解决了 GAN 网络在小样本条件下的过拟合问题°关键词:生成对抗网络;半监督学习;图像分类;深度学习中图分类号:TP391文献标识码:Adoi :10. 37188/OPE. 20212905.1127Co -training generative adversarial networks forsemi -supervised classification methodXU Zhe , GENG Jie * , JIANG Wen , ZHANG Zhuo , ZENG Qing -jie(School of E lectronics and Information , Northwestern Polytechnical University , Xian 710072, China )* Corresponding author , E -mail : gengjie@nwpu. edu. cnAbstract : Deep neural networks require a large amount of data for supervised learning ; however , it is dif ficult to obtain enough labeled data in practical applications. Semi -supervised learning can train deep neuralnetworks with limited samples. Semi -supervised generative adversarial networks can yield superior classifi cation performance ; however , they are unstable during training in classical networks. To further improve the classification accuracy and solve the problem of training instability for networks , we propose a semi -su pervised classification model called co -training generative adversarial networks ( CT -GAN ) for image clas sification. In the proposed model , co -training of two discriminators is applied to eliminate the distribution error of a single discriminator and unlabeled samples with higher confidence are selected to expand thetraining set , which can be utilized for semi -supervised classification and enhance the generalization of deep networks. Experimental results on the CIFAR -10 dataset and the SVHN dataset showed that the pro posed method achieved better classification accuracies with different numbers of labeled data. The classifi cation accuracy was 80. 36% with 2000 labeled data on the CIFAR -10 dataset , whereas it improved by收稿日期:2020-11-04;修订日期:2021-01-04.基金项目:装备预研领域基金资助项目(No. 61400010304);国家自然科学基金资助项目(No. 61901376)1128光学精密工程第29卷about5%compared with the existing semi-supervised method with10labeled data.To a certain extent, the problem of GAN overfitting under a few sample conditions is solved.Key words:generative adversarial networks;semi-supervised learning;image classification;deep learning1引言图像分类作为计算机视觉领域最基础的任务之一,主要通过提取原始图像的特征并根据特征学习进行分类[11o传统的特征提取方法主要是对图像的颜色、纹理、局部特征等图像表层特征进行处理实现的,例如尺度不变特征变换法[21,方向梯度法[31以及局部二值法[41等。
1_classifiers_bayes
25
Model Choice
Unsatisfied with the performance of our fish classifier and want to jump to another class of model
26
Training
Use data to determine the classifier. Many different procedures for training classifiers and choosing models
the patterns, and the classification task is based on their respective values.
Feature vectors: A number of features
x1 ,..., xl ,
T
constitute the feature vector
27
Evaluation
Measure the error rate (or performance and switch from one set of features to another one
28
Computational Complexity
What is the trade-off between computational ease and performance?
21
The Design Cycle
Data collection Feature Choice Model Choice Training Evaluation Computational Complexity
Hale Waihona Puke 2223Data Collection
论文中参考文献和图、表的标准格式
论文中参考文献和图、表的标准格式1.参考文献的作用着录参考文献是科技论文写作的传统贯例和必要内容。
首先,它体现了科学的严肃性、继承性和对他人劳动的尊重。
其次,表明作者使用参考文献的起点和深度。
一份完整的参考文献,也是作者向读者提供的一份有价值的信息检索资料。
第三,为了保持篇章结构的简洁和紧凑,也不宜在正文中过多地赘述参考文献的内容,而善于利用参考文献,则可以适当节省篇幅,提高论文质量。
2.文献的着录规则参考文献的着录要求采用顺序编码标控制,即按引用文献出现的先后顺序,在文献的著者或成果叙述文字的右上角用方括号标注阿拉伯数字编排序号,在参考文献表中按此序号依次著录。
具体讲,有以下几种情况。
参考文献类型:专著[M],论文集[C],报纸文章[N],期刊文章[J],学位论文[D],报告[R],标准[S],专利[P],论文集中的析出文献[A]电子文献类型:数据库[DB],计算机[CP],电子公告[EB]电子文献的载体类型:互联网[OL],光盘[CD],磁带[MT],磁盘[DK](1)文内的标注视则①引文写出原著者时,序号应置于著者姓名的右上角。
如在谈到当前我国优质烤烟生产中存在的问题时,朱尊权[2]指出:……②不写出著者时,序号应置于引文之后右上角。
如“营养不良.发育不全,成熟不够.烘烤不当”[3],这“四不”问题是符合我国烤烟生产实际情况的。
③当将参考文献序号作为文句的组成部分时,不用角码序号,而以正文形式用方括号序号列出。
如烟草增香剂的制各方法见文献[2]。
④当引用多篇文献时,只须将各篇文献的序号在一个角码内全部列出,各序号间用逗号隔开;如通连续序号,应在标注起止序号间加“~”连接。
如早期的研究结果[2,5,7~9]表明,……(2)文后的着录规则①书籍的基本著录项目和着录规则[序号]著者.书名.版本,出版地:出版者,出版年.页次.示例[1]管雨霖,张凤英,马超群,等.烟草病害诊断虫害识别及防治[M].北京:农业出版社,1989.78-81.②期刊的基本著录项目和着录规则[序号]著者.题(篇)名.刊名,出版年,卷(期)号:页次.示例[1]朱尊权.论当前我国优质烤烟生产技术导向[J].烟草科技,1994,(1):2-3.[2]金龙林.关于在论文中标往参考文献顺序码的位置问题[J].编辑学报,1994,6(4):240-241.③专利文献的基本著录项目和着录规则[序号]专利申请者.专利题名.专利国别,专利文献种类,专利号.出版日期.示例[1]孙中培,刘清,段完晶.烟草薄片用烟杆纤维物质.中国专利,发明,CN1074592A.1993-07-28.[2]牛聪阳,刘立全,荆海强.固体物料造粒机.中国专利,实用新型,ZL93 2 24039.9.1995-02-12.④报纸的基本著录项目和通用格式[序号]著者.题(篇)名.报纸名,出版年.月.日(版位).示例[1]刘彩望.保护烟农利益稳定烟叶生产:河南省提高优质烟奖励标准.东方烟草报,1995.1.25(1).A:专著、论文集、学位论文、报告[序号]主要责任者.文献题名[文献类型标识].出版地:出版者,出版年.起止页码(可选)[1]刘国钧,陈绍业.图书馆目录[M].北京:高等教育出版社,1957.15-18.B:期刊文章[序号]主要责任者.文献题名[J].刊名,年,卷(期):起止页码[1]何龄修.读南明史[J].中国史研究,1998,(3):167-173.[2]OU J P,SOONG T T,et al.Recent advance in research on applications of passi ve energy dissipation systems[J].Earthquack Eng,1997,38(3):358-361.C:论文集中的析出文献[序号]析出文献主要责任者.析出文献题名[A].原文献主要责任者(可选).原文献题名[C].出版地:出版者,出版年.起止页码[7]钟文发.非线性规划在可燃毒物配置中的应用[A].赵炜.运筹学的理论与应用——中国运筹学会第五届大会论文集[C].西安:西安电子科技大学出版社,1996.468.D:报纸文章[序号]主要责任者.文献题名[N].报纸名,出版日期(版次)[8]谢希德.创造学习的新思路[N].人民日报,1998-12-25(10).E:电子文献[文献类型/载体类型标识]:[J/OL]网上期刊、[EB/OL]网上电子公告、[M/CD]光盘图书、[DB/OL]网上数据库、[DB/MT]磁带数据库[序号]主要责任者.电子文献题名[电子文献及载体类型标识].电子文献的出版或获得地址,发表更新日期/引用日期[12]王明亮.关于中国学术期刊标准化数据库系统工程的进展[EB/OL].http://www.cajcd. /pub/wml.html,1998-08-16/1998-10-01.[8]万锦.中国大学学报文摘(1983-1993).英文版[DB/CD].北京:中国大百科全书出版社,1996.参考文献规范格式一、参考文献的类型参考文献(即引文出处)的类型以单字母方式标识,具体如下:M——专著C——论文集N——报纸文章J——期刊文章D——学位论文R——报告对于不属于上述的文献类型,采用字母“Z”标识。
传统神经网络ANN训练算法总结
传统神经⽹络ANN训练算法总结学习/训练算法分类神经⽹络类型的不同,对应了不同类型的训练/学习。
因⽽根据神经⽹络的分类,总结起来,传统神经⽹络的学习算法也可以主要分为以下三类:1)前馈型神经⽹络学习算法-----(前馈型神经⽹络)2)反馈型神经⽹络学习算法------(反馈型神经⽹络)3)⾃组织神经⽹络学习算法------(⾃组织神经⽹络)以下我们将通过三类典型的神经⽹络模型分别阐述这三类不同的学习算法其区别与相似点。
虽然针对不同的⽹络模型,这⾥产⽣了三类不同类型的训练算法,但是总结起来,这三类训练算法都可以归属到两种类型的机器训练⽅法中,即监督型学习算法和⾮监督型学习算法。
在20-30年的神经⽹络学习算法研究过程中,科学家往往都通过将监督型学习算法和⾮监督型学习算法进⾏单独或混合使⽤,提出并构建出了不同类型的训练算法及其改进算法。
因⽽总结起来现今的神经⽹络训练算法都可以归类到监督型学习算法和⾮监督型学习算法中,这在后续讲解的Deep Learning 中的DBNs⽹络学习中也会体现出来。
当然⽬前也提出⼀种半监督学习⽅法,其定义为。
半监督学习(Semi-supervised Learning)是模式识别和领域研究的重点问题,是监督学习与⽆监督学习相结合的⼀种学习⽅法。
它主要考虑如何利⽤少量的标注样本和⼤量的未标注样本进⾏训练和分类的问题。
半监督学习对于减少标注代价,提⾼学习机器性能具有⾮常重⼤的实际意义。
半监督学习是监督学习算法和⾮监督学习算法的结合体,可以认为是两种⽅法的结合型算法,其根源也归属为两类本质的学习算法,因⽽也逃不脱监督学习和⾮监督学习领域圈,这⾥我们就不再进⼀步深⼊讨论半监督学习算法了。
在以下传统神经⽹络训练算法的总结中我们也将具体指明具体的训练算法和监督型学习算法和⾮监督型学习算法的关系。
BP神经⽹络训练算法以下我们分析BP神经⽹络学习过程。
其学习算法基本步骤可以归纳如下:1、初始化⽹络权值和神经元的阈值(最简单的办法就是随机初始化)2、前向传播:按照公式⼀层⼀层的计算隐层神经元和输出层神经元的输⼊和输出。
semi-supervised contrastive learning 训练
semi-supervised contrastive learning 训练Semi-supervised contrastive learning is a training approach used in machine learning to leverage labeled and unlabeled data in a semi-supervised setting.In contrastive learning, the goal is to learn representations (embeddings) of data points such that similar points are pulled together and dissimilar points are pushed apart in the embedding space. This is typically done by training a deep neural network to maximize the agreement between augmented positive pairs (similar samples) while minimizing the agreement between augmented negative pairs (dissimilar samples).In a semi-supervised setting, both labeled and unlabeled data are available. Labeled data includes samples with known class labels, while unlabeled data does not have these labels. Semi-supervised contrastive learning utilizes both labeled and unlabeled data to improve the quality of learned representations.Specifically, the training process involves two main steps. First, labeled data is used to create positive and negative pairs. Positive pairs consist of two augmented versions of the same labeled sample, while negative pairs consist of two augmented versions of different labeled samples. The model is then trained to maximize agreement for positive pairs and minimize agreement for negative pairs.Next, the model is further trained on unlabeled data to generalize the learned representations beyond the labeled samples. This is done by encouraging augmented versions of similar unlabeledsamples to have consistent representations, while pushing apart augmented versions of dissimilar unlabeled samples.By utilizing both labeled and unlabeled data, semi-supervised contrastive learning can improve the performance of models compared to supervised learning approaches that only use labeled data. It allows the model to learn more robust representations and generalize better to unseen data.。
基于标记信息的新增型半监督图像分割技术研究(IJIGSP-V4-N1-7)
I.J. Image, Graphics and Signal Processing, 2012, 1, 51-56Published Online February 2012 in MECS (/) DOI: 10.5815/ijigsp.2012.01.07A New Enhanced semi supervised image segmentation using Marker as Prior information.L.Sankari 11.Asst Professor, Department of Computer Science,Sri Ramakrishna College of Arts and Science for women, Coimbatore, IndiaE-mail: sankarivnm@Dr.C.Chandrasekar 22.Associate Professor , Dept of Computer Science, Periyar University, Salem, IndiaE-mail: ccsekar@Abstract — In Recent days Semi supervised image segmentation techniques play a noteworthy role in image processing. Semi supervised image segmentation needs both labeled data and unlabeled data. It means that a Small amount of human assistance or Prior information is given during clustering process. This paper discusses an enhancedsemi supervised image segmentation method from labeledimage. It uses both a background selection marker and foreground object selection marker separately. The EM (Expectation Maximization) algorithm is used for clusteringalong with must link constraints. The proposed method isapplied for natural images using MATLAB 7. Thus the proposed method extracts Object of Interest (OOI) from OONI (Object of Not Interest) efficiently and the experimental results are compared with Standard K Means and EM Algorithm also. The results show that the proposedsystem gives better results than the other two methods. Itmay also be suitable for object extraction from natural images and medical image analysis.Index Terms — Semi supervised image segmentation - prior knowledge – constrained clustering.I. I NTRODUCTIONImage segmentation is the method of dividing an image into different regions such that each region is homogeneous. By partitioning an image into a set of disjoint segments, image segmentation leads to more compact image representation. As the central step in computer vision and image understanding, image segmentation has been extensively investigated in the past decades, with a large number of image segmentation algorithms. There are number of segmentation techniques exist in the literature. But no single method can be considered best for all kind of images. Most of the techniques are being pretty ad hoc in nature.A)Need of semi supervised image segmentation:Semi supervised method is the combination of both supervised (classification) and unsupervised (clustering) and classification concept. Before performing clustering some prior knowledge is given. If the algorithm is purely an unsupervised (clustering) algorithm it will not show good result for all kind of images since an iterativeclustering algorithms commonly do not lead to optimal cluster solutions. Partitions that are generated by these algorithms are known to be sensitive to the initial partitions that are fed as an input parameter. A “good” selection of initial seed is an important clustering problem. Likewise the classification algorithm will not give best solution since the result depends on type of classifier. So this paper discuss about combination ofthese two methods called ‘semi supervised model’ for image segmentation. The following section discuss about semi supervised model[13][14]. b.)Semi supervised clustering: During clustering process a small amount of priorknowledge is given either as labels or constraints or any other prior information. The following figure explains about the semi supervised clustering model[15][16][17].Figure 1. Semi supervised clustering modelIn the above figure, the three clusters are formedusing certain constraints or prior information. Besides the similarity information which is used as color knowledge, the other kind of knowledge is also available by either pair wise (must-link or cannot-link) constraints between data items or class labels for some items. Instead of simply using this knowledge for the external validation of the results of clustering, one can imagine letting it “guide” or “adjust” the clustering process, i.e. provide a limited form of supervision. There are two ways to provide information for semi supervised clustering.1. Search based.2. Similarity based.52 A New Enhanced semi supervised image segmentation using Marker as Prior information1.Search based :The clustering algorithm itself is modified so that user-provided constraints or labels can be used to bias the search for an appropriate clustering. This can be done in several ways, such as by performing a transitive closure of the constraints and using them to initialize clusters [4], by including the cost function a penalty for lack of compliance with the specified constraints [10][11], or by requiring constraints to be satisfied during cluster assignment in the clustering process [12].2.Similarity based:There are several similarity measures existing in the domain. Any one similarity measure is adapted[7][8][9] so that the given constraints can be easily satisfied.In this paper semi supervised image segmentation with minimum user label is discussed. Instead of selecting some sample pixels with mouse clicks [1], a group of pixels are selected as a region using mouse. A pixel which has the same color and intensity as in the selected marker region (by mouse selection) will come under one cluster and others pixels will not be. This concept is given in detail in the following sections.I.P REVIOUS R ELATED W ORKSRecently there are many papers focusing the importance of semi supervised image segmentation. Among them a few papers are analyzed. According to paper [3], the semi-supervised C-Means algorithm is introduced in this paper to solve three problems in the domains like choosing and validating the correct number of clusters, Insuring that algorithmic labels correspond to meaningful physical labels tendency to recommend solutions that equalize cluster populations. The algorithm used MRI brain image for segmentation.In this [4] paper, how the popular k-means Clustering algorithm can be modified to make use of the available information with some artificial constraints. This method was implemented for six datasets and it has showed good improvement in clustering accuracy. This method was also applied to the real world problem of automatically detecting road lanes from GPS data and observed dramatic increases in performance.In paper [5] a novel semi-supervised Fuzzy C-means algorithm is proposed. A set called as seed set which contains a small amount of labeled data is used. First, an initial partition in the seed set is done, then use the center of each partition as the cluster center and optimize the objective function of FCM using EM algorithm. Experiments results show that the defect of fuzzy c-means is avoided that is sensitive to the initial centers partly and give much better partition accuracy.In Paper [6], Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. Here labeled data is used to generate initial seed clusters along with the constraints generated from labeled data to guide the clustering process. It introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.This paper [12] focuses on semi-supervised clustering, where the goal is to cluster a set of data-points given a set of similar/dissimilar examples. Along with instance-level equivalence (similar pairs belong to the same cluster) and in-equivalence constraints (dissimilar pairs belong to different clusters) feature space level constraints (how similar are two regions in feature space) are also used for getting final clustering. This task is accomplished by learning distance metrics (i.e., how similar are two regions in the feature space?) over the feature space which that are guided by the instance-level. A bag of words models, which are nothing but code words (or visual-words) are used as building blocks. Our proposed technique learns non-parametric distance metrics over codewords from these equivalence (and optionally, in-equivalence) constraints, which are then able to propagate back to compute a dissimilarity measure between any two points in the feature space. Thus this work is more advanced than previous works. First, unlike past efforts on global distance metric learning which try to transform the entire feature space so that similar pairs are close. This transformation is non-parametric and thus allows arbitrary non-linear deformations of the feature space. Second, while most Mahalanobis metrics are learnt using Semi-Definite Programming (SDP), this paper discuss about a Linear Program (LP) and in practice, is extremely fast. Finally, Corel image datasets (MSRC, Corel) where ground-truth segmentation is available. Over all, this idea gives improved clustering accuracy.II.METHODOLOGYIn this paper, ground truth image is taken with proper class labels. The Octree color quantization algorithm is applied to get the reduced colors. This color table is integrated with must link constraints for the given image using EM algorithm.A group of pixels as a region must be selected separately for object of interest (OOI) and object of not interest (OONI) for back ground and foreground from an input image. Here OOI and OONI refer foreground and back ground objects respectively. Find the smallestA New Enhanced semi supervised image segmentation using Marker as Prior information 53distance for each and every pixel in the marker to its neighboring pixels using Mahalanobi’s formula.---- (1) Wherex row vector which contains the pixels inside a marker and the other area.COV represents sample covariance matrix.If this distance is less than the assumed threshold value ( 0.5) then find the exact color index. If the index values are same then assign to the same region otherwise (same value & different class index) delete any one index and group. The above process is repeated for all pixels in the marker. Finally the object of interest (OOI) is clearly segmented than the other methods.Fig 1. Work Flow Diagram of the proposed work The algorithm based on proposed idea is given below:beled image is taken as input image.2.Octree color quantization to reduce the colorsand store with class label.3.Mark object of interest (OOI) and object of notinterest (OONI) using mouse.4.Cluster quantized color table using EMAlgorithm interated with must-link constraints.Must link constraints means that if any pointbelongs to same region, group them into oneregion otherwise need not group.5. Repeats steps 4 and 5 for each point in object ofinterest(OOI) and object of not interest(OONI)6. Let X be a selection with N points7. For each point in X na.Let Y = x i (x ∈ X)b.Consider its neighboring coordinates withina rectangle, R of size 3 x 3.c.Calculate distance vector usingMahalanobis distance d(R i, Y)d.Find minimum distance d(R i, Y).e.If d < threshold then find Color index of Yand R ii.If belong to same label, group intosame regionii.If belong to different labels buthave same color then delete R iIII.RESULTS AND DISCUSSIONSFigure 2:Here three different labeled images are taken as input .These three images are partially separated images.Fig 2.a Fig 2.bFig 2.cFigure 3:Marking of OOI (Object of interest) and OONI (Objectof not interest) for all the above three figures using blueand green color.Object of intereste (Green Color mark) Object of Not interest (Blue color mark )54A New Enhanced semi supervised image segmentation using Marker as Prior informationFig 3.a Fig 3.Fig 3.cFigure 4:Resullt of proposed method for the above three figures.Fig 4.aFig 4.bFig 4.c Figure 5:Labeled image and different marker selection.Fig 5.a Fig.5.bFig 5.cFigure 6:Results using proposed idea for the above figure 5 usingdifferent marker selection..Fig 6.a Fig 6.b(From these two pictues the object is segmented properly)Fig.6.cThe object of interest is not segmented properlyFigure 7:Results of input images using K Means method.Fig 7.aFig 7.bFig 7.cFigure 8:Result of input images using Standard EM method.Fig 8.aFig.8.bA New Enhanced semi supervised image segmentation using Marker as Prior information 55Fig 8.cThe time taken for getting the result of the input image1 (Bird) using proposed method is noted down in the table.Table 1. Performance Table for image1It shows that the proposed method has taken more time than K means and less time than EM method. The performance is shown in the chart given below.Figure 9. Performance chartUsing proposed idea any labeled image is taken as input. The two markings are given like OOI and OONI(object of interest and object of not interest) with different colors. There are two different marker selections are shown in the figure 3 and 5. According to the marker selection given, the object is extracted from its background. Compared to the results in Figure 4, the segmentation results are better in figure 6 for some pictures. This is because the quality of segmentation depends on marker selection. These proposed results are also compared with K means and Standard EM Algorithm. In Figure7 and 8, the K Means & EM Algorithms do not segment the foreground of the object accurately. But the proposed method results are better in figure 6 and the object of interest is also separatedaccurately.IV. CONCLUSIONThe above result shows that the proposed semi supervised segmentation extracts the object of interest precisely. But the result of segmentation depends on the marker selection on left and right side of the image. If the marker selection is not given properly, the result of segmentation will not be good. This may be eliminated in future by adding certain other constraints for texture, color etc.REFERENCES[1] Yuntao Qian, Wenwu Si, IEEE, ” Semi-supervised Color Image Segmentation Method”-2005[2] Yanhua Chen, Manjeet Rege, Ming Dong, JingHua FarshadFotouhi Department of Computer Science Wayne State UniversityDetroit, MI48202 “Incorporating User Provided Constraints into Document Clustering”,2009[3] Amine M. Bensaid, Lawrence O. Hall Department ofComputer Science and Engineering Universit of South Florida Tampa, Partially Supervised Clustering for Image Segmentation -1994[4] Kiri Wagstaff, Claire Cardie ,Seth Rogers &StefanSchroedl ,”Constrained K-means Clustering with Background Knowledge-2001” [5]Kunlun Li; Zheng Cao; Liping Cao; Rui Zhao; Coll. Of Electron. & Inf. Eng., Hebei Univ., Baoding, China ,“A novel semi-supervised fuzzy c-means clustering method”2009,IEEE Explorer[6]Sugato Basu , Arindam Banerjee , R. Mooney , In proceedings of 19th international conference on Machine Learning(ICML-2002),Semi-supervised Clustering by Seeding (2002)[7] David Cohn, Rich Caruana, and Andrew McCallum. Semi-supervised clustering with user feedback, 2000.[8]Dan Klein, Sepandar D. Kamvar, and Christopher D. Manning. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proceedings of the 19th International Conference on Machine Learning, pages 307–314. Morgan Kaufmann Publishers Inc., 2002.[9]Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell. Distance metric learning with application to clustering with sideinformation. In S. Thrun S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 505–512, Cambridge, MA, 2003. MIT Press.[10] A. Demiriz, K. Bennett, and M. Embrechts. Semi-supervised clustering using genetic algorithms. In C. H. Dagli et al., editor, Intelligent Engineering Systems Through Artificial Neural Networks 9, pages 809–814. ASME Press, 1999. [11] [11] K.Wagstaff and C. Cardie. Clustering with instance-level constraints. In Proceedings of the 17th International Conference on Machine Learning, pages 1103–1110, 2000. [12]Dhruv Batra, Rahul Sukthankar and Tsuhan Chen,“Semi-Supervised Clustering via LearntCodeword Distances”,2008.[13]Richard Nock, Frank Nielsen,Semi supervisedstatistical region refinement for color image segmentation,2005S NoMethodsTime in sec.1 K Means 6.32142 EM 40.22405 3Proposed13.0559856 A New Enhanced semi supervised image segmentation using Marker as Prior information [14]` Jan Kohout,Czech Technical University in PragueFaculty of Electrical Engineering,Supervisor: Ing .Jan Urban,Semi supervised image segmentation ofbiological samples-PPT,July 29, 2010[15] Ant´onio R. C. Paiva1 and Tolga Tasdizen,Fast SemiSupervised image segmentation by novelty selection,2009[16] Kwangcheol Shin and Ajith Abraham,Two Phase Semi-supervised ClusteringUsing Background Knowledge,2006.[17] M´ario A. T. Figueiredo, Dong Seon Cheng, Vittorio Murino,Clustering Under Prior Knowledge with Application toImage Segmentation,2005.Biography:Mrs L.Sankari is currently workingas an Assistant Professor in theDepartment of Computer Science, SriRamakrishna College of Arts andScience for women, Coimbatore- 641044, Tamilnadu, India. She is about16 years of teaching experience. She has published fournational and four international research papers. Her researchinterest area includes image processing, Data mining , Patternclassification and optimization techniques.Dr. C. Chandrasekar received hisPh.D. degree from Periyar University,Salem. He has been working asAssociate Professor at Dept. ofComputer Science, Periyar University,Salem – 636 011, TamilNadu, India.His research interest includes Wirelessnetworking, Mobile computing, Computer Communication andNetworks. He was a Research guide at various University inIndia. He has been published more than 50 research papers atvarious National/ International Journals.。
半监督降维方法的实验比较
软件学报ISSN 1000-9825, CODEN RUXUEW E-mail: jos@Journal of Software,2011,22(1):28−43 [doi: 10.3724/SP.J.1001.2011.03928] +86-10-62562563 ©中国科学院软件研究所版权所有. Tel/Fax:∗半监督降维方法的实验比较陈诗国, 张道强+(南京航空航天大学计算机科学与工程系,江苏南京 210016)Experimental Comparisons of Semi-Supervised Dimensional Reduction MethodsCHEN Shi-Guo, ZHANG Dao-Qiang+(Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)+ Corresponding author: E-mail: dqzhang@Chen SG, Zhang DQ. Experimental comparisons of semi-supervised dimensional reduction methods. Journalof Software, 2011,22(1):28−43. /1000-9825/3928.htmAbstract: Semi-Supervised learning is one of the hottest research topics in the technological community, whichhas been developed from the original semi-supervised classification and semi-supervised clustering to thesemi-supervised regression and semi-supervised dimensionality reduction, etc. At present, there have been severalexcellent surveys on semi-supervised classification: Semi-Supervised clustering and semi-supervised regression, e.g.Zhu’s semi-supervised learning literature survey. Dimensionality reduction is one of the key issues in machinelearning, pattern recognition, and other related fields. Recently, a lot of research has been done to integrate the ideaof semi-supervised learning into dimensionality reduction, i.e. semi-supervised dimensionality reduction. In thispaper, the current semi-supervised dimensionality reduction methods are reviewed, and their performances areevaluated through extensive experiments on a large number of benchmark datasets, from which some empiricalinsights can be obtained.Key words: semi-supervised dimensionality reduction; dimensionality reduction; semi-supervised learning; classlabel; pairwise constraint摘要: 半监督学习是近年来机器学习领域中的研究热点之一,已从最初的半监督分类和半监督聚类拓展到半监督回归和半监督降维等领域.目前,有关半监督分类、聚类和回归等方面的工作已经有了很好的综述,如Zhu的半监督学习文献综述.降维一直是机器学习和模式识别等相关领域的重要研究课题,近年来出现了很多将半监督思想用于降维,即半监督降维方面的工作.有鉴于此,试图对目前已有的一些半监督降维方法进行综述,然后在大量的标准数据集上对这些方法的性能进行实验比较,并据此得出了一些经验性的启示.关键词: 半监督降维;降维;半监督学习;类别标号;成对约束中图法分类号: TP181文献标识码: A在很多机器学习和模式识别的实际应用中,人们经常会遇到高维数据,如人脸图像、基因表达数据、文本∗基金项目: 国家自然科学基金(60875030); 模式识别国家重点实验室开放课题(20090044)收稿时间: 2009-12-18; 定稿时间: 2010-07-28CNKI网络优先出版: 2010-11-05 11:50, /kcms/detail/11-2560.TP.20101105.1150.001.html陈诗国 等:半监督降维方法的实验比较29数据等.直接对这些高维数据进行处理是非常费时且费力的,而且由于高维数据空间的特点,容易出现所谓的“维数灾难”问题[1].降维(dimensionality reduction)是根据某一准则,将高维数据变换到有意义的低维表示[2].因此,降维能够在某种意义上克服维数灾难.根据是否使用了类别标号,传统的降维方法可以分为两类:无监督降维,如主成分分析(PCA)[3];有监督降维,如线性判别分析(LDA),或称为Fisher 判别分析(FDA)[4].在很多实际任务中,无标号的数据往往很容易获取,而有标号的数据则很难获取.为了获得更好的学习精度同时又要充分地利用现有的数据,出现了一种新的学习形式,即半监督学习(semi-supervised learning).相比较传统的学习方法,半监督学习可以同时利用无标号数据和有标号数据,只需较少的人工参与就能获得更精确的学习精度,因此,无论在理论上,还是在实践中,都受到越来越多的关注.目前,半监督学习已从最初的半监督分类和半监督聚类拓展到半监督回归和半监督降维等领域.有关半监督分类、聚类和回归等方面的工作已经有了很好的综述,如Zhu 的半监督学习文献综述[5].然而据我们所知,目前国内外尚未专门针对半监督降维方面的综述工作.因此,本文试图对目前已有的一些半监督降维方法进行综述,然后在大量的标准数据集上比较这些方法的性能,并根据实验得出一些经验性的启示.本文第1节介绍降维的概念,并对已有的降维方法进行分类.第2节详细介绍当今较为流行的几种典型的半监督降维方法,本文将这些方法大致分为3类:基于类别标号的方法、基于成对约束的方法和基于其他监督信息的方法.第3节分别在UCI 标准数据集、半监督学习数据集[6]和标准人脸数据上比较第2节中介绍的半监督降维方法,并作一些经验性的讨论.最后一节对本文的工作进行总结并指出下一步研究的方向.1 降 维给定一批观察样本,记X ∈R D ×N ,包含N 个样本,每个样本有D 个特征.降维的目标是:根据某个准则,找到数据的低维表示Z ={z i }∈R d (d <D ),同时保持数据的“内在信息(intrinsic information)”[3].当降维方法为线性时,降 维的过程就转变为学习一个投影矩阵1{}d D d i i W w R ×==∈,使得Z =W T X (1)其中,T 表示矩阵的转置操作.当降维方法为非线性时,不需要学习这样一个投影矩阵W ,而直接从原始数据中学习得到低维的数据表示Z .图1对当今流行的一些降维方法进行了分类.首先,根据是否使用数据中的监督信息,将所有方法分成监督的(supervised)、半监督的(semi-supervised)和无监督的(unsupervised)降维这样3类.其中,根据监督信息的不同,半监督降维又可分为基于类别标号(class label)的、基于成对约束(pairwise constraints)的和基于其他监督信息的3类方法.然后再根据算法模型的不同,又可以将所有的降维方法分成线性(linear)降维和非线性(nonlinear)降维.最后列出了一些有代表性的降维方法以及该方法所出自的文献.下面对图中出现的每一种降维方法进行简要的说明.线性判别分析(LDA)也叫Fisher 判别分析(FDA)[4],是当今最流行的监督降维方法之一.其主要思想是,寻找一个投影矩阵,使得降维之后同类数据之间尽量紧凑,而不同类别数据之间尽量分离.Baudat 和Anouar 使用核技巧,把LDA 扩展到非线性形式,即广义判别分析(GDA)[7].间隔Fisher 判别分析(MFA)[8]和局部Fisher 判别分析(LFDA)[9]是传统FDA 的两个扩展版本.与上面的方法不同,判别成分分析(DCA)[10]是利用成对约束进行度量学习(成对约束的定义将在第2.2节中给出介绍).其主要思想与LDA 类似:寻找一个投影矩阵,使得降维之后正约束数据之间尽量紧凑,而负约束数据之间尽量分离.KDCA 是它的核化版本[10].上述降维方法都是有监督的,需要知道数据的某种监督信息,如类别标号或者成对约束等.无监督降维则不需要这些监督信息,它直接利用无标号的数据,在降维过程中保持数据的某种结构信息.主成分分析(PCA)[3]是一种典型的无监督降维方法,其目的是寻找在最小平方意义下最能够代表原始数据的投影[1].与PCA 不同,多维尺度分析(MDS)[11]使得变换后的低维数据点之间的欧氏距离与原数据点之间的欧氏距离尽量保持一致.非负矩阵分解(NMF)[12]则基于这样的假设:数据矩阵可以分解为两个非负矩阵的乘积——基矩阵和系数矩阵.核PCA(KPCA)[13]是传统PCA 方法的核化版本.KPCA 中的核函数需要人为地指定,而最大方差展开(MVU)[14]则通30 Journal of Software 软件学报 V ol.22, No.1, January 2011过对数据的学习直接得到核矩阵.除了上面提到的一些非监督降维方法以外,流形学习(manifold learning)是最近发展起来的一种新的降维方法,它假设数据采样于高维空间中的一个潜在流形上,通过寻找这样一个潜在的流形很自然地找到高维数据的低维表示.ISOMAP [15]、局部线性嵌入(LLE)[16]、拉普拉斯特征映射(LE)[17]和局部保持投影(LPP)[18]是流形学习的代表性方法.Fig.1 Taxonomy of dimensionality reduction methods图1 降维方法的分类图上面分别介绍了监督降维和无监督降维的一些具有代表性的方法,下一节重点介绍半监督降维以及一些典型的半监督降维方法.2 半监督降维把半监督学习思想用于降维,就形成了半监督学习的一个新的分支,即半监督降维.半监督降维是传统降维方法的有效综合,它既可以像监督降维方法那样利用数据标号,又可以像无监督降维方法那样保持数据的某种结构信息,如数据的全局方差、局部结构等等.因此,半监督降维能够克服传统降维方法的缺点,有重要的研究价值和广阔的应用前景.根据使用监督信息的不同,半监督降维方法可以大致分成3类:(1) 基于类别标号的方法;(2) 基于成对约束的方法;(3) 基于其他监督信息的方法. 2.1 基于类别标号的半监督降维首先,我们给出基于类别标号半监督降维方法的数学描述.假设有N 个数据1{}N i X x ==,每个数据的维数为D ,即x i ∈R D (i =1,2,…,N ).在所有数据中,L 个数据已经知道类别标号,记为11{(,)}L i i i X x y ==,其中,x i 表示第i 个数据, y i 是x i 的类别标号,总共有C 个类;剩下的数据没有类别标号,记为21{}N j j L X x =+=.半监督降维的目的是,利用有类 别标号的数据和无类别标号的数据X ={X 1,X 2},寻找数据的低维表示Z ={z i }∈R d (d <D ).近年来,研究者们已经提出了多种基于类别标号的半监督降维方法.Yu 等人在概率PCA 模型[19]的基础上加入了类别标号信息,提出了监督和半监督形式的概率PCA 模型[20].Costa 和Hero 在构造拉普拉斯图时引入了类别标号信息,得到了拉普拉斯特征映射算法的一种监督和半监督版本[21].Cai 等人在传统的LDA 方法中加入流形正则化项,提出了一种半监督的判别分析方法SDA [22].Song 等人提出了一个半监督降维方法框架[23],SDA 可以看成是该框架下的一个例子.Zhang 等人在文献[24]中提出的半监督降维方法与SDA 方法相似,也是使用陈诗国 等:半监督降维方法的实验比较31正则化项来保持数据的流形结构.所不同的是,它使用了一种基于路径的鲁棒的相似性来构造邻接图.文献[25]在最大化LDA 准则中加入没有类别标号的数据,使用约束凹凸过程解决最终的优化问题.Chen 等人把LDA 重写成最小平方的形式,通过加入拉普拉斯正则化项,该模型可以转化为一个正则化的最小平方问题[26].最近,Sugiyama 把局部Fisher 判别分析(LFDA)和PCA 结合起来,提出了一种半监督局部降维Fisher 判别分析SELF [27].Chatpatanasiri 等人从流形学习的角度提出了一个半监督降维框架.在该框架下,可以很容易地把传统的Fisher 判别分析扩展到半监督的形式[28].下面我们简要介绍5种半监督降维方法:半监督概率PCA(S2PPCA)[19]、分类约束降维(CCDR)[21]、半监督判别分析(SDA)[23]和两个半监督Fisher 判别分析SELF [27],SSLFDA [28].最后,对这几种方法的属性以及它们之间的关系进行分析.2.1.1 半监督概率PCA(S2PPCA)S2PPCA 是概率PCA 模型[20]的半监督版本.首先,仅考虑有标号的样本X 1.假设样本(x ,y )由下列隐变量模型生成:x =W x z +μx +εx , y =f (z ,Θ)+εy (2)这里,f (z ,Θ)=[f 1(z ,θ1),…,f c (z ,θC )]T 是关于类别标号的函数,其中Θ={θ1,…,θC }表示C 个确定性函数f 1,…,f C 的参数(假设每个函数f c ,c =1,…,C ,都是关于隐变量z 的线性函数(,)cT cc c y y f z w z θμ=+,则f (z ,Θ)=W x z +μy ).z ~N (0,I ).z ~N (0,I )是输入x 与输出y 所共享的隐变量.两个相互独立的噪声模型被定义成各向同性的高斯函数,即22~(0,),~(0,)x x x x N I N I εδεδ.因此,对隐变量z 求积分,得到样本(x ,y )的似然函数:(,)(,|)()d (|)(|)()d P x y P x y z P z z P x z P y z P z z ==∫∫ (3)这里,22|~(,),|~(,)x x x y y y x z N W z I y z N W z I μθμθ++.如果样本之间相互独立,则有11()(,)L i i i P X P x y ==∏.最后,所有需要估计的参数向量表示为22{,,,,,}x y x y x y W W Ωμμδδ=.然后,考虑所有样本X ={X 1,X 2}的情况.因为样本之间假设是相互独立的,那么关于有所有样本的似然函数为1211()()()(,)()L Ni i j i j L P X P X P X P x y P x ==+==∏∏ (4)这里,P (x i ,y i )可以由公式(3)计算得到,而()(|)()d j j j j j P x P x z P z z =∫可以由概率PCA 模型计算得到.2.1.2 分类约束降维(CCDR)CCDR 的主要思想如下:将每个类所有样本的中心点作为新的数据节点加入到邻接图中,然后同类样本点与它们的中心点之间加入一条权重为1的边.这样,CCDR 可以形式化地写成最小化下面的目标函数:22()||||||||n ki k i ij i j kiijE Z a z y w y y β=−+−∑∑ (5)这里,z k 表示嵌入到低维空间后第k 个类的中心;A ={a ki }表示类别关系矩阵(如果数据x i 属于第k 类,则a ik =1;否则,a ik =0);W ={w ij }表示数据的邻接图,它的构造方法有很多种,可以参考文献[29];y i 是指嵌入到低维空间后的数据向量,其中,Z n =[z 1,…,z C ,y 1,…,y N ].2.1.3 半监督判别分析(SDA)SDA 是基于线性判别分析的一个半监督降维版本,它通过在LDA 的目标函数中添加正则化项,使得SDA 在最大化类间离散度的同时可以保持数据的局部结构信息.SDA 能够优化下面的目标函数:arg maxarg max ()()T T b b T T T wat t w S w w S ww S w J w w S XLX I w ααβ=+++ (6)式中,S b 表示带标号数据的类间离散度矩阵,S t 表示总体离散度,J (W )是正则化项(通过构造k -近邻图保持数据的流形),L 表示拉普拉斯矩阵.像LDA 一样,SDA 的目标函数也可以转化为一个广义特征分解问题.2.1.4 半监督局部Fisher 判别分析(SELF)SELF 是由文献[27]提出来的另一种半监督降维方法,它是局部Fisher 判别分析[9]的半监督版本.SELF 通过32 Journal of Software 软件学报 V ol.22, No.1, January 2011把PCA 和LFDA 综合起来,可以保持无类别标号数据的全局结构,同时保留LFDA 方法的优点(比如类内的数据为多模态分布、LDA 的维数限制等).SELF 可以表示为求解下面的优化问题:()()1arg max[(())]T rlb T rlw WWopt tr W S W W S W −= (7)上式中,W 是映射矩阵.S(rlb )是正则化局部类间离散矩阵,S (rlw )是正则化局部类内离散矩阵,定义如下:S (rlb )=(1−β)S (lb )+βS (t ), S (rlw )=(1−β)S (lw )+βI d (8)其中,S (lb )和S (lw )分别是LFDA 算法中的局部类间离散矩阵和局部类内离散矩阵,S (t )是离散度矩阵(数据方差矩阵).β∈[0,1]是模型的调节参数,当β=1时,SELF 就退化为PCA;而当β=0时,SELF 就退化为LFDA.2.1.5 基于流形学习的半监督局部Fisher 判别分析(SSLFDA)与第2.1.4节中的SELF 算法类似,SSLFDA 也是LFDA 算法的一个半监督版本,它是根据文献[28]中提出的半监督降维框架直接推导得到的.它与SELF 在利用无类别标号数据方面有所不同:SSLFDA 算法保持数据的流形结构,而SELF 保持数据的全局结构.在这个框架里面,半监督降维方法可以简单地表示为求解下面的优化问题:*arg min ()()l T u T WW f W X f W X γ=+ (9)其中,f l(⋅)和f u(⋅)分别表示关于有类别数据和无类别标号数据的函数,γ是调节因子.通常,f 可以写成成对数据加权距离的函数,最终,问题(9)转化为矩阵的特征分解问题.通过定义不同的权值,该框架可以导出不同的半监督降维方法.2.1.6 方法总结表1中列出了第2.1.1节~第2.1.5节中5个半监督降维方法的一些属性.k 表示数据邻接图近邻样本点的数目,t 是高斯核函数(Gaussian-kernel)中的带宽,i 表示迭代的次数,p 表示稀疏矩阵中的非零元素的个数.因为在S2PPCA 方法中提供了两个互为对偶的EM 算法,所以表中列出了它的两个计算复杂度和存储复杂度.S2PPCA 是概率模型,算法的性能一方面依赖于模型假设,即数据分布是高斯的或混合高斯的;另一方面还依赖于样本数据的个数.如果样本数目太少,参数的估计就不可信,从而导致算法的性能下降.CCDR 是一种非线性方法,与LE 方法一样,邻接图构造的好坏会直接影响算法的性能.SDA,SELF 和SSLFDA 是3种线性降维方法.SDA 通过增加正则化项,使得在降维的过程中能够保持数据的局部结构;SELF 需要最大化数据的协方差(PCA 准则),因此它在降维的过程中利用无标号的数据保持数据的全局结构;SSLFDA 利用无标号的数据保持数据的流形结构(LPP 准则),使数据的局部结构得到保持.Table 1 Properties of semi-supervised dimensionality reduction methods based on class label表1 基于类别标号的半监督降维方法的属性Methods Basic idea ParametersComputational MemoryS2PPCA CCDR SDA SELF SSLFDA Label+PPCALabel+LELDA+Adjacency graph LFDA+PCA LFDA+LPP None k , t , β k , α, β k , β k , βO (i (D +L )Nd ) or O (N 2(id +D ))O (p (N +C )2) O (D 3) O (D 3) O (D 3) O ((D +L )N ) or O (N 2)O (p (N +C )2)O (N 2+D 2) O (D 2)O (D 2)2.2 基于成对约束的半监督降维在半监督学习中,除了类别标号信息,还可以利用其他形式的先验信息,比如成对约束.在很多情况下,人们往往不知道样本的具体类别标号,却知道两个样本属于同一个类别,或者不属于同一个类别,我们称这样的监督信息为成对约束.成对约束往往分为两种:正约束(must-link)和负约束(cannot-link).正约束指的是两个样本属于同一个类别;相反地,负约束指的是两个样本属于不同的类别.本文中,把所有正约束的集合记为ML ,所有负约束的集合记为CL .我们首先回顾一些基于成对约束的半监督降维方法.Tang 等人提出用约束指导降维过程[30],他们的方法仅仅用到约束而忽略了无标号的数据.Bar-Hillel 等人提出一种约束FDA(cFDA)[31]方法对数据进行预处理,它是陈诗国 等:半监督降维方法的实验比较33作为相关成分分析算法(RCA)的一个中间步骤.Zhang 等人从一个更为直观的角度同时利用正约束和负约束指导降维过程,提出了一个半监督降维框架(SSDR)[32].Cevikalp 等人在局部保持投影方法(LPP)中引入约束信息,提出了约束局部保持投影算法(cLPP)[33].Wei 等人提出了一种邻居保持降维(NPSSDR)方法[34],在利用约束指导降维的同时,保持数据的局部结构信息.Baghshah 等人将NPSSDR 方法用于度量学习[35],他们使用了一种二分搜索方法来优化求解过程.Chen 等人在文献[36]中提出了一个基于约束信息的半监督非负矩阵分解(NMF)框架.彭岩等人在传统的典型相关分析算法中加入成对约束,提出了一种半监督典型相关分析算法[37].最近,Davidson 提出了一个基于图的降维框架[38],在该框架中,首先构造一个约束图,然后根据构建出来的图来指导降维.下面我们简要介绍4种基于成对约束的半监督降维方法:约束Fisher 判别分析(cFDA)[31]、基于约束的半监督降维框架(SSDR)[32]、约束的局部保持投影(cLPP)[33]和邻域保持半监督降维(NPSSDR)[34,35].最后,对这几种方法的属性以及它们之间的关系进行分析.2.2.1 约束Fisher 判别分析(cFDA)cFDA 是度量学习算法相关成分分析(RCA)[31]的一个中间步骤.cFDA 的具体做法是:首先,使用正约束(文献[31]中称为equivalence constraints)把数据聚成若干个簇(cluster);然后,类似于LDA 构建簇内散布矩阵S w 和总体散布矩阵S t ;最后,最大化下面的比率:max T t TW wW S WW S W (10) 其中,W 是映射矩阵,T 是矩阵转置符号.优化目标W 可以简单地由矩阵1w t S S −的前d 个特征向量组成.2.2.2 基于约束的半监督降维框架(SSDR)不同于cFDA 利用约束信息来构造散布矩阵,SSDR [32]直接使用约束来指导降维.SSDR 在降维的过程中保持数据之间的约束关系,同时像PCA 一样保持数据内部结构信息.SSDR 最大化下面的目标方程:2222,(,)(,)1()()()()222i j i j T T T T T T i j i j i j i j x x CL x x MLCL ML J w w x w x w x w x w x w x n n n αβ∈∈=−+−−−∑∑∑ (11) 上式中,第1项可以使得降维后两两数据之间的距离保持最大,它等价于PCA 准则,即最大化数据的方差;n CL 和n ML 分别表示负约束和正约束的个数. 2.2.3 约束的局部保持投影(cLPP)与SSDR 不同,cLPP 在降维过程中保持数据的局部结构信息.cLPP 的具体步骤如下:首先,构造数据的邻接矩阵;然后,利用约束信息修改邻接矩阵中的权值,使得正约束数据之间的权值增大,负约束数据之间的权值变小,同时修改与有约束的数据点直接相连点的相关权值,对约束信息进行传播;最后,cLPP 的目标函数可以显示地写成下面的形式:222,,,1()()()()2i j ij i j i j i j i j ML i j CL J w z z A z z z z ∈∈⎛⎞=−+−−−⎜⎟⎜⎟⎝⎠∑∑∑% (12) 这里,ijA %代表修改后的数据邻接矩阵,z i 是原始数据x i 映射到低维空间后所对应的点. 2.2.4 邻域保持半监督降维(NPSSDR)∗∗NPSSDR 利用正约束和负约束降维,同时保持数据的局部结构信息.与cLPP 方法不同,NPSSDR 不需要构造数据的邻接矩阵,而是通过添加正则化项的方法来实现.这里,我们给出了文献[35]中的目标函数:2(,)*(,)()arg max()()i j T i j i j z z CLW W Ii j z z MLz z W z z J W α∈=∈−=−+∑(13)∗∗ Wei等人在文献[34]中提出NPSSDR 方法.Baghshah 等人在文献[35]中提出的算法思想与文献[34]相似,我们统一称他们的方法为NPSSDR.实验中比较的NPSSDR 方法是按照文献[35]中的求解方法实现的.34 Journal of Software 软件学报 V ol.22, No.1, January 2011式中,J (W )是正则化项.若使用局部线性嵌入(LLE)的思想构造正则化项,则有J (W )=tr (W T XMX T W ),其中,M 是数据重构矩阵.于是,目标函数(13)最终转化为求解一个迹比(trace radio)问题,该问题可以使用一种二分搜索算法来求解[39].2.2.5 方法总结表2中显示了基于成对约束的半监督降维方法的一些属性,k 表示数据邻接图近邻样本点的数目,t 是高斯核函数中的带宽,i 表示迭代的次数,p 表示稀疏矩阵中的非零元素的个数.cFDA 仅仅利用正约束而没有用负约束,它对约束的选取有很强的依赖性;并且,当S W 奇异的时候,cFDA 的求解过程还需要作特殊的处理.SSDR 既可以利用正约束,也可以利用负约束,但是它在降维过程中只保持数据的全局结构,并且约束的信息也没有进行传播.cLPP 与SSDR 相比,保持了数据的局部结构,并且把约束信息传播到邻近的数据点.但是,cLPP 与LPP 方法一样,降维性能的好坏需要依赖于数据邻接图的构建.NPSSDR 使用一种二分搜索算法近似求解最终的问题,这种搜索算法需要反复求解特征值问题,计算复杂度较大,并且算法有可能会遇到不收敛的问题.NPSSDR 使用LLE 策略保持无标号数据的局部结构,它会遇到LLE 同样的问题,比如局部结构塌陷问题等等[2].Table 2 Properties of semi-supervised dimensionality reduction methods based on pairwise constraints表2 基于成对约束的半监督降维方法的属性Methods Basic idea Structure Parameters Computational MemorycFDA SSDR cLPP NPSSDR must-link+LDA cannot-link+must-link+PCA cannot-link+must-link+LPP cannot-link+must-link+LLE Global Global Local Local None α, β k , tk , αO (D 3) O (D 3) O (D 3) O (iD 3+pN 2) O (D 2)O (N 2+D 2) O (N 2+D 2) O (D 2+pN 2)2.3 基于其他监督信息的方法半监督降维方法除了可以利用类别标号和成对约束作为监督信息以外,还有很多其他形式的监督信息,我们把它们统一分在了第3类.扩充关系嵌入(ARE)[40]、语义子空间映射(SSP)[41]和相关集成映射(RAP)[42]是利用图像检索中的检索与被检索图像间的相关关系作为监督信息指导特征抽取的过程.Yang 等人使用流形上的嵌入关系,把一些无监督的流形方法扩展到半监督的形式[43],例如半监督的ISOMAP(SS-ISOMAP)和半监督的局部线性嵌入(SS-LLE).Memisevic 等人提出了一种半监督降维框架多关系嵌入(MRE)[44],可以综合利用多种相似性关系.图2展示了一个使用流形上的嵌入关系的例子[43].图中相对较大的实心样本点表示已知嵌入关系的样本,即在嵌入过程中,这些样本在低维空间中的位置是已知的,而其他样本的位置是未知的.文献[43]就是利用这种已知的嵌入关系作为先验知识,把几种经典的流形学习方法扩展到半监督的形式.(a) Original samples (b) Low dimensional embedded samples [43](a) 原始数据 (b) 低维嵌入[43]Fig.2 Prior information in the form of on-manifold coordinates图2 已知低维嵌入关系作为先验知识3 实 验在这一节中,我们在大量的标准数据集上比较了第2节中介绍的几种半监督降维方法.这些标准数据集包−11042−2−4−4−224陈诗国等:半监督降维方法的实验比较35括UCI标准数据集∗∗∗、半监督学习数据集[5]和人脸标准数据集.实验中,对于基于类别标号的降维方法,我们按照不同的比例将数据集随机划分为两个部分:一部分(占数据总数的10%,30%或50%)作为有标号的数据,另外一部分作为无标号的数据.对于基于成对约束的降维方法,随机选取一定数目(占数据总数的10%,30%或50%)的成对约束,而把整个数据集当作无标号的数据.对于UCI标准数据集和半监督数据集,选用最大似然估计方法[45]计算数据集的内在维数,文献[2]中的实验中也采用了相同的处理方法.实验采用简单的最近邻分类器(nearest neighbor,简称NN)的分类精度作为降维方法的评价指标,使用留一交叉验证法估计最终的实验结果.各种降维方法的参数设置见表3,除了部分参数经验设定以外,其他参数都采用网格式搜索的方法确定最优设置.T able 3Parameter settings for experiments表3实验中的参数设置Methods ParametersettingsLDA NoneCCDR SDA SELF SSLFDA 1≤k≤15, t=1, 0<β≤10 1≤k≤15, β=0, 0<α≤10 1≤k≤15, 0<β<11≤k≤15, 0<γ≤10, α=8cFDA DCA SSDR cLPP NPSSDRNoneNoneα=1, 1≤β≤301≤k≤15, t=51≤k≤15, α=0.2 or 0.023.1 UCI标准数据集上的实验比较本实验中使用了8个UCI标准数据集用于测试降维算法的性能.表4中列出了这些数据集的属性(C表示类别数目,D表示维数,N表示样本数目).这些数据集中既包括简单的数据集,例如iris和soybean_small,也包括一些复杂的数据集,例如手写数据集digits0.05和letter0.05.Table 4Properties of 8 UCI datasets表48个UCI数据集的属性iris digits0.05 letter0.05protein soybean_small letter_0.1_IJL ionosphere zooC D N34150101655026161 00062011643547316227234351716101表5和表6分别显示了基于类别标号和成对约束的半监督降维方法在UCI标准数据集上的实验结果.表中列出来的结果是经过降维后的数据再用最近邻分类器得到的分类精度,每个实验结果都是算法各自运行20遍后的平均值.PCA,LDA和只利用约束的DCA作为比较的基准方法也被列了出来.表中的最左列表示的是数据集的名称、最大似然估计得到的维数值D和类别数C.表5中的nL表示的是有标号的数据在整个数据集中所占的比例(10%,30%或50%),表6中的nC表示的是约束的数目占数据总数的比例(10%,30%或50%),其中,正约束和负约束的数目相同.黑色粗体数值表示的是该数据集上最好的实验结果.∗∗∗实验中使用的UCI标准数据集可以从网站/ml/上得到,其中,digits0.05,letter0.05和letter_0.1_IJL是从原始数据集digits,letter中进行抽样得到的.具体的抽样方法可参照工具包Weka(/ml/weka/)中的数据集.。
深度强化学习在自动驾驶中的应用研究(英文中文双语版优质文档)
深度强化学习在自动驾驶中的应用研究(英文中文双语版优质文档)Application Research of Deep Reinforcement Learning in Autonomous DrivingWith the continuous development and progress of artificial intelligence technology, autonomous driving technology has become one of the research hotspots in the field of intelligent transportation. In the research of autonomous driving technology, deep reinforcement learning, as an emerging artificial intelligence technology, is increasingly widely used in the field of autonomous driving. This paper will explore the application research of deep reinforcement learning in autonomous driving.1. Introduction to Deep Reinforcement LearningDeep reinforcement learning is a machine learning method based on reinforcement learning, which enables machines to intelligently acquire knowledge and experience from the external environment, so that they can better complete tasks. The basic framework of deep reinforcement learning is to use the deep learning network to learn the mapping of state and action. Through continuous interaction with the environment, the machine can learn the optimal strategy, thereby realizing the automation of tasks.The application of deep reinforcement learning in the field of automatic driving is to realize the automation of driving decisions through machine learning, so as to realize intelligent driving.2. Application of Deep Reinforcement Learning in Autonomous Driving1. State recognition in autonomous drivingIn autonomous driving, state recognition is a very critical step, which mainly obtains the state information of the environment through sensors and converts it into data that the computer can understand. Traditional state recognition methods are mainly based on rules and feature engineering, but this method not only requires human participation, but also has low accuracy for complex environmental state recognition. Therefore, the state recognition method based on deep learning has gradually become the mainstream method in automatic driving.The deep learning network can perform feature extraction and classification recognition on images and videos collected by sensors through methods such as convolutional neural networks, thereby realizing state recognition for complex environments.2. Decision making in autonomous drivingDecision making in autonomous driving refers to the process of formulating an optimal driving strategy based on the state information acquired by sensors, as well as the goals and constraints of the driving task. In deep reinforcement learning, machines can learn optimal strategies by interacting with the environment, enabling decision making in autonomous driving.The decision-making process of deep reinforcement learning mainly includes two aspects: one is the learning of the state-value function, which is used to evaluate the value of the current state; the other is the learning of the policy function, which is used to select the optimal action. In deep reinforcement learning, the machine can learn the state-value function and policy function through the interaction with the environment, so as to realize the automation of driving decision-making.3. Behavior Planning in Autonomous DrivingBehavior planning in autonomous driving refers to selecting an optimal behavior from all possible behaviors based on the current state information and the goal of the driving task. In deep reinforcement learning, machines can learn optimal strategies for behavior planning in autonomous driving.4. Path Planning in Autonomous DrivingPath planning in autonomous driving refers to selecting the optimal driving path according to the goals and constraints of the driving task. In deep reinforcement learning, machines can learn optimal strategies for path planning in autonomous driving.3. Advantages and challenges of deep reinforcement learning in autonomous driving1. AdvantagesDeep reinforcement learning has the following advantages in autonomous driving:(1) It can automatically complete tasks such as driving decision-making, behavior planning, and path planning, reducing manual participation and improving driving efficiency and safety.(2) The deep learning network can perform feature extraction and classification recognition on the images and videos collected by the sensor, so as to realize the state recognition of complex environments.(3) Deep reinforcement learning can learn the optimal strategy through the interaction with the environment, so as to realize the tasks of decision making, behavior planning and path planning in automatic driving.2. ChallengeDeep reinforcement learning also presents some challenges in autonomous driving:(1) Insufficient data: Deep reinforcement learning requires a large amount of data for training, but in the field of autonomous driving, it is very difficult to obtain large-scale driving data.(2) Safety: The safety of autonomous driving technology is an important issue, because once an accident occurs, its consequences will be unpredictable. Therefore, the use of deep reinforcement learning in autonomous driving requires very strict safety safeguards.(3) Interpretation performance: Deep reinforcement learning requires a lot of computing resources and time for training and optimization. Therefore, in practical applications, the problems of computing performance and time cost need to be considered.(4) Interpretability: Deep reinforcement learning models are usually black-box models, and their decision-making process is difficult to understand and explain, which will have a negative impact on the reliability and safety of autonomous driving systems. Therefore, how to improve the interpretability of deep reinforcement learning models is an important research direction.(5) Generalization ability: In the field of autonomous driving, vehicles are faced with various environments and situations. Therefore, the deep reinforcement learning model needs to have a strong generalization ability in order to be able to accurately and Safe decision-making and planning.In summary, deep reinforcement learning has great application potential in autonomous driving, but challenges such as data scarcity, safety, interpretability, computational performance, and generalization capabilities need to be addressed. Future research should address these issues and promote the development and application of deep reinforcement learning in the field of autonomous driving.深度强化学习在自动驾驶中的应用研究随着人工智能技术的不断发展和进步,自动驾驶技术已经成为了当前智能交通领域中的研究热点之一。
15ICCV_Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for
Weakly-and Semi-Supervised Learning of a Deep Convolutional Network forSemantic Image SegmentationGeorge Papandreou∗Google,Inc. gpapan@ Liang-Chieh Chen∗UCLAlcchen@Kevin P.MurphyGoogle,Inc.kpmurphy@Alan L.YuilleUCLAyuille@AbstractDeep convolutional neural networks(DCNNs)trained on a large number of images with strong pixel-level anno-tations have recently significantly pushed the state-of-art in semantic image segmentation.We study the more challeng-ing problem of learning DCNNs for semantic image seg-mentation from either(1)weakly annotated training data such as bounding boxes or image-level labels or(2)a com-bination of few strongly labeled and many weakly labeled images,sourced from one or multiple datasets.We develop Expectation-Maximization(EM)methods for semantic im-age segmentation model training under these weakly super-vised and semi-supervised settings.Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC2012image segmentation benchmark,while requiring significantly less annotation effort.We share source code implementing the proposed system at https: ///deeplab/deeplab-public.1.IntroductionSemantic image segmentation refers to the problem of assigning a semantic label(such as“person”,“car”or “dog”)to every pixel in the image.Various approaches have been tried over the years,but according to the results on the challenging Pascal VOC2012segmentation benchmark,the best performing methods all use some kind of Deep Convo-lutional Neural Network(DCNN)[2,5,8,14,25,27,41].In this paper,we work with the DeepLab-CRF approach of[5,41].This combines a DCNN with a fully connected Conditional Random Field(CRF)[19],in order to get high resolution segmentations.This model achieves state-of-art results on the challenging PASCAL VOC segmentation benchmark[13],delivering a mean intersection-over-union (IOU)score exceeding70%.A key bottleneck in building this class of DCNN-based∗Thefirst two authors contributed equally to this work.segmentation models is that they typically require pixel-level annotated images during training.Acquiring such data is an expensive,time-consuming annotation effort.Weak annotations,in the form of bounding boxes(i.e.,coarse object locations)or image-level labels(i.e.,information about which object classes are present)are far easier to collect than detailed pixel-level annotations.We develop new methods for training DCNN image segmentation mod-els from weak annotations,either alone or in combination with a small number of strong annotations.Extensive ex-periments,in which we achieve performance up to69.0%, demonstrate the effectiveness of the proposed techniques.According to[24],collecting bounding boxes around each class instance in the image is about15times faster/cheaper than labeling images at the pixel level.We demonstrate that it is possible to learn a DeepLab-CRF model delivering62.2%IOU on the PASCAL VOC2012 test set by training it on a simple foreground/background segmentation of the bounding box annotations.An even cheaper form of data to collect is image-level labels,which specify the presence or absence of se-mantic classes,but not the object locations.Most exist-ing approaches for training semantic segmentation models from this kind of very weak labels use multiple instance learning(MIL)techniques.However,even recent weakly-supervised methods such as[25]deliver significantly infe-rior results compared to their fully-supervised counterparts, only achieving25.7%.Including additional trainable ob-jectness[7]or segmentation[1]modules that largely in-crease the system complexity,[31]has improved perfor-mance to40.6%,which still significantly lags performance of fully-supervised systems.We develop novel online Expectation-Maximization (EM)methods for training DCNN semantic segmentation models from weakly annotated data.The proposed algo-rithms alternate between estimating the latent pixel labels (subject to the weak annotation constraints),and optimiz-ing the DCNN parameters using stochastic gradient descent (SGD).When we only have access to image-level anno-tated training data,we achieve39.6%,close to[31]butwithout relying on any external objectness or segmenta-tion module.More importantly,our EM approach also excels in the semi-supervised scenario which is very im-portant in practice.Having access to a small number of strongly (pixel-level)annotated images and a large number of weakly (bounding box or image-level)annotated images,the proposed algorithm can almost match the performance of the fully-supervised system.For example,having access to 2.9k pixel-level images and 9k image-level annotated im-ages yields 68.5%,only 2%inferior the performance of the system trained with all 12k images strongly annotated at the pixel level.Finally,we show that using additional weak or strong annotations from the MS-COCO dataset can further improve results,yielding 73.9%on the PASCAL VOC 2012benchmark.Contributions In summary,our main contributions are:1.We present EM algorithms for training with image-level or bounding box annotation,applicable to both the weakly-supervised and semi-supervised settings.2.We show that our approach achieves excellent per-formance when combining a small number of pixel-level annotated images with a large number of image-level or bounding box annotated images,nearly match-ing the results achieved when all training images have pixel-level annotations.3.We show that combining weak or strong annotations across datasets yields further improvements.In partic-ular,we reach 73.9%IOU performance on PASCAL VOC 2012by combining annotations from the PAS-CAL and MS-COCO datasets.2.Related workTraining segmentation models with only image-level labels has been a challenging problem in the literature [12,36,37,39].Our work is most related to other re-cent DCNN models such as [30,31],who also study the weakly supervised setting.They both develop MIL-based algorithms for the problem.In contrast,our model em-ploys an EM algorithm,which similarly to [26]takes into account the weak labels when inferring the latent image seg-mentations.Moreover,[31]proposed to smooth the predic-tion results by region proposal algorithms,e.g .,CPMC [3]and MCG [1],learned on pixel-segmented images.Neither [30,31]cover the semi-supervised setting.Bounding box annotations have been utilized for seman-tic segmentation by [38,42],while [15,21,40]describe schemes exploiting both image-level labels and bounding box annotations.[4]attained human-level accuracy for car segmentation by using 3D bounding boxes.Bounding box annotations are also commonly used in interactive segmen-tation [22,33];we show that such foreground/backgroundPixel annotationsImage Deep Convolutional Neural NetworkLossFigure 1.DeepLab model training from fully annotated images.segmentation methods can effectively estimate object seg-ments accurate enough for training a DCNN semantic seg-mentation system.Working in a setting very similar to ours,[9]employed MCG [1](which requires training from pixel-level annotations)to infer object masks from bounding box labels during DCNN training.3.Proposed MethodsWe build on the DeepLab model for semantic image seg-mentation proposed in [5].This uses a DCNN to predict the label distribution per pixel,followed by a fully-connected (dense)CRF [19]to smooth the predictions while preserv-ing image edges.In this paper,we focus for simplicity on methods for training the DCNN parameters from weak la-bels,only using the CRF at test time.Additional gains can be obtained by integrated end-to-end training of the DCNN and CRF parameters [41,6].Notation We denote by x the image values and y the seg-mentation map.In particular,y m ∈{0,...,L }is the pixel label at position m ∈{1,...,M },assuming that we have the background as well as L possible foreground labels and M is the number of pixels.Note that these pixel-level la-bels may not be visible in the training set.We encode the set of image-level labels by z ,with z l =1,if the l -th label is present anywhere in the image,i.e .,if m [y m =l ]>0.3.1.Pixel-level annotationsIn the fully supervised case illustrated in Fig.1,the ob-jective function isJ (θ)=log P (y |x ;θ)=Mm =1log P (y m |x ;θ),(1)where θis the vector of DCNN parameters.The per-pixellabel distributions are computed byP (y m |x ;θ)∝exp(f m (y m |x ;θ)),(2)where f m (y m |x ;θ)is the output of the DCNN at pixel m .We optimize J (θ)by mini-batch SGD.3.2.Image-level annotationsWhen only image-level annotation is available,we can observe the image values x and the image-level labels z ,but the pixel-level segmentations y are latent variables.WeAlgorithm 1Weakly-Supervised EM (fixed bias version)Input:Initial CNN parameters θ′,potential parameters b l ,l ∈{0,...,L },image x ,image-level label set z .E-Step:For each image position m1:ˆf m (l )=f m (l |x ;θ′)+b l ,if z l =12:ˆf m (l )=f m (l |x ;θ′),if z l =03:ˆy m =argmax l ˆf m (l )M-Step:4:Q (θ;θ′)=log P (ˆy |x ,θ)= M m =1log P (ˆy m |x ,θ)5:Compute ∇θQ (θ;θ′)and use SGD to update θ′.have the following probabilistic graphical model:P (x ,y ,z ;θ)=P (x )Mm =1P (y m |x ;θ)P (z |y ).(3)We pursue an EM-approach in order to learn the model parameters θfrom training data.If we ignore terms that do not depend on θ,the expected complete-data log-likelihood given the previous parameter estimate θ′isQ (θ;θ′)= yP (y |x ,z ;θ′)log P (y |x ;θ)≈log P (ˆy |x ;θ),(4)where we adopt a hard-EM approximation,estimating in the E-step of the algorithm the latent segmentation by ˆy =argmax yP (y |x ;θ′)P (z |y )(5)=argmax ylog P (y |x ;θ′)+log P (z |y )(6)=argmaxyMm =1f m (y m |x ;θ′)+log P (z |y ) .(7)In the M-step of the algorithm,we optimize Q (θ;θ′)≈log P (ˆy |x ;θ)by mini-batch SGD similarly to (1),treatingˆyas ground truth segmentation.To completely identify the E-step (7),we need to specifythe observation model P (z |y ).We have experimented withtwo variants,EM-Fixed and EM-Adapt .EM-Fixed In this variant,we assume that log P (z |y )fac-torizes over pixel positions aslog P (z |y )=Mm =1φ(y m ,z )+(const),(8)allowing us to estimate the E-step segmentation at eachpixel separatelyˆy m =argmaxy mˆf m (y m ).=f m (y m |x ;θ′)+φ(y m ,z ).(9)ImageFigure 2.DeepLab model training using image-level labels.We assume thatφ(y m =l,z )=b l if z l =10if z l =0(10)We set the parameters b l =b fg ,if l >0and b 0=b bg ,with b fg >b bg >0.Intuitively,this potential encourages a pixel to be assigned to one of the image-level labels z .We choose b fg >b bg ,boosting present foreground classes more than the background,to encourage full object coverage andavoid a degenerate solution of all pixels being assigned to background.The procedure is summarized in Algorithm 1and illustrated in Fig.2.EM-Adapt In this method,we assume that log P (z |y )=φ(y ,z )+(const),where φ(y ,z )takes the form of a cardi-nality potential [23,32,35].In particular,we encourage atleast a ρl portion of the image area to be assigned to classl ,if z l =1,and enforce that no pixel is assigned to classl ,if z l =0.We set the parameters ρl =ρfg ,if l >0andρ0=ρbg .Similar constraints appear in [10,20].In practice,we employ a variant of Algorithm 1.Weadaptively set the image-and class-dependent biases b l so as the prescribed proportion of the image area is assigned to the background or foreground object classes.This acts as a powerful constraint that explicitly prevents the background score from prevailing in the whole image,also promoting higher foreground object coverage.The detailed algorithm is described in the supplementary material.EM It is instructive to compare our EM-based approach with two recent Multiple Instance Learning (MIL)methods for learning semantic image segmentation models [30,31].The method in [30]defines an MIL classification objective based on the per-class spatial maximum of the lo-cal label distributions of (2),ˆP (l |x ;θ).=max m P (y m =l |x ;θ),and [31]adopts a softmax function.While this approach has worked well for image classification tasks [28,29],it is less suited for segmentation as it does not pro-mote full object coverage:The DCNN becomes tuned to focus on the most distinctive object parts (e.g .,human face)instead of capturing the whole object (e.g .,human body).ImageBbox annotationsDeep ConvolutionalNeural NetworkDenseCRFargmaxLossFigure3.DeepLab model training from bounding boxes.3.3.Bounding Box AnnotationsWe explore three alternative methods for training our segmentation model from labeled bounding boxes.Thefirst Bbox-Rect method amounts to simply consider-ing each pixel within the bounding box as positive example for the respective object class.Ambiguities are resolved by assigning pixels that belong to multiple bounding boxes to the one that has the smallest area.The bounding boxes fully surround objects but also contain background pixels that contaminate the training set with false positive examples for the respective object classes.Tofilter out these background pixels,we have also explored a second Bbox-Seg method in which we per-form automatic foreground/background segmentation.To perform this segmentation,we use the same CRF as in DeepLab.More specifically,we constrain the center area of the bounding box(α%of pixels within the box)to be fore-ground,while we constrain pixels outside the bounding box to be background.We implement this by appropriately set-ting the unary terms of the CRF.We then infer the labels for pixels in between.We cross-validate the CRF parameters to maximize segmentation accuracy in a small held-out set of fully-annotated images.This approach is similar to the grabcut method of[33].Examples of estimated segmenta-tions with the two methods are shown in Fig.4.The two methods above,illustrated in Fig.3,estimate segmentation maps from the bounding box annotation as a pre-processing step,then employ the training procedure of Sec.3.1,treating these estimated labels as ground-truth.Our third Bbox-EM-Fixed method is an EM algorithm that allows us to refine the estimated segmentation maps throughout training.The method is a variant of the EM-Fixed algorithm in Sec.3.2,in which we boost the present foreground object scores only within the bounding box area.3.4.Mixed strong and weak annotationsIn practice,we often have access to a large number of weakly image-level annotated images and can only afford to procure detailed pixel-level annotations for a small fraction of these images.We handlethishybrid training scenario byImage with Bbox Ground-Truth Bbox-Rect Bbox-SegFigure4.Estimatedsegmentation frombounding box annotation.+Pixel AnnotationsFG/BGBiasargmax1. Car2. Person3. HorseDeep ConvolutionalNeural Network LossDeep ConvolutionalNeural NetworkLossScore mapsFigure5.DeepLab model training on a union of full(strong labels)and image-level(weak labels)annotations.combining the methods presented in the previous sections,as illustrated in Figure5.In SGD training of our deep CNNmodels,we bundle to each mini-batch afixed proportionof strongly/weakly annotated images,and employ our EMalgorithm in estimating at each iteration the latent semanticsegmentations for the weakly annotated images.4.Experimental Evaluation4.1.Experimental ProtocolDatasets The proposed training methods are evaluatedon the PASCAL VOC2012segmentation benchmark[13],consisting of20foreground object classes and one back-ground class.The segmentation part of the original PAS-CAL VOC2012dataset contains1464(train),1449(val),and1456(test)images for training,validation,and test,re-spectively.We also use the extra annotations provided by[16],resulting in augmented sets of10,582(train aug)and12,031(trainval aug)images.We have also experimentedwith the large MS-COCO2014dataset[24],which con-tains123,287images in its trainval set.The MS-COCO2014dataset has80foreground object classes and one back-ground class and is also annotated at the pixel level.The performance is measured in terms of pixelintersection-over-union(IOU)averaged across the21classes.Wefirst evaluate our proposed methods on the PAS-CAL VOC2012val set.We then report our results on the official PASCAL VOC2012benchmark test set(whose an-notations are not released).We also compare our test set results with other competing methods.Reproducibility We have implemented the proposed methods by extending the excellent Caffe framework[18]. We share our source code,configurationfiles,and trained models that allow reproducing the results in this paper at a companion web site https:/// deeplab/deeplab-public.Weak annotations In order to simulate the situations where only weak annotations are available and to have fair comparisons(e.g.,use the same images for all settings),we generate the weak annotations from the pixel-level annota-tions.The image-level labels are easily generated by sum-marizing the pixel-level annotations,while the bounding box annotations are produced by drawing rectangles tightly containing each object instance(PASCAL VOC2012also provides instance-level annotations)in the dataset. Network architectures We have experimented with the two DCNN architectures of[5],with parameters initialized from the VGG-16ImageNet[11]pretrained model of[34]. They differ in the receptivefield of view(FOV)size.We have found that large FOV(224×224)performs best when at least some training images are annotated at the pixel level, whereas small FOV(128×128)performs better when only image-level annotations are available.In the main paper we report the results of the best architecture for each setup and defer the full comparison between the two FOVs to the supplementary material.Training We employ our proposed training methods to learn the DCNN component of the DeepLab-CRF model of [5].For SGD,we use a mini-batch of20-30images and ini-tial learning rate of0.001(0.01for thefinal classifier layer), multiplying the learning rate by0.1after afixed number of iterations.We use momentum of0.9and a weight decay of 0.0005.Fine-tuning our network on PASCAL VOC2012 takes about12hours on a NVIDIA Tesla K40GPU.Similarly to[5],we decouple the DCNN and Dense CRF training stages and learn the CRF parameters by cross val-idation to maximize IOU segmentation accuracy in a held-out set of100Pascal val fully-annotated images.We use10 mean-field iterations for Dense CRF inference[19].Note that the IOU scores are typically3-5%worse if we don’t use the CRF for post-processing of the results.4.2.Pixel-level annotationsWe havefirst reproduced the results of[5].Training the DeepLab-CRF model with strong pixel-level annota-tions on PASCAL VOC2012,we achieve a mean IOU scoreMethod#Strong#Weak val IOUEM-Fixed(Weak)-10,58220.8EM-Adapt(Weak)-10,58238.2EM-Fixed(Semi)20010,38247.650010,08256.97509,83259.81,0009,58262.01,4645,00063.21,4649,11864.6Strong1,464-62.510,582-67.6Table1.VOC2012val performance for varying number of pixel-level(strong)and image-level(weak)annotations(Sec.4.3).Method#Strong#Weak test IOUMIL-FCN[30]-10k25.7MIL-sppxl[31]-760k35.8MIL-obj[31]BING760k37.0MIL-seg[31]MCG760k40.6EM-Adapt(Weak)-12k39.6EM-Fixed(Semi)1.4k10k66.22.9k9k68.5Strong[5]12k-70.3Table2.VOC2012test performance for varying number of pixel-level(strong)and image-level(weak)annotations(Sec.4.3).of67.6%on val and70.3%on test;see method DeepLab-CRF-LargeFOV in[5,Table1].4.3.Image-level annotationsValidation results We evaluate our proposed methods in training the DeepLab-CRF model using image-level weak annotations from the10,582PASCAL VOC2012train aug set,generated as described in Sec.4.1above.We report the val performance of our two weakly-supervised EM vari-ants described in Sec.3.2.In the EM-Fixed variant we use b fg=5and b bg=3asfixed foreground and background biases.We found the results to be quite sensitive to the dif-ference b fg−b bg but not very sensitive to their absolute val-ues.In the adaptive EM-Adapt variant we constrain at least ρbg=40%of the image area to be assigned to background and at leastρfg=20%of the image area to be assigned to foreground(as specified by the weak label set).We also examine using weak image-level annotations in addition to a varying number of pixel-level annotations, within the semi-supervised learning scheme of Sec.3.4. In this Semi setting we employ strong annotations of a subset of PASCAL VOC2012train set and use the weak image-level labels from another non-overlapping subset of the train aug set.We perform segmentation inference for the images that only have image-level labels by means of EM-Fixed,which we have found to perform better than EM-Adapt in the semi-supervised training setting.The results are summarized in Table1.We see that the EM-Adapt algorithm works much better than the EM-Fixed algorithm when we only have access to image level an-notations,20.8%vs.38.2%validation ing1,464 pixel-level and9,118image-level annotations in the EM-Fixed semi-supervised setting significantly improves per-formance,yielding64.6%.Note that image-level annota-tions are helpful,as training only with the1,464pixel-level annotations only yields62.5%.Test results In Table2we report our test results.We com-pare the proposed methods with the recent MIL-based ap-proaches of[30,31],which also report results obtained with image-level annotations on the VOC benchmark.Our EM-Adapt method yields39.6%,which improves over MIL-FCN[30]by a large13.9%margin.As[31]shows,MIL can become more competitive if additional segmentation in-formation is introduced:Using low-level superpixels,MIL-sppxl[31]yields35.8%and is still inferior to our EM algo-rithm.Only if augmented with BING[7]or MCG[1]can MIL obtain results comparable to ours(MIL-obj:37.0%, MIL-seg:40.6%)[31].Note,however,that both BING and MCG have been trained with bounding box or pixel-annotated data on the PASCAL train set,and thus both MIL-obj and MIL-seg indirectly rely on bounding box or pixel-level PASCAL annotations.The more interestingfinding of this experiment is that including very few strongly annotated images in the semi-supervised setting significantly improves the performance compared to the pure weakly-supervised baseline.For example,using 2.9k pixel-level annotations along with 9k image-level annotations in the semi-supervised setting yields68.5%.We would like to highlight that this re-sult surpasses all techniques which are not based on the DCNN+CRF pipeline of[5](see Table6),even if trained with all available pixel-level annotations.4.4.Bounding box annotationsValidation results In this experiment,we train the DeepLab-CRF model using bounding box annotations from the train aug set.We estimate the training set segmentations in a pre-processing step using the Bbox-Rect and Bbox-Seg methods described in Sec.3.3.We assume that we also have access to100fully-annotated PASCAL VOC2012val images which we have used to cross-validate the value of the single Bbox-Seg parameterα(percentage of the cen-ter bounding box area constrained to be foreground).We variedαfrom20%to80%,finding thatα=20%maxi-mizes accuracy in terms of IOU in recovering the ground truth foreground from the bounding box.We also examine the effect of combining these weak bounding box annota-tions with strong pixel-level annotations,using the semi-supervised learning methods of Sec.3.4.The results are summarized in Table3.When using only bounding box annotations,we see that Bbox-Seg improves over Bbox-Rect by8.1%,and gets within7.0%of the strong pixel-level annotation result.We observe that combining 1,464strong pixel-level annotations with weak bounding box annotations yields65.1%,only2.5%worse than the strong pixel-level annotation result.In the semi-supervisedMethod#Strong#Box val IOUBbox-Rect(Weak)-10,58252.5Bbox-EM-Fixed(Weak)-10,58254.1Bbox-Seg(Weak)-10,58260.6Bbox-Rect(Semi)1,4649,11862.1Bbox-EM-Fixed(Semi)1,4649,11864.8Bbox-Seg(Semi)1,4649,11865.1Strong1,464-62.510,582-67.6Table3.VOC2012val performance for varying number of pixel-level(strong)and bounding box(weak)annotations(Sec.4.4).Method#Strong#Box test IOUBoxSup[9]MCG10k64.6BoxSup[9] 1.4k(+MCG)9k66.2Bbox-Rect(Weak)-12k54.2Bbox-Seg(Weak)-12k62.2Bbox-Seg(Semi) 1.4k10k66.6Bbox-EM-Fixed(Semi) 1.4k10k66.6Bbox-Seg(Semi) 2.9k9k68.0Bbox-EM-Fixed(Semi) 2.9k9k69.0Strong[5]12k-70.3Table4.VOC2012test performance for varying number of pixel-level(strong)and bounding box(weak)annotations(Sec.4.4).learning settings and1,464strong annotations,Semi-Bbox-EM-Fixed and Semi-Bbox-Seg perform similarly.Test results In Table4we report our test results.We com-pare the proposed methods with the very recent BoxSup ap-proach of[9],which also uses bounding box annotations on the VOC2012segmentation paring our al-ternative Bbox-Rect(54.2%)and Bbox-Seg(62.2%)meth-ods,we see that simple foreground-background segmenta-tion provides much better segmentation masks for DCNN training than using the raw bounding boxes.BoxSup does 2.4%better,however it employs the MCG segmentation proposal mechanism[1],which has been trained with pixel-annotated data on the PASCAL train set;it thus indirectly relies on pixel-level annotations.When we also have access to pixel-level annotated im-ages,our performance improves to66.6%(1.4k strong annotations)or69.0%(2.9k strong annotations).In this semi-supervised setting we outperform BoxSup(66.6%vs.66.2%with1.4k strong annotations),although we do not use MCG.Interestingly,Bbox-EM-Fixed improves over Bbox-Seg as we add more strong annotations,and it per-forms1.0%better(69.0%vs.68.0%)with2.9k strong an-notations.This shows that the E-step of our EM algorithm can estimate the object masks better than the foreground-background segmentation pre-processing step when enough pixel-level annotated images are available.Comparing with Sec.4.3,note that2.9k strong+9k image-level annotations yield68.5%(Table2),while2.9k strong+9k bounding box annotations yield69.0%(Ta-ble3).Thisfinding suggests that bounding box annotations add little value over image-level annotations when a suffi-cient number of pixel-level annotations is also available.Method#Strong COCO#Weak COCO val IOU PASCAL-only--67.6EM-Fixed(Semi)-123,28767.7Cross-Joint(Semi)5,000118,28770.0Cross-Joint(Strong)5,000-68.7Cross-Pretrain(Strong)123,287-71.0Cross-Joint(Strong)123,287-71.7 Table5.VOC2012val performance using strong annotations for all10,582train aug PASCAL images and a varying number of strong and weak MS-COCO annotations(Sec.4.5).Method test IOUMSRA-CFM[8]61.8FCN-8s[25]62.2Hypercolumn[17]62.6TTI-Zoomout-16[27]64.4DeepLab-CRF-LargeFOV[5]70.3BoxSup(Semi,with weak COCO)[9]71.0DeepLab-CRF-LargeFOV(Multi-scale net)[5]71.6Oxford TVG CRF RNN VOC[41]72.0Oxford TVG CRF RNN COCO[41]74.7Cross-Pretrain(Strong)72.7Cross-Joint(Strong)73.0Cross-Pretrain(Strong,Multi-scale net)73.6Cross-Joint(Strong,Multi-scale net)73.9Table6.VOC2012test performance using PASCAL and MS-COCO annotations(Sec.4.5).4.5.Exploiting Annotations Across Datasets Validation results We present experiments leveraging the 81-label MS-COCO dataset as an additional source of data in learning the DeepLab model for the21-label PASCAL VOC2012segmentation task.We consider three scenarios:•Cross-Pretrain(Strong):Pre-train DeepLab on MS-COCO,then replace the top-level network weights and fine-tune on Pascal VOC2012,using pixel-level anno-tation in both datasets.•Cross-Joint(Strong):Jointly train DeepLab on Pas-cal VOC2012and MS-COCO,sharing the top-level network weights for the common classes,using pixel-level annotation in both datasets.•Cross-Joint(Semi):Jointly train DeepLab on Pascal VOC2012and MS-COCO,sharing the top-level net-work weights for the common classes,using the pixel-level labels from PASCAL and varying the number of pixel-and image-level labels from MS-COCO.In all cases we use strong pixel-level annotations for all 10,582train aug PASCAL images.We report our results on the PASCAL VOC2012val in Table5,also including for comparison our best PASCAL-only67.6%result exploiting all10,582strong annotations as a baseline.When we employ the weak MS-COCO an-notations(EM-Fixed(Semi))we obtain67.7%IOU,which does not improve over the PASCAL-only baseline.How-ever,using strong labels from5,000MS-COCO images (4.0%of the MS-COCO dataset)and weak labels from the remaining MS-COCO images in the Cross-Joint(Semi) semi-supervised scenario yields70.0%,a significant2.4%boost over the baseline.This Cross-Joint(Semi)result is also1.3%better than the68.7%performance obtained us-ing only the5,000strong and no weak annotations from MS-COCO.As expected,our best results are obtained by using all123,287strong MS-COCO annotations,71.0%for Cross-Pretrain(Strong)and71.7%for Cross-Joint(Strong). We observe that cross-dataset augmentation improves by 4.1%over the best PASCAL-only ing only a small portion of pixel-level annotations and a large portion of image-level annotations in the semi-supervised setting reaps about half of this benefit.Test results We report our PASCAL VOC2012test re-sults in Table6.We include results of other leading models from the PASCAL leaderboard.All our models have been trained with pixel-level annotated images on the PASCAL trainval aug and the MS-COCO2014trainval datasets.Methods based on the DCNN+CRF pipeline of DeepLab-CRF[5]are the most competitive,with perfor-mance surpassing70%,even when only trained on PAS-CAL data.Leveraging the MS-COCO annotations brings about2%improvement.Our top model yields73.9%,using the multi-scale network architecture of[5].Also see[41], which also uses joint PASCAL and MS-COCO training,and further improves performance(74.7%)by end-to-end learn-ing of the DCNN and CRF parameters.4.6.Qualitative Segmentation ResultsIn Fig.6we provide visual comparisons of the results obtained by the DeepLab-CRF model learned with some of the proposed training methods.5.ConclusionsThe paper has explored the use of weak or partial anno-tation in training a state of art semantic image segmenta-tion model.Extensive experiments on the challenging PAS-CAL VOC2012dataset have shown that:(1)Using weak annotation solely at the image-level seems insufficient to train a high-quality segmentation model.(2)Using weak bounding-box annotation in conjunction with careful seg-mentation inference for images in the training set suffices to train a competitive model.(3)Excellent performance is obtained when combining a small number of pixel-level an-notated images with a large number of weakly annotated images in a semi-supervised setting,nearly matching the results achieved when all training images have pixel-level annotations.(4)Exploiting extra weak or strong annota-tions from other datasets can lead to large improvements. AcknowledgmentsThis work was partly supported by ARO62250-CS,and NIH5R01EY022247-03.We also gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.。
semi-supervised deep continuous learning 笔记
semi-supervised deep continuous
learning 笔记
半监督学习(semi-supervised learning)研究的是如何充分利用已标记数据和大量未标记数据来提升学习性能的方法。
假设数据取样自样本空间$X$和标记空间$Y$上的未知分布$D$。
在半监督学习中,给定标记数据集$L$和未标记数据集$U$,其中$xi∈X,yi∈Y$,且通常有$l<<m$,目标为学习获得$H:X→V$。
简单起见,考虑二分类任务,即$Y=\{-1,+1\}$。
未标记数据可以帮助揭示潜在数据分布的一些信息,进而帮助构建具有更强泛化能力的模型。
半监督学习方法的基础就在于如何构建假设来刻画未标记数据所揭示的数据分布和类别标记信息之间的联系。
两类基本假设包括聚类假设(cluster assumption)和流形假设(manifold assumption)。
聚类假设假设相似的输入具有相似的类别标记;流形假设假设所有的数据都在一个低维流形中,未标记数据有助于构造此流形。
聚类假设关注分类任务,流形假设可被应用到分类以外的其他学习任务上。
另外,直推学习和半监督学习紧密相关,两者的区别在于对测试数据的假设。
直推学习采用封闭世界假设(closed-world assumption),即未标记数据就是测试数据,事先已知;半监督学习是基于开放世界假设(open-world assumption),即测试数据未知,未标记数据未必是测试数据。
因此,直推学习可以看成一种特殊的半监督学习。
总的来说,半监督学习是一个不断发展和演进的领域,它在机器学习和数据挖掘领域中具有重要的研究价值和应用前景。
机器学习中强化学习与监督学习、无监督学习和强化学习的区别
机器学习中强化学习与监督学习、⽆监督学习和强化学习的区别监督学习(Supervised learning)监督学习即具有特征(feature)和标签(label)的,即使数据是没有标签的,也可以通过学习特征和标签之间的关系,判断出标签--分类。
简⽽⾔之:提供数据,预测标签。
⽐如对动物猫和狗图⽚进⾏预测,预测label为cat或者dog。
通过已有的⼀部分输⼊数据与输出数据之间的对应关系,⽣成⼀个函数,将输⼊映射到合适的输出,例如分类。
e.g. 分类和回归问题⽆监督学习(Unsupervised learning)⽆监督学习即只有特征,没有标签。
没有标签的训练数据集中,通过数据之间的内在联系和相似性将他们分成若⼲类--聚类。
根据数据本⾝的特征,从数据中根据某种度量学习出⼀些特征。
e.g. ⽐如⼀个⼈没有见过恐龙和鲨鱼,如果给他看了⼤量的恐龙和鲨鱼,虽然他没有恐龙和鲨鱼的概念,但是他能够观察出每个物种的共性和两个物种之间的区别,并对这两种动物予以区分。
简⽽⾔之:给出数据,寻找隐藏的关系。
半监督学习(semi-supervised learning):半监督学习使⽤的数据,⼀部分是标记过的,⽽⼤部分是没有标记的,和监督学习相⽐较,半监督学习的成本⽐较低,但是⼜能达到较⾼的准确度。
即综合利⽤有类标的和没有类别标记的数据,来⽣成合适的分类函数。
强化学习(Reinforcement learning)强化学习与半监督学习类似,均使⽤未标记的数据,但是强化学习通过算法学习是否距离⽬标越来越近,我理解为激励与惩罚函数。
类似⽣活中,⼥朋友不断调教直男友变成暖男。
区别:(1)监督学习有反馈,⽆监督学习⽆反馈,强化学习是执⾏多步之后才反馈。
(2)强化学习的⽬标与监督学习的⽬标不⼀样,即强化学习看重的额时⾏为序列下的长期收益,⽽监督学习往往关注的是和标签或已知输出的误差。
(3)强化学习的奖惩是没有正确或错误之分的,⽽监督学习标签就是正确的,并且强化学习是⼀个学习+决策的过程,有和环境交互的能⼒(交互的结果以惩罚的形式返回),⽽监督学习不具备。
重庆大学研究生学位论文格式
重庆大学研究生学位论文格式内容摘要:重庆大学研究生学位论文格式,重庆大学研究生论文模板 1 引言的质量,便利研究生学位论文的收集、存储、处理、加工、检索、利用、交流、传播。
1.2 本标准... 重庆大学研究生学位论文格式,重庆大学研究生论文模板1 引言的质量,便利研究生学位论文的收集、存储、处理、加工、检索、利用、交流、传播。
1.2 本标准适用于申请硕士学位、博士学位的学位论文的编写格式。
1.3 本标准是参照中华人民共和国国家标准《科学技术报告、学位论文和学术论文的编写格式》和《文后参考文献著录规则》制订的。
2 学位论文2.1 硕士学位论文硕士学位论文应能表明作者确已在本门学科上掌握了坚实的基础理论和系统的专业知识,并对所研究课题有新的见解,有从事科学研究工作或独立担负专门技术工作的能力。
2.2 博士学位论文博士学位论文应能表明作者确已在本门学科上掌握了坚实宽广的基础理论和系统深入的专门知识,并具有独立从事科学研究工作的能力,在科学或专门技术上做出了创造性的成果。
3 编写要求3.1 学位论文须用16K标准白纸、使用简化汉字、计算机打印、复制。
3.2 学位论文页边距按以下标准设置:上边距:2.8cm;下边距:2.5cm;左边距:2.5cm;右边距:2.5cm;装订线:0.5cm;页眉:1.6cm;页脚:1.5cm。
3.3 页眉从摘要页开始到最后,在每一页的最上方,用5号宋体,左对齐为“重庆大学博士(或硕士)学位论文”,右对齐为各章章名,页眉之下划1条线。
双面复制的论文,左页页眉居中为“重庆大学硕士(或博士)学位论文”,右页页眉居中为各章章名。
3.4 学位论文字间距设置为标准字间距(小四号宋体)或加宽0.2磅(五号宋体);行间距设置为加宽0.2磅。
也可参考上述值按每页32字×36行(小四字宋体)或34字×36行(五号宋体)设置。
4 编写格式4.1 学位论文章、节的编号采用阿拉伯数字分级编号(见6.2.1)。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
I. I NTRODUCTION Topographic visualisation of multi-dimensional data has been an important method of data analysis and data mining for several years [4], [18]. Visualisation is an effective way for domain experts to detect clusters, outliers and other important structural features in data. In addition, it can be used to guide the data mining process itself by giving feedback on the results of analysis [23]. In this paper we use latent variable models to visualise data, so that a single plot may contain several data clusters; our aim is to provide sufficiently informative plots that the clusters can be seen to be distinct rather than confining each model to a single cluster (as would be appropriate for cluster analysis). In a complex domain, however, a single two-dimensional projection of high-dimensional data may not be sufficient to capture all of the interesting aspects of the data. Therefore, hierarchical
1
Semi-Supervised Learning of Hierarchical Latent Trait Models for Data Visualisation
Ian T. Nabney, Yi Sun, Peter Tiˇ no, and Ata Kab´ an
Ian T. Nabney is with the Neural Computing Research Group, Aston University, Birmingham, B4 7ET, United Kingdom. E-mail: i.t.nabney@ Yi Sun is with the School of Computer Science, University of Hertfordshire, Hatfield, Herts AL10 9AB, United Kingdom. E-mail: Y.2.Sun@ Peter Tiˇ no and Ata Kab´ an are with the School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom. E-mail: P.Tino, A.Kaban@.
August 16, 2004 Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualisation of large high-dimensional real-valued data sets. In this paper, we propose a more general visualisation system by extending HGTM in 3 ways, which allow the user to visualise a wider range of datasets and better support the model development process. (i) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualise data of inherently discrete nature, e.g. collections of documents in a hierarchical manner. (ii) We give the user a choice of initialising the child plots of the current plot in either interactive, or automatic mode. In the interactive mode the user selects “regions of interest”, whereas in the automatic mode an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualising large data sets. (iii) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets. Index Terms Hierarchical model, Latent trait model, Magnification factors, Data visualisation, Document mining.
August 16, 2004 DRAFT
3
extensions of visualisation methods [7], [22] have been developed. These allow the user to ‘drill down’ into the data; each plot covers a smaller region and it is therefore easier to discern the structure of data. Also plots may be at an angle and so reveal more information. For example, clusters may be split apart instead of lying on top of each other. Recently, we have developed a general and principled approach to the interactive construction of non-linear visualisation hierarchies [27], the basic building block of which is the Generative Topographic Mapping (GTM) [4]. GTM is a probabilistic reformulation of the self-organizing map (SOM) [17] in the form of a non-linear latent variable model with a spherical Gaussian noise model. The extension of the GTM algorithm to discrete variables was described in [5] and a generalisation of this to the Latent Trait Model (LTM), a latent variable model class whose noise models are selected from the exponential family of distributions, was developed in [14]. In this paper we extend the hierarchical GTM (HGTM) visualisation system to incorporate LTMs. This enables us to visualise data of an inherently discrete nature, e.g. collections of documents. A hierarchical visualisation plot is built in a recursive way; after viewing the plots at a given level, the user may add further plots at the next level down in order to provide more insight. These child plots can be trained using the EM algorithm [10], but their parameters must be initialized in some way. Existing hierarchical models do this by allowing the user to select the position of each child plot in an interactive mode; see [27]. In this paper, we show how to provide the user with an automatic initialization mode which works within the same principled probabilistic framework as is used for the overall hierarchy. The automatic mode allows the user to determine both the number and the position of child LTMs in an unsupervised manner. This is particularly valuable when dealing with large quantities of data that make visualisation plots at higher levels complex and difficult to deal with in an interactive manner. An intuitively simple but flawed approach would be to use a data partitioning technique (e.g. [25]) for segmenting the data set, followed by constructing visualisation plots in the individual compartments. Clearly, in this case there would be no direct connection between the criterion for choosing the quantization regions and that of making the local low-dimensional projections. By employing LTM, however, such a connection can be established in a principled manner. This is achieved by exploiting the probabilistic nature of the model, which enables us to use a principled minimum message length (MML)-based learning of mixture models with an embedded model