Semi-supervised Clustering with Limited Background Knowledge
The EM (Expectation Maximization) algorithm is used for clustering along with must-link constraints. The proposed method is applied to natural images using MATLAB 7. The proposed method thus extracts the Object of Interest (OOI) from the Object of Not Interest (OONI) efficiently, and the experimental results are compared with standard K-Means and the EM algorithm. The results show that the proposed system gives better results than the other two methods. It may also be suitable for object extraction from natural images and for medical image analysis.

Index Terms: semi-supervised image segmentation, prior knowledge, constrained clustering.

I. INTRODUCTION

Image segmentation is the process of dividing an image into different regions such that each region is homogeneous. By partitioning an image into a set of disjoint segments, image segmentation leads to a more compact image representation. As a central step in computer vision and image understanding, image segmentation has been extensively investigated in the past decades, and a large number of segmentation algorithms exist in the literature. However, no single method can be considered best for all kinds of images, and most techniques are fairly ad hoc in nature.

A) Need for semi-supervised image segmentation:

A semi-supervised method combines supervised (classification) and unsupervised (clustering) concepts: before clustering is performed, some prior knowledge is supplied. A purely unsupervised (clustering) algorithm will not give good results for all kinds of images, since iterative clustering algorithms commonly do not lead to optimal cluster solutions. The partitions generated by these algorithms are known to be sensitive to the initial partitions supplied as input. Selecting a “good” initial seed is therefore an important clustering problem.
Likewise, a classification algorithm alone will not give the best solution, since the result depends on the type of classifier. This paper therefore discusses a combination of the two approaches, called the ‘semi-supervised model’, for image segmentation. The following section discusses the semi-supervised model [13][14].

B) Semi-supervised clustering:

During the clustering process a small amount of prior knowledge is given, either as labels, as constraints, or as any other prior information. The following figure illustrates the semi-supervised clustering model [15][16][17].

Figure 1. Semi-supervised clustering model

In the above figure, the three clusters are formed using certain constraints or prior information. Besides the similarity information, which is used as color knowledge, other kinds of knowledge are also available, either as pairwise (must-link or cannot-link) constraints between data items or as class labels for some items. Instead of simply using this knowledge for external validation of the clustering results, one can let it “guide” or “adjust” the clustering process, i.e., provide a limited form of supervision. There are two ways to provide information for semi-supervised clustering:

1. Search based.
2. Similarity based.

A New Enhanced semi supervised image segmentation using Marker as Prior information

1. Search based: The clustering algorithm itself is modified so that user-provided constraints or labels can be used to bias the search for an appropriate clustering. This can be done in several ways, such as by performing a transitive closure of the constraints and using them to initialize clusters [4], by including in the cost function a penalty for lack of compliance with the specified constraints [10][11], or by requiring constraints to be satisfied during cluster assignment in the clustering process [12].

2. Similarity based: Several similarity measures exist in the domain.
One such similarity measure is adapted [7][8][9] so that the given constraints can be easily satisfied.

This paper discusses semi-supervised image segmentation with minimal user labeling. Instead of selecting some sample pixels with mouse clicks [1], a group of pixels is selected as a region using the mouse. A pixel that has the same color and intensity as the selected marker region will fall into one cluster, and other pixels will not. This concept is described in detail in the following sections.

I. PREVIOUS RELATED WORKS

Recently, many papers have focused on the importance of semi-supervised image segmentation; a few of them are analyzed here. Paper [3] introduces a semi-supervised C-Means algorithm to address three problems: choosing and validating the correct number of clusters, ensuring that algorithmic labels correspond to meaningful physical labels, and the tendency to recommend solutions that equalize cluster populations. The algorithm was applied to MRI brain images for segmentation.

Paper [4] shows how the popular k-means clustering algorithm can be modified to make use of the available information, with some artificial constraints. The method was evaluated on six datasets and showed good improvement in clustering accuracy. It was also applied to the real-world problem of automatically detecting road lanes from GPS data, where dramatic increases in performance were observed.

Paper [5] proposes a novel semi-supervised Fuzzy C-Means algorithm. A seed set containing a small amount of labeled data is used. First, an initial partition of the seed set is made; then the center of each partition is used as a cluster center, and the objective function of FCM is optimized using the EM algorithm.
Experimental results show that the sensitivity of fuzzy c-means to the initial centers is partly avoided and that the method gives much better partition accuracy.

In paper [6], semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. Labeled data is used to generate initial seed clusters, and constraints generated from the labeled data guide the clustering process. The paper introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.

Paper [12] focuses on semi-supervised clustering where the goal is to cluster a set of data points given a set of similar/dissimilar examples. Along with instance-level equivalence constraints (similar pairs belong to the same cluster) and in-equivalence constraints (dissimilar pairs belong to different clusters), feature-space-level constraints (how similar two regions in feature space are) are also used to obtain the final clustering. This is accomplished by learning distance metrics over the feature space that are guided by the instance-level constraints. Bag-of-words models, whose code words (or visual words) serve as building blocks, are used. The technique proposed in [12] learns non-parametric distance metrics over codewords from these equivalence (and optionally in-equivalence) constraints, which are then propagated back to compute a dissimilarity measure between any two points in the feature space. This work thus advances over previous work in several respects.
First, unlike past efforts on global distance metric learning, which try to transform the entire feature space so that similar pairs are close, this transformation is non-parametric and thus allows arbitrary non-linear deformations of the feature space. Second, while most Mahalanobis metrics are learnt using Semi-Definite Programming (SDP), this paper uses a Linear Program (LP), which in practice is extremely fast. Finally, the method is evaluated on image datasets (MSRC, Corel) where ground-truth segmentation is available. Overall, this idea gives improved clustering accuracy.

II. METHODOLOGY

In this paper, a ground-truth image is taken with proper class labels. The Octree color quantization algorithm is applied to obtain a reduced set of colors. This color table is integrated with must-link constraints for the given image using the EM algorithm.

A group of pixels must be selected as a region separately for the object of interest (OOI) and the object of not interest (OONI) in the input image. Here OOI and OONI refer to the foreground and background objects respectively. Find the smallest distance from each pixel in the marker to its neighboring pixels using the Mahalanobis formula, which in its standard form is

$$d(x, y) = \sqrt{(x - y)\,\mathrm{COV}^{-1}\,(x - y)^{T}} \qquad (1)$$

where x and y are row vectors of pixel values (x containing the pixels inside a marker, y those of the other area) and COV represents the sample covariance matrix.

If this distance is less than the assumed threshold value (0.5), then find the exact color index. If the index values are the same, assign the pixels to the same region; otherwise (same value and different class index) delete one of the indices and group. This process is repeated for all pixels in the marker. Finally, the object of interest (OOI) is segmented more clearly than with the other methods.

Fig 1.
Work flow diagram of the proposed work

The algorithm based on the proposed idea is given below:

1. A labeled image is taken as the input image.
2. Apply Octree color quantization to reduce the colors and store them with class labels.
3. Mark the object of interest (OOI) and the object of not interest (OONI) using the mouse.
4. Cluster the quantized color table using the EM algorithm integrated with must-link constraints. A must-link constraint means that points belonging to the same region are grouped into one region; otherwise they need not be grouped.
5. Repeat steps 4 and 5 for each point in the object of interest (OOI) and the object of not interest (OONI).
6. Let X be a selection with N points.
7. For each point in X:
   a. Let Y = x_i (x_i ∈ X).
   b. Consider its neighboring coordinates within a rectangle R of size 3 x 3.
   c. Calculate the distance vector using the Mahalanobis distance d(R_i, Y).
   d. Find the minimum distance d(R_i, Y).
   e. If d < threshold, then find the color index of Y and R_i:
      i. If they belong to the same label, group them into the same region.
      ii. If they belong to different labels but have the same color, then delete R_i.

III. RESULTS AND DISCUSSIONS

Figure 2: Three different labeled images are taken as input. These three images are partially separated images. (Fig 2.a, Fig 2.b, Fig 2.c)

Figure 3: Marking of OOI (object of interest) and OONI (object of not interest) for the above three figures using blue and green color. Object of interest: green mark. Object of not interest: blue mark. (Fig 3.a, Fig 3.b, Fig 3.c)

Figure 4: Result of the proposed method for the above three figures. (Fig 4.a, Fig 4.b, Fig 4.c)

Figure 5: Labeled image and different marker selections. (Fig 5.a, Fig 5.b, Fig 5.c)

Figure 6: Results of the proposed idea for Figure 5 using different marker selections. (Fig 6.a, Fig 6.b: the object is segmented properly. Fig 6.c: the object of interest is not segmented properly.)

Figure 7: Results for the input images using the K-Means method. (Fig 7.a, Fig 7.b, Fig 7.c)

Figure 8: Result for the input images using the Standard EM
method. (Fig 8.a, Fig 8.b, Fig 8.c)

The time taken to obtain the result for input image 1 (Bird) using the proposed method is recorded in the table.

Table 1. Performance table for image 1

It shows that the proposed method takes more time than K-Means and less time than the EM method. The performance is shown in the chart given below.

Figure 9. Performance chart

With the proposed idea, any labeled image is taken as input. Two markings, OOI and OONI (object of interest and object of not interest), are given in different colors. Two different marker selections are shown in Figures 3 and 5. According to the marker selection given, the object is extracted from its background. Compared to the results in Figure 4, the segmentation results are better in Figure 6 for some pictures. This is because the quality of segmentation depends on marker selection. The proposed results are also compared with K-Means and the Standard EM algorithm. In Figures 7 and 8, the K-Means and EM algorithms do not segment the foreground of the object accurately, whereas the proposed method gives better results in Figure 6, and the object of interest is separated accurately.

IV. CONCLUSION

The above results show that the proposed semi-supervised segmentation extracts the object of interest precisely. However, the result of segmentation depends on the marker selection on the left and right sides of the image. If the marker selection is not given properly, the segmentation result will not be good. This limitation may be eliminated in future by adding further constraints for texture, color, etc.

REFERENCES

[1] Yuntao Qian, Wenwu Si, "Semi-supervised Color Image Segmentation Method", IEEE, 2005.
[2] Yanhua Chen, Manjeet Rege, Ming Dong, Jing Hua, Farshad Fotouhi, Department of Computer Science, Wayne State University, Detroit, MI 48202, "Incorporating User Provided Constraints into Document Clustering", 2009.
[3] Amine M. Bensaid, Lawrence O.
Hall, Department of Computer Science and Engineering, University of South Florida, Tampa, "Partially Supervised Clustering for Image Segmentation", 1994.
[4] Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schroedl, "Constrained K-means Clustering with Background Knowledge", 2001.
[5] Kunlun Li, Zheng Cao, Liping Cao, Rui Zhao, Coll. of Electron. & Inf. Eng., Hebei Univ., Baoding, China, "A novel semi-supervised fuzzy c-means clustering method", 2009, IEEE Xplore.
[6] Sugato Basu, Arindam Banerjee, R. Mooney, "Semi-supervised Clustering by Seeding", in Proceedings of the 19th International Conference on Machine Learning (ICML-2002), 2002.
[7] David Cohn, Rich Caruana, Andrew McCallum, "Semi-supervised clustering with user feedback", 2000.
[8] Dan Klein, Sepandar D. Kamvar, Christopher D. Manning, "From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering", in Proceedings of the 19th International Conference on Machine Learning, pages 307–314, Morgan Kaufmann Publishers Inc., 2002.
[9] Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, Stuart Russell, "Distance metric learning with application to clustering with side-information", in S. Thrun, S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 505–512, Cambridge, MA, MIT Press, 2003.
[10] A. Demiriz, K. Bennett, M. Embrechts, "Semi-supervised clustering using genetic algorithms", in C. H. Dagli et al., editors, Intelligent Engineering Systems Through Artificial Neural Networks 9, pages 809–814, ASME Press, 1999.
[11] K. Wagstaff and C. Cardie, "Clustering with instance-level constraints", in Proceedings of the 17th International Conference on Machine Learning, pages 1103–1110, 2000.
[12] Dhruv Batra, Rahul Sukthankar, Tsuhan Chen, "Semi-Supervised Clustering via Learnt Codeword Distances", 2008.
[13] Richard Nock, Frank Nielsen, "Semi-supervised statistical region refinement for color image segmentation", 2005.
[14] Jan Kohout, Czech Technical University in Prague, Faculty of Electrical Engineering, Supervisor: Ing. Jan Urban, "Semi supervised image segmentation of biological samples", PPT, July 29, 2010.
[15] António R. C. Paiva and Tolga Tasdizen, "Fast semi-supervised image segmentation by novelty selection", 2009.
[16] Kwangcheol Shin and Ajith Abraham, "Two Phase Semi-supervised Clustering Using Background Knowledge", 2006.
[17] Mário A. T. Figueiredo, Dong Seon Cheng, Vittorio Murino, "Clustering Under Prior Knowledge with Application to Image Segmentation", 2005.

S No   Methods    Time in sec.
1      K Means    6.3214
2      EM         40.22405
3      Proposed   13.05598

Biography:

Mrs L. Sankari is currently working as an Assistant Professor in the Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore 641044, Tamilnadu, India. She has about 16 years of teaching experience. She has published four national and four international research papers. Her research interests include image processing, data mining, pattern classification and optimization techniques.

Dr. C. Chandrasekar received his Ph.D. degree from Periyar University, Salem. He has been working as Associate Professor at the Dept. of Computer Science, Periyar University, Salem 636 011, Tamilnadu, India. His research interests include wireless networking, mobile computing, computer communication and networks. He has been a research guide at various universities in India. He has published more than 50 research papers in various national and international journals.
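The neighbor test in step 7 of the algorithm in the Methodology section (Mahalanobis distance of a marker pixel to its neighbors, compared against the threshold 0.5) can be illustrated with a minimal Python sketch. The paper's own implementation was in MATLAB 7 and is not shown, so everything here is an assumption made for illustration: the function names `sample_covariance`, `mahalanobis` and `closest_neighbor` are hypothetical, pixels are represented as plain feature tuples, and the covariance matrix is assumed to be non-singular.

```python
import math


def sample_covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of feature vectors."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(dim)] for a in range(dim)]


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def mahalanobis(x, y, cov):
    """sqrt((x - y) COV^-1 (x - y)^T), computed without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = COV^-1 (x - y)^T
    return math.sqrt(sum(d * w for d, w in zip(diff, z)))


def closest_neighbor(center, neighbors, cov, threshold=0.5):
    """Return (index, distance) of the nearest neighbor if it is below the
    threshold, otherwise None. The 0.5 default is the value quoted in the text."""
    distances = [mahalanobis(n, center, cov) for n in neighbors]
    best = min(range(len(distances)), key=lambda i: distances[i])
    if distances[best] < threshold:
        return best, distances[best]
    return None
```

With an identity covariance matrix the distance reduces to the Euclidean distance, which makes the sketch easy to check by hand. The grouping and deletion rules of steps e.i and e.ii would then be applied to the returned neighbor using its color index and class label.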
Notes on semi-supervised deep continuous learning
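Semi-supervised learning studies how to make full use of labeled data together with large amounts of unlabeled data to improve learning performance.
Its two basic assumptions are the cluster assumption and the manifold assumption.
Transductive learning adopts a closed-world assumption: the unlabeled data are exactly the test data and are known in advance. Semi-supervised learning is based on an open-world assumption: the test data are unknown, and the unlabeled data are not necessarily the test data.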
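Basic concepts of the DBSCAN algorithm
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a widely used density-based clustering algorithm for cluster analysis of a dataset.
1. What is density-based clustering? Density-based clustering is a clustering method based on the density relationships between samples, unlike traditional distance-based clustering methods.
2. What is the basic principle of DBSCAN? DBSCAN is built on two basic concepts: core points and density-reachability.
3. What are the basic steps of DBSCAN? The basic steps include the following. Step 1: select an unvisited point P.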
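The notes stop at step 1, so the following is a minimal sketch of how the core-point and density-reachability ideas are usually turned into a full procedure, not a reconstruction of the omitted steps. The names `region_query`, `dbscan`, `eps` and `min_pts` are assumptions chosen for illustration, and a point is treated as a core point when at least `min_pts` points (itself included) lie within distance `eps`.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)        # None means "not yet visited"
    cluster_id = -1
    for p in range(len(points)):         # step 1: pick an unvisited point P
        if labels[p] is not None:
            continue
        neighbors = region_query(points, p, eps)
        if len(neighbors) < min_pts:     # P is not a core point
            labels[p] = NOISE            # may be relabeled later as a border point
            continue
        cluster_id += 1                  # P is a core point: start a new cluster
        labels[p] = cluster_id
        queue = [q for q in neighbors if q != p]
        while queue:                     # grow the cluster by density-reachability
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id   # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is also a core point: keep expanding
    return labels
```

The brute-force neighbor search keeps the sketch short; practical implementations typically use a spatial index instead.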
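Proximal Policy Optimization (PPO)
PPO is a policy-based deep reinforcement learning algorithm that can handle reinforcement learning problems with continuous action spaces.
Double Deep Q-Network (DDQN)
DDQN is an improved variant of DQN that uses two neural networks to estimate Q-values, which addresses the stability problems of DQN, in particular its tendency to overestimate Q-values.
Asynchronous Advantage Actor-Critic (A3C)
A3C is an actor-critic deep reinforcement learning algorithm in which multiple worker agents interact with their own copies of the environment in parallel and update shared parameters asynchronously.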
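As a small illustration of the two-network idea mentioned for DDQN, the sketch below compares the usual DQN bootstrap target with the Double DQN target, in which one network selects the next action and the other evaluates it. It works on plain lists of Q-values rather than real networks; the function names and argument names are assumptions made for this example.

```python
def dqn_target(reward, next_q_target, gamma, done):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN target: the online network selects the action,
    the target network evaluates the selected action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Because the action is chosen by one set of estimates and scored by another, a single over-optimistic Q-value is less likely to be propagated into the target.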
Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, pp. 979-980, San Jose, CA, July 2004

Semi-supervised Clustering with Limited Background Knowledge

Sugato Basu
Email: sugato@
Address: Department of Computer Sciences, University of Texas at Austin, Austin, TX-78712, USA

Thesis Goal

In many machine learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate. Consequently, semi-supervised learning, learning from a combination of both labeled and unlabeled data, has become a topic of significant recent interest. Our research focus is on semi-supervised clustering, which uses a small amount of supervised data in the form of class labels or pairwise constraints on some examples to aid unsupervised clustering. Semi-supervised clustering can be either constraint-based, i.e., changes are made to the clustering objective to satisfy user-specified labels/constraints, or metric-based, i.e., the clustering distortion measure is trained to satisfy the given labels/constraints. Our main goal in this thesis is to study constraint-based semi-supervised clustering algorithms, integrate them with metric-based approaches, characterize some of their properties and empirically validate our algorithms on different domains, e.g., text processing and bioinformatics.

Background

Existing methods for semi-supervised clustering fall into two general approaches that we call constraint-based and metric-based methods.

In constraint-based approaches, the clustering algorithm itself is modified so that user-provided labels or constraints are used to get a more appropriate clustering. Previous work in this area includes modifying the clustering objective function so that it includes a term for satisfying specified constraints (Demiriz, Bennett, & Embrechts 1999), and enforcing constraints to be satisfied during the cluster assignment in the clustering process (Wagstaff et al. 2001).

In metric-based approaches, an existing clustering algorithm that uses a particular
distortion measure is employed; however, the measure is first trained to satisfy the labels or constraints in the supervised data. Several distortion measures have been used for metric-based semi-supervised clustering, including Jensen-Shannon divergence trained using gradient descent (Cohn, Caruana, & McCallum 2003), Euclidean distance modified by a shortest-path algorithm (Klein, Kamvar, & Manning 2002), or Mahalanobis distances trained using convex optimization (Bar-Hillel et al. 2003; Xing et al. 2003). However, metric-based and constraint-based approaches to semi-supervised clustering have not been adequately compared in previous work, and so their relative strengths and weaknesses are largely unknown.

(Copyright © 2004, American Association for Artificial Intelligence. All rights reserved.)

An important domain that motivates the semi-supervised clustering problem is the clustering of genes for functional prediction. For most organisms, only a limited number of genes are annotated with their functional pathways, with the majority of the genes still having unknown functions.
Categorization of these genes into functional groups using gene microarray data, phylogenetic profiles, etc., is a natural semi-supervised clustering problem. Clustering (with model selection to choose the right number of clusters) is better suited to this domain than classification, since the number of functional classes is not known a priori. Moreover, background knowledge, available in the form of functional pathway labels (KEGG, GO) or constraints over some of the genes (DIP), could easily be incorporated as supervision to improve the clustering accuracy.

Progress

In our first work, we showed how supervision in the form of labeled data can be incorporated into clustering (Basu, Banerjee, & Mooney 2002). The labeled data were used to generate seed clusters for initializing model-based clustering algorithms, and constraints generated from the labeled data were used to guide the clustering process towards a partitioning similar to the user-specified labels. We showed that the K-Means algorithm is equivalent to an EM algorithm on a mixture of K Gaussians under assumptions of identity covariance of the Gaussians, uniform mixture component priors and expectation under a particular type of conditional distribution. This underlying model helps us to prove convergence guarantees for the proposed label-based semi-supervised clustering algorithms.

Next, we showed that semi-supervised clustering with pairwise must-link and cannot-link constraints has an underlying probabilistic model: a Hidden Markov Random Field (HMRF) (Basu, Banerjee, & Mooney 2004). In this work, we also outlined a method for selecting maximally informative constraints in a query-driven framework for pairwise constrained clustering. In order to maximize the utility of the limited supervised data available in a semi-supervised setting, supervised training examples should be, if possible, actively selected as maximally informative ones rather than chosen at random. This would imply that fewer constraints will be required to significantly
improve the clustering accuracy. To this end, a new algorithm was developed to actively select good pairwise constraints for semi-supervised clustering, using an active learning strategy based on farthest-first traversal. The proposed scheme has two phases: (a) explore the given data to get pairwise disjoint non-null neighborhoods, each belonging to a different cluster in the underlying true categorization of the data, within a small number of queries, and (b) consolidate this cluster structure using the remaining queries, to get better centroid estimates.

In recent work, we have shown that the HMRF clustering model is able to incorporate any Bregman divergence (Banerjee et al. 2004) as the clustering distortion measure, which allows using the framework with such common distortion measures as KL-divergence, I-divergence, and parameterized squared Mahalanobis distance. Additionally, cosine similarity can also be used as the clustering distortion measure in the framework, which makes it useful for directional datasets (Basu, Bilenko, & Mooney 2004).
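To make the notion of a Bregman divergence concrete, here is a minimal sketch using the standard definition d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)> for a convex function phi. It is only an illustration of the distortion measures named above, not the HMRF framework itself; the function names are assumptions made for this example.

```python
import math


def bregman_divergence(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    g = grad_phi(y)
    inner = sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))
    return phi(x) - phi(y) - inner


def sq_norm(x):
    """phi(x) = ||x||^2, which generates squared Euclidean distance."""
    return sum(v * v for v in x)


def sq_norm_grad(x):
    return [2.0 * v for v in x]


def neg_entropy(x):
    """phi(x) = sum x_i log x_i, which generates I-divergence
    (equal to KL-divergence when x and y are probability vectors)."""
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]


def squared_euclidean(x, y):
    return bregman_divergence(sq_norm, sq_norm_grad, x, y)


def i_divergence(x, y):
    return bregman_divergence(neg_entropy, neg_entropy_grad, x, y)
```

Swapping the generating function phi is all that is needed to move between distortion measures, which is the sense in which one clustering framework can cover several of them.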
For all such measures, minimizing the semi-supervised clustering objective function becomes equivalent to finding the maximum a posteriori probability (MAP) configuration of the underlying HMRF.

We have also developed a new semi-supervised clustering approach that unifies constraint-based and metric-based techniques in an integrated framework (Bilenko, Basu, & Mooney 2004). This algorithm trains the distortion measure with each clustering iteration, utilizing both unlabeled data and pairwise constraints. The formulation is able to learn individual metrics for each cluster, which permits clusters of different shapes. This work also explores metric learning for feature generation (in contrast to simple feature weighting), which we empirically demonstrate to outperform current state-of-the-art metric learning algorithms, under certain conditions.

In all these projects, experiments have been performed on both low dimensional UCI datasets and high dimensional text data sets, using KMeans and EM as the baseline clustering algorithms.

Proposed Research

In future, we want to study the following aspects of semi-supervised clustering, in decreasing order of priority:

(1) Semi-supervised approaches for finding overlapping clusters in the data. This is especially relevant for gene clustering in the bioinformatics domain, since genes often belong to multiple functional pathways.

(2) The feasibility of semi-supervising other clustering algorithms, e.g., spectral clustering, agglomerative clustering, etc. We are currently exploring semi-supervised versions of kernel-based clustering, which would be useful for datasets that are not linearly separable.

(3) Application of the semi-supervised clustering model to other domains apart from UCI datasets and text. We are currently focusing on two domains: (a) search result clustering of web search engines, e.g., Google, and (b) clustering of gene microarray data in bioinformatics.

(4) Effect of noisy or probabilistic supervision in pairwise constrained clustering. This study
will be especially important for deploying our proposed semi-supervised clustering algorithms to practical settings, where background knowledge would be in general noisy.

(5) Model selection using both unsupervised data and the limited supervised data, for automatic selection of number of clusters in semi-supervised clustering. Most model selection criteria in clustering are only based on unsupervised data; we want to explore whether supervised data available in the form of labels or constraints can be used to select the number of clusters more effectively.

(6) Theoretical study of the relative benefits of supervised and unsupervised data in semi-supervised clustering, similar to the analysis of (Ratsaby & Venkatesh 1995).

References

Banerjee, A.; Merugu, S.; Dhillon, I. S.; and Ghosh, J. 2004. Clustering with Bregman divergences. In Proc. of the 2004 SIAM Intl. Conf. on Data Mining (SDM-04).
Bar-Hillel, A.; Hertz, T.; Shental, N.; and Weinshall, D. 2003. Learning distance functions using equivalence relations. In Proc. of 20th Intl. Conf. on Machine Learning (ICML-2003), 11–18.
Basu, S.; Banerjee, A.; and Mooney, R. J. 2002. Semi-supervised clustering by seeding. In Proc. of 19th Intl. Conf. on Machine Learning (ICML-2002), 19–26.
Basu, S.; Banerjee, A.; and Mooney, R. J. 2004. Active semi-supervision for pairwise constrained clustering. In Proc. of the 2004 SIAM Intl. Conf. on Data Mining (SDM-04).
Basu, S.; Bilenko, M.; and Mooney, R. J. 2004. A probabilistic framework for semi-supervised clustering. In submission, available at /˜ml/publication.
Bilenko, M.; Basu, S.; and Mooney, R. J. 2004. Integrating constraints and metric learning in semi-supervised clustering. In Proc. of 21st Intl. Conf. on Machine Learning (ICML-2004).
Cohn, D.; Caruana, R.; and McCallum, A. 2003. Semi-supervised clustering with user feedback. Technical Report TR2003-1892, Cornell University.
Demiriz, A.; Bennett, K. P.; and Embrechts, M. J. 1999. Semi-supervised clustering using genetic algorithms. In Artificial Neural Networks in Engineering (ANNIE-99), 809–814.
Klein, D.; Kamvar, S. D.; and Manning, C. 2002. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proc. of 19th Intl. Conf. on Machine Learning (ICML-2002), 307–314.
Ratsaby, J., and Venkatesh, S. S. 1995. Learning from a mixture of labeled and unlabeled examples with parametric side information. In Proc. of the 8th Annual Conf. on Computational Learning Theory, 412–417.
Wagstaff, K.; Cardie, C.; Rogers, S.; and Schroedl, S. 2001. Constrained K-Means clustering with background knowledge. In Proc. of 18th Intl. Conf. on Machine Learning (ICML-2001), 577–584.
Xing, E. P.; Ng, A. Y.; Jordan, M. I.; and Russell, S. 2003. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems 15, 505–512. Cambridge, MA: MIT Press.