Partitions of complete geometric graphs into plane trees


stoerwagner-mincut [Stoer-Wagner, Prim, connectivity, undirected graphs, minimum edge cut]


A Simple Min-Cut Algorithm

MECHTHILD STOER, Televerkets Forskningsinstitutt, Kjeller, Norway
FRANK WAGNER, Freie Universität Berlin, Berlin-Dahlem, Germany

Abstract. We present an algorithm for finding the minimum cut of an undirected edge-weighted graph. It is simple in every respect. It has a short and compact description, is easy to implement, and has a surprisingly simple proof of correctness. Its runtime matches that of the fastest algorithm known. The runtime analysis is straightforward. In contrast to nearly all approaches so far, the algorithm uses no flow techniques. Roughly speaking, the algorithm consists of about |V| nearly identical phases, each of which is a maximum adjacency search.

Categories and Subject Descriptors: G.2.2 [Discrete Mathematics]: Graph Theory: graph algorithms
General Terms: Algorithms
Additional Key Words and Phrases: Min-Cut

A preliminary version of this paper appeared in Proceedings of the 2nd Annual European Symposium on Algorithms, Lecture Notes in Computer Science, vol. 855, 1994, pp. 141-147. This work was supported by the ESPRIT BRA Project ALCOM II. Authors' addresses: M. Stoer, Televerkets Forskningsinstitutt, Postboks 83, 2007 Kjeller, Norway, e-mail: mechthild.stoer@nta.no; F. Wagner, Institut für Informatik, Fachbereich Mathematik und Informatik, Freie Universität Berlin, Takustraße 9, Berlin-Dahlem, Germany, e-mail: wagner@inf.fu-berlin.de. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery (ACM), Inc. © 1997 ACM 0004-5411/97/0700-0585 $03.50. Journal of the ACM, Vol. 44, No. 4, July 1997, pp. 585-591.

1. Introduction

Graph connectivity is one of the classical subjects in graph theory and has many practical applications, for example in chip and circuit design, reliability of communication networks, transportation planning, and cluster analysis. Finding the minimum cut of an undirected edge-weighted graph is a fundamental algorithmic problem. Precisely, it consists in finding a nontrivial partition of the graph's vertex set V into two parts such that the cut weight, the sum of the weights of the edges connecting the two parts, is minimum.

The usual approach to solving this problem is to use its close relationship to the maximum flow problem. The famous Max-Flow-Min-Cut Theorem of Ford and Fulkerson [1956] showed the duality of the maximum flow and the so-called minimum s-t-cut. There, s and t are two vertices that are the source and the sink of the flow problem and have to be separated by the cut, that is, they have to lie in different parts of the partition. Until recently, all cut algorithms were essentially flow algorithms using this duality. Finding a minimum cut without specified vertices to be separated can be done by finding minimum s-t-cuts for a fixed vertex s and all |V| − 1 possible choices of t ∈ V \ {s}, and then selecting the lightest one.

Recently, Hao and Orlin [1992] showed how to use the maximum flow algorithm of Goldberg and Tarjan [1988] to solve the minimum cut problem in time O(|V||E| log(|V|²/|E|)), which is nearly as fast as the fastest maximum flow algorithms so far [Alon 1990; Ahuja et al. 1989; Cheriyan et al. 1990]. Nagamochi and Ibaraki [1992a] published the first deterministic minimum cut algorithm that is not based on a flow algorithm; it has the slightly better running time of O(|V||E| + |V|² log |V|), but is still rather complicated. In the unweighted case, they use a fast-search technique to decompose the graph's edge set E into subsets E_1, ..., E_λ such that the union of the first k E_i's is a k-edge-connected spanning subgraph of the given graph and has at most k|V| edges. They simulate this approach in the weighted case. Their work is one of a small number of papers treating questions of graph connectivity by non-flow-based methods [Nishizeki and Poljak 1989; Nagamochi and Ibaraki 1992a; Matula 1993]. Karger and Stein [1993] suggest a randomized algorithm that with high probability finds a minimum cut in time O(|V|² log³ |V|).

In this context, we present in this paper a remarkably simple deterministic minimum cut algorithm with the fastest running time so far, established in Nagamochi and Ibaraki [1992b]. We reduce the complexity of the algorithm of Nagamochi and Ibaraki by avoiding the unnecessary simulated decomposition of the edge set. This enables us to give a comparably straightforward proof of correctness, avoiding, for example, the distinction between the unweighted, integer-, rational-, and real-weighted cases. This algorithm was found independently by Frank [1994]. Queyranne [1995] generalizes our simple approach to the minimization of submodular functions. The algorithm described in this paper was implemented by Kurt Mehlhorn of the Max-Planck-Institut, Saarbrücken, and is part of the algorithms library LEDA [Mehlhorn and Näher 1995].

2. The Algorithm

Throughout the paper, we deal with an ordinary undirected graph G with vertex set V and edge set E. Every edge e has a nonnegative real weight w(e). The simple key observation is that, if we know how to find two vertices s and t and the weight of a minimum s-t-cut, we are nearly done:

THEOREM 2.1. Let s and t be two vertices of a graph G. Let G/{s,t} be the graph obtained by merging s and t. Then a minimum cut of G can be obtained by taking the smaller of a minimum s-t-cut of G and a minimum cut of G/{s,t}.

The theorem holds since either there is a minimum cut of G that separates s and t, in which case a minimum s-t-cut of G is a minimum cut of G; or there is none, in which case a minimum cut of G/{s,t} does the job. So a procedure finding an arbitrary minimum s-t-cut can be used to construct a recursive algorithm to find a minimum cut of a graph. The following procedure, known in the literature as maximum adjacency search or maximum cardinality search, yields the desired s-t-cut.

MINIMUMCUTPHASE(G, w, a)
    A := {a}
    while A ≠ V
        add to A the most tightly connected vertex
    store the cut-of-the-phase and shrink G by merging the two vertices added last

A subset A of the graph's vertices grows, starting with an arbitrary single vertex, until A is equal to V. In each step, the vertex outside of A most tightly connected with A is added. Formally, we add a vertex z ∉ A such that w(A, z) = max{w(A, y) | y ∉ A}, where w(A, y) is the sum of the weights of all the edges between A and y. At the end of each such phase, the two vertices added last are merged; that is, the two vertices are replaced by a new vertex, and any edges from the two vertices to a remaining vertex are replaced by an edge weighted by the sum of the weights of the previous two edges. Edges joining the merged nodes are removed. The cut of V that separates the vertex added last from the rest of the graph is called the cut-of-the-phase. The lightest of these cuts-of-the-phase is the result of the algorithm, the desired minimum cut:

MINIMUMCUT(G, w, a)
    while |V| > 1
        MINIMUMCUTPHASE(G, w, a)
        if the cut-of-the-phase is lighter than the current minimum cut
            then store the cut-of-the-phase as the current minimum cut

Notice that the starting vertex a stays the same throughout the whole algorithm. It could instead be selected arbitrarily in each phase.

3. Correctness

In order to prove the correctness of our algorithm, we need to show the following somewhat surprising lemma.

LEMMA 3.1. Each cut-of-the-phase is a minimum s-t-cut in the current graph, where s and t are the two vertices added last in the phase.

PROOF. The run of a MINIMUMCUTPHASE orders the vertices of the current graph linearly, starting with a and ending with s and t, according to their order of addition to A. Now we look at an arbitrary s-t-cut C of the current graph and show that it is at least as heavy as the cut-of-the-phase.

We call a vertex v active (with respect to C) when v and the vertex added just before v are in the two different parts of C. Let w(C) be the weight of C, A_v the set of all vertices added before v (excluding v), C_v the cut of A_v ∪ {v} induced by C, and w(C_v) the weight of the induced cut. We show that for every active vertex v

    w(A_v, v) ≤ w(C_v)

by induction on the set of active vertices. For the first active vertex, the inequality is satisfied with equality. Let the inequality be true for all active vertices added up to the active vertex v, and let u be the next active vertex that is added. Then we have

    w(A_u, u) = w(A_v, u) + w(A_u \ A_v, u) =: α.

Now, w(A_v, u) ≤ w(A_v, v), as v was chosen as the vertex most tightly connected with A_v. By induction, w(A_v, v) ≤ w(C_v). All edges between A_u \ A_v and u connect the different parts of C; thus they contribute to w(C_u) but not to w(C_v). So

    α ≤ w(C_v) + w(A_u \ A_v, u) ≤ w(C_u).

As t is always an active vertex with respect to C, we can conclude that w(A_t, t) ≤ w(C_t), which says exactly that the cut-of-the-phase is at most as heavy as C. □

4. Running Time

As the running time of the algorithm MINIMUMCUT is essentially equal to the added running time of the |V| − 1 runs of MINIMUMCUTPHASE, which is called on graphs with decreasing numbers of vertices and edges, it suffices to show that a single MINIMUMCUTPHASE needs at most O(|E| + |V| log |V|) time, yielding an overall running time of O(|V||E| + |V|² log |V|). The key to implementing a phase efficiently is to make it easy to select the next vertex to be added to the set A, the most tightly connected vertex. During execution of a phase, all vertices that are not in A reside in a priority queue based on a key field. The key of a vertex v is the sum of the weights of the edges connecting it to the current A, that is, w(A, v). Whenever a vertex v is added to A, we have to perform an update of the queue: v has to be deleted from the queue, and the key of every vertex w not in A that is connected to v has to be increased by the weight of the edge vw, if it exists. As this is done exactly once for every edge, overall we have to perform |V| EXTRACTMAX and |E| INCREASEKEY operations. Using Fibonacci heaps [Fredman and Tarjan 1987], we can perform an EXTRACTMAX operation in O(log |V|) amortized time and an INCREASEKEY operation in O(1) amortized time. Thus, the time we need for this key step, which dominates the rest of the phase, is O(|E| + |V| log |V|).

5. An Example

Fig. 1. A graph G = (V, E) with edge weights.
Fig. 2. The graph after the first MINIMUMCUTPHASE(G, w, a), a = 2, and the induced ordering a, b, c, d, e, f, s, t of the vertices. The first cut-of-the-phase corresponds to the partition {1}, {2,3,4,5,6,7,8} of V with weight w = 5.
Fig. 3. The graph after the second MINIMUMCUTPHASE(G, w, a), and the induced ordering a, b, c, d, e, s, t of the vertices. The second cut-of-the-phase corresponds to the partition {8}, {1,2,3,4,5,6,7} of V with weight w = 5.
Fig. 4. After the third MINIMUMCUTPHASE(G, w, a). The third cut-of-the-phase corresponds to the partition {7,8}, {1,2,3,4,5,6} of V with weight w = 7.
Fig. 5. After the fourth and fifth MINIMUMCUTPHASE(G, w, a), respectively. The fourth cut-of-the-phase corresponds to the partition {4,7,8}, {1,2,3,5,6}. The fifth cut-of-the-phase corresponds to the partition {3,4,7,8}, {1,2,5,6} with weight w = 4.
Fig. 6. After the sixth and seventh MINIMUMCUTPHASE(G, w, a), respectively. The sixth cut-of-the-phase corresponds to the partition {1,5}, {2,3,4,6,7,8} with weight w = 7. The last cut-of-the-phase corresponds to the partition {2}, V \ {2}; its weight is w = 9. The minimum cut of the graph G is the fifth cut-of-the-phase, with weight w = 4.

ACKNOWLEDGMENT. The authors thank Dorothea Wagner for her helpful remarks.

REFERENCES

AHUJA, R. K., ORLIN, J. B., AND TARJAN, R. E. 1989. Improved time bounds for the maximum flow problem. SIAM J. Comput. 18, 939-954.
ALON, N. 1990. Generating pseudo-random permutations and maximum flow algorithms. Inf. Proc. Lett. 35, 201-204.
CHERIYAN, J., HAGERUP, T., AND MEHLHORN, K. 1990. Can a maximum flow be computed in o(nm) time? In Proceedings of the 17th International Colloquium on Automata, Languages and Programming, pp. 235-248.
FORD, L. R., AND FULKERSON, D. R. 1956. Maximal flow through a network. Can. J. Math. 8, 399-404.
FRANK, A. 1994. On the Edge-Connectivity Algorithm of Nagamochi and Ibaraki. Laboratoire Artemis, IMAG, Université J. Fourier, Grenoble, France.
FREDMAN, M. L., AND TARJAN, R. E. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3 (July), 596-615.
GOLDBERG, A. V., AND TARJAN, R. E. 1988. A new approach to the maximum-flow problem. J. ACM 35, 4 (Oct.), 921-940.
HAO, J., AND ORLIN, J. B. 1992. A faster algorithm for finding the minimum cut in a graph. In Proceedings of the 3rd ACM-SIAM Symposium on Discrete Algorithms (Orlando, Fla., Jan. 27-29). ACM, New York, pp. 165-174.
KARGER, D., AND STEIN, C. 1993. An Õ(n²) algorithm for minimum cuts. In Proceedings of the 25th ACM Symposium on the Theory of Computing (San Diego, Calif., May 16-18). ACM, New York, pp. 757-765.
MATULA, D. W. 1993. A linear time 2 + ε approximation algorithm for edge connectivity. In Proceedings of the 4th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 500-504.
MEHLHORN, K., AND NÄHER, S. 1995. LEDA: a platform for combinatorial and geometric computing. Commun. ACM 38, 96-102.
NAGAMOCHI, H., AND IBARAKI, T. 1992a. Linear time algorithms for finding a sparse k-connected spanning subgraph of a k-connected graph. Algorithmica 7, 583-596.
NAGAMOCHI, H., AND IBARAKI, T. 1992b. Computing edge-connectivity in multigraphs and capacitated graphs. SIAM J. Disc. Math. 5, 54-66.
NISHIZEKI, T., AND POLJAK, S. 1989. Highly connected factors with a small number of edges. Preprint.
QUEYRANNE, M. 1995. A combinatorial algorithm for minimizing symmetric submodular functions. In Proceedings of the 6th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 98-101.

RECEIVED APRIL 1995; REVISED FEBRUARY 1997; ACCEPTED JUNE 1997
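The MINIMUMCUT/MINIMUMCUTPHASE pseudocode above can be sketched compactly in Python. This is an illustrative sketch, not the paper's LEDA implementation: it replaces the Fibonacci-heap priority queue with a plain linear-scan maximum, so each phase costs O(|V|²) rather than O(|E| + |V| log |V|), but the maximum adjacency search, the cut-of-the-phase, and the merging step follow the paper directly.

```python
def stoer_wagner(graph):
    """graph: symmetric dict of dicts, graph[u][v] = w(u, v) >= 0.
    Returns (weight, side): the weight of a minimum cut and the set of
    original vertices on one side of it."""
    # Working copy; each remaining vertex is a "supernode" that remembers
    # which original vertices have been merged into it.
    w = {u: dict(nbrs) for u, nbrs in graph.items()}
    merged = {u: {u} for u in graph}
    best_weight, best_side = float("inf"), None

    while len(w) > 1:
        # --- MinimumCutPhase: maximum adjacency search from an arbitrary a ---
        a = next(iter(w))
        order = [a]
        key = {v: w[a].get(v, 0.0) for v in w if v != a}   # key[v] = w(A, v)
        while key:
            z = max(key, key=key.get)        # most tightly connected vertex
            order.append(z)
            del key[z]
            for y, wt in w[z].items():       # IncreaseKey for z's neighbors
                if y in key:
                    key[y] += wt
        s, t = order[-2], order[-1]          # the two vertices added last

        # Cut-of-the-phase: t versus the rest of the current graph.
        cut_weight = sum(w[t].values())
        if cut_weight < best_weight:
            best_weight, best_side = cut_weight, set(merged[t])

        # --- Shrink G by merging the two vertices added last ---
        merged[s] |= merged.pop(t)
        for y, wt in w.pop(t).items():
            if y == s:
                continue                     # drop the s-t self-loop
            w[y].pop(t, None)
            w[s][y] = w[s].get(y, 0.0) + wt  # sum parallel edge weights
            w[y][s] = w[s][y]
        w[s].pop(t, None)

    return best_weight, best_side
```

On the eight-vertex example of Section 5 (with the commonly reproduced edge weights for Fig. 1), the function finds the cut weight 4 separating {3, 4, 7, 8} from {1, 2, 5, 6}, matching the fifth cut-of-the-phase.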

A Hierarchical Method for Determining the Number of Clusters


ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@. Journal of Software, Vol. 19, No. 1, January 2008, pp. 62-72. DOI: 10.3724/SP.J.1001.2008.00062. Tel/Fax: +86-10-62562563. © 2008 by Journal of Software. All rights reserved.

A Hierarchical Method for Determining the Number of Clusters

CHEN Li-Fei (1), JIANG Qing-Shan (2,+), WANG Sheng-Rui (3)
1 (Department of Computer Science, Xiamen University, Xiamen 361005, China)
2 (School of Software, Xiamen University, Xiamen 361005, China)
3 (Department of Computer Science, University of Sherbrooke, J1K 2R1, Canada)
+ Corresponding author: Phn: +86-592-2186707, E-mail: qjiang@, /View/shizi/jqs.htm

Chen LF, Jiang QS, Wang SR. A hierarchical method for determining the number of clusters. Journal of Software, 2008, 19(1): 62-72. /1000-9825/19/62.htm

Abstract: A fundamental and difficult problem in cluster analysis is the determination of the "true" number of clusters in a dataset. The common trial-and-error method generally depends on certain clustering algorithms and is inefficient when processing large datasets. In this paper, a hierarchical method is proposed that gets rid of repeated clustering on large datasets. The method first obtains the CF (clustering feature) statistics by scanning the dataset and agglomeratively generates hierarchical partitions of the dataset; a curve of clustering quality over the varying partitions is then constructed incrementally. The partition corresponding to the extremum of the curve is finally used to estimate the number of clusters. A new validity index is also presented to quantify the clustering quality; it is independent of the clustering algorithm, emphasizes the geometric features of clusters, and handles noisy data and arbitrarily shaped clusters efficiently. Experimental results on both real-world and synthetic datasets demonstrate that the new method outperforms recently published approaches while significantly improving efficiency.

Key words: clustering; clustering validity index; statistics; number of clusters; hierarchical clustering

Supported by the National Natural Science Foundation of China under Grant No. 10771176; the National 985 Project of China under Grant No. 0000-X07204; the Scientific Research Foundation of Xiamen University of China under Grant No. 0630-X01117. Received 2007-04-01; Accepted 2007-10-09.

Clustering is an important analysis tool in data mining research. Many clustering algorithms have been proposed [1] and widely applied in areas such as business intelligence and Web mining. However, many clustering algorithms require the user to supply the number of clusters, which in practice demands experience or domain background knowledge. Determining the number of clusters in a dataset remains a fundamental open problem in cluster analysis research [2-4].

Existing work [2-13] determines the optimal number of clusters through the following process (an iterative trial-and-error process [9]), shown in Fig. 1: on the given dataset, or on a subset obtained by random sampling, a particular clustering algorithm is run with different parameters (usually the number of clusters k) to produce different partitions of the dataset; a statistical index value is computed for each partition; finally, the index values and their variation are compared, and the algorithm parameter k whose index value satisfies a predefined condition is taken as the optimal number of clusters k*.

Fig. 1. A typical process for determining the number of clusters in a dataset.

Statistical indices of various types measure the clustering quality of a partition from different perspectives. Cluster validity indices (CVIs) are a common class. Among the numerous CVIs [6-13], those based on the geometric structure of the dataset are representative: they consider the basic characteristic of clustering, namely that in a "good" clustering the points within each of the k clusters should be "compact" while points in different clusters should be as "separated" as possible; such an index quantifies intra-cluster compactness and inter-cluster separation and combines the two [6,7]. Representative indices include the Xie-Beni index V_xie [8] and the S.Wang-H.Sun-Q.Jiang index V_wsj [9]. The k corresponding to the maximum or minimum index value is taken as the optimal number of clusters k*. Other statistical indices include the Gap statistic [2], information entropy [3], and IGP (in-group proportion) [4]; IGP is a recently proposed index that measures the quality of a clustering result by the in-group proportion of points within clusters and has shown better performance than other existing indices [4].

However, existing work [2-13] mostly focuses on improving the statistical indices and neglects the computation process itself. The process of Fig. 1 has two problems. First, since the whole dataset must be clustered many times, its efficiency is tied to the efficiency of the chosen clustering algorithm and degrades markedly as the dataset grows; although efficiency can be improved by reducing the number of clustering runs, accurately estimating k_min and k_max is itself not easy [14]. Second, the index is tied to a specific clustering algorithm. For example, many CVIs [6-12] are used together with the FCM (fuzzy C-means) algorithm (or combined with a GA, genetic algorithm [13]), and the other indices above, for computational reasons, also require algorithms parameterized by the number of clusters k, such as k-means or hierarchical clustering [2-4]. Index performance thus depends on the clustering algorithm: FCM and k-means, for instance, cannot handle noise effectively and can only find convex clusters, so these indices naturally inherit the same limitations.

The new method proposed in this paper, COPS (clusters optimization on preprocessing stage), adopts a two-stage scheme entirely different from Fig. 1. It first scans the dataset once to construct all reasonable partition combinations in a single pass, producing a curve of clustering quality over the different partitions; in the second stage, the partition corresponding to the minimum of the curve is extracted to estimate the optimal number of clusters. This avoids repeatedly clustering a large dataset, does not depend on a specific clustering algorithm, and can effectively recognize noise and complex-shaped clusters that the dataset may contain. Tests on real and synthetic data show that COPS outperforms IGP while greatly improving computational efficiency.

Section 1 presents the COPS method. Section 2 proposes and analyzes the new cluster validity index used by COPS. Section 3 reports experimental validation and analysis. Section 4 concludes the paper.

1. The COPS Method

Given a d-dimensional dataset DB = {X_1, X_2, ..., X_n}, where X = {x_1, x_2, ..., x_d} is a data point and n the number of points, a hard partitioning clustering algorithm [1] divides DB into a collection of k (k > 1) subsets C^k = {C_1, C_2, ..., C_k} with C_j ∩ C_l = ∅ for all j ≠ l, 1 ≤ j, l ≤ k; each C_j is called a cluster of DB. The common trial-and-error method must cluster the dataset k_max − k_min + 1 times to produce C^k (k = k_min, ..., k_max), which hurts efficiency, especially on large datasets; moreover, inappropriate settings of k_min and k_max also affect the accuracy of the result. If all reasonable partitions could be generated in a single pass according to the geometric structure of the dataset, with their clustering quality evaluated at the same time, both efficiency and accuracy could be improved considerably.

COPS borrows the idea of hierarchical clustering to achieve this. The principle is as follows: first treat every data point as a singleton cluster; then generate all reasonable partitions during a bottom-up hierarchical cluster-merging process, computing their clustering quality along the way and keeping the partition with the best quality as C*; finally estimate the number of clusters k* from statistics of C*. A new cluster validity index function Q(C) evaluates the clustering quality of a partition C, with its minimum corresponding to the best quality. Formally,

    C* = argmin_{C ∈ {C^1, C^2, ..., C^n}} Q(C),    k* = θ(C*),

where the procedure θ removes the influence of noise and identifies the number of meaningful clusters in C*. COPS targets datasets that may contain noise and complex-shaped (non-convex) clusters, which are common in practice, for example in spatial data and some high-dimensional data.

1.1 Principle

We first define inter-point similarity based on distance and the dimension-voting idea [15], and on that basis give the procedure for determining the optimal partition C* of DB.

Definition 1 (dimension similarity of points [15]). Given a threshold t_j ≥ 0, 1 ≤ j ≤ d, points X and Y are similar in the j-th dimension with respect to t_j if |x_j − y_j| ≤ t_j.

From Definition 1, similar data points can be defined.

Definition 2 (similar points). Given a threshold vector T = {t_1, t_2, ..., t_d}, points X and Y are similar with respect to T if they are similar in every dimension with respect to t_j (j = 1, 2, ..., d).

Mutually similar data points form the clusters of DB. If all components of T are equal, ||T|| corresponds to the neighborhood radius of a point in density-based clustering [16]. Here we allow the components of T to differ, reflecting differences in the value distributions of the individual dimensions. Clearly, the "size" of T determines the cluster structure. Since T is a vector, Definition 3 compares different thresholds.

Definition 3 (comparison of T). Given threshold vectors T_a = {t_1^a, ..., t_d^a} and T_b = {t_1^b, ..., t_d^b}, T_a > T_b if (1) t_j^a ≥ t_j^b for j = 1, 2, ..., d, and (2) t_j^a > t_j^b for at least one j ∈ [1, d].

Given DB, by Definitions 2 and 3 a very small T makes most points dissimilar; in the extreme case every point (assuming no duplicate points in DB) forms its own "cluster", and the number of parts reaches its maximum, k = n; denote this T by T_0. Conversely, a sufficiently large T makes all points mutually similar, forming one big cluster, so k reaches its minimum, k = 1; let T_m denote the smallest T for which k = 1. Thus, determining the optimal partition C* of DB becomes the problem of solving for the optimal threshold vector T* (T_0 < T* < T_m): the T at which Q attains its minimum as T grows from T_0 to T_m. The resulting COPS computation flow is shown in Fig. 2.

Fig. 2. Flowchart of COPS.

The computation starts with T = T_0 (all components 0); in each step, T grows by an increment Δ = {Δ_1, Δ_2, ..., Δ_d}. By Definition 2, some points previously belonging to different subsets become similar; those subsets are merged, generating a new partition, and Q is computed for each partition, until T has grown so far that all points fall into a single set. Subsets, that is, clusters, are merged in a single-link-like manner [1]. Fig. 3 illustrates how the process of Fig. 2 merges clusters bottom-up on a 2-dimensional dataset containing two clusters, one of them non-convex, and one noise point. In the initial state (Fig. 3(a)), every data point is an independent cluster; as T grows, points are gradually merged, and at some step the small clusters indicated by the elliptical regions of Fig. 3(b) have formed; when T grows further so that two points belonging to different clusters (the two points marked 'x' in clusters A and B of Fig. 3(b)) become similar, two originally convex clusters are merged. With this strategy, clusters of arbitrary shape can be generated; as Fig. 3(c) shows, the merging finally yields a banana-shaped non-convex cluster. The three stages form three levels of the clustering tree (the actual tree may have more than these three levels; only three are shown as an example). For the cluster collection at each level, the Q value is computed; the level minimizing Q is extracted, and its noise points are identified. In the cluster collection of Fig. 3(c), the single-point "cluster" at the bottom is identified as noise, so the optimal number of clusters for this dataset is 2.

The difference between this merging method and single link [1] is that traditional single link is based on Euclidean distance in the full space, whereas COPS measures inter-cluster similarity by Definition 2. Single link has been shown to identify non-convex cluster structures [1]; on that basis we additionally account for the differing value distributions across the dimensions of the dataset, which are common in application data of higher dimensionality [17]. How to choose Δ, and how to quickly generate new partitions and compute Q as T changes, are key to the algorithm's performance and are described in the following sections.

Fig. 3. An example of the working flow of COPS.

1.2 Algorithm and parameter setting

The COPS computation resembles building the hierarchical cluster tree [17] of an agglomerative hierarchical clustering algorithm. A typical hierarchical algorithm, however, has O(n²) computational complexity [1]; the performance bottleneck is finding, for each data point X, the set of its similar points Neighbors(X), a range-query problem [18] that normally requires traversing the whole dataset (see [18] for improved methods). Definition 2 simplifies the search for similar points: first sort the data points by attribute value in each dimension (one sorted sequence A_j per dimension j); by Definition 1, scanning A_j sequentially yields all points Y similar to X in dimension j, with the scan restricted to the finite interval satisfying |x_j − y_j| ≤ t_j. When t_j increases by Δ_j, only the extended interval t_j < |x_j − y_j| ≤ t_j + Δ_j needs to be scanned. The pseudocode of COPS based on this optimization is shown in Fig. 4, where MergePartitions merges, on top of C^(k+1), the two subsets containing the similar points X and Y to produce the new partition C^k, and UpdateQ computes the new value Q(C^k) from Q(C^(k+1)) using the statistics of the subsets containing X and Y. Section 2 describes the CF-based subset merging and the computation of Q(C).

Algorithm. COPS(DB, Δ).
begin
    k = n; T = T_0; C^n = {{X_1}, {X_2}, ..., {X_n}}; Q_n = Q(C^n)
    for each dimension j ∈ [1, d] do
        A_j = points sorted on the values of the j-th attribute
    {1. Generating the Q-sequence}
    repeat
        for each dimension j ∈ [1, d] do
            for each point X ∈ A_j do
                for each point Y ∈ Neighbors(X, A_j, t_j, t_j + Δ_j) do
                begin
                    flag X and Y as j-th-similar
                    if X and Y are full-dimensionally similar then
                    begin
                        k = k − 1
                        C^k = MergePartitions(C^(k+1), X, Y)
                        Q_k = UpdateQ(Q_(k+1), X, Y)
                    end
                end {for Y ∈ Neighbors()}
        T = T + Δ
    until k = 1
    {2. Computing k*}
    C* = the partition having the minimum of the Q-sequence
    return k* = θ(C*)
end

Fig. 4. The pseudocode of COPS.

The choice of the algorithm parameter Δ is related to an implicit property of T. By Definition 2, T can be viewed as the clustering resolution [19], which may be imagined as a "telescope" through which one decides whether data points form clusters. The components of T should therefore stand in a fixed ratio related to the density of the points projected onto each dimension, and the components of Δ should share this property. Definition 4 quantifies this ratio by measuring dimension sparseness.

Definition 4 (dimension sparseness). The distribution sparseness of DB in dimension j is λ_j, the normalized standard deviation of the j-th dimension:

    λ_j = sqrt( (1/n) Σ_{i=1}^{n} (x'_ij − μ_j)² ),

where x'_ij = (x_ij − min_{l=1,...,n} x_lj) / (max_{l=1,...,n} x_lj − min_{l=1,...,n} x_lj) is the [0,1]-normalized value of the j-th attribute of X_i, and μ_j = (1/n) Σ_{i=1}^{n} x'_ij is the center of the j-th dimension.

λ_j is in fact the normalized standard deviation of the j-th dimension of the dataset. In projected clustering of high-dimensional data [20], the standard deviation is likewise the basis for measuring the degree of relevance between a dimension and clusters. The larger λ_j, the more sparsely the values of dimension j are distributed, and the more clusters may be related to it. COPS uses the variation of attribute values along these dimensions to reveal the latent cluster structure of the dataset, so the algorithm parameter Δ is determined by

    Δ_j = ε × λ_j / max{λ_1, λ_2, ..., λ_d},

where ε (ε > 0) is a small algorithm parameter controlling the precision with which the Q-sequence is computed. Clearly, the smaller ε, the more search steps COPS takes per dimension (i.e., the more intervals each dimension is divided into for the computation), which enlarges the space searched for the optimal partition and makes the result more likely to be optimal. On the other hand, a smaller ε increases the time cost of the algorithm, so a balance between the two is needed. After repeated experimental validation, we set ε = 0.01.

1.3 Determining k*

|C*| is a candidate for the number of clusters k*, but because of noise, k* = |C*| does not hold exactly: in COPS, noise points are also part of C*, characterized by subsets containing few data points [19]. Assuming |C*| > 2, the procedure θ uses MDL (minimal description length) based pruning [21] to identify the "meaningful" subsets of C*. The basic idea of MDL is to encode the input data and choose the encoding scheme with the shortest code length. In COPS, the input data are the numbers of points contained in the subsets, and the importance of a cluster is determined by the number of points it contains.

Let C* = {C*_1, C*_2, ..., C*_k} and let |C*_i| be the number of points in C*_i. First sort the |C*_i| in descending order to obtain a new sequence C_1, C_2, ..., C_k; then split the sequence at C_p (1 < p < k) into two parts, S_L(p) = {C_1, ..., C_p} and S_R(p) = {C_(p+1), ..., C_k}, and compute the code length CL(p) of the data, defined [21] as

    CL(p) = log₂ μ_SL(p) + Σ_{1≤j≤p} log₂ | |C_j| − μ_SL(p) | + log₂ μ_SR(p) + Σ_{p+1≤j≤k} log₂ | |C_j| − μ_SR(p) |,

where μ_SL(p) = ⌈(1/p) Σ_{1≤j≤p} |C_j|⌉ and μ_SR(p) = ⌈(1/(k−p)) Σ_{p+1≤j≤k} |C_j|⌉. The first and third terms are the average code lengths of the two sequences split at p; the other two terms measure the deviation of |C_j| from the average number of points. If μ_SL(p) = |C_j| or μ_SR(p) = |C_j| would leave the log undefined, the subset is simply ignored, that is, its deviation is set to 0 bits. The split position p with the shortest code length CL(p) is taken as the optimal split point of the data sequence; by the idea of MDL pruning, the points contained in S_L(p) can be considered to represent a cover of DB [21]. In COPS, the points contained in S_R(p) are identified as noise. We thus obtain the optimal number of clusters of the dataset, k* = p.

1.4 Complexity

In the worst case, the space complexity of COPS is O(n²). In practice, the following strategy reduces the actual memory usage: for any two dissimilar points X_i and X_j that are similar in at least one dimension, a hash function maps the pair to a cell HASH(i, j) of a linear table, which records the dimension-similarity status of X_i and X_j. At the start of the algorithm all cells of the table are empty (unused); as T grows and some point pairs become similar, their cells in the table are released, effectively lowering the actual space occupied.

Sorting the data points with quicksort takes O(d n log n) time. Generating the Q-sequence takes O(k̄ d n N̄), where k̄ is the number of iterations of the outer loop, a quantity independent of n with k̄ << n numerically (as T grows, more and more points become similar and k decreases rapidly in the inner loop; its value depends only on the point distribution and on ε), and N̄ is the average number of similar points within the Δ-neighborhood of a point, with N̄ << n numerically, likewise depending only on the distribution and ε. Computing the initial value Q(C^n) costs O(dn) (see the analysis in Section 2.3), and the MDL pruning costs O(k²). Overall, the time complexity of COPS is O(d n log n).

2. The Cluster Validity Index of COPS

COPS evaluates the clustering quality of a partition C of DB with the validity index Q(C). The index proposed in this section mainly considers the geometric structure of the dataset: it measures the compactness of the point distribution within clusters and the separation between clusters, and keeps the two in balance. Q(C) does not depend on a specific clustering algorithm.

2.1 The new validity index

Let ||X − Y|| denote the Euclidean distance between points X and Y. Given a partition C^k = {C_1, C_2, ..., C_k} of DB, Scat(C^k) measures the intra-cluster compactness of C^k and Sep(C^k) its inter-cluster separation:

    Scat(C^k) = Σ_{i=1}^{k} Σ_{X,Y ∈ C_i} ||X − Y||²    (1)

    Sep(C^k) = Σ_{i=1}^{k} Σ_{l=1, l≠i}^{k} (1/(|C_i|·|C_l|)) Σ_{X ∈ C_i, Y ∈ C_l} ||X − Y||²    (2)

The rationale of these definitions is as follows: Scat is the sum of squared distances between any two data points within a cluster; Sep treats each cluster as one big "data point", with the "distance" between big points measured by the average distance between inter-cluster point pairs. In this way, Scat and Sep remain metrically consistent. Moreover, since Scat and Sep are defined on average "point pair" distances, they can measure the clustering quality of non-convex cluster structures. Traditional geometry-based cluster validity indices (such as V_xie [8]) usually define Scat and Sep from cluster centroids, using the average cluster radius and the distances between centroids; such indices tend to be valid only for spherical (hyperspherical) cluster structures [7].

Substituting the Euclidean distance formula and simplifying, Scat(C^k) and Sep(C^k) can be expressed as

    Scat(C^k) = 2 Σ_{i=1}^{k} Σ_{j=1}^{d} (|C_i|·SS_ij − LS_ij²),

    Sep(C^k) = 2 Σ_{j=1}^{d} [ (k − 1) Σ_{i=1}^{k} SS_ij/|C_i| − ( Σ_{i=1}^{k} LS_ij/|C_i| )² + Σ_{i=1}^{k} (LS_ij/|C_i|)² ],

where SS_ij = Σ_{X ∈ C_i} x_j² and LS_ij = Σ_{X ∈ C_i} x_j. Intuitively, the smaller Scat, the more compact the clusters; the larger Sep, the better separated the clusters. A linear combination balances the two, with the combination parameter β (β > 0) compensating for the difference in their value ranges:

    Q_1(C) = Scat(C) + β · Sep(C).

Here the partition C of the dataset is treated as a variable with domain {C^1, C^2, ..., C^n}. By Theorem 1, we may fix β = 1.

Theorem 1. Given a dataset DB, Scat(C) and Sep(C) have the same range.

Proof. In the initial state k = n, C^n = {{X_1}, {X_2}, ..., {X_n}}, formula (1) gives Scat(C^n) = 0; by formula (2),

    Sep(C^n) = 2 Σ_{j=1}^{d} [ n Σ_{X ∈ DB} x_j² − ( Σ_{X ∈ DB} x_j )² ] =: M    (3)

Suppose at some step C_u and C_v (u, v ∈ [1, k], u ≠ v, k > 1) are merged. A direct computation with the formulas above shows that the merge strictly decreases Sep and strictly increases Scat:

    Sep(C^(k−1)) − Sep(C^k) < 0    (4)

    Scat(C^(k−1)) − Scat(C^k) = 2 Σ_{j=1}^{d} (|C_u|·SS_vj + |C_v|·SS_uj − 2·LS_uj·LS_vj) > 0    (5)

Therefore Scat(C) is monotonically increasing and Sep(C) monotonically decreasing along the merge sequence. When k = 1, C^1 = {{X_1, X_2, ..., X_n}}, and it is easy to verify that Sep(C^1) = 0 and Scat(C^1) = M. □

By Theorem 1, the cluster validity index function used by COPS takes the form

    Q(C) = (1/M) (Scat(C) + Sep(C))    (6)

2.2 Analysis of the index

The optimal clustering quality corresponds to the balance point between intra-cluster compactness and inter-cluster separation [9,10], reflected numerically in the minimum of the index function Q(C). Theorem 2 shows that for most datasets (with one special case excepted, see the condition of the theorem) Q(C) has a minimum in the interval (0, 1).

Theorem 2. Given a dataset DB = {X_1, X_2, ..., X_n}, if n > 2 and there is at least one i ∈ [2, n−1] with ||X_(i−1) − X_i|| ≠ ||X_i − X_(i+1)||, then Q(C) has a local minimum smaller than 1.

Proof. Consider the initial state k = n of COPS, C^n = {{X_1}, ..., {X_n}}. Let t_j = min_{l=1,...,n−1} {x_lj − x_(l+1)j} for j = 1, ..., d. If the condition of the theorem holds, then by Definition 1 there exist u, v ∈ [1, n], u ≠ v, such that X_u and X_v are similar, and at least one of the pairs (X_(i−1), X_i) and (X_i, X_(i+1)) is dissimilar; the latter ensures that k > 1 after all similar points have been merged. Considering the change of Q after merging X_u and X_v, a direct computation with formulas (3)-(6) gives

    Q(C^(n−1)) − Q(C^n) < 0.

Theorem 1 has shown that Q(C^1) = Q(C^n) = 1, which implies that under the condition of the theorem Q(C) has a local minimum smaller than 1. □

Theorem 2 also identifies the special dataset structure for which Q(C) cannot attain a minimum inside (0, 1): intuitively, all data points lie uniformly on the nodes of an equally spaced grid. For such a dataset, COPS outputs k* = n. This is reasonable, since the sensible values of k* are then 1 or n, and clustering algorithms usually require k* > 1.

2.3 Computing the index

By formulas (3)-(6), Q(C^k) can be computed incrementally (the UpdateQ procedure): Q(C^k) is obtained from Q(C^(k+1)) by computing the increments of formulas (4) and (5). To this end, the algorithm keeps for each C_i (i = 1, 2, ..., k) the structure

    CF_i = (|C_i|, <SS_i1, LS_i1>, <SS_i2, LS_i2>, ..., <SS_id, LS_id>),

which is exactly the clustering feature proposed by the BIRCH algorithm [17]. On this basis, the operation of merging partitions of the dataset (the MergePartitions procedure) reduces to simple additions of the corresponding |C_i|, SS_ij, and LS_ij values. Computing the initial value M requires obtaining the CF structure of each data point and applying formula (3), at a time cost of O(dn). A typical Q(C) curve is shown in Fig. 5: after the minimum, the Q value jumps sharply, which means that merging some subsets of the optimal partition causes the clustering quality to deteriorate drastically. Exploiting this, enlarging the increment Δ of T at that point can further improve the performance of COPS; concretely, for larger datasets (say n > 1000), if during the computation Q exceeds the minimum seen so far, we set ε = ε × 2.

Fig. 5. An example of a Q(C) curve.

3. Experiments and Analysis

The experimental validation covers both effectiveness and efficiency. Among the many cluster validity indices [6-13], the two representative geometry-based indices V_xie [8] and V_wsj [9] are chosen for comparison: V_xie is the first classical index adopting the notions of "compactness" and "separation"; V_wsj improves the stability of the linear combination method and can effectively handle datasets containing overlapping clusters and noise [9,12]. Both indices are based on the FCM algorithm; the experiments set the fuzzy factor of FCM to w = 2. Among other types of methods, the Gap statistic [2] and IGP [4] are chosen for comparison. The Gap statistic determines the optimal value of k by detecting a "dramatic change" in clustering quality; IGP is a recently proposed index that measures clustering quality by the in-group proportion, and its performance has been verified to exceed that of other existing statistical indices [4]. Following the recommendations of [2,4], k-means is used as their base algorithm, with parameter R = 5; the cutoff threshold used by IGP is set to 0.90. The greedy technique [9] is used to select the initial cluster centers of FCM/k-means to speed up convergence. The experiments are run on a computer with a 2.6 GHz CPU and 512 MB RAM, under Microsoft Windows 2000.

3.1 Experimental data

Both real data and synthetic data are used. Results are reported for six representative datasets, whose parameters are summarized in Table 1. For comparability, the first two datasets, DS1 and DS2, are the real datasets X30 and IRIS frequently cited in similar studies [3,7-9,12]. DS3 and DS4 are two application datasets of higher dimensionality. DS3 comes from Vowel Recognition (Deterding data) (/~mlearn/databases/undocumented/connectionist-bench/vowel/) and contains pronunciation data of 11 English vowels by 10 speakers, used in speech recognition research; DS4 comes from the Wisconsin Breast Cancer Database (/databases/breast-cancer-wisconsin/) and consists of clinical data from patients' FNA tests, used for medical diagnosis.

Table 1. Summarized parameters of the datasets

DB    Description of the dataset              Dimension (d)    Size (n)    True number of clusters
DS1   X30                                     2                30          3
DS2   IRIS                                    4                150         3
DS3   Vowel recognition (Deterding data)      10               528         11
DS4   Wisconsin breast cancer database        9                699         2
DS5   Synthetic dataset                       3                4000        6
DS6   t5.8k                                   2                8000        6

To test the performance of the various methods on large datasets, the 3-dimensional dataset DS5 with 4,000 data points was synthesized following the method of [17] (modified from the original method to use random cluster centers). DS5 also contains a small amount of noise that blurs the cluster boundaries, and two of its clusters clearly overlap. The sixth dataset, DS6, with 8,000 data points, is the public dataset named "t5.8k", characterized by a large amount of noise and complex-shaped clusters (its six clusters form the letter shapes 'GEORGE'). More importantly, we found by experiment that under an appropriate parameter configuration, that is, given the correct number of clusters, FCM and k-means can distinguish these six clusters well; this is used to validate the ability of the other four methods based on FCM or k-means, as well as of COPS, to identify complex-shaped clusters.

3.2 Effectiveness experiments

COPS obtained the correct number of clusters on all six datasets; the results are shown in Fig. 6. For DS1 and DS2, COPS detects a sharp drop in clustering quality when the number of clusters changes from the optimal 3 to 2. DS2 contains two overlapping clusters [12]; Fig. 6(b) shows that COPS can effectively distinguish overlapping clusters. Affected by noise, DS5 has two clusters with blurred boundaries; Fig. 6(e) shows that only a small difference in clustering quality exists between cluster numbers 6 and 5, yet COPS still makes the distinction accurately and obtains the optimal number 6, demonstrating that COPS can effectively identify noise and distinguish density differences between clusters. For the four more complex datasets DS3-DS6, COPS does not detect continuously varying values of k; for example, in Fig. 6(f), k jumps from 18 to the optimal number 6. This is because COPS proceeds differently from the other methods: it does not repeatedly run a clustering algorithm over a preset interval of k.
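The CF (clustering feature) bookkeeping described above can be sketched briefly. This is an illustrative Python sketch, not the authors' code: each cluster stores (|C|, SS_j, LS_j) per dimension, MergePartitions-style merging is component-wise addition, and the cluster's contribution to Scat(C) = 2 Σ_i Σ_j (|C_i|·SS_ij − LS_ij²), i.e. the sum of squared distances over all ordered point pairs in the cluster, is computed from these statistics alone, without revisiting the data points.

```python
class CF:
    """Clustering feature of one cluster: (|C|, <SS_j, LS_j> per dimension)."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)             # LS_j: linear sum per dimension
        self.ss = [x * x for x in point]  # SS_j: squared sum per dimension

    def merge(self, other):
        """MergePartitions-style union of two clusters: component-wise addition."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def scat(self):
        """This cluster's term of Scat(C): 2 * sum_j (|C| * SS_j - LS_j**2),
        equal to the sum of squared Euclidean distances over all ordered
        pairs of points in the cluster."""
        return 2 * sum(self.n * ss - ls * ls
                       for ss, ls in zip(self.ss, self.ls))
```

Because merging is pure addition, the UpdateQ increments of formulas (4) and (5) can likewise be evaluated from the two clusters' CF entries in O(d) time per merge.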

Algebra Terms in English (Chinese-English glossary)


(0,2) 插值 || (0,2) interpolation
0# || zero-sharp; read as "zero sharp".
0+ || zero-dagger; read as "zero dagger".
1-因子 || 1-factor
3-流形 || 3-manifold; also called "three-dimensional manifold" (三维流形).
AIC准则 || AIC criterion, Akaike information criterion
Ap 权 || Ap-weight
A稳定性 || A-stability, absolute stability
A最优设计 || A-optimal design
BCH 码 || BCH code, Bose-Chaudhuri-Hocquenghem code
BIC准则 || BIC criterion, Bayesian modification of the AIC
BMOA函数 || analytic function of bounded mean oscillation; full Chinese term: 有界平均振动解析函数.
BMO鞅 || BMO martingale
BSD猜想 || Birch and Swinnerton-Dyer conjecture; full Chinese term: 伯奇与斯温纳顿-戴尔猜想.
B样条 || B-spline
C*代数 || C*-algebra; read as "C-star algebra".
C0 类函数 || function of class C0; also called "class of continuous functions" (连续函数类).
CAT准则 || CAT criterion, criterion for autoregressive
CM域 || CM field
CN 群 || CN-group
CW 复形的同调 || homology of CW complex
CW复形 || CW complex
CW复形的同伦群 || homotopy group of CW complexes
CW剖分 || CW decomposition
Cn 类函数 || function of class Cn; also called "class of n-times continuously differentiable functions" (n次连续可微函数类).
Cp统计量 || Cp-statistic

An English Introduction to Huizhou Architecture


Ladies and gentlemen: It is an honor to be here to introduce the traditional residences of South China. We come from Jingdezhen, China, a city famous for its porcelain and also typified by its ancient Huizhou architecture. Our company, the Boyuan Ancient Architecture Company, is engaged in spreading the ancient Chinese culture of architecture to the world.

In front of us is part of a traditional Chinese Huizhou house: the antechamber (前厅). The antechamber includes the courtyard (天井), the wing-rooms (厢房), and so on. Let us look at its construction. As you may know, Chinese people care about Feng Shui (风水), so much attention is always paid to the arrangement of a house. The courtyard therefore not only plays an important role in daily lighting but also collects water for fire protection; in Feng Shui, collecting water also means collecting wealth.

One thing that cannot be ignored is the amazing latticework on the windows. In some window latticework you may still find thin layers of gold foil, which indicates the houses' past eminence, while some carved boards feature geometric variations of Buddhist symbols. Others are carved with poems and lyrics, or even calligraphy and paintings, which implies that literary families once lived in the houses. Latticework motifs on doors, partitions, and windows are mostly derived from traditional designs with auspicious meanings, such as storks, deer, kylins (Chinese mythical beasts), pied magpies, bats, peonies, and fu (the Chinese character for happiness), all symbolizing longevity, good health, and wealth.

Latticework is often enhanced by various motifs such as intricate floral figures and episodes from local operas and folk stories. Geometric and cross patterns are also favored because they are simple yet graceful. There are also motifs based on the symbols of the five elements (air, earth, fire, water, and wood) and the eight trigrams (sky, earth, thunder, wind, water or moon, fire or sun, mountain, and lake). The carved boards are mainly those from doors, windows, tables, beds, chairs, and screens.

The corbie gable, also called a fire-sealing gable, is a major molding characteristic of south Chinese architecture.

Huizhou Carving Art

Introduction to Data Science, Graph module: Graph Data Analysis - Centrality


Image source: [Bearman et al., American Journal of Sociology, 2004]
• Node Centrality analysis
• In a network, the "status" of different nodes is not equal
  - Example: the romantic-relationship graph of American high-school students
  - If we define a directed edge for the "pursues" relation, we obtain a directed graph
• Think about it:
  - Are the boys in the two graphs on the right equally important?
  - How would you explain this importance?
Graph module
Node Centrality
1. Geometry-based measures
2. Path-based measures
3. The PageRank algorithm
4. Summary

Node Centrality
• 1. Geometry-based measures
  - Degree Centrality
  - Closeness Centrality
• Node Centrality analysis
• In a network, the "status" of different nodes is not equal
  - Example: the romantic-relationship graph of American high-school students
  - An edge indicates a romantic relationship within the past 18 months, giving an undirected graph
• Think about it:
  - Which nodes do you consider more important?
  - How would you explain this importance?
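The two geometry-based measures named in the outline can be computed directly on a small adjacency-list graph. A minimal sketch (the path graph is an illustrative assumption, and a connected graph is assumed):

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other n-1 nodes that each node touches directly."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Inverse of the average shortest-path distance (unweighted BFS);
    assumes the graph is connected."""
    n = len(adj)
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[src] = (n - 1) / sum(d for v, d in dist.items() if v != src)
    return scores

# Illustrative path graph a-b-c-d: the middle nodes reach everyone faster.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(degree_centrality(graph))    # b and c score 2/3, a and d score 1/3
print(closeness_centrality(graph)) # b and c score 0.75, a and d score 0.5
```

On this path graph both measures agree that the interior nodes matter more, which matches the intuition the slide asks about.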
• 2. Path-based measures
  - Betweenness Centrality
• 3. The PageRank algorithm
  - Matrix formulation (why is a damping factor needed?)
  - The underlying mathematics: Markov chains
  - Personalized PageRank
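The matrix formulation asked about above amounts to power iteration on the damped transition matrix. A minimal sketch (the toy graph and tolerance are illustrative assumptions):

```python
def pagerank(adj, d=0.85, tol=1e-10):
    """Power iteration on the damped ("Google") matrix:
    r_v = (1 - d)/n + d * sum over in-neighbors u of r_u / outdeg(u).
    The damping factor d makes the chain irreducible and aperiodic, so a
    unique stationary distribution exists; assumes every node has out-links."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    while True:
        new = {v: (1 - d) / n
                  + d * sum(rank[u] / len(adj[u]) for u in nodes if v in adj[u])
               for v in nodes}
        if max(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new

# Illustrative toy graph: "a" and "b" both link to "c", so "c" ranks highest.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
r = pagerank(graph)
print(r)  # the ranks sum to 1, with c > a > b
```

Without damping (d = 1) a rank sink or a periodic cycle could prevent convergence to a unique distribution, which is exactly why the damping factor is needed.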
“Beautiful math tends to be useful, and useful things tend to have beautiful math.”

Topological Data Analysis: Methods and Applications


Topological Methods for the Analysis of Applications

Yumiao Lei
Department of Mathematics, Faculty of Information and Computing Science, Taiyuan University of Technology, Taiyuan, Shanxi, 030024, China
Corresponding E-mail:

Keywords: TDA, persistent homology, Hausdorff distance, text classification, face detection

Abstract. Topological Data Analysis (TDA) is a rapidly developing data-analysis field. It provides topological and geometric methods for obtaining the relevant features of high-dimensional data. This paper introduces the mathematical principles of persistent homology, Mapper, and the Hausdorff distance, and presents two applications of TDA. One concerns text classification of natural language: it uses persistent homology to analyze poetry data and the Mapper algorithm to analyze and visualize the data sets. The other application, based on a robust Hausdorff distance, is a fast and accurate shape-comparison method for face detection. The results show that the TDA methods are not only accurate but also support data visualization.

1. Introduction

Data is everywhere, and many connections hide in complex data. Four "V"s, Volume, Variety, Value, and Velocity, are commonly used to summarize its characteristics. However, massive and complex data sets cannot be extracted, stored, searched, shared, analyzed, and processed with current software tools. In the era of big data, predictive analysis is widely used in business and society. Because of the huge volume of data, the variety of data types, and the low value density, different fields impose different requirements on data processing. High-dimensional data must be transformed into data of lower dimensionality to make it easier to analyze, and how to purify data and extract valuable information is a major problem.
Topology, and in particular algebraic topology, has been used to address a wide variety of problems [4]. Topological methods are used to reduce the dimensionality of high-dimensional data, analyze the topological structure or shape of the data, and finally cluster complex data [3]. This paper discusses the application of TDA to text classification and face detection in order to illustrate the advantages of topology in data analysis. These two applications use three main TDA methods: persistent homology, Mapper, and the Hausdorff distance. In persistent homology, a filtration of combinatorial objects, simplicial complexes, is constructed, from which the main topological structures of the data are derived. Mapper presents the result as a simplicial complex that is interactive and can be quantified in several ways using statistics [3]. Simplicial complexes can be seen as a higher-dimensional generalization of graphs; they are mathematical objects that are both topological and combinatorial, a property making them particularly useful for TDA [5]. These two methods are applied to authorship attribution (a data set of poems) and obtain highly accurate results [3]. The other topological application is robust face detection based on an enhanced Hausdorff Distance (HD), which provides higher efficiency and more reliability; in terms of algorithmic complexity, HD is faster.

2. Preliminaries

Definition 1 (Convex combination). If A_1, A_2, ..., A_p are points in R^d, a convex combination is a point of the form λ_1 A_1 + ... + λ_p A_p with λ_1 + ... + λ_p = 1 and 0 ≤ λ_i ≤ 1. The set of all convex combinations of A_1, ..., A_p is called the convex hull of A_1, ..., A_p [1].

International Conference on Modern Educational Technology and Innovation and Entrepreneurship (ICMETIE 2020). Copyright © 2020 The Authors. Published by Atlantis Press SARL.

Definition 2 (Simplicial complex). A simplicial complex is a collection K of finite nonempty sets such that if A is an element of K, then so is every nonempty subset of A [2].

Definition 3 (Simplicial homology). Given n ∈ Z+, the n-th homology group of a simplicial complex K, denoted H_n(K,F), is the quotient of n-cycles by n-boundaries (1) [2]:

H_n(K,F) := Z_n(K,F) / B_n(K,F).    (1)

Definition 4 (Hausdorff distance). The Hausdorff distance between A and B is defined by either of the two following equalities (2) [2]:

d_H(A,B) = max{ sup_{b∈B} d(b,A), sup_{a∈A} d(a,B) } = sup_{x∈M} |d(x,A) − d(x,B)| = ‖d(·,A) − d(·,B)‖_∞.    (2)

3. TDA Applications

3.1 A New Text Classification for Natural Language Processing

3.1.1 Difficulty in text classification. Text classification is a hot research topic, and one of its difficulties is the high dimension of the feature space. In a high-dimensional feature space the features may be redundant or unrelated, making processing inconvenient, prone to over-fitting, and costly in time and space; feature dimensionality reduction is therefore necessary, provided it does not affect classification accuracy [6].

3.1.2 Process. In an experiment [3] aimed at classifying Persian poems composed by two of the best Iranian poets, Ferdowsi and Hafez, the author used two R packages, TDA and TDAstats, both of which implement persistent homology. The textual data (poems) of the two poets was gathered from the Shahnameh and the Ghazaliat-e-Hafez [3], about 8000 hemistichs from each book. After preprocessing, the data was fed to the TF-IDF algorithm to build a document-term matrix, which was then fed to the persistent homology algorithm. First, persistence diagrams, barcodes, and persistence landscapes were sketched for a sample of Ferdowsi's poems comprising 1000 hemistichs [3].
The hemistichs of Hafez were likewise divided into several parts; the persistence diagram and first landscape of each part were computed, and finally the mean landscape of those parts was sketched. The same was done for the hemistichs of Ferdowsi. In the last step, Wasserstein distances were computed between the persistence diagrams of the corresponding parts of Hafez's and Ferdowsi's poems.

The Mapper method can be explained as follows. Suppose we have point-cloud data representing a shape, such as a knot [3]. First, the whole data set is projected onto a coordinate system of smaller dimension, reducing complexity through dimensionality reduction. The data is then placed into overlapping bins that cover the parameter space, points are grouped by a clustering algorithm, and finally an interactive model is created. The experiment ran two accuracy tests on the resulting shape graph. First, the whole graph is partitioned into three clusters: Hafez, Ferdowsi, and Both. The Hafez cluster contains the nodes with a high percentage of Hafezian poems; similarly, the Ferdowsi cluster contains the nodes with a high percentage of Ferdowsi's poems; and the "Both" cluster contains about the same amount of each [3]. To compute this, the number of Hafezian poems in each node of the Hafez cluster is simply divided by the number of all poems in that node, and the same test is applied to the other clusters.

3.1.3 Evaluation. The key to text classification is reducing the dimensionality of unstructured data sets; feature selection and feature extraction are the usual means.
However, according to existing experimental results by others, the degree of dimensionality reduction achieved varies, and different methods are needed to improve accuracy and classification quality, so compared with TDA these approaches are more complex. Topological methods provide innovative data-mining techniques that can improve the efficiency of machine-learning pipelines. Visualization tools for persistent homology, such as the persistence diagram, barcode, and persistence landscape, have been invented to exhibit the main topological features of data.

3.2 Robust Face Detection Using the Hausdorff Distance

3.2.1 The proposed Hausdorff distance. Face detection is one of the major research areas in AI. As a human identification feature, facial features have the advantage that sample images are easier to acquire than fingerprints or iris features. At present, face-detection research mainly targets static face images without in-depth rotation [8]. To adapt to the closure of some sports venues, to make effective use of continuous motion-image sequences to improve recognition efficiency, and to minimize the degradation caused by motion-blurred images, it is meaningful to propose recognition methods suited to dynamic situations. A similarity measure using the Hausdorff Distance (HD) tolerates perturbations in point locations better than others [10], because it measures proximity rather than exactness of superimposition. Earlier applications of HD emphasized locating an object under translation and scaling [9].
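For finite point sets, the Hausdorff distance of Definition 4 reduces to a max-min over pairwise distances. A minimal sketch (the 2-D point sets are illustrative assumptions; this is the plain HD, not the enhanced robust HD of [7]):

```python
def directed_hausdorff(A, B):
    """sup over a in A of d(a, B), for finite 2-D point sets."""
    return max(min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for bx, by in B)
               for ax, ay in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance d_H(A, B): max of the two directed terms."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Illustrative unit squares, one shifted 2 units to the right:
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
shifted = [(2, 0), (3, 0), (2, 1), (3, 1)]
print(hausdorff(square, shifted))  # 2.0
```

Because the measure is a supremum of point-to-set distances rather than a point-to-point matching, it changes only gradually when individual points are perturbed, which is the tolerance property the text describes.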
In addition, many researchers have improved the speed and accuracy of the conventional HD measure.

3.2.2 Process. In the preprocessing step of dynamic face detection, the Hausdorff distance is used to locate the face image, which optimizes the next step to a certain extent. This section introduces an efficient implementation of face localization that works on grayscale still images and is suitable for real-time applications. The method takes a shape-comparison approach to achieve fast, accurate face detection that is robust to changes in illumination and background. A two-step process allowing both coarse detection and exact localization of faces is presented [7]: after the facial region is roughly detected, the facial parameters are refined in a second stage. On two large test sets, a relative error, obtained by comparing estimated eye positions with manually marked ones, measures the performance of the system; this relative error measure is independent of both the dimensions of the input images and the scale of the faces [7]. The localization results show that the system is robust under different backgrounds and illumination conditions, and its runtime behavior allows use in real-time video applications [9].

3.2.3 Evaluation. Unlike the traditional HD, the Robust Hausdorff Distance (RHD) uses not only the position information of edge points but also other kinds of information, such as the total number of edge points satisfying a directed distance and pseudo-edges composed of very few edge points [7]. RHD takes occlusion and pseudo-edges into account and is thus less affected by blur in dynamic image recognition.

4. Conclusion

This paper presented two applications of TDA. One concerns text classification of natural language, using the persistent homology algorithm to analyze poetry data sets and applying a new method, Mapper, to authorship attribution.
The results are analyzed as a simplicial complex and can be quantified statistically in many ways. The other application is an efficient algorithm for an automatic face-detection system, with the Hausdorff distance used as a similarity measure between a general face model and possible instances of the object within the image. The method performs robust and accurate face detection, and its efficiency makes it suitable for real-time applications. The face-detection algorithm is simple and has lower computational complexity than traditional methods; the experimental results show it to be the most efficient approach in terms of speed, accuracy, and reliability compared to others [10].

In conclusion, TDA can be used widely in many different fields. For example, persistent homology is a tool for studying data sets and has previously been used for crystal structures, 3D images, image analysis, and breast-cancer analysis. Persistent homology is now used to understand biological systems, serving as an algebraic tool that measures high-dimensional data to represent the topological features of point clouds; researchers have extended applications of computational homology to the analysis of genetic data from breast-cancer patients [11]. These topological data-analysis methods are very useful: even for seemingly unrelated discrete points, we can mine their topology and display the data vividly.

Acknowledgment

First and foremost, I would like to express my deepest gratitude to my teachers and professors at my university, who have provided valuable guidance at every stage of the writing of this thesis. I would also like to thank all my friends and roommates for their encouragement and support. Without their enlightening instruction and impressive kindness, I could not have completed this thesis.

References

[1] F. Chazal, B. Michel, An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists, math.ST, vol. 43, pp. 3-6, 2017.
[2] F. Memoli, K. Singhal, A Primer on Persistent Homology of Finite Metric Spaces, math.AT, vol. 38, pp. 7-13, 2019.
[3] N. Elyasi, An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis, 2019. https://arxiv.xilesou.top/abs/1906.01726
[4] R. Rivera-Castro, P. Pilyugina, P. Pletnev, I. Maksimov, W. Wyz and E. Burnaev, Topological Data Analysis of Time Series Data for B2B Customer Relationship Management, cs.LG, 2019. https://arxiv.xilesou.top/abs/1906.03956
[5] P. Bubenik, Statistical Topological Data Analysis using Persistence Landscapes, Journal of Machine Learning Research, vol. 25, pp. 77-102, 2015.
[6] T. Chen, Y. Xie, Literature Review of Feature Dimension Reduction in Text Categorization, Journal of Information, vol. 24(6), pp. 690-694, 2005.
[7] O. Jesorsky, K. J. Kirchberg and R. V. Frischholz, Robust Face Detection Using the Hausdorff Distance, Lecture Notes in Computer Science, pp. 90-95, 2001.
[8] S. Srisuk and W. Kurutach, New Robust Hausdorff Distance Based Face Detection, pp. 1022-1025, 2001.
[9] Y. Wang, Image Matching Based on Robust Hausdorff Distance, Journal of Computer-Aided Design and Computer Graphics, vol. 14(3), pp. 238-241, 2002.
[10] Y. Liu and L. Shen, Face Image Location Using Hausdorff Distance, Journal of Computer Research and Development, vol. 38(14), pp. 475-481, 2011.
[11] D. DeWoskin, J. Climent, I. Cruz-White, M. Vazquez, C. Park and J. Arsuaga, Applications of computational homology to the analysis of treatment response in breast cancer patients, Topology and its Applications, vol. 157, pp. 157-164, 2010.

TetGen User Manual (Chinese Edition)


2 Getting Started
  2.1 Compiling
    2.1.1 Unix/Linux/Mac OS X
    2.1.2 Windows 9.x/NT/2000/XP
  2.2 Testing
  2.3 Visualization
    2.3.1 TetView
    2.3.2 Medit

Geometric Modeling


Geometric modeling is a crucial aspect of computer-aided design (CAD) and computer graphics. It involves the creation of digital representations of objects and environments using mathematical algorithms and geometric techniques. These models are used in fields such as engineering, architecture, animation, and virtual reality, and they play a significant role in the design and visualization of complex structures, the simulation of physical phenomena, and the creation of realistic computer-generated imagery.

One of the primary challenges in geometric modeling is achieving accuracy and precision in representing real-world objects and scenes. This requires advanced mathematical concepts such as calculus, linear algebra, and differential geometry. Geometric modeling also relies on computational algorithms to generate and manipulate geometric shapes, surfaces, and volumes; these algorithms must be efficient and robust enough to handle large-scale, intricate models while maintaining visual fidelity and integrity.

Another important aspect of geometric modeling is the representation of 3D objects in a 2D space, which is essential for visualization and rendering. This process involves techniques such as projection, rasterization, and rendering, which convert 3D geometric data into 2D images for display on screens or in print. Achieving realistic and visually appealing representations requires careful treatment of lighting, shading, and texture mapping, all fundamental to computer graphics and visualization.

Beyond the technical challenges, geometric modeling also raises issues of usability and user experience. Designing intuitive, user-friendly interfaces for creating and manipulating geometric models is crucial for enabling efficient and effective design workflows.
This involves considerations such as interactive manipulation, real-time feedback, and intuitive control mechanisms, which are essential for empowering users to express their creative ideas and concepts.

Furthermore, geometric modeling has a significant impact on manufacturing and production processes. The digital models created through geometric modeling are used for computer-aided manufacturing (CAM) and numerical-control (NC) machining, enabling the production of precise and complex parts and assemblies. This integration of geometric modeling with manufacturing technologies has revolutionized the way products are designed, prototyped, and manufactured, leading to advancements in efficiency, quality, and innovation.

From an academic perspective, geometric modeling is a multidisciplinary field that draws from mathematics, computer science, and engineering. Researchers and educators constantly explore new methods and techniques, pushing the boundaries of representing and manipulating geometric data; this includes parametric modeling, geometric constraints, and procedural modeling, which enable flexible and adaptable design processes.

In conclusion, geometric modeling is a complex, multifaceted field with far-reaching implications for many industries and disciplines. It encompasses technical challenges of accuracy, efficiency, and visualization, as well as considerations of usability, manufacturing, and academic research. As technology continues to advance, geometric modeling will play an increasingly critical role in shaping how we design, create, and interact with the world around us.
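The projection step mentioned above can be illustrated with a simple pinhole perspective divide. A minimal sketch (the camera-space coordinates and focal length are illustrative assumptions):

```python
def project(point3d, focal=1.0):
    """Pinhole perspective projection of a camera-space point onto the
    image plane z = focal: scale x and y by focal/z."""
    x, y, z = point3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z, focal * y / z)

# Points twice as far away project half as large:
print(project((2.0, 1.0, 2.0)))  # (1.0, 0.5)
print(project((2.0, 1.0, 4.0)))  # (0.5, 0.25)
```

This perspective divide is the geometric core that rasterization and rendering pipelines build on when converting 3D data into 2D images.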

OFM System Variables Tutorial


Table of Contents

OFM System Functions
  Statistical methods: arithmetic mean; geometric mean; harmonic mean; absolute deviation; variance; standard deviation
  System function guidelines: case sensitivity; spaces
  System function terminology
  System function date and time guidelines: true/false; date in data files / date format OFM returns
  System function categories

Date functions: @AddDate (value, value, value); @AddDays (numeric, numeric); @AddMonths (numeric, numeric, numeric); @AddYears (numeric, numeric, numeric); @Annually (value); @AtoD (alpha); @Daily (value); @Date ( ); @DateCmp (value, value); @DateRange (value, value, value); @Day (value); @DayName (value, value); @DayofWeek (value); @Dom (value); @ElapsedDays (value, value); @ElapsedMonths (value, value); @IndexofDate (value); @Julian (value); @Month (value); @Monthly (value); @MonthName (value, value); @MRecCount ( ); @Note: (value); @Quarterly (value); @SemiAnnually (value); @Weekly (value); @WeekofYear (value); @Year (value); @YYMM

Engineering functions: @BHP (value, value, value, value, value, value, value); @DCACalc (value, string, string); @DCACaseComment (string); @DCACaseInitials (string); @DCAResults (string, string, optional string); @Forecast (value, string, string, optional string); @Interval ( ); @SwTdt (value, value, value, value, value); @TraceAt (value, value); @TraceDate (value, value); @TraceDates (value); @Wd (value, value, string); @WellType (string)

Financial functions: @Discount (value, value, value)

Formatting functions: @Blank (value); @DataCount ( ); @Ff ( ); @FmtDate (value, string); @FmtName (value, value); @Lf (value); @LineCount ( ); @Tab (value)

Log-analysis functions: @Phi (value, value, string); @Rw (value); @Sw (value, value, value, value, value, string); @Vshale (value, string)

Logical functions: @Between (value, value, value); @Change (value); @If (value, value, value)

Math functions: @Abs (value); @BesselJ1 (value); @BesselJ2 (value); @Ceil (value); @Cos (value); @Diff (value, value); @Exp (value); @Fit (value, value, value, value, string); @FitEq (value, string); @FitR2 (value, string); @Floor (value); @Interpolate (value, value, value, value, string); @Ln (value); @Log (value); @Max (value, value); @Min (value, value); @Mod (value, value); @PctChange (value, value, value, string); @Pow (value, value); @Radian (value); @Root (string, value, value); @Series (value, value, value); @Sin (value); @Sqrt (value); @TMax (value, value); @TMin (value, value); @TrendChecker (value, value, value, string)

Miscellaneous functions: @Alloc (value); @CArea (value, value, value); @DataInterpolator (string, value, string); @Distance (value, value, value, value); @Marker ( ); @MarkerDepth (string, value, value); @MdfromTvd (value); @Metric ( ); @NoKeys ( ); @PrintError (string); @PrintStatus (string); @PvtFile (string); @Random ( ); @Reg (value, value, string); @ReportFile (string); @RepHeaderFile (string); @ReverseOrder ( ); @Step (value, value); @Time ( ); @Today ( ); @TVD (value); @Underscore (string)

Plotting functions: @EquationGraphLine ( ); @EvalGraphLine (value); @PlotFile (string); @PlotfromFile (string, string, string); @PlotHeaderFile (string); @SlopeGraphLine ( ); @XGraphTrace ( ); @YGraphTrace ( ); @YoGraphLine ( )

Program functions: @AppendFile (string); @ARec (value, value); @AsktoStore (value, string, value); @CDataIndex (value, value); @CDataValue (value, value, value); @CFirst (value, value); @CLast (value, value); @CloseFile ( ); @CountInput (value); @DataIndex ( ); @DataIndexRange (numeric); @DRecCount ( ); @DualKeyCount (value); @DualKeySelect (value, value); @DualKeyValue (value, value); @FileExist (string); @First (value); @FromFile (string, string, string); @GetDataIndex ( ); @Last (value); @Launch (string, string); @Length (value, value); @LoadName ( ); @Name ( ); @Next (value); @Null ( ); @OpenFile (string); @Previous (value); @Recall (value); @RecallStr (value); @RecCount (string); @ResetSums ( ); @RRec (value); @SetDataIndex (value); @Store (value, value); @ValueAt (value, value); @WriteFile (string); @WriteFileBreak ( ); @WriteFileFmt (string, value)

PVT functions: @Cf (value, string); @PfromPoZ (value); @PvtBg (value); @PvtBo (value); @PvtBt (value); @PvtBw (value); @PvtCg (value); @PvtCo (value); @PvtCw (value); @PvtPb ( ); @PvtPc ( ); @PvtRs (value); @PvtRsw (value); @PvtSetAPI (value); @PvtSetPb (value); @PvtSetPc (value); @PvtSetRsb (value); @PvtSetSg (value); @PvtSetT (value); @PvtSetTc (value); @PvtTc ( ); @PvtVg (value); @PvtVo (value); @PvtVw (value); @PvtWrho (value); @PvtZ (value)

Statistical functions: @AbsDev (value); @AveInput (value); @CAbsDev (value, value); @ClrRSum (value, value); @ClrTAve (value, value); @ClrTSum (value, value); @CMvAve (value, value, value, value); @CRAve (value, value); @CRSum (value, value); @CStdDev (value, value); @CTAve (value, value); @CTSum (value, value); @CumInput (value); @CVariance (value, value); @MvAve (value, value, value); @RAve (value); @RSum (value); @StdDev (value); @TAve (value); @TSum (value); @Variance (value)

String functions: @AtoN (string); @CFirstStr (string, value); @CLastStr (string, value); @CmpStr (string, string); @DtoN (string); @FindStr (string, string); @FirstStr (string); @IfStr (value, string, string); @InStr (string, string); @LastStr (string); @LenStr (string); @NtoA (value, value, value); @SubStr (string, value, value)

Table functions: @Lookup (value, string, value, string); @RowAverage (numeric[, numeric, numeric]); @RowSum (numeric, numeric); @XRefAlpha (string, string); @XRefValue (value / string, value)

OFM System Functions

System functions are a set of coded, standard execution instructions that operate inside OFM.

Research on Visibility for Large-Scale and Complex Scenes


Research on Visibility for Large-Scale and Complex Scenes

Journal of Computer Research and Development, ISSN 1000-1239/CN 11-1777/TP, 42(2): 236–246, 2005

Pu Jiantao and Zha Hongbin
(National Laboratory of Machine Perception, Department of Computer Science, Peking University, Beijing 100871)

Abstract: The real-time rendering of large-scale and complex scenes is the basis of many important applications such as virtual reality, real-time simulation, and 3D interactive design. As one way to accelerate rendering, research on visibility has gained great attention in recent years. Its aim is to discard as many invisible geometric objects as possible for given viewpoints, so that complex scenes or objects can be rendered, or transferred over the Internet, in real time. This paper presents a survey of the topic. First, some typical visibility algorithms are analyzed in detail. Second, topics related to visibility research are introduced, and the necessary components and steps of a visibility algorithm are described. Third, criteria for evaluating visibility methods are proposed. Finally, key techniques that need further research effort are pointed out.
Keywords: visibility; culling; large-scale and complex scene; real-time navigation

Abstract (Chinese): Fast rendering of large-scale complex scenes is the underlying supporting technology for many important applications such as virtual reality, real-time simulation, and 3D interactive design, and it is a basic problem faced by many research fields. As one technical means of addressing it, the visibility problem has received great attention in recent years and has produced a series of research results. By analyzing and summarizing the relevant visibility algorithms, this paper describes the research content of the visibility problem, proposes criteria for judging the merits of a method, gives the basic components and steps that a visibility-determination algorithm should contain, and finally points out several key problems that still need focused research.

Keywords: visibility; culling; large-scale and complex scene; real-time walkthrough
CLC number: TP391.41

1 Introduction

The visibility problem touches many research fields [1]. As a basic problem in computer graphics, its main purpose is, for a given scene and viewpoint, to quickly determine occlusion relations and discard the large number of graphical objects that need not be drawn, thereby reducing scene complexity, increasing the realism of the whole scene, and ultimately achieving low-load rendering and network transmission. As early as the 1970s, numerous algorithms had been proposed for hidden surface removal (HSR). Sutherland et al. [2] discussed and classified the existing HSR algorithms in a 1974 research report; from the standpoint of hidden surface removal, the problem had been essentially solved. However, as the scale and complexity of 3D scenes grew, real-time manipulation or walkthrough with traditional methods became very difficult. For example, the 3D model of the Boeing 777 has roughly 500 million triangles [3]; rendering it at 20 frames per second would require drawing 10 billion triangles per second, a computational load that even the best current graphics hardware can hardly meet. Therefore, while preserving scene realism, the rendering workload must be reduced as much as possible to keep human-computer interaction real-time. Besides mesh simplification and similar methods, visibility determination is an effective approach: although the number of objects to be drawn grows substantially in a large-scale complex scene, the number visible to the observer often does not, and the number of visible entities is usually far smaller than the total input. For example, when the observer is inside a room, only part of the objects in the room, plus whatever can be seen through doors and windows, enters the field of view; when outside the room, only its exterior surface is visible and its interior is hidden. There is thus a gap between scene size and the amount that actually needs to be drawn, and this gap is precisely the basis of visibility research.

The main contents of this paper are: (1) the technical difficulties of the visibility problem; (2) an analysis and summary of the main research results at home and abroad; (3) the characteristics and classification of existing occlusion-culling algorithms; (4) the basic steps of a visibility-determination algorithm and criteria for judging its merits; (5) several key problems that still need focused research. A summary concludes the paper.

2 Difficulties of the Visibility Problem

In practical applications, visibility is not merely a simple occlusion test. The growth of scene size raises issues of computational complexity and stability: not only interactivity but also image quality and stability must be considered, and many factors influence the strategy a visibility test must adopt, so traditional HSR algorithms can hardly meet application requirements. The main difficulties are:

(1) Discontinuous variation in complex scenes. A tiny change in the observer's viewpoint may cause the frame rate to change drastically, producing "popping" or blank frames that degrade the walkthrough experience.

(2) Changes of scene state. The state of the scene also affects the complexity of visibility determination. For dynamic scenes, a potentially visible set (PVS) usually must be precomputed; once an occluder or occludee moves during interaction or walkthrough, the PVS may become invalid because it cannot be updated in real time, finally causing erroneous visibility decisions.

(3) Continuous variation of the viewpoint. Traditional HSR algorithms determine visibility from a point, but to satisfy requirements such as walkthrough frame rate and image quality, visibility must be determined for a volume or region of viewpoints.

(4) Sensitivity to scene size. As scene size grows, the cost of visibility determination grows with it, generally linearly, and often exceeds the machine's real-time computation and rendering capacity, which limits the scene sizes an application can handle. The coupling between computation cost and scene size should therefore be made as weak as possible.

(5) Inexactness of visibility results. To achieve real-time performance, approximate methods must often be adopted, so the results cannot be guaranteed exact and some visible entities are usually missed.
因此,需要对其中产生的误差进行估算,保证误差在一定可接受范围之内.针对上述问题,国内外很多研究者根据自身应用领域的特点,从不同角度提出了相应的一些算法, 下面将对这些方法进行全面介绍与分析.3可见性判断方法解析可见性问题的表现形式可以分为3种(如图1所示):①基于视锥的可见性判断;②实体自身的可见性判断;③实体之间的遮挡关系判断.由于形式①的判断比较简单,因此研究的重点集中在对形式②和③上,在具体实践中,很少将两者分开处理.同时,如果以实体为单位来看,存在3种可见性状态: 完全可见,完全不可见,部分可见部分不可见.早期,由于应用的局限性,研究的目标主要是消除隐藏面,只追求最终结果的准确性,并不很关心交互的实时性.但是,随着场景规模与复杂度的增加以及交互实时性等因素的考虑,近年来出现的一些可见性判断算法的主要目的是想方设法快速剔除被遮挡实体.从这个角度上,可以将可见性判断方法分为两大类:HSR算法和遮挡裁减算法(occlusionculling, OC).前者主要包括传统的一些算法,如画家算法和Z—Buffer算法等;后者主要是针对大场景的实时漫Fig.1Threekindsofvisibility.图1可见性问题的3种表现形式238计算机研究与发展2005,42(2)游加速提出的一些方法,更多地强调一种遮挡关系. 下面将对这些方法进行详细的介绍和分析.3.1HSR算法HSR算法没有特别考虑场景的规模与复杂度,而且对这些因素很敏感,一旦规模与复杂度稍有增加,绘制效率就会急剧下降.20世纪70年代的时候相关算法研究已经成熟,Sutherland等人_2J全面讨论了当时现有的HSR算法,并将其归结为对象空间和图像空间两大类型,典型算法主要有画家算法, Z—Buffer算法和BSP树.画家算法的大致原理是先把屏幕置成背景色,再把物体的各个面按其离视点的远近进行排序,然后以距离视点从远到近的顺序绘制这些多边形,后者的显示取代前者的显示,与画家作画的过程类似. 但是,这种方法绘制速度太慢,并不适合实时绘制. Z—Buffer算法是一种典型的图像空间算法,基本思想非常简单,就是在屏幕空间的像素级上以近物取代远物,每次画多边形时,每个像素依照z缓冲进行检查,如果新像素的z值比缓冲里的Z值更近,则缓冲里相应的像素被更新.该方法实现起来比较灵活,有利于硬件实现,可以处理任何情况下的三维多边形,甚至除多边形以外的几何模型(例如球,弯曲表面等).但是总的来说这是一种慢速的算法,除非采用特殊的Z—Buffering硬件加速设备.利用BSP 树_4j可以将多边形构成的场景形成一个可递归遍历的树,它以二叉树的形式保存了模型的拓扑结果信息,从三维空间中的任意给定点出发递归遍历整个树,得到由后往前或由前往后的多边形集合,然后就可以利用画家算法由后往前画出多边形.如图2所示,图2(a)为二维多边形场景,对其进行分割,可以形成图2(b)所示的BSP树,根据视点所在的位置对BSP树进行遍历,可以形成一个从前到后的序vieWer《B怠PartitioningTreeo12ndFig.2BSPtree[.(a)Partitionsof2Dsceneand(b)CreationandtraversingofBSPtree.图2BSP树原理示意.(a)二维场景分割;(b)BSP树的生成与遍历列,如图2(b)所示.与其他方法相比,BSP树是一种相当复杂的实现方式,并没有提供避免重画的机制, 比较适合描述静态场景.3.2OC算法近年来,针对遮挡裁减,从不同角度提出了很多算法,主要有以下几种.3.2.1方位图方法(aspectgraph)方位图是计算机视觉中的一个非常重要的概念|5j,作为一种数据结构,节点表示对象的稳定视图,节点之间由一条边相连,这条边只能连接两个邻接稳定视图,所谓稳定视图是指不同视点的二维投影视图的拓扑结构,即图像结构图(imagestructure graph,ISG)是同构的.当视点在两个稳定视图做一定的移动,如果可以保证在这两个稳定视图之间不出现第3个稳定视图,那么我们说这两个视图是邻接的,两者过渡的地方就是可见性事件(visibility 
event).这样,通过维护一系列可见性事件,就可以利用时间的接近性实现可见性预测.Plantinga等人|6j详细研究了当视点变化时,多面体投影图像拓扑结构的相应变化.方位图方法试图通过对场景数据的预组织来实现快速计算遮挡关系.图3所示是两条线段对平面区域的划分,可以将其中一条线段当做遮挡物,另一条当做被遮挡物,通过线段顶点之间的相连,可以将空间分为3个区域:①完全可见,如区域1;②完全不可见,如区域3;③部分可见,部分不可见,如区域2.不同的顶点关系形成了不同类型的直线,它们分别是分割线(separatingline)和支撑线(supporting line),对于每个区域,都可以决定模型的可见部分.不难看出,随着边数的增加,方位图的数目会急剧增长,在三维空间中,方位图的最大复杂度可以达到O(),如果边数较多,为了计算方位图所需要花费÷一:≥mTispartiallyoccludedfromregion2 Tisfullyoccludedfromregion3Fig.3Planepartitionsbytwolines[图3两条直线对平面区域的分割o4㈣V2吼o●普建涛等:大规模复杂场景的可见性问题研究239 的时问和空问将是非常大的,导致可行性很小,使得这种方法只适用简单模型.在方位图方法的基础上,Plantinga提出了一种保守可见性预处理l7一.试图降低复杂度,它实际上是一种基于视区域的可见性判断方法,遗憾的是这种方法并没有实现.Shimshoni和Ponce等人l8J提出了有限分辨率方位图(finite-resolutionaspectgraph)方法,试图给出一个对方位图的近似,在一定的近似范围内减少节点.Coorg和Tellerj通过利用空间实体的可见性在时间上的关联关系,对于每个视点充分估计了可见对象,当视点移动时,就设法利用前一个视点的可见性信息,其中需要维护一个方位图数据结构表示可见事件,虽然可以能很好地避免场景突跃效果,但是效率却比较低.3.2.2层次Z缓冲方法(hierarchicalZ—buffer, HZB)传统的z—Buffer算法在扫描转换中充分利用了图像空间中的内在连续性,只需要对一个图像空间的一个图像单元进行计算,在像素级上以近物取代远物,从而高效地实现对整个多边形的可见性计算.但是,对于大规模复杂场景来说,使用这种方法会对场景中的每个多边形在像素层次上进行可见性计算,增加了很多计算时间,效率很低.层次Z—Buffer算法_10_的提出就是针对这种问题的改进,使用了两种数据结构:八叉树和z—pyramid,如图4所示,将场景中的实体以八叉树结构组织,利用场景空间的相关性来判断每一八叉树结点的可见性,如果八叉树结点所对应的实体对于当前的缓冲器是不可见的,那么该结点所表示立方Fig.4Octree&amp;Z—pyramid图4AY-树与Z—pyramid体内任何实体都是不可见的;否则,以同样方式考察其子结点.为提高可见性判断效率,还利用了图像空间的相关性,以Z—pyramid的形式表示,即将屏幕像素的可见点按其深度值组织成四叉树结构,后一级的分辨率是上一级的四分之一,其中每一像素所存储的值为该像素所对应屏幕空间可见结点的最大值.这样,利用层次算法就可实现景物八叉树结点和当前屏幕像素可见点集的快速区域重叠测试和深度比较,从而快速地判定结点和面片的可见性.此外,层次Z—Buffer算法利用了时间上的相关性加速可见性计算.虽然层次Z—Buffer算法能高效地剔除不可见实体,当前的图形系统在硬件上并不支持Z—pyramid,要用软件来模拟需要资源消耗相对较高, 还无法做到实时绘制.3.2.3层次遮挡图方法(hierarchicalocclusionmap,HoM)针对深度复杂性较高的场景,Zhang等人l1一提出了层次遮挡图算法.使用了两个层次概念,即对象空间包围体层次和图像空间遮挡图层次,并且将两者结合起来,其中图像空间遮挡图是整个算法的核心,表示遮挡物在图像平面的累积投影.对于每一帧,都从模型中选择一小组实体作为遮挡物,并对它们进行绘制形成最初的遮挡图,基于该图就可以保守地裁减掉对于当前视点不可见的部分.该算法实际上是将可见性问题分解为两个子问题(如图5 
Every entity inside a rejected octree node's cube is invisible; otherwise its children are examined in the same way. To raise the efficiency of the visibility test, image-space coherence is also exploited, in the form of the z-pyramid: the visible points of the screen pixels are organized by depth into a quadtree in which each level has one quarter the resolution of the level below, and each pixel stores the maximum depth of the visible points in the screen region it covers. With this hierarchy, fast region-overlap tests and depth comparisons between octree nodes and the currently visible pixel set can be performed, so the visibility of nodes and faces is decided quickly. The hierarchical Z-Buffer additionally exploits temporal coherence to accelerate the computation. Although it culls invisible entities effectively, current graphics systems do not support the z-pyramid in hardware, and a software emulation is too resource-hungry for real-time rendering.

3.2.3 Hierarchical occlusion maps (HOM)

For scenes of high depth complexity, Zhang et al. [11] proposed the hierarchical occlusion map algorithm. It uses two hierarchies, an object-space bounding-volume hierarchy and an image-space occlusion-map hierarchy, and combines them; the image-space occlusion map, the core of the algorithm, represents the accumulated projection of the occluders onto the image plane. For every frame, a small set of entities is selected from the model as occluders and rendered to form the initial occlusion map, on the basis of which the parts invisible from the current viewpoint can be culled conservatively. As Fig. 5 shows, the algorithm in effect decomposes the visibility problem into two subproblems: (1) an overlap test in 2D image space, which decides whether the tested object falls entirely inside the union of the occluders' projections, using the occlusion map; and (2) a depth test, which decides whether the tested object lies behind the occluders, using a depth-estimation buffer.

Fig. 5. Sketch of the HOM algorithm.

The HOM hierarchy is an image pyramid (Fig. 6). Each image in the hierarchy consists of the pixels of the corresponding block of screen space, each pixel recording the opacity of its block, and each level is the 2x2 average of the level below. To create the HOM, some occluders are selected from the occluder set and rendered into the frame buffer. The process needs only opacity, not texture, lighting, or depth: scene entities are drawn white on a black background, and the HOM is built by reading the frame buffer back level by level.

Fig. 6. The HOM image pyramid.

The hallmark of HOM is that it converts the visibility test in 3D scene space into an overlap test in 2D image space. It suits all models, with no restriction on shape, size, or type, exploits current graphics hardware, and supports approximate occlusion culling. The initial choice of occluders is critical: if the occlusion map formed by the selected occluders has low opacity, effective culling is hard to achieve.

3.2.4 Space partitioning (or subdivision)

The most intuitive space-partitioning strategy is the cell-portal method. The idea is to divide the scene into a set of cells connected through portals, reducing the visibility problem to computing visibility within and between cells. The method was first proposed for the interior structure of buildings: each room is a cell, and each visible connection to its exterior is a portal, so the whole building becomes many cells linked by portals. An observer inside a cell can see only the interior of that cell and whatever lies beyond its portals, and need not consider the other scene entities, which greatly reduces the computational and rendering load. Unlike general visibility methods, which start from the whole scene and progressively remove invisible objects, cell-portal methods start from an empty PVS and progressively add visible objects through the portals; the two are opposite processes.

For architectural models, Teller et al. [12] proposed a preprocessing method that divides the model into rectangular cells (Fig. 7) bounded by opaque planes; portals (the grey segments in Fig. 7(a)) are found on the boundaries and used to link the cells into an adjacency graph, on which cell-to-cell visibility is computed. The preprocessing runs offline and uses a BSP tree to sort the scene entities back to front with respect to the viewpoint, so as to reduce the computation needed at run time. They also considered frame-to-frame coherence, but gave no detailed, practical method. Jimenez et al. [13] took a similar approach, classifying visibility information into three kinds: cell-to-cell, cell-to-region, and cell-to-object; compared with Teller's method, theirs encodes the sequence of visibility changes with the Visibility Skeleton and incorporates it into the adjacency-graph structure. Luebke and Georges proposed a similar PVS-based algorithm, which performs well only when the building interior exhibits high static occlusion; otherwise the computational complexity grows sharply.

Fig. 7. The space-partitioning method. (a) Partitioning an architectural model; (b) the adjacency graph.
Yagel and Ray [15] proposed another algorithm based on a regular subdivision of space; although it is less sensitive to the structure of the model, it is less efficient than Teller's method. To reduce rendering complexity further and improve interactivity, Aliaga et al. explored, in theory, replacing the scene seen through a portal with a texture, analysing the various cases, but presented no experimental results. The cell-portal method suits reasonably regular architectural models, not outdoor scenes.

For outdoor scenes, Cohen et al. [17] proposed partitioning the view space into cells, each holding a superset of the visible objects; for a given cell and each object, occluders are sought that guarantee the object is invisible from every point of the cell. For simplicity, the algorithm models buildings as boxes. Plantinga [7] gave a similar method, applicable only when a small set of occluders accounts for most of the occlusion. Bittner [18] proposed a line-space partitioning method for visibility in 2.5D urban environments. Although a city scene may contain entities of arbitrary shape, the occluders are essentially vertical planes attached to the ground, i.e. building facades. The algorithm organizes the scene in a spatial hierarchy and, processing the occluders in approximately front-to-back order, incrementally builds a hierarchy in line space that represents the currently visible part of the scene. Two hierarchies are used: (1) the line-space partition is based on a BSP tree, which determines an approximate front-to-back order of the occluders; (2) the scene is organized as a kD-tree, to keep the method insensitive to scene complexity. The method works mainly on the 2D projection of the scene and considers entity heights only when necessary.

3.2.5 Shadow frusta

Efficient, robust shadow computation remains a challenging problem in computer graphics, and it is conceptually close to the visibility problem: both deal with the relative placement of scene objects, and shadows likewise increase the realism of a virtual scene. Applying shadow algorithms to visibility is therefore natural, but it has received comparatively little study. Hudson et al. proposed occlusion culling with shadow frusta: given a viewpoint, a set of occluders is selected dynamically, and for each occluder a shadow frustum is built from the viewpoint and the occluder's silhouette; every entity inside the frustum is culled (Fig. 8). The algorithm first partitions the scene into a hierarchy, then traverses the hierarchy and uses an interference-detection algorithm to cull the entities invisible from the viewpoint; its performance depends on the depth of the model hierarchy and the position of the viewpoint. Occluder selection is done both online and offline. Hudson also notes that in the worst case the culling cost is linear in the scene size, with a small constant.

Fig. 8. Culling with a shadow frustum.

In addition, Drettakis and Fiume tried to handle 3D visibility by computing shadow boundaries, but computational cost and robustness problems make the approach hard to apply to large scenes.

3.2.6 Occlusion trees

Computing the visibility of every object in a complex scene necessarily consumes much time and space. To decide visibility quickly, and to make the culling process insensitive to scene size and complexity, Bittner et al. exploited the spatial relations among scene entities and proposed the occlusion tree and a further spatial data structure, the SVBSP tree (shadow-volume BSP tree), shown in Fig. 9. The scene is organized as a kD-tree, so the occlusion volumes of the current occluders can be merged efficiently, and a subset of the occluded objects can be determined quickly and accurately for a given viewpoint without computing exactly which objects are visible. A spatial hierarchy represents the topology of the scene, and the occlusion tree is built from a given viewpoint and a set of occluders, the emphasis being on computing a superset of the objects visible from the viewpoint. The method applies only to static scenes, and building the SVBSP tree takes some time, which is compensated by the fast queries that follow.
Fig. 9. Occlusion relations in the 2D case and the corresponding SVBSP tree.

The SVBSP tree is essentially an extension of the BSP tree: each internal node is associated with a shadow plane formed by the light source and an occluder edge; a leaf labelled "in" lies inside the shadow volume, and a leaf labelled "out" is visible from the light source.

3.2.7 Occluder fusion

In many cases an object is occluded not by a single occluder but by several occluders acting together, so a good occlusion-culling algorithm should fuse occluders; otherwise the scene must be sorted front to back from the viewpoint and the occlusion relations checked occluder by occluder in that order. Many methods decompose the occluders into convex polygons and design their visibility tests around these. Schaufler et al. instead first discretize the scene space (Fig. 10), representing occlusion in discrete form: opaque objects occlude what lies behind them as volumes, which lets an occluder be extended into spatially adjacent opaque regions. The discretization is stored as a quadtree in 2D and an octree in 3D; the occluded blocks are then extended, chiefly to discover larger occluded regions and thereby achieve occluder fusion.

Fig. 10. Discretization of the scene space.

Wonka et al. also considered occluder fusion in their work on real-time urban walkthroughs (Fig. 11): for multiple occluders, if a scene entity is occluded by the shrunken occlusion frustum of every sample point in the view region, the entity is occluded throughout the view region, and occluder fusion can be performed.

Fig. 11. Extending an occluded region.

3.2.8 Other methods

Besides the above, El-Sana et al. [26] proposed combining occlusion culling with view-dependent rendering: rendering the invisible objects that must still be drawn at a high level of detail wastes resources, while view-dependent rendering can change the level of detail of geometric surfaces in real time, so combining the two improves rendering efficiency. For terrain visibility, Floriani et al. split the problem into computing two visibility structures, with continuous and discrete encodings, and considered applying the method to multiresolution terrain models. Also for terrain, Stewart described a hierarchical visibility-culling algorithm that accelerates terrain rendering by avoiding the drawing of large amounts of invisible terrain data, again in a multiresolution setting. Wonka et al. [29] gave an online occlusion-culling algorithm that computes scene-space visibility in real time, producing a visibility set that serves several frames; because a dedicated server performs the visibility computation, synchronization between computation and rendering must also be handled. Finally, judging from the published literature, domestic research on occlusion culling is limited; in his doctoral thesis, Hua Wei of Zhejiang University proposed the global occlusion map (GOM) and an adaptive algorithm for computing it, which samples scene visibility in image space while judging directional occlusion at object-space precision, and discussed GOM-based occlusion culling in detail.

4 Characteristics and classification of OC algorithms

Clearly, the many OC algorithms that have appeared differ in emphasis, starting point, and implementation, but all aim to exploit every available factor to determine the occlusion relations in the scene quickly and to avoid rendering invisible objects. Broadly, occlusion-culling algorithms can be analysed and classified along the following lines.

(1) Object space versus image space. The basic starting point of any occlusion-culling algorithm is to exploit every available factor to determine the occlusion relations in the scene quickly and avoid rendering invisible objects.

Usage of nx.random_geometric_graph

1. Overview

nx.random_geometric_graph is a NetworkX function for generating random graphs. It connects nodes according to the geometric distance between points in a space, and is therefore also called a geometric random-graph generator.

The random graphs it generates are often used for testing network algorithms and running experiments.

2. Function definition

```python
nx.random_geometric_graph(n, radius, dim=2, pos=None, p=2, periodic=False, seed=None)
```

Parameters:
- n: the number of nodes in the graph.
- radius: the maximum distance at which two nodes are connected by an edge.

- dim: the dimension of the space; e.g. dim=2 places the points in a two-dimensional plane.

- pos: the positions of the nodes, as a dict keyed by node; generated at random by default.

- p: the exponent of the Minkowski metric used to measure the distance between nodes (p=2 gives the Euclidean distance, p=1 the Manhattan distance, and p → ∞ the Chebyshev distance).

- periodic: whether to use periodic (toroidal) boundary conditions.

- seed: a seed for the random-number generator.

Return value: the generated random graph.

3. Basic usage

Usage 1: the simplest call. Specify the number of nodes and the radius (and, if desired, the dimension) to obtain the random graph.

```python
import networkx as nx

n = 50
radius = 0.2
G = nx.random_geometric_graph(n, radius)
```

Usage 2: with the pos parameter. pos specifies where each node sits in the two-dimensional plane.

The coordinates can be generated with, e.g., numpy and passed in as a dict keyed by node. In that case the dimension need not be given, but the radius is still required: it is a positional argument and determines which pairs of nodes are joined.

An example:

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

n = 50
pos = {i: np.random.rand(2) for i in range(n)}  # pos maps node -> coordinates
G = nx.random_geometric_graph(n, 0.2, pos=pos)
nx.draw(G, pos, node_size=20, node_color='blue')
plt.show()
```

The nodes are scattered at random inside the unit square, and exactly those pairs whose distance is at most the radius are joined.
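The construction itself is simple. A plain-Python sketch of what nx.random_geometric_graph computes (uniform points in the unit square, an edge between every pair within the radius), useful for seeing the mechanics without NetworkX installed:

```python
import math
import random

def random_geometric_sketch(n, radius, seed=None):
    """Toy re-implementation of the geometric random-graph construction:
    drop n points uniformly in the unit square and connect every pair
    whose Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n)}
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if math.dist(pos[u], pos[v]) <= radius]
    return pos, edges

pos, edges = random_geometric_sketch(50, 0.2, seed=1)
# every edge respects the radius constraint
assert all(math.dist(pos[u], pos[v]) <= 0.2 for u, v in edges)
```

Note the O(n^2) pair loop; the real NetworkX implementation can use a spatial index to find neighbours faster.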

PyTorch Geometric data structures

What is PyTorch Geometric? PyTorch Geometric (PyG) is a graph neural network (GNN) library built on PyTorch.

It provides rich functionality and tooling for modelling and processing graph data, letting researchers and developers build, train, and evaluate graph neural network models more conveniently.

PyG supplies a set of efficient data structures and algorithms for handling and manipulating graph-structured data.

At its core is one important data structure, Data, which represents a single graph.

A graph consists of nodes and edges; each node can be described by a feature vector, and each edge can carry attributes such as a weight.

A Data object stores the graph in PyTorch tensors: the node and edge features, the connectivity, and any associated labels.

The following answers some concrete questions about PyTorch Geometric's data structures, step by step.

1. How do I create a graph (Data) object? In PyG, graph-structured data is stored as a Data object.

To create a new Data object, the following code fragment can be used:

```python
import torch
from torch_geometric.data import Data

# node features: three nodes, one scalar feature each
x = torch.tensor([[1], [2], [3]], dtype=torch.float)

# edge indices (COO format) and per-edge attributes
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
edge_attr = torch.tensor([[1], [2], [3], [4]], dtype=torch.float)

# create the Data object
graph = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
```

In the code above we first define the node-feature matrix x, a one-column float tensor.

Then we define the edge index edge_index and the edge attributes edge_attr, the tensors that describe the edge structure and edge features: column k of edge_index holds the source and target node of edge k.
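The COO convention used by edge_index is easy to work with directly. A plain-Python sketch (no PyTorch required) of turning the edge index above into an adjacency list:

```python
# edge_index in COO form: row 0 holds the source of each edge,
# row 1 the corresponding target. This mirrors PyG's layout with
# plain lists instead of tensors.
edge_index = [[0, 1, 1, 2],
              [1, 0, 2, 1]]

def to_adjacency(edge_index, num_nodes):
    """Build an adjacency list: node -> list of neighbours it points to."""
    adj = {i: [] for i in range(num_nodes)}
    for src, dst in zip(*edge_index):
        adj[src].append(dst)
    return adj

adj = to_adjacency(edge_index, 3)
assert adj == {0: [1], 1: [0, 2], 2: [1]}
```

Because an undirected edge is stored as two directed entries ((0,1) and (1,0) here), the adjacency list comes out symmetric.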

Reflections on a picture book about parallel and perpendicular lines

After reading the book about parallel lines and perpendicular lines, I found it quite interesting and informative. It gave a clear explanation of these concepts and their applications in geometry.

Parallel lines are two or more lines that never intersect, no matter how far they are extended. They have the same slope and will never meet. This concept is widely used in fields such as architecture and engineering. For example, when constructing a bridge, engineers need to ensure that the supporting beams are parallel to each other to provide stability and strength.

Perpendicular lines, on the other hand, are two lines that intersect at a right angle, forming a 90-degree angle; their slopes are negative reciprocals of each other. Perpendicular lines appear everywhere in daily life, such as the corners of a room or the intersection of two roads. In mathematics, perpendicular lines are used to calculate distances and angles.

Understanding parallel and perpendicular lines is crucial in geometry, because it helps us analyze and solve many geometric problems. For instance, when two parallel lines are cut by a transversal, the properties of parallel lines give the measures of the corresponding angles, the alternate interior angles, and the alternate exterior angles.

The book also discussed the perpendicular bisector: a line that divides a line segment into two equal parts and intersects it at a right angle. This concept is often used in construction and design; for example, architects use perpendicular bisectors to ensure that walls and partitions are evenly spaced and balanced.

Overall, the book provided a comprehensive understanding of parallel lines, perpendicular lines, and their applications, and the examples and explanations made the concepts easy to grasp.
I now have a better understanding of how these lines work and their significance in various fields.
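The two slope criteria described above translate directly into code; a small sketch:

```python
# Slope tests from the text: parallel lines share a slope; perpendicular
# lines have slopes that are negative reciprocals (their product is -1).
def are_parallel(m1, m2, eps=1e-9):
    return abs(m1 - m2) < eps

def are_perpendicular(m1, m2, eps=1e-9):
    return abs(m1 * m2 + 1) < eps

assert are_parallel(2.0, 2.0)
assert are_perpendicular(2.0, -0.5)      # 2 * (-0.5) = -1
assert not are_perpendicular(2.0, 0.5)
```

A tolerance is used instead of exact equality because slopes computed from coordinates are floating-point values; note also that a vertical line has no finite slope and would need a separate case.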

Applications of folding in everyday life

Folding has various applications in our daily lives. One common application is in fashion. Many clothing items, such as shirts, pants, and dresses, are designed with folding techniques to create unique and stylish looks. For example, a pleated skirt is created by folding the fabric in a specific pattern, which adds texture and movement to the garment. Another example is the origami-inspired folding used in designing handbags or wallets, which enhances aesthetics while providing functionality and convenience.

Folding is also widely used in packaging and transportation. Foldable boxes and containers are commonly used for shipping and storage: they can be folded and unfolded easily, allowing efficient use of space during transport. Collapsible crates, for instance, are used in the logistics industry to move goods in a compact and organized manner. Folding also plays a crucial role in the packaging of products such as food and electronics, where it protects the items and optimizes the use of packaging materials.

Furthermore, folding is an essential technique in architecture and interior design, where it can create innovative and dynamic structures. Folding techniques are often employed in the construction of folding doors or partitions, which can be opened or closed to create flexible spaces. Folding also appears in furniture design, such as foldable chairs or tables, which are convenient for storage and transportation.

Beyond these practical applications, folding has found its way into recreational activities. Origami, the art of paper folding, is a popular hobby enjoyed by many people around the world. It involves folding a single sheet of paper into shapes and figures such as animals, flowers, and geometric patterns.
Origami is not only a form of artistic expression but also a way to relax and engage in mindful activity.

Overall, folding has numerous applications in our daily lives, ranging from fashion and packaging to architecture and recreation. Its versatility and practicality make it an essential technique that enhances both functionality and aesthetics.

École Normale Supérieure

Svatopluk Poljak 1
We study the convex set L_n defined by

    L_n := { X | X = (x_ij) a positive semidefinite n x n matrix, x_ii = 1 for all i }.

We describe several geometric properties of L_n. In particular, we show that L_n has 2^(n-1) vertices, which are its rank-one matrices, corresponding to all bipartitions of the set {1, 2, ..., n}. Our main motivation for investigating the convex set L_n comes from combinatorial optimization, namely from approximating the max-cut problem. An important property of L_n is that, due to the positive semidefinite constraints, one can optimize over it in polynomial time. On the other hand, L_n still inherits the difficult structure of the underlying combinatorial problem. In particular, it is NP-hard to decide whether the optimum of the problem min Tr(C X), X in L_n, is reached at a vertex. This result follows from the complete characterization of the matrices C of the form C = bb^T, for some vector b, for which the optimum of the above program is reached at a vertex.
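The rank-one vertices are easy to verify computationally. A plain-Python sketch checking that a "cut matrix" X = x x^T with x in {-1, +1}^n lies in L_n: its diagonal entries are 1, and it is positive semidefinite because v^T X v = (x . v)^2 >= 0 for every v:

```python
def cut_matrix(x):
    """Rank-one matrix X = x x^T for a +/-1 vector x (one bipartition)."""
    return [[xi * xj for xj in x] for xi in x]

def quad_form(X, v):
    """The quadratic form v^T X v."""
    n = len(v)
    return sum(X[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

x = [1, -1, -1, 1]          # bipartition {1, 4} vs {2, 3}
X = cut_matrix(x)

# unit diagonal: X satisfies the x_ii = 1 constraints of L_n
assert all(X[i][i] == 1 for i in range(len(x)))

# PSD via the rank-one identity: v^T X v equals (x . v)^2, hence >= 0
v = [0.3, -1.2, 2.0, 0.5]
dot = sum(xi * vi for xi, vi in zip(x, v))
assert abs(quad_form(X, v) - dot * dot) < 1e-9
```

Replacing x by -x gives the same matrix X, which is why the 2^n sign vectors yield only 2^(n-1) distinct vertices.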

Automatic Partitioning for Improved Placement and Routing in Complex Programmable Logic Devices

Valavan Manohararajah, Terry Borer, Stephen D. Brown, and Zvonko Vranesic
Altera Corporation, Toronto Technology Center, 151 Bloor St. West, Suite 200, Toronto, ON, CANADA M4Y 1R5
vmanohar@, tborer@, sbrown@, zvonko@

Abstract. This work explores the effect of adding a new partitioning step into the traditional complex programmable logic device (CPLD) CAD flow. A novel algorithm based on Rent's rule and simulated annealing partitions a design before it enters the place and route stage in CPLD CAD. The resulting partitions are then placed using an enhanced placement tool. Experiments conducted on Altera's APEX20K chips indicate that a partitioned placement can provide an average performance gain of 7% over flat placements.

1 Introduction

An incremental design methodology called LogicLock [1] was introduced in version 1.1 of Altera's Quartus II software. LogicLock offers users an alternative to the traditional "flat" place and route steps in CPLD CAD. In traditional CPLD CAD, the design being compiled is represented as one large netlist as it enters the place and route steps. This allows a global view that is beneficial for some circuits. However, for many large circuits it is difficult for a placement tool to ensure that tightly connected components in the design are not separated by large distances. With the LogicLock feature, users can create partitions (referred to as regions in Quartus II terminology) of logic which are kept together during the place and route stages. A partition may contain both logic and any smaller child partitions. Feedback from LogicLock users has indicated that the ability to keep components together during placement and routing helped increase the overall speed of several designs. This work considers a natural extension to the LogicLock feature. When the user has not specified a set of partitions, a partitioner within Quartus II is run to determine a set of good partitions for the design. To operate
without any user assistance, the partitioner automatically determines both the number of partitions as well as the partitioning itself. The partitioning procedure may not be beneficial for all circuits; therefore we also consider the problem of determining when the automatically created partitions are to be used.

2 Previous Work

An excellent survey of previous work in netlist partitioning was presented by Alpert and Kahng [4]. In their work, partitioning approaches are classified into four categories: move-based approaches, geometric representations, combinatorial formulations and clustering approaches. Our work uses a move-based approach with some elements of the clustering approach. During a single iteration of the proposed algorithm, a move-based approach is used within a simulated annealing framework to generate a set of partitions. Bigger partitions are created out of smaller partitions discovered during earlier passes of the algorithm, in a process that is similar to many bottom-up clustering approaches [7][8].

Roy and Sechen [5] explored the use of a simulated annealing approach within a timing-driven FPGA partitioner. In their work, the netlist is clustered to create a smaller problem for the partitioning tool. In another simulated annealing approach, described by Sun and Sechen [6], the circuit to be placed is clustered in a preprocessing step and the clustering information is used during placement.

Our work uses a cost function based on Rent's rule [16]. A clustering algorithm based on Rent's rule was previously described by Ng et al. [9]. Hagen et al. [10] described a spectra-based ratio cut partitioning scheme that creates partitions with the lowest observed Rent's parameter. Recently, Rent's rule has found several uses in interconnection prediction and congestion estimation during placement [11][12][13][14].

3 The APEX Architecture

In this work, Altera's APEX chips were the target of the partitioning experiments. A simplified internal view of an APEX chip is presented in Figure 1. At the highest level, the chip is organized into four quadrants. Each quadrant contains an array of MEGALABs arranged into a set of columns and rows. A MEGALAB contains a number of LABs and a single ESB. Each LAB contains a set of LEs. An LE is the basic logic element in the APEX architecture. It consists of a four-input LUT (4-LUT), a programmable register, and carry and cascade logic. The ESB can be used to realize memory functions in the design. For a detailed description of the internal structure and interconnect arrangement in the APEX chips see [2].

4 The CAD Flow

The CAD flow used in our work is illustrated in Figure 2. Once technology mapping is complete, an optional partitioning step is executed to discover a set of partitions for the design. Following partitioning, a partition aware clustering
At the highest level,the chip is organized into four quadrants.Each quadrant contains an array of MEGALABs arranged into set of columns and rows.A MEGALAB contains a number of LABs and a single ESB.Each LAB contains a set of LEs.An LE is the basic logic element in the APEX architecture.It con-sists of a four-input LUT(4-LUT),a programmable register,carry and cascade logic.The ESB can be used to realize memory functions in the design.For a detailed description of the internal structure and interconnect arrangement in the APEX chips see[2].4The CADflowThe CADflow used in our work is illustrated in Figure2.Once technology mapping is complete an optional partitioning step is executed to discover a set of partitions for the design.Following partitioning,a partition aware clusteringl a b l a b e s b e s b l a b l a b l a b l a b e s b e s b l a b l a b l a b l a b e s b e s b l a b l a bl a b l a b e s b e s b l a b l a bl a b l a b e s b e s b l a b l a b l a b l a b e s b e s b l a b l a b l a b l a b e s b e s b l a b l a bl a b l a b e s b e s b l a b l a bFig.1.The APEX architecture.step is performed.The partition aware clusterer decomposes the logic within a partition into LABs.It also ensures that logic from one partition is not mixed with logic from another partition.The partition aware placement step is much more complicated than the normal placement step found in the traditional flow.First,each partition in the design is assigned a rectangular shape large enough to hold its contents.After shape assignment,the new placement step has to determine a position for each of the rectangular shapes in the design as well a position for each of its contents.Adetailed description of the placement tool is beyond the scope of this paper.A description can be found in [3].The final step,routing,requires no changes and is identical to the one used in an unpartitioned CAD flow.Fig.2.A CAD flow that incorporates a partitioning step.5PartitioningThe input to the partitioning step 
is a netlist that consists of memory elements, LEs and IOs. The partitioning step produces a set of partitions for the netlist as output. A partition may contain both LEs and smaller partitions. Memory elements and IOs do not participate in the partitioning process because their placement is quite restricted. Both memory elements and IOs can only be placed at select locations on the chip, whereas the possibilities for the placement of LEs are numerous. The placement tool is left the task of discovering the best spots for both memory elements and IOs.

5.1 Overview

An overview of the partitioning algorithm is presented in Figure 3. First, a timing analysis routine is executed to determine a weight for each wire in the design. Interconnect delays are required in order to perform a timing analysis, but they only become available after placement and routing have completed. An experiment was performed with 25 small to medium sized industrial circuits to determine if a net's delay could be related to its fanout. Each circuit was placed and the delay of the fastest path (using general interconnect) between each source-sink pair was obtained. This data allows for some rough estimation of wire delays. A rough estimate for the delay (in picoseconds) of a source-sink connection on a net with fanout of f is given by (1141 log f + 733). Now that a model for interconnect delay is available, timing analysis can be performed. Timing analysis determines a slack s_uv [15] for each source-sink pair (u, v). We define the criticality, c_uv, of each source-sink pair (u, v) as

    c_uv = 1 - s_uv / max_{ij} s_ij        (1)

The criticality provides an indication of the relative importance of each source-sink connection and is used as the weight when the cost function is computed. The main partitioning loop consists of block formation and partitioning steps.
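Equation (1) maps small slacks to criticalities near 1 and the largest slack to 0. A sketch with made-up slack values:

```python
# Sketch of Eq. (1): criticality of each source-sink pair from its slack.
# The slack values below are illustrative, not from the paper.
slacks = {("a", "b"): 2.0, ("a", "c"): 8.0, ("b", "c"): 4.0}

max_slack = max(slacks.values())
criticality = {pair: 1 - s / max_slack for pair, s in slacks.items()}

assert criticality[("a", "c")] == 0.0   # largest slack: least critical
assert criticality[("a", "b")] == 0.75  # smallest slack: most critical
```

These weights then replace the uniform connection counts in the weighted form of Rent's rule described in Section 5.2.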
Block formation behaves differently depending on the value of h. During the first call to block formation (h = 0), blocks will be created out of the LEs in the design. Most blocks will contain a single LE. However, if there is logic in the design that uses the specialized carry or cascade chain routing, then all LEs that are a part of the carry or cascade chain will become a part of a single block. During the subsequent calls to block formation (h > 0), blocks are created out of the partitions discovered during the previous pass. Each partition discovered during the previous call to Partition is transformed into a block. In this manner, larger partitions can be built out of smaller ones. The partitioning step, Partition, uses a simulated annealing based optimization algorithm to divide the blocks into partitions. The main loop terminates whenever new partitions are not being discovered or when the number of levels in the hierarchy discovered so far exceeds the prespecified limit, hlimit. In the experiments, hlimit was set to 4.

    1: TimingAnalysis()
    2: h <- 0
    3: do
    4:     BlockFormation(h)
    5:     Partition()
    6:     h <- h + 1
    7: while NewPartitionsFormed() = true and h < hlimit

Fig. 3. Overview of the partitioning process.

5.2 Cost Function

Rent's Rule. The partitioning cost function is based on Rent's rule [16]. The rule relates the number of blocks in a partition, B, to the number of external connections, P, emanating from the partition. Rent's rule is given by

    P = T_b B^r        (2)

where T_b denotes the average number of interconnections for a block in the partition and r is Rent's exponent. Rent's exponent, r, is a value in the range [0, 1]. A value close to 1 indicates that most of the connections in the partition are external and a value near 0 indicates that almost all connections are internal.

Weighted Rent's Rule. Rent's rule as stated in Equation 2 treats all connections equally. However, to account for the fact that some connections may be more critical than the others, each connection is weighted using the criticality value obtained through timing
analysis (Section 5.1). The terms in Equation 2 were revised slightly to account for weighted connections. The number of external connections, P, now denotes the total weight of all external connections. Similarly, the average number of interconnections for a block in the partition, T_b, now denotes the average weight of interconnections for a block in the partition.

Cost of a Partitioning Solution. A solution to the partitioning problem consists of a set of partitions, each containing a number of blocks. The cost of the partitioning is given by

    C = (sum_i B_i r_i) / (sum_i B_i)        (3)

where B_i is the number of blocks in region i and r_i is Rent's exponent for region i. Like Rent's exponent, the cost, C, is also a value in the range [0, 1]. A value closer to 0 is indicative of a good partitioning and a value closer to 1 is indicative of a bad partitioning.

Effect of Partition Size. Consider two extreme solutions to the partitioning problem. In one case, all blocks may be placed in a single partition. In the other case, every partition contains a single block. Neither case can improve the quality of the placement that is obtained. A lower and upper limit on partition size was established to prevent the creation of partitions that are either too small or too large. If some solution to the partitioning problem has partitions that are smaller than the lower limit or bigger than the upper limit, the cost, C, is pulled towards 1.

5.3 Optimization

The main partitioning procedure, referred to as Partition in Figure 3, starts with a random partitioning solution and iteratively improves it using simulated annealing [17]. The starting temperature for the anneal is obtained using a method similar to that described in [18][19]. A set of m moves is randomly generated. Each move is then evaluated and the change in score is observed. The initial temperature is computed to be 20 times the standard deviation of the set of scoring changes. At each temperature in the anneal, m moves are generated and evaluated.
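Equations (2) and (3) can be sketched in a few lines, reading Eq. (3) as the size-weighted average of the per-partition Rent exponents (the reading consistent with C lying in [0, 1]):

```python
import math

def rent_exponent(P, T_b, B):
    """Solve Rent's rule P = T_b * B**r for r (Eq. 2); assumes B > 1."""
    return math.log(P / T_b) / math.log(B)

def partition_cost(partitions):
    """Eq. 3: size-weighted average of per-partition Rent exponents.
    partitions: list of (B_i, r_i) pairs."""
    total_blocks = sum(B for B, _ in partitions)
    return sum(B * r for B, r in partitions) / total_blocks

# a partition of 8 blocks with T_b = 4 and 32 external connections has r = 1
r = rent_exponent(P=32, T_b=4, B=8)
assert abs(r - 1.0) < 1e-9

# two partitions, 10 and 30 blocks, exponents 0.2 and 0.6 -> cost 0.5
assert abs(partition_cost([(10, 0.2), (30, 0.6)]) - 0.5) < 1e-9
```

In the weighted variant, P and T_b would be criticality-weighted sums rather than raw connection counts; the formulas are unchanged.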
The value of m is equal to the total number of external connections of the blocks participating in the partitioning process. During the first pass of partitioning, most blocks will contain a single LE. A four-input LUT is the primary component within an LE; therefore, on average, each block is expected to have four input connections and one output connection. If n is the number of LEs in the circuit, then m is approximately equal to 5n during the first pass of partitioning. During the subsequent passes, blocks are formed from the partitions discovered during the preceding pass. A block that is formed in this manner will contain highly "localized" connections. The value of m will be greatly reduced because very few connections will be external to the newly formed blocks.

The move generation routine used by the simulated annealer generates two types of moves: directed and random. To generate a directed move, a block with at least one connection crossing a partition boundary is picked at random. The directed move that is generated moves the block into a partition containing one of its endpoints. A random move may be further classified into two types: empty and non-empty. An empty random move picks a block at random and creates a new partition to contain the block. A non-empty random move picks a block at random and moves it into a randomly selected partition that already contains a number of blocks. The move generation routine generates directed moves with a probability of 0.75 and random moves with a probability of 0.25.
During random move generation, empty moves are generated with a probability of 0.25 and non-empty moves are generated with a probability of 0.75.

Once all moves at a particular temperature have been generated and evaluated, the temperature is reduced for the next iteration in the anneal. The new temperature, t_new, is given by

    t_new = t_old * gamma * (beta + (1 - beta) e^(-(alpha - 0.45)^2 / 4))        (4)

Here, t_old is the current temperature and alpha is the accept ratio observed during the iteration. The accept ratio is defined to be the ratio of the number of moves accepted to the total number of moves tried. The multiplier, gamma, controls the basic rate at which the temperature decreases. The fraction, beta, controls whether the full multiplier or a scaled down version of the multiplier is used in determining the next temperature value. A Gaussian function which depends on the accept ratio, alpha, generates the scale factor used to reduce the multiplier. The function reaches its maximum value of 1 when the accept ratio is 0.45. Previous research has indicated the benefit of keeping the temperature hovering around an accept ratio of 0.45 [20]. Note that the full multiplier is used when the accept ratio is close to 0.45 or when beta is close to 1. A value for beta is computed by observing the improvement in score obtained during the pass that just completed. If there was a large improvement in score and if the accept ratio is low enough to ensure that the improvement cannot be attributed to random behavior, beta is given a value close to 1. This allows the search to spend more time at those temperatures that seem to produce large improvements in the score.

Simulated annealing stops when the exit criterion is met. The exit criterion used here is similar to that described in [18]. The exit criterion is true if the current temperature is lower than the exit temperature, t_exit. A value for t_exit is computed based on the current cost, C, and the total number of external connections of all blocks in the design, n_ext:

    t_exit = epsilon * C / n_ext        (5)

Here, epsilon is a small constant which was set
to 0.05 during the experiments. Any move performed during the anneal is likely to affect several external connections. If the temperature drops below a fraction of the average cost of an external connection, it is unlikely that any move which increases the cost will be accepted, and the annealing process can be terminated.

6 Experimental Results

In the first experiment, the performance of the partitioned flow was compared to the flat flow on 20 industrial circuits (Table 1). The partitioned flow produces an average speedup of 6.81% over the flat flow. Note that the partitioning algorithm produces a wide variety of partitions: circuit "ccta16" was partitioned into 6 partitions while "ccta14" was partitioned into 118 partitions.

In the second experiment, the performance of the automatic partitioning algorithm was compared to the performance of user created partitions (Table 2). The user partitions were created by Altera's field application engineers and design engineers working on the LogicLock feature. The user partitioned flow obtained a speedup of 9.3% over the flat flow whereas the automatically partitioned flow obtained a speedup of 7.82%. Apart from the results observed for circuits "cctb2" and "cctb3", the automatic partitioning is competitive with the user created partitions. In fact, for circuits "cctb1", "cctb4", "cctb5", "cctb7", "cctb9", "cctb11" and "cctb12", the automatically generated partitions outperformed the user created partitions.

Combining the results observed for both experiments, the automatically generated partitioning improved the overall speed of the 32 circuits by 7.04%. Compared to the flat flow, the automatically partitioned flow increases the overall compile time by 25.26% for the 32 circuits. However, most of this time is spent within the partition aware placer rather than in the partitioner. The partitioning procedure itself consumes 7.67% of the total compile time. Most of the increase in compile time is due to the increased cost of performing partition moves (the normal placer deals with LAB moves only) in
the placer. Given a circuit of size n, the complexity of the partitioning procedure is O(n) (see Section 5.3) whereas the complexity of the placer is O(n^(4/3)). As circuits grow larger, the time spent within the partitioner will be a smaller fraction of the overall compile time.

    Circuit          Flat     Partitioned
    Name    Size     Speed    Speed    Part-    Speedup    C      R      U
            (LEs)    (MHz)    (MHz)    itions   (%)
    ccta1    9027    60.93    60.38     26      -0.90     0.36    3.27   1
    ccta2    5357   126.58   153.73     33      21.45     0.31   12.44   1
    ccta3    5828    39.02    43.08     34      10.40     0.54    5.38   1
    ccta4   11304    52.87    52.89     23       0.04     0.51    2.86   1
    ccta5   11273    38.02    46.43     16      22.12     0.52    4.39   1
    ccta6    5593    41.47    43.45     43       4.77     0.55    4.45   1
    ccta7    5455    99.03   109.06     30      10.13     0.09    2.80   1
    ccta8   10824    42.57    43.83     46       2.96     0.32   17.79   1
    ccta9    5380    57.51    56.36     15      -2.00     0.37    8.90   1
    ccta10   7147   157.16   177.97     20      13.24     0.42    7.30   1
    ccta11   6145    45.54    41.72     84      -8.39     0.76    1.48   0
    ccta12   5086    95.19   110.14     10      15.71     0.41   27.22   1
    ccta13   6789    30.16    29.47     28      -2.29     0.63    1.00   0
    ccta14   9565    25.49    26.18    118       2.71     0.36    1.25   1
    ccta15   4720   105.94   124.19     14      17.23     0.38    2.13   1
    ccta16   3640    70.88    70.92      6       0.06     0.22   18.61   1
    ccta17  12064    43.61    41.12     21      -5.71     0.58    4.86   0
    ccta18   8940    12.43    12.46     16       0.24     0.67    2.98   0
    ccta19   3403    71.57    78.29     16       9.39     0.37    9.76   1
    ccta20   3378    43.14    51.84     35      20.17     0.34   16.09   1
    Average                                      6.57

Table 1. Comparing flat compilations with partitioned compilations.

7 Auto On/Off

There are instances where automatic partitioning does not improve the performance of the circuit. For example, in Table 1, when automatic partitioning is used, the performance of "ccta11" drops by 8.39% and the performance of "ccta17" drops by 5.71%. Similarly, in Table 2, the performance of "cctb2" drops by 12.84% with automatic partitioning. A natural question to ask is if situations like this can be prevented by using some statistics generated by the partitioner.
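The decision rule this section arrives at (Eq. 7, which is partly illegible in the source; the reading Useful = 1 if C < 0.56 or R > 5, else 0, matches every entry in the U columns of the tables) can be sketched directly:

```python
def useful(C, R):
    """Accept the automatic partitioning only if the partitioning cost C
    is low enough or the crossing ratio R is high enough (Eq. 7 as
    reconstructed above; thresholds 0.56 and 5)."""
    return 1 if (C < 0.56 or R > 5) else 0

# values read off Table 1:
assert useful(0.76, 1.48) == 0   # ccta11: rejected (slowdown of 8.39%)
assert useful(0.31, 12.44) == 1  # ccta2: accepted (speedup of 21.45%)
assert useful(0.37, 8.90) == 1   # ccta9: accepted despite a small slowdown
```

As the text notes, the rule is not perfect: "ccta9" passes the test yet slows down slightly, so wider experimentation would be needed before relying on it.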
First, we define the crossing ratio, R, as

    R = c_AvgAll / c_AvgCross        (6)

    Circuit          Flat     User Partitioned            Auto Partitioned
    Name    Size     Speed    Speed   Part-   Speedup     Speed   Part-   Speedup     C      R      U
            (LEs)    (MHz)    (MHz)   itions  (%)         (MHz)   itions  (%)
    cctb1    4514    85.59    93.61     5      9.37       104.25    29    21.80      0.42    3.31   1
    cctb2   10460    58.87    72.63    32     23.37        51.31    16   -12.84      0.61    2.33   0
    cctb3    4999    26.76    30.47    17     13.86        26.97    79     0.78      0.48    4.82   1
    cctb4   10630    98.65   109.84    11     11.34       113.07     2    14.62      0.31   16.50   1
    cctb5   13405    92.82    94.28    22      1.57        96.99    64     4.49      0.31   17.29   1
    cctb6   14149    20.85    21.38    23      2.54        20.60    13    -1.20      0.47    8.61   1
    cctb7   20532    29.18    32.14     4     10.14        33.57    12    15.04      0.48    6.34   1
    cctb8    8300    72.21    80.28     9     11.18        77.92    24     7.91      0.39    2.51   1
    cctb9   10914    44.26    46.69    16      5.49        47.40    44     7.09      0.40    5.42   1
    cctb10   4385   129.25   134.32    11      3.92       127.06    11    -1.69      0.09    3.60   1
    cctb11   4936    79.81    87.02    16      9.03       101.33   136    26.96      0.40    1.68   1
    cctb12   4425   107.97   118.57    25      9.82       119.75     4    10.91      0.17   13.10   1
    Average                            9.30                               7.82

Table 2. Comparing flat compilations, user partitioned compilations and automatically partitioned compilations.

Here, c_AvgAll is the average criticality of all connections in the circuit and c_AvgCross is the average criticality of all connections that cross a partition boundary. Intuitively, a good set of partitions would have a low cost (see Section 5.2) and a high crossing ratio. Now, consider the following rule:

    Useful = 1 if C < 0.56 or R > 5, 0 otherwise        (7)

The rightmost three columns of Tables 1 and 2 summarize the C, R and Useful values for each circuit. This simple rule could help eliminate the use of the automatically created partitions on circuits "ccta11", "ccta13", "ccta17" and "cctb2". However, it does not eliminate circuit "ccta9", which has a low cost and a high crossing ratio. More experimentation with a wide variety of circuits is necessary to determine whether rules such as this could be used to determine when the automatically created partitioning is useful.

8 Conclusions

This work introduced a new partitioning algorithm that can be used to improve the quality of CPLD placements. The algorithm used simulated annealing to optimize a cost function based on Rent's rule. An average performance
improve-ment of 7%was observed for a set of 32industrial circuits.The automatically created partitions were found to be competitive with partitions created by ex-perienced users.This work also considered the possibility of using some of the statistics available to the automatic partitioner to determine if the automatically generated partitions would help or hurt a circuit’s performance.References1.Altera.“LogicLock Methodology White Paper”.Available at:/literature/wp/wp logiclock.pdf.2.Altera.Altera2000Databook.Available at:/html/literature/lds.html.3. D.P.Singh,T.P.Borer and S.D.Brown.“Constrained FPGA Placement Algo-rithms for Timing Optimization”.ACM Intl.Conf.FPGAs,submitted,2003.4. C.J.Alpert and A.B.Kahng.“Recent Directions in Netlist Partitioning:A Sur-vey”.Integration:The VLSI Journal,19:1–81,1995.5.K.Roy and C.Sechen.“A Timing-Driven n-way Chip and Multi-Chip Partitioner”.In Proc.IEEE/ACM puter-Aided Design,pages240–247,1993. 6.W.Sun and C.Sechen.“Efficient and Effective Placement for Very Large Circuits”.In Proc.IEEE/ACM puter-Aided Design,pages170–177,1993. 7. D.M.Schuler and E.G.Ulrich.“Clustering and Linear Placement”.In Proc.IEEE/ACM Design Automation Conf.,pages50–56,1972.8.H.Shin and C.Kim.“A Simple Yet Effective Technique for Partitioning”.IEEETrans.VLSI Systems,1(3):380–386,September1993.9.T.-K.Ng,J.Oldfield and V.Pitchumani.“Improvements of a Mincut PartitionAlgorithm”.In Proc.IEEE/ACM puter-Aided Design,pages470–473,1987.10.L.Hagen,A.B.Kahng,F.J.Kurdahi and C.Ramachandran.“On the Intrin-sic Rent Parameter and Spectra-Based Partitioning Methodologies”.IEEE Trans.Computer-Aided Design,13(1):27–37,1994.11. 
A.Singh,G.Parthasarathy and M.Marek-Sadowska.“Interconnect Resource-Aware Placement for Hierarchical FPGAs”.In Proc.IEEE/ACM Intl.Conf.Computer-Aided Design,pages132–136,2001.12.J.Dambre,P.Verplaetse,D.Stroobandt and J.Van Campenhout.“On Rent’s Rulefor Rectangular Regions”.In Proc.IEEE/ACM Intl.Workshop on System-Level Interconnect Prediction,pages49–56,2001.13.X.Yang,R.Kastner and M.Sarrafzadeh.“Congestion Estimation During Top-Down Placement”.In Proc.Intl.Symp.on Physical Design,pages164–169,2001.14. D.Stroobandt.“A Priori System-Level Interconnect Prediction:Rent’s Ruleand Wire Length Distribution Models”.In Proc.IEEE/ACM Intl.Workshop on System-Level Interconnect Prediction,pages3–21,2001.15.R.B.Hitchcock,G.L.Smith and D.D.Cheng.“Timing Analysis of ComputerHardware”.IBM Journal of Research and Development,26(1):100–105,January 1982.16. ndman and R.Russo.“On a Pin Versus Block Relationship for Partitions ofLogic Graphs”.IEEE Transactions on Computers,c-20:1469–1479,1971.17.S.Kirkpatrick,C.D.Gelatt and M.P.Vecchi.“Optimization by Simulated Anneal-ing”.Science,220:671–680,1983.18.V.Betz,J.Rose and A.Marquardt.“Architecture and CAD for Deep-SubmicronFPGAs”.Kluwer Academic Publishers,1999.19.M.Huang,F.Romeo and A.Sangiovanni-Vincentelli.“An Efficient General Cool-ing Schedule for Simulated Annealing”.In Proc.IEEE/ACM puter-Aided Design,pages381–384,1986.m and J.-M.Delosme.“Performance of a New Annealing Schedule”.In Proc.IEEE/ACM Intl.Design Automation Conf.,pages306–311,1988.。


Partitions of Complete Geometric Graphs into Plane Trees∗

Prosenjit Bose‡, Ferran Hurtado§, Eduardo Rivera-Campo¶, David R. Wood‡

February 19, 2004

Abstract

Consider the open problem: does every complete geometric graph K_2n have a partition of its edge set into n plane spanning trees? We approach this problem from three directions. First, we study the case of convex geometric graphs. It is well known that the complete convex graph K_2n has a partition into n plane spanning trees. We characterise all such partitions. Second, we give a sufficient condition, which generalises the convex case, for a complete geometric graph to have a partition into plane spanning trees. Finally, we consider a relaxation of the problem in which the trees of the partition are not necessarily spanning. We prove that every complete geometric graph K_n can be partitioned into at most n − √(n/12) plane trees.

∗ Research of all the authors was completed in the Departament de Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Barcelona, Spain.
‡ School of Computer Science, Carleton University, Ottawa, Canada. Research supported by NSERC.
Email: {jit,davidw}@scs.carleton.ca
§ Departament de Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Barcelona, Spain. Partially supported by projects DURSI 2001SGR00224, MCYT-BFM2001-2340, MCYT-BFM2003-0368 and Gen. Cat. 2001SGR00224. Email: Ferran.Hurtado@upc.es
¶ Departamento de Matemáticas, Universidad Autónoma Metropolitana, Iztapalapa, México. Research completed while on sabbatical leave supported by MECD, Spain and Conacyt, México. Email: erc@xanum.uam.mx

1 Introduction

A geometric graph G is a pair (V(G), E(G)) where V(G) is a set of points in the plane in general position (that is, no three are collinear), and E(G) is a set of closed segments with endpoints in V(G). Elements of V(G) are vertices and elements of E(G) are edges. An edge with endpoints v and w is denoted by {v,w} or vw when convenient. A geometric graph can be thought of as a straight-line drawing of its underlying (abstract) graph. A geometric graph is plane if no two edges cross. A tree is an acyclic connected graph. A subgraph H of a graph G is spanning if V(H) = V(G). We are motivated by the following beautiful question.

Open Problem 1. Does every complete geometric graph with an even number of vertices have a partition of its edge set into plane spanning trees?

Since K_n, the complete graph on n vertices, has n(n−1)/2 edges and every spanning tree has n−1 edges, there must be n/2 trees in such a partition, and n is even. We approach this problem from three directions. In Section 2 we study the case of convex geometric graphs. We characterise the partitions of the complete convex graph into plane spanning trees. Section 3 describes a sufficient condition, which generalises the convex case, for a complete geometric graph to have a partition into plane spanning trees. In Section 4 we consider a relaxation of Open Problem 1 in which the trees of the partition are not necessarily spanning.

It is worth mentioning that decompositions of (abstract) graphs into trees have attracted much interest. In particular, Tutte [8] and Nash-Williams [6] independently obtained necessary and sufficient conditions for a graph to admit k edge-disjoint spanning trees, and Ringel's Conjecture and the Graceful Tree Conjecture about ways of decomposing complete graphs into trees are among the most outstanding open problems in the field. Nevertheless the non-crossing property that we require in our geometric setting changes the problems drastically.

2 Convex Graphs

A convex graph is a geometric graph with the vertices in convex position. An edge on the convex hull of a convex graph is called a boundary edge. Two convex graphs are isomorphic if the underlying graphs are isomorphic and the clockwise ordering of the vertices around the convex hull is preserved under this isomorphism. Suppose that G_1 and G_2 are isomorphic convex graphs. Then two edges cross in G_1 if and only if the corresponding edges in G_2 also cross. That is, in a convex graph, it is only the order of the vertices around the convex hull that determines edge crossings; the actual coordinates of the vertices are not important.

It is well known that Open Problem 1 has an affirmative solution in the case of convex complete graphs. That is, every convex complete graph K_2n can be partitioned into n plane trees, and since the book thickness of K_2n equals n, this bound is optimal even for partitions into plane subgraphs [2, 3]. In this section we characterise the solutions to Open Problem 1 in the convex case. In other words, we characterise the book embeddings of the complete graph in which each page is a spanning tree.

First some well known definitions. A leaf of a tree is a vertex of degree at most one. A leaf-edge of a tree is an edge incident to a leaf. A tree has exactly one leaf if and only if it is a single vertex with no edges. Every tree with at least one edge has at least two leaves. A tree has exactly two leaves if and only if it is a path with at least one edge. Let T be a tree. Let T′ be the tree obtained by deleting the leaves and leaf-edges from T. Let ℓ(T) be the number of leaves in T′. A star is a tree with at most one non-leaf vertex.
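The invariant ℓ(T) is straightforward to compute from an adjacency-list representation of the tree: delete the degree-at-most-one vertices to obtain T′, then count the leaves of T′. A small illustrative sketch (the helper names are ours, not from the paper):

```python
# Sketch (helper names ours, not from the paper): compute l(T), the
# number of leaves of T', where T' is T with its leaves and leaf-edges
# deleted. The tree T is given as an adjacency list.

def derived_tree(adj):
    """Return T': delete the leaves (vertices of degree <= 1) of T."""
    leaves = {v for v, nbrs in adj.items() if len(nbrs) <= 1}
    return {v: [u for u in nbrs if u not in leaves]
            for v, nbrs in adj.items() if v not in leaves}

def ell(adj):
    """l(T) = number of leaves of T'; it is 0 when T' is empty."""
    t_prime = derived_tree(adj)
    return sum(1 for nbrs in t_prime.values() if len(nbrs) <= 1)

# A path on 5 vertices: T' is the path 1-2-3, which has two leaves.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(ell(path))  # 2

# A star with centre 0: T' is the single vertex 0, counted as one leaf.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(ell(star))  # 1
```

On these examples the computed values agree with the degree structure one expects: a long path has ℓ = 2 and a star has ℓ = 1.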
Clearly a tree T is a star if and only if ℓ(T) ≤ 1. A caterpillar is a tree T such that T′ is a path. The path T′ is called the spine of the caterpillar. Clearly T is a caterpillar if and only if ℓ(T) ≤ 2. Observe that stars are the caterpillars whose spines consist of a single vertex.

We say a tree T is symmetric if there exists an edge vw of T such that if A and B are the components of T\vw with v ∈ A and w ∈ B, then there exists a (graph-theoretic) isomorphism between A and B that maps v to w.

Theorem 1. Let T_1, T_2, ..., T_n be a partition of the edges of the convex complete graph K_2n into plane spanning convex trees. Then T_1, T_2, ..., T_n are symmetric convex caterpillars that are pairwise isomorphic. Conversely, for any symmetric convex caterpillar T on 2n vertices, the edges of the convex complete graph K_2n can be partitioned into n plane spanning convex copies of T that are pairwise isomorphic.

We will prove Theorem 1 by a series of lemmas. García et al. [5] proved:

Lemma 1 ([5]). Let T be a tree with at least two edges. In every plane convex drawing of T there are at least max{2, ℓ(T)} boundary edges, and there exists a plane convex drawing of T with exactly max{2, ℓ(T)} boundary edges, such that if T is not a star then the boundary edges are pairwise non-consecutive.

In what follows {0, 1, ..., 2n−1} are the vertices of a convex graph in clockwise order around the convex hull. In addition, all vertices are taken modulo 2n. That is, vertex i refers to the vertex j = i mod 2n. Let G be a convex graph on {0, 1, ..., 2n−1}.
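Under this labelling, the earlier observation that crossings in a convex graph depend only on the cyclic vertex order becomes a constant-time test: two edges with four distinct endpoints cross if and only if exactly one endpoint of the second lies strictly inside the clockwise arc spanned by the first. A sketch (the function names are ours, not from the paper):

```python
# Sketch (helper names ours, not from the paper): crossing test for a
# convex graph whose vertices are 0..m-1 in clockwise order on the hull.

def crosses(e, f, m):
    """Edges e = {a,b} and f = {c,d}, with four distinct endpoints,
    cross iff exactly one of c, d lies strictly inside the clockwise
    arc from a to b."""
    (a, b), (c, d) = e, f

    def inside(x):
        return 0 < (x - a) % m < (b - a) % m

    return inside(c) != inside(d)

# In K_8 (so n = 4), the "long" edges {i, i+4} are pairwise crossing:
print(crosses((0, 4), (1, 5), 8))  # True
# Two disjoint hull-hugging edges do not cross:
print(crosses((0, 1), (2, 3), 8))  # False
```

In particular this confirms the fact, used in the proof of Lemma 2 below, that the n edges {i, n+i} of K_2n are pairwise crossing.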
For all 0 ≤ i, j ≤ 2n−1, let G[i,j] denote the subgraph of G induced by the vertices {i, i+1, ..., j}.

Lemma 2. For all n ≥ 2, let T_0, T_1, ..., T_{n−1} be a partition of the convex complete graph K_2n into plane spanning trees. Then (after relabelling the trees) for each 0 ≤ i ≤ n−1,
(1) the edge {i, n+i} is in T_i,
(2) T_i is a caterpillar with exactly two boundary edges, and
(3) for every non-boundary edge {a,b} of T_i, there is exactly one boundary edge of T_i in each of T_i[a,b] and T_i[b,a].

Proof. The edges {{i, n+i} : 0 ≤ i ≤ n−1} are pairwise crossing. Thus each such edge is in a distinct tree. Label the trees such that each edge {i, n+i} is in T_i. Since n ≥ 2, each T_i has at least three edges, and by Lemma 1, has at least two boundary edges. There are 2n boundary edges in total and n trees. Thus each T_i has exactly two boundary edges, and by Lemma 1, ℓ(T_i) = 2. For any tree T, ℓ(T) ≤ 2 if and only if T is a caterpillar. Thus each T_i is a caterpillar. Let {a,b} be a non-boundary edge in some T_i. Then T_i[a,b] has at least one boundary edge of T_i, as otherwise T_i[a,b] would be a convex tree on at least three vertices with only one boundary edge (namely, {a,b}), which contradicts Lemma 1. Similarly T_i[b,a] has at least one boundary edge of T_i. Thus each of T_i[a,b] and T_i[b,a] has exactly one boundary edge of T_i.

Lemma 3. Let {i,j} be a non-boundary edge of a plane convex spanning caterpillar T such that T[i,j] has exactly one boundary edge of T. Then exactly one of {i, j−1} and {j, i+1} is an edge of T.

Proof. If both {i, j−1} and {j, i+1} are in T then they cross, unless j−1 = i+1, in which case T contains a 3-cycle. Thus at most one of {i, j−1} and {j, i+1} is in T. Suppose, for the sake of contradiction, that neither {i, j−1} nor {j, i+1} is an edge of T. Since T is spanning, there is an edge {i,a} or {j,a} in T for some vertex i+1 < a < j−1. Without loss of generality {i,a} is this edge, as illustrated in Figure 1. Since i, i+1 and a are distinct vertices of T[i,a], the subtree T[i,a] has at least three vertices, and by Lemma 1, has at least two boundary edges, one of which is {i,a}. Thus T[i,a] has at least one boundary edge that is also a boundary edge of T. Now consider the subtree T′ of T induced by {i} ∪ {a, a+1, ..., j}. Then i, a, j−1 and j are distinct vertices of T′, and T′ has at least four vertices. Since {i, j−1} is not an edge of T, and thus not an edge of T′, the subtree T′ is not a star. By Lemma 1, T′ has at least two non-consecutive boundary edges, at most one of which is {i,j} or {i,a}. Thus T′ has at least one boundary edge that is also a boundary edge of T. No boundary edge of T can be in both T[i,a] and T′. Thus we have shown that T[i,j] has at least two boundary edges of T, which is the desired contradiction.

Figure 1: One of {i, j−1} and {j, i+1} is an edge of T.

In what follows we say an edge e = {i,j} has span

    span(e) = min{(i−j) mod 2n, (j−i) mod 2n}.

That is, span(e) is the number of edges in a shortest path between i and j that is contained in the convex hull.

Lemma 4. Let {i,j} be an edge of a plane convex spanning caterpillar T such that 1 ≤ j−i ≤ n, and T[i,j] has exactly one boundary edge of T. Then T[i,j] has exactly one edge of span k for each 1 ≤ k ≤ j−i. Moreover for each such k ≥ 2 the edge of span k has an endpoint in common with the edge of span k−1, and the other two endpoints are consecutive on the convex hull.

Proof. If j−i = 1 then {i,j} is a boundary edge, and the result is trivial. Otherwise {i,j} is not a boundary edge. By Lemma 3, exactly one of the edges {i, j−1} and {j, i+1} is in T. Without loss of generality {i, j−1} is in T. Thus the edge of span j−i has an endpoint in common with the edge of span j−i−1, and the other two endpoints are consecutive on the convex hull. The result follows by induction (on span) applied to the edge {i, j−1}.

Before we prove the main theorem we introduce some notation. Let e = {a,b} be an edge in the convex complete graph K_2n. Then e+i denotes the edge {a+i, b+i}.
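The span of an edge and the shifted edge e+i are both simple modular computations on the vertex labels; a quick sketch (helper names ours, not from the paper):

```python
# Sketch (helper names ours, not from the paper): span and shift of
# edges in the convex complete graph with vertices 0..m-1 clockwise.

def span(e, m):
    """span(e) = min((i - j) mod m, (j - i) mod m) for e = {i, j}."""
    i, j = e
    return min((i - j) % m, (j - i) % m)

def shift(e, i, m):
    """e + i = {a+i, b+i}, with vertices taken modulo m."""
    a, b = e
    return ((a + i) % m, (b + i) % m)

m = 8  # K_8, so n = 4
print(span((0, 4), m))      # 4, the maximum possible span
print(span((6, 1), m))      # 3, going the short way around the hull
print(shift((0, 4), 4, m))  # (4, 0), i.e. the edge {0,4} again
```

Python's % operator returns a value in [0, m) even for negative arguments, which matches the convention that all vertices are taken modulo 2n.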
For a set X of edges, X+i = {e+i : e ∈ X}, and X^(k) = {e ∈ X : span(e) ≥ k}.

Theorem 2. Let T_0, T_1, ..., T_{n−1} be a partition of the edges of the convex complete graph K_2n into plane spanning convex trees. Then T_0, T_1, ..., T_{n−1} are pairwise isomorphic symmetric convex caterpillars.

Proof. By Lemma 2, for each 0 ≤ i ≤ n−1, T_i is a caterpillar with two boundary edges, the edge {i, n+i} is in T_i, and for every non-boundary edge {a,b} of T_i, there is exactly one boundary edge of T_i in each of T_i[a,b] and T_i[b,a].

Let H = T_0[0,n]. Since {0,n} is an edge of H, by Lemma 4, H has exactly one edge of span k for each 1 ≤ k ≤ n. Furthermore, for each 1 ≤ k ≤ n−1, the edge of span k has an endpoint in common with the edge of span k+1, and the other two endpoints are consecutive on the convex hull. Let h_k = {x_k, x_k+k} denote the edge of span k in H. For each 1 ≤ k ≤ n−1, if h_k ∩ h_{k+1} = x_k+k (= x_{k+1}+k+1) then we say the k-direction is 'clockwise'. Otherwise h_k ∩ h_{k+1} = x_k (= x_{k+1}), and we say the k-direction is 'anticlockwise', as illustrated in Figure 2.

Figure 2: The k-direction is (a) clockwise and (b) anticlockwise.

We will prove that H determines the structure of all the trees T_0, T_1, ..., T_{n−1}. We proceed by downwards induction on k = n, n−1, ..., 1 with the hypothesis that for all 0 ≤ i ≤ n−1,

    T_i^(k) = (H^(k) + i) ∪ (H^(k) + n + i).   (1)

Consider the base case with k = n. The only edge in H of span n is {0,n}. Thus H^(n) = {{0,n}}, which implies that H^(n)+i = {{i, n+i}}, and H^(n)+n+i = {{n+i, 2n+i}} = {{i, n+i}}. Thus the right-hand side of (1) is {{i, n+i}}. The only edge in T_i of span n is {i, n+i}. Thus T_i^(n) = {{i, n+i}}, and (1) is satisfied for k = n.

Now suppose that (1) holds for some k+1 ≥ 2. We will prove that (1) holds for k. Suppose that the k-direction is clockwise. (The case in which the k-direction is anticlockwise is symmetric.) We proceed by induction on j = 0, 1, ..., 2n−1 with the hypothesis:

    the edge {x_k+j, x_k+k+j} is in the tree T_{j mod n}.   (2)

The base case with j = 0 is immediate since, by definition, {x_k, x_k+k} ∈ E(T_0). Suppose that {x_k+j, x_k+k+j} ∈ E(T_{j mod n}) for some 0 ≤ j < 2n−1. Consider the edge e = {x_k+j, x_k+k+j+1}. Since the k-direction is clockwise, x_k = x_{k+1}+1 and x_k+k = x_{k+1}+k+1. Thus

    e = {x_{k+1}+1+j, x_{k+1}+k+1+j+1} = {x_{k+1}, x_{k+1}+k+1} + j+1 = h_{k+1} + j+1.

Hence e ∈ H+j+1, and since e has span k+1, e ∈ H^(k+1)+j+1. By induction from (1), e ∈ T_{(j+1) mod n}^(k+1), as illustrated in Figure 3.

Figure 3: The k-direction is (a) clockwise and (b) anticlockwise.

By Lemma 3 applied to e, which is a non-boundary edge of T_{(j+1) mod n}, exactly one of {x_k+j, x_k+k+j} and {x_k+j+1, x_k+k+j+1} is an edge of T_{(j+1) mod n}. By induction from (2), {x_k+j, x_k+k+j} ∈ T_{j mod n}. Thus {x_k+j+1, x_k+k+j+1} ∈ T_{(j+1) mod n}. That is, (2) holds for j+1. Therefore for all 0 ≤ j ≤ 2n−1, the edge {x_k+j, x_k+k+j} is in T_{j mod n}. That is, h_k+j is in T_{j mod n}. By (1) for k+1 we have that (1) holds for k.

By (1) with k = 1, each tree T_i can be expressed as T_i = (H+i) ∪ (H+n+i). Clearly H ∪ (H+n) is a symmetric convex caterpillar. Thus each T_i is a translated copy of the same symmetric convex caterpillar. Therefore T_0, T_1, ..., T_{n−1} are pairwise isomorphic symmetric convex caterpillars.

Figure 4 illustrates an example of the application of Theorem 2.

Figure 4: Illustration for Theorem 2 with n = 4: h_4 = {0,4}; h_3 = {0,3} (3-direction anticlockwise); h_2 = {1,3} (2-direction clockwise); h_1 = {1,2} (1-direction anticlockwise).

Theorem 3. For any symmetric convex caterpillar T on 2n vertices, the edges of the convex complete graph K_2n can be partitioned into n plane spanning pairwise isomorphic convex copies of T.

Proof. Say V(K_2n) = {0, 1, ..., 2n−1} in clockwise order around the convex hull. Let {0,n} be the edge of T such that after deleting {0,n}, A and B are the components with 0 ∈ A and n ∈ B, and there exists a (graph-theoretic) isomorphism between A and B that maps 0 to n. It is easily seen that A has a plane representation on the vertices {0, 1, ..., n}. For each 0 ≤ i ≤ n−1, let T_i = (A+i) ∪ (A+n+i). Then, as in
plane spanning pairwise isomorphic convex copies of T.Observe that Theorems2and3together prove Theorem1.3A Sufficient ConditionIn this section we prove the following sufficient condition for a complete geometric graph to have an affirmative solution to Open Problem1.A double star is a tree with at most two non-leaf vertices.Theorem4.Let G be a complete geometric graph K2n.Suppose that there is a set L of pairwise non-parallel lines with exactly one vertex of G in each open unbounded region formed by L.Then E(G)can be partitioned into plane spanning double stars(that are pairwise graph-theoretically isomorphic).Observe that in a double star,if there are two non-leaf vertices v and w then they must be adjacent,in which case we say vw is the root edge.Lemma5.Let P be a set of points in general position.Let L be a line with L∩P=∅.Let H1and H2be the half-planes defined by L.Let v and w be points such that v∈P∩H1 and w∈P∩H2.Let T(P,L,v,w)be the geometric graph with vertex set P and edge set{vw}∪{vx:x∈(P\{v})∩H1}∪{wy:y∈(P\{w})∩H2}.Then T(P,L,v,w)is a plane double star with root edge vw.Proof.The set of edges incident to v form a star.Regardless of the point set,a geometric star is always plane.Thus no two edges incident to v cross.Similarly no two edges incident to w cross.No edge incident to v crosses an edge incident to w since such edges are separated by L,as illustrated in Figure5.Lemma6.Let P be a set of points in general position.Let L1and L2be non-parallel lines with L1∩P=L2∩P=∅.Let v,w,x,y be points in P such that v,w,x,y are in distinct quarter-planes formed by L1and L2,with each pair(v,w)and(x,y)in opposite quarter-planes.(Note that this does not imply that vw and xy cross.)Let T1and T2be the plane double stars T1=T(P,L1,v,w)and T2=T(P,L2,x,y).Then E(T1)∩E(T2)=∅.9Figure5:A plane double star separated by a line.Proof.Suppose,for the sake of contradiction,that there is an edge e∈E(T1)∩E(T2). 
All edges of T_1 are incident to v or w, and all edges of T_2 are incident to x or y. Thus e ∈ {vx, vw, vy, xw, xy, wy}. By assumption, v, w, x, y are in distinct quarter-planes formed by L_1 and L_2, with each pair (v,w) and (x,y) in opposite quarter-planes. Thus e crosses at least one of L_1 and L_2. Without loss of generality e crosses L_1. Since e ∈ E(T_1), and the only edge of T_1 that crosses L_1 is the root edge vw, we have e = vw. Since all edges of T_2 are incident to x or y, and v, w, x, y are distinct, we have e ∉ E(T_2), which is the desired contradiction. Therefore E(T_1) ∩ E(T_2) = ∅, as illustrated in Figure 6.

Figure 6: Plane spanning double stars are edge-disjoint.

Proof of Theorem 4. As illustrated in Figure 7, let C be a circle such that the vertices of G and the intersection point of any two lines in L are in the interior of C. The intersection points of C and the lines in L partition C into 2n consecutive components C_0, C_1, ..., C_{2n−1}, each corresponding to a region containing a single vertex of G. Let i be the vertex in the region corresponding to C_i. Label the lines L_0, L_1, ..., L_{n−1} so that for each 0 ≤ i ≤ n−1, the components C_i and C_{i+n} run from C ∩ L_i to C ∩ L_{(i+1) mod n} in the clockwise direction.

Figure 7: Example of Theorem 4 with n = 4.

For each 0 ≤ i ≤ n−1, let T_i be the double star T(V(G), L_i, i, i+n). By Lemma 5, each T_i is plane. Since V(T_i) = V(G), T_i is a spanning tree of G. For all 0 ≤ i < j ≤ n−1, the points i, i+n, j, j+n are in distinct quarter-planes formed by L_i and L_j, with each pair (i, i+n) and (j, j+n) in opposite quarter-planes. Thus, by Lemma 6, E(T_i) ∩ E(T_j) = ∅. Since each T_i has 2n−1 edges, and there are n(2n−1) edges in total, T_0, T_1, ..., T_{n−1} is the desired partition of E(G).

Note that each line in L in Theorem 4 is a halving line. Pach and Solymosi [7] proved a related result: a complete geometric graph on 2n vertices has n pairwise crossing edges if and only if it has precisely n halving lines.

4 Relaxations

We now drop the requirement that our plane trees be spanning. Thus we need not restrict ourselves to complete graphs with an even number of vertices. Theorem 4 generalises as follows.

Theorem 5. Let G be a complete geometric graph K_n. Suppose that there is a set L of pairwise non-parallel lines with at least one vertex of G in each open unbounded region formed by L. Then E(G) can be partitioned into n − |L| plane trees.

Proof. Let P be a set consisting of exactly one vertex in each open unbounded region formed by L. Then |P| = 2|L|. By Theorem 4, the induced subgraph G[P] can be partitioned into ½|P| plane spanning double stars. Order the vertices of V(G)\P as v_1, ..., v_{n−|P|}, and for each such v_i let S_i be the star consisting of all edges from v_i to P ∪ {v_{i+1}, ..., v_{n−|P|}}. Every geometric star is plane, and each edge of G not in G[P] is in exactly one S_i. This gives a partition of E(G) into ½|P| + (n − |P|) = n − ½|P| = n − |L| plane trees.

Lemma 7. Every complete geometric graph K_n with k pairwise crossing edges can be partitioned into n − k plane trees.

Aronov et al. [1] proved that every complete geometric graph K_n has at least √(n/12) pairwise crossing edges. Thus Lemma 7 implies:

Corollary 1. Every complete geometric graph K_n can be partitioned into at most n − √(n/12) plane trees.

Open Problem 2. Can every complete geometric graph K_n be partitioned into at most n/c plane subgraphs, for some constant c > 1? Of course c ≤ 2, since ⌈n/2⌉ plane subgraphs are needed when the vertices are in convex position [2, 3].

References

[1] B. Aronov, P. Erdős, W. Goddard, D. J. Kleitman, M. Klugerman, J. Pach, and L. J. Schulman. Crossing families. Combinatorica, 14(2):127–134, 1994.
[2] F. Bernhart and P. C. Kainen. The book thickness of a graph. J. Combin. Theory Ser. B, 27(3):320–331, 1979.
[3] F. R. K. Chung, F. T. Leighton, and A. L. Rosenberg. Embedding graphs in books: a layout problem with applications to VLSI design. SIAM J. Algebraic Discrete Methods, 8(1):33–58, 1987.
[4] M. B. Dillencourt, D. Eppstein, and D. S. Hirschberg. Geometric thickness of complete graphs. J. Graph Algorithms Appl., 4(3):5–17, 2000.
[5] A. García, C. Hernando, F. Hurtado, M. Noy, and J. Tejel. Packing trees into planar graphs. J. Graph Theory, 40(3):172–181, 2002.
[6] C. S. J. A. Nash-Williams. Decomposition of finite graphs into forests. J. London Math. Soc., 39:12, 1964.
[7] J. Pach and J. Solymosi. Halving lines and perfect cross-matchings. In Advances in discrete and computational geometry, vol. 223 of Contemp. Math., pp. 245–249. Amer. Math. Soc., 1999.
[8] W. T. Tutte. On the problem of decomposing a graph into n connected factors. J. London Math. Soc., 36:221–230, 1961.
