Generalized Network Design Problems
by Corinne Feremans (1,2), Martine Labbé (1), Gilbert Laporte (3)

March 2002

1. Institut de Statistique et de Recherche Opérationnelle, Service d'Optimisation, CP 210/01, Université Libre de Bruxelles, boulevard du Triomphe, B-1050 Bruxelles, Belgium, e-mail: mlabbe@smg.ulb.ac.be
2. Universiteit Maastricht, Faculty of Economics and Business Administration, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands, e-mail: C.Feremans@KE.unimaas.nl
3. Canada Research Chair in Distribution Management, École des Hautes Études Commerciales, 3000, chemin de la Côte-Sainte-Catherine, Montréal, Canada H3T 2A7, e-mail: gilbert@crt.umontreal.ca

Abstract

Network design problems consist of identifying an optimal subgraph of a graph, subject to side constraints. In generalized network design problems, the vertex set is partitioned into clusters and the feasibility conditions are expressed in terms of the clusters. Several applications of generalized network design problems arise in the fields of telecommunications, transportation and biology. The aim of this review article is to formally define generalized network design problems, to study their properties and to provide some applications.

1 Introduction

Several classical combinatorial optimization problems can be cast as Network Design Problems (NDP). Broadly speaking, an NDP consists of identifying an optimal subgraph F of an undirected graph G subject to feasibility conditions.
Well-known NDPs are the Minimum Spanning Tree Problem (MSTP), the Traveling Salesman Problem (TSP) and the Shortest Path Problem (SPP). We are interested here in Generalized NDPs, i.e., in problems where the vertex set of G is partitioned into clusters and the feasibility conditions are expressed in terms of the clusters. For example, one may wish to determine a minimum length tree spanning all the clusters, a Hamiltonian cycle through all the clusters, etc. Generalized NDPs are important combinatorial optimization problems in their own right, not all of which have received the same degree of attention from operational researchers. In order to solve them, it is useful to understand their structure and to exploit the relationships that link them. These problems also underlie several important application areas, namely in the fields of telecommunications, transportation and biology. Our aim is to formally define generalized NDPs, to study their properties and to provide examples of their applications. We will first define a unified notational framework for these problems. This will be followed by complexity results and by the study of seven generalized NDPs.

2 Definitions and notations

An undirected graph G = (V, E) consists of a finite non-empty vertex set V = {1, ..., n} and an edge set E ⊆ {{i, j} : i, j ∈ V}. Costs c_i and c_ij are assigned to vertices and edges, respectively. Unless otherwise specified, c_i = 0 for i ∈ V and c_ij ≥ 0 for {i, j} ∈ E. We denote by E(S) = {{i, j} ∈ E : i, j ∈ S} the subset of edges having their two end vertices in S ⊆ V. A subgraph F of G is denoted by F = (V_F, E_F), with V_F ⊆ V and E_F ⊆ E(V_F), and its cost c(F) is the sum of its vertex and edge costs. It is convenient to define an NDP as a problem P associated with a subset of terminal vertices T ⊆ V. A feasible solution to P is a subgraph F = (V_F, E_F), where T ⊆ V_F, satisfying some side constraints. If T = V, then the NDP is spanning; if T ⊂ V, it is non-spanning. Let G(T) = (T, E(T)) and denote by F_P(T) the subset of feasible solutions to the spanning problem P defined on the graph
G(T). Let S ⊆ V be such that S ∩ T = ∅, and denote by F_P(T, S) the set of feasible solutions of the non-spanning problem P on the graph G(S ∪ T) that span T, and possibly some vertices from S. In this framework, feasible NDP solutions correspond to a subset of edges satisfying some constraints.

Natural spanning NDPs are the following.

1. The Minimum Spanning Tree Problem (MSTP) (see e.g., Magnanti and Wolsey [45]). The MSTP is to determine a minimum cost tree on G that includes all the vertices of V. This problem is polynomially solvable.

2. The Traveling Salesman Problem (TSP) (see e.g., Lawler, Lenstra, Rinnooy Kan and Shmoys [42]). The TSP consists of finding a minimum cost cycle that passes through each vertex exactly once. This problem is NP-hard.

3. The Minimum Perfect Matching Problem (MPMP) (see e.g., Cook, Cunningham, Pulleyblank and Schrijver [8]). A matching M ⊆ E is a subset of edges such that each vertex of G is adjacent to at most one edge of M. A perfect matching is a matching that contains all the vertices of G. The problem consists of finding a perfect matching of minimum cost. This problem is polynomial.

4. The Minimum 2-Edge-Connected Spanning Network Problem (M2ECN) (see e.g., Grötschel, Monma and Stoer [26] and Mahjoub [46]). The M2ECN consists of finding a subgraph of minimal total cost for which there exist two edge-disjoint paths between every pair of vertices.

5. The Minimum Clique Problem (MCP). The MCP consists of determining a minimum total cost clique spanning all the vertices. This problem is trivial, since the whole graph corresponds to an optimal solution.

We also consider the following two non-spanning NDPs.

1. The Steiner Tree Problem (STP) (see Winter [61] for an overview). The STP is to determine a tree on G that spans a set T of terminal vertices at minimum cost. A Steiner tree may contain vertices other than those of T. These vertices are called the Steiner vertices. This problem is NP-hard.

2. The Shortest Path Problem (SPP) (see e.g., Ahuja, Magnanti and Orlin [1]). Given an origin o and a destination d, o, d ∈ V, the SPP
consists of determining a path of minimum cost from o to d. This problem is polynomially solvable. It can be seen as a particular case of the STP where T = {o, d}.

In generalized NDPs, V is partitioned into clusters V_k, k ∈ K. We now formally define spanning and non-spanning generalized NDPs.

Definition 1 ("Exactly" generalization of a spanning problem). Let G = (V, E) be a graph partitioned into clusters V_k, k ∈ K. The "exactly" generalization of a spanning NDP P on G consists of identifying a subgraph F = (V_F, E_F) of G yielding

min{c(F) : |V_F ∩ V_k| = 1 for all k ∈ K, F ∈ F_P(∪_{k∈K}(V_F ∩ V_k))}.

In other words, F must contain exactly one vertex per cluster. Two different generalizations are considered for non-spanning NDPs.

Definition 2 ("Exactly" generalizations of a non-spanning problem). Let G = (V, E) be a graph partitioned into clusters V_k, k ∈ K, and let {K_T, K_S} be a partition of K. The "exactly" T-generalization of a non-spanning NDP P on G consists of identifying a subgraph F = (V_F, E_F) of G yielding

min{c(F) : |V_F ∩ V_k| = 1 for k ∈ K_T, F ∈ F_P(∪_{k∈K_T}(V_F ∩ V_k), ∪_{k∈K_S} V_k)}.

The "exactly" S-generalization of a non-spanning NDP P on G consists of identifying a subgraph F = (V_F, E_F) of G yielding

min{c(F) : |V_F ∩ V_k| = 1 for k ∈ K_S, F ∈ F_P(∪_{k∈K_T} V_k, ∪_{k∈K_S}(V_F ∩ V_k))}.

In other words, in the "exactly" T-generalization, F must contain exactly one vertex per cluster V_k with k ∈ K_T, and possibly other vertices in ∪_{k∈K_S} V_k. In the "exactly" S-generalization, F must contain exactly one vertex per cluster V_k with k ∈ K_S, and all vertices of ∪_{k∈K_T} V_k.

We can replace |V_F ∩ V_k| = 1 in the above definitions by |V_F ∩ V_k| ≥ 1 or |V_F ∩ V_k| ≤ 1, leading to the "at least" or the "at most" version of the generalization. The "exactly", "at least" and "at most" versions of a generalized NDP P are denoted by E-P, L-P and M-P, respectively. In the "at most" and "exactly" versions, intra-cluster edges are neglected; in this case, we call the graph G |K|-partite complete. In the "at least" version the intra-cluster edges are taken into account.

3 Complexity results

We provide in Tables 1 and 2 the
complexity of the generalized versions in their three respective forms ("exactly", "at least" and "at most") for the seven NDPs considered. Some of these combinations lead to trivial problems. Obviously, if a classical NDP is NP-hard, its generalization is also NP-hard. The indication "∅ is opt" means that the empty set is feasible and optimal for the corresponding problem. References about complexity results for the classical version of the seven problems considered can be found in Garey and Johnson [20].

As can be seen from Table 2, two cases of the generalized SPP are NP-hard by reduction from the Hamiltonian Path Problem (see Garey and Johnson [20]). Li, Tsao and Ulular [43] show that the "at most" S-generalization is polynomial if the shrunk graph is series-parallel, but provide no complexity result for the general case. A shrunk graph G_S = (V_S, E_S) derived from a graph G partitioned into clusters is defined as follows: V_S contains one vertex for each cluster of G, and there exists an edge in E_S whenever an edge between the two corresponding clusters exists in G. An undirected graph is series-parallel if it is not contractible to K_4, the complete graph on four vertices. A graph G is contractible to another graph H if H can be obtained from G by deleting and contracting edges.
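The shrunk-graph construction just defined is simple to implement. The sketch below is illustrative only (the function and variable names are ours, not the paper's); it also records, for each pair of clusters, the cheapest connecting edge cost, which is the quantity used later in Section 6:

```python
def shrink(edges, cluster_of):
    """Build the shrunk graph G_S of a clustered graph G.

    edges      -- dict {(i, j): cost} over vertex pairs of G
    cluster_of -- dict mapping each vertex of G to its cluster
    Returns {(k, l): cost} with one edge per pair of clusters joined in G,
    keeping the cheapest inter-cluster cost; intra-cluster edges disappear.
    """
    shrunk = {}
    for (i, j), c in edges.items():
        k, l = cluster_of[i], cluster_of[j]
        if k == l:
            continue                      # intra-cluster edge: no G_S edge
        key = tuple(sorted((k, l)))       # undirected cluster pair
        shrunk[key] = min(c, shrunk.get(key, float("inf")))
    return shrunk
```

For example, with clusters A = {1, 2}, B = {3}, C = {4}, the edges (1,2), (2,3), (1,3), (3,4) shrink to the two cluster edges {A, B} and {B, C}.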
Contracting an edge means that its two end vertices are shrunk and the edge is deleted. We now provide a short literature review and applications for each of the seven generalized NDPs considered.

Table 1: Complexity of classical and generalized spanning NDPs

  Problem    MSTP          TSP       MPMP        M2ECN     MCP
  Classical  Polynomial    NP-hard   Polynomial  NP-hard   Trivial, polynomial
  Exactly    NP-hard [47]  NP-hard   Polynomial  NP-hard   NP-hard (with vertex cost) [35]
  At least   NP-hard [31]  NP-hard   Polynomial  NP-hard   Equivalent to exactly
  At most    ∅ is opt      ∅ is opt  ∅ is opt    ∅ is opt  ∅ is opt

Table 2: Complexity of classical and generalized non-spanning NDPs

  Problem                    STP       SPP
  Classical                  NP-hard   Polynomial
  Exactly T-generalization   NP-hard   Polynomial
  Exactly S-generalization   NP-hard   NP-hard
  At least T-generalization  NP-hard   Polynomial
  At least S-generalization  NP-hard   NP-hard
  At most T-generalization   ∅ is opt  ∅ is opt
  At most S-generalization   NP-hard   Polynomial if shrunk graph is series-parallel [43]

4 The generalized minimum spanning tree problem

The Generalized Minimum Spanning Tree Problem (E-GMSTP) is the problem of finding a minimum cost tree including exactly one vertex from each vertex set of the partition (see Figure 1a for a feasible E-GMSTP solution). This problem was introduced by Myung, Lee and Tcha [47]. Several formulations are available for the E-GMSTP (see Feremans, Labbé and Laporte [17]).

The Generalized Minimum Spanning Tree Problem in its "at least" version (L-GMSTP) is the problem of finding a minimum cost tree including at least one vertex from each vertex set of the partition (see Figure 1b for a feasible L-GMSTP solution). This problem was introduced by Ihler, Reich and Widmayer [31] as a particular case of the Generalized Steiner Tree Problem (see Section 9) under the name "Class Tree Problem". Dror, Haouari and Chaouachi [11] show that if the family of clusters covers V without being pairwise disjoint, then the L-GMSTP defined on this family can be transformed into the original L-GMSTP on a graph G′ obtained by substituting each vertex
v ∈ ∩_{ℓ∈L} V_ℓ, L ⊆ K, by |L| copies v_ℓ ∈ V_ℓ, ℓ ∈ L, and adding edges of weight zero between each pair of these new vertices (a clique of weight zero between the v_ℓ for ℓ ∈ L). This can be done as long as there is no fixed cost on the vertices, and the transformation does not hold for the "exactly" version of the problem.

Applications modeled by the E-GMSTP are encountered in telecommunications, where metropolitan and regional networks must be interconnected by a tree containing a gateway from each network. For this internetworking, a vertex has to be chosen in each local network as a hub, and the hub vertices must be connected via transmission links such as optical fiber (see Myung, Lee and Tcha [47]).

Figure 1: Feasible GMSTP solutions (Figure 1a: E-GMSTP; Figure 1b: L-GMSTP)

The L-GMSTP has been used to model and solve an important irrigation network design problem arising in desert environments, where a set of |K| polygon-shaped parcels share a common source of water. Each parcel is represented by a cluster made up of the polygon vertices. Another cluster corresponds to the water source vertex. The problem consists of designing a minimal length irrigation network connecting at least one vertex from each parcel to the water source. This irrigation problem can be modeled as an L-GMSTP as follows. Edges correspond to the boundary lines of the parcels. The aim is to construct a minimal cost tree such that each parcel has at least one irrigation source (see Dror, Haouari and Chaouachi [11]).

Myung, Lee and Tcha [47] show that the E-GMSTP is strongly NP-hard, using a reduction from the Node Cover Problem (see Garey and Johnson [20]).
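The cover-to-partition transformation of Dror, Haouari and Chaouachi sketched above (copy each shared vertex once per cluster and join the copies by a weight-zero clique) can be illustrated as follows; the function name and return shape are our own choices:

```python
from itertools import combinations

def split_overlaps(vertices, clusters):
    """Turn overlapping clusters into a partition for the L-GMSTP.

    clusters -- dict k -> set of vertices (may overlap, must cover `vertices`)
    Returns (new_clusters, zero_edges): every vertex v lying in several
    clusters is replaced by one copy (v, k) per cluster k containing it,
    and the copies of v are joined pairwise by weight-zero edges.
    """
    new_clusters = {k: set() for k in clusters}
    zero_edges = []
    for v in vertices:
        hosts = [k for k, members in clusters.items() if v in members]
        copies = [(v, k) for k in hosts]
        for k, c in zip(hosts, copies):
            new_clusters[k].add(c)
        # weight-zero clique between the copies of v
        zero_edges += list(combinations(copies, 2))
    return new_clusters, zero_edges
```

As the paper notes, this only works when vertices carry no fixed cost, since a shared vertex may otherwise be counted once per copy.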
These authors also provide four integer linear programming formulations. A branch-and-bound method is developed and tested on instances involving up to 100 vertices. For instances containing between 120 and 200 vertices, the method is stopped before the first branching. The lower bounding procedure is a heuristic method which approximates the linear relaxation associated with the dual of a multicommodity flow formulation for the E-GMSTP. A heuristic algorithm finds a primal feasible solution for the E-GMSTP using the lower bound. The branching strategy of this method is described in Noon and Bean [48]: a cluster is first selected and branching is performed on each vertex of this cluster.

In Faigle, Kern, Pop and Still [14], another mixed integer formulation for the E-GMSTP is given. The linear relaxation of this formulation is computed for a set of 12 instances containing up to 120 vertices. This seems to yield an optimal E-GMSTP solution for all but one instance. The authors also use the subpacking formulation from Myung, Lee and Tcha [47], in which the integrality constraints are kept and the subtour constraints are added dynamically. Three instances containing up to 75 vertices are tested.

A branch-and-cut algorithm for the same problem is described in Feremans [15]. Several families of valid inequalities for the E-GMSTP are introduced and some of these are proved to be facet defining. Computational results show that instances involving up to 200 vertices can be solved to optimality using this method. A comparison with the computational results obtained in Myung, Lee and Tcha [47] shows that the gap between the lower bound and the upper bound obtained before branching is reduced by 10% to 20%.

Pop, Kern and Still [51] provide a polynomial approximation algorithm for the E-GMSTP. Its worst-case ratio is bounded by 2ρ if the cluster size is bounded by ρ. This algorithm is derived from the method described in Magnanti and Wolsey [45] for the Vertex Weighted Steiner Tree Problem (see Section 9).

Ihler, Reich and Widmayer [31] show
that the decision version of the L-GMSTP is NP-complete even if G is a tree. They also prove that no constant worst-case ratio polynomial-time algorithm for the L-GMSTP exists unless P = NP, even if G is a tree on V with edge lengths 1 and 0. They also develop two polynomial-time heuristics, tested on instances with up to 250 vertices. Finally, Dror, Haouari and Chaouachi [11] provide three integer linear programming formulations for the L-GMSTP, two of which are not valid (see Feremans, Labbé and Laporte [16]). The authors also describe five heuristics, including a genetic algorithm. These heuristics are tested on 20 instances with up to 500 vertices. The genetic algorithm performs better than the other four heuristics. An exact method is described in Feremans [15] and compared to the genetic algorithm of Dror, Haouari and Chaouachi [11]. These results show that the genetic algorithm is time consuming compared to the exact approach of Feremans [15]. Moreover, the gap between the upper bound obtained by the genetic algorithm and the optimal value increases as the size of the problem becomes larger.

5 The generalized traveling salesman problem

The Generalized Traveling Salesman Problem, denoted by E-GTSP, consists of finding a least cost cycle passing through each cluster exactly once. The symmetric E-GTSP was introduced by Henry-Labordere [28], Saskena [56] and Srivastava, Kumar, Garg and Sen [60], who proposed dynamic programming formulations. The first integer linear programming formulation is due to Laporte and Nobert [40] and was later enhanced by Fischetti, Salazar and Toth [18], who introduced a number of facet defining valid inequalities for both the E-GTSP and the L-GTSP. In Fischetti, Salazar and Toth [19], a branch-and-cut algorithm is developed, based on the polyhedral results of Fischetti, Salazar and Toth [18]. This method is tested on instances whose edge costs satisfy the triangle inequality (for which the E-GTSP and the L-GTSP are equivalent). Moreover, heuristics producing feasible E-GTSP solutions are
provided.

Noon [50] has proposed several heuristics for the GTSP. The most sophisticated heuristic published to date is due to Renaud and Boctor [53]. It is a generalization of the heuristic proposed in Renaud, Boctor and Laporte [54] for the classical TSP. Snyder and Daskin [59] have developed a genetic algorithm which is compared to the branch-and-cut algorithm of Fischetti, Salazar and Toth [19] and to the heuristics of Noon [50] and of Renaud and Boctor [53]. This genetic algorithm is slightly slower than the other heuristics, but competitive with the CPU times obtained in Fischetti, Salazar and Toth [19] on small instances, and noticeably faster on the larger instances (containing up to 442 vertices).

Approximation algorithms for the GTSP with cost functions satisfying the triangle inequality are described in Slavík [58] and in Garg, Konjevod and Ravi [21]. A non-polynomial-time approximation heuristic derived from Christofides' heuristic for the TSP [7] is presented in Dror and Haouari [10]; it has a worst-case ratio of 2. Transformations of GTSP instances into TSP instances are studied in Dimitrijević and Saric [9], Laporte and Semet [41], Lien, Ma and Wah [44], and Noon and Bean [49]. According to Laporte and Semet [41], these transformations do not provide any significant advantage over a direct approach, since the TSP resulting from the transformation is highly degenerate.

The GTSP arises in several application contexts, several of which are described in Laporte, Asef-Vaziri and Sriskandarajah [38]. These are encountered in post box location (Labbé and Laporte [36]) and in the design of postal delivery routes (Laporte, Chapleau, Landry and Mercure [39]). In the first problem, the aim is to select a post box location in each zone of a territory in order to achieve a compromise between user convenience and mail collection costs. In the second application, collection routes must be designed through several post boxes at known locations. Asef-Vaziri, Laporte and Sriskandarajah [3] study the problem of optimally designing a loop-shaped system
for material transportation in a factory. The factory is partitioned into |K| rectilinear zones and the loop must be adjacent to at least one side of each zone, which can be formulated as a GTSP. The GTSP can also be used to model a simple case of the stochastic vehicle routing problem with recourse (Dror, Laporte and Louveaux [12]) and some families of arc routing problems (Laporte [37]). In the latter application, a symmetric arc routing problem is transformed into an equivalent vertex routing problem by replacing edges by vertices. Since the distance from edge e_1 to edge e_2 depends on the traversal direction, each edge is represented by two vertices, only one of which is used in the solution. This gives rise to a GTSP.

6 The generalized minimum perfect matching problem

The E-GMPMP and the L-GMPMP are polynomial. Indeed, the E-GMPMP reduces to a classical MPMP on the shrunk graph, where c_kℓ := min{c_ij : i ∈ V_k, j ∈ V_ℓ} for {k, ℓ} ∈ E_S. Moreover, the L-GMPMP can be reduced to the E-GMPMP.

7 The generalized minimum 2-edge-connected network problem

The Generalized Minimum Cost 2-Edge-Connected Network Problem (E-G2ECN) consists of finding a minimum cost 2-edge-connected subgraph that contains exactly one vertex from each cluster (Figure 2).

Figure 2: A feasible E-G2ECN solution

This problem arises in the context of telecommunications when copper wire is replaced with high capacity optic fiber. Because of its high capacity, this new technology allows for tree-like networks. However, such a network is failure-sensitive: if one edge breaks, the whole network is disconnected. To avoid this situation, the network has to be reliable and must fulfill survivability conditions. Since two failures are unlikely to occur simultaneously, it seems reasonable to ask for a 2-connected network.

This problem is a generalization of the GMSTP. Local networks have to be interconnected by a global network; in every local network, possible locations for a gate (a location where the global network and the local networks can be interconnected) of the global
network are given. This global network has to be connected, survivable and of minimum cost.

The E-G2ECNP and the L-G2ECNP are studied in Huygens [29]. Even when the edge costs satisfy the triangle inequality, the E-G2ECNP and the L-G2ECNP are not equivalent. These problems are NP-hard, and there cannot exist a polynomial-time heuristic with bounded worst-case ratio for the E-G2ECNP. In Huygens [29], new families of facet-defining inequalities for the polytope associated with the L-G2ECNP are provided and heuristic methods are described.

8 The generalized minimum clique problem

In the Generalized Minimum Clique Problem (GMCP), non-negative costs are associated with vertices and edges and the graph is |K|-partite complete. The GMCP consists of finding a subset of vertices containing exactly one vertex from each cluster such that the cost of the induced subgraph (the cost of the selected vertices plus the cost of the edges in the induced subgraph) is minimized (see Figure 3).

Figure 3: A feasible GMCP solution

The GMCP appears in the formulation of particular Frequency Assignment Problems (FAP) (see Koster [34]). Assume that "... we have to assign a frequency to each transceiver in a mobile telephone network; a vertex corresponds to a transceiver. The domain of a vertex is the set of frequencies that can be assigned to that transceiver. An edge indicates that communication from one transceiver may interfere with communication from the other transceiver. The penalty of an edge reflects the priority with which the interference should be avoided, whereas the penalty of a vertex can be seen as the level of preference for the frequencies." (Koster, Van Hoesel and Kolen [35]).

The GMCP can also be used to model the conformations occurring in proteins (see Althaus, Kohlbacher, Lenhof and Müller [2]). These conformations can be adequately described by a rather small set of so-called rotamers for each amino acid. The problem of predicting a protein complex from the structures of its single components can then be reduced to
the search for the set of rotamers, one for each side chain of the protein, with minimum energy. This problem is called the Global Minimum Energy Conformation (GMEC) problem. The GMEC can be formulated as follows. Each residue side chain of the protein can take a number of possible rotameric states. To each side chain is associated a cluster. The vertices of this cluster represent the possible rotameric states for this chain. The weight on a vertex is the energy associated with the chain in this rotameric state. The weight on an edge is the energy coming from the combination of rotameric states of different side chains.

The GMCP is NP-hard (Koster, Van Hoesel and Kolen [35]). Results of a polyhedral study for the GMCP were embedded in a cutting plane approach by these authors to solve difficult instances of frequency assignment problems. The structure of the graph in the frequency assignment application is exploited using a tree decomposition approach. This method gives good lower bounds for difficult instances. Local search algorithms to solve the FAP are also investigated. Two techniques are presented in Althaus, Kohlbacher, Lenhof and Müller [2] to solve the GMEC: a "multi-greedy" heuristic and a branch-and-cut algorithm. Both methods are able to predict the correct complex structure on the instances tested.

9 The generalized Steiner tree problem

The standard generalization of the STP is the T-Generalized Steiner Tree Problem in its "at least" version (L-GSTP). Let T ⊆ V be partitioned into clusters.
The L-GSTP consists of finding a minimum cost tree of G containing at least one vertex from each cluster. This problem is also known as the Group Steiner Tree Problem or the Class Steiner Tree Problem. Figure 4 depicts a feasible L-GSTP solution. The L-GSTP is a generalization of the L-GMSTP, since the L-GSTP defined on a family of clusters describing a partition of V is an L-GMSTP. This problem was introduced by Reich and Widmayer [52].

Figure 4: A feasible L-GSTP solution

The L-GSTP arises in wire-routing with multi-port terminals in physical Very Large Scale Integration (VLSI) design. The traditional model, assuming single ports for each of the terminals to be connected in a net of minimum length, is a case of the classical STP. When the terminal is a collection of different possible ports, so that the net can be connected to any one of them, we have an L-GSTP: each terminal is a collection of ports and we seek a minimum length net containing at least one port from each terminal group. The multiple port locations for a single terminal may also model different choices of placing a single port by rotating or mirroring the module containing the port in the placement (see Garg, Konjevod and Ravi [21]). More detailed applications of the L-GSTP in VLSI design can be found in Reich and Widmayer [52].

The L-GSTP is NP-hard because it is a generalization of an NP-hard problem. When there are no Steiner vertices, the L-GSTP remains NP-hard even if G is a tree (see Section 4). This is a major difference from the classical STP (if we assume either that there are no Steiner vertices or that G is a tree, the complexity of the STP becomes polynomial). Ihler, Reich and Widmayer [31] show that the graph G can be transformed (in linear time) into a graph G′ (without clusters) such that an optimal Steiner tree on G′ can be transformed back into an optimal generalized Steiner tree on G. Therefore, any algorithm for the STP yields an algorithm for the L-GSTP. Even if there exist several contributions on polyhedral aspects (see
among others Goemans [24], Goemans and Myung [23], Chopra and Rao [5], [6]) and exact methods (see for instance Koch and Martin [33]) for the classical problem, only a few are known, as far as we are aware, for the L-GSTP. Polyhedral aspects are studied in Salazar [55] and a lower bounding procedure is described in Gillard and Yang [22].

A number of heuristics for the L-GSTP have been proposed. Early heuristics for the L-GSTP are developed in Ihler [30], with an approximation ratio of |K| − 1. Two polynomial-time heuristics are tested on instances with up to 250 vertices in Ihler, Reich and Widmayer [31], while a randomized algorithm with a polylogarithmic approximation guarantee is provided in Garg, Konjevod and Ravi [21]. A series of polynomial-time heuristics are described in Helvig, Robins and Zelikovsky [27] with a worst-case ratio of O(|K|^ε) for ε > 0. These are shown to empirically outperform one of the heuristics developed in Ihler, Reich and Widmayer [31].

In the Vertex Weighted Steiner Tree Problem (VSTP), introduced by Segev [57], weights are associated with the vertices in V. These weights can be negative, in which case they represent the profit gained by selecting the vertex. The problem consists of finding a minimum cost Steiner tree (the sum of the weights of the selected vertices plus the sum of the weights of the selected edges). This problem is a special case of the Directed Steiner Tree Problem (DSP) (see Segev [57]).
Given a directed graph G = (V, A) with arc weights, a fixed vertex and a subset T ⊆ V, the DSP requires the identification of a minimum weight directed tree rooted at the fixed vertex and spanning T. The VSTP has been extensively studied (see Duin and Volgenant [13], Gorres [25], Goemans and Myung [23], Klein and Ravi [32]). As far as we know, no Generalized Vertex Weighted Steiner Tree Problem has been addressed. An even more general problem would be the Vertex Weighted Directed Steiner Tree Problem.

10 The generalized shortest path problem

Li, Tsao and Ulular [43] describe an S-generalization of the SPP in its "at most" version (M-GSPP). Let o and d be two vertices of G and assume that V \ {o, d} is partitioned into clusters. The M-GSPP consists of determining a shortest path from o to d that contains at most one vertex from each cluster. Note that the T-generalization is of no interest, since it reduces to computing the shortest paths between all pairs of vertices belonging to the two different clusters.

In the problem considered by Li, Tsao and Ulular [43], each vertex is assigned a non-negative weight. The problem consists of finding a minimum cost path from o to d such that the total vertex weight on the path in each traversed cluster does not exceed a non-negative integer ℓ (see Figure 5). This problem with ℓ = 1 and vertex weights equal to one for each vertex coincides with the M-GSPP.

The problem arises in optimizing the layout of private networks embedded in a larger telecommunication network. A vertex in V \ {o, d} represents a digital cross connect center (DCS) that treats the information and ensures the transmission. A cluster corresponds to a collection of DCS located at the same location
ABN-Tree Example: Overview and Explanation
1. Introduction

1.1 Overview

In computer science, the ABN-Tree is a data structure based on the digital tree (trie), used to store and manage large volumes of string data efficiently. By splitting strings into characters and organizing them as a multi-way tree, the ABN-Tree supports fast lookup, insertion and deletion of strings. The design of the ABN-Tree draws on the classical trie and prefix tree, but it offers greater flexibility and extensibility. Unlike a classical trie, an ABN-Tree node maintains a group of characters rather than a single character. This means that one node can represent the common prefix of several strings.

Building an ABN-Tree involves splitting the input strings into characters and assembling a multi-way tree from them. This construction makes the ABN-Tree very efficient for storing and retrieving strings. The ABN-Tree also supports dynamic insertion and deletion, so it can adapt to data that changes in real time.
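"ABN-Tree" is not a standard published structure; based on the description above (each node stores a group of characters forming a shared prefix), it resembles a radix tree (compressed trie). The following is a hypothetical sketch under that assumption, with insertion and lookup; all names are ours:

```python
class Node:
    """A node holds a group of characters (a shared prefix), as described."""
    def __init__(self, prefix=""):
        self.prefix = prefix
        self.children = {}       # first character of child's prefix -> Node
        self.is_word = False

def _common(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def insert(root, word):
    node = root
    while True:
        if not word:
            node.is_word = True
            return
        child = node.children.get(word[0])
        if child is None:        # no branch starts with this character
            node.children[word[0]] = leaf = Node(word)
            leaf.is_word = True
            return
        k = _common(child.prefix, word)
        if k < len(child.prefix):
            # split the child: only word[:k] of its character group is shared
            mid = Node(child.prefix[:k])
            child.prefix = child.prefix[k:]
            mid.children[child.prefix[0]] = child
            node.children[word[0]] = mid
            child = mid
        node, word = child, word[k:]

def search(root, word):
    node = root
    while word:
        child = node.children.get(word[0])
        if child is None or not word.startswith(child.prefix):
            return False
        node, word = child, word[len(child.prefix):]
    return node.is_word
```

Inserting "test", "team" and "toast" produces nodes holding the groups "t", "e", "st", "am" and "oast", illustrating how one node represents a prefix shared by several strings.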
In practice, the ABN-Tree can be widely applied to string matching, auto-completion, spell checking and similar tasks. For example, in a search engine an ABN-Tree can be used to quickly retrieve related search terms and pages; in a word processor it can be used to auto-correct spelling errors and suggest possible completions.

Although the ABN-Tree performs well on large string collections, it also has limitations. For instance, it may be less efficient in scenarios requiring frequent insertions and deletions, and its memory footprint is relatively high, especially when the stored strings have a high degree of repetition.

In the future, as data volumes grow and computing power increases, the ABN-Tree is likely to be further optimized and improved. It is foreseeable that its application potential in data storage and retrieval will keep growing. In summary, the ABN-Tree, as an efficient string data structure, plays an important role in improving processing speed and accuracy, and has broad prospects for development. In the following chapters, we will introduce the definition, principles, construction and practical examples of the ABN-Tree in detail, and discuss and summarize its strengths, limitations and future development.
kd-tree Usage
kd-tree usage: a step-by-step answer.

Introduction: A kd-tree is a data structure for efficiently searching for nearest-neighbor points in k-dimensional space. Its applications span many fields, such as pattern recognition, computer graphics and machine learning. This article introduces kd-tree usage step by step, covering construction, insertion, nearest-neighbor search and deletion.
Step 1: Building the kd-tree. Building the kd-tree is the first step in using it. First, we define a node class to represent each node of the kd-tree. Each node has the following attributes: point (the k-dimensional point stored at the node), left (pointer to the left subtree), right (pointer to the right subtree) and axis (the coordinate axis along which the region is split).

Next, we define a recursive function to build the kd-tree. The function takes a point set and the current node as parameters. Its basic idea is to find the median point of the current point set along the current axis, use it as the current node's point, and split the point set at that median into two subsets, from which the left and right subtrees are built recursively.
The construction procedure is as follows:
1. If the point set is empty, the current node is an empty node; return.
2. Find the median point m along the current axis and use it as the current node's point.
3. Split the point set at m into two subsets: the points smaller than m and the points larger than m.
4. Recursively build the left subtree from the points smaller than m, and make its root the current node's left child.
5. Recursively build the right subtree from the points larger than m, and make its root the current node's right child.
When construction finishes, we have a complete kd-tree.
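The five steps above can be sketched as follows. This is a minimal sketch: the text does not say how the splitting axis is chosen, so we assume the common convention of cycling through the axes with depth:

```python
class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point    # the k-dimensional point stored at this node
        self.axis = axis      # coordinate axis used to split here
        self.left = left
        self.right = right

def build(points, depth=0):
    """Recursively build a kd-tree from a list of k-dimensional points."""
    if not points:                        # step 1: empty set -> empty node
        return None
    k = len(points[0])
    axis = depth % k                      # assumed: cycle through the axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # step 2: median on the current axis
    return Node(points[mid], axis,        # steps 3-5: split and recurse
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))
```

For example, building from the six 2-D points (7,2), (5,4), (9,6), (4,7), (8,1), (2,3) puts (7,2) at the root (median by x) with (5,4) and (9,6) as the roots of the subtrees (medians by y).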
Step 2: Inserting points into the kd-tree. On top of a built kd-tree, we can insert new points. The insertion procedure is similar to the construction procedure.
First, we compare the insertion point with the current node's point m along that node's splitting axis. If the insertion point is smaller than m on that axis, we recurse into the left subtree; if it is greater than or equal to m, we recurse into the right subtree. If the corresponding left or right child of the current node is an empty node, we create a new node there and make it the current node's left or right child.
The insertion procedure is as follows: 1. If the current node is an empty node, create a new node, use the insertion point as its point, and return.
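The compare-and-recurse procedure above can be sketched like this (again a sketch, not authoritative; Node mirrors the attributes listed earlier, and the new leaf's axis is assumed to continue the depth-cycling convention):

```python
class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def insert(node, point, depth=0):
    """Insert a point by recursing along each node's splitting axis."""
    if node is None:                       # step 1: empty node -> new node
        return Node(point, depth % len(point))
    if point[node.axis] < node.point[node.axis]:
        node.left = insert(node.left, point, depth + 1)    # smaller: go left
    else:
        node.right = insert(node.right, point, depth + 1)  # >=: go right
    return node
```

Starting from an empty tree and inserting (5,4), (2,3), (8,1), (9,6) in order puts (5,4) at the root, (2,3) to its left, and (8,1) with child (9,6) to its right.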
Programming English Vocabulary

Chapter 1: JDK (Java Development Kit); JVM (Java Virtual Machine); javac (the compile command); java (the interpret/run command); javadoc (the command that generates Java documentation); classpath; version; author; public; class; static; void; String (the string class); System (the system class); out (output); print (print on the same line); println (print with a newline); JIT (just-in-time).

Chapter 2: byte; char; boolean; short; int; long; float; double; if; else; switch (multi-way branch); case (match against a constant value); break (terminate); default; while; do (do-until loop); for (loop with a known count); continue (end this iteration and start the next); length (number of array elements).

Chapter 3: OOP (object-oriented programming); object; class; class member; class method; class variable; constructor; package; import package.

Chapter 4: extends (inheritance); base class; super class; overloaded method; overridden method; public; private; protected; static; abstract; interface; implements interface.

Chapter 5: Exception; RuntimeException; ArithmeticException; IllegalArgumentException; ArrayIndexOutOfBoundsException; NullPointerException; ClassNotFoundException (class cannot be loaded or found); NumberFormatException (string-to-number conversion error); IOException; FileNotFoundException; EOFException (end of file); InterruptedException (thread interrupted); try; catch; finally; throw; throws; printStackTrace() (print the stack trace); getMessage() (get the error message); getCause() (get the cause of the exception); method; able; instance; check.

Chapter 6: the primitive types byte, char, int, long, float, double, boolean, short and their wrapper classes Byte, Character, Integer, Long, Float, Double, Boolean, Short; digit; letter; lower; upper; space; identifier; start; String; length; equals; ignore; compare; sub (extract a substring); concat (concatenate); replace; trim; Buffer; reverse; delete; append; interrupted.

Chapter 7: Date; after; before; equals; toString (convert to string); setTime; display; Calendar; add; getInstance; getTime; clear; clone; util (utility); components; month; year; hour; minute; second; Random; nextInt; Gaussian; ArrayList; LinkedList; hash; Map; Vector; size; Collection; shuffle; removeFirst; removeLast; lastElement; capacity; contains; copy; search; insertElementAt (insert an element at a given position).

Chapter 8: io (in/out); File; import; exists; isFile; isDirectory; getName; getPath; getAbsolutePath; lastModified (date of last modification); length; InputStream; OutputStream; Unicode (a character-encoding standard using two bytes per character); information; FileInputStream; FileOutputStream; IOException; file object; available; read; write; BufferedReader; FileReader; BufferedWriter; FileWriter; flush; close; DataInputStream (binary file reading); DataOutputStream (binary file writing); EOF; encoding; remote; release.

Chapter 9: JBuilder (a Java integrated development environment, IDE); Enterprise edition; Developer edition; Foundation edition; Messages pane; Structure pane; Project; Files; Source; Design; History; Doc; File; Edit; Search; Refactor; View; Run; Tools; Window; Help; Vector; addElement; Project Wizard; Step; Title; Description; Copyright; Company; Aptech Limited; author; Back; Finish; version; Debug; New; ErrorInsight.

Chapter 10: JFrame (window frame); JPanel (panel); JScrollPane (scroll pane); title; Dimension; Component; Swing (Java lightweight components); getContentPane (get the content pane); LayoutManager (layout manager); setVerticalScrollBarPolicy (set the vertical scroll bar policy); AWT (Abstract Window Toolkit); GUI (Graphical User Interface); VERTICAL_SCROLLBAR_AS_NEEDED (show a scroll bar when the content is larger than the pane); VERTICAL_SCROLLBAR_ALWAYS (always show the scroll bar); VERTICAL_SCROLLBAR_NEVER (never show the scroll bar); JLabel (label); Icon; image; LEFT (left-aligned); RIGHT (right-aligned); JTextField (single-line text field); getColumns; setLayout; BorderLayout; CENTER; JTextArea (multi-line text area); setFont; setHorizontalAlignment; setDefaultCloseOperation; add; JButton; JCheckBox; JRadioButton; addItem; getItemAt; getItemCount; setRolloverIcon (icon shown when the mouse passes over); setSelectedIcon (icon shown when the button is selected); getSelectedItem; getSelectedIndex; ActionListener; ActionEvent; actionPerformed.

Common algorithm terms (Chinese-English glossary): Data Structures; Dictionaries; Priority Queues; Graph Data Structures; Set Data Structures; Kd-Trees; Numerical Problems; Solving
Linear Equations 线性方程组Bandwidth Reduction 带宽压缩Matrix Multiplication 矩阵乘法Determinants and Permanents 行列式Constrained and Unconstrained Optimization 最值问题Linear Programming 线性规划Random Number Generation 随机数生成Factoring and Primality Testing 因子分解/质数判定Arbitrary Precision Arithmetic 高精度计算Knapsack Problem 背包问题Discrete Fourier Transform 离散Fourier变换Combinatorial Problems 组合问题Sorting 排序Searching 查找Median and Selection 中位数Generating Permutations 排列生成Generating Subsets 子集生成Generating Partitions 划分生成Generating Graphs 图的生成Calendrical Calculations 日期Job Scheduling 工程安排Satisfiability 可满足性Graph Problems -- polynomial 图论-多项式算法Connected Components 连通分支Topological Sorting 拓扑排序Minimum Spanning Tree 最小生成树Shortest Path 最短路径Transitive Closure and Reduction 传递闭包Matching 匹配Eulerian Cycle / Chinese Postman Euler回路/中国邮路Edge and Vertex Connectivity 割边/割点Network Flow 网络流Drawing Graphs Nicely 图的描绘Drawing Trees 树的描绘Planarity Detection and Embedding 平面性检测和嵌入Graph Problems -- hard 图论-NP问题Clique 最大团Independent Set 独立集Vertex Cover 点覆盖Traveling Salesman Problem 旅行商问题Hamiltonian Cycle Hamilton回路Graph Partition 图的划分Vertex Coloring 点染色Edge Coloring 边染色Graph Isomorphism 同构Steiner Tree Steiner树Feedback Edge/Vertex Set 最大无环子图Computational Geometry 计算几何Convex Hull 凸包Triangulation 三角剖分Voronoi Diagrams Voronoi图Nearest Neighbor Search 最近点对查询Range Search 范围查询Point Location 位置查询Intersection Detection 碰撞测试Bin Packing 装箱问题Medial-Axis Transformation 中轴变换Polygon Partitioning 多边形分割Simplifying Polygons 多边形化简Shape Similarity 相似多边形Motion Planning 运动规划Maintaining Line Arrangements 平面分割Minkowski Sum Minkowski和Set and String Problems 集合与串的问题Set Cover 集合覆盖Set Packing 集合配置String Matching 模式匹配Approximate String Matching 模糊匹配Text Compression 压缩Cryptography 密码Finite State Machine Minimization 有穷自动机简化Longest Common Substring 最长公共子串Shortest Common Superstring 最短公共父串DP——Dynamic Programming——动态规划recursion ——递归编程词汇A2A integration A2A整合abstract 抽象的abstract base class (ABC)抽象基类abstract class 抽象类abstraction 抽象、抽象物、抽象性access 存取、访问access 
level访问级别access function 访问函数account 账户action 动作activate 激活active 活动的actual parameter 实参adapter 适配器add-in 插件address 地址address space 地址空间address-of operator 取地址操作符ADL (argument-dependent lookup)ADO(ActiveX Data Object)ActiveX数据对象advanced 高级的aggregation 聚合、聚集algorithm 算法alias 别名align 排列、对齐allocate 分配、配置allocator分配器、配置器angle bracket 尖括号annotation 注解、评注API (Application Programming Interface) 应用(程序)编程接口app domain (application domain)应用域application 应用、应用程序application framework 应用程序框架appearance 外观append 附加architecture 架构、体系结构archive file 归档文件、存档文件argument引数(传给函式的值)。
数据结构与算法常用英语词汇
数据结构英语词汇
数据抽象 data abstraction；数据元素 data element；数据对象 data object；数据项 data item；数据类型 data type；抽象数据类型 abstract data type
逻辑结构 logical structure；物理结构 physical structure；线性结构 linear structure；非线性结构 nonlinear structure
基本数据类型 atomic data type；固定聚合数据类型 fixed-aggregate data type；可变聚合数据类型 variable-aggregate data type
线性表 linear list；栈 stack；队列 queue；串 string；数组 array；树 tree；图 graph
查找 searching；更新 updating；排序(分类) sorting；插入 insertion；删除 deletion
前趋 predecessor；后继 successor；直接前趋 immediate predecessor；直接后继 immediate successor；双端队列 deque (double-ended queue)；循环队列 circular queue；指针 pointer
先进先出表(队列) first-in first-out list；后进先出表(队列) last-in first-out list
栈底 bottom；栈顶 top；压入 push；弹出 pop；队头 front；队尾 rear；上溢 overflow；下溢 underflow
数组 array；矩阵 matrix；多维数组 multi-dimensional array；以行为主的顺序分配 row major order；以列为主的顺序分配 column major order
三角矩阵 triangular matrix；对称矩阵 symmetric matrix；稀疏矩阵 sparse matrix；转置矩阵 transposed matrix
链表 linked list；线性链表 linear linked list；单链表 single linked list；多重链表 multilinked list；循环链表 circular linked list；双向链表 doubly linked list；十字链表 orthogonal list；广义表 generalized list
链 link；指针域 pointer field；链域 link field；头结点 head node；头指针 head pointer；尾指针 tail pointer
串 string；空白(空格)串 blank string；空串(零串) null string；子串 substring
树 tree；子树 subtree；森林 forest；根 root；叶子 leaf；结点 node；深度 depth；层次 level；双亲 parents；孩子 children；兄弟 brother；祖先 ancestor；子孙 descendant
二叉树 binary tree；平衡二叉树 balanced binary tree；满二叉树 full binary tree；完全二叉树 complete binary tree；遍历二叉树 traversing binary tree；二叉排序树 binary sort tree；二叉查找树 binary search tree；线索二叉树 threaded binary tree；哈夫曼树 Huffman tree
有序树 ordered tree；无序树 unordered tree；判定树 decision tree；双链树 doubly linked tree；数字查找树 digital search tree
树的遍历 traversal of tree；先序遍历 preorder traversal；中序遍历 inorder traversal；后序遍历 postorder traversal
图 graph；子图 subgraph；有向图 digraph (directed graph)；无向图 undigraph (undirected graph)；完全图 complete graph；连通图 connected graph；非连通图 unconnected graph；强连通图 strongly connected graph；弱连通图 weakly connected graph；加权图 weighted graph；有向无环图 directed acyclic graph；稀疏图 sparse graph；稠密图 dense graph；重连通图 biconnected graph；二部图 bipartite graph
边 edge；顶点 vertex；弧 arc；路径 path；回路(环) cycle；弧头 head；弧尾 tail；源点 source；终点 destination；汇点 sink；权 weight
连接点 articulation point；初始结点 initial node；终端结点 terminal node；相邻边 adjacent edge；相邻顶点 adjacent vertex；关联边 incident edge；入度 indegree；出度 outdegree；最短路径 shortest path
有序对 ordered pair；无序对 unordered pair；简单路径 simple path；简单回路 simple cycle；连通分量 connected component
邻接矩阵 adjacency matrix；邻接表 adjacency list；邻接多重表 adjacency multilist；遍历图 traversing graph
生成树 spanning tree；最小(代价)生成树 minimum (cost) spanning tree；生成森林 spanning forest
拓扑排序 topological sort；偏序 partial order；拓扑有序 topological order；AOV网 activity on vertex network；AOE网 activity on edge network；关键路径 critical path
匹配 matching；最大匹配 maximum matching；增广路径 augmenting path；增广路径图 augmenting path graph
查找 searching；线性查找(顺序查找) linear search (sequential search)；二分查找 binary search；分块查找 block search；散列查找 hash search；平均查找长度 average search length；散列表 hash table；散列函数 hash function
直接定址法 immediately allocating method；数字分析法 digital analysis method；平方取中法 mid-square method；折叠法 folding method；除法 division method；随机数法 random number method
排序 sort；内部排序 internal sort；外部排序 external sort
插入排序 insertion sort；缩小增量排序 diminishing increment sort；选择排序 selection sort；堆排序 heap sort；快速排序 quick sort；归并排序 merge sort；基数排序 radix sort
平衡归并排序 balance merging sort；二路平衡归并排序 balance two-way merging sort；多步归并排序 polyphase merging sort；置换选择排序 replacement selection sort
文件 file；主文件 master file；顺序文件 sequential file；索引文件 indexed file；索引顺序文件 indexed sequential file；索引非顺序文件 indexed non-sequential file；直接存取文件 direct access file；多重链表文件 multilist file；倒排文件 inverted file
目录结构 directory structure；树型索引 tree index
机器学习常用的KD-Tree 深度讲解
从线段树到KD树
在讲KD树之前，我们先来了解一下线段树的概念。
线段树在机器学习领域当中不太常见,作为高性能维护的数据结构,经常出现在各种算法比赛当中。
线段树的本质是一棵维护一段区间的平衡二叉树。
比如下图就是一个经典的线段树:从下图当中我们不难看出来,这棵线段树维护的是一个区间内的最大值。
比如树根是8,维护的是整个区间的最大值,每一个中间节点的值都是以它为树根的子树中所有元素的最大值。
通过线段树，我们可以在O(log N)的时间内计算出某一个连续区间的最大值。
比如我们来看下图:当我们要求被框起来的区间中的最大值,我们只需要找到能够覆盖这个区间的中间节点就行。
我们可以发现被红框框起来的两个节点的子树刚好覆盖这个区间,于是整个区间的最大值,就是这两个元素的最大值。
这样，我们就把一个原本需要O(N)的查找问题降低成了O(log N)，不但如此，我们也可以做到O(log N)复杂度内的更新，也就是说我们不但可以快速查询，还可以更新线段当中的元素。
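为了更具体地感受这一点，下面给出一段示意性的Python实现（并非原文代码，类名与函数名均为举例）：建树时自顶向下把区间一分为二，查询时只访问能覆盖目标区间的少量结点：

```python
class SegNode:
    def __init__(self, lo, hi, value, left=None, right=None):
        self.lo, self.hi = lo, hi          # 该结点维护的区间 [lo, hi)
        self.value = value                 # 该区间内的最大值
        self.left, self.right = left, right

def build(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr)
    if hi - lo == 1:                       # 叶子结点：单个元素
        return SegNode(lo, hi, arr[lo])
    mid = (lo + hi) // 2                   # 区间一分为二，递归建树
    left, right = build(arr, lo, mid), build(arr, mid, hi)
    return SegNode(lo, hi, max(left.value, right.value), left, right)

def query_max(node, lo, hi):
    # 查询区间 [lo, hi) 的最大值，只访问 O(log N) 个结点
    if lo <= node.lo and node.hi <= hi:    # 结点区间被完全覆盖，直接取值
        return node.value
    mid = (node.lo + node.hi) // 2
    if hi <= mid:
        return query_max(node.left, lo, hi)
    if lo >= mid:
        return query_max(node.right, lo, hi)
    return max(query_max(node.left, lo, hi), query_max(node.right, lo, hi))

root = build([3, 8, 1, 6, 5, 7, 2, 4])
print(query_max(root, 2, 6))  # 子区间 [1, 6, 5, 7] 的最大值：7
```

query_max对完全落在查询区间内的子树直接取结点上保存的最大值，不再继续下探，这正是查询只需对数级时间的原因。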
当然线段树的应用非常广泛,也有许多种变体,这里我们不过多深入,感兴趣的同学可以期待一下周三的算法与数据结构专题,在之后的文章当中会为大家分享线段树的相关内容。
在这里,我们只需要有一个大概的印象,线段树究竟完成的是什么样的事情即可。
线段树维护的是一个线段,也就是区间内的元素,也就是说维护的是一个一维的序列。
如果我们将数据的维度扩充一下，扩充到多维呢？是的，你没有猜错，从某种程度上来说，我们可以把KD-Tree看成是线段树向多维空间的拓展。KD-Tree中的K指的是dimension，也就是维度，也就是说KD-Tree就是K维树的意思。
在我们构建线段树的时候,其实是一个递归的建树过程,我们每次把当前的线段一分为二,然后用分成两半的数据分别构建左右子树。
我们可以简单写一下伪代码，来更直观地感受一下：

class Node:
    def __init__(self, value, lchild=None, rchild=None):
        self.value = value
        self.lchild = lchild
        self.rchild = rchild

def build(arr):
    # 只剩一个元素时作为叶子结点，递归终止
    if len(arr) == 1:
        return Node(arr[0])
    n = len(arr)
    left, right = arr[:n // 2], arr[n // 2:]
    lchild, rchild = build(left), build(right)
    return Node(max(lchild.value, rchild.value), lchild, rchild)

我们来看一个二维的例子，在一个二维的平面当中分布着若干个点。
基于SVM-Kd-tree的树型粗分类方法
作者：胡素黎 黄丰喜 刘晓英 来源：《软件导刊》2020年第04期
摘要：为提高大数据集粗分类识别率，提出一种基于聚类分析的SVM-Kd-tree树型粗分类方法。
首先根据数据集特征分布进行k-means两簇聚类,对聚类后的数据集进行类别分析,同时将属于两簇的同一类别样本划分出来;然后使用两簇中剩余样本训练SVM二分类器并作为树型结构根节点,将两簇数据分别合并,将划分出来的样本作为左右子孩子迭代构建子节点,直到满足终止条件后,叶子节点开始训练Kd-tree。
实验结果表明，迭代构建树型粗分类方法使训练单SVM平均时间减少了61.9774%，比Kd-tree同近邻数量的准确率提高了0.03%。
在进行大规模数据集粗分类时，使用聚类分析迭代构建组合分类器时间更短、准确率更高。
关键词：SVM分类；Kd-tree；树型；组合分类器；K-means；聚类
DOI: 10.11907/rjdk.191714
开放科学（资源服务）标识码（OSID）
中图分类号：TP301 文献标识码：A 文章编号：1672-7800(2020)004-0111-04
Tree-based Rough Classification Method Based on SVM-Kd-tree
HU Su-li, HUANG Feng-xi, LIU Xiao-ying
(Beijing Xitui Technology Co., Ltd., Beijing 100026, China)
Abstract: In order to improve the rough classification accuracy of large data sets, an SVM-Kd-tree tree classification method based on cluster analysis is proposed. Firstly, the training data set is clustered by K-means into two clusters according to the feature distribution, and the samples of the same category belonging to both clusters are left out. Then the remaining samples in the two clusters are used to train an SVM as the root node of the tree structure. The two clusters of data, combined with the left-out samples, separately construct the left and right child nodes. This process is iterated until the termination condition is met, and the samples of each leaf node are used to train a Kd-tree. The experimental results show that the iterative construction of the tree-based rough classification method reduces the average time for training a single SVM by 61.9774%, and its accuracy is 0.03% higher than that of a Kd-tree with the same number of neighbors. For rough classification on large-scale data sets, iteratively constructing ensemble classifiers with cluster analysis takes less time and achieves higher accuracy.
Key Words: SVM; Kd-tree; tree; ensemble classifier; K-means; cluster
0 引言
近年来，大数据集分类在人工智能领域应用广泛。
kdtree用法 -回复
kdtree用法：一步一步回答。导言：kdtree是一种数据结构，用于高效地组织和搜索多维空间中的数据。
它是一种基于二分搜索的树形数据结构,旨在快速定位给定点附近的数据。
这篇文章将详细介绍kdtree的用法,并一步一步回答关于它的常见问题。
第一步:什么是kdtree?kdtree(k-dimensional tree)是一种用于存储k维空间中数据的二叉树。
它的目的是将数据划分为多个相邻的超矩形区域,并用树结构来组织和快速索引这些区域。
每个节点都代表一个k维空间中的超矩形区域,并且每个节点的左右子节点都表示他们所代表的区域的划分。
第二步:如何构建kdtree?构建kdtree的过程可以通过以下步骤完成:1. 选择根节点:从数据集中选择一个点作为根节点。
可以选择任何一个点,通常选择数据集中的中值点。
2. 选择划分维度:确定当前节点在哪个维度上进行划分。
可以使用一些启发式方法来选择,例如轮流选择、方差最大化等。
3. 左右子节点的构建:将数据集中的点根据划分维度的值分成两个子集,分别作为当前节点的左右子节点。
左子节点包含小于等于划分值的点,右子节点包含大于划分值的点。
4. 递归构建:对于每个子节点,重复步骤2和3,直到所有数据都被用于构建kdtree。
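按照上面的步骤1～4，可以写出一个简化的Python建树示意（命名与细节均为举例假设：切分维度轮流选择，切分值取中位数，中位数点本身保存在结点上）：

```python
class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point                 # 本结点保存的切分点
        self.axis = axis                   # 本结点使用的切分维度
        self.left, self.right = left, right

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])          # 步骤2：轮流选择切分维度
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # 步骤2：取该维度上的中位数
    return KDNode(points[mid], axis,       # 步骤3、4：左右子集递归建树
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1))

pts = [(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)]
root = build_kdtree(pts)
print(root.point)  # (7, 2)：根结点取x维度上的中位数点
```

小于切分值的点进入左子树、大于切分值的点进入右子树；与切分值相等的点落在哪一侧只是实现细节。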
第三步:如何在kdtree中搜索?在kdtree中搜索一个点的过程可以通过以下步骤完成:1. 根节点的搜索:从根节点开始,比较要搜索的点与当前节点的划分值。
如果要搜索的点小于划分值,则进入左子节点,否则进入右子节点。
2. 递归搜索:对于每个子节点,重复步骤1,直到找到一个叶子节点为止。
3. 回溯:从找到的叶子节点开始,向上回溯,查找距离目标点更近的点。
在回溯过程中,如果目标点离当前节点所代表的区域更近,则进入当前节点的另一个子节点。
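上面"搜索到叶子、再向上回溯"的过程可以用下面的Python示意代码表达（为便于独立运行自带一个简化建树函数，命名均为举例假设）：

```python
def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])          # 轮流选择切分维度
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # 中位数点保存在结点上
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, target, best=None):
    if node is None:
        return best
    # best 形如 (当前最近点, 距离平方)
    d2 = sum((a - b) ** 2 for a, b in zip(node["point"], target))
    if best is None or d2 < best[1]:
        best = (node["point"], d2)
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff <= 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)     # 先下探目标点所在的一侧
    if diff ** 2 < best[1]:                # 回溯：切分面比当前最近距离更近时
        best = nearest(far, target, best)  # 才需要进入另一侧子树
    return best

root = build_kdtree([(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)])
point, d2 = nearest(root, (9, 2))
print(point)  # (8, 1)
```

只有当目标点到切分面的距离平方diff**2小于当前最优距离平方时，另一侧子树才可能包含更近的点，否则整棵子树被剪掉。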
第四步:kdtree的优缺点是什么?kdtree有以下优点:- 高效的搜索:kdtree通过划分空间来组织数据,能够快速定位给定点附近的数据。
kdtree通俗原理
英文回答：
Kd-tree, short for k-dimensional tree, is a data structure that is commonly used for organizing multidimensional data in computer science. It is particularly useful for solving nearest neighbor search problems. The basic idea behind a kd-tree is to partition the data space into regions, such that each region contains a subset of the data points.
To construct a kd-tree, we start with a set of data points. We choose a splitting axis and a splitting value that divides the data points into two subsets. The splitting axis is chosen based on some criteria, such as the axis with the largest variance in the data. The splitting value is usually the median value along the chosen axis. The data points are then divided into two subsets based on whether they are less than or greater than the splitting value along the splitting axis.
This process is repeated recursively for each subset until a stopping condition is met. The stopping condition can be a maximum depth of the tree, a minimum number of data points in a leaf node, or any other criteria that we choose. At each level of the tree, we alternate the splitting axis to ensure that the data points are evenly distributed across the tree.
Once the kd-tree is constructed, we can perform nearest neighbor search by traversing the tree based on the splitting criteria. Starting at the root node, we compare the query point with the splitting value along the splitting axis. Based on the comparison, we choose the child node to visit next. We continue this process until we reach a leaf node, which contains a subset of data points. We then calculate the distance between the query point and each data point in the leaf node, and return the nearest neighbor.
Kd-tree has several advantages over other data structures for nearest neighbor search. It has a relatively fast construction time, especially for high-dimensional data.
It also has a low memory footprint, as it only requires storing the splitting axis and splitting value at each node. Additionally, kd-tree can be efficiently used for range search queries and k-nearest neighbor search.中文回答:Kd树,即k维树,是计算机科学中常用的一种用于组织多维数据的数据结构。
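上文提到kd树还能高效支持范围查询（range search）。下面是一段纯Python示意实现（命名为举例假设，并非某个库的接口）：当查询矩形不触及某一侧半空间时，整棵子树即可剪枝：

```python
def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])          # 轮流选择切分维度
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def range_search(node, lo, hi, out):
    # 收集所有在每个维度上都满足 lo[i] <= p[i] <= hi[i] 的点
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[axis] <= p[axis]:                # 查询矩形触及左半空间才下探
        range_search(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:                # 查询矩形触及右半空间才下探
        range_search(node["right"], lo, hi, out)
    return out

root = build_kdtree([(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)])
print(sorted(range_search(root, (3, 1), (8, 5), [])))  # [(5, 4), (7, 2), (8, 1)]
```

左子树中各点在切分维度上都不大于结点值、右子树都不小于结点值，因此与查询矩形不相交的半空间可以整体跳过。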
无爪图的支撑k-端点树的存在性
第 43 卷第 3 期2024年 5 月Vol.43 No.3May 2024中南民族大学学报(自然科学版)Journal of South-Central Minzu University(Natural Science Edition)无爪图的支撑k-端点树的存在性严政,李丽珠*(长江大学信息与数学学院,湖北荆州434000)摘要树T中度为1的点称为叶子,叶子数目不超过k的树称为k-端点树. 图中存在一个哈密尔顿路,说明图中存在恰好含有两个叶子的支撑树. 自然就有了关于哈密尔顿路问题的一个推广:考虑图中至多有k个叶子的支撑树即支撑k-端点树的存在性问题. 通过控制集参数,确定了连通无爪图中存在支撑k-端点树条件.关键词无爪图;支撑树;叶子;控制集中图分类号O157.5 文献标志码 A 文章编号1672-4321(2024)03-0424-04doi:10.20056/ki.ZNMDZK.20240318Existence of spanning k-ended trees in claw-free graphsYAN Zheng,LI Lizhu*(College of Information and Mathematics, Yangtze University, Jingzhou 434000, Hubei China)Abstract Let T be a tree. A vertex of degree one is a leaf of T. A tree having at most k leaves is called a k-ended tree. A Hamiltonian path is a spanning tree having exactly two leaves. From this point of view,some sufficient conditions for a graph to have a Hamiltonian path are modified to those for a spanning k-ended tree. A sufficient condition using dominating set is given for a connected claw-free graph who has spanning k-ended tree.Keywords claw-free graph; spanning tree; leaf; dominating set图的结构问题是图论研究的核心,其中图的支撑树特征问题是结构图论中一个重要的研究课题,其产生和发展与著名的哈密尔顿问题密切相关. 哈密尔顿问题是在著名的四色猜想研究的基础上产生和发展起来的,受到国内外许多图论学者的关注. 图中存在一个哈密尔顿路,说明图中存在恰好含有两个叶子的支撑树. 自然就有了关于哈密尔顿问题的一个推广:什么条件下图中存在至多有k个叶子的支撑树即支撑k-端点树. 图的哈密尔顿路判定问题是一个NP-完全问题[1],一个图是否存在支撑k-端点树也是一个NP-完全问题[2],图的参数是研究该类问题的主要方向. 本文利用图的控制集条件,给出了连通无爪图中存在支撑k-端点树的条件.1 准备知识设G=(V(G),E(G))是一个图,其中V(G)和E (G)分别表示图G的点集和边集,用|V(G)|和|E(G)|分别表示图G的点数和边数,也可将|V(G)|简记为|G|. 当图G没有环和重边时,称该图为一个简单图,本文研究的均为有限简单图. 设v为图G的一个顶点,deg G(v)表示在图G中与点v相关联的边数,称为点v在图G中的度数. 若图G中两个点不相邻,则称它们是独立的. 若V(G)的子集中任意两个点都是独立的,则称该子集为G的独立点集. G中最大点独立集的点数称为图G的独立数,记作α(G). 设k为一个正整数,定义度和:σK=ìíîïïmin{∑v∈S deg G(v)} α(G)≥k+∞ 否则,其中S为图G的点独立集,且|S|=k.当图G中不含同构于K1,r的诱导子图时,称图G 为K1,r-free的,特别的,称K1,3-free图为无爪图. 
设D⊂V(G),若对任意u∈V(G)-D,存在v∈D,使得uv∈E收稿日期2023-06-12 *通信作者李丽珠,研究方向:图论,E-mail:******************作者简介严政(1982-),男,副教授,博士,研究方向:图论,E-mail:*********************基金项目国家自然科学基金资助项目(12271061);湖北省教育厅科学技术研究资助项目(D2*******)第 3 期严政,等:无爪图的支撑k -端点树的存在性(G ),则称D 为图G 的一个控制集. 含k 个顶点的控制集称为k -元素控制集.连通无圈的图称为树,通常用T 表示. 若树T 为图G 的一个子图,且包含图G 的所有顶点,则称T 为图G 的支撑树. 树中度为1的点称为树的叶子或悬挂点,记树T 中所有叶子组成的集合为leaf (T ). 图G 中叶子数不超过k 的支撑树称为支撑k -端点树. 树中度不小于3的点称为分支点,记树T 中所有分支点组成的集合为B (T ). 设P =v 0v 1…v s 为树T 中一条从v 0到v s 的路,其中deg T (v 0)=1,deg T (v s)≥3,其余点的度为2(若存在),则称路P 为T 的一条悬挂链.若u 和v 是树T 的两个顶点,则在T 中从u 到v 的路记作uTv . 设P 1为一条从x 到y 的路,P 2为一条从z 到w 的路,若y ,z 相邻,则P 1∪P 2表示一条从x 到z 且经过边yz 的路.对两个n 元组α=(a 1,a 2,…,a n )和β=(b 1,b 2,…,b n ),若0≤a 1≤a 2≤…≤a n ≤,0≤b 1≤b 2≤…≤b n (n ≥2),当∑i =1na i =∑i =1nb i且存在a n =b n ,a n -1=b n -1,…,a t +1=b t +1,a t >b t (2≤t ≤n)时,称α比β更优.对于图中含有支撑k -端点树的问题,可以从独立数、度和、连通度等参数的角度进行研究. 最早由WIN 在[3]中得到了有关独立数条件的结论:定理 1[3]:若图G 为m -连通图且α(G )≤m +k -1,其中k ≥2,则图G 中存在支撑k -端点树.HAJO 在[4]中得到了关于度和条件的结论.定理 2[4]:若图G 为顶点数至少为2的连通图且σ2(G )≥|G|-k +1,其中k ≥2,则图G 中存在支撑k -端点树.此外,在[5]中KANO 等人考虑了连通无爪图,得到下列结论:定理 3[5]:若图G 为连通无爪图,且σk +1(G )≥|G|-k ,其中k ≥2,则图G 中存在支撑k -端点树.KYAW [6]得到了关于K 1,4-free 连通图的结论.定理 4[6]:若图G 为连通K 1,4-free 图,且σk +1(G )≥|G |-k 2,其中k ≥3,则图G 中存在支撑k -端点树.陈圆等人[7]给出了m -连通图存在支撑3-端点树的结论.定理 5[7]:若图G 为m -连通K 1,4-free 图,设m ≥2,若σm +3(G )≥|G|+2m -2,则图G 中存在支撑3-端点树.孙培[8]提出了有关K 1,5-free 连通图含支撑5-端点树的度和条件.定理 6[8]:若图G 为连通K 1,5-free 图,且σ6(G)≥|G|-1,则图G 中存在支撑5-端点树.AGEEV 在[9]中利用控制集条件,给出了无爪图的哈密尔顿连通条件.定理 7[9]:若图G 为2-连通无爪图,且存在2-元素控制集,则图G 为哈密尔顿连通图.张胜贵[10]等人将上述条件中的2-连通图减弱为连通图条件,得到下列结论.定理 8[10]:若图G 为连通无爪图,且存在2-元素控制集,则图G 中存在哈密尔顿路.更多与叶子有关的支撑树问题可以参考文献[11]、[12]、[13]、[14]、[15]等.本文利用控制集参数,给出了连通无爪图含有支撑k -端点树的条件,证明了下述定理:定理 9:若图G 为连通无爪图且存在k -元素控制集,其中k ≥2,则图G 中存在支撑k -端点树.2 定理9的证明下面引理给出了树中叶子数和顶点度的关系.引理 1:设T 为一个树,X ⊂V (T )且对任意的x ∈X 均有deg T (x )≥3,则T 的叶子数满足:|Leaf (T)|=∑x ∈X(deg T (x )-2)+2.定理9的证明:当k =2时,由定理8知,图中存在一条哈密尔顿路,即图中存在支撑2-端点树,此时定理成立. 因此下面考虑k ≥3.设图G 满足定理9的条件,但不存在支撑k -端点树,选择图G 的一个支撑树T ,使得:(T1) |Leaf (T )|尽可能小.不失一般性,设|Leaf (T )|={x 1,x 2,…,x l }. 
显然l ≥k +1,否则树T 就是一个满足定理结论的支撑树,定理成立. 在树T 中,存在l 个悬挂链,记作P 1,P 2,…,P l ,其中x i (1≤i ≤l )分别为P i 的叶子,令1≤|P 1|≤|P 2|≤…≤|P l |.(T2) 在满足条件(T1)的情况下,|I (T )|尽可能大.(T3) 在满足条件(T1)和(T2)的情况下,p =(|P 1|,|P 2|,…,|P l |)尽可能最优.结论1:|Leaf (T )|为图G 的点独立集.反证法,假设结论1不成立. 不失一般性,设|Leaf (T )|中存在两点x i ,x j 在G 中相邻,即x i x j ∈E (G ),其中1≤i ≠j ≤l . 则T +x i x j 中含唯一的圈C ,圈C 上必有度大于等于3的点,记为u . 设u ’为圈C 上与u 相邻的点,则有T *=T +x i x j -uu ’,当deg G (u ’)=2时,Leaf (T *)=(Leaf (T )-{x i ,x j })∪{u ’};当deg G (u ’)≥3时,Leaf (T *)=425第 43 卷中南民族大学学报(自然科学版)Leaf(T)-{x i,x j}. 综上有|Leaf(T*)|<|Leaf(T)|,与(T1)矛盾. 故|Leaf(T)|为图G中的独立集.设W={w1,w2,…,w t}为图G的一个最小元素控制集,令D i=N G(w i)∪{w i}(1≤i≤t),称w i为D i的控制点. 显然有∪i=1t D i=V(G). 记|I(T)|=∪u,v∈B()T uTv.结论2:I(T)≥2.反证法,假设|I(T)|=1,记|I(T)|={v},则T中其余点均在悬挂链上. 由引理1知deg T(v)=|Leaf(T)|≥k+1≥4,设NT(v)={v1,v2,v3,v4,…}. 因为图G为无爪图,不失一般性,设v1,v2在图G中相邻,由结论1知v1,v2至多只有一个叶子.若v1,v2中有一个叶子,假设为v1,则此时有T*=T+v1v2-v2v,Leaf(T*)=Leaf(T)-{v1},即|Leaf(T*)|<|Leaf(T)|,与(T1)矛盾.故v1,v2不是叶子,deg T(v1)=deg T(v2)=2,有T*=T+v1v2-v2v. 此时Leaf(T*)=Leaf(T),I(T*)=I(T)∪{v1},即|I(T*)|>|I(T)|,这与(T2)矛盾.故I(T)中至少有两个度数大于等于3的点,即I(T)≥2.结论3:对任意两点xi,x j∈Leaf(T),x i,x j不属于同一个D m(1≤m≤t).反证法,假设结论不成立,存在一个D m(1≤m≤t),有两个叶子xi,x j在D m中,即x i,x j∈D m. 由结论1和控制集定义知,x i x j∉E(G),且x i w m,x j w m∈E(G),w m∉Leaf(T). 下面分情况讨论:情况1:w m∈I(T)由结论2知,w m在I(T)中有邻点,记为w m*,即w m wm*∈E(T)且w m*∈I(T). 由于图G为无爪图,{w m,w m *,xi,x j}中至少有两点相邻,而x i,x j不相邻,故w m*与x i,x j中至少一个点相邻,不失一般性,设x i w m*∈E (G). 则有T*=T+x i w m+x i w m*-w m w m*-x i x i’,其中x i’为x i在T中的邻点.当deg T(x i’)≥3时,Leaf(T*)=Leaf(T)-{x i},此时|Leaf(T*)|<|Leaf(T)|,与(T1)矛盾. 当deg T(x i’)=2时,Leaf(T*)=(Leaf(T)-{x i})∪{x i’},有I(T*)=I(T)∪{x i},即|Leaf(T*)|=|Leaf(T)|,|I(T*)|>|I(T)|,与(T2)矛盾.故此种情况不成立. x j w m*∈E(G)时,同理不成立.情况2:w m∉I(T)wm不在I(T)上,即w m在悬挂链上. 记P i,P j,P w 分别为T中x i,x j和w m所在的悬挂链,且|P i|≤|P j|. 因为wm∉Leaf(T),记N T(w m)={y1,y2},其中y1为离叶子更近的点,x i’、x j’分别为x i、x j在T中的邻点.(1) 若P w=P i,有T*=T+x j w m-y2w m.当deg T(y2)≥3时,Leaf(T*)=Leaf(T)-{x j}. 
此时|Leaf(T*)|<|Leaf(T)|,与(T1)矛盾.当deg T(y2)=2时,Leaf(T*)=(Leaf(T)-{x j})∪{y2},|I(T*)|=|I(T)|,P j*=Pj∪w m P i x i,|P j*|=|P j|,与(T3)矛盾.故此种情况不成立.(2) 若P w=P j.图G为无爪图,则y2在图G中与x i、x j中至少一点相邻.若x i y2∈E(G),则令T*=T+x i w m+x i y2-w m y2-x i x i’,当deg T(x i’)≥3时,Leaf(T*)=Leaf(T)-{x i},则|Leaf(T*)|< |Leaf(T)|,与(T1)矛盾;deg T(x i’)=2时,Leaf(T*)=(Leaf(T)-{x i})∪{x i’},P j*=P j∪{x i},则|Leaf(T*)|=|Leaf (T)|,|I(T*)|=|I(T)|,|P j*|>|P j|,与(T3)矛盾.若x j y2∈E(G),则令T*=T+x i w m+x j y2-w m y2-x i’x i’’,其中x i’为链P i上的分支点,x i’’为悬挂链P i上与x i’相邻的点. 此时有Leaf(T*)=(Leaf(T)-{x i,x j})∪{x i’’},即|Leaf(T*)|<|Leaf(T)|,与(T1)矛盾,此种情况不成立.(3) 若P w≠P i且P w≠P j.当|P w|≤|P j|时,令T*=T+x j w m-w m y2. 若deg T(y2)≥3,则Leaf(T*)=Leaf(T)-{x j},此时|Leaf(T*)|<|Leaf(T)|,与(T1)矛盾. 若deg T(y2)=2,则Leaf(T*)=(Leaf(T)-{x j})∪{y2},|I(T*)|=|I(T)|,P j*=P j∪w m P w x w(其中x w为悬挂链P w上的叶子),则|P j*|>|P j|,与(T3)矛盾,此种情况不成立.当|P w|>|P j|>|P i|时,由图G为无爪图和结论1可知,y2在G中与x i,x j至少一点相邻.若x i y2∈E(G),则令T*=T+x i w m+x i y2-w m y2-x i x i’. 当deg T(x i’)≥3时,Leaf(T*)=Leaf(T)-{x i},即|Leaf(T*)|< |Leaf(T)|,与(T1)矛盾. 当deg T(x i’)=2时,Leaf(T*)=(Leaf(T)-{x i})∪{x i’},|I(T*)|=|I(T)|,P w*=P w∪{x i},则|Leaf(T*)|=|Leaf(T)|,|P w*|>|P w|,与(T3)矛盾. 此种情况不成立.若x j y2∈E(G),则令T*=T+x j w m+x j y2-w m y2-x j x j’,同理得到矛盾. 此种情况不成立.综上所述,T中任意两个不同的叶子不属于同一个D m(1≤m≤t).由结论3知任意两个不同的叶子属于不同的D i (1≤i≤t),从而t≥l≥k+1,即图G至少有k+1个控制点,与定理条件矛盾,假设不成立. 因此对任意有k-元素控制集的连通无爪图G存在支撑k-端点树. 定理得证.426第 3 期严政,等:无爪图的支撑k-端点树的存在性3 结论本文讨论了连通无爪图中有k-元素控制集时,该图存在支撑k-端点树. 下面的例子说明定理9条件中k-元素控制集条件是最优的. 设k、m为正整数,其中k≥2,m≥1,构造一个顶点数为3k+3的圈C3k+3= x1v1y1x2v2y2…x k+1v k+1y k+1x1,将x i与y i相连(1≤i≤k+1),令D1,D2,…,Dk+1为k+1个顶点不相交的完全图K m,将Di中的每个点均与v i相连,所得的图记为G. 容易验证,图G是一个无爪图,{v1,v2,…,v k+1}为图G的k+1-元素控制集,图G不存在k-元素控制集,且图G中不存在支撑k-端点树.定理9从控制集参数的角度给出了判断图中是否存在支撑k-端点树的条件,利用该条件可判断部分定理3所不能判断的图类,为图中存在支撑k-端点树的判定提供了一种新的判定方法. 设k,m为整数,k≥2,m≥2,构造一个顶点数为3k的圈C3k= x1v1y1x2v2y2…x k v k y k x1,将x i与y i相连(1≤i≤k),令D1,D2,…,D k为k个顶点不相交的完全图K m,将D i中的每个点均与v i相连,所得的图记为G. 显然图G为无爪图,且|G|=k(3+m),{v1,v2,…,v k}为其k-元素控制集. 
而σk+1(G)=3k+m=|G|-(k-1)m<|G|-k,因此由定理3不能判定该图是否存在支撑k-端点树,但由定理9可知该图存在支撑k-端点树.参考文献[1]HARTMANIS J. Computers and intractability: A guide to the theory of NP-completeness [J]. SIAM Review, 1982,24(1): 90-91.[2]CARVALHO I. QUBO formulations for NP-Hard spanning tree problems[EB/OL]. 2022:arXiv:2209.05024.https:///abs/2209.05024.pdf.[3]WIN S. On a conjecture of Las Vergnas concerning certain spanning trees in graphs[J]. Results in Mathematics,1979, 2(1-2): 215-224.[4]BROERSMA H,TUINSTRA H. Independence trees and Hamilton cycles[J]. Journal of Graph Theory, 1998, 29(4): 227-237.[5]MATSUDA H,OZEKI K,YAMASHITA T. Spanning trees with a bounded number of branch vertices in a claw-free graph[J]. Graphs and Combinatorics, 2014, 30(2):429-437.[6]KYAW A. Spanning trees with at most k leaves in K1, 4-free graphs[J]. Discrete Mathematics, 2011, 311(20):2135-2142.[7]CHEN Y, CHEN G T, HU Z Q. Spanning 3-ended trees in k-connected K_1,4-free graphs[J]. Science ChinaMathematics, 2014, 57(8): 1579-1586.[8]HU Z Q, SUN P. Spanning 5-ended trees in $$K1, 5$$K1,5-free graphs[J]. Bulletin of the Malaysian MathematicalSciences Society, 2020, 43(3): 2565-2586.[9]AGEEV A A. Dominating sets and hamiltonicity inK1, 3-free graphs[J]. Siberian Mathematical Journal, 1994, 35(3): 421-425.[10]ZHENG W,BROERSMA H,WANG L G,et al.Conditions on subgraphs,degrees,and dominationfor Hamiltonian properties of graphs[J]. DiscreteMathematics, 2020, 343(1): 111644.[11]AKIYAMA J,KANO M. [a,b]-factorizations[M]// Lecture Notes in Mathematics. Berlin,Heidelberg:Springer Berlin Heidelberg, 2011.[12]OZEKI K,YAMASHITA T. Spanning trees:A survey [J]. Graphs and Combinatorics, 2011, 27(1): 1-26.[13]EGAWA Y, MATSUDA H, YAMASHITA T, et al. Ona spanning tree with specified leaves[J]. Graphs andCombinatorics, 2008, 24(1): 13-18.[14]TSUGAKI M, YAMASHITA T. Spanning trees with few leaves[J]. 
Graphs and Combinatorics, 2007, 23(5): 585-598.[15]FLANDRIN E, KAISER T, KUŽEL R, et al. Neighborhood unions and extremal spanning trees[J]. Discrete Mathematics, 2008, 308(12): 2343-2350.（责编&校对 雷建云）
kd tree构建过程
kd-tree（也叫k维树）是一种用于解决多维空间搜索问题的数据结构。
它的构建过程涉及到数据的分割和递归,下面将详细讨论kd-tree的构建过程。
首先,kd-tree是一种二叉树,每个节点表示一个k维的数据点。
根据构建规则,kd-tree的每个节点都是一个超矩形,该超矩形划分了k维空间,即数据点可以根据某种标准分布在超矩形的两个子空间中。
构建kd-tree的基本思想是在每一次构建过程中,选择一个合适的维度作为切分维度,并根据该维度的中位数将数据点分割成两个子集。
这样,在每个节点上都有一个切分维度和切分值。
下面具体介绍kd-tree的构建过程:1.首先,确定根节点。
根节点的选择可以采用多种策略,常见的是选择数据集中的一个点作为根节点。
2.确定切分维度和切分值。
在每一次构建过程中,需要选择一个合适的维度作为切分维度,并根据该维度的中位数作为切分值。
切分维度的选择可以采用循环方式,依次选择每个维度作为切分维度,也可以采用更复杂的策略,比如选择方差最大的维度作为切分维度。
3.将数据点分割成两个子集。
根据切分维度和切分值,将数据点分割成两个子集,一个子集包含小于等于切分值的数据点,另一个子集包含大于切分值的数据点。
4.创建节点并递归构建子树。
根据切分维度、切分值和分割后的两个子集,创建节点并递归构建左子树和右子树。
左子树中的点小于等于切分值,右子树中的点大于切分值。
5.重复以上步骤,直至每个节点只包含一个数据点或者没有数据点为止。
构建完成后,kd-tree就可以用于解决多维空间搜索问题。
通过递归搜索每个节点的子树,可以快速定位到目标数据点所在的区域,并进行相应的操作。
构建kd-tree的过程中有一些需要注意的细节:1.切分维度的选择:切分维度不同,构建出的kd-tree也会有所不同。
选择合适的切分维度可以使得树的平衡性更好,从而提高搜索效率。
2.切分值的选择:切分值的选择直接影响切分后的子集,选择不合适的切分值会导致子树的平衡性变差。
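The steps above can be sketched in Python. The `Node` class and the `build_kdtree` helper are illustrative names (not from any particular library), using the cycling choice of splitting dimension and the median split described in step 2:

```python
# A minimal sketch of the kd-tree construction described above.
# Names (Node, build_kdtree) are illustrative, not from any library.

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point    # the k-dimensional point stored at this node
        self.axis = axis      # splitting dimension used at this node
        self.left = left      # subtree with point[axis] <= split value
        self.right = right    # subtree with point[axis] >  split value

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree, cycling the split axis by depth
    and splitting at the median of the current axis."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                       # cycling choice of split dimension
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point defines the split
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))
```

Because the median split halves the point set at every level, the resulting tree has depth O(log n), which is exactly the balance property discussed above.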
A Fast Neighbor Prototype Selection Algorithm Based on Local Mean and Class Global Information
Acta Automatica Sinica, Vol. 40, No. 6, June 2014, pp. 1116-1125. DOI 10.3724/SP.J.1004.2014.01116

A Fast Neighbor Prototype Selection Algorithm Based on Local Mean and Class Global Information

LI Juan (1,2), WANG Yu-Ping (1)
1. School of Computer Science and Technology, Xidian University, Xi'an 710071
2. School of Distance Education, Shaanxi Normal University, Xi'an 710062
Supported by the National Natural Science Foundation of China (61272119). Manuscript received June 19, 2013; accepted November 11, 2013. Recommended by Associate Editor ZHANG Yu-Jin.
Citation: Li Juan, Wang Yu-Ping. A fast neighbor prototype selection algorithm based on local mean and class global information. Acta Automatica Sinica, 2014, 40(6): 1116-1125.

Abstract: The condensed nearest neighbor (CNN) algorithm is a simple non-parametric prototype selection method, but its prototype selection process is susceptible to the pattern read sequence, abnormal patterns, and so on. To deal with these problems, a new prototype selection method based on local mean and class global information is proposed. Firstly, the proposed method makes full use of the local means of the k heterogeneous and homogeneous nearest neighbors of each pattern to be learned in the prototype set, together with the class global information. Secondly, an updating process is introduced into the proposed method. Lastly, updating strategies are adopted in order to realize dynamic update of the prototype set.
The proposed method can not only better lessen the influence of the pattern read sequence and abnormal patterns on prototype selection, but also reduce the scale of the prototype set. It achieves a high compression ratio while simultaneously guaranteeing high classification accuracy on the original data set. Two image recognition data sets and University of California Irvine (UCI) benchmark data sets are selected as experimental data sets. The experiments show that the classification performance of the proposed method is more effective than that of the compared algorithms.

Key words: Data classification, prototype selection, local mean, global class information, adaptive learning

In machine learning and data mining tasks, KNN (the k-nearest neighbors algorithm) [1], a simple and mature classifier and one of the ten classic data mining algorithms, is widely used: it is theoretically simple, easy to implement, needs no prior classifier training, and suits many data distributions. On large data sets, however, its naive processing strategy leads to unacceptable time and space costs. A central question is therefore how to remove redundant points from large data sets while keeping the representative points that contribute strongly to classification, so as to reduce the data scale and increase classification speed. An effective strategy is prototype selection: reduce the original training set to a representative prototype set that reflects the distribution and classification characteristics of the original data, without lowering (and possibly improving) classification accuracy, thereby reducing the data scale and the sensitivity to noise and making classification more efficient.

1 Related techniques

1.1 Prototype selection algorithms

An important use of prototype selection is as a preprocessing step for a classifier; it can be combined with many classification algorithms to reduce their data scale. Here prototype selection is combined with nearest-neighbor classification, and classification accuracy is used to compare the efficiency of the proposed algorithm. The goal of prototype selection is to remove noise and other abnormal points and shrink the training set without degrading classification performance. The usual model [2] is: given a training set TR (which contains some useless information such as noise and redundancy), find a subset TP ⊂ TR (the training prototype set) such that TP contains no superfluous prototypes and Acc(TP) ≅ Acc(TR), where Acc(X) denotes the classification accuracy obtained with X as the training set. During classification, TP replaces TR as the reference data, reducing the amount of data involved in each decision.

Prototype selection has developed substantially since it was first proposed. The edited nearest neighbor (ENN) [3] and condensed nearest neighbor (CNN) [4] rules are among the earliest sample selection algorithms. CNN is sensitive to the order in which the samples are read, and its condensed set contains many redundant samples. A series of improvements followed: FCNN (fast condensed nearest neighbor) [5] reduces read-sequence sensitivity and tries to capture class decision-boundary prototypes; GCNN (generalized condensed nearest neighbor) [6] introduces heterogeneous as well as homogeneous neighbors, overcoming CNN's exclusive use of same-class neighbors; MNV (mutual neighborhood value) [7] uses mutual-neighbor values to lower read-sequence sensitivity; RNN (reduced nearest neighbor rule) [7] addresses CNN's inability to delete prototypes; and clustering-based class-boundary selection algorithms include IKNN (improved k-nearest neighbor classification) [8] and PSC (prototype selection by clustering) [9]. All of these remain sensitive to noise. Condensing methods select representative points by removing noise and cleaning the samples in regions where classes overlap; editing methods mainly remove noise from the original set, are non-incremental, and are unsuitable for large data sets. How to reduce the sensitivity of traditional incremental prototype selection to the sample read sequence and to outliers is thus a focus of research on incremental prototype selection, and it is the main problem studied in this paper.

1.2 Local-mean and class-mean classification

To counter KNN's noise sensitivity and its traditional focus on the nearest samples alone, ignoring the sample distribution, many researchers have incorporated local neighbor means or class means into nearest-neighbor classification. Mitani et al. [10] proposed a local-mean-based non-parametric classifier that resists the influence of outliers and performs well on small samples. Brown et al. [11] used distance weighting within each class's neighbor set, in contrast to the sample-set distance weighting of [10]. Han et al. [12] introduced the class-center idea, exploiting the overall information of the training samples. Building on this, Zeng et al. [13] proposed a classifier based on both local means and class means, using the local means of a query's neighbors within each class as well as the global knowledge of the class means. Brighton et al. [14] defined the Reachable and Coverage concepts of a sample to be learned and, combining them with ENN, proposed the iterative case filtering (ICF) algorithm, which Wang et al. [15] improved in ISSARC (an iterative algorithm for sample selection based on the reachable and coverage). With suitable parameters, mean-based classifiers degenerate to traditional nearest-neighbor methods: if one neighbor is taken per class, the local-mean method is equivalent to nearest-neighbor classification; if the neighbor count equals the class size, it is equivalent to Euclidean-distance classification [7].

In summary, using local neighbor means and global class means within traditional prototype selection fits the distribution of the prototype set more closely and reduces the interference of abnormal prototypes. Building on CNN, using the local means of neighbors and the class global information, and borrowing RNN's deletion idea, this paper proposes a new prototype selection algorithm, LCNN (an improved nearest neighbor prototype selection algorithm based on local-mean and class global information). While maintaining or even improving classification efficiency, LCNN largely overcomes the dependence of CNN and its variants on the sample read sequence, improves the ability to update the prototype set dynamically, and reduces noise sensitivity.

2 A neighbor prototype selection algorithm using local means and class global information

Notation. Let D = {x_i = (x_i1, x_i2, ..., x_id) | i = 1, 2, ...} be a data set with class labels C = {c_1, c_2, ..., c_m}, where d is the sample dimension and m the number of classes. TR = {(x_i, y_i) | x_i ∈ D, y_i ∈ C, i = 1, 2, ..., n} is the training set; TP ⊂ TR is the prototype set obtained by training; the test set TS has the same structure as TR. Initially TP = ∅. For any sample x ∈ TR awaiting scanning and any prototype p ∈ TP: s_kx = S_k(x) ⊂ TP denotes the k nearest prototypes of the same class as x; h_kx = H_k(x) ⊂ TP denotes the k nearest prototypes of other classes; d(x, y) is the Euclidean distance between x and y; label(x) is the class of x; D(x) = Σ_{i=1}^k w_i d(x, x_i) is the weighted k-nearest-neighbor distance sum of x (the local-mean information defined in this paper), where w_i is the distance weight of x's i-th nearest prototype x_i; Ind(p) is the index of p in TP; and PS is a 4-tuple describing p and its nearest same-class and different-class prototypes: PS(1), PS(2), and PS(3) are the index of p in TP, the index of p's nearest same-class prototype, and the index of p's nearest different-class prototype, while PS(4) flags whether p has been deleted.

2.1 The LCNN strategy

Many improvements of CNN use only the class labels of the selected k nearest samples. They ignore local and global information such as the sample distribution and are easily biased by the chosen neighbors; they retain CNN's sensitivity to the read sequence and to noise; and they rarely support dynamic insertion and deletion of prototypes, so noise and isolated points persist in the prototype set. LCNN therefore modifies CNN as follows:

1) The unguided acquisition of new-class prototypes in CNN is removed. A new initialization step obtains the class global information and initial prototypes for every class, and the class global information controls whether noise and isolated points may become prototypes.

2) Since CNN's use of the single nearest neighbor makes it vulnerable to the read sequence and to noise, the nearest neighbor is extended to k same-class and k different-class nearest neighbors, and the same-class and different-class local means serve as the preliminary acceptance test for prototypes, effectively reducing CNN's noise sensitivity.

3) Within a preset update period, local means and class global information are used to delete isolated prototypes, prototypes in class-center regions, and so on, updating the prototype set in a targeted way.

Figure 1 (running diagram of the LCNN algorithm) shows the overall flow; the prototype selection part is the subject of this paper, while nearest-neighbor classification is used to test the performance of the resulting prototype set.

Let N_1, N_2, ..., N_m be the numbers of prototypes in TP belonging to classes c_1, c_2, ..., c_m. Suppose x has k valid same-class and different-class prototypes and s_kx is the set of k nearest same-class prototypes of x in TP. The same-class local mean of x is

    D_s = Σ_{i=1}^k w_i d(x, s_kx^i)    (1)

Similarly, with h_kx the k nearest different-class prototypes of x in TP, the different-class local mean of x is

    D_h = Σ_{i=1}^k w_i d(x, h_kx^i)    (2)

Writing the prototypes of class c_j (j = 1, 2, ..., m) in TP as TP_j = {p_j^i | i = 1, 2, ..., N_j}, the global mean prototype of class c_j is

    G_j = (1/N_j) Σ_{i=1}^{N_j} p_j^i    (3)

and the average distance between the prototypes of class c_j and the class mean prototype is

    D_j = (1/N_j) Σ_{i=1}^{N_j} d(p_j^i, G_j)    (4)

The class-level mean prototype and average distance are global information of class c_j, so we define GD = <GD_1, GD_2, ..., GD_m> as the global information structure of the classes in TP, where GD_j(1) stores the class mean prototype vector (the dynamic center of class c_j in TP, i.e. G_j) and GD_j(2) stores the dynamic average distance between the class's prototypes (i.e. D_j). GD is called the class global information of TP.

2.2 Main procedures

Prototype set initialization, prototype learning, and prototype set updating form the core of LCNN. Initialization reads a random proportion of the samples to obtain the class global information, then uses it to select the initial prototypes in a guided way, reducing the randomness of CNN's unguided prototype selection. Learning incrementally adds prototypes to the set under the learning strategy. Updating uses several update thresholds to periodically delete prototypes of TP that violate the rules, dynamically updating TP and removing class-center prototypes, isolated points, and noise, thereby overcoming the insert-only behavior of traditional CNN.

2.2.1 Prototype set initialization

Initialization has two functions: 1) by randomly drawing training samples from each class, it obtains the global information (the average within-class distance and the class mean center) that guides the selection of the initial prototypes; 2) guided by the class global information, it randomly selects f initial prototypes per class (generally f = 2; for unbalanced data sets, f = 1) and adds them to TP, reducing the randomness of CNN's selection of new-class prototypes. It also obtains and fills in the same-class and different-class nearest neighbors of each prototype in TP and updates the class mean centers.

Input: training set TR. Output: GD, PS.
Step 1. Initialize GD = ∅, PS = ∅.
Step 2. Randomly read a proportion of the training samples from TR and fill in the information of GD.
Step 3. i = 1.
Step 4. j = 1.
Step 5. Read any sample x of class i; if it satisfies GD(i, 2) < d(x, GD(i, 1)) < 3 × GD(i, 2), add it to the prototype set and set j = j + 1.
Step 6. If j < f, go to Step 5.
Step 7. i = i + 1; if i < m, go to Step 4.
Step 8. Class by class and prototype by prototype, fill in the data of GD and PS.
Step 9. Output GD and PS.

2.2.2 Prototype learning

LCNN is an incremental learning algorithm: it scans the training set once, from the first unscanned sample until all samples awaiting learning have been processed, yielding the final prototype set. When the same-class local mean of a sample exceeds its different-class local mean, the sample is added to the prototype set; likewise, if the sample's distance to its class center exceeds the distance of its nearest neighbor to the class center, it is added.

Input: GD, PS, λ, and TR. Output: TP.
Step 1. If TR has no unscanned sample, output TP and stop.
Step 2. Take any unscanned sample x.
Step 3. From the class c of x, obtain s_kx, h_kx, and GD(c, :).
Step 4. If d(x, GD(c, 1)) < GD(c, 2), go to Step 8.
Step 5. Compute the same-class and different-class local means D_s and D_h of x with Eqs. (1) and (2).
Step 6. If D_s > D_h, add x to TP as a prototype, set the data PS(x, :), and go to Step 8.
Step 7. If d(x, GD(c, 1)) > d(s_kx^1, GD(c, 1)), add x to TP as a prototype and set PS(x, :).
Step 8. If the number of learned samples is a multiple of λ, call the update procedure.
Step 9. Otherwise, go to Step 1.

LCNN goes beyond CNN's simple nearest-neighbor acceptance rule. Taking the distribution of the prototypes into account, it uses the same-class and different-class local means of a training sample x and the relations to the class mean centers as acceptance criteria, overcoming the bias of the nearest-neighbor rule and, to some degree, selecting class-boundary prototypes. At the same time, it adds fewer prototypes that lie too close to a class center, thinning out the prototypes in class-center regions.

2.2.3 Prototype set updating

The update procedure introduces prototype deletion: after every λ learned samples it is called to delete prototypes that violate the rules, reducing the number of prototypes. Several update thresholds, based on the class global information and on a prototype's nearest same-class and different-class neighbors, handle the different deletion cases. For any p_i ∈ TP, let c_j and c_s be the classes of p_i and of its nearest different-class prototype; two update steps are applied to each prototype, and after the scan of TP the local-mean and class global information is refreshed.

Step 1 (isolated prototypes). If d(p_i, GD(c_j, 1)) ≥ 3 × GD(c_j, 2) and d(p_i, TP(PS(Ind(p_i), 2))) > d(p_i, TP(PS(Ind(p_i), 3))), then p_i is an isolated point; deleting such prototypes reduces their influence, so set PS(Ind(p_i), 4) = 1.
Step 2 (noise and other abnormal prototypes). When GD(c_s, 2) ≤ d(TP(PS(Ind(p_i), 3)), GD(c_s, 1)) and 3 × GD(c_s, 2) > d(TP(PS(Ind(p_i), 3)), GD(c_s, 1)), the local means and class global information of p_i are used: if the same-class local mean of p_i is smaller than its different-class local mean (D_s < D_h) and the nearest different-class prototype of p_i lies in a non-boundary region (that is, d(TP(PS(Ind(p_i), 3)), GD(c_j, 1)) > d(TP(PS(Ind(p_i), 3)), GD(c_s, 1)) or d(TP(PS(Ind(p_i), 3)), GD(c_j, 1)) > d(TP(PS(Ind(p_i), 1)), GD(c_s, 1))), then p_i is noise, and PS(Ind(p_i), 4) = 1 is set.
Step 3 (local-mean and class global information update). After the scan, all prototypes flagged for deletion are removed and the PS structure of TP is updated; finally, for each class the class mean center G̃_j and the standard-deviation distance D̃_j are recomputed, and GD(c_j, 1) = G̃_j and GD(c_j, 2) = D̃_j are set.

2.3 Key concepts and parameters

1) Isolated prototypes: following the definition of [16], an isolated prototype is one whose distance from the class prototype mean exceeds 3 times the standard-deviation distance.
2) Neighbor weights: the simplest reciprocal distance weights are used, w_i = 1/i; w_i decreases as i grows, so farther prototypes have less influence on the selection of new prototypes. Without the global information, and with neighbor count 1 (one same-class and one different-class neighbor per sample), local-mean learning degenerates to traditional CNN learning.

LCNN needs two supporting parameters: the prototype neighbor count k and the update period λ. Parameters can be preset, cross-validated, or adjusted dynamically; presetting and dynamic adjustment are simple and convenient, whereas cross-validation needs several runs to find a good configuration. In keeping with the incremental generation of the prototype set, dynamic adjustment is used: k and λ change as the prototype set changes. To simplify matters, within one update period λ the same-class and different-class neighbor counts of every sample x in TP are equal, with k ≤ min(N_1, N_2, ..., N_m); k is derived from min_{1≤j≤m} N_j and λ from Σ_{j=1}^m N_j (with ⌈·⌉ denoting rounding up), where N_j is the number of class-c_j prototypes (j = 1, 2, ..., m) in TP at the start of the update period.

3 Evaluation

To evaluate LCNN, the comparison algorithms are KNN, CNN, GCNN, PSC, ISSARC, and ILVQ (incremental learning vector quantization) [17]. GCNN is closest in strategy to LCNN, as both use same-class and different-class neighbors; its results are averaged over ρ = 0, 0.1, 0.25, 0.5, 0.75, 0.99. PSC aims to capture class-boundary prototypes through space partitioning; the settings r = 6m and r = 8m, which gave the best efficiency in [9], are used. ISSARC is a non-incremental prototype selection algorithm that improves on ICF; it bounds the same-class and different-class neighbor distances and reduces noise sensitivity through a preliminary pass of the non-incremental, noise-removing ENN algorithm, run with the parameter settings of [15]. LCNN likewise targets class-boundary prototypes. ILVQ is among the fast, highly compressing incremental prototype generation algorithms; to simplify its runs, λ = Age_old = ⌈√n⌉ is preset. LCNN scans the training set once, embodying fast incremental prototype selection. KNN and CNN serve as baselines for classification performance; KNN is run with the five common values k = 3, 5, 7, 9, 11.

The experiments use two image recognition data sets, 12 other UCI data sets (Table 1), and 3 large data sets [18]; average classification accuracy and speed are obtained by 5 runs of 5-fold cross-validation. The experimental data were obtained on a Pentium IV Intel(R) Core(TM)2 Duo CPU E8300 2.83 GHz PC with 1 GB of memory, under 32-bit Windows XP and Matlab 7. Evaluation criteria are classification accuracy = |TS_correct| / |TS| × 100%, compression ratio = |TP| / |TR| × 100%, and running time (in seconds), where |TS_correct| is the number of samples of TS classified correctly with TR or TP as the training set, and |TR|, |TP|, |TS| are the numbers of samples or prototypes in TR, TP, and TS.

Table 1  The UCI benchmark data sets
Data set         Features  Classes  Samples
Iris                 4        3       150
Wine                13        3       178
Glass                9        6       214
Ionosphere          34        2       351
Cancer               9        2       699
Zoo                 16        7       101
Heart               13        2       270
TAE                  5        3       151
Liver disorders      6        2       345
Spectf              44        2       267
Ecoli                7        8       336
Ctg                 20        3      2126

3.1 Theoretical analysis

The time complexities of the compared algorithms are: KNN O(dn²n₁); CNN O(nN²d + n₁N²d); GCNN O(n²Nd + n₁N²d); PSC O(τrnd + n₁N²d); ISSARC O(n³d + d Σ_{i=1}^t M_i² + n₁N²d); ILVQ O(dnN + n₁N²d). LCNN is an incremental learning algorithm with two parts, incremental prototype generation in O(dnN) and prototype classification in O(n₁N²d), for an overall complexity of O(dnN + n₁N²d). Here n is the number of training samples, d the sample dimension, N the final number of prototypes, r the number of clusters, τ the number of clustering iterations, n₁ the number of test samples, t the number of ISSARC iterations, and Σ_{i=1}^t M_i the size of the prototype set produced by the ENN runs. LCNN is near-linear in the number of training samples, while the subsequent prototype classification has the complexity of traditional nearest-neighbor classification. As an incremental algorithm, LCNN scans the samples once during prototype generation and does not need to store the training set, so it is able to handle large data sets.

3.2 Experiments on artificial data

To verify LCNN's performance on large data sets, the artificial data sets used in the experiments of [17] are used for incremental prototype classification comparisons. Figures 2 and 3 are both 2-dimensional artificial data sets: Figure 2 contains 5 classes, where classes 1 and 2 follow 2-dimensional Gaussian distributions, classes 3 and 4 form two concentric circles, and class 5 follows a sinusoidal distribution; Figure 3 adds to the data of Figure 2 20% uniformly distributed noise assigned randomly to the 5 classes. Unlike the experiments of [17], which use several sample read sequences and iteration counts, a single random read sequence is used here. Besides the three incremental algorithms, ISSARC is chosen as a reference because its ENN preprocessing removes noise and other abnormal data in advance and so improves noise resistance.

Figures 4, 6, 8, and 10 show the prototypes generated by CNN, ISSARC, ILVQ, and LCNN on the noise-free data set of Figure 2: LCNN reduces the original sample set while preserving its distribution, with results comparable to those of ISSARC and ILVQ. Figures 5, 7, 9, and 11 show the prototypes generated by the four algorithms on the noisy data set of Figure 3: LCNN produces clearly fewer prototypes than ILVQ and, compared with ISSARC, is to some degree less sensitive to noise, retaining clearly fewer noise points than the compared algorithms. ISSARC, being non-incremental, took more than 10 hours of running time in the artificial-data experiments.

3.3 Image recognition comparisons

1) Medical image diagnosis. To verify the practical performance of LCNN, 569 breast-cancer images are used, each described by 30 features; 212 images are abnormal and 357 normal. Average results over 5 runs of 5-fold cross-validation are reported. The data of Table 2 show that LCNN has clear advantages in compression, classification accuracy, and running time for cancer image recognition, and is a practical prototype selection algorithm.

Table 2  Results of the compared algorithms on the breast cancer data set
Algorithm            KNN     CNN     GCNN     PSC     ISSARC  ILVQ    LCNN
Accuracy (%)         93.67   81.55   78.27    89.27   73.81   90.61   92.14
Compression (%)      100     60.16   21.24    46.27   11.35   35.52   15.98
Running time (s)     2.409   6.202   3.5193   3.872   4.726   9.641   2.752

2) Handwritten digit recognition. To further verify LCNN on practical problems, the optical handwritten-digit data set commonly used in the literature is chosen, containing 3823 training images and 1797 test images of the digits 0 to 9. The experimental setting for Table 3 is the same as for Table 2. The data of Table 3 show that LCNN runs consistently better than the compared algorithms. CNN needs little time because its computation is simple and it never deletes prototypes; PSC incurs a large overhead for the initial clustering; GCNN spends extra time computing δ dynamically; ILVQ adds periodic prototype-update operations and is expensive to run; ISSARC keeps the best compression ratio, but its ENN preprocessing increases its running time. In short, LCNN effectively reduces the data scale of practical problems and can be combined with other efficient classifiers to exploit their strengths.

Table 3  Results of the compared algorithms on the handwritten digits data set
Algorithm            KNN     CNN     GCNN    PSC     ISSARC  ILVQ    LCNN
Accuracy (%)         97.99   92.07   94.57   93.25   92.48   95.59   97.08
Compression (%)      100     41.34   25.72   33.94   19.96   31.58   22.57
Running time (s)     756.39  214.34  595.28  456.92  612.58  372.35  247.47

3.4 Experiments on UCI benchmark data sets

Beyond the image data sets above, 12 small- and medium-scale and 3 large UCI benchmark data sets are selected to verify the algorithm more fully, covering a wide range of dimensions and sample sizes; the experimental environment is as above.

Table 4  Classification accuracy (Acc, %) and compression ratio (Comp, %) of the compared algorithms
                    KNN          CNN          GCNN         PSC          ISSARC       ILVQ         LCNN
Data set          Acc   Comp   Acc   Comp   Acc   Comp   Acc   Comp   Acc   Comp   Acc   Comp   Acc   Comp
Iris             96.67  100   95.50 59.71  95.78 12.32  92.89 64.83  94.54 23.67  93.07 45.04  93.33 28.63
Wine             70.80  100   71.23 67.94  67.32 23.54  62.21 73.74  65.57 18.54  67.64 41.12  69.65 15.82
Glass            65.08  100   62.64 66.59  68.27 49.26  60.69 72.78  62.12 22.40  64.69 28.74  65.43 26.43
Ionosphere       88.68  100   85.89 48.33  84.32 22.17  86.18 45.19  87.45  8.76  89.29 19.91  86.16 18.79
Cancer           96.50  100   88.12  7.153 94.61 16.92  78.05 10.55  84.55 14.56  78.05 10.55  95.14 25.35
Zoo              83.22  100   88.14 57.67  88.73 31.52  78.26 57.19  76.43 23.51  87.10 35.19  92.62 34.36
Heart            76.21  100   67.27 43.40  75.49 46.24  79.54 31.23  68.31 10.60  80.34 37.38  76.57 37.01
TAE              77.72  100   56.34 36.75  64.48 44.61  72.25 37.18  57.57 36.92  70.22 23.33  76.69 21.69
Liver disorders  67.61  100   55.80 16.67  65.26 42.31  61.88 67.21  55.36 20.87  60.87 15.89  67.51 14.06
Spectf           71.95  100   63.05 20.36  73.35 52.07  79.41 24.56  72.33 16.48  77.93 35.74  80.14 35.09
Ecoli            86.45  100   77.44 53.71  76.79 33.85  74.93 42.89  78.69 15.70  82.22 42.96  80.17 29.73
Ctg              82.09  100   62.68 42.91  64.49 18.73  74.86 41.66  65.05 10.09  69.18 12.23  76.76 12.48
Average          82.87  100   72.84 43.43  79.21 32.80  75.11 47.42  72.33 18.51  76.72 29.01  80.01 24.95

From the data of Table 4 the following conclusions can be drawn. Compared with CNN and GCNN, LCNN keeps a clear advantage in data reduction and has higher classification accuracy on a large proportion of the data sets. Compared with ILVQ, LCNN is clearly more accurate except on Ionosphere, while keeping a higher compression ratio on 11 data sets. Compared with the fast PSC algorithm, LCNN is more accurate on 11 data sets and has a higher compression ratio on 9, showing good classification efficiency together with high compression. Relative to the other compared algorithms, ISSARC keeps a clear advantage in average compression; LCNN compresses better than ISSARC on only 2 data sets. The running-time data of Table 5 show that on small data sets LCNN has no time advantage over KNN and CNN, whereas on the larger data set Ctg its time advantage is clear; moreover, LCNN is significantly faster than GCNN, ISSARC, and ILVQ, and against the fast PSC algorithm it wins on 7 of the 12 data sets as well as on average.
The original kd-tree paper
Field Name    Field Type       Description
dom-elt       domain-vector    A point from k_d-dimensional (domain) space
range-elt     range-vector     A point from k_r-dimensional (range) space
split         integer          The splitting dimension

Postconditions: if nearest represents the exemplar (d0, r0),
and exlist represents the exemplar set E,
and dom represents the vector d,
then (d0, r0) ∈ E and None-nearer(E, d, d0).

Code:
1.   nearest-dist := infinity
2.   nearest := undefined
3.   for ex := each exemplar in exlist
3.1    dist := distance between dom and the domain of ex
3.2    if dist < nearest-dist then
3.2.1    nearest-dist := dist
3.2.2    nearest := ex
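A direct Python transliteration of the exhaustive scan above (illustrative only; the function name `nearest_exemplar` and the (domain, range) tuple representation of an exemplar are assumptions, not Moore's code):

```python
import math

def nearest_exemplar(dom, exlist):
    """Exhaustive nearest-neighbor scan over an exemplar list, mirroring
    the pseudocode: track the smallest distance seen so far.
    Each exemplar is a (domain_vector, range_vector) pair."""
    nearest_dist = math.inf              # 1. nearest-dist := infinity
    nearest = None                       # 2. nearest := undefined
    for ex in exlist:                    # 3. for each exemplar in exlist
        dist = math.dist(dom, ex[0])     # 3.1 distance to exemplar's domain vector
        if dist < nearest_dist:          # 3.2 keep the closer exemplar
            nearest_dist = dist
            nearest = ex
    return nearest, nearest_dist
```

This linear scan is the O(n) baseline that the kd-tree search in the tutorial is designed to beat.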
An Introductory Tutorial on kd-Trees
Andrew W. Moore, Carnegie Mellon University
awm@
Extract from Andrew Moore's PhD thesis: Efficient Memory-based Learning for Robot Control. PhD Thesis, Technical Report No. 209, Computer Laboratory, University of Cambridge, 1991.
Kd-tree Construction for Ray-Traced Rendering of Complex Scene Graphs

Authors: CHEN Li-hua, WANG Yi-gang
Journal: Computer Applications and Software
Year (volume), issue: 2011 (028) 010

Abstract: In global-illumination rendering based on ray tracing and related techniques, improving the spatial subdivision structure has always been one of the important acceleration strategies. This paper studies common construction methods for spatial structures and proposes a fast partition-construction method for complex indoor scenes. Rather than subdividing the whole space directly, the algorithm uses a grouping strategy: guided by bounding-box tests, scene entities with a certain spatial relationship are merged into a number of groups; an optimized kd-tree subdivision structure is then built for each group, with a reasonable termination criterion. Compared with previous methods, the acceleration structure built this way is better suited to complex indoor environments organized as scene graphs, and it provides an effective means for the fast generation of photorealistic images.

Pages: 3 (pp. 235-237)
Authors' affiliation: Institute of Graphics and Image, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang
Language: Chinese
Chinese Library Classification: TP37
A Memory-Saving K-Nearest-Neighbor Algorithm for Point Clouds

ZHU Lin-hua, CAI Yong (College of Computer Science & Technology, Southwest University of Science & Technology, Mianyang 621010, China)
Article no.: 1006-1576(2008)07-0023-03

Abstract: Building on the classic KD-tree, this paper proposes a KD-tree algorithm implemented with arrays and using compressed storage. Its advantage is that, while guaranteeing correct search results, it greatly reduces the memory needed at run time, with search efficiency essentially equal to that of the classic KD-tree. Tests on several point clouds verify the correctness and efficiency of the algorithm.

Keywords: KD tree; point cloud; K-nearest-neighbor lookup; memory saving
Chinese Library Classification: O22; TP391.41

0 Introduction
In reverse engineering, one needs to find, for a given point in a point cloud, the set of points whose Euclidean distance to that point is below some threshold; this problem can be viewed as an extension of the NNL (nearest-neighbor lookup) problem.
Kinetic kd-Trees and Longest-Side kd-Trees*

Mohammad Ali Abam, Mark de Berg, Bettina Speckmann
Department of Mathematics and Computer Science, TU Eindhoven
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
mabam@win.tue.nl, mdberg@win.tue.nl, speckman@win.tue.nl

ABSTRACT
We propose a simple variant of kd-trees, called rank-based kd-trees, for sets of points in R^d. We show that a rank-based kd-tree, like an ordinary kd-tree, supports range search queries in O(n^{1-1/d} + k) time, where k is the output size. The main advantage of rank-based kd-trees is that they can be efficiently kinetized: the KDS processes O(n^2) events in the worst case, assuming that the points follow constant-degree algebraic trajectories, each event can be handled in O(log n) time, and each point is involved in O(1) certificates.

We also propose a variant of longest-side kd-trees, called rank-based longest-side kd-trees (RBLS kd-trees, for short), for sets of points in R^2. RBLS kd-trees can be kinetized efficiently as well and, like longest-side kd-trees, RBLS kd-trees support nearest-neighbor, farthest-neighbor, and approximate range search queries in O((1/ε) log^2 n) time. The KDS processes O(n^3 log n) events in the worst case, assuming that the points follow constant-degree algebraic trajectories; each event can be handled in O(log^2 n) time, and each point is involved in O(log n) certificates.

Categories and Subject Descriptors
F.2.2 [Nonnumerical Algorithms and Problems]: Geometrical problems and computations.

General Terms
Algorithms, Theory.

Keywords
Kinetic data structures, kd-tree, longest-side kd-tree.

* M.A. was supported by the Netherlands' Organisation for Scientific Research (NWO) under project no. 612.065.307.
M.d.B. was supported by the Netherlands' Organisation for Scientific Research (NWO) under project no. 639.023.301.

SCG'07, June 6-8, 2007, Gyeongju, South Korea. Copyright 2007 ACM 978-1-59593-705-6/07/0006...$5.00.

1. INTRODUCTION

Background. Due to the increased availability of GPS systems and to other technological advances, motion data is becoming more and more available in a variety of application areas: air-traffic control, mobile communication, geographic information systems, and so on. In many of these areas, the data are moving points in 2- or higher-dimensional space, and what is needed is to store these points in such a way that range queries ("Report all the points lying currently inside a query range") or nearest-neighbor queries ("Report the point that is currently closest to a query point") can be answered efficiently. Hence, there has been a lot of work on developing data structures for moving point data, both in the database community as well as in the computational-geometry community.

Within computational geometry, the standard model for designing and analyzing data structures for moving objects is the kinetic-data-structure framework introduced by Basch et al. [3]. A kinetic data structure (KDS) maintains a discrete attribute of a set of moving objects (the convex hull, for example, or the closest pair), where each object has a known motion trajectory. The basic idea is that although all objects move continuously there are only certain discrete moments in time when the combinatorial structure of the attribute (the ordered set of convex-hull vertices, or the pair that is closest) changes. A KDS contains a set of certificates that constitutes a proof that the maintained structure is correct. These certificates are inserted in a priority queue based on their time of expiration. The KDS then performs an event-driven simulation of the motion of the objects, updating the structure whenever an event happens, that is, when a certificate fails. Kinetic data structures and their accompanying maintenance algorithms can be evaluated and compared with respect to four desired characteristics. A good KDS is compact if it uses little space in addition to the input, responsive if the data structure invariants can be restored quickly after the failure of a certificate, local if it can be updated easily when the flight plan for an object changes, and efficient if the worst-case number of events handled by the data structure for a given motion is small compared to some worst-case number of "external events" that must be handled for that motion; see the surveys by Guibas [8, 9] for more details.

Related work. There are several papers that describe KDS's for the orthogonal range-searching problem, where the query range is an axis-parallel box. Basch et al. [4] kinetized d-dimensional range trees. Their KDS supports range queries in O(log^d n + k) time and uses O(n log^{d-1} n) storage. If the points follow constant-degree algebraic trajectories then their KDS processes O(n^2) events; each event can be handled in O(log^{d-1} n) time. In the plane, Agarwal et al. [1] obtained an improved solution: their KDS supports orthogonal range-searching queries in O(log n + k) time, it uses O(n log n / log log n) storage, and the amortized cost of processing an event is O(log^2 n).

Although these results are nice from a theoretical perspective, their practical value is limited for several reasons. First of all, they use super-linear storage, which is often undesirable. Second, they can perform only orthogonal range queries; queries with other types of ranges or nearest-neighbor searches are not supported. Finally, especially the solution by Agarwal et al. [1] is rather complicated. Indeed, in the setting where the points do not move, the static counterparts of these structures are usually not used in practice. Instead, simpler structures such as quadtrees, kd-trees, or bounding-volume hierarchies (R-trees, for instance) are used. In this paper we consider one of these structures, namely the kd-tree.

Kd-trees were initially introduced by Bentley [5]. A kd-tree for a set of points in the plane is obtained recursively as follows. At each node of the tree, the current point set is split into two equal-sized subsets with a line. When the depth of the node is even the splitting line is orthogonal to the x-axis, and when it is odd the splitting line is orthogonal to the y-axis. In d-dimensional space, the orientations of the splitting planes cycle through the d axes in a similar manner. Kd-trees use O(n) storage and support orthogonal range-searching queries in O(n^{1-1/d} + k) time, where k is the number of reported points. Maintaining a standard kd-tree kinetically is not efficient. The problem is that a single event (two points swapping their order on x- or y-coordinate) can have a dramatic effect: a new point entering the region corresponding to a node could mean that almost the entire subtree must be restructured. Hence, a variant of the kd-tree is needed when the points are moving. Agarwal et al. [2] proposed two such variants for moving points in R^2: the δ-pseudo kd-tree and the δ-overlapping kd-tree. In a δ-pseudo kd-tree each child of a node ν can be associated with at most (1/2 + δ)n_ν points, where n_ν is the number of points in the subtree of ν. In a δ-overlapping kd-tree the regions corresponding to the children of ν can overlap as long as the overlapping region contains at most δn_ν points. Both kd-trees support orthogonal range queries in time O(n^{1/2+ε} + k), where k is the number of reported points.
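The event-driven simulation that underlies every KDS (certificates in a priority queue keyed by their expiration time, with a repair step on each failure) can be sketched generically in Python. The names and the shape of the `repair` callback are illustrative assumptions, not from the paper:

```python
import heapq

def simulate_kds(certificates, repair, t_end):
    """Generic KDS event loop: pop the earliest-failing certificate,
    let `repair` restore the invariants and return replacement
    certificates as (failure_time, certificate_id) pairs, and continue
    until no certificate fails before t_end. Returns the event count."""
    queue = list(certificates)       # (failure_time, certificate_id) pairs
    heapq.heapify(queue)
    events = 0
    while queue and queue[0][0] <= t_end:
        fail_time, cert = heapq.heappop(queue)
        events += 1
        for new_cert in repair(fail_time, cert):  # repair may spawn new certificates
            heapq.heappush(queue, new_cert)
    return events
```

The "efficiency" criterion above then asks that the number of iterations of this loop be small compared to the number of genuine combinatorial changes of the maintained attribute.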
Hereεis a positive constant that can be made arbitrarily small by choosingδappropriately.These KDS’s process O(n2)events if the points follow constant-degree algebraic trajectories.Although it can take up to O(n)time to han-dle a single event,the amortized cost is O(log n)time per event.Neither of these two solutions is completely satis-factory:their query time is worse by a factor O(nε)than the query time in standard kd-trees,there is only a good amortized bound on the time to process events,and only a solution for the2-dimensional case is given.The goal of our paper is to developed a kinetic kd-tree variant that does not have these drawbacks.Even though a kd-tree can be used to search with anytype of range,there are only performance guarantees for or-thogonal ranges.Longest-side kd-trees,introduced by Dick-erson et al.[7],are better in this respect.In a longest-side kd-tree,the orientation of the splitting line at a node is not determined by the level of the node,but by the shape of itsregion:namely,the splitting line is orthogonal to the longest side of the region.Although a longest-side kd-tree does not have performance guarantees for exact range searching, it has very good worst-case performance forε-approximate range queries,which can be answered in O(ε1−d log d n+k) time.(In anε-approximate range query,points that arewithin distanceε·diameter(Q)of the query range Q may also be reported.)Moreover,a longest-side kd-tree can an-swerε-approximate nearest-neighbor queries(or:farthest-neighbor queries)in O(ε1−d log d n)time.The second goal of our paper is to develop a kinetic variant of the longest-side kd-tree.Our results.Ourfirst contribution is a new and simplevariant of the standard kd-tree for a set of n points in d-dimensional space.Our rank-based kd-tree supports orthog-onal range searching in time O(n1−1/d+k)and it uses O(n) storage—just like the original.But additionally it can be kinetized easily and efficiently.The rank-based kd-tree pro-cesses 
O(n2)events in the worst case if the points follow constant-degree algebraic trajectories1and each event can be handled in O(log n)worst-case time.Moreover,each point is involved only in a constant number of certificates. Thus we improve the both the query time and the event-handling time as compared to the planar kd-tree variants of Agarwal et al.[2],and in addition our results work in any fixed dimension.Our second contribution is thefirst kinetic variant of the longest-side kd-tree,which we call the rank-based longest-side kd-tree(or RBLS kd-tree,for short),for a set of n points in the plane.(We have been unable to generalize this result to higher dimensions.)An RBLS kd-tree uses O(n)space and supports approximate nearest-neighbor,ap-proximate farthest-neighbor,and approximate range queries in the same time as the original longest-side kd-tree does for stationary points,namely O((1/ε)log2n)(plus the time needed to report the answers in case of range searching). The kinetic RBLS kd-tree maintains O(n)certificates,pro-cesses O(n3log n)events if the points follow constant-degree algebraic trajectories1,each event can be handled in O(log2n) time,and each point is involved in O(log n)certificates.2.RANK-BASED KD-TREESLet P be a set of n points in R d and let us denote the coordinate-axes with x1,...,x d.To simplify the discussion we assume that no two points share any coordinate,that is, no two points have the same x1-coordinate,or the same x2-coordinate,etc.(Of course coordinates will temporarily be equal when two points swap their order,but the description below refers to the time intervals in between such events.) 
In this section we describe a variant of a kd-tree for P,the rank-based kd-tree.A rank-based kd-tree preserves all main properties of a kd-tree and,additionally,it can be kinetized efficiently.Before we describe the actual rank-based kd-tree for P, wefirst introduce another tree,namely the skeleton of a rank-based kd-tree,denoted by S(P).Like a standard kd-1For the bound on the number of events in our rank-based kd-tree,it is sufficient that any pair of points swaps x-or y-order O(1)times.For the bounds on the number of events in the RBLS kd-tree,we need that every two pairs of points define the same x-or y-distance O(1)times.(b)Figure1:(a)The skeleton of a rank-based kd-tree and(b)the rank-based kd-tree itself.tree,S(P)uses axis-orthogonal splitting hyperplanes to di-vide the set of points associated with a node.As usual,the orientation of the axis-orthogonal splitting hyperplanesis alternated between the coordinate axes,that is,wefirstsplit with a hyperplane orthogonal to the x1-axis,then witha hyperplane orthogonal to the x2-axis,and so on.Letνbe a node of S(P).h(ν)is the splitting hyperplane stored atν,axis(ν)is the coordinate-axis to which h(ν)is orthog-onal,and P(ν)is the set of points stored in the subtreerooted atν.A nodeνis called an x i-node if axis(ν)=x iand a nodeωis referred to as an x i-ancestor of a nodeνifωis an ancestor ofνand axis(ω)=x i.Thefirst x i-ancestor of a nodeν(that is,the x i-ancestor closest toν)is the x i-parent(ν)ofν.A standard kd-tree chooses h(ν)such that P(ν)is di-vided roughly in half.In contrast,S(P)chooses h(ν)basedon a range of ranks associated withν,which can have theeffect that the sizes of the children ofνare completely un-balanced.We now explain this construction in detail.We use d arrays A1,...,A d to store the points of P in d sorted lists;the array A i[1,n]stores the sorted list based on the x i-coordinate.As mentioned above,we associate a range[r,r ]of ranks with each nodeν,denoted by range(ν),with1≤r≤r ≤n.Letνbe an x 
i-node.If x i-parent(ν)does not exist,then range(ν)is equal to[1,n].Otherwise,ifνis con-tained in the left subtree of x i-parent(ν),then range(ν)is equal to thefirst half of range(x i-parent(ν)),and ifνis con-tained in the right subtree of x i-parent(ν),then range(ν)is equal to the second half of range(x i-parent(ν)).If range(ν)=[r,r ]then P(ν)contains at most r −r+1points.We ex-plicitly ignore all nodes(both internal as well as leaf nodes)that do not contain any points,they are not part of S(P),independent of their range of ranks.A nodeνis a leafof S(P)if range(ν)=[j,j]for some j.Clearly a leaf con-tains exactly one point,but not every node that contains only one point is a leaf.(We could prune these nodes, which always have a range[j,k]with j<k,but we chose to keep them in the skeleton for ease of description.)Ifνis not a leaf and axis(ν)=x i then h(ν)is defined by the point whose rank in A i is equal to the median of range(ν). (This is similar to the approach used in the kinetic BSP of [6].)It is not hard to see that this choice of the splitting plane h(ν)is equivalent to the following.Let region(ν)= [a1:b1]×···×[a d:b d]and suppose for example thatνis an x1-node.Then,instead of choosing h(ν)according to the median x1-coordinate of all points in region(ν),we choose h(ν)according to the median x1-coordinate of all points in the slab[a1,b1]×[−∞:∞]×···×[−∞:∞].We construct S(P)incrementally by inserting the pointsof P one by one.(Even though we proceed incrementally,we still use the rank of each point with respect to the whole point set,not with respect to the points inserted so far.) 
Let p be the point that we are currently inserting into the tree, and let ν be the last node visited by p; initially ν = root(S(P)). Depending on which side of h(ν) contains p, we select the appropriate child ω of ν to be visited next. If ω does not exist, then we create it and compute range(ω) as described above. We recurse with ν = ω until range(ν) = [j, j] for some j. We always reach such a node after d log n steps, because the length of range(ν) is half the length of range(x_i-parent(ν)) and depth(ν) = depth(x_i-parent(ν)) + d for an x_i-node ν. Figure 1(a) illustrates S(P) for a set of eight points. Since each leaf of S(P) contains exactly one point of P and the depth of each leaf is d log n, the size of S(P) is O(n log n).

Lemma 1. The depth of S(P) is O(log n) and the size of S(P) is O(n log n) for any fixed dimension d. S(P) can be constructed in O(n log n) time.

A node ν ∈ S(P) is active if and only if both its children exist, that is, both its children contain points. A node ν is useful if it is either active, or a leaf, or its first d − 1 ancestors contain an active node. Otherwise a node is useless. We derive the rank-based kd-tree for P from the skeleton by pruning all useless nodes from S(P). The parent of a node ν in the rank-based kd-tree is the first unpruned ancestor of ν in S(P). Roughly speaking, in the pruning phase every long path whose nodes have only one child each is shrunk to a path whose length is less than d. The rank-based kd-tree has exactly n leaves, each containing exactly one point of P. Moreover, every node ν in the rank-based kd-tree is either active or has an active ancestor among its first d − 1 ancestors. The rank-based kd-tree derived from Figure 1(a) is illustrated in Figure 1(b).

Lemma 2. (i) A rank-based kd-tree on a set of n points in R^d has depth O(log n) and size O(n). (ii) Let ν be an x_i-node in a rank-based kd-tree. In the subtree rooted at a child of ν, there are at most 2^{d−1} x_i-nodes ω such that x_i-parent(ω) = ν. (iii) Let ν be an x_i-node in a rank-based kd-tree. On every path starting at ν and ending at a descendant
of ν and containing at least 2d − 1 nodes, there is an x_i-node ω such that x_i-parent(ω) = ν.

Proof. (i) A rank-based kd-tree is at most as deep as its skeleton S(P). Since the depth of S(P) is O(log n) by Lemma 1, the depth of a rank-based kd-tree is also O(log n). To prove the second claim, we charge every node that has only one child to its first active ancestor. Recall that each active node has two children. We charge at most 2(d − 1) nodes to each active node, because after pruning there is no path in the rank-based kd-tree whose length is at least d and in which all nodes have one child. Therefore, to bound the size of the rank-based kd-tree it is sufficient to bound the number of active nodes. Let T be a tree containing all active nodes and all leaves of the rank-based kd-tree. A node ν is the parent of a node ω in T if and only if ν is the first active ancestor of ω in the rank-based kd-tree. Obviously, T is a binary tree with n leaves in which each internal node has two children. Hence the size of T is O(n) and consequently the size of the rank-based kd-tree is O(n).

Figure 2: Illustration for the proof of Lemma 2.
(ii) To simplify notation, let ω' denote the node in S(P) that corresponds to a node ω in the rank-based kd-tree. Let z be a child of ν and let u be the first active node in the subtree rooted at z, as depicted in Figure 2(a); that is, u is the highest active node in the subtree rooted at z. Note that the definition of an active node ensures that u is unique, and note that u can be z. Now assume x_i-parent(ω) = ν, where ω is an x_i-node in the subtree rooted at z. If ω is not a node in the subtree rooted at u, then there is just one node ω in the subtree rooted at z satisfying x_i-parent(ω) = ν, since every node between z and u has only one child. This means that we are done. Otherwise, if ω is a node in the subtree rooted at u, then ω' must be in the subtree rooted at u' in S(P). Let s' be the first x_i-node on the path from u' to ω'. Because one of any d consecutive nodes in S(P) uses a hyperplane orthogonal to the x_i-axis as a splitting plane, depth(s') ≤ depth(u') + d − 1. Since u is active and depth(s') ≤ depth(u') + d − 1, the node s' must appear as a node, s, in the rank-based kd-tree. This and the assumption that x_i-parent(ω) = ν imply ω = s, which means depth(ω') ≤ depth(u') + d − 1. Hence the number of nodes ω is at most 2^{d−1}.
(iii) Let u be the first active node on the path starting at ν and ending at a descendant z of ν and containing at least 2d − 1 nodes, as depicted in Figure 2(b). Because there is no path in the rank-based kd-tree that contains d nodes such that every node in the path has only one child, depth(u) ≤ depth(ν) + d − 1, which implies depth(z) ≥ depth(u) + d − 1 (note that on the path from ν to z there are 2d − 1 nodes). Let ω' be the first x_i-node on the path starting at u' and ending at z' in S(P). Because one of any d consecutive nodes in S(P) uses a hyperplane orthogonal to the x_i-axis to split points, and depth(z') ≥ depth(u') + d − 1, the node ω' exists. The node ω' must appear as a node, ω, in the rank-based kd-tree, because either ω' = u' or among the first d − 1 ancestors of ω' there is an active ancestor, namely u'. Putting it all together, we conclude that depth(ω) ≤ depth(ν) + 2d − 2, which implies the claim.

The region associated with a node ν, denoted by region(ν), is the maximal volume bounded by the splitting hyperplanes stored at the ancestors of ν. More precisely, the region associated with the root of a rank-based kd-tree is the whole space, the region corresponding to the right child of a node ν is the maximal subregion of region(ν) on the right side of h(ν), and the region corresponding to the left child of ν is the rest of region(ν) (for an appropriate definition of left and right in d dimensions). A point p is contained in P(ν) if and only if p lies in region(ν). Like a kd-tree, a rank-based kd-tree can be used to report all points inside a given orthogonal range search query; the reporting algorithm is exactly the same. At first sight, the fact that the splits in our rank-based kd-tree can be very unbalanced may seem to have a big, negative impact on the query time. Fortunately this is not the case. To prove this, we next bound the number of cells intersected by an axis-parallel plane h. As for normal kd-trees, this immediately gives a bound on the total query time.

Lemma 3. Let h be a hyperplane orthogonal to the x_i-axis for some i
. The number of nodes in a rank-based kd-tree whose regions are intersected by h is O(n^{1−1/d}).

Proof. Imagine a dummy node μ with axis(μ) = x_i as the parent of the root. We charge every node ν whose region is intersected by h to x_i-parent(ν). Thanks to μ, x_i-parent(ν) exists for every node of the tree, and hence every node is indeed charged to an x_i-node. Lemma 2(iii) implies depth(ν) ≤ depth(x_i-parent(ν)) + 2d − 2, which implies that at most 2^{2d−2} nodes are charged to each x_i-node. Therefore it is sufficient to bound the number of x_i-nodes whose regions are intersected by h. Let T be the tree containing all x_i-nodes in the rank-based kd-tree and let T' be the tree containing all x_i-nodes in the skeleton S(P). A node ν is the parent of a node ω in T if and only if x_i-parent(ω) = ν in the rank-based kd-tree; the equivalent definition holds for T'. According to Lemma 2(ii), every node ν in T has at most 2^d children, and each side of h(ν) contains the regions corresponding to at most 2^{d−1} children of ν. Note that the dummy node has at most 2^{d−1} children in total. Let T* be yet another tree containing all nodes in T whose regions are intersected by h. Since h is parallel to h(ν) for every node ν of T, it can intersect only the regions that lie on one side of h(ν). Hence every node of T* has at most 2^{d−1} children.

The idea behind the proof is to consider a top part of T* consisting of n^{1−1/d} nodes of T*, and then argue that all subtrees below this top part together contain n^{1−1/d} nodes as well. Next we make this idea precise. Let TOP(T*) be the tree containing all nodes of T* whose depths in T* are at most (1/d) log n, and let ν_1, ..., ν_c be the leaves of TOP(T*) whose depth is exactly (1/d) log n. Clearly c is at most (2^{d−1})^{(1/d) log n} = n^{1−1/d}, and hence the size of TOP(T*) is at most 2n^{1−1/d}. Let ν'_1, ..., ν'_c be the nodes corresponding to ν_1, ..., ν_c in T'. Furthermore, let u_1, ..., u_m be the distinct nodes in T' at depth (1/d) log n such that every u_k has at least one node ν'_j as a
descendant and every ν'_j has a node u_k as an ancestor; note that due to pruning the depth of ν'_j can be larger than (1/d) log n. Because the nodes ν'_j are disjoint, we have Σ_{j=1}^{c} |P(ν'_j)| ≤ Σ_{k=1}^{m} |P(u_k)|.

Let U_k be the set of splitting hyperplanes stored in the ancestors of u_k in T'. Recall that all nodes u_k are x_i-nodes whose regions are intersected by h. Furthermore, all nodes u_k have the same depth in T'. Together this implies that U_k = U_l for all 1 ≤ k, l ≤ m, because their x_i-ranges must be the same. Let h_1 be the last hyperplane in U_k on the left side of region(u_1) and let h_2 be the first hyperplane in U_k on the right side of region(u_1). Because U_k = U_l for all 1 ≤ k, l ≤ m, all regions region(u_k) are bounded by h_1 and h_2. We know that range(u_k) contains n/2^{(1/d) log n} = n^{1−1/d} ranks, hence there are at most n^{1−1/d} points inside the region bounded by h_1 and h_2. Since the nodes u_k are disjoint and the region bounded by h_1 and h_2 contains n^{1−1/d} points, we have Σ_{k=1}^{m} |P(u_k)| ≤ n^{1−1/d}, which implies Σ_{j=1}^{c} |P(ν_j)| = Σ_{j=1}^{c} |P(ν'_j)| ≤ n^{1−1/d}.

Finally, let f(n) denote the number of x_i-nodes whose regions are intersected by h. We have f(n) = |TOP(T*)| + Σ_{j=1}^{c} f(|P(ν_j)|). Since f(|P(ν_j)|) ≤ |P(ν_j)|, Σ_{j=1}^{c} |P(ν_j)| ≤ n^{1−1/d}, and |TOP(T*)| ≤ 2n^{1−1/d}, we can conclude that f(n) = O(n^{1−1/d}).

The following theorem summarizes our results.

Theorem 4. A rank-based kd-tree for a set P of n points in d dimensions uses O(n) storage and can be built in O(n log n) time. An orthogonal range search query on a rank-based kd-tree takes O(n^{1−1/d} + k) time, where k is the number of reported points.

The KDS. We now describe how to kinetize a rank-based kd-tree for a set of continuously moving points P. The combinatorial structure of a rank-based kd-tree depends only on the ranks of the points in the arrays A_i; that is, it does not change as long as the order of the points in the arrays A_i remains the same. Hence it suffices to maintain a certificate for each pair p and q of consecutive points in every array A_i, which fails
when p and q change their order. Now assume that a certificate, involving two points p and q and the x_i-axis, fails at time t. To handle the event, we simply delete p and q and re-insert them in their new order. (During the deletion and re-insertion there is no need to change the ranks of the other points.) These deletions and insertions do not change anything for the other points, because their ranks are influenced neither by the swap nor by the deletion and re-insertion of p and q. Hence the rank-based kd-tree remains unchanged except for a small part that involves p and q. A detailed description of this "small part" follows.

Deletion. Let ν be the first active ancestor of the leaf μ containing p; see Figure 3(a). The leaf μ and all nodes on the path from μ to ν must be deleted, since they no longer contain any points (they contained only p, and p is now deleted). Furthermore, ν stops being active. Let ω be the first active descendant of ν if it exists; otherwise, let ω be the leaf whose ancestor is ν. There are at most d nodes on the path from ν to ω. Since ν is no longer active, any of the nodes on this path might become useless and hence have to be deleted.

Insertion. Let ν be the highest node in the rank-based kd-tree such that its region contains p while the region corresponding to its only child ω does not contain p; note that p cannot reach a leaf when we re-insert p, because the range of a leaf is [j, j] for some j and there cannot be two points in this range. Let ν' and ω' be the nodes in S(P) corresponding to ν and ω.

Figure 3: Deleting and inserting point p.

Let u' be the lowest node on the path from ν' to ω' whose region contains both region(ω') and p, as illustrated in Figure 3(b); note that we do not maintain S(P) explicitly, but with the information maintained in ν and ω the path between ν' and ω' can be constructed temporarily.
Because u' will become an active node, it must be added to the rank-based kd-tree, and every node on the path from u' to ω' must also be added to the rank-based kd-tree if it is useful. From u', the point p follows a new path u'_1, ..., u'_k, which is created during the insertion. The first d − 1 nodes in the list u'_1, ..., u'_k and the leaf u'_k must be added to the rank-based kd-tree; note that range(u'_k) = [j, j] for some j.

Theorem 5. A kinetic rank-based kd-tree for a set P of n moving points in d dimensions uses O(n) storage and processes O(n^2) events in the worst case, assuming that the points follow constant-degree algebraic trajectories. Each event can be handled in O(log n) time and each point is involved in O(1) certificates.

3. RANK-BASED LONGEST-SIDE KD-TREES

Longest-side kd-trees are a variant of kd-trees that choose the orientation of the splitting hyperplane for a node ν according to the shape of the region associated with ν, always splitting the longest side first. Dickerson et al. [7] showed that a longest-side kd-tree can be used to answer the following queries quickly:

(1+ε)-nearest-neighbor query: For a set P of points in R^d, a query point q ∈ R^d, and ε > 0, this query returns a point p ∈ P such that d(p, q) ≤ (1+ε) d(p*, q), where p* is the true nearest neighbor of q and d(·,·) denotes the Euclidean distance.

(1−ε)-farthest-neighbor query: For a set P of points in R^d, a query point q ∈ R^d, and ε > 0, this query returns a point p ∈ P such that d(p, q) ≥ (1−ε) d(p*, q), where p* is the true farthest neighbor of q.

ε-approximate range search query: For a set P of points in R^d, a query region Q with diameter D_Q, and ε > 0, this query returns (or counts) a set P' such that P ∩ Q ⊆ P' ⊆ P and for every point p ∈ P', d(p, Q) ≤ ε D_Q.

The main property of a longest-side kd-tree, which is used to bound the query time, is that the number of disjoint regions associated with its nodes and intersecting at least two opposite sides of a hypercube C is bounded by O(log^{d−1} n).
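To make the first query contract concrete, here is a minimal Python sketch of a (1+ε)-nearest-neighbor query on a plain 2-dimensional kd-tree. This is not the algorithm of Dickerson et al. [7], and the tree below is an ordinary median-split kd-tree rather than a longest-side one; it only illustrates the guarantee d(p, q) ≤ (1+ε) d(p*, q): a subtree is pruned as soon as even its closest possible point cannot beat the current answer by more than a factor 1+ε. All names are ours.

```python
import math
import random

def build(pts, depth=0):
    # plain 2-d kd-tree with median splits; a stand-in for the real tree
    if not pts:
        return None
    axis = depth % 2
    pts = sorted(pts, key=lambda p: p[axis])
    m = len(pts) // 2
    return {"pt": pts[m], "axis": axis,
            "left": build(pts[:m], depth + 1),
            "right": build(pts[m + 1:], depth + 1)}

def ann(node, q, eps, best=None):
    """Return (dist, point) with dist <= (1 + eps) * d(p*, q)."""
    if node is None:
        return best
    dist = math.dist(q, node["pt"])
    if best is None or dist < best[0]:
        best = (dist, node["pt"])
    axis = node["axis"]
    diff = q[axis] - node["pt"][axis]
    near, far = (("left", "right") if diff <= 0 else ("right", "left"))
    best = ann(node[near], q, eps, best)
    # every point beyond the splitting line is at least |diff| away, so
    # the far subtree can be skipped once |diff| * (1+eps) >= current best
    if abs(diff) * (1 + eps) < best[0]:
        best = ann(node[far], q, eps, best)
    return best
```

With ε = 0 the pruning condition degenerates to the standard exact nearest-neighbor search; larger ε prunes more subtrees and visits fewer nodes.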
It seems difficult to kinetize a longest-side kd-tree directly. Hence, using ideas similar to those of the previous section, we introduce a simple variation of 2-dimensional longest-side kd-trees, so-called rank-based longest-side kd-trees (RBLS kd-trees, for short). An RBLS kd-tree not only preserves all main properties of a longest-side kd-tree, but it can also be kinetized easily and efficiently. As in the previous section, we first describe another tree, namely the skeleton of an RBLS kd-tree, denoted by S(P). We then show how to extract an RBLS kd-tree from the skeleton S(P) by pruning.

We construct S(P) recursively as follows. We again use two arrays A_1 and A_2 to store the points of P in two sorted lists; the array A_i[1..n] stores the list sorted by x_i-coordinate. Let the points in P lie inside a box, which is the region associated with the root, and let ν be a node whose subtree must be constructed; initially ν = root(S(P)). If P(ν) contains only one point, then the subtree is a single leaf, i.e., ν is a leaf of S(P). (Note that this is slightly different from the previous section.) If P(ν) contains more than one point, then we have to determine the proper splitting line. Let the longest side of region(ν) be parallel to the x_i-axis. We set axis(ν) to x_i. If x_i-parent(ν) does not exist, then we set range(ν) = [1, n]. Otherwise, if ν is contained in the left subtree of x_i-parent(ν), then range(ν) is equal to the first half of range(x_i-parent(ν)), and if ν is contained in the right subtree of x_i-parent(ν), then range(ν) is equal to the second half of range(x_i-parent(ν)). The splitting line of ν, denoted by l(ν), is orthogonal to axis(ν) and specified by the point whose rank in A_i is the median of range(ν). If there is a point of P(ν) on the left side of l(ν) (on the right side of l(ν) or on l(ν)), a node is created as the left child (the right child) of ν. The points of P(ν) that are on the left side of l(ν) are associated with the left child of ν; the remainder are associated with the right child of ν. The region of
the right child is the maximal subregion of region(ν) on the right side of l(ν), and the region of the left child is the rest of region(ν).

Lemma 6. The depth of S(P) is O(log n), the size of S(P) is O(n log n), and S(P) can be constructed in O(n log n) time.

Proof. Assume for contradiction that the depth of a leaf ν is at least 2 log n + 1, and consider the path from the root to ν. Because there are only two distinct axes, there are at least log n + 1 nodes on this path whose axes are the same, say x_i. Let ν_1, ..., ν_k be these nodes. Since |range(ν_{j+1})| ≤ ⌈(1/2)|range(ν_j)|⌉ for j = 1, ..., k − 1, and k > log n, the node ν_k must be empty, which is a contradiction. Hence the depth of S(P) is O(log n). Since each leaf contains exactly one point and the depth of S(P) is O(log n), the size of S(P) is O(n log n). Furthermore, it is easy to see that it takes O(|P(ν)|) time to split the points at a node ν. Hence we spend O(n) time at each level of S(P) during construction, for a total construction time of O(n log n).

The following lemma shows that RBLS kd-trees preserve the main property of longest-side kd-trees, which is used to bound the query time.
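The split rule of the RBLS skeleton described above combines both ingredients: the orientation of l(ν) comes from the shape of region(ν) (longest side first), while its position comes from the median of the rank range, looked up in the globally sorted arrays. A minimal 2-dimensional sketch of that rule, with illustrative names of our own:

```python
def choose_split(region, rank_ranges, A):
    """region: ((x_lo, x_hi), (y_lo, y_hi)), the box region(nu);
    rank_ranges: the rank range (r, r') per axis, i.e. range(nu);
    A[i][r]: x_i-coordinate of the point of rank r (1-based) in A_i.
    Returns (axis(nu), coordinate of the splitting line l(nu))."""
    # the longest side of region(nu) determines the splitting axis ...
    side = [region[i][1] - region[i][0] for i in (0, 1)]
    axis = 0 if side[0] >= side[1] else 1
    # ... and the median rank of range(nu) determines its position
    r, r2 = rank_ranges[axis]
    return axis, A[axis][(r + r2) // 2]
```

Note the contrast with a true longest-side kd-tree: the splitting coordinate is taken from the median of the rank range, not from the median of the points currently inside region(ν), which is what makes the structure depend only on the orders stored in A_1 and A_2.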