A Distributed Weighted Cluster Based Routing Protocol for MANETs


A Stable and Load-Balanced Weighted Clustering Algorithm for Mobile Ad Hoc Networks

http://www.c-S-a.org.ca
2016, Vol. 25, No. 5
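Some traditional clustering algorithms consider only a single aspect of node performance when electing cluster heads, and therefore cannot elect the best cluster head. Researchers have consequently proposed a variety of cluster-head election algorithms. Among them, the Highest-Degree Clustering Algorithm (HDCA) aims to improve network control energy and to reduce the number of cluster heads: every node is marked white, and a white node is elected cluster head when it has the highest node degree among its white neighbors. This algorithm suits networks with low mobility and low node density. The Weighted Clustering Algorithm (WCA) is a typical clustering algorithm that takes several node attributes into account and assigns each node a weight according to its suitability as a cluster head [5]. The traditional WCA is a combined-weight clustering algorithm [6] that assigns each node a combined weight, which determines the node's probability of becoming a cluster head. The combined weight takes several factors into account, including node mobility, node energy and node degree.

Then the stability function of the node is: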
combination weight computation of nodes, and a "relatively typical node degree" is proposed to replace the simple node degree factor of the traditional WCA. In addition, according to the degree of each node, SLB-WCA formulates local cluster size constraints instead of the global constraints of the traditional algorithm. SLB-WCA makes node weight calculation and cluster size control more reasonable, and makes the network load more balanced. Compared with the traditional WCA, SLB-WCA has fewer cluster heads and better network coverage, which effectively improves the network lifetime.

Keywords: mobile Ad Hoc network; weighted clustering algorithm; load balancing; network coverage
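The combined-weight idea described above can be illustrated with a minimal sketch. It is not the SLB-WCA formula, which this extract does not give. It assumes a generic weighted sum of three cost terms (deviation from an ideal degree, mobility, energy consumed) and a greedy election in which the uncovered node with the lowest weight becomes a cluster head and covers its one-hop neighbors. The `Node` fields, the coefficients in `w`, and `ideal_degree` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    neighbors: set      # names of one-hop neighbors
    mobility: float     # average speed; lower means more stable (assumed metric)
    energy_used: float  # energy consumed so far; lower is better (assumed metric)


def combined_weight(node, ideal_degree, w=(0.5, 0.3, 0.2)):
    """Weighted sum of three cost terms; a smaller value is a better candidate."""
    degree_diff = abs(len(node.neighbors) - ideal_degree)
    return w[0] * degree_diff + w[1] * node.mobility + w[2] * node.energy_used


def elect_cluster_heads(nodes, ideal_degree):
    """Greedy election: pick the lowest-weight uncovered node, cover its neighbors, repeat."""
    weights = {n.name: combined_weight(n, ideal_degree) for n in nodes}
    by_name = {n.name: n for n in nodes}
    uncovered = set(by_name)
    heads = []
    while uncovered:
        # sorted() makes tie-breaking deterministic
        head = min(sorted(uncovered), key=weights.__getitem__)
        heads.append(head)
        uncovered -= {head} | by_name[head].neighbors
    return heads


nodes = [
    Node("a", {"b", "c"}, mobility=1.0, energy_used=2.0),
    Node("b", {"a"}, mobility=3.0, energy_used=1.0),
    Node("c", {"a", "d"}, mobility=0.5, energy_used=0.5),
    Node("d", {"c"}, mobility=2.0, energy_used=4.0),
]
heads = elect_cluster_heads(nodes, ideal_degree=2)
```

In this toy topology node `c` has the lowest weight and is elected first, covering `a` and `d`; node `b` is then left uncovered and becomes the second cluster head.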

Leveraging the power of local spatial autocorrelation in geophysical interpolative clustering-DMKD

Data Min Knowl Disc
DOI 10.1007/s10618-014-0372-z

Leveraging the power of local spatial autocorrelation in geophysical interpolative clustering

Annalisa Appice · Donato Malerba

Received: 16 December 2012 / Accepted: 22 June 2014
© The Author(s) 2014

Responsible editors: Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen and Filip Železný.

A. Appice (B) · D. Malerba
Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro", Via Orabona 4, 70125 Bari, Italy
e-mail: annalisa.appice@uniba.it
D. Malerba
e-mail: donato.malerba@uniba.it

Abstract Nowadays ubiquitous sensor stations are deployed worldwide, in order to measure several geophysical variables (e.g. temperature, humidity, light) for a growing number of ecological and industrial processes. Although these variables are, in general, measured over large zones and long (potentially unbounded) periods of time, stations cannot cover any space location. On the other hand, due to their huge volume, data produced cannot be entirely recorded for future analysis. In this scenario, summarization, i.e. the computation of aggregates of data, can be used to reduce the amount of produced data stored on the disk, while interpolation, i.e. the estimation of unknown data in each location of interest, can be used to supplement station records. We illustrate a novel data mining solution, named interpolative clustering, that has the merit of addressing both these tasks in time-evolving, multivariate geophysical applications. It yields a time-evolving clustering model, in order to summarize geophysical data and computes a weighted linear combination of cluster prototypes, in order to predict data. Clustering is done by accounting for the local presence of the spatial autocorrelation property in the geophysical data. Weights of the linear combination are defined, in order to reflect the inverse distance of the unseen data to each cluster geometry. The cluster geometry is represented through shape-dependent sampling of geographic coordinates of clustered stations. Experiments performed with several data collections investigate the trade-off between the summarization capability and predictive accuracy of the presented interpolative clustering algorithm.

Keywords Spatial autocorrelation · Clustering · Inverse distance weighting · Geophysical data stream

1 Introduction

The widespread use of sensor networks has paved the way for the explosive living ubiquity of geophysical data streams (i.e. streams of data that are measured repeatedly over a set of locations). Procedurally, remote sensors are installed worldwide. They gather information along a number of variables over large zones and long (potentially unbounded) periods of time. In this scenario, spatial distribution of data sources, as well as temporal distribution of measures pose new challenges in the collection and query of data. Much scientific and industrial interest has recently been focused on the deployment of data management systems that gather continuous multivariate data from several data sources, recognize and possibly adapt a behavioral model, deal with queries that concern present and past data, as well as seen and unseen data. This poses specific issues that include storing entirely (unbounded) data on disks with finite memory (Chiky and Hébrail 2008), as well as looking for predictions (estimations) where no measured data are available (Li and Heap 2008).

Summarization is one solution for addressing storage limits, while interpolation is one solution for supplementing unseen data. So far, both these tasks, namely summarization and interpolation, have been extensively investigated in the literature. However, most of the studies consider only one task at a time. Several summarization algorithms, e.g. Rodrigues et al. (2008), Chen et al. (2010), Appice et al. (2013a), have been defined in data mining, in order to compute fast, compact summaries of geophysical data as they arrive. Data storage systems store computed summaries, while discarding actual data. Various interpolation
algorithms, e.g. Shepard (1968b), Krige (1951), have been defined in geostatistics, in order to predict unseen measures of a geophysical variable. They use predictive inferences based upon actual measures sampled at specific locations of space. In this paper, we investigate a holistic approach that links predictive inferences to data summaries. Therefore, we introduce a summarization pattern of geophysical data, which can be computed to save memory space and make predictive inferences easier. We use predictive inferences that exploit knowledge in data summaries, in order to yield accurate predictions covering any (seen and unseen) space location.

We begin by observing that the common factor of several summarization and interpolation algorithms is that they accommodate the spatial autocorrelation analysis in the learned model. Spatial autocorrelation is the cross-correlation of values of a variable strictly due to their relatively close locations on a two-dimensional surface. Spatial autocorrelation exists when there is systematic spatial variation in the values of a given property. This variation can exist in two forms, called positive and negative spatial autocorrelation (Legendre 1993). In the positive case, the value of a variable at a given location tends to be similar to the values of that variable in nearby locations. This means that if the value of some variable is low at a given location, the presence of spatial autocorrelation indicates that nearby values are also low. Conversely, negative spatial autocorrelation is characterized by dissimilar values at nearby locations.
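The distinction between positive and negative spatial autocorrelation can be made concrete with a small sketch. It uses the local Moran's I statistic (Anselin 1995), which the paper formulates in Sect. 3.1, on two toy transects of five stations. The binary adjacency weights in `chain_weights` and the sample values are assumptions made for illustration; the paper itself refers to Gaussian or bi-square weights.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each location.

    values:  list of measurements z(i)
    weights: weights[i][j] is the spatial weight between locations i and j
    """
    n = len(values)
    mean = sum(values) / n
    dev = [z - mean for z in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the data
    result = []
    for i in range(n):
        # weighted sum of the neighbors' deviations from the mean
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        result.append(dev[i] / m2 * lag)
    return result


def chain_weights(n):
    """Binary weights for a 1-D transect: 1 for adjacent stations, 0 otherwise (assumed)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


smooth = [1.0, 2.0, 3.0, 4.0, 5.0]       # nearby stations have similar values
alternating = [1.0, 5.0, 1.0, 5.0, 1.0]  # nearby stations have dissimilar values

w = chain_weights(5)
smooth_i = local_morans_i(smooth, w)
alternating_i = local_morans_i(alternating, w)
```

On the smooth transect no local value is negative, which matches the positive case; on the alternating transect every local value is negative, which matches the negative case.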
Goodchild (1986) remarks that positive autocorrelation is seen much more frequently in practice than negative autocorrelation in geophysical variables. This is justified by Tobler's first law of geography, according to which "everything is related to everything else, but near things are more related than distant things" (Tobler 1979).

As observed by LeSage and Pace (2001), the analysis of spatial autocorrelation is crucial and can be fundamental for building a reliable spatial component into any (statistical) model for geophysical data. With the same viewpoint, we propose to: (i) model the property of spatial autocorrelation when collecting the data records of a number of geophysical variables, (ii) use this model to compute compact summaries of actual data that are discarded and (iii) inject computed summaries into predictive inferences, in order to yield accurate estimations of geophysical data at any space location.

The paper is organized as follows. The next section clarifies the motivation and the actual contribution of this paper. In Sect. 3, related works regarding spatial autocorrelation, spatial interpolators and clustering are reported. In Sect. 4, we report the basics of the presented algorithm, while in Sect. 5, we describe the proposed algorithm. An experimental study with several data collections is presented in Sect. 6 and conclusions are drawn.

2 Motivation and contributions

The analysis of the property of spatial autocorrelation in geophysical data poses specific issues. One issue is that most of the models that represent and learn data with spatial autocorrelation are based on the assumption of spatial stationarity. Thus, they assume a constant mean and a constant variance (no outlier) across space (Stojanova et al.
2012). This means that possible significant variabilities in autocorrelation dependencies throughout the space are overlooked. The variability could be caused by a different underlying latent structure of the space, which varies among its portions in terms of scale of values or density of measures. As pointed out by Angin and Neville (2008), when autocorrelation varies significantly throughout space, it may be more accurate to model the dependencies locally rather than globally.

Another issue is that the spatial autocorrelation analysis is frequently decoupled from the multivariate analysis. In this case, the learning process accounts for the spatial autocorrelation of univariate data, while dealing with distinct variables separately (Appice et al. 2013b). Bailey and Krzanowski (2012) observe that ignoring complex interactions among multiple variables may overlook interesting insights into the correlation of potentially related variables at any site. Based upon this idea, Dray and Jombart (2011) formulate a multivariate definition of the concept of spatial autocorrelation, which centers on the extent to which values for a number of variables observed at a given location show a systematic (more than likely under spatial randomness), homogeneous association with values observed at the "neighboring" locations.

In this paper, we develop an approach to modeling non-stationary spatial autocorrelation of multivariate geophysical data by using interpolative clustering. As in clustering, clusters of records that are similar to each other at nearby locations are identified, but a cluster description and a predictive model are associated to each cluster. Data records are aggregated through clusters based on the cluster descriptions.
The associated predictive models, that provide predictions for the variables, are stored as summaries of the clustered data. On any future demand, predictive models queried to databases are processed according to the requests, in order to yield accurate estimates for the variables. Interpolative clustering is a form of conceptual clustering (Michalski and Stepp 1983) since, besides the clusters themselves, it also provides symbolic descriptions (in the form of conjunctions of conditions) of the constructed clusters. Thus, we can also plan to consider this description, in order to obtain clusters in different contexts of the same domain. However, in contrast to conceptual clustering, interpolative clustering is a form of supervised learning. On the other hand, interpolative clustering is similar to predictive clustering (Blockeel et al. 1998), since it is a form of supervised learning. However, unlike predictive clustering, where the predictive space (target variables) is typically distinguished from the descriptive one (explanatory variables),¹ variables of interpolative clustering play, in principle, both target and explanatory roles.

Interpolative clustering trees (ICTs) are a class of tree structured models where a split node is associated with a cluster and a leaf node with a single predictive model for the target variables of interest. The top node of the ICT contains the entire sample of training records. This cluster is recursively partitioned along the target variables into smaller sub-clusters. A predictive model (the mean) is computed for each target variable and then associated with each leaf. All the variables are predicted independently. In the context of this paper, an ICT is built by integrating the consideration of a local indicator of spatial autocorrelation, in order to account for significant variabilities of autocorrelation dependencies in training data. Spatial autocorrelation is coupled with a multivariate analysis by accounting for the spatial dependence of data and their
multivariate variance, simultaneously (Dray and Jombart 2011). This is done by maximizing the variance reduction of local indicators of spatial autocorrelation computed for multivariate data when evaluating the candidates for adding a new node to the tree. This solution has the merit of improving both the summarization, as well as the predictive performance of the computed models. Memory space is saved by storing a single summary for data of multiple variables. Predictive accuracy is increased by exploiting the autocorrelation of data clustered in space.

From the summarization perspective, an ICT is used to summarize geophysical data according to a hierarchical view of the spatial autocorrelation. We can browse generated clusters at different levels of the hierarchy. Predictive models on leaf clusters model spatial autocorrelation dependencies as stationary over the local geometry of the clusters. From the interpolation perspective, an ICT is used to compute knowledge to make accurate predictive inferences easier. We can use Inverse Distance Weighting² (Shepard 1968b) to predict variables at a specific location by a weighted linear combination of the predictive models on the leaf clusters. Weights are inverse functions of the distance of the query point from the clusters.

¹ The predictive clustering framework is originally defined in Blockeel et al. (1998), in order to combine clustering problems and classification/regression problems. The predictive inference is performed by distinguishing between target variables and explanatory variables. Target variables are considered when evaluating similarity between training data such that training examples with similar target values are grouped in the same cluster, while training examples with dissimilar target values are grouped in separate clusters. Explanatory variables are used to generate a symbolic description of the clusters. Although the algorithm presented in Blockeel et al. (1998) can be, in principle, run by considering the same set of variables for both explanatory and target roles, this case is not investigated in the original study.

Finally, we can observe that an ICT provides a static model of a geophysical phenomenon. Nevertheless, inferences based on static models of spatial autocorrelation require temporal stationarity of statistical properties of variables. In the geophysical context, data are frequently subject to the temporal variation of such properties. This requires dynamic models that can be updated continuously as new fresh data arrive (Gama 2010). In this paper, we propose an incremental algorithm for the construction of a time-adaptive ICT. When a new sample of records is acquired through stations of a sensor network, a past ICT is modified, in order to model new data of the process, which may change their properties over time. In theory, a distinct tree can be learned for each time point and several trees can be subsequently combined using some general framework (e.g. Spiliopoulou et al. 2006) for tracking cluster evolution over time. However, this solution is prohibitively time-consuming when data arrive at a high rate.
By taking into account that (1) geophysical variables are often slowly time-varying, and (2) a change of the properties of the data distribution of variables is often restricted to a delimited group of stations, more efficient learning algorithms can be derived. In this paper, we propose an algorithm that retains past clusters as long as they discriminate between surfaces of spatial autocorrelated data, while it mines novel clusters only if the latest become inaccurate. In this way, we can save computation time and track the evolution of the cluster model by detecting changes in the data properties at no additional cost.

The specific contributions in this paper are: (1) the investigation of the property of spatial autocorrelation in interpolative clustering; (2) the development of an approach that uses a local indicator of the spatial autocorrelation property, in order to build an ICT by taking into account non-stationarity in autocorrelation and multivariate analysis; (3) the development of an incremental algorithm to yield a time-evolving ICT that accounts for the fact that the statistical properties of the geophysical data may change over time; (4) an extensive evaluation of the effectiveness of the proposed (incremental) approach on several real geophysical data.

3 Related works

This work has been motivated by the research literature for the property of spatial autocorrelation and its influence on the interpolation theory and (predictive) clustering.
In the following subsections, we report related works from these research lines.

² Inverse distance weighting is a common interpolation algorithm. It has several advantages that endorse its widespread use in geostatistics (Li and Revesz 2002; Karydas et al. 2009; Li et al. 2011): simplicity of implementation; lack of tunable parameters; ability to interpolate scattered data and work on any grid without suffering from multicollinearity.

3.1 Spatial autocorrelation

Spatial autocorrelation analysis quantifies the degree of spatial clustering (positive autocorrelation) or dispersion (negative autocorrelation) in the values of a variable measured over a set of point locations. In the traditional approach to spatial autocorrelation, the "overall pattern" of spatial dependence in data is summarized into a single indicator, such as the familiar Global Moran's I, Global Geary's C or Gamma indicators of spatial associations (see Legendre 1993, for details). These indicators allow us to establish whether nearby locations tend to have similar (i.e. spatial clustering), random or different values (i.e. spatial dispersion) of a variable on the entire sample. They are referred to as global indicators of spatial autocorrelation, in contrast to the local indicators that we will consider in this paper. Procedurally, global indicators (see Getis 2008) are computed, in order to indicate the degree of regional clustering over the entire distribution of a geophysical variable. Stojanova et al. (2013), as well as Appice et al. (2012) have recently investigated the addition of these indicators to predictive inferences. They show how global measures of spatial autocorrelation can be computed at multiple levels of a hierarchical clustering, in order to improve predictions that are consistently clustered in space. Despite the useful results of these studies, a limitation of global indicators of spatial autocorrelation is that they assume a spatial stationarity across space (Angin and Neville 2008). While this may be
useful when spatial associations are studied for the small areas associated to the bottom levels of a hierarchical model, it is not very meaningful or may even be highly misleading in the analysis of spatial associations for large areas associated to the top levels of a hierarchical model.

Local indicators offer an alternative to global modeling by looking for "local patterns" of spatial dependence within the study region (see Boots 2002 for a survey). Unlike global indicators, which return one value for the entire sample of data, local indicators return one value for each sampled location of a variable; this value expresses the degree to which that location is part of a cluster. Widely speaking, a local indicator of spatial autocorrelation allows us to discover deviations from global patterns of spatial association, as well as hot spots like local clusters or local outliers. Several local indicators of spatial autocorrelation are formulated in the literature.

Anselin's local Moran's I (Anselin 1995) is a local indicator of spatial autocorrelation, that has gained wide acceptance in the literature. It is formulated as follows:

$$I(i) = \frac{z(i) - \bar{z}}{m_2} \sum_{j=1,\, j \neq i}^{n} \lambda(ij)\,\bigl(z(j) - \bar{z}\bigr), \qquad (1)$$

where $z(i)$ is the value of a variable $Z$ measured at the location $i$, $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z(i)$ is the mean of the data measured for $Z$, $\lambda(ij)$ is a spatial (Gaussian or bi-square) weight between the locations $i$ and $j$ over a neighborhood structure of the training data and $m_2 = \frac{1}{n}\sum_{j} \bigl(z(j) - \bar{z}\bigr)^2$ is the second moment. Anselin's local Moran's I is related to the global Moran I as the average of $I(i)$ is equal to the global I, up to a factor of proportionality. A positive value for $I(i)$ indicates that $z(i)$ is surrounded by similar values. Therefore, this value is part of a cluster. A negative value for $I(i)$ indicates that $z(i)$ is surrounded by dissimilar values. Thus, this value is an outlier.

The standardized Getis and Ord local GI∗ (Getis and Ord 1992) is a local
indicator of spatial autocorrelation that is formulated as follows:

$$GI^{*}(i) = \frac{1}{\sqrt{\dfrac{S^2}{n-1}\left(n \sum_{j=1,\, j \neq i}^{n} \lambda(ij)^2 - \Lambda(i)^2\right)}} \left(\sum_{j=1,\, j \neq i}^{n} \lambda(ij)\, z(j) - \bar{z}\,\Lambda(i)\right), \qquad (2)$$

where $\Lambda(i) = \sum_{j=1,\, j \neq i}^{n} \lambda(ij)$ and $S^2 = \frac{1}{n}\sum_{j=1}^{n} \bigl(z(j) - \bar{z}\bigr)^2$. A positive value for $GI^{*}(i)$ indicates clusters of high values around $i$, while a negative value for $GI^{*}(i)$ indicates clusters of low values around $i$.

The interpretation of GI∗ is different from that of I: the former distinguishes clusters of high and low values, but does not capture the presence of negative spatial autocorrelation (dispersion); the latter is able to detect both positive and negative spatial autocorrelations, but does not distinguish clusters of high or low values. Getis and Ord (1992) recommend computing GI∗ to look for spatial clusters and I to detect spatial outliers. By following this advice, Holden and Evans (2010) apply fuzzy C-means to GI∗ values, in order to cluster satellite-inferred burn severity classes. Scrucca (2005) and Appice et al. (2013c) use k-means to cluster GI∗ values, to compute spatially aware clusters of the data measured for a geophysical variable.

Measures of both global and local spatial autocorrelation are principally defined for univariate data. However, the integration of multivariate and autocorrelation information has recently been advocated by Dray and Jombart (2011). The simplest approach considers a two-step procedure, where data are first summarized with a multivariate analysis such as PCA. In a second step, any univariate (either global or local) spatial measure can be applied to PCA scores for each axis separately. The other approach finds coefficients to obtain a linear combination of variables, which maximizes the product between the variance and the global Moran measure of the scores. Alternatively, Stojanova et al. (2013) propose computing the mean of global measures (Moran I and global Getis C), computed for distinct variables of a vector, as a global indicator of spatial autocorrelation of the vector by blurring cross-correlations between separate variables. Dray et
al. (2006) explore the theory of the principal coordinates of neighbor matrices and develop the framework of Moran's eigenvector maps. They demonstrate that their framework can be linked to spatial autocorrelation structure functions also in multivariate domains. Blanchet et al. (2008) expand this framework by taking into account asymmetric directional spatial processes.

3.2 Interpolation theory

Studies on spatial interpolation were initially encouraged by the analysis of ore mining, water extraction or pumping and rock inspection (Cressie 1993). In these fields, interpolation algorithms are required as the main resource to recover unknown information and account for problems like missing data, energy saving, sensor default, as well as to support data summarization and investigation of spatial correlation between observed data (Lam 1983). The interpolation algorithms estimate a geophysical quantity in any geographic location where the variable measure is not available. The interpolated value is derived by making use of the knowledge of the nearby observed data and, sometimes, of some hypotheses or supplementary information on the data variable. The rationale behind this spatially aware estimate of a variable is the property of positive spatial autocorrelation. Any spatial interpolator accounts for this property, including within its formulation the consideration of a stronger correlation between data which are closer than for those that are farther apart.

Regression (Burrough and McDonnell 1998), Inverse Distance Weighting (IDW) (Shepard 1968a), Radial Basis Functions (RBF) (Lin and Chen 2004) and Kriging (Krige 1951) are the most common interpolation algorithms. These algorithms are studied to deal with the irregular sampling of the investigated area (Isaaks and Srivastava 1989; Stein 1999) or with the difficulty of describing the area by the local atlas of larger and irregular manifolds. Regression algorithms, that are statistical interpolators, determine a functional relationship between
the variable to be predicted and the geographic coordinates of points where the variable is measured. IDW and RBF, which are both deterministic interpolators, use mathematical functions to calculate an unknown variable value in a geographic location, based either on the degree of similarity (IDW) or the degree of smoothing (RBF) in relation to neighboring data points. Both algorithms share with Kriging the idea that the collection of variable observations can be considered as a production of correlated spatial random data with specific statistical properties. In Kriging, specifically, this correlation is used to derive a second-order model of the variable (the variogram). The variogram represents an approximate measure of the spatial dissimilarity of the observed data. The IDW interpolation is based on a linear combination of nearby observations with weights proportional to an inverse power of the distances. It is a heuristic but efficient approach justified by the typical power-law of the spatial correlation. In this sense, IDW accomplishes the same strategy adopted by the more rigorous formulation of Kriging (Li and Revesz 2002; Karydas et al. 2009; Li et al. 2011).

Several studies have arisen from these base interpolation algorithms. Ohashi and Torgo (2012) investigate a mapping of the spatial interpolation problem into a multiple regression task. They define a series of spatial indicators to better describe the spatial dynamics of the variable of interest. Umer et al. (2010) propose to recover missing data of a dense network by a Kriging interpolator. By considering that the computational complexity of a variogram is cubic in the size of the observed data (Cressie 1993), the variogram calculus, in this study, is sped up by processing only the areas with information holes, rather than the global data. Goovaerts (1997) extends Kriging, in order to predict multiple variables (cokriging) measured at the same location.
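The IDW interpolator described above can be sketched in a few lines. This is a minimal illustration of the basic technique rather than the algorithm evaluated in the paper: the function name `idw_predict`, the exponent `power=2.0` and the station data are assumptions chosen for the example.

```python
import math


def idw_predict(query, points, values, power=2.0):
    """Inverse distance weighting estimate at `query`.

    points: list of (x, y) coordinates with known measurements
    values: measurements at those coordinates
    power:  distance-decay exponent (2 is a common choice, assumed here)
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return v  # the query coincides with an observed point
        w = 1.0 / d ** power  # closer observations receive larger weights
        num += w * v
        den += w
    return num / den


stations = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
temperature = [10.0, 20.0, 30.0]

at_station = idw_predict((0.0, 0.0), stations, temperature)
midway = idw_predict((1.0, 0.0), stations, temperature)
```

The estimate at an observed station reproduces its measurement, and the estimate at `(1.0, 0.0)` lies between the two nearest measurements, pulled slightly upward by the more distant third station.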
Cokriging uses direct and cross covariance functions that are computed in the sample of the observed data. Teegavarapu et al. (2012) use IDW and 1-Nearest Neighbor, in order to interpolate a grid of rainfall data and re-sample data at multiple resolutions. Lu and Wong (2008) formulate IDW in an adaptive way, by accounting for the varying distance-decay relationship in the area under examination. The weighting parameters are varied according to the spatial pattern of the sampled points in the neighborhood. The algorithm proves more efficient than ordinary IDW and, in several cases, also better than Kriging. Recently, Appice et al. (2013b) have started to link IDW to the spatio-temporal knowledge enclosed in specific summarization patterns, called trend clusters. Nevertheless, this investigation is restricted to univariate data.

There are several applications (e.g. Li and Revesz 2002; Karydas et al. 2009; Li et al. 2011) where IDW is used. This contributes to highlighting IDW as a deterministic, quick and simple interpolation algorithm that yields accurate predictions. On the other hand, Kriging is based on the statistical properties of a variable and, hence, it is expected to be more accurate regarding the general characteristics of the recorded data and the efficacy of the model. In any case, the accuracy of Kriging depends on a reliable estimation of the variogram (Isaaks and Srivastava 1989; Şen and Şalhn 2001) and the variogram computation cost is proportional to the cube of the number of observed data (Cressie 1990). This cost can be prohibitive in time-evolving applications, where the statistical properties of a variable may change over time. In the Data Mining framework, the change of the underlying properties over time is usually called concept drift (Gama 2010). It is noteworthy that the concept drift, expected in dynamic data, can be a serious complication for Kriging. It may impose the repetition of costly computation of the variogram each time the
statistical properties of the variable change significantly. These considerations motivate the broad use of an interpolator like IDW that is accurate enough and whose learning phase can be run on-line when data are collected in a stream.

3.3 Cluster analysis

Cluster analysis is frequently used in geophysical data interpretation, in order to obtain meaningful and useful results (Song et al. 2010). In addition, it can be used as a summarization paradigm for data streams, since it underlines the advantage of discovering summaries (clusters) that adjust well to the evolution of data. The seminal work is that of Aggarwal et al. (2007), where a k-means algorithm is tailored to discover micro-clusters from multidimensional transactions that arrive in a stream. Micro-clusters are adjusted each time a transaction arrives, in order to preserve the temporal locality of data along a time horizon. Another clustering algorithm to summarize data streams is presented in Nassar and Sander (2007). The main characteristic of this algorithm is that it allows us to summarize multi-source data streams. The multi-source stream is composed of sets of numeric values that are transmitted by a variable number of sources at consecutive time points. Timestamped values are modeled as 2D (time-domain) points of a Euclidean space. Hence, the source location is neither represented as a dimension of analysis nor processed as information-bearing. The stream is broken into windows.
Dense regions of 2D points are detected in these windows and represented by means of cluster feature vectors. Although a spatial clustering algorithm is employed, the spatial arrangement of data sources is still neglected. Appice et al. (2013a) describe a clustering algorithm that accounts for the property of spatial autocorrelation when computing clusters that are compact and accurate summaries of univariate geophysical data streams. In all these studies clustering is addressed as an unsupervised task.

Predictive clustering is a supervised extension of cluster analysis, which combines elements of prediction and clustering. The task, originally formulated in Blockeel et al. (1998), assumes: (1) a descriptive space of explanatory variables, (2) a predictive space of target variables and (3) a set of training records defined on both descriptive and predictive space. Training records that are similar to each other are clustered and a predictive model is associated to each cluster. This allows us to predict the unknown

AI Terminology

Glossary of Important Artificial Intelligence Terms

1. Terms beginning with A

Artificial General Intelligence/AGI 通用人工智能
Artificial Intelligence/AI 人工智能
Association analysis 关联分析
Attention mechanism 注意力机制
Attribute conditional independence assumption 属性条件独立性假设
Attribute space 属性空间
Attribute value 属性值
Autoencoder 自编码器
Automatic speech recognition 自动语音识别
Automatic summarization 自动摘要
Average gradient 平均梯度
Average-Pooling 平均池化
Accumulated error backpropagation 累积误差逆传播
Activation Function 激活函数
Adaptive Resonance Theory/ART 自适应谐振理论
Additive model 加性学习
Adversarial Networks 对抗网络
Affine Layer 仿射层
Affinity matrix 亲和矩阵
Agent 代理/智能体
Algorithm 算法
Alpha-beta pruning α-β剪枝
Anomaly detection 异常检测
Approximation 近似
Area Under ROC Curve/AUC ROC曲线下面积

2. Terms beginning with B

Backpropagation Through Time 通过时间的反向传播
Backpropagation/BP 反向传播
Base learner 基学习器
Base learning algorithm 基学习算法
Batch Normalization/BN 批量归一化
Bayes decision rule 贝叶斯判定准则
Bayes Model Averaging/BMA 贝叶斯模型平均
Bayes optimal classifier 贝叶斯最优分类器
Bayesian decision theory 贝叶斯决策论
Bayesian network 贝叶斯网络
Between-class scatter matrix 类间散度矩阵
Bias 偏置/偏差
Bias-variance decomposition 偏差-方差分解
Bias-Variance Dilemma 偏差-方差困境
Bi-directional Long-Short Term Memory/Bi-LSTM 双向长短期记忆
Binary classification 二分类
Binomial test 二项检验
Bi-partition 二分法
Boltzmann machine 玻尔兹曼机
Bootstrap sampling 自助采样法/可重复采样/有放回采样
Bootstrapping 自助法
Break-Even Point/BEP 平衡点

3. Terms beginning with C

Calibration 校准
Cascade-Correlation 级联相关
Categorical attribute 离散属性
Class-conditional probability 类条件概率
Classification and regression tree/CART 分类与回归树
Classifier 分类器
Class-imbalance 类别不平衡
Closed-form 闭式
Cluster 簇/类/集群
Cluster analysis 聚类分析
Clustering 聚类
Clustering ensemble 聚类集成
Co-adapting 共适应
Coding matrix 编码矩阵
COLT 国际学习理论会议
Committee-based learning 基于委员会的学习
Competitive learning 竞争型学习
Component learner 组件学习器
Comprehensibility 可解释性
Computation Cost 计算成本
Computational Linguistics 计算语言学
Computer vision 计算机视觉
Concept drift 概念漂移
Concept Learning System/CLS 概念学习系统
Conditional entropy 条件熵
Conditional mutual information 条件互信息
Conditional Probability Table/CPT 条件概率表
Conditional random field/CRF 条件随机场
Conditional risk 条件风险
Confidence 置信度
Confusion matrix 混淆矩阵
Connection weight 连接权
Connectionism 连结主义
Consistency 一致性/相合性
Contingency table 列联表
Continuous attribute 连续属性
Convergence 收敛
Conversational agent 会话智能体
Convex quadratic programming 凸二次规划
Convexity 凸性
Convolutional neural network/CNN 卷积神经网络
Co-occurrence 同现
Correlation coefficient 相关系数
Cosine similarity 余弦相似度
Cost curve 成本曲线
Cost Function 成本函数
Cost matrix 成本矩阵
Cost-sensitive 成本敏感
Cross entropy 交叉熵
Cross validation 交叉验证
Crowdsourcing 众包
Curse of dimensionality 维数灾难
Cut point 截断点
Cutting plane algorithm 割平面法

4. Terms beginning with D

Data mining 数据挖掘
Data set 数据集
Decision Boundary 决策边界
Decision stump 决策树桩
Decision tree 决策树/判定树
Deduction 演绎
Deep Belief Network 深度信念网络
Deep Convolutional Generative Adversarial Network/DCGAN 深度卷积生成对抗网络
Deep learning 深度学习
Deep neural network/DNN 深度神经网络
Deep Q-Learning 深度Q学习
Deep Q-Network 深度Q网络
Density estimation 密度估计
Density-based clustering 密度聚类
Differentiable neural computer 可微分神经计算机
Dimensionality reduction algorithm 降维算法
Directed edge 有向边
Disagreement measure 不合度量
Discriminative model 判别模型
Discriminator 判别器
Distance measure 距离度量
Distance metric learning 距离度量学习
Distribution 分布
Divergence 散度
Diversity measure 多样性度量/差异性度量
Domain adaptation 领域自适应
Downsampling 下采样
D-separation (Directed separation) 有向分离
Dual problem 对偶问题
Dummy node 哑结点
Dynamic Fusion 动态融合
Dynamic programming 动态规划

5. Terms beginning with E

Eigenvalue decomposition 特征值分解
Embedding 嵌入
Emotional analysis 情绪分析
Empirical conditional entropy 经验条件熵
Empirical entropy 经验熵
Empirical error 经验误差
Empirical risk 经验风险
End-to-End 端到端
Energy-based model 基于能量的模型
Ensemble learning 集成学习
Ensemble pruning 集成修剪
Error Correcting Output Codes/ECOC 纠错输出码
Error rate 错误率
Error-ambiguity decomposition 误差-分歧分解
Euclidean distance 欧氏距离
Evolutionary computation 演化计算
Expectation-Maximization 期望最大化
Expected loss 期望损失
Exploding Gradient Problem 梯度爆炸问题
Exponential loss function 指数损失函数
Extreme Learning Machine/ELM 超限学习机

6. Terms beginning with F

Factorization 因子分解
False negative 假负类
False positive 假正类
False Positive Rate/FPR 假正例率
Feature engineering 特征工程
Feature selection 特征选择
Feature vector 特征向量
Feature learning 特征学习
Feedforward Neural Networks/FNN 前馈神经网络
Fine-tuning 微调
Flipping output 翻转法
Fluctuation 震荡
Forward stagewise algorithm 前向分步算法
Frequentist 频率主义学派
Full-rank matrix 满秩矩阵
Functional neuron 功能神经元

7. Terms beginning with G

Gain ratio 增益率
Game theory 博弈论
Gaussian kernel function 高斯核函数
Gaussian Mixture Model 高斯混合模型
General Problem Solving 通用问题求解
Generalization 泛化
Generalization error 泛化误差
Generalization error bound 泛化误差上界
Generalized Lagrange function 广义拉格朗日函数
Generalized linear model 广义线性模型
Generalized Rayleigh quotient 广义瑞利商
Generative Adversarial Networks/GAN 生成对抗网络
Generative Model 生成模型
Generator 生成器
Genetic Algorithm/GA 遗传算法
Gibbs sampling 吉布斯采样
Gini index 基尼指数
Global minimum 全局最小
Global Optimization 全局优化
Gradient boosting 梯度提升
Gradient Descent 梯度下降
Graph theory 图论
Ground-truth 真相/真实

8. Terms beginning with H

Hard margin 硬间隔
Hard voting 硬投票
Harmonic mean 调和平均
Hesse matrix 海塞矩阵
Hidden dynamic model 隐动态模型
Hidden layer 隐藏层
Hidden Markov Model/HMM 隐马尔可夫模型
Hierarchical clustering 层次聚类
Hilbert space 希尔伯特空间
Hinge loss function 合页损失函数
Hold-out 留出法
Homogeneous 同质
Hybrid computing 混合计算
Hyperparameter 超参数
Hypothesis 假设
Hypothesis test 假设验证

9. Terms beginning with I

ICML 国际机器学习会议
Improved iterative scaling/IIS 改进的迭代尺度法
Incremental learning 增量学习
Independent and identically distributed/i.i.d. 独立同分布
Independent Component Analysis/ICA 独立成分分析
Indicator function 指示函数
Individual learner 个体学习器
Induction 归纳
Inductive bias 归纳偏好
Inductive learning 归纳学习
Inductive Logic Programming/ILP 归纳逻辑程序设计
Information entropy 信息熵
Information gain 信息增益
Input layer 输入层
Insensitive loss 不敏感损失
Inter-cluster similarity 簇间相似度
International Conference on Machine Learning/ICML 国际机器学习大会
Intra-cluster similarity 簇内相似度
Intrinsic value 固有值
Isometric Mapping/Isomap 等度量映射
Isotonic regression 等分回归
Iterative Dichotomiser 迭代二分器

10. Terms beginning with K

Kernel method 核方法
Kernel trick 核技巧
Kernelized Linear Discriminant Analysis/KLDA 核线性判别分析
K-fold cross validation k折交叉验证/k倍交叉验证
K-Means Clustering K-均值聚类
K-Nearest Neighbours Algorithm/KNN K近邻算法
Knowledge base 知识库
Knowledge Representation 知识表征

11. Terms beginning with L

Label space 标记空间
Lagrange duality 拉格朗日对偶性
Lagrange multiplier 拉格朗日乘子
Laplace smoothing 拉普拉斯平滑
Laplacian correction 拉普拉斯修正
Latent Dirichlet Allocation 隐狄利克雷分布
Latent semantic analysis 潜在语义分析
Latent variable 隐变量
Lazy learning 懒惰学习
Learner 学习器
Learning by
analogy类比学习Learning rate学习率Learning Vector Quantization/LVQ学习向量量化Least squares regression tree最小二乘回归树Leave-One-Out/LOO留一法linear chain conditional random field线性链条件随机场Linear Discriminant Analysis／LDA线性判别分析Linear model线性模型Linear Regression线性回归Link function联系函数Local Markov property局部马尔可夫性Local minimum局部最小Log likelihood对数似然Log odds／logit对数几率Logistic Regression Logistic 回归Log-likelihood对数似然Log-linear regression对数线性回归Long-Short Term Memory/LSTM长短期记忆Loss function损失函数12、M开头的词汇Machine translation/MT机器翻译Macron-P宏查准率Macron-R宏查全率Majority voting绝对多数投票法Manifold assumption流形假设Manifold learning流形学习Margin theory间隔理论Marginal distribution边际分布Marginal independence边际独立性Marginalization边际化Markov Chain Monte Carlo/MCMC马尔可夫链蒙特卡罗方法Markov Random Field马尔可夫随机场Maximal clique最大团Maximum Likelihood Estimation/MLE极大似然估计／极大似然法Maximum margin最大间隔Maximum weighted spanning tree最大带权生成树Max-Pooling最大池化Mean squared error均方误差Meta-learner元学习器Metric learning度量学习Micro-P微查准率Micro-R微查全率Minimal Description Length/MDL最小描述长度Minimax game极小极大博弈Misclassification cost误分类成本Mixture of experts混合专家Momentum动量Moral graph道德图／端正图Multi-class classification多分类Multi-document summarization多文档摘要Multi-layer feedforward neural networks多层前馈神经网络Multilayer Perceptron/MLP多层感知器Multimodal learning多模态学习Multiple Dimensional Scaling多维缩放Multiple linear regression多元线性回归Multi-response Linear Regression ／MLR多响应线性回归Mutual information互信息13、N开头的词汇Naive bayes朴素贝叶斯Naive Bayes Classifier朴素贝叶斯分类器Named entity recognition命名实体识别Nash equilibrium纳什均衡Natural language generation/NLG自然语言生成Natural language processing自然语言处理Negative class负类Negative correlation负相关法Negative Log Likelihood负对数似然Neighbourhood Component Analysis/NCA近邻成分分析Neural Machine Translation神经机器翻译Neural Turing Machine神经图灵机Newton method牛顿法NIPS国际神经信息处理系统会议No Free Lunch Theorem／NFL没有免费的午餐定理Noise-contrastive estimation噪音对比估计Nominal attribute列名属性Non-convex optimization非凸优化Nonlinear model非线性模型Non-metric distance非度量距离Non-negative matrix factorization非负矩阵分解Non-ordinal attribute无序属性Non-Saturating 
Game非饱和博弈Norm范数Normalization归一化Nuclear norm核范数Numerical attribute数值属性14、O开头的词汇Objective function目标函数Oblique decision tree斜决策树Occam’s razor奥卡姆剃刀Odds几率Off-Policy离策略One shot learning一次性学习One-Dependent Estimator／ODE独依赖估计On-Policy在策略Ordinal attribute有序属性Out-of-bag estimate包外估计Output layer输出层Output smearing输出调制法Overfitting过拟合／过配Oversampling过采样15、P开头的词汇Paired t-test成对t 检验Pairwise成对型Pairwise Markov property成对马尔可夫性Parameter参数Parameter estimation参数估计Parameter tuning调参Parse tree解析树Particle Swarm Optimization/PSO粒子群优化算法Part-of-speech tagging词性标注Perceptron感知机Performance measure性能度量Plug and Play Generative Network即插即用生成网络Plurality voting相对多数投票法Polarity detection极性检测Polynomial kernel function多项式核函数Pooling池化Positive class正类Positive definite matrix正定矩阵Post-hoc test后续检验Post-pruning后剪枝potential function势函数Precision查准率／准确率Prepruning预剪枝Principal component analysis/PCA主成分分析Principle of multiple explanations多释原则Prior先验Probability Graphical Model概率图模型Proximal Gradient Descent/PGD近端梯度下降Pruning剪枝Pseudo-label伪标记16、Q开头的词汇Quantized Neural Network量子化神经网络Quantum computer量子计算机Quantum Computing量子计算Quasi Newton method拟牛顿法17、R开头的词汇Radial Basis Function／RBF径向基函数Random Forest Algorithm随机森林算法Random walk随机漫步Recall查全率／召回率Receiver Operating Characteristic/ROC受试者工作特征Rectified Linear Unit/ReLU线性修正单元Recurrent Neural Network循环神经网络Recursive neural network递归神经网络Reference model参考模型Regression回归Regularization正则化Reinforcement learning/RL强化学习Representation learning表征学习Representer theorem表示定理reproducing kernel Hilbert space/RKHS再生核希尔伯特空间Re-sampling重采样法Rescaling再缩放Residual Mapping残差映射Residual Network残差网络Restricted Boltzmann Machine/RBM受限玻尔兹曼机Restricted Isometry Property/RIP限定等距性Re-weighting重赋权法Robustness稳健性/鲁棒性Root node根结点Rule Engine规则引擎Rule learning规则学习18、S开头的词汇Saddle point鞍点Sample space样本空间Sampling采样Score function评分函数Self-Driving自动驾驶Self-Organizing Map／SOM自组织映射Semi-naive Bayes classifiers半朴素贝叶斯分类器Semi-Supervised Learning半监督学习semi-Supervised Support Vector Machine半监督支持向量机Sentiment analysis情感分析Separating 
hyperplane分离超平面Sigmoid function Sigmoid 函数Similarity measure相似度度量Simulated annealing模拟退火Simultaneous localization and mapping同步定位与地图构建Singular Value Decomposition奇异值分解Slack variables松弛变量Smoothing平滑Soft margin软间隔Soft margin maximization软间隔最大化Soft voting软投票Sparse representation稀疏表征Sparsity稀疏性Specialization特化Spectral Clustering谱聚类Speech Recognition语音识别Splitting variable切分变量Squashing function挤压函数Stability-plasticity dilemma可塑性-稳定性困境Statistical learning统计学习Status feature function状态特征函Stochastic gradient descent随机梯度下降Stratified sampling分层采样Structural risk结构风险Structural risk minimization/SRM结构风险最小化Subspace子空间Supervised learning监督学习／有导师学习support vector expansion支持向量展式Support Vector Machine/SVM支持向量机Surrogat loss替代损失Surrogate function替代函数Symbolic learning符号学习Symbolism符号主义Synset同义词集19、T开头的词汇T-Distribution Stochastic Neighbour Embedding/t-SNE T–分布随机近邻嵌入Tensor张量Tensor Processing Units/TPU张量处理单元The least square method最小二乘法Threshold阈值Threshold logic unit阈值逻辑单元Threshold-moving阈值移动Time Step时间步骤Tokenization标记化Training error训练误差Training instance训练示例／训练例Transductive learning直推学习Transfer learning迁移学习Treebank树库Tria-by-error试错法True negative真负类True positive真正类True Positive Rate/TPR真正例率Turing Machine图灵机Twice-learning二次学习20、U开头的词汇Underfitting欠拟合／欠配Undersampling欠采样Understandability可理解性Unequal cost非均等代价Unit-step function单位阶跃函数Univariate decision tree单变量决策树Unsupervised learning无监督学习／无导师学习Unsupervised layer-wise training无监督逐层训练Upsampling上采样21、V开头的词汇Vanishing Gradient Problem梯度消失问题Variational inference变分推断VC Theory VC维理论Version space版本空间Viterbi algorithm维特比算法Von Neumann architecture冯·诺伊曼架构22、W开头的词汇Wasserstein GAN/WGAN Wasserstein生成对抗网络Weak learner弱学习器Weight权重Weight sharing权共享Weighted voting加权投票法Within-class scatter matrix类内散度矩阵Word embedding词嵌入Word sense disambiguation词义消歧23、Z开头的词汇Zero-data learning零数据学习Zero-shot learning零次学习。

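Comparison of Genetic-Distance Clustering and Model-Based Clustering in the Analysis of Population Genetic Structure of Local Chicken Breeds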

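1.2 Microsatellite primers
Drawing on the primers used in studies of the genetic diversity of local chicken breeds [4,12-13], 16 microsatellite loci that performed well and showed at least moderate heterozygosity were selected: MCW0295, MCW0222, MCW0014, MCW0067, MCW0069, MCW0034, MCW0111, MCW0078, MCW0206, LEI0094, LEI0234, MCW0330, MCW0104, MCW0020, MCW0165, MCW0123.
1.3 PCR amplification, electrophoresis and recording of results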
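In studies that use genotype data such as microsatellites, single nucleotide polymorphisms (SNP) and restriction fragment length polymorphisms (RFLP) to analyze population genetic structure and the relationships among breeds, clustering methods fall into two main types. The first is the distance-based cluster method, which computes the genetic distance between each pair of populations (or individuals) and then builds a dendrogram from those distances with the NJ (neighbor joining) or UPGMA (unweighted pair group method with arithmetic mean) algorithm in order to analyze population genetic structure and relationships. The genetic distances currently used in distance-based clustering include DS (Nei's standard genetic distance), DR (Reynolds' genetic distance) and DA (Nei's improved genetic distance); genetic distances have been widely used in analyzing the genetic structure and relationships of livestock and poultry breeds [1-6].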
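The distance-based approach described above can be sketched in a few lines. The following is a minimal pure-Python illustration of UPGMA, not the authors' analysis pipeline: the function name `upgma`, the four population labels and the pairwise distances are invented for illustration and are not data from the paper.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree.

    `dist` maps frozenset({a, b}) to the genetic distance between a and b.
    Returns the tree as a nested string plus the list of merges with heights.
    """
    sizes = {n: 1 for n in names}  # active clusters and their leaf counts
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2  # node height is half the distance
        new = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted arithmetic mean.
        for c in sizes:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[new] = na + nb
        merges.append((a, b, height))
    return next(iter(sizes)), merges


# Hypothetical pairwise genetic distances between four populations.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree, merges = upgma(names, dist)
print(tree)
for a, b, h in merges:
    print(f"merge {a} + {b} at height {h}")
```

In practice the input would be a DS, DR or DA matrix computed from allele frequencies; NJ differs from UPGMA in that it does not assume equal evolutionary rates across lineages.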
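Comparison of Genetic-Distance Clustering and Model-Based Clustering in the Analysis of Population Genetic Structure of Local Chicken Breeds
Li Huifang1, Chen Kuanwei1*, Han Wei1, Zhang Xueyu1, Gao Yushi1, Chen Guohong2, Zhu Yunfen1, Wang Qiang1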

NIR English Terminology

英文缩写中文absorbance A 吸光度acousto optic tunable filter AOTF 声光可调滤光器advance process control APC 先进过程控制系统american society for testing and materials ASTM 美国材料试验协会标准artificial neural networks ANN 人工神经网络back propagation BP-ANN 反向传输神经网络calibration transfer CT 模型传递技术charge coupled device CCD 电荷耦合器件cross validation CV 交互验证方法direct standardisation DS 直接校正算法distributed control system DCS 分布式控制系统elimination of uninformative variable UVE 无信息变量消除法euclidean distance ED 欧式距离finite impulse filter FIR 有脉冲响应办法first derivative 1ST DER 一阶微分fourier transform FT 傅里叶变换fourier transform near infrared FT-NIR 傅里叶变换近红外fuzzy C-means clustering FCME 模糊C-均值聚类gas chromatography GC 气相色谱genetic algorithms GA 遗传算法global calibration model GCM 全局校正模型hierarchical cluster analysis HCA 系统聚类分析方法hybrid calibration model HCM 混合校正模型kennard-stone K-S K-S标样选取方法k-nearest neighbor KNN K-最近邻法linear discriminant analysis LDA 线性判别分析linear learning machine LLM 线性学习机light emitting diode LED 发光二极管locally weighted regression LWR 局部权重回归ling wavelength NIR LW-NIR 长波近红外mahalanobis distance MD 马氏距离Mid-infrared MIR 中红外model updating MU 模型更新multiple linear regression MLR 多元线性回归multiplicative scatter correction MSC 多元散射校正near infrared spectroscopy NIR 近红外光谱nearest neighbor distance NND 最邻近距离net analyte signal NAS 净分析信号number aperture NA 数值孔径orthogonal signal correction OSC 正交信号校正partial least squares PLS 偏最小二乘photodiode array PDA 光电二极管阵列piecewise direct standardisation PDS 分段直接校正算法prediction residual error sum of squares PRESS 预测残差平方和principal component analysis PCA 主成分分析principal component regression PCR 主成分回归process analytical chemistry PAC 过程分析化学root mean square error of cross validaton RMSECV 交互验证均方根偏差root mean square error of prediction RMSEP 预测均方根偏差root of mean sum squared residuals RMSSR 光谱残差均方根sample conditioning system SCS 样品预处理系统sample recovery system SRS 样品回收系统second derivative 2nd Der 二姐微分short wavelength near-infrared SW-NIR 短波近红外signal-to-noise ratio S/N 信噪比simulated annealing SA 模拟退火算法slope/bias S/B 斜率偏差算法soft 
independent modeling of class analogy SIMCA 簇类的独立软模式方法standard error of calibration SEC 校正标准偏差standard error of cross validation SECV 交互验证标准偏差standard error of prediction SEP 预测标准偏差standard normal variate SNV 标准正交变量变换stepwise regressiion analysis SRA 逐步回归分析法support vector machines SVM 支持向量机方法topology TP 拓扑方法ultraviolet-visible UV-VIS 紫外可见光谱wavelet transform WT 小波变换monochromator 单色器grating 光栅detector 检测器collimating mirror 准直镜平行光镜optical bench 光学台resolution 分辨率diffraction gratings 衍射光栅band-width 带宽slit 狭缝A spectroscopic instrument generally consists of entrance slit,collimator, a dispersive element, such as a grating or prism,focusing optics and detector. In a monochromator system there is normally also an exit slit, and only a narrow portion of the spectrum is projected on a one-element detector. In monochromators the entrance and exit slits are in a fixed position and can be changed in width. Rotating the grating scans the spectrum.Development of micro-electronics during the 90’s in the field ofmulti-element optical detectors, such as Charged Coupled Devices (CCD) Arrays and Photo Diode (PD) Arrays。

cluster-based inference

[Definition] cluster-based: 基于聚类的

[Phrases]
1. cluster based policy: 基于集群的政策
2. cluster based server: 集群服务器
3. cluster-based web server: web集群服务器
4. cluster based routing protocol: 基于簇的路由协议
5. cluster-based ad hoc network: adhoc分群网络
6. cluster-based linux web server: linux集群服务器
7. cluster-based web caching system: web集群缓存系统
8. cluster-based routing protocol: 分簇路由协议

[Example sentences]
1. Starting in Version 11.1, IDS contains cluster-based replication functionality. 从version11.1开始，IDS包含了基于集群的复制功能。

2. Then synchronization errors of the dynamic cluster-based DHTS are deduced. 理论推导了动态簇dhts算法同步误差。

3. This paper presents a novel uneven cluster-based routing protocol for wireless sensor networks. 该文提出一种新颖的基于非均匀分簇的无线传感器网络多跳路由协议。

4. Cluster-based systems have become the foundation of the architecture of high-performance Web servers. 基于机群的可扩展计算机网络已逐渐成为高性能网络服务器架构的基础。

5. For example, a mysql-cluster-based database tier would be comprised of two server groups, an NDBD group and a manager group. 例如，基于mysql集群的数据库层可能包含两个服务器组，一个ndbd组和一个manager组。

Reading Notes on Foreign-Language Literature

A dynamic replica management strategy in data grid--- Journal of Network and Computer Applications expired, propose, indicate, profitable, boost, claim, present, congestion, deficiency, moderately, metric, turnaround, assume,specify, display, illustrate, issue,outperform over .... about 37%, outperform ....lead todraw one's attentionaccordinglyhave great influence ontake into accountin terms ofplay major role inin comparison with, in comparison toi.e.=(拉丁)id estReplication is a technique used in data grid to improve fault tolerance and to reduce the bandwidth consumption.Managing this huge amount of data in a centralized way is ineffective due to extensive access latency and load on the central server.Data Grids aggregate a collection of distributed resources placed in different parts of the world to enable users to share data and resources.Data replication is an important technique to manage large data in a distributed manner.There are three key issues in all the data replication algorithms which are replica placement, replica management and replica selection.Meanwhile, even though the memory and storage size of new computers are ever increasing, they are still not keeping up with the request of storing large number of data.each node along its path to the requester.Enhanced Dynamic Hierarchical Replication and Weighted SchedulingStrategy in Data Grid--- Journal of Parallel and Distributed Computing duration, manually, appropriate, critical, therefore, hybrid, essential, respectively, candidate, typically, advantage, significantly, thereby, adopt, demonstrate, superiority, scenario, empirically, feasibility, duplicate, insufficient, interpret, beneficial, obviously, whilst, idle, considerably, notably, consequently, apparently,in a wise manneraccording tofrom a size point of viewdepend oncarry outis comprised ofalong withas well asto the best of our knowledgeBest replica placement plays an important role for obtaining maximum benefit from replication as well as 
reducing storage cost and mean job execution time.Data replication is a key optimization technique for reducing access latency and managing large data by storing data in a wise manner.Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available.Effective scheduling of jobs is necessary in such a system to use available resources such as computational, storage and network efficiently.Storing replicas close to the users or grid computation nodes improves response time, fault tolerance and decreases bandwidth consumption.The files of Grid environments that can be changed by Grid users might bring an important problem of maintaining data consistency among the various replicas distributed in different machines.So the sum of them along with the proper weight (w1,w2) for each factor yields the combined cost (CCi,j) of executing job i in site j.A classification of file placement and replication methods on grids--- Future Generation Computer Systems encounter, slightly, simplistic, clairvoyance, deploy, stringent, concerning, properly, appropriately, overhead, motivate, substantial, constantly, monitor, highlight, distinguish, omit, salient, entirely, criteria, conduct, preferably, alleviate, error-prone, conversely,for instanceaccount forhave serious impact ona.k.a.= also known asconsist inaim atin the hands offor .... 
purposesw.r.t.=with regard toconcentrate onfor the sake ofbe out of the scope of ...striping files in blocksProduction approaches are slightly different than works evaluated in simulation or in controlled conditions....File replication is a common solution to improve the reliability and performance of data transfers.Many file management strategies were proposed but none was adopted in large-scale production infrastructures.Clairvoyant models assume that resource characteristics of interest are entirely known to the file placement algorithm.Cooperation between data placement and job scheduling can improve the overall transfer time and have a significant impact on the application makespan as shown in.We conclude that replication policies should rely on a-priori information about file accesses, such as file type or workflow relation.Dynamic replica placement and selection strategies in data grids----Acomprehensive survey--- Journal of Parallel and Distributed Computing merit, demerit, tedious, namely, whereas, various, literature, facilitate, suitable, comparative, optimum, retrieve, rapid, evacuate, invoke, identical, prohibitive, drawback, periodically,with respect toin particularin generalas the name indicatesfar apartconsist of , consist inData replication techniques are used in data grid to reduce makespan, storage consumption, access latency and network bandwidth.Data replication enhances data availability and thereby increases the system reliability.Managing dynamic architecture of the grid, decision making of replica placement, storage space, cost of replication and selection are some of the issues that impact the performance of the grid.Benefits of data replication strategies include availability, reliability, scalability, adaptability and improved performance.As the name indicates, in dynamic grid, nodes can join and leave the grid anytime.Any replica placement and selection strategy tries to improve one or more of the following parameters: makespan, quality 
assurance, file missing rate, byte missing rate, communication cost, response time, bandwidth consumption, access latency, load balancing, maintenance cost, job execution time, fault tolerance and strategic replica placement.Identifying Dynamic Replication Strategies for a High-PerformanceData Grid--- Grid Computing 2001 identify, comparative, alternative, preliminary, envision, hierarchical, tier, above-mentioned, interpret, exhibit, defer, methodology, pending, scale, solely, churn outlarge amounts ofpose new problemsdenoted asadapt toconcentrate on doingconduct experimentssend it offin the order of petabytesas of nowDynamic replication can be used to reduce bandwidth consumption and access latency in high performance “data grids” where users require remote access to large files.A data grid connects a collection of geographically distributed computer and storage resources that may be located in different parts of a country or even in different countries, and enables users to share data and other resources.The main aims of using replication are to reduce access latency and bandwidth consumption. Replication can also help in load balancing and can improve reliability by creating multiple copies of the same data.Group-Based Management of Distributed File Caches--- Distributed Computing Systems, 2002 mechanism, exploit, inherent, detrimental, preempt, incur, mask, fetch, likelihood, overlapping, subtle,in spite ofcontend withfar enough in advancetake sth for granted(be) superior toDynamic file grouping is an effective mechanism for exploiting the predictability of file access patterns and improving the caching performance of distributed file systems.With our grouping mechanism we establish relationships by observing file access behavior, without relying on inference from file location or content.We group files to reduce access latency. 
By fetching groups of files, instead of individual files, we increase cache hit rates when groups contain files that are likely to be accessed together.Further experimentation against the same workloads demonstrated that recency was a better estimator of per-file succession likelihood than frequency counts.Job scheduling and data replication on data grids--- Future Generation Computer Systems throttle, hierarchical, authorized, indicate, dispatch, assign, exhaustive, revenue, aggregate, trade-off, mechanism, kaleidoscopic, approximately, plentiful, inexact, anticipated, mimic, depict, exhaust, demonstrate, superiority, namely, consume,to address this problemdata resides on the nodesa variety ofaim toin contrast tofor the sake ofby means ofplay an important role inhave no distinction betweenin terms ofon the contrarywith respect toand so forthby virtue ofreferring back toA cluster represents an organization unit which is a group of sites that are geographically close.Network bandwidth between sites within a cluster will be larger than across clusters.Scheduling jobs to suitable grid sites is necessary because data movement between different grid sites is time consuming.If a job is scheduled to a site where the required data are present, the job can process data in this site without any transmission delay for getting data from a remote site.RADPA: Reliability-aware Data Placement Algorithm for large-scale network storage systems--- High Performance Computing and Communications, 2009 ever-going, oblivious, exponentially, confront,as a consequencethat is to saysubject to the constraintit doesn't make sense to doMost of the replica data placement algorithms concern about the following two objectives, fairness and adaptability.In large-scale network storage systems, the reliabilities of devices are different relevant to device manufacturers and types.It can fairly distributed data among devices and reorganize near-minimum amount of data to preserve the balanced 
distribution with the changes of devices.Partitioning Functions for Stateful Data Parallelism in Stream Processing--- The VLDB Journal skewed, desirable, associated, exhibit, superior, accordingly, necessitate, prominent, tractable, exploit, effectively, efficiently, transparent, elastically, amenable, conflicting, concretely, exemplify, depict,a deluge ofin the form of continuous streamslarge volumes ofnecessitate doingas a examplefor instancein this scenarioAccordingly, there is an increasing need to gather and analyze data streams in near real-time to extract insights and detect emerging patterns and outliers.The increased affordability of distributed and parallel computing, thanks to advances in cloud computing and multi-core chip design, has made this problem tractable.However, in the presence of skew in the distribution of the partitioning key, the balance properties cannot be maintained by the consistent hash.MORM: A Multi-objective Optimized Replication Management strategyfor cloud storage cluster--- Journal of Systems Architecture issue, achieve, latency, entail, consumption, article, propose, candidate, conclusively, demonstrate, outperform, nowadays, huge, currently, crucial, significantly, adopt, observe, collectively, previously, holistic, thus, tradeoff, primary, therefore, aforementioned, capture, layout, remainder, formulate, present, enormous, drawback, infrastructure, chunk, nonetheless, moreover, duration, substantially, wherein, overall, collision, shortcoming, affect, further, address, motivate, explicitly, suppose, assume, entire, invariably, compromise, inherently, pursue, handle, denote, utilize, constraint, accordingly, infeasible, violate, respectively, guarantee, satisfaction, indicate, hence, worst-case, synthetic, assess, rarely, throughout, diversity, preference, illustrate, imply, additionally, is an important issuea series ofin terms ofin a distributed mannerin order toby defaultbe referred to astake a holistic view ofconflict witha 
variety ofis highly in demandgiven the aforementioned issue and trendtake into accountyield close toas followstake into considerationwith respect toa research hot spotcall foraccording todepend upon/onmeet ... requirementfocus onis sensitive tois composed ofconsist offrom the latency minimization perspectivea certain number ofis defined as (follows) / can be expressed as (follows) /can be calculated/computed by / is given by the followingat handcorresponding tohas nothing to do within addition toas depicted in Fig.1et al.The volume of data is measured in terabytes and some time in petabytes in many fields.Data replication allows speeding up data access, reducing access latency and increasing data availability.How many suitable replicas of each data should be created in the cloud to meet a reasonable system requirement is an important issue for further research.Where should these replicas be placed to meet the system task fast execution rate and load balancing requirements is another important issue to be thoroughly investigated.As the system maintenance cost will significantly increase with the number of replicas increasing, keeping too many or fixed replicas are not a good choice.Where should these replicas be placed to meet the system task fast execution rate and load balancing requirements is another important issue to be thoroughly investigated.We build up five objectives for optimization which provides us with the advantage that we can search for solutions that yield close to optimal values for these objectives.The shortcoming of them is that they only consider a restricted set of parameters affecting the replication decision. 
Further, they only focus on the improvement of the system performance and they do not address the energy efficiency issue in data centers.Data node load variance is the standard deviation of data node load of all data nodes in the cloud storage cluster which can be used to represent the degree of load balancing of the system.The advantage of using simulation is that we can easily vary parameters to understand their individual impact on system performance.Throughout the simulation, we assumed "write-once, read-many" data and did not include the consistency or write and update propagations costs in the study.Distributed replica placement algorithms for correlated data--- The Journal of Supercomputing yield, potential, congestion, prolonged, malicious, overhead, conventional, present, propose, numerous, tackle, pervasive, valid, utilize,develop a .... algorithmsuffer fromin a distributed mannerbe denoted as Mconverge toso on and so forthWith the advances in Internet technologies, applications are all moving toward serving widely distributed users.Replication techniques have been commonly used to minimize the communication latency by bringing the data close to the clients and improve data availability.Thus, data needs to be carefully placed to avoid unnecessary overhead.These correlations have significant impact on data access patterns.For structured data, data correlated due to the structural relations may be frequently accessed together.Assume that data objects can be clustered into different classes due to user accesses, and whenever a client issues an access request, it will only access data in a single class.One challenge for using centralized replica placement algorithms in a widely distributed system is that a server site has to know the (logical) network topology and the resident set of all structured data sets to make replication decisions.We assume that the data objects accessed by most of the transactions follow certain patterns, which will be stable for 
some time periods.

Locality-aware allocation of multi-dimensional correlated files on the cloud platform --- Distributed and Parallel Databases
enormous, retrieve, prevailing, commonly, correlated, booming, massive, exploit, crucial, fundamental, heuristic, deterministic, duplication, compromised, brute-force, sacrifice, sophisticated, investigate, abundant, notation
as a matter of fact; in various ways; with .... taken into consideration; play a vital role in; it turns out that; in terms of; vice versa; a.k.a. = also known as
The effective management of enormous data volumes on the Cloud platform has attracted devoting research efforts.
Currently, most prevailing Cloud file systems allocate data following the principles of fault tolerance and availability, while inter-file correlations, i.e. files correlated with each other, are often neglected.
There is a trade-off between data locality and the scale of job parallelism.
Although distributing data randomly is expected to achieve the best parallelism, however, such a method may lead to degraded user experiences for introducing extra costs on large volume of remote accesses, especially for many applications that are featured with data locality, e.g., context-aware search, subspace oriented aggregation queries, and etc.
However, there must be several application-dependent hot subspaces, under which files are frequently being processed.
The problem is how to find a compromised partition solution to well serve the file correlations of different feature subspaces as much as possible.
If too many files are grouped together, the imbalance cost would raise and degrade the scale of job parallelism; if files are partitioned into too many small groups, data copying traffic across storage nodes would increase.
Instead, our solution is to start from a sub-optimal solution and employ some heuristics to derive a near optimal partition with as less cost as possible.
By allocating correlated files together, significant I/O savings can be achieved on reducing the huge cost of random data access over the entire distributed storage network.

Big Data Analytics framework for Peer-to-Peer Botnet detection using Random Forests --- Information Sciences
magnitude, accommodate, upsurge, issue, hence, propose, devise, thereby
has struggled to; it was revealed that; is expanding exponentially; take advantage of; in the past; in the realm of; over the last few years; there has also been research on; in a scalable manner; as per the current knowledge of the authors; on the contrary; in nature; report their work on
Network traffic monitoring and analysis-related research has struggled to scale for massive amounts of data in real time.
In this paper the authors build up on the progress of open source tools like Hadoop, Hive and Mahout to provide a scalable implementation of quasi-real-time intrusion detection system.
As per the current knowledge of the authors, the area of network security analytics severely lacks prior research in addressing the issue of Big Data.

Improving pattern recognition accuracy of partial discharges by new data preprocessing methods --- Electric Power Systems Research
stochastic, oscillation, literature, utilize, conventional, derive, distinctive, discriminative, artificial, significantly, considerably, furthermore, likewise, Additionally, reasonable, symbolize, eventually, scenario, consequently, appropriate, momentous, conduct, depict, waveshape, deficiency, nonetheless, derived, respectively, suffer from, notably
be taken into consideration; by means of; to our best knowledge; in accordance with; with respect to; as mentioned; with regard to; be equal with; lead to; for instance; in addition; in comparison to
Thus, analyzing the huge amount of data is not feasible unless data pre-processing is manipulated.
As mentioned, PD is completely a random and nonlinear phenomenon. Since ANNs are the best classifiers to model such nonlinear systems, PD patterns can be recognized suitably by ANNs.
In other words, when classifier is trained after initial sophistications based on the PRPD patterns extracted from some objects including artificial defects, it can be efficiently used in practical fields to identify the exactly same PD sources by new test data without any iterative process.
In pulse shape characterization, some signal processing methods such as Wavelet or Fourier transforms are usually used to extract some features from PD waveshape. These methods are affected by noise and so it is necessary to incorporate some de-noising methods into the pattern recognition process.
PD identification is usually performed using PRPD recognition which is not influenced by changing the experimental set up.

Partial Discharge Pattern Recognition of Cast Resin Current Transformers Using Radial Basis Function Neural Network --- Journal of Electrical Engineering & Technology
propose, novel, vital, demonstrate, conduct, significant
This paper proposes a novel pattern recognition approach based on the radial basis function (RBF) neural network for identifying insulation defects of high-voltage electrical apparatus arising from partial discharge (PD).
PD measurement and pattern recognition are important tools for improving the reliability of the high-voltage insulation system.

A Clustering Algorithm for Ad-hoc Networks Based on Relative Mobility

王超; 李长庚
Abstract: To address the instability of the network hierarchy caused by node movement in ad-hoc networks, this paper proposes RMCA (Relative Mobility Clustering Algorithm), a weighted clustering algorithm based on the relative mobility between nodes. The algorithm applies the correlated mobility of local nodes to cluster election and also considers the effect of the distance factor. The weight of each factor is calculated through the analytic hierarchy process (AHP). The algorithm is simulated with the NS2 simulation tool and compared with classical algorithms. Results show that it reduces the number of cluster heads and improves by about 10%~15% in the number of cluster attachment changes and the number of cluster head updates, which effectively improves the stability of the cluster structure.

Evaluation of the node importance for large heat exchanger network based on complex network theory

Chemical Industry and Engineering Progress (化工进展), 2017, Vol. 36, No. 5, p. 1581
WANG Zheng 1, SUN Jincheng 1, LIU Xiaoqiang 1, JIANG Ying 1, JIA Xiaoping 2, WANG Fang 2
(1 College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao 266042, Shandong, China; 2 College of Environment and Safety Engineering, Qingdao University of Science and Technology, Qingdao 266042, Shandong, China)

Abstract: Because of the complexity of large-scale heat exchanger networks, it is important to investigate the importance of heat exchanger nodes in such networks. This can provide guidance for the control and safe operation of heat exchanger networks, as well as for engineering practice. In this paper, the topology of a large-scale heat exchanger network was constructed by treating heat exchangers as nodes and the transfer of interference between heat exchangers as edges. Based on complex network theory, strategies and models for evaluating the node importance of the heat exchanger network were proposed. First, the importance of nodes was evaluated by a multi-attribute decision method based on the degree, betweenness, closeness and eigenvector centralities. Next, considering the directed nature of the case heat exchanger network, the PageRank algorithm was used to evaluate the importance of nodes. The final results were obtained by combining the results of these two algorithms. The case analysis showed that the strategy is effective and can evaluate node importance from different views, which enriches the theory of node importance evaluation for heat exchanger networks.

Key words: heat exchanger network; complex network; node importance; multi-attribute decision; PageRank algorithm
CLC number: X92; Document code: A; Article ID: 1000–6613(2017)05–1581–08; DOI: 10.16085/j.issn.1000-6613.2017.05.004

First author and corresponding author: WANG Zheng (born 1968), male, PhD, associate professor and master's supervisor; main research area: process systems engineering.

人工智能专业重要词汇表

⼈⼯智能专业重要词汇表A开头的词汇Artificial General Intelligence/AGI 通⽤⼈⼯智能Artificial Intelligence/AI ⼈⼯智能Association analysis 关联分析Attention mechanism 注意⼒机制Attribute conditional independence assumption 属性条件独⽴性假设Attribute space 属性空间Attribute value 属性值Autoencoder ⾃编码器Automatic speech recognition ⾃动语⾳识别Automatic summarization ⾃动摘要Average gradient 平均梯度Average-Pooling 平均池化Accumulated error backpropagation 累积误差逆传播Activation Function 激活函数Adaptive Resonance Theory/ART ⾃适应谐振理论Addictive model 加性学习Adversarial Networks 对抗⽹络Affine Layer 仿射层Affinity matrix 亲和矩阵Agent 代理 / 智能体Algorithm 算法Alpha-beta pruning α-β剪枝Anomaly detection 异常检测Approximation 近似Area Under ROC Curve／AUC Roc 曲线下⾯积B开头的词汇Backpropagation Through Time 通过时间的反向传播Backpropagation/BP 反向传播Base learner 基学习器Base learning algorithm 基学习算法Batch Normalization/BN 批量归⼀化Bayes decision rule 贝叶斯判定准则Bayes Model Averaging／BMA 贝叶斯模型平均Bayes optimal classifier 贝叶斯最优分类器Bayesian decision theory 贝叶斯决策论Bayesian network 贝叶斯⽹络Between-class scatter matrix 类间散度矩阵Bias 偏置 / 偏差Bias-variance decomposition 偏差-⽅差分解Bias-Variance Dilemma 偏差 – ⽅差困境Bi-directional Long-Short Term Memory/Bi-LSTM 双向长短期记忆Binary classification ⼆分类Binomial test ⼆项检验Bi-partition ⼆分法Boltzmann machine 玻尔兹曼机Bootstrap sampling ⾃助采样法／可重复采样／有放回采样Bootstrapping ⾃助法Break-Event Point／BEP 平衡点3、C开头的词汇Calibration 校准Cascade-Correlation 级联相关Categorical attribute 离散属性Class-conditional probability 类条件概率Classification and regression tree/CART 分类与回归树Classifier 分类器Class-imbalance 类别不平衡Closed -form 闭式Cluster 簇/类/集群Cluster analysis 聚类分析Clustering 聚类Clustering ensemble 聚类集成Co-adapting 共适应Coding matrix 编码矩阵COLT 国际学习理论会议Committee-based learning 基于委员会的学习Competitive learning 竞争型学习Component learner 组件学习器Comprehensibility 可解释性Computation Cost 计算成本Computational Linguistics 计算语⾔学Computer vision 计算机视觉Concept drift 概念漂移Concept Learning System /CLS 概念学习系统Conditional entropy 条件熵Conditional mutual information 条件互信息Conditional Probability Table／CPT 条件概率表Conditional random field/CRF 条件随机场Conditional risk 条件风险Confidence 置信度Confusion 
matrix 混淆矩阵Connection weight 连接权Connectionism 连结主义Consistency ⼀致性／相合性Contingency table 列联表Continuous attribute 连续属性Convergence 收敛Conversational agent 会话智能体Convex quadratic programming 凸⼆次规划Convexity 凸性Convolutional neural network/CNN 卷积神经⽹络Co-occurrence 同现Correlation coefficient 相关系数Cosine similarity 余弦相似度Cost curve 成本曲线Cost Function 成本函数Cost matrix 成本矩阵Cost-sensitive 成本敏感Cross entropy 交叉熵Cross validation 交叉验证Crowdsourcing 众包Curse of dimensionality 维数灾难Cut point 截断点Cutting plane algorithm 割平⾯法4、D开头的词汇Data mining 数据挖掘Data set 数据集Decision Boundary 决策边界Decision stump 决策树桩Decision tree 决策树／判定树Deduction 演绎Deep Belief Network 深度信念⽹络Deep Convolutional Generative Adversarial Network/DCGAN 深度卷积⽣成对抗⽹络Deep learning 深度学习Deep neural network/DNN 深度神经⽹络Deep Q-Learning 深度 Q 学习Deep Q-Network 深度 Q ⽹络Density estimation 密度估计Density-based clustering 密度聚类Differentiable neural computer 可微分神经计算机Dimensionality reduction algorithm 降维算法Directed edge 有向边Disagreement measure 不合度量Discriminative model 判别模型Discriminator 判别器Distance measure 距离度量Distance metric learning 距离度量学习Distribution 分布Divergence 散度Diversity measure 多样性度量／差异性度量Domain adaption 领域⾃适应Downsampling 下采样D-separation （Directed separation）有向分离Dual problem 对偶问题Dummy node 哑结点Dynamic Fusion 动态融合Dynamic programming 动态规划5、E开头的词汇Eigenvalue decomposition 特征值分解Embedding 嵌⼊Emotional analysis 情绪分析Empirical conditional entropy 经验条件熵Empirical entropy 经验熵Empirical error 经验误差Empirical risk 经验风险End-to-End 端到端Energy-based model 基于能量的模型Ensemble learning 集成学习Ensemble pruning 集成修剪Error Correcting Output Codes／ECOC 纠错输出码Error rate 错误率Error-ambiguity decomposition 误差-分歧分解Euclidean distance 欧⽒距离Evolutionary computation 演化计算Expectation-Maximization 期望最⼤化Expected loss 期望损失Exploding Gradient Problem 梯度爆炸问题Exponential loss function 指数损失函数Extreme Learning Machine/ELM 超限学习机6、F开头的词汇Factorization 因⼦分解False negative 假负类False positive 假正类False Positive Rate/FPR 假正例率Feature engineering 特征⼯程Feature selection 特征选择Feature vector 特征向量Featured Learning 特征学习Feedforward 
Neural Networks/FNN 前馈神经⽹络Fine-tuning 微调Flipping output 翻转法Fluctuation 震荡Forward stagewise algorithm 前向分步算法Frequentist 频率主义学派Full-rank matrix 满秩矩阵Functional neuron 功能神经元7、G开头的词汇Gain ratio 增益率Game theory 博弈论Gaussian kernel function ⾼斯核函数Gaussian Mixture Model ⾼斯混合模型General Problem Solving 通⽤问题求解Generalization 泛化Generalization error 泛化误差Generalization error bound 泛化误差上界Generalized Lagrange function ⼴义拉格朗⽇函数Generalized linear model ⼴义线性模型Generalized Rayleigh quotient ⼴义瑞利商Generative Adversarial Networks/GAN ⽣成对抗⽹络Generative Model ⽣成模型Generator ⽣成器Genetic Algorithm/GA 遗传算法Gibbs sampling 吉布斯采样Gini index 基尼指数Global minimum 全局最⼩Global Optimization 全局优化Gradient boosting 梯度提升Gradient Descent 梯度下降Graph theory 图论Ground-truth 真相／真实8、H开头的词汇Hard margin 硬间隔Hard voting 硬投票Harmonic mean 调和平均Hesse matrix 海塞矩阵Hidden dynamic model 隐动态模型Hidden layer 隐藏层Hidden Markov Model/HMM 隐马尔可夫模型Hierarchical clustering 层次聚类Hilbert space 希尔伯特空间Hinge loss function 合页损失函数Hold-out 留出法Homogeneous 同质Hybrid computing 混合计算Hyperparameter 超参数Hypothesis 假设Hypothesis test 假设验证9、I开头的词汇ICML 国际机器学习会议Improved iterative scaling/IIS 改进的迭代尺度法Incremental learning 增量学习Independent and identically distributed/i.i.d. 
独⽴同分布Independent Component Analysis/ICA 独⽴成分分析Indicator function 指⽰函数Individual learner 个体学习器Induction 归纳Inductive bias 归纳偏好Inductive learning 归纳学习Inductive Logic Programming／ILP 归纳逻辑程序设计Information entropy 信息熵Information gain 信息增益Input layer 输⼊层Insensitive loss 不敏感损失Inter-cluster similarity 簇间相似度International Conference for Machine Learning/ICML 国际机器学习⼤会Intra-cluster similarity 簇内相似度Intrinsic value 固有值Isometric Mapping/Isomap 等度量映射Isotonic regression 等分回归Iterative Dichotomiser 迭代⼆分器10、K开头的词汇Kernel method 核⽅法Kernel trick 核技巧Kernelized Linear Discriminant Analysis／KLDA 核线性判别分析K-fold cross validation k 折交叉验证／k 倍交叉验证K-Means Clustering K – 均值聚类K-Nearest Neighbours Algorithm/KNN K近邻算法Knowledge base 知识库Knowledge Representation 知识表征11、L开头的词汇Label space 标记空间Lagrange duality 拉格朗⽇对偶性Lagrange multiplier 拉格朗⽇乘⼦Laplace smoothing 拉普拉斯平滑Laplacian correction 拉普拉斯修正Latent Dirichlet Allocation 隐狄利克雷分布Latent semantic analysis 潜在语义分析Latent variable 隐变量Lazy learning 懒惰学习Learner 学习器Learning by analogy 类⽐学习Learning rate 学习率Learning Vector Quantization/LVQ 学习向量量化Least squares regression tree 最⼩⼆乘回归树Leave-One-Out/LOO 留⼀法linear chain conditional random field 线性链条件随机场Linear Discriminant Analysis／LDA 线性判别分析Linear model 线性模型Linear Regression 线性回归Link function 联系函数Local Markov property 局部马尔可夫性Local minimum 局部最⼩Log likelihood 对数似然Log odds／logit 对数⼏率Logistic Regression Logistic 回归Log-likelihood 对数似然Log-linear regression 对数线性回归Long-Short Term Memory/LSTM 长短期记忆Loss function 损失函数12、M开头的词汇Machine translation/MT 机器翻译Macron-P 宏查准率Macron-R 宏查全率Majority voting 绝对多数投票法Manifold assumption 流形假设Manifold learning 流形学习Margin theory 间隔理论Marginal distribution 边际分布Marginal independence 边际独⽴性Marginalization 边际化Markov Chain Monte Carlo/MCMC 马尔可夫链蒙特卡罗⽅法Markov Random Field 马尔可夫随机场Maximal clique 最⼤团Maximum Likelihood Estimation/MLE 极⼤似然估计／极⼤似然法Maximum margin 最⼤间隔Maximum weighted spanning tree 最⼤带权⽣成树Max-Pooling 最⼤池化Mean squared error 均⽅误差Meta-learner 元学习器Metric learning 度量学习Micro-P 微查准率Micro-R 微查全率Minimal Description 
Length/MDL 最⼩描述长度Minimax game 极⼩极⼤博弈Misclassification cost 误分类成本Mixture of experts 混合专家Momentum 动量Moral graph 道德图／端正图Multi-class classification 多分类Multi-document summarization 多⽂档摘要Multi-layer feedforward neural networks 多层前馈神经⽹络Multilayer Perceptron/MLP 多层感知器Multimodal learning 多模态学习Multiple Dimensional Scaling 多维缩放Multiple linear regression 多元线性回归Multi-response Linear Regression ／MLR 多响应线性回归Mutual information 互信息13、N开头的词汇Naive bayes 朴素贝叶斯Naive Bayes Classifier 朴素贝叶斯分类器Named entity recognition 命名实体识别Nash equilibrium 纳什均衡Natural language generation/NLG ⾃然语⾔⽣成Natural language processing ⾃然语⾔处理Negative class 负类Negative correlation 负相关法Negative Log Likelihood 负对数似然Neighbourhood Component Analysis/NCA 近邻成分分析Neural Machine Translation 神经机器翻译Neural Turing Machine 神经图灵机Newton method ⽜顿法NIPS 国际神经信息处理系统会议No Free Lunch Theorem／NFL 没有免费的午餐定理Noise-contrastive estimation 噪⾳对⽐估计Nominal attribute 列名属性Non-convex optimization ⾮凸优化Nonlinear model ⾮线性模型Non-metric distance ⾮度量距离Non-negative matrix factorization ⾮负矩阵分解Non-ordinal attribute ⽆序属性Non-Saturating Game ⾮饱和博弈Norm 范数Normalization 归⼀化Nuclear norm 核范数Numerical attribute 数值属性14、O开头的词汇Objective function ⽬标函数Oblique decision tree 斜决策树Occam’s razor 奥卡姆剃⼑Odds ⼏率Off-Policy 离策略One shot learning ⼀次性学习One-Dependent Estimator／ODE 独依赖估计On-Policy 在策略Ordinal attribute 有序属性Out-of-bag estimate 包外估计Output layer 输出层Output smearing 输出调制法Overfitting 过拟合／过配Oversampling 过采样15、P开头的词汇Paired t-test 成对 t 检验Pairwise 成对型Pairwise Markov property 成对马尔可夫性Parameter 参数Parameter estimation 参数估计Parameter tuning 调参Parse tree 解析树Particle Swarm Optimization/PSO 粒⼦群优化算法Part-of-speech tagging 词性标注Perceptron 感知机Performance measure 性能度量Plug and Play Generative Network 即插即⽤⽣成⽹络Plurality voting 相对多数投票法Polarity detection 极性检测Polynomial kernel function 多项式核函数Pooling 池化Positive class 正类Positive definite matrix 正定矩阵Post-hoc test 后续检验Post-pruning 后剪枝potential function 势函数Precision 查准率／准确率Prepruning 预剪枝Principal component analysis/PCA 主成分分析Principle of multiple explanations 
多释原则Prior 先验Probability Graphical Model 概率图模型Proximal Gradient Descent/PGD 近端梯度下降Pruning 剪枝Pseudo-label 伪标记16、Q开头的词汇Quantized Neural Network 量⼦化神经⽹络Quantum computer 量⼦计算机Quantum Computing 量⼦计算Quasi Newton method 拟⽜顿法17、R开头的词汇Radial Basis Function／RBF 径向基函数Random Forest Algorithm 随机森林算法Random walk 随机漫步Recall 查全率／召回率Receiver Operating Characteristic/ROC 受试者⼯作特征Rectified Linear Unit/ReLU 线性修正单元Recurrent Neural Network 循环神经⽹络Recursive neural network 递归神经⽹络Reference model 参考模型Regression 回归Regularization 正则化Reinforcement learning/RL 强化学习Representation learning 表征学习Representer theorem 表⽰定理reproducing kernel Hilbert space/RKHS 再⽣核希尔伯特空间Re-sampling 重采样法Rescaling 再缩放Residual Mapping 残差映射Residual Network 残差⽹络Restricted Boltzmann Machine/RBM 受限玻尔兹曼机Restricted Isometry Property/RIP 限定等距性Re-weighting 重赋权法Robustness 稳健性/鲁棒性Root node 根结点Rule Engine 规则引擎Rule learning 规则学习18、S开头的词汇Saddle point 鞍点Sample space 样本空间Sampling 采样Score function 评分函数Self-Driving ⾃动驾驶Self-Organizing Map／SOM ⾃组织映射Semi-naive Bayes classifiers 半朴素贝叶斯分类器Semi-Supervised Learning 半监督学习semi-Supervised Support Vector Machine 半监督⽀持向量机Sentiment analysis 情感分析Separating hyperplane 分离超平⾯Sigmoid function Sigmoid 函数Similarity measure 相似度度量Simulated annealing 模拟退⽕Simultaneous localization and mapping 同步定位与地图构建Singular Value Decomposition 奇异值分解Slack variables 松弛变量Smoothing 平滑Soft margin 软间隔Soft margin maximization 软间隔最⼤化Soft voting 软投票Sparse representation 稀疏表征Sparsity 稀疏性Specialization 特化Spectral Clustering 谱聚类Speech Recognition 语⾳识别Splitting variable 切分变量Squashing function 挤压函数Stability-plasticity dilemma 可塑性-稳定性困境Statistical learning 统计学习Status feature function 状态特征函Stochastic gradient descent 随机梯度下降Stratified sampling 分层采样Structural risk 结构风险Structural risk minimization/SRM 结构风险最⼩化Subspace ⼦空间Supervised learning 监督学习／有导师学习support vector expansion ⽀持向量展式Support Vector Machine/SVM ⽀持向量机Surrogat loss 替代损失Surrogate function 替代函数Symbolic learning 符号学习Symbolism 符号主义Synset 同义词集19、T开头的词汇T-Distribution Stochastic Neighbour 
Embedding/t-SNE T – 分布随机近邻嵌⼊Tensor 张量Tensor Processing Units/TPU 张量处理单元The least square method 最⼩⼆乘法Threshold 阈值Threshold logic unit 阈值逻辑单元Threshold-moving 阈值移动Time Step 时间步骤Tokenization 标记化Training error 训练误差Training instance 训练⽰例／训练例Transductive learning 直推学习Transfer learning 迁移学习Treebank 树库Tria-by-error 试错法True negative 真负类True positive 真正类True Positive Rate/TPR 真正例率Turing Machine 图灵机Twice-learning ⼆次学习20、U开头的词汇Underfitting ⽋拟合／⽋配Undersampling ⽋采样Understandability 可理解性Unequal cost ⾮均等代价Unit-step function 单位阶跃函数Univariate decision tree 单变量决策树Unsupervised learning ⽆监督学习／⽆导师学习Unsupervised layer-wise training ⽆监督逐层训练Upsampling 上采样21、V开头的词汇Vanishing Gradient Problem 梯度消失问题Variational inference 变分推断VC Theory VC维理论Version space 版本空间Viterbi algorithm 维特⽐算法Von Neumann architecture 冯 · 诺伊曼架构22、W开头的词汇Wasserstein GAN/WGAN Wasserstein⽣成对抗⽹络Weak learner 弱学习器Weight 权重Weight sharing 权共享Weighted voting 加权投票法Within-class scatter matrix 类内散度矩阵Word embedding 词嵌⼊Word sense disambiguation 词义消歧23、Z开头的词汇Zero-data learning 零数据学习Zero-shot learning 零次学习。

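1+x Big Data Test Questions and Reference Answers

I. Single-choice questions (80 questions, 1 point each, 80 points in total)
1. Which description of Sqoop data import and export is incorrect? ( ) A. Import/export between MySQL and Hive B. Import/export between MySQL and Oracle C. Import/export between HDFS and Oracle D. Import/export between HDFS and MySQL. Correct answer: B
2. Which statement about ZooKeeper ephemeral nodes is correct? ( ) A. The command to create an ephemeral node is: create -s /tmp myvalue B. Ephemeral nodes may have child nodes C. Once the session ends, the ephemeral node is deleted automatically D. Ephemeral nodes cannot be deleted manually. Correct answer: C
3. Which description of schedulers is incorrect? ( ) A. The FIFO scheduler can have multiple queues B. The capacity scheduler is in effect multiple FIFO queues C. The fair scheduler does not allow administrators to set a scheduling policy for each queue individually D. The FIFO scheduler runs jobs with exclusive use of cluster resources. Correct answer: A
4. Hive is suited to ( ) environments. A. Hive is suited to relational data environments B. Hive is suited to online transaction processing C. It is suited to batch jobs over large amounts of immutable data D. It provides real-time queries. Correct answer: C
5. Which of the following is not a characteristic of ZooKeeper? ( ) A. Reliability B. Sequential consistency C. Multiple system images D. Atomicity. Correct answer: C
6. The tar command packs, compresses or extracts files. What does the -t option mean? ( ) A. List the files in the archive B. Create an archive C. Append files to the end of the archive D. Extract the archive. Correct answer: A
7. Which of the following is not a characteristic of HBase? ( ) A. High reliability B. High performance C. Column-oriented D. Compactness. Correct answer: D
8. Which command appends a public key to the authorized-keys file? ( ) A. ssh-add B. ssh-copy-id C. ssh-keygen D. ssh. Correct answer: B
9. HDFS holds a 75 MB gzip file and the client sets the block size to 64 MB. When a MapReduce job reads this file, what is the input split size? A. 64 MB B. 75 MB C. One map reads 64 MB and another map reads 11 MB. Correct answer: B
10. In the implementation workflow of a big data platform, the recommended order of the overall project process is ( ).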

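A Weighted Cooperative Spectrum Sensing Algorithm Based on D-S Evidence Theory

周亚建; 刘凯; 肖林
Abstract: A weighted cooperative spectrum sensing algorithm based on D-S evidence theory (DS-WCSS) is proposed.

The algorithm uses energy detection for local sensing, evaluates how the credibility of the cognitive users differs from the variance and mean of the test statistic under the two hypotheses, assigns each cognitive user a credibility weight accordingly, and finally uses D-S evidence theory for data fusion and decision.

Simulation results show that, compared with cooperative spectrum sensing algorithms based on D-S evidence theory and on conventional hard decision, DS-WCSS can effectively improve detection performance.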
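The fusion step named in the abstract can be illustrated with plain Dempster's rule of combination over the two hypotheses H0 (band idle) and H1 (licensed user present). This is a minimal sketch, not the DS-WCSS algorithm itself: the abstract does not give the credibility-weighting formula, so it is left out, and the function name and the tuple layout `(m(H0), m(H1), m(Omega))` are assumptions made for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each argument is a tuple (m(H0), m(H1), m(Omega)) that sums to 1,
    where Omega = {H0, H1} carries the mass assigned to "unknown".
    The tuple layout is a hypothetical choice for this sketch.
    """
    a0, a1, au = m1
    b0, b1, bu = m2
    # Conflict: mass assigned to contradictory pairs (H0 with H1).
    conflict = a0 * b1 + a1 * b0
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    norm = 1.0 - conflict
    h0 = (a0 * b0 + a0 * bu + au * b0) / norm
    h1 = (a1 * b1 + a1 * bu + au * b1) / norm
    unknown = (au * bu) / norm
    return h0, h1, unknown


# Two cognitive users report their belief masses; the fused result is then
# compared, e.g. decide H0 when the fused m(H0) exceeds the fused m(H1).
fused = dempster_combine((0.6, 0.3, 0.1), (0.5, 0.2, 0.3))
```

More than two users can be fused by applying the function repeatedly, since Dempster's rule is associative.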

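DOI: 10.3969/j.issn.1000-436x.2012.12.003
Journal: 通信学报 (Journal on Communications)
Year (volume), issue: 2012(000)012
Pages: 6 (P19-24)
Keywords: cognitive radio; cooperative spectrum sensing; Dempster-Shafer evidence theory; credibility
Authors: 周亚建; 刘凯; 肖林
Affiliations: School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876; Key Laboratory of Electronic Information Control, Chengdu, Sichuan 610036; School of Electronic and Information Engineering, Beihang University, Beijing 100191; School of Electronic and Information Engineering, Beihang University, Beijing 100191
Language of the text: Chinese
CLC number: TN915.01

1 Introduction

Traditional wireless spectrum management assigns fixed frequency bands to licensed users. As wireless services have grown, however, this policy leaves some bands in some areas short of spectrum when many users carry heavy traffic, while some bands in other areas hold large amounts of idle spectrum [1].

Cognitive radio (CR) addresses this problem by borrowing idle spectrum, thereby improving spectrum utilization.

It uses spectrum sensing to decide whether a particular band is idle and then makes use of it.

The false-alarm probability and the detection probability are the measures of sensing performance.

Spectrum sensing needs a low false-alarm probability in order to find more idle spectrum, and a high detection probability in order to reduce interference to licensed users.

Depending on whether the cognitive users cooperate, spectrum sensing is divided into local spectrum sensing and cooperative spectrum sensing.

Local spectrum sensing relies mainly on three techniques: matched-filter detection, feature detection, and energy detection [2].

Matched-filter detection is accurate but requires knowledge of the licensed user's signal type; feature detection does not require that knowledge but is computationally expensive; energy detection is simple to implement and works without knowing the licensed user's signal. This paper therefore adopts energy detection for local sensing.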
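Energy detection as described above can be sketched in a few lines: sum the squared magnitudes of the received samples and compare the result with a threshold. The function names and the threshold value are assumptions for illustration; the text does not say how the threshold is chosen (in practice it is usually derived from a target false-alarm probability).

```python
def energy_statistic(samples):
    """Test statistic of an energy detector: sum of squared sample magnitudes."""
    return sum(abs(x) ** 2 for x in samples)


def channel_occupied(samples, threshold):
    """Decide that a licensed user is present when the energy exceeds the threshold.

    `threshold` is a hypothetical parameter; its value depends on the noise
    power and on the false-alarm probability that is acceptable.
    """
    return energy_statistic(samples) > threshold


# Real or complex baseband samples both work, because abs() gives the magnitude.
busy = channel_occupied([1.0, -2.0, 2.0], threshold=5.0)
```

The detector needs no knowledge of the licensed user's waveform, which is the property the paragraph cites as the reason for choosing it.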

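Master's Thesis: Study on Localization Algorithm of Wireless Sensor Network

MASTER'S DISSERTATION
Thesis title: Study on Localization Algorithm of Wireless Sensor Network
A Dissertation in Computer Application Technology
by Hu Yulan
Supervisor: Professor Wang Xinsheng
Yanshan University
2011.12

Yanshan University Master's Thesis Declaration of Originality

I solemnly declare that the master's thesis submitted here, "Study on Localization Algorithm of Wireless Sensor Network", is the result of research I carried out independently under the guidance of my supervisor while studying for a master's degree at Yanshan University.

To the best of my knowledge, the thesis contains no research results published or written by others, except where indicated.

Individuals and groups who made important contributions to this research are clearly acknowledged in the text.

I bear full legal responsibility for this declaration.

Author's signature:    Date:

Yanshan University Master's Thesis Authorization for Use

"Study on Localization Algorithm of Wireless Sensor Network" is the master's thesis I completed under the guidance of my supervisor while studying for a master's degree at Yanshan University.

The research results of this thesis belong to Yanshan University; if I publish them, Yanshan University and the relevant persons will be named as the first completing institution.

I fully understand Yanshan University's regulations on the preservation and use of theses and agree that the university may keep copies and electronic versions of the thesis and submit them to the relevant authorities, and that the thesis may be consulted and borrowed.

I authorize Yanshan University to preserve the thesis by photocopying, reduced-size printing or other means of reproduction, and to publish all or part of its contents.

Confidential □; this authorization applies after declassification in ____ (year).

This thesis is not confidential □.

(Please tick the appropriate box above.)
Author's signature:    Date:
Supervisor's signature:    Date:

Abstract

The location information of sensor nodes plays a vital role in applications of wireless sensor networks such as monitoring.

The simplest, quickest and most accurate ways to obtain node locations are manual configuration or carrying GPS positioning devices, but obtaining locations this way is very costly.

A better approach is therefore to estimate locations with a localization algorithm.

This thesis mainly studies localization algorithms for wireless sensor networks based on multidimensional scaling.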

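Big Data Development Engineer Recruitment Written Test and Reference Answers (a large central state-owned enterprise), 2025

(Answers follow the questions.)

I. Single-choice questions (10 questions, 2 points each, 20 points in total)
1. When a big data development engineer processes massive data, which technology is typically used to improve processing speed and efficiency? ( ) A. Relational database management system B. Distributed file system C. Data warehouse technology D. In-memory database
2. In the Hadoop ecosystem, which framework implements distributed computing and storage? ( ) A. Hive B. MapReduce C. Zookeeper D. HBase
3. In data warehouse design, which data model is best suited to supporting complex queries and fast data access? A. Star schema B. Snowflake schema C. Constellation schema D. Nebula schema
4. When processing big data, which technology can effectively improve processing speed and efficiency? A. Distributed file system B. Relational database C. NoSQL database D. In-memory database
5. Which of the following is not a programming language commonly used by big data development engineers? A. Python B. Java C. C++ D. SQL
6. In the Hadoop ecosystem, which component implements distributed file storage? A. HBase B. Hive C. YARN D. HDFS
7. In the Hadoop ecosystem, which component is mainly used for distributed storage of large-scale data? A. HDFS B. YARN C. Hive D. HBase
8. In data analysis, which algorithm is typically used for classification problems? A. K-Means B. Decision Tree C. KNN (K-Nearest Neighbors) D. SVM (Support Vector Machine)
9. When a big data development engineer processes massive data, which technology is typically used to improve processing efficiency? A. Relational database B. NoSQL database C. MapReduce D. A combination of relational and NoSQL databases
10. Which of the following is not a technology used for big data analysis in the Hadoop ecosystem? A. Hive B. HBase C. Pig D. Spark

II. Multiple-choice questions (10 questions, 4 points each, 40 points in total)
1. Which technologies do big data development engineers typically use when processing massive data? ( ) A. HDFS, MapReduce and Hive in the Hadoop ecosystem B. Spark Core, Spark SQL and Spark Streaming in the Spark ecosystem C. NoSQL databases such as MongoDB, Cassandra and Redis D. Relational databases such as MySQL, Oracle and SQL Server
2. Which operations does a big data development engineer typically perform in the data preprocessing stage? ( ) A. Data cleaning, including removing duplicates and handling missing values B. Data integration, merging data from different sources C. Data transformation, converting data into a format suitable for analysis D. Data normalization, ensuring consistency across data sets E. Data masking, encrypting or masking sensitive data
3. Which technology stacks does a big data development engineer usually need to master? ( ) A. The Hadoop ecosystem (including HDFS, MapReduce, Hive, HBase, etc.) B. The Spark ecosystem (including Spark Core, Spark SQL, Spark Streaming, etc.) C. NoSQL databases (such as MongoDB, Cassandra, Redis) D. Relational databases (such as MySQL, Oracle) E. Machine learning frameworks (such as TensorFlow, PyTorch)
4. Which of the following descriptions of the big data processing workflow are correct? ( ) A. Data collection is the first step and includes acquiring data from various sources B. Data preprocessing includes data cleaning, data transformation and deduplication C. Data storage saves the processed data in a distributed file system or database D. Data analysis mines and interprets the data with statistics, machine learning and similar methods E. Data visualization presents the analysis results as graphics and charts
5. Which technologies might a big data development engineer use when processing big data? ( ) A. HDFS and MapReduce in the Hadoop ecosystem B. Spark and Spark Streaming C. Flink and Storm D. MySQL and Oracle E. Elasticsearch and Kibana
6. Which tools or platforms are used by big data development engineers for data visualization and analysis? ( ) A. Tableau B. Power BI C. Datawrapper D. D3.js E. Jupyter Notebook
7. Which technology stacks are commonly used by big data development engineers in projects? ( ) A. The Hadoop ecosystem (HDFS, MapReduce, Hive, HBase, etc.) B. The Spark ecosystem (Spark Core, Spark SQL, Spark Streaming, etc.) C. Flink D. Kafka E. Redis
8. Which of the following descriptions of the big data processing workflow are correct? ( ) A. Data collection is the first step and includes collecting and preprocessing the data B. Data storage saves the collected data in a suitable storage system such as HDFS C. Data processing includes cleaning, transforming and aggregating the data D. Data mining extracts valuable information or knowledge from the processed data E. Data presentation shows the mined information to users as charts, reports and similar forms
9. Which technology stacks might a big data development engineer need to master for project development? ( ) A. The Hadoop ecosystem (including HDFS, MapReduce, YARN, etc.) B. The Spark ecosystem (including Spark Core, Spark SQL, Spark Streaming, etc.) C. Kafka message queue D. Elasticsearch full-text search E. MySQL relational database
10. Which behaviors conform to the professional standards of a big data development engineer? ( ) A. Strictly following the company's code review and code submission rules B. When facing a technical difficulty, first trying to solve it by consulting references and asking colleagues C. Actively sharing experience and knowledge in team collaboration D. When learning new technologies, paying attention only to one's own modules and ignoring the others E. Reporting problems encountered in a project to superiors promptly

III. True/false questions (10 questions, 2 points each, 20 points in total)
1. In the work of a big data development engineer, HDFS (Hadoop Distributed File System) in the Hadoop ecosystem is mainly used to store unstructured and semi-structured big data files.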

FortiGate Firewall Dual-Unit HA and Session Synchronization

HA Configuration
• Gratuitous ARPs ▪ arps
• Bounce links ▪ link failed signal
• Renegotiate after a change ▪ override
• Unit priority ▪ priority
• Monitored interfaces ▪ monitor
• Virtual cluster ▪ vcluster2
• Virtual domain membership ▪ vdom
Slave External PMAC: 0b-a4-8e
Session table of the primary unit in load-balancing mode
session info: proto=6 proto_state=11 expire=3599 timeout=3600 flags=00000000 av_idx=4 use=5 bandwidth=0/sec guaranteed_bandwidth=0/sec traffic=0/sec prio=0 logtype=session ha_id=0 hakey=49729 tunnel=/ state=redir log local may_dirty statistic(bytes/packets/err): org=1253/21/0 reply=1503/19/0 tuples=3 orgin->sink: org pre->post, reply pre->post oif=3/5 gwy=192.168.11.254/10.0.1.1 hook=post dir=org act=snat 10.0.1.1:2287->193.1.193.64:21(192.168.11.101:2287) hook=pre dir=reply act=dnat 193.1.193.64:21->192.168.11.101:2287(10.0.1.1:2287) hook=post dir=reply act=noop 193.1.193.64:21->10.0.1.1:2287(0.0.0.0:0) pos/(before,after) -233083355/(0,8), 0/(0,0) misc=20004 domain_info=0 auth_info=0 ftgd_info=0 ids=0x0 vd=0 serial=00005ae5 tos=ff/ff

Distributed locating algorithm for wireless sensor networks: MDS-MAP(D)

Journal on Communications (通信学报), Vol. 29, No. 6, June 2008
MA Zhen, LIU Yun, SHEN Bo
(Key Laboratory of Communication & Information Systems, Beijing Jiaotong University, Beijing Municipal Commission of Education, Beijing 100044, China)

Abstract: A new distributed locating algorithm, MDS-MAP(D), is proposed to improve the performance of node localization in wireless sensor networks. The computation of relative node coordinates and the merging of local networks into a global network are described explicitly, and the computational complexity of the algorithm is analyzed and simulated.

MDS-MAP(D) is based on distributed node clustering and uses the connectivity of nodes to estimate their coordinates. It reduces the computational complexity and energy consumption of node localization without requiring high-precision distance measurement.

The analysis and simulation results indicate that the complexity of node localization falls from $O(N^3)$ to $O(Nm^2)$, $m < N$, and that the accuracy is improved by 1%~3%.

Key words: wireless networks; location; multidimensional scaling; distribution
CLC number: TP393; Document code: A; Article ID: 1000-436X(2008)06-0057-06

1 Introduction

Wireless sensor network (WSN) technology has developed rapidly in recent years and is increasingly used in military, transportation, environmental and industrial production fields to measure many physical quantities such as temperature, humidity, pressure and speed [1].

An Expert Weighting Method Using Fuzzy Kernel Clustering Based on Multiplicative Preference Relations

王泽洲; 陈云翔; 项华春
Abstract: For multi-attribute, multi-objective decision making problems in which each expert gives a preference relation over the alternatives, an expert weighting method using fuzzy kernel clustering based on multiplicative preference relations is proposed. The experts are clustered with the fuzzy kernel clustering principle, and loosening the normalization constraint overcomes the effect of outliers on the clustering results in the traditional fuzzy kernel clustering algorithm. In the intra-class weighting step, the CI-IOWG operator is used to aggregate the opinions of experts in the same class, and each expert's weight is determined by how much that expert contributes to the consensus opinion of the class. This overcomes the limitations of traditional weighting methods based on entropy or on judgment-matrix consistency. An example shows that the method is feasible and effective.
Journal: 火力与指挥控制
Year (volume), issue: 2017(042)005
Pages: 7 (P56-62)
Keywords: multiplicative preference relation; expert weighting; fuzzy kernel clustering; CI-IOWG operator
Authors: 王泽洲; 陈云翔; 项华春
Affiliation (all three authors): Equipment Management and Safety Engineering College, Air Force Engineering University, Xi'an 710051
Language of the text: Chinese
CLC number: C934

In multi-objective, multi-attribute decision problems, the decision objects are fuzzy and the decision makers are subjectively uncertain, so decision makers (experts) tend to give preference relations that compare the decision objects with one another.

Cluster Computing Capability Models: Summary and Proposal

刘乃维
Journal: 电脑开发与应用
Year (volume), issue: 2013(000)012
Abstract: Cluster computing can be regarded as a technology that connects many systems together so that multiple servers carry out computation like a single machine.

Cluster computing is usually adopted to improve the stability of a system while also improving its capacity to process big data in real time.

In cluster computing, the server group appears on the network as a single system and is managed as such.

Most of the computing models proposed so far for parallel and cluster computing address what cluster computing has in common with parallel computing rather than its distinctive features.

Addressing those features requires a more accurate model.

This paper introduces several cluster-based computing models and proposes a possible scheme for a new cluster computing model.

Pages: 3 (P70-72)
Author: 刘乃维
Affiliation: Beihang University, Beijing 100191
Language of the text: Chinese
CLC number: TP311

A MapReduce Clustering Load-Balancing Strategy Based on Index Offset

周华平; 刘光宗; 张贝贝
Journal: 计算机科学
Year (volume), issue: 2018(045)005
Abstract: MapReduce is a distributed programming model widely used to process large-scale, high-dimensional data sets. It partitions data with the original hash function, so data skew often occurs when the data distribution is not uniform. A clustering algorithm based on MapReduce needs many iterations, and the input data distribution of Reduce at each stage is unknown, so existing solutions to data skew are not applicable. To solve the imbalance in data partitioning, this paper proposes a strategy that changes the index of the remaining partitions when data skew exists. While the Map phase runs, the amount of data to be assigned to each reducer is counted; the JobTracker monitors the global partition information and dynamically modifies the original partition function according to the data skew model. In the subsequent partitioning, the Partitioner redirects the index of partitions that are about to cause skew to other, more lightly loaded reducers, so that the load of the nodes becomes balanced. The method was compared with existing data-skew solutions on Zipf-distributed data sets and real data sets. The results show that the strategy solves data skew in MapReduce clustering and outperforms the hash method and the sampling-based dynamic partitioning method in stability and execution time.
Pages: 7 (P303-309)
Authors: 周华平; 刘光宗; 张贝贝
Affiliation (all three authors): School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232000
Language of the text: Chinese
CLC number: TP311

A Weighted Hierarchical Clustering Algorithm Based on Agglomerative Information Bottleneck

李寒; 郭禾; 王宇新; 刘萍; 杨元生
Journal: 计算机工程
Year (volume), issue: 2011(037)006
Abstract: This paper proposes an Agglomerative Information Bottleneck based Weighted Hierarchical Clustering algorithm (ABWHC) to recover the architecture of object-oriented software. ABWHC uses information loss as the similarity measure, extends the clustering features and weights, and exploits the characteristics of object-oriented software to generate a label group that describes the meaning of each entity or cluster. Experimental results demonstrate that ABWHC improves clustering performance and can recover the architecture of object-oriented software.
Pages: 3 (P55-57)
Authors: 李寒; 郭禾; 王宇新; 刘萍; 杨元生
Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; School of Software, Dalian University of Technology, Dalian, Liaoning 116620; School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
Language of the text: Chinese
CLC number: TP311.5


Clustering of nodes saves energy and communication bandwidth in ad-hoc networks. In this paper we discuss the distributed weighted clustering algorithm (DWCA) [2]. This approach is based on a combined weight metric that takes into account several system parameters such as the ideal node degree, transmission range, energy and mobility of the nodes. Depending on the specific application, some of these parameters can be used in the metric to determine the clusterhead. However, more clusterheads lead to extra hops for a packet when it is routed from source to destination. On the other hand, we can choose to have the minimum number of clusterheads in order to maximize resource utilization. Various parameters and their respective weights can be determined to arrive at an efficient weighted cluster based routing scheme. The rest of the paper is organised as follows. In Section 2 we give background information on MANETs and cluster based routing schemes. The proposed algorithm is described in Section 3. Performance evaluation and comparison via simulation are presented in Section 4. Finally, Section 5 concludes the paper and discusses future work.
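The combined weight metric described above can be illustrated with a minimal sketch. It is not the authors' implementation. The weighting factors, the ideal degree of 4, the use of summed neighbour distance as a stand-in for the transmission range term, and the rule that the lowest weight wins are all assumptions, borrowed from the general weighted clustering idea. The names `Node`, `combined_weight` and `elect_clusterhead` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    node_id: int
    degree: int          # number of neighbours within transmission range
    sum_distance: float  # summed distance to those neighbours (assumed normalised)
    mobility: float      # average speed (assumed normalised)
    energy_used: float   # battery consumed so far (assumed normalised)


# Hypothetical weighting factors; chosen only for illustration.
W_DEGREE, W_DISTANCE, W_MOBILITY, W_ENERGY = 0.4, 0.2, 0.2, 0.2
IDEAL_DEGREE = 4  # assumed ideal number of neighbours for a clusterhead


def combined_weight(node: Node) -> float:
    """Weighted sum of the four parameters; a lower value means a better candidate."""
    degree_difference = abs(node.degree - IDEAL_DEGREE)
    return (W_DEGREE * degree_difference
            + W_DISTANCE * node.sum_distance
            + W_MOBILITY * node.mobility
            + W_ENERGY * node.energy_used)


def elect_clusterhead(neighbourhood: Sequence[Node]) -> Node:
    """Pick the node with the smallest combined weight; ties go to the lower id."""
    return min(neighbourhood, key=lambda n: (combined_weight(n), n.node_id))
```

Changing the weighting factors shifts the election toward stable nodes (a larger mobility factor) or toward well-charged nodes (a larger energy factor), which is the application-dependent tuning the paragraph above refers to.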
2. Background and Related Work
Mobile ad-hoc networks (MANETs) are a form of wireless network that does not require a base station to provide network connectivity. A MANET defines simple mechanisms that enable mobile devices to form a temporary community without any planned installation or human intervention. Each node acts as a host and a router at the same time: every node participating in a MANET commits itself to forwarding data packets from one neighboring node to another until the final destination is reached. In other words, the survival of a MANET relies on cooperation between its participating members. MANETs have many advantages, such as low cost and on-the-fly deployment [3].
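The host-and-router role described above can be sketched in a few lines. This is only an illustration under stated assumptions: nodes sit at fixed 2-D positions, every node has the same transmission range, and the relay path is found with a plain breadth-first search rather than with any actual MANET routing protocol. The function names `neighbours` and `relay_path` are hypothetical.

```python
from collections import deque
from math import dist


def neighbours(positions, tx_range):
    """Map each node to the nodes that lie within its transmission range."""
    return {
        a: [b for b in positions
            if b != a and dist(positions[a], positions[b]) <= tx_range]
        for a in positions
    }


def relay_path(positions, tx_range, src, dst):
    """Return a minimum-hop list of relaying nodes from src to dst, or None."""
    links = neighbours(positions, tx_range)
    previous = {src: None}          # node -> node it was reached from
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in links[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None                      # destination is not reachable
```

With nodes A, B and C spaced 80 units apart on a line and a range of 100, A cannot reach C directly, so B has to forward the packet. An isolated node outside everyone's range yields `None`, which reflects how connectivity depends on cooperating intermediate nodes.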
Abstract
Mobile ad-hoc networks (MANETs) are a form of wireless network that does not require a base station to provide network connectivity. Many of the characteristics that distinguish MANETs from other wireless networks also make routing a challenging task. Cluster based routing is a MANET routing scheme in which clusters of mobile nodes are formed, each with its own clusterhead that is responsible for routing among clusters. In this paper we propose and implement a distributed weighted clustering algorithm for MANETs. This approach is based on a combined weight metric that takes into account several system parameters such as the node degree, transmission range, energy and mobility of the nodes. We have evaluated the performance of the proposed scheme through simulation in various network situations. Simulation results show that the improved distributed weighted clustering algorithm (DWCAIMP) outperforms the original distributed weighted clustering algorithm (DWCA).
Keywords: MANETs, Clustering, Routing, Wireless Communication, Distributed Clustering
2.2. Routing in MANETs
Routing is a very challenging task in mobile ad hoc networks because of characteristics such as dynamic mobility, frequent disconnections, low bandwidth and low battery power. Hence traditional routing protocols such as RIP [4] cannot be used in mobile ad hoc networks. Several classes of routing scheme have been proposed for these networks, such as table-driven and source-initiated on-demand routing, along with protocols such as AODV [1], DSDV [5], DSR [6] and ZRP [7].
Wireless Sensor Network, 2011, 3, 54-60
doi:10.4236/wsn.2011.32006 Published Online February 2011 (/journal/wsn)
A Distributed Weighted Cluster Based Routing Protocol for MANETs
N. Chauhan et al.
1. Introduction
MANETs are a form of wireless network that does not require a base station to provide network connectivity. In this networking concept, mobile devices form a temporary community without any planned installation or human intervention. Each node acts as a host and a router at the same time: every node participating in a MANET commits itself to forwarding data packets from one neighboring node to another until the final destination is reached. Mobile ad-hoc networks have many characteristics that distinguish them from other wireless networks, including the absence of fixed network infrastructure, dynamic network configuration, node mobility, frequent node failure and low battery power. These characteristics make routing in MANETs quite a challenging task. Various routing protocols have been proposed for MANETs, with varying performance under different network conditions [1]. Cluster based routing is one of the routing schemes for MANETs in which clusters of mobile nodes are formed, each with its own clusterhead that is responsible for routing between clusters.
Copyright © 2011 SciRes.