Exploiting temporal GIS and spatio-temporal data to enhance telecom
Literature (10): Semi-supervised and unsupervised extreme learning
Semi-supervised and unsupervised extreme learning machines
Gao Huang, Shiji Song, Jatinder N. D. Gupta, and Cheng Wu
Abstract—Extreme learning machines (ELMs) have proven to be an efficient and effective learning paradigm for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research studies have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit the learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multi-class classification or multi-cluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised and unsupervised ELMs can actually be put into a unified framework.
This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.
Index Terms—Clustering, embedding, extreme learning machine, manifold regularization, semi-supervised learning, unsupervised learning.
I. INTRODUCTION
Single layer feedforward networks (SLFNs) have been intensively studied during the past several decades. Most of the existing learning algorithms for training SLFNs, such as the famous back-propagation algorithm [1] and the Levenberg-Marquardt algorithm [2], adopt gradient methods to optimize the weights in the network. Some existing works also use forward selection or backward elimination approaches to construct the network dynamically during the training process [3]–[7]. However, neither the gradient-based methods nor the grow/prune methods guarantee a globally optimal solution. Although various methods, such as genetic and evolutionary algorithms, have been proposed to handle the local minimum problem, they generally incur high computational cost.
This work was supported by the National Natural Science Foundation of China under Grant 61273233, the Research Fund for the Doctoral Program of Higher Education under Grants 20120002110035 and 20130002130010, the National Key Technology R&D Program under Grant 2012BAF01B03, the Project of China Ocean Association under Grant DY125-25-02, and Tsinghua University Initiative Scientific Research Program under Grants 2011THZ07132. Gao Huang, Shiji Song, and Cheng Wu are with the Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: huang-g09@; shijis@; wuc@). Jatinder N. D. Gupta is with the College of Business Administration, The University of Alabama in Huntsville, Huntsville, AL 35899, USA (e-mail: guptaj@).
One of the most successful algorithms for training SLFNs is the support vector machine (SVM) [8], [9], a maximal margin classifier derived under the framework of structural risk minimization (SRM). The dual problem of SVMs is a quadratic programming problem and can be solved conveniently. Due to their simplicity and stable generalization performance, SVMs have been widely studied and applied to various domains [10]–[14].
Recently, Huang et al. [15], [16] proposed extreme learning machines (ELMs) for training SLFNs. In contrast to most of the existing approaches, ELMs only update the output weights between the hidden layer and the output layer, while the parameters of the hidden layer, i.e., the input weights and biases, are randomly generated. By adopting squared loss on the prediction error, the training of output weights turns into a regularized least squares (or ridge regression) problem, which can be solved efficiently in closed form. It has been shown that even without updating the parameters of the hidden layer, the SLFN with randomly generated hidden neurons and tunable output weights maintains its universal approximation capability [17]–[19]. Compared to gradient-based algorithms, ELMs are much more efficient and usually lead to better generalization performance [20]–[22]. Compared to SVMs, solving the regularized least squares problem in ELMs is also faster than solving the quadratic programming problem in standard SVMs. Moreover, ELMs can be used for multi-class classification problems directly. The prediction accuracy achieved by ELMs is comparable with or even higher than that of SVMs [16], [22]–[24]. The differences and similarities between ELMs and SVMs are discussed in [25] and [26], and new algorithms are proposed by combining the advantages of both models. In [25], an extreme SVM (ESVM) model is proposed by combining ELMs and the proximal SVM (PSVM). The ESVM algorithm is shown to be more accurate than the basic ELM model due to the introduced regularization technique, and much more efficient than SVMs since there is no kernel matrix multiplication in ESVM. In [26], the traditional RBF kernel is replaced by the ELM kernel, leading to an efficient algorithm whose accuracy matches that of SVMs.
In the past years, researchers from various fields have made substantial contributions to ELM theories and applications. For example, the universal approximation ability of ELMs has been further studied in a classification context [23]. The generalization error bound of ELMs has been investigated from the perspective of the Vapnik-Chervonenkis (VC) dimension theory and the initial localized generalization error model (LGEM) [27], [28]. Various extensions have been made to the basic ELM to make it more efficient and more suitable for specific problems, such as ELMs for online sequential data [29]–[31], ELMs for noisy/missing data [32]–[34], ELMs for imbalanced data [35], etc. On the implementation side, ELMs have recently been implemented using parallel techniques [36], [37] and realized on hardware [38], which makes ELMs feasible for large data sets and real-time reasoning.
Though ELMs have become popular in a wide range of domains, they are primarily used for supervised learning tasks such as classification and regression, which greatly limits their applicability. In some cases, such as text classification, information retrieval and fault diagnosis, obtaining labels for fully supervised learning is time consuming and expensive, while a multitude of unlabeled data are easy and cheap to collect.
Supervised learning algorithms cannot make use of unlabeled data. To overcome this limitation, semi-supervised learning (SSL) has been proposed to leverage both labeled and unlabeled data [39], [40]. SSL algorithms assume that the input patterns from both labeled and unlabeled data are drawn from the same marginal distribution. Therefore, the unlabeled data naturally provide useful information for exploring the data structure in the input space. By assuming that the input data follow some cluster structure or manifold in the input space, SSL algorithms can incorporate both labeled and unlabeled data into the learning process. Since SSL requires less effort to collect labeled data and can offer higher accuracy, it has been applied to various domains [41]–[43]. In some other cases where no labeled data are available, people may be interested in exploring the underlying structure of the data. To this end, unsupervised learning (USL) techniques, such as clustering, dimension reduction or data representation, are widely used.
In this paper, we extend ELMs to handle both semi-supervised and unsupervised learning problems by introducing the manifold regularization framework. Both the proposed semi-supervised ELM (SS-ELM) and unsupervised ELM (US-ELM) inherit the computational efficiency and the learning capability of traditional ELMs. Compared with existing algorithms, SS-ELM and US-ELM are not only inductive (straightforward extension for out-of-sample examples at test time), but also can be used for multi-class classification or multi-cluster
clustering directly. We test our algorithms on a variety of data sets and make comparisons with other related algorithms. The results show that the proposed algorithms are competitive with state-of-the-art algorithms in terms of accuracy and efficiency.
It is worth mentioning that all the supervised, semi-supervised and unsupervised ELMs can actually be put into a unified framework. That is, all the algorithms consist of two stages: 1) random feature mapping; and 2) output weights solving. The first stage is to construct the hidden layer using randomly generated hidden neurons. This is the key concept in ELM theory, and it distinguishes ELMs from many existing feature learning methods. Generating the feature mapping randomly enables fast nonlinear feature learning in ELMs and alleviates the problem of over-fitting. The second stage is to solve for the weights between the hidden layer and the output layer, and this is where the main difference between supervised, semi-supervised and unsupervised ELMs lies. We believe that the unified framework for the three types of ELMs might provide a new perspective for understanding the underlying behavior of the random feature mapping in ELMs.
The rest of the paper is organized as follows. In Section II, we give a brief review of the related literature on semi-supervised and unsupervised learning. Sections III and IV introduce the basic formulation of ELMs and the manifold regularization framework, respectively. We present the proposed SS-ELM and US-ELM algorithms in Sections V and VI. Experimental results are given in Section VII, and Section VIII concludes the paper.
II. RELATED WORKS
Only a few existing research studies on ELMs have dealt with the problem of semi-supervised or unsupervised learning. In [44] and [45], the manifold regularization framework was introduced into the ELM model to leverage both labeled and unlabeled data, thus extending ELMs to semi-supervised learning. However, both of these works are limited to binary classification problems, thus
they have not explored the full power of ELMs. Moreover, both algorithms are only effective when the number of training patterns is larger than the number of hidden neurons. Unfortunately, this condition is usually violated in semi-supervised learning, since the training data are relatively scarce compared to the hidden neurons, whose number is commonly set to several hundred or several thousand. Recently, a co-training approach has been proposed to train ELMs in a semi-supervised setting [46]. In this algorithm, the labeled training set is augmented gradually by moving a small set of the most confidently predicted unlabeled data to the labeled set at each loop, and ELMs are trained repeatedly on the pseudo-labeled set. Since the algorithm needs to train ELMs repeatedly, it introduces considerable extra computational cost.
The proposed SS-ELM is related to a few other semi-supervised learning algorithms based on the manifold assumption, such as the Laplacian support vector machines (LapSVMs) [47], the Laplacian regularized least squares (LapRLS) [47], semi-supervised neural networks (SSNNs) [48], and semi-supervised deep embedding [49]. It has been shown in these works that manifold regularization is effective in a wide range of domains and often leads to state-of-the-art performance in terms of accuracy and efficiency.
The US-ELM proposed in this paper is related to Laplacian Eigenmaps (LE) [50] and spectral clustering (SC) [51] in that they all use spectral techniques for embedding and clustering. In all these algorithms, an affinity matrix is first built from the input patterns. SC performs eigen-decomposition on the normalized affinity matrix, and then embeds the original data into a d-dimensional space using the first d eigenvectors (each row is normalized to have unit length and represents a point in the embedded space) corresponding to the d largest eigenvalues. The LE algorithm performs generalized eigen-decomposition on the graph Laplacian, and uses the d eigenvectors corresponding to the
second through the (d+1)th smallest eigenvalues for embedding. When LE and SC are used for clustering, k-means is then adopted to cluster the data in the embedded space. Similar to LE and SC, US-ELM is also based on the affinity matrix, and it is converted to solving a generalized eigen-decomposition problem. However, the eigenvectors obtained in US-ELM are not used for data representation directly, but are used as the parameters of the network, i.e., the output weights. Note that once the US-ELM model is trained, it can be applied to any presented data in the original input space. In this way, US-ELM provides a straightforward way of handling new patterns without recomputing eigenvectors as in LE and SC.
III. EXTREME LEARNING MACHINES
Consider a supervised learning problem where we have a training set with N samples, $\{X, Y\} = \{x_i, y_i\}_{i=1}^{N}$. Here $x_i \in \mathbb{R}^{n_i}$, and $y_i$ is an $n_o$-dimensional binary vector with exactly one entry (corresponding to the class that $x_i$ belongs to) equal to one for multi-class classification tasks, or $y_i \in \mathbb{R}^{n_o}$ for regression tasks, where $n_i$ and $n_o$ are the dimensions of input and output, respectively. ELMs aim to learn a decision rule or an approximation function based on the training data.
Generally, the training of ELMs consists of two stages. The first stage is to construct the hidden layer using a fixed number of randomly generated mapping neurons, which can be any nonlinear piecewise continuous functions, such as the Sigmoid function and Gaussian function given below.
1) Sigmoid function
$$g(x;\theta) = \frac{1}{1 + \exp\left(-(a^T x + b)\right)}; \quad (1)$$
2) Gaussian function
$$g(x;\theta) = \exp\left(-b\,\|x - a\|\right); \quad (2)$$
where $\theta = \{a, b\}$ are the parameters of the mapping function and $\|\cdot\|$ denotes the Euclidean norm.
A notable feature of ELMs is that the parameters of the hidden mapping functions can be randomly generated according to any continuous probability distribution, e.g., the uniform distribution on (-1, 1). This makes ELMs distinct from traditional feedforward neural networks and SVMs.
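The following NumPy code is a minimal sketch of the first stage (random feature mapping). It evaluates Sigmoid neurons as in (1) and Gaussian neurons as in (2) on a batch of inputs, stacking the centres/weights a as the columns of a matrix A.

Some choices here are assumptions for illustration and do not come from the paper:
- The function names are made up for this sketch.
- The parameters are drawn once from U(-1, 1) and never updated.
- For Gaussian neurons, b is drawn from U(0, 1) so that exp(-b‖x − a‖) stays in (0, 1].

```python
import numpy as np

def random_hidden_layer(n_in, n_hidden, rng, positive_b=False):
    """Draw input weights a (columns of A) and biases b once; they are never trained.
    positive_b=True draws b from U(0, 1), an assumption used here for Gaussian widths."""
    A = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden))
    b = rng.uniform(0.0 if positive_b else -1.0, 1.0, size=n_hidden)
    return A, b

def sigmoid_features(X, A, b):
    """Eq. (1) for every sample/neuron pair: 1 / (1 + exp(-(a^T x + b)))."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def gaussian_features(X, A, b):
    """Eq. (2) for every sample/neuron pair: exp(-b * ||x - a||)."""
    # dists[i, j] = Euclidean distance between sample x_i and centre a_j (column j of A)
    dists = np.linalg.norm(X[:, :, None] - A[None, :, :], axis=1)
    return np.exp(-b * dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 samples, n_i = 3
A, b = random_hidden_layer(3, 8, rng)     # n_h = 8 Sigmoid neurons
H = sigmoid_features(X, A, b)             # hidden-layer output matrix, shape (5, 8)
```

Each row of H plays the role of $h(x_i)$ in Section III. Only the output weights applied to H are learned afterwards, which is why this stage can be shared by the supervised, semi-supervised and unsupervised variants.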
The only free parameters that need to be optimized in the training process are the output weights between the hidden neurons and the output nodes. As a result, training ELMs is equivalent to solving a regularized least squares problem, which is considerably more efficient than training SVMs or using backpropagation algorithms.
In the first stage, a number of hidden neurons are randomly generated; they map the data from the input space into an $n_h$-dimensional feature space ($n_h$ is the number of hidden neurons). We denote by $h(x_i) \in \mathbb{R}^{1\times n_h}$ the output vector of the hidden layer with respect to $x_i$, and by $\beta \in \mathbb{R}^{n_h\times n_o}$ the output weights that connect the hidden layer with the output layer. Then, the outputs of the network are given by
$$f(x_i) = h(x_i)\beta, \quad i = 1, \dots, N. \quad (3)$$
In the second stage, ELMs solve for the output weights by minimizing the sum of the squared losses of the prediction errors, which leads to the following formulation
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o}} \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|e_i\|^2 \quad \text{s.t.}\ \ h(x_i)\beta = y_i^T - e_i^T,\ \ i = 1, \dots, N, \quad (4)$$
where the first term in the objective function is a regularization term which controls the complexity of the model, $e_i \in \mathbb{R}^{n_o}$ is the error vector with respect to the $i$th training pattern, and $C$ is a penalty coefficient on the training errors.
By substituting the constraints into the objective function, we obtain the following equivalent unconstrained optimization problem:
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o}} L_{ELM} = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\|Y - H\beta\|^2 \quad (5)$$
where $H = [h(x_1)^T, \dots, h(x_N)^T]^T \in \mathbb{R}^{N\times n_h}$.
The above problem is widely known as ridge regression or regularized least squares. By setting the gradient of $L_{ELM}$ with respect to $\beta$ to zero, we have
$$\nabla L_{ELM} = \beta - C H^T (Y - H\beta) = 0 \quad (6)$$
If $H$ has more rows than columns and is of full column rank, the above equation is overdetermined. This is usually the case when the number of training patterns is larger than the number of hidden neurons, and we then have the following closed-form solution for (5):
$$\beta^* = \left(H^T H + \frac{I_{n_h}}{C}\right)^{-1} H^T Y, \quad (7)$$
where $I_{n_h}$ is an identity matrix of dimension $n_h$. Note that in practice, rather than explicitly
inverting the $n_h\times n_h$ matrix in the above expression, we can use Gaussian elimination to directly solve a set of linear equations in a more efficient and numerically stable manner.
If the number of training patterns is less than the number of hidden neurons, then H will have more columns than rows, which often leads to an underdetermined least squares problem. In this case, β may have an infinite number of solutions. To handle this problem, we restrict β to be a linear combination of the rows of H: $\beta = H^T\alpha$ ($\alpha \in \mathbb{R}^{N\times n_o}$). Notice that when H has more columns than rows and is of full row rank, $HH^T$ is invertible. Substituting $\beta = H^T\alpha$ into (6) and multiplying both sides by $(HH^T)^{-1}H$, we get
$$\alpha - C(Y - HH^T\alpha) = 0, \quad (8)$$
This yields
$$\beta^* = H^T\alpha^* = H^T\left(HH^T + \frac{I_N}{C}\right)^{-1} Y \quad (9)$$
where $I_N$ is an identity matrix of dimension N.
Therefore, when training patterns are plentiful compared to the hidden neurons, we use (7) to compute the output weights; otherwise we use (9).
IV. THE MANIFOLD REGULARIZATION FRAMEWORK
Semi-supervised learning is built on the following two assumptions: (1) both the labeled data $X_l$ and the unlabeled data $X_u$ are drawn from the same marginal distribution $P_X$; and (2) if two points $x_1$ and $x_2$ are close to each other, then the conditional probabilities $P(y|x_1)$ and $P(y|x_2)$ should be similar as well. The latter assumption is widely known as the smoothness assumption in machine learning. To enforce this assumption on the data, the manifold regularization framework proposes to minimize the following cost function
$$L_m = \frac{1}{2}\sum_{i,j} w_{ij}\,\|P(y|x_i) - P(y|x_j)\|^2, \quad (10)$$
where $w_{ij}$ is the pairwise similarity between two patterns $x_i$ and $x_j$. Note that the similarity matrix $W = [w_{ij}]$ is usually sparse, since we only place a nonzero weight between two patterns $x_i$ and $x_j$ if they are close, e.g., $x_i$ is among the k nearest neighbors of $x_j$ or $x_j$ is among the k nearest neighbors of $x_i$. The nonzero weights are usually computed using the Gaussian function $\exp(-\|x_i - x_j\|^2/2\sigma^2)$, or simply fixed to 1.
Intuitively, the formulation (10) penalizes large
variation in the conditional probability $P(y|x)$ when x has a small change. This requires that $P(y|x)$ vary smoothly along the geodesics of $P(x)$.
Since it is difficult to compute the conditional probability, we can approximate (10) with the following expression:
$$\hat{L}_m = \frac{1}{2}\sum_{i,j} w_{ij}\,\|\hat{y}_i - \hat{y}_j\|^2, \quad (11)$$
where $\hat{y}_i$ and $\hat{y}_j$ are the predictions with respect to patterns $x_i$ and $x_j$, respectively. It is straightforward to simplify the above expression in matrix form:
$$\hat{L}_m = \mathrm{Tr}(\hat{Y}^T L \hat{Y}), \quad (12)$$
where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, $L = D - W$ is known as the graph Laplacian, and D is a diagonal matrix with diagonal elements $D_{ii} = \sum_{j=1}^{l+u} w_{ij}$. As discussed in [52], instead of using L directly, we can normalize it as $D^{-1/2} L D^{-1/2}$ or replace it by $L^p$ (p is an integer), based on some prior knowledge.
V. SEMI-SUPERVISED ELM
In the semi-supervised setting, we have few labeled data and plenty of unlabeled data. We denote the labeled data in the training set as $\{X_l, Y_l\} = \{x_i, y_i\}_{i=1}^{l}$, and the unlabeled data as $X_u = \{x_i\}_{i=1}^{u}$, where l and u are the numbers of labeled and unlabeled data, respectively.
The proposed SS-ELM incorporates manifold regularization to leverage unlabeled data to improve the classification accuracy when labeled data are scarce. By modifying the ordinary ELM formulation (4), we give the formulation of SS-ELM as:
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o}} \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\sum_{i=1}^{l} C_i\|e_i\|^2 + \frac{\lambda}{2}\mathrm{Tr}(F^T L F)$$
$$\text{s.t.}\ \ h(x_i)\beta = y_i^T - e_i^T,\ \ i = 1, \dots, l, \qquad f_i = h(x_i)\beta,\ \ i = 1, \dots, l+u \quad (13)$$
where $L \in \mathbb{R}^{(l+u)\times(l+u)}$ is the graph Laplacian built from both labeled and unlabeled data, $F \in \mathbb{R}^{(l+u)\times n_o}$ is the output matrix of the network with its $i$th row equal to $f(x_i)$, and λ is a trade-off parameter.
Note that, similar to the weighted ELM algorithm (W-ELM) introduced in [35], here we associate a different penalty coefficient $C_i$ with the prediction errors of patterns from different classes. This is because we found that when the data are skewed, i.e., some classes have significantly more training patterns than other
classes, traditional ELMs tend to fit the classes that have the majority of patterns quite well but fit the other classes poorly. This usually leads to poor generalization performance on the testing set (the overall prediction accuracy may be high, but some classes are neglected). Therefore, we propose to alleviate this problem by re-weighting instances from different classes. Suppose that $x_i$ belongs to class $t_i$, which has $N_{t_i}$ training patterns; then we associate $e_i$ with a penalty of
$$C_i = \frac{C_0}{N_{t_i}}, \quad (14)$$
where $C_0$ is a user-defined parameter as in traditional ELMs. In this way, the patterns from the dominant classes will not be overfitted by the algorithm, and the patterns from a class with fewer samples will not be neglected.
We substitute the constraints into the objective function, and rewrite the above formulation in matrix form:
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o}} \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\left\|C^{1/2}(\tilde{Y} - H\beta)\right\|^2 + \frac{\lambda}{2}\mathrm{Tr}(\beta^T H^T L H\beta) \quad (15)$$
where $\tilde{Y} \in \mathbb{R}^{(l+u)\times n_o}$ is the training target with its first l rows equal to $Y_l$ and the rest equal to 0, and C is an $(l+u)\times(l+u)$ diagonal matrix with its first l diagonal elements $[C]_{ii} = C_i$, $i = 1, \dots, l$, and the rest equal to 0.
Again, we compute the gradient of the objective function with respect to β:
$$\nabla L_{SS\text{-}ELM} = \beta - H^T C(\tilde{Y} - H\beta) + \lambda H^T L H\beta. \quad (16)$$
By setting the gradient to zero, we obtain the solution to the SS-ELM:
$$\beta^* = \left(I_{n_h} + H^T C H + \lambda H^T L H\right)^{-1} H^T C\tilde{Y}. \quad (17)$$
As in Section III, if the number of training patterns (l + u) is smaller than the number of hidden neurons, which is common in SSL, we have the following alternative solution:
$$\beta^* = H^T\left(I_{l+u} + C H H^T + \lambda L H H^T\right)^{-1} C\tilde{Y}, \quad (18)$$
where $I_{l+u}$ is an identity matrix of dimension l + u.
Note that by setting λ to zero and the diagonal elements $C_i$ (i = 1, ..., l) to the same constant, (17) and (18) reduce to the solutions of traditional ELMs (7) and (9), respectively.
Based on the above discussion, the SS-ELM algorithm is summarized as Algorithm 1.
Algorithm 1 The SS-ELM algorithm
Input: The labeled patterns, $\{X_l, Y_l\} = \{x_i, y_i\}_{i=1}^{l}$; The unlabeled patterns, $X_u = \{x_i\}_{i=1}^{u}$;
Output: The mapping function of SS-ELM: $f: \mathbb{R}^{n_i} \to \mathbb{R}^{n_o}$
Step 1: Construct the graph Laplacian L from both $X_l$ and $X_u$.
Step 2: Initiate an ELM network of $n_h$ hidden neurons with random input weights and biases, and calculate the output matrix of the hidden neurons $H \in \mathbb{R}^{(l+u)\times n_h}$.
Step 3: Choose the trade-off parameters $C_0$ and λ.
Step 4:
• If $n_h \le N$ (N = l + u)
Compute the output weights β using (17)
• Else
Compute the output weights β using (18)
return The mapping function $f(x) = h(x)\beta$.
VI. UNSUPERVISED ELM
In this section, we introduce the US-ELM algorithm for unsupervised learning. In an unsupervised setting, the entire training data $X = \{x_i\}_{i=1}^{N}$ are unlabeled (N is the number of training patterns) and our target is to find the underlying structure of the original data. The formulation of US-ELM follows from the formulation of SS-ELM. When there are no labeled data, (15) is reduced to
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o}} \|\beta\|^2 + \lambda\,\mathrm{Tr}(\beta^T H^T L H\beta) \quad (19)$$
Notice that the above formulation always attains its minimum at β = 0. As suggested in [50], we have to introduce additional constraints to avoid a degenerate solution. Specifically, the formulation of US-ELM is given by
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o}} \|\beta\|^2 + \lambda\,\mathrm{Tr}(\beta^T H^T L H\beta) \quad \text{s.t.}\ \ (H\beta)^T H\beta = I_{n_o} \quad (20)$$
Theorem 1: An optimal solution to problem (20) is given by choosing β as the matrix whose columns are the eigenvectors (normalized to satisfy the constraint) corresponding to the $n_o$ smallest eigenvalues of the generalized eigenvalue problem:
$$\left(I_{n_h} + \lambda H^T L H\right)v = \gamma H^T H v. \quad (21)$$
Proof: We can rewrite problem (20) as
$$\min_{\beta\in\mathbb{R}^{n_h\times n_o},\ \beta^T B\beta = I_{n_o}} \mathrm{Tr}(\beta^T A\beta), \quad (22)$$
where $A = I_{n_h} + \lambda H^T L H$ and $B = H^T H$. It is easy to verify that both A and B are Hermitian matrices. Thus, according to the Rayleigh-Ritz theorem [53], the above trace minimization problem attains its optimum if and only if the column span of β is the minimum span of the eigenspace corresponding to the smallest $n_o$ eigenvalues of (21). Therefore, by stacking the normalized eigenvectors of (21) corresponding to the smallest $n_o$ generalized eigenvalues, we obtain an optimal solution to (20).
In the Laplacian eigenmaps algorithm, the first eigenvector is discarded since it is always a constant vector proportional to 1 (corresponding to the smallest eigenvalue 0) [50]. In the US-ELM algorithm, the first eigenvector of (21) also leads to small variations in the embedding and is not useful for data representation. Therefore, we suggest discarding this trivial solution as well.
Let $\gamma_1, \gamma_2, \dots, \gamma_{n_o+1}$ ($\gamma_1 \le \gamma_2 \le \dots \le \gamma_{n_o+1}$) be the $(n_o+1)$ smallest eigenvalues of (21) and $v_1, v_2, \dots, v_{n_o+1}$ be their corresponding eigenvectors. Then, the solution for the output weights β is given by
$$\beta^* = [\tilde{v}_2, \tilde{v}_3, \dots, \tilde{v}_{n_o+1}], \quad (23)$$
where $\tilde{v}_i = v_i/\|Hv_i\|$, $i = 2, \dots, n_o+1$, are the normalized eigenvectors.
If the number of training patterns is smaller than the number of hidden neurons, problem (21) is underdetermined. In this case, we have the following alternative formulation by using the same trick as in previous sections:
$$\left(I_N + \lambda L H H^T\right)u = \gamma H H^T u. \quad (24)$$
Again, let $u_1, u_2, \dots, u_{n_o+1}$ be the generalized eigenvectors corresponding to the $(n_o+1)$ smallest eigenvalues of (24). Then the final solution is given by
$$\beta^* = H^T[\tilde{u}_2, \tilde{u}_3, \dots, \tilde{u}_{n_o+1}], \quad (25)$$
where $\tilde{u}_i = u_i/\|HH^T u_i\|$, $i = 2, \dots, n_o+1$, are the normalized eigenvectors.
If our task is clustering, then we can adopt the k-means algorithm to perform clustering in the embedded space. We summarize the proposed US-ELM in Algorithm 2.
Algorithm 2 The US-ELM algorithm
Input: The training data: $X \in \mathbb{R}^{N\times n_i}$;
Output:
• For embedding task: The embedding in an $n_o$-dimensional space: $E \in \mathbb{R}^{N\times n_o}$;
• For clustering task: The label vector of cluster indices: $y \in \mathbb{N}_+^{N\times 1}$.
Step 1: Construct the graph Laplacian L from X.
Step 2: Initiate an ELM network of $n_h$ hidden neurons with random input weights, and calculate the output matrix of the hidden neurons $H \in \mathbb{R}^{N\times n_h}$.
Step 3:
• If $n_h \le N$
Find the generalized eigenvectors $v_2, v_3, \dots, v_{n_o+1}$ of (21) corresponding to the second through the $(n_o+1)$th smallest eigenvalues. Let $\beta = [\tilde{v}_2, \tilde{v}_3, \dots, \tilde{v}_{n_o+1}]$, where $\tilde{v}_i = v_i/\|Hv_i\|$, $i = 2, \dots, n_o+1$.
• Else
Find the generalized eigenvectors $u_2, u_3, \dots, u_{n_o+1}$ of (24) corresponding to the second through the $(n_o+1)$th smallest eigenvalues. Let $\beta = H^T[\tilde{u}_2, \tilde{u}_3, \dots, \tilde{u}_{n_o+1}]$, where $\tilde{u}_i = u_i/\|HH^T u_i\|$, $i = 2, \dots, n_o+1$.
Step 4: Calculate the embedding matrix: $E = H\beta$.
Step 5 (for clustering only): Treat each row of E as a point, and cluster the N points into K clusters using the k-means algorithm. Let y be the label vector of cluster indices for all the points.
return E (for embedding task) or y (for clustering task);
TABLE I
DETAILS OF THE DATA SETS USED FOR SEMI-SUPERVISED LEARNING
Data set     Class   Dimension   |L|   |U|    |V|   |T|
G50C         2       50          50    314    50    136
COIL20(B)    2       1024        40    1000   40    360
USPST(B)     2       256         50    1409   50    498
COIL20       20      1024        40    1000   40    360
USPST        10      256         50    1409   50    498
Remark: Comparing the supervised ELM, the semi-supervised ELM and the unsupervised ELM, we can observe that all the algorithms have two similar stages in the training process: the random feature learning stage and the output weights learning stage. Under this two-stage framework, it is easy to find the differences and similarities between the three algorithms. Actually, all the algorithms share the same stage of random feature learning, and this is the essence of ELM theory. This also means that whether the task is a supervised, semi-supervised or unsupervised learning problem, we can always follow the same steps to generate the hidden layer.
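The two-stage view in the Remark can be made concrete with a NumPy/SciPy sketch. This is an illustration, not the authors' Matlab implementation. One random hidden-layer matrix H is shared, and only the output-weight solver changes:
- ELM uses ridge regression via (7)/(9).
- SS-ELM uses (17)/(18) with the class-balanced penalties (14).
- US-ELM uses the generalized eigenproblem (21) with the normalization in (23).

Several details are assumptions made here and are not taken from the paper:
- the symmetrised k-NN Gaussian graph used for L;
- the defaults k = 5 and σ = 1;
- all function names.

The US-ELM branch covers only the case $n_h \le N$, where $H^T H$ is positive definite.

```python
import numpy as np
from scipy.linalg import eigh

# ---- Stage 1 (shared by all three ELMs): random feature mapping ----
def random_sigmoid_layer(X, n_h, rng):
    """Sigmoid hidden layer with input weights and biases drawn once from U(-1, 1)."""
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_h))
    b = rng.uniform(-1.0, 1.0, size=n_h)
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def knn_laplacian(X, k=5, sigma=1.0):
    """L = D - W for a symmetrised k-NN graph with weights exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    d = sq.copy()
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(X.shape[0]), k)
    W = np.zeros_like(sq)
    W[rows, nn.ravel()] = np.exp(-sq[rows, nn.ravel()] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                       # edge if i in kNN(j) or j in kNN(i)
    return np.diag(W.sum(axis=1)) - W

# ---- Stage 2: output weights ----
def elm_weights(H, Y, C):
    """Supervised ELM: Eq. (7) if n_h <= N, else Eq. (9); solves systems instead of inverting."""
    N, n_h = H.shape
    if n_h <= N:
        return np.linalg.solve(H.T @ H + np.eye(n_h) / C, H.T @ Y)
    return H.T @ np.linalg.solve(H @ H.T + np.eye(N) / C, Y)

def class_balanced_penalties(labels, C0):
    """Eq. (14): C_i = C0 / N_{t_i}, with N_{t_i} the number of labelled patterns in x_i's class."""
    _, inv, counts = np.unique(labels, return_inverse=True, return_counts=True)
    return C0 / counts[inv]

def ss_elm_weights(H, Y_tilde, c_diag, L, lam):
    """SS-ELM: Eq. (17) if n_h <= N, else Eq. (18); c_diag is C_i on labelled rows, 0 elsewhere."""
    N, n_h = H.shape
    Cm = np.diag(c_diag)
    if n_h <= N:
        lhs = np.eye(n_h) + H.T @ Cm @ H + lam * H.T @ L @ H
        return np.linalg.solve(lhs, H.T @ Cm @ Y_tilde)
    lhs = np.eye(N) + Cm @ H @ H.T + lam * L @ H @ H.T
    return H.T @ np.linalg.solve(lhs, Cm @ Y_tilde)

def us_elm_embedding(H, L, lam, n_o):
    """US-ELM, case n_h <= N: eigenproblem (21), first eigenvector dropped as in (23)."""
    n_h = H.shape[1]
    A = np.eye(n_h) + lam * H.T @ L @ H
    B = H.T @ H                                  # must be positive definite here
    _, V = eigh(A, B)                            # ascending eigenvalues; V.T @ B @ V = I, so ||H v|| = 1
    beta = V[:, 1:n_o + 1]
    return beta, H @ beta                        # output weights and embedding E = H beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(scale=2.0, size=(60, 4))
    labels = (X[:, 0] > 0).astype(int)
    H = random_sigmoid_layer(X, n_h=8, rng=rng)  # stage 1, reused below
    L = knn_laplacian(X, k=5)
    Y_tilde = np.zeros((60, 2))
    Y_tilde[:10] = np.eye(2)[labels[:10]]        # only the first 10 rows are labelled
    c = np.zeros(60)
    c[:10] = class_balanced_penalties(labels[:10], C0=10.0)
    beta_ss = ss_elm_weights(H, Y_tilde, c, L, lam=0.1)
    beta_us, E = us_elm_embedding(H, L, lam=0.1, n_o=2)
    print(beta_ss.shape, E.shape)                # (8, 2) (60, 2)
```

For clustering, the rows of E would then be passed to any k-means implementation, as in Step 5 of Algorithm 2.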
The three types of ELMs differ in the second stage, namely in how the output weights are computed. In supervised ELM and SS-ELM, the output weights are trained by solving a regularized least squares problem, while the output weights in US-ELM are obtained by solving a generalized eigenvalue problem. The unified framework for the three types of ELMs might provide new perspectives to further develop ELM theory.
VII. EXPERIMENTAL RESULTS
We evaluated our algorithms on a wide range of semi-supervised and unsupervised tasks. Comparisons were made with related state-of-the-art algorithms:
- for semi-supervised learning: Transductive SVM (TSVM) [54], LapSVM [47] and LapRLS [47];
- for unsupervised learning: Laplacian Eigenmap (LE) [50], spectral clustering (SC) [51] and deep autoencoder (DA) [55].
All algorithms were implemented using Matlab R2012a on a 2.60 GHz machine with 4 GB of memory.
TABLE III
TRAINING TIME (IN SECONDS) COMPARISON OF TSVM, LAPRLS, LAPSVM AND SS-ELM
Data set     TSVM    LapRLS   LapSVM   SS-ELM
G50C         0.324   0.041    0.045    0.035
COIL20(B)    16.82   0.512    0.459    0.516
USPST(B)     68.44   0.921    0.947    1.029
COIL20       18.43   5.841    4.946    0.814
USPST        68.14   7.121    7.259    1.373
A. Semi-supervised learning results
1) Data sets: We tested SS-ELM on five popular semi-supervised learning benchmarks, which have been widely used for evaluating semi-supervised algorithms [52], [56], [57].
• The G50C is a binary classification data set in which each class is generated by a 50-dimensional multivariate Gaussian distribution. This classification problem is explicitly designed so that the true Bayes error is 5%.
• The Columbia Object Image Library (COIL20) is a multi-class image classification data set which consists of 1440 gray-scale images of 20 objects. Each pattern is a 32×32 gray-scale image of one object taken from a specific view. The COIL20(B) data set is a binary classification task obtained from COIL20 by grouping the first 10 objects as Class 1, and the last 10 objects as Class 2.
• The USPST data set is a subset (the testing set) of the well-known handwritten digit
recognition data set USPS. The USPST(B) data set is a binary classification task obtained from USPST by grouping the first 5 digits as Class 1 and the last 5 digits as Class 2.
2) Experimental setup: We followed the experimental setup in [57] to evaluate the semi-supervised algorithms. Specifically, each of the data sets is split into 4 folds, one of which was used for testing (denoted by T) and the remaining 3 folds for training. Each of the folds was used as the testing set once (4-fold cross-validation). As in [57], this random fold generation process was repeated 3 times, resulting in 12 different splits in total. Every training set was further partitioned into a labeled set L, a validation set V, and an unlabeled set U. When we trained a semi-supervised learning algorithm, the labeled data from L and the unlabeled data from U were used. The validation set, which consists of labeled data, was only used for model selection, i.e., finding the optimal hyperparameters $C_0$ and λ in the SS-ELM algorithm. The characteristics of the data sets used in our experiment are summarized in Table I.
The training of SS-ELM consists of two stages: 1) generating the random hidden layer; and 2) training the output weights using (17) or (18). In the first stage, we adopted the Sigmoid function for nonlinear mapping, and the input weights and biases were generated according to the uniform distribution on (-1, 1). The number of hidden neurons $n_h$ was fixed to 1000 for G50C, and to 2000 for the remaining four data sets. In the second stage, we first need to build the graph Laplacian L. We followed the methods discussed in [52] and [57] to compute L, and the hyperparameter settings can be found in [47], [52] and [57]. The trade-off parameters C and λ were selected from
Active Defense Against Face Forgery Based on Attention Masks and Feature Extraction
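1. Overview
With the spread of social media and online platforms, face forgery has gradually become a serious security threat.
This technique uses deep learning algorithms to tamper with face images for deception, fraud, or other malicious purposes.
To meet this challenge, researchers have proposed many active defense methods against face forgery based on attention masks and feature extraction.
These methods aim to protect user privacy and data security by detecting and blocking face forgery attacks.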
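1.1 Research Background
In today's information society, face forgery technology is becoming increasingly mature, posing great challenges to personal privacy protection and information security.
With the rapid development of deep learning, active defense methods against face forgery based on attention masks and feature extraction have emerged.
These methods introduce attention mechanisms and feature extraction into the face forgery pipeline, which effectively improves the ability to identify and defend against face forgery.
Traditional face forgery detection methods mainly rely on hand-crafted feature extractors and classifiers, which often show low accuracy and robustness in complex forgery scenarios.
In contrast, active defense methods based on attention masks and feature extraction can automatically extract the key information from the original image while ignoring irrelevant information, thereby improving the accuracy of identifying face forgery.
Researchers have achieved a series of important results in this field. Some studies show that attention mechanisms can effectively identify subtle changes in forged face samples. In addition, combining advanced techniques such as deep learning and convolutional neural networks (CNNs) enables efficient and accurate identification of face forgery.
Although current research has made some progress, many challenges remain.
Two open problems are how to further improve the generalization ability and robustness of the models, and how to process large-scale data effectively in practical applications.
This study explores active defense methods against face forgery based on attention masks and feature extraction, with the aim of providing new ideas and technical support for addressing these problems.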
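1.2 Research Significance
With the spread of the Internet and the development of social media, face forgery has become a new means of attack.
Such attacks forge other people's face images for deception, extortion, and similar purposes, posing a great threat to personal privacy and information security.
Research on how to effectively defend against face forgery attacks therefore has important theoretical value and practical significance.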
Application of Locality-Sensitive Hashing in Approximate Nearest Neighbor Search
局部敏感哈希算法在近似最近邻搜索中的应用随着数据量的增长和处理速度的提高,近似最近邻搜索(Approximate Nearest Neighbor Search,简称ANN Search)成为了实际应用中十分重要的问题。
ANN Search指的是在大规模数据中查找与目标数据最接近的数据点,但是为了减少计算量,通常只需要返回一个近似的结果。
近似最近邻搜索在很多领域都有广泛的应用,例如计算机视觉、自然语言处理以及人脸识别等。
但是,在处理大规模数据时,基于暴力搜索的传统算法往往需要大量的计算时间,无法满足实际需求。
因此,需要一些高效的算法来提高ANN Search的效率。
局部敏感哈希(Locality Sensitive Hashing,简称LSH)算法便是一种高效的ANN Search算法。
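The idea of LSH is to use hash functions under which similar points in the high-dimensional space collide more often than dissimilar points. Colliding points fall into the same bucket of a much smaller hash space.
Near neighbors can then be found by simple computations on the candidates that share a bucket with the query, instead of by comparing the query with every point.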
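Hash functions play a key role in LSH.
Common hash families include MinHash (min-wise hashing, suited to Jaccard similarity between sets) and random hyperplane hashing (also called cosine hashing, suited to cosine similarity between vectors).
All of these hash functions share one property, locality sensitivity: the more similar two items are, the more likely they are to receive the same hash value.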
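MinHash makes the locality-sensitive property concrete. For a random permutation of the universe, the probability that two sets have the same minimum element equals their Jaccard similarity. The fraction of agreeing signature positions therefore estimates that similarity.

The sketch below makes three choices of its own:
- It assumes string items.
- It approximates random permutations with the common universal hash family h(x) = (a*x + b) mod p.
- It encodes items with CRC32.

The function names are illustrative.

```python
import random
import zlib


def minhash_signature(items, num_hashes=200, seed=42):
    """MinHash signature of a set of strings.

    Each hash is h(x) = (a*x + b) mod p applied to a CRC32 code of the item; the
    same seed must be used for every set so that all signatures share the same hashes.
    """
    p = (1 << 61) - 1  # a Mersenne prime, larger than any CRC32 value
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_hashes)]
    codes = [zlib.crc32(s.encode("utf-8")) for s in set(items)]
    return [min((a * x + b) % p for x in codes) for a, b in coeffs]


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions on which the two sets agree."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


def jaccard(s1, s2):
    """Exact Jaccard similarity, for comparison."""
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2)
```

With 200 hashes, the estimate for two sets with exact Jaccard similarity 1/3 typically lands within a few hundredths of 1/3. Longer signatures shrink this error further.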
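LSH can build several hash tables from different hash functions, each of which returns a subset of the similar data points.
Combining the results of the multiple tables then yields a more accurate approximate nearest-neighbor result.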
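The multi-table scheme can be sketched for random hyperplane hashing. It works in three steps:
- Each table hashes a vector to the sign pattern of a few random projections.
- A query collects the union of its buckets across all tables.
- Only those candidates are re-ranked by exact cosine similarity.

The class name and the defaults (`n_bits=8`, `n_tables=6`) are illustrative assumptions.

```python
from collections import defaultdict

import numpy as np


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity with several hash tables."""

    def __init__(self, dim, n_bits=8, n_tables=6, seed=0):
        rng = np.random.default_rng(seed)
        # One set of n_bits random hyperplanes (normal vectors) per table.
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = np.empty((0, dim))

    def _keys(self, x):
        bits = (self.planes @ x) > 0               # (n_tables, n_bits) sign pattern
        return [tuple(row.tolist()) for row in bits]

    def index(self, data):
        self.data = np.asarray(data, dtype=float)
        for i, x in enumerate(self.data):
            for table, key in zip(self.tables, self._keys(x)):
                table[key].append(i)

    def query(self, q, k=1):
        q = np.asarray(q, dtype=float)
        cand = set()
        for table, key in zip(self.tables, self._keys(q)):
            cand.update(table.get(key, []))       # union of the query's buckets
        if not cand:
            return []
        cand = np.array(sorted(cand))
        vecs = self.data[cand]
        # Exact cosine similarity, computed only for the candidates.
        sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-12)
        return cand[np.argsort(-sims)[:k]].tolist()
```

More bits per table make buckets smaller and more selective. More tables raise the chance that a true neighbor shares at least one bucket with the query, at the cost of memory.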
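The advantage of LSH is that it can quickly find similar nearest neighbors in high-dimensional spaces.
As dimensionality grows, data points become sparse, and exact index structures such as space-partitioning trees degrade toward brute-force scanning. LSH instead narrows the search to the small candidate set in the query's buckets, so it stays efficient in high dimensions.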
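However, LSH also has some problems in practical applications.
First, the choice of hash function is very important, because different hash functions provide different precision.
Choosing suitable hash functions and setting their parameters is therefore an important direction for optimizing LSH.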
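A Survey of Temporal Action Localization
1. Introduction
1.1 Overview
Temporal action localization is an important research direction in computer vision and artificial intelligence.
It involves localizing and recognizing actions in videos, i.e., determining which specific actions occur in a video and the time ranges in which they occur.
This matters for tasks such as video content understanding, video retrieval, behavior recognition and video summarization.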
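Localization quality in this task is commonly judged with the temporal intersection over union (tIoU) between a predicted segment and a ground-truth segment. A prediction is typically counted as correct when its class matches and its tIoU reaches a threshold such as 0.5. Below is a minimal sketch of the temporal part of that check. The function names are illustrative, and the class check is omitted.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two segments given as (start, end), in seconds or frames."""
    (s1, e1), (s2, e2) = pred, gt
    inter = max(0.0, min(e1, e2) - max(s1, s2))   # overlap length, 0 if disjoint
    union = (e1 - s1) + (e2 - s2) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, gt, threshold=0.5):
    """Temporal part of the correctness test: tIoU must reach the threshold."""
    return temporal_iou(pred, gt) >= threshold
```

For example, a prediction of 2 s to 6 s against a ground truth of 4 s to 8 s overlaps for 2 s out of a 6 s union, giving a tIoU of 1/3. It would not count as a hit at threshold 0.5.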
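Over the past few years, with the rise of deep learning, a growing number of deep-learning-based action localization algorithms have been proposed.
These algorithms use deep neural network models to extract features from input video sequences and detect and localize actions based on these features.
However, although deep learning has achieved great success in many vision tasks, handling temporal data still poses some challenges.
First, because video is spatio-temporal data, the temporal dimension must be taken into account during encoding and analysis.
Second, on the training-data side, the scarcity of fully annotated temporal information makes model training difficult.
This survey therefore focuses on the latest methods and techniques for temporal action localization and evaluates their performance on different datasets.
We also discuss open problems and challenges in current research, and explore research directions and prospects of the field in video surveillance and other potential applications.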
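1.2 Structure of the article
This survey is organized as follows. In the introduction, we present the concept and importance of temporal action localization.
We also state the purpose and structure of this article.
In the main body, we systematically present different methods for temporal action localization.
We first briefly introduce the basic principles of the task. We then describe in detail algorithms based on deep learning, on traditional machine learning, and on combinations of the two.
After that, we focus on existing datasets and evaluation metrics, comparing and assessing them.
Next, in the discussion and analysis section, we examine the performance of the various algorithms and discuss in depth the problems and challenges facing current research.
Finally, in the section on application areas and future directions, we focus on the current state and prospects of temporal action localization in video surveillance.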
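2024 Written Test Questions and Reference Answers for SLAM Algorithm Engineer Recruitment (a Fortune Global 500 group)
(Answers are given at the end)
I. Single-choice questions (10 questions, 2 points each, 20 points in total)
1. Which of the following is NOT a basic problem of SLAM (Simultaneous Localization and Mapping) algorithms? A. Localization B. Mapping C. Navigation D. Path planning
2. In visual SLAM, which of the following is NOT a commonly used feature point detection algorithm? A. SIFT (Scale-Invariant Feature Transform) B. SURF (Speeded Up Robust Features) C. ORB (Oriented FAST and Rotated BRIEF) D. BOW (Bag-of-Words)
3. What is the main purpose of the "loop closure detection" function in a SLAM (simultaneous localization and mapping) system? A. Improve map accuracy B. Reduce computation C. Optimize path planning D. Enhance system stability
4. In visual SLAM, which of the following methods is commonly used to extract feature points? A. SIFT (Scale-Invariant Feature Transform) B. SURF (Speeded Up Robust Features) C. ORB (Oriented FAST and Rotated BRIEF) D. All of the above
5. What is the core goal of SLAM (Simultaneous Localization and Mapping) algorithms? A. Autonomous navigation of driverless vehicles in unknown environments B. Building a 3D spatial map and updating it in real time C. Robot path planning D. All of the above
6. Which of the following sensors is NOT suitable for a SLAM system? A. Lidar B. Camera C. Sonar D. Ultrasonic sensor
7. Which of the following statements about SLAM (simultaneous localization and mapping) systems is incorrect?
A. A SLAM system usually needs to localize and build a map in an unknown environment.
B. A SLAM system usually needs sensors to acquire information about the environment.
C. A SLAM system can generate a map and update position information in real time.
D. A SLAM system does not need initial localization.
8. Which of the following statements about visual SLAM (visual simultaneous localization and mapping) systems is correct? A. A visual SLAM system relies only on visual sensors for localization and mapping.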
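Hyperspectral Image Super-Resolution Network Based on a Grouped Two-Stage Bidirectional Convolutional LSTM Method
Intelligent City (INTELLIGENT CITY), Smart City Practice column, No. 04, 2024
Hyperspectral Image Super-Resolution Network Based on a Grouped Two-Stage Bidirectional Convolutional LSTM Method
LIN Jianjun^1, HOU Junyi^2, YANG Cuiyun^2 (1. Department of Information Engineering, Yantai Vocational College, Yantai, Shandong 264670; 2. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266000)
Abstract: This paper proposes a grouped two-stage Bi-ConvLSTM network (GDBN) that makes full use of the spatial and spectral information of images. A band-wise grouping strategy effectively relieves the computational burden and protects the spectral information.
At different stages of the encoder, a shallow information extraction module and a deep feature extraction module extract information at different levels. The shallow module fully captures shallow features at different scales, and the deep module captures the high-frequency features of the image.
The paper also introduces a channel attention mechanism to strengthen the network's ability to organize features. Extensive experiments on the natural dataset CAVE show results that are generally better than current mainstream deep learning methods.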
Keywords: bidirectional convolutional long short-term memory network; hyperspectral image super-resolution; channel attention; neural network; deep learning
CLC number: TP391  Document code: A  Article ID: 2096-1936(2024)04-0001-03
DOI: 10.19301/ki.zncs.2024.04.001
Hyperspectral image super-resolution network based on grouped two-stage biconvolution long-term and short-term method
LIN Jian-jun, HOU Jun-yi, YANG Cui-yun
Abstract: In this paper, a two-stage Bi-ConvLSTM network based on grouping (GDBN) is proposed, which can make full use of the spatial and spectral information of images, and effectively relieve the computational burden and protect the spectral information by using the grouping strategy based on band units. At different stages of the encoder, the shallow information extraction module and the depth feature extraction module can extract different levels of information. The shallow information extraction module can fully capture the shallow feature information of different scales, and the depth feature extraction module can capture the high-frequency feature information of the image. The paper also introduces channel attention mechanism to enhance the network's ability to organize features, and conducts a large number of experiments on natural data set cave, and the effect is generally better than the current mainstream deep learning methods.
In recent years, single-image super-resolution methods based on deep learning [1-2] have developed widely.
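The abstract does not spell out the form of GDBN's channel attention. A widely used form is the squeeze-and-excitation (SE) block. The sketch below assumes a module of that kind; the actual layer sizes and the module's placement in the network may differ. Weights are passed in explicitly so the example stays framework-free, and `channel_attention` is an illustrative name.

```python
import numpy as np


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed form).

    x:  feature map of shape (C, H, W), e.g. C spectral/feature channels
    w1: (C // r, C) weights of the reduction layer
    w2: (C, C // r) weights of the expansion layer
    Returns the reweighted feature map and the per-channel weights.
    """
    z = x.mean(axis=(1, 2))                 # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)             # excitation: FC + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # FC + sigmoid -> weights in (0, 1)
    return x * a[:, None, None], a
```

Each channel is scaled by a learned weight between 0 and 1 computed from global statistics of all channels. This lets the network emphasize informative channels and suppress less useful ones.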
Sparse Adversarial Example Generation Method for Privacy Protection
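Journal of Yanshan University, Vol. 47 No. 6, Nov. 2023. Article ID: 1007-791X(2023)06-0538-12
Sparse Adversarial Example Generation Method for Privacy Protection
WANG Tao^{1,*}, MA Chuan^2, CHEN Shuping^3, YOU Dianlong^4 (1. School of Business Administration, Hebei Normal University of Science and Technology, Qinhuangdao, Hebei 066004; 2. Engineering Training Center, Yanshan University, Qinhuangdao, Hebei 066004; 3. Library, Yanshan University, Qinhuangdao, Hebei 066004; 4. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004)
Received: 2022-12-07. Editor in charge: WEN Maosen.
Funding: National Natural Science Foundation of China (62276226); Natural Science Foundation of Hebei Province (F2021203038).
About the author: *WANG Tao (1983-), female, born in Luoyang, Henan; Ph.D., associate professor; research interests: information security and AI security. Email: yy_mma@
Abstract: In real scenarios such as video surveillance and social-network sharing, deep neural networks can mine image information excessively. To counter this, a method for generating sparse adversarial examples is proposed. It attacks deep neural networks so that they misclassify and cannot complete subsequent unauthorized tasks. Multiple objectives (the number of perturbed pixels, the perturbation magnitude and the perturbation position) are optimized, and adversarial examples are generated simply and efficiently with a sampling scheme. The method is compared with five related methods in terms of attack success rate, number of perturbed pixels, perturbation magnitude, perturbation position and optimization effect. The characteristics of the target model's classification space are analyzed from the distribution of perturbed pixels. Transfer tests and an application to object detection are used to evaluate the algorithm's generalization ability and practicality. Experimental results show that with a perturbation rate of no more than 1%, the algorithm still attacks deep neural networks effectively. It also markedly optimizes the magnitude and position of the perturbed pixels, so it damages the original image less and its perturbation is harder to perceive. The algorithm has good generalization and practicality.
Keywords: deep neural network; sparse adversarial attack; adversarial example; sampling; privacy protection
CLC number: TP391  Document code: A  DOI: 10.3969/j.issn.1007-791X.2023.06.007
0 Introduction
AI technology is becoming indispensable in computer vision. Based on deep neural networks (DNNs), smart devices analyze and process images intelligently. They can support applications such as object detection, object recognition, traffic counting, event monitoring, security surveillance and AI retail [1-2], and can even perform visual reasoning and scene understanding [3-5]. The application of AI in computer vision is reshaping how people live, work and think. On the other hand, images contain rich semantic knowledge and can reveal much implicit information beyond their own content; if used maliciously, this poses a great threat to personal privacy. For example, deep learning models can mine sensitive information such as occupation [6], health status [7] and sexual orientation [8] from personal images. Combined with big-data analysis, even a user's social network can be inferred [9]. In an era of ubiquitous surveillance and social-media sharing, protecting personal privacy has become an urgent problem.
Current research on countering DNN-based detection mainly studies obfuscation-based methods, data poisoning attacks and adversarial example attacks. Wilber et al. [10] used obfuscation techniques such as blurring, pixelation, darkening and occlusion to defeat face detection. However, obfuscation-based methods either damage the original image too much or are ineffective against DNN recognition systems. Cherepanova et al. [11] developed the LowKey system. It lets users preprocess (poison) images before posting them publicly on social media, so that third-party DNN models cannot use them for face recognition. Shan et al. [12] proposed Fawkes, which helps people poison their images before publishing them, so as to resist DNN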
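recognition models. When these images are used to train a recognition model, the user's normal images are misrecognized. This scheme presupposes that poisoned images can be injected into the dataset. If the target model has already been trained, or is trained on a clean dataset, these methods fail. Adversarial example attacks fool a DNN model into misclassification by applying slight perturbations to an image [13]. Yang et al. [14] proposed generating adversarial identity masks with image encryption techniques, hiding the original identity by overlaying the masks on face images. The advantage of this class of methods is that users only need to modify their own image data, regardless of the DNN model's structure. They are therefore better suited to protecting user privacy in real scenarios.
The technical core of DNN-based detection is deep-learning image classification. To protect personal privacy in real scenarios such as surveillance and social platforms, a small perturbation can be added to an image to attack the target model. The resulting misclassification prevents the model from carrying out a chain of subsequent unauthorized tasks. For example, people and objects exposed to video surveillance can carry perturbation stickers, and images posted to social platforms can have a few pixel values modified at upload time. To improve the attack's practicality in real scenarios and reduce the damage to the image, this paper uses a sparse adversarial attack. Such an attack perturbs only a few pixels (perturbing a pixel means modifying it) to protect personal privacy.
1 Related work
Szegedy et al. [13] first proposed adversarial attacks, finding that tiny perturbations can make a DNN misclassify. Goodfellow et al. [15] proposed FGSM, which generates perturbations from gradients and provides a simple, fast way to produce adversarial examples. Kurakin et al. [16] then extended FGSM to BIM for adversarial attacks in the physical world. Moosavi-Dezfooli et al. [17] proposed DeepFool, which solves for the minimal perturbation based on hyperplane classification. For the same attack effect it obtains smaller perturbations than FGSM, and it quantifies classifier robustness. Rozsa et al. [18] proposed FGVM, which improves the generalization of adversarial attacks so that they work against multiple deep neural networks. These methods are white-box attacks and rely heavily on the target model's gradients, structure and parameters. Moreover, they bound the perturbation magnitude with the L2 or L∞ norm to produce minimal perturbations. Such a perturbation, though invisible, usually touches every pixel of the image, which makes these methods hard to apply in real scenarios.
Papernot et al. [19] performed fixed-length perturbation under an L0 constraint and proposed JSMA. JSMA introduces saliency maps, so that modifying only a few input features makes the target model misclassify. Narodytska et al. [20] used a local-search-based algorithm to build a numerical approximation of the gradient for perturbing images. Their method modifies only a few pixels and is a black-box attack. Su et al. [21] proposed the One-Pixel attack, a black-box method based on differential evolution. It needs only the probabilities of the target model's output labels, not gradients, to generate adversarial examples, although its efficiency is low. Carlini and Wagner [22] introduced three attacks under the L0, L2 and L∞ norms. The attacks work well on both defensively distilled and undistilled networks, demonstrating their generality. Modas et al. [23] proposed SparseFool, a sparse adversarial attack that computes sparse perturbations quickly and scales effectively to high-dimensional data. However, high sparsity makes the few modified pixels change so noticeably that they are easily perceived. To improve the imperceptibility of sparse attacks, Croce et al. [24] proposed CornerSearch. It adds constraints so that pixels are perturbed only in specific regions and not along axis-aligned edges, giving perturbations that are sparse but still hard to see. More recently, Croce et al. [25] proposed Sparse-RS, a versatile random-search-based framework for sparse adversarial attacks. These studies inspired this work.
Existing sparse attack studies each leave some aspect unconstrained. Some limit the number of perturbed pixels but not their magnitude, and some limit the magnitude but not the position. Others restrict perturbations to specific regions without accounting for magnitude and number. Perturbing too many pixels is hard to carry out in real scenarios, while leaving position and magnitude unconstrained damages the image and is easily perceived. To meet all these goals at once, the most effective approach is to cast adversarial example generation as a multi-objective optimization problem. The problem jointly constrains the number, magnitude and position of perturbed pixels: the fewer, smaller and more peripheral they are, the better. The proposed sparse adversarial example generation algorithm is described next.
2 Sparse adversarial example generation algorithm
While keeping the perturbation as small as possible, this paper iterates over every class other than the correct one and makes that class's confidence as large as possible, generating adversarial examples accordingly. The formal optimization objective is given below.
2.1 Optimization objective
An image can be represented as an n-dimensional vector $X=(x_1,x_2,\dots,x_n)$, where component $x_i$ is a pixel. For a color image, $x_i$ is a 5-tuple $x_i=(m_i,n_i,R_i,G_i,B_i)$. Here $(m_i,n_i)$ are the coordinates of pixel $x_i$, and $R_i,G_i,B_i$ are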
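the three color components of pixel $x_i$. For a grayscale image, $x_i$ is a triple $x_i=(m_i,n_i,Ga_i)$, where $Ga_i$ is the gray level. The rest of the paper discusses only 5-tuple color images; grayscale (triple) images are perturbed in the same way.
Let $f$ be the classifier of the DNN model. It takes the input vector $X$ and computes the classification confidence (i.e., probability) that $X$ belongs to each class. It outputs the class with the largest confidence: for example, if class $t$ has the largest confidence, $f$ classifies $X$ as $t$. Let $f_t(X)$ denote the confidence that image $X$ belongs to class $t$.
Modifying some pixels of $X$ perturbs it and gives the perturbation vector $e(X)=(e_1,e_2,\dots,e_n)$. Each $e_i$ has the same structure as $x_i$, i.e., $e_i=(m_i,n_i,R_i,G_i,B_i)$, where $(m_i,n_i)$ are the coordinates of pixel $e_i$ and $R_i,G_i,B_i$ are the perturbations of its three color components. If $R_i,G_i,B_i$ are all 0, the pixel is unperturbed. Let $adv$ denote the class that $f$ assigns to the unperturbed image. After the perturbation is added, we want $f$ to err. This requires the confidence of classifying $X$ as its original class $adv$ to be as small as possible, and the confidence of the other classes to be as large as possible. At the same time, to remain hard to see, the perturbation $e(X)$ must be as small as possible. The optimization objective is therefore to find the optimal solution $e(X)^*$ of

$$\max\ f_t\big(X+e(X)\big),\qquad \min\ e(X)\qquad \text{s.t.}\quad t=1,2,\dots,C,\ t\neq adv, \tag{1}$$

where $C$ is the total number of classes and $adv$ is the class assigned by $f$ to the unperturbed image $X$. The proposed algorithm is a sparse adversarial attack: it modifies only a few pixels, under the $L_0$ constraint $\|e(X)\|_0\le k$. Here $k$ is the number of perturbed pixels, and $e(X)$ is a sparse vector in which only perturbed pixels have nonzero RGB perturbations. For convenience, $e(X)$ is called the perturbation vector and $X+e(X)$ the perturbed image vector, i.e., the image $X$ after perturbation. $X+e(X)$ is the $n$-dimensional vector obtained by adding the color components at corresponding positions of $X$ and $e(X)$, abbreviated as $E$.
2.2 Construction of the perturbation vector
After a sparse perturbation is added, both the magnitude and the position of the perturbed pixels must be restricted to minimize the chance of being noticed.
2.2.1 Restricting the perturbation magnitude
Restricting the magnitude means restricting the color difference between a perturbed pixel and the pixels around it. A large color difference makes the perturbation more visible, but too small a difference lowers the attack success rate. Pixel $x_i$ and its two collinear neighbors therefore form a group for computing a standard deviation. The three pixels on each of the four lines through $x_i$ (horizontal, vertical and the two diagonals) give four results, denoted $\sigma_i^1,\sigma_i^2,\sigma_i^3,\sigma_i^4$, as shown in Fig. 1(a). The smallest is taken: $\sigma_i=\min(\sigma_i^1,\sigma_i^2,\sigma_i^3,\sigma_i^4)$. The color standard deviation reflects how dispersed the colors are. Adding $\sigma_i$ to $x_i$ and subtracting it gives two perturbation colors, $x_i\pm\sigma_i$, as shown in Fig. 1(b) and (c). The perturbation effect of each is evaluated later, and the better one is kept as the pixel's perturbation color. When $x_i$ lies on the image border, missing neighbors are filled with the border pixels. For convenience, the vector of the color standard deviations of all pixels is denoted $\sigma=(\sigma_1,\sigma_2,\dots,\sigma_n)$.
Fig. 1 Generation of perturbation color
First consider the one-pixel attack, i.e., $k=1$. Then $e(X)$ has only one nonzero pixel: $e(X)=(0,0,\dots,\sigma_i,\dots,0)$. A single pixel $x_i$ can be perturbed in two ways, $x_i\pm\sigma_i$, which gives two one-pixel perturbed image vectors

$$E_i^{+}=X+(0,0,\dots,\sigma_i,\dots,0),\qquad E_i^{-}=X-(0,0,\dots,\sigma_i,\dots,0). \tag{2}$$

All one-pixel perturbed image vectors form the vector space $\mathcal{E}^1=(E_1^{+},E_1^{-},\dots,E_n^{+},E_n^{-})$, as shown in Fig. 2. Perturbing one pixel produces 2 perturbed images, so an original image of $n$ pixels yields $2n$ perturbed images. A pixel cannot receive both perturbations, so the one with the better effect is selected for the subsequent multi-pixel attack.
Fig. 2 Generation of one-pixel perturbation image vector space $\mathcal{E}^1$
The classifier $f$ classifies the perturbed images in $\mathcal{E}^1$, computing the confidence of each class and the classification result. If some result is not the original class $adv$, the one-pixel attack has succeeded. One-pixel attacks rarely succeed, but their confidences can be used for the multi-pixel attack. For each class $t$ ($t\in\{1,2,\dots,C\}$, $t\neq adv$), the confidences $f_t(E_i^{+})$ and $f_t(E_i^{-})$ are computed. The perturbed image with the larger confidence is kept for the multi-pixel attack. The selected image ($E_i^{+}$ or $E_i^{-}$) is written $E_i^{\pm}$, i.e.,

$$f_t(E_i^{\pm})=\max\big(f_t(E_i^{+}),f_t(E_i^{-})\big). \tag{3}$$

By Eq. (3), $E_i^{\pm}$ is the perturbed image (perturbing pixel $x_i$) that gives $f$ the highest confidence for class $t$. For each class, the images are sorted by confidence in descending order. Let $fs_i^t$ be the rank of $E_i^{\pm}$ by confidence for class $t$; if $fs_i^t<fs_j^t$, then $f_t(E_i^{\pm})>f_t(E_j^{\pm})$. This gives the ranking matrix $F_s$, sorted by descending confidence:

$$F_s=\begin{pmatrix} fs_1^1 & \cdots & fs_1^t & \cdots & fs_1^C\\ \vdots & & \vdots & & \vdots\\ fs_i^1 & \cdots & fs_i^t & \cdots & fs_i^C\\ \vdots & & \vdots & & \vdots\\ fs_n^1 & \cdots & fs_n^t & \cdots & fs_n^C \end{pmatrix}.$$

For a class $t$ ($t=1,2,\dots,C$, $t\neq adv$), the images with the top $N$ ranks in column $t$ of $F_s$ are taken. They are the $N$ one-pixel perturbed images most likely to push the classification to $t$.
2.2.2 Restricting the perturbation position
The Euclidean distance from a perturbed pixel to the image center measures how peripheral it is: $d_t(E_i^{\pm})=\|(m_i,n_i)-(m_i',n_i')\|_2$. Here $(m_i,n_i)$ are the coordinates of pixel $x_i$ in $E_i^{\pm}$ and $(m_i',n_i')$ are the coordinates of the image center. The goal is to find perturbed pixels as far from the center as possible while keeping the attack effective; the more peripheral, the better. Hence, for each class $t$ ($t\neq adv$), the perturbed images with the top $N$ ranks in column $t$ of $F_s$ are re-sorted by $d_t(E_i^{\pm})$ in descending order. The new rank is $ds_i^t$: if $ds_i^t<ds_j^t$, then $d_t(E_i^{\pm})>d_t(E_j^{\pm})$. Every class $t$ ($t\neq adv$) thus gets a ranking vector $D_s^t=(ds_1^t,ds_2^t,\dots,ds_N^t)$. It is obtained by first sorting by class-$t$ confidence and then by distance from the center, both in descending order. The ranking vectors $D_s^t$ of all classes form a new ranking matrix $D_s$. The result is $N$ one-pixel perturbed images sorted by $d_t(E_i^{\pm})$. Their perturbed pixels lie at different positions, all farther from the center than the unselected pixels. These pixels are then used to build multi-pixel perturbed images.
2.2.3 Generating adversarial examples with a sampling scheme
To generate a targeted example for class $t$ ($t\neq adv$), $k$ pixels are drawn at random and perturbed, i.e., $\|e(X)\|_0=k$ with $k\ge 2$. To ensure both success rate and imperceptibility, pixels ranked earlier in $D_s^t$ should be drawn with higher probability. The sampling rate of top-ranked samples is tuned upward as far as possible without noticeably reducing the success rate. This gives the sampling formula

$$P(Z=i)=\frac{3(N-i+1)^2}{N^3},\qquad i=1,2,\dots,N. \tag{4}$$

According to Eq. (4), $(s_1,s_2,\dots,s_k)$ is sampled at random from $\{1,2,\dots,N\}$ to construct the $k$ perturbed pixels of $X$. For $i=1$ the probability is $3/N$, and for $i=N$ it is $3/N^3$. These weights sum to $(N+1)(2N+1)/(2N^2)$, slightly more than 1 (about 1.015 for $N=100$), so they must be normalized before sampling. The distribution favors drawing perturbed pixels that are more peripheral and whose color change has a larger effect on the decision boundary. The $k$-pixel perturbed image that targets class $t$ ($t\neq adv$) is written $E_t^{\pm\{s_1,s_2,\dots,s_k\}}$. It combines the effects of the $k$ one-pixel perturbations, as shown in Fig. 3. Sampling $N_{iter}$ times at random gives the vector space of $N_{iter}$ $k$-pixel perturbed image vectors targeting class $t$ ($t\neq adv$),

$$\mathcal{E}_t^k=\Big(E_t^{\pm\{s_1^1,s_2^1,\dots,s_k^1\}},E_t^{\pm\{s_1^2,s_2^2,\dots,s_k^2\}},\dots,E_t^{\pm\{s_1^{N_{iter}},s_2^{N_{iter}},\dots,s_k^{N_{iter}}\}}\Big),$$

where $\{s_1^i,s_2^i,\dots,s_k^i\}_t$ are the indices of the $k$ pixels drawn in the $i$-th sampling. The classifier $f$ classifies the $k$-pixel perturbed images in $\mathcal{E}_t^k$, computing the confidence of each class and the result. If some perturbed image is classified as $t$, i.e., no longer as its original class $adv$, the attack has succeeded. All classes other than $adv$ are traversed, and $k$ is adjusted, to search the perturbed images and generate multi-pixel adversarial examples.
Fig. 3 Generation of k-pixel perturbation image
Compared with the iterative scheme [26], the main advantage of the sampling scheme is that all these images can be fed to the classifier $f$ in parallel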
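for classification, which is much faster than the sequential processing of the iterative scheme. Moreover, unlike the iterative scheme, the sampling scheme does not depend on previous steps and therefore does not get stuck in suboptimal solutions. In addition, compared with complex algorithms such as evolutionary algorithms, the sampling scheme is simpler and performs better. The method does not rely on the target DNN's structure, gradients or parameters; it is a black-box adversarial attack.
2.3 Perturbation algorithm
The concrete steps and the algorithm for generating adversarial examples are as follows.
1) Generate the pixel color standard deviation vector $\sigma$. For each pixel of $X$, compute the standard deviation between it and its surrounding pixels to obtain $\sigma$. $\sigma$ limits the color difference of the perturbed pixels so that the perturbation is hard to notice.
2) Generate the one-pixel perturbed image vector space $\mathcal{E}^1$ and search it for adversarial examples (one-pixel attack). Perturbing a pixel $x_i$ of $X$ ($x_i\pm\sigma_i$) gives two one-pixel perturbed images, $E_i^{+}$ and $E_i^{-}$, so $\mathcal{E}^1$ contains $2n$ one-pixel perturbed images. Classify them with $f$. If a result is not the original class $adv$, an adversarial example has been found and the algorithm ends. Otherwise, record the confidences of $n$ perturbed images: for each class $t$ ($t\neq adv$), keep only whichever of $E_i^{+}$ and $E_i^{-}$ has the higher confidence.
3) Rank according to the optimization objective. By Eq. (1), the confidence with which $f$ classifies a perturbed image as another class $t$ ($t\neq adv$) should be as large as possible. For each class, sort the confidences obtained in step 2) in descending order to obtain the ranking matrix $F_s$. Then traverse classes 1 to $C$ ($t\neq adv$) and compute the ranking vector $D_s^t$ for each class $t$, sorting by distance from the center in descending order.
4) Generate the multi-pixel perturbed image vector space $\mathcal{E}_t^k$ and search it for adversarial examples (multi-pixel attack). The adversarial example should have as few perturbed pixels as possible, so $k$ starts at 2. Traverse classes 1 to $C$ ($t\neq adv$), sample $D_s^t$ $N_{iter}$ times according to Eq. (4) to obtain $\mathcal{E}_t^k$, and search it for adversarial examples. If none is found, increase $k$ until an adversarial example is generated or $k$ reaches the maximum number of perturbed pixels, $k_{max}$.
Following these steps generates an adversarial example that counters the DNN; the procedure is given as Algorithm 1.
Algorithm 1 Adversarial image perturbation algorithm
Input: Image X = (x_1, x_2, ..., x_n), where x_i = (m_i, n_i, R_i, G_i, B_i); MaxNPixels: k_max; Iterations: N_iter; N
Output: Adversarial example E
1: σ ← GenSigma(X)
2: E^1 ← GenOnePixelPerturb(X, σ)
3: if ∃ e ∈ E^1 such that f(e) is not classified as adv then
4:     E ← e; return E
5: end if
6: get ordering matrix F_s
7: k ← 2
8: while k ≤ k_max do
9:     for t = 1 to C do
10:        if t ≠ adv then
11:            get ordering vector D_s^t from the top N entries of column t of F_s
12:            E_t^k ← GenNPixelsPerturb(X, t, k, N_iter)
13:            if ∃ e ∈ E_t^k such that f(e) is not classified as adv then
14:                E ← e; return E
15:            end if
16:        end if
17:    end for
18:    k ← k + 1
19: end while
In Algorithm 1, GenSigma(X) generates the color standard deviation vector σ. GenOnePixelPerturb(X, σ) modifies X according to σ to generate the one-pixel perturbed image vector space E^1. GenNPixelsPerturb(X, t, k, N_iter) generates the k-pixel perturbed image vector space E_t^k for each target class t (t ≠ adv).
In summary, a one-pixel attack is performed first, to find which pixels are more likely to succeed and to rank them by the optimization objective. Then k (2 ≤ k ≤ k_max) single pixels are combined by sampling into a multi-pixel attack that generates the adversarial example.
3 Experiments and evaluation
The experiments use the classic ResNet network with the handwritten digit dataset MNIST and the generic object dataset CIFAR-10. The maximum number of perturbed pixels k_max is set to 30, the number of random samplings N_iter to 1000, and N to 100. Five sparse adversarial attack methods most related to this work are compared: One-Pixel [21], CW [22], SparseFool [23], JSMA [19] and CornerSearch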
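[24]. The comparison covers:
- attack success rate;
- the mean and median perturbation magnitude (color difference);
- the mean and median perturbation position;
- the visual quality of the adversarial examples.

The classification-space characteristics of the target DNN model are then analyzed from the results.
3.1 Comparison of success rate and number of perturbed pixels
Each method was run 100 times; the statistics are shown in Table 1. For CornerSearch, the results of the σ-map mode, which is most related to this work, are used. Table 1 shows that the proposed method's success rate is slightly below SparseFool and JSMA on MNIST, and comparable to or slightly above the other methods on CIFAR-10. The gap on MNIST arises because its images are only 28×28, and so few pixels keep the algorithm's optimization from taking full effect. By mean and median, the proposed method perturbs fewer pixels than CW and JSMA, about as many as SparseFool, and slightly more than One-Pixel and CornerSearch.

Table 1 Comparison of success rate and number of perturbation pixels
Dataset | Metric | One-Pixel | CW | SparseFool | JSMA | CornerSearch | Proposed
(both) | Black-box | Yes | No | No | No | Yes | Yes
MNIST | Success rate | 92.5% | 86.2% | 99.0% | 99.1% | 96.2% | 97.6%
MNIST | Mean perturbed pixels | 8.57 | 46.12 | 19.53 | 85.61 | 9.27 | 8.54
MNIST | Median perturbed pixels | 8 | 45 | 12 | 47 | 7 | 8
CIFAR-10 | Success rate | 100% | 100% | 100% | 100% | 98.1% | 100.0%
CIFAR-10 | Mean perturbed pixels | 3.29 | 18.52 | 15.37 | 56.24 | 7.64 | 11.75
CIFAR-10 | Median perturbed pixels | 3 | 16 | 12 | 52 | 8 | 12

3.2 Comparison of adversarial example quality
Without loss of generality, one image (ID: 70) was chosen at random and perturbed by each method. Each method was run 100 times to obtain 100 perturbed images, 6 of which were selected at random and are shown in Fig. 4. Fig. 4 shows that JSMA and CW perturb many pixels, JSMA as many as 50. To guarantee a successful attack, One-Pixel, CW, SparseFool and JSMA do not restrict the perturbation magnitude. Their perturbed pixels therefore differ greatly in color from the surrounding pixels, are visible to the naked eye, and damage the image considerably. The σ-map mode of CornerSearch and the proposed method both restrict the number and the magnitude of modifications. Their perturbed pixels differ little in color from their surroundings, so the visual quality is good and the changes are hard to perceive. The proposed method perturbs 12 pixels on average, slightly more than CornerSearch. Out of the 1024 pixels in each image, this is a perturbation rate of roughly 1%. However, because the method also restricts the perturbation position, its perturbed pixels lie closer to the image border, affect the image less visually, and are harder to perceive. Compared with One-Pixel, CornerSearch and SparseFool, the method markedly improves the position and magnitude of the perturbed pixels. This makes them harder to perceive and better serves the goal of protecting image information with an unnoticeable perturbation.
Fig. 4 The effect comparison of adversarial examples
3.3 Analysis of the classification-space characteristics of the DNN model
Below, the distributions of the positions and RGB color components of the pixels perturbed by each method are given, and the classification-space characteristics of the target DNN model are analyzed.
The positions of the perturbed pixels are shown in Fig. 5, marked by semi-transparent circles. Darker circles result from overlap and indicate positions perturbed several times in the 100 runs. Perturbation at these positions succeeds more easily, which means these positions lie near the boundary of the target DNN's classification space. Statistics of the Euclidean distance from the perturbed pixels to the image center are given in Table 2. One-Pixel and CornerSearch concentrate their perturbations near the image center. This shows that when very few pixels are perturbed, pixels near the center succeed more easily. CW, SparseFool and JSMA spread their perturbed pixels fairly evenly over the image, most obviously JSMA. This shows that as the number of perturbed pixels grows, success depends less and less on position, and attacks succeed even in border regions. It also shows that, with a sensible choice of pixels, a balance can be found between perturbation position and number: a small number of fairly peripheral pixels suffices for an adversarial attack. The proposed method seeks this balance by jointly considering position, number and magnitude. It keeps the number and magnitude of the perturbed pixels small while seeking border positions where possible. The figure and table show that its position optimization is effective. The σ-map mode of CornerSearch and the proposed method both optimize the perturbation magnitude, which concentrates their perturbation positions (more positions overlap in the figure). However, the proposed method's mean distance from the center is 10.4, versus 8.5 for CornerSearch, indicating that its perturbations occur at more peripheral positions.
Fig. 5 The location distribution and comparison

Table 2 Comparison of the position and color of perturbation pixels
Metric | One-Pixel | CW | SparseFool | JSMA | CornerSearch | Proposed
Position: mean distance to center | 8.8 | 11.0 | 10.8 | 11.4 | 8.5 | 10.4
Position: median distance to center | 8.1 | 11.4 | 11.2 | 11.9 | 8.2 | 10.4
Color: mean perturbation | 194.5 | 182.6 | 181.0 | 174.4 | 124.2 | 92.7
Color: median perturbation | 185.4 | 165.0 | 165.3 | 158.3 | 127.8 | 72.0

The RGB color distribution is shown in Fig. 6. Semi-transparent asterisks mark the original RGB values of the perturbed pixels, and semi-transparent circles mark their values after perturbation. As before, darker marks result from overlap and indicate colors to which pixels were perturbed several times in the 100 runs. Perturbing to these colors succeeds more easily, which means these color values lie near the boundary of the target DNN's classification space.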
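The perturbation-building steps of Section 2.2 can be sketched in NumPy. This is an illustrative reconstruction, not the authors' implementation, and it rests on four assumptions:
- The color standard deviation is computed independently per channel; the text does not say how the three channels are combined.
- Candidates are drawn without replacement, so no pixel is perturbed twice.
- The weights of Eq. (4) are renormalized to sum to 1.
- Perturbed values are clipped to [0, 255].

The classifier queries that produce the ranked candidate list are left out.

```python
import numpy as np

# Offsets of the two neighbours on each line through a pixel:
# horizontal, vertical, main diagonal, anti-diagonal.
LINES = [((0, -1), (0, 1)), ((-1, 0), (1, 0)), ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]


def sigma_map(img):
    """sigma_i per pixel and channel: min over the 4 lines of the std of 3 collinear pixels.

    img: (H, W, 3) float array. Missing border neighbours are filled by edge replication.
    """
    H, W = img.shape[:2]
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    center = p[1:H + 1, 1:W + 1]
    stds = []
    for (dy1, dx1), (dy2, dx2) in LINES:
        a = p[1 + dy1:H + 1 + dy1, 1 + dx1:W + 1 + dx1]
        b = p[1 + dy2:H + 1 + dy2, 1 + dx2:W + 1 + dx2]
        stds.append(np.stack([a, center, b]).std(axis=0))
    return np.min(np.stack(stds), axis=0)


def rank_weights(N):
    """Eq. (4) weights 3(N-i+1)^2 / N^3 for ranks i = 1..N, renormalised to sum to 1."""
    i = np.arange(1, N + 1)
    w = 3.0 * (N - i + 1) ** 2 / N ** 3
    return w / w.sum()


def sample_and_perturb(img, sigma, ranked, signs, k, rng):
    """Draw k distinct candidates from the ranked list and apply x +/- sigma to each.

    ranked: (row, col) pixel positions ordered as in D_s^t, best first.
    signs:  matching +1/-1 choices (the better of E_i^+ and E_i^- from Eq. (3)).
    """
    idx = rng.choice(len(ranked), size=k, replace=False, p=rank_weights(len(ranked)))
    out = img.astype(float).copy()
    for j in idx:
        r, c = ranked[j]
        out[r, c] = np.clip(img[r, c] + signs[j] * sigma[r, c], 0.0, 255.0)
    return out, sorted(idx.tolist())
```

In a full attack, `ranked` would hold the top-N pixels of $D_s^t$, and each sampled image would be sent to the classifier $f$. Because the draws are independent, all $N_{iter}$ images can be built first and classified as one batch, which is the parallelism advantage claimed for the sampling scheme.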
Dynamic Graph Convolution Point Cloud Segmentation Method for Surface Damage with Feature Updating
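Journal of Jilin University (Information Science Edition), Vol. 41 No. 4, July 2023. Article ID: 1671-5896(2023)04-0621-10
Dynamic Graph Convolution Point Cloud Segmentation Method for Surface Damage with Feature Updating
ZHANG Wenrui, WANG Congqing (School of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
Received: 2022-09-21. Funding: National Natural Science Foundation of China (61573185).
About the authors: ZHANG Wenrui (1998-), male, from Yangzhou, Jiangsu; master's student at Nanjing University of Aeronautics and Astronautics; research on point cloud segmentation; (Tel) 86-188****8397 (E-mail) 839357306@. WANG Congqing (1960-), male, from Nanjing; professor and doctoral supervisor at Nanjing University of Aeronautics and Astronautics; research on pattern recognition and intelligent systems; (Tel) 86-130****6390 (E-mail) cqwang@.
Abstract: Point cloud data of surface damage on metal parts demand strong local-feature analysis from a segmentation network. Traditional algorithms with weak local-feature analysis cannot achieve ideal segmentation on some datasets. To address this, features such as the relative damage volume are used to classify metal surface damage into six categories, and a 3D graph attention feature extraction method containing spatial-scale region information is proposed. The resulting spatial-scale region features are used to design a feature update network module. On the basis of this module, a feature-updating dynamic graph convolutional network (Feature Adaptive Shifting-Dynamic Graph Convolutional Neural Networks) is built for point cloud semantic segmentation. Experimental results show that the method segments point clouds more effectively and extracts their local features. On metal surface damage segmentation, its accuracy is better than that of PointNet++, DGCNN (Dynamic Graph Convolutional Neural Networks) and other methods, improving the accuracy and effectiveness of the segmentation results.
Keywords: point cloud segmentation; dynamic graph convolution; feature update; damage classification
CLC number: TP391.41  Document code: A
Cloud Segmentation Method of Surface Damage Point Based on Feature Adaptive Shifting-DGCNN
ZHANG Wenrui, WANG Congqing (School of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
Abstract: The cloud data of metal part surface damage point requires high local feature analysis ability of the segmentation network, and the traditional algorithm with weak local feature analysis ability can not achieve the ideal segmentation effect for the data set. The relative damage volume and other features are selected to classify the metal surface damage, and the damage is divided into six categories. This paper proposes a method to extract the attention feature of 3D map containing spatial scale area information. The obtained spatial scale area feature is used in the design of feature update network module. Based on the feature update module, a feature updated dynamic graph convolution network is constructed for point cloud semantic segmentation. The experimental results show that the proposed method is helpful for more effective point cloud segmentation to extract the local features of point cloud. In metal surface damage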
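segmentation, the accuracy of this method is better than PointNet++, DGCNN (Dynamic Graph Convolutional Neural Networks) and other methods, which improves the accuracy and effectiveness of segmentation results.
Key words: point cloud segmentation; dynamic graph convolution; feature adaptive shifting; damage classification
0 Introduction
Deep-learning-based image segmentation has become nearly mature in face recognition, license-plate recognition and satellite image analysis. To obtain more complete 3D information about objects, semantic segmentation must be further improved with 3D point cloud data. 3D point clouds are sparse and unordered, and their distinctive geometric feature distribution and 3D attributes make point cloud semantic segmentation difficult in many fields. Examples include target detection, tracking and reconstruction with 3D point clouds in robotics and computer vision. In architecture, point clouds are used to extract and recognize the 3D geometry of buildings and land. In autonomous driving, they support the acquisition, detection and segmentation of traffic objects, roads and maps.
In 2017, Lawin et al. [1] projected point clouds onto multiple views, segmented the views and mapped the results back, analyzing the projected segmentation on the original point cloud. The earliest voxel-based deep-learning network appeared in 2015: VoxNet (Voxel Partition Network) by Maturana et al. [2]. Built on a volumetric representation of 3D point clouds, it learns the distribution of points from 3D voxel shapes. Combining this with the gridded point cloud representation proposed by Le et al. [3] produced new deep networks such as PointGrid, efficient hybrids of points and grids. However, voxelized point clouds perform poorly on files with very many points. Converting irregular point clouds into regular intermediate forms such as projections and voxels also loses much spatial information. To fully exploit the characteristics of point cloud data, base network models that take raw point clouds directly as input were gradually proposed. In 2017, Qi et al. [4] exploited the properties of point cloud files to develop PointNet, which learns features directly from raw point clouds. Qi et al. [5] then proposed PointNet++, which improves PointNet's representation of the relationships between points. Hu et al. [6] proposed SENet (Squeeze-and-Excitation Networks), which recalibrates channel responses and brought channel attention to 3D point cloud deep learning. In 2018, Li et al. [7] proposed PointCNN, with an X-Conv module that couples longer-range information without significantly increasing the number of parameters. A graph convolutional network (GCN) [8] is a deep neural network that passes information between graph nodes to capture the relationships within a graph. A graph can be viewed as a set of vertices and edges, but making every point a vertex would require an enormous amount of computation. An edge convolution layer (EdgeConv) built on a K-nearest-neighbor computation [9] is therefore used. It takes a center point and its neighbors as edges and extracts edge features. As a new framework for point cloud deep learning, graph convolutional networks make up for some shortcomings of networks such as PointNet [10].
Irregular surface damage is a kind of feature-missing point cloud. To detect such damage on fan blades, buildings and vehicles, researchers have combined various 2D image acquisition methods with convolutional neural networks [11]. The main damage types covered are cracks, paint peeling and the like. However, 2D image segmentation covers too few damage types and can be affected by surface contamination, lighting and other factors. Dents and bulges may be overlooked, or uneven lighting may be mistaken for paint peeling.
This paper proposes a feature-updating dynamic graph convolutional network for 3D point cloud segmentation and designs a new feature update module. Using the distinctive spatial structure of 3D point clouds, it distinguishes neighbors that have similar weights in the traditional K-neighborhood by their spatial scale. The approach is applied to segmenting damage on metal part surfaces, where useful and useless information are mixed. Neighbors are partitioned by spatial scale, the attention weights are grouped, and features are updated within each group. While effectively identifying errors caused by interfering features from outer neighbors, the method enlarges the feature extraction area to make local region features more useful.
1 Deep convolutional network computation method
1.1 3D graph attention feature extraction containing spatial-scale region information
Iterative farthest point sampling divides the whole point cloud into $n$ point sets $\{M_1,M_2,M_3,\dots,M_n\}$, each containing $k$ points $\{P_1,P_2,P_3,\dots,P_k\}$. The local region is divided into spatial regions according to the spatial-scale relationships within each point set. Within each region, local features and spatial-scale features are combined to obtain more discriminative features. Following the attention mechanism, the points in the K-neighborhood are assigned different weights; the feature information includes the distribution of points within a spatial region and the region's characteristics. These features are weighted to obtain the convolution result of the point set.
This extraction method requires suitable values of the K-neighborhood parameter $K$ and the number of spatial layers $R$. If $K$ is too small, segmentation is weak because local features are not fully used, which hurts accuracy; if $K$ is too large, computation time and data volume increase. Fig. 1 shows segmentation results for defect damage under different values of $K$. With $K=30$ or $50$ the results are good, and $K=30$ costs less computation, so $K=30$ is used in the experiments.
Before determining $R$, the problem that spatial layering addresses is briefly analyzed. 3D point clouds are sparse and unordered, and damage point clouds are also noisy and have many edge and corner points. Together these cause a common defect in point cloud processing: outliers are selected as sampling points within a neighborhood. A damaged surface is mostly a single surface, so the segmented damage points should lie on that surface. Noise points, however, are distributed on both sides of it, and some even lie inside the damage. This 3D distribution of noise causes outliers to be selected as sampling points in the neighborhood. Sampling experiments with the DGCNN segmentation network show which outliers matter most. Outliers near the tangent plane and inside the damage affect segmentation most and are the most likely to be mis-segmented as feature points, so such noise should be handled first in subsequent preprocessing.
Fig. 1 Segmentation results of defect damage under different parameters K
Based on these results, $R$ is chosen with $K=30$. Segmentation results of defect damage under different values of $R$ are shown in Fig. 2. The result in Fig. 2(b) is closer to the test-set label segmentation, better reflects the damage features, and suppresses most of the noise. $R=4$ is therefore used in the experiments.
Fig. 2 Segmentation results of defect damage under different parameters R
Within a K-neighborhood, the spatial relationship and feature difference between a neighbor and the center point best express the neighbor's weight. The spatial feature coefficient expresses how important a neighbor is to the point set of its center. To better distinguish the weights of neighbors in the graph, the neighborhood is subdivided, and spatial scale is a suitable criterion. The K-neighborhood of a center point can be viewed as a local space divided into $r$ scale regions. The spatial attention mechanism then assigns weight coefficients to these $r$ regions. Multi-level partitioning by spatial scale loses none of the core neighborhood features and effectively suppresses meaningless, interfering features. This improves the network's ability to learn local spatial features of point clouds and reduces the mutual influence between adjacent neighborhoods. The spatial attention mechanism is shown in Fig. 3 and is computed as follows.
Step 1: compute the feature coefficient $e_{mk}$, the weight of the $k$-th neighbor of each center point $m$ with respect to that center. $\Delta p_{mk}$ and $\Delta f_{mk}$ denote the 3D spatial relationship and the local feature difference, with $\Delta p_{mk}=p_{mk}-p_m$ and $\Delta f_{mk}=M(f_{mk})-M(f_m)$. $M$ denotes an MLP (multi-layer perceptron) operation and $C$ the concat function. The two terms are concatenated and fed into the MLP to obtain the feature coefficient

$$e_{mk}=M\big[C(\Delta p_{mk}\,\|\,\Delta f_{mk})\big]. \tag{1}$$

Fig. 3 Schematic diagram of attention feature extraction method for spatial scale regional information
Step 2: compute the graph weight coefficient $a_{mk}$, the share of the total weight carried by the $k$-th neighbor of center point $m$. Here $k\in\{1,2,3,\dots,K\}$, and $K$ is the number of points in each neighborhood. The feature coefficients $e_{mk}$ are normalized with the normalized exponential function $S$ (Softmax):

$$a_{mk}=S(e_{mk})=\frac{\exp(e_{mk})}{\sum_{g=1}^{K}\exp(e_{mg})}. \tag{2}$$

Step 3: the spatial-scale region feature $s_{mr}$ is the feature of the $r$-th spatial-scale region of center point $m$. Here $k_r\in\{1,2,3,\dots,K_r\}$, and $K_r$ is the number of neighbors in region $r$. A feature bias term $b_r$ is added to keep the weighted features from accumulating a one-sided error in the dynamic graph:

$$s_{mr}=\sum_{k_r=1}^{K_r}\big[a_{mk_r}M(f_{mk_r})\big]+b_r. \tag{3}$$

Computing over the $r$ spatial-scale regions gives the features of point $m$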
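over all spatial-scale regions of the local region, $s_m=\{s_{m1},s_{m2},s_{m3},\dots,s_{mR}\}$, where $r\in\{1,2,3,\dots,R\}$.
1.2 Dynamic graph convolutional network based on feature updating
The dynamic graph convolutional network is a deep learning network that processes raw 3D point cloud input directly. It replaces PointNet's composite feature transform module (Feature Transform) with an edge convolution layer built from a K-nearest-neighbor computation and a multi-layer perceptron [12]. The edge convolution layer is powerful: it extracts not only global features but also local features formed by the spatial relationships between the center point and its neighbors. In a dynamic graph convolutional network, each neighborhood is treated as a point set, and strengthening feature learning for its center point strengthens the network as a whole [13]. Within a neighborhood point set, the edge points that contribute the fewest useful local features to the center point can be regarded as abnormal noise or low-weight points. They may cause the segmentation to overflow at region boundaries. Compared with 2D images, point clouds are sparser carriers of information and contain more noise. Removing the noise points in a region outright, or simply keeping them, degrades feature extraction. This paper instead assigns them low weights and updates the features within the region, which improves robustness to noise and avoids losing point cloud information.
Suppose that in a spatial-scale region $T$, $s$ points $x$ are assigned to the low-weight group. Their spatial information set is $P\in\mathbb{R}^{N_s\times 3}$ and their local feature set is $F\in\mathbb{R}^{N_s\times D_f}$ [14], where $D_f$ is the feature dimension and $N_s$ denotes the set of $s$ in-region points. Let $p_i$ and $f_i$ be the spatial and feature information of point $x_i$. Within the point set, a small-range $N$-neighborhood search is performed for $x_i$. Its neighbors are then $\{x_{i,1},x_{i,2},\dots,x_{i,N}\}\in N(x_i)$, with features $\{f_{i,1},f_{i,2},\dots,f_{i,N}\}\in F$.
After the region is partitioned by spatial scale, features are updated within the regions whose spatial-scale region feature $s_{mt}$ is low. An aggregation function rewrites the local features of the lowest-weight neighbors in the graph. The inputs are the center point $m$, the feature $f_{mx_i}$ of point $x_i$, and the spatial-scale region feature $s_{mt}$. The goal is $f'_{mx_i}$, the new feature of the low-weight neighbor $x_i$ of center point $m$ after the neighborhood feature update. For a point $x_i$ in region $T$ and any $x_{i,j}\in H(x_i)$, the feature similarity field between $x_i$ and its neighbors in $H$ is

$$R(x_i,x_{i,j})=S\!\left[\frac{C(f_i)^{\mathrm T}C(f_{i,j})}{D_o}\right], \tag{4}$$

where $C$ is a 1D convolution from the input to the output dimension, $D_o$ is the output dimension and $\mathrm{T}$ denotes transposition. Aggregating $R(x_i,x_{i,j})$ and transforming $f_{mx_i}$ to the output dimension gives the updated feature of $x_i$:

$$f'_{mx_i}=\sum_{j}\Big[R(x_i,x_{i,j})\,S\big(s_{mt}f_{mx_i}\big)\Big]. \tag{5}$$

Fig. 4 illustrates this feature update computation, and Fig. 5 shows the feature-updating dynamic graph convolutional network.
Fig. 4 Schematic diagram of feature update network module
Fig. 5 Flow chart of dynamic graph convolution network with feature update
The dynamic graph convolutional network (DGCNN) uses its own edge convolution module to perform edge convolution layer by layer [15]. The output of each layer dynamically produces a new feature space and new local regions, and each layer learns features from the previous one (see Fig. 5). In each layer's edge convolution module, this paper adds the spatial-scale region attention feature after edge convolution and pooling. It captures the neighbors within a specific spatial region $T$ for feature updating. Feature updating reduces the contamination of local features by local outliers. Compared with traditional graph convolutional networks, the network obtains more feature information and is more robust on point cloud data with much noise [16]. This also holds for unstable, non-smooth point clouds containing prominent regions that must be segmented. Compared with traditional preprocessing, the approach is more stable and does not mis-segment or miss protruding parts [17].
2 Experimental results and analysis
Point cloud segmentation accuracy is evaluated mainly with two metrics [18]: mean intersection over union and overall accuracy. The mean intersection over union $U$ (MIoU) averages, over the classes, the ratio of the intersection to the union of ground truth and prediction:

$$U=\frac{1}{T+1}\sum_{a=0}^{T}\frac{p_{aa}}{\sum_{b=0}^{T}p_{ab}+\sum_{b=0}^{T}p_{ba}-p_{aa}}, \tag{6}$$

where $T$ is the largest class index ($T+1$ classes in total), $a$ is the ground-truth class, $b$ is the predicted class, and $p_{ab}$ is the number of points of class $a$ predicted as $b$. The overall accuracy $A$ (OA) is the ratio of correctly predicted points $P_c$ to the total number of points $P_{all}$:

$$A=P_c/P_{all}, \tag{7}$$

Larger values of $U$ and $A$ indicate a more accurate segmentation network, and $U\le A$.
2.1 Experimental preparation and data preprocessing
A Kinect V2 with the Depth Basics-WPF module was used to capture depth maps of damaged metal surfaces. These were converted with the SDK (Software Development Kit) into point clouds in pcd format. The Kinect V2 depth image resolution is fixed at 512×424 pixels, so data should be captured as close as possible to obtain clearer images. Capture distances ranged from 0.6 m to 1.2 m in steps of 0.2 m starting at 0.6 m, giving multiple sets of data.
Point clouds contain noise that harms later processing if it is not filtered. Following statistical principles, the neighborhood of each point is analyzed and a dedicated standard-deviation threshold is set. The actual point distribution is then compared with an assumed Gaussian distribution, and points whose error exceeds the standard deviation are regarded as noise [19]. Because point clouds are large, the following improved method is used for efficiency:
1) For each point, compute the spatial distance $L_1$ to its first neighbor and $L_k$ to its $k$-th neighbor.
2) Compare the difference between $L_1$ and $L_k$ over all points, and regard the $1/K$ of points with the largest differences as possible noise [20].
3) For each possible noise point, compute its mean distance to its $K$ neighbors. Points whose mean exceeds the standard deviation are regarded as noise.
Removing these outlier noise points completes the filtering.
2.2 Key information extraction and segmentation of metal surface damage point clouds
When building a training set for damage segmentation, labeling all damage uniformly is inconvenient for analysis and application and also weakens feature segmentation. To facilitate analysis and control the segmentation, ArcGIS is used to convert the point cloud model into a TIN (Triangulated Irregular Network). To classify damage accurately, the surface contour properties of the TIN are used to obtain the inner (outer) damage volume, the damage surface contour area and similar quantities of the training damage point clouds, as shown in Fig. 6.
Fig. 6 Schematic diagram of triangulated irregular network
Two damage volume indicators are used: the relative damage volume $V$ (RDV, Relative Damage Volume) and the neighborhood relative damage volume ratio $N$ (NRDVR, Neighborhood Relative Damage Volume Ratio). The relative damage volume is the part between the relative mean depth plane and the gridded depth plane of the point cloud. The TIN neighborhood grid gives the relative depth share of a damage within its neighborhood. This keeps relative depth caused by curvature or shape from being labeled as damage when the test set is built. The two indicators are

$$V=\left(\frac{\sum_{k=1}^{P_d}h_k}{P_d}-\frac{\sum_{k=1}^{P}h_k}{P}\right)S_d, \tag{8}$$

$$N=\left(\frac{P_n\sum_{k=1}^{P_d}h_kS_d}{P_d\sum_{k=1}^{P_n}h_kS_n}-1\right)\times 100\%, \tag{9}$$

where $P$ is the total number of points, $P_d$ is the number of points labeled as damage, and $P_n$ is the number of points in the damage neighborhood. $h_k$ is the depth of point $k$, $S_d$ is the damage plane area, and $S_n$ is the plane area of the damage neighborhood.
The TIN standard envelope view describes the damage more clearly and helps quantify its severity. The computed TIN indicators are used to divide damage into six types. In addition, a standard damage volume indicator (SDV, Standard Damage Volume) is defined from the relation between damaged and undamaged volume to distinguish damage types. Five test groups with 50 images in total were randomly selected as samples. The absolute RDV values of non-penetrating damage were collected; the largest 30% were labeled as dents or bulges and the rest as surface damage. The boundary value used to classify the samples was set as the SDV. With these criteria, six types of metal surface damage are classified: dent, bulge, perforation, surface damage, breakage and defect (missing material), as illustrated in Fig. 7.
Damage is first divided into two groups according to whether it penetrates the part. Non-penetrating damage includes dents, bulges and surface damage; penetrating damage includes perforations, breakage and defects. For non-penetrating damage, dents and bulges use the SDV with opposite signs as thresholds, and damage in between is classified as surface damage. For penetrating damage, the damaged plane area is the reference: smaller areas are classified as perforations and larger ones as breakage. Missing corners or internal loss at edges, caused by corrosion, collision and the like, are classified as defects. The classification criteria are listed in Table 1.
Fig. 7 Schematic diagram of metal surface damage

Table 1 Damage classification
Criterion | Dent | Bulge | Perforation | Surface damage | Breakage | Defect
Penetrating | × | × | √ | × | √ | √
|RDV| reaches SDV | √ | √ | \ | × | \ | \
S_d reaches standard | \ | \ | × | \ | √ | \

2.3 Analysis of experimental results
To verify the effectiveness of the improved graph convolutional deep neural network for point cloud semantic segmentation, the authors used the TensorFlow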
neural network framework for model testing. To verify the deep network's recognition accuracy for damage segmentation, point clouds of damaged metal part surfaces were captured and preprocessed. The point cloud data of multiple sample metal surfaces on several metal parts were screened. Data whose damage share was below 5% or above 60% were removed, and the rest were split and packaged into a point cloud dataset. CloudCompare was used to label the damaged parts of the sample metal with the six damage types described above. The dataset was built with reference to ModelNet40part, a public dataset widely used in point cloud deep learning. The segmentation dataset covers many kinds of metal part damage, contained in 510 full point cloud images. The images are diverse, consisting of damaged metal surfaces such as metal doors, metal skins and the outer surfaces of mechanical components. Following the ModelNet40part specification, ArcGIS tools were used to split the full images at random points so that each independent point cloud group contains 1024 points. This gave 510×128 unit point clouds in total. The samples were divided into 400 training sets and 110 test sets, and cross-validation was used to ensure sufficient testing [20]. Several methods were evaluated. Their results were reassembled from the unit point clouds at their original positions and carry the segmentation labels assigned to the unit point clouds after splitting. The segmentation results are compared in Fig. 8.
Fig. 8 Comparison of segmentation results
In the component damage segmentation experiment, several networks were compared with the proposed network, FAS-DGCNN (Feature Adaptive Shifting-Dynamic Graph Convolutional Neural Networks). Apart from the segmentation network, all experiments used the same settings as the improved graph convolutional network. Results were evaluated with:
- per-damage intersection over union (IoU);
- mean damage IoU (MIoU);
- per-damage accuracy (Accuracy);
- overall damage accuracy (OA).
The results are shown in Tables 2 to 4. Comparing the Accuracy and IoU of the six damage types with the baseline network PointNet++, the proposed method improves OA and MIoU by about 10% on penetrating damage and about 20% on non-penetrating damage. Its overall OA reaches 90.8%. Non-penetrating damage is supported by more points and contains more point cloud features, and on it all the segmentation networks reach about 90% overall. PointNet, which cannot recognize local features, performs poorly on penetrating damage and lacks effective discriminative ability there. Its segmentation of penetrating damage is therefore worse than on the other damage types.

Table 2 Performance comparison of segmentation accuracy of damaged parts (%)
Method | Dent-1 | Bulge-2 | Perforation-3 | Surface damage-4 | Breakage-5 | Defect-6
PointNet | 82.7 | 85.0 | 73.8 | 80.9 | 71.6 | 70.1
PointNet++ | 88.7 | 86.9 | 82.7 | 83.4 | 86.3 | 82.9
DGCNN | 90.4 | 88.8 | 91.7 | 88.7 | 88.6 | 87.1
FAS-DGCNN | 92.5 | 88.8 | 92.1 | 91.4 | 90.1 | 88.6

Table 3 Performance comparison of segmentation intersection ratio of damaged parts (IoU, %)
Method | Dent-1 | Bulge-2 | Perforation-3 | Surface damage-4 | Breakage-5 | Defect-6
PointNet | 80.5 | 82.7 | 70.8 | 76.6 | 67.3 | 66.9
PointNet++ | 86.3 | 84.5 | 80.4 | 81.1 | 84.2 | 80.9
DGCNN | 88.7 | 86.5 | 89.9 | 86.4 | 86.2 | 84.7
FAS-DGCNN | 89.9 | 86.5 | 90.3 | 88.1 | 87.3 | 85.7

Table 4 Comparative analysis of overall damage segmentation performance
… it can be seen that the dynamic graph convolution features, together with effective neighborhood feature updating and multi-scale attention, give the segmentation network better local neighborhood segmentation ability, making it better suited to surface damage segmentation.
3 Conclusion
This paper uses the distinctive spatial structure of 3D point clouds to distinguish, by spatial scale, neighbors that have similar weights in the traditional K-neighborhood. It applies this spatial-scale partitioning to weight assignment within the neighborhood and proposes a feature update module that down-weights and filters out noise points in a neighborhood. A dynamic graph convolutional network using this module performs well in segmentation. The feature-updating dynamic graph convolutional network (FAS-DGCNN) effectively segments metal surface damage and is more reliable for point cloud semantic segmentation than the other networks. With attention containing spatial-scale region information and local point cloud feature updating, the proposed network performs better overall. Compared with segmentation networks that lack local feature extraction, it also works better on non-penetrating damage, whose point clouds are sparse and whose features are inconspicuous.
References:
[1] LAWIN F J, DANELLJAN M, TOSTEBERG P, et al. Deep Projective 3D Semantic Segmentation [C]// International Conference on Computer Analysis of Images and Patterns. Ystad, Sweden: Springer, 2017: 95-107.
[2] MATURANA D, SCHERER S. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition [C]// Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015: 922-928.
[3] LE T, DUAN Y. PointGrid: A Deep Network for 3D Shape Understanding [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018: 9204-9214.
[4] QI C R, SU H, MO K, et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Hawaii, USA: IEEE, 2017: 652-660.
[5] QI C R, SU H, MO K, et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space [C]// Advances in Neural Information Processing Systems. California, USA: SpringerLink, 2017: 5099-5108.
[6] HU J, SHEN L, SUN G. Squeeze-and-Excitation Networks [C]// IEEE Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2018: 7132-7141.
[7] LI Y, BU R, SUN M, et al. PointCNN: Convolution on X-Transformed Points [C]// Advances in Neural Information Processing Systems. Montreal, Canada: NeurIPS, 2018: 820-830.
[8] ANH VIET PHAN, MINH LE NGUYEN, YEN LAM HOANG NGUYEN, et al. DGCNN: A Convolutional Neural Network over Large-Scale Labeled Graphs [J]. Neural Networks, 2018, 108(10): 533-543.
[9] REN W J, GAO M Y, GAO M Z, et al. Research on Point Cloud Registration Method Based on Hybrid Algorithm [J]. Journal of Jilin University (Information Science Edition), 2019, 37(4): 408-416.
[10] ZHANG K, HAO M, WANG J, et al. Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features [EB/OL]. [2022-03-15]. https:∥/stamp/stamp.jsp?tp=&arnumber=9665104.
[11] LIN S D, FENG C, CHEN Z D, et al. An Efficient Segmentation Algorithm for Vehicle Body Surface Damage Detection [J]. Journal of Data Acquisition and Processing, 2021, 36(2): 260-269.
[12] ZHANG L P, ZHANG Y, CHEN Z Z, et al. Splitting and Merging Based Multi-Model Fitting for Point Cloud Segmentation [J]. Journal of Geodesy and Geoinformation Science, 2019, 2(2): 78-79.
[13] XING Z Z, ZHAO S F, GUO W, et al. Processing Laser Point Cloud in Fully Mechanized Mining Face Based on DGCNN [J]. ISPRS International Journal of Geo-Information, 2021, 10(7): 482-482.
[14] YANG J, DANG J S. Semantic Segmentation of 3D Point Cloud Based on Contextual Attention CNN [J]. Journal on Communications, 2020, 41(7): 195-203.
[15] CHEN L, WANG H Y, XIAO H H, et al. Estimation of External Phenotypic Parameters of Bunting Leaves Using FL-DGCNN Model [J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(13): 172-179.
[16] CHAI Y J, MA J, LIU H. Deep Graph Attention Convolution Network for Point Cloud Semantic Segmentation [J]. Laser and Optoelectronics Progress, 2021, 58(12): 35-60.
[17] ZHANG X D, FANG H. BTDGCNN: BallTree Dynamic Graph Convolution Neural Network for 3D Point Cloud Topology [J]. Journal of Chinese Computer Systems, 2021, 42(11): 32-40.
[18] ZHANG J Y, ZHAO X L, CHEN Z. A Survey of Point Cloud Semantic Segmentation Based on Deep Learning [J]. Lasers and Photonics, 2020, 57(4): 28-46.
[19] SUN Y, ZHANG S H, WANG T Q, et al. An Improved Spatial Point Cloud Simplification Algorithm [J]. Neural Computing and Applications, 2021, 34(15): 12345-12359.
[20] GAO F S, ZHANG D L, LIANG X Z. A Region Growing Algorithm for Triangular Network Surface Generation from Point Cloud Data [J]. Journal of Jilin University (Science Edition), 2008, 46(3): 413-417.
(Editor in charge: LIU Qiaoliang)
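The spatial-scale attention of Section 1.1 (Eqs. (1)-(3)) and the metrics of Eqs. (6)-(7) can be sketched for a single center point. Several details are assumptions not fixed by the paper:
- The MLP $M$ is reduced to one linear layer with ReLU.
- The outer scorer in Eq. (1) is a linear layer.
- The $R$ spatial-scale regions are equal-sized shells of neighbors ordered by distance to the center.

`miou_oa` averages IoU only over classes that occur in the ground truth or the prediction, to avoid 0/0. All function names are illustrative.

```python
import numpy as np


def mlp(x, W):
    """Stand-in for the paper's MLP M(.): one linear layer followed by ReLU (assumption)."""
    return np.maximum(x @ W, 0.0)


def spatial_scale_attention(p_m, f_m, p_nb, f_nb, W_f, w_e, b, R=4):
    """Eqs. (1)-(3) for one centre point m with K neighbours.

    p_m: (3,) centre coordinates      f_m: (D,) centre features
    p_nb: (K, 3) neighbour coords     f_nb: (K, D) neighbour features
    W_f: (D, D') feature MLP weights  w_e: (3 + D',) linear scoring weights
    b: (R, D') per-region bias b_r. Returns s_m of shape (R, D') and the weights a_mk.
    """
    dp = p_nb - p_m                                   # Delta p_mk
    Mf = mlp(f_nb, W_f)                               # M(f_mk)
    df = Mf - mlp(f_m[None, :], W_f)                  # Delta f_mk
    e = np.concatenate([dp, df], axis=1) @ w_e        # Eq. (1) with a linear scorer
    a = np.exp(e - e.max())
    a /= a.sum()                                      # Eq. (2): softmax over the K neighbours
    order = np.argsort(np.linalg.norm(dp, axis=1))    # nearest neighbours first
    regions = np.array_split(order, R)                # R spatial-scale shells
    s = np.stack([(a[idx, None] * Mf[idx]).sum(axis=0) + b[r]
                  for r, idx in enumerate(regions)])  # Eq. (3)
    return s, a


def miou_oa(y_true, y_pred, n_classes):
    """Eqs. (6)-(7): mean IoU over classes that occur, and overall accuracy."""
    p = np.zeros((n_classes, n_classes))
    np.add.at(p, (np.asarray(y_true), np.asarray(y_pred)), 1)  # p[a, b]: class a predicted as b
    tp = np.diag(p)
    union = p.sum(axis=1) + p.sum(axis=0) - tp
    miou = float(np.mean(tp[union > 0] / union[union > 0]))
    oa = float(tp.sum() / p.sum())
    return miou, oa
```

For example, with ground truth [0, 0, 1, 1] and prediction [0, 1, 1, 1], the class IoUs are 1/2 and 2/3, so MIoU is about 0.583 while OA is 0.75.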
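Superpixel-Based Parallel Segmentation Algorithm for Remote Sensing Images
The first step of a superpixel-based parallel segmentation algorithm for remote sensing images is superpixel generation.
A superpixel is a compact block of adjacent pixels with similar properties; partitioning an image into superpixels reduces redundant information and can improve segmentation quality.
Common superpixel generation methods include clustering-based algorithms (such as SLIC) and graph-based algorithms (such as Felzenszwalb-Huttenlocher); TurboPixels, often listed alongside them, is based on geometric flow.
The algorithm first preprocesses the image and then partitions it into superpixels according to pixel similarity.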
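The superpixel generation step can be illustrated with a simplified, SLIC-style sketch using NumPy. It is an illustration under stated assumptions, not the algorithm described here: centers are seeded on a regular grid with interval S and refined by k-means in combined color and position space, using the SLIC-style distance (color distance squared plus (m/S) squared times spatial distance squared, with m the compactness). Unlike real SLIC, every pixel is compared with every center, and there is no seed perturbation or connectivity post-processing. The name `simple_slic` is hypothetical.

```python
import numpy as np


def simple_slic(image, n_segments=4, compactness=10.0, n_iter=5):
    """Simplified SLIC-style superpixels: k-means in (color, y, x) space.

    Assumptions: grid-seeded centers; every pixel is compared with every
    center (no local 2S x 2S window), so cost is O(pixels * centers).
    """
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:                                  # grayscale -> 1 channel
        img = img[..., None]
    h, w, c = img.shape
    step = max(1, int(np.sqrt(h * w / n_segments)))    # grid interval S
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    ys, xs = ys.ravel(), xs.ravel()
    centers = np.column_stack([img[ys, xs], ys, xs]).astype(float)

    yy, xx = np.mgrid[0:h, 0:w]
    pixels = np.column_stack([img.reshape(-1, c), yy.ravel(), xx.ravel()])
    spatial_weight = (compactness / step) ** 2         # (m / S)^2

    for _ in range(n_iter):
        diff = pixels[:, None, :] - centers[None, :, :]
        d_color = (diff[..., :c] ** 2).sum(-1)
        d_space = (diff[..., c:] ** 2).sum(-1)
        labels = (d_color + spatial_weight * d_space).argmin(axis=1)
        for k in range(len(centers)):                  # move center to member mean
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels.reshape(h, w)
```

On an image whose left and right halves differ strongly in intensity, no superpixel straddles the boundary, because the color term dominates the spatial term.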
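The second step is feature extraction.
Features are quantitative descriptions of the attributes or characteristics of objects in an image and can be used to distinguish pixels belonging to different classes.
Common features include color, texture, and shape.
In superpixel segmentation, each superpixel's mean color, texture features, or edge information can serve as its features.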
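A minimal sketch of the mean-color feature, assuming `labels` is an integer superpixel map with values 0..n-1 (as produced by any superpixel method); the helper name `superpixel_mean_features` is illustrative:

```python
import numpy as np


def superpixel_mean_features(image, labels):
    """Return an (n_superpixels, n_channels) array of per-superpixel means."""
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:
        img = img[..., None]
    flat = np.asarray(labels).ravel()
    n = flat.max() + 1
    counts = np.bincount(flat, minlength=n)
    sums = np.stack(
        [np.bincount(flat, weights=img[..., ch].ravel(), minlength=n)
         for ch in range(img.shape[-1])],
        axis=1,
    )
    return sums / np.maximum(counts, 1)[:, None]   # guard against empty labels
```

`np.bincount` with `weights` sums the pixel values per label in one pass, which avoids a Python loop over superpixels.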
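The third step is similarity computation.
Similarity computation measures how alike two pixels or superpixels are, for example from their differences in color or texture.
Common measures are distances such as the Euclidean distance and the Mahalanobis distance; a smaller distance means higher similarity.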
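The two distance measures can be written in a few lines. This is a sketch: in practice the covariance matrix for the Mahalanobis distance would be estimated from the features of all superpixels, which is assumed here rather than shown.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def mahalanobis_distance(a, b, cov):
    """Distance that accounts for feature scale and correlation via `cov`.

    `cov` would typically be np.cov(features, rowvar=False); it must be
    invertible. Solving the system avoids forming the explicit inverse.
    """
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance.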
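The final step is parallel segmentation.
Parallel segmentation processes several parts of the image at the same time, which can greatly speed up segmentation.
Multithreading or GPU parallel computing can be used to accelerate the algorithm.
The algorithm compares the similarity of adjacent superpixels and merges similar ones to form segments.
When merging superpixels, segmentation techniques such as thresholding or clustering can be used to determine the final result.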
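The merge-and-parallelize step might look like the following sketch. It makes several assumptions not stated in the text: adjacency is 4-connected, two neighbouring superpixels are merged when the Euclidean distance between their features falls below a fixed threshold, merging uses a union-find structure, and parallelism comes from processing independent image tiles with `concurrent.futures.ThreadPoolExecutor`. Merges across tile borders are not reconciled, and threads only help when the work releases the GIL; a `ProcessPoolExecutor` or a GPU kernel would be the usual choice for heavy workloads.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def adjacent_pairs(labels):
    """Unordered pairs of labels that touch horizontally or vertically."""
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        mask = a != b
        for u, v in zip(a[mask], b[mask]):
            pairs.add((int(min(u, v)), int(max(u, v))))
    return pairs


def merge_similar(labels, features, threshold):
    """Union-find merge of adjacent superpixels with feature distance < threshold.

    Simplification: each superpixel's own features are compared, not the
    features of the growing merged region.
    """
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]          # path halving
            i = parent[i]
        return i

    for u, v in sorted(adjacent_pairs(labels)):
        if np.linalg.norm(features[u] - features[v]) < threshold:
            parent[find(u)] = find(v)
    roots = np.array([find(i) for i in range(len(features))])
    _, compact = np.unique(roots, return_inverse=True)   # relabel as 0..k-1
    return compact.reshape(-1)[labels]


def merge_tiles_in_parallel(tiles, threshold, max_workers=4):
    """Run merge_similar on independent (labels, features) tiles concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: merge_similar(t[0], t[1], threshold), tiles))
```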
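In summary, the superpixel-based parallel segmentation algorithm for remote sensing images proceeds through four steps: superpixel generation, feature extraction, similarity computation, and parallel segmentation.
It can improve both segmentation quality and processing speed, making it suitable for large-scale remote sensing image data.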
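Principles of Monocular SLAM
Monocular SLAM (Simultaneous Localization and Mapping) is a technique for real-time localization and mapping with a single camera.
By observing and processing the scene in real time, it estimates the camera's position relative to the scene while building a map of the scene that can be used for subsequent tasks.
Monocular SLAM is a complex task because it must process a real-time data stream, solve the visual feature-matching problem, and perform high-dimensional state estimation.
Its implementation is usually divided into three main steps: feature extraction and tracking, pose estimation, and map building.
Feature extraction and tracking, the first step, follows the camera's motion by analyzing visual features such as corners and edges across consecutive image frames.
Keypoints are usually detected with algorithms such as SIFT, SURF, or ORB, and the associated feature descriptors are then used to track these keypoints.
Next comes pose estimation, which uses visual geometry and a motion model to estimate the camera pose (position and orientation).
This process usually involves matching visual features, computing the camera's motion, and estimating its pose.
Matches are usually found by comparing feature descriptors, after which algorithms such as RANSAC can be used to reject incorrect matches.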
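Descriptor matching and outlier rejection can be sketched as follows. The matcher assumes ORB-style binary descriptors stored as packed `uint8` rows and applies a nearest/second-nearest ratio test (the 0.8 ratio is an illustrative choice). The RANSAC part is deliberately a toy: it fits a pure 2D image translation, whereas a monocular SLAM front end would normally fit an essential matrix or homography inside the same hypothesize-and-verify loop.

```python
import numpy as np


def match_binary_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force Hamming matching with a ratio test.

    desc_a, desc_b: uint8 arrays of packed bits, one descriptor per row;
    desc_b must contain at least two descriptors.
    """
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)      # Hamming distances
    order = np.argsort(dist, axis=1)
    matches = []
    for i, (best, second) in enumerate(order[:, :2]):
        if dist[i, best] < ratio * dist[i, second]:      # unambiguous match only
            matches.append((i, int(best)))
    return matches


def ransac_translation(p1, p2, n_iter=100, tol=1.0, seed=0):
    """Toy RANSAC: estimate t with p2 = p1 + t from matches with outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(p1), dtype=bool)
    for _ in range(n_iter):
        k = rng.integers(len(p1))                        # minimal sample: 1 match
        t = p2[k] - p1[k]                                # hypothesis
        inliers = np.linalg.norm(p1 + t - p2, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    t_refined = (p2[best_inliers] - p1[best_inliers]).mean(axis=0)  # refit
    return t_refined, best_inliers
```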
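The final step is map building.
In monocular SLAM the map is usually a set of 3D map points (a point cloud), which may be sparse or dense.
The map is built mainly by reconstructing feature points in 3D: the 3D position of each feature point is computed from the camera's intrinsic and extrinsic parameters together with depth information, which a single camera cannot measure directly and must instead infer from observations of the same point in several frames.
3D reconstruction can use various methods, such as triangulation, disparity (parallax between views), or depth maps.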
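Triangulation, the first method listed, can be illustrated with the standard linear (DLT) method. The sketch assumes known 3x4 projection matrices for two views and normalized image coordinates (pixel coordinates would first be multiplied by the inverse intrinsic matrix); it omits the nonlinear refinement a real system would add.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one point seen in two views.

    Each observation (x, y) with projection matrix P gives two equations
    x * P[2] - P[0] = 0 and y * P[2] - P[1] = 0 on the homogeneous point X;
    the least-squares solution is the last right singular vector of A.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # back to inhomogeneous coordinates
```

For example, with P1 = [I | 0], P2 = [I | (-1, 0, 0)] and observations (0.2, 0.4) and (0.0, 0.4), the recovered point is (1, 2, 5).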
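However, because monocular SLAM uses only one camera, it faces several challenges and limitations.
First, monocular SLAM cannot directly recover the true scale of the scene; without information from other sensors, absolute distances cannot be estimated accurately.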
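The scale ambiguity can be checked numerically: if the camera translation and every 3D point are multiplied by the same factor, every projected pixel stays the same, so images alone cannot reveal absolute scale. The intrinsic matrix, pose, and points in the test are made-up values for illustration.

```python
import numpy as np


def project(K, R, t, X):
    """Pinhole projection of 3D points X (N x 3) to pixels for pose (R, t)."""
    Xc = X @ R.T + t                 # world -> camera coordinates
    uvw = Xc @ K.T                   # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division
```

Scaling `t` and `X` by a factor s scales the camera coordinates by s, and the perspective division cancels it.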
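Second, monocular SLAM is sensitive to large scene changes, motion blur, and low texture, all of which make feature extraction and matching difficult.
In addition, monocular SLAM usually assumes a static scene; dynamic objects can introduce errors into the map.
Researchers have proposed many improvements to address these challenges.
For example, optical flow estimation can improve the accuracy of feature tracking, and direct methods, which work on pixel intensities rather than keypoints, can mitigate the lack of texture.
Fusing other sensors (such as an inertial measurement unit or a depth sensor) can also improve the performance of monocular SLAM.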
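Robot Autonomous Navigation and Localization Technology: Test Paper
3. Explain in detail the role of each of the three key steps of visual SLAM (feature extraction, feature matching, and motion estimation) and how they relate to one another.
4. For robot path planning, explain the basic ideas of the A* algorithm and the RRT (Rapidly-exploring Random Trees) algorithm, and compare their advantages and disadvantages.
III. Fill-in-the-Blank Questions (10 questions, 2 points each, 20 points in total; write the correct answer in the blank)
1. In autonomous robot navigation, ______ is a technique that uses sensor data to build a map and localize the robot simultaneously.
2. In robot navigation, the ______ algorithm is a heuristic search algorithm used to find the optimal path from a start point to a goal point.
3. In visual SLAM, ______ is a commonly used front-end technique for extracting feature points from images.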
8. AB
9. ABC
10. ABCD
11. AB
12. ABCD
13. ABC
14. ABC
15. ABCD
16. ABC
17. ABC
18. ABCD
19. ABC
20. ABC
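III. Fill-in-the-Blank Questions
1. SLAM
2. A* algorithm
3. Feature extraction
4. Obstacle avoidance algorithm
5. Particle filter
6. Beacon-based localization
7. IMU (inertial measurement unit)
8. Path-tracking control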
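A. Speed control
B. Heading control
C. Trajectory tracking
D. Dynamic obstacle avoidance
19. Which of the following methods can be used for robot terrain perception? ( )
A. LiDAR
B. Camera
C. Tactile sensor
D. Infrared sensor
20. Which of the following techniques can improve the real-time performance of robot localization? ( )
A. Parallel computing
B. Hardware acceleration
C. Algorithm optimization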
Sparse SLAM Methods
Simultaneous Localization and Mapping (SLAM) is a widely used technique in robotics for mapping an environment while simultaneously determining the robot's location within it. However, one common challenge in SLAM is dealing with sparse data, where not enough information is available to accurately estimate the robot's position and build a reliable map. This can lead to errors in the mapping process and make it difficult for the robot to navigate effectively.
Sparse SLAM methods aim to address this challenge with algorithms that remain accurate and reliable even when data are sparse. One common approach is feature-based: distinctive landmarks or features in the environment are used to localize the robot and build the map, so the map itself stores only a limited set of landmarks rather than every pixel or range measurement. These features can be detected and tracked by the robot's sensors, such as cameras or lidars, and used as reference points for estimating the robot's position.
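Using landmarks as reference points can be illustrated, under strong simplifying assumptions, by estimating a 2D robot position from measured ranges to landmarks whose positions are already known. A real sparse SLAM system estimates the landmark positions jointly with the robot pose; here the map is assumed given and the ranges (for example from a lidar) are noise-free test values.

```python
import numpy as np


def locate_from_landmarks(landmarks, ranges):
    """Least-squares 2D position from ranges to known landmarks.

    Subtracting the first equation |p - l0|^2 = r0^2 from the others removes
    the |p|^2 term, leaving the linear system
    2 (li - l0) . p = |li|^2 - |l0|^2 - ri^2 + r0^2.
    Needs at least three non-collinear landmarks.
    """
    L = np.asarray(landmarks, float)
    r = np.asarray(ranges, float)
    A = 2.0 * (L[1:] - L[0])
    b = (L[1:] ** 2).sum(axis=1) - (L[0] ** 2).sum() - r[1:] ** 2 + r[0] ** 2
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p
```

With more landmarks than the minimum, `lstsq` averages out inconsistencies, which is where noisy measurements would be absorbed.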
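Large-Scale Pre-trained Models: Techniques and Application Evaluation Methods
I. Introduction
With the development of deep learning, pre-trained models have become a major topic in natural language processing.
Techniques for large-scale pre-trained models and methods for evaluating their applications are among the important research directions in natural language processing.
This article introduces them from the following aspects.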
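II. Large-Scale Pre-trained Model Techniques
1. Overview of pre-trained models
A pre-trained model first learns general-purpose language representations from large-scale unlabeled data through self-supervised learning (often described as unsupervised), and is then fine-tuned on labeled data to adapt it to a specific task.
Common pre-trained models include BERT and GPT.
2. BERT
BERT (Bidirectional Encoder Representations from Transformers), proposed by Google, is a bidirectional encoder based on the Transformer architecture and is pre-trained with the Masked Language Model (MLM) and Next Sentence Prediction (NSP) tasks.
In the MLM task, 15% of the input tokens are selected at random; of these, 80% are replaced with the [MASK] token, 10% with a random token, and 10% are left unchanged, and the network must predict the original tokens. The NSP task asks whether the second of two sentences actually follows the first.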
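The MLM corruption rule can be sketched in Python. This follows the 15% / 80-10-10 scheme above but selects each token independently with probability 15% (a common approximation), works on plain string tokens, and uses a toy vocabulary; `mlm_mask` is an illustrative name, not a library API.

```python
import random

MASK = "[MASK]"


def mlm_mask(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted inputs, labels); labels are None where nothing is predicted."""
    rng = rng or random.Random(0)          # fixed seed for reproducibility
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:      # token not selected
            continue
        labels[i] = tok                    # model must predict the original
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # 80%: [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only [MASK] positions need predicting.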
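3. GPT
GPT (Generative Pre-trained Transformer), proposed by OpenAI, is a unidirectional (left-to-right), decoder-only model based on the Transformer architecture and is pre-trained with a language-modeling task.
In the language-modeling task, the network must predict the next token from the preceding text.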
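The language-modeling objective amounts to shifting the sequence by one position, combined with a causal attention mask so that each position only sees earlier ones. A minimal sketch with illustrative helper names:

```python
import numpy as np


def lm_training_pair(token_ids):
    """Next-token prediction: the target at position i is the token at i + 1."""
    ids = list(token_ids)
    return ids[:-1], ids[1:]


def causal_mask(n):
    """mask[i, j] is True where position i may attend to position j (j <= i)."""
    return np.tril(np.ones((n, n), dtype=bool))
```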
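III. Evaluation Methods for Applications of Large-Scale Pre-trained Models
1. Downstream task fine-tuning
Downstream fine-tuning means further training the pre-trained model on labeled data for a specific task; performance on that task then serves as the evaluation.
Common downstream tasks include text classification and sequence labeling.
2. Unsupervised evaluation methods
Unsupervised evaluation methods do not rely on labeled data.
Common examples include syntactic analysis and word-similarity computation.
3. Supervised evaluation methods
Supervised evaluation methods rely on labeled data.
Common examples include human judgment and F1 score computation.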
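For reference, the F1 score is the harmonic mean of precision and recall. A minimal sketch for a binary task (the function name is illustrative; libraries such as scikit-learn provide equivalent metrics):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```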
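IV. Summary
Techniques for large-scale pre-trained models and methods for evaluating their applications are among the important research directions in natural language processing.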
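Research on Applications of Machine Learning in Geography
As a comprehensive discipline, geography deals with many complex phenomena and processes, such as terrain features, climate change, and human activity.
With advances in technology and the growing availability of data, machine learning has become a powerful research tool in geography.
This article discusses applications of machine learning in geography from several angles.
I. Terrain feature simulation
Terrain features are among the important objects of study in geography.
Traditional terrain simulation usually relies on complex mathematical models and hand-crafted rules.
Machine learning, by contrast, can learn the patterns that generate terrain features from large amounts of geomorphological data and real observations, and then use them for simulation.
For example, an artificial neural network that takes terrain and climate data as input can be used to simulate trends in terrain change under different climate conditions, providing a basis for predicting landform evolution.
II. Remote sensing image interpretation
Remote sensing image interpretation is a common task in geographic research.
With trained models, machine learning can interpret large numbers of remote sensing images automatically.
For example, a convolutional neural network can take remote sensing images and related geographic information as input and perform automatic classification and recognition.
This approach not only greatly improves the efficiency of remote sensing image interpretation but can also reduce human error.
III. Climate change prediction
Climate change is a prominent topic in geographic research.
Machine learning's strong data-processing and pattern-recognition capabilities offer new approaches to climate change prediction.
By analyzing historical meteorological data, machine learning algorithms can learn relationships between weather and climate variables and use them to forecast future climate trends.
This is important for formulating policies and plans to respond to climate change.
IV. Urban planning optimization
Urban planning is an important research area in geography and also an application area of machine learning.
Machine learning can analyze large amounts of urban data, such as pedestrian flows and traffic conditions, to provide evidence-based decision support.
Clustering algorithms, for example, can analyze data on the distribution of population and resources to inform urban planning schemes.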
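As an illustration of the clustering idea, a plain k-means sketch can group point locations (for example residential addresses or facility sites) into k clusters. The coordinates, k, and the use of Euclidean distance on planar coordinates are assumptions for the example, not a method taken from the text.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's k-means on planar coordinates; returns (labels, centers)."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)                             # nearest center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers
    return labels, centers
```

For real geographic data, projected coordinates (not raw latitude/longitude) and population weights would usually be needed.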
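In addition, machine learning can simulate urban development trends to guide future urban planning and construction.
Summary: Research on machine learning in geography spans many areas, such as terrain simulation, remote sensing image interpretation, climate change prediction, and urban planning optimization.
With the strong data-analysis and pattern-recognition capabilities of machine learning algorithms, geographic research can become considerably more efficient and accurate.
However, applying machine learning in geography still faces challenges such as data quality and model interpretability.
As technology develops and data accumulate, machine learning is expected to play a larger role in geography, opening up more possibilities for understanding and studying the Earth.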
A Software Vulnerability Code Clone Detection Method Based on a Characteristic Matrix
甘水滔; 秦晓军; 陈左宁; 王林章
[Journal] Journal of Software (软件学报)
[Year (Vol.), Issue] 2015(000)002
[Abstract] This article proposes a clone detection method based on a program characteristic matrix. Through analyzing the syntactic and semantic characteristics of vulnerabilities, the method abstracts certain key nodes that describe different forms of vulnerability types from the syntax parse tree, and expands the four basic types of code clone into auxiliary classes. The characteristic matrix of the code is then obtained by counting the key nodes found when scanning the corresponding code segment in the syntax parse tree. The method builds a basic knowledge base by extracting partial instances from an open vulnerability database, and precisely locates vulnerable code by performing cluster calculation on code that matches multiple types of code clone. Compared with detection based on a single characteristic vector, the proposed method describes vulnerabilities more precisely. It also remedies the limited vulnerability-type coverage of formal detection methods. Nine vulnerabilities are detected in an Android kernel system test. Tests on software of different code sizes show that the performance of this method scales linearly with code size.
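The idea of counting key syntax-tree nodes per code fragment can be sketched with Python's standard `ast` module. This is only an analogy: the paper works on its own parse tree and its own set of vulnerability-related key nodes, which are not listed in the abstract, so `KEY_NODES` below is a hypothetical choice. Each row of the resulting matrix characterizes one fragment, and fragments that differ only in identifier names produce identical rows.

```python
import ast
import math

# Hypothetical key node types (the paper defines its own set for its parse tree).
KEY_NODES = ("Call", "Subscript", "For", "While", "If", "Compare",
             "BinOp", "Assign", "AugAssign")


def characteristic_vector(source):
    """Count occurrences of each key AST node type in a code fragment."""
    counts = dict.fromkeys(KEY_NODES, 0)
    for node in ast.walk(ast.parse(source)):
        name = type(node).__name__
        if name in counts:
            counts[name] += 1
    return [counts[n] for n in KEY_NODES]


def characteristic_matrix(fragments):
    """One row per code fragment, one column per key node type."""
    return [characteristic_vector(src) for src in fragments]


def cosine_similarity(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```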
A Contourlet-Domain Sparse Representation Classifier for Face Recognition
陈隽
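[Journal] Network New Media Technology (网络新媒体技术)
[Year (Vol.), Issue] 2011(032)011
[Abstract] Overcomplete sparse representation can effectively overcome the performance bottlenecks in face recognition caused by illumination, expression changes, occlusion, and noise. Based on overcomplete sparse representation, face recognition is treated as a classification problem among multiple linear regression models, and a Contourlet-domain sparse representation classifier is proposed. It avoids the loss of discriminative information caused by using principal component analysis for data preprocessing and improves the discriminative power of the sparse representation classifier. Extensive experiments on the ORL, Yale, Extended Yale, and PIE databases verify the effectiveness of the algorithm.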
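The sparse-representation classification step can be sketched as follows. The Contourlet transform itself is not shown (it is not part of NumPy or SciPy), so the columns of `A` are assumed to be already-extracted feature vectors of training faces; the l1-regularized coding is solved here with plain ISTA, which may differ from the solver used in the paper, and `lam` is an illustrative value.

```python
import numpy as np


def ista(A, y, lam=0.01, n_iter=500):
    """ISTA for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x


def src_classify(A, labels, y, lam=0.01):
    """Assign y to the class whose training columns reconstruct it best."""
    A = A / np.linalg.norm(A, axis=0)              # unit-norm training columns
    y = y / np.linalg.norm(y)
    x = ista(A, y, lam)
    residuals = {c: np.linalg.norm(y - A @ np.where(labels == c, x, 0.0))
                 for c in np.unique(labels)}       # keep only class-c coefficients
    return min(residuals, key=residuals.get)
```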
[Pages] 5 (pp. 76-80)
[Authors] 陈隽
[Affiliation] Huai'an Institute of Administration, Huai'an, Jiangsu 223005
[Language] Chinese
Network Intrusion Detection Based on Density Peak Clustering
杜淑颖
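[Journal] Software (软件)
[Year (Vol.), Issue] 2022(43)6
[Abstract] Network security has become an important safeguard for social development today, and intrusion detection systems play a pivotal role in network-security architectures.
Traditional clustering-based network intrusion detection methods require the number of clusters to be set in advance and cannot handle noisy data. However, the network-behavior records collected by intrusion detection systems are highly random, so the number and shape of their clusters are hard to determine beforehand, and a more robust clustering method is needed for intrusion detection.
This paper proposes a network intrusion detection method based on density peak clustering. It exploits the advantages of the density peak clustering algorithm (no iteration, robustness to parameters, and automatic determination of the number of clusters), handles noisy data and the network-behavior records collected by the intrusion detection system well, and mines more useful intrusion information.
Finally, experiments on the KDD CUP 1999 dataset verify the effectiveness and accuracy of the method.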
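The density peak clustering step can be sketched as below. Local density rho uses a cutoff kernel with distance `dc`, delta is the distance to the nearest point of higher density, and each remaining point inherits the label of that nearest denser point. As a simplification, the number of centers is passed in and the points with the largest rho times delta are chosen, rather than read off a decision graph; how connection records are turned into numeric feature vectors is also left out.

```python
import numpy as np


def density_peaks(X, dc, n_centers):
    """Density peak clustering with a cutoff kernel; returns (labels, rho, delta)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (d < dc).sum(axis=1) - 1                 # neighbours within dc, minus self
    order = np.argsort(-rho, kind="stable")        # decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()            # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                      # points ranked denser than i
        j = higher[np.argmin(d[i, higher])]
        delta[i], nearest_higher[i] = d[i, j], j
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    if order[0] not in centers:                    # densest point must seed a cluster
        centers[-1] = order[0]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:                                # assign in decreasing density
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

Because points are labeled in decreasing density, each point's nearest denser neighbour is always labeled before it, so a single pass suffices and no iteration is needed.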
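[Pages] 7 (pp. 40-46)
[Authors] 杜淑颖
[Affiliation] School of Information Management, Xuzhou Vocational College of Bioengineering; School of Computer Science and Technology, China University of Mining and Technology
[Language] Chinese
[CLC number] TP391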
A Semi-Fragile Blind Watermarking Algorithm Based on Vertex Norms
张国有;崔健;王安红
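[Journal] Journal of Yangzhou University (Natural Science Edition) (扬州大学学报:自然科学版)
[Year (Vol.), Issue] 2022(25)1
[Abstract] Traditional digital watermarking algorithms cannot detect whether a 3D mesh model has suffered a malicious attack. To address this, a spatial-domain semi-fragile blind watermarking algorithm based on vertex norms is proposed. Watermark embedding primitives are constructed from vertex norms, which characterize geometric features. During embedding, the model is first partitioned by vertex norm and the watermark is then embedded bit by bit in each partition; during detection, a voting mechanism recovers the watermark information. Experimental results show that the algorithm is robust to benign operations such as translation, rotation, uniform scaling, vertex reordering, and their combinations, and can identify malicious attacks such as noise and quantization; it thus tolerates benign model operations while effectively detecting malicious attacks.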
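The vertex-norm primitive and the voting step can be sketched in Python. The bit-embedding rule itself is not described in the abstract and is omitted; the function names and the equal-width partitioning by normalized norm are assumptions made for illustration. Measuring norms from the centroid and dividing by the largest norm is what makes the values unaffected by translation, rotation, uniform scaling and vertex reordering.

```python
import numpy as np


def normalized_vertex_norms(vertices):
    """Distance of each vertex from the mesh centroid, divided by the largest one.

    Invariant to translation, rotation and uniform scaling; vertex reordering
    only permutes the values along with the vertices.
    """
    v = np.asarray(vertices, dtype=float)
    norms = np.linalg.norm(v - v.mean(axis=0), axis=1)
    return norms / norms.max()


def partition_by_norm(vertices, n_bins):
    """Assign each vertex to one of n_bins partitions by its normalized norm."""
    r = normalized_vertex_norms(vertices)
    return np.minimum((r * n_bins).astype(int), n_bins - 1)


def majority_vote(bits):
    """Recover one watermark bit from the bits read in several partitions."""
    return int(2 * sum(bits) > len(bits))
```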
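[Pages] 7 (pp. 48-53)
[Authors] 张国有; 崔健; 王安红
[Affiliation] School of Computer Science and Technology, Taiyuan University of Science and Technology; School of Electronic Information Engineering, Taiyuan University of Science and Technology
[Language] Chinese
[CLC number] TP309.2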
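Picking Seismic P- and S-Wave Arrival Times with a Deep Neural Network
于子叶; 储日升; 盛敏汉
[Journal] Chinese Journal of Geophysics (地球物理学报)
[Year (Vol.), Issue] 2018(061)012
[Abstract] Quickly and accurately extracting the arrival time of each seismic phase from waveform data is a fundamental problem in seismology. To address it, this paper proposes a new method for picking arrival times with a deep neural network and builds a 17-layer Inception deep network model for seismic arrival-time extraction: after high-pass filtering and normalization, the raw three-component data are fed into the network, which directly outputs arrival-time information. The whole process relies on the neural network to extract waveform features adaptively and output results automatically. A reliability test on 100 groups of data with noise of different intensities added shows that, compared with other methods, the neural network method has higher tolerance to noise and greater stability, and its results agree closely with the earthquake catalog. Compared with AR-AIC + STA/LTA, the deep neural network is somewhat slower, but it requires no time windows or thresholds to be set, offers higher usability, and can be iteratively upgraded to improve accuracy. As an artificial intelligence method, it offers a new approach to picking arrival times from waveforms.
[Pages] 14 (pp. 4873-4886)
[Authors] 于子叶; 储日升; 盛敏汉
[Affiliation] State Key Laboratory of Geodesy and Earth's Dynamics, Institute of Geodesy and Geophysics, Chinese Academy of Sciences, Wuhan 430077; University of Chinese Academy of Sciences, Beijing 100049
[Language] Chinese
[CLC number] P315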
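For comparison, the STA/LTA trigger used in the baseline can be sketched in a few lines; it also shows the window lengths and threshold that the neural network approach avoids having to choose. The window lengths and threshold in the test are illustrative; practical pickers (for example those in ObsPy) add filtering and de-triggering logic.

```python
import numpy as np


def sta_lta(trace, n_sta, n_lta):
    """Classic trailing STA/LTA ratio on signal energy.

    ratio[k] compares the mean energy of the n_sta samples ending at k with
    that of the n_lta samples ending at k; the first n_lta - 1 values are 0.
    """
    e = np.asarray(trace, dtype=float) ** 2
    csum = np.concatenate([[0.0], np.cumsum(e)])
    ends = np.arange(n_lta, len(e) + 1)              # exclusive window ends
    sta = (csum[ends] - csum[ends - n_sta]) / n_sta
    lta = (csum[ends] - csum[ends - n_lta]) / n_lta
    ratio = np.zeros(len(e))
    ratio[ends - 1] = sta / np.maximum(lta, 1e-12)
    return ratio


def first_pick(ratio, threshold):
    """Index of the first sample whose STA/LTA ratio exceeds the threshold."""
    hits = np.flatnonzero(ratio > threshold)
    return int(hits[0]) if hits.size else None
```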
An Image Deblurring Method Based on a GAN with an Adaptive Residual Dense Network
李寻寻;李旭健
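[Journal] Computer Simulation (计算机仿真)
[Year (Vol.), Issue] 2022(39)11
[Abstract] To address image blur caused by object motion or camera shake, a GAN-based image deblurring method using an adaptive residual dense network is proposed.
An adaptive residual dense network forms the core of the generator, and residual dense blocks strengthen the network's feature extraction.
On the one hand, under constraints such as adversarial loss and content loss, adversarial training of the generator and discriminator produces deblurred images with rich detail and texture.
On the other hand, adaptive residual dense blocks and skip connections are used for residual learning, which addresses the vanishing-gradient problem and increases the network's sensitivity to weight changes, so that the network trains faster, converges better, and produces sharper images.
Qualitative and quantitative comparisons with existing deblurring methods show that the proposed method deblurs better on the GOPRO dataset, with both structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) exceeding those of the compared methods.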
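PSNR, one of the two reported metrics, is 10 log10(MAX^2 / MSE), where MAX is the largest possible pixel value. A minimal sketch, assuming 8-bit images (MAX = 255):

```python
import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```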
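[Pages] 6 (pp. 457-462)
[Authors] 李寻寻; 李旭健
[Affiliation] College of Computer Science and Engineering, Shandong University of Science and Technology
[Language] Chinese
[CLC number] TP317.4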
Internet GIS Application Framework for Location-Based Services Development
Dragan Stojanovic, Slobodanka Djordjevic-Kajan
Faculty of Electronic Engineering, University of Nis
Beogradska 14, 18000 Nis, Yugoslavia
Email: {dragans, sdjordjevic}@elfak.ni.ac.yu
Abstract
This paper describes how an Internet GIS application framework supports the design and development of location-based services. The definition, main characteristics and functions of location-based services are presented, and Internet GIS, as their core component, is described through an overview of possible architectures and development methodologies. iSTOMM, a framework for Internet GIS application development, is introduced as a collection of spatio-temporal data modeling and management tools and techniques for designing and developing location-based services. The power of the spatio-temporal concepts incorporated in the 3-tier information system architecture of iSTOMM has been verified through the development of a Yellow Pages service, and the framework is being further refined and improved through the development of more sophisticated location-based services.
1. Introduction
At the turn of the millennium, the Internet and the Web are tremendously changing every aspect of our lives. Communication with business partners, commercial transactions, buying and selling goods and services, sharing and exchanging ideas and information, learning, software development, and many other everyday business and leisure activities are increasingly conducted through specialized Internet/Web-based information systems popularly named e-business, e-commerce, e-learning, e-medicine, e-everything, etc.
The open infrastructure, open and public standards, and decentralized architecture are responsible for the success and penetration of Internet and Web technology in everyday life and business. The exponential growth of the Internet and the global connectivity reached in the last few years have had a great impact on the requirements of contemporary and next-generation information systems, whose fundamental characteristics include efficient data access, delivery over the Web, heterogeneity, and interoperability. The primary focus of Internet/Web use has moved from mass distribution and presentation of public information to the distribution of software services over intranets, extranets, and the Internet [1]. Next-generation information systems will be assembled from specialized Web services (components): self-contained, self-describing, modular applications that can be published, located, and invoked across the Web using a wide spectrum of Web-enabled stationary devices (desktops, workstations, Web TV) and mobile devices (PDAs, mobile phones, laptops, handheld computers, etc.). The Internet/Web technology wave has also reached Geographic Information Systems (GIS) research and development. The integration of GIS and Internet technologies allows GIS developers to provide access to geo-information and geo-processing without burdening end users with complicated and expensive software and dedicated hardware. The recent convergence of multiple information and communication technologies, including the Internet, wireless communications, mobile position determination, portable Internet-enabled devices, and GIS, has given rise to a new class of location-based applications and services. Location-based services deliver geographic information and geo-processing power to mobile and/or static users via the Internet and/or wireless networks.
Developing GIS as an integral component of location-based services remains a complex task, so the availability of an Internet GIS application framework is crucial for achieving greater scalability, reliability, and fast time-to-market.
The rest of the paper is organized as follows. Section 2 reviews the main concepts and architecture of location-based services and their main component, Internet GIS. Section 3 presents the Internet GIS application framework, giving an overview of the spatio-temporal object model it is based on and describing its architecture and functional components. Section 4 presents a location-based Yellow Pages service developed on top of the framework. Section 5 concludes the paper and outlines future research.
2. Internet GIS and Location Based Services
Over the last few years, GIS research and development has undergone a paradigm shift away from pure desktop GIS solutions, which had to be operated by experts who delivered results, often on paper, to non-experts. Now, with Internet, intranet, and extranet-based GIS, anyone can use some kind of GIS and mapping system. Minimal browser technology enables users to zoom into their data, explore and analyze it, and produce a report that can be pasted into any office application. Tomorrow's (or even today's) GIS software "scene" will increasingly consist of data manipulation and spatial analysis tools designed as Web-based components or applications, operating on geo-data in a distributed, Internet/Web-based environment [2].
With the emergence of the Internet and the Web, the design and development of contemporary and next-generation information systems face new challenges, opportunities, and requirements. The first generation of Internet information systems was mostly static, giving users a limited, weakly interactive view of database information generated by Web servers.
The contemporary architecture of information systems is distributed over multiple tiers (usually client, application server, and data server tiers) and assembled from application components that are self-contained, self-describing, modular applications (services). Such components can be published, located, and invoked across the Web using computing devices of all sizes, from mainframes to PDAs and mobile phones [3]. Accordingly, there are two basic approaches to developing and deploying GIS, or any other complex data-driven application, on the Internet: as server-side or as client-side applications [4].
In a server-side Internet GIS application, a Web browser is used to generate server requests and display the results. An Internet GIS server usually combines a standard Web (HTTP) server and a GIS application server, and the GIS databases and functionality reside entirely on the server(s). The user's interaction in the Web browser becomes a request that is sent to the Web server. The Web server passes the request to the GIS application server, which runs the GIS application software, generates a map graphic, converts it to a Web format, wraps the image in HTML, and sends it back to the Web server, which returns it to the client as a standard Web page. Server-side applications can comply with Internet standards, because all the complex and proprietary software, as well as the GIS databases, reside on a server administered by the deploying organization. The disadvantages of server-side solutions are primarily poor performance and a limited user interface and interaction.
The future belongs to client-side, 3-tier Internet GIS applications, especially for intranet and extranet solutions dedicated to providing full GIS analysis and management support to specific users in the business, government, or public utility sectors.
In client-side Internet GIS, the client is enhanced to support GIS operations, while the middle tier, represented by an application server, is populated with application logic (figure 1). In such systems either a substantial amount of GIS functionality is moved to the client, or only the user interface is enhanced slightly to enable specific user interactions. Depending on the degree of functionality the client possesses, the OpenGIS Consortium has developed a model that classifies Internet GIS clients by the "thickness" of their built-in portrayal service [5]:
§ Thin clients (raster images only: JPG and PNG)
§ Medium clients (graphic primitives: WebCGM and SVG)
§ Thick clients (data as simple features in XML/GML, processed on the client side)
Client-side solutions are typically implemented by augmenting the Web browser with Java applets, ActiveX components, or plug-ins, although some require users to install a complete client application. Their primary advantages are the ability to enhance user interfaces, improve performance, and implement advanced solutions using both raster and vector data.
Figure 1. Client-side Internet GIS architecture (3-tier)
The main problems with client-side solutions relate to distributing software and data. Distributing software (Java, ActiveX, or any other type) is still problematic because of portability issues and platform incompatibilities. For the distribution of geo-data and geo-services, ISO/TC 211 [6] and OpenGIS [5, 7] are working on open standards for interoperability within the geo-information infrastructure, which must be the foundation of contemporary Internet-based GIS.
Their standardization activities require consensus on aspects of geo-information such as the geometric model, the description of geo-data sets and geo-services, access to and querying of metadata and the return of query results, and the selection, formatting, and transfer of geo-data sets.
The recent convergence of network computing and wireless telecommunications with Internet-based spatial technologies is giving rise to a new class of location-based applications and services. Location-based services are Internet GIS applications that, according to the location of the user (or some requested location), deliver geo-data and geo-processing from GIS servers across the Internet/Web for use on a wide spectrum of Web-enabled stationary and mobile devices [8]. Their value lies in assisting stationary and mobile users in day-to-day situations, such as finding the shortest route from their location to the nearest place of interest, or sending emergency services to their current location in the case of an accident. The true power of location-based services lies in delivering GIS functionality and location-based information across fixed and mobile Internet-based networks, to be used by anyone, anywhere, at any time, and on any device.
Location is central to how people organize and relate to their world. Knowing the location of people, objects, and phenomena at any time, within end-user applications that are aware of the user's position and are delivered through a wide range of devices via the Internet and wireless networks, brings invaluable benefits to the business, consumer, and government sectors. According to OpenGIS OpenLS [9], a location is a position in space and time that can be measured and whose coordinates can be derived in a particular spatial and temporal reference system.
However, in the current state of location-based services technology, the temporal aspect of geographic location is mostly neglected, or only partially integrated within recent research, development, and standardization initiatives and solutions. That omission is inherited from a GIS community that has not yet fully explored the potential of managing the temporal aspect of geo-information [11].
The architecture of location-based services consists of three main parts (figure 2) [9]:
§ Positioning of mobile terminals, based either on GSM/GPRS/UMTS mobile communication systems or on GPS/GLONASS/Galileo satellite positioning systems.
§ A wireless communication network based on GSM/GPRS/UMTS.
§ Internet/Web GIS that provides spatio-temporal data and services over the Web.
Figure 2. Location based services concept and architecture (adapted from [7])
The GIS (Location) Information Server provides access to location data sources distributed over the Internet, which have different structures (DBMS, files, etc.) and are stored in different proprietary or open formats (SHP, MIF, XML, GML, GIF, PNG, etc.). Location application services are application components integrated within GIS (Location) Application Servers; they operate on location content and provide value-added services to two main groups of clients: wireless and wired. Gateway Services integrate the existing wireless-IP platforms maintained by mobile communication operators with Location Application Servers and Location Service Clients. Such clients run on either fixed (desktop) or mobile terminals, interface directly with users/customers, and, depending on the processing and graphics capabilities of the target device, offer various levels of functionality and interactivity ("thickness").
Three generations of location-based services have been identified [10].
First-generation services were limited to stationary desktop computers with wired connections to the Internet/Web and correspond to today's mature Internet/Web GIS applications. They require users to enter their location (or a location of interest) manually as a place name, street address, postal code, telephone number, or geographic coordinates, together with appropriate temporal information. Second-generation location services, available today, can determine rough locations, typically at the postal-code level. Using a device such as the Palm VII to access the PalmNet data network, a mobile user can find restaurants or gas stations in order of proximity, or get travel directions. Third-generation location services are more location-aware: they exploit more precise positional information and can initiate services proactively based on location. These trigger-mode services can notify users of relevant events or conditions without their active participation, such as traffic alerts that match the user's preset preferences. Three types of triggers can be identified:
§ Object triggers notify the user of a mobile device when it comes within a predefined distance of a facility.
§ Object-temporal triggers add the dimension of time.
§ Affinity triggers allow one mobile device to know the location of another mobile device.
Although considerable attention in location-based services technology has been paid to its constituent technologies, such as the wireless Web, mobile Internet-enabled devices, and mobile positioning, the heart of the whole system is Internet-enabled GIS technology. Internet GIS within location-based services provides access to spatio-temporal information and to GIS components dedicated to processing it, based on the location of the user or the locations of mobile or static features of interest to the user.
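The object and object-temporal triggers described above can be sketched as distance checks. The spherical haversine distance, the radius, and the time-window test are assumptions chosen for illustration; the paper does not specify how distances are computed. Because an object trigger fires on entering the zone, the hypothetical `entered` helper compares the previous and current positions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(device, facility, radius_m):
    """True if the device (lat, lon) is within radius_m of the facility."""
    return haversine_m(*device, *facility) <= radius_m


def entered(prev_pos, cur_pos, facility, radius_m):
    """Object trigger: fire only on the transition from outside to inside."""
    return (not within(prev_pos, facility, radius_m)
            and within(cur_pos, facility, radius_m))


def object_temporal_trigger(prev_pos, cur_pos, facility, radius_m, now, start, end):
    """Object trigger restricted to the time window [start, end]."""
    return start <= now <= end and entered(prev_pos, cur_pos, facility, radius_m)
```

An affinity trigger would follow the same pattern with a second device's reported position in place of the fixed facility.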
The Internet GIS technology behind these services will enable an increasingly diverse range of applications, putting ever more valuable information and processing into the hands of mobile and stationary users wherever and whenever it is needed. However, to design and develop an Internet GIS application, deploy it over multiple tiers within a wired/wireless Internet-based network, and provide and sell it to end users as a valuable location-based service, the developer must be supported by an appropriate Internet GIS application framework during all phases of the development process.
3. iSTOMM - Internet GIS application framework
The ARGONAUT project at the Computer Graphics and GIS Lab, University of Nis, aims to develop a suite of spatio-temporal data modeling and management technologies and tools integrated within a temporal GIS application framework, STOMM (Spatio-Temporal Object Modeling and Management) [11]. STOMM is based on an object-oriented spatio-temporal data model and is intended to support all stages of the development of a temporal GIS application, from conceptual design through implementation. The STOMM data model was developed according to the ISO/TC 211 [6, 12] and OpenGIS [7] specifications and standards for modeling and processing geographic information. STOMM is the successor of the SPATEMP system [13], but it is based on the new STOMM data model, extended with more spatio-temporal functionality consistent with ongoing standardization, and redesigned and redeveloped following a component-based development approach.
Based on this general framework, two architecture-oriented application frameworks, for desktop and Internet-based temporal GIS applications (dSTOMM and iSTOMM, respectively), have been implemented and have already found valuable commercial uses [16]. The primary purpose of the iSTOMM framework is to add significant GIS functionality to the location-based services developed around it.
It provides a wide variety of spatio-temporal services that operate on spatio-temporal information arising from different information sources via the Internet infrastructure and from mobile terminal positions. The iSTOMM framework also supports displaying the results of such operations on mobile or stationary Web terminals as maps, reports, and messages, and/or triggering specific alerts and events.
3.1 STOMM data model
The STOMM data model supports all the important concepts of object-oriented modeling theory and can be used at the conceptual and logical modeling levels. It is based on extensible spatial and temporal class hierarchies. The basis of the STOMM modeling framework is the GeographicObject class, used for modeling geographic features. A GeographicObject represents some entity or phenomenon of the real world, so it must have an actual or potential position in space, represented by the SpatialObject class, and in time, represented by the TemporalObject class. The STOMM data model lets users design a temporal GIS application by defining application-specific classes that represent real-world entities (cities, forests, railways, telecom cables) through inheritance from the GeographicObject class or one of its subclasses already designed for a specific application domain. Spatial properties of geographic features are expressed by their geometry, topology, and cartographic presentation. In the STOMM data model, spatial properties are specified through the SpatialObject class (figure 3).
Figure 3. GeographicObject class and its spatial property - SpatialObject class
The geometric description of a SpatialObject is defined through the Geometry class (figure 4). For the Geometry class hierarchy, we decided to adopt the standardization work of the OpenGIS Consortium on simple features and their geometries [7] and the ISO/TC 211 work on the spatial schema [6].
We based our work on the close correspondence between these specifications regarding the geometric properties of geographic features, which differ mainly in their level of abstraction. Our decision was also supported by the recent adoption of the OpenGIS Simple Feature Specification by ISO/TC 211 as a Draft International Standard [12]. Geometry classes describe specific geometries through attributes (collections of points representing different geometric primitives) and through class methods representing geometric and topological operations and relations. Coordinate values of geometric classes have real-world meaning within a spatial referencing framework provided by the SpatialReferenceSystem class associated with the Geometry class.
Figure 4. Cartographic presentation of spatial objects - Cartography class
The cartographic presentation of spatial objects and the visualization of their properties are defined through the Cartography class (figure 4), which clearly separates geometry from its cartographic presentation. The Cartography class contains attributes (symbols, font, line width, color, etc.) and operations that determine how the properties of a given spatial object (geometry and topology) are represented in terms of different parameters (scale, graphics output, output media, etc.). The operations defined in the SpatialObject class for expressing topological relations, direction relations, and metric operators form the foundation for defining and specifying the spatial part of the spatio-temporal query language STOQL [14]. Temporal properties of features modeled with the STOMM data model are incorporated using the TemporalObject class.
The TimeStamp class hierarchy defines the primitive temporal classes that can be used for timestamping: instant (Instant class), period (Period class), and homogeneous (MultiInstant and MultiPeriod classes) and heterogeneous (TimeStampCollection class) collections of timestamps (figure 5). The TimeStamp class is associated with the TemporalReferenceSystem class, which defines the time type (UTC, GPS, or arbitrary time) and an offset from UTC in hours and minutes. The TemporalObject class includes operations for temporal topology (before, after, during, meet, contain, etc.) and temporal metrics (duration, union, intersection, difference, etc.) as defined in [11, 15]. These operations form the basis for defining and specifying the temporal part of the spatio-temporal query language STOQL.
Figure 5. TemporalObject class and TimeStamp class hierarchy
The Evolution class is defined and associated with the TemporalObject class (figure 6). Thus every instance of a specific temporal class, i.e. a class inherited from TemporalObject, holds a reference to the corresponding Evolution object. An instance of the Evolution class relates to an ordered set of instances of the TemporalObject class (or its application-specific subclasses), which are the versions of the same temporal object. Through its operations, the Evolution class records the changes of a temporal object and its lifespan, and provides references to the previous, next, first, last, or current version of the object, as well as to the versions valid at a specific timestamp.
Figure 6. Evolution and TemporalObject classes
Classes of objects that are considered temporal must be defined through inheritance from the TemporalObject class. The temporal object modeling methodology proposed here therefore corresponds to object timestamping by valid time.
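The TimeStamp and Evolution concepts can be sketched in Python. STOMM itself is not a Python library; the `Period` and `Evolution` classes below only mirror the operations named in the text (duration, before, meet, during, intersection, and the version valid at a timestamp), and the half-open period convention is an assumption.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Valid-time period [start, end); half-open endpoints are an assumption."""
    start: float
    end: float

    def duration(self):
        return self.end - self.start

    def before(self, other):        # ends strictly before the other starts
        return self.end < other.start

    def meets(self, other):         # ends exactly where the other starts
        return self.end == other.start

    def during(self, other):        # lies within the other (non-strict)
        return other.start <= self.start and self.end <= other.end

    def intersection(self, other):
        start, end = max(self.start, other.start), min(self.end, other.end)
        return Period(start, end) if start < end else None


class Evolution:
    """Time-ordered versions of one temporal object."""

    def __init__(self):
        self._starts = []
        self._versions = []

    def add_version(self, valid_from, version):
        if self._starts and valid_from <= self._starts[-1]:
            raise ValueError("versions must be added in increasing time order")
        self._starts.append(valid_from)
        self._versions.append(version)

    def first(self):
        return self._versions[0]

    def last(self):
        return self._versions[-1]

    def version_at(self, t):
        """Version valid at time t: the latest one starting at or before t."""
        i = bisect_right(self._starts, t) - 1
        return self._versions[i] if i >= 0 else None
```

Keeping the version start times sorted lets `version_at` use a binary search instead of scanning every version.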
However, any attribute of such a temporal class whose temporal behavior differs from that of the rest of the class can be modeled as a single attribute of a separate class inherited from the TemporalObject class. Thus, the temporal part of the STOMM data model can be considered object-attribute timestamping. Geographic features whose thematic (non-spatial) properties change over time are modeled by the TGeographicObject class, through multiple inheritance from the TemporalObject and GeographicObject classes (figure 7). The TGeographicObject class represents geographic features whose thematic properties change over time, but whose spatial properties, specified in the SpatialObject class, do not change.

Figure 7. TGeographicObject class

To model geographic features whose spatial properties change over time, the SpatioTemporalObject class is defined within the STOMM data model through inheritance from the SpatialObject and TemporalObject classes (figure 8). The SpatioTemporalObject class defines spatio-temporal operations as an extension of the spatial operations defined in the SpatialObject class, with temporal information added to the operations' parameter lists. The topological and direction operations come in two forms. Operations of the first form return a boolean value indicating whether a specific spatial relation holds between spatio-temporal objects at a specific timestamp. Operations of the second form return the set of time intervals during which the spatio-temporal objects satisfy the specified relation. Metric operators simply return the value of a metric characteristic (length, area, etc.) of a spatio-temporal object at a specified time instant, or a set of values for a time period or timestamp collection.

Figure 8. Modeling changes of spatial properties - SpatioTemporalObject class

Thus, within the STOMM data model, four different types of geographic features can be modeled according to their behavior in time, i.e.
whether their spatial and/or thematic properties change over time.

3.2 iSTOMM architecture and components

The iSTOMM architecture complies with the ISO/ODP 10746 guidelines for open distributed systems and, like the generic STOMM, adheres to the standardisation work within OpenGIS and ISO/TC 211 related to the interoperability of spatial data and services. Development of iSTOMM is based on up-to-date Internet information technologies such as Java, XML/GML and CORBA.

The iSTOMM application framework supports a 3-tier Internet GIS architecture (figure 9), consisting of:
- Spatio-temporal information server;
- Spatio-temporal application server and application components;
- Spatio-temporal service client components.

Figure 9. iSTOMM architecture and components

The figure also shows the information and distributed component technologies used to implement the iSTOMM components and the communication between different components and tiers. The iSTOMM information server is based on an implementation of the STOMM data model within commercial relational, object-relational and object-oriented DBMSs (Oracle 8i, SQL Server, MS Access, etc.). The iSTOMM application server consists of GIS application components, developed on the CORBA and EJB distributed component platforms, which serve as building blocks of user (application developer)-defined temporal GIS applications. The main component of the iSTOMM application server is the iSTOMM kernel, which provides spatio-temporal object management and integration, including mechanisms for updating, indexing, mediation and wrapping between different data models, etc.
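One way to store versioned STOMM objects in a relational DBMS, as the information server does, is a valid-time table in which each row is one version of an object. The sketch below uses Python's built-in `sqlite3` module purely for illustration. It assumes a hypothetical table `object_version` and hypothetical helper functions; points are kept as plain `x`/`y` columns instead of a DBMS spatial type such as those offered by Oracle8i, and valid time is a plain number with an inclusive start and exclusive end.

```python
import sqlite3
from typing import List, Tuple


def create_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical valid-time table: one row per object version."""
    conn.execute(
        """CREATE TABLE object_version (
               object_id  INTEGER NOT NULL,
               version_no INTEGER NOT NULL,
               valid_from REAL    NOT NULL,   -- inclusive start of valid time
               valid_to   REAL    NOT NULL,   -- exclusive end of valid time
               name       TEXT,
               x REAL, y REAL,                -- point location (no spatial type)
               PRIMARY KEY (object_id, version_no))""")


def add_version(conn: sqlite3.Connection, object_id: int, valid_from: float,
                valid_to: float, name: str, x: float, y: float) -> int:
    """Append a new version; version numbers count up per object from 1."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM object_version WHERE object_id = ?",
        (object_id,)).fetchone()
    conn.execute(
        "INSERT INTO object_version VALUES (?, ?, ?, ?, ?, ?, ?)",
        (object_id, count + 1, valid_from, valid_to, name, x, y))
    return count + 1


def snapshot_at(conn: sqlite3.Connection,
                t: float) -> List[Tuple[int, str, float, float]]:
    """Timeslice: every object version whose valid time contains instant t."""
    return conn.execute(
        "SELECT object_id, name, x, y FROM object_version "
        "WHERE valid_from <= ? AND ? < valid_to ORDER BY object_id",
        (t, t)).fetchall()
```

A call such as `snapshot_at(conn, t)` is the relational counterpart of asking each object's Evolution for the version valid at `t`; an index on `(valid_from, valid_to)` would be one obvious refinement for larger data sets.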
iSTOMM application components are software components for advanced manipulation of spatio-temporal objects, such as:
- Processing of spatio-temporal-thematic queries, visually specified according to various spatial, thematic and temporal criteria through the visual spatio-temporal object-oriented query language iSTOQL;
- Visualisation and animation of spatio-temporal objects through iSTOVAM (iSTO Visualization and Animation Manager);
- Conversion of spatio-temporal objects from the STOMM object format to XML/GML or specific proprietary spatial data formats, and vice versa;
- Spatio-temporal analysis, reasoning and data mining (only generic support for appropriate user-defined components).

The iSTOMM application server provides architectural extensibility, enabling user-defined application components to be integrated into its general architecture. The components mentioned above can also be customised, extended or replaced, as long as the new components reside on the iSTOMM kernel.

The iSTOMM client contains GUI components, developed on JavaBeans and CORBA technology, that provide access to the iSTOMM application server functionality. These components are integrated with a user-defined location-based service client, executing within a Web browser on a mobile or stationary user terminal, or as a standalone application.

The main purpose of the iSTOMM application framework is the design and development of location-based services for the business, customer and government sectors. Currently, two location-based services based on iSTOMM are under development, intended for:
- Yellow pages directories;
- Vehicle navigation, tracking and guidance, which particularly exploits the full value of the iSTOMM temporal dimension.

4. Development of iSTOMM based "Yellow pages" service

We have based the implementation of iSTOMM primarily on contemporary object-relational DBMSs such as Oracle8i, which provide built-in support for spatial data management as well as the specification of user-defined data types.
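The conversion component listed among the iSTOMM application components, which maps STOMM objects to and from XML/GML, can be illustrated for the simplest case. The following is a minimal sketch, assuming a single 2-D point encoded as a GML 2 `gml:Point` with a `gml:coordinates` child in the `http://www.opengis.net/gml` namespace; the function names are hypothetical, the `srsName` and coordinate values in the usage are illustrative only, and the temporal part of a STOMM object (its valid time and versions) is not encoded here.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

GML_NS = "http://www.opengis.net/gml"  # GML 2.x namespace URI
ET.register_namespace("gml", GML_NS)   # serialise with the "gml:" prefix


def point_to_gml(x: float, y: float, srs_name: str) -> str:
    """Encode a 2-D point as a GML 2 <gml:Point> element string."""
    point = ET.Element(f"{{{GML_NS}}}Point", {"srsName": srs_name})
    coords = ET.SubElement(point, f"{{{GML_NS}}}coordinates")
    coords.text = f"{x},{y}"  # GML 2 default: ',' separates the ordinates
    return ET.tostring(point, encoding="unicode")


def gml_to_point(text: str) -> Tuple[float, float, str]:
    """Decode a <gml:Point> string back into (x, y, srsName)."""
    point = ET.fromstring(text)
    if point.tag != f"{{{GML_NS}}}Point":
        raise ValueError(f"expected gml:Point, got {point.tag}")
    coords = point.find(f"{{{GML_NS}}}coordinates")
    if coords is None or not coords.text:
        raise ValueError("gml:Point has no gml:coordinates")
    x_str, y_str = coords.text.strip().split(",")
    return float(x_str), float(y_str), point.get("srsName", "")
```

For example, `gml_to_point(point_to_gml(572000.5, 4796000.25, "EPSG:32634"))` returns the original coordinates and reference system name, so the two functions form a lossless round trip for this restricted case.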
Access to other commercially available DBMSs is provided through the JDBC/ODBC standards. Implementation of the iSTOMM application components is based on the Java programming language, as well as the CORBA and EJB distributed computing platforms. iSTOMM client GUI components are implemented either as CORBA clients or as JavaBeans components.

As a proof of the concepts built into the iSTOMM application framework, we have developed a prototype location-based Yellow pages service for the city of Nis and its surroundings. Due to the limited mobile communication infrastructure provided by Yugoslav mobile operators, and the unavailability of mobile positioning services within their centers, our location-based service relies on wired Internet access. The Yellow pages application is based on geographic and thematic data collected within the GinisWeb prototype [17], representing city streets and street numbers, as well as different types of business and public organisations (schools, bus and train stations, faculties, cultural