FORMALIZED MATHEMATICSVolume11,Number4,2003University of BiałystokBanach Space of Absolute SummableReal SequencesYasumasa Suzuki Take,Yokosuka-shiJapanNoboru EndouGifu National College of Technology Yasunari ShidamaShinshu UniversityNaganoSummary.A continuation of[5].As the example of real norm spaces, we introduce the arithmetic addition and multiplication in the set of absolutesummable real sequences and also introduce the norm.This set has the structureof the Banach space.MML Identifier:RSSPACE3.The notation and terminology used here are introduced in the following papers:[14],[17],[4],[1],[13],[7],[2],[3],[18],[16],[10],[15],[11],[9],[8],[12],and[6].1.The Space of Absolute Summable Real SequencesThe subset the set of l1-real sequences of the linear space of real sequences is defined by the condition(Def.1).(Def.1)Let x be a set.Then x∈the set of l1-real sequences if and only if x∈the set of real sequences and id seq(x)is absolutely summable.Let us observe that the set of l1-real sequences is non empty.One can prove the following two propositions:(1)The set of l1-real sequences is linearly closed.(2) the set of l1-real sequences,Zero(the set of l1-real sequences,the linearspace of real sequences),Add(the set of l1-real sequences,the linear space377c 2003University of BiałystokISSN1426–2630378yasumasa suzuki et al.of real sequences),Mult(the set of l1-real sequences,the linear space ofreal sequences) is a subspace of the linear space of real sequences.One can check that the set of l1-real sequences,Zero(the set of l1-real sequences,the linear space of real sequences),Add(the set of l1-real sequences,the linear space of real sequences),Mult(the set of l1-real sequences,the linear space of real sequences) is Abelian,add-associative,ri-ght zeroed,right complementable,and real linear space-like.One can prove the following proposition(3) the set of l1-real sequences,Zero(the set of l1-real sequences,the linearspace of real sequences),Add(the set of l1-real sequences,the linear spaceof real sequences),Mult(the set of l1-real sequences,the linear space ofreal sequences) is a real linear space.The function norm seq from the set of l1-real sequences into R is defined by: (Def.2)For every set x such that x∈the set of l1-real sequences holds norm seq(x)= |id seq(x)|.Let X be a non empty set,let Z be an element of X,let A be a binary operation on X,let M be a function from[:R,X:]into X,and let N be a function from X into R.One can check that X,Z,A,M,N is non empty.Next we state four propositions:(4)Let l be a normed structure.Suppose the carrier of l,the zero of l,theaddition of l,the external multiplication of l is a real linear space.Thenl is a real linear space.(5)Let r1be a sequence of real numbers.Suppose that for every naturalnumber n holds r1(n)=0.Then r1is absolutely summable and |r1|=0.(6)Let r1be a sequence of real numbers.Suppose r1is absolutely summableand |r1|=0.Let n be a natural number.Then r1(n)=0.(7) the set of l1-real sequences,Zero(the set of l1-real sequences,the linearspace of real sequences),Add(the set of l1-real sequences,the linear spaceof real sequences),Mult(the set of l1-real sequences,the linear space ofreal sequences),norm seq is a real linear space.The non empty normed structure l1-Space is defined by the condition (Def.3).(Def.3)l1-Space= the set of l1-real sequences,Zero(the set of l1-real sequences,the linear space of real sequences),Add(the set of l1-realsequences,the linear space of real sequences),Mult(the set of l1-realsequences,the linear space of real 
sequences),norm seq .banach space of absolute summable (379)2.The Space is Banach SpaceOne can prove the following two propositions:(8)The carrier of l1-Space=the set of l1-real sequences and for every set xholds x is an element of l1-Space iffx is a sequence of real numbers andid seq(x)is absolutely summable and for every set x holds x is a vectorof l1-Space iffx is a sequence of real numbers and id seq(x)is absolutelysummable and0l1-Space=Zeroseq and for every vector u of l1-Space holdsu=id seq(u)and for all vectors u,v of l1-Space holds u+v=id seq(u)+id seq(v)and for every real number r and for every vector u of l1-Spaceholds r·u=r id seq(u)and for every vector u of l1-Space holds−u=−id seq(u)and id seq(−u)=−id seq(u)and for all vectors u,v of l1-Spaceholds u−v=id seq(u)−id seq(v)and for every vector v of l1-Space holdsid seq(v)is absolutely summable and for every vector v of l1-Space holdsv = |id seq(v)|.(9)Let x,y be points of l1-Space and a be a real number.Then x =0iffx=0l1-Space and0 x and x+y x + y and a·x =|a|· x .Let us observe that l1-Space is real normed space-like,real linear space-like, Abelian,add-associative,right zeroed,and right complementable.Let X be a non empty normed structure and let x,y be points of X.The functorρ(x,y)yields a real number and is defined by:(Def.4)ρ(x,y)= x−y .Let N1be a non empty normed structure and let s1be a sequence of N1.We say that s1is CCauchy if and only if the condition(Def.5)is satisfied. (Def.5)Let r2be a real number.Suppose r2>0.Then there exists a natural number k1such that for all natural numbers n1,m1if n1 k1and m1k1,thenρ(s1(n1),s1(m1))<r2.We introduce s1is Cauchy sequence by norm as a synonym of s1is CCauchy.In the sequel N1denotes a non empty real normed space and s2denotes a sequence of N1.We now state two propositions:(10)s2is Cauchy sequence by norm if and only if for every real number rsuch that r>0there exists a natural number k such that for all naturalnumbers n,m such that n k and m k holds s2(n)−s2(m) <r.(11)For every sequence v1of l1-Space such that v1is Cauchy sequence bynorm holds v1is convergent.References[1]Grzegorz Bancerek.The ordinal numbers.Formalized Mathematics,1(1):91–96,1990.[2]Czesław Byliński.Functions and their basic properties.Formalized Mathematics,1(1):55–65,1990.380yasumasa suzuki et al.[3]Czesław Byliński.Functions from a set to a set.Formalized Mathematics,1(1):153–164,1990.[4]Czesław Byliński.Some basic properties of sets.Formalized Mathematics,1(1):47–53,1990.[5]Noboru Endou,Yasumasa Suzuki,and Yasunari Shidama.Hilbert space of real sequences.Formalized Mathematics,11(3):255–257,2003.[6]Noboru Endou,Yasumasa Suzuki,and Yasunari Shidama.Real linear space of real sequ-ences.Formalized Mathematics,11(3):249–253,2003.[7]Krzysztof Hryniewiecki.Basic properties of real numbers.Formalized Mathematics,1(1):35–40,1990.[8]Jarosław Kotowicz.Monotone real sequences.Subsequences.Formalized Mathematics,1(3):471–475,1990.[9]Jarosław Kotowicz.Real sequences and basic operations on them.Formalized Mathema-tics,1(2):269–272,1990.[10]Jan Popiołek.Some properties of functions modul and signum.Formalized Mathematics,1(2):263–264,1990.[11]Jan Popiołek.Real normed space.Formalized Mathematics,2(1):111–115,1991.[12]Konrad Raczkowski and Andrzej Nędzusiak.Series.Formalized Mathematics,2(4):449–452,1991.[13]Andrzej Trybulec.Subsets of complex numbers.To appear in Formalized Mathematics.[14]Andrzej Trybulec.Tarski Grothendieck set theory.Formalized Mathematics,1(1):9–11,1990.[15]Wojciech A.Trybulec.Subspaces and cosets of 
subspaces in real linear space. Formalized Mathematics, 1(2):297–301, 1990.
[16] Wojciech A. Trybulec. Vectors in real linear space. Formalized Mathematics, 1(2):291–296, 1990.
[17] Zinaida Trybulec. Properties of subsets. Formalized Mathematics, 1(1):67–71, 1990.
[18] Edmund Woronowicz. Relations and their basic properties. Formalized Mathematics, 1(1):73–83, 1990.
Received August 8, 2003
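For orientation, the object formalized above is the classical sequence space ℓ¹; an informal restatement of (Def. 1), (Def. 2) and of propositions (9) and (11) in standard notation (not part of the Mizar text) is:

```latex
% Informal restatement of l1-Space in standard notation (not part of the Mizar article).
\[
  \ell^{1} = \Bigl\{\, x=(x_n)_{n\in\mathbb{N}} \subset \mathbb{R} \;:\; \sum_{n=0}^{\infty} |x_n| < \infty \,\Bigr\},
  \qquad
  \|x\|_{1} = \sum_{n=0}^{\infty} |x_n| .
\]
% (Def.1) and (Def.2) introduce this set and norm; propositions (9) and (11) state that
% \|\cdot\|_{1} satisfies the norm axioms and that every Cauchy sequence in
% (\ell^{1}, \|\cdot\|_{1}) converges, i.e. l1-Space is a Banach space.
```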
Deep Visual-Semantic Alignments for Generating Image DescriptionsAndrej Karpathy Li Fei-FeiDepartment of Computer Science,Stanford University{karpathy,feifeili}@AbstractWe present a model that generates free-form natural lan-guage descriptions of image regions.Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and vi-sual data.Our approach is based on a novel combination of Convolutional Neural Networks over image regions,bidi-rectional Recurrent Neural Networks over sentences,and a structured objective that aligns the two modalities through a multimodal embedding.We then describe a Recurrent Neu-ral Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions.We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K,Flickr30K and COCO datasets,where we substantially improve on the state of the art.We then show that the sentences created by our gen-erative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.1.IntroductionA quick glance at an image is sufficient for a human to point out and describe an immense amount of details about the vi-sual scene[8].However,this remarkable ability has proven to be an elusive task for our visual recognition models.The majority of previous work in visual recognition has focused on labeling images with afixed set of visual categories,and great progress has been achieved in these endeavors[36,6]. However,while closed vocabularies of visual concepts con-stitute a convenient modeling assumption,they are vastly restrictive when compared to the enormous amount of rich descriptions that a human can compose.Some pioneering approaches that address the challenge of generating image descriptions have been developed[22,7]. 
However,these models often rely on hard-coded visual con-cepts and sentence templates,which imposes limits on their variety.Moreover,the focus of these works has been on re-ducing complex visual scenes into a single sentence,which we consider as an unnecessaryrestriction.Figure1.Our model generates free-form natural language descrip-tions of image regions.In this work,we strive to take a step towards the goal of generating dense,free-form descriptions of images(Figure1).The primary challenge towards this goal is in the de-sign of a model that is rich enough to reason simultaneously about contents of images and their representation in the do-main of natural language.Additionally,the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely primarily on training data.The second,practical challenge is that datasets of im-age captions are available in large quantities on the internet [14,46,29],but these descriptions multiplex mentions of several entities whose locations in the images are unknown.Our core insight is that we can leverage these large image-sentence datasets by treating the sentences as weak labels, in which contiguous segments of words correspond to some particular,but unknown location in the image.Our ap-proach is to infer these alignments and use them to learna generative model of descriptions.Concretely,our contri-butions are twofold:•We develop a deep neural network model that in-fers the latent alignment between segments of sen-1tences and the region of the image that they describe.Our model associates the two modalities through a common,multimodal embedding space and a struc-tured objective.We validate the effectiveness of this approach on image-sentence retrieval experiments in which we surpass the state-of-the-art.•We introduce a multimodal Recurrent Neural Network architecture that takes an input image and generates its description in text.Our experiments show that the generated sentences significantly outperform retrieval-based baselines,and produce sensible qualitative pre-dictions.We then train the model on the inferred cor-respondences and evaluate its performance on a new dataset of region-level annotations.We make our code,data and annotations publicly available.2.Related WorkDense image annotations.Our work shares the high-level goal of densely annotating the contents of images with many works before us.Barnard et al.[1]and Socher et al.[38]studied the multimodal correspondence between words and images to annotate segments of images.Several works[26,12,9]studied the problem of holistic scene un-derstanding in which the scene type,objects and their spa-tial support in the image is inferred.However,the focus of these works is on correctly labeling scenes,objects and re-gions with afixed set of categories,while our focus is on richer and higher-level descriptions of regions. 
Generating textual descriptions.Multiple works have ex-plored the goal of annotating images with textual descrip-tions on the scene level.A number of approaches pose the task as a retrieval problem,where the most compatible annotation in the training set is transferred to a test image [14,39,7,34,17],or where training annotations are broken up and stitched together[23,27,24].However,these meth-ods rely on a large amount of training data to capture the variety in possible outputs,and are often expensive at test time due to their non-parametric nature.Several approaches have been explored for generating image captions based on fixed templates that arefilled based on the content of the im-age[13,22,7,43,44,4].This approach still imposes limits on the variety of outputs,but the advantage is that thefinal results are more likely to be syntactically correct.Instead of using afixed template,some approaches that use a gen-erative grammar have also been developed[33,45].More closely related to our approach is the work of Srivastava et al.[40]who use a Deep Boltzmann Machine to learn a joint distribution over a images and tags.However,they do not generate extended phrases.More recently,Kiros et al.[19] developed a log-bilinear model that can generate full sen-tence descriptions.However,their model uses afixed win-dow context,while our Recurrent Neural Network model can condition the probability distribution over the next word in the sentence on all previously generated words. Grounding natural language in images.A number of ap-proaches have been developed for grounding textual data in the visual domain.Kong et al.[20]develop a Markov Ran-dom Field that infers correspondences from parts of sen-tences to objects to improve visual scene parsing in RGBD images.Matuszek et al.[30]learn joint language and per-ception model for grounded attribute learning in a robotic setting.Zitnick et al.[48]reason about sentences and their grounding in cartoon scenes.Lin et al.[28]retrieve videos from a sentence description using an intermediate graph representation.The basic form of our model is in-spired by Frome et al.[10]who associate words and images through a semantic embedding.More closely related is the work of Karpathy et al.[18],who decompose images and sentences into fragments and infer their inter-modal align-ment using a ranking objective.In contrast to their model which is based on grounding dependency tree relations,our model aligns contiguous segments of sentences which are more meaningful,interpretable,and notfixed in length. 
Neural networks in visual and language domains.Mul-tiple approaches have been developed for representing im-ages and words in higher-level representations.On the im-age side,Convolutional Neural Networks(CNNs)[25,21] have recently emerged as a powerful class of models for image classification and object detection[36].On the sen-tence side,our work takes advantage of pretrained word vectors[32,15,2]to obtain low-dimensional representa-tions of words.Finally,Recurrent Neural Networks have been previously used in language modeling[31,41],but we additionally condition these models on images.3.Our ModelOverview.The ultimate goal of our model is to generate descriptions of image regions.During training,the input to our model is a set of images and their corresponding sen-tence descriptions(Figure2).Wefirst present a model that aligns segments of sentences to the visual regions that they describe through a multimodal embedding.We then treat these correspondences as training data for our multimodal Recurrent Neural Network model which learns to generate the descriptions.3.1.Learning to align visual and language data Our alignment model assumes an input dataset of images and their sentence descriptions.The key challenge to in-ferring the association between visual and textual data is that sentences written by people make multiple references to some particular,but unknown locations in the image.For example,in Figure2,the words“Tabby cat is leaning”referFigure 2.Overview of our approach.A dataset of images and their sentence descriptions is the input to our model (left).Our model first infers the correspondences (middle)and then learns to generate novel descriptions (right).to the cat,the words “wooden table”refer to the table,etc.We would like to infer these latent correspondences,with the goal of later learning to generate these snippets from image regions.We build on the basic approach of Karpa-thy et al.[18],who learn to ground dependency tree re-lations in sentences to image regions as part of a ranking objective.Our contribution is in the use of bidirectional recurrent neural network to compute word representations in the sentence,dispensing of the need to compute depen-dency trees and allowing unbounded interactions of words and their context in the sentence.We also substantially sim-plify their objective and show that both modifications im-prove ranking performance.We first describe neural networks that map words and image regions into a common,multimodal embedding.Then we introduce our novel objective,which learns the embedding representations so that semantically similar concepts across the two modalities occupy nearby regions of the space.3.1.1Representing imagesFollowing prior work [22,18],we observe that sentencedescriptions make frequent references to objects and their attributes.Thus,we follow the method of Girshick et al.[11]to detect objects in every image with a Region Convo-lutional Neural Network (RCNN).The CNN is pre-trained on ImageNet [3]and finetuned on the 200classes of the ImageNet Detection Challenge [36].To establish fair com-parisons to Karpathy et al.[18],we use the top 19detected locations and the whole image and compute the represen-tations based on the pixels I b inside each bounding box as follows:v =W m [CNN θc (I b )]+b m ,(1)where CNN (I b )transforms the pixels inside bounding box I b into 4096-dimensional activations of the fully connected layer immediately before the classifier.The CNN parame-ters θc contain approximately 60million parameters and the 
architecture closely follows the network of Krizhevsky et al [21].The matrix W m has dimensions h ×4096,where h is the size of the multimodal embedding space (h ranges from 1000-1600in our experiments).Every image is thus repre-sented as a set of h -dimensional vectors {v i |i =1...20}.3.1.2Representing sentencesTo establish the inter-modal relationships,we would like to represent the words in the sentence in the same h -dimensional embedding space that the image regions oc-cupy.The simplest approach might be to project every in-dividual word directly into this embedding.However,this approach does not consider any ordering and word context information in the sentence.An extension to this idea is to use word bigrams,or dependency tree relations as pre-viously proposed [18].However,this still imposes an ar-bitrary maximum size of the context window and requires the use of Dependency Tree Parsers that might be trained on unrelated text corpora.To address these concerns,we propose to use a bidirectional recurrent neural network (BRNN)[37]to compute the word representations.In our setting,the BRNN takes a sequence of N words (encoded in a 1-of-k representation)and trans-forms each one into an h -dimensional vector.However,the representation of each word is enriched by a variably-sized context around that ing the index t =1...N to denote the position of a word in a sentence,the precise form of the BRNN we use is as follows:x t =W w I t(2)e t =f (W e x t +b e )(3)h f t =f (e t +W f h ft −1+b f )(4)h b t =f (e t +W b h b t +1+b b )(5)s t =f (W d (h f t +h b t )+b d ).(6)Here,I t is an indicator column vector that is all zeros except for a single one at the index of the t -th word in a word vo-cabulary.The weights W w specify a word embedding ma-trix that we initialize with 300-dimensional word2vec [32]weights and keep fixed in our experiments due to overfitting concerns.Note that the BRNN consists of two independent streams of processing,one moving left to right (h f t )and theother right to left (h bt )(see Figure 3for diagram).The fi-nal h -dimensional representation s t for the t -th word is a function of both the word at that location and also its sur-rounding context in the sentence.Technically,every s t is a function of all words in the entire sentence,but our empir-Figure3.Diagram for evaluating the image-sentence score S kl. Object regions are embedded with a CNN(left).Words(enriched by their context)are embedded in the same multimodal space with a BRNN(right).Pairwise similarities are computed with inner products(magnitudes shown in grayscale)andfinally reduced to image-sentence score with Equation8.icalfinding is that thefinal word representations(s t)align most strongly to the visual concept of the word at that lo-cation(I t).Our hypothesis is that the strength of influence diminishes with each step of processing since s t is a more direct function of I t than of the other words in the sentence. 
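As a rough, self-contained sketch of Eqs. (1)–(6), the NumPy code below projects region CNN activations into the multimodal space and runs the bidirectional recurrence over a word sequence. The sizes, random initializations, and variable names are illustrative assumptions (the paper uses h between 1000 and 1600, 4096-dimensional CNN activations, and fixed word2vec embeddings), so this is a sketch of the computation, not the authors' implementation.

```python
import numpy as np

# Hypothetical sizes; the paper uses h in [1000, 1600] and 4096-d CNN features.
h, d_cnn, d_word, vocab = 256, 4096, 300, 1000

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

# Eq. (1): project each region's CNN activations into the h-dim embedding space.
W_m, b_m = rng.normal(0, 0.01, (h, d_cnn)), np.zeros(h)
def embed_regions(cnn_features):              # cnn_features: (num_regions, 4096)
    return cnn_features @ W_m.T + b_m         # -> (num_regions, h)

# Eqs. (2)-(6): bidirectional RNN over the word sequence.
W_w = rng.normal(0, 0.01, (d_word, vocab))    # word embedding matrix (word2vec in the paper)
W_e, b_e = rng.normal(0, 0.01, (h, d_word)), np.zeros(h)
W_f, b_f = rng.normal(0, 0.01, (h, h)), np.zeros(h)
W_b, b_b = rng.normal(0, 0.01, (h, h)), np.zeros(h)
W_d, b_d = rng.normal(0, 0.01, (h, h)), np.zeros(h)

def embed_sentence(word_ids):
    """Return one h-dimensional vector s_t per word, enriched by its context."""
    x = [W_w[:, i] for i in word_ids]                 # Eq. (2): 1-of-k lookup
    e = [relu(W_e @ xt + b_e) for xt in x]            # Eq. (3)
    N = len(word_ids)
    hf, hb = [np.zeros(h)] * N, [np.zeros(h)] * N
    for t in range(N):                                # Eq. (4): left-to-right stream
        prev = hf[t - 1] if t > 0 else np.zeros(h)
        hf[t] = relu(e[t] + W_f @ prev + b_f)
    for t in reversed(range(N)):                      # Eq. (5): right-to-left stream
        nxt = hb[t + 1] if t < N - 1 else np.zeros(h)
        hb[t] = relu(e[t] + W_b @ nxt + b_b)
    return np.stack([relu(W_d @ (hf[t] + hb[t]) + b_d) for t in range(N)])  # Eq. (6)

# Example: 20 region vectors and a 5-word sentence.
v = embed_regions(rng.normal(size=(20, d_cnn)))
s = embed_sentence([3, 17, 42, 7, 99])
print(v.shape, s.shape)   # (20, 256) (5, 256)
```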
We learn the parameters W e,W f,W b,W d and the respec-tive biases b e,b f,b b,b d.A typical size of the hidden rep-resentation in our experiments ranges between300-600di-mensions.We set the activation function f to the rectified linear unit(ReLU),which computes f:x→max(0,x).3.1.3Alignment objectiveWe have described the transformations that map every im-age and sentence into a set of vectors in a common h-dimensional space.Since our labels are at the level of en-tire images and sentences,our strategy is to formulate an image-sentence score as a function of the individual scores that measure how well a word aligns to a region of an im-age.Intuitively,a sentence-image pair should have a high matching score if its words have a confident support in the image.In Karpathy et al.[18],they interpreted the dot product v T i s t between an image fragment i and a sentence fragment t as a measure of similarity and used these to de-fine the score between image k and sentence l as:S kl=t∈g li∈g kmax(0,v T i s t).(7)Here,g k is the set of image fragments in image k and g l is the set of sentence fragments in sentence l.The indices k,l range over the images and sentences in the training set. Together with their additional Multiple Instance Learning objective,this score carries the interpretation that a sentence fragment aligns to a subset of the image regions whenever the dot product is positive.We found that the following reformulation simplifies the model and alleviates the need for additional objectives and their hyperparameters:S kl=t∈g lmax i∈gkv T i s t.(8)Here,every word s t aligns to the single best image region. As we show in the experiments,this simplified model also leads to improvements in thefinal ranking performance. Assuming that k=l denotes a corresponding image and sentence pair,thefinal max-margin,structured loss remains:C(θ)=klmax(0,S kl−S kk+1)rank images(9)+lmax(0,S lk−S kk+1)rank sentences.This objective encourages aligned image-sentences pairs to have a higher score than misaligned pairs,by a margin.3.1.4Decoding text segment alignments to images Consider an image from the training set and its correspond-ing sentence.We can interpret the quantity v T i s t as the un-normalized log probability of the t−th word describing any of the bounding boxes in the image.However,since we are ultimately interested in generating snippets of text instead of single words,we would like to align extended,contigu-ous sequences of words to a single bounding box.Note that the na¨ıve solution that assigns each word independently to the highest-scoring region is insufficient because it leads to words getting scattered inconsistently to different regions. 
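Referring back to Eqs. (8) and (9) above, a minimal sketch of the simplified image-sentence score and the max-margin ranking loss is given below; treating the whole training set as one batch and dropping the constant k = l terms are illustrative simplifications, not details taken from the paper.

```python
import numpy as np

def score(v, s):
    """Eq. (8): S_kl = sum over words of the best-matching region dot product.
    v: (num_regions, h) region embeddings of image k
    s: (num_words, h)   word embeddings of sentence l"""
    sims = s @ v.T                      # (num_words, num_regions) pairwise v_i^T s_t
    return sims.max(axis=1).sum()       # each word aligns to its single best region

def ranking_loss(images, sentences):
    """Eq. (9): corresponding pairs (k == l) should outscore mismatched pairs by a margin.
    The l == k terms of Eq. (9) are constant and omitted here."""
    n = len(images)
    S = np.array([[score(images[k], sentences[l]) for l in range(n)] for k in range(n)])
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l != k:
                loss += max(0.0, S[k, l] - S[k, k] + 1.0)   # rank images
                loss += max(0.0, S[l, k] - S[k, k] + 1.0)   # rank sentences
    return loss

# Example with random embeddings (h = 64) for 3 image-sentence pairs.
rng = np.random.default_rng(1)
imgs  = [rng.normal(size=(20, 64)) for _ in range(3)]
sents = [rng.normal(size=(int(rng.integers(4, 9)), 64)) for _ in range(3)]
print(ranking_loss(imgs, sents))
```

This score and loss operate at the level of whole images and sentences; the word-scattering problem raised just above still has to be resolved at decoding time, which the chain-structured MRF introduced next does.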
To address this issue,we treat the true alignments as latent variables in a Markov Random Field(MRF)where the bi-nary interactions between neighboring words encourage an alignment to the same region.Concretely,given a sentence with N words and an image with M bounding boxes,we introduce the latent alignment variables a j∈{1..M}for j=1...N and formulate an MRF in a chain structure along the sentence as follows:E(a)=j=1...NψU j(a j)+j=1...N−1ψB j(a j,a j+1)(10)ψU j(a j=t)=v T i s t(11)ψB j(a j,a j+1)=β1[a j=a j+1].(12) Here,βis a hyperparameter that controls the affinity to-wards longer word phrases.This parameter allows us to interpolate between single-word alignments(β=0)andFigure4.Diagram of our multimodal Recurrent Neural Network generative model.The RNN takes an image,a word,the context from previous time steps and defines a distribution over the next word.START and END are special tokens.aligning the entire sentence to a single,maximally scoring region whenβis large.We minimize the energy tofind the best alignments a using dynamic programming.The output of this process is a set of image regions annotated with seg-ments of text.We now describe an approach for generating novel phrases based on these correspondences.3.2.Multimodal Recurrent Neural Network forgenerating descriptionsIn this section we assume an input set of images and their textual descriptions.These could be full images and their sentence descriptions,or regions and text snippets as dis-cussed in previous sections.The key challenge is in the de-sign of a model that can predict a variable-sized sequence of outputs.In previously developed language models based on Recurrent Neural Networks(RNNs)[31,41,5],this is achieved by defining a probability distribution of the next word in a sequence,given the current word and context from previous time steps.We explore a simple but effective ex-tension that additionally conditions the generative process on the content of an input image.More formally,the RNN takes the image pixels I and a sequence of input vectors (x1,...,x T).It then computes a sequence of hidden states (h1,...,h t)and a sequence of outputs(y1,...,y t)by iter-ating the following recurrence relation for t=1to T:b v=W hi[CNNθc(I)](13)h t=f(W hx x t+W hh h t−1+b h+b v)(14)y t=softmax(W oh h t+b o).(15) In the equations above,W hi,W hx,W hh,W oh and b h,b o are a set of learnable weights and biases.The output vector y t has the size of the word dictionary and one additional di-mension for a special END token that terminates the gener-ative process.Note that we provide the image context vector b v to the RNN at every iteration so that it does not have to remember the image content while generating words. 
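A minimal sketch of the generative recurrence in Eqs. (13)–(15) follows; the sizes, the randomly initialized weights, and the greedy argmax decoding loop are illustrative assumptions, and in practice the parameters would be learned with the training procedure described next.

```python
import numpy as np

rng = np.random.default_rng(2)
h, d_cnn, d_word, vocab = 256, 4096, 300, 1000   # hypothetical sizes
END = vocab                                      # extra output dimension for the END token

relu = lambda z: np.maximum(0.0, z)
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Learnable parameters (random here; trained by maximizing the log probability of target words).
W_hi = rng.normal(0, 0.01, (h, d_cnn))
W_hx = rng.normal(0, 0.01, (h, d_word))
W_hh = rng.normal(0, 0.01, (h, h))
W_oh = rng.normal(0, 0.01, (vocab + 1, h))
b_h, b_o = np.zeros(h), np.zeros(vocab + 1)
word_emb = rng.normal(0, 0.01, (vocab, d_word))

def generate(cnn_image_features, start_word=0, max_len=20):
    """Greedy decoding with the recurrence of Eqs. (13)-(15)."""
    b_v = W_hi @ cnn_image_features                 # Eq. (13): image context, fed at every step
    h_t = np.zeros(h)
    x_t = word_emb[start_word]                      # the paper uses the embedding of "the" as START
    words = []
    for _ in range(max_len):
        h_t = relu(W_hx @ x_t + W_hh @ h_t + b_h + b_v)   # Eq. (14)
        y_t = softmax(W_oh @ h_t + b_o)                    # Eq. (15)
        w = int(np.argmax(y_t))                            # or sample from y_t
        if w == END:
            break
        words.append(w)
        x_t = word_emb[w]
    return words

print(generate(rng.normal(size=d_cnn)))
```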
RNN training.The RNN is trained to combine a word(x t), the previous context(h t−1)and the image information(b v) to predict the next word(y t).Concretely,the training pro-ceeds as follows(refer to Figure4):We set h0= 0,x1to a special START vector,and the desired label y1as thefirst word in the sequence.In particular,we use the word em-bedding for“the”as the START vector x1.Analogously, we set x2to the word vector of thefirst word and expect the network to predict the second word,etc.Finally,on the last step when x T represents the last word,the target label is set to a special END token.The cost function is to maximize the log probability assigned to the target labels.RNN at test time.The RNN predicts a sentence as follows: We compute the representation of the image b v,set h0=0, x1to the embedding of the word“the”,and compute the distribution over thefirst word y1.We sample from the dis-tribution(or pick the argmax),set its embedding vector as x2,and repeat this process until the END token is generated.3.3.OptimizationWe use Stochastic Gradient Descent with mini-batches of 100image-sentence pairs and momentum of0.9to optimize the alignment model.We cross-validate the learning rate and the weight decay.We also use dropout regularization in all layers except in the recurrent layers[47].The generative RNN is more difficult to optimize,party due to the word frequency disparity between rare words,and very common words(such as the END token).We achieved the best re-sults using RMSprop[42],which is an adaptive step size method that scales the gradient of each weight by a running average of its gradient magnitudes.4.ExperimentsDatasets.We use the Flickr8K[14],Flickr30K[46]and COCO[29]datasets in our experiments.These datasets contain8,000,31,000and123,000images respectively and each is annotated with5sentences using Amazon Mechanical Turk.For Flickr8K and Flickr30K,we use 1,000images for validation,1,000for testing and the rest for training(consistent with[14,18]).For COCO we use 5,000images for both validation and testing.Data Preprocessing.We convert all sentences to lower-case,discard non-alphanumeric characters,andfilter out the articles“an”,“a”,and“the”for efficiency.Our word vocabulary contains20,000words.4.1.Image-Sentence Alignment EvaluationWefirst investigate the quality of the inferred text and im-age alignments.As a proxy for this evaluation we perform ranking experiments where we consider a withheld set of images and sentences and then retrieve items in one modal-ity given a query from the other.We use the image-sentence score S kl(Section3.1.3)to evaluate a compatibility score between all pairs of test images and sentences.We then re-port the median rank of the closest ground truth result in theImage Annotation Image SearchModel R@1R@5R@10Med r R@1R@5R@10Med rDeViSE(Frome et al.[10]) 4.518.129.226 6.721.932.725SDT-RNN(Socher et al.[39])9.629.841.1168.929.841.116DeFrag(Karpathy et al.[18])12.632.944.0149.729.642.515Our implementation of DeFrag[18]13.835.848.210.49.528.240.315.6Our model:DepTree edges14.837.950.09.411.631.443.813.2Our model:BRNN16.540.654.27.611.832.144.712.4Flickr30KDeViSE(Frome et al.[10]) 4.518.129.226 6.721.932.725SDT-RNN(Socher et al.[39])9.629.841.1168.929.841.116DeFrag(Karpathy et al.[18])14.237.751.31010.230.844.214Our implementation of DeFrag[18]19.244.558.0 6.012.935.447.510.8Our model:DepTree edges20.046.659.4 5.415.036.548.210.4Our model:BRNN22.248.261.4 4.815.237.750.59.2COCOOur model:1K test images29.462.075.9 2.520.952.869.2 4.0Our model:5K test 
images11.832.545.412.28.924.936.319.5Table1.Image-Sentence ranking experiment results.R@K is Recall@K(high is good).Med r is the median rank(low is good).In the results for our models,we take the top5validation set models,evaluate each independently on the test set and then report the average performance.The standard deviations on the recall values range from approximately0.5to1.0.list and Recall@K,which measures the fraction of times a correct item was found among the top K results.The results of these experiments can be found in Table1,and exam-ple retrievals in Figure5.We now highlight some of the takeaways.Our full model outperforms previous work.We compare our full model(“Our model:BRNN”)to the following base-lines:DeViSE[10]is a model that learns a score between words and images.As the simplest extension to the setting of multiple image regions and multiple words,Karpathy et al.[18]averaged the word and image region representa-tions to obtain a single vector for each modality.Socher et al.[39]is trained with a similar objective,but instead of averaging the word representations,they merge word vec-tors into a single sentence vector with a Recursive Neural Network.DeFrag are the results reported by Karpathy et al.[18].Since we use different word vectors,dropout for regularization and different cross-validation ranges(includ-ing larger embedding sizes),we re-implemented their cost function for a fair comparison(“Our implementation of De-Frag”).In all of these cases,our full model(“Our model: BRNN”)provides consistent improvements.Our simpler cost function improves performance.We now try to understand the sources of these improvements. First,we removed the BRNN and used dependency tree re-lations exactly as described in Karpathy et al.[18](“Our model:DepTree edges”).The only difference between this model and“Our reimplementation of DeFrag”is the new, simpler cost function introduced in Section3.1.3.We see that our formulation shows consistent improvements.BRNN outperforms dependency tree relations.Further-more,when we replace the dependency tree relations with the BRNN,we observe additional performance improve-ments.Since the dependency relations were shown to work better than single words and bigrams[18],this suggests that the BRNN is taking advantage of contexts longer than two words.Furthermore,our method does not rely on extracting a Dependency Tree and instead uses the raw words directly. COCO results for future comparisons.The COCO dataset has only recently been released,and we are not aware of other published ranking results.Therefore,we re-port results on a subset of1,000images and the full set of 5,000test images for future comparisons.Qualitative.As can be seen from example groundings in Figure5,the model discovers interpretable visual-semantic correspondences,even for small or relatively rare objects such as“seagulls”and“accordion”.These details would be missed by models that only reason about full images. 
4.2.Evaluation of Generated DescriptionsWe have demonstrated that our alignment model produces state of the art ranking results and qualitative experiments suggest that the model effectively infers the alignment be-tween words and image regions.Our task is now to synthe-size these sentence snippets given new image regions.We evaluate these predictions with the BLEU[35]score,which despite multiple problems[14,22]is still considered to be the standard metric of evaluation in this setting.The BLEU score evaluates a candidate sentence by measuring the frac-tion of n-grams that appear in a set of references.Figure5.Example alignments predicted by our model.For every test image above,we retrieve the most compatible test sentence and visualize the highest-scoring region for each word(before MRF smoothing described in Section3.1.4)and the associated scores(v T i s t). We hide the alignments of low-scoring words to reduce clutter.We assign each region an arbitrary color.Flickr8K Flickr30K COCOMethod of generating text B-1B-2B-3B-1B-2B-3B-1B-2B-3Human agreement0.590.350.160.640.360.160.570.310.13Ranking:Nearest Neighbor0.290.110.030.270.080.020.320.110.03Generating:RNN0.420.190.060.450.200.060.500.250.12Table2.BLEU score evaluation of full image predictions on1,000images.B-n is BLEU score that uses up to n-grams(high is good).Our multimodal RNN outperforms retrieval baseline. Wefirst verify that our multimodal RNN is rich enough to support sentence generation for full images.In this experi-ment,we trained the RNN to generate sentences on full im-ages from Flickr8K,Flickr30K,and COCO datasets.Then at test time,we use thefirst four out offive sentences as references and thefifth one to evaluate human agreement. We also compare to a ranking baseline which uses the best model from the previous section(Section4.1)to annotate each test image with the highest-scoring sentence from the training set.The quantitative results of this experiment are in Table2.Note that the RNN model confidently outper-forms the retrieval method.This result is especially interest-ing in COCO dataset,since its training set consists of more than600,000sentences that cover a large variety of de-scriptions.Additionally,compared to the retrieval baseline which compares each image to all sentences in the training set,the RNN takes a fraction of a second to evaluate.We show example fullframe predictions in Figure6.Our generative model(shown in blue)produces sensible de-scriptions,even in the last two images that we consider to be failure cases.Additionally,we verified that none of these sentences appear in the training set.This suggests that the model is not simply memorizing the training data.How-ever,there are20occurrences of“man in black shirt”and 60occurrences of“is paying guitar”,which the model may have composed to describe thefirst image.Region-level evaluation.Finally,we evaluate our region RNN which was trained on the inferred,intermodal corre-spondences.To support this evaluation,we collected a new dataset of region-level annotations.Concretely,we asked8 people to label a subset of COCO test images with region-level text descriptions.The labeling interface consisted of a single test image,and the ability to draw a bounding box and annotate it with text.We provided minimal constraints and instructions,except to“describe the content of each box”and we encouraged the annotators to describe a large variety of objects,actions,stuff,and high-level concepts. 
Thefinal dataset consists of1469annotations in237im-ages.There are on average6.2annotations per image,and each one is on average4.13words long.We compare three models on this dataset:The region RNN model,a fullframe RNN model that was trained on full im-ages and sentences,and a ranking baseline.To predict de-scriptions with the ranking baseline,we take the number of words in the shortest reference annotation and search the training set sentences for the highest scoring segment of text。
Study on the Mechanism of Xylooligosaccharide Metabolism by Bifidobacterium adolescentis
Chemistry and Industry of Forest Products, Vol. 27, No. 5, October 2007. Research Report.
Study on the Mechanism of Xylooligosaccharide Metabolism by Bifidobacterium adolescentis
Received: 2006-10-11. Foundation items: National Natural Science Foundation of China (39770601); Specialized Research Fund for the Doctoral Program of Higher Education (20050298010). Author: ZHANG Jun-hua (1977- ), male, born in Jingshan, Hubei; lecturer, Ph.D., mainly engaged in research on biochemical processing of forest products. *Corresponding author: YONG Qiang, supervisor of master's students, mainly engaged in research on biochemical processing of forest products.
ZHANG Jun-hua1, YONG Qiang2*, YU Shi-yuan2 (1. College of Forestry, Northwest Agriculture and Forestry University, Yangling 712100, Shaanxi, China; 2. College of Chemical Engineering, Nanjing Forestry University, Nanjing 210037, Jiangsu, China)
Abstract: The kinetic behavior of xylooligosaccharide metabolism by Bifidobacterium adolescentis was investigated, and the changes of the individual xylooligosaccharide components and the metabolic products formed during metabolism were also studied.
The results showed that after 48 h of metabolism of xylooligosaccharides by B. adolescentis, the total sugar concentration decreased from the initial 5.00 g/L to 3.59 g/L, the cell concentration increased from 0.10 g/L to 0.35 g/L, and the pH of the culture system dropped from 7.00 to 4.75.
When metabolizing xylooligosaccharides, B. adolescentis first consumed the xylobiose to xylopentaose components; thereafter, the α-L-arabinofuranosidase it secreted hydrolyzed and released the arabinosyl side chains from the xylooligosaccharide components, and the released arabinose served as a carbon source for B. adolescentis.
During metabolism, B. adolescentis also secreted β-D-xylosidase, which acted on the ends of the xylooligosaccharide chains and released xylose, ultimately causing xylose to accumulate in the culture system.
Analysis of the metabolites showed that the main products of xylooligosaccharide metabolism by B. adolescentis were lactic acid, acetic acid, propionic acid and butyric acid.
Keywords: bifidobacteria; xylooligosaccharides; organic acids; prebiotics
CLC number: TQ91; Q53  Document code: A  Article ID: 0253-2417(2007)05-0001-05
Metabolism of Xylooligosaccharides by Bifidobacterium adolescentis
ZHANG Jun-hua1, YONG Qiang2, YU Shi-yuan2 (1. College of Forestry, Northwest Agriculture and Forestry University, Yangling 712100, China; 2. College of Chemical Engineering, Nanjing Forestry University, Nanjing 210037, China)
Abstract: The kinetics of xylooligosaccharide (XOS) metabolism by Bifidobacterium adolescentis was investigated, and the XOS components and the main metabolites were analyzed during metabolism. The results showed that when XOS was used as the sole carbon source for B. adolescentis, after 48 h of metabolism in vitro the total sugar concentration decreased from 5.00 g/L to 3.59 g/L, the cell concentration increased from 0.10 g/L to 0.35 g/L, and the pH value decreased from 7.00 to 4.75. Xylobiose to xylopentaose were metabolized by B. adolescentis first when XOS was used as the carbon source; after that, arabinose was released from the arabinosyl groups in the XOS mixture by α-L-arabinofuranosidases from B. adolescentis during the metabolism, and the arabinose was then ultimately metabolized. Free xylose accumulated through release from the chain ends of XOS by β-D-xylosidase from B. adolescentis. The main metabolites were lactate, acetate, propionate and butyrate.
Key words: bifidobacteria; xylooligosaccharides; organic acid; prebiotic
Bifidobacteria are an important dominant group of the human intestinal microbiota, and they play an extremely important role in maintaining normal human health.
In-Text Citation and Bibliographic Description of References
Contribution in a monograph or serial: a chapter or article extracted from a whole work (which may be a monograph or a journal). A large proportion of the references in scientific papers appear in this form.
Tang Xueming, Chen Dianbao. Advances in coordination polymerization of dienes [M] // Huang Baotong, Shen Zhiquan, et al. Advances in Coordination Polymerization of Olefins and Dienes. Beijing: Science Press, 1998: 172-202
T Hatakeyama, H Hatakeyama, K Nakamura. Determination of bound water content in polymers by DTA, DSC and TG [J]. Thermochim Acta, 1988, 123: 153
In-text citation, bibliographic description, and the two systems for them
Citation (in-text marking): how, at the appropriate place in the body of the paper, to indicate the serial number (or author and year) of the cited reference.
Description: how the reference list gives, for each cited work, the specific path for locating it. Standardizing the ways of citing and describing creates a common "language".
The two citation systems (description systems):
the author-date system (first element and date method, Harvard style)
the numeric system (numeric references method, Vancouver style)
“Sestac (1969) and Wendlandt (1977 b) have proposed a simple method for recording the sample temperature in a thermobalance, using an auxiliary thermocouple, such as shown in Fig.15.”
· Figure 3.1 X-ray powder diffraction patterns [1]
1 R. Spits, L. Duran and A. Guyot, Makromol. Chem., 189, 549 (1988)
· Table 3.1 Comparison of lattice parameters of activated MgCl2 and TiCl3 [6]
6 I. W. Bassi, unpublished results
International Journal of Pattern Recognition and Artificial Intelligence © World Scientific Publishing Company
AUTOMATIC CLASSIFICATION OF DIGITAL PHOTOGRAPHS BASED ON DECISION FORESTS
RAIMONDO SCHETTINI DISCo, University of Milano Bicocca, Via Bicocca degli Arcimboldi 8 Milano, 20126, Italy schettini@disco.unimib.it CARLA BRAMBILLA IMATI, CNR, Via Bassini 15 Milano, 20131, Italy carla@r.it CLAUDIO CUSANO ITC, CNR, Via Bassini 15 Milano, 20131, Italy DISCo, University of Milano Bicocca, Via Bicocca degli Arcimboldi 8 Milano, 20126, Italy cusano@r.it GIANLUIGI CIOCCA ITC, CNR, Via Bassini 15 Milano, 20131, Italy DISCo, University of Milano Bicocca, Via Bicocca degli Arcimboldi 8 Milano, 20126, Italy ciocca@r.it
Annotating photographs with broad semantic labels can be useful in both image processing and content-based image retrieval. We show here how low-level features can be related to semantic photo categories, such as indoor, outdoor and close-up, using decision forests consisting of trees constructed according to CART methodology. We also show how the results can be improved by introducing a rejection option in the classification process. Experimental results on a test set of 4500 photographs are reported and discussed. Keywords : CART, decision forest, digital images, image classification, low-level features.
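As a rough illustration of the classification-with-rejection idea summarized in the abstract, the sketch below trains a generic CART-style tree ensemble and withholds a decision whenever the trees' confidence falls below a threshold. scikit-learn's RandomForestClassifier stands in for the paper's decision forests, and the feature vectors, class set, and threshold value are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["indoor", "outdoor", "close-up"]

def train_forest(features, labels, n_trees=50):
    # Generic ensemble of CART-style trees as a stand-in for the paper's decision forest.
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(features, labels)
    return forest

def classify_with_rejection(forest, features, threshold=0.6):
    """Return a class label per photo, or 'rejected' when the averaged tree
    probabilities do not favor any class strongly enough."""
    proba = forest.predict_proba(features)     # class probabilities averaged over the trees
    best = proba.argmax(axis=1)
    return [CLASSES[best[i]] if proba[i, best[i]] >= threshold else "rejected"
            for i in range(len(features))]

# Toy example with random 30-dimensional "low-level features" and random labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))
y = rng.integers(0, 3, size=300)
forest = train_forest(X, y)
print(classify_with_rejection(forest, X[:5]))
```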
From Data Mining to Knowledge Discovery in Databases
s Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media atten-tion of late. What is all the excitement about?This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges in-volved in real-world applications of knowledge discovery, and current and future research direc-tions in the field.A cross a wide variety of fields, data arebeing collected and accumulated at adramatic pace. There is an urgent need for a new generation of computational theo-ries and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).At an abstract level, the KDD field is con-cerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easi-ly) into other forms that might be more com-pact (for example, a short report), more ab-stract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for exam-ple, a predictive model for estimating the val-ue of future cases). At the core of the process is the application of specific data-mining meth-ods for pattern discovery and extraction.1This article begins by discussing the histori-cal context of KDD and data mining and theirintersection with other related fields. A briefsummary of recent KDD real-world applica-tions is provided. Definitions of KDD and da-ta mining are provided, and the general mul-tistep KDD process is outlined. This multistepprocess has the application of data-mining al-gorithms as one particular step in the process.The data-mining step is discussed in more de-tail in the context of specific data-mining al-gorithms and their application. Real-worldpractical application issues are also outlined.Finally, the article enumerates challenges forfuture research and development and in par-ticular discusses potential opportunities for AItechnology in KDD systems.Why Do We Need KDD?The traditional method of turning data intoknowledge relies on manual analysis and in-terpretation. For example, in the health-careindustry, it is common for specialists to peri-odically analyze current trends and changesin health-care data, say, on a quarterly basis.The specialists then provide a report detailingthe analysis to the sponsoring health-care or-ganization; this report becomes the basis forfuture decision making and planning forhealth-care management. In a totally differ-ent type of application, planetary geologistssift through remotely sensed images of plan-ets and asteroids, carefully locating and cata-loging such geologic objects of interest as im-pact craters. Be it science, marketing, finance,health care, retail, or any other field, the clas-sical approach to data analysis relies funda-mentally on one or more analysts becomingArticlesFALL 1996 37From Data Mining to Knowledge Discovery inDatabasesUsama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 
0738-4602-1996 / $2.00areas is astronomy. Here, a notable success was achieved by SKICAT ,a system used by as-tronomers to perform image analysis,classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski,and Weir 1996). In its first application, the system was used to process the 3 terabytes (1012bytes) of image data resulting from the Second Palomar Observatory Sky Survey,where it is estimated that on the order of 109sky objects are detectable. SKICAT can outper-form humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a sur-vey of scientific applications.In business, main KDD application areas includes marketing, finance (especially in-vestment), fraud detection, manufacturing,telecommunications, and Internet agents.Marketing:In marketing, the primary ap-plication is database marketing systems,which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimat-ed that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for ex-ample, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-bas-ket analysis (Agrawal et al. 1996) systems,which find patterns such as, “If customer bought X, he/she is also likely to buy Y and Z.” Such patterns are valuable to retailers.Investment: Numerous companies use da-ta mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million;since its start in 1993, the system has outper-formed the broad stock market (Hall, Mani,and Barr 1996).Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of ac-counts. The FAIS system (Senator et al. 1995),from the U.S. Treasury Financial Crimes En-forcement Network, is used to identify finan-cial transactions that might indicate money-laundering activity.Manufacturing: The CASSIOPEE trou-bleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major Euro-pean airlines to diagnose and predict prob-lems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innova-intimately familiar with the data and serving as an interface between the data and the users and products.For these (and many other) applications,this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains.Databases are increasing in size in two ways:(1) the number N of records or objects in the database and (2) the number d of fields or at-tributes to an object. Databases containing on the order of N = 109objects are becoming in-creasingly common, for example, in the as-tronomical sciences. Similarly, the number of fields d can easily be on the order of 102or even 103, for example, in medical diagnostic applications. Who could be expected to di-gest millions of records, each having tens or hundreds of fields? 
We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.The need to scale up human analysis capa-bilities to handling the large number of bytes that we can collect is both economic and sci-entific. Businesses use data to gain competi-tive advantage, increase efficiency, and pro-vide more valuable services to customers.Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Be-cause computers have enabled humans to gather more data than we can digest, it is on-ly natural to turn to computational tech-niques to help us unearth meaningful pat-terns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital informa-tion era made a fact of life for all of us: data overload.Data Mining and Knowledge Discovery in the Real WorldA large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week , Newsweek , Byte , PC Week , and other large-circulation periodicals. Unfortu-nately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.In science, one of the primary applicationThere is an urgent need for a new generation of computation-al theories and tools toassist humans in extractinguseful information (knowledge)from the rapidly growing volumes ofdigital data.Articles38AI MAGAZINEtive applications (Manago and Auriol 1996).Telecommunications: The telecommuni-cations alarm-sequence analyzer (TASA) wasbuilt in cooperation with a manufacturer oftelecommunications equipment and threetelephone networks (Mannila, Toivonen, andVerkamo 1995). The system uses a novelframework for locating frequently occurringalarm episodes from the alarm stream andpresenting them as rules. Large sets of discov-ered rules can be explored with flexible infor-mation-retrieval tools supporting interactivityand iteration. In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. 
For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. (1996) for an overview of issues in devel-oping industrial KDD applications.Data Mining and KDDHistorically, the notion of finding useful pat-terns in data has been given a variety ofnames, including data mining, knowledge ex-traction, information discovery, informationharvesting, data archaeology, and data patternprocessing. The term data mining has mostlybeen used by statisticians, data analysts, andthe management information systems (MIS)communities. It has also gained popularity inthe database field. The phrase knowledge dis-covery in databases was coined at the first KDDworkshop in 1989 (Piatetsky-Shapiro 1991) toemphasize that knowledge is the end productof a data-driven discovery. It has been popular-ized in the AI and machine-learning fields.In our view, KDD refers to the overall pro-cess of discovering useful knowledge from da-ta, and data mining refers to a particular stepin this process. Data mining is the applicationof specific algorithms for extracting patternsfrom data. The distinction between the KDDprocess and the data-mining step (within theprocess) is a central point of this article. Theadditional steps in the KDD process, such asdata preparation, data selection, data cleaning,incorporation of appropriate prior knowledge,and proper interpretation of the results ofmining, are essential to ensure that usefulknowledge is derived from the data. Blind ap-plication of data-mining methods (rightly crit-icized as data dredging in the statistical litera-ture) can be a dangerous activity, easilyleading to the discovery of meaningless andinvalid patterns.The Interdisciplinary Nature of KDDKDD has evolved, and continues to evolve,from the intersection of research fields such asmachine learning, pattern recognition,databases, statistics, AI, knowledge acquisitionfor expert systems, data visualization, andhigh-performance computing. The unifyinggoal is extracting high-level knowledge fromlow-level data in the context of large data sets.The data-mining component of KDD cur-rently relies heavily on known techniquesfrom machine learning, pattern recognition,and statistics to find patterns from data in thedata-mining step of the KDD process. A natu-ral question is, How is KDD different from pat-tern recognition or machine learning (and re-lated fields)? The answer is that these fieldsprovide some of the data-mining methodsthat are used in the data-mining step of theKDD process. KDD focuses on the overall pro-cess of knowledge discovery from data, includ-ing how the data are stored and accessed, howalgorithms can be scaled to massive data setsThe basicproblemaddressed bythe KDDprocess isone ofmappinglow-leveldata intoother formsthat might bemorecompact,moreabstract,or moreuseful.ArticlesFALL 1996 39A driving force behind KDD is the database field (the second D in KDD). 
Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fun-damental importance to KDD. Database tech-niques for gaining efficient data access,grouping and ordering operations when ac-cessing data, and optimizing queries consti-tute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memo-ry and pay no attention to how the algorithm breaks down if only limited views of the data are possible.A related field evolving from databases is data warehousing,which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2)data access.Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they pos-sess, they have to address the issues of map-ping data to a single naming convention,uniformly representing and handling missing data, and handling noise and errors when possible.Data access: Uniform and well-defined methods must be created for accessing the da-ta and providing access paths to data that were historically difficult to get to (for exam-ple, stored offline).Once organizations and individuals have solved the problem of how to store and ac-cess their data, the natural next step is the question, What else do we do with all the da-ta? This is where opportunities for KDD natu-rally arise.A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles pro-posed by Codd (1993). OLAP tools focus on providing multidimensional data analysis,which is superior to SQL in computing sum-maries and breakdowns along many dimen-sions. OLAP tools are targeted toward simpli-fying and supporting interactive data analysis,but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.Basic DefinitionsKDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimate-and still run efficiently, how results can be in-terpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (be-sides machine learning) to contribute to KDD. KDD places a special emphasis on find-ing understandable patterns that can be inter-preted as useful or interesting knowledge.Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and ro-bustness properties of modeling algorithms for large noisy data sets.Related AI research fields include machine discovery, which targets the discovery of em-pirical laws from observation and experimen-tation (Shrager and Langley 1990) (see Kloes-gen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery),and causal modeling for the inference of causal models from data (Spirtes, Glymour,and Scheines 1993). 
Basic Definitions
KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).
Here, data are a set of facts (for example, cases in a database), and pattern is an expression in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; finding structure from data; or, in general, making any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the average value of a set of numbers.
The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and potentially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.
The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possible to define measures of certainty (for example, estimated prediction accuracy on new data) or utility (for example, gain, perhaps in dollars saved because of better predictions or speedup in response time of a system). Notions such as novelty and understandability are much more subjective. In certain contexts, understandability can be estimated by simplicity (for example, the number of bits to describe a pattern).
An important notion, called interestingness (for example, see Silberschatz and Tuzhilin [1995] and Piatetsky-Shapiro and Matheus [1994]), is usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity. Interestingness functions can be defined explicitly or can be manifested implicitly through an ordering placed by the KDD system on the discovered patterns or models.
Given these notions, we can consider a pattern to be knowledge if it exceeds some interestingness threshold, which is by no means an attempt to define knowledge in the philosophical or even the popular view. As a matter of fact, knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.
Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) over the data. Note that the space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. Practical computational constraints place severe limits on the subspace that can be explored by a data-mining algorithm.
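The article leaves explicit interestingness functions abstract. As one minimal sketch, not taken from the article, such a function might simply weight the four ingredients named above; the weights, component scores, and threshold below are invented for illustration:

```python
def interestingness(validity, novelty, usefulness, simplicity,
                    weights=(0.4, 0.2, 0.3, 0.1)):
    """Combine per-pattern scores (each assumed to lie in [0, 1]) into one value."""
    components = (validity, novelty, usefulness, simplicity)
    return sum(w * c for w, c in zip(weights, components))

# A pattern is promoted to "knowledge" once it clears a user-chosen threshold.
pattern_score = interestingness(validity=0.9, novelty=0.4,
                                usefulness=0.7, simplicity=0.8)
is_knowledge = pattern_score > 0.6
```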
The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformations of it; applying data-mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process (figure 1) includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process also includes all the additional steps described in the next section.
The notion of an overall user-driven process is not unique to KDD: analogous proposals have been put forward both in statistics (Hand 1994) and in machine learning (Brodley and Smyth 1996).

The KDD Process
The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:
First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer's viewpoint.
Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.
Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.
Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.
Fifth is matching the goals of the KDD process (step 1) to a particular data-mining method. For example, summarization, classification, regression, clustering, and so on, are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).
Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate (for example, models of categorical data are different than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than its predictive capabilities).
Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.
Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.
Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.
The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in figure 1.
(Figure 1. An Overview of the Steps That Compose the KDD Process.)
Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component, which has, by far, received the most attention in the literature.

The Data-Mining Step of the KDD Process
The data-mining component of the KDD process often involves repeated iterative application of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algorithms that incorporate these methods.
The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification, the system is limited to verifying the user's hypothesis. With discovery, the system autonomously finds new patterns. We further subdivide the discovery goal into prediction, where the system finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presentation to a user in a human-understandable form. In this article, we are primarily concerned with discovery-oriented data mining.
Data mining involves fitting models to, or determining patterns from, observed data.
The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the overall, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeterministic effects in the model, whereas a logical model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applications given the typical presence of uncertainty in real-world data-generating processes.
Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewildering to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fundamental techniques. The actual underlying model representation being used by a particular method typically comes from a composition of a small number of well-known options: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primarily in the goodness-of-fit criterion used to evaluate model fit or in the search method used to find a good fit.
In our brief overview of data-mining methods, we try in particular to convey the notion that most (if not all) methods can be viewed as extensions or hybrids of a few basic techniques and principles. We first discuss the primary methods of data mining and then show that the data-mining methods can be viewed as consisting of three primary algorithmic components: (1) model representation, (2) model evaluation, and (3) search. In the discussion of KDD and data-mining methods, we use a simple example to make some of the notions more concrete. Figure 2 shows a simple two-dimensional artificial data set consisting of 23 cases. Each point on the graph represents a person who has been given a loan by a particular bank at some time in the past. The horizontal axis represents the income of the person; the vertical axis represents the total personal debt of the person (mortgage, car payments, and so on). The data have been classified into two classes: (1) the x's represent persons who have defaulted on their loans and (2) the o's represent persons whose loans are in good status with the bank. Thus, this simple artificial data set could represent a historical data set that can contain useful knowledge from the point of view of the bank making the loans. Note that in actual KDD applications, there are typically many more dimensions (as many as several hundreds) and many more data points (many thousands or even millions).
(Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes.)
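The figure itself is not reproduced here, but a toy version of the same setting can be generated and separated by a simple hand-set rule; the 23 points below are invented stand-ins for the article's artificial data set, not its actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical income/debt data in the spirit of figure 2; class 1 = defaulted.
income = np.concatenate([rng.normal(30, 8, 12), rng.normal(60, 10, 11)])
debt   = np.concatenate([rng.normal(40, 8, 12), rng.normal(20, 8, 11)])
defaulted = np.array([1] * 12 + [0] * 11)

# A simple linear boundary of the kind a data-mining step might fit:
# predict "default" when debt exceeds income by more than a margin.
predicted = (debt - income > -10).astype(int)
accuracy = (predicted == defaulted).mean()
print(f"training accuracy of the hand-set rule: {accuracy:.2f}")
```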
Survey of clustering data mining techniques
A Survey of Clustering Data Mining Techniques
Pavel Berkhin
Yahoo!, Inc.
pberkhin@
Summary. Clustering is the division of data into groups of similar objects. It disregards some details in exchange for data simplification. Informally, clustering can be viewed as data modeling concisely summarizing the data, and, therefore, it relates to many disciplines from statistics to numerical analysis. Clustering plays an important role in a broad range of applications, from information retrieval to CRM. Such applications usually deal with large datasets and many attributes. Exploration of such data is a subject of data mining. This survey concentrates on clustering algorithms from a data mining perspective.

1 Introduction
The goal of this survey is to provide a comprehensive review of different clustering techniques in data mining. Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Representing data with fewer clusters necessarily loses certain fine details (akin to lossy data compression), but achieves simplification. It represents many data objects by few clusters, and hence, it models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. Therefore, clustering is unsupervised learning of a hidden data concept. Data mining applications add to a general picture three complications: (a) large databases, (b) many attributes, (c) attributes of different types. This imposes on a data analysis severe computational requirements. Data mining applications include scientific data exploration, information retrieval, text mining, spatial databases, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. They present real challenges to classic clustering algorithms.
These challenges led to the emergence of powerful broadly applicable data mining clustering methods developed on the foundation of classic techniques. They are the subject of this survey.

1.1 Notations
To fix the context and clarify terminology, consider a dataset X consisting of data points (i.e., objects, instances, cases, patterns, tuples, transactions) x_i = (x_i1, ..., x_id), i = 1:N, in attribute space A, where each component x_il ∈ A_l, l = 1:d, is a numerical or nominal categorical attribute (i.e., feature, variable, dimension, component, field). For a discussion of attribute data types see [106]. Such point-by-attribute data format conceptually corresponds to an N × d matrix and is used by a majority of algorithms reviewed below. However, data of other formats, such as variable length sequences and heterogeneous data, are not uncommon.
The simplest subset in an attribute space is a direct Cartesian product of sub-ranges C = ∏ C_l ⊂ A, C_l ⊂ A_l, called a segment (i.e., cube, cell, region). A unit is an elementary segment whose sub-ranges consist of a single category value, or of a small numerical bin. Describing the numbers of data points per every unit represents an extreme case of clustering, a histogram. This is a very expensive representation, and not a very revealing one. User driven segmentation is another commonly used practice in data exploration that utilizes expert knowledge regarding the importance of certain sub-domains. Unlike segmentation, clustering is assumed to be automatic, and so it is a machine learning technique.
The ultimate goal of clustering is to assign points to a finite system of k subsets (clusters). Usually (but not always) subsets do not intersect, and their union is equal to a full dataset with the possible exception of outliers:

X = C_1 ∪ ... ∪ C_k ∪ C_outliers,   C_i ∩ C_j = ∅,  i ≠ j.

1.2 Clustering Bibliography at Glance
General references regarding clustering include [110], [205], [116], [131], [63], [72], [165], [119], [75], [141], [107], [91]. A very good introduction to contemporary data mining clustering techniques can be found in the textbook [106].
There is a close relationship between clustering and many other fields. Clustering has always been used in statistics [10] and science [158]. The classic introduction into the pattern recognition framework is given in [64]. Typical applications include speech and character recognition. Machine learning clustering algorithms were applied to image segmentation and computer vision [117]. For statistical approaches to pattern recognition see [56] and [85]. Clustering can be viewed as a density estimation problem. This is the subject of traditional multivariate statistical estimation [197]. Clustering is also widely used for data compression in image processing, which is also known as vector quantization [89]. Data fitting in numerical analysis provides still another venue in data modeling [53]. This survey's emphasis is on clustering in data mining. Such clustering is characterized by large datasets with many attributes of different types.
Though we do not even try to review particular applications,many important ideas are related to the specificfields.Clustering in data mining was brought to life by intense developments in information retrieval and text mining[52], [206],[58],spatial database applications,for example,GIS or astronomical data,[223],[189],[68],sequence and heterogeneous data analysis[43],Web applications[48],[111],[81],DNA analysis in computational biology[23],and many others.They resulted in a large amount of application-specific devel-opments,but also in some general techniques.These techniques and classic clustering algorithms that relate to them are surveyed below.1.3Plan of Further PresentationClassification of clustering algorithms is neither straightforward,nor canoni-cal.In reality,different classes of algorithms overlap.Traditionally clustering techniques are broadly divided in hierarchical and partitioning.Hierarchical clustering is further subdivided into agglomerative and divisive.The basics of hierarchical clustering include Lance-Williams formula,idea of conceptual clustering,now classic algorithms SLINK,COBWEB,as well as newer algo-rithms CURE and CHAMELEON.We survey these algorithms in the section Hierarchical Clustering.While hierarchical algorithms gradually(dis)assemble points into clusters (as crystals grow),partitioning algorithms learn clusters directly.In doing so they try to discover clusters either by iteratively relocating points between subsets,or by identifying areas heavily populated with data.Algorithms of thefirst kind are called Partitioning Relocation Clustering. They are further classified into probabilistic clustering(EM framework,al-gorithms SNOB,AUTOCLASS,MCLUST),k-medoids methods(algorithms PAM,CLARA,CLARANS,and its extension),and k-means methods(differ-ent schemes,initialization,optimization,harmonic means,extensions).Such methods concentrate on how well pointsfit into their clusters and tend to build clusters of proper convex shapes.Partitioning algorithms of the second type are surveyed in the section Density-Based Partitioning.They attempt to discover dense connected com-ponents of data,which areflexible in terms of their shape.Density-based connectivity is used in the algorithms DBSCAN,OPTICS,DBCLASD,while the algorithm DENCLUE exploits space density functions.These algorithms are less sensitive to outliers and can discover clusters of irregular shape.They usually work with low-dimensional numerical data,known as spatial data. 
Spatial objects could include not only points,but also geometrically extended objects(algorithm GDBSCAN).4Pavel BerkhinSome algorithms work with data indirectly by constructing summaries of data over the attribute space subsets.They perform space segmentation and then aggregate appropriate segments.We discuss them in the section Grid-Based Methods.They frequently use hierarchical agglomeration as one phase of processing.Algorithms BANG,STING,WaveCluster,and FC are discussed in this section.Grid-based methods are fast and handle outliers well.Grid-based methodology is also used as an intermediate step in many other algorithms (for example,CLIQUE,MAFIA).Categorical data is intimately connected with transactional databases.The concept of a similarity alone is not sufficient for clustering such data.The idea of categorical data co-occurrence comes to the rescue.The algorithms ROCK,SNN,and CACTUS are surveyed in the section Co-Occurrence of Categorical Data.The situation gets even more aggravated with the growth of the number of items involved.To help with this problem the effort is shifted from data clustering to pre-clustering of items or categorical attribute values. Development based on hyper-graph partitioning and the algorithm STIRR exemplify this approach.Many other clustering techniques are developed,primarily in machine learning,that either have theoretical significance,are used traditionally out-side the data mining community,or do notfit in previously outlined categories. The boundary is blurred.In the section Other Developments we discuss the emerging direction of constraint-based clustering,the important researchfield of graph partitioning,and the relationship of clustering to supervised learning, gradient descent,artificial neural networks,and evolutionary methods.Data Mining primarily works with large databases.Clustering large datasets presents scalability problems reviewed in the section Scalability and VLDB Extensions.Here we talk about algorithms like DIGNET,about BIRCH and other data squashing techniques,and about Hoffding or Chernoffbounds.Another trait of real-life data is high dimensionality.Corresponding de-velopments are surveyed in the section Clustering High Dimensional Data. 
The trouble comes from a decrease in metric separation when the dimension grows.One approach to dimensionality reduction uses attributes transforma-tions(DFT,PCA,wavelets).Another way to address the problem is through subspace clustering(algorithms CLIQUE,MAFIA,ENCLUS,OPTIGRID, PROCLUS,ORCLUS).Still another approach clusters attributes in groups and uses their derived proxies to cluster objects.This double clustering is known as co-clustering.Issues common to different clustering methods are overviewed in the sec-tion General Algorithmic Issues.We talk about assessment of results,de-termination of appropriate number of clusters to build,data preprocessing, proximity measures,and handling of outliers.For reader’s convenience we provide a classification of clustering algorithms closely followed by this survey:•Hierarchical MethodsA Survey of Clustering Data Mining Techniques5Agglomerative AlgorithmsDivisive Algorithms•Partitioning Relocation MethodsProbabilistic ClusteringK-medoids MethodsK-means Methods•Density-Based Partitioning MethodsDensity-Based Connectivity ClusteringDensity Functions Clustering•Grid-Based Methods•Methods Based on Co-Occurrence of Categorical Data•Other Clustering TechniquesConstraint-Based ClusteringGraph PartitioningClustering Algorithms and Supervised LearningClustering Algorithms in Machine Learning•Scalable Clustering Algorithms•Algorithms For High Dimensional DataSubspace ClusteringCo-Clustering Techniques1.4Important IssuesThe properties of clustering algorithms we are primarily concerned with in data mining include:•Type of attributes algorithm can handle•Scalability to large datasets•Ability to work with high dimensional data•Ability tofind clusters of irregular shape•Handling outliers•Time complexity(we frequently simply use the term complexity)•Data order dependency•Labeling or assignment(hard or strict vs.soft or fuzzy)•Reliance on a priori knowledge and user defined parameters •Interpretability of resultsRealistically,with every algorithm we discuss only some of these properties. 
The list is in no way exhaustive.For example,as appropriate,we also discuss algorithms ability to work in pre-defined memory buffer,to restart,and to provide an intermediate solution.6Pavel Berkhin2Hierarchical ClusteringHierarchical clustering builds a cluster hierarchy or a tree of clusters,also known as a dendrogram.Every cluster node contains child clusters;sibling clusters partition the points covered by their common parent.Such an ap-proach allows exploring data on different levels of granularity.Hierarchical clustering methods are categorized into agglomerative(bottom-up)and divi-sive(top-down)[116],[131].An agglomerative clustering starts with one-point (singleton)clusters and recursively merges two or more of the most similar clusters.A divisive clustering starts with a single cluster containing all data points and recursively splits the most appropriate cluster.The process contin-ues until a stopping criterion(frequently,the requested number k of clusters) is achieved.Advantages of hierarchical clustering include:•Flexibility regarding the level of granularity•Ease of handling any form of similarity or distance•Applicability to any attribute typesDisadvantages of hierarchical clustering are related to:•Vagueness of termination criteria•Most hierarchical algorithms do not revisit(intermediate)clusters once constructed.The classic approaches to hierarchical clustering are presented in the sub-section Linkage Metrics.Hierarchical clustering based on linkage metrics re-sults in clusters of proper(convex)shapes.Active contemporary efforts to build cluster systems that incorporate our intuitive concept of clusters as con-nected components of arbitrary shape,including the algorithms CURE and CHAMELEON,are surveyed in the subsection Hierarchical Clusters of Arbi-trary Shapes.Divisive techniques based on binary taxonomies are presented in the subsection Binary Divisive Partitioning.The subsection Other Devel-opments contains information related to incremental learning,model-based clustering,and cluster refinement.In hierarchical clustering our regular point-by-attribute data representa-tion frequently is of secondary importance.Instead,hierarchical clustering frequently deals with the N×N matrix of distances(dissimilarities)or sim-ilarities between training points sometimes called a connectivity matrix.So-called linkage metrics are constructed from elements of this matrix.The re-quirement of keeping a connectivity matrix in memory is unrealistic.To relax this limitation different techniques are used to sparsify(introduce zeros into) the connectivity matrix.This can be done by omitting entries smaller than a certain threshold,by using only a certain subset of data representatives,or by keeping with each point only a certain number of its nearest neighbors(for nearest neighbor chains see[177]).Notice that the way we process the original (dis)similarity matrix and construct a linkage metric reflects our a priori ideas about the data model.A Survey of Clustering Data Mining Techniques7With the(sparsified)connectivity matrix we can associate the weighted connectivity graph G(X,E)whose vertices X are data points,and edges E and their weights are defined by the connectivity matrix.This establishes a connection between hierarchical clustering and graph partitioning.One of the most striking developments in hierarchical clustering is the algorithm BIRCH.It is discussed in the section Scalable VLDB Extensions.Hierarchical clustering initializes a cluster system as a set of singleton 
clusters (agglomerative case) or a single cluster of all points (divisive case) and proceeds iteratively merging or splitting the most appropriate cluster(s) until the stopping criterion is achieved. The appropriateness of a cluster(s) for merging or splitting depends on the (dis)similarity of cluster(s) elements. This reflects a general presumption that clusters consist of similar points. An important example of dissimilarity between two points is the distance between them.
To merge or split subsets of points rather than individual points, the distance between individual points has to be generalized to the distance between subsets. Such a derived proximity measure is called a linkage metric. The type of a linkage metric significantly affects hierarchical algorithms, because it reflects a particular concept of closeness and connectivity. Major inter-cluster linkage metrics [171], [177] include single link, average link, and complete link. The underlying dissimilarity measure (usually, distance) is computed for every pair of nodes with one node in the first set and another node in the second set. A specific operation such as minimum (single link), average (average link), or maximum (complete link) is applied to pair-wise dissimilarity measures:

d(C_1, C_2) = Op { d(x, y) : x ∈ C_1, y ∈ C_2 }

Early examples include the algorithm SLINK [199], which implements single link (Op = min), Voorhees' method [215], which implements average link (Op = Avr), and the algorithm CLINK [55], which implements complete link (Op = max). It is related to the problem of finding the Euclidean minimal spanning tree [224] and has O(N^2) complexity. The methods using inter-cluster distances defined in terms of pairs of nodes (one in each respective cluster) are called graph methods. They do not use any cluster representation other than a set of points. This name naturally relates to the connectivity graph G(X, E) introduced above, because every data partition corresponds to a graph partition. Such methods can be augmented by so-called geometric methods in which a cluster is represented by its central point. Under the assumption of numerical attributes, the center point is defined as a centroid or an average of two cluster centroids subject to agglomeration. It results in centroid, median, and minimum variance linkage metrics. All of the above linkage metrics can be derived from the Lance-Williams updating formula [145]:

d(C_i ∪ C_j, C_k) = a(i) d(C_i, C_k) + a(j) d(C_j, C_k) + b d(C_i, C_j) + c |d(C_i, C_k) − d(C_j, C_k)|.

Here a, b, c are coefficients corresponding to a particular linkage. This formula expresses a linkage metric between a union of the two clusters and the third cluster in terms of underlying nodes. The Lance-Williams formula is crucial to making the dis(similarity) computations feasible. Surveys of linkage metrics can be found in [170], [54]. When distance is used as a base measure, linkage metrics capture inter-cluster proximity. However, a similarity-based view that results in intra-cluster connectivity considerations is also used, for example, in the original average link agglomeration (Group-Average Method) [116].
Under reasonable assumptions, such as the reducibility condition (graph methods satisfy this condition), linkage metrics methods suffer from O(N^2) time complexity [177]. Despite the unfavorable time complexity, these algorithms are widely used. As an example, the algorithm AGNES (AGglomerative NESting) [131] is used in S-Plus.
When the connectivity N × N matrix is sparsified, graph methods directly dealing with the connectivity graph G can be used. In particular, the hierarchical divisive MST (Minimum Spanning Tree) algorithm is based on graph partitioning [116].
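As a minimal illustration of the single/average/complete link operations defined above (a sketch, not code from the survey; the toy clusters are invented):

```python
import numpy as np

def linkage_distance(c1, c2, op="single"):
    """Inter-cluster distance d(C1, C2) = Op{ d(x, y) } over all cross pairs."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    # Pairwise Euclidean distances between every x in C1 and y in C2.
    pair = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)
    ops = {"single": np.min, "average": np.mean, "complete": np.max}
    return ops[op](pair)

c1 = [[0.0, 0.0], [1.0, 0.0]]
c2 = [[4.0, 0.0], [5.0, 1.0]]
for op in ("single", "average", "complete"):
    print(op, round(linkage_distance(c1, c2, op), 3))
```

Repeatedly merging the closest pair of clusters under one of these operations reproduces the agglomerative schemes (SLINK, CLINK, group-average) that the Lance-Williams formula updates incrementally.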
2.1 Hierarchical Clusters of Arbitrary Shapes
For spatial data, linkage metrics based on Euclidean distance naturally generate clusters of convex shapes. Meanwhile, visual inspection of spatial images frequently discovers clusters with curvy appearance.
Guha et al. [99] introduced the hierarchical agglomerative clustering algorithm CURE (Clustering Using REpresentatives). This algorithm has a number of novel features of general importance. It takes special steps to handle outliers and to provide labeling in the assignment stage. It also uses two techniques to achieve scalability: data sampling (section 8), and data partitioning. CURE creates p partitions, so that fine granularity clusters are constructed in partitions first. A major feature of CURE is that it represents a cluster by a fixed number, c, of points scattered around it. The distance between two clusters used in the agglomerative process is the minimum of distances between two scattered representatives. Therefore, CURE takes a middle approach between the graph (all-points) methods and the geometric (one centroid) methods. Single and average link closeness are replaced by representatives' aggregate closeness. Selecting representatives scattered around a cluster makes it possible to cover non-spherical shapes. As before, agglomeration continues until the requested number k of clusters is achieved. CURE employs one additional trick: originally selected scattered points are shrunk to the geometric centroid of the cluster by a user-specified factor α. Shrinkage suppresses the effect of outliers; outliers happen to be located further from the cluster centroid than the other scattered representatives. CURE is capable of finding clusters of different shapes and sizes, and it is insensitive to outliers. Because CURE uses sampling, estimation of its complexity is not straightforward. For low-dimensional data the authors provide a complexity estimate of O(N^2_sample) defined in terms of a sample size. More exact bounds depend on input parameters: shrink factor α, number of representative points c, number of partitions p, and a sample size. Figure 1(a) illustrates agglomeration in CURE. Three clusters, each with three representatives, are shown before and after the merge and shrinkage. Two closest representatives are connected.
While the algorithm CURE works with numerical attributes (particularly low dimensional spatial data), the algorithm ROCK developed by the same researchers [100] targets hierarchical agglomerative clustering for categorical attributes. It is reviewed in the section Co-Occurrence of Categorical Data.
The hierarchical agglomerative algorithm CHAMELEON [127] uses the connectivity graph G corresponding to the K-nearest neighbor model sparsification of the connectivity matrix: the edges of the K most similar points to any given point are preserved, the rest are pruned. CHAMELEON has two stages. In the first stage small tight clusters are built to ignite the second stage. This involves a graph partitioning [129]. In the second stage an agglomerative process is performed. It utilizes measures of relative inter-connectivity RI(C_i, C_j) and relative closeness RC(C_i, C_j); both are locally normalized by internal interconnectivity and closeness of clusters C_i and C_j. In this sense the modeling is dynamic: it depends on data locally. Normalization involves certain non-obvious graph operations [129]. CHAMELEON relies heavily on graph partitioning implemented in the library HMETIS (see section 6).
The agglomerative process depends on user-provided thresholds. A decision to merge is made based on the combination

RI(C_i, C_j) · RC(C_i, C_j)^α

of local measures. The algorithm does not depend on assumptions about the data model. It has been proven to find clusters of different shapes, densities, and sizes in 2D (two-dimensional) space. It has a complexity of O(Nm + N log(N) + m^2 log(m)), where m is the number of sub-clusters built during the first initialization phase. Figure 1(b) (analogous to the one in [127]) clarifies the difference with CURE. It presents a choice of four clusters (a)-(d) for a merge. While CURE would merge clusters (a) and (b), CHAMELEON makes the intuitively better choice of merging (c) and (d).
(Figure 1. Agglomeration in Clusters of Arbitrary Shapes: (a) Algorithm CURE, (b) Algorithm CHAMELEON.)

2.2 Binary Divisive Partitioning
In linguistics, information retrieval, and document clustering applications binary taxonomies are very useful. Linear algebra methods, based on singular value decomposition (SVD), are used for this purpose in collaborative filtering and information retrieval [26]. Application of SVD to hierarchical divisive clustering of document collections resulted in the PDDP (Principal Direction Divisive Partitioning) algorithm [31]. In our notations, object x is a document, the l-th attribute corresponds to a word (index term), and a matrix X entry x_il is a measure (e.g. TF-IDF) of l-term frequency in a document x. PDDP constructs the SVD decomposition of the matrix

(X − e x̄),   x̄ = (1/N) Σ_{i=1:N} x_i,   e = (1, ..., 1)^T.

This algorithm bisects data in Euclidean space by a hyperplane that passes through the data centroid orthogonal to the eigenvector with the largest singular value. A k-way split is also possible if the k largest singular values are considered. Bisecting is a good way to categorize documents and it yields a binary tree. When k-means (2-means) is used for bisecting, the dividing hyperplane is orthogonal to the line connecting the two centroids. The comparative study of SVD vs. k-means approaches [191] can be used for further references. Hierarchical divisive bisecting k-means was proven [206] to be preferable to PDDP for document clustering.
While PDDP or 2-means are concerned with how to split a cluster, the problem of which cluster to split is also important. Simple strategies are: (1) split each node at a given level, (2) split the cluster with highest cardinality, and (3) split the cluster with the largest intra-cluster variance. All three strategies have problems. For a more detailed analysis of this subject and better strategies, see [192].

2.3 Other Developments
One of the early agglomerative clustering algorithms, Ward's method [222], is based not on a linkage metric, but on the objective function used in k-means. The merger decision is viewed in terms of its effect on the objective function.
The popular hierarchical clustering algorithm for categorical data COBWEB [77] has two very important qualities. First, it utilizes incremental learning. Instead of following divisive or agglomerative approaches, it dynamically builds a dendrogram by processing one data point at a time. Second, COBWEB is an example of conceptual or model-based learning. This means that each cluster is considered as a model that can be described intrinsically, rather than as a collection of points assigned to it. COBWEB's dendrogram is called a classification tree. Each tree node (cluster) C is associated with the conditional probabilities for categorical attribute-value pairs, Pr(x_l = ν_lp | C), l = 1:d, p = 1:|A_l|. This easily can be recognized as a C-specific Naïve Bayes classifier. During
the classification tree construction, every new point is descended along the tree and the tree is potentially updated (by an insert/split/merge/create operation). Decisions are based on the category utility [49]

CU{C_1, ..., C_k} = (1/k) Σ_{j=1:k} CU(C_j),
CU(C_j) = Σ_{l,p} ( Pr(x_l = ν_lp | C_j)^2 − Pr(x_l = ν_lp)^2 ).

Category utility is similar to the GINI index. It rewards clusters C_j for increases in predictability of the categorical attribute values ν_lp. Being incremental, COBWEB is fast with a complexity of O(tN), though it depends non-linearly on tree characteristics packed into a constant t. There is a similar incremental hierarchical algorithm for all numerical attributes called CLASSIT [88]. CLASSIT associates normal distributions with cluster nodes. Both algorithms can result in highly unbalanced trees.
Chiu et al. [47] proposed another conceptual or model-based approach to hierarchical clustering. This development contains several different useful features, such as the extension of scalability preprocessing to categorical attributes, outliers handling, and a two-step strategy for monitoring the number of clusters including BIC (defined below). A model associated with a cluster covers both numerical and categorical attributes and constitutes a blend of Gaussian and multinomial models. Denote the corresponding multivariate parameters by θ. With every cluster C we associate a logarithm of its (classification) likelihood

l_C = Σ_{x_i ∈ C} log(p(x_i | θ)).

The algorithm uses maximum likelihood estimates for the parameter θ. The distance between two clusters is defined (instead of a linkage metric) as the decrease in log-likelihood

d(C_1, C_2) = l_{C_1} + l_{C_2} − l_{C_1 ∪ C_2}

caused by merging of the two clusters under consideration. The agglomerative process continues until the stopping criterion is satisfied. As such, determination of the best k is automatic. This algorithm has a commercial implementation (in SPSS Clementine). The complexity of the algorithm is linear in N for the summarization phase.
Traditional hierarchical clustering does not change points membership in once assigned clusters due to its greedy approach: after a merge or a split is selected it is not refined. Though COBWEB does reconsider its decisions, its
Kinect 3D Scanning and Reconstruction Program: A Brief Introduction (Manual)
2nd International Conference on Machinery, Electronics and Control Simulation (MECS 2017)A Brief Talk on the 3D Scanning Reconstruction Program Based onKinect and its ApplicationWang Yongsheng1, a, Zhang Qizhi2, b,* and Liu Xiao2, c1 School of art and design, Lanzhou Jiaotong University, Lanzhou 730070, China2 School of art and design, Lanzhou Jiaotong University, Lanzhou 730070, Chinaa**************,b**********************,c***************Abstract: By combining the Kinect (a somatosensory device) with a supporter and a rotary wheel and using a 3D scanning software named KScan3D to realize a quick scan of the human body, this paper constructs two scanning systems, one of which is equipped with a handheld single-camera comparing to the other adopting multiple fixed cameras, so as to generate virtual 3D models and provide various applications, such as printing 3D objects, generating web pages or making animations and virtual exhibition models.Keywords: 3D scanning restructure; painted sculptures of Dunhuang; digitalization.1. IntroductionThe rapid growth of computer sciences and digital VR techniques has made it possible for using 3D scanning and printing technologies to realize digital 3D visual reconstruction. Such advancement has bee promptly applied to various areas including construction, medical care, archeology and industrial use. Compared to traditional modeling methods, 3D scanning and printing technologies are faster and more precise. Due to a wide range of utilities, human body modeling has become a desperately urgent need.2. 3D Scanning ReconstructionAs described during the experiment in this paper, data are first obtained through Kinect, a somatosensory camera sensor. Subsequently, the data are converted to a 3D mesh by using a software called Kscan3D. A mesh with 360 degree coverage can be achieved based on data showing different angles with the help of Kscan3D, which is capable of automatically capture and organize the 3D grip. Deleting unnecessary point from the data can smooth the data before the final combining and compositing to generate a complete model.2.1 Hardware Construction2.1.1Hardware ConstructionThis system is designed to scan people standing about 1.6 to 1.9 meters tall, which means the supporter needs to be 2.5 meters high at the same level with the Kinect. Each Kinect is set about 64cm away from the other and is responsible for scanning a part of the body. Then it is the software’s job to connect the separated scanning results. The Kinect on the highest position scans the head and shoulders, while the second is in charge of the arms. Similarly, the third shoots the waist and the forth scans the legs and the feet. If this person is short, 3 Kinects may be enough, using the first Kinect for scanning the head and shoulders, the second for arms and waist, and the third for legs and feet. This experiment is conducted by using 3 Kinects. Besides, a rotary wheel running at a uniform velocity is required and its loading capacity has to be adequate to supporting a person for full 360 degree scanning. The rotary wheel is 66cm away from the aluminum supporter as shown in Figure 1.Fig. 12.1.2 What is Kinect and how does it work:Kinect is a somatosensory camera sensor, providing multiple features, for instance, motion capture, image recognition, microphone input and voice recognition. Players are enabled to interact with virtual objects and other players in game as well as to share pictures and information with other XBOX users via internet. 
Kinect is first designed as a peripheral input equipment for the Xbox and later on Kinect for windows is developed for connecting with PC.How does Kinect work: Kinect is a combination of various sensors, which comprises a color camera,a pair of depth cameras (one is used to emit infrared ray, the other is to received the returned ray), an array of microphones and a motor base, Kinect is able to work within a range from 1.2 m to 3.5 m. Asof Kinect for Windows, the range of the camera is shortened to 40 cm to 3 m.The depth camera uses IR to obtain the depth value of each point corresponding to the image returned (the depth value is actually the distance from the object to the Kinect in mm with 5mm tolerance). The MIC array is actually a microphone arranged horizontally. Due to the gap, it can be used to determine where the speaker is and can reduce noises.First of all, the program has to detect the connected Kinect for initialization. Second, the program is required to configure and enable its desired data flow, i.e. the expected data, including color information, depth data, bone data flow and IR data stream. Next, the corresponding data flow is obtained in accordance with the previous setup. Finally, it is to stop the Kinect and release PC resources.2.1.3 Precautions about the scanned objectOn selecting an object for scanning, it is the first thing to do to think over its features, such as its size, shape, weight and surface material. Sensors of Kscan3D and Kinect/Xtion are capable of scanning a lot of objects, for example, human body, furniture, house or even bigger stuff. Due to the resolution limit, distance from the senor to the object is not supposed to be less than 40 cm and the scanned object not smaller than a grown man's shoe. Objects that are too thin may not have satisfactory scanning results. Generally speaking, the best object to be scanned should be in neutral colors with matte opaque materials. It is hard to scan things of surface materials that are whether dark, reflective, transparent or translucent. Therefore, preparations should be made before scanning. Environmental factors, including air, lights and movements, determine the success or failure of the experiment to a large extent. As of a physical object, it's a must to take its size into consideration and find a way to cover all the angles. For a small and light object, object is placed on the rotary wheel for scanning, while sensors are fixed. In general, the ambient light should cover all directions. As a matter of fact, sunlight may affect the working of IR sensors. It is not easy for scanning under sunlight. Therefore, if necessary, the scanning work is suggested to be done outside in a cloudy weather or be accomplished inside. During the scanning process, it is important to ensure the person or object motionless because movements may lead to a data acquisition failure.2.2Software IntroductionKScan3D is a 3D scanning software utilizing Kinect or Xtion to acquire point cloud data in a real-time manner, which allows multiple depth cameras to work simultaneously for scanning real objects and supports quick generating of complete 3D models.2.3Integration Solution to the 3D Scanning SystemThere are mainly two types of 3D scanning solutions. 
One is a handheld scanning system with a single camera, while the other adopts multiple fixed cameras.2.3.1 Handheld single-camera scanning systemScanning modes for the handheld single-camera scanning system: the single-camera system has two scanning modes, which are individual scanning and batch scanning.(1)Make sure the box of "enable batch scanning" within the scanning pane is not checked;(2)Set up an initial delay (seconds);(3)Turn the sensors towards the object and ensure the object can be seen from the real-time video feedback;(4)Click the SCAN buttonFigure 2 shows the scanning results.Fig. 2Batch scanning:(1)Check the box of "enable batch scanning" within the scanning pane;(2)Select the number of objects to be scanned(3)Set up a time delay between two scans(4)Set up an initial delay (seconds);(5)Turn the sensors towards the object and ensure the object can be seen from the real-time video feedback;(6)Click the SCAN buttonFigure 3 shows the scanning results.Fig. 32.3.2Fixed multiple-cameras system1. Start the KScan3D software and make sure all sensor are working, as shown in Figure 4.Fig. 42.Adjust the position of the sensorIf needed, it is Ok to move up or move down the position of the sensor in the list until the live response sequence complies with the sensor's actual position, as shown in Figure 5.Fig. 53. Capture an individual scan.Correctly set up the scanning options without checking the option for batch scanning. Adjust the parts to be scanned to the corresponding position and delete until the end, as shown in Figure 6 and Figure 7.Fig.6Fig.74. Capture a full-body scan(1)Capture a full-body scan. Set up to the Capture Only mode and check the box of batch scanning with the number of scans set to 40. The person should stand in the middle of the rotary wheel. Start the wheel before clicking the scan button as shown in Figure 8 and Figure 9.Fig. 8Fig.9(2)Click the mesh-editing button and select the first picture. Press ctrl+a to select all thumbnails. Click the Point Cloud panel to set up the alignment as shown in Figure 10.Fig.10(3)Click the Build (for compositing) button and Kscan3D can turn each point into a mesh. In the end, delete unnecessary data. Using the basic mesh editing functions can help complete and export a high-quality result.Fig.11(4)Click the Finalize (finish) button to get the final model. Select the Finalize button for a combined mesh. Combination is required to be completed before export. The three properties in the finish step are used to adjust parameters like mesh density, inner fill and decimation, as shown in Figure 12.Fig.122.4 As long as the model is generated, select the Export button to export the mesh or point cloud to the current file. KScan3D can export the following file formats: fbx, obj, ply, stl, asc and 3d3. (1)Select a format as shown in Figure 13Figure 13(2)Export dataIn the control panel, load the mesh or point cloud to be exported. Please note that combine mesh is not supported for export. Click the Export button on the tool bar and an interface of folder selection will pop up as shown in Figure 14.Fig. 14Select a specified folder or create a new folder to store the mesh or pint cloud as shown in Figure 15.Fig. 153. 3D printoutsTechnical support: 3D printing is a form of fast manufacturing technology. 
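Before turning to the fixed multi-camera configuration, the acquisition sequence summarized in section 2.1.2 (detect the connected sensor, enable the desired data streams, read frames, then stop and release resources) can be sketched as follows. This is an illustration only: `kinect_sdk` and every call on it are hypothetical stand-ins, since the manual does not give code or name a specific SDK binding.

```python
import kinect_sdk  # hypothetical wrapper module, not a real package name

def acquire_frames(num_frames=30):
    sensor = kinect_sdk.detect_sensor()              # step 1: detect the connected Kinect
    sensor.enable_streams(color=True, depth=True)    # step 2: enable the desired data flows
    frames = []
    for _ in range(num_frames):                      # step 3: read the configured streams
        frames.append({"color": sensor.read_color(),
                       "depth": sensor.read_depth()})
    sensor.close()                                   # step 4: stop the Kinect, release PC resources
    return frames
```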
By "slicing" the drawing of the object to print into numerous layers, the processor heats up and presses the filament PLA (a new type of biodegradable material as the raw material) by utilizing the fuses deposition molding technology (FDM) to process each layer under computer control so as to get the formed 3D object. This is the most commonly adopted 3D printing method featuring high technical maturity and low cost.Working steps of a 3D printer: use Kinect to scan the model and set up the printing parameters with the built-in software. Import the stl file from the memory card to the printer for printing. This paper uses a 3DA8 industrial printer manufactured by JGAURORA, which is able to print a maximum size of 350*250*300mm. The printing of large objects can be done through splicing. Coloring can be completed through spraying or manual painting.Models created during this experiment are shown in Figure 16, Figure 17and Figure 18.Fig.16Fig.17Fig.184. Online interaction demoBlender is a virtualization engine for fast modeling, which is often used by modeling for games. As an open-source WebGL framework, Blend4Web can be used to create interactive webpages. Without going out, a user can have the chance to view 3D human models on webpages, check for details by rotating angles or zooming and share it to other users. This is impossible to realize by traditional browsing methods.Through the Blender4web plugin, the scanned and finished human body model is allowed to be directly exported to a .html file. The Blender4web export mode is not included in the Blender so we have to manually install it: first, download the add-on corresponding to the Blender version from the official website for bledner4web. Then, in the Blender, click File->User Preferences->File-> scrips->the zip containing the bledner4web add-on-> ->Save User Settings. Next, exit the blend and open File->User Preferences->Addons->import-export->Blend4web. After that, the .html option can be seen in the File->Export. Figure 19shows an exported webpage.Fig.195. Applied to animation and virtual exhibitionUsing the 3D scanning technology to conduct a comprehensive human body scan, a 3D model of human body and materials can be obtained. With parameter adjustment to enrich the changes and combinations, once these color models are inputted into 3D animation software like 3dsmax, fast, accurate and vivid performance can be achieved as well as virtual exhibition, which has greatly boosted the diversified development of the animation industry.6. Conclusion3D scanning and printing technologies have made it possible to express human models in a perfect, accurate and quick manner by inputting the scanned 3D information into a computer. This paper utilizes two scanning methods for human body modeling and studies their feasibility by printing a human model. 
However, due to scanning angles, the model may have flaws affecting the accuracy of modeling, which need to be improved in the future.References[1]System of Online Fitting Room Based on Web [J], Yang Wenwen, Guo Jianan and Yang Xiaodong, Computer Era, 2015, 5[2]Application of 3D Scanning and Printing Technologies to the Repair of Crack Decorative Components [A], Jiang Yueju, Lv Haijun, Yang Xiaoyi, Xu Wei and Ma Xing Sheng, Construction Technology, 2016,12[3]Research on the 3D Scanning and Printing of Human Head [A], Song Junfeng, Shenyang University of Technology, 2016[4]Development of the 3D Printing Technology and its Software Implementation [J], Shi Yusheng, Zhang Chaobai, Bai Yu and Zhao Zuye, Science China Information, 2015[5]Present Research on the 3D Printing Technology and Critical Know-Hows [J], Journal of Material Engineering, Zhang Xuejun, Tang Siyi, Zhao Hengyue, Guo Shaoging, Li Neng, Sun Bingbing and Chen Bingqing, 2016, 2[6]Research on the Technology and Working Principle of Kinect [J], Shi Manyin, Natural science journal of Harbin Normal University, 2013, 6。
Dilute-Acid Hydrolysis of Biomass Hemicellulose
( 1. College of Biotechnology and Pharmaceutical Engineering ,Nanjing University of Technology , Nanjing 210009 ,China ; 2. State Key Laboratory of Materials-Oriented Chemical Engineering ,Nanjing University of Technology ,Nanjing 210009 ,China ; 3. College of Science ,Nanjing University of Technology ,Nanjing 210009 ,China ) Abstract Hemicellulose is the second largest component of lignocellulosic biomass. The conversion of
Lignocellulosic biomass consists mainly of cellulose, hemicellulose, and lignin, of which hemicellulose generally accounts for 20%–35%. Hemicellulose can also serve, for example, as a cholesterol inhibitor and as a tablet disintegrant. Compared with other methods (such as steam explosion and alkali treatment), dilute acid can effectively hydrolyze hemicellulose, converting 80%–90% of the hemicellulose sugars; it also favors the subsequent enzymatic saccharification of cellulose, and its cost is comparatively low [5].
(Figure 1. Simplified diagram of the basic structure of hemicellulose from gramineous plants; X: β-D-xyl…; other unit labels appearing in the original figure: GA, (MeO), L-Af, R, OAc.)
EEG Emotion Recognition Using a Convolutional Neural Network with Three-Dimensional Input
情感在人们的日常生活中起着至关重要的作用。
目前,情感识别的研究对象有文本、语音、脑电以及其他的生理信号等。
情感识别已经成为人工智能、计算机科学和医疗健康等领域的研究重点。
早期的情感识别主要是基于面部表情或者语音来进行识别,后来有基于心率、肌电、呼吸等外围生理信号进行情感识别。
与上述的方式相比,脑电(Electroencephalogram,EEG)信号作为中枢神经生理信号,其不会因为人们的主观因素而受到影响,更能够客观真实地反映人们当前的情感状态。
因此,近年来脑电信号被广泛应用于情感识别研究领域[1]。
由于在大数据集上,使用深度学习通常比使用机器学习所取得的效果更优,其已成功应用在计算机视觉、语音识别、自然语言处理等各个领域,因此受到了各个三维输入卷积神经网络脑电信号情感识别蔡冬丽,钟清华,朱永升,廖金湘,韩劢之华南师范大学物理与电信工程学院,广州510006摘要:为了保留电极之间的空间信息以及充分提取脑电信号(Electroencephalogram,EEG)特征,提高情感识别的准确率,提出一种基于三维输入卷积神经网络的特征学习和分类算法。
采用单熵(近似熵(Approximate Entropy,ApEn)、排列熵(Permutation Entropy,PeEn)和奇异值分解熵(Singular value decomposition Entropy,SvdEn))以及其组合熵特征,分别在DEAP数据集进行效价和唤醒度两个维度上的情感识别实验。
实验结果表明,采用组合熵特征比单熵特征在情感识别实验中准确率有显著提高。
最高组合熵特征平均准确率在效价和唤醒度上分别为94.14%和94.44%,比最高单熵特征平均准确率分别提高了5.05个百分点和4.49个百分点。
关键词:脑电信号;情感识别;近似熵;排列熵;奇异值分解熵;卷积神经网络;组合特征文献标志码:A中图分类号:TP391doi:10.3778/j.issn.1002-8331.1912-0126EEG Emotion Recognition Using Convolutional Neural Network with3D InputCAI Dongli,ZHONG Qinghua,ZHU Yongsheng,LIAO Jinxiang,HAN MaizhiSchool of Physics and Telecommunication Engineering,South China Normal University,Guangzhou510006,China Abstract:In order to preserve the spatial information between the electrodes and fully extract the characteristics ofElectroencephalogram(EEG)and improve the accuracy of emotion recognition,a feature learning and classification algo-rithm based on convolutional neural network with3D input is proposed.The single entropy(approximate entropy,permu-tation entropy and singular value decomposing entropy)and its combined entropy characteristics are used to perform emotion recognition experiments in the DEAP dataset on the two dimensions of valenceand arousal.The experimental results show that the accuracy of the combined entropy feature is significantly higher than that of the single entropy feature in the emotion recognition experiments.The average accuracy of the highest combined entropy characteristics are 94.14%and94.44%in the valence and arousal,respectively,which are5.05percentage points and4.49percentage points higher than the average accuracy of the highest single entropy.Key words:Electroencephalogram(EEG);emotion recognition;approximate entropy;permutation entropy;singular val-ue decomposition entropy;convolutional neural network;combined features基金项目:国家自然科学基金(61871433);广东省优秀青年教师培养计划资助项目(YQ2015046);广州市珠江科技新星资助项目(201610010199)。
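As a reading aid, the sketch below shows one plausible NumPy implementation of the three entropy features named above (ApEn, PeEn, SvdEn), computed per EEG channel. The embedding parameters, the window length, and the way per-channel values would later be arranged onto a 2D electrode grid to build the 3D input are illustrative assumptions, not the settings reported by the authors.

```python
# Per-channel entropy features, as one possible implementation (assumed
# parameters; not the authors' exact settings).
import math
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)          # a common default tolerance

    def phi(k):
        emb = np.array([x[i:i + k] for i in range(n - k + 1)])
        # Chebyshev distance between every pair of embedded vectors.
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.mean(dist <= r, axis=1)   # includes the self-match, as usual
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

def permutation_entropy(x, order=3, delay=1):
    """Permutation entropy (PeEn), normalized to [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    patterns = np.array([np.argsort(x[i:i + (order - 1) * delay + 1:delay])
                         for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)) / np.log2(math.factorial(order)))

def svd_entropy(x, order=3, delay=1):
    """Singular value decomposition entropy (SvdEn) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    emb = np.array([x[i:i + (order - 1) * delay + 1:delay] for i in range(n)])
    s = np.linalg.svd(emb, compute_uv=False)
    s = s / s.sum()
    s = s[s > 0]                      # avoid log(0) for rank-deficient windows
    return float(-np.sum(s * np.log2(s)))

# Example: one synthetic "channel" (ApEn is O(n^2), so keep its window short).
sig = np.random.randn(128 * 4)        # e.g. a 4-second window at 128 Hz
features = [approximate_entropy(sig[:512]),
            permutation_entropy(sig),
            svd_entropy(sig)]
```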
Extracting Behavior Data in the kinetics-skeleton Format
The kinetics-skeleton format is a widely used representation for behavior-data extraction: it captures human action information from video. This article takes kinetics-skeleton behavior-data extraction as its topic and describes its principle, steps, and applications.
1. Overview of the kinetics-skeleton format
The kinetics-skeleton format is a data format for action recognition and motion analysis. It represents human motion as keypoints; by tracking and analyzing these keypoints, the pose and motion of the body at different points in time can be obtained. A kinetics-skeleton record mainly contains the following elements: video ID, video URL, frame rate (FPS), and the list of keypoints.
2. Principle
Extraction of kinetics-skeleton data is based on human pose estimation: computer-vision and machine-learning algorithms detect and track human keypoints in video. Two families of pose-estimation methods are common: 2D-keypoint methods, which extract 2D keypoint coordinates from the video, and 3D-keypoint methods, which build on these to recover 3D coordinates.
3. Steps
1. Data preprocessing. First, prepare the videos used for behavior-data extraction. They can be recordings of real scenes or synthetically generated clips. To improve the accuracy of the algorithms, the preprocessing stage may also include denoising, cropping, and resolution adjustment.
2. Human pose estimation. Next, run a pose-estimation algorithm on the video. This can be a traditional computer-vision method or a deep neural network; commonly used deep models include OpenPose, HRNet, and AlphaPose. These models take video frames as input and output the coordinates of each keypoint.
3. Keypoint tracking. After the initial keypoint coordinates are obtained, they must be tracked so that they remain consistent across the whole video sequence. Tracking can use classical optical flow or image-feature matching; the goal is to maintain correct correspondences of keypoints between consecutive frames.
4. Data formatting. After keypoint tracking, the per-frame keypoint coordinates are assembled into the kinetics-skeleton format.
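A minimal sketch of this formatting step is given below. It assumes the JSON layout popularized by the ST-GCN kinetics-skeleton release (fields data / frame_index / skeleton / pose / score / label / label_index) and an 18-joint skeleton; field names and joint counts may differ in other pipelines.

```python
# Assemble per-frame keypoints into a kinetics-skeleton-style JSON record
# (field layout assumed from the ST-GCN release; adjust if yours differs).
import json

def to_kinetics_skeleton(frames, label, label_index, num_joints=18):
    """frames: list of per-frame detections; each detection is a list of
    persons, and each person is a list of (x, y, score) tuples with x and y
    already normalized to [0, 1]."""
    data = []
    for idx, persons in enumerate(frames, start=1):
        skeletons = []
        for person in persons:
            pose, score = [], []
            for (x, y, s) in person[:num_joints]:
                pose.extend([round(x, 3), round(y, 3)])
                score.append(round(s, 3))
            skeletons.append({"pose": pose, "score": score})
        data.append({"frame_index": idx, "skeleton": skeletons})
    return {"data": data, "label": label, "label_index": label_index}

# Example: two frames, each with one detected person of 18 joints
# ("hypothetical_action" is a made-up label for illustration).
dummy_person = [(0.5, 0.5, 0.9)] * 18
record = to_kinetics_skeleton([[dummy_person], [dummy_person]],
                              label="hypothetical_action", label_index=0)
with open("clip_000001.json", "w") as f:
    json.dump(record, f)
```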
II. Common name, functional class, dosage, and scope of use
The glucose isomerase (GI) preparation described here is produced by fermentation of a recombinant Streptomyces rubiginosus production strain that carries multiple copies of the GI gene encoding glucose isomerase from S. rubiginosus itself.
Common name: Glucose Isomerase
Functional class: processing aid; enzyme preparation for the food industry
Scope of use: industrial production of high-fructose syrup from corn and other starch raw materials
Dosage: used in appropriate amounts as required by production
Source (production strain): Streptomyces rubiginosus
Donor: Streptomyces rubiginosus
Systematic name: D-xylose aldose-ketose-isomerase
Other names in use: glucose isomerase; xylose isomerase; D-xylose ketoisomerase; D-xylose ketol-isomerase
CAS No.: 9055-00-9
EC No.: 5.3.1.5
Trade name:
III. Evidence that the use is technically necessary and effective
3.1 Functional class and mechanism of action of glucose isomerase from Streptomyces rubiginosus
3.1.1 Functional class. As a processing aid in the food industry, glucose isomerase is generally used in immobilized form for the industrial production of high-fructose corn syrup (HFCS), isomerizing glucose obtained from enzymatic hydrolysis of corn starch into the sweeter fructose.
The application of glucose isomerase in HFCS production is shown in the figure below.
In the food-industry process for edible syrup, starch from corn and other grains is liquefied, saccharified, and purified to give a glucose syrup consisting mainly of glucose with small amounts of oligosaccharides such as maltose, isomaltose, and panose. This glucose syrup is passed at a controlled flow rate through a column packed with immobilized glucose isomerase, where part of the glucose is isomerized to fructose, and the product is high-fructose syrup.
High-fructose corn syrup is roughly a 1:1 mixture of glucose and fructose, about 1.3 times as sweet as sucrose, and has been used as a sweetener for diabetics.
Extracting Behavior Data in the kinetics-skeleton Format
The Kinetics Skeleton behavior-data extraction method is a computer-vision technique for extracting poses and actions from video. By analyzing every frame of a video sequence and detecting and tracking human joint keypoints, it produces a sequence of human poses and actions. The method is widely used in human behavior recognition and analysis, human-computer interaction, virtual reality, and related fields.
First, the basic principle. The method relies on deep learning and computer vision: a pretrained deep-learning model converts every frame of the video into a skeleton graph containing the human keypoints. A skeleton graph consists of joints and the connections between them; each joint represents a specific body part (for example the head, shoulders, arms, or legs), and the connections encode the spatial relations between joints.
Second, the required tools and data. Extraction in the Kinetics Skeleton format requires a programming language and computer-vision library suited to video processing, such as Python and OpenCV, as well as video data, which can be obtained from public datasets such as Kinetics or other action-recognition datasets.
Next, the main steps of the extraction.
Step 1: Data preprocessing. Before extraction, the video data must be preprocessed: reading the video file, adjusting its resolution, cropping, and so on. The goal of preprocessing is to ensure the quality and uniformity of the video sequence for the subsequent keypoint detection and pose tracking.
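A small OpenCV sketch of this preprocessing step follows; the target size and the crop box are illustrative values, not requirements of the format.

```python
# Read a video, optionally crop a region of interest, and resize each frame.
import cv2

def preprocess_video(path, size=(340, 256), crop=None):
    """Yield resized (and optionally cropped) frames from a video file."""
    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if crop is not None:           # crop = (x, y, w, h)
                x, y, w, h = crop
                frame = frame[y:y + h, x:x + w]
            yield cv2.resize(frame, size)
    finally:
        cap.release()

# Example (hypothetical file name): iterate over the frames of a clip.
# for frame in preprocess_video("clip_000001.mp4"):
#     ...run keypoint detection on `frame`...
```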
Step 2: Keypoint detection. Keypoint detection is the central step of Kinetics Skeleton extraction: a pretrained deep-learning model detects the human keypoints in every frame and outputs them as coordinate points for the subsequent pose tracking.
Step 3: Pose tracking. After keypoint detection, the keypoints must be tracked. The purpose of pose tracking is to link the keypoints across the video sequence so that they form skeleton graphs; a simple association rule is sketched below.
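The sketch assumes keypoints normalized to [0, 1] and uses mean Euclidean distance for greedy frame-to-frame matching; real pipelines often rely on optical flow or appearance cues instead, so this is only an illustration.

```python
# Greedy nearest-neighbor association of skeletons between consecutive frames.
import numpy as np

def match_skeletons(prev, curr, max_dist=0.1):
    """prev, curr: arrays of shape (num_people, num_joints, 2) holding
    normalized (x, y) keypoints. Returns, for each skeleton in the current
    frame, the index of its match in the previous frame, or -1 if new."""
    assignments = []
    used = set()
    for skel in curr:
        # Mean Euclidean distance to every skeleton of the previous frame.
        dists = np.linalg.norm(prev - skel, axis=2).mean(axis=1)
        match = -1
        for j in np.argsort(dists):
            if dists[j] <= max_dist and j not in used:
                match = int(j)
                used.add(j)
                break
        assignments.append(match)
    return assignments
```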
A Fragile Digital Watermarking Algorithm in the Wavelet Domain Based on Chaotic Mapping
GE Weiwei; CUI Zhiming; WU Jian
Journal: Computer Engineering and Design
Year (volume), issue: 2008, 029(008)
Abstract: Combining wavelet-transform techniques, a new fragile watermarking algorithm is proposed. To improve its security, watermark generation and detection are mapped onto every wavelet coefficient, chaos theory is applied, and blind detection is achieved at extraction time. Experimental results show that, while keeping the watermark invisible, the method can precisely verify changes to every pixel of the image data, and thus provides good authentication capability.
Pages: 3 (P2137-2139)
Authors: GE Weiwei; CUI Zhiming; WU Jian
Affiliation: Institute of Intelligent Information Processing, Soochow University, Suzhou 215006, Jiangsu, China
Language: Chinese
CLC number: TP309.7
Related literature:
1. A semi-fragile image watermarking algorithm based on chaotic mapping [J], LI Zhaohong, HOU Jianjun, SONG Wei
2. An adaptive wavelet-domain digital watermarking algorithm based on chaotic mapping [J], HUANG Guanle
3. A fragile DCT-domain digital watermarking algorithm based on the Logistic chaotic map [J], LI Zhaohong, HOU Jianjun
4. A wavelet-domain digital watermarking algorithm based on chaotic mapping [J], XUE Qin, PENG Jinye
5. A fragile DWT digital watermarking algorithm based on the 2-D Logistic chaotic map [J], CHEN Shanxue, PENG Juan, LI Fangwei
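The abstract above describes the algorithm only at a high level. The sketch below shows one way a chaos-based fragile watermark can be embedded into wavelet coefficients: a logistic-map keystream is forced into the least-significant bit of quantized HH detail coefficients and checked again at detection time. It is an illustrative reconstruction under those assumptions, not the authors' published scheme; in particular, the quantization step q must be large enough that rounding the marked image to 8-bit pixels does not disturb the embedded bits.

```python
# Illustrative chaos-based fragile watermark in the wavelet domain
# (logistic-map keystream in the LSB of quantized HH coefficients).
import numpy as np
import pywt

def logistic_bits(n, x0=0.7, mu=3.99):
    """Generate n chaotic bits from the logistic map x <- mu*x*(1-x)."""
    bits = np.empty(n, dtype=np.uint8)
    x = x0
    for i in range(n):
        x = mu * x * (1.0 - x)
        bits[i] = 1 if x > 0.5 else 0
    return bits

def embed(image, key=0.7, q=8):
    """Embed a fragile watermark into the HH subband of a grayscale image."""
    ll, (lh, hl, hh) = pywt.dwt2(image.astype(float), "haar")
    coeffs = np.round(hh / q).astype(np.int64)           # quantize HH
    bits = logistic_bits(coeffs.size, x0=key).reshape(coeffs.shape)
    coeffs = (coeffs & ~1) | bits                        # force LSB = chaotic bit
    return pywt.idwt2((ll, (lh, hl, coeffs.astype(float) * q)), "haar")

def detect(image, key=0.7, q=8):
    """Return a boolean map of HH positions whose LSB no longer matches."""
    _, (_, _, hh) = pywt.dwt2(image.astype(float), "haar")
    coeffs = np.round(hh / q).astype(np.int64)
    bits = logistic_bits(coeffs.size, x0=key).reshape(coeffs.shape)
    return (coeffs & 1) != bits                          # True where tampered
```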
A 3D Kidney Image Structure Segmentation Model Based on Asymmetric Multi-Decoders and Attention Modules
In medical imaging, structural segmentation of 3D kidney images is a key task that plays an important role in assisting physicians with diagnosis and treatment. To improve segmentation accuracy and efficiency, this paper proposes a 3D kidney image structure segmentation model based on asymmetric multi-decoders and attention modules.
1. Introduction
Medical imaging is an indispensable part of modern medicine, and image segmentation plays a crucial role within it. For kidney lesions in particular, accurate segmentation results are important for physicians' diagnostic and treatment decisions.
2. Related work
Many deep-learning methods have been applied to medical image segmentation, and U-Net together with its improved variants has become one of the mainstream approaches. The traditional U-Net, however, suffers from problems such as unbalanced information flow and loss of fine detail.
3. Model design
To overcome the problems of the traditional U-Net, we propose a 3D kidney image structure segmentation model based on asymmetric multi-decoders and attention modules. The model uses an encoder-decoder architecture: the encoder extracts high-level semantic features from the input volume, and the decoders reconstruct these features into the segmentation result. To better capture features at different scales, we design an asymmetric multi-decoder structure in which each decoder is responsible for a different scale and a fusion module integrates them into the final segmentation. In addition, we introduce attention modules to emphasize important feature regions. Each attention module consists of an attention gating mechanism and a feature-fusion layer: the gating mechanism re-weights the feature maps to highlight important regions, and the fusion layer merges the weighted features into the final segmentation result.
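A minimal PyTorch sketch of such an attention gate is shown below, in the spirit of additive attention gates for 3D features. The channel sizes and the gate's exact placement within the asymmetric multi-decoder network are assumptions for illustration; this is not the authors' implementation.

```python
# Additive attention gate for 3D feature maps (illustrative sketch).
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv3d(gate_ch, inter_ch, kernel_size=1)   # gating signal
        self.w_x = nn.Conv3d(skip_ch, inter_ch, kernel_size=1)   # skip features
        self.psi = nn.Sequential(
            nn.Conv3d(inter_ch, 1, kernel_size=1),
            nn.Sigmoid(),                                        # attention map in [0, 1]
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # g: decoder (gating) features, x: encoder skip features,
        # both already at the same spatial resolution.
        attn = self.psi(self.relu(self.w_g(g) + self.w_x(x)))
        return x * attn                                          # re-weighted skip features

# Example: gate a 32-channel skip connection with a 64-channel decoder map.
gate = AttentionGate3D(gate_ch=64, skip_ch=32, inter_ch=16)
g = torch.randn(1, 64, 16, 32, 32)
x = torch.randn(1, 32, 16, 32, 32)
out = gate(g, x)   # shape (1, 32, 16, 32, 32)
```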
4. Experiments and results
We evaluated the proposed model on a public kidney imaging dataset. Compared with the traditional U-Net and other improved models, the experimental results show significant gains in both accuracy and efficiency: our model segments kidney structures more accurately while remaining computationally efficient.
5. Conclusion
This paper proposes a 3D kidney image structure segmentation model based on asymmetric multi-decoders and attention modules. Experiments show that the model performs well on the kidney segmentation task. In future work we will further improve the model, extend its applicability to more medical imaging tasks, and move it toward clinical practice.
A Study of Several Factors Affecting the Maillard Reaction
(Modern Food Science and Technology, 2010, Vol. 26, No. 5)
1.3.3 Reaction of lysine with sugars of different kinds and concentrations
Appropriate amounts of lysine and the five sugars were weighed and made up to 0.1 mol/L with pH 10.0 buffer. Sugar solution (2 mL, 4 mL, 6 mL, and 8 mL) was added to four test tubes, 2 mL of 0.1 mol/L lysine solution was added to each tube, and buffer was then added to bring each reaction system to 10 mL at pH 10.0. The tubes were heated in a 100 °C water bath, removed after 40 min, cooled to 25 °C, and the absorbance was measured; the tubes were then kept at room temperature for 4 h and for 24 h and the corresponding absorbances were measured again.
… stored in a brown bottle in the dark until use. A measured volume of the reaction solution was pipetted into a stoppered test tube and heated in a 100 °C water bath for 1 h; at the end of the reaction it was cooled with water to 25 °C and the absorbance was measured. … repeated and averaged.
2 Results and Discussion
2.1 Effect of pH on the reaction
As seen from Fig. 1, when the pH is 5.0-7.0 the Maillard reaction … increases, with two abrupt changes at pH 8.0 and pH 10.0, relatively little change over pH 9.0-10.0, and a plateau over pH 11.0-12.0. The effect of pH on the Maillard reaction is generally explained as follows [7-9]: … proceed; therefore, under such conditions the Maillard reaction is not significant. (2) Under alkaline conditions, because of the influence of the neighbouring N atom, the electron density at C1 of the sugar increases, which makes 1,2-enolization difficult, so 2,3-enolization generally takes place in alkaline media.
Fig. 2 Effect of temperature and time on the Maillard reaction
As seen from Fig. 2, the absorbance after heating for the same time differs with temperature; overall, the absorbance increases as the temperature rises and as the heating …
Research Proposal: Application of D-S Evidence Theory to Target Recognition and Detection
1. Background
Target recognition and detection has long been one of the hot and difficult topics in computer vision. In practical applications, variations in target shape, size, and orientation, as well as occlusion, noise, and illumination changes, all affect the accuracy and effectiveness of recognition and detection, so improving the robustness and accuracy of target recognition and detection remains a difficult problem. Dempster-Shafer (D-S) evidence theory is widely applied in this area: it can effectively handle occlusion, multiple scales, and multiple viewpoints, and thereby improve the accuracy and robustness of detection and recognition. This study therefore investigates the application of D-S evidence theory to target recognition and detection.
2. Research content
This study aims to explore the application of D-S evidence theory to target recognition and detection and to address some open problems in this area. The specific content is as follows:
(1) The principles and basic concepts of D-S evidence theory (a small combination-rule sketch follows this list).
(2) The problems that arise in target recognition and detection, and how D-S evidence theory can address them.
(3) Experimental validation of D-S evidence theory in practical target recognition and detection applications.
(4) Summary and analysis of the results.
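For item (1), the core operation of D-S evidence theory is Dempster's rule of combination. The sketch below fuses two basic probability assignments over a frame of discernment and renormalizes away the conflicting mass; the "car"/"truck"/"background" frame and the two sources are a made-up example, not taken from the proposal.

```python
# Dempster's rule of combination for two mass functions.
def combine(m1, m2):
    """m1, m2: dicts mapping frozenset hypotheses to masses summing to 1."""
    combined = {}
    conflict = 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2            # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Two hypothetical sources giving evidence about the same object.
m_detector = {frozenset({"car"}): 0.6, frozenset({"car", "truck"}): 0.3,
              frozenset({"car", "truck", "background"}): 0.1}
m_tracker = {frozenset({"car"}): 0.5, frozenset({"truck"}): 0.2,
             frozenset({"car", "truck", "background"}): 0.3}
fused = combine(m_detector, m_tracker)     # e.g. fused[frozenset({"car"})]
```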
3. Research objectives
The objective of this study is to elaborate the application of D-S evidence theory to target recognition and detection and, at both the theoretical and the practical level, to address the common problems in this area and demonstrate the effectiveness of the theory. The study also aims to advance research in the field of target recognition and detection.
4. Research methods
This study combines theoretical analysis with experimental validation. First, a literature review analyzes the open problems in target recognition and detection and the state of research on D-S evidence theory. Then the application of D-S evidence theory to target recognition and detection is explored. Finally, experiments are designed to verify the effectiveness of D-S evidence theory in this setting.
5. Significance
This study aims to improve the accuracy and robustness of target recognition and detection so as to better meet the needs of practical applications. By exploring the application of D-S evidence theory to target recognition and detection, it provides a theoretical basis and experimental evidence for research in related fields and promotes progress in this area.
D-xylose Kinetics and Hydrogen Breath Tests in Functionally Anephric Patients Using the 15-gram Dose. Craig, Robert; Carlson, Stephen; Ehrenpreis, Eli. Journal of Clinical Gastroenterology. 31(1):55-59, July 2000. © 2000 Lippincott Williams & Wilkins, Inc.
Items included in this export:
Equation 1
FIG. 1. Pharmacokinetic model of d-xylose. The gastrointestinal (GI), central, and peripheral compartments are in circles. The rate constant for nonabsorptive loss is Ko. The rate constant for absorption is Ka. The intercompartmental rate constant is Kic.
TABLE 1. Patient characteristics
TABLE 2. Patient data, 15 g d-xylose, chronic renal failure
TABLE 3. Comparison of means and ranges for kinetic parameters following 15 g d-xylose (current study) and 25 g d-xylose dosing (ref. 1) in patients with chronic renal failure (CRF)
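The FIG. 1 caption specifies a three-compartment structure (GI, central, peripheral) with rate constants Ka (absorption), Ko (nonabsorptive loss), and Kic (intercompartmental exchange). Purely as a reading aid, a first-order mass-balance system consistent with that caption is sketched below; the paper's actual Equation 1, including any elimination term from the central compartment, is not reproduced in this export, so this is a hedged reconstruction rather than the authors' equation.

\[
\begin{aligned}
\frac{dA_{GI}}{dt} &= -(K_a + K_o)\,A_{GI},\\
\frac{dA_C}{dt} &= K_a\,A_{GI} + K_{ic}\,A_P - K_{ic}\,A_C - (\text{elimination, rate constant not given in the caption}),\\
\frac{dA_P}{dt} &= K_{ic}\,A_C - K_{ic}\,A_P,
\end{aligned}
\]

where A_GI, A_C, and A_P denote the amounts of d-xylose in the gastrointestinal, central, and peripheral compartments.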