H. DLV-HEX Dealing with Semantic Web under Answer-Set Programming
(subject to the weak annotation constraints), and optimizing the DCNN parameters using stochastic gradient descent (SGD). When we only have access to image-level annotated training data, we achieve 39.6%, close to [31] but without relying on any external objectness or segmentation module. More importantly, our EM approach also excels in the semi-supervised scenario, which is very important in practice. Having access to a small number of strongly (pixel-level) annotated images and a large number of weakly (bounding box or image-level) annotated images, the proposed algorithm can almost match the performance of the fully-supervised system. For example, having access to 2.9k pixel-level images and 9k image-level annotated images yields 68.5%, only 2% inferior to the performance of the system trained with all 12k images strongly annotated at the pixel level. Finally, we show that using additional weak or strong annotations from the MS-COCO dataset can further improve results, yielding 73.9% on the PASCAL VOC 2012 benchmark.

Contributions In summary, our main contributions are:
1. We present EM algorithms for training with image-level or bounding box annotation, applicable to both the weakly-supervised and semi-supervised settings.
2. We show that our approach achieves excellent performance when combining a small number of pixel-level annotated images with a large number of image-level or bounding box annotated images, nearly matching the results achieved when all training images have pixel-level annotations.
3. We show that combining weak or strong annotations across datasets yields further improvements. In particular, we reach 73.9% IOU performance on PASCAL VOC 2012 by combining annotations from the PASCAL and MS-COCO datasets.

2. Related work
Training segmentation models with only image-level labels has been a challenging problem in the literature [12, 36, 37, 39]. Our work is most related to other recent DCNN models such as [30, 31], who also study the weakly supervised setting. They both develop
MIL-based algorithms for the problem. In contrast, our model employs an EM algorithm, which similarly to [26] takes into account the weak labels when inferring the latent image segmentations. Moreover, [31] proposed to smooth the prediction results by region proposal algorithms, e.g., CPMC [3] and MCG [1], learned on pixel-segmented images. Neither [30, 31] cover the semi-supervised setting.

Bounding box annotations have been utilized for semantic segmentation by [38, 42], while [15, 21, 40] describe schemes exploiting both image-level labels and bounding box annotations. [4] attained human-level accuracy for car segmentation by using 3D bounding boxes. Bounding box annotations are also commonly used in interactive segmentation [22, 33]; we show that such foreground/background segmentation methods can effectively estimate object segments accurate enough for training a DCNN semantic segmentation system. Working in a setting very similar to ours, [9] employed MCG [1] (which requires training from pixel-level annotations) to infer object masks from bounding box labels during DCNN training.

Figure 1. DeepLab model training from fully annotated images.

3. Proposed Methods
We build on the DeepLab model for semantic image segmentation proposed in [5]. This uses a DCNN to predict the label distribution per pixel, followed by a fully-connected (dense) CRF [19] to smooth the predictions while preserving image edges. In this paper, we focus for simplicity on methods for training the DCNN parameters from weak labels, only using the CRF at test time. Additional gains can be obtained by integrated end-to-end training of the DCNN and CRF parameters [41, 6].

Notation We denote by $x$ the image values and $y$ the segmentation map. In particular, $y_m \in \{0, \dots, L\}$ is the pixel label at position $m \in \{1, \dots, M\}$, assuming that we have the background as well as $L$ possible foreground labels and $M$ is the number of pixels. Note that these pixel-level labels may not be
visible in the training set. We encode the set of image-level labels by $z$, with $z_l = 1$ if the $l$-th label is present anywhere in the image, i.e., if $\sum_m [y_m = l] > 0$.

3.1. Pixel-level annotations
In the fully supervised case illustrated in Fig. 1, the objective function is
$$J(\theta) = \log P(y \mid x; \theta) = \sum_{m=1}^{M} \log P(y_m \mid x; \theta), \tag{1}$$
where $\theta$ is the vector of DCNN parameters. The per-pixel label distributions are computed by
$$P(y_m \mid x; \theta) \propto \exp\big(f_m(y_m \mid x; \theta)\big), \tag{2}$$
where $f_m(y_m \mid x; \theta)$ is the output of the DCNN at pixel $m$. We optimize $J(\theta)$ by mini-batch SGD.

3.2. Image-level annotations
When only image-level annotation is available, we can observe the image values $x$ and the image-level labels $z$, but the pixel-level segmentations $y$ are latent variables. We have the following probabilistic graphical model:
$$P(x, y, z; \theta) = P(x) \left( \prod_{m=1}^{M} P(y_m \mid x; \theta) \right) P(z \mid y). \tag{3}$$
We pursue an EM approach in order to learn the model parameters $\theta$ from training data. If we ignore terms that do not depend on $\theta$, the expected complete-data log-likelihood given the previous parameter estimate $\theta'$ is
$$Q(\theta; \theta') = \sum_{y} P(y \mid x, z; \theta') \log P(y \mid x; \theta) \approx \log P(\hat{y} \mid x; \theta), \tag{4}$$
where we adopt a hard-EM approximation, estimating in the E-step of the algorithm the latent segmentation by
$$\hat{y} = \arg\max_{y} P(y \mid x; \theta') P(z \mid y) \tag{5}$$
$$= \arg\max_{y} \log P(y \mid x; \theta') + \log P(z \mid y) \tag{6}$$
$$= \arg\max_{y} \left( \sum_{m=1}^{M} f_m(y_m \mid x; \theta') + \log P(z \mid y) \right). \tag{7}$$
In the M-step of the algorithm, we optimize $Q(\theta; \theta') \approx \log P(\hat{y} \mid x; \theta)$ by mini-batch SGD similarly to (1), treating $\hat{y}$ as ground truth segmentation.

Algorithm 1 Weakly-Supervised EM (fixed bias version)
Input: Initial CNN parameters $\theta'$, potential parameters $b_l$, $l \in \{0, \dots, L\}$, image $x$, image-level label set $z$.
E-Step: For each image position $m$
1: $\hat{f}_m(l) = f_m(l \mid x; \theta') + b_l$, if $z_l = 1$
2: $\hat{f}_m(l) = f_m(l \mid x; \theta')$, if $z_l = 0$
3: $\hat{y}_m = \arg\max_l \hat{f}_m(l)$
M-Step:
4: $Q(\theta; \theta') = \log P(\hat{y} \mid x, \theta) = \sum_{m=1}^{M} \log P(\hat{y}_m \mid x, \theta)$
5: Compute $\nabla_\theta Q(\theta; \theta')$ and use SGD to update $\theta'$.

To completely identify the E-step (7), we need to specify the observation model $P(z \mid y)$. We have experimented with two
variants, EM-Fixed and EM-Adapt.

EM-Fixed In this variant, we assume that $\log P(z \mid y)$ factorizes over pixel positions as
$$\log P(z \mid y) = \sum_{m=1}^{M} \phi(y_m, z) + \text{(const)}, \tag{8}$$
allowing us to estimate the E-step segmentation at each pixel separately
$$\hat{y}_m = \arg\max_{y_m} \hat{f}_m(y_m), \qquad \hat{f}_m(y_m) \doteq f_m(y_m \mid x; \theta') + \phi(y_m, z). \tag{9}$$
We assume that
$$\phi(y_m = l, z) = \begin{cases} b_l & \text{if } z_l = 1 \\ 0 & \text{if } z_l = 0 \end{cases} \tag{10}$$
We set the parameters $b_l = b_{fg}$, if $l > 0$ and $b_0 = b_{bg}$, with $b_{fg} > b_{bg} > 0$. Intuitively, this potential encourages a pixel to be assigned to one of the image-level labels $z$. We choose $b_{fg} > b_{bg}$, boosting present foreground classes more than the background, to encourage full object coverage and avoid a degenerate solution of all pixels being assigned to background. The procedure is summarized in Algorithm 1 and illustrated in Fig. 2.

Figure 2. DeepLab model training using image-level labels.

EM-Adapt In this method, we assume that $\log P(z \mid y) = \phi(y, z) + \text{(const)}$, where $\phi(y, z)$ takes the form of a cardinality potential [23, 32, 35]. In particular, we encourage at least a $\rho_l$ portion of the image area to be assigned to class $l$, if $z_l = 1$, and enforce that no pixel is assigned to class $l$, if $z_l = 0$. We set the parameters $\rho_l = \rho_{fg}$, if $l > 0$ and $\rho_0 = \rho_{bg}$. Similar constraints appear in [10, 20].
In practice, we employ a variant of Algorithm 1. We adaptively set the image- and class-dependent biases $b_l$ so that the prescribed proportion of the image area is assigned to the background or foreground object classes. This acts as a powerful constraint that explicitly prevents the background score from prevailing in the whole image, also promoting higher foreground object coverage. The detailed algorithm is described in the supplementary material.

EM vs. MIL It is instructive to compare our EM-based approach with two recent Multiple Instance Learning (MIL) methods for learning semantic image segmentation models [30, 31]. The method in [30] defines an MIL classification objective based on the per-class spatial maximum of the local label distributions of (2), $\hat{P}(l \mid x; \theta) \doteq \max_m P(y_m = l \mid x; \theta)$, and [31] adopts a softmax function. While this approach has worked well for image classification tasks [28, 29], it is less suited for segmentation as it does not promote full object coverage: the DCNN becomes tuned to focus on the most distinctive object parts (e.g., human face) instead of capturing the whole object (e.g., human body).

Figure 3. DeepLab model training from bounding boxes.

3.3. Bounding Box Annotations
We explore three alternative methods for training our segmentation model from labeled bounding boxes.
The first Bbox-Rect method amounts to simply considering each pixel within the bounding box as a positive example for the respective object class. Ambiguities are resolved by assigning pixels that belong to multiple bounding boxes to the one that has the smallest area.
The bounding boxes fully surround objects but also contain background pixels that contaminate the training set with false positive examples for the respective object classes. To filter out these background pixels, we have also explored a second Bbox-Seg method in which we perform automatic foreground/background segmentation. To perform this segmentation, we use the same CRF as in DeepLab. More specifically, we constrain the center area of the bounding box ($\alpha$% of pixels within the box) to be foreground, while we constrain pixels outside the bounding box to be background. We implement this by appropriately setting the unary terms of the CRF. We then infer the labels for pixels in between. We cross-validate the CRF parameters to maximize segmentation accuracy in a small held-out set of fully-annotated images. This approach is similar to the grabcut method of [33]. Examples of estimated segmentations with the two methods are shown in Fig. 4.
The two methods above, illustrated in Fig. 3, estimate segmentation maps from the bounding box annotation as a pre-processing step, then employ the training procedure of Sec. 3.1, treating these estimated labels
as ground-truth.
Our third Bbox-EM-Fixed method is an EM algorithm that allows us to refine the estimated segmentation maps throughout training. The method is a variant of the EM-Fixed algorithm in Sec. 3.2, in which we boost the present foreground object scores only within the bounding box area.

3.4. Mixed strong and weak annotations
In practice, we often have access to a large number of weakly image-level annotated images and can only afford to procure detailed pixel-level annotations for a small fraction of these images. We handle this hybrid training scenario by combining the methods presented in the previous sections, as illustrated in Figure 5. In SGD training of our deep CNN models, we bundle to each mini-batch a fixed proportion of strongly/weakly annotated images, and employ our EM algorithm in estimating at each iteration the latent semantic segmentations for the weakly annotated images.

Figure 4. Estimated segmentation from bounding box annotation. Panels: Image with Bbox, Ground-Truth, Bbox-Rect, Bbox-Seg.
Figure 5. DeepLab model training on a union of full (strong labels) and image-level (weak labels) annotations.

4. Experimental Evaluation
4.1. Experimental Protocol
Datasets The proposed training methods are evaluated on the PASCAL VOC 2012 segmentation benchmark [13], consisting of 20 foreground object classes and one background class. The segmentation part of the original PASCAL VOC 2012 dataset contains 1464 (train), 1449 (val), and 1456 (test) images for training, validation, and test, respectively. We also use the extra annotations provided by [16], resulting in augmented sets of 10,582 (train aug) and 12,031 (trainval aug) images. We have also experimented with the large MS-COCO 2014 dataset [24], which contains 123,287 images in its trainval set. The MS-COCO 2014 dataset has 80 foreground object classes and one background class and is also annotated at the pixel level.
The performance is measured
in terms of pixel intersection-over-union (IOU) averaged across the 21 classes. We first evaluate our proposed methods on the PASCAL VOC 2012 val set. We then report our results on the official PASCAL VOC 2012 benchmark test set (whose annotations are not released). We also compare our test set results with other competing methods.

Reproducibility We have implemented the proposed methods by extending the excellent Caffe framework [18]. We share our source code, configuration files, and trained models that allow reproducing the results in this paper at a companion web site https:/// deeplab/deeplab-public.

Weak annotations In order to simulate the situations where only weak annotations are available and to have fair comparisons (e.g., use the same images for all settings), we generate the weak annotations from the pixel-level annotations. The image-level labels are easily generated by summarizing the pixel-level annotations, while the bounding box annotations are produced by drawing rectangles tightly containing each object instance (PASCAL VOC 2012 also provides instance-level annotations) in the dataset.

Network architectures We have experimented with the two DCNN architectures of [5], with parameters initialized from the VGG-16 ImageNet [11] pretrained model of [34].
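As an aside on the weak-annotation protocol described above, the following is a minimal sketch of how image-level labels and tight bounding boxes could be derived from pixel-level annotations. It is an illustration only, not the released code: the function names, the nested-list data layout, and the use of 255 as an "ignore" value are assumptions made for this example.

```python
def image_level_labels(label_map, ignore=255):
    """Summarize a 2-D label map into the sorted list of class ids it contains.

    label_map: list of rows, each a list of integer class ids.
    ignore:    value treated as unlabeled and skipped (an assumption here).
    """
    return sorted({v for row in label_map for v in row if v != ignore})


def tight_bbox(instance_mask):
    """Return (x_min, y_min, x_max, y_max) tightly enclosing one instance.

    instance_mask: list of rows of 0/1 values marking a single object instance.
    Returns None when the mask is empty.
    """
    ys = [y for y, row in enumerate(instance_mask) if any(row)]
    xs = [x for row in instance_mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

Calling `image_level_labels` on a label map gives the set z of classes present, and `tight_bbox` is applied once per instance mask, so an image with several instances yields several boxes.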
They differ in the receptive field of view (FOV) size. We have found that large FOV (224×224) performs best when at least some training images are annotated at the pixel level, whereas small FOV (128×128) performs better when only image-level annotations are available. In the main paper we report the results of the best architecture for each setup and defer the full comparison between the two FOVs to the supplementary material.

Training We employ our proposed training methods to learn the DCNN component of the DeepLab-CRF model of [5]. For SGD, we use a mini-batch of 20-30 images and initial learning rate of 0.001 (0.01 for the final classifier layer), multiplying the learning rate by 0.1 after a fixed number of iterations. We use momentum of 0.9 and a weight decay of 0.0005. Fine-tuning our network on PASCAL VOC 2012 takes about 12 hours on an NVIDIA Tesla K40 GPU.
Similarly to [5], we decouple the DCNN and Dense CRF training stages and learn the CRF parameters by cross validation to maximize IOU segmentation accuracy in a held-out set of 100 Pascal val fully-annotated images. We use 10 mean-field iterations for Dense CRF inference [19]. Note that the IOU scores are typically 3-5% worse if we don't use the CRF for post-processing of the results.

4.2. Pixel-level annotations
We have first reproduced the results of [5]. Training the DeepLab-CRF model with strong pixel-level annotations on PASCAL VOC 2012, we achieve a mean IOU score of 67.6% on val and 70.3% on test; see method DeepLab-CRF-LargeFOV in [5, Table 1].

Method             #Strong   #Weak    val IOU
EM-Fixed (Weak)    -         10,582   20.8
EM-Adapt (Weak)    -         10,582   38.2
EM-Fixed (Semi)    200       10,382   47.6
                   500       10,082   56.9
                   750       9,832    59.8
                   1,000     9,582    62.0
                   1,464     5,000    63.2
                   1,464     9,118    64.6
Strong             1,464     -        62.5
                   10,582    -        67.6
Table 1. VOC 2012 val performance for varying number of pixel-level (strong) and image-level (weak) annotations (Sec. 4.3).

Method             #Strong   #Weak    test IOU
MIL-FCN [30]       -         10k      25.7
MIL-sppxl [31]     -         760k     35.8
MIL-obj [31]       BING      760k     37.0
MIL-seg [31]       MCG       760k     40.6
EM-Adapt (Weak)    -         12k      39.6
EM-Fixed (Semi)    1.4k      10k      66.2
                   2.9k      9k       68.5
Strong [5]         12k       -        70.3
Table 2. VOC 2012 test performance for varying number of pixel-level (strong) and image-level (weak) annotations (Sec. 4.3).

4.3. Image-level annotations
Validation results We evaluate our proposed methods in training the DeepLab-CRF model using image-level weak annotations from the 10,582 PASCAL VOC 2012 train aug set, generated as described in Sec. 4.1 above. We report the val performance of our two weakly-supervised EM variants described in Sec. 3.2. In the EM-Fixed variant we use $b_{fg} = 5$ and $b_{bg} = 3$ as fixed foreground and background biases. We found the results to be quite sensitive to the difference $b_{fg} - b_{bg}$ but not very sensitive to their absolute values. In the adaptive EM-Adapt variant we constrain at least $\rho_{bg} = 40\%$ of the image area to be assigned to background and at least $\rho_{fg} = 20\%$ of the image area to be assigned to foreground (as specified by the weak label set).
We also examine using weak image-level annotations in addition to a varying number of pixel-level annotations, within the semi-supervised learning scheme of Sec. 3.4.
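The per-pixel E-step of the EM-Fixed variant (steps 1-3 of Algorithm 1) can be sketched as follows, using the bias values quoted above as defaults. This is a hedged illustration rather than the authors' Caffe implementation: the function name `em_fixed_e_step` and the list-of-lists score layout are assumptions, and the caller is assumed to mark the background (index 0) as present.

```python
def em_fixed_e_step(scores, present, b_fg=5.0, b_bg=3.0):
    """Hard E-step with fixed biases: add b_l to classes with z_l = 1, then argmax.

    scores:  list over pixels; each entry is a list of L+1 DCNN scores
             f_m(l | x; theta'), with index 0 the background.
    present: list of L+1 booleans; present[l] is True when z_l = 1.
    Returns the list of estimated labels y_hat_m, one per pixel.
    """
    # phi(y_m = l, z) = b_l if z_l = 1, else 0, with b_0 = b_bg and b_l = b_fg for l > 0.
    bias = [(b_bg if l == 0 else b_fg) if z else 0.0
            for l, z in enumerate(present)]
    labels = []
    for pixel_scores in scores:
        biased = [f + b for f, b in zip(pixel_scores, bias)]
        labels.append(max(range(len(biased)), key=biased.__getitem__))
    return labels
```

For instance, a pixel whose raw scores favor a class that is absent from the image-level labels is reassigned to a present class once the biases are added; the resulting labels then serve as targets for the M-step SGD update.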
In this Semi setting we employ strong annotations of a subset of PASCAL VOC 2012 train set and use the weak image-level labels from another non-overlapping subset of the train aug set. We perform segmentation inference for the images that only have image-level labels by means of EM-Fixed, which we have found to perform better than EM-Adapt in the semi-supervised training setting.
The results are summarized in Table 1. We see that the EM-Adapt algorithm works much better than the EM-Fixed algorithm when we only have access to image-level annotations, 38.2% vs. 20.8% validation IOU. Using 1,464 pixel-level and 9,118 image-level annotations in the EM-Fixed semi-supervised setting significantly improves performance, yielding 64.6%. Note that image-level annotations are helpful, as training with the 1,464 pixel-level annotations alone yields only 62.5%.

Test results In Table 2 we report our test results. We compare the proposed methods with the recent MIL-based approaches of [30, 31], which also report results obtained with image-level annotations on the VOC benchmark. Our EM-Adapt method yields 39.6%, which improves over MIL-FCN [30] by a large 13.9% margin. As [31] shows, MIL can become more competitive if additional segmentation information is introduced: using low-level superpixels, MIL-sppxl [31] yields 35.8% and is still inferior to our EM algorithm. Only if augmented with BING [7] or MCG [1] can MIL obtain results comparable to ours (MIL-obj: 37.0%, MIL-seg: 40.6%) [31]. Note, however, that both BING and MCG have been trained with bounding box or pixel-annotated data on the PASCAL train set, and thus both MIL-obj and MIL-seg indirectly rely on bounding box or pixel-level PASCAL annotations.
The more interesting finding of this experiment is that including very few strongly annotated images in the semi-supervised setting significantly improves the performance compared to the pure weakly-supervised baseline. For example, using 2.9k pixel-level annotations along with 9k image-level annotations in the semi-supervised setting
yields 68.5%. We would like to highlight that this result surpasses all techniques which are not based on the DCNN+CRF pipeline of [5] (see Table 6), even if trained with all available pixel-level annotations.

4.4. Bounding box annotations
Validation results In this experiment, we train the DeepLab-CRF model using bounding box annotations from the train aug set. We estimate the training set segmentations in a pre-processing step using the Bbox-Rect and Bbox-Seg methods described in Sec. 3.3. We assume that we also have access to 100 fully-annotated PASCAL VOC 2012 val images which we have used to cross-validate the value of the single Bbox-Seg parameter $\alpha$ (percentage of the center bounding box area constrained to be foreground). We varied $\alpha$ from 20% to 80%, finding that $\alpha = 20\%$ maximizes accuracy in terms of IOU in recovering the ground truth foreground from the bounding box. We also examine the effect of combining these weak bounding box annotations with strong pixel-level annotations, using the semi-supervised learning methods of Sec. 3.4.
The results are summarized in Table 3. When using only bounding box annotations, we see that Bbox-Seg improves over Bbox-Rect by 8.1%, and gets within 7.0% of the strong pixel-level annotation result. We observe that combining 1,464 strong pixel-level annotations with weak bounding box annotations yields 65.1%, only 2.5% worse than the strong pixel-level annotation result. In the semi-supervised learning settings and 1,464 strong annotations, Semi-Bbox-EM-Fixed and Semi-Bbox-Seg perform similarly.

Method                  #Strong   #Box     val IOU
Bbox-Rect (Weak)        -         10,582   52.5
Bbox-EM-Fixed (Weak)    -         10,582   54.1
Bbox-Seg (Weak)         -         10,582   60.6
Bbox-Rect (Semi)        1,464     9,118    62.1
Bbox-EM-Fixed (Semi)    1,464     9,118    64.8
Bbox-Seg (Semi)         1,464     9,118    65.1
Strong                  1,464     -        62.5
                        10,582    -        67.6
Table 3. VOC 2012 val performance for varying number of pixel-level (strong) and bounding box (weak) annotations (Sec. 4.4).

Method                  #Strong       #Box   test IOU
BoxSup [9]              MCG           10k    64.6
BoxSup [9]              1.4k (+MCG)   9k     66.2
Bbox-Rect (Weak)        -             12k    54.2
Bbox-Seg (Weak)         -             12k    62.2
Bbox-Seg (Semi)         1.4k          10k    66.6
Bbox-EM-Fixed (Semi)    1.4k          10k    66.6
Bbox-Seg (Semi)         2.9k          9k     68.0
Bbox-EM-Fixed (Semi)    2.9k          9k     69.0
Strong [5]              12k           -      70.3
Table 4. VOC 2012 test performance for varying number of pixel-level (strong) and bounding box (weak) annotations (Sec. 4.4).

Test results In Table 4 we report our test results. We compare the proposed methods with the very recent BoxSup approach of [9], which also uses bounding box annotations on the VOC 2012 segmentation benchmark. Comparing our alternative Bbox-Rect (54.2%) and Bbox-Seg (62.2%) methods, we see that simple foreground-background segmentation provides much better segmentation masks for DCNN training than using the raw bounding boxes. BoxSup does 2.4% better; however, it employs the MCG segmentation proposal mechanism [1], which has been trained with pixel-annotated data on the PASCAL train set; it thus indirectly relies on pixel-level annotations.
When we also have access to pixel-level annotated images, our performance improves to 66.6% (1.4k strong annotations) or 69.0% (2.9k strong annotations). In this semi-supervised setting we outperform BoxSup (66.6% vs. 66.2% with 1.4k strong annotations), although we do not use MCG. Interestingly, Bbox-EM-Fixed improves over Bbox-Seg as we add more strong annotations, and it performs 1.0% better (69.0% vs. 68.0%) with 2.9k strong annotations. This shows that the E-step of our EM algorithm can estimate the object masks better than the foreground-background segmentation pre-processing step when enough pixel-level annotated images are available.
Comparing with Sec. 4.3, note that 2.9k strong + 9k image-level annotations yield 68.5% (Table 2), while 2.9k strong + 9k bounding box annotations yield 69.0% (Table 4). This finding suggests that bounding box annotations add little value over image-level annotations when a sufficient number of pixel-level annotations is also available.

Method                     #Strong COCO   #Weak COCO   val IOU
PASCAL-only                -              -            67.6
EM-Fixed (Semi)            -              123,287      67.7
Cross-Joint (Semi)         5,000          118,287      70.0
Cross-Joint (Strong)       5,000          -            68.7
Cross-Pretrain (Strong)    123,287        -            71.0
Cross-Joint (Strong)       123,287        -            71.7
Table 5. VOC 2012 val performance using strong annotations for all 10,582 train aug PASCAL images and a varying number of strong and weak MS-COCO annotations (Sec. 4.5).

Method                                        test IOU
MSRA-CFM [8]                                  61.8
FCN-8s [25]                                   62.2
Hypercolumn [17]                              62.6
TTI-Zoomout-16 [27]                           64.4
DeepLab-CRF-LargeFOV [5]                      70.3
BoxSup (Semi, with weak COCO) [9]             71.0
DeepLab-CRF-LargeFOV (Multi-scale net) [5]    71.6
Oxford TVG CRF RNN VOC [41]                   72.0
Oxford TVG CRF RNN COCO [41]                  74.7
Cross-Pretrain (Strong)                       72.7
Cross-Joint (Strong)                          73.0
Cross-Pretrain (Strong, Multi-scale net)      73.6
Cross-Joint (Strong, Multi-scale net)         73.9
Table 6. VOC 2012 test performance using PASCAL and MS-COCO annotations (Sec. 4.5).

4.5. Exploiting Annotations Across Datasets
Validation results We present experiments leveraging the 81-label MS-COCO dataset as an additional source of data in learning the DeepLab model for the 21-label PASCAL VOC 2012 segmentation task. We consider three scenarios:
• Cross-Pretrain (Strong): Pre-train DeepLab on MS-COCO, then replace the top-level network weights and fine-tune on Pascal VOC 2012, using pixel-level annotation in both datasets.
• Cross-Joint (Strong): Jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets.
• Cross-Joint (Semi): Jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using the pixel-level labels from PASCAL and varying the number of pixel- and image-level labels from MS-COCO.
In all cases we use strong pixel-level annotations for all 10,582 train aug PASCAL images.
We report our results on the PASCAL VOC 2012 val in Table 5, also including for comparison our best PASCAL-only 67.6% result exploiting all 10,582 strong annotations as a baseline. When we employ the weak MS-COCO annotations (EM-Fixed (Semi)) we obtain 67.7% IOU, which does
not improve over the PASCAL-only baseline. However, using strong labels from 5,000 MS-COCO images (4.0% of the MS-COCO dataset) and weak labels from the remaining MS-COCO images in the Cross-Joint (Semi) semi-supervised scenario yields 70.0%, a significant 2.4% boost over the baseline. This Cross-Joint (Semi) result is also 1.3% better than the 68.7% performance obtained using only the 5,000 strong and no weak annotations from MS-COCO. As expected, our best results are obtained by using all 123,287 strong MS-COCO annotations, 71.0% for Cross-Pretrain (Strong) and 71.7% for Cross-Joint (Strong). We observe that cross-dataset augmentation improves by 4.1% over the best PASCAL-only result. Using only a small portion of pixel-level annotations and a large portion of image-level annotations in the semi-supervised setting reaps about half of this benefit.

Test results We report our PASCAL VOC 2012 test results in Table 6. We include results of other leading models from the PASCAL leaderboard. All our models have been trained with pixel-level annotated images on the PASCAL trainval aug and the MS-COCO 2014 trainval datasets.
Methods based on the DCNN+CRF pipeline of DeepLab-CRF [5] are the most competitive, with performance surpassing 70%, even when only trained on PASCAL data. Leveraging the MS-COCO annotations brings about 2% improvement. Our top model yields 73.9%, using the multi-scale network architecture of [5]. Also see [41], which also uses joint PASCAL and MS-COCO training, and further improves performance (74.7%) by end-to-end learning of the DCNN and CRF parameters.

4.6. Qualitative Segmentation Results
In Fig. 6 we provide visual comparisons of the results obtained by the DeepLab-CRF model learned with some of the proposed training methods.

5. Conclusions
The paper has explored the use of weak or partial annotation in training a state-of-the-art semantic image segmentation model. Extensive experiments on the challenging PASCAL VOC 2012 dataset have shown that: (1) Using weak annotation solely at the image-level seems insufficient to
train a high-quality segmentation model. (2) Using weak bounding-box annotation in conjunction with careful segmentation inference for images in the training set suffices to train a competitive model. (3) Excellent performance is obtained when combining a small number of pixel-level annotated images with a large number of weakly annotated images in a semi-supervised setting, nearly matching the results achieved when all training images have pixel-level annotations. (4) Exploiting extra weak or strong annotations from other datasets can lead to large improvements.

Acknowledgments
This work was partly supported by ARO 62250-CS, and NIH 5R01EY022247-03. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.
Virtual Books
Virtual Books: Integrating Hypertext and Virtual Reality
Master's Thesis of Jouke C. Verlinden
Graduation Committee:
prof. dr. H.G. Sol
ir. C.A.P.G. van der Mast
dr. Jay David Bolter (GVU Center, Georgia Tech)
dr. James D. Foley (GVU Center, Georgia Tech)
ir. B.R. Sodoyer
Delft University of Technology, Faculty of Technical Mathematics and Informatics, HCI group.
August 1993.

Abstract
"Think of computers as a medium, not as a tool" - Alan Kay in "The Art of User Interface Design", 1989.
Virtual Reality technology gives us new ways to represent information, based on spatial display and multisensory interactivity. At present both commercial products and scientific research in VR create and explore relatively simple environments. These environments are often purely perceptual: that is, the user is placed in a world of color and shape that represents or resembles the "real" world. Objects (tables, doors, walls) in these environments have no deeper semantic significance.
The Virtual Books project is an exploration of introducing semantics into three-dimensional space, by inclusion and manipulation of information, based on traditional writing technologies (e.g. printed books) and the emerging electronic books (hypertexts, hypermedia etc.). Printed books often combine pictures and text. Hypermedia integrates texts with graphics, animation, video, and audio. Our goal is to extend these existing techniques of integration so that we can deploy text or other information in three dimensions and allow for effective interaction between the writer/reader/user and the text. We believe that this approach will provide solutions to prominent problems in the fields of hypertext and Virtual Reality. Four prototypes were developed to illustrate our ideas: the Georgia Tech Catalog, the Textured Book, the Voice Annotation System, and the World Processor. Silicon Graphics workstations with both immersive and non-immersive Virtual Reality technology were employed.
To implement the prototypes, two software libraries were made (the bird and the SVE library); they facilitate easy creation and reuse of virtual environments. This project was done at the Graphics, Visualization, and Usability (GVU) Center, Georgia Tech, Atlanta, U.S.A. My advisor was dr. Jay David Bolter, professor in the School of Literature, Communications, and Culture.

Preface
Almost 10 months of work are lying behind me. They seemed to last a lifetime, which will come to a sudden end within a few days. Moreover, the project is the final step towards obtaining my Master's Degree in Computer Science -- a "project" that lasted 5 years! That means I can only say:
The project has died, long live the project!
Together with another Dutch exchange student, Anton Spaans, I have lived in Atlanta (Georgia, USA) for about eight months. We were both temporary members of the Graphics, Visualization, and Usability Center at the Georgia Institute of Technology. Daily (and nightly) we worked with advanced computer systems, faculty, and graduate students. These inspired me to do what I did and to pursue a further career in R&D. During my stay, I was also involved in various other activities, including the Apple Design Contest, the spatial audio research, and 3D algorithm visualization. And, of course, the band and the movie committee.
It was a fascinating stay that taught me a lot. Not just about science or user interfaces: in those eight months I was a member of American society. A society in which the artificial has become natural -- a society that sells "I Can't Believe it's not Butter"™ and where the slogan "Just Add Water.." seems to be ubiquitous.

Acknowledgments
The Dutch often say American friendships are superficial. Not in my experience: the people I met in Atlanta, Chapel Hill, Palo Alto, and so many other places turned out to be good friends. It is impossible to thank them all, even if I had ten months' time to do so.
I thank all people who made my stay what it was, and those who supported me during this unforgettable time. Especially:
Jay Bolter, my advisor at the GVU Center. His enthusiastic and open-minded approach made the project what it became. He treated me as a companion, not as a student. Yet he taught me so much...
Jim Foley, for giving me the opportunity to come to his extraordinary lab and for putting me on the right track by introducing me to Jay.
Joan Morton was an angel. She helped us whenever it was needed and did so many other things for the exchange students. Larry Hodges, who tolerated my work on "his" machines and introduced me to many other computer graphics researchers.
Charles van der Mast, my advisor at the Delft University of Technology, who made this possible. Without knowing him, I probably would not have ended up working abroad. Furthermore, he patiently awaited my results and provided me with suitable criticism.
Daryl Lawton for bringing us to the fattest and fanciest dinner places. Mimi Recker, who advised me during the usability tests. David Burgess and Beth Mynatt for distracting me from my actual project and involving me in their remarkable work.
And of course all the GVU "Rats", including: Jack Freeman, Jasjit Singh, Wayne Woolton, James O'Brien, Joe Wehrli, Heather Pritchett, Tom Meyer, Augusto op den Bosch, Anton Spaans, Mary-Ann Frogge, Jerome Salomon, Todd Griffith, Thomas Kuehme, Krishna Barat and all the others..
The participants of the tests: Robert Hamilton (whom I met again a month ago in Amsterdam), Gary Harrison, David Hamilton, and the eight students of Stuart Moultrop's technical writing class.
Dan Russell, who was my indispensable host at Xerox PARC. And of course the graduate students at UNC; especially Russell Taylor, who gave me the opportunity to have a look in the kitchen of the world famous Virtual Reality lab and introduced me to his friends Stephan, Rich, and John.
The other members of the band: Tim, Ted and Mike.
It was great to start a musical conversation with you, guys!

Dimitri, once a student and now a married engineer, who helped me tremendously during the last (and critical) days.

My family and friends in Holland, who didn’t forget me (even when I forgot them..). And finally, I thank the one who supported me and had to deal with my stress during this long period that didn’t seem to end: Simone.

Table of Contents

Abstract
Preface
Acknowledgments
1. Introduction
1.1 Project
1.2 Environment
1.2.1 GVU Center
1.2.2 Jay David Bolter
1.3 Report
2. Problem Analysis
2.1 Background 1: Hypertext
2.1.1 Short (Hi)story of Media
2.1.2 Hypertext
2.1.3 Problems
2.2 Background 2: Virtual Reality
2.2.1 Introduction
2.2.2 Survey
2.2.3 Problems with current Virtual Reality systems
2.3 Virtual Books: Integrating Hypertext and Virtual Reality
2.3.1 Proposal
2.3.2 Related Research
2.3.3 Requirements and Constraints
3. Functional Design
3.1 Spatial Authoring concepts
3.1.1 Concepts of Hypertext environments
3.1.2 Virtual Reality concepts
3.1.3 Virtual Books Concepts
3.2 Functionality
3.3 Presentation issues
3.3.1 Representation of hypertextual structure
3.3.2 Navigation and the representation of links
3.3.3 Representation of information
3.3.4 Virtual Reality issues
3.4 Prototypes
3.4.1 Catalog
3.4.2 Textured Book
3.4.3 Voice Annotation System
3.4.4 World Processor
4. Technical Design
4.1 Platform
4.1.1 Hardware
4.1.2 Software Support
4.2 Prototypes
4.2.1 The Catalog
4.2.2 The Textured Book
4.2.3 Speech Annotation System
4.2.4 World Processor
5. Implementation and Evaluation
5.1 The Catalog
5.2 Textured Book
5.3 Voice Annotation System
5.4 World Processor
6. Conclusions and Future Research
6.1 Conclusions and Results
6.2 Future Research
Bibliography
Appendix A: Papers
Appendix B: Prototypes List
Appendix C: Manuals of the Software Libraries
Appendix D: User’s Manual of the World Processor
Appendix E: Voice Annotation tests
Appendix F: A short report about my trip to Xerox and UNC

1. Introduction

The Master’s program of informatics at the Delft University of Technology requires a research project of six to nine months, with a thesis as result. Fortunately, through the contacts of Charles van der Mast (Delft University of Technology) with James Foley (Georgia Institute of Technology) I had the opportunity to work with Prof. Jay David Bolter at the Graphics, Visualization and Usability Center in Atlanta, U.S.A., during a period of eight months. We explored our mutual interests in virtual reality, hypertext, writing and media. This Master’s thesis is considered to be the final, but certainly not the only result of our cooperation: four faculty reports and several videoclips were made as well.

In the first months, October and November, we tried to formulate the Virtual Books Project as clearly as possible. At the same time, I developed and implemented general GVU demonstrations for the Virtual Reality equipment. This equipment had recently been purchased and had only just been unpacked. In December, Prof. Bolter went to Milan to give a keynote speech at ECHT’92, called “Virtual Reality and Hypertext”.
A month later, I had the unexpected opportunity to visit three interesting research laboratories: the Virtual Reality lab at the University of North Carolina, Chapel Hill, the Xerox Palo Alto Research Center (PARC) and the world famous M.I.T. Media Lab in Boston.

By that time, I had finished working on the lower-level software support (the bird and the SVE library) and began to develop two complex Virtual Book prototypes: the Voice Annotation system and the World Processor. Exploratory user tests were conducted during March and April. Both prototypes seemed interesting enough to start writing two separate papers on them; one has recently been accepted to the European Simulation Symposium (ESS ‘93), to be held on October 25-28, 1993 at the Delft University of Technology in the Netherlands.

During the last month in Atlanta (May ‘93), I expanded the SVE library and updated its documentation in cooperation with Drew Kessler. One of the additions enables relatively easy texture mapping, which was used in the last prototype, called the Textured Book. After my return to Holland in June, I proceeded with writing this thesis and finishing the ESS ‘93 paper.

1.1 Project

Initially, the project did not have distinct objectives. Jay Bolter and I introduced the term “Virtual Book”, which represented our interest in the exploration of Virtual Reality as a medium - a medium that could be used to communicate and structure information in new ways. We focused on some of the shortcomings of today’s upcoming electronic media: Hypertext and Virtual Reality.

Hypertext and hypermedia are considered to be the new avenues in textual and reflective communication. These so-called “electronic books” hold great promise. Their potential is increasing every day due to growing infrastructures and computing power. At the same time, these communication channels threaten the efficient and effective use of information. These disadvantages are often summarized as “information overload”.
I will break this problem down into several parts, including 1) the “getting lost in information space” problem and 2) the cognitive task switching problem. It will be argued that such problems are related to the limitations of the applied metaphors and interaction techniques, which have not changed significantly since the late sixties.

On the other hand, the sensory illusion of television, movies and computer games seems to be upgraded by the ultimate form of visual and engaging media: Virtual Reality. Virtual Reality is considered to be the most interactive medium of the future. The techniques involved generate three-dimensional environments that maximize the naturalness of the user interface - by three-dimensional direct manipulation and perceptual immersion. Although the quality of the images and devices has improved since its introduction in 1968, its theoretical potential has not changed. The user is placed in a world of color and shape that represents or resembles the real world. Objects (tables, doors, walls) in these environments have no deeper semantic significance. This makes Virtual Reality a poor medium for symbolic communication.

This project explores the integration of the traditional electronic books and virtual reality. Printed books often combine pictures and text. Hypermedia integrates texts with graphics, animation, video and audio. Our goal is to extend these existing techniques of integration so that we can deploy information in three dimensions and allow for effective interaction between the writer/reader/user and the information. We think this synergetic approach will solve some of the most prominent problems in both fields, e.g. the “getting lost in hyperspace” problem. The project can be divided into three steps:

1) Framing ideas and testing them in mockups or modest prototypes. These mockups may be on paper or in the computer.
Of course, this phase includes a search of the relevant literature as well as attempts to get familiar with the available Virtual Reality hardware and software.

2) Developing more elaborate prototypes that highlight specific aspects for creating and reading virtual books. This includes:
a) developing a software layer that allows fast creation and modification of virtual book prototypes.
b) developing a prototype that illustrates how problems associated with current electronic books can be solved or diminished.
c) developing a prototype that illustrates how to add facilities for verbal communication into existing virtual reality applications.

3) Based upon the second phase, I will: a) conduct some usability tests with groups from diverse disciplines. b) identify strengths and weaknesses of the environments. c) draw conclusions about the feasibility and usefulness of such a virtual book and discuss directions for future research.

1.2 Environment

1.2.1 GVU Center

The Graphics, Visualization and Usability Center is one of the most active and outstanding research institutes on Human-Computer Interfaces (HCI) in the world. The center houses a wide variety of faculty, who try to explore new frontiers of HCI. Members and graduate students come from the College of Architecture, School for Civil Engineering, College of Computing, School of Industrial and Systems Engineering, Office for Information Technology, School of Literature, Communication and Culture, School of Mathematics, Multimedia Technology Lab, and School of Psychology. James D. Foley, the well-known computer graphics scientist, is the Center’s director. His careful management and open-mindedness are the crucial driving forces behind the quality and diversity of the Center’s research.
His vision of the GVU is formulated as follows:

“Making computers accessible and usable by every person represents the next and perhaps final great frontier in the computer/information revolution which has swept the world during the last half of this century.... The Center’s vision is of a world in which individuals are empowered in their everyday pursuits by the use of computers; of a world in which computers are used as easily and effectively as are automobiles, stereos, and telephones” (GVU 1992, p. 1)

The Center’s research covers: realistic imagery, computer-supported collaborative work, algorithm visualization, medical imaging, image understanding, scientific data visualization, animation, user interface software, usability, virtual environments, image quality, user interfaces for blind people, and expert systems in graphics and user interfaces. These projects are led by several well-known scientists, including John Stasko, Al Badre, Jessica Hodgins, Scott Hudson, Piyawadee “Noi” Sukaviriya and Christine Mitchell. Apart from the regular objective to publish and present high-quality scientific work, faculty and graduate students put a lot of effort into the creation of convincing demonstrations of their findings. The MIT Media Lab’s “demo or die” rule (Brand 1987) seems to apply to the GVU as well: guided tours and demonstrations are frequently given to many visitors (including funders and scientists).

Most graduate students do their research in the Graphics, Visualization, and Usability lab, which offers many high-end workstations and audio/video facilities. Furthermore, the lab also includes a conference room (with HCI library), a professional animation production area, and an isolated room for usability tests. A special “usability manager” takes care of the software, hardware and people of the lab (currently Suzan Liebeskind). However, the lab does not only provide technical support.
The presence of so many “brains” in concentrated doses adds a social dimension to the lab’s activities, a valuable, informal communications channel that was certainly beneficial for my projects. Discussions, trouble-shooting sessions, and expert consultations are held daily (and, every now and then, nightly!). More formal meetings include the weekly brown bag meetings and the distinguished lecturer’s series (held each quarter). The completely renovated lab was officially opened 7 days after Anton Spaans and I arrived. This “convocation” day included several talks by celebrities in HCI research (e.g. Stuart Card and Andy van Dam) and, of course, many demonstrations of the GVU research in the lab.

The GVU Center and its Lab cannot easily be compared with the user interface research group at Delft. Apart from its interdisciplinary character and the wide variety of high-performance (graphics) workstations at the Center, there is another important difference between the GVU lab and the HCI group at Delft: the GVU lab has a broad focus of research and does not fear to go beyond applied research. Companies like Siemens, SUN, DEC and Silicon Graphics fund projects that are focused on “technology push”. This kind of research gets little attention in Delft, where research is primarily limited to applied problems, with its focus on validity and methods.

As a part of the graphics research, Professor Larry F. Hodges directs the virtual environments research group. At the beginning of the summer quarter in 1992 Dr. Hodges ordered Virtual Reality equipment (see chapter 4.1 for a technical description). When I arrived in October ‘92, about 5 members were just unpacking the parts and trying to connect the systems together.
From that moment on the research rapidly evolved to new, sophisticated uses and applications of virtual environments, including developing navigation interface techniques and metaphors, assessing display parameters for manipulation in virtual environments, making scientific visualization applications, and developing therapy for phobias (especially fear of heights). The group has weekly meetings to discuss strategies and the progress of the projects.

1.2.2 Jay David Bolter

My advisor, Dr. Jay David Bolter, is a professor in the School of Literature, Communication and Culture. He teaches technical writing, classical languages and the use of multimedia applications. His research is directed toward communication, hypertext and new multimodal interfaces for writing. He has written two books on the cultural and social significance of the computer: “Turing’s Man: Western Culture in the Computer Age” (1984) and “Writing Space: The Computer, Hypertext and the History of Writing” (1991). His books show that he is a gifted writer who has an understanding of both humanities and computer science. Apart from his writing, he co-designed and implemented a very interesting hypertext system called Storyspace. In my experience, this application is one of the few hypertext systems that augment the writing task instead of disorienting the user with an overload of functionality. The usability is high, as can be seen from the number of people who buy and use it (it is commercially available for the Apple Macintosh).

1.3 Report

Although the project consisted of many small, seemingly unrelated parts, one main thesis was pursued. This report presents the results of the Virtual Books project in a top-down fashion in 6 chapters. Chapter 2 (Problem Analysis) describes the two backgrounds (hypertext and Virtual Reality) and then discusses Virtual Books in more detail.

Chapter 3 (Functional Design) elaborates on the design of Virtual Books.
It includes a discussion of the general concepts, functions, and user interface issues. These evolve into the functional design of four prototypes:
1) the Georgia Tech Catalog
2) the Textured Book
3) the Voice Annotator
4) the World Processor

Their technical aspects are described in chapter 4 (Technical Design). This chapter also presents a short overview of the computer hardware and software that was used and the development of the Simple Virtual Environment (SVE) library.

The implementation and evaluation of the prototypes appear in chapter 5 (Implementation and Evaluation). A short videoclip will accompany this report to illustrate the user interface and usability.

The last chapter, chapter 6 (Conclusions and Future Research), includes the conclusions of this project and presents possibilities for future Virtual Books research.

Several papers were written during this project, including: “The World Processor: an Interface for Textual Display and Manipulation in Virtual Reality”, “Virtual Annotation: Verbal Communication in Virtual Reality”, and “A First Experience with Spatial Audio in a Virtual Environment”. These articles can be found in Appendix A.

Appendix B gives a short list of the prototypes: where they are located and how they are started.

A description of the libraries that were developed during the project is presented in appendix C.

Appendix D is the user’s manual that was used during the usability tests of the World Processor.

After the Voice Annotator was tested, the participants were interviewed. The questionnaire and its answers are presented in appendix E.

Finally, appendix F is an informal report of my trip to Chapel Hill and Palo Alto.

2. Problem Analysis

A more detailed analysis of the backgrounds is needed in order to design Virtual Books. The first sections of this chapter survey the two fields of interest: hypertext and Virtual Reality. Then I will propose to combine these two in the Virtual Books project.
By integrating hypertext facilities in virtual reality applications or adding virtual reality interfaces to hypertext systems, some prominent problems of these fields can be solved.

2.1 Background 1: Hypertext

We are living in the information age: our society produces, consumes and transforms data. The importance of information increases every day, and yet at the same time the amount of data seems to grow without bounds. Some researchers hold that one issue of today’s newspaper contains more information than a medieval human would have encountered in his or her entire life.

Computer science and informatics have introduced a paradigm to model the enormous complexity of generating and processing information. This paradigm considers all entities and activities that are involved during information processing as information systems - human beings as well as computers, faxes, phones etc.

I will present a different view on information systems. It is a more literary view, focused on the history of writing and communication as it can be found in Jay Bolter’s book “Writing Space”. Instead of seeing computerized information systems as tools to process information, they are thought to be the successors of earlier media (papyrus, codex, printed book). The origins of information and its purposes (i.e. communication) have to be considered. In this vision, one of the most distinctive concepts computers contributed to media technology was hypertext. This will be discussed in the second part of this section. However, current hypertext systems do not always prove to be beneficial. The last section will identify the most prominent problems of these applications.

2.1.1 Short (Hi)story of Media

In this section I want to introduce some aspects of media that seem to be relevant to this project. None of the ideas mentioned below are new; most of them originate from Jay Bolter’s book “Writing Space” (Bolter 1991).
Reading this work results in a paradigm shift; I don’t perceive computers as tools (information processors with widgets) any more, but as media (substrates for communication).

In his book, Jay Bolter initially focuses on the history of writing. One of the most important milestones in history was the introduction of the printing press by Gutenberg in the fifteenth century. It ended an era in which the written page was only shared by an elite (monks and the royal community). The rest of society was not able to read or write and relied on the establishment to pass on information¹. Society’s balance was disrupted by the printing press. Suddenly everybody could purchase a book and get information at first hand; established authorities lost their exclusive rights to share (and create) information. In his book, Jay Bolter tells about the social and cultural impact of new technology. More specifically, he discusses how the shift from traditional to electronic media will change writing and reading.

Before presenting electronic media in more detail, a short and rather simplified excursion into cognitive science is needed to explain the word “medium”. Simplistically speaking, our thoughts are chunked into small entities, optimistically called “ideas” or “concepts” (this approach to thinking is discussed further in the next section on hypertext). To communicate with others, we cluster our thoughts into “information”, and transfer those to a specific medium (e.g. air for speaking, paper for writing/reading). The transfer from brain to medium involves a representation scheme.

1. The ancient Greeks strongly preferred speech to written communication; the latter was considered to weaken intellectual skills and memory.

Figure 1: medium, representation scheme and thoughts

This representation scheme is a structure or template to shape thoughts into symbols, which are in fact elements that can be embedded in the medium.
In other words, the abstract scheme is a method to make our thoughts publicly available while the medium serves as a substrate for symbols. Spoken and written language are the most popular schemes in our everyday life. Other existing media (e.g. television, fax) afford other (but not necessarily disjoint) representation schemes. Theories on media, schemes, and communication are rapidly evolving. For now, we will focus on the history and potential of computer-based media.

In the post-war period the introduction of computer technology slowly changed society. At first, the expensive power of computers was only exploited for mathematical purposes. A handful of visionaries arrived at the idea that computers were general-purpose machines: due to its quick-access storage memory, its ability to create huge communication networks and its capacity to manipulate symbols, the computer has an unequalled power to act as a new medium. Influenced by the great media guru Marshall McLuhan, computer scientist Alan Kay¹ points out the computer’s unique properties in the context of media: he considers the computer a meta-medium: a container that can hold information of any form, representation schemes and media. Kay was familiar with the possibilities of digital media, in which arbitrary information is converted into digital symbols before storing it into the computer memory. At present, audio, video, pictures, and text exist side by side in popular multimedia systems.

The number of facilities to exchange information by computer is rapidly increasing. In the seventies it started with electronic mail and Bulletin Board Systems. Today, a wide variety of communication channels can be used, including:

• the Online Book Initiative - a database that can be reached on the internet².
It includes electronic versions of literature, children’s books, fairy tales and poems.
• the USENET news system - a distributed Bulletin Board-like system that includes hundreds of discussion groups. The subjects vary from anthroposophy to computer science and from rock groups to biology; all these groups receive about 20-100 postings a day. USENET news is often employed as an informal communication channel among scientists to discuss ongoing research and opinions.
• Gopher - a distributed database with campus information, electronic versions of technical re-

1. ... graphical interfaces. He also introduced the imaginary personal desktop computer called Dynabook.
2. The internet is a worldwide cluster of networks that connects universities, research institutes and several industries.
Discovering Similar Multidimensional Trajectories
Michail Vlachos
UC Riverside
mvlachos@

George Kollios
Boston University
gkollios@

Dimitrios Gunopulos
UC Riverside
dg@

Abstract

We investigate techniques for analysis and retrieval of object trajectories in a two or three dimensional space. Such data usually contain a great amount of noise, which makes all previously used metrics fail. Therefore, here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and Time Warping distance functions (for real and synthetic data) and show the superiority of our approach, especially under the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.

1 Introduction

In this paper we investigate the problem of discovering similar trajectories of moving objects. The trajectory of a moving object is typically modeled as a sequence of consecutive locations in a multidimensional (generally two or three dimensional) Euclidean space. Such data types arise in many applications where the location of a given object is measured repeatedly over time. Examples include features extracted from video clips, animal mobility experiments, sign language recognition, mobile phone usage, multiple attribute response curves in drug therapy, and so on.

Moreover, the recent advances in mobile computing, sensor and GPS technology have made it possible to collect large amounts of
spatiotemporal data and there is increasing interest in performing data analysis tasks over this data [4]. For example, in mobile computing, users equipped with mobile devices move in space and register their location at different time instants via wireless links to spatiotemporal databases. In environmental information systems, tracking animals and weather conditions is very common and large datasets can be created by storing locations of observed objects over time. Data analysis on such data includes determining and finding objects that moved in a similar way or followed a certain motion pattern. An appropriate and efficient model for defining the similarity for trajectory data will be very important for the quality of the data analysis tasks.

1.1 Robust distance metrics for trajectories

In general these trajectories will be obtained during a tracking procedure, with the aid of various sensors. Here also lies the main obstacle of such data: they may contain a significant amount of outliers, or in other words incorrect data measurements (unlike, for example, stock data, which contain no errors whatsoever).

Figure 1. Examples of 2D trajectories. Two instances of video-tracked time-series data representing the word ’athens’. Start and ending contain many outliers.

Figure 2. Hierarchical clustering of 2D series Athens 1, Athens 2, Boston 1 and Boston 2 (displayed as 1D for clarity). Left (DTW): The presence of many outliers in the beginning and the end of the sequences leads to incorrect clustering. DTW is not robust under noisy conditions. Right (LCSS): The focusing on the common parts achieves the correct clustering.

Our objective is the automatic classification of trajectories using Nearest Neighbor Classification. It has been shown that the one nearest neighbor rule has an asymptotic error rate that is at most twice the Bayes error rate [12]. So, the problem is: given a database $\mathcal{D}$ of trajectories and a query $Q$ (not already in the database), we want to find the trajectory $T \in \mathcal{D}$ that is closest to $Q$. We need to define the
following:

1. A realistic distance function,
2. An efficient indexing scheme.

Previous approaches to model the similarity between time-series include the use of the Euclidean and the Dynamic Time Warping (DTW) distance, which however are relatively sensitive to noise. Distance functions that are robust to extremely noisy data will typically violate the triangle inequality. These functions achieve this robustness by not considering the most dissimilar parts of the objects. However, they are useful, because they represent an accurate model of human perception: when comparing any kind of data (images, trajectories etc.), we mostly focus on the portions that are similar and we are willing to pay less attention to regions of great dissimilarity.

For this kind of data we need distance functions that can address the following issues:

• Different sampling rates or different speeds. The time-series that we obtain are not guaranteed to be the outcome of sampling at fixed time intervals. The sensors collecting the data may fail for some period of time, leading to inconsistent sampling rates. Moreover, two time series moving in exactly the same way, but one moving at twice the speed of the other, will result (most probably) in a very large Euclidean distance.

• Similar motions in different space regions. Objects can move similarly, but differ in the space they move. This can easily be observed in sign language recognition, if the camera is centered at different positions. If we work in Euclidean space, usually subtracting the average value of the time-series will move the similar series closer.

• Outliers. These might be introduced due to an anomaly in the sensor collecting the data or can be attributed to human ’failure’ (e.g. jerky movement during a tracking process). In this case the Euclidean distance will completely fail and result in a very large distance, even though this difference may be found in only a few points.

• Different lengths. Euclidean distance deals with time-series of equal length. In the case of
different lengths we have to decide whether to truncate the longer series, or pad the shorter with zeros, etc. In general its use gets complicated and the distance notion more vague.

• Efficiency. The distance function has to be adequately expressive but sufficiently simple, so as to allow efficient computation of the similarity.

To cope with these challenges we use the Longest Common Subsequence (LCSS) model. The LCSS is a variation of the edit distance. The basic idea is to match two sequences by allowing them to stretch, without rearranging the sequence of the elements but allowing some elements to be unmatched. The advantages of the LCSS method are twofold:

1) Some elements may be unmatched, whereas in Euclidean and DTW all elements must be matched, even the outliers.
2) The LCSS model allows a more efficient approximate computation, as will be shown later (whereas in DTW you need to compute some costly norm).

In figure 2 we can see the clustering produced by the DTW distance. The sequences represent data collected through a video tracking process. Originally they represent 2D series, but only one dimension is depicted here for clarity. The DTW fails to distinguish the two classes of words, due to the great amount of outliers, especially in the beginning and at the end of the sequences. Using the Euclidean distance we obtain even worse results. The LCSS produces the most intuitive clustering, as shown in the same figure.
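The matching idea described above can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the function names, the parameter names `delta` (how far apart in time two matched points may be) and `eps` (the spatial matching threshold), the normalization by the shorter length, and the toy data are all choices made for this example. It uses the plain quadratic dynamic-programming table, not the faster approximate algorithms mentioned in the text.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lcss_length(a: Sequence[Point], b: Sequence[Point], delta: int, eps: float) -> int:
    """Length of the longest common subsequence of two 2D trajectories.

    Two points may be matched when they differ by less than eps in both
    coordinates and their positions in time differ by at most delta.
    Points that cannot be matched are simply skipped.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS of the first i points of a and the first j of b.
    table: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            close_in_space = abs(ax - bx) < eps and abs(ay - by) < eps
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a: Sequence[Point], b: Sequence[Point], delta: int, eps: float) -> float:
    """LCSS length divided by the shorter length (assumes non-empty inputs)."""
    return lcss_length(a, b, delta, eps) / min(len(a), len(b))


def euclidean_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Point-by-point Euclidean distance; only meaningful for equal lengths."""
    return math.sqrt(
        sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    )


# Hypothetical toy data: a straight line, and a copy with one bad measurement.
clean = [(float(k), float(k)) for k in range(6)]
noisy = list(clean)
noisy[2] = (100.0, -100.0)

print(lcss_similarity(clean, noisy, delta=1, eps=0.5))
print(euclidean_distance(clean, noisy))
```

In this toy case the single outlier is left unmatched, so five of the six points still match and the similarity stays at 5/6, while the same outlier alone pushes the Euclidean distance above 140.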
Generally, the Euclidean distance is very sensitive to small variations in the time axis, while the major drawback of the DTW is that it has to pair all elements of the sequences.

Therefore, we use the LCSS model to define similarity measures for trajectories. Nevertheless, a simple extension of this model into 2 or more dimensions is not sufficient, because (for example) this model cannot deal with parallel movements. Therefore, we extend it in order to address similar problems. So, in our similarity model we consider a set of translations in 2 or more dimensions and we find the translation that yields the optimal solution to the LCSS problem.

The rest of the paper is organized as follows. In section 2 we formalize the new similarity functions by extending the LCSS model. Section 3 demonstrates efficient algorithms to compute these functions and section 4 elaborates on the indexing structure. Section 5 provides the experimental validation of the accuracy and efficiency of the proposed approach and section 6 presents the related work. Finally, section 7 concludes the paper.

2 Similarity Measures

In this section we define similarity models that match the user perception of similar trajectories. First we give some useful definitions and then we proceed by presenting the similarity functions based on the appropriate models. We assume that objects are points that move on the $(x, y)$-plane and time is discrete.

Let $A$ and $B$ be two trajectories of moving objects with size $n$ and $m$ respectively, where $A = ((a_{x,1}, a_{y,1}), \ldots, (a_{x,n}, a_{y,n}))$ and $B = ((b_{x,1}, b_{y,1}), \ldots, (b_{x,m}, b_{y,m}))$. For a trajectory $A$, let $Head(A)$ be the sequence $Head(A) = ((a_{x,1}, a_{y,1}), \ldots, (a_{x,n-1}, a_{y,n-1}))$.

Definition 1 Given an integer $\delta$ and a real number $0 < \epsilon < 1$, we define the $LCSS_{\delta,\epsilon}(A, B)$ as follows:

$$
LCSS_{\delta,\epsilon}(A, B) =
\begin{cases}
0 & \text{if } A \text{ or } B \text{ is empty} \\
1 + LCSS_{\delta,\epsilon}(Head(A), Head(B)) & \text{if } |a_{x,n} - b_{x,m}| < \epsilon \text{ and } |a_{y,n} - b_{y,m}| < \epsilon \text{ and } |n - m| \le \delta \\
\max\bigl(LCSS_{\delta,\epsilon}(Head(A), B),\ LCSS_{\delta,\epsilon}(A, Head(B))\bigr) & \text{otherwise}
\end{cases}
$$

The constant $\delta$ controls how far in time we can go in order to match a given point from one trajectory to a point in another trajectory. The constant $\epsilon$ is the matching threshold (see figure 3).

The first similarity function is based on the $LCSS$ and the idea is to allow time stretching. Then, objects that are close in space at different time instants can be matched if the time instants are also close.

Definition 2 We define the similarity function $S1$
between two trajectories $A$ and $B$, given $\delta$ and $\epsilon$, as follows:

$$S1(\delta, \epsilon, A, B) = \frac{LCSS_{\delta,\epsilon}(A, B)}{\min(n, m)}$$

Definition 3 Given $\delta$, $\epsilon$ and the family $F$ of translations, we define the similarity function $S2$ between two trajectories $A$ and $B$, as follows:

$$S2(\delta, \epsilon, A, B) = \max_{f_{c,d} \in F} S1(\delta, \epsilon, A, f_{c,d}(B))$$

So the similarity functions $S1$ and $S2$ range from 0 to 1. Therefore we can define the distance function between two trajectories as follows:

Definition 4 Given $\delta$, $\epsilon$ and two trajectories $A$ and $B$ we define the following distance functions:

$$D1(\delta, \epsilon, A, B) = 1 - S1(\delta, \epsilon, A, B) \quad \text{and} \quad D2(\delta, \epsilon, A, B) = 1 - S2(\delta, \epsilon, A, B)$$

Note that $D1$ and $D2$ are symmetric. $LCSS_{\delta,\epsilon}(A, B)$ is equal to $LCSS_{\delta,\epsilon}(B, A)$ and the transformation that we use in $D2$ is translation, which preserves the symmetric property.

By allowing translations, we can detect similarities between movements that are parallel in space, but not identical. In addition, the $LCSS$ model allows stretching and displacement in time, so we can detect similarities in movements that happen with different speeds, or at different times. In figure 4 we show an example where a trajectory matches another trajectory after a translation is applied. Note that the values of the parameters $c$ and $d$ are also important since they give the distance of the trajectories in space. This can be useful information when we analyze trajectory data.

Figure 4. Translation of trajectory.

The similarity function $S2$ is a significant improvement over the $S1$, because: i) now we can detect parallel movements, ii) the use of normalization does not guarantee that we will get the best match between two trajectories. Usually, because of the significant amount of noise, the average value and/or the standard deviation of the time series, which are used in the normalization process, can be distorted, leading to improper translations.

3 Efficient Algorithms to Compute the Similarity

3.1 Computing the similarity function $S1$

To compute the similarity function $S1$ we have to run an $LCSS$ computation for the two sequences. The $LCSS$ can be computed by a dynamic programming algorithm in $O(nm)$ time. However we only allow matchings when the difference in the indices is at most $\delta$, and this allows the use of a faster algorithm. The following lemma has been shown in [5], [11].

Lemma 1 Given two trajectories $A$ and $B$, with $|A| = n$ and $|B| = m$, we can find the
$LCSS_{\delta,\epsilon}(A, B)$ in $O(\delta(n + m))$ time.

If $\delta$ is small, the dynamic programming algorithm is very efficient. However, for some applications $\delta$ may need to be large. For that case, we can speed up the above computation using random sampling. Given two trajectories $A$ and $B$, we compute two subsets $RA$ and $RB$ by sampling each trajectory. Then we use the dynamic programming algorithm to compute the $LCSS$ on $RA$ and $RB$. We can show that, with high probability, the result of the algorithm over the samples is a good approximation of the actual value. We describe this technique in detail in [35].

3.2 Computing the similarity function $S2$

We now consider the more complex similarity function $S2$. Here, given two sequences $A$, $B$, and constants $\delta$, $\epsilon$, we have to find the translation $f_{c,d}$ that maximizes the length of the longest common subsequence of $A$, $f_{c,d}(B)$ (that is, $LCSS_{\delta,\epsilon}(A, f_{c,d}(B))$) over all possible translations.

Let the length of trajectories $A$ and $B$ be $n$ and $m$ respectively. Let us also assume that the translation $f_{c_1,d_1}$ is the translation that, when applied to $B$, gives a longest common subsequence $LCSS_{\delta,\epsilon}(A, f_{c_1,d_1}(B)) = a$, and it is also the translation that maximizes the length of the longest common subsequence.

The key observation is that, although there is an infinite number of translations that we can apply to $B$, each translation results in a longest common subsequence between $A$ and $f_{c,d}(B)$, and there is a finite set of possible longest common subsequences. In this section we show that we can efficiently enumerate a finite set of translations, such that this set provably includes a translation that maximizes the length of the longest common subsequence of $A$ and $f_{c,d}(B)$.

To give a bound on the number of transformations that we have to consider, we look at the projections of the two trajectories on the two axes separately. We define the $x$ projection of a trajectory $A$ to be the sequence of the values on the $x$-coordinate: $A_x = (a_{x,1}, \ldots, a_{x,n})$. A one dimensional translation $f_c$ is a function that adds a constant to all the elements of a 1-dimensional sequence: $f_c(x_1, \ldots, x_m) = (x_1 + c, \ldots, x_m + c)$.

Take the $x$ projections of $A$ and $B$, $A_x$ and $B_x$ respectively. We can show the following lemma:

Lemma 2 Given trajectories $A$, $B$, if $LCSS_{\delta,\epsilon}(A, f_{c_1,d_1}(B)) = a$, then the length of the longest common
subsequence of the one dimensional sequences $A_x$ and $f_{c_1}(B_x)$ is at least $a$: $LCSS_{\delta,\epsilon}(A_x, f_{c_1}(B_x)) \ge a$. Also, $LCSS_{\delta,\epsilon}(A_y, f_{d_1}(B_y)) \ge a$.

Now, consider $A_x$ and $B_x$. A translation by $c'$, applied to $A_x$, can be thought of as a linear transformation of the form $f(a_{x,i}) = a_{x,i} + c'$. Such a transformation will allow $a_{x,i}$ to be matched to all $b_{x,j}$ for which $|i - j| \le \delta$, and $a_{x,i} + c' - \epsilon \le b_{x,j} \le a_{x,i} + c' + \epsilon$.

It is instructive to view this as a stabbing problem: Consider the vertical line segments $((a_{x,i}, b_{x,j} - \epsilon), (a_{x,i}, b_{x,j} + \epsilon))$, where $|i - j| \le \delta$ (Figure 5).

Figure 5. An example of two translations. (Axes: $A_x$ axis, $B_x$ axis; lines $f_{c_1}(x) = x + c_1$ and $f_{c_2}(x) = x + c_2$.)

These line segments are on a two dimensional plane, where on the $x$ axis we put elements of $A_x$ and on the $y$ axis we put elements of $B_x$. For every pair of elements $a_{x,i}$, $b_{x,j}$ in $A_x$ and $B_x$ that are within $\delta$ positions from each other (and therefore can be matched by the $LCSS$ algorithm if their values are within $\epsilon$), we create a vertical line segment that is centered at the point $(a_{x,i}, b_{x,j})$ and extends $\epsilon$ above and below this point. Since each element in $A_x$ can be matched with at most $2\delta + 1$ elements in $B_x$, the total number of such line segments is $O(\delta n)$.

A translation $f_{c'}$ in one dimension is a function of the form $f_{c'}(x) = x + c'$. Therefore, in the plane we described above, $f_{c'}$ is a line of slope 1. After translating $A_x$ by $f_{c'}$, an element $a_{x,i}$ of $A_x$ can be matched to an element $b_{x,j}$ of $B_x$ if and only if the line $f_{c'}(x) = x + c'$ intersects the line segment $((a_{x,i}, b_{x,j} - \epsilon), (a_{x,i}, b_{x,j} + \epsilon))$.

Therefore each line of slope 1 defines a set of possible matchings between the elements of sequences $A_x$ and $B_x$. The number of intersected line segments is actually an upper bound on the length of the longest common subsequence, because the ordering of the elements is ignored. However, two different translations can result in different longest common subsequences only if the respective lines intersect a different set of line segments. For example, the translations $f_{c_1}$ and $f_{c_2}$ in figure 5 intersect different sets of line segments and result in longest common subsequences of different length.

The following lemma gives a bound on the number of possible different longest common subsequences by bounding the number of possible different sets of line segments that are intersected by lines of slope 1.

Lemma 3 Given
two one dimensional sequences $A_x$, $B_x$, there are $O(\delta(n + m))$ lines of slope 1 that intersect different sets of line segments.

Proof: Let $f_{c'}(x) = x + c'$ be a line of slope 1. If we move this line slightly to the left or to the right, it still intersects the same number of line segments, unless we cross an endpoint of a line segment. In this case, the set of intersected line segments increases or decreases by one. There are $O(\delta(n + m))$ endpoints. A line of slope 1 that sweeps all the endpoints will therefore intersect at most $O(\delta(n + m))$ different sets of line segments during the sweep.

In addition, we can enumerate the $O(\delta(n + m))$ translations that produce different sets of potential matchings by finding the lines of slope 1 that pass through the endpoints. Each such translation corresponds to a line $f_{c'}(x) = x + c'$. This set of $O(\delta(n + m))$ translations gives all possible matchings for a longest common subsequence of $A_x$, $B_x$. By applying the same process on $A_y$, $B_y$ we can also find a set of $O(\delta(n + m))$ translations that give all matchings of $A_y$, $B_y$.

To find the longest common subsequence of the sequences $A$, $B$ we have to consider only the $O(\delta^2 (n + m)^2)$ two dimensional translations that are created by taking the Cartesian product of the translations on $x$ and the translations on $y$. Since running the LCSS algorithm takes $O(\delta(n + m))$ time, we have shown the following theorem:

Theorem 1 Given two trajectories $A$ and $B$, with $|A| = n$ and $|B| = m$, we can compute the $S2(\delta, \epsilon, A, B)$ in $O((n + m)^3 \delta^3)$ time.

3.3 An Efficient Approximate Algorithm

Theorem 1 gives an exact algorithm for computing $S2$, but this algorithm runs in cubic time. In this section we present a much more efficient approximate algorithm. The key in our technique is that we can bound the difference between the sets of line segments that different lines of slope 1 intersect, based on how far apart the lines are.

Consider again the one dimensional projections $A_x$, $B_x$.
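The exact procedure summarized by Theorem 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the names `lcss`, `s1`, `candidate_shifts` and `s2_exact` are hypothetical, the translation is applied to the second trajectory, matching uses a closed threshold with a small floating-point tolerance `tol` (so that a shift lying exactly on a segment endpoint still counts as a match), and the dynamic program fills the full table instead of only the band of width $\delta$.

```python
from itertools import product


def lcss(a, b, delta, eps, tol=1e-9):
    """LCSS length of two 2-D trajectories given as lists of (x, y) points."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            close = (abs(ax - bx) <= eps + tol
                     and abs(ay - by) <= eps + tol
                     and abs(i - j) <= delta)
            if close:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def s1(a, b, delta, eps):
    """Similarity without translation: LCSS length over the shorter length."""
    return lcss(a, b, delta, eps) / min(len(a), len(b))


def candidate_shifts(a_vals, b_vals, delta, eps):
    """Lower endpoints of the shift intervals [a_i - b_j - eps, a_i - b_j + eps]
    for all index pairs that are at most delta positions apart."""
    return sorted({a_vals[i] - b_vals[j] - eps
                   for i in range(len(a_vals))
                   for j in range(len(b_vals))
                   if abs(i - j) <= delta})


def s2_exact(a, b, delta, eps):
    """Best S1 over the Cartesian product of candidate x and y shifts of b."""
    xs = candidate_shifts([p[0] for p in a], [p[0] for p in b], delta, eps)
    ys = candidate_shifts([p[1] for p in a], [p[1] for p in b], delta, eps)
    best = 0.0
    for cx, cy in product(xs, ys):
        shifted = [(x + cx, y + cy) for x, y in b]
        best = max(best, s1(a, shifted, delta, eps))
    return best


# Toy data: b is a copy of a moved by (+5, -3), i.e. a parallel movement.
a = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 2)]
b = [(x + 5, y - 3) for x, y in a]
print(s1(a, b, delta=1, eps=0.5), s2_exact(a, b, delta=1, eps=0.5))
```

Restricting the candidates to interval endpoints is sufficient under the closed-threshold assumption: any shift that realizes a given set of matches can be slid down to the largest lower endpoint among those matches without losing any of them.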
Let us consider the $O(\delta(n + m))$ translations that result in different sets of intersected line segments. Each translation is a line of the form $f_{c_i}(x) = x + c_i$. Let us sort these translations by $c_i$. For a given translation $f_{c_i}$, let $N_{f_{c_i}}$ be the set of line segments it intersects. The following lemma shows that neighbor translations in this order intersect similar sets of line segments.

Lemma 4 Let $f_{c_1}(x) = x + c_1, \ldots, f_{c_N}(x) = x + c_N$ be the different translations for sequences $A_x$ and $B_x$, where $c_1 \le \ldots \le c_N$. Then the symmetric difference satisfies $|N_{f_{c_i}} \,\Delta\, N_{f_{c_j}}| \le |i - j|$.

We can now prove our main theorem:

Theorem 2 Given two trajectories $A$ and $B$, with $|A| = n$ and $|B| = m$, and a constant [...], we can find an approximation of the similarity $S2(\delta, \epsilon, A, B)$ such that [...] in [...] time.

Proof: Let [...]. We consider the projections of $A$ and $B$ onto the $x$ and $y$ axes. There exists a translation on $x$ only such that [...] is a superset of the matches in the optimal $LCSS$ of $A$ and $B$. In addition, by the previous lemma, there are [...] translations ([...]) that have at most [...] different matchings from the optimal. Therefore, if we use the translations [...], for [...] time if we run [...] pairs of translations in the plane. Since there is one that is [...] away from the optimal in each dimension, there is one that is [...] away from the optimal in 2 dimensions. Setting [...]

[...]-th quantiles for each set, [...] pairs of translations.
4. Return the highest result.

4 Indexing Trajectories for Similarity Retrieval

In this section we show how to use the hierarchical tree of a clustering algorithm in order to efficiently answer nearest neighbor queries in a dataset of trajectories.

The distance function $D2$ is not a metric because it does not obey the triangle inequality. Indeed, it is easy to construct examples where we have trajectories $A$, $B$ and $C$, where $D2(\delta, \epsilon, A, C) > D2(\delta, \epsilon, A, B) + D2(\delta, \epsilon, B, C)$. This makes the use of traditional indexing techniques difficult.

We can however prove a weaker version of the triangle inequality, which can help us avoid examining a large portion of the database objects. First we define: [...] Clearly, [...] or in terms of distance: [...]

In order to provide a lower bound we have to maximize the expression [...]. Therefore, for every node of the tree, along with the medoid we have to keep the trajectory that maximizes this expression. If the
length of the query is smaller than the shortest length of the trajectories we are currently considering we use that, otherwise we use the minimum and maximum lengths to obtain an approximate result.

4.2 Searching the Index tree for Nearest Trajectories

We assume that we search an index tree that contains trajectories with minimum length [...] and maximum length [...]. For simplicity we discuss the algorithm for the 1-Nearest Neighbor query, where given a query trajectory we try to find the trajectory in the set that is the most similar to the query. The search procedure takes as input a node in the tree, the query and the distance to the closest trajectory found so far. For each of the children, we check if the child is a trajectory or a cluster. If it is a trajectory, we just compare its distance to the query with that of the current nearest trajectory. If it is a cluster, we check the length of the query and we choose the appropriate value for [...]. Then we compute a lower bound on the distance of the query to any trajectory in the cluster and we compare the result with the distance of the current nearest neighbor. We need to examine this cluster only if the lower bound is smaller than the distance of the current nearest neighbor.

In our scheme we use an approximate algorithm to compute the $LCSS$. Consequently, the value of [...] from the bound we compute for [...]. Note that we don't need to worry about the other terms since they have a negative sign and the approximation algorithm always underestimates the $LCSS$.

5 Experimental Evaluation

We implemented the proposed approximation and indexing techniques as they are described in the previous sections and here we present experimental results evaluating our techniques. We describe the datasets and then we continue by presenting the results. The purpose of our experiments is twofold: first, to evaluate the efficiency and accuracy of the approximation algorithm presented in section 3, and second, to evaluate the indexing technique that we discussed in the previous section. Our experiments were run on a PC AMD Athlon at 1 GHz with 1 GB RAM and 60 GB hard
disk.

5.1 Time and Accuracy Experiments

Here we present the results of some experiments using the approximation algorithm to compute the similarity function $S2$. Our dataset here comes from marine mammals' satellite tracking data. It consists of sequences of geographic locations of various marine animals (dolphins, sea lions, whales, etc.) tracked over different periods of time that range from one to three months (SEALS dataset). The length of the trajectories is close to [...]. Examples have been shown in figure 1.

In table 1 we show the computed similarity between a pair of sequences in the SEALS dataset. We ran the exact and the approximate algorithm for different values of $\delta$ and $\epsilon$ and we report here some indicative results. $K$ is the number of times the approximate algorithm invokes the $LCSS$ procedure (that is, the number of translations that we try). As we can see, for [...] and [...] we get very good results. We got similar results for synthetic datasets. Also, in table 1 we report the running times to compute the similarity measure between two trajectories of the same dataset. The approximation algorithm again uses from [...] to [...] different runs. The approximation algorithm is much faster even for [...].

As can be observed from the experimental results, the running time of the approximation algorithm is not proportional to the number of runs ($K$). This is achieved by reusing the results of previous translations and terminating the execution of the current translation early, if it is not going to yield a better result.

The main conclusion of the above experiments is that the approximation algorithm can provide a very tractable time vs accuracy trade-off for computing the similarity between two trajectories, when the similarity is defined using the $LCSS$ model.

5.2 Classification using the Approximation Algorithm

We compare the clustering performance of our method to the widely used Euclidean and DTW distance functions.
Specifically:

1. The Euclidean distance is only defined for sequences of the same length (and the length of our sequences varies considerably). We tried to offer the best possible comparison between every pair of sequences, by sliding the shorter of the two trajectories across the longer one and recording their minimum distance.
2. For DTW we modified the original algorithm in order to match both x and y coordinates. In both DTW and Euclidean we normalized the data before computing the distances. Our method does not need any normalization, since it computes the necessary translations.
3. For LCSS we used a randomized version with and without sampling, and for various values of $\delta$. The time and the correct clusterings represent the average values of 15 runs of the experiment. This is necessary due to the randomized nature of our approach.

Table 1. Similarity values and running times between two sequences from our SEALS dataset.
Column headings as extracted: Similarity / Approximate for K tries / Exact / 949425
0.25  0.316  0.1846  0.253  0.0014   0.0022
0.5   0.571  0.410   0.510  0.0014   0.0022
0.25  0.387  0.196   0.306  0.0018   0.00281
0.5   0.612  0.488   0.563  0.0018   0.00280
0.25  0.408  0.250   0.357  0.00191  0.0031
0.5   0.653  0.440   0.584  0.00193  0.0031

5.2.1 Determining the values for $\delta$ & $\epsilon$

The values we used for $\delta$ and $\epsilon$ are clearly dependent on the application and the dataset. For most datasets we had at our disposal we discovered that setting $\delta$ to more than [...] of the trajectories' length did not yield significant improvement. Furthermore, after some point the similarity stabilizes to a certain value. The determination of $\epsilon$ is application dependent. In our experiments we used a value equal to the smallest standard deviation between the two trajectories that were examined at any time, which yielded good and intuitive results. Nevertheless, when we use the index the value of $\epsilon$ has to be the same for all pairs of trajectories.

5.2.2 Experiment 1 - Video tracking data.

The 2D time series obtained represent the X and Y position of a human tracking feature (e.g. tip of finger). In conjunction with a "spelling
program" the user can "write" various words [19]. We used 3 recordings of 5 different words. The data correspond to the following words: 'athens', 'berlin', 'london', 'boston', 'paris'. The average length of the series is around 1100 points. The shortest one is 834 points and the longest one 1719 points.

To determine the efficiency of each method we performed hierarchical clustering after computing the pairwise distances for all three distance functions. We evaluate the total time required by each method, as well as the quality of the clustering, based on our knowledge of which word each trajectory actually represents. We take all possible pairs of words (in this case $\binom{5}{2} = 10$ pairs) and use the clustering algorithm to partition them into two classes. While at the lower levels of the dendrogram the clustering is subjective, the top level should provide an accurate division into two classes. We clustered using single, complete and average linkage. Since the best results for every distance function are produced using the complete linkage, we report only the results for this approach (table 2). The same experiment is conducted with the rest of the datasets. Experiments have been conducted for different sample sizes and values of $\delta$ (as a percentage of the original series length).

The results with the Euclidean distance have many classification errors and the DTW has some errors, too. For the LCSS the only real variations in the clustering are for sample sizes [...]. Still the average number of incorrect clusterings for these cases was constantly less than one ([...]). For 15% sampling or more, there were no errors.

5.2.3 Experiment 2 - Australian Sign Language Dataset (ASL).

The dataset consists of various parameters (such as the X, Y, Z hand position, azimuth etc.) tracked while different writers sign one of the 95 words of the ASL. These series are relatively short (50-100 points). We used only the X and Y parameters and collected 5 recordings of the following 10 words: 'Norway', 'cold', 'crazy', 'eat', 'forget', 'happy', 'innocent', 'later', 'lose', 'spend'. This is the
experiment conducted also in [25] (but there only one dimension was used). Examples of this dataset can be seen in figure 6.

Table 2 (flattened in the source). Headings: Distance / Time (sec) / Correct Clusterings (out of 10), Complete Linkage. Values as extracted: Euclidean 34.96 DTW 237.6412.7338.04116.17328.85145.06565.203113.583266.753728.277

Table (flattened in the source). Headings: Distance / Time (sec) / Correct Clusterings (out of 45), ASL with noise. Values as extracted: Euclidean 2.271520

Figure 7. ASL data: Time required to compute the pairwise distances of the 45 combinations (same for ASL and ASL with noise).
Figure 8. Noisy ASL data: The correct clusterings of the LCSS method using complete linkage.
Figure 9. Performance for increasing number of Nearest Neighbors.
Figure 10. The pruning power increases along with the database size.

[...] trajectories. We executed a set of $K$-Nearest Neighbor (K-NN) queries for [...] and we plot the fraction of the dataset that has to be examined in order to guarantee that we have found the best match for the K-NN query. Note that in this fraction we included the medoids that we check during the search since they are also part of the dataset.

In figure 9 we show some results for $K$-Nearest Neighbor queries. We used datasets with [...] clusters. As we can see, the results indicate that the algorithm has good performance even for queries with large K. We also performed similar experiments where we varied the number of clusters in the datasets. As the number of clusters increased the performance of the algorithm improved considerably. This behavior is expected and it is similar to the behavior of recently proposed index structures for high dimensional data [9, 6, 21]. On the other hand, if the dataset has no clusters, the performance of the algorithm degrades, since the majority of the trajectories have almost the same distance to the query. This behavior again follows the same pattern as high dimensional indexing methods [6, 36].

The last experiment evaluates the index performance over sets of trajectories with increasing cardinality. We indexed from [...] to [...] trajectories. The pruning power of the inequality is evident in figure 10. As the size of
the database increases, we can avoid examining a larger fraction of the database.

6 Related Work

The simplest approach to define the similarity between two sequences is to map each sequence into a vector and then use a p-norm distance to define the similarity measure. The p-norm distance between two n-dimensional vectors $x$ and $y$ is defined as $L_p(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$.
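As a small illustration of the p-norm distance between two equal-length vectors, the following Python sketch computes $(\sum_i |x_i - y_i|^p)^{1/p}$. The function name `p_norm_distance` is hypothetical and chosen only for this example.

```python
def p_norm_distance(x, y, p=2):
    """p-norm distance between two vectors of the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# p = 2 gives the Euclidean distance, p = 1 the Manhattan distance.
d_euclid = p_norm_distance([0, 0], [3, 4], p=2)
d_manhattan = p_norm_distance([0, 0], [3, 4], p=1)
print(d_euclid, d_manhattan)
```

Because the definition requires vectors of equal length, sequences of different lengths must first be truncated, padded or slid against each other, which is the limitation discussed earlier for the Euclidean distance.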
Patent title: METHOD AND APPARATUS FOR FORMING A NONWOVEN FIBROUS WEB
Inventors: GENTILE A, US; HAUCK C, US
Application number: US3772107D
Filing date: 19711103
Publication number: US3772107A
Publication date: 19731113
Patent content provided by 知识产权出版社 (Intellectual Property Publishing House)

Abstract: A method of forming a nonwoven fibrous web from multiple laps of staple fibers, the major proportion of fibers within each lap being substantially oriented in one direction. Laps of the staple fibers are fed into overlying relationship with the major proportion of oriented fibers in each lap disposed in the same direction. Air is entrapped between adjacent laps as they are fed into overlying relationship to create an air barrier between adjacent overlying laps. The overlying laps are spread transversely of the direction in which the major proportion of fibers are oriented, and fibers within overlying laps are reoriented transversely within the plane of the laps out of the direction in which the major proportion of fibers are oriented while maintaining the air barrier between the adjacent overlying laps. The laps are pressed together after the lap spreading and fiber reorienting steps to form a unitary nonwoven fibrous web and fibers of the unitary nonwoven fibrous web are then bonded together.

Applicant: GENTILE A, US; HAUCK C, US
Journal Citation Reports Chinese Manual

Journal Citation Reports Chinese User Manual

Our company confirmed on June 25, 2010 that this is the latest version.

Contents

Contents
Introduction to JCR
Users of JCR
Linking directly to JCR
Access from the ISI Web of Knowledge platform
Links from Web of Science
Journal Search Screen
Journal Search Options
Journal Summary List
Full Record Page
Journal Rank in Categories
Eigenfactor™ & Article Influence™
Impact Factor
Five Year Impact Factors
Immediacy Index
Cited Half-Life
Cited Journal Graph
Citing Half-Life
Citing Journal Graph
Source Data
Cited Journal Data
Citing Journal List
Related Journals
Impact Factor Trend Graph
View Journals by Subject Category
Sort Again
View Category Data
Journals Related to Aggregate Subject Category
Marking Records
Marked List
Printing Records
Saving Records
Importing Saved Records to Microsoft Excel
Journal Title Changes
Unified Impact Factors
Logging in to EndNote Web
Creating references in EndNote Web
Managing references in EndNote Web
Citing references in EndNote Web
Choosing a bibliography style
Editing citations
Removing parameters
Contacting Thomson Reuters
Contacting 碩睿資訊有限公司

Introduction to JCR

Journal Citation Reports (JCR) is a unique journal evaluation tool covering a wide range of disciplines.
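Demultiplexing data after sequencing pooled single-cell samples (Cell Hashing)

Recently a trainee mentioned that, while reproducing the single-cell data analysis of the paper "Putative regulators for the continuum of erythroid differentiation revealed by single-cell transcriptome of human BM and UCB cells", she noticed that the authors clearly mention 6 samples, whereas the dataset at https:///geo/query/acc.cgi?acc=GSE150774 lists only 5 entries:

GSM4558614 CD235a+ cells - umbilical cord blood 2 and umbilical cord blood 3, UCB3
GSM4558615 CD235a+ cells - umbilical cord blood 1
GSM4558616 CD235a+ cells - Bone Marrow 2
GSM4558617 CD235a+ cells - Bone Marrow 3
GSM4558618 CD235a+ cells - Bone Marrow 4

A closer look at the description in the article confirms that the 10x Genomics platform was used on 3 adult bone marrow and 3 umbilical cord blood samples, 6 samples in total, and that the cells were sorted beforehand so that only CD235a+ cells were studied.

The trainee assumed that the authors had made a mistake when organizing and uploading the data, but this is in fact the cell hashing technique. It helps to first learn about CITE-seq, which yields both the expression matrix of ordinary genes and the expression matrix of several dozen proteins (via antibody-derived tags, ADT) at the same time. The full name of the technique is cellular indexing of transcriptomes and epitopes by sequencing.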
DLV-HEX: Dealing with Semantic Web under Answer-Set Programming

Thomas Eiter, Giovambattista Ianni, Roman Schindlauer, and Hans Tompits
Institut für Informationssysteme, Technische Universität Wien,
Favoritenstraße 9–11, A-1040 Vienna, Austria
{eiter, ianni, roman, tompits}@kr.tuwien.ac.at

We present an implementation of HEX programs, which are nonmonotonic logic programs admitting higher-order atoms as well as external atoms. Higher-order features are widely acknowledged as useful for various tasks, including meta-reasoning. Furthermore, the possibility to exchange knowledge with external sources in a fully declarative framework such as answer-set programming (ASP) is nowadays important, in particular in view of applications in the Semantic-Web area. Through external atoms, HEX programs can deal with external knowledge and reasoners of various nature, such as RDF datasets or description-logic knowledge bases.

However, for important issues such as meta-reasoning in the context of the Semantic Web, no adequate answer-set engines have been available so far. Motivated by this fact and the observation that, furthermore, interoperability with other software is an important issue (not only in this context), Eiter et al. [2005] extended the answer-set semantics to HEX programs, which are higher-order logic programs (which accommodate meta-reasoning through higher-order atoms) with external atoms for software interoperability. Intuitively, a higher-order atom allows one to quantify values over predicate names, and to freely exchange predicate symbols with constant symbols, like in the rule

$C(X) \leftarrow \mathit{subClassOf}(D, C),\ D(X).$

An external atom makes it possible to determine the truth value of an atom through an external source of computation. For instance, the rule

$t(\mathit{Sub}, \mathit{Pred}, \mathit{Obj}) \leftarrow \&\mathit{RDF}[\mathit{uri}](\mathit{Sub}, \mathit{Pred}, \mathit{Obj})$

computes the predicate $t$ taking values from the predicate $\&\mathit{RDF}$. This latter predicate extracts RDF statements from the set of URIs specified by means of $\mathit{uri}$; this task is delegated to an external computational source (e.g., an external deduction system, an execution library, etc.). External atoms allow a bidirectional flow of information to and from external sources of computation such as description-logic reasoners.

By means of HEX programs, powerful meta-reasoning becomes available in a decidable setting, e.g., for Semantic Web applications, for meta-interpretation in ASP itself, or for defining policy languages. For example, advanced closed world reasoning or the definition of constructs for an extended ontology language (e.g., of RDF-Schema) is well supported. Due to the higher-order features, the representation is succinct. An experimental prototype implementation of the language is available, based on a reduction to ordinary ASP.

2 HEX Programs

HEX programs are sets of rules of the form

$\alpha_1 \vee \cdots \vee \alpha_k \leftarrow \beta_1, \ldots, \beta_n, \mathit{not}\ \beta_{n+1}, \ldots, \mathit{not}\ \beta_m \qquad (1)$

where $m, k \ge 0$, $\alpha_1, \ldots, \alpha_k$ are higher-order atoms, and $\beta_1, \ldots, \beta_m$ are either higher-order atoms or external atoms. The operator "$\mathit{not}$" is negation as failure.

A higher-order atom (or atom) is a tuple $(Y_0, Y_1, \ldots, Y_n)$, where $Y_0, \ldots, Y_n$ are terms. It is possible to specify molecules of atoms in frame-logic-like syntax, as a shortcut for the conjunction of the corresponding atoms.

An external atom is of the form $\&g[Y_1, \ldots, Y_n](X_1, \ldots, X_m)$, where $Y_1, \ldots, Y_n$ and $X_1, \ldots, X_m$ are two lists of terms (called input and output list, respectively), and $\&g$ is an external predicate name.

The semantics of HEX programs is given by generalizing the answer-set semantics [Eiter et al., 2005]. We note that the answer-set semantics may yield no, one, or multiple models
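To make the idea of an external atom more concrete, the following Python sketch models it as a function that receives an input list (here, a set of URIs) and returns a set of output tuples, which a rule then copies into an ordinary predicate. Everything in it is a hypothetical illustration: `FAKE_RDF_STORE`, `rdf_external_atom` and `derive_t` are invented names, the triples are made up, and nothing here reflects the actual interface of the DLV-HEX prototype.

```python
# Hypothetical in-memory stand-in for RDF documents, keyed by URI.
FAKE_RDF_STORE = {
    "http://example.org/a.rdf": [("alice", "knows", "bob")],
    "http://example.org/b.rdf": [("bob", "knows", "carol")],
}


def rdf_external_atom(uris):
    """Plays the role of an external atom: the input list is a collection
    of URIs, the output is the set of (subject, predicate, object) triples
    found under those URIs."""
    triples = set()
    for uri in uris:
        triples.update(FAKE_RDF_STORE.get(uri, []))
    return triples


def derive_t(uri_facts):
    """Mimics a rule whose head predicate t is populated from the output
    tuples of the external atom, given the currently known URI facts."""
    return {("t",) + triple for triple in rdf_external_atom(sorted(uri_facts))}


facts = derive_t({"http://example.org/a.rdf", "http://example.org/b.rdf"})
print(sorted(facts))
```

The point of the sketch is the direction of information flow: the program supplies the input (the URIs it currently knows), and the external source returns tuples that become ordinary atoms available to the remaining rules.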