ICCV_2015_Thin Structure Estimation with Curvature Regularization
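Research on Sparse-View CT Reconstruction Algorithms, Part 1. I. Introduction. With the rapid development of medical imaging technology, computed tomography (CT) has become one of the important tools for clinical diagnosis.
Longyuan Journal Network. Progress in Person Re-identification Research. Author: Yang Zhi. Source: Science (《科学》), 2016, No. 05. [Journal news] The research group of Professor Wang Shengjin at Tsinghua University has made notable progress in person re-identification research; the results were presented at the 2014 European Conference on Computer Vision (ECCV) and at the 2015 IEEE International Conference on Computer Vision (ICCV).
Person re-identification is a new technique for determining in which of several surveillance cameras a particular person of interest has appeared, and thereby recovering that person's walking trajectory.
In 2016, Wang Shengjin's group also published its latest person re-identification results in the leading international journal IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE PAMI).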
• Smith B A, Yin Q, Feiner S, et al. Gaze locking: passive eye contact detection for human-object interaction[C]. User Interface Software and Technology, 2013: 271-280.
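• Capture tools: Kinect depth camera + RGB camera
• Capture method: volunteers sit in front of the depth camera and keep their eyes on a moving ping-pong ball while the RGB camera records the session. The 2D coordinates of the eye centers and of the ball are annotated by hand in the recorded video and mapped into the point cloud to obtain the corresponding 3D coordinates; subtracting the two gives the 3D gaze vector
• Scale: 94 videos, 16 subjects of different ethnicities
• Use case: gaze estimation
• Limitations: requires a depth camera; small amount of data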
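The gaze-vector step in the capture method above can be sketched as follows. This is a minimal illustration, assuming the eye center and the ball have already been lifted to 3D points in the same camera coordinate frame; the function name `gaze_vector` and the normalization to unit length are assumptions added here, since the slide only says the two points are subtracted.

```python
import numpy as np


def gaze_vector(eye_center_3d, target_3d):
    """Return the unit 3D gaze direction from the eye center to the target.

    Both inputs are (x, y, z) points in the same coordinate frame
    (hypothetical helper; the unit-length normalization is an assumption).
    """
    eye = np.asarray(eye_center_3d, dtype=np.float64)
    target = np.asarray(target_3d, dtype=np.float64)
    direction = target - eye  # the "difference" described in the slide
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("eye center and target coincide; gaze is undefined")
    return direction / norm


# Example: eye at (1, 2, 3), ball at (4, 6, 3) in camera coordinates.
example_gaze = gaze_vector([1.0, 2.0, 3.0], [4.0, 6.0, 3.0])
```

Returning a unit vector keeps gaze directions comparable across frames regardless of how far away the ball is.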
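• Capture tools: manual or automatic synthesis
• SynthesEyes (ICCV 2015)
• UnityEyes (ETRA 2016)
• Ships an automatic generation tool
• Built with the Unity engine
• Gaze direction, head pose, and similar settings can be customized
• SimGAN (CVPR 2017): gaze transfer with a GAN
• Unsupervised Representation Learning (CVPR 2020): gaze redirection
• RT-GENE (eye tracker + depth + RGB)
• Gaze tracking
• SynthesEyes (synthetic)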
• GazeFollow
• UnityEyes (synthetic)
• VideoGaze
MPIIGaze (CVPR 2015)
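• Capture tool: a single RGB camera with known parameters
• Capture method: the 3D position of the eyes is computed and calibrated from the camera parameters and a mirror-based algorithm,
Research on Sparse-View CT Reconstruction Algorithms, Part 1. I. Introduction. Computed tomography (CT) is one of the important tools of modern medical imaging diagnosis.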
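Compared with conventional CT reconstruction, sparse-view CT acquires fewer projection angles, which lowers radiation dose and scan time; the reconstruction algorithm must then preserve image resolution and suppress the artifacts that the missing angles would otherwise cause.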
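III. Sparse-View CT Reconstruction Algorithms. 1. Iterative reconstruction. Iterative reconstruction is one of the algorithms commonly used in sparse-view CT reconstruction.
2. Deep learning. In recent years, deep learning algorithms have also been widely applied to sparse-view CT reconstruction.
3. Compressed sensing. Compressed sensing is a reconstruction approach based on signal sparsity, and it is also widely used in sparse-view CT reconstruction.
Single-Object Tracking Based on Multi-Layer Feature Embedding. 1. Overview. Single-object tracking based on multi-layer feature embedding is a tracking technique widely used in computer vision.
1.1 Background. With the rapid development of computer vision and deep learning, object tracking plays an increasingly important role in many fields, such as security, intelligent surveillance, and autonomous driving.
1.2 Objective. This study aims to design a single-object tracking algorithm based on multi-layer feature embedding in order to improve tracking accuracy and robustness.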
Abstract

Compressive sensing and sparse inversion methods have gained a significant amount of attention in recent years due to their capability to accurately reconstruct signals from measurements with significantly less data than previously possible. In this paper, a modified Gaussian frequency domain compressive sensing and sparse inversion method is proposed, which leverages the proven strengths of the traditional method to enhance its accuracy and performance. Simulation results demonstrate that the proposed method can achieve a higher signal-to-noise ratio and a better reconstruction quality than its traditional counterpart, while also reducing the computational complexity of the inversion procedure.

Introduction

Compressive sensing (CS) is an emerging field that has garnered significant interest in recent years because it leverages the sparsity of signals to reduce the number of measurements required to accurately reconstruct the signal. This has many advantages over traditional signal processing methods, including faster data acquisition times, reduced power consumption, and lower data storage requirements. CS has been successfully applied to a wide range of fields, including medical imaging, wireless communications, and surveillance.

One of the most commonly used methods in compressive sensing is the Gaussian frequency domain compressive sensing and sparse inversion (GFD-CS) method. In this method, compressive measurements are acquired by multiplying the original signal with a randomly generated sensing matrix. The measurements are then transformed into the frequency domain using the Fourier transform, and the sparse signal is reconstructed using a sparsity promoting algorithm.

In recent years, researchers have made numerous improvements to the GFD-CS method, with the goal of improving its reconstruction accuracy, reducing its computational complexity, and enhancing its robustness to noise. In this paper, we propose a modified GFD-CS method that combines several techniques to achieve these objectives.

Proposed Method

The proposed method builds upon the well-established GFD-CS method, with several key modifications. The first modification is the use of a hierarchical sparsity-promoting algorithm, which promotes sparsity at both the signal level and the transform level. This is achieved by applying the hierarchical thresholding technique to the coefficients corresponding to the higher frequency components of the transformed signal.

The second modification is the use of a novel error feedback mechanism, which reduces the impact of measurement noise on the reconstructed signal. Specifically, the proposed method utilizes an iterative algorithm that updates the measurement error based on the difference between the reconstructed signal and the measured signal. This feedback mechanism effectively increases the signal-to-noise ratio of the reconstructed signal, improving its accuracy and robustness to noise.

The third modification is the use of a low-rank approximation method, which reduces the computational complexity of the inversion algorithm while maintaining reconstruction accuracy. This is achieved by decomposing the sensing matrix into a product of two lower dimensional matrices, which can be subsequently inverted using a more efficient algorithm.

Simulation Results

To evaluate the effectiveness of the proposed method, we conducted simulations using synthetic data sets. Three different signal types were considered: a sinusoidal signal, a pulse signal, and an image signal. The results of the simulations were compared to those obtained using the traditional GFD-CS method.

The simulation results demonstrate that the proposed method outperforms the traditional GFD-CS method in terms of signal-to-noise ratio and reconstruction quality. Specifically, the proposed method achieves a higher signal-to-noise ratio and lower mean squared error for all three types of signals considered. Furthermore, the proposed method achieves these results with a reduced computational complexity compared to the traditional method.

Conclusion

The results of our simulations demonstrate the effectiveness of the proposed method in enhancing the accuracy and performance of the GFD-CS method. The combination of sparsity promotion, error feedback, and low-rank approximation techniques significantly improves the signal-to-noise ratio and reconstruction quality, while reducing the computational complexity of the inversion procedure. Our proposed method has potential applications in a wide range of fields, including medical imaging, wireless communications, and surveillance.
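The basic compressive sensing pipeline described in the introduction above (random Gaussian sensing matrix, then a sparsity-promoting reconstruction) can be illustrated with a generic baseline. The sketch below uses plain iterative soft thresholding (ISTA) on an L1-regularized least-squares problem; it is not the modified GFD-CS method, whose hierarchical thresholding, error feedback, and low-rank steps are not specified in enough detail to reproduce. The problem sizes, the regularization weight `lam`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))


def ista(A, y, lam, n_iter=500):
    """Minimize the lasso objective with ISTA, starting from x = 0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)            # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * lam)
    return x


# Synthetic example: a 5-sparse signal of length 128 observed through
# 64 random Gaussian measurements (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n, m, k = 128, 64, 5
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true
lam = 0.01
x_hat = ista(A, y, lam)
```

With a step size of one over the squared spectral norm of the sensing matrix, each ISTA iteration cannot increase the objective, which is why it is a common reference point when evaluating faster or more accurate variants.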
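II. State of Research in Object Tracking. 1. Correlation-filter-based tracking. Before correlation-filter trackers appeared, most tracking algorithms used a particle-filter framework, and the number of particles was often a major factor limiting their speed.
1.1. Feature improvements. The MOSSE [1] algorithm, and the CSK [2] algorithm that builds on it by introducing circulant matrices for fast computation, both use simple grayscale features. Such features are easily disturbed by the external environment, which leads to inaccurate tracking.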
ICML 2014
ICML 2015
1. An embarrassingly simple approach to zero-shot learning
2. Learning Transferable Features with Deep Adaptation Networks
3. A Theoretical Analysis of Metric Hypothesis Transfer Learning
4. Gradient-based hyperparameter optimization through reversible learning
ICML 2016
1. One-Shot Generalization in Deep Generative Models
2. Meta-Learning with Memory-Augmented Neural Networks
3. Meta-gradient boosted decision tree model for weight and target learning
4. Asymmetric Multi-task Learning based on Task Relatedness and Confidence
ICML 2017
1. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
2. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
3. Meta Networks
4. Learning to learn without gradient descent by gradient descent
ICML 2018
1. MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning
2. Understanding and Simplifying One-Shot Architecture Search
3. One-Shot Segmentation in Clutter
4. Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
5. Bilevel Programming for Hyperparameter Optimization and Meta-Learning
6. Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
7. Been There, Done That: Meta-Learning with Episodic Recall
8. Learning to Explore via Meta-Policy Gradient
9. Transfer Learning via Learning to Transfer
10. Rapid adaptation with conditionally shifted neurons
NIPS 2014
1. Zero-shot recognition with unreliable attributes
NIPS 2015
NIPS 2016
1. Learning feed-forward one-shot learners
2. Matching Networks for One Shot Learning
3. Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs
NIPS 2017
1. One-Shot Imitation Learning
2. Few-Shot Learning Through an Information Retrieval Lens
3. Prototypical Networks for Few-shot Learning
4. Few-Shot Adversarial Domain Adaptation
5. A Meta-Learning Perspective on Cold-Start Recommendations for Items
6. Neural Program Meta-Induction
NIPS 2018
1. Bayesian Model-Agnostic Meta-Learning
2. The Importance of Sampling in Meta-Reinforcement Learning
3. MetaAnchor: Learning to Detect Objects with Customized Anchors
4. MetaGAN: An Adversarial Approach to Few-Shot Learning
5. Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior
6. Meta-Gradient Reinforcement Learning
7. Meta-Reinforcement Learning of Structured Exploration Strategies
8. Meta-Learning MCMC Proposals
9. Probabilistic Model-Agnostic Meta-Learning
10. MetaReg: Towards Domain Generalization using Meta-Regularization
11. Zero-Shot Transfer with Deictic Object-Oriented Representation in Reinforcement Learning
12. Uncertainty-Aware Few-Shot Learning with Probabilistic Model-Agnostic Meta-Learning
13. Multitask Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
14. Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning
15. Delta-encoder: an effective sample synthesis method for few-shot object recognition
16. One-Shot Unsupervised Cross Domain Translation
17. Generalized Zero-Shot Learning with Deep Calibration Network
18. Domain-Invariant Projection Learning for Zero-Shot Recognition
19. Low-shot Learning via Covariance-Preserving Adversarial Augmentation Network
20. Improved few-shot learning with task conditioning and metric scaling
21. Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning
22. Learning to Play with Intrinsically-Motivated Self-Aware Agents
23. Learning to Teach with Dynamic Loss Functions
24. Memory Replay GANs: learning to generate images from new categories without forgetting
ICCV 2015
1. One Shot Learning via Compositions of Meaningful Patches
2. Unsupervised Domain Adaptation for Zero-Shot Learning
3. Active Transfer Learning With Zero-Shot Priors: Reusing Past Datasets for Future Tasks
4. Zero-Shot Learning via Semantic Similarity Embedding
5. Semi-Supervised Zero-Shot Classification With Label Representation Learning
6. Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions
7. Learning to Transfer: Transferring Latent Task Structures and Its Application to Person-Specific Facial Action Unit Detection
ICCV 2017
1. Supplementary Meta-Learning: Towards a Dynamic Model for Deep Neural Networks
2. Attributes2Classname: A Discriminative Model for Attribute-Based Unsupervised Zero-Shot Learning
3. Low-Shot Visual Recognition by Shrinking and Hallucinating Features
4. Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning
5. Learning Discriminative Latent Attributes for Zero-Shot Classification
6. Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions
CVPR 2014
1. COSTA: Co-Occurrence Statistics for Zero-Shot Classification
2. Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts
3. Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective
CVPR 2015
1. Zero-Shot Object Recognition by Semantic Manifold Distance
CVPR 2016
2. Multi-Cue Zero-Shot Learning With Strong Supervision
3. Latent Embeddings for Zero-Shot Classification
4. One-Shot Learning of Scene Locations via Feature Trajectory Transfer
5. Less Is More: Zero-Shot Learning From Online Textual Documents With Noise Suppression
6. Synthesized Classifiers for Zero-Shot Learning
7. Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning
8. Fast Zero-Shot Image Tagging
9. Zero-Shot Learning via Joint Latent Similarity Embedding
10. Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation
11. Learning to Co-Generate Object Proposals With a Deep Structured Network
12. Learning to Select Pre-Trained Deep Representations With Bayesian Evidence Framework
13. DeepStereo: Learning to Predict New Views From the World’s Imagery
CVPR 2017
1. One-Shot Video Object Segmentation
2. FastMask: Segment Multi-Scale Object Candidates in One Shot
3. Few-Shot Object Recognition From Machine-Labeled Web Images
4. From Zero-Shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis
5. Learning a Deep Embedding Model for Zero-Shot Learning
6. Low-Rank Embedded Ensemble Semantic Dictionary for Zero-Shot Learning
7. Multi-Attention Network for One Shot Learning
8. Zero-Shot Action Recognition With Error-Correcting Output Codes
9. One-Shot Metric Learning for Person Re-Identification
10. Semantic Autoencoder for Zero-Shot Learning
11. Zero-Shot Recognition Using Dual Visual-Semantic Mapping Paths
12. Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning
13. One-Shot Hyperspectral Imaging Using Faced Reflectors
14. Gaze Embeddings for Zero-Shot Image Classification
15. Zero-Shot Learning - the Good, the Bad and the Ugly
16. Link the Head to the “Beak”: Zero Shot Learning From Noisy Text Description at Part Precision
17. Semantically Consistent Regularization for Zero-Shot Recognition
18. Zero-Shot Classification With Discriminative Semantic Representation Learning
19. Learning to Detect Salient Objects With Image-Level Supervision
20. Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection
CVPR 2018
1. A Generative Adversarial Approach for Zero-Shot Learning From Noisy Texts
2. Transductive Unbiased Embedding for Zero-Shot Learning
3. Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks
4. Learning to Compare: Relation Network for Few-Shot Learning
5. One-Shot Action Localization by Learning Sequence Matching Network
6. Multi-Label Zero-Shot Learning With Structured Knowledge Graphs
7. “Zero-Shot” Super-Resolution Using Deep Internal Learning
8. Low-Shot Learning With Large-Scale Diffusion
9. CLEAR: Cumulative LEARning for One-Shot One-Class Image Recognition
10. Zero-Shot Sketch-Image Hashing
11. Structured Set Matching Networks for One-Shot Part Labeling
12. Memory Matching Networks for One-Shot Image Recognition
13. Generalized Zero-Shot Learning via Synthesized Examples
14. Dynamic Few-Shot Visual Learning Without Forgetting
15. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
16. Feature Generating Networks for Zero-Shot Learning
17. Low-Shot Learning With Imprinted Weights
18. Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs
19. Webly Supervised Learning Meets Zero-Shot Learning: A Hybrid Approach for Fine-Grained Classification
20. Few-Shot Image Recognition by Predicting Parameters From Activations
21. Low-Shot Learning From Imaginary Data
22. Discriminative Learning of Latent Features for Zero-Shot Recognition
23. Multi-Content GAN for Few-Shot Font Style Transfer
24. Preserving Semantic Relations for Zero-Shot Learning
25. Zero-Shot Kernel Learning
26. Neural Style Transfer via Meta Networks
27. Learning to Estimate 3D Human Pose and Shape From a Single Color Image
28. Learning to Segment Every Thing
29. Leveraging Unlabeled Data for Crowd Counting by Learning to Rank
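In image classification, the convolutional neural network (CNN) is a commonly used and effective model.
A Roundup of Several Papers on Multi-View Learning. I have recently been surveying work on 3D algorithms and collected notes on several multi-view learning papers.
Contents
1. (ICCV 2015) MVCNN: Multi-view Convolutional Neural Networks for 3D Shape Recognition. Paper: Code: This paper is regarded as the founding work of multi-view learning. Simply averaging the feature descriptors of a 3D shape's multi-view images, or simply "concatenating" those descriptors (think of this as chaining the features in series), gives poor results.
We therefore design the Multi-view CNN (MVCNN), built on top of a basic 2D image CNN.
All branches in the first part of the network share the same CNN1 parameters.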
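The weight sharing described above can be sketched in a few lines. This is a toy illustration only: `shared_branch` is a hypothetical stand-in for CNN1 (a single linear layer with ReLU instead of a real convolutional network), and the aggregation step is an assumption, since the notes above do not spell it out. The MVCNN paper combines the per-view features with an element-wise maximum across views (view pooling), which is what the sketch uses.

```python
import numpy as np


def shared_branch(view, weights):
    """Stand-in for CNN1: the SAME weights are applied to every view."""
    return np.maximum(view @ weights, 0.0)  # linear layer + ReLU


def view_pool(view_features):
    """Element-wise max over the view axis (axis 0)."""
    return np.max(view_features, axis=0)


def mvcnn_descriptor(views, weights):
    """Compute one shape descriptor from several rendered views."""
    features = np.stack([shared_branch(v, weights) for v in views])
    return view_pool(features)


# Example: 12 views of one shape, each already flattened to 8 numbers,
# mapped to a 4-dimensional descriptor (all sizes are illustrative).
rng = np.random.default_rng(0)
views = rng.standard_normal((12, 8))
weights = rng.standard_normal((8, 4))
descriptor = mvcnn_descriptor(views, weights)
```

Because the maximum is taken over views, the descriptor does not depend on the order in which the views are presented.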
Reference:
2. (CVPR 2016) Volumetric and multi-view CNNs for object classification on 3D data. Paper: Code:
3. (BMVC 2017) DSCNN: Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition. Paper: Code:
4. (CVPR 2018) GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. Paper: Code: Building on MVCNN, this paper proposes the group-view convolutional neural network (GVCNN).
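II. Key Techniques in Stereo Matching. 1. Overview. Stereo matching is a fundamental problem in computer vision. Its goal is to extract corresponding feature points from two or more images and compute their positions in 3D space.
2. Stereo matching algorithms. Common stereo matching algorithms currently include region-based, feature-based, and phase-based methods.
In addition, deep learning has achieved notable results in stereo matching; models such as Displets and MC-CNN reach high matching accuracy.
III. Key Techniques in Point Cloud Reconstruction. 1. Overview. Point cloud reconstruction is the process of integrating 3D point cloud data, obtained by stereo matching or similar methods, into a 3D model.
2. Point cloud reconstruction algorithms. Point cloud reconstruction mainly consists of two parts: surface reconstruction and texture mapping.
IV. Fusion and Applications of Stereo Matching and Point Cloud Reconstruction. 1. Fusion methods. Fusing stereo matching with point cloud reconstruction is key to improving 3D reconstruction accuracy.
II. Stereo Matching Techniques. 1. Basic principle. Stereo matching takes two or more images captured from different viewpoints and algorithmically finds the correspondences of the same scene point, from which the depth of objects is obtained.
2. Feature extraction and matching. Feature extraction is a key step in stereo matching; its purpose is to extract representative feature points from the images.
3. Disparity computation and optimization. The disparity map obtained from feature matching contains noise and mismatches, so disparity computation and optimization are required.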
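The link between a disparity map and the depth information mentioned above can be made concrete with the standard pinhole relation for a rectified stereo pair, depth = focal length × baseline / disparity. The sketch below is a minimal illustration under that assumption; the function name, the parameter names, and the choice to mark invalid (non-positive) disparities as infinitely far are all hypothetical conventions, not taken from the text.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters).

    Assumes a rectified stereo pair, focal length in pixels and
    baseline in meters. Non-positive disparities (no valid match)
    are mapped to infinity.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Example: 500-pixel focal length, 10 cm baseline.
example_depth = disparity_to_depth([[10.0, 0.0], [50.0, 25.0]], 500.0, 0.1)
```

Since depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for near ones, which is one reason the disparity map is denoised before reconstruction.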
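III. Point Cloud Reconstruction Techniques. 1. Basic principle. Point cloud reconstruction means algorithmically reconstructing a 3D model of an object from a set of points in space.
2. Data acquisition and preprocessing. Data acquisition is the first step of point cloud reconstruction: sensors capture point cloud data of the scene.
3. Point cloud registration and modeling. After preprocessing, the point clouds must be registered, i.e., point clouds captured from different viewpoints are brought into a single coordinate system.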
15ICCV_Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for
Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

George Papandreou∗ (Google, Inc., gpapan@), Liang-Chieh Chen∗ (UCLA, lcchen@), Kevin P. Murphy (Google, Inc., kpmurphy@), Alan L. Yuille (UCLA, yuille@)

Abstract

Deep convolutional neural networks (DCNNs) trained on a large number of images with strong pixel-level annotations have recently significantly pushed the state-of-art in semantic image segmentation. We study the more challenging problem of learning DCNNs for semantic image segmentation from either (1) weakly annotated training data such as bounding boxes or image-level labels or (2) a combination of few strongly labeled and many weakly labeled images, sourced from one or multiple datasets. We develop Expectation-Maximization (EM) methods for semantic image segmentation model training under these weakly supervised and semi-supervised settings. Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark, while requiring significantly less annotation effort. We share source code implementing the proposed system at https: ///deeplab/deeplab-public.

1. Introduction

Semantic image segmentation refers to the problem of assigning a semantic label (such as “person”, “car” or “dog”) to every pixel in the image. Various approaches have been tried over the years, but according to the results on the challenging Pascal VOC 2012 segmentation benchmark, the best performing methods all use some kind of Deep Convolutional Neural Network (DCNN) [2, 5, 8, 14, 25, 27, 41].

In this paper, we work with the DeepLab-CRF approach of [5, 41]. This combines a DCNN with a fully connected Conditional Random Field (CRF) [19], in order to get high resolution segmentations. This model achieves state-of-art results on the challenging PASCAL VOC segmentation benchmark [13], delivering a mean intersection-over-union (IOU) score exceeding 70%.

A key bottleneck in building this class of
DCNN-based segmentation models is that they typically require pixel-level annotated images during training. Acquiring such data is an expensive, time-consuming annotation effort. Weak annotations, in the form of bounding boxes (i.e., coarse object locations) or image-level labels (i.e., information about which object classes are present) are far easier to collect than detailed pixel-level annotations. We develop new methods for training DCNN image segmentation models from weak annotations, either alone or in combination with a small number of strong annotations. Extensive experiments, in which we achieve performance up to 69.0%, demonstrate the effectiveness of the proposed techniques.

∗ The first two authors contributed equally to this work.

According to [24], collecting bounding boxes around each class instance in the image is about 15 times faster/cheaper than labeling images at the pixel level. We demonstrate that it is possible to learn a DeepLab-CRF model delivering 62.2% IOU on the PASCAL VOC 2012 test set by training it on a simple foreground/background segmentation of the bounding box annotations.

An even cheaper form of data to collect is image-level labels, which specify the presence or absence of semantic classes, but not the object locations. Most existing approaches for training semantic segmentation models from this kind of very weak labels use multiple instance learning (MIL) techniques. However, even recent weakly-supervised methods such as [25] deliver significantly inferior results compared to their fully-supervised counterparts, only achieving 25.7%. Including additional trainable objectness [7] or segmentation [1] modules that largely increase the system complexity, [31] has improved performance to 40.6%, which still significantly lags performance of fully-supervised systems.

We develop novel online Expectation-Maximization (EM) methods for training DCNN semantic segmentation models from weakly annotated data. The proposed algorithms alternate between estimating the latent pixel labels
(subject to the weak annotation constraints), and optimizing the DCNN parameters using stochastic gradient descent (SGD). When we only have access to image-level annotated training data, we achieve 39.6%, close to [31] but without relying on any external objectness or segmentation module. More importantly, our EM approach also excels in the semi-supervised scenario which is very important in practice. Having access to a small number of strongly (pixel-level) annotated images and a large number of weakly (bounding box or image-level) annotated images, the proposed algorithm can almost match the performance of the fully-supervised system. For example, having access to 2.9k pixel-level images and 9k image-level annotated images yields 68.5%, only 2% inferior to the performance of the system trained with all 12k images strongly annotated at the pixel level. Finally, we show that using additional weak or strong annotations from the MS-COCO dataset can further improve results, yielding 73.9% on the PASCAL VOC 2012 benchmark.

Contributions. In summary, our main contributions are:
1. We present EM algorithms for training with image-level or bounding box annotation, applicable to both the weakly-supervised and semi-supervised settings.
2. We show that our approach achieves excellent performance when combining a small number of pixel-level annotated images with a large number of image-level or bounding box annotated images, nearly matching the results achieved when all training images have pixel-level annotations.
3. We show that combining weak or strong annotations across datasets yields further improvements. In particular, we reach 73.9% IOU performance on PASCAL VOC 2012 by combining annotations from the PASCAL and MS-COCO datasets.

2. Related work

Training segmentation models with only image-level labels has been a challenging problem in the literature [12, 36, 37, 39]. Our work is most related to other recent DCNN models such as [30, 31], who also study the weakly supervised setting. They both develop
MIL-based algorithms for the problem. In contrast, our model employs an EM algorithm, which similarly to [26] takes into account the weak labels when inferring the latent image segmentations. Moreover, [31] proposed to smooth the prediction results by region proposal algorithms, e.g., CPMC [3] and MCG [1], learned on pixel-segmented images. Neither [30, 31] cover the semi-supervised setting.

Bounding box annotations have been utilized for semantic segmentation by [38, 42], while [15, 21, 40] describe schemes exploiting both image-level labels and bounding box annotations. [4] attained human-level accuracy for car segmentation by using 3D bounding boxes. Bounding box annotations are also commonly used in interactive segmentation [22, 33]; we show that such foreground/background segmentation methods can effectively estimate object segments accurate enough for training a DCNN semantic segmentation system. Working in a setting very similar to ours, [9] employed MCG [1] (which requires training from pixel-level annotations) to infer object masks from bounding box labels during DCNN training.

Figure 1. DeepLab model training from fully annotated images. (Diagram: image and pixel annotations, Deep Convolutional Neural Network, loss.)

3. Proposed Methods

We build on the DeepLab model for semantic image segmentation proposed in [5]. This uses a DCNN to predict the label distribution per pixel, followed by a fully-connected (dense) CRF [19] to smooth the predictions while preserving image edges. In this paper, we focus for simplicity on methods for training the DCNN parameters from weak labels, only using the CRF at test time. Additional gains can be obtained by integrated end-to-end training of the DCNN and CRF parameters [41, 6].

Notation. We denote by $x$ the image values and $y$ the segmentation map. In particular, $y_m \in \{0, \ldots, L\}$ is the pixel label at position $m \in \{1, \ldots, M\}$, assuming that we have the background as well as $L$ possible foreground labels and $M$ is the number of pixels. Note that these pixel-level labels may not be
visible in the training set. We encode the set of image-level labels by $z$, with $z_l = 1$ if the $l$-th label is present anywhere in the image, i.e., if $\sum_m [y_m = l] > 0$.

3.1. Pixel-level annotations

In the fully supervised case illustrated in Fig. 1, the objective function is

$$J(\theta) = \log P(y \mid x; \theta) = \sum_{m=1}^{M} \log P(y_m \mid x; \theta), \qquad (1)$$

where $\theta$ is the vector of DCNN parameters. The per-pixel label distributions are computed by

$$P(y_m \mid x; \theta) \propto \exp\big(f_m(y_m \mid x; \theta)\big), \qquad (2)$$

where $f_m(y_m \mid x; \theta)$ is the output of the DCNN at pixel $m$. We optimize $J(\theta)$ by mini-batch SGD.

3.2. Image-level annotations

When only image-level annotation is available, we can observe the image values $x$ and the image-level labels $z$, but the pixel-level segmentations $y$ are latent variables. We have the following probabilistic graphical model:

$$P(x, y, z; \theta) = P(x) \left( \prod_{m=1}^{M} P(y_m \mid x; \theta) \right) P(z \mid y). \qquad (3)$$

Algorithm 1. Weakly-Supervised EM (fixed bias version)
Input: Initial CNN parameters $\theta'$, potential parameters $b_l$, $l \in \{0, \ldots, L\}$, image $x$, image-level label set $z$.
E-Step: For each image position $m$
1: $\hat{f}_m(l) = f_m(l \mid x; \theta') + b_l$, if $z_l = 1$
2: $\hat{f}_m(l) = f_m(l \mid x; \theta')$, if $z_l = 0$
3: $\hat{y}_m = \arg\max_l \hat{f}_m(l)$
M-Step:
4: $Q(\theta; \theta') = \log P(\hat{y} \mid x, \theta) = \sum_{m=1}^{M} \log P(\hat{y}_m \mid x, \theta)$
5: Compute $\nabla_\theta Q(\theta; \theta')$ and use SGD to update $\theta'$.

We pursue an EM-approach in order to learn the model parameters $\theta$ from training data. If we ignore terms that do not depend on $\theta$, the expected complete-data log-likelihood given the previous parameter estimate $\theta'$ is

$$Q(\theta; \theta') = \sum_{y} P(y \mid x, z; \theta') \log P(y \mid x; \theta) \approx \log P(\hat{y} \mid x; \theta), \qquad (4)$$

where we adopt a hard-EM approximation, estimating in the E-step of the algorithm the latent segmentation by

$$\hat{y} = \arg\max_{y} P(y \mid x; \theta') P(z \mid y) \qquad (5)$$
$$= \arg\max_{y} \log P(y \mid x; \theta') + \log P(z \mid y) \qquad (6)$$
$$= \arg\max_{y} \left( \sum_{m=1}^{M} f_m(y_m \mid x; \theta') + \log P(z \mid y) \right). \qquad (7)$$

In the M-step of the algorithm, we optimize $Q(\theta; \theta') \approx \log P(\hat{y} \mid x; \theta)$ by mini-batch SGD similarly to (1), treating $\hat{y}$ as ground truth segmentation.

To completely identify the E-step (7), we need to specify the observation model $P(z \mid y)$. We have experimented with two
variants, EM-Fixed and EM-Adapt.

EM-Fixed. In this variant, we assume that $\log P(z \mid y)$ factorizes over pixel positions as

$$\log P(z \mid y) = \sum_{m=1}^{M} \phi(y_m, z) + (\text{const}), \qquad (8)$$

allowing us to estimate the E-step segmentation at each pixel separately

$$\hat{y}_m = \arg\max_{y_m} \hat{f}_m(y_m) \doteq f_m(y_m \mid x; \theta') + \phi(y_m, z). \qquad (9)$$

We assume that

$$\phi(y_m = l, z) = \begin{cases} b_l & \text{if } z_l = 1 \\ 0 & \text{if } z_l = 0 \end{cases} \qquad (10)$$

We set the parameters $b_l = b_{fg}$, if $l > 0$ and $b_0 = b_{bg}$, with $b_{fg} > b_{bg} > 0$. Intuitively, this potential encourages a pixel to be assigned to one of the image-level labels $z$. We choose $b_{fg} > b_{bg}$, boosting present foreground classes more than the background, to encourage full object coverage and avoid a degenerate solution of all pixels being assigned to background. The procedure is summarized in Algorithm 1 and illustrated in Fig. 2.

Figure 2. DeepLab model training using image-level labels.

EM-Adapt. In this method, we assume that $\log P(z \mid y) = \phi(y, z) + (\text{const})$, where $\phi(y, z)$ takes the form of a cardinality potential [23, 32, 35]. In particular, we encourage at least a $\rho_l$ portion of the image area to be assigned to class $l$, if $z_l = 1$, and enforce that no pixel is assigned to class $l$, if $z_l = 0$. We set the parameters $\rho_l = \rho_{fg}$, if $l > 0$ and $\rho_0 = \rho_{bg}$. Similar constraints appear in [10, 20].

In practice, we employ a variant of Algorithm 1. We adaptively set the image- and class-dependent biases $b_l$ so that the prescribed proportion of the image area is assigned to the background or foreground object classes. This acts as a powerful constraint that explicitly prevents the background score from prevailing in the whole image, also promoting higher foreground object coverage. The detailed algorithm is described in the supplementary material.

EM vs. MIL. It is instructive to compare our EM-based approach with two recent Multiple Instance Learning (MIL) methods for learning semantic image segmentation models [30, 31]. The method in [30] defines an MIL classification objective based on the per-class spatial maximum of the local label distributions of (2), $\hat{P}(l \mid x; \theta) \doteq \max_m P(y_m = l \mid x; \theta)$, and [31] adopts a softmax function. While this approach has worked well for image classification tasks [28, 29], it is less suited for segmentation as it does not promote full object coverage: the DCNN becomes tuned to focus on the most distinctive object parts (e.g., human face) instead of capturing the whole object (e.g., human body).

Figure 3. DeepLab model training from bounding boxes. (Diagram: image and bounding box annotations, Deep Convolutional Neural Network, Dense CRF, argmax, loss.)

3.3. Bounding Box Annotations

We explore three alternative methods for training our segmentation model from labeled bounding boxes.

The first Bbox-Rect method amounts to simply considering each pixel within the bounding box as positive example for the respective object class. Ambiguities are resolved by assigning pixels that belong to multiple bounding boxes to the one that has the smallest area.

The bounding boxes fully surround objects but also contain background pixels that contaminate the training set with false positive examples for the respective object classes. To filter out these background pixels, we have also explored a second Bbox-Seg method in which we perform automatic foreground/background segmentation. To perform this segmentation, we use the same CRF as in DeepLab. More specifically, we constrain the center area of the bounding box ($\alpha$% of pixels within the box) to be foreground, while we constrain pixels outside the bounding box to be background. We implement this by appropriately setting the unary terms of the CRF. We then infer the labels for pixels in between. We cross-validate the CRF parameters to maximize segmentation accuracy in a small held-out set of fully-annotated images. This approach is similar to the grabcut method of [33]. Examples of estimated segmentations with the two methods are shown in Fig. 4.

The two methods above, illustrated in Fig. 3, estimate segmentation maps from the bounding box annotation as a pre-processing step, then employ the training procedure of Sec. 3.1, treating these estimated labels
as ground-truth.

Our third Bbox-EM-Fixed method is an EM algorithm that allows us to refine the estimated segmentation maps throughout training. The method is a variant of the EM-Fixed algorithm in Sec. 3.2, in which we boost the present foreground object scores only within the bounding box area.

3.4. Mixed strong and weak annotations

In practice, we often have access to a large number of weakly image-level annotated images and can only afford to procure detailed pixel-level annotations for a small fraction of these images. We handle this hybrid training scenario by combining the methods presented in the previous sections, as illustrated in Figure 5. In SGD training of our deep CNN models, we bundle to each mini-batch a fixed proportion of strongly/weakly annotated images, and employ our EM algorithm in estimating at each iteration the latent semantic segmentations for the weakly annotated images.

Figure 4. Estimated segmentation from bounding box annotation. (Panels: Image with Bbox, Ground-Truth, Bbox-Rect, Bbox-Seg.)

Figure 5. DeepLab model training on a union of full (strong labels) and image-level (weak labels) annotations.

4. Experimental Evaluation

4.1. Experimental Protocol

Datasets. The proposed training methods are evaluated on the PASCAL VOC 2012 segmentation benchmark [13], consisting of 20 foreground object classes and one background class. The segmentation part of the original PASCAL VOC 2012 dataset contains 1464 (train), 1449 (val), and 1456 (test) images for training, validation, and test, respectively. We also use the extra annotations provided by [16], resulting in augmented sets of 10,582 (train aug) and 12,031 (trainval aug) images. We have also experimented with the large MS-COCO 2014 dataset [24], which contains 123,287 images in its trainval set. The MS-COCO 2014 dataset has 80 foreground object classes and one background class and is also annotated at the pixel level.

The performance is measured
in terms of pixel intersection-over-union (IOU) averaged across the 21 classes. We first evaluate our proposed methods on the PASCAL VOC 2012 val set. We then report our results on the official PASCAL VOC 2012 benchmark test set (whose annotations are not released). We also compare our test set results with other competing methods.

Reproducibility We have implemented the proposed methods by extending the excellent Caffe framework [18]. We share our source code, configuration files, and trained models that allow reproducing the results in this paper at a companion web site https:/// deeplab/deeplab-public.

Weak annotations In order to simulate the situations where only weak annotations are available and to have fair comparisons (e.g., use the same images for all settings), we generate the weak annotations from the pixel-level annotations. The image-level labels are easily generated by summarizing the pixel-level annotations, while the bounding box annotations are produced by drawing rectangles tightly containing each object instance (PASCAL VOC 2012 also provides instance-level annotations) in the dataset.

Network architectures We have experimented with the two DCNN architectures of [5], with parameters initialized from the VGG-16 ImageNet [11] pretrained model of [34].
They differ in the receptive field of view (FOV) size. We have found that large FOV (224×224) performs best when at least some training images are annotated at the pixel level, whereas small FOV (128×128) performs better when only image-level annotations are available. In the main paper we report the results of the best architecture for each setup and defer the full comparison between the two FOVs to the supplementary material.

Training We employ our proposed training methods to learn the DCNN component of the DeepLab-CRF model of [5]. For SGD, we use a mini-batch of 20-30 images and initial learning rate of 0.001 (0.01 for the final classifier layer), multiplying the learning rate by 0.1 after a fixed number of iterations. We use momentum of 0.9 and a weight decay of 0.0005. Fine-tuning our network on PASCAL VOC 2012 takes about 12 hours on a NVIDIA Tesla K40 GPU.

Similarly to [5], we decouple the DCNN and Dense CRF training stages and learn the CRF parameters by cross validation to maximize IOU segmentation accuracy in a held-out set of 100 Pascal val fully-annotated images. We use 10 mean-field iterations for Dense CRF inference [19]. Note that the IOU scores are typically 3-5% worse if we do not use the CRF for post-processing of the results.

4.2. Pixel-level annotations

We have first reproduced the results of [5]. Training the DeepLab-CRF model with strong pixel-level annotations on PASCAL VOC 2012, we achieve a mean IOU score of 67.6% on val and 70.3% on test; see method DeepLab-CRF-LargeFOV in [5, Table 1].

Method             #Strong   #Weak    val IOU
EM-Fixed (Weak)    -         10,582   20.8
EM-Adapt (Weak)    -         10,582   38.2
EM-Fixed (Semi)    200       10,382   47.6
                   500       10,082   56.9
                   750       9,832    59.8
                   1,000     9,582    62.0
                   1,464     5,000    63.2
                   1,464     9,118    64.6
Strong             1,464     -        62.5
                   10,582    -        67.6
Table 1. VOC 2012 val performance for varying number of pixel-level (strong) and image-level (weak) annotations (Sec. 4.3).

Method             #Strong   #Weak    test IOU
MIL-FCN [30]       -         10k      25.7
MIL-sppxl [31]     -         760k     35.8
MIL-obj [31]       BING      760k     37.0
MIL-seg [31]       MCG       760k     40.6
EM-Adapt (Weak)    -         12k      39.6
EM-Fixed (Semi)    1.4k      10k      66.2
                   2.9k      9k       68.5
Strong [5]         12k       -        70.3
Table 2. VOC 2012 test performance for varying number of pixel-level (strong) and image-level (weak) annotations (Sec. 4.3).

4.3. Image-level annotations

Validation results We evaluate our proposed methods in training the DeepLab-CRF model using image-level weak annotations from the 10,582 PASCAL VOC 2012 train aug set, generated as described in Sec. 4.1 above. We report the val performance of our two weakly-supervised EM variants described in Sec. 3.2. In the EM-Fixed variant we use b_fg = 5 and b_bg = 3 as fixed foreground and background biases. We found the results to be quite sensitive to the difference b_fg − b_bg but not very sensitive to their absolute values. In the adaptive EM-Adapt variant we constrain at least ρ_bg = 40% of the image area to be assigned to background and at least ρ_fg = 20% of the image area to be assigned to foreground (as specified by the weak label set).

We also examine using weak image-level annotations in addition to a varying number of pixel-level annotations, within the semi-supervised learning scheme of Sec. 3.4.
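The role of the fixed biases in EM-Fixed can be illustrated with a minimal sketch of a per-pixel label estimate. It rests on assumptions that go beyond the text above: the function name `em_fixed_estep` and its argument layout are hypothetical, background is taken to be class 0 and always treated as present, classes absent from the image-level label set receive no bias, and the estimate is a plain per-pixel argmax. It is not the authors' Caffe implementation.

```python
def em_fixed_estep(scores, present, b_fg=5.0, b_bg=3.0, background=0):
    """Estimate one label per pixel from biased class scores.

    scores  : list of per-pixel score lists, scores[m][l] for pixel m, class l
    present : set of foreground class ids named by the image-level label
    b_fg    : bias added to scores of present foreground classes (assumed)
    b_bg    : bias added to the background score (assumed always present)
    """
    labels = []
    for pixel_scores in scores:
        best_label, best_value = None, float("-inf")
        for label, score in enumerate(pixel_scores):
            if label == background:
                score += b_bg
            elif label in present:
                score += b_fg
            # classes absent from the image-level label keep their raw score
            if score > best_value:
                best_label, best_value = label, score
        labels.append(best_label)
    return labels


# Two pixels, three classes (0 = background); the image is labelled with class 1.
estimated = em_fixed_estep([[1.0, 0.0, 2.0], [4.0, 0.0, 0.0]], present={1})
```

Only the difference b_fg − b_bg changes which of a present foreground class and the background wins at a pixel, which matches the sensitivity reported above.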
In this Semi setting we employ strong annotations of a subset of PASCAL VOC 2012 train set and use the weak image-level labels from another non-overlapping subset of the train aug set. We perform segmentation inference for the images that only have image-level labels by means of EM-Fixed, which we have found to perform better than EM-Adapt in the semi-supervised training setting.

The results are summarized in Table 1. We see that the EM-Adapt algorithm works much better than the EM-Fixed algorithm when we only have access to image-level annotations, 38.2% vs. 20.8% validation IOU. Using 1,464 pixel-level and 9,118 image-level annotations in the EM-Fixed semi-supervised setting significantly improves performance, yielding 64.6%. Note that image-level annotations are helpful, as training with only the 1,464 pixel-level annotations yields 62.5%.

Test results In Table 2 we report our test results. We compare the proposed methods with the recent MIL-based approaches of [30, 31], which also report results obtained with image-level annotations on the VOC benchmark. Our EM-Adapt method yields 39.6%, which improves over MIL-FCN [30] by a large 13.9% margin. As [31] shows, MIL can become more competitive if additional segmentation information is introduced: using low-level superpixels, MIL-sppxl [31] yields 35.8% and is still inferior to our EM algorithm. Only if augmented with BING [7] or MCG [1] can MIL obtain results comparable to ours (MIL-obj: 37.0%, MIL-seg: 40.6%) [31]. Note, however, that both BING and MCG have been trained with bounding box or pixel-annotated data on the PASCAL train set, and thus both MIL-obj and MIL-seg indirectly rely on bounding box or pixel-level PASCAL annotations.

The more interesting finding of this experiment is that including very few strongly annotated images in the semi-supervised setting significantly improves the performance compared to the pure weakly-supervised baseline. For example, using 2.9k pixel-level annotations along with 9k image-level annotations in the semi-supervised setting
yields 68.5%. We would like to highlight that this result surpasses all techniques which are not based on the DCNN+CRF pipeline of [5] (see Table 6), even if trained with all available pixel-level annotations.

4.4. Bounding box annotations

Validation results In this experiment, we train the DeepLab-CRF model using bounding box annotations from the train aug set. We estimate the training set segmentations in a pre-processing step using the Bbox-Rect and Bbox-Seg methods described in Sec. 3.3. We assume that we also have access to 100 fully-annotated PASCAL VOC 2012 val images which we have used to cross-validate the value of the single Bbox-Seg parameter α (percentage of the center bounding box area constrained to be foreground). We varied α from 20% to 80%, finding that α = 20% maximizes accuracy in terms of IOU in recovering the ground truth foreground from the bounding box. We also examine the effect of combining these weak bounding box annotations with strong pixel-level annotations, using the semi-supervised learning methods of Sec. 3.4.

The results are summarized in Table 3. When using only bounding box annotations, we see that Bbox-Seg improves over Bbox-Rect by 8.1%, and gets within 7.0% of the strong pixel-level annotation result. We observe that combining 1,464 strong pixel-level annotations with weak bounding box annotations yields 65.1%, only 2.5% worse than the strong pixel-level annotation result. In the semi-supervised learning settings and 1,464 strong annotations, Semi-Bbox-EM-Fixed and Semi-Bbox-Seg perform similarly.

Method                  #Strong   #Box     val IOU
Bbox-Rect (Weak)        -         10,582   52.5
Bbox-EM-Fixed (Weak)    -         10,582   54.1
Bbox-Seg (Weak)         -         10,582   60.6
Bbox-Rect (Semi)        1,464     9,118    62.1
Bbox-EM-Fixed (Semi)    1,464     9,118    64.8
Bbox-Seg (Semi)         1,464     9,118    65.1
Strong                  1,464     -        62.5
                        10,582    -        67.6
Table 3. VOC 2012 val performance for varying number of pixel-level (strong) and bounding box (weak) annotations (Sec. 4.4).

Method                  #Strong        #Box   test IOU
BoxSup [9]              MCG            10k    64.6
BoxSup [9]              1.4k (+MCG)    9k     66.2
Bbox-Rect (Weak)        -              12k    54.2
Bbox-Seg (Weak)         -              12k    62.2
Bbox-Seg (Semi)         1.4k           10k    66.6
Bbox-EM-Fixed (Semi)    1.4k           10k    66.6
Bbox-Seg (Semi)         2.9k           9k     68.0
Bbox-EM-Fixed (Semi)    2.9k           9k     69.0
Strong [5]              12k            -      70.3
Table 4. VOC 2012 test performance for varying number of pixel-level (strong) and bounding box (weak) annotations (Sec. 4.4).

Test results In Table 4 we report our test results. We compare the proposed methods with the very recent BoxSup approach of [9], which also uses bounding box annotations on the VOC 2012 segmentation benchmark. Comparing our alternative Bbox-Rect (54.2%) and Bbox-Seg (62.2%) methods, we see that simple foreground-background segmentation provides much better segmentation masks for DCNN training than using the raw bounding boxes. BoxSup does 2.4% better; however, it employs the MCG segmentation proposal mechanism [1], which has been trained with pixel-annotated data on the PASCAL train set; it thus indirectly relies on pixel-level annotations.

When we also have access to pixel-level annotated images, our performance improves to 66.6% (1.4k strong annotations) or 69.0% (2.9k strong annotations). In this semi-supervised setting we outperform BoxSup (66.6% vs. 66.2% with 1.4k strong annotations), although we do not use MCG. Interestingly, Bbox-EM-Fixed improves over Bbox-Seg as we add more strong annotations, and it performs 1.0% better (69.0% vs. 68.0%) with 2.9k strong annotations. This shows that the E-step of our EM algorithm can estimate the object masks better than the foreground-background segmentation pre-processing step when enough pixel-level annotated images are available.

Comparing with Sec. 4.3, note that 2.9k strong + 9k image-level annotations yield 68.5% (Table 2), while 2.9k strong + 9k bounding box annotations yield 69.0% (Table 4). This finding suggests that bounding box annotations add little value over image-level annotations when a sufficient number of pixel-level annotations is also available.

Method                    #Strong COCO   #Weak COCO   val IOU
PASCAL-only               -              -            67.6
EM-Fixed (Semi)           -              123,287      67.7
Cross-Joint (Semi)        5,000          118,287      70.0
Cross-Joint (Strong)      5,000          -            68.7
Cross-Pretrain (Strong)   123,287        -            71.0
Cross-Joint (Strong)      123,287        -            71.7
Table 5. VOC 2012 val performance using strong annotations for all 10,582 train aug PASCAL images and a varying number of strong and weak MS-COCO annotations (Sec. 4.5).

Method                                         test IOU
MSRA-CFM [8]                                   61.8
FCN-8s [25]                                    62.2
Hypercolumn [17]                               62.6
TTI-Zoomout-16 [27]                            64.4
DeepLab-CRF-LargeFOV [5]                       70.3
BoxSup (Semi, with weak COCO) [9]              71.0
DeepLab-CRF-LargeFOV (Multi-scale net) [5]     71.6
Oxford TVG CRF RNN VOC [41]                    72.0
Oxford TVG CRF RNN COCO [41]                   74.7
Cross-Pretrain (Strong)                        72.7
Cross-Joint (Strong)                           73.0
Cross-Pretrain (Strong, Multi-scale net)       73.6
Cross-Joint (Strong, Multi-scale net)          73.9
Table 6. VOC 2012 test performance using PASCAL and MS-COCO annotations (Sec. 4.5).

4.5. Exploiting Annotations Across Datasets

Validation results We present experiments leveraging the 81-label MS-COCO dataset as an additional source of data in learning the DeepLab model for the 21-label PASCAL VOC 2012 segmentation task. We consider three scenarios:
• Cross-Pretrain (Strong): Pre-train DeepLab on MS-COCO, then replace the top-level network weights and fine-tune on Pascal VOC 2012, using pixel-level annotation in both datasets.
• Cross-Joint (Strong): Jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets.
• Cross-Joint (Semi): Jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using the pixel-level labels from PASCAL and varying the number of pixel- and image-level labels from MS-COCO.
In all cases we use strong pixel-level annotations for all 10,582 train aug PASCAL images.

We report our results on the PASCAL VOC 2012 val in Table 5, also including for comparison our best PASCAL-only 67.6% result exploiting all 10,582 strong annotations as a baseline. When we employ the weak MS-COCO annotations (EM-Fixed (Semi)) we obtain 67.7% IOU, which does
not improve over the PASCAL-only baseline. However, using strong labels from 5,000 MS-COCO images (4.0% of the MS-COCO dataset) and weak labels from the remaining MS-COCO images in the Cross-Joint (Semi) semi-supervised scenario yields 70.0%, a significant 2.4% boost over the baseline. This Cross-Joint (Semi) result is also 1.3% better than the 68.7% performance obtained using only the 5,000 strong and no weak annotations from MS-COCO. As expected, our best results are obtained by using all 123,287 strong MS-COCO annotations, 71.0% for Cross-Pretrain (Strong) and 71.7% for Cross-Joint (Strong). We observe that cross-dataset augmentation improves by 4.1% over the best PASCAL-only result. Using only a small portion of pixel-level annotations and a large portion of image-level annotations in the semi-supervised setting reaps about half of this benefit.

Test results We report our PASCAL VOC 2012 test results in Table 6. We include results of other leading models from the PASCAL leaderboard. All our models have been trained with pixel-level annotated images on the PASCAL trainval aug and the MS-COCO 2014 trainval datasets.

Methods based on the DCNN+CRF pipeline of DeepLab-CRF [5] are the most competitive, with performance surpassing 70%, even when only trained on PASCAL data. Leveraging the MS-COCO annotations brings about 2% improvement. Our top model yields 73.9%, using the multi-scale network architecture of [5]. Also see [41], which also uses joint PASCAL and MS-COCO training, and further improves performance (74.7%) by end-to-end learning of the DCNN and CRF parameters.

4.6. Qualitative Segmentation Results

In Fig. 6 we provide visual comparisons of the results obtained by the DeepLab-CRF model learned with some of the proposed training methods.

5. Conclusions

The paper has explored the use of weak or partial annotation in training a state-of-the-art semantic image segmentation model. Extensive experiments on the challenging PASCAL VOC 2012 dataset have shown that: (1) Using weak annotation solely at the image-level seems insufficient to
train a high-quality segmentation model. (2) Using weak bounding-box annotation in conjunction with careful segmentation inference for images in the training set suffices to train a competitive model. (3) Excellent performance is obtained when combining a small number of pixel-level annotated images with a large number of weakly annotated images in a semi-supervised setting, nearly matching the results achieved when all training images have pixel-level annotations. (4) Exploiting extra weak or strong annotations from other datasets can lead to large improvements.

Acknowledgments

This work was partly supported by ARO 62250-CS, and NIH 5R01EY022247-03. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.
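2.1 Image retrieval based on color histograms
2.2 Image retrieval based on SIFT
2.3 Image retrieval based on the BoW model
2.4 Strengths and weaknesses of deep-learning-based image retrieval
3.1 Overview of deep learning algorithms
3.2 Model training
3.2.1 Data preparation
3.2.2 Basic structure of convolutional neural networks
3.2.3 Training procedure
3.3 Feature extraction and similarity matching
3.3.1 Feature extraction
3.3.2 Similarity matching
3.3.3 Similarity measures
3.4 Experimental results and performance analysis
3.4.1 Experimental datasets
3.4.2 Experimental results
3.4.3 Performance analysis
Chapter 4: The proposed deep-learning-based image retrieval algorithm
4.1 Algorithm framework
4.2 Data preparation
4.3 Model training
4.4 Feature extraction and similarity matching
4.5 Experimental results and performance analysis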
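"Algorithm Research on Sparse-Angle CT Reconstruction", Part 1. I. Introduction. Computed tomography (CT) is one of the important tools of modern medical imaging diagnosis.
This problem mainly arises from the following aspects: 1. Incomplete scanning angles: because projection data are missing at some angles, detail and structural information in the reconstructed image is lost.
2. Noise and artifacts: because the data are sparse, noise and artifacts in the reconstructed image are more pronounced.
3. Computational complexity: processing sparse-angle CT data requires more computing resources and higher algorithmic complexity.
The following focuses on several typical algorithms and their principles: 1. Iterative reconstruction. Iterative reconstruction is a commonly used CT reconstruction algorithm that improves the quality of the reconstructed image through repeated iterations.
2. Compressed sensing. Compressed sensing is a reconstruction approach based on signal sparsity; applying it to sparse-angle CT data can improve the quality of the reconstructed image.
3. Deep learning. Deep learning algorithms have also been widely applied to sparse-angle CT reconstruction.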
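HSfM: Hybrid Structure-from-Motion (study notes)
Abstract. To estimate the initial camera poses, SfM methods can be categorized as incremental or global.
Introduction. SfM refers to estimating the 3D scene structure and the camera poses from a series of images.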
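1. Data acquisition and processing. Research on automatic tumor cell recognition algorithms requires a large number of tumor cell images, which can be obtained through tumor pathology methods.
2. Feature extraction and selection. In tumor cell images, every cell has its own features.
3. Cell classification and edge recognition. Through feature extraction and selection, cells can be classified and recognized, and their edges identified.
4. Algorithm optimization and implementation. After the above steps, the algorithm still needs to be optimized to improve accuracy and efficiency, and engineered for practical application.
1. Clinical diagnosis. In actual clinical diagnosis, doctors need to observe and analyze the tumor pathology carefully in light of the patient's physical condition.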
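"Algorithm Research on Sparse-Angle CT Reconstruction", Part 1. I. Introduction. In recent years, with the development of computer technology and improvements in the precision of medical equipment, sparse-angle CT (computed tomography) reconstruction has played an increasingly important role in medical diagnosis and radiological research.
III. Principles and characteristics of sparse-angle CT reconstruction algorithms. 1. Analytic methods: analytic methods perform back-projection using the Fourier transform of the projection data; they are fast, but the resolution and signal-to-noise ratio of the reconstructed image are relatively low.
2. Iterative methods: iterative methods approach the true image step by step through repeated iterative optimization.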
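As one concrete instance of the iterative family described in item 2, the sketch below runs Kaczmarz sweeps (the row-action update behind the classic algebraic reconstruction technique) on a toy linear system A x = b, where each row of A stands for one projection ray. The choice of Kaczmarz, the function name `kaczmarz`, and the 2×2 system are illustrative assumptions; the text above does not name a specific algorithm.

```python
def kaczmarz(A, b, sweeps=50):
    """Approximate a solution of A x = b by cyclic row projections.

    A      : list of rows (each a list of floats), one row per measured ray
    b      : list of measured projection values, one per row
    sweeps : number of full passes over all rows
    """
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, b_i in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            residual = b_i - sum(a * x_k for a, x_k in zip(row, x))
            step = residual / norm_sq
            # project the current estimate onto the hyperplane row . x = b_i
            x = [x_k + step * a for a, x_k in zip(row, x)]
    return x


# Toy system: x0 + x1 = 3 and x0 - x1 = 1, whose solution is x = (2, 1).
x = kaczmarz([[1.0, 1.0], [1.0, -1.0]], [3.0, 1.0])
```

Each update moves the estimate onto the set of images consistent with one measurement, so repeated sweeps approach an image consistent with all of them when the data are consistent.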
Feature Extraction from Reconstructed Images of Wood Defects Based on the C-V Model 1)
刘嘉新; 吴彤; 王克奇
[Abstract] Using willow samples containing cracks and linden samples containing cavities, we experimentally studied feature extraction from wood images reconstructed by the cell-inversion method.
We propose a feature extraction algorithm for reconstructed wood images based on the cell-inversion method. First, the cell-inversion method is used to reconstruct the wood tomographic image, and opening-based smoothing is applied to the result. Second, the C-V model is used to obtain the wood defect features from the reconstructed image. Finally, the measured and real values of the defect area are compared, which provides a quality evaluation of the cell-inversion algorithm. Willow samples with cracks and linden samples with cavities were used in the experiments. The C-V model can segment the defect image and extract features from the reconstructed image. This research provides a quality evaluation method for wood defect image reconstruction and supports the future design of wood defect monitors.
[Journal] Journal of Northeast Forestry University (东北林业大学学报)
[Year (volume), issue] 2015(000)012
[Pages] 4 (P78-81)
[Keywords] wood defects; defect features; C-V model
[Authors] 刘嘉新; 吴彤; 王克奇
[Affiliation] Northeast Forestry University, Harbin 150040 (all three authors)
[Language of text] Chinese
[CLC number] S781.5
Stress-wave image reconstruction rebuilds the internal cross-section of wood without damaging the wood itself, thereby obtaining information about its internal condition. Nondestructive testing techniques for wood are diverse, and stress-wave testing is undoubtedly among those most worthy of in-depth study [1].
dmitrii.a.marin@ yzhong.cs@ mdrangova@robarts.ca yuri@csd.uwo.ca
the curvature of the object boundary. Moreover, we do not assume that the boundary of a thin structure (e.g. vessel or road) is given. Detection variables are estimated simultaneously with the center-line. This paper proposes a general energy formulation and an optimization algorithm for detection and subpixel delineation of thin structures based on curvature regularization. Curvature is a natural regularizer for thin structures and it has been widely explored in the past. In the context of image segmentation with second-order smoothness it was studied by [31, 37, 32, 5, 14, 28, 25]. It is also a popular second-order prior in stereo or multi-view-reconstruction [20, 27, 40]. Curvature has been used inside connectivity measures for analysis of diffusion MRI [24]. Curvature is also widely used for inpainting [3, 7] and edge completion [13, 39, 2]. For example, stochastic completion field technique in [39, 24] estimates probability that a completed/extrapolated curve passes any given point assuming it is a random walk with bias to straight paths. Note that common edge completion methods use existing edge detectors as an input for the algorithm. In contrast to these prior works, this paper proposes a
Many applications in vision require estimation of thin structures such as boundary edges, surfaces, roads, blood vessels, neurons, etc. Unlike most previous approaches, we simultaneously detect and delineate thin structures with sub-pixel localization and real-valued orientation estimation. This is an ill-posed problem that requires regularization. We propose an objective function combining detection likelihoods with a prior minimizing curvature of the centerlines or surfaces. Unlike simple block-coordinate descent, we develop a novel algorithm that is able to perform joint optimization of location and detection variables more effectively. Our lower bound optimization algorithm applies to quadratic or absolute curvature. The proposed early vision framework is sufficiently general and it can be used in many higher-level applications. We illustrate the advantage of our approach on a range of 2D and 3D examples.
In proceedings of “International Conference on Computer Vision” (ICCV), Santiago, Chile, Dec. 2015
Thin Structure Estimation with Curvature Regularization
1. Introduction
A large amount of work in computer vision is devoted to estimation of structures like edges, center-lines, or surfaces for fitting thin objects such as intensity boundaries, blood vessels, neural axons, roads, or point clouds. This paper is focused on the general concept of a center-line, which could be defined in different ways. For example, the Canny approach to edge detection implicitly defines a center-line as a "ridge" of intensity gradients [6]. Standard methods for shape skeletons define the medial axis as singularities of a distance map from a given object boundary [35, 34]. In the context of thin objects like edges, vessels, etc., we consider a center-line to be a smooth curve minimizing orthogonal projection errors for the points of the thin structure. We study curvature of the center-line as a regularization criterion for its inference. In general, curvature is actively discussed in the context of thin structures. For example, it is well known that curvature of the object boundary has a significant effect on the medial axis [17, 35]. In contrast, we are directly concerned with curvature of the center-line, not
general low-level regularization framework for detecting thin structures with accurate estimation of location and orientation. In contrast to [39, 13, 24] we explicitly minimize the integral of curvature along the estimated thin structure. Unlike [12] we do not use curvature for grouping pre-detected thin structures; we use curvature as a regularizer during the detection stage.

Related work: Our regularization framework is based on the curvature estimation formula proposed by Olsson et al. [26, 27] in the context of surface fitting to point clouds for multi-view reconstruction, see Fig. 2(a). One assumption in [26, 27] is that the data points are noisy readings of the surface. While the method allows outliers, their formulation is focused on estimation of local surface patches. Our work can be seen as a generalization to detection problems where the majority of the data points, e.g. image pixels in Fig. 2(c), are not within a thin structure. In addition to local tangents, our method estimates the probability that the point is a part of the thin structure. Section 2 discusses in detail this and other significant differences from the formulation in [26, 27].

Assuming $p_i$ and $p_j$ are neighboring points on a thin structure, e.g. a curve, Olsson et al. [26] evaluate local curvature as follows. Let $l_i$ and $l_j$ be the tangents to the curve at points $p_i$ and $p_j$. Then the authors propose the following approximation for the absolute curvature

$$|\kappa(l_i, l_j)| = \frac{\|l_i - p_j\| + \|l_j - p_i\|}{\|p_i - p_j\|}$$
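The approximation above can be evaluated directly in 2D. The sketch below is illustrative only and makes two assumptions that this excerpt does not state: each tangent line is represented by its point and a direction vector, and the norm of a line minus a point is read as the Euclidean distance from that point to the line. The function names are hypothetical.

```python
import math


def point_line_distance(p, q, t):
    """Distance from point p to the line through q with direction t (2D)."""
    norm = math.hypot(t[0], t[1])
    if norm == 0.0:
        raise ValueError("tangent direction must be non-zero")
    # magnitude of the 2D cross product (p - q) x t, divided by |t|
    return abs((p[0] - q[0]) * t[1] - (p[1] - q[1]) * t[0]) / norm


def abs_curvature(p_i, t_i, p_j, t_j):
    """Evaluate (||l_i - p_j|| + ||l_j - p_i||) / ||p_i - p_j||.

    l_i is the line through p_i with direction t_i, and likewise for l_j.
    """
    gap = math.hypot(p_i[0] - p_j[0], p_i[1] - p_j[1])
    if gap == 0.0:
        raise ValueError("points must be distinct")
    d_ij = point_line_distance(p_j, p_i, t_i)  # ||l_i - p_j||
    d_ji = point_line_distance(p_i, p_j, t_j)  # ||l_j - p_i||
    return (d_ij + d_ji) / gap


# Both tangents lie on the same straight line: no bending.
straight = abs_curvature((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0))
# The tangent at p_j is perpendicular to the tangent at p_i.
bent = abs_curvature((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

The value is zero exactly when each point lies on the other point's tangent line, and it grows as the two tangents disagree.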