SPARSE IMAGE CODING USING LEARNED OVERCOMPLETE DICTIONARIES


Artificial Intelligence Deep Learning Technology Exercises (Problem Set 19)


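Part 1: Single-choice questions, 47 in total. Each question has exactly one correct answer; selecting more or fewer options scores no points.

1. [Single choice] The first sample in the MNIST training set is ().
A) 4 B) 2 C) 0 D) 5
Answer: D
Difficulty: easy

2. [Single choice] Which two organisms were separated with a classifier in the lesson? ()
A) Caterpillars and fireflies B) Bedbugs and ladybugs C) Caterpillars and ladybugs D) Fireflies and ladybugs
Answer: C
Difficulty: easy

3. [Single choice] Which operation implements a convolution with a bias?
A) conv_ret1 = conv2d(x_image, W_conv1) + b_conv1
B) conv_ret1 = conv2d(x_image, W_conv1 + b_conv1)
C) conv_ret1 = conv2d(x_image, W_conv1, b_conv1)
D) conv_ret1 = conv2d(x_image, W_conv1)
Answer: A

4. [Single choice] Which of the following statements about the Embedding layer in recurrent neural networks is wrong? ()
A) The Embedding layer is usually used as the first layer of a neural network.
B) The Embedding layer converts positive integers (indices) into vectors of fixed size.
C) Each element of the dense vector produced by the Embedding layer can only be 0 or 1.
D) If the input of the Embedding layer has size (batch size, input length), the output has size (batch size, input length, output dim), where output dim is the dimension of the dense vector.
Answer: C

5. [Single choice] Which activation function controls the gates in an LSTM?
A) relu B) sigmoid C) tanh D) softmax
Answer: B

6. [Single choice] In the handwritten digit recognition model, the number of hidden-layer nodes is ().
A) 500 B) 784 C) 576 D) 28
Answer: A

7. [Single choice] tf.GradientTape is used to record the () process.
A) Forward propagation B) Backpropagation C) Parameter update D) Cost processing
Answer: A

8. [Single choice] Which of the following is not a parameter of the compile function?
A) Number of iterations B) Optimization algorithm C) Evaluation metrics D) Loss
Answer: A

9. [Single choice] What kind of problem is sentiment classification?
A) Many inputs, many outputs B) One input, many outputs C) One input, one output D) Many inputs, one output
Answer: D

10. [Single choice] The principle of mini-batch is:
A) Select part of the data and run gradient descent on it
B) The same as batch gradient descent, only with the algorithm optimized
C) Process the data one small batch at a time, iterating until all the data has been processed
D) Randomly select some data, compute the gradient and descend, lowering the learning rate slightly each time
Answer: C

11. [Single choice] What is the abbreviation for deep neural network?
A) CNN B) RNN C) SNN D) DNN
Answer: D

12. [Single choice] Which activation function is shown in the figure above?
A) Sigmoid B) Leaky ReLU C) tanh D) Relu
Answer: C
Explanation: tanh

13. [Single choice] How Dropout works:
A) It randomly disables some nodes and fits using only the remaining nodes, which prevents overfitting
B) Dropout adds new samples to prevent overfitting
C) Dropout performs normalization to prevent overfitting
D) Dropout adds a penalty term to the loss function to prevent overfitting
Answer: A

14. [Single choice] If the value of x is True, what is the value of tf.cast(x, tf.float32)?
A) 0.0 B) 1.0 C) False D) True
Answer: B

15. [Single choice] Which of the following is not a Python reserved word? ()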

Deep Learning Reading List Recommended by Andrew Ng


UFLDL Recommended Readings

If you're learning about UFLDL (Unsupervised Feature Learning and Deep Learning), here is a list of papers to consider reading. We're assuming you're already familiar with basic machine learning at the level of [CS229 (lecture notes available)].

The basics:
▪ [CS294A] Neural Networks/Sparse Autoencoder Tutorial. (Most of this is now in the UFLDL Tutorial, but the exercise is still on the CS294A website.)
▪ [1] Natural Image Statistics book, Hyvarinen et al.
  ▪ This is long, so just skim or skip the chapters that you already know.
  ▪ Important chapters: 5 (PCA and whitening; you'll probably already know the PCA stuff), 6 (sparse coding), 7 (ICA), 10 (ISA), 11 (TICA), 16 (temporal models).
▪ [2] Olshausen and Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 1996. (Sparse Coding)
▪ [3] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer and Andrew Y. Ng. Self-taught learning: Transfer learning from unlabeled data. ICML 2007.

Autoencoders:
▪ [4] Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 2006.
  ▪ If you want to play with the code, you can also find it at [5].
▪ [6] Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. Greedy Layer-Wise Training of Deep Networks. NIPS 2006.
▪ [7] Pascal Vincent, Hugo Larochelle, Yoshua Bengio and Pierre-Antoine Manzagol. Extracting and Composing Robust Features with Denoising Autoencoders. ICML 2008.
  ▪ (They have a nice model, but then backwards rationalize it into a probabilistic model. Ignore the backwards rationalized probabilistic model [Section 4].)

Analyzing deep learning/why does deep learning work:
▪ [8] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. ICML 2007.
  ▪ (Someone read this and let us know if this is worth keeping. [Most model related material already covered by other papers, it seems not many impactful conclusions can be made from results, but can serve as reading for reinforcement for deep models])
▪ [9] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why Does Unsupervised Pre-training Help Deep Learning? JMLR 2010.
▪ [10] Ian J. Goodfellow, Quoc V. Le, Andrew M. Saxe, Honglak Lee and Andrew Y. Ng. Measuring invariances in deep networks. NIPS 2009.

RBMs:
▪ [11] Tutorial on RBMs.
  ▪ But ignore the Theano code examples.
  ▪ (Someone tell us if this should be moved later. Useful for understanding some of DL literature, but not needed for many of the later papers? [Seems ok to leave in, useful introduction if reader had no idea about RBM's, and have to deal with Hinton's 06 Science paper or 3-way RBM's right away])

Convolution Networks:
▪ [12] Tutorial on Convolution Neural Networks.
  ▪ But ignore the Theano code examples.

Applications:
▪ Computer Vision
  ▪ [13] Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. CVPR 2009.
  ▪ [14] A. Torralba, R. Fergus and Y. Weiss. Small codes and large image databases for recognition. CVPR 2008.
▪ Audio Recognition
  ▪ [15] Unsupervised feature learning for audio classification using convolutional deep belief networks, Honglak Lee, Yan Largman, Peter Pham and Andrew Y. Ng. In NIPS 2009.

Natural Language Processing:
▪ [16] Yoshua Bengio, Réjean Ducharme, Pascal Vincent and Christian Jauvin. A Neural Probabilistic Language Model. JMLR 2003.
▪ [17] R. Collobert and J. Weston. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML 2008.
▪ [18] Richard Socher, Jeffrey Pennington, Eric Huang, Andrew Y. Ng, and Christopher D. Manning. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
▪ [19] Richard Socher, Eric Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
▪ [20] Mnih, A. and Hinton, G. E. Three New Graphical Models for Statistical Language Modelling. ICML 2007.

Advanced stuff:
▪ Slow Feature Analysis:
  ▪ [21] Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 2005.
▪ Predictive Sparse Decomposition
  ▪ [22] Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun, "Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition", Computational and Biological Learning Lab, Courant Institute, NYU, 2008.
  ▪ [23] Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun, "What is the Best Multi-Stage Architecture for Object Recognition?", In ICCV 2009.

Mean-Covariance models
▪ [24] M. Ranzato, A. Krizhevsky, G. Hinton. Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images. In AISTATS 2010.
▪ [25] M. Ranzato, G. Hinton. Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines. CVPR 2010.
  ▪ (Someone tell us if you need to read the 3-way RBM paper before the mcRBM one. [I didn't find it necessary, in fact the CVPR paper seemed easier to understand.])
▪ [26] Dahl, G., Ranzato, M., Mohamed, A. and Hinton, G. E. Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine. NIPS 2010.
▪ [27] Y. Karklin and M. S. Lewicki. Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 2008.
  ▪ (Someone tell us if this should be here. Interesting algorithm + nice visualizations, though maybe slightly hard to understand. [Seems a good reminder there are other existing models])

Overview
▪ [28] Yoshua Bengio. Learning Deep Architectures for AI. FTML 2009.
  ▪ (Broad landscape description of the field, but technical details there are hard to follow so ignore that. This is also easier to read after you've gone over some of literature of the field.)

Practical guides:
▪ [29] Geoff Hinton. A practical guide to training restricted Boltzmann machines. UTML TR 2010–003.
  ▪ A practical guide (read if you're trying to implement an RBM; but otherwise skip since this is not really a tutorial).
▪ [30] Y. LeCun, L. Bottou, G. Orr and K. Muller. Efficient Backprop. Neural Networks: Tricks of the trade, Springer, 1998.
  ▪ Read if you're trying to run backprop; but otherwise skip since very low-level engineering/hackery tricks and not that satisfying to read.

Also, for other lists of papers:
▪ [31] Honglak Lee's Course
▪ [32] from Geoff's tutorial

Research on Computational Spectral Imaging with Joint Chromatic Aberration Correction and Super-Resolution

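CASSI is a highly nonlinear optical system: the point spread function differs from pixel to pixel, which makes uniform processing difficult. This thesis proposes a chromatic aberration correction method based on a piecewise linear approximation of the point spread function. Simulation experiments verify that the proposed method effectively reduces the influence of chromatic aberration on reconstructed image quality and thereby improves the reconstruction quality of spectral images.
In addition, to address the low resolution of images reconstructed by the CASSI system, this thesis takes a different route and proposes a joint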
poor imaging quality and low resolution. Based on this, some papers put forward specific
solutions for better reconstruction quality, such as multi-frame observation and an
Figure 1.3 Schematic of the coded aperture snapshot spectral imaging system
Figure 1.4 DD-CASSI system [15]
process is divided into two stages: the observation process and the data restoration process.
In the observation stage, the measurements are obtained by coding and sampling of the
Figure 1.5

Machine Learning Masters


Andrew Ng
Stanford University & Google

Thanks to: Adam Coates, Quoc Le, Honglak Lee, Andrew Saxe, Andrew Maas, Chris Manning, Jiquan Ngiam, Richard Socher, Will Zou (Stanford); Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Marc'Aurelio Ranzato, Paul Tucker, Kay Le (Google).

This talk
The idea of "deep learning." Using brain simulations, hope to:
- Make learning algorithms much better and easier to use.
- Make revolutionary advances in machine learning and AI.
Vision is not only mine; shared with many researchers, e.g., Samy Bengio, Yoshua Bengio, Tom Dean, Jeff Dean, Nando de Freitas, Jeff Hawkins, Geoff Hinton, Quoc Le, Yann LeCun, Honglak Lee, Tommy Poggio, Ruslan Salakhutdinov, Josh Tenenbaum, Kai Yu, Jason Weston, …
I believe this is our best shot at progress towards real AI.

What do we want computers to do with our data?
- Images/video: label ("Motorcycle"), suggest tags, image search, …
- Audio: speech recognition, music classification, speaker identification, …
- Text: web search, anti-spam, machine translation, …
Machine learning performs well on many of these problems, but is a lot of work. What is it about machine learning that makes it so hard to use?

Computer vision is hard!
[Images: nine examples, each labeled "Motorcycle".]

Machine learning for image classification
"Motorcycle". This talk: Develop ideas using images and audio. Ideas apply to other problems (e.g., text) too.

Why is this hard?
You see this: [image]. But the camera sees this: [image].

Machine learning and feature representations
Input (raw image) -> learning algorithm.
[Plot: motorbikes and "non"-motorbikes plotted by pixel 1 against pixel 2.]

What we want
Input (raw image) -> feature representation -> learning algorithm. E.g., does it have handlebars? Wheels?
[Plot: motorbikes and "non"-motorbikes plotted by the features "handlebars" against "wheels", next to the raw-image plot of pixel 1 against pixel 2.]

Computing features in computer vision
Find edges at four orientations; sum up edge strength in each quadrant; the result is the final feature vector (0.1 0.7 0.6 0.4 0.1 0.4 0.5 0.5 …).
But… we don't have a handlebars detector. So, researchers try to hand-design features to capture various statistical properties of the image.

Feature representations
Input -> feature representation -> learning algorithm.

How is computer perception done?
- Image -> low-level features -> grasp point
- Image -> vision features -> detection
- Audio -> audio features -> speaker ID
- Text -> text features -> text classification, machine translation, information retrieval, …

Computer vision features
SIFT, Spin image, HoG, RIFT, Textons, GLOH.

Audio features
ZCR, Spectrogram, MFCC, Rolloff, Flux.

NLP features
Parser features, named entity recognition, stemming, part of speech, anaphora, ontologies (WordNet).
Coming up with features is difficult, time-consuming, requires expert knowledge. When working on applications of learning, we spend a lot of time tuning the features.

The "one learning algorithm" hypothesis
- Auditory cortex learns to see. [Roe et al., 1992]
- Somatosensory cortex learns to see. [Metin & Frost, 1989]

Sensor representations in the brain
Seeing with your tongue; human echolocation (sonar); haptic belt: direction sense; implanting a 3rd eye.
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]

On two approaches to computer perception
The adult visual system computes an incredibly complicated function of the input. We can try to directly implement most of this incredibly complicated function (hand-engineer features). Can we learn this function instead?
A trained learning algorithm (e.g., neural network, boosting, decision tree, SVM, …) is very complex. But the learning algorithm itself is usually very simple. The complexity of the trained algorithm comes from the data, not the algorithm.

Learning input representations
Find a better way to represent images than pixels. Find a better way to represent audio.

Feature learning problem
- Given a 14x14 image patch x, can represent it using 196 real numbers (255 98 93 87 89 91 48 …).
- Problem: Can we learn a better feature vector to represent this?

Self-taught learning (Unsupervised Feature Learning)
Unlabeled images …; labeled examples: motorcycles, not motorcycles. Testing: What is this?
[This uses unlabeled data. One can learn the features from labeled data too.]

First stage of visual processing: V1
V1 is the first stage of visual processing in the brain. Neurons in V1 typically modeled as edge detectors: neuron #1 of visual cortex (model), neuron #2 of visual cortex (model).

Feature Learning via Sparse Coding
Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).
Input: Images $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (each in $\mathbb{R}^{n \times n}$).
Learn: Dictionary of bases $f_1, f_2, \dots, f_k$ (also in $\mathbb{R}^{n \times n}$), so that each input $x$ can be approximately decomposed as
$$x \approx \sum_{j=1}^{k} a_j f_j \quad \text{s.t. the } a_j\text{'s are mostly zero ("sparse")}$$
[NIPS 2006, 2007]

Sparse coding illustration
Natural images -> learned bases ($f_1, \dots, f_{64}$): "edges".
Test example: $x \approx 0.8 f_{36} + 0.3 f_{42} + 0.5 f_{63}$
$[a_1, \dots, a_{64}] = [0, 0, \dots, 0, 0.8, 0, \dots, 0, 0.3, 0, \dots, 0, 0.5, 0]$ (feature representation)
More succinct, higher-level representation.

More examples
- $0.6 f_{15} + 0.8 f_{28} + 0.4 f_{37}$. Represent as: $[a_{15} = 0.6, a_{28} = 0.8, a_{37} = 0.4]$.
- $1.3 f_{5} + 0.9 f_{18} + 0.3 f_{29}$. Represent as: $[a_{5} = 1.3, a_{18} = 0.9, a_{29} = 0.3]$.
- Method "invents" edge detection.
- Automatically learns to represent an image in terms of the edges that appear in it. Gives a more succinct, higher-level representation than the raw pixels.
- Quantitatively similar to primary visual cortex (area V1) in brain.

Sparse coding applied to audio
Image shows 20 basis functions learned from unlabeled audio. [Evan Smith & Mike Lewicki, 2006]

Sparse coding applied to touch data
Collect touch data using a glove, following distribution of grasps used by animals in the wild. Grasps used by animals [Macfarlane & Graziano, 2009].
[Figure: example learned representations (sample bases from sparse autoencoder, sparse RBM, ICA, and K-means); histograms of log(excitatory/inhibitory area) for the experimental data distribution (number of neurons) and the model distribution (number of bases); PDF comparison of biological data and learning algorithm (p = 0.5872).] [Andrew Saxe]

Learning feature hierarchies
Input image (pixels $x_1, \dots, x_4$) -> "sparse coding" layer ($a_1, a_2, a_3$; edges; cf. V1, model V1) -> higher layer (combinations of edges; cf. V2, model V2?) -> higher layer (model V3?).
[Lee, Ranganath & Ng, 2007] [Technical details: Sparse autoencoder or sparse version of Hinton's DBN.]

Hierarchical sparse coding (sparse DBN): trained on face images
Pixels -> edges -> object parts (combination of edges) -> object models. Training set: aligned images of faces. [Honglak Lee]

Hierarchical sparse coding (sparse DBN)
Features learned from training on different object classes: faces, cars, elephants, chairs. [Honglak Lee]

Machine learning applications

Video: activity recognition (Hollywood 2 benchmark)
Method: accuracy
- Hessian + ESURF [Williems et al 2008]: 38%
- Harris3D + HOG/HOF [Laptev et al 2003, 2004]: 45%
- Cuboids + HOG/HOF [Dollar et al 2005, Laptev 2004]: 46%
- Hessian + HOG/HOF [Laptev 2004, Williems et al 2008]: 46%
- Dense + HOG/HOF [Laptev 2004]: 47%
- Cuboids + HOG3D [Klaser 2008, Dollar et al 2005]: 46%
- Unsupervised feature learning (our method): 52%
Unsupervised feature learning significantly improves on the previous state-of-the-art. [Le, Zhou & Ng, 2011]

Sparse coding on audio (speech)
Spectrogram: $x \approx 0.9 f_{36} + 0.7 f_{42} + 0.2 f_{63}$

Dictionary of bases $f_i$ learned for speech
Many bases seem to correspond to phonemes. [Honglak Lee]

Hierarchical sparse coding (sparse DBN) for audio
[Figure: spectrogram.] [Honglak Lee]

Phoneme classification (TIMIT benchmark)
Method: accuracy
- Clarkson and Moreno (1999): 77.6%
- Gunawardana et al. (2005): 78.3%
- Sung et al. (2007): 78.5%
- Petrov et al. (2007): 78.6%
- Sha and Saul (2006): 78.9%
- Yu et al. (2006): 79.2%
- Unsupervised feature learning (our method): 80.3%
Unsupervised feature learning significantly improves on the previous state-of-the-art. [Lee et al., 2009]

State-of-the-art: unsupervised feature learning
Task: prior art, Stanford feature learning
Images
- CIFAR object classification: 80.5% (Ciresan et al., 2011), 82.0%
- NORB object classification: 94.4% (Scherer et al., 2010), 95.0%
- Galaxy
Multimodal (audio/video)
- AVLetters lip reading: 58.9% (Zhao et al., 2009), 65.8%
Video
- Hollywood2 classification: 48% (Laptev et al., 2004), 53%
- KTH: 92.1% (Wang et al., 2010), 93.9%
- UCF: 85.6% (Wang et al., 2010), 86.5%
- YouTube: 71.2% (Liu et al., 2009), 75.8%
Text/NLP
- Paraphrase detection: 76.1% (Das & Smith, 2009), 76.4%
- Sentiment (MR/MPQA data): 77.3% (Nakagawa et al., 2010), 77.7%
Other unsupervised feature learning records: pedestrian detection (Yann LeCun), speech recognition (Geoff Hinton), PASCAL VOC object classification (Kai Yu).

Technical challenge: Scaling up

Supervised learning
- Choices of learning algorithm: memory based, Winnow, perceptron, naïve Bayes, SVM, …
- What matters the most?
[Plot: accuracy against training set size (millions).] [Banko & Brill, 2001]
"It's not who has the best algorithm that wins. It's who has the most data."

Scaling and classification accuracy (CIFAR-10)
Large numbers of features is critical. The specific learning algorithm is important, but ones that can scale to many features also have a big advantage. [Adam Coates]

Attempts to scale up
Significant effort spent on algorithmic tricks to get algorithms to run faster.
- Efficient sparse coding. [LeCun, Ng, Yu]
- Efficient posterior inference. [Bengio, Hinton]
- Convolutional networks. [Bengio, de Freitas, LeCun, Lee, Ng]
- Tiled networks. [Hinton, Ng]
- Randomized/fast parameter search. [DiCarlo, Ng]
- Massive data synthesis. [LeCun, Schmidhuber]
- Massive embedding models. [Bengio, Collobert, Hinton, Weston]
- Fast decoder algorithms. [LeCun, Lee, Ng, Yu]
- GPU, FPGA and ASIC implementations. [Dean, LeCun, Ng, Olukotun]
[Raina, Madhavan and Ng, 2008]

Scaling up: Discovering object classes
[Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Greg Corrado, Matthieu Devin, Kai Chen, Jeff Dean]

Training procedure
What features can we learn if we train a massive model on a massive amount of data? Can we learn a "grandmother cell"?
- Train on 10 million images (YouTube).
- 1000 machines (16,000 cores) for 1 week.
- 1.15 billion parameters.
- Test on novel images.
Training set (YouTube); test set (FITW + ImageNet).

Face neuron
Top stimuli from the test set; optimal stimulus by numerical optimization. [Histogram: faces against random distractors.] [Raina, Madhavan and Ng, 2008]

Invariance properties
[Plots: feature response against horizontal shift, vertical shift, 3D rotation angle, and scale factor; axis annotations include +15 pixels, 90°, and 1.6x.]

Cat neuron
Cat face neuron: top stimuli from the test set; optimal stimulus by numerical optimization. [Histogram: cat faces against random distractors.] [Raina, Madhavan and Ng, 2008]

Visualization
Pedestrian neuron: top stimuli from the test set; optimal stimulus by numerical optimization. [Histogram: pedestrians against random distractors.]

Weaknesses & Criticisms
- You're learning everything. It's better to encode prior knowledge about structure of images (or audio, or text).

Research on Visual Inspection Algorithms for Defects in Textured Objects (Outstanding Graduation Thesis)


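Abstract

In today's fiercely competitive automated industrial production, machine vision plays a pivotal role in product quality control, and its use in defect detection is becoming increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient, and safer. Textured objects are widespread in industrial production: substrates and light-emitting diodes used in semiconductor assembly and packaging, printed circuit boards in modern electronic systems, and cloth and fabric in the textile industry can all be regarded as objects with texture features. This thesis focuses on defect detection for textured objects and aims to provide efficient and reliable detection algorithms for their automated inspection.

Texture is an important feature for describing image content, and texture analysis has been applied successfully to texture segmentation and texture classification. This study proposes a defect detection algorithm based on texture analysis and reference comparison. The algorithm tolerates the image registration errors caused by object distortion and is robust to the influence of texture. It aims to provide rich and meaningful physical information about the detected defect regions, such as their size, shape, brightness contrast, and spatial distribution. When a reference image is available, the algorithm can be used to inspect both homogeneous and non-homogeneous textured objects, and it also achieves good results on non-textured objects.

Throughout the detection process we use steerable-pyramid texture analysis and reconstruction. Unlike traditional wavelet texture analysis, we add a tolerance control algorithm in the wavelet domain that handles object distortion and texture influence, which makes the method tolerant of object distortion and robust to texture. Finally, the steerable-pyramid reconstruction ensures that the physical meaning of the defect regions is recovered accurately. In the experiments we tested a series of images of practical value. The results show that the proposed defect detection algorithm for textured objects is efficient and easy to implement.

Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction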

Adaptive Bilateral Filtering Algorithm Based on Brightness-Invariant Features


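Kang Changqing (康长青); Xu Gejing (徐格静); Xiang Dongsheng (项东升); Zhao Yongbiao (赵永标)

[Abstract] To address the brightness sensitivity of current filtering algorithms, an adaptive bilateral filtering algorithm based on brightness-invariant features is proposed. The algorithm first uses the maximum and minimum moments of local phase to represent corner and edge information for the spatial parameter. It then uses a gray-level homogeneity measure to relate the brightness-distance parameter to the intrinsic noise of the image, which yields an adaptive parameter adjustment strategy for bilateral filtering. Experimental results show that the average peak signal-to-noise ratio of the proposed algorithm is 0.49 to 2.2 dB higher than that of existing algorithms, with better visual quality and edge preservation.
[Journal] Laser & Infrared (《激光与红外》)
[Year (volume), issue] 2013, 43(5)
[Pages] 4 (pp. 550-553)
[Keywords] brightness-invariant features; bilateral filtering; phase congruency measure; gray-level homogeneity measure
[Authors' affiliation] School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang, Hubei 441053 (all four authors)
[Language] Chinese
[CLC number] TP391.9

1 Introduction

Image denoising is one of the important fundamental problems in image processing and computer vision.

The difficulty lies in removing noise while preserving edge detail as far as possible.

Common edge-preserving filtering algorithms currently include anisotropic diffusion filtering [1-2], non-local means filtering [3-4], overcomplete dictionary learning [5-6], and bilateral filtering [7-12].

Anisotropic diffusion filtering [1] uses a local diffusion coefficient that is a function of the gradient magnitude, so that the image is approached gradually. It preserves edges to some extent, but the algorithm is theoretically ill-posed, which makes processing unstable and makes its running time depend strongly on the noise variance. Non-local means filtering [3] exploits the self-similarity redundancy of images and denoises through block-wise estimation, similarity weight computation, and weighted averaging. It is particularly suited to strongly textured images, but because block similarity is computed pixel by pixel, its computational complexity is high and it is hard to use in real time.

Overcomplete dictionary learning [5] is based on sparse representation theory: it designs a suitable overcomplete dictionary and filters by solving for the sparse representation, but it is likewise computationally heavy and complex. By comparison, bilateral filtering [7] takes a weighted average using spatial distance and brightness distance. It is computationally simple and easy to implement, and it is already widely used in color image processing and in other areas of image processing and analysis. Its main drawback is that suitable parameters are hard to determine.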
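The weighting by spatial distance and brightness distance described above can be illustrated with a minimal NumPy sketch of a plain (non-adaptive) bilateral filter. This is not the adaptive algorithm of the paper: the function name `bilateral_filter` and the parameters `radius`, `sigma_s` (spatial scale), and `sigma_r` (brightness scale) are assumptions chosen for illustration, and the two fixed sigmas are exactly the parameters the text says are hard to choose.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Plain bilateral filter for a 2-D grayscale image (illustrative sketch)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, radius, mode="reflect")
    out = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor image shifted by (dy, dx), aligned with img.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Spatial weight: depends only on the pixel offset.
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Range weight: depends on the brightness difference.
            w_r = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_s * w_r
            out += weight * shifted
            norm += weight
    return out / norm
```

With a small `sigma_r`, pixels across a strong edge receive almost no weight, so the edge is kept while flat regions are averaged.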

Image Denoising Based on Minimum Mean Square Error Estimation and a Sparsity Prior


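Sun Dong (孙冬); Xiang Hao (向豪); Lu Yixiang (卢一相); Rao Ruting (饶儒婷); Yang Yang (杨杨)

[Abstract] An image denoising algorithm based on sparse representation theory and minimum mean square error (MMSE) estimation is proposed. Within a Bayesian restoration framework, an MMSE restoration equation for the representation coefficients of the original image is built from the sparse representation model of the image over a redundant dictionary. A randomized orthogonal matching pursuit algorithm is then used to solve the restoration equation numerically: the representation coefficients are approximated and the original image is recovered from them. Simulation experiments on a set of standard test images show that the proposed algorithm removes image noise well and that the restored images have good subjective visual quality and a high peak signal-to-noise ratio.
[Journal] Journal of Anhui University (Natural Science Edition) (《安徽大学学报(自然科学版)》)
[Year (volume), issue] 2019, 43(1)
[Pages] 5 (pp. 32-36)
[Keywords] image denoising; sparse representation; minimum mean square error estimation; randomized orthogonal matching pursuit; Bayesian restoration
[Authors' affiliations] School of Electrical Engineering and Automation, Anhui University, Hefei, Anhui 230601 (first four authors); School of Electronic Science and Technology, Anhui University, Hefei, Anhui 230601 (fifth author)
[Language] Chinese
[CLC number] TP311

Noise removal is a basic problem in image processing. Because the human visual system is sensitive to edge structure, accurately recovering the edges and textures of a noise-corrupted image is the main subject of image denoising research. Since the 1960s a large body of work has studied image denoising with linear and nonlinear methods, including edge-preserving denoising [1-2], multiscale geometric analysis [3-6], fractal methods [7], non-local block averaging [8-10], artificial neural networks [14], and sparse representation models [16]. The performance of these algorithms depends mainly on how well they restore edge structure while removing noise. Recent research on image representation models [11-13] shows that a trained redundant dictionary is robust to noise, produces sparser representation coefficients, and captures image edge structure flexibly and effectively. More importantly, the dictionary atoms are localized, oriented, and band-pass, which matches the properties of area V1 of the human primary visual cortex, so the corresponding image processing algorithms produce results that agree better with the HVS (human visual system). Based on the sparse representation model of images, this paper proposes an image denoising algorithm within a Bayesian MMSE (minimum mean square error) estimation framework.

1 MMSE image restoration equation based on the sparse representation model

Assume the noisy signal follows the observation model

$$y = y_0 + e, \tag{1}$$

where $y_0$ is the $n$-dimensional original signal and $e$ is a Gaussian noise term [...]. According to sparse representation theory, if $y_0$ can be represented over an $n \times m$ redundant dictionary $A$ as $y_0 = A x_0$, then (1) can be written as $y = A x_0 + e$. From the MMSE estimate of $x_0$, the original noise-free signal $y_0$ is approximated as [...] (2).

Consider the maximum a posteriori (MAP) estimate of $x_0$, [...] (3). Let $s$ denote the support of $x$; then [...], where $\Omega$ is the set of all possible supports of $x$. For a feasible solution $x$ with a particular support $s^*$, only the term with support $s^*$ survives in the sum when $P(x \mid y)$ is evaluated, which gives [...] (4). Substituting (4) into (3) gives [...]. Since $P(x_s \mid s, y)\,P(s \mid y) \propto P(y \mid s, x_s)\,P(x_s \mid s)\,P(s)$, we have [...] (5). When the support is known, $P(s)$ is a constant and the expression simplifies to [...] (6).

Assuming the representation coefficients [...], it follows from [15] that the expression above is also the MMSE estimate of $x_0$, that is, [...] (8). Expanding [...] gives [...] (9). This means that the estimate is the weighted sum of the MMSE estimates over all possible supports, with weights $P(s \mid y)$. By Bayes' rule, (9) can be written as [...] (10). The $P(s)$ term in (10) can be approximated as [16] [...] (11), where $\alpha$ and $\beta$ are parameters determined by the probability density function $P(k)$ of the support size of $x_0$. For the $P(y \mid s)$ term in (10), [15] gives [...] (12), where $c_1$ is a constant. Substituting (11) and (12) into (10) yields the MMSE restoration equation [...] (13).

2 Numerical solution algorithm

Solving the MMSE restoration equation requires traversing all possible supports and computing the corresponding estimate and weight $w_s$ for each, with computational complexity [...]. To simplify the computation, when the support size $|s| = k$ is known, (13) can be written as [...] (14), where [...] is the weighting of [...] and the constant term is [...]. To reduce the computation further, the following randomized OMP algorithm is used to solve (14) approximately.

Repeat $L$ times:
Initialization: $k = 0$, initial solution $x^0 = 0$, initial residual $r^0 = y - A x^0 = y$, initial support $s^0 = \mathrm{support}\{x^0\} = \emptyset$, index set $B = \{b_1, b_2, \dots, b_m\} = \{1, 2, \dots, m\}$.
Iteration (repeat $k$ times):
(1) $k = k + 1$;
(2) Random selection step:
  (a) for each $b_j$ in $B$, compute the probability [...];
  (b) normalize the probabilities $p_j$;
  (c) randomly draw an index $b_{j_0}$ from $B$ according to the probabilities $p_j$;
(3) Support update step:
  (a) update the support, $s^k = s^{k-1} \cup \{b_{j_0}\}$;
  (b) update the index set, $B = B \setminus \{b_{j_0}\}$;
(4) Solution update: compute the solution $x^k$ that minimizes $\|A x^k - y\|$ on the support $s^k$;
(5) Residual update: $r^k = y - A x^k$.
MMSE estimate: compute the MMSE estimate on the support of this run, [...].
After all $L$ runs have finished, the approximate estimate is obtained as [...].

3 Experiments

To verify the correctness and effectiveness of the proposed algorithm, denoising experiments were run on a set of standard test images, and the results were evaluated by subjective visual quality and by the objective PSNR metric. The algorithm uses image blocks of size $n = 16 \times 16$ (fully overlapping block strategy), the OMP coding threshold is [...], the number of runs in randomized OMP is $L = 100$, and the dictionary used for sparse coding is a DCT dictionary with redundancy 4. The prior support size $k$ of an image block and the variance of the representation coefficients are obtained by first sparse coding the noisy image directly with OMP and then computing the support size and variance of the resulting coefficients, which serve as the estimates of $k$ and [...].

Figure 1 shows the denoising result for the Lena image at noise standard deviation $\sigma_e = 20$. Details such as the edge of the hat and the contour of the eyes show that the algorithm reconstructs noise-corrupted high-frequency regions well, with high subjective visual quality and no obvious jagged or blurred artifacts. To examine the denoising performance further, Figure 1 also shows the restoration of the peppers image under strong noise, $\sigma_e = 50$. Figures 1(a) and 1(b) show that although almost all edge detail in the image is submerged in noise, the proposed algorithm still performs well and gives an acceptable edge restoration.

Figure 1 Noisy image ($\sigma_e = 50$) and denoising result ((a), (b)); noisy peppers image and denoising result (PSNR = 28.40) ((c), (d))

As an objective quality evaluation, Figure 2 compares the PSNR of the proposed algorithm with that of three other denoising algorithms: Lee filter denoising [1], denoising based on the translation-invariant contourlet transform (TICT) [10], and image denoising based on the K-singular value decomposition (K-SVD) algorithm [17].

Figure 2 PSNR comparison

Figure 2 shows that the images denoised by the proposed algorithm have higher PSNR values, which is consistent with the subjective visual impression.

4 Conclusion

In summary, the image denoising algorithm based on the sparse representation model and MMSE estimation suppresses noise effectively while protecting the edge structure of the image, and it produces denoised images of high quality. It has considerable potential in medical imaging, surveying and mapping, video surveillance, and similar fields. Future work should concentrate on reducing the denoising time, for example through acceleration on graphics processors.

References:
[1] LEE J S. Digital image enhancement and noise filtering by use of local statistics[J]. IEEE Trans Pattern Anal Mach Intell, 1980, 2 (2): 165-168.
[2] ESLAMI R, RADHA H. Translation-invariant contourlet transform and its application to image denoising[J]. IEEE Trans Image Process, 2006, 15 (11): 3362-3374.
[3] BAI J, FENG X C. Fractional-order anisotropic diffusion for image denoising[J]. IEEE Trans Image Process, 2007, 16 (10): 2492-2502.
[4] LUISIER F, BLU T, UNSER M. A new SURE approach to image denoising interscale orthonormal wavelet thresholding[J]. IEEE Trans Image Process, 2007, 16 (3): 593-606.
[5] SETHUNADH R, THOMAS T. SAR image despeckling in directionlet domain based on edge detection[J]. IEEE Electron Lett, 2013, 49 (6): 422-424.
[6] GAO Q W, LU Y X, SUN D. Directionlet-based denoising of SAR images using a Cauchy model[J]. Signal Process, 2013, 93 (5): 1056-1063.
[7] GHAZEL M, FREEMAN G H, VRSCAY E R. Fractal-wavelet image denoising[C]// IEEE Conference on Image Processing, 2002: 836-839.
[8] ZHANG S, SALARI E. Image denoising using a neural network based non-linear filter in wavelet domain[C]// IEEE Conference on Acoustics, Speech, and Signal Processing, 2005: 989-992.
[9] DABOV K, FOI A, KATKOVNIK V, et al. Image denoising by sparse 3D transform-domain collaborative filtering[J]. IEEE Trans Image Process, 2007, 16 (8): 2080-2095.
[10] WANG J, GUO Y W, YING Y T, et al. Fast non-local algorithm for image denoising[C]// IEEE Conference on Image Processing, 2006: 1429-1432.
[11] WU Y, TRACEY B, NATARAJAN P, et al. James-Stein type center pixel weights for non-local means image denoising[J]. IEEE Signal Process Lett, 2013, 20 (4): 411-414.
[12] OLSHAUSEN B A, FIELD D J. Sparse coding with an overcomplete basis set: a strategy employed by V1[J]. Vision Research, 1997, 37 (23): 3311-3325.
[13] LEWICKI M S, OLSHAUSEN B A. Probabilistic framework for the adaptation and comparison of image codes[J]. Journal of the Optical Society of America, 1999, 16 (7): 1587-1601.
[14] VINJE W E, GALLANT J L. Sparse coding and decorrelation in primary visual cortex during natural vision[J]. Science, 2000, 287 (5456): 1273-1276.
[15] ELAD M. Sparse and redundant representation: from theory to application in signal and image processing[M]. New York: Springer, 2010: 103-104.
[16] SUN D, GAO Q W, LU Y X. A novel image denoising algorithm using linear Bayesian MAP estimation based on sparse representation[J]. Signal Processing, 2014, 100 (7): 132-145.
[17] ELAD M, AHARON M. Image denoising via sparse and redundant representations over learned dictionaries[J]. IEEE Trans Image Process, 2006, 15 (12): 3736-3745.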
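The randomized OMP procedure in Section 2 can be sketched in NumPy as follows. The selection probability and the per-support MMSE formula did not survive in the text, so this sketch substitutes forms commonly used for randomized OMP under a Gaussian model: the probability is taken proportional to $\exp\{c^2 (a_j^T r)^2 / (2\sigma_e^2)\}$ with $c^2 = \sigma_x^2 / (\sigma_x^2 + \sigma_e^2)$, and the per-support estimate is the regularized least-squares solution. Both are assumptions, as are the function name `random_omp`, the parameter names, and the requirement that the columns of `A` have unit norm and that `k >= 1`.

```python
import numpy as np

def random_omp(A, y, k, sigma_e, sigma_x, n_runs=100, seed=0):
    """Average the per-support estimates of n_runs randomized OMP runs.

    Assumes unit-norm columns in A, Gaussian noise with std sigma_e, and
    Gaussian nonzero coefficients with std sigma_x (illustrative sketch).
    """
    rng = np.random.default_rng(seed)
    m = A.shape[1]
    c2 = sigma_x ** 2 / (sigma_x ** 2 + sigma_e ** 2)  # assumed shrinkage factor
    x_sum = np.zeros(m)
    for _run in range(n_runs):
        support = []
        available = np.ones(m, dtype=bool)   # index set B
        residual = y.copy()                  # r^0 = y
        for _step in range(k):
            candidates = np.flatnonzero(available)
            corr = A[:, candidates].T @ residual
            # Assumed selection rule: p_j proportional to exp(c2 * corr^2 / (2 sigma_e^2)).
            logits = c2 * corr ** 2 / (2.0 * sigma_e ** 2)
            p = np.exp(logits - logits.max())  # subtract max for numerical stability
            p /= p.sum()
            j = int(rng.choice(candidates, p=p))
            support.append(j)
            available[j] = False
            # Least-squares solution on the current support, then new residual.
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coef
        # Assumed per-support MMSE estimate (regularized least squares).
        A_s = A[:, support]
        gram = A_s.T @ A_s + (sigma_e ** 2 / sigma_x ** 2) * np.eye(len(support))
        x_run = np.zeros(m)
        x_run[support] = np.linalg.solve(gram, A_s.T @ y)
        x_sum += x_run
    return x_sum / n_runs
```

Because the runs draw different supports, the averaged coefficient vector is generally less sparse than any single run; the denoised signal is then `A @ x_hat`.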

KSVD-MOD


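Disclaimer: I am an absolute beginner who has only just come into contact with the field of "sparse representation".

I am writing the following series to remind myself not to chase quick results but to make steady progress one step at a time. There are bound to be mistakes below; please point them out so that we can improve together. I entered the field of sparse representation by what was an inevitable accident.

I had been studying the reconstruction problem in compressed sensing (CS).

The usual path is to start with one-dimensional sparse signals (as in the figure below) to verify principles, properties, and algorithms of CS theory, for example comparing the behavior and results of reconstruction algorithms such as BP, MP, and OMP when the measurement matrix is a Gaussian random matrix, a Bernoulli matrix, or a sub-Gaussian matrix.

Then one moves on to two-dimensional sparse signals to check further questions.

Of course, as you would expect, all of this is too simple.

The natural next question is what to do with a dense two-dimensional signal, such as the Lena image. CS theory can go beyond the Nyquist sampling theorem and recover the original signal accurately from fewer samples, and one key piece of prior knowledge behind this is the sparsity of the signal, whether it is sparse in itself or sparse in a transform domain.

A dense two-dimensional signal therefore has to be sparsified before CS theory can be used for reconstruction.

This raises the question: for a two-dimensional signal like the Lena image, how should it be represented sparsely, in which transform domain is it sparse, and what does it look like after sparsification? So I googled as hard as I could, and eventually found Yi Ma's "Image Super-Resolution via Sparse Representation" (IEEE Transactions on Image Processing, Nov. 2010), and that is where my connection with sparse representation began. Any discussion of sparse representation has to mention the groups of two people, Yi Ma and Michael Elad; students from many universities in China (such as Tsinghua and USTC) head straight for them.

(The figure below shows Elad's group. I later learned that Donoho, a leading figure in CS, was Elad's mentor, which explains a lot.) I already knew a little about Yi Ma, because Professor Wei Sui, the director of our lab, began working on face recognition two years ago and has achieved good results, and Yi Ma is a leading figure in face recognition. Whenever a related question came up in meetings, Professor Wei would mention Yi Ma, so through various channels I learned something about his research and about him personally.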

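Convolutional sparse coding (CSC) algorithm

Introduction to the CSC algorithm

Convolutional sparse coding (CSC) is a method used in signal and image processing that extracts features by representing a signal sparsely.

CSC builds on the idea of sparse coding: the input signal is decomposed into a linear combination of several atoms, which gives a compact representation of the signal and reduces its dimensionality.

CSC is widely used in image processing, pattern recognition, and signal compression, for tasks such as feature extraction, denoising, and image restoration.

This article describes the principle, workflow, and applications of the CSC algorithm in detail.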

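Principle of the CSC algorithm

The CSC algorithm consists of two main parts: dictionary learning and sparse coding.

The dictionary learning stage learns a set of atoms (together called the dictionary) that can closely capture the structural information in the input signals.

The sparse coding stage uses the learned dictionary to represent an input signal as a linear combination of dictionary atoms.

Dictionary learning

Dictionary learning is a key step of the CSC algorithm. Its goal is to learn, adaptively from training data, a dictionary that characterizes the data well.

Dictionary learning can be posed as minimizing the sparse representation error: find a dictionary D and sparse coefficients X such that the input signals Y are approximated as Y ≈ DX.

A common dictionary learning method is K-SVD, which is usually paired with the OMP algorithm for its sparse coding step.

K-SVD is an iterative method that alternates between updating the sparse coefficients and updating the dictionary atoms, gradually improving the dictionary.

OMP, by contrast, is a greedy sparse coding algorithm: at each step it selects the atom most correlated with the current residual.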
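As a rough illustration of the atom update inside K-SVD, the sketch below refits a single atom and its coefficients with a rank-1 SVD of the residual. The function name `ksvd_update_atom`, the random test data, and the matrix shapes are assumptions made for this example; a full K-SVD implementation would loop this over all atoms and alternate with a sparse coding step.

```python
import numpy as np

def ksvd_update_atom(D, X, Y, j):
    """Refit atom j of D (and row j of X) on the signals that use it.

    D: (m, n) dictionary, X: (n, N) sparse codes, Y: (m, N) signals.
    Returns updated copies of D and X. Hypothetical helper for illustration.
    """
    users = np.nonzero(X[j, :])[0]      # signals whose code uses atom j
    if users.size == 0:
        return D, X                     # unused atom: leave unchanged
    D, X = D.copy(), X.copy()
    X[j, users] = 0.0                   # remove atom j's contribution
    E = Y[:, users] - D @ X[:, users]   # residual without atom j
    # The best rank-1 approximation of E gives the new atom and coefficients.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                   # unit-norm atom
    X[j, users] = s[0] * Vt[0, :]       # support of row j is preserved
    return D, X

# Small synthetic example (random data, for illustration only).
rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 20))
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((12, 20)) * (rng.random((12, 20)) < 0.25)

err_before = np.linalg.norm(Y - D @ X)
D_new, X_new = ksvd_update_atom(D, X, Y, j=0)
err_after = np.linalg.norm(Y - D_new @ X_new)
```

Because the rank-1 SVD is the best possible fit for the residual on the affected signals, the representation error cannot increase after the update.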

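Sparse coding

Once the dictionary has been learned, the sparse coding stage represents the input signal as a linear combination of dictionary atoms.

Given an input signal Y and a dictionary D, the sparse coding problem can be defined as the optimization problem

min ||X||_0, s.t. Y = DX

where ||X||_0 measures sparsity, i.e. the number of non-zero elements.

This optimization problem is NP-hard in general, so approximate methods are normally used to solve it.

Common sparse coding methods include the OMP algorithm and L1-norm minimization.

OMP selects the most correlated atoms one at a time and adds them to the sparse representation.

L1-norm minimization instead obtains the sparse coefficients by solving a convex optimization problem.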
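The greedy OMP procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `omp`, the stopping tolerance `tol`, and the toy dictionary are all illustrative choices, and the dictionary columns are assumed to have unit norm.

```python
import numpy as np

def omp(D, y, n_nonzero, tol=1e-10):
    """Approximate y with at most n_nonzero columns (atoms) of D."""
    x = np.zeros(D.shape[1])
    support = []                          # indices of selected atoms
    coef = np.zeros(0)
    residual = y.astype(float).copy()
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break                         # y is already represented
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0       # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        # Refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Toy overcomplete dictionary: 2-dimensional signals, 3 unit-norm atoms.
D = np.array([[1.0, 0.0, 1.0 / np.sqrt(2.0)],
              [0.0, 1.0, 1.0 / np.sqrt(2.0)]])
y = 3.0 * D[:, 2]                         # signal built from the third atom
x = omp(D, y, n_nonzero=2)
```

In this toy case the signal is an exact multiple of the third atom, so OMP selects that atom first and stops with a single non-zero coefficient.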

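CSC workflow

The CSC workflow can be divided into the following steps:

1. Data preparation: collect and preprocess the input signal data and convert it into a suitable representation.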


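Research on super-resolution reconstruction algorithms based on dictionary learning and sparse representation

and corresponding low-resolution image patches using K-Singular Value Decomposition (K-SVD) and the idea of joint dictionary training for super-resolution reconstruction. This method will improve the dictionary training representation error and its lack of adaptability in the above mentioned classical super-resolution reconstruction algorithms, as well as the adaptive algorithm of sparse representation, such as the optimization models, solving methods, dictionary construction methods of sparse representation, and so on. Then apply these technologies to the image

the problem that the reconstruction model constrained by two regularization terms has high computational complexity. This thesis proposes a super-resolution reconstruction algorithm based on structural clustering and principal component analysis (PCA) sub-dictionary learning. First, a sub-dictionary training method based on structural clustering and the PCA transform is used to obtain sub-dictionaries with stronger sparse representation ability for super-resolution reconstruction. Second, in the image reconstruction stage, the degradation-restoration model proposed by Dong is optimized by removing the two high-complexity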

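Sparse representation summary

Study notes on sparse representation for object detection

1. The rise of sparse representation

A large body of research shows that the visual cortex represents complex stimuli according to a sparse coding principle. Sparse representation methods built on sparse coding capture well how the human visual system perceives images. They have attracted great interest, are widely used in machine learning and image processing, and are one of the current research hot spots in China and abroad.

[1] Vinje W E, Gallant J L. Sparse coding and decorrelation in primary visual cortex during natural vision [J]. Science, 2000, 287(5456): 1273-1276.
[2] Nirenberg S, Carcieri S, Jacobs A, et al. Retinal ganglion cells act largely as independent encoders [J]. Nature, 2001, 411(6838): 698-701.
[3] Serre T, Wolf L, Bileschi S, et al. Robust object recognition with cortex-like mechanisms [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(3): 411-426.
[4] Zhao Songnian, Yao Li, Jin Zhen, et al. Sparse representation of global features of visual images in human primary visual cortex: evidence from brain functional imaging [J]. Chinese Science Bulletin, 2008, 53(11): 1296-1304.

Research on sparse image representation follows two main lines: single-basis methods and multi-basis methods. The former is mainly the theory of multiscale geometric analysis, which holds that images are non-stationary and non-Gaussian and hard to handle with linear algorithms, so image models suited to the geometric structure of edges and textures at every level should be built. Multiscale geometric analysis methods, represented by the ridgelet and curvelet transforms, have become an effective route to sparse image representation. The latter builds on the overcomplete dictionary decomposition theory proposed by Mallat and Zhang, and adaptively selects, according to the characteristics of the signal itself, a redundant basis that can represent the signal sparsely.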

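Sparse representation

$$\sigma(D) > 2\,\|\alpha\|_0$$

Here $\sigma(D)$ denotes the smallest number of columns of D that are linearly dependent. Under this condition the 0-norm optimization problem has a unique solution. Even with uniqueness proved, however, solving the problem is still NP-hard.
Moving on to 2006, the mathematician Terence Tao enters the story: Tao and Candès, a student of Donoho, proved together that under the RIP condition the 0-norm optimization problem has the same solution as the corresponding 1-norm optimization problem.
Thank you!
α=(0,0,0.75)
α=(0,0.24,0.75)
α=(0,0.24,0.75)
α=(0,0.24,0.65)
For the step above that finds the best-matching atom by inner products, the number of atoms was too large, so I wondered whether this step could be optimized and used PSO (particle swarm optimization) to search for the best atom. It is simpler than a genetic algorithm, and I find it quite an interesting algorithm. Learning-based methods:
Different input stimuli (different photos) activate different neurons.
The idea is to imitate the perception mechanism of the human visual system to form a sparse representation of images. Each atom in the dictionary is treated as a neuron, the whole dictionary corresponds to the full population of neurons in the human visual cortex, and the atoms in the dictionary resemble neurons in the visual cortex. A Gabor function is used as the receptive field function of simple cells and describes their response characteristics.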
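$$g_K(x, y) = \exp\left(-\frac{\tilde{x}^2 + k^2\,\tilde{y}^2}{2\sigma^2}\right)\cos\left(2\pi\,\frac{\tilde{x}}{\lambda}\right)$$
$$\tilde{x} = (x - x_0)\cos\theta + (y - y_0)\sin\theta$$
$$\tilde{y} = -(x - x_0)\sin\theta + (y - y_0)\cos\theta$$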
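Gabor function
Applications of sparse representation. Image restoration: the result on the right is restored from the image on the left.
Image inpainting: the image on the left is inpainted to give the result on the right.
Image deblurring: the top left is the blurred input image, the bottom right is the sharp output image, and the images in between show the iterations.
Object detection
Bicycle: input image on the left, location probability map in the middle, detection result on the right.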

Sparse coding SR workflow

English version

Sparse Coding SR (Super-Resolution) Process

In the realm of computer vision and image processing, sparse coding has emerged as a powerful tool for enhancing image quality. Among its many applications, sparse coding has been particularly effective in super-resolution (SR) techniques, where it aims to reconstruct high-resolution images from their low-resolution counterparts. This article outlines the basic steps involved in the sparse coding SR process.

1. Understanding Sparse Coding

Sparse coding is a form of dimensionality reduction where a signal is represented as a linear combination of a small number of elements from a larger dictionary of elements. In the context of images, this dictionary typically consists of image patches or features. The sparsity constraint ensures that only a few of these patches contribute significantly to the reconstruction of the original image.

2. Preparing the Low-Resolution Image

Before applying sparse coding for SR, the low-resolution image must be preprocessed. This involves scaling the image to the desired size and potentially applying other image enhancement techniques such as denoising or contrast enhancement.

3. Constructing the Dictionary

The next step is to construct a dictionary of high-resolution image patches. These patches are typically extracted from a large collection of high-resolution images or can be learned through an optimization process. The goal is to have a diverse set of patches that can effectively represent a wide range of textures and features.

4. Sparse Coding

With the dictionary in place, the low-resolution image is divided into overlapping patches. Each patch is then represented as a sparse combination of patches from the dictionary. This sparse representation is obtained by solving an optimization problem that minimizes the reconstruction error while enforcing sparsity.

5. Reconstruction of High-Resolution Patches

Using the sparse codes obtained in the previous step, high-resolution patches are reconstructed. This is done by mapping the sparse codes back to the dictionary and retrieving the corresponding high-resolution patches.

6. Merging the High-Resolution Patches

The reconstructed high-resolution patches are then merged to form a complete high-resolution image. This merging process requires careful handling to avoid artifacts and ensure smooth transitions between patches.

7. Post-Processing and Enhancement

Finally, the reconstructed high-resolution image may undergo further post-processing steps such as sharpening, color correction, or noise reduction to further enhance its quality.

Conclusion

The sparse coding SR process is an effective way to improve image quality by leveraging the sparse representation of signals. By carefully constructing a dictionary, performing sparse coding, and reconstructing high-resolution patches, this method enables the reconstruction of high-quality images from their low-resolution counterparts.

Chinese version

Sparse coding SR (super-resolution) workflow

In computer vision and image processing, sparse coding is a powerful tool that markedly improves image quality.
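The patch handling in steps 4 and 6 above (splitting an image into overlapping patches and merging processed patches back together) can be sketched as follows. This is only a sketch under stated assumptions: the function names `extract_patches` and `merge_patches`, the patch size, and the stride are illustrative, and overlaps are merged by plain averaging, which is one common way to avoid visible seams rather than the method of any particular paper. The sparse coding and dictionary lookup that would happen between the two calls are omitted.

```python
import numpy as np

def extract_patches(img, size, stride):
    """Cut a 2-D image into overlapping square patches.

    Returns the stacked patches and the top-left corner of each one.
    """
    h, w = img.shape
    coords = [(r, c)
              for r in range(0, h - size + 1, stride)
              for c in range(0, w - size + 1, stride)]
    patches = np.stack([img[r:r + size, c:c + size] for r, c in coords])
    return patches, coords

def merge_patches(patches, coords, shape):
    """Rebuild an image by averaging patches where they overlap."""
    acc = np.zeros(shape)                 # sum of patch values per pixel
    cnt = np.zeros(shape)                 # number of patches per pixel
    size = patches.shape[1]
    for patch, (r, c) in zip(patches, coords):
        acc[r:r + size, c:c + size] += patch
        cnt[r:r + size, c:c + size] += 1
    cnt[cnt == 0] = 1                     # uncovered pixels stay at zero
    return acc / cnt

img = np.arange(36, dtype=float).reshape(6, 6)
patches, coords = extract_patches(img, size=4, stride=2)
merged = merge_patches(patches, coords, img.shape)
```

When the patches are left unmodified, merging returns the original image, which is a convenient sanity check before inserting a per-patch reconstruction step.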

Robust face super-resolution reconstruction based on sparse coding

Liu Fanghua; Ruan Ruolin; Ni Hao; Wang Jianfeng

Abstract: In order to reduce the artifacts and noise along the edges of face super-resolution images, the improved algorithm uses a single-image super-resolution model based on sparse coding. In the dictionary learning phase, the L1 norm is combined with online dictionary learning, which is used as the dictionary training method. The dictionary is updated column by column according to the current input image patches and the dictionary from the previous iteration, so that more accurate overcomplete dictionary pairs are obtained to reconstruct the final image. Simulation results show that the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the proposed method are considerably higher than those of the sparse coding super-resolution algorithm (SCSR) and the online dictionary learning super-resolution algorithm (ODLSR). The average improvement over the latter is 0.72 dB and 0.0187, respectively. The artifacts along the edges are eliminated effectively, and the denoising capability and robustness of the proposed algorithm are better than those of both SCSR and ODLSR when processing noisy face images.

Journal: Telecommunication Engineering (电讯技术)
Year (volume), issue: 2017, 57(8)
Pages: 6 (pp. 957-962)
Keywords: face image; super-resolution reconstruction; sparse coding; online dictionary learning
Author affiliations: School of Electronic and Information Engineering, Hubei University of Science and Technology, Xianning, Hubei 437100; School of Biomedical Engineering, Hubei University of Science and Technology, Xianning, Hubei 437100; School of Electronic and Information Engineering, Hubei University of Science and Technology, Xianning, Hubei 437100; Network Management Center, Hubei University of Science and Technology, Xianning, Hubei 437100
Language of the full text: Chinese
Chinese Library Classification: TN919.8

In practical applications such as security surveillance and digital entertainment, low-resolution images captured from a long distance or with low-resolution devices limit subsequent image processing and display, for example face recognition and display on high-definition devices.

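Application of sparse coding in image retrieval

Huang Jin; Sun Yang; Xu Haoran

Abstract: As a branch of deep learning, sparse coding has achieved breakthroughs in several areas of machine learning. This paper explores how to integrate sparse coding into multiple modules of an image retrieval system and how to use its strengths to improve retrieval performance.

Journal: Digital Technology and Application (数字技术与应用)
Year (volume), issue: 2013, (11)
Pages: 3 (pp. 76-77, 81)
Keywords: image retrieval; sparse coding; deep learning
Author affiliations: College of Computer Science, Sichuan University, Chengdu, Sichuan 610065 (all three authors)
Language of the full text: Chinese
Chinese Library Classification: TP391.41

As the number of images on the Internet grows geometrically, the large volume of unlabeled images and the inaccuracy of labels mean that label-based image retrieval can no longer meet demand, and content-based image retrieval has become the trend.

In recent years sparse coding, as a branch of deep learning, has achieved good results in many fields, especially in image recognition and image processing.

This paper discusses how to integrate sparse coding into certain modules of content-based image retrieval to obtain better retrieval results. It focuses on the background of sparse coding and on the methods and significance of integrating it into the image preprocessing, feature extraction, and feature fusion modules.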

SRC

Classification based on sparse representation
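SRC (sparse representation-based classifier): classification based on sparse representation.
The concept of sparse coding comes from neurobiology. Biologists have proposed that, over a long period of evolution, mammals developed visual neural abilities that represent natural images quickly, accurately, and at low cost. Every scene we see contains hundreds of millions of pixels, and our brains can hardly store it directly the way a computer does. Research shows that from each image we extract only a small amount of information for storage. This is called sparse coding.
Letter arrangement
S.R.C. (Same Raw Continuous) refers to a run of at least 3 letters of the same case that are adjacent in the same row under some letter layout; it is also called zxc. The QWERTY keyboard is one such layout, and zxc is one such letter combination.
Concrete
Steel reinforced concrete members are abbreviated as SRC members.
They are concrete members that contain a steel skeleton and flexible reinforcing bars placed as specified. They include steel reinforced concrete beams, columns, shear walls, and tubes.
Using an SRC saves the sound card one crystal oscillator.
Gene
The src gene (sarcoma gene) is a gene in the genome of the Rous sarcoma virus (RSV), a chicken sarcoma virus. It can cause sarcomas in chickens and was the first viral oncogene to be identified.
In 1970, Peter Vogt isolated a mutant of the Rous virus that could infect cells and replicate but could not cause cancer. Because the mutant had lost only the ability to transform normal cells into cancer cells, the mutated gene was inferred to be the gene that induces cancer. Later analysis showed that the mutant lacked just one gene. Because the loss of this gene prevents the formation of sarcomas, the gene was named src.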

(Murray:2004) IEEE Workshop on Machine Learning for Signal Processing, Sao Luis, Brazil, Sep 2004

SPARSE IMAGE CODING USING LEARNED OVERCOMPLETE DICTIONARIES

Joseph F. Murray¹ and Kenneth Kreutz-Delgado
University of California, San Diego
Electrical and Computer Engineering
9500 Gilman Dr Dept 0407
La Jolla CA 92093-0407
Email: jfmurray@, kreutz@

Abstract. Images can be coded accurately using a sparse set of vectors from an overcomplete dictionary, with potential applications in image compression and feature selection for pattern recognition. We discuss algorithms that perform sparse coding and make three contributions. First, we compare our overcomplete dictionary learning algorithm (FOCUSS-CNDL) with overcomplete Independent Component Analysis (ICA). Second, noting that once a dictionary has been learned in a given domain the problem becomes one of choosing the vectors to form an accurate, sparse representation, we compare a recently developed algorithm (Sparse Bayesian Learning with Adjustable Variance Gaussians) to well known methods of subset selection: Matching Pursuit and FOCUSS. Third, noting that in some cases it may be necessary to find a non-negative sparse coding, we present a modified version of the FOCUSS algorithm that can find such non-negative codings.

INTRODUCTION

We discuss the problem of representing images with a highly sparse set of vectors drawn from a learned overcomplete dictionary. The problem has received considerable attention since the work of Olshausen and Field [8], who suggest that this is the strategy used by the visual cortex for representing images. The implication is that a sparse, overcomplete representation is especially suitable for visual tasks such as object detection and recognition that occur in higher regions of the cortex. A key result of this line of work is that images (and other data) can be coded more efficiently using a learned basis than with a non-adapted basis (e.g. wavelet and Gabor dictionaries) [5]. Our earlier work has shown that overcomplete codes can be more
efficient than complete codes in terms of entropy (bits/pixel), even though there are many more coefficients than image pixels in an overcomplete code [4]. Non-learned dictionaries (often composed of Gabor functions) are used to generate the features in many pattern recognition systems [12], and we believe that their performance could be improved using learned dictionaries that are adapted to the image statistics of the inputs.

Another natural application of sparse image coding is image compression. Standard compression methods such as JPEG use a fixed, complete basis (e.g. discrete cosines). Compression systems (based on methods closely related to those presented here) have shown that using learned overcomplete dictionaries can provide improved compression over such standard techniques [2]. Other applications of sparse coding include high-resolution spectral estimation, direction-of-arrival estimation, speech coding, biomedical imaging and function approximation [10].

In some problems, we may desire (or the physics of the problem may dictate) non-negative sparse codings. A multiplicative algorithm for non-negative coding was developed and applied to images [3]. A non-negative Independent Component Analysis (ICA) algorithm was presented in [9] (which also discusses other applications). In [3, 9] only the complete case was considered. Here, we present an algorithm that can learn non-negative sources from an overcomplete dictionary, which leads naturally to a learning method that adapts the dictionary for such sources.

¹ J. F. Murray was supported by the Sloan Foundation and the ARCS Foundation.

SPARSE CODING AND VECTOR SELECTION

The problem of sparse coding is that of representing some data $y \in \mathbb{R}^m$ (e.g. a patch of an image) using a small number of non-zero components in a source vector $x \in \mathbb{R}^n$ under the linear model

$$y = Ax + \nu, \tag{1}$$

where the dictionary $A \in \mathbb{R}^{m \times n}$ may be overcomplete ($n \ge m$), and the additive noise $\nu$ is assumed to be Gaussian, $p_\nu = \mathcal{N}(0, \sigma_\nu^2)$. By assuming a prior $p_X(x)$ on the sources, we can formulate the problem in a
Bayesian framework and find the maximum a posteriori solution for $x$,

$$\hat{x} = \arg\max_x\, p(x \mid A, y) = \arg\max_x\, \left[\log p(y \mid A, x) + \log p_X(x)\right]. \tag{2}$$

By making an appropriate choice for the prior $p_X(x)$, we can find solutions with high sparsity (i.e. few non-zero components). We define sparsity as the number of elements of $x$ that are zero, and the related quantity diversity as the number of non-zero elements, so that diversity $= (n - \text{sparsity})$. We assume the prior distribution of the sources $x$ is a generalized exponential of the form

$$p_X(x) = c_x\, e^{-\gamma_p d_p(x)}, \tag{3}$$

where the parameter $p$ determines the shape of the distribution and $c_x$ is a normalizing constant to ensure $p_X(x)$ is a density function. A common choice for the prior on $x$ is for the function $d_p(x)$ to be the p-norm-like measure,

$$d_p(x) = \|x\|_p^p = \sum_{i=1}^{n} |x[i]|^p, \quad 0 \le p \le 1, \tag{4}$$

where $x[i]$ are the elements of the vector $x$. When $p = 0$, $d_p(x)$ is a count of the number of non-zero elements of $x$ (diversity), and so $d_p(x)$ is referred to as a diversity measure [4]. With these choices for $d_p(x)$ and $p_\nu$, we find that

$$\hat{x} = \arg\max_x\, \left[\log p(y \mid A, x) + \log p_X(x)\right] = \arg\min_x\, \|y - Ax\|^2 + \lambda \|x\|_p^p. \tag{5}$$

When $p \to 0$ we obtain an optimization problem that directly minimizes the reconstruction error and the diversity of $x$. When $p = 1$ the problem no longer directly minimizes diversity, but the right-hand side of (5) has the desirable property of being globally convex and so has no local minima. The $p = 1$ cost function is used by the Basis Pursuit algorithm [13].

FOCUSS and Non-negative FOCUSS

For a given, known dictionary $A$, the Focal Underdetermined System Solver (FOCUSS) was developed to solve (5) for $p \le 1$ [10]. The algorithm is an iterative re-weighted factored-gradient approach, and has consistently shown better performance than vector-selection algorithms such as Basis Pursuit and the greedy Matching Pursuit, although at a cost of increased computation [10]. Previous versions of FOCUSS have assumed that $x$ is unrestricted on $\mathbb{R}^n$. In some cases however, we may require that the sources be non-negative, $x[i] \ge 0$. This amounts to a change of prior on $x$ from symmetric to one-sided, but this results in nearly the same optimization problem as (5). To create a non-negative FOCUSS algorithm, we need to ensure that the $x[i]$ are initialized to non-negative values, and that each iteration keeps the sources in the feasible region. To do so, we propose the non-negative FOCUSS algorithm,

$$
\begin{aligned}
\Pi^{-1}(\hat{x}_k) &= \operatorname{diag}\left(|\hat{x}_k[i]|^{2-p}\right) \\
\lambda_k &= \lambda_{\max}\left(1 - \frac{\|y_k - A\hat{x}_k\|}{\|y_k\|}\right), \quad \lambda_k > 0 \\
\hat{x}_k &\leftarrow \Pi^{-1}(\hat{x}_k)\, A^T \left(\lambda_k I + A\, \Pi^{-1}(\hat{x}_k)\, A^T\right)^{-1} y_k \\
\hat{x}_k[i] &\leftarrow \begin{cases} 0 & \hat{x}_k[i] < 0 \\ \hat{x}_k[i] & \hat{x}_k[i] \ge 0, \end{cases}
\end{aligned}
\tag{6}
$$

where $\lambda_k$ is a heuristically-adapted regularization term, limited by $\lambda_{\max}$ which controls the tradeoff between sparsity and reconstruction error (higher values of $\lambda$ lead to more sparse solutions, at the cost of increased error). We denote this algorithm FOCUSS+, to distinguish it from the FOCUSS algorithm [4] which omits the last line of (6). The estimate of $x$ is refined over iterations of (6) and usually 10 to 50 iterations are needed for convergence (defined as the change in $x$ being smaller than some threshold from one iteration to the next).

Sparse Bayesian Learning with Adjustable Variance Gaussian Priors (SBL-AVG)

Recently, a new class of Bayesian model characterized by Gaussian prior sources with adjustable variances has been developed [11]. These models use the linear generating model (1) for the data $y$ but instead of using a non-Gaussian sparsity inducing prior on the sources $x$ (as FOCUSS does), they use a flexibly-parameterized Gaussian prior,

$$p_X(x) = p(x \mid \alpha) = \prod_{i=1}^{n} \mathcal{N}\left(x[i] \mid 0, \alpha_i^{-1}\right), \tag{7}$$

where the variance hyperparameter $\alpha_i^{-1}$ can be adjusted for each component $x[i]$. When $\alpha_i^{-1}$ approaches zero, the density of $x[i]$ becomes sharply peaked making it very likely that the source will be zero, increasing the sparsity of the code. The algorithm for estimating the sources has been termed Sparse Bayesian Learning (SBL), but we find this term to be too general, as other algorithms (including the older FOCUSS algorithm) also estimate sparse components in a Bayesian framework. We use the term SBL-AVG (Adjustable
Variance Gaussian) to be more specific.

To ensure that the prior probability $p(x \mid \alpha)$ is sparsity-inducing, an appropriate prior on the hyperparameter $\alpha$ must be chosen. In general, a $\text{Gamma}(\alpha_i \mid a, b)$ distribution can be used for the prior of $\alpha_i$, and in particular with $a = b = 0$, the prior on $\alpha_i$ becomes uniform. This leads to $p(x[i])$ having a Student's t-distribution which qualitatively resembles the p-norm-like distributions (with $0 < p \le 1$) used to enforce sparsity in FOCUSS and other algorithms.

SBL-AVG has been used successfully for pattern recognition, with performance comparable to Support Vector Machines (SVMs) [11]. In these applications the known dictionary $A$ is a kernel matrix created from the training examples in the pattern recognition problem just as with SVMs. The performance of SBL-AVG was similar to SVM in terms of error rates, while using far fewer support vectors (non-zero $x_i$) resulting in simpler models. Theoretical properties of SBL-AVG for subset selection have been elucidated [13], and simulations on synthetic data show superior performance over FOCUSS and other basis selection methods. To our knowledge, results have not been previously reported for SBL-AVG on image coding.

Modified Matching Pursuit (MMP): Greedy vector selection

Many variations on the idea of matching pursuit, or greedy subset selection, have been developed. Here, we use Modified Matching Pursuit (MMP) [1] which selects each vector (in series) to minimize the residual representation error. The simpler Matching Pursuit (MP) algorithm is more computationally efficient, but provides less accurate reconstruction. More details and comparisons can be found in [1]. For the case of non-negative sources, matching pursuit can be suitably adapted, and we call this algorithm MP+.

DICTIONARY LEARNING ALGORITHMS

In the previous section we discussed algorithms that accurately and sparsely represent a signal using a known, predefined dictionary $A$. Intuitively, we would expect that if $A$ were adapted to the statistics of a particular problem that better and
sparser representations could be found. This is the motivation that led to the development of the FOCUSS-CNDL dictionary learning algorithm. Dictionary learning is closely related to the problem of ICA, which usually deals with complete $A$ but can be extended to overcomplete $A$ [6].

FOCUSS-CNDL

The FOCUSS-CNDL algorithm solves the problem (1) when both the sources $x$ and the dictionary $A$ are assumed to be unknown random variables [4]. The algorithm contains two major parts, a sparse vector selection step and a dictionary learning step, which are derived in a jointly Bayesian framework. The sparse vector selection is done by FOCUSS (or FOCUSS+ if non-negative $x_i$ are needed), and the dictionary learning $A$-update step uses gradient descent. With a set of training data $Y = (y_1, \ldots, y_N)$ we find the maximum a posteriori estimates $\hat{A}$ and $\hat{X} = (\hat{x}_1, \ldots, \hat{x}_N)$ such that

$$(\hat{A}, \hat{X}) = \arg\min_{A, X} \sum_{k=1}^{N} \left[ \| y_k - A x_k \|^2 + \lambda\, d_p(x_k) \right], \tag{8}$$

where $d_p(x_k) = \| x_k \|_p^p$ is the diversity measure (4), which measures the number of non-zero elements of a source vector $x_k$ (see above). The optimization problem (8) attempts to minimize the squared error of the reconstruction of $y_k$ while minimizing $d_p$ and hence the number of non-zero elements in $\hat{x}_k$. The problem formulation is similar to ICA in that both model the input $Y$ as being linearly generated by unknowns $A$ and $X$, but ICA attempts to learn a new matrix $W$ which by $W y_k = \hat{x}_k$ linearly produces estimates $\hat{x}_k$ in which the components $\hat{x}_{i,k}$ are as statistically independent as possible. ICA in general does not result in as sparse solutions as FOCUSS-CNDL, which specifically uses a sparsity-inducing non-linear iterative FOCUSS algorithm to find $\hat{x}_k$.

We now summarize the FOCUSS-CNDL algorithm, which was fully derived in [4]. For each of the $N$ data vectors $y_k$ in $Y$, we update the sparse source vectors $\hat{x}_k$ using one iteration of the FOCUSS or FOCUSS+ algorithm (6). After updating $\hat{x}_k$ for $k = 1 \ldots N$, the dictionary $\hat{A}$ is re-estimated,

$$\Sigma_{y\hat{x}} = \frac{1}{N} \sum_{k=1}^{N} y_k \hat{x}_k^T, \qquad \Sigma_{\hat{x}\hat{x}} = \frac{1}{N} \sum_{k=1}^{N} \hat{x}_k \hat{x}_k^T,$$

$$\delta\hat{A} = \hat{A}\, \Sigma_{\hat{x}\hat{x}} - \Sigma_{y\hat{x}},$$

$$\hat{A} \leftarrow \hat{A} - \gamma \left( \delta\hat{A} - \mathrm{tr}(\hat{A}^T \delta\hat{A})\, \hat{A} \right), \quad \gamma > 0, \tag{9}$$

where $\gamma$ is the learning rate parameter. Each iteration of FOCUSS-CNDL consists of updating all $\hat{x}_k$, $k = 1 \ldots N$, with one FOCUSS iteration (6), followed by a dictionary update (9) (which uses $\Sigma$ calculated from the updated $\hat{x}_k$ estimates). After each update of $\hat{A}$, the columns are adjusted to have equal norm $\| a_i \| = \| a_j \|$, in such a way that $\hat{A}$ has unit Frobenius norm, $\| \hat{A} \|_F = 1$.

Overcomplete Independent Component Analysis (ICA)

Another method for learning an overcomplete dictionary based on ICA was developed by Lewicki and Sejnowski [5,6]. In the overcomplete case, the sources must be estimated, as opposed to standard ICA (complete $A$), where the sources are found by multiplying by the learned matrix $W$, $\hat{x} = W y$. In [5] the sources are estimated using a modified conjugate gradient optimization of a cost function closely related to (5) that uses the 1-norm (derived using a Laplacian prior on $x$). The dictionary is updated by gradient ascent on the likelihood using a Gaussian approximation (cf. [5] eq. 20).

MEASURING PERFORMANCE

To compare the performance of image coding algorithms we need to measure two quantities: distortion and compression. As a measure of distortion we use a normalized root-mean-square-error (RMSE) calculated over all $N$ patches in the image,

$$\mathrm{RMSE} = \frac{1}{\sigma} \left[ \frac{1}{K} \sum_{k=1}^{N} \| y_k - A \hat{x}_k \|^2 \right]^{1/2}, \tag{10}$$

where $\sigma$ is the variance of the elements in all the $y_k$. Note that this is calculated over the image patches, leading to a slightly different calculation than the mean-square error over the entire image.

To measure how much a given transform algorithm compresses an image, we need a coding algorithm that maps which coefficients were used and their amplitudes into an efficient binary code. The design of such encoders is generally a complex undertaking, and is outside the scope of our work here.
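As a rough illustration, the dictionary update (9) and the normalized RMSE (10) can be sketched in NumPy. The function names, the layout with one patch per column of `Y`, and the choice to average the squared residual over every element are assumptions made for this sketch, not details taken from [4]. The FOCUSS step (6) that produces the source estimates is not shown.

```python
import numpy as np


def normalize_dictionary(A):
    """Rescale columns to a common norm so that the Frobenius norm of A is 1."""
    n = A.shape[1]
    col_norms = np.linalg.norm(A, axis=0)  # assumes no all-zero column
    return A / (col_norms * np.sqrt(n))


def dictionary_update(A, Y, X, gamma):
    """One gradient step of the form (9), followed by column renormalization.

    A: (m, n) dictionary, Y: (m, N) data, X: (n, N) current source estimates.
    """
    N = Y.shape[1]
    sigma_yx = (Y @ X.T) / N
    sigma_xx = (X @ X.T) / N
    delta = A @ sigma_xx - sigma_yx
    A_new = A - gamma * (delta - np.trace(A.T @ delta) * A)
    return normalize_dictionary(A_new)


def normalized_rmse(Y, A, X):
    """Root-mean-square reconstruction error divided by sigma, as in (10).

    Assumption: the mean runs over every element of the residual, and sigma
    is taken as the variance of the elements of Y, following the text.
    """
    residual = Y - A @ X
    return np.sqrt(np.mean(residual ** 2)) / np.var(Y)
```

When the sources reconstruct the data exactly, the gradient term in (9) vanishes and an already normalized dictionary is left unchanged, which makes a convenient sanity check for an implementation along these lines.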
Even without designing an encoder, information theory lets us estimate a lower bound on the number of bits required if we know the entropy of the input signal. Following the method of Lewicki and Sejnowski (cf. [6] eq. 13) we estimate the entropy of the coding using histograms of the quantized coefficients. Each coefficient of $\hat{x}_k$ is quantized to 8 bits (or 256 histogram bins). The number of coefficients in each bin is $c_i$. The limit on the number of bits needed to encode each input vector is

$$\#\mathrm{bits} \ge \mathrm{bits}_{\lim} \equiv -\sum_{i=1}^{256} \frac{c_i}{N} \log_2 f[i], \tag{11}$$

where $f[i]$ is the estimated probability distribution at each bin. We use $f[i] = c_i/(Nn)$, while in [6] a Laplacian kernel is used to estimate the density. The entropy estimate in bits/pixel is given by

$$\mathrm{entropy} = \frac{\mathrm{bits}_{\lim}}{m}, \tag{12}$$

where $m$ is the size of each image patch (the vector $y_k$). It is important to note that this estimate of entropy takes into account the extra bits needed to encode an overcomplete ($n > m$) dictionary, i.e. we are considering the bits used to encode each image pixel, not each coefficient.

EXPERIMENTS

Previous work has shown that learned complete bases can provide more efficient image coding (fewer bits/pixel at the same error rate) when compared with unadapted bases such as Gabor, Fourier, Haar and Daubechies wavelets [5]. In our earlier work [4] we showed that overcomplete dictionaries $A$ can give more efficient codes than complete bases. Here, our goal is to compare methods for learning overcomplete $A$ (FOCUSS-CNDL and overcomplete ICA), and methods for coding images once $A$ has been learned, including the case when the sources must be non-negative.

Comparison of dictionary learning methods

To provide a comparison between FOCUSS-CNDL and overcomplete ICA [6], both algorithms were used to train a $64 \times 128$ dictionary $A$ on a set of $8 \times 8$ pixel patches drawn from images of man-made objects. For FOCUSS-CNDL, training of $A$ proceeded as described in [4]. Once $A$ was learned, FOCUSS was used to compare image coding performance, with parameters $p = 0.5$, iterations $= 50$, and the regularization parameter $\lambda_{\max}$ was adjusted over
the range $[0.005, 0.5]$ to achieve different levels of compression. A separate test set was composed of 15 images of objects from the COIL database [7]. Figure 1 shows the image coding performance of dictionaries learned using FOCUSS-CNDL (which gave better performance) and overcomplete ICA. FOCUSS was used to code the test images, which may give an advantage to the FOCUSS-CNDL dictionary as it was able to adapt its dictionary to sources generated with FOCUSS (while overcomplete ICA uses a conjugate gradient method to find sources).

Figure 1: Image [...] learned with FOCUSS-CNDL [...]

Comparing FOCUSS [...]

In this experiment the [...] MMP, SBL-AVG and [...] dictionary on a set of [...] with FOCUSS-CNDL from the [...] the same 15 test images. For [...] $= 0.5$, $\lambda_{\max} \in [0.005, 0.5]$. For [...] and the fixed noise parameter $\sigma$ [...] the number of vectors selected [...]

Figure 2b-f shows examples of an image coded with the algorithms. FOCUSS was used in Figure 2b for low compression and Figure 2c for high compression. SBL-AVG was similarly used in Figure 2d and 2e. In both cases, SBL-AVG was more accurate and provided higher compression, e.g. MSE of 0.0021 vs. 0.0026 at entropy 0.54 vs. 0.78 bits/pixel. In terms of sparsity, Figure 2e requires only 154 nonzero coefficients (of 8192, or about 2%) to represent the image.

Figure 3a shows the tradeoff between accurate reconstruction (low RMSE) and compression (bits/pixel) as approximated by the entropy estimate (12). The lower right of the curves represents the higher accuracy/lower compression regime, and in this range SBL-AVG performs best, with lower RMSE at the same level of compression. At the most sparse representation (upper left of the curves), where only 1 or 2 dictionary vectors are used to represent each image patch, the MMP algorithm performed best. This is expected in the case of 1 vector per patch, where MMP finds the optimal single vector to match the input. Coding times per image on a 1.7 GHz AMD processor are: FOCUSS 15.64 sec, SBL-AVG 17.96 sec, MMP 0.21 sec.

Image coding with non-negative sources

Next, we investigate the performance tradeoff associated with using non-negative sources. Using the same set of images as in the previous section, we learn a new $A \in \mathbb{R}^{64 \times 128}$ using the non-negative FOCUSS+ algorithm (6) in the FOCUSS-CNDL dictionary learning algorithm (9). The image gray-scale pixel values are scaled to $y_i \in [0, 1]$ and the sources are also restricted to $x_i \ge 0$, but elements of the dictionary are not further restricted and may be negative. Once the dictionary has been learned, the same set of 15 images as above were coded using FOCUSS+. Figure 2g and 2h show an image coded using MP+ and FOCUSS+. FOCUSS+ is visually superior and provides higher quality reconstruction (MSE 0.0016 vs. 0.0027) at comparable compression rates (0.77 vs. 0.76 bits/pixel). Figure 3b shows the compression/error tradeoff when using non-negative sources to code the same set of test images as above. As expected, there is a reduction in performance when compared with methods that use positive and negative sources, especially at lower compression levels.

Figure 2: Images coded using an overcomplete dictionary. (a) Original image (b) FOCUSS 0.78 bpp (bits/pixel) (c) FOCUSS 0.56 bpp (d) SBL-AVG 0.68 bpp, 214 nonzero sources (out of 8192) (e) SBL-AVG 0.54 bpp, 154 nonzero sources (f) MMP 0.65 bpp (g) MP+ 0.76 bpp (h) FOCUSS+ 0.77 bpp, 236 nonzero sources. In (b)-(f) the dictionary was learned using FOCUSS-CNDL. In (g)-(h), non-negative codes were generated and the dictionary was learned with FOCUSS-CNDL+.

CONCLUSION

We have discussed methods for learning sparse representations of images using overcomplete dictionaries, and methods for adapting those dictionaries to the problem domain. Images can be represented accurately with a very sparse code, with on the order of 2% of the coefficients being nonzero. When the sources are unrestricted, $x \in \mathbb{R}^n$, the SBL-AVG algorithm provides the best performance, encoding images with fewer bits/pixel at the same error when compared with FOCUSS and Matching Pursuit. When the sources are required to be non-negative, $x[i] \ge 0$, the FOCUSS+ and associated dictionary
learning algorithm presented here provide the best performance.

REFERENCES

[4] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee and T. J. Sejnowski, "Dictionary Learning Algorithms for Sparse Representation," Neural Computation, vol. 15, no. 2, pp. 349–396, February 2003.
[5] M. S. Lewicki and B. A. Olshausen, "A Probabilistic Framework for the Adaptation and Comparison of Image Codes," J. Opt. Soc. Am. A, vol. 16, no. 7, pp. 1587–1601, July 1999.
[6] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, pp. 337–365, February 2000.
[7] S. A. Nene, S. K. Nayar and H. Murase, "Columbia Object Image Library (COIL-100)," Techn. Report CUCS-006-96, Columbia University, 1996.
[8] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?" Vis. Res., vol. 37, pp. 3311–3325, 1997.
[9] M. D. Plumbley, "Algorithms for nonnegative independent component analysis," IEEE Trans. Neural Net., vol. 14, no. 3, pp. 534–543, May 2003.
[10] B. D. Rao and K. Kreutz-Delgado, "An Affine Scaling Methodology for Best Basis Selection," IEEE Trans. Sig. Proc., vol. 47, pp. 187–200, 1999.
[11] M. E. Tipping, "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[12] D. M. Weber and D. Casasent, "Quadratic Gabor filters for object detection," IEEE Trans. Image Processing, vol. 10, no. 2, pp. 218–230, February 2001.
[13] D. P. Wipf and B. D. Rao, "Sparse Bayesian Learning for Basis Selection," to appear IEEE Trans. Sig. Proc., 2004.
