lect-1: Basic Theory of Compressed Sensing and Sparse Optimization


Coherence
Definition (Mallat and Zhang 1993)
The (mutual) coherence of a given matrix A is the largest absolute normalized inner product between different columns of A. Suppose A = [a_1, a_2, . . . , a_n]. The mutual coherence of A is given by

    μ(A) = max_{1 ≤ k, j ≤ n, k ≠ j} |a_k^T a_j| / (‖a_k‖_2 · ‖a_j‖_2)
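Unlike spark, coherence is cheap to compute: normalize the columns, form the Gram matrix, and take the largest off-diagonal absolute entry. A minimal MATLAB sketch (the matrix and its size are assumptions; vecnorm and implicit expansion need a recent MATLAB):

    A  = randn(64, 256);              % example matrix; sizes are an assumption
    An = A ./ vecnorm(A);             % normalize each column to unit 2-norm
    G  = abs(An' * An);               % |a_k' a_j| for all pairs (k, j)
    G(1:size(G,1)+1:end) = 0;         % zero the diagonal, enforcing k ~= j
    mu = max(G(:))                    % mutual coherence mu(A)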
Fundamental Question
The basic question of sparse optimization is: can I trust my model to return an intended sparse quantity? That is:
- Does my model have a unique solution? (Otherwise, different algorithms may return different answers.)
- Is the solution exactly equal to the original sparse quantity?
- If not (due to noise), is the solution a faithful approximation of it?
- How much effort is needed to numerically solve the model?
Lecture: Introduction to Compressed Sensing
Sparse Recovery Guarantees
http://bicmr.pku.edu.cn/~wenzw/bigdata2016.html
Acknowledgement: these slides are based on Prof. Emmanuel Candes' and Prof. Wotao Yin's lecture notes.
Spark
First questions for finding the sparsest solution to Ax = b:
- Can the sparsest solution be unique? Under what conditions?
- Given a sparse x, how to verify whether it is actually the sparsest one?
[Diagram: acquisition pipelines]
Traditional: x → Sample → Compress → Transmit / Store → Receive → Decompress → x̂
Compressive sensing (senses less, faster): x → Compressive sensing → Transmit / Store → Receive → Reconstruction → x̂
Spark
Theorem (Gorodnitsky and Rao 1997)
If Ax = b has a solution x obeying ‖x‖_0 < spark(A)/2, then x is the sparsest solution.

Proof idea: if there is another solution y to Ax = b with x − y ≠ 0, then A(x − y) = 0 and thus

    ‖x‖_0 + ‖y‖_0 ≥ ‖x − y‖_0 ≥ spark(A),

so ‖y‖_0 ≥ spark(A) − ‖x‖_0 > spark(A)/2 > ‖x‖_0.
ℓ0 and ℓ1 minimization

    min ‖x‖_0  s.t.  Ax = b        (combinatorially hard)
    min ‖x‖_1  s.t.  Ax = b        (a linear program)

Linear programming formulation:

    minimize Σ_i |x_i|  subject to  Ax = b

is equivalent to

    minimize Σ_i t_i  subject to  Ax = b, −t_i ≤ x_i ≤ t_i

with variables x, t ∈ R^n; x* is a solution ⇐⇒ (x*, t* = |x*|) is a solution.
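As a sketch of how this reformulation maps onto a solver, the LP can be passed to linprog from MATLAB's Optimization Toolbox; the helper name l1_by_linprog is mine, not from the slides:

    % Solve min ||x||_1 s.t. Ax = b via the equivalent LP in z = [x; t].
    function x = l1_by_linprog(A, b)
        [m, n] = size(A);
        f     = [zeros(n, 1); ones(n, 1)];   % objective: sum_i t_i
        Aineq = [ eye(n), -eye(n);           %  x - t <= 0
                 -eye(n), -eye(n)];          % -x - t <= 0
        bineq = zeros(2 * n, 1);
        Aeq   = [A, zeros(m, n)];            % equality constraint A x = b
        z     = linprog(f, Aineq, bineq, Aeq, b);
        x     = z(1:n);                      % discard the auxiliary t
    end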
Compressed sensing inspired new areas of research, e.g. low-rank matrix recovery.
A contemporary paradox
How to read guarantees
Some basic aspects that distinguish different types of guarantees:
- Recoverability (exact) vs stability (inexact)
- General A or special A?
- Universal (all sparse vectors) or instance (certain sparse vector(s))?
- General optimality? Or specific to model/algorithm?
- Required property of A: spark, RIP, coherence, NSP, dual certificate?
- If randomness is involved, what is its role?
- Condition/bound is tight or not? Absolute or in order of magnitude?
[Figure: solutions returned by three models, entries indexed 1–256]
(a) ℓ0-minimization:  min ‖x‖_0  s.t.  Ax = b
(b) ℓ2-minimization:  min ‖x‖_2  s.t.  Ax = b
(c) ℓ1-minimization:  min ‖x‖_1  s.t.  Ax = b
Sparsity and wavelet "compression"
- Take a mega-pixel image
- Compute 1,000,000 wavelet coefficients
- Set to zero all but the 25,000 largest coefficients
- Invert the wavelet transform
(A toy sketch of this recipe follows below.)
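A toy MATLAB sketch, assuming the Wavelet Toolbox (wavedec2/waverec2) and the standard cameraman test image; the 4-level db4 decomposition and the 2.5% keep-ratio are illustrative assumptions:

    X = double(imread('cameraman.tif'));   % grayscale test image
    [C, S] = wavedec2(X, 4, 'db4');        % 4-level 2-D wavelet decomposition
    k = round(0.025 * numel(C));           % keep only the ~2.5% largest coefficients
    [~, idx] = sort(abs(C), 'descend');
    Ck = zeros(size(C));
    Ck(idx(1:k)) = C(idx(1:k));            % zero all but the k largest
    Xk = waverec2(Ck, S, 'db4');           % invert the wavelet transform
    fprintf('relative error: %.3f\n', norm(X - Xk, 'fro') / norm(X, 'fro'));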
Definition (Donoho and Elad 2003)
The spark of a given matrix A is the smallest number of columns from A that are linearly dependent, written spark(A). (rank(A) is the largest number of columns from A that are linearly independent.) In general spark(A) ≤ rank(A) + 1, and the inequality can be strict; for many randomly generated matrices, however, equality holds. Rank is easy to compute, but spark needs a combinatorial search.
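The combinatorial search can be written down directly. A brute-force sketch (my own helper, exponential cost, so small matrices only):

    % spark(A): size of the smallest linearly dependent set of columns.
    function s = spark_bruteforce(A)
        n = size(A, 2);
        for k = 2:n
            for T = nchoosek(1:n, k)'      % each column is a size-k index subset
                if rank(A(:, T)) < k       % these k columns are dependent
                    s = k;
                    return;
                end
            end
        end
        s = Inf;                           % all columns independent
    end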
Compressed sensing
- Name coined by David Donoho
- Has become a label for sparse signal recovery
- But really one instance of underdetermined problems
- Informs analysis of underdetermined problems
- Changes viewpoint about underdetermined problems
- Starting point of a general burst of activity in information theory, signal processing, statistics, and some areas of computer science
http://bicmr.pku.edu.cn/~wenzw/courses/sparse_l1_example.m
Find the sparsest solution. Given n = 256, m = 128:

    n = 256; m = 128;
    A = randn(m, n);          % random Gaussian sensing matrix
    u = sprandn(n, 1, 0.1);   % sparse signal, about 10% nonzero entries
    b = A * u;                % measurements
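To complete the example, one could recover u by ℓ1 minimization, e.g. with the hypothetical l1_by_linprog helper sketched earlier, and compare against the true sparse signal:

    x = l1_by_linprog(A, b);               % l1 recovery from the measurements
    fprintf('relative recovery error: %.2e\n', norm(x - u) / norm(u));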
Special structure
If the unknown is assumed to be sparse or low-rank, then one can often find solutions to these problems by convex optimization.
Compressive Sensing
General Recovery - Spark
Rank is easy to compute, but spark needs a combinatorial search. However, for a matrix with entries in general position, spark(A) = rank(A) + 1. For example, if a matrix A ∈ R^{m×n} (m < n) has entries A_ij ~ N(0, 1), then rank(A) = m = spark(A) − 1 with probability 1. In general, for any full-rank matrix A ∈ R^{m×n} (m < n), any m + 1 columns of A are linearly dependent, so spark(A) ≤ m + 1 = rank(A) + 1.
This principle underlies modern lossy coders
Signals/images may not be exactly sparse
Comparison
Sparse representation = good compression
Why? Because there are fewer things to send/store
Underdetermined systems of linear equations
x ∈ R^n, A ∈ R^{m×n}, b ∈ R^m
- When there are fewer equations than unknowns (m < n)
- The fundamental theorem of linear algebra says that we cannot find x
- In general, this is absolutely correct
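Concretely, when m < n the solution set (if nonempty) is the affine subspace x0 + null(A), so x cannot be pinned down from b alone. A small MATLAB sketch with assumed sizes:

    A  = randn(3, 5);  b = randn(3, 1);    % assumed small sizes, m < n
    x0 = pinv(A) * b;                      % a particular (least-norm) solution
    N  = null(A);                          % orthonormal basis of null(A)
    x1 = x0 + N * randn(size(N, 2), 1);    % a different solution
    disp(norm(A * x1 - b))                 % ~ 0: x1 also satisfies Ax = b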
Back to the Spark theorem: the result does not mean that this x can be efficiently found numerically. For many random matrices A ∈ R^{m×n}, spark(A) = m + 1, so the result means that if an algorithm returns x satisfying ‖x‖_0 < (m + 1)/2, then x is optimal with probability 1. What to do when spark(A) is difficult to obtain? One answer is coherence, which is easy to compute.
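The optimality check implied here is cheap. A sketch, assuming A is a random Gaussian matrix (so spark(A) = m + 1 with probability 1) and x comes from an earlier solve:

    m = size(A, 1);
    k = nnz(abs(x) > 1e-8);                % numerical support size of x
    if k < (m + 1) / 2                     % spark bound: ||x||_0 < spark(A)/2
        disp('x is the sparsest solution with probability 1')
    end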
- Massive data acquisition
- Most of the data is redundant and can be thrown away
- Seems enormously wasteful
Sparsity in signal processing
Implication: can discard small coefficients without much perceptual loss