L07 Data Compression


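Code: $x_1 \to 0$, $x_2 \to 1$
Then the average codeword length is $\bar{L} = 1$, and the efficiency of this code is
$\eta = \frac{H(X)}{\bar{L}} = 0.811$
Efficiency of the second extension: $\eta_2 = 0.961$
By the same way, for the third extension: $\eta_3 = \frac{H(X)}{\bar{L}}$, where $\bar{L}$ is the average codeword length per source symbol.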
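$\bar{L} \ge \frac{H(X)}{\log D}$
The theorem states: on the other hand, a prefix code can always be found whose average codeword length satisfies
$\bar{L} < \frac{H(X)}{\log D} + 1$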
Prof. Jin Minglu (金明录)
Fundamentals of Applied Information Theory
DUT
Efficiency of Codes
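Definition: The efficiency of a prefix code is defined as
$\eta = \frac{H_D(X)}{\bar{L}} = \frac{H(X)}{\bar{L} \log D}$
Example: DMS
$\begin{pmatrix} X \\ p \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ 3/4 & 1/4 \end{pmatrix}$
Then the entropy is $H(X) = 0.811$ bits per source symbol.
Extension of the source: the second extension has the symbols $x_i$:
$x_1x_1$, $x_1x_2$, $x_2x_1$, $x_2x_2$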

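Variable-Length Source Coding Theorem
Also called the noiseless channel coding theorem.
After encoding, the code-symbol source should be as close to equiprobable as possible, so that each code symbol carries the maximum average information.
For lossless coding, the minimum average number of D-ary code symbols needed per source symbol is the entropy rate of the source.
The entropy rate of the source describes the minimum average number of bits needed per source symbol.
It is an existence theorem: it provides theoretical guidance.
It is a constructive theorem: it has led to the design of many concrete coding methods.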
$\eta = 0.811$ (single-symbol code)
$\eta_3 = 0.985$
$\eta_4 = \frac{H(X)}{\bar{L}} = 0.991$
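Variable-length source coding example
Example (fixed-length coding)
$\begin{pmatrix} S \\ P(s) \end{pmatrix} = \begin{pmatrix} s_1 & s_2 \\ 3/4 & 1/4 \end{pmatrix}$
We obtain
$H(S) = 0.811$, $D(I(s)) \approx 0.471$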
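$P(G_N) \ge 1 - \delta$; $P(\bar{G}_N) \le \delta$
Each sequence in $G_N$ has probability between $2^{-N(H(S)+\varepsilon)}$ and $2^{-N(H(S)-\varepsilon)}$.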
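Code alphabet: $X = \{x_1, x_2, \ldots, x_D\}$
Single-symbol source
$s_i \to w_i = (x_{i_1}, x_{i_2}, \ldots, x_{i_L})$
There are $q$ single source symbols and $D^L$ code sequences of length $L$.
$P(S_1, S_2, \ldots, S_N) \approx 2^{-N H(S)}$ for sequences in $G_N$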
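Variable-Length Source Coding Theorem
Lossless variable-length source coding theorem (Shannon's first theorem)
For a discrete memoryless stationary source $S$ with entropy rate $H(X)$ and code alphabet $X = \{x_1, \ldots, x_D\}$, there is always an encoding of $S$ that forms a uniquely decodable code whose average codeword length $\bar{L}$ per source symbol satisfies
$\frac{H(X)}{\log D} \le \bar{L} < \frac{H(X)}{\log D} + 1$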
Shannon-type code
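Figure: A Shannon-type code for the message $U$ of the example.
i.e., $H(U) \approx 1.846$ bits. We design a binary Shannon-type code:
$l_1 = \lceil \log_2 \frac{1}{0.4} \rceil = 2$, $l_2 = \lceil \log_2 \frac{1}{0.3} \rceil = 2$, $l_3 = \lceil \log_2 \frac{1}{0.2} \rceil = 3$, $l_4 = \lceil \log_2 \frac{1}{0.1} \rceil = 4$.
Note that
$2^{-2} + 2^{-2} + 2^{-3} + 2^{-4} = \frac{11}{16} < 1$,
i.e., such a code does exist, as expected. A possible choice for such a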
Shannon-type code
We need to show that such a code always exists. Note that by the definition of $l_i$,
$l_i \ge \log_D \frac{1}{p_i} \implies D^{-l_i} \le p_i$.
So we try to design a simple code following this main design principle. Assume that $P_U(u_i) = p_i > 0$ for all $i$ since we do not care about
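Fixed-length coding
Source alphabet: $S = \{s_1, s_2, \ldots, s_q\}$
Encoder
Code: $C = \{w_1, w_2, \ldots, w_q\}$
Fixed-length source coding theorem
For any discrete random sequence source, as the sequence length $N \to \infty$ the source sequences polarize into a set of high-probability events $G_N$ and a set of low-probability events $\bar{G}_N$; that is, the $q^N$ source sequences form $G_N \cup \bar{G}_N$.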
messages with zero probability.
Hence, we have
$\sum_i D^{-l_i} \le \sum_i p_i = 1$
Then, for every message $u_i$ define
$l_i \triangleq \lceil \log_D \frac{1}{p_i} \rceil$
and choose an arbitrary unique prefix-free codeword of length $l_i$.
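
The construction of a Shannon-type code can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the names `shannon_type_lengths` and `canonical_prefix_code` are hypothetical, the codeword assignment assumes a binary code alphabet, and the probabilities are those of the four-symbol example message (0.4, 0.3, 0.2, 0.1).

```python
import math
from fractions import Fraction

def shannon_type_lengths(probs, D=2):
    """Return l_i = ceil(log_D(1/p_i)) for strictly positive probabilities."""
    lengths = []
    for p in probs:
        l = 0
        # Smallest integer l with D**l >= 1/p, i.e. p * D**l >= 1.
        while p * D**l < 1:
            l += 1
        lengths.append(l)
    return lengths

def canonical_prefix_code(lengths):
    """Assign binary prefix-free codewords to lengths that satisfy the Kraft inequality."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words, code, prev = [None] * len(lengths), 0, 0
    for i in order:
        code <<= lengths[i] - prev      # extend the running codeword to the new length
        words[i] = format(code, "0{}b".format(lengths[i]))
        code += 1                       # next free codeword at this length
        prev = lengths[i]
    return words

probs = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_type_lengths(probs)
kraft = sum(Fraction(1, 2**l) for l in lengths)        # Kraft sum, exact
avg_len = sum(p * l for p, l in zip(probs, lengths))   # E[L]
entropy = -sum(p * math.log2(p) for p in probs)        # H(U) in bits
codewords = canonical_prefix_code(lengths)
print(lengths, kraft, round(avg_len, 3), round(entropy, 3), codewords)
```

Under these assumptions the lengths come out as 2, 2, 3, 4 with Kraft sum 11/16, so a prefix-free code with these lengths exists. The particular codewords produced by the helper are just one arbitrary valid choice.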
Shannon-type code
Example: As an example, consider a random message U with four
symbols having probabilities p1 = 0.4, p2 = 0.3, p3 = 0.2, p4 = 0.1
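Then, if we require
$\eta = 0.96$, $\delta \le 10^{-5}$,
we need
$N \ge 4.13 \times 10^{7}$
Compared with the previous example: for the same source, when a coding efficiency of 96% is required, the fixed-length code needs about 41 million source symbols to be encoded jointly, while the variable-length code needs only 2 symbols (the second extension of the source) to be encoded jointly. Conclusion: with variable-length coding, L (the number of jointly encoded source symbols) does not need to be large to reach a high coding efficiency, and lossless coding can be achieved. As L increases, the coding efficiency gets closer and closer to 1.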
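
The block length required by the fixed-length code can be checked numerically. The sketch below assumes the usual Chebyshev-type bound $N \ge D[I(s)] / (\varepsilon^2 \delta)$ with $\varepsilon = H(S)(1-\eta)/\eta$ (obtained from $\eta = H(S)/(H(S)+\varepsilon)$); the slides do not spell this formula out in this place, so treat it as an assumption. The function name `fixed_length_block_size` is hypothetical.

```python
import math

def fixed_length_block_size(probs, eta, delta):
    """Lower bound on N for fixed-length coding with efficiency eta and error bound delta."""
    H = -sum(p * math.log2(p) for p in probs)                    # entropy H(S)
    var = sum(p * math.log2(p) ** 2 for p in probs) - H ** 2     # variance of self-information
    eps = H * (1 - eta) / eta                                    # from eta = H / (H + eps)
    return var / (eps ** 2 * delta), H, var

N, H, var = fixed_length_block_size([3 / 4, 1 / 4], eta=0.96, delta=1e-5)
print(round(H, 3), round(var, 3), "{:.2e}".format(N))
```

With unrounded intermediate values the bound comes out at roughly $4.1 \times 10^7$, in line with the figure of about 41 million source symbols; small differences in the last digit depend on how $H(S)$ and the variance are rounded.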
Course contents
Introduction and Preview
Entropy, Relative Entropy, and Mutual Information
Asymptotic Equipartition Property
Entropy Rates of a Stochastic Process
Data Compression
Channel Capacity
Differential Entropy
Gaussian Channel
Rate Distortion Theory
Network Information Theory
What We Can Do: Analysis of Some Good Codes
1、Shannon-type code
2、Shannon-Fano-Elias coding
3、Shannon-Fano coding
4、Huffman coding
5、Arithmetic codes
6、The Lempel-Ziv Algorithm
7、Run Length Encoding
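Fixed-length source coding theorem
Set of all source sequences: $q^N$ sequences
$(1-\delta)\, 2^{N(H(S)-\varepsilon)} \le \|G_N\| \le 2^{N(H(S)+\varepsilon)}$
$q \le D^L$
$\frac{\|G_N\|}{q^N} \le \frac{2^{N[H(S)+\varepsilon]}}{q^N} = 2^{-N[\log q - H(S) - \varepsilon]}$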
Shannon-type code
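Since $l_i < \log_D \frac{1}{p_i} + 1$, we have
$E[L] = \sum_i p_i l_i < \sum_i p_i \left( \log_D \frac{1}{p_i} + 1 \right) = \frac{H(U)}{\log D} + 1$.
Let’s compute the efficiency of the Shannon-type code designed before and compare:
$E[L] = 1 + 0.7 + 0.3 + 0.3 + 0.1 = 2.4$, which satisfies, as predicted, $1.846 \le 2.4 < 1.846 + 1$.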
code is shown in the figure below.
Shannon-type code
Next, let’s see how efficient such a code is. To that goal we note that
$l_i < \log_D \frac{1}{p_i} + 1$.
But obviously the code shown in the figure is much better than the Shannon-type code!
Its performance is $E[L] = 2 < 2.4$. We see that a Shannon-type code (even though it is not an optimal
Second semester, 12-13 academic year
Contents
Review
Source Coding
Tree of Code
Kraft inequality
What We Cannot Do: Fundamental Limitations of Source Coding
Summary of Source Coding Theorem
1、Fixed length
2、Variable length
Efficiency of Codes
What We Can Do: Analysis of Some Good Codes
Summary
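Second extension of the source, encoded with a prefix code:

x_i     P(x_i)   code
x1x1    9/16     0
x1x2    3/16     10
x2x1    3/16     110
x2x2    1/16     111

$H(X) = \frac{1}{4} \log 4 + \frac{3}{4} \log \frac{4}{3} = 0.811$
$\bar{L}_2 = \frac{9}{16} \cdot 1 + \frac{3}{16} \cdot 2 + \frac{3}{16} \cdot 3 + \frac{1}{16} \cdot 3 = \frac{27}{16}$
For every source symbol: $\bar{L} = \frac{\bar{L}_2}{2} = \frac{27}{32}$
$\eta_2 = \frac{H(X)}{\bar{L}} = 0.961$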
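
The efficiency figures for this example can be reproduced with a short script. This is a sketch under stated assumptions: the variable names are illustrative, the code alphabet is binary (so $\log D = 1$), and the codeword table is the one given for the second extension (0, 10, 110, 111).

```python
import math
from fractions import Fraction
from itertools import product

# Source probabilities of the DMS in the example.
p = {"x1": Fraction(3, 4), "x2": Fraction(1, 4)}
H = -sum(float(q) * math.log2(float(q)) for q in p.values())   # entropy in bits

# Single-symbol code x1 -> 0, x2 -> 1: average length 1.
eta1 = H / 1

# Prefix code for the second extension of the source.
code2 = {("x1", "x1"): "0", ("x1", "x2"): "10",
         ("x2", "x1"): "110", ("x2", "x2"): "111"}
L2 = sum(p[a] * p[b] * len(code2[(a, b)]) for a, b in product(p, repeat=2))
L_per_symbol = L2 / 2                  # code symbols per source symbol
eta2 = H / float(L_per_symbol)

print(round(H, 3), L2, L_per_symbol, round(eta1, 3), round(eta2, 3))
```

Exact fractions are used for the average length so that the values 27/16 and 27/32 appear without rounding; only the entropy and the efficiencies are floating-point.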
code!) comes within less than 1 of the ultimate lower bound! An optimal code will be even better than that!
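
The remark that an optimal code does even better can be checked with Huffman coding, which is listed among the good codes in this lecture and is known to yield an optimal prefix code. The following sketch is an illustration only: the helper name `huffman_lengths` is hypothetical, a binary code alphabet is assumed, and only codeword lengths (not the codewords themselves) are computed for the example probabilities 0.4, 0.3, 0.2, 0.1.

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Return binary Huffman codeword lengths for the given probabilities."""
    tie = itertools.count()   # tie-breaker so the heap never compares the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # two least likely subtrees
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1               # every leaf below the merge gets one level deeper
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
lengths = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(lengths, round(avg_len, 3))
```

For this message the Huffman lengths are 1, 2, 3, 3, giving $E[L] = 1.9$, which is below both the $E[L] = 2$ of the code in the figure and the $E[L] = 2.4$ of the Shannon-type code, and still above the entropy of about 1.846 bits.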

Efficiency of Codes
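$\eta = \frac{H_D(X)}{\bar{L}} = \frac{H(X)}{\bar{L} \log D}$
Physical meaning:
Before encoding, the average information carried by each source symbol is $H(X)$.
After encoding, the maximum average information that can be carried per source symbol is $\bar{L} \log D$.
The coding efficiency therefore measures how efficiently the code symbols are used.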
2 N 1 2 L
G N
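N-th extension of the source
$\alpha_i = (s_{i_1}, s_{i_2}, \ldots, s_{i_N}) \to W_i = (x_{i_1}, x_{i_2}, \ldots, x_{i_L})$
$q^N \le D^L$
There are $q^N$ source sequences of length $N$ and $D^L$ code sequences of length $L$.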
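If $\frac{L}{N} \ge H_D(X) + \varepsilon$: lossless coding is possible; the decoding error probability tends to 0.
If $\frac{L}{N} \le H_D(X) - 2\varepsilon$: lossless coding is not possible; the decoding error probability tends to 1.
Coding efficiency: $\eta = \frac{N H_D(X)}{L}$
$N \ge \frac{D[I(X)]}{\varepsilon^2 \delta} = \frac{D[I(X)]}{H^2(X)} \cdot \frac{\eta^2}{(1-\eta)^2 \delta}$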
and the Kraft inequality is always satisfied. So we know that we can always find a Shannon-type code.
Any code generated like that is called a Shannon-type code.
Shannon-type code
We have already understood that a good code should assign a short
codeword to a message with high probability, and use the long codewords for unlikely messages only.