The First Proposal of LZ Complexity (1976)


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-22, NO. 1, JANUARY 1976, p. 75

On the Complexity of Finite Sequences

ABRAHAM LEMPEL, MEMBER, IEEE, AND JACOB ZIV, FELLOW, IEEE

Abstract-A new approach to the problem of evaluating the complexity (“randomness”) of finite sequences is presented. The proposed complexity measure is related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated. It is further related to the number of distinct substrings and the rate of their occurrence along the sequence. The derived properties of the proposed measure are discussed and motivated in conjunction with other well-established complexity criteria.
I. INTRODUCTION

Any attempt to argue that a given finite sequence of fully specified numbers is a random sequence will be open to obvious criticism. On the other hand, in a wide variety of situations arising in digital data systems, it is adequate to have deterministically generated yet sufficiently “random looking” sequences that, while not being truly random, pass most of the agreed-upon randomness tests and allow for repeated experiments under similar “noise conditions.”

Roughly speaking, the complexity of a finite fully specified sequence is a measure of the extent to which the given sequence resembles a random one. As in the case of randomness, any attempt to derive an allegedly absolute measure for the complexity of an individual sequence will raise understandable objections.

Early works in this area [1], [2] have linked the notion of complexity of a given sequence to that of an algorithm by which the sequence is supposed to be generated. Kolmogorov [1] has proposed to use the length of the shortest binary program which, when fed into a given algorithm, will cause it to produce a specified sequence, as a measure for the complexity of that sequence with respect to the given algorithm. Recent works in this field [3]-[6] have followed more or less along the same lines, while contributing to a further development and refinement of the pioneering ideas introduced by Kolmogorov and Chaitin.

In this paper, we propose and explore a new approach to the problem that links the complexity of a specific sequence to the gradual buildup of new patterns along the given sequence. Aside from introducing a variety of interesting combinatorial problems, the suggested approach leads, on one hand, to a new method of constructing “random looking” sequences and, on the other hand, to the identification of large families of sequences that yield themselves to efficient data compression.

We do not profess to offer a new absolute measure for complexity which, as mentioned already, we believe to be nonexistent. Rather, we propose to evaluate the complexity of a finite sequence from the point of view of a simple self-delimiting learning machine which, as it scans a given n-digit sequence $S = s_1 s_2 \cdots s_n$ from left to right, adds a new word to its memory every time it discovers a substring of consecutive digits not previously encountered. The size of the compiled vocabulary, and the rate at which new words are encountered along $S$, serve as the basic ingredients in the proposed evaluation of the complexity of $S$.

The description of the exact mechanics of the outlined learning process requires some preparatory notation, definitions, and interpretation which take up all of Section II. The proposed complexity measure is discussed in Section III, where it is also motivated and put to test against well-established test cases such as the de Bruijn sequences [S]. It is also shown in Section III that, under the proposed complexity measure, most sequences are complex, which is imperative for any complexity measure where “complex” is tantamount to “random.” In particular, it is demonstrated that de Bruijn sequences are complex with respect to our definition of complexity. We further show that, under the proposed criterion, sequences from an ergodic source whose normalized entropy is less than unity do not qualify as complex ones, thereby demonstrating that the criterion is not too weak. In Section IV we present a detailed study of the vocabulary of a sequence and examine the concept of complexity as related to the rate of vocabulary growth. We conclude by proposing yet another indicator of complexity which is related to the one discussed in Section III and which reflects more closely the rate of vocabulary growth.

II. REPRODUCTION AND PRODUCTION OF SEQUENCES

Let $A^*$ denote the set of all finite length sequences over a finite alphabet $A$. Let $l(S)$ denote the length of $S \in A^*$ and let

$$A^n = \{ S \in A^* \mid l(S) = n \}, \qquad n \ge 0.$$

The null-sequence $\Lambda$, i.e., the “sequence” of length zero, is assumed to be an element of $A^*$.

A sequence $S \in A^n$ is fully specified by writing $S = s_1 s_2 \cdots s_n$; when $S$ is formed from a single element $a \in A$, we write $S = a^n$. To indicate a substring of $S$ that starts

Manuscript received January 3, 1975; revised April 17, 1975. A. Lempel was with the Sperry Rand Research Center, Sudbury, Mass. He is now with the Department of Mathematical Sciences, Thomas J. Watson Research Center, Yorktown Heights, N.Y., on
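The left-to-right learning process outlined in the Introduction can be illustrated with a short sketch. The excerpt above does not give the exact mechanics, so this code rests on an assumption: it follows the commonly used reading in which each new word is the shortest substring, starting at the current position, that cannot be copied from text beginning at an earlier position. The function names `lz_phrases` and `lz_complexity` are hypothetical and chosen only for illustration.

```python
def lz_phrases(s: str) -> list[str]:
    """Split s into words, scanning left to right.

    Assumption (not stated in the excerpt): a word ends as soon as the
    substring starting at the current position can no longer be found
    starting at some earlier position. The final word may be an exception,
    because the sequence can end before a new pattern is completed.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        length = 1
        # s[:i + length - 1] holds every occurrence that starts before
        # position i, including copies that overlap the current position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases.append(s[i:i + length])  # slicing clamps at the end of s
        i += length
    return phrases


def lz_complexity(s: str) -> int:
    """Size of the compiled vocabulary under the assumption above."""
    return len(lz_phrases(s))


if __name__ == "__main__":
    example = "0001101001000101"
    print(lz_phrases(example))     # ['0', '001', '10', '100', '1000', '101']
    print(lz_complexity(example))  # 6
```

A constant or strictly periodic sequence yields very few words under this sketch (for instance, `"01" * 8` gives three), which matches the intuition that such sequences are not “random looking.” The quadratic substring search is kept for readability and is not meant as an efficient implementation.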