A First Look at the Physical File Structure of a Lucene 3.6 Index



made by ppm10103

VInt

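• How a VInt is represented

Integers are stored in files in a variable-length format that carries 7 payload bits per byte.

• A high-order bit of 1 means that more bytes follow

• The low-order bits come first and the high-order bits last (little-endian order across the 7-bit groups, not big-endian)

• For the handling of negative numbers, see below (the DataOutput class)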

VInt examples

• Binary, with the lowest address on the left

Decimal    Binary
• 2        00000010
• 127      01111111
• 128      10000000 00000001
• 16,385   10000001 10000000 00000001
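The byte layout above can be reproduced with a short, self-contained sketch. It is an illustration of the encoding as described here, not a copy of Lucene's DataOutput source, and the names VIntSketch, encode, decode and toBinary are made up for this example.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of the VInt layout; the class and method names are hypothetical.
class VIntSketch {

    // Emit 7 payload bits per byte, low-order group first. Every byte except
    // the last has its high bit set to signal that more bytes follow.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so a negative input ends after 5 bytes
        }
        out.write(value);
        return out.toByteArray();
    }

    // Rebuild the int by placing each 7-bit group at an increasing bit offset.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    // Render bytes as 8-digit binary groups, lowest address on the left.
    static String toBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int v : new int[] {2, 127, 128, 16385}) {
            System.out.println(v + " -> " + toBinary(encode(v)));
        }
    }
}
```

With this scheme a negative int always occupies five bytes, because the unsigned shift keeps the sign bit in the top 7-bit group.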

String

• Lucene writes strings as UTF-8 encoded bytes

• First the length, in bytes, is written as a VInt

• The UTF-8 bytes follow
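A minimal sketch of that string layout, assuming only what is stated above (a VInt byte count followed by the UTF-8 bytes). StringSketch, encodeString and decodeString are hypothetical names chosen for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the length-prefixed string layout; names are hypothetical.
class StringSketch {

    // Write the UTF-8 byte count as a VInt, then the UTF-8 bytes themselves.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len = utf8.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80);
            len >>>= 7;
        }
        out.write(len);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Read the VInt length prefix, then decode that many bytes as UTF-8.
    static String decodeString(byte[] data) {
        int len = 0;
        int shift = 0;
        int pos = 0;
        while (true) {
            byte b = data[pos++];
            len |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return new String(data, pos, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeString("my storeyes test");
        System.out.println("length prefix = " + encoded[0]);
        System.out.println("round trip    = " + decodeString(encoded));
    }
}
```

The prefix counts bytes, not characters, so a string containing non-ASCII characters gets a prefix larger than its character count.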

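Example

• One document with two fields

• Field content1, with the text "my storeyes test", Store.YES (the original text is stored), Index.ANALYZED (the text is analyzed for indexing)

• Field content2, with the text "my storeno test", Store.NO (the original text is not stored), Index.ANALYZED (the text is analyzed for indexing)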

The generated index

Where the file extensions come from

    public final class IndexFileNames {

        /** Name of the index segment file */
        public static final String SEGMENTS = "segments";

        /** Name of the generation reference file name */
        public static final String SEGMENTS_GEN = "segments.gen";

        /** Name of the index deletable file (only used in
         * pre-lockless indices) */
        public static final String DELETABLE = "deletable";

        /** Extension of norms file */
        public static final String NORMS_EXTENSION = "nrm";

        /** Extension of freq postings file */
        public static final String FREQ_EXTENSION = "frq";

        /** Extension of prox postings file */
        public static final String PROX_EXTENSION = "prx";

        /** Extension of terms file */
        public static final String TERMS_EXTENSION = "tis";

        /** Extension of terms index file */
        public static final String TERMS_INDEX_EXTENSION = "tii";

        /** Extension of stored fields index file */
        public static final String FIELDS_INDEX_EXTENSION = "fdx";

        /** Extension of stored fields file */
        public static final String FIELDS_EXTENSION = "fdt";

        /** Extension of vectors fields file */
        public static final String VECTORS_FIELDS_EXTENSION = "tvf";

        /** Extension of vectors documents file */
        public static final String VECTORS_DOCUMENTS_EXTENSION = "tvd";

        /** Extension of vectors index file */
        public static final String VECTORS_INDEX_EXTENSION = "tvx";

        /** Extension of compound file */
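As a reading aid, the constants in this listing can be turned into a small lookup that labels the files found in an index directory. This is a sketch only: the class ExtensionSketch, the describe method and the sample file name _0.fdt are assumptions for illustration, and the table covers just the extensions that appear in the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative lookup built from the extensions in the listing; names are hypothetical.
class ExtensionSketch {

    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("nrm", "norms");
        DESCRIPTIONS.put("frq", "freq postings");
        DESCRIPTIONS.put("prx", "prox postings");
        DESCRIPTIONS.put("tis", "terms");
        DESCRIPTIONS.put("tii", "terms index");
        DESCRIPTIONS.put("fdx", "stored fields index");
        DESCRIPTIONS.put("fdt", "stored fields");
        DESCRIPTIONS.put("tvf", "vectors fields");
        DESCRIPTIONS.put("tvd", "vectors documents");
        DESCRIPTIONS.put("tvx", "vectors index");
    }

    // Map a file name to the description used in the IndexFileNames comments.
    static String describe(String fileName) {
        if (fileName.equals("segments.gen")) {
            return "generation reference file";
        }
        if (fileName.startsWith("segments")) {
            return "index segment file";
        }
        if (fileName.equals("deletable")) {
            return "index deletable file";
        }
        int dot = fileName.lastIndexOf('.');
        String description = dot < 0 ? null : DESCRIPTIONS.get(fileName.substring(dot + 1));
        return description == null ? "unknown" : description;
    }

    public static void main(String[] args) {
        // "_0" is an assumed segment name used only for this example.
        for (String name : new String[] {"segments.gen", "_0.fdt", "_0.fdx", "_0.tis"}) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```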

Overall structure

fdt

Field content table (.fdt)

• Holds the text content of the document's fields that were added with Store.YES

• The fdt file contains FieldCount, FieldNum, Bits and Value

• IndexFileNames.FIELDS_EXTENSION defines the extension as fdt
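The four items can be sketched as a writer for one document's record. This is a simplified illustration, not Lucene's FieldsWriter: it assumes field numbers run consecutively from 0, that every field is an analyzed text field whose Bits byte is 1 (the value shown in the byte walkthrough of this file), and that Value is a length-prefixed UTF-8 string. FdtRecordSketch, BITS_ANALYZED and encodeDocument are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative writer for one document's stored fields; names are hypothetical.
class FdtRecordSketch {

    // Assumed flag value for an analyzed text field.
    static final int BITS_ANALYZED = 0x01;

    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Layout: FieldCount, then for each field FieldNum, Bits, Value.
    static byte[] encodeDocument(String[] storedValues) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, storedValues.length);          // FieldCount
        for (int fieldNum = 0; fieldNum < storedValues.length; fieldNum++) {
            byte[] utf8 = storedValues[fieldNum].getBytes(StandardCharsets.UTF_8);
            writeVInt(out, fieldNum);                 // FieldNum (assumed consecutive)
            out.write(BITS_ANALYZED);                 // Bits
            writeVInt(out, utf8.length);              // Value: byte length ...
            out.write(utf8, 0, utf8.length);          // ... then the UTF-8 bytes
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Only content1 is stored; content2 uses Store.NO and is left out.
        byte[] record = encodeDocument(new String[] {"my storeyes test"});
        System.out.println("record length = " + record.length);
    }
}
```

For the example document only content1 reaches this file, so the record is 4 bytes of bookkeeping plus the 16 bytes of text.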

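Detailed analysis of the fdt file

• The file contents written out in binary (the counts and lengths are VInt values)

• 00000000 00000000 00000000 00000011: read together as one 4-byte value, these bytes give 3, the fixed constant from the FieldsWriter class, static final int FORMAT_CURRENT = FORMAT_LUCENE_3_2_NUMERIC_FIELDS;

• 00000001: FieldCount, 1 field is stored in total

• The following entries are repeated once for each stored field

• 00000000: FieldNum, the field number is 0

• 00000001: Bits (the field policy), the value is 1, meaning the field content is analyzed for indexing

• 00010000: a VInt with the value 16, the number of bytes taken by "my storeyes test"

• The UTF-8 bytes of the string follow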
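The walkthrough can be replayed in code. The sketch below rebuilds the byte sequence listed above and decodes it field by field; it assumes the leading four bytes form one big-endian integer and handles a single document record, so it is a reading aid rather than a general fdt parser. FdtDumpSketch, sampleBytes and dump are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative decoder for the bytes in the walkthrough; names are hypothetical.
class FdtDumpSketch {

    static int readVInt(DataInputStream in) throws IOException {
        int b = in.readUnsignedByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // The bytes from the walkthrough: format, FieldCount, FieldNum, Bits, length, text.
    static byte[] sampleBytes() {
        byte[] header = {0, 0, 0, 3, 1, 0, 1, 16};
        byte[] text = "my storeyes test".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header, 0, header.length);
        out.write(text, 0, text.length);
        return out.toByteArray();
    }

    // Decode the format value and one document record into a readable line.
    static String dump(byte[] fdt) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fdt))) {
            StringBuilder sb = new StringBuilder();
            sb.append("format=").append(in.readInt());   // 4 bytes, big-endian
            int fieldCount = readVInt(in);
            sb.append(" fieldCount=").append(fieldCount);
            for (int i = 0; i < fieldCount; i++) {
                int fieldNum = readVInt(in);
                int bits = in.readUnsignedByte();
                byte[] utf8 = new byte[readVInt(in)];
                in.readFully(utf8);
                sb.append(" [fieldNum=").append(fieldNum)
                  .append(" bits=").append(bits)
                  .append(" value=\"").append(new String(utf8, StandardCharsets.UTF_8))
                  .append("\"]");
            }
            return sb.toString();
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated or malformed input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dump(sampleBytes()));
    }
}
```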
