SYSTEM AND METHOD FOR AUTOMATICALLY DISCOVERING A HIERARCHY OF CONCEPTS FROM A CORPUS OF DOCUMENTS
Patent title: SYSTEM AND METHOD FOR AUTOMATICALLY DISCOVERING A HIERARCHY OF CONCEPTS FROM A CORPUS OF DOCUMENTS
Inventors: CHUNG, Christina; LIU, Jinhui; LUK, Alpha; MAO, Jianchang; TAANK, Sumit; VUTUKURU, Vamsi
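Application number: EP03731225.3
Filing date: 2003-05-15
Publication number: EP1508105A2
Publication date: 2005-02-23
Patent content provided by Intellectual Property Publishing House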
Abstract: The invention is a method, system and computer program for automatically discovering concepts from a corpus of documents and automatically generating a labeled concept hierarchy. The method extracts signatures from the corpus and computes the similarity between signatures using a statistical measure. The frequency distribution of the signatures is then refined to reduce inaccuracy in the similarity measure, and the signatures are disambiguated to address the polysemy problem. The similarity measure is recomputed from the refined frequency distribution and the disambiguated signatures so that it reflects the actual similarity between signatures, and the recomputed measure is used to cluster related signatures. The clustered signatures form concepts, and the concepts are arranged in a concept hierarchy. For a particular concept, the concept hierarchy automatically generates a query and retrieves the relevant documents associated with the concept.
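The pipeline in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patent's method: signatures are approximated by single lowercase words, the unspecified "statistical measure" is replaced by Jaccard overlap of the document sets in which two signatures occur, clustering is simple single-link merging at a threshold, and the hierarchy has only two levels obtained from a coarse and a fine threshold. The refinement and disambiguation steps are omitted. All function names (`extract_signatures`, `similarity`, `cluster`, `build_hierarchy`, `concept_query`, `retrieve`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extract_signatures(docs, min_df=2):
    """Map each candidate signature (assumed here: a lowercase word)
    to the set of document ids that contain it, dropping rare ones."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in set(text.lower().split()):
            postings[token].add(doc_id)
    return {sig: ids for sig, ids in postings.items() if len(ids) >= min_df}


def similarity(ids_a, ids_b):
    """Assumed stand-in for the statistical measure: Jaccard overlap
    of the two signatures' document sets."""
    union = ids_a | ids_b
    return len(ids_a & ids_b) / len(union) if union else 0.0


def cluster(postings, threshold):
    """Single-link clustering with union-find: signatures whose
    similarity reaches the threshold end up in the same cluster."""
    parent = {sig: sig for sig in postings}

    def find(sig):
        while parent[sig] != sig:
            parent[sig] = parent[parent[sig]]  # path halving
            sig = parent[sig]
        return sig

    for a, b in combinations(sorted(postings), 2):
        if similarity(postings[a], postings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for sig in postings:
        groups[find(sig)].add(sig)
    return sorted(sorted(group) for group in groups.values())


def build_hierarchy(postings, coarse=0.2, fine=0.5):
    """Two-level hierarchy: each coarse cluster is a parent concept whose
    children are the fine clusters nested inside it (coarse <= fine)."""
    fine_clusters = cluster(postings, fine)
    hierarchy = []
    for parent_concept in cluster(postings, coarse):
        members = set(parent_concept)
        children = [c for c in fine_clusters if set(c) <= members]
        hierarchy.append({"concept": parent_concept, "children": children})
    return hierarchy


def concept_query(signatures):
    """Generate a simple disjunctive query string for a concept."""
    return " OR ".join(signatures)


def retrieve(signatures, postings):
    """Return the sorted ids of documents containing any signature."""
    doc_ids = set()
    for sig in signatures:
        doc_ids |= postings.get(sig, set())
    return sorted(doc_ids)
```

To try it, pass a list of document strings to `extract_signatures`, feed the result to `build_hierarchy`, and call `concept_query` and `retrieve` on any cluster in the returned structure. Single-link merging is used because its clusters at a stricter threshold always nest inside those at a looser one, which gives a consistent parent-child relation without extra bookkeeping.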
Applicant: Verity, Inc.
Address: 894 Ross Drive, Sunnyvale, CA 94089, US; Nationality: US
Agent: Heinze, Ekkehard, Dipl.-Phys. Dr.