Design and Implementation of a Document Retrieval System
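Student ID: ______ Classification: ______
Bachelor's Degree Thesis of Wuhan University
Design and Implementation of Organization Expert Search System
School (Department): School of Information Management
Major: Information Management and Information System
Student: Shuguang Han
Advisor: Wei Lu, Associate Professor
May 2008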
BACHELOR'S DEGREE THESIS
OF WUHAN UNIVERSITY
Design and Implementation of Organization Expert Search System
College: School of Information Management
Subject: Information Management and Information System
Name: Shuguang Han
Directed by: Wei Lu, Associate Professor
May 2008
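Abstract (Chinese version)
With the rapid development of the Internet and the maturing of related technologies, enterprises (organizations) have moved their resources online. In response, TREC (the Text REtrieval Conference) introduced the enterprise search task, whose main goal is to help users complete specific tasks by searching enterprise data. The content searched may be digital resources inside or outside the organization, and these resources usually exist in heterogeneous forms such as email, database records, documents, and shared files.
Organization (enterprise) expert search is an important branch of enterprise search and an active area of current vertical information retrieval research. This thesis summarizes the state of research on organization expert search in China and abroad and analyzes the requirements and challenges of building an organization expert search system. On this basis, using web pages inside and outside the organization together with journal paper databases, it designs a model of an organization expert search system that covers the whole process from data collection, normalization, indexing, and retrieval to visualization, and it builds WHU-ES, an expert search platform that takes Wuhan University as its example. By dynamically defining the list of resources inside and outside the organization that characterize expert information, and by setting an update cycle for those resources, the system supports dynamic resource collection, intelligent identification of expertise, dynamic generation and analysis of expert co-occurrence clustering graphs, and automatic extraction of expert profile information (including expert portrait extraction and automatic recognition of expert biographies). The thesis also analyzes the difficulties of building an expert search system, namely web page main-text extraction, overlapping expert names, and social network analysis, proposes possible solutions, and concludes with a preliminary evaluation of the WHU-ES expert search system.
Keywords: expert search; expertise identification; organization search; expert clustering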
ABSTRACT
The rapid progress of the Internet and related technologies has made it much easier to access enterprise (or organization) documents and web pages. As a result, TREC (Text REtrieval Conference) proposed the enterprise retrieval task, whose purpose is to study enterprise search: satisfying a user who searches the data of an organization to complete some task. The corpus combines digital resources of diverse types, such as published reports, email, database records, files, and shared documents.
As an important part of enterprise retrieval, organization expert search (expertise retrieval) is currently an active area of vertical information retrieval research. This thesis summarizes the current state of expert search research and, based on an analysis of the requirements and challenges, proposes a general architecture for an organization expert search system that covers data collection, normalization, indexing, retrieval, and visualization, using relevant web pages and academic databases as data sources. We then construct an expert search system that takes Wuhan University as an example, called WHU-ES for short. The system provides functions such as dynamic collection of diverse resources, intelligent recognition of expertise, and automatic extraction of expert profiles (for example, portrait picture extraction). We also analyze difficulties such as personal name resolution, social network analysis, and content extraction, and offer possible solutions. Finally, we give a preliminary evaluation of the expert search results.
Keywords: Expert Search; Expertise Recognition; Organization Search; Expert Clustering
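The expert co-occurrence relationships described in the abstract can be illustrated with a minimal sketch. The Python code below is not the WHU-ES implementation. It assumes, purely for illustration, that each document is a plain string and that an expert is "mentioned" when the name appears as an exact substring; the function name `cooccurrence_counts` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, expert_names):
    """Count how often each pair of experts is mentioned in the same document.

    Assumption: a name counts as mentioned if it occurs as an exact substring.
    A real system would need proper name resolution instead.
    """
    pair_counts = Counter()
    for text in documents:
        # Sort the matched names so every pair has one canonical ordering.
        present = sorted(name for name in expert_names if name in text)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy data, not taken from the thesis.
docs = [
    "Expert A and Expert B co-authored a paper on retrieval.",
    "Expert B maintains the indexing service.",
    "Expert A, Expert B and Expert C attended a workshop.",
]
experts = ["Expert A", "Expert B", "Expert C"]
counts = cooccurrence_counts(docs, experts)
```

The resulting pair counts could serve as edge weights in a co-occurrence graph, which a clustering step might then group into communities of related experts. Exact substring matching is the weakest part of this sketch, since it cannot separate two experts who share a name.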