A Survey of Information Filtering
At a filtering server
At the user site
Filtering approach
Cognitive filtering
– Content-based filtering
– Document content vs. user profiles
Sociological filtering
– Collaborative filtering, or properties-based filtering
– Similarity between users
– Recommendation systems
– User modeling & user clustering
– Complement for content-based systems
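The idea of filtering by similarity between users can be illustrated with a minimal sketch. It assumes each user is represented by a dict of item ratings and uses cosine similarity between those dicts; the function names, the rating scale, and the sample data are hypothetical and not taken from the slides.

```python
from math import sqrt


def user_similarity(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    dot = sum(r * b[item] for item, r in a.items() if item in b)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target, ratings, top_n=3):
    """Rank items the target user has not rated by similarity-weighted ratings."""
    seen = ratings[target]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = user_similarity(seen, other_ratings)
        if sim <= 0:
            continue  # skip users who share no rated items with the target
        for item, rating in other_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical ratings: ann and bob agree on d1 and d2; cat shares nothing with ann.
ratings = {
    "ann": {"d1": 5, "d2": 4},
    "bob": {"d1": 5, "d2": 4, "d3": 5},
    "cat": {"d4": 5},
}
print(recommend("ann", ratings))
```

Because no document content is inspected here, such a scheme can complement a content-based filter rather than replace it.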
Implicit approach
Explicit & Implicit approach
3. Components of an IF System
General structure
[Figure: generic model of an IF system. Recoverable labels: (d) Learning Component; updates; feedback; user personal details; user profile; relevant data items; represented data items; Information Provider]
Statistical concept
User-model component:
– Profile is a weighted vector of index terms (such as VSM, LSI)
Filtering component
– Correlation, cosine measure
– Robertson & Sparck-Jones formula (PRM)
– (Naive) Bayesian classifier
Learning component
– Feedback, query reconstruction (such as Rocchio)
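A minimal sketch of the statistical concept, assuming profiles and documents are sparse term-weight dicts: the cosine measure does the matching and a Rocchio-style update handles feedback. The coefficient values (alpha, beta, gamma) are commonly used illustrative defaults, not values given in the slides, and the function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine measure between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the profile toward relevant documents and away from non-relevant ones."""
    terms = set(profile)
    for doc in relevant + non_relevant:
        terms |= set(doc)

    def centroid(docs, term):
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        weight = (alpha * profile.get(term, 0.0)
                  + beta * centroid(relevant, term)
                  - gamma * centroid(non_relevant, term))
        if weight > 0:  # drop terms whose weight becomes non-positive
            updated[term] = weight
    return updated


profile = {"filter": 1.0}
relevant = [{"filter": 1.0, "profile": 1.0}]
non_relevant = [{"sports": 1.0}]
new_profile = rocchio_update(profile, relevant, non_relevant)
print(cosine(profile, relevant[0]), cosine(new_profile, relevant[0]))
```

After the update, the profile scores the relevant document higher than before, which is the intended effect of feedback.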
Semantic networks / stereotypic inference / statistical inference on the relationships between words in documents
Underlying Architecture
– Agents / neural networks for automatically inferred models
– VSM / LSI for explicit inference
– Concept model for intelligent systems
– Keyword system for statistically based systems
Frequency of learning
4. Evaluation of IF Systems
Methods & Measures
Evaluation methods of IF systems
– Evaluation by experiments
– Evaluation by simulation, such as TREC
– Analytical evaluation
User-model component
– Gathers info about users (explicitly and/or implicitly)
– Constructs the user profiles or other user models (rules, VSM, document centers)
– Passes the user models to the filtering component
– User models must be suitable for the document representation
Learning component
– Improves further filtering
– Detects shifts in users' interests
– Updates the user model
Two concepts used in IF systems
– Systems based on the statistical concept
– Systems based on the knowledge-based concept
Neural-network filtering systems
Genetic-based filtering systems
User modeling for IF systems
Acquisition of the data for the model
Data included in the model
Filtering component
– The heart of the IF system
– Matches the user profiles against the represented data items
– The decision may be binary or probabilistic (ordered by rank)
– The relevance of the selected items can be judged by the user
– The relevance judgments can be sent to the learning component (feedback info)
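The two kinds of filtering decision can be sketched as follows, assuming both the profile and the data items are sparse term-weight dicts scored by a dot product. The scoring function, the threshold, and the sample items are illustrative assumptions.

```python
def score(profile, item):
    """Dot product of two sparse term-weight dicts."""
    return sum(w * item.get(term, 0.0) for term, w in profile.items())


def filter_binary(profile, items, threshold):
    """Binary decision: select every item whose score reaches the threshold."""
    return [name for name, vec in items.items()
            if score(profile, vec) >= threshold]


def filter_ranked(profile, items):
    """Ranked decision: return all items ordered by descending score."""
    scored = [(score(profile, vec), name) for name, vec in items.items()]
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical profile and represented data items.
profile = {"filtering": 2.0, "email": 1.0}
items = {
    "a": {"filtering": 1.0},
    "b": {"email": 1.0},
    "c": {"football": 3.0},
}
print(filter_binary(profile, items, threshold=1.0))
print(filter_ranked(profile, items))
```

The user's judgment of the returned items is what would then be passed to the learning component as feedback.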
Passive IF systems
Location of operation
At the info source
– Users post profiles to the info provider
– Clipping service
– Usually for a fee
At a filtering server
– Info provider sends info to the server
– Server distributes info to users
At the user site
– Local filtering system
– Such as Outlook, Netscape Email, and Foxmail
Related terms
– Selective Dissemination of Information (SDI), from the library field
– Routing, from Message Understanding
– Current Awareness, Data Mining
IF vs. IR / Categorization / IE
IF & IR: broadly speaking, IF is a part of IR
Measures of evaluation of IF systems
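Simple Precision & Recall
Statistical Measurements
– Correlation (user evaluation vs. system evaluation): rank vectors
Set-based Measurements
– Utility = (A*R+) + (B*N+) + (C*R-) + (D*N-), normalized; R+ and N+ count the relevant and non-relevant items selected, R- and N- the relevant and non-relevant items rejected
– ASP (average set precision) = P*R; not suitable when P or R = 0
User-oriented Measures
– Coverage ratio = |Rk|/|U| = |A∩U|/|U|, where A is the set of selected documents, U the set of relevant documents known to the user, and Rk = A∩U
– Novelty = |Ru|/(|Ru|+|Rk|), where Ru is the set of selected relevant documents previously unknown to the user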
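The set-based and user-oriented measures above can be computed directly from document-ID sets. The following is a sketch under stated assumptions: the coefficients A to D are supplied by the caller (the slides give no values), and the sample sets and function names are hypothetical.

```python
def precision_recall(selected, relevant):
    """Precision and recall of the selected set against the relevant set."""
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def utility(r_plus, n_plus, r_minus, n_minus, a, b, c, d):
    """Linear utility (A*R+) + (B*N+) + (C*R-) + (D*N-), before normalization."""
    return a * r_plus + b * n_plus + c * r_minus + d * n_minus


def asp(precision, recall):
    """Average set precision, defined as P*R."""
    return precision * recall


def coverage_ratio(selected, known_relevant):
    """|A ∩ U| / |U|: share of the user's known relevant documents selected."""
    if not known_relevant:
        return 0.0
    return len(selected & known_relevant) / len(known_relevant)


def novelty_ratio(selected, relevant, known_relevant):
    """|Ru| / (|Ru| + |Rk|): share of selected relevant documents new to the user."""
    selected_relevant = selected & relevant
    r_k = selected_relevant & known_relevant
    r_u = selected_relevant - known_relevant
    total = len(r_u) + len(r_k)
    return len(r_u) / total if total else 0.0


# Hypothetical document sets.
selected = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d5"}
known_relevant = {"d1", "d5"}
p, r = precision_recall(selected, relevant)
print(p, r, asp(p, r))
print(coverage_ratio(selected, known_relevant))
print(novelty_ratio(selected, relevant, known_relevant))
```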
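A Survey of Information Filtering (IF)
Wang Bin, Software Laboratory, Institute of Computing Technology, Chinese Academy of Sciences wangbin@ 2001.12.10
Outline
– Basic concepts of IF
– Classification of IF systems
– Components of IF systems
– Evaluation of IF systems
– Current state and development trends of IF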
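1. Basic Concepts
Definition
Definition of IF:
– Selecting, from a dynamic information stream, the information that matches a user's interests; the user's interests generally do not change over a relatively long period (static).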
Sociological filtering
Methods of acquiring knowledge about users
Explicit approach
– User interrogation
– Filling in forms
Implicit approach
– Recording user behavior
– Time / times / context / activity (save, discard, print, browsing, click), etc.
Explicit & implicit approach
– Document space (case-based)
– Stereotypic inference (predefined default profile, then changed during scanning)
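Recording user behavior can be turned into a profile in many ways. One minimal sketch, assuming each observed event is an (action, terms) pair and each action carries a fixed weight, is shown below. The weight table is purely hypothetical; the slides name the actions but assign no values to them.

```python
from collections import defaultdict

# Hypothetical weights: positive actions raise a term's weight, discard lowers it.
ACTION_WEIGHTS = {"save": 1.0, "print": 1.0, "click": 0.5, "discard": -1.0}


def profile_from_behavior(events):
    """Accumulate term weights from observed (action, terms) events."""
    profile = defaultdict(float)
    for action, terms in events:
        weight = ACTION_WEIGHTS.get(action, 0.0)  # unknown actions are ignored
        for term in terms:
            profile[term] += weight
    return dict(profile)


events = [
    ("save", ["filtering", "email"]),
    ("discard", ["sports"]),
    ("click", ["filtering"]),
]
print(profile_from_behavior(events))
```

An explicit approach would instead fill the same structure from a form the user completes.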
Filtering component
Learning component
Knowledge-based concept
Rule-based and Semantic-nets filtering systems:
– Rules (if ... then take action); obsolescence problem
– User profile represented by a semantic net (WordNet)
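A rule-based filter of the "if ... then take action" kind can be sketched as an ordered list of (condition, action) pairs where the first matching rule wins. The message fields, the rules, and the action names below are hypothetical examples.

```python
def apply_rules(message, rules, default="keep"):
    """Return the action of the first rule whose condition matches the message."""
    for condition, action in rules:
        if condition(message):
            return action
    return default


# Hypothetical rules for an email filter.
rules = [
    (lambda m: "unsubscribe" in m["body"].lower(), "discard"),
    (lambda m: m["sender"].endswith("@example.org"), "forward"),
]

newsletter = {"sender": "news@example.com", "body": "Click to UNSUBSCRIBE"}
colleague = {"sender": "li@example.org", "body": "Meeting notes attached"}
other = {"sender": "bob@example.net", "body": "Hello"}

print(apply_rules(newsletter, rules))
print(apply_rules(colleague, rules))
print(apply_rules(other, rules))
```

Hand-written rules like these go stale as interests shift, which is the obsolescence problem noted above.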
IF & Categorization
IF & IE
IF applications
– Internet search results filter
– Personal email filter
– List server / newsgroup filter
– Browser filter
– Filter for children
– Filter for customers: recommendation
2. IF Classification Scheme
[Figure: IF classification diagram]
Initiative of operation
Active IF systems
– Collect and send relevant info to users
– Push to users
– Risk of info overload, so an accurate user profile is needed
Passive IF systems
– Do not collect info for users
– Email or Usenet news
Set-based Measurements
User-oriented Measures
Learning in IF systems
Methods of Learning
– Learning by observation
– Learning by feedback
– User-training learning
Frequency of learning
– Critical learning
– Periodic learning
[Figure, continued. Recoverable labels: data items; (a) Data Analyzer Component; (b) Filtering Component; (c) User-Model Component]
Data-analyzer component
– Located close to the info provider
– Obtains or collects data from the info provider
– Analyzes and represents documents (such as Boolean model, VSM, etc.)
– Passes the representation to the filtering component
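As a rough illustration of the representation step, the sketch below turns raw text into a Boolean-model representation (a set of terms) and a simple vector-space representation (raw term frequencies). The tokenizer and the lack of any term weighting beyond raw counts are simplifying assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_representation(text):
    """Boolean model: the set of terms that occur in the document."""
    return set(tokenize(text))


def tf_vector(text):
    """Vector space model with raw term frequencies as weights."""
    return dict(Counter(tokenize(text)))


doc = "Filter the stream; filter by profile."
print(boolean_representation(doc))
print(tf_vector(doc))
```

Either representation could then be handed to the filtering component, provided the user model is expressed in a compatible form.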
Acquisition of the data for the model
– Implicit approach: observation of user behavior
– Explicit approach: filling in forms, interaction (feedback)
Data included in the model
– Shallow semantics: keywords
– Enhanced user model: high-level knowledge about the user (background, past experience)
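IF & IR
– IF: dynamic database, static need; IR: static database, dynamic need
– User profile vs. query
– IF users need some understanding of the system; IR users do not
– IF involves social issues such as user modeling and personal privacy
IF & Categorization
– The categories in categorization do not change often; by comparison, a user profile changes dynamically
IF & IE
– IF is concerned with relevance; IE is concerned only with the extracted parts, regardless of relevance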