Inverse document frequency (inverse
For topic 2, the average precision is (1/1 + 2/3 + 3/5 + 0 + 0)/5 ≈ 0.45.
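The average-precision arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper. The function names are hypothetical. Topic 2's ranks (1, 3, 5) and its total of 5 relevant documents are read off the fractions in the formula. For topic 1, the sketch assumes that the 4 retrieved relevant pages (ranks 1, 2, 4, 7) are all the relevant pages that exist.

```python
def average_precision(relevant_ranks, total_relevant):
    """Average precision for one topic.

    relevant_ranks: 1-based ranks at which relevant documents were retrieved.
    total_relevant: number of relevant documents for the topic; relevant
    documents that were never retrieved contribute a precision of 0.
    """
    if total_relevant == 0:
        return 0.0
    ranks = sorted(relevant_ranks)
    # Precision at the i-th relevant hit is (hits so far) / (rank of that hit).
    precisions = [(i + 1) / rank for i, rank in enumerate(ranks)]
    return sum(precisions) / total_relevant


def mean_average_precision(topics):
    """Mean of the per-topic average precision values."""
    return sum(average_precision(r, n) for r, n in topics) / len(topics)


# Topic 1: assumed to have exactly 4 relevant documents (an assumption).
topic1 = ([1, 2, 4, 7], 4)
# Topic 2: 5 relevant documents, 3 of them retrieved at ranks 1, 3, 5.
topic2 = ([1, 3, 5], 5)

ap1 = average_precision(*topic1)
ap2 = average_precision(*topic2)
map_score = mean_average_precision([topic1, topic2])
print(round(ap1, 2), round(ap2, 2), round(map_score, 2))  # 0.83 0.45 0.64
```

Dividing by the total number of relevant documents, rather than by the number retrieved, is what makes the two missing documents of topic 2 count as zeros.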
Facet-value pair recommendation
in the form of facet-value pairs
*Top Document Frequency (TDF)
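[Figure: the initial query is passed to the baseline retrieval algorithm, which returns ranked documents (Document 1, Document 2, Document 3, ..., Document N); candidates 1, 2, 3, 4, ..., K are taken from these documents.]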
assumption: the more frequently a facet-value pair appears in the top ranked documents, the more likely the user will like it.
each metadata field is called a facet, and a facet with a specific value is called a facet-value pair
e.g., language: Chinese; format: ppt; subject: IR; genre: comedy
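The Top Document Frequency idea (a facet-value pair that appears in more of the top-ranked documents is more likely to interest the user) can be sketched as follows. This is a minimal illustration under stated assumptions: the document representation, the function names, and the alphabetical tie-breaking rule are all hypothetical and are not taken from the slides.

```python
from collections import Counter


def top_document_frequency(ranked_docs, top_n):
    """Count, for each facet-value pair, how many of the top_n ranked
    documents carry that pair."""
    counts = Counter()
    for doc in ranked_docs[:top_n]:
        # set() so that a pair is counted at most once per document.
        counts.update(set(doc["facets"]))
    return counts


def recommend_candidates(ranked_docs, top_n, k):
    """Return the k facet-value pairs with the highest TDF.
    Ties are broken alphabetically (an assumption, for determinism)."""
    counts = top_document_frequency(ranked_docs, top_n)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ordered[:k]]


# Hypothetical ranked list returned by a baseline retrieval algorithm.
ranked = [
    {"id": "d1", "facets": [("language", "Chinese"), ("subject", "IR")]},
    {"id": "d2", "facets": [("language", "Chinese"), ("format", "ppt")]},
    {"id": "d3", "facets": [("subject", "IR"), ("language", "Chinese")]},
    {"id": "d4", "facets": [("genre", "comedy")]},
]

candidates = recommend_candidates(ranked, top_n=3, k=2)
print(candidates)  # [('language', 'Chinese'), ('subject', 'IR')]
```

Here `top_n` controls how many ranked documents are inspected and `k` how many candidates are shown to the user.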
Interactive Retrieval Based on Faceted Feedback
(SIGIR '10) Lanbo Zhang, Yi Zhang /09/06
Speaker: Hsu, Yu-wen
Outline
Introduction
Faceted Feedback
evaluate performance
Mean Average Precision (MAP)
Precision@N (P@N): the precision of the top N documents
Recall@N (R@N): the recall of the top N documents
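As a rough sketch of the two cutoff measures listed above, the following Python computes P@N and R@N for one ranked list. The function names and the example document ids are hypothetical; the definitions follow the usual conventions (P@N divides by N, R@N divides by the number of relevant documents).

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the top n ranked documents that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids)
    return hits / n


def recall_at_n(ranked_ids, relevant_ids, n):
    """Fraction of all relevant documents found in the top n."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical ranked list and relevance judgments.
ranked_ids = ["d1", "d2", "d3", "d4", "d5"]
relevant_ids = {"d1", "d4", "d9"}

p_at_5 = precision_at_n(ranked_ids, relevant_ids, 5)
r_at_5 = recall_at_n(ranked_ids, relevant_ids, 5)
print(p_at_5)  # 0.4
```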
a simple interactive user feedback mechanism
Two major problems
the candidates of facets and possible values for products are usually manually designed
we propose a soft retrieval model: a document that meets a user-selected faceted constraint gets a certain number of credits
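The contrast between Boolean filtering and the soft model can be sketched as below. This is only an illustration under loud assumptions: the slides do not give the scoring formula here, so the additive form (baseline score plus one credit per satisfied constraint), the conjunctive Boolean filter, the per-facet weights, and all names are hypothetical.

```python
def boolean_filter(docs, selected_pairs):
    """Boolean strategy (assumed conjunctive): keep only documents that
    satisfy every selected facet-value pair."""
    return [d["id"] for d in docs if selected_pairs <= d["facets"]]


def soft_rank(docs, selected_pairs, facet_weights):
    """Soft strategy (assumed additive): each satisfied constraint adds a
    credit, the weight of its facet, to the baseline score."""
    scored = []
    for d in docs:
        matched = selected_pairs & d["facets"]
        credit = sum(facet_weights.get(facet, 0.0) for facet, _ in matched)
        scored.append((d["id"], d["score"] + credit))
    # Highest combined score first.
    return [doc_id for doc_id, _ in sorted(scored, key=lambda s: -s[1])]


# Hypothetical baseline scores and facets.
docs = [
    {"id": "d1", "score": 0.9, "facets": {("language", "English")}},
    {"id": "d2", "score": 0.5, "facets": {("language", "Chinese"), ("subject", "IR")}},
    {"id": "d3", "score": 0.4, "facets": {("language", "Chinese")}},
]
selected = {("language", "Chinese"), ("subject", "IR")}
weights = {"language": 0.3, "subject": 0.2}  # hypothetical learned weights

kept = boolean_filter(docs, selected)
ranking = soft_rank(docs, selected, weights)
print(kept, ranking)
```

In this toy run the Boolean filter drops `d1` and `d3` entirely, while the soft ranking keeps all three documents and only reorders them, so a wrongly chosen constraint does not eliminate a relevant document.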
Faceted Feedback
Facet-value pair recommendation
return a document set
: the score of the document computed using a baseline ranking method (e.g., TF-IDF, BM25)
Score document by Soft Model
: the weight of facet learned automatically
BM (Best Match) is used to score search relevance, i.e., how relevant a given query Q is to a given document D; the higher the score, the higher the relevance.
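For readers unfamiliar with BM25, here is a minimal sketch of the commonly used Okapi BM25 scoring function. The slides do not state which variant or parameter values were used, so the smoothed IDF form and the defaults `k1=1.2` and `b=0.75` are assumptions, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against query_terms with
    Okapi BM25. docs must be a non-empty list of token lists."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs_tokens = [
    ["faceted", "search", "faceted", "feedback"],
    ["boolean", "filtering"],
    ["interactive", "retrieval", "feedback"],
]
scores = bm25_scores(["faceted", "feedback"], docs_tokens)
print(scores)
```

A document that contains none of the query terms scores 0, and the first document ranks highest because it matches both terms.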
faceted feedback mechanism : ranked document
documents have their own facets
The facet-value pairs with the largest query likelihoods are chosen as the candidates.
[Figure panels: OHSUMED, user 1; RCV1, user 6]
3-fold cross-validation is used to learn the parameter
[Figure: Performances of different facet-value pair recommendation approaches. Panels: OHSUMED, RCV1.]
manually assigned or generated automatically
Users’ preferences
provide structured queries to describe their information needs, but not frequently and sometimes incorrectly
Experimental Methodology
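For topic 1, a system retrieves 4 relevant pages, ranked 1, 2, 4, and 7.
e.g., if "mining" appears in 100 documents and the collection contains 1,000 documents in total, then IDF = log(1000/100)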
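The IDF example above can be checked with a short sketch. The slide does not state the base of the logarithm, so the base-10 default below is an assumption (it gives IDF = 1 for this example); the function name and the `log` parameter are hypothetical.

```python
import math


def idf(total_docs, docs_with_term, log=math.log10):
    """Inverse document frequency: log(total documents / documents
    containing the term). The log base is an assumption (base 10)."""
    return log(total_docs / docs_with_term)


idf_base10 = idf(1000, 100)              # log10(10)
idf_natural = idf(1000, 100, math.log)   # ln(10)
print(idf_base10, round(idf_natural, 3))  # 1.0 2.303
```

The choice of base only rescales every IDF value by a constant factor, so it does not change the ranking of terms.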
RCV1(Reuters Corpus Volume 1)
810,000 Reuters news stories
facets: topic, geographical region, industry
user information needs: the first 50 topics of the TREC 2002 track
Facet-value pair recommendation
Incorporate faceted feedback into retrieval
Experimental Methodology
Experimental Results
Conclusions
Introduction
information needs
*Evaluation Based on Mechanical Turk
3 workers work on each query.
*Experimental settings
Compare with
baseline retrieval method: BM25, without feedback
pseudo relevance feedback (PRF)
real document relevance feedback (RRF)
A personalized search or filtering system usually suffers from the “cold start” problem.
to borrow information from other users. to develop user interaction mechanisms to
we investigate four approaches to recommending good facet-value pairs
Existing e-commerce websites often use a Boolean filtering strategy while retrieving products.
*TDF-IDF
:a facet-value pair
:the top document frequency of
for query
*Query Likelihood (QL)
:the frequency of in the query
:a translation model of document
based on document facets
Incorporate faceted feedback into retrieval
Inverse document frequency (IDF) is a measure of the general importance of a term.
faceted constraints
If the system returns no relevant documents, the precision defaults to 0.
: the whole corpus
:assumed to be uniform over all documents
that contain
collect more information from users
interactive user feedback mechanism
learn more about user information needs with limited user interactions
Faceted search
*TDF-QL
:to normalize the features
:the set of scores of all considered facet-value pairs.
Incorporate faceted feedback into retrieval
: the set of facet-value pairs chosen by the user.
Score documents by Boolean Model
users can choose interesting facet-value pairs to improve the returned documents.
The mean average precision (MAP) over a set of topics is the mean of the per-topic average precision.
PRF@5: pseudo relevance feedback using the top 5 documents
:the original score of document
:the standard normalization
Experimental Methodology
Datasets
OHSUMED dataset
348,566 medical articles from 270 medical journals
user information needs: topics
facets: the MeSH (Medical Subject Headings) metadata field
Set
=10 =100
Experimental Results
Overall Performances of faceted feedback
1,2,3 for OHSUMED; 4,5,6 for RCV1 Soft Model
*Boolean model vs. soft model