Network Information Retrieval: Sample Exam Paper
Take the exam with integrity: cheating has serious consequences!
SCUT FINAL EXAMINATION
Attention:
1. Please fill in the required information inside the sealing line before the exam.
2. Write your answers directly on this paper (or on the answer sheet).
3. Examination format: open-book (closed-book)
4. This paper contains three sections worth 100 points in total. The exam lasts 120 minutes.
Section | One | Two | Three | Four | Five | Total
Score   |     |     |       |      |      |
Marker  |     |     |       |      |      |
Section One: Multiple choice, single answer (3 points per problem, 30%)

Problem | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Answer  |   |   |   |   |   |   |   |   |   |
1. Convert the Boolean query expression $q = k_1 \wedge (k_2 \vee \neg k_3)$ to disjunctive normal form:
A. (1,1,1) ∨ (1,0,0) ∨ (1,1,0)
B. (0,1,1) ∨ (0,1,0) ∨ (0,0,1)
C. (1,1,1) ∨ (1,0,0) ∨ (1,0,1)
D. (1,0,1) ∨ (1,1,0) ∨ (1,0,0)
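The conversion can be checked mechanically: each conjunctive component of the DNF is one 0/1 assignment to (k1, k2, k3) that makes the query true. The sketch below assumes the query reads k1 ∧ (k2 ∨ ¬k3). That reading of the garbled expression is an assumption, and the function names are illustrative only.

```python
from itertools import product


def query(k1: int, k2: int, k3: int) -> bool:
    # ASSUMPTION: the exam's query is read as q = k1 AND (k2 OR NOT k3)
    return bool(k1) and (bool(k2) or not k3)


def dnf_components(q, n_terms: int = 3):
    """Return every 0/1 tuple that satisfies q.

    Each tuple is one conjunctive component of the DNF: position i says
    whether term k_(i+1) must be present (1) or absent (0).
    """
    return [bits for bits in product((0, 1), repeat=n_terms) if q(*bits)]


components = dnf_components(query)
print(" OR ".join(str(c) for c in components))
```

Enumerating all 2^3 assignments is only practical for a handful of terms, but it makes the DNF definition explicit.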
2. When a crawler is gathering pages, which of the following factors is unrelated to page quality?
A. Depth-first or breadth-first search strategy
B. PageRank of pages
C. Crawl depth
D. Content type of pages
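Options A and C concern how the crawl frontier is ordered and bounded. Below is a minimal sketch over a hypothetical in-memory link graph. `LINKS`, `crawl`, and `max_depth` are illustrative names, not part of any crawler library. A FIFO queue gives breadth-first order, a LIFO stack gives depth-first order, and `max_depth` caps how many links away from the seed the crawler may go.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to
LINKS = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}


def crawl(seed, links, strategy="bfs", max_depth=2):
    """Return pages in the order a toy crawler would fetch them.

    strategy="bfs" uses a FIFO queue (breadth-first);
    strategy="dfs" uses a LIFO stack (depth-first).
    Links found on pages at max_depth are not followed.
    """
    frontier = deque([(seed, 0)])
    seen = {seed}
    order = []
    while frontier:
        page, depth = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(page)
        if depth == max_depth:
            continue  # depth limit reached: do not expand this page
        children = links.get(page, [])
        if strategy == "dfs":
            children = reversed(children)  # so the first link is popped first
        for child in children:
            if child not in seen:
                seen.add(child)
                frontier.append((child, depth + 1))
    return order


print(crawl("A", LINKS, "bfs"))
print(crawl("A", LINKS, "dfs"))
```

Both strategies fetch the same six pages here. They differ only in order, which matters when a crawl is cut short.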
3. Document d and query q are represented as the vectors d = (1, 1, 1, 0, 1, 1, 0) and q = (1, 0, 1, 0, 0, 1, 1). The similarities between d and q, computed with the inner product and with the cosine measure respectively, are:
A. 3, 20/3
B. 3, 5/3
C. 5/3, 3
D. 20/3, 3
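As a worked check, the inner product sums the element-wise products: 1 + 0 + 1 + 0 + 0 + 1 + 0 = 3. The norms are |d| = √5 and |q| = √4 = 2, so cos(d, q) = 3 / (2√5) = 3/√20. The sketch below reproduces this arithmetic with plain Python.

```python
import math

d = [1, 1, 1, 0, 1, 1, 0]
q = [1, 0, 1, 0, 0, 1, 1]


def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    # Inner product divided by the product of the Euclidean norms.
    return inner_product(x, y) / (
        math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    )


print(inner_product(d, q))  # 3
print(cosine(d, q))         # 3 / sqrt(20), roughly 0.671
```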
4. Text pre-processing in a search engine mainly includes:
A. Tokenization, Remove stop words, Stemming, Inverted index
B. Noise Reduction in Web Pages, Remove stop words, Stemming, Inverted index
C. Noise Reduction in Web Pages, Tokenization, Remove stop words, Stemming
D. Tokenization, Remove stop words, Stemming, Keyword extraction
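The usual pre-processing steps can be chained as in the minimal sketch below. Several details are assumptions for illustration: the stop list is a tiny hypothetical one, and `crude_stem` is a toy suffix stripper, not the Porter stemmer. The resulting terms are what an indexer would then store in an inverted index.

```python
import re

# Hypothetical, deliberately tiny stop list; real systems use larger ones.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}


def tokenize(text):
    # Lower-case, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(word):
    # Toy suffix stripping (NOT Porter): drop one common suffix
    # if at least three characters remain.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = tokenize(text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


print(preprocess("The crawlers are gathering pages"))
```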
5. Typical features of the Web include:
A. Zipf's law, Small-world theory, Bow-tie structure
B. Heaps' law, Power-law distribution, Bow-tie structure
C. Zipf's law, Small-world theory, Heaps' law
D. Power-law distribution, Small-world theory, Bow-tie structure
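For reference, the commonly stated forms of the laws named in the options are given below. K, α, β and γ are collection- or graph-dependent constants. Zipf's and Heaps' laws describe term statistics in text collections. Power-law degree distributions, the small-world property (short link paths between most pages), and the bow-tie structure (a strongly connected core with IN and OUT components) describe the Web's link graph.

```latex
% Zipf's law: frequency of the r-th most frequent term
f(r) \propto \frac{1}{r^{\alpha}}, \qquad \alpha \approx 1
% Heaps' law: vocabulary size after n tokens
V(n) = K\, n^{\beta}, \qquad 0 < \beta < 1
% Power-law distribution, e.g. of page in-degree k
P(k) \propto k^{-\gamma}
```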
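6. The E measure is defined as $E = 1 - \frac{(1 + b^2)PR}{b^2 P + R}$, where $P$ is precision and $R$ is recall. If recall is to be treated as more important than precision, the parameter $b$ in E should satisfy:
A. $b = 1$
B. $b > 1$
C. $b < 1$
D. $b = 0$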
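To see how b shifts the weighting, the sketch below evaluates the E measure in the form E = 1 − (1 + b²)PR / (b²P + R) for some hypothetical precision and recall values. Algebraically, E tends to 1 − P as b → 0 and to 1 − R as b grows, so larger b lets recall dominate. Other textbooks parameterise E differently, so the conclusion holds only for this form.

```python
def e_measure(precision, recall, b):
    """E = 1 - (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = b * b
    return 1 - (1 + b2) * precision * recall / (b2 * precision + recall)


P, R = 0.5, 0.8  # hypothetical precision and recall values
for b in (0.01, 1, 100):
    print(b, round(e_measure(P, R, b), 4))
```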
7. The edit distance and the longest common subsequence (LCS) between "misspelled" and "misinterpreted" are, respectively:
A. 8, mis
B. 8, mispeed
C. 11, mispeed
D. 11, mis
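Both quantities come from standard dynamic programming. The sketch below assumes the usual unit-cost Levenshtein definition, where insertion, deletion and substitution each cost 1. It also recovers one LCS by walking the suffix-length table forward.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of the suffixes a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk forward through the table to collect matched characters.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(edit_distance("misspelled", "misinterpreted"))  # 8
print(lcs("misspelled", "misinterpreted"))            # mispeed
```

Here the LCS is unique. "misinterpreted" contains no "l" and only one "s", so the longest subsequence "misspelled" can share is m, i, s, p, e, e, d.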
8. A classification algorithm achieves a precision of 0.8 on a data set of size 100 and 0.9 on a data set of size 1000. Its macro-averaged and micro-averaged precision are, respectively:
A. 0.85, 0.81
B. 0.85, 0.89
C. 0.81, 0.85
D. 0.89, 0.85
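Macro-averaging takes the plain mean of the per-set precisions. Micro-averaging pools the counts before dividing. The sketch assumes that each set's precision was measured over all of its items, so its number of correct predictions is size × precision. The micro average then works out to (80 + 900) / 1100 = 980 / 1100.

```python
# (number of predictions, precision) per data set.
# ASSUMPTION: every item in a data set received a prediction,
# so correct predictions = size * precision.
results = [(100, 0.8), (1000, 0.9)]

macro = sum(p for _, p in results) / len(results)
correct = sum(n * p for n, p in results)
total = sum(n for n, _ in results)
micro = correct / total

print(f"macro = {macro:.2f}, micro = {micro:.2f}")
```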
9. If a site's robots.txt contains:
User-agent: GoogleBot
Disallow: /private
it means that:
A. GoogleBot is not allowed to crawl files or directories under the /private directory
B. GoogleBot is only allowed to crawl files or directories under the /private directory
C. GoogleBot is not allowed to crawl files or directories outside the /private directory
D. GoogleBot is only allowed to crawl files or directories outside the /private directory
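Python's standard `urllib.robotparser` module can evaluate the two-line rule directly, using the `Disallow` spelling. The example.com URLs and the agent name "SomeOtherBot" are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GoogleBot
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs and a hypothetical second crawler name.
checks = [
    ("GoogleBot", "https://example.com/private/report.html"),
    ("GoogleBot", "https://example.com/public/index.html"),
    ("SomeOtherBot", "https://example.com/private/report.html"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

The rule applies only to the named agent. Other crawlers are unaffected because no `User-agent: *` group exists. `Disallow` values are path prefixes, so `/private` also blocks paths such as `/private.html`.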