完整word版,高级数据库技术问答题(附答案)

合集下载

相关主题

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Database System

Problem 1

The hash join algorithm described in the textbook computes the natural join of two relations.

Write pseudocode for an iterator that implements a hash based algorithm for computing an “anti-join” operation:R anti-join S contains all tuples in R that have no matching tuples in S.Although anti-join is not an operation that can be specified directly in SQL, it is implemented in the query processor to support some types of queries. Your pseudocode must define the standard iterator functions open(),next(),and close(), and must fetch the input tuples using iterator calls (in other words, R and S may not be base relations, but may be outputs of other operators). Show what state information the iterator must maintain between calls. For simplicity, you can assume that S is small and easily fits in memory. Here are more details on anti-join: /wiki/Relational_algebra#Antijoin

Problem 2

Consider an index organized as a B+ tree. The leaf nodes contain pointers to a total of N records, and each block that makes up the index has m pointers. We wish to choose the value of m that will minimize search times on a particular disk device.

We are given the following information:

(a) For the disk that will hold the index, the time to read a given block into memory can be approximated by (25 + .02*m) milliseconds. The 25 milliseconds represent the seek and latency components of the read, the .02*m milliseconds is the transfer time. (That is, as m becomes larger, the larger the block will be and the more time it will take to read it into memory.)

(b) Once the block is in memory a binary search is used to find the right pointer. So the time to process a block in main memory is (a + b log_2 m) milliseconds, for some constants a , b. ("log_2 m" means log base 2 of m.)

(c) The main memory time constant "a" is much smaller than the disk seek and latency time of

25 milliseconds.

(d) The index is full, so that the number of blocks that must be

examined per search is log_m N.

Questions:

(i) What value of m minimizes the time to search for a given record? An approximate answer is OK. (HINT: If you come up with an equation which is hard to solve algebraically, try plugging in values to locate the root of the equation.) The value you obtain should be independent

of "b"!

(ii) What happens as the seek and latency constant (25ms) decreases? For instance, if this constant is cut in half, how does the optimum m value change?

Answer:

Let T1 be the time taken to read one block into main memory and T2 be the time to process that block in memory. If n blocks are examined to search for a given record, then the time taken for this search is :

n * (T1 + T2)

Substituting the appropriate expressions for all the variables, we get

T(m) = [ (25 + 0.02 m) + (a + b log2m) ] * log m N

Ignoring a in comparison with 30, we get

T(m) = [25 + 0.02 m + b log2m] * log m N

Substituting log2m by ln(m)/ln(2) and log m N by ln(N)/ln(m), we get

T(m) = [ (25 + 0.02m)/ln(m) ] * ln(N) + b * ln(N)/ln(2)