华南理工大学数据结构chapter9
合集下载
相关主题
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Searching sorted Arrays-- Jump Search.
For some value j, we check every j’th element in L, that is, we check elements L[j], L[2j], and so on. Define m such that mj≦ n < (m + 1)j, then the total cost of this algorithm is at most m + (j-1) 3-way comparisons. (3-way comparison means that we need to know if K is less than, equal to or greater than L[i].) The cost on n items with a jump of size j is
-> For dense sets (small range, high percentage of elements in set). -> Can use logical bit operators. 1 if element in set, 0 otherwise.
2) For Example: to find all primes that are odd numbers between 0 and 15
The car on the left hit the car I left. The car on 3 left hit 3 5 I 5. This is similar in spirit to Ziv-Lempel coding.
12
Searching in Sets
1) Using bit vector to represent the set.
PZipf 1 / i H n , H n 1 / i, loge n H n 1 loge n
i 1 n
Cn iPZipf i / i Η n n /H n n / loge n.
i 1 i 1
n
n
80/20 rule:
– 80% of accesses are to 20% of the records. – For distributions following 80/20 rule,
An exact-match query is a search for the record whose key value matches a specified key value. A range query is a search for all records whose key value falls within a specified range of key values.
Search
Given: Distinct keys k1, k2, …, kn and collection L of n records of the form (k1, I1), (k2, I2), …, (kn, In) where Ij is the information associated with key kj from record j for 1 <= j <= n. • Search Problem: For key value K, locate a record (kj, Ij) in L such that kj = K. (if one exists). Searching is a systematic method for locating the record(s) with key value kj = K.
1
ቤተ መጻሕፍቲ ባይዱ
Successful vs. Unsuccessful
A successful search is one in which a record with key kj = K is found. An unsuccessful search is one in which no record with kj = K is found (and no such record exists).
11
Text Compression Example
Application: Text Compression. Keep a table of words already seen, organized via Move-to-Front heuristic.
– – If a word not yet seen, send the word. Otherwise, send (current) index in the table.
4
Searching Unsorted and Sorted Arrays
Searching sorted Array-- Binary search (See Section 3.5)
Searching sorted Arrays– Dictionary search or interpolation search
3
Searching Unsorted and Sorted Arrays
Searching Unsorted Arrays-- Sequential Search
How many comparisons does linear search do on average?
For large collections of records that are searched repeatedly, sequential search is unacceptably slow.
n
if 1 i n 1 if i n
Cn
i ( i / 2 ) 2. i 1
7
Zipf Distributions
Applications:
– Distribution for frequency of word usage in natural languages. – Distribution for populations of cities, etc.
A hash function maps key values to positions. It is denoted by h. A hash table is an array that holds the records. It is denoted by HT. HT has M slots, indexed form 0 to M-1.
2
Approaches to Search
1. List search (sequential search or binary search). 2. Direct access by key value (hashing). 3. Tree indexing methods (Binary Search Tree, B-Trees).
Order lists by (expected) frequency of occurrence.
– Perform sequential search Assume that for each key ki, the probability pi that the record with key ki will be requested. Cost to access first record: 1 Cost to access second record: 2 Expected search cost:
When we know something about the distribution of key values, then search L at a position p that is appropriate to the value of K as follows.
5
Lists Ordered by Frequency
-> 0011010100010100 (from the left, 1 represents primes 2,3,5,7,11,13)
-> 0011010100010100 & 0101010101010101
13
Hashing (1)
Hashing: The process of mapping a key value to a position in a table.
14
Hashing (2)
For any value K in the key range and some hash function h, h(K) = i, 0 <= i < M, such that HT[i] = K. Hashing is appropriate only for sets (no duplicates). Good for both in-memory and disk-based applications. Hashing is not good for range queries, minimum or maximum queries and queries in key order.
Buffer pools are an example of a self-organizing list.
9
Heuristics
1. Order by actual historical frequency of access. (Similar to LFU buffer pool replacement strategy.) 2. Move-to-Front: When a record is found, move it to the front of the list. (Similar to LRU buffer pool replacement strategy.) 3. Transpose: When a record is found, swap it with the record ahead of it.
Cn 1p1 2 p2 ... npn .
6
Examples
(1) All records have equal frequency.
Cn
i / n (n 1) / 2
i 1
n
(2) Exponential Frequency
pi
{
1 / 2i 1 / 2 n 1
Cn 0.122n.
8
Self-Organizing Lists
Self-organizing lists modify the order of records within the list based on the actual pattern of record accesses.
Self-organizing lists use a heuristic for deciding how to reorder the list. These heuristics are similar to the rules for managing buffer pools.
10
Example 9.4-page 321 Assume that we have eight records, with key values A to H, and that they are initially placed in alphabetical Order (A B C D E F G H). Now, consider the result of applying the following access pattern: F D F G E G F A D F G E. 1) Count heuristic, the final list resulting will be F G D E A B C H, the total cost -- 45 comparisons. 2) Move-to-front heuristic, the final list resulting will be E G F D A B C H, the total cost -- 54 comparisons. 3) Transpose heuristic, the final list resulting will be A B F D G E C H, the total cost -- 62 comparisons.