Enumeration of Binary Trees and Universal Types
金康荣 (Jin Kangrong): A Chinese Text Classification Method Based on the Random Forest Algorithm
1. The Random Forest algorithm is widely used in Chinese text classification.
2. It combines multiple decision trees to improve classification accuracy.
3. It can effectively handle high-dimensional, sparse feature spaces.
4. It has been successfully applied to sentiment analysis, topic classification, and news categorization.
5. It can handle imbalanced datasets in text classification tasks.
6. Using feature-importance measures, it can identify the features that most influence the classification.
7. It is computationally efficient and scales to large datasets.
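Data Structures: Binary Trees (BinaryTree)
Introduction: The binary tree is a very common data structure. Note, however, that a binary tree is not a special case of a tree: binary trees and trees are two different data structures.
Contents
1. Definition of a binary tree
2. Why a binary tree is not a special kind of tree
3. The five basic forms of a binary tree
4. Binary tree terminology
5. Main properties of binary trees (6)
6. Storage structures of binary trees (2)
7. Traversal algorithms for binary trees (4)
8. Basic applications of binary trees: binary search trees, balanced binary trees, Huffman trees and Huffman coding
1. Definition of a binary tree
If you know the definition of a tree (a finite set of nodes with a hierarchical relationship), binary trees are easy to understand.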
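Definition: A binary tree is a finite set of n (n ≥ 0) nodes. It is a tree structure in which each node has at most two subtrees: it is either empty or consists of a root node together with a left subtree and a right subtree (each of which is itself a binary tree).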
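Note that a binary tree is almost the same as an "ordered tree in which every node has degree at most 2", yet a binary tree is not a special case of a tree. The reasons are analysed below.
2. Why a binary tree is not a special kind of tree
(1) A binary tree differs from an unordered tree. The subtrees of a binary tree are distinguished as left and right and cannot be swapped, whereas the subtrees of an unordered tree have no left/right distinction.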
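(2) A binary tree also differs from an ordered tree (this is the key point). When a node of an ordered tree has two subtrees, it can indeed be viewed as a binary tree, but when it has only one subtree, that subtree is not marked as left or right, as shown in the figure.
3. The five basic forms of a binary tree
4. Binary tree terminology
… is a full binary tree; the international definition, by contrast, calls a binary tree full if it has no node of degree 1, that is, every node has degree either 2 or 0. The two concepts are entirely different; since this text follows the domestic (Chinese) convention, the first definition is used by default.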
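Complete binary tree: number the nodes of a binary tree of depth K from top to bottom and, within each level, from left to right. If every node's number coincides with the number of the node in the same position of the full binary tree of depth K, the tree is a complete binary tree.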
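The numbering condition can be checked mechanically: scanning a complete binary tree in level order, no real node ever appears after the first empty position. A minimal C++ sketch follows; the Node type and the isComplete function are hypothetical names chosen for illustration.

```cpp
#include <queue>

// Minimal binary tree node used only for this illustration.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// A tree is complete iff a level-order scan never finds a real node
// after it has already met an empty (null) position.
bool isComplete(const Node* root) {
    std::queue<const Node*> q;
    q.push(root);
    bool seenGap = false;
    while (!q.empty()) {
        const Node* cur = q.front();
        q.pop();
        if (cur == nullptr) {
            seenGap = true;            // first hole in the level-order numbering
        } else {
            if (seenGap) return false; // a node appears after a hole
            q.push(cur->left);
            q.push(cur->right);
        }
    }
    return true;
}
```

An empty tree counts as complete here, matching the n ≥ 0 convention of the definition above.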
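As shown in the figure.
5. Main properties of binary trees
The properties of binary trees follow from their structure. There is no need to memorize them: look them up when needed, or derive them yourself from the structure of a binary tree.
Property 1: In a non-empty binary tree, the number of leaf nodes equals the number of double-branch nodes (nodes with two children) plus 1.
Proof: Let the number of leaf nodes be X, the number of single-branch nodes Y, and the number of double-branch nodes Z.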
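The proof breaks off after naming X, Y and Z. A standard way to finish it (a completion, not text from the original post) counts the branches of the tree in two ways, writing n for the total number of nodes:

```latex
\begin{aligned}
n &= X + Y + Z && \text{(each node is a leaf, single-branch, or double-branch)}\\
n - 1 &= Y + 2Z && \text{(every node except the root hangs from exactly one branch)}\\
X + Y + Z - 1 &= Y + 2Z \;\Longrightarrow\; X = Z + 1.
\end{aligned}
```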
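Data Structures: Trees and Binary Trees (slides)
The trees discussed in data structures are generally ordered trees.
Tsinghua University Press
Data Structures (C++ Edition)
5.1 Logical Structure of Trees
Basic Terminology of Trees
Forest: a set of m (m ≥ 0) mutually disjoint trees.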
Node layout: data | parent
data: stores the data of the tree node; parent: stores the array index of the node's parent.
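To illustrate the data/parent layout, here is a minimal C++ sketch of the parent-array representation with the Parent and Depth operations of the tree ADT in this section. The names PNode, parentOf and depth are hypothetical, the root is marked with parent index -1, and levels are counted with the root at level 1 (an assumption; some texts put the root at depth 0).

```cpp
#include <string>
#include <vector>

// One slot of the parent array: `data` holds the node's value and
// `parent` the array index of its parent (-1 marks the root).
struct PNode {
    std::string data;
    int parent;
};

// Parent(x): index of x's parent, or -1 if x is the root.
int parentOf(const std::vector<PNode>& tree, int x) {
    return tree[x].parent;
}

// Depth(): number of levels, root at level 1. Every node walks up to the
// root, which costs O(n * depth); acceptable for a sketch.
int depth(const std::vector<PNode>& tree) {
    int best = 0;
    for (int i = 0; i < static_cast<int>(tree.size()); ++i) {
        int level = 1;
        for (int p = tree[i].parent; p != -1; p = tree[p].parent) ++level;
        if (level > best) best = level;
    }
    return best;
}
```

Finding a parent is O(1) in this layout, while finding children requires a scan of the whole array, which is the usual trade-off of the parent representation.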
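Abstract Data Type Definition of a Tree
Parent
    Precondition: the tree exists
    Input: node x
    Function: find the parent of node x
    Output: information about the parent of x
    Postcondition: the tree is unchanged
Depth
    Precondition: the tree exists
    Input: none
    Function: compute the depth of the tree
    Output: the depth of the tree
    Postcondition: the tree is unchanged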
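PreOrder
    Precondition: the tree exists
    Input: none
    Function: traverse the tree in preorder
    Output: the preorder traversal sequence of the tree
    Postcondition: the tree is unchanged
PostOrder
    Precondition: the tree exists
    Input: none
    Function: traverse the tree in postorder
    Output: the postorder traversal sequence of the tree
    Postcondition: the tree is unchanged
endADT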
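The PreOrder and PostOrder operations can be sketched for a general tree stored as ordered lists of children. The representation (GNode holding child indices) and the function names are assumptions for illustration; the ADT itself does not prescribe a storage structure.

```cpp
#include <string>
#include <vector>

// Hypothetical general-tree node: a label plus the indices of its children,
// in left-to-right order.
struct GNode {
    char label;
    std::vector<int> children;
};

// PreOrder: visit the node, then each subtree from left to right.
void preOrder(const std::vector<GNode>& t, int v, std::string& out) {
    out += t[v].label;
    for (int c : t[v].children) preOrder(t, c, out);
}

// PostOrder: visit each subtree from left to right, then the node.
void postOrder(const std::vector<GNode>& t, int v, std::string& out) {
    for (int c : t[v].children) postOrder(t, c, out);
    out += t[v].label;
}
```

For the tree A(B(D, E), C), this gives the preorder sequence ABDEC and the postorder sequence DEBCA.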
Data Structures and Algorithm Analysis (C++, Second Edition), by Clifford A. Shaffer (USA): Solutions to Exercises, Part 2
5 Binary Trees
5.1 Consider a non-full binary tree. By definition, this tree must have some internal node X with only one non-empty child. If we modify the tree to remove X, replacing it with its child, the modified tree will have a higher fraction of non-empty nodes, since one non-empty node and one empty node have been removed.
5.2 Use as the base case the tree of one leaf node. The number of degree-2 nodes is 0, and the number of leaves is 1. Thus, the theorem holds.
For the induction hypothesis, assume the theorem is true for any tree with n − 1 nodes.
For the induction step, consider a tree T with n nodes. Remove from the tree any leaf node, and call the resulting tree T'. By the induction hypothesis, T' has one more leaf node than it has nodes of degree 2.
Now, restore the leaf node that was removed to form T. There are two possible cases.
(1) If this leaf node is the only child of its parent in T, then the number of nodes of degree 2 has not changed, nor has the number of leaf nodes. Thus, the theorem holds.
(2) If this leaf node is the child of a node in T with degree 2, then that node has degree 1 in T'. Thus, by restoring the leaf node we are adding one new leaf node and one new node of degree 2. Thus, the theorem holds.
By mathematical induction, the theorem is correct.
5.3 Base Case: For the tree of one leaf node, I = 0, E = 0, n = 0, so the theorem holds.
Induction Hypothesis: The theorem holds for any full binary tree containing n − 1 internal nodes.
Induction Step: Take an arbitrary tree (call it T) of n internal nodes. Select some internal node x from T whose two children are leaves, and remove those two leaves. Call the resulting tree T'. Tree T' is full and has n − 1 internal nodes, so by the Induction Hypothesis its path lengths satisfy E' = I' + 2(n − 1).
Call the depth of node x d. Restore the two children of x, each at level d + 1. We have now added d to the internal path length, since x is once again an internal node, so I = I' + d. We have added 2(d + 1) − d = d + 2 to the external path length, since we added the two leaf nodes but lost the contribution of x, so E = E' + d + 2. Thus E = I' + 2(n − 1) + d + 2 = (I' + d) + 2n = I + 2n, which is correct. Thus, by the principle of mathematical induction, the theorem is correct.
5.4 (a)
    template <class Elem>
    void inorder(BinNode<Elem>* subroot) {
      if (subroot == NULL) return;  // Empty, do nothing
      inorder(subroot->left());
      visit(subroot);               // Perform desired action
      inorder(subroot->right());
    }
(b)
    template <class Elem>
    void postorder(BinNode<Elem>* subroot) {
      if (subroot == NULL) return;  // Empty, do nothing
      postorder(subroot->left());
      postorder(subroot->right());
      visit(subroot);               // Perform desired action
    }
5.5 The key is to search both subtrees, as necessary.
    template <class Key, class Elem, class KEComp>
    bool search(BinNode<Elem>* subroot, Key K) {
      if (subroot == NULL) return false;
      if (subroot->value() == K) return true;
      if (search<Key, Elem, KEComp>(subroot->right(), K)) return true;
      return search<Key, Elem, KEComp>(subroot->left(), K);
    }
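The identity E = I + 2n proved in 5.3 is easy to check numerically. The sketch below, with hypothetical names FNode and pathLengths, assumes a full binary tree (every node has zero or two children) and measures depth from the root at depth 0.

```cpp
#include <memory>

// Minimal node for a full binary tree: both children present or both absent.
struct FNode {
    std::unique_ptr<FNode> left, right;
};

// Accumulates the internal path length I, the external path length E
// (sum of leaf depths) and the number n of internal nodes.
void pathLengths(const FNode* t, int depth, long& I, long& E, long& n) {
    if (!t->left && !t->right) {   // leaf
        E += depth;
        return;
    }
    I += depth;                    // internal node (assumed to have two children)
    ++n;
    pathLengths(t->left.get(), depth + 1, I, E, n);
    pathLengths(t->right.get(), depth + 1, I, E, n);
}
```

For a root whose left child has two leaf children and whose right child is a leaf, n = 2, I = 0 + 1 = 1 and E = 2 + 2 + 1 = 5 = I + 2n.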
5.6 The key is to use a queue to store subtrees to be processed.
    template <class Elem>
    void level(BinNode<Elem>* subroot) {
      AQueue<BinNode<Elem>*> Q;
      Q.enqueue(subroot);
      while (!Q.isEmpty()) {
        BinNode<Elem>* temp;
        Q.dequeue(temp);
        if (temp != NULL) {
          Print(temp);
          Q.enqueue(temp->left());
          Q.enqueue(temp->right());
        }
      }
    }
5.7
    template <class Elem>
    int height(BinNode<Elem>* subroot) {
      if (subroot == NULL) return 0;  // Empty subtree
      return 1 + max(height(subroot->left()), height(subroot->right()));
    }
5.8
    template <class Elem>
    int count(BinNode<Elem>* subroot) {
      if (subroot == NULL) return 0;    // Empty subtree
      if (subroot->isLeaf()) return 1;  // A leaf
      return 1 + count(subroot->left()) + count(subroot->right());
    }
5.9 (a) Since every node stores 4 bytes of data and 12 bytes of pointers, the overhead fraction is 12/16 = 75%.
(b) Since every node stores 16 bytes of data and 8 bytes of pointers, the overhead fraction is 8/24 ≈ 33%.
(c) Leaf nodes store 8 bytes of data and 4 bytes of pointers; internal nodes store 8 bytes of data and 12 bytes of pointers. Since the nodes have different sizes, the total space needed for internal nodes is not the same as for leaf nodes. Students must be careful to do the calculation correctly, taking the weighting into account. The correct formula looks as follows, given that there are x internal nodes and x leaf nodes:
$\frac{4x + 12x}{12x + 20x} = \frac{16}{32} = 50\%$.
(d) Leaf nodes store 4 bytes of data; internal nodes store 4 bytes of pointers. The formula looks as follows, given that there are x internal nodes and x leaf nodes:
$\frac{4x}{4x + 4x} = \frac{4}{8} = 50\%$.
5.10 If equal-valued nodes were allowed to appear in either subtree, then during a search for all nodes of a given value, whenever we encounter a node of that value the search would be required to search in both directions.
5.11 This tree is identical to the tree of Figure 5.20(a), except that a node with value 5 will be added as the right child of the node with value 2.
5.12 This tree is identical to the tree of Figure 5.20(b), except that the value 24 replaces the value 7, and the leaf node that originally contained 24 is removed from the tree.
5.13
    template <class Key, class Elem, class KEComp>
    int smallcount(BinNode<Elem>* root, Key K) {
      if (root == NULL) return 0;
      if (KEComp::gt(root->value(), K))  // Root too big: only the left subtree can count
        return smallcount<Key, Elem, KEComp>(root->leftchild(), K);
      else
        return smallcount<Key, Elem, KEComp>(root->leftchild(), K) +
               smallcount<Key, Elem, KEComp>(root->rightchild(), K) + 1;
    }
5.14
    template <class Key, class Elem, class KEComp>
    void printRange(BinNode<Elem>* root, Key low, Key high) {
      if (root == NULL) return;
      if (KEComp::lt(high, root->val()))       // All in range are to the left
        printRange<Key, Elem, KEComp>(root->left(), low, high);
      else if (KEComp::gt(low, root->val()))   // All in range are to the right
        printRange<Key, Elem, KEComp>(root->right(), low, high);
      else {                                   // Must process both children
        printRange<Key, Elem, KEComp>(root->left(), low, high);
        PRINT(root->value());
        printRange<Key, Elem, KEComp>(root->right(), low, high);
      }
    }
5.15 The minimum number of elements is contained in the heap with a single node at depth h − 1, for a total of $2^{h-1}$ nodes. The maximum number of elements is contained in the heap that has completely filled up level h − 1, for a total of $2^h - 1$ nodes.
5.16 The largest element could be at any leaf node.
5.17 The corresponding array will be in the following order (equivalent to level order for the heap):
12 9 10 5 4 1 8 7 3 2
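The bounds in 5.15 say that a heap of height h (root at depth 0, deepest level h − 1) holds between 2^(h−1) and 2^h − 1 elements; equivalently, a heap with n ≥ 1 elements has height ⌊log₂ n⌋ + 1. A small sketch of that relationship, with heapHeight as a hypothetical helper name:

```cpp
// Height (number of levels) of a heap, i.e. a complete binary tree, with
// n >= 1 nodes: floor(log2 n) + 1, computed by repeated halving.
int heapHeight(unsigned n) {
    int h = 0;
    while (n > 0) {  // each halving corresponds to moving up one level
        n >>= 1;
        ++h;
    }
    return h;
}
```

Both extremes from 5.15, 2^(h−1) and 2^h − 1 elements, map back to height h.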
5.18 (a) The array will take on the following order:
6 5 3 4 2 1
The value 7 will be at the end of the array.
(b) The array will take on the following order:
7 4 6 3 2 1
The value 5 will be at the end of the array.
5.19
    // Min-heap class
    template <class Elem, class Comp> class minheap {
    private:
      Elem* Heap;          // Pointer to the heap array
      int size;            // Maximum size of the heap
      int n;               // # of elements now in the heap
      void siftdown(int);  // Put element in correct place
    public:
      minheap(Elem* h, int num, int max)  // Constructor
        { Heap = h; n = num; size = max; buildHeap(); }
      int heapsize() const                // Return current size
        { return n; }
      bool isLeaf(int pos) const          // TRUE if pos a leaf
        { return (pos >= n/2) && (pos < n); }
      int leftchild(int pos) const
        { return 2*pos + 1; }             // Return leftchild pos
      int rightchild(int pos) const
        { return 2*pos + 2; }             // Return rightchild pos
      int parent(int pos) const           // Return parent position
        { return (pos-1)/2; }
      bool insert(const Elem&);           // Insert value into heap
      bool removemin(Elem&);              // Remove minimum value
      bool remove(int, Elem&);            // Remove from given pos
      void buildHeap()                    // Heapify contents
        { for (int i=n/2-1; i>=0; i--) siftdown(i); }
    };

    template <class Elem, class Comp>
    void minheap<Elem, Comp>::siftdown(int pos) {
      while (!isLeaf(pos)) {              // Stop if pos is a leaf
        int j = leftchild(pos); int rc = rightchild(pos);
        if ((rc < n) && Comp::gt(Heap[j], Heap[rc]))
          j = rc;                         // Set j to lesser child's value
        if (!Comp::gt(Heap[pos], Heap[j])) return;  // Done
        swap(Heap, pos, j);
        pos = j;                          // Move down
      }
    }

    template <class Elem, class Comp>
    bool minheap<Elem, Comp>::insert(const Elem& val) {
      if (n >= size) return false;        // Heap is full
      int curr = n++;
      Heap[curr] = val;                   // Start at end of heap
      // Now sift up until curr's parent <= curr
      while ((curr != 0) &&
             (Comp::lt(Heap[curr], Heap[parent(curr)]))) {
        swap(Heap, curr, parent(curr));
        curr = parent(curr);
      }
      return true;
    }

    template <class Elem, class Comp>
    bool minheap<Elem, Comp>::removemin(Elem& it) {
      if (n == 0) return false;           // Heap is empty
      swap(Heap, 0, --n);                 // Swap min with last value
      if (n != 0) siftdown(0);            // Siftdown new root val
      it = Heap[n];                       // Return deleted value
      return true;
    }

    // Remove value at specified position
    template <class Elem, class Comp>
    bool minheap<Elem, Comp>::remove(int pos, Elem& it) {
      if ((pos < 0) || (pos >= n)) return false;  // Bad pos
      swap(Heap, pos, --n);               // Swap with last value
      if (pos < n) {                      // Skip if the last value itself was removed
        while ((pos != 0) &&
               (Comp::lt(Heap[pos], Heap[parent(pos)]))) {
          swap(Heap, pos, parent(pos));   // Push up if smaller than parent
          pos = parent(pos);
        }
        siftdown(pos);                    // Push down if larger than a child
      }
      it = Heap[n];
      return true;
    }
5.20 Note that this summation is similar to Equation 2.5. To solve the summation requires the shifting technique from Chapter 14, so this problem may be too advanced for many students at this time. Note that 2f(n) − f(n) = f(n), but also that:
$2f(n) - f(n) = n\left(\frac{2}{4} + \frac{4}{8} + \frac{6}{16} + \cdots + \frac{2(\log n - 1)}{n}\right) - n\left(\frac{1}{4} + \frac{2}{8} + \frac{3}{16} + \cdots + \frac{\log n - 1}{n}\right) = n\left(\sum_{i=1}^{\log n - 1} \frac{1}{2^i} - \frac{\log n - 1}{n}\right) = n\left(1 - \frac{2}{n} - \frac{\log n - 1}{n}\right) = n - \log n - 1.$
5.21 Here are the final codes, rather than a picture.
l 00
h 010
i 011
e 1000
f 1001
j 101
d 11000
a 1100100
b 1100101
c 110011
g 1101
k 111
The average code length is 3.23445.
5.22 The set of sixteen characters with equal weight will create a Huffman coding tree that is complete with 16 leaf nodes all at depth 4. Thus, the average code length will be 4 bits. This is identical to the fixed-length code.
Thus, in this situation, the Huffman coding tree saves no space (and costs no space).
5.23 (a) By the prefix property, there can be no character with codes 0, 00, or 001x, where "x" stands for any binary string.
(b) There must be at least one code with each form 1x, 01x, 000x, where "x" could be any binary string (including the empty string).
5.24 (a) Q and Z are at level 5, so any string of length n containing only Q's and Z's requires 5n bits.
(b) O and E are at level 2, so any string of length n containing only O's and E's requires 2n bits.
(c) The weighted average is
$\frac{5 \cdot 5 + 10 \cdot 4 + 35 \cdot 3 + 50 \cdot 2}{100} = 2.7$ bits per character.
5.25 This is a straightforward modification.
    // Build a Huffman tree from minheap h1
    template <class Elem>
    HuffTree<Elem>*
    buildHuff(minheap<HuffTree<Elem>*, HHCompare<Elem> >* h1) {
      HuffTree<Elem> *temp1, *temp2, *temp3 = NULL;
      while (h1->heapsize() > 1) {  // While at least 2 items
        h1->removemin(temp1);       // Pull first two trees
        h1->removemin(temp2);       // off the heap
        temp3 = new HuffTree<Elem>(temp1, temp2);
        h1->insert(temp3);          // Put the new tree back on list
        delete temp1;               // Must delete the remnants
        delete temp2;               // of the trees we created
      }
      return temp3;                 // NULL if the heap held fewer than two trees
    }
6 General Trees
6.1 The following algorithm is linear on the size of the two trees.
    // Return TRUE iff t1 and t2 are roots of identical
    // general trees
    template <class Elem>
    bool Compare(GTNode<Elem>* t1, GTNode<Elem>* t2) {
      GTNode<Elem> *c1, *c2;
      if (((t1 == NULL) && (t2 != NULL)) ||
          ((t2 == NULL) && (t1 != NULL)))
        return false;
      if ((t1 == NULL) && (t2 == NULL)) return true;
      if (t1->val() != t2->val()) return false;
      c1 = t1->leftmost_child();
      c2 = t2->leftmost_child();
      while (!((c1 == NULL) && (c2 == NULL))) {
        if (!Compare(c1, c2)) return false;
        if (c1 != NULL) c1 = c1->right_sibling();
        if (c2 != NULL) c2 = c2->right_sibling();
      }
      return true;  // All children matched
    }
6.2 The following algorithm is Θ(n²).
    // Return true iff t1 and t2 are roots of identical
    // binary trees
    template <class Elem>
    bool Compare2(BinNode<Elem>* t1, BinNode<Elem>* t2) {
      if (((t1 == NULL) && (t2 != NULL)) ||
          ((t2 == NULL) && (t1 != NULL)))
        return false;
      if ((t1 == NULL) && (t2 == NULL)) return true;
      if (t1->val() != t2->val()) return false;
      if (Compare2(t1->leftchild(), t2->leftchild()))
        if (Compare2(t1->rightchild(), t2->rightchild()))
          return true;
      if (Compare2(t1->leftchild(), t2->rightchild()))
        if (Compare2(t1->rightchild(), t2->leftchild()))
          return true;
      return false;
    }
6.3
    template <class Elem>  // Print, postorder traversal
    void postprint(GTNode<Elem>* subroot) {
      for (GTNode<Elem>* temp = subroot->leftmost_child();
           temp != NULL; temp = temp->right_sibling())
        postprint(temp);
      if (subroot->isLeaf()) cout << "Leaf: ";
      else cout << "Internal: ";
      cout << subroot->value() << "\n";
    }
6.4
    template <class Elem>  // Count the number of nodes
    int gencount(GTNode<Elem>* subroot) {
      if (subroot == NULL) return 0;
      int count = 1;
      GTNode<Elem>* temp = subroot->leftmost_child();
      while (temp != NULL) {
        count += gencount(temp);
        temp = temp->right_sibling();
      }
      return count;
    }
6.5 The Weighted Union Rule requires that when two parent-pointer trees are merged, the smaller one's root becomes a child of the larger one's root. Thus, we need to keep track of the number of nodes in a tree. To do so, modify the node array to store an integer value with each node. Initially, each node is in its own tree, so the weights for each node begin as 1. Whenever we wish to merge two trees, check the weights of the roots to determine which has more nodes. Then, add to the weight of the final root the weight of the new subtree.
6.6
Node    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Parent -1  0  0  0  0  0  0  6  0  0  0  9  0  0 12  0
6.7 The resulting tree should have the following structure:
Node    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Parent  4  4  4  4 -1  4  4  0  0  4  9  9  9 12  9 -1
6.8 For eight nodes labeled 0 through 7, use the following series of equivalences:
(0, 1) (2, 3) (4, 5) (6, 7) (4, 6) (0, 2) (4, 0)
This requires checking fourteen parent pointers (two for each equivalence), but none are actually followed since these are all roots. It is possible to double the number of parent pointers checked by choosing direct children of roots in each case.
6.9 For the "lists of children" representation, every node stores a data value and a pointer to its list of children. Further, every child (every node except the root) has a record associated with it containing an index and a pointer. Indicating the size of the data value as D, the size of a pointer as P and the size of an index as I, the overhead fraction is
$\frac{3P + I}{D + 3P + I}$.
For the "Left Child/Right Sibling" representation, every node stores three pointers and a data value, for an overhead fraction of
$\frac{3P}{D + 3P}$.
The first linked representation of Section 6.3.3 stores with each node a data value and a size field (denoted by S). Each child (every node except the root) also has a pointer pointing to it. The overhead fraction is thus
$\frac{S + P}{D + S + P}$,
making it quite efficient.
The second linked representation of Section 6.3.3 stores with each node a data value and a pointer to the list of children. Each child (every node except the root) has two additional pointers associated with it to indicate its place on the parent's linked list. Thus, the overhead fraction is
$\frac{3P}{D + 3P}$.
6.10
    template <class Elem>
    BinNode<Elem>* convert(GTNode<Elem>* genroot) {
      if (genroot == NULL) return NULL;
      GTNode<Elem>* gtemp = genroot->leftmost_child();
      // Leftmost child becomes the left child; right sibling becomes the right child
      return new BinNode<Elem>(genroot->val(), convert(gtemp),
                               convert(genroot->right_sibling()));
    }
6.11
• Parent(r) = ⌊(r − 1)/k⌋ if 0 < r < n.
• Ith_child(r) = kr + I if kr + I < n.
• Left_sibling(r) = r − 1 if r mod k ≠ 1 and 0 < r < n.
• Right_sibling(r) = r + 1 if r mod k ≠ 0 and r + 1 < n.
6.12 (a) The overhead fraction is $\frac{4(k+1)}{4 + 4(k+1)}$.
(b) The overhead fraction is $\frac{4k}{16 + 4k}$.
(c) The overhead fraction is $\frac{4(k+2)}{16 + 4(k+2)}$.
(d) The overhead fraction is $\frac{2k}{2k + 4}$.
6.13 Base Case: The number of leaves in a non-empty tree of 0 internal nodes is (K − 1)·0 + 1 = 1. Thus, the theorem is correct in the base case.
Induction Hypothesis: Assume that the theorem is correct for any full K-ary tree containing n internal nodes.
Induction Step: Add K children to an arbitrary leaf node of the tree with n internal nodes. This new tree now has 1 more internal node and K − 1 more leaf nodes, so the theorem still holds. Thus, the theorem is correct, by the principle of Mathematical Induction.
6.14 (a) CA/BG///FEDD///H/I//
(b) CA/BG/FED/H/I
6.15
X
|
P
-----
| | |
C Q R
---
| |
V M
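The index arithmetic of 6.11 can be sketched for a 0-based array holding a complete k-ary tree with n nodes, where the children of r occupy positions kr + 1 through kr + k. Under that layout a node has a left sibling unless r mod k = 1 (it is a leftmost child) and a right sibling unless r mod k = 0 (it is a rightmost child). The function names are hypothetical, k ≥ 2 is assumed, and -1 signals a missing relative.

```cpp
// 0-based array layout of a complete k-ary tree with n nodes (k >= 2):
// the children of r are stored at k*r + 1 .. k*r + k.
int parentIdx(int r, int k, int n) {
    return (0 < r && r < n) ? (r - 1) / k : -1;
}

int childIdx(int r, int i, int k, int n) {  // i-th child, 1 <= i <= k
    int c = k * r + i;
    return (c < n) ? c : -1;
}

int leftSibling(int r, int k, int n) {      // leftmost children have r % k == 1
    return (0 < r && r < n && r % k != 1) ? r - 1 : -1;
}

int rightSibling(int r, int k, int n) {     // rightmost children have r % k == 0
    return (0 < r && r % k != 0 && r + 1 < n) ? r + 1 : -1;
}
```

With k = 3 and n = 10, node 0 has children 1, 2, 3; node 1 has children 4, 5, 6; node 2 has children 7, 8, 9.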
6.16 (a)
    // As converthelp processes the node list, curr is
    // incremented appropriately.
    template <class Elem>
    BinNode<Elem>* converthelp(char* inlist, int& curr) {
      if (inlist[curr] == '/') {
        curr++;
        return NULL;
      }
      BinNode<Elem>* temp = new BinNode<Elem>(inlist[curr++], NULL, NULL);
      temp->left = converthelp<Elem>(inlist, curr);
      temp->right = converthelp<Elem>(inlist, curr);
      return temp;
    }
    // Use a helper function with a pass-by-reference
    // variable to indicate current position in the
    // node list.
    template <class Elem>
    BinNode<Elem>* convert(char* inlist) {
      int curr = 0;
      return converthelp<Elem>(inlist, curr);
    }
(b)
    // As converthelp processes the node list, curr is
    // incremented appropriately.
    template <class Elem>
    BinNode<Elem>* converthelp(char* inlist, int& curr) {
      if (inlist[curr] == '/') {
        curr++;
        return NULL;
      }
      BinNode<Elem>* temp = new BinNode<Elem>(inlist[curr++], NULL, NULL);
      if (inlist[curr] != '\'') return temp;  // No internal-node mark: a leaf
      curr++;                                 // Eat the internal node mark.
      temp->left = converthelp<Elem>(inlist, curr);
      temp->right = converthelp<Elem>(inlist, curr);
      return temp;
    }
    // Use a helper function with a pass-by-reference
    // variable to indicate current position in the
    // node list.
    template <class Elem>
    BinNode<Elem>* convert(char* inlist) {
      int curr = 0;
      return converthelp<Elem>(inlist, curr);
    }
(c)
    // As converthelp processes the node list, curr is
    // incremented appropriately.
    template <class Elem>
    GTNode<Elem>* converthelp(char* inlist, int& curr) {
      if (inlist[curr] == ')') {
        curr++;
        return NULL;
      }
      GTNode<Elem>* temp = new GTNode<Elem>(inlist[curr++]);
      if (inlist[curr] == ')') {
        temp->insert_first(NULL);
        return temp;
      }
      temp->insert_first(converthelp<Elem>(inlist, curr));
      while (inlist[curr] != ')')
        temp->insert_next(converthelp<Elem>(inlist, curr));
      curr++;
      return temp;
    }
    // Use a helper function with a pass-by-reference
    // variable to indicate current position in the
    // node list.
    template <class Elem>
    GTNode<Elem>* convert(char* inlist) {
      int curr = 0;
      return converthelp<Elem>(inlist, curr);
    }
6.17 The Huffman tree is a full binary tree.
To decode, we do not need to know the weights of nodes, only the letter values stored in the leaf nodes. Thus, we can use a coding much like that of Equation 6.2, storing only a bit mark for internal nodes, and a bit mark and letter value for leaf nodes.
7 Internal Sorting
7.1 Base Case: For the list of one element, the double loop is not executed and the list is not processed. Thus, the list of one element remains unaltered and is sorted.
Induction Hypothesis: Assume that the list of n elements is sorted correctly by Insertion Sort.
Induction Step: The list of n + 1 elements is processed by first sorting the top n elements. By the induction hypothesis, this is done correctly. The final pass of the outer for loop will process the last element (call it X). This is done by the inner for loop, which moves X up the list until a value smaller than that of X is encountered. At this point, X has been properly inserted into the sorted list, leaving the entire collection of n + 1 elements correctly sorted. Thus, by the principle of Mathematical Induction, the theorem is correct.
7.2
    void StackSort(AStack<int>& IN) {
      AStack<int> Temp1, Temp2;
      while (!IN.isEmpty())              // Transfer to another stack
        Temp1.push(IN.pop());
      if (Temp1.isEmpty()) return;       // Nothing to sort
      IN.push(Temp1.pop());              // Put back one element
      while (!Temp1.isEmpty()) {         // Process rest of elems
        while (!IN.isEmpty() && (IN.top() > Temp1.top()))  // Find elem's place
          Temp2.push(IN.pop());
        IN.push(Temp1.pop());            // Put the element in
        while (!Temp2.isEmpty())         // Put the rest back
          IN.push(Temp2.pop());
      }
    }
7.3 The revised algorithm will work correctly, and its asymptotic complexity will remain Θ(n²). However, it will do about twice as many comparisons, since it will compare adjacent elements within the portion of the list already known to be sorted. These additional comparisons are unproductive.
7.4 While binary search will find the proper place to locate the next element, it will still be necessary to move the intervening elements down one position in the array. This requires the same number of operations as a sequential search. However, it does reduce the number of element/element comparisons, and may be somewhat faster by a constant factor since shifting several elements may be more efficient than an equal number of swap operations.
7.5 (a)
    template <class Elem, class Comp>
    void selsort(Elem A[], int n) {        // Selection Sort
      for (int i=0; i<n-1; i++) {          // Select i'th record
        int lowindex = i;                  // Remember its index
        for (int j=n-1; j>i; j--)          // Find least value
          if (Comp::lt(A[j], A[lowindex]))
            lowindex = j;
        if (i != lowindex)                 // Add check for exercise
          swap(A, i, lowindex);            // Put it in place
      }
    }
(b) There is unlikely to be much improvement; more likely the algorithm will slow down. This is because the time spent checking (n times) is unlikely to save enough swaps to make up for it.
(c) Try it and see!
7.6
• Insertion Sort is stable. A swap is done only if the lower element's value is LESS.
• Bubble Sort is stable. A swap is done only if the lower element's value is LESS.
• Selection Sort is NOT stable. The new low value is set only if it is actually less than the previous one, but the direction of the search is from the bottom of the array. Changing "less than" to "less than or equal to" when selecting the low key position makes the scan pick the earliest of several equal keys, but the swap that then moves A[i] into the low key's old slot can still carry A[i] past an equal key (for example, 2a 2b 1 becomes 1 2b 2a), so the swap-based array version remains unstable.
• Shell Sort is NOT stable. The sublist sorts are done independently, and it is quite possible to swap an element in one sublist ahead of its equal value in another sublist. Once they are in the same sublist, they will retain this (incorrect) relationship.
• Quicksort is NOT stable. After selecting the pivot, it is swapped with the last element. This action can easily put equal records out of place.
• Conceptually (in particular, the linked list version), Mergesort is stable. The array implementations are NOT stable, since, given that the sublists are stable, the merge operation will pick the element from the upper list before the lower list if they are equal. This is easily modified to replace "less than" with "less than or equal to."
• Heapsort is NOT stable. Elements in separate sides of the heap are processed independently, and could easily become out of relative order.
• Binsort is stable. Equal values that come later are appended to the list.
• Radix Sort is stable. While the processing is from bottom to top, the bins are also filled from bottom to top, preserving relative order.
7.7 In the worst case, the stack can store n records. This can be cut to log n in the worst case by putting the larger partition on FIRST, followed by the smaller. Thus, the smaller will be processed first, cutting the size of the next stacked partition by at least half.
7.8 Here is how I derived a permutation that will give the desired (worst-case) behavior:
a b c 0 d e f g   First, put 0 in pivot index (0+7)/2; assign labels to the other positions
a b c g d e f 0   First swap
0 b c g d e f a   End of first partition pass
0 b c g 1 e f a   Set d = 1; it is in pivot index (1+7)/2
0 b c g a e f 1   First swap
0 1 c g a e f b   End of partition pass
0 1 c g 2 e f b   Set a = 2; it is in pivot index (2+7)/2
0 1 c g b e f 2   First swap
0 1 2 g b e f c   End of partition pass
0 1 2 g b 3 f c   Set e = 3; it is in pivot index (3+7)/2
0 1 2 g b c f 3   First swap
0 1 2 3 b c f g   End of partition pass
0 1 2 3 b 4 f g   Set c = 4; it is in pivot index (4+7)/2
0 1 2 3 b g f 4   First swap
0 1 2 3 4 g f b   End of partition pass
0 1 2 3 4 g 5 b   Set f = 5; it is in pivot index (5+7)/2
0 1 2 3 4 g b 5   First swap
0 1 2 3 4 5 b g   End of partition pass
0 1 2 3 4 5 6 g   Set b = 6; it is in pivot index (6+7)/2
0 1 2 3 4 5 g 6   First swap
0 1 2 3 4 5 6 g   End of partition pass
0 1 2 3 4 5 6 7   Set g = 7.
Plugging the variable assignments into the original permutation yields:
2 6 4 0 1 3 5 7
7.9 (a) Each call to qsort costs Θ(i log i). Thus, the total cost is $\sum_{i=1}^{n} i \log i = \Theta(n^2 \log n)$.
(b) Each call to qsort costs Θ(n log n) for length(L) = n, so the total cost is Θ(n² log n).
7.10 All that we need to do is redefine the comparison test to use strcmp. The quicksort algorithm itself need not change. This is the advantage of parameterizing the comparator.
7.11 For n = 1000, n² = 1,000,000, n^1.5 = 1000·√1000 ≈ 32,000, and n log n ≈ 10,000. So, the constant factor for Shellsort can be anything less than about 32 times that of Insertion Sort for Shellsort to be faster. The constant factor for Quicksort can be anything less than about 100 times that of Insertion Sort for Quicksort to be faster.
7.12 (a) The worst case occurs when all of the sublists are of size 1, except for one list of size i − k + 1. If this happens on each call to SPLITk, then the total cost of the algorithm will be Θ(n²).
(b) In the average case, the lists are split into k sublists of roughly equal length. Thus, the total cost is Θ(n log_k n).
7.13 (This question comes from Rawlins.) Assume that all nuts and all bolts have a partner. We use two arrays N[1..n] and B[1..n] to represent nuts and bolts.
Algorithm 1: Using merge-sort to solve this problem.
First, split the input into n/2 sub-lists such that each sub-list contains two nuts and two bolts. Then sort each sub-list. We could well come up with a pair of nuts that are both smaller than either of a pair of bolts. In that case, all you can know is something like: N1, N2.
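The point of Exercise 7.4 shows up clearly in a short binary insertion sort: locating each element's place takes O(log i) comparisons, but moving the intervening elements is still linear, so the sort stays Θ(n²). This is a sketch built on standard-library calls (std::upper_bound, std::rotate), not the book's code; binaryInsertionSort is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Binary insertion sort: std::upper_bound finds where A[i] belongs in the
// sorted prefix A[0..i) with O(log i) comparisons, but std::rotate still
// shifts every intervening element one slot right, so total work is Theta(n^2).
// upper_bound returns the first element greater than the key, so equal keys
// keep their original order and the sort is stable.
template <class T>
void binaryInsertionSort(std::vector<T>& A) {
    for (std::size_t i = 1; i < A.size(); ++i) {
        auto pos = std::upper_bound(A.begin(), A.begin() + i, A[i]);
        std::rotate(pos, A.begin() + i, A.begin() + i + 1);
    }
}
```

Using std::lower_bound instead would also sort correctly but would place each element before its equals, giving up stability.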
Balanced Binary Trees
Vocabulary
树 tree; 子树 subtree; 森林 forest; 根 root; 叶子 leaf; 结点 node; 深度 depth; 层次 level; 双亲 parent; 孩子 children; 兄弟 sibling; 祖先 ancestor; 子孙 descendant
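Property 4
The depth of a nearly complete binary tree with n nodes is ⌊log₂ n⌋ + 1.
Proof: Suppose the depth is k. Then, by Property 2 and the definition of a nearly complete binary tree, we have: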
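The bound the proof relies on can be reconstructed as follows, assuming Property 2 is the usual statement that a binary tree of depth k has at most 2^k − 1 nodes, and that a nearly complete binary tree of depth k has levels 1 to k − 1 completely full:

```latex
2^{k-1} - 1 < n \le 2^{k} - 1
\;\Longrightarrow\; 2^{k-1} \le n < 2^{k}
\;\Longrightarrow\; k - 1 \le \log_2 n < k
\;\Longrightarrow\; k = \lfloor \log_2 n \rfloor + 1 .
```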
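Property 5
If the nodes of a nearly complete binary tree with n nodes (whose depth is ⌊log₂ n⌋ + 1) are numbered in level order (from level 1 to level ⌊log₂ n⌋ + 1, left to right within each level), then for any node i (1 ≤ i ≤ n):
(1) If i = 1, node i is the root and has no parent; if i > 1, its parent PARENT(i) is node ⌊i/2⌋.
(2) If 2i > n, node i has no left child (node i is a leaf); otherwise its left child LCHILD(i) is node 2i.
(3) If 2i + 1 > n, node i has no right child; otherwise its right child RCHILD(i) is node 2i + 1.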
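Property 5's numbering rules become simple index arithmetic when the tree is stored in an array with slot 0 left unused, so that node i lives at index i. The function names below are hypothetical, and 0 is used to mean that the requested node does not exist.

```cpp
// Level-order numbering from Property 5 (nodes numbered 1..n).
// Each function returns 0 when the requested relative does not exist.
int parentNo(int i) {
    return i > 1 ? i / 2 : 0;                 // integer division = floor(i/2)
}

int leftChildNo(int i, int n) {
    return 2 * i <= n ? 2 * i : 0;            // 2i > n: no left child (leaf)
}

int rightChildNo(int i, int n) {
    return 2 * i + 1 <= n ? 2 * i + 1 : 0;    // 2i + 1 > n: no right child
}
```

For example, with the six nodes A to F stored at indices 1 to 6, the left child of B (node 2) is node 4 (D), and the parent of F (node 6) is node 3 (C).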
[Figure: a binary tree with its left subtree and right subtree; the tree rotated clockwise by 45° (N marks null links); node fields Element, Left, Right; several binary trees.]
Depth of n_i ::= the length of the unique path from the root to n_i. Depth(root) = 0.
A Comparison of Random Binary Tree Generators
© British Computer Society 2002
Jarmo Siltaneva (1) and Erkki Mäkinen (2)
(1) Information Technology Center, City of Tampere, Lenkkeilijänkatu 8, Finn-Medi 2, FIN-33520 Tampere, Finland
(2) Department of Computer and Information Sciences, PO Box 607, FIN-33014 University of Tampere, Finland
Email: em@cs.uta.fi, Jarmo.Siltaneva@tt.tampere.fi
This paper empirically compares five linear-time algorithms for generating unbiased random binary trees. More specifically, we compare the relative asymptotic performance of the algorithms in terms of the numbers of various basic operations executed on average per tree node. Based on these numbers a ranking of the algorithms, which depends on the operation weights, can be deduced. A definitive ranking of the algorithms, however, hardly exists.
Received 1 January 2002; revised 15 May 2002
1. INTRODUCTION
Binary trees are essential in various branches of computer science [1]. From time to time, there is a need to generate random binary trees. For example, when testing or analysing a program that manipulates binary trees, it is advantageous to have an efficient method to generate random binary trees with a given number n of nodes. In this paper we only consider algorithms that assign equal probability to all members of the family of trees with n nodes, i.e. we always use the uniform distribution.
Mathematically, there are no problems at all in generating random binary trees: there exist algorithms to enumerate binary trees. Simply choose a random natural number i from the interval $[1 \ldots C_n]$, where the nth Catalan number $C_n = \binom{2n}{n}\frac{1}{n+1}$ gives the number of binary trees with n nodes, and output the ith binary tree from the enumeration with an unranking algorithm. Computationally the problem is more complicated. As Martin and Orr [2] observe, $C_{5000}$ needs over 2000 digits in decimal notation. It is preferable that algorithms generating random binary trees use only integers of polynomial size on n or, if possible, of size O(n). Notice, however, that we also
have to use probabilities; that is, real numbers from the interval $[0 \ldots 1]$.
The problem of exponential numbers in generating unbiased random binary trees can be tackled through the use of binary tree codings. Instead of directly generating binary trees, the algorithms actually generate random code words which are in one-to-one correspondence with binary trees. This requires an efficient way to travel between trees and code words. All the coding schemes considered here allow efficient transformation algorithms. These transformations are obvious when the code words are given, and are not considered in this paper. For further details concerning these transformations, the reader is referred to the original articles introducing the methods [2, 3, 4, 5, 6].
The rest of this paper is organized as follows. In Section 2 we recall the algorithms. Section 3 describes the organization of our tests and introduces the results in several data tables. In Section 4 we sum up the computational costs given in Section 3. Finally, in Section 5 we draw our conclusions. Sections 3–5 are based on [7].
2. THE ALGORITHMS
There are several linear-time random binary tree generators (for a survey, see [8]). In this section we recall five such algorithms. These algorithms can produce code words for random binary trees without any preliminary computations, while there is a sixth algorithm introduced by Johnsen [9] requiring a preprocessing phase which takes $O(n^2)$ time and space. Johnsen's algorithm is not considered here since, besides the need of a preprocessing phase, it uses integers of exponential size on n. Algorithms for structures related to binary trees are also presented in the literature, see e.g. [10].
2.1. Arnold and Sleep
Strings of balanced parentheses are well known to be in one-to-one correspondence with binary trees. The set of all balanced parentheses can be generated by the grammar with productions $S \to \{S\}S$, $S \to \lambda$, where λ stands for the empty string. (For notational clarity we use strings of curly brackets { and }.) Suppose
we are generating binary trees with n nodes. The corresponding strings of balanced parentheses are of length 2n. From left to right, we construct a balanced string by repeatedly choosing between a left parenthesis (corresponding to the production S → {S}S) and a right parenthesis (the production S → λ). The decision probabilities depend only on the number r of unmatched left parentheses produced so far and on the total number k of symbols remaining to be produced. Let A(r,k) denote the number of valid continuations when there are r unmatched left parentheses and k symbols remaining to be produced. The probabilities to produce left and right parentheses can be expressed in terms of A: the number of valid continuations starting with a left parenthesis is A(r+1,k−1), and the number of valid continuations starting with a right parenthesis is A(r−1,k−1). Hence, the probability P(r,k) to produce a right parenthesis when there are r unmatched left parentheses and k symbols remaining to be produced is

$$P(r,k) = \frac{A(r-1,k-1)}{A(r,k)}. \qquad (1)$$

A string of parentheses can be geometrically represented as a zig-zag line, starting from the origin and consisting of northeast and southeast edges of equal length. Each northeast edge represents a left parenthesis and each southeast edge represents a right parenthesis. A balanced string of parentheses has a drawing in which the line returns to the base line and has no edges below it. A geometric representation of the situation where there are r unmatched left parentheses and k symbols remaining to be produced is a similar zig-zag line from the point (0,r) to the point (k,0), not entering the negative region of the plane. Such paths are called positive paths. The other paths are negative paths. Arnold and Sleep [3] determine the number of positive paths, that is A(r,k), by subtracting the number of negative paths from the total number of paths.
This difference is

$$A(r,k) = \frac{2(r+1)}{k+r+2}\binom{k}{(k+r)/2}. \qquad (2)$$

Based on (1) and (2), the probability P(r,k) can now be written as

$$P(r,k) = \frac{r(k+r+2)}{2k(r+1)}. \qquad (3)$$

Note that r = k gives P(r,k) = 1. Equation (3) solves the problem of generating random binary trees: we choose random real numbers from the interval [0...1] and compare them to the results obtained by Equation (3) with the present values of r and k.

2.2. Atkinson and Sack

Atkinson and Sack [4] give a divide-and-conquer algorithm to generate random strings of balanced parentheses. A string of parentheses is said to be balanced with defect i if: (1) it contains an equal number of left and right parentheses, i.e. its zig-zag line returns to the base line; and (2) the zig-zag line has precisely 2i edges below the base line. Note that the set of balanced strings with defect 0 is the one having a one-to-one natural correspondence with binary trees. Let $B_{ni}$ stand for the set of balanced strings with defect i and with length 2n. The sets $B_{n0}, B_{n1}, \ldots, B_{nn}$ are disjoint, and their union $B_n$ is the set of all strings of parentheses containing an equal number of left and right parentheses. All the sets $B_{ni}$ have the same size $\binom{2n}{n}\frac{1}{n+1}$ [11]. The algorithm of Atkinson and Sack chooses a random member of $B_n$ and transforms it into the corresponding member of $B_{n0}$.

If w is a string of parentheses, we denote by $\bar{w}$ the string obtained by replacing each left parenthesis by a right parenthesis and each right parenthesis by a left parenthesis. Let w be a string (not necessarily balanced) containing an equal number of left and right parentheses. We say that w is reducible if it can be written in the form $w = w_1 w_2$, where both $w_1$ and $w_2$ are non-empty and contain an equal number of left and right parentheses. Otherwise, w is irreducible. If an irreducible string w contains an equal number of left and right parentheses, then either w or $\bar{w}$ is balanced.
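The following minimal C++ sketch illustrates these definitions. The names `defect`, `irreducibleFactors` and `defectHistogram` are ours, not the paper's. The sketch computes the defect of a string in $B_n$ by scanning the zig-zag heights and splits a string wherever its line returns to the base line. For small n it also counts every member of $B_n$ by defect. The counts should come out equal, in line with the claim that all sets $B_{ni}$ have the same size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Defect of a string with equally many '{' and '}': half the number of
// zig-zag edges lying below the base line. An edge from height h to h +/- 1
// lies below the base line exactly when one of its endpoints is negative.
int defect(const std::string& w) {
    int h = 0, below = 0;
    for (char c : w) {
        int next = h + (c == '{' ? 1 : -1);
        if (h < 0 || next < 0) ++below;
        h = next;
    }
    return below / 2;
}

// Split w into irreducible factors by cutting wherever the height returns to 0.
std::vector<std::string> irreducibleFactors(const std::string& w) {
    std::vector<std::string> factors;
    int h = 0;
    std::size_t start = 0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        h += (w[i] == '{') ? 1 : -1;
        if (h == 0) {
            factors.push_back(w.substr(start, i + 1 - start));
            start = i + 1;
        }
    }
    return factors;
}

// Count the members of B_n by defect by enumerating every arrangement of
// n '{' and n '}' (exponential; intended for small n only).
std::vector<int> defectHistogram(int n) {
    std::string w = std::string(n, '{') + std::string(n, '}');  // '{' < '}' in ASCII
    std::vector<int> count(n + 1, 0);
    do {
        ++count[defect(w)];
    } while (std::next_permutation(w.begin(), w.end()));
    return count;
}
```

For n = 3, `defectHistogram(3)` should report four defect classes of five strings each: 20 strings in all, with $C_3 = 5$ in each class.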
Moreover, w has a unique factorization $w = w_1 w_2 \ldots w_k$, where each $w_i$ is irreducible [4].

The algorithm of Atkinson and Sack first generates a random combination X of n integers from {1, 2, ..., 2n}. This is possible in linear time (for details see, e.g., [12]). Next, a random string $x = x_1 x_2 \ldots x_{2n}$ of parentheses is constructed by setting $x_i$ = { provided that i ∈ X; otherwise $x_i$ = }. There are an equal number of left and right parentheses in x, and hence x is in $B_n$. The crux of the algorithm is the mapping of x to a unique member of $B_{n0}$. Formally, we need a map $\varphi: B_n \to B_{n0}$ defined inductively as follows. When n = 0, we have $\varphi_0(\lambda) = \lambda$. For n > 0, we express $w \in B_n$ as w = uv, where u is non-empty and irreducible and v is of length 2s ≥ 0. Now we define $\varphi_n$ by setting $\varphi_n(w) = u\,\varphi_s(v)$ if u is balanced; otherwise $\varphi_n(w) = \{\,\varphi_s(v)\,\}\,\bar{t}$, where $u = \}\,t\,\{$. It is possible to prove that $\varphi_n$ is bijective on each $B_{ni}$ [4]. The method of Atkinson and Sack has the nice feature that it uses only integers of size at most 2n [4].

2.3. Korsh

Korsh [5] introduced a random binary tree generation algorithm based on bit sequence rotations. Korsh's method uses a binary tree coding scheme where the tree is given as a sequence of bit pairs, one pair for each node. The bits indicate whether or not the node has a non-null left and right subtree. Pairs are given in preorder. The code word of a tree with n nodes contains (n−1) 1-bits and (n+1) 0-bits. A k-rotation related to the node k of a Korsh code word is obtained by shifting the first k−1 pairs from the front to the end of the code word. Korsh [5] shows that any sequence of bit pairs with (n−1) 1-bits and (n+1) 0-bits is either a valid code word or a k-rotation of a unique code word. Hence, it is sufficient to randomly generate a bit pair sequence of appropriate length (as in the method of Atkinson and Sack) and then find the corresponding Korsh code word as follows. Suppose that d is an arbitrary sequence of bit pairs with (n−1) 1-bits and (n+1) 0-bits. Find the shortest prefix of d where the number of 0-bits exceeds the
number of 1-bits by two. If this prefix is proper (i.e. differs from d itself), shift the prefix to the end of d and repeat the operation. This process will halt in linear time with the desired result [5]. Like the method of Atkinson and Sack, Korsh's method uses only integers of size at most 2n [5].

2.4. Martin and Orr

Consider now the following coding method for binary trees. Each node in the right arm (the path from the root following right child pointers) is labelled with 0. If a node is a left child, its label is i+1, where i is the label of the parent. The label of a right child is the same as the label of its parent. Read the labels in preorder. The code word obtained is called an inversion table in [2].

Generating a binary tree is now equivalent to generating a code word $(x_1, x_2, \ldots, x_n)$. If $x_j = i$, Martin and Orr [2] use a cumulative probability distribution function F(k), which gives the probability that $x_{j+1} \in \{0, \ldots, k\}$, k ≤ i+1. If a is the number of all valid code words with the prefix so far produced, and b is the number of valid code words with the prefix so far produced augmented with any code item from the set {0, ..., k}, then F(k) = b/a. More generally, F is a function of n, the length of the code word, i, the previous code item, j, the position in the code word, and k, the upper bound for the next code item to be determined. Martin and Orr [2] give the following formula:

$$F(n,i,j,k) = \frac{(k+1)\,(n-j+i+2)!\,(2n-2j+k)!}{(i+2)\,(n-j+k+1)!\,(2n-2j+i+1)!}.$$

We can now choose a random number x from the interval [0,1), and find the largest m such that x ≥ F(n,i,j,m−1).
Then m is the next code item. Let P(n,i,j,k) = F(n,i,j,k) − F(n,i,j,k−1) denote the probability that k is the next code item, and let Q(n,j,k) = P(n,i,j,k−1)/P(n,i,j,k). To dispense with the factorials, Martin and Orr [2] derive the formulae

$$Q(n,j,k) = \frac{(k+1)(n-j+k+1)}{(k+2)(2n-2j+k-1)}$$

and

$$P(n,i,j,i+1) = \frac{(i+3)(n-j)}{(i+2)(2n-2j+i+1)}.$$

Because P(n,i,j,k−1) = Q(n,j,k)·P(n,i,j,k), it is now easy to compute the values of P for all necessary k's starting at i+1 (the highest possible value for k) and continuing iteratively towards 0 (the lowest possible value).

2.5. Rémy

Rémy [6] gave the following inductive algorithm to generate a random binary tree with n internal nodes and n+1 leaves:
• suppose that so far we have a binary tree with k internal nodes and k+1 leaves;
• randomly select one of the 2k+1 nodes, and denote the selected node by v;
• replace v by a new node;
• randomly choose v to be the left or right child of the new node; the other child of the new node is a new leaf, and the subtrees of v are kept unchanged;
• repeat the process of inserting nodes until the tree has n internal nodes and n+1 leaves.

The correctness of Rémy's algorithm can be proved by considering binary trees with leaves labelled by numbers 1, ..., n+1: it is easy to show by induction (see [6, 13] for details) that the algorithm generates all

$$C_n\,(n+1)! = \frac{(2n)!}{n!}$$

binary trees with labelled leaves, each with probability $\frac{n!}{(2n)!}$. We use the implementation of Rémy's algorithm given in [14].

3. THE TESTS

The algorithms recalled in the previous section all have linear time complexity. Hence, in order to compare the algorithms, we have to use finer methods than the order of magnitude of the time complexities. A straightforward method is to compare execution times. The results so obtained, however, depend on the environment in which the tests are performed. More general information is obtained if we count the numbers of various types of operations executed. We record the numbers of the following operation types:
• arithmetic operations (abbreviated in the sequel as ADD, MUL and
DIV);
• array references (ARR);
• random number generator calls (RAN);
• variable references and assignment statements (LOA, STO);
• arithmetic and logical comparisons (CMP);
• pointer references (PTR);
• recursive procedure calls (REC);
• miscellaneous operations (OTH).

We have to make a few simplifying assumptions concerning the recording. For example, we treat logical expressions as if they were always completely evaluated, thus disregarding situations where the value of an expression becomes known before the end of the evaluation. We count a multiplication by 2 as an addition and not as a bit shift. Furthermore, we disregard type conversions and the size of operands in arithmetic operations. The majority of the ADD operations are additions or subtractions by one, mostly in loop counter variables. We count these operations as normal additions instead of increment operations.

TABLE 1. The average numbers of the operations per node from the random generation of code words by Arnold and Sleep's algorithm.

| n | t | LOA | STO | CMP | ADD | MUL | DIV | ARR | RAN | OTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 34.99 | 10.75 | 5.00 | 10.25 | 2.00 | 1.00 | 2.00 | 1.00 | 0.00 |
| 10 | 10,000 | 39.29 | 11.30 | 5.45 | 12.10 | 3.00 | 1.50 | 2.00 | 1.50 | 0.00 |
| 100 | 10,000 | 43.44 | 11.91 | 5.93 | 13.78 | 3.88 | 1.94 | 2.00 | 1.94 | 0.00 |
| 1000 | 5000 | 43.94 | 11.99 | 5.99 | 13.98 | 3.99 | 1.99 | 2.00 | 1.99 | 0.00 |
| 10,000 | 1000 | 43.99 | 12.00 | 6.00 | 14.00 | 4.00 | 2.00 | 2.00 | 2.00 | 0.00 |
| 100,000 | 100 | 44.00 | 12.00 | 6.00 | 14.00 | 4.00 | 2.00 | 2.00 | 2.00 | 0.00 |

TABLE 2. The average numbers of the operations per node from the random generation of code words by Atkinson and Sack's algorithm.

| n | t | LOA | STO | CMP | ADD | MUL | DIV | ARR | RAN | REC | OTH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 85.28 | 26.90 | 12.32 | 15.91 | 1.00 | 0.00 | 15.35 | 1.00 | 0.91 | 0.91 |
| 10 | 10,000 | 81.07 | 25.68 | 11.34 | 15.57 | 1.00 | 0.00 | 15.53 | 1.00 | 0.57 | 0.57 |
| 100 | 10,000 | 76.74 | 24.47 | 10.38 | 15.17 | 1.00 | 0.00 | 15.83 | 1.00 | 0.18 | 0.18 |
| 1000 | 5000 | 75.52 | 24.14 | 10.11 | 15.04 | 1.00 | 0.00 | 15.95 | 1.00 | 0.06 | 0.06 |
| 10,000 | 1000 | 75.16 | 24.04 | 10.04 | 15.01 | 1.00 | 0.00 | 15.98 | 1.00 | 0.02 | 0.02 |
| 100,000 | 100 | 75.06 | 24.01 | 10.01 | 15.01 | 1.00 | 0.00 | 15.99 | 1.00 | 0.01 | 0.01 |

The correctness of the implementations used in the tests (i.e. the randomness of the trees produced) was verified by using
the Chi-square test, as in [14].

As an example, we give the detailed code for the Martin–Orr algorithm (cf. Section 2.4) with the recordings of the operations. The procedure generates one random code word. The notation c(INS,x) stands for incrementing by x the counter corresponding to the operation type INS:

```cpp
#include <random>
#include <vector>

// Generates one random Martin–Orr code word x[1..n] (x[0] unused), n >= 1.
// std::minstd_rand0 is the Park–Miller generator (a = 16807, m = 2^31 - 1).
std::vector<int> martinOrr(int n, std::minstd_rand0& gen) {
    std::uniform_real_distribution<double> randomGen(0.0, 1.0);
    std::vector<int> x(n + 1);                 // code word 1...n
    x[1] = 0;                                  // c(LOA,2);c(ARR,1);c(STO,1);
                                               // c(LOA,4*n-1);c(STO,n);   for
                                               // c(CMP,n);c(ADD,n-1);     for
    for (int j = 1; j <= n - 1; j++) {
        int i = x[j], k = i + 1;               // c(LOA,4);c(STO,2);c(ADD,1);c(ARR,1);
        double p = double((i + 3) * (n - j)) / double((i + 2) * (2 * n - 2 * j + i + 1));
                                               // c(LOA,12);c(STO,1);c(ADD,8);c(MUL,2);c(DIV,1);
        double sum = p;
        double random = randomGen(gen);        // c(RAN,1);c(LOA,1);c(STO,2);
        // "k > 0" guards against round-off leaving sum just below 1; it is not counted.
        while (random > sum && k > 0) {        // c(LOA,2);c(CMP,1);   while
            double q = double((k + 1) * (n - j + k + 1)) / double((k + 2) * (2 * n - 2 * j + k - 1));
                                               // c(LOA,14);c(STO,1);c(ADD,10);c(MUL,2);c(DIV,1);
            p = q * p;                         // p becomes P(n,i,j,k-1)
            sum = sum + p;
            k = k - 1;                         // c(LOA,6);c(STO,3);c(ADD,2);c(MUL,1);
        }                                      // c(LOA,2);c(CMP,1);   while
        x[j + 1] = k;                          // c(LOA,3);c(STO,1);c(ADD,1);c(ARR,1);
    }
    return x;
}
```

The above algorithm generates code words that represent binary trees according to Martin and Orr's coding system. Because the algorithms use different coding systems, we record the costs in two stages: generating random code words, and constructing the binary trees from the code words.

We apply the unit cost principle, i.e. operation costs do not depend on the size of the integers handled. The tests were performed in an environment where this is possible for integers not exceeding 2^31 − 1. We use the random number generator of Park and Miller [15] with parameters a = 7^5 and b = 2^31 − 1, giving more than 2 × 10^9 pseudorandom numbers before the sequence repeats itself.

TABLE 3. The average numbers of the operations per node from the random generation of code words by Korsh's algorithm.

| n | t | LOA | STO | CMP | ADD | MUL | DIV | ARR | RAN | OTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 103.10 | 27.41 | 20.21 | 18.03 | 0.75 | 0.00 | 15.25 | 0.75 | 0.00 |
| 10 | 10,000 | 105.20 | 27.34 | 20.15 | 19.26 | 0.90 | 0.00 | 16.30 | 0.90 | 0.00 |
| 100 | 10,000 | 106.10 | 27.13 | 20.05 | 19.96 | 0.99 | 0.00 | 16.93 | 0.99 | 0.00 |
| 1000 | 5000 | 106.01 | 27.03 | 20.00 | 19.99 | 1.00 | 0.00 | 16.99 | 1.00 | 0.00 |
| 10,000 | 1000 | 105.33 | 26.87 | 19.82 | 19.86 | 1.00 | 0.00 | 16.95 | 1.00 | 0.00 |
| 100,000 | 100 | 105.95 | 26.99 | 19.99 | 19.99 | 1.00 | 0.00 | 17.00 | 1.00 | 0.00 |

TABLE 4. The average numbers of the operations per node from the random generation of code words by Martin and Orr's algorithm.

| n | t | LOA | STO | CMP | ADD | MUL | DIV | ARR | RAN | OTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 31.86 | 7.77 | 2.26 | 14.31 | 3.02 | 1.26 | 1.75 | 0.75 | 0.00 |
| 10 | 10,000 | 40.40 | 9.50 | 2.65 | 18.90 | 4.05 | 1.65 | 1.90 | 0.90 | 0.00 |
| 100 | 10,000 | 47.14 | 10.83 | 2.96 | 22.54 | 4.89 | 1.96 | 1.99 | 0.99 | 0.00 |
| 1000 | 5000 | 47.91 | 10.98 | 3.00 | 22.95 | 4.99 | 2.00 | 2.00 | 1.00 | 0.00 |
| 10,000 | 1000 | 47.99 | 11.00 | 3.00 | 23.00 | 5.00 | 2.00 | 2.00 | 1.00 | 0.00 |
| 100,000 | 100 | 48.00 | 11.00 | 3.00 | 23.00 | 5.00 | 2.00 | 2.00 | 1.00 | 0.00 |

TABLE 5. The average numbers of the operations per node from the random generation of code words by Rémy's algorithm.

| n | t | LOA | STO | CMP | ADD | MUL | DIV | ARR | RAN | OTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 36.82 | 16.25 | 3.83 | 3.00 | 2.00 | 0.00 | 9.91 | 2.00 | 9.91 |
| 10 | 10,000 | 36.14 | 15.50 | 3.89 | 3.00 | 2.00 | 0.00 | 9.87 | 2.00 | 9.87 |
| 100 | 10,000 | 35.97 | 15.05 | 3.98 | 3.00 | 2.00 | 0.00 | 9.96 | 2.00 | 9.96 |
| 1000 | 5000 | 35.99 | 15.00 | 4.00 | 3.00 | 2.00 | 0.00 | 9.99 | 2.00 | 9.99 |
| 10,000 | 1000 | 36.00 | 15.00 | 4.00 | 3.00 | 2.00 | 0.00 | 10.00 | 2.00 | 10.00 |
| 100,000 | 100 | 36.00 | 15.00 | 4.00 | 3.00 | 2.00 | 0.00 | 10.00 | 2.00 | 10.00 |

TABLE 6. The average numbers of the operations per node from the construction of binary trees from the code words by Arnold and Sleep's and Atkinson and Sack's algorithms.

| n | t | LOA | STO | CMP | ADD | ARR | PTR | REC | OTH |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 13.75 | 4.75 | 3.00 | 2.00 | 2.00 | 3.50 | 2.00 | 0.00 |
| 10 | 10,000 | 13.90 | 4.90 | 3.00 | 2.00 | 2.00 | 3.80 | 2.00 | 0.00 |
| 100 | 10,000 | 13.99 | 4.99 | 3.00 | 2.00 | 2.00 | 3.98 | 2.00 | 0.00 |
| 1000 | 5000 | 14.00 | 5.00 | 3.00 | 2.00 | 2.00 | 4.00 | 2.00 | 0.00 |
| 10,000 | 1000 | 14.00 | 5.00 | 3.00 | 2.00 | 2.00 | 4.00 | 2.00 | 0.00 |

TABLE 7. The average numbers of the operations per node from the construction of binary trees from the code words by Korsh's algorithm.

| n | t | LOA | STO | CMP | ADD | ARR | PTR | REC | OTH |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 12.75 | 5.75 | 2.00 | 2.00 | 2.00 | 1.50 | 1.00 | 2.00 |
| 10 | 10,000 | 12.90 | 5.90 | 2.00 | 2.00 | 2.00 | 1.80 | 1.00 | 2.00 |
| 100 | 10,000 | 12.99 | 5.99 | 2.00 | 2.00 | 2.00 | 1.98 | 1.00 | 2.00 |
| 1000 | 5000 | 13.00 | 6.00 | 2.00 | 2.00 | 2.00 | 2.00 | 1.00 | 2.00 |
| 10,000 | 1000 | 13.00 | 6.00 | 2.00 | 2.00 | 2.00 | 2.00 | 1.00 | 2.00 |

TABLE 8. The average numbers of the operations per node from the construction of binary trees from the code words by Martin and Orr's algorithm.

| n | t | LOA | STO | CMP | ADD | ARR | PTR | REC | OTH |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 9.75 | 4.25 | 2.50 | 0.75 | 0.75 | 1.50 | 1.00 | 0.75 |
| 10 | 10,000 | 11.10 | 4.70 | 2.80 | 0.90 | 0.90 | 1.80 | 1.00 | 0.90 |
| 100 | 10,000 | 11.91 | 4.97 | 2.98 | 0.99 | 0.99 | 1.98 | 1.00 | 0.99 |
| 1000 | 5000 | 11.99 | 5.00 | 3.00 | 1.00 | 1.00 | 2.00 | 1.00 | 1.00 |
| 10,000 | 1000 | 12.00 | 5.00 | 3.00 | 1.00 | 1.00 | 2.00 | 1.00 | 1.00 |

TABLE 9. The average numbers of the operations per node from the construction of binary trees from the code words by Rémy's algorithm.

| n | t | LOA | STO | CMP | ADD | ARR | PTR | REC | OTH |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 10,000 | 23.25 | 8.75 | 6.75 | 0.00 | 4.50 | 4.00 | 2.25 | 9.00 |
| 10 | 10,000 | 21.90 | 8.30 | 6.30 | 0.00 | 4.20 | 4.00 | 2.10 | 8.40 |
| 100 | 10,000 | 21.09 | 8.03 | 6.03 | 0.00 | 4.02 | 4.00 | 2.01 | 8.04 |
| 1000 | 5000 | 21.01 | 8.00 | 6.00 | 0.00 | 4.00 | 4.00 | 2.00 | 8.00 |
| 10,000 | 1000 | 21.00 | 8.00 | 6.00 | 0.00 | 4.00 | 4.00 | 2.00 | 8.00 |

TABLE 10. The weights of the operations.

| LOA | STO | CMP | ADD | MUL | DIV | ARR | PTR | REC | RAN | OTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.5 | 1.5 | 2 | 2 | 4.5 | 2 | 10 | 32 | 2 |

3.1. Generating code words

In this section we give the observed numbers of the operations for the five methods from the random generation of code words. Tables 1–5 show the number n of nodes in trees, the number t of trees generated, and the average numbers of different operations per node (i.e. we sum up the corresponding numbers from all the t trees generated and divide the sums by nt). As a result, the numbers from different test runs are comparable even with differing values of n and t. When n is large enough, these values can be used to estimate and compare the asymptotic performance of the algorithms.

3.2. Constructing the trees

In this section we give the observed numbers of the operations per node from the construction of the resulting binary trees from the code words generated in the previous section. The data shown in Tables 6–9 are collected and presented as in the case of Tables 1–5. Notice that Arnold and Sleep's and Atkinson and Sack's algorithms
use the same coding method. Memory space for tree nodes is allocated from a static pool of n nodes. In Tables 6–9 there is no column for the memory space allocation operation, since it is executed once per node.

4. TOTAL COSTS

So far, we have counted the numbers of different operations the algorithms execute. These numbers are quite different among the algorithms. In order to compare the algorithms, we have to decide weights for different types of operations. The weights used in this paper are shown in Table 10. Naturally, reasonable sets of weights vary from one environment to another. When the weights are fixed, it is straightforward to give the total costs of the algorithms. Just multiply every number in Tables 1–9 by its corresponding weight and add up the numbers in each row. The final results obtained in this manner are given in Tables 11–15. The column Trees includes the weighted cost 36 of allocating memory space for one tree node.

The algorithms include many loops with a fixed number of iterations. In principle, the cost of the control structure of this kind of loop can be avoided if the body of the loop is duplicated in the program code the known number of times. As an aside, we recorded the amount by which ignoring these costs reduces the total costs. The reduction in the weighted costs of the generation of code words was of the order of 15% for Atkinson and Sack and for Korsh, 5% for Martin and Orr and for Rémy, and zero for Arnold and Sleep.
However, these results did not affect the ranking order of the algorithms.

TABLE 11. The weighted costs of Arnold and Sleep's algorithm.

| n | t | Codes | Trees | Total |
|---|---|---|---|---|
| 4 | 10,000 | 115.59 | 98.00 | 213.59 |
| 10 | 10,000 | 142.87 | 98.90 | 241.77 |
| 100 | 10,000 | 167.68 | 99.44 | 267.12 |
| 1000 | 5000 | 170.66 | 99.49 | 270.16 |
| 10,000 | 1000 | 170.97 | 99.50 | 270.47 |
| 100,000 | 100 | 171.00 | 99.50 | 270.50 |

TABLE 12. The weighted costs of Atkinson and Sack's algorithm.

| n | t | Codes | Trees | Total |
|---|---|---|---|---|
| 4 | 10,000 | 268.49 | 98.00 | 366.49 |
| 10 | 10,000 | 257.86 | 98.90 | 356.76 |
| 100 | 10,000 | 246.90 | 99.44 | 346.34 |
| 1000 | 5000 | 243.81 | 99.49 | 343.30 |
| 10,000 | 1000 | 242.91 | 99.50 | 342.41 |
| 100,000 | 100 | 242.64 | 99.50 | 342.14 |

TABLE 13. The weighted costs of Korsh's algorithm.

| n | t | Codes | Trees | Total |
|---|---|---|---|---|
| 4 | 10,000 | 281.97 | 86.50 | 368.47 |
| 10 | 10,000 | 295.61 | 87.40 | 383.01 |
| 100 | 10,000 | 303.07 | 87.94 | 391.01 |
| 1000 | 5000 | 303.45 | 87.99 | 391.45 |
| 10,000 | 1000 | 302.00 | 88.00 | 390.00 |
| 100,000 | 100 | 303.40 | 88.00 | 391.40 |

5. CONCLUSIONS

The level of convergence and monotonicity of the total costs in Tables 11–15 implies that our tests have been sufficiently long to estimate the relative constant factors of the time complexities (possibly excluding Korsh's algorithm, whose total cost did not behave monotonically as n increased).
We can conclude that exact asymptotic costs are found at least for Arnold and Sleep's, Martin and Orr's and Rémy's algorithms. The standard deviations of the total costs of the algorithms (again excluding Korsh's algorithm) are very small for n ≥ 1000.

First we discuss the numbers of different operation types in the algorithms. In most environments a call of a random number generator is far more expensive than the other operation types recorded. In Table 10 we have weighted it 32 times more expensive than the basic operations LOA and STO. A natural measure of the efficiency of the algorithms is then the number of calls of the random number generator per node. Arnold and Sleep's and Rémy's algorithms use two calls per node, while one call is sufficient for the other three algorithms.

TABLE 14. The weighted costs of Martin and Orr's algorithm.

| n | t | Codes | Trees | Total |
|---|---|---|---|---|
| 4 | 10,000 | 104.90 | 72.75 | 177.65 |
| 10 | 10,000 | 130.97 | 76.80 | 207.77 |
| 100 | 10,000 | 150.57 | 79.23 | 229.80 |
| 1000 | 5000 | 152.75 | 79.47 | 232.23 |
| 10,000 | 1000 | 152.98 | 79.50 | 232.47 |
| 100,000 | 100 | 153.00 | 79.50 | 232.50 |

TABLE 15. The weighted costs of Rémy's algorithm.

| n | t | Codes | Trees | Total |
|---|---|---|---|---|
| 4 | 10,000 | 195.72 | 146.88 | 342.60 |
| 10 | 10,000 | 194.14 | 140.35 | 334.49 |
| 100 | 10,000 | 194.25 | 136.44 | 330.68 |
| 1000 | 5000 | 194.45 | 136.04 | 330.50 |
| 10,000 | 1000 | 194.49 | 136.00 | 330.50 |
| 100,000 | 100 | 194.50 | 136.00 | 330.50 |

With respect to array references, Arnold and Sleep's and Martin and Orr's algorithms are clearly the best ones. They need only two array references per node, while the other algorithms need at least 10 array references. Moreover, the two algorithms with the lowest numbers of array references also make their references in consecutive array positions.
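To make the per-node count of random-number calls concrete, here is a minimal C++ sketch of the Arnold and Sleep generator from Section 2.1, driven by Equation (3). It is our own illustration, not the implementation measured in the tables. It assumes `std::minstd_rand0` (the Park and Miller minimal standard generator) as the random source, and the helper `sampleHistogram` is hypothetical. The sketch draws a random number only when the choice is not forced (r = 0 or r = k). It writes each symbol to the next consecutive position of the output.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>

// Arnold-Sleep generator: builds a uniformly random balanced string of
// 2n curly brackets from left to right. With r unmatched '{' so far and
// k symbols still to produce, '}' is chosen with probability
//   P(r, k) = r (k + r + 2) / (2 k (r + 1))      (Equation (3)).
std::string arnoldSleep(int n, std::minstd_rand0& gen) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::string s;
    s.reserve(2 * static_cast<std::size_t>(n));
    int r = 0;
    for (int k = 2 * n; k > 0; --k) {
        bool close;
        if (r == 0) {
            close = false;                 // P(0, k) = 0: forced '{'
        } else if (r == k) {
            close = true;                  // P(k, k) = 1: forced '}'
        } else {
            double p = double(r) * (k + r + 2) / (2.0 * k * (r + 1));
            close = uniform(gen) < p;      // one random number per free choice
        }
        s.push_back(close ? '}' : '{');    // symbols go to consecutive positions
        r += close ? -1 : 1;
    }
    return s;
}

// Draws `trials` strings of length 2n and counts each distinct result.
std::map<std::string, int> sampleHistogram(int n, int trials, unsigned seed) {
    std::minstd_rand0 gen(seed);
    std::map<std::string, int> freq;
    for (int t = 0; t < trials; ++t) ++freq[arnoldSleep(n, gen)];
    return freq;
}
```

With n = 3, the sampled histogram should contain the five balanced strings ($C_3 = 5$) with roughly equal frequencies.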
Tables 11–15 suggest the following overall order of the algorithms: 1. Martin and Orr; 2. Arnold and Sleep; 3. Rémy; 4. Atkinson and Sack; 5. Korsh. However, as we let n increase from 10^5 up to 10^6, run-time errors occurred with Arnold and Sleep's and Martin and Orr's algorithms because of integer overflow. These errors do not affect the results reported in this paper. Moreover, they could be avoided if we followed the algorithms less literally, i.e. rearranged the order of evaluation of arithmetic expressions with the help of floating-point calculations. This is achieved with only a few additional operations. The extra cost incurred is not crucial, since on modern computers floating-point arithmetic, built into hardware, can be quite close to integer arithmetic in speed.

Execution times for the generation of code words (measured in the VAX 7000-830 environment) with average size trees (n = 10^4) gave the order 1. Martin and Orr; 2. Atkinson and Sack; 3. Korsh; 4. Arnold and Sleep; 5. Rémy.

In summary, Martin and Orr's and Arnold and Sleep's algorithms performed best in terms of weighted costs, but this is offset by the fact that they use larger integers than the other algorithms (see also [8]). Because the algorithms do not differ very much in performance, their overall order may vary depending on the weights, i.e. on the implementation of the operations. In any case, weighted costs based on different operation weights can easily be recalculated from Tables 1–9.
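Recalculating the totals with other weights amounts to taking the dot product of a row of Tables 1–9 with a weight vector. The following minimal C++ sketch uses the Table 10 weights. The enum and the names `weightedCost` and `kNodeAllocationCost` are ours and are chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Operation types in the column order of Table 10.
enum Op { LOA, STO, CMP, ADD, MUL, DIV, ARR, PTR, REC, RAN, OTH, NUM_OPS };

// Table 10 weights.
constexpr std::array<double, NUM_OPS> kWeights = {1, 1, 1.5, 1.5, 2, 2, 4.5, 2, 10, 32, 2};

// Weighted cost of one table row: the sum of (per-node count x weight).
// Columns missing from a table (e.g. PTR in Tables 1-5) are simply 0.
double weightedCost(const std::array<double, NUM_OPS>& counts,
                    const std::array<double, NUM_OPS>& weights = kWeights) {
    double total = 0.0;
    for (std::size_t op = 0; op < NUM_OPS; ++op) total += counts[op] * weights[op];
    return total;
}

// Weighted cost of allocating one tree node, included in the Trees column.
constexpr double kNodeAllocationCost = 36.0;
```

For example, the Martin and Orr row for n = 100,000 in Table 4 gives 153.00. The Table 8 row for n = 10,000, plus the allocation cost of 36, gives 79.50. Both match Table 14. A different environment only needs a different `weights` argument.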
ACKNOWLEDGEMENT

This work of the second author was supported by the Academy of Finland (project 35025).

REFERENCES

[1] Knuth, D. E. (1997) The Art of Computer Programming. Vol. 1, Fundamental Algorithms (3rd edn). Addison-Wesley, Reading, MA.
[2] Martin, H. W. and Orr, B. O. (1989) A random binary tree generator. In Proc. ACM 17th Annual Computer Science Conf., Louisville, KY, 21–23 February, pp. 33–38. ACM Press, New York.
[3] Arnold, D. B. and Sleep, M. R. (1980) Uniform random generation of balanced parenthesis strings. ACM Trans. Program. Lang. Syst., 2, 122–128.
[4] Atkinson, M. D. and Sack, J.-R. (1992) Generating binary trees at random. Inf. Process. Lett., 41, 21–23.
[5] Korsh, J. F. (1993) Counting and randomly generating binary trees. Inf. Process. Lett., 45, 291–294.
[6] Rémy, J. L. (1985) Un procédé itératif de dénombrement d'arbres binaires et son application à leur génération aléatoire. RAIRO Inform. Théor., 19, 179–195.
[7] Siltaneva, J. (2000) Random Generation of Binary Trees. Master's Thesis, Department of Computer and Information Sciences, University of Tampere, Finland. (In Finnish.) Also available at http://www.cs.uta.fi/research/theses/masters/.
[8] Mäkinen, E. (1999) Generating random binary trees – a survey. Inf. Sciences, 115, 123–136.
[9] Johnsen, B. (1991) Generating binary trees with uniform probability. BIT, 31, 15–31.
[10] Pallo, J. M. (1994) On the listing and random generation of hybrid binary trees. Int. J. Comput. Math., 50, 135–145.
[11] Feller, W. (1968) Introduction to Probability and Its Applications (3rd edn). Vol. 1, Wiley, New York.
[12] Reingold, E. M., Nievergelt, J. and Deo, N. (1977) Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ.
[13] Alonso, L. and Schott, R. (1995) Random Generation of Trees. Kluwer, Boston, MA.
[14] Mäkinen, E. and Siltaneva, J. (2001) A note on Rémy's algorithm for generating random binary trees. Missouri J. Math. Sci. To appear. Also available at http://www.cs.uta.fi/reports/r2000.html.
[15] Park, S. K. and Miller, K. W. (1988) Random number generators: good ones are hard to find. Commun. ACM, 31, 1192–1201.
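Solutions to the 2nd National College Student Algorithm Design and Programming Challenge, 2020–2021 (Winter Contest)

Warm-up round. Problem description: The enemy lies beyond the sea! To regain their freedom, the Eldian Empire begins assembling an imperial Titan army, using the power of Eren's Founding Titan to command it in battle.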
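There are 12 Titans of very peculiar heights: the i-th Titan is i meters tall.
Eren wants to line these 12 Titans up in a row.
He wants to know how many different arrangements of the 12 Titans there are.
For example, 3 Titans can be arranged in 6 ways: {1,2,3}, {1,3,2}, {2,3,1}, {2,1,3}, {3,1,2}, {3,2,1}. Output the number of arrangements of the 12 Titans.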
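Task: output the number of permutations of 12 elements (12! = 479001600).

Approach 1: enumerate the permutations with the standard-library function next_permutation.

```cpp
#include <algorithm>
#include <iostream>
using namespace std;

int main() {
    int sum = 1;  // count the initial, sorted arrangement
    int tr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    while (next_permutation(tr, tr + 12)) {
        sum++;    // one more distinct arrangement
    }
    cout << sum << endl;
}
```

Approach 2: this is just the high-school permutation count A(n, n) = n!.

```cpp
#include <iostream>
using namespace std;

int main() {
    int sum = 1;
    for (int i = 1; i <= 12; ++i)
        sum *= i;  // 12! = 479001600 still fits in a 32-bit int
    cout << sum << endl;
}
```

Problem description: On the distant continent of Calradia, people love riding horses and fighting with swords.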
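Wars on the continent have always been bloody, but for the sake of harmonious socialist development we can settle the outcome with a game of three-in-a-row! The board is S × T, and the number of players is not fixed. With the Eye of God open, can you tell who won this "bloody" game of three-in-a-row? Winning condition: if some row, some column, or some diagonal (from top-left to bottom-right, or from top-right to bottom-left) contains at least three consecutive identical pieces, that piece belongs to the winning player.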
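The input format is not given in the text above, so the following C++ sketch assumes a hypothetical representation: the board is S strings of length T, with '.' for an empty cell and any other character for a player's piece. It checks the stated winning condition by looking at the next two cells from every piece in four directions: right, down, down-right and down-left. The opposite four directions are covered by starting from the other end of the line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the piece that has at least three identical pieces in a row
// (horizontally, vertically or on either diagonal), or '.' if none does.
char threeInARowWinner(const std::vector<std::string>& board) {
    const int S = static_cast<int>(board.size());
    const int T = S > 0 ? static_cast<int>(board[0].size()) : 0;
    const int dr[] = {0, 1, 1, 1};   // right, down, down-right, down-left
    const int dc[] = {1, 0, 1, -1};
    for (int r = 0; r < S; ++r) {
        for (int c = 0; c < T; ++c) {
            char p = board[r][c];
            if (p == '.') continue;
            for (int d = 0; d < 4; ++d) {
                int r2 = r + 2 * dr[d], c2 = c + 2 * dc[d];   // third cell of the line
                if (r2 < 0 || r2 >= S || c2 < 0 || c2 >= T) continue;
                if (board[r + dr[d]][c + dc[d]] == p && board[r2][c2] == p) return p;
            }
        }
    }
    return '.';
}
```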
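Algorithm Analysis and Design (China University MOOC): Chapter Exercise Answers and Final Exam Question Bank, 2023

1. Every polynomial-time algorithm is a good, efficient algorithm. Answer: False
2. The time complexity of selection sort is O(____). Answer: n^2
3. Methods for generating subsets include (). Answer: incremental construction; bit-vector method; binary method
4. The time complexity of bubble sort is Ω(n^2). Answer: False
5. In the binary method of subset generation, an AND operation on two subsets yields their union. Answer: False
6. Which of the following is not a method for proving a greedy algorithm correct? () Answer: optimization
7. A solution that maximizes (or minimizes) the objective function is the problem's (). Answer: optimal solution
8. For dense graphs, the () algorithm is more suitable for computing an MST. Answer: Prim
9. The time complexity of the greedy algorithm for interval scheduling is (). Answer: O(n log n)
10. Algorithms that solve the minimum spanning tree problem include (). Answer: Kruskal; Sollin; Prim
11. A feasible solution of a problem is a solution that satisfies the constraints. Answer: True
12. The idea of a greedy algorithm is to seek locally optimal solutions and, step by step, reach a globally optimal solution. Answer: True
13. A greedy algorithm always finds a feasible solution, and that solution is optimal. Answer: False
14. Shortest-path problems with negative edge weights can be solved with Dijkstra's algorithm. Answer: False
15. Let S be a subset of the vertices and e the minimum-weight edge among the edges with exactly one endpoint in S; then the minimum spanning tree must contain e. Answer: True
16. The essential elements of a recursive function are (). Answer: boundary condition; recurrence equation
17. If T(n) = T(n−1) + n and T(1) = 1, then T(n) = (). Answer: n(n+1)/2; Ω(n^2); Θ(n^2); O(n^2)
18. A recursive algorithm is an algorithm that calls itself directly or indirectly. Answer: True
19. Recursion starts from simple problems and advances step by step until the problem is solved; it is a forward process. Answer: False
20. Every recursive algorithm can in principle be converted into an equivalent iterative algorithm, but not vice versa. Answer: False
21. Given 5000 unordered elements, to select the 10 largest as quickly as possible, the best choice is the () method. Answer: bubble sort
22. The time complexity of the divide-and-conquer algorithm for finding the median of n elements is O(___). Answer: n
23. The military tactics of outflanking and encircling, penetrating and splitting, and annihilating the enemy one unit at a time embody the () idea.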
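Items 3 and 5 refer to the binary (bit-mask) method of subset generation. Below is a minimal C++ sketch of that method; the function names are ours. It also shows why item 5 is false: with this encoding, bitwise OR gives the union of two subsets and bitwise AND gives their intersection.

```cpp
#include <cassert>
#include <vector>

// Binary method: for the set {0, ..., n-1}, mask m represents the subset
// that contains i exactly when bit i of m is set (n < 32 assumed).
std::vector<std::vector<int>> allSubsets(int n) {
    std::vector<std::vector<int>> subsets;
    for (unsigned m = 0; m < (1u << n); ++m) {
        std::vector<int> s;
        for (int i = 0; i < n; ++i)
            if (m & (1u << i)) s.push_back(i);
        subsets.push_back(s);
    }
    return subsets;
}

// Set operations on masks: OR is union, AND is intersection.
unsigned unionOf(unsigned a, unsigned b) { return a | b; }
unsigned intersectionOf(unsigned a, unsigned b) { return a & b; }
```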
Compiler Principles, Chapter 4: Answers to Exercises
2) Grammar: S → SS+ | SS* | a. Input string: aaa*a++. Bottom-up parse:
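The parse steps themselves are not reproduced above. Because this grammar is postfix, a shift-reduce parser can reduce greedily. It turns `a` into S immediately, and it turns `S S +` or `S S *` into S as soon as the operator is shifted. The following minimal C++ sketch records each right-sentential form after every reduction, that is, the stack followed by the unread input. The name `shiftReduce` and the trace format are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Shift-reduce recognizer for S -> S S + | S S * | a. After every shift it
// reduces greedily; the handle is always on top of the stack.
bool shiftReduce(const std::string& input, std::vector<std::string>* trace) {
    std::string stack;
    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        if (c != 'a' && c != '+' && c != '*') return false;
        stack.push_back(c);                                   // shift
        if (c == 'a') {
            stack.back() = 'S';                               // reduce S -> a
        } else if (stack.size() >= 3 && stack[stack.size() - 2] == 'S' &&
                   stack[stack.size() - 3] == 'S') {
            stack.resize(stack.size() - 3);
            stack.push_back('S');                             // reduce S -> S S op
        } else {
            return false;                                     // no handle: reject
        }
        if (trace) trace->push_back(stack + input.substr(i + 1));
    }
    return stack == "S";                                      // accept
}
```

For aaa*a++ the recorded forms are Saa*a++, SSa*a++, SSS*a++, SSa++, SSS++, SS+ and S. Read in reverse order, they form a rightmost derivation.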
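c. Eliminating left recursion: S → aS′, S′ → SAS′ | ε, A → + | *
Substituting for S:
S → aS′, S′ → aS′AS′ | ε, A → + | *
d. Is the resulting grammar suitable for top-down parsing? Yes, because it has neither common left factors nor left recursion.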
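Part d can be illustrated with a recursive-descent recognizer for the transformed grammar. In the minimal C++ sketch below, each nonterminal becomes one method; the class and method names are ours. One symbol of lookahead selects the alternative for S′: it takes aS′AS′ when the next symbol is a, and ε otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Recursive-descent recognizer for
//   S  -> a S'
//   S' -> a S' A S' | epsilon
//   A  -> + | *
class PostfixParser {
public:
    explicit PostfixParser(std::string in) : in_(std::move(in)) {}
    // Accept only if S derives the whole input.
    bool parse() { pos_ = 0; return S() && pos_ == in_.size(); }

private:
    std::string in_;
    std::size_t pos_ = 0;

    char peek() const { return pos_ < in_.size() ? in_[pos_] : '$'; }  // '$' marks the end
    bool match(char c) {
        if (peek() != c) return false;
        ++pos_;
        return true;
    }
    bool S() { return match('a') && Sp(); }
    bool Sp() {                                   // S'
        if (peek() == 'a')                        // FIRST(a S' A S') = {a}
            return match('a') && Sp() && A() && Sp();
        return true;                              // epsilon on +, * or end of input
    }
    bool A() { return match('+') || match('*'); }
};
```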
4.4.3 S → SS+ | SS* | a
FIRST(S) = {a}
Since S is the start symbol, add $ to FOLLOW(S). For the first S in S → SS+, add FIRST(S+) = {a} to FOLLOW(S). For the first S in S → SS*, add FIRST(S*) = {a} to FOLLOW(S). For the second S in S → SS+, add FIRST(+) = {+} to FOLLOW(S). For the second S in S → SS*, add FIRST(*) = {*} to FOLLOW(S). Therefore FOLLOW(S) = {a, +, *, $}.
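The same FIRST/FOLLOW computation can be mechanized by fixed-point iteration. The minimal C++ sketch below assumes a grammar without ε-productions, which holds for S → SS+ | SS* | a. It also assumes single-character symbols, with upper-case letters as nonterminals. All names are ours.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Production { char lhs; std::string rhs; };
using SymbolSets = std::map<char, std::set<char>>;

bool isNonterminal(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }

// FIRST(X) for every nonterminal X. Without epsilon-productions, FIRST of a
// right-hand side is FIRST of its first symbol.
SymbolSets firstSets(const std::vector<Production>& g) {
    SymbolSets first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Production& p : g) {
            char x = p.rhs[0];
            std::set<char> add = isNonterminal(x) ? first[x] : std::set<char>{x};
            for (char t : add) changed |= first[p.lhs].insert(t).second;
        }
    }
    return first;
}

// FOLLOW(B) for every nonterminal B, iterated until nothing changes.
SymbolSets followSets(const std::vector<Production>& g, char start) {
    SymbolSets first = firstSets(g), follow;
    follow[start].insert('$');
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Production& p : g) {
            for (std::size_t i = 0; i < p.rhs.size(); ++i) {
                char b = p.rhs[i];
                if (!isNonterminal(b)) continue;
                std::set<char> add;
                if (i + 1 < p.rhs.size()) {
                    char next = p.rhs[i + 1];  // no epsilon: only the next symbol matters
                    add = isNonterminal(next) ? first[next] : std::set<char>{next};
                } else {
                    add = follow[p.lhs];       // B ends the rule: FOLLOW(lhs) goes into FOLLOW(B)
                }
                for (char t : add) changed |= follow[b].insert(t).second;
            }
        }
    }
    return follow;
}
```

For this grammar the sketch yields FIRST(S) = {a} and FOLLOW(S) = {$, +, *, a}, as derived above.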
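S → A | B
A → AA | E0E (A generates strings with more 0s than 1s)
B → BB | E1E (B generates strings with more 1s than 0s)
E → 0E1E | 1E0E | ε (E generates strings with equal numbers of 0s and 1s)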
5) The set of all strings of 0s and 1s that do not contain the substring 011.
S → AB
A → 1A | ε
B → 0B | 01B | ε
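Since L(A) = 1* and L(B) = (0 | 01)*, the grammar generates the regular language 1*(0 | 01)*. The brute-force C++ sketch below compares this language with the specification "no substring 011" on every 0/1 string up to a given length. The names are ours. This is a sanity check on short strings, not a proof.

```cpp
#include <cassert>
#include <regex>
#include <string>

// L(S) = L(A) L(B) = 1* (0 | 01)*.
bool inGrammarLanguage(const std::string& s) {
    static const std::regex re("1*(0|01)*");
    return std::regex_match(s, re);
}

// Compare the grammar with "no substring 011" for all 0/1 strings of length <= maxLen.
bool grammarMatchesSpec(int maxLen) {
    for (int len = 0; len <= maxLen; ++len) {
        for (unsigned m = 0; m < (1u << len); ++m) {
            std::string s;
            for (int i = 0; i < len; ++i) s.push_back(((m >> i) & 1u) ? '1' : '0');
            bool spec = s.find("011") == std::string::npos;
            if (spec != inGrammarLanguage(s)) return false;
        }
    }
    return true;
}
```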
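6) The set of all strings of 0s and 1s of the form xy, where x ≠ y and x and y have the same length.
S → AB | BA
A → XAX | 0 (A generates odd-length strings with 0 in the middle)
B → XBX | 1 (B generates odd-length strings with 1 in the middle)
X → 0 | 1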
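A (respectively B) generates the odd-length strings whose middle symbol is 0 (respectively 1). A string is therefore in L(G) exactly when it splits into two odd-length parts whose middle symbols differ. The C++ sketch below tests that condition directly and compares it by brute force with the specification; the names are ours. Like the previous check, it only covers short strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// s is in L(G) iff s = uv with |u|, |v| odd and differing middle symbols.
bool inGrammar(const std::string& s) {
    const int n = static_cast<int>(s.size());
    for (int k = 1; k < n; k += 2) {           // first part s[0, k), k odd
        if ((n - k) % 2 == 0) continue;        // second part must be odd as well
        if (s[(k - 1) / 2] != s[k + (n - k - 1) / 2]) return true;
    }
    return false;
}

// Specification: s = xy with |x| = |y| and x != y.
bool inSpec(const std::string& s) {
    const std::size_t n = s.size();
    return n % 2 == 0 && s.substr(0, n / 2) != s.substr(n / 2);
}
```

For a string of even length n, the split after position 2i + 1 compares s[i] with s[i + n/2], which is why the two conditions agree.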
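Data Structures Final Exam (Jinan University)

I. True or false (10 points in total)
1. When a static linked list is implemented with an array, insertion and deletion still require moving elements.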
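2. A stack is also a linear list, and it likewise has both a sequential and a linked storage structure.
3. The three traversal algorithms for binary trees differ only in the order in which the root and the left and right subtrees are visited.
4. An adjacency list is a sequential storage structure for graphs.
5. A binary tree is simply a tree of degree 2.
6. In a hash table, the position of a record can be found without any comparisons.
7. The linked storage structure of a linear list is convenient both for access and for insertion and deletion.
8. Sequential storage is suitable for complete binary trees and equally suitable for general binary trees.
9. An algorithm that is correct and efficient still cannot necessarily be called a "good" algorithm.
10. Quicksort and heapsort have the same average time complexity.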
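II. Fill in the blanks (20 points in total, 2 points each)
1. For a linear list in sequential storage with length La, assuming every insertion position is equally likely, inserting one element requires moving _______ elements on average, and _______ elements in the worst case.
2. From a logical point of view, the four basic data structures are __________, __________, __________ and __________; the two storage structures are __________ and __________.
3. In a full k-ary tree (k > 2) of a given depth, level i (if it exists) has ________ nodes; the parent of the node numbered p (p > 1), where that parent is not the root, has number ________.
4. A complete binary tree with n nodes has depth ________; the right child (if it exists) of the node numbered p (p < n) has number ________.
5. A stack is called a ________ linear list; a queue is called a ________ linear list.
6. The main search methods for static search tables are ordered-table search and ________; in a binary search over n records, an unsuccessful search makes at most ________ key comparisons.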
Data Structures and Applied Algorithms Tutorial (Revised Edition)
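If a binary sort tree is traversed in order, the resulting sequence is sorted; that is, the original data has been sorted. This is where the name "binary sort tree" comes from.
Original data sequence: (49, 38, 65, 76, 49, 13, 27, 52)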
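The constructed binary sort tree:
[Figure: binary sort tree built from the sequence, with nodes 49, 38, 65, 13, 49, 76, 27, 52]
In-order traversal of the binary sort tree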
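The following minimal C++ sketch illustrates the idea above: insert the keys into a binary sort tree, then read them back with an in-order traversal. The names are ours. Sending equal keys (the second 49) to the right subtree is an assumption, because the slide does not state its tie rule. In the worst case, a chain-shaped tree, this takes O(n²) time.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct BSTNode {
    int key;
    std::unique_ptr<BSTNode> left, right;
    explicit BSTNode(int k) : key(k) {}
};

// Insert key; equal keys go to the right subtree (assumed tie rule).
void bstInsert(std::unique_ptr<BSTNode>& root, int key) {
    if (!root) { root = std::make_unique<BSTNode>(key); return; }
    bstInsert(key < root->key ? root->left : root->right, key);
}

void inorder(const BSTNode* t, std::vector<int>& out) {
    if (!t) return;
    inorder(t->left.get(), out);
    out.push_back(t->key);            // visit between the two subtrees
    inorder(t->right.get(), out);
}

// Build the tree from the sequence and return its in-order traversal.
std::vector<int> treeSort(const std::vector<int>& data) {
    std::unique_ptr<BSTNode> root;
    for (int x : data) bstInsert(root, x);
    std::vector<int> out;
    inorder(root.get(), out);
    return out;
}
```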
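```cpp
for (i = H.length; i > 1; --i) {   // sort by repeatedly adjusting the heap
    std::swap(H.r[1], H.r[i]);     // swap the heap-top record with the last record
                                   // of the unsorted subsequence H.r[1..i]
    HeapAdjust(H.r, 1, i - 1);     // sift H.r[1] down to restore the heap H.r[1..i-1]
}
} // HeapSort
```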
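The loop above is only the second phase of heap sort. The following self-contained C++ sketch covers the whole algorithm. It assumes records are plain ints stored 1-based in a std::vector, with r[0] unused to mirror H.r. HeapAdjust is written as the sift-down that the comments describe, for a max-heap and ascending output.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sift r[s] down within r[s..m] (1-based) so that r[s..m] becomes a max-heap,
// assuming both subtrees of s already are.
void HeapAdjust(std::vector<int>& r, int s, int m) {
    int rc = r[s];
    for (int j = 2 * s; j <= m; j *= 2) {
        if (j < m && r[j] < r[j + 1]) ++j;   // j: the larger child
        if (rc >= r[j]) break;
        r[s] = r[j];
        s = j;
    }
    r[s] = rc;
}

// Ascending heap sort of r[1..n]; r[0] is unused.
void HeapSort(std::vector<int>& r) {
    int n = static_cast<int>(r.size()) - 1;
    for (int i = n / 2; i > 0; --i) HeapAdjust(r, i, n);   // build the max-heap
    for (int i = n; i > 1; --i) {
        std::swap(r[1], r[i]);       // move the current maximum to its final place
        HeapAdjust(r, 1, i - 1);     // restore the heap on r[1..i-1]
    }
}
```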
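```cpp
        s->data = T->data;        // copy the visited node's key into the new list node s
        s->next = head->next;     // insert s right after the head node
        head->next = s;
        degression(T->rchild);
    }
}
```
(Pointer operations for inserting node s)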
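The fragment above appears to traverse a binary sort tree in order and insert each visited key right after the head node of a linked list. In-order traversal visits the keys in ascending order, and head insertion reverses them, so the list comes out descending. Below is a self-contained C++ sketch under that reading. The node types, and passing head as a parameter, are our assumptions.

```cpp
#include <cassert>
#include <vector>

struct TreeNode { int data; TreeNode* lchild; TreeNode* rchild; };
struct ListNode { int data; ListNode* next; };

// In-order traversal that inserts each visited key right after the head node,
// producing a list in descending order.
void degression(const TreeNode* T, ListNode* head) {
    if (T) {
        degression(T->lchild, head);
        ListNode* s = new ListNode{T->data, nullptr};
        s->next = head->next;     // head insertion
        head->next = s;
        degression(T->rchild, head);
    }
}
```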
Animated demonstration: building the list in descending order
Alphabet: s, t, a, e, i
Frequencies: 5, 6, 2, 9, 7
Codes: 101, 00, 100, 11, 01
[Figure: Huffman tree for these frequencies, root weight 29, branches labelled 0 and 1]
Message: eat
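Two checks on the example above, as a C++ sketch with function names of our own. The weighted path length of the listed code is 5·3 + 6·2 + 2·3 + 9·2 + 7·2 = 65. `huffmanCost` computes the optimal value by repeatedly merging the two smallest weights; the sum of all merged weights equals the weighted path length. `encode` applies the listed code table to a message.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Weighted path length of an optimal (Huffman) code.
long long huffmanCost(const std::vector<long long>& freq) {
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        pq(freq.begin(), freq.end());
    long long cost = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();
        long long b = pq.top(); pq.pop();
        cost += a + b;          // each merge adds one level to every leaf below it
        pq.push(a + b);
    }
    return cost;
}

// Encode a message with a fixed code table.
std::string encode(const std::string& msg, const std::map<char, std::string>& code) {
    std::string bits;
    for (char c : msg) bits += code.at(c);
    return bits;
}
```

`huffmanCost({5, 6, 2, 9, 7})` returns 65, so the listed code is optimal. The message eat encodes as 11 100 00, i.e. 1110000.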
Embedding an Arbitrary Binary Tree into
Submitted May 18, 1995
THE star graph was recently introduced in the literature as an interconnection network architecture for massively parallel systems [1, 2]. It soon became popular because its architecture and properties compare favorably to those of the hypercube. This can be attributed to the fact that both the star graph and the hypercube belong to the same family of group-theoretic interconnection networks, namely Cayley graphs. As a result, both enjoy the symmetry properties of Cayley graphs. Some of the properties common to the star graph and the hypercube are node and edge symmetry, disjoint paths, fault tolerance, partitionability and recursive structure. The properties that favor the star graph over the hypercube are its sublogarithmic degree and diameter. The degree is the number of edges incident on a node, and the diameter is the maximum distance between any pair of nodes in the network. For a star graph and a hypercube with N nodes, the degree and diameter of the star graph grow as O(log N / log log N), whereas they grow as O(log N) for the hypercube, as N grows. A smaller degree implies cheaper hardware implementation, and a smaller diameter implies faster average communication time between nodes. This makes the star graph attractive and worthy of consideration. Ongoing research has shown that many results for the hypercube can be adapted to the star graph, although in some cases it remains open whether the same asymptotic bound on some parameters can be achieved [5, 7, 12]. Characteristics other than degree and diameter need to be considered when evaluating a network. Such characteristics
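For concreteness, here is a small C++ sketch comparing the two networks. It uses standard formulas that are not stated in this excerpt. The n-dimensional star graph has n! nodes, degree n − 1 and diameter ⌊3(n − 1)/2⌋. The d-dimensional hypercube has 2^d nodes, with degree and diameter both equal to d. The function names are ours.

```cpp
#include <cassert>
#include <cstdint>

std::uint64_t factorial(int n) {                 // node count of the star graph S_n
    std::uint64_t f = 1;
    for (int i = 2; i <= n; ++i) f *= static_cast<std::uint64_t>(i);
    return f;
}
int starDegree(int n) { return n - 1; }
int starDiameter(int n) { return 3 * (n - 1) / 2; }

// Smallest hypercube dimension d with 2^d >= N; its degree and diameter are both d.
int hypercubeDimensionFor(std::uint64_t N) {
    int d = 0;
    while ((std::uint64_t{1} << d) < N) ++d;
    return d;
}
```

With n = 10, the star graph has 3,628,800 nodes, degree 9 and diameter 13. A hypercube with at least that many nodes needs dimension 22.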
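Data Structures, Algorithms, and Applications in C++, SYL08
Full binary tree
A binary tree of height h that has exactly 2^h − 1 elements is called a full binary tree.
It is the binary tree with the maximum possible number of nodes for its depth: all leaves are on the bottom level, and every node other than a leaf has degree 2.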
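Complete binary tree
A binary tree of depth k with n nodes is a complete binary tree if and only if it has the same structure as the tree formed by nodes 1 through n of a full binary tree with k levels.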
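Non-recursive in-order traversal of a binary tree
```cpp
#include <stack>

template <class T>
struct BinaryTreeNode { T data; BinaryTreeNode *lchild, *rchild; };

template <class T, class Visit>
void Inorder(BinaryTreeNode<T>* t, Visit visit) {
    std::stack<BinaryTreeNode<T>*> S;
    BinaryTreeNode<T>* p = t;
    while (p || !S.empty()) {
        while (p != nullptr) { S.push(p); p = p->lchild; }  // walk down the left spine
        if (!S.empty()) {
            p = S.top(); S.pop();
            visit(p);
            p = p->rchild;                                  // then the right subtree
        }
    }
}
```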
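Counting the leaf nodes of a binary tree
Basic idea: traverse the binary tree in preorder (or in inorder or postorder), look for leaf nodes during the traversal, and count them.
To do this, add a "count" parameter to the traversal algorithm and change its "visit node" operation to: if the node is a leaf, increment the counter.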
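Below is a minimal C++ sketch of the counting scheme just described: a preorder traversal with an extra count parameter, whose "visit" step increments the counter at each leaf. The node type and the name CountLeafPre are assumptions made for illustration.

```cpp
#include <cassert>

struct BiTNode { int data; BiTNode* lchild; BiTNode* rchild; };

// Preorder traversal; the visit step counts the node if it is a leaf.
void CountLeafPre(const BiTNode* T, int& count) {
    if (!T) return;
    if (!T->lchild && !T->rchild) ++count;   // visit: leaf found
    CountLeafPre(T->lchild, count);
    CountLeafPre(T->rchild, count);
}
```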
Counting leaf nodes
```cpp
int CountLeaf(BiTree T) {  // return the number of leaf nodes in the binary tree pointed to by T
    if (!T) return 0;
```
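... for storage.
The storage space required reaches its maximum when every node is the right child of another node.
Right-skewed binary tree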
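In the formula-based (array) representation, the root is stored at index 1 and the children of node i at 2i and 2i + 1. In a right-skewed tree each node is the right child of the previous one. The occupied indices are therefore 1, 3, 7, 15, ..., and n nodes need an array of size 2^n − 1. A small C++ sketch follows; the function name is ours.

```cpp
#include <cassert>

// Array size needed to store a right-skewed chain of n nodes when the
// right child of node i is stored at index 2i + 1 (root at index 1).
long long rightSkewedArraySize(int n) {
    long long index = 0;                  // index of the deepest node so far
    for (int level = 1; level <= n; ++level)
        index = (level == 1) ? 1 : 2 * index + 1;
    return index;                         // equals 2^n - 1
}
```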
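Linked representation of binary trees
The most common way to represent a binary tree is with links (pointers). Each element is represented by a node with two pointer fields, LeftChild and RightChild. In addition to these two pointer fields,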
Concepts of trees and forests · Binary trees (Binary Tree) · Representation of binary trees · Binary tree traversal
friend ostream &operator<<(ostream &out, BinaryTree<Type> &Tree);
private:
    BinTreeNode<Type> *root;
    Type RefValue;
    BinTreeNode<Type> *Parent(BinTreeNode<Type> *start, BinTreeNode<Type> *current);
    int Insert(BinTreeNode<Type> *&current, const Type &item);
    void Traverse(BinTreeNode<Type> *current, ostream &out) const;
    int Find(BinTreeNode<Type> *current, const Type &item) const;
private: BinTreeNode<Type> *leftChild, *rightChild; Type data;
};
template <class Type> class BinaryTree { public:
BinaryTree ( ) : root (NULL) { } BinaryTree ( Type value ) : RefValue (value),
} }
template <class Type> BinTreeNode<Type> *BinaryTree<Type>::Parent(BinTreeNode<Type> *start, BinTreeNode<Type> *current) {
Morphing binary trees
1.2 Our Results
2 The Basics of Tree Morphing
Let T1 and T2 be two valid n-leaf trees with the same weight sequence at their leaves. Tree T1 is called the source tree and T2 the target tree. The weight at each internal node in either tree is the sum of the weights at its leaf descendants; because T1 and T2 are in general not isomorphic, the internal weights are not the same in the two trees. Because the trees are valid, the weight of every node is strictly between −1 and +1.
Figure 1: Two valid trees with the same weight sequence at their leaves.
Let T1 and T2 be two valid n-leaf binary trees whose leaves have the same weight sequence w1, ..., wn. These trees correspond to two parallel bi-infinite chains, that is, chains with the same turn-angle sequences. The structures of T1 and T2 above the leaves, however, are in general not the same (see Figure 1). These differing structures correspond to two different conformations of polygonal chains with the same underlying turn sequence. In particular, the two trees may have vastly different heights, ranging from Θ(log n) to Θ(n).
The tree-morphing problem is to transform T1 into T2 by rotations that preserve validity, called valid rotations. In the polygon-morphing setting, a valid rotation corresponds to a deformation (translation and/or scaling) of a sub-chain that preserves parallelism and simplicity. Transforming T1 into T2 by a sequence of m valid rotations corresponds directly to morphing one bi-infinite chain into another with m elementary morphing steps. In the remainder of this paper, we focus exclusively on the tree-morphing problem, ignoring any relationship to the polygon-morphing problem.
The problem of transforming one tree into another without any weight constraint has been studied before. Sleator, Tarjan, and Thurston [7] show that an n-node tree can be made isomorphic to any other n-node tree by at most 2n − 6 rotations, slightly improving an earlier result by Culik and Wood [2]. Our tree-morphing problem, however, is more difficult, because the weight constraint on the nodes restricts the choice of rotations available to an algorithm. The best previous algorithm for tree morphing requires O(n^{4/3+ε}) rotations [3]. Our main result is the following theorem: O(n log n) rotations suffice to morph any valid binary tree into another such tree. We can also compute these rotations in the same time bound. This paper is organized in seven sections.
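The text does not spell out how a rotation interacts with the weights, so the following is a sketch under the definitions above, not the paper's algorithm. A rotation does not change which leaves lie below the rotated subtree, so the new subtree root inherits the old root's weight, and the only new internal weight is that of the demoted node; the rotation is therefore valid exactly when that one weight stays in (−1, +1). WNode and validRotateRight are hypothetical names, internal nodes are assumed to have two children, and the caller must re-attach the returned subtree root to its parent.

```cpp
// Hypothetical weighted node (internal weight = sum of leaf weights below it).
struct WNode {
    double weight = 0.0;
    WNode* left = nullptr;
    WNode* right = nullptr;
};

// Right rotation at y, whose left child is x with subtrees A (left) and B (right),
// and whose right subtree is C. Afterwards x is the subtree root with children
// A and y, and y has children B and C.
// x inherits y's old weight (same leaves); y's new weight is w(B) + w(C).
// Returns the new subtree root, or y unchanged if the rotation would be invalid.
WNode* validRotateRight(WNode* y) {
    WNode* x = y->left;
    double newY = x->right->weight + y->right->weight;
    if (newY <= -1.0 || newY >= 1.0) return y;  // would break validity: refuse
    double oldY = y->weight;
    y->left = x->right;  // B moves under y
    x->right = y;        // y becomes x's right child
    y->weight = newY;
    x->weight = oldY;
    return x;
}
```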
In Section 2, we review some of the relevant material from the paper of Guibas and Hershberger, and introduce the basic terminology. In Section 3, we introduce the concept of node inclinations and a key invariant maintained by our algorithm. Sections 4 and 5 provide the details of the proof of our main theorem, and Section 6 describes our algorithm for finding the rotations. We conclude with some remarks and open problems in Section 7.
C# Data Structures and Algorithms [Binary Trees and Binary]
Trees are a very common data structure in computer science. A tree is a nonlinear data structure that is used to store data in a hierarchical manner. We examine one primary tree structure in this chapter, the binary tree, along with one implementation of the binary tree, the binary search tree. Binary trees are often chosen over more fundamental structures, such as arrays and linked lists, because you can search a binary tree quickly (as opposed to a linked list) and you can quickly insert data into and delete data from a binary tree (as opposed to an array).
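The chapter's own implementation is in C#; the following is only a minimal C++ sketch of the binary-search-tree idea, with hypothetical names and deletion omitted. Each comparison discards one subtree, so search and insertion cost time proportional to the tree's height, which is O(log n) only when the tree stays reasonably balanced.

```cpp
#include <memory>

// Keys in the left subtree are smaller than the node's key, keys in the
// right subtree are larger; duplicates are ignored in this sketch.
struct BSTNode {
    int key;
    std::unique_ptr<BSTNode> left, right;
    explicit BSTNode(int k) : key(k) {}
};

// Walks down one root-to-leaf path and attaches the key where the search ends.
void insert(std::unique_ptr<BSTNode>& t, int key) {
    if (!t) { t = std::make_unique<BSTNode>(key); return; }
    if (key < t->key) insert(t->left, key);
    else if (key > t->key) insert(t->right, key);
}

bool contains(const std::unique_ptr<BSTNode>& t, int key) {
    if (!t) return false;
    if (key == t->key) return true;
    return key < t->key ? contains(t->left, key) : contains(t->right, key);
}
```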
THE DEFINITION OF A TREE
Before we examine the structure and behavior of the binary tree, we need to define what we mean by a tree. A tree is a set of nodes connected by edges. An example of a tree is a company’s organization chart (see Figure 12.1). The purpose of an organization chart is to communicate to the viewer the structure of the organization. In Figure 12.1, each box is a node and the lines connecting the boxes are the edges. The nodes represent the entities (people) that make up the organization, and the edges represent the relationships between those entities. For example, the Chief Information Officer (CIO) reports directly to the CEO, so there is an edge between these two nodes. The IT manager reports to the CIO, so there is an edge connecting them. The Sales VP and the Development Manager in IT do not have a direct edge connecting them, so there is no direct relationship between these two entities.
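A general tree like the organization chart can be sketched in C++ as nodes holding a list of child pointers, where "directly related" means an edge joins the two nodes. The OrgNode type and hasDirectEdge are hypothetical, and the exact reporting lines used in the example below (for instance, the Sales VP under the CEO) are illustrative assumptions rather than a copy of Figure 12.1.

```cpp
#include <string>
#include <vector>

// A general tree node: any number of children (direct reports).
struct OrgNode {
    std::string title;
    std::vector<OrgNode*> reports;  // edges to direct reports
};

// Two entities are directly related iff an edge joins them,
// i.e. one is a direct report (child) of the other.
bool hasDirectEdge(const OrgNode& a, const OrgNode& b) {
    for (const OrgNode* r : a.reports) if (r == &b) return true;
    for (const OrgNode* r : b.reports) if (r == &a) return true;
    return false;
}
```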
∗The work was supported by NSF Grants CCR-0208709 and DMS-02-02815, NSA Grant MDA 904-03-10036, and NIH Grant R01 GM068959-01.
1 Introduction
Trees are the most important nonlinear structures that arise in computer science. Applications are in abundance (cf. [15, 17]); in this paper we discuss a novel application of binary unlabeled ordered trees (further called binary trees) in information theory (e.g., counting Lempel-Ziv’78 parsings and universal types). Tree structures have been the object of extensive mathematical investigations for many years, and many interesting facts have been discovered. Enumeration of binary trees, which are of principal importance to computer science, was already known to Euler. Nowadays, the number of such trees built on n nodes is called the Catalan number. Since Euler and Cayley, various interesting questions concerning statistics of randomly generated binary trees have been investigated (cf. [9, 15, 17, 24, 26, 27]). In the standard model, one selects uniformly a tree among all binary unlabeled ordered trees built on n nodes, denoted $T_n$, where $|T_n| = \frac{1}{n+1}\binom{2n}{n}$ is the Catalan number. For example, Flajolet and Odlyzko [6] and Takacs [26] established the average and the limiting distribution for the height (longest path), while Louchard [18, 19] and Takacs [25, 26, 27] derived the limiting distribution for the path length (the sum of the lengths of all paths from the root to all nodes). As we indicate below, these limiting distributions are expressible in terms of the Airy function (cf. [1, 2]). While deep and interesting results concerning the behavior of binary trees in the standard model were uncovered, there are still many unsolved problems of practical importance. Recently, Seroussi [22], when studying universal types for sequences and distinct parsings of the Lempel-Ziv scheme, asked for the enumeration of binary trees with a given path length. Let $T_p$ be the set of binary trees of given path length p.
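Both counts in this paragraph can be checked by brute force with one table: let T[n][q] be the number of binary trees with n nodes and path length q (sum of node depths, root at depth 0). Summing a row over q gives $|T_n|$, the Catalan number; summing a column over n gives $|T_p|$. This is a small dynamic-programming sketch with hypothetical function names, not the analytic method of the paper, and it is only practical for small n and p.

```cpp
#include <cstdint>
#include <vector>

using Table = std::vector<std::vector<std::uint64_t>>;

// T[n][q] = number of binary trees with n nodes and path length q.
// Splitting at the root into subtrees of k and n-1-k nodes adds 1 to the depth
// of each of the n-1 non-root nodes, so q = qLeft + qRight + (n - 1).
Table pathLengthTable(int maxN, int maxP) {
    Table T(maxN + 1, std::vector<std::uint64_t>(maxP + 1, 0));
    T[0][0] = 1;  // the empty tree
    for (int n = 1; n <= maxN; ++n)
        for (int k = 0; k < n; ++k)
            for (int a = 0; a <= maxP; ++a) {
                if (T[k][a] == 0) continue;
                for (int b = 0; a + b + n - 1 <= maxP; ++b)
                    T[n][a + b + n - 1] += T[k][a] * T[n - 1 - k][b];
            }
    return T;
}

// |T_n| by summing over all path lengths; a chain has the largest one, n(n-1)/2.
std::uint64_t catalanByPathLength(int n) {
    int maxP = n * (n - 1) / 2;
    Table T = pathLengthTable(n, maxP);
    std::uint64_t total = 0;
    for (int q = 0; q <= maxP; ++q) total += T[n][q];
    return total;
}

// Closed form C_n = binom(2n, n) / (n + 1), via C_{i+1} = C_i * 2(2i + 1) / (i + 2).
std::uint64_t catalanClosedForm(int n) {
    std::uint64_t c = 1;
    for (int i = 0; i < n; ++i) c = c * 2 * (2 * i + 1) / (i + 2);
    return c;
}

// |T_p| for nonempty trees: each non-root node has depth >= 1, so n <= p + 1.
std::uint64_t countTreesWithPathLength(int p) {
    Table T = pathLengthTable(p + 1, p);
    std::uint64_t total = 0;
    for (int n = 1; n <= p + 1; ++n) total += T[n][p];
    return total;
}
```

For instance, the trees of path length 4 are exactly the four-node trees whose root has two children and one grandchild, and there are 4 of them.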
Seroussi observed that the cardinality of $T_p$ corresponds to the number of possible parsings of sequences of length p in the Lempel-Ziv’78 scheme, and to the number of universal types (which we discuss below). We shall first enumerate $T_p$ (cf. also Seroussi [23]), and then compute the limiting distribution of the number of nodes (phrases in the LZ’78 scheme) when a tree is selected uniformly among $T_p$. To the best of our knowledge these problems were never addressed before, with the exception of [22]. We show below that they are much harder than the corresponding problems in the more standard $T_n$ model. As mentioned above, the problem of enumerating binary trees of a given path length arose in Seroussi’s research on universal types. The method of types [4] is a powerful technique in information theory, large deviations, and analysis of algorithms. It reduces calculations of the probability of rare events to a combinatorial analysis. Two sequences (over a finite alphabet) are of the same type if they have the same empirical distribution. For memoryless sources, the type is measured by the relative frequency of symbol occurrences, while for Markov sources one needs to count the number of pairs of symbols. It turns out (cf. [12]) that the number of sequences of a given Markovian type can be counted by enumerating Eulerian paths in a multigraph. Recently, Seroussi [22] introduced universal types (for individual sequences and/or for sequences generated by a stationary and ergodic source). Two sequences of the same length p are said to be of the same universal type if they generate the same set of phrases in the incremental parsing of the Lempel-Ziv’78 scheme. It is proved that such sequences have the same asymptotic empirical distribution. Moreover, every set of phrases uniquely defines a binary tree of path length p [11, 22], with the number of phrases corresponding to the number of nodes in the $T_p$ model.
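The memoryless and Markov types mentioned above can be illustrated by counting symbols and adjacent symbol pairs. This is a sketch with hypothetical function names; the counts are the empirical distribution before normalization, and real treatments of Markov types differ in how they handle the first symbol or wrap-around, which this sketch ignores.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Memoryless type: the count of each symbol (empirical distribution
// before dividing by the sequence length).
std::map<char, int> memorylessType(const std::string& s) {
    std::map<char, int> counts;
    for (char c : s) ++counts[c];
    return counts;
}

// First-order Markov type: the count of each ordered pair of adjacent symbols.
// This sketch simply counts the s.size() - 1 adjacent pairs.
std::map<std::pair<char, char>, int> markovType(const std::string& s) {
    std::map<std::pair<char, char>, int> counts;
    for (std::size_t i = 1; i < s.size(); ++i) ++counts[std::make_pair(s[i - 1], s[i])];
    return counts;
}
```

For example, 0110 and 1001 have the same memoryless type (two 0s and two 1s) but different Markov types: the first contains the pair 11, the second the pair 00.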
For example, the strings 10101100 and 01001011 have the same set of phrases {1, 0, 10, 11, 00}, and therefore the corresponding binary trees are the same. Thus, enumeration of $T_p$ leads to counting universal types and different LZ’78 parsings of sequences of length p.
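The example can be reproduced with a short sketch of LZ’78 incremental parsing, assuming the usual rule that each new phrase is the shortest prefix of the remaining input not yet seen as a phrase. The function names are hypothetical, and a trailing phrase that repeats an earlier one is simply kept here; conventions for that case vary.

```cpp
#include <set>
#include <string>
#include <vector>

// LZ'78 incremental parsing: extend the current phrase one symbol at a time
// until it is new, then record it and start the next phrase.
std::vector<std::string> lz78Parse(const std::string& s) {
    std::set<std::string> seen;
    std::vector<std::string> phrases;
    std::string cur;
    for (char c : s) {
        cur.push_back(c);
        if (seen.insert(cur).second) {  // first occurrence: a new phrase
            phrases.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) phrases.push_back(cur);  // incomplete (repeated) last phrase
    return phrases;
}

// The phrase set; equal sets mean the same universal type.
std::set<std::string> phraseSet(const std::string& s) {
    std::vector<std::string> p = lz78Parse(s);
    return std::set<std::string>(p.begin(), p.end());
}
```

Both strings yield the phrase set {1, 0, 10, 11, 00}. The phrase lengths sum to 8 = p, which matches the path length of the corresponding tree (the five phrase nodes sit at depths 1, 1, 2, 2, 2 below the root).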
Wojciech Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 spa@
Contents
1 Introduction
2 Summary of Results
3 Far Right Region
4 Right Region
Enumeration of Binary Trees and Universal Types∗