No one can ever master a programming language (PL) by studying it only without looking into and comparing it with other ones. Modern application software often requires a variety of components written in different PLs. Most importantly, language itself is not important at all; at least not important when compared to the fundamental ideas on architectures, frameworks, the design of the PL. A competing programmer can always get the hang of any PLs quickly.

Consequently, I make a list of PLs I would study seriously in the future.

  1. C# or Java
    • A serious pure OOP and enterprise level PL is always necessary for the reason of productivity.
  2. C/C++
    • The most powerful PL, which can be used to create almost everything.
    • Help to understand things under the hood.
    • C++: semi-OOP version of C, a horrible language which requires a life long time to study.
  3. Bash/Python
    • Scripts make your life easier.
  4. Javascript
    • Everything could be written in Javascript will be eventually written in Javascript.
  5. Lisp
    • Help to understand the beauty of functional PL.

The following part is the first part of my series notes on the book Essential C#, which is redeemed as the best book for C# learners.

  1. Essential C# 4.0: Basics(this post)
  2. Essential C# 4.0: Intermediate
  3. Essential C# 4.0: Advanced

这是一本无趣的书,甚至都不能被称作一本书,而更像是国内某大学老师上课的讲义,兼具冗余繁杂、事后诸葛亮。不可否认,的确有一些新奇的东西在里面,但无论怎么看都不禁让人疑惑:是不是原作者根据工作心得列了一份提纲,然后随意找了个写手照本宣科地添加了些文字?难道作者的目标群体仅仅就是些大一大二的学生么?

当然,无论这本书水平如何,毕竟来自大名鼎鼎的Facebook,我们这些后生们怎么着也得战战兢兢地读完啊……

以下是我的阅读笔记:

在来到耶鲁的九个月中,迫于自高中以来最大的学习压力,阅读量较小,总共读了约莫13本书,其中4、6、10、11算是重读。一度赶在停产之前购置了Kindle DXG ,意图博览盗版书,后来才发现系统与Kindle 3 相比 简直烂到了极致,不仅不能做笔记,更加凶残的是无法原生支持中文。连多看系统都已经不再为DXG更新。

现在我来简单讲讲这些书的读后感。

自1978年改革开放以来,中国经济突飞猛进,摸着石头过河让这片蓝海充满了危险、机遇以及不确定性,然而混沌孕新生,乱世出英雄,一大批企业家们迅速地完成了原始积累;当然,我们也不能忘记路边的累累白骨:他们都是国富民强最直接的功臣,是新时代最伟大的英雄。

他们要么是想人之不敢想,打改革的擦边球,有勇气面对风险,风险有多大,他们最后就取得了多大的成就;要么是运用专业头脑,或者引进外国的先进技术或者理念,步步为营。后生们想要有所建树,最简单直接的办法就是单点快速突破,继而精益求精。

以下是我整理的一些摘录。

毕业啦!

世事总不尽如人意,而贪心如我者,尝试写下人生的十个愿望;倘若某日能够实现十之八九,也就算是此生无悔啦!

  1. 月高风黑的时候总是可以拉上三五好友宵夜。老到住进养老院的时候总有大爷大妈陪着打麻将。
  2. 亲朋好友,还有我,都能够快乐健康地活着,不求有多大的出息,但求日子过得欢喜,无论多舒适还是多苦逼,都能够坚持自己选择的路,得瑟得瑟屁颠屁颠地走下去。
  3. 有个萌妹子一起玩儿,我们都有健美的体魄心态和性格,开开心心玩上一辈子。
  4. 拥有一家自己的互联网公司,创造有趣有用的产品,这是时代赋予青年们的使命。感激上苍让我们CSer成为时代的弄潮儿。
  5. 拿到一个PhD学位,不仅是因为“谁有知识我就对谁佩服得五体投地”,而且理想主义的说,他们至少对人类知识做出了一丁点儿的贡献。完全没有贡献?那就不要浪费时间啦!
  6. 写一本好书、绘一幅好画、填一篇好诗词、吟一支小曲儿、拍一部好电影。
  7. 有精力、财力或者智力帮助他人,尤其是年轻人。

剩下的三个,以后再许!

EOF

From Amazon

Comments

Dynamo successfully builds a highly available and scalable data store, which sacrifices consistency under certain failure scenarios but the data storage is still eventually consistent. The system is complex and thus this paper involves a huge number of details. However, the paper could not specify all of them. Some interesting parts seem to be missing. What are the supporting techniques used in Dynamo? How is it compared to other existing KV distributed store systems? Are there any possible extensions or future works?

This paper assumes that security concerns can be ignored for its internal use. Theoretically, security problems is still possible. At least, how can the administrator detect sybil attacks?

The author tried three partition schemes. The latter versions decouple partitioning and partition placement. The placement is changeable at run time. Consequently, the strategy 3 has a better efficiency. And the system recovers faster and it is easier to archive. However, the key space is partitioned equally into Q partitions, and every node assume Q/S tokens per node. The strategy 3 in Figure 7 is a little confusing. The author should specify that the span each node is responsible for is SIMILAR IN SIZE to each other, instead of exactly EQUAL IN SIZE to each other.

Yale DB

Comments

Two ways to improve HadoopDB

  1. Integrate the file system for local database with HDFS. In the design of HadoopDB, the file system and the database are separated, which means that the more space one part is preallocated and the less space the other part will occupy. This design is not scalable because the system administrator has to partition the space by hand. We could investigate a way to let the database run on the HDFS directly. The database’s file descriptor can be adapted to HDFS directly. Or, even better, we can add an extra virtualized layer between the database and the HDFS.
  2. Set up a better scheduler. It is mentioned in the paper that the original version of hadoop fair scheduler is not so good. It is queue-based and locally greedy and simply does not consider the big picture much. “Quincy: Fair Scheduling for Distributed Computing Clusters” advances a flow-based algorithm, which has a very good performance by increasing the throughput up to 40%.

本文摘录自Thinking in Java

1. The method finalize()

finalize()使用时,会在GC释放空间之前被调用到,目的是在GC的时候做一些特殊的清理工作;但finalize()绝不是C++里面的destructor,因为

1.对象不一定会被GC (而C++里析构函数一定会被调用到)

2.GC不等于析构

既然有GC,Java就没有必要用到析构函数,所以如果有需要在GC时候额外做些事情,就使用finalize()做最后的清理工作,尽管它不一定会被调用到,除非JVM没有内存空间了

3.GC仅和内存有关

既然Java本身内存回收问题就被GC解决了,为什么还要finalize()呢?因为,Java可能内嵌其他语言的代码,如果有类似C的内存处理函数malloc(),就必须要手动写free()。因此,finalize()极少会被用到。

C++创建对象的位置有两种,一是创建在本地的stack上,自动销毁于本段代码的回花括号处,另一个是创建在heap上,必须用delete手动销毁。Java不允许创建本地对象,必须使用new,但是既然有了GC就不用delete.

总之,要记住的是,GC和finalize()并不总会被调用到。

finalize()可以用到debug,代码如下

TerminationCondition.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 class Book {
    boolean checkedOut = false;
    Book(boolean checkOut) {
      checkedOut = checkOut;
    }
    void checkIn() {
      checkedOut = false;
    }
    public void finalize() {
      if(checkedOut)
        System.out.println("Error: checked out.");
        // Normally, you’ll also do this:
        // super.finalize(); // Call the base-class version
    }
  }

  public class TerminationCondition {
    //static Test monitor = new Test();
    public static void main(String[] args) {
      Book novel = new Book(true);
      // Proper cleanup:
      novel.checkIn();
      // Drop the reference, forget to clean up:
      new Book(true);
      // Force garbage collection & finalization:
      System.gc();
    }
  }
  /* output:
 Error: checked out.
 */

此处,System.gc() 强制finalize()必须执行。如果有finalize(),一般情况下,父类的finalize()也必须被执行到,所以Book.finalize()中有super.finalize()

2. GC的工作机制

Java中在heap上new东西并不像其他语言那么expensive,几乎和其他语言在stack上创建对象一样快。

非Java的GC机制:reference counting 数对象被引用了多少次,当等于零时就被清理掉。

Java的机制:adaptive generational stop-and-copy mark-and-sweep. 所有未死的对象都可以追溯到stack或者static storage上的一个引用,形成一个个web。这时候,也有不同的处理方式:

  1. stop-and-copy. 如果程序停下来了,将活的对象拷贝到新的heap上,垃圾就被留在了原来的heap。新heap上的对象也可以被排列得更加紧凑(compact)了。有两个问题:
    • 保持两个heap
    • 周期性拷贝,即便没有new新的东西,或者产生的垃圾很少,会导致浪费,所以需要另一个机制处理这种特殊的情况
  2. mark-and-sweep. 比较慢,但是在产生少量垃圾或者没有垃圾时候非常快。做法是,标记活对象,所有的标记完成后,扫除没有标记的对象。

如何做到adaptive地进行转化呢?JVM的内存分布在很多大的block当中。大的对象,可以独占block,不拷贝,而是增加其generation count,内含小对象的block则会被拷贝并整理。JVM会进行监视,如果所有对象都很稳定,GC效率降低的话,就切换到mark-and-sweep;同样, JVM会注意其效果,要是heap出现很多碎片,就会切换回stop-and-copy

EOF

I Foundations

1 The Role of Algorithms in Computing 5

1.1 Algorithms 5

An algorithm is thus a sequence of computational steps that transform the input into the output.

1.2 Algorithms as a technology 11

lg n < sqrt(n) < n < n lg(n) < n2 < n3 < n!

2 Getting Started 16

2.1 Insertion sort 16

1
2
3
4
5
6
7
8
9
10
11
  void insertionSort(double* A, int size) {
      for (int i = 1; i < size; ++i) {
          double key = A[i];
          int index = i - 1;
          while (index >= 0 && A[index] > key) {
              A[index+1] = A[index];
              -- index;
          }
          A[i+1] = key;
      }
  }

2.2 Analyzing algorithms 23

insertion sort has a worst-case running time of Θ(n2)

2.3 Designing algorithms 29

The divide-and-conquer approach and recursive

  • Divide: left and right subarrays
  • Conquer: recursively sort left and right
  • Combine: merge by comparing two numbers from the sorted left and right pair by pair
mergeSort.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
  //Input:
  // A array
  // size size of A
  // p starting index of the span waiting to be sorted
  // r ending index of the span waiting to be sorted
  //Procedure:
  // sort left
  // sort right
  // merge by comparing two numbers from the sorted left and right pair by pair
  //Output:
  // A sorted array 
    void mergeSort(int* a, int b, int e) {
      if (b < e) {
          int mid = (b+e)/2;
          mergeSort(a, b, mid);
          mergeSort(a, mid+1, e);
          // merge
          int* left = new int[mid-b+1 + 1];
          int* right= new int[e-mid + 1];
          for (int i = b; i <= mid; i++) {
              left[i-b] = a[i];
          }
          left[mid+1-b] = INT_MAX;
          for (int i = mid+1; i <= e; i++) {
              right[i-mid-1] = a[i];
          }
          right[e-mid] = INT_MAX;
          int i = 0, j = 0;
          for (int k = b; k <= e; k++) {
              a[k] = (left[i]<=right[j])?(left[i++]):(right[j++]);
          }
          delete[] left;
          delete[] right;
      }
    }

  • Θ(n lg n) but not operate in place

3 Growth of Functions 43

3.1 Asymptotic notation 43

  • O(). When we have only an asymptotic upper bound, we use O-notation.
    • O(g(n)) = {f(n): there exist positive constants c and n0 such that 0 <= f(n) <= c*g(n) for all n >= n0}.
    • if f(n) == c*g(n) is not possible, we use o()
  • Θ(). When we have both an asymptotic upper bound and an asymptotic lower bound, we use Θ-notation.
    • Θ(g(n)) = {f(n): there exist positive constants c1, c2, and n0 such that 0 <= c1g(n) <= f(n) <= c2g(n) for all n >= n0}
  • Ω(), When we have only an asymptotic lower bound, we use omega-notation.
    • uppercase_omega(g(n)) = {f(n): there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0}
    • if f(n) == c*g(n) is not possible, we use ω()

3.2 Standard notations and common functions 53

4 Divide-and-Conquer 65

  • Three ways for solving recurrences
    • substitution method
    • recursive-tree method
    • master method
      • T(n) = a*T(n/b) + f(n), a>=1, b>1. It characterizes a divide- and-conquer algorithm that creates a subproblems, each of which is 1/b the size of the original problem, and in which the divide and combine steps together take f(n) time.

4.1 The maximum-subarray problem 68

In the stock market, you need to buy low and sell high. But the maximum profit does not always start at the lowest price or end at the highest price.

A brute-force solution

Choose 2 from n pairs of (date, price) and calculate the difference. Θ(n2)

A solution using divide-and-conquer

A transformation: create an array of the daily change in price, and find the maximum subarray.

  • find the max subarray on the left
  • find the max subarray on the right
  • find the max subarray crossing the middle
    • find max left, max right, and their sum
  • take the max of the above

TODO: codes

Θ(n*lg(n))

4.2 Strassen’s algorithm for matrix multiplication 75

O(n3) to O(n2.81)

4.3 The substitution method for solving recurrences

  1. Guess the form of the solution.
  2. Use mathematical induction to find the constants and show that the solution works.

4.4 The recursion-tree method for solving recurrences 88

depth * height

4.5 The master method for solving recurrences 93

Master Method

5 Probabilistic Analysis and Randomized Algorithms 114

5.1 The hiring problem 114 5.2 Indicator random variables 118 5.3 Randomized algorithms 122 5.4 Probabilistic analysis and further uses of indicator random variables

II Sorting and Order Statistics Introduction 147

worst-case running time average-cased running time
Insertion sort Θ(n^2) Θ(n^2)
Merge sort Θ(n lg n) Θ(n lg n)
Heapsort O(n lg n) -
Quicksort Θ(n^2) Θ(n lg n) (*)
Counting sort Θ(k+n) Θ(k+n)
Radix sort Θ(d(n+k)) Θ(d(n+k))
Bucket sort Θ(n^2) Θ(n)

6 Heapsort 151

O(n lg n),Our heap data structure is not garbage-collected storage.

heap.hpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
  #include <iostream>
  #include <cstdlib>

  template <class T>
  class Heap
  {
  public:
      Heap():m_array(0), m_size(0), m_maxSize(0){}
      Heap(int maxSize);
      ~Heap(){delete [] m_array;}
      bool insert(T item);
      bool isEmpty();
      T fetchMax();
      void deleteMax();
      void display();
      //Transfer Array<T> to a heap
      void arrayToHeap(T* array, int size);
      //Sort the heap
      void heapSort();

  private:
      void locate_down(int start, int end);
      void locate_up(int start, int end);

  private:
      T* m_array;
      int m_size;//num of elem stored
      int m_maxSize;//allocated memory
  };

  template <class T>
  Heap<T>::Heap(int maxSize)
  {
      m_array=new T[maxSize+1];
      m_size=0;
      m_maxSize=maxSize;
  }

  template <class T>
  T Heap<T>::fetchMax()
  {
      return m_array[1];
  }

  template <class T>
  bool Heap<T>::isEmpty()
  {
      return m_size==0;
  }

  template <class T>
  void Heap<T>::locate_down(int start, int end)
  {
      int c=start*2;
      int temp;
      while(c<=end)
      {
          if((c<end) && (m_array[c]<m_array[c+1])) //Pick the larger one in both childs. ATTENTION: c==end is not approved!!
              c=c+1
          if(m_array[start]<m_array[c])
          {
              temp=m_array[start];
              m_array[start]=m_array[c];
              m_array[c]=temp;
          }
          start=c;
          c=2*c;
      }
  }

  template <class T>  //overlap the top with the bottom, decrease the size, and locate_down
  void Heap<T>::deleteMax()
  {
      m_array[1]=m_array[m_size];
      --m_size;
      locate_down(1,m_size);
  }

  template <class T>
  void Heap<T>::locate_up(int start, int end)
  {
      int loc=end;
      int parent=end/2;
      int temp;
      while(parent>=start && m_array[loc]<m_array[parent])
      {
          temp=m_array[loc];
          m_array[loc]=m_array[parent];
          m_array[parent]=temp;
          loc=parent;
          parent=loc/2;
      }
  }

  template <class T>
  bool Heap<T>::insert(T item)
  {
      if(m_size==m_maxSize)
      {
          std::cerr<<"Heap is full";
          this->~Heap();
          return 0;
      }
      ++m_size;
      m_array[m_size]=item;
      locate_up(1, m_size);
      return true;
  }

  template <class T>
  void Heap<T>::display()
  {
      for(int i=1; i<=m_size; i++)
          std::cout<<m_array[i]<<" ";
      std::cout<<std::endl;
  }

  template <class T>
  void Heap<T>::arrayToHeap(T* array, int size)
  {
      m_maxSize=size;
      m_array=new T[m_maxSize+1];
      for(int i=1; i<=size; ++i)
          this->insert(array[i-1]);
      this->heapSort();
      for(int i=1; i<size; ++i)
          array[i-1]=m_array[i];
  }

  template <class T>
  void Heap<T>::heapSort()
  {
      T temp;
      int i;
      for(i=m_size; i>=2; --i)
      {
          temp=m_array[1];
          m_array[1]=m_array[i];
          m_array[i]=temp;
          locate_down(1, i-1);
      }
  }

6.1 Heaps 151

The (binary) heap data structure is an array object that we can view as a nearly complete binary tree.

1
2
3
4
5
6
7
8
9
10
  // calculate by shifting the binary representation
  int parent(i) {
      return i/2;
  }
  int left(i) {
      return 2*i;
  }
  int right(i) {
      return 2*i + 1;
  }
  • max-heap: for every node i other than the root, A[parent(i)]>=A[i].
  • min-heap: for every node i other than the root, A[parent(i)]<=A[i].
  • height of a node: number of edges on the longest simple downward path from the node to a leaf.
  • height of the heap: height of its root. Θ(lg n)

6.2 Maintaining the heap property 154

1
2
3
4
5
6
7
8
9
10
11
12
13
  // Input:
  //         A array
  //     i index
  //   Binary trees rooted at left(i) and right(i) are max-heaps
  //   but A[i] might be smaller than its children. 
  // Procedure:
  //     exchange A[i] with the larger one between A[left[i]] and A[right[i]]
  //     maxHeapify the new sub tree
  // Output:
  //     This routine lets the value at A[i] float down in the max-heap. 
  maxHeapify(A, i) {
      TODO;
  }

T(n) <= T(2*n/3) + Θ(1), T(n) = O(lg n) = O(height)

6.3 Building a heap 156

conduct maxHeapify upon the array from the rear to the head

O(n)

6.4 The heapsort algorithm 159

build max-heap, exchange the root with the end of heap, and maxHeapify the rest of the heap.

6.5 Priority queues 162

one of the most popular applications of a heap: as an efficient priority queue. It can be used to schedule jobs on a shared computer. Highest-priority job first.

prioqueue.hpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
  #include <iostream>
  template <class T>
  class PrioQueue //Min Heap
  {
  public:
      PrioQueue(int mSize=20);
      ~PrioQueue(){delete [] q;}
      bool IsEmpty() const {return size==0;}
      bool IsFull() const {return size==maxSize;}
      bool Append(const T &x);
      bool Serve(T &x);
  private:
      void AdjustDown(int start, int end);
      void AdjustUp(int start, int end);
      T *q;
      int size, maxSize;
  };

  template <class T>
  PrioQueue<T>::PrioQueue(int mSize)
  {
      maxSize=mSize;
      size=0;
      q=new T[maxSize];
  }

  template <class T>
  void PrioQueue<T>::AdjustDown(int start, int end)
  {
      if(start<end)
      {
          int parent=start;
          int child=2*parent+1;
          while(child<=end)
          {
              if((child<end)&&(q[child]>q[child+1]))
                  child=child+1;
              if(q[parent]>q[child])
              {
                  T temp=q[parent];
                  q[parent]=q[child];
                  q[child]=temp;
              }
              else break;
              parent=child;
              child=2*parent+1;
          }
      }
  }

  template <class T>
  void PrioQueue<T>::AdjustUp(int start, int end)
  {
      if(start<end)
      {
          int child=end;
          int parent=(child-1)/2;
          while((parent>=start)&& (q[parent]>q[child]))
          {
                  T temp=q[parent];
                  q[parent]=q[child];
                  q[child]=temp;
                  child=parent;
                  parent=(child-1)/2;
          }
      }
  }

  template <class T>
  bool PrioQueue<T>::Append(const T &x)
  {
      if(this->IsFull())
      {
          //throw std::Overflow;
          std::cout<<"overflow"<<std::endl;
         return(0);
      }
      size++;
      q[size-1]=x;
      this->AdjustUp(0, size-1);//ATTENTION: n-1
  }

  template <class T>
  bool PrioQueue<T>::Serve(T &x)
  {
      if(this->IsEmpty())
      {
          //throw std::Underflow;
          return(0);
      }
      x=q[0];
      q[0]=q[size-1];
      q[size-1]=0;
      size--;
      this->AdjustDown(0, size-1); //ATTENTION: n-1
      std::cout<<"After AdjustDown:"<<q[0]<<q[1]<<q[2]<<q[3]<<q[4]<<std::endl;
  }

7 Quicksort 170

Despite this slow worst-case running time Θ(n2) , quicksort is often the best practical choice for sorting because it is remarkably efficient on the average: its expected running time is Θ(n lg n), and the constant factors hidden in the Θ(n lg n) notation are quite small.

7.1 Description of quicksort 170

Divide-and-conquer paradigm

  • Divide: element A[q] and subarrays on the left and right, s.t. left <= A[q] <= right
  • Conquer: recursively sort left and right
  • Combine: NULL
quickSort.cxx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
  void qSort(int* a, int b, int e) {
      if (b < e) {
          int midVal = a[e];
          // partition
          int j = b;
          for (int i = b; i <= e; i++) {
              if (a[i] <= midVal) {
                  int tmp = a[j]; a[j] = a[i]; a[i] = tmp;
                  j++;
              }
          }
          qSort(a, b, j-2);
          qSort(a, j, e);
      }
  }

7.2 Performance of quicksort 174

  • Worst-case partitioning: the partitioning routine produces one subproblem with n-1 elements and one with 0 elements.
    • T(n) = T(n-1) + T(0) + Θ(n)
    • Θ(n2)
  • Best-case partitioning: even possible split
    • T(n) = 2T(n/2) + Θ(n)
    • Θ(n lg n)
  • Balanced partitioning: Any split of constant proportionality yields a recursion tree of depth Θ(lg n), where the cost at each level is O(n).
  • Average case: PARTITION produces a mix of “good” and “bad” splits.
    • TODO

7.3 A randomized version of quicksort 179

pick the flag in the partition routine randomly (exchange with the one at the rear)

7.4 Analysis of quicksort 180

8 Sorting in Linear Time 191

comparison sorts: merge sort, heapsort, and quicksort sharing, must make Ω(n lg n) comparisons in the worst case to sort n elements.

However, counting sort, radix sort, and bucket sort…

8.1 Lower bounds for sorting 191

8.2 Counting sort 194

counting sort assumes that the input consists of integers in a small range

Create an extra array to record the occurrence of these number and put them back into a new array as the result. We usually use counting sort when we have k = O(n), in which case the running time is Θ(n).

8.3 Radix sort 197

TODO? 感觉像counting sort,只不过是存了整个数,而不是数数的个数

Radix sort tutorial

  • source: List of bytes
  • source_n: number of bytes to sort
  • dest[256]: 256 lists of bytes. each list should have enough space to hold source_n elements.

a pseudo-code would sort the list this way:

for i=0 to source_n do
   dest[source[i]].append(source[i])

8.4 Bucket sort 200

bucket sort assumes that the input is generated by a random process that distributes elements uniformly and independently over the interval [0, 1).

Bucket sort divides the interval [0, 1) into n equal-sized subintervals, or buckets, and then distributes the n input numbers into the buckets. Since the inputs are uniformly and independently distributed over [0, 1), we do not expect many numbers to fall into each bucket. To produce the output, we simply sort the numbers in each bucket (e.g. insertion sort) and then go through the buckets in order, listing the elements in each.

Θ(n)

9 Medians and Order Statistics 213

Selection problem selects i-st smallest number among a set A of n (distinct) numbers

9.1 Minimum and maximum 214

How to find the min in A? Best way is to iterate it once (n-1 comparisons).

How to find min and max simultaneously? In fact, we can find both the minimum and the maximum using at most 3n/2 comparisons. 每次取一对,自己先比较,然后大的跟max比,小的跟min比,总共只需要比3n/2次。

9.2 Selection in expected linear time 215

randomizedSelect.cxx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
  #include <iostream>
  using namespace std;

  int randomizedSelect(double* A, int p, int r, int i) ;
  int randomizedPatition(double* A, int p, int r) ;
  int partition(double* A, int p, int r) ;

  // ----------------------------------------------------------------------------
  // select i-st largest number from A[p] to A[r]
  int randomizedSelect(double* A, int p, int r, int i) {
      if (p == r)
          return A[p];
      int q = randomizedPatition(A, p, r);
      int k = q-p+1;
      if (i == k) // the pivot value is the answer
          return A[q];
      else if (i < k) // on the left
          return randomizedSelect(A, p, q-1, i);
      else            // on the right
          return randomizedSelect(A, q+1, r, i-k);
  }
  // ----------------------------------------------------------------------------
  // get the index of the number in the middle 
  // s.t. subarray on the left <= A[index] <= subarray on the right
  int randomizedPatition(double* A, int p, int r) {
      int i = random()%(p-r+1) + p;
      double tmp = A[i]; A[i] = A[r]; A[r] = tmp;
      return partition(A, p, r);
  }
  // ----------------------------------------------------------------------------
  // pick A[r] as the flag
  // i for the left array <= A[r]
  // j for iterating the whole array 
  // finally, exchange A[r] in the middle
  // return the new middle index
  int partition(double* A, int p, int r) {
      double x = A[r];
      int i = p - 1;
      for (int j = p; j <= r-1; ++j)
          if (A[j] <= x) {
              ++ i;
              double tmp = A[i]; A[i] = A[j]; A[j] = tmp;
          }
      double tmp = A[i+1]; A[i+1] = A[r]; A[r] = tmp;
      return i+1;
  }
  // ----------------------------------------------------------------------------
  int main() {
      double A[10] = {8,1,3,2,4,7,5,6,9,0};
      for (int i = 0; i < 10; ++i) cout << A[i];
      cout << endl;
      cout << randomizedSelect(A, 0, 9, 5) << endl;
      return 0;
  }

E[T(n)] = O(n)

9.3 Selection in worst-case linear time 220

O(n)

III Data Structures

  • Data structures: techniques for representing finite dynamic sets and manipulating them on a computer.
  • Dynamic-set operations: CR(U)D 大小前后: insert, search, delete, max, min, predecessor, successor.
  • Dictionary: a dynamic set supporting insert, delete, and test membership.
  • 2 categories of operations
    • queries
    • modifying operations

10 Elementary Data Structures 232

10.1 Stacks and queues 232

Stack
stack.h
1
2
3
4
5
6
7
8
9
10
11
12
  #include <iostream>
  template <class T>
  class Stack
  {
  public:
      virtual bool IsEmpty() const=0;
      virtual bool IsFull() const=0;
      virtual bool Top(T &x) const=0;
      virtual bool Push(T x) =0;
      virtual T Pop() =0;
      virtual void Clear() =0;
  };
Sequential Stack
seqstack.cxx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
  #include "stack.h"

  template <class T>
  class SeqStack:public Stack<T>
  {
  public:
      SeqStack(int mSize);
      ~SeqStack() {delete []s;}
      bool IsEmpty() const { return top == -1;}
      bool IsFull() const { return top == maxTop;}
      bool Top(T &x) const;         //return the top element in x
      bool Push(T x);
      T Pop();
      void Clear() { top = -1;}
  private:
      int top;
      int maxTop;
      T *s;
  };
  template <class T>
  SeqStack<T>::SeqStack(int mSize)
  {
      maxTop = mSize - 1;
      s = new T[mSize];
      top = -1;
  }

  template <class T>
  bool SeqStack<T>::Top(T &x) const
  {
      if (IsEmpty())
      {
          std::cout<<"Empty"<<std::endl;
          return false;
      }
      x = s[top];
      return true;
  }

  template <class T>
  bool SeqStack<T>::Push(T x)
  {
      if(IsFull())
      {
          std::cout<<"Overflow"<<std::endl;
          return false;
      }
      s[++top] = x;
      return true;
  }

  template <class T>
  T SeqStack<T>::Pop()
  {
      if(IsEmpty())
      {
          std::cout<<"Underflow"<<std::endl;
          return false;
      }
      return s[top--];
  }
SeqStackTest.cxx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
  #include "seqstack.h"
  int main()
  {
      SeqStack<int> S(10);
      std::cout<<S.IsEmpty()<<std::endl;
      std::cout << S.IsFull() << std::endl;
      for(int i = 0; i < 10; i++)
      {
          S.Push(i);
          int x;
          S.Top(x);
          std::cout<< x << " ";
      }
      std::cout << std::endl;
      std::cout << S.IsFull() << std::endl;
      for(int i = 0; i < 10; i++)
      {
          std::cout <<  S.Pop() << " " ;
      }
      std::cout << std::endl;
  }
Queue
queue.h
1
2
3
4
5
6
7
8
9
10
11
12
  #include <iostream>
  template <class T>
  class Queue
  {
  public:
      virtual bool IsEmpty() const=0;
      virtual bool IsFull() const=0;
      virtual bool Front(T& x) const=0;
      virtual bool EnQueue(T x) =0;
      virtual T DeQueue() =0;
      virtual bool Clear() =0;
  };
Sequential Queue
seqqueue.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
  #include "queue.h"

  template <class T>
  class SeqQueue:public Queue
  {
  public:
      SeqQueue(int mSize);
      ~SeqQueue() { delete [] q;}
      bool IsEmpty() const { return front == rear;}
      bool IsFull() const { return (rear+1) == front;}
      bool Front(T& x) const;
      bool EnQueue(T x);
      T DeQueue();
      void Clear() { front = rear = 0;}
  private:
      int front, rear;
      int maxSize;
      T *q;
  };

  template <class T>
  SeqQueue<T>::SeqQueue(int mSize)
  {
      maxSize = mSize;
      q = new T[maxSize];
      front = rear = 0;
  }

  template <class T>
  bool SeqQueue<T>::Front(T& x) const
  {
      if (IsEmpty())
      {
          std::cout << "Empty" << std::endl;
          return false;
      }
      x = q[front];
      return true;
  }

  template <class T>
  bool SeqQueue<T>::EnQueue(T x)
  {
      if (IsFull())
      {
          std::cout << "Overflow" << std::endl;
          return false;
      }
      q[(rear=(rear + 1) % maxSize)] = x;
      return true;
  }

  template <class T>
  T SeqQueue<T>::DeQueue()
  {
      if (IsEmpty())
      {
          std::cout << "Underflow" << std::endl;
          return false;
      }
      T x = q[front];
      front = (front + 1) % maxSize;
      return x;
  }

10.2 Linked lists 236

Linear List
linearlist.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
  template <class T>
  class LinearList
  {
  public:
      virtual bool IsEmpty() const=0;
      virtual int GetLength() const=0;
      virtual bool Find(int i, T& x) const=0;            //return element x with index i
      virtual int Search(T x) const=0;                   //if x is found, return its index, else -1
      virtual bool Insert(int i, T x)=0;                 //insert x at element Ai.
      virtual bool Delete(int i)=0;                      //delete element Ai
      virtual bool Update(int i, T x)=0;                 //update element Ai with x
      virtual void Output(std::ostream& out)const=0;     //ouput the list to the stream
  protected:
      int ElemNum;                                        //element number in the list
  };
Single List
singlelist.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
  #include "iostream"
  #include "linearlist.h"

  template <class T> class SingleList;

  template <class T>
  class Node
  {
      private:
          T element;
          Node<T> *next;
          friend class SingleList<T>;
  };

  template <class T>
  class SingleList:public LinearList<T>
  {
      public:
          SingleList(){ head = NULL; this->ElemNum = 0;}
          ~SingleList();
          bool IsEmpty() const;
          int GetLength() const;
          bool Find(int i, T& x) const;
          int Search(T x) const;
          bool Insert(int i, T x);
          bool Delete(int i);
          bool Update(int i, T x);
          void Clear();
          void Output(std::ostream& out) const;
      private:
          Node<T>* head;
  };

  template <class T>
  SingleList<T>::~SingleList()
  {
      Node<T> *p;
      while (head)
      {
          p = head->next;
          delete head;
          head = p;
      }
  }

  template <class T>
  bool SingleList<T>::IsEmpty() const
  {
      return this->ElemNum == 0;
  }

  template <class T>
  int SingleList<T>::GetLength() const
  {
      return this->ElemNum;
  }

  template <class T>
  bool SingleList<T>::Find(int i, T& x) const
  {
      if (i < 0 || i > this->ElemNum -1)
      {
          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> *p = head;
      for (int j = 0; j < i; j++) p = p->next;
      x = p->element;
      return true;
  }

  template <class T>
  int SingleList<T>::Search(T x) const
  {
      Node<T> *p = head;
      for (int i = 0; i < this->ElemNum; i ++)
      {
          if(x == p->element)
              return i;
          else
              p = p->next;
      }
      return -1;
  }

  template <class T>
  bool SingleList<T>::Insert(int i, T x)
  {
      if (i < 0 || i > this->ElemNum )
      {
          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> * r = new Node<T>(); r->element = x;
      Node<T> * p = head;
      Node<T> * q ;
      for (int j = 0; j < i; j++)
      {
          q = p;
          p = p->next;
      }
      if(i == 0)
      {
          head = r;
          r->next = p;
      }
      else
      {
          q->next = r;
          r->next = p;
      }
      (this->ElemNum) ++;
      return true;
  }

  template <class T>
  bool SingleList<T>::Delete(int i)
  {
      if (!(this->ElemNum))
      {
          std::cout<<"Underflow!"<<std::endl;
          return false;
      }
      if (i < 0 || i > this->ElemNum - 1)
      {
          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> * p = head;
      Node<T> * q = head;
      if (i == 0)
      {
          head = head->next;
          delete p;
      }
      else
      {
          for (int j = 0; j < i; j++)
          {
              q = p;
              p = p->next;
          }
          q->next = p->next;
          delete p;
      }
      (this->ElemNum) --;
      return true;
  }

  template <class T>
  bool SingleList<T>::Update(int i, T x)
  {
      if (i < 0 || i > this->ElemNum-1)
      {

          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> * p = head;
      for (int j = 0; j < i; j ++)
      {
          p = p->next;
      }
      p->element = x;
      return true;
  }

  template <class T>
  void SingleList<T>::Clear()
  {
      Node<T> *p;
      while (head)
      {
          p = head->next;
          delete head;
          head = p;
      }
  }

  template <class T>
  void SingleList<T>::Output(std::ostream& out) const
  {
      Node<T> *p = head;
      while (p)
      {
          out << p->element << " ";
          p = p->next;
      }
      out << std::endl;
  }
SingleListTest.cxx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
  #include "singlelisti.h"
  #include "stdio.h"
  int main()
  {
      SingleList<int> LA;
      SingleList<int> LB;
      for (int i = 0; i < 6; i++) LA.Insert(i, i);
      for (int i = 5; i < 10; i++) LB.Insert(i-5, i);
      LB.Insert(0, 0);
      LB.Insert(4, 2);
      LB.Insert(LB.GetLength(), 4);
      LA.Output(std::cout);
      LB.Output(std::cout);
      Intersection(LA, LB);
      LA.Output(std::cout);

      return 0;
  }
Double List
doublelist.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
  #include "linearlist.h"

  template <class T> class DoubleList;

  template <class T>
  class Node
  {
      private:
          T element;
          Node<T> *last, *next;
          friend class DoubleList<T>;
  };

  template <class T>
  class DoubleList:public LinearList<T>
  {
      public:
          DoubleList(){ head = NULL; this->ElemNum = 0;}
          ~DoubleList();
          bool IsEmpty() const;
          int GetLength() const;
          bool Find(int i, T& x) const;
          int Search(T x) const;
          bool Insert(int i, T x);
          bool Delete(int i);
          bool Update(int i, T x);
          void Clear();
          void Output(std::ostream& out) const;
      private:
          Node<T>* head;
  };

  template <class T>
  DoubleList<T>::~DoubleList()
  {
      Node<T> *p;
      while (head)
      {
          p = head->next;
          delete head;
          head = p;
      }
  }

  template <class T>
  bool DoubleList<T>::IsEmpty() const
  {
      return this->ElemNum == 0;
  }

  template <class T>
  int DoubleList<T>::GetLength() const
  {
      return this->ElemNum;
  }

  template <class T>
  bool DoubleList<T>::Find(int i, T& x) const
  {
      if (i < 0 || i > this->ElemNum -1)
      {
          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> *p = head;
      for (int j = 0; j < i; j++) p = p->next;
      x = p->element;
      return true;
  }

  template <class T>
  int DoubleList<T>::Search(T x) const
  {
      Node<T> *p = head;
      for (int i = 0; i < this->ElemNum; i ++)
      {
          if(x == p->element)
              return i;
          else
              p = p->next;
      }
      return -1;
  }

  template <class T>
  bool DoubleList<T>::Insert(int i, T x)
  {
      if (i < 0 || i > this->ElemNum )
      {
          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> * r = new Node<T>(); r->element = x;
      Node<T> * p = head;
      for (int j = 0; j < i; j++)
      {
          p = p->next;
      }
      if(i == 0)
      {
          head = r;
          r->last = NULL;
          r->next = p;
          p->last = r;
      }
      else
      {
          r->last = p->last;
          p->last->next = r;
          r->next = p;
          p->last = r;
      }
      (this->ElemNum) ++;
      return true;
  }

  template <class T>
  bool DoubleList<T>::Delete(int i)
  {
      if (!(this->ElemNum))
      {
          std::cout<<"Underflow!"<<std::endl;
          return false;
      }
      if (i < 0 || i > this->ElemNum - 1)
      {
          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> * p = head;
      if (i == 0)
      {
          head = head->next;
          delete p;
      }
      else
      {
          for (int j = 0; j < i; j++)
          {
              p = p->next;
          }
          p->last->next = p->next;
          p->next->last = p->last;
          delete p;
      }
      (this->ElemNum) --;
      return true;
  }

  template <class T>
  bool DoubleList<T>::Update(int i, T x)
  {
      if (i < 0 || i > this->ElemNum-1)
      {

          std::cout << "Exceed Bounds!" << std::endl;
          return false;
      }
      Node<T> * p = head;
      for (int j = 0; j < i; j ++)
      {
          p = p->next;
      }
      p->element = x;
      return true;
  }

  template <class T>
  void DoubleList<T>::Clear()
  {
      Node<T> *p;
      while (head)
      {
          p = head->next;
          delete head;
          head = p;
      }
  }

  template <class T>
  void DoubleList<T>::Output(std::ostream& out) const
  {
      Node<T> *p = head;
      while (p)
      {
          out << p->element << " ";
          p = p->next;
      }
      out << std::endl;
  }
DoubleListTest.cxx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
  #include "singlelisti.h"
  #include "stdio.h"
  int main()
  {
      SingleList<int> LA;
      SingleList<int> LB;
      for (int i = 0; i < 6; i++) LA.Insert(i, i);
      for (int i = 5; i < 10; i++) LB.Insert(i-5, i);
      LB.Insert(0, 0);
      LB.Insert(4, 2);
      LB.Insert(LB.GetLength(), 4);
      LA.Output(std::cout);
      LB.Output(std::cout);
      LA.Delete(3);
      LA.Output(std::cout);

      return 0;
  }

10.3 Implementing pointers and objects 241

  • How do we implement pointers and objects in languages that do not provide them?
    1. A multiple-array representation of objects
      • for example, next[], key[], prev[] can store the values and indexes
    2. A single-array representation of objects
      • for example, store triple < key, next, prev > in one single array
  • Allocating and freeing objects
    • a garbage collector is responsible for determining which objects are unused.
    • We keep the free objects in a singly linked list, which we call the free list. Each object in the representation is either in list L or in the free list, but not in both. Free list acts like a stack. 在free list的head操作

10.4 Representing rooted trees 246

  • Binary trees
  • Rooted trees with unbounded branching
    • left-child, right-sibling representation
      1. x.left-child points to the leftmost child of node x
      2. x.right-sibling points to the sibling of x immediately to its right.
      3. If node x has no children, then x.left-child = NIL, and if node x is the rightmost child of its parent, then x.right-sibling = NIL.
  • Other tree representations. Many are possible.
binarytree.hpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
  template <class T>
  struct BTNode
  {
      BTNode()
      {
          lChild = rChild = NULL;
      }
      BTNode(const T& x)
      {
          element = x;
          lChild = rChild = NULL;
      }
      BTNode(const T& x, BTNode<T>* l, BTNode<T>* r)
      {
          element = x;
          lChild = l;
          rChild = r;
      }
      T element;
      BTNode<T>* lChild, *rChild;
  };

  template <class T>
  class BinaryTree
  {
  public:
      BinaryTree() {root = NULL;}
      ~BinaryTree() {};
      bool IsEmpty() const;
      void Clear();
      bool Root(T& x) const;
      void MakeTree(const T& x, BinaryTree<T>& left, BinaryTree<T>& right);
      void BreakTree(T& x, BinaryTree<T>& left, BinaryTree<T>& right);
      void PreOrder(void (*Visit)(T& x));
      void InOrder(void (*Visit)(T& x));
      void PostOrder(void (*Visit)(T& x));
      int Size();
      void Copy();
  protected:
      BTNode<T>* root;
  private:
      void Clear(BTNode<T>* &t);
      void PreOrder(void (*Visit)(T& x), BTNode<T> *t);
      void InOrder(void (*Visit)(T& x), BTNode<T> *t);
      void PostOrder(void (*Visit)(T& x), BTNode<T> *t);
      int Size(BTNode<T> *t);
      void Clear(BTNode<T> *t);
      friend class BIterator;
  };

  template <class T>
  bool BinaryTree<T>::Root(T& x)const
  {
      if(root)
      {
          x = root->element;
          return true;
      }
      else
          return false;
  }

  template <class T>
  void BinaryTree<T>::MakeTree(const T& x, BinaryTree<T>& left, BinaryTree<T>& right)
  {
      if (root || &left == &right) return;
      root = new BTNode<T> (x, left.root, right.root);
      left.root=right.root=NULL;
  }

  template <class T>
  void BinaryTree<T>::BreakTree(T& x, BinaryTree<T>& left, BinaryTree<T>& right)
  {
      if (!root||&left==&right||left.root||left.root) return;
      x=root->element;
      left.root=root->lChild; right.root=root->rChild;
      delete root;
      root=NULL;
  }

  template <class T>
  void BinaryTree<T>::PreOrder(void (* Visit)(T& x))
  {
      PreOrder(Visit, root);
  }
  template <class T>
  void BinaryTree<T>::PreOrder(void (* Visit)(T& x), BTNode<T>* t)
  {
      if (t)
      {
          Visit(t->element);
          PreOrder(Visit, t->lChild);
          PreOrder(Visit, t->rChild);
      }
  }

  template <class T>
  void BinaryTree<T>::InOrder(void (* Visit)(T& x))
  {
      InOrder(Visit, root);
  }
  template <class T>
  void BinaryTree<T>::InOrder(void (* Visit)(T& x), BTNode<T>* t)
  {
      if(t)
      {
          InOrder(Visit, t->lChild);
          Visit(t->element);
          InOrder(Visit, t->rChild);
      }
  }


  template <class T>
  void BinaryTree<T>::PostOrder(void (* Visit)(T& x))
  {
      PostOrder(Visit, root);
  }
  template <class T>
  void BinaryTree<T>::PostOrder(void (* Visit)(T&x), BTNode<T>* t)
  {
      if(t)
      {
          PostOrder(Visit, t->lChild);
          PostOrder(Visit, t->rChild);
          Visit(t->element);
      }
  }

  template <class T>
  int BinaryTree<T>::Size()
  {
      return Size(root);
  }
  template <class T>
  int BinaryTree<T>::Size(BTNode<T> &t)
  {
      if(!t) return 0;
      return Size(t->lChild) + Size(t->rChild) + 1;
  }

  template <class T>
  BTNode<T>* BinaryTree<T>::Copy(BTNode<T>* t)
  {
      if(!t) return NULL;
      BTNode<T>* q=new BTNode<T>(t->element);
      q->lChild=Copy(t->lChild);
      q->rChild=Copy(t->rChild);
      return q;
  }

  template <class T>
  void BinaryTree<T>::Clear()
  {
      Clear(root);
  }
  template <class T>
  void BinaryTree<T>::Clear(BTNode<T>* t)
  {
      if(t)
      {
          Clear(t->lChild);
          Clear(t->rChild);
          delete t;
      }
  }


  template <class>
  class BIterator
  {
  public:
      virtual T* GoFirst(const BinaryTree<T>& bt)=0;
      virtual T* Next()=0;
      virtual void Traverse(void (*Visit)(T& x), const BinaryTree<T>& bt);
  protected:
      BTNode<T>* r, *current;
  };

  template <class T>
  void BIterator<T>::Traverse(void (*Visit)(T& x), const BinaryTree<T>& bt)
  {
      T* p=GoFirst(bt);
      while(p)
      {
          Visit(*p);
          p=Next();
      }
  }

  template <class T>
  class IInOrder:public BIterator<T>
  {
  public:
      IInOrder(BinaryTree<T>& bt, int mSize)
      {
          r=bt.root;
          current=NULL;
          s=new SeqStack<BTNode<T>* >(mSize);
      }
      T* GoFirst(const BinaryTree<T>& bt);
      T* Next();
  private:
      SeqStack<BTNode<T>*> *s;
  };

11 Hash Tables 253

  • A hash table is an effective data structure for implementing dictionaries.
    • searching worst-case Θ(n)
    • expected O(1)
    • Instead of using the key as an array index directly, the array index is computed from the key.

11.1 Direct-address tables 254

Assume that universe U of keys is reasonably small, and no two elements have the same key. Store x at T[x.key] or at the place which T[x.key] points to. (Even storing the key is not necessary.)

slot: position in the array (direct-address table)

11.2 Hash tables 256

  • What if universe U is large? namely, K of keys stored in a dictionary is much smaller than the universe U of all possible keys?
  • hash function! element is stored in slot h(k) instead of key directly
    • h: U -> {0, 1, …, m-1}
    • An element with key k hashes to slot h(k).
  • Problem: collision: two keys may hash to the same slot.
    • Collision resolution by chaining, insert at the head
      • Given a hash table T with m slots that stores n elements, we define the load factor α for T as n/m, that is, the average number of elements stored in a chain.
      • Worst-case: Θ(n) + T(compute hash)
      • Average-case: simple uniform hashing Any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed to.
        • Unsuccessful, Θ(1+α) n个元素m个slot,期望是每个slot有n/m个元素
        • Successful, Θ(1+α)

11.3 Hash functions 262

  • What makes a good hash function? less collisions
    • satisfies (approximately) the assumption of simple uniform hashing.
    • Hash values are independent of any patterns that might exist in the data.
  • Interpreting keys as natural numbers
  • Different ways:
    1. division method. h(k) = k mod m.
    2. multiplication method. h(k) = ⦏m(kA mod 1)⦎, “kA mod 1” means the fractional part of kA, that is, kA - ⦏kA⦎. (A is a constant, 0 < A < 1) Knuth suggests that A = 0.618
    3. universal hashing. select the hash function at random from a carefully designed class of functions.

11.4 Open addressing 269

  • Open addressing: all elements occupy the hash table itself. No chaining. load factor α can never exceed 1.
    • We compute the sequence of slots to be examined by probing until we find an empty slot. The probing sequence has not to be < 0, 1, …, m-1 > with Θ(n), but depends upon the key being inserted. To determine which slots to probe, we extend the hash function to include the probe number (starting from 0) as a second input. < h(k,0), h(k,1), …, h(k,m-1) >.
    • 注意,此处不能标记删除的slot为NIL,因为这样一来就无法读到存在其后面部分的散列 值了。解决方法是,用DELETED取代NIL.
    • uniform hashing: the probe sequence of each key is equally likely to be any of the m! permutations of < 0, 1, …, m-1 >.
      • linear probing
        • h(k,i) = (h’(k)+i) mod m
        • problem: primary clustering
      • quadratic probing
        • h(k,i) = (h’(k) + c1*i + c2*i2) mod m
        • problem: secondary clustering
      • double hashing
        • one of the best
        • h(k,i) = (h1(k) + i*h2(k)) mod m

11.5 Perfect hashing 277

  • Perfect hashing: O(1) memory accesses are required to perform a search in the worst case.
    • two levels of hashing with universal hashing at each level.
    • By choosing the first-level hash function well, we can limit the expected total amount of space used to O(n)
    • TODO 证明很复杂的样子

C++ std::map的使用

12 Binary Search Trees 286

  • Performance
    • Best-case Θ(lgn)
    • Average-case Θ(lgn)
    • Worst-case Θ(n)
  • Improve? red-black trees(height O(lg n)), B-trees(secondary storage)
bstree.hpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
  #include"dynamic.h"
  #include"binarytree.h"
  //二叉搜索树:
  //所有节点的关键字不同,此树要么为空,要么:
  //1. 左不空,则左所有<根
  //2. 右不空,则右所有>根
  //3. 左右子树均为二叉搜索树
  template <class T>
  class BSTree:public DynamicSet<T>
  {
  public:
      BSTree() {root=NULL;}
      ResultCode Search_recursion(T& x) const;
      ResultCode Search_iteration(T& x) const;
      ResultCode Insert_recursion(T& x);
      ResultCode Insert_iteration(T& x);
      ResultCode Remove(T& x);

  protected:
      BTNode<T>* root;

  private:
      ResultCode Search(BTNode<T> *p, T& x) const;
      ResultCode Insert(BTNode<T> *p, T& x);
  };

  template <class T>
  ResultCode BSTree<T>::Search_recursion(T& x) const
  {
      return Search(root, x);
  }

  template <class T>
  ResultCode BSTree<T>::Search(BTNode<T> *p, T& x) const
  {
      if(!p) return NotPresent;
      else if(x>p->element) return Search(p->rChild, x);
      else if(x<p->element) return Search(p->lChild, x);
      else return Success;
  }
  template <class T>
  ResultCode BSTree<T>::Search_iteration(T& x) const
  {
      BTNode<T> *p=root;
      while(p)
      {
          if(x<p->element) p=p->lChild;
          else if(x>p->element) p=p->rChild;
          else return Success;
      }
      return NotPresent;
  }

  template <class T>
  ResultCode BSTree<T>::Insert_recursion(T& x)//ATTENTION: situation when root is NULL
  {
      if(root) return Insert(root, x);
      else root=new BTNode<T>(x);
  }

  template <class T>
  ResultCode BSTree<T>::Insert(BTNode<T>* p, T& x) //ATTENTION: situation when root is NULL
  {
      if(x>p->element)
      {
         if(p->rChild==NULL)
         {
             p->rChild=new BTNode(x);
             return Success;
         }
         else return Insert(p->rChild, x);
      }
      else if(x<p->element)
      {
         if(p->lChild==NULL)
         {
             p->lChild=new BTNode(T);
             return Success;
         }
         else return Insert(p->lChild, x);
      }
      else return Duplicate;
  }

  template <class T>
  ResultCode BSTree<T>::Insert_iteration(T& x)
  {
      BTNode<T> *p=root, *q=NULL;
      if(!p)
      {
          q=p;
          if(x>p->element) p=p->rChild;
          else if(x<p->element) p=p->lChild;
          else return Duplicate;
      }
      p=new BTNode<T>(x);
      if(!root) root=p;
      else if(x<q->element) q->lChild=p;
      else q->rChild=p;
      return Success;
  }

  template <class T>
  ResultCode BSTree<T>::Remove(T* x)
  {
     // TODO
      }

12.1 What is a binary search tree? 286

binary-search-tree property: left <= middle <= right: Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y.key <= x.key. If y is a node in the right subtree of x, then y.key >= x.key.

How to print out keys in sorted order? inorder tree walk (other walks: preorder, postorder)

1
2
3
4
5
6
7
  void inorderTreeWalk(Node* r) {
      if (r) {
          inorderTreeWalk(r->left);
          cout << r.key;
          inorderTreeWalk(r->right)};
      }
  }

Θ(n)

12.2 Querying a binary search tree 289

  • Iteration is faster in Java, C, Python… Recursion is fairly expensive compared to iteration (in general) because it requires the allocation of a new stack frame.
  • Recursion might be faster in functional programming language.
Searching
  • Recursive version: Me? search left? search right?
  • Iteration version: Use pointer p. p->key? p=p->left? p=p->right?
Min and Max

left-most leaf and right-most leaf

Successor and predecessor
  • Successor: If x has right subtree, return max(x.right), else find in parents.
  • Predecessor: If x has left subtree, return max(x.left), else find in parents.

12.3 Insertion and deletion 294

12.4 Randomly built binary search trees 299

13 Red-Black Trees 308

13.1 Properties of red-black trees 308

  • A BST with one extra color bit of storage per node.
  • Red-black properties:
    1. Every node is either red or black.
    2. The root is black.
    3. Every leaf (NIL) is black.
    4. If a node is red, then both its children are black.
    5. For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.
  • sentinel T.nil for all all pointers to NIL (replacing leaves and parent of root)
  • black-height of node x: number of black nodes on the path from (not including) x to a leaf
  • black-height of a tree: black-height of its root
  • A red-black tree with n internal nodes has height at most 2*log(n+1)

13.2 Rotations 312

  • Rotation: A local operation in a search tree that preserves the binary-search-tree property.
    • left-rotation: counter-clockwise
      • set y
      • turn y’s left subtree into x’s right subtree
      • link x’s parent to y
      • put x on y’s left
    • right-rotation: clockwise

13.3 Insertion 315

  1. insert node red z into tree T (Property 5)
  2. auxiliary procedure RBinsertFixup(T,z) to re-color and rotate.Ω
    • Property 2 is violated if z is the root
    • Property 4 is violated if z’s parent is red.
      • Case 1: z’s uncle y is red: z上两层反色
      • Case 2: z’s uncle y is black and z is a right child: around z left-rotate
      • Case 3: z’s uncle y is black and z is a left child: around z’s parent right-rotate + z上两层反色
    • O(lgn)

13.4 Deletion 323

  • RBTransplant(T, u, v): replace subtree u with subtree v. mind NIL node as parent and leaves

TODO

14 Augmenting Data Structures 339

14.1 Dynamic order statistics 339

  • Rank of an element: its position in the linear order of the set
  • Order-statistic tree: RBT + every node has x.size = number of nodes in its subtree, including the NIL sentinel
    • x.size = x.left.size + x.right.size + 1

14.2 How to augment a data structure 345

  1. Choose an underlying data structure.
  2. Determine additional information to maintain in the underlying data structure.
  3. Verify that we can maintain the additional information for the basic modifying operations on the underlying data structure.
  4. Develop new operations.

14.3 Interval trees 348

RBT + operations on dynamic sets of intervals.