
2. Sorting and Order Statistics

~well since I’m learning rust I will use rust to implement algorithms~
A brief review of sorting problem
A sorting problem is usually defined as below:
Given a sequence of n numbers , we want the algorithm to output a reordering of original sequence , such that
It’s worth noticing that the “numbers” can be anything that can be ordered, and the “sequence” can be any data structure that can contain an ordered collection of objects. Thus the focus of this part is on the problem itself, not the specification of individual type of objects.
Recall the algorithm we introduced in the previous part: insertion sort and quick sort, with their own properties. Insertion sort takes more time than merge sort, but it has good performance on small size input, since it sorts the array in place, means there is only constant number of elements are stored out side of the input array(Recall that the merge
function of merge sort will create additional array as the size of array increases, where insertion sort only stores the max/min element no matter how large the input array is).
In this part, we will introduce two new sorting algorithms: heap sort and quick sort.
Heap sort
Heap sort is an sort algorithm that stores array in-place and its run time is . Heap sort utilizes a data structure called binary heap, which is a complete binary tree that the parent node is always greater/smaller than the child node. When all the parents are greater than their childs, the heap is called max heap, otherwise it’s called a min heap.
Recall that in a binary tree represented by an array, give the i-th index of the array, its parent node idx is , and its left child and right child are and , if we start count from 0 and floor the division.
For a binary tree with node, the height of the tree(the number of edges from of the longest downward path from the root to a leaf) is at least floor(log(n))
.
Proof:
To proof, we use this fact:
for a complete binary tree, the h-th layer has nodes (i.e. layer 1 has node)
Thus, a tree with height has at least
number of nodes.
Let
take on both side, we get:
since is increasing, we can conclude that , and thus the lower limit is what we desired.
Construction of heap
To create a max heap, we can follow the following algorithm:
The input is the node. And the function left(x)
and right(x)
stands for the index of the left child and the right child of the x-th node.
- Check the left node and the right node to find the largest node index of the i-th node(if the i-th node is greater than all of them, the greatest node index is i)
- If the i-th node is not the largest node, swap the i-th node and the greatest node(either left or right node).
- Apply this procedure on the largest node index we found previously.
The rust code:
1 | fn left(v: &[i32], idx: usize) -> Option<usize> { |
The output:
1 | 4 1 4 5 1 1 |
Originally the tree looks like this:
1 | 1 |
and after adjustment, we have:
1 | 4 |
This doesn’t look like a heap, since we only processed one branch, which is the largest branch of the root node.
To implement the full heap sort, we need to adjust the tree from the second lowest layer to the upper most layer:
Why the second lowerst layer
The reason why we don’t begin with the root node because in our scenerio, if we apply the procedure twice, we get:
1 | 4 |
which is not a valid heap.
To come up with a full procedure, we iterate from the last node with children to the root node, and apply the procedure max_heapify
to each node.
The rust code:
1 | fn max_heapify(vec: &mut [i32], root: usize) { |
The output:
1 | 1 1 4 5 1 4 |
If we draw the latter as a binary tree we get:
1 | 5 |
To prove this works, we once again uses loop invarient, since there is a loop in this code:
At the start of each loop, the node are root node of a max heap
Now we are about to proof this actually remains invarient throughout the loop. But we will first proof one lemma:
Lemma 2.1
Given a complete binary tree vec
, if node has child that is the root of a max heap, after the function max_heapify
applied to , will be the root of a max heap.
proof
Note: the right left child and right left child has symmetric proof, we will only discuss the left child.
Assume the left child of the th node is . Since is a max-heap root, it is greater than all of its child nodes. There are two cases:
- : In this case, is already a max heap and we do nothing.
- : In this case, the function
max_heapify
will swap that element. Given that is the root of a max heap, its child nodes are also valid roots for max heaps, which comes back to our initial condition.
After some iterations, case 2 either goes to case 1 or hits the bottom of the tree. When the iteration hits the bottom of the tree, according to our algorithm, the element must be smaller than all the previous elements, and given that all the previous elements are in order, the new binary tree formed by node is a valid max heap.
Initialization: This loop starts from , which is the last element in the second last layer. Notice that the element $vec.len() / 2 $ to are tivially max heap.
Maintenance: In each heap, we construct a max heap with root node as node , By our initialization condition, the child node of are root of a max heap. By Lemma 2.1, after adjustment, should become a new root of a max heap.
Termination: The loop terminates at the root node of the binary tree. At that time, we have applied max_heapify
algorithm to all the nodes, thus every nodes are valid root of max heap, thus the whole tree is a valid max heap.