
1. Foundations

If not specified, all the code is written in the C++23 standard and compiled with g++ 14.1.
command: `g++ -std=c++2b -O0 <file>.cpp -o out && ./out`
Notations:
For a sequence $A$, the notation $A[a, b)$ means the subarray from the $a$-th element to the $(b-1)$-th element. I.e., for $A = [5, 6, 7]$, $A[0, 2)$ is equivalent to $[5, 6]$. $A[a, a)$ contains no element.
All sequences are indexed starting from 0.
$[n]$ means the sequence of integers from 0 to $n$, inclusive.
Algorithms
An algorithm is a well-defined computational procedure that takes some input and returns some output; it can also be viewed as a tool for solving a well-defined computational problem. The problem is generally defined by a desired input-output relationship, and the algorithm describes a specific computational procedure for achieving that relationship. For example, the problem could be: "given a sequence of $n$ numbers $\langle a_0, a_1, \dots, a_{n-1} \rangle$, return a reordering $\langle a'_0, a'_1, \dots, a'_{n-1} \rangle$ of the sequence, where $a'_0 \leq a'_1 \leq \dots \leq a'_{n-1}$".
This is a classical sorting problem. We define an instance of a problem as a concrete input needed to compute a solution to the problem. In this case, an instance of the problem could be the sequence $[3, 1, 2]$, and the corresponding solution would be $[1, 2, 3]$.
We say an algorithm is correct when it halts with the correct output on every input instance; such an algorithm solves the problem.
An algorithm can be represented in many different ways, including a computer program, natural language, or a hardware design, as long as it provides a precise description of the computational procedure to be followed.
In this series of notes, most algorithms will be implemented in C++.
For example, the sorting problem can be solved by the following C++ code (insertion sort):

```cpp
#include <vector>

// Insertion sort: sorts arr in place into ascending order.
void insertionSort(std::vector<int> &arr) {
    for (std::size_t i = 1; i < arr.size(); ++i) {
        int key = arr[i];  // the entry we pick and remember
        std::size_t j = i;
        // Shift every larger element on the left one position to the right.
        while (j > 0 && arr[j - 1] > key) {
            arr[j] = arr[j - 1];
            --j;
        }
        arr[j] = key;  // place the entry in its sorted position
    }
}
```
Here, the input is an integer vector (a list of integers), and the output is the same vector, sorted in place.
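To see the procedure in action, here is a small driver (assuming the insertionSort sketch above):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{5, 2, 4, 6, 1, 3};
    insertionSort(v);
    for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 4 5 6
    std::cout << '\n';
}
```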
We can describe the process with natural language:
- Pick the $i$-th entry in the list and remember it as the key.
- Compare the key to the entry on its left.
- If that entry is greater than the key, shift it right one position, i.e. $A[j] = A[j-1]$.
- Repeat this process, moving one position to the left each time.
- If the entry on the left is less than or equal to the key, stop, and place the key in the position to its right.
Observe that when we move elements, we move the smaller element toward the left of the list, thus growing a sorted portion. In fact, looking at the for loop in the code, at the start of iteration $i$ the first $i$ elements are the elements originally in the array, but sorted. We can state this as a loop invariant:
invariant
At the start of each iteration of the for loop, the subarray $A[0, i)$ consists of the elements originally in $A[0, i)$, but in sorted order.
We use loop invariants to show that an algorithm is correct, and we must show three things about a loop invariant:
- Initialization: It is true prior to the first iteration of the loop.
- Maintenance: If it is true before an iteration of the loop, it remains true before the next iteration.
- Termination: When the loop ends, the invariant gives us a useful property that helps show that the algorithm is correct.
This is very similar to mathematical induction, where we first prove that something works for a base case and then prove the inductive step. The initialization is the base case, and the maintenance is the inductive step to be proved.
The third property is the most important one, since we need the loop invariant to show correctness, and in most cases we use the termination condition together with the loop invariant.
For insertion sort, the three properties of this loop invariant are:
- Initialization: In the code, we start $i$ at 1, so the subarray $A[0, 1)$ contains only one element; it is the original element, and a single element is trivially sorted.
- Maintenance: Our inductive hypothesis is that for a fixed $i = k$, the invariant holds, and we are to prove it for $i = k + 1$. Observe that in each iteration, we move the element at $A[k]$ to the left, and we stop where the element on its left is no greater than it and the element on its right is greater than it. Because by the hypothesis the elements in $A[0, k)$ are sorted, and we insert the new element at the correct place, the subarray $A[0, k + 1)$ is sorted. Thus, we have proven the inductive step.
- Termination: Now we examine what happens when the loop ends. The termination condition of the loop is $i = n$. Based on our inductive step, we conclude that the subarray $A[0, n)$ is sorted, and this is the whole array. Thus, the array is sorted.
Another simple algorithm that serves as an example is linear search. This algorithm takes an array of length $n$ and a target element $t$ as input, and outputs the index where the element is found, or $n$ if it is not present.
This algorithm has the following cpp code:

```cpp
#include <vector>

// Returns the index of the first occurrence of t in arr,
// or arr.size() if t is not present.
int linearSearch(const std::vector<int> &arr, int t) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        if (arr[i] == t) {
            return static_cast<int>(i);
        }
    }
    return static_cast<int>(arr.size());
}
```
We can also use a loop invariant to prove this algorithm correct. As we iterate through the loop, we check whether each element is the target element. If not, we proceed to the next element, and we continue until we find a matching element or reach the end of the array. Thus, we claim the loop invariant is:
invariant
At the start of each iteration of the for loop, the subarray $A[0, i)$ contains no matching element.
We now verify the three properties of the loop invariant:
- Initialization: Our base case is when the loop variable $i$ is 0. In this case, the subarray $A[0, 0)$ contains no element, thus no element we want.
- Maintenance: Our inductive hypothesis is: for a fixed $i = k$, the subarray $A[0, k)$ contains no matching element. We will prove the invariant for $i = k + 1$. In iteration $k$, we compare the $k$-th element with the target. If it is the target element, the loop quits. If it is not, then since by the hypothesis $A[0, k)$ contains no matching element, the subarray $A[0, k + 1)$ contains no element we want either. Thus, we have proven the inductive step.
- Termination: When the loop terminates, it has either reached the end of the array or found a matching element. In the second case, because we have a matching element, we return its position and end the loop. In the first case, by our invariant, the subarray $A[0, n)$ contains no matching element; because this is the whole array, we conclude there is no matching element and return the length of the array. Thus, the algorithm is correct.
Complexity and growth of functions
When we talk about the time complexity of an algorithm, we are talking about the time it takes on large inputs; similarly for space complexity, we study the space it takes (though that is usually not our primary concern). Because we study the complexity as the input size increases without bound, we call this the asymptotic efficiency of algorithms.
The tools we use to describe the complexity of an algorithm are functions from $\mathbb{N}$ to $\mathbb{R}$, though sometimes we expand or restrict the domain to a subset of $\mathbb{N}$ or $\mathbb{R}$. We have the following notations and their definitions:
\begin{align*} \Theta(g(n)) &:= \{f(n) \mid \exists c_1, c_2, n_0 > 0 : 0 \leq c_1 g(n) \leq f(n) \leq c_2 g(n)\ \forall n \geq n_0\}\\ O(g(n)) &:= \{f(n) \mid \exists c, n_0 > 0 : 0 \leq f(n) \leq c g(n)\ \forall n \geq n_0\}\\ \Omega(g(n)) &:= \{f(n) \mid \exists c, n_0 > 0 : 0 \leq c g(n) \leq f(n)\ \forall n \geq n_0\} \end{align*}
In natural language:
- $\Theta(g)$ is an asymptotically tight bound: the cost grows exactly like $g$.
- $O(g)$ is an asymptotic upper bound, commonly used for the worst-case complexity of the algorithm.
- $\Omega(g)$ is an asymptotic lower bound, commonly used for the best-case complexity of the algorithm.
They are therefore called asymptotic notations.
Normally, if $f(n)$ is a polynomial with multiple terms and coefficients, we keep only the highest-order term and ignore the constant coefficients. We can formally justify this using the definition:
By definition, $f(n) \in \Theta(g(n))$ iff there exist $c_1, c_2, n_0$ such that $0 \leq c_1 g(n) \leq f(n) \leq c_2 g(n)$ for all $n \geq n_0$. Here WLOG, let $f(n) = an^2 + bn + c$ with $a > 0$, and we are to show this is equivalent to $f(n) \in \Theta(n^2)$. Substituting into the definition, we have:
$$c_1 n^2 \leq an^2 + bn + c \leq c_2 n^2$$
Dividing by $n^2$, we have:
$$c_1 \leq a + \frac{b}{n} + \frac{c}{n^2} \leq c_2$$
Because as $n$ goes to infinity the lower-order terms become insignificant, constants such as $c_1 = a/2$ and $c_2 = 2a$ satisfy both inequalities for all sufficiently large $n$. Thus we have verified that $an^2 + bn + c = \Theta(n^2)$: we can ignore the lower-order terms.
To transfer this property to $O$ and $\Omega$, we will use this theorem:
For any two functions $f(n)$ and $g(n)$, we have $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.
The proof is simple. By definition, $O$ is the upper bound of the running time, $\Omega$ is the lower bound, and $\Theta$ is the intersection of the upper and lower bounds. Thus, membership in the intersection of the latter two implies membership in the former set and vice versa, so the property applies to the latter two notations as well.
When we use these notations inside math expressions, they stand for some function in the corresponding set; we don't care exactly which one, since we only care about the asymptotic behavior of the function. This is often useful. For example:
$$2n^2 + 3n + 1 = 2n^2 + \Theta(n)$$
This equation is well-defined because we can always find a function inside $\Theta(n)$ to satisfy the equality. Thus, we can greatly simplify our expressions.
Similar to big $O$ and $\Omega$ notations, we have small $o$ and $\omega$ notations, which are the non-asymptotically-tight versions of their big counterparts.
The term asymptotically tight bound means that for functions $f$ and $g$, $g$ is an asymptotically tight bound for $f$ if $c_1 g(n) \leq f(n) \leq c_2 g(n)$ for all $n \geq n_0$, for some constants $c_1, c_2, n_0 > 0$. Intuitively, we can come up with the definition of an upper bound that is not asymptotically tight for functions $f$ and $g$:
\begin{align*} o(g(n)) &:= \{f(n) \mid \forall c > 0, \exists n_0 > 0 : 0 \leq f(n) < c g(n)\ \forall n \geq n_0\} \end{align*}
For example, $2n = o(n^2)$, but $2n^2 \neq o(n^2)$, because as $n$ goes to infinity, $2n^2$ and $n^2$ have the same growth rate and thus differ only by a constant factor. Or, in limit notation:
\begin{align*} f(n) = o(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \end{align*}
The $\omega$ notation likewise represents a not asymptotically tight bound for the $\Omega$ notation, and its definition is below:
\begin{align*} \omega(g(n)) &:= \{f(n) \mid \forall c > 0, \exists n_0 > 0 : 0 \leq c g(n) < f(n)\ \forall n \geq n_0\} \end{align*}
which means $f(n)$ becomes arbitrarily large relative to $g(n)$ for sufficiently large $n$. In limit notation:
\begin{align*} f(n) = \omega(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty \end{align*}
The relations $\Theta$, $O$, $\Omega$ and their small counterparts ($\Theta$ has no small counterpart) are all transitive. But only $\Theta$, $O$, and $\Omega$ are reflexive, and only $\Theta$ is symmetric.
For $O$, $\Omega$ and their small counterparts, we also have the following order-like properties:
We will prove the antisymmetry property here:
proof of antisymmetry for $O$
Let $f, g$ be functions from $\mathbb{N}$ to $\mathbb{R}_{\geq 0}$; we want to show that if $f = O(g)$ and $g = O(f)$, then $f = \Theta(g)$, i.e. $f$ and $g$ are asymptotically the same.
By definition, $f = O(g)$ gives $f(n) \leq c_1 g(n)$, and $g = O(f)$ gives $g(n) \leq c_2 f(n)$, both for all sufficiently large $n$. Together, $\frac{1}{c_2} g(n) \leq f(n) \leq c_1 g(n)$, which is exactly $f = \Theta(g)$.
Thus, we have proven the antisymmetry property as desired.
Thus, the relation $O$ is a partial order (on functions up to $\Theta$-equivalence), and the $\Omega$ notation is also a partial order, which can be proven symmetrically.
For the small $o$ and $\omega$ relations, we will prove asymmetry here:
proof of asymmetry of $o$
Let $f, g$ be functions from $\mathbb{N}$ to $\mathbb{R}_{\geq 0}$; we want to show that if $f = o(g)$, then $g \neq o(f)$.
By definition, $f = o(g)$ implies that $f$ grows strictly slower than $g$, i.e. $\lim_{n \to \infty} f(n)/g(n) = 0$; hence $\lim_{n \to \infty} g(n)/f(n) = \infty$, so $g = \omega(f)$ and therefore $g \neq o(f)$.
Thus, we have proven the asymmetry of $o$ as desired.
Thus, the relation $o$ is a strict partial order, and $\omega$ is a strict partial order, which can be proven symmetrically.
Thus, we can develop an analogy between these notations and comparisons between numbers:
- $f = \Theta(g)$ is like $a = b$
- $f = O(g)$ is like $a \leq b$
- $f = \Omega(g)$ is like $a \geq b$
- $f = o(g)$ is like $a < b$
- $f = \omega(g)$ is like $a > b$
And we say $f$ is asymptotically smaller than $g$ if $f = o(g)$, and $f$ is asymptotically larger than $g$ if $f = \omega(g)$.
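For instance, a few concrete comparisons:
\begin{align*} 3n^2 + n &= \Theta(n^2) && \text{like } a = b\\ n &= O(n^2) \text{ and } n = o(n^2) && \text{like } a \leq b \text{ and } a < b\\ n^2 &= \Omega(n) \text{ and } n^2 = \omega(n) && \text{like } a \geq b \text{ and } a > b \end{align*}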
However, not every pair of functions can be compared this way. If the limit of $f(n)/g(n)$ does not exist (and is not $\infty$), it breaks our definition of the $o$ and $\omega$ relations. And $O$ and $\Omega$ may not work either: because we compare the behavior of functions as $n \to \infty$, a function like $n^{1 + \sin n}$, whose exponent oscillates (since $\sin n$ oscillates between $-1$ and $1$), cannot be compared with, say, $n$.
Divide and conquer
Divide and conquer is an idea used in many algorithms. Typically, a "divide and conquer" algorithm involves the following steps:
- Divide the problem into smaller subproblems that are instances of the same problem.
- Conquer the subproblems recursively using the step above, until a subproblem is small enough to be solved in a straightforward manner.
- Combine the results of the subproblems into the solution of the original problem.
For example, one classical divide-and-conquer algorithm is merge sort, which has the following cpp code:

```cpp
#include <vector>

// Merge the sorted subarrays arr[p, q) and arr[q, r) into one sorted run.
void merge(std::vector<int> &arr, int p, int q, int r) {
    std::vector<int> left(arr.begin() + p, arr.begin() + q);
    std::vector<int> right(arr.begin() + q, arr.begin() + r);
    std::size_t i = 0, j = 0;
    int k = p;
    // Repeatedly take the smaller front element of the two runs.
    while (i < left.size() && j < right.size()) {
        arr[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    // Copy whatever remains of either run.
    while (i < left.size()) arr[k++] = left[i++];
    while (j < right.size()) arr[k++] = right[j++];
}

// Sort arr[p, r) by recursively sorting both halves and merging them.
void mergeSort(std::vector<int> &arr, int p, int r) {
    if (r - p <= 1) return;   // base case: 0 or 1 element is already sorted
    int q = p + (r - p) / 2;  // divide at the midpoint
    mergeSort(arr, p, q);     // conquer the left half
    mergeSort(arr, q, r);     // conquer the right half
    merge(arr, p, q, r);      // combine the sorted halves
}
```
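A small driver for the sketch above (mergeSort sorts the half-open interval $[0, n)$):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{5, 2, 4, 7, 1, 3, 2, 6};
    mergeSort(v, 0, static_cast<int>(v.size()));
    for (int x : v) std::cout << x << ' ';  // prints: 1 2 2 3 4 5 6 7
    std::cout << '\n';
}
```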
To describe this algorithm using the steps we provided above, we have:
- Divide the array into two subarrays: one stores the first half of the original array and the other stores the second half.
- Conquer: keep dividing until a subarray contains only one element; an array with only one element is already sorted.
- Combine: the subarrays are then combined by merging the two sorted arrays back into the correct positions.
When we divide the array into subarrays, we call the subarrays that cannot be sorted trivially the recursive case; when we reach a subarray that can be sorted trivially, i.e. it cannot be divided further, we call this the base case. When we reach the base case, we stop the recursion and return.
To calculate the complexity of a divide-and-conquer algorithm, we introduce a tool called a recurrence. A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs.
For example, the recurrence that describes the worst case running time of merge sort we introduced earlier is:
\begin{align*} T(n) &= O(1) && \text{for } n = 1 \\ T(n) &= 2T\left(\frac{n}{2}\right) + O(n) && \text{for } n > 1 \end{align*}
Observe that when $n = 1$, the array can be sorted trivially, so the time complexity is $O(1)$. For the larger case, we divide the array in half, which corresponds to the $2T(n/2)$ term. After that we merge the subarrays back, which takes linear time and corresponds to the $O(n)$ term.
Notice that we can divide the original problem into different sizes. For example, a recursive formulation of linear search divides the array into a 1-element subarray and an $(n-1)$-element subarray, so the recurrence for the recursive case is:
\begin{align*} T(n) &= T(n - 1) + O(1) \end{align*}
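A minimal recursive sketch matching that recurrence (the name linearSearchRec and the half-open interval convention are assumptions):

```cpp
#include <vector>

// Checks arr[begin] (the 1-element part), then recurses on the
// remaining (n - 1)-element subarray arr[begin + 1, arr.size()).
int linearSearchRec(const std::vector<int> &arr, int t, std::size_t begin = 0) {
    if (begin == arr.size()) return static_cast<int>(arr.size());  // base case: not found
    if (arr[begin] == t) return static_cast<int>(begin);           // O(1) check
    return linearSearchRec(arr, t, begin + 1);                     // T(n - 1)
}
```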
We have several ways to solve recurrences:
- substitution method: we guess a bound and use mathematical induction to prove the guess correct.
- recursion-tree method: we convert the recurrence into a tree whose nodes represent the costs of the recursive cases, then use techniques for bounding summations to solve the recurrence.
- master method: in this method, we find the bound of a recurrence of the form:
\begin{align*} T(n) &= aT\left(\frac{n}{b}\right) + f(n) \end{align*}
where $a \geq 1$, $b > 1$, and $f(n)$ is a given function. This formula characterizes a divide-and-conquer algorithm that creates $a$ subproblems, each $1/b$ the size of the original problem, with a combination step that takes $f(n)$ time. For merge sort, for example, $a = 2$, $b = 2$, and $f(n) = O(n)$.
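For reference, this is the standard statement of the master theorem's three cases (quoted from the usual formulation, not derived in these notes):
\begin{align*} f(n) = O(n^{\log_b a - \epsilon}) \text{ for some } \epsilon > 0 &\implies T(n) = \Theta(n^{\log_b a})\\ f(n) = \Theta(n^{\log_b a}) &\implies T(n) = \Theta(n^{\log_b a} \log n)\\ f(n) = \Omega(n^{\log_b a + \epsilon}) \text{ for some } \epsilon > 0 \text{ and } a f(n/b) \leq c f(n) \text{ for some } c < 1 &\implies T(n) = \Theta(f(n)) \end{align*}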
Another problem that admits a divide-and-conquer algorithm is the "maximum-subarray problem". In one framing, you are given an array of $n$ values over time, and you must find two positions, an earlier one and a later one, such that the difference (second value minus first value) is the greatest.
We can certainly use brute force to find the solution, but that requires checking all $\binom{n}{2}$ pairs of positions, which has time complexity $\Theta(n^2)$.
Alternatively, if we create an array that stores the differences between adjacent elements, we can use it to compute any such difference. For example, for the sequence:
$$A = [10, 11, 7, 10, 6]$$
The difference sequence is:
$$D = [1, -4, 3, -4]$$
And the sum of the subarray $D[i, j)$ is the difference between $A[j]$ and $A[i]$.
proof
Let $A$ be the original array and $D$ be the array such that $D[i] = A[i+1] - A[i]$. We want to show that $\sum_{k=i}^{j-1} D[k] = A[j] - A[i]$.
By definition, $\sum_{k=i}^{j-1} D[k] = \sum_{k=i}^{j-1} (A[k+1] - A[k]) = A[j] - A[i]$, since the sum telescopes. As desired.
By doing so, we will be able to solve this problem using divide-and-conquer.
If we define the middle position of the array as $mid$, there are only three cases for the location of the maximum subarray:
- The maximum subarray is entirely in the subarray $D[begin, mid)$.
- The maximum subarray is entirely in the subarray $D[mid, end)$.
- The maximum subarray crosses the midpoint.
Notice that in the first case and the second case, we have the same situation as the original problem, which is finding the maximum-subarray inside an array, thus we are dividing the problem into smaller instances.
In the third case, if the maximum subarray lies across the two halves, we can find its two sections in linear time, because we know one end of each section is pinned at the midpoint. The cpp code is below.
To better organize our code, we first define a struct to store our result:
```cpp
struct maxArrayResult {
    int begin;  // inclusive start index of the subarray (in the difference array)
    int end;    // inclusive end index of the subarray (in the difference array)
    int sum;    // sum of that subarray
};
```
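The crossing case scans left and right from the midpoint; this is a sketch, and the name findMaxCrossing is an assumption:

```cpp
#include <limits>
#include <vector>

// Finds the maximum subarray of arr[begin, end) that crosses mid:
// the best suffix of arr[begin, mid) plus the best prefix of arr[mid, end).
maxArrayResult findMaxCrossing(const std::vector<int> &arr, int begin, int mid, int end) {
    int leftSum = std::numeric_limits<int>::min();
    int sum = 0;
    int leftIndex = mid - 1;
    for (int i = mid - 1; i >= begin; --i) {  // grow the suffix leftward
        sum += arr[i];
        if (sum > leftSum) { leftSum = sum; leftIndex = i; }
    }
    int rightSum = std::numeric_limits<int>::min();
    sum = 0;
    int rightIndex = mid;
    for (int j = mid; j < end; ++j) {  // grow the prefix rightward
        sum += arr[j];
        if (sum > rightSum) { rightSum = sum; rightIndex = j; }
    }
    return {leftIndex, rightIndex, leftSum + rightSum};
}
```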
Now we have handled all three cases, and we can write the code for finding the maximum subarray. The workflow is below:
- check if the array contains only one element
- if so, return that element: it is the maximum subarray in this interval
- if not, find the midpoint
- recursively handle the left part and the right part of the array, finding the maximum subarray on each side; also handle the crossing case
- compare the three results and return the one with the largest sum, together with its positions
This procedure has the following cpp code:
```cpp
// Finds the maximum subarray of the difference array arr over [begin, end).
maxArrayResult maximumArray(const std::vector<int> &arr, int begin, int end) {
    // Base case: the interval holds a single element.
    if (end - begin == 1) {
        return {begin, begin, arr[begin]};
    }
    int mid = begin + (end - begin) / 2;
    maxArrayResult left = maximumArray(arr, begin, mid);           // case 1
    maxArrayResult right = maximumArray(arr, mid, end);            // case 2
    maxArrayResult cross = findMaxCrossing(arr, begin, mid, end);  // case 3
    // Return whichever candidate has the largest sum.
    if (left.sum >= right.sum && left.sum >= cross.sum) return left;
    if (right.sum >= left.sum && right.sum >= cross.sum) return right;
    return cross;
}
```
makeDifferenceArray is a utility function we use to convert the original array into the difference array; a sketch of it follows below.
Because we find the maximum sum over the difference array, where the subarray $D[i..j]$ (inclusive) sums to $A[j+1] - A[i]$, to recover the two points in the original array we have to add 1 to the end index.
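Since makeDifferenceArray is only mentioned, here is a minimal sketch of it and of tying the pieces together (the driver is an assumption, not part of the original code):

```cpp
#include <iostream>
#include <vector>

// Builds D where D[i] = A[i + 1] - A[i]; D has one fewer element than A.
// Assumes a contains at least two elements.
std::vector<int> makeDifferenceArray(const std::vector<int> &a) {
    std::vector<int> d(a.size() - 1);
    for (std::size_t i = 0; i + 1 < a.size(); ++i) d[i] = a[i + 1] - a[i];
    return d;
}

int main() {
    std::vector<int> prices{10, 11, 7, 10, 6};
    std::vector<int> diff = makeDifferenceArray(prices);
    maxArrayResult r = maximumArray(diff, 0, static_cast<int>(diff.size()));
    // First point at index r.begin, second at index r.end + 1 of the original array.
    std::cout << "first point " << r.begin << ", second point " << r.end + 1
              << ", difference " << r.sum << '\n';  // first point 2, second point 3, difference 3
}
```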
Now we analyze the time complexity of this algorithm using the master method.
First, consider the base case: when there is only one element in the interval, finding the maximum subarray takes $O(1)$ time. In each recursive case we divide the array in half, producing 2 subproblems, and we also check whether the maximum subarray crosses the midpoint, which takes $O(n)$ time. Thus, if we denote the running time $T(n)$, the recursive case costs $2T(n/2) + O(n)$.
Thus, the recurrence of this algorithm is:
\begin{align*} T(n) &= O(1) && \text{for } n = 1\\ T(n) &= 2T(n/2) + O(n) && \text{for } n > 1 \end{align*}
This is the same recurrence as merge sort, and by case 2 of the master theorem it solves to $T(n) = O(n \log n)$.
Probabilistic analysis and randomized algorithms
Consider this problem:
You want to hire an assistant to replace your current one. Interviewing a candidate costs $c_i$, and because you must fire your current assistant, each hire additionally costs $c_h$, where $c_h > c_i$. For this problem, you come up with this strategy:
- Remember the evaluation of your current assistant as the best candidate so far.
- Iterate over all the candidates you have:
- interview the candidate;
- if the candidate is better than the best candidate so far, make this candidate the best candidate and hire them;
- otherwise, keep interviewing.
In this case, we only care about how much the whole procedure will cost. It's simple to come up with the worst-case scenario: the candidates arrive in increasing order of quality, since then we hire a new assistant after every single interview, for a total hiring cost of $O(c_h n)$.
But we know the input will not always be the worst case, so what about other cases? Can we come up with a so-called average-case running cost? The answer is yes: in most cases, when we know or can assume the distribution of the inputs, we can use the method of probabilistic analysis (here we can assume the candidates come in random order). To properly define this randomness, we say that all possible permutations of the rankings have equal probability of appearing; equivalently, the rankings form a uniform random permutation.
In this case, we can use an RNG to enforce a uniform random permutation. In general, we call an algorithm randomized if its behavior is determined not only by its inputs but also by a random-number generator, and we will use $\text{RANDOM}(a, b)$ to denote a random integer in the range $[a, b]$.
A brief review of random variables
A random variable is a function from the sample space to $\mathbb{R}$, and an indicator random variable $X_A$ associated with an event $A$ outputs 1 when $A$ happens and 0 when $A$ doesn't happen.
The expected value of a random variable is the sum, over all outcomes, of the probability of the outcome times the value of the random variable at that outcome: $E[X] = \sum_x x \cdot P(X = x)$.
The expected value of an indicator random variable is the probability of the event $A$: $E[X_A] = P(A)$. For example, if $X_H$ indicates that a fair coin lands heads, then $E[X_H] = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = \frac{1}{2} = P(\text{heads})$.
To analyze the problem with random variables, let $X$ be the number of times we hire a new assistant. The expected value of this random variable is:
\begin{align*} E[X] &= \sum_{x=1}^{n} x \cdot P(X = x) \end{align*}
Calculating this directly is awkward, so we use a change of variable. Suppose we have indicator random variables $X_j$ for whether we hire the $j$-th candidate. Since we either hire or don't hire each candidate, we can rewrite the random variable as the sum of these indicators:
X = \sum_{j=1}^{n} X_j
Candidate $j$ is hired exactly when it is the best of the first $j$ candidates, which under a uniform random permutation happens with probability $1/j$, so $E[X_j] = 1/j$ and the expected value can be calculated this way:
\begin{align*} E[X] = E\left[\sum_{j=1}^{n} X_j\right] = \sum_{j=1}^{n} E[X_j] = \sum_{j=1}^{n} \frac{1}{j} = \ln(n) + O(1) \end{align*}
Thus, although we interview all $n$ people (an unavoidable $O(c_i n)$ interview cost), we expect to hire only about $\ln n$ of them, so our procedure has an average hiring cost of $O(c_h \ln n)$.
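A quick simulation to sanity-check the $\ln n$ estimate (everything here, names included, is a sketch, not part of the original procedure):

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const int n = 1000, trials = 10000;
    std::mt19937 rng(std::random_device{}());
    std::vector<int> rank(n);
    std::iota(rank.begin(), rank.end(), 0);  // candidate ranks 0..n-1

    long long totalHires = 0;
    for (int t = 0; t < trials; ++t) {
        std::shuffle(rank.begin(), rank.end(), rng);  // uniform random permutation
        int best = -1;
        for (int r : rank) {
            if (r > best) { best = r; ++totalHires; }  // hire when strictly better
        }
    }
    // Average hires is the harmonic number H_n, roughly ln(n) + 0.58.
    std::cout << "average hires: " << static_cast<double>(totalHires) / trials
              << ", ln(n): " << std::log(n) << '\n';
}
```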
However, in many cases (including this one), we don't control the sequence in which the candidates arrive, so we cannot make assumptions about how the input will be given. To actually make the data we iterate over a uniform random permutation, we use a random number generator to permute the input into a new random order. Thus, there is no longer any so-called "good" input or "bad" input (like candidates presented in increasing order).
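A sketch of such a randomizing step (a Fisher-Yates shuffle; the randomizeInPlace name and the use of std::uniform_int_distribution in place of the $\text{RANDOM}(a, b)$ primitive are assumptions):

```cpp
#include <random>
#include <vector>

// Permutes arr into a uniform random order: for each position i,
// swap arr[i] with a uniformly chosen element of arr[i, n).
void randomizeInPlace(std::vector<int> &arr, std::mt19937 &rng) {
    const std::size_t n = arr.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::uniform_int_distribution<std::size_t> dist(i, n - 1);  // plays the role of RANDOM(i, n - 1)
        std::swap(arr[i], arr[dist(rng)]);
    }
}
```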