
1. Foundations

If not specified, all the code is written in the C++23 standard and compiled with g++ 14.1.
command: `g++ -std=c++2b -O0 <file>.cpp -o out && ./out`
Notations:
For a sequence $A$, the notation $A[a, b)$ means the subarray from the $a$-th element to the $(b-1)$-th element. I.e., for $A = [5, 6, 7]$, $A[0, 2)$ is equivalent to $[5, 6]$. $A[a, a)$ contains no element.
All sequences are indexed starting from 0.
$[n]$ means the sequence of integers from 0 to $n$, inclusive.
Algorithms
An algorithm is a well-defined computational procedure that takes some input and returns some output; it can also be viewed as a tool for solving a well-defined computational problem. The problem is generally defined by a desired input-output relationship, and the algorithm describes a specific computational procedure for achieving that relationship. For example, the problem could be: "given a sequence of $n$ numbers $\langle a_0, a_1, \dots, a_{n-1} \rangle$, return a reordering $\langle a'_0, a'_1, \dots, a'_{n-1} \rangle$ of the sequence, where $a'_0 \leq a'_1 \leq \dots \leq a'_{n-1}$".
This is a classical sorting problem. We define an instance of a problem as a concrete input needed to compute a solution to the problem. In this case, an instance of the problem could be the sequence $[3, 1, 2]$, and the corresponding solution would be $[1, 2, 3]$.
We say an algorithm is correct when it halts with the correct output on every input instance; such an algorithm solves the problem.
An algorithm can be represented in many different ways, including a computer program, natural language, or a hardware design, as long as it provides a precise description of the computational procedure to be followed.
In this series of notes, most algorithms will be implemented in C++.
For example, the sorting problem can be solved by the following C++ code (insertion sort):

```cpp
#include <vector>

// Insertion sort: sorts arr in place into ascending order.
void insertionSort(std::vector<int> &arr) {
    for (std::size_t i = 1; i < arr.size(); ++i) {
        int key = arr[i];  // the entry we pick and remember
        std::size_t j = i;
        // Shift every larger element on the left one position to the right.
        while (j > 0 && arr[j - 1] > key) {
            arr[j] = arr[j - 1];
            --j;
        }
        arr[j] = key;  // place the entry in its sorted position
    }
}
```
Here, the input is an integer vector (a list of integers), and the output is the same vector, sorted in place.
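To see the procedure in action, here is a small driver (assuming the insertionSort sketch above):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{5, 2, 4, 6, 1, 3};
    insertionSort(v);
    for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 4 5 6
    std::cout << '\n';
}
```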
We can describe the process with natural language:
- Pick the $i$-th entry in the list and remember it as the key.
- Compare the key to the entry on its left.
- If that entry is greater than the key, shift it right one position, i.e. $A[j] = A[j-1]$.
- Repeat this process, moving one position to the left each time.
- If the entry on the left is less than or equal to the key, stop, and place the key in the position to its right.
Observe that when we move elements, we move the smaller element toward the left of the list, thus growing a sorted portion. In fact, looking at the for loop in the code, at the start of iteration $i$ the first $i$ elements are the elements originally in the array, but sorted. We can state this as a loop invariant:
invariant
At the start of each iteration of the for loop, the subarray $A[0, i)$ consists of the elements originally in $A[0, i)$, but in sorted order.
We use loop invariants to show that an algorithm is correct, and we must show three things about a loop invariant:
- Initialization: It is true prior to the first iteration of the loop.
- Maintenance: If it is true before an iteration of the loop, it remains true before the next iteration.
- Termination: When the loop ends, the invariant gives us a useful property that helps show that the algorithm is correct.
This is very similar to mathematical induction, where we first prove that something works for a base case and then prove the inductive step. The initialization is the base case, and the maintenance is the inductive step to be proved.
The third property is the most important one, since we need the loop invariant to show correctness, and in most cases we use the termination condition together with the loop invariant.
For insertion sort, the three properties of this loop invariant are:
- Initialization: In the code, we start $i$ at 1, so the subarray $A[0, 1)$ contains only one element; it is the original element, and a single element is trivially sorted.
- Maintenance: Our inductive hypothesis is that for a fixed $i = k$, the invariant holds, and we are to prove it for $i = k + 1$. Observe that in each iteration, we move the element at $A[k]$ to the left, and we stop where the element on its left is no greater than it and the element on its right is greater than it. Because by the hypothesis the elements in $A[0, k)$ are sorted, and we insert the new element at the correct place, the subarray $A[0, k + 1)$ is sorted. Thus, we have proven the inductive step.
- Termination: Now we examine what happens when the loop ends. The termination condition of the loop is $i = n$. Based on our inductive step, we conclude that the subarray $A[0, n)$ is sorted, and this is the whole array. Thus, the array is sorted.
Another simple algorithm that serves as an example is linear search. This algorithm takes an array of length $n$ and a target element $t$ as input, and outputs the index where the element is found, or $n$ if it is not present.
This algorithm has the following cpp code:

```cpp
#include <vector>

// Returns the index of the first occurrence of t in arr,
// or arr.size() if t is not present.
int linearSearch(const std::vector<int> &arr, int t) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        if (arr[i] == t) {
            return static_cast<int>(i);
        }
    }
    return static_cast<int>(arr.size());
}
```
We can also use a loop invariant to prove this algorithm correct. As we iterate through the loop, we check whether each element is the target element. If not, we proceed to the next element, and we continue until we find a matching element or reach the end of the array. Thus, we claim the loop invariant is:
invariant
At the start of each iteration of the for loop, the subarray $A[0, i)$ contains no matching element.
We now verify the three properties of the loop invariant:
- Initialization: Our base case is when the loop variable $i$ is 0. In this case, the subarray $A[0, 0)$ contains no element, thus no element we want.
- Maintenance: Our inductive hypothesis is: for a fixed $i = k$, the subarray $A[0, k)$ contains no matching element. We will prove the invariant for $i = k + 1$. In iteration $k$, we compare the $k$-th element with the target. If it is the target element, the loop quits. If it is not, then since by the hypothesis $A[0, k)$ contains no matching element, the subarray $A[0, k + 1)$ contains no element we want either. Thus, we have proven the inductive step.
- Termination: When the loop terminates, it has either reached the end of the array or found a matching element. In the second case, because we have a matching element, we return its position and end the loop. In the first case, by our invariant, the subarray $A[0, n)$ contains no matching element; because this is the whole array, we conclude there is no matching element and return the length of the array. Thus, the algorithm is correct.
Complexity and growth of functions
When we talk about the time complexity of an algorithm, we are talking about the time it takes on large inputs; similarly for space complexity, we study the space it takes (though that is usually not our primary concern). Because we study the complexity as the input size increases without bound, we call this the asymptotic efficiency of algorithms.
The tools we use to describe the complexity of an algorithm are functions from $\mathbb{N}$ to $\mathbb{R}$, though sometimes we expand or restrict the domain to a subset of $\mathbb{N}$ or $\mathbb{R}$. We have the following notations and their definitions:
\begin{align*} \Theta(g(n)) &:= \{f(n) \mid \exists c_1, c_2, n_0 > 0 : 0 \leq c_1 g(n) \leq f(n) \leq c_2 g(n)\ \forall n \geq n_0\}\\ O(g(n)) &:= \{f(n) \mid \exists c, n_0 > 0 : 0 \leq f(n) \leq c g(n)\ \forall n \geq n_0\}\\ \Omega(g(n)) &:= \{f(n) \mid \exists c, n_0 > 0 : 0 \leq c g(n) \leq f(n)\ \forall n \geq n_0\} \end{align*}
In natural language:
- $\Theta(g)$ is an asymptotically tight bound: the cost grows exactly like $g$.
- $O(g)$ is an asymptotic upper bound, commonly used for the worst-case complexity of the algorithm.
- $\Omega(g)$ is an asymptotic lower bound, commonly used for the best-case complexity of the algorithm.
They are therefore called asymptotic notations.
Normally, if $f(n)$ is a polynomial with multiple terms and coefficients, we keep only the highest-order term and ignore the constant coefficients. We can formally justify this using the definition:
By definition, $f(n) \in \Theta(g(n))$ iff there exist $c_1, c_2, n_0$ such that $0 \leq c_1 g(n) \leq f(n) \leq c_2 g(n)$ for all $n \geq n_0$. Here WLOG, let $f(n) = an^2 + bn + c$ with $a > 0$, and we are to show this is equivalent to $f(n) \in \Theta(n^2)$. Substituting into the definition, we have:
$$c_1 n^2 \leq an^2 + bn + c \leq c_2 n^2$$
Dividing by $n^2$, we have:
$$c_1 \leq a + \frac{b}{n} + \frac{c}{n^2} \leq c_2$$
Because as $n$ goes to infinity the lower-order terms become insignificant, constants such as $c_1 = a/2$ and $c_2 = 2a$ satisfy both inequalities for all sufficiently large $n$. Thus we have verified that $an^2 + bn + c = \Theta(n^2)$: we can ignore the lower-order terms.
To transfer this property to $O$ and $\Omega$, we will use this theorem:
For any two functions $f(n)$ and $g(n)$, we have $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.
The proof is simple. By definition, $O$ is the upper bound of the running time, $\Omega$ is the lower bound, and $\Theta$ is the intersection of the upper and lower bounds. Thus, membership in the intersection of the latter two implies membership in the former set and vice versa, so the property applies to the latter two notations as well.
When we use these notations inside math expressions, they stand for some function in the corresponding set; we don't care exactly which one, since we only care about the asymptotic behavior of the function. This is often useful. For example:
$$2n^2 + 3n + 1 = 2n^2 + \Theta(n)$$
This equation is well-defined because we can always find a function inside $\Theta(n)$ to satisfy the equality. Thus, we can greatly simplify our expressions.
Similar to big $O$ and $\Omega$ notations, we have small $o$ and $\omega$ notations, which are the non-asymptotically-tight versions of their big counterparts.
The term asymptotically tight bound means that for functions $f$ and $g$, $g$ is an asymptotically tight bound for $f$ if $c_1 g(n) \leq f(n) \leq c_2 g(n)$ for all $n \geq n_0$, for some constants $c_1, c_2, n_0 > 0$. Intuitively, we can come up with the definition of an upper bound that is not asymptotically tight for functions $f$ and $g$:
\begin{align*} o(g(n)) &:= \{f(n) \mid \forall c > 0, \exists n_0 > 0 : 0 \leq f(n) < c g(n)\ \forall n \geq n_0\} \end{align*}
For example, $2n = o(n^2)$, but $2n^2 \neq o(n^2)$, because as $n$ goes to infinity, $2n^2$ and $n^2$ have the same growth rate and thus differ only by a constant factor. Or, in limit notation:
\begin{align*} f(n) = o(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \end{align*}
The $\omega$ notation likewise represents a not asymptotically tight bound for the $\Omega$ notation, and its definition is below:
\begin{align*} \omega(g(n)) &:= \{f(n) \mid \forall c > 0, \exists n_0 > 0 : 0 \leq c g(n) < f(n)\ \forall n \geq n_0\} \end{align*}
which means $f(n)$ becomes arbitrarily large relative to $g(n)$ for sufficiently large $n$. In limit notation:
\begin{align*} f(n) = \omega(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty \end{align*}
The relations $\Theta$, $O$, $\Omega$ and their small counterparts ($\Theta$ has no small counterpart) are all transitive. But only $\Theta$, $O$, and $\Omega$ are reflexive, and only $\Theta$ is symmetric.
For $O$, $\Omega$ and their small counterparts, we also have the following order-like properties:
We will prove the antisymmetry property here:
proof of antisymmetry for $O$
Let $f, g$ be functions from $\mathbb{N}$ to $\mathbb{R}_{\geq 0}$; we want to show that if $f = O(g)$ and $g = O(f)$, then $f = \Theta(g)$, i.e. $f$ and $g$ are asymptotically the same.
By definition, $f = O(g)$ gives $f(n) \leq c_1 g(n)$, and $g = O(f)$ gives $g(n) \leq c_2 f(n)$, both for all sufficiently large $n$. Together, $\frac{1}{c_2} g(n) \leq f(n) \leq c_1 g(n)$, which is exactly $f = \Theta(g)$.
Thus, we have proven the antisymmetry property as desired.
Thus, the relation $O$ is a partial order (on functions up to $\Theta$-equivalence), and the $\Omega$ notation is also a partial order, which can be proven symmetrically.
For the small $o$ and $\omega$ relations, we will prove asymmetry here:
proof of asymmetry of $o$
Let $f, g$ be functions from $\mathbb{N}$ to $\mathbb{R}_{\geq 0}$; we want to show that if $f = o(g)$, then $g \neq o(f)$.
By definition, $f = o(g)$ implies that $f$ grows strictly slower than $g$, i.e. $\lim_{n \to \infty} f(n)/g(n) = 0$; hence $\lim_{n \to \infty} g(n)/f(n) = \infty$, so $g = \omega(f)$ and therefore $g \neq o(f)$.
Thus, we have proven the asymmetry of $o$ as desired.
Thus, the relation $o$ is a strict partial order, and $\omega$ is a strict partial order, which can be proven symmetrically.
Thus, we can develop an analogy between these notations and comparisons between numbers:
- $f = \Theta(g)$ is like $a = b$
- $f = O(g)$ is like $a \leq b$
- $f = \Omega(g)$ is like $a \geq b$
- $f = o(g)$ is like $a < b$
- $f = \omega(g)$ is like $a > b$
And we say $f$ is asymptotically smaller than $g$ if $f = o(g)$, and $f$ is asymptotically larger than $g$ if $f = \omega(g)$.
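For instance, a few concrete comparisons:
\begin{align*} 3n^2 + n &= \Theta(n^2) && \text{like } a = b\\ n &= O(n^2) \text{ and } n = o(n^2) && \text{like } a \leq b \text{ and } a < b\\ n^2 &= \Omega(n) \text{ and } n^2 = \omega(n) && \text{like } a \geq b \text{ and } a > b \end{align*}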
However, not every pair of functions can be compared this way. If the limit of $f(n)/g(n)$ does not exist (and is not $\infty$), it breaks our definition of the $o$ and $\omega$ relations. And $O$ and $\Omega$ may not work either: because we compare the behavior of functions as $n \to \infty$, a function like $n^{1 + \sin n}$, whose exponent oscillates (since $\sin n$ oscillates between $-1$ and $1$), cannot be compared with, say, $n$.
Divide and conquer
Divide and conquer is an idea used in many algorithms. Typically, a "divide and conquer" algorithm involves the following steps:
- Divide the problem into smaller subproblems that are instances of the same problem.
- Conquer the subproblems recursively using the step above, until a subproblem is small enough to be solved in a straightforward manner.
- Combine the results of the subproblems into the solution of the original problem.
For example, one classical divide-and-conquer algorithm is merge sort, which has the following cpp code:

```cpp
#include <vector>

// Merge the sorted subarrays arr[p, q) and arr[q, r) into one sorted run.
void merge(std::vector<int> &arr, int p, int q, int r) {
    std::vector<int> left(arr.begin() + p, arr.begin() + q);
    std::vector<int> right(arr.begin() + q, arr.begin() + r);
    std::size_t i = 0, j = 0;
    int k = p;
    // Repeatedly take the smaller front element of the two runs.
    while (i < left.size() && j < right.size()) {
        arr[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    // Copy whatever remains of either run.
    while (i < left.size()) arr[k++] = left[i++];
    while (j < right.size()) arr[k++] = right[j++];
}

// Sort arr[p, r) by recursively sorting both halves and merging them.
void mergeSort(std::vector<int> &arr, int p, int r) {
    if (r - p <= 1) return;   // base case: 0 or 1 element is already sorted
    int q = p + (r - p) / 2;  // divide at the midpoint
    mergeSort(arr, p, q);     // conquer the left half
    mergeSort(arr, q, r);     // conquer the right half
    merge(arr, p, q, r);      // combine the sorted halves
}
```
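A small driver for the sketch above (mergeSort sorts the half-open interval $[0, n)$):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{5, 2, 4, 7, 1, 3, 2, 6};
    mergeSort(v, 0, static_cast<int>(v.size()));
    for (int x : v) std::cout << x << ' ';  // prints: 1 2 2 3 4 5 6 7
    std::cout << '\n';
}
```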
To describe this algorithm using the steps we provided above, we have:
- Divide the array into two subarrays: one stores the first half of the original array and the other stores the second half.
- Conquer: keep dividing until a subarray contains only one element; an array with only one element is already sorted.
- Combine: the subarrays are then combined by merging the two sorted arrays back into the correct positions.
When we divide the array into subarrays, we call the subarrays that cannot be sorted trivially the recursive case; when we reach a subarray that can be sorted trivially, i.e. it cannot be divided further, we call this the base case. When we reach the base case, we stop the recursion and return.
To calculate the complexity of a divide-and-conquer algorithm, we introduce a tool called a recurrence. A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs.
For example, the recurrence that describes the worst case running time of merge sort we introduced earlier is:
\begin{align*} T(n) &= O(1) && \text{for } n = 1 \\ T(n) &= 2T\left(\frac{n}{2}\right) + O(n) && \text{for } n > 1 \end{align*}
Observe that when $n = 1$, the array can be sorted trivially, so the time complexity is $O(1)$. For the larger case, we divide the array in half, which corresponds to the $2T(n/2)$ term. After that we merge the subarrays back, which takes linear time and corresponds to the $O(n)$ term.
Notice that we can divide the original problem into different sizes. For example, a recursive formulation of linear search divides the array into a 1-element subarray and an $(n-1)$-element subarray, so the recurrence for the recursive case is:
\begin{align*} T(n) &= T(n - 1) + O(1) \end{align*}
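A minimal recursive sketch matching that recurrence (the name linearSearchRec and the half-open interval convention are assumptions):

```cpp
#include <vector>

// Checks arr[begin] (the 1-element part), then recurses on the
// remaining (n - 1)-element subarray arr[begin + 1, arr.size()).
int linearSearchRec(const std::vector<int> &arr, int t, std::size_t begin = 0) {
    if (begin == arr.size()) return static_cast<int>(arr.size());  // base case: not found
    if (arr[begin] == t) return static_cast<int>(begin);           // O(1) check
    return linearSearchRec(arr, t, begin + 1);                     // T(n - 1)
}
```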
We have several ways to solve recurrences:
- substitution method: we guess a bound and use mathematical induction to prove the guess correct.
- recursion-tree method: we convert the recurrence into a tree whose nodes represent the costs of the recursive cases, then use techniques for bounding summations to solve the recurrence.
- master method: in this method, we find the bound of a recurrence of the form:
\begin{align*} T(n) &= aT\left(\frac{n}{b}\right) + f(n) \end{align*}
where $a \geq 1$, $b > 1$, and $f(n)$ is a given function. This formula characterizes a divide-and-conquer algorithm that creates $a$ subproblems, each $1/b$ the size of the original problem, with a combination step that takes $f(n)$ time. For merge sort, for example, $a = 2$, $b = 2$, and $f(n) = O(n)$.
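For reference, this is the standard statement of the master theorem's three cases (quoted from the usual formulation, not derived in these notes):
\begin{align*} f(n) = O(n^{\log_b a - \epsilon}) \text{ for some } \epsilon > 0 &\implies T(n) = \Theta(n^{\log_b a})\\ f(n) = \Theta(n^{\log_b a}) &\implies T(n) = \Theta(n^{\log_b a} \log n)\\ f(n) = \Omega(n^{\log_b a + \epsilon}) \text{ for some } \epsilon > 0 \text{ and } a f(n/b) \leq c f(n) \text{ for some } c < 1 &\implies T(n) = \Theta(f(n)) \end{align*}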
Another problem that admits a divide-and-conquer algorithm is the "maximum-subarray problem". In one framing, you are given an array of $n$ values over time, and you must find two positions, an earlier one and a later one, such that the difference (second value minus first value) is the greatest.
We can certainly use brute force to find the solution, but that requires checking all $\binom{n}{2}$ pairs of positions, which has time complexity $\Theta(n^2)$.
Alternatively, if we create an array that stores the differences between adjacent elements, we can use it to compute any such difference. For example, for the sequence:
$$A = [10, 11, 7, 10, 6]$$
The difference sequence is:
$$D = [1, -4, 3, -4]$$
And the sum of the subarray $D[i, j)$ is the difference between $A[j]$ and $A[i]$.
proof
Let $A$ be the original array and $D$ be the array such that $D[i] = A[i+1] - A[i]$. We want to show that $\sum_{k=i}^{j-1} D[k] = A[j] - A[i]$.
By definition, $\sum_{k=i}^{j-1} D[k] = \sum_{k=i}^{j-1} (A[k+1] - A[k]) = A[j] - A[i]$, since the sum telescopes. As desired.
By doing so, we will be able to solve this problem using divide-and-conquer.
If we define the middle position of the array as $mid$, there are only three cases for the location of the maximum subarray:
- The maximum subarray is entirely in the subarray $D[begin, mid)$.
- The maximum subarray is entirely in the subarray $D[mid, end)$.
- The maximum subarray crosses the midpoint.
Notice that in the first case and the second case, we have the same situation as the original problem, which is finding the maximum-subarray inside an array, thus we are dividing the problem into smaller instances.
In the third case, if the maximum subarray lies across the two halves, we can find its two sections in linear time, because we know one end of each section is pinned at the midpoint. The cpp code is below.
To better organize our code, we first define a struct to store our result:
```cpp
struct maxArrayResult {
    int begin;  // inclusive start index of the subarray (in the difference array)
    int end;    // inclusive end index of the subarray (in the difference array)
    int sum;    // sum of that subarray
};
```
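The crossing case scans left and right from the midpoint; this is a sketch, and the name findMaxCrossing is an assumption:

```cpp
#include <limits>
#include <vector>

// Finds the maximum subarray of arr[begin, end) that crosses mid:
// the best suffix of arr[begin, mid) plus the best prefix of arr[mid, end).
maxArrayResult findMaxCrossing(const std::vector<int> &arr, int begin, int mid, int end) {
    int leftSum = std::numeric_limits<int>::min();
    int sum = 0;
    int leftIndex = mid - 1;
    for (int i = mid - 1; i >= begin; --i) {  // grow the suffix leftward
        sum += arr[i];
        if (sum > leftSum) { leftSum = sum; leftIndex = i; }
    }
    int rightSum = std::numeric_limits<int>::min();
    sum = 0;
    int rightIndex = mid;
    for (int j = mid; j < end; ++j) {  // grow the prefix rightward
        sum += arr[j];
        if (sum > rightSum) { rightSum = sum; rightIndex = j; }
    }
    return {leftIndex, rightIndex, leftSum + rightSum};
}
```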
Now we have handled all three cases, and we can write the code for finding the maximum subarray. The workflow is below:
- check if the array contains only one element
- if so, return that element: it is the maximum subarray in this interval
- if not, find the midpoint
- recursively handle the left part and the right part of the array, finding the maximum subarray on each side; also handle the crossing case
- compare the three results and return the one with the largest sum, together with its positions
This procedure has the following cpp code:
```cpp
// Finds the maximum subarray of the difference array arr over [begin, end).
maxArrayResult maximumArray(const std::vector<int> &arr, int begin, int end) {
    // Base case: the interval holds a single element.
    if (end - begin == 1) {
        return {begin, begin, arr[begin]};
    }
    int mid = begin + (end - begin) / 2;
    maxArrayResult left = maximumArray(arr, begin, mid);           // case 1
    maxArrayResult right = maximumArray(arr, mid, end);            // case 2
    maxArrayResult cross = findMaxCrossing(arr, begin, mid, end);  // case 3
    // Return whichever candidate has the largest sum.
    if (left.sum >= right.sum && left.sum >= cross.sum) return left;
    if (right.sum >= left.sum && right.sum >= cross.sum) return right;
    return cross;
}
```
makeDifferenceArray is a utility function we use to convert the original array into the difference array; a sketch of it follows below.
Because we find the maximum sum over the difference array, where the subarray $D[i..j]$ (inclusive) sums to $A[j+1] - A[i]$, to recover the two points in the original array we have to add 1 to the end index.
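Since makeDifferenceArray is only mentioned, here is a minimal sketch of it and of tying the pieces together (the driver is an assumption, not part of the original code):

```cpp
#include <iostream>
#include <vector>

// Builds D where D[i] = A[i + 1] - A[i]; D has one fewer element than A.
// Assumes a contains at least two elements.
std::vector<int> makeDifferenceArray(const std::vector<int> &a) {
    std::vector<int> d(a.size() - 1);
    for (std::size_t i = 0; i + 1 < a.size(); ++i) d[i] = a[i + 1] - a[i];
    return d;
}

int main() {
    std::vector<int> prices{10, 11, 7, 10, 6};
    std::vector<int> diff = makeDifferenceArray(prices);
    maxArrayResult r = maximumArray(diff, 0, static_cast<int>(diff.size()));
    // First point at index r.begin, second at index r.end + 1 of the original array.
    std::cout << "first point " << r.begin << ", second point " << r.end + 1
              << ", difference " << r.sum << '\n';  // first point 2, second point 3, difference 3
}
```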
Now we analyze the time complexity of this algorithm using the master method.
First, consider the base case: when there is only one element in the interval, finding the maximum subarray takes $O(1)$ time. In each recursive case we divide the array in half, producing 2 subproblems, and we also check whether the maximum subarray crosses the midpoint, which takes $O(n)$ time. Thus, if we denote the running time $T(n)$, the recursive case costs $2T(n/2) + O(n)$.
Thus, the recurrence of this algorithm is:
\begin{align*} T(n) &= O(1) && \text{for } n = 1\\ T(n) &= 2T(n/2) + O(n) && \text{for } n > 1 \end{align*}
This is the same recurrence as merge sort, and by case 2 of the master theorem it solves to $T(n) = O(n \log n)$.
Probabilistic analysis and randomized algorithms
Consider this problem:
You want to hire an assistant to replace your current one. Interviewing a candidate costs $c_i$, and because you must fire your current assistant, each hire additionally costs $c_h$, where $c_h > c_i$. For this problem, you come up with this strategy:
- Remember the evaluation of your current assistant as the best candidate so far.
- Iterate over all the candidates you have:
- interview the candidate;
- if the candidate is better than the best candidate so far, make this candidate the best candidate and hire them;
- otherwise, keep interviewing.
In this case, we only care about how much the whole procedure will cost. It's simple to come up with the worst-case scenario: the candidates arrive in increasing order of quality, since then we hire a new assistant after every single interview, for a total hiring cost of $O(c_h n)$.
But we know the input will not always be the worst case, so what about other cases? Can we come up with a so-called average-case running cost? The answer is yes: in most cases, when we know or can assume the distribution of the inputs, we can use the method of probabilistic analysis (here we can assume the candidates come in random order). To properly define this randomness, we say that all possible permutations of the rankings have equal probability of appearing; equivalently, the rankings form a uniform random permutation.
In this case, we can use an RNG to enforce a uniform random permutation. In general, we call an algorithm randomized if its behavior is determined not only by its inputs but also by a random-number generator, and we will use $\text{RANDOM}(a, b)$ to denote a random integer in the range $[a, b]$.
A brief review of random variables
A random variable is a function from the sample space to $\mathbb{R}$, and an indicator random variable $X_A$ associated with an event $A$ outputs 1 when $A$ happens and 0 when $A$ doesn't happen.
The expected value of a random variable is the sum, over all outcomes, of the probability of the outcome times the value of the random variable at that outcome: $E[X] = \sum_x x \cdot P(X = x)$.
The expected value of an indicator random variable is the probability of the event $A$: $E[X_A] = P(A)$. For example, if $X_H$ indicates that a fair coin lands heads, then $E[X_H] = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = \frac{1}{2} = P(\text{heads})$.
To analyze the problem with random variables, let $X$ be the number of times we hire a new assistant. The expected value of this random variable is:
\begin{align*} E[X] &= \sum_{x=1}^{n} x \cdot P(X = x) \end{align*}
Calculating this directly is awkward, so we use a change of variable. Suppose we have indicator random variables $X_j$ for whether we hire the $j$-th candidate. Since we either hire or don't hire each candidate, we can rewrite the random variable as the sum of these indicators:
X = \sum_{j=1}^{n} X_j
Candidate $j$ is hired exactly when it is the best of the first $j$ candidates, which under a uniform random permutation happens with probability $1/j$, so $E[X_j] = 1/j$ and the expected value can be calculated this way:
\begin{align*} E[X] = E\left[\sum_{j=1}^{n} X_j\right] = \sum_{j=1}^{n} E[X_j] = \sum_{j=1}^{n} \frac{1}{j} = \ln(n) + O(1) \end{align*}
Thus, although we interview all $n$ people (an unavoidable $O(c_i n)$ interview cost), we expect to hire only about $\ln n$ of them, so our procedure has an average hiring cost of $O(c_h \ln n)$.
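A quick simulation to sanity-check the $\ln n$ estimate (everything here, names included, is a sketch, not part of the original procedure):

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const int n = 1000, trials = 10000;
    std::mt19937 rng(std::random_device{}());
    std::vector<int> rank(n);
    std::iota(rank.begin(), rank.end(), 0);  // candidate ranks 0..n-1

    long long totalHires = 0;
    for (int t = 0; t < trials; ++t) {
        std::shuffle(rank.begin(), rank.end(), rng);  // uniform random permutation
        int best = -1;
        for (int r : rank) {
            if (r > best) { best = r; ++totalHires; }  // hire when strictly better
        }
    }
    // Average hires is the harmonic number H_n, roughly ln(n) + 0.58.
    std::cout << "average hires: " << static_cast<double>(totalHires) / trials
              << ", ln(n): " << std::log(n) << '\n';
}
```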
However, in many cases (including this one), we don't control the sequence in which the candidates arrive, so we cannot make assumptions about how the input will be given. To actually make the data we iterate over a uniform random permutation, we use a random number generator to permute the input into a new random order. Thus, there is no longer any so-called "good" input or "bad" input (like candidates presented in increasing order).
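A sketch of such a randomizing step (a Fisher-Yates shuffle; the randomizeInPlace name and the use of std::uniform_int_distribution in place of the $\text{RANDOM}(a, b)$ primitive are assumptions):

```cpp
#include <random>
#include <vector>

// Permutes arr into a uniform random order: for each position i,
// swap arr[i] with a uniformly chosen element of arr[i, n).
void randomizeInPlace(std::vector<int> &arr, std::mt19937 &rng) {
    const std::size_t n = arr.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::uniform_int_distribution<std::size_t> dist(i, n - 1);  // plays the role of RANDOM(i, n - 1)
        std::swap(arr[i], arr[dist(rng)]);
    }
}
```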