QuickSort


1 Learning Goals

  • Describe QuickSort and Partition
  • Benchmark worst-case (unlucky) and best-case (lucky) QuickSort runtimes
  • Define and understand key probability terms: “sample space,” “random variable,” “expectation value,” “linearity of expectation.”
  • Describe processes (brute force, clever) for calculating the average runtime of algorithms
  • Analyze number of comparisons done in QuickSort and use to calculate the average runtime of QuickSort
  • Describe advantages of different sorting algorithms

2 QuickSort

QuickSort(A):

// Input: Array \(A\) of unique integers
// Output: Sorted \(A\)

// Base Case
If \(|A|\leq 1\): Return \(A\) // recursive calls may receive an empty subarray

// Preprocessing and Dividing
\(pivInd \leftarrow\) randomly chosen index of \(A\)
Partition(\(A\), \(pivInd\), \(A[pivInd]\)) // Modifies \(A\) to have form \([A_L, pivVal,A_R]\)

// Conquer
QuickSort(\(A_L\))
QuickSort(\(A_R\))

The key subroutine in QuickSort is called Partition.

Partition(A, pivInd, pivVal):

// Input: Array \(A\) of unique integers, and a pivot element with index pivInd and value pivVal
// Output: \(A\) rearranged so that all elements with values less than pivVal occur first, then pivVal, and then all elements with values greater than pivVal

Swap positions: (pivot element) \(\leftrightarrow\) \((A[1])\)
\(pivInd \leftarrow 1\) // track the pivot's current position
\(current\leftarrow 2\)
While \(current \leq |A|\):
\(\quad\) If \(A[current]<pivVal\):
\(\qquad\) Swap positions \((A[current])\leftrightarrow (A[pivInd+1])\)
\(\qquad\) Swap positions \((A[pivInd])\leftrightarrow (A[pivInd+1])\) // pivot steps one position right
\(\qquad pivInd \leftarrow pivInd+1\)
\(\quad current\leftarrow current +1\)
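The pseudocode can be turned into a short runnable sketch. In the following Python version (the function names `partition` and `quicksort` are mine, not from the notes), the pivot is moved to the front, then marched rightward past each smaller element, so everything to the pivot's left is always smaller than it.

```python
import random

def partition(A, piv_ind):
    """Rearrange A in place around A[piv_ind]; return the pivot's final index."""
    A[0], A[piv_ind] = A[piv_ind], A[0]   # move pivot to the front
    piv = 0                               # pivot's current position
    for current in range(1, len(A)):
        if A[current] < A[piv]:           # the comparison we will count later
            # Move the smaller element just past the pivot, then step the pivot right
            A[current], A[piv + 1] = A[piv + 1], A[current]
            A[piv], A[piv + 1] = A[piv + 1], A[piv]
            piv += 1
    return piv

def quicksort(A):
    """Return a sorted copy of A (unique integers), choosing pivots at random."""
    if len(A) <= 1:
        return A
    p = partition(A, random.randrange(len(A)))
    return quicksort(A[:p]) + [A[p]] + quicksort(A[p + 1:])
```

For example, `quicksort([8, 5, 7])` returns `[5, 7, 8]` no matter which pivots the random choices pick.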

Our goal in this unit will be to analyze the average runtime of QuickSort. How to do this?

  • Notice that the Partition subroutine takes up the most time in each recursive call.
  • Also notice that the runtime of Partition scales asymptotically with the number of times we do the comparison \[\textrm{If }A[current]<pivVal, \tag{1}\] because the runtime of Partition depends on the number of iterations of the while loop in Partition, and each time we run the while loop, we do the comparison in Equation 1.

So our general strategy to analyze the average runtime of QuickSort will be to track the number of comparisons of the form of Equation 1 over the whole course of the algorithm.
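Since only the comparison count matters for the analysis, we can model it directly. The sketch below (my own helper, not from the notes) sorts while counting exactly the comparisons of the form of Equation 1, using the fact that a Partition call on a size-\(m\) subarray compares the pivot with each of the other \(m-1\) elements.

```python
import random

def quicksort_count(A):
    """Return (sorted copy of A, number of Equation-1 comparisons performed)."""
    if len(A) <= 1:
        return A, 0
    piv = random.choice(A)
    comps = len(A) - 1                      # pivot compared with every other element
    left = [x for x in A if x < piv]
    right = [x for x in A if x > piv]
    left_sorted, lc = quicksort_count(left)
    right_sorted, rc = quicksort_count(right)
    return left_sorted + [piv] + right_sorted, comps + lc + rc
```

On \(A=[8,5,7]\) the count comes out as 2 or 3 depending on the pivot choices, which is exactly the randomness we want to average over.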

3 Benchmarking the Best-case and Worst-case Runtime of QuickSort

The pivot is chosen randomly, and it turns out that some choices are very good and some are very bad:

  • Lucky: We say a pivot choice is lucky if the pivot element is the median of the array.
  • Unlucky: We say a pivot choice is unlucky if the pivot element is the max or min of the array.
Group Exercise

Consider the following two scenarios:

  1. You get Lucky at every recursive call throughout QuickSort
  2. You get Unlucky at every recursive call throughout QuickSort

For each of these two scenarios:

  • Create a recurrence relation for the runtime of QuickSort on an array of size \(n\)
  • Solve the recurrence relation to determine the runtime in each case.

If you finish, try to recall these terms from the probability unit of CS200: sample space, random variable, expectation value, linearity of expectation
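As a numerical check on this exercise (without giving away the closed forms), one can unroll the two recurrences directly. In this sketch, the `pick` rules for the subproblem sizes are my own modeling choice, and I assume each Partition call on \(n\) elements does \(n-1\) comparisons:

```python
def total_comparisons(n, pick):
    """Unroll T(n) = (n - 1) + T(left) + T(right) for a fixed pivot rule."""
    if n <= 1:
        return 0
    left, right = pick(n)
    return (n - 1) + total_comparisons(left, pick) + total_comparisons(right, pick)

lucky = lambda n: ((n - 1) // 2, n // 2)   # pivot is the median: near-even split
unlucky = lambda n: (0, n - 1)             # pivot is the min (or max): one empty side
```

Benchmarking `total_comparisons(500, unlucky)` against `total_comparisons(500, lucky)` shows the unlucky count growing far faster than the lucky one as \(n\) increases.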

4 Strategies for Analyzing the Average Runtime

4.1 Basic Approach

Here is the basic strategy for analyzing the average runtime of an algorithm:

  1. Determine the Sample Space, \(S\), the set of all possible sequences of random events that might occur over the course of the algorithm.
    • ex: In QuickSort, \(S\) is the set of all possible sequences of pivot choices that the algorithm might make.
    • ex:
ABCD Question

What is the sample space if QuickSort is run on \(A=[8,5,7]\)?

  1. \(S=\{8,5,7\}\)
  2. \(S\) = All possible permutations of \(\{8,5,7\}\)
  3. \(S\) = power set of \(\{8,5,7\}\) (set of all subsets of \(\{8,5,7\}\))
  4. \(S = \{(7), (8,5), (8,7), (5,8), (5,7)\}\)
  2. Create a random variable for a quantity that scales with the runtime. (A random variable is a function that maps each element of the sample space to a number.)
    • ex: In QuickSort \(R:S\rightarrow \mathbb{R}\) is the number of comparisons (of the form of Equation 1) done over the course of the algorithm. So \(R(\sigma)\) is the number of comparisons if pivot sequence \(\sigma\) is chosen. From our previous example, with \(A=[8,5,7]\), we can see that \(R(7)<R(8,7)\) because we don’t have any additional comparisons in the recursive calls when the pivot sequence is \(7\) (because we go straight to the base case in the recursive calls), but we do have additional comparisons in the recursive call when the sequence is \((8,7)\).
  3. Take the expectation value of \(R\) (and then the big-O) to get the average runtime. Recall that the expectation value of a random variable is \[\mathbb{E}[R]=\sum_{\sigma\in S}R(\sigma)p(\sigma), \tag{2}\] where \(p(\sigma)\) is the probability of outcome \(\sigma\) occurring.
    • ex: From our example with \(A=[8,5,7]\), using a tree diagram (see hand written notes), we can calculate that \(p(7)=1/3\) and for \(\sigma\in S: \sigma\neq (7)\), we have \(p(\sigma)=1/6\).

The problem with this basic approach is that the \(p(\sigma)\)'s are often hard to calculate. (Even in our very small example of a length-3 array, determining these values was not easy.) The \(R(\sigma)\)'s are also hard to calculate; they are enough of a pain that we have not even worked them out yet.
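For the length-3 example, the brute force is still feasible by computer. The sketch below (my own helper, using exact fractions) enumerates every pivot sequence with its probability and computes \(\mathbb{E}[R]\) directly from Equation 2; it reproduces \(p(7)=1/3\) and \(p(\sigma)=1/6\) for the other four sequences.

```python
from fractions import Fraction

def outcomes(A):
    """Yield (comparisons, probability) for each pivot sequence of QuickSort on A."""
    if len(A) <= 1:
        yield 0, Fraction(1)
        return
    for piv in A:                                   # each pivot equally likely
        left = [x for x in A if x < piv]
        right = [x for x in A if x > piv]
        comps = len(A) - 1                          # comparisons in this Partition call
        for cl, pl in outcomes(left):
            for cr, pr in outcomes(right):
                yield comps + cl + cr, Fraction(1, len(A)) * pl * pr

expected = sum(c * p for c, p in outcomes([8, 5, 7]))  # E[R] via Equation 2
```

This gives \(\mathbb{E}[R]=8/3\): on average, QuickSort on this array does about \(2.67\) comparisons. But this enumeration blows up quickly as the array grows, which is exactly why we want the cleverer approach below.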

4.2 Clever Approach

Because of these difficulties, we will use a different, cleverer approach: we replace Steps 2 and 3 above with the following alternate versions:

  2. (Alternate) Create a random variable \(R\) for a quantity that scales with the runtime. But if \(R\) is too complicated to easily determine for a given element of the sample space, then rewrite \(R\) as a sum of simpler random variables, as \(R=\sum_{i}X_i\).
    • ex: If you have a function \(f(x)=x+x^2\), we can write it as a sum of \(f_1(x)=x\) and \(f_2(x)=x^2\). Then \(f=f_1+f_2\), where hopefully \(f_1\) and \(f_2\) are easier to deal with.
    • ex: For QuickSort, we are going to write \(R\) as a sum of the random variables \(X_{i,j}:S\rightarrow \mathbb{R}\), where \[\begin{align} X_{i,j}=& \textrm{the number of times }z_i \textrm{ and }z_j \textrm{ are compared,} \end{align}\] and \(z_i\) denotes the \(i^\textrm{th}\) smallest element of \(A\). So in our example \(A=[8,5,7]\), \(z_1=5\), \(z_2=7,\) and \(z_3=8.\) Then \(X_{2,3}((8,5))\) is the number of times \(7\) and \(8\) are compared over the course of the algorithm, if \(8\) is chosen as the first pivot, and then \(5\) is chosen as the next pivot in the recursive call. Then \[ R=\sum_{i=1}^{n-1}\sum_{j=i+1}^nX_{i,j}=\sum_{i<j}X_{i,j}, \] where the final summation uses abbreviated notation to combine the two summation symbols into one.
  3. (Alternate) Use linearity of expectation to analyze \(\mathbb{E}[R]\). Linearity of expectation just means we can move the expectation inside the sum. So if we have \(R=\sum_iX_i\), linearity of expectation tells us that \[ \begin{align} \mathbb{E}[R]&=\mathbb{E}\left[\sum_{i}X_i\right]\nonumber\\ &=\sum_{i}\mathbb{E}[X_i] \end{align} \] So instead of having to calculate the expectation value of the complicated function \(R\), we can do it for the hopefully simpler functions \(X_i\).
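As a sanity check on linearity of expectation, here is a tiny example of my own (two fair dice, not part of the QuickSort analysis): the expectation of the sum equals the sum of the expectations.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
S = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(S))

E_direct = sum((a + b) * p for a, b in S)                       # E[X1 + X2]
E_split = sum(a * p for a, b in S) + sum(b * p for a, b in S)   # E[X1] + E[X2]
assert E_direct == E_split == 7
```

Crucially, linearity holds even when the random variables are dependent, which is why we will be able to use it on QuickSort's (highly dependent) \(X_{i,j}\)'s.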

This is particularly simple when \(X_i\) is an indicator random variable, meaning \(X_i\) only takes the values \(0\) or \(1\). Then \[ \mathbb{E}[X_i]=\sum_{\sigma\in S}X_i(\sigma)p(\sigma), \tag{3}\] but each term in this summation either has \(X_i(\sigma)=0\), in which case the term vanishes, or \(X_i(\sigma)=1\), in which case the term just becomes \(p(\sigma)\). Thus Equation 3 becomes \[ \mathbb{E}[X_i]=\sum_{\sigma\in S:X_i(\sigma)=1}p(\sigma). \tag{4}\] But adding up the probabilities of all elements of the sample space whose \(X_i\)-values are \(1\) gives exactly the probability of the event that the outcome has \(X_i\)-value \(1\): \[ \mathbb{E}[X_i]=\Pr(X_i=1). \tag{5}\] It will hopefully be easier to calculate the probability that \(X_i\) has value \(1\) than to calculate our original expectation value.
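Here is a quick check of Equation 3 through Equation 5 with a made-up indicator (again my own example, not from the notes): let \(X=1\) when a fair die shows an even number.

```python
from fractions import Fraction

S = range(1, 7)                          # sample space: one roll of a fair die
p = Fraction(1, 6)                       # p(sigma), the same for every outcome
X = lambda s: 1 if s % 2 == 0 else 0     # indicator: the roll is even

E_X = sum(X(s) * p for s in S)               # Equation 3
Pr_X_is_1 = sum(p for s in S if X(s) == 1)   # Equations 4 and 5
assert E_X == Pr_X_is_1 == Fraction(1, 2)
```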

5 Analyzing Average Runtime of QuickSort

5.1 Understanding \(X_{i,j}\)

For QuickSort, we have broken up the total number of comparisons \(R\) in terms of \(X_{i,j}\), the number of comparisons between \(z_i\) and \(z_j\), as \[R=\sum_{i<j}X_{i,j} \] (alternate Step 2). Taking the expectation value of both sides, and then applying linearity of expectation (alternate Step 3), we have \[ \mathbb{E}[R]=\mathbb{E}\left[\sum_{i<j}X_{i,j}\right]=\sum_{i<j}\mathbb{E}\left[X_{i,j}\right]. \]

To analyze \(\mathbb{E}[X_{i,j}]\) it is important to note that

  • The pivot is never included as part of the array that is input to the recursive call
  • In partition, the pivot is compared to each element of the array, but no other pairs of elements are compared

Given these facts, we will try to understand \(X_{i,j}\) a bit better:

Group Exercise
  1. Suppose \(z_i\) and \(z_j\) with \(i<j\) are both in a subarray that is input to some recursive call of QuickSort. For each of the following cases, determine, are \(z_i\) and \(z_j\) compared in this call? Are they separated or kept together in future recursive calls?
  • \(z_i\) or \(z_j\) is chosen as the pivot
  • \(z_k\) is chosen as the pivot, such that
    • \(i<k<j\)
    • \(k<i,j\)
    • \(k>i,j\)
  2. What values can \(X_{i,j}\) take (only 2 are possible), and under which conditions does it take each value?
  3. What is the probability of \(z_i\) and \(z_j\) being compared to each other?
ABCD Question

What is the probability that \(X_{i,j}=1\)?

  1. \(\frac{1}{j-i}\)
  2. \(\frac{2}{j-i+1}\)
  3. \(\frac{2}{n}\)
  4. \(\frac{2}{n^2}\)
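After you have committed to an answer, you can confirm it by exact enumeration. The sketch below is my own model: the array is taken to be the ranks \(1..n\), the pivot is uniform over the current subarray, and we recurse only into the side containing both \(z_i\) and \(z_j\) (once a pivot separates them, they can never be compared).

```python
from fractions import Fraction

def pr_compared(n, i, j):
    """Exact probability that the i-th and j-th smallest of n values are compared."""
    def rec(sub):
        p = Fraction(0)
        for piv in sub:
            q = Fraction(1, len(sub))       # each pivot choice equally likely
            if piv == i or piv == j:
                p += q                      # pivot is z_i or z_j: they are compared
            elif i < piv < j:
                pass                        # pivot separates them: never compared
            else:
                # Both land on the same side of the pivot; recurse into that side
                side = [x for x in sub if x < piv] if j < piv else [x for x in sub if x > piv]
                p += q * rec(side)
        return p
    return rec(list(range(1, n + 1)))
```

For every small \(n\) this agrees with one of the four answer choices above, for all pairs \(i<j\) at once.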

5.2 Putting Everything together

We have \[ R=\sum_{i<j}X_{i,j}. \] Taking the expectation value of both sides, and then using linearity of expectation, we have \[ \begin{align} \mathbb{E}[R]=&\mathbb{E}\left[\sum_{i<j}X_{i,j}\right]\\ =&\sum_{i<j}\mathbb{E}\left[X_{i,j}\right]. \end{align} \] Since \(X_{i,j}\) is an indicator random variable, we have, using Equation 3 to Equation 5, that \[ \begin{align} \mathbb{E}[R]=&\sum_{i<j}Pr\left(X_{i,j}=1\right)\\ =&\sum_{i=1}^{n-1}\left(\sum_{j=i+1}^n\frac{2}{j-i+1}\right). \end{align} \] Now if we expand out the term in the parentheses, we get \[ \mathbb{E}[R] =\sum_{i=1}^{n-1}\left(2\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\cdots+\frac{1}{n-i+1}\right)\right). \] Next we add some additional positive terms to the right hand side: a leading \(\frac{1}{1}\), and (since \(n-i+1\leq n\)) the terms \(\frac{1}{n-i+2}\) through \(\frac{1}{n}\). This introduces an inequality, because the right hand side becomes bigger than it was before: \[ \mathbb{E}[R] \leq \sum_{i=1}^{n-1}\left(2\left(\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{n}\right)\right). \] But now the right hand side can be rewritten as \[ \mathbb{E}[R] \leq 2\sum_{i=1}^{n-1}\left(\sum_{j=1}^n\frac{1}{j}\right). \tag{6}\] A useful mathematical inequality (you do not need to know how to prove it or memorize it, but you should be aware of it) is \[ \sum_{j=1}^n\frac{1}{j}\leq \ln(n)+1, \tag{7}\] where \(\ln\) is the natural log: \(\log_e\). Plugging Equation 7 into Equation 6, we have \[ \begin{align} \mathbb{E}[R] \leq& 2\sum_{i=1}^{n-1}\left(\ln(n)+1\right)\\ =&2(n-1)\left(\ln(n)+1\right)\\ =&O(n\log n). \end{align} \tag{8}\]
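We can sanity-check Equation 7 and the final bound numerically. In this sketch, the array size, the number of trials, and the function name are my choices; the counting again uses the fact that Partition on \(m\) elements does \(m-1\) comparisons.

```python
import math
import random

def comparisons(A):
    """Number of Equation-1 comparisons in one randomized QuickSort run on A."""
    if len(A) <= 1:
        return 0
    piv = random.choice(A)
    return (len(A) - 1
            + comparisons([x for x in A if x < piv])
            + comparisons([x for x in A if x > piv]))

n = 300
harmonic = sum(1.0 / j for j in range(1, n + 1))
assert harmonic <= math.log(n) + 1                       # Equation 7

trials = 200
avg = sum(comparisons(list(range(n))) for _ in range(trials)) / trials
assert avg <= 2 * (n - 1) * (math.log(n) + 1)            # the bound from Equation 8
```

The simulated average lands noticeably below the \(2(n-1)(\ln(n)+1)\) bound, as expected, since the derivation above only added positive terms.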

Thus we see that even though we might get very unlucky with our sequence of pivot choices and get an \(O(n^2)\) runtime, the average runtime is \(O(n\log n)\). Because the worst case is so much worse than the best case, in order for the average runtime to come out close to the best case, the distribution of runtimes over the sample space must be highly concentrated around \(O(n\log n)\).

6 When to use different sorting algorithms?

Group Exercise

For each of the following, think about whether it might be better to use QuickSort or MergeSort, and why?

  • You have limited space
  • You need to sort multiple lists in parallel
  • Your array is stored as a linked list
  • The size of your array is small
  • You want speed, and you can query the value of any array element quickly.