- Informally define asymptotic complexity with a focus on runtime
- Describe the purpose and calculation of Big-O notation
- Compare (e.g. rank) constant, logarithmic, linear, quadratic, and cubic complexities
- Predict the runtime of an algorithm based on its Big-O complexity

Several lectures ago we plotted the execution time of membership queries for lists and sets of different sizes. Check out lists_vs_sets_improved.py for a refresher. Recall that we observed that the query time grew linearly with the size of the list, but did not grow at all as we increased the size of the set. Today we are going to talk more formally about the efficiency of these and other algorithms.
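A minimal sketch of that kind of experiment (the actual lists_vs_sets_improved.py from lecture may differ in its details):

```python
import timeit

# Build a list and a set with the same n elements
n = 100_000
data_list = list(range(n))
data_set = set(range(n))

# Time membership queries for a value that is not present
# (the worst case for the list, which must scan every element)
list_time = timeit.timeit(lambda: -1 in data_list, number=100)
set_time = timeit.timeit(lambda: -1 in data_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Running this, the set queries are dramatically faster, and if you repeat the experiment with larger `n`, only the list time grows.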

We can always talk about absolute running time, but that requires lots of experimentation and is easily confounded by the particular inputs, computer, etc. Instead we want a tool that allows us to talk about and compare different algorithms, data structures, etc. before we do any implementation, while eliding unnecessary details.

That tool is asymptotic analysis: an estimate of the execution time
(termed time complexity), memory usage, or other properties of an algorithm as
the input *grows* arbitrarily large.

The key idea: How does the run-time grow as we increase the input size?

For example, in the case of querying lists, if we double the size of the list, how will the execution time (time complexity) grow? Will it be unchanged, double, triple, quadruple? Double! Similarly, how would the execution time of querying sets grow if we double the size of the set? Unchanged!

Big-O notation is a way of describing the upper bound of the growth rate as a
function of the size of the input, typically abbreviated *n*. That is, it is
really about growth rates. Big-O is a simplification of \(f(n)\),
the actual functional relationship, which follows these two rules:

- If \(f(n)\) is the sum of several terms, only the term with the largest growth rate is kept.
- Any constant factors, i.e. those that do not depend on *n*, are omitted.

Thus \(f(n)=3n^3 + 6n^2 + n + 5\) would be \(O(n^3)\).
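We can see this dominance numerically: as *n* grows, the ratio \(f(n)/n^3\) approaches the leading coefficient 3, so the lower-order terms become irrelevant. A quick sketch:

```python
# f(n) = 3n^3 + 6n^2 + n + 5 from the text above
def f(n):
    return 3 * n**3 + 6 * n**2 + n + 5

# As n grows, f(n) / n^3 approaches the leading coefficient, 3,
# showing that the n^3 term dominates
for n in [10, 100, 1000, 10_000]:
    print(n, f(n) / n**3)
```

This is why Big-O keeps only \(n^3\): the other terms change the value by a vanishing fraction as *n* grows (and the constant factor 3 is dropped by the second rule).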

Using big-O, we can describe groups of algorithms that have similar asymptotic behavior:

Complexity | Description |
---|---|
\(O(1)\) | Constant |
\(O(\log n)\) | Logarithmic |
\(O(n)\) | Linear |
\(O(n\log n)\) | Linearithmic |
\(O(n^2)\) | Quadratic |
\(O(n^3)\) | Cubic |

Returning to our “lists vs. sets” example, querying a list is \(O(n)\), or linear time, and querying a Python set is \(O(1)\), or constant time.

What about the standard deviation computation in our statistics lab? Here is a
Python module with two possible implementations. What is the
complexity of these two implementations? In the first, we compute the mean
before the loop. Thus inside the loop we are only performing a constant number
of operations (a subtraction, multiplication and addition). Thus it is a linear
time implementation. In the latter, in each loop iteration, i.e. *n* times, we
are computing the average, an \(O(n)\) operation. Thus the overall
complexity is \(O(n^2)\), or quadratic! We should definitely
choose the first approach!
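The lab's module isn't reproduced here, but the two approaches described can be sketched as follows (the function names, and the use of the sample standard deviation dividing by *n* − 1, are assumptions):

```python
import math

def stdev_linear(data):
    """O(n): compute the mean once, before the loop."""
    n = len(data)
    mean = sum(data) / n          # O(n), but done only once
    total = 0.0
    for x in data:
        # Constant work per iteration: subtract, square, add
        total += (x - mean) ** 2
    return math.sqrt(total / (n - 1))

def stdev_quadratic(data):
    """O(n^2): recompute the mean inside the loop."""
    n = len(data)
    total = 0.0
    for x in data:
        mean = sum(data) / n      # O(n) work on every one of n iterations!
        total += (x - mean) ** 2
    return math.sqrt(total / (n - 1))
```

Both return the same answer; only the second wastes \(O(n)\) work per iteration recomputing a value that never changes.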

*PI Questions*

Keep in mind that big-O approximates the growth rate in the limit where the input size is very large. That assumption may not apply for the inputs you are interested in. And keep in mind that big-O drops constant factors; in reality those constants may be quite large. Thus think of big-O as a useful tool for thinking about efficiency (albeit with caveats) that complements experiments and other approaches.

In addition to the classes we describe above, we can talk about some broader
classes of time complexity, for example polynomial time. Polynomial time
problems are those that can be solved in \(O(n^k)\) time, where *k* is
some constant. Most of the algorithms we have encountered fall into this
category.

We often focus on a specific subset of polynomial time problems: decision problems, those problems that have a boolean (yes/no) answer. The class of decision problems that can be solved in polynomial time using a deterministic Turing machine is the class P. Another important class is NP, or “non-deterministic polynomial time”: decision problems whose solutions can be *verified* in polynomial time. Every problem in P is also in NP, but for the hardest problems in NP there are at present no known solutions that run in polynomial time. An example of such a problem is the decision version of the traveling salesman problem (TSP):

“Given a matrix of distances between *n* cities, determine if there is a route
visiting all cities exactly once with total distance less than *k*.”

We can verify whether a proposed route is valid in polynomial time, but finding such a route generally, at present, takes exponential time in the worst case. Proving whether or not P=NP is one of the key open problems in Computer Science (with a hefty cash prize, if I recall). Finding a polynomial time solution to these hard NP problems would make many otherwise intractable, or at least very difficult, problems (like TSP) tractable, even in the worst case.
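The asymmetry between *verifying* and *finding* a route is easy to see in code. Here is a minimal sketch of a polynomial time verifier for the decision version above (the function name, and the interpretation of the route as a path with no return leg, are assumptions):

```python
def verify_route(dist, route, k):
    """Check a proposed certificate for the TSP decision problem:
    does `route` visit every city exactly once with total distance < k?

    dist is an n-by-n matrix of pairwise distances; route is a list of
    city indices. This check runs in polynomial time, even though no
    polynomial time algorithm is known for *finding* such a route.
    """
    n = len(dist)
    # Every city must appear exactly once
    if sorted(route) != list(range(n)):
        return False
    # Sum the distances along consecutive legs of the route
    total = sum(dist[route[i]][route[i + 1]] for i in range(n - 1))
    return total < k
```

Checking a certificate takes only a single pass over the route; finding a qualifying route, by contrast, may require examining up to \(n!\) orderings.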

TSP is one of a sub-class of NP problems termed NP-complete. Any NP problem can be translated (reduced) to an NP-complete problem in polynomial time. Thus a polynomial time solution to any NP-complete problem would yield a polynomial time solution to all other NP-complete problems!

Yes! One of the most famous undecidable problems is the halting problem: given an arbitrary program and an input to that program, determine whether the program will eventually finish or run forever.

Alan Turing proved that a general algorithm to solve the halting problem for any program does not exist. To do so he created the Turing machine, a theoretical model of a computer that is used to model and reason about the theoretical aspects of computing.