CS 150
Searching and Sorting
- What is an algorithm?
- detailed instructions for solving a problem
- method of steps to accomplish a task
- Examples
- cooking recipe
- finding the GCD of two numbers (look up Euclid's algorithm)
- sort a list of numbers
- find a route from one place to another (cars, packet routing, phone routing, ...)
- find the longest common substring between two strings
- add two numbers
- microchip wiring/design (VLSI)
- solving sudoku
- cryptography
- compression (file, audio, video)
- spell checking
- pagerank
- classify a web page
- ...
- Main parts to algorithm analysis
- developing algorithms that work
- making them faster
- analyzing/understanding the efficiency/runtime
- What questions might we want to ask/measure about an algorithm?
- how long does it take to run? How efficient is it?
- how much memory does it use?
- does it finish?
- is it right?
- how hard is it to code?
- Searching
Input: a list of elements, and a target element to look for
Output: is the target contained in the list?
- If so, possibly retrieve information stored with the element
- Examples:
1. look up a name in a phone book
2. find a street address in a phone book
- why is #2 more difficult?
- if list is unsorted, need to look at each element
- simple, but slow
- takes time proportional to length of list N
* LINEAR SEARCH
- look at each element until found or get to end of list
- runtime is O(N) ("big-Oh of N") on average (also worst-case)
- aka "linear"
- aka proportional to N
- if N doubles, runtime doubles
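A minimal sketch of linear search (this is an illustration, not necessarily the version in searching.py):

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if not found."""
    for i, value in enumerate(items):
        if value == target:
            return i          # found it after i+1 comparisons
    return -1                 # looked at all N elements without finding it
```

In the worst case (target absent or in last position), the loop examines all N elements, which is where the O(N) runtime comes from.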
- can we do better?
- yes, if list is sorted
- strategy is similar to number guessing game
- I'm thinking of a number between 1 and 1000
- 500?
- too high
- 250?
- too low
- 375?
- too high
...
- each guess cuts search range in half
- only need k guesses for N = 2^k
- e.g., for N = 1000, need only 10 guesses (since 2^10 = 1024)
- in other words, if you keep cutting 1000 in half, it takes only 10 steps to reach 1
- 20 steps for 10^6 (one million)
- 30 steps for 10^9 (one billion)
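The halving counts above can be checked directly, since the number of halving steps to get from N down to 1 is log2(N), rounded up:

```python
import math

# number of halving steps needed to reduce N to 1 (log base 2, rounded up)
for n in (1_000, 10**6, 10**9):
    print(n, math.ceil(math.log2(n)))
```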
* BINARY SEARCH
- (requires sorted list)
- zero in on target number, cutting search range in half at each step
- in general, given N elements, it takes log2 N steps
- runtime is O(log N)
- much better than O(N)
- if N doubles, only takes an extra step (i.e. time increases by constant amount)
- code examples:
searching.py
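A minimal sketch of binary search on a sorted list (again an illustration, not necessarily the code in searching.py):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items (ascending order), or -1."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # midpoint of current search range
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1                  # target must be in the upper half
        else:
            hi = mid - 1                  # target must be in the lower half
    return -1
```

Each iteration discards half of the remaining range, so the loop runs at most about log2(N) times.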
- Sorting
Input: A list of numbers nums
Output: The list of numbers in sorted order, i.e. nums[i] <= nums[j] for all i < j
- try to sort in place
- many different ways to sort a list
- Demo: xsortlab
* SELECTION SORT
- find smallest, move to the front
- find second-smallest, move into second place
- ...
- requires N passes over list
- in each pass, find smallest of remaining elements
- to find the smallest, simply traverse list, keep track of location of smallest so far
- finding smallest of N elements takes O(N) time
- so total runtime is prop. to N + (N-1) + ... + 3 + 2 + 1
		- this sum is exactly (N * (N + 1)) / 2, which is roughly N^2 / 2
		- since we only want a rough growth rate, we can drop the constant factor of 1/2 and the lower-order terms
- runtime is O(N^2), aka "quadratic"
- if N doubles, runtime quadruples
- if N goes up by factor of 10, runtime goes up by factor of 100
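The passes described above can be sketched as follows (an illustration of the idea, not a specific course implementation):

```python
def selection_sort(nums):
    """Sort nums in place by repeatedly moving the smallest remaining element forward."""
    n = len(nums)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):         # scan the unsorted remainder
            if nums[j] < nums[smallest]:
                smallest = j              # track location of smallest so far
        nums[i], nums[smallest] = nums[smallest], nums[i]  # move into place
    return nums
```

The outer loop makes N passes and the inner scan shrinks by one each pass, giving the N + (N-1) + ... + 1 comparison count above.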
* INSERTION SORT
- like picking up a hand of cards
- keep inserting the next unsorted value in the sorted portion
- run xsortlab for demonstration
- sorted portion grows: 1, 2, 3, ..., N
- insertion process for each number could take linear time in worst case
	- like selection sort, results in O(N^2) quadratic runtime
- in best case (if already sorted, or nearly sorted), runs in linear time
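The card-hand idea can be sketched like this (an illustrative version, not necessarily the one in sorting.py):

```python
def insertion_sort(nums):
    """Sort nums in place by inserting each element into the sorted prefix."""
    for i in range(1, len(nums)):
        value = nums[i]                    # next unsorted value to insert
        j = i - 1
        while j >= 0 and nums[j] > value:  # shift larger elements right
            nums[j + 1] = nums[j]
            j -= 1
        nums[j + 1] = value                # drop value into its slot
    return nums
```

On an already-sorted list the inner while loop never runs, which is why the best case is linear.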
- can we do better?
- run compare_sorting() in sorting.py
* MERGE SORT
- significantly faster algorithm, especially for large N
- divide and conquer (recursive algorithm):
- divide list into two halves
- sort half lists recursively using merge sort
- merge the two sorted sublists
- merging step takes O(N) time
- log N levels of recursion (keep cutting N in half until reach 1)
- total runtime is O(N log N)
- run plot_merge_sort() in sorting.py
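The divide-and-conquer steps can be sketched as follows (a simple recursive version, not necessarily the one in sorting.py):

```python
def merge_sort(nums):
    """Return a new sorted list using divide and conquer."""
    if len(nums) <= 1:                 # base case: 0 or 1 elements, already sorted
        return nums
    mid = len(nums) // 2
    left = merge_sort(nums[:mid])      # sort each half recursively
    right = merge_sort(nums[mid:])
    # merge: repeatedly take the smaller front element of the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # append whatever remains
    merged.extend(right[j:])
    return merged
```

The merge loop does O(N) work per level, and halving until lists of size 1 gives log N levels, hence O(N log N) overall.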
* SUMMARY
- algorithms
- searching
- linear search (best you can do if unsorted)
- binary search (if sorted, much faster)
- sorting
- selection sort: repeatedly move smallest element to the front
			- insertion sort: repeatedly insert next unsorted element among sorted ones
- both of these have quadratic runtime
- merge sort: recursive divide-and-conquer algorithm
- much faster for large N
- big O notation is a tool that allows us to compare different algorithms while
hiding the details that don't matter
	- hard to analyze code directly, e.g., by counting the number of operations
- what counts as an operation?
- different operations take different amounts of time
- will depend on the input
- asymptotic analysis
- Key idea: how does the runtime grow as we increase the input size N?
- in our case, as we search or sort more numbers, roughly how will the runtime increase?
- Big-O notation: an upper bound on the runtime
		- describes how the runtime changes as the input size changes
- this gives us groups of methods/functions that behave similarly
		- O(f(N)): as N increases, the runtime grows in proportion to f(N)
		- O(N) = linear
- double the input size, double the runtime
- examples: linear search, finding minimum, computing average
		- O(N^2) = quadratic (worse than linear)
- double the input size, quadruple the runtime
- examples: selection sort, insertion sort
- O(log N) = logarithmic (better than linear)
- double the input size, runtime only increases by constant
- example: binary search, looking up name in phone book
- O(N log N) (almost as good as linear)
			- double the input size, runtime slightly more than doubles
- example: merge sort
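To see why these groupings matter, it helps to compare rough step counts for each growth rate at a few input sizes (an illustrative table, not measured runtimes):

```python
import math

# rough step counts for each growth rate at a few input sizes
print(f"{'N':>10} {'log N':>8} {'N log N':>12} {'N^2':>14}")
for n in (10, 1_000, 100_000):
    print(f"{n:>10} {math.log2(n):>8.1f} {n * math.log2(n):>12.0f} {n * n:>14}")
```

At N = 100,000, the gap between N log N and N^2 is the gap between merge sort finishing quickly and a quadratic sort taking orders of magnitude longer.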