Final Review

Final practice problems

Final Logistics

When and where
December 11 7:00-10:00PM in 75 SHS 102.
What can I bring?
One piece of letter-sized paper with notes on both sides (I will provide copies of the cheat sheet)
What can’t I bring?
Anything else, e.g., book, computer, other notes, etc.
Are there additional office hours?
Yes! I will hold regular office hours on Monday, December 9, and additional office hours on Tuesday, December 10, and Wednesday, December 11, from 10:00AM to noon (as well as by appointment).

What will the exam cover?

The exam will effectively have two parts:

  1. Four new questions on the topics covered since midterm 2 (topics numbered 17-20)
  2. Retake opportunities for all topics from midterms 1 and 2 (topics numbered 1-16). Recall that you only retake problems for which you don’t already have an M (3 points) or E (4 points).

The new topics are:

  1. Searching and sorting algorithms
  2. Use big-O analysis to inform algorithm design
  3. Understand numeric representations and associated operations
  4. Vectorized execution

The retake topics are:

  1. Understand the role of function scope
  2. Writing functions with randomness
  3. Writing functions with loops
  4. Choosing appropriate loops
  5. Creating simpler equivalent conditionals
  6. Finding errors
  7. Writing functions with sequences
  8. Utilizing turtle and other modules
  9. Connect Python to the outside world with file I/O and command line arguments
  10. Implications of the Python memory model
  11. Applications of data structures
  12. Writing functions with sets
  13. Writing functions with dictionaries
  14. Understanding and using recursive functions
  15. Finding errors (in recursive functions)
  16. Using Object-Oriented Programming

The exam will NOT include material that was in the readings but that we did not discuss in class, or use in our programming assignments or the practical problems.

Types of questions

  • Determine the output/result of code
  • Rewrite code with better style
  • Identify bugs in code
  • Reassemble jumbled Python statements into a viable function
  • Fill in missing statements in code
  • Translate code from NumPy/datascience to “built-in” Python
  • Write code to solve a problem
  • Map algorithms to specific implementations and vice-versa
  • Determine and compare big-O time complexity and execution time

Review Questions

Solutions will be available after class (make sure to reload the page).

  1. You are given four programs, each of which uses one of four different implementations for searching a sorted array: iterative linear search, recursive linear search, iterative binary search, or recursive binary search. Unfortunately, you don’t know which program uses which approach, but you do know the last five calls of the mystery search function for each program.

    A. Search called with lists of length 433, 432, 431, 430, 429.

    B. Search called once with a list of length 1000.

    C. Search called once with a list of length 1000.

    D. Search called with lists of length 16, 8, 4, 2, 1.

    Which program uses which approach? If there is insufficient information to answer uniquely, indicate the possible programs.

    Since search in program A is called with lists decreasing in size by one, we infer recursive linear search. Since search in program D is called with lists decreasing by a factor of 2, we infer recursive binary search. Given the available information, programs B and C could both be iterative linear search or iterative binary search.
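    As a minimal sketch of the two recursive call patterns described above, the functions below record the length of the list passed to each call. The function names and the extra lengths parameter are hypothetical, added only for illustration, and the slicing-based recursion is one possible implementation, not necessarily the one the mystery programs use.

```python
def linear_search_lengths(a_list, target, lengths):
    """Recursive linear search that records len(a_list) for every call."""
    lengths.append(len(a_list))
    if len(a_list) == 0:
        return False
    if a_list[0] == target:
        return True
    # Recurse on everything but the first element: one element shorter
    return linear_search_lengths(a_list[1:], target, lengths)


def binary_search_lengths(a_list, target, lengths):
    """Recursive binary search (sorted input) that records len(a_list) for every call."""
    lengths.append(len(a_list))
    if len(a_list) == 0:
        return False
    mid = len(a_list) // 2
    if a_list[mid] == target:
        return True
    if target < a_list[mid]:
        # Recurse on the lower half
        return binary_search_lengths(a_list[:mid], target, lengths)
    # Recurse on the upper half
    return binary_search_lengths(a_list[mid + 1:], target, lengths)


data = list(range(16))

linear_lengths = []
linear_search_lengths(data, 4, linear_lengths)
print(linear_lengths)  # [16, 15, 14, 13, 12]

binary_lengths = []
binary_search_lengths(data, 0, binary_lengths)
print(binary_lengths)  # [16, 8, 4, 2, 1]
```

    The lengths shrink by one per call for linear search and roughly halve per call for binary search.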

    If there is insufficient information to answer uniquely, suggest one or more experiments to uniquely identify the approach used by each program.

    Assuming that search is a measurable portion of the runtime and that we can change the inputs to the search function, we could try doubling the input size. Since linear search has a time complexity of \(\mathcal{O}(n)\), its runtime should double (or at least increase noticeably), while binary search has a time complexity of \(\mathcal{O}(\log n)\), and so its runtime would increase only minimally.

    In practice we could likely observe differences in the absolute runtime. But recall that big-O describes the growth rate as the input grows very large (it is an asymptotic analysis), not the runtime itself. Two algorithms having the same complexity does not necessarily mean that they have the same runtime, and a lower big-O time complexity does not mean that one algorithm is predictably faster than the other.
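    One way to sketch the doubling experiment without relying on noisy wall-clock timing is to count the elements each iterative search examines. The function names below are hypothetical, and counting steps is a stand-in for measuring the runtime of the real programs.

```python
def linear_search_steps(a_list, target):
    """Iterative linear search; returns the number of elements examined."""
    steps = 0
    for val in a_list:
        steps += 1
        if val == target:
            break
    return steps


def binary_search_steps(a_list, target):
    """Iterative binary search on a sorted list; returns the number of loop iterations."""
    steps = 0
    low = 0
    high = len(a_list) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if a_list[mid] == target:
            break
        elif a_list[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


for n in [1000, 2000, 4000]:
    data = list(range(n))
    missing = n  # larger than every element, so both searches do their maximum work
    print(n, linear_search_steps(data, missing), binary_search_steps(data, missing))
```

    Each time n doubles, the linear search count doubles, while the binary search count grows by only one.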

  2. What is the Big-O worst-case time complexity of the following Python code? Assume that list1 and list2 are lists of the same length.

    def difference(list1, list2):
        result = []
        for val1 in list1:
            if val1 not in list2:
                result.append(val1)
        return result

    The outer loop has n iterations, while the in operation has a worst-case time complexity of \(\mathcal{O}(n)\), so the total worst-case time complexity is \(\mathcal{O}(n^2)\).

    The not in operator in this context is equivalent to not (val1 in list2). The worst case time complexity for in on a list is \(\mathcal{O}(n)\) because we potentially have to examine all the elements in the list. The average case is still \(\mathcal{O}(n)\), because on average we will need to examine half the elements.

    How could we solve this more efficiently? With the subtraction operator on sets.
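    A minimal sketch of the set-based approach follows. The function names are hypothetical. Note one assumption: converting to sets discards duplicates and ordering, so the first version only matches the original function when those do not matter. The second version keeps the original loop and only replaces the list membership test with a set membership test.

```python
def difference_with_sets(list1, list2):
    """Values in list1 that are not in list2 (order and duplicates are not preserved)."""
    return list(set(list1) - set(list2))


def difference_keep_order(list1, list2):
    """Same result as the original loop, but membership is tested against a set."""
    exclude = set(list2)  # built once
    result = []
    for val1 in list1:
        if val1 not in exclude:  # set membership is constant time on average
            result.append(val1)
    return result


print(sorted(difference_with_sets([1, 2, 3, 4], [2, 4])))  # [1, 3]
print(difference_keep_order([3, 1, 2, 1], [2]))            # [3, 1, 1]
```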

    Why does in use linear search instead of something faster, like binary search? The in operator is designed to work on any list, not just sorted lists. And as we observed in the practice problems, checking if a list is sorted has the same time complexity as linear search. So checking first if a list is sorted and then using binary search does not improve our worst-case time complexity compared to just using linear search, and it likely makes the code slower in practice.
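    To see why the sortedness check is itself linear, here is a sketch of one possible check (the name is_sorted is hypothetical and this may differ from the version in the practice problems). In the worst case, a sorted list, it compares every adjacent pair.

```python
def is_sorted(a_list):
    """Return True if a_list is in non-decreasing order."""
    for i in range(len(a_list) - 1):
        # A single out-of-order adjacent pair means the list is not sorted
        if a_list[i] > a_list[i + 1]:
            return False
    return True


print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
```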

  3. What decimal numbers are represented by the following binary numbers:

    1. 1101

      13

    2. 111

      7

    3. 10010+11011

      1  1
       10010  18
      +11011  27
      ------  --
      101101  45
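    The conversions above can be checked with a short sketch. The helper name binary_to_decimal is hypothetical; it processes the bits left to right, doubling the running value at each step.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to the integer it represents."""
    value = 0
    for bit in bits:
        # Shift the running value one binary place to the left, then add the new bit
        value = value * 2 + int(bit)
    return value


print(binary_to_decimal("1101"))                                # 13
print(binary_to_decimal("111"))                                 # 7
print(binary_to_decimal("10010") + binary_to_decimal("11011"))  # 45
print(bin(45))                                                  # 0b101101
```

    The built-in int("1101", 2) performs the same conversion.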
  4. Translate the following NumPy function to use only Python built-ins, assuming a_list is a list of floats (instead of a NumPy vector) and lower is a single (scalar) float:

    def sum_above(a_list, lower):
        return np.sum(a_list[a_list > lower])

    Recall that a_list[a_list > lower] is “vectorized” indexing; that is, a_list > lower computes a vector (array) of booleans by performing an element-wise comparison. The indexing operation keeps all values of a_list for which the corresponding boolean is True.

    def sum_above(a_list, lower):
        """ Sum all values in a_list greater than lower """
        result = 0
        for val in a_list:
            if val > lower:
                result += val
        return result
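    As a sketch of how the two vectorized steps map onto built-in Python, the first function below builds the boolean “mask” explicitly and then keeps the matching values. The second is a more compact equivalent using the built-in sum. Both function names are hypothetical.

```python
def sum_above_mask(a_list, lower):
    """Mirror the NumPy version step by step: build a boolean mask, then sum the kept values."""
    # Step 1: element-wise comparison, analogous to a_list > lower
    mask = [val > lower for val in a_list]
    # Step 2: keep the values whose mask entry is True, analogous to a_list[mask]
    result = 0
    for val, keep in zip(a_list, mask):
        if keep:
            result += val
    return result


def sum_above_short(a_list, lower):
    """Compact equivalent using the built-in sum."""
    return sum(val for val in a_list if val > lower)


print(sum_above_mask([1.0, 5.0, 3.0, 7.0], 2.5))   # 15.0
print(sum_above_short([1.0, 5.0, 3.0, 7.0], 2.5))  # 15.0
```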
  5. [From retest #2] Add to the body of the mystery function below such that after the code below executes, z and y have the same value and neither is equal to y’s initial value. If no such body is possible, indicate so. Briefly explain your answer.

    def mystery(x):
        x = x[:]
        # Add code here...
        return x
    
    y = [1, 2, [3, 4]]
    z = mystery(y)
    # z and y have the same value, and y is no longer [1, 2, [3, 4]]

    Since the slice creates a shallow copy, only the nested list remains aliased, so we want to modify the nested list, e.g., append a value. Then z and y have the same value, and both differ from [1, 2, [3, 4]], the original value of y.

    def mystery(x):
        x = x[:]
        x[2].append(6)
        return x
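    The aliasing can be confirmed with a short check. This sketch repeats the solution above and adds only the comparisons.

```python
def mystery(x):
    x = x[:]        # shallow copy: new outer list, same nested list object
    x[2].append(6)  # mutates the nested list shared with the caller's list
    return x


y = [1, 2, [3, 4]]
z = mystery(y)

print(z == y)        # True: same value
print(z is y)        # False: different outer list objects
print(z[2] is y[2])  # True: the nested list is aliased
print(y)             # [1, 2, [3, 4, 6]]
```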
  6. [From retest #2] Assume course enrollment data is stored as a list of tuples, where each tuple contains a CRN number as a string and the ID number, also as a string, of a student enrolled in that course (i.e., ("92669", "00123456") for student “00123456” enrolled in the course “92669”). Write a function named under that takes this list and an integer floor as parameters, and returns a list of tuples with the CRN and the number of students enrolled for courses with floor students or fewer. The order of the CRNs in the returned list does not matter. You can assume that the input list is non-empty and that there are no duplicate enrollments. For example:

    >>> enrolled = [("92669", "00123456"), ("92669", "00123457"), ("92670", "00123457")]
    >>> under(enrolled, 1)
    [('92670', 1)]
    >>> under(enrolled, 2)
    [('92669', 2), ('92670', 1)]
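    def under(enrolled, floor):
        """ Return (CRN, count) tuples for courses with floor or fewer students """
        # Count the number of enrollments for each CRN
        counts = {}
        for crn, student in enrolled:
            if crn in counts:
                counts[crn] += 1
            else:
                counts[crn] = 1
        # Keep only the courses at or below the floor
        result = []
        for crn, count in counts.items():
            if count <= floor:
                result.append((crn, count))
        return result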
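    As an alternative sketch, the counting step can be shortened with the dictionary get method, which returns a default when the key is missing. The name under_with_get is hypothetical, and the list comprehension in the return statement is a stylistic choice that may go beyond what we used in class. The loop-based solution above is equally valid.

```python
def under_with_get(enrolled, floor):
    """Return (CRN, count) tuples for courses with floor or fewer students."""
    counts = {}
    for crn, _student in enrolled:
        # get returns 0 the first time a CRN is seen
        counts[crn] = counts.get(crn, 0) + 1
    return [(crn, count) for crn, count in counts.items() if count <= floor]


enrolled = [("92669", "00123456"), ("92669", "00123457"), ("92670", "00123457")]
print(under_with_get(enrolled, 1))  # [('92670', 1)]
print(under_with_get(enrolled, 2))  # [('92669', 2), ('92670', 1)]
```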