Lab 5: Notes

Hints on Lab 5

  1. Type conversions

    There is one necessary type conversion: converting strings read from the file to floats before analysis. This is best done once, when the file is read. Keep the values as floats rather than converting them to integers.
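
    As a minimal sketch of converting once at read time (the name parse_values and the one-number-per-line layout are assumptions for illustration, not part of the lab handout):

```python
def parse_values(lines):
    """Convert each non-blank line of text to a float, once, up front."""
    values = []
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines, e.g. a trailing newline at end of file
            values.append(float(text))
    return values


# Stand-in for the lines of a data file (hypothetical contents); with a real
# file, the open file object could be passed in place of this list.
sample_lines = ["3.5\n", "2\n", "10.25\n", "\n"]
data = parse_values(sample_lines)
print(data)  # [3.5, 2.0, 10.25]
```

    Every later function can then assume it receives a list of floats and never needs to convert again.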

    One place where type conversion may come up is computing the index for the median. Recall that only integers can be used as indices. If you compute the index with some variation of len(data)/2, the result is a float and must be converted to an integer before it can be used. If you use floor division instead, e.g., len(data)//2, no conversion is needed because the result is already an integer (one reason floor division is preferred for computing indices).
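
    The difference between the two divisions can be seen in a short experiment (the list contents are made up for illustration):

```python
data = [4.0, 1.0, 3.0, 2.0, 5.0]

true_div = len(data) / 2    # 2.5, a float: data[true_div] would raise TypeError
floor_div = len(data) // 2  # 2, an int: usable directly as an index

print(type(true_div).__name__, type(floor_div).__name__)  # float int
print(data[floor_div])      # 3.0
print(data[int(true_div)])  # 3.0, but only after an explicit conversion
```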

  2. Functions should work for as many input variations as possible

    The median function should sort the data itself rather than rely on the data already being sorted, so that it works with both sorted and unsorted inputs. In general, we want our functions to work for as many input variations as possible.
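
    One possible shape for such a function is sketched here. It assumes a non-empty list and the usual convention that the median of an even-length list is the average of the two middle values; check both against the lab's own definition.

```python
def median(data):
    """Return the median of a non-empty list of numbers, sorted or not."""
    ordered = sorted(data)      # sorted() returns a new list; data is unchanged
    middle = len(ordered) // 2  # floor division, so middle is already an int
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # Even length: average the two values on either side of the midpoint.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3.0, 1.0, 2.0]))       # 2.0
print(median([4.0, 1.0, 3.0, 2.0]))  # 2.5
```

    Using sorted() instead of data.sort() is a design choice: the caller's list is left in its original order.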

  3. Be careful with mode when using frequencies as a template

    If you model your mode solution on the frequencies function, be careful not to repeat the error in the original frequencies function. After the loop completes, we still need to check the largest value.
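
    The sketch below illustrates the idea under an assumption about the template: that it walks through sorted data and counts runs of equal values. The variable names and the tie-breaking rule (the smallest value wins) are choices made for this example, not the lab's required solution.

```python
def mode(data):
    """Return the most frequent value in a non-empty list (smallest on ties)."""
    ordered = sorted(data)
    best_value = ordered[0]
    best_count = 0
    current_value = ordered[0]
    current_count = 0
    for value in ordered:
        if value == current_value:
            current_count += 1
        else:
            # A run just ended: compare it with the best run seen so far.
            if current_count > best_count:
                best_value = current_value
                best_count = current_count
            current_value = value
            current_count = 1
    # The final run (the largest value) never reaches the else branch above,
    # so it has to be checked here, after the loop.
    if current_count > best_count:
        best_value = current_value
    return best_value


print(mode([1.0, 2.0, 3.0, 3.0]))  # 3.0
```

    Without the check after the loop, the call above would return 1.0, because the run of 3.0 values would never be compared against the best run.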

  4. DRY up repeated computations

    When implementing the median function, avoid computing the middle index, e.g., len(data) // 2, multiple times. If you find yourself copying or otherwise duplicating a computation, assign it to a variable (once) and use that variable multiple times, e.g.,

     middle = len(data) // 2
    
  5. Pull computations that don’t change out from the loop body

    Inside the standard deviation function, the average doesn’t change and thus can be computed once before the loop. Doing so is much more efficient than recomputing the average in each iteration of the loop. For some of the larger files you likely noticed the computation took an appreciable amount of time. Computing the average visits every element of the list, so recomputing it inside a loop that also visits every element turns a computation that should take time proportional to the length of the list (i.e., grows linearly) into one that takes time proportional to the square of the list length (i.e., grows quadratically). The latter grows quite quickly even for “small” lists. We will learn about computational scaling more formally later in the semester, but in general we want to write as efficient an implementation as possible.
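
    A sketch of the hoisted version is given here. It assumes the population form of the standard deviation (dividing by the number of values) and a helper named average; the lab's own formula and function names may differ.

```python
import math


def average(data):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(data) / len(data)


def standard_deviation(data):
    """Population standard deviation of a non-empty list of numbers."""
    mean = average(data)  # computed once, before the loop
    total = 0.0
    for value in data:
        # Writing average(data) here instead of mean would re-scan the whole
        # list on every iteration, giving quadratic rather than linear time.
        total += (value - mean) ** 2
    return math.sqrt(total / len(data))


print(standard_deviation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # 2.0
```

    For a list of 10,000 values, the hoisted version does on the order of 10,000 steps, while recomputing the average inside the loop does on the order of 100,000,000.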