Type conversions
There is one necessary type conversion: converting strings read from the file to floats before analysis. This is best done once, when the file is read. Keep the values as floats rather than converting them to integers.
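As a minimal sketch of converting once at read time: the helper name read_floats, the file name data.txt, and the one-number-per-line layout are assumptions made for illustration, not part of the assignment.

```python
def read_floats(lines):
    """Convert each non-blank line of text to a float, once, at read time."""
    values = []
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines
            values.append(float(text))
    return values


# Hypothetical usage with a file that holds one number per line:
# with open("data.txt") as f:
#     data = read_floats(f)

# The same conversion applied to lines already in memory:
data = read_floats(["3.5\n", "2\n", "10.25\n"])
```

Every later computation can then work on data directly, with no further string handling.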
One situation where type conversion may be useful is computing indices for median. Recall that only integers can be used as an index. Thus if you used some variation of len(data)/2 to compute your index, integer conversion would be needed. If you use floor division, e.g. len(data)//2, no conversion is needed because the result is already an integer type (which is one reason floor division is preferred for computing indices).
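The difference between the two divisions can be seen in a short example; the list contents and variable names here are made up for illustration.

```python
data = [4.0, 1.5, 9.0, 7.25, 3.0]

true_div = len(data) / 2    # a float, so data[true_div] would raise TypeError
floor_div = len(data) // 2  # an int, usable as an index directly

middle_value = data[floor_div]   # no conversion needed
converted = data[int(true_div)]  # same element, but needs the extra int() call
```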
Functions should work for as many input variations as possible
The median function should sort the data itself rather than rely on the data being previously sorted, so that it can be used with both sorted and unsorted inputs. In general, we want our functions to work for as many input variations as possible.
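One possible shape for such a function is sketched below. It assumes a non-empty list of numbers and uses sorted() to work on a copy, which is one choice among several (sorting the list in place would also satisfy the requirement).

```python
def median(data):
    """Return the median of a non-empty list, whether or not it is sorted."""
    ordered = sorted(data)      # sorted copy; the caller's list is left untouched
    middle = len(ordered) // 2  # floor division gives an integer index
    if len(ordered) % 2 == 1:
        # Odd length: the single middle element is the median.
        return ordered[middle]
    # Even length: average the two middle elements.
    return (ordered[middle - 1] + ordered[middle]) / 2
```

Because the function sorts internally, median([3.0, 1.0, 2.0]) and median([1.0, 2.0, 3.0]) give the same result.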
Be careful with mode when using frequencies as a template
If you model your mode solution on the frequencies function, be careful not to repeat the same error as in the original frequencies function. After the loop completes, we still need to check the largest value.
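The sketch below illustrates the post-loop check. The original frequencies function is not shown here, so this assumes it walks sorted data and counts runs of equal values; the variable names and the tie-breaking rule (the smallest value wins) are likewise assumptions.

```python
def mode(data):
    """Return the most frequent value in a non-empty list (smallest on ties)."""
    ordered = sorted(data)
    best_value = ordered[0]
    best_count = 0
    current = ordered[0]
    count = 0
    for value in ordered:
        if value == current:
            count += 1
        else:
            # A run of equal values just ended: compare it with the best so far.
            if count > best_count:
                best_value, best_count = current, count
            current, count = value, 1
    # The loop only compares a run when the next value starts, so the final
    # run (the largest value) has not been compared yet. Check it here.
    if count > best_count:
        best_value, best_count = current, count
    return best_value
```

Without the check after the loop, mode([1.0, 2.0, 3.0, 3.0]) would return 1.0 instead of 3.0.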
DRY up repeated computations
In implementing the median function, avoid computing the middle index, e.g., len(data) // 2, multiple times. If you find yourself copying or otherwise duplicating a computation, assign it to a variable (once) and use that variable multiple times, e.g.,
middle = len(data) // 2
Pull computations that don’t change out from the loop body
Inside the standard deviation function, the average doesn’t change and thus can be computed once before the loop. Doing so is much more efficient than recomputing the average in each iteration of the loop. For some of the larger files you likely noticed the computation took an appreciable amount of time. Recomputing the average in the loop turns a computation that should take time proportional to the length of the list (i.e., grows linearly) into one that takes time proportional to the square of the list length (i.e., grows quadratically), because each pass through the loop triggers its own full pass over the list. The latter grows quite quickly even for “small” lists. We will learn about computational scaling more formally later in the semester. But in general we want to write as efficient an implementation as possible.
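A minimal sketch of hoisting the average out of the loop follows. The function names are illustrative, and the sketch assumes the population form of the standard deviation (dividing by the number of values); adjust the divisor if the assignment specifies the sample form.

```python
import math


def average(data):
    """Return the arithmetic mean of a non-empty list."""
    return sum(data) / len(data)


def std_dev(data):
    """Return the population standard deviation of a non-empty list."""
    mean = average(data)  # computed once, before the loop
    total = 0.0
    for value in data:
        # Calling average(data) here instead would re-scan the whole list
        # on every iteration, making the function quadratic.
        total += (value - mean) ** 2
    return math.sqrt(total / len(data))
```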