Let’s check out a function to compute summary statistics of scores in a class. This function prompts the user to enter scores and then computes the max, min and average scores.
What if we wanted to compute the median, that is the “middle” value? Could we easily adapt this function to also compute the median? No. We can’t because to find the median we need to retain all of the scores in some way so we can then determine the “middle” value. Lists are a data structure we could use to store all of the scores.
What is a “data structure”? A particular way of organizing data. Different kinds of data structures have different properties that make them best suited for different kinds of computations. Data structures are such a fundamental topic in CS that we have an entire course, CS201, devoted to them.
When we talk about properties, what do we mean? We mean memory use, running time, invariants, etc. We will discuss these in more detail (and with more formality) during the semester.
Let’s introduce our first data structure: Lists.
Lists can be created with [] or with list(...).
>>> [7, 4, 3, 6, 1, 2]
[7, 4, 3, 6, 1, 2]
>>> 10 # not a list
10
>>> [10] # a list with one element
[10]
>>> l = [7, 4, 3, 6, 1, 2]
>>> l
[7, 4, 3, 6, 1, 2]
>>> type(l)
<class 'list'>
>>> list("abcd")
['a', 'b', 'c', 'd']
What happened in that example? Let’s check out the list function (really constructor):
>>> help(list)
...
list(iterable) -> new list initialized from iterable's items
Recall that a string can be treated as an ordered collection, that is, it can be iterated over (e.g., with a for loop). If we use a string as an argument, it is treated as a collection that can be converted into a list of characters.
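Since list accepts any iterable, the same conversion works for other iterables too. Here is a minimal sketch; it assumes range (not shown in these notes) as a second example of an iterable, and the variable names are only illustrative.

```python
# list() accepts any iterable, not just strings.
letters = list("abcd")            # each character becomes one element
numbers = list(range(5))          # range(5) yields 0, 1, 2, 3, 4
copy_of_letters = list(letters)   # a list is itself iterable, so this builds a new list

print(letters)           # ['a', 'b', 'c', 'd']
print(numbers)           # [0, 1, 2, 3, 4]
print(copy_of_letters)   # ['a', 'b', 'c', 'd']
```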
How can we conceptualize lists? They are sequences of values (much like strings are sequences of characters) of any type (e.g., integers, floats, strings, other lists, etc.)
What are some differences between lists and strings, though? Strings are sequences of characters only, while lists are sequences of values of any type, including other lists, and a single list can hold values of different types.
>>> a_valid_list = [1, 2.0, True, "string", [1]]
>>> a_valid_list
[1, 2.0, True, 'string', [1]]
We can apply the same indexing, slicing and membership operators we used with strings (because lists are also sequences, just like strings).
>>> list_of_strings = ["this", "is", "a", "list", "of", "strings"]
>>> list_of_strings[0]
'this'
>>> list_of_strings[0][1]
'h'
>>> list_of_strings[1].upper()
'IS'
>>> list_of_strings[1:4]
['is', 'a', 'list']
>>> list_of_strings[::2]
['this', 'a', 'of']
>>> "of" in list_of_strings
True
We can also apply operators like + and * for concatenation and repetition; that is, these operators are overloaded for lists.
>>> [1] + [2]
[1, 2]
>>> [1] * 4
[1, 1, 1, 1]
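As a small illustration of where these operators are handy, the sketch below uses * to build a list of a fixed size and + to join two lists. The variable names are made up for this example.

```python
zeros = [0] * 5            # repetition: five copies of 0
first = [1, 2]
second = [3, 4]
combined = first + second  # concatenation builds a new list

print(zeros)      # [0, 0, 0, 0, 0]
print(combined)   # [1, 2, 3, 4]
print(first)      # [1, 2]  (the operands are unchanged)
```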
The empty list is the list [].
>>> []
[]
We often use the empty list for initialization, similar to how we use the empty string.
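The initialization pattern might look like the following sketch, which mirrors how we build up a string starting from "". The function squares_up_to is a hypothetical helper invented for this example, and it assumes range is available for counting.

```python
def squares_up_to(n):
    """Return a list of the squares of 1 through n (hypothetical helper)."""
    result = []                     # start with the empty list
    for i in range(1, n + 1):
        result = result + [i * i]   # concatenate a one-element list
    return result

print(squares_up_to(4))   # [1, 4, 9, 16]
```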
for loop
Much like we iterated over the characters of a string, we can also iterate over the elements of a list with a for loop.
>>> for s in list_of_strings:
...     print(s)
...
this
is
a
list
of
strings
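Iterating over a list is also how we compute something about its elements. As a sketch, the hypothetical helper longest_string below walks the list once and keeps the longest string seen so far.

```python
def longest_string(strings):
    """Return the longest string in a list (hypothetical helper).

    Ties go to the earliest string; an empty list gives "".
    """
    longest = ""
    for s in strings:
        if len(s) > len(longest):
            longest = s
    return longest

list_of_strings = ["this", "is", "a", "list", "of", "strings"]
print(longest_string(list_of_strings))   # strings
```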
Many of the built-in functions that are defined on strings are also defined on lists:
len(L)
Returns the number of items in list L
max(L)
Returns the maximum value in list L
min(L)
Returns the minimum value in list L
sum(L)
Returns the sum of the values in list L
sorted(L)
Returns a copy of list L where the items are in order from smallest to largest (This does not mutate L.)
>>> len(list_of_strings)
6
>>> min(list_of_strings)
'a'
>>> max(list_of_strings)
'this'
>>> sum(list_of_strings)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
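Why did min and max work, but not sum? Recall that we can compare strings with relational operators, e.g., ‘<’, but when we look at the documentation for sum we see it is intended only for numeric types. By default sum starts from 0 and adds each item to it, so the first step, 0 + 'this', fails with the TypeError above.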
>>> help(sum)
Help on built-in function sum in module builtins:
sum(iterable, start=0, /)
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
>>> list_of_nums = [5, 3, 7, 9, 4, 1]
>>> sum(list_of_nums)
29
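To make the role of the start value concrete, here is a short sketch. It passes a start value to sum and, since sum rejects strings, totals the string lengths with an explicit loop instead. The names total_with_start and total_chars are illustrative only.

```python
list_of_nums = [5, 3, 7, 9, 4, 1]
list_of_strings = ["this", "is", "a", "list", "of", "strings"]

# sum adds every item to the start value, here 10 instead of the default 0.
total_with_start = sum(list_of_nums, 10)   # 10 + 29

# To total something about the strings, accumulate numbers ourselves.
total_chars = 0
for s in list_of_strings:
    total_chars = total_chars + len(s)

print(total_with_start)   # 39
print(total_chars)        # 20
```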
Lists are also objects. We can use the dir function to list the available methods.
>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
A method we will use often is sort. Recall that when we want to use help with methods we need to prefix the method name with the type, e.g.,
>>> help(list.sort)
Help on method_descriptor:
sort(...)
L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
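The key and reverse parameters in that signature could be used as in the sketch below. The variable names are illustrative, and len is passed as the key so that strings are ordered by length.

```python
descending = ["this", "is", "a", "list", "of", "strings"]
descending.sort(reverse=True)   # largest to smallest, in place
print(descending)               # ['this', 'strings', 'of', 'list', 'is', 'a']

by_length = ["this", "is", "a", "list", "of", "strings"]
by_length.sort(key=len)         # order by length; ties keep their original order (stable)
print(by_length)                # ['a', 'is', 'of', 'this', 'list', 'strings']
```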
Let’s look at some other methods, specifically pop:
>>> help(list.pop)
Help on method_descriptor:
pop(...)
L.pop([index]) -> item -- remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
So what does this do? It “pops” (removes and returns) the last element of the list, or the element at a specific index if one is given, e.g.,
>>> list_of_strings = ["this", "is", "a", "list", "of", "strings"]
>>> list_of_strings.pop()
'strings'
>>> list_of_strings
['this', 'is', 'a', 'list', 'of']
Note that pop modifies the list on which it is invoked. Also note that if you don’t assign the result of pop to a variable, the popped value is lost.
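Here is a brief sketch of both forms of pop, with the results saved in variables so they are not lost. The names last and first are just for illustration.

```python
list_of_strings = ["this", "is", "a", "list", "of", "strings"]

last = list_of_strings.pop()     # no index: removes and returns the last item
first = list_of_strings.pop(0)   # index 0: removes and returns the first item

print(last)              # strings
print(first)             # this
print(list_of_strings)   # ['is', 'a', 'list', 'of']
```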
Unlike strings, which are immutable, lists are mutable, and many of the methods, including sort and reverse, modify the list on which they are invoked (indicated by the *IN PLACE* in the documentation) instead of returning a new list (sort and reverse actually return None). We will learn more about mutability in future classes, but for now just be aware that many of the list methods modify the list.
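A common slip that follows from this is assigning the result of sort to a variable. The sketch below shows what that assignment actually stores; result is an illustrative name.

```python
list_of_nums = [5, 3, 7, 9, 4, 1]

result = list_of_nums.sort()   # sorts in place and returns None
print(result)                  # None
print(list_of_nums)            # [1, 3, 4, 5, 7, 9]

list_of_nums.reverse()         # also in place, also returns None
print(list_of_nums)            # [9, 7, 5, 4, 3, 1]
```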
The method append, which adds an item to the end of the list, is a counterpart to pop:
>>> list_of_strings.append("strings")
>>> list_of_strings
['this', 'is', 'a', 'list', 'of', 'strings']
>>> list_of_strings.sort()
>>> list_of_strings
['a', 'is', 'list', 'of', 'strings', 'this']
Compare the above to using the sorted function, which returns a sorted copy of the list. Notice that list_of_strings remains unmodified.
>>> list_of_strings = ['this', 'is', 'a', 'list', 'of', 'strings']
>>> sorted(list_of_strings)
['a', 'is', 'list', 'of', 'strings', 'this']
>>> list_of_strings
['this', 'is', 'a', 'list', 'of', 'strings']
One more method you will need for the peer instruction questions:
>>> help(list.remove)
Help on method_descriptor:
remove(...)
L.remove(value) -> None -- remove first occurrence of value.
Raises ValueError if the value is not present.
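As a quick sketch of that behavior, remove deletes only the first matching value, and checking membership with in beforehand is one way to avoid the ValueError. The example values are made up.

```python
list_of_nums = [3, 1, 3, 2]

list_of_nums.remove(3)    # removes only the first 3
print(list_of_nums)       # [1, 3, 2]

# Checking membership first avoids the ValueError for a missing value.
if 7 in list_of_nums:
    list_of_nums.remove(7)
print(list_of_nums)       # [1, 3, 2]
```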
To learn more about list methods check out the Python documentation.
Peer instruction questions (Lists) [1] (Section A, Section B)
Let’s check out an implementation of our scoring functions that uses lists. Here we read the scores into a list and perform multiple analyses on those scores, including assigning grades, which depends on having “seen” all of the scores.
Here is a more concise implementation of average using the built-in functions we saw today:
def average(scores):
    """
    Compute average of list of scores
    Args:
        scores: List of numeric scores
    Returns:
        Average of scores as a float
    """
    return sum(scores) / len(scores)
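Returning to the question from the start of these notes, once all of the scores are in a list the median becomes straightforward. The following is a sketch, not the course's implementation. It assumes a non-empty list and takes the mean of the two middle values when the number of scores is even.

```python
def median(scores):
    """
    Compute median of list of scores (a sketch, not the course's implementation)
    Args:
        scores: Non-empty list of numeric scores
    Returns:
        Middle value of the sorted scores; the mean of the two middle
        values when there is an even number of scores
    """
    ordered = sorted(scores)   # sorted returns a new list; scores is not mutated
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 4, 3, 6, 1]))      # 4
print(median([7, 4, 3, 6, 1, 2]))   # 3.5
```

Using sorted rather than the sort method keeps the caller's list in its original order.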