An opinionated summary of “major” Python types:
Type | Ordered | Mutable | Mutable Values | Typical (but not only) Usage |
---|---|---|---|---|
List | Yes | Yes | Yes | Ordered collection of variable length (often homogeneous) |
Set | No | Yes | No | Membership/Set operations |
Tuple | Yes | No | Yes | Heterogeneous (ordered) collection of fixed length |
Dictionary | Yes-ish | Yes | Yes (but not keys) | Key -> Value lookup |
What is the deal with Dictionary ordering? As of Python 3.7, dictionaries are specified to maintain their elements in insertion order. That is, unlike Sets, when you iterate through the elements of a Dictionary, the elements will be in a known order.
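A minimal sketch of what insertion order means in practice, assuming Python 3.7 or later (the dictionary and its keys are made up for illustration):

```python
# Keys come back in the order in which they were first inserted.
ordered = {}
ordered["banana"] = 3
ordered["apple"] = 1
ordered["cherry"] = 2

keys_in_order = list(ordered)
print(keys_in_order)  # ['banana', 'apple', 'cherry']

# Overwriting an existing key changes its value but not its position.
ordered["banana"] = 10
keys_after_update = list(ordered)
print(keys_after_update)  # ['banana', 'apple', 'cherry']
```

Note that the order is insertion order, not sorted order.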
A really common tool in data analysis is the histogram, typically drawn as a plot where the x-axis shows the bins and the y-axis shows the count of items in each bin. We already implemented a histogram analysis in the frequencies function in Lab 5 and in our genotype frequency example. But today let's look at an easier, more generalizable, and often faster approach to creating histograms.
On paper, determine the histogram for the following data:
[1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
How did you do it? Probably you kept a tally for each number. Each time you encountered a new number, you initialized its count at 1, and every time you encountered a previously observed number you incremented its count. That is, you were keeping track of two connected pieces of information: the number and its associated count. We could describe these as the “key” and “value”.
Dictionaries, or dicts in Python, are data structures that store keys and
associated values, optimized for efficiently looking up the value by key.
In other languages this data structure is called a (hash)map or an associative
array.
PI Question 1 (Dictionaries)1
A dictionary literal is created with { ... }, e.g.
>>> {}
{}
>>> d = { 5: 1, 6: 2 }
>>> d
{5: 1, 6: 2}
Note that { ... } is also used for sets. You indicate a dictionary with the
<key> : <value> syntax. Note that {} is an empty dictionary, not an
empty set (empty sets can be created with the set constructor, e.g. set()).
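As a quick check of the brace rules above, the following sketch (variable names are illustrative only) builds each form and inspects its type:

```python
empty_dict = {}       # braces with nothing inside: a dictionary
empty_set = set()     # an empty set needs the constructor
small_set = {5, 6}    # braces with bare elements: a set
small_dict = {5: 1}   # braces with key: value pairs: a dictionary

for obj in (empty_dict, empty_set, small_set, small_dict):
    print(type(obj).__name__)
# Prints dict, set, set, dict (one per line).
```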
In the above, the integer 5 is the key and the integer 1 is its value.
We can efficiently access values with the indexing operator, e.g.
>>> d[5]
1
>>> d[6]
2
>>> d[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 1
but we will get an error if the key is not in the dictionary. What if we aren’t
sure the key is in the dictionary and we have a default value we would want to use
instead? We can use the get method:
>>> d.get(1, 5)
5
>>> d.get(5, 6)
1
We can also use the index operator to add key-value pairs to the dictionary. Note that assigning a value to a key that does exist will overwrite the previous value; assigning a value to a key that does not exist will create that key (with that value) in the dictionary. Recall that the same is not true for a list. Assigning to an index outside the current “range” of the list is an error.
>>> d[3] = 7
>>> d
{5: 1, 6: 2, 3: 7}
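To make the contrast with lists concrete, here is a small sketch (the names counts and values are hypothetical) showing that dictionary assignment either overwrites or creates a key, while assigning past the end of a list raises an IndexError:

```python
counts = {5: 1}
counts[5] = 10   # key exists: the old value is overwritten
counts[7] = 2    # key is new: the key-value pair is added
print(counts)    # {5: 10, 7: 2}

values = [1, 2, 3]
try:
    values[3] = 4        # index 3 is outside the list's current range
    list_error = None
except IndexError as err:
    list_error = err     # keep the exception so we can inspect it
print(type(list_error).__name__)  # IndexError
```

To grow a list you would use append (or similar) instead of index assignment.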
PI Questions 2 & 3 (Dictionaries)1
Dictionary keys can be any “hashable” type (same as sets), and even mixed
types. Keys are unique, that is, you can’t have duplicate keys with different
values. For our purposes, “hashable” implies “comparable” (so uniqueness of the
keys can be enforced) and immutable, so that prior comparisons aren’t
invalidated by changing a key.
The values can be of any type, including mutable types like lists, etc. That means we can modify values in place. And a dictionary can have duplicate values.
>>> d["string"] = "test"
>>> d
{5: 1, 6: 2, 3: 7, 'string': 'test'}
>>> d["a_list"] = [1, 2, 3]
>>> d
{5: 1, 6: 2, 3: 7, 'string': 'test', 'a_list': [1, 2, 3]}
>>> d[3] += 5
>>> d
{5: 1, 6: 2, 3: 12, 'string': 'test', 'a_list': [1, 2, 3]}
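The hashability requirement on keys can be seen directly. In this sketch (the lookup dictionary and its contents are invented for illustration), an immutable tuple works as a key, a list does not, and a list stored as a value can still be modified in place:

```python
lookup = {}
lookup[("January", 1)] = "New Year's Day"  # a tuple of immutables is hashable
lookup[3.5] = "a float key"                # mixed key types are fine

try:
    lookup[[1, 2]] = "a list key"          # lists are mutable, so not hashable
    key_error = None
except TypeError as err:
    key_error = err                        # the message mentions the unhashable type
print(type(key_error).__name__)  # TypeError

# Values have no such restriction and can be mutated in place.
lookup["a_list"] = [1, 2, 3]
lookup["a_list"].append(4)
print(lookup["a_list"])  # [1, 2, 3, 4]
```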
PI Question 4 (Dictionaries)1
Much like lists and other data structures, dictionaries can be the argument to
built-in functions like len, support operators like in, and are also objects
with various methods.
>>> len(d)
5
>>> 'a_list' in d
True
>>> dir(dict)
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
Some commonly used methods:
>>> d.keys()
dict_keys([5, 6, 3, 'string', 'a_list'])
>>> d.values()
dict_values([1, 2, 12, 'test', [1, 2, 3]])
>>> d.pop("a_list")
[1, 2, 3]
>>> d
{5: 1, 6: 2, 3: 12, 'string': 'test'}
What about iteration? How can we use for loops with dictionaries? There are
actually several ways. Using a dictionary as the sequence in a for loop iterates
over the keys, e.g. the following loops are equivalent:
>>> for k in d:
...     print(k)
...
5
6
3
string
>>> for k in d.keys():
...     print(k)
...
5
6
3
string
We can then use the keys to access the associated values. We can also iterate
over the key-value tuples using the items method and tuple unpacking.
>>> help(dict.items)
Help on method_descriptor:
items(...)
D.items() -> a set-like object providing a view on D's items
>>> for i in d.items():
...     print(i)
...
(5, 1)
(6, 2)
(3, 12)
('string', 'test')
>>> for k, v in d.items():
...     print(k, "=>", v)
...
5 => 1
6 => 2
3 => 12
string => test
items returns a set-like object of the dictionary’s items, which are (key, value)
tuples. We can iterate over those tuples directly, as in the first loop, or
unpack each tuple into separate key and value variables, as in the second.
A tuple is an immutable sequence. That said, tuples are not just used as an
immutable list. Often tuples are heterogeneous structures, e.g. ("January", 1, 1970)
could represent a date, where specific slots refer to values with a consistent
meaning, e.g. month.
Tuples can be created with parentheses, e.g. (1, 2), or the named constructor,
tuple(), or often just with a comma (and no parentheses).
>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')
>>> my_tuple[0]
1
>>> my_tuple[1]
2
>>> for i in my_tuple:
...     print(i)
...
1
2
3
4
>>> my_tuple[1:3]
(2, 3)
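The other construction forms mentioned above (a bare comma and the tuple constructor) can be sketched as follows. The variable names are illustrative, and the one-element case is included because it is a common stumbling block:

```python
from_comma = 1, 2                  # the comma makes the tuple; parentheses are optional
from_constructor = tuple([1, 2])   # build a tuple from any iterable
singleton = (1,)                   # a one-element tuple needs a trailing comma
not_a_tuple = (1)                  # without the comma this is just the integer 1
empty = ()                         # the empty tuple

print(from_comma == from_constructor)  # True
print(type(singleton).__name__)        # tuple
print(type(not_a_tuple).__name__)      # int
```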
Python allows for very concise unpacking of tuples (termed “tuple unpacking”). This is the feature we used when iterating through the items of a dictionary.
>>> var1, var2, var3, var4 = another_tuple
>>> var1
'a'
>>> var4
'd'
We can use this feature to elegantly implement swap:
>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10
Recall that tuples are immutable, like strings.
>>> my_tuple[0] = 5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple.append(6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'
But that immutability only prevents changing the elements of a tuple; it does not prevent reassigning a variable “pointing to” a tuple.
>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')
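The summary table lists tuples as immutable but with mutable values. A small sketch of that distinction (the record tuple is a made-up example): the tuple's slots cannot be reassigned, but a mutable object stored in a slot can still change.

```python
record = ("scores", [90, 85])

record[1].append(77)     # allowed: we mutate the list, not the tuple
print(record)            # ('scores', [90, 85, 77])

try:
    record[1] = [100]    # not allowed: this would replace a tuple element
    slot_error = None
except TypeError as err:
    slot_error = err
print(type(slot_error).__name__)  # TypeError
```

One consequence is that a tuple containing a list is not hashable, so it could not be used as a dictionary key.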
Review histogram.py. Let’s particularly focus our attention on:
for item in data:
    if item in counts:
        counts[item] += 1 # Increment count if present
    else:
        counts[item] = 1
Notice we need to handle two different cases: 1) we have not previously seen a
value, i.e. item in counts is False, and 2) we have previously seen a value.
This is an example of a more general practice in developing our algorithms:
thinking through the different input/execution situations. That is, we ask
ourselves whether our approach will handle, for instance, even- and odd-length
strings, empty lists, inputs with dissimilar lengths, the first occurrence and
all future occurrences, etc. A key step in our design process is enumerating all
the cases our program needs to handle (and then making sure our approach does so
correctly).
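The two cases above can also be folded into a single line with the get method introduced earlier, using 0 as the default count for an unseen item. This is a minimal sketch, not the contents of histogram.py; it assumes data is the list from the paper exercise, and the comparison against collections.Counter from the standard library is only a sanity check.

```python
from collections import Counter

data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]

counts = {}
for item in data:
    # get returns the current count, or 0 the first time we see item
    counts[item] = counts.get(item, 0) + 1

print(counts)                   # {1: 3, 2: 3, 3: 2, 5: 2, 4: 2}
print(counts == Counter(data))  # True
```

Both cases are still being handled; the default value passed to get simply takes care of the first occurrence.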
What would happen if we tried the following?
d = { 1 : "one", 2 : "two", 3 : "three" }
for key in d:
    if key == 2:
        d.pop(key)
We would get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
Depending on the data structure and the operation, we may or may not get an error. Even when we don't, we will have a hard time reasoning about the behavior. It is generally not a good approach to modify a collection while iterating over it. Instead, iterate over a copy and modify the original.
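One way to apply that advice to the failing loop is to iterate over a snapshot of the keys. This is a sketch of one option among several (building a new dictionary with only the entries to keep is another):

```python
d = {1: "one", 2: "two", 3: "three"}

# list(d) copies the keys up front, so removing entries from d
# does not disturb the sequence we are iterating over.
for key in list(d):
    if key == 2:
        d.pop(key)

print(d)  # {1: 'one', 3: 'three'}
```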