Class 17: Tuples and Dictionaries

Objectives for today

Data structure categorization

An opinionated summary of “major” Python types:

| Type | Ordered | Mutable | Mutable Values | Typical (but not only) Usage |
|---|---|---|---|---|
| List | Yes | Yes | Yes | Ordered collection of variable length (often homogenous) |
| Set | No | Yes | No | Membership/Set operations |
| Tuple | Yes | No | Yes | Heterogeneous (ordered) collection of fixed length |
| Dictionary | Yes-ish | Yes | Yes (but not keys) | Key -> Value lookup |

What is the deal with Dictionary ordering? As of Python 3.7, dictionaries are specified to maintain their elements in insertion order. That is, unlike Sets, when you iterate through the elements of a Dictionary, the elements will be in a known order.
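To make the ordering guarantee concrete, here is a small sketch (the fruit names are made up for illustration). It uses the dictionary syntax we introduce below, so feel free to come back to it.

```python
# Keys come back in the order they were first inserted (guaranteed since Python 3.7).
d = {}
d["banana"] = 3
d["apple"] = 1
d["cherry"] = 2

keys_in_order = list(d)   # iterating over a dict yields its keys
print(keys_in_order)      # ['banana', 'apple', 'cherry']

# Overwriting the value of an existing key does not move the key.
d["banana"] = 10
print(list(d))            # ['banana', 'apple', 'cherry']
```

Notice the order is insertion order, not sorted order.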

Histograms: A motivating question

A really common tool in data analysis is the histogram, typically implemented as a plot where the x-axis is bins and the y-axis is the count of items in each bin. We already implemented a histogram analysis in the frequencies function in Lab 5 and in our genotype frequency example. But today let's look at an easier, more generalizable and often faster approach to creating histograms.

On paper determine the histogram for the following data:

[1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]

How did you do it? Probably you kept a tally for each number. Each time you encountered a new number, you initialized its count at 1, and every time you encountered a previously observed number you incremented its count. That is, you were keeping track of two connected pieces of information: the number and its associated count. We could describe these as the “key” and “value”.

Dictionaries, aka maps, aka associative arrays

Dictionaries, or dicts in Python, are data structures that store keys and associated values, optimized for efficiently looking up the value by key. In other languages this data structure is called a (hash)map or an associative array.

PI Question 1 (Dictionaries)1

A dictionary literal is created with { ... }, e.g.

>>> {}
{}
>>> d = { 5: 1, 6: 2 }
>>> d
{5: 1, 6: 2}

Note that { ... } is also used for sets. You indicate a dictionary with the <key> : <value> syntax. Note that {} is an empty dictionary, not an empty set (an empty set can only be created with the set constructor, i.e. set()).
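If you are ever unsure which one you have, type will tell you. A quick sketch (the variable names are just for illustration):

```python
empty_dict = {}          # braces with nothing inside: a dictionary
empty_set = set()        # an empty set needs the constructor
a_set = {5, 6}           # bare elements: a set
a_dict = {5: 1, 6: 2}    # key: value pairs: a dictionary

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>
print(type(a_set))       # <class 'set'>
print(type(a_dict))      # <class 'dict'>
```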

In the above, the integer 5 is the key and the integer 1 is its value.

We can efficiently access values with the indexing operator, e.g.

>>> d[5]
1
>>> d[6]
2
>>> d[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1

but we will get an error if the key is not in the dictionary. What if we aren’t sure the key is in the dictionary and we have a default value we would want to use instead? We can use get:

>>> d.get(1, 5)
5
>>> d.get(5, 6)
1

We can also use the index operator to add key-value pairs to the dictionary. Note that assigning a value to a key that does exist will overwrite the previous value; assigning a value to a key that does not exist will create that key (with that value) in the dictionary. Recall that the same is not true for a list. Assigning to an index outside the current “range” of the list is an error.

>>> d[3] = 7
>>> d
{5: 1, 6: 2, 3: 7}
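Here is a side-by-side sketch of the difference between assigning to a dictionary key and assigning to a list index (the list named values is made up for this example):

```python
d = {5: 1, 6: 2}
d[3] = 7      # key 3 does not exist yet, so it is created
d[5] = 100    # key 5 exists, so its value is overwritten

values = [10, 20]
list_error = None
try:
    values[2] = 30   # index 2 is past the end of the list
except IndexError as err:
    list_error = str(err)
    print("IndexError:", list_error)

values.append(30)    # lists grow with append (or extend), not by assignment
print(d)             # {5: 100, 6: 2, 3: 7}
print(values)        # [10, 20, 30]
```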

PI Questions 2 & 3 (Dictionaries)1

Dictionary keys can be any “hashable” type (same as sets), and even mixed types. Keys are unique; that is, you can’t have duplicate keys with different values. For our purposes, “hashable” implies “comparable” (so uniqueness of the keys can be enforced) and immutable (so that prior comparisons aren’t invalidated by changing a key).
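A short sketch of what that means in practice (the dictionary names here are made up, and the exact wording of the error message varies between Python versions):

```python
points = {}
points[(0, 1)] = "tuple keys are fine"   # tuples of immutable items are hashable
points["origin"] = "so are strings"      # mixed key types are allowed

key_error_message = None
try:
    points[[0, 1]] = "lists are not"     # lists are mutable, so not hashable
except TypeError as err:
    key_error_message = str(err)
    print("TypeError:", key_error_message)

# Keys are unique: repeating a key in a literal keeps only the last value.
dupes = {1: "first", 1: "second"}
print(dupes)   # {1: 'second'}
```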

The values can be of any type, including mutable types like lists, etc. That means we can modify values in place. And a dictionary can have duplicate values.

>>> d["string"] = "test"
>>> d
{5: 1, 6: 2, 3: 7, 'string': 'test'}
>>> d["a_list"] = [1, 2, 3]
>>> d
{5: 1, 6: 2, 3: 7, 'string': 'test', 'a_list': [1, 2, 3]}
>>> d[3] += 5
>>> d
{5: 1, 6: 2, 3: 12, 'string': 'test', 'a_list': [1, 2, 3]}

PI Question 4 (Dictionaries)1

Much like lists and other data structures, dictionaries can be the argument to built-in functions like len, support operators like in, and are also objects with various methods.

>>> len(d)
5
>>> 'a_list' in d
True
>>> dir(dict)
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

Some commonly used methods:

>>> d.keys()
dict_keys([5, 6, 3, 'string', 'a_list'])
>>> d.values()
dict_values([1, 2, 12, 'test', [1, 2, 3]])
>>> d.pop("a_list")
[1, 2, 3]
>>> d
{5: 1, 6: 2, 3: 12, 'string': 'test'}

What about iteration? How can we use for loops with dictionaries? There are actually several ways. Using a dictionary as the iterable in a for loop iterates over the keys, so the following loops are equivalent:

>>> for k in d:
...     print(k)
... 
5
6
3
string
>>> for k in d.keys():
...     print(k)
... 
5
6
3
string

We can then use the keys to access the associated values. We can also iterate over the key-value tuples using the items method and tuple unpacking.

>>> help(dict.items)
Help on method_descriptor:

items(...)
    D.items() -> a set-like object providing a view on D's items

>>> for i in d.items():
...     print(i)
... 
(5, 1)
(6, 2)
(3, 12)
('string', 'test')
>>> for k, v in d.items():
...     print(k, "=>", v)
... 
5 => 1
6 => 2
3 => 12
string => test

items returns a set-like object of the dictionary’s items, which are (key, value) tuples. We can iterate over those tuples directly as in the first loop, or unpack each tuple into separate key and value variables as in the second.
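As a sketch of why unpacking in the loop header is handy, here is one way to find the key with the largest value. The counts dictionary and the variable names are made up for illustration.

```python
counts = {"a": 3, "b": 5, "c": 2}   # hypothetical tallies

best_key = None
best_count = 0
for key, count in counts.items():   # each item is a (key, value) tuple
    if count > best_count:
        best_key = key
        best_count = count

print(best_key, best_count)   # b 5
```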

Tuples: An interlude

A tuple is an immutable sequence. That said, tuples are not just used as an immutable list. Often tuples are heterogeneous structures, e.g. ("January", 1, 1970) could represent a date, where specific slots refer to values with a consistent meaning, e.g. the first slot is always the month.

Tuples can be created with parentheses, e.g. (1, 2) or the named constructor, tuple(), or often just with a comma (and no parentheses).

>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')
>>> my_tuple[0]
1
>>> my_tuple[1]
2
>>> for i in my_tuple:
...     print(i)
... 
1
2
3
4
>>> my_tuple[1:3]
(2, 3)
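The different ways of creating a tuple mentioned above can be sketched as follows (the variable names are just for illustration). The one-element case is a common stumbling block: it is the comma, not the parentheses, that makes the tuple.

```python
from_parens = (1, 2)
from_comma = 1, 2                  # the comma alone creates the tuple
from_constructor = tuple([1, 2])   # tuple() accepts any iterable
single = (1,)                      # a one-element tuple needs a trailing comma
not_a_tuple = (1)                  # just the integer 1 in parentheses
empty = ()                         # the empty tuple

print(from_parens == from_comma == from_constructor)   # True
print(type(single), type(not_a_tuple))   # <class 'tuple'> <class 'int'>
print(len(empty))                        # 0
```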

Python allows for very concise unpacking of tuples (termed “tuple unpacking”). This is the feature we used when iterating through the items of a dictionary.

>>> var1, var2, var3, var4 = another_tuple
>>> var1
'a'
>>> var4
'd'

We can use this feature to elegantly implement swap:

>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10

Recall that tuples are immutable, like strings.

>>> my_tuple[0] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple.append(6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

But that immutability doesn’t prevent reassignment of a variable “pointing to” a tuple; it only prevents changing the elements of the tuple itself.

>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')

Histograms: Putting it together

Review histogram.py. Let’s particularly focus our attention on:

for item in data:
    if item in counts:
        counts[item] += 1  # Increment count if present
    else:
        counts[item] = 1   # First occurrence, start the count at 1

Notice we need to handle two different cases: 1) we have not previously seen a value, i.e. item in counts is False, and 2) we have previously seen a value. This is an example of a more general practice in developing our algorithms: thinking through the different input and execution situations. That is, we ask ourselves whether our approach will handle, for instance, even and odd length strings, empty lists, inputs with dissimilar lengths, the first occurrence and all future occurrences, etc. A key step in our design process is enumerating all the cases our program needs to handle (and then making sure our approach does so correctly).
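For reference, here is a self-contained sketch built around that loop. It is not necessarily identical to histogram.py; the function names histogram and histogram_with_get are assumptions made for this example. The second version uses the get method we saw earlier, with a default of 0, to fold the two cases into one line.

```python
def histogram(data):
    """Return a dict mapping each distinct item in data to its count."""
    counts = {}
    for item in data:
        if item in counts:
            counts[item] += 1  # Seen before: increment the count
        else:
            counts[item] = 1   # First occurrence: start the count at 1
    return counts


def histogram_with_get(data):
    """Same result; get supplies 0 when the item has not been seen yet."""
    counts = {}
    for item in data:
        counts[item] = counts.get(item, 0) + 1
    return counts


data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
print(histogram(data))           # {1: 3, 2: 3, 3: 2, 5: 2, 4: 2}
print(histogram_with_get(data))  # {1: 3, 2: 3, 3: 2, 5: 2, 4: 2}
print(histogram([]))             # {}
```

Both versions handle the empty list, since the loop body simply never runs. The keys appear in the order each value was first encountered, which is why 5 comes before 4.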

What about modifying a collection while iterating?

What would happen if we tried the following?

d = { 1 : "one", 2 : "two", 3 : "three" }
for key in d:
    if key == 2:
        d.pop(key)

We would get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

Depending on the data structure and the operation, we may or may not get an error, but either way we will have a hard time reasoning about the behavior. It is generally not a good approach to modify a collection while iterating over it. Iterate over a copy instead.
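Two ways to do that safely are sketched below (the names original and kept are made up for this example). The first iterates over a snapshot of the keys; the second leaves the original alone and builds a new dictionary with only the entries we want to keep.

```python
d = {1: "one", 2: "two", 3: "three"}

# list(d) copies the keys up front, so the loop is not iterating over d itself.
for key in list(d):
    if key == 2:
        d.pop(key)

print(d)   # {1: 'one', 3: 'three'}

# Alternative: build a new dictionary instead of modifying the original.
original = {1: "one", 2: "two", 3: "three"}
kept = {}
for key, value in original.items():
    if key != 2:
        kept[key] = value

print(kept)           # {1: 'one', 3: 'three'}
print(len(original))  # 3
```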