Lecture 11: Dictionaries and Tuples

Objectives for today

Dictionaries

A dictionary, or dict in Python, is a data structure that stores keys and associated values, optimized for efficiently looking up the value by key. In other languages this data structure is called a map or an associative array.

Peer instruction questions (Dictionaries) [1] (Section A, Section B)

A dictionary literal is created with { ... }, e.g.,

>>> {}
{}
>>> d = { 'cat': 'die Katze', 'dog': 'der Hund' }
>>> d
{'cat': 'die Katze', 'dog': 'der Hund'}

Note that { ... } is also used for sets (which we will learn about next time). What marks a dictionary literal is the <key> : <value> syntax. Note that {} is an empty dictionary, not an empty set. Another way to create an empty dictionary is with the dictionary constructor dict(). As we will see in the future, an empty set is created with the set constructor set().
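A small sketch confirming the points above: {} and dict() both produce an empty dictionary, while an empty set requires set(). The variable names are just for illustration.

```python
empty1 = {}       # curly braces with nothing inside: an empty dict
empty2 = dict()   # the dict constructor also makes an empty dict
empty3 = set()    # an empty set needs the set constructor

print(type(empty1))      # prints <class 'dict'>
print(empty1 == empty2)  # prints True
print(type(empty3))      # prints <class 'set'>
```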

In the above, the string ‘dog’ is a key and the string ‘der Hund’ is its value.

We can efficiently access values with the indexing operator, e.g.,

>>> d['cat']
'die Katze'
>>> d['dog']
'der Hund'
>>> d['bird']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'bird'

But we get a KeyError if the key is not in the dictionary. What if we aren’t sure the key is in the dictionary and have a default value we would like to use instead? We can use the get method with an optional second argument, which is returned when the key is not present (without that argument, get returns None):

>>> d.get('dog')
'der Hund'
>>> d.get('bird', '')
''
>>> d
{'cat': 'die Katze', 'dog': 'der Hund'}
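A short sketch of how get behaves with and without a default. It also shows that get only looks up a key; it never adds a missing key to the dictionary.

```python
d = {'cat': 'die Katze', 'dog': 'der Hund'}

# Without a second argument, get returns None for a missing key
print(d.get('bird'))              # prints None
# With a second argument, that value is returned instead
print(d.get('bird', 'unknown'))   # prints unknown
# get does not insert the missing key
print('bird' in d)                # prints False
```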

We can use the index operator to add key-value pairs to the dictionary. Assigning a value to a key that already exists will overwrite the previous value; assigning a value to a key that does not exist will create that key (with that value) in the dictionary. The same is not true for a list: assigning to an index outside the current “range” of the list raises an IndexError.

>>> d['bird'] = 'der Vogel'
>>> d
{'cat': 'die Katze', 'dog': 'der Hund', 'bird': 'der Vogel'}
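To contrast the two behaviors described above, here is a sketch that adds and overwrites dictionary entries by assignment, then tries the same thing on a list. The list `animals` is a made-up example.

```python
d = {'cat': 'die Katze', 'dog': 'der Hund'}
d['bird'] = 'der Vogel'   # new key: adds a key-value pair
d['dog'] = 'ein Hund'     # existing key: overwrites the old value
print(d)  # prints {'cat': 'die Katze', 'dog': 'ein Hund', 'bird': 'der Vogel'}

animals = ['cat', 'dog']
try:
    animals[2] = 'bird'   # index 2 is past the end of a 2-element list
except IndexError as err:
    print('IndexError:', err)  # prints IndexError: list assignment index out of range
animals.append('bird')    # a list grows with append instead
print(animals)            # prints ['cat', 'dog', 'bird']
```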

Dictionary keys are unique, that is, a dictionary can’t contain the same key twice with different values. Keys can be any “hashable” type, which for our purposes means immutable types such as numbers, strings, and tuples; a mutable type like a list cannot be a key.
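A sketch of both rules: a literal that repeats a key keeps only one entry (the last value wins), and a list is rejected as a key. The exact wording of the TypeError message can vary between Python versions, so it is not spelled out here.

```python
# A literal that repeats a key keeps only the last value for it
d = {'dog': 'der Hund', 'dog': 'ein Hund'}
print(d)       # prints {'dog': 'ein Hund'}
print(len(d))  # prints 1

# Immutable values such as strings, numbers, and tuples work as keys
ok = {'word': 1, 42: 2, (1970, 1, 1): 3}
print(ok[(1970, 1, 1)])  # prints 3

# A list is mutable and therefore unhashable, so it cannot be a key
rejected = False
try:
    bad = {[1, 2]: 'list key'}
except TypeError as err:
    rejected = True
    print('TypeError:', err)  # the message says 'list' is unhashable
print(rejected)  # prints True
```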

Values can be of any type, including mutable types like lists, which means we can modify values in place. A dictionary can also have duplicate values.

>>> mixed = dict()
>>> mixed['string'] = 'test'
>>> mixed[5] = 10
>>> mixed['a_list'] = [1, 2, 3]
>>> mixed
{'string': 'test', 5: 10, 'a_list': [1, 2, 3]}
>>> mixed[5] += 7
>>> mixed
{'string': 'test', 5: 17, 'a_list': [1, 2, 3]}
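Note that mixed[5] += 7 replaces the integer stored under 5 with a new integer. A true in-place modification happens when the value is itself mutable, such as the list under 'a_list'. This sketch shows that, along with two keys sharing an equal value; the key 'other' is made up for the example.

```python
mixed = {'string': 'test', 5: 17, 'a_list': [1, 2, 3]}

# The value stored under 'a_list' is a list, so we can mutate it in place
mixed['a_list'].append(4)
print(mixed['a_list'])  # prints [1, 2, 3, 4]

# Different keys may map to equal values; only keys must be unique
mixed['other'] = 17
print(list(mixed.values()))  # prints ['test', 17, [1, 2, 3, 4], 17]
```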

Much like lists and other data structures, dictionaries can be passed to built-in functions like len, support operators like in (which tests whether a key is present), and are also objects with various methods.

>>> len(mixed)
3
>>> 'a_list' in mixed
True
>>> dir(dict)
[..., 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
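One detail worth checking: in looks only at the keys. This sketch uses the translation dictionary from earlier; to search the values we have to ask for them explicitly.

```python
d = {'cat': 'die Katze', 'dog': 'der Hund'}

print('cat' in d)                 # prints True: 'cat' is a key
print('die Katze' in d)           # prints False: in does not look at values
print('die Katze' in d.values())  # prints True: search the values explicitly
print(len(d))                     # prints 2: len counts key-value pairs
```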

Some commonly used methods:

>>> mixed.keys()
dict_keys(['string', 5, 'a_list'])
>>> mixed.values()
dict_values(['test', 17, [1, 2, 3]])
>>> mixed.pop("a_list")
[1, 2, 3]
>>> mixed
{'string': 'test', 5: 17}
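Like indexing, pop raises a KeyError for a missing key, but it also accepts an optional default, much like get. A brief sketch:

```python
mixed = {'string': 'test', 5: 17}

# pop removes the key and returns its value
value = mixed.pop(5)
print(value, mixed)   # prints 17 {'string': 'test'}

# With a default, popping a missing key returns the default instead of failing
print(mixed.pop('a_list', None))  # prints None; mixed is unchanged
try:
    mixed.pop('a_list')           # no default: KeyError
except KeyError as err:
    print('KeyError:', err)       # prints KeyError: 'a_list'
```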

We can use a for loop to iterate over a dictionary in several ways.

Using a dictionary as the sequence in a for loop iterates over its keys, which we can then use to access the associated values. The following loops are equivalent:

>>> for k in d:
...     print(k)
... 
cat
dog
bird

>>> for k in d.keys():
...     print(k)
... 
cat
dog
bird
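A sketch of using each key to reach its value, plus the values method for when we only need the values. Both follow the dictionary’s insertion order.

```python
d = {'cat': 'die Katze', 'dog': 'der Hund', 'bird': 'der Vogel'}

# Iterating yields keys; index with each key to reach its value
for k in d:
    print(k, d[k])
# prints:
# cat die Katze
# dog der Hund
# bird der Vogel

# values() iterates over just the values, in the same order
for v in d.values():
    print(v)
```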

We can also iterate over the key-value pairs using the items method:

>>> help(dict.items)
Help on method_descriptor:

items(...)
    D.items() -> a set-like object providing a view on D's items

>>> for i in d.items(): 
...     print(i)
... 
('cat', 'die Katze')
('dog', 'der Hund')
('bird', 'der Vogel')

>>> for k,v in d.items(): 
...     print(k, "=>", v)
... 
cat => die Katze
dog => der Hund
bird => der Vogel

items returns a set-like view of the dictionary’s items, which are (key, value) tuples. We can iterate over those tuples directly, as in the first loop, or unpack each one into separate key and value variables, as in the second.
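As a small, hypothetical application of unpacking items, we could build a German-to-English dictionary by swapping each pair. This sketch assumes the values are unique and hashable; if two keys shared a value, the later pair would overwrite the earlier one.

```python
d = {'cat': 'die Katze', 'dog': 'der Hund', 'bird': 'der Vogel'}

german_to_english = {}
for english, german in d.items():        # unpack each (key, value) tuple
    german_to_english[german] = english  # the old value becomes the new key

print(german_to_english['der Hund'])  # prints dog
```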

Histograms: A motivating question

A really common tool in data analysis is the histogram, typically drawn as a plot where the x-axis shows the bins and the y-axis shows the count of items in each bin. We already implemented a histogram analysis in the frequencies function in Lab 5, but today let’s look at an alternative, easier approach to creating histograms.

On paper determine the histogram for the following data:

[1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]

How did you do it? Probably you kept a tally for each number. Each time you encountered a new number, you initialized its count at 1, and every time you encountered a previously observed number you incremented its count. That is, you were keeping track of two connected pieces of information, the number and its associated count. We could describe these as the “key” and “value”, respectively.

Review histogram.py.

Let’s particularly focus our attention on:

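data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
counts = {}  # maps each item to the number of times it has appeared
for item in data:
    if item in counts:
        counts[item] += 1  # increment count if present
    else:
        counts[item] = 1   # first occurrence: start the count at 1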

Notice we need to handle two different cases: 1) we have not previously seen a value, i.e., item in counts is False, and 2) we have previously seen a value. This is an example of a more general practice in developing algorithms: thinking through the different input and execution situations. That is, we ask ourselves whether our approach will handle, for instance, even- and odd-length strings, empty lists, inputs with different lengths, the first occurrence and all later occurrences, etc. A key step in our design process is enumerating all the cases our program needs to handle (and then making sure our approach handles each one correctly).

Here is an improved loop body that uses get instead of the entire if statement:

for item in data:
    counts[item] = counts.get(item, 0) + 1
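histogram.py itself is not reproduced in these notes, so here is a minimal sketch of how the get-based loop might be packaged, not necessarily how histogram.py is organized. The function names histogram and print_histogram are hypothetical. The printer uses sorted so the bins appear in numeric order rather than insertion order.

```python
def histogram(data):
    """Return a dict mapping each distinct item in data to its count."""
    counts = {}
    for item in data:
        # get returns the current count, or 0 the first time we see item
        counts[item] = counts.get(item, 0) + 1
    return counts

def print_histogram(counts):
    """Print one row per bin, in sorted order, with a '*' per occurrence."""
    for value in sorted(counts):
        print(value, '*' * counts[value])

counts = histogram([1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5])
print(counts)  # prints {1: 3, 2: 3, 3: 2, 5: 2, 4: 2}
print_histogram(counts)
# prints:
# 1 ***
# 2 ***
# 3 **
# 4 **
# 5 **
```

Thinking about the cases again: for an empty list the loop body never runs, so histogram([]) returns an empty dictionary, which seems like the right answer.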

Do not modify a collection while iterating over it

What would happen if we tried the following?

d = { 1 : "one", 2 : "two", 3 : "three" }
for key in d:
    if key == 2:
        d.pop(key)

We would get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

It is generally not a good idea to modify a collection while iterating over it. Instead, iterate over a copy (for example, list(d) makes a list of the keys) or build a new collection.
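Two common ways to follow that advice, sketched on the same dictionary: loop over a snapshot of the keys, or leave the original untouched and build a filtered copy. The names original and filtered are illustrative.

```python
d = {1: "one", 2: "two", 3: "three"}

# Option 1: loop over a snapshot of the keys; list(d) copies them up front,
# so removing entries from d does not disturb the loop
for key in list(d):
    if key == 2:
        d.pop(key)
print(d)  # prints {1: 'one', 3: 'three'}

# Option 2: leave the original alone and build a new dictionary
original = {1: "one", 2: "two", 3: "three"}
filtered = {}
for key, value in original.items():
    if key != 2:
        filtered[key] = value
print(filtered)  # prints {1: 'one', 3: 'three'}
```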

Tuples

A tuple is an immutable sequence. Often tuples are heterogeneous structures, e.g., ("January", 1, 1970) could represent a date, where each slot holds a value with a consistent meaning, e.g., the first slot is the month.

Tuples can be created with parentheses, e.g., (1, 2), with the constructor tuple(), or even with just commas (and no parentheses).

>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')
>>> my_tuple[0]
1
>>> my_tuple[1]
2
>>> for i in my_tuple:
...     print(i)
... 
1
2
3
4
>>> my_tuple[1:3]
(2, 3)
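The session above only uses parentheses. This sketch shows the other creation forms, plus one classic pitfall: a one-element tuple needs a trailing comma, because parentheses around a single value are just grouping.

```python
from_constructor = tuple([1, 2, 3])  # tuple() converts another sequence
commas_only = 1, 2, 3                # commas alone create a tuple
single = (5,)                        # a one-element tuple needs a trailing comma
not_a_tuple = (5)                    # just the integer 5 in parentheses
empty = ()                           # an empty tuple, same as tuple()

print(from_constructor == commas_only)  # prints True
print(type(single), type(not_a_tuple))  # prints <class 'tuple'> <class 'int'>
print(len(empty))                       # prints 0
```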

Python allows for very concise unpacking of tuples (termed “tuple unpacking”). This is the feature we used when iterating through the items of a dictionary.

>>> var1, var2, var3, var4 = another_tuple
>>> var1
'a'
>>> var4
'd'
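Unpacking is especially readable with heterogeneous tuples like the date described earlier, since each variable name documents what its slot means. The number of variables must match the tuple’s length; the exact ValueError message varies by Python version, so it is not quoted here.

```python
date = ("January", 1, 1970)   # the date tuple from the start of this section
month, day, year = date       # one variable per slot, left to right
print(month, year)            # prints January 1970

# Too few variables for the tuple's length raises ValueError
try:
    month, day = date
except ValueError as err:
    print('ValueError:', err)  # the message reports too many values to unpack
```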

We can use this feature to swap two variables elegantly:

>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10
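The swap works because Python evaluates the entire right-hand side, building the tuple (10, 20), before assigning anything on the left. The parentheses are optional, and x, y = y, x is the more common spelling. The version with an explicit temporary variable shows the bookkeeping the tuple saves us:

```python
x, y = 10, 20
x, y = y, x        # right side becomes (20, 10) first, then is unpacked
print(x, y)        # prints 20 10

# The same swap with an explicit temporary variable
a, b = 10, 20
temp = a
a = b
b = temp
print(a, b)        # prints 20 10
```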

Note that tuples are immutable, like strings.

>>> my_tuple[0] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

>>> my_tuple.append(6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

Recall, though, that immutability doesn’t prevent reassigning the variable to a different tuple, e.g.,

>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')
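Immutability also applies only to the tuple itself: we cannot change which objects a tuple holds, but if one of those objects is mutable, such as a list, that object can still be modified. The tuple record is a made-up example.

```python
record = ("scores", [90, 85])

record[1].append(77)   # mutates the list stored inside the tuple
print(record)          # prints ('scores', [90, 85, 77])

try:
    record[1] = [100]  # but the tuple's slot itself cannot be reassigned
except TypeError as err:
    print('TypeError:', err)  # prints TypeError: 'tuple' object does not support item assignment
```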

Data structure categorization

A brief summary of major Python types:

Type        Ordered   Mutable   Mutable Values           Typical (but not only) Usage
List        Yes       Yes       Yes                      Ordered homogeneous collection
Dictionary  Yes-ish   Yes       Yes (values, not keys)   Key -> Value lookup
Tuple       Yes       No        Yes                      Heterogeneous (ordered) collection
Set         No        Yes       No                       Membership/Set operations

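What is the deal with dictionary ordering? As of Python 3.7, dictionaries are guaranteed to maintain their elements in insertion order, so when you iterate through a dictionary, the elements appear in the order in which they were added. It is only “Yes-ish” because that order is insertion order, not sorted order, and you cannot access items by position: d[0] looks up the key 0, not the first item.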
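A sketch of what insertion order means in practice: updating an existing key keeps its position, while removing a key and adding it back moves it to the end. Sorting has to be requested explicitly.

```python
d = {'cat': 'die Katze', 'dog': 'der Hund'}
d['bird'] = 'der Vogel'      # a new key goes at the end
d['cat'] = 'eine Katze'      # updating an existing key keeps its position
print(list(d))               # prints ['cat', 'dog', 'bird']

d.pop('cat')                 # removing and re-adding a key moves it to the end
d['cat'] = 'die Katze'
print(list(d))               # prints ['dog', 'bird', 'cat']

# Order is insertion order, not sorted order
print(sorted(d))             # prints ['bird', 'cat', 'dog']
```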

Summary

  1. Dictionaries
  2. Tuples
  3. Data structure comparison

Resources

  1. histogram.py
  2. Python documentation for dictionary
  3. Python documentation for tuple

Supplemental Reading (optional)