A dictionary, or dict in Python, is a data structure that stores keys and their
associated values, optimized for efficiently looking up a value by its key.
In other languages this data structure is called a map or an associative
array.
Peer instruction questions (Dictionaries) [1] (Section A, Section B)
A dictionary literal is created with { ... }, e.g.,
>>> {}
{}
>>> d = { 'cat': 'die Katze', 'dog': 'der Hund' }
>>> d
{'cat': 'die Katze', 'dog': 'der Hund'}
Note that { ... } is also used for sets (which we will learn about next time).
You indicate a dictionary with the <key> : <value> syntax. Note that {} is an
empty dictionary, not an empty set. Another way to create an empty dictionary
is with the dictionary constructor dict(). As we will see in the future, empty
sets can be created with the set constructor set().
In the above, the string ‘dog’ is a key and the string ‘der Hund’ is its value.
We can efficiently access values with the indexing operator, e.g.,
>>> d['cat']
'die Katze'
>>> d['dog']
'der Hund'
>>> d['bird']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'bird'
but we will get a KeyError if the key is not in the dictionary. What if we aren’t
sure the key is in the dictionary and we have a default value we would want to use
instead? We can use the get method, whose optional second parameter is returned
when the key is not present (without it, get returns None for a missing key).
Note that get does not modify the dictionary:
>>> d.get('dog')
'der Hund'
>>> d.get('bird', '')
''
>>> d
{'cat': 'die Katze', 'dog': 'der Hund'}
We can use the index operator to add key-value pairs to the dictionary. Assigning a value to a key that already exists will overwrite the previous value; assigning a value to a key that does not exist will create that key (with that value) in the dictionary. Note that the same is not true for a list. Assigning to an index outside the current “range” of the list is an error.
>>> d['bird'] = 'der Vogel'
>>> d
{'cat': 'die Katze', 'dog': 'der Hund', 'bird': 'der Vogel'}
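The contrast between dictionaries and lists can be sketched as a short script. This is only an illustration: the names words and list_error are made up for this example, and the upper-casing is just a convenient way to show an overwrite.

```python
# Assigning to a missing key creates it; assigning to an existing key overwrites.
d = {'cat': 'die Katze', 'dog': 'der Hund'}
d['bird'] = 'der Vogel'          # 'bird' is not yet a key, so it is added
d['bird'] = d['bird'].upper()    # 'bird' now exists, so its value is replaced
print(d['bird'])                 # DER VOGEL

# A list does not grow on assignment: index 2 is out of range for 2 items.
words = ['die Katze', 'der Hund']   # hypothetical list for comparison
try:
    words[2] = 'der Vogel'
    list_error = None
except IndexError as err:
    list_error = str(err)
print(list_error)                # list assignment index out of range

# To grow a list we use append instead.
words.append('der Vogel')
print(words)                     # ['die Katze', 'der Hund', 'der Vogel']
```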
Dictionary keys are unique, that is, you can’t have duplicate keys with different values. They can be any “hashable” type, which for our purposes implies “comparable”.
The values can be of any type, including mutable types like lists, etc. That means we can modify values in place. And a dictionary can have duplicate values.
>>> mixed = dict()
>>> mixed['string'] = 'test'
>>> mixed[5] = 10
>>> mixed['a_list'] = [1, 2, 3]
>>> mixed
{'string': 'test', 5: 10, 'a_list': [1, 2, 3]}
>>> mixed[5] += 7
>>> mixed
{'string': 'test', 5: 17, 'a_list': [1, 2, 3]}
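As a rough sketch of what "hashable" keys and mutable values mean in practice, the script below tries a tuple and a list as keys and then modifies a list value in place. The names lookup, inventory and message are invented for this illustration.

```python
# Immutable types such as strings, numbers and tuples can be used as keys.
lookup = {}
lookup[('cat', 'German')] = 'die Katze'   # a tuple key works

# A list is mutable and therefore not hashable, so it cannot be a key.
try:
    lookup[['cat', 'German']] = 'die Katze'
    message = None
except TypeError as err:
    message = str(err)   # the message mentions an unhashable type
print(message)

# Values have no such restriction, and mutable values can be changed in place.
inventory = {'a_list': [1, 2, 3]}
inventory['a_list'].append(4)
print(inventory)         # {'a_list': [1, 2, 3, 4]}
```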
Much like lists and other data structures, dictionaries can be the argument to
built-in functions like len, support operators like in, and are also objects
with various methods.
>>> len(mixed)
3
>>> 'a_list' in mixed
True
>>> dir(dict)
[..., 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
Some commonly used methods:
>>> mixed.keys()
dict_keys(['string', 5, 'a_list'])
>>> mixed.values()
dict_values(['test', 17, [1, 2, 3]])
>>> mixed.pop("a_list")
[1, 2, 3]
>>> mixed
{'string': 'test', 5: 17}
We can use a for loop to iterate over a dictionary in several ways.
Using a dictionary as the sequence in a for loop iterates over its keys, which we can then use to access the associated values. The following loops are equivalent:
>>> for k in d:
...     print(k)
...
cat
dog
bird
>>> for k in d.keys():
...     print(k)
...
cat
dog
bird
We can also iterate over the key-value pairs using the items method:
>>> help(dict.items)
Help on method_descriptor:
items(...)
    D.items() -> a set-like object providing a view on D's items
>>> for i in d.items():
...     print(i)
...
('cat', 'die Katze')
('dog', 'der Hund')
('bird', 'der Vogel')
>>> for k,v in d.items():
...     print(k, "=>", v)
...
cat => die Katze
dog => der Hund
bird => der Vogel
items returns a set-like object of dictionary items, which are (key, value)
pairs. We can iterate over those pairs directly, as in the first loop, or
unpack each pair into separate key and value variables, as in the second.
A really common tool in data analysis is the histogram, typically implemented as
a plot where the x-axis shows the bins and the y-axis shows the count of items in
each bin. We already implemented a histogram analysis in the frequencies function
in Lab 5. But today let’s look at an alternate, easier approach to creating
histograms.
On paper, determine the histogram for the following data:
[1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
How did you do it? Probably you kept a tally for each number. Each time you encountered a new number, you initialized its count at 1, and every time you encountered a previously observed number you incremented its count. That is, you were keeping track of two connected pieces of information, the number and its associated count. We could describe these as the “key” and “value”, respectively.
Review histogram.py.
Let’s particularly focus our attention on:
for item in data:
    if item in counts:
        counts[item] += 1  # increment count if present
    else:
        counts[item] = 1   # first occurrence: start the count at 1
Notice we need to handle two different cases: 1) we have not previously seen a
value, i.e., item in counts is False, and 2) we have previously seen a value.
This is an example of a more general practice in developing our algorithms:
thinking through the different input and execution situations. That is, we ask
ourselves whether our approach will handle, for instance, even and odd length
strings, empty lists, inputs with dissimilar lengths, the first occurrence and
all future occurrences, etc. A key step in our design process is enumerating all
the cases our program needs to handle (and then making sure our approach does so
correctly).
Here is an improved loop that uses get in place of the entire if statement:
for item in data:
    counts[item] = counts.get(item, 0) + 1
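To see the get-based loop run end to end, here is a minimal sketch wrapped in a function and applied to the data from the paper exercise. The function name histogram is an assumption made for this example and may not match what histogram.py actually defines.

```python
def histogram(data):
    """Return a dictionary mapping each item in data to its count."""
    counts = {}
    for item in data:
        # get returns 0 the first time we see item, so no if/else is needed
        counts[item] = counts.get(item, 0) + 1
    return counts


data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
counts = histogram(data)
print(counts)  # {1: 3, 2: 3, 3: 2, 5: 2, 4: 2}
```

The keys appear in the order in which each number was first encountered, which is why 5 is listed before 4.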
What would happen if we tried the following?
d = { 1 : "one", 2 : "two", 3 : "three" }
for key in d:
    if key == 2:
        d.pop(key)
We would get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
It is generally not a good approach to modify a collection while iterating over it. Instead, iterate over a copy, or build a new collection.
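Two possible ways around this error are sketched below: iterating over a snapshot of the keys, or building a new dictionary with only the entries we want to keep. The names original and filtered are made up for this illustration.

```python
# Option 1: iterate over a copy of the keys, so removing from d is safe.
d = {1: "one", 2: "two", 3: "three"}
for key in list(d):        # list(d) is a separate list of the current keys
    if key == 2:
        d.pop(key)
print(d)                   # {1: 'one', 3: 'three'}

# Option 2: leave the original untouched and build a new dictionary.
original = {1: "one", 2: "two", 3: "three"}
filtered = {}
for key, value in original.items():
    if key != 2:
        filtered[key] = value
print(filtered)            # {1: 'one', 3: 'three'}
```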
A tuple is an immutable sequence. Often tuples are heterogeneous structures,
e.g., ("January", 1, 1970) could represent a date, where specific
slots refer to values with a consistent meaning, e.g., month.
Tuples can be created with parentheses, e.g., (1, 2), with the named constructor
tuple(), or even with just a comma (and no parentheses).
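The different ways of creating a tuple can be compared in a short script. The variable names here are chosen only for this sketch. Note that it is the comma, not the parentheses, that makes a tuple, which matters for the one-element case.

```python
with_parens = (1, 2)          # parentheses and a comma
without_parens = 1, 2         # just a comma
from_list = tuple([1, 2])     # the tuple constructor applied to a list
print(with_parens == without_parens == from_list)  # True

single = (1,)                 # a one-element tuple needs a trailing comma
not_a_tuple = (1)             # without the comma this is just the int 1
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int

empty = ()                    # the empty tuple; tuple() gives the same result
print(len(empty))             # 0
```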
>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')
>>> my_tuple[0]
1
>>> my_tuple[1]
2
>>> for i in my_tuple:
...     print(i)
...
1
2
3
4
>>> my_tuple[1:3]
(2, 3)
Python allows for very concise unpacking of tuples (termed “tuple unpacking”). This is the feature we used when iterating through the items of a dictionary.
>>> var1, var2, var3, var4 = another_tuple
>>> var1
'a'
>>> var4
'd'
We can use this feature to elegantly implement swap:
>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10
Note that tuples are immutable, like strings.
>>> my_tuple[0] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple.append(6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'
Recall though that immutability doesn’t prevent reassignment, e.g.,
>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')
A brief summary of major Python types:
Type | Ordered | Mutable | Mutable Values | Typical (but not only) Usage |
---|---|---|---|---|
List | Yes | Yes | Yes | Ordered homogeneous collection |
Dictionary | Yes-ish | Yes | Yes (values, not keys) | Key -> Value lookup |
Tuple | Yes | No | Yes | Heterogeneous (ordered) collection |
Set | No | Yes | No | Membership/Set operations |
What is the deal with dictionary ordering? As of Python 3.7, dictionaries are specified to maintain their elements in insertion order, i.e., the order in which the keys were first added. So when you iterate through the elements of a dictionary, the elements will be in a known order.
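A small experiment, assuming Python 3.7 or later, illustrates what insertion order does and does not mean. The variable names are invented for this sketch.

```python
d = {}
d['cat'] = 'die Katze'
d['dog'] = 'der Hund'
d['bird'] = 'der Vogel'
order_after_inserts = list(d)
print(order_after_inserts)       # ['cat', 'dog', 'bird']

# Overwriting the value of an existing key does not move the key.
d['cat'] = d['cat'].upper()
order_after_overwrite = list(d)
print(order_after_overwrite)     # ['cat', 'dog', 'bird']

# Removing a key and adding it again places it at the end.
value = d.pop('cat')
d['cat'] = value
order_after_reinsert = list(d)
print(order_after_reinsert)      # ['dog', 'bird', 'cat']

# Order is not part of dictionary equality.
same_contents = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
print(same_contents)             # True
```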