Class 11

Tuples and the Python memory model

Objectives for today

Explain when tuples are used
Describe tuples as an immutable ordered collection of objects of arbitrary type
Create a tuple
Explain and use functions and operators on tuples
Use tuples to return multiple values from a function
Describe the Python memory model
Explain the behavior of references
Differentiate between mutable and immutable types
Predict how immutable and mutable values referenced by variables will be affected by assignments, operators, functions and methods

Tuples

A tuple is an immutable sequence. That said, tuples are not just used as a immutable list. Often tuples are heterogeneous structures, e.g. ("January", 1, 1970) could represent a date, where specific slots refer to values with a consistent meaning, e.g. month.

Tuples can be created with parentheses, e.g. (1, 2) or the named constructor, tuple(), or often just with a comma (and no parentheses).

>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')
>>> my_tuple[0]
1
>>> my_tuple[1]
2
>>> for i in my_tuple:
...     print(i)
... 
1
2
3
4
>>> my_tuple[1:3]
(2, 3)

Python allows for very concise unpacking of tuples into individual variables (termed “tuple unpacking”). This is the feature we used when iterating through the items of a dictionary.

>>> var1, var2, var3, var4 = another_tuple
>>> var1
'a'
>>> var4
'd'

We can use this feature to elegantly implement swap:

>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10

Recall that tuples are immutable, like strings.

>>> my_tuple[0] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple.append(6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

But as with strings, that immutability doesn’t prevent reassignment of a variable “pointing to” a tuple, just changing the elements of a tuple.

>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')

Histograms: Putting it together

Review histogram.py. Let’s particularly focus our attention on:

for item in data:
    if item in counts:
        counts[item] += 1  # Increment count if present
    else:
        counts[item] = 1

Notice we need to handle two different cases:

we have not previously seen a value, i.e. item in counts is False, and
we have previously seen a value.

A (key) part of developing and implementing an algorithm is enumerating all the cases our program needs to handle (and then making sure our approach does so correctly). That is asking ourselves will our approach handle for instance: even and odd length strings, empty lists, inputs with dissimilar lengths, the first occurrence and all future occurrences, etc.

What is going on in the following loop?

for key, count in counts.items():
    # Arguments to print are automatically converted to strings, the following is
    # equivalent to print(str(key) + "\t" + str(counts[key]))
    print(key, count, sep="\t")

Recall from last time that the items method returns an iterable of (key, value) tuples, i.e., each iteration of the above loop with assign the next key-value pair in the loop variable as a tuple. By specified multiple variables as the loop variables, we leverage tuple unpacking to automatically split that tuple into its individual parts, assigning each part to a different variable.

Modifying a collection while iterating?

As we work more and more with mutable data structures, we may encounter (or be tempted to create) a situation in which we modify a collection while iterating through it. For example, what would happen if we tried the following? We get an error!

d = { 1 : "one", 2 : "two", 3 : "three" }
for key in d:
    if key == 2:
        d.pop(key)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 2
      1 d = { 1 : "one", 2 : "two", 3 : "three" }
----> 2 for key in d:
      3     if key == 2:
      4         d.pop(key)

RuntimeError: dictionary changed size during iteration

Depending on the data structure and the operation, we may or may not get an error. However, we will have a hard time reasoning about the behavior. It is generally not a good approach to modify a collection while iterating. Make a copy instead, something we talk more about next!

Objects and references

Almost every value in Python is actually an object. ints, floats, etc. are objects, just like strings are objects. For example, here are all the methods available for for integers:

x= -10
dir(x)

['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_count',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 'real',
 'to_bytes']

When we create a new object, we are allocating memory for that object’s data (e.g. the integer above). When we assign that object to a variable that variable is now a reference to that object. Let’s look at the memory model for the following code. When we reassign a variable we are changing the object the variable “points to”. When we assign one variable to another, e.g. y=x, we are setting both variables to point to the same object in memory. Changing these pointers does not change the underlying objects or result in “copying” objects.

Mutability

We saw that lists are very similar to strings in many respects. But one important difference is mutability. In Python strings are immutable, i.e., they can’t be modified. Python lists are mutable. Let’s see the impact of mutability.

a = [1, 2, 3]
b = a
a[1] = 4
a

[1, 4, 3]

a has the value we would expect, but notice that b has also changed!

[1, 4, 3]

Let’s draw out the memory model. Notice that changes to the list “pointed to” by a are reflected in b, since they both reference the same underlying list object. As we continue in the semester we will learn how to take advantage of mutability. In the meantime we need to know about the interaction between mutability and aliasing. In the above code, a and b point to the same underlying list. When we modify that list via a or b, the changes are reflected in both variables. We would say a and b are aliased.

Parameters as references

Aliasing can occur with function parameters as well.

def aliasing(param):
    param[1] = 4
a = [1, 2, 3]
aliasing(a)
a

[1, 4, 3]

Again, let’s check out the memory model. Since strings, integers, floats, etc. are also objects, why don’t we have the same aliasing problems? Recall that strings are immutable. So are integers, etc. There are no operations on integers that modify the underlying objects, all methods and operators create new integers.

As a more general example, consider the following function:

def my_function(a, b):
    return a+b

Recall that a and b are the parameters, specifically “formal parameters”. When we call my_function we:

Evaluate the arguments (or “actual parameters”) left to right
The arguments are bound to the function’s formal parameters (similar to assignment). The formal parameters are effectively new variables that are references the same objects as the actual parameters
Execute the body of the function

When the arguments are immutable, the multiple references to the same objects don’t matter. But as we saw earlier for mutable objects, like lists, function calls create the possibility for aliasing. Modifications applied via the function parameters are reflected in other variables that are references to the same underlying object.

Consider another example:

def my_function(a):
    a = [0]*5
    a[0] = 6
x = [1, 2, 3, 4, 5]
my_function(x)
x

[1, 2, 3, 4, 5]

Let’s check out the memory model. In my_function, why isn’t x modified? The statement a = [0] * 5 we are creating a new list object, different than that originally pointed to by a (and x), assigning it to a. Any subsequent change to a effects this new object and not the list pointed to be x.

Shallow vs. deep copies

Why does Python implement variable and parameter assignment via references, termed “shallow copies”? Performance. We don’t need to copy the entire object. Often we just want to “use” the value, not modify it.

What if we need a “deep copy”? That is to perform variable assignment without the potential for aliasing For lists we can use slicing. Slices of a list are not shallow copies.

>>> x = [1, 2, 3, 4, 5]
>>> y = x[0:2]
>>> y
[1, 2]
>>> y[0] = 6
>>> y
[6, 2]
>>> x
[1, 2, 3, 4, 5]
>>> y = x[:]
>>> y[4] = 12
>>> y
[1, 2, 3, 4, 12]
>>> x
[1, 2, 3, 4, 5]

Note that slices aren’t truly deep copies. If the list contains mutable values, e.g. other lists, those values are not deep copied. Check out the the memory model for the following example:

>>> x = [1, 2, [3, 4], 5, 6]
>>> y = x[:]
>>> y[2][0] = 7
>>> y[3] = 8
>>> y
[1, 2, [7, 4], 8, 6]
>>> x
[1, 2, [7, 4], 5, 6]

For truly deep copies, i.e., of arbitrary nested lists, you will want to use the copy module.

A deeper example

Consider the following two functions:

def insert_after(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list and a copy of n2 after each occurrence of n1.
    
    (list[int], int, int) -> list of int
    
    >>> insert_after([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """

def insert_after2(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list[int], int, int) -> NoneType

    >>> x = [3, 4, 5]
    >>> insert_after2(x, 3, 10) 
    >>> x
    [3, 10, 4, 5]
    """

How would the implementations differ? The first will need to create a copy of the list to be returned. The second would directly modify its argument. How might you approach the implementations differently?

Show possible implementations

def insert_after(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list,
    plus a copy of n2 after each occurrence of n1.
    
    (list of int, int, int) -> list of int
    
    >>> insert_after([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """
    new_list = []
    for element in a_list:
        new_list.append(element)
        if element == n1:
            new_list.append(n2)
    return new_list

def insert_after2(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list of int, int, int) -> NoneType

    >>> x = [3, 4, 5]
    >>> insert_after2(x, 3, 10) 
    >>> x
    [3, 10, 4, 5]
    """
    i=0
    while i < len(a_list):
        if a_list[i] == n1:
            a_list.insert(i+1, n2)
            i += 1
        i += 1

Why a while loop in the second function? Recall our note last time about changing a collection while iterating through it with a for loop. Since we are modifying the list, the indices are changing and so we need (re)evaluate the stopping conditional every iteration.