Class 18: References

Objectives for today

Objects and references

Almost every value in Python is actually an object. Integers, floats, etc. are objects, just like strings are objects. For example, here are all the attributes and methods available for integers:

>>> x = -10
>>> dir(x)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
>>> help(x)
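
Because integers are objects, the names listed by dir can be used with ordinary dot syntax. A small sketch follows; the particular names used are just illustrative picks from the list above.

```python
x = -10

# Operators and built-ins are shorthand for method calls on the object.
absolute = x.__abs__()        # same result as abs(x)
total = x.__add__(3)          # same result as x + 3

# Ordinary (non-underscore) methods and attributes work the same way.
bits = x.bit_length()         # bits needed to represent the absolute value of x
real_part = x.real            # an attribute, not a method, so no parentheses

print(absolute, total, bits, real_part)
```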

When we create a new object, we are allocating memory for that object’s data (e.g. the integer above). When we assign that object to a variable, that variable becomes a reference to that object. Let’s look at the memory model for the following code:

z = 2
x = 1
y = x
x = z

When we reassign a variable we are changing the object the variable “points to”. When we assign one variable to another, e.g. y=x, we are setting both variables to point to the same object in memory. Changing these pointers does not change the underlying objects or result in “copying” objects.
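
One way to check this model without drawing it is Python’s is operator, which tests whether two names reference the same object. The sketch below reuses the four assignments above; the extra variable names are only there to record the results.

```python
z = 2
x = 1
y = x                      # y and x now reference the same object

shared_before = y is x     # both names point at the object for 1

x = z                      # rebind x; the object y references is untouched

shared_after = y is x      # x has moved on, y has not
x_follows_z = x is z       # x and z now reference the same object

print(shared_before, shared_after, x_follows_z, y)
```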

Mutability

We saw that lists are very similar to strings in many respects. But one important difference is mutability. In Python strings are immutable, i.e., they can’t be modified. Python lists are mutable. Let’s see the impact of mutability.

>>> a = [1, 2, 3]
>>> b = a
>>> a[1] = 4
>>> a
[1, 4, 3]
>>> b
[1, 4, 3]

Let’s draw out the memory model. Notice that changes to the list “pointed to” by a are reflected in b, since they both reference the same underlying list object.

As we continue in the semester we will learn how to take advantage of mutability. In the meantime we need to know about the interaction between mutability and aliasing. In the above code, a and b point to the same underlying list. When we modify that list via a or b, the changes are reflected in both variables. We would say a and b are aliased.
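
Aliasing is about identity, not equality. As a minimal sketch, the is operator distinguishes an alias from a separate list that merely holds equal values (the list c is an addition for contrast, not part of the example above):

```python
a = [1, 2, 3]
b = a              # alias: a second name for the same list
c = [1, 2, 3]      # a separate list that happens to hold equal values

a[1] = 4           # modify the shared list through a

print(a is b)      # True
print(b)           # [1, 4, 3]
print(a is c)      # False
print(c)           # [1, 2, 3]
```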

Aliasing can occur with function parameters as well.

>>> def aliasing(param):
...     param[1] = 4
... 
>>> a = [1, 2, 3]
>>> aliasing(a)
>>> a
[1, 4, 3]

Let’s check out the memory model.

Since strings, integers, floats, etc. are also objects, why don’t we have the same aliasing problems? Recall that strings are immutable. So are integers, etc. There are no operations on integers that modify the underlying objects; all methods and operators create new integers.
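
A short sketch of why immutable objects are safe to share: every "change" produces a new object and rebinds one name, so the other name never sees it. The variable names here are illustrative.

```python
s = "hello"
t = s                  # t and s reference the same string

s = s.upper()          # upper() returns a new string; s is rebound to it

n = 5
m = n
n += 1                 # creates a new integer and rebinds n

# Attempting an in-place change to a string is an error.
try:
    t[0] = "H"
    mutation_allowed = True
except TypeError:
    mutation_allowed = False

print(s, t, n, m, mutation_allowed)
```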

Parameters as references

Let’s consider the function below:

def my_function(a, b):
    return a+b

Recall that a and b are the parameters, specifically “formal parameters”. When we call my_function we:

  1. Evaluate the arguments (or “actual parameters”) left to right
  2. Bind the arguments to the function’s formal parameters (similar to assignment). The formal parameters are effectively new variables that reference the same objects as the actual parameters
  3. Execute the body of the function
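
The steps above can be observed directly. In this sketch, note and identity_of are hypothetical helpers written only for the demonstration: the first records the order in which argument expressions are evaluated, the second reports the identity of the object its parameter is bound to.

```python
evaluation_order = []

def note(label, value):
    """Hypothetical helper: record when an argument expression is evaluated."""
    evaluation_order.append(label)
    return value

def my_function(a, b):
    return a + b

def identity_of(param):
    """Hypothetical helper: return the identity of the object bound to param."""
    return id(param)

# Step 1: the arguments are evaluated left to right before the call.
result = my_function(note("first", 1), note("second", 2))

# Step 2: the parameter is bound to the caller's object, not to a copy.
data = [1, 2, 3]
bound_by_reference = identity_of(data) == id(data)

print(evaluation_order, result, bound_by_reference)
```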

When the arguments are immutable, the multiple references to the same objects don’t matter. But as we saw earlier for mutable objects, like lists, function calls create the possibility for aliasing. Modifications applied via the function parameters are reflected in other variables that are references to the same underlying object.

Consider another example:

>>> def my_function(a):
...     a = [0]*5
...     a[0] = 6
... 
>>> x = [1, 2, 3, 4, 5]
>>> my_function(x)
>>> x
[1, 2, 3, 4, 5]

Let’s check out the memory model. In my_function, why isn’t x modified? With the statement a = [0]*5 we are creating a new list object, different from the one originally pointed to by a (and x), and assigning it to a. Any subsequent change to a affects this new object and not the list pointed to by x.
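
To make the contrast concrete, here is a sketch with two hypothetical functions. rebind repeats the example above; mutate uses slice assignment (a[:] = ...), which is not covered in these notes but replaces the contents of the existing list instead of creating a new one.

```python
def rebind(a):
    a = [0] * 5        # a now references a brand-new list
    a[0] = 6           # changes the new list only

def mutate(a):
    a[:] = [0] * 5     # slice assignment replaces the contents of the existing list
    a[0] = 6           # changes the list the caller also references

x = [1, 2, 3, 4, 5]
rebind(x)              # x is unaffected

y = [1, 2, 3, 4, 5]
mutate(y)              # y is modified in place

print(x)
print(y)
```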

Shallow vs. deep copies

Why does Python implement variable and parameter assignment via references? Performance. Assignment copies only the reference, so we don’t need to copy the entire object. Often we just want to “use” the value, not modify it.

What if we need an actual copy? That is, to perform variable assignment without the potential for aliasing?

For lists we can use slicing. A slice is a new list object (a “shallow copy”), not another reference to the original list.

>>> x = [1, 2, 3, 4, 5]
>>> y = x[0:2]
>>> y
[1, 2]
>>> y[0] = 6
>>> y
[6, 2]
>>> x
[1, 2, 3, 4, 5]
>>> y = x[:]
>>> y[4] = 12
>>> y
[1, 2, 3, 4, 12]
>>> x
[1, 2, 3, 4, 5]

Note that slices aren’t truly deep copies. A slice copies only the outer list; if the list contains mutable values, e.g. other lists, those values are shared rather than copied. Check out the memory model for the following example:

>>> x = [1, 2, [3, 4], 5, 6]
>>> y = x[:]
>>> y[2][0] = 7
>>> y[3] = 8
>>> y
[1, 2, [7, 4], 8, 6]
>>> x
[1, 2, [7, 4], 5, 6]

For truly deep copies, you will need the copy module, specifically its deepcopy function.
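
As a minimal sketch, here is the previous example repeated with copy.deepcopy from the standard library in place of the slice. The nested list is now copied too, so the change to y[2] no longer shows up in x.

```python
import copy

x = [1, 2, [3, 4], 5, 6]
y = copy.deepcopy(x)   # copies the outer list and the nested list

y[2][0] = 7
y[3] = 8

print(y)               # [1, 2, [7, 4], 8, 6]
print(x)               # [1, 2, [3, 4], 5, 6]
print(x[2] is y[2])    # False
```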

PI Questions (References)

Starting to think swap

For the code below, what are the values after invoking swap:

>>> def swap(a, b):
...     temp = a
...     a = b
...     b = temp
... 
>>> x = 10
>>> y = 20
>>> swap(x, y)
>>> x
10
>>> y
20

That was a surprising result. Let’s visualize the memory model. Inside swap, the assignments only rebind the local names a and b; x and y still reference their original objects. So can we not implement swap in Python? We can, using the tuple data structure. Tuples are sequences of a fixed length (unlike lists, which can grow or shrink) and can easily be “packed” and “unpacked”. Here we use tuple unpacking on the left-hand side to effect the swap.

>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10

This code first constructs a tuple that points to the previous values of x and y and then unpacks that tuple, assigning to y and x. It is effectively the same as

x = 10
y = 20
temp = (x, y)
y = temp[0]
x = temp[1]
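
The same packing and unpacking also works across a function call. In this sketch, swapped and swap_items are hypothetical helpers: the first returns its arguments in reverse order for the caller to unpack, the second swaps two elements of a list in place, which works because lists are mutable.

```python
def swapped(a, b):
    """Hypothetical helper: return the two arguments in reverse order."""
    return (b, a)

def swap_items(a_list, i, j):
    """Hypothetical helper: swap two elements of a list in place."""
    a_list[i], a_list[j] = a_list[j], a_list[i]

x = 10
y = 20
(x, y) = swapped(x, y)     # the caller rebinds its own names

values = [1, 2, 3]
swap_items(values, 0, 2)   # the caller's list is modified

print(x, y, values)
```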

A deeper example

Consider the following two functions:

def insert_after(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list and a copy of n2 after each occurrence of n1.
    
    (list[int], int, int) -> list of int
    
    >>> insert_after([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """

and

def insert_after2(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list[int], int, int) -> NoneType

    >>> x = [3, 4, 5]
    >>> insert_after2(x, 3, 10) 
    >>> x
    [3, 10, 4, 5]
    """

How would the implementations differ? The first will need to create a copy of the list to be returned. The second would directly modify its argument. Here are possible implementations:

def insert_after(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list,
    plus a copy of n2 after each occurrence of n1.
    
    (list of int, int, int) -> list of int
    
    >>> insert_after([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """
    new_list = []
    for element in a_list:
        new_list.append(element)
        if element == n1:
            new_list.append(n2)
    return new_list

def insert_after2(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list of int, int, int) -> NoneType

    >>> x = [3, 4, 5]
    >>> insert_after2(x, 3, 10) 
    >>> x
    [3, 10, 4, 5]
    """
    i = 0
    while i < len(a_list):
        if a_list[i] == n1:
            a_list.insert(i + 1, n2)
            i += 1  # skip over the element we just inserted
        i += 1

Why a while loop in the second function? Since we are modifying the list, its length and indices change as we go, so we need to re-evaluate the stopping condition on every iteration.
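
To see the difference, compare the while loop with a for loop over range(len(a_list)). The function insert_after_for below is a hypothetical and intentionally flawed variant: its range is computed once from the original length, so the loop stops before reaching elements that were shifted to the right by an insertion.

```python
def insert_after2(a_list, n1, n2):
    """The while-loop version from above."""
    i = 0
    while i < len(a_list):           # length is re-checked every iteration
        if a_list[i] == n1:
            a_list.insert(i + 1, n2)
            i += 1                   # skip over the element we just inserted
        i += 1

def insert_after_for(a_list, n1, n2):
    """Hypothetical, intentionally flawed variant using a for loop."""
    for i in range(len(a_list)):     # range is fixed at the original length
        if a_list[i] == n1:
            a_list.insert(i + 1, n2)

x = [3, 4, 3]
insert_after2(x, 3, 10)

y = [3, 4, 3]
insert_after_for(y, 3, 10)           # never reaches the second 3

print(x)
print(y)
```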