Lecture 13: References

Objectives for today

Objects and references

Every value in Python is an object. Integers, floats, etc., are objects, just as strings are. For example, here are all the attributes (mostly methods) available on an integer; the exact list varies between Python versions:

>>> x = -10
>>> dir(x)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
>>> help(x)
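A quick way to confirm that integers are ordinary objects is to call methods on one directly. This is only an illustrative sketch. bit_length and __abs__ both appear in the dir(x) listing above, and the other values checked are arbitrary examples.

```python
x = -10

# type() reports the class of the object x references
print(type(x))                    # <class 'int'>

# Integer methods are called with the usual dot syntax
print(x.bit_length())             # 4 (bits needed for abs(x) == 10)
print(x.__abs__() == abs(x))      # True: abs() delegates to int's __abs__

# Floats, strings, lists, and even functions are objects too
for value in [3.5, "abc", [1, 2], len]:
    print(isinstance(value, object))  # True for every value
```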

When we create a new object, we allocate memory for that object’s data (e.g., the integer above). When we assign that object to a variable, that variable becomes a reference to that object. Let’s look at the memory model for the following code:

z = 2
x = 1
y = x
x = z

When we reassign a variable, we change which object the variable “points to”. When we assign one variable to another, e.g., y = x, both variables point to the same object in memory. Changing these pointers does not change the underlying objects: after x = z, y still refers to the object 1.
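The is operator lets us check the memory model from code: a is b is True exactly when both names reference the same object. This sketch replays the four assignments above. The printed checks are illustrative additions, not part of the original example.

```python
z = 2
x = 1
y = x            # y now references the same object as x
print(y is x)    # True

x = z            # rebind x to the object z references
print(x is z)    # True
print(y is x)    # False: y was not affected by rebinding x
print(y)         # 1
```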

Mutability

We saw that lists are very similar to strings in many respects, but one important difference is mutability. In Python, strings are immutable, i.e., they can’t be modified, while lists are mutable. Let’s see the impact of mutability. Notice that changes to the list “pointed to” by a are reflected in b, since both reference the same underlying list object.

>>> a = [1, 2, 3]
>>> b = a
>>> a[1] = 4
>>> a
[1, 4, 3]
>>> b
[1, 4, 3]

Let’s draw out the memory model.

In the above code, a and b point to the same underlying list. When we modify that list via a or b, the changes are reflected in both variables. We would say a and b are aliased.
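Aliasing is about identity, not equality. In the sketch below, a and b are aliases. The list c is an illustrative addition: a separate list that merely holds equal values. Only a sees the changes made through b.

```python
a = [1, 2, 3]
b = a            # b is an alias: same list object as a
c = [1, 2, 3]    # a different list object with equal contents

print(a is b)    # True: same object
print(a is c)    # False: different objects
print(a == c)    # True: equal values

b.append(4)      # mutate the shared list through b
print(a)         # [1, 2, 3, 4]
print(c)         # [1, 2, 3]
```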

Aliasing can occur with function parameters as well.

>>> def aliasing(param):
...     param[1] = 4
... 
>>> a = [1, 2, 3]
>>> aliasing(a)
>>> a
[1, 4, 3]

Let’s check out the memory model.

Since strings, integers, floats, etc., are also objects, why don’t we have the same aliasing problems? Recall that strings are immutable, and so are integers, floats, etc. No operation on an integer modifies the underlying object; operators and methods return new objects instead. Even x += 1 on an integer creates a new integer and rebinds x to it.
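We can see this with a string and an integer. Operations that look like modifications actually build new objects and rebind the variable, so an alias keeps referencing the original. Trying to assign to a character of a string raises a TypeError. The variable names here are illustrative.

```python
s = "abc"
t = s            # t and s reference the same string
s += "d"         # builds a new string "abcd" and rebinds s to it
print(s, t)      # abcd abc

u = t.upper()    # string methods return new strings
print(t, u)      # abc ABC

n = 5
m = n
n += 1           # creates a new int 6 and rebinds n; m still references 5
print(n, m)      # 6 5

try:
    t[0] = "x"   # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)
```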

Parameters as references

Let’s consider the function below:

def my_function(a, b):
    return a+b

Recall that a and b are the parameters, specifically the “formal parameters”. When we call my_function, we:

  1. Evaluate the arguments (or “actual parameters”).
  2. Bind the arguments to the function’s formal parameters (similar to assignment). The formal parameters are effectively new variables that reference the same objects as the actual parameters.
  3. Execute the body of the function.
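Step 2 can be observed directly: inside the function, the parameter references the very same object that was passed in, so its id matches. The helper name param_id below is hypothetical and was chosen just for this check.

```python
def param_id(param):
    # Return the identity of the object the formal parameter references
    return id(param)

x = [1, 2, 3]
print(param_id(x) == id(x))    # True: no copy was made when binding param

n = 1000
print(param_id(n) == id(n))    # True: immutable arguments are bound the same way
```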

When the arguments are immutable, having multiple references to the same objects doesn’t matter, since those objects can’t be changed. But, as we saw earlier, for mutable objects like lists, function calls create the possibility of aliasing: modifications made via the function parameters are reflected in every other variable that references the same underlying object.

For example, consider the following:

def my_function(a, b):
    a += b
    return True

Passing lists as arguments, we see that this code changes the list that the argument x (bound to a) references:

>>> x = [1,2,3]
>>> y = [4,5]
>>> my_function(x, y)
True
>>> x
[1, 2, 3, 4, 5]
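The in-place behavior comes from += on lists, which extends the existing list object (much like a.extend(b)). Writing a = a + b instead builds a new list and rebinds only the local name a, so the caller’s list is unchanged. The function names below are hypothetical and are used only to contrast the two forms.

```python
def extend_in_place(a, b):
    a += b        # list += extends the existing list object in place
    return True

def concat_new(a, b):
    a = a + b     # + builds a new list; only the local name a is rebound
    return True

x = [1, 2, 3]
extend_in_place(x, [4, 5])
print(x)          # [1, 2, 3, 4, 5]

w = [1, 2, 3]
concat_new(w, [4, 5])
print(w)          # [1, 2, 3]
```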

Consider another example:

>>> def my_function(a):
...     a = [0]*5   # variable a now points to a new object
...     a[0] = 6
... 
>>> x = [1, 2, 3, 4, 5]
>>> my_function(x)
>>> x
[1, 2, 3, 4, 5]

Let’s check out the memory model.

In my_function, why isn’t x modified? The statement a = [0] * 5 creates a new list object, different from the one originally pointed to by a (and x), and assigns it to a. Any subsequent change made via a affects this new object, not the list x references.
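The two cases are easier to compare side by side. Rebinding the parameter leaves the caller’s list alone, while mutating the object the parameter references changes it. The names rebind and mutate are illustrative.

```python
def rebind(a):
    a = [0] * 5   # a now references a new list; the caller's list is untouched
    a[0] = 6

def mutate(a):
    a[0] = 6      # modifies the list object the caller passed in

x = [1, 2, 3, 4, 5]
rebind(x)
print(x)          # [1, 2, 3, 4, 5]

mutate(x)
print(x)          # [6, 2, 3, 4, 5]
```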

Shallow vs. deep copies

Why does Python implement variable and parameter assignment via references instead of copying the object? Performance. We don’t need to copy the entire object, and often we just want to “use” the value, not modify it.

What if we need an actual copy instead, i.e., to perform assignment without the potential for aliasing?

For lists, we can use slicing. A slice creates a new list object (a “shallow copy”) rather than another reference to the original list, so assigning to an element of the slice doesn’t affect the original.

>>> x = [1, 2, 3, 4, 5]
>>> y = x[0:2]
>>> y
[1, 2]
>>> y[0] = 6
>>> y
[6, 2]
>>> x
[1, 2, 3, 4, 5]
>>> y = x[:]
>>> y[4] = 12
>>> y
[1, 2, 3, 4, 12]
>>> x
[1, 2, 3, 4, 5]
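Slicing with x[:] is not the only way to copy a list. In this sketch, list(x), the list method x.copy(), and copy.copy from the standard library copy module each produce a new list with the same elements. The code checks that each result equals x but is a different object.

```python
import copy

x = [1, 2, 3, 4, 5]
copies = [x[:], list(x), x.copy(), copy.copy(x)]

for c in copies:
    # Each copy has equal contents but is a distinct list object
    print(c == x, c is x)   # True False

copies[0][0] = 99           # modifying one copy leaves x untouched
print(x)                    # [1, 2, 3, 4, 5]
```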

Note that slices aren’t truly deep copies: the new list holds references to the same element objects as the original. If the list contains mutable values, e.g., other lists, those values are not copied. Consider the memory model for the following example:

>>> x = [1, 2, [3, 4], 5, 6]
>>> y = x[:]
>>> y[2][0] = 7
>>> y[3] = 8
>>> y
[1, 2, [7, 4], 8, 6]
>>> x
[1, 2, [7, 4], 5, 6]

For truly deep copies, which also copy nested mutable values, use the deepcopy function from the copy module.
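Here is a minimal sketch using copy.deepcopy on the nested list from the previous example. The slice shares the inner list with x. The deep copy gets its own copy of the inner list, so changes to it do not reach x.

```python
import copy

x = [1, 2, [3, 4], 5, 6]
shallow = x[:]             # new outer list, but shares the inner [3, 4]
deep = copy.deepcopy(x)    # new outer list and a new inner list

print(shallow[2] is x[2])  # True
print(deep[2] is x[2])     # False

deep[2][0] = 7             # only affects the deep copy's inner list
print(x)                   # [1, 2, [3, 4], 5, 6]
print(deep)                # [1, 2, [7, 4], 5, 6]
```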

Starting to think about swap

For the code below, what are the values of x and y after invoking swap?

>>> def swap(a, b):
...     temp = a
...     a = b
...     b = temp
... 
>>> x = 10
>>> y = 20
>>> swap(x, y)
>>> x
10
>>> y
20

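What is happening here? Let’s visualize the memory model. Inside swap, a, b, and temp are local variables. Reassigning them only changes which objects those local names reference, so x and y in the caller are unaffected. So can we not implement swap in Python? Recall that we can swap two variables using tuple assignment: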

>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10
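Given what we know about references, here are two swap variants that do work. Both function names are hypothetical. swap_values returns the values in the other order and lets the caller rebind its own variables. swap_elements swaps two positions in a list, and it succeeds because it mutates the list object that the caller also references.

```python
def swap_values(a, b):
    # Return the values in swapped order; the caller rebinds its own names
    return b, a

def swap_elements(a_list, i, j):
    # Swap two positions in place; works because lists are mutable
    a_list[i], a_list[j] = a_list[j], a_list[i]

x = 10
y = 20
x, y = swap_values(x, y)
print(x, y)      # 20 10

nums = [10, 20, 30]
swap_elements(nums, 0, 2)
print(nums)      # [30, 20, 10]
```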

A deeper example

For those interested, this optional section provides an example contrasting modifying a list parameter versus returning a new copy of a list.

Consider the following two functions:

def insert_after(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list of int, int, int) -> NoneType
    """

and

def insert_after2(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list,
    plus a copy of n2 after each occurrence of n1.
    
    (list of int, int, int) -> list of int
    
    >>> insert_after2([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """

How would the implementations differ? The first modifies its parameter directly. The second needs to build a new list to return, leaving a_list unchanged. Why use a while loop in the first function below? We insert into the list as we go, so its length changes, and we need to re-evaluate the stopping condition on every iteration. After an insertion, the function also increments i an extra time to skip over the newly inserted n2.

def insert_after(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list of int, int, int) -> NoneType
    """
    i=0
    while i < len(a_list):
        if a_list[i] == n1:
            a_list.insert(i+1, n2)
            i += 1
        i += 1

def insert_after2(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list,
    plus a copy of n2 after each occurrence of n1.
    
    (list of int, int, int) -> list of int
    
    >>> insert_after2([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """
    new_list = []
    for element in a_list:
        new_list.append(element)
        if element == n1:
            new_list.append(n2)
    return new_list
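To see why insert_after uses a while loop, consider a hypothetical for-loop variant, written here only to show the problem. range(len(a_list)) is computed once, before any insertions, so elements pushed past the original length are never examined.

```python
def insert_after_for(a_list, n1, n2):
    """Hypothetical for-loop variant of insert_after (buggy on purpose)."""
    # range(len(a_list)) is fixed before the loop starts, so insertions
    # shift later elements beyond the range and they are never checked.
    for i in range(len(a_list)):
        if a_list[i] == n1:
            a_list.insert(i + 1, n2)

nums = [3, 4, 3]
insert_after_for(nums, 3, 10)
print(nums)   # [3, 10, 4, 3]  (the second 3 was missed)
```

On the same input, the while-loop insert_after above produces [3, 10, 4, 3, 10], because it re-checks len(a_list) on every iteration.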

Summary

  1. Objects and references
  2. Mutability and aliasing
  3. Shallow and deep copies
      1. Python copy module
      2. Python memory models: example 1, example 2, example 3, example 4, example 5, example 6, Question 1, Question 2

Supplemental Reading (optional)