Class 6

Sequences, including methods

Objectives for today

Use sequence functions on strings and lists
Use sequence, string and list methods to transform those objects
Differentiate between mutable and immutable types
Predict how immutable and mutable values referenced by variables will be affected by assignments, operators, functions and methods
Practice implementing functions that loop over strings

Sequence operations

We noted last time that strings and lists are both sequences, that is, both are instances of the sequence abstract data type (ADT). As a result, they both support a number of common operators and functions. From the Python documentation:

Operation	Result
`s + t`	the concatenation of `s` and `t`
`s * n` or `n * s`	equivalent to adding `s` to itself `n` times
`s[i]`	i^th item of `s`, origin 0
`s[i:j]`	slice of `s` from `i` to `j`
`s[i:j:k]`	slice of `s` from `i` to `j` with step `k`
`len(s)`	length of `s`
`min(s)`	smallest item of `s`
`max(s)`	largest item of `s`
`sorted(s)`	a new list with the items from `s` in sorted ascending order

For example:

list_of_strings = ["this", "is", "a", "list", "of", "strings"]
len(list_of_strings)
min(list_of_strings)
max(list_of_strings)

'a'

'this'
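As a quick sketch, the same operators from the table work on both a string and a list. The values and the names word and items are chosen here purely for illustration.

```python
# Illustrative values; the names `word` and `items` are assumptions for this sketch.
word = "python"
items = [3, 1, 2]

# Concatenation and repetition
print(word + "!")             # python!
print(items + [4])            # [3, 1, 2, 4]
print(items * 2)              # [3, 1, 2, 3, 1, 2]

# Indexing and slicing
print(word[0])                # p
print(word[1:4])              # yth
print(items[::2])             # [3, 2]

# Functions
print(len(word), len(items))  # 6 3
print(min(word), max(items))  # h 3
print(sorted(word))           # ['h', 'n', 'o', 'p', 't', 'y']
print(sorted(items))          # [1, 2, 3]
```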

These functions can be applied to different kinds of sequences, i.e., both strings and lists. Some of the functions have additional requirements for the sequence. For example, we can only apply max, min and sorted to sequences whose elements are comparable (i.e., of types that can be compared to each other). Try max(["a", 1]). What happens and why? We get a TypeError because Python can’t compare a string ("a") and a number. Another example, not shown above, is sum. As its name suggests, sum computes the sum of all the elements in the sequence. It can only be applied to sequences of numeric types, i.e., it can’t be used with strings.

help(sum)

Help on built-in function sum in module builtins:

sum(iterable, /, start=0)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.
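A small sketch of these requirements in action. The try/except blocks are used only so the sketch runs to completion instead of stopping at the first error, and the variable names are ours.

```python
# A list mixing a string and a number cannot be ordered, so max raises a TypeError
try:
    max(["a", 1])
    comparison_failed = False
except TypeError:
    comparison_failed = True
print(comparison_failed)   # True

# sum works on a sequence of numbers ...
total = sum([1, 2, 3])
print(total)               # 6

# ... but rejects a sequence of strings
try:
    sum(["a", "b"])
    sum_failed = False
except TypeError:
    sum_failed = True
print(sum_failed)          # True
```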

Sequences as objects

String methods

In the examples above, we invoked a specific function that can take different values (perhaps even of different types) as an argument. An alternate approach is to invoke different functions on a specific value, i.e., invoke a method on an object. We can think of a method as a function specific to a particular type. Methods are invoked or called with the object.method syntax. For example:

s = "Hi CSCI0146"
s.index("C")
s.count("0")

This approach to organizing our program is called “object-oriented programming” (OOP), because the data and the functions are contained within/associated with objects. And for the purposes of this class all objects of a specific type, e.g., all string objects, have the same set of methods.

Methods are very similar to the functions we have been working with: they can accept additional parameters (in addition to the object on which they were invoked, also termed the “receiver”) and you can obtain their docstrings with help. But notice that when asking for help on a method we need to specify either the object (receiver) itself or the type (class) of the object on which we are invoking the method.

help(s.index)
help(str.index)
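To make the role of the receiver concrete, here is a minimal sketch showing that a method reached through the type takes the receiver as an explicit first argument. We would normally write the first form; the second is shown only for illustration.

```python
s = "Hi CSCI0146"

# Method call syntax: the receiver comes before the dot
print(s.index("C"))        # 3
print(s.count("0"))        # 1

# The same methods accessed through the type, passing the receiver explicitly
print(str.index(s, "C"))   # 3
print(str.count(s, "0"))   # 1
```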

So how can we learn more about the methods for an object or type?

Python documentation, e.g. for string methods
dir function
help function (as we just saw)

The dir function will return a list of the names of all the attributes, including all the methods, available for an object, say a string.

dir(s)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

What do you notice? We see the index and count methods we noted earlier, but also many more. The methods with “double underscores” (sometimes called “dunder” methods) are special methods used by Python’s internal tools for implementing operators (e.g., __add__ is used by +) and other built-in functions. Most of the other methods are specific to strings, that is, they extend the sequence ADT with functionality relevant to strings but not other sequences. For example, as their names suggest, upper and lower return a copy of the string converted to upper and lower case respectively.

s = "Hi CS146"
s.lower()
s.upper()

'hi cs146'

'HI CS146'
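As a sketch of the relationship between operators and “dunder” methods, the snippet below calls __add__ directly (something we would not normally do) and then filters the output of dir down to the names without double underscores. The name public_names is ours.

```python
s = "Hi CS146"

# The + operator on strings is implemented by the __add__ method
print(s + "!")            # Hi CS146!
print(s.__add__("!"))     # Hi CS146!

# Keep only the names that do not start with a double underscore
public_names = []
for name in dir(s):
    if not name.startswith("__"):
        public_names.append(name)
print(public_names[:3])   # ['capitalize', 'casefold', 'center']
```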

Combining method calls

How can we apply a sequence of method calls? Recall that we can use the return value from one function call as the argument to another, e.g.

from random import randint
len(range(randint(1, 5)))

The equivalent pattern for method calls would be the following. You can see why this is often called “chaining”…

s.lower().capitalize()

'Hi cs146'

What about the receiver? The receiver can be any expression; it is not required to be a variable or literal. For example:

("gI".replace("g","h") + " CS" + str(292//2)).swapcase()

'Hi cs146'

Here we are invoking the swapcase method on the expression inside the parentheses. The expression inside the parentheses is evaluated first to produce the string "hI CS146", and then the swapcase method is invoked on that string to produce the final result. A good strategy for evaluating these types of expressions is to work step-by-step, keeping in mind the order in which Python evaluates expressions. In general, Python evaluates expressions from left to right while observing operator precedence (PEMDAS). Remember that a parenthesized expression is itself a (single) expression.
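One way to practice the step-by-step strategy is to give each intermediate value its own name. The step variables here are ours, introduced only to expose the intermediate results.

```python
# Evaluate the chained expression one step at a time
step1 = "gI".replace("g", "h")   # 'hI'
step2 = str(292 // 2)            # '146'
step3 = step1 + " CS" + step2    # 'hI CS146'
step4 = step3.swapcase()         # 'Hi cs146'
print(step4)
```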

List methods

A list has a similar set of methods, including index and count shared with other sequences. But it also has its own set of specialized methods.

dir(list)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Let’s look at some of these methods, starting with pop:

help(list.pop)

Help on method_descriptor:

pop(self, index=-1, /)
    Remove and return item at index (default last).
    
    Raises IndexError if list is empty or index is out of range.

So what does this do? It “pops” (removes and returns) the last element of the list; we can optionally specify the index of a different element to pop, e.g.:

list_of_strings = ["this", "is", "a", "list", "of", "strings"]
list_of_strings.pop()
list_of_strings

'strings'

['this', 'is', 'a', 'list', 'of']

Notice that pop modifies the list on which it is invoked. Also note that if you don’t assign the result of pop to a variable, the removed value is lost forever. That is, unlike strings, which are immutable, lists are mutable, and many list methods, including sort and reverse, modify the list on which they are invoked instead of returning a new list (sort and reverse actually return None).
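A short sketch of both points, using a separate list named words (our name) so the running example is left alone: keeping the popped value, and the common mistake of assigning the result of sort.

```python
words = ["this", "is", "a", "list", "of", "strings"]

# Keep the popped value by assigning it to a variable
first = words.pop(0)      # remove and return the item at index 0
print(first)              # this
print(words)              # ['is', 'a', 'list', 'of', 'strings']

# sort modifies the list in place and returns None
result = words.sort()
print(result)             # None
print(words)              # ['a', 'is', 'list', 'of', 'strings']
```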

Is there a counterpart to pop? Yes: append.

list_of_strings.append("strings")
list_of_strings

['this', 'is', 'a', 'list', 'of', 'strings']

list_of_strings.sort()
list_of_strings

['a', 'is', 'list', 'of', 'strings', 'this']

Compare the above to using the sorted function, which returns a new sorted list. Notice that when using sorted, the list_of_strings argument remains unmodified.

list_of_strings = ['this', 'is', 'a', 'list', 'of', 'strings']
sorted(list_of_strings)
list_of_strings

['a', 'is', 'list', 'of', 'strings', 'this']

['this', 'is', 'a', 'list', 'of', 'strings']

One more method you will need for the in-class questions:

help(list.remove)

Help on method_descriptor:

remove(self, value, /)
    Remove first occurrence of value.
    
    Raises ValueError if the value is not present.
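A minimal sketch of remove, using a hypothetical list named letters. The try/except is there only so the sketch keeps running after the error.

```python
letters = ["a", "b", "a", "c"]

# Only the first matching item is removed
letters.remove("a")
print(letters)            # ['b', 'a', 'c']

# Removing a value that is not present raises a ValueError
try:
    letters.remove("z")
    removed = True
except ValueError:
    removed = False
print(removed)            # False
```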

To learn more about list methods check out the Python documentation.

(Im)mutability

s = "Bruce"
t = s.lower()
t
s

'bruce'

'Bruce'

Unlike lists, strings in Python are immutable, that is they can’t be changed. The string methods that look like they are changing the string are actually creating a new string object. Check out the Python Tutor memory model picture for the example above.

We can see the same immutability property in the examples below (the Python Tutor memory model picture):

s = "test"
s.upper()
s

'TEST'

'test'

a = "hi"
b = a
a = "bye"
b

'hi'

This is something we will talk about more as the course progresses. For the moment we want to be aware that some types, like strings, are immutable and others, like lists, are mutable. For the former, methods don’t change the object on which they are invoked, instead the method creates a new object. For the latter, some methods will modify the object, others will not.
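As a preview, here is a small sketch contrasting the two cases. The list example is ours and is only meant to illustrate what “modifying the object” looks like when two variables refer to the same list.

```python
# Strings: a method call returns a new object, the original is unchanged
s = "test"
t = s.upper()
print(s, t)            # test TEST

# Lists: two names can refer to the same list object
a = [1, 2]
b = a                  # b refers to the same list as a
a.append(3)            # modifies that one shared list
print(b)               # [1, 2, 3]
print(a is b)          # True

# Reassigning a name does not modify the object it used to refer to
a = [9]
print(b)               # [1, 2, 3]
```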

Building up strings

Many of our functions will use a similar pattern in which we build up a new string piece-by-piece by appending characters to a result string (initialized as the empty string, ""). We saw some examples of this pattern previously and now want to implement another example. Specifically, we want to write a function named password_gen that takes a single parameter length and generates a random password (as a string) of length characters.

How could we implement our password generator?


There are many ways, but a simple one is to use randint to index into a string of allowed characters. More formally, this is an example of sampling with replacement, that is every time we sample an item from a set of potential items, e.g. the letters, we replace it in the set so it could be sampled again in the future. An alternative is “sampling without replacement”, in which each item can only be selected once. For this application, why would sampling with replacement be preferred?
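To make the two kinds of sampling concrete, here is a rough sketch over a tiny, hypothetical set of letters. The sample function from the random module is used for the “without replacement” case; it is not something we rely on elsewhere in these notes.

```python
from random import randint, sample

LETTERS = "abc"   # a tiny, hypothetical set of items to sample from

# Sampling with replacement: each draw picks from the full set,
# so the same letter can appear more than once
with_replacement = ""
for i in range(5):
    with_replacement = with_replacement + LETTERS[randint(0, len(LETTERS) - 1)]
print(with_replacement)       # 5 letters drawn from only 3, so some must repeat

# Sampling without replacement: each letter can be picked at most once,
# so we cannot draw more items than the set contains
without_replacement = sample(LETTERS, 3)
print(without_replacement)    # the 3 letters in a random order, as a list
```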

As always we want to solve this problem in several steps, instead of trying to tackle the whole problem at once. What are some possible intermediate steps? As an example, I would start by defining a constant CHARS with the allowed characters.


Define a constant CHARS with the allowed characters
Create a version of password_gen to create a string of the specified length with a fixed character
Enhance password_gen to create a string with random characters

Now let’s implement those steps.


Define a constant CHARS with the allowed characters

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

Create a version of password_gen to create a string of the specified length with a fixed character

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    result = ""
    for i in range(length):
        result = result + CHARS[0]
        # We could also write this as
        # result += CHARS[0]
        # result += ... is shorthand for result = result + ...
    return result

Enhance password_gen to create a string with random characters

from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    result = ""
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result

Recall that randint has an inclusive end, and so to not exceed the length of CHARS we need to use len(CHARS)-1 as the end argument.

And finally add the finishing touches, e.g. docstrings.

from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    """
    Generate a random password

    Args:
        length: number of characters in the password

    Returns:
        Password string
    """
    result = ""
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result
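A quick way to check the function, sketched below with the definition repeated so the snippet runs on its own. Because the output is random, we check properties of the result (its length and its characters) rather than an exact value. The names password and allowed are ours.

```python
from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    """Generate a random password with length characters (same as above)."""
    result = ""
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result

password = password_gen(8)
print(password)                   # random, so the output differs on every run
print(len(password))              # 8

# Count how many characters of the password come from CHARS
allowed = 0
for char in password:
    if char in CHARS:
        allowed += 1
print(allowed == len(password))   # True

# With a length of 0 the loop body never runs, leaving the empty string
print(password_gen(0) == "")      # True
```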

What about lists?

We will apply a similar pattern for building up lists. Could we just use our implementation for strings, changing the initial empty string to an empty list? That is, could we do the following?

def list_gen(length):
    result = []
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result

Unfortunately, no. While the + operator is defined for lists, it only supports concatenating two lists. If we try the following we will get an error:

result = []
result = result + "a"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 2
      1 result = []
----> 2 result = result + "a"

TypeError: can only concatenate list (not "str") to list

To use concatenation, we would need to turn the letter into a list with a single element, as shown below. Notice we are using square brackets to create a list with a single item, then creating a new list by concatenating result and our single-item list (containing a single letter).

result = result + ["a"]

The full implementation would look like:

def list_gen(length):
    result = []
    for i in range(length):
        result = result + [CHARS[randint(0, len(CHARS)-1)]]
    return result

This is not a very efficient approach, however. We are creating two new lists in each loop iteration: the single-element list and the result of the concatenation. A more typical, and more efficient, approach is to use append to add new items to the list.

So, instead of starting with the empty string and using the + operator as we did for strings, we will start with the empty list and typically use the append method. With append we are taking advantage of the list’s mutability to grow a single list over time instead of creating many new lists. The final implementation:

def list_gen(length):
    result = []
    for i in range(length):
        result.append(CHARS[randint(0, len(CHARS)-1)])
    return result
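A usage sketch, with the definition repeated so the snippet runs on its own. It also shows one way to turn the resulting list back into a string, using the string method join from the dir output earlier. The names chars and password are ours.

```python
from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def list_gen(length):
    result = []
    for i in range(length):
        result.append(CHARS[randint(0, len(CHARS)-1)])
    return result

chars = list_gen(6)
print(chars)               # a list of 6 random single-character strings
print(len(chars))          # 6

# join is a string method: the receiver is the separator placed between items
password = "".join(chars)
print(len(password))       # 6
```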

Python Note: Subtle differences in var += ... and var = var + ...

The following loop body would also work for list_gen, but for subtle reasons:

result += CHARS[randint(0, len(CHARS)-1)]

We described result += ... as shorthand for result = result + ..., but it is actually a distinct operator that can have its own implementation. For lists, Python implements += so that it behaves like the extend method; that is, result += ..., where result is a list, is equivalent to result.extend(...). The extend method takes any sequence as its argument (and in fact any iterable, a broader ADT than just sequences), which is why it works with a string. For lists, += is not shorthand for the following, which, as we noted above, would not work:

result = result + CHARS[randint(0, len(CHARS)-1)]
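The difference can be observed directly. In this sketch, alias is a second name (ours) for the same list, which lets us see whether an operation modified the existing list or created a new one.

```python
result = []
alias = result          # a second name for the same list

# += on a list extends it in place, adding one item per character of the string
result += "ab"
print(result)           # ['a', 'b']
print(alias)            # ['a', 'b'], the same list object was modified

# extend behaves the same way
result.extend("cd")
print(result)           # ['a', 'b', 'c', 'd']

# + creates a new list and requires both operands to be lists
result = result + ["e"]
print(result)           # ['a', 'b', 'c', 'd', 'e']
print(alias)            # ['a', 'b', 'c', 'd'], alias still names the old list

# result + "f" would raise: TypeError: can only concatenate list (not "str") to list
```

Note that += with a longer string adds each character as a separate item, so it only matches append when the string has a single character, as it does in list_gen.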