Sequences, including methods
We noted last time that strings and lists are both sequences, that is both are instances of the sequence ADT. As a result they both support a number of common operators and functions. From the Python documentation:
| Operation | Result |
|---|---|
| s + t | the concatenation of s and t |
| s * n or n * s | equivalent to adding s to itself n times |
| s[i] | ith item of s, origin 0 |
| s[i:j] | slice of s from i to j |
| s[i:j:k] | slice of s from i to j with step k |
| len(s) | length of s |
| min(s) | smallest item of s |
| max(s) | largest item of s |
| sorted(s) | a new list with the items from s in sorted ascending order |
For example:
list_of_strings = ["this", "is", "a", "list", "of", "strings"]
len(list_of_strings)
min(list_of_strings)
max(list_of_strings)
6
'a'
'this'
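The same operations work on a string and on a list. A minimal sketch of the table above (the variable names `word` and `items` are illustrative, not from the notes):

```python
word = "python"                       # a string is a sequence of characters
items = ["this", "is", "a", "list"]   # a list is a sequence of arbitrary items

# Concatenation and repetition
longer_word = word + "ic"             # 'pythonic'
longer_list = items + ["too"]         # ['this', 'is', 'a', 'list', 'too']
repeated = "ab" * 3                   # 'ababab'

# Indexing and slicing (origin 0, end index exclusive)
first_char = word[0]                  # 'p'
middle = word[1:4]                    # 'yth'
every_other = items[0:4:2]            # ['this', 'a']

# Functions that accept any sequence
lengths = (len(word), len(items))     # (6, 4)
smallest = (min(word), min(items))    # ('h', 'a')
largest = (max(word), max(items))     # ('y', 'this')
in_order = sorted(word)               # ['h', 'n', 'o', 'p', 't', 'y']
```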
These functions can be applied to different sequences, i.e., both strings and lists. Some of the functions have additional requirements for the sequence. For example, we can only apply max, min and sorted to sequences of comparable types (i.e., types that can be compared to each other). Try max(["a", 1]). What happens and why? We get an error because we can’t compare a string ("a") and a number. Another example, not shown above, is sum. As its name suggests, sum computes the sum of all the elements in the sequence. It can only be applied to sequences of numeric types, i.e., it can’t be used with strings.
Help on built-in function sum in module builtins:
sum(iterable, /, start=0)
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
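To see both requirements in action, the sketch below catches the errors so that the script runs to completion. The variable names are illustrative.

```python
# Mixing strings and numbers: the items can't be compared to each other.
try:
    max(["a", 1])
    mixed_error = None
except TypeError as err:
    mixed_error = err

# sum works on sequences of numbers, optionally with a start value.
total = sum([1, 2, 3])                 # 6
total_with_start = sum([1, 2, 3], 10)  # 16
empty_total = sum([])                  # 0, the default start value

# sum rejects a sequence of strings.
try:
    sum(["a", "b"])
    sum_error = None
except TypeError as err:
    sum_error = err

print(type(mixed_error).__name__, type(sum_error).__name__)
```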
In the examples above, we invoked a specific function that can take different values (perhaps even of different types) as an argument. An alternate approach is to invoke different functions on a specific value, i.e., invoke a method on an object. We can think of a method as a function specific to a particular type. Methods are invoked or called with the object.method syntax. For example:
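A minimal sketch of the object.method syntax (the string and the variable names here are illustrative assumptions):

```python
greeting = "hello world"

# The object before the dot is the value the method operates on.
shouted = greeting.upper()         # 'HELLO WORLD'

# Methods can take additional arguments, just like functions.
count_o = greeting.count("o")      # 2
where_w = greeting.index("w")      # 6

# Lists have count and index methods too.
words = ["this", "is", "a", "list"]
where_a = words.index("a")         # 2
```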
This approach to organizing our program is called “object-oriented programming” (OOP), because the data and the functions are contained within/associated with objects. And for the purposes of this class all objects of a specific type, e.g., all string objects, have the same set of methods.
Methods are very similar to the functions we have been working with: they can accept additional parameters (in addition to the object on which they were invoked, also termed the “receiver”) and you can obtain their docstrings with help. But notice that we need to specify the object (receiver) itself, or the type (class) of the object we are invoking the method on.
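As a rough illustration, a method can be reached through either a specific object or its type. The sketch reads the docstring via the `__doc__` attribute rather than calling help, only so that nothing is printed interactively; `help(text.upper)` or `help(str.upper)` would show the same text.

```python
text = "hello"

# Calling through the object: the receiver is text.
via_object = text.upper()

# Calling through the type: the receiver is passed explicitly.
via_type = str.upper(text)

# The docstring is the same whichever way we reach the method.
doc_from_object = text.upper.__doc__
doc_from_type = str.upper.__doc__
print(doc_from_type)
```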
So how can we learn more about the methods for an object or type?
- The dir function
- The help function (as we just saw)

The dir function will return a list of all the methods available for an object, say a string.
['__add__',
'__class__',
'__contains__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'capitalize',
'casefold',
'center',
'count',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'format_map',
'index',
'isalnum',
'isalpha',
'isascii',
'isdecimal',
'isdigit',
'isidentifier',
'islower',
'isnumeric',
'isprintable',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'maketrans',
'partition',
'removeprefix',
'removesuffix',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
What do you notice? We see the index and count methods we noted earlier, but also many more. The methods with “double underscores” (sometimes called “dunder” methods) are special methods used by Python’s internal tools for implementing operators (e.g., __add__ is used by +) and other built-in functions. The other methods are specific to strings, that is they extend the sequence ADT with functionality relevant to strings but not other sequences. For example, as their names suggest, upper and lower transform the string to upper and lower case respectively.
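A short sketch of these observations. The list comprehension is just a compact way to filter the output of dir and is an assumption about style, not something the notes use; the variable names are illustrative.

```python
# Names of string methods without the double-underscore ("dunder") prefix.
public_names = [name for name in dir(str) if not name.startswith("__")]

# The + operator on strings is implemented by the __add__ special method.
with_operator = "cs" + "146"
with_dunder = "cs".__add__("146")

# String-specific methods return new strings.
shout = "Hello".upper()    # 'HELLO'
whisper = "Hello".lower()  # 'hello'
```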
How can we apply a sequence of method calls? Recall that we can use the return value from one function call as the argument to another, e.g.
The equivalent pattern for method calls would be the following. You can see why this is often called “chaining”…
What about the receiver? The receiver can be any expression; it is not required to be a variable or literal. For example:
Here we are invoking the swapcase method on the expression inside the parentheses. The expression inside the parentheses is evaluated first to produce the string "hI CS146", and then the swapcase method is invoked on that string to produce the final result. A good strategy for evaluating these types of expressions is to work step-by-step, keeping in mind the order in which Python evaluates expressions. In general, Python evaluates expressions from left to right while observing PEMDAS (operator precedence). Remember that a parenthesized group is itself a (single) expression.
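The three patterns discussed above (nested function calls, chained method calls, and an expression as the receiver) can be sketched as follows. The specific strings are assumptions chosen for illustration; only the intermediate value "hI CS146" comes from the text.

```python
# Nested function calls: the innermost call is evaluated first.
nested = len(str(abs(-146)))          # abs -> 146, str -> '146', len -> 3

# Chained method calls: evaluated left to right, each method
# is invoked on the string returned by the previous one.
course = "  cs146  "
chained = course.strip().upper().replace("CS", "CS ")   # 'CS 146'

# The receiver can be any expression. The parentheses are evaluated
# first, producing 'hI CS146', and swapcase is invoked on that result.
swapped = ("hI " + "CS146").swapcase()                  # 'Hi cs146'
```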
A list has a similar set of methods, including index and count, shared with other sequences. But it also has its own set of specialized methods.
['__add__',
'__class__',
'__class_getitem__',
'__contains__',
'__delattr__',
'__delitem__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__gt__',
'__hash__',
'__iadd__',
'__imul__',
'__init__',
'__init_subclass__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__reversed__',
'__rmul__',
'__setattr__',
'__setitem__',
'__sizeof__',
'__str__',
'__subclasshook__',
'append',
'clear',
'copy',
'count',
'extend',
'index',
'insert',
'pop',
'remove',
'reverse',
'sort']
Let’s look at some of these methods, specifically pop:
Help on method_descriptor:
pop(self, index=-1, /)
Remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
So what does this do? It “pops” the last element off the list, optionally specifying a specific index to pop, e.g.:
list_of_strings = ["this", "is", "a", "list", "of", "strings"]
list_of_strings.pop()
list_of_strings
'strings'
['this', 'is', 'a', 'list', 'of']
Notice that pop modifies the list on which it is invoked. Also note that if you don’t assign the result of pop to a variable, the removed item is lost. That is, unlike strings, which are immutable, lists are mutable, and many of the methods, including sort and reverse, modify the list on which they are invoked instead of returning a new list (sort and reverse actually return None).
Is there a counterpart to pop? Yes: append.
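A minimal sketch of append and pop working together, along with the None return value of sort. The list contents and variable names are illustrative.

```python
words = ["this", "is", "a"]

# append adds an item to the end of the list, modifying it in place.
append_result = words.append("list")   # None; words is now 4 items long

# pop removes and returns the last item (or the item at a given index).
last = words.pop()                     # 'list'
first = words.pop(0)                   # 'this'
# words is now ['is', 'a']

# sort modifies the list in place and returns None.
sort_result = words.sort()             # words is now ['a', 'is']
```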
Compare the above to using the sorted function, which returns a new, sorted list. Notice that when using sorted the list_of_strings argument remains unmodified.
list_of_strings = ['this', 'is', 'a', 'list', 'of', 'strings']
sorted(list_of_strings)
list_of_strings
['a', 'is', 'list', 'of', 'strings', 'this']
['this', 'is', 'a', 'list', 'of', 'strings']
One more method you will need for the in-class questions:
Help on method_descriptor:
remove(self, value, /)
Remove first occurrence of value.
Raises ValueError if the value is not present.
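A small sketch of remove, with illustrative values. The failing call is wrapped in try/except so the script runs to completion.

```python
letters = ["a", "b", "a", "c"]

# Only the first occurrence is removed; the list is modified in place.
remove_result = letters.remove("a")   # None; letters is now ['b', 'a', 'c']

# Removing a value that is not present raises ValueError.
try:
    letters.remove("z")
    missing_raised = False
except ValueError:
    missing_raised = True
```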
To learn more about list methods check out the Python documentation.
Unlike lists, strings in Python are immutable, that is they can’t be changed. The string methods that look like they are changing the string are actually creating a new string object. Check out the Python Tutor memory model picture for the example above.
We can see the same immutability property in the examples below (the Python Tutor memory model picture):
This is something we will talk about more as the course progresses. For the moment we want to be aware that some types, like strings, are immutable and others, like lists, are mutable. For the former, methods don’t change the object on which they are invoked, instead the method creates a new object. For the latter, some methods will modify the object, others will not.
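One way to observe the difference without a memory diagram is to give an object a second name and see whether a method call is visible through it. This is a sketch with illustrative names, not the example from the Python Tutor pictures.

```python
# Strings: upper creates a new string, the original is untouched.
name = "python"
upper_name = name.upper()
# name is still 'python', upper_name is 'PYTHON'

# Strings also reject in-place changes to individual characters.
try:
    name[0] = "P"
    item_assignment_allowed = True
except TypeError:
    item_assignment_allowed = False

# Lists: sort modifies the one list object that both names refer to.
numbers = [3, 1, 2]
same_list = numbers
numbers.sort()
# numbers and same_list are both [1, 2, 3]
```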
In many of our functions we will use a similar pattern in which we build up a new string piece-by-piece by appending characters to a result string (initialized as the empty string or ""). We saw some examples of this pattern previously and now want to implement another example. Specifically, we want to write a function named password_gen that takes a single parameter length and generates a random password (as a string) of length characters.
How could we implement our password generator?
There are many ways, but a simple one is to use randint to index into a string of allowed characters. More formally, this is an example of sampling with replacement, that is, every time we sample an item from a set of potential items, e.g., the letters, we replace it in the set so it could be sampled again in the future. An alternative is “sampling without replacement”, in which each item can only be selected once. For this application, why would sampling with replacement be preferred?
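The two kinds of sampling can be contrasted with the standard library's random.choice and random.sample. This is a sketch with a deliberately tiny, hypothetical pool named POOL; it is not the approach the notes go on to implement.

```python
from random import choice, sample

POOL = "abc"  # deliberately tiny so the difference is visible

# With replacement: each draw is independent, so repeats are possible
# and we can draw more items than the pool contains.
with_replacement = ""
for i in range(5):
    with_replacement = with_replacement + choice(POOL)

# Without replacement: each item can be selected at most once.
without_replacement = sample(POOL, 3)   # a list of 'a', 'b', 'c' in random order

# Asking for more items than the pool holds fails without replacement.
try:
    sample(POOL, 5)
    too_many_allowed = True
except ValueError:
    too_many_allowed = False
```

With replacement, a password can be longer than the set of allowed characters and can repeat characters, both of which increase the number of possible passwords.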
As always, we want to solve this problem in several steps instead of trying to tackle the whole problem at once. What are some possible intermediate steps?

1. Define a constant CHARS with the allowed characters
2. Create a version of password_gen to create a string of the specified length with a fixed character
3. Enhance password_gen to create a string with random characters

Now let’s implement those steps.
from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    result = ""
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result
Recall that randint has an inclusive end, and so to not exceed the length of CHARS we need to use len(CHARS)-1 as the end argument.
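A sketch of why the end argument matters, using a short stand-in for CHARS. The randrange and choice calls are alternatives from the same random module, shown only for comparison and not used in the notes.

```python
from random import randint, randrange, choice

CHARS = "abc"                      # short stand-in for the real constant
last_valid = len(CHARS) - 1        # 2, the largest valid index

# Indexing at len(CHARS) is out of range, which is what could happen
# if we called randint(0, len(CHARS)) and it returned its inclusive end.
try:
    CHARS[len(CHARS)]
    out_of_range = False
except IndexError:
    out_of_range = True

# randint(0, last_valid) only ever returns valid indexes.
indexes = [randint(0, last_valid) for i in range(1000)]

# randrange excludes its end, like range, so no "- 1" is needed.
other_index = randrange(len(CHARS))

# choice picks an item directly, hiding the index arithmetic.
picked = choice(CHARS)
```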
And finally add the finishing touches, e.g. docstrings.
from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    """
    Generate a random password

    Args:
        length: number of characters in the password

    Returns:
        Password string
    """
    result = ""
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result
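Because the output is random, we can't check for a specific password, but we can check its properties. A minimal sketch, repeating the function so the block runs on its own; the variable names used for the checks are illustrative.

```python
from random import randint

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_!@#$%^&*"

def password_gen(length):
    """Generate a random password of length characters drawn from CHARS."""
    result = ""
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result

password = password_gen(12)
empty = password_gen(0)            # the loop body never runs

# Properties that hold no matter which characters were drawn.
right_length = len(password) == 12
only_allowed = all(ch in CHARS for ch in password)
print(password, right_length, only_allowed)
```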
We will apply a similar pattern for building up lists. Could we just use our implementation for strings, changing the initial empty string to an empty list? That is, do the following?
def list_gen(length):
    result = []
    for i in range(length):
        result = result + CHARS[randint(0, len(CHARS)-1)]
    return result
Unfortunately, no. While the + operator is defined for lists, it only supports concatenating two lists. If we try the following we will get an error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 2
      1 result = []
----> 2 result = result + "a"

TypeError: can only concatenate list (not "str") to list
To use concatenation, we would need to turn the letter into a list with a single element, as shown below. Notice we are using square brackets to create a list with a single item, then creating a new list by concatenating result and our single-item list (containing a single letter).
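A minimal sketch of that single-item concatenation, using literal letters for illustration:

```python
result = []

# Wrap each letter in square brackets to make a one-item list,
# then concatenate the two lists to produce a new list.
result = result + ["a"]     # ['a']
result = result + ["b"]     # ['a', 'b']

# Without the brackets, the operands are a list and a string.
try:
    [] + "a"
    plus_str_allowed = True
except TypeError:
    plus_str_allowed = False
```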
The full implementation would look like:
def list_gen(length):
    result = []
    for i in range(length):
        result = result + [CHARS[randint(0, len(CHARS)-1)]]
    return result
This is not a very efficient approach, however. We are creating two new lists in each loop iteration: the single-element list and the result of the concatenation. A more typical, and more efficient, approach is to use append to add new items to the list.
So, instead of starting with the empty string and using the + operator, we start with the empty list and typically use the append method. With append we are taking advantage of the list’s mutability to grow a single list over time instead of creating many new lists. The final implementation:
def list_gen(length):
    result = []
    for i in range(length):
        result.append(CHARS[randint(0, len(CHARS)-1)])
    return result
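One way to see the difference between the two approaches is to check object identity with the is operator. This is a sketch with illustrative names; it shows which operations create new list objects, not how much time each takes.

```python
# Concatenation builds a brand-new list and leaves the original alone.
original = []
rebuilt = original + ["a"]
concat_made_new_list = rebuilt is not original   # True
# original is still [], rebuilt is ['a']

# append grows the existing list; no new list object is created.
growing = []
same_list = growing
growing.append("a")
growing.append("b")
append_kept_same_list = same_list is growing     # True
# both names see ['a', 'b']
```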
var += ... and var = var + ...
The following loop body would also work for list_gen, but for subtle reasons:
We describe result += ... as a shorthand for result = result + ..., but it is actually a distinct operator that can have its own implementation. For lists, Python optimizes += by implementing it with the extend method, that is, result += ... where result is a list is equivalent to result.extend(...). The extend method takes any sequence as its argument (and in fact any iterable, a broader ADT than just sequences), which is why it works with a string. For lists, += is not shorthand for the following, which as we noted above would not work:
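A minimal sketch contrasting the two forms. The starting values are illustrative, and the failing line is wrapped in try/except so the script runs to completion.

```python
result = []
same_list = result

# += on a list behaves like extend: it adds each item of the
# right-hand side to the existing list, modifying it in place.
result += "a"            # ['a']
result += "bc"           # ['a', 'b', 'c'] (one item per character)
result.extend("d")       # ['a', 'b', 'c', 'd']
still_same_list = same_list is result   # True

# The spelled-out form uses +, which only concatenates two lists.
try:
    result = result + "e"
    plus_str_allowed = True
except TypeError:
    plus_str_allowed = False
```

Note that += with a multi-character string adds each character as a separate item, which works for list_gen only because each loop iteration adds a single character.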