Class 7

Objectives for today

Strings

In a previous lecture we introduced the notion of types, and specifically the str (string) type. Since then we have been using strings quite extensively as literals.

To indicate a string we surround the characters with single or double quotes. Why allow both? Recall that we can use one kind of quote when the other appears as a character within the string, e.g.,

>>> "a single quote isn't a problem"
"a single quote isn't a problem"
>>> 'neither are "quotes"'
'neither are "quotes"'

But how do we get double quotes into a double quoted string? With backslash escaping.

>>> "quotes in \"quotes\""
'quotes in "quotes"'
>>> "backslashes with \\"
'backslashes with \\'
>>> print("backslashes with \\")
backslashes with \

That is one example of an escape sequence. An even more common escape sequence is \n, which inserts a newline character into the string (when printed, the output advances to the beginning of a new line). For example:

>>> print("A new\nline")
A new
line
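As a small sketch of the escape sequences above, the snippet below checks that each escape sequence is stored as a single character. The variable names are illustrative only, and it uses the len function (introduced later in these notes) to count characters.

```python
# Each escape sequence is written with two characters in the source code
# but stored as one character in the string.
quoted = "quotes in \"quotes\""
backslash = "backslashes with \\"
two_lines = "A new\nline"

newline_length = len("\n")

print(quoted == 'quotes in "quotes"')  # True: same string, different quoting
print(newline_length)                  # 1
print(len(backslash))                  # 18
print(two_lines)                       # prints on two separate lines
```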

Recall that we introduced using + for concatenation, e.g.

>>> "hello" + " world"
'hello world'

and the multiplication operator * for duplication, e.g.

>>> "hello" * 3
'hellohellohello'

In the context of strings, we would term + and * “overloaded” operators. That is, we have overloaded their typical meanings, addition and multiplication, with functionality relevant to strings.

Recall that + isn’t overloaded for concatenating a string with an integer or other “not strings”. We need to explicitly convert the integer to a string first. (The exact wording of the error message varies between Python versions.)

>>> "hello" + 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>> "hello" + str(5)
'hello5'

What about the reverse? We will probably need to convert strings to integers explicitly as well. We can use the int (and float, etc.) functions to do so.

>>> 5 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 5 + int("hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'hello'
>>> 5 + int("5")
10
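The explicit conversions above can be collected into one short sketch. The variable names are made up for illustration, and the float example is an assumption about how you might use the float function mentioned above.

```python
count = 5

message = "hello" + str(count)  # int -> str before concatenating
total = 5 + int("5")            # str -> int before adding
ratio = 1.5 + float("2.5")      # str -> float before adding

print(message)  # hello5
print(total)    # 10
print(ratio)    # 4.0
```

In each case the conversion happens first, so + sees two operands of the same kind and knows whether to add or to concatenate.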

PI Questions 1 (String Operators)

Strings as a Sequence

We think of a string as a single object, but a string is also a sequence of characters that can be manipulated as an ordered collection. I hope you can already sense that, like for loops, sequences are a key CS (semantic) concept.

The Python operator for accessing specific elements in a sequence is [], with indices beginning at 0, i.e. an index of zero references the first element in the sequence (termed “zero-indexed”):

>>> s = "here is my string"
>>> s[0]
'h'
>>> s[5]
'i'
>>> s[10]
' '
>>> s[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

That last error should make sense to us. We can’t access beyond the end of the sequence.

Python has some very sophisticated indexing functionality, termed “slicing”. A slice is defined as “lower bound” : “upper bound” [: “stride”], with an inclusive lower bound, an exclusive upper bound and an optional stride (defaults to 1).

>>> s[0:4]
'here'
>>> s[0:6]
'here i'
>>> s[0:6:2]
'hr '
>>> s[:4]
'here'
>>> s[4:]
' is my string'
>>> s[::]
'here is my string'
>>> s[-1]
'g'
>>> s[-4:-2]
'ri'

As we noticed in those last examples, negative indices are permitted. A negative index counts backward from the end of the sequence. The index of the last element is -1, the second to last is -2, and so on. That is, negative indexing is “1-indexed”: we start counting at one. We can also have negative strides, which decrement the indices (effectively moving right to left, or reversing the sequence).
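Here is a brief sketch of negative indices and negative strides applied to the same example string. The variable names are illustrative only.

```python
s = "here is my string"

last_word = s[-6:]               # start six from the end, run to the end
first_word_reversed = s[3::-1]   # start at index 3, step backward to index 0
every_other_backward = s[::-2]   # whole string, right to left, skipping one

print(last_word)             # string
print(first_word_reversed)   # ereh
print(every_other_backward)  # git ms rh
```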

Strings as sequences and for loops

The combination of sequences and for loops is very powerful.

>>> s = "hello"
>>> for i in range(5):
...     print(s[i])
... 
h
e
l
l
o

It is poor style to directly specify the length of the string in the for loop. What if we wanted to iterate through a different string? We would call the number “5” in this context a “magic number” (it appears out of nowhere like magic) and “magic numbers” are a code smell. There is a len function that will return the length of the string, which we can use instead to derive the loop sequence from the variable.

>>> s = "hello"
>>> for i in range(len(s)):
...     print(s[i])
... 
h
e
l
l
o

Can we put this all together to reverse a string using a for loop and indexing? Here is a possible implementation:

def reverse(s):
    """
    Reverse a string argument, e.g., if the argument is "hello", returns "olleh"

    Args:
        s: Input string

    Returns:
        Reversed string
    """
    r = ""
    for i in range(len(s)):
        r = r + s[-(i+1)]
    return r

Note that there are many other possible implementations for reversing a string (check out our PI questions!). Likely the most concise approach, though, is to take advantage of negative strides in slicing, specifically [::-1]. When we elide the start and end, Python slices start at the “beginning” and go to the “end”. For a negative stride, that corresponds to starting at the end of the string and counting down. Specifically, for a string s, s[::-1] uses the same indices as those generated by range(len(s)-1, -1, -1), i.e., for s="hello" the sequence 4, 3, 2, 1, 0.

>>> s = "hello"
>>> s[::-1]
'olleh'
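To make the correspondence between the [::-1] slice and the descending range concrete, here is a minimal sketch. The function name reverse_with_range is hypothetical, chosen only for this illustration, and list is used just to display the indices.

```python
def reverse_with_range(s):
    """Reverse s by visiting the indices len(s)-1 down to 0."""
    r = ""
    for i in range(len(s) - 1, -1, -1):
        r = r + s[i]
    return r


s = "hello"
indices = list(range(len(s) - 1, -1, -1))

print(indices)                           # [4, 3, 2, 1, 0]
print(reverse_with_range(s))             # olleh
print(reverse_with_range(s) == s[::-1])  # True
```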

We have been using the range function to generate a sequence of indices. But the string itself is a sequence, so can we loop over it directly? Yes! Recall that for loops iterate through the loop sequence. We have been using range to create such a sequence, but we can use any ordered sequence, such as a string.

>>> for c in "hello":
...     print(c)
... 
h
e
l
l
o
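Iterating over the characters directly also gives another way to reverse a string without any indexing. This is only one possible sketch, and the function name reverse_by_prepending is hypothetical.

```python
def reverse_by_prepending(s):
    """Reverse s by placing each new character in front of the result."""
    r = ""
    for c in s:
        r = c + r  # the newest character goes first
    return r


print(reverse_by_prepending("hello"))  # olleh
```

Because each character is added to the front of r, the first character of s ends up last.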

PI Questions 2 (String Iteration)