In a previous lecture we introduced the notion of types, and specifically the str string type. Since then we have been using strings quite extensively as literals.
To indicate a string we surround the characters with single or double quotes. Why allow either? Recall that we can use one kind of quote when the other appears as a character within the string, e.g.,
>>> "a single quote isn't a problem"
"a single quote isn't a problem"
>>> 'neither are "quotes"'
'neither are "quotes"'
But how do we get double quotes into a double quoted string? With backslash escaping.
>>> "quotes in \"quotes\""
'quotes in "quotes"'
>>> "backslashes with \\"
'backslashes with \\'
>>> print("backslashes with \\")
backslashes with \
That is one example of an escape sequence. An even more common escape sequence is \n, which inserts a newline character into the string (output continues at the beginning of the next line). For example:
>>> print("A new\nline")
A new
line
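As a small sketch of how escape sequences are stored, the snippet below checks that each escape sequence counts as a single character even though we type two. The variable names are illustrative only, and it uses the built-in len function, which returns the number of characters in a string.

```python
# Each escape sequence is typed as two characters but stored as one.
newline = "\n"
backslash = "\\"
quoted = "quotes in \"quotes\""

print(len(newline))    # 1
print(len(backslash))  # 1
print(quoted)          # quotes in "quotes"

# "A new" is 5 characters, the newline is 1, and "line" is 4.
message = "A new\nline"
print(len(message))    # 10
```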
Recall that we introduced using + for concatenation, e.g.
>>> "hello" + " world"
'hello world'
and the multiplication operator * for duplication, e.g.
>>> "hello" * 3
'hellohellohello'
In the context of strings, we would term + and * “overloaded” operators. That is, we have overloaded their typical meanings, addition and multiplication, with functionality relevant to strings.
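To see the two overloaded operators working together, here is a minimal sketch. The function underline is a hypothetical helper invented for this illustration, not something built into Python.

```python
def underline(title, width):
    """Return title followed by a second line made of `width` dashes.

    Hypothetical helper for illustration only.
    """
    # * binds more tightly than +, so the dashes are duplicated first
    # and the result is then concatenated onto the title.
    return title + "\n" + "-" * width


print(underline("Strings", 7))
```

As with numbers, * has higher precedence than +, so no parentheses are needed around "-" * width.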
Recall that + isn’t overloaded for concatenating strings with integers and other “not strings”. We need to explicitly convert integers to strings.
>>> "hello" + 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>> "hello" + str(5)
'hello5'
What about the reverse? We will probably need to convert strings to integers explicitly as well. We can use the int (and float, etc.) functions to do so.
>>> 5 + "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 5 + int("hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'hello'
>>> 5 + int("5")
10
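The conversions in both directions can be sketched together as follows. The variable names are made up for this example; only str, int and float come from Python itself.

```python
count = 3
label = "apples"

# int -> str, so that + performs concatenation
message = str(count) + " " + label
print(message)  # 3 apples

# str -> int or float, so that + performs addition
total = int("5") + 5
half = float("2.5") + 0.5
print(total)  # 10
print(half)   # 3.0
```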
PI Questions 1 (String Operators)
We think of a string as a single object, but a string is also a sequence of characters that can be manipulated as an ordered collection. I hope you can already sense that, like for loops, sequences are a key CS (semantic) concept.
The Python operator for accessing specific elements in a sequence is [], with indices beginning at 0, i.e., an index of zero references the first element in the sequence (termed “zero-indexed”):
>>> s = "here is my string"
>>> s[0]
'h'
>>> s[5]
'i'
>>> s[10]
' '
>>> s[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
That last error should make sense to us. We can’t access beyond the end of the sequence.
Python has some very sophisticated indexing functionality, termed “slicing”. A slice is defined as “lower bound” : “upper bound” [: “stride”], with an inclusive lower bound, an exclusive upper bound and an optional stride (defaults to 1).
>>> s[0:4]
'here'
>>> s[0:6]
'here i'
>>> s[0:6:2]
'hr '
>>> s[:4]
'here'
>>> s[4:]
' is my string'
>>> s[::]
'here is my string'
>>> s[-1]
'g'
>>> s[-4:-2]
'ri'
As we saw in those last examples, negative indices are permitted. A negative index counts backward from the end of the sequence. The index of the last element is -1, the second to last is -2, and so on. That is, negative indexing is “1-indexed”: we start counting at one. We can also have negative strides, which decrement the indices (effectively moving right to left, or reversing the sequence).
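A few more slices, sketched on the same example string, may help make negative indices and negative strides concrete. The variable names on the left are illustrative only.

```python
s = "here is my string"

last_char = s[-1]                 # the last character
last_word = s[-6:]                # from six before the end to the end
without_last_word = s[:-7]        # everything up to seven before the end
last_word_reversed = s[-1:-7:-1]  # start at the end, step backward six times

print(last_char)           # g
print(last_word)           # string
print(without_last_word)   # here is my
print(last_word_reversed)  # gnirts
```

Note that the upper bound stays exclusive with a negative stride: s[-1:-7:-1] stops before index -7, the space in front of "string".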
for loops
The combination of sequences and for loops is very powerful.
>>> s = "hello"
>>> for i in range(5):
...     print(s[i])
...
h
e
l
l
o
It is poor style to directly specify the length of the string in the for loop. What if we wanted to iterate through a different string? We would call the number “5” in this context a “magic number” (it appears out of nowhere, like magic), and “magic numbers” are a code smell. There is a len function that returns the length of the string, which we can use instead to derive the loop sequence from the variable.
>>> s = "hello"
>>> for i in range(len(s)):
...     print(s[i])
...
h
e
l
l
o
Can we put this all together to reverse a string using a for loop and indexing? A possible implementation:
def reverse(s):
    """
    Reverse a string argument, e.g., if argument is "hello", returns "olleh"

    Args:
        s: Input string

    Returns:
        Reversed string
    """
    r = ""
    for i in range(len(s)):
        # s[-(i+1)] counts backward from the end: -1, -2, ..., -len(s)
        r = r + s[-(i+1)]
    return r
Note that there are many other possible implementations for reversing a string (check out our PI questions!). Likely the most concise approach, though, is to take advantage of negative strides in slicing, specifically [::-1]. When we elide the start and end, Python slices start at the “beginning” and go to the “end”. For a negative stride that corresponds to starting at the end of the string and counting down. Specifically, for a string s, s[::-1] visits the same indices as those generated by range(len(s)-1, -1, -1), i.e., for s="hello" the sequence 4, 3, 2, 1, 0.
>>> s = "hello"
>>> s[::-1]
'olleh'
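The claimed equivalence between [::-1] and a counting-down range can be checked with a short sketch. The function reverse_with_range is a hypothetical name used only for this illustration.

```python
def reverse_with_range(s):
    """Reverse s by visiting indices len(s)-1, ..., 1, 0 (illustrative helper)."""
    r = ""
    for i in range(len(s) - 1, -1, -1):
        r = r + s[i]
    return r


s = "hello"
print(list(range(len(s) - 1, -1, -1)))    # [4, 3, 2, 1, 0]
print(reverse_with_range(s))              # olleh
print(reverse_with_range(s) == s[::-1])   # True
```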
We have used the range function to generate a sequence of indices. But the string itself is a sequence. Will the following work? Yes! Recall that for loops iterate through the loop sequence. We have been using range to create such a sequence, but we can actually use any ordered sequence, such as a string, not just range.
>>> for c in "hello":
...     print(c)
...
h
e
l
l
o
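Iterating over the characters directly also gives us one more way to reverse a string, without any indices at all. This is only a sketch, and reverse_by_prepending is a hypothetical name chosen for this example.

```python
def reverse_by_prepending(s):
    """Reverse s by placing each character in front of the result so far."""
    r = ""
    for c in s:
        # Prepending (c + r) instead of appending (r + c) flips the order.
        r = c + r
    return r


print(reverse_by_prepending("hello"))  # olleh
```

Because the loop variable c is the character itself, there is no index arithmetic to get wrong, which is one reason to prefer direct iteration when the position is not needed.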
PI Questions 2 (String Iteration)