"a single quote isn't a problem"
'neither are "quotes"'
"a single quote isn't a problem"
'neither are "quotes"'
Sequences, including strings and lists
In a previous class we introduced the notion of types, and specifically the str type. Since then we have been using strings quite extensively as literals.
To indicate a string we surround the characters with single or double quotes. Why allow either? We can use one kind of quote when the other appears as a character within the string, e.g.,
"a single quote isn't a problem"
'neither are "quotes"'
But how do we get double quotes into a double-quoted string? With backslash escaping, i.e., by preceding the quote character with a backslash.
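A minimal sketch of backslash escaping (the variable names and print calls are illustrative assumptions, not the original cell):

```python
# Escape a double quote inside a double-quoted string with a backslash.
quoted = "quotes in \"quotes\""

# A literal backslash is itself written as an escaped backslash.
slashed = "backslashes with \\"

print(quoted)   # quotes in "quotes"
print(slashed)  # backslashes with \
```

Note that the interpreter echoes the second string with two backslashes (its escaped form), while print shows the single backslash the string actually contains.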
'quotes in "quotes"'
'backslashes with \\'
backslashes with \
That is one example of an escape sequence. An even more common escape sequence is \n, which inserts a newline character into the string (advances to a new line). For example:
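A small sketch of the newline escape, assuming an illustrative string of our own choosing:

```python
# "\n" is a single newline character, not a backslash followed by an "n".
two_lines = "first line\nsecond line"
print(two_lines)

# Splitting on line boundaries confirms the string spans two lines.
line_count = len(two_lines.splitlines())
print(line_count)  # 2
```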
Recall that we introduced + for concatenation and the multiplication operator * for duplication.
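As a quick reminder, here is a sketch of both operators applied to strings (the example values are our own):

```python
# + joins two strings end to end (concatenation).
greeting = "hello" + " " + "world"

# * repeats a string the given number of times (duplication).
echo = "ab" * 3

print(greeting)  # hello world
print(echo)      # ababab
```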
In the context of strings, we would term + and * “overloaded” operators. That is, we have overloaded their typical arithmetic meanings (addition and multiplication) with functionality relevant to strings.
Recall that + isn’t overloaded for concatenating strings with integers or other “not strings”. We needed to explicitly convert integers to strings.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 "hello" + 5
      2 "hello" + str(5)

TypeError: can only concatenate str (not "int") to str
What about the reverse? We will probably need to convert strings to integers explicitly as well. We can use the int (and float, etc.) functions to do so.
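A brief sketch of explicit conversion in both directions, using illustrative values:

```python
# Convert strings to numbers before doing arithmetic.
count = int("5") + 2       # 7
ratio = float("2.5") * 2   # 5.0

# Convert a number to a string before concatenating.
label = "hello" + str(5)   # 'hello5'

print(count, ratio, label)
```

If the string does not look like a number, e.g., int("five"), Python raises a ValueError rather than guessing.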
We think of a string as a single object, but a string is also a sequence of characters that can be manipulated as an ordered collection. I hope you can already sense that, like for loops, sequences are a key CS (semantic) concept.
The Python operator for accessing specific elements in a sequence is [], with indices beginning at 0, i.e., an index of zero references the first element in the sequence (termed “zero-indexed”):
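A sketch of indexing consistent with the output and traceback shown here; the string value and the s[0] lookup are assumptions inferred from the surrounding output, and the try/except is added only so the example runs to completion:

```python
s = "here is my string"   # assumed value, inferred from the outputs

first = s[0]       # 'h'
sixth = s[5]       # 'i'
eleventh = s[10]   # ' '

# Indexing past the end raises IndexError (the string has 17 characters).
try:
    s[100]
except IndexError as err:
    message = str(err)

print(first, sixth, repr(eleventh), message)
```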
'h'
'i'
' '
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 5
      3 s[5]
      4 s[10]
----> 5 s[100]

IndexError: string index out of range
That last error should make sense to us. We can’t access beyond the end of the sequence.
Python has some very sophisticated indexing functionality, termed “slicing”. A slice is defined as “lower bound” : “upper bound” [: “stride”], with an inclusive lower bound, an exclusive upper bound and an optional stride (defaults to 1).
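The slice expressions behind the outputs are not shown; the following is one set of slices that would produce them, assuming s is the same string as before (the specific bounds are our reconstruction, and other bounds could give the same results):

```python
s = "here is my string"    # assumed value

first_word = s[0:4]        # 'here'              indices 0..3
six_chars = s[0:6]         # 'here i'            upper bound is exclusive
every_other = s[0:6:2]     # 'hr '               indices 0, 2, 4
no_lower = s[:4]           # 'here'              lower bound defaults to the start
no_upper = s[4:]           # ' is my string'     upper bound defaults to the end
whole = s[:]               # 'here is my string' a copy of the whole string
last_char = s[-1]          # 'g'                 negative index counts from the end
near_end = s[-4:-2]        # 'ri'                negative bounds in a slice
```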
'here'
'here i'
'hr '
'here'
' is my string'
'here is my string'
'g'
'ri'
As we noticed in those last examples, negative indices are permitted. A negative index counts backward from the end of the sequence. The index of the last element is -1, the second to last is -2, and so on. That is, negative indexing is “1-indexed”: we start counting at one. We can also have negative strides, which decrement the indices (effectively moving right to left, or reversing the sequence).
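A short sketch of negative indices and negative strides, using an illustrative string:

```python
s = "hello"

last = s[-1]            # 'o'  same element as s[len(s) - 1]
second_last = s[-2]     # 'l'

# A negative stride walks right to left; the upper bound is still exclusive.
backward = s[3:0:-1]    # 'lle'  indices 3, 2, 1
reversed_s = s[::-1]    # 'olleh'

print(last, second_last, backward, reversed_s)
```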
for loops
The combination of sequences and for loops is very powerful.
It is poor style to directly specify the length of the string in the for loop. What if we wanted to iterate through a different string? We would call the number “5” in this context a “magic number” (it appears out of nowhere like magic) and “magic numbers” are a code smell. There is a len function that will return the length of the string, which we can use instead to derive the loop sequence from the variable.
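To make the contrast concrete, here is a sketch of both styles; the string and the accumulator variables are illustrative assumptions:

```python
s = "hello"

# Magic number: works only while s happens to have exactly 5 characters.
fragile = ""
for i in range(5):
    fragile = fragile + s[i]

# Better: derive the loop sequence from the string itself with len.
robust = ""
for i in range(len(s)):
    robust = robust + s[i]

print(fragile, robust)
```

Both loops visit the same indices here, but only the second keeps working if s is later changed to a longer or shorter string.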
Can we put this all together to reverse a string using a for loop and indexing?
def reverse(s):
    """
    Reverse a string argument, e.g., if argument is "hello", returns "olleh"
    Args:
        s: Input string
    Returns:
        Reversed string
    """
    r = ""
    for i in range(len(s)):
        r = r + s[-(i+1)]
    return r
Note that there are many other possible implementations for reversing a string (check out our in-class questions!). Likely the most concise approach, though, is to take advantage of negative strides in slicing, specifically [::-1]. When we elide the start and end, Python slices from the “beginning” to the “end”. For a negative stride that corresponds to starting at the end of the string and counting down. Specifically, for a string s, s[::-1] visits the same indices as those generated by range(len(s)-1, -1, -1), i.e., for s="hello" the sequence 4, 3, 2, 1, 0.
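A small sketch checking that equivalence, with variable names of our own choosing:

```python
s = "hello"

# The indices a reversed traversal visits.
indices = list(range(len(s) - 1, -1, -1))   # [4, 3, 2, 1, 0]

# Build the reversed string one index at a time...
by_index = ""
for i in indices:
    by_index = by_index + s[i]

# ...and compare with the slice.
by_slice = s[::-1]

print(indices, by_index, by_slice)
```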
We use the range function to generate a sequence of indices. But the string itself is a sequence. Can we loop over the string directly? Yes! Recall that for loops iterate through the loop sequence. We have been using range to create such a sequence, but we can actually use any ordered sequence, such as a string, not just range.
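A minimal sketch of iterating over a string directly; the vowel-counting task is just an illustrative example:

```python
s = "hello"

# The loop variable takes on each character in turn; no indices needed.
vowels = 0
for char in s:
    if char in "aeiou":
        vowels = vowels + 1

print(vowels)  # 2
```

Iterating directly is usually preferable when we only need the elements; range(len(s)) is still useful when we also need the position.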
A string is a sequence, an “ordered collection”, but only of characters. We can imagine that a sequence is a useful concept (a useful data structure) for other types too, e.g., integers, not just characters. A list is that data structure.
Let’s dig into those terms:
In Python, a string (str type) is a sequence, but so is a list and several other data structures we will learn about later. All of these data structures, including both lists and strings, support a common set of sequence operations such as indexing, slicing and len (they implement the same abstract data type, or ADT).
What are some differences between lists and strings though?
Lists are created with [] or with list(...).
[7, 4, 3, 6, 1, 2]
10 # not a list
[10] # a list with one element
l = [7, 4, 3, 6, 1, 2]
l
type(l)
list("abcd")
[7, 4, 3, 6, 1, 2]
10
[10]
[7, 4, 3, 6, 1, 2]
list
['a', 'b', 'c', 'd']
What happened in that last example? Let’s check out the list function (really its “initializer”). A string is an iterable (iterates over each character), and so when we construct a list from a string, we get a list of the individual characters.
As we noted, a list is an ordered collection of items of any type, e.g., a list of strings. We can apply the same indexing, slicing and membership operators we used with strings.
list_of_strings = ["this", "is", "a", "list", "of", "strings"]
list_of_strings[0]
list_of_strings[0][1]
list_of_strings[1:4]
list_of_strings[::2]
"of" in list_of_strings
'this'
'h'
['is', 'a', 'list']
['this', 'a', 'of']
True
However, a key difference is that we can reassign items via the indexing operator, that is, indexing operations can appear on the left-hand side of an assignment operation. Notice that we reassign the last item in the list to be a different string. But if we attempt a similar reassignment with one of the strings itself, we get an error because strings are immutable. The ability to modify the contents of a list as our program executes is one of the key features (and uses) of the list data structure.
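A sketch of the same contrast that runs to completion; the try/except wrapper and the final slicing workaround are our additions for illustration, not part of the original cell:

```python
list_of_strings = ["this", "is", "a", "list", "of", "strings"]

# Lists are mutable: item assignment replaces the last element in place.
list_of_strings[-1] = "items"

# Strings are immutable: item assignment raises TypeError.
word = list_of_strings[0]
try:
    word[1] = "d"
except TypeError as err:
    error_message = str(err)

# One way to "change" a string is to build a new one from slices of the old.
changed = word[:1] + "d" + word[2:]   # 'tdis'

print(list_of_strings, error_message, changed)
```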
['this', 'is', 'a', 'list', 'of', 'items']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 3
      1 list_of_strings[-1] = "items"
      2 list_of_strings
----> 3 list_of_strings[0][1] = "d"

TypeError: 'str' object does not support item assignment
Much like strings, lists support the + operator for concatenation and * for duplication, and can be used as the loop sequence with for loops:
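A short sketch of all three, using illustrative lists of integers:

```python
evens = [2, 4]
odds = [1, 3]

combined = evens + odds    # concatenation: [2, 4, 1, 3]
repeated = [0] * 3         # duplication:   [0, 0, 0]

# A list can serve directly as the loop sequence.
total = 0
for item in combined:
    total = total + item

print(combined, repeated, total)  # [2, 4, 1, 3] [0, 0, 0] 10
```

Note that + and * build new lists; evens and odds are left unchanged.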