Class 5

Sequences, including strings and lists

Objectives for today

  • Create string literals using both single-quotes and double-quotes
  • Explain the purpose of and use backslash-escaping in string literals
  • Describe a string as an ordered sequence of characters
  • Explain and use sequence operators (indexing, slicing) to obtain subsequences (including individual characters) or transform a string
  • Apply knowledge of strings as a sequence to lists, a sequence of any type

Strings

In a previous class we introduced the notions of types, and specifically the str type. Since then we have been using strings quite extensively as literals.

To indicate a string we surround the characters with single or double quotes. Why either or? We can use one when the other is a character within the string, e.g.,

"a single quote isn't a problem"
'neither are "quotes"'
"a single quote isn't a problem"
'neither are "quotes"'

But how do we get double quotes into a double quoted string? With backslash escaping.

"quotes in \"quotes\""
"backslashes with \\"
print("backslashes with \\")
'quotes in "quotes"'
'backslashes with \\'
backslashes with \

That is one example of escape sequences. An even more common escape sequence is \n which inserts a newline character into the string (advances to a new line). For example:

print("A new\nline")
A new
line
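
As a small sketch of how escape sequences are stored (the variable names are only for illustration), each escape sequence becomes a single character in the string, even though we type two characters in the literal. The example also uses \t, the tab escape, which is not discussed above.

```python
# Each escape sequence is a single character in the resulting string
two_lines = "A new\nline"
quoted = "quotes in \"quotes\""
backslash = "backslashes with \\"

print(len(two_lines))                  # 10: the \n counts as one character
print(quoted == 'quotes in "quotes"')  # True: same string, different literal
print(len(backslash))                  # 18: the \\ is one backslash character
print("col1\tcol2")                    # \t inserts a tab character
```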

Recall that we introduced using + for concatenation, e.g.

"hello" + " world"
'hello world'

and the multiplication operator * for duplication, e.g.

"hello" * 3
'hellohellohello'

In the context of strings, we would term + and * “overloaded” operators. That is, we have overloaded their typical meanings, addition and multiplication, with functionality relevant to strings.

Recall that + isn’t overloaded for concatenating strings with integers or other “not strings”. We needed to explicitly convert integers to strings.

"hello" + 5
"hello" + str(5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 "hello" + 5
      2 "hello" + str(5)

TypeError: can only concatenate str (not "int") to str

What about the reverse? We will probably need to convert strings to integers explicitly as well. We can use the int (and float, etc.) functions to do so.

5 + "hello"
5 + int("hello")
5 + int("5")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 5 + "hello"
      2 5 + int("hello")
      3 5 + int("5")

TypeError: unsupported operand type(s) for +: 'int' and 'str'
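
Because the cell above stops at the first error, its later lines never run. The sketch below shows what we would expect from each conversion on its own. It uses try/except to catch the error so that the example keeps running; that construct is an addition here, not something introduced in these notes.

```python
# int() and float() convert strings that look like numbers
print(5 + int("5"))      # 10
print(5 + float("2.5"))  # 7.5

# int() cannot convert a string that does not look like an integer.
# It raises a ValueError, which we catch here and print.
try:
    5 + int("hello")
except ValueError as err:
    print("ValueError:", err)
```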

Strings as Sequences

We think of a string as a single object, but a string is also a sequence of characters that can be manipulated as an ordered collection. I hope you can already sense that, like for loops, sequences are a key CS (semantic) concept.

The Python operator for accessing specific elements in a sequence is [], with indices beginning at 0, i.e. an index of zero references the first element in the sequence (termed “zero-indexed”):

s = "here is my string"
s[0]
s[5]
s[10]
s[100]
'h'
'i'
' '
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 5
      3 s[5]
      4 s[10]
----> 5 s[100]

IndexError: string index out of range

That last error should make sense to us. We can’t access beyond the end of the sequence.

Python has some very sophisticated indexing functionality, termed “slicing”. A slice is defined as “lower bound” : “upper bound” [: “stride”], with an inclusive lower bound, an exclusive upper bound and an optional stride (defaults to 1).

s[0:4]
s[0:6]
s[0:6:2]
s[:4]
s[4:]
s[::]
s[-1]
s[-4:-2]
'here'
'here i'
'hr '
'here'
' is my string'
'here is my string'
'g'
'ri'

As we saw in those last examples, negative indices are permitted. A negative index counts backward from the end of the sequence: the index of the last element is -1, the second to last is -2, and so on. That is, negative indexing is “1-indexed”; we start counting at one. We can also use negative strides, which decrement the indices (effectively moving right to left, or reversing the sequence).
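
To make negative indices and strides concrete, here is a short sketch using the same string s as above. It assumes nothing beyond the indexing and slicing rules already described.

```python
s = "here is my string"

# A negative index -k refers to the same character as index len(s) - k
print(s[-1] == s[len(s) - 1])  # True, both are 'g'
print(s[-4] == s[len(s) - 4])  # True, both are 'r'

# With a negative stride we move right to left, so the lower bound
# must be to the right of the upper bound
print(s[3::-1])     # 'ereh'
print(s[-1:-7:-1])  # 'gnirts'
print(s[0:4:-1])    # '' (empty, since we cannot reach 4 by counting down from 0)
```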

Sequences and for loops

The combination of sequences and for loops is very powerful.

s = "hello"
for i in range(5):
    print(s[i])
h
e
l
l
o

It is poor style to directly specify the length of the string in the for loop. What if we wanted to iterate through a different string? We would call the number “5” in this context a “magic number” (it appears out of nowhere like magic) and “magic numbers” are a code smell. There is a len function that will return the length of the string, which we can use instead to derive the loop sequence from the variable.

While code “smells” are not bugs, they are characteristics of a program that may indicate a deeper problem or create problems in the future.

s = "hello"
for i in range(len(s)):
    print(s[i])
h
e
l
l
o

Can we put this all together to reverse a string using a for loop and indexing?

def reverse(s):
    """
    Reverse a string argument, e.g, if argument is "hello", returns "olleh"

    Args:
        s: Input string

    Returns:
        Reversed string
    """
    r = ""
    for i in range(len(s)):
        r = r + s[-(i+1)]
    return r

Note that there are many other possible implementations for reversing a string (check out our in-class questions!). Likely the most concise approach, though, is to take advantage of negative strides in slicing, specifically [::-1]. When we elide the start and end, Python slices from the “beginning” to the “end”. For a negative stride, that corresponds to starting at the end of the string and counting down. Specifically, for a string s, s[::-1] visits the same indices as those generated by range(len(s)-1, -1, -1), i.e., for s="hello" the sequence 4, 3, 2, 1, 0.

s = "hello"
s[::-1]
'olleh'
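
As a quick check of that equivalence, the sketch below walks the indices produced by range(len(s)-1, -1, -1) and compares the result with the slice. The variable r is just an illustrative accumulator, as in reverse above.

```python
s = "hello"

# Build the reversed string by visiting the indices 4, 3, 2, 1, 0
r = ""
for i in range(len(s) - 1, -1, -1):
    print(i)
    r = r + s[i]

print(r)             # 'olleh'
print(r == s[::-1])  # True: the slice visits the same indices
```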

We use the range function to generate a sequence of indices. But the string itself is a sequence. Will the following work? Yes! Recall that for loops iterate through the loop sequence. We have been using range to create such a sequence, but we can actually use any ordered sequence, such as a string, not just range.

for c in "hello":
    print(c)
h
e
l
l
o
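
Iterating directly over the characters gives us another way to reverse a string. This is only one possible sketch; the function name reverse_by_prepending is made up for this example. Instead of indexing from the end, we place each new character in front of the result so far.

```python
def reverse_by_prepending(s):
    """
    Reverse a string by iterating over its characters directly

    Args:
        s: Input string

    Returns:
        Reversed string
    """
    r = ""
    for c in s:
        # Put the current character before everything seen so far
        r = c + r
    return r

print(reverse_by_prepending("hello"))  # 'olleh'
```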

Lists, Sequences for anything

A string is a sequence, an “ordered collection”, but only of characters. We can imagine that a sequence is a useful concept, a useful data structure, for other types too, e.g., integers, not just characters. A list is that data structure.

Let’s dig into those terms:

  • A “data structure” is a particular way of organizing data. Different kinds of data structures have different properties that make them better/best suited for different kinds of computations. Data structures are such a fundamental idea in CS that they are the focus of a whole class: CS201!
  • We can generalize the properties of similar data structures with abstract data types (ADT), i.e., multiple different data structures can implement the same ADT. An ADT is a (theoretical) model of the behavior of a data type, e.g., a sequence, from the perspective of the user. You could think of it as a “contract” that a particular data structure must satisfy.

In Python, a string (str type) is a sequence, but so is a list and several other data structures we will learn about later. All of these data structures, i.e., both lists and strings, support the same core set of sequence operations (implement the same ADT).

What are some differences between lists and strings though?

  • A string is a sequence of characters, while a list is a sequence of elements of any type (i.e. a list can contain integers, strings, other lists, etc.)
  • A string is immutable while a list is mutable. As a result, string methods do not modify the string on which they are invoked; instead they create a new string with those modifications applied. In contrast, as we will see, some list methods modify the list on which they are invoked, and it is possible for us to re-assign individual items in a list.
  • Strings are created with single or double quotes, lists can be created with [] or with list(...).

[7, 4, 3, 6, 1, 2]
10 # not a list
[10] # a list with one element
l = [7, 4, 3, 6, 1, 2]
l
type(l)
list("abcd")
[7, 4, 3, 6, 1, 2]
10
[10]
[7, 4, 3, 6, 1, 2]
list
['a', 'b', 'c', 'd']

What happened in that example? Let’s check out the list function (really its “initializer”). A string is an iterable (iterates over each character), and so when we construct a list from a string, we get a list of the individual characters.

>>> help(list)
...
 list(iterable) -> new list initialized from iterable's items

As we noted, a list is an ordered collection of any type, e.g.,

a_valid_list = [1, 2.0, True, "string", [1]]
a_valid_list
[1, 2.0, True, 'string', [1]]

We can apply to lists the same indexing and slicing operators we used with strings, as well as the membership operator in.

list_of_strings = ["this", "is", "a", "list", "of", "strings"]
list_of_strings[0]
list_of_strings[0][1]
list_of_strings[1:4]
list_of_strings[::2]
"of" in list_of_strings
'this'
'h'
['is', 'a', 'list']
['this', 'a', 'of']
True
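
Because strings and lists share these sequence operations, a function written using only len and slicing should work on either. The sketch below illustrates this; first_half is a made-up name for the example. Note that slicing a string gives a string, while slicing a list gives a list.

```python
def first_half(seq):
    """
    Return the first half of a sequence (rounding down for odd lengths)

    Args:
        seq: Input sequence, e.g., a string or a list

    Returns:
        A new sequence of the same type with the first len(seq) // 2 elements
    """
    return seq[:len(seq) // 2]

print(first_half("sequence"))          # 'sequ'
print(first_half([7, 4, 3, 6, 1, 2]))  # [7, 4, 3]
```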

However, a key difference is that we can reassign items via the indexing operator, that is, indexing operations can appear on the left-hand side of an assignment operation. Notice that we reassign the last item in the list to be a different string. But if we attempt a similar reassignment with one of the strings itself, we get an error because strings are immutable. The ability to modify the contents of a list as our program executes is one of the key features (and uses) of the list data structure.

list_of_strings[-1] = "items"
list_of_strings
list_of_strings[0][1] = "d"
['this', 'is', 'a', 'list', 'of', 'items']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 3
      1 list_of_strings[-1] = "items"
      2 list_of_strings
----> 3 list_of_strings[0][1] = "d"

TypeError: 'str' object does not support item assignment
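
Since we cannot assign into a string, one way to get the effect we wanted is to build a new string from slices of the old one. The sketch below contrasts that with a list, which we can change in place. The variable names are illustrative, and upper and append are standard str and list methods used here as examples of "returns a new value" versus "modifies in place".

```python
word = "this"

# Strings are immutable, so we build a new string from slices instead
changed = word[:1] + "d" + word[2:]
print(changed)  # 'tdis'
print(word)     # 'this' (the original is unchanged)

# String methods also return a new string
shout = word.upper()
print(shout)    # 'THIS'
print(word)     # 'this'

# Lists are mutable: append modifies the list on which it is invoked
items = ["this", "is"]
items.append("mutable")
print(items)    # ['this', 'is', 'mutable']
```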

Much like strings, lists support the + operator for concatenation, * for duplication and can be used as the loop sequence with for loops:

for s in list_of_strings:
    print(s)
this
is
a
list
of
items
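
For completeness, here is a brief sketch of + and * applied to lists. As with strings, + only concatenates two values of the same type. The try/except is used just so the example runs to the end.

```python
# + concatenates two lists into a new list
print([1, 2] + [3, 4])  # [1, 2, 3, 4]

# * duplicates the contents of a list
print(["a", "b"] * 2)   # ['a', 'b', 'a', 'b']

# + is not defined between a list and a string
try:
    [1, 2] + "34"
except TypeError as err:
    print("TypeError:", err)
```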