Class 8

While Loops

Objectives for today

  • Use relational operators to compare values
  • Describe the execution of for and while loops and differentiate between them
  • Evaluate a computational problem and appropriately choose a for or while loop

Non-boolean types that can be used as booleans

Will the following print statement get executed?

if "a string?":
    print("Do I get executed?")

Yes. Most values can be used in a boolean context. In Python, 0, 0.0, None, empty sequences (e.g. "") and a few other values evaluate as False (often termed “falsy”) and everything else is True (often termed “truthy”).
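One way to check how a particular value behaves in a boolean context is to pass it to the built-in bool function. A minimal sketch (the specific values are chosen just for illustration):

```python
# bool() reports how a value is treated in a boolean context
falsy_values = [0, 0.0, None, ""]
truthy_values = [5, -1, 0.1, "a string?", " ", "0"]

for value in falsy_values:
    print(repr(value), "->", bool(value))   # every line ends in False

for value in truthy_values:
    print(repr(value), "->", bool(value))   # every line ends in True
```

Note that " " (a space) and "0" are non-empty strings and so are truthy, even though they may "look" empty or zero.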

In general, I don’t recommend using this “implicit” type conversion as it just increases the chances for difficult-to-find bugs.

Now that we know about implicit booleans, though, we can resolve a common bug. In a previous in-class question, the solution was a == b or a == 5. Can we simplify that expression as a == b or 5 (i.e., is == distributive)? No, that expression is evaluated as (a == b) or 5, which is not the same: when a == b is False the result is 5, which is truthy, so the expression is always truthy. What about a == (b or 5)? No, equality is not distributive. If b is truthy, that expression simplifies to a == b; if not, it simplifies to a == 5. For example, if a and b are both 0, we should get True, but will get False.
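We can confirm this reasoning in the shell. A small sketch, using a and b both equal to 0 as the test case (the variable names intended, missing_rhs and grouped are just labels for this example):

```python
a = 0
b = 0

intended = a == b or a == 5     # (a == b) or (a == 5)
missing_rhs = a == b or 5       # (a == b) or 5
grouped = a == (b or 5)         # b is falsy, so this is a == 5

print(intended)      # True
print(missing_rhs)   # True
print(grouped)       # False

# When a and b differ, the second form evaluates to 5, not False
a = 1
print(a == b or 5)   # 5
```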

Short circuit evaluation

Consider the expression a and b. If a evaluates to False, do I need to evaluate b? Similarly, in a or b, if a evaluates to True, do I need to evaluate b?

No. Python (and many other languages) “short-circuit” the evaluation of logical expressions. Doing so can both improve efficiency (we don’t perform computations we don’t need) and help us manage potentially problematic situations. For example in the following, we only ever execute dangerous_operation if is_valid(input) evaluates to True.

is_valid(input) and dangerous_operation(input)
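To see short-circuiting happen, we can record which functions actually get called. In this sketch, is_valid and dangerous_operation are stand-in functions invented for the example (a "valid" input is assumed to be a positive number), and calls is a list used only for bookkeeping:

```python
calls = []  # records which functions actually ran

def is_valid(value):
    calls.append("is_valid")
    return value > 0

def dangerous_operation(value):
    calls.append("dangerous_operation")
    return 100 / value  # would raise ZeroDivisionError if value were 0

# is_valid(0) is False, so dangerous_operation(0) is never called
result = is_valid(0) and dangerous_operation(0)
print(result)   # False
print(calls)    # ['is_valid']

# With a valid input both functions run, and the result is the right operand
print(is_valid(4) and dangerous_operation(4))   # 25.0
```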

Comparing strings and other non-numeric types

We can also apply relational operators, i.e. <, ==, etc., to other types; most notably strings. For example:

'Aardvark' < 'Zebra'
'aardvark' < 'Zebra'
True
False

That second one doesn’t seem to make much sense… Just as + has a different meaning for strings than for integers, < has a meaning specific to strings. Python compares strings using lexicographic ordering, i.e., it compares corresponding characters in order. The first characters are compared; if they are equal, the second characters are compared, and so on. If one string is a prefix of the other, the shorter string is lexicographically less, e.g.,

>>> "abc" < "abcdef"
True

This is not the same as a case-insensitive alphabetical ordering. When two letters are being compared, that comparison is based on their underlying numeric encoding. In the character encoding used by Python (and lots of other software), all upper case letters are less than lower case letters. Hence “aardvark” is “greater than” “Zebra”. You can access this numerical encoding with the ord function, e.g.,

ord("A")
ord("a")
65
97
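To make the comparison rule concrete, here is a rough sketch of how we could implement < for strings ourselves using ord. The function name lex_less is made up for this example; in practice we would just use the built-in < operator.

```python
def lex_less(first, second):
    """Return True if first comes before second lexicographically."""
    for i in range(min(len(first), len(second))):
        if first[i] != second[i]:
            # The first differing pair of characters decides the result
            return ord(first[i]) < ord(second[i])
    # No difference found: a string is less only if it is a shorter prefix
    return len(first) < len(second)

print(lex_less("Aardvark", "Zebra"))   # True
print(lex_less("aardvark", "Zebra"))   # False
print(lex_less("abc", "abcdef"))       # True
```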

Some more examples:

"test" == "test"
"test" == "testing"[0:4]
"test" == "TEST"
True
True
False

If we wanted to ensure that a string comparison was case insensitive, how could we do so? Use the upper or lower methods to convert both strings to a consistent case before comparing.
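For example, a small helper along these lines would work (the name same_ignoring_case is just for illustration):

```python
def same_ignoring_case(first, second):
    """Return True if the two strings are equal when case is ignored."""
    return first.lower() == second.lower()

print(same_ignoring_case("test", "TEST"))    # True
print(same_ignoring_case("test", "Tests"))   # False

# The same idea applies to ordering
print("aardvark".lower() < "Zebra".lower())  # True
```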

while loops

We previously used for loops to execute a block of code for a specific number of repetitions. What if we don’t know how many iterations are needed? What might be such a situation? Obtaining a valid input from a user. We don’t know how many tries it will take someone to provide a valid input.

This is where we apply while loops. The general structure of a while loop is:

while <bool expression>:
    statement1
    statement2
    ...
    statementn

The statements in the loop body (i.e. statement1 … statementn) will be executed repeatedly until the boolean expression, the loop conditional, evaluates to False.

Here is a concrete example. What will this code print?

x = 5
while x > 0:
    print(x)
    x = x-1
5
4
3
2
1

What is the necessary implication of while loops? That some statement(s) within the body of the while loop will change the loop conditional (or otherwise terminate the loop). What happens if that is not the case, e.g.

while True:
    print("How many times will this string get printed?")

Hit Ctrl-C (Ctrl and C simultaneously) or the Thonny “stop sign” button to stop execution. This is called an infinite loop and is a common problem. You will likely need to use Ctrl-C or the stop sign button at some point.

What about the following loop? Will it terminate?

i = 0
while i < 10:
    print("How many times will this string get printed?")
    i = i - 1

No. Because i starts at 0 and only gets smaller it will always be less than 10. What about this loop?

from random import randint
i = randint(1, 20)
while i < 10:
    print("How many times will this string get printed?")
    i = i + 1

Yes. If the value is less than 10, the loop will terminate after some number of iterations. If the value is greater than or equal to 10, the loop won’t execute at all.

In addition to changing the loop conditional we can also explicitly terminate the loop with the break statement. As its name suggests, break terminates the loop and begins executing the first statement after the loop. Although break is most commonly used with while loops, it can also be used to end for loops “early” (i.e., only perform a subset of the iterations).
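Here is a short sketch of both uses of break. The first loop is the earlier countdown rewritten with while True; the second ends a for loop early (the word "rhythmic" is just a convenient example):

```python
# A countdown written with `while True` and an explicit break
x = 5
while True:
    if x <= 0:
        break          # leave the loop; execution continues after it
    print(x)
    x = x - 1

# Ending a for loop early: stop at the first vowel
word = "rhythmic"
first_vowel = ""
for letter in word:
    if letter in "aeiou":
        first_vowel = letter
        break          # skip the remaining letters
print(first_vowel)     # i
```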

Random Guessing Game

Let’s implement a guessing game in which Python picks a random integer from 1-20 (inclusive), and keeps asking the user to guess the number until they get it right. To help the user, the game should give hints “higher”, or “lower”.

What would be some key elements in such a program?

  • Generate a random integer
  • Accept input from the user
  • Conditional to check if guess was correct, and print success or hint messages
  • Loop that repeatedly prompts user for input and performs the above comparison

Check out guessing_game.py. Here we see 3 strategies for implementing the loop:

  1. Use a boolean variable correct to track if the user answered correctly
  2. break out of a “while True” loop on a correct answer
  3. Compare the user response to the desired answer in the loop conditional

Additionally, note the use of the input function for reading user inputs. input prints its prompt argument then waits for and returns the string the user typed before hitting Enter/Return. input returns a string and so we need to convert the result to an int for use in our game.
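As a rough sketch of strategy 2 (this is not the contents of guessing_game.py, and the names hint and play are invented for this example), the game might look something like:

```python
from random import randint

def hint(guess, answer):
    """Return the message to show for one guess (hypothetical helper)."""
    if guess < answer:
        return "higher"
    elif guess > answer:
        return "lower"
    else:
        return "correct"

def play():
    """Keep prompting until the user guesses the number."""
    answer = randint(1, 20)
    while True:
        # input returns a string, so convert it to an int before comparing
        guess = int(input("Guess a number from 1-20: "))
        message = hint(guess, answer)
        print(message)
        if message == "correct":
            break

# play()  # uncomment to play interactively
```

Separating the comparison into its own function is a choice made here so that it can be tried out in the shell without typing any input, e.g. hint(3, 10).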

help(input)
Help on method raw_input in module ipykernel.kernelbase:

raw_input(prompt='') method of ipykernel.ipkernel.IPythonKernel instance
    Forward raw_input to frontends
    
    Raises
    ------
    StdinNotImplementedError if active frontend doesn't support stdin.

while vs for loops

Often we can implement the same functionality with both a for loop and a while loop. In fact, in Python any for loop can be readily implemented with a while loop. The reverse is trickier. There are tricks that would enable us to make a for loop behave like a while loop in some situations, but they are exactly that – tricks. So for our purposes we should think of while loops as a superset of for loops.

As an example, how could we write a for loop to print the even numbers from 1-10 inclusive?

for i in range(2, 11, 2):
    print(i)
2
4
6
8
10

And the same with a while loop?

i = 2
while i <= 10:
    print(i)
    i += 2
2
4
6
8
10

We can see the clear correspondence between the while loop and the for loop. Here we effectively re-implemented the range within the while loop by setting the initial value, the end condition and the increment. What are some other ways we could have achieved the same result? One is to iterate through all integers in the range [1,10], but use an if statement, e.g., if i % 2 == 0: to identify and print the even values.
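That alternative might look like the following sketch, shown with both kinds of loop. The lists for_evens and while_evens are only there to collect the values so the two versions can be compared.

```python
# for loop over every integer in [1, 10], keeping only the even values
for_evens = []
for i in range(1, 11):
    if i % 2 == 0:
        print(i)
        for_evens.append(i)

# The same idea with a while loop: the increment is now 1, not 2
while_evens = []
i = 1
while i <= 10:
    if i % 2 == 0:
        while_evens.append(i)
    i += 1

print(for_evens == while_evens)   # True
```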

When to use for vs. while?

So when do we use a for loop and when do we use a while loop?

We use a for loop when

  • The number of iterations is known before the loop and does not depend on the statements in the loop body. That doesn’t mean the number of iterations is a fixed number (like the 4 sides of a square), just that we know the number of iterations before the loop starts.
  • The increment to the loop variable is the same on every iteration

For example, when iterating over all the elements in a sequence, e.g. a string, the number of iterations is known at the initiation of the loop (the number of elements in the sequence) and the increment is consistent (advance one element each iteration).

We would use a while loop in other settings, such as

  • The number of iterations is not known beforehand
  • The increments to loop variable are different, e.g. depend on a computation in the loop body
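As an illustration of both conditions at once, consider rolling a die until the running total reaches some target. This example is not from the in-class materials; the function name rolls_to_reach is made up. We can't know the number of rolls beforehand, and the total grows by a different amount each iteration.

```python
from random import randint

def rolls_to_reach(target):
    """Count the die rolls needed for the running total to reach target."""
    total = 0
    rolls = 0
    while total < target:
        total += randint(1, 6)   # the increment differs from roll to roll
        rolls += 1
    return rolls

print(rolls_to_reach(20))   # somewhere between 4 and 20, varies per run
```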

This is an example where “style” matters but there are not necessarily clear “rules”. Often one approach or the other is more appropriate. The right choice will make the code more elegant and easier to reason about (and to debug).

while loops in action

Simulating population genetics

Genes have different versions, or variants, termed alleles. These different alleles can be associated with different traits, e.g. whether you taste certain chemicals as bitter. Population genetics is the study of how evolutionary processes affect the frequencies of alleles in the population. For example, if a population starts with a mixture of two alleles and if there is no advantage for one allele over the other, then one of the alleles will eventually disappear and the other will be present in 100% of the individuals (described as becoming fixed in the population).

To convince ourselves of this phenomenon, we are going to create a simple simulation of a haploid organism (just one copy of each chromosome) that has two alleles ‘a’ and ‘A’. We will represent our population of size n with a string of length n containing the letters ‘a’ and ‘A’. We will then simulate each new generation by randomly sampling from the current population with replacement to create a new population of the same size. We then want to return the number of generations required for one of the alleles to become fixed.

As always we want to decompose our problem into smaller problems that are easier to solve and thus build up the solution piece-by-piece. How could we break this problem into a set of functions that solve smaller problems and what semantic tools are needed for those functions?

  1. Write a function named next_gen that takes the current generation as a parameter and returns the next generation.

    What semantic tools are needed here? Likely a for loop, a way to randomly sample from a string, and the string accumulation pattern.

  2. Write a function named pop_sim that takes an initial population as a string and returns the number of generations required for fixation.

    What semantic tools are needed here? A loop to iterate over the generations. And a way to detect if both alleles are present in the string.

Let’s start with next_gen and then implement pop_sim. next_gen has a single string parameter, pop, the current population, and returns the new population, a string of the same size. An example would be:

>>> pop = "AAAAaAaAAA"
>>> next_gen(pop)
'aAAaAAaAAA'

What “pattern” will this function likely take? We could use the string construction pattern we used in PA3 in which we build strings up character by character in a for loop. In this context, the pattern might look like:

def next_gen(pop):
    next_pop = ""
    for i in range(len(pop)):
        next_pop += ...
    return next_pop

Here we want to randomly sample from pop with replacement. As we did before, we can use indexing and randint, e.g. pop[randint(0, len(pop)-1)]. As you might imagine, this is a very common task, and so the random module has a choice function to do exactly this kind of sampling. The choice function randomly selects one element from a non-empty sequence. I suspect you will find choice helpful in PA4 (and beyond).
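A quick side-by-side of the two approaches (the variable names by_index and by_choice are just for this sketch). Both pick one character from pop at random, so the printed values will vary from run to run:

```python
from random import choice, randint

pop = "AAAAaAaAAA"

# Pick a random valid index, then look up the character at that index
by_index = pop[randint(0, len(pop) - 1)]

# choice does the same sampling in a single call
by_choice = choice(pop)

print(by_index, by_choice)   # each is either 'A' or 'a'
```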

from random import choice

def next_gen(pop):
    """
    Generate the next generation by randomly sampling from the
    current population
    
    Args:
        pop: Current population as a string
    
    Returns:
        Next generation as a string
    """
    next_pop = ""
    for i in range(len(pop)):
        # Sample one individual (with replacement) from the current population
        next_pop += choice(pop)
    return next_pop

Now let’s turn to pop_sim, which has a single string parameter, pop, the initial population, and returns the number of generations until fixation. We need a loop to generate the successive generations, but do we know how many generations we will have to simulate? No. So what kind of loop should we use?

A while loop. In the previous next_gen function, we know the number of loop iterations (the size of the population, i.e., the length of the string) and thus could (and should) use a for loop. Here we don’t know the number of generations required to reach fixation and so need to use a while loop.

When should the loop in pop_sim terminate? When one allele becomes fixed. Alternately, when should the loop keep executing? As long as both alleles are present in the population, i.e., both “a” and “A” are in the population string.

The in operator returns True if its left-hand side is present in the right-hand side operand.

"a" in pop and "A" in pop

What do you want to do each iteration? Here, each iteration is a new generation, that is we want to simulate the next generation resulting from the current generation, or the new pop resulting from the existing pop. Recall we already implemented a function next_gen that generates a new population from an existing population. Let’s use it here.

while "a" in pop and "A" in pop:
    pop = next_gen(pop)
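To check that the loop condition behaves as intended, we can try it on a few populations. In this sketch the condition is wrapped in a function, both_alleles_present, purely so it is easy to experiment with; that name is not part of the assignment.

```python
def both_alleles_present(pop):
    """Return True while neither allele has become fixed."""
    return "a" in pop and "A" in pop

print(both_alleles_present("AAAAaAaAAA"))   # True  (keep looping)
print(both_alleles_present("AAAAAAAAAA"))   # False ('A' is fixed)
print(both_alleles_present("aaaaaaaaaa"))   # False ('a' is fixed)
```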

What is the ultimate return value? We want to return the number of generations, i.e., the number of loop iterations. To keep track of the number of generations, we need to count how many times the loop executes. With for loops that is always a known quantity. With while loops we will need to introduce a “counter” variable that is incremented each time the loop executes.

def pop_sim(pop):
    """
    Simulate allele fixation in a population
    
    Args:
        pop: Initial population as a string
    
    Returns:
        Integer number of generations needed to achieve fixation
    """
    generations = 0
    while "a" in pop and "A" in pop:
        pop = next_gen(pop)
        generations += 1
    return generations

Check out a full implementation including a function to generate an initial population.
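For reference, here is one way the pieces could fit together. This is a sketch, not necessarily the linked implementation: the helper init_pop and its behavior (each individual is equally likely to be 'a' or 'A') are assumptions made for this example, and the docstrings are abbreviated.

```python
from random import choice, seed

def init_pop(n):
    """Hypothetical helper: build a random population of size n."""
    pop = ""
    for i in range(n):
        pop += choice("aA")   # assumed: both alleles equally likely
    return pop

def next_gen(pop):
    """Sample the next generation from pop with replacement."""
    next_pop = ""
    for i in range(len(pop)):
        next_pop += choice(pop)
    return next_pop

def pop_sim(pop):
    """Return the number of generations until one allele is fixed."""
    generations = 0
    while "a" in pop and "A" in pop:
        pop = next_gen(pop)
        generations += 1
    return generations

seed(8)  # optional: makes the random choices repeatable from run to run
start = init_pop(20)
print(start)
print(pop_sim(start))
```

Removing the call to seed will produce a different starting population, and likely a different number of generations, each time the program runs.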

Adapted from (Libeskind-Hadas and Bush 2014).

Libeskind-Hadas, Ran, and Eliot Bush. 2014. Computing for Biologists: Python Programming and Principles. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781107337510.