Here we discuss an interesting problem that has parallels with the work you are doing this week for Prelab 4 and Lab 4. There is nothing to turn in for this write-up – it is just for extra practice.
Genes have different versions, or variants, termed alleles. These different alleles can be associated with different traits, e.g., whether you taste certain chemicals as bitter. Population genetics is the study of how evolutionary processes affect the frequencies of alleles in a population. For example, if a population starts with a mixture of two alleles and neither allele has an advantage over the other, then, by chance alone, one of the alleles will eventually disappear and the other will be present in 100% of the individuals (described as becoming fixed in the population).
To convince ourselves of this phenomenon, we are going to create a simple simulation of a haploid organism (just one copy of each chromosome) that has two alleles, ‘a’ and ‘A’. We will represent our population of size n with a string of length n containing the letters ‘a’ and ‘A’. We will then simulate each new generation by randomly sampling (with replacement) from the current population to create a new population of the same size. Finally, we want to return the number of generations required for one of the alleles to become fixed.
As always, we want to decompose our problem into smaller problems that are easier to solve and thus build up the solution piece by piece. One way to decompose the problem:
Write a function named next_generation that takes the current generation as a parameter and returns the next generation.
What semantic tools are needed here? Likely a for loop, a way to randomly sample from a string, and the string accumulation pattern.
Write a function named simulate_population that takes an initial population as a string and returns the number of generations required for fixation.
What semantic tools are needed here? A loop to iterate over the generations. However, we don’t know how many generations will be required, so we will need a while loop. We will also need a way to detect whether both alleles are present in the string.
If you want to write a solution, we suggest you start with next_generation and then implement simulate_population.
The first function is a familiar application of the string construction pattern we used in Lab 3 (in which we build strings up character by character). One new feature is the choice function from the random module. This function randomly selects one element from a non-empty sequence. It effectively implements the very common operation seq[randint(0, len(seq)-1)]. You will likely find choice helpful in Lab 4 (and beyond).
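As an illustration of how choice fits the string accumulation pattern, here is a minimal sketch of next_generation. It is one possible version written for this note, not necessarily the one used in the course solution, and the variable name new_population is an assumption.

```python
import random


def next_generation(current):
    """Return a new population of the same size, sampled with replacement."""
    new_population = ""  # string accumulator (name chosen for this sketch)
    for _ in range(len(current)):
        # choice picks one character uniformly at random from the string
        new_population += random.choice(current)
    return new_population


print(next_generation("aaAA"))  # a 4-letter string of 'a' and 'A'; varies between runs
```

Because each individual is drawn independently from the whole current population, the same parent can be picked more than once, which is what lets allele frequencies drift from one generation to the next.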
Let’s focus on simulate_population. We need a loop to generate the successive generations, but we don’t know how many generations, and thus loop iterations, will be required, so we will need to use a while loop. In contrast, in next_generation we know the number of iterations (the size of the population, i.e., the length of the string) and thus could (and should) use a for loop.
When should the loop in simulate_population terminate? When one allele becomes fixed. Alternatively, when should the loop keep executing? As long as both alleles are present in the population, i.e., both “a” and “A” are in the population string. How can we express that as a Boolean expression?
    "a" in population and "A" in population
The in operator returns True if its left-hand operand is present in its right-hand operand.
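To see how this expression behaves, the short sketch below wraps it in a helper function and tries it on a mixed population and on two fixed ones. The helper name both_alleles_present is hypothetical and used only for this illustration.

```python
def both_alleles_present(population):
    """Return True while neither allele has become fixed."""
    return "a" in population and "A" in population


print(both_alleles_present("aAaA"))  # True: both letters occur
print(both_alleles_present("AAAA"))  # False: "a" has disappeared
print(both_alleles_present("aaaa"))  # False: "A" has disappeared
```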
Thus our loop looks like:
while "a" in population and "A" in population:
    population = next_generation(population)
If we want to keep track of the number of generations, we need to count how many times the loop executes. For a for loop, that count is known in advance. For a while loop, we need to introduce a counter variable that is incremented each time the loop executes.
generations = 0
while "a" in population and "A" in population:
    population = next_generation(population)
    generations += 1
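For experimenting with the counter pattern, the following self-contained sketch places the loop inside simulate_population and returns the count. It assumes a next_generation built on random.choice as described earlier; treat both functions as one possible version rather than the reference solution.

```python
import random


def next_generation(current):
    """Sample a same-sized population from the current one (assumed helper)."""
    new_population = ""
    for _ in range(len(current)):
        new_population += random.choice(current)
    return new_population


def simulate_population(population):
    """Return the number of generations until one allele is fixed."""
    generations = 0  # counter for loop iterations
    while "a" in population and "A" in population:
        population = next_generation(population)
        generations += 1
    return generations


print(simulate_population("AAAA"))  # 0: already fixed, so the loop body never runs
print(simulate_population("aaaaAAAA"))  # at least 1; varies between runs
```

Note that the counter starts at 0, so a population that is already fixed correctly reports zero generations.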
Here is an implementation.