for and while loops
Genes have different versions, or variants, termed alleles. These different alleles can be associated with different traits, e.g. whether you taste certain chemicals as bitter. Population genetics is the study of how evolutionary processes affect the frequencies of alleles in a population. For example, if a population starts with a mixture of two alleles and neither allele has an advantage over the other, then one of the alleles will eventually disappear and the other will be present in 100% of the individuals (described as becoming fixed in the population).
To convince ourselves of this phenomenon, we are going to create a simple simulation of a haploid organism (just one copy of each chromosome) that has two alleles ‘a’ and ‘A’. We will represent our population of size n with a string of length n containing the letters ‘a’ and ‘A’. We will then simulate each new generation by randomly sampling from the current population with replacement to create a new population of the same size. Finally, we want to return the number of generations required for one of the alleles to become fixed.
As always we want to decompose our problem into smaller problems that are easier to solve and thus build up the solution piece-by-piece. How could we break this problem into a set of functions that solve smaller problems and what semantic tools are needed for those functions?
Write a function named next_gen that takes the current generation as a parameter and returns the next generation.
What semantic tools are needed here? Likely a for loop, a way to randomly sample from a string, and the string accumulation pattern.
Write a function named pop_sim that takes an initial population as a string and returns the number of generations required for fixation.
What semantic tools are needed here? A loop to iterate over the generations, and a way to detect whether both alleles are present in the string.
I would suggest we start with next_gen and then implement pop_sim. next_gen has a single string parameter, pop, the current population, and returns the new population, a string of the same size. An example would be:
>>> pop = "AAAAaAaAAA"
>>> next_gen(pop)
'aAAaAAaAAA'
What “pattern” will this function likely take? We could use the string construction pattern we used in Lab 3, in which we build strings up character by character in a for loop. What would the body of that loop look like?
Recall our pattern looks something like
def next_gen(pop):
    next_pop = ""
    for i in range(len(pop)):
        next_pop += ...
    return next_pop
Here we want to randomly sample from pop with replacement. As we did before, we can use indexing and randint, e.g. pop[randint(0, len(pop)-1)]. As you might imagine, this is a very common task, and so the random module has a choice function to do exactly this kind of sampling. The choice function randomly selects one element from a non-empty sequence. I suspect you will find choice helpful in Lab 4 (and beyond). Putting it all together (with a docstring):
from random import choice

def next_gen(pop):
    """
    Generate the next generation by randomly sampling from the
    current population

    Args:
        pop: Current population as a string

    Returns:
        Next generation as a string
    """
    next_pop = ""
    for i in range(len(pop)):
        # choice picks one character at random; repeats are allowed
        next_pop += choice(pop)
    return next_pop
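The two sampling approaches described above should be interchangeable. As a minimal sketch, the following writes both versions side by side and prints a check that each result has the same length as the parent population. The names next_gen_randint and next_gen_choice are illustrative only and are not part of the lab, and the seed value is arbitrary.

```python
from random import choice, randint, seed


def next_gen_randint(pop):
    """Sample with replacement by picking a random valid index."""
    next_pop = ""
    for i in range(len(pop)):
        # randint is inclusive on both ends, hence len(pop) - 1
        next_pop += pop[randint(0, len(pop) - 1)]
    return next_pop


def next_gen_choice(pop):
    """Sample with replacement using random.choice."""
    next_pop = ""
    for i in range(len(pop)):
        next_pop += choice(pop)
    return next_pop


seed(0)  # arbitrary seed, only so repeated runs give the same strings
pop = "AAAAaAaAAA"
for sampler in (next_gen_randint, next_gen_choice):
    child = sampler(pop)
    # Print the sampled generation and whether its length matches the parent
    print(sampler.__name__, child, len(child) == len(pop))
```

Either version can only produce letters that already occur in pop, which is why an allele that disappears can never come back in this simulation.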
Now let’s turn to pop_sim, which has a single string parameter, pop, the initial population, and returns the number of generations till fixation. We need a loop to generate the successive generations, but do we know how many generations we will have to simulate? No. So which type of loop will we need? A while loop. Why could we use a for loop for next_gen, but not here?
In next_gen, we know the number of iterations (the size of the population or the length of the string) and thus could (and should) use a for loop.
When should the loop in pop_sim terminate? When one allele becomes fixed.
Alternately, when should the loop keep executing? As long as both alleles are
present in the population, i.e. both “a” and “A” in the population string. How
can we express that as a conditional statement?
The in operator returns True if its left-hand operand is present in its right-hand operand. We can combine two such tests with and:
"a" in pop and "A" in pop
Thus our loop looks like:
while "a" in pop and "A" in pop:
    pop = next_gen(pop)
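To see how this loop condition behaves, here is a small sketch that evaluates it on three hand-picked populations. The helper name both_present is hypothetical and introduced only for this illustration. Note that in is case-sensitive for strings, so "a" and "A" are tested separately.

```python
def both_present(pop):
    """Return True while neither allele has become fixed."""
    return "a" in pop and "A" in pop


print(both_present("AAaA"))  # True: both alleles are still present
print(both_present("AAAA"))  # False: 'A' is fixed
print(both_present("aaaa"))  # False: 'a' is fixed
```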
If we want to keep track of the number of generations, we need to count how
many times the loop executes. For for loops, that is always a known quantity.
For while loops, we will need to introduce a counter variable that is
incremented each time the loop executes. Putting it all together…
def pop_sim(pop):
    """
    Simulate allele fixation in a population

    Args:
        pop: Initial population as a string

    Returns:
        Integer number of generations needed to achieve fixation
    """
    generations = 0
    # Keep sampling new generations while both alleles are present
    while "a" in pop and "A" in pop:
        pop = next_gen(pop)
        generations += 1
    return generations
Check out a full implementation including a function to generate an initial population.
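The linked implementation is not reproduced here. As a rough sketch of how the pieces might be used together, the following assumes a hypothetical helper, init_pop, that builds a starting population from a given count of each allele, and then averages the fixation time over several runs. The helper's name and signature, the seed, the population size, and the number of trials are all assumptions made for illustration.

```python
from random import choice, seed


def init_pop(num_a, num_A):
    """Hypothetical helper: build a population string from allele counts."""
    return "a" * num_a + "A" * num_A


def next_gen(pop):
    """Generate the next generation by sampling pop with replacement."""
    next_pop = ""
    for i in range(len(pop)):
        next_pop += choice(pop)
    return next_pop


def pop_sim(pop):
    """Return the number of generations until one allele is fixed."""
    generations = 0
    while "a" in pop and "A" in pop:
        pop = next_gen(pop)
        generations += 1
    return generations


seed(42)  # arbitrary seed so the run is repeatable
trials = 100
total = 0
for i in range(trials):
    # Each trial starts from the same 50/50 mix of the two alleles
    total += pop_sim(init_pop(10, 10))
print("Mean generations to fixation:", total / trials)
```

A population that starts out already fixed, such as pop_sim("AAAA"), returns 0 because the loop body never runs.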