Genes have different versions, or variants, termed alleles. Different alleles can be associated with different traits, e.g. whether you taste certain chemicals as bitter. Population genetics is the study of how evolutionary processes affect the frequencies of alleles in a population. For example, if a population starts with a mixture of two alleles and neither allele has an advantage over the other, then one of the alleles will eventually disappear and the other will be present in 100% of the individuals (described as becoming fixed in the population).
To convince ourselves of this phenomenon, we are going to create a simple simulation of a haploid organism (just one copy of each chromosome) that has two alleles ‘a’ and ‘A’. We will represent our population of size n with a string of length n containing the letters ‘a’ and ‘A’. We will then simulate each new generation by randomly sampling from the current population with replacement to create a new population of the same size. We then want to return the number of generations required for one of the alleles to become fixed.
As always we want to decompose our problem into smaller problems that are easier to solve and thus build up the solution piece-by-piece. How could we break this problem into a set of functions that solve smaller problems and what semantic tools are needed for those functions?
Write a function named next_gen that takes the current generation as a parameter and returns the next generation. What semantic tools are needed here? Likely a for loop, a way to randomly sample from a string, and the string accumulation pattern.

Write a function named pop_sim that takes an initial population as a string and returns the number of generations required for fixation. What semantic tools are needed here? A loop to iterate over the generations, and a way to detect if both alleles are present in the string.
I would suggest we start with next_gen and then implement pop_sim.

next_gen has a single string parameter, pop, the current population, and returns the new population, a string of the same size. An example would be:
    >>> pop = "AAAAaAaAAA"
    >>> next_gen(pop)
    'aAAaAAaAAA'
What “pattern” will this function likely take? We could use the string construction pattern we used in Lab 3, in which we build strings up character by character in a for loop. What would the body of that loop look like?
Recall our pattern looks something like
    def next_gen(pop):
        next_pop = ""
        for i in range(len(pop)):
            next_pop += ...
        return next_pop
Here we want to randomly sample from pop with replacement. As we did before, we can use indexing with randint: pop[randint(0, len(pop)-1)]. As you might imagine, this is a very common task, and so the random module has a choice function to do exactly this kind of sampling. The choice function randomly selects one element from a non-empty sequence. I suspect you will find choice helpful in Lab 4 (and beyond). Putting it all together (with a docstring):
    from random import choice

    def next_gen(pop):
        """
        Generate the next generation by randomly sampling from the current population

        Args:
            pop: Current population as a string

        Returns:
            Next generation as a string
        """
        next_pop = ""
        for i in range(len(pop)):
            # Sample one individual, with replacement, from the current population
            next_pop += choice(pop)
        return next_pop
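As a quick check that the two sampling approaches are interchangeable, the sketch below draws alleles both ways. The helper names sample_by_index and sample_by_choice are made up for illustration; randint and choice are the standard random module functions. The seed is fixed only so that repeated runs give the same draws.

```python
from random import choice, randint, seed

def sample_by_index(pop):
    """Pick one allele via a random index (randint includes both endpoints)."""
    return pop[randint(0, len(pop) - 1)]

def sample_by_choice(pop):
    """Pick one allele via choice, which handles the indexing for us."""
    return choice(pop)

seed(0)  # assumption: seeded only to make the demo repeatable
pop = "AAAAaAaAAA"

by_index = ""
by_choice = ""
for i in range(len(pop)):
    by_index += sample_by_index(pop)
    by_choice += sample_by_choice(pop)

# Both strings have the same length as pop and contain only 'a' and 'A'
print(by_index, by_choice)
```

Either helper could serve as the loop body in next_gen; choice simply says more directly what we mean.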
Now let’s turn to pop_sim, which has a single string parameter, pop, the initial population, and returns the number of generations until fixation. We need a loop to generate the successive generations, but do we know how many generations we will have to simulate? No. So which type of loop will we need? A while loop. Why could we use a for loop for next_gen, but not here? In next_gen, we know the number of iterations (the size of the population, i.e. the length of the string) and thus could (and should) use a for loop.

When should the loop in pop_sim terminate? When one allele becomes fixed.
Alternatively, when should the loop keep executing? As long as both alleles are present in the population, i.e. both “a” and “A” are in the population string. How can we express that as a conditional statement? The in operator returns True if its left-hand operand is present in its right-hand operand.

    "a" in pop and "A" in pop
Thus our loop looks like:
    while "a" in pop and "A" in pop:
        pop = next_gen(pop)
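To see how this loop condition behaves, here is a small check on one mixed and two fixed populations. The function name both_alleles_present is a hypothetical helper introduced only for this illustration.

```python
def both_alleles_present(pop):
    """Return True only if the population still contains both 'a' and 'A'."""
    return "a" in pop and "A" in pop

print(both_alleles_present("AAAAaAaAAA"))  # True: mixed population, keep looping
print(both_alleles_present("AAAAAAAAAA"))  # False: 'A' is fixed
print(both_alleles_present("aaaaaaaaaa"))  # False: 'a' is fixed
```

Note that the in operator is case-sensitive on strings, which is what lets us tell the two alleles apart.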
If we want to keep track of the number of generations, we need to count how many times the loop executes. For for loops that is always a known quantity. For while loops we will need to introduce a counter variable that is incremented each time the loop executes. Putting it all together:
    def pop_sim(pop):
        """
        Simulate allele fixation in a population

        Args:
            pop: Initial population as a string

        Returns:
            Integer number of generations needed to achieve fixation
        """
        generations = 0
        # Keep simulating as long as both alleles are still present
        while "a" in pop and "A" in pop:
            pop = next_gen(pop)
            generations += 1
        return generations
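To try the two functions together, we need a starting population. The sketch below is one possible way to run the simulation end to end. The init_pop helper and its freq_a parameter are assumptions made for this example, not necessarily how the course's full implementation builds its initial population. The result varies from run to run unless the seed is fixed.

```python
from random import choice, random, seed

def init_pop(n, freq_a=0.5):
    """Hypothetical helper: build a population of size n where each
    individual is 'a' with probability freq_a and 'A' otherwise."""
    pop = ""
    for i in range(n):
        if random() < freq_a:
            pop += "a"
        else:
            pop += "A"
    return pop

def next_gen(pop):
    """Sample with replacement from pop to create the next generation."""
    next_pop = ""
    for i in range(len(pop)):
        next_pop += choice(pop)
    return next_pop

def pop_sim(pop):
    """Count the generations until one allele is fixed."""
    generations = 0
    while "a" in pop and "A" in pop:
        pop = next_gen(pop)
        generations += 1
    return generations

seed(42)  # assumption: seeded only so the demo is repeatable
start = init_pop(20)
print(start, pop_sim(start))
```

A population that starts out already fixed, such as "AAAA", needs zero generations, so the loop body never runs.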
Check out a full implementation including a function to generate an initial population.