Prelab 5 Notes

Here we discuss an interesting problem that has parallels with the work you are doing this week for Prelab 5 and Lab 5. There is nothing to turn in for this write-up – it is just for extra practice.

Allele Frequency

Genes have different versions, or variants, termed alleles. Humans are diploid organisms, that is we have two copies of each chromosome (excluding sex chromosomes), one inherited from our mother, the other from our father. Thus for the alleles ‘a’ and ‘A’, we can have three possible genotypes: ‘aa’, ‘aA’, ‘AA’, i.e., 0, 1, or 2 copies of the ‘A’ allele.

A very useful statistic is the allele frequency, that is how common is a given allele in the population. Assume you have a file containing a set of genotypes expressed in terms of allele counts, e.g., 0, 1 or 2. Write a function allele_freq that takes a filename as a parameter and returns the allele frequency.

How can we do so? As always we want to decompose our problem into smaller pieces. Here we can decompose our problem into 1) reading the file into a list of integers, and 2) computing the allele frequency from that list.

For part one, recall that we can read a file line by line with a for loop. Each line will be a string, and thus needs to be converted to an integer with the int function.

def read_file(filename):
    with open(filename, "r") as file:
        counts = []
        for line in file:
            counts.append(int(line.strip()))
        return counts

The frequency computation involves a sum of the number of alleles divided by the total possible number of alleles (recall 2 per person, i.e., per line). Recall that we can use the built-in sum function (no explicit loop needed).

def freq(counts):
    return sum(counts) / (2 * len(counts))

And putting it all together:

def allele_freq(filename):
    counts = read_file(filename)
    return freq(counts)

Running this code on an example file prelab5-alleles.txt containing

0
1
1
1
1
1
1
1
1
2

computes the expected allele frequency of 0.5

>>> allele_freq("prelab5-alleles.txt")
0.5