You may work on the problems together, but write up
the solutions on your own. Teamwork between students with
a background in the life sciences and students with a background
in statistics, computer science or mathematics is especially encouraged.
However, every student must be able to present his or her solutions
in class.
(1) Dayhoff's method applied to DNA
Use the following alignment for fitting a model
of DNA evolution along the lines of the
Dayhoff model for protein evolution.
TACTGCTATGCTATTCTAGTCTGGGCATCTATCATAAAGAGAGGGTATCATCTCT
TACTACTAC---ATTTCAATCTGAGCCTCTATCAGAAAGAGAGGCTATCTTCTCT
(a) Calculate M(emp).
(b) What are the mutabilities of A, C, G and T?
(c) Calculate a transition matrix that is calibrated
to 1 PAM.
(d) Calculate a 64-PAM score matrix from this
model.
(e) Calculate the stationary distribution of
the model.
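As a starting point for (a) and (b), the raw pair counts can be tabulated from the alignment. The following is only a sketch under stated assumptions: the name count_pairs is hypothetical, columns containing a gap are skipped, and every aligned pair is counted in both directions so that the count matrix is symmetric. The normalization that turns these counts into M(emp) should follow the definition given in the lecture.

```python
# Sketch: symmetrized pair counts from the alignment (assumptions noted above).
BASES = "ACGT"

SEQ1 = "TACTGCTATGCTATTCTAGTCTGGGCATCTATCATAAAGAGAGGGTATCATCTCT"
SEQ2 = "TACTACTAC---ATTTCAATCTGAGCCTCTATCAGAAAGAGAGGCTATCTTCTCT"


def count_pairs(seq1, seq2):
    """Count aligned base pairs in both directions, skipping gapped columns."""
    counts = {a: {b: 0 for b in BASES} for a in BASES}
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns carry no substitution information
        counts[a][b] += 1
        counts[b][a] += 1  # symmetrize; identical pairs add 2 to the diagonal
    return counts


counts = count_pairs(SEQ1, SEQ2)
for a in BASES:
    print(a, [counts[a][b] for b in BASES])
```

Counting in both directions reflects the assumption that the direction of evolution between the two sequences is unknown.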
(2) The score of random alignments and profiles
We consider the following scoring system for DNA:
Match        = +3
Transition   = -1
Transversion = -2
Let X_1, X_2, X_3, Y_1, Y_2, Y_3 be independent,
uniformly distributed random variables with values in {A,C,G,T}, i.e. P[X_k=i]=P[Y_l=j]=1/4
for all i,j in {A,C,G,T} and all k,l in {1,2,3}. Now consider the fixed
alignment
X_1 X_2 X_3
Y_1 Y_2 Y_3
(a) For a fixed real number t, what is the probability
that the score of the small alignment above is greater than or equal to t?
(b) Give a general formula for random alignments
of arbitrary length.
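A hand calculation for (a) and (b) can be checked numerically. The sketch below assumes the standard classification of substitutions (A<->G and C<->T are transitions, all other mismatches are transversions), so that a single random column is a match with probability 1/4, a transition with probability 1/4 and a transversion with probability 1/2. The function names are hypothetical. The score distribution for n columns is obtained by repeatedly convolving the single-column distribution.

```python
# Sketch: exact score distribution of a random gap-free alignment of n columns.
from collections import defaultdict
from fractions import Fraction

# Score distribution of one column (assumes A<->G and C<->T are the transitions).
COLUMN_DIST = {3: Fraction(1, 4), -1: Fraction(1, 4), -2: Fraction(1, 2)}


def score_distribution(n):
    """Return {score: probability} for the total score of n independent columns."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for s, p in dist.items():
            for c, q in COLUMN_DIST.items():
                nxt[s + c] += p * q  # convolution step
        dist = dict(nxt)
    return dist


def tail_probability(n, t):
    """Return P[score >= t] for a random alignment with n columns."""
    return sum(p for s, p in score_distribution(n).items() if s >= t)


print(tail_probability(3, 9))  # only three matches reach 9: (1/4)^3 = 1/64
```

Exact fractions are used instead of floats so that the result can be compared directly with a closed-form expression.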
Assume that we have a profile score matrix (a position-specific score matrix) given by a (4xm)-matrix W(i,j). Recall the definition of profile scores: if we align a letter i to column j of the profile, we score this position by W(i,j). Now we score a random sequence against this profile. (As above, the sequence has independent and uniformly distributed positions.)
(c) For a real number t, calculate P[Score>t]. In other words, calculate the distribution of the score.
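The same convolution idea carries over to the profile case, where each column j has its own score distribution given by the four entries W(A,j), W(C,j), W(G,j), W(T,j), each taken with probability 1/4. The following is a sketch only: the function names and the matrix W_EXAMPLE are made up for illustration, and integer scores are assumed so that equal totals collapse onto the same dictionary key.

```python
# Sketch: exact score distribution of a uniform random sequence against a profile.
from collections import defaultdict
from fractions import Fraction


def profile_score_distribution(W):
    """W is a list of 4 rows (A, C, G, T), each with m integer column scores."""
    m = len(W[0])
    dist = {0: Fraction(1)}
    for j in range(m):
        nxt = defaultdict(Fraction)
        for s, p in dist.items():
            for i in range(4):  # each letter occurs with probability 1/4
                nxt[s + W[i][j]] += p * Fraction(1, 4)
        dist = dict(nxt)
    return dist


def profile_tail(W, t):
    """Return P[Score > t] for the profile W."""
    return sum(p for s, p in profile_score_distribution(W).items() if s > t)


# Hypothetical 4x2 profile, for illustration only.
W_EXAMPLE = [
    [2, -1],   # A
    [-1, 2],   # C
    [-1, -1],  # G
    [-1, -1],  # T
]

print(profile_tail(W_EXAMPLE, 0))  # totals 4 (1/16) and 1 (6/16) exceed 0: 7/16
```

The number of distinct totals can grow with m, but for integer scores it is bounded by the range of possible sums, which keeps the computation cheap compared with enumerating all 4^m sequences.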