Assignment 3

To be handed in on Friday, December 1, at 10 AM in my mailbox (Old Chemistry 211).

You may work on the problems together, but write up the solutions on your own. Teamwork between students with a background in the life
sciences and students with a background in statistics, computer science or mathematics is especially encouraged. However, every
student must be able to present his/her solutions in class.

(1) Dayhoff's method applied to DNA
Use the following alignment for fitting a model of DNA evolution along the lines of the
Dayhoff model for protein evolution.

TACTGCTATGCTATTCTAGTCTGGGCATCTATCATAAAGAGAGGGTATCATCTCT
TACTACTAC---ATTTCAATCTGAGCCTCTATCAGAAAGAGAGGCTATCTTCTCT

(a) Calculate M(emp).
(b) What is the mutability of A, C, G and T?
(c) Calculate a transition matrix that is calibrated to 1 PAM.
(d) Calculate a 64-PAM score matrix from this model.
(e) Calculate the stationary distribution of the model.
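A minimal Python sketch of a Dayhoff-style pipeline, intended only for checking hand calculations. It assumes one common set of conventions that may differ from the lecture's: gap columns are skipped, every aligned pair is counted in both directions, M(emp) is taken as the row-normalized count matrix (identical pairs included), the mutability of nucleotide i is 1 - M(emp)(i,i), the 1-PAM matrix is I + lambda*(M(emp) - I) with lambda chosen so that 1% of positions change, and scores are log2 odds ratios. The names `pair_counts` and `dayhoff_dna` are hypothetical, and NumPy is required.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
INDEX = {c: k for k, c in enumerate(NUCLEOTIDES)}

# Alignment from problem (1)
SEQ1 = "TACTGCTATGCTATTCTAGTCTGGGCATCTATCATAAAGAGAGGGTATCATCTCT"
SEQ2 = "TACTACTAC---ATTTCAATCTGAGCCTCTATCAGAAAGAGAGGCTATCTTCTCT"


def pair_counts(s1, s2):
    """Symmetric 4x4 count matrix of aligned pairs (gap columns skipped)."""
    counts = np.zeros((4, 4))
    for a, b in zip(s1, s2):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns carry no substitution information
        # count both directions, since the direction of evolution is unknown
        counts[INDEX[a], INDEX[b]] += 1
        counts[INDEX[b], INDEX[a]] += 1
    return counts


def dayhoff_dna(s1, s2, pam=64):
    counts = pair_counts(s1, s2)
    f = counts.sum(axis=1) / counts.sum()               # background frequencies
    m_emp = counts / counts.sum(axis=1, keepdims=True)  # row i holds P(j | i)
    mutability = 1.0 - np.diag(m_emp)                   # P(i changes | i)
    lam = 0.01 / np.dot(f, mutability)                  # 1 PAM: 1% of sites change
    pam1 = np.eye(4) + lam * (m_emp - np.eye(4))        # rows still sum to 1
    pam_n = np.linalg.matrix_power(pam1, pam)           # n-PAM transition matrix
    score = np.log2(pam_n / f[np.newaxis, :])           # log-odds: log2(P(j|i) / f_j)
    # stationary distribution: left eigenvector of pam1 for eigenvalue 1
    eigvals, eigvecs = np.linalg.eig(pam1.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    pi = pi / pi.sum()
    return counts, f, m_emp, mutability, pam1, score, pi


if __name__ == "__main__":
    counts, f, m_emp, mutability, pam1, score, pi = dayhoff_dna(SEQ1, SEQ2)
    np.set_printoptions(precision=4, suppress=True)
    print("M(emp):\n", m_emp)
    print("mutability (A, C, G, T):", mutability)
    print("1-PAM matrix:\n", pam1)
    print("64-PAM log2-odds scores:\n", score)
    print("stationary distribution:", pi)
```

Because the pair counts are symmetric, f_i * M(i,j) is symmetric under these conventions, so the stationary distribution computed from the eigenvector should coincide with the background frequencies f; the eigenvector route is kept as an independent check.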

(2) The score of random alignments and profiles
We consider the following scoring system for DNA:
Match        = +3
Transition   = -1
Transversion = -2
Let X_1, X_2, X_3, Y_1, Y_2, Y_3 be independent random variables, each uniformly distributed on {A,C,G,T}, i.e. P[X_k=i] = P[Y_l=j] = 1/4 for all i, j in {A,C,G,T} and all k, l in {1,2,3}. Now consider the fixed alignment
X_1 X_2 X_3
Y_1 Y_2 Y_3

(a) For a fixed real number t, what is the probability that the score of the small alignment above is greater than or equal to t?
(b) Give a general formula for random alignments of this kind with arbitrary length n.
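One way to check answers to (a) and (b) numerically, sketched in Python with exact fractions. Counting the 16 equally likely pairs gives 4 matches, 4 transitions (A/G, G/A, C/T, T/C) and 8 transversions, so a single column scores +3, -1, -2 with probabilities 1/4, 1/4, 1/2, and columns are independent. The total score distribution is then obtained by repeated convolution, and a multinomial sum over the numbers of matches, transitions and transversions serves as a cross-check. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import factorial

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pair_score(x, y, match=3, transition=-1, transversion=-2):
    """Score of one aligned pair under the match/transition/transversion system."""
    if x == y:
        return match
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return transition  # purine<->purine or pyrimidine<->pyrimidine
    return transversion    # purine<->pyrimidine


def position_distribution():
    """Score distribution of one column with X, Y i.i.d. uniform on ACGT."""
    dist = defaultdict(Fraction)
    for x, y in product("ACGT", repeat=2):
        dist[pair_score(x, y)] += Fraction(1, 16)
    return dict(dist)


def convolve(d1, d2):
    """Distribution of the sum of two independent scores."""
    out = defaultdict(Fraction)
    for s1, p1 in d1.items():
        for s2, p2 in d2.items():
            out[s1 + s2] += p1 * p2
    return dict(out)


def score_distribution(n):
    """Exact distribution of the total score of a random alignment of length n."""
    dist = {0: Fraction(1)}
    column = position_distribution()
    for _ in range(n):
        dist = convolve(dist, column)
    return dist


def prob_at_least(t, n):
    """P[score >= t] by summing the convolved distribution."""
    return sum((p for s, p in score_distribution(n).items() if s >= t), Fraction(0))


def prob_at_least_multinomial(t, n):
    """Same probability as a multinomial sum over a matches, b transitions and
    c transversions (a + b + c = n), hard-coding the scores +3, -1, -2 and the
    per-column probabilities 1/4, 1/4, 1/2 of this scoring system."""
    total = Fraction(0)
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            if 3 * a - b - 2 * c >= t:
                ways = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
                total += ways * Fraction(1, 4) ** (a + b) * Fraction(1, 2) ** c
    return total
```

For example, prob_at_least(5, 3) is Fraction(1, 16): only three matches (score 9, probability 1/64) or two matches and one transition (score 5, probability 3/64) reach 5.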

Assume that we have a profile score matrix (a position-specific score matrix) given by a 4 x m matrix W(i,j). Remember the definition of profile scores: if we align letter i to column j of the profile, this position is scored by W(i,j). Now we score a random sequence of length m against this profile (as above, the positions of the sequence are independent and uniformly distributed).

(c) For a real number t, calculate P[Score > t]; in other words, calculate the distribution of the score.
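A minimal sketch of the same convolution idea applied to a profile, assuming the rows of W are ordered A, C, G, T and that the random sequence is aligned to the m profile columns without gaps. Column j then contributes W(i,j) with probability 1/4 for each letter i, independently of the other columns, so the score distribution is built column by column. The function names and the 4 x 3 matrix in the usage block are hypothetical, for illustration only.

```python
from collections import defaultdict
from fractions import Fraction


def profile_score_distribution(W):
    """Exact distribution of the profile score of a random sequence.

    W is a list of 4 rows (letters A, C, G, T, an assumed order) with m columns;
    W[i][j] is the score of letter i in profile column j.
    """
    m = len(W[0])
    dist = {0: Fraction(1)}  # score of the empty prefix
    for j in range(m):
        out = defaultdict(Fraction)
        for s, p in dist.items():
            for i in range(4):
                # letter i occurs at position j with probability 1/4
                out[s + W[i][j]] += p * Fraction(1, 4)
        dist = dict(out)
    return dist


def prob_score_greater(W, t):
    """P[Score > t] read off the exact distribution."""
    return sum((p for s, p in profile_score_distribution(W).items() if s > t), Fraction(0))


if __name__ == "__main__":
    # hypothetical 4 x 3 profile, for illustration only
    W = [[2, -1, 0],
         [-1, 2, -1],
         [-1, -1, 2],
         [0, -1, -1]]
    for t in range(-4, 7):
        print(t, prob_score_greater(W, t))
```

With integer entries in W, the number of distinct partial sums grows only linearly in m, so the loop stays cheap; for real-valued W one might round the entries to a grid first (an assumption, not part of the assignment).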