In this assignment you will be exploring perturbations of Hardy-Weinberg equilibrium, as discussed in chapter 3 of Mielke, Konigsberg, and Relethford. Refer back to Homework 0 for reminders on how to use R and RStudio.
For each question you will load and run code to produce plots and then answer some follow-up questions.
Make sure to read all the way to the end of each question to answer the follow-up questions.
The script below relates to page 56 in Mielke, Konigsberg, and Relethford (which is why the script is called "P.56").
P.56=function(){
cat('Enter the number of individuals with type-A blood (179 in Taylor & Prior)', fill=F ); A = scan(n=1,quiet=T)
cat('Enter the number of individuals with type-B blood (35 in T & P)', fill=F ); B = scan(n=1,quiet=T)
cat('Enter the number of individuals with type-AB blood (6 in T & P)', fill=F ); AB = scan(n=1,quiet=T)
cat('Enter the number of individuals with type-O blood (202 in T & P)', fill=F ); O = scan(n=1,quiet=T)
p=0;q=0;r=0
N=A+B+AB+O
r[1]=sqrt(O/N)
p[1]=(1-r[1])/2
q[1]=p[1]
for(i in 2:10){
AA=p[i-1]^2/(p[i-1]^2+2*p[i-1]*r[i-1])*A
AO=A-AA
BB=q[i-1]^2/(q[i-1]^2+2*q[i-1]*r[i-1])*B
BO=B-BB
p[i]=(2*AA+AO+AB)/(2*N)
q[i]=(2*BB+BO+AB)/(2*N)
r[i]=1-p[i]-q[i]
}
ABO = matrix(c(A,(p[10]^2+2*p[10]*r[10])*N,B,(q[10]^2+2*q[10]*r[10])*N,
AB,(2*p[10]*q[10])*N,O,r[10]^2*N),nc=2,byrow=T)
colnames(ABO)=c('Obs','Exp')
rownames(ABO)=c('A','B','AB','O')
cat('\n')
print(ABO)
cat('\n')
return(list(N=N,p=p[10],q=q[10],r=r[10]))
}
Run the script by typing P.56()
into the console panel or “running” it from the script panel.
The script asks for the number of individuals with type A, B, AB, and O blood. Only for AB and O phenotypes are the genotypes known (AB and OO, respectively). Type A includes AA and AO while type B includes BB and BO. The script uses something called the expectation maximization algorithm to find the allele frequencies (p, q, and r).
First run the script using the prompted numbers from Taylor and Prior (1938) (A: 179, B: 35, AB: 6, O: 202). Copy the output to your document.
Now, pick your own counts (whatever you want) and copy the output to your document.
Using the output for N, p, and q, and the Hardy Weinberg equation, verify the expected number of individuals with the AB phenotype. Show your work
In RStudio, this would be
> N*2*p*q
where N, p, and q are from the output ($N, $p, and $q)
Run the script below for Tab3.2
.
Tab3.2=function (AA=136,AB=66,BB=17)
{
AA=round(AA,0)
AB=round(AB,0)
BB=round(BB,0)
options(warn=-1)
N=AA+AB+BB
Two.N=2*N
A=2*AA+AB
B=2*BB+AB
f.A=(2*AA+AB)/Two.N
f.B=1-f.A
cat('\nObserved values:\n')
cat(c(AA,AB,BB))
cat('\n')
cat('Expected values:\n')
cat(c(f.A^2*N,2*f.A*f.B*N,f.B^2*N))cat('\n')
p = function(AA,AB,BB) return(exp(lgamma(N+1)+lgamma(A+1)+lgamma(B+1)+log(2)*AB
-lgamma(AA+1)-lgamma(AB+1)-lgamma(BB+1)-lgamma(Two.N+1)))
start.AA=(A-B)/2
stop.AA=A/2
sto.p=0
obs.p=p(AA,AB,BB)
for(i in start.AA:stop.AA){
t.AA=i
t.AB=A-2*t.AA
t.BB=N-t.AA-t.AB
sto.p[i-start.AA+1]=p(t.AA,t.AB,t.BB)
}
stop.AA=(start.AA:stop.AA)[max(which(sto.p>1.e-8))]
top=max(sto.p)*1.1
plot(c(start.AA-.5,stop.AA+0.5),c(0,top),type='n',xlab='Number of AA genotypes',
ylab='Probability')
tot.p=0
for(i in start.AA:stop.AA){
curr=i-start.AA+1
if(sto.p[curr]<=obs.p) {
tot.p=tot.p+sto.p[curr]
rect(i-.5,0,i+.5,sto.p[curr],density=40)
}
else rect(i-.5,0,i+.5,sto.p[curr])
}
lines(c(AA,AA),c(0,1))
cat('p-value:\n')
cat(round(tot.p,6))
cat('\n')
}
This is the example that is given in Table 3.2 (p. 59) of Mielke, Konigsberg, and Relethford. In the output the first line (136, 66, and 17) gives the observed genotype counts. The second line gives the expected counts under Hardy-Weinberg equilibrium, and the last line is the probability of getting the observed (or any more extreme counts) under Hardy-Weinberg equilibrium. The graph is a more visual form of Table 3.2 from Mielke, Konigsberg, and Relethford. If you want more practice using this script, try “Tab3.2(5,2,2)” which is the example from page 57 of Mielke, Konigsberg, and Relethford.
BUT now, for the actual problem. Do your own example starting with a fake sample that is in Hardy-Weinberg equilibrium, and then gradually “bump” the sample away from Hardy-Weinberg equilibrium. Here’s my example, below, using my favorite number (34). Please, though, do your own example. Here’s the easiest way to do this:
Start from: \(p^2\cdot N\), \(2\cdot p \cdot q\cdot N\), \(q^2\cdot N\) (p^2*N,2*p*q*N,q^2*N
)
and then “bump” one or more of the “N” numbers. Then copy the output
(as I did below) and paste into your document as problem #2.
> Tab3.2(.34^2*34,2*.34*(1-.34)*34,(1-.34)^2*34)
Observed values:
4 15 15
Expected values:
3.889706 15.22059 14.88971
p-value:
1
> Tab3.2(.34^2*34,2*.34*(1-.34)*34,(1-.34)^2*340)
Observed values:
4 15 148
Expected values:
0.7919162 21.41617 144.7919
p-value:
0.003327
> Tab3.2(.34^2*34,2*.34*.66*100,.66^2*34)
Observed values:
4 45 15
Expected values:
10.97266 31.05469 21.97266
p-value:
0.000623
> Tab3.2(.34^2*34,2*.34*(1-.34)*34,(1-.34)^2*3400)
Observed values:
4 15 1481
Expected values:
0.08816667 22.82367 1477.088
p-value:
1e-06
Which of your runs caused the largest departure from equilibrium? In other words, which run gave you the lowest p-value?
Figure 3.9 (on page 71 of Mielke, Konigsberg, and Relethford) was drawn using only two starting allele frequencies (0.0001 and 0.9999). But let’s say that we need to look at more starting allele frequencies in order to be convinced that in a balanced polymorphism selection always takes us to the same allele frequency (in this case at 0.808734).
Load the script below.
Fig3.9=function (w=c(0.7635,1,0.0001),p.start=c(0.0001,0.9999))
{
p=0.0001
t=0:42
w.AA=w[1]
w.AS=w[2]
w.SS=w[3]
p[1]=p.start[1]
for(i in 2:43){
ave.w=p[i-1]^2*w[1]+2*p[i-1]*(1-p[i-1])*w[2]+(1-p[i-1])^2*w[3]
p[i]=(p[i-1]^2*w[1]+p[i-1]*(1-p[i-1])*w[2])/ave.w
}
plot(t,p,type='l',xlim=c(0,40),ylim=c(0,1),xlab='Generation',
ylab='Allele Frequency (A)')
n.starts=length(p.start)
for(j in 2:n.starts){
p[1]=p.start[j]
for(i in 2:43){
ave.w=p[i-1]^2*w[1]+2*p[i-1]*(1-p[i-1])*w[2]+(1-p[i-1])^2*w[3]
p[i]=(p[i-1]^2*w[1]+p[i-1]*(1-p[i-1])*w[2])/ave.w}
lines(t,p)
}
}
Now run the following code and paste the graph into your homework.
Fig3.9(p=runif(100))
What does each line on the graph represent?
Now look at the opposite situation in which heterozygotes have decreased fitness.
Run the line below and include your graph in your homework.
Fig3.9(c(1,.6,1),runif(100))
Look up "fixed allele" (at Wikipeda) and explain how this relates to what you see in the graph.
Do the plot of average fitness for the sickle-cell example and heterozygote disadvantage and paste these graphs to your homework.
Fig3.13=function (w=c(0.7635,1,0.0001),x=c(.65,.95),y=c(0.77,0.82),
tan.int=.01)
{
w.AA=w[1]
w.AB=w[2]
w.BB=w[3]
p=seq(0.01,0.99,0.01)
ave.w = function(p) return(p^2*w[1]+2*p*(1-p)*w[2]+(1-p)^2*w[3])
slope.w=function(p){
return(2*(p*(w.AA-w.AB)+(1-p)*(w.AB-w.BB)))
}
plot(p,ave.w(p),type='l',xlim=x,ylim=y)
tangent=function(p){
b=slope.w(p)
a=ave.w(p)-p*b
abline(a,b)
}
tans=seq(x[1],x[2],tan.int)
n=length(tans)
for(i in 1:n) tangent(tans[i])
}
Run the line below and copy the graph to your homework:
Fig3.13()
Run the line below and copy the graph to your homework:
Fig3.13(c(1,.6,1),c(0,1),c(0,1),tan=.05)
In the second plot, if you drew arrows to indicate the direction of evolution, would they point toward or away from the middle of the plot?
Genetic drift is one of the harder concepts to grasp, so it will behoove you to look at a number of graphs. The scripts Fig3.17
and Fig3.20
were written in part so that people could “play around” and look at a
bunch of graphs. So I encourage you to do this. A useful way to use
Fig3.17 is as follows:
> Fig3.17(10^0) # i.e., deme sizes are all = 1
> Fig3.17(10^1) # i.e., deme sizes are all = 10
> Fig3.17(10^2) # i.e., deme sizes are all = 100
> Fig3.17(10^3) # i.e., deme sizes are all = 1000
> Fig3.17(10^4) # i.e., deme sizes are all = 10000
Fig3.17=function (N=10,gens=100,jit=0)
{
demes=100
sim.line=function(){
p=0
p[1]=0.5
for(i in 2:gens){
p[i]=rbinom(1,2*N,p[i-1])/(2*N)
}
return(p)
}
p=sim.line()
plot(1:gens,jitter(p,jit),type='l',xlab='Generation',ylab='p',ylim=c(0,1),main=paste('N = ',N))
for(i in 2:demes){
p=sim.line()
lines(1:gens,jitter(p,jit))
}
}
Fig3.20=function (Nids=5,gens=c(5,10,20,30))
{
opar = par(no.readonly = T)
on.exit(par(opar))
if(.Platform$OS.type=='windows') quartz=function() windows()
quartz()
par(mfrow=c(3,2))
my.hist=function(label){
plot(0:N.alleles,vec,xlim=c(-.5,N.alleles+.5),ylim=c(0,max(vec)),
type='n',axes=F,xlab='Number of A alleles',ylab='Frequency')
if(label==1) {title(main=paste(label,' Generation'))}
else {title(main=paste(label,' Generations'))}
axis(1)
axis(2)
for(i in 1:states){
rect(i-1.5,0,i-.5,vec[i])
}
}
N.alleles=2*Nids
print(noquote(paste(' There are ',N.alleles,' alleles per deme')))
print(noquote("How many alleles should be A alleles at the dawn of time?"))
cat('\nPut cursor back in R console to answer this question\n')
N.start=scan(n=1)
cat('\nClick on graphics window to see results\n')
states=N.alleles+1
Tr=diag(states)
for(i in 1:states){
p=(i-1)/N.alleles
Tr[i,]=dbinom(0:N.alleles,N.alleles,p)
}
vec=c(rep(0,N.start),1,rep(0,states-N.start-1))
my.hist(0)
vec=as.vector(vec%*%Tr)
my.hist(1)
i.count=1
for(i in 1:gens[4]){
vec=as.vector(vec%*%Tr)
if(i==gens[i.count]){
my.hist(gens[i.count])
i.count=i.count+1
}
}
}
You can use Fig3.20
in like fashion, but you will need to answer as to how many “A” alleles there are. For example, if you do Fig3.20(10^0)
you will be told that there are 2 alleles per deme, and for the dawn of time you can specify 0, 1, or 2 “A” alleles.
NOTE: there is nothing to turn in from this homework “problem” (Number 6) so you’re “on your own” for this last one.