Oops.
Hands up those of you who think a candidate gene approach to QTLs is a good idea.
Now, you all need to sit down, take a deep breath, and then read this post at Gene Expression, which discusses a recent paper in PLoS Genetics.
The gist of it is that a lot of between-population variation in gene expression seems to be caused by trans-acting modifiers of gene expression. If this is true (and there are several reasons it may not be: see below), then candidate gene approaches to finding QTLs may be of little use.
The candidate gene approach is premised on the idea that we can guess the genes that control a trait, and the variation in the trait is controlled by variation at or near the gene. This derives from an old-fashioned view where (simplifying somewhat) the structure of a protein is what determines the trait. But one of the incidental effects of evo-devo has been to show that a lot of developmental processes are modulated through gene regulation: i.e. transcription factors (like HIF) are important. This does not necessarily mean that the old-fashioned view doesn’t work: if the genetic polymorphism that causes variation in regulation is close to the gene (i.e. it is cis-acting), then the approach will still find the right gene, even if further fine-scale mapping points upstream of the coding sequence.
Problems arise if the polymorphism is not near the gene (i.e. it is trans-acting). For example, it may be in a transcription factor that is involved in regulation of expression of one of the candidate genes. In that case, a candidate gene approach will miss the effect, unless another candidate gene happens by chance to be close to the right spot.
What the PLoS Genetics paper claims is that most of the genetic variation in gene expression is caused by trans-acting factors, about 88% (with a standard error of 3%). In other words, variation in gene expression is largely controlled by parts of the genome away from the gene in question. So, if the evo-devo people are right and gene expression is what is important for variation, candidate gene approaches to finding QTLs will miss most QTLs.
There are a few oddities about this study that should give pause for thought. The study is an association study: in essence, the authors take a population of about 100 African-Americans, regress expression on the proportion of African ancestry, and show that the regression coefficient is much larger for the whole-genome proportion than for the ancestry at a SNP in the middle of the gene whose expression is measured.
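To make the design concrete, here is a minimal Python sketch of this kind of two-covariate regression, fitted to simulated data. Everything in it is an assumption for illustration: the Beta(8, 2) distribution of genome-wide ancestry, the effect sizes, and the shortcut of treating the trans effect as a linear function of genome-wide ancestry. The names `ols_two` and `simulate` are hypothetical, and this is not the authors' code or their exact model.

```python
import random

def ols_two(y, x1, x2):
    """Ordinary least squares for y = a + b1*x1 + b2*x2,
    solved from the 2x2 normal equations on mean-centred data."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def simulate(n=100, b_cis=0.2, b_trans=1.0, noise_sd=0.5, seed=1):
    """Hypothetical data: expression depends on local (cis) ancestry at the
    gene and, as a stand-in for many trans loci, on genome-wide ancestry."""
    rng = random.Random(seed)
    expr, cis, genome = [], [], []
    for _ in range(n):
        theta = rng.betavariate(8, 2)  # genome-wide African ancestry (assumed)
        # local ancestry at the gene: African copies out of 2, scaled to 0/0.5/1
        gamma = sum(rng.random() < theta for _ in range(2)) / 2
        expr.append(b_cis * gamma + b_trans * theta + rng.gauss(0, noise_sd))
        cis.append(gamma)
        genome.append(theta)
    return expr, cis, genome

expr, cis, genome = simulate()
_, b_cis_hat, b_trans_hat = ols_two(expr, cis, genome)
print(f"cis: {b_cis_hat:.2f}  genome-wide: {b_trans_hat:.2f}")
```

With only 100 simulated individuals, rerunning with different seeds shows how noisy both estimates are under these assumptions.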
One problem is the way ancestry is defined. The ends of the distribution are given by two standard populations: European, and Yoruba. The Yoruba are a tribe in Nigeria (incidentally, the tribe most of the political elite hail from). But the African-American population does not seem to be drawn from individuals of Yoruba descent. Given the comparatively large genetic variation in Africa, one wonders how representative one tribe is. Would a different result be obtained if a different group, say from Cameroon, had been used?
There is another, subtler problem. The ideal way to estimate the effect of cis-acting elements would be to use a SNP sat on or right next to the element. Instead, the authors of the paper use a SNP in the middle of the gene, i.e. some distance away. Obviously there could be recombination between the two. The only information given about LD is this:
Because chromosomal segments of ancestry in AA typically span >10 Mb [21], it is nearly always the case that a gene lies completely within a single ancestry block, so that our analysis is not sensitive to the choice of genomic location used to define cis ancestry γgs.
But Reference 21 shows that there is about a 50% probability of a recombination between sites 10Mb apart (assuming 1Mb=1cM). So I’m not sure we can really rule out recombination (I should really now present some data on the distance between the centre of a gene and cis-acting regulatory elements, shouldn’t I?).
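How much this matters depends on the distance between the SNP and the regulatory element, not just on the 10 Mb figure. A minimal sketch, assuming ancestry switch points fall along the chromosome as a Poisson process (so block lengths are exponentially distributed) and 1 Mb ≈ 1 cM as above. The exponential model is an assumption, not something taken from Reference 21, and `p_switch` and `mean_block_for` are hypothetical helpers: the second back-solves the mean block length implied by a given switch probability, so the roughly 50% at 10 cM figure can be carried over to shorter distances.

```python
import math

def p_switch(distance_cm, mean_block_cm):
    """P(at least one ancestry switch between two sites distance_cm apart),
    if switch points form a Poisson process with mean spacing mean_block_cm."""
    return 1.0 - math.exp(-distance_cm / mean_block_cm)

def mean_block_for(p, distance_cm):
    """Mean block length (cM) implied by switch probability p at distance_cm."""
    return -distance_cm / math.log(1.0 - p)

# Calibrate to roughly 50% at 10 cM (10 Mb at 1 Mb ~ 1 cM): about 14.4 cM.
mean_block = mean_block_for(0.5, 10.0)
for d_cm in (0.01, 0.1, 1.0, 10.0):  # 10 kb, 100 kb, 1 Mb, 10 Mb
    print(f"{d_cm:>5} cM: {p_switch(d_cm, mean_block):.4f}")
```

The distances in the loop are placeholders; the real input needed is the typical distance between a gene's centre and its cis-regulatory elements.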
Why is this a problem? Well, it is well-known in statistics that if a covariate in a regression is measured with error, then the estimated coefficient is biased towards zero, i.e. the effect is underestimated. So, if the LD between the regulatory element and the SNP is less than 1, the estimate of the cis effect is too small. How bad this is I can’t be sure: it will obviously depend on the LD.
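A quick simulation illustrates the attenuation for the case worried about here. It is a sketch under simplifying assumptions made only for illustration: every individual has the same genome-wide ancestry `theta`, and the SNP shares the regulatory element's local ancestry unless an ancestry switch (probability `p_switch`) falls between them, in which case the SNP's ancestry is an independent draw. Under those assumptions the fitted cis slope shrinks by a factor of about (1 - p_switch). The function names are hypothetical.

```python
import random

def ols_slope(y, x):
    """Least-squares slope of y regressed on a single covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def local_ancestry(rng, theta):
    """African copies at one locus (0, 1 or 2), scaled to 0, 0.5 or 1."""
    return sum(rng.random() < theta for _ in range(2)) / 2

def attenuation_demo(beta=1.0, p_switch=0.3, theta=0.8, noise_sd=0.3,
                     n=50000, seed=2):
    """Return (slope on true element ancestry, slope on SNP proxy,
    predicted attenuated slope beta * (1 - p_switch))."""
    rng = random.Random(seed)
    element, snp, expr = [], [], []
    for _ in range(n):
        x = local_ancestry(rng, theta)      # ancestry at the regulatory element
        if rng.random() < p_switch:         # a switch falls between element and SNP
            w = local_ancestry(rng, theta)  # so the SNP's ancestry is an independent draw
        else:
            w = x                           # same ancestry block
        element.append(x)
        snp.append(w)
        expr.append(beta * x + rng.gauss(0, noise_sd))  # expression tracks the element
    return ols_slope(expr, element), ols_slope(expr, snp), beta * (1 - p_switch)

true_fit, proxy_fit, predicted = attenuation_demo()
print(f"true: {true_fit:.3f}  proxy: {proxy_fit:.3f}  predicted: {predicted:.3f}")
```

More generally, when the noise is independent of the proxy W, the slope on W is the true slope multiplied by cov(X, W)/var(W); in this model that ratio is exactly 1 - p_switch.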
Of course, the same problem will arise with trans-acting elements too. But the error might be smaller – if there are several trans-acting factors, then the average African ancestry across all the genotyped SNPs will be closer to the average ancestry at the trans-acting elements, and hence the error will be less. How important this is I'm not sure – one would need to look at it carefully to decide.
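A small simulation gives a feel for the averaging argument. It assumes, purely for illustration, that each individual has a fixed genome-wide ancestry `theta`, that the trans-acting factors sit at k unlinked loci whose local ancestry is drawn independently given `theta`, and that genome-wide ancestry is known exactly. The squared gap between genome-wide ancestry and the average ancestry at the trans loci then falls off as theta(1 - theta)/(2k). Whether a smaller gap translates into proportionally less bias depends on how the error enters the regression, which this sketch does not address. The function names are hypothetical.

```python
import random

def mean_sq_gap(k, theta=0.8, n=20000, seed=3):
    """Average squared difference between genome-wide ancestry theta and the
    mean local ancestry over k unlinked trans loci (2 copies per locus)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        african = sum(rng.random() < theta for _ in range(2 * k))
        total += (african / (2 * k) - theta) ** 2
    return total / n

def expected_gap(k, theta=0.8):
    """Binomial variance of that average: theta * (1 - theta) / (2k)."""
    return theta * (1 - theta) / (2 * k)

for k in (1, 5, 25):
    print(k, round(mean_sq_gap(k), 4), round(expected_gap(k), 4))
```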
Haha! This must be a good paper – more research is required!
Alkes L. Price, Nick Patterson, Dustin C. Hancks, Simon Myers, David Reich, Vivian G. Cheung, Richard S. Spielman (2008). Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans. PLoS Genetics, 4 (12). DOI: 10.1371/journal.pgen.1000294


2 Comments
So what do you think – could this be (part of) the answer to the puzzle blogged here?
http://blogs.helsinki.fi/egru-blog/2008/11/20/heritability-hangover/
Possibly, it would depend on how large the effects of the trans-acting factors were. If there are a lot (or a few) with small effects, it could. But GWAs should pick up any variation that has a large effect, whether it’s in a coding region, promoter, transcription factor or whatever.