From candidate genes to genome-wide association studies

In my last post I asked where the genes for psychological traits are, and argued that over the last two decades candidate gene studies have failed to identify genes that are reliably associated with complex behavioral phenotypes. In this post, I will discuss more recent whole-genome methods, such as genome-wide association studies, and what we have learned from them.

In the early 2000s, a new technology transformed the field: genome-wide association studies (GWAS), which analyze common genetic variants across the entire genome simultaneously. This was made possible by the Human Genome Project, an international effort to determine the sequence of base pairs that make up human DNA and to identify and map all the genes in the human genome. Previously, we had to select a handful of putative candidate genes in advance, based on what we thought we knew about the neurobiological basis of the trait or behavior of interest. Now we could search the whole genome for variants associated with these phenotypes.

However, this technology came at a cost. Given the very large number of statistical tests conducted in a GWAS (at least 500,000 genetic variants, and now as many as 2 million), a significance threshold of 0.05 would be far too liberal (the well-known multiple comparisons problem). The solution was to impose a corrected threshold of p < 5 × 10⁻⁸, known as "genome-wide significance", which corresponds to a Bonferroni correction for roughly one million independent tests. To put that into context, the "five-sigma" level used in particle physics corresponds to a p-value of only about 3 × 10⁻⁷. In GWAS, the statistical bar is set high.
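The arithmetic behind these thresholds can be sketched in a few lines. This is an illustrative calculation, not code from any GWAS pipeline: it assumes roughly one million independent common variants (the usual rationale for the 5 × 10⁻⁸ convention), applies a simple Bonferroni correction, and converts five sigma into a one-sided normal tail probability. The function names are hypothetical.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def one_sided_p_from_sigma(z: float) -> float:
    """Upper-tail probability of a standard normal beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumption: roughly one million independent common variants genome-wide.
N_INDEPENDENT_TESTS = 1_000_000

# Without correction: expected false positives if every variant is truly null.
naive_false_positives = 0.05 * N_INDEPENDENT_TESTS

# Bonferroni-corrected threshold ("genome-wide significance").
gws = bonferroni_threshold(0.05, N_INDEPENDENT_TESTS)

# The particle-physics "five-sigma" criterion expressed as a p-value.
five_sigma = one_sided_p_from_sigma(5.0)

print(f"false positives at p < 0.05: {naive_false_positives:,.0f}")
print(f"genome-wide significance:    {gws:.1e}")
print(f"five-sigma (one-sided):      {five_sigma:.1e}")
```

Running it prints 50,000 expected false positives at p < 0.05, a corrected threshold of 5.0e-08, and a five-sigma p-value of 2.9e-07, which is why the GWAS threshold is the stricter of the two.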

This, in turn, had some interesting consequences. It was clear that achieving genome-wide significance would require very large samples (particularly if, as suspected, the effects being sought were also very small). No individual group could collect samples of that size, so an era of collaboration began: research groups came together in large, multinational consortia to harmonise and pool their data. This change transformed the field. While the candidate gene era produced (arguably) no findings that have stood the test of time, GWAS has produced a large and growing number of reproducible findings. The combination of a theory-free approach and very large samples (providing adequate power to detect even very small effects) has been transformative. There are some interesting parallels with the Many Labs project, in which many laboratories run the same psychology studies to test whether their findings replicate.
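A rough power calculation shows why samples had to grow so much. The sketch below uses a standard normal approximation for a 1-df association test, in which the expected z-statistic for a variant explaining a fraction r² of trait variance in n people is about √(n·r²/(1 − r²)). The effect size (0.1% of variance), the 80% power target and the sample of 1,000 are illustrative assumptions, not figures from any particular study, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha: float) -> float:
    """Two-sided critical value of a z-test at significance level alpha."""
    return -STD_NORMAL.inv_cdf(alpha / 2)

def expected_z(r2: float, n: int) -> float:
    """Expected z-statistic for a variant explaining r2 of variance in n people
    (normal approximation to a 1-df association test)."""
    return math.sqrt(n * r2 / (1 - r2))

def power_at_n(r2: float, alpha: float, n: int) -> float:
    """Probability that |z| exceeds the two-sided critical value."""
    mu, c = expected_z(r2, n), critical_z(alpha)
    return STD_NORMAL.cdf(mu - c) + STD_NORMAL.cdf(-mu - c)

def required_n(r2: float, alpha: float, power: float) -> int:
    """Approximate sample size giving the requested power (upper tail only)."""
    z_total = critical_z(alpha) + STD_NORMAL.inv_cdf(power)
    return math.ceil(z_total ** 2 * (1 - r2) / r2)

# Illustrative assumption: a variant explaining 0.1% of trait variance.
R2 = 0.001

small_study_power = power_at_n(R2, 0.05, 1_000)  # candidate-gene-sized sample
small_study_gws_power = power_at_n(R2, 5e-8, 1_000)
n_needed = required_n(R2, 5e-8, 0.80)

print("power, n = 1,000, alpha = 0.05:", round(small_study_power, 2))
print("power, n = 1,000, alpha = 5e-8:", small_study_gws_power)
print("n for 80% power at 5e-8:", n_needed)
```

Under these assumptions a sample of 1,000 has under 20% power even at p < 0.05, and essentially none at genome-wide significance, while roughly 40,000 participants are needed for 80% power: a sample size that only pooled consortia could reach.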

Variants have been identified for a number of disease phenotypes, as well as for phenotypes relevant to behavioral researchers (e.g., tobacco use, schizophrenia). Two main conclusions can be drawn from this research. First, it is now clear that the effects of individual common genetic variants on complex behavioral phenotypes are very small, each accounting for 0.1% or less of the variation in that phenotype. This means that very large numbers of genes are likely to contribute to variation in those phenotypes. Second, the candidate genes that were intensively studied before the advent of GWAS have, for the most part, not emerged in GWAS (Flint & Munafò, 2013). This is particularly striking because it suggests that what we thought we knew about the neurobiology of the behaviors we were investigating was incomplete, at best.
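The step from "each variant explains 0.1% or less" to "very large numbers of genes" is simple arithmetic. If common variants jointly explain a fraction h² of the variance in a trait, and no single independent variant explains more than some cap, then at least h² divided by that cap must contribute (assuming independent variants whose contributions add up). The 20% figure below is a hypothetical value chosen purely for illustration, and because most variants explain far less than the cap, the real number would be larger still.

```python
import math

def min_contributing_variants(h2: float, max_r2_per_variant: float) -> int:
    """Lower bound on the number of independent, additive variants needed to
    explain h2 when none explains more than max_r2_per_variant."""
    # Small tolerance guards against floating-point error in the division.
    return math.ceil(h2 / max_r2_per_variant - 1e-9)

# Hypothetical: common variants jointly explain 20% of trait variance.
H2 = 0.20
bounds = {cap: min_contributing_variants(H2, cap) for cap in (0.001, 0.0005, 0.0001)}

for cap, n_variants in bounds.items():
    print(f"max {cap:.2%} per variant -> at least {n_variants} variants")
```

With a cap of 0.1% per variant, at least 200 variants are required; lowering the cap to 0.01% raises the floor to 2,000.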

One example illustrates this well. Tobacco dependence has long been thought to be principally due to the effects of nicotine (the main addictive constituent of tobacco) on the α4β2 nicotinic acetylcholine receptor (nAChR), made up of alpha-4 and beta-2 subunits. However, GWAS of smoking behavior indicated that a variant in a cluster of genes on chromosome 15 encoding different nAChR subunits (alpha-5, alpha-3 and beta-4) influences how heavily people smoke. This finding renewed interest in these receptors, and evidence quickly emerged that the alpha-5 subunit, in particular, appears to regulate sensitivity to the aversive effects of high doses of nicotine: when alpha-5 function is reduced, doses that would otherwise be toxic and aversive are tolerated, and nicotine intake increases (Fowler et al., 2011).

The mismatch between the genes identified by GWAS and those we were studying in the candidate gene era is striking. What is most exciting is that the theory-free approach of GWAS appears to be revealing new insights in a way that, with hindsight, candidate gene studies never could have done. By definition, in candidate gene studies we were looking in neurobiological systems we already thought were involved in the phenotype of interest. It is not clear whether these studies could ever have told us anything genuinely novel. GWAS, on the other hand, bolstered by high levels of statistical stringency to protect us against false positives, is providing us with a deeper understanding of the neurobiological systems that drive behavior.


Flint, J. & Munafò, M.R. (2013). Candidate and non-candidate genes in behavior genetics. Current Opinion in Neurobiology, 23, 57-61.

Fowler, C.D., Lu, Q., Johnson, P.M., Marks, M.J. & Kenny, P.J. (2011). Habenular α5 nicotinic receptor subunit signalling controls nicotine intake. Nature, 471, 597-601.