Thursday, September 10, 2015

What population genetic diversity can and can't tell us

By Anne Buchanan and Ken Weiss

Genetic diversity is indisputably a marker of geographic origin and human migration.  The reason is very simple: new mutations arise independently and, to a great extent uniquely, and they arise in some local area with only a single copy of the newly arisen variant.  Over time, that variant will either disappear (not be passed down to any offspring) or may increase in frequency.  Because humans traditionally had but few surviving children per parent, and mated locally, only slow increases and spread of descendant copies of a variant would occur.  Local areas had a unique pattern of genomic variants and, depending on their population size and structure, different amounts of variation.  Because all humans originated from a smallish emigration from a source population in Africa, there is more, and more complex, genomic variation there than in Eurasia.

Beyond these clear facts about the amount and distribution of human genomic diversity, interpretations of what it means, implies, or involves get fuzzy, political, emotional and controversial.  Race is seen as either a genetic construct or a social one, and it is correlated in some ways with geographic location or origin, so it is not obvious how genetic variation per se can be interpreted in terms of traits like societal diversity in wealth, achievements and the like.

The danger of course is to assume that geographic correlation of some societal trait with genomic variation is caused by that variation, that is, that societal variation is 'genetic'.  It is natural for some in the developed world to want to see their achievements as being due to inherent genetic traits (read: superiority), and there is a very long history, all the way back to the Greeks in western tradition, of holding such views of inherency.  But this is hard to demonstrate.

An interesting new paper in the September issue of Genetics tries to make some sense of the meaning of genetic diversity ("Genetic Diversity and Societally Important Disparities," Rosenberg and Kang, 2015) by examining "the ways in which population differences in genetic diversity might contribute to consequential societal differences across populations." Rosenberg and Kang assess the importance of genetic diversity in forensics, organ transplants, and genome wide association studies, as well as its contribution to societal disparities.  They conclude that genetic diversity must be taken into account for biological purposes, but they find no association with societal diversity.  Here's why.

Their paper was at least in part occasioned by a controversy over a 2013 report concluding that population genetic variation can be used as a proxy for economic diversity and success ("The 'Out of Africa' Hypothesis, Human Genetic Diversity, and Comparative Economic Development," American Economic Review, Ashraf and Galor, 2013).  Ashraf and Galor (A and G) write:
This research advances and empirically establishes the hypothesis that, in the course of the prehistoric exodus of Homo sapiens out of Africa, variation in migratory distance to various settlements across the globe affected genetic diversity and has had a persistent hump-shaped effect on comparative economic development, reflecting the trade-off between the beneficial and the detrimental effects of diversity on productivity. While the low diversity of Native American populations and the high diversity of African populations have been detrimental for the development of these regions, the intermediate levels of diversity associated with European and Asian populations have been conducive for development.
And this was all determined at "the dawn of humankind."  Naturally, and conveniently, a hump-shaped pattern rather than a simple linear one was needed if one wanted to denigrate Native Americans and Africans alike.  None of that sort of argument for inherency is qualitatively new, but the attempt to make it genetic and hence inherently true had a juicy appeal.  Rosenberg and Kang (R and K), however, apply the same methods to an even larger data set and find no association with economic success.

R and K make it clear that, in their attempt to replicate A and G's study, they are considering within-population diversity, not between-population diversity.  This is important, because internal diversity is calculated from the population itself, not from a larger collection of populations, which raises various issues of sample selection, sample size, and the like.  Within a population, when one can assume approximately random mating, one can estimate heterozygosity far more clearly than when analyzing multiple populations at one go.  So, R and K are calculating expected heterozygosity, "the probability that two draws from a population at a specific site in the genome will produce different genetic types."
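To make that definition concrete, here is a minimal sketch of expected heterozygosity at a single locus, computed as one minus the sum of squared allele frequencies.  The function names and the two toy samples are ours, invented for illustration; they are not data or code from R and K.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of sampled allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(frequencies):
    """Probability that two draws (with replacement) give different alleles."""
    return 1.0 - sum(p * p for p in frequencies)


# Hypothetical samples at one locus: four equally common alleles
# versus one allele that dominates.
diverse = ["A", "B", "C", "D"] * 25
uniform = ["A"] * 90 + ["B"] * 10

for name, sample in [("diverse", diverse), ("uniform", uniform)]:
    freqs = allele_frequencies(sample)
    print(name, round(expected_heterozygosity(freqs.values()), 3))
```

The sample with four equally common alleles scores much higher than the one dominated by a single allele, which is all "more diverse" means here.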

Expected heterozygosity follows a consistent geographic pattern,  
...occurring as a function of increasing distance from East Africa, measured over land-based routes. The highest heterozygosities appear in populations from Africa, followed by populations from the Middle East, Europe, and Central and South Asia. Populations of East Asia have still lower heterozygosities, and Pacific Islander and Native American populations, at the greatest geographic distance from Africa over migration paths traversed in human evolution, are the least heterozygous. The linear decrease in heterozygosity with increasing distance from Africa is a strong and replicable relationship, achieving correlation coefficients near -0.9 in a variety of studies of different genetic markers and sets of populations.
The explanation for the decreasing diversity out of Africa is that each new founding population is a subset of the original group, and thus carries with it less genetic diversity than the non-migrants.
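A toy simulation shows why repeated founding events strip out alleles.  This is only a sketch under made-up assumptions (the population size, number of founders, number of steps and the random seed are all arbitrary), not the model R and K fit.

```python
import random


def found_colony(source, n_founders, size, rng):
    """Draw a few founders from the source, then regrow the colony to `size` copies."""
    founders = [rng.choice(source) for _ in range(n_founders)]
    return [rng.choice(founders) for _ in range(size)]


def serial_founder(n_alleles=20, size=1000, n_founders=15, n_steps=6, seed=1):
    """Return the number of distinct alleles after each successive founding event."""
    rng = random.Random(seed)
    # Source population: allele copies drawn uniformly from n_alleles types.
    population = [rng.randrange(n_alleles) for _ in range(size)]
    allele_counts = [len(set(population))]
    for _ in range(n_steps):
        population = found_colony(population, n_founders, size, rng)
        allele_counts.append(len(set(population)))
    return allele_counts


print(serial_founder())
```

Because each colony can only carry alleles its founders happened to bring, the count of distinct alleles can stay level or fall at each step, but it can never rise (there is no mutation or admixture in this sketch).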



The serial founder model in human evolution. (A) A schematic of the model. Each color represents a distinct allele. Migration events outward from Africa tend to carry with them only a subset of the genetic diversity from the source population, and some alleles are lost during migration events.  (B) An example of the model at a particular genetic locus, TGA012. Each set of vertical bars depicts the allele frequencies in a population, with different colors representing distinct alleles. Within continental regions, populations are plotted from left to right in decreasing order of expected heterozygosity at the locus [equation (3)]. This figure illustrates the loss of alleles across geographic regions; Native Americans all possess the same allele. The allele frequencies are taken from Rosenberg et al. (2005).  Source: Rosenberg and Kang, 2015

Other factors influence diversity as well, such as admixture between different groups, but distance from the original source is replicably the primary determining factor.  There are of course geographic irregularities, such as bodies of water or mountain ranges, but the general pattern is clear, consistent with archeology, linguistic patterns, and so on.

Tests of the interaction between genetic diversity and social factors
Forensics
Genetic diversity is used in forensics to identify a suspect with high probability if the DNA from the crime scene is a perfect match to an individual in the database.  If an exact match isn't found, the DNA profile may be used to identify relatives, which can be done because they will differ by theoretically predictable amounts.  The underlying genetic heterozygosity in a population, however, determines the likelihood that a partial match to a sample is from a genetic relative.  In a low diversity population, risk of a false positive is higher than in a high diversity population, because in the former a higher fraction of individuals will share each allele, which will mean it is less informative.
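The link between diversity and false positives can be sketched with a single locus.  Assuming random mating (Hardy-Weinberg genotype proportions), the chance that two unrelated people carry the same genotype is the sum of squared genotype frequencies.  The allele frequencies below are hypothetical, and real forensic profiles combine many loci.

```python
from itertools import combinations


def genotype_match_probability(frequencies):
    """Chance that two unrelated people share a genotype at one locus.

    Assumes Hardy-Weinberg proportions: homozygote frequency p_i^2,
    heterozygote frequency 2 * p_i * p_j.
    """
    p = list(frequencies)
    homozygotes = sum(pi ** 4 for pi in p)  # (p_i^2)^2
    heterozygotes = sum((2 * pi * pj) ** 2 for pi, pj in combinations(p, 2))
    return homozygotes + heterozygotes


# Hypothetical allele frequencies at one marker.
high_diversity = [0.25, 0.25, 0.25, 0.25]
low_diversity = [0.85, 0.05, 0.05, 0.05]

print(genotype_match_probability(high_diversity))
print(genotype_match_probability(low_diversity))
```

The low diversity population gives a much larger chance of a coincidental match, which is the sense in which each allele is "less informative" there.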

The different levels of genetic diversity in different populations mean that the usefulness of DNA for identification purposes varies between populations.  In addition, populations are unequally represented in forensic databases.  That is a social issue, not a biological one, and doesn't obviate the relationship between genetic diversity and identification of social relationships.

Transplants
Genetic diversity is important in determining matches for the purpose of organ transplantation, particularly bone marrow.  Here, higher diversity populations will have lower match probabilities -- that is, it's most difficult to find a match when diversity in the population is highest, and the difficulty declines with decreasing diversity.  These are rather clear issues.

The difficulty is greater when populations are less likely to be well represented in match databases, which is, again, a social issue.
...the chance that no donor match is found is greatest for African Americans, followed by the Asian-American, Hispanic, Native American, and white groups. As in the forensic case, the population genetics of genetic diversity, together with societal factors that vary across populations, contributes to the quantity of ultimate interest. Both genetic diversity and its interaction with factors that affect participation in transplantation are important in increasing the probability that any given recipient can find a successful match.
GWAS
Genome wide association studies searching for alleles associated with disease rely on the relative proximity of SNPs, or DNA markers, to disease alleles.  In populations with high genetic diversity, such as African populations or African Americans, the longer history of genomic recombination events that scramble nearby nucleotide variants over the generations results in lower linkage disequilibrium (LD), so that the proximity of markers to causal alleles can't be relied upon with the same likelihoods as in more recent populations.  One needs more marker test sites to find the LD needed to make associations with traits, for example.  R and K report that it has been estimated that 96% of subjects in GWAS are of European ancestry.  The social implications of this are that disease alleles are even less likely to be identified in high diversity populations than in others.  The vast majority of GWAS and similar findings can be extrapolated only with great and unknown uncertainty at present (though many still attempt it, in what can be called expeditions of wishful thinking).
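The effect of a longer recombination history can be sketched with the standard textbook result that, under random mating, the LD coefficient D between two sites shrinks by a factor of (1 - r) each generation, where r is the recombination rate between them.  The starting value, the rate and the generation counts below are arbitrary illustrations, not estimates for any real population.

```python
def ld_after(d0, recomb_rate, generations):
    """LD coefficient D after a number of generations of random mating.

    Uses the textbook decay D_t = D_0 * (1 - r)**t; ignores drift,
    selection, admixture and population structure.
    """
    return d0 * (1.0 - recomb_rate) ** generations


# Hypothetical pair of sites with r = 0.001 and initial D = 0.2.
for generations in (500, 1000, 2000, 4000):
    print(generations, round(ld_after(0.2, 0.001, generations), 4))
```

The more generations that have passed, the less a marker tells you about its neighbor, which is why more markers are needed in older, more diverse populations.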

So, these are three examples of situations in which differences in genetic diversity between populations, interacting with social diversity, can have important social implications -- false positives in forensics, low probabilities of transplant matches, and low likelihood of inclusion in genetic research.
Each of these settings involves a problem that is fundamentally biological—DNA-based identification, transplantation, and genetics of disease. In each setting, principles from population-genetic theory in which aspects of genetic diversity feature prominently underlie the contribution of genetic diversity: theories of forensic and transplantation matching explicitly produce an inverse relationship between match probabilities and genetic diversity, and GWA statistics rely on models of the decay of genetic diversity and production of LD during migrations.  
Back to economics
R and K then return to the societal economics question, to re-examine whether population-level biological determinants are relevant to economic development, asking whether population genetic diversity is as useful when applied to a discipline in which population genetics theory is not relevant. Among other things, there are dangers of being statistically misled by phenomena such as Simpson's paradox and the ecological fallacy.

A and G used a small amount of genetic data to calculate genetic heterozygosity for a small number of populations, and imputed heterozygosity for many more based on geographic distance from Africa.  Imputation generally takes sites found in one study that didn't look for variation between them, and assumes the states of those internal sites based on studies of other populations where they were typed.  This is a common, if iffy, practice in GWAS, but at least works reasonably well when the samples are from the same geographic area, such as Europe.  It is sometimes needed because different GWA studies of a given trait use different marker sites (because they use different genotyping platforms).

R and K recalculated the results by using actual genetic data for more populations, but retaining the same analytic methods used in the original study.  So, rather than actual data for 53 populations in 21 countries, R and K used genetic data from 237 populations in 39 countries.  And they found no effect of genetic diversity on economic success.

Further, they chose multiple different samples of 21 countries, and found a significant effect in at most 27% of them.  Thus, about three quarters of the time, had A and G chosen a different sample subset, they would have found no effect.  And, conclude R and K, even if one assumes that studying population genetic diversity and its effect on economic development is valid, the effect didn't persist for an expanded set of populations and countries.  While genetic diversity affects differences between populations in a variety of other ways, when the effect is biological and population genetics theory applies, economic success is not one of them.  "[P]rinciples of population genetics produce no theory of the economic development of nations..."
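The logic of that subsampling check can be sketched in a few lines.  This is our own illustration, not R and K's code: `is_significant` is a hypothetical callback standing in for whatever regression and significance test one applies to a chosen subset of countries, and the number of draws and the seed are arbitrary.

```python
import random


def fraction_significant(countries, subset_size, is_significant,
                         n_draws=1000, seed=0):
    """Share of random country subsets for which the test reports an effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        subset = rng.sample(countries, subset_size)  # draw without replacement
        if is_significant(subset):
            hits += 1
    return hits / n_draws


# Placeholder usage: 39 country labels, subsets of 21, and a dummy test
# that "finds an effect" only when country 0 happens to be included.
countries = list(range(39))
print(fraction_significant(countries, 21, lambda subset: 0 in subset))
```

If a result shows up only for a minority of subsets, the original finding depends heavily on which countries happened to be sampled.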

It is of course plausible that overall variation patterns include variation that leads one population, overall, to have more, or less, of some societal attribute.  One can always construct post hoc stories that fit social prejudices, for example.  But plausibility is not the same as truth, and one can -- and should -- ask why the investigators are making their societal assertions in the first place.  Generally, we know the answer, and it isn't very savory.
