People who enroll in genetic studies are genetically predisposed to do so.
According to the Catalogue of Bias, ascertainment bias occurs when a sample being studied is not representative of the target population. This can produce misleading or even false conclusions, and it can be hard to detect since it cannot usually be identified by examining the sample alone. This is why many studies try to use variables other than participation in the study to make sure their samples are as representative as possible.
Studies examining how a particular treatment affects a particular health outcome often try to handle ascertainment bias by adjusting for “covariates,” things like education level or socioeconomic status, that could affect health outcomes independently of the treatment. But Stefania Benonisdottir and Augustine Kong at Oxford’s Big Data Institute have just demonstrated that we can determine if genetic studies are biased using nothing but the genes of the participants.
And they used that technique to show that genetics contributes to the tendency to participate in genetic studies.
Finding bias
You may wonder how this was done—quite reasonably, since we can’t very well compare the genes of participants to those of non-participants. The analysis done by Kong and his student relies on the key idea that a genetic sequence that occurs more frequently in participants than in nonparticipants will also occur more frequently in the genetic regions that are shared by two related participants.
Put differently, a bit of DNA that is common in the population will show up frequently in the study. But it will still only have a 50/50 chance of showing up in the child of someone who carries a single copy. If a bit of DNA makes people more likely to enroll in genetic studies, it will be more common both in the overall data and among closely related family members.
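The logic above can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not the researchers' actual analysis: a single hypothetical variant at 50 percent population frequency, sibling pairs only, and an enrollment probability that rises linearly with the number of copies a person carries. The function name `simulate` and the parameters `base` and `effect` are made up for illustration.

```python
import random


def simulate(n_families=100_000, allele_freq=0.5, base=0.2, effect=0.1, seed=1):
    """Return (shared_freq, not_shared_freq) of the variant among enrolled sibling pairs."""
    rng = random.Random(seed)
    shared = []      # copies both siblings inherited from the same parental chromosome
    not_shared = []  # copies the siblings inherited from different parental chromosomes
    for _ in range(n_families):
        # parents[p] holds parent p's two copies; 1 marks the hypothetical variant.
        parents = [[int(rng.random() < allele_freq) for _ in range(2)] for _ in range(2)]
        # picks[s][p] is which of parent p's two copies sibling s inherited.
        picks = [[rng.randrange(2) for _ in range(2)] for _ in range(2)]
        counts = [sum(parents[p][picks[s][p]] for p in range(2)) for s in range(2)]
        # Assumed model: enrollment probability rises linearly with variant count.
        enrolled = [rng.random() < base + effect * c for c in counts]
        if not all(enrolled):
            continue  # only pairs with both siblings in the study are observable
        for p in range(2):
            a, b = picks[0][p], picks[1][p]
            if a == b:
                shared.append(parents[p][a])  # one copy, inherited by both siblings
            else:
                not_shared.extend([parents[p][a], parents[p][b]])
    return sum(shared) / len(shared), sum(not_shared) / len(not_shared)


shared_freq, not_shared_freq = simulate()
print(f"variant frequency in shared copies:     {shared_freq:.3f}")
print(f"variant frequency in not-shared copies: {not_shared_freq:.3f}")
```

In this toy model, a shared copy influences whether both siblings enroll, while a not-shared copy influences only one of them. The variant should therefore come out more common in the shared copies than in the not-shared ones, and both should exceed the 50 percent population frequency, all without any data on non-participants.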
So they checked the genetic sequences shared between first-degree relatives—either parents and children or siblings (but not twins)—in the UK Biobank. They outline three principles of genetic-induced ascertainment bias:
- In a population with shared ancestry whose members share identical stretches of DNA, like the one sampled by the UK Biobank, those identical stretches will be enriched with sequences that positively affect the decision to participate in the study. Of course, genetic sequences that are under positive selection for any other reason might also be enriched. That’s why it’s essential to have pairs of close relatives who share these identical sequences included in the study.
If a parent has a copy of a genetic variant that promotes participation, it will be passed on to the children in the study more frequently. Likewise with shared and not-shared genetic sequences between siblings within the study. If a DNA sequence shows these three behaviors (it shows up more frequently than it should by random chance, especially among parents, children, and siblings), that sequence probably induces participation.
- Genetic sequences that promote participation will occur more frequently in participants with close relatives in the study than in those without.
- If genetics do in fact predispose people to participate, then there should be more pairs of first-degree relatives in the study than if participation is random. In the UK Biobank there are fully twice as many sibling pairs as would be expected by random sampling. It is worth noting here that the UK Biobank does not recruit families (participants are adults, ages 40–69, who provided consent). Still, family members do talk to each other, and they share other environmental influences.
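The "expected by random sampling" comparison in the last principle can be sketched with simple arithmetic. This is an illustration only: it assumes every person enrolls independently with the same probability, and the numbers below are hypothetical placeholders, not UK Biobank figures. The function names are likewise invented for the example.

```python
def expected_pairs_if_random(pairs_in_population, sampling_fraction):
    # Under independent sampling, both members of a pair enroll
    # with probability sampling_fraction * sampling_fraction.
    return pairs_in_population * sampling_fraction ** 2


def enrichment(observed_pairs, pairs_in_population, sampling_fraction):
    # Ratio of pairs actually seen in the study to pairs expected by chance.
    expected = expected_pairs_if_random(pairs_in_population, sampling_fraction)
    return observed_pairs / expected


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
ratio = enrichment(observed_pairs=5_000,
                   pairs_in_population=1_000_000,
                   sampling_fraction=0.05)
print(f"observed / expected sibling pairs: {ratio:.2f}")
```

A ratio near 1 would be consistent with random participation. A ratio well above 1 means relatives enroll together more often than chance predicts, which could reflect shared genetics, shared environment, or both.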
This analysis used genetic data from about 500,000 people collected between 2006 and 2010. It examined roughly 500,000 genetic regions from around 20,000 pairs of first-degree relatives. The researchers didn’t find (or look for) “a gene” that correlates with participation in a study. Rather, they compared all of the shared and not-shared genetic sequences among the pairs of first-degree relatives enrolled in the study and analyzed their relative frequencies according to the above three principles.