Overdiagnosis: false-negatives and sensitivity of RT-PCR
why false negatives aren't much of a problem in practice
Welcome to Limits of Inference! The post below is not intended as a self-standing piece. This is some supplementary context in support of a previous article. To get an introduction to the problem of overdiagnosis, check out the original piece here.
Questions? Concerns? Let me know in the comments. Also, subscribe below.
Lab test data is not natively 'positive' or 'negative.' The measured data is collected as a continuous number like 5, 0.001, or 200. The test protocol includes at least one decision threshold to turn this continuous data into something discrete (positive, negative). This hidden inference explains why one cannot prevent overdiagnosis just by fixing the tests.
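To make that hidden step concrete, here is a toy sketch. The threshold value and the "higher means positive" direction are invented for illustration; some assays call a sample positive when the reading falls below a cutoff instead.

```python
# Toy illustration of the hidden inference step: the instrument reports a
# continuous value, and the protocol's decision threshold makes it binary.
# The threshold (1.0) and the ">= means positive" direction are invented
# for illustration; some assays call "positive" below a cutoff instead.
def call_result(measurement: float, threshold: float = 1.0) -> str:
    return "positive" if measurement >= threshold else "negative"

for raw in (5, 0.001, 200):
    print(raw, call_result(raw))
# 5 positive
# 0.001 negative
# 200 positive
```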
If you make a test more sensitive by lowering the decision threshold (so it finds more true-positives), it also becomes less specific (more susceptible to false-positives). These two types of error cannot be manipulated independently except by scientific advancement or technological invention. Depending on the intended use case, engineers can design a test to prefer false-negatives or false-positives.
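Here is a minimal sketch of that trade-off on synthetic data; the overlapping distributions and the candidate thresholds are invented for illustration. As the threshold drops, sensitivity rises and specificity falls.

```python
import random

# Synthetic data, invented for illustration: infected samples tend to give
# higher readings than healthy ones, but the two distributions overlap.
random.seed(0)
infected = [random.gauss(2.0, 1.0) for _ in range(10_000)]
healthy = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def sensitivity(threshold: float) -> float:
    return sum(x >= threshold for x in infected) / len(infected)

def specificity(threshold: float) -> float:
    return sum(x < threshold for x in healthy) / len(healthy)

for threshold in (1.5, 1.0, 0.5):
    print(threshold, round(sensitivity(threshold), 3), round(specificity(threshold), 3))
# Lowering the threshold raises sensitivity and lowers specificity;
# the two move together because they share the same decision boundary.
```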
Biomedical tests tend to be engineered to avoid false-negatives and prefer false-positives. To some, RT-PCR tests feel like an exception to this heuristic, creating confusion even among epidemiologists. Many research scientists understand RT-PCR to have low sensitivity (it misses the virus ~10% of the time) and high specificity (it returns a false-positive only ~5% of the time). Relative to other tests, that sensitivity is not especially low and that specificity is not especially high, but the test itself is more specific than sensitive.
Given these numbers, many scientists have concluded (first on Twitter, then cited by journalists) that there are more false-negatives than false-positives. I was surprised to see that claim since it runs counter to all past experience with disease screening. I carefully considered the scientific arguments as I came across them. Most of the mistakes were mathematical (Bayesian reasoning is difficult). Where the math held up, the data stories tended to miss two crucial points.
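A back-of-the-envelope sketch of why the Bayesian bookkeeping matters, using the roughly 90% sensitivity and 95% specificity cited above; the prevalence among people tested is an assumption chosen purely for illustration:

```python
# Back-of-the-envelope: expected false-negatives vs. false-positives among
# people tested. Sensitivity and specificity are the rough figures cited
# above; the prevalence among those tested is an illustrative assumption.
sensitivity = 0.90   # P(test positive | infected)
specificity = 0.95   # P(test negative | not infected)
prevalence = 0.05    # assumed fraction of tested people who are infected
n_tested = 100_000

infected = round(n_tested * prevalence)   # 5,000 people
healthy = n_tested - infected             # 95,000 people

false_negatives = round(infected * (1 - sensitivity))   # missed cases
false_positives = round(healthy * (1 - specificity))    # wrongly flagged

print(false_negatives, false_positives)  # 500 4750
# At this (assumed) low prevalence, false-positives outnumber false-negatives
# even though the test is more specific than sensitive.
```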
First, we care about the sensitivity of the entire testing strategy (the clinical sensitivity), not of one RT-PCR test delivered in isolation (the technical sensitivity). The clinical sensitivity is higher than the technical sensitivity because, in a clinic, we can test people multiple times. If someone is suspected of having COVID-19 due to symptoms or exposure and gets a negative test result, a physician will test again until the patient tests positive. Even given an initial false-negative, the case is likely to get counted eventually as a true-positive in the disease prevalence data. The observed sensitivity of RT-PCR ends up much higher than its sensitivity measured in the lab. At worst, we find the case a day or two later. There is no such correction pathway for false-positives (at least not using CDC guidance).
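A minimal sketch of that effect, under the simplifying (and admittedly optimistic) assumption that repeat tests on the same patient miss independently; in reality, misses are correlated with timing and viral load, so the real gain is smaller:

```python
# Effective (clinical) sensitivity of a retest-until-positive strategy,
# assuming each test misses independently with the same probability.
def clinical_sensitivity(technical_sensitivity: float, n_tests: int) -> float:
    miss_rate = 1 - technical_sensitivity
    return 1 - miss_rate ** n_tests  # strategy fails only if every test misses

for n_tests in (1, 2, 3):
    print(n_tests, round(clinical_sensitivity(0.90, n_tests), 3))
# 1 0.9
# 2 0.99
# 3 0.999
```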
Fun fact: this repeat-test strategy is the same methodology used in the original experiment that measured the sensitivity statistic most often cited for COVID-19 RT-PCR. People known to have been exposed to the virus are selected for the study. The experimenter tests this same group of people repeatedly over multiple days and counts how many people who initially test negative go on to test positive later. The available RT-PCR performance stats are better understood as the ratio of technical sensitivity to clinical sensitivity: 80% of all patients who will ever test positive do so on the first test, but 100% eventually get diagnosed. This is another example of how context matters for the interpretation of data: those who google numbers or statistics and apply them without reading the underlying papers will tend to make poor inferences.
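Reading the cited figure that way, a small sketch (the cohort size is arbitrary; the 80% figure is the one quoted above):

```python
# What the repeat-testing study design actually measures: the fraction of
# eventually-confirmed cases caught on the first test, i.e. roughly the
# ratio of technical sensitivity to clinical sensitivity.
eventually_positive = 1_000       # illustrative cohort of confirmed cases
positive_on_first_test = 800      # the ~80% figure quoted above

reported_sensitivity = positive_on_first_test / eventually_positive
print(reported_sensitivity)  # 0.8
# The denominator is people who test positive at some point under repeated
# testing, not every infected person, so the statistic already reflects the
# retesting strategy used to find them.
```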
Repeated testing is a reasonable strategy because the signal (the spread of disease) can change with time as people are newly infected. Also, if patients are high-risk, we want to be more certain about the test result. We are not missing that many true cases in practice, at least not due specifically to RT-PCR sensitivity.
The second point that gets missed is that the biology behind the test isn't that important overall. A significant source of false-positives is human or logistical error; errors in the test's biology can only add to that noise, not counteract it. For example, the CDC released thousands of tests in February that gave 100% false positives due to a manufacturing error. This is exactly the type of fluke that people operating in the real world, or using the resulting data, need to consider when making inferences. We know about that one mistake, but we are rarely lucky enough to know when or why most errors arise. We know enough about human fallibility, in general, to be confident that errors are happening. Still, without field testing, we do not know precisely when, where, or how much.
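To make that stacking concrete, here is an illustrative sketch; every rate below is an assumption, not a measured value, and the operational error is simplified to always produce a spurious positive:

```python
# Illustrative only: how operational error stacks on top of the assay's
# intrinsic false-positive rate. Every rate here is an assumption.
assay_false_positive_rate = 0.05   # intrinsic to the test chemistry
operational_error_rate = 0.01      # swaps, contamination, clerical mistakes

# A truly negative sample is reported positive if either the assay errs or
# an operational mistake corrupts the result (treated as independent, and
# the operational error is simplified to always yield a positive report).
combined = 1 - (1 - assay_false_positive_rate) * (1 - operational_error_rate)
print(round(combined, 4))  # 0.0595
# Error sources only compound; perfecting the assay's biology cannot remove
# the noise introduced by handling and logistics.
```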
This is a limit of inference. Domain-specific knowledge of infectious disease or RT-PCR is neither the only nor the most important context needed to tell the data story.