Theory over Data
how scientific reasoning and logic failed during a public health crisis
We know how this pandemic will end.
We did not know precisely when it would start. It took a moment to realize it was happening. Yet, we knew how this was going to end a year ago, two years ago, even a decade ago.
Nothing novel published this year in Nature or Science (or posted to bioRxiv) will be relevant to how this ends. New data, particularly any data that suggests something different or unexpected, is much more likely to be wrong than right.
The unit that matters for scientific predictions is not data but theory. Scientific theories are famously powerful because they generalize. Scientific theories work even without specific knowledge of the circumstances.
The day-to-day work of a research scientist requires laser-like focus on the bleeding edge of knowledge and is measured by the speed of publication, so it is easy to understand why this community has developed an obsession with novelty.
It is a problem when those habits bleed over to the management of a public health crisis. Knowledge is not generated study by study, week by week, or day by day. Scientific knowledge grows decade by decade or even century by century.
A dismissiveness of old theories and obsession with new data has led to several bad decisions during this global pandemic. The cycle goes like this: forget what was known from theory, delay essential decisions out of uncertainty, act on erroneous conclusions from brand new studies, then eventually figure out that the original theory was right all along.
During the 1918 influenza pandemic, germ theory was brand new and mostly undeveloped. No one had heard the word virus or knew what distinguished viruses from bacteria or other pathogens. But scientists had figured out that the public wearing face masks effectively slowed the spread of influenza.
One hundred years later, during the COVID-19 pandemic, modern scientific leaders — CDC, WHO, NIH — did not think there was enough evidence that masks would slow the spread of an infectious respiratory virus. This decision turned out to be a mistake.
The breadth of evidence available at the time makes this a tough one to understand. For one, other countries have used face masks to control epidemics in recent years. Also, doctors and scientists use face masks every day to prevent exposure to viruses in hospitals and labs across the country. Evidence that masks were useful in preventing exposure to infectious disease was, literally, in front of our faces.
Yet, no new study specifically demonstrated that masks worked for this new coronavirus, so the experts were unsure. Excuses for the uncertainty abound. Masks are unhelpful or even a placebo. It was not clear if asymptomatic spread was a factor. People might use them imperfectly. It would give a false sense of security.
It is worth noting: none of these concerns are relevant in a crisis. These are optimizations. There was nothing yet in place to optimize. 10% effective is better than 0% effective. Some insist it wasn’t a matter of effectiveness but supply. If supply was the concern, then the right strategy was to ask the public to put a t-shirt in front of their face, a bandana, or a shield. Nearly any physical barrier would help. There was a shortage of pink yarn from the sheer number of knit hats within very recent memory. There is little doubt that an effort of that scale or bigger would have mobilized in response to a global pandemic.
Today we are using masks to control the coronavirus, exactly as the public did for the flu in 1919 and as China has for the past decade. The experts eventually caught up to where the thinking was a hundred years ago. The information needed to make an effective decision was available long before this pandemic began. There just wasn’t a study or data to prove it, and the experts were too caught up in the details and edge cases to accept imperfect solutions and think outside the box.
Theory also predicts that the next global pandemic would most likely be of “flu-like severity.” This has turned out to be more or less the case, probably less deadly than the worst flu strains in history. COVID-19 is solidly within the range of variability of influenzas that have evolved in the past.
Yet, the scientific community spent months in the middle of this crisis debating the numbers after the decimal place, and how to correct the data to get the exact right number. This confusion took up space in journals and the media, and many concluded that this virus was worse in every way than the flu. Many people, including some epidemiologists, still believe this disease to be meaningfully more severe and more infectious than the flu.
There is a lot of domain knowledge packed into that phrase “flu-like severity,” so it is worth introducing the theory itself.
The Theory of Severity and Spread
A virus is not alive and cannot reproduce by itself. It needs a living host. This fact is what most differentiates a virus from bacteria. When a virus uses the cells of another organism to replicate, we say the host is infected. Infection is not the same as disease. A disease is defined by the symptoms or suffering inflicted on the host organism due to a viral infection. Only a tiny fraction of viruses cause disease.
The dependence on other organisms means the pace and path of a viral outbreak change with the host's social behavior. The host behaviors often matter more than the biology of the virus, particularly when controlled for the method of transmission (airborne, mosquitoes, physical contact, etc.).
Viruses are more infectious when they spread without disease, i.e., asymptomatically. Healthy hosts move around and interact more socially than diseased hosts, giving the virus more chances to spread. The most viral viruses, the most infectious viruses, are ones that the public has never heard about. These are infecting humans all the time, causing numerous non-deadly epidemics. The goal of public health is to minimize disease, not eliminate viruses from the earth. Infection is not disease, so we do not consider it harmful.
When a viral infection reliably causes a very deadly disease, one where an infected individual is likely to become sick or die, the number of infected hosts will be much lower. Humans distance ourselves from our social groups when sick, so infections that cause suffering or disease have fewer chances to spread. Examples of viruses that cause these deadly diseases are Ebola, MERS, and smallpox. Epidemics of these types spread slower and are more likely to stay geographically localized (though globalization has weakened this effect).
The third class of viruses causes the deadly global pandemics of history. These pandemics are rare because it requires a strange set of circumstances to achieve both fast viral spread (infection) and impact on health (disease). It requires a disease that is both severe enough to kill and also minor enough that it does not impact the social behavior of the host. There are two ways viruses have been able to pull this off. One is when hosts are contagious and asymptomatic for long periods before a deadly disease develops. This is known as delayed disease onset (HIV-like). The more traveled path is one where different hosts have disparate outcomes: there is unequal risk within the host population. Most hosts survive asymptomatically or with minor symptoms. However, a small percentage of hosts experience severe disease and even death. This is what is meant by a disease of ‘flu-like severity.’
The risk profiles of flu-like illnesses are predictable from an understanding of the mechanisms. Viruses can only mutate so much and still be infectious. Hosts within the same species are by far more biochemically similar than they are different. So flu-like viruses exploit other systemic health issues, impacting the same high-risk groups that suffer the most consequences from all diseases (not just viruses): the young, the old, the sick, the immunocompromised, or anyone in environmental conditions that increase the odds of secondary infections (like wartime conditions or poverty).
As the human population is large, even a minor disease can result in a heartbreaking death toll if it can effectively reach any of these high-risk populations. The result is a counterintuitive situation where the virus that has caused the most deaths and the worst pandemics in history is also a common nuisance that people recover from numerous times in life.
In summary, when humans infect each other, disease severity (risk to the average person) and its ability to spread (infectiousness) are inversely correlated. One goes up, and the other goes down.
Application of Theories in Practice
Theories like this are powerful because it is possible to know a lot from little observation. If one knew absolutely nothing about a virus and had to take a guess, the default guess is that the virus is harmless. There are many more harmless viruses than harmful ones. As soon as this virus was observed in China, we knew it was a deadly virus (people were dying quickly). There are more Ebola-like deadly viruses than flu-like, so the best guess at this stage was it was another coronavirus like SARS or MERS. Shortly after that, the new coronavirus was observed in multiple countries. Over the next week, as the virus spread and cases soared, it was increasingly clear that this was a disease of flu-like severity. It is hard to imagine a virus spreading widely to multiple continents simultaneously without healthy hosts going about their lives as usual.
Inference from this theory required no numbers, no statistics, just a few observations. This is textbook stuff, so it is promising for the scientific method that things have more or less played out as predicted.
Scientists are obsessed with details. Some will fairly criticize my summary above and point out that aspects of this theory are wrong. The amount of time a person spends asymptomatic varies between viruses. The biochemistry of the virus also plays a role. All of these things are correct. My summary is a generalization.
Obsessing over edge cases is missing the point. When leaders make strategic decisions, the details do not matter much because the solutions available are similarly broad. The most important thing is not to end up fighting the wrong battle altogether.
Like face masks and disease severity, the role of asymptomatic spread in this pandemic received months of discussion and media coverage. Early in the pandemic, the experts (WHO) repeatedly cautioned that asymptomatic spread had not been observed to be a factor.
Asymptomatic means the absence of any visible symptoms or disease. It is challenging to observe asymptomatic spread. By its nature, it often goes undetected. It would require testing of seemingly healthy individuals. Diagnostic tests will be of limited supply early in any pandemic, not necessarily available for researchers to use to test a hypothesis and publish a study.
Theory predicts that asymptomatic spread is a major contributor to a pandemic's speed with human-to-human transmission, and SARS-CoV-2 was moving fast. Time is of the essence, and different decisions are required if the contagious are among the healthy. For example, one would be telling everyone, not just visibly sick people, to wear a mask if the goal is to flatten the curve. Asymptomatic spread also changes the likelihood of success of containment measures, how long the measures would need to last to be effective, and the severity of the human and economic costs incurred from preventative measures.
It has now been shown via a few different studies that asymptomatic spread is critically important to the infectiousness of SARS-CoV-2. It is great to have confirmation, but we could have known that this was likely from theory alone. There are good odds of being right using basic reasoning, plus the added benefit of being empowered to act quickly in an evolving crisis.
Nothing New Under the Sun
Good theories are many studies in one small package. To conclude that severity and infectiousness are inversely correlated took hundreds of studies (many of whose top-level conclusions were wrong), including biological experiments, historical analyses of pandemic events, statistical models, and field experiments on ongoing epidemics. This theory led to the development of techniques that allow containment of Ebola-like viruses, including quarantines and contact tracing. A theory with a track record of helpful predictions is the best evidence the scientific method can provide. No theory is perfect, but tried-and-true theory is much less likely to be wrong than new data collected mid-crisis.
If a new study reports a finding that conflicts with existing theory, that’s one vote yes and hundreds of other votes no. It is better to trust theory until there is more evidence that falsifies it.
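The vote-counting intuition above can be sketched as a simple Bayesian update. This is a minimal illustration, not a method the essay specifies: the function name, the 1% prior, and the likelihood ratio of 5 are all hypothetical numbers chosen only to show the shape of the reasoning.

```python
def posterior_theory_wrong(prior_wrong: float, likelihood_ratio: float) -> float:
    """Probability the established theory is wrong after one conflicting study.

    prior_wrong: assumed prior probability that the theory is wrong.
    likelihood_ratio: assumed ratio
        P(study result | theory wrong) / P(study result | theory right).
    """
    prior_odds = prior_wrong / (1.0 - prior_wrong)
    posterior_odds = prior_odds * likelihood_ratio  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: a theory with a long track record (1% prior chance of
# being wrong) and one surprising study that is 5x more likely if it is wrong.
p_wrong = posterior_theory_wrong(prior_wrong=0.01, likelihood_ratio=5.0)
print(f"Chance the theory is wrong after one conflicting study: {p_wrong:.1%}")
```

Under these assumed inputs, a single conflicting study moves the needle but leaves the theory far more likely right than wrong, which is the point of trusting theory until more falsifying evidence accumulates.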
In a pinch, a useful heuristic to judge a new study is to look for a plausible mechanism for the claim. Never trust any number, effect size, or statistical significance metric. The data matters much less than the quality of the idea.
Rare and strange symptoms of disease seem to appear only in specific age groups? No. Random correlations with obscure treatments or blood types? No. They are statistical results and are more likely spurious than real. Data lies. Theory has had time to be corrected.
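One reason such correlations tend to be spurious is the sheer number of comparisons run on mid-crisis data. A rough sketch, assuming independent tests, no real underlying effect, and the conventional 0.05 significance threshold (all assumptions of this illustration, not figures from any particular study):

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null tests looks 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of 'significant' results when no real effect exists."""
    return n_tests * alpha


# Hypothetical scenario: checking 20, then 100, candidate correlations
# (blood types, age bands, obscure treatments) against outcomes.
for n in (20, 100):
    p_any = prob_at_least_one_false_positive(n)
    print(f"{n} tests: P(at least one false positive) = {p_any:.2f}, "
          f"expected false positives = {expected_false_positives(n):.1f}")
```

With enough comparisons, a striking result is close to guaranteed even when nothing real is going on, which is why a plausible mechanism matters more than the p-value.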
The Trump Factor, The Noise Factor
Many experts know to use theory first. Accurate inferences are everywhere if one knows where to look for them. But, there were not enough voices to counter the scale of the noise.
It does seem there is truth to the idea that there was a Trumpian takeover of expertise. The first spokespeople for COVID-19 at the CDC appeared to be working from this theoretical framework (though wrong on face masks). In February and early March, these scientists made each of the observations outlined above and then presented predictions consistent with theory. The message was this: the disease had spread globally, almost entirely undetected. Therefore it is clearly very infectious. Many Americans will soon be exposed to the novel coronavirus if they are not already. Most Americans were not at any substantial risk from the disease. The epidemiologists then insisted that focus and resources stay on protecting high-risk groups.
Then, this signal stopped entirely. All CDC telebriefings on COVID-19 stopped for around two months. Things appear to have gotten weird. According to reporting at the time, Trump was upset that the CDC's message was that this new virus would infect many people. The details are murky, but the CDC did go silent. Since VP Mike Pence took over, the early thought leaders and experts are nowhere to be heard, though that could be due to burnout during a crisis.
This two-month gap in information and messaging from a critical government institution left a void. That void was filled by the loudest voices, not the most informed. The original predictions were soon buried under the weight of scientists and data scientists analyzing the data just out of China and Italy. These scholars concluded that this disease was terrifying in an unknown and never before seen way. At least, this is what appeared to happen in news sources on the progressive and center-left.
The numbers, when viewed exactly as reported, did look terrifying. They suggested a virus that is both more infectious and more deadly than the flu. That would be unheard of, an entirely unprecedented event. Nothing like that has occurred before. Worse than the deadliest virus in history.
Claiming something is entirely new is good salesmanship, but these claims are warnings to any scientist, even those coming from different fields. What is more likely: that this is something wholly different, something entirely out of step with what has happened before (i.e., all of human history), or that these numbers are wrong?
Those familiar with the fundamental theories in epidemiology would have had even more reason to be skeptical of these ‘hot-off-the-press’ data-driven claims. If the disease was meaningfully different than the flu in severity, it was either more infectious or more deadly. But it was not both.
The evidence that the virus was spreading quickly was compelling. The evidence that COVID-19 was deadly outside of the usual high-risk groups was less compelling. The worst cases are observed first because they are easier to find. The early data collected in an epidemic is expected to be skewed by this sort of observer bias. Everything we observed was largely consistent with theory. There was no reason to think something entirely new was happening. Too few were willing or able to think past the data, past the numbers on a page.
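The effect of observing the worst cases first can be shown with a little arithmetic. The sketch below is a deliberately simplified model with hypothetical numbers, not estimates for COVID-19: it assumes every death is counted, while only a fraction of non-fatal infections are ever detected.

```python
def observed_fatality_rate(true_rate: float, detect_nonfatal: float) -> float:
    """Naive deaths / detected cases under a simple detection-bias model.

    true_rate: assumed true fraction of all infections that are fatal.
    detect_nonfatal: assumed probability a non-fatal infection is detected.
    Simplifying assumption: every fatal case is detected.
    """
    detected_deaths = true_rate
    detected_survivors = (1.0 - true_rate) * detect_nonfatal
    return detected_deaths / (detected_deaths + detected_survivors)


# Hypothetical inputs: 0.5% of infections are fatal, but early testing
# finds only 1 in 10 of the infections that are not.
true_rate = 0.005
naive = observed_fatality_rate(true_rate, detect_nonfatal=0.10)
print(f"True rate: {true_rate:.2%}, naive early estimate: {naive:.2%}")
```

In this toy model the early data overstates severity by roughly the inverse of the detection rate, and the gap closes only as testing reaches mild and asymptomatic infections.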
The problem with misinformation is that it results in bad decisions
Fear is a powerful motivator. People are afraid of novel viruses for understandable reasons. Conclusions that appear to justify that fear are emotionally validating. These conclusions will overpower more well-reasoned ideas. But that is only part of what went wrong.
The flu is the most deadly virus in human history. Being flu-like in severity, plus a novel virus with no immunity meant the world was (and is) facing a severe pandemic, among the worst viral pandemics we have seen in history. We had many reasons to fear this virus. They just were not the reasons that were used to make policy decisions. That is what makes this type of inferential mistake such an enormous problem, not a minor annoyance. It compounds the damages from an already very dire situation, making it even worse than it needed to be.
Given the understanding from the theory that any disease capable of causing this type of global pandemic would be approximately flu-like in severity, one would have known that 1) masks were important, 2) early claims that the disease was deadly in healthy adults were unlikely to be accurate, 3) asymptomatic spread was a factor, and 4) the risk from disease was going to be dramatically different between the highest-risk and lowest-risk populations. Critically, one would have also had to have come up with an answer to the obvious question:
If this disease is like the flu, why are we locking down today when we don’t lock down for the flu?
Obvious questions that stem from theory deserve to have thoughtful answers. Good questions expose flaws in our reasoning or thought processes. Many experts believed this disease was an existential risk, one different than the flu, so they never tried to come up with an answer to this obvious question. Thinking through this in the context of the theory and available resources would have been a critical thought exercise that shaped a better policy response. Why don’t we just contain the flu? Perhaps there is a good reason for the past hundred years of decisions that have been made?
Theory on How this Pandemic Ends
Absent treatments or cures, our public health tools for infectious diseases are designed to amplify natural biological defense mechanisms. We do what nature does already, just more of it and better.
Quarantine, contact tracing, and similar measures were tools designed to work on Ebola-like viruses. Specifically, they are designed to speed up the natural mechanism of containment that occurs when every host in a population is at risk. Humans naturally remove ourselves from social groups when sick. Our species may have evolved the biochemical mechanisms that cause certain symptoms for this purpose. Evolutionarily, it helps make sure a person knows when they are ill and does not infect others.
Social distancing is not the natural defense mechanism for viral diseases that are flu-like. It does not work well when there is a large amount of asymptomatic spread. Evolutionarily speaking, it would not be in our best interest to quarantine or social distance even when not sick. Humans being social is essential to our survival. If we always socially distanced just in case, humans would have gone extinct. Life is what happens when we are not sick. Perhaps there is something to learn somewhere in there about the costs of blunt social distancing approaches.
Nature has a different defense mechanism for flu-like pandemics: population immunity. Though relevant to all viruses and pathogens at the individual level, immunity at the population level is the dominant mitigation method for ending the pandemic phase of a flu-like virus. It is a solution that reflects the reality that there is disparate risk within the population. Infection is not disease. Asymptomatic infection is not a disease. But asymptomatic infection does usually result in some amount of immunity. Population immunity lets the strong protect the weak.
The same feedback loops that cause outbreaks work in reverse to protect us once immunity exists. If even one member of a family gets sick during an initial infection, everyone is potentially exposed. In reverse, if even one member of the family has become immune, it partially protects others in a family unit. This health benefit of immunity comes long before the 70% threshold that has been unfairly maligned and entirely misunderstood throughout this pandemic.
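The claim that immunity helps well before the threshold can be sketched with the standard textbook formulas for a homogeneously mixing population. This is an illustration under that simplifying assumption; the basic reproduction number used here is hypothetical, chosen only because it yields a 70% threshold in this simple model.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction at which each case infects fewer than one other person."""
    return 1.0 - 1.0 / r0


def effective_r(r0: float, immune_fraction: float) -> float:
    """Average new infections per case when part of the population is immune."""
    return r0 * (1.0 - immune_fraction)


# Hypothetical basic reproduction number (not a figure from the essay).
r0 = 10.0 / 3.0
print(f"Threshold: {herd_immunity_threshold(r0):.0%}")

# Spread slows in proportion to immunity at every level below the threshold.
for immune in (0.0, 0.2, 0.4, 0.6, 0.7):
    print(f"{immune:.0%} immune -> effective R = {effective_r(r0, immune):.2f}")
```

In this model the threshold is simply the point where the effective reproduction number drops to one; every step toward it already reduces how fast the virus spreads.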
While our most successful public health tools work to amplify nature’s defense mechanisms, the choice to use lockdowns and shutdowns in 2020 does the exact opposite. When the infection doesn’t spread even in low-risk populations, everyone is still susceptible. The moment people try to live normally again, the virus will start anew. Time passes, but the end is still equally far away. Substantial suffering and costs have accumulated in the meantime. Many people have felt dismayed at this realization, believing that social distancing helped this problem go away. That is not the case. Social distancing does not reduce total harm. It only delays it, spreading it out over an extended period (flattening the curve meant getting infected later, not never). We cannot just run or hide from a virus. Nature determines the rules. We just play the game.
Vaccines amplify population immunity by intentionally exposing healthy members of the population to a virus so that they develop immunity faster. But vaccines have never been used successfully to slow a pandemic. Pandemic phases of flu-like diseases end in a little less than two years on their own. We have not been able to scale a vaccination-based response in that amount of time in the past (though I believe someday this may be possible). Given that this is the first experiment of its kind, there is substantial engineering and logistical uncertainty about whether this will work. But there is minimal scientific uncertainty about the need for increased immunity. Whether it is by vaccine or asymptomatic spread, this pandemic ends with population immunity.
Of course, only one of those options is available to us today to minimize harm. The vaccine? Who knows.
Re-learning the same lesson
Scientists and public health officials will look back soon and realize it was possible to achieve equivalent (or much better) public health outcomes at a much lower cost to the economy and human life. The mistake was not that the shutdowns were too soft or short, but that they were fighting the wrong battle. What we know works is amplifying the natural mechanisms. Choosing to try to counter natural tendencies was a mistake.
There is a time and place for different strategies, and we should keep trying new things to get better at our pandemic responses. It will always be easier when we accurately assess the nature of the virus. The information needed to make good public health decisions was available, just not in the places that scientists chose to look.
Close the RSS feeds, ignore Twitter. Open a textbook, start there. Also, we would all be better off if journalists stopped reporting anything from bioRxiv.