Limits of Inference

Overdiagnosis: why scientists and statisticians think we should test fewer people

Clare Gollnick — Mon, 04 Jan 2021 12:15:54 GMT

Welcome back to Limits of Inference! Thanks for being here.

This newsletter now has readers from many different backgrounds. I love it, but it has presented me with a writing challenge. To keep my posts readable, I have been editing out some details. These edits are sorely needed as I am long-winded by nature. But, sometimes, it makes me sad not to dive into all the nitty-gritty details. I cannot help it; I am still an academic at heart. So, I am trying something new: footnotes. The footnotes are here for those who care deeply about a topic or want more substance. These long footnotes are tangential, so dive in at your own risk. Let me know how this experiment works.

If you enjoy the newsletter, please tell others about it. If not, reach out to me and tell me why.

This year a manufacturing snafu caused test shortages at a critical point early in the pandemic, and the need for widely available testing became a rallying cry. Testing remains an important tool; however, we've taken it too far. We are testing too many people for SARS-CoV-2. As a result, we are overestimating the health impact of COVID-19 relative to other respiratory illnesses.

Overdiagnosis is not a new phenomenon. We have seen screening contribute to overdiagnosis with prostate cancer, chronic kidney disease, cervical cancer, breast cancer, ADHD, thyroid cancer, and depression. This list is just the tip of the iceberg. As this is a mistake that modern medicine keeps repeating, it's worth talking about how and why.

Data tells us that something happened, not why. The why is a story a person comes up with to explain data using context and domain expertise. Many people assume a test tells us that a person has COVID-19, but that assumption conflates the data (the test result) with the inference or explanation (the presence or absence of disease) [1]. A positive test result tells us that a person got a positive test result. Yes, this statement is annoyingly tautological, but that is exactly as useless as data is by itself.

When one searches for evidence, it becomes harder to guess the explanation. This problem is fundamental, not unique to public health. Let's develop the intuition using an example outside of biology: the metal detectors used to screen crowds entering airports, courthouses, sports arenas, etc. Metal detectors search for criminals by recognizing metal, a material unlikely to be found on a person's body but commonly found in guns and knives. Most of us have seen or heard a metal detector alarm at least once, and odds are it was a false alarm (nearly all alarms are). This is the expected outcome whenever we screen large numbers of people looking for evidence of rare events. False alarms will dominate, but it will still prove difficult to predict why or when the next one will occur.

Metal detectors detect evidence of metal (data being tautological as always). However, it takes inference from that data to determine whether the person has a weapon or violent intent. In practice, there are many reasons why someone has metal on their person that are neither dangerous nor criminal—from bone screws to pacemakers, from underwire bras to steel-toed shoes, from coins to snap closures on a clutch (yep, that last one happened to me) [2]. Likewise, there are alternative explanations for a positive COVID-19 test other than a viral disease — a contaminated swab, a mislabeled sample, a cross-reaction with a different molecule, high background noise, exposure without infection, residual debris from a previous infection. The more weird edge cases that can explain a false alarm, the more errors will occur.

Any single alternate explanation may seem like a rare circumstance that does not matter to the big picture. That changes with scale. Once we search extensively or test thousands or millions of people, all explanations, even rare ones, are likely to occur. Surgical screws are rare in the population overall, but we will almost certainly find some in a crowd of ten thousand sports fans. It is unlikely for a technician to cross-contaminate a sample. However, we still expect regular, consistent mistakes caused by human error anytime we run millions of tests per day.
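To make the effect of scale concrete, here is a minimal sketch of the arithmetic. The 1-in-1,000 rate is purely an illustrative assumption, not a measured figure, and the function name is hypothetical. It also assumes each person is independent of the others.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that an event with per-person probability p shows up
    at least once among n independent people."""
    return 1.0 - (1.0 - p) ** n


# Illustrative assumption: 1 in 1,000 people carries a surgical screw.
p_screw = 0.001
for crowd in (10, 1_000, 10_000):
    chance = prob_at_least_one(p_screw, crowd)
    print(f"crowd of {crowd:>6,}: chance of at least one = {chance:.4f}")
```

The same function applies to lab errors: swap in an assumed per-test error rate and the number of tests run per day.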

There is another way to think about the problem of scale. Testing does not cause disease. The total number of cases does not increase with the number of tests performed. However, every single test has a small chance of returning in error, whether due to a mistake or just the general weirdness of the world. So while the number of true-positives has an upper bound (max value), the potential for false-positives is limitless. The signal is constant, but the noise grows each time we search for evidence of disease.
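The bounded-signal, unbounded-noise idea can be sketched in a few lines. All numbers here are hypothetical, and the function name is my own. To keep things simple, the sketch generously assumes every infected person is tested first, so each additional test lands on an uninfected person.

```python
def expected_positives(tests_run, infected, sensitivity, false_positive_rate):
    """Return (expected true-positives, expected false-positives).

    Simplifying assumption: infected people are tested first, so any
    tests beyond `infected` are run on uninfected people only.
    """
    infected_tested = min(tests_run, infected)
    uninfected_tested = max(tests_run - infected, 0)
    true_pos = infected_tested * sensitivity
    false_pos = uninfected_tested * false_positive_rate
    return true_pos, false_pos


# Hypothetical inputs: 1,000 infected people, 98% sensitivity, 2% error rate.
for tests in (1_000, 10_000, 100_000, 1_000_000):
    tp, fp = expected_positives(tests, 1_000, 0.98, 0.02)
    print(f"{tests:>9,} tests: true-positives {tp:>7,.0f}, false-positives {fp:>7,.0f}")
```

The true-positive column stops growing once all infected people have been tested, while the false-positive column keeps climbing with every extra test.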

We can estimate the scale of the false-positives problem using Bayes' theorem. The specific tests used for COVID-19 are too new to have reliable real-world performance data, but we can expect them to be excellent, much more precise than metal detectors. I will use the same numbers the FDA provides as guidance to practitioners here. But be aware that even the guidance numbers are hypothetical and represent a best-case scenario [3].

Assume a large COVID-19 outbreak occurs in New York City, and Governor Andrew Cuomo goes into beast mode, testing every person in the city in 24 hours. It's a major operational success, but how much should we trust the data that comes out?

Given a large outbreak, we would expect perhaps 1% of the city's population to be infected at once. In a city of 8.3 million, that is a total of 83,000 people. The hypothetical testing process is nearly perfect, correctly flagging 98% of all infected people as positive (roughly 81,500 cases are true-positives, missing roughly 1,500 as false-negatives). We tested every person in NYC to find the infected, including approximately 8.2 million uninfected people. Again, the testing procedure is robust, but different types of errors, such as cross-contamination or data entry mistakes, can happen, so we assume we classify 98% of these people correctly as well. That is around 8 million true-negatives with 164,000 false-positives. The accuracy of this hypothetical would be phenomenal by biological laboratory testing standards [1].

But look closer at what these results mean for someone who received a positive test result. Roughly 81,500 results are true-positives, but 164,000 results are false-positives. That means 1) about 67% of positive test results are false-positives; 2) a person who received a positive test result is more likely to be healthy than infected; 3) anyone analyzing this data would overestimate the actual number of COVID-19 cases by a factor of around 3.
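For readers who want to check the arithmetic, here is a minimal sketch of the thought experiment. The inputs are the ones stated in the text (8.3 million people, 1% prevalence, 98% correct on both infected and uninfected). The function and variable names are my own. Because the code does not round intermediate values, its counts differ slightly from the rounded figures quoted in the prose.

```python
def screening_counts(population, prevalence, sensitivity, specificity):
    """Expected outcome counts when everyone in `population` is tested once."""
    infected = population * prevalence
    uninfected = population - infected
    true_pos = infected * sensitivity
    false_neg = infected - true_pos
    true_neg = uninfected * specificity
    false_pos = uninfected - true_neg
    return {
        "infected": infected,
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
    }


counts = screening_counts(8_300_000, 0.01, 0.98, 0.98)
positives = counts["true_pos"] + counts["false_pos"]
ppv = counts["true_pos"] / positives          # chance a positive is real
overcount = positives / counts["infected"]    # reported cases vs. actual cases

print(f"true-positives:  {counts['true_pos']:,.0f}")
print(f"false-positives: {counts['false_pos']:,.0f}")
print(f"share of positives that are real: {ppv:.1%}")
print(f"reported positives / actual infections: {overcount:.2f}x")
```

The quantity `ppv` is what Bayes' theorem gives for the probability of infection given a positive test. Counting people, as done here, is an equivalent and often more intuitive way to arrive at it.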

This result is famous and famously counterintuitive, making it the centerpiece of many Bayesian statistics classes. Since it is counterintuitive, I will reiterate the main points. The test is extremely accurate. Most test results provide the correct answer (over 8 million correct, fewer than 170,000 wrong). Nearly all the negatives are true-negatives; almost all infected people test positive. The data quality problem shows up in the subset of people who received a positive test (the evidence we were actively searching for). After testing positive, there remains a high probability that a person is not infected. We end up overdiagnosing the disease.

Errors may be unlikely, but a tiny percentage of an enormous number is a big number in its own right. Technically, a person who received a positive test result went from having a 1% chance of having the disease to a ~33% chance after the test. Statisticians may appreciate a probabilistic win, but individuals perceive such a test as wrong most of the time.

There are no statistics, no data tricks, that can fix this after it has occurred. If there were a post hoc fix, this wouldn't be a limit of inference, only a math problem. One statistic used in the example above, the disease prevalence, is unknown in real-world scenarios. Figuring out how many people have the disease was the goal of testing in the first place. Though the thought experiment above is not practical, it's a typical case study used to teach epidemiologists that this is a problem that quickly becomes overwhelming if we screen widely for a disease, even given outstanding laboratory tests. If we try to go out searching for evidence of disease, the resulting data will be garbage.

We can prevent overdiagnosis by testing fewer people. Specifically, we need to test fewer uninfected people, as missing infected people would be counterproductive to the goal. Of course, we don't know who is infected or not, so we must be strategic. Scientists achieve this balancing act by figuring out risk factors and using them to identify smaller subsets of the population more likely to be infected, i.e., high-risk groups. Suppose Cuomo had restricted the tests to people who had been in the vicinity of the outbreak (viral outbreaks are highly geographically localized). In this hypothetical, 70% of the healthy people are not tested, but the infected people still are. A person who tests positive then has roughly a 62% chance of being a true-positive. That's a significant improvement in data quality, not to mention the time and money saved.
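The effect of targeting can be sketched the same way. This is a rough illustration under the assumptions already stated in the text: all 83,000 infected people still get tested, only 30% of the uninfected do, and the test is 98% correct in both directions. The function name is hypothetical.

```python
def positive_predictive_value(infected_tested, uninfected_tested,
                              sensitivity, specificity):
    """Fraction of positive results that come from truly infected people."""
    true_pos = infected_tested * sensitivity
    false_pos = uninfected_tested * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


infected = 83_000        # 1% of 8.3 million
uninfected = 8_217_000   # everyone else

everyone = positive_predictive_value(infected, uninfected, 0.98, 0.98)
# Targeted testing: 70% of uninfected people are never tested.
targeted = positive_predictive_value(infected, 0.30 * uninfected, 0.98, 0.98)

print(f"test everyone:       {everyone:.1%} of positives are real")
print(f"test high-risk only: {targeted:.1%} of positives are real")
```

Nothing about the laboratory test changed between the two lines of output. The only change is who was tested.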

Targeting is an elegant solution to a fundamental limit of inference. Screening only high-risk groups is not about laziness or lack of resources; testing everyone would be a bad idea even if we had all the money and time in the world to devote to a single disease. The rallying cry "everyone can get a test" is good politics but bad public health policy. It is an expert's job to know better (and say so loudly) to counter the public's understandable but fear-driven demands.

Testing less is already the gold standard solution to this problem, and we see evidence of this all over medicine. Once ubiquitous, Pap smears and other screening tests for cervical cancer are now recommended at most once every few years due to the resultant overtreatment problems. The PSA test for prostate cancer is now used only sparingly in adults over 50, as its data proved mostly to be noise. Using ECGs to screen for heart disease is not recommended for anyone who is asymptomatic. Mammograms are now targeted to increasingly smaller high-risk groups by age and family history, after the detection of too many non-cancerous abnormalities caused needless scares. Again, this is the tip of the iceberg of a well-known problem.

Data is a tool. We choose how we use our tools. When a metal detector alarms, we do not scream and flee, nor do we charge the person for attempted murder in a court of law; instead, we look for other evidence. Multiple data sources could also inform a diagnosis. In fact, this is how doctors typically diagnose disease. Patients come into the office with symptoms, and the symptoms determine which tests are ordered. Laboratory data is combined with clinical data before making a diagnosis, reducing the amount of error.

Some physicians are, no doubt, still operating in this usual way in their practice. But at least in NYC, the official guidance blasted through commercials and street signage is to do the exact thing that we know from both theory and experience causes overdiagnosis: test without reason or evidence. Worse, the tests are free, as long as the person brings up no other health concerns or symptoms at the appointment. Otherwise, it becomes a medical appointment billed to insurance. This situation is perverse, incentivizing patients not to discuss the exact context that would lead to better medicine.

It is under these unprecedented circumstances that the CDC decided that COVID-19 surveillance cases can be confirmed and reported based on laboratory evidence alone. This standard is a break from best practices that, to be honest, I struggle to understand. Yes, there is an operational need to move fast. Standardization is important for comparison. Using one data set means there are fewer phone calls involved, less code, etc. But using laboratory evidence alone when actively screening for a disease means the data becomes effectively noise. Is that really worth it?

One could argue it is, and some do. One reason we screen for this disease is to prevent new infections. Scientists believe transmission can occur without symptoms. Since we cannot wait for symptoms to confirm a case and still achieve the goal, the situation seems to require overdiagnosis as a means to an end [3]. I have argued that this is a flawed goal. However, if we choose to pursue it, we must also accept the consequences.

Scientists and analysts should not use the COVID-19 testing data to do other research or support any scientific conclusions about the severity or prevalence of this disease. Unfortunately and unsurprisingly, many do. But this data does not accurately estimate the total health impact or spread of COVID-19. Any inference drawn from comparing this data set with historical data would exaggerate this pandemic's severity relative to other illnesses and causes of death. One data set was collected through active screening at an unprecedented scale, and the other was not. One data set required only laboratory tests as evidence, and the other included only cases screened by clinical symptoms and confirmed by laboratory tests. As they say, garbage in, garbage out. We cannot fix the data, so we have to choose not to use the tool.

Lastly, I am concerned that this testing data is being reported to the public directly as news without any context or caveats. Attempts to point out flaws are censored. Some scientific leaders (notably Dr. Fauci) have admitted that they believe it acceptable to lie to the public as a means to an end. Perhaps that is also the reasoning here, to mislead the public to get the desired behavior?

In general, as a leadership strategy, I find there are negative consequences to choosing to lead with lies. Accuracy is a better, more ethical approach. A nuanced message is just as effective. If you got a COVID-19 test as a precaution, not due to an inciting event or symptoms, you can trust a negative more than a positive—act and reason accordingly.

Testing too much can and does make a disease appear more common than it is. The root cause is not biological but epistemological—any data set collected via search will have the same problem. When we screen millions of credit card transactions looking for fraud or billions of network packets looking for evidence of hackers, the same false-positive problem shows up to the tune of billions of dollars of lost revenue and customers. Perhaps it wouldn't feel so jarring to communicate this point to the public if we spoke openly about data's fallibility on these other topics. We should never start with the assumption that any data set is an objective representation of the world. Data can be wrong. No one has a monopoly on truth.

Footnotes

Overdiagnosis: false-negatives and sensitivity of RT-PCR [1]

Clare Gollnick — Mon, 04 Jan 2021 11:00:24 GMT

Welcome to Limits of Inference! The post below is not intended as a self-standing piece. This is some supplementary context in support of a previous article. To get an introduction to the problem of overdiagnosis, check out the original piece here.

Questions? Concerns? Let me know in the comments. Also, subscribe below.

Lab test data is not natively 'positive' or 'negative.' The measured data is collected as a continuous number like 5, 0.001, or 200. The test protocol includes at least one decision threshold to turn this continuous data into something discrete (positive, negative). This hidden inference explains why one cannot just prevent overdiagnosis by fixing the tests.

If you make a test more sensitive by lowering the decision threshold (find more true-positives), it also becomes less specific (more susceptible to false-positives). These two types of error cannot be manipulated independently except by scientific advancement or technological invention. Depending on the intended use case, engineers can design a test to prefer false-negatives or false-positives.
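The trade-off can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the measured signal is modeled as a bell curve centered at 0 for uninfected people and at 3 for infected people, both with a spread of 1. Real assays are not this tidy, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def sensitivity_specificity(threshold, infected_mean=3.0,
                            uninfected_mean=0.0, sd=1.0):
    """A reading above `threshold` is called positive.

    sensitivity: fraction of infected readings above the threshold
    specificity: fraction of uninfected readings at or below it
    """
    sensitivity = 1.0 - normal_cdf(threshold, infected_mean, sd)
    specificity = normal_cdf(threshold, uninfected_mean, sd)
    return sensitivity, specificity


for threshold in (0.5, 1.5, 2.5):
    sens, spec = sensitivity_specificity(threshold)
    print(f"threshold {threshold}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Lowering the threshold raises sensitivity and lowers specificity, and raising it does the opposite. In this toy model, the only way to improve both at once is to pull the two curves further apart, which corresponds to building a better assay.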

Biomedical tests tend to be engineered to avoid false-negatives and prefer false-positives. To some, RT-PCR tests feel like an exception to this heuristic, creating confusion, even among epidemiologists. Many research scientists understand RT-PCR to have low sensitivity (it misses the virus ~10% of the time) and high specificity (false-positive only 5% of the time). Those numbers aren’t actually that low or high relative to other tests, but the test itself is more specific than sensitive.

Given these numbers, many scientists have concluded (first on Twitter then cited by journalists) that there are more false-negatives than false-positives. I was surprised to see that claim since it is counter to all past experience screening for disease. I carefully considered the scientific arguments as I came across them. Most of the mistakes were mathematical (Bayesian reasoning is difficult). But when not that, these data stories tend to miss two crucial points.

First, we care about the sensitivity of the entire testing strategy (the clinical sensitivity), not of one RT-PCR test delivered in isolation (the technical sensitivity). The clinical sensitivity is higher than the technical sensitivity because, in a clinic, we can test people multiple times. If someone is suspected of having COVID-19 due to symptoms or exposure and gets a negative test result, a physician will test again until the patient tests positive. Even given an initial false-negative, the case is likely to get counted eventually as a true-positive in the disease prevalence data. The observed sensitivity of RT-PCR ends up much higher than its sensitivity measured in the lab. At worst, we find the case a day or two later. There is no such correction pathway for false-positives (at least not using CDC guidance).
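Here is a rough sketch of why repeat testing raises clinical sensitivity. It uses the per-test figures quoted earlier (the test misses the virus about 10% of the time and returns a false-positive about 5% of the time) and assumes, unrealistically, that repeat tests on the same person are independent. The function names are my own.

```python
def cumulative_sensitivity(per_test_sensitivity, n_tests):
    """Chance an infected person tests positive at least once in n_tests,
    assuming independent repeat tests."""
    return 1.0 - (1.0 - per_test_sensitivity) ** n_tests


def cumulative_false_positive(per_test_specificity, n_tests):
    """Chance an uninfected person tests positive at least once in n_tests,
    assuming any single positive result is counted as a case."""
    return 1.0 - per_test_specificity ** n_tests


for n in (1, 2, 3):
    caught = cumulative_sensitivity(0.90, n)
    false_alarm = cumulative_false_positive(0.95, n)
    print(f"{n} test(s): infected caught {caught:.1%}, "
          f"uninfected flagged {false_alarm:.1%}")
```

Under these assumptions, misses shrink quickly with each retest, while the chance of a false alarm only grows, since nothing in the process retracts a positive.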

Fun fact, this repeat-test strategy is the same methodology used in the original experiment that measured the sensitivity statistic most often cited for COVID-19 RT-PCR. People known to be exposed to the virus are selected for the study. The experimenter tests this same group of people repeatedly over multiple days and counts how many people who initially test negative go on to test positive in the future. The available RT-PCR performance stats are better understood to be the ratio of the technical sensitivity to clinical sensitivity. 80% of all patients who will ever test positive do so on the first test, but 100% eventually get diagnosed. This is another example of how context matters for the interpretation of data— those who google numbers or statistics and apply them without reading the underlying papers will tend to make poor inferences.

Repeated testing is a reasonable strategy because the signal (the spread of disease) can change with time as people are newly infected. Also, if patients are high-risk, we want to be more certain about the test result. We are not missing that many true cases in practice, at least not due specifically to RT-PCR sensitivity.

The second point that is missed is that the biology behind the test isn’t that important overall. A significant source of false-positives is human or logistical errors. Biological testing errors can only add to the noise, not counteract it. For example, the CDC released thousands of tests in February that gave 100% false positives due to a manufacturing error. This is exactly the type of fluke that people operating in the real world or using the resulting data need to consider when making inferences. We know about that one mistake, but we are rarely lucky enough to know when or why most errors arise. We know enough about human fallibility, in general, to be confident that errors are happening. Still, we do not know precisely when, where, or how much without field testing.

This is a limit of inference. Domain-specific knowledge of infectious disease or RT-PCR is neither the only nor the most important context needed to tell the data story.

Overdiagnosis: technical versus clinical false positive [2]

Clare Gollnick — Mon, 04 Jan 2021 11:00:23 GMT

Defining a false-positive is often the most contentious part of a discussion on overdiagnosis. Is the metal detector there to detect metal? Or is the metal detector there to prevent violent crime? Is it a false-positive if there was metal but no weapon? What if the person had a weapon but didn’t intend to commit a crime?

These debates tend to occur in the world of ‘but, technically.’ Differentiating between a false-positive that did not mean what the patient or user understood it to mean and a true technical false positive can start fights. For example, most pregnancy test guidance says false-positive tests are virtually impossible. However, a woman can test positive when not pregnant, for example, if she were pregnant in the past and had an early miscarriage. A person wondering if there will be a baby in ten months reasonably understands this scenario as a false-positive. The scientists or engineers who designed the test will insist these are not false positives but just a different and valid explanation for a positive pregnancy test other than pregnancy. A version of this argument gets framed as 'clinical significance' vs. 'statistical significance,' which I think is a useful differentiation. But as far as consumer products go, I left science half a decade ago and now build products for real people. I'm on the side of the layperson. You knew why I was taking this test, why I bought your product. I wanted to know if there was going to be a baby in ten months. Pick a better indicator of pregnancy. Design a better test. Take responsibility for your product. Stop blaming the user.

This definition is also contentious within the infectious disease community. For one, a person can be infected with a virus, or the virus can be present in their nose without being sick or contagious, particularly if they are partially immune. The ‘but technically’ crowd will say that it is not a false-positive— the virus was detected accurately. There might be a chance of spread, in the way there might be a chance the sun explodes tomorrow. But to real people living actual lives, if they are not contagious and not sick and would never become sick, they are healthy. I was taking this test to know if it was safe to go to my friend's birthday. It is as safe as life ever can be. That makes it a real false-positive. Stop lying to the user.

Debates about where to draw the line between a clinical false-positive and a technical false positive usually become obnoxious, philosophical, or both. The best answer depends on the use case or inference being drawn from the data. No matter where one draws the line, whether for cancer, COVID-19, or metal detectors, it does not matter to the broader overdiagnosis problem. There only need to be errors of any kind for the problem to recur from screening or exhaustive search. There are infinite potential sources of noise given any standard or definition.

Overdiagnosis: how early detection changes the definition of disease itself [3]

Clare Gollnick — Mon, 04 Jan 2021 11:00:23 GMT

How to identify overdiagnosis of a disease

The graph below demonstrates a common framework, invented in the context of cancers, to identify overdiagnosis. According to this framework, overdiagnosis is the likely explanation anytime an increase of cases (incidence) does not coincide with an increase in death (mortality). Diseases are defined by the impact on health, not the biochemical evidence that corresponds to disease. If we find a real disease, the most important health consequence of the disease (i.e., death) should increase as the incidence does.

Image reproduced from multiple sources (primary source: Oke, J.L., O’Sullivan, J.W., Perera, R. et al. Sci Rep 8, 14663 (2018); also discussed here and here)

Since the number of COVID-19 cases has gone up by an order of magnitude from the spring peak to the winter, and the deaths per day are only now approximately the same as the peak from the original NYC-NJ outbreak, it is straightforward to infer that we are substantially overdiagnosing COVID-19.

Source Data

However, it is worth being careful here since there are other explanations consistent with the data. The early COVID-19 data were collected under a very different context. In March, doctors often used ventilators, a dangerous treatment with substantial side effects of its own, to treat COVID-19. This ineffective and sometimes deadly treatment may have contributed to some of the early excess deaths. We also didn't have any tests back in March and April, so our estimate of the spread of the disease is lower than what we would observe with our testing infrastructure today. Plus, the disease was entirely new, so its geographic reach has almost certainly increased. Given this context, a plausible alternate explanation exists other than overdiagnosis: it could be that COVID-19 was never particularly deadly. This is probably part of the truth. This would mean the overdiagnosis problem is less severe, but also, the pandemic would be a less severe health crisis overall.

Why early detection leads to more overdiagnosis

There is an important parallel between COVID-19 and cancer— early detection is considered of paramount importance for clinical outcomes. Early detection is valuable because modern treatments work better on early-stage cancers. Similarly, early detection of COVID-19 may prevent further spread.

In cancer, the problem with trying to diagnose earlier and earlier is that it leads to a sliding threshold problem: the line between diseased and healthy gets moved so that more and more people are defined as sick. This changed definition of disease is another common source of overdiagnosis and a much more contentious one. For example, some scientists started treating with chemotherapy lesions that were previously considered precancerous and just something to watch. Likewise, scientists started considering asymptomatic infections as cases of the disease, COVID-19, instead of just infections.

It is only practical to change the definition of a disease once we can observe it, so a new test or technology is often a prerequisite for a definition change (to find infections without symptoms or smaller precancerous lesions). If we look back at history, we see evidence of a more troubling dynamic. New technology tends to cause a change in disease definition. A small group of scientists invent a new test looking for an early indicator of existing disease (usually one they also identified as important). In early experimental settings, the new tests find more cases of disease even earlier than before. That is counted as a win since the community values early detection. This leads to an official institution-backed change in the disease's definition to include all occurrences of the early indicator as clinical evidence of disease. More people are diagnosed and treated for the disease. The disease gets more funding because it now is measured to be a bigger health risk. Then, a new scientist claims to have invented a new, better test. The cycle begins again.

This cycle can easily become a game of how many things we can designate as disease, disconcertingly independent of the resulting health outcomes. This was the observation that motivated the invention of the overdiagnosis framework above.

Diseases are human constructs. We observe something, call it a disease, and then treat the disease. Nothing is stopping us from labeling all of human existence a disease. To be clear, this is a controversial claim. Not everyone agrees this is happening in any one circumstance. Instead of staking a strong claim today, I will say this: keep an eye out for situations where, for whatever reason, the definition of the disease itself seems to change to make the point.

The goal is not to find and name all the weird things that happen in the human body but to improve people's lives. Many abnormalities cure themselves if left alone. Not all things people can name or study are real problems. As a principle, we should only be chasing things that make a difference in outcomes that matter.

Source: Zahl, et al. Br J Cancer 2013;109:2014-2019

The best and worst of public health

Clare Gollnick — Fri, 11 Dec 2020 12:00:14 GMT

Welcome to Limits of Inference. I am happy my blog has become a Substack newsletter. This isn’t my full-time gig, so I cannot promise to write on a schedule. Hopefully, that’s ok. I don’t want to crowd your inbox anyway.

Limits of Inference is a newsletter about data generally—and specifically, the way data fails to provide objective truth about the world. I didn’t intend to write about biology when I started this project. However, COVID-19 is dominating my thinking. This topic is a natural one for me since I’ve spent half of my career working in biomedical engineering developing drugs and therapies. My deep dive into problems in biological research spurred my initial interest in the misuse of data. So, the next few articles will be COVID-19 related. Drop me a note in the comments if you want to hear about something else. Also, happy to answer questions.

Childhood vaccinations allow kids to play together without spreading disease. IUDs allow couples to plan their family and still have a fulfilling sex life. Chemotherapy allows cancer patients more time to see their family and friends. NSAIDs allow athletes to manage inflammation to train harder. Even needle exchange programs allow safer drug use by reducing the risk of infection.

Public health and medicine advanced dramatically in the last century. These success stories share a common trait. Public health policies, particularly progressive ones, find a way to say yes to life.

Yet, in 2020, many public health experts have advocated for policies that deny life. We locked down instead of spending time with friends. Our grandparents died alone instead of being surrounded by their loved ones. Passports were made useless instead of providing opportunities to explore the world. Young adults stopped nurturing new relationships that might someday become their chosen family.

Many people believe these sacrifices are necessary to combat this pandemic. Trusted officials have told the public that science says this is true. But in fact, scientific research aims to design new solutions that allow life to continue despite the ever-present risk from disease. It is not true that we have no other options. Even if it were, it is the job of scientists to invent better options if none exist. A lack of alternatives would reflect a failure of scientists, not a consequence of science.

There are different ways for a leader to respond to a risk. An expert can publish weekly death statistics as haunting reminders that even going outside increases skin cancer risk, or an expert could hand out bottles of sunscreen before a 5k and say, enjoy your run.

An educator can teach abstinence and use the negative consequences of pregnancy and STDs to scare teenagers. Alternatively, an educator can teach about the risks while also demonstrating how to use a condom and engage in safe sex.

Scientists had a choice to lead with progressive policy and even be creative with their solutions during this pandemic. Instead, they chose to lead with fear and motivate with shame. I am proud of the history of this field, but this year has been a disaster. It is time to stop the fear-mongering and start being better scientists and better doctors. It is time to find a way to start saying yes to life.

Policy failures often stem from inadequate definitions of success. I think this is a root cause of the pandemic failures in 2020. As I have written before, we’ve focused on the wrong data, wasting resources trying to prevent infections instead of minimizing disease. There is more.

To start, let’s clear this one up. Public health and medical professionals are not in the business of saving lives. The reason is simple: every single one of us is mortal. Every person who has been born will die. If one tries to measure any policy’s success using the metric of “lives saved,” there is only one possible conclusion— we have saved zero lives, ever.

To use ‘lives saved’ as a success metric, a scientist or analyst must cherry-pick the data, limiting the scope of a graph or metric to one cause of death or a period of time. This analytical decision can be deceptive if used to hide the counterfactual. A policy that prevented deaths by one cause necessarily increased the number of deaths by another. A discussion of what happened instead is the context necessary to judge if a policy is effective overall.

The claim is this: lockdowns, social distancing, and travel bans saved lives in 2020 by preventing the spread of COVID-19. This claim is dubious, given that the virus is spreading virtually uncontrolled despite global lockdowns. Let’s assume these policies at least helped toward this goal.

This is what happened instead. Lockdowns, travel bans, social distancing, and the fear-based messaging required to get public acceptance of these interventions disrupted supply chains. The social isolation from shutdowns increased poverty, depression, and obesity, all risk factors for life-long disease. Told to stay home, most people avoided the doctor’s office or hospital altogether.

This is what these second-order consequences look like for us today. The world anxiously awaits vaccines, but Pfizer cannot manufacture their vaccine at a sufficient scale. They do not have enough equipment, such as cold storage. Why? We told the people who make the freezers back in March—stay home to save lives. With long lead times required to manufacture complex equipment, we couldn’t ramp up fast enough to get the COVID-19 vaccine to the population that needs it.

Some nurses and doctors on the front line are at greater risk from infectious disease because they do not have adequate PPE (masks, gloves, gowns). We do not have enough PPE because, again, we told the people who make the PPE —stay home to save lives.

Supply problems were the inevitable consequence of telling people to stop working. Global supply chains are enormous. Millions of people need to work in some capacity or another to manufacture even a single N95 mask end-to-end. Medicines and supplies that doctors use to treat diseases exist because people leave their houses to work together. A functional economy is a boon to public health. Shutting it down has real negative consequences even to health and medicine.

Then, there is the fact that social distancing itself is not good for anyone’s health. In its extreme, solitary confinement is a punishment, even torture. Because it is mentally straining, social distancing encourages terrible habits such as sedentary lifestyles, binge eating, drug use, and excessive drinking. If someone never loses weight, never changes their lifestyle, or never kicks the depression caused by this quarantine, it could mean a lifetime of poor health, including early death (but it won’t be observed in 2020).

Even if we consider diseases that occurred just this year, evidence of the pandemic response policy's negative consequences is ubiquitous. There are hundreds of thousands of missing cancer diagnoses in the US. When diagnosed, the cancer is late-staged and less likely to be treatable. Diseases of poverty rose as travel bans closed hospitals staffed by foreign doctors. There were an estimated 200,000 missing tuberculosis diagnoses due to just the first three months of travel bans this year. Patients in the US are 2.4 times more likely to die from a heart attack in 2020 than in 2019. Organ donations have dwindled this year, endangering the lives of thousands of people waiting for a transplant.

Even deaths now counted as caused by COVID-19 may be caused indirectly by social distancing. Around ten thousand additional Alzheimer's patients have died this year, making dementia patients one of the most at-risk groups for the pandemic. Social distancing lowers the chance that a person gets exposed to SARS-COV-2 (benefit). However, social isolation also speeds up Alzheimer’s and dementia progression (cost). Alzheimer’s and dementia have a one hundred percent fatality rate. When dementia progresses, it increases the risk of death from viral disease. In short, social distancing might decrease the odds of getting COVID-19, but it increases the odds that a dementia patient will die from COVID-19 (and every other disease). It is not possible to disambiguate these effects from just the data on a death certificate. As fewer people died from the flu this year, the numbers appear close enough that we may find someday that more of the excess dementia deaths resulted from social distancing than from the novelty of the virus.

None of these consequences are being considered in the design or execution of pandemic policy today. Cities are opening and closing based on the number of COVID-19 infections alone. Monitoring the spread of COVID-19 is essential, but it is only one of the many health risks we face. This narrow focus on one disease is resulting in troublesome and self-serving storytelling with data from professionals.

Today, the CDC (and consequently many journalists) refer to these deaths as “COVID-related” despite people not dying of that disease. Some analysts count these numbers as deaths from the pandemic itself, usually applying an analytical approach that attributes all excess deaths to the pandemic as a default assumption as if human failures could not possibly contribute.

A reminder of the limits of inference — data tells us that something happened (it tells us someone died), not why (was it the pandemic or the policy?). The why is a story we tell ourselves to make good decisions in the future. The ‘why’ is inference. A better story results in better policy.

It is convenient for the CDC to use the phrase “COVID-19 related” because it sounds to the casual reader that these deaths were the inevitable and unfortunate consequence of the pandemic. An alternative story is that fear-mongering, a single-minded focus on one disease, caused elderly patients to be too afraid to go to the hospital for heart attack symptoms or get preventative care. Telling older people to be frightened or families to stay apart was a policy choice. Hospitals were never full. Therefore, I believe a more accurate data story counts these deaths as costs caused by the CDC and NIH —lockdowns saved some people and killed others. By acknowledging this more nuanced data story, we could have made better decisions.

Some deaths this year were inevitable. We are living through a deadly pandemic, and we are all mortal. I believe many of the second-order consequences of pandemic policy would have been avoidable if we had thought more holistically about defining success and managing risk. Other policy options would have allowed life to continue, in the spirit of sunscreen and safe sex.

We could have told the public to limit social activities to the one or two they value most (and keep it outdoors when possible). This policy would allow people to live well while still flattening the curve and keeping hospital beds open.

We could have told workers (accurately) that it was safe to keep working unless they were in a high-risk group. This choice would have kept supply chains and economies going while still protecting the most vulnerable. We could have even done more work, manufacturing supplies more quickly, to get the vaccines distributed faster. We could have created operations centers and medical transportation teams to make sure individual hospitals were never overwhelmed.

With a message of relative risk, it would have been easy for the public to understand why experts actively encouraged even high-risk groups to get preventive care like annual check-ups and cancer screenings from their doctors. COVID-19 is dangerous, but so is cancer. It is always important to live each day that we have on earth and take care of ourselves.

We could have asked everyone to wear a mask as a low-cost intervention. Like condoms, masks are not fun, but they do not detract much from life. I doubt people would have rebelled as much if it didn’t feel like seeing people’s faces was the last hint of normal social interaction.

We could have compensated nursing home workers well and scheduled their shifts so that they quarantined monthly. That way, the disease would not have entered the facilities where 40% of all deaths have occurred. Instead, we told everyone, including nursing home workers, to quarantine indiscriminately and without any compensation.

These are just a few different policy ideas that reduce risk while allowing more life. Public health should work by looking at the totality of harm to life, not just the graph of one disease over one year. COVID-19 is a risk, but it is not the only one we face.

One would hope that the CDC did the analysis and decided that more people needed to die of heart attacks, cancer, tuberculosis, drug addiction, strokes, kidney failure, and Alzheimer’s disease (…the list continues) to prevent the spread of COVID-19. I don’t think they did. I think these scientists never thought about second-order consequences. They considered the one disease given the data in the one graph. This hypothesis comes from my experience working in the field and watching the professionals leading the response today. That’s the topic for a future post.

The point here is that the story isn't as simple as “stay home to save lives.” Zero lives have been saved this year (or any year). However, billions of years of life have not been lived. Pandemics are real. Risk is never zero. However, the science supporting lockdowns as a response to a pandemic is only obvious to those who cherry-pick the data and deprioritize most of the population's health needs to tell the story.

We should start counting everyone, everyone’s life, and everyone’s health when we define success. It would tell a better story, be more consistent with data, and we would make better decisions.

Understanding Risk From Viral Disease

Clare Gollnick — Fri, 13 Nov 2020 18:51:36 GMT

A century ago, an influenza virus, nicknamed the Spanish flu, killed 50 million people over two years. It was the most deadly viral pandemic in recorded human history. How did this pandemic end?

It hasn’t ended— at least not if we apply the same metrics used in current news reports and government policies to judge success with COVID-19. The Spanish flu got rebranded as a member of the seasonal flu a long time ago. But the virus formerly known as Spanish flu is still causing seasonal outbreaks to this day. People are getting infected and reinfected multiple times in their lives. Individuals in high-risk groups are dying from Spanish influenza today, a century after the pandemic began and decades after a vaccine was available.

No expert would say that the 1918 pandemic was still ongoing. Something strange happened this year. Our leaders changed the rules for COVID-19. Politicians, journalists, even some scientists started to compute risk and quantify harm from this latest viral disease differently than we had for any previously existing virus or any human illness. We began counting infections, not disease, as the metrics of success for a public health policy.

Infections are a risk factor for a viral disease, not the disease itself. We are making a category error. The vast majority of viral infections do not cause disease. Most infections help prevent future illnesses. These new metrics set the acceptable risk level from the viral disease so low it cannot be reached with any modern medical treatments or tools. That's not progress. That’s fear-mongering.

Viruses don’t go away. The numbers we are reporting and basing decisions on right now (infection rates, the daily percentage of positive tests) will never stay low enough to prevent shutdowns for any significant time, even years from now. This coronavirus will be recurring in seasonal waves for the rest of our lives.

Epidemiologists never counted infections as a problem in themselves before 2020. Remember swine flu? It made the news briefly in 2009. That’s the same subtype of influenza (H1N1) that caused the Spanish flu. This novel and deadly virus caused a global pandemic only a decade ago. Yes, you have lived through other deadly global pandemics. Since 2009, a considerable percentage of the global population has been infected with this mutant of influenza, most unaware as it happened. Some of us did not have symptoms at all. A few of us felt tired one night, a subtle sign of fighting off an infection that we brushed off as work stress. Others thought it was a cold or a bad headache, or correctly labeled it the flu but not the new flu type. When young and healthy, we experience many viral infections as minor nondescript annoyances, much like how the vast majority of the public experiences COVID-19 today.

Infection without disease is a common phenomenon in epidemiology. It is easy to find other examples that are less politically fraught than comparing COVID-19 to the flu. Only 1 in 4 people have had the kissing disease, mono, in their lifetime. But 90% of adults have immunity to the EBV virus that causes mono. If you never got mono during your teenage years, it is more likely than not that you had the infection without knowing it. You could have even been the super-spreader in your high school or college. We all infected each other as an age-appropriate social herd.
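
The arithmetic behind the mono example can be spelled out in a few lines of Python. This is only a rough sketch: it uses the two figures quoted above (1 in 4 and 90%) and assumes, for simplicity, that immunity always reflects a past infection and that everyone who had mono is counted among the immune. The function name is made up for illustration.

```python
# Figures quoted in the text; the simplifying assumptions are noted below.
had_mono = 0.25      # share of people who ever had mono (1 in 4)
ebv_immune = 0.90    # share of adults with immunity to EBV

def silent_infection_shares(diseased: float, infected: float) -> tuple[float, float]:
    """Return two shares, assuming immunity always means a past infection
    and every diseased person is also counted as infected:
    1. the share of all infections that never became disease
    2. the share of never-diseased people who were infected anyway
    """
    silent = infected - diseased
    return silent / infected, silent / (1.0 - diseased)

of_infected, of_never_sick = silent_infection_shares(had_mono, ebv_immune)
print(f"Infections that never became mono: {of_infected:.0%}")
print(f"People who never had mono but were infected: {of_never_sick:.0%}")
```

Under these assumptions, roughly seven in ten infections never showed up as mono, and most people who never had mono were infected anyway.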

Measles is another top-of-mind example of infections regularly occurring without causing disease. Measles outbreaks recurred in the past five years. This recurrence was only possible because the measles virus was circulating and causing asymptomatic infections in our population. If the measles had been eradicated, it couldn’t have recurred even due to the anti-vax movement's nonsense.

Viral infections are ubiquitous, and that can be a good thing. Constant endemic minor infections prevent deadly pandemics from novel viruses, like a booster shot for a vaccination program. The swine flu pandemic of 2009 was nothing like the Spanish flu of 1918. Many people, specifically high-risk groups over 60 years old, were partially immune to the virus from childhood infections. If we had prevented these past infections, the swine flu pandemic would have been much more deadly within the high-risk population than it was.

As we age, our immune system is less effective at fighting off infections. We experience more disease, or more severe forms of the disease, with each infection. Perhaps, we feel sick enough to go to the doctor instead of just shaking it off as stress-related exhaustion. That act, the choice to go to the doctor for treatment, is the official difference between a ‘case’ and ‘infection’ in usual public health metrics (see graph below). There are more cases of the flu in high-risk groups in our population, not necessarily more infections. The co-occurrence of the risk factor of age with the risk factor of infection is more likely to result in disease. Of course, age is a risk factor for most diseases. Our knees and backs hurt more from a tennis match at age 45 than when we played at 15. Our bodies ache more from the same infection, too—aging blows.

Remember flattening the curve? This graph is official CDC guidance from 2007. Look at the y-axis. It is based on cases. Cases are people who are in the clinic and using medical resources. This is not a plot intended to consider minor infections or positive RT-PCR tests. The critical idea got entirely lost due to fear-mongering and misinformation in the media.

What about vaccines?

We can minimize harm from COVID-19 without preventing all underlying infections. That’s the only option we have available. Vaccines are not treatments for a viral disease. They use our immune system to prevent an infection from becoming a disease. One needs a functioning immune system for this to work. Low-risk groups historically respond well to vaccines but often had minimal disease risk to start. Vaccines are less effective in high-risk groups for the same reasons these groups are more susceptible to disease in the first place. We don’t have a cure for age.

At the population level, vaccines can eventually lower the number of infections. This reduction occurs through a process known as herd immunity. When the people around you, your social circle or herd, do a better job of fighting an infection in their own body, they are less likely to expose you to the virus in the first place. As we age, we rely even more on our family and friends—on herd immunity—to stay healthy.

The Wisdom to Know the Difference

Viruses never go away. Viruses never stop spreading. It doesn’t make sense to be afraid of SARS-COV-2 or to organize your life to avoid it. You are going to be infected eventually, even if you get a vaccine first. You do not have control.

Let’s address two common fears these days. First, fear that you are killing someone else by allowing yourself to get infected today. Second, fear of unknown consequences of viral infections.

Infections sometimes have long-term health impacts even without disease. I am unaware of examples from past coronaviruses or the common cold. I would bet money any claims of side-effects end up being bad science. But it is not impossible. HPV infections correlate with cervical cancer, and the EBV virus (mono) causes lymphoma and other cancers (~140,000 deaths each year). These risks are real, but about the same scale as thousands of other tiny risks inherent to human life.

As for blame, no one would say you killed your high-school boyfriend by kissing him decades ago, even if he eventually died from EBV+ lymphoma. Trying to optimize for a seventh-order consequence is not a viable way to live life or make risk-based decisions. Things in life are not that predictable, and many events are overdetermined (if it weren't one thing, it’d be another). Public health officials making this argument about COVID-19 should know better. This idea is central to risk management and taught in introductory classes in the field.

If you are healthy right now, the disease risk from a SARS-COV-2 infection is minimal. If you are in a high-risk group, please take extra precautions, not just for COVID-19 but for all infections and high-risk activities.

We are managing two major risk factors right now: the risk from infection and age. We cannot prevent infections. We are all getting older, no matter what we do. I hope one day to be in the high-risk group for respiratory infections. I may even die from COVID-19 or the Spanish flu one day. But that would mean I was lucky enough to live that long. We need to think more about what that means for how we define success.

Science does not solve our problems. People do.

Clare Gollnick — Wed, 28 Oct 2020 21:56:00 GMT

Recent opining over a loss of trust in science and scientific institutions ignores some critical context. The Centers for Disease Control and Prevention (CDC), the National Institutes of Health (NIH), and the World Health Organization (WHO) have made real mistakes in the official interpretations of pandemic data. Each institution has also provided contradictory guidance and engaged in fear-mongering. It is fair to point out that our scientific institutions erred in applying even some of the most fundamental epidemiology theories and ethical principles to the design of our public health response.

In many ways, the Limits of Inference is my plea to fix these problems to benefit future generations. The mistakes are not new, but symptoms of a cultural problem pervasive in scientific institutions. To fix this, we first have to admit there is a problem.

A disclaimer seems necessary, given the political climate. I am not saying WHO or CDC messed up because I do not ‘believe in science.’ I am a scientist. This familiarity allows me to assert with confidence that the pursuit of knowledge remains a political, human, and often flawed process. It is crucial to judge the institutions by the quality of the output, not only the scientific method’s theoretical potential.

The CDC and WHO both erred in the stories they told to explain COVID-19 data. Like everyone else, scientists write stories to make sense of data. If you are not familiar with my ‘stories we tell ourselves’ framework (SWTO), check out an intro here. As a brief reminder, data tells us that something happened, not why. The why is a story that a human — whether it is a scientist, an analyst, a politician, etc. — writes to make sense of it. There is never only one story that can explain a dataset. Instead, the person doing the analysis chooses what they consider the best story and presents that as an argument to inform a specific decision or achieve a particular outcome. Scientists have no secret method to better storytelling, just the luxury of a lifelong focus on a narrow problem space.

I am frustrated with our scientific institutions today due to the repeated insistence that ‘science says lockdowns are necessary.’ My frustration is not only because I disagree with the conclusion (I think we have better options), but because that statement is unscientific on its face. Science cannot say what we should do about any problem—pandemic or otherwise— because we make decisions based on goals, values, and ethics. Science has no system of ethics, nor any knowledge of our goals or the available solutions. Hence science has no opinion on any policy in response to the pandemic or any problem in society. Individual scientists can have opinions, however, and that is the point. Science does not say lockdowns are necessary; some scientists do. Our leading scientific institutions and their leadership do not know the difference between science and their own opinions. This blindspot is a red flag that government leaders and decision-makers should not overlook.

Thomas Malthus has the infamous position of being synonymous with the type of hubris that plagues these modern scientists. In 1798, Malthus analyzed historical trends and observed that human population growth is exponential, while food and resource growth is linear. As technology improved, the food supply increased, but the population also grew as a result. As the population grew faster than the food supply (see the graph), humanity kept finding itself in a trap. Therefore, Malthus concluded there was an upper limit on the earth’s population. If surpassed, a catastrophe such as famine and war must occur.

Malthusian Economics. Image Source: Wikipedia

Then, the industrial revolution began. Food supply outpaced population growth. Malthus’ prediction was made to look ridiculous within only a few years.

Notice the form of the error he makes in his doomsday predictions for the human race: Malthus could not imagine a technology that would increase food production meaningfully, so he concluded that no solution would ever exist. He told this story using both data and mathematical models — It has not happened in the past. I cannot imagine how this would happen; therefore, it cannot occur. Intentional or not, Malthus told a story that assumes all human ingenuity resided in his mind and his mind alone.

One thing to realize about Malthus in the context of the stories framework — Malthus was correct on the science itself. The fundamental relationships Malthus observed about food production and population growth and their respective growth curves were accurate in 1798 and are still fair approximations of the dynamics today. The mistake was that science could not (and still cannot) be used to know the solutions to problems. Science cannot inform us about what is possible, just what is currently happening.
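
For readers who want to see the shape of the argument, here is a minimal Python sketch of the two growth curves Malthus described. The starting values and growth rates are hypothetical, chosen only for illustration, and the function name is invented for this example. It shows why exponential growth eventually overtakes linear growth, and nothing more.

```python
def first_shortfall_year(pop0, growth_rate, food0, food_step, horizon=500):
    """Return the first year in which population exceeds the food supply,
    or None if that never happens within the horizon.

    Population grows exponentially; food supply (measured in the number
    of people it can feed) grows by a fixed amount each year.
    """
    for year in range(horizon + 1):
        population = pop0 * (1 + growth_rate) ** year
        food = food0 + food_step * year
        if population > food:
            return year
    return None

# Hypothetical inputs, not historical data.
baseline = first_shortfall_year(pop0=100, growth_rate=0.02, food0=150, food_step=5)
better_farming = first_shortfall_year(pop0=100, growth_rate=0.02, food0=150, food_step=10)
print("Shortfall year with slow food growth:", baseline)
print("Shortfall year with faster food growth:", better_farming)
```

Doubling the yearly gain in food only postpones the crossover in this sketch. The model has no room for a change in the rules themselves, which is the kind of change Malthus could not imagine.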

Science says catastrophe will inevitably occur. Science says global lockdowns are the only way out. The parallels between Malthus and some of our current scientific leaders are hard to ignore. Even when presented directly with the idea that they could be wrong, our scientific leadership insists — by the authority of The Science, The Data, and The Math— that there are no other viable alternative solutions. Clearly, the sarcastic emphasis and capitalization are mine.

This hubris stems from a pervasive cultural problem within scientific professions that incentivizes scientists to see their own perspective as objectively correct, instead of a story they have constructed to explain some limited data. We have trained an entire generation of professional scientists, and now another new field of data scientists, to believe that data and math matter more to problem-solving than reason, creativity, and strategic thinking.

Scientists, by and large, are not inventors, not engineers, not entrepreneurs. Solutions are not a laboratory scientist’s wheelhouse. Many people can contribute to solving our world’s biggest problems. We need more than just a tiny group of narrow-minded scientists thinking about how. Scientists should be asking for help, not claiming to know everything themselves.

Solutions to our problems do not come from analyzing data—creativity and vision matter. Even scientific research is a creative endeavor when done well. Data is only a tool. Let’s work to understand why this is true more intuitively, preferably before the next global pandemic. More to come.

Defining Success: Zero-risk is not an option

Clare Gollnick — Wed, 28 Oct 2020 20:03:00 GMT

Only if we end the pandemic everywhere can we end the pandemic anywhere. The entire world has the same goal: cases of COVID-19 need to go to zero.
Our World in Data

The above quote is from a site that aggregates data on the COVID-19 pandemic. I have seen similar ideas expressed around the internet. Though well-intentioned, this type of thinking is damaging our global public policy response. Here’s why.

Zero cases of COVID-19 is an unattainable goal. Investing resources in trying to reach an unattainable goal usually causes problematic second-order consequences.

Here is a more realistic expectation. Even after a vaccine exists, we expect this coronavirus to cause seasonal and localized outbreaks for many decades. We should aim to get near this baseline level of risk as quickly as possible, using the least costly measures that get us there.

We cannot eradicate viruses or even effectively prevent them from spreading by changing our actions. Though they vary in severity and prevalence at any given time, nearly all our viral enemies are still around on earth today. Polio still exists. Measles still exists. HPV still exists. Herpes still exists. Influenza still exists. West Nile still exists. Ebola still exists. Chickenpox still exists. HIV still exists. Zika still exists. The list goes on and on.

Smallpox is the only virus that infects humans ever to be eradicated and one of two viruses in total. That is a total of two viruses out of the gazillions of viruses on earth today (and gazillions underestimates the total number of viruses). Smallpox is an exceedingly rare exception to the rule.

Nature, not scientists, determines which viruses spread. There have only been a handful of viruses that have ever met the conditions necessary to attempt eradication. We knew early in 2020 that SARS-CoV-2 was definitely not one.

As this virus is not eradicable, SARS-CoV-2 would still be circulating in our environment no matter what policy actions we had taken in response to its initial spread. Even if the entire world socially distanced for weeks and the number of infections got to zero, it would be a temporary win. Before too long, outbreaks would start again. As long as we remain a naive population (not immune), new infections risk starting the pandemic cycle over again.

Risk is a game of probabilities and expected outcomes. We seek policies that reduce the odds of severe disease (benefits) without unnecessary sacrifices to other aspects of life and health (costs). In other words, we must do a trade-off analysis.

There is nothing to trade if one doesn't first have the ability to change outcomes, i.e., control. We have some amount of control, but less than we think. For example, we had enough resources and the capabilities to minimize disease in isolated, high-risk populations, such as nursing homes. We should have done this. We didn't do this. That's a topic for a different day. Despite our failures, no country ever had enough resources or capacity to prevent all resurgences or all outbreaks of a virus that is this infectious, with so much asymptomatic spread.

Hindsight bias leads commentators and analysts to believe that outbreaks are preventable or controllable. After an outbreak occurs, it is possible to trace the series of events with precision. Each step seems to have a clear cause and effect. Did someone sing? That was the cause. We should ban singing. An analyst cannot observe how many other environments with identical initial conditions (i.e., situations where people sang) did not result in an outbreak. This is critical context if one wants to design effective policy. If we ban every action that has ever led to a person spreading disease (hugging, buying food, saying hi, exercising, treating a patient, etc.), there would be nothing left in our lives.
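
To make the missing denominator concrete, here is a minimal Python sketch. All counts are hypothetical placeholders, not data, and the function names are invented for this example. The point is only that tracing outbreaks backwards gives a very different number than counting forwards from every occasion on which the activity took place.

```python
def share_of_outbreaks_traced_to(activity_outbreaks: int, total_outbreaks: int) -> float:
    """Backward-looking view: of the outbreaks traced, how many involved the activity?"""
    return activity_outbreaks / total_outbreaks

def risk_per_occasion(activity_outbreaks: int, activity_occasions: int) -> float:
    """Forward-looking view: of all occasions of the activity, how many led to an outbreak?"""
    return activity_outbreaks / activity_occasions

# Hypothetical numbers, chosen only for illustration.
traced_outbreaks = 10        # outbreaks investigated after the fact
traced_to_singing = 4        # of those, the ones where someone sang
singing_occasions = 20_000   # every occasion where people sang (normally unobserved)

hindsight_view = share_of_outbreaks_traced_to(traced_to_singing, traced_outbreaks)
forward_view = risk_per_occasion(traced_to_singing, singing_occasions)

print(f"Share of traced outbreaks involving singing: {hindsight_view:.0%}")
print(f"Share of singing occasions that led to an outbreak: {forward_view:.2%}")
```

With these made-up counts, singing shows up in 40% of traced outbreaks, yet only 0.02% of singing occasions led to one. The second figure is what a policy needs, and it is the one the analyst usually cannot observe.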

Not everything is predictable. Such is the nature of chance. Data nerds like myself will be the first to point out average trends and interesting correlations. The worst outbreaks spread in crowded neighborhoods, the worst outcomes occur in groups with poor healthcare, and most outbreaks happen near international transportation hubs. But these correlations will never be sufficient to be used as a decision threshold. They are real correlations but neither necessary nor sufficient to design policy. They are not predictive.

Hindsight bias leads us to believe we have control. This leads to unrealistic expectations and definitions of success. When we overestimate the scale of possible benefits, strategic leaders will reject solutions that offer effective and efficient risk mitigation and instead choose to pursue an impossible goal. Trade-off analysis breaks down when you do not correctly define success.

Economists, CEOs, governors, and prime ministers are working from the assumption that long-term gains will outweigh the short-term sacrifices. But those gains will not materialize, not in the ways many are imagining. Whatever the lower bound is on COVID-19 prevalence, it is non-zero. Risk will still exist. Factors outside our control set the lower bound, i.e., how fast the virus mutates, how strong our immunity remains, and the remote possibility of future technological invention. When will scientists figure out how to cure a viral disease? No one has any idea.

The graph below is a conceptual representation of available trade-offs. The shape of the curve may look familiar. It’s just a depiction of diminishing returns. If it were possible to stop COVID-19 forever, the benefits would keep accumulating even as we adopted increasingly costly preventative measures. But since we cannot eliminate the disease, we end up in a plateau region where we make marginal improvements to the COVID-19 pandemic while incurring massive costs.

Diminishing Returns from Increasingly Costly Public Health Policies. For Illustrative Purposes. Not Based on Models or Data. Image by Author.
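
The same idea can be written as a few lines of Python. Like the figure, this is illustrative only and not based on models or data. It assumes a saturating exponential as one convenient shape for a diminishing-returns curve, and the `ceiling` and `scale` parameters are hypothetical knobs rather than estimates.

```python
import math

def benefit(cost: float, ceiling: float = 1.0, scale: float = 1.0) -> float:
    """Cumulative benefit of spending `cost`, approaching `ceiling` but never reaching it."""
    return ceiling * (1.0 - math.exp(-cost / scale))

def marginal_benefit(cost: float, step: float = 1.0) -> float:
    """Extra benefit gained by spending one more `step` on top of `cost`."""
    return benefit(cost + step) - benefit(cost)

# Each additional unit of cost buys less than the one before it.
for cost in range(6):
    print(f"cost={cost}  total benefit={benefit(cost):.3f}  next unit adds={marginal_benefit(cost):.3f}")
```

In this sketch the first unit of cost buys most of the attainable benefit, and later units buy almost nothing. That later stretch is the plateau region described above.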

Some believe that we should do everything in our power to save even one life. That's wrong for several reasons. To start, it is not within the set of solutions available. The techniques we use to mitigate damage from infectious disease — quarantines, shutdowns, social distancing— cause harm too. Often this is presented as a false choice, pretending as if our choice is to accept increasing economic damage in exchange for public health gains. The reality is that shutdowns and social distancing create a public health crisis of their own, causing deaths and increasing risk in different ways than the problem they seek to solve. Even if we choose to go all-in on COVID-19, zero health risk is not an option.

We should focus on where we have control: minimizing the extent of the damage done on the path to herd immunity. Instead of obsessing over which states currently have outbreaks (which is unavoidable for any highly infectious disease), policy thinkers should be using data to inform decision-makers about what is ‘as good as it gets’ when it comes to controlling infectious disease. Experts in their fields can then make informed trade-offs, ideally implementing strategies that keep us in the cost-effective part of the diminishing returns curve. Until we do this thinking, we are likely to be engaging in a type of self-flagellation.

Beyond ineffective policy, the mistaken belief that we could eliminate this virus’s spread is also sowing social unrest and preventing economic recovery. It is preventing individuals from doing the necessary risk assessment and personal trade-off analysis in their own lives. Disease is part of human existence. We will never be at zero risk, not from COVID-19, not from cancer, not from the flu. Yet, time moves on, and our lives progress.

Fear tends to dominate our thinking. As policies everywhere continue to fail to meet impossible expectations, people get increasingly scared. The public starts to look for someone to blame. They have been told that outbreaks are preventable, so they see every bump in the graph as caused by personal and individual failures within their community. The fear-driven desire to see anyone, even friends or family, as the root cause of the continued pandemic leads to angry outbursts. Here in Brooklyn, I have seen strangers yelling at each other in the streets about perceived threats to their safety. This fear and anger will make recovering from this natural disaster much more painful than it needed to be.

Blame and shame have no place in a public health response. It wasn't appropriate to blame the gay community for AIDS, and it certainly does not make sense to blame our friends and family (or college students or beach-goers) for COVID-19. The idea that one person’s actions could set off a chain reaction that might somewhere down the line cause another type of harm is another example of hindsight bias. It assumes perfect knowledge of the consequences of a series of random events, i.e., ‘the butterfly effect.’

Chaos theory is a weak starting place for the design of effective public health policy. This sort of thinking will make us end up like Chidi in the TV show ‘The Good Place.’ Chidi was so afraid that every action he took could eventually cause harm (even something small like buying a blueberry muffin) that he ended up paralyzed by indecision. Chidi sacrificed everything and accomplished nothing, good or bad. To get out of this negative cycle, we have to start being pragmatic: differentiating what is possible from what is likely, what is within our control and what is not.

The risk of infectious viral disease is not something we eliminate or avoid. This type of risk is something we accept and mitigate. We weigh different types of harm and consider a variety of stakeholders. We acknowledge that individual people measure their lives differently and have competing interests. Individuals cannot be held responsible for second- or fifth-order consequences of the day-to-day choices they need to make to live. We need effective strategies that mitigate harm in proportion to the costs incurred. In public health, ethical value systems often clash; autonomy may support different decisions than utilitarianism. But this is not the first time our ethical frameworks have come into conflict with each other. With this risk mindset, we can start discussing which measures have the greatest return on investment.

As Harvard Medical School epidemiologist Julia Marcus has written about beautifully, risk is not binary. Zero risk is an imaginary goal. Our resources are not unlimited. Our lives are not infinite. Let's keep our policy solutions in the part of the diminishing returns curve where the cost-benefit analysis makes sense, and build policies that take a holistic perspective on human health.

What is a probabilistic value proposition?

Clare Gollnick — Tue, 22 Sep 2020 20:09:00 GMT

Imagine if Gmail delivered 95% of emails to the correct person’s inbox and sent the other 5% to a different person’s inbox. Or, imagine if Microsoft Word saved 4 out of every 5 documents successfully but otherwise encoded the file incorrectly, rendering the document unreadable.

A product that only works some of the time — this would be crazy and frustrating for users. No one would buy a product like this. Unless…

Imagine if only 1 out of every 30 shows Netflix recommended was one that you actually wanted to watch, and most were shows you’ve already seen. Or imagine if Amazon’s Alexa only understood 8 out of 10 things you asked of her, often just responding, “I’m sorry, I don’t understand the question.”

A product that only works some of the time — not as crazy in these contexts. What’s different? The last two examples are product features designed to have a probabilistic value proposition.

The hallmark of a probabilistic value proposition is that the benefit a customer expects from the product may or may not occur during any particular interaction. Instead, value is delivered as an aggregate over multiple interactions: one in five, nine in ten, one in a million. These interactions may occur across periods of time, across diverse contexts, given extreme or rare circumstances, or even across different groups of customers.

Products that deliver probabilistic value are rare compared to more traditional products like t-shirts, canned food, and televisions. However, they exist both inside and outside of computing. For example, a seat belt has a probabilistic value proposition. We wear a seat belt during every car ride but experience life-saving benefits only if we get in an accident (ideally never).

Artificial intelligence and machine learning are based on statistics, an entire discipline built on the concept of being right probabilistically — often enough to be useful, but not always. By their nature, all products using artificial intelligence and machine learning have a probabilistic value proposition. Understanding the implications is, therefore, especially important for founders or practitioners in this space.

All artificial intelligence and machine learning products have a probabilistic value proposition. (Image by author)

A probabilistic value proposition means thinking about all possible customer experiences

When we design a product with a deterministic value proposition (the normal kind like Google’s email server and Microsoft Word’s save-to-disk), we focus on maximizing our product's benefits by building new features and functionality. We define these benefits relative to the world where a customer is not using our product or where a customer is using a competing product. Therefore, the value proposition is the sum of all the benefits our product provides relative to those worlds.

When we design a product with a probabilistic value proposition, we still focus on maximizing the benefits of using our product. But, because the benefits do not always occur, we must also consider any costs associated with neutral or bad outcomes that a customer may experience trying to use our product. We need to design two user experiences: one where the benefit is provided and one much less glamorous user experience where our product does nothing or gets the answer wrong. The value proposition of a probabilistic product is the sum of all the benefits of using the product minus the costs associated with using the product when it has bad or neutral outcomes.
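One way to make this accounting concrete is a small expected-value calculation. The sketch below is illustrative only: the `Outcome` class, the `expected_value` function, and every probability and value in it are assumptions invented for the example, not measurements from any real product.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    name: str
    probability: float  # chance of this outcome in a single interaction
    value: float        # benefit (positive) or cost (negative) to the customer

def expected_value(outcomes):
    """Average value per interaction: benefits minus the costs of bad or neutral outcomes."""
    total_probability = sum(o.probability for o in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.value for o in outcomes)

# Hypothetical spam filter where a lost email is very painful for the customer.
no_recovery = [
    Outcome("spam filtered correctly", 0.95, 1.0),
    Outcome("spam slips through", 0.04, 0.0),
    Outcome("real email lost to the spam folder", 0.01, -20.0),
]

# Same hypothetical filter, but a mistake is cheap to undo.
with_recovery = [
    Outcome("spam filtered correctly", 0.95, 1.0),
    Outcome("spam slips through", 0.04, 0.0),
    Outcome("real email restored from the spam folder", 0.01, -5.0),
]

print(f"without recovery: {expected_value(no_recovery):.2f}")
print(f"with recovery:    {expected_value(with_recovery):.2f}")
```

The two scenarios have identical accuracy. The only difference is the cost of the bad outcome, which is why the second, less glamorous user experience deserves as much design attention as the first.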

Seat belts provide the benefit of safety in the event of a car accident and must be comfortable enough to use the other 99.99% of the time. Gmail can use AI to filter spam from our inboxes and must also provide the functionality to restore real emails to the inbox when the data-driven machine inevitably makes a mistake.

No one likes to imagine bad outcomes from using their product, but probabilistic value propositions require that type of thinking. At some level, the benefits must outweigh the costs. How that balance is achieved can have strategic ramifications up to, and including, the business model.

When the benefits of using your product clearly outweigh the costs, in a way your customer can understand, your team no longer needs to treat the probabilistic nature of your product as a bug to be squashed. AI products built with well-balanced probabilistic value propositions are noticeably easier to operationalize, scale, monitor, and troubleshoot.

Theory over Data

Clare Gollnick — Wed, 09 Sep 2020 20:22:00 GMT

We know how this pandemic will end.

We did not know precisely when it would start. It took a moment to realize it was happening. Yet, we knew how this was going to end a year ago, two years ago, even a decade ago.

Nothing novel published this year in Nature or Science (or posted to bioRxiv) will be relevant to how this ends. New data, particularly any data that suggests something different or unexpected, is much more likely to be wrong than right.

The unit that matters for scientific predictions is not data but theory. Scientific theories are famously powerful because they generalize. Scientific theories work even without specific knowledge of the circumstances.

The day-to-day work of a research scientist requires laser-like focus on the bleeding edge of knowledge and is measured by the speed of publication, so it is easy to understand why this community has developed an obsession with novelty.

It is a problem when those habits bleed over to the management of a public health crisis. Knowledge is not generated study by study, week by week, or day by day. Scientific knowledge grows decade by decade or even century by century.

A dismissiveness of old theories and obsession with new data has led to several bad decisions during this global pandemic. The cycle goes like this: forget what was known from theory, delay essential decisions out of uncertainty, act on erroneous conclusions from brand new studies, then eventually figure out that the original theory was right all along.

Face Masks

During the 1918 influenza pandemic, germ theory was brand new and mostly undeveloped. No one had heard the word virus or knew what distinguished viruses from bacteria or other pathogens. But scientists had figured out that the public wearing face masks effectively slowed the spread of influenza.

One hundred years later, during the COVID-19 pandemic, modern scientific leaders — CDC, WHO, NIH — did not think there was enough evidence that masks would slow the spread of an infectious respiratory virus. This decision turned out to be a mistake.

The breadth of evidence available at the time makes this a tough one to understand. For one, other countries have used face masks to control epidemics in recent years. Also, doctors and scientists use face masks every day to prevent exposure to viruses in hospitals and labs across the country. Evidence that masks were useful in preventing exposure to infectious disease was, literally, in front of our faces.

Yet, no new study specifically demonstrated that masks worked for this new coronavirus, so the experts were unsure. Excuses for the uncertainty abound. Masks are unhelpful or even a placebo. It was not clear if asymptomatic spread was a factor. People might use them imperfectly. It would give a false sense of security.

It is worth noting: none of these concerns are relevant in a crisis. These are optimizations. There was nothing yet in place to optimize. 10% effective is better than 0% effective. Some insist it wasn’t a matter of effectiveness but supply. If supply was the concern, then the right strategy was to ask the public to put a t-shirt in front of their face, a bandana, or a shield. Nearly any physical barrier would help. There was a shortage of pink yarn from the sheer number of knit hats within very recent memory. There is little doubt that an effort of that scale or bigger would have mobilized in response to a global pandemic.

Today we are using masks to control the coronavirus, exactly as the public did for the flu in 1919 and as China has for the past decade. The experts eventually caught up to where the thinking was a hundred years ago. The information needed to make an effective decision was available long before this pandemic began. There just wasn’t a study or data to prove it, and the experts were too caught up in the details and edge cases to accept imperfect solutions and think outside the box.

Disease Severity

Theory also predicts that the next global pandemic would most likely be of “flu-like severity.” This has turned out to be more or less the case, probably less deadly than the worst flu strains in history. COVID-19 is solidly within the range of variability of influenzas that have evolved in the past.

Yet, the scientific community spent months in the middle of this crisis debating the numbers after the decimal place, and how to correct the data to get the exact right number. This confusion took up space in journals and the media, and many concluded that this virus was worse in every way than the flu. Many people, including some epidemiologists, still believe this disease to be meaningfully more severe and more infectious than the flu.

There is a lot of domain knowledge packed into that phrase “flu-like severity,” so it is worth introducing the theory itself.

The Theory of Severity and Spread

A virus is not alive and cannot reproduce by itself. It needs a living host. This fact is what most differentiates a virus from bacteria. When a virus uses the cells of another organism to replicate, we say the host is infected. Infection is not the same as disease. A disease is defined by the symptoms or suffering inflicted on the host organism due to a viral infection. Only a tiny fraction of viruses cause disease.

The dependence on other organisms means the pace and path of a viral outbreak change with the host's social behavior. The host behaviors often matter more than the biology of the virus, particularly when controlled for the method of transmission (airborne, mosquitoes, physical contact, etc.).

Viruses are more infectious when they spread without disease, i.e., asymptomatically. Healthy hosts move around and interact more socially than diseased hosts, giving the virus more chances to spread. The most viral viruses, the most infectious viruses, are ones that the public has never heard about. These are infecting humans all the time, causing numerous non-deadly epidemics. The goal of public health is to minimize disease, not eliminate viruses from the earth. Infection is not disease, so we do not consider it harmful.

When a viral infection reliably causes a very deadly disease, one where an infected individual is likely to become sick or die, the number of infected hosts will be much lower. Humans distance ourselves from our social groups when sick, so infections that cause suffering or disease have fewer chances to spread. Examples of viruses that cause these deadly diseases are Ebola, MERS, and smallpox. Epidemics of these types spread more slowly and are more likely to stay geographically localized (though globalization has weakened this effect).

The third class of viruses causes the deadly global pandemics of history. These pandemics are rare because it requires a strange set of circumstances to achieve both fast viral spread (infection) and impact on health (disease). It requires a disease that is both severe enough to kill and also minor enough that it does not impact the social behavior of the host. There are two ways viruses have been able to pull this off. One is when hosts are contagious and asymptomatic for long periods before a deadly disease develops. This is known as delayed disease onset (HIV-like). The more traveled path is one where different hosts have disparate outcomes: there is unequal risk within the host population. Most hosts survive asymptomatically or with minor symptoms. However, a small percentage of hosts experience severe disease and even death. This is what is meant by a disease of ‘flu-like severity.’

The risk profiles of flu-like illnesses are predictable from an understanding of the mechanisms. Viruses can only mutate so much and still be infectious. Hosts within the same species are by far more biochemically similar than they are different. So flu-like viruses exploit other systemic health issues, impacting the same high-risk groups that suffer the most consequences from all diseases (not just viruses): the young, the old, the sick, the immunocompromised, or anyone in environmental conditions that increase the odds of secondary infections (like wartime conditions or poverty).

As the human population is large, even a minor disease can result in a heartbreaking death toll if it can effectively reach any of these high-risk populations. The result is a counterintuitive situation where the virus that has caused the most deaths and the worst pandemics in history is also a common nuisance that people recover from numerous times in life.

In summary, when humans infect each other, disease severity (risk to the average person) and its ability to spread (infectiousness) are inversely correlated. One goes up, and the other goes down.

Application of Theories in Practice

Theories like this are powerful because it is possible to know a lot from little observation. If one knew absolutely nothing about a virus and had to take a guess, the default guess is that the virus is harmless. There are many more harmless viruses than harmful ones. As soon as this virus was observed in China, we knew it was a deadly virus (people were dying quickly). There are more Ebola-like deadly viruses than flu-like, so the best guess at this stage was that it was another coronavirus like SARS or MERS. Shortly after that, the new coronavirus was observed in multiple countries. Over the next week, as the virus spread and cases soared, it was increasingly clear that this was a disease of flu-like severity. It is hard to imagine a virus spreading widely to multiple continents simultaneously without healthy hosts going about their lives as usual.

Inference from this theory required no numbers, no statistics, just a few observations. This is textbook stuff, so it is promising for the scientific method that things have more or less played out as predicted.

Scientists are obsessed with details. Some will fairly criticize my summary above and point out that aspects of this theory are wrong. The amount of time a person spends asymptomatic varies between viruses. The biochemistry of the virus also plays a role. All of these things are correct. My summary is a generalization.

Obsessing over edge cases is missing the point. When leaders make strategic decisions, the details do not matter much because the solutions available are similarly broad. The most important thing is not to end up fighting the wrong battle altogether.

Asymptomatic Spread

Like face masks and disease severity, asymptomatic spread received months of discussion and media coverage. The experts (WHO) repeatedly cautioned that asymptomatic spread had not been observed to be a factor early in the pandemic.

Asymptomatic means the absence of any visible symptoms or disease. It is challenging to observe asymptomatic spread. By its nature, it often goes undetected. It would require testing of seemingly healthy individuals. Diagnostic tests will be of limited supply early in any pandemic, not necessarily available for researchers to use to test a hypothesis and publish a study.

Theory predicts that asymptomatic spread is a major contributor to a pandemic's speed with human-to-human transmission, and SARS-CoV-2 was moving fast. Time is of the essence, and different decisions are required if the contagious are among the healthy. For example, one would be telling everyone (not just visibly sick people) to wear a mask if the goal is to flatten the curve. Asymptomatic spread also changes the likelihood of success of containment measures, how long the measures would need to last to be effective, and the severity of the human and economic costs incurred from preventative measures.

It has now been shown via a few different studies that asymptomatic spread is critically important to the infectiousness of SARS-CoV-2. It is great to have confirmation, but we could have known that this was likely from theory alone. There are good odds of being right using basic reasoning, plus the added benefit of being empowered to act quickly in an evolving crisis.

Nothing New Under the Sun

Good theories are many studies in one small package. To conclude that severity and infectiousness are inversely correlated took hundreds of studies (many of whose top-level conclusions were wrong), including biological experiments, historical analyses of pandemic events, statistical models, and field experiments on ongoing epidemics. This theory led to the development of techniques that allow containment of Ebola-like viruses, including quarantines and contact tracing. A theory with a track record of helpful predictions is the best evidence the scientific method can provide. No theory is perfect, but tried-and-true theory is much less likely to be wrong than new data collected mid-crisis.

If a new study reports a finding that conflicts with existing theory, that’s one vote yes and hundreds of other votes no. It is better to trust theory until there is more evidence that falsifies it.
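The "one vote against hundreds" intuition can be sketched with Bayes' rule. This is a rough illustration, not a formal meta-analysis: the function name `posterior_theory_wrong` and all three input probabilities are hypothetical numbers chosen only to show the shape of the argument.

```python
def posterior_theory_wrong(prior_wrong, p_conflict_if_wrong, p_conflict_if_right):
    """Bayes' rule: probability the theory is wrong, given one study that conflicts with it."""
    evidence_wrong = prior_wrong * p_conflict_if_wrong
    evidence_right = (1.0 - prior_wrong) * p_conflict_if_right
    return evidence_wrong / (evidence_wrong + evidence_right)

# All three inputs are assumptions made up for illustration.
p = posterior_theory_wrong(
    prior_wrong=0.01,          # a theory with a long track record is rarely wrong
    p_conflict_if_wrong=0.80,  # chance a study detects a conflict that is real
    p_conflict_if_right=0.05,  # chance a study reports a conflict that is spurious
)
print(f"P(theory is wrong | one conflicting study) = {p:.2f}")
```

With these made-up inputs, the theory remains far more likely to be right than wrong after a single conflicting study. It would take repeated, independent conflicts to shift the balance.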

In a pinch, a useful heuristic to judge a new study is to look for a plausible mechanism for the claim. Never trust any number, effect size, or statistical significance metric. The data matters much less than the quality of the idea.

Rare and strange symptoms of disease seem to appear only in specific age groups? No. Random correlations with obscure treatments or blood types? No. They are statistical results and are more likely spurious than real. Data lies. Theory has had time to be corrected.

The Trump Factor, The Noise Factor

Many experts know to use theory first. Accurate inferences are everywhere if one knows where to look for them. But, there were not enough voices to counter the scale of the noise.

It does seem there is truth to the idea that there was a Trumpian takeover of expertise. The first spokespeople for COVID-19 at the CDC appeared to be working from this theoretical framework (though wrong on face masks). In February and early March, these scientists made each of the observations outlined above and then presented predictions consistent with theory. The message was this: the disease had spread globally, almost entirely undetected. Therefore it is clearly very infectious. Many Americans will soon be exposed to the novel coronavirus if they are not already. Most Americans were not at any substantial risk from the disease. The epidemiologists then insisted that focus and resources stay on protecting high-risk groups.

Then, this signal stopped entirely. All CDC telebriefings on COVID-19 stopped for around two months. Things appear to have gotten weird. According to reporting at the time, Trump was upset that the CDC's message was that this new virus would infect many people. The details are murky, but the CDC did go silent. Since VP Mike Pence took over, the early thought leaders and experts are nowhere to be heard, though that could be due to burnout during a crisis.

This two-month gap in information and messaging from a critical government institution left a void. That void was filled by the loudest voices, not the most informed. The original predictions were soon buried under the weight of scientists and data scientists analyzing the data just out of China and Italy. These scholars concluded that this disease was terrifying in an unknown and never before seen way. At least, this is what appeared to happen in news sources on the progressive and center-left.

The numbers, when viewed exactly as reported, did look terrifying. They suggested a virus that is both more infectious and more deadly than the flu. That would be unheard of, an entirely unprecedented event. Nothing like that has occurred before. Worse than the deadliest virus in history.

Claiming something is entirely new is good salesmanship, but these claims are warnings to any scientist, even those coming from different fields. What is more likely: that this is something wholly different, something entirely out of step with what has happened before (i.e., all of human history), or that these numbers are wrong?

Those familiar with the fundamental theories in epidemiology would have had even more reason to be skeptical of these ‘hot-off-the-press’ data-driven claims. If the disease was meaningfully different than the flu in severity, it was either more infectious or more deadly. But it was not both.

The evidence that the virus was quickly spreading was compelling. The evidence that COVID-19 was deadly outside of the usual high-risk groups was less compelling. The worst cases are observed first because they are easier to find. The early data collected in an epidemic is expected to be skewed by this sort of observer bias. Everything we observed was largely consistent with theory. There was no reason to think something entirely new was happening. Too few were willing or able to think past the data, past the numbers on a page.

The problem with misinformation is that it results in bad decisions

Fear is a powerful motivator. People are afraid of novel viruses for understandable reasons. Conclusions that appear to justify that fear are emotionally validating. These conclusions will overpower more well-reasoned ideas. But that is only part of what went wrong.

The flu is the most deadly virus in human history. A novel virus of flu-like severity, met by a population with no immunity, meant the world was (and is) facing a severe pandemic, among the worst viral pandemics we have seen in history. We had many reasons to fear this virus. They just were not the reasons that were used to make policy decisions. That is what makes this type of inferential mistake such an enormous problem, not a minor annoyance. It compounds the damages from an already very dire situation, making it even worse than it needed to be.

Given the understanding from the theory that any disease capable of causing this type of global pandemic would be approximately flu-like in severity, one would have known that 1) masks were important, 2) early claims that the disease was deadly in healthy adults were unlikely to be accurate, 3) asymptomatic spread was a factor, and 4) the risk from disease was going to be dramatically different between the highest-risk and lowest-risk populations. Critically, one would have also had to come up with an answer to the obvious question:

If this disease is like the flu, why are we locking down today when we don’t lock down for the flu?

Obvious questions that stem from theory deserve to have thoughtful answers. Good questions expose flaws in our reasoning or thought processes. Many experts believed this disease was an existential risk, one different than the flu, so they never tried to come up with an answer to this obvious question. Thinking through this in the context of the theory and available resources would have been a critical thought exercise that shaped a better policy response. Why don’t we just contain the flu? Perhaps there is a good reason for the past hundred years of decisions that have been made?

Theory on How this Pandemic Ends

Absent treatments or cures, our public health tools for infectious diseases are designed to amplify natural biological defense mechanisms. We do what nature does already, just more of it and better.

Quarantine, contact tracing, and similar measures were tools designed to work on Ebola-like viruses. Specifically, they are designed to speed up the natural mechanism of containment that occurs when every host in a population is at risk. Humans naturally remove ourselves from social groups when sick. Our species may have evolved the biochemical mechanisms that cause certain symptoms for this purpose. Evolutionarily, it helps make sure a person knows when they are ill and does not infect others.

Social distancing is not the natural defense mechanism for viral diseases that are flu-like. It does not work well when there is a large amount of asymptomatic spread. Evolutionarily speaking, it would not be in our best interest to quarantine or social distance even when not sick. Humans being social is essential to our survival. If we always socially distanced just in case, humans would have gone extinct. Life is what happens when we are not sick. Perhaps there is something to learn somewhere in there about the costs of blunt social distancing approaches.

Nature has a different defense mechanism for flu-like pandemics: population immunity. Though relevant to all viruses and pathogens at the individual level, immunity at the population level is the dominant mitigation method for ending the pandemic phase of a flu-like virus. It is a solution that reflects the reality that there is disparate risk within the population. Infection is not disease. Asymptomatic infection is not a disease. But asymptomatic infection does usually result in some amount of immunity. Population immunity lets the strong protect the weak.

The same feedback loops that cause outbreaks work in reverse to protect us once immunity exists. If even one member of a family gets sick during an initial infection, everyone is potentially exposed. In reverse, if even one member of the family has become immune, it partially protects others in a family unit. This health benefit of immunity comes long before the 70% threshold that has been unfairly maligned and entirely misunderstood throughout this pandemic.

While our most successful public health tools work to amplify nature’s defense mechanisms, the choice to use lockdowns and shutdowns in 2020 does the exact opposite. When the infection doesn’t spread even in low-risk populations, everyone is still susceptible. The moment people try to live normally again, the virus will start anew. Time passes, but the end is still equally far away. Substantial suffering and costs have accumulated in the meantime. Many people have felt dismayed at this realization, believing that social distancing helped this problem go away. That is not the case. Social distancing does not reduce total harm. It only delays it, spreading it out over an extended period (flattening the curve meant getting infected later, not never). We cannot just run or hide from a virus. Nature determines the rules. We just play the game.

Vaccines amplify population immunity by intentionally exposing healthy members of the population to a virus so that they develop immunity faster. But vaccines have never been used successfully to slow a pandemic. Pandemic phases of flu-like diseases end in a little less than two years on their own. We have not been able to scale a vaccination-based response in that amount of time in the past (though I believe someday this may be possible). Given that this is the first experiment of its kind, there is substantial engineering and logistical uncertainty about whether this will work. But there is minimal scientific uncertainty about the need for increased immunity. Whether it is by vaccine or asymptomatic spread, this pandemic ends with population immunity.

Of course, only one of those options is available to us today to minimize harm. The vaccine? Who knows.

Re-learning the same lesson

Scientists and public health officials will look back soon and realize it was possible to achieve equivalent (or much better) public health outcomes at a much lower cost to the economy and human life. The mistake was not that the shutdowns were too soft or short, but that they were fighting the wrong battle. What we know works is amplifying the natural mechanisms. Choosing to try to counter natural tendencies was a mistake.

There is a time and place for different strategies, and we should keep trying new things to get better at our pandemic responses. It will always be easier when we accurately assess the nature of the virus. The information needed to make good public health decisions was available, just not in the places that scientists chose to look.

Close the RSS feeds, ignore Twitter. Open a textbook, start there. Also, we would all be better off if journalists stopped reporting anything from bioRxiv.

Debunking the myth of purely ‘data-driven’ decisions

Clare Gollnick — Wed, 27 May 2020 21:20:00 GMT

It is impossible to reason from data alone.

We always make assumptions and draw from what we already believe to be true.

This should not be blasphemous. As a data scientist, I do not consider this a particularly negative or pessimistic viewpoint. It’s just a realistic assessment of what data can actually do for us as a tool.

Let’s try to quickly build an intuition about why this is true.

Take a moment to examine the shape data above. Ask yourself, what shape will come next? What color will it be? How confident are you in this inference?

. . .
. . .
. . .
. . .

Done thinking? I’d love to hear what you thought. Unfortunately, blogs are a one-sided conversation, so you’ll have to settle for my inner monologue.

I see a set of shapes, or rather a set of two shapes, one inscribed in the other. There are broad patterns I could pull out, such as triangles tending to have inscribed white circles (ignore the exception). I could do summary stats to predict what comes next:

There is a 61% chance the next shape is a triangle because 17 in 28 shapes are triangles. There is a 57% chance the next shape is a triangle with an inscribed circle because 16 in 28 shapes are triangles with inscribed circles.
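For readers who like to see the arithmetic, the summary-statistics story can be sketched in a few lines of Python. This is only an illustration: the counts come from the text, but the category labels and the function name `frequency_forecast` are my own assumptions, and everything that is not a triangle is lumped into one bucket.

```python
from collections import Counter

# Counts from the text: 28 shapes, 17 triangles, 16 of them with an inscribed circle.
# The labels are hypothetical; non-triangles are lumped together as an assumption.
observed = (
    ["triangle with circle"] * 16
    + ["triangle, other inscription"] * 1
    + ["not a triangle"] * 11
)

def frequency_forecast(observations):
    """Predict the next item by assuming past frequencies simply continue."""
    counts = Counter(observations)
    total = len(observations)
    return {label: count / total for label, count in counts.items()}

forecast = frequency_forecast(observed)
p_triangle = (
    forecast["triangle with circle"] + forecast["triangle, other inscription"]
)
print(f"P(next is a triangle) = {p_triangle:.3f}")
print(f"P(next is a triangle with circle) = {forecast['triangle with circle']:.3f}")
```

Notice that the code can only ever assign probability to categories it has already seen. That is the "more of the same" assumption baked into the method, not something the data demanded.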

Why do I think it’s best to assume things will just continue in the same way? These shapes hardly look natural. They are probably artificial in some way. What do I know in my real life that could have such strange shapes?

These are pieces from the board game we accidentally knocked off the table. We were able to find all the brightly-colored pieces quickly. The white spherical pieces seemed to have rolled farther away. Since it’s all that’s left, there is a 100% chance the next shape we find is a white sphere.

But, wait, we never even observed a white sphere at all. Why would I predict something never seen before? It’s as if I just made that up. Where did that come from? What does it even mean to come next when I am looking at a static image anyway? What if the next shape is not new but evolving, something like …

These are rare tri-polar magnets that change shape and color as they attract each other. The next shape will be an orange triangle with an inscribed blue star—the midway point between the square and triangle magnet evolution. Look, one square magnet has already started to evolve into a triangle, proof that we are on the right track!

All the ideas above explain the data equally well. They are all ‘data-driven’ in that they fit the data perfectly. One even allows complete certainty in our inference. Yet each one points to a very different prediction of what will happen next and why. Which one is the best? How can I tell?

Data does not speak for itself

How can one analyze a data set such that they guarantee a correct conclusion every single time?

This question has plagued epistemologists (philosophers who study the theory of knowledge) and statisticians (mathematicians who study applied data analysis) forever. Those who have proposed a generalizable solution so far have come up short.

This is what is known as the problem of induction. David Hume famously described this problem in 1739: there is no way to justify a priori that learning from past experiences (from data) is valid except to observe that it has worked before. That’s circular logic — we assume we can learn from experience because we have learned from experience.

Incredible minds have picked up the problem of induction while studying scientific thinking. This includes Karl Popper, who proposed that scientific theories could only be falsified, not proven, and Thomas Kuhn, who coined the term ‘paradigm shift’ as part of a consensus model of knowledge that explains the uneven timeline of advances throughout history. Each of these thinkers deserves separate consideration. This blog is meant to be a quick read, so here’s my best three-sentence summary of the least controversial bits:

Data tells us that something was observed, not why.

We need to know the why before we can learn or predict from data.

Humans must come up with the why through creativity.

Infinite stories and yet nothing to do

This leaves us with a new problem, a very different problem than analyzing data. We need a why to make an inference. As we saw above, there are many different whys that can explain any data set. I wrote down three whys for the shape data above, but there are actually infinitely many. You probably thought of something different than I did. If a limit exists, it is a practical one imposed by restrictions on time, energy, and imagination.

A quick note on semantics: different disciplines have used various words to describe the large set of potential whys allowed by any data set, including ideas, universes, scenarios, myths, models, generalizations, explanations, algorithms, hypotheses, and inferences. From now on, I am choosing to use the word story because it reflects both the creativity inherent to their generation and the skepticism that should be applied before their acceptance.

Any data set is explainable by infinite stories. To act or learn, we must choose one story over another. We need a story selection process. This selection process is called inductive reasoning. Reasoning is, literally, the act of creating reasons. Reasoning is choosing to believe one possible story over another. Reasons have nothing to do with data and everything to do with what we already believe (i.e., assumptions).

Four common inductive reasons (to choose one story over another)

#1 Someone else says that one story is true

One could choose to interview the person who collected the data or perhaps an expert in the field to get their opinion. This usually helps us with, at a minimum, figuring out what the data is designed to represent. Is it magnets or board game pieces? How many board game pieces are there? How the data was collected is important too. Was it random or selected for certain qualities? Of course, this person (source of truth) could always be mistaken, confused, tired, biased, or lying. You’d have to assume the person is trustworthy. We all need to start somewhere.

#2 The story is more of the same

By nature of being an observation, all data is about the past. To use data to do anything at all, we have to assume that the past is a good model for the future. The sun has risen each morning; therefore, it will rise again tomorrow morning. This past-is-future assumption is ubiquitous. Hume called it the Principle of Uniformity of Nature. This reason has the added benefit of making the inferences simple: things will continue to be the same. If 59% of people polled said they would vote for Candidate A today, then 59% of all people will vote for Candidate A on election day. The problem with this reason is that while many things are constant across time, the world also changes. How can we know what changes and what is constant? We’ll have to make an assumption for any specific problem or inference.

#3 The story has proven useful before

This is a very famous reason and, in my opinion, the greatest-of-all-time (GOAT) reason to prefer one story over another. It is the heart of the modern scientific method. Scientists tell themselves stories (a.k.a. hypotheses, theories) and then try to kill the story by finding experimental data inconsistent with the story (Karl Popper called this falsification). In this view, the best science stories are the ones that have survived the longest, challenged with the most experimentation, without being rejected.

In my years as an experimental scientist, I found this ‘old rules over new rules’ reason particularly elucidating because it allowed me to prefer the story ‘the data is bad’ over stories that were actually consistent with observed data. Here is an inspired-by-real-life example. Which story should you prefer: new data collected by a first-year student demonstrates that Einstein’s theory of special relativity was wrong all this time, or a first-year student makes a methodological error during an experiment? Sure, it could go either way. But the old-rules reason would lead one to prefer the story about a student making a mistake. I, for one, would be willing to bet on it.

You may have heard this ‘old rules’ reasoning in the context of Bayesian statistics. Bayesians call the ‘best existing story’ a prior. This is a powerful approach but also leads us astray from time to time. No story is ever perfect, plus who gets to say which existing story is best anyway?
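To make the prior idea concrete, here is a minimal sketch of Bayes’ rule applied to the first-year student example. Every number in it is invented for illustration, and the function name `posterior` is my own; the point is only how a strong prior dominates a single surprising observation.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes' rule for one yes/no story: P(story | data)."""
    evidence = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence

# Hypothetical numbers, chosen only to illustrate the reasoning:
prior_relativity_wrong = 1e-6    # the old story has survived a century of tests
p_anomaly_if_wrong = 0.9         # anomalous data is likely if the theory is wrong
p_anomaly_if_right = 0.05        # a student error can also produce the anomaly

p_wrong = posterior(prior_relativity_wrong, p_anomaly_if_wrong, p_anomaly_if_right)
print(f"P(relativity is wrong | anomalous data) = {p_wrong:.2e}")
print(f"P(student made an error | anomalous data) = {1 - p_wrong:.6f}")
```

Change the prior and the conclusion changes with it, even though the data stays exactly the same. The data never chose the prior; I did.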

#4 The story is the simplest

This reason is better known as Occam’s Razor. We prefer the simplest story because we believe that our world is more likely to be simple than complex. Occam’s Razor has been incorporated into many machine learning algorithms, particularly those used in natural language processing with a high-dimensionality feature space. AI practitioners call it regularization, but that’s basically just renaming Occam’s razor — at its core, it is just math that says simple is better. Is our world exactly as simple as it needs to be, and no more? Or is it sometimes overly complicated? What do you think?
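As a rough illustration of "math that says simple is better," here is a sketch of one common form of regularization, an L2 (ridge) penalty, on a one-feature linear fit. The data points and the function name `ridge_slope` are made up for the example, and "simple" here just means "smaller coefficient," which is an assumption about what simplicity should mean.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w that minimizes sum((y - w*x)**2) + penalty * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger penalty expresses a stronger preference for the "simpler" story.
for penalty in (0.0, 1.0, 10.0, 100.0):
    print(f"penalty={penalty:>5}: slope={ridge_slope(xs, ys, penalty):.3f}")
```

The data is identical in every run. Only the strength of my preference for simplicity changes, and the fitted slope moves with it.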

All good reasons, except when they are wrong

These are four of the most common reasons to prefer one story over another. For any given data problem, we will use different reasons to get from infinite stories to one.

None of these reasons originate from the data itself. None of these reasons are universally true or universally helpful. Human judgment provides the reasons, sometimes in the form of existing expertise or context, and sometimes by identifying relevant concepts or ideas.

In summary, analyzing and reasoning from data is an inherently pragmatic and human endeavor. Data alone is insufficient to come to any conclusion. There are too many stories we could tell ourselves. This idea gives a whole new meaning to the phrase ‘reasonable people can disagree.’

As a data scientist, this ‘infinite stories’ mental model influences how I think about my role in a business and society. I code my favorite reasons into an algorithm. This is what we call machine learning. But who is to say my reasons are better than others? My reasons will work well for some use cases and poorly for others. It will depend on what is true or what we choose to assume is true. There is no such thing as a universal best practice for learning from data.

Human or machine, none of this even begins to resemble a repeatable or predictable path to truth. Accepting this reality actually helps data scientists do their job, the job of making data useful for a specific purpose, instead of searching for right answers.

What is inductive reasoning?

Clare Gollnick — Thu, 09 Apr 2020 21:28:00 GMT

Premises
All 𝛂 are 𝛃.
𝑥 is 𝛂.

Conclusion
𝑥 is 𝛃.

This conclusion is true.

Unless the premises have been derived from data, in which case, I have no idea.

In recent years, the idea of being data-driven has become synonymous with objectivity. This remains a point of frustration for me, as an inference from data requires inductive reasoning, a fallible process that does not guarantee a true conclusion. Inductive reasoning is less objective than its most obvious counterpart, deductive logic, which has much closer ties to traditional domain expertise.

Deductive logic is a process that starts from a set of rules (premises, domain expertise, generalizations) to inform a specific example (data, future prediction). Inductive reasoning is the reverse, going from specific examples (data, past observations) to the general rules (beliefs, hypotheses, generalizations, premises).

In deductive logic, one assumes the premises to be true. If the argument is valid, meaning the conclusion follows from the premises, then the conclusion must also be true.

In inductive reasoning, there are no assumptions. Instead, there are beliefs — a set of premises with variable confidences. Some beliefs are more powerful (more likely to be true) than others.

Practically, what does this look like? If a deductive premise is expressed as “all 𝛂 are 𝛃,” analogous inductive premises could take the form: “there is good reason to believe all 𝛂 are 𝛃,” or “every known example of 𝛂 has also been 𝛃,” or “many, many 𝛂 are also 𝛃.” A conclusion from inductive reasoning is one that is likely or probable, given the uncertainty of the beliefs. Nothing is ever proven, certainly not objective. It always depends.

In the specific example above, one cannot necessarily conclude that 𝑥 is 𝛃 because beliefs (unlike assumptions) can be inconsistent. There may already be high confidence in the belief that 𝑥 is not 𝛃. The process of choosing between possible (at times contradictory) beliefs and updating confidence in them as new data is observed is inductive reasoning. Particularly in its most advanced forms, it is more art than science.
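One textbook way to put a number on “every known example of 𝛂 has also been 𝛃” is Laplace’s rule of succession. The sketch below assumes a uniform prior over the unknown fraction of 𝛂 that are 𝛃, which is itself a belief I am bringing to the data, and the function name is hypothetical.

```python
from fractions import Fraction

def prob_next_alpha_is_beta(n_alpha_seen, n_also_beta):
    """Laplace's rule of succession, assuming a uniform prior on the unknown fraction."""
    return Fraction(n_also_beta + 1, n_alpha_seen + 2)

# Every alpha observed so far has also been beta.
for n in (1, 10, 1000):
    p = prob_next_alpha_is_beta(n, n)
    print(f"after {n} examples: P(next alpha is beta) = {p} (about {float(p):.4f})")
```

Confidence grows with each consistent example, but under this assumption it never reaches 1. A different prior would give a different number from the very same observations.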

In the 𝛂, 𝛃, 𝑥 example above, what can one conclude if the premises are derived from data? It depends on why the question was asked, how the beliefs were generated, and even what actions might be taken as a result. So, as I began, I have no idea.

Abstract ideas like this benefit from concrete examples. This will be the focus of the next post (and probably dozens after it).

First, let’s make sure we set the stakes sufficiently high.

This is not just a semantic game for philosophers. Decision-makers who act confidently based on data alone will overestimate the strength and quality of their conclusions, risking the success of long-term goals. This impacts every field, from doctors practicing medicine to entrepreneurs and executives setting business strategy, investors hedging the market, and government officials deriving policy.

I have found that understanding the nature of inductive reasoning, its benefits, and its disadvantages allows me to use it as a tool more effectively. That is what this series is about. There are limits of inference. Data is not magic.

How Science Became Information Engineering

Clare Gollnick — Fri, 03 Apr 2020 19:57:00 GMT

My statistics obsession was initially fueled by fear.

I was two years into a graduate research project examining the differences between neuroplasticity in adult and developing brains. Things were going poorly.

Coming into graduate school with a fellowship, I had the rare freedom to come up with my own graduate research project. Seeking easy wins, I chose to do something crazy. I started by trying to reproduce well-known past experimental results.

I designed conceptual replication experiments. Conceptual replications are experiments designed to test the same hypothesis of a previous study without using the same methods. Some scientists consider this type of experiment inferior to exact replications. Still, they are practical. Wet lab research is expensive. Labs do not always have the same equipment or reagents needed to do an exact replication. Plus, experimental methodologies improve with time.

I also happen to like conceptual replications for their scientific value. A scientific finding is not particularly useful if it is only true in a very narrow experimental context. If an effect is also observable from other measurement modalities or in slightly different experimental contexts, it suggests the potential for a larger impact.

So two years in, I had performed dozens of replication experiments. More than half of the time, I did not observe the same result as the scientists who had come before me. This was a trend. It was starting to shake me.

During my undergraduate studies, I worked in three labs doing independent research. In two out of three positions, I started my research by attempting to reproduce the experiments of a Ph.D. student who had graduated. Both times, I failed to replicate the previous graduate student's published results.

I shrugged off these early experiences. I was inexperienced. Graduate school was my chance to do things right. I had set out to do better research, to be a better scientist. Reproducing results, some of which were in textbooks, seemed like it should be easy. It was not working. I was failing again. I sensed my scientific career was at risk.

Statistics became a critical part of my search for an explanation. At first, it was a troubleshooting endeavor. I read dozens of statistics textbooks. I spent my weekends running Monte Carlo simulations to develop strong probabilistic intuitions. I re-implemented hypothesis tests and Bayesian algorithms in MATLAB, just to prove to myself I could. I ended up reading a ton of philosophy (epistemology). Yes, it was the start of something great.

No data. No problem.

I have my Ph.D.

I am not a scientist anymore.

This period of research changed the trajectory of my entire life. It led me to invent my signature approach to data science and statistical modeling.

I analyze fake data.

Yes, completely fake, as in randomly generated. I generate and use fake data to build models of real-world phenomena. It works wonderfully. How did I arrive at this strange methodology?

Back to graduate school ...

Given my replication experiments' limits, it was always possible that methodological differences explained the reproducibility problems. Which methodological difference was creating the problem? I started looking for a way to compare my process with the original research, step by step. I needed intermediate results to discern where my results first diverged from the published study. All I had were the final graphs.

This was before the open science movement. There was minimal data archiving. Data sharing was not part of the culture.

So, I reverse-engineered the studies. Working backward from the published graphs, I tried to figure out what the raw data must have been. I used theory, plus experimental data I collected, to make informed estimates about noise sources and distributions. Eventually, I worked all the way back from published figures to raw data. It was fake raw data. This new data was consistent with the effects I was able to replicate. Then, I added the missing signal. When analyzed exactly as described in the Methods section of the original papers, I could reproduce the figures in the final published articles: every graph, every data point, every error bar, most representative examples. I had mastered the art of simulation.

Extraordinary claims require extraordinary evidence.
Carl Sagan.

Those of you who have followed the reproducibility crisis in science will not be surprised to learn that this process exposed evidentiary problems, signs of p-hacking, and other inferential mistakes. Addressing the reproducibility crisis in biological sciences continues to be a passion of mine. This is not the topic for today.

I want to talk about fake data.

Fake data is an amazing tool for developing robust, scalable, and highly accurate algorithms and analytic strategies for business use cases.

With fake but realistic data, I could test the sensitivity of different analytic approaches. I could figure out how much noise or systematic error it took to flip the conclusion. I could identify the root causes when the results deviated from expectations. In short, I could build more robust analytics.
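Here is a minimal sketch of that kind of sensitivity test, written in Python rather than the tools I used at the time. Everything in it is an assumption made for illustration: the Gaussian noise model, the group size, the effect size, and the crude detection rule (a Welch-style t statistic above 2). The idea is simply to plant a known effect in fake data and see how often the analysis recovers it as noise grows.

```python
import random
import statistics

def detection_rate(effect, noise_sd, n_per_group=20, n_trials=500, seed=0):
    """Fraction of simulated experiments in which a planted effect is 'detected'.

    Detection rule (an illustrative assumption): Welch-style t statistic above 2.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Fake data: the control group has no effect, the treated group has a known one.
        control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, noise_sd) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        std_err = (
            statistics.variance(treated) / n_per_group
            + statistics.variance(control) / n_per_group
        ) ** 0.5
        if diff / std_err > 2.0:
            hits += 1
    return hits / n_trials

# Same planted effect every time; only the noise level changes.
for noise_sd in (0.5, 1.0, 2.0, 4.0):
    rate = detection_rate(effect=1.0, noise_sd=noise_sd)
    print(f"noise sd {noise_sd}: effect detected in {rate:.0%} of simulated runs")
```

Because I planted the effect myself, I know the right answer in every run. That is what makes it possible to say where an analysis breaks, rather than just noticing that it did.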

By controlling both the quality of the data and how it was analyzed, I controlled the output consistency. In other words, I could engineer the system.

AI products should be the result of information engineering, not data science.
The single most important thing to learn from statistics or epistemology is that data is not knowledge. Data is not objective. Data is only as useful as the business problem or scientific question to which it is applied.

In science, we do not get to control both the data and analytics. The data is as good as our methods and experiments allow. The result is a slow, painful process of iterative discovery and learning from falsification.

In business, we control both the data and analytics. We choose what data we collect and how accurately it is collected. We have the option to enforce quality standards on data. We also build the analytic approach. We get to control the quality of the outcome. Data scientists should use this amazing power more.

In short, I have learned that data science is bad for business. Product development processes should not be scientific—science self-corrects over generations. Products should be engineered. Engineering is the application of scientific theories to achieve specific outcomes. Engineering does not need to create new knowledge.

Data is not magic. I suspect information engineering will be a big part of the future of data science. There are tricks to it. I want to write about it.

Using data to make decisions that impact the world

Clare Gollnick — Fri, 03 Apr 2020 17:34:00 GMT

Welcome to Limits of Inference by Clare Gollnick. I write on data, statistics, engineering, and epistemology.

In the meantime, tell your friends!
