The Mathematics of COVID-19
There has been so much coverage of COVID-19 that some of the language of epidemiological models used to forecast the virus’s evolution is now familiar to us. But for those who do not fully grasp how those models work, here is a little primer.
The reproduction number R
It’s not often that a mathematical symbol – even if it’s only a letter of the English alphabet – becomes common in the public media. But since the onset of COVID-19 we occasionally see the symbol R (or R0) – the reproduction number – mentioned in mass media articles.
R is the average number of people that a person who already has the disease – an infectious person – will infect. (An infected person is assumed to be infectious; as we shall see, this assumption can have variations.) If R is much greater than one, the disease will spread rapidly. If R is less than one, the disease will gradually dissipate; the smaller R, the more rapid the dissipation.
Think of it this way. Suppose R is the number of children a typical woman has. In the 1960s R in the United States was about three. Population increased rapidly. In some countries in Africa and parts of South Asia, R can be as much as 5 or 6 – population increases very rapidly so long as the death rate is not also high.
In Italy and other countries in Europe, in Taiwan and Japan and other places, R is low – below the replacement level of roughly two children per woman, which in this analogy plays the role of R = 1. The population ages and dwindles.
The religious Shaker communities of the late 18th and 19th centuries had – at least in principle – an R of zero. They were celibate and did not believe in child-bearing. Therefore, unsurprisingly, they died out.
Another useful figure is the serial interval – the time between one person becoming infected and the next person in the chain, infected by the first, becoming infected in turn. The serial interval for SARS and COVID-19 has been estimated at seven days.
If the serial interval for human births were seven days instead of what it is, about 25 years – that is, if humans reproduced every seven days instead of 25 years on average – and the reproduction number were two, then in a year there would be two quadrillion new people on earth (2,000 trillion). That’s how fast it goes.
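The arithmetic behind that figure can be checked with a short sketch. It assumes, as the paragraph does, that each generation is twice the size of the previous one and that a generation lasts one serial interval; the function name and constants are illustrative only.

```python
def generation_size(r, generations):
    """Size of the n-th generation when each member produces r successors."""
    return r ** generations

SERIAL_INTERVAL_DAYS = 7   # the hypothetical seven-day interval from the text
R = 2                      # the hypothetical reproduction number from the text
full_generations = 365 // SERIAL_INTERVAL_DAYS  # complete intervals in a year

# Show the last two generations that fit into a year.
for n in (full_generations - 1, full_generations):
    print(f"generation {n}: {generation_size(R, n):,}")
```

Depending on whether 51 or 52 generations are counted, the result is about 2.25 or 4.5 quadrillion, so the order of magnitude is the same either way.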
The components of R
In the earliest phases of a disease, there are only infected people and “susceptible” people. An infected person is assumed to be infectious, and a susceptible person is one who is not infected but can catch the disease.
The reproduction number R depends on three things: how many susceptible people the infectious person interacts with per unit time (e.g. per day); the probability that a susceptible person will be infected during an interaction; and the length of time (e.g. number of days) for which the infected person is infectious.
These three variables can be multiplied together to get R. For example: suppose an infectious person interacts with 10 susceptible people per day; the probability that a susceptible person catches the disease during an interaction is 0.05 (5%); and the infected person remains infectious for five days. Then R will equal 10 people per day, times 0.05 probability of infection, times five days, for a reproduction number R = 10 x 0.05 x 5 = 2.5.
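As a minimal sketch, the same multiplication can be written as a small function. The function and parameter names are illustrative choices, not taken from any particular epidemiological package.

```python
def reproduction_number(contacts_per_day, p_transmission, infectious_days):
    """R = contacts per day x probability of transmission x days infectious."""
    return contacts_per_day * p_transmission * infectious_days

# The worked example from the text: 10 contacts/day, 5% chance, 5 days.
r = reproduction_number(contacts_per_day=10, p_transmission=0.05, infectious_days=5)
print(round(r, 2))  # 2.5
```

Writing R this way makes it easy to see that reducing any one of the three factors reduces R in the same proportion.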
The basic reproduction number R0
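The “basic reproduction number” is designated R0. This is the average number of other people that an infected person will infect at the very start of an outbreak, when everyone else is still susceptible – in effect, the number that the first infected person will infect.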
As more people are infected the basic reproduction number R0 doesn’t hold anymore. For example, if half the people are already infected then R becomes half of what R0 was initially, because half the people an infectious person interacts with are not susceptible anymore – they have already been infected.
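The scaling described above can be sketched as follows, under the simplifying assumption that an infectious person meets susceptible and non-susceptible people in proportion to their share of the population. The function name is illustrative.

```python
def effective_r(r0, susceptible_fraction):
    """Reproduction number once only part of the population is still susceptible."""
    if not 0.0 <= susceptible_fraction <= 1.0:
        raise ValueError("susceptible_fraction must be between 0 and 1")
    return r0 * susceptible_fraction

# With R0 = 2.5 and half the population already infected, R is halved.
print(effective_r(2.5, 0.5))  # 1.25
```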
Early estimates of R0 for COVID-19 based on data on infections in China placed R0 at approximately 2.5 – meaning that an infected person will infect, on average, two and a half susceptible persons. Some later estimates have placed that number higher; knowledge of the disease is still evolving. This makes COVID-19 more infectious than SARS (R0 about 2.0, though estimates vary), or influenza (about 1.5), or Ebola (1.5) but less infectious than measles, mumps, and chicken pox (about 12).
In addition, the reproduction number R can depart from R0 because of measures taken to control the spread. We will come to this later.
Assumptions and approximations
Like all models, epidemiological models require simplifying assumptions. For example, “the number of people interacted with” allows for no nuance – we don’t know how close the interaction, or for how long. Obviously the interaction of a dentist with a patient is different from the interaction of a delivery person with the recipient of a package. The probability of transmission depends on these differences, but the models generally assume they’re all the same – at least on average.
Also, the model, in its simplest form, assumes the infected person is either infectious or not infectious – there is no middle ground. But some findings suggest that a person infected with COVID-19 may be more or less infectious during different stages of the disease. There are also some indications that someone who has been exposed to higher doses of the virus may be more infectious.
Partly because of these variations, many of the models divide the population into finer subpopulations – for example children who go to school; adults who go to work; people who are infected but with no symptoms, people who have symptoms, people with severe symptoms who are in the hospital, etc., each with a different R.
The SIR and SEIR models
The basic epidemiological models assume that the entire population falls into three (or four) classes: S for susceptible; in some models, E for “exposed,” meaning – depending on the model – either infected but not yet infectious, or infectious but without symptoms; I for infected and infectious; and R for “removed,” meaning either recovered and no longer susceptible, or dead. (This class R should not be confused with the reproduction number R.)
S (susceptible) -> I (infected) -> R (removed)
To keep it simple for now I’ll explain only the SIR model. For each transition, from S (susceptible) to I (infected) or from I (infected) to R (removed, that is, recovered or dead), there is a transition rate, or probability per unit time. The transition rate from S, susceptible, to I, infected, is usually designated beta (β).
The model can actually be solved mathematically, but is usually run in the form of a computer simulation. If the population is assumed homogeneous (every person interacts with the same number of other people, with the same probability of transmission, and the same number of days infectious) then the model is simple. But if the population is not assumed homogeneous then the simulations can be “agent-based,” with each agent representing the typical member of a subgroup of the population, or even each individual.
A simulation begins with a single agent or small group being infectious, with every other agent susceptible. The initial reproduction number is the basic reproduction number, R0. Each infectious agent infects, on average, β other agents at each unit time step (see note i).
Another transition rate is then needed: the rate of passage from infected (I) to removed (R). This rate might not be probabilistic but simply the time unit divided by the number of days between infection and recovery (or death).
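A homogeneous SIR simulation with these two rates can be sketched in a few lines. This is a deterministic, one-day-per-step approximation, not the agent-based approach and not any specific published model. The removal rate is written here as gamma, a conventional name the text does not use, and all parameter values are assumptions chosen to give R0 = 0.5 x 5 = 2.5.

```python
def simulate_sir(population, initial_infected, beta, infectious_days, days):
    """Step a homogeneous SIR model forward one day at a time."""
    gamma = 1.0 / infectious_days        # removal rate: time unit / days infectious
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # Each infectious person infects beta others per day, scaled by the
        # fraction of the population that is still susceptible.
        new_infections = min(beta * i * s / population, s)
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history

history = simulate_sir(population=1_000_000, initial_infected=1,
                       beta=0.5, infectious_days=5, days=200)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak day:", peak_day)
print("share ever infected:", round(history[-1][2] / 1_000_000, 2))
```

Because susceptible, infected and removed counts only move between the three classes, their total stays equal to the population at every step, which is a useful sanity check on any variant of the sketch.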
This highlights another assumption of the models – that a recovered person is no longer susceptible to the disease. It is uncertain to what extent this assumption is correct. (Indeed, whether that assumption is correct has far-reaching consequences for overcoming the pandemic.) Coronavirus experts believe that a recovered person is not susceptible to the disease for at least a few months after recovering – and then, if susceptible at all, perhaps only to a milder form of the disease. Therefore, to make projections of the spread of the disease several months hence, it is reasonable to assume a recovered person is no longer susceptible.
An exception to the SIR models is the model deployed by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington. Instead of using a SIR model, it simply fits U.S. data on infections, hospitalizations, and deaths up to the current point in time to data from other countries such as Italy and Spain where the disease progressed faster. It thus predicts future growth and decline rates for the disease in terms of how these rates developed in other countries or regions. Because it uses a statistical, data-fitting model rather than an epidemiological model – which employs population dynamics of the SIR type – epidemiologists have raised concerns about the IHME model. But it is another way to make projections that are highly uncertain no matter what model is used.
The exponential growth phase
When the disease is new and spreading, it spreads at first exponentially. The doubling time depends on the rate of spread β. Based on the spread in China in January the infected population was estimated to double approximately every two days (with an uncertainty band of about 1-5 days).
Doubling every two days means that, starting from a single infected individual, the disease will infect about 32,000 people in a month and roughly six million in a month and a half. This is why, when they saw the results of the Imperial College model, officials in the UK and US decided that something had to be done to slow the spread.
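The doubling arithmetic can be reproduced with a short sketch, assuming uninterrupted exponential growth from a single case and taking a month as 30 days. The function name is illustrative.

```python
def infected_after(days, doubling_time_days, initial=1):
    """Cumulative infections after `days` of uninterrupted doubling."""
    return initial * 2 ** (days / doubling_time_days)

for days in (30, 45):
    print(days, "days:", f"{infected_after(days, doubling_time_days=2):,.0f}")
```

Thirty days is 15 doublings, or 32,768 cases; 45 days is 22.5 doublings, or about 5.9 million.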
Exponential spread will abate only if measures are taken to slow the spread, if an effective vaccine is developed, or if so many people have already been infected that most of the people an infectious person encounters are no longer susceptible. This last condition is called “herd immunity.”
Estimates for COVID-19 are that about 70% of the population would have to have been infected before the population would achieve herd immunity. Thus, at death rates that have been approximately estimated, waiting for herd immunity would imply that about 1% of the global population would die (about 80 million people). This is again why that path was not chosen, and instead, measures to slow the spread were instituted in almost every country.
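These figures can be related to R0 with a rough sketch. The threshold formula 1 - 1/R0 is the standard textbook result for a homogeneous SIR-type model; it is not stated in this article and is used here as an assumption. The world population of 7.8 billion and the fatality rate of about 1.4% of those infected are likewise assumptions, chosen only so that the result is consistent with the 1% figure above.

```python
def herd_immunity_threshold(r0):
    """Share of the population that must be immune for R to fall below one."""
    if r0 <= 1.0:
        return 0.0          # the disease dies out without any immunity
    return 1.0 - 1.0 / r0

def implied_deaths(population, fraction_infected, fatality_rate):
    """Deaths if `fraction_infected` of the population catches the disease."""
    return population * fraction_infected * fatality_rate

print(round(herd_immunity_threshold(2.5), 2))   # early R0 estimate
print(round(herd_immunity_threshold(3.3), 2))   # a higher, later-style estimate

WORLD_POPULATION = 7.8e9        # assumed, roughly the 2020 figure
ASSUMED_FATALITY_RATE = 0.0143  # assumed share of infected people who die
deaths = implied_deaths(WORLD_POPULATION, 0.70, ASSUMED_FATALITY_RATE)
print(f"{deaths / 1e6:.0f} million")
```

Under this formula an R0 of 2.5 gives a threshold of 60%, while a threshold of about 70% corresponds to an R0 of roughly 3.3, in line with the higher later estimates mentioned earlier.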
Severity of the disease
The SIR model says nothing about how severe the disease is. If it is used to model the common cold – itself sometimes caused by a coronavirus, albeit of a kind different from the one that causes COVID-19 – it will show rapid spread but that will cause no concern because the symptoms are mild, don’t require hospitalization or medical equipment and don’t cause death.
But the consequences of COVID-19 – unlike the common cold – can be severe. Therefore, rates of hospitalization, rates of use of specialized medical equipment such as ventilators, and rates of recovery or death must be built into the models. With an agent-based simulation, it is easy to add these rates but they require yet another layer of assumptions, some of which can be estimated only very roughly from the incomplete data. For example, rates of death from COVID-19 have varied from country to country, and it is reasonable to assume the rate will change depending on how overloaded hospitals are, so it will itself depend on the number of cases relative to hospital and medical equipment capacities.
Fitting the assumptions to the data
As with many models, the assumptions used – i.e., the inputs – are usually calibrated by fitting the models to the available data (see note ii). This can be done by varying the assumptions, running the models, and seeing which set of assumptions produces results closest to those that have actually been observed.
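The calibration procedure can be illustrated with a deliberately simple sketch: try a range of candidate values for β, run a model for each, and keep the value whose output is closest to the observations. The early-phase growth model, the error measure, the fixed removal rate and the observed counts below are all made up for illustration; real calibrations fit many parameters to real data.

```python
import math

def model_cases(beta, gamma, days, initial=1.0):
    """Early-phase infected counts: daily growth by a factor (1 + beta - gamma)."""
    cases = [initial]
    for _ in range(days):
        cases.append(cases[-1] * (1.0 + beta - gamma))
    return cases

def log_squared_error(predicted, observed):
    """Sum of squared differences between the logs of the two series."""
    return sum((math.log(p) - math.log(o)) ** 2
               for p, o in zip(predicted, observed))

def calibrate_beta(observed, gamma, candidates):
    """Return the candidate beta whose model output best matches `observed`."""
    days = len(observed) - 1
    return min(candidates,
               key=lambda b: log_squared_error(model_cases(b, gamma, days), observed))

observed = [1, 1, 2, 2, 3, 4, 5, 6, 8, 11]       # hypothetical daily counts
candidates = [k / 100 for k in range(21, 101)]   # beta from 0.21 to 1.00
best = calibrate_beta(observed, gamma=0.2, candidates=candidates)
print("best-fitting beta:", best)
```

As new observations arrive, the same search can simply be rerun on the longer series, which is the recalibration the article describes.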
Early versions of the models – that is, their assumptions – are based on a paucity of data. Therefore, models are recalibrated frequently as new observations of real-world data are gathered.
Arriving at these assumptions is particularly difficult because the data we can observe are not necessarily representative of the actual reality. They may be wildly off the mark. For example, the observed death rate is based on the number of deaths divided by the number of cases. But the number of COVID-19 cases is not known because many go unreported, especially if they are mild. And even the number of deaths is uncertain, for two reasons. First, deaths reported to be from COVID-19 may actually be from “co-morbidities” – especially for people with previous underlying medical conditions, who are the most susceptible to the disease. Second, COVID-19 testing has not been widely available in some countries, like the U.S., so when someone died it may not have been determined whether the person had COVID-19 or not.
Reducing the rate of transmission using social measures
Given the uncertainty around the data and the assumptions, it could be concluded that the results of the models are hardly worth a damn. But this would be wrong. Even very approximate results are extremely important.
For example, suppose β, the rate of transmission, is such that an infectious person infects half a susceptible person per day, and the period for which the infected person is infectious is five days. Then one infectious person will infect two people in four days, so the doubling time is roughly four days. Starting from only one person infected, about a million people will be infected in two and a half months.
But if the rate of infection can be cut in half, so that the doubling time stretches to about eight days, only about a thousand people will be infected in two and a half months.
Obviously, even if the numbers are wild estimates, they make clear the importance of cutting the rate of transmission by 50%, or more, and the sooner the better – a week’s delay can mean a quadrupling of future morbidities and deaths.
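The comparison can be made concrete with a short sketch. It adopts the article's rough rule that halving the transmission rate doubles the doubling time, and it uses 80 days as a stand-in for "two and a half months" so that the doublings come out as whole numbers; both are simplifying assumptions.

```python
def cases_after(days, doubling_time_days):
    """Cases after `days`, starting from one case, with a fixed doubling time."""
    return 2 ** (days / doubling_time_days)

BASE_DOUBLING_DAYS = 4   # doubling time at the original transmission rate
HORIZON_DAYS = 80        # roughly two and a half months

for label, factor in (("original rate", 1.0), ("rate cut in half", 0.5)):
    doubling = BASE_DOUBLING_DAYS / factor
    print(f"{label}: {cases_after(HORIZON_DAYS, doubling):,.0f} cases")

# Extra growth that accumulates during a one-week delay at the original rate.
print(f"one-week delay: x{cases_after(7, BASE_DOUBLING_DAYS):.1f}")
```

At the original rate this gives 20 doublings, about a million cases; at half the rate it gives 10 doublings, about a thousand. A one-week delay at a four-day doubling time multiplies cases by about 3.4, somewhat short of a full quadrupling; at a shorter doubling time the multiple would be larger.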
And so we have come to our present era of “social distancing” and contact tracing. The whole purpose is to reduce the first two terms that can be multiplied together to get the transmission rate β: the number of contacts an infected person makes per unit time; and the probability of transmission of the disease during that contact.
The number of contacts is reduced by advising or requiring people to stay at home or otherwise isolated. The probability of transmission of the disease is reduced by advising or requiring people to wear a surgical mask, wash their hands regularly and avoid touching their face, and keep at a distance of at least two meters from other people, so that tiny droplets expelled from the lungs of infected people nearby will not be inhaled.
And the contacts of people who have been confirmed to have the disease are traced so that they can be advised or required to quarantine themselves so they won’t infect others. Some countries such as South Korea and Taiwan and other Asia-Pacific countries have been particularly effective at carrying out this strategy. But they had the advantage of their experience with a previous coronavirus, severe acute respiratory syndrome (SARS), in 2003.
Uses of the models
Simulation models of the SIR or SEIR type or variations on them can be put to multiple uses. They can project – within wide bands of probability due to the uncertainty of their assumptions, but still providing very helpful indications – results over a period of months such as how many people will catch the disease, how many will need hospitalization, how many of those will require intensive care, and how many will die. This can project the need for hospital beds, ambulances, medical equipment, doctors and other healthcare workers, tests for the disease, and coffins and morgues. If that need will exceed capacity then capacities can be ramped up quickly or social distancing measures can slow the spread, or both – and the models can be rerun to gauge the effects.
The models could even project the effects on specific areas of the economy, and possibly approximately quantify their economic impacts, so as to evaluate the impacts on supply chains, government services, and perhaps to assess cost-benefit trade-offs to help decide which social distancing measures to undertake, and how extensively.
If a vaccine is developed and vaccinations can be ramped up, the models can help determine how many vaccines to administer over time as they become more widely available, among what communities, and what age and health status groups.
The statistician George Box said, “All models are wrong, but some are useful.” Mathematical models of epidemiology are among the models that are most wrong. And yet, paradoxically, they are also among the models that are most useful.
Economist and mathematician Michael Edesess is adjunct associate professor and visiting faculty at the Hong Kong University of Science and Technology, chief investment strategist of Compendium Finance, adviser to mobile financial planning software company Plynty, and a research associate of the Edhec-Risk Institute. In 2007, he authored a book about the investment services industry titled The Big Investment Lie, published by Berrett-Koehler. His new book, The Three Simple Rules of Investing, co-authored with Kwok L. Tsui, Carol Fabbri and George Peacock, was published by Berrett-Koehler in June 2014.
i If the agent is an individual, a random number determines whether the agent infects another individual or not.
ii In some cases, calibrations must be done cautiously. In the case of models used in the mid-2000s to assess the probability of default of collateralized debt obligations (CDOs), calibrations were often circular because the data used for the calibrations – market values of CDOs and related instruments – were themselves influenced heavily by the results of the models.