# How long and how high for Covid-19

Cases of Covid-19 are trending back up globally and in the US. The world has nearly reached 10 million cases, with over 2.3 million in the US. There is still a lot we don’t understand about SARS-CoV-2 transmission, but I am confident we are nowhere near herd immunity. Our model consistently shows that the case ascertainment ratio, that is, the ratio of total SARS-CoV-2 infections to official Covid-19 cases, is between 5 and 10. That means the US has fewer than 25 million infections and the world fewer than 100 million.

Herd immunity means that, for any fixed reproduction number R0, the number of active infections will trend downward once the susceptible fraction of the population falls below 1/R0, that is, once the fraction already infected exceeds 1 - 1/R0. Thus, for an R0 of 4, three quarters of the population needs to be infected to reach herd immunity. However, as I will show below, the total fraction that will eventually be infected is approximately

$1 -\frac{e^{-R_0}}{1- R_0e^{-R_0}}$

which is considerably higher. Thus, mitigation efforts to reduce R0 will reduce the total number infected. (2020-06-27: This expression is not accurate when R0 is near 1. For a formula in that regime, see Addendum.)
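A minimal Python sketch comparing the two quantities, assuming nothing beyond the two formulas above (the function names are hypothetical). For R0 = 4 the herd-immunity threshold is 0.75 while the approximate eventual total is about 0.98. As noted, the approximation is only reliable for R0 well above 1.

```python
import math


def herd_immunity_threshold(r0: float) -> float:
    """Fraction that must be infected before active infections decline: 1 - 1/R0."""
    return 1.0 - 1.0 / r0


def approx_total_infected(r0: float) -> float:
    """First-order approximation of the eventual fraction infected,
    1 - e^{-R0} / (1 - R0 e^{-R0}); only reliable for R0 well above 1."""
    e = math.exp(-r0)
    return 1.0 - e / (1.0 - r0 * e)


for r0 in (2.0, 3.0, 4.0):
    print(f"R0 = {r0}: herd immunity threshold = {herd_immunity_threshold(r0):.3f}, "
          f"eventual total infected ~ {approx_total_infected(r0):.3f}")
```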

Some regions in Western Europe, East Asia, and even the US have managed to suppress R0 below 1, and cases there are trending downward. In the absence of reintroduction of SARS-CoV-2 carriers, Covid-19 can be eliminated in these regions. However, as the recent spikes in China, South Korea, and Australia have shown, this requires continual vigilance. As long as anyone in the world remains infected, there is always a chance of re-emergence.

As long as new cases are increasing or plateauing, R0 remains at or above 1. As I mentioned before, plateauing is not a natural feature of epidemic prediction models, which generally go either up or down. Plateauing requires either continuous adjustment of R0 through feedback or propagation through the population as a wave front, like a lawn mower cutting grass. From what we are seeing, the latter is what is actually happening: regions rise and fall in succession. As one region reaches a peak and declines, either through mitigation or through rapid spread of SARS-CoV-2, Covid-19 takes hold in another. We saw China and East Asia rise and fall, then Italy, then the rest of Western Europe, then New York and New Jersey, and so forth, in series rather than in parallel. Now it is spreading throughout the rest of the USA, South America, and Eastern Europe. Africa has been spared so far, but it is probably next, as cases are beginning to explode in South Africa.

A reduction in R0 also delays the time to reach the peak. As a simple example, consider the standard SIR model

$\frac{ds}{dt} = -\beta sl$

$\frac{dl}{dt} = \beta sl -\sigma l$

where $s$ is the fraction of the population susceptible to SARS-CoV-2 infection, $l$ is the fraction of the population actively infectious, $\beta$ is the infection rate, and $\sigma$ is the recovery rate. Below are simulations of the pandemic progression for R0 = 4 and 2.
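A minimal sketch of such a simulation, assuming time is measured in units of the recovery time 1/σ (σ = 1) and a hypothetical initial infectious fraction l0 = 1e-4; the settings behind the original plots are not given. It integrates the SIR equations with a classic fourth-order Runge-Kutta step and reports the time of the peak, the peak value of l, and the fraction that never gets infected.

```python
def simulate_sir(r0, sigma=1.0, l0=1e-4, dt=1e-2, t_max=60.0):
    """Integrate ds/dt = -beta*s*l, dl/dt = beta*s*l - sigma*l with RK4.

    Starts from s = 1, l = l0 (as in the text) and uses beta = R0*sigma.
    Returns (time of peak, peak value of l, s at t_max).
    """
    beta = r0 * sigma

    def deriv(s, l):
        infection = beta * s * l
        return -infection, infection - sigma * l

    s, l, t = 1.0, l0, 0.0
    peak_t, peak_l = 0.0, l
    while t < t_max:
        k1s, k1l = deriv(s, l)
        k2s, k2l = deriv(s + 0.5 * dt * k1s, l + 0.5 * dt * k1l)
        k3s, k3l = deriv(s + 0.5 * dt * k2s, l + 0.5 * dt * k2l)
        k4s, k4l = deriv(s + dt * k3s, l + dt * k3l)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        l += dt * (k1l + 2 * k2l + 2 * k3l + k4l) / 6.0
        t += dt
        if l > peak_l:  # track the maximum of the infectious fraction
            peak_t, peak_l = t, l
    return peak_t, peak_l, s


for r0 in (4.0, 2.0):
    peak_t, peak_l, s_end = simulate_sir(r0)
    print(f"R0 = {r0}: peak at t = {peak_t:.2f} (units of 1/sigma), "
          f"peak l = {peak_l:.3f}, never infected = {s_end:.3f}")
```

With these assumptions the R0 = 2 run peaks later than the R0 = 4 run and leaves roughly ten times as many people never infected (about 0.20 versus 0.02).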

We see that halving R0 roughly doubles the time to reach the peak but much more than doubles the number of people who never get infected. We can see why by analyzing the equations. Dividing the two SIR equations gives

$\frac{dl}{ds} = \frac{\sigma l -\beta sl}{\beta sl}$,

which integrates to $l = \frac{\sigma}{\beta} \ln s - s + C$, where $C$ is a constant of integration. If we suppose that initially $s=1$ and $l = l_0 \ll 1$, then $C = 1 + l_0$ and we get

$l = \frac{1}{R_0} \ln s + 1 - s + l_0$ (*)

where $R_0 = \beta/\sigma$ is the reproduction number. The total fraction eventually infected is $1-s$ evaluated when the epidemic has burned out, that is, when $l=0$. Setting $l=0$ in (*) and rearranging gives

$s = e^{-R_0(1+l_0-s)}$

If we assume that $R_0 s \ll 1$ and ignore $l_0$, we can expand the exponential as $e^{R_0 s} \approx 1 + R_0 s$ and solve for $s$ to get

$s \approx \frac{e^{-R_0}}{1- R_0e^{-R_0}}$

This is the fraction of the population that never gets infected, which is also the probability that you won’t be infected. It gets smaller as $R_0$ increases, roughly like $e^{-R_0}$. So reducing $R_0$ can exponentially increase your chance of never being infected.
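To see how good the first-order approximation is, the sketch below (hypothetical function names) solves the exact relation obtained from (*) with $l = 0$, namely $\ln s = -R_0(1 + l_0 - s)$, by bisection and compares it with the approximation. The bracket $(0, 1/R_0]$ assumes $R_0 > 1$. The two agree to about $10^{-4}$ at $R_0 = 4$ but differ by roughly 0.08 at $R_0 = 1.5$.

```python
import math


def never_infected_exact(r0, l0=0.0, tol=1e-12):
    """Nontrivial root of s = exp(-R0*(1 + l0 - s)), found by bisection on (0, 1/R0].

    For R0 > 1, f(s) = s - exp(-R0*(1 + l0 - s)) is negative at s = 0 and
    positive at s = 1/R0, and f is concave, so the bracket holds exactly one root.
    """
    if r0 <= 1.0:
        raise ValueError("bisection bracket assumes R0 > 1")

    def f(s):
        return s - math.exp(-r0 * (1.0 + l0 - s))

    lo, hi = 0.0, 1.0 / r0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def never_infected_approx(r0):
    """First-order approximation from the text: e^{-R0} / (1 - R0 e^{-R0})."""
    e = math.exp(-r0)
    return e / (1.0 - r0 * e)


for r0 in (1.5, 2.0, 3.0, 4.0):
    print(f"R0 = {r0}: exact s = {never_infected_exact(r0):.4f}, "
          f"approx s = {never_infected_approx(r0):.4f}")
```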

To figure out how long it takes to reach the peak, we substitute equation (*) into the SIR equation for $s$ to get

$\frac{ds}{dt} = -\beta(\frac{1}{R_0} \ln s + 1 - s + l_0) s$

We compute the time to peak, $T$, by separating variables and integrating both sides. The peak is reached when $dl/dt = 0$, that is, when $\beta s = \sigma$, or $s = 1/R_0$. We must thus compute

$T= \int_0^T dt =\int_{1/R_0}^1 \frac{ds}{ \beta(\frac{1}{R_0} \ln s + 1 - s +l_0) s}$

We can’t do this integral in closed form, but if we set $s = 1- z$ and assume $z \ll 1$, then we can expand $\ln s \approx -z$ and obtain

$T= \int_0^T dt =\int_0^{l_p} \frac{dz}{ \beta(-\frac{1}{R_0}z + z +l_0) (1-z)}$

where $l_p = 1-1/R_0$. This can be re-expressed as

$T=\frac{1}{ \beta (l_0+l_p)}\int_0^{l_p} (\frac{1}{1-z} + \frac{l_p}{l_p z + l_0}) dz$

which is integrated to

$T= \frac{1}{ \beta (l_0+l_p)} (-\ln(1-l_p) + \ln (l_p^2 + l_0)-\ln l_0)$
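The intermediate algebra in the last two steps is a standard partial-fraction decomposition. Requiring

$\frac{1}{(l_p z + l_0)(1-z)} = \frac{A}{1-z} + \frac{B}{l_p z + l_0}$

means $1 = A(l_p z + l_0) + B(1-z)$ for all $z$. Setting $z = 1$ gives $A = 1/(l_p + l_0)$, and setting $z = -l_0/l_p$ gives $B = l_p/(l_p + l_0)$. Each term then integrates directly:

$\int_0^{l_p} \frac{dz}{1-z} = -\ln(1-l_p), \qquad \int_0^{l_p} \frac{l_p\,dz}{l_p z + l_0} = \ln(l_p^2 + l_0) - \ln l_0$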

If we assume that $l_0 \ll l_p$ and use $\beta l_p = \beta(1 - 1/R_0) = \sigma(R_0 - 1)$, then we get

$T \approx \frac{1}{\sigma} \frac{\ln (R_0 l_p^2/l_0)}{R_0 - 1}$

So $T$ is proportional to the recovery time $1/\sigma$ and roughly inversely proportional to $R_0 - 1$, as expected. However, if $l_0$ is very small (say 0.00001) and $R_0$ is moderate (say 3), then $\ln(R_0 l_p^2/l_0) \approx 11.8$ is large, giving $T \approx 5.9/\sigma$, which may explain why it takes so long for the pandemic to get started in a region. If the infection rate in a region is very low, the time it takes for a super-spreader event to make an impact could be much longer than expected: several times the infection clearance time (which could be two weeks or more), and longer still as $R_0$ approaches 1.
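As a rough numerical check, the sketch below (hypothetical function names; time in units of 1/σ with σ = 1 by default) evaluates the exact time-to-peak integral with Simpson's rule, alongside the closed form obtained after approximating $\ln s \approx -z$ and its small-$l_0$ limit. The change of variables $z = l_0(e^u - 1)$ is an implementation choice that smooths the sharp spike of the integrand near $z = 0$. Because $\ln(1-z) < -z$, the exact integrand is larger, so the exact $T$ should come out longer than the closed form.

```python
import math


def t_peak_exact(r0, l0, sigma=1.0, n=20_000):
    """Time to peak from T = ∫_0^{l_p} dz / (beta * l(z) * (1 - z)),
    with l(z) = ln(1 - z)/R0 + z + l0 taken from (*) without approximating the log.

    Composite Simpson's rule in u, where z = l0*(e^u - 1). n must be even.
    """
    beta = r0 * sigma
    lp = 1.0 - 1.0 / r0
    u_max = math.log1p(lp / l0)  # z = lp  <=>  u = ln(1 + lp/l0)

    def integrand(u):
        z = l0 * math.expm1(u)                            # z = l0*(e^u - 1)
        l = math.log1p(-z) / r0 + z + l0                  # exact infectious fraction
        return l0 * math.exp(u) / (beta * l * (1.0 - z))  # dz = l0*e^u du

    h = u_max / n
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def t_peak_closed_form(r0, l0, sigma=1.0):
    """Closed form obtained after approximating ln s by -z."""
    beta = r0 * sigma
    lp = 1.0 - 1.0 / r0
    return (-math.log(1.0 - lp) + math.log(lp**2 + l0) - math.log(l0)) / (beta * (l0 + lp))


def t_peak_approx(r0, l0, sigma=1.0):
    """Small-l0 limit: ln(R0 * lp^2 / l0) / (sigma * (R0 - 1))."""
    lp = 1.0 - 1.0 / r0
    return math.log(r0 * lp**2 / l0) / (sigma * (r0 - 1.0))


for r0 in (1.5, 2.0, 3.0):
    l0 = 1e-5
    print(f"R0 = {r0}: exact {t_peak_exact(r0, l0):.2f}, "
          f"closed form {t_peak_closed_form(r0, l0):.2f}, "
          f"approx {t_peak_approx(r0, l0):.2f}  (units of 1/sigma)")
```

For R0 = 3 and l0 = 0.00001 the small-$l_0$ formula gives about 5.9 recovery times, matching the estimate in the text.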

Addendum 2020-06-26: Fixed typos in equations and added clarifying text to last paragraph

Addendum 2020-06-27: The approximation for the total infected is not very good when $R_0$ is near 1. A better one can be obtained by expanding the exponential to quadratic order, in which case the fraction that never gets infected becomes

$s = \frac{1}{{R_{0}}^2} ( e^{R_0} - R_0 - \sqrt{(e^{R_0}-R_0)^2 - 2{R_0}^2})$

However, for $R_0$ near 1, a better expansion is to substitute $z = 1-s$ into equation (*) and obtain

$l = \frac{1}{R_0} \ln(1-z) + z + l_0$

Setting $l=0$, rearranging, and exponentiating, we obtain

$1 - z = e^{-R_0(l_0+z)}$, which can be expanded to second order in $z$ to yield

$1 - z \approx e^{-R_0 l_0}(1 - R_0 z + R_0^2 z^2/2)$

Solving this quadratic for $z$ (taking the positive root) gives the total fraction infected:

$z = (R_0 -e^{R_0l_0} + \sqrt{(R_0-e^{R_0l_0})^2 - 2 R_0^2(1-e^{R_0l_0})})/R_0^2$

This took me much longer than it should have.
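To see how well the quadratic expansion for $z$ behaves near $R_0 = 1$, a minimal Python sketch (hypothetical function names; $l_0 = 10^{-4}$ is an illustrative choice, not a fitted value) compares it with a bisection solution of $1 - z = e^{-R_0(l_0+z)}$. With these assumptions the two agree to better than $10^{-3}$ at $R_0 = 1$ and drift apart as $R_0$ moves toward 1.5.

```python
import math


def z_quadratic(r0, l0):
    """Total fraction infected from the quadratic expansion in the addendum."""
    e = math.exp(r0 * l0)
    return (r0 - e + math.sqrt((r0 - e) ** 2 - 2.0 * r0**2 * (1.0 - e))) / r0**2


def z_exact(r0, l0, tol=1e-12):
    """Root of F(z) = 1 - z - exp(-R0*(l0 + z)) in (0, 1) by bisection.

    For l0 > 0, F(0) > 0, F(1) < 0, and F is concave, so there is exactly
    one root in (0, 1).
    """
    if l0 <= 0.0:
        raise ValueError("bisection bracket assumes l0 > 0")

    def F(z):
        return 1.0 - z - math.exp(-r0 * (l0 + z))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


l0 = 1e-4
for r0 in (0.9, 1.0, 1.1, 1.3, 1.5):
    print(f"R0 = {r0}: quadratic z = {z_quadratic(r0, l0):.4f}, exact z = {z_exact(r0, l0):.4f}")
```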

## 13 thoughts on “How long and how high for Covid-19”

1. yohannes says:

the formula for the fraction of the total population infected should give zero for Ro=1. I am curious about the shape of this function just above Ro=1 since the relevant range is between 1-2. Above Ro=2 too many people are infected so mitigation will not be an effective strategy. In that case I think the most effective strategy is the Swedish one, where you split the population into vulnerable and non-vulnerable and try your best to decouple them.

2. @yohannes Even if R0 = 1 the fraction infected is not zero. At R0=1, the pandemic will still infect some small fraction before burning out. You are right that I should plot the function though.

3. yohannes says:

If Ro=1 then the number of infected doesn’t change. So the fraction of infected in the limit of large population is zero. If I plug in Ro=1 into your equation you get something like 0.4, which will be hard to interpret. There is a curve given here for the total infected: https://science.sciencemag.org/content/368/6492/713.summary

4. If R0 = 1 then I is a constant and S starts to decrease. As it decreases, I will then start to decrease and both will fall towards zero. However, the asymptotic S will not be 1, it will end up at some value. The formula is an approximation for R0 s << 1, which will not be entirely true for R0 = 1 so the number will be off and total infected is probably less but it is not zero.

5. In fact the approximation is the worst when R0=1, but a quadratic expansion won’t be too bad.

6. @yohannes You are correct. The formula breaks down when R0 is less than or equal to 1. My apologies.

7. yohannes says:

I am still puzzled. Say we take Ro=1-e where e is small. The epidemic doesn’t spread. So the fraction of people infected, in the large N limit, should be zero. Now, if your result gives a finite value at Ro=1 then there will be a discontinuous jump. So you will still have to explain why there is a jump to some finite fraction of the population. This is crucial because in most states R hovers around 1.

8. The perils of asymptotic expansions. I think the new formulation now gets the R0 near 1 behavior correct. It is now zero at R0=1 if l0 = 0.

9. yohannes says:

I wrote my comment before the addition. It looks right. The dependence on lo makes sense since near Ro=1 the total number infected will depend strongly on the initial infected.

I just wanted to point out that the fraction infected will depend strongest on Ro near R0~1. So states with Ro in the range 1-1.5 are precisely those where mitigation efforts will have the strongest effect on the number infected. According to https://rt.live/ most states are sitting in the range 0.8-1.2. This is precisely the regime where behavioral changes will have the most effect on the epidemic progression.

When Ro gets close to 2 then more than 80% are infected and behavioral changes (unless they are substantial) will not have much of an effect.

10. yohannes says:

one last comment. Since the expression for z is unwieldy it is useful to expand in lo, which is the smallest parameter in the problem. Then z=(2/Ro^2) (Ro-1) which can be compared to the herd immunity threshold P=(1/Ro)(Ro-1) . So the total fraction of people infected, compared to the herd immunity threshold, is roughly a factor of 2. So essentially half the total number of people infected will be infected after herd immunity is reached.

11. […] In order for an infectious disease (e.g. COVID-19) to spread, the infectious agent (e.g. SARS-CoV-2) must jump from one person to another. The rate of this happening depends on the rate that an infectious person will come into contact with a susceptible person multiplied by the rate of the virus making the jump when the two people are nearby. The reproduction number R is obtained from the rate of infection spread times the length of time a person is infectious. If R is above one then a single person will infect more than one person on average and thus the pandemic will grow. If it is below one, then the pandemic will diminish. Herd immunity happens when enough people have been infected that the rate of finding a susceptible person becomes low enough that R drops below one. You can find the math behind this here. […]

12. […] like . The infection rate also appears in the initial reproduction number . From a previous post, I derived approximate expressions for how long a pandemic would last and show that it scales as […]
