Forecasting infectious diseases can help public health officials fight epidemics. But how accurate can these predictions be?

Samuel Scarpino, an assistant professor in Northeastern’s Network Science Institute, analyzed 25 years of data for 10 different infectious diseases to determine how far in advance epidemics can be predicted. Photo by Adam Glanzman/Northeastern University

Whooping cough in Texas. Measles in New York. Ebola in the Democratic Republic of Congo.

Around the world, public health officials are increasingly relying on disease forecasts to predict and fight outbreaks. But there’s a limit to how accurate these forecasts can be, says Samuel Scarpino, an assistant professor in the Network Science Institute at Northeastern.

In a recent paper published in Nature Communications, Scarpino and a colleague at the Institute for Scientific Interchange analyzed 25 years of data for 10 different infectious diseases. What they found might seem counterintuitive: as they added more data, the diseases became harder to predict.

“What we see for all these different infectious diseases is, as you see more data, you actually become increasingly less certain about what’s going on,” says Scarpino, who also has appointments in Northeastern’s marine and environmental sciences, physics, and health sciences departments.
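
One way to put a number on "how predictable" a case series is uses permutation entropy, which scores how irregular the ups and downs of a time series are. The article does not say which measure the researchers used, so treat the sketch below as an illustration under that assumption rather than a description of their method. The function name, the `order` parameter, and the weekly counts are all hypothetical.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3):
    """Normalized permutation entropy in [0, 1].

    0 means the series always moves in the same ordinal pattern (very
    regular); 1 means every pattern of length `order` is equally common
    (as irregular as possible).
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    if len(series) < order:
        raise ValueError("series is shorter than the pattern length")

    # Reduce each window of `order` consecutive values to its ordinal
    # pattern: the indices of the window sorted by value.
    patterns = Counter(
        tuple(sorted(range(order), key=lambda i: series[start + i]))
        for start in range(len(series) - order + 1)
    )
    total = sum(patterns.values())

    # Shannon entropy of the pattern frequencies.
    entropy = sum((n / total) * math.log(total / n) for n in patterns.values())

    # Divide by the maximum possible entropy, log(order!).
    return entropy / math.log(math.factorial(order))


if __name__ == "__main__":
    # Hypothetical weekly case counts, invented for illustration only.
    weekly_cases = [3, 5, 9, 14, 22, 19, 12, 8, 11, 6, 4, 9, 15, 13, 7, 5]

    # Score longer and longer stretches of the same series.
    for weeks in (6, 10, 16):
        score = permutation_entropy(weekly_cases[:weeks])
        print(f"first {weeks} weeks: permutation entropy = {score:.2f}")
```

A score that climbs as the window grows would correspond to the pattern Scarpino describes: the longer the record, the less regular it looks.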

The problem is that outbreaks are inconsistent: People behave differently, a particular disease strain is unexpectedly virulent, vaccines are developed. One year’s outbreak won’t necessarily follow the same rules as the ones before it or the ones two years down the road.

The researchers also found that certain characteristics make some diseases harder to forecast than others. The flu, for example, spreads at unpredictable rates. Usually, a sick individual infects only one or two others, but if the virus gets into a school, it can suddenly spread very quickly.

“Compare that to, say, a disease like measles or pertussis,” Scarpino says. “In a population that is largely unvaccinated and hasn’t been exposed before, it’s going to spread like wildfire. And that’s much easier to forecast over longer ranges.”

This doesn’t mean we can’t have accurate predictions, Scarpino says. But it does mean that researchers can’t rely exclusively on old data.

“There’s going to be some window beyond which you can’t make accurate forecasts without gathering more data and retraining your models,” Scarpino says.

He likens it to weather forecasting. Nobody expects meteorologists to accurately predict a hurricane several months in advance. But during hurricane season, researchers can update their models with current data on sea surface temperatures, pressure systems, and wind speeds to determine the strength of a storm and where it might hit next week.

Disease forecasting requires a similar approach. The researchers found that, given good data in the first few weeks, models could reliably be expected to predict the spread of a disease over the course of a single outbreak.
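
As a rough illustration of forecasting from the first few weeks and then updating as new counts arrive, the sketch below fits a simple exponential growth curve to early case counts and refits it each week. This is a deliberately simple stand-in, not the researchers' model: exponential growth is an assumption that holds only early in an outbreak, and the function names and case counts are hypothetical.

```python
import math


def fit_exponential_growth(weekly_cases):
    """Least-squares fit of log(cases) = intercept + rate * week.

    Returns (intercept, rate). Assumes early, roughly exponential growth.
    """
    n = len(weekly_cases)
    if n < 2 or any(c <= 0 for c in weekly_cases):
        raise ValueError("need at least two weeks of positive counts")

    weeks = list(range(n))
    logs = [math.log(c) for c in weekly_cases]
    mean_week = sum(weeks) / n
    mean_log = sum(logs) / n

    # Ordinary least-squares slope and intercept on the log scale.
    covariance = sum((w - mean_week) * (l - mean_log) for w, l in zip(weeks, logs))
    variance = sum((w - mean_week) ** 2 for w in weeks)
    rate = covariance / variance
    intercept = mean_log - rate * mean_week
    return intercept, rate


def forecast(weekly_cases, weeks_ahead):
    """Project the fitted curve forward for `weeks_ahead` weeks."""
    intercept, rate = fit_exponential_growth(weekly_cases)
    n = len(weekly_cases)
    return [math.exp(intercept + rate * (n + k)) for k in range(weeks_ahead)]


if __name__ == "__main__":
    # Hypothetical weekly counts, invented for illustration only.
    observed = [12, 19, 33, 51, 70, 88]

    # Refit every week using only the data available at that point,
    # then compare the one-week-ahead forecast with what was observed.
    for week in range(3, len(observed)):
        predicted = forecast(observed[:week], 1)[0]
        print(f"week {week}: forecast {predicted:.0f}, observed {observed[week]}")
```

Refitting on each new week of data is the point of the design: a curve fitted once and never updated drifts away from the outbreak as conditions change, which is the window Scarpino describes.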

That means stopping future epidemics will require a continuous, worldwide effort to collect and analyze data.

“For each outbreak, we’re going to have to have computational biologists, physicists, mathematicians, statisticians, data scientists, et cetera, that will train models on data, make forecasts, and continually refine those,” Scarpino says. “Essentially, exactly what happens with the National Weather Service.”
