Tips for untangling COVID-19 data from WSJ data ace Paul Overberg

Published on July 23, 2020

Over the past several months, data on COVID-19 has been used by governments around the world to order businesses to close and people to stay inside their homes, and has been a bedrock feature of news reports on the pandemic.

But that doesn’t mean the numbers are always presented in their proper contexts.

Veteran data journalist Paul Overberg of The Wall Street Journal shared tips for cleaning up coronavirus data, particularly as it relates to demographics, with fellow reporters at the 2020 National Fellowship this week.

“This is about untangling all the disparities we see and trying to get a handle on them,” he said. “This is no simple laboratory experiment. This is the real world.”

Overberg urged reporters to look beyond the raw numbers in order to identify more accurate trends. He calls much of the data “noisy”: “There’s a signal in there, and we just have to sort of get past the noise and focus on the signal.”

“It doesn’t mean we can’t do our job,” he said of working with imperfect data. “We just have to work a little harder, think a little harder and write a little more clearly to help the public understand what they need to understand.”

For instance, he said, COVID-19 testing numbers are not representative of the prevalence of infection in the general population, because the people who were tested were the ones who showed up to do so. Hospitalization figures are “a lot firmer,” he said, because individuals are only admitted to the hospital if they’re ill. He called deaths “the great equalizer,” but said even those numbers may be underreported because some states haven’t been counting probable deaths.

Ferreting out COVID-19 disparities is another area where it helps to dig deeper, Overberg said. He noted that certain people have been more adversely affected by COVID-19, citing factors such as their age, sex, race, health status, income, education, and maybe even their genetic code.

For example, he said, people who are poorer and less educated are more likely to be “the ones who can’t isolate at home. They live in more crowded housing. They may not understand the intricacies of what a virus is about and how it propagates and things like that. Job exposure is also a significant factor.”

“These all wrap together,” he added. “You can’t just say let’s look at the race breakdown ... and let’s look at the age breakdown.”

He said it’s “simplistic” to break down deaths by race within a given state. That’s because some cities have been harder hit by the coronavirus and tend to have different demographic makeups than other areas in those states.

As a result, the Centers for Disease Control and Prevention now releases location-adjusted data. As of July 8, Overberg showed, those numbers indicate that whites make up 60% of the overall population but only 42% of the “weighted” population (places that are coronavirus hotspots), while for Blacks those numbers are 13% and 17%, respectively. That data suggests that Blacks may be more likely to die from COVID-19 in part because they live in areas where it is more prevalent.
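One way to picture a location-weighted population share is sketched below. The article does not spell out the CDC's method, so this assumes each county's demographic mix is counted in proportion to that county's share of COVID-19 deaths. The county figures and the `weighted_share` helper are hypothetical, made up only for illustration.

```python
# Hypothetical counties: (COVID-19 deaths, {group: share of county population}).
# These numbers are invented for illustration and are not CDC data.
counties = [
    (900, {"white": 0.35, "black": 0.25}),  # hard-hit urban county
    (80, {"white": 0.70, "black": 0.10}),
    (20, {"white": 0.90, "black": 0.02}),
]


def weighted_share(counties, group):
    """Population share of `group`, weighting each county by its share of deaths.

    Assumed method: a county with 90% of the deaths contributes 90% of the
    weighted population mix, regardless of how many people live there.
    """
    total_deaths = sum(deaths for deaths, _ in counties)
    return sum(deaths / total_deaths * shares[group] for deaths, shares in counties)


white_weighted = weighted_share(counties, "white")
black_weighted = weighted_share(counties, "black")
print(f"white: {white_weighted:.1%}, black: {black_weighted:.1%}")
```

A group's share of deaths can then be compared against its weighted share rather than its share of the state or national population.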

The CDC also releases age-adjusted COVID-19 data, Overberg said. This is helpful for comparing races and ethnicities because Hispanic and Latino Americans tend to be younger than their Black and white counterparts, though both minority groups are more likely to die from the coronavirus at younger ages. Age-adjustment is a way to make groups with different age distributions comparable, by using a “standard” population as a baseline for comparison.
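The idea behind age adjustment can be sketched with direct standardization: compute each group's death rate within each age band, then average those rates using the age mix of a shared “standard” population. The age bands, standard shares and group counts below are hypothetical, and the CDC's actual bands and standard population may differ.

```python
# Hypothetical standard population: share of people in each age band (sums to 1).
standard = {"0-44": 0.60, "45-64": 0.25, "65+": 0.15}

# Hypothetical groups: age band -> (deaths, population). Not real data.
younger_group = {"0-44": (40, 800_000), "45-64": (120, 150_000), "65+": (200, 50_000)}
older_group = {"0-44": (10, 500_000), "45-64": (60, 300_000), "65+": (400, 200_000)}


def crude_rate(group):
    """Deaths per 100,000 people, ignoring age structure."""
    deaths = sum(d for d, _ in group.values())
    population = sum(p for _, p in group.values())
    return deaths / population * 100_000


def age_adjusted_rate(group, standard):
    """Deaths per 100,000 if the group had the standard population's age mix."""
    return sum(
        standard[band] * (deaths / population) * 100_000
        for band, (deaths, population) in group.items()
    )


for name, group in [("younger", younger_group), ("older", older_group)]:
    print(name, round(crude_rate(group), 1), round(age_adjusted_rate(group, standard), 1))
```

In this made-up example the younger group has the lower crude rate (36.0 versus 47.0 per 100,000) but the higher age-adjusted rate (83.0 versus 36.2), because its members die at higher rates within every age band.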

Overberg showed the overall COVID-19 death rates of 33 people per 100,000 for whites, 67 for Blacks and 35 for Hispanics and Latinos. But when the death rates are adjusted for age, they become 26, 88 and 64 per 100,000, respectively. So, the disparities are greater than the unadjusted rates suggest.
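To see how much the gap widens, the rates Overberg cited can be turned into ratios against the white rate. This is a quick arithmetic sketch using only the figures quoted above; the `ratios_to_white` helper is a hypothetical name.

```python
# Death rates per 100,000 as cited in Overberg's presentation.
crude = {"white": 33, "black": 67, "hispanic": 35}
adjusted = {"white": 26, "black": 88, "hispanic": 64}


def ratios_to_white(rates):
    """Each group's rate divided by the white rate, rounded to two decimals."""
    return {
        group: round(rate / rates["white"], 2)
        for group, rate in rates.items()
        if group != "white"
    }


print(ratios_to_white(crude))     # {'black': 2.03, 'hispanic': 1.06}
print(ratios_to_white(adjusted))  # {'black': 3.38, 'hispanic': 2.46}
```

Before adjustment, the Black death rate is about twice the white rate and the Hispanic and Latino rate is nearly equal to it. After adjustment, the ratios rise to roughly 3.4 and 2.5.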

Overberg said journalists can crunch these numbers and then focus their reporting on the question: “Does policy reflect what we pretty much know now, and is the government responding appropriately?”

He suggested some possible story ideas along those lines: Is testing available at the locations and times most appropriate for its target audience? Is testing being publicized at the right times and places and in the right languages? Have cities set up quarantine and isolation facilities where they’re needed? Are governments mandating sick leave? Is telehealth offered for free? Is tracking and tracing adequate?

“It’s pretty well-documented that there are these disparities,” Overberg said. “What is your local government actually bringing to bear with that knowledge, to actually make policy work for the known facts?”

In addition, he recommended reporters check out COVID-related disparity data from the CDC, National Academies of Sciences, Health Affairs, U.S. Department of Veterans Affairs and Harvard University.

“I know this can be a dry subject, but it’s an important one,” he said. “Demographics are always important. You realize it even more when you see all the ways it plays out in something that’s literally life and death.”