Know Your Source: Where Do Health Data Originate?

Published on December 3, 2012

When the National Cancer Institute contacted me about speaking to reporters on ways to cover cancer and other diseases more completely, my mind immediately went to the data.

We see health headlines every day, but we rarely stop to think about the data on which those headlines rest.

Over a series of posts, I’m going to try to break down some of the broad categories of data, discuss how they are used and point out some of their limitations. I’m going to cover four main areas: vital statistics, censuses, surveys, and estimates.

There are many, many data sources, locally, nationally and internationally. They range from international agencies to governments to businesses to academic centers to non-profit organizations.

Where do you get these data?

There are archives and repositories that specialize in specific slices of data. Here are a few examples:

IPUMS at the University of Minnesota houses census data from as far back as 1850. IPUMS, or Integrated Public Use Microdata Series, has a slogan that would make the Justice League of America proud: “Use it for good, never for evil.” The New York Times used it quite well earlier this year when it created an interactive map allowing you to enter your household income and find out “how you rank in 344 zones across the country.” I found out that if I moved back to Montana, I would be much more elite than I am now in Seattle.

The Inter-university Consortium for Political and Social Research (ICPSR) has an archive of social science data. Last week it posted new data from a survey of high school seniors that covers a dizzying array of topics:

Drugs covered by this survey include tobacco, smokeless tobacco, alcohol, marijuana, hashish, prescription medications, over-the-counter medications, LSD, hallucinogens, amphetamines (stimulants), Ritalin (methylphenidate), Quaaludes (methaqualone), barbiturates (tranquilizers), cocaine, crack cocaine, GHB (gamma hydroxy butyrate), ecstasy, methamphetamine, and heroin. Other topics include attitudes toward religion, changing roles for women, educational aspirations, self-esteem, exposure to drug education, and violence and crime (both in and out of school).

The Simple Online Data Archive for Population Studies (SodaPop) at Penn State has a wide range of data available, including one of the best-titled surveys I have ever come across: Assets and Health Dynamics of the Oldest-Old. Here’s how SodaPop describes it:

Assets and Health Dynamics of the Oldest-Old (AHEAD) is a national survey of community-based Americans born in 1923 or earlier. It is sponsored by the National Institute on Aging. The focus of the AHEAD survey is to understand the impacts and interrelationships of changes and transitions for older Americans in three major domains: health, financial, and family. The first wave of data was collected from October, 1993 through July, 1994 by the Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan.

The Institute for Health Metrics and Evaluation (IHME), where I work, is also building a massive data catalog called the Global Health Data Exchange.

Next I’ll write more about vital statistics. That means babies! And, well, dead people, too. So much of what we know about health starts with how we track those two end points.

Image by Jennifer Pahlka via Flickr