Tired of being spoon-fed data points? A few tips for hunting health data in the wild

Published on December 7, 2016

Health data has become relatively plentiful and easy to obtain in the past several years. One big reason is the open data movement in government and the nonprofit world. Another is the Affordable Care Act, which included several data-driven mandates for the health care industry. Both have helped bring a new sense of transparency to an often-opaque business.

Much of the available data exists in a “scorecard” format and is designed foremost for consumers. A good example is the nonprofit Cal Hospital Compare.

As a data journalist, let me state my bias: I do not like scorecards. I’m greedy. I want all the data. I want it raw. I want to analyze it myself. (Did you notice all the I’s in that paragraph? You’ve got to have an ego to spend hours staring at a computer.)

So let me lead you into temptation, away from manicured gardens and into the wild where you might get into trouble — and where you can learn a lot about how health care really works.

First stop: the California Health and Human Services Open Data Portal. This is the public data center for California’s gargantuan Health and Human Services Agency. On the home page you’ll find nine blue buttons, including one that says “All Data.” Click on that and you’ll be taken to a page with a menu to the left and 10 summaries to the right. At the bottom of the page is a note indicating (as I write in early December) that these are the first 10 of 391 results available.

The Health and Human Services Portal uses a common bit of database magic called “Boolean logic.” If you remember ninth-grade algebra, you’re good to go. It works like this: Go to the menu on the left and click something. We’ll start by clicking on the “View Types” item “Datasets.” After a moment, an “X” appears next to the word Datasets, and the result count at the top of the page drops to show only datasets.


Now go back to the menu on the left, look at the section at the top labeled “Categories” and click on the item “Healthcare.” Another “X” appears, and the count at the top changes again. This time we have just 53 results. This is Boolean logic in action: each filter is joined to the previous ones with an AND, so a dataset has to match every item you’ve checked to stay on the list. If we wish, we could narrow our search still further, say for patient discharge data or hospital quality. But with only 53 results, it takes just a few minutes to scroll through the records and go bargain hunting.
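If it helps to see the filtering spelled out, here is a minimal Python sketch of the same AND logic. The catalog entries, their titles, and the field names `view_type` and `category` are made up for illustration; they are not the portal’s actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    title: str
    view_type: str  # e.g. "Datasets" (hypothetical field name)
    category: str   # e.g. "Healthcare" (hypothetical field name)


# Invented catalog entries, for illustration only.
CATALOG = [
    Dataset("Example: hospital readmissions", "Datasets", "Healthcare"),
    Dataset("Example: ER visits by payer", "Datasets", "Healthcare"),
    Dataset("Example: child care facilities", "Datasets", "Social Services"),
    Dataset("Example: hospital locations map", "Maps", "Healthcare"),
]


def filter_catalog(catalog, **criteria):
    """Keep only entries that match every criterion (a logical AND).

    With no criteria, everything passes; each added criterion can only
    shrink the result, never grow it.
    """
    return [
        d for d in catalog
        if all(getattr(d, field) == value for field, value in criteria.items())
    ]


step1 = filter_catalog(CATALOG, view_type="Datasets")
step2 = filter_catalog(CATALOG, view_type="Datasets", category="Healthcare")
print(len(CATALOG), len(step1), len(step2))  # 4 3 2
```

Each click on the portal’s menu plays the role of one more keyword argument here: the list can only get shorter.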

In just the first two pages I find All-Cause Unplanned 30-day Hospital Readmission Rates, County Medi-Cal Certified Eligible Counts by County, and, an old favorite of mine, Emergency Department Data by Expected Payer Source.



Hospitals are always complaining that their ERs are financial black holes. So why not dig into the numbers and find out what’s really going on? The data is just sitting there for free. And it’s available in formats you can download into Excel or any database manager (CSV, CSV for Excel, tab-separated).
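Once you’ve downloaded the file, a reasonable first pass is to work out what share of each emergency department’s visits falls under each payer. The sketch below assumes a CSV with columns named `facility`, `payer` and `visits`; those names are hypothetical, so rename them to match the header in the file you actually download. The sample rows are invented.

```python
import csv
from collections import defaultdict


def payer_shares(lines, delimiter=","):
    """Return {facility: {payer: share of that facility's visits}}.

    Assumes hypothetical columns 'facility', 'payer' and 'visits';
    rename them to match the real file's header.
    """
    totals = defaultdict(int)
    by_payer = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(lines, delimiter=delimiter):
        visits = int(row["visits"].replace(",", ""))  # tolerate "1,234"
        totals[row["facility"]] += visits
        by_payer[row["facility"]][row["payer"]] += visits
    return {
        facility: {payer: n / totals[facility] for payer, n in payers.items()}
        for facility, payers in by_payer.items()
        if totals[facility] > 0  # skip facilities reporting zero visits
    }


# Invented sample rows, for illustration only.
SAMPLE = """facility,payer,visits
Hospital A,Medi-Cal,600
Hospital A,Private,300
Hospital A,Self-pay,100
"""

shares = payer_shares(SAMPLE.splitlines())
print(shares["Hospital A"]["Medi-Cal"])  # 0.6
```

Passing `delimiter="\t"` handles the tab-separated download, and an open file (`open(path, newline="")`) works in place of the list of lines.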

The richest source of material within the California health care world is OSHPD, the Office of Statewide Health Planning and Development. OSHPD collects an enormous variety of statistics, particularly on hospital finances. Very few state governments can match OSHPD, and as far as I can tell, none exceed it in the quality and depth of its financial data.

Pro tip: If you do health data, and if you’ve got a background in business journalism, plan to spend a lot of time mining OSHPD data.

The Big Kahuna of the health care world is, of course, Medicare. And for health data, that means the place to go, or rather, the places to go, are data.medicare.gov and data.cms.gov.

The former is distinctly friendlier and easier to navigate. It’s home to the justly famous Hospital Compare, a set of dozens of downloadable datasets on the nation’s hospitals. Each is updated several times a year. Be sure to check the “About” button at the far right of each screen for the “updated” date. Also check the “Data Updates” file, which lists each of the nearly 200 measures in Hospital Compare and the date it was last updated.
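If you track dozens of measures, it can be worth checking the “Data Updates” file programmatically rather than by eye. A minimal sketch, assuming the file has been saved as CSV with columns named `measure_id` and `last_updated` in MM/DD/YYYY form; both column names, the date format and the placeholder measure IDs below are assumptions, so check them against the real file.

```python
import csv
from datetime import date, datetime


def stale_measures(lines, cutoff, date_format="%m/%d/%Y"):
    """Return (measure_id, last_updated) pairs older than `cutoff`, oldest first.

    Column names 'measure_id' and 'last_updated' and the date format are
    assumptions; adjust them to the actual file.
    """
    stale = []
    for row in csv.DictReader(lines):
        updated = datetime.strptime(row["last_updated"].strip(), date_format).date()
        if updated < cutoff:
            stale.append((row["measure_id"], updated))
    return sorted(stale, key=lambda pair: pair[1])


# Invented rows with placeholder IDs, for illustration only.
SAMPLE = """measure_id,last_updated
MEASURE_A,10/27/2016
MEASURE_B,03/15/2016
MEASURE_C,12/10/2015
"""

for measure_id, updated in stale_measures(SAMPLE.splitlines(), cutoff=date(2016, 7, 1)):
    print(measure_id, updated.isoformat())
# MEASURE_C 2015-12-10
# MEASURE_B 2016-03-15
```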

A few points to keep in mind:

  • Many of these measures are controversial. Yes, everyone agrees that hospital-acquired infections (HAIs) are bad. But plenty of hospital administrators (and some researchers) will tell you that Medicare uses imprecise methods to count HAIs, and that some hospitals are being wrongfully fined as a result.
  • Medicare is issuing fines (and some bonuses) based on these measures. So these measures have a real impact on the hospitals. We are no longer talking purely, or mostly, about consumer education. Medicare is using this data as a blunt instrument.
  • Be careful in using these measures to compare hospitals. The readmission measures in particular are adjusted for patient mix – and that’s not an adjustment you can make on your own.
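To see why patient mix matters, here is a toy example with invented numbers. Two hospitals readmit low-risk and high-risk patients at exactly the same rates, yet their crude readmission rates differ because one treats far more high-risk patients. The observed-to-expected ratio below is a simple indirect standardization with made-up reference rates, not Medicare’s method; Medicare’s adjustment relies on its own statistical models and patient-level data you won’t have, which is why you can’t reproduce it yourself.

```python
# Toy numbers, invented for illustration: identical care, different patient mix.
# Reference readmission rates per risk group (assumed, not Medicare's).
REFERENCE_RATES = {"low_risk": 0.10, "high_risk": 0.30}

# For each risk group: (discharges, readmissions).
HOSPITALS = {
    "Hospital A": {"low_risk": (900, 90), "high_risk": (100, 30)},
    "Hospital B": {"low_risk": (200, 20), "high_risk": (800, 240)},
}


def crude_rate(strata):
    """Readmissions divided by discharges, ignoring who the patients were."""
    discharges = sum(n for n, _ in strata.values())
    readmits = sum(r for _, r in strata.values())
    return readmits / discharges


def observed_to_expected(strata, reference=REFERENCE_RATES):
    """Observed readmissions over those expected from the reference rates."""
    observed = sum(r for _, r in strata.values())
    expected = sum(n * reference[group] for group, (n, _) in strata.items())
    return observed / expected


for name, strata in HOSPITALS.items():
    print(f"{name}: crude {crude_rate(strata):.0%}, O/E {observed_to_expected(strata):.2f}")
# Hospital A: crude 12%, O/E 1.00
# Hospital B: crude 26%, O/E 1.00
```

Ranked by crude rate, Hospital B looks more than twice as bad, even though it does exactly as well with each kind of patient.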

The other Medicare data portal, data.cms.gov, is organized much like the state site, with Boolean logic in mind. It had 291 documents available as of early December, including 185 datasets. This is the place to go if you want to venture beyond the safe confines of Hospital Compare. Here, for example, you can find Medicare payment summaries for the Top 100 Diagnosis-Related Groups in fiscal year 2013 at each of 3,000 hospitals nationwide.
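With a file like the DRG payment summaries, one easy mistake is to average the hospitals’ average payments directly, which gives a small rural hospital the same weight as a big urban medical center. Here is a sketch of a discharge-weighted average instead, assuming columns named `DRG Definition`, `Total Discharges` and `Average Total Payments`, with dollar amounts stored as text like `$5,777.24`. Treat those column names and formats as assumptions to check against the file you download; the sample rows are invented.

```python
import csv
from collections import defaultdict


def parse_money(text):
    """Convert a string such as '$5,777.24' to a float."""
    return float(text.strip().lstrip("$").replace(",", ""))


def weighted_avg_payment_by_drg(lines):
    """Discharge-weighted average total payment per DRG across hospitals.

    Column names are assumed; check them against the header of the
    file you download.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames:
        # Strip stray spaces around header names, defensively.
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
    paid = defaultdict(float)
    discharges = defaultdict(int)
    for row in reader:
        drg = row["DRG Definition"]
        n = int(row["Total Discharges"].replace(",", ""))
        # A hospital's average times its discharges recovers its total.
        paid[drg] += n * parse_money(row["Average Total Payments"])
        discharges[drg] += n
    return {drg: paid[drg] / discharges[drg] for drg in paid if discharges[drg] > 0}


# Invented sample rows, for illustration only.
SAMPLE = '''DRG Definition,Total Discharges,Average Total Payments
EXAMPLE DRG,10,"$1,000.00"
EXAMPLE DRG,30,"$2,000.00"
'''

print(weighted_avg_payment_by_drg(SAMPLE.splitlines()))  # {'EXAMPLE DRG': 1750.0}
```

An unweighted mean of the two hospitals would say $1,500; weighting by discharges says $1,750. Neither is wrong, they answer different questions (the typical hospital versus the typical case), so say which one you’re reporting.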

Good stuff, if you’re ready to dive into the weeds.

[Photo by Mike Tigas via Flickr.]