The Power of Small Data: Why you probably don’t need ‘big data’ for your stories

Picture of William Heisel

I started crunching numbers for reporting using a spreadsheet.

And spreadsheets seemed good enough for many of the data-driven stories that I did for the first 10 years of my career. But then people started talking about “big data.” The term showed up on health reporting blogs and in health reporting newsletters. It became the topic of books. Panels were convened around the term at conferences.

I thought I needed “big data” to take my reporting to the next level, and this was one of the reasons my project on family courts faltered, as I wrote in a previous post.

What I did not understand at the time was that big data was beyond my ken, because I didn’t really understand what big data meant. Here are two useful definitions.

The first comes from the McKinsey Global Institute. The consulting firm put together a guide to big data in 2011 called “Big Data: The next frontier for innovation, competition, and productivity.” I would encourage you to read it, but only if you’re writing about big data. (The point of this piece is to let you know that you really don’t need or want to try manipulating big data for your stories.)

The McKinsey definition starts with this basic premise:

Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

Let’s pause on that for a minute. If typical database software tools are not able to adequately capture and analyze big data, how can you possibly hope to do any work with it using an Excel spreadsheet? Even Microsoft Access has a 2 gigabyte size limit per database.

But, more importantly, big data is getting bigger. As McKinsey notes:

As technology advances over time, the size of datasets that qualify as big data will also increase. Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).

Again, the chances that you will be working with data that amounts to 24 terabytes or more are fairly slim. (But please send me examples of stories that you have done that made use of data in that size range. I’d love to see them!)
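To put that gap in perspective, here is a rough back-of-the-envelope sketch. It assumes "a few dozen terabytes" means two dozen at the low end, treats a terabyte as 1,024 gigabytes, and uses the 2 gigabyte Access limit mentioned above. The function name is just an illustration.

```python
# Back-of-the-envelope arithmetic: how many 2 GB Access databases would it
# take to hold the low end of McKinsey's "big data" range?
# Assumptions: "a few dozen terabytes" is read as 24, and units are binary
# (1 TB = 1024 GB).

GIGABYTE = 1024 ** 3
TERABYTE = 1024 ** 4

ACCESS_LIMIT_BYTES = 2 * GIGABYTE         # per-database limit cited in the text
LOW_END_BIG_DATA_BYTES = 24 * TERABYTE    # two dozen terabytes


def databases_needed(total_bytes, limit_bytes):
    """Return how many databases of size limit_bytes are needed (rounded up)."""
    return -(-total_bytes // limit_bytes)  # ceiling division with integers


n_databases = databases_needed(LOW_END_BIG_DATA_BYTES, ACCESS_LIMIT_BYTES)
print(n_databases)
```

Under those assumptions, even the smallest dataset that McKinsey would call big data would have to be split across thousands of separate Access files.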

So why do so many reporters – including myself a few years ago – think that harnessing big data will finally give them that big story they have been hoping for?

Gil Press wrote an excellent shades-of-William-Safire exploration of the origins of the term for Forbes. He offered a variety of possible definitions. My favorite is this one:

A new attitude by businesses, nonprofits, government agencies, and individuals that combining data from multiple sources could lead to better decisions.

That’s where reporters should focus. It’s not so much about the size. It’s about the linkages. It’s about finding new insights by combining data from different disciplines. It’s about cross-walking between datasets that may – at first glance – seem to have nothing to do with each other.
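Here is a minimal sketch of what that kind of linkage can look like with small data. It is an illustration only: the county names, the numbers, and the function names are all invented, and it assumes both datasets share a column (here, `county`) that can serve as the link.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
# In practice these would be two small files from different sources.
HOSPITALS_CSV = """county,hospital_beds
Adams,120
Baker,45
Clark,300
"""

POPULATION_CSV = """county,population
Adams,60000
Baker,15000
Dale,8000
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_on_key(left_rows, right_rows, key):
    """Keep only rows whose key appears in both datasets, merged together."""
    right_by_key = {row[key]: row for row in right_rows}
    linked = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


def beds_per_1000(linked_rows):
    """Compute hospital beds per 1,000 residents for each linked county."""
    return {
        row["county"]: round(int(row["hospital_beds"]) * 1000 / int(row["population"]), 2)
        for row in linked_rows
    }


linked = link_on_key(read_rows(HOSPITALS_CSV), read_rows(POPULATION_CSV), "county")
rates = beds_per_1000(linked)
for county, rate in rates.items():
    print(county, rate)
```

Neither file says much on its own. Linked together, they produce a rate that can be compared across counties, and the counties that drop out because they appear in only one file (Clark and Dale in this made-up example) are often worth a question of their own.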

I did not fully realize the power of doing this – at a large scale or small – until I took a job at the Institute for Health Metrics and Evaluation at the University of Washington in 2009. What IHME does is often referred to as big data for global health. With good reason. We are pulling together data on everything that might affect health – from death records to surveys to population censuses – from 188 countries in hundreds of languages and data layouts.

IHME was created to provide independent, rigorous scientific measurements to accelerate progress on health. It was founded with two basic principles in mind: 1. Everyone deserves to live a long life in full health; 2. To improve health, we need better health evidence.

To generate that evidence, the research center had to figure out how to gather more data than had ever been gathered before. And it had to figure out how to tie those wildly different datasets together to create meaningful estimates of levels and trends in health. (There have been hundreds of publications to date explaining all of the ways we do that, so I won’t get into that here.)

Having a big data warehouse in Seattle doesn’t do much good, though, if no one is using it to improve health. This year, I took an eye-opening trip to Rwanda where I saw exactly how small slices of big data could be used to make a real world difference.

Next: How one country used small data to make a big change.

Related post

The Power of Small Data: Lessons learned from a number-crunching career

Photo by Buy Tourism Online via Flickr.


I just read in another story about the clusters of anencephaly in Washington State and the frustratingly inadequate research being done regarding that problem. It's hard to believe that Texas would be doing anything better than Washington but there's the proof. Great stories, especially if read together.

