What I learned from surveying hospitals about their use of algorithms in health care

Published on August 24, 2021

A few months after the start of the pandemic, I noticed several hospitals were touting tools to predict which patients might have COVID-19, or which of their patients were at risk of getting sickest from the disease. It made me wonder: Who is checking to ensure these systems work as intended?  

When I first pitched this project to my editor, my goal was to piece together whether these systems were yielding accurate results, and whether they were prone to bias. The challenge, as I later learned, was that most of these algorithms had not been subject to any FDA review, and I would have to create my own dataset if I wanted to track how they were being used in hospitals.

I reached out to 90 hospitals for structured interviews about their use of predictive models during the pandemic. Of the 26 that responded, about half were using these software tools to help care teams make critical decisions about prioritizing care resources. A group of outside researchers had also flagged several of these tools as having a high risk of bias, and hospitals’ practices for evaluating them varied widely.

With many of our readers working for either health systems or startups, I hoped this reporting would help raise awareness that these tools need to be evaluated more carefully.

In fact, many unregulated software tools are currently being used in health care, even in patient care. This area is ripe for future stories: models are being used to predict whether a patient is likely to be admitted to the ICU, develop sepsis, or even be a “no show” for an appointment. Insurance companies, pharmacy benefit managers, and other large organizations are also developing and implementing predictive tools.

Occasionally, these tools come under scrutiny in research papers. Just recently, researchers found that a sepsis prediction model developed by Epic Systems didn’t work as well as the company had claimed.

More scrutiny is needed of this growing industry, but sometimes it can be difficult to know where to start. Here are some of the questions and tips I found most helpful as I set out to survey hospitals on their use of predictive algorithms:

Any algorithm is worth looking at — not just AI tools 

When I started this project, I wasn’t sure whether to focus my article narrowly on machine learning tools or on algorithms in general. But as one source pointed out to me, both are prone to bias.

While machine learning tools are notorious for being “black boxes,” meaning it’s difficult to know exactly what led to a given result, simpler formulas can still cause problems when poorly designed. One example is Stanford’s algorithm, which deprioritized some frontline doctors for the first COVID-19 vaccines because of how it factored in age and job roles. Policy decisions, such as California’s vaccine equity algorithm and the way COVID-19 aid was distributed to hospitals across the U.S., are further examples of imperfect models having real-world effects.
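To make that concrete, here is a purely hypothetical point-based score, written only to illustrate the failure mode and not based on Stanford’s actual formula: fixed weights on age and on having an assigned unit can end up outweighing actual exposure.

```python
# Purely hypothetical point-based prioritization score (an illustration, not
# Stanford's actual formula): a transparent rule that weights age and having
# an assigned unit heavily can still push young frontline residents, who
# rotate across units, to the back of the vaccine line.
def priority_score(age: int, has_assigned_unit: bool, unit_case_count: int) -> float:
    score = 0.0
    score += age * 0.5                           # older staff accrue more points
    score += 20.0 if has_assigned_unit else 0.0  # rotating residents get nothing here
    score += unit_case_count * 0.1               # exposure proxy tied to the unit
    return score

# A 62-year-old administrator with an assigned office outranks a 29-year-old
# resident who rotates through COVID-19 wards:
print(priority_score(age=62, has_assigned_unit=True, unit_case_count=5))    # 51.5
print(priority_score(age=29, has_assigned_unit=False, unit_case_count=40))  # 18.5
```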

Don’t just ask how a model works — ask what it’s trying to optimize

As I prepared to start surveying hospitals, I sought feedback from a few sources on what would be the most important questions to ask before jumping in. Two of the most critical questions for me were: What was the purpose of the algorithms they had implemented, and what were they trying to optimize?

This made the difference between a hospital telling me that they had developed a tool to stratify patients based on risk, and telling me that they were using that tool to prioritize higher-risk COVID-19 patients for an at-home monitoring program.

It can also be important to know exactly how these terms are being defined. For example, an algorithm sold by Optum that was used to flag high-risk patients for follow-up treatment defined risk based on the cost of care rather than a patient’s actual health, and as a result it assigned Black patients lower risk scores.
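A toy simulation, not Optum’s model or data, can show how much the choice of label matters: if one group incurs lower costs at the same level of illness (for example, because of barriers to accessing care), ranking patients by cost will flag fewer of them as high risk.

```python
# Hypothetical simulation of proxy-label bias: two groups with identical
# illness burden, but group B generates lower costs at the same illness level.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

group = rng.choice(["A", "B"], size=n)
illness = rng.normal(loc=5.0, scale=1.0, size=n)       # true health need
cost = illness * np.where(group == "A", 1.0, 0.7) \
       + rng.normal(0.0, 0.5, size=n)                  # observed spending

df = pd.DataFrame({"group": group, "illness": illness, "cost": cost})

# Flag the top 10% of patients as "high risk" under each definition of risk.
for label in ("illness", "cost"):
    flagged = df[df[label] >= df[label].quantile(0.90)]
    share_b = (flagged["group"] == "B").mean()
    print(f"risk defined by {label}: group B share of flagged = {share_b:.2f}")
```

In this toy setup, group B makes up about half of the patients flagged under the illness definition, but its share drops sharply under the cost definition, even though its underlying need is identical.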

Ask questions about how an algorithm has been evaluated

Because the data used to develop these models shapes their accuracy, it’s important to test algorithms across different patient demographics and locations. For example, a model developed using patient data in San Francisco might not be as accurate when implemented at a hospital in Cleveland.

While hospitals should verify that a tool works for their specific patient population before implementing it, many don’t have a process in place to ensure this happens.

Even among FDA-cleared tools, this is still a concern. A recent review of 130 FDA-cleared AI algorithms found that most studies didn’t disclose whether the model had been evaluated at multiple sites, and a handful had been evaluated at only one site. Only 17 of the studies reported that the device had been evaluated across different patient demographics.
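For hospitals that do want to check a tool against their own patients, the analysis doesn’t have to be elaborate. Below is a minimal sketch of breaking a model’s performance out by subgroup; the column names, the example grouping, and the 0.5 flag threshold are illustrative assumptions rather than any specific hospital’s or vendor’s setup.

```python
# Minimal local-validation sketch: score a deployed model's predictions on a
# hospital's own patients and compare performance across subgroups.
import pandas as pd
from sklearn.metrics import roc_auc_score

def evaluate_by_subgroup(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Expects 'y_true' (observed 0/1 outcome) and 'y_score' (the model's
    predicted risk); returns AUC and flag rate per subgroup."""
    rows = []
    for name, sub in df.groupby(group_col):
        rows.append({
            group_col: name,
            "n": len(sub),
            "auc": roc_auc_score(sub["y_true"], sub["y_score"]),
            "flag_rate": (sub["y_score"] >= 0.5).mean(),  # assumed threshold
        })
    return pd.DataFrame(rows)

# Usage (hypothetical dataframe of local patients):
# report = evaluate_by_subgroup(local_patients, group_col="race")
# Large AUC gaps between subgroups, or between local results and the vendor's
# published numbers, are a signal the model needs recalibration before use.
```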

Understand when regulations come into play 

Devices where software makes medical decisions, such as pacemakers, still have to go through the full FDA approval process.

Software tools that rely on a doctor or another medical professional to make the actual decision are often regulated by the FDA as class II medical devices, meaning a company only needs to show that its product is “substantially equivalent” to an existing device. Some examples of cleared devices include software that flags potential stroke cases in CT scans, or software in Apple Watches and other consumer devices that flags potential arrhythmias.

But none of the software tools I learned about in my survey had been reviewed by the FDA at all. Although they were used to help doctors make critical decisions, such as which COVID-19 patients should get more monitoring at home or who might need to be transferred to the ICU, they weren’t subject to these regulations because the software wasn’t being marketed.