GPT-4 is hurtling into medical records. Why aren’t more journalists covering this?

Published on
May 1, 2023

Did you agree to give all your most private medical information to GPT-4? I sure didn’t. But Microsoft and OpenAI’s GPT-4 are already going through someone’s data. Is it yours?

Last week, the health care software company Epic announced that it would give Microsoft and GPT-4 access to patient records in order to automatically draft replies to patients and to improve the ability of health systems to analyze their own patient records to cut costs and improve productivity.

This announcement is different from many other types of data sharing or AI training partnerships that have been in the news recently. Giving these tools access to patient records because “It’s probably okay” or “Let’s try this and see how it goes” is not what we do in health care. 

Health care is highly regulated, and there are extensive laws, regulations and ethics to consider when it comes to patient data. These all apply even if data is presumably “anonymized” or “de-identified,” as often claimed.

Epic’s announcement should be prompting lots of hard questions from reporters and legal experts. Here are some of the most basic concerns worth investigating in the wake of this announcement:

Does this violate HIPAA? The landmark health privacy law passed in 1996 has several parts to it. One is that your information cannot be shared with outside organizations or people without your consent. It is not clear from this Epic announcement if any of the patients whose data is being used in these new Microsoft initiatives gave their consent. 

You may be thinking, but can’t people use anonymized datasets without violating HIPAA? They can, but only under certain conditions. One condition is that it must benefit the patient — for example, for a quality improvement project. But you have to be able to prove that giving access to this sensitive information benefits the patient. Also, you’re not allowed, under HIPAA, to just let someone cruise through private information and check things out to see what they might be able to do, especially if that access is motivated by cutting costs, one of the factors cited by Epic’s CEO in the news release. 

Cost-cutting may be an allowable reason for data usage under HIPAA, but historically those cuts are also supposed to benefit patients and result in better care or improved access. It’s not clear if that is happening here, or if it’s merely a profit-maximizing effort. What is the presumed payoff for patients, and is there any kind of oversight or monitoring to make sure it happens? In other words, who benefits from this sharing of data, and are there communities that may be more vulnerable or at risk of being taken advantage of?

It’s also not clear how much data trawling has already taken place. How much have Microsoft and GPT-4 already seen and learned about patients from their records in order to assess this partnership’s potential and profitability prior to this announcement? Health care journalists should be asking these very questions right now.

Can GPT-4 figure out who you are? AI is, in essence, a massive, autonomous and unregulated data broker, created to cross-link information and then generate responses to queries based on its “large language model.”

Epic’s data-exploration tool SlicerDicer, which will now receive a boost from GPT-4, supposedly works on an anonymized subgroup of patients. However, data brokers can easily identify who you are by cross-referencing publicly available or purchased sets of data about you. Doing the same would likely be a piece of cake for GPT-4, which has “learned” from the internet. If you are a person with Stoneman Syndrome (which affects one in a million people) and GPT-4 finds your file, can it figure out who you are? Can it identify you and your circle of family and friends from all the people who searched for Stoneman Syndrome on the internet? Can it figure out locations based on the institution that gave its Epic data to GPT-4?

The critical question here is what safeguards or firewalls are being put up to ensure that these AI tools don’t exceed their ostensibly narrow mandates and end up violating HIPAA privacy rules on an unthinkable scale. These are questions journalists should already be asking these companies.

What is the FDA doing here? We can say, based on the sometimes error-filled and hallucinatory responses of artificial intelligence, that the jury is still out when it comes to predicting with certainty what it can and will do. By definition, if you don’t know the outcome of what you’re doing, it is an experiment. There are strict rules that are supposed to be followed before patient data is shared for experimental reasons. Even a paper checklist had to be shown to be safe and effective before it could be used in patient care. Health-tracking apps, such as pulse and oxygen monitors, were only allowed onto watches and phones after many years of robust and ongoing discussions with regulators about their accuracy and safety. Since the FDA was deeply involved in those patient information tools, it’s also important to know whether the agency is involved in this partnership.

Why such a rapid rollout? One detail of the announcement is that the new AI tools will be embedded in Epic to help practitioners generate automatic replies to patients. Doctors and other health providers have been writing thousands and thousands of notes for years as part of their training. If, after all that practice, they still need help to write one, they can just go online and ask for a generic AI-generated note without giving the platform any of the patient’s private details. They don’t have to do it in Epic.

Does this latest announcement mean that GPT-4 is training its language model on your personal health information, instead of the AI simply “helping” a clinician? If so, given that more and more organizations are now asking for compensation for use of their data to train an AI, are you owed compensation if your data was used? We clearly need more and better information in these areas, and smart reporting would be an essential first step.

There has been remarkable silence from the media and from regulatory and governmental agencies since this announcement. And there is a remarkable lack of information available. It is important for us, as a society, to ask these questions before silence is taken as consent. Journalists, the ball is now in your court.

Dr. Jan Gurley is a physician, public health leader, and a member of the Center for Health Journalism’s advisory board. The opinions expressed here are those of Dr. Gurley and not her employer, the San Francisco Department of Public Health.