Q&A with UCSF's Kim Klausner: Drug industry practices should be an open book
Perhaps more than anyone who has ever written about ghostwriting in medical literature, Kim Klausner knows where the bodies are buried. Klausner is the Industry Documents Digital Libraries Manager for the University of California-San Francisco, which means she is in charge of the Drug Industry Documents Archive, a collection of thousands of documents that detail how the drug industry has used continuing medical education and medical literature to help market its products.
Klausner has been an archivist for 13 years, primarily in community-based, nonprofit archives, such as the Labor Archives & Research Center, the Western Jewish History Center and the Gay, Lesbian, Bisexual, Transgender Historical Society. She spent two years watching and cataloging tobacco industry videotapes (focus groups, commercials, internal corporate meetings, etc.) for the Legacy Tobacco Documents Library, which is also based at UCSF. For the past three years, while managing the tobacco library, she has developed DIDA.
Q: How did the DIDA get off the ground?
A: There were two UCSF physicians who had been expert witnesses in a trial about the off-label marketing of Neurontin. They were familiar with our Tobacco Industry Document Archive. Now they had access to all these pharmaceutical industry documents and felt it was important for others to have access as well. They approached the library and said, "Could you create a searchable database of documents?" So we did. It was 1,000 documents. On average, we estimated that a document is about four or five pages. There are a lot of one-page documents and then there are longer documents. It wasn't that much compared to our tobacco library, which has more than 10 million.
Q: What is the purpose of the archive?
A: Making accessible previously secret documents from the pharmaceutical industry to expose practices that are detrimental to public health. We're a health sciences campus, so we are primarily focused on that area.
Q: Did you use the same process as you had with the tobacco documents?
A: A similar process. We modeled it on the British American Tobacco Document Archive. That was the software that was the most up to date at the time. There was full-text searching for the documents themselves and you could construct Boolean queries on the metadata. We funded it initially with a gift from one of the plaintiff's lawyers, Thomas Green. So we were able to set it up with those funds. Shortly thereafter we added some documents from the Committee on Government Reform. And then it pretty much stayed that way for a couple of years.
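As a rough illustration of what "full-text searching" combined with "Boolean queries on the metadata" can look like, here is a minimal Python sketch. It is not DIDA's software: the `Document` fields, the helper names and the three sample records are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Document:
    # Hypothetical record layout, not DIDA's actual schema.
    doc_id: str
    title: str
    doc_type: str
    author: str
    text: str  # OCR'd full text

Predicate = Callable[[Document], bool]

def field_is(name: str, value: str) -> Predicate:
    """Match when a metadata field equals value (case-insensitive)."""
    return lambda d: getattr(d, name).lower() == value.lower()

def text_contains(phrase: str) -> Predicate:
    """Match when the full text contains phrase (case-insensitive)."""
    return lambda d: phrase.lower() in d.text.lower()

def all_of(*preds: Predicate) -> Predicate:   # Boolean AND
    return lambda d: all(p(d) for p in preds)

def any_of(*preds: Predicate) -> Predicate:   # Boolean OR
    return lambda d: any(p(d) for p in preds)

def negate(pred: Predicate) -> Predicate:     # Boolean NOT
    return lambda d: not pred(d)

def search(docs: List[Document], query: Predicate) -> List[str]:
    """Return the ids of documents that satisfy the query, in input order."""
    return [d.doc_id for d in docs if query(d)]

# Made-up sample records for illustration only.
DOCS = [
    Document("doc-0001", "Publication plan", "memo", "Marketing",
             "Draft manuscript for journal submission"),
    Document("doc-0002", "CME budget", "spreadsheet", "Marketing",
             "Continuing medical education grant totals"),
    Document("doc-0003", "Author agreement", "memo", "Legal",
             "Manuscript authorship terms"),
]

# type = memo AND text contains "manuscript" AND NOT author = Legal
memo_query = all_of(
    field_is("doc_type", "memo"),
    text_contains("manuscript"),
    negate(field_is("author", "Legal")),
)
print(search(DOCS, memo_query))
```

A real archive would use a search index rather than scanning every record, but the way the AND, OR and NOT pieces combine is the same idea.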
Q: So how did the new raft of documents end up coming your way?
A: PLoS had petitioned the court to unseal these documents, and, literally the day before The New York Times story was going to appear, PLoS called us and said, "How would you like to add these documents to DIDA?" So then we needed to do the work.
Q: What type of software did you use to convert the documents into a searchable format?
A: For the OCR-ing (optical character recognition) we use software by a company called ABBYY. You have to license it. It's for large-scale OCR jobs. The software we use for the archiving is based on open source software that we have customized. It's not something that people can go out and buy. It requires each document to have a unique file name, and lawyers don't necessarily name their documents this way. We had to rename each document by hand. And we needed to have some metadata, like title, date, type of document, author and those kinds of things.
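The unique-file-name requirement can be pictured with a short Python sketch. Klausner says the renaming was done by hand, so this is only one way such a step could be scripted, not the archive's process. The `doc-000001` naming pattern and the function name are assumptions made for the example.

```python
from pathlib import PurePath
from typing import List, Tuple

def assign_unique_names(original_names: List[str],
                        prefix: str = "doc",
                        width: int = 6) -> List[Tuple[str, str]]:
    """Pair each original file name with a unique sequential name.

    Returns a list of (original, new) pairs rather than a dict, because
    the original names may repeat. The file extension is kept, lowercased.
    Nothing on disk is touched; this only plans the renaming.
    """
    renamed = []
    for index, name in enumerate(original_names, start=1):
        suffix = PurePath(name).suffix.lower()
        renamed.append((name, f"{prefix}-{index:0{width}d}{suffix}"))
    return renamed

# Made-up input: two files share a name, which a flat archive cannot store.
for old, new in assign_unique_names(["Exhibit A.pdf", "Exhibit A.pdf", "memo.PDF"]):
    print(f"{old} -> {new}")
```

Keeping the original name alongside the new one preserves a trail back to the file as it was received.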
Q: How did you decide what needed to be in the metadata and what could be left out?
A: We have certain fields, like an organization field and a people mentioned field. We were given a really short turnaround. So in racing to do it we just didn't fill in all the fields. That's not how we like to do it. They all should be filled in. At a minimum we do have a title, date, type, unique identifying number and author. Those would be the absolute minimum. But we weren't always able to fill out the people and organizations mentioned.
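The split Klausner describes, a minimum set of fields that must be filled in plus fields that were sometimes left blank, can be sketched as a simple completeness check. The field names below are hypothetical stand-ins for the ones she lists, and the sample record is invented.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names standing in for the fields named in the interview.
REQUIRED_FIELDS: Tuple[str, ...] = ("title", "date", "type", "id", "author")
OPTIONAL_FIELDS: Tuple[str, ...] = ("people_mentioned", "organizations_mentioned")

def is_blank(value: Any) -> bool:
    """Treat None, empty or whitespace-only strings, and empty collections as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set)):
        return len(value) == 0
    return False

def missing_fields(record: Dict[str, Any], fields: Tuple[str, ...]) -> List[str]:
    """List the fields that are absent from the record or blank."""
    return [f for f in fields if is_blank(record.get(f))]

def check_record(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report which required and which optional fields still need filling in."""
    return {
        "missing_required": missing_fields(record, REQUIRED_FIELDS),
        "missing_optional": missing_fields(record, OPTIONAL_FIELDS),
    }

# Invented record: the minimum is present, the "mentioned" fields are not.
record = {
    "title": "Publication plan",
    "date": "1998-03-02",
    "type": "memo",
    "id": "doc-000001",
    "author": "Marketing",
    "people_mentioned": [],
}
print(check_record(record))
```

A report like this would let a rushed batch go online with the minimum in place while keeping a list of records to revisit later.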
Q: Were there some documents that you could not make searchable?
A: They all could be OCR'd, but the quality of the OCR is very dependent on the quality of the characters in the document. These were all documents from the 1990s and the 2000s, so they were relatively recent and in pretty good condition. Sometimes typewritten documents are hard to OCR.
Q: Do you track who is using the archive?
A: Through Google Analytics. So, actually, I can get some information about that. But I'm not at liberty to say.
Q: Forget about actual names of people. How many people have started coming to the site after the new documents were added?
A: We definitely saw an increase in usage. In the past it's been about 2,000 people a month. But just in the past two weeks we have had 1,250 visitors. Some people come and look at one page and leave and other people spend more time, a lot more time in some cases.
Q: Have you gotten many calls about it?
A: No. Actually there's been very little interaction between us and people using the site, unfortunately, from my perspective.
Q: Have you had any complaints from the companies or the doctors mentioned?
A: No.
Q: Does that surprise you?
A: No. It doesn't surprise me. We do get people calling up or emailing us every so often about the tobacco documents, because their name is mentioned somewhere and they're not happy that the document is up there for the world to see. But in this case, probably a lot of the doctors don't know that we have it. The pharmaceutical companies have definitely found us, and I know that they're looking at it. But we don't put stuff up there to embarrass anyone.
Q: Having written a lot about doctors who have gotten into trouble, I have had calls from doctors saying, "Why is that up there?" and "You better take it down," even when most of what I am writing is based on public records.
A: We haven't had that yet with DIDA. I think we are flying under the radar right now.
Q: Other than The New York Times, what sort of coverage have you been getting?
A: Kris Hundley at the St. Petersburg Times in Florida is a great health reporter who has written some stories using DIDA. She searched for Florida doctors and did an exposé about this doctor and his involvement in ghostwriting. The Toronto Star did an exposé about a woman doctor up there who is also prominent. The Milwaukee Journal Sentinel did an article about continuing medical education at the University of Wisconsin.
Q: Is there anything that you have in the archive that is only available if you are physically on campus?
A: No. It's all digitally available.
Next week: Klausner talks about the gems she's found while mining the archive.