GenAI, You, and Medicine: A long history with new opportunities and risks

The blog post, list of top quotes, and beginner’s guide are a summary of the Producer’s Cut transcript drafted by aiChat, Vanderbilt University Medical Center’s generative artificial intelligence tool built and managed by a team of VUMC Informatics, IT, operational and research professionals working in partnership with Microsoft. A human reviewed the writing for accuracy, content and hallucinations. VUMC’s aiChat launched internally this summer.

In the first episode of season four of the DNA podcast, experts discuss the exciting and potentially concerning developments in the use of generative AI models, such as ChatGPT, in medicine and other fields.

While the technology has been around for some time, the recent advancements in deep learning have allowed computers to create things and mimic human behavior in ways that were previously impossible.

The panelists discuss the potential benefits of these models, such as more efficient healthcare and better patient education, but also the risks of privacy breaches and bias. They stress the importance of establishing guardrails and educating people on the limitations of these tools to ensure they are used responsibly. The experts also share their own experiences using generative AI models, including using ChatGPT to improve decision support tools in healthcare.

The use of AI in healthcare has potential benefits, such as reducing alert fatigue and personalizing care for specific patient subsets. However, there are also concerns about the accuracy and reliability of AI-generated suggestions. Additionally, the use of AI in healthcare raises issues around data security and privacy, as well as the potential for bias in the algorithms. The hype around AI in healthcare may not always reflect the reality of the technology’s limitations and challenges. Ultimately, the management and use of AI in healthcare will need to be carefully considered to ensure that personal rights are protected.

The panelists also emphasize the importance of educating the public about the limitations and potential risks of ChatGPT, and the need to be vigilant about misinformation and deep fakes. Overall, the panelists agree that there is a great opportunity for the technology to revolutionize healthcare, but caution and oversight are necessary to ensure its safe and effective use.

Beginner’s Guide to Key Concepts

  1. Generative AI Models: The term refers to a type of artificial intelligence model that can create new content, such as text, images, or videos, that did not previously exist. It does this by analyzing patterns and relationships in a large dataset and then generating new content based on those patterns.
  2. Bias in Data: Bias in data refers to the presence of systemic and unfair differences in the data that can lead to discriminatory outcomes. Biases can be introduced into the data through the selection of the data itself or through the way the data is collected, processed, or analyzed.
  3. Decision Support Tools: Decision support tools are software applications that provide healthcare providers with information and knowledge to help them make better decisions about patient care. These tools are designed to provide relevant information at the point of care and can include clinical guidelines, alerts, and reminders.
  4. Electronic Health Records (EHRs): Electronic health records are digital versions of a patient’s medical history, which include information about their medical conditions, treatments, and medications. EHRs are designed to be accessible to healthcare providers and patients, and can be used to track patient outcomes, improve patient care, and support medical research.

Top Quotes identified by aiChat

“The more we’re able to help people understand how these tools actually work, the less we’ll run into people using them in inappropriate ways and getting information that they find to be unreliable.” – Yaa Kumah-Crystal, MD, MPH, MS, Associate Professor of Biomedical Informatics and Pediatric Endocrinology at Vanderbilt University

“And we recently took a lot of our decision support logic and actually ran it through ChatGPT to see if it had any suggestions for us. And we were really pleasantly surprised it actually did come up with a variety of ways to improve the logic of our decision support.” – Adam Wright, PhD, FACMI, FAMIA, FIAHSI, Professor of Biomedical Informatics and Medicine at Vanderbilt University Medical Center and director of the Vanderbilt Clinical Informatics Center

“The question is now under what conditions is the computer trustworthy? And partially because of the fact that it’s trying to mimic human behavior, humans are fallible, and computers are fallible as an artifact. And so you really have to start asking questions around under what conditions do I trust the outcome and output of this machine?” – Brad Malin, PhD, Accenture Professor of Biomedical Informatics, Biostatistics, and Computer Science, as well as Vice Chair for Research Affairs in the Department of Biomedical Informatics

“We have to be vigilant at all ages, at all levels of education and common sense about misinformation, about deep fakes.” – Holly Fletcher, MS, Director of Media Equity and Emerging Platforms, moderator

Read Full Transcript

DNA: Discoveries in Action Season 4 Episode 1 Transcript

Brad Malin: I think that we’re in the fourth AI bubble, if we had to count them. We’ve seen these over time and what ends up happening is that after the bubble bursts, there are some things that survive. And those things that survive have been battle tested as a result. So the reason why I bring that up is if you look at the way in which the revolution, or as I’m referring to it, the bubble, is growing, it sounds as though the singularity has occurred to the point where we have a new breakthrough that is just going to solve all the world’s ills. And the question is, under what circumstances is that actually true? You will not know that from the moment that it occurs.

Michael Matheny: The other thing that is really concerning, but it’s hard to get a handle around, is that these generative AI models learn from the body of evidence you give them. Well, we know that there’s a lot of data that’s overrepresented and underrepresented, so there are plenty of racial and ethnic cultures that don’t have equal representation in the electronic, publicly available internet data.

Yaa Kumah-Crystal: I think the touchpoint between these large language models and education has been one of the most fascinating things to see evolve, with the USMLE and the bar exam demonstrating that these models have the capability to pass them, or at least report answers that match what the expectation should be, which is pretty impressive.

Adam Wright: We have several hundred people who work in our health IT department that are building order sets and decision support tools and forms for documentation. And I think that a lot of that work could be made much more efficient with generative tools like ChatGPT. So I’d be interested in seeing could we use these tools to actually generate tools in our EHR that make our EHR work better?

Clark Buckner: Welcome back, DNA listeners. I’m Clark Buckner and I’m so excited to share the first producer’s cut of a new season. Season four of the DNA podcast is all about the influences and instruments shaping medicine, work, and our wellbeing, not to mention technological, generational, economic, and cultural elements. To get a glimpse of what’s ahead, we’re asking a lot of smart, energetic people how they harness these variables into opportunity and vision. And what better way to start than listening to experts on artificial intelligence, medicine, research, and biomedical informatics talk about the excitement and risks of generative AI like ChatGPT. We had an amazing turnout for the Twitter Spaces conversation and I’m amped to be able to share it with you today. Let’s get settled and get exploring.

Holly Fletcher: All right, here we go. I am Holly Fletcher and I am the director of media equity and emerging platforms at Vanderbilt University Medical Center. And just a quick bit about me, so you know where I’m coming from on this. I am a journalist by genetic coding, so as soon as ChatGPT was launched to the public, it captured my imagination and attention for a variety of reasons. First, plagiarism. If you are using text from ChatGPT to answer some very basic background context, who’s to say that someone else is not using that? And what kind of questions does that bring up? I am interested from the accuracy perspective. We are going to hear today about some “hallucinations” and what all that means. And then, of course, I’m always interested in whether this will stifle or whether this will spur some creativity. And efficiency. There are so many rote things in our everyday jobs that this could help us do.

And this technology has the potential to impact all of that and so much more. And it turns out that after talking to our experts that we have with us today, even though they come from such different backgrounds from me, we have a biomedical ethicist, we have a pediatric endocrinologist who does voice EHR stuff. And we’ve got so many other experts, but yet, they have the same questions and the same visions of concern and opportunity. And they actually understand the backside of this technology, which is what we are here to talk about today.

So with that said, we are going to kick this off and we are going to hear from Dr. Michael Matheny first, as he talks about where we are with this technology right now.

Michael Matheny: Yeah. No, thank you. And it’s a pleasure being invited to speak and I’m looking forward to the exciting discussion with my co-panelists. So I think people actually don’t realize that some version of machine learning and artificial intelligence has really been around for a long time. One of the most famous risk models, the Framingham cardiovascular or heart disease risk calculator, has been around since the ’80s. And a lot of people don’t realize that because it was translated effectively onto a paper list to basically calculate manually for many years when it was used clinically. And so even though you don’t see them, there’ve actually been some of these tools that have been in use for years and years, although they’ve gotten a whole lot more exciting in recent years of course.

Holly Fletcher: Could you tell us about the vision that you see right now and how you have responded to all of the news and hype that has been everywhere, all the time?

Michael Matheny: Yeah, no, that’s a really good question. It’s been really exciting. My background is in primary care clinically and I’ve been doing research in artificial intelligence and machine learning for many years and applications of these tools. And to see the power and the capacity of some of the recent deep learning and generative models, just the things that they can do to help your everyday life is really exciting. At the same time, though, it’s been a little scary to see potential privacy breaches and potential bias and other sorts of things that I’m sure this group will get into and discuss in much more detail during our hour together.

Holly Fletcher: Brad, what about you? Let’s have you pop in and tell us a little bit about your background and why this is so interesting at this point.

Brad Malin: Sure. So I’m Brad Malin. I’m the Accenture professor of biomedical informatics, biostatistics and computer science at Vanderbilt. But I also have an affiliation with the Center for Biomedical Ethics and Society and run a lot of projects that are based on figuring out how to be ethically responsible with respect to investigations. I think we’re standing at a really important point in time. It’s not that artificial intelligence has just become popular; I think Matheny, Michael, was correct in that this is a concept that’s been around for many years. The situation now is that it’s really started to become more generative in its nature. It’s gotten to the point where it’s not just predicting a particular outcome, like whether somebody is at risk for a heart attack, let’s say.

It’s turned into a situation where the computer is going to create things and is going to make recommendations or create concepts where we don’t know whether or not this is real. It’s become very good at mimicking human behavior in many different situations, everything from programming a computer to doing an interpretation of a medical record. And so the question is now under what conditions is the computer trustworthy? And partially because of the fact that it’s trying to mimic human behavior, humans are fallible, and computers are fallible as an artifact. And so you really have to start asking questions around under what conditions do I trust the outcome and output of this machine? That’s a societal question. That reaches beyond the technology and it starts asking the question of what are the guardrails that we want in place and under what conditions are we willing to trust information that’s being communicated in a manner where we think that it isn’t completely trustworthy?

Holly Fletcher: Yaa, you and I have had various discussions and emails about this and some exciting conversations. So from your vantage point, what is exciting and what is the vision for how this could radically impact life?

Yaa Kumah-Crystal: I am just so excited to be part of this conversation because, just as the other colleagues were saying, this technology isn’t new. As someone who works with the electronic health record, what’s fascinating, and I hope is a little revolutionary for others, is that this became very accessible and easy to use. You could log in and just get going. The simple interface gave you a wealth of useful information back. And based on that, there are so many ways you can use this to just 10x the work you’re doing. From a medical perspective, I think there are some really intriguing ways we can use the output from tools with these large language models to help educate people about their conditions and help translate complex medical information into information that’s more accessible and readable.

The common concern and complaint that’s raised about these large language models and all the generative tools is that it’s not clear what it is they’re doing in the background and people might be misunderstanding and thinking there’s more understanding going on than what is actually happening, which is just a representation and mimicking of what it was trained to do. And I think the more we’re able to help people understand how these tools actually work, the less we’ll run into people using them in inappropriate ways and getting information that they find to be unreliable.

It’s almost as if you’re trying to eat cereal with a fork and say, “Oh, this doesn’t work.” Well, if you know the tool that you’re using and pick the right tool, then you can accomplish great things. These are language models for the most part, like ChatGPT; I think Brad alluded to other tools for image generation, et cetera. But they’re supposed to be here to help you compose information, put words together that make sense. That doesn’t necessarily mean you’re going to get factual information. I think the more people understand that and the limitations, the more we can actually leverage them to do what they’re made to do.

Holly Fletcher: Adam, I am curious to hear from you for a second about, first of all, your interest and also what your team has been trying with ChatGPT recently, and some pretty interesting results.

Adam Wright: Yeah, absolutely. So it’s exciting to be here. My name is Adam Wright, I’m a professor of biomedical informatics here at Vanderbilt. I direct the Vanderbilt Clinical Informatics Center and also have some responsibility for clinical decision support here at Vanderbilt. And so if you think about the entire complex that is the electronic health record, one of the reasons that we use an EHR is because we think we can help people make better decisions. And so for decades we have been developing and delivering clinical decision support. Much of it today is sort of Boolean logic, decision tree based. So something that compares medications on your medication list to identify potential drug interactions, or that looks at your age and your risk factors to identify which screening tests might make sense for you. And we’ve made some efforts in the past to use some elements of AI or machine learning to improve those things.
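
To make that “Boolean logic, decision tree based” style of decision support concrete, here is a minimal, hypothetical sketch in Python. The interaction pairs and screening thresholds below are illustrative placeholders, not VUMC’s actual rules or clinical guidance.

```python
# Illustrative sketch only: toy Boolean decision-support rules of the kind
# described above. The knowledge base and thresholds are placeholders.

INTERACTING_PAIRS = {frozenset({"warfarin", "ibuprofen"})}  # placeholder knowledge base

def drug_interaction_alerts(med_list):
    """Return an alert for each known interacting pair found on the medication list."""
    meds = {m.lower() for m in med_list}
    alerts = []
    for pair in INTERACTING_PAIRS:
        if pair <= meds:
            a, b = sorted(pair)
            alerts.append(f"Possible interaction: {a} + {b}")
    return alerts

def screening_reminders(age, risk_factors):
    """Suggest screening tests from simple age/risk-factor rules (illustrative only)."""
    reminders = []
    if age >= 45:
        reminders.append("Consider colorectal cancer screening.")
    if "smoker" in risk_factors and 50 <= age <= 80:
        reminders.append("Consider lung cancer screening (low-dose CT).")
    return reminders

if __name__ == "__main__":
    print(drug_interaction_alerts(["Warfarin", "Ibuprofen", "Metformin"]))
    print(screening_reminders(age=57, risk_factors={"smoker"}))
```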

So often classifiers or prediction algorithms that guess which patients might be at risk for a particular disease. We actually recently published a paper looking at using ChatGPT to analyze the knowledge bases of decision support that we have. So we have over 800 decision support tools that we use here at Vanderbilt, and it’s a big challenge to keep those up to date as guidelines change, new medications come on the market, rare clinical conditions are seen for the first time. And so we have a decision support team that’s responsible for maintaining that knowledge, which consists of physicians, nurses, pharmacists and informaticians. And we recently took a lot of our decision support logic and actually ran it through ChatGPT to see if it had any suggestions for us. And we were really pleasantly surprised it actually did come up with a variety of ways to improve the logic of our decision support.

In many cases, sort of making it narrower or identifying subsets of patients. For example, patients with organ transplants or patients who are immunosuppressed or patients with certain diseases where guidance might not be appropriate or guidance might need to be tailored. And the beauty of this is twofold. Number one, it should reduce the burden of alerting in the electronic health record. We call that alert fatigue. And so the hope is that by getting rid of some alerts that were inappropriate, we can make it less frustrating for doctors and nurses who use the EHR. And then it also works towards our vision or mission of personalized care. So we’re finding subpopulations of patients who need modified suggestions. And so this helps us make our care a little bit more personal and a little bit more accurate. What’s interesting though is that we showed the suggestions that ChatGPT made to clinicians. We actually had clinicians review the logic of the alerts themselves to see if they could make improvements and ChatGPT performed just a little worse than the humans.

But people were inclined to accept many of the suggestions. It also made a few suggestions that weren’t quite right. And so I heard you mention hallucination. At one point it said to find patients on a medication called Etenfergot; there is no such medication. It just sort of hallucinated there, confabulated that medication out of thin air. We think it might have been trying to suggest a similarly named medication, Etanercept, but that was a surprise to us. And then it also recommended that we consider the risk of postoperative nausea and vomiting in patients that were taking antidepressants, especially SSRI medications. And there’s actually no known link between those two. There might be some evidence in reverse. So some of the suggestions that it made were not factual or not helpful. So we found it to be really important to have humans screen the output before making any changes.
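
As a rough illustration of the workflow Adam describes, the sketch below sends existing alert logic to a language model and queues every suggestion for human review. The call_llm helper is a stand-in assumption for whatever approved chat-completion client an institution uses, not VUMC’s implementation, and nothing here is applied automatically.

```python
# Hypothetical sketch: ask a model to critique decision-support logic and
# queue its suggestions for clinician review. Nothing is auto-applied.

from dataclasses import dataclass

@dataclass
class ReviewItem:
    rule_name: str
    suggestion: str
    status: str = "pending human review"

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call; wire up an approved endpoint."""
    raise NotImplementedError

def review_rule(rule_name: str, rule_logic: str) -> ReviewItem:
    """Ask the model to critique one alert's logic and queue the answer for clinicians."""
    prompt = (
        "You are reviewing clinical decision support logic. Suggest ways to "
        "narrow or improve this alert, and flag patient subsets (for example, "
        "transplant or immunosuppressed patients) it may not fit:\n" + rule_logic
    )
    suggestion = call_llm(prompt)
    # Suggestions can be wrong or hallucinated (e.g., nonexistent drug names),
    # so they only enter a review queue; clinicians decide what, if anything, to adopt.
    return ReviewItem(rule_name=rule_name, suggestion=suggestion)
```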

Holly Fletcher: One thing that I’ve been watching over the last several months is the drip, drip, and then a bit more of a flow, of journals and some other news agencies being transparent or trying to grapple with how ChatGPT as a writing function or tool fits into scholarly work. And so far, if I’m recalling correctly, most of it has come down as ChatGPT is not a valid author. And so that gets me to thinking, because so much of research and medicine and the pursuit of knowledge, just like journalism, is writing and thinking through ideas. How do you think about the use of a tool like this to either push forward and maybe get something on a blank page, or do you think it needs to be kept to some other part of the writing process?

Yaa Kumah-Crystal: I think that ChatGPT and these other language generation tools are a really good aid in starting the writing process. If you’ve ever asked it to generate an article for you or something, a speech, an email, you’ll find that it does a remarkably good job compared to nothing. But it’s still a little mediocre compared to really good writers. You can ask it to imitate the style of Stephen King or the style of Maya Angelou and it does a really good imitation of them. But hands down, I’d rather read the original. So if your aim is to have mediocre-style writing, then you’ll stop at that. But I think we aspire for more than that.

So if nothing else, they can help give you a starting point that you can build off of, to help massage and sculpt ideas and generate new ideas from. And that’s one of the things I think is really interesting with regards to being able to generate ideas, and where do ideas come from? I don’t know if you’ve ever had discussions with someone else where you’re trying to find a word and you’re like, oh, what’s the word for this? And you iterate a few times and they throw out some words and you throw out some words and then you find a different word. It wasn’t even the word you were looking for, but another one that was better, and you arrive at that word because you were bouncing ideas off each other.

And where ideas come from is the experience of iterating through. And to have this tool available to you, where you’re able to iterate through on demand, on your own timeframe, and just flesh out ideas and the process of editing, I think is just such a remarkable tool for our learners coming into the world. And honestly, to be competitive in the next few years, they’re going to be expected to know how to use these tools. And more than just being the people who are generating the content, they need to learn how to be editors. They need to learn how to comb through the content that’s generated for them and figure out how to turn something that’s moderate, mediocre into something that’s amazing and phenomenal.

Michael Matheny: Yeah, I think you make a great point. I think I’ll choose a bit of a counterpoint here, in that I think some of what you’re describing is de novo new generation by the tool, but some of what you’re describing is editing, where you have the ideas, you have the outline, you give it some information and then the tool helps you craft it. And I feel like there’s a risk; there’s been a lot of discussion around homogeneity of thought and reduction in creativity that might occur if you let the tool generate too much for you, rather than effectively use the tool: generate your ideas, outline it, spec it out, and then have it help you fill in the gaps or give you alternatives.

And so I think there’s actually a little bit of a risk from an educational and a thought-training perspective to let it do too much of the early stages. I think certainly it is entirely invaluable to help you fill in the gaps and find things. Pair programming with it in software development has also been a wonderful experience. So I think there’s an opportunity there, but there is some caution.

Adam Wright: Yeah, one thing I worry about is where do these ideas come from? One of the theories in academia is that we stand on the shoulders of giants and we build on each other’s ideas. ChatGPT is giving me ideas. They’re not ChatGPT’s ideas exactly, they’re other people’s ideas. And ChatGPT does not do a great job, or really even an acceptable job, of telling you where they came from. If ChatGPT gives me some ideas and I say, “Where’d those come from?” it tends to hallucinate references to made-up papers and fake titles and fake journals. And so I just worry that if we depend too much on ChatGPT for ideas, we may accidentally plagiarize sets of ideas or combinations of ideas without ever really knowing where they came from. So I worry about that a fair amount. I’ve seen newer tools that try to have some linkages to actual sources and searches, and that may be part of the solution. But I think it’s really important for our trainees and faculty to know where their ideas came from.

Holly Fletcher: I would like to hop in here and welcome our speaker, Jon. Jon, welcome to the chat. Do you want to take a minute to introduce yourself and share why this is a cool thing from your perspective and why we invited you to speak?

Jon Schoenecker: I am Jon Schoenecker. I am one of the pediatric orthopedic surgeons here. I do lots of deformity correction in children, things like hip dysplasia and trauma. And then I’m also part of the direction of innovation over at the Children’s Hospital for development of new technology. And then I also run a basic science lab in the Bone Center, mostly out of the Department of Pharmacology. And my background on this is simple: it took me 19 years from graduating from high school before I was a board certified orthopedic surgeon. And I’ve always been thoroughly amazed at how long it takes for us to learn how to do medicine, and I am always looking for faster ways of doing it, more efficient ways of taking care of the patterns that we recognize in children and other aspects of healthcare and coming up with ways to fix them. And I’ve been having a great time playing around in particular in the GPT space in terms of new ways of doing that, both from a clinical standpoint as well as a research standpoint. So that’s my interest in it.

Holly Fletcher: So I am interested in talking about the fact that we are seeing so much hype right now. You can’t open up any kind of newsletter or social media site, or anywhere you get information, and not see headlines about what ChatGPT, Bard, Midjourney and the like can do or are up to, and they aren’t always great. So I’m curious, and this might be interesting to start with Brad and Michael: what are your reservations about AI being in the headlines like this? Coming from the journalism side, I will be the first to say that it is really, really difficult to write a headline that is not a snooze fest and captures people’s attention and gets them to click. We have short attention spans, and sex and sensationalism sell. That’s what it comes down to. So what are you thinking about how that kind of public awareness and perception impacts the work that you all and your colleagues across the country and the world do?

Brad Malin: It’s an interesting question. So I think that we’re in the fourth AI bubble, if we had to count them. We’ve seen these over time and what ends up happening is that after the bubble bursts, there are some things that survive. And those things that survive have been battle tested as a result. So the reason why I bring that up is if you look at the way in which the revolution, or as I’m referring to it, the bubble, is growing, it sounds as though the singularity has occurred to the point where we have a new breakthrough that is just going to solve all the world’s ills. And the question is, under what circumstances is that actually true? You will not know that from the moment that it occurs. It’s really going to be a question of do you have a technology that has sufficiently adapted to the niche or has been developed to support people in a way that is going to be meaningful?

And I think the jury’s out on this. So it could be that you find that this type of technology is going to be useful for giving you your sexy title, if you want. And okay, if that’s the only thing that ends up having staying power, or the thing that survives at the end of this bubble, then we might go, all right, what’s next? When are we going to have something that is going to be transformational again? And it’ll take some time, because what’s going to end up happening is that a lot of things are going to get hyped. People are going to expect that they’re going to have something that’s going to work for every problem that they want to address. And when that doesn’t happen, there will be a lot of doubt about what and when. I guess you can say, what is the technology? And when is the technology going to be appropriate? I think that’s really the question. There will be some things that survive and some things that don’t.

Michael Matheny: Yeah, no. I agree with Brad and I guess I would like to pull out two specific points. So one is, I think when we think about the excitement of ChatGPT, there are just a lot of little things in our regular lives and workflow where you could potentially use it. I think it’s significantly more complicated in healthcare for a number of reasons. One, there are a whole lot of issues around data security and privacy. Right now ChatGPT can pretty much use any data you submit to it to help further train its algorithms and use those pieces. A lot of things have to be worked out in the environments where healthcare systems operate for appropriate protection of data. And I think that there’s going to be some challenges there.

I think the other thing that is really concerning, but it’s hard to get a handle around, is that these generative AI models learn from the body of evidence you give them. Well, we know that there’s a lot of data that’s overrepresented and underrepresented, so there are plenty of racial and ethnic cultures that don’t have equal representation in the electronic, publicly available internet data. And I think we’re at risk for whatever is most prevalent and present in these models being put forth. I remember a couple of articles I read over the weekend about how, just as an example, generating a story or generating some pieces came out with a very Western, industrialized, affluent framework in terms of how it wrote the story and the elements that it used. And so I think we really have to be careful of that.

The other challenge then becomes, well, how do we detect it and how do we manage it? And I think there’s going to be a lot of work that needs to be done in that area. Particularly some of the things that are already being done, where you train the general model and then you go in, you try to remove offensive content and you try to retrain it to be appropriate for a given context of use. I tend to agree a little bit with Brad; I think there’s this splash of excitement, but there’s a ton of work to be done to really make these things safe and usable in a number of healthcare contexts.

Brad Malin: Just to follow on, one thing that was pointed out that people might have missed is that there is a difference between the technology itself and the way the technology is managed. And so if you have a system like ChatGPT that is owned by a for-profit organization and now it’s under, I guess, the guise of Microsoft, you have to ask a question about under what conditions will it be okay to provide data that could be sensitive? And I think that Michael’s hit the problem head on in that right now there’s no way that we would be providing real medical records into that system, until we had a better understanding of how that information was going to be collected and subsequently used. So those are two different issues. The technology is evolving, but the way that it gets managed could have potential implications about personal rights.

Yaa Kumah-Crystal: With regards to some of the hype around this, I would 100% agree that, per Gartner, we would be at what would be the peak of inflated expectations, I think is what they describe it as. But guys, I don’t think this is unwarranted. What happened in November, again, these have existed for some time, and the people who were doing AI before AI was cool are rolling their eyes at everybody because there have been tools that have done this. But we have now put in people’s hands a tool that is able to mimic human representation of understanding and language. And again, I use the word mimic because we don’t want to anthropomorphize this too much, but we equate the ability to communicate with intelligence.

And the fact that this thing can mimic intelligence very well, I think, should not be underestimated with regards to the societal impact. Not everybody is a statistician and is going to appreciate that these are [inaudible] models and there’s machine learning behind that. They’re going to see these things, they’re going to encounter these things and they’re going to build relationships with these things. Mark my words, there are going to be religions that sprout around these tools and these entities that seem to understand and know and can do things in ways that people weren’t able to accomplish before. So even if we stopped at GPT-4, if nothing else came from this, the things we’re able to do with computers now, which are clearer to people than they were before November, are tremendous. And this comes to human computer interactions and just the concept of programming computers. There’s the saying, “What’s the best programming language? Is it Python? R? It’s English.” Just being able to use your words to have the computer do what you need it to do.

Going back to medicine specifically, a lot of the struggle we have in medical research is being able to understand what in the world is going on with all our data, because so much of it is free-text notes. These tools are really good at combing through text and language and assigning them categorical numbers and values, so you can go through those and assess those. And all things aside about figuring out data privacy and integrity, I feel like that’s just a technical and policy thing to solve. But the potential for what we’re able to do, if we stop today and all we do is hook up GPT-4 to WolframAlpha and put in our symbolic representations of knowledge about how to manage medicine, is still stupendous.
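
As a rough sketch of that idea, the snippet below asks a language model to map a free-text note excerpt onto a fixed set of categorical labels so the result can be analyzed like structured data. The label set and the call_llm helper are illustrative assumptions, not a validated clinical pipeline.

```python
# Hypothetical sketch: turn free-text notes into categorical values via an LLM.
# The label set and call_llm helper are placeholders for illustration only.

import json

SMOKING_LABELS = ["current smoker", "former smoker", "never smoker", "unknown"]

def call_llm(prompt: str) -> str:
    """Placeholder for an approved LLM endpoint that returns plain text."""
    raise NotImplementedError

def classify_smoking_status(note_text: str) -> str:
    """Map a free-text note excerpt onto one categorical label."""
    prompt = (
        "Classify the patient's smoking status from this note. Respond as JSON, "
        f'{{"label": "..."}}, choosing exactly one of {SMOKING_LABELS}.\n\nNote:\n{note_text}'
    )
    raw = call_llm(prompt)
    label = json.loads(raw).get("label", "unknown")
    # Anything outside the allowed set is treated as unknown rather than trusted.
    return label if label in SMOKING_LABELS else "unknown"
```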

Holly Fletcher: So we are going to do a first here and we are going to have a question from the audience. What do you have to ask?

Chris: Oh hi, this is Chris. I’ve actually been building AI systems in healthcare for quite some time. I work on the medical imaging side and I’m just really glad that we are actually talking about generative AI in medicine and healthcare. And I would quickly like to share a few of the things that I have tried, and a few ways to use these tools while we all are looking forward to LLMs and ChatGPT. So, coming from the privacy perspective that was discussed initially: the first thing is that if anybody in your network or in your hospital or ecosystem is thinking to actually use a GPT pipeline to understand and assess your medical records, make sure that there is a privacy pipeline before that. And that privacy pipeline will first de-identify every piece of identifiable PHI, generalize it, and then send it as a simple report to the GPT-4 pipeline, which can then report back. That is what Glass AI is doing using ChatGPT, if I’m not wrong.

The other thing is that people who are using ChatGPT today tend to forget one thing: of the models being used in ChatGPT, one is GPT-3.5-Turbo, and GPT-3.5-Turbo is not as advanced as GPT-4. So if you are using ChatGPT, make sure that you subscribe to GPT-4, which is far more intuitive when it comes to medical data. For the same prompt it will give you comparable data points, but it will give better results.

And the last thing is how you use GPT in translational medicine, or in any effort, from a developer perspective. From a question perspective, obviously it is going to give you an answer to whatever you ask. For example, I asked, okay, what are the active ingredients of Trental? And it came back and told me that pentoxifylline is in there. But that is not going to help me if I’m building a system. So at the same time, what I did was say, okay, write a program for me, and the program will be written by GPT-4, not 3.5-Turbo: write a program for me which can fetch the data from an NIH data source. And one of the things I will say is, make sure that you, or any of the data scientists, do not run the code as it is. You have to go and check the APIs that it is going to call from each of these sources, be it NIH or NHS or any of the symptom datasets or anything, and what it will bring back from that data source.

You always have to look into which data source it is looking at and what it is returning. So when you plan to integrate GPT-4 into the medical pipeline, make sure that the privacy pipeline is taken care of, and that whatever data has been returned is read through and verified with some manual intervention before it goes on. As for using ChatGPT directly in a framework where it is going to help from a medicine perspective, from a symptom perspective: yes, it does identify a symptom, but at the same time it will also give out symptoms from diseases that might not be in the same cluster. So this is what I have tried and tested, and I thought I would share it with you. Thank you.
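
As a crude illustration of the “privacy pipeline before the GPT pipeline” idea Chris describes, the sketch below masks a few obvious identifier patterns before any text would leave the institution. The regexes and placeholder tokens are illustrative assumptions only; a real deployment would rely on a validated de-identification tool plus the governance and contractual safeguards the panel discusses.

```python
# Hypothetical sketch: mask obvious identifiers before text reaches any
# external model. Deliberately crude; names, addresses, and free-text dates
# would still need a validated de-identification tool.

import re

PATTERNS = {
    "[PHONE]": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[DATE]": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "[MRN]": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}

def mask_identifiers(text: str) -> str:
    """Replace obviously identifying patterns with placeholder tokens."""
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

def send_to_llm(deidentified_text: str) -> str:
    """Placeholder: only masked, reviewed text should ever reach an external model."""
    raise NotImplementedError

if __name__ == "__main__":
    note = "Pt seen 04/02/2023, MRN: 123456, call 615-555-1234 with results."
    print(mask_identifiers(note))
```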

Holly Fletcher: Thank you so much. Your point about the data privacy, and then, Brad, we’ll hop over to you. I saw something this weekend, I believe, that Samsung actually okayed some of their engineers to debug some code in ChatGPT, and in the process of that, they wound up sending their proprietary code to ChatGPT, which is now in possession of their proprietary code and corporate secrets. So it’s really interesting and it’s just absolutely vital to remember that this is not a black hole that you will never see again. One of our speakers before reminded everyone that it is privately owned and you are giving them access to your brain.

All right, Brad, and then we’ll go from Brad to Adam.

Brad Malin: I believe that what Chris was saying in general was true. You certainly want to have some type of a privacy pipeline in place. It’s only one issue, though, around a much larger question regarding the safety of these systems. And this whole notion of once you provide information over there, you don’t know who’s looking at it, you don’t know what the terms are with respect to the organization that is now holding the data. You don’t know if they’re going to take that information and misuse it in ways that you did not intend for it to be used. It’s not just that you are using these technologies as they’ve been stood up for you. It’s that the organizations behind them, they’re always going to want more data.

They’re going to take whatever you provide to them and they’re going to use it in some way, whether it’s to train or fine-tune or retrain; it’s going to be there. And so in healthcare, I think, this is really important. Unless you have a business associate relationship with this company, you really have to question under what conditions you should provide them with access to anything. But again, it’s not the technology issue. This is about governance. This is about under what conditions we need to have some type of regulation around these technologies and how they should function. It’s really just a matter of building out the societal management that is considered to be appropriate for the handling of large amounts of information. That’s really all I wanted to lay on the table here.

Adam Wright: Yeah, I agree with Brad. I think there are also some contractual solutions. So there are some ways that we can use ChatGPT through Azure, where they don’t capture everything we send them. But so far, if you noticed, the work I told you guys about was focused on the build, kind of the logic of our EHR, rather than identifiable clinical information. We’re still, I think, reluctant about sharing PHI with ChatGPT, but there’s probably some contractual solution combined with a privacy architecture, like what Brad was laying out, that’s the solution.

And then I would also just add that I found that upgrading my account to ChatGPT Plus and using the GPT-4 model was quite a significant improvement. I think that’s probably underappreciated, because most people are using the free version. But that GPT-4 model is quite a bit more impressive than GPT-3.5-Turbo. So I agree with Chris about that, too.

Michael Matheny: Yeah, this is Michael. I want to add to that, I totally agree GPT-4 is quite a bit more powerful. Just to note on that, Zak Kohane and some others just put out an article where they fed ChatGPT the medical licensing exam and it was correct more than 90% of the time, which was a huge difference from prior models. So absolutely, that is true.

Holly Fletcher: Yaa, that brings us to what you have been working on some. Do you want to speak a bit about the obstacle course that you’ve been putting it through on the biomedical side?

Yaa Kumah-Crystal: Absolutely. The touchpoint between these large language models and education has been one of the most fascinating things to see evolve, with the USMLE and the bar exam demonstrating that these models have the capability to pass them, or at least report answers that match what the expectation should be, which is pretty impressive. So a cohort of folks wanted to see how well this tool would do on the clinical informatics boards, which is one of the board exams that we take to be certified clinical informaticists. It involves questions to do with problem solving and synthesis of information. And this was using the GPT-3.5 model. It was able to pass at a 74% rate, which is as well as or better than a clinical informatics-trained fellow, which is pretty impressive, I would say. Again, this is not even using GPT-4. What we would say about that is, well, what does this mean for education?

If a computer can perform just as well as someone else who is academically trained, what does it mean for what we expect people to know, how we expect people to apply that knowledge, and what we’re testing on and why? In informatics we have maintenance of certification. So you take your initial high-stakes exam in a proctored setting where you have to make sure that you’re in the room, people are watching you, and you have all the data in your brain and you have to dish it out as the time comes. We’ve moved towards, for maintenance of certification, self-paced exams where you can take things in a setting of your choice and it’s even open book, where you can refer to things online and journal articles that you have. You have a timeframe with which to answer any of those questions.

With all the questions, though, in the informatics boards, even if you have access to Google, it’s not simple, straightforward questions like, “What is the answer to this?” It’s not just matching and fill in the blank. You still have to think through a process and apply these principles to derive an answer. So you still have to have some kind of foundational knowledge to say, well, I understand this principle; therefore, to answer this question, I have to know how to apply this principle to this answer. If ChatGPT can answer that with no training required, that means someone without any foundational knowledge whatsoever can pass one of these exams. And what does that necessarily mean about the integrity of these tests and what we’re testing and why we’re testing it? So I think this is one of the areas where, in medical education and elementary education, teachers are a bit nervous about what it means for our future learners.

I work in the master’s informatics training course and we have graduate students and physicians that take our courses. So these are people who are opting in to get more education. These are adult learners, which are a wonderful group to work with. But in elementary education, where people are still forming their thoughts and their ideas and their voice, when we were talking about writing before, it’s hard to know how to craft a good story if you haven’t found your voice yet. If you’re an adult learner and you’re having a tool help you think of your thoughts, so you can craft it in your own voice, that’s different. But with elementary education and people that are still figuring things out, where do these things fit in facilitating learning, and how could they potentially hinder learning as well?

Holly Fletcher: The last several minutes of this discussion have hinged around regulation and guidelines and providing people with the parameters on how to be savvy and how to be trained in usage. And I’m curious, and it could be from a variety of perspectives, but where do you all think regulations need to come from? From a writer’s perspective, whether it’s journalism or communications or media or advertising, I think that there is a best set of practices regarding attribution. We saw attribution and maybe a type of misjudgment over at Vanderbilt about six weeks ago. But I see this as a multifactorial or multi-layered place for regulations, depending on its usage. For instance, maybe it’s internal, maybe it’s through the FDA or CMS. And do you think there’s any place for federal or state regulation to get involved?

Brad Malin: Yes. But I think at the end of the day you have to ask the question of is that the right way to go? So you regulate a system when you think, or when you have experience, that the environment is going to be perverted in some way. If a market is going to be manipulated, then you put rules in place to prevent it from being skewed in a direction that makes it unfair, manipulable. So I’m not convinced that we’re in a situation yet in which the technology needs to be regulated. It’s probably going to be more appropriate to figure out the applications of the technology and then let the authorities or the agencies that have some oversight with respect to that environment step in.

So if we’re talking about healthcare, then we might be talking about the US Department of Health and Human Services. If we’re talking about general commerce, we might be talking about the Federal Trade Commission. But you’re really going to have to see under what conditions this type of technology is brought to the table before you can say who the regulator should be. I will say New York as a state has already put in a law that makes it illegal to use AI in discriminatory ways for employment purposes. So whether or not we want this, or we say we need this or not, we’ve already seen that at least one state has acted, and other states are potentially going to follow suit.

Yaa Kumah-Crystal: I’d agree that some kind of oversight or general guidelines are going to be needed to make sure that we are using these tools safely. With regards to medicine, I think we’re one of the most tightly regulated industries, and if we’re going to be implementing these tools, some of the very straightforward things that come to mind are with regards to how we are going to allow it to give medical advice. And when it does give medical advice, when it gives medical advice to a provider versus a patient, what is that allowed to look like? And who is ultimately responsible? Is it Dr. Kumah-Crystal, the pediatric endocrinologist? Is it Vanderbilt? Is it Sam Altman? So I think putting some kind of parameters and clarity in place, so that we know how to expect these things to bubble up, will let people engineer and design the appropriate safeguards.

And I think just larger themes of regulation and oversight to just protect ourselves and society against some of the potential for abuse. We mentioned the discrimination, which I think is huge and important, and we need to do a lot of focus with regards to where the data even comes from. But also, I don’t know if you guys ever watched that show, Silicon Valley. The conclusion of that hilarious series was that their AI at the end got so smart that it broke encryption, and they realized that they couldn’t let it go out into the wild because that would disrupt whole financial systems and everything ever. So maybe we should have rules around, “Hey, if you get to GPT-5, 6, 7 and you figure out it can break encryption, let’s have some more discussions with national security before you put that in the hands of the people on Reddit.”

Holly Fletcher: Yes, that’s very good. Adam, well it looks like you’re going to speak, and then I want to do a lightning round.

Adam Wright: Yeah, I think I largely agree. But I do think that this genie is hard to put back in the bottle. So if the AI learned how to break encryption, then it means encryption can be broken. And so maybe we need to make sure the world knows that, in some responsible disclosure kind of way. So when I see things about an AGI pause, or not trying to build bigger models or something, I’m not sure that I see that as the answer, but I do think care in how we train these things is really important.

Holly Fletcher: So for this lightning round, Jon, we’ll start with you. I’m curious about what really fuels your fire right now to think about this and what you want people to just embrace and chase.

Jon Schoenecker: Yeah, I think that the technology, as Adam just said in terms of the genie being out of the bottle, is absolutely amazing. And the uses that you can apply this to are so vast that every day you wake up and you have different ideas you can use this for. For me personally, I come from a field, we talked about how it can pass the medical licensing exams, but in pediatric orthopedics we take care of rare diseases with heterogeneous presentations, and we’re never going to have the data that would let us live up to what we would do in an evidence-based model. And so I like to joke around that of what comes in my clinic door, about 10% of it you can test on; 90% I treat by my intuition as a physician.

And I think one of the neatest things I’ve seen about this is that there’s been a big gap in medicine, as an example, of how to actually measure that clinical intuition. And there’s a way that you can play around with these models to actually figure out and objectively show how we take care of these types of diseases, which is very difficult to do through traditional biostatistics, et cetera. It actually is downloading our brain. There’s a reason why you go and see the silver-haired physician who’s been seeing things for a long time: because he or she can treat the things that we don’t have data for, and you trust them for that. So there are aspects of this that I have found incredibly enlightening, things I think we could do that make our subspecialty clinics at Vanderbilt special.

Holly Fletcher: Michael?

Michael Matheny: Yeah, so I too think there’s a tremendous opportunity for deployment and use of these tools. I guess the thing I would like to go back to is the glass-is-half-full perspective. For population health management and for clinical decision support, I think there’s just a lot of opportunity, once these are properly sub-trained and integrated, to support healthcare and really move things forward. I also think there’s a lot of work still to be done in translating how ChatGPT represents its outputs back into computable knowledge. So a lot of the group has talked about how it generates narrative text and represents what it’s learned. I think Yaa made this point: that’s still not entirely the same as computable, structured knowledge that can then be used succinctly downstream. And so I think there’s a lot of opportunity to still work in that area.

Holly Fletcher: Brad?

Brad Malin: I would echo the sentiment that there’s a lot of opportunity here. I think generative AI in its rawest form is going to facilitate new types of investigation. It’s going to speed up the work of many different people across society and across healthcare in general. At the same time, I think that we are going to have to pay attention to the way in which it’s affecting the way people think and make sure that people don’t habituate to something that creates a false sense of security with respect to the way that it provides a service. That’s going to be challenging, the deeper this technology becomes ingrained within society. And so that’s just going to be something that people are going to have to be vigilant about.

Holly Fletcher: Adam.

Adam Wright: Yeah, so I think that when this debate came out, a lot of our first conversations were about how we’ll use it to write notes, how we’ll use it to correspond with patients. I would just put in a plug for the idea of using it to build clinical tools. And what I mean by that is I’ve actually found the most use recently from generative models in things like GitHub Copilot and tools that help me write code better. And so we have several hundred people who work in our health IT department that are building order sets and decision support tools and forms for documentation. And I think that a lot of that work could be made much more efficient with generative tools like ChatGPT. So I’d be interested in seeing could we use these tools to actually generate tools in our EHR that make our EHR work better. So I would say writing notes is cool, talking to patients is cool, diagnosing is cool, but don’t forget the scut work of getting the EHR to work right.

Holly Fletcher: Yaa.

Yaa Kumah-Crystal: I think we are on the precipice of something incredible here and I couldn’t be more excited. I don’t know if you guys remember when Napster came out, not admitting that I ever used that tool because, my goodness, that’s not my property. And there was just so much excitement about the fact that you could now do this new thing and it was very disruptive and it was buggy, it was not even a great application. It would crash a lot, but you could do this new thing with technology and you could just share information in a way you never could before. Metallica wasn’t thrilled about that, but that caused something. And the way music has changed and evolved and going from that 20 years later, 30 years later, it’s a different place.

And when the internet came out, we were all like, “Oh, information at people’s fingertips, everybody’s going to be so much smarter.” And we kind of were, but then we got a lot of misinformation and there was a lot of spam and then we got bullying. A lot of the tools that we have at our hands make us realize what is important to us and what we need to focus on and what easily distracts us. And I think this is such an important time, where we’re kind of nascent in where these tools are going and what path they’re taking, for us to figure out what we want to get out of these things and what direction we want to steer society in. And how we can actually make the most of the fact that information is now at your fingertips and is actually understandable on all these different levels.

How we, as a society, want to make sure we’re establishing what our priorities are, learning how to converse clearly with each other, tearing down some of these silos and these echo chambers, so that we can arrive at a path of mutual creativity. Because, as we march towards these tools getting smarter and smarter, what if we do get to AGI one day? What does that mean for us and what’s it going to say about us and how’s it going to react to everything we’ve put out there in the universe? How’s it going to judge us? I don’t know. But I think the sooner we start to put more positivity out there and start thinking about the best ways we can leverage these tools, the better they will reflect who we are as people.

Holly Fletcher: I love all of that. We do have a request from Nadine, so I will bring her up. Nadine, if you could keep it kind of brief since we’re at time. And my same rule goes, if you get rowdy, I’ll have to kick it back down, but here we go.

Nadine: I’m new here and I’m a medical student. I wanted to say a few things about ChatGPT and I have a question. First, I want to say that it’s too early to say that ChatGPT has risks or opportunity or any kind of real influence on medicine, because, let me say first, English is my third language, not even my second. My first language is Arabic. I asked a few medical questions in Arabic to ChatGPT and it was a disaster. Like nothing was right, not even close. The same questions were 100% true in English and in French. But in Arabic, it was a total disaster. So maybe some people will use it to diagnose themselves and it’ll be a false diagnosis.

Second note or remark: one of the most important things in medicine is the doctor-patient relationship and the psychological bond. And that is not something ChatGPT is related to, or even close to. My question is, who’s watching, who’s monitoring ChatGPT? And are there more risks than benefits if it’s a public thing for people to diagnose themselves and to get some information? And thank you.

Holly Fletcher: Thank you so much. Who wants to speak to what happens if people use ChatGPT to get a diagnosis or to inform themselves about a disease?

Yaa Kumah-Crystal: Well, if you haven’t heard from me enough, I can weigh in. But anyone feel free to interrupt me, because I could talk for hours. I think everybody knows that. With regards to how useful it is in other languages, I think that’s one of the most important things people need to be aware of with regards to the limitations of these tools. Again, if you’re eating cereal with a fork, it’s not going to work as well. You have to understand what it’s used for, how it was trained and what its limitations are. I don’t think the database of all the knowledge that went into training it is representative of all these other languages and cultures and things like that. It can extrapolate and it can make inferences as best it can, but it’s not going to achieve the same level of performance as what it was trained on. Which is good to know; I’m glad you did that assessment and found this, because that gives us more information about how to make it perform better.

With regards to its ability to diagnose, you’ll hear things on both ends of the spectrum. There was actually an op-ed published recently by an ED physician who put in the symptoms of a patient he had recently seen, and it missed an ectopic pregnancy, which is a life-threatening diagnosis to miss. And at the other end of the spectrum, you have someone who had been on a diagnostic journey for years, and when they put in their symptoms it diagnosed congenital adrenal hyperplasia, which is a very rare condition.

So again, I think it comes down to how we are going to manage the user interfaces of these tools, because this is like the easiest thing in the world to use. You log in with a Google account, you verify you’re a human, ironically enough, and you’re good to go. You start putting in stuff and it starts giving you stuff back. How are we going to make sure that when we put these tools in the hands of everyday end users, not people who are qualified enough to know how to log into a Twitter Space and understand informatics, that they know what the limitations are?

You get a popup screen that says, “This is in research mode,” and then you sign a EULA that surrenders all your rights to anything you put in there. Is that safe? Is that fair? Do we need regulation around that, so that it literally says, in bold, blinking letters, that this stuff might be made up? These are the conversations that need to be had. As a physician, as a researcher, as someone who spends way too much time on ChatGPT, I know what it’s going to do and I know when it’s making stuff up. But not everybody does. And I don’t know that they’ve done enough with these interfaces to make sure people know that.

Holly Fletcher: I completely agree. Before we wrap up, I do want to ask, Jon, did you want to pop in?

Jon Schoenecker: Oh, I completely agree with what Yaa said: right now it’s not safe. In its current form, the model is medically not safe. Its whole entire job is to try to give you an answer that makes you feel good. And going back to the analogy we talked about before, for 90% of what shows up to our clinic there really isn’t data, so it’s going to hallucinate; it’s going to figure out something to give you as an answer, and that could be a problem. We’ve run some fun models ourselves and done what we call an ordinal hierarchy, where we look through and ask, which disease or adverse outcome would you be most worried about? And recently when I ran through it, GPT-4 told me it was more worried about a patient developing a DVT than death. That’s pretty striking. So there’s going to be a lot of fine-tuning of this before, I think, we even think of rolling it out as safe for patient care.
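
The kind of spot check Jon describes could, in principle, be reproduced with a few lines of code. The sketch below is purely illustrative and is not the panelists’ actual method: the clinical vignette, the outcome list beyond DVT and death, and the specific model identifier are all assumptions. It simply asks a chat model to rank a list of adverse outcomes so a human reviewer can check whether the ordering is clinically sensible.

# Hypothetical "ordinal hierarchy" spot check (illustrative only; not the panelists' code).
# Requires the openai Python package (version 1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Only "death" and "deep vein thrombosis (DVT)" are mentioned in the conversation;
# the other outcomes and the vignette are made-up placeholders.
outcomes = ["death", "deep vein thrombosis (DVT)", "surgical site infection", "nonunion"]
vignette = "A hypothetical postoperative patient."

prompt = (
    f"{vignette} Rank these adverse outcomes from most to least worrisome, "
    f"one per line, ranking only: {', '.join(outcomes)}."
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed model identifier; the episode only says "GPT-4"
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
# A reviewer would then inspect the ordering; if DVT outranks death,
# that is exactly the miscalibration Jon describes as unsafe for patient care.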

Holly Fletcher: I cannot tell you how excited I am that all of you came and you stayed and you chatted about this. I love that we had questions, it was fantastic. We will be doing more about this, I hope, in season four in the relatively near future. So follow insights to stay in touch. And also as we wrap up, I just want to remind people that we are in a stage, it sounds like, and from my observations, that’s very much like news consumption on the internet. We have to be vigilant at all ages, at all levels of education and common sense about misinformation, about deep fakes. And I think that we need to be as vigilant and proactive in educating, to your point, Yaa, about how people interact with this technology.

People are beginning, too slowly, but beginning, to talk about news veracity, the accuracy of what you read. There’s a news literacy project that is vitally important. And from my perspective as a journalist who works with really smart brains on topics like this, it sounds like we need an AI or ChatGPT kind of literacy project to make sure that people know that it can hallucinate, that it can prioritize in the wrong way, and that they should treat it with skepticism. The joke is, “It’s on the internet, so it must be true.” And I think that there’s a lot of work to be done to make sure that we as a society and as institutions are approaching this with enthusiasm, but also caution. So I just want to say thank you so much. Please follow all of our speakers so you can stay in touch with their research, and we will see you here.

Clark Buckner: What’s interesting to me is that since this conversation occurred, we’ve already seen industries, including medicine and VUMC itself, respond to both the opportunities and efficiencies of generative AI and its risks. The National Academy of Medicine, for instance, launched a three-year initiative to establish an AI code of conduct for medicine and research. These swift responses are possible because AI is actually decades old and already powers so much of our lives. The new component is that it’s now at the disposal of the public.

We’ll be discussing this on season four, along with other forces etching grooves of change into our lives, our work, and our health. Subscribe and follow us anywhere you get your podcasts and look for the new season at listendna.com. Thank you for being part of the DNA community. We really appreciate you.

As a reminder, Vanderbilt Health, DNA, Discoveries in Action, is an editorialized podcast from Vanderbilt Health that isn’t meant to replace any form of medical advice or treatment. If you have questions about your medical care or health, please consult your physician or care provider.

Interested in learning more about how health care is using AI?

Read:

JAMA Network Open: Algorithmovigilance—Advancing Methods to Analyze and Monitor Artificial Intelligence–Driven Health Care for Effectiveness and Equity

Published April 15, 2021

Peter J. Embi, MD, MS

Attend:

Vanderbilt Health’s Brock Family Center for Applied Innovation is sponsoring the inaugural Healthcare Artificial Intelligence Sessions (HAIS 23).

When: Wednesday, September 20, 9 a.m.–5 p.m. (all-day attendance is encouraged)

Where: Langford Auditorium, Vanderbilt University Medical Center, 2209 Garland Ave, Nashville, TN 37232

Who: All are welcome to attend this in-person event

Cost: Free

Learn more about the event and register.