New tool uncovers information in patient clinic notes to improve lung cancer screening for smokers and former smokers.

Key Takeaways

  • EHR tool scans both structured data and text-based clinic notes
  • Adding unstructured data finds more patients eligible for lung cancer screening
  • Black patients benefit from enhanced assessment of lung cancer screening eligibility

A team of investigators at Vanderbilt University has developed a tool to mine EHR contents for details on smoking habits to help identify patients who might benefit from early lung cancer screening.

The tool uses natural language processing (NLP), a form of artificial intelligence, to unearth smoking-related data tucked away in lengthy clinic notes. These clues are then combined with the more structured, quantitative EHR data entered in designated fields, such as current smoking status, smoking volume or quit year.
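
As a rough illustration of the idea, and not the team's actual NLP model, the sketch below pulls pack-year mentions out of free-text notes with a simple regular expression and uses them only when the structured field is empty. The function names, the fallback rule and the sample phrases are all hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical pattern: matches phrases such as "30 pack-years" or "25-pack-year".
PACK_YEAR_PATTERN = re.compile(r"(\d+(?:\.\d+)?)[\s-]*pack[\s-]?years?", re.IGNORECASE)


def pack_years_from_note(note: str) -> Optional[float]:
    """Return the largest pack-year figure mentioned in a note, or None."""
    values = [float(m) for m in PACK_YEAR_PATTERN.findall(note)]
    return max(values) if values else None


def combined_pack_years(structured_pack_years: Optional[float],
                        notes: List[str]) -> Optional[float]:
    """Assumed rule: trust the structured field when present,
    otherwise fall back to the highest value found in the notes."""
    if structured_pack_years is not None:
        return structured_pack_years
    found = [v for v in map(pack_years_from_note, notes) if v is not None]
    return max(found) if found else None
```

A production system would need far more than a regular expression, since negation, family history and quit dates all appear in notes, but the fallback shows how unstructured text can fill gaps left by empty structured fields.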

The dual-pronged tactic proved successful in identifying patients eligible for lung cancer screening in a recent proof-of-concept study.

“This approach has the potential to streamline the identification of [lung cancer screening]-eligible patients and serve as a technological foundation for the future development of a clinical-decision-support tool,” the team said in an article in The International Journal of Medical Informatics.

Support for Screening

The new tool is the latest aimed at addressing dismal lung cancer screening rates in the United States. Currently, fewer than 5 percent of at-risk adults receive the recommended CT scans to check for nodules. Among non-Hispanic Black adults, the rate drops to 1.7 percent.

One strategy to boost screening is to expand the eligibility pool. In 2021, the U.S. Preventive Services Task Force lowered from 30 to 20 the number of pack-years required to support a screening recommendation. Pack-years measure cumulative smoking exposure: one pack-year is equivalent to smoking about 20 cigarettes a day, the number in a typical pack, for one year. This move nearly doubled the number of adults eligible for the recommendation, the researchers noted. A major goal of the effort was to reduce health disparities among Black Americans.
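
To make the arithmetic concrete, here is a minimal sketch of the pack-year calculation and the threshold check. The function names are illustrative, and the check covers only the pack-year criterion discussed here, not the full screening recommendation.

```python
CIGARETTES_PER_PACK = 20.0  # typical pack size, as described above


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Packs smoked per day multiplied by the number of years smoked."""
    return (cigarettes_per_day / CIGARETTES_PER_PACK) * years_smoked


def meets_pack_year_threshold(cigarettes_per_day: float,
                              years_smoked: float,
                              threshold: float = 20.0) -> bool:
    """Check only the pack-year criterion (20 by default, 30 before 2021)."""
    return pack_years(cigarettes_per_day, years_smoked) >= threshold
```

For example, someone who smoked 10 cigarettes a day for 40 years has 20 pack-years. That person meets the current threshold of 20 but would have fallen short of the earlier threshold of 30.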

In developing their new EHR tool, the researchers had a similar aim.

“We wanted to increase the proportion of Black and young patients who meet screening guidelines, many of whom may not report smoking habits in patient portals or structured datasets,” said first author Siru Liu, Ph.D., an assistant professor of biomedical informatics at Vanderbilt. “For these patients, especially, it is important to integrate smoking data from clinical notes.”

Validating the Tool

Liu and colleagues tested their algorithm using three years of structured and unstructured data from the Vanderbilt EHR. The final dataset included 102,475 patients and more than 1.5 million clinic notes. Approximately 10 percent of the patients identified as Black or African American.

The researchers applied two other artificial intelligence models using structured data alone. This baseline strategy yielded 5,887 patients eligible for screening.

In contrast, the new hybrid approach yielded 10,231 eligible patients, about 74 percent more than the baseline. By adding information from notes and other unstructured data, the model also increased the number of Black patients identified as eligible.

Using text written into the EHR related to smoking can play an important role in assessing eligibility for lung cancer screening, the researchers found.

“Compared with the baseline approach, our NLP-based approach identified 119 percent more Black/African Americans who meet screening guidelines,” they wrote.

Putting It Into Practice

Leveraging NLP to extract relevant information from clinic notes could save time for providers. The approach is gaining traction in efforts to identify high-risk patients across specialties.

The researchers are hopeful that as NLP-based tools evolve, they might be optimized to run in real time within the EHR, converting unstructured data to structured data that can be more quickly assessed. The goal, Liu said, is to ensure no at-risk patients fall through the cracks.

“Our results show that electronic clinical decision support for lung cancer screening stands to be greatly improved, both overall and with special regard to health care equity for Black patients,” Liu told VUMC News.

About the Expert

Siru Liu, Ph.D.

Siru Liu, Ph.D., is an assistant professor in the Department of Biomedical Informatics at Vanderbilt University Medical Center and the Department of Computer Science at Vanderbilt University. Her research focuses on the optimization of EHR features and functions, with a particular interest in clinical decision support.