VUMC team pioneers more efficient methods for validating EHR data in biomedical research.

Electronic health record (EHR) data is rapidly gaining traction as an efficient and affordable way to fuel biomedical research.

However, validation studies continue to find substantial errors in some EHR data that raise red flags.

Bryan Shepherd, Ph.D., a professor of biostatistics and biomedical informatics at Vanderbilt University Medical Center, is working to identify better methods of validating EHR data and new ways to use validation data in biomedical research.

“We want to harness the EHR’s full potential to gain a deeper understanding of human health,” he said.

Shepherd and his team are applying novel statistical methods, study designs and tools to develop new strategies for validating patient records and other electronic data. He said their goal is to reduce or eliminate errors in EHR datasets – especially those that could bias study results.

NIH MERIT Award

Shepherd and his team already have a track record of success. They recently received a MERIT Award, or Method to Extend Research in Time Award, from the National Institute of Allergy and Infectious Diseases (NIAID). The MERIT Award provides long-term grant support to investigators who have demonstrated competent, productive research and are likely to continue performing at a high level.

Shepherd and his research partner Pamela Shaw, Ph.D., senior biostatistics investigator at Kaiser Permanente Washington Health Research Institute, were recognized for their research project analyzing “correlated outcome and covariate errors” in studies involving HIV/AIDS. 

Other VUMC investigators on the awarded team were Gustavo Amorim, Ph.D. and Ran Tao, Ph.D. in the Department of Biostatistics, Stephany Duda, Ph.D. in the Department of Biomedical Informatics, and Timothy Sterling, M.D. and Jessica Castilho, M.D., in the Department of Medicine, Division of Infectious Diseases.

HIV/AIDS studies often rely on existing EHR data that has not typically been gathered for research purposes, which merits a closer look, Shepherd said.

“These datasets are often incomplete,” he said. “They may neglect to record lifestyle factors or other pertinent details that the study is focused on.”

For the purposes of research, EHR data – typically gathered through physician-patient interaction – are generally extracted by automated computer algorithms, which can result in gaps and inaccuracies. Without proper statistical adjustments, these errors can introduce bias into study results, Shepherd said.

“We develop study designs and methods that identify and adjust for the potential impact of error-prone data,” he explained.

According to Shepherd, more precise answers to research questions can lead to improved clinical decision-making and better care for people living with HIV by accurately predicting their risks and responses to treatment.

Applying to Other Datasets

These methods also work well for other datasets. Shepherd and his team recently used them in a paper on gestational diabetes mellitus and childhood obesity that won a 2023 Biometrics prize.

Increasingly, EHR data analysis and similar statistical frameworks are being applied to international populations through the International Epidemiology Databases to Evaluate AIDS (IeDEA) network, which contains decades of data on more than 2.2 million people in 44 countries.

About the Expert

Bryan Shepherd, Ph.D.

Bryan Shepherd, Ph.D., is a professor of biostatistics and biomedical informatics at Vanderbilt University Medical Center. His research involves developing and applying novel statistical methods to studies of HIV/AIDS and other diseases of importance to global health.