Publications | Hao Liu

Note: JBI and JAMIA are leading health informatics journals with impact factors 8.000 and 7.942 (2022).
AMIA (American Medical Informatics Association) is one of the top conferences in biomedical and health informatics.

2026

AMIA Informatics

A Multi-Model LLM Consensus Framework to Identify EHR-Predictable Eligibility Criteria in NSCLC Immunotherapy Trials

Abdul Muqeeth, Yu Huang, Bian Jiang, Hao Liu, and Yan Zhuang

In AMIA 2026 Amplify Informatics Summit Feb 2026

2025

bdcc

Fine-Tuning LLaMA2 for Summarizing Discharge Notes: Evaluating the Role of Highlighted Information

Mahshad Koohi Habibi Dehkordi, Yehoshua Perl, Fadi P Deek, and Hao Liu

Big Data and Cognitive Computing Feb 2025

HTML
JMIR

Improving Large Language Models’ Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation

Mahshad Koohi Habibi Dehkordi, Yehoshua Perl, Fadi P Deek, Zhe He, Vipina K Keloth, Hao Liu, and 2 more authors

JMIR Medical Informatics Feb 2025

HTML
MedInfo

LLM-Integrated Normalization and Knowledge for FHIR (LINK-FHIR)

Zhen Hou, Yan Zhuang, Hao Liu, and Ming Jiang

In 20th World Congress on Medical and Health Informatics (MedInfo 2025) Aug 2025

HTML
MedInfo

EvidenceOutcomes: a Dataset of Clinical Trial Publications with Clinically Meaningful Outcomes

Yiliang Zhou, Abigail M. Newbury, Gongbo Zhang, Betina Ross Idnay, Hao Liu, Chunhua Weng, and 1 more author

Aug 2025

HTML
npj Health Syst

Enhancing medical coding efficiency through domain-specific fine-tuned large language models

Zhen Hou, Hao Liu, Jiang Bian, Xing He, and Yan Zhuang

npj Health Systems May 2025

Abs HTML

Medical coding is essential for healthcare operations yet remains predominantly manual, error-prone (up to 20%), and costly (up to $18.2 billion annually). Although large language models (LLMs) have shown promise in natural language processing, their application to medical coding has produced limited accuracy. In this study, we evaluated whether fine-tuning LLMs with specialized ICD-10 knowledge can automate code generation across clinical documentation. We adopted a two-phase approach: initial fine-tuning using 74,260 ICD-10 code–description pairs, followed by enhanced training to address linguistic and lexical variations. Evaluations using a proprietary model (GPT-4o mini) on a cloud platform and an open-source model (Llama) on local GPUs demonstrated that initial fine-tuning increased exact matching from <1% to 97%, while enhanced fine-tuning further improved performance in complex scenarios, with real-world clinical notes achieving 69.20% exact match and 87.16% category match. These findings indicate that domain-specific fine-tuned LLMs can reduce manual burdens and improve reliability.
AAAI

Geoinformatics-Guided Machine Learning for Power Plant Classification

Blessing Austin-Gabriel, Aparna S. Varde, and Hao Liu

Feb 2025
HEALTHINF

CFC Annotator: A Cluster-Focused Combination Algorithm for Annotating Electronic Health Records by Referencing Interface Terminology

Shuxin Zhou, Pritam Sen, Hao Liu, Mahshad Koohi H Dehkordi, and Yehoshua Perl

In 18th International Conference on Health Informatics (HEALTHINF 2025) Jan 2025

2024

BIBM

Using Clinical Entity Recognition for Curating an Interface Terminology to Aid Fast Skimming Of EHRs

Navya Martin Kollapally, Mahshad Koohi H Dehkordi, Yehoshua Perl, James Geller, Fadi P Deek, Hao Liu, and 4 more authors

In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM2024) Dec 2024
BIBM

Enhancing Patient Comprehension: An Effective Sequential Prompting Approach to Simplifying EHRs Using LLMs

Mahshad Koohi H Dehkordi, Shuxin Zhou, Yehoshua Perl, Fadi P Deek, Andrew J. Einstein, Gai Elhanan, and 2 more authors

In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM2024) Dec 2024
IISA

Audiovisual Multimodal Cough Data Analysis for Tuberculosis Detection

Jyoti Yadav, Aparna S. Varde, Hao Liu, George Antoniou, and Lei Xie

In 2024 IEEE International Conference on Information, Intelligence, Systems and Applications (IISA2024) Jul 2024
IEEE/ACM

Demo: Accelerating Patient Screening for Clinical Trials using Large Language Model Prompting

Anand Gopeekrishnan, Shibbir Ahmed Arif, and Hao Liu

In 2024 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) Jul 2024

HTML
IISA

Audiovisual Multimodal Cough Data Analysis for Tuberculosis Detection

Jyoti Yadav, Aparna S. Varde, Hao Liu, George Antoniou, and Lei Xie

In 2024 IEEE International Conference on Information, Intelligence, Systems and Applications (IISA2024) Jul 2024
NeurIPS

EvidenceOutcomes: a Dataset of Clinical Trial Publications with Clinically Meaningful Outcomes

Yiliang Zhou, Abigail Newbury, Gongbo Zhang, Betina Idnay, Hao Liu, Chunhua Weng, and 1 more author

NeurIPS (in submission) Jun 2024

Abs

The fundamental process of evidence extraction and synthesis in evidence-based medicine involves extracting PICO (Population, Intervention, Comparison, and Outcome) elements from biomedical literature. However, Outcomes, being the most complex elements, are often neglected or oversimplified in existing benchmarks. To address this issue, we present EvidenceOutcomes, a novel, large, annotated corpus of clinically meaningful outcomes extracted from biomedical literature. We first developed a robust annotation guideline for extracting clinically meaningful outcomes from text through iteration and discussion with clinicians and Natural Language Processing experts. Then, three independent annotators annotated the Results and Conclusions sections of a randomly selected sample of 500 PubMed abstracts and 140 PubMed abstracts from the existing EBM-NLP corpus. This resulted in EvidenceOutcomes with high-quality annotations of an inter-rater agreement of 0.76. Additionally, our fine-tuned PubMedBERT model, applied to these 500 PubMed abstracts, achieved an F1-score of 0.69 at the entity level and 0.76 at the token level on the subset of 140 PubMed abstracts from the EBM-NLP corpus. EvidenceOutcomes can serve as a shared benchmark to develop and test future machine learning algorithms to extract clinically meaningful outcomes from biomedical abstracts.
ICHI

Using Generative Large Language Models for Hierarchical Relationship Prediction in Medical Ontologies

Hao Liu, Shuxin Zhou, Zhehuan Chen, Yehoshua Perl, and Jiayin Wang

In 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI) Jun 2024

Abs HTML

This study extends the exploration of ontology enrichment by evaluating the performance of various open-sourced Large Language Models (LLMs) on the task of predicting hierarchical relationships (IS-A) in medical ontologies including SNOMED CT Clinical Finding and Procedure hierarchies and the human Disease Ontology. With the previous finetuned BERT models for hierarchical relationship prediction as the baseline, we assessed eight open-source generative LLMs for the same task. We observed only three models, without finetuning, demonstrated comparable or superior performance compared to the baseline BERT -based models. The best performance model OpenChat achieved a macro average F1 score of 0.96 (0.95) on SNOMED CT Clinical Finding (Procedure) hierarchy, an increase over 7% from the baseline 0.89 (0.85). On human Disease Ontology, OpenChat excels with an F1 score of 0.91, outperforming the second-best performance model Vicuna (0.84). Notably, some LLMs prove unsuitable for hierarchical relationship prediction tasks or appliable for concept placement of medical ontologies. We also explored various prompt templates and ensemble techniques to uncover potential confounding factors in applying LLMs for IS-A relation predictions for medical ontologies.
JAMIAO

Retrieval Augmented Scientific Claim Verification

Hao Liu, Soroush Ali, Jordan G. Nestor, Elizabeth Park, Betina Idnay, Yilu Fang, and 5 more authors

JAMIA Open Jan 2024

Abs HTML

Objective: To evaluate the veracity of a PICO-based claim against clinical trial literature on PubMed. Materials and Methods: We construct CoVERt, a new Covid VERification dataset that consists of COVID-19-related PICO-compatible claims accompanied by clinical trial abstracts that either support or refute the claim. We then develop CliVER, an end-to-end scientific Claim VERification system for CoVERt. CliVER automatically selects abstracts from the clinical trial literature containing rational sentences that Support or Refute a given PICO-based claim. We further introduce an ensemble of three state-of-the-art systems for label prediction. Results: We indexed 189,648 abstracts in PubMed between January 2010 to October 2021 as our clinical literature repository. The performance of CliVER was evaluated by verifying 19 claims from six disease domains by clinicians. CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction, respectively. In the label prediction evaluation on CoVERt, CliVER achieved an F1 score of 0.92, where the ensemble of label prediction models outperforms each state-of-the-art by an absolute increase in F1 score from 3% to 11%. Conclusion: CliVER is a pioneering system that demonstrates the potential to automate PICO-based claim verification on clinical trial publications. We hope it can be leveraged to reduce the labor cost of evidence extraction for clinical research and improve the efficiency of claim verification using the massive medical literature.

2023

BIBM

Using Annotation for Computerized Support for Fast Skimming of Cardiology Electronic Health Record Notes

Mahshad Koohi H Dehkordi, Andrew J Einstein, Shuxin Zhou, Gai Elhanan, Yehoshua Perl, Vipina K Keloth, and 2 more authors

2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Dec 2023

HTML
JAMIA

The Suitability of UMLS and SNOMED-CT for Encoding Outcome Concepts

Abigail Newbury, Hao Liu, Betina Idnay, and Chunhua Weng

J. Am. Med. Inform. Assoc. Aug 2023

HTML
MIE

How Good is ChatGPT for Medication Evidence Synthesis?

Hao Liu, Yifan Peng, and Chunhua Weng

Medical Informatics Europe Apr 2023

HTML
AMIA

Can Race-sensitive Biomedical Embeddings Improve Healthcare Predictive Models?

Hao Liu, Nour Moustafa-Fahmy, Casey Ta, and Chunhua Weng

AMIA Informatics Summits Mar 2023

Abs HTML

This reproducibility study presents an algorithm to weigh in race distribution data of clinical research study samples when training biomedical embeddings. We extracted 12,864 PubMed abstracts published between January 1st, 2000 and January 1st, 2022 and weighed them based on the race distribution data extracted from their corresponding clinical trials registered on ClinicalTrials.gov. We trained Word2vec and BERT embeddings and evaluated their performance on predicting length of hospital stay (LHS) and intensive care unit (ICU) readmission using MIMIC-IV electronic health record data. We observed that models trained using race-sensitive embeddings do not consistently outperform the neutral embeddings ones when used for LHS prediction (with similar Mean Absolute Error 1.975 vs. 2.008) or ICU readmission prediction (with similar accuracy 74.61% vs. 75.17% and the same AUC 0.775), respectively. We conclude that demographic sensitive embeddings do not necessarily significantly improve the accuracy of health predictive models as previously reported in the literature.
JBI

A Data-Driven Approach to Optimizing Clinical Study Eligibility Criteria

Yilu Fang, Hao Liu, Betina Idnay, Karen Marder, and Chunhua Weng

J. Biomed. Inform. Apr 2023

Abs HTML

Objective: To evaluate the veracity of a PICO-based claim against clinical trial literature on PubMed. Materials and Methods: We construct CoVERt, a new Covid VERification dataset that consists of COVID-19-related PICO-compatible claims accompanied by clinical trial abstracts that either support or refute the claim. We then develop CliVER, an end-to-end scientific Claim VERification system for CoVERt. CliVER automatically selects abstracts from the clinical trial literature containing rational sentences that Support or Refute a given PICO-based claim. We further introduce an ensemble of three state-of-the-art systems for label prediction. Results: We indexed 189,648 abstracts in PubMed between January 2010 to October 2021 as our clinical literature repository. The performance of CliVER was evaluated by verifying 19 claims from six disease domains by clinicians. CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction, respectively. In the label prediction evaluation on CoVERt, CliVER achieved an F1 score of 0.92, where the ensemble of label prediction models outperforms each state-of-the-art by an absolute increase in F1 score from 3% to 11%. Conclusion: CliVER is a pioneering system that demonstrates the potential to automate PICO-based claim verification on clinical trial publications. We hope it can be leveraged to reduce the labor cost of evidence extraction for clinical research and improve the efficiency of claim verification using the massive medical literature.

2022

JBI

Ontology-based Categorization of Clinical Studies by Their Conditions

Hao Liu, Simona Carini, Zhehuan Chen, Spencer Phillips Hey, Ida Sim, and Chunhua Weng

J. Biomed. Inform. Nov 2022

Abs HTML Code

OBJECTIVE: The free-text Condition data field in the ClinicalTrials.gov is not amenable to computational processes for retrieving, aggregating and visualizing clinical studies by condition categories. This paper contributes a method for automated ontology-based categorization of clinical studies by their conditions. MATERIALS AND METHODS: Our method first maps text entries in ClinicalTrials.gov’s Condition field to standard condition concepts in the OMOP Common Data Model by using SNOMED CT as a reference ontology and using Usagi for concept normalization, followed by hierarchical traversal of the SNOMED ontology for concept expansion, ontology-driven condition categorization, and visualization. We compared the accuracy of this method to that of the MeSH-based method. RESULTS: We reviewed the 4,506 studies on Vivli.org categorized by our method. Condition terms of 4,501 (99.89%) studies were successfully mapped to SNOMED CT concepts, and with a minimum concept mapping score threshold, 4,428 (98.27%) studies were categorized into 31 predefined categories. When validating with manual categorization results on a random sample of 300 studies, our method achieved an estimated categorization accuracy of 95.7%, while the MeSH-based method had an accuracy of 85.0%. CONCLUSION: We showed that categorizing clinical studies using their Condition terms with referencing to SNOMED CT achieved a better accuracy and coverage than using MeSH terms. The proposed ontology-driven condition categorization was useful to create accurate clinical study categorization that enables clinical researchers to aggregate evidence from a large number of clinical studies.
MedInfo

A Sample Size Extractor for RCT Reports

Fengyang Lin, Hao Liu, Paul Moon, and Chunhua Weng

Stud. Health Technol. Inform. Jun 2022

Abs HTML

Sample size is an important indicator of the power of randomized controlled trials (RCTs). In this paper, we designed a total sample size extractor using a combination of syntactic and machine learning methods, and evaluated it on 300 Covid-19 abstracts (Covid-Set) and 100 generic RCT abstracts (General-Set). To improve the performance, we applied transfer learning from a large public corpus of annotated abstracts. We achieved an average F1 score of 0.73 on the Covid-Set testing set, and 0.60 on the General-Set using exact matches. The F1 scores for loose matches on both datasets were over 0.74. Compared with the state-of-the-art tool, our extractor reports total sample sizes directly and improved F1 scores by at least 4% without transfer learning. We demonstrated that transfer learning improved the sample size extraction accuracy and minimized human labor on annotations.
MedInfo

Evaluation of Criteria2Query: Towards Augmented Intelligence for Cohort Identification

Cong Liu, Hao Liu, Casey Ta, James Roger, Alex Butler, Junghwan Lee, and 3 more authors

Stud. Health Technol. Inform. Jun 2022

Abs HTML

Electronic healthcare records data promises to improve the efficiency of patient eligibility screening, which is an important factor in the success of clinical trials and observational studies. To bridge the sociotechnical gap in cohort identification by end-users, who are clinicians or researchers unfamiliar with underlying EHR databases, we previously developed a natural language query interface named Criteria2Query (C2Q) that automatically transforms free-text eligibility criteria to executable database queries. In this study, we present a comprehensive evaluation of C2Q to generate more actionable insights to inform the design and evaluation of future natural language user interfaces for clinical databases, towards the realization of Augmented Intelligence (AI) for clinical cohort definition via e-screening.
MedInfo

Representation and Normalization of Complex Interventions for Evidence Computing

Zhehuan Chen, Hao Liu, Stan Liao, Marguerite Bernard, Tian Kang, Latoya A Stewart, and 1 more author

Stud. Health Technol. Inform. Jun 2022

Abs HTML

Complex interventions are ubiquitous in healthcare. A lack of computational representations and information extraction solutions for complex interventions hinders accurate and efficient evidence synthesis. In this study, we manually annotated and analyzed 3,447 intervention snippets from 261 randomized clinical trial (RCT) abstracts and developed a compositional representation for complex interventions, which captures the spatial, temporal and Boolean relations between intervention components, along with an intervention normalization pipeline that automates three tasks: (i) treatment entity extraction; (ii) intervention component relation extraction; and (iii) attribute extraction and association. 361 intervention snippets from 29 unseen abstracts were included to report on the performance of the evaluation. The average F-measure was 0.74 for treatment entity extraction on an exact match and 0.82 for attribute extraction. The F-measure for relation extraction of multi-component complex interventions was 0.90. 93% of extracted attributes were correctly attributed to corresponding treatment entities.
MedInfo

Data-Driven Modeling of Randomized Controlled Trial Outcomes

Zhehuan Chen, Yilu Fang, Hao Liu, and Chunhua Weng

Stud. Health Technol. Inform. May 2022

Abs HTML

Anecdotally, 38.5% of clinical outcome descriptions in randomized controlled trial publications contain complex text. Existing terminologies are insufficient to standardize outcomes and their measures, temporal attributes, quantitative metrics, and other attributes. In this study, we analyzed the semantic patterns in the outcome text in a sample of COVID-19 trials and presented a data-driven method for modeling outcomes. We conclude that a data-driven knowledge representation can benefit natural language processing of outcome text from published clinical studies.
MedInfo

Extending PICO with Observation Normalization for Evidence Computing

Ali Turfah, Hao Liu, Latoya A Stewart, Tian Kang, and Chunhua Weng

Stud. Health Technol. Inform. Jun 2022

Abs HTML

While the PICO framework is widely used by clinicians for clinical question formulation when querying the medical literature, it does not have the expressiveness to explicitly capture medical findings based on any standard. In addition, findings extracted from the literature are represented as free-text, which is not amenable to computation. This research extends the PICO framework with Observation elements, which capture the observed effect that an Intervention has on an Outcome, forming Intervention-Observation-Outcome triplets. In addition, we present a framework to normalize Observation elements with respect to their significance and the direction of the effect, as well as a rule-based approach to perform the normalization of these attributes. Our method achieves macro-averaged F1 scores of 0.82 and 0.73 for identifying the significance and direction attributes, respectively.
JAMIA

Combining Human and Machine Intelligence for Clinical Trial Eligibility Querying

Yilu Fang, Betina Idnay, Yingcheng Sun, Hao Liu, Zhehuan Chen, Karen Marder, and 3 more authors

J. Am. Med. Inform. Assoc. Jun 2022

Abs HTML

OBJECTIVE: To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries. MATERIALS AND METHODS: Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of enhanced modules for negation scope detection, temporal and value normalization were evaluated using a previously curated gold standard, the annotated eligibility criteria of 1010 COVID-19 clinical trials. The usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer’s disease trials. Data were collected by user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire. RESULTS: The accuracies of negation scope detection, temporal and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made for a clinical study. Experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5). DISCUSSION AND CONCLUSION: Features to engage domain experts and to overcome the limitations in automated machine output are shown to be useful and user-friendly. We concluded that human-computer collaboration is key to improving the adoption and user-friendliness of natural language processing.

2021

JBI

A Knowledge Base of Clinical Trial Eligibility Criteria

Hao Liu, Chi Yuan, Alex Butler, Yingcheng Sun, and Chunhua Weng

J. Biomed. Inform. May 2021

Abs HTML

OBJECTIVE: We present the Clinical Trial Knowledge Base, a regularly updated knowledge base of discrete clinical trial eligibility criteria equipped with a web-based user interface for querying and aggregate analysis of common eligibility criteria. MATERIALS AND METHODS: We used a natural language processing (NLP) tool named Criteria2Query (Yuan et al., 2019) to transform free text clinical trial eligibility criteria from ClinicalTrials.gov into discrete criteria concepts and attributes encoded using the widely adopted Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and stored in a relational SQL database. A web application accessible via RESTful APIs was implemented to enable queries and visual aggregate analyses. We demonstrate CTKB’s potential role in EHR phenotype knowledge engineering using ten validated phenotyping algorithms. RESULTS: At the time of writing, CTKB contained 87,504 distinctive OMOP CDM standard concepts, including Condition (47.82%), Drug (23.01%), Procedure (13.73%), Measurement (24.70%) and Observation (5.28%), with 34.78% for inclusion criteria and 65.22% for exclusion criteria, extracted from 352,110 clinical trials. The average hit rate of criteria concepts in eMERGE phenotype algorithms is 77.56%. CONCLUSION: CTKB is a novel comprehensive knowledge base of discrete eligibility criteria concepts with the potential to enable knowledge engineering for clinical trial cohort definition, clinical trial population representativeness assessment, electronical phenotyping, and data gap analyses for using electronic health records to support clinical trial recruitment.
MedInfo

Potential Role of Clinical Trial Eligibility Criteria in Electronic Phenotyping

Zhehuan Chen, Hao Liu, Alex Butler, Anna Ostropolets, and Chunhua Weng

Stud. Health Technol. Inform. May 2021

Abs HTML

2,719 distinctive phenotyping variables from 176 electronic phenotypes were compared with 57,150 distinctive clinical trial eligibility criteria concepts to assess the phenotype knowledge overlap between them. We observed a high percentage (69.5%) of eMERGE phenotype features and a lower percentage (47.6%) of OHDSI phenotype features matched to clinical trial eligibility criteria, possibly due to the relative emphasis on specificity for eMERGE phenotypes and the relative emphasis on sensitivity for OHDSI phenotypes. The study results show the potential of reusing clinical trial eligibility criteria for phenotyping feature selection and moderate benefits of using them for local cohort query implementation.
MedInfo

Participatory Design of a Clinical Trial Eligibility Criteria Simplification Method

Yilu Fang, Jae Hyun Kim, Betina Ross Idnay, Rebeca Aragon Garcia, Carmen E Castillo, Yingcheng Sun, and 4 more authors

Stud. Health Technol. Inform. May 2021

Abs HTML

Clinical trial eligibility criteria are important for selecting the right participants for clinical trials. However, they are often complex and not computable. This paper presents the participatory design of a human-computer collaboration method for criteria simplification that includes natural language processing followed by user-centered eligibility criteria simplification. A case study on the ARCADIA trial shows how criteria were simplified for structured database querying by clinical researchers and identifies rules for criteria simplification and concept normalization.
JAMIA

Misalignment Between COVID-19 Hotspots and Clinical Trial Sites

Lauren Franks, Hao Liu, Mitchell S V Elkind, Muredach P Reilly, Chunhua Weng, and Shing M Lee

J. Am. Med. Inform. Assoc. Oct 2021

Abs HTML

Hundreds of interventional clinical trials have been launched in the United States to identify effective treatment strategies for combating the coronavirus disease 2019 (COVID-19) pandemic. However, to date, only a small fraction of these trials have completed enrollment, delaying the scientific investigation of COVID-19 and its treatment options. This study presents novel metrics to examine the geographic alignment between COVID-19 hotspots and interventional clinical trial sites and evaluate trial access over time during the evolving pandemic. Using temporal COVID-19 case data from USAFacts.org and trial data from ClinicalTrials.gov, U.S. counties were categorized based on their numbers of cases and trials. Our analysis suggests that alignment and access have worsened as the pandemic shifted over time. We recommend strategies and metrics to evaluate the alignment between cases and trials. Future studies are warranted to investigate the impact of the misalignment of cases and clinical trial sites on clinical trial recruitment.
JBI

Visual Comprehension and Orientation into the COVID-19 CIDO Ontology

Ling Zheng, Yehoshua Perl, Yongqun He, Christopher Ochs, James Geller, Hao Liu, and 1 more author

J. Biomed. Inform. Aug 2021

Abs

The current intensive research on potential remedies and vaccinations for COVID-19 would greatly benefit from an ontology of standardized COVID terms. The Coronavirus Infectious Disease Ontology (CIDO) is the largest among several COVID ontologies, and it keeps growing, but it is still a medium sized ontology. Sophisticated CIDO users, who need more than searching for a specific concept, require orientation and comprehension of CIDO. In previous research, we designed a summarization network called “partial-area taxonomy” to support comprehension of ontologies. The partial-area taxonomy for CIDO is of smaller magnitude than CIDO, but is still too large for comprehension. We present here the “weighted aggregate taxonomy” of CIDO, designed to provide compact views at various granularities of our partial-area taxonomy (and the CIDO ontology). Such a compact view provides a “big picture” of the content of an ontology. In previous work, in the visualization patterns used for partial-area taxonomies, the nodes were arranged in levels according to the numbers of relationships of their concepts. Applying this visualization pattern to CIDO’s weighted aggregate taxonomy resulted in an overly long and narrow layout that does not support orientation and comprehension since the names of nodes are barely readable. Thus, we introduce in this paper an innovative visualization of the weighted aggregate taxonomy for better orientation and comprehension of CIDO (and other ontologies). A measure for the efficiency of a layout is introduced and is used to demonstrate the advantage of the new layout over the previous one. With this new visualization, the user can “see the forest for the trees” of the ontology. Benefits of this visualization in highlighting insights into CIDO’s content are provided. Generality of the new layout is demonstrated.
ACI

A Framework for Systematic Assessment of Clinical Trial Population Representativeness Using Electronic Health Records Data

Yingcheng Sun, Alex Butler, Ibrahim Diallo, Jae Hyun Kim, Casey Ta, James R Rogers, and 2 more authors

Appl. Clin. Inform. Aug 2021

Abs

BACKGROUND: Clinical trials are the gold standard for generating robust medical evidence, but clinical trial results often raise generalizability concerns, which can be attributed to the lack of population representativeness. The electronic health records (EHRs) data are useful for estimating the population representativeness of clinical trial study population. OBJECTIVES: This research aims to estimate the population representativeness of clinical trials systematically using EHR data during the early design stage. METHODS: We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness for each clinical trial. RESULTS: We calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States respectively using this framework. With the use of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of T2DM trials had poor population representativeness. CONCLUSION: This research demonstrates the potential of using the EHR data to assess the clinical trials population representativeness, providing data-driven metrics to inform the selection and optimization of eligibility criteria.
AMIA

A Comparison between Human and NLP-based Annotation of Clinical Trial Eligibility Criteria Text Using The OMOP Common Data Model

Xinhang Li, Hao Liu, Fabrı́cio Kury, Chi Yuan, Alex Butler, Yingcheng Sun, and 3 more authors

AMIA Jt Summits Transl Sci Proc May 2021

Abs HTML

Human annotations are the established gold standard for evaluating natural language processing (NLP) methods. The goals of this study are to quantify and qualify the disagreement between human and NLP. We developed an NLP system for annotating clinical trial eligibility criteria text and constructed a manually annotated corpus, both following the OMOP Common Data Model (CDM). We analyzed the discrepancies between the human and NLP annotations and their causes (e.g., ambiguities in concept categorization and tacit decisions on inclusion of qualifiers and temporal attributes during concept annotation). This study initially reported complexities in clinical trial eligibility criteria text that complicate NLP and the limitations of the OMOP CDM. The disagreement between and human and NLP annotations may be generalizable. We discuss implications for NLP evaluation.
JAMIA

The COVID-19 Trial Finder

Yingcheng Sun, Alex Butler, Fengyang Lin, Hao Liu, Latoya A Stewart, Jae Hyun Kim, and 6 more authors

J. Am. Med. Inform. Assoc. Mar 2021

Abs HTML

Clinical trials are the gold standard for generating reliable medical evidence. The biggest bottleneck in clinical trials is recruitment. To facilitate recruitment, tools for patient search of relevant clinical trials have been developed, but users often suffer from information overload. With nearly 700 coronavirus disease 2019 (COVID-19) trials conducted in the United States as of August 2020, it is imperative to enable rapid recruitment to these studies. The COVID-19 Trial Finder was designed to facilitate patient-centered search of COVID-19 trials, first by location and radius distance from trial sites, and then by brief, dynamically generated medical questions to allow users to prescreen their eligibility for nearby COVID-19 trials with minimum human computer interaction. A simulation study using 20 publicly available patient case reports demonstrates its precision and effectiveness.
JBI

Building an OMOP Common Data Model-compliant Annotated Corpus for COVID-19 Clinical Trials

Yingcheng Sun, Alex Butler, Latoya A Stewart, Hao Liu, Chi Yuan, Christopher T Southard, and 2 more authors

J. Biomed. Inform. Jun 2021

Abs

Clinical trials are essential for generating reliable medical evidence, but often suffer from expensive and delayed patient recruitment because the unstructured eligibility criteria description prevents automatic query generation for eligibility screening. In response to the COVID-19 pandemic, many trials have been created but their information is not computable. We included 700 COVID-19 trials available at the point of study and developed a semi-automatic approach to generate an annotated corpus for COVID-19 clinical trial eligibility criteria called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: i.e., study cohort, eligibility criteria, named entity and standard concept. In COVIC, 39 trials with more than one study cohorts were identified and labelled with an identifier for each cohort. 1,943 criteria for non-clinical characteristics such as “informed consent”, “exclusivity of participation” were annotated. 9767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types and 16,443 relationships of 11 relationship types. 17,171 entities were mapped to standard medical concepts and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and a benchmark for machine learning based criteria extraction.
JAMIA

Towards Clinical Data-driven Eligibility Criteria Optimization for Interventional COVID-19 Clinical Trials

Jae Hyun Kim, Casey N Ta, Cong Liu, Cynthia Sung, Alex M Butler, Latoya A Stewart, and 9 more authors

J. Am. Med. Inform. Assoc. Jan 2021

Abs HTML

OBJECTIVE: This research aims to evaluate the impact of eligibility criteria on recruitment and observable clinical outcomes of COVID-19 clinical trials using electronic health record (EHR) data. MATERIALS AND METHODS: On June 18, 2020, we identified frequently used eligibility criteria from all the interventional COVID-19 trials in ClinicalTrials.gov (n = 288), including age, pregnancy, oxygen saturation, alanine/aspartate aminotransferase, platelets, and estimated glomerular filtration rate. We applied the frequently used criteria to the EHR data of COVID-19 patients in Columbia University Irving Medical Center (CUIMC) (March 2020-June 2020) and evaluated their impact on patient accrual and the occurrence of a composite endpoint of mechanical ventilation, tracheostomy, and in-hospital death. RESULTS: There were 3251 patients diagnosed with COVID-19 from the CUIMC EHR included in the analysis. The median follow-up period was 10 days (interquartile range 4-28 days). The composite events occurred in 18.1% (n = 587) of the COVID-19 cohort during the follow-up. In a hypothetical trial with common eligibility criteria, 33.6% (690/2051) were eligible among patients with evaluable data and 22.2% (153/690) had the composite event. DISCUSSION: By adjusting the thresholds of common eligibility criteria based on the characteristics of COVID-19 patients, we could observe more composite events from fewer patients. CONCLUSIONS: This research demonstrated the potential of using the EHR data of COVID-19 patients to inform the selection of eligibility criteria and their thresholds, supporting data-driven optimization of participant selection towards improved statistical power of COVID-19 trials.

2020

BMC

Missing Lateral Relationships in Top-level Concepts of an Ontology

Ling Zheng, Yan Chen, Hua Min, P Lloyd Hildebrand, Hao Liu, Michael Halper, and 3 more authors

BMC Med. Inform. Decis. Mak. Dec 2020

Abs HTML

BACKGROUND: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored. METHODS: The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting missing relationship errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed to be anomalous, is used as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of our Quality Assurance approach are studied. RESULTS: Our Quality Assurance methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and SNOMED CT’s Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, our Quality Assurance methodology yielded a statistically significantly higher number of concepts with missing relationship errors in comparison to a control sample of concepts. Two hypotheses are confirmed by these findings. CONCLUSIONS: Quality assurance is a critical part of an ontology’s lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a Quality Assurance methodology targeted at missing relationship errors. Its successful application to the NCIt’s Biological Process hierarchy and SNOMED CT’s Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.
JBI

Concept Placement using BERT Trained by Transforming and Summarizing Biomedical Ontology Structure

Hao Liu, Yehoshua Perl, and James Geller

J. Biomed. Inform. Dec 2020

Abs HTML

The comprehensive modeling and hierarchical positioning of a new concept in an ontology heavily relies on its set of proper subsumption relationships (IS-As) to other concepts. Identifying a concept’s IS-A relationships is a laborious task requiring curators to have both domain knowledge and terminology skills. In this work, we propose a method to automatically predict the presence of IS-A relationships between a new concept and pre-existing concepts based on the language representation model BERT. This method converts the neighborhood network of a concept into “sentences” and harnesses BERT’s Next Sentence Prediction (NSP) capability of predicting the adjacency of two sentences. To augment our method’s performance, we refined the training data by employing an ontology summarization technique. We trained our model with the two largest hierarchies of the SNOMED CT 2017 July release and applied it to predicting the parents of new concepts added in the SNOMED CT 2018 January release. The results showed that our method achieved an average F1 score of 0.88, and the average Recall score improves slightly from 0.94 to 0.96 by using the ontology summarization technique.
SciData

Chia, A Large Annotated Corpus of Clinical Trial Eligibility Criteria

Fabrı́cio Kury, Alex Butler, Chi Yuan, Li-Heng Fu, Yingcheng Sun, Hao Liu, and 3 more authors

Scientific Data Aug 2020

Abs HTML

We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.

2019

AMIA

Training a Convolutional Neural Network with Terminology Summarization Data Improves SNOMED CT Enrichment

Ling Zheng, Hao Liu, Yehoshua Perl, and James Geller

AMIA Annu. Symp. Proc. Aug 2019

Abs HTML

As a step toward learning to automatically insert new concepts into a large biomedical ontology, we are studying the easier problem of automatically verifying that an IS-A link should exist between a new child concept and an existing parent concept. We are using a Convolutional Neural Network, a powerful machine learning method. However, results depend on the quality of the training data. We use SNOMED CT (July 2017) for training and the subsequent release for testing. The main problem is to find a good set of negative training data. We experiment with two approaches, based on uncle-nephew (not connected) pairs of concepts. We contrast using the complete Clinical Finding hierarchy of SNOMED CT with using the powerful Area Taxonomy ontology summarization mechanism to constrain the training data. The results for the task of verifying IS-A links are improved by 8.6% when going from the complete hierarchy to the Area Taxonomy.
AMIA

Transfer Learning from BERT to Support Insertion of New Concepts into SNOMED CT

Hao Liu, Yehoshua Perl, and James Geller

AMIA Annu. Symp. Proc. Aug 2019

Abs HTML

With advances in Machine Learning (ML), neural network-based methods, such as Convolutional/Recurrent Neural Networks, have been proposed to assist terminology curators in the development and maintenance of terminologies. Bidirectional Encoder Representations from Transformers (BERT), a new language representation model, obtains state-of-the-art results on a wide array of general English NLP tasks. We explore BERT’s applicability to medical terminology-related tasks. Utilizing the “next sentence prediction” capability of BERT, we show that the Fine-tuning strategy of Transfer Learning (TL) from the BERTBASE model can address a challenging problem in automatic terminology enrichment - insertion of new concepts. Adding a pre-training strategy enhances the results. We apply our strategies to the two largest hierarchies of SNOMED CT, with one release as training data and the following release as test data. The performance of the combined two proposed TL models achieves an average F1 score of 0.85 and 0.86 for the two hierarchies, respectively.

2018

AMIA

Overlapping Complex Concepts Have More Commission Errors, Especially in Intensive Terminology Auditing

Ling Zheng, Hao Liu, Yehoshua Perl, James Geller, Christopher Ochs, and James T Case

AMIA Annu. Symp. Proc. Dec 2018

Abs HTML

SNOMED CT is a large, complex and widely-used terminology. Auditing is part of the life cycle of terminologies. A review of terminologies’ content can identify two error categories: commission errors, such as an incorrect parent or attribute relationship, indicating errors in a concept’s modeling, and omission errors, such as missing a parent or attribute relationship, representing incomplete modeling of a concept. According to our experience, terminology curators are mostly interested in commission errors. In recent years, a long-term remodeling project has addressed modeling issues in SNOMED CT’s Infectious disease and Congenital disease subhierarchies. In this longitudinal study, we investigated a posteriori the efficacy of complex concepts, called overlapping concepts, to identify commission errors during intensive auditing periods and during maintenance periods over several releases. The algorithmic implication is that when auditing resources are scarce, a methodology of auditing first, or only, the overlapping concepts will obtain a higher auditing yield.
ICBO

A Quality Assurance Methodology for ChEBI Ontology Focusing on Uncommonly Modeled Concepts

Hao Liu, Ling Chen, Ling Zheng, Yehoshua Perl, and James Geller

International Conference on Biomedical Ontology (ICBO) Dec 2018

Abs

The Chemical Entities of Biological Interest (ChEBI) ontology is an important knowledge source of chemical entities in a biological context. ChEBI is large and complex, making it …
IEEE BIBM

Enrichment of SNOMED CT Ophthalmology Component to Support EHR Coding

Hao Liu, P Lloyd Hildebrand, Yehoshua Perl, and James Geller

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Dec 2018

Abs

The US government has offered major financial incentives to encourage MDs, including ophthalmologists, to adopt Electronic Health Records (EHRs) in their practices. SNOMED …
ICBO

Can a Convolutional Neural Network Support Auditing of NCI Thesaurus Neoplasm Concepts?

Hao Liu, Ling Zheng, Yehoshua Perl, James Geller, and Gai Elhanan

International Conference on Biomedical Ontology (ICBO) Dec 2018

Abs

Paper Title (use style: paper title) Page 1 Can a Convolutional Neural Network Support Auditing of NCI Thesaurus Neoplasm Concepts? Hao Liu1, Ling Zheng2, Yehoshua Perl1, James …
AMIA

Using Convolutional Neural Networks to Support Insertion of New Concepts into SNOMED CT

Hao Liu, James Geller, Michael Halper, and Yehoshua Perl

AMIA Annu. Symp. Proc. Dec 2018

Abs HTML

Many major medical ontologies go through a regular (bi-annual, monthly, etc.) release cycle. A new release will contain corrections to the previous release, as well as genuinely new concepts that are the result of either user requests or new developments in the domain. New concepts need to be placed at the correct place in the ontology hierarchy. Traditionally, this is done by an expert modeling a new concept and running a classifier algorithm. We propose an alternative approach that is based on providing only the name of a new concept and using a Convolutional Neural Network-based machine learning method. We first tested this approach within one version of SNOMED CT and achieved an average 88.5% precision and an F1 score of 0.793. In comparing the July 2017 release with the January 2018 release, limiting ourselves to predicting one out of two or more parents, our average F1 score was 0.701.

2017

MedInfo

Correcting Ontology Errors Simplifies Visual Complexity

Hao Liu, Ling Zheng, Yehoshua Perl, Yan Chen, and Gai Elhanan

Stud. Health Technol. Inform. Dec 2017

Abs HTML

In previous research we have shown that hierarchically complex overlapping concepts have a higher error rate of errors versus control concepts. In this poster we show an exmaple from Neoplasm concepts of the NCI thesaurus (NCIt) demonstrating that erroneous overplapping concepts, reflected in the partial-area units of a partial-area taxonomy, display visual complexity. Furthermore, correcting these erroneous concepts causes visual simplification.
IEEE

Multi-layer Big Knowledge Visualization Scheme for Comprehending Neoplasm Ontology Content

Ling Zheng, Christopher Ochs, James Geller, Hao Liu, Yehoshua Perl, and Sherri Coronado

IEEE International Conference on Big Knowledge (ICBK) Dec 2017

Abs

Big Knowledge repositories, in the form of large ontologies, typically consist of many thousands of knowledge assertions. They have complex network structures consisting of …
AIM

From SNOMED CT to Uberon: Transferability of Evaluation Methodology between Similarly Structured Ontologies

Gai Elhanan, Christopher Ochs, Jose L V Mejino, Hao Liu, Christopher J Mungall, and Yehoshua Perl

Artif. Intell. Med. Jun 2017

Abs

OBJECTIVE: To examine whether disjoint partial-area taxonomy, a semantically-based evaluation methodology that has been successfully tested in SNOMED CT, will perform with similar effectiveness on Uberon, an anatomical ontology that belongs to a structurally similar family of ontologies as SNOMED CT. METHOD: A disjoint partial-area taxonomy was generated for Uberon. One hundred randomly selected test concepts that overlap between partial-areas were matched to a same size control sample of non-overlapping concepts. The samples were blindly inspected for non-critical issues and presumptive errors first by a general domain expert whose results were then confirmed or rejected by a highly experienced anatomical ontology domain expert. Reported issues were subsequently reviewed by Uberon’s curators. RESULTS: Overlapping concepts in Uberon’s disjoint partial-area taxonomy exhibited a significantly higher rate of all issues. Clear-cut presumptive errors trended similarly but did not reach statistical significance. A sub-analysis of overlapping concepts with three or more relationship types indicated a much higher rate of issues. CONCLUSIONS: Overlapping concepts from Uberon’s disjoint abstraction network are quite likely (up to 28.9%) to exhibit issues. The results suggest that the methodology can transfer well between same family ontologies. Although Uberon exhibited relatively few overlapping concepts, the methodology can be combined with other semantic indicators to expand the process to other concepts within the ontology that will generate high yields of discovered issues.

2014

IEEE

Microblogging as a Social Sensing Tool

Hao Liu, Ziqian Dong, and Huanying Gu

11th IEEE International Conference on Networking, Sensing and Control Jun 2014

Abs

Microblogging has become a popular communication tool among Internet users. Millions of users retrieve information and share opinions on different aspects of their daily lives using …