RareBERT: Transformer Architecture for Rare Disease Patient Identification using Administrative Claims

Abstract

A rare disease is any disease that affects a very small fraction of the population (roughly 1 in 1,500 people). It is estimated that there are nearly 7,000 rare diseases affecting 30 million patients in the U.S. alone. Most patients suffering from rare diseases experience multiple misdiagnoses and may never be diagnosed correctly. This is largely driven by the low prevalence of these diseases, which results in a lack of awareness among healthcare providers. Machine learning researchers have developed predictive models to help diagnose patients using healthcare datasets such as electronic health records (EHR) and administrative claims. Most recently, transformer models such as BEHRT, G-BERT, and Med-BERT have been applied to disease prediction. However, these models were developed specifically for EHR data and were not designed to address rare disease challenges such as class imbalance, partial longitudinal data capture, and noisy labels. As a result, they perform poorly at predicting rare diseases compared with baselines. Moreover, EHR datasets are generally confined to the hospital systems that generate them and do not capture a wider sample of patients, limiting the number of rare disease patients available in a dataset. To address these challenges, we introduce RareBERT, an extension of the BERT model tailored for rare disease diagnosis and trained on administrative claims data. RareBERT extends Med-BERT with a context embedding and a temporal reference embedding. We also introduce a novel adaptive loss function to handle the class imbalance. In this paper, we present experiments on diagnosing X-Linked Hypophosphatemia (XLH), a rare genetic disease. RareBERT not only performs significantly better than the baseline models (79.9% AUPRC versus 30% AUPRC for Med-BERT), owing to the transformer architecture, but is also robust to partial longitudinal data capture caused by poor capture of claims, with a performance drop of only 1.35% AUPRC, compared with 12% for Med-BERT, 33.0% for an LSTM, and 67.4% for a boosting-trees baseline.
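The abstract names two architectural additions over Med-BERT (a context embedding and a temporal reference embedding) and an adaptive loss for class imbalance, but does not spell out their formulations. The PyTorch sketch below is purely illustrative of those two ideas: the module names, dimensions, and the focal-style reweighting are assumptions for exposition, not the authors' actual implementation.

```python
# Illustrative sketch only -- the real RareBERT embedding and loss are
# defined in the paper; every name and formula below is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClaimsInputEmbedding(nn.Module):
    """Sums code, context, and temporal-reference embeddings per claim
    token, by analogy with BERT's token + segment + position embeddings."""
    def __init__(self, vocab_size, n_contexts, max_time_steps, dim):
        super().__init__()
        self.code_emb = nn.Embedding(vocab_size, dim)     # medical code (dx/rx/px)
        self.context_emb = nn.Embedding(n_contexts, dim)  # hypothetical claim context
        self.time_emb = nn.Embedding(max_time_steps, dim) # time vs. a reference date
        self.norm = nn.LayerNorm(dim)

    def forward(self, codes, contexts, time_steps):
        x = (self.code_emb(codes)
             + self.context_emb(contexts)
             + self.time_emb(time_steps))
        return self.norm(x)

def adaptive_imbalance_loss(logits, labels, gamma=2.0, pos_weight=None):
    """Focal-style binary loss that down-weights easy negatives; a
    stand-in for the paper's adaptive loss, whose exact form differs."""
    bce = F.binary_cross_entropy_with_logits(
        logits, labels.float(), pos_weight=pos_weight, reduction="none")
    p_t = torch.exp(-bce)  # model's confidence in the true class
    return ((1.0 - p_t) ** gamma * bce).mean()
```

In a setup like this, rare-disease positives stay influential because confidently classified negatives contribute little gradient; the paper's adaptive loss pursues the same goal, though its specific weighting scheme is not reproduced here.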

Citation (APA)

Prakash, P. K. S., Chilukuri, S., Ranade, N., & Viswanathan, S. (2021). RareBERT: Transformer Architecture for Rare Disease Patient Identification using Administrative Claims. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 1, pp. 453–460). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i1.16122
