TY - JOUR
T1 - A Transformer-Based Model Trained on Large Scale Claims Data for Prediction of Severe COVID-19 Disease Progression
AU - Lentzen, Manuel
AU - Linden, Thomas
AU - Veeranki, Sai
AU - Madan, Sumit
AU - Kramer, Diether
AU - Leodolter, Werner
AU - Fröhlich, Holger
N1 - The work is licensed under CC BY 4.0 Deed https://creativecommons.org/licenses/by/4.0/
PY - 2023/6/22
Y1 - 2023/6/22
N2 - In situations like the COVID-19 pandemic, healthcare systems are under enormous pressure, as they can rapidly collapse under the burden of the crisis. Machine learning (ML)-based risk models could lift this burden by identifying patients at high risk of severe disease progression. Electronic Health Records (EHRs) are a crucial source of information for developing such models because they consist of routinely collected healthcare data. However, EHR data is challenging for training ML models because it contains irregularly timestamped diagnosis, prescription, and procedure codes. For such data, transformer-based models are promising. We extended the previously published Med-BERT model by including age, sex, medications, quantitative clinical measures, and state information. After pre-training on approximately 988 million EHRs from 3.5 million patients, we developed models to predict Acute Respiratory Manifestations (ARM) risk using the medical history of 80,211 COVID-19 patients. Compared to Random Forests, XGBoost, and RETAIN, our transformer-based models more accurately forecast the risk of developing ARM after COVID-19 infection. We used Integrated Gradients and Bayesian networks to understand the links between the essential features of our model. Finally, we evaluated the adaptation of our model to Austrian in-patient data.
AB - In situations like the COVID-19 pandemic, healthcare systems are under enormous pressure, as they can rapidly collapse under the burden of the crisis. Machine learning (ML)-based risk models could lift this burden by identifying patients at high risk of severe disease progression. Electronic Health Records (EHRs) are a crucial source of information for developing such models because they consist of routinely collected healthcare data. However, EHR data is challenging for training ML models because it contains irregularly timestamped diagnosis, prescription, and procedure codes. For such data, transformer-based models are promising. We extended the previously published Med-BERT model by including age, sex, medications, quantitative clinical measures, and state information. After pre-training on approximately 988 million EHRs from 3.5 million patients, we developed models to predict Acute Respiratory Manifestations (ARM) risk using the medical history of 80,211 COVID-19 patients. Compared to Random Forests, XGBoost, and RETAIN, our transformer-based models more accurately forecast the risk of developing ARM after COVID-19 infection. We used Integrated Gradients and Bayesian networks to understand the links between the essential features of our model. Finally, we evaluated the adaptation of our model to Austrian in-patient data.
KW - COVID-19
KW - Precision Medicine
KW - transformer-based models
KW - Humans
KW - Pandemics
KW - Bayes Theorem
KW - Machine Learning
KW - Disease Progression
KW - Electronic Health Records
UR - https://www.mendeley.com/catalogue/b5a6e201-b0a0-3d08-8e56-79e1c8afdf51/
U2 - 10.1109/JBHI.2023.3288768
DO - 10.1109/JBHI.2023.3288768
M3 - Article
C2 - 37347632
SN - 2168-2194
VL - 27
SP - 4548
EP - 4558
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 9
ER -