Rheumatoid arthritis (RA) is the most common inflammatory arthritis, affecting approximately 1% of the population. It is an autoimmune condition that causes significant joint destruction and morbidity. Machine learning (ML) has the potential to identify patterns in patients' electronic health records (EHR) and to forecast which clinical treatment is most likely to improve outcomes. This study introduces a Drug Response Prediction (DRP) framework with two main goals: 1) to design a data processing pipeline that extracts information from tabular clinical data and preprocesses it for downstream modeling, and 2) to predict RA patients' responses to drugs and evaluate the performance of classification models. We propose a novel two-stage ML framework, based on the European Alliance of Associations for Rheumatology (EULAR) response criteria cutoffs, to model drug effectiveness. In the first stage, ML models regress the change in the Disease Activity Score in 28 joints (ΔDAS28) for bio-naïve patients initiating anti-tumor necrosis factor (TNF) treatment; in the second stage, each patient's drug response is classified by applying thresholds to the predicted ΔDAS28 score. We show empirically that this division into subtasks significantly improves the accuracy of predicting drug effectiveness in RA patients. Furthermore, regressing ΔDAS28 makes the model more interpretable to health care providers, while classifying the change between the baseline and 3-month DAS28 scores yields an easy-to-understand binary recommendation. Our model, Stacked-Ensemble DRP, was developed and cross-validated using data from 425 RA patients and evaluated on a test subset of 124 patients (about 30%) from the same data source. On this test set, the two-stage DRP achieved higher binary classification accuracy than end-to-end classification models.
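The two-stage idea can be illustrated in a few lines of Python. This is a minimal sketch, not the released Stacked-Ensemble DRP code. It makes three assumptions. First, ΔDAS28 is defined as the baseline DAS28 minus the 3-month DAS28, so positive values mean improvement. Second, stage 2 applies the standard EULAR DAS28 response cutoffs: an improvement greater than 1.2 or greater than 0.6, combined with an attained DAS28 of at most 3.2 or at most 5.1. Third, a "good" or "moderate" EULAR response counts as a responder. The names `eular_response`, `two_stage_predict`, and the toy regressor are hypothetical. Any fitted stage-1 model could take the toy's place.

```python
from typing import Callable, Sequence

# A patient's predictors as a flat list of floats (hypothetical representation).
Features = Sequence[float]
# Stage-1 model: maps features to a predicted ΔDAS28 (any fitted regressor fits here).
Regressor = Callable[[Features], float]


def eular_response(baseline_das28: float, delta_das28: float) -> str:
    """Map a DAS28 change to a EULAR response category.

    Assumes delta_das28 = baseline DAS28 - 3-month DAS28, so a positive
    value means disease activity improved.
    """
    current = baseline_das28 - delta_das28  # implied 3-month DAS28
    if delta_das28 > 1.2:
        return "good" if current <= 3.2 else "moderate"
    if delta_das28 > 0.6:
        return "moderate" if current <= 5.1 else "none"
    return "none"


def two_stage_predict(regressor: Regressor, features: Features,
                      baseline_das28: float) -> bool:
    """Stage 1 regresses ΔDAS28; stage 2 thresholds it into responder / non-responder.

    Treating a EULAR "good" or "moderate" response as a responder is an assumption.
    """
    delta_hat = regressor(features)                        # stage 1: predicted ΔDAS28
    category = eular_response(baseline_das28, delta_hat)   # stage 2: EULAR cutoffs
    return category != "none"


if __name__ == "__main__":
    # Toy stand-in for a fitted regressor: returns the first feature as ΔDAS28.
    toy = lambda x: x[0]
    print(two_stage_predict(toy, [1.5, 0.0], baseline_das28=4.0))  # True
```

Stage 2 is a fixed rule rather than a learned classifier. As a result, the predicted ΔDAS28 stays visible to the clinician alongside the binary label, which is the interpretability argument made above.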
Our proposed method provides a complete pipeline to predict disease activity scores and identify the group that does not respond well to anti-TNF treatments, thus showing promise in supporting clinical decisions based on EHR information. The code is open source and is available on GitHub: https://github.com/Gaskell-1206/Ensemble_DRP.