Introduction: The overwhelming majority of patients diagnosed with prostate cancer will die from a competing cause. As a result, predicting mortality is an essential consideration in disease management. While several predictive models have been developed, relatively few urologists use these tools. Models with fewer inputs that accurately predict other cause mortality (OCM) may better promote usability. We aimed to identify the most influential variables for predicting OCM for men with prostate cancer.
Methods: Using SEER-CAHPS data, we identified men 65 years and older diagnosed with prostate cancer from 2004 to 2013. We then identified 76 candidate input variables spanning patient demographics, cancer information, claims-based health indicators, and patient-reported health measures. Next, we applied Least Absolute Shrinkage and Selection Operator (LASSO) regression, a machine-learning technique, to identify the core subset of variables predictive of OCM. Models were selected based on the Schwarz Bayesian information criterion (SBC), with lower values indicating better performance.
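As a rough illustration of the selection step described in Methods, the sketch below fits a LASSO path by coordinate descent and keeps the penalty value with the lowest BIC. It is a simplified stand-in, not the analysis used in this study: it assumes a continuous outcome with a Gaussian likelihood rather than a time-to-event model, it runs on synthetic data with 8 hypothetical candidate variables, and the function names (soft_threshold, lasso_fit, bic) and the penalty grid are assumptions chosen for the example.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes centered columns and a centered outcome (no intercept).
    Returns the coefficients and the residual sum of squares.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals when every coefficient is zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    rss = sum(r * r for r in resid)
    return beta, rss


def bic(rss, n, k):
    """Gaussian-likelihood BIC up to a constant; lower is better."""
    return n * math.log(rss / n) + k * math.log(n)


# Synthetic data (assumption): only variables 0 and 1 carry signal.
rng = random.Random(0)
n, p = 200, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]

# Center columns and outcome so no intercept is needed.
col_means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

# Fit over a small penalty grid and record BIC for each fit.
results = []
for lam in [0.5, 0.2, 0.1, 0.05, 0.02, 0.01]:
    beta, rss = lasso_fit(X, y, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    results.append((bic(rss, n, len(selected)), lam, selected))

best_bic, best_lam, best_selected = min(results)
print("penalty with lowest BIC:", best_lam)
print("selected variable indices:", best_selected)
```

A larger penalty drops more variables, while the BIC term k*log(n) charges for each variable retained, so the chosen model balances fit against the number of inputs. A survival analysis would replace the squared-error loss with the appropriate partial likelihood.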
Results: Among 3,240 men diagnosed with prostate cancer, 246 (7.62%) died of prostate cancer and 631 (19.48%) died of other causes. LASSO regression identified an 18-variable model consisting of 1 demographic, 3 cancer, 10 claims-based, and 4 patient-reported variables. Among the 17 health indicators in the Charlson Comorbidity Index, only congestive heart failure (CHF) and chronic obstructive pulmonary disease (COPD) were selected. The top 6 variables in the LASSO model (i.e., age, patient-reported general health, patient-reported comorbidity count, and claims-based indicators for CHF, COPD, and ambulance use) accounted for most of the predictive performance, yielding an SBC of -5611.10 vs. -5701.93 for the full model (Figure). A model using just these 6 variables produced a time-dependent AUC of 0.758 vs. 0.783 for the full model.
Conclusions: OCM in men with prostate cancer can be estimated accurately from relatively few data inputs when patient-reported and claims-based health measures are included. Combining different data types with novel machine learning techniques may produce less burdensome tools that facilitate use among urologists.
Source of Funding: Brooke Namboodri Spratte was supported by the Summer Medical Student Fellowship Program sponsored by the American Urological Association through the support of the Herbert Brendler, MD, Research Fund.
Hung-Jui Tan, MD, MSHPM was supported by a Mentored Research Scholar Grant in Applied and Clinical Research, MRSG-18-193-01-CPPB, from the American Cancer Society as well as the NIH Loan Repayment Program.