Background: Clinical phenotyping of patients at emergency department (ED) triage may improve resource allocation and risk assessment. We hypothesized that computational summarization of electronic health record data would reveal patient phenotype clusters that stratify by visit disposition.
Methods: This retrospective study included visits to three EDs between March 2014 and July 2017 that resulted in either admission or discharge. A total of 972 variables were extracted per visit, including demographics, chief complaint, historical vital signs, laboratory results, and medications. We used principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) to summarize 100,000 patient visits bootstrapped from a total of 560,000 visits, withholding the disposition decision and triage score. We then assigned phenotype clusters with Gaussian mixture models (GMMs) and calculated cluster agreement with the adjusted Rand index (ARI).
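Cluster agreement here is measured with the adjusted Rand index, which compares two labelings of the same visits (for example, two clustering runs) while correcting for chance agreement. The function below is a minimal pure-Python sketch of the standard Hubert and Arabie formula computed from the contingency table of the two labelings. The name `adjusted_rand_index` is hypothetical and this is not the authors' code; in practice a library routine such as scikit-learn's `adjusted_rand_score` would typically be used.

```python
from collections import Counter
from math import comb, isclose


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same items.

    Hypothetical helper for illustration. Returns 1.0 for identical partitions
    (up to relabeling), about 0.0 for chance-level agreement, and can be
    negative for worse-than-chance agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: treat the partitions as trivially identical.
        return 1.0

    # Contingency table: items falling in cluster i of A and cluster j of B.
    cell_counts = Counter(zip(labels_a, labels_b))
    row_sums = Counter(labels_a)
    col_sums = Counter(labels_b)

    # Number of item pairs placed together in each grouping.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    expected = sum_rows * sum_cols / total_pairs  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if isclose(max_index, expected):
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])` returns 4/7 (about 0.571), and relabeling clusters without changing membership, as in `[0, 0, 1, 1]` versus `[1, 1, 0, 0]`, returns 1.0.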
Results: Sampled patient visits had an admission rate of 29.72 ± 0.03% (95% CI). When data available at triage were summarized with PCA and visualized, three phenotypes stratified by admission risk emerged (54.42 ± 0.08%, 17.74 ± 0.10%, and 17.31 ± 0.05%), whereas UMAP revealed four phenotypes (maximum: 49.75 ± 1.85%; minimum: 19.75 ± 0.14%). For both methods, phenotypes differed primarily by demographic and socioeconomic factors. We then algorithmically selected the optimal number of clusters for PCA, finding 24.7 ± 2.7 phenotypes with admission risks ranging from 8.15 ± 0.06% to 75.36 ± 0.24%. With 25 clusters, the ARIs for PCA and UMAP were 0.55 ± 0.02 and 0.41 ± 0.01, respectively, suggesting that each method produced consistent group assignments.
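Each phenotype's admission risk is the fraction of its visits that ended in admission, reported with a 95% confidence interval. The sketch below shows one way to obtain such intervals, assuming a percentile bootstrap over visits with cluster labels held fixed. The abstract does not state how its intervals were computed; its ± values may instead reflect variation across repeated bootstrap runs of the full PCA/UMAP and clustering pipeline. The function names `admission_rate_by_cluster` and `bootstrap_ci` are hypothetical.

```python
import random


def admission_rate_by_cluster(clusters, admitted):
    """Fraction of visits admitted within each cluster label (hypothetical helper)."""
    totals, admits = {}, {}
    for label, was_admitted in zip(clusters, admitted):
        totals[label] = totals.get(label, 0) + 1
        admits[label] = admits.get(label, 0) + int(was_admitted)
    return {label: admits[label] / totals[label] for label in totals}


def bootstrap_ci(clusters, admitted, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of each cluster's admission rate.

    Assumption: cluster labels are fixed and only visits are resampled with
    replacement; clusters missing from a resample add no value for that round.
    """
    rng = random.Random(seed)
    n = len(clusters)
    replicates = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rates = admission_rate_by_cluster(
            [clusters[i] for i in idx], [admitted[i] for i in idx]
        )
        for label, rate in rates.items():
            replicates.setdefault(label, []).append(rate)

    intervals = {}
    for label, values in replicates.items():
        values.sort()
        lo = values[int((alpha / 2) * (len(values) - 1))]
        hi = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[label] = (lo, hi)
    return intervals
```

Holding the labels fixed isolates sampling uncertainty in the rate estimate; capturing uncertainty in the clustering itself would require re-running dimensionality reduction and clustering inside each bootstrap round.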
Conclusion: Computational patient phenotyping with PCA and UMAP reveals a finite, reproducible number of clinical clusters that risk-stratify patients by disposition at the time of triage. Further research is needed to identify the patient characteristics that define each phenotype.