
poster
CoCluster: A Framework for Comorbidity, Complication, and Covariate Clustering via Machine Learning
Background Clustering is a machine learning technique that identifies distinct groups based on similarities in a given set of variables. This project leverages binary comorbidity, complication, or confounding data to identify distinct patient groups with varying risk profiles. Patients are grouped by similar combinations of values across a set of variables, with the aim of identifying key factors associated with an outcome as well as unique, nuanced interactions that are not otherwise apparent. We term the process of clustering based on comorbidities, complications, or covariates “CoClustering”. This study aims to guide researchers in applying CoClustering, including selecting appropriate clustering algorithms and understanding their statistical foundations.
Methods For the illustrative example provided in this framework, the 2015-2019 National Inpatient Sample (NIS) was queried for patients diagnosed with cerebral infarction. Using the K-Modes algorithm, patients were clustered based on 18 clinically relevant comorbidities and age group (<65, 65-79, 80+). Cluster quality was assessed using the Davies-Bouldin Index (DBI) and Calinski-Harabasz Index (CHI) to determine the optimal number of clusters. Post-clustering analysis included odds ratio (OR) analysis of mortality among clusters and multivariate logistic regression adjusting for sex, race, income quartile, and primary payer.
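The clustering step described above can be illustrated with a minimal sketch of K-Modes, which assigns each record to the nearest centroid by Hamming distance (the count of mismatched variables) and updates each centroid to the per-variable mode of its members. This is not the authors' implementation: the function names, the random initialisation, and the four toy variables are assumptions made for illustration, and the DBI and CHI scoring used to choose the number of clusters is not shown.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, n_iter=20, seed=0):
    """Minimal K-Modes sketch: assign by Hamming distance, update by mode."""
    rng = random.Random(seed)
    # Assumption: initialise from k distinct observed patterns chosen at random
    # (requires at least k distinct records).
    modes = rng.sample(sorted(set(rows)), k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy records, not study data:
# (hypertension, diabetes, atrial_fibrillation, age_80_plus)
patients = [(1, 1, 0, 0)] * 5 + [(0, 0, 1, 1)] * 5
labels, modes = k_modes(patients, k=2)
```

In practice a maintained implementation and multiple random restarts would be preferable, since a single random initialisation can converge to a poor local optimum.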
Results In the illustrative example, nine unique clusters were formed. Post-clustering analysis showed statistically significant differences in mortality: compared with Group 1, the highest-mortality group (Group 9) had an OR of 6.05 (95% CI: 4.99-7.33) and an adjusted OR (AOR) of 6.27 (95% CI: 5.11-7.69).
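For readers unfamiliar with how an unadjusted odds ratio and its confidence interval are obtained from cluster-level counts, the sketch below computes an OR with a Wald 95% interval from a 2x2 table. The abstract does not state which interval method was used, so the Wald form is an assumption, and the counts are invented for illustration and are not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a = deaths in the comparison cluster, b = survivors in it,
    c = deaths in the reference cluster,  d = survivors in it.
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_lower, ci_upper = odds_ratio_ci(a=30, b=70, c=10, d=90)
```

An interval that excludes 1 corresponds to a statistically significant difference in odds at the chosen level. The adjusted ORs would instead come from exponentiating the cluster coefficients of the logistic regression model.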
Conclusion This framework provides clinical researchers with a practical approach to applying clustering for the identification of subgroups in diverse clinical datasets, and demonstrates the utility of clustering in analyzing high-dimensional comorbidity data. By effectively managing high dimensionality and sparsity in large datasets, machine learning clustering algorithms like K-Modes can reveal clinically relevant patterns and key variables associated with an outcome. The CoClustering technique can be applied broadly, from a preliminary analysis of a clinical dataset that identifies salient variables for further investigation to grouping patients with a common diagnosis to better understand individual prognosis, further enabling highly personalized care.