Machine Learning Models Help Identify Subtypes of Parkinson Disease

Three PD subtypes were identified that have highly predictable progression rates corresponding to slow, moderate, and fast disease progression.

Machine learning models have helped researchers identify 3 Parkinson disease (PD) subtypes, which could help in improving the detection of clinical outcomes, according to study findings published in the journal NPJ Parkinson’s Disease.

Recognizing an unmet need for the characterization of distinct disease subtypes in patients with PD, along with improvements in individualized predictions of disease progression, researchers applied unsupervised and supervised machine learning methods to comprehensive, longitudinal clinical data from the Parkinson’s Progression Markers Initiative (PPMI; n=294 cases) to identify patients’ subtypes and predict disease progression. To ensure the generalizability of the results, they replicated and validated the subtype identification in an independent, well-characterized cohort from the Parkinson’s Disease Biomarkers Program (PDBP; n=263 cases).

Based on the current analysis, 3 distinct disease subtypes with highly predictable rates of progression were identified, which corresponded to slow, moderate, and rapid disease progression.

Projections of disease progression 5 years after initial PD diagnosis were highly accurate, with an average area under the curve (AUC) of 0.92 (95% CI, 0.95±0.01 for those with slow progression [PDvec1]; 95% CI, 0.87±0.03 for those with moderate progression [PDvec2]; and 95% CI, 0.95±0.02 for those with fast progression [PDvec3]) at cross-validation.


The predictor built on baseline and year 1 data performed even better, with an average AUC of 0.953 (95% CI, 0.97±0.01 for PDvec1; 95% CI, 0.91±0.02 for PDvec2; and 95% CI, 0.97±0.01 for PDvec3) at cross-validation.
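For readers who want to see how per-subtype and average AUCs of this kind relate, here is a minimal sketch in plain Python. It assumes a one-vs-rest reading, in which each subtype is scored against the other two and the average is the simple mean of the three values. This is an illustration under that assumption, not the study's released code. The helper names (`binary_auc`, `one_vs_rest_auc`) and the toy scores are hypothetical.

```python
from statistics import mean


def binary_auc(labels, scores):
    """AUC as the chance that a random positive case outranks a random
    negative case, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_classes, class_scores):
    """Score each class against all the others.
    class_scores maps a class name to one predicted score per case."""
    return {
        name: binary_auc([t == name for t in true_classes], scores)
        for name, scores in class_scores.items()
    }


# Hypothetical toy data: six cases, two per subtype (not study data).
true_classes = ["PDvec1", "PDvec1", "PDvec2", "PDvec2", "PDvec3", "PDvec3"]
class_scores = {
    "PDvec1": [0.8, 0.6, 0.3, 0.1, 0.1, 0.2],
    "PDvec2": [0.1, 0.3, 0.4, 0.2, 0.3, 0.1],
    "PDvec3": [0.1, 0.1, 0.3, 0.7, 0.6, 0.7],
}

per_class = one_vs_rest_auc(true_classes, class_scores)
macro_auc = mean(per_class.values())

# The same averaging applied to the per-subtype values quoted in the article.
baseline_average = mean([0.95, 0.87, 0.95])
year1_average = mean([0.97, 0.91, 0.97])

print(per_class, round(macro_auc, 3))
print(round(baseline_average, 2), round(year1_average, 2))
```

Under this assumption, the mean of the reported per-subtype values rounds to 0.92 for the baseline model and 0.95 for the baseline plus year 1 model, in line with the averages quoted in the article. The published averages may have been computed from unrounded values.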

In addition to cross-validation of the predictive models in the PPMI cohort, the accuracy of the predictive model was validated in the independent PDBP cohort. The predictive model trained on the PPMI baseline data accurately differentiated patients in the PDBP cohort, with an AUC of 0.84. The replicated predictive model performed well for PDvec1 and PDvec3 (AUCs of 0.91 and 0.88, respectively). Because of the small sample size, however, the model predicted less well for PDvec2 (AUC of 0.73).

Among key biomarkers of interest, serum neurofilament light was identified as a significant indicator of rapid disease progression. The current study findings were replicated in an independent cohort, with the analytical code released and machine learning models developed in an open science manner.

The main limitation involved in the use of such approaches is the fact that they require large datasets to facilitate model construction, replication, and validation. Longer follow-up periods, greater ancestral diversity in study samples, and larger sample sizes are all key to broadening the applicability of this work.

“Our data-driven study provides insights to deconstruct PD heterogeneity. This approach could have immediate implications for clinical trials by improving the detection of significant clinical outcomes. We anticipate that machine learning models will improve patient counseling, clinical trial design, and ultimately individualized patient care,” the researchers concluded.

Disclosure: Some of the study authors have declared affiliations with biotech, pharmaceutical, and/or device companies. Please see the original reference for a full list of authors’ disclosures. 


Dadu A, Satone V, Kaur R, et al. Identification and prediction of Parkinson’s disease subtypes and progression using machine learning in two cohorts. NPJ Parkinsons Dis. Published online December 16, 2022. doi:10.1038/s41531-022-00439-z