Exploring Dementia Using Patient Records From Longitudinal MRI Data

Source Code

A project aiming to distinguish healthy patient records from cases of dementia using longitudinal MRI data.

Project Summary:

The "Exploring Dementia Using Patient Records From Longitudinal MRI Data" project centred on analysing the progression of dementia, including Alzheimer's disease, by examining MRI patient records. The data consisted of five individual datasets, representing 150 patients who underwent up to five MRI scan visits each. This data collection tracked dementia and Alzheimer's status across multiple time points.

In this study, extensive data pre-processing was applied to ensure accuracy and consistency in the dataset. Two predictive models, the Random Forest Classifier and the Support Vector Machine, were used to discern patterns and relationships indicative of dementia within the data. A comparative analysis of these models was conducted to assess their performance in identifying key dementia-related features.

The conclusion of the project highlighted the effectiveness of these models in predicting dementia cases, providing valuable insights into the disease's progression and the most significant attributes for its detection. This analysis contributed to the broader understanding of dementia and Alzheimer's disease, demonstrating the potential of advanced data analysis in medical research.

Key Results Visualised:

Project Conclusion:

Within the project, two objectives were pursued: distinguishing healthy patient records from cases of dementia, including Alzheimer’s disease, and discovering the most predictive attributes. The initial stages of the analysis encompassed data pre-processing and exploratory analysis, with MICE imputation selected for handling missing data. After this, two machine learning models were implemented with nested cross-validation to ensure robustness. A comparative analysis followed, aiming to identify the best-performing model. To address the second objective, feature importances for both models were scrutinised, and an analysis of SHAP values was conducted for the Random Forest model to uncover the most predictive attributes.
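The nested cross-validation workflow described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes scikit-learn, uses its `IterativeImputer` as a MICE-style imputer, and runs on synthetic data from `make_classification` in place of the MRI records. The hyperparameter grids, fold counts, and the `nested_cv_scores` helper are hypothetical choices made for the example. It also treats rows as independent, whereas the real records contain repeated visits per patient.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score


def nested_cv_scores(X, y, seed=0):
    """Return outer-fold accuracies for each candidate model (hypothetical helper)."""
    # Inner folds tune hyperparameters; outer folds estimate generalisation.
    inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

    # Illustrative grids only; the project's actual grids are not stated.
    candidates = {
        "random_forest": (
            RandomForestClassifier(random_state=seed),
            {"model__n_estimators": [25, 50], "model__max_depth": [3, None]},
        ),
        "svm": (SVC(), {"model__C": [0.1, 1.0, 10.0]}),
    }

    results = {}
    for name, (model, grid) in candidates.items():
        # Imputation and scaling sit inside the pipeline so they are
        # re-fitted on each training split, avoiding leakage into test folds.
        pipe = Pipeline([
            ("impute", IterativeImputer(random_state=seed, max_iter=10)),
            ("scale", StandardScaler()),
            ("model", model),
        ])
        search = GridSearchCV(pipe, grid, cv=inner, scoring="accuracy")
        results[name] = cross_val_score(search, X, y, cv=outer, scoring="accuracy")
    return results


# Synthetic stand-in data with about 5% of values knocked out at random.
X, y = make_classification(n_samples=150, n_features=8, n_informative=4, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

scores = nested_cv_scores(X, y)
for name, s in scores.items():
    print(f"{name}: mean accuracy {s.mean():.3f} (+/- {s.std():.3f})")
```

Wrapping `GridSearchCV` inside `cross_val_score` is what makes the procedure nested: each outer training split runs its own hyperparameter search, so the outer scores are not biased by tuning on the data used for evaluation.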

The evaluations conducted led to the conclusion that the Random Forest Classifier was the best model. This was supported by its superior performance during both cross-validation and final model assessment, coupled with the interpretative benefits afforded by SHAP analysis, which was carried out for the Random Forest model only. Looking ahead, a potential improvement to the model selection would be to refine the hyperparameter tuning process. Rather than selecting the best-performing fold’s hyperparameters, a more robust approach would involve aggregating the hyperparameters across all folds. This strategy would mitigate the risk of inconsistencies arising from folds that may contain data with varying predictive qualities, thereby strengthening the reliability of the model performance.
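One possible way to aggregate hyperparameters across folds is sketched below. The aggregation rule (median for numeric values, most common value otherwise) is an assumption chosen for illustration, as the project does not specify one. The `aggregate_hyperparameters` helper and the example per-fold values are likewise hypothetical.

```python
from collections import Counter
from statistics import median


def aggregate_hyperparameters(fold_params):
    """Combine the best hyperparameters found in each outer fold.

    Numeric settings are summarised by their median; anything else
    (strings, None, mixed types) by the most frequently chosen value.
    """
    aggregated = {}
    for key in fold_params[0]:
        values = [params[key] for params in fold_params]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            mid = median(values)
            # Keep integer-valued settings (e.g. tree counts) as integers.
            if all(isinstance(v, int) for v in values):
                mid = int(round(mid))
            aggregated[key] = mid
        else:
            aggregated[key] = Counter(values).most_common(1)[0][0]
    return aggregated


# Hypothetical best settings from three outer folds.
fold_params = [
    {"n_estimators": 100, "max_depth": 5, "max_features": "sqrt"},
    {"n_estimators": 300, "max_depth": 7, "max_features": "sqrt"},
    {"n_estimators": 200, "max_depth": 5, "max_features": "log2"},
]

print(aggregate_hyperparameters(fold_params))
# {'n_estimators': 200, 'max_depth': 5, 'max_features': 'sqrt'}
```

The aggregated settings would then be used to refit a single final model on the full training data, so that no single fold's quirks determine the configuration.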

Click the Source Code link located near the top of the page to view all of the code and data visualisations.