Robustness and Generalizability of Speech Biomarkers
- Host Institution: ki:elements
- PhD Enrolment: University of Bonn
- Start Date: October 2025
- Duration: 36 months
- Official PhD Supervisor: Holger Fröhlich
Even though several studies have already demonstrated that speech analysis is promising for early diagnosis and disease monitoring in Parkinson's disease (PD), speech recordings are inherently influenced by speaker characteristics (e.g. language, dialect, gender, age) and recording conditions (e.g. microphone type, environmental noise). These factors may limit the comparability of collected data and thus the generalizability of AI/ML models.
Hence, this project has two objectives: (i) to quantify how recording conditions and speaker characteristics confound speech-derived features, and (ii) to improve the robustness and generalizability of AI/ML models built on these features.

We will employ newly collected data from the LuxPARK study (collected by DC5), which will include information about recording conditions, microphone type, and the speaker's dialect; notably, data from the same subject will be collected under different conditions. Voice features will be extracted using the standard workflow implemented at KIE. We will then statistically analyse the potentially confounding effects of recording condition, microphone type, and dialect on these features. If significant effects are found, we will train and test a series of AI/ML models to discriminate between PD patients and controls, as well as between the disease progression clusters identified in earlier work (Hähnel et al., submitted). Using explainable AI (XAI) techniques such as SHAP, we will analyse feature robustness under changing conditions and explore whether a robust feature subset can be identified.

Furthermore, we will assess whether adding features describing the recording context (e.g. recording condition) to the training data improves model generalizability. Alternatively, we will explore modern domain adaptation approaches, in particular transfer learning, to adapt AI/ML models to the data of an individual patient. More specifically, we plan to pre-train modern time series transformer models on voice signals or extracted low-level features; after pre-training, these models can be fine-tuned on the data of a single patient observed up to a certain time point using a typical sliding window approach. We will investigate whether this strategy improves prediction performance relative to the more conventional approach of traditional feature engineering and ML. Due to the common focus on speech biomarkers, DC6 will work closely with DC4 and DC5.
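As a minimal illustration of the confounder analysis described above, the sketch below tests whether a single voice feature differs systematically across microphone types using a one-way ANOVA. All data are simulated and the feature name ("jitter") and group means are hypothetical assumptions, not LuxPARK results; the real analysis would run over many features and additional factors such as dialect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical example: the same voice feature (e.g. jitter) measured
# for 40 recordings under each of three microphone types.
# Values are simulated; means/SDs are illustrative assumptions only.
mic_headset = rng.normal(1.00, 0.10, 40)
mic_phone = rng.normal(1.05, 0.10, 40)
mic_farfield = rng.normal(1.20, 0.10, 40)

# One-way ANOVA: does microphone type shift the feature distribution?
f_stat, p_value = stats.f_oneway(mic_headset, mic_phone, mic_farfield)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```

A significant p-value here would flag the feature as confounded by microphone type; in practice a mixed-effects model would also account for repeated measurements from the same subject.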
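The robust-feature screening could look roughly like the following sketch. It uses scikit-learn permutation importance as a lightweight stand-in for SHAP (the logic with SHAP values is analogous): train a classifier per recording condition, rank features by importance in each, and keep only features that stay highly ranked across all conditions. The data, the two simulated "conditions", and the notion that feature 0 is robust while feature 1 is fragile are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 300, 6  # recordings x voice features (all simulated)

def make_condition(noise_sd):
    """Simulate PD-vs-control data for one recording condition:
    feature 0 carries signal in every condition, feature 1 only
    in the clean (low-noise) condition."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, p))
    X[:, 0] += 1.5 * y            # condition-robust biomarker
    if noise_sd < 1.0:
        X[:, 1] += 1.5 * y        # fragile biomarker
    X += rng.normal(0.0, noise_sd, (n, p))
    return X, y

stable = None
for noise_sd in (0.5, 2.0):       # two recording conditions
    X, y = make_condition(noise_sd)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    imp = permutation_importance(clf, X_te, y_te,
                                 n_repeats=10, random_state=0)
    top = set(np.argsort(imp.importances_mean)[-2:])  # top-2 features
    stable = top if stable is None else stable & top

print("features important under every condition:", sorted(stable))
```

The intersection of per-condition top features is one simple operationalisation of a "robust feature subset"; with SHAP one would intersect features ranked by mean absolute SHAP value instead.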
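The per-patient sliding-window fine-tuning mentioned above needs the patient's longitudinal feature series cut into (input window, future target) pairs. The helper below sketches that data preparation step; the visit count, feature dimension, and function name are illustrative assumptions, and the transformer itself is omitted since any forecasting model can consume these pairs.

```python
import numpy as np

def sliding_windows(series, window, horizon=1):
    """Turn one patient's longitudinal feature series (T x d) into
    (input window, future target) pairs for fine-tuning a forecaster:
    each input covers `window` consecutive visits, each target is the
    feature vector `horizon` visits after the window ends."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t : t + window])
        y.append(series[t + window + horizon - 1])
    return np.stack(X), np.stack(y)

# Hypothetical patient: 12 visits, 4 low-level voice features per visit.
series = np.arange(48, dtype=float).reshape(12, 4)
X, y = sliding_windows(series, window=6)
print(X.shape, y.shape)  # (6, 6, 4) (6, 4)
```

A pre-trained time series transformer would then be fine-tuned on these pairs for visits observed so far, and evaluated on its prediction for the patient's next visit.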
This project is part of the "Trustworthiness" work package.