Research Objectives
Data in medicine is often not representative of the entire disease population. The selection of patients for specific studies, or the choice of specific hospitals, induces a statistical selection bias, a situation that FH has also investigated empirically in prior work. Consequently, the assumption that training data has been sampled independently and identically distributed (i.i.d.) from the overall population, a cornerstone of machine learning theory, is typically violated. This in turn harms the generalization ability of AI/ML models and poses a significant challenge for their trustworthiness and for their transfer into clinical practice.
The objective of this project is to investigate whether the generalization ability of AI/ML models trained on multi-modal clinical and genomic data can be improved by leveraging modern concepts for domain adaptation of neural networks, which have mostly been developed in the imaging and natural language processing fields. Given the breadth of possible domain adaptation strategies, we aim to conduct a systematic comparison of representation learning techniques and supervised model adaptation, as well as unsupervised domain adaptation approaches, including adversarial learning, domain translation, contrastive learning, and invariant feature learning (a causal machine learning technique), using, for example, the TLlib transfer learning library.
The use case is supervised prediction of cognitive decline and depression in PD with a neural network, using a multimodal combination of genetic, clinical and demographic data from the PPMI, ICEBERG and LuxPARK cohorts. We will formulate these prediction problems as time-to-event-based risk models with an appropriate loss function. Explainable AI (XAI) techniques such as (causal) SHAP will be used to understand the putative causal influence of genetic, demographic and clinical factors on cognitive decline and depression.
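To make the notion of a time-to-event loss concrete, the following is a minimal sketch of one common candidate, the negative Cox partial log-likelihood. The proposal does not commit to a specific loss, so this choice is an assumption, and the function and argument names are hypothetical. The sketch is written in plain Python for readability; in practice the same expression would be implemented in an automatic differentiation framework so that it can be used to train a neural network.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, event_times, event_observed):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores    : model outputs (log relative hazards), one per subject
    event_times    : time of the event or of censoring, one per subject
    event_observed : True if the event occurred, False if the subject was censored

    Illustrative assumption only: the project may use a different loss.
    """
    total = 0.0
    n_events = 0
    for score_i, time_i, observed_i in zip(risk_scores, event_times, event_observed):
        if not observed_i:
            # Censored subjects contribute only through the risk sets of others.
            continue
        # Risk set: every subject still under observation at time_i.
        at_risk = [s for s, t in zip(risk_scores, event_times) if t >= time_i]
        # Numerically stable log-sum-exp over the risk set.
        max_score = max(at_risk)
        log_denominator = max_score + math.log(
            sum(math.exp(s - max_score) for s in at_risk)
        )
        total += score_i - log_denominator
        n_events += 1
    if n_events == 0:
        # No observed events: the partial likelihood carries no information.
        return 0.0
    return -total / n_events


# Usage example with made-up values (not cohort data).
scores = [1.2, 0.3, -0.5, 0.0]
times = [2.0, 5.0, 7.0, 4.0]
observed = [True, True, False, True]
loss = cox_neg_log_partial_likelihood(scores, times, observed)
```

The loss is small when subjects with earlier events receive higher risk scores than those who remain event-free longer. It depends only on the ranking within risk sets, which is why censored follow-up can be used without discarding those subjects.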
DC3 will closely collaborate with DC6, who will tackle generalizability based on speech data.