Theoretical: Multimodal training with missing data

Hello everyone:

I have a medical dataset with three different modalities:

  • Images
  • Tabular data
  • Genetic data (needs to be handled separately from tabular)

As much as I would like to change this, I have a lot of missing data. The percentage of patients with all three modalities is very small. For example, I only have images for about 10% of the patients.

In a normal multimodal approach, I would have 3 different “heads” of the network, one ingesting each modality. Somewhere in the middle I would concatenate their outputs (or fuse them with some other strategy) and send the result through a series of FC layers until the output layer.
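To make this concrete, here is a minimal sketch of the kind of architecture I mean, assuming PyTorch. The class name `FusionNet`, the layer sizes, the single-channel 28x28 images, and the feature dimensions are all placeholders I made up for illustration, not my real model.

```python
import torch
import torch.nn as nn


class FusionNet(nn.Module):
    """Three modality encoders ("heads") fused by concatenation."""

    def __init__(self, tab_dim=16, gen_dim=32, emb_dim=8, n_classes=2):
        super().__init__()
        # Image head: a tiny CNN (assumes 1-channel images).
        self.image_enc = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, 4, 1, 1)
            nn.Flatten(),             # -> (N, 4)
            nn.Linear(4, emb_dim),
        )
        # Tabular and genetic heads: kept separate, one small MLP each.
        self.tab_enc = nn.Sequential(nn.Linear(tab_dim, emb_dim), nn.ReLU())
        self.gen_enc = nn.Sequential(nn.Linear(gen_dim, emb_dim), nn.ReLU())
        # Shared FC layers applied after fusion.
        self.classifier = nn.Sequential(
            nn.Linear(3 * emb_dim, 16), nn.ReLU(), nn.Linear(16, n_classes)
        )

    def forward(self, image, tabular, genetic):
        # Fusion by concatenating the three embeddings along the feature axis.
        fused = torch.cat(
            [self.image_enc(image), self.tab_enc(tabular), self.gen_enc(genetic)],
            dim=1,
        )
        return self.classifier(fused)


model = FusionNet()
logits = model(torch.randn(5, 1, 28, 28), torch.randn(5, 16), torch.randn(5, 32))
print(logits.shape)  # torch.Size([5, 2])
```

Note that this version requires all three inputs on every forward pass, which is exactly what I cannot provide for most patients.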

If I had no missing data, I would just do a forward pass, compute the loss, and backpropagate it to update all the model’s weights. With so many missing values, if I fed in a placeholder whenever a modality is missing (for example, an all-black image), I would be biasing my model.

I came up with the idea of having a separate optimizer for each “head” of the network, and maybe another one that would update the weights of the shared part of the architecture. I would switch a head off whenever I don’t have its modality, and after the forward and backward pass I would call optimizer_n.step() only for the modalities for which I did have data, thus updating only the weights of the neurons that were actually used.
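Here is a rough sketch of what I have in mind, again assuming PyTorch. Everything in it is hypothetical: the names `SwitchableFusionNet` and `train_step`, the flattened inputs, the dimensions, and the choice of Adam. It also assumes that every sample in a batch is missing the same modalities, and that a switched-off head contributes a zero embedding (which may have bias issues of its own).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
EMB = 8  # embedding size per modality (placeholder value)


class SwitchableFusionNet(nn.Module):
    def __init__(self, dims):
        super().__init__()
        # One small encoder per modality (flattened inputs, for brevity).
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(d, EMB) for name, d in dims.items()}
        )
        self.shared = nn.Sequential(
            nn.Linear(EMB * len(dims), 16), nn.ReLU(), nn.Linear(16, 2)
        )

    def forward(self, inputs, batch_size):
        parts = []
        for name, enc in self.encoders.items():
            x = inputs.get(name)
            if x is None:
                # Head switched off: the encoder is not run at all,
                # so it receives no gradient for this batch.
                parts.append(torch.zeros(batch_size, EMB))
            else:
                parts.append(enc(x))
        return self.shared(torch.cat(parts, dim=1))


dims = {"image": 64, "tabular": 16, "genetic": 32}
model = SwitchableFusionNet(dims)
# One optimizer per head, plus one for the shared layers.
head_opts = {
    n: torch.optim.Adam(model.encoders[n].parameters(), lr=1e-3) for n in dims
}
shared_opt = torch.optim.Adam(model.shared.parameters(), lr=1e-3)


def train_step(inputs, target):
    """`inputs` maps modality name -> tensor; missing modalities are omitted."""
    for opt in head_opts.values():
        opt.zero_grad(set_to_none=True)
    shared_opt.zero_grad(set_to_none=True)
    logits = model(inputs, target.shape[0])
    loss = nn.functional.cross_entropy(logits, target)
    loss.backward()
    for name in inputs:  # step only the heads that saw data
        head_opts[name].step()
    shared_opt.step()
    return loss.item()


# A batch with no images: only the tabular and genetic heads should move.
image_before = model.encoders["image"].weight.detach().clone()
tab_before = model.encoders["tabular"].weight.detach().clone()
batch = {"tabular": torch.randn(4, 16), "genetic": torch.randn(4, 32)}
train_step(batch, torch.tensor([0, 1, 0, 1]))
image_unchanged = torch.equal(image_before, model.encoders["image"].weight)
tab_changed = not torch.equal(tab_before, model.encoders["tabular"].weight)
print(image_unchanged, tab_changed)
```

As far as I can tell, a skipped head ends up with `grad` set to `None`, and PyTorch optimizers skip such parameters, so a single optimizer over all parameters might behave the same way here. I am not sure whether the separate optimizers buy me anything beyond making the intent explicit.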

Is this a reasonable strategy? I don’t have much experience with multiple optimizers or with multiple modalities, so any advice is very welcome.