Multimodal learning: summed losses from two models on different feature-vector modalities stop converging after a threshold

Sorry for the late/early day question. I'm trying to do multi-modal ensembling of the losses from 2 different models. I compute loss1 = criterion(x0, y0), loss2 = criterion(x1, y1), and loss3 = loss1 + loss2, then call optimizer.zero_grad(); loss3.backward(); optimizer.step(). The combined loss stops decreasing once it reaches a certain threshold. Do you know of any ways to keep reducing the loss when the feature vectors come from different data modalities?
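For reference, here is a minimal sketch of the training step described above. The layer sizes, random data, SGD optimizer, learning rate, and the shared `targets` tensor are placeholders made up for illustration (the real y0 and y1 may differ); only the loss1/loss2/loss3 pattern comes from my actual setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two per-modality models and their inputs.
model_a = nn.Linear(8, 3)          # modality A: 8-dim feature vectors
model_b = nn.Linear(5, 3)          # modality B: 5-dim feature vectors
inputs_a = torch.randn(16, 8)
inputs_b = torch.randn(16, 5)
targets = torch.randint(0, 3, (16,))  # assumed: both models share labels

criterion = nn.CrossEntropyLoss()
# A single optimizer holding the parameters of both models, so that
# optimizer.step() updates both after loss3.backward().
optimizer = torch.optim.SGD(
    list(model_a.parameters()) + list(model_b.parameters()), lr=0.1
)

losses = []
for step in range(50):
    x0 = model_a(inputs_a)
    x1 = model_b(inputs_b)
    loss1 = criterion(x0, targets)
    loss2 = criterion(x1, targets)
    loss3 = loss1 + loss2          # summed loss over both modalities

    optimizer.zero_grad()
    loss3.backward()
    optimizer.step()
    losses.append(loss3.item())
```

In this sketch the two models share no parameters, so the gradient of loss3 with respect to each model is just the gradient of that model's own loss.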
I also tried concatenating the tensors from the different modalities into a single input, but the resulting precision and recall are poor.
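By concatenation I mean roughly the following. This is only a sketch: the tensor shapes, the single linear classifier, and the optimizer settings are assumptions for illustration, not my real model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical feature vectors from the two modalities (same batch size).
feats_a = torch.randn(16, 8)
feats_b = torch.randn(16, 5)
targets = torch.randint(0, 3, (16,))

# Join the modalities along the feature dimension: (16, 8) + (16, 5) -> (16, 13).
fused = torch.cat([feats_a, feats_b], dim=1)

# One classifier trained on the fused vector (assumed architecture).
classifier = nn.Linear(fused.shape[1], 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)

logits = classifier(fused)
loss = criterion(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```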