Model converges to similar outputs regardless of the loss weight magnitude

Hi,

I have a multi-term loss in which term3 is weighted by a hyperparameter w:

loss = term1 + 0.006*term2 + w*term3

I’m trying to do a grid search over the w parameter, using values from 0.001 up to 1e17, multiplying by 10 at each step. My model was previously pretrained on another dataset, but with term3 set to zero.
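Concretely, the grid I’m sweeping is this set of 21 values:

```python
# w grid: 0.001, 0.01, ..., 1e16, 1e17 (multiplying by 10 at each step)
w_values = [10.0 ** e for e in range(-3, 18)]
```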

Now I would like to load the pretrained model and retrain it for 25 epochs for each value in the grid, this time with term3 active. Surprisingly, the absurdly high values converge to almost the same results as the low values. For context, my model is an encoder/decoder network. I expected that the more heavily term3 is weighted in the loss, the more my model parameters would be pulled away from the pretrained solution and, eventually, the stranger the generated samples would look. Am I missing something here?

The learning rate and optimizer are kept the same across all training runs.
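For reference, each grid-search run looks roughly like the minimal sketch below, assuming a PyTorch-style training loop (the model, data, checkpoint path, and the individual loss terms are placeholders for illustration, not my actual code):

```python
import torch
import torch.nn as nn

# Stand-in encoder/decoder and data; the real model and dataset are different.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 16))
data = torch.randn(256, 16)
torch.save(model.state_dict(), "pretrained.pt")  # stands in for the actual pretrained checkpoint

w_values = [10.0 ** e for e in range(-3, 18)]  # same grid as above

for w in w_values:
    # Restart from the pretrained weights for every grid point.
    model.load_state_dict(torch.load("pretrained.pt"))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # same lr/optimizer for every run

    for epoch in range(25):
        for x in data.split(32):
            out = model(x)
            term1 = ((out - x) ** 2).mean()  # placeholder reconstruction term
            term2 = out.abs().mean()         # placeholder for the second term
            term3 = (out ** 2).mean()        # placeholder for the term being swept
            loss = term1 + 0.006 * term2 + w * term3
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```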