Model only returns "untrained" predictions that tend to 0 when in eval mode

Thanks a lot!

I did, but just via the flag use_bn = False, and I also tried commenting it out (not sure if I tried that with dropout or with batchnorm).
Now I just deleted it completely.
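
For context, the flag just gated whether the batchnorm layers were built at all, roughly like this (a minimal sketch; the layer sizes are placeholders, not my actual architecture):

```python
import torch.nn as nn

class Net(nn.Module):
    # Minimal sketch of the gating: use_bn decides whether the
    # BatchNorm1d layer gets built at all. Layer sizes are placeholders.
    def __init__(self, use_bn: bool = False):
        super().__init__()
        layers = [nn.Linear(64, 128)]
        if use_bn:
            layers.append(nn.BatchNorm1d(128))
        layers += [nn.ReLU(), nn.Linear(128, 1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```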

Thanks again. I'll keep investigating, as this is bugging me; I find it rather odd too and feel there must be a bug somewhere in my code.

As an update,
I believe it was the batch_norm layers.

What ended up solving it was switching to swish (nn.SiLU). I got there by trying all the different activations, and it's the only one that worked, although I had to be patient: at first it went the same way, starting at around 0.5 and then going to 0, and only after about 15 epochs did it output values > 0. The validation scores also followed a different pattern.
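
For anyone hitting the same thing, the swap itself is a one-liner (again a sketch with placeholder layer sizes; nn.SiLU is PyTorch's built-in swish):

```python
import torch.nn as nn

# Sketch: the same kind of block as before, with the activation
# swapped to swish/SiLU.
block = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),
    nn.SiLU(),  # swish; the only activation that converged for me here
    nn.Linear(128, 1),
)
```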

I also tried LayerNorm, which got rid of the issue but didn't perform as well as just leaving the model in train mode with batchnorm.
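
The LayerNorm variant was a drop-in swap (sketch below; unlike batchnorm, LayerNorm keeps no running statistics, so it behaves identically under train() and eval()):

```python
import torch.nn as nn

# Sketch: BatchNorm1d -> LayerNorm. LayerNorm normalizes over the feature
# dimension per sample and has no running stats, so train/eval match.
block = nn.Sequential(
    nn.Linear(64, 128),
    nn.LayerNorm(128),  # instead of nn.BatchNorm1d(128)
    nn.SiLU(),
    nn.Linear(128, 1),
)
```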

SELU without any batch normalization never converged within the limited training I tried (about 20 epochs).

I will continue to investigate and update. Might help someone else.

Hmm, alright, I think I'm narrowing the issue down to the dataset having subsets that are quite different from one another and thus have different means and variances.
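
To check this, I'm comparing per-subset input statistics along these lines (a sketch; loader_a/loader_b are hypothetical loaders over the two subsets):

```python
import torch

def feature_stats(loader):
    # Stack all inputs from one subset and compute per-feature mean/variance.
    xs = torch.cat([x for x, _ in loader], dim=0)
    return xs.mean(dim=0), xs.var(dim=0)

# Hypothetical loaders over the two subsets. Large gaps between these stats
# would mean the running BatchNorm averages fit neither subset in eval mode.
# mean_a, var_a = feature_stats(loader_a)
# mean_b, var_b = feature_stats(loader_b)
```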

However, this is a little odd, since I have seen the same batchnorm behaviour even when I pre-normalize (i.e. use transforms to set the mean and variance of the dataset up front).
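
By pre-normalizing I mean something like this (a sketch assuming torchvision-style transforms; the statistics are placeholders that would come from the training split, and note that Normalize takes a std, not a variance):

```python
from torchvision import transforms

# Sketch: fix the dataset's mean/variance up front. The numbers are
# placeholders; in practice they are computed on the training split.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.25]),  # std, i.e. sqrt(variance)
])
```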