Abnormal loss after adding BatchNorm1d

Hi, I’m training a model with BCEWithLogitsLoss. The model has a head like this:

self.head = nn.Linear(config.hidden_size, num_labels)

With this head, my training loss is about 0.3. When I use the head below instead:

self.head = nn.Sequential(
    nn.Linear(config.hidden_size, 150),
    nn.BatchNorm1d(150),
    nn.Linear(150, num_labels),
)

The training loss goes up to 0.6. Is this normal?
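For reference, here is a minimal self-contained version of the comparison I am doing (hidden_size, num_labels, and the random inputs are just placeholders for my real encoder output and multi-label targets):

import torch
import torch.nn as nn

hidden_size, num_labels, batch_size = 768, 10, 32  # placeholder sizes

plain_head = nn.Linear(hidden_size, num_labels)

bn_head = nn.Sequential(
    nn.Linear(hidden_size, 150),
    nn.BatchNorm1d(150),
    nn.Linear(150, num_labels),
)

criterion = nn.BCEWithLogitsLoss()

features = torch.randn(batch_size, hidden_size)                  # stand-in for the encoder output
targets = torch.randint(0, 2, (batch_size, num_labels)).float()  # stand-in for the multi-label targets

print("plain head loss:", criterion(plain_head(features), targets).item())
print("BN head loss:   ", criterion(bn_head(features), targets).item())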

Is only the initial loss higher?
What does the training look like? Are you able to reach the same or a lower loss with the batchnorm layer, or does it stay at a higher level throughout?

The training process is roughly as follows:

[training loss curves omitted]

The first curve uses BN, the second one does not. The training code is long, so I will send it to you in a message. Thank you.

While the training loss is higher, the validation score seems to be better with the batchnorm model.
In that case, I would stick to it and maybe play around with some hyperparameters.
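If you want to dig a bit deeper, you could also compare the loss of the batchnorm head in train() and eval() mode, since BatchNorm1d normalizes with batch statistics during training and with running statistics during evaluation; that difference alone can make the training loss and the validation metric move in different directions. A rough sketch (the sizes and random data are placeholders for your real features and targets):

import torch
import torch.nn as nn

hidden_size, num_labels, batch_size = 768, 10, 32  # placeholder sizes

head = nn.Sequential(
    nn.Linear(hidden_size, 150),
    nn.BatchNorm1d(150),
    nn.Linear(150, num_labels),
)
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(batch_size, hidden_size)
targets = torch.randint(0, 2, (batch_size, num_labels)).float()

head.train()   # uses batch statistics and updates the running estimates
print("train-mode loss:", criterion(head(features), targets).item())

head.eval()    # uses the running statistics instead
with torch.no_grad():
    print("eval-mode loss:", criterion(head(features), targets).item())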

Yes, the batchnorm model has a better validation score but a worse training loss. Could it be because of the multi-GPU training?

It’s hard to tell where exactly this effect comes from, and I don’t really have any idea either, so let’s wait for some experts on this topic :slight_smile:
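One thing that might be worth ruling out in the meantime is the per-GPU batch statistics: with DataParallel or DistributedDataParallel, each replica normalizes with the statistics of its own local batch, which can affect the training loss when the per-GPU batch size is small. If you are using DistributedDataParallel, converting the batchnorm layers to SyncBatchNorm makes them share statistics across processes. A rough sketch (the small Sequential stands in for your real model, and the commented DDP line assumes an initialized process group):

import torch.nn as nn

# Stand-in for the real model containing the batchnorm head.
model = nn.Sequential(
    nn.Linear(768, 150),
    nn.BatchNorm1d(150),
    nn.Linear(150, 10),
)

# Replace every BatchNorm layer with SyncBatchNorm so that, under
# DistributedDataParallel, statistics are computed over the global batch
# rather than per GPU.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

# model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# (requires torch.distributed to be initialized; local_rank is this process's GPU index)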