The default value of size_average=True in loss functions is a trap

magic282 · June 23, 2017, 11:52am

Hi all,

I am not going to start a new issue and would like to discuss it here.

I think that the default value of size_average=True is something of a trap. I work on NLP and am not very familiar with other areas. It took me two days to debug this. At least in NLP, I cannot see any reasonable motivation for setting this value to True by default.

smth · June 27, 2017, 9:20pm

hmmm, you would want to normalize the loss by the number of samples right?

magic282 · June 28, 2017, 1:32am

Not always.
Take an NMT task as an example. The output has shape (100, 64, 30000), where 100 is the output sequence length, 64 is the batch size and 30000 is the vocab size. We reshape the output to (100 * 64, 30000) and compute the loss. However, we want to divide the loss by 64 (the batch size), whereas size_average=True divides it by 100 * 64. That is the problem I ran into.
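To make the two normalizations concrete, here is a minimal pure-Python sketch (no PyTorch, so it illustrates the arithmetic only and is not the library's implementation). The toy sizes T, B, V and the helper `token_nll` are made up for illustration and stand in for the (100, 64, 30000) case above.

```python
import math
import random

# Toy stand-ins for (sequence length, batch size, vocab size) = (100, 64, 30000).
T, B, V = 3, 2, 5
rng = random.Random(0)

# logits[t][b] is a list of V scores; targets[t][b] is the gold word index.
logits = [[[rng.uniform(-1.0, 1.0) for _ in range(V)] for _ in range(B)]
          for _ in range(T)]
targets = [[rng.randrange(V) for _ in range(B)] for _ in range(T)]


def token_nll(scores, gold):
    """Negative log-likelihood of `gold` under softmax(scores) (hypothetical helper)."""
    log_z = math.log(sum(math.exp(s) for s in scores))
    return log_z - scores[gold]


# Flatten (T, B, V) -> (T * B, V), mirroring the reshape described above.
flat = [(logits[t][b], targets[t][b]) for t in range(T) for b in range(B)]
total = sum(token_nll(scores, gold) for scores, gold in flat)

# An averaging default sees T * B rows after the reshape and divides by all of them.
loss_averaged = total / (T * B)

# Dividing by the batch size alone keeps the loss summed over the time steps.
loss_per_sentence = total / B

print(loss_averaged, loss_per_sentence)
```

The two values differ by a factor of T, so with a fixed learning rate the effective gradient scale changes by the sequence length depending on which one is used.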

smth · June 28, 2017, 2:21am

In your example you have a constant sequence length. In NMT, you usually have variable-length sentences, and you want to normalize by the sentence length, right?

Either way, I’m not an expert here.
size_average=True has been the default in Torch, and continues to be in PyTorch. Maybe I can add a note about this in the basic tutorial.
Thanks for the feedback.

magic282 · June 28, 2017, 2:53am

It was just an example; the sentences are of variable length in practice. But it seems we don’t normalize the loss by sentence length, for example in OpenNMT’s implementation: https://github.com/OpenNMT/OpenNMT-py/blob/master/train.py#L169
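For the variable-length case, the candidate normalizers can be compared with a small sketch like the one below. The per-token loss values and lengths are invented for illustration, and the code is not taken from OpenNMT; it only shows how the choices differ once padding is excluded.

```python
# Hypothetical per-token losses for a batch of 3 sentences padded to length 4.
# Entries past each sentence's true length are padding and must not count.
token_losses = [
    [0.5, 1.0, 0.25, 0.25],
    [2.0, 1.0, 0.0, 0.0],
    [1.5, 0.0, 0.0, 0.0],
]
lengths = [4, 2, 1]

# Per-sentence sums over real (non-padding) tokens only.
sentence_sums = [sum(row[:n]) for row, n in zip(token_losses, lengths)]
total = sum(sentence_sums)

batch_size = len(token_losses)
num_tokens = sum(lengths)

# Option 1: divide by batch size only (no sentence-length normalization).
loss_per_sentence = total / batch_size

# Option 2: divide by the number of real tokens in the batch.
loss_per_token = total / num_tokens

# Option 3: normalize each sentence by its own length, then average over the batch.
loss_length_normalized = sum(s / n for s, n in zip(sentence_sums, lengths)) / batch_size

print(loss_per_sentence, loss_per_token, loss_length_normalized)
```

Option 1 lets longer sentences contribute more to the gradient, while option 3 weights every sentence equally regardless of length; which one is appropriate depends on the task.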