I am not going to start a new issue and would like to discuss it here.
I think that the default value of size_average=True is something of a trap. I work in NLP and am not very familiar with other areas, but it took me two days to debug this. At least in NLP, I can't see any reasonable motivation for setting this value to True by default.
hmmm, you would want to normalize the loss by the number of samples, right?
Take an NMT task. The output has shape (100, 64, 30000), where 100 is the output sequence length, 64 is the batch size, and 30000 is the vocab size. We reshape the output to (100 * 64, 30000) and compute the loss. However, we want to divide the loss by 64, not by 100 * 64, which is what size_average=True does. That is the problem I ran into.
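To make the factor concrete, here is a toy sketch (plain NumPy with made-up per-token loss values, standing in for what NLLLoss would produce after the reshape): averaging over all 100 * 64 rows versus summing and dividing by the batch size differ by exactly the sequence length.

```python
import numpy as np

seq_len, batch = 5, 4  # toy stand-ins for 100 and 64

# pretend per-token negative log-likelihoods after the (seq_len * batch, vocab) reshape;
# constant values here just to make the arithmetic obvious
per_token_nll = np.ones(seq_len * batch)

loss_size_average = per_token_nll.mean()          # what size_average=True computes
loss_per_sentence = per_token_nll.sum() / batch   # sum, then divide by batch size

print(loss_size_average)   # 1.0
print(loss_per_sentence)   # 5.0

# the two normalizations differ by a factor of seq_len
assert np.isclose(loss_per_sentence, loss_size_average * seq_len)
```

So a model trained with one convention sees gradients scaled by the sequence length relative to the other, which is easy to miss after the reshape.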
You have a constant sequence length there. In NMT you usually have variable-length sentences, and you want to normalize by the sentence length, right?
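With variable-length sentences the padded positions have to be masked out before any normalization, and there is still a choice of denominator. A hedged NumPy sketch (toy lengths and loss values, not any particular library's API):

```python
import numpy as np

# toy batch: 3 sentences of lengths 4, 2, 3, padded to max_len 4
lengths = np.array([4, 2, 3])
max_len, batch = 4, 3

per_token_nll = np.ones((max_len, batch))        # made-up per-token losses
mask = np.arange(max_len)[:, None] < lengths     # True on real tokens, False on padding

total = (per_token_nll * mask).sum()             # padded positions contribute nothing

per_token_avg    = total / mask.sum()            # normalize by number of real tokens
per_sentence_avg = total / batch                 # normalize by number of sentences

print(per_token_avg)     # 1.0
print(per_sentence_avg)  # 3.0
```

Neither denominator is universally right; the point is that a blanket average over all reshaped rows (padding included) matches neither.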
Either way, I'm not an expert here.
size_average has been the default in Torch and continues to be in PyTorch; maybe I can add a note about this in the basic tutorial.
Thanks for the feedback.
That was just an example; the sentences are of variable length. But it seems the loss is not normalized by sentence length, for example in OpenNMT's implementation: https://github.com/OpenNMT/OpenNMT-py/blob/master/train.py#L169