Why do we need to call loss.cuda() when we have already called model.cuda()?

Hi,

I do not know exactly how those loss functions are implemented (or whether they have been modified), but if a criterion does not have any parameters, then sending it to CUDA makes no difference: it holds no tensors of its own, so there is nothing to move to the GPU.
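To check this for a particular criterion, you can list the tensors it holds. Below is a minimal sketch, assuming a recent PyTorch and using nn.CrossEntropyLoss only as an example; the helper name tensors_held is made up for illustration. PyTorch stores the optional class weights as a buffer rather than a parameter, but .cuda() and .to(device) move both, so the sketch looks at both.

```python
import torch
import torch.nn as nn


def tensors_held(criterion):
    """Hypothetical helper: names of all parameters and buffers in a module."""
    names = [name for name, _ in criterion.named_parameters()]
    names += [name for name, _ in criterion.named_buffers()]
    return names


# A plain criterion holds no tensors, so .cuda() has nothing to move.
plain = nn.CrossEntropyLoss()
plain_tensors = tensors_held(plain)

# A weighted criterion holds its class weights, which do need to be moved.
weighted = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0]))
weighted_tensors = tensors_held(weighted)

print("plain:", plain_tensors)
print("weighted:", weighted_tensors)

# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weighted = weighted.to(device)

logits = torch.randn(4, 3, device=device)            # 4 samples, 3 classes
targets = torch.tensor([0, 1, 2, 1], device=device)  # class indices
loss = weighted(logits, targets)
```

If the list is empty, calling .cuda() on the criterion is harmless but unnecessary. If it is not empty (as with the weighted loss here), the criterion has to be on the same device as the model outputs and targets.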

Bests