The effect of moving a module to CUDA is simply to move all its parameters to CUDA.
Criterions don't have parameters in general, so it is not necessary to move them.
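To illustrate the point above, here is a minimal sketch. The model (nn.Linear), the criterion (nn.MSELoss), and the tensor shapes are assumptions chosen for the example and are not from the original post. It falls back to CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Use the GPU if there is one, otherwise stay on CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Moving the module moves its parameters (weight and bias) to the device.
model = nn.Linear(4, 2).to(device)

# A plain criterion such as MSELoss holds no parameters, so there is nothing to move.
criterion = nn.MSELoss()
num_criterion_params = len(list(criterion.parameters()))

# Inputs and targets do need to live on the same device as the model.
x = torch.randn(8, 4, device=device)
target = torch.randn(8, 2, device=device)

loss = criterion(model(x), target)
print(num_criterion_params, loss.device)
```

One caveat: a criterion built with tensor arguments, for example nn.CrossEntropyLoss(weight=...), does hold a tensor, and that tensor has to be on the same device as the inputs. Calling .to(device) on the criterion takes care of it in that case.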
Apologies for the direct message.
If I am using a dynamically created tensor for the loss calculation, what would be the recommended way to write the loss function to optimize the use of CUDA commands? Ex. -
From your code, I think the main issue is that you're using NumPy arrays (which cannot be on the GPU).
You might want to keep everything as Tensors, and when you create new Tensors, you can pass the device= kwarg. In this case, I guess you want to match the device of the inputs, so attn_val.device.
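A rough sketch of what that could look like is below. The function name attention_penalty and the loss formula are made up for illustration, since the original loss is not shown here. Only attn_val and the device= kwarg come from the discussion.

```python
import torch

def attention_penalty(attn_val: torch.Tensor) -> torch.Tensor:
    """Hypothetical loss: position-weighted squared magnitude of attn_val."""
    # Create the helper tensor directly on the input's device instead of
    # building it with NumPy and copying it over afterwards.
    positions = torch.arange(
        attn_val.shape[-1], device=attn_val.device, dtype=attn_val.dtype
    )
    weights = positions + 1.0  # stays on attn_val.device
    # Everything here is a Tensor op, so autograd can track it and no
    # host/device round trip is needed.
    return (attn_val ** 2 * weights).mean()

# Works unchanged whether attn_val lives on CPU or on a CUDA device.
attn_val = torch.rand(2, 5, requires_grad=True)
loss = attention_penalty(attn_val)
loss.backward()
```

Because the new tensor takes its device from attn_val, the function itself never needs a .cuda() call. It just follows wherever its input lives.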