Difference between apply and call for an autograd function

The custom function is completely stateless, so you can just create the tensor based on the input you get in forward().
Since it is stateless, you won't be able to re-use it anyway, so there is no issue with model.cuda().
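
A minimal sketch of what this looks like (the name `ScaleByConstant` and the constant `2.0` are made up for illustration; the point is that everything the Function needs is built inside `forward()` from its input, and it is invoked via `.apply`):

```python
import torch

class ScaleByConstant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Build the helper tensor from the input itself; nothing is
        # stored on the Function, so model.cuda() never has to move it.
        scale = torch.full_like(input, 2.0)
        ctx.save_for_backward(scale)
        return input * scale

    @staticmethod
    def backward(ctx, grad_output):
        (scale,) = ctx.saved_tensors
        return grad_output * scale

# New-style Functions are used through .apply, not by calling an instance
x = torch.randn(4, requires_grad=True)
y = ScaleByConstant.apply(x)
y.sum().backward()
```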

Okay @albanD.
But if I want to use DataParallel, then I can't define a tensor inside the forward function like `mytensor = torch.zeros(4).to(device)`, because this tensor will always be sent to device 0, while for DataParallel a copy of this tensor should be available on all the devices.

This is true only for nn.Module, not for custom Functions.
The input to your custom Function will already be on the right GPU, so if you create your tensor on the same device as the input, it will work fine.
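
A minimal sketch of that pattern, reusing the `torch.zeros(4)` example from the question (the name `AddZeros` is made up, and it assumes the input's last dimension is 4 so the addition broadcasts):

```python
import torch

class AddZeros(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Allocate on the input's device instead of a hard-coded one.
        # Under nn.DataParallel each replica's input is already on its
        # own GPU, so this tensor is created on the right device too.
        mytensor = torch.zeros(4, device=input.device, dtype=input.dtype)
        return input + mytensor

    @staticmethod
    def backward(ctx, grad_output):
        # Adding a constant leaves the gradient w.r.t. input unchanged
        return grad_output
```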
