Could someone elaborate on which is better/preferred, especially when x is a batch of training data? Thanks!
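The former is preferred when you want to train this parameter. If you write `var_x = Variable(x, requires_grad=True).cuda()`, then `var_x` is the output of the `.cuda()` call rather than the Variable you created. During the backward pass the gradients are copied from the GPU back to the CPU and accumulated in the intermediate `Variable(x, requires_grad=True)`, which you never kept a reference to, so they are lost.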
In other cases, for example when x is a batch of training data that needs no gradient, it shouldn’t make much difference.
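To make the difference concrete, here is a minimal sketch. It uses the current tensor API (`requires_grad_()` and `.to(device)`) rather than `Variable`, which has been deprecated since PyTorch 0.4. The batch shape and the names `a`, `cpu_leaf` and `moved` are made up for illustration, and the script falls back to the CPU when no GPU is present, in which case `.to(device)` is a no-op and the two patterns behave the same.

```python
import torch

# Assumption: use the GPU when present, otherwise fall back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 3)  # hypothetical batch

# Pattern 1: move to the device first, then ask for gradients.
# The tensor you hold is the leaf, so its .grad is filled in on the device.
a = x.detach().clone().to(device).requires_grad_()
a.sum().backward()

# Pattern 2: ask for gradients on the CPU, then move.
# `cpu_leaf` is the leaf; `moved` is only the output of the copy.
cpu_leaf = x.detach().clone().requires_grad_()
moved = cpu_leaf.to(device)
moved.sum().backward()

print("pattern 1 leaf:", a.is_leaf, "| grad device:", a.grad.device)
print("pattern 2 moved is leaf:", moved.is_leaf)
print("pattern 2 grad lands on:", cpu_leaf.grad.device)
```

On a machine with a GPU, `moved.is_leaf` should come out `False` and the gradient ends up on `cpu_leaf`, which is exactly the object the one-liner throws away. If you really need the second pattern, keeping a reference to the CPU tensor (as `cpu_leaf` does here) is one way to still reach the gradient.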