Yeah, it looks like what's happening is that `x = Variable(torch.zeros(…), requires_grad=True).cuda()` first creates an intermediate Variable, effectively `y = Variable(torch.zeros(...), requires_grad=True)`, and then assigns `x = y.cuda()`. Since `y` is the leaf node, the gradients only accumulate in `y` and not in `x`.
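Here's a minimal sketch of the behavior, assuming a CUDA-capable machine. It uses the old `Variable` API to match the snippet above; the fix is to move the data to the GPU *before* wrapping it, so the CUDA tensor itself is the leaf:

```python
import torch
from torch.autograd import Variable

# Broken: x is NOT a leaf -- it's the output of the .cuda() op
# applied to a hidden CPU leaf, so gradients accumulate there.
x = Variable(torch.zeros(3), requires_grad=True).cuda()
x.sum().backward()
print(x.grad)  # None -- the gradient went to the hidden CPU leaf

# Fix: call .cuda() on the data first, so the GPU tensor is the leaf.
w = Variable(torch.zeros(3).cuda(), requires_grad=True)
w.sum().backward()
print(w.grad)  # tensor of ones, on the GPU
```

(In modern PyTorch you'd get the same leaf-on-GPU behavior with `torch.zeros(3, device='cuda', requires_grad=True)`.)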