Gradient of a tensor is None, while the gradient of its copy is not None?

    for batch_idx, (data, target) in enumerate(train_loader):
        meta_data = data
        meta_data.requires_grad = True
        print(data.requires_grad)  # prints True, since meta_data is an alias of data, not a deep copy
        # moving tensors to GPU (`device` is assumed here to be the CUDA device)
        data, target = data.to(device), target.to(device)
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()  # needed for any .grad to be populated
        print(meta_data.grad)  # prints the corresponding gradient
        print(data.grad)  # prints None. Why is this the case?
        print((data - meta_data.cuda()).norm(2))

My question is: why is meta_data.grad not None, while data.grad is None? Is it because of moving the data to the GPU? Thanks in advance!


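This is because .to() is a differentiable op, so the tensor you get back is no longer a leaf (if you print it, you'll see the grad_fn attached to it). By default, autograd only populates .grad on leaf tensors, which is why meta_data.grad is set while data.grad is None.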
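
To make the leaf vs. non-leaf distinction concrete, here is a minimal sketch rather than your actual training loop. It assumes no GPU is available, so a dtype conversion via .to(torch.float64) stands in for the device move; both go through .to() and both return a non-leaf when the input requires grad. The names x, y and z are made up for illustration.

```python
import torch

# x plays the role of meta_data: a leaf tensor that requires grad.
x = torch.randn(4, 3, requires_grad=True)

# y plays the role of data after .to(...). A dtype change is used here as a
# CPU-only stand-in for moving the tensor to the GPU.
y = x.to(torch.float64)

print(x.is_leaf, y.is_leaf)  # True False
print(y.grad_fn)             # y carries a grad_fn, so it is not a leaf

# Option 1: ask autograd to keep the gradient of the non-leaf tensor.
y.retain_grad()
loss = (y ** 2).sum()
loss.backward()
print(x.grad is not None, y.grad is not None)  # True True

# Option 2: convert/move first, then mark the result as requiring grad,
# so the converted tensor is itself the leaf.
z = torch.randn(4, 3).to(torch.float64).requires_grad_()
(z ** 2).sum().backward()
print(z.is_leaf, z.grad is not None)  # True True
```

In the loop from the question, the second option would correspond to moving data to the GPU before setting requires_grad on it, while the first keeps the original order and calls data.retain_grad() before the backward pass.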

Thank you, this resolves it!