Using backward() cannot calculate grad for the input

I am new to PyTorch. My task is to fix the weights of the network and update the input of the network iteratively, many times over. In the first iteration (t = 0), I can get the grad for the input. However, in the second run (say t = 1), when I call .backward(), I cannot fetch the grad of the input; the grad is None. I'd really appreciate it if anyone here can help. Thank you in advance!

    for t in range(self._opt.temperature):
        syn_energy = self._Des128(sample_seq)  # deep network
        print(syn_energy.sum())
        syn_energy.sum().backward(retain_graph=True)  # backward the output
        temp = sample_seq + 0.5 * self._opt.sampling_step * self._opt.sampling_step * sample_seq.grad
        sample_seq = sample_seq * (1 - mask) + temp * mask
        sample_seq.clamp_(0.0, 1.0)  # min, max

Do you have sample_seq.requires_grad set to True in the 2nd iteration?

I didn't set it every iteration; I only set it once. I have added one line in the loop, after the forward pass on the input:

    print(t, syn_energy.sum(), sample_seq.requires_grad)

The output is as follows:
0 tensor(-1.0209, device='cuda:1', grad_fn=<SumBackward0>) True
1 tensor(6.3761, device='cuda:1', grad_fn=<SumBackward0>) True
However, sample_seq.grad is None.

I would say, try creating a new tensor before the forward() call every time.

sample_seq = sample_seq.detach().clone()
sample_seq.requires_grad = True
...forward()...

To me, it looks like you are involving the same variable sample_seq in further calculations, and it's difficult to trace. I am not sure if this is the issue, but you can try.
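
Here is a minimal sketch of what I mean, with toy tensors rather than your network: once sample_seq is reassigned to the result of an operation, it is no longer a leaf tensor, and autograd only populates .grad for leaf tensors.

    import torch

    x = torch.rand(3, requires_grad=True)  # a leaf tensor: .grad gets populated
    (x * 2).sum().backward()
    print(x.is_leaf, x.grad)               # True tensor([2., 2., 2.])

    x = x * 0.5                            # reassigned: x is now the output of an op, not a leaf
    (x * 2).sum().backward()
    print(x.is_leaf, x.grad)               # False None (with a UserWarning from PyTorch)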

Thank you!! I tried it, and it actually works. But this approach copies the data every iteration, which will slow down the process. I agree the problem exists because I reuse the same input in every iteration. I wonder, is there any in-place method instead of copying the data to new memory?
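
For example, would something along these lines be valid? It applies the update in place on the same leaf tensor under torch.no_grad(), the same trick optimizers use (shapes and step size here are made up for illustration):

    import torch

    sample_seq = torch.rand(4, 8, requires_grad=True)
    mask = (torch.rand(4, 8) > 0.5).float()
    sampling_step = 0.1

    loss = (sample_seq ** 2).sum()   # stand-in for the network's energy
    loss.backward()
    with torch.no_grad():            # do not record the update in the graph
        # same result as sample_seq * (1 - mask) + temp * mask, folded into one step
        sample_seq += 0.5 * sampling_step * sampling_step * sample_seq.grad * mask
        sample_seq.clamp_(0.0, 1.0)
    sample_seq.grad = None           # grads accumulate on leaves, so reset before the next backward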

I think retain_graph=True is unnecessary, since each iteration builds a fresh graph anyway.

I feel bad writing spaghetti code. But try this :slight_smile:

syn_energy.sum().backward()

# creating next input
new_sample_seq = sample_seq.detach()
temp = new_sample_seq + 0.5 * self._opt.sampling_step * self._opt.sampling_step * sample_seq.grad
new_sample_seq = new_sample_seq * (1-mask) + temp * mask
new_sample_seq.clamp_(0.0, 1.0) # min , max
new_sample_seq.requires_grad = True
sample_seq = new_sample_seq

Basically, I am detaching the input variable so that it doesn’t get any gradients from these operations.
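
By the way, regarding the copying concern: detach() by itself does not copy any data. It returns a new tensor that shares the same storage; the copy in my first suggestion came from .clone(). A quick check:

    import torch

    x = torch.rand(4, requires_grad=True)
    y = x.detach()                        # new tensor object, same underlying storage
    print(x.data_ptr() == y.data_ptr())   # True: no data was copied
    z = x.detach().clone()                # clone() is the part that allocates new memory
    print(x.data_ptr() == z.data_ptr())   # False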

Thank you very much!! I had tried detach before but forgot to assign it back to itself (>____<). This is my final code, which works.

    for t in range(self._opt.temperature):
        syn_energy = self._Des128(sample_seq)  # deep network
        syn_energy.sum().backward()  # backward the output
        # print(t, syn_energy.sum(), sample_seq.requires_grad)
        temp = sample_seq + 0.5 * self._opt.sampling_step * self._opt.sampling_step * sample_seq.grad
        sample_seq = sample_seq * (1 - mask) + temp * mask
        sample_seq.clamp_(0.0, 1.0)  # min, max
        sample_seq = sample_seq.detach()  # cut the old graph (no data copy)
        sample_seq.requires_grad = True  # make it a fresh leaf for the next backward
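
In case it helps anyone later, here is a self-contained toy version of the same loop that you can run directly. The network, mask, shapes, and hyperparameters are made-up stand-ins for self._Des128 and self._opt:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # hypothetical stand-ins, just for illustration
    net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    for p in net.parameters():
        p.requires_grad_(False)  # the weights stay fixed; only the input gets gradients
    sampling_step = 0.1
    num_steps = 3
    mask = (torch.rand(4, 8) > 0.5).float()

    sample_seq = torch.rand(4, 8, requires_grad=True)

    for t in range(num_steps):
        syn_energy = net(sample_seq)  # forward through the fixed network
        syn_energy.sum().backward()   # populates sample_seq.grad (sample_seq is a leaf)
        temp = sample_seq + 0.5 * sampling_step * sampling_step * sample_seq.grad
        sample_seq = (sample_seq * (1 - mask) + temp * mask).detach()  # drop the old graph
        sample_seq.clamp_(0.0, 1.0)   # in-place is safe after detach
        sample_seq.requires_grad = True  # fresh leaf for the next iteration
        print(t, sample_seq.is_leaf)  # True every iteration

Compared with detach().clone(), the plain detach() here never copies the data; each iteration just re-labels the updated tensor as a new leaf.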