Gradient calculated in custom autograd function is not assigned to .grad field

Hello,

I am trying to run the custom autograd function below, which is used in the mixed_operation of ProxylessNAS.

import torch

# detach_variable is a helper from the ProxylessNAS code that returns a detached
# copy of x with requires_grad=True
class ArchGradientFunction(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x, binary_gates, run_func, backward_func):
        ctx.run_func = run_func
        ctx.backward_func = backward_func
        detached_x = detach_variable(x)
        # re-enable autograd inside forward so the graph of run_func is recorded
        with torch.enable_grad():
            output = run_func(detached_x)
        ctx.save_for_backward(detached_x, output)
        return output.data

    @staticmethod
    def backward(ctx, grad_output):
        detached_x, output = ctx.saved_tensors
        # gradient w.r.t. the input x, through the graph recorded in forward
        grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True)
        # compute gradients w.r.t. binary_gates
        binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data)
        return grad_x[0], binary_grads, None, None
As shown above, the gradient w.r.t. binary_gates is calculated in backward. However, binary_gates.grad is None when I track it with backward hooks. I know that asking questions about a specific project may not be a wise choice here, but since the authors have closed the issues on GitHub, I have to ask for help: what could be causing this?

Thanks in advance!

Hi,

Asking questions about specific projects on the forum is fine.

You should be careful never to use .data. Use .detach() or a with torch.no_grad(): block instead.
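For example, here is a minimal sketch (with made-up tensors, not the ProxylessNAS code) of why .detach() and torch.no_grad() are preferred:

import torch

x = torch.ones(3, requires_grad=True)
y = x * 2

# .data returns a tensor that shares storage but is invisible to autograd:
# in-place changes to it are not tracked and can silently corrupt gradients
a = y.data

# .detach() also shares storage, but autograd will raise an error in backward()
# if the detached tensor was modified in place and y is needed for the gradient
b = y.detach()

# for code that should not be recorded in the graph at all, use no_grad
with torch.no_grad():
    c = y * 3  # c.requires_grad is False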

binary_gates.grad will only be populated if it is a leaf Tensor and it requires gradients.
Also, backward hooks on nn.Module (which should not be used, see the docs) run before the .grad field of the module's inputs is populated, so even for a leaf Tensor it is expected that you don't see the gradient there.
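If you want to inspect the gradient flowing into a Tensor, you can register the hook on the Tensor itself with Tensor.register_hook. A minimal sketch, where binary_gates is just a stand-in leaf tensor and not the actual ProxylessNAS parameter:

import torch

# stand-in leaf tensor that requires gradients (hypothetical shapes/values)
binary_gates = torch.zeros(4, requires_grad=True)
x = torch.randn(2, 4)

# the hook is called with the gradient w.r.t. this tensor during backward
binary_gates.register_hook(lambda grad: print("grad w.r.t. binary_gates:", grad))

out = (x * binary_gates).sum()

print(binary_gates.is_leaf, binary_gates.requires_grad)  # True True
out.backward()
print(binary_gates.grad)  # populated after backward(), since it is a leaf that requires grad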
