How to skip .grad computation but still run backward hooks?

For my application it’s more efficient to keep gradients in factored form, i.e., without applying the backprops @ activations matmul. Is there a recommended way to disable this computation but still run the rest of autograd as expected?
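To make that concrete, here is a rough sketch of what I mean by the factored form for a linear layer (shapes and variable names here are just for illustration):

import torch

# For a linear layer y = x @ W.t(), autograd materializes
# W.grad = grad_y.t() @ x. "Factored form" means keeping the pair
# (grad_y, x) around instead of performing that matmul.
x = torch.randn(8, 3)        # activations entering the layer
grad_y = torch.randn(8, 5)   # backprops arriving at the layer output
full_grad = grad_y.t() @ x   # shape (5, 3): what .grad would hold
factored_grad = (grad_y, x)  # what I would rather store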

Hi @Yaroslav_Bulatov,

Setting requires_grad=False should work for your use case. It won’t stop autograd from running backprop on the rest of the model; it will only compute the intermediate buffers needed by the rest of the computation.
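A minimal sketch of what that looks like (the hook and variable names are placeholders, and it assumes a PyTorch version that has register_full_backward_hook): freezing a weight skips its .grad, but the module’s hooks still fire, because autograd still needs the gradient with respect to the module’s input.

import torch
import torch.nn as nn

layer1 = nn.Linear(3, 4, bias=False)
layer2 = nn.Linear(4, 5, bias=False)
net = nn.Sequential(layer1, layer2)

saved = {}

def save_activations(module, inputs, output):
    # forward hook: keep the input activations
    saved["acts"] = inputs[0].detach()

def save_backprops(module, grad_input, grad_output):
    # backward hook: keep the backpropagated gradients at the module output
    saved["backprops"] = grad_output[0].detach()

layer2.register_forward_hook(save_activations)
layer2.register_full_backward_hook(save_backprops)

# Skip the weight-gradient matmul for layer2 only.
layer2.weight.requires_grad = False

out = net(torch.randn(8, 3))
out.sum().backward()

assert layer2.weight.grad is None        # no .grad was materialized
assert layer1.weight.grad is not None    # the rest of backprop still ran
print(saved["acts"].shape, saved["backprops"].shape)  # the factored pieces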

Thanks for the tip, I think that works for me. It seems some care needs to be taken not to set this on every leaf parameter:

import torch
import torch.nn as nn

layer1 = nn.Linear(1, 1, bias=False)
layer2 = nn.Linear(1, 1, bias=False)
net = nn.Sequential(layer1, layer2)
x = torch.ones(1)

# Only layer2's .grad is skipped; backprop still reaches layer1.
layer2.weight.requires_grad = False
net(x * x).backward()
assert layer2.weight.grad is None

# With every leaf frozen, the output no longer requires grad,
# so this backward() raises a RuntimeError (nothing to propagate).
layer1.weight.requires_grad = False
net(x * x).backward()

Great!
That’s correct: at least one leaf tensor has to have requires_grad=True so that autograd records the graph at all.
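If you want the everything-frozen case not to error out, one simple guard (continuing your snippet above, just a sketch) is to check the output before calling backward:

out = net(x * x)
if out.requires_grad:
    out.backward()
# When every leaf is frozen, out.requires_grad is False and backward()
# would raise a RuntimeError, so we simply skip the call.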