For my application it’s more efficient to keep gradients in factored form, i.e., without applying the backprops @ activations matmul. Is there a recommended way to disable this computation but still run the rest of the autograd as expected?
requires_grad=False should work in your use case. It won’t stop autograd from running the backward pass on the rest of the model; it will only compute the intermediate buffers needed by the rest of the computation.
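In case it helps, here is a minimal sketch of how that could be combined with hooks to actually collect the factored pieces (the layer input and the output gradient). The hook-based capture, the names `factors` / `save_input` / `save_grad_output`, and the layer sizes are my own assumptions, not anything from your setup:

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(4, 8)
layer2 = nn.Linear(8, 2)
net = nn.Sequential(layer1, layer2)

layer2.weight.requires_grad = False  # skip materializing the dense gradient for this weight

factors = {}

def save_input(module, inputs, output):
    # inputs is a tuple of positional args; inputs[0] is the activation fed to the layer
    factors["activation"] = inputs[0].detach()

def save_grad_output(module, grad_input, grad_output):
    # grad_output[0] is dLoss/d(layer output); it is still computed so the
    # gradient can keep flowing back to layer1
    factors["grad_output"] = grad_output[0].detach()

layer2.register_forward_hook(save_input)
layer2.register_full_backward_hook(save_grad_output)

x = torch.randn(16, 4)
net(x).sum().backward()

# The two factors; their matmul would reproduce the usual weight gradient.
factored = factors["grad_output"].T @ factors["activation"]  # shape (2, 8), same as layer2.weight

assert layer2.weight.grad is None       # not materialized by autograd
assert layer1.weight.grad is not None   # rest of the model still gets gradients
```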
Thanks for the tip, I think that works for me. It seems some care needs to be taken not to set this on every leaf parameter:
```python
import torch
import torch.nn as nn

layer1 = nn.Linear(1, 1, bias=False)
layer2 = nn.Linear(1, 1, bias=False)
net = nn.Sequential(layer1, layer2)
x = torch.ones(1)

layer2.weight.requires_grad = False
net(x * x).backward()
assert layer2.weight.grad is None   # skipped, but layer1 still gets a gradient

layer1.weight.requires_grad = False
net(x * x).backward()  # RuntimeError: nothing to propagate, no leaf requires grad
```
That’s correct, at least one leaf node has to require gradients for autograd to start recording the graph.
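A quick way to see this (a small sketch, the layer sizes are just for illustration): the forward output only carries a grad_fn when at least one leaf in the graph requires grad, so backward() has nothing to start from otherwise.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 1, bias=False), nn.Linear(1, 1, bias=False))
x = torch.ones(1)

out = net(x)
print(out.requires_grad, out.grad_fn)  # True, a Backward node: the graph was recorded

for p in net.parameters():
    p.requires_grad = False

out = net(x)
print(out.requires_grad, out.grad_fn)  # False, None: out.backward() would raise
```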