For my application it’s more efficient to keep gradients in factored form, i.e., without applying the backprops @ activations matmul. Is there a recommended way to disable this computation but run the rest of autograd as expected?
Setting requires_grad=False
should work in your use case. It won’t stop autograd from running backprop on the rest of the model; it will only compute the intermediate buffers needed by the rest of the computation.
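A minimal sketch of this behavior (the two-layer toy net here is my own, not from the thread): freezing one weight skips its gradient buffer, while backprop still flows through that layer to earlier parameters.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 2, bias=False), nn.Linear(2, 2, bias=False))
net[1].weight.requires_grad = False  # skip gradient computation for this weight

net(torch.randn(1, 2)).sum().backward()

print(net[0].weight.grad is not None)  # True: backprop still reaches layer 0
print(net[1].weight.grad)              # None: this gradient buffer was skipped
```

Note that the frozen layer still participates in the backward pass: autograd needs the gradient with respect to its input to reach layer 0, it just skips the weight-gradient matmul.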
Thanks for the tip, I think that works for me. It seems some care needs to be taken not to set this on all leaf parameters:
import torch
import torch.nn as nn

layer1 = nn.Linear(1, 1, bias=False)
layer2 = nn.Linear(1, 1, bias=False)
net = nn.Sequential(layer1, layer2)
x = torch.ones(1)

layer2.weight.requires_grad = False
net(x * x).backward()
assert layer2.weight.grad is None

layer1.weight.requires_grad = False
net(x * x).backward()  # RuntimeError: nothing left in the graph requires grad
Great!
That’s correct, at least one leaf node has to have requires_grad=True so that autograd starts recording the graph.