Yes, the workflow should work as shown in this dummy example:
import torch
import torch.nn as nn

baseline = nn.Linear(1, 1)
model = nn.Linear(1, 1)

# freeze model so its parameters will not receive gradients
for param in model.parameters():
    param.requires_grad_(False)

x = torch.randn(1, 1)
out = baseline(x)
print(out.grad_fn)  # prints a valid grad_fn

out = model(out)
out.mean().backward()
print(baseline.weight.grad)  # populated, since the gradient flows through the frozen model
Try to debug baseline_m and make sure its output has a valid .grad_fn. You might be accidentally detaching the activation, e.g. by wrapping it in a new tensor or by converting it to NumPy and back.
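Both of these patterns would silently break the graph (a minimal sketch; the tensor names are illustrative and not from your code):

out = baseline(x)
print(out.grad_fn)  # valid grad_fn

rewrapped = torch.tensor(out)  # copy-constructing a new tensor detaches it
print(rewrapped.grad_fn)  # None

via_numpy = torch.from_numpy(out.detach().numpy())  # a NumPy round trip also detaches
print(via_numpy.grad_fn)  # None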
When I do print(out.grad_fn) after model(out), I get <TransposeBackward0 object at 0x16b03b898>. Is that normal? Shouldn't the output have no grad_fn, since model is frozen?
My baseline_m is just a single linear layer, so you would only have to access that module and its parameter to check the gradient.
Yes, that is expected. If baseline_m contains trainable parameters (requires_grad=True), the output of model will also have a grad_fn, because the gradient still has to flow backward through model to reach baseline_m's parameters. You should, however, check that no gradients are set in model's parameters after the backward call.
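Continuing the dummy example from above (a short sketch reusing the baseline and model names):

out = model(baseline(x))
print(out.grad_fn)  # valid grad_fn, since the input to model requires grad
out.mean().backward()
print(baseline.weight.grad)  # populated tensor
print(model.weight.grad)  # None, because model's parameters were frozen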