How can I use 2 models in a training loop?

I have two models: model and baseline_m. I froze all of the weights of model, so that the gradients flowing back through it are only used to update the parameters of baseline_m.

            baseline_output = baseline_m(baseline_input)

            out, output_sizes = model(baseline_output, input_sizes)
            decoded_output, decoded_offsets = decoder.decode(out, output_sizes)

But then I get:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Which makes sense, since the weights of model are frozen. But shouldn't baseline_m still update?

Yes, the workflow should work as shown in this dummy example:

import torch
import torch.nn as nn

baseline = nn.Linear(1, 1)
model = nn.Linear(1, 1)

# Freeze model's parameters
for param in model.parameters():
    param.requires_grad_(False)

x = torch.randn(1, 1)
out = baseline(x)
print(out.grad_fn)  # prints a valid grad_fn
out = model(out)
out.mean().backward()
print(baseline.weight.grad)  # baseline's gradient is populated

Try to debug baseline_m and make sure its output has a valid .grad_fn.
You might be accidentally detaching the activation, e.g. by rewrapping it in a new tensor, converting it to numpy and back, etc.
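As a minimal sketch of how such accidental detaching can look (the variable names are illustrative, not from your code):

```python
import torch
import torch.nn as nn

baseline = nn.Linear(1, 1)
x = torch.randn(1, 1)
out = baseline(x)
print(out.grad_fn is not None)  # True: the graph is intact

# Pitfall 1: rewrapping the activation creates a new leaf tensor.
# (torch.tensor on a tensor also emits a UserWarning suggesting
# .clone().detach() instead.)
rewrapped = torch.tensor(out.detach())
print(rewrapped.grad_fn)  # None: autograd can no longer reach baseline

# Pitfall 2: a numpy round trip also leaves the autograd graph
roundtrip = torch.from_numpy(out.detach().numpy())
print(roundtrip.grad_fn)  # None

# Calling backward on such a detached result raises the reported error:
try:
    rewrapped.mean().backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad ...
```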


Perfect - thank you.

When I do print(out.grad_fn) after model(out), I get &lt;TransposeBackward0 object at 0x16b03b898&gt;. Is that normal? Shouldn't out have no grad_fn after passing through the frozen model?

And when I do print(baseline_m.weight.grad), I get AttributeError: 'BaselineModel' object has no attribute 'weight'

In my dummy example, baseline was just a single linear layer, so baseline.weight worked directly. Since your baseline_m is a custom module, you would have to access a valid submodule and its parameter instead.
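For example (BaselineModel here is a hypothetical stand-in for your custom module, assuming it wraps a linear layer):

```python
import torch
import torch.nn as nn

class BaselineModel(nn.Module):
    """A custom module has no .weight itself; its parameters
    live on its submodules."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

baseline_m = BaselineModel()
out = baseline_m(torch.randn(1, 1))
out.mean().backward()

# Access the gradient through the submodule...
print(baseline_m.linear.weight.grad)

# ...or iterate over all parameters by name:
for name, param in baseline_m.named_parameters():
    print(name, param.grad)
```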

If baseline_m contains trainable parameters, the output of model will also have a grad_fn, since autograd needs to backpropagate through model to reach baseline_m's parameters; that's why you see TransposeBackward0. You could additionally verify that model's parameters do not receive any gradients.
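Extending the earlier dummy example, you can check both claims at once:

```python
import torch
import torch.nn as nn

baseline = nn.Linear(1, 1)
model = nn.Linear(1, 1)

# Freeze model's parameters
for param in model.parameters():
    param.requires_grad_(False)

out = model(baseline(torch.randn(1, 1)))
# grad_fn exists because baseline's parameters still require gradients
print(out.grad_fn is not None)  # True

out.mean().backward()
print(baseline.weight.grad)  # populated
print(model.weight.grad)     # None: frozen parameters get no gradients
```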
