Combining Trained Models in PyTorch

The .requires_grad attribute of the parameters belonging to the model that should be updated must stay True, i.e. it should not be set to False.

You can disable the gradient calculation for the models that should be frozen, as seen in this example:

import torch
import torch.nn as nn

modelA = nn.Linear(1, 1)
modelB = nn.Linear(1, 1)

# Freeze modelB by disabling gradient calculation for its parameters
for param in modelB.parameters():
    param.requires_grad = False

# Forward pass through both models, then backpropagate
out = modelA(torch.randn(1, 1))
out = modelB(out)
out.backward()

for param in modelA.parameters():
    print(param.grad) # valid grads

for param in modelB.parameters():
    print(param.grad) # None
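
If you then want to train only modelA in this combined setup, one common pattern (sketched here as an assumption, not part of the original snippet) is to pass only the trainable parameters to the optimizer:

# Only modelA's parameters are updated; the frozen modelB is untouched
optimizer = torch.optim.SGD(modelA.parameters(), lr=1e-2)

# Alternatively, filter by requires_grad if parameters from both models are mixed
params = [p for p in list(modelA.parameters()) + list(modelB.parameters()) if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=1e-2)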

Setting torch.backends.cudnn.enabled = False or using the context manager with torch.backends.cudnn.flags(enabled=False) would disable cudnn.
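
As a minimal sketch (assuming a CUDA device is available, since cudnn is only used for CUDA operations), the two approaches could look like this:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3).cuda()
x = torch.randn(1, 3, 16, 16, device='cuda')

# Global switch: disables cudnn for all subsequent operations
torch.backends.cudnn.enabled = False
out = conv(x)
torch.backends.cudnn.enabled = True

# Context manager: disables cudnn only inside the with block
with torch.backends.cudnn.flags(enabled=False):
    out = conv(x)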