I have two networks.
One network is a diffusion network that makes diffused images.
The other network is the classifier.
If the classifier network takes the diffusion network’s output as its input, are the diffusion network’s weights also updated?
The classifier network’s loss is CrossEntropy(predict_label, true_label).
Here predict_label is the output of the classification network,
and the classification network’s input is the diffusion network’s output.
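Yes. If you call backward() on the loss computed from the output of the classifier network, the parameters of the diffusion network will get their gradients populated (or accumulated, if the gradients are already set to some value). The classifier network just adds operations on top of the first network in the same computation graph. Whether the diffusion weights then actually change depends on whether those parameters are passed to an optimizer that you step.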
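As a minimal sketch of this, assuming two small nn.Linear layers as hypothetical stand-ins for the real networks (the names diffusion and classifier, the shapes, and the random data are all made up for illustration), you can check that a single backward() call fills the gradients of both modules:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins, not the real architectures.
diffusion = nn.Linear(8, 8)    # plays the role of the diffusion network
classifier = nn.Linear(8, 3)   # plays the role of the classifier network

x = torch.randn(4, 8)                     # dummy input batch
true_label = torch.tensor([0, 1, 2, 0])   # dummy class indices

output_diffusion = diffusion(x)
predict_label = classifier(output_diffusion)  # logits built on top of the first network
loss = nn.functional.cross_entropy(predict_label, true_label)
loss.backward()

# Both modules are part of the same graph, so both receive gradients.
diffusion_has_grad = all(p.grad is not None for p in diffusion.parameters())
classifier_has_grad = all(p.grad is not None for p in classifier.parameters())
print(diffusion_has_grad, classifier_has_grad)
```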
This sort of behaviour can be easily disabled by calling detach() on the output of the first network before passing it as an input to the second network:
output_classifier = classifier(output_diffusion.detach())
detach() returns a new tensor that shares the same underlying storage but no longer tracks gradients (its requires_grad is False). This means that the gradient that flows back through the second network can no longer be propagated into the first network.
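To see the effect of detach() concretely, here is a small sketch under the same kind of assumptions (two nn.Linear layers as hypothetical placeholders for the diffusion and classifier networks, with random dummy data). After backward(), only the classifier's parameters should hold gradients:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins, not the real architectures.
diffusion = nn.Linear(8, 8)
classifier = nn.Linear(8, 3)

x = torch.randn(4, 8)                     # dummy input batch
true_label = torch.tensor([0, 1, 2, 0])   # dummy class indices

output_diffusion = diffusion(x)
detached = output_diffusion.detach()      # cut the graph between the two networks
output_classifier = classifier(detached)
loss = nn.functional.cross_entropy(output_classifier, true_label)
loss.backward()

# Same memory as the original output, but no gradient tracking.
shares_storage = detached.data_ptr() == output_diffusion.data_ptr()
diffusion_grads = [p.grad for p in diffusion.parameters()]
classifier_grads = [p.grad for p in classifier.parameters()]
print(detached.requires_grad, shares_storage)
print(all(g is None for g in diffusion_grads))
```

Because the detached tensor shares storage with the original, modifying it in place would also change output_diffusion, so it is safest to treat it as read-only.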