Why remove the VGG gradients in perceptual loss?

I have seen some implementations disable gradients for the VGG model when training style transfer or a perceptual loss, like this:

```python
for param in vgg.parameters():
    param.requires_grad_(False)
```

Isn't it necessary to compute these gradients, since we are backpropagating through the VGG to the generator model?

If some input tensors still require gradients, backpropagation will work correctly; calling .requires_grad_(False) on the parameters just avoids storing gradients for all of the VGG weights.
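As a quick sanity check, a minimal sketch along these lines (using torchvision's vgg16 features and an MSE feature loss as stand-ins for the actual perceptual loss) shows that the gradient still reaches the generated image while the frozen VGG weights accumulate no gradient:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Freeze the VGG weights but keep the autograd graph through its forward pass.
vgg = vgg16(pretrained=True).features[:16].eval()
for param in vgg.parameters():
    param.requires_grad_(False)  # no gradients stored for VGG weights

# Stand-in for the generator output; in practice this comes from your model.
generated = torch.rand(1, 3, 224, 224, requires_grad=True)
target = torch.rand(1, 3, 224, 224)

loss = F.mse_loss(vgg(generated), vgg(target))
loss.backward()

print(generated.grad is not None)            # True: gradient flows back to the input
print(next(vgg.parameters()).grad is None)   # True: frozen weights get no gradient
```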

I tested with and without .requires_grad_(False) and the results are different.

How large is the difference?
Did you try to make the results deterministic following these docs?
If the difference is approximately <=1e-5, it might be due to the limited FP32 precision.
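A minimal sketch of how you could seed both runs before comparing them (assuming a reasonably recent PyTorch version; the exact flags you need depend on the ops in your model):

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Seed all relevant RNGs so the two runs start from the same state.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Disable non-deterministic cuDNN autotuning and force deterministic kernels.
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)

# After seeding both runs identically, compare the outputs, e.g.:
# torch.allclose(out_with_freeze, out_without_freeze, atol=1e-5)
```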
