While computing VGG perceptual loss, although I have not seen it done explicitly anywhere, I feel it is alright to wrap the computation of the VGG features for the ground-truth (GT) image inside torch.no_grad().
So basically I feel the following will be alright:
with torch.no_grad():
    gt_vgg_features = self.vgg_features(gt)   # no graph is built for the GT branch
nw_op_vgg_features = self.vgg_features(nw_op) # gradients still flow through this branch
# Now compute L1 loss between the two feature maps
loss = torch.nn.functional.l1_loss(nw_op_vgg_features, gt_vgg_features)
or whether one should use:
gt_vgg_features = self.vgg_features(gt)
nw_op_vgg_features = self.vgg_features(nw_op)
In both approaches, requires_grad is set to False for all VGG parameters, and the VGG network is put in eval() mode.
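For clarity, this is the kind of setup I mean (a minimal sketch; I am assuming a truncated torchvision VGG16 as the feature extractor, and the relu3_3 cut-off is an arbitrary choice on my part):

import torchvision

# Truncated VGG16 used as the feature extractor (cut-off at relu3_3 is an assumption)
vgg_features = torchvision.models.vgg16(weights="DEFAULT").features[:16]
for p in vgg_features.parameters():
    p.requires_grad = False  # no gradients are accumulated for the VGG weights
vgg_features.eval()  # inference-mode behaviour for any Dropout/BatchNorm layers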
The first approach will save a lot of GPU memory, and I feel it should be numerically identical to the second one, since no backpropagation is required through the GT features: torch.no_grad() only skips storing the intermediate activations needed for the backward pass, while the forward computation itself is unchanged. But in most implementations I find the second approach used for computing VGG perceptual loss.
So which option should we go for when implementing VGG perceptual loss?
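For reference, here is a minimal self-contained sketch of the first option as a loss module (the class name VGGPerceptualLoss and the relu3_3 cut-off are my own assumptions, not taken from any particular implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class VGGPerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen, truncated VGG16 feature extractor (cut-off layer is an assumption)
        self.vgg_features = torchvision.models.vgg16(weights="DEFAULT").features[:16]
        for p in self.vgg_features.parameters():
            p.requires_grad = False
        self.vgg_features.eval()

    def forward(self, nw_op, gt):
        # GT branch: gradients are never needed, so skip building the graph
        with torch.no_grad():
            gt_vgg_features = self.vgg_features(gt)
        # Network-output branch: gradients must flow back to the generator
        nw_op_vgg_features = self.vgg_features(nw_op)
        return F.l1_loss(nw_op_vgg_features, gt_vgg_features)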