# How to calculate a VGG feature loss without storing unnecessary gradients

Hi, I am using a pre-trained VGG network as a feature extractor and computing an L1 loss between the VGG features of two images. Until now, I implemented this by zeroing the VGG gradients on every iteration.
But today I saw an implementation that sets `requires_grad = False` on the VGG parameters. I am curious: if `requires_grad` is False, how do the gradients backpropagate to the network before VGG? Are the VGG gradients zeroed after each backward pass?

Here is the relevant part of the code.

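```python
import torch
from torchvision import models


class Vgg19(torch.nn.Module):
    def __init__(self):
        super(Vgg19, self).__init__()
        vgg_pretrained_features = models.vgg19(pretrained=True).features
        self.slice1 = torch.nn.Sequential()
        self.slice2 = torch.nn.Sequential()
        self.slice3 = torch.nn.Sequential()
        self.slice4 = torch.nn.Sequential()
        self.slice5 = torch.nn.Sequential()
        # Split the pretrained layers into five consecutive slices.
        for x in range(2):
            self.slice1.add_module(str(x), vgg_pretrained_features[x])
        for x in range(2, 7):
            self.slice2.add_module(str(x), vgg_pretrained_features[x])
        for x in range(7, 12):
            self.slice3.add_module(str(x), vgg_pretrained_features[x])
        for x in range(12, 21):
            self.slice4.add_module(str(x), vgg_pretrained_features[x])
        for x in range(21, 30):
            self.slice5.add_module(str(x), vgg_pretrained_features[x])
        # Freeze the VGG weights (the `requires_grad = False` asked about above).
        for param in self.parameters():
            param.requires_grad = False

    def forward(self, X):
        # Return the activations at the end of each slice.
        h_relu1 = self.slice1(X)
        h_relu2 = self.slice2(h_relu1)
        h_relu3 = self.slice3(h_relu2)
        h_relu4 = self.slice4(h_relu3)
        h_relu5 = self.slice5(h_relu4)
        out = [h_relu1, h_relu2, h_relu3, h_relu4, h_relu5]
        return out
```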
```python
import torch.nn as nn


class VGGLoss(nn.Module):
    def __init__(self, gpu_ids):
        super(VGGLoss, self).__init__()
        self.vgg = Vgg19().cuda()
        self.criterion = nn.L1Loss()
        # One weight per slice; deeper features contribute more.
        self.weights = [1.0/32, 1.0/16, 1.0/8, 1.0/4, 1.0]

    def forward(self, x, y):
        x_vgg, y_vgg = self.vgg(x), self.vgg(y)
        loss = 0
        for i in range(len(x_vgg)):
            # detach() treats the target features as constants.
            loss += self.weights[i] * self.criterion(x_vgg[i], y_vgg[i].detach())
        return loss
```

Do you need the gradients for your pretrained VGG model, or are you using it as a fixed feature extractor?
If it is the latter, you can use `Variable(..., volatile=True)` (replaced by `torch.no_grad()` since PyTorch 0.4), so that evaluating the model uses the minimum amount of memory.
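A rough sketch of the fixed-extractor case using `torch.no_grad()`, the context manager current PyTorch offers for this. The names `extractor`, `generated`, and `real` are hypothetical stand-ins, and a small linear layer takes the place of VGG so the snippet runs without downloading weights. Only the target branch is wrapped, on the assumption that the other input comes from a network that still needs gradients.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a frozen, pretrained feature extractor.
extractor = nn.Linear(4, 3)

# Stand-ins for the output of the network being trained and the target image.
generated = torch.randn(2, 4, requires_grad=True)
real = torch.randn(2, 4)

# Nothing is recorded inside this block, so no graph or
# intermediate activations are kept for the target branch.
with torch.no_grad():
    real_feats = extractor(real)

# Evaluated normally, so gradients can still flow back to `generated`.
gen_feats = extractor(generated)

print(real_feats.requires_grad)  # False
print(gen_feats.requires_grad)   # True
```

Wrapping the `generated` branch as well would save more memory, but it would also cut the path back to whatever produced `generated`.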

Parameters are the network weights (W). Setting `requires_grad = False` on them tells autograd not to compute d/dW.
The gradients with respect to the layer inputs (d/dX) are still computed, and these are all the chain rule needs to reach the network before VGG. A layer's d/dW depends on the d/dX values of the layers after it, but not vice versa.
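This can be checked on a toy model. The sketch below is only an illustration: `upstream` and `frozen` are hypothetical stand-ins for the network being trained and the pretrained VGG, each reduced to a single linear layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: `upstream` plays the network being trained,
# `frozen` plays the pretrained feature extractor.
upstream = nn.Linear(4, 4)
frozen = nn.Linear(4, 3)
for param in frozen.parameters():
    param.requires_grad = False  # autograd skips d(loss)/dW for these weights

x = torch.randn(2, 4)
target = torch.randn(2, 3)

# The loss is computed on the frozen layer's output, as in a feature loss.
loss = nn.functional.l1_loss(frozen(upstream(x)), target)
loss.backward()

frozen_grads = [p.grad for p in frozen.parameters()]
upstream_grads = [p.grad for p in upstream.parameters()]

print(frozen_grads)             # all None: nothing was stored for the frozen layer
print(upstream_grads[0].shape)  # the upstream weight still received a gradient
```

Since the frozen parameters never accumulate a `.grad` at all, there is nothing to zero between iterations.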


Hi @Ginsunuva,