.backward() results in variable being empty?

When running this code:

def forward(self, input):
    self.output = input.clone()
    self.G = self.gram(input)
    self.G.div(input.nelement())
    self.targetP = nn.Parameter(self.target, requires_grad=False)
    self.loss = self.strength * self.crit(self.G.cpu(), self.targetP.cpu())

def backward(self, input, gradOutput):
    print(self.G.size())
    self.GP = nn.Parameter(self.G.data, requires_grad=True)
    self.dG = self.GP.backward(self.GP, retain_graph=False)
    print(self.dG)

My self.G tensor has a size of:

(1L, 64L, 64L)

But after using .backward() on it, the resulting dG variable ends up being empty:

None

Am I using .backward() incorrectly here?

Here’s what I am trying to do:

class StyleLoss(nn.Module):

    def __init__(self, strength, normalize):
        super(StyleLoss, self).__init__()
        #self.target = torch.Tensor()
        self.target = Variable((torch.Tensor()),requires_grad=False)
        self.strength = strength
        self.gram = GramMatrix()
        self.criterion = nn.MSELoss()
        self.mode = None
        self.blend_weight = None
        self.G = None
        self.normalize = 'False'


    def forward(self, input):
        self.output = input.clone()
        self.G = self.gram(input)
        self.G.div(input.nelement())
        if self.mode == 'capture':
          if self.blend_weight == None:
            self.target.data.resize_as_(self.G.cpu().data).copy_(self.G.cpu().data)
          elif self.target.nelement() == 0:
            self.target.data.resize_as_(self.G.cpu().data).copy_(self.G.cpu().data).mul_(self.blend_weight.cpu())
          else:
            self.target.data.add(self.blend_weight.cpu(), self.G.data)
        elif self.mode == 'loss':
            self.loss = self.strength * self.criterion(self.G.cpu(), self.target.cpu())
        return self.output

After running the forward function, I need to run self.G backwards to create dG, as in this Lua/Torch code: dG = self.crit:backward(self.G, self.target). Unlike Lua/Torch, though, PyTorch doesn't seem to have a backward-pass function that works the same way.

    def backward(self, input, gradOutput):
        if self.mode == 'loss':
          dG = self.G.backward(self.G, self.target)
          self.gradInput = self.gram.backward(input, self.dG)
          if self.normalize == 'True':
            self.gradInput.div(torch.norm(self.gradInput, 1) + 1e-8) # Normalize Gradients
          self.gradInput.mul(self.strength)
          self.gradInput.add(gradOutput)
        else:
          self.gradInput = gradOutput
        return self.gradInput

Could someone please help me with this? Should I somehow be using a backward hook? If so, how would that work?

Hi,

A few things:

  • self.G.div(input.nelement()): here, div is not an in-place operation; div_ would be in-place (all in-place operations have an _ suffix). So this line is not actually doing anything. You might want to replace it with self.G = self.G.div(input.nelement()). There is a small illustration of the difference right after this list.
  • When using PyTorch, you don’t need to worry about how the backward pass should work. Just write your forward pass doing exactly what you want to do, without unpacking Variables, then call .backward() on your final loss and it will do a backward corresponding exactly to what you did while computing that loss (see the short autograd example after this list).
  • Never unpack Variables, otherwise the backward pass cannot be computed: this line self.target.data.resize_as_(self.G.cpu().data).copy_(self.G.cpu().data) should just be self.target = self.G. Similarly, self.target.data.resize_as_(self.G.cpu().data).copy_(self.G.cpu().data).mul_(self.blend_weight.cpu()) should just be self.target = self.G * self.blend_weight, and self.target.data.add(self.blend_weight.cpu(), self.G.data) should be self.target = self.target + self.blend_weight * self.G. A sketch of the whole forward rewritten this way follows below.
  • I am not sure why you call .cpu(); that does not look like it is needed here, as the criterion can be computed on the GPU without any issue.
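
To make the in-place point concrete, here is a quick illustration (the values are arbitrary, just for the example):

import torch

x = torch.ones(2, 2)

y = x.div(4)   # out-of-place: returns a new tensor, x is unchanged
print(x)       # still all ones
print(y)       # all 0.25

x.div_(4)      # in-place: modifies x itself (note the trailing underscore)
print(x)       # now all 0.25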
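
And a small, self-contained example of letting autograd do the backward for you; the shapes and the computation are made up, only the pattern matters:

import torch
import torch.nn as nn
from torch.autograd import Variable

input = Variable(torch.randn(1, 64, 64), requires_grad=True)
target = Variable(torch.randn(1, 64, 64))

G = input * 2                   # any differentiable computation on Variables
loss = nn.MSELoss()(G, target)  # build the loss without touching .data

loss.backward()                 # autograd figures out the whole backward pass
print(input.grad.size())        # gradient w.r.t. input, same size as input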
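
Putting these points together, your forward could look roughly like this (keeping your GramMatrix, mode and blend_weight attributes as they are; this is only a sketch of the idea, not a tested drop-in replacement):

    def forward(self, input):
        self.output = input.clone()
        self.G = self.gram(input)
        self.G = self.G.div(input.nelement())  # keep the result of div
        if self.mode == 'capture':
            if self.blend_weight is None:
                self.target = self.G
            elif self.target.nelement() == 0:
                self.target = self.G * self.blend_weight
            else:
                self.target = self.target + self.blend_weight * self.G
        elif self.mode == 'loss':
            self.loss = self.strength * self.criterion(self.G, self.target)
        return self.output

With that, you don’t need to write a backward for this module at all: run the network in 'loss' mode, sum up the self.loss values you care about, and call .backward() on that total once.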