No grad accumulator for a saved leaf! error

My network applies modules a, b and c to the input, in that order.
c looks like this (to is torch, fu is torch.nn.functional):

    def forward(self, anch, pos, neg):
        pos_out = self.fc(anch) + self.fc(pos)
        pos_out = to.sigmoid(self.fc_final(fu.relu(pos_out)))
        neg_out = self.fc(anch) + self.fc(neg)
        neg_out = to.sigmoid(self.fc_final(fu.relu(neg_out)))
        binary_loss = (-to.log(pos_out + 1e-7) - to.log(1 - neg_out)).mean()
        return binary_loss

When I wrap b (which just does some indexing operations) in torch.no_grad(), I get RuntimeError: No grad accumulator for a saved leaf!. It works when I don’t wrap it, or when I change forward to concatenate the inputs before calling self.fc (a fully connected layer; of course the input dimension changes when I do that).

What’s going on? And is wrapping with torch.no_grad() the right approach when I don’t want some operations to affect the gradient?

Hi,

There is no b in your code sample.
Could you share both code samples (the one that works and the one that fails) so that we can reproduce this? Thanks!

This is b:

    def create_pos_neg(x):
        """ x has shape (N, T, C) """
        with to.no_grad():
            sz = x.size(1) - 1
            idx = to.randint(0, sz, size=(2,)).cuda()
            idx_offset = to.randint(max(-idx[0].item(), -3), min(sz-idx[0].item(), 3), size=(1,))
            if idx_offset.item() == 0:
                idx_offset = 1
            else:
                idx_offset = idx_offset.item()
            indcs = to.randperm(x.size(0)).cuda()
            anchor = x[:, idx[0]]
            pos = x[:, idx[0].item()+idx_offset]
            neg = x[indcs, idx[1]]
        return anchor, pos, neg

It works if I don’t use no_grad(), or if I change c to concatenate the inputs:

    def forward(self, anch, pos, neg):
        pos_out = self.fc(to.cat((anch, pos), dim=-1))
        pos_out = to.sigmoid(self.fc_final(fu.relu(pos_out)))
        neg_out = self.fc(to.cat((anch, neg), dim=-1))
        neg_out = to.sigmoid(self.fc_final(fu.relu(neg_out)))
        binary_loss = (-to.log(pos_out + 1e-7) - to.log(1 - neg_out)).mean()
        return binary_loss

Sorry, I can’t find any variables called a, b or c. Not sure what you mean here.
Can you give a code sample that I can run to make this clearer?

Sorry, I should have been clearer: a, b and c are not variables but groups of operations (each either a function or an nn.Module’s forward). The point I was trying to make is that first I do operations where I want gradients to be tracked (a), then some where I don’t want them tracked (b), then I want them tracked again (c).
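
Schematically it looks something like this (made-up layers and sizes, just to show where the no_grad block sits; this toy snippet does not reproduce the error by itself):

    import torch as to
    import torch.nn as nn

    a = nn.Linear(8, 16)            # stand-in for "a": forward ops I want tracked
    c = nn.Linear(16, 1)            # stand-in for "c": tracked ops on b's outputs

    x = to.randn(4, 10, 8)
    feats = a(x)                    # (N, T, C), has a grad_fn

    with to.no_grad():              # "b": pure indexing, no gradients wanted here
        anchor = feats[:, 0]
        pos = feats[:, 1]

    loss = c(anchor + pos).mean()   # "c": tracked again
    loss.backward()                 # in this sketch, gradients reach c's parameters only,
                                    # since no_grad cuts the graph at b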

I’ll try and create a simple example tomorrow.
