Backward fails with torch v1.6.0

cerisara · September 7, 2020, 4:33pm

I have a piece of code that runs fine with torch v1.4.0 but fails with torch v1.6.0 with the error:
“RuntimeError: Trying to backward through the graph a second time…”

Is this a known or expected change in this release?

The code is a basic linear layer with a custom but complex loss function implemented as an nn.Module (too complex to copy here, sorry…).

Thank you for any hint about what changed between these releases that could relate to this error!

albanD · September 7, 2020, 8:12pm

Hi,

We make quite a lot of changes between releases; you can check the release notes for any breaking changes.

When you say “complex loss”, do you mean that it is complicated, or that it uses complex numbers? If it’s the latter: we have started working on autograd support for complex numbers, so you might now get derivatives for things that previously had none. You might need some extra .detach() calls to avoid issues there.

Otherwise, the error means that you are calling backward on some part of the graph twice, most likely because part of the graph is shared between iterations. Check that you don’t perform any operation on Tensors that require gradients outside of the inner training loop.
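A minimal sketch of that failure mode, under made-up assumptions: the hypothetical tensor `emb` stands in for inputs that still require gradients, and a `tanh` preprocessing step is computed once, outside the training loop. The first `backward()` frees the tensors saved by that shared `tanh` node, so the second iteration fails; passing a detached copy gives every iteration its own graph. The helper `train` is illustrative only.

```python
import torch

# Hypothetical inputs that still require gradients (e.g. produced by another module).
emb = torch.randn(4, 3, requires_grad=True)

# Preprocessing done ONCE, outside the loop. tanh saves its output for backward,
# so this node holds saved tensors that the first backward() will free.
features = torch.tanh(emb)

model = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train(inputs, steps=2):
    """Run a few steps; return the error message if backward fails, else None."""
    for _ in range(steps):
        opt.zero_grad()
        loss = model(inputs).pow(2).mean()
        try:
            # Without detach, this also walks back into the shared tanh node.
            loss.backward()
        except RuntimeError as e:
            return str(e)
        opt.step()
    return None

err = train(features)          # fails on the 2nd step: tanh's saved tensors are gone
ok = train(features.detach())  # detached input: each step builds a fresh graph
print(err is not None, ok)
```

Passing `retain_graph=True` would also silence the error, but it keeps stale graph pieces alive and usually hides the real problem; moving the preprocessing into the loop or detaching it is the cleaner fix.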

If not, we would need a code sample to give you more guidance.

cerisara · September 8, 2020, 6:31am

Thank you for your reply.
Sure, I understand that some code is needed. I’ll try to write a minimal version and post it here if I can’t debug it myself.
I meant a complicated loss, but one without parameters.
I was just surprised that it works smoothly with v1.4.0… but OK, I’ll try to debug it first. Thanks for the great library!

cerisara · September 8, 2020, 6:51am

OK, found it: the problem had nothing to do with the loss, but with the inputs.
The inputs xs are a list of pretrained word embeddings, and I built sentence embeddings from them before passing them to the training process, like this:

xl = []
for xx in xs:
    # sum all words embeddings to get one sentence embedding
    x = torch.sum(xx.squeeze(), dim=0)
    xl.append(x)
# put all sentence embeddings into one tensor
x = torch.stack(xl)

The tensor x is then passed to the standard training process, but I had forgotten the line:

x = x.detach()

This used to work with 1.4.0 but no longer does with 1.6.0, which makes sense.
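A sketch of the corrected preprocessing with the detach folded in, assuming each element of `xs` has shape (1, n_words, dim); the function name `build_sentence_embeddings` and the toy `encoder` that produces embeddings with autograd history are hypothetical. It uses `squeeze(0)` instead of a bare `squeeze()` so that a one-word sentence keeps its (n_words, dim) shape under that assumed layout.

```python
import torch

def build_sentence_embeddings(xs):
    """Sum word embeddings per sentence, stack them, and detach the result.

    Assumes each element of xs has shape (1, n_words, dim).
    """
    xl = [torch.sum(xx.squeeze(0), dim=0) for xx in xs]  # one (dim,) vector per sentence
    x = torch.stack(xl)                                   # (n_sentences, dim)
    return x.detach()  # cut the link to the graph that produced xs

# Hypothetical pretrained embeddings that still carry autograd history.
encoder = torch.nn.Linear(8, 8)
xs = [encoder(torch.randn(1, n, 8)) for n in (3, 1, 5)]

x = build_sentence_embeddings(xs)
print(x.shape, x.requires_grad, x.grad_fn)  # torch.Size([3, 8]) False None

model = torch.nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(3):
    opt.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()  # each iteration builds and frees only its own graph
    opt.step()
```

Detaching also means no gradient flows back into whatever produced xs, which is the intent for frozen pretrained embeddings; if those embeddings were meant to be fine-tuned, x would instead have to be rebuilt inside the loop.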

Sorry for the false alarm, and thanks again!