Hi all,
I’m working on a “learning to learn by gradient descent by gradient descent”-style system and have run into a problem: the system works on the CPU but crashes when running on the GPU.
A minimal example is in this gist; to see the error (“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”), flip the “cuda” variable in main to True.
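For context, here is a stripped-down sketch of the general error class (not the gist itself, just an illustrative reproduction): an op whose backward pass needs its own output is modified in-place before backward is called, so autograd refuses to use the stale saved value.

```python
import torch

# exp() saves its output for the backward pass (d/dx exp(x) = exp(x)),
# so modifying that output in-place bumps its version counter and
# invalidates the saved tensor.
x = torch.ones(3, requires_grad=True)
y = x.exp()      # backward of exp reuses its output y
y.add_(1)        # in-place modification of y
try:
    y.sum().backward()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)   # "... has been modified by an inplace operation"
```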
Interestingly, this error only occurs when the higher-level network is an RNN; testing with a higher level consisting of a single linear layer doesn’t trigger the problem.
In addition, backporting the code and running it on PyTorch 0.3.0 works fine (gist). I haven’t tested on 0.3.1, as it discriminates against my GPU for being old.
Thanks! Removing that line works; there’s no error now. The question now is: when is it appropriate to flatten the parameters? The restriction on in-place operations on leaf variables means I have to clone, adjust, and write back the parameters on every update, which fragments them.
Do I just have to eat the performance penalty and find intermittent opportunities to flatten them? Or can I do something smarter?
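To make the pattern concrete, here is a sketch of the clone-adjust-write-back update I mean, assuming the lower-level learner is an nn.LSTM (the module and the zero “update” are illustrative stand-ins, not the actual meta-learned step):

```python
import torch
import torch.nn as nn

# Writes to leaf parameters must happen outside autograd, and each
# per-tensor write leaves cuDNN's weight buffer non-contiguous.
rnn = nn.LSTM(input_size=4, hidden_size=4)
for p in rnn.parameters():
    with torch.no_grad():
        update = torch.zeros_like(p)   # stand-in for the meta-learned update
        p.copy_(p.clone() + update)    # clone, adjust, write back

# Re-coalesce the weight buffer once per batch of updates, not per write:
rnn.flatten_parameters()
```

The idea would be to amortize the cost by calling flatten_parameters() once after a full round of parameter writes rather than after every individual copy.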
For anyone else who lands here from a search, this was my issue: einsum in PyTorch 0.4.0 modifies variables in-place. My CPU PyTorch version was 0.4.1, while my GPU PyTorch version was 0.4.0.
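A quick sanity check for anyone debugging a similar mismatch: confirm the version on each machine and that einsum participates in autograd cleanly on that install. On the affected release, a pattern like the one below was what surfaced the in-place error for me.

```python
import torch

# einsum computing a @ a.T, followed by backward; on a fixed version
# this runs without the in-place RuntimeError.
a = torch.randn(3, 4, requires_grad=True)
out = torch.einsum('ij,kj->ik', a, a)
out.sum().backward()
print(torch.__version__, a.grad.shape)
```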