I’m working on a “learning to learn by gradient descent by gradient descent”-style system and have hit a problem: the system works on the CPU but crashes when running on the GPU.
A minimal example gist is here; to see the error (“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”), flip the “cuda” variable in main to True.
Interestingly, the error only occurs when the higher-level network is an RNN; testing with a single linear layer as the higher-level network doesn’t trigger the problem.
In addition, backporting the code to PyTorch 0.3.0 and running it there works fine (gist). I haven’t tested on 0.3.1, as it discriminates against my GPU for being old.
I appreciate any advice you guys can give.
You should not be flattening parameters during every forward pass: https://gist.github.com/JosephDavidsonKSWH/c50d2a4c5bd19ad5fb12602f239ce78f#file-l2l-py-L75-L77 . That hurts performance and may be what is causing the issue here.
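If the flattening in the gist is `flatten_parameters()` on an `nn.RNN`/`nn.LSTM` (an assumption on my part, going by the linked lines — `MetaOptimizer` below is a made-up name), the usual pattern is to call it once, after the module has been moved to the GPU, rather than inside `forward`. A minimal sketch:

```python
import torch
import torch.nn as nn

class MetaOptimizer(nn.Module):  # hypothetical name, not from the gist
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=4, hidden_size=8)

    def forward(self, x, hidden=None):
        # Note: no flatten_parameters() here. Calling it on every
        # forward rewrites the RNN's weight storage in place, which
        # autograd can flag when those weights are part of the graph.
        return self.rnn(x, hidden)

model = MetaOptimizer()
if torch.cuda.is_available():
    model = model.cuda()
    # Flatten once, after moving to the GPU and before training.
    model.rnn.flatten_parameters()

x = torch.randn(3, 1, 4, device=next(model.parameters()).device)
out, _ = model(x)
```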
Thanks! Removing that line fixes it; there’s no error now. The question now is: when would it be appropriate to flatten the parameters? The restriction on in-place operations on leaf variables means I have to clone, adjust, then write back the parameters on every update, which fragments them.
Do I just have to eat the performance penalty and find intermittent opportunities to flatten them, or can I do something smarter?
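For reference, the update I’m describing looks roughly like this (the names and the plain gradient step are placeholders, not my actual code, where the update comes from the higher-level network):

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the lower-level network; the real one
# is in the gist.
net = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = ((net(x) - y) ** 2).mean()
# create_graph=True so the higher-level network can later be
# trained through this update.
grads = torch.autograd.grad(loss, net.parameters(), create_graph=True)

lr = 0.1  # placeholder for the learned update rule
new_params = {}
for (name, p), g in zip(net.named_parameters(), grads):
    # Each updated parameter is a fresh tensor (clone + adjust),
    # so after writing these back the module's weights no longer
    # share the original contiguous storage -- the fragmentation
    # mentioned above.
    new_params[name] = p - lr * g
```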
Hi Richard, why does flattening parameters cause an error on the GPU but not the CPU?
Also, what other types of operations throw inplace errors on the GPU but not CPU? I have the same problem as OP, but I do not flatten my parameters.
For anyone else who lands here from a search, this was my issue: einsum in PyTorch 0.4.0 modifies variables in-place. My CPU PyTorch version was 0.4.1, while my GPU PyTorch version was 0.4.0.
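If you want to confirm whether a given op is the culprit on your install, you can probe autograd’s per-tensor version counter directly (`_version` is an internal attribute, so treat this as a debugging hack rather than stable API):

```python
import torch

# Autograd detects illegal in-place writes by comparing a tensor's
# version counter against the value saved for backward. Probing the
# counter shows whether an op modified its input, without having to
# wait for backward() to fail.
a = torch.randn(3, 3, requires_grad=True)
b = a * 2  # non-leaf intermediate, like the ones saved for backward
before = b._version

c = torch.einsum('ij,jk->ik', b, b)

# On a fixed version (e.g. 0.4.1 and later), einsum leaves its
# inputs alone and the counter is unchanged; on 0.4.0 it would
# have been bumped.
modified_in_place = b._version != before
```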