I’m working on a “learning to learn by gradient descent by gradient descent”-style system and have run into a problem where it works on the CPU but crashes when running on the GPU.
A minimal example gist is here; to see the error (“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”), flip the “cuda” variable in main to True.
Interestingly, this error only occurs when the higher-level network is an RNN; a higher-level network consisting of a single linear layer doesn’t seem to trigger the problem.
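For context, here is a rough sketch of the kind of structure involved, in case it helps frame the question (the names MetaOptimizer, unrolled_loss, etc. are illustrative and not taken from my gist): an LSTM meta-optimizer consumes the optimizee’s gradients and proposes updates, and the meta-loss is backpropagated through the unrolled updates. The in-place pitfall usually hides in how the optimizee parameters or the LSTM state get updated.

```python
import torch
import torch.nn as nn

class MetaOptimizer(nn.Module):
    """Coordinate-wise LSTM that maps a gradient to a parameter update."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTMCell(1, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, state):
        # grad: (num_params, 1) column of optimizee gradients
        h, c = self.lstm(grad, state)
        return self.out(h), (h, c)

def unrolled_loss(meta_opt, device, num_params=10, hidden_size=20, unroll=5):
    # Tiny optimizee: a single weight vector fit to a fixed target.
    target = torch.randn(num_params, device=device)
    params = torch.randn(num_params, device=device, requires_grad=True)
    state = (torch.zeros(num_params, hidden_size, device=device),
             torch.zeros(num_params, hidden_size, device=device))
    total = 0.0
    for _ in range(unroll):
        loss = ((params - target) ** 2).mean()
        # create_graph=True so the meta-gradient can flow through this gradient
        grad, = torch.autograd.grad(loss, params, create_graph=True)
        update, state = meta_opt(grad.unsqueeze(1), state)
        # Out-of-place update keeps the autograd graph intact; an in-place
        # params.add_() / params.data.add_() here is the usual culprit for
        # the "modified by an inplace operation" error.
        params = params + update.squeeze(1)
        total = total + loss
    return total

if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    meta_opt = MetaOptimizer().to(device)
    unrolled_loss(meta_opt, device).backward()  # meta-gradient w.r.t. the LSTM
```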
In addition, backporting the code to PyTorch 0.3.0 and running it there works fine (gist). I haven’t tested on 0.3.1, as it discriminates against my GPU for being old.
I’d appreciate any advice you can give.