CUDA out of memory

Python uses function scoping, so you might be able to save a bit of memory by wrapping the training loop in a train function and calling it, as described here: once the function returns, its local tensors go out of scope and can be freed, whereas tensors bound at module level stay alive for the rest of the script.
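A minimal sketch of the scoping behavior, in plain Python (no GPU needed; the `Tensor` class here is a hypothetical stand-in for a real tensor, which in a training script would hold the CUDA memory):

```python
import gc
import weakref

class Tensor:
    """Hypothetical stand-in for a framework tensor holding GPU memory."""
    pass

def train():
    # Everything bound here is local: when train() returns, these names go
    # out of scope and the objects become eligible for collection.
    activations = Tensor()
    return weakref.ref(activations)

ref = train()
gc.collect()
# The local was released as soon as train() returned.
assert ref() is None

# The same binding at module level survives until the script ends:
leaked = Tensor()
ref2 = weakref.ref(leaked)
gc.collect()
assert ref2() is not None
```

With PyTorch, the freed tensors go back to the caching allocator rather than to the driver, so `nvidia-smi` may still show the memory as used until you call `torch.cuda.empty_cache()`.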

The inplace argument is not available for all layers, so you could try it for e.g. LeakyReLU and check whether the code still works and whether it saves some memory.
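A small sketch of what `inplace=True` changes, assuming PyTorch is installed: the out-of-place call allocates a new output tensor, while the in-place call overwrites its input's storage.

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 64)  # arbitrary shape, just for illustration

# Out-of-place: allocates a fresh tensor for the result.
y_ref = F.leaky_relu(x, negative_slope=0.2)

# In-place: reuses x's storage; the returned tensor is x itself.
y_inp = F.leaky_relu(x, negative_slope=0.2, inplace=True)

assert torch.equal(y_ref, y_inp)           # same values either way
assert y_inp.data_ptr() == x.data_ptr()    # but no new allocation
```

Note that in-place activations can break autograd when the original input is needed for the backward pass; PyTorch raises a runtime error in that case, so testing whether "the code still works" is exactly the right check.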