Python has function scoping, so you might be able to save a bit of memory if you wrap the training loop in a `train` function and call it as described here.
Locals created inside the function (intermediate tensors, activations, etc.) become unreachable once the function returns, so they can be freed instead of lingering in the global scope.
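As a plain-Python sketch of why this helps (no PyTorch needed; `BigTensor` is just a stand-in for a large intermediate tensor): a local created inside the function is released as soon as the function returns, which a `weakref` can verify.

```python
import weakref

class BigTensor:
    """Stand-in for a large intermediate tensor."""
    def __init__(self):
        self.data = [0.0] * 1_000_000

def train():
    # Locals such as this intermediate live only inside this scope.
    intermediate = BigTensor()
    return weakref.ref(intermediate)

ref = train()
# Once train() returns, the local's refcount drops to zero and it is freed,
# so the weak reference is now dead:
print(ref() is None)  # True
```

Had `intermediate` been created at module level instead, it would stay alive (and keep its memory) until you explicitly `del` it.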
The `inplace`
argument is not available for all layers, so you could try it for e.g. `nn.LeakyReLU`
and see if the code still works (in-place ops can break autograd if the input is needed for the backward pass) and if it saves some memory.
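To illustrate what `inplace=True` buys you, here is a toy plain-Python leaky ReLU (not the PyTorch implementation, just a sketch of the semantics): with `inplace=True` the input buffer is overwritten instead of allocating a new one.

```python
def leaky_relu(x, negative_slope=0.01, inplace=False):
    """Toy leaky ReLU on a list of floats.

    inplace=True overwrites x instead of allocating a second list,
    mirroring the memory saving of nn.LeakyReLU(inplace=True).
    """
    if inplace:
        for i, v in enumerate(x):
            if v < 0:
                x[i] = v * negative_slope
        return x
    return [v if v >= 0 else v * negative_slope for v in x]

a = [1.0, -2.0, 3.0]
out = leaky_relu(a, inplace=True)
print(out is a)  # True: no new buffer was allocated
print(out)       # [1.0, -0.02, 3.0]
```

In PyTorch the saving is per-activation-map rather than per-list, but the trade-off is the same: you skip one allocation at the cost of destroying the layer's input, which is why you need to check that the backward pass still runs.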