Memory use increased during training

I guess this memory increase comes from the intermediate forward activations, which are stored during the forward pass because the backward pass needs them for the gradient computation. You could take a look at this post to get an estimate of the memory usage.
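
As a rough back-of-the-envelope sketch (my own illustration, not the method from the post mentioned above), you could estimate the activation memory by summing the sizes of the layer outputs that have to be kept for the backward pass. The helper `activation_bytes` and the layer output shapes below are hypothetical, and the estimate assumes float32 and ignores parameters, gradients, optimizer state, and allocator overhead:

```python
from math import prod


def activation_bytes(output_shapes, bytes_per_element=4):
    """Sum the bytes needed to store one tensor per shape (float32 by default)."""
    return sum(prod(shape) * bytes_per_element for shape in output_shapes)


# Hypothetical per-layer output shapes for a batch of 32 (illustrative only).
batch_size = 32
shapes = [
    (batch_size, 64, 112, 112),
    (batch_size, 128, 56, 56),
    (batch_size, 256, 28, 28),
    (batch_size, 10),
]

total = activation_bytes(shapes)
print(f"~{total / 1024**2:.1f} MiB of stored activations")
```

Since every shape scales with the batch dimension, this estimate grows linearly with the batch size, so lowering the batch size is usually the first thing to try if the activations are what fills up the memory.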