Extreme GPU memory usage

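Besides the inputs and model parameters, the intermediate activations are also stored during training, since they are needed to compute the gradients in the backward pass. Their size grows with the batch size, so they can easily account for a large part of the GPU memory usage. You might want to check this post.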
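To get a feeling for the numbers, here is a rough back-of-the-envelope sketch in plain Python. It assumes a simple fully connected network in float32 and counts only one stored output tensor per layer; the helper `estimate_memory_bytes` and the layer sizes are made up for illustration. Real usage will be higher, since this ignores gradients, optimizer state, outputs of activation functions, and any framework or allocator overhead.

```python
BYTES_PER_FLOAT32 = 4


def estimate_memory_bytes(layer_sizes, batch_size, bytes_per_elem=BYTES_PER_FLOAT32):
    """Rough estimate for a fully connected network (hypothetical helper).

    layer_sizes: e.g. [in_features, hidden, ..., out_features]
    """
    # Weights plus biases of every linear layer; independent of the batch size.
    params = sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    # The input batch itself.
    inputs = batch_size * layer_sizes[0]
    # One output tensor per layer, kept around for the backward pass.
    activations = sum(batch_size * n_out for n_out in layer_sizes[1:])
    return {
        "parameters": params * bytes_per_elem,
        "inputs": inputs * bytes_per_elem,
        "activations": activations * bytes_per_elem,
    }


# Assumed example sizes, not taken from the original question.
sizes = [1024, 4096, 4096, 10]
for batch_size in (64, 512, 4096):
    est = estimate_memory_bytes(sizes, batch_size)
    summary = ", ".join(f"{k}: {v / 1024**2:.1f} MiB" for k, v in est.items())
    print(f"batch_size={batch_size}: {summary}")
```

The parameter memory stays constant, while the input and activation memory scale linearly with the batch size, so lowering the batch size is usually the first thing to try when you are running out of memory.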