It might be expected; you could check the number of trainable parameters, buffers, and the shapes of all intermediate activations to verify it. This post gives you an example of how to do so.
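Here is a minimal sketch of that approach, assuming a standard `nn.Module` (the model below is a made-up placeholder; swap in your own). It counts trainable parameters and buffer elements, then registers forward hooks on the leaf modules to print each intermediate activation shape:

```python
import torch
import torch.nn as nn

# Hypothetical model used only for illustration; replace with your own module.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Count trainable parameters and buffer elements.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
num_buffers = sum(b.numel() for b in model.buffers())
print(f"trainable parameters: {num_params}, buffer elements: {num_buffers}")

# Register forward hooks on leaf modules to print each activation shape.
def print_shape(module, inp, out):
    print(f"{module.__class__.__name__}: {tuple(out.shape)}")

handles = [
    m.register_forward_hook(print_shape)
    for m in model.modules()
    if len(list(m.children())) == 0
]

x = torch.randn(4, 128)
out = model(x)

# Clean up the hooks once you are done debugging.
for h in handles:
    h.remove()
```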
Alternatively, you could add debug print statements to the `forward` method and check how much memory is used where via `print(torch.cuda.memory_summary())`.
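Something like this sketch, again with a made-up two-layer model, would show the allocated memory at each step of the forward pass; `torch.cuda.memory_allocated()` is printed alongside as a more compact single-number alternative to the full summary table:

```python
import torch
import torch.nn as nn

class DebugModel(nn.Module):
    # Hypothetical model for illustration; add the prints to your own forward.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        # Compact single-number view of allocated memory before the first layer.
        print("before fc1:", torch.cuda.memory_allocated())
        x = torch.relu(self.fc1(x))
        print("after fc1:", torch.cuda.memory_allocated())
        x = self.fc2(x)
        print("after fc2:", torch.cuda.memory_allocated())
        # Full breakdown of the caching allocator's state.
        print(torch.cuda.memory_summary())
        return x

if torch.cuda.is_available():
    model = DebugModel().cuda()
    out = model(torch.randn(4, 128, device="cuda"))
```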