Memory usage increases by at least 30 when applying model

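Good to hear a batch size of 16 works.
Yes, the intermediate activations can be very large, especially if you are using a lot of kernels (output channels) in a conv layer: each kernel produces its own output feature map, and during training these outputs are kept around for the backward pass.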
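
To get a feeling for the numbers, here is a rough back-of-the-envelope sketch in plain Python. The helper name conv2d_output_bytes, the 224x224 input, and the 3x3 kernel with padding 1 are assumptions picked for illustration, not values from your model. It only counts the output tensor of a single conv layer in float32, so the real usage will be higher.

```python
def conv2d_output_bytes(batch, out_channels, height, width,
                        kernel=3, stride=1, padding=1, bytes_per_elem=4):
    """Approximate size in bytes of one conv2d output tensor (hypothetical helper)."""
    # Standard output-size formula for a convolution without dilation.
    h_out = (height + 2 * padding - kernel) // stride + 1
    w_out = (width + 2 * padding - kernel) // stride + 1
    return batch * out_channels * h_out * w_out * bytes_per_elem


# Batch size 16, float32 (4 bytes per element), assumed 224x224 input.
for channels in (64, 256):
    mib = conv2d_output_bytes(16, channels, 224, 224) / 2**20
    print(f"{channels} output channels: {mib:.0f} MiB")
```

The size grows linearly with both the batch size and the number of output channels, which is why lowering the batch size helps.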

.buffers() returns internal tensors that are part of the module's state but do not require gradients, e.g. the running_mean and running_var in batchnorm layers. It does not contain the intermediate activations.
If you want to get the intermediate outputs, you could register forward hooks as explained in this post.
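
A minimal sketch of both points, assuming a small toy model (the layers, the activations dict, and the make_hook helper are made up for illustration). It lists the buffers of the model and stores each child module's output via register_forward_hook.

```python
import torch
import torch.nn as nn

# Toy model, only for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Buffers: state tensors without gradients (here registered by BatchNorm2d).
buffer_names = [name for name, _ in model.named_buffers()]
print(buffer_names)

# Forward hooks: store the output of each child module by name.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # detach() so the stored tensor does not keep the graph alive
        activations[name] = output.detach()
    return hook

handles = [
    child.register_forward_hook(make_hook(name))
    for name, child in model.named_children()
]

with torch.no_grad():
    model(torch.randn(2, 3, 16, 16))

for name, act in activations.items():
    mib = act.numel() * act.element_size() / 2**20
    print(name, tuple(act.shape), f"{mib:.4f} MiB")

# Remove the hooks once you are done, otherwise they run on every forward pass.
for handle in handles:
    handle.remove()
```

Storing every activation costs memory as well, so it is better to hook only the layers you are interested in.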
