Increasing the batch size increases the size of the activations stored during the forward pass, while the model parameters (and their gradients) still use the same amount of memory, since they do not depend on the batch size. This post explains the memory usage in more detail.
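To make the scaling concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes a hypothetical fully connected network with made-up layer widths (`LAYER_SIZES`), float32 storage, and that only each layer's output is kept for the backward pass. It ignores the input batch, optimizer state, and framework overhead, so treat the numbers as illustrative rather than what a real framework would report.

```python
# Hypothetical MLP layer widths (input -> hidden -> hidden -> output); an assumption for illustration.
LAYER_SIZES = [784, 1024, 1024, 10]
BYTES_PER_FLOAT32 = 4


def param_bytes(layer_sizes):
    """Bytes for weights + biases of the linear layers; no batch size involved."""
    count = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        count += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return count * BYTES_PER_FLOAT32


def grad_bytes(layer_sizes):
    """One gradient value per parameter, so the same size as the parameters."""
    return param_bytes(layer_sizes)


def activation_bytes(layer_sizes, batch_size):
    """Rough estimate: each layer output of shape [batch_size, width] is kept for backward."""
    count = sum(batch_size * width for width in layer_sizes[1:])
    return count * BYTES_PER_FLOAT32


if __name__ == "__main__":
    for bs in (1, 32, 256):
        print(
            f"batch={bs:4d}  params={param_bytes(LAYER_SIZES)}  "
            f"grads={grad_bytes(LAYER_SIZES)}  activations={activation_bytes(LAYER_SIZES, bs)}"
        )
```

The parameter and gradient columns stay constant across batch sizes, while the activation estimate grows linearly with the batch size.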