Misunderstanding the CUDA out-of-memory error

PyTorch allocates memory for each complete tensor, so increasing the batch size also increases the size of (some) tensors and therefore the size of the memory blocks that have to be allocated. If you then run out of memory, the single allocation that fails can be larger (as seen in the "tried to allocate ..." part of the error message) even though the total already-allocated memory is smaller.
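
Here is a minimal sketch of this effect. The tensor shape is a hypothetical activation shape picked just for illustration; the point is only that a single tensor's memory footprint grows linearly with the batch size, so the block the allocator must find grows with it:

```python
import torch

# The memory needed for one activation tensor grows linearly with the
# batch size, so the single block PyTorch must allocate grows too.
for batch_size in (32, 64, 128):
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    # memory_allocated() reports the bytes currently held by live tensors
    print(f"batch_size={batch_size}: "
          f"allocated={torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    del x
    torch.cuda.empty_cache()
```

With a large batch size it is the last, biggest allocation that fails, which is why the "tried to allocate ..." value can exceed the memory already in use.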

Reserved memory reports the sum of the allocated and the cached memory.
Cached memory is device memory that PyTorch's caching allocator holds on to, so it can be reused for new tensors without another expensive device allocation.
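
A small sketch to make the difference visible; the tensor shape is arbitrary:

```python
import torch

x = torch.randn(1024, 1024, device="cuda")  # ~4 MiB of float32 data
del x  # the tensor is freed, but its memory stays in the cache

allocated = torch.cuda.memory_allocated()  # bytes used by live tensors
reserved = torch.cuda.memory_reserved()    # allocated + cached bytes
print(f"allocated={allocated / 1024**2:.1f} MiB, "
      f"reserved={reserved / 1024**2:.1f} MiB")
# reserved stays above zero here because the caching allocator keeps the
# freed block around for reuse instead of returning it to the driver.
```

After `del x`, `memory_allocated()` drops back while `memory_reserved()` does not, since the freed block is kept in the cache for the next allocation.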