non_blocking=True reserves extra memory in GPU 0

I am running experiments with DDP and noticed that when I move the batch to the GPUs with non_blocking=True, memory is reserved on GPU 0 even when that GPU is not used by the experiment (its utilization stays at 0%). With non_blocking=False, this does not happen.

Is this normal? What might be causing it?

I'm unable to reproduce the additional memory usage of GPU0 using the DDP example and adding non_blocking=True to the to() operation, so could you post a minimal, executable code snippet, please?
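
For reference, a reproduction could start from something like the sketch below. It is not the original poster's code and only covers the transfer step, not DDP itself. It assumes a torchrun-style launch that sets LOCAL_RANK, a dict-shaped batch, and pinned host tensors. The helpers `move_batch` and `memory_report` and the NON_BLOCKING environment variable are hypothetical names made up for this example. The counters it prints cover only this process's caching allocator, so nvidia-smi may show more (for example, CUDA context memory).

```python
import os

import torch


def move_batch(batch, device, non_blocking=True):
    """Copy every tensor of a dict-shaped batch to `device` (hypothetical helper)."""
    return {k: v.to(device, non_blocking=non_blocking) for k, v in batch.items()}


def memory_report():
    """Return {gpu_index: (allocated_bytes, reserved_bytes)} for this process.

    Returns an empty dict when no GPU is visible.
    """
    return {
        i: (torch.cuda.memory_allocated(i), torch.cuda.memory_reserved(i))
        for i in range(torch.cuda.device_count())
    }


def main():
    # Assumption: torchrun (or a similar launcher) sets LOCAL_RANK.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    # Assumption: NON_BLOCKING is a made-up switch for comparing both modes
    # in separate runs.
    non_blocking = os.environ.get("NON_BLOCKING", "1") == "1"

    use_cuda = torch.cuda.is_available()
    if use_cuda:
        # Assumption: each process is bound to its own GPU. The original post
        # does not say whether this is done.
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
    else:
        device = torch.device("cpu")  # fallback so the sketch runs anywhere

    batch = {
        "input": torch.randn(64, 128),
        "target": torch.randint(0, 10, (64,)),
    }
    if use_cuda:
        # Host-to-device copies are only asynchronous from pinned memory.
        batch = {k: v.pin_memory() for k, v in batch.items()}

    moved = move_batch(batch, device, non_blocking=non_blocking)
    if use_cuda:
        torch.cuda.synchronize(device)  # wait for pending async copies

    print(f"rank={local_rank} non_blocking={non_blocking} "
          f"device={moved['input'].device} memory={memory_report()}")


if __name__ == "__main__":
    main()
```

Running it once with NON_BLOCKING=1 and once with NON_BLOCKING=0 on every rank, and comparing the printed values against nvidia-smi, would show whether GPU 0 is touched by ranks that should not be using it.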
