I am running experiments with DDP and noticed that when I pass non_blocking=True while moving the batch to the GPUs, memory gets reserved on GPU 0 even when that GPU is not used by the experiment (its utilization stays at 0%). With non_blocking=False, this does not happen.
Is this normal? What might be causing it?
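
For reference, here is a minimal sketch of the kind of transfer described above. It assumes each batch is a tensor or a nested dict/list/tuple of tensors. `move_batch` is a hypothetical helper name used only for illustration, and the real training loop may differ.

```python
def move_batch(batch, device, non_blocking=True):
    """Recursively move every tensor-like object in `batch` to `device`.

    Anything exposing a `.to(device, non_blocking=...)` method (such as
    torch.Tensor) is moved; other values are returned unchanged.
    """
    if hasattr(batch, "to"):
        # This is the call whose non_blocking flag the question is about.
        return batch.to(device, non_blocking=non_blocking)
    if isinstance(batch, dict):
        return {k: move_batch(v, device, non_blocking) for k, v in batch.items()}
    if isinstance(batch, list):
        return [move_batch(v, device, non_blocking) for v in batch]
    if isinstance(batch, tuple):
        return tuple(move_batch(v, device, non_blocking) for v in batch)
    return batch
```

In each DDP process this would be called as something like move_batch(batch, torch.device("cuda", local_rank)), where local_rank is assumed to be the GPU index assigned to that process.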