If I understand correctly, this happens during regular use, not e.g. via child processes left over after quitting the run (I was thinking it might be related to the discussion here, "PyTorch doesn't free GPU's memory of it gets aborted due to out-of-memory error", and a bug in Python's multiprocessing).
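As a quick sanity check for leftover workers, something like the following sketch would list the child processes of the training run; using psutil here is just an assumption on my part, not something your setup requires:

```python
import psutil

# List child processes of the current (training) process,
# e.g. DataLoader workers that may not have exited cleanly.
for child in psutil.Process().children(recursive=True):
    print(child.pid, child.name(), child.status())
```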
What does your "data loading / training" loop look like?
Maybe you could run something like
```python
for epoch in range(num_epochs):
    for batch_idx, (features, targets) in enumerate(train_loader):
        print(batch_idx)  # print the batch index for each batch drawn
```
I would be curious what numbers you get with num_workers > 1, i.e. whether you ever see batch_idx > num_training_examples / batch_size.
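To make that concrete, here is a minimal, self-contained sketch of the kind of check I mean; the toy dataset, the sizes, and num_workers=4 are just placeholders, so please swap in your actual Dataset/DataLoader:

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy placeholders -- adjust to match your actual setup.
num_training_examples = 1000
batch_size = 32
num_epochs = 2


def main():
    dataset = TensorDataset(
        torch.randn(num_training_examples, 10),
        torch.randint(0, 2, (num_training_examples,)),
    )
    train_loader = DataLoader(
        dataset, batch_size=batch_size, shuffle=True, num_workers=4
    )

    # With drop_last=False (the default) we expect ceil(N / batch_size) batches.
    expected_batches = math.ceil(num_training_examples / batch_size)

    for epoch in range(num_epochs):
        last_idx = -1
        for batch_idx, (features, targets) in enumerate(train_loader):
            last_idx = batch_idx
        # batch_idx is 0-based, so the final value should be expected_batches - 1.
        print(f"epoch {epoch}: got {last_idx + 1} batches, expected {expected_batches}")


if __name__ == "__main__":
    main()
```

If the printed batch count ever exceeds the expected number when num_workers > 1, that would point at the loader/worker side rather than the model itself.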