I’m noticing some odd behavior where CUDA memory is not being freed when it should be.
I can reproduce the following issue on two different machines:
Machine 1 runs Arch Linux and uses PyTorch 0.3.1b0+2b47480 on Python 2.7
Machine 2 runs Ubuntu 16.04 and uses PyTorch 0.3.0.post4 on Python 2.7
The simplest example I can put together to reproduce it looks like this:
##########################################################################
#                             FUNCTION BLOCK                             #
##########################################################################
import gc
import subprocess

import torch

# (project-specific imports -- cifar_loader, cifar_resnets, checkpoints,
#  utils, config, plf, aa -- omitted here)

# fxn taken from https://discuss.pytorch.org/t/memory-leaks-in-trans-conv/12492
def get_gpu_memory_map():
    # GPU memory currently in use, in MiB, as reported by nvidia-smi
    result = subprocess.check_output(
        [
            'nvidia-smi', '--query-gpu=memory.used',
            '--format=csv,nounits,noheader'
        ])
    return float(result)


def memout_example():
    assert vars() == {}  # empty slate

    # build persistent data
    val_loader = cifar_loader.load_cifar_data('val', normalize=False,
                                              batch_size=16, use_gpu=True)

    # now loop through batches and show that memory keeps accumulating...
    for batch_no, (batch, labels) in enumerate(val_loader):
        # clean up garbage and clear the CUDA cache as much as possible
        gc.collect()
        print "BATCH NUMBER: %s" % batch_no
        print "GPU MEMORY: %s" % get_gpu_memory_map()
        assert sorted(vars().keys()) == sorted(['labels', 'val_loader',
                                                'batch', 'batch_no'])
        torch.cuda.empty_cache()

        # load things needed for the attack
        base_model = cifar_resnets.resnet32()
        adv_trained_net = checkpoints.load_state_dict_from_filename(
            'half_trained_madry.th', base_model)
        adv_trained_net.cuda()
        cifar_normer = utils.DifferentiableNormalize(mean=config.CIFAR10_MEANS,
                                                     std=config.CIFAR10_STDS)
        pgd_perceptual_loss = plf.PerceptualXentropy(adv_trained_net,
                                                     normalizer=cifar_normer,
                                                     use_gpu=True)
        pgd_attack_obj = aa.LInfPGD(adv_trained_net, cifar_normer,
                                    pgd_perceptual_loss)
        adv_images = pgd_attack_obj.attack(batch.cuda(), labels.cuda(),
                                           l_inf_bound=8.0 / 255.0,
                                           step_size=1.0 / 255.0,
                                           num_iterations=16, verbose=False)

        # push things to the CPU (in hopes it gets them out of the cache);
        # also delete everything and be sure to collect garbage before the
        # next minibatch
        batch.cpu()
        labels.cpu()
        del adv_images
        del batch
        del labels
        del pgd_attack_obj
        del pgd_perceptual_loss
        del cifar_normer
        adv_trained_net.cpu()
        del adv_trained_net
        del base_model
    return


##########################################################################
#                         BREAK THE PLANET BLOCK                         #
##########################################################################
print memout_example()
Hopefully the annotations make things clear, but the gist is that I’m running adversarial attacks over many CIFAR minibatches. For each minibatch, however, I delete every reference except the loop variables before reinitializing everything. My understanding is that if I delete the references, run the garbage collector, and then call torch.cuda.empty_cache(), the CUDA memory allocated for the previous minibatch should be released.
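To spell out that assumption, here is the isolated cleanup pattern I expect to work on its own. This is just a toy sketch: the tensor size is arbitrary, and gpu_mem() is a hypothetical helper that just repeats the nvidia-smi query from get_gpu_memory_map() above.

import gc
import subprocess
import torch

def gpu_mem():  # same nvidia-smi query as get_gpu_memory_map() above
    out = subprocess.check_output(['nvidia-smi', '--query-gpu=memory.used',
                                   '--format=csv,nounits,noheader'])
    return float(out)

print "baseline:         %s MiB" % gpu_mem()
x = torch.randn(64, 1024, 1024).cuda()   # arbitrary ~256 MiB allocation
print "after allocation: %s MiB" % gpu_mem()

del x                        # drop the only reference to the tensor
gc.collect()                 # force Python to actually collect it
torch.cuda.empty_cache()     # hand cached blocks back to the driver
print "after cleanup:     %s MiB" % gpu_mem()

Aside from the fixed overhead of the CUDA context itself, I would expect the "after cleanup" reading to fall back near the baseline, and that is the behavior I was hoping for once per minibatch in the loop above.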
However, this is not what I’m witnessing. My output looks like:
Files already downloaded and verified
BATCH NUMBER: 0
GPU MEMORY: 554.0
BATCH NUMBER: 1
GPU MEMORY: 2306.0
BATCH NUMBER: 2
GPU MEMORY: 3896.0
BATCH NUMBER: 3
GPU MEMORY: 5484.0
BATCH NUMBER: 4
GPU MEMORY: 7074.0
BATCH NUMBER: 5
GPU MEMORY: 8664.0
BATCH NUMBER: 6
GPU MEMORY: 10252.0
This continues until I eventually hit an error that looks like:
RuntimeError: cuda runtime error (2) : out of memory at /build/python-pytorch/src/pytorch-0.3.1-py2-cuda/torch/lib/THC/generic/THCStorage.cu:58
So despite aggressively trying to clear CUDA memory, allocations somehow accumulate and I eventually run out of memory.
I’m happy to share more of my code or host a live Jupyter notebook to demonstrate the issue.