Why does CUDA run out of memory when calling torch.save?


I was training a model for 3D semantic segmentation, which imposes very heavy memory pressure. My torch.__version__ is 0.1.11+37d9568.

After I fit the data and model into the GPU, everything went well until I tried to save a checkpoint using torch.save, which produced the following traceback:

THCudaCheck FAIL file=/home/zhang/src/pytorch/torch/csrc/generic/serialization.cpp line=38 error=2 : out of memory
Traceback (most recent call last):
  ...
  File "/home/zhang/pytorch/packages/torchmed/utils/trainer.py", line 152, in _snapshot
    torch.save(state_dict, filename)
  File "/home/zhang/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/home/zhang/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 192, in _save
    serialized_storages[key]._write_file(f)
RuntimeError: cuda runtime error (2) : out of memory at /home/zhang/src/pytorch/torch/csrc/generic/serialization.cpp:38

Does that mean I should reserve some memory for checkpoint saving? If so, how much should I reserve?

BTW, by the time I make a checkpoint, the training and testing passes have finished and the output Variable, i.e. the loss, has gone out of scope. I thought that meant the GPU memory it used could be freed, so there should be enough memory left for the snapshotting. Am I wrong about the memory-freeing mechanism?

Many thanks for any suggestions!


I think this is worth filing as a bug if you haven’t resolved it yet – there should be no need, in principle, for GPU memory to be allocated during serialization, so if the serialization code does that it could probably be improved.

I resolved this problem by moving the model to the CPU and then performing the serialization.
I also think this is a bug, but I have no idea how to fix it …
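For reference, a minimal sketch of that workaround: copy the state dict's tensors to the CPU before calling torch.save, so serialization never touches GPU memory. The model here (`nn.Linear`) and the filename are placeholders; substitute your own network and path.

```python
import torch
import torch.nn as nn

# Stand-in for your trained network (hypothetical example model).
model = nn.Linear(10, 2)

def save_checkpoint(model, filename):
    # Copy every parameter/buffer to host memory first, so torch.save
    # works purely on CPU storages and allocates nothing on the GPU.
    cpu_state = {k: v.cpu() for k, v in model.state_dict().items()}
    torch.save(cpu_state, filename)

save_checkpoint(model, "checkpoint.pth")

# Reload onto the CPU; move the model back to the GPU afterwards if needed.
state = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(state)
```

Alternatively you can call `model.cpu()` before saving and move the model back to the GPU afterwards, as described above; the dict-copy variant just avoids mutating the live model.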


+1. I had the same problem, and moving the model to the CPU before saving also worked for me.


+1. I also had this problem… Is there any other way to solve it?

+1, I am also getting the same error, but only with device = 'cuda:1'. The code executes properly with 'cuda:0'. I don't know why.

I solved this problem with your method, thanks!