Hi!
I am trying to perform Monte Carlo sampling to estimate the uncertainty of my network. To do so, I have dropout enabled and I am executing the following loop:
outputs = [self._model(inputs) for i in range(X)]
Where “inputs” is just a single frame.
First I tried this code in Google Colab, and everything was working fine, even with X = 40 and a batch size of 100 frames at once.
When I switch back to a personal computer with a GPU that has 6 GB of dedicated memory, if X is greater than 5 (with one single frame), I get the famous error “CUDA out of memory”. Even if I split the code such that:
outputs = [self._model(inputs) for i in range(X // 2)]
outputs += [self._model(inputs) for i in range(X // 2)]
The code gives me the same out-of-memory error.
Why is this happening? Is the for loop being parallelized by the CUDA API and therefore exhausting the memory? How can I solve this?
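For context, here is a stripped-down, self-contained version of what I am running; the toy model, shapes, and X value are placeholders for my real network:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network. Dropout must stay active at
# inference time for Monte Carlo sampling, hence model.train().
model = nn.Sequential(nn.Linear(16, 32), nn.Dropout(p=0.5), nn.Linear(32, 2))
model.train()  # keeps dropout stochastic between forward passes

inputs = torch.randn(1, 16)  # a single "frame"
X = 10

# Each forward pass gives a different output because of dropout.
outputs = [model(inputs) for _ in range(X)]
print(len(outputs), outputs[0].shape)
```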
I don’t know if this can solve your issue, but I was having a similar one (I am looping a model over different folds), and on the second loop iteration I got this famous memory error.
At the end of the loop I free the memory in the following way:
for element in dir():
    if element[0:2] != "__":
        del globals()[element]

import torch
torch.cuda.empty_cache()
First of all, thank you for your response.
I have tried your approach. However, I still need many of those variables later in my script, so I cannot delete them all; the program would no longer work. If I only use “torch.cuda.empty_cache()”, I still get “CUDA out of memory”, so apparently that alone does not solve it.
So it is clearly about the memory used to allocate the output tensors on the GPU. I have tried copying the outputs to a CPU tensor before computing more samples, but it is still not working.
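From what I have read, wrapping the loop in torch.no_grad() should stop autograd from keeping a computation graph alive for every forward pass, which would explain the memory growing with X. This is the variant I am testing next; the model here is a toy stand-in for mine:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network, with dropout kept active
# for Monte Carlo sampling.
model = nn.Sequential(nn.Linear(16, 32), nn.Dropout(p=0.5), nn.Linear(32, 2))
model.train()

inputs = torch.randn(1, 16)  # a single "frame"
X = 40

# no_grad() prevents autograd from building a graph for each
# forward pass, so per-pass activations can be freed immediately.
with torch.no_grad():
    # .cpu() moves each sample off the GPU right away;
    # on a CPU-only machine it is a no-op.
    outputs = [model(inputs).cpu() for _ in range(X)]

samples = torch.stack(outputs)  # shape (X, 1, 2)
mean, std = samples.mean(dim=0), samples.std(dim=0)
print(samples.shape)
```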