How to clear some GPU memory?

(Matthew Kleinsmith) #1


I put some data on a GPU using PyTorch and now I’m trying to take it off without killing my Python process. How can I do this?

Here was my attempt:

import torch
import numpy as np

n = 2**14
a_2GB = np.ones((n, n))  # RAM: +2GB
del a_2GB  # RAM: -2GB
a_2GB = np.ones((n, n))  # RAM: +2GB
a_2GB_torch = torch.from_numpy(a_2GB)  # RAM: Same
a_2GB_torch_gpu = a_2GB_torch.cuda()  # RAM: +0.9GB, VRAM: +2313MiB
del a_2GB  # RAM: Same, VRAM: Same
del a_2GB_torch_gpu  # RAM: Same, VRAM: Same
del a_2GB_torch  # RAM: -2GB, VRAM: Same

(Yun Chen) #2

nvidia-smi still shows PyTorch using 2GB of GPU memory, but that memory can be reused if needed.

After the del, try:

a_2GB_torch_gpu_2 = a_2GB_torch.cuda()
a_2GB_torch_gpu_3 = a_2GB_torch.cuda()  

and you'll see for yourself.
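One way to check this without watching nvidia-smi is to compare PyTorch's own counters. The sketch below assumes a PyTorch version that provides torch.cuda.memory_allocated() and torch.cuda.memory_reserved(), neither of which is mentioned in this thread; the helper name report_reuse is made up for illustration.

```python
import torch


def report_reuse(n=2048):
    """Record (allocated, reserved) bytes around a del and a re-allocation.

    Returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None

    def snapshot():
        # allocated: bytes held by live tensors
        # reserved: bytes PyTorch's caching allocator holds from the driver
        return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()

    readings = {}
    x = torch.ones(n, n, device="cuda")
    readings["after alloc"] = snapshot()
    del x
    readings["after del"] = snapshot()      # allocated drops, reserved usually does not
    y = torch.ones(n, n, device="cuda")
    readings["after realloc"] = snapshot()  # usually served from the cached block
    del y
    return readings


if __name__ == "__main__":
    print(report_reuse())
```

If the reserved figure stays flat across the three readings while the allocated figure drops and recovers, the second tensor was served from the cache rather than from a fresh driver allocation.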

(Nirvan Tyagi) #3

Even if that same process can reuse the GPU memory, it doesn’t look like other processes can. I’m running into a similar utilization concern.

Another process will run into Out of Memory errors, while the original process keeps the GPU memory even after it is done using it.

(James Bradbury) #4

That’s right. When there are multiple processes on one GPU that each use a PyTorch-style caching allocator there are corner cases where you can hit OOMs, but it’s very unlikely if all processes are allocating memory frequently (it happens when one proc’s cache is sitting on a bunch of unused memory and another is trying to malloc but doesn’t have anything left in its cache to free; if the first one were allocating at all it would hit the limit and know to free its cache). It could be improved, but it’s a lot better than frameworks that commandeer your whole GPU even if they’re only using 100MB…
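The corner case described above can be sketched with a toy model in plain Python. This is not PyTorch's allocator: ToyGPU, ToyCachingAllocator and the unit sizes are invented for illustration, and real allocators track individual blocks rather than a single counter.

```python
class ToyGPU:
    """A device with a fixed number of memory units shared by all processes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def malloc(self, size):
        if self.used + size > self.capacity:
            raise MemoryError("device out of memory")
        self.used += size

    def free(self, size):
        self.used -= size


class ToyCachingAllocator:
    """Per-process allocator that keeps released memory instead of returning it."""

    def __init__(self, gpu):
        self.gpu = gpu
        self.cached = 0  # units taken from the device but not currently in use

    def alloc(self, size):
        if self.cached >= size:
            self.cached -= size  # reuse the cache, no call to the device
            return
        try:
            self.gpu.malloc(size)
        except MemoryError:
            # Device is full: give back our own cache and retry once.
            self.gpu.free(self.cached)
            self.cached = 0
            self.gpu.malloc(size)

    def release(self, size):
        self.cached += size  # keep the memory for later reuse


def demo():
    gpu = ToyGPU(capacity=10)
    proc_a = ToyCachingAllocator(gpu)
    proc_b = ToyCachingAllocator(gpu)

    proc_a.alloc(8)
    proc_a.release(8)      # A is idle but still holds 8 units in its cache

    try:
        proc_b.alloc(4)    # only 2 units are left on the device
        b_hit_oom = False
    except MemoryError:
        b_hit_oom = True   # B has no cache of its own to free

    proc_a.alloc(9)        # A hits the limit, frees its cache, then succeeds
    return b_hit_oom, gpu.used


if __name__ == "__main__":
    print(demo())
```

In this toy run, process B fails while A is idle, yet A itself can still allocate more than its cache holds because hitting the limit makes it flush its own cache first.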

(aviolov) #5

I have run into a related issue while using the experimental Windows version. In my training phase, CUDA allocates about 4GB for mini-batches while I optimize my parameters. Then, when I am done and want to predict on a separate dataset using the same mini-batch size, a fresh 4GB is allocated.

To be more precise, when I am done training and nothing but the model should remain on the GPU, I can set a breakpoint and issue these commands (all memory readings come from nvidia-smi):
T = torch.rand(1000, 1000000).cuda()  # Now memory reads 8GB (i.e. a further 4GB was allocated, so the training 4GB was NOT considered 'free' by the caching allocator, even though it was being reused during training)
del T  # Still 8GB (as expected)
T = torch.rand(1000, 1000000).cuda()  # Still 8GB as expected; the caching allocator is reusing the same space as the first T above

So it looks like the 4GB from training are still taking up space on the GPU, even though they should be freed. But later they are being reused (when retraining the same model). I.e. they can be reused for the same purpose but not for arbitrary tensors - which makes no sense to me, of course.

Is there a way to manually force the caching allocator to free some GPU memory? Or, since the caching allocator doesn't seem to consider the space free, can I move my model to the CPU with model.cpu() and then ask torch to free everything it has on the GPU?

(Sebastien) #6

For those facing a similar memory issue, look at the autograd setting volatile.
It's recommended for inference, to reduce the amount of memory used when evaluating the model.

You could do something like this:
volatile_input = Variable(torch.randn(1000,1000000), volatile=True).cuda()
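Note for readers on PyTorch 0.4 or later: the volatile flag was removed in that release, and torch.no_grad() is its replacement for inference. A minimal sketch, with a placeholder nn.Linear model and random inputs standing in for the real ones:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)      # placeholder model, not from the thread
inputs = torch.randn(4, 8)   # placeholder batch

model.eval()
with torch.no_grad():
    # No autograd graph is recorded inside this block, so intermediate
    # activations are not kept alive for a backward pass.
    predictions = model(inputs)

print(predictions.requires_grad)  # False
```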

(aviolov) #7

Good call, thanks. We already set our input variables in predict() to volatile=True. My impression is that GPU memory left committed from training is being 'hoarded', and it is that memory I would like to clear, free, or repurpose. (I actually tried setting volatile=False on all my variables in the predict method, but that didn't fix the memory 'leak'.)

(jdhao) #8

It is not a memory leak. In the newest PyTorch, you can use torch.cuda.empty_cache() to clear the cached memory.
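A minimal sketch of that call in context, assuming a PyTorch version that also provides torch.cuda.memory_reserved() for reading the cache size (that function is not mentioned in this thread, and the helper name free_cached_memory is made up):

```python
import torch


def free_cached_memory(n=2048):
    """Return reserved bytes (before, after) an empty_cache call.

    Returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None
    x = torch.ones(n, n, device="cuda")
    del x  # the block goes back to PyTorch's cache, not to the driver
    before = torch.cuda.memory_reserved()
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    after = torch.cuda.memory_reserved()
    return before, after


if __name__ == "__main__":
    print(free_cached_memory())
```

Only blocks that no live tensor occupies can be released this way, and nvidia-smi will generally still show some memory for the process because the CUDA context itself takes space.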

(Lonelylingoes) #9

I have the same problem as Matthew Kleinsmith.
I set volatile=False and called torch.cuda.empty_cache(), but it still does not work.

(Simon Wang) #10

If you have already removed unwanted references to the Variables, empty_cache should definitely work. You can check by watching the nvidia-smi values.

volatile=False is the default option. It will build the graph as it goes. Setting it explicitly makes no difference.
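To illustrate what a lingering reference does, here is a small sketch. It assumes torch.cuda.memory_allocated() is available (not mentioned in this thread); the function and variable names are made up. A tensor's memory only goes back to the cache once the last reference to it is gone, and empty_cache can only release memory that has gone back to the cache.

```python
import torch


def allocated_while_referenced(n=1024):
    """Return bytes still allocated (with, without) a leftover reference.

    Returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None
    base = torch.cuda.memory_allocated()
    a = torch.ones(n, n, device="cuda")
    history = [a]        # a second reference, e.g. a list of saved outputs
    del a                # the tensor is still reachable through history
    held = torch.cuda.memory_allocated() - base
    history.clear()      # last reference gone; the block returns to the cache
    released = torch.cuda.memory_allocated() - base
    return held, released


if __name__ == "__main__":
    print(allocated_while_referenced())
```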

(Nikhil Verma) #12

"If you already removed unwanted references to the Variables, empty_cache should definitely work"

@SimonW Could you elaborate on what you mean by removing unwanted references? Do you mean explicitly deleting variables (del variable), or something else?

After every epoch I’m calling torch.cuda.empty_cache(), but nvidia-smi still shows an increase in GPU memory after every loop.

    for epoch in range(20):
        for batch in train_data.batches:
            inputs, targets = batch
            predictions = model(inputs)
            predictions = predictions.view(-1, model.vocab_size, model.batch_size)
            targets = targets.view(-1, model.batch_size)
            loss = loss_function(predictions, targets)

(Simon Wang) #13

torch.cuda.empty_cache doesn't give PyTorch extra GPU memory to use, so it won't help if you are trying to solve an OOM with only PyTorch using that GPU.

The structure of your code segment looks fine. So it’s probably one of loss_function, init_hidden, tester.test, or tester.print_samples that’s causing the issue.

(Nikhil Verma) #14

I realise where I was making a mistake. My model has an LSTM and I’m supposed to pass on a new, empty variable as the hidden state. If I pass on an existing variable, such as the hidden state from the previous timestep, the model backprops all the way back to the first epoch on every epoch of training. This is precisely why the GPU memory kept exploding after every epoch. I now have something like this, and it works fine.

def zero_hidden():
    return (torch.zeros(1, 1, hidden_dim),
            torch.zeros(1, 1, hidden_dim))

lstm_out, lstm_hidden = lstm(lstm_in, zero_hidden())
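When the hidden state does need to carry over between batches, a common alternative is to detach it from the graph instead of resetting it to zeros. This is not what the post above does; it is a sketch with made-up sizes, random inputs and a stand-in loss, assuming a single-layer nn.LSTM.

```python
import torch
import torch.nn as nn

input_dim, hidden_dim = 4, 8          # illustrative sizes
lstm = nn.LSTM(input_dim, hidden_dim)

hidden = (torch.zeros(1, 1, hidden_dim),
          torch.zeros(1, 1, hidden_dim))

for step in range(3):
    lstm_in = torch.randn(5, 1, input_dim)   # (seq_len, batch, input_dim)
    lstm_out, hidden = lstm(lstm_in, hidden)
    loss = lstm_out.sum()                     # stand-in for a real loss
    loss.backward()
    # Keep the values but cut the link to this batch's graph, so the next
    # backward pass stops here instead of reaching into earlier batches.
    hidden = tuple(h.detach() for h in hidden)
```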