I am trying to pass some inputs through a network. I cannot simply call net(input_), because I cannot push all the inputs to the GPU at once due to memory limits. I therefore wrote a function that passes the inputs in batches:
import torch

def batch_pass(net, input_, batch, device):
    print('cuda memory before batch pass', torch.cuda.memory_allocated(device=device))
    # one extra batch if len(input_) is not a multiple of batch
    l = 0 if len(input_) % batch == 0 else 1
    r = []
    for i in range(len(input_) // batch + l):
        in_ = input_[i * batch:(i + 1) * batch].to(device)
        r += [net(in_)]
        del in_
        torch.cuda.empty_cache()
        print('cuda memory during batch pass', torch.cuda.memory_allocated(device=device))
    print('cuda memory after batch pass', torch.cuda.memory_allocated(device=device))
    return torch.vstack(r)
However, the allocated CUDA memory keeps growing, and calling del in_ and torch.cuda.empty_cache() seems to have no effect.
The output is:
cuda memory before batch pass 26404352
cuda memory during batch pass 65019904
cuda memory during batch pass 103635456
cuda memory during batch pass 142251008
cuda memory during batch pass 180866560
...
cuda memory during batch pass 13232923136
cuda memory during batch pass 13271538688
cuda memory during batch pass 13310154240
cuda memory during batch pass 13348769792
cuda memory during batch pass 13387385344
And the function cannot finish, failing with this error:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 12.48 GiB already allocated; 9.44 MiB free; 15.05 GiB reserved in total by PyTorch)
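
As far as I understand, torch.cuda.empty_cache() only returns cached blocks that no live tensor references anymore back to the driver; it cannot free memory that something (here, presumably the results accumulated in r) still holds on to. A minimal check of that behaviour (the tensor x is just for illustration):

import torch

x = torch.zeros(1024, 1024, device='cuda')  # ~4 MiB, still referenced
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())        # unchanged: x is still alive

del x                                       # drop the last reference
print(torch.cuda.memory_allocated())        # drops by the size of x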
How can I pass input_ through the network without using too much CUDA memory?
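
For reference, this is the variant I am considering: a minimal sketch assuming the outputs are only needed for inference (so no autograd graph has to be kept) and that they can live on the CPU. batch_pass_inference is just my working name for it.

def batch_pass_inference(net, input_, batch, device):
    l = 0 if len(input_) % batch == 0 else 1
    r = []
    with torch.no_grad():  # do not build an autograd graph for the outputs
        for i in range(len(input_) // batch + l):
            in_ = input_[i * batch:(i + 1) * batch].to(device)
            # move each result to the CPU so the GPU only ever holds
            # one batch of inputs and activations at a time
            r += [net(in_).cpu()]
    return torch.vstack(r)

I suspect the torch.no_grad() part matters most, since without it every output stored in r keeps its whole graph of intermediate activations alive on the GPU. Is this the right approach?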