PyTorch GPU Memory Usage

Hi guys, I’m not really sure why this is happening, but if I measure my data object, it’s about 265 MB on the GPU. If I measure the model, it’s also about 300 MB. But once I start training, PyTorch uses up almost all my GPU memory, and I can’t really understand why. I’ve already set pin_memory=False in my DataLoader and it still shows this behavior. Is there a way to properly trace the memory usage for each object? I’m only using a single sample at a time, since I can’t move to larger batches with all of my GPU memory being used up.
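For reference, this is roughly how I’m measuring the object sizes (just a sketch; the model and data below are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder model and data, only to show how the sizes are measured
model = nn.Linear(4096, 4096).cuda()
data = torch.randn(8000, 4096, device='cuda')

data_mb = data.numel() * data.element_size() / 1024**2
param_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2

print(f"data: {data_mb:.1f} MB, model params: {param_mb:.1f} MB")
```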


Besides the data and model parameters, the CUDA context will use some memory, as will the intermediate activations, which are needed to compute the backward pass.
Also note that PyTorch uses a caching allocator, which reuses memory instead of returning it to the device.
nvidia-smi will thus show the complete memory usage, while torch.cuda.memory_allocated() will only give you the memory currently occupied by tensors.
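You can compare the two counters with something like this (a minimal sketch; the model and input are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model and input, just to illustrate the two counters
model = nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device='cuda')
out = model(x)

# Memory currently occupied by tensors
print(torch.cuda.memory_allocated() / 1024**2, "MB allocated")
# Memory held by the caching allocator; nvidia-smi roughly shows this
# plus the CUDA context overhead
print(torch.cuda.memory_reserved() / 1024**2, "MB reserved")
```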

Hmm, if that’s the case, it’d mean that the majority of the GPU allocation is for the intermediate stages of the forward/backward passes? Hence, is it likely that I can reduce the GPU memory usage if I shrink the data types of the inputs, e.g. converting from float64 to float32?

That might be the case.
E.g. for a single conv layer, the output might contain more elements than the input, if out_channels > in_channels and you don’t reduce the spatial size.
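As a rough illustration (a minimal sketch with made-up shapes):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1).cuda()
x = torch.randn(1, 3, 224, 224, device='cuda')

out = conv(x)

# The activation has far more elements than the input, since
# out_channels (64) > in_channels (3) and the spatial size is unchanged
print(x.numel())    # 1 * 3 * 224 * 224   =   150,528
print(out.numel())  # 1 * 64 * 224 * 224  = 3,211,264
```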

Yes. PyTorch uses float32 by default, so if you’ve called double() on your model (and don’t strictly need the precision), I would recommend using FP32.
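Something along these lines shows the difference (a small sketch; the layer size is arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(1000, 1000)

model.double()  # float64 parameters
fp64_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

model.float()   # back to the default float32
fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(fp64_bytes, fp32_bytes)  # float32 uses half the memory

# The inputs should match the parameter dtype as well
x = torch.randn(8, 1000)
out = model(x)
```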


Okay, got it. Here’s something that’s really puzzling to me, though. One sample from my DataLoader appears to take ~250 MB (when I first load a single sample onto the GPU). But oddly, if I increase my batch size to 4, I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 7.41 GiB (GPU 0; 11.17 GiB total capacity; 8.34 GiB already allocated; 2.28 GiB free; 8.60 GiB reserved in total by PyTorch)

Which is really odd. Does this mean each sample is actually larger, in this instance 7.41 / 4 ≈ 1.85 GiB?

Yes, it means that not only the data samples themselves use memory, but also all the intermediate activations, which grow with the batch size.
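One way to see this is to check torch.cuda.memory_allocated() before and after the forward pass (a minimal sketch; the model and shapes are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
).cuda()

x = torch.randn(4, 3, 512, 512, device='cuda')

before = torch.cuda.memory_allocated()
out = model(x)  # intermediate activations are kept for the backward pass
after = torch.cuda.memory_allocated()

print((after - before) / 1024**2, "MB taken by the forward pass (roughly)")
```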

I see, ok. So one way is to reduce the size of the tensors that are in the object. I keep a tensor that is used to index another tensor in the object (e.g. A[b:]). In this case, does b have to be of type LongTensor, or can I reduce it to a smaller integer type?