Memory allocation for backward pass

I am trying to estimate the GPU memory allocation of my model before training. To do so, I count the number of model parameters, the dimensions of the input/output tensors, and I track all intermediate computations. When I run model(inputs) under torch.no_grad(), the estimated memory allocation in bytes is (model_params + input_params + intermediate_params) * 32 / 8, i.e. 4 bytes per float32 element. This matches what I observe with torch.cuda.memory_allocated().
When I do the same while building the graph (i.e. without torch.no_grad()), I would expect the allocation to be (model_params + input_params + 2 * intermediate_params) * 32 / 8, since the intermediate results need to be kept around for the backward pass. But what I observe with torch.cuda.memory_allocated() is roughly (model_params + input_params + 3 * intermediate_params) * 32 / 8. Is anything else allocated on the GPU, or am I missing something?
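For reference, this is a minimal sketch of the kind of measurement I am doing (the model, layer sizes, and batch size below are made up just for illustration):

```python
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).to(device)
inputs = torch.randn(256, 1024, device=device)

n_model = sum(p.numel() for p in model.parameters())
n_input = inputs.numel()
# float32 -> 32 bits / 8 = 4 bytes per element (intermediates counted by hand)
print("params + inputs estimate:", (n_model + n_input) * 4, "bytes")

with torch.no_grad():
    out = model(inputs)           # intermediates are freed right after the forward
print("no_grad  :", torch.cuda.memory_allocated(), "bytes")
del out

out = model(inputs)               # graph is built, intermediates are kept alive
print("grad mode:", torch.cuda.memory_allocated(), "bytes")
```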
Thanks for helping out,
Stefaan

Hi,

All the intermediate states do get allocated, but they should all be freed when you run under no_grad() (unless you keep a reference to them).
When grad mode is enabled, most of these are indeed kept alive by the autograd engine so that the backward pass can be computed. But nothing else is allocated beyond that.
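If it helps, here is a rough way to see this (the model below is just a placeholder): in grad mode the activations saved for backward keep GPU memory alive until the graph is freed, either by running backward or by dropping the output.

```python
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).to(device)
inputs = torch.randn(256, 1024, device=device)

out = model(inputs)
print("after forward :", torch.cuda.memory_allocated(), "bytes")

# backward frees the saved intermediates (and allocates the .grad buffers)
out.sum().backward()
del out
print("after backward:", torch.cuda.memory_allocated(), "bytes")
```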