Libtorch memory usage during training

I created a PointNet module with libtorch. It takes a point cloud input with shape {mini batch size, 3, pointCount (about 500000)}. But when I train the model, it consumes a lot of memory.

As far as I know, during training, layers such as conv, linear, batch norm, etc. hold their input tensors for the backward pass. If the input tensor has shape {4, 3, 500000} and dtype float32, it takes 24,000,000 bytes (= 4 x 3 x 500000 x 4) of memory. Is that true? And if it is, can I change the storage used for these values with libtorch?
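(For reference, a minimal libtorch sketch to check that arithmetic; the shape and dtype are taken from the numbers above:)

#include <torch/torch.h>
#include <iostream>

int main() {
    // Same shape and dtype as in the question: {4, 3, 500000}, float32
    auto x = torch::randn({4, 3, 500000}, torch::kFloat32);
    // Raw storage size in bytes = number of elements * bytes per element
    std::cout << x.numel() * x.element_size() << " bytes\n";  // 24000000
}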

It is true that some layers need to store intermediate tensors to be able to calculate the gradients during the backward pass.
If you are running out of memory, you could decrease the batch size (or the number of points, if possible). I don’t know if e.g. torch.utils.checkpoint can be used in libtorch, but that might be another approach to trade compute for memory.


Thanks for your reply. I think torch.utils.checkpoint will be useful for me.

But I have some questions about the memory usage of torch. To monitor memory usage, I ran my custom model, which contains the code below, both with model->eval() and model->train().

printShape(x);   //[4, 3, 40695]
printMemory();
x = conv1x1(x);
printShape(x);   //[4, 64, 40695]
printMemory();   //increases by about 41 MB
x = bn1(x);
printShape(x);   //[4, 64, 40695]
printMemory();   //increases by about 41 MB
x = func::relu(x);
printShape(x);   //[4, 64, 40695]
printMemory();   //increases by about 41 MB

But there is no difference between the two modes, and the parameter-free relu also takes some memory. It looks like all intermediate tensors are never freed for some reason until backward() is called (and I am running on CPU, not GPU). Now I am trying to understand why the tensors are not released even though they are not used. Can you help me?

I solved my problem.

torch::AutoGradMode enable_grad(false);
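(For context, this is an RAII guard, so it is usually placed at the top of the scope in which gradients should be disabled; a minimal sketch, where model and input are placeholders:)

{
    // Disable gradient tracking for everything in this scope
    torch::AutoGradMode enable_grad(false);
    auto out = model->forward(input);
    // Intermediate tensors are not kept for the backward pass here
}
// The previous grad mode is restored when the guard goes out of scope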

But I wonder about the result below:

data = data.set_requires_grad(false);
auto req_grad = data.requires_grad();  // false

data = conv1(data);
req_grad = data.requires_grad();       // true

Even though relu doesn’t use parameters, its intermediate tensors might be stored if they are needed to calculate the gradients in the backward pass.
The output of conv1 would require gradients if the gradient calculation wasn’t disabled for that code block, so could you double check that AutoGradMode is indeed set to false for this calculation?

Yes, AutoGradMode is set to false, and then data.requires_grad() always returns false. What I wonder about is that conv1(data) returns a tensor that requires gradients when AutoGradMode = true and the input data does not require gradients. I thought that input_data.set_requires_grad(false) would be enough to prevent gradients from being created. :grinning:

Usually the input data doesn’t require gradients, while the output of a layer with trainable parameters will require gradients.
Since AutoGradMode was set to true, the result is expected.

Here is a small code snippet in Python:

import torch
import torch.nn as nn

x = torch.randn(1, 1)
print(x.requires_grad)
> False

lin = nn.Linear(1, 1)
out = lin(x)
print(out.requires_grad)
> True

This is a very simple example of the “standard” use case, where you don’t need to calculate gradients for the input tensor.
As long as you don’t disable the gradient calculation, Autograd will create the computation graph (and will thus create outputs with requires_grad=True) for operations using trainable parameters.
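For completeness, a hedged libtorch sketch of the same behavior, using the AutoGradMode guard from above (the layer and tensor sizes are just illustrative):

#include <torch/torch.h>
#include <iostream>

int main() {
    auto conv1 = torch::nn::Conv1d(torch::nn::Conv1dOptions(3, 64, 1));
    auto x = torch::randn({4, 3, 10});  // small dummy input, requires_grad is false

    {
        // Gradient calculation disabled: no graph is recorded
        torch::AutoGradMode enable_grad(false);
        auto out = conv1->forward(x);
        std::cout << out.requires_grad() << "\n";  // 0 (false)
    }

    // Grad mode is enabled again outside the guard's scope
    auto out = conv1->forward(x);
    std::cout << out.requires_grad() << "\n";      // 1 (true)
}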