Memory leak in LibTorch, extremely simple code

I have the following tiny code snippet, which allocates a new chunk of CUDA memory every time I call model.forward():

auto model = torch::jit::load("");
auto out   = torch::empty({1, 512, 512});
for (int i = 0; i < 20; i++) {
	auto in  = torch::empty({1, 3, 512, 512}, torch::kCUDA);
	auto res = model->forward({in}).toTensor(); // each call allocates another 2 GB of CUDA memory
	out.copy_(res[0]); // without this line, there is no leak
}

Obviously, this runs out of CUDA memory very quickly.
Curiously, if I don’t consume the result of model.forward(), there is no leak.
Am I doing something wrong?

I’m on Ubuntu 18.04, CUDA 10.0, and PyTorch compiled from source, v1.0.1 branch.

I have found a workaround (pseudo-code):

float* buf = (float*) malloc(bufSize);
for (int i = 0; i < 20; i++) {
    // input is a CUDA tensor
    auto res = model->forward({input}).toTensor();
    auto resCPU = res.cpu();
    memcpy(buf, resCPU.accessor<float, 3>().data(), bufSize);
    // resCPU goes out of scope here, so presumably its reference count drops
    // to zero and the CUDA memory used by model->forward() can now be released.
}

The key difference here is that there is no long-lived object that is the destination of a CUDA -> CPU copy. In my original example, the ‘out’ object seems to cause references to the CUDA memory to never be released.
Is this intended behaviour?


The problem is that you’re running a forward pass through the network without disabling autograd, so all the intermediate buffers are saved in case you want to backpropagate later.
You then copy that output into another Tensor, which keeps the whole history alive.

If you only want to do evaluation, you should disable GradMode (I’m not sure where it is in the C++ docs).

The C++ equivalent of torch.no_grad() is NoGradGuard from torch/csrc/api/include/torch/utils.h. From the current comments you can see that it is a thread-local guard that disables gradients.

torch::NoGradGuard no_grad_guard;
[rest of your code with no grad]


Thanks @albanD, @shane-carroll!

torch::NoGradGuard no_grad_guard works!

Memory usage also drops by more than half, which makes sense.

This really ought to be in the “getting started” docs. I’ll try and find time to make a little PR that includes this in the documentation.

I have the same problem on CPU. I tried torch::NoGradGuard no_grad_guard; but it made no difference. I compile the code into a DLL for another program to use. There’s no problem when I run my code on its own. Inside the other program, everything is still fine when the call to forward is commented out… I really don’t know how to solve this. Does anyone know? Please help, thanks!

I also ran into this problem. Using torch::NoGradGuard no_grad_guard works.
But what if I want to call backward as well? In that case I can’t use NoGradGuard…
Thanks in advance for the help!

If you want to call backward, then you need to keep these buffers. Be careful, though, to keep in the graph only the things that make sense for your problem.
If you want to accumulate the loss, for example, you will need to use detach() to make sure the history isn’t kept.


Thanks so much for your reply!

So after accumulating loss or getting gradients, what variables need to be detached?

The variables that naturally go out of scope don’t need any special treatment.
If you keep some things around across iterations, for example if you accumulate the loss for every batch of the epoch, then you want to detach before accumulating in this buffer.

Oh, I see. What if I just get the loss, gradient input and weights, do I need to do something special?

That depends on what you do with them :)
If you return them to other code as plain values, then you should detach them before returning them.
If you just use them locally to update the weights, nothing is needed.