I have the following tiny code snippet, which allocates a new chunk of CUDA memory every time I call model.forward():
auto model = torch::jit::load("model.tm");
auto out = torch::empty({1, 512, 512});
for (int i = 0; i < 20; i++) {
    auto in = torch::empty({1, 3, 512, 512}, torch::kCUDA);
    auto res = model->forward({in}).toTensor(); // every call allocates another ~2 GB of CUDA memory
    out.copy_(res[0]); // without this line, there is no leak
}
Obviously, this runs out of CUDA memory very quickly.
Curiously, if I don't consume the result of model.forward(), there is no leak.
Am I doing something wrong?
I'm on Ubuntu 18.04, CUDA 10.0, and PyTorch compiled from source, v1.0.1 branch.
float* buf = (float*) malloc(bufSize);
for (int i = 0; i < 20; i++) {
    // input is a CUDA tensor
    auto res = model->forward({input}).toTensor();
    auto resCPU = res.cpu();
    memcpy(buf, resCPU.accessor<float, 3>().data(), bufSize);
    // resCPU goes out of scope here, so presumably some reference count
    // is also released, allowing the CUDA memory involved in
    // model.forward() to be freed.
}
The key difference here is that there is no long-lived object that is the destination of a CUDA -> CPU copy. In my original example, the 'out' object seems to keep references to the CUDA memory alive, so it is never released.
Is this intended behaviour?
The problem is that you're running a forward pass through a network. If you don't disable autograd, all the intermediate buffers are saved so that you can backprop later.
Then you save that output in another Tensor, thus keeping this whole history around.
If you just want to do evaluation, you should disable GradMode (not sure where it is in the cpp doc).
The c++ equivalent of torch.no_grad() would be NoGradGuard from torch/csrc/api/include/torch/utils.h. From the current comments you can see that it is a thread-local guard to disable gradients.
torch::NoGradGuard no_grad_guard;
[rest of your code with no grad]
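Putting that together with the original snippet, a minimal inference loop might look like this (a sketch, assuming the same "model.tm" TorchScript module and shapes as in the question):

```cpp
#include <torch/script.h>

int main() {
  // Thread-local guard: while it is alive, no autograd graph or
  // intermediate buffers are recorded, so forward() stops
  // accumulating CUDA memory across iterations.
  torch::NoGradGuard no_grad_guard;

  auto model = torch::jit::load("model.tm");
  auto out = torch::empty({1, 512, 512});

  for (int i = 0; i < 20; i++) {
    auto in = torch::empty({1, 3, 512, 512}, torch::kCUDA);
    auto res = model->forward({in}).toTensor();
    out.copy_(res[0]);  // safe now: res carries no autograd history
  }
  return 0;
}
```

The guard is scoped, so gradients are re-enabled automatically once it is destroyed; you can also confine it to a smaller block if only part of your code should run without autograd.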
When I am on CPU, I have the same problem. I tried [torch::NoGradGuard no_grad_guard;] but it's of no use. I compile the code to a DLL for another program to use. There's no problem when I run my code independently. In the other program, it's still OK when the call to [forward] is commented out… I really don't know how to solve this. Does anyone know? Please help, thanks!
I also encountered this question. Using torch::NoGradGuard no_grad_guard works.
I'd like to ask: what if I want to backward as well? In that case we can't set NoGradGuard…
Thanks for the help in advance!
If you want to backward, then you need to keep these buffers. Be careful though only to link things that make sense for your problem.
If you want to accumulate the loss, for example, you will need to use detach() to make sure the history won't be kept.
The variables that naturally go out of scope don't need any special treatment.
If you keep some things around across iterations, for example if you accumulate the loss for every batch of the epoch, then you want to detach before accumulating in this buffer.
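For example, a sketch of accumulating a per-epoch loss (the `compute_loss` helper is hypothetical, standing in for whatever your training step produces):

```cpp
#include <torch/torch.h>

// Hypothetical loss for illustration; any differentiable scalar works.
torch::Tensor compute_loss(const torch::Tensor& batch,
                           const torch::Tensor& weights) {
  return batch.matmul(weights).pow(2).mean();
}

int main() {
  auto weights = torch::randn({4, 1}, torch::requires_grad());
  auto total_loss = torch::zeros({});  // lives across iterations

  for (int i = 0; i < 10; i++) {
    auto batch = torch::randn({8, 4});
    auto loss = compute_loss(batch, weights);
    loss.backward();  // the graph is consumed here and then freed
    // detach() before accumulating; otherwise total_loss would keep
    // the autograd graph of every batch alive for the whole epoch.
    total_loss += loss.detach();
  }
  return 0;
}
```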
That depends on what you do with them.
If you return them to other code as plain values, then you should detach them before returning them.
If you just use them locally to update the weights, nothing is needed.
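A sketch of the "return as plain values" case, using a hypothetical `evaluate` helper that hands predictions back to non-training code:

```cpp
#include <torch/torch.h>

// detach() drops the autograd history so the caller cannot
// accidentally keep the graph (and its saved buffers) alive
// by storing the returned tensor in a long-lived object.
torch::Tensor evaluate(torch::nn::Linear& model,
                       const torch::Tensor& input) {
  return model->forward(input).detach();
}

int main() {
  torch::nn::Linear model(4, 2);
  auto out = evaluate(model, torch::randn({8, 4}));
  // out.requires_grad() is false; storing out keeps no graph around.
  return 0;
}
```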