[Bug] Memory leak in C++ libtorch

It seems there is a bug in (C++) libtorch that leads to a memory leak.

#include <torch/torch.h>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>


class FrameVector {

private:
    std::vector<int> data;

public:
    FrameVector() : data(7056) {}
};

class FrameTensor {

private:
    torch::Tensor data;

public:
    FrameTensor() {
        this->data = torch::zeros({1, 84, 84});
    }
};

template<class T>
void f() {

    int capacity = 1000000;
    std::vector<std::vector<T>> frames(capacity);

    for (auto i = 0; i < capacity + 1000000; i++) {
        if (i == capacity) {
            std::cout << "buffer is full!" << std::endl;
            std::this_thread::sleep_for(std::chrono::seconds(2));
            std::cout << "restart!" << std::endl;
        }
        frames[i % capacity].push_back(T());
        if (i >= capacity) {
            frames[i % capacity].erase(frames[i % capacity].begin());
        }
    }

}

int main(int argc, char *argv[])
{
    f<FrameTensor>();  // needs 34G to fill the replay buffer, then memory increases to 60G
    f<FrameVector>();  // needs 34G to fill the replay buffer, then memory stays constant (as it should)
}

The bug only seems to occur when the torch::Tensor is stored in nested containers, for example:

  • std::vector<std::vector<T>>
  • std::vector<std::deque<T>>

I believe the internal counter that keeps track of the number of references to a torch::Tensor fails to count them correctly. As a result, the tensors' memory is never released.

Am I correct to believe this is a libtorch bug? If so, how do I report it? @ptrblck

Just to understand the numbers better, I’m confused why:

  • 7056 (elements) × 4 (bytes) × 1,000,000 (tensors) is only 28G, not 34G
  • 60G isn't exactly double 34G

Let me know if I’m missing something obvious.

Hello, thanks for your fast answer. I used htop to monitor the memory usage. Some memory was already in use before the program started (34 − 28 = 6G of pre-existing usage), and 6 + 2 × 28 = 62G total. I allocated large enough chunks of memory that I could ignore small variations in usage, i.e., if memory rises significantly above 34G, I know there is a leak.
