LibTorch CUDA huge initial VRAM allocation

Hi. I’m using libtorch for inference and need to run two instances of a program on the same GPU, but because of libtorch’s massive initial memory allocation, only one instance fits.

Watching VRAM usage, I see that the first call to:

torch::Tensor ones = torch::ones({1, 1, 1, 1}).cuda();

increases VRAM usage by 1.6 GB. Is this normal, or is there a way around it? 1.6 GB for a single one-element tensor seems far too high to be reasonable.