How to set the per-process memory fraction in C++?

When I load a model and run inference with libTorch, how can I set the per-process memory fraction in C++, like torch.cuda.set_per_process_memory_fraction in Python?

I would check whether you can call the caching allocator function that the Python binding calls: pytorch/Module.cpp at f84f89b1c3f2bc74512e7a7b05ae6185164a9b3e · pytorch/pytorch · GitHub

:heart: Thanks very much. It's important for us.

Are there any updates?

I believe @eqy suggested trying c10::cuda::CUDACachingAllocator::setMemoryFraction(fraction, device); as shown in the link. Were you able to use it?
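For reference, a minimal sketch of what the call can look like in libTorch (the header path c10/cuda/CUDACachingAllocator.h is assumed and the function name is just for illustration; both may differ between releases):

#include <c10/cuda/CUDACachingAllocator.h>

// Cap this process at 50% of GPU 0's memory. The fraction is enforced by the
// CUDA caching allocator, so it only applies to allocations that go through it.
void limit_gpu_memory_fraction() {
  c10::cuda::CUDACachingAllocator::init(/*device_count=*/1);
  c10::cuda::CUDACachingAllocator::setMemoryFraction(/*fraction=*/0.5, /*device=*/0);
}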

Sorry, I hadn't found this API in libTorch before, but I see it now.
I will try it. Thanks.

It doesn't seem to work correctly. I set the memory fraction to 10%, but it can still allocate 20% of GPU memory.
Test code:

int test_set_mem_limit() {
  std::cout << "test_set_mem_limit" << std::endl;
  c10::cuda::CUDACachingAllocator::init(1);
  c10::cuda::CUDACachingAllocator::setMemoryFraction(0.1, 0);
  c10::cuda::CUDACachingAllocator::emptyCache();
  long long total_memory = 11721506816;
  auto tmp_tensor = torch::empty({(long long)(total_memory * 0.2)}, torch::kFloat32);
  std::cout << "tmp_tensor sizes: " << tmp_tensor.sizes() << std::endl; // tmp_tensor sizes: [2344301363] 
  std::this_thread::sleep_for(std::chrono::seconds(10));
  std::cout << "test_set_mem_limit end" << std::endl;
  return 0;
}

Am I using the APIs incorrectly?

It looks like you aren't specifying the device when calling torch::empty, so the tensor is allocated in host memory, which isn't limited by setMemoryFraction. Could you try setting the device to CUDA in the call to torch::empty? Relevant documentation: Tensor Creation API — PyTorch master documentation
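As a minimal sketch (assuming GPU 0), creating the tensor with a TensorOptions that sets the CUDA device makes the allocation go through the caching allocator, which is what setMemoryFraction limits:

#include <torch/torch.h>

// Allocate directly on GPU 0 instead of host memory.
auto options = torch::TensorOptions()
                   .dtype(torch::kFloat32)
                   .device(torch::kCUDA, 0);
auto gpu_tensor = torch::empty({1024, 1024}, options);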

It works. Thanks:

std::cout << "test_set_mem_limit" << std::endl;
  c10::cuda::CUDACachingAllocator::init(1);
  c10::cuda::CUDACachingAllocator::setMemoryFraction(0.1, 0);
  c10::cuda::CUDACachingAllocator::emptyCache();
  long long total_memory = 11721506816;
  auto tmp_tensor = torch::empty({(long long)(total_memory * 0.2)}, torch::kFloat32).to({torch::kCUDA, 0}); // CUDA out of memory. Tried to allocate 8.73 GiB (GPU 0; 10.92 GiB total capacity; 0 bytes already allocated; 5.97 GiB free; 1.09 GiB allowed;
  std::cout << "tmp_tensor sizes: " << tmp_tensor.sizes() << std::endl;
  std::this_thread::sleep_for(std::chrono::seconds(10));
  std::cout << "test_set_mem_limit end" << std::endl;