Automatically determine max. batch-size for inference


I want to use the maximum possible batch size for inference.
My idea was to read out CUDA's free memory and, knowing my tensor's physical size, calculate the batch size from that. Is this reasonable, or are there more sophisticated methods to do this?

What would be the proper function to call to determine the free space available for my images?
I guess the currently allocated memory could be read out via

#include <c10/cuda/CUDACachingAllocator.h> 
uint64_t curr_mem = c10::cuda::CUDACachingAllocator::currentMemoryAllocated(0);

But I’d also need the total memory …


I don’t think PyTorch provides this directly, but you could try your luck with cudaMemGetInfo, which returns both the free and the total device memory.

Best regards