I want to use the maximum batch size possible for inference.
My idea was to read out CUDA's free memory and, knowing the physical size of my tensors, calculate the batch size from that. Is this reasonable, or are there more sophisticated methods for doing this?
What would be the proper function to call to determine the free space available for my images?
I guess reading out the currently used memory could be done via

```cpp
#include <c10/cuda/CUDACachingAllocator.h>
[...]
uint64_t curr_mem = c10::cuda::CUDACachingAllocator::currentMemoryAllocated(0);
```
But I’d also need the total memory …