I have read in another forum post that using torch.cuda.empty_cache() is generally not recommended. However, I see a large amount of GPU memory being released permanently after calling it once.
Here is what I am doing:
import torch
import torch_tensorrt

# load the model
model = torch.load('modelName.pt')

# compile with Torch-TensorRT at batch size 16
batch_size = 16
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((batch_size, 3, 224, 224))],
    enabled_precisions={torch_tensorrt.dtype.half},
)
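For context, inference afterwards is run roughly like this (a minimal sketch; the input tensor below is just a placeholder matching the compile spec, not my actual data pipeline):

# rough inference sketch (placeholder input; real data loading omitted)
dummy_input = torch.randn(batch_size, 3, 224, 224, device="cuda")
with torch.no_grad():
    output = trt_model(dummy_input)
torch.cuda.synchronize()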
Without calling torch.cuda.empty_cache(), nvidia-smi reports 17954 MiB of GPU memory in use, and running inference keeps usage at that level. However, after calling torch.cuda.empty_cache(), usage drops to 2428 MiB, and subsequent inference keeps it at 2428 MiB. Inference times are very similar in both cases.
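My understanding is that nvidia-smi reflects memory reserved by PyTorch's caching allocator (plus the CUDA context), not just what is actively allocated, so I checked the difference around the empty_cache() call with something like this (a minimal sketch):

# compare what PyTorch has actively allocated vs. what it has reserved
# (nvidia-smi roughly tracks the reserved number plus the CUDA context)
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MiB")

torch.cuda.empty_cache()  # releases unused cached blocks back to the driver

print(f"allocated after empty_cache: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")
print(f"reserved after empty_cache:  {torch.cuda.memory_reserved() / 1024**2:.0f} MiB")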
Is there any reason using torch.cuda.empty_cache() would not be recommended here? Thanks for any help.