Torch-TRT - any reason not to use torch.cuda.empty_cache?

I have read on another forum post that using torch.cuda.empty_cache() is not usually recommended. However, in my case a single call permanently releases a large amount of GPU memory.
Here is what I am doing:
# load the model
import torch
import torch_tensorrt

model = torch.load('modelName.pt')

# compile with Torch-TensorRT for a fixed batch size of 16
batch_size = 16
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((batch_size, 3, 224, 224))],
    enabled_precisions={torch_tensorrt.dtype.half},
)
Without calling torch.cuda.empty_cache(), nvidia-smi reports 17954 MiB of GPU memory in use, and running inference keeps it at that level. After a single call to torch.cuda.empty_cache(), usage drops to 2428 MiB and stays there during inference. Inference times are very similar in both cases.
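
For reference, this is roughly how I compare the numbers (a minimal sketch reusing trt_model and batch_size from above; torch.cuda.memory_allocated/memory_reserved show the caching allocator's view, while nvidia-smi shows the total memory the process holds):

import torch

# hypothetical helper to print the caching allocator's view of GPU memory
def report(tag):
    allocated = torch.cuda.memory_allocated() / 2**20   # bytes held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20     # bytes cached by the allocator
    print(f"{tag}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

x = torch.randn(batch_size, 3, 224, 224, device="cuda")
with torch.no_grad():
    out = trt_model(x)

report("after inference")
torch.cuda.empty_cache()   # returns cached, unused blocks to the driver
report("after empty_cache")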

Is there any reason using torch.cuda.empty_cache() would not be recommended here? Thanks for any help.

Based on your memory usage it seems Torch-TensorRT might use a large workspace (or another intermediate allocation) during its initial execution steps and does not free it afterwards.
We generally do not recommend clearing the cache, as it will synchronize your device, and PyTorch already clears the cache itself when needed, e.g. after benchmarking cuDNN algorithms or when running into an OOM.
@narendasan would know details about TorchTRT’s execution and where this memory jump might come from.
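
One way to narrow down where the jump comes from (a rough sketch, reusing the compile call from the question; note that allocations TensorRT makes directly, outside PyTorch's caching allocator, would not show up in these counters) is to compare the allocator's peak against its steady state around compilation. A large gap between reserved and allocated afterwards would mean the extra memory is only cached by the allocator, which is exactly what empty_cache() releases:

import torch
import torch_tensorrt

torch.cuda.reset_peak_memory_stats()

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((16, 3, 224, 224))],
    enabled_precisions={torch_tensorrt.dtype.half},
)

peak = torch.cuda.max_memory_allocated() / 2**20      # highest point during compilation
allocated = torch.cuda.memory_allocated() / 2**20     # live tensors right now
reserved = torch.cuda.memory_reserved() / 2**20       # what the allocator keeps cached
print(f"peak={peak:.0f} MiB, allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")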