It’s probably memory reserved by the CUDA driver. That seems to increase with newer cards. NVIDIA doesn’t explain why, but it might have to do with changes to the instruction set on newer architectures.
You can look at how much is reserved by the driver by doing a minimal allocation, which creates a CUDA context:
import torch
import time
torch.randn(1).cuda()
time.sleep(1000)