It’s possible to use NVLS (NVLink SHARP) via the torch.cuda.MemPool API, which landed in this PR. We are also working on enabling it elsewhere, e.g. in DDP (related to this PR).
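For anyone who hasn't used the API, here is a minimal sketch of routing allocations through a torch.cuda.MemPool. It only uses the default CUDA caching allocator, so it does not enable NVLS by itself. My understanding is that for NVLS the pool has to be backed by NCCL's allocator and made known to the process group; that part is left out because it depends on your PyTorch and NCCL versions. The helper name `sum_in_mem_pool` is made up for illustration, and it returns None when PyTorch, CUDA, or the MemPool API is not available.

```python
import importlib.util


def sum_in_mem_pool(numel=1024):
    """Allocate a tensor inside a dedicated torch.cuda.MemPool and return its sum.

    Hypothetical helper for illustration only. Returns None when PyTorch,
    a CUDA device, or the MemPool API is unavailable.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    has_api = hasattr(torch.cuda, "MemPool") and hasattr(torch.cuda, "use_mem_pool")
    if not torch.cuda.is_available() or not has_api:
        return None

    # Assumption: default allocator. An NVLS setup would pass a
    # NCCL-backed allocator to MemPool instead (not shown here).
    pool = torch.cuda.MemPool()

    # Allocations made inside this context are served from `pool`.
    with torch.cuda.use_mem_pool(pool):
        t = torch.arange(numel, device="cuda")

    total = int(t.sum().item())
    del t  # drop the tensor before the pool itself goes away
    return total


if __name__ == "__main__":
    print(sum_in_mem_pool())
```

The collective itself would then run on tensors allocated inside the `use_mem_pool` block, which is why the pool, rather than the individual tensor, is the unit you configure.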