Memory usage drastically increases when torch.cuda.is_available() is present in the code

My code was consuming a very high amount of RAM, and I eventually narrowed the problem down to this:

import torch
import psutil


torch.set_default_device('cuda')  # This bug only happens on GPU

# Mem: 318.99 MB either way
print(f"Mem: {psutil.Process().memory_info().rss / 1024 / 1024:.2f} MB")  

# Changing the dtype or size doesn't measurably change the high memory usage
t = torch.rand(10_000_000, dtype=torch.float32)

# Mem: 1461.05 MB, but 411.43 MB when torch.cuda.is_available() is absent
print(f"Mem: {psutil.Process().memory_info().rss / 1024 / 1024:.2f} MB")

Is this a bug in PyTorch?

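This is expected behavior: the CUDA libraries are loaded into host memory when the GPU is first used, so the increase does not depend on the size or dtype of the tensor. The memory footprint was reduced in CUDA 11.8+ via CUDA’s lazy module loading, which PyTorch enables by default.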
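
If you want to see how much of the increase comes from module loading on your own setup, one rough way is to run the same measurement in two fresh processes, one with the CUDA environment variable `CUDA_MODULE_LOADING` set to `LAZY` and one with `EAGER`. The variable has to be set before CUDA is initialized, which is why the sketch below uses subprocesses. The names `CHILD`, `build_env`, and `measure` are made up for this example, and the numbers will vary with your driver, CUDA, and PyTorch versions.

```python
import os
import subprocess
import sys

# Child script: the measurement from the question, run in a fresh process
# so that the environment variable is seen before CUDA is initialized.
CHILD = r"""
import psutil
import torch

def rss_mb():
    return psutil.Process().memory_info().rss / 1024 / 1024

print(f"after import: {rss_mb():.2f} MB")
if torch.cuda.is_available():
    t = torch.rand(10_000_000, dtype=torch.float32, device="cuda")
    torch.cuda.synchronize()  # wait until the kernel has actually run
    print(f"after first CUDA op: {rss_mb():.2f} MB")
else:
    print("CUDA not available; nothing to compare")
"""


def build_env(mode: str) -> dict:
    """Copy the current environment with CUDA_MODULE_LOADING set to `mode`."""
    env = dict(os.environ)
    env["CUDA_MODULE_LOADING"] = mode  # "LAZY" or "EAGER"
    return env


def measure(mode: str) -> str:
    """Run the child script under the given loading mode and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=build_env(mode),
        capture_output=True,
        text=True,
    )
    # On failure (e.g. torch or psutil missing), return the error text instead.
    return result.stdout if result.returncode == 0 else result.stderr


if __name__ == "__main__":
    for mode in ("LAZY", "EAGER"):
        print(f"--- CUDA_MODULE_LOADING={mode} ---")
        print(measure(mode))
```

If lazy loading is in effect, the `EAGER` run should report a noticeably larger RSS after the first CUDA op than the `LAZY` run. Either way, some host memory increase is expected once the GPU is used.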
