[Solved] Why does a cuda float tensor with 64 million floats use ~512MB GPU?

AFAIK, PyTorch uses a caching allocator: even when memory is "free" from PyTorch's point of view, it stays cached by the allocator rather than being returned to the driver, so it is not reflected as free in the view from the device (e.g. nvidia-smi).
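A minimal sketch of how to see this, using PyTorch's own memory counters. 64 million float32 values should occupy 64e6 × 4 bytes ≈ 256 MB of tensor data; the remaining gap up to the ~512 MB reported by nvidia-smi comes from the CUDA context plus whatever the caching allocator has reserved. Exact numbers depend on your GPU, driver, and CUDA version.

```python
import torch

# 64 million float32 values -> 64e6 * 4 bytes ≈ 256 MB of tensor data.
x = torch.empty(64_000_000, dtype=torch.float32, device="cuda")

# Memory occupied by live tensors (should be about 256 MB here).
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")

# Memory held by PyTorch's caching allocator, including blocks that were
# freed but are kept cached for reuse.
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MB")

# nvidia-smi reports the reserved memory *plus* the CUDA context overhead
# (a few hundred MB), which is why the device view can show ~512 MB.
```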
