I came across a rule about the CUDA memory allocation mechanism at line 66 of this file: "To further reduce fragmentation, blocks >= 200MB are not allowed to be split. These oversize cached blocks will still satisfy requests within 20MB of the oversize cached block size."
My understanding is that a cached block of 200MB or more will never be split to serve a smaller request. To check this, I ran a test:
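import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# 100 * 1024 * 1024 float32 elements = 400MB
temp1 = torch.ones(100 * 1024 * 1024, device=device)
del temp1  # the freed block goes back to the allocator's cache
# 6 * 1024 * 1024 float32 elements = 24MB
temp2 = torch.ones(6 * 1024 * 1024, device=device)
# Memory reserved by the caching allocator, in MB
print(torch.cuda.memory_reserved() / 1024 / 1024)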
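The output of this program is 400. Reserved memory did not grow when temp2 was allocated, so it looks like temp2 was carved out of the 400MB block that temp1 left in the cache, i.e. that block was split.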
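To spell out why this surprises me, here is a small pure-Python sketch of the rule as I read it. The helper name `may_serve_from_cached_block` and the two constants are my own paraphrase of the quoted comment, not the allocator's actual code, and the model deliberately ignores everything else the allocator does.

```python
MB = 1024 * 1024

# Thresholds copied from the quoted comment (my paraphrase, not PyTorch source).
OVERSIZE_THRESHOLD = 200 * MB
OVERSIZE_SLACK = 20 * MB


def may_serve_from_cached_block(block_bytes: int, request_bytes: int) -> bool:
    """Toy model of the quoted rule (hypothetical helper, heavily simplified)."""
    if request_bytes > block_bytes:
        # A cached block can never serve a request larger than itself.
        return False
    if block_bytes < OVERSIZE_THRESHOLD:
        # Blocks below the threshold may be split to serve the request.
        return True
    # Oversize blocks are not split; they only serve requests of nearly equal size.
    return block_bytes - request_bytes < OVERSIZE_SLACK


temp1_bytes = 100 * MB * 4  # 100 * 1024 * 1024 float32 elements
temp2_bytes = 6 * MB * 4    # 6 * 1024 * 1024 float32 elements

print(temp1_bytes // MB, temp2_bytes // MB)  # sizes in MB: 400 24
print(may_serve_from_cached_block(temp1_bytes, temp2_bytes))  # False
```

If this reading were correct, the cached 400MB block could not serve the 24MB request, so temp2 would need a separate allocation and reserved memory would end up above 400MB.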
I’m very confused about this. Can anyone give me some advice? Thanks!