Confusion about memory allocation mechanism

I saw a rule about the memory allocation mechanism on line 66 of this file: "To further reduce fragmentation, blocks >= 200MB are not allowed to be split. These oversize cached blocks will still satisfy requests within 20MB of the oversize cached block size."

My understanding of this is that blocks larger than or equal to 200MB will not be split. So, I did a test.

import torch
import torch.cuda
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# 100 * 1024 * 1024 float32 elements * 4 bytes = 400MB
temp1 = torch.tensor([1.0]*1024*1024*100).to(device)
del temp1
# 6 * 1024 * 1024 float32 elements * 4 bytes = 24MB
temp2 = torch.tensor([1.0]*1024*1024*6).to(device)

The output of this program is 400. It looks like temp2 was allocated by splitting the 400MB block that temp1 left behind.

I’m very confused about this. Can anyone give me some advice? Thanks!

I’m very possibly wrong here, but isn’t temp1 itself 400MB (100M elements * 4 bytes per float32)? You then delete it, but PyTorch still keeps that memory reserved, so you’ll still see 400MB. Since that block is now free, the next, smaller allocation (6M elements * 4 bytes = 24MB) uses up some of that freed space, and the rest remains reserved.
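A rough way to check this is to print both the allocated and the reserved figures after each step. The sketch below is only an illustration: the helper names `tensor_mib`, `measure`, and `snapshot` are made up for this example, it uses `torch.ones` instead of building a Python list, and it returns `None` when PyTorch or a CUDA device is not available. The exact numbers will depend on the PyTorch version and on whatever else the process has already allocated.

```python
import importlib.util

MIB = 1024 * 1024
BYTES_PER_FLOAT32 = 4


def tensor_mib(num_elements, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in MiB of a dense tensor with the given element count."""
    return num_elements * bytes_per_element / MIB


def measure():
    """Return {step: (allocated MiB, reserved MiB)}, or None without PyTorch/CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    device = torch.device("cuda:0")
    readings = {}

    def snapshot(label):
        readings[label] = (
            torch.cuda.memory_allocated(device) / MIB,  # memory held by live tensors
            torch.cuda.memory_reserved(device) / MIB,   # memory held by the caching allocator
        )

    temp1 = torch.ones(100 * MIB, device=device)  # 100M float32 elements
    snapshot("after temp1")
    del temp1
    snapshot("after del temp1")
    temp2 = torch.ones(6 * MIB, device=device)    # 6M float32 elements
    snapshot("after temp2")
    return readings


# Expected tensor sizes from the arithmetic alone.
print(tensor_mib(100 * MIB), tensor_mib(6 * MIB))

result = measure()
if result is not None:
    for label, (allocated, reserved) in result.items():
        print(f"{label}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")
```

If the explanation above is right, I would expect the allocated figure to drop after `del temp1` while the reserved figure stays where it was, but I have not verified that on the setup in question.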

Can blocks larger than or equal to 200MB be split?
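One way to frame that question is to write the quoted rule down literally and see what it would predict. The sketch below is a toy model of the comment only, not the allocator's actual code: the function name `can_serve` and its return labels are hypothetical, and it ignores details such as size rounding and minimum remainder sizes. The `threshold` parameter is there because PyTorch's CUDA memory management notes describe a `max_split_size_mb` option of `PYTORCH_CUDA_ALLOC_CONF` whose default is unlimited; whether that applies to the version discussed here is an assumption.

```python
MIB = 1024 * 1024
OVERSIZE_THRESHOLD = 200 * MIB  # "blocks >= 200MB are not allowed to be split"
OVERSIZE_SLACK = 20 * MIB       # "requests within 20MB of the oversize cached block size"


def can_serve(block_size, request_size,
              threshold=OVERSIZE_THRESHOLD, slack=OVERSIZE_SLACK):
    """Say how a free cached block may serve a request under the quoted rule.

    Returns "no fit", "split", "whole", or "skip".
    """
    if request_size > block_size:
        return "no fit"
    if block_size < threshold:
        # Ordinary block: hand out the requested part, keep the remainder cached.
        return "split" if request_size < block_size else "whole"
    # Oversize block: never split, only reused for requests close to its size.
    if block_size - request_size < slack:
        return "whole"
    return "skip"


print(can_serve(400 * MIB, 24 * MIB))    # rule active: block is passed over
print(can_serve(400 * MIB, 390 * MIB))   # rule active: block is reused whole
print(can_serve(400 * MIB, 24 * MIB, threshold=float("inf")))  # no threshold set
```

Under this reading, a 24MB request would skip a free 400MB block while the rule is active, and would split it when no threshold is set. So a total of 400MB after allocating temp2 would be consistent with the rule not being in effect for that run, assuming the reported figure is reserved memory.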