I came across an unexpected memory allocation in torch.cuda.memory_snapshot() when I ran the following code:
import torch
W = torch.rand(1000, 10).cuda()
b = torch.rand(10).cuda()
X = torch.rand(1000).cuda()
y = X @ W + b
print(torch.cuda.memory_snapshot())
Since W requests 1000 * 10 * 4 B = 40000 B, b requests 10 * 4 B = 40 B, X requests 1000 * 4 B = 4000 B, and y requests 10 * 4 B = 40 B, and the caching allocator rounds each request up to its block granularity (multiples of 512 B for small allocations), the output should look something like this if everything works as expected (I did get a similar result in another environment; this output is from torch 1.10.0):
[
{
device: 0,
address: 68780294144,
total_size: 2097152,
allocated_size: 45568,
active_size: 45568,
segment_type: "small",
blocks: [
{ size: 40448, state: "active_allocated" },
{ size: 512, state: "active_allocated" },
{ size: 4096, state: "active_allocated" },
{ size: 512, state: "inactive" },
{ size: 512, state: "active_allocated" },
{ size: 2051072, state: "inactive" },
],
},
];
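The block sizes in this expected snapshot are just the requested sizes rounded up to the allocator's block granularity. A quick sanity check (the 512 B rounding is my understanding of the caching allocator for small allocations, not something I verified in the source):
def round_up(nbytes, align=512):
    # round nbytes up to the next multiple of align
    return ((nbytes + align - 1) // align) * align

for name, nbytes in [("W", 40000), ("b", 40), ("X", 4000), ("y", 40)]:
    print(name, nbytes, "->", round_up(nbytes))
# W 40000 -> 40448
# b 40 -> 512
# X 4000 -> 4096
# y 40 -> 512
# total 45568 B, which matches allocated_size above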
But this time, in addition to the output above, I got an extra segment like this:
{
device: 0,
address: 23007248515072,
total_size: 20971520,
allocated_size: 8519680,
active_size: 8519680,
requested_size: 8519680,
stream: 0,
segment_type: "large",
blocks: [
{ size: 8519680, requested_size: 8519680, state: "active_allocated" },
{ size: 12451840, requested_size: 0, state: "inactive" },
],
}
It always occupies this fixed size, 8519680 B, and it only happens when I do matrix multiplications or use an nn.Linear. If I use nn.Conv2d or nn.RNN instead, this magic number disappears, just as expected.
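A minimal check along these lines (the variable names are just illustrative) reproduces the pattern described above: the extra "large" segment only appears in the snapshot once a matmul has run:
import torch

def large_segments():
    # segments reported by the caching allocator with segment_type == "large"
    return [s for s in torch.cuda.memory_snapshot() if s["segment_type"] == "large"]

a = torch.rand(1000, 10, device="cuda")
b = torch.rand(10, 10, device="cuda")
print(len(large_segments()))  # expect no "large" segment yet (only small tensors so far)
c = a @ b
torch.cuda.synchronize()
print(len(large_segments()))  # the segment holding the 8519680 B block appears after the matmul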
The environment where I see this problem: PyTorch 2.0.0 + CUDA 11.4 + Ubuntu 20.04.
The environment where I don't see it: PyTorch 1.10.0 + CUDA 11.6 + Windows 10.
Any information will be helpful.