Why does moving a model to the GPU increase the CPU memory usage?

I have a simple model:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.fc1 = nn.Linear(303, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

When I create the model on the CPU like so,

model = Net()

both CPU and GPU memory usage remain unchanged. However, when I then move the model to the GPU,

model.cuda() # model.to(device) does the same

my CPU memory usage shoots up from 410MB to 1.95GB and my GPU memory usage goes from 0MB to 716MB.
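
In case it helps, this is roughly how I am watching the numbers (a minimal sketch; psutil is just one way to read the process RSS, and torch.cuda.memory_allocated() only counts tensor storage, so tools like nvidia-smi report a larger GPU figure):

import psutil
import torch

def rss_mb():
    # resident set size of this Python process, in MB
    return psutil.Process().memory_info().rss / 1024**2

model = Net()
print(f"after Net():        CPU RSS {rss_mb():.0f} MB")

model.cuda()
torch.cuda.synchronize()
print(f"after model.cuda(): CPU RSS {rss_mb():.0f} MB, "
      f"GPU tensors {torch.cuda.memory_allocated() / 1024**2:.2f} MB")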

My questions are:

  1. Before I move the model to the GPU, why does creating the model not even affect my CPU memory? The parameters surely must go somewhere, unless they’re lazily created? (See the quick size check after this list.)
  2. Why does moving the model to the GPU suddenly consume both CPU and GPU memory? Shouldn’t all the weights go entirely to the GPU?
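
For the quick size check mentioned in (1): here is how much memory the parameters themselves should take, using the standard numel()/element_size() tensor methods. They only come to about 0.15 MB, so they may simply be below what I can see in the process stats:

model = Net()
n_params = sum(p.numel() for p in model.parameters())
param_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2
print(f"{n_params} parameters, {param_mb:.2f} MB")
# (303*128 + 128) + (128*1 + 1) = 39,041 float32 parameters, roughly 0.15 MB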

I’m not sure what directly causes (1), but I have a guess about (2).
(2) Did you create any cuda tensors before moving your model to the GPU? If not, my handwavy explanation is that the moment you create the first cuda tensor (e.g., by declaring one directly or by moving your model to cuda), PyTorch does a bunch of one-time work: it spins up a cuda context and loads a lot of extra library code corresponding to GPU kernels, and all of that takes up memory.
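
One way to sanity-check this (sketch only; psutil is just for illustration): in a fresh process, create a single one-element cuda tensor and see whether you get a similar jump even though the tensor itself is only 4 bytes.

import psutil
import torch

def rss_mb():
    return psutil.Process().memory_info().rss / 1024**2

print(f"before any cuda work:       CPU RSS {rss_mb():.0f} MB")
x = torch.zeros(1, device="cuda")  # first cuda tensor -> cuda context gets created
torch.cuda.synchronize()
print(f"after one tiny cuda tensor: CPU RSS {rss_mb():.0f} MB")
# the tensor itself is 4 bytes; any large jump here is the cuda context, driver
# state and kernel images being loaded, not your data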

I did not create any cuda tensors before moving the model, so your explanation is certainly possible, although I am surprised that CUDA-related libraries would take up over a GB of CPU memory.
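
If I want to confirm where that CPU memory actually goes, one thing I could try is summing the resident size of the cuda/cudnn/nvidia shared-library mappings in this process (Linux-only sketch; it assumes the standard /proc/self/smaps layout):

import torch

def cuda_library_rss_kb():
    # Linux-only: sum the Rss of memory mappings whose backing file mentions
    # cuda/cudnn/nvidia (assumes the standard /proc/self/smaps format)
    total_kb = 0
    counting = False
    with open("/proc/self/smaps") as f:
        for line in f:
            fields = line.split()
            if fields and "-" in fields[0] and not fields[0].endswith(":"):
                # mapping header line, e.g. "7f... r-xp ... /usr/lib/libcudnn.so"
                counting = any(s in line.lower() for s in ("cuda", "cudnn", "nvidia"))
            elif line.startswith("Rss:") and counting:
                total_kb += int(fields[1])  # value is reported in kB
    return total_kb

torch.zeros(1, device="cuda")  # force cuda initialization first
print(f"RSS in cuda-related mappings: {cuda_library_rss_kb() / 1024:.0f} MB")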