I have a simple model:
```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(303, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
```
When I create the model on the CPU like so,

```python
model = Net()
```
both CPU and GPU memory usage remain unchanged. However, when I then move the model to the GPU with

```python
model.cuda()  # model.to(device) does the same
```
my CPU memory usage shoots up from 410 MB to 1.95 GB and my GPU memory usage goes from 0 MB to 716 MB.
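In case the way I'm watching memory matters, here is a rough sketch of how these numbers can be tracked from inside the process (assuming `psutil` is installed; note that `torch.cuda.memory_allocated()` only counts tensors PyTorch has allocated, not the CUDA context itself, so `nvidia-smi` will report more than this):

```python
import os

import psutil  # assumed available; used only to read the process RSS
import torch

def report(tag):
    # Resident set size of this Python process, in MB.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024**2
    # Memory held by PyTorch-allocated tensors on the GPU, in MB.
    # This excludes the CUDA context, which nvidia-smi shows on top of it.
    gpu_mb = torch.cuda.memory_allocated() / 1024**2 if torch.cuda.is_available() else 0.0
    print(f"{tag}: CPU RSS {rss_mb:.0f} MB, GPU tensors {gpu_mb:.0f} MB")

report("baseline")
model = Net()          # Net is the model class defined above
report("after Net() on CPU")
model.cuda()
report("after model.cuda()")
```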
My questions are:
- Before I move the model to the GPU, why does creating it not affect my CPU memory at all? The parameters must be stored somewhere (unless they're created lazily?) -- though, as the quick parameter count below suggests, they are tiny anyway.
- Why does moving the model to the GPU suddenly consume both CPU and GPU memory? Shouldn't all the weights simply end up on the GPU?
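For what it's worth, regarding the first bullet: a quick parameter count (the sketch below just sums `p.numel()` over `model.parameters()`) suggests the weights themselves take only ~150 KB in float32, so they don't seem large enough to explain either jump:

```python
# Sanity check: total parameter memory of Net (defined above).
model = Net()
n_params = sum(p.numel() for p in model.parameters())
bytes_fp32 = n_params * 4  # float32 = 4 bytes per element
# fc1: 303*128 + 128 = 38,912; fc2: 128*1 + 1 = 129; total = 39,041 params
print(f"{n_params} parameters ≈ {bytes_fp32 / 1024:.0f} KB in float32")  # ≈ 153 KB
```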