CUDA out of memory when optimizer.step()

I am trying to extract image features with InceptionA (part of GoogLeNet). Without optimizer.step() it runs even with a batch size of 128, but as soon as optimizer.step() is called it fails with a CUDA out of memory error.
Here is the code:

model = InceptionA(pool_features=2)
optimizer = optim.Adam(model.parameters())
criterion = nn.BCELoss(reduction='mean')
for epoch in range(100):
    for i, (batch_input, label) in enumerate(data_loader):
        optimizer.zero_grad()
        output = model(batch_input)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()  # Error here

How can I fix this error?


I have already solved the problem. The changes are:
1) optimizer = optim.SGD(model.parameters(), lr=0.0001)
2) loss = criterion(torch.sigmoid(output), label)
The reason for 2) is:

BCELoss accepts only inputs that have all elements in range [0; 1]
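As a side note, PyTorch also provides nn.BCEWithLogitsLoss, which fuses the sigmoid and the binary cross-entropy into one numerically stable operation, so the manual torch.sigmoid() call in 2) can be dropped. A minimal sketch with made-up logits and labels:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.5, -1.2], [2.0, 0.3]])   # raw model outputs
labels = torch.tensor([[1.0, 0.0], [1.0, 1.0]])

# Option A: apply sigmoid manually, then BCELoss (inputs must be in [0, 1])
bce = nn.BCELoss(reduction='mean')
loss_a = bce(torch.sigmoid(logits), labels)

# Option B: BCEWithLogitsLoss applies the sigmoid internally and is
# more numerically stable for large-magnitude logits
bce_logits = nn.BCEWithLogitsLoss(reduction='mean')
loss_b = bce_logits(logits, labels)

# Both compute the same loss value
assert torch.allclose(loss_a, loss_b, atol=1e-6)
```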

But I don't know why 1) has to use optim.SGD() and why optim.Adam() can't be used.

Adam keeps internal running estimates (a first- and second-moment buffer per parameter) and thus uses more memory than e.g. SGD.
If your GPU is almost full and you call step() on your Adam optimizer, these buffers are allocated on the first step and might thus yield an out of memory error.
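A back-of-the-envelope sketch of why Adam's state matters (the parameter count below is a made-up example, not InceptionA's actual size): Adam keeps two fp32 buffers per parameter (exp_avg and exp_avg_sq), so its optimizer state alone costs roughly twice the parameter memory, while plain SGD without momentum stores nothing extra.

```python
# Hypothetical model size; fp32 parameters take 4 bytes each.
num_params = 6_000_000
bytes_per_float32 = 4

param_mem = num_params * bytes_per_float32

# Plain SGD (no momentum) keeps no per-parameter state.
sgd_state = 0

# Adam keeps exp_avg and exp_avg_sq, one fp32 copy each per parameter.
adam_state = 2 * param_mem

print(f"parameters: {param_mem / 2**20:.1f} MiB")
print(f"Adam state: {adam_state / 2**20:.1f} MiB extra")
```

So switching from Adam to SGD here frees roughly two parameter-sized allocations on the GPU, which is why the OOM disappeared.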


Is there any solution or PyTorch function that works around this, even if it runs at a slower speed?
Or is the only way to use a better GPU or multiple GPUs?

The easiest way would be to lower your batch size. If that’s not possible (e.g. if your batch size is already 1), you could have a look at torch.utils.checkpoint to trade compute for memory.
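A minimal sketch of torch.utils.checkpoint on a hypothetical block (the layer sizes are arbitrary): the forward pass inside checkpoint() discards its intermediate activations and recomputes them during backward, trading extra compute for lower peak memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Hypothetical deep block whose intermediate activations we don't want to keep.
block = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
x = torch.randn(8, 256, requires_grad=True)

# Activations inside `block` are freed after the forward pass and
# recomputed during backward instead of being stored.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()

assert y.shape == (8, 256)
assert x.grad is not None  # gradients still flow through the checkpoint
```

In a real model you would wrap the most activation-heavy segments this way; the backward pass gets slower, but the peak memory during training drops.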


I got it. Thanks for your advice :grinning:
