Optimizer closure causes out-of-memory error

Hi, I ran into an out-of-memory error when using a closure in my new optimizer, which needs the value of the loss f(x) to update the parameters x in each iteration. The update scheme is:

g(x) = f’(x)/f(x)
x = x - a*g(x)

So I define a closure function before calling optimizer.step:

def closure():
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    return loss, outputs
loss, outputs = optimizer.step(closure) 
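
For reference, the step of the new optimizer uses the closure roughly like this (a simplified sketch; the class name and the way lr is handled are just placeholders):

import torch

class LossScaledGD(torch.optim.Optimizer):
    # placeholder name; implements x = x - a * f'(x) / f(x)
    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self, closure):
        # the closure calls backward(), so grad mode must be re-enabled for it
        with torch.enable_grad():
            loss, outputs = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # x = x - a * f'(x) / f(x)
                p.add_(p.grad, alpha=-group["lr"] / loss.item())
        return loss, outputs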

It works fine when I use this optimizer to train a CNN on MNIST. However, when I use it to train ResNet-34 on CIFAR-10, even on an HPC cluster, the program gets killed after a few iterations because of an out-of-memory error.

I think the memory of the compute node (128 GB) should be large enough, and everything works fine when I switch the optimizer to torch.optim.SGD with all other settings unchanged. The corresponding code for SGD is:

optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()

So the only difference I can see is the use of the closure function in the new optimizer.

I have two questions:

  1. Did I use the closure function correctly? So far the new optimizer works fine on smaller datasets like MNIST, but since closures are rarely used in optimizers and there are not many examples, I am not sure whether I used the closure correctly.
  2. Is the out-of-memory error caused by the use of the closure function? It's not clear to me how the closure and optimizer.step work together in PyTorch, so I have no idea where this error comes from.

Any help is highly appreciated! Thanks!

I’m not sure how your new optimizer is defined, but e.g. LBFGS (which also requires a closure) can be memory intensive as stated in the docs:

This is a very memory intensive optimizer (it requires additional param_bytes * (history_size + 1) bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm.
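
If it is in fact LBFGS-like, the history size can be lowered directly in the constructor (history_size defaults to 100), e.g.:

# a smaller history_size reduces the extra per-parameter memory overhead
optimizer = torch.optim.LBFGS(net.parameters(), history_size=10)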

  1. If your optimizer’s step method expects the output and loss as a return value, then the closure looks correct. You could compare it to this example.
  2. It might be, and it depends on what your optimizer.step is doing with the closure. If you are storing the history of the losses and outputs, then increased memory usage is expected (see the sketch after this list).
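
For illustration, here is a toy snippet (not your actual code; the loss is a stand-in and loss_history is a hypothetical buffer) showing why keeping the raw loss tensors grows memory while keeping detached values does not:

import torch

loss_history = []                    # hypothetical buffer kept across iterations

x = torch.randn(10, requires_grad=True)
loss = (x ** 2).sum()                # stand-in for the real loss
loss.backward()

loss_history.append(loss)            # keeps the whole autograd graph of this iteration alive -> memory grows every step
loss_history.append(loss.detach())   # keeps only the value; the graph can be freed after backward()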

Thank you for your reply! I later realized that I only need the value of the loss, not the attached computation graph. So instead of returning loss and outputs directly, I changed my code to:

def closure():
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    return loss.detach(), outputs.detach()  # detach so the autograd graph can be freed
loss, outputs = optimizer.step(closure)
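
(If the optimizer only needs the scalar value, I suppose returning loss.item() from the closure would work as well.)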

That works fine now; no more out-of-memory error.