Pytorch LBFGS with implicit direct gradient calculation

zezo · March 8, 2021, 3:21pm

I am using pytorch to optimize a certain function f(x) . SO, I created a function f , That gives me the gradient as following:

f  = lambda x: gradient(x)
"gradient (x) is a function to calculate the gradient"

Then, I try to use LBFGS to optimize this function as follwoing:

opt = optim.LBFGS([X],lr=.1)
for _ in range(1000):
    def closure():
        opt.zero_grad()
        X.grad = -f(X)
        return X
    opt.step(closure)

But, I got this error:

RuntimeError: assigned grad has data of a different size

How can I solve this issue, with the same setup.

albanD · March 8, 2021, 3:57pm

Hi,

From the error message, it looks like the gradient computed by your f(X) is not of the same size as X. Could you verify that?

zezo · March 8, 2021, 4:05pm

Hi, no it’s same size. I used other optiimizers and all works, except LBFGS, since it needs a sort of closure.
X = m0.clone()
gr = lambda x: f.gradient(x)
opt = optim.Adam([X], lr=1e-2)
for _ in range(5000):
opt.zero_grad()
X.grad = -gr(X)
opt.step()

This Adam, and all other optimizers work

albanD · March 8, 2021, 4:48pm

Can you share the stack trace? The error is raised at line X.grad = -f(X) right?
If so, then I would add prints here.
Keep in mind that LBFGS (contrary to all other optimizers) actually evaluate the gradients multiple times after updating the value of your X.

zezo · March 8, 2021, 6:44pm

yes, That’s why i need to find a solution to bypass this issue.

albanD · March 8, 2021, 8:12pm

So it is not returning gradients with the same size in subsequent calls?

Maybe if you could share a code sample that reproduces the issue that would help debugging.

zezo · March 9, 2021, 3:30am

Thanks @AlbanD. I found a nice solution pytorch L-BFGS. I believe we need to call the function and use backward().
Thanks for your help