PyTorch LBFGS with a directly computed gradient

I am using PyTorch to optimize a certain function f(x). So I created a function f that gives me the gradient, as follows:

f = lambda x: gradient(x)
# gradient(x) is a function that calculates the gradient

Then I tried to use LBFGS to optimize this function as follows:

import torch.optim as optim

opt = optim.LBFGS([X], lr=.1)
for _ in range(1000):
    def closure():
        X.grad = -f(X)
        return X
    opt.step(closure)

But I got this error:

RuntimeError: assigned grad has data of a different size

How can I solve this issue while keeping the same setup?


From the error message, it looks like the gradient computed by your f(X) is not of the same size as X. Could you verify that?
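One quick way to verify this is to check the shape right where the gradient is assigned, so a mismatch reports both shapes. A minimal sketch; `checked_grad` is a hypothetical helper, and the lambda is an arbitrary stand-in for the real gradient function:

```python
import torch

def checked_grad(grad_fn, x):
    # Hypothetical helper: evaluate an explicit gradient function and fail
    # loudly (showing both shapes) if the result cannot be assigned to x.grad.
    g = grad_fn(x)
    if g.shape != x.shape:
        raise RuntimeError(
            f"gradient shape {tuple(g.shape)} != parameter shape {tuple(x.shape)}"
        )
    return g

X = torch.zeros(3)
# Stand-in gradient of sum(x**2); replace with the real gradient function.
X.grad = -checked_grad(lambda x: 2 * x, X)
print(X.grad.shape)
```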

Hi, no, it's the same size. I used other optimizers and they all work, except LBFGS, since it needs a closure.
X = m0.clone()
gr = lambda x: f.gradient(x)
opt = optim.Adam([X], lr=1e-2)
for _ in range(5000):
    X.grad = -gr(X)
    opt.step()

This works with Adam and all the other optimizers.

Can you share the stack trace? The error is raised at the line X.grad = -f(X), right?
If so, I would add prints there to check the shapes.
Keep in mind that LBFGS (unlike all the other optimizers) evaluates the gradients multiple times per step, after updating the value of your X.
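To see this, one can record the point at which the closure is evaluated during a single `step()` call. A minimal sketch on a toy quadratic; the objective, starting point and `lr` are arbitrary choices for illustration:

```python
import torch

x = torch.tensor([3.0, -2.0], requires_grad=True)
opt = torch.optim.LBFGS([x], lr=0.1)  # max_iter defaults to 20
evals = []  # every point at which LBFGS asked for a loss and gradient

def closure():
    evals.append(x.detach().clone())
    opt.zero_grad()
    loss = (x ** 2).sum()  # toy objective
    loss.backward()        # gradient at the *current* x
    return loss

opt.step(closure)  # a single optimizer step
print("closure calls in one step:", len(evals))
print("first two points:", evals[0].tolist(), evals[1].tolist())
```

Because x has already moved between calls, any gradient computed once outside the closure would be stale for every call after the first.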

Yes, that's why I need to find a way to work around this issue.
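One way to keep the explicit-gradient setup might be to assign the gradient inside the closure and have the closure return the objective value rather than X. LBFGS calls the closure at new points and uses the returned value in its convergence checks (and in the line search, if `line_search_fn="strong_wolfe"` is set), so the value and the gradient should describe the same function. A minimal sketch, assuming the objective value itself can also be computed; `objective`, `gradient` and `target` are hypothetical stand-ins for the poster's functions. Since the original code assigns `-f(X)` (i.e. it maximizes), the same sign flip would need to be applied to both the returned value and the assigned gradient.

```python
import torch

# Hypothetical objective and its analytic gradient; minimum at x == target.
target = torch.tensor([1.0, -2.0])

def objective(x):
    return ((x - target) ** 2).sum()

def gradient(x):
    return 2 * (x - target)

X = torch.zeros(2)  # leaf tensor; its .grad is filled in by hand, no autograd
opt = torch.optim.LBFGS([X], lr=0.1)

def closure():
    # Recompute at the current X on every call: LBFGS may call this
    # several times per step, each time at a different point.
    X.grad = gradient(X)   # must have the same shape and dtype as X
    return objective(X)    # scalar loss value consistent with the gradient

for _ in range(100):
    opt.step(closure)

print(X)
```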

So it is not returning gradients with the same size in subsequent calls?

If you could share a code sample that reproduces the issue, that would help with debugging.

Thanks @AlbanD. I found a nice solution: pytorch L-BFGS. I believe we need to call the function inside the closure and use backward().
Thanks for your help!
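For reference, the pattern described here (evaluate the function inside the closure, let `backward()` fill in the gradient, and return the loss) could look like the sketch below. The quadratic `f` and `target` are hypothetical stand-ins; `line_search_fn="strong_wolfe"` is an optional built-in setting of `torch.optim.LBFGS`, used here with the default `lr=1`.

```python
import torch

target = torch.tensor([1.0, -2.0])

def f(x):
    # Hypothetical differentiable objective; minimum at x == target.
    return ((x - target) ** 2).sum()

X = torch.zeros(2, requires_grad=True)
opt = torch.optim.LBFGS([X], line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()  # drop the gradient from the previous evaluation
    loss = f(X)      # re-evaluate at the current X
    loss.backward()  # autograd writes d(loss)/dX into X.grad
    return loss      # LBFGS needs the scalar loss, not X

for _ in range(10):
    opt.step(closure)

print(X.detach())
```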