Hi, no, it's the same size. I tried other optimizers and everything works except LBFGS, since it needs a closure.
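from torch import optim

# m0 (starting point) and f (object with a .gradient method) come from my setup
X = m0.clone()
gr = lambda x: f.gradient(x)
opt = optim.Adam([X], lr=1e-2)
for _ in range(5000):
    opt.zero_grad()
    X.grad = -gr(X)  # gradient computed outside autograd, negated so Adam ascends f
    opt.step()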
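For comparison, here is a minimal sketch of the same manual-gradient loop adapted to LBFGS. It assumes the objective value is available too, because the LBFGS closure has to return the loss (the `strong_wolfe` line search relies on it). `f_value`, `f_gradient`, `target` and the toy `m0` are hypothetical stand-ins for the `f` and `m0` above.

```python
import torch
from torch import optim

# Hypothetical stand-ins for the real `f` and `m0`: we *maximize*
# f(x) = -||x - target||^2, whose gradient is -2 (x - target).
target = torch.tensor([1.0, -2.0, 3.0])

def f_value(x):
    return -((x - target) ** 2).sum()

def f_gradient(x):
    return -2.0 * (x - target)

m0 = torch.zeros(3)
X = m0.clone().requires_grad_()  # leaf tensor; its .grad is filled in by hand
opt = optim.LBFGS([X], lr=1.0, max_iter=50, line_search_fn="strong_wolfe")

def closure():
    # LBFGS may call this several times per step(), each time after it has
    # moved X, so the gradient is recomputed from the current X on every call.
    opt.zero_grad()
    with torch.no_grad():
        X.grad = -f_gradient(X)  # minus sign: ascend f, as in the Adam loop
        loss = -f_value(X)       # LBFGS minimizes, so it must see -f
    return loss

for _ in range(5):
    opt.step(closure)

print(X.detach())
```

Because the closure recomputes both the value and the gradient from the current X every time it is called, it stays correct however often LBFGS invokes it.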

Can you share the stack trace? The error is raised at the line X.grad = -f(X), right?
If so, I would add some prints right before that line to inspect X and the value being assigned to X.grad.
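One way to do that, sketched as a hypothetical debugging helper (`set_grad_checked` is not part of PyTorch): print the shape, dtype and device of both tensors, and fail with a readable message before the `.grad` assignment does.

```python
import torch

def set_grad_checked(param, grad):
    """Hypothetical debugging helper: print the metadata of `param` and
    `grad`, check that the shapes match, then assign `grad` to `param.grad`."""
    print("param:", tuple(param.shape), param.dtype, param.device)
    print("grad: ", tuple(grad.shape), grad.dtype, grad.device)
    if grad.shape != param.shape:
        raise ValueError(
            f"gradient shape {tuple(grad.shape)} != parameter shape {tuple(param.shape)}"
        )
    # .grad must match the parameter's dtype and device, so convert explicitly
    param.grad = grad.detach().to(dtype=param.dtype, device=param.device)
```

Inside the closure, X.grad = -gr(X) would then become set_grad_checked(X, -gr(X)), so every call LBFGS makes is logged.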
Keep in mind that LBFGS, unlike the other optimizers, calls the closure (and therefore evaluates the gradient) several times within a single step(), each time after it has updated the value of your X.
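A small sketch that makes this visible: it records X at every closure call during one `step()`. The toy objective ||x - target||^2 and its hand-written gradient are assumptions chosen only for illustration; default `LBFGS` settings (no line search) are used.

```python
import torch
from torch import optim

# Hypothetical toy problem: minimize ||x - target||^2 with a hand-written gradient.
target = torch.tensor([3.0, -1.0])
x = torch.zeros(2).requires_grad_()
opt = optim.LBFGS([x], lr=1.0, max_iter=20)  # line_search_fn defaults to None

seen = []  # snapshot of x taken at the start of every closure call

def closure():
    seen.append(x.detach().clone())
    opt.zero_grad()
    with torch.no_grad():
        x.grad = 2.0 * (x - target)       # gradient of the current x
        loss = ((x - target) ** 2).sum()  # value returned to LBFGS
    return loss

opt.step(closure)  # a single step() call
print(f"closure was called {len(seen)} times during one step()")
for i, xi in enumerate(seen[:3]):
    print(i, xi.tolist())
```

The printed snapshots differ from call to call, which is why the gradient has to be computed inside the closure from the current X rather than once outside it.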