# Solving many independent optimization problems with `torch.optim`

I’m interested in doing the torch version of something like this, in Scipy:

``````import numpy as np
from scipy.optimize import minimize

def objective(x):
return np.mean(x ** 2)

# a batch of initial conditions
# batch size is 101, problem dimension is 3
x0s = np.random.normal(size=(101, 3))
# use a BFGS to solve these problems independently
xs = [minimize(objective, x0).x for x0 in x0s]
``````

In PyTorch, I could write

``````import torch
from torch.optim import LBFGS
opt = LBFGS([xs])
def closure():
# objective is the MSE, and we sum the independent objectives
# xs.shape is (101, 3) as above
loss = torch.mean(xs ** 2, 1).sum()
loss.backward()
return loss
opt.step(closure)
``````

and this works, in the sense that the code runs; however, it solves the problem as a (101*3)-dimensional problem rather than as 101 separate 3-dimensional problems, which is much more expensive when using a method which tries to approximate the Hessian, like LBFGS. It would probably also lead to a difference in stopping criteria with the Scipy code above.

I’m wondering, is my understanding here correct, and if so is there any way to run optimizations in a truly independent way with torch? Maybe some vmap strategy?

After some experiments scaling up the problem size (both the number of problems and the problem dimension), I’ve concluded that I don’t understand the full picture. torch is much faster than scipy even on huge problems. So, I’m happy about that, but would still welcome any perspectives on these questions.

For reference, the scaling benchmark I ran was:

• scipy, in a notebook so that `%timeit` works:

``````import numpy as np
from scipy.optimize import minimize

def objective(x):
return np.mean(np.diff(x) ** 2) + x ** 2

# a batch of initial conditions
# batch size is 101, problem dimension is 3
x0s = np.random.normal(size=(10001, 10))
# use a BFGS to solve these problems independently
%timeit xs1 = np.array([minimize(objective, x0, method="L-BFGS-B").x for x0 in x0s])
``````

[30.8 s ± 160 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)]

• torch:

``````%%timeit
opt = LBFGS([xs2], max_iter=100)
def closure():