I have a learning algorithm that tries to reconstruct the input data using a particular model.

My forward problem is to reconstruct each of the 10 input samples and compute the sum of the losses between the reconstructions and the inputs.

I would like the 10 reconstructions to be done in parallel on the CPU, with 10 processes.

I don’t know whether this is even possible, that is, whether autograd can be used across different processes.

I have tried many approaches, mainly with `torch.multiprocessing.spawn`, and each one seems to run into a different problem.

What is the best way to do this?

Here is the single-process code:

```
import numpy as np
import torch


def forward_routine(leaf):
    # apply_kernel, extra_param, reconstruct, loss_func, P, w, obs, S and L
    # are defined elsewhere.
    xi = lambda x: apply_kernel(leaf, x, extra_param)
    err = np.zeros([S, L])
    loss = 0.0  # becomes a tensor once the first loss term is added
    for j in range(S):  # S = 10
        # xi is a function that uses the leaf variable, so what it returns requires grad.
        # The function 'reconstruct' uses xi many times.
        # rec is the reconstruction, which requires grad.
        # P is a parameter that is constant for each reconstruction.
        # err is a numpy array.
        rec, err[j] = reconstruct(P, w[j], xi)
        loss_cur = loss_func(rec, obs[j])
        loss = loss + loss_cur
    return loss


if __name__ == "__main__":
    # x0 is a numpy array, let's say of size (D, N).
    # I give this function to scipy's LBFGS routine.
    def torch_func(x0):
        # torch.from_numpy takes no requires_grad argument, so set it afterwards.
        leaf = torch.from_numpy(x0).requires_grad_(True)
        loss = forward_routine(leaf)
        loss.backward()
        grad = leaf.grad
        # Detach before converting: the loss is still attached to the graph.
        return loss.detach().numpy(), grad.numpy()

    # Read input data
    # Run LBFGS, with torch_func as the function to minimize.
```

I know that lambda functions can’t be pickled, so I wrote a function object for `xi`, but I left it out here for brevity.
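
A minimal sketch of what such a function object could look like is shown below. The class name `KernelFn` and the body of `apply_kernel` are hypothetical placeholders for illustration only; the real kernel is not shown in this post. The sketch also shows one thing to keep in mind: a pickled copy carries its own copy of the leaf tensor.

```python
import pickle

import torch


def apply_kernel(leaf, x, extra_param):
    # Hypothetical stand-in for the real kernel (assumption, not the actual code).
    return torch.exp(-extra_param * (x - leaf) ** 2)


class KernelFn:
    """Picklable replacement for `lambda x: apply_kernel(leaf, x, extra_param)`."""

    def __init__(self, leaf, extra_param):
        self.leaf = leaf
        self.extra_param = extra_param

    def __call__(self, x):
        return apply_kernel(self.leaf, x, self.extra_param)


leaf = torch.ones(3, requires_grad=True)
xi = KernelFn(leaf, extra_param=0.5)

# Unlike a lambda, an instance of a module-level class can be pickled.
xi_copy = pickle.loads(pickle.dumps(xi))

# The copy holds its own leaf tensor, so backward() through the copy
# fills xi_copy.leaf.grad and leaves the original leaf.grad untouched.
xi_copy(torch.zeros(3)).sum().backward()
```

Because the gradient ends up on the copy's leaf, anything computed in another process from a pickled object like this would have to send its gradient back explicitly rather than rely on the original `leaf.grad`.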