There seems to be a problem mixing PyTorch's autograd with joblib. I need to compute gradients in parallel for a lot of samples. Joblib works fine with other parts of PyTorch; however, when mixed with autograd it raises errors. I made a very small example below: the serial version works fine, but the parallel version crashes.
from joblib import Parallel, delayed
import numpy as np
import torch
from torch import autograd

torch.autograd.set_detect_anomaly(True)

# Helper to build a tensor that tracks gradients
tt = lambda x, grad=True: torch.tensor(x, requires_grad=grad)

def Grad(X, Out):
    return autograd.grad(Out, X, create_graph=True, allow_unused=True)

xs, ys = [], []
for i in range(10):
    xi = tt(np.random.rand()).float()
    yi = xi * xi
    xs += [xi]
    ys += [yi]

# Serial version: works as expected
Grads_serial = [Grad(x, y) for x, y in zip(xs, ys)]
print("Grads_serial", Grads_serial)

# Parallel version: crashes
Grads_parallel = Parallel(n_jobs=2)([delayed(Grad)(x, y) for x, y in zip(xs, ys)])
print("Grads_parallel", Grads_parallel)
The error message is not very helpful either:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
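My suspicion is that joblib's default loky backend pickles the arguments into separate worker processes, which would sever the non-leaf tensors ys from the graph that still lives in the parent process, so X ends up looking "unused". If that is the cause, forcing joblib's threading backend (so everything stays in one process and nothing is pickled) should behave differently. A minimal sketch to test that, assuming the xs, ys and Grad from the snippet above; I have not verified this across PyTorch/joblib versions:

from joblib import Parallel, delayed

# Threading backend: workers share the parent process, so the autograd graph
# attached to xs/ys is not serialized away.
Grads_threads = Parallel(n_jobs=2, backend="threading")(
    [delayed(Grad)(x, y) for x, y in zip(xs, ys)]
)
print("Grads_threads", Grads_threads)

Is this the expected behavior, or is there a supported way to compute autograd gradients across processes?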