I saw that one too, but it didn’t seem to fit my use case either. My approach would be something like this…
```python
from joblib import Parallel, delayed

results = Parallel(n_jobs=-1)([delayed(train_function)(args) for _ in range(80)])
```
Here `train_function` trains a model for a fixed number of epochs or until some stopping criterion, and returns a list of validation losses per epoch (for example). When the parallel jobs are all done, `results` is simply a list containing the return values from each run of `train_function`.
I haven’t tested this approach, so I can’t say whether torch Tensors and Variables can be passed to `train_function` successfully, nor whether they can share memory properly. That said, scikit-learn uses joblib internally, so I am fairly confident that NumPy arrays can be passed to `train_function` efficiently.
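To illustrate the pattern end to end, here is a minimal runnable sketch. The `train_function` below is a dummy stand-in (not a real training loop) that just fabricates a per-epoch loss list, so only the joblib plumbing is being demonstrated; the job count and epoch count are arbitrary:

```python
from joblib import Parallel, delayed

def train_function(n_epochs):
    # Dummy stand-in for a real training loop: returns a fake
    # list of per-epoch validation losses.
    return [1.0 / (epoch + 1) for epoch in range(n_epochs)]

# Launch 8 independent "training" runs across all available cores.
# Parallel returns a list with one entry per delayed call, in order.
results = Parallel(n_jobs=-1)(delayed(train_function)(5) for _ in range(8))

print(len(results))   # number of completed runs
print(results[0])     # per-epoch losses from the first run
```

Note that joblib pickles the arguments and return values to ship them between processes, which is why plain Python objects and NumPy arrays work smoothly, while GPU tensors may not.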