How to parallelize cross-validation

I have a CUDA 9 Docker image with TensorFlow and PyTorch installed, and I am doing cross-validation on an image dataset. Currently I use a for loop for the cross-validation, something like:

from sklearn.model_selection import KFold

for train_idx, test_idx in KFold(n_splits=5).split(all_data):
    train(all_data[train_idx])
    test(all_data[test_idx])

But the for loop takes too long. Will the following code work to parallelize it? Maybe there is already a solution, but note that this is not data parallelization.

from multiprocessing import Pool
from sklearn.model_selection import KFold

def f(fold):
    # Each fold is a (train_idx, test_idx) pair; Pool.map passes one
    # argument per call, so the pair is unpacked inside the worker.
    train_idx, test_idx = fold
    train_result = train(all_data[train_idx])
    test_result = test(all_data[test_idx])
    save_train_result(train_result)
    save_test_result(test_result)

if __name__ == '__main__':
    folds = list(KFold(n_splits=5).split(all_data))
    with Pool(5) as p:
        p.map(f, folds)

I am not sure whether the multiprocessing will parallelize only on the CPU, or on both the CPU and the GPU. This might be easier than doing the parallelism inside a model, I guess, as in Parallelize simple for-loop for single GPU, since in my case there is no need to communicate across processes.

I have the same issue. Does anyone have a solution?

I found that the following may help, but I haven't checked the performance yet.

I am not sure whether the multiprocessing will parallelize only on the CPU, or on both the CPU and the GPU.

According to the official documentation of Process Pools:

processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.

Therefore, I believe it only does the parallelization on the CPU: Pool spawns ordinary worker processes, and nothing in multiprocessing itself schedules work onto the GPU. If a worker should use a GPU, it has to claim one explicitly.
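As a minimal sketch of what that could look like, assuming the 'spawn' start method (a forked worker cannot re-initialize CUDA), a hypothetical NUM_GPUS constant, and the train/test/all_data from the question:

import os
from multiprocessing import get_context
from sklearn.model_selection import KFold

NUM_GPUS = 2  # assumption: number of GPUs visible inside the container

def run_fold(args):
    fold_id, (train_idx, test_idx) = args
    # Pool workers are plain CPU processes; each one claims a GPU by
    # masking the devices before any CUDA context is created, so the
    # framework in this process only ever sees its assigned card.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(fold_id % NUM_GPUS)
    train(all_data[train_idx])
    test(all_data[test_idx])

if __name__ == '__main__':
    folds = list(enumerate(KFold(n_splits=5).split(all_data)))
    # 'spawn' gives each worker a fresh interpreter with no inherited
    # CUDA state from the parent process.
    with get_context('spawn').Pool(NUM_GPUS) as p:
        p.map(run_fold, folds)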

There is no need to communicate across processes?

I think even if you parallelize the loop here, you will still have to sync gradients/weights across processes, because there is actually an implicit data dependency on the model weights.

If you parallelize the cross-validation for loop without communication, you will evolve a separate model in each process, and this will slow down convergence.
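To make the trade-off concrete, here is a sketch of the communication-free version: every worker builds and trains its own model from scratch, and only the per-fold scores are sent back to the parent. Here build_model and evaluate are hypothetical stand-ins, and train is assumed to take the model as its first argument:

from multiprocessing import Pool
from sklearn.model_selection import KFold
import numpy as np

def run_fold(fold):
    train_idx, test_idx = fold
    model = build_model()              # fresh, independent weights per fold
    train(model, all_data[train_idx])  # no gradient/weight sync with other folds
    return evaluate(model, all_data[test_idx])

if __name__ == '__main__':
    folds = KFold(n_splits=5).split(all_data)
    with Pool(5) as p:
        scores = p.map(run_fold, folds)
    # Only the scalar scores cross the process boundary; the five
    # trained models remain separate and are never merged.
    print(np.mean(scores), np.std(scores))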