Parallel k-fold cross validation on one GPU

My model training uses k-fold cross-validation, and I’m exploring whether it’s possible to parallelize the k-fold process on a single GPU. Specifically, can each fold run in a separate stream on the same GPU, dispatching folds until the cores are fully utilized? For instance, if a GPU can handle 3 streams at once, and I have 6 folds, parallelism could theoretically reduce the cross-validation time to a third.

Does this approach make sense, or does PyTorch already handle parallelization for cross-validation?

Thanks!

I don’t believe there is native support for this.

Assuming you have enough VRAM for the above, pre-making the folds and submitting them as separate jobs (independent training processes) would be a simple way to achieve this.
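A minimal sketch of that idea, using only the standard library: `make_folds` pre-computes the k train/validation index splits, and a `multiprocessing` pool acts as a stand-in for a real job launcher, capping concurrency at 3 folds at a time (the "3 streams" figure from the question). `train_one_fold` is a placeholder; in practice it would build the model, move it to the GPU, and run the actual training loop.

```python
import multiprocessing as mp


def make_folds(n_samples, k):
    """Split sample indices into k (train_idx, val_idx) pairs."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any remainder
        end = start + fold_size if i < k - 1 else n_samples
        val_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        folds.append((train_idx, val_idx))
    return folds


def train_one_fold(fold_id, train_idx, val_idx):
    # placeholder for the real per-fold training job
    return fold_id, len(train_idx), len(val_idx)


if __name__ == "__main__":
    folds = make_folds(n_samples=60, k=6)
    # run at most 3 folds concurrently, dispatching the rest as slots free up
    with mp.Pool(processes=3) as pool:
        results = pool.starmap(
            train_one_fold,
            [(i, tr, va) for i, (tr, va) in enumerate(folds)],
        )
    print(results)
```

One caveat if the jobs actually touch CUDA: subprocesses that use the GPU need the `spawn` start method (PyTorch ships `torch.multiprocessing` for this), and kernels from separate processes time-slice on one GPU rather than truly overlapping unless something like NVIDIA MPS is in use, so the wall-clock gain depends on how far each single fold is from saturating the device.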