My model training uses k-fold cross-validation, and I’m exploring whether the k-fold process can be parallelized on a single GPU. Specifically, could each fold run in a separate CUDA stream on the same GPU, dispatching folds until the GPU’s compute units are saturated? For instance, if a GPU can execute 3 streams concurrently and I have 6 folds, this could in theory cut the cross-validation time to a third of the sequential time.
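For concreteness, here is a rough sketch of what I have in mind. The model, data, and fold setup are placeholders of my own, not my real training code, and the stream logic falls back to plain sequential execution when no GPU is present. I’m aware that calling `.item()` inside each fold forces a synchronization, so a real version would need to defer host-side reads for the streams to actually overlap:

```python
import torch
import torch.nn as nn
from contextlib import nullcontext

def make_folds(n_samples: int, k: int):
    """Split indices [0, n_samples) into k contiguous validation folds."""
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        val = list(range(i * fold_size, (i + 1) * fold_size))
        val_set = set(val)
        train = [j for j in range(n_samples) if j not in val_set]
        folds.append((train, val))
    return folds

def train_one_fold(X, y, train_idx, val_idx, device, stream=None):
    """Train a tiny placeholder model on one fold, optionally inside a CUDA stream."""
    ctx = torch.cuda.stream(stream) if stream is not None else nullcontext()
    with ctx:
        model = nn.Linear(X.shape[1], 1).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.MSELoss()
        for _ in range(20):
            opt.zero_grad()
            loss = loss_fn(model(X[train_idx]), y[train_idx])
            loss.backward()
            opt.step()
        # NOTE: .item() synchronizes, which limits cross-stream overlap here.
        val_loss = loss_fn(model(X[val_idx]), y[val_idx]).item()
    return val_loss

def cross_validate(X, y, k=6, max_streams=3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    X, y = X.to(device), y.to(device)
    folds = make_folds(len(X), k)
    if device == "cuda":
        # Round-robin folds over a small pool of streams; kernels from
        # different streams *may* overlap if SM capacity allows.
        streams = [torch.cuda.Stream() for _ in range(max_streams)]
        losses = [train_one_fold(X, y, tr, va, device, streams[i % max_streams])
                  for i, (tr, va) in enumerate(folds)]
        torch.cuda.synchronize()
    else:
        losses = [train_one_fold(X, y, tr, va, device) for tr, va in folds]
    return losses

torch.manual_seed(0)
X = torch.randn(60, 4)
y = X @ torch.randn(4, 1)
losses = cross_validate(X, y, k=6, max_streams=3)
print(len(losses))
```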
Does this approach make sense, or does PyTorch already handle parallelization for cross-validation?
Thanks!