Best practice for training & validating on multiple GPUs?

Several configurations I can think of:

  1. Train and validate on all the same GPUs (cannot set a different batch_size for training and validation)
  2. Train and validate on different GPUs (can set different batch sizes)
  3. Train on all GPUs, save a checkpoint each epoch, and run validation on the saved checkpoints afterwards (cannot use early stopping on validation loss; see the sketch below)
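To make option 3 concrete, here is a minimal sketch, assuming a plain PyTorch DistributedDataParallel setup launched with `torchrun --nproc_per_node=<num_gpus>`; the tiny linear model and random dataset are placeholders for your own code:

```python
# Hypothetical sketch of configuration 3: train with DDP on all GPUs,
# save a checkpoint every epoch, and validate the checkpoints later.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    device = rank % torch.cuda.device_count()
    torch.cuda.set_device(device)

    # Placeholder model and data -- replace with your own.
    model = DDP(torch.nn.Linear(10, 2).to(device), device_ids=[device])
    train_ds = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(train_ds)
    train_loader = DataLoader(train_ds, batch_size=64, sampler=sampler)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(5):
        sampler.set_epoch(epoch)
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            loss = loss_fn(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if rank == 0:
            # One checkpoint per epoch; a separate script runs validation on these.
            torch.save(model.module.state_dict(), f"epoch_{epoch}.pt")
        dist.barrier()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```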

What is the best practice?
Any other thoughts and suggestions will be appreciated.

It all depends on your goals. If you want to maximize validation throughput, use as many devices as you can. If throughput doesn't matter and you want to keep your code simple, you can validate on just one.
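If you do validate on all devices, each rank only sees its own shard of the validation set, so per-rank metrics have to be aggregated. A minimal sketch, assuming a DDP setup like the one above and a validation loader that uses a DistributedSampler:

```python
# Each rank evaluates its shard; the mean loss is combined with all_reduce
# so every rank ends up with the same value (usable for early stopping).
import torch
import torch.distributed as dist

@torch.no_grad()
def validate(model, val_loader, loss_fn, device):
    model.eval()
    totals = torch.zeros(2, device=device)  # [sum of batch losses, batch count]
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        totals[0] += loss_fn(model(x), y)
        totals[1] += 1
    dist.all_reduce(totals, op=dist.ReduceOp.SUM)
    return (totals[0] / totals[1]).item()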