Running validation on CPU when network is on GPU?

What is the best way to run validation on the CPU when the network lives on the GPU? I want to run the entire validation dataset during the validation phase, which of course will not fit on the GPU, so I would expect to run it on the CPU whenever I want a validation pass. What's the best approach to doing this in PyTorch? The simplest (but probably not most efficient) way I can think of is to copy the entire network to the CPU on each validation run (is there an easy way to do this?). Alternatively, perhaps a CPU copy of the variables could be retained and just have their values updated before each validation. Or is there a better way still? Or would it be recommended to just run the validation dataset in batches on the GPU? Other thoughts/suggestions? If copying the variables or their values is the recommended way, would someone mind giving a pointer to which function I might use to copy those values over? Thank you!
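For the copy-the-network idea, one minimal sketch (the toy model here is purely illustrative) is to deep-copy the model and move the copy to the CPU, leaving the GPU original untouched so training can continue:

```python
import copy

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy network standing in for the real model (hypothetical architecture).
model = nn.Linear(10, 2).to(device)

# Deep-copy the model and move the copy to CPU; the original keeps
# its device and parameters, so training is unaffected.
cpu_model = copy.deepcopy(model).cpu()
cpu_model.eval()  # disable dropout / batch-norm updates for validation

with torch.no_grad():          # no autograd graph needed for validation
    out = cpu_model(torch.randn(4, 10))  # a validation batch on CPU
print(out.shape)
```

`copy.deepcopy` copies the parameter tensors as well, so subsequent training steps on the GPU model do not leak into the CPU copy; repeating the deep-copy before each validation run keeps the CPU copy in sync.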

One approach is to have two processes in two different terminals.

  • The first process runs training and, at the end of each epoch, writes the model to disk (`torch.save(model.state_dict(), path)` or something similar).
  • The second process polls for a valid checkpoint on disk; once it detects the file, it loads the checkpoint and runs it on the validation set.
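The two steps above could be sketched as follows (both halves shown in one file for brevity; the checkpoint path and toy model are assumptions, not anything prescribed by PyTorch). Writing to a temp file and renaming avoids the validator ever reading a half-written checkpoint, since the rename is atomic on POSIX filesystems:

```python
import os
import time

import torch
import torch.nn as nn

CKPT = "checkpoint.pt"       # hypothetical path shared by both processes
model = nn.Linear(10, 2)     # toy model standing in for the real network

# --- process 1 (training): save at the end of each epoch ---
torch.save(model.state_dict(), CKPT + ".tmp")
os.replace(CKPT + ".tmp", CKPT)   # atomic rename: no partial files visible

# --- process 2 (validation): poll, then load onto the CPU ---
while not os.path.exists(CKPT):
    time.sleep(1)                 # wait for a checkpoint to appear
val_model = nn.Linear(10, 2)      # same architecture, built on CPU
val_model.load_state_dict(torch.load(CKPT, map_location="cpu"))
val_model.eval()
# ...run the validation set through val_model here...
```

`map_location="cpu"` is the key detail: it remaps the GPU-saved tensors onto the CPU at load time, so the validation process never needs CUDA at all.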

Personally, I just run train -> validation -> train -> validation, all on GPU, because the GPU is so much faster.
