Moving the model and data between the CPU and GPU with
.cuda() can make the code messy if not done properly.
Are there any recommended guidelines/standard practices for this?
What about moving data to the GPU in the
forward(self, ...) function of a network and returning the results after moving them back to the CPU?
Something along the lines of:

def forward(self, x):
    x = x.cuda()
    x = self.fc(x)
    return x.cpu()
Is this a good idea or are there reasons to not go in this direction?
Moving or creating data via
cuda() inside the model might yield errors if you are using
DistributedDataParallel, since the model will be copied to each specified device, while your
cuda() call will move the data to the default device.
Of course you could specify the device ID by reading it from the attribute of an internal parameter, e.g. via
x = x.to(self.param.device), but I would generally push the data to the device in the training loop instead.
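A minimal sketch of the parameter-device approach (the Net module and its layer sizes are made up for illustration; here the device is read from the layer's own weight rather than a hypothetical self.param):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Move the input to wherever this replica's parameters live,
        # so the same code works on CPU, a single GPU, or under DDP,
        # where each copy of the model sits on a different device.
        x = x.to(self.fc.weight.device)
        return self.fc(x)

model = Net()
out = model(torch.randn(3, 4))
```

This avoids hard-coding .cuda(), but it still couples device handling to the model, which is why keeping the .to(device) call in the training loop is usually cleaner.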
I’m not sure if you need the result on the CPU, but note that this call will synchronize your code.
If you are training and thus need the output for the subsequent loss calculation, I would just leave the output on the device.
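Putting that together, a sketch of the usual pattern, assuming a toy Net module made up for illustration: the training loop owns device placement, and the output stays on the device for the loss:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # No device handling here: the training loop owns placement.
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net().to(device)
x = torch.randn(8, 4)

# Push the input to the device in the loop, not inside forward().
out = model(x.to(device))

# Leave the output on the device for the loss; no .cpu() (and thus
# no forced synchronization) is needed during training.
loss = out.sum()
loss.backward()
```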
Fair enough, that makes sense. Thanks!
For my current situation I am playing around with Reinforcement Learning algorithms, and the inputs to the network could come from a variety of places, including gym and my replay buffer.
Also, right now I am doing some numpy operations after the forward pass and before the loss calculation, so maybe changing those operations to run without grad on torch tensors, instead of using numpy, would be more elegant in this case.
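That swap could look like the following sketch (the argmax/softmax post-processing is a made-up stand-in for whatever numpy ops you currently run): the intermediate work happens in torch.no_grad(), so the tensors can stay on the GPU and no autograd graph is built for them:

```python
import torch

# Stand-in network output; requires_grad=True as it would after a forward pass.
logits = torch.randn(5, 3, requires_grad=True)

with torch.no_grad():
    # Post-processing done with torch instead of numpy: no .cpu()/.numpy()
    # round trip, and nothing here is recorded for backprop.
    actions = logits.argmax(dim=1)
    probs = torch.softmax(logits, dim=1)

# logits is untouched and still usable for the actual loss calculation.
```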