I have an inference routine that runs prediction on the GPU:

for batch in data:
    pred_subcube = model(batch.cuda())
Here each batch is a tensor of shape 5x1x128x128x128 (batch size 5, 1 channel, and 128x128x128 is the shape of my 3D image) yielded by a DataLoader, and model is my U-Net-like 3D CNN.
I would then like to write 'pred' into part of a NumPy array:

Is it efficient in my case to predict on the GPU and then send the result to the CPU to fill my "resulting_cube" array? Wouldn't it be more efficient to predict on the CPU in the first place, so no time is wasted detaching and transferring?
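To make the question concrete, here is a minimal sketch of the GPU-predict-then-copy-back pattern being asked about. The model, the batch list, and the shapes are stand-ins (a single Conv3d and small 8x8x8 volumes instead of the real U-Net and 128x128x128 cubes), and the code falls back to CPU when no GPU is present:

```python
import numpy as np
import torch

# Stand-in for the U-Net-like model and the DataLoader batches in the
# question; shapes are shrunk (8^3 instead of 128^3) to keep the demo fast.
model = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batches = [torch.randn(5, 1, 8, 8, 8) for _ in range(3)]

preds = []
with torch.no_grad():                     # no autograd graph -> no detach() needed
    for batch in batches:
        pred = model(batch.to(device))    # forward pass on the chosen device
        preds.append(pred.cpu().numpy())  # device-to-host copy, then to NumPy

# Assemble the per-batch predictions into one resulting array.
resulting_cube = np.concatenate(preds, axis=0)
print(resulting_cube.shape)
```

Running inference under `torch.no_grad()` means the outputs carry no autograd history, so the only real cost of `.cpu().numpy()` is the device-to-host memory copy itself.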
I am not an expert on this, so take it for what it is worth, but my hunch is that once the model is bigger than some small limit, the time spent copying the results back to the CPU will be negligible compared to the time it would take to compute them on the CPU in the first place.
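The cheapest way to settle it is to measure both paths on your own model and batch size. A rough timing sketch (the helper name and the small stand-in model are illustrative, not from the question; the copy back to host is deliberately included in the timed region):

```python
import time
import torch

def time_inference(model, batch, device, iters=5):
    """Average seconds per forward pass, including the copy back to CPU."""
    model = model.to(device)
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(iters):
            out = model(batch.to(device))
            _ = out.cpu()  # .cpu() blocks until the GPU result is ready
    return (time.perf_counter() - start) / iters

# Stand-in model and a small batch; substitute your U-Net and real shapes.
model = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1).eval()
batch = torch.randn(5, 1, 16, 16, 16)

cpu_t = time_inference(model, batch, "cpu")
print(f"CPU: {cpu_t * 1e3:.2f} ms/batch")
if torch.cuda.is_available():
    gpu_t = time_inference(model, batch, "cuda")
    print(f"GPU (incl. copy back): {gpu_t * 1e3:.2f} ms/batch")
```

Because `.cpu()` forces synchronization, the GPU timing honestly charges the transfer cost the question is worried about; for a realistically sized 3D CNN the compute savings on the GPU usually dwarf that copy.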