Efficiency of predictions using GPU vs CPU

I have an inference routine that runs prediction on the GPU:

for i, batch in enumerate(data):  # data is the DataLoader
    pred_subcube = model(batch.cuda())  # move the batch to the GPU and run the forward pass

Here data is a DataLoader, each batch is a tensor of shape 5x1x128x128x128 (5 is the batch size, 1 is the number of channels, and 128x128x128 is the shape of my 3D image), and model is my U-Net-like 3D CNN.
I would then like to store each prediction in a NumPy array:

resulting_cube[i] = pred_subcube.detach().cpu().numpy()  # detach from the graph, copy to host memory, convert to NumPy

Is it efficient in my case to predict on the GPU and then send the result to the CPU to fill my "resulting_cube" array? Or would it be more efficient to predict on the CPU, so that I don't waste time detaching and copying?
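
For context, putting the two snippets together, the whole routine looks roughly like this (a simplified sketch; the exact DataLoader setup and the output shape are assumptions, and torch.no_grad() is added so that no autograd graph is built during inference):

import numpy as np
import torch

# Pre-allocate the output array; the shape here just mirrors the
# 5x1x128x128x128 sub-cubes and assumes the model output has the same shape.
resulting_cube = np.empty((len(data), 5, 1, 128, 128, 128), dtype=np.float32)

model = model.cuda().eval()

with torch.no_grad():  # no graph is built, so detach() becomes unnecessary
    for i, batch in enumerate(data):
        pred_subcube = model(batch.cuda())              # forward pass on the GPU
        resulting_cube[i] = pred_subcube.cpu().numpy()  # device-to-host copy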

I am not an expert on this, so take it for what it is worth, but my hunch is that if the model is bigger than some small limit, the time spent detaching and copying the results will be negligible compared to the time it would take to compute them on the CPU.

That's exactly the idea behind my question :) How deep does my model have to be before it's worth running it on the GPU and then detaching?

The one sure way to figure this out is to run some timed tests in each mode and compare. I'm sorry, but I don't know enough to make a good guess.
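
If it helps, a rough way to do that comparison could look like this (just a sketch, assuming the model and DataLoader from the original post and that a CUDA device is available):

import time
import torch

def time_inference(model, loader, device):
    # Roughly time one full pass over the loader on the given device,
    # including the copy of the predictions back to NumPy.
    model = model.to(device).eval()
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish any pending GPU work first
    start = time.perf_counter()
    with torch.no_grad():
        for batch in loader:
            out = model(batch.to(device))
            _ = out.cpu().numpy()  # device-to-host copy counted in the timing
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels to finish
    return time.perf_counter() - start

gpu_seconds = time_inference(model, data, torch.device("cuda"))
cpu_seconds = time_inference(model, data, torch.device("cpu"))
print(f"GPU: {gpu_seconds:.2f} s, CPU: {cpu_seconds:.2f} s")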

Yeah, that's a nice suggestion, but I thought there might already be some insight from experts who have tested these routines.