I have an inference routine that runs predictions on the GPU:

```
for i in range(len(data)):
    pred_subcube = model(data[i].cuda())
```

Here `data[i]` is one item from a DataLoader (a tensor of shape 5x1x128x128x128, where 5 is the batch size, 1 is the number of channels, and 128x128x128 is the shape of my 3D image), and `model` is my U-Net-like 3D CNN.

I would then like to store each prediction as part of a NumPy array:

`resulting_cube[i] = pred_subcube.detach().cpu().numpy()`
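For context, here is a self-contained sketch of what I am doing. The names and shapes are stand-ins (2 batches of 5x1x16x16x16 instead of 128³, and a single `Conv3d` in place of the U-Net) so it runs quickly; I also wrap the loop in `torch.no_grad()`, so no autograd graph is built and `detach()` becomes unnecessary:

```python
import numpy as np
import torch
import torch.nn as nn

# Fall back to CPU when no GPU is available, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the U-Net-like 3D CNN: a single 3D convolution.
model = nn.Conv3d(1, 1, kernel_size=3, padding=1).to(device).eval()

# Stand-in for the DataLoader: a list of 2 batches of shape 5x1x16x16x16.
data = [torch.randn(5, 1, 16, 16, 16) for _ in range(2)]

# Preallocate the result array and fill it batch by batch.
resulting_cube = np.empty((len(data), 5, 1, 16, 16, 16), dtype=np.float32)

with torch.no_grad():  # inference only: no graph, so no detach() needed
    for i in range(len(data)):
        pred_subcube = model(data[i].to(device))
        resulting_cube[i] = pred_subcube.cpu().numpy()
```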

Is it efficient in my case to predict on the GPU and then send the result to the CPU to fill my `resulting_cube` array? Wouldn't it be more efficient to predict on the CPU, so that no time is wasted on `detach()` and the device-to-host transfer?