which means that you are going to:
detach --> cut the tensor out of the computational graph
cpu --> move the tensor to CPU memory (RAM)
clone --> copy the tensor so that later in-place changes do not modify the output
numpy --> convert the tensor to a numpy array
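
The chain above can be sketched as follows. This is only a minimal illustration: `out` is a hypothetical stand-in for a network output (not a name from your code), created with requires_grad=True so that it behaves like a tensor attached to the graph.

```python
import numpy as np
import torch

# Hypothetical stand-in for a network output that is attached to the graph.
out = torch.rand(1, 3, 4, 4, requires_grad=True)

# Calling .numpy() directly on a tensor that requires grad raises a RuntimeError.
try:
    out.numpy()
    direct_call_failed = False
except RuntimeError:
    direct_call_failed = True

# detach -> cpu -> clone -> numpy, in that order.
arr = out.detach().cpu().clone().numpy()

# Because of the clone, the array does not share memory with the tensor,
# so writing into it leaves `out` untouched.
shares = np.shares_memory(arr, out.detach().numpy())
arr[0, 0, 0, 0] = 42.0
```

Without the .clone(), the array and the tensor would share the same memory on the CPU, so an in-place write to one would show up in the other.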

Note: permute is a pytorch function. Once you have converted to a numpy array, you should use transpose instead.
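
As a small sketch of the difference, assuming a hypothetical image tensor in channels-first (C, H, W) layout that you want in (H, W, C) order, e.g. for plotting:

```python
import numpy as np
import torch

# Hypothetical image in (C, H, W) layout.
img_t = torch.rand(3, 32, 48)

# On a Tensor: permute reorders the dimensions.
hwc_t = img_t.permute(1, 2, 0)

# On a numpy array: transpose takes the same axis order.
img_np = img_t.numpy()
hwc_np = img_np.transpose(1, 2, 0)  # equivalent to np.transpose(img_np, (1, 2, 0))
```

Both give the same values in the same layout; only the method name changes depending on the type of the object.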

A Tensor does have a .detach() method, so make sure you are actually calling it on a Tensor.

Also, you use both img and seg_pred in your code. Make sure to call .detach().cpu().numpy() on each of them if you need a numpy array from them. I think the .clone() is not necessary in this case; if you get an error from numpy saying that you are trying to modify a read-only array, then add it back.

I think the simplest thing is for you to print the objects and check whether you have a Tensor or a np.array. For a Tensor, if no device is shown, it is on the cpu; otherwise it will tell you that it is a cuda Tensor.
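
If printing is too noisy, a small helper can do the same check. This is just a sketch; `describe` is a made-up name, not a pytorch or numpy function.

```python
import numpy as np
import torch


def describe(x):
    """Return a short description of what kind of object x is."""
    if isinstance(x, torch.Tensor):
        return f"Tensor on {x.device}, requires_grad={x.requires_grad}"
    if isinstance(x, np.ndarray):
        return f"np.ndarray, dtype={x.dtype}"
    return type(x).__name__


tensor_info = describe(torch.zeros(2))
array_info = describe(np.zeros(2))
```

Calling it on img and seg_pred right before the failing line should tell you which conversion is missing.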

You need to give a Tensor to your model and to torch operations, and a np.array to everything else.

To go from np.array to cpu Tensor, use torch.from_numpy().
To go from cpu Tensor to gpu Tensor, use .cuda().
To go from a Tensor that requires_grad to one that does not, use .detach() (in your case, your net output will most likely require gradients, so its output will need to be detached).
To go from a gpu Tensor to cpu Tensor, use .cpu().
To go from a cpu Tensor to np.array, use .numpy().
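
Put together, the round trip could look roughly like this. It is a sketch under a few assumptions: `model` is a hypothetical torch.nn.Linear standing in for your net, and .to(device) is used in place of .cuda() so that the example also runs on a machine without a GPU.

```python
import numpy as np
import torch

# Use the GPU if there is one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for your net.
model = torch.nn.Linear(3, 2).to(device)

# np.array -> cpu Tensor
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
t_cpu = torch.from_numpy(arr)

# cpu Tensor -> gpu Tensor (equivalent to .cuda() when a GPU is present)
t_dev = t_cpu.to(device)

# The model output requires gradients because the model parameters do.
out = model(t_dev)

# requires_grad Tensor -> detached Tensor -> cpu Tensor -> np.array
result = out.detach().cpu().numpy()
```

The order at the end matters: .numpy() only works on a cpu Tensor that does not require gradients, which is why .detach() and .cpu() come first.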