TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first

for epoch in range(start_epoch, start_epoch + epochs):

    print('\n\n\nEpoch: {}\n<Train>'.format(epoch))
    loss = 0
    learning_rate = learning_rate * (0.5 ** (epoch // 4))
    for param_group in optimizer.param_groups:
        param_group["lr"] = learning_rate  # PyTorch optimizers read the "lr" key
    for idx, (inputs, targets, paths) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = net(inputs)
        if type(outputs) == tuple:
            outputs = outputs[0]
        batch_loss = dice_coef(outputs, targets)
        loss += float(batch_loss)
        progress_bar(idx, len(trainloader), 'Loss: %.5f, Dice-Coef: %.5f'
                     % ((loss / (idx + 1)), (1 - (loss / (idx + 1)))))
    log_msg = '\n'.join(['Epoch: %d  Loss: %.5f,  Dice-Coef:  %.5f' \
                         % (epoch, loss / (idx + 1), 1 - (loss / (idx + 1)))])

def dice_coef(preds, targets, backprop=True):
    smooth = 1.0
    class_num = 2
    if backprop:
        for i in range(class_num):
            pred = preds[:,i,:,:]
            target = targets[:,i,:,:]
            intersection = (pred * target).sum()
            loss_ = 1 - ((2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth))
            if i == 0:
                loss = loss_
            else:
                loss = loss + loss_
        loss = loss/class_num
        return loss
    else:
        # Need to generalize
        targets = np.array(targets.argmax(1))
        if len(preds.shape) > 3:
            preds = np.array(preds).argmax(1)
        for i in range(class_num):
            pred = (preds==i).astype(np.uint8)
            target= (targets==i).astype(np.uint8)
            intersection = (pred * target).sum()
            loss_ = 1 - ((2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth))
            if i == 0:
                loss = loss_
            else:
                loss = loss + loss_
        loss = loss/class_num
        return loss

I am facing this error. Does anyone know how to solve it?

As the error message suggests, you have to move the tensor to the CPU via tensor.cpu() before converting it to a numpy array.
In particular, np.array(targets.argmax(1)) seems to raise the error, so use this instead:

targets = targets.argmax(1).cpu().numpy()
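
For anyone who wants to try the conversion in isolation, here is a minimal sketch. The one-hot `targets` tensor and its shape are made up for illustration, and the device falls back to the CPU when no GPU is available, so the snippet runs either way.

```python
import numpy as np
import torch

# Use the GPU when present; .cpu() is a no-op for tensors already on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical one-hot targets: batch of 4, 2 classes, 8x8 masks, all class 1.
targets = torch.zeros(4, 2, 8, 8, device=device)
targets[:, 1] = 1.0

# argmax runs on the tensor's device; only the result is copied to host memory.
labels = targets.argmax(1).cpu().numpy()
print(type(labels), labels.shape)
```

Doing the argmax before `.cpu()` means only the smaller index tensor is copied, not the full one-hot tensor.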


PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier.


Thanks for your help.

Oh I see, thanks for your advice. I will try it ^^

I have a similar error that I think is arising when I try to use sklearn’s roc_auc_score:

Exception has occurred: TypeError
Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 57, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/home/ubuntu/deep-behavior-embedding/src/model/lightning_model.py", line 144, in validation_step
    auc = roc_auc_score(y_true=y, y_score=torch.sigmoid(x))
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 522, in roc_auc_score
    y_type = type_of_target(y_true)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/sklearn/utils/multiclass.py", line 261, in type_of_target
    if is_multilabel(y):
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/sklearn/utils/multiclass.py", line 147, in is_multilabel
    y = np.asarray(y)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/numpy/core/_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ubuntu/deep-behavior-embedding/.virtualenv/lib/python3.7/site-packages/torch/_tensor.py", line 643, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
  File "/home/ubuntu/deep-behavior-embedding/finetune.py", line 108, in <module>
    trainer.fit(model, datamodule=dm)

The code is here:

    def validation_step(self, batch, batch_idx):
        x = batch['src']
        y = batch['label']
        mask = batch['mask']

        x = self.base_model(x, mask)
        x = self.linear(x).mean(axis=1).squeeze(1)
        loss = F.binary_cross_entropy_with_logits(input=x,
                                                  target=y)
        self.log('val_loss', loss)
        try:
            auc = roc_auc_score(y_true=y, y_score=torch.sigmoid(x))
            self.log('auc', auc)
        except ValueError:
            self.log('auc', 0)
        return loss

Should I be converting to numpy arrays? I’m afraid this might slow things down.
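
A possible approach, sketched under assumptions: scikit-learn works on NumPy arrays in host memory, so the tensors have to be copied to the CPU before the call in any case. The `logits` and `labels` values below are stand-ins for `x` and `y` from one validation batch, not data from the post.

```python
import torch
from sklearn.metrics import roc_auc_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for the logits `x` and the labels `y` of one validation batch.
logits = torch.tensor([2.0, -1.0, 0.5, -3.0], device=device)
labels = torch.tensor([1, 0, 1, 0], device=device)

# Detach from autograd, copy to host memory, then hand NumPy arrays to sklearn.
y_score = torch.sigmoid(logits).detach().cpu().numpy()
y_true = labels.detach().cpu().numpy()

auc = roc_auc_score(y_true=y_true, y_score=y_score)
print(auc)
```

Only one score and one label per sample are copied here, so the transfer is small compared with the forward pass. Whether it matters in practice depends on the batch size and how often the metric is computed.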

Sorry for reviving an old thread. Is there any reason that PyTorch explicitly raises an exception instead of simply calling .cpu() internally inside .numpy()?

I personally find it annoying to have to write .cpu().numpy() every time. For attached tensors this is even more annoying with .detach().cpu().numpy() which is very verbose.

Have you considered integrating these steps into the .numpy() call? Is there some risk?

Just use .tolist() instead of .cpu().numpy()

Yes, I think the explicit nature of detaching the tensor and moving it to the CPU makes clear that:

  • Autograd won’t be able to track the operations performed on the result array,
  • all operations will be performed on the CPU on the result array.

While some users could find it annoying, I think it can avoid a lot of confusion especially for novice users as the detaching operation is explicit.
You can find a lot of topics here where a user was wondering why the training got stuck after using a threshold operation (which implicitly detaches the result). Of course it makes sense once you know that a threshold is not differentiable, but a user might still spend some time in debugging it.
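
If the verbosity is the main complaint, a small helper is one way around it. The sketch below uses a hypothetical function named `to_numpy` (not part of PyTorch) and also shows that calling `.numpy()` directly on a tensor that requires grad raises an error.

```python
import numpy as np
import torch

def to_numpy(t):
    """Hypothetical helper: detach from autograd, copy to host, convert."""
    return t.detach().cpu().numpy()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
w = torch.ones(3, device=device, requires_grad=True)
y = w * 2.0  # y is tracked by autograd

# Calling .numpy() directly fails: y requires grad (and may live on the GPU).
try:
    y.numpy()
    raised = False
except (RuntimeError, TypeError):
    raised = True

arr = to_numpy(y)  # plain NumPy array, no autograd history
print(raised, arr)
```

Anything computed from `arr` afterwards is invisible to autograd, which is the point made above about keeping the detach step explicit.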

Not sure what you mean? My tensors are typically images or feature maps, and I want to e.g. show them as images, or process them further in numpy. tolist() doesn’t make sense there.

Any suggestions for my code? I am also facing the same error. I have tried argmax(1).cpu().numpy()

losses = []

def train_epoch(models, criterion, optimizers, dataloaders):
    global iters
    train_loss = []
    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()
        iters += 1
        scores, _, features = models(inputs) 
        target_loss = criterion(scores, labels)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)    
        loss = m_backbone_loss
    return loss

Which line in this code is giving you the error?

Also, could you please paste the error message as it shows?

Error: can’t convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I want to plot losses

By any chance, are you using the list losses to get the values to plot?

If yes, then the problem lies in these two lines -

Replace both of them with this single line -


Let me know if you still face the error.

Thank you so much, Srishti!
But I am still facing the same error.

Please share the code that gives you the error if this doesn’t work -


Or this -


(And sorry, I forgot to include .cpu() in my previous reply - have edited it now.)

yeah this .cpu() helped to resolve.
Thank you so much!
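
For readers who hit the same error while plotting losses, here is a minimal sketch of one common pattern. It assumes the loss is a scalar tensor and uses `.item()` rather than the `.cpu()` call mentioned above; the loop and the loss values are made up for illustration.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

losses = []
for step in range(3):
    # Stand-in for a scalar training loss that lives on the device.
    loss = torch.tensor(float(step), device=device, requires_grad=True) * 0.5
    # .item() copies the scalar to host memory and returns a plain Python float.
    losses.append(loss.item())

print(losses)
```

A list of Python floats can be passed directly to a plotting library, and it does not keep the computation graph of each step alive the way a list of tensors would.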

I am facing the same error for the following code:

    with torch.no_grad():
        for idx, (inputs, targets) in enumerate(ploader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            scores, predicted = outputs.max(1)
            # save top1 confidence score 
            outputs = F.normalize(outputs, dim=1)
            probs = F.softmax(outputs, dim=1)
            progress_bar(idx, len(ploader))
    idx = np.argsort(top1_scores)                 #Error is for this line
    samples = np.array(samples)
    return samples[idx[:1000]]

If the following line gives the error,

make sure PyTorch tensors are being returned in the __getitem__ method of the Dataset class whose instance is being used to create ploader.

I just tried to run the original repository and haven’t read and understood the code fully. Any comments on how to fix __getitem__() here?

sub5k = Loader2(is_train=False,  transform=transform_test, path_list=samples)
ploader = torch.utils.data.DataLoader(sub5k, batch_size=1, shuffle=False, num_workers=2)

class Loader2(Dataset):
    def __init__(self, is_train=True, transform=None, path='./DATA', path_list=None):
        self.is_train = is_train
        self.transform = transform
        self.path_list = path_list

        if self.is_train: # train
            self.img_path = path_list
        else:
            if path_list is None:
                self.img_path = glob.glob('./DATA/train/*/*') # for loss extraction
            else:
                self.img_path = path_list
    def __len__(self):
        return len(self.img_path)

    def __getitem__(self, idx):
        if self.is_train:
            img = cv2.imread(self.img_path[idx][:-1])
        else:
            if self.path_list is None:
                img = cv2.imread(self.img_path[idx])
            else:
                img = cv2.imread(self.img_path[idx][:-1])
        img = Image.fromarray(img)
        img = self.transform(img)
        label = int(self.img_path[idx].split('/')[-2])

        return img, label

Try adding the following before the return statement:

label = torch.tensor([label])

Check out this that you could use in your transforms object to convert your img to a pytorch tensor.
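
As a rough illustration of a `__getitem__` that returns tensors, here is a self-contained sketch. `ToyImages` and its random-free fake images are hypothetical stand-ins for `Loader2` and the files on disk, and the conversion is done by hand with `torch.from_numpy`. In a torchvision pipeline, `transforms.ToTensor()` is a common way to get the same kind of result.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ToyImages(Dataset):
    """Hypothetical dataset: in-memory HWC uint8 images with integer labels."""

    def __init__(self, n=4):
        self.images = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(n)]
        self.labels = list(range(n))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # HWC uint8 array -> CHW float tensor scaled to [0, 1].
        img = torch.from_numpy(self.images[idx]).permute(2, 0, 1).float() / 255.0
        # Scalar label tensor; torch.tensor([label]) would give shape (1,) instead.
        label = torch.tensor(self.labels[idx])
        return img, label

loader = DataLoader(ToyImages(), batch_size=2, shuffle=False)
imgs, labels = next(iter(loader))
print(imgs.shape, labels)
```

With tensors coming out of the dataset, the batches from the loader are tensors as well and can be moved with `.to(device)` in the loop.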

Thank you for your help, I resolved the issue