RuntimeError: CUDA error: device-side assert triggered

I’m putting my code here:

with torch.no_grad():
    retrieval_one_hot = torch.zeros(k, 10).cuda()
    for batch_idx, (inputs, targets, indexes) in enumerate(testloader):
        inputs = inputs.cuda()
        targets = targets.cuda()
        batchSize = inputs.size(0)
        features = net(inputs)
        zz = torch.zeros(batchSize * k, 10).cuda()

It fails at the line zz = torch.zeros(batchSize * k, 10).cuda() with the error message “RuntimeError: CUDA error: device-side assert triggered”. Any suggestions?


Please post the full error message!

This suggestion might apply to your problem.


The code runs fine on cpu, but fails to run on gpu.

Debugging CUDA device-side assert in PyTorch
This article would help you to debug your code. If you get a better traceback setting CUDA_LAUNCH_BLOCKING=1, post it.


I am also getting the error on CPU

I think you unintentionally used CUDA tensors. Please move every tensor x to CPU memory:

x = x.cpu()

You should read this post:


The reason you hit this problem is that your “target” labels include negative values such as “-1”.
I found this cause when I ran into the same problem. Hope it helps!
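Not from the original post, just a plain-Python sketch of the kind of sanity check that catches this: classification targets must lie in [0, num_classes - 1], and any label outside that range (like -1) will trigger the device-side assert. The same idea applies to a tensor via targets.min() / targets.max().

```python
# Return the labels that would trigger a device-side assert on GPU.
# (Hypothetical helper for illustration; with tensors you would check
# targets.min() >= 0 and targets.max() < num_classes instead.)
def check_targets(targets, num_classes):
    return [t for t in targets if not 0 <= t < num_classes]

print(check_targets([0, 3, 9, -1], 10))  # -> [-1]
print(check_targets([0, 3, 9], 10))      # -> [] (all labels valid)
```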


Very helpful! Thanks a lot!

In my case, this error occurred because my loss function only accepts values in [0, 1], and I was passing values outside that range.

So normalizing the input to my loss function solved it:

    # shift each row so its minimum is 0, then scale so its maximum is 1
    saida_G -= saida_G.min(1, keepdim=True)[0]
    saida_G /= saida_G.max(1, keepdim=True)[0]
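For anyone who wants to see the same per-row min-max normalization spelled out, here is a plain-Python sketch (not the poster's code); note it guards against a constant row, which the tensor version above would turn into a division by zero:

```python
# Map each row into [0, 1]: subtract the row minimum, divide by the range.
def minmax_rows(rows):
    out = []
    for row in rows:
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1.0   # avoid 0/0 when the row is constant
        out.append([(x - lo) / span for x in row])
    return out

print(minmax_rows([[2.0, 4.0, 6.0]]))  # -> [[0.0, 0.5, 1.0]]
```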

Read this: link

Another issue that can raise this error is a mismatch in the last layer of the network. Verify that the number of outputs of the network equals the number of labels.
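To make the mismatch concrete, here is a plain-Python sketch (illustrative only, not from the thread): if the final layer produces fewer outputs than there are classes, any label greater than or equal to the output width indexes past the logits, which fires the same device-side assert.

```python
# The classifier's output width must cover every label value,
# i.e. every label must satisfy 0 <= label < out_features.
def output_matches_labels(out_features, labels):
    return all(0 <= l < out_features for l in labels)

print(output_matches_labels(10, [0, 5, 9]))  # -> True
print(output_matches_labels(5, [0, 5, 9]))   # -> False: labels 5 and 9 are out of range
```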


Yes, the same situation caused this error for me. I just fixed that. Thanks!

For noobs like me: add

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

at the top of your script and run it again to get a much better stack trace. In my case, the CUDA error originally pointed at a completely different line, which cost me an hour of debugging.