Device-side assert triggered when using binary_cross_entropy loss

I got `RuntimeError: cudaEventSynchronize in future::wait device-side assert triggered` when I use binary_cross_entropy.

I think this is because the input of the BCELoss must fall into the range of [0,1].

My input is a product of two softmax outputs, so in theory the product should never be greater than 1.

I think this may be related to floating-point precision?

If so, how can I solve this problem?

Can anyone help me? Thank you!

Here is my code:

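import torch
import torch.nn.functional as F
cls_prob = F.softmax(cls_score, dim=1)
det_prob = F.softmax(det_score, dim=0)
# element-wise product of the two softmax outputs
predict = torch.mul(cls_prob, det_prob)
# size_average=False sums the per-element losses (newer releases spell this reduction='sum')
loss = F.binary_cross_entropy(predict, label, size_average=False)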

Can you please run your script with CUDA_LAUNCH_BLOCKING=1 and post the error message that is printed?
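If it is easier than changing the launch command, the variable can also be set from inside the script. This is only a minimal sketch: it assumes the assignment runs before anything touches CUDA, which is why it sits above the torch import.

```python
import os

# Must be set before CUDA is initialised; doing it before importing torch is the safest option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
# ... then run the training code as usual ...
```

With blocking launches the Python traceback should point at the call that actually triggered the assert, instead of at some later synchronization point.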

Sorry, I think I missed some specific code information.
here is my complete code

import torch
from wsddn.roi_pooling.modules.roi_pool import RoIPool
from import FC
from wsddn.utils import network
import torch.nn.functional as F
import torch.nn as nn
from wsddn.vgg16 import VGG16

class WSDDN(nn.Module):
    feature_scale = 1.0 / 16
    n_classes = 21

    def __init__(self, classes=None):
        super(WSDDN, self).__init__()
        if classes is not None:
            self.classes = classes
            self.n_classes = len(classes)

        self.features = VGG16()
        self.roi_pool = RoIPool(7, 7, self.feature_scale)
        self.fc6 = FC(512 * 7 * 7, 4096)
        self.fc7 = FC(4096, 4096)
        self.classifier_head = FC(4096, self.n_classes, relu=False)
        self.detection_head = FC(4096, self.n_classes, relu=False)

        self._loss = None
        self._detection = None

    def forward(self, im_data, rois, labels):
        im_data = network.np_to_variable(im_data, is_cuda=True)
        im_data = im_data.permute(0, 3, 1, 2)
        rois = network.np_to_variable(rois, is_cuda=True)
        labels = network.np_to_variable(labels, is_cuda=True)
        features = self.features(im_data)

        pooled_features = self.roi_pool(features, rois)
        x = pooled_features.view(pooled_features.size()[0], -1)
        x = self.fc6(x)
        x = F.dropout(x,
        x = self.fc7(x)
        x = F.dropout(x,
        cls_score = self.classifier_head(x)
        det_score = self.detection_head(x)

        cls_predict = F.softmax(cls_score, dim=1)
        det_predict = F.softmax(det_score, dim=0)
        predict = torch.mul(cls_predict, det_predict)
        y_predict = predict.sum(dim=0)
        y_predict = y_predict[1:]

        self._loss = self.build_loss(y_predict, labels)
        self._detection = predict

        return y_predict

    def detection(self):
        return self._detection

    def loss(self):
        return self._loss

    def build_loss(self, y_predict, labels):
        loss = F.binary_cross_entropy(y_predict, labels, size_average=False)
        # y_predict = torch.clamp(y_predict, min=1e-4, max=1 - 1e-4)
        # loss = -1 * torch.log(labels * (y_predict - 1.0 / 2) + 1 / 2).sum()
        return loss

The weird thing is that this problem only occurs after training for about 10000 iterations, so I’m waiting for it to happen now :joy:

Hi, this is the error message:

/pytorch/torch/lib/THCUNN/ Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `input >= 0. && input <= 1.` failed.
Traceback (most recent call last):
  File "/home/tz/projects/wsdnn_pytorch/", line 86, in <module>
    predict = net(im_data, prior_boxes, gt_classes)
  File "/home/tz/anaconda2/envs/dl-python3/lib/python3.5/site-packages/torch/nn/modules/", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tz/projects/wsdnn_pytorch/wsddn/", line 53, in forward
    self._loss = self.build_loss(y_predict, labels)
  File "/home/tz/projects/wsdnn_pytorch/wsddn/", line 67, in build_loss
    loss = F.binary_cross_entropy(y_predict, labels, size_average=False)
  File "/home/tz/anaconda2/envs/dl-python3/lib/python3.5/site-packages/torch/nn/", line 1200, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, size_average)
RuntimeError: after cudaLaunch in triple_chevron_launcher::launch(): device-side assert triggered

thank you for your help!!!

From the error message it seems that the input of your BCE loss is not between 0 and 1. The input you give should represent the probability of label 1, so it should be between 0 and 1.

I agree with you.
So can I simply use torch.clamp to restrict the input?
I think the reason why the input doesn’t fall into the range [0, 1] is floating-point precision.

If it is a floating-point precision error, then clamping will work, or adding the minimum and dividing by the max.
But first I would make sure that this really is a precision problem: apply the fix only if the out-of-range values are close enough to either 0 or 1, and raise an error otherwise.
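A minimal sketch of that "fix only if close enough, otherwise raise" idea. The helper name `clamp_if_close` and the tolerance `tol=1e-6` are my own assumptions for illustration, not anything PyTorch provides; pick a tolerance that suits your dtype.

```python
import torch

def clamp_if_close(prob, tol=1e-6):
    """Clamp `prob` into [0, 1], but only if every entry is within `tol` of that range."""
    # NaN fails both comparisons, so it is rejected here as well.
    in_tolerance = (prob >= -tol) & (prob <= 1.0 + tol)
    if not bool(in_tolerance.all()):
        worst = prob[~in_tolerance]
        raise ValueError(f"values too far outside [0, 1] to be rounding error: {worst.tolist()}")
    return prob.clamp(min=0.0, max=1.0)

# The second entry overshoots 1 by about 1.19e-07, i.e. one float32 rounding step.
pred = torch.tensor([0.25, 1.0 + 1.1921e-07, 0.0])
safe = clamp_if_close(pred)
```

Anything further out than the tolerance raises instead of being silently clamped, so a real bug upstream is not hidden.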

yes, thanks for your concrete reply!!

Hi @albanD, I still see a similar issue in the newest PyTorch version (stable 1.4). I hope this issue will be fixed in the next PyTorch version.

This issue is due to user error (giving unexpected input to a function), not from pytorch’s side.

@albanD No, this is what I am doing:

criterion = nn.BCELoss() 
pred = torch.sigmoid(pred)
loss = criterion(pred, target)

It still gives the error, but if I add a clamp the error is resolved.

criterion = nn.BCELoss() 
pred = torch.clamp(torch.sigmoid(pred),0,1)
loss = criterion(pred, target)

This means the output of sigmoid is not in the range [0, 1], or maybe there is a precision problem. However, if I implement an attention module that uses sigmoid to produce values in the [0, 1] range, this would be a problem, because the result may not lie strictly within that range.

Could you please post some values for pred which produce this error?

@ptrblck Hello, I am sorry for the late reply. After several months of debugging, I finally found the main problem. It took this long because the error only shows up occasionally, so I had to rerun the training again and again to reproduce it; it finally came back several weeks ago. The problem is caused by NaN values in the prediction. This explains why the error does not always happen: it depends on how the model behaves during training. The error says that the value is not between 0 and 1, but in fact it is NaN. So it is better to check for NaN before calculating the loss, using torch.isnan to make sure the prediction contains no NaN. I also suggest that PyTorch report the NaN explicitly instead of only saying that the value is not between 0 and 1.
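Here is roughly what that check could look like. It is only a sketch: `bce_with_nan_check` is a hypothetical wrapper name, and it uses `reduction="sum"`, the newer spelling of `size_average=False`.

```python
import torch
import torch.nn.functional as F

def bce_with_nan_check(pred, target):
    """Raise a readable error if `pred` contains NaN, otherwise return the summed BCE loss."""
    nan_mask = torch.isnan(pred)
    if nan_mask.any():
        n_bad = int(nan_mask.sum())
        raise ValueError(f"{n_bad} of {pred.numel()} predictions are NaN")
    return F.binary_cross_entropy(pred, target, reduction="sum")

# NaN compares False against any bound, which is why the range assertion fires for it.
nan = torch.tensor(float("nan"))
nan_in_range = bool((nan >= 0) & (nan <= 1))

loss = bce_with_nan_check(torch.tensor([0.9, 0.1]), torch.tensor([1.0, 0.0]))
```

Note that clamping does not remove NaN, so a clamp alone would not make this case go away; the NaN has to be caught or prevented earlier.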

Thanks for the update. I like the suggestion about printing the actual invalid value.
Would you like to open a GitHub issue with this feature request?

I face the same problem as you. It takes very long to debug because it only happens now and then. Do you know why there could be NaN value in prediction and how to prevent that from happening?

I am facing the same trouble as the original poster. I multiply the results of two softmax outputs (softmax over two different dimensions). Then I sum the tensor over one dimension to get the final output scores, say a 20-d tensor. Here is the output score which triggers the CUDA assertion error, specifically the one value 1.0000e+00, which in theory should not happen.

I assume this is related to floating-point precision error. This error is not stable to reproduce. I got the error sometimes around 3k steps and sometimes after 10k during training.

Does it imply that we should clamp the tensor whenever we use the binary_cross_entropy_loss? I think it might be a good idea to log what value is actually causing the AssertionError.

tensor([9.4490e-05, 1.3122e-06, 1.9130e-03, 1.1611e-04, 3.1499e-05, 7.9529e-05,
        5.0480e-05, 1.0000e+00, 2.0515e-04, 1.4706e-06, 3.1726e-05, 1.7213e-09,
        8.1568e-05, 6.2557e-06, 1.4758e-06, 2.2086e-04, 1.9921e-04, 7.1404e-05,
        6.8685e-06, 1.0655e-04], device='cuda:0', grad_fn=<SumBackward1>)
cls_prob = F.softmax(cls_score, dim=1) # across classes [2000,20]
det_prob = F.softmax(det_score, dim=0) # across proposals/detections [2000,20]
predict = torch.mul(cls_prob, det_prob) # shape: [2000,20]
pred_class_scores = torch.sum(predict, dim=0) # [20]
loss = F.binary_cross_entropy(pred_class_scores, label, size_average=False)
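Until the assert itself reports the value, something like the following could be called right before the loss to log what is out of range. It is a rough sketch, and `describe_invalid` is a made-up helper name; the printed tensor above hides the overshoot because `1.0000e+00` is rounded for display.

```python
import torch

def describe_invalid(prob):
    """Return (flat index, value) pairs for entries that are NaN or outside [0, 1]."""
    flat = prob.detach().flatten()
    # NaN ends up in `bad` too, because both comparisons are False for NaN.
    bad = ~((flat >= 0) & (flat <= 1))
    idx = torch.nonzero(bad).flatten().tolist()
    return [(i, flat[i].item()) for i in idx]

scores = torch.tensor([9.4490e-05, 1.0 + 1.1921e-07, float("nan"), 0.5])
report = describe_invalid(scores)
```

Printing `report` with full precision (for example via `repr` of the Python floats) shows whether an entry is slightly above 1, far outside the range, or NaN.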

Your code might create values larger than 1. due to the limited floating point precision as seen here:


import torch
import torch.nn.functional as F

cls_score = torch.randn(2000, 20, device='cuda')
cls_score[:, 19] = 100.
det_score = torch.randn(2000, 20, device='cuda')

cls_prob = F.softmax(cls_score, dim=1) # across classes [2000,20]
det_prob = F.softmax(det_score, dim=0) # across proposals/detections [2000,20]
predict = torch.mul(cls_prob, det_prob) # shape: [2000,20]
pred_class_scores = torch.sum(predict, dim=0) # [20]
print((pred_class_scores > 1.).any())
> tensor(True, device='cuda:0')

print(pred_class_scores[19] - 1.)
> tensor(1.1921e-07, device='cuda:0')

so I think you should clamp the values before passing them to the loss function.

Thank you for creating the example! It helps me a lot.

Hi, I ran into the same problem as posted here. I checked the values after multiplying the two scores (both computed by softmax), and the result is indeed sometimes larger than 1. It really does seem to be a precision problem.

I checked the solution in a GitHub repo. The solution proposed in this repo is simply clamping the scores. I think clamping the values will cause zero gradients during backpropagation, but it seems there are no other solutions right now.
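To make the gradient concern concrete, here is a small CPU check. It is only a sketch with made-up input values: one score inside the range and one that overshoots 1 by a single float32 rounding step.

```python
import torch

scores = torch.tensor([0.5, 1.0 + 1.1921e-07], requires_grad=True)
clamped = scores.clamp(min=0.0, max=1.0)
clamped.sum().backward()

# The in-range entry keeps its gradient; only the clamped entry gets zero.
grad = scores.grad
```

So the gradient is lost only for the entries that were actually clamped, and in-range scores are unaffected.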