Device-side assert triggered when using binary_cross_entropy loss


(Tztztztztz) #1

I got `RuntimeError: cudaEventSynchronize in future::wait device-side assert triggered` when I use binary_cross_entropy.

I think this is because the input of BCELoss must fall into the range [0, 1].

My input is the product of two softmax outputs, so in theory it should never be greater than 1.

I think this may be related to floating-point precision?

If so, how can I solve this problem?

Can anyone help me? Thank you!

Here is my code:

cls_prob = F.softmax(cls_score, dim=1)
det_prob = F.softmax(det_score, dim=0)
predict = F.mul(cls_prob, det_prob)
loss = F.binary_cross_entropy(predict, label, size_average=False)
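
If it helps, here is the kind of sanity check I have in mind (the shapes and values below are made up), just printing the min and max of the product before computing the loss:

import torch
import torch.nn.functional as F

cls_score = torch.randn(300, 21)  # stand-in for [num_rois, n_classes] scores
det_score = torch.randn(300, 21)

cls_prob = F.softmax(cls_score, dim=1)   # rows sum to 1
det_prob = F.softmax(det_score, dim=0)   # columns sum to 1
predict = cls_prob * det_prob            # element-wise, each entry should be in [0, 1]

print(predict.min().item(), predict.max().item())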

(Alban D) #2

Hi,
Can you run your script with CUDA_LAUNCH_BLOCKING=1 and see what error message is printed, please?
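
For example, the variable can be set when launching the script (e.g. CUDA_LAUNCH_BLOCKING=1 python train.py), or at the very top of the script before anything touches the GPU:

import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized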


(Tztztztztz) #3

Sorry, I think I left out some details of the code.
Here is my complete code:


import torch
from wsddn.roi_pooling.modules.roi_pool import RoIPool
from wsddn.utils.network import FC
from wsddn.utils import network
import torch.nn.functional as F
import torch.nn as nn
from wsddn.vgg16 import VGG16


class WSDDN(nn.Module):
    feature_scale = 1.0 / 16
    n_classes = 21

    def __init__(self, classes=None):
        super(WSDDN, self).__init__()
        if classes is not None:
            self.classes = classes
            self.n_classes = len(classes)

        self.features = VGG16()
        self.roi_pool = RoIPool(7, 7, self.feature_scale)
        self.fc6 = FC(512 * 7 * 7, 4096)
        self.fc7 = FC(4096, 4096)
        self.classifier_head = FC(4096, self.n_classes, relu=False)
        self.detection_head = FC(4096, self.n_classes, relu=False)

        self._loss = None
        self._detection = None

    def forward(self, im_data, rois, labels):
        im_data = network.np_to_variable(im_data, is_cuda=True)
        im_data = im_data.permute(0, 3, 1, 2)
        rois = network.np_to_variable(rois, is_cuda=True)
        labels = network.np_to_variable(labels, is_cuda=True)
        features = self.features(im_data)

        pooled_features = self.roi_pool(features, rois)
        x = pooled_features.view(pooled_features.size()[0], -1)
        x = self.fc6(x)
        x = F.dropout(x, training=self.training)
        x = self.fc7(x)
        x = F.dropout(x, training=self.training)
        cls_score = self.classifier_head(x)
        det_score = self.detection_head(x)

        cls_predict = F.softmax(cls_score, dim=1)  # softmax over classes for each RoI
        det_predict = F.softmax(det_score, dim=0)  # softmax over RoIs for each class
        predict = F.mul(cls_predict, det_predict)  # element-wise product
        y_predict = predict.sum(dim=0)             # per-class image-level score
        y_predict = y_predict[1:]                  # discard the score for class 0

        self._loss = self.build_loss(y_predict, labels)
        self._detection = predict

        return y_predict

    @property
    def detection(self):
        return self._detection

    @property
    def loss(self):
        return self._loss

    def build_loss(self, y_predict, labels):
        loss = F.binary_cross_entropy(y_predict, labels, size_average=False)
        # y_predict = torch.clamp(y_predict, min=1e-4, max=1 - 1e-4)
        # loss = -1 * torch.log(labels * (y_predict - 1.0 / 2) + 1 / 2).sum()
        return loss

The weird thing is that this problem only occurs after training for about 10,000 iterations, so I'm waiting for it to happen again now :joy:


(Tztztztztz) #4

Hi, this is the error message:

/pytorch/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `input >= 0. && input <= 1.` failed.
Traceback (most recent call last):
  File "/home/tz/projects/wsdnn_pytorch/train.py", line 86, in <module>
    predict = net(im_data, prior_boxes, gt_classes)
  File "/home/tz/anaconda2/envs/dl-python3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tz/projects/wsdnn_pytorch/wsddn/wsddn.py", line 53, in forward
    self._loss = self.build_loss(y_predict, labels)
  File "/home/tz/projects/wsdnn_pytorch/wsddn/wsddn.py", line 67, in build_loss
    loss = F.binary_cross_entropy(y_predict, labels, size_average=False)
  File "/home/tz/anaconda2/envs/dl-python3/lib/python3.5/site-packages/torch/nn/functional.py", line 1200, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, size_average)
RuntimeError: after cudaLaunch in triple_chevron_launcher::launch(): device-side assert triggered

Thank you for your help!!!


(Alban D) #5

From the error message, it seems that the input to your BCE loss is not between 0 and 1. The input you pass should represent the probability of label 1, so it must lie in [0, 1].
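
For reference, a toy example of the expected inputs (the values here are made up): the first argument is a probability in [0, 1] for each element, and the target holds the corresponding 0/1 labels.

import torch
import torch.nn.functional as F

probs = torch.tensor([0.9, 0.2, 0.6])   # predicted probability of label 1
target = torch.tensor([1.0, 0.0, 1.0])  # ground-truth labels as floats

loss = F.binary_cross_entropy(probs, target)
print(loss.item())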


(Tztztztztz) #6

I agree with you.
So can I simply use torch.clamp to restrict the input?
I think the reason the input doesn't fall into the range [0, 1] is floating-point precision.


(Alban D) #7

If it is a floating-point precision error, then clamping will work, or you can shift by the minimum and divide by the maximum.
But first I would make sure that this really is a precision problem: basically, apply this fix only if the values are close enough to 0 or 1, and otherwise raise an error.
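
Something along these lines, for example (the helper name and the tolerance eps are just illustrative choices, not part of your code):

import torch

def to_valid_probs(y_predict, eps=1e-4):
    # Anything outside [-eps, 1 + eps] is treated as a real bug, not rounding error.
    if (y_predict < -eps).any() or (y_predict > 1.0 + eps).any():
        raise ValueError(
            "predictions leave [0, 1] by more than eps: min={:.6f}, max={:.6f}".format(
                y_predict.min().item(), y_predict.max().item()))
    # Otherwise treat the overshoot as floating-point noise and clamp it away.
    return torch.clamp(y_predict, min=0.0, max=1.0)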


(Tztztztztz) #8

Yes, thanks for your concrete reply!!