Exception in BCECriterion.cu:42

Is anyone familiar with this exception? It was raised after the 70th epoch (it does not depend on the epoch: I ran it again and got the exception after the 40th epoch).

Train Epoch: 70 [0/6742 (0%)]   Loss: -431231.800000
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [4,0,0], thread: [224,0,0] Assertion `input >= 0. && input <= 1.` failed.

Thanks

It looks like at some point in training the model is outputting values outside the range 0 to 1. Are you sure that the predicted values from the model are in the range 0 to 1?

Can you add a try-except to print the model's predictions when the error is raised?

I used the MNIST dataset. The pixel values are between 0 and 255 (inclusive).

The predicted values look like this (a snippet):

    0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
    5.4309e-15, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
    0.0000e+00, 7.7825e-17, 1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00,
    1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00,
    1.0000e+00, 1.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
    0.0000e+00, 2.2867e-38, 1.7363e-37, 9.8193e-22, 9.3205e-27, 0.0000e+00,

You are right, my loss function is F.binary_cross_entropy.
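
A side note on the 0 to 255 range: with predictions and targets both in [0, 1], binary cross-entropy cannot be negative, so a value like Loss: -431231.8 suggests that something outside that range is reaching the loss. One possibility, and this is only an assumption since the training loop is not shown, is that raw pixel values are used as the target. A minimal sketch in plain Python (the `bce` helper is hypothetical, written out from the standard formula) shows the effect:

```python
import math

def bce(p, t):
    """Elementwise binary cross-entropy for a prediction p in (0, 1) and a target t."""
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

p = 0.9                             # a valid sigmoid output
raw_target = 255.0                  # unscaled pixel value (assumed to be used as target)
scaled_target = raw_target / 255.0  # the same pixel scaled into [0, 1]

loss_raw = bce(p, raw_target)        # negative: the formula assumes t in [0, 1]
loss_scaled = bce(p, scaled_target)  # non-negative once t is in [0, 1]
print(loss_raw, loss_scaled)
```

If this turns out to be the case, dividing the images by 255 before computing the loss keeps the target in range.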

The values the model predicts must be between 0 and 1.

import torch
import torch.nn.functional as F
preds = torch.ones(3) + 0.01
target = torch.ones(3)
F.binary_cross_entropy(preds, target) # Crashes as preds is not between 0 and 1.

I suspect this is what is happening in the epoch in which it crashes, i.e. preds is not in the range 0 to 1.

For the epoch in which it crashes, can you try:

try:
    loss = F.binary_cross_entropy(preds, target)  # the failing part of the training step
except RuntimeError:
    mask = (preds > 1) | (preds < 0)  # True wherever a prediction is out of range
    print(mask.sum())  # If non-zero, some predictions are outside [0, 1].
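
Two caveats about the check above. Comparisons with NaN are always False, so a mask built from `preds > 1` and `preds < 0` does not flag NaN entries. Also, on the GPU a failed device-side assertion may leave the CUDA context unusable, so inspecting preds inside the except block can itself fail; running the check before calling the loss avoids that. A minimal sketch, where `describe_invalid` is a hypothetical helper and the tensor values are made up:

```python
import torch

def describe_invalid(preds):
    """Count entries that binary_cross_entropy would reject (hypothetical helper)."""
    preds = preds.detach()
    return {
        "nan": int(torch.isnan(preds).sum()),
        "below_0": int((preds < 0).sum()),
        "above_1": int((preds > 1).sum()),
    }

# Made-up predictions: one valid, one above 1, one NaN, one below 0.
preds = torch.tensor([0.2, 1.01, float("nan"), -0.5])
report = describe_invalid(preds)
print(report)
```

Calling it on the model output right before the loss in every iteration shows which kind of invalid value appears first.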

Most probably you have done this already, but do verify that you apply sigmoid to preds before passing them to the loss function.

Or you can use https://pytorch.org/docs/stable/nn.functional.html#binary-cross-entropy-with-logits

Hope this helps.

Using binary_cross_entropy_with_logits did not help: the loss function returned NaN.

tensor(nan, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
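
For reference, binary_cross_entropy_with_logits expects the raw, pre-sigmoid outputs and applies the sigmoid internally, so when switching to it the final sigmoid in the model would normally be dropped. The sketch below uses made-up logits and targets. It shows that the two formulations agree on finite inputs, and that a NaN logit gives a NaN loss, which would suggest (without proving it) that the NaN originates upstream of the loss:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([-2.0, 0.0, 3.0])  # made-up pre-sigmoid outputs
target = torch.tensor([0.0, 1.0, 1.0])   # targets in [0, 1]

# Sigmoid applied by hand, then plain BCE.
loss_probs = F.binary_cross_entropy(torch.sigmoid(logits), target)
# Sigmoid folded into the loss.
loss_logits = F.binary_cross_entropy_with_logits(logits, target)
print(loss_probs.item(), loss_logits.item())

# A NaN logit propagates to the loss instead of raising an error.
nan_loss = F.binary_cross_entropy_with_logits(
    torch.tensor([float("nan")]), torch.tensor([1.0])
)
print(nan_loss)
```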

F.binary_cross_entropy(preds, target) # Crashes as preds is not between 0 and 1.

That gives a different crash location, BCECriterion.c:62 (in my training run it was BCECriterion.cu:42),

but both of them point to invalid input.

thank you!

The funny thing is that the last layer of my model is already a sigmoid function:

return F.sigmoid(self.fc6(h))

That means the results should already be within the required range.
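
One case where a sigmoid output can still fail the `input >= 0. && input <= 1.` assertion is NaN: sigmoid maps any finite value into [0, 1], but it passes NaN through, and NaN fails both comparisons. Whether that is what happens here is a guess; the sketch below, with made-up pre-activation values, only illustrates the mechanism:

```python
import torch

# Made-up pre-activations: very negative, zero, very positive, and NaN.
h = torch.tensor([-1000.0, 0.0, 1000.0, float("nan")])
out = torch.sigmoid(h)

# The same range condition the BCE assertion checks.
in_range = (out >= 0) & (out <= 1)
print(out)
print(in_range)
```

If the output of the last linear layer contains NaN (for example after the weights have diverged), the sigmoid cannot repair it.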

Surprising to see the error.

By the way, I tried the same run with smaller learning rates (1e-4, 1e-5, 1e-6) over 150 iterations and didn't get any errors.
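
That the error disappears with smaller learning rates would be consistent with the weights diverging at the larger rate, though this is not confirmed. A minimal sketch of one training step with gradient clipping and a finiteness check on the parameters follows. The model, data, learning rate and max_norm are all placeholders, not the setup from this thread:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())  # stand-in model, not the poster's
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.rand(8, 4)
target = torch.rand(8, 1)  # targets already in [0, 1]

optimizer.zero_grad()
loss = F.binary_cross_entropy(model(x), target)
loss.backward()
# Rescale gradients so their total norm is at most max_norm (value chosen arbitrarily).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

# After the update, every parameter should still be finite (no NaN or inf).
params_finite = all(torch.isfinite(p).all().item() for p in model.parameters())
print(loss.item(), params_finite)
```

Logging `params_finite` each iteration would show the step at which the weights first become non-finite, if they do.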

Still waiting for help regarding this issue.

(I cross-posted this on GitHub: https://github.com/pytorch/pytorch/issues/36647)