In binary_cross_entropy, RuntimeError: CUDA error: device-side assert triggered

Hi,

Hoping someone can help. In a GAN, I get the error:

C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
line 2762, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: CUDA error: device-side assert triggered

I am using nn.Sigmoid() within the discriminator to make sure that its output (the input to the loss) is between 0 and 1, and torch.nn.BCELoss() as the loss function. Can anyone help solve the error, please?

cheers,

chaslie
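A note on the assertion itself: `input_val >= zero && input_val <= one` also fails for NaN, because every comparison against NaN is false. A sigmoid keeps finite values inside [0, 1], but it passes NaN straight through, so a NaN coming out of the discriminator is one plausible trigger. The sketch below is only an illustration; `check_probabilities` is a hypothetical helper, not part of PyTorch, and it would be called on the discriminator output just before the loss.

```python
import torch


def check_probabilities(t: torch.Tensor, name: str = "output") -> None:
    """Hypothetical helper: raise ValueError if `t` is not a valid BCELoss input."""
    if not torch.isfinite(t).all():
        raise ValueError(f"{name} contains NaN or Inf")
    if t.min().item() < 0 or t.max().item() > 1:
        raise ValueError(f"{name} has values outside [0, 1]")


# Finite logits, however large, are mapped into [0, 1] by sigmoid.
ok = torch.sigmoid(torch.tensor([-50.0, 0.0, 50.0]))
check_probabilities(ok)

# A NaN logit stays NaN after sigmoid, so the range check cannot pass.
bad = torch.sigmoid(torch.tensor([0.0, float("nan")]))
try:
    check_probabilities(bad, "D output")
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Calling such a check on the CPU side gives a readable Python error at the point where the bad value first appears, instead of a device-side assert.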

In case you are not using the latest release (1.9.0), could you update PyTorch and rerun the script?
If you are still seeing the issue, could you post an executable code snippet reproducing this error as well as the output of python -m torch.utils.collect_env?

Thanks ptrblck, updating PyTorch seems to have solved the problem. Any idea what was causing the error and how it was fixed in the latest version of PyTorch?

No, I don’t remember this exact issue in the last version, but I also don’t know which release you were using before updating.

Fair point. I was on 1.8.3, I think?

Hi Ptrblck,

I have run torch.utils.collect_env. I hope it makes more sense to you than it does to me :laughing:

PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8 (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchio==0.18.25
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.10.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              h74a9793_1
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py38h196d8e1_0
[conda] mkl_fft                   1.3.0            py38h46781fe_0
[conda] mkl_random                1.1.1            py38h47e9c7a_0
[conda] numpy                     1.19.2           py38hadc3359_0
[conda] numpy-base                1.19.2           py38ha3acd2a_0
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1
[conda] pytorch                   1.9.0           py3.8_cuda10.2_cudnn7_0    pytorch
[conda] torchaudio                0.9.0                      py38    pytorch
[conda] torchio                   0.18.25                  pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.10.0               py38_cu102    pytorch

After 9 epochs I get a crash with the following:

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([10, 700, 4, 4], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(700, 1024, kernel_size=[4, 4], padding=[1, 1], stride=[2, 2], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams 
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [2, 2, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 00000150CFDF7190
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 10, 700, 4, 4, 
    strideA = 11200, 16, 4, 1, 
output: TensorDescriptor 00000150CFDF6400
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 10, 1024, 2, 2, 
    strideA = 4096, 4, 2, 1, 
weight: FilterDescriptor 00000150CF7DAB60
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 1024, 700, 4, 4, 
Pointer addresses: 
    input: 0000001C68600000
    output: 0000001C38DB0000
    weight: 0000001CA2000000
Additional pointer addresses: 
    grad_output: 0000001C38DB0000
    grad_weight: 0000001CA2000000
Backward filter algorithm: 1

I assume the created code snippet in the error message runs fine or does it also crash?
Assuming the former, could you post an executable code snippet to reproduce this issue as well as which GPU you are using?

hi PtrBlck,

I am using a Titan RTX GPU.

It seems that changing the learning rate only delays the onset.
The executable code for the VAE-GAN is:

            b_size = real_cpu.size(0)
            label_r = torch.full((b_size,), real_label, dtype=torch.float, device=device)
            label_f = torch.full((b_size,), fake_label, dtype=torch.float, device=device)
            # Forward pass real batch through D
            output = netD(real_cpu).view(-1)
            # Calculate loss on all-real batch
            errD_real = criterion(output, label_r)
            # D_x = output.mean().item()


            # label.fill_(fake_label)
            loss_G1, out_G1 = loss_fn_G_I(netG, real_cpu, device)
            output1 = netD(out_G1.detach()).view(-1)
            errD_G_real = criterion(output1, label_f)

            ## Train with all fake based on noise
            noise = torch.randn(b_size, nz, device=device)
            fake = netG.D2_Decoder(noise)
            # label.fill_(fake_label)
            output2 = netD(fake.detach()).view(-1)
            errD_fake = criterion(output2, label_f)


            errD = errD_real + errD_fake + errD_G_real
            # Calculate gradients for D in backward pass
            optimizerD.zero_grad()
            errD.backward(retain_graph=True)
            optimizerD.step()

            label_r2 = torch.full((b_size,), real_label, dtype=torch.float, device=device)
            fake2 = netG.D2_Decoder(noise)

            # with torch.autograd.set_detect_anomaly(True):
            #### now to work on the generator
            #### use just the decoder of the VAE first with fake
           
            optimizerG_D.zero_grad()
            # label.fill_(real_label)
            output4 = netD(fake2).view(-1)
            errG_F = criterion(output4, label_r2)
            output5 = netD(out_G1).view(-1)
            errG_R = criterion(output5, label_r2)
            err_G=errG_F+errG_R
            err_G.backward(retain_graph=True)
            optimizerG_D.step()

            ### now we operate on the encoder part of the VAE
            optimizerG_E.zero_grad()
            loss_G2, out_G2 = loss_fn_G_I(netG, real_cpu, device)
            loss_G2.backward(retain_graph=True)
            optimizerG_E.step()

I am at my wits' end as to what is causing this. The dataset used is celebrity faces.

Thanks for the update! The code is unfortunately not executable so that I cannot try to reproduce it.
Could you please update it and ping me here again?

hi ptrblck,

It seems that this is a learning rate issue: if I set the LR very high, e.g. 1e-4, the error occurs, but if I set the LR to 2.5e-6 the model will run through to 60 epochs.

What is the best way of sending you the code?

chaslie
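A crash that comes later at a lower learning rate would be consistent with training slowly diverging. One way to check this, sketched here with a toy model and not taken from the thread, is to read the total gradient norm on every step. `torch.nn.utils.clip_grad_norm_` returns that norm (and also clips the gradients to `max_norm`). The model, data, and `max_norm=1.0` below are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins; the real networks and data would go here.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
x, y = torch.randn(8, 4), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Returns the total norm measured before clipping, then clips in place.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad norm before clipping: {total_norm.item():.4f}")

# Skip the update if the gradients are already NaN or Inf.
if torch.isfinite(total_norm):
    optimizer.step()
```

Logging this value per iteration would show whether the gradient norm grows steadily in the epochs before the assert fires.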

Post or edit your code here and make sure others can run it in order to reproduce it.
I.e. check that all functions are defined and in case data is used, create random tensors, if possible.
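For anyone unsure what such a snippet looks like, here is a rough skeleton. Everything in it is assumed: the small `netD` is a stand-in, not the discriminator from this thread, and the random 64x64 batch replaces the celebrity-faces data. The point is only that it runs as pasted, with every name defined.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in discriminator (assumption): one conv layer plus a linear head.
netD = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=4, stride=2, padding=1),  # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 1),
    nn.Sigmoid(),
).to(device)
criterion = nn.BCELoss()

# Random tensors in place of the real images and labels.
real = torch.randn(10, 3, 64, 64, device=device)
label = torch.full((10,), 1.0, dtype=torch.float, device=device)

output = netD(real).view(-1)
loss = criterion(output, label)
loss.backward()
print(loss.item())
```

The real model definitions and the full training loop would then replace the stand-ins, still fed with random tensors of the right shape.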

I have finally got to the bottom of this problem. If you are seeing

C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [0,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
line 2762, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: CUDA error: device-side assert triggered

Then check that you haven’t got backward(retain_graph=True) active. If you have, revise the training script to get rid of it. It seems that the gradients stack up and eventually “blow up”.
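As a rough illustration of a training step that needs no retain_graph, the sketch below uses tiny stand-in networks, not the VAE-GAN from this thread. It assumes the usual pattern: detach the fake batch for the discriminator update, run a fresh forward pass through the discriminator for the generator update, and zero the gradients before each backward call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
nz = 8

# Tiny stand-ins (assumption) for the generator and discriminator.
netG = nn.Sequential(nn.Linear(nz, 16), nn.Tanh())
netD = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizerD = torch.optim.Adam(netD.parameters(), lr=2e-4)
optimizerG = torch.optim.Adam(netG.parameters(), lr=2e-4)

real = torch.randn(10, 16)  # random data in place of real images
label_r = torch.ones(10)
label_f = torch.zeros(10)
noise = torch.randn(10, nz)
fake = netG(noise)

# Discriminator step: fake.detach() keeps the generator graph out of it.
optimizerD.zero_grad()
errD = (criterion(netD(real).view(-1), label_r)
        + criterion(netD(fake.detach()).view(-1), label_f))
errD.backward()
optimizerD.step()

# Generator step: a fresh forward pass through the updated discriminator,
# so this backward call uses its own graph.
optimizerG.zero_grad()
errG = criterion(netD(fake).view(-1), label_r)
errG.backward()
optimizerG.step()
print(errD.item(), errG.item())
```

The generator's backward pass also leaves gradients on the discriminator's parameters; they are cleared by `optimizerD.zero_grad()` at the start of the next discriminator step.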