RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED in cudnn_batch_norm_backward (allow unreachable flag)

I am trying to train my first CNN after trying an existing one, but it didn't work. This error appeared and I can't understand it.

I work on Ubuntu 18.04 with a GTX 1660 Ti (6 GB). This is a code sample that I think causes the error:

The code:

    for epoch in range(num_epochs):
        for batch_idx, (features, targets, levels, x) in enumerate(train_loader):
            # the right-hand sides below were cut off in the original post;
            # .to(DEVICE) is the usual pattern
            features = features.to(DEVICE)
            targets = targets.to(DEVICE)
            levels = levels.to(DEVICE)

            logits, probas = model(features)
            if epoch >= 190:
                print('\n i=', batch_idx, 'logits =', logits)
                print('\n i=', batch_idx, 'probas =', probas)

            impf = torch.ones([logits.shape[0], NUM_CLASSES])
            for i in range(len(x)):
                impf = ...  # truncated in the original post
            logits = (logits * impf).to(DEVICE)
            cost = cost_fn(logits, levels)

Which PyTorch, CUDA and cudnn versions are you using?
Also, could you post the model definition as well as the shapes of all tensors, so that we could reproduce and debug this issue, please?


I am using PyTorch 1.5.0 and my CUDA version is 10.2 on Ubuntu 18.04.

I used resnet34:

    def resnet34(num_classes, grayscale):
        """Constructs a ResNet-34 model."""
        model = ResNet(block=BasicBlock,
                       layers=[3, 4, 6, 3],
                       # arguments restored from the function signature
                       num_classes=num_classes,
                       grayscale=grayscale)
        return model
Now when I run my script on a small dataset it works perfectly, but the same script on the large dataset causes a different error in the line: … RuntimeError: CUDA error: unspecified launch failure.

It works fine until epoch 31, then the error appears: RuntimeError: CUDA error: unspecified launch failure.
Sometimes it works fine until epoch 39 and then the same error appears.

Another time, the run stopped at epoch 107 of 200 epochs.

Could you check if you are running out of memory, please?

Do you mean memcheck? Yes, I did run it, and this is the output:
========= ERROR SUMMARY: 0 errors

And if this may be the cause, what is the solution? Could reducing the batch size fix it?

No, I meant if your GPU memory is filling up and you thus cannot allocate any more data on the device.
You can check the memory usage via nvidia-smi or in your script via e.g. torch.cuda.memory_allocated().
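As a concrete illustration (a minimal sketch, not taken from the thread; the helper name is hypothetical), the allocator statistics can be queried like this:

```python
import torch

def report_gpu_memory(tag=""):
    # torch.cuda.memory_allocated() only tracks tensors owned by PyTorch's
    # caching allocator; the CUDA context and other processes are not
    # included, so nvidia-smi will always show a larger number.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated()
        reserved = torch.cuda.memory_reserved()
        print(f"{tag} allocated={allocated} reserved={reserved}")
        return allocated
    return 0

report_gpu_memory("before forward")
```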

Are you using custom CUDA code or did you execute cuda-memcheck just on the complete PyTorch model?

I executed `cuda-memcheck` on the complete PyTorch model.

Please explain more. Where should I add this in my script?

Is it true that the tensors `logits` and `impf` must have the same size?
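For reference, elementwise multiplication requires the two tensors to have matching (or broadcastable) shapes and to live on the same device; a minimal sketch with hypothetical sizes:

```python
import torch

NUM_CLASSES = 10  # hypothetical value for illustration

logits = torch.randn(32, NUM_CLASSES)            # stand-in for the model output
impf = torch.ones(logits.shape[0], NUM_CLASSES)  # same construction as in the post

# Both conditions must hold before `logits * impf` can succeed:
assert impf.shape == logits.shape
assert impf.device == logits.device
scaled = logits * impf
print(scaled.shape)  # torch.Size([32, 10])
```

Note that if `logits` is on the GPU, `impf` would also need a `.to(DEVICE)` call before the multiplication, since `torch.ones` creates a CPU tensor by default.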

You could add it e.g. at the beginning and at the end of each iteration to check the allocated memory, which would show if you are close to the device limit. Note that this call does not return the memory usage of the CUDA context or from other applications.
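A sketch of what that could look like in a training loop shaped like the one from the first post (the optimizer and loss names here are placeholders, not the author's exact code):

```python
import torch

def train_one_epoch(model, train_loader, cost_fn, optimizer, device):
    for batch_idx, (features, targets, levels, x) in enumerate(train_loader):
        # memory check at the beginning of the iteration
        if torch.cuda.is_available():
            print("memory allocated before", torch.cuda.memory_allocated())

        features = features.to(device)
        levels = levels.to(device)
        logits, probas = model(features)
        cost = cost_fn(logits, levels)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        # memory check at the end of the iteration; a steadily growing
        # number here would point to tensors kept alive across iterations
        if torch.cuda.is_available():
            print("memory allocated after", torch.cuda.memory_allocated())
```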


the output is:
Epoch: 001/200 | Batch 0000/0343 | Cost: 38.0783
memory allocated after 369382912
memory allocated before 346285056
memory allocated after 369382912
memory allocated before 346285056
memory allocated after 369382912
memory allocated before 346285056
memory allocated after 369382912
memory allocated before 346285056
memory allocated after 369382912
memory allocated before 346285056
memory allocated after 369382912

This repeated for all epochs, so I found that the used memory is almost constant during each iteration. On the other hand, the run stopped at epoch 17 with a new error :sob::

In that case could you rerun the code with


and post the stack trace here?

I am already using this format.

Could you update to PyTorch 1.6 or the nightly/master, since 1.5 had an issue where device assert statements were ignored.
This could mean that you are in fact hitting a valid assert.
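One common source of a valid device-side assert is a class label outside the expected range. A hedged sketch of a CPU-side sanity check that can rule this out before training (the names are hypothetical, not from the thread):

```python
import torch

NUM_CLASSES = 10  # hypothetical value for illustration

def check_targets(targets, num_classes):
    # An out-of-range class index triggers a device-side assert inside
    # CUDA kernels; checking the labels on the CPU gives a readable error
    # instead of an "unspecified launch failure" later.
    assert targets.min() >= 0, f"negative label: {targets.min()}"
    assert targets.max() < num_classes, f"label {targets.max()} >= {num_classes}"

check_targets(torch.tensor([0, 3, 9]), NUM_CLASSES)
```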

Do you mean that PyTorch 1.5 is the reason, not an error in my script?

I noticed that the code stopped at the same line many times:

whereas it worked well on the small dataset.

Yes, since (some) assert statements were broken in PyTorch 1.5, thus you would have to update to 1.6 or the nightly binary.

Is this a new issue or why is the code not crashing anymore?