Max_unpooling3d_forward_kernel failed with error code 0

After receiving a memory error, I ran my training loop with CUDA_LAUNCH_BLOCKING=1 and now receive this rather esoteric error: max_unpooling3d_forward_kernel failed with error code 0. What does error code 0 stand for? Also, this only happens after 3 successful forward and backward passes through the network.

Since I changed a number of things in my training loop since the last working version, without touching the network itself, a hint on where to look for the bug would really help.

This is surprising indeed.
Are you running out of memory? Does reducing the batch size help?
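One quick way to check how close you are to the limit is to print the allocator statistics right before the failing iteration. A minimal sketch (the helper name `report_gpu_memory` is just for illustration; it guards against CPU-only machines):

```python
import torch

def report_gpu_memory() -> str:
    # torch.cuda.memory_allocated / memory_reserved report the caching
    # allocator's usage in bytes; guard so this also runs without a GPU.
    if not torch.cuda.is_available():
        return "no CUDA device available"
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    return f"allocated: {alloc:.1f} MiB, reserved: {reserved:.1f} MiB"

print(report_gpu_memory())
```

If the allocated memory grows every iteration, something in the loop is holding onto tensors (e.g. accumulating a loss without `.item()`).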

When I had low memory issues in the past I would get a much clearer “out of memory” error.

@ptrblck any idea of what could be causing this?

What do you mean by “memory error”?
Was it an out of memory error or some memory access violation?
Could you post the complete stack trace and, if possible, a code snippet to reproduce this issue, so that we can debug it?

Hi,

How does the backpropagation of the max unpooling layer work?

The nn.MaxUnpool2d layer will route the gradients back to the input positions selected by the pooling indices, as seen here:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, return_indices=True)

x = torch.randn(1, 1, 4, 4, requires_grad=True)
print(x)
# tensor([[[[-0.0680, -0.7748, -0.1858,  0.4980],
#           [-1.9051, -1.8637, -0.3653,  1.4009],
#           [-0.5979,  0.5564,  1.4532,  0.9443],
#           [ 0.2864, -0.3723, -0.9621,  1.0871]]]], requires_grad=True)

act, idx = pool(x)
print(act)
# tensor([[[[-0.0680,  1.4009],
#           [ 0.5564,  1.4532]]]], grad_fn=<MaxPool2DWithIndicesBackward0>)
print(idx)
# tensor([[[[ 0,  7],
#           [ 9, 10]]]])

act.retain_grad()
unpool = nn.MaxUnpool2d(2)

out = unpool(act, idx)
print(out)
# tensor([[[[-0.0680,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  1.4009],
#           [ 0.0000,  0.5564,  1.4532,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000]]]],
#        grad_fn=<MaxUnpool2DBackward0>)

out.mean().backward()
print(act.grad)
# tensor([[[[0.0625, 0.0625],
#           [0.0625, 0.0625]]]])
print(x.grad)
# tensor([[[[0.0625, 0.0000, 0.0000, 0.0000],
#           [0.0000, 0.0000, 0.0000, 0.0625],
#           [0.0000, 0.0625, 0.0625, 0.0000],
#           [0.0000, 0.0000, 0.0000, 0.0000]]]])

Thank you very much.