Loss.backward() failure due to contiguous issue

Hi,
I got this error when backpropagating the loss: "RuntimeError: ones needs to be contiguous".

I suppose it indicates that "ones" needs to be contiguous, so I modified my code as follows:

self.sigma = nn.Parameter(np.log(init_length_scale) * torch.ones(self.in_channels), requires_grad=True).contiguous()

density = torch.ones(batch_size, n_in, 1).to(self.device).contiguous()

They still didn't work. Could somebody please tell me whether I have put contiguous() in the wrong places, or whether I am misreading the debug info? How should I go about locating such problems, by the way?

Many thanks in advance

Hi,

Can you enable anomaly detection to see which forward op is responsible for this please?
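
For reference, a minimal sketch of how to turn it on (the context-manager form torch.autograd.detect_anomaly() works as well):

import torch

# Enable anomaly detection globally; it slows things down, so only use it while debugging.
# With it on, a backward error also prints a traceback of the forward op that created the failing node.
torch.autograd.set_detect_anomaly(True)

# ... build the model and compute the loss as usual ...
# loss.backward()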


Oh, that's really useful!!! The bug is starting to make some sense: it might be a convolution channel mismatch in the skip layers. Thank you so much @albanD; otherwise I think I would just be stuck checking unrelated code and making no progress at all.


Can you print the full info about the convolution inputs/weight/bias? This is most likely a bug on our side I’m afraid.

print("Info for t.")
print("size: ", t.size())
print("stride: ", t.stride())
print("offset: ", t.storage_offset())
print("device: ", t.device)
print("layout: ", t.layout)

Thanks!

Hi,
this is the UNet structure I adopted; the forward pass works fine, though.

def pad_concat(t1, t2):
    """Concat the activations of two layer channel-wise by padding the layer
    with fewer points with zeros.

    Args:
        t1 (tensor): Activations from first layers of shape `(batch, c1, n1, m1)`.
        t2 (tensor): Activations from second layers of shape `(batch,c2, n2, m2)`.

    Returns:
        tensor: Concatenated activations of both layers of shape
            `(batch, c1 + c2, max(n1, n2), max(m1, m2))`.
    """
    if t1.shape[2] > t2.shape[2]:
        padding = t1.shape[2] - t2.shape[2]
        if padding % 2 == 0:  # Even difference
            t2 = F.pad(t2, (0, 0, int(padding / 2), int(padding / 2)), 'reflect').contiguous()
        else:  # Odd difference
            t2 = F.pad(t2, (0, 0, int((padding - 1) / 2), int((padding + 1) / 2)),
                       'reflect').contiguous()
    elif t2.shape[2] > t1.shape[2]:
        padding = t2.shape[2] - t1.shape[2]
        if padding % 2 == 0:  # Even difference
            t1 = F.pad(t1, (0, 0, int(padding / 2), int(padding / 2)), 'reflect').contiguous()
        else:  # Odd difference
            t1 = F.pad(t1, (0, 0, int((padding - 1) / 2), int((padding + 1) / 2)),
                       'reflect').contiguous()

    # another dimension
    if t1.shape[3] > t2.shape[3]:
        padding = t1.shape[3] - t2.shape[3]
        if padding % 2 == 0:  # Even difference
            t2 = F.pad(t2, (int(padding / 2), int(padding / 2), 0, 0), 'reflect').contiguous()
        else:  # Odd difference
            t2 = F.pad(t2, (int((padding - 1) / 2), int((padding + 1) / 2), 0, 0),
                       'reflect').contiguous()
    elif t2.shape[3] > t1.shape[3]:
        padding = t2.shape[3] - t1.shape[3]
        if padding % 2 == 0:  # Even difference
            t1 = F.pad(t1, (int(padding / 2), int(padding / 2), 0, 0), 'reflect').contiguous()
        else:  # Odd difference
            t1 = F.pad(t1, (int((padding - 1) / 2), int((padding + 1) / 2), 0, 0),
                       'reflect').contiguous()

    return torch.cat([t1, t2], dim=1)
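
# (Illustrative usage, not part of the original code: for example, given
#  t1 of shape (1, 2, 5, 4) and t2 of shape (1, 3, 7, 6), pad_concat
#  reflect-pads t1 up to (1, 2, 7, 6) and returns a (1, 5, 7, 6) tensor.)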


class UNet(nn.Module):
    """Large convolutional architecture from 1d experiments in the paper.
    This is a 12-layer residual network with skip connections implemented by
    concatenation.

    Args:
        in_channels (int, optional): Number of channels on the input to
            network. Defaults to 8.
    """

    def __init__(self, in_channels=8):
        super(UNet, self).__init__()
        self.activation = nn.ReLU()
        self.in_channels = in_channels
        self.out_channels = 16
        self.num_halving_layers = 6

        self.l1 = nn.Conv2d(in_channels=self.in_channels,
                            out_channels=self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l2 = nn.Conv2d(in_channels=self.in_channels,
                            out_channels=2 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l3 = nn.Conv2d(in_channels=2 * self.in_channels,
                            out_channels=2 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l4 = nn.Conv2d(in_channels=2 * self.in_channels,
                            out_channels=4 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l5 = nn.Conv2d(in_channels=4 * self.in_channels,
                            out_channels=8 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)

        for layer in [self.l1, self.l2, self.l3, self.l4, self.l5]:
            init_layer_weights(layer)

        self.l6 = nn.ConvTranspose2d(in_channels=8 * self.in_channels,
                                     out_channels=4 * self.in_channels,
                                     kernel_size=5, stride=2, padding=2,
                                     output_padding=1)
        self.l7 = nn.ConvTranspose2d(in_channels=8 * self.in_channels,
                                     out_channels=2 * self.in_channels,
                                     kernel_size=5, stride=2, padding=2,
                                     output_padding=1)
        self.l8 = nn.ConvTranspose2d(in_channels=4 * self.in_channels,
                                      out_channels=2 * self.in_channels,
                                      kernel_size=5, stride=2, padding=2,
                                      output_padding=1)
        self.l9 = nn.ConvTranspose2d(in_channels=4 * self.in_channels,
                                      out_channels=self.in_channels,
                                      kernel_size=5, stride=2, padding=2,
                                      output_padding=1)
        self.l10 = nn.ConvTranspose2d(in_channels=2 * self.in_channels,
                                      out_channels=self.in_channels,
                                      kernel_size=5, stride=2, padding=2,
                                      output_padding=1)

        for layer in [self.l6, self.l7, self.l8, self.l9, self.l10]:
            init_layer_weights(layer)

    def forward(self, x):
        """Forward pass through the convolutional structure.

        Args:
            x (tensor): Inputs of shape `(batch, n_in, in_channels)`.

        Returns:
            tensor: Outputs of shape `(batch, n_out, out_channels)`.
        """
        h1 = self.activation(self.l1(x))
        self.print_conv_info(x, self.l1)

        h2 = self.activation(self.l2(h1))
        self.print_conv_info(h1, self.l2)

        h3 = self.activation(self.l3(h2))
        self.print_conv_info(h2, self.l3)

        h4 = self.activation(self.l4(h3))
        self.print_conv_info(h3, self.l4)

        h5 = self.activation(self.l5(h4))
        self.print_conv_info(h4, self.l5)

        h6 = self.activation(self.l6(h5))
        self.print_conv_info(h5, self.l6)
        h6 = pad_concat(h4, h6)

        h7 = self.activation(self.l7(h6))
        self.print_conv_info(h6, self.l7)
        h7 = pad_concat(h3, h7)

        h8 = self.activation(self.l8(h7))
        self.print_conv_info(h7, self.l8)
        h8 = pad_concat(h2, h8)

        h9 = self.activation(self.l9(h8))
        self.print_conv_info(h8, self.l9)
        h9 = pad_concat(h1, h9)

        h10 = self.activation(self.l10(h9))
        self.print_conv_info(h9, self.l10)
        output = pad_concat(x, h10)

        return output

    def print_conv_info(self, input, layer):
        print("Info for %s."%layer)
        print("stride:", layer.stride)
        print("input size: ", input.size())
        print("input offset: ", input.storage_offset())
        print("input device: ", input.device)
        print("input layout:", input.layout)

and the results are as follows. I hope it's not too complicated.

Info for Conv2d(8, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 8, 192, 63])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(8, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 8, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(16, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 16, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 16, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 32, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(64, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 64, 6, 2])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(64, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 64, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(32, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 32, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(32, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 32, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(16, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 16, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided

BTW, I suspect it is related to the padding operation. I am now padding the input x first so that its width and height are even numbers, to see whether that helps.


No, it's not :frowning: I made sure that only concatenation is called in the pad_concat function (no padding), and I output h10 directly without concatenating it with x, to avoid size mismatches.

The bug still remains the same. I am wondering whether it is caused by the modules before and after, since the final error says "RuntimeError: ones needs to be contiguous", and I do have learnable parameters like this:

self.sigma = nn.Parameter(np.log(init_length_scale) * torch.ones(self.in_channels).contiguous(), requires_grad=True)

Honestly, I am not sure what this "ones" refers to.

The stride I was looking for there is the Tensor's stride, not the layer's stride :wink: That is what makes a Tensor "non-contiguous", which is why it is important.
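
For anyone else reading, a minimal standalone sketch of what the Tensor stride is and how it relates to contiguity (independent of the model above):

import torch

t = torch.ones(2, 3)
print(t.stride(), t.is_contiguous())   # (3, 1) True  -- row-major, contiguous

u = t.t()                              # transpose: same storage, reordered strides
print(u.stride(), u.is_contiguous())   # (1, 3) False -- a non-contiguous view

v = u.contiguous()                     # copies the data into a fresh row-major layout
print(v.stride(), v.is_contiguous())   # (2, 1) True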

Oh I see, here’s the result, thanks again

Info for Conv2d(8, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (96768, 12096, 63, 1)
input size: torch.Size([64, 8, 192, 63])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(8, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (24576, 3072, 32, 1)
input size: torch.Size([64, 8, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(16, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (12288, 768, 16, 1)
input size: torch.Size([64, 16, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (3072, 192, 8, 1)
input size: torch.Size([64, 16, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (1536, 48, 4, 1)
input size: torch.Size([64, 32, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(64, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (768, 12, 2, 1)
input size: torch.Size([64, 64, 6, 2])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(64, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (3072, 48, 4, 1)
input size: torch.Size([64, 64, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(32, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (6144, 192, 8, 1)
input size: torch.Size([64, 32, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(32, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (24576, 768, 16, 1)
input size: torch.Size([64, 32, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(16, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (49152, 3072, 32, 1)
input size: torch.Size([64, 16, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided

I am having the same error when trying to use deconvolutions with the following architecture:

ModuleList(
  (0): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(2,), stride=(2,), output_padding=(1,))
  )
  (1): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(2,), stride=(2,), output_padding=(1,))
  )
  (2): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(3,), stride=(2,), output_padding=(1,))
  )
  (3): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(3,), stride=(2,), output_padding=(1,))
  )
)

Must be the output_padding?


It's definitely the output padding, since everything works if I set it to 0.

Yes!!! @alexeib it worked for me too. I set output_padding = 0 and used F.pad to align the sizes. Thanks a lot for the suggestion. But I will still wait for a possible explanation before I settle on this workaround.
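
A minimal sketch of what I did (illustrative sizes, not my exact layers): drop output_padding and pad the deconvolution output by one row/column instead, so the spatial size matches what output_padding=1 would have produced.

import torch
import torch.nn as nn
import torch.nn.functional as F

# With output_padding=1 this kind of layer triggered the backward error for me on GPU;
# output_padding=0 plus an explicit F.pad gives the same output size and backprops fine.
deconv = nn.ConvTranspose2d(64, 32, kernel_size=5, stride=2, padding=2, output_padding=0)

x = torch.randn(4, 64, 6, 2, requires_grad=True)
h = deconv(x)               # (4, 32, 11, 3) with output_padding=0
h = F.pad(h, (0, 1, 0, 1))  # (4, 32, 12, 4), the size output_padding=1 would have given
h.sum().backward()          # backward succeeds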

I created an issue here:

Maybe you can provide a repro there for the PyTorch team to look at?

Can you run your code with TORCH_SHOW_CPP_STACKTRACES=1 set to see which exact conv function is responsible? Looking at the code for convtranspose doesn't really show anything suspicious.

@alexeib we usually wait to find out what the issue is and make sure it is a bug in PyTorch before opening an issue, to avoid opening "empty" issues that don't contain concrete information.

Hi there, here's the result with TORCH_SHOW_CPP_STACKTRACES=1 set before python main.py. I think it starts from l10
(h10 = self.activation(self.l10(h9))),
which is the first convolution layer reached when backpropagating the loss:

Warning: Traceback of forward call that caused the error:
  File "NP_PROV_train.py", line 55, in <module>
    mean, var = convcnp(x_context.to(device), y_context.to(device), x_target.to(device))
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuesongwang/PycharmProject/Fluctuation_Resistant_Neural_Process/module/NP_PROV.py", line 508, in forward
    h = self.rho(h)
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuesongwang/PycharmProject/Fluctuation_Resistant_Neural_Process/module/NP_PROV.py", line 246, in forward
    h10 = self.activation(self.l10(h9))
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 778, in forward
    output_padding, self.groups, self.dilation)
 (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57)
  0%|                                                                                                            | 0/200000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "NP_PROV_train.py", line 58, in <module>
    loss.backward()
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: ones needs to be contiguous

I don’t know if this is the same problem, but I just got an autograd error trying to run a small Fastai Tabular model. I’d appreciate advice about how to track down the problem.

~/anaconda3/envs/fastai/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    123         retain_graph = create_graph
    124 
--> 125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
    127         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: Found dtype Short but expected Float
Exception raised from compute_types at /Users/distiller/project/conda/conda-bld/pytorch_1595629449223/work/aten/src/ATen/native/TensorIterator.cpp:183 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 169 (0x128c4e199 in libc10.dylib)
frame #1: at::TensorIterator::compute_types(at::TensorIteratorConfig const&) + 3842 (0x121193312 in libtorch_cpu.dylib)
frame #2: at::TensorIterator::build(at::TensorIteratorConfig&) + 618 (0x12119c51a in libtorch_cpu.dylib)
frame #3: at::TensorIterator::TensorIterator(at::TensorIteratorConfig&) + 223 (0x12119c1ff in libtorch_cpu.dylib)
frame #4: at::native::mse_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long) + 410 (0x120fe7f7a in libtorch_cpu.dylib)

Hi @urukhaim, did you try detect_anomaly as suggested by @albanD? It may help you locate which operation is responsible.
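
Also, the mse_loss_backward frame in your trace suggests the regression target is an integer (Short) tensor while the model output is Float. A minimal sketch of that situation and the usual fix (hypothetical tensors, not your Fastai code):

import torch
import torch.nn.functional as F

pred = torch.randn(8, 1, requires_grad=True)               # model output: float32
target = torch.randint(0, 5, (8, 1), dtype=torch.int16)    # a Short (int16) target

# mse_loss wants a floating-point target matching the prediction's dtype,
# so cast it explicitly before computing the loss.
loss = F.mse_loss(pred, target.float())
loss.backward()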

@xuesongwang is it possible to create a toy example and provide it in the issue above?

Hi @alexeib, I tried to write a demo using only the UNet architecture, with random inputs and outputs and output_padding set to 1. However, it worked and I got the gradients. Hence, I think it is related to the modules before and after the UNet; if that's the case, then I would probably have to release all my code :frowning:, sorry about that.

If that's where it comes from: if you run just the previous layer and this one, l10, does that trigger the same error? :slight_smile:

Sorry, I am a bit confused here. Let's say l10 belongs to this UNet structure, and before it there was a linear module plus one learnable parameter. By "previous layer", do you mean I feed in my data, go through all the layers before l10, and directly use the output of l10 to compute the loss and backpropagate?