Loss.backward() failure: “ones needs to be contiguous”

Hi,
I get the following error when I call loss.backward():
RuntimeError: ones needs to be contiguous

I suppose it means that the tensors I create with torch.ones need to be contiguous, so I modified my code like this:

self.sigma = nn.Parameter(np.log(init_length_scale) *torch.ones(self.in_channels), requires_grad=True).contiguous()

density = torch.ones(batch_size, n_in, 1).to(self.device).contiguous()

It still doesn’t work. Could somebody please tell me whether I have put contiguous() in the wrong places, or whether I am misreading the error message? Also, how should I go about locating problems like this?

Many thanks in advance
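A quick check, offered as a sketch, suggests why adding .contiguous() to these two lines cannot change anything: a freshly created torch.ones tensor is already contiguous, so .contiguous() returns the same storage. It also shows the safer placement: call .contiguous() on the tensor before wrapping it in nn.Parameter, since calling it on the Parameter can, whenever a copy is made, hand back a plain Tensor that the module would not register. The value of init_length_scale is hypothetical, and math.log stands in for np.log.

```python
import math

import torch
import torch.nn as nn

in_channels = 8
init_length_scale = 0.5  # hypothetical value, for illustration only

# torch.ones allocates a fresh, densely packed tensor, so it is already
# contiguous and .contiguous() hands back the same storage without copying.
t = torch.ones(in_channels)
print(t.is_contiguous())                          # True
print(t.contiguous().data_ptr() == t.data_ptr())  # True

# If .contiguous() is wanted at all, call it on the tensor *before* wrapping
# it in nn.Parameter, so the module attribute stays a Parameter.
# (math.log stands in for np.log here.)
sigma = nn.Parameter(math.log(init_length_scale) * torch.ones(in_channels).contiguous())
print(isinstance(sigma, nn.Parameter), sigma.requires_grad)  # True True
```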

Hi,

Can you enable anomaly detection to see which forward op is responsible for this please?
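A minimal sketch of enabling anomaly detection around one training step. The toy model and input sizes are placeholders, not the model from this thread. With anomaly mode on, autograd records the traceback of every forward op, and if a backward op raises, it prints a “Traceback of forward call that caused the error” pointing at the forward line responsible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder model; substitute the real network here.
model = nn.Sequential(nn.Conv2d(8, 8, kernel_size=5, stride=2, padding=2), nn.ReLU())
x = torch.randn(2, 8, 16, 16)

# torch.autograd.set_detect_anomaly(True) switches this on globally; the
# context manager limits it to a single forward/backward pass.
with torch.autograd.detect_anomaly():
    out = model(x)            # the traceback of each forward op is recorded
    loss = out.pow(2).mean()
    loss.backward()           # if a backward op fails, its forward traceback is printed
```

Anomaly mode slows training noticeably, so it is best enabled only while debugging.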

Oh, that’s really useful! The bug is starting to make some sense: it might be a convolution channel mismatch in the skip connections. Thank you so much @albanD; otherwise I think I would have been stuck checking unrelated code and making no progress at all.


Can you print the full info about the convolution inputs/weight/bias? This is most likely a bug on our side, I’m afraid.

print("Info for t.")
print("size: ", t.size())
print("stride: ", t.stride())
print("offset: ", t.storage_offset())
print("device: ", t.device)
print("layout: ", t.layout)

Thanks!
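The print statements above can be wrapped in a small helper and applied to the input, weight and bias of a convolution in one go. This is a sketch: the helper name tensor_info is hypothetical, it additionally reports is_contiguous(), and the ConvTranspose2d settings simply mirror l10 from the UNet posted below.

```python
import torch
import torch.nn as nn

def tensor_info(name, t):
    """Print (and return) the layout details asked for above, plus contiguity."""
    info = {
        "size": tuple(t.size()),
        "stride": t.stride(),          # the Tensor's stride, not the layer's stride
        "offset": t.storage_offset(),
        "device": t.device,
        "layout": t.layout,
        "contiguous": t.is_contiguous(),
    }
    print("Info for %s." % name)
    for key, value in info.items():
        print(key + ": ", value)
    return info

conv = nn.ConvTranspose2d(16, 8, kernel_size=5, stride=2, padding=2, output_padding=1)
x = torch.randn(2, 16, 12, 4)
infos = {name: tensor_info(name, t)
         for name, t in [("input", x), ("weight", conv.weight), ("bias", conv.bias)]}
```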

Hi,
this is the UNet structure I adopted; the forward pass works fine, though:

import torch
import torch.nn as nn
import torch.nn.functional as F


def pad_concat(t1, t2):
    """Concatenate the activations of two layers channel-wise, first
    reflect-padding the layer with fewer points so the spatial sizes match.

    Args:
        t1 (tensor): Activations from the first layer of shape `(batch, c1, n1, m1)`.
        t2 (tensor): Activations from the second layer of shape `(batch, c2, n2, m2)`.

    Returns:
        tensor: Concatenated activations of both layers of shape
            `(batch, c1 + c2, max(n1, n2), max(m1, m2))`.
    """
    # Height (dim 2): pad the shorter tensor; an odd difference puts the
    # extra row at the bottom, i.e. ((padding - 1) / 2, (padding + 1) / 2).
    if t1.shape[2] > t2.shape[2]:
        padding = t1.shape[2] - t2.shape[2]
        t2 = F.pad(t2, (0, 0, padding // 2, padding - padding // 2), 'reflect').contiguous()
    elif t2.shape[2] > t1.shape[2]:
        padding = t2.shape[2] - t1.shape[2]
        t1 = F.pad(t1, (0, 0, padding // 2, padding - padding // 2), 'reflect').contiguous()

    # Width (dim 3): same rule, with the extra column on the right.
    if t1.shape[3] > t2.shape[3]:
        padding = t1.shape[3] - t2.shape[3]
        t2 = F.pad(t2, (padding // 2, padding - padding // 2, 0, 0), 'reflect').contiguous()
    elif t2.shape[3] > t1.shape[3]:
        padding = t2.shape[3] - t1.shape[3]
        t1 = F.pad(t1, (padding // 2, padding - padding // 2, 0, 0), 'reflect').contiguous()

    return torch.cat([t1, t2], dim=1)


class UNet(nn.Module):
    """Large convolutional architecture, adapted to 2D from the 1d experiments
    in the paper. This is a 10-layer residual network with skip connections
    implemented by concatenation.

    Args:
        in_channels (int, optional): Number of channels on the input to
            network. Defaults to 8.
    """

    def __init__(self, in_channels=8):
        super(UNet, self).__init__()
        self.activation = nn.ReLU()
        self.in_channels = in_channels
        self.out_channels = 16
        self.num_halving_layers = 6

        self.l1 = nn.Conv2d(in_channels=self.in_channels,
                            out_channels=self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l2 = nn.Conv2d(in_channels=self.in_channels,
                            out_channels=2 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l3 = nn.Conv2d(in_channels=2 * self.in_channels,
                            out_channels=2 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l4 = nn.Conv2d(in_channels=2 * self.in_channels,
                            out_channels=4 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)
        self.l5 = nn.Conv2d(in_channels=4 * self.in_channels,
                            out_channels=8 * self.in_channels,
                            kernel_size=5, stride=2, padding=2)

        for layer in [self.l1, self.l2, self.l3, self.l4, self.l5]:
            init_layer_weights(layer)

        self.l6 = nn.ConvTranspose2d(in_channels=8 * self.in_channels,
                                     out_channels=4 * self.in_channels,
                                     kernel_size=5, stride=2, padding=2,
                                     output_padding=1)
        self.l7 = nn.ConvTranspose2d(in_channels=8 * self.in_channels,
                                     out_channels=2 * self.in_channels,
                                     kernel_size=5, stride=2, padding=2,
                                     output_padding=1)
        self.l8 = nn.ConvTranspose2d(in_channels=4 * self.in_channels,
                                     out_channels=2 * self.in_channels,
                                     kernel_size=5, stride=2, padding=2,
                                     output_padding=1)
        self.l9 = nn.ConvTranspose2d(in_channels=4 * self.in_channels,
                                     out_channels=self.in_channels,
                                     kernel_size=5, stride=2, padding=2,
                                     output_padding=1)
        self.l10 = nn.ConvTranspose2d(in_channels=2 * self.in_channels,
                                      out_channels=self.in_channels,
                                      kernel_size=5, stride=2, padding=2,
                                      output_padding=1)

        for layer in [self.l6, self.l7, self.l8, self.l9, self.l10]:
            init_layer_weights(layer)

    def forward(self, x):
        """Forward pass through the convolutional structure.

        Args:
            x (tensor): Inputs of shape `(batch, in_channels, h, w)`.

        Returns:
            tensor: Outputs of shape `(batch, 2 * in_channels, h_out, w_out)`.
        """
        h1 = self.activation(self.l1(x))
        self.print_conv_info(x, self.l1)

        h2 = self.activation(self.l2(h1))
        self.print_conv_info(h1, self.l2)

        h3 = self.activation(self.l3(h2))
        self.print_conv_info(h2, self.l3)

        h4 = self.activation(self.l4(h3))
        self.print_conv_info(h3, self.l4)

        h5 = self.activation(self.l5(h4))
        self.print_conv_info(h4, self.l5)

        h6 = self.activation(self.l6(h5))
        self.print_conv_info(h5, self.l6)
        h6 = pad_concat(h4, h6)

        h7 = self.activation(self.l7(h6))
        self.print_conv_info(h6, self.l7)
        h7 = pad_concat(h3, h7)

        h8 = self.activation(self.l8(h7))
        self.print_conv_info(h7, self.l8)
        h8 = pad_concat(h2, h8)

        h9 = self.activation(self.l9(h8))
        self.print_conv_info(h8, self.l9)
        h9 = pad_concat(h1, h9)

        h10 = self.activation(self.l10(h9))
        self.print_conv_info(h9, self.l10)
        output = pad_concat(x, h10)

        return output

    def print_conv_info(self, input, layer):
        print("Info for %s."%layer)
        print("stride:", layer.stride)
        print("input size: ", input.size())
        print("input offset: ", input.storage_offset())
        print("input device: ", input.device)
        print("input layout:", input.layout)

and here are the results. I hope it’s not too complicated:

Info for Conv2d(8, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 8, 192, 63])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(8, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 8, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(16, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 16, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 16, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (2, 2)
input size: torch.Size([64, 32, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(64, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 64, 6, 2])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(64, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 64, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(32, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 32, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(32, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 32, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided

Info for ConvTranspose2d(16, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (2, 2)
input size: torch.Size([64, 16, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided

BTW, I suspect it is related to the padding operation. I am now padding the input x first so that its width and height are even, to see whether that fixes it.
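A sketch of that pre-padding step, assuming reflect padding (as pad_concat uses) and an extra row/column at the bottom/right when the amount is odd. The helper pad_to_multiple is hypothetical. Padding only to an even size guarantees the first halving is exact; with five stride-2 layers (l1 to l5), padding H and W to a multiple of 2 ** 5 = 32 keeps every halving exact.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple):
    """Reflect-pad H and W of a (batch, C, H, W) tensor up to a multiple of `multiple`.

    Hypothetical helper; the extra row/column goes at the bottom/right when the
    amount of padding is odd.
    """
    h, w = x.shape[-2:]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    # For a 4D input, F.pad takes (left, right, top, bottom).
    return F.pad(x, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
                 mode="reflect")

x = torch.randn(2, 8, 192, 63)      # same H, W as the first conv input in the log
even = pad_to_multiple(x, 2)        # torch.Size([2, 8, 192, 64])
exact = pad_to_multiple(x, 2 ** 5)  # 32 divides both H and W afterwards
```

Reflect padding needs the pad amount to be smaller than the padded dimension, so very small inputs would need mode="constant" instead.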


No, it’s not :frowning: I made sure that pad_concat only concatenates (no padding is applied), and I output h10 directly without concatenating it with x, to avoid any size mismatch.

The bug is still the same. I am wondering whether it is caused by the modules before and after the UNet, since the final error says “RuntimeError: ones needs to be contiguous”, and I do have learnable parameters like this:

self.sigma = nn.Parameter(np.log(init_length_scale) * torch.ones(self.in_channels).contiguous(), requires_grad=True)

Honestly, I am not sure what this “ones” refers to.

The stride I was asking for there is the Tensor’s stride, not the layer’s stride :wink: The Tensor stride is what makes a Tensor “non-contiguous”, which is why it is important.
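A small illustration of the Tensor stride and how it relates to contiguity: the stride says how many elements to skip in memory to move one step along each dimension, and a view such as a transpose changes the strides without moving any data, which is what makes it non-contiguous.

```python
import torch

x = torch.arange(12).view(3, 4)
print(x.stride(), x.is_contiguous())    # (4, 1) True

# Transposing swaps the strides without moving any data,
# so the elements are no longer laid out row by row in memory.
xt = x.t()
print(xt.stride(), xt.is_contiguous())  # (1, 4) False

# .contiguous() copies the data into a fresh row-major buffer.
xc = xt.contiguous()
print(xc.stride(), xc.is_contiguous())  # (3, 1) True
```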

Oh I see, here’s the result. Thanks again!

Info for Conv2d(8, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (96768, 12096, 63, 1)
input size: torch.Size([64, 8, 192, 63])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(8, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (24576, 3072, 32, 1)
input size: torch.Size([64, 8, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(16, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (12288, 768, 16, 1)
input size: torch.Size([64, 16, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (3072, 192, 8, 1)
input size: torch.Size([64, 16, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)).
stride: (1536, 48, 4, 1)
input size: torch.Size([64, 32, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(64, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (768, 12, 2, 1)
input size: torch.Size([64, 64, 6, 2])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(64, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (3072, 48, 4, 1)
input size: torch.Size([64, 64, 12, 4])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(32, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (6144, 192, 8, 1)
input size: torch.Size([64, 32, 24, 8])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(32, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (24576, 768, 16, 1)
input size: torch.Size([64, 32, 48, 16])
input offset: 0
input device: cuda:7
input layout: torch.strided
Info for ConvTranspose2d(16, 8, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), output_padding=(1, 1)).
stride: (49152, 3072, 32, 1)
input size: torch.Size([64, 16, 96, 32])
input offset: 0
input device: cuda:7
input layout: torch.strided
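One way to read this log: a tensor laid out densely in row-major order has strides equal to the running product of the trailing sizes. A sketch of that computation, checked against three size/stride pairs copied from the log, indicates that these conv inputs were all contiguous. (PyTorch’s own is_contiguous() additionally ignores the stride of size-1 dimensions, which this strict comparison does not.)

```python
def contiguous_strides(size):
    """Row-major (C-contiguous) strides, in elements, for a tensor of this size."""
    strides = []
    step = 1
    for dim in reversed(size):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

# Size/stride pairs copied from the log above.
print(contiguous_strides((64, 8, 192, 63)) == (96768, 12096, 63, 1))  # True
print(contiguous_strides((64, 64, 12, 4)) == (3072, 48, 4, 1))        # True
print(contiguous_strides((64, 32, 24, 8)) == (6144, 192, 8, 1))       # True
```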

I am getting the same error when using deconvolutions (ConvTranspose1d) with the following architecture:

ModuleList(
  (0): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(2,), stride=(2,), output_padding=(1,))
  )
  (1): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(2,), stride=(2,), output_padding=(1,))
  )
  (2): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(3,), stride=(2,), output_padding=(1,))
  )
  (3): Sequential(
    (0): ConvTranspose1d(512, 512, kernel_size=(3,), stride=(2,), output_padding=(1,))
  )

Could it be the output_padding?

It’s definitely the output padding, since everything works if I set it to 0.

Yes! @alexeib, it worked for me too. I set output_padding=0 and used F.pad to align the sizes. Thanks a lot for the suggestion. Still, I will wait for a possible explanation before settling on this workaround.
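A sketch of that workaround for a layer configured like l6, assuming zero padding added at the bottom/right (the post does not say which padding mode or side was used). For ConvTranspose2d the output height is (H - 1) * stride - 2 * padding + kernel_size + output_padding, so with kernel_size=5, stride=2, padding=2 an input of height 6 gives 12 with output_padding=1 and 11 with output_padding=0; padding one row and one column restores the original shape. Note that the two versions are not numerically identical: the extra row/column is zero here instead of a computed convolution output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 64, 6, 2, requires_grad=True)

# Original setting: (6 - 1) * 2 - 2 * 2 + 5 + 1 = 12 rows, (2 - 1) * 2 - 4 + 5 + 1 = 4 columns.
with_op = nn.ConvTranspose2d(64, 32, kernel_size=5, stride=2, padding=2, output_padding=1)
# Workaround: output_padding=0 yields 11 x 3, one row and one column short ...
no_op = nn.ConvTranspose2d(64, 32, kernel_size=5, stride=2, padding=2, output_padding=0)

y_ref = with_op(x)
# ... so zero-pad one column on the right and one row at the bottom.
y = F.pad(no_op(x), (0, 1, 0, 1))
print(y_ref.shape, y.shape)  # both torch.Size([2, 32, 12, 4])

y.sum().backward()           # gradients flow through F.pad and the conv
```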

I created an issue here:

Maybe you can provide a repro there for the PyTorch team to look at?

Can you run your code with TORCH_SHOW_CPP_STACKTRACES=1 set, to see which exact conv function is responsible? Looking at the code for ConvTranspose doesn’t really show anything suspicious.

@alexeib, we usually wait until we know what the issue is and are sure it is a bug in PyTorch before opening an issue, to avoid “empty” issues that don’t contain concrete information.

Hi there, here’s the result with TORCH_SHOW_CPP_STACKTRACES=1 set before python main.py. I think it starts at l10 (h10 = self.activation(self.l10(h9))), which is the first convolution layer reached when the loss is backpropagated:

Warning: Traceback of forward call that caused the error:
  File "NP_PROV_train.py", line 55, in <module>
    mean, var = convcnp(x_context.to(device), y_context.to(device), x_target.to(device))
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuesongwang/PycharmProject/Fluctuation_Resistant_Neural_Process/module/NP_PROV.py", line 508, in forward
    h = self.rho(h)
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuesongwang/PycharmProject/Fluctuation_Resistant_Neural_Process/module/NP_PROV.py", line 246, in forward
    h10 = self.activation(self.l10(h9))
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 778, in forward
    output_padding, self.groups, self.dilation)
 (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57)
  0%|                                                                                                            | 0/200000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "NP_PROV_train.py", line 58, in <module>
    loss.backward()
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/xuesongwang/venv_torch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: ones needs to be contiguous

I don’t know if this is the same problem, but I just got an autograd error trying to run a small fastai Tabular model. I’d appreciate advice about how to track down the problem.

~/anaconda3/envs/fastai/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    123         retain_graph = create_graph
    124 
--> 125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
    127         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: Found dtype Short but expected Float
Exception raised from compute_types at /Users/distiller/project/conda/conda-bld/pytorch_1595629449223/work/aten/src/ATen/native/TensorIterator.cpp:183 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 169 (0x128c4e199 in libc10.dylib)
frame #1: at::TensorIterator::compute_types(at::TensorIteratorConfig const&) + 3842 (0x121193312 in libtorch_cpu.dylib)
frame #2: at::TensorIterator::build(at::TensorIteratorConfig&) + 618 (0x12119c51a in libtorch_cpu.dylib)
frame #3: at::TensorIterator::TensorIterator(at::TensorIteratorConfig&) + 223 (0x12119c1ff in libtorch_cpu.dylib)
frame #4: at::native::mse_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long) + 410 (0x120fe7f7a in libtorch_cpu.dylib)
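The frames point at mse_loss_backward_out complaining about a Short (int16) tensor where Float was expected. One plausible cause, and it is only an assumption since the model code is not shown, is an int16 target passed to an MSE loss. A minimal sketch of the usual remedy, casting the target to the prediction’s dtype before computing the loss; the tensors here are made up for illustration:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(4, 1, requires_grad=True)              # float32 model output
target = torch.randint(0, 5, (4, 1), dtype=torch.int16)   # hypothetical "Short" target

# Cast the target to the prediction's dtype so mse_loss and its backward
# both operate on matching floating-point dtypes.
loss = F.mse_loss(pred, target.to(pred.dtype))
loss.backward()
print(pred.grad.dtype)  # torch.float32
```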

Hi @urukhaim, did you try the detect_anomaly approach suggested by @albanD? It may help you find which operation is responsible.

@xuesongwang, is it possible to create a toy example and add it to the issue above?

Hi @alexeib, I tried writing demo code using only the UNet architecture, with random inputs and outputs and output_padding set to 1. However, it worked and I got the gradients. Hence, I think the problem is related to the modules before and after the UNet. If that’s the case, I would probably have to release all my code :frowning:, sorry about that.
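For the issue, a toy repro of this kind might look like the sketch below: a hypothetical two-layer stand-in that uses the UNet’s conv settings (kernel_size=5, stride=2, padding=2, output_padding=1), random inputs and targets, and the GPU when one is available, since the original error occurred on CUDA. If a script like this runs cleanly, the next step is to add the surrounding modules back one at a time until the error reappears.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"  # the thread's error was on CUDA

# Hypothetical two-layer stand-in using the same conv settings as the UNet.
down = nn.Conv2d(8, 16, kernel_size=5, stride=2, padding=2).to(device)
up = nn.ConvTranspose2d(16, 8, kernel_size=5, stride=2, padding=2,
                        output_padding=1).to(device)

x = torch.randn(2, 8, 32, 16, device=device)       # random input
target = torch.randn(2, 8, 32, 16, device=device)  # random target

out = up(F.relu(down(x)))   # 32x16 -> 16x8 -> 32x16
loss = F.mse_loss(out, target)
loss.backward()
```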

If the error comes from here, does running just the previous layer and this one (l10) trigger the same error? :slight_smile:

Sorry, I am a bit confused here. l10 belongs to this UNet structure, and before the UNet there is a linear module plus one learnable parameter. By “previous layer”, do you mean that I feed my data through all the layers before l10, then compute the loss directly from the output of l10 and backpropagate?
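One possible reading of the suggestion, offered as a sketch rather than what was necessarily meant: run the model up to h9 as usual, then detach h9 and make it a leaf, so that backward runs only through l10. If the error still appears, l10 itself is implicated; if not, the problem lies in earlier layers. The features module below is a hypothetical stand-in for everything before l10, and the batch size is reduced for speed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in for the linear module, the learnable parameter and l1..l9.
features = nn.Conv2d(8, 16, kernel_size=3, padding=1)
l10 = nn.ConvTranspose2d(16, 8, kernel_size=5, stride=2, padding=2, output_padding=1)

x = torch.randn(2, 8, 96, 32)
h9 = features(x)            # same shape as h9 in the log: (batch, 16, 96, 32)

# Cut the graph just before l10: backward now stops at h9_leaf,
# so only l10's backward is exercised.
h9_leaf = h9.detach().requires_grad_()
loss = l10(h9_leaf).pow(2).mean()
loss.backward()
print(h9_leaf.grad.shape)   # torch.Size([2, 16, 96, 32])
print(features.weight.grad) # None: the earlier layers were not backpropagated
```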