Channels last question

I’m looking at https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html

Specifically, this example:

    input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, device="cuda", requires_grad=True)
    model = torch.nn.Conv2d(8, 5, 3).cuda().float()

    input = input.contiguous(memory_format=torch.channels_last)
    model = model.to(memory_format=torch.channels_last) # Module parameters need to be Channels Last

    out = model(input)
    print(out.is_contiguous(memory_format=torch.channels_last)) # Outputs: True

If I comment out the line that converts the model to channels_last format, I expect it to fail, but it does not.
How come an input in channels_last format can be convolved with weight filters in channels_first format without a dimension-mismatch error:

>>> model.weight.shape
torch.Size([5, 8, 3, 3])
>>> model.weight.is_contiguous(memory_format=torch.channels_last)
False
>>> input.shape
torch.Size([2, 8, 4, 4])
>>> input.is_contiguous(memory_format=torch.channels_last)
True

My understanding is that the input’s underlying memory layout in this case corresponds to [2, 4, 4, 8], and therefore the weight’s input-channels dim (8) should not match the input’s channels dim (4).
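For reference, the strides do show that the two layouts are physically different (channels_last for the input, default contiguous for the weight):

>>> input.stride()
(128, 1, 32, 8)
>>> model.weight.stride()
(72, 9, 3, 1)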

@VitalyFedyunin ?

I think the input would be transformed to channels_last internally (I asked the same question recently and discussed it with one of the original authors), but let’s see if Vitaly can confirm it.

This is my understanding as well. It looks like a bug to me, because if the input is being permuted to channels_last but the weight remains channels_first, then the code effectively becomes:

input = torch.randint(1, 10, (2, 4, 4, 8), dtype=torch.float32, device="cuda", requires_grad=True)
model = torch.nn.Conv2d(8, 5, 3).cuda().float()
out = model(input)

And it obviously fails because of the dim mismatch, but my example above runs fine even when weight remains in channels_first format. Something is not right here.

No, I don’t think it’s a bug, since PyTorch internally makes sure that the input and the parameters use the same memory layout.
Also, the result is the same:

input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, device="cuda", requires_grad=True)
model = torch.nn.Conv2d(8, 5, 3).cuda().float()

input = input.contiguous(memory_format=torch.channels_last)
model = model.to(memory_format=torch.channels_last) # Module parameters need to be Channels Last

out_ref = model(input)
print(out_ref.is_contiguous(memory_format=torch.channels_last))
> True

model.to(memory_format=torch.contiguous_format)
out = model(input)

print(out.is_contiguous(memory_format=torch.channels_last))
> True

print((out_ref - out).abs().max())
> tensor(0., device='cuda:0', grad_fn=<MaxBackward1>)
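For completeness, you could also check whether the stored parameter itself was touched by the forward pass; my understanding is that the op converts a temporary copy of the weight, so I’d expect this to still print False after the last call above:

print(model.weight.is_contiguous(memory_format=torch.channels_last))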

What does it mean? To me it seems that when the layouts differ, the op should fail with a “memory layout mismatch” error message. Otherwise, how does PyTorch know which layout is the correct one? Does it convert the weight to channels_last to match the input, or does it convert the input to channels_first to match the weight? If it’s always the former, what’s the point of the model = model.to(memory_format=torch.channels_last) line?

This might have been one possible approach, but it would potentially break the handling of ambiguous memory layouts, such as 1x1 kernels.
The current implementation checks whether the suggested memory format of the input or the weight is channels_last (code) and uses it, if applicable.
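In Python terms, the decision is roughly the following (a simplified sketch of the idea, not the actual C++ implementation; the real check also special-cases ambiguous layouts such as 1x1 kernels):

    import torch
    import torch.nn.functional as F

    def conv2d_resolving_layout(input, weight, bias=None, **kwargs):
        # Simplified sketch: if either the input or the weight looks channels_last,
        # run the whole convolution in channels_last; otherwise stay contiguous.
        use_cl = (input.is_contiguous(memory_format=torch.channels_last)
                  or weight.is_contiguous(memory_format=torch.channels_last))
        fmt = torch.channels_last if use_cl else torch.contiguous_format
        return F.conv2d(input.contiguous(memory_format=fmt),
                        weight.contiguous(memory_format=fmt),
                        bias, **kwargs)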

I see. So does this mean that, in the tutorial example, it’s not necessary to set the input to channels_last? It’s redundant, because setting the model to channels_last forces it anyway. Is there any situation where we would want to set the inputs in addition to setting the model?

I would always set the memory layout explicitly rather than rely on the internal workflow of fixing ambiguous memory layouts, and I would stick to the tutorial.
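I.e. keep both conversions from the tutorial:

    input = input.contiguous(memory_format=torch.channels_last)  # convert activations
    model = model.to(memory_format=torch.channels_last)          # convert parameters
    out = model(input)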

Makes sense, thank you Piotr. I have to say, this behavior will cause confusion, because people will forget to set one or the other, see that it still works, and will wonder whether it actually worked as intended. At the very least I’d mention something about this in the tutorial.


Is it possible to run a Conv2d on an NHWC tensor (not NCHW)? It seems that if I put the channels explicitly at the end in @michaelklachko’s example and keep the channels_last behavior, it does not work.

Not sure what you mean by “put the channels explicitly at the end”. If you mean you manually transposed the tensor so that the channels dim is the last one, then yes, it will fail, because the channels_last feature is designed to work with tensors in their default NCHW shapes. The transformation happens internally.
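To make the distinction concrete, here is a quick check (shape vs. strides):

    import torch

    x = torch.randn(2, 8, 4, 4).contiguous(memory_format=torch.channels_last)
    print(x.shape)     # torch.Size([2, 8, 4, 4]) -- the logical NCHW shape is unchanged
    print(x.stride())  # (128, 1, 32, 8) -- the data is laid out NHWC in memory

    # Manually permuting to NHWC is a different thing: the logical shape changes,
    # so Conv2d sees 4 channels in dim 1 instead of 8 and rejects it.
    y = x.permute(0, 2, 3, 1)
    print(y.shape)     # torch.Size([2, 4, 4, 8])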

Like I said, this is going to be confusing to people until the docs and error messages are improved.

Yes, I agree, and yes, that is what I meant. It would be easier to allow the Conv2d operator to run on NHWC tensors.