input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, device="cuda", requires_grad=True)
model = torch.nn.Conv2d(8, 5, 3).cuda().float()
input = input.contiguous(memory_format=torch.channels_last)
model = model.to(memory_format=torch.channels_last) # Module parameters need to be Channels Last
out = model(input)
print(out.is_contiguous(memory_format=torch.channels_last)) # Outputs: True
If I comment out the line that converts the model to channels_last format, I expect it to fail, but it does not.
How come an input in channels_last format can be convolved with weight filters in channels_first format without a dimension mismatch error?
My understanding is that the input’s actual shape in this case is [2, 4, 4, 8], and therefore the weight’s input-channels dim (8) should not match the input’s channels dim (4).
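For reference, converting to channels_last does not change the shape PyTorch reports; it only changes the strides, i.e. the order in which elements are laid out in memory. A quick check (CPU is enough, shapes taken from the example above):

```python
import torch

# channels_last changes the strides, not the reported shape
x = torch.randn(2, 8, 4, 4)
x_cl = x.contiguous(memory_format=torch.channels_last)

print(x_cl.shape)     # torch.Size([2, 8, 4, 4]) -- still NCHW logically
print(x.stride())     # (128, 16, 4, 1) -- NCHW order in memory
print(x_cl.stride())  # (128, 1, 32, 8) -- NHWC order in memory
```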
I think the input would be transformed to channels_last internally (I asked the same question recently and discussed it with one of the original authors), but let’s see if Vitaly can confirm it.
This is my understanding as well. It looks like a bug to me, because if the input is being permuted to channels_last but the weight remains channels_first, then the code effectively becomes:
input = torch.randint(1, 10, (2, 4, 4, 8), dtype=torch.float32, device="cuda", requires_grad=True)
model = torch.nn.Conv2d(8, 5, 3).cuda().float()
out = model(input)
And that obviously fails because of the dim mismatch, yet my example above runs fine even when the weight remains in channels_first format. Something is not right here.
What does it mean? To me it seems that when the layouts differ, the op should fail with a “memory layout mismatch” error message. Otherwise, how does PyTorch know which layout is the correct one? Does it convert the weight to channels_last to match the input, or does it convert the input to channels_first to match the weight? If it’s always the former, what’s the point of the model = model.to(memory_format=torch.channels_last) line?
This might have been one possible approach, but it would potentially break on ambiguous memory layouts, such as 1x1 kernels.
The current implementation checks whether the suggested memory format of the input or weight is channels_last (code) and uses it, if applicable.
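A rough Python sketch of that selection rule, to make the described behavior concrete. This is an illustration only, not PyTorch’s actual C++ implementation; the function name is made up:

```python
import torch

def pick_conv_memory_format(inp: torch.Tensor, weight: torch.Tensor):
    """Illustrative sketch (not PyTorch's real code): use channels_last
    if either input or weight is channels_last-contiguous, otherwise
    fall back to the default contiguous (NCHW) format."""
    cl = torch.channels_last
    if inp.is_contiguous(memory_format=cl) or weight.is_contiguous(memory_format=cl):
        return cl
    return torch.contiguous_format

inp = torch.randn(2, 8, 4, 4).contiguous(memory_format=torch.channels_last)
w = torch.randn(5, 8, 3, 3)  # default NCHW weight
print(pick_conv_memory_format(inp, w))  # torch.channels_last

# The 1x1-kernel ambiguity mentioned above: a plain contiguous 1x1
# weight satisfies BOTH contiguity checks, so the layout cannot be
# inferred from the weight alone.
w1x1 = torch.randn(5, 8, 1, 1)
print(w1x1.is_contiguous())                                   # True
print(w1x1.is_contiguous(memory_format=torch.channels_last))  # True
```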
I see. So does this mean that, in the tutorial example, it’s not necessary to set the input to channels_last? It’s redundant because setting the model to channels_last forces it anyway. Is there any situation where we would want to set the input in addition to setting the model?
I would always set the memory layout explicitly rather than rely on the internal workflow that resolves ambiguous memory layouts, and would stick to the tutorial.
Makes sense, thank you Piotr. I have to say, this behavior will cause confusion, because people will forget to set one or the other, see that it still works, and will wonder whether it actually worked as intended. At the very least I’d mention something about this in the tutorial.
Is it possible to run a Conv2d on an NHWC tensor (not NCHW)? It seems that if I put the channels explicitly at the end in @michaelklachko’s example and keep the “channels_last” behavior, it does not work.
Not sure what you mean by “put channels explicitly”. If you mean you manually transposed the tensor so that the channels dim is the last one, then yes, it will fail, because the “channels_last” feature is designed to work with tensors in their default NCHW shape. The transformation happens internally, at the stride level.
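A small sketch of that difference, with assumed shapes (CPU is enough): an explicit NHWC permute changes the logical shape and breaks Conv2d, while contiguous(memory_format=...) keeps the NCHW shape and only reorders memory:

```python
import torch

conv = torch.nn.Conv2d(8, 5, 3)
x = torch.randn(2, 8, 4, 4)

# Works: channels_last keeps the logical NCHW shape
out = conv(x.contiguous(memory_format=torch.channels_last))
print(out.shape)  # torch.Size([2, 5, 2, 2])

# Fails: a manual NHWC permute changes the logical shape to (2, 4, 4, 8),
# so Conv2d sees 4 channels where it expects 8
try:
    conv(x.permute(0, 2, 3, 1))
except RuntimeError as e:
    print("fails:", e)
```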
Like I said, this is going to be confusing to people until the docs and error messages are improved.
Hello, I know this discussion is more than a year old, but I just came across it because I’m facing an issue that I think is relevant.
I couldn’t reproduce some output, and when I dug into it I found out that the reason was that in one case model.to(memory_format=torch.channels_last) is set but in another case it’s not. According to this discussion it shouldn’t matter and PyTorch should internally take care of it, but that’s not what I’m observing.
Interestingly, if the input has 3 channels instead of 1, the output is correct. Just wondering if this could be a bug? I know the proper way is to also set the input memory format to channels_last, but this is an online repo that I’m looking at rather than my own code, so I’m wondering whether this behavior is intended by PyTorch / the author, or whether it’s a bug the author isn’t aware of, in which case I should raise it with them.
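One plausible explanation (an assumption on my part, but easy to check): with a single channel the memory layout is ambiguous. A plain contiguous NCHW tensor with C=1 also passes the channels_last contiguity check (size-1 dims are skipped when strides are compared), so the format-suggestion heuristic can take a different path than it does for C=3:

```python
import torch

x1 = torch.randn(2, 1, 4, 4)  # single channel: layout is ambiguous
print(x1.is_contiguous())                                   # True
print(x1.is_contiguous(memory_format=torch.channels_last))  # True

x3 = torch.randn(2, 3, 4, 4)  # three channels: layouts are distinct
print(x3.is_contiguous())                                   # True
print(x3.is_contiguous(memory_format=torch.channels_last))  # False
```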
I tried to use the channels_last memory format, but it seems to give a different result. I am using cuDNN 8.6.0, CUDA 11.7, Python 3.8.16, and PyTorch 2.0.1+cu117. Here are the results:
The posted differences are expected.
In your code you are directly calling half() on the inputs and the layer and comparing y_half against y_half_mem, which shows a mismatch within the expected range for float16.
Also, you are not disallowing TF32 when comparing y vs. y_mem.
Disable it via torch.backends.cudnn.allow_tf32 = False and the numerical mismatch will be reduced.
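A minimal CPU sketch of the comparison methodology (assumed shapes; the TF32 flags only matter on CUDA): compare float32 outputs across memory formats with a tolerance-based check rather than expecting bitwise equality, and on a GPU disable TF32 first:

```python
import torch

torch.manual_seed(0)
# On CUDA, disable TF32 before comparing float32 results:
# torch.backends.cudnn.allow_tf32 = False
# torch.backends.cuda.matmul.allow_tf32 = False

conv = torch.nn.Conv2d(8, 5, 3)
x = torch.randn(2, 8, 16, 16)

y = conv(x)
y_mem = conv(x.contiguous(memory_format=torch.channels_last))

# Different kernels may be dispatched per layout, so expect tiny
# float32 rounding differences, not bitwise equality.
print(torch.allclose(y, y_mem, rtol=1e-4, atol=1e-5))  # True
```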