Checkerboard effect

I slightly modified the U-Net and now I experience this checkerboard effect in the prediction [prediction image] for the GT [ground-truth image].

I assume this is due to ConvTranspose2d. How do I overcome this?

import torch
import torch.nn as nn

class UpConv(nn.Module):
    def __init__(self, in_channels, in_channels_skip, out_channels,
                 kernel_size, padding, stride):
        super(UpConv, self).__init__()
        self.act = nn.ReLU()

        self.conv_trans1 = nn.ConvTranspose2d(
            in_channels, in_channels, kernel_size=2, padding=0, stride=2)
        self.bn1 = nn.BatchNorm2d(in_channels, eps=1e-05, momentum=0.1,
                                  affine=True, track_running_stats=True)
        self.bn2 = nn.BatchNorm2d(out_channels, eps=1e-05, momentum=0.1,
                                  affine=True, track_running_stats=True)

        self.conv_block = BaseConv(
            in_channels=in_channels + in_channels_skip,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            stride=stride)

    def forward(self, x, x_skip):
        x = self.act(self.bn1(self.conv_trans1(x)))
        # concatenate along the channel dimension, cropping the skip
        # connection to the upsampled spatial size
        x = torch.cat((x, x_skip[:, :, :x.shape[2], :x.shape[3]]), dim=1)
        x = self.act(self.bn2(self.conv_block(x)))
        return x

I always train on 400x400 images, so the slicing is not really needed; x and x_skip always match for the cat.

source code

Hi Blackbird!

Speaking specifically of the architecture of the original U-Net paper (so
this comment may or may not be relevant to your use case), in which
the convolutions are not padded and they nibble away at the edges of
the image, a size of 400 will not produce an x and x_skip whose image
dimensions match (as would be required for cat() to succeed).

The nearest “self-consistent” sizes to 400 (for the input image) are 396
and 412.
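This size bookkeeping can be verified with a short script (my own sketch, not from the paper's code, tracing only the contracting path: each level applies two unpadded 3x3 convolutions, then a 2x2 max pool, which requires the size to be even at that point):

```python
# Trace the image size through the original U-Net's contracting path
# (4 pooling stages). A size is "self-consistent" when it stays positive
# and even at every pooling step.
def unet_valid_input(n, depth=4):
    for _ in range(depth):
        n -= 4                # two unpadded 3x3 convs, each removes 2
        if n <= 0 or n % 2:   # must be positive and even before pooling
            return False
        n //= 2               # 2x2 max pool
    return n - 4 > 0          # two more convs at the bottleneck

print(unet_valid_input(400))  # False
print(unet_valid_input(396))  # True
print(unet_valid_input(412))  # True
print(unet_valid_input(572))  # True - the input size used in the paper
```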


K. Frank

Hi Frank,

Thank you for the valuable feedback; it is highly appreciated.

This architecture only works with image sizes that are multiples of 8.

shapes:x,x_skip torch.Size([1, 512, 100, 100]) torch.Size([1, 256, 100, 100])

shapes:x,x_skip torch.Size([1, 256, 200, 200]) torch.Size([1, 128, 200, 200])

shapes:x,x_skip torch.Size([1, 128, 400, 400]) torch.Size([1, 64, 400, 400])
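A minimal check of that claim (my own sketch): with "same"-padded convolutions, only the three pool / up-conv stages change the spatial size. Pooling floors the size and the stride-2 transposed convolution doubles it, so the round trip restores the original size only when it is divisible by 2**3 = 8.

```python
# Check whether a spatial size survives 3 pool stages followed by
# 3 stride-2 transposed-conv stages unchanged.
def survives_round_trip(s, stages=3):
    orig = s
    for _ in range(stages):
        s //= 2   # 2x2 max pool (floor division)
    for _ in range(stages):
        s *= 2    # kernel_size=2, stride=2 transposed conv
    return s == orig

print(survives_round_trip(400))  # True  (400 = 8 * 50)
print(survives_round_trip(404))  # False (404 is not a multiple of 8)
```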

Are you suggesting the channel dimension of x should match the channel dimension of x_skip, or that the upsampled output channels should match the high-resolution feature channels (in the paper's terms)?

I am not using mirroring; you are right, I use just simple zero padding of size 1 to compensate. I know zero padding is evil, and I would probably benefit from mirror padding, but let's pretend this is not what causes the checkerboard effects.
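As an aside, if mirror padding were wanted, PyTorch's Conv2d supports it directly via padding_mode='reflect' (a sketch; the channel and spatial sizes below are illustrative):

```python
import torch
import torch.nn as nn

# An ordinary 3x3 "same" convolution that pads by reflecting the border
# pixels instead of inserting zeros.
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1, padding_mode='reflect')

x = torch.randn(1, 64, 32, 32)
print(conv(x).shape)  # torch.Size([1, 64, 32, 32])
```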

Unless you are referring to the references of the original U-Net paper, after reading it carefully I haven't found any evidence that the channel dimensions of x and x_skip should match for the cat.

Do you advise that this would be beneficial? It would of course be required if you were using the LinkNet architecture.

I am tinkering with your idea; give me some clue or evidence for why this may be important.

@KFrank after some checking, it looks like stride-2 convolutions or stride-2 transposed convolutions will produce the checkerboard effect regardless of whether the kernel size is divisible by the stride.

This led me to the conclusion to avoid them and to use the resize option instead, since even max pooling introduces high-frequency image distortions.
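The resize option can be sketched as follows (my own sketch, not the modified network's actual code; the channel and spatial sizes are taken from the shapes printed above): upsample with a fixed interpolation, then apply an ordinary stride-1 convolution, so every output pixel is covered uniformly and the transposed-conv overlap pattern never arises.

```python
import torch
import torch.nn as nn

# Resize-convolution: fixed nearest-neighbor (or bilinear) upsampling
# followed by a plain stride-1 "same" convolution.
up = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(512, 256, kernel_size=3, padding=1),
)

x = torch.randn(1, 512, 100, 100)
print(up(x).shape)  # torch.Size([1, 256, 200, 200])
```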

I wonder if @ptrblck has something to add from his victorious experience since I modified his U-Net :thinking:

Hi Blackbird!

I don’t have any concrete evidence that contradicts your conclusion, but
I don’t find the argument in the online article you cite compelling on this
point.

Quoting from that article:

One approach is to make sure you use a kernel size that is divided by
your stride, avoiding the overlap issue.

In the context of pytorch’s ConvTranspose2d, this means using, for example,

ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)

This is exactly the pytorch-class implementation of the “up-conv 2x2”
described in the original U-Net paper.

As your online paper explains, this specific upsampling does not cause
checkerboarding by the mechanism discussed in some detail in that paper.

However, your paper then goes on:

However, while this approach helps, it is still easy for deconvolution to
fall into creating artifacts.

I just don’t know what to make of this statement. It seems to be saying
that U-Net’s (original) “up-conv 2x2” does (or might) nonetheless cause
artifacts, but it doesn’t say what kinds of artifacts (checkerboarding or
something else) those might be, nor by what mechanism those artifacts
might be caused.

In general I think the online paper you cite is quite good – it gives a good,
understandable explanation, e.g., with its kernel_size = 3, stride = 2
example, of how upsampling can produce checkerboarding. But that
second statement I quoted notwithstanding, it doesn’t show how U-Net’s
“up-conv 2x2” would produce checkerboarding, and, in fact, outlines an
argument to the contrary.
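That argument can be checked numerically (my own sketch, not from the article): push a constant input through a transposed convolution with constant weights. With kernel_size=2, stride=2 every output pixel receives exactly one kernel contribution, so the output is uniform; with kernel_size=3, stride=2 the overlapping windows produce the periodic checkerboard pattern.

```python
import torch
import torch.nn as nn

def upsample_variance(kernel_size, stride):
    # Transposed conv with all weights set to 1, fed an all-ones input:
    # any variation in the output comes purely from window overlap.
    conv = nn.ConvTranspose2d(1, 1, kernel_size, stride=stride, bias=False)
    nn.init.constant_(conv.weight, 1.0)
    out = conv(torch.ones(1, 1, 8, 8))
    return out.var().item()

print(upsample_variance(2, 2))  # 0.0 - uniform output, no checkerboard
print(upsample_variance(3, 2))  # > 0 - periodic overlap pattern
```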


K. Frank


There is no checkerboard effect in the new architecture with stride 1.

For the GT [ground-truth image] I am getting [artifact-free prediction image].