Swapping BatchNorm for LayerNorm in ResNet

jacobbuckman · February 18, 2021, 8:03am

Question about the interface to ResNet in torchvision.

I’m trying to create a ResNet with LayerNorm (or GroupNorm) instead of BatchNorm. There’s a parameter called norm_layer that seems like it should do this: resnet18(num_classes=output_dim, norm_layer=nn.LayerNorm)

But this throws an error,
RuntimeError('Given normalized_shape=[64], expected input with shape [*, 64], but got input of size[128, 64, 14, 14]')
about the shapes being wrong. Is this deprecated? Or am I using the interface wrong? What is the recommended way to do this?

googlebot · February 18, 2021, 2:35pm

LayerNorm was not designed for images, so it normalizes over the last dimension(s) of a tensor, which is the usual layout elsewhere. Technically, it will work with two permute() calls; how well it substitutes for BatchNorm is another matter…

jacobbuckman · February 18, 2021, 4:29pm

Okay, let me ask a different way then: what are the currently-implemented valid arguments to norm_layer? I’ll take anything besides BatchNorm. (Any of the myriad approaches that normalize locally, without relying on the batch.)

googlebot · February 18, 2021, 4:49pm

You can write a wrapper around nn.LayerNorm that permutes dimensions before and after. Or try online norm. IDK how well these will work with an untuned resnet.
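
For reference, the permute wrapper suggested above could look roughly like this. It is a minimal sketch, not an official torchvision recipe: the class name ChannelLayerNorm is made up for illustration, and it assumes (N, C, H, W) inputs and normalizes over the channel dimension only.

```python
import torch
from torch import nn


class ChannelLayerNorm(nn.Module):
    """Hypothetical wrapper: applies nn.LayerNorm over the channel
    dimension of an (N, C, H, W) tensor by permuting before and after."""

    def __init__(self, num_channels):
        super().__init__()
        self.norm = nn.LayerNorm(num_channels)

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)     # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)              # normalize over the last dim (C)
        return x.permute(0, 3, 1, 2)  # back to (N, C, H, W)


# Same shape as the tensor in the error message above.
x = torch.randn(128, 64, 14, 14)
y = ChannelLayerNorm(64)(x)
```

Since torchvision's ResNet calls norm_layer with a single channel-count argument, a class like this should be usable as resnet18(num_classes=output_dim, norm_layer=ChannelLayerNorm). Whether it trains well without retuning is a separate question.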

jacobbuckman · February 18, 2021, 5:56pm

I see, thanks.

Related q, although maybe this deserves a new thread…what is this about?

if groups != 1 or base_width != 64:
  raise ValueError('BasicBlock only supports groups=1 and base_width=64')

Can I not adjust the width of the resnet? Is there a non-basic block I should be using?

I don’t care about width, per se: I really just want a single parameter that I can adjust which will change the “scale” of the model, making it larger or smaller. What’s the best way to do this?

googlebot · February 19, 2021, 2:44am

Perhaps you need to generate a network with Bottleneck blocks, not sure.

PS also check GroupNorm: it acts like several LayerNorm blocks running in parallel over groups of channels, and it works with (N, C, *) tensors.