# nn.LayerNorm for a specific dimension of my tensor?

I’d like to apply layernorm to a specific dimension of my tensor.

``````
import torch

N=1
C=10
H=10
W=2
input = torch.randn(N, C, H, W)
#                      ^ C is the dimension I want to normalize over
``````

In the above example, I’d like to apply layernorm along the C dimension.

Looking at the LayerNorm documentation, as I understand it, you can only tell `nn.LayerNorm` the shape of the dimensions to normalize (`normalized_shape`), not which dimensions they are. It always normalizes over the trailing dimensions of the input, which is a problem when the dimension I care about is not the last one.

Concretely, the following does not normalize over C. `nn.LayerNorm(C)` expects the last dimension to have size C = 10, but the last dimension here is W = 2, so the call fails with a shape mismatch. Even if the last dimension happened to have size 10, LayerNorm would normalize over that dimension, not over C.

``````
import torch
from torch import nn

N=1
C=10
H=10
W=2
input = torch.randn(N, C, H, W)
layernorm = nn.LayerNorm(C)
output = layernorm(input)  # RuntimeError: last dim is W=2, but normalized_shape is (10,)
``````
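As a quick sanity check of which dimension `nn.LayerNorm(C)` actually acts on, here is a minimal sketch. It relies on `normalized_shape` being matched against the trailing dimensions of the input; the shapes and variable names are only illustrative.

```python
import torch
from torch import nn

N, C, H, W = 1, 10, 10, 2
layernorm = nn.LayerNorm(C)

# Case 1: the last dimension is W=2, which does not match normalized_shape=(10,).
raised = False
try:
    layernorm(torch.randn(N, C, H, W))
except RuntimeError as err:
    raised = True
    print("shape mismatch:", err)

# Case 2: make the last dimension size 10, so the call succeeds.
x = torch.randn(N, C, H)  # [1, 10, 10]
y = layernorm(x)

# Statistics are taken over the last dimension (H here), not over C.
mean_over_last = y.mean(dim=-1)  # close to 0 everywhere
mean_over_c = y.mean(dim=1)      # generally not 0
print(mean_over_last.abs().max().item(), mean_over_c.abs().max().item())
```

So having two dimensions of the same size does not let LayerNorm pick the leftmost one; the dimension to normalize has to be moved to the end first.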

Is there a way around this?

I suppose one solution is to transpose (perhaps using `permute`) before calling LayerNorm, but that feels a bit inelegant.

#### The approach I ended up using

I ended up using `permute` to make `C` the rightmost dimension before LayerNorm, and then permuting again to go back to the original shape.

Let’s do a simpler example with 3 dimensions instead of 4:

``````
import torch
from torch import nn

def get_input_tensor(dims):
    t = torch.zeros(dims)
    t_flat = t.view(t.numel()) # thx: https://discuss.pytorch.org/t/any-alternatives-to-flat-for-tensor/3106

    # fill with something like [[[0,1,2], [3,4,5]]]
    # (t_flat is a view of t, so these writes show up in t)
    for i in range(t_flat.numel()):
        t_flat[i] = i
    return t

N=1
C=3
W=3
layernorm = nn.LayerNorm(C)

input = get_input_tensor([N,C,W])
x = input.permute(0, 2, 1) # [N, C, W] --> [N, W, C]
x = layernorm(x)
output = x.permute(0, 2, 1) # [N, W, C] --> [N, C, W]
``````

In practice, of course we’d want to put this in an `nn.Module` and initialize the `nn.LayerNorm` in the module’s `__init__()` function.
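Something along these lines is what I have in mind. This is only a sketch: the class name `ChannelLayerNorm` is made up for illustration, and it uses `transpose(1, -1)` instead of `permute` so that the same module works for `[N, C, W]` and `[N, C, H, W]` inputs alike. The other dimensions end up in a different order while C is last, which does not matter because LayerNorm treats them all as batch dimensions.

```python
import torch
from torch import nn

class ChannelLayerNorm(nn.Module):
    """LayerNorm over dim 1 (C) of an [N, C, ...] tensor. Hypothetical helper name."""

    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        # normalized_shape=num_channels, so the module expects C as the last dim
        self.layernorm = nn.LayerNorm(num_channels, eps=eps)

    def forward(self, x):
        x = x.transpose(1, -1)     # swap C with the last dim: [N, C, ..., W] --> [N, W, ..., C]
        x = self.layernorm(x)      # normalize over C
        return x.transpose(1, -1)  # swap back to the original layout

x = torch.randn(1, 10, 10, 2)      # [N, C, H, W] from the original question
y = ChannelLayerNorm(10)(x)
print(y.shape)                     # same shape as x
print(y.mean(dim=1).abs().max().item())  # per-position mean over C, close to 0
```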

I haven’t done any careful speed testing to see whether the `permute` adds much runtime.
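For anyone who wants to measure it, a rough CPU timing sketch could look like the following. The tensor sizes and the `time_fn` helper are arbitrary choices for illustration, not a careful benchmark. `permute` itself only returns a view, so any extra cost would come from LayerNorm having to work on a non-contiguous input.

```python
import time
import torch
from torch import nn

def time_fn(fn, n_iters=100):
    # Average wall-clock seconds per call, after one warm-up call (CPU only).
    fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters

N, C, W = 8, 64, 128  # arbitrary sizes chosen for illustration
layernorm = nn.LayerNorm(C)
x_ncw = torch.randn(N, C, W)
x_nwc = x_ncw.permute(0, 2, 1).contiguous()  # same values, already laid out as [N, W, C]

with torch.no_grad():
    # permute -> LayerNorm -> permute back, starting from [N, C, W]
    t_permute = time_fn(lambda: layernorm(x_ncw.permute(0, 2, 1)).permute(0, 2, 1))
    # LayerNorm only, on data that is already [N, W, C]
    t_direct = time_fn(lambda: layernorm(x_nwc))

print(f"with permute: {t_permute:.2e} s/call, without: {t_direct:.2e} s/call")
```

On a GPU the calls are asynchronous, so a fair measurement would also need `torch.cuda.synchronize()` before reading the clock.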

#### Correctness check

I was able to get the above to match the numerics of a hand-coded LayerNorm that operates on the middle dimension of a [N, C, W] input tensor:

``````
# adapted from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/modeling.py#L317
class LayerNorm_Custom(nn.Module):
    def __init__(self, hidden_size, eps=1e-12):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x):
        x = x.permute(0, 2, 1) # [N, C, W] --> [N, W, C]
        u = x.mean(-1, keepdim=True)
        s = (x - u).pow(2).mean(-1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.variance_epsilon)
        x = self.weight * x + self.bias
        x = x.permute(0, 2, 1)  # [N, W, C] --> [N, C, W]
        return x
``````
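The comparison itself can be sketched roughly as below. This is a self-contained variant and not the exact script I ran: the function `layernorm_over_c` is a hypothetical hand-coded reference that normalizes over dim 1 directly, without any permute. One thing to watch is `eps`: `nn.LayerNorm` defaults to `1e-5` while the class above defaults to `1e-12`, so the sketch passes the same value to both sides.

```python
import torch
from torch import nn

def layernorm_over_c(x, weight, bias, eps):
    # Hand-coded normalization over dim 1 of an [N, C, W] tensor.
    u = x.mean(dim=1, keepdim=True)
    s = (x - u).pow(2).mean(dim=1, keepdim=True)  # biased variance, as LayerNorm uses
    x_hat = (x - u) / torch.sqrt(s + eps)
    # weight and bias have shape [C]; reshape to [1, C, 1] to broadcast over N and W
    return weight.view(1, -1, 1) * x_hat + bias.view(1, -1, 1)

torch.manual_seed(0)
N, C, W = 2, 3, 5
eps = 1e-5  # same eps on both sides
layernorm = nn.LayerNorm(C, eps=eps)

# Use non-trivial weight and bias so the affine part is exercised too.
with torch.no_grad():
    layernorm.weight.copy_(torch.randn(C))
    layernorm.bias.copy_(torch.randn(C))

x = torch.randn(N, C, W)
out_permute = layernorm(x.permute(0, 2, 1)).permute(0, 2, 1)  # permute approach
out_manual = layernorm_over_c(x, layernorm.weight, layernorm.bias, eps)

matches = torch.allclose(out_permute, out_manual, atol=1e-5)
print("outputs match:", matches)
```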