I think doing

import torch
import torch.nn as nn

x = torch.randn(1, 3, 6)   # batch size 1, 3 channels, sequence length 6
a = nn.Conv1d(3, 6, 3)     # in_channels 3, out_channels 6, kernel size 3
gn = nn.GroupNorm(1, 6)    # a single group spanning all 6 channels
gn(a(x))
tensor([[[-0.1459, 0.5860, 0.1771, 1.1413],
[-0.8613, 2.7552, -1.0135, 0.8898],
[-0.1119, -0.1656, -0.4536, -0.9865],
[ 0.6755, -1.3193, 1.2248, -0.5849],
[ 1.2789, -0.5229, 0.1345, 0.1763],
[-2.1555, 0.0149, -0.2769, -0.4565]]], grad_fn=<...>)
is equivalent to
ln = nn.LayerNorm([6, 4])  # normalized_shape = [out_channels, L_out], since Conv1d gives L_out = 6 - 3 + 1 = 4
ln(a(x))
tensor([[[-0.1459, 0.5860, 0.1771, 1.1413],
[-0.8613, 2.7552, -1.0135, 0.8898],
[-0.1119, -0.1656, -0.4536, -0.9865],
[ 0.6755, -1.3193, 1.2248, -0.5849],
[ 1.2789, -0.5229, 0.1345, 0.1763],
[-2.1555, 0.0149, -0.2769, -0.4565]]],
grad_fn=<...>)
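To check this equivalence without eyeballing printed tensors, here is a minimal sketch that compares the two outputs numerically (seed and shapes are my own choices; both modules are left at their default affine initialization, weight = 1 and bias = 0, which is what makes them agree):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 3, 6)          # (batch, channels, length)
conv = nn.Conv1d(3, 6, 3)
y = conv(x)                       # shape (1, 6, 4): L_out = 6 - 3 + 1 = 4

# GroupNorm with one group normalizes over all channels and positions per sample;
# LayerNorm over [C, L_out] normalizes over exactly the same elements.
gn = nn.GroupNorm(1, 6)
ln = nn.LayerNorm([6, 4])

print(torch.allclose(gn(y), ln(y), atol=1e-6))  # True
```

Note the equivalence holds only at initialization: once trained, GroupNorm learns per-channel affine parameters while LayerNorm learns per-element ones, so the two can diverge.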
so we could do

nn.GroupNorm(1, out_channels)

and we would not have to compute L_out after applying Conv1d; it behaves the same as the LayerNorm case above.
So, to compare BatchNorm with GroupNorm (or, equivalently, the LayerNorm case above), we only have to replace
nn.BatchNorm1d(out_channels)
with
nn.GroupNorm(1, out_channels)
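As a sketch of what that swap looks like in practice (the toy block, channel counts, and `make_block` helper are my own illustration, not from any particular model):

```python
import torch
import torch.nn as nn

def make_block(norm):
    # hypothetical conv block: the only thing that changes is the norm layer
    return nn.Sequential(nn.Conv1d(3, 8, 3), norm, nn.ReLU())

bn_block = make_block(nn.BatchNorm1d(8))   # stats over batch + length, per channel
gn_block = make_block(nn.GroupNorm(1, 8))  # stats over channels + length, per sample

x = torch.randn(4, 3, 10)
print(bn_block(x).shape)  # torch.Size([4, 8, 8])
print(gn_block(x).shape)  # torch.Size([4, 8, 8])
```

Both take `out_channels` as their argument and leave the output shape unchanged; the difference is which axes the statistics are computed over, so GroupNorm's behavior does not depend on batch size.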