InstanceNorm1d vs BatchNorm1d


I’m not sure whether I should use InstanceNorm1d or BatchNorm1d in my network, and I’d be grateful for some help.

I have an output x of shape (N, L), where N is the number of elements in the batch and L is the number of activations. I’d like to normalize each activation l in 0..L-1 separately, with the statistics computed across the batch (i.e. over x[:, l]) and with separate learnable parameters gamma and beta for each l. Based on the docs, it seems to me that both of the following layers will achieve the desired effect:

  1. torch.nn.BatchNorm1d(L, affine=True)
  2. torch.nn.InstanceNorm1d(L, affine=True)

and that there would only be a difference if my output had shape (N, C, L). Is this correct?
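
For reference, here is a minimal sketch of how the statistics could be checked by hand. The sizes N=8, L=5 and C=3 are arbitrary values picked for illustration. It compares BatchNorm1d on an (N, L) input against statistics computed over x[:, l], and InstanceNorm1d on an (N, C, L) input against statistics computed over the last dimension. It deliberately does not pass the 2D tensor to InstanceNorm1d, since I'm not sure how that case is handled and it may depend on the PyTorch version.

```python
import torch

torch.manual_seed(0)
N, L, C = 8, 5, 3  # arbitrary illustrative sizes

# Case 1: BatchNorm1d on a 2D input of shape (N, L), in training mode.
x = torch.randn(N, L)
bn = torch.nn.BatchNorm1d(L, affine=True)
bn.train()
y_bn = bn(x)

# Manual per-column normalization: statistics over the batch dimension,
# using the biased variance. gamma=1 and beta=0 at initialization.
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + bn.eps)
bn_matches_per_column = torch.allclose(y_bn, y_manual, atol=1e-5)

# Case 2: InstanceNorm1d on a 3D input of shape (N, C, L).
x3 = torch.randn(N, C, L)
inorm = torch.nn.InstanceNorm1d(C, affine=True)
y_in = inorm(x3)

# Manual normalization over the last dimension, separately for each
# sample n and channel c (no statistics shared across the batch).
mean3 = x3.mean(dim=2, keepdim=True)
var3 = x3.var(dim=2, unbiased=False, keepdim=True)
y_in_manual = (x3 - mean3) / torch.sqrt(var3 + inorm.eps)
in_matches_per_sample = torch.allclose(y_in, y_in_manual, atol=1e-5)

print(bn_matches_per_column, in_matches_per_sample)
```

If both flags come out True, BatchNorm1d is computing its statistics across the batch for each l, while InstanceNorm1d is computing them along the last dimension for each (n, c) pair, which is the distinction I'm trying to pin down for the 2D case.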