For batch norm, the docs say "The mean and standard-deviation are calculated per-dimension over the mini-batches". But for BatchNorm1d, when the input has size (N, C, L), it seems that N and L are merged and the mean/var are computed jointly for each C. I checked the dimension of the running mean/var: it has size C.
I was wondering whether there is a built-in way to compute the mean/var for each C and L, while keeping the weight/bias per C only (shared over L).
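
The behaviour described here can be checked with a short sketch. The shapes (N=8, C=3, L=5) are arbitrary values chosen for illustration; the manual computation assumes the default affine parameters (weight 1, bias 0) of a freshly constructed layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, L = 8, 3, 5  # illustrative sizes, not from the original post
x = torch.randn(N, C, L)

bn = nn.BatchNorm1d(C)
bn.train()
y = bn(x)

# The running statistics hold one entry per channel.
print(bn.running_mean.shape)  # torch.Size([3])

# Reproduce the normalization by pooling over N and L for each channel.
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

# Compare the layer output with the manual per-channel normalization.
matches = torch.allclose(y, manual, atol=1e-5)
print(matches)
```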

You could use nn.LayerNorm and specify the normalized_shape over which the mean and standard deviation should be calculated. However, I think you would need to set elementwise_affine=False and instead apply a linear layer with your desired shape to the output.
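
A minimal sketch of this suggestion, under some assumptions: normalized_shape is taken to be L (the last dimension), and the "linear layer" is interpreted as a per-channel scale and shift of shape (C, 1) that broadcasts over L. The names `weight` and `bias` are illustrative. Note that nn.LayerNorm computes its statistics per sample, in both training and evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, L = 8, 3, 5  # illustrative sizes
x = torch.randn(N, C, L)

# Normalize over the last dimension (L) without LayerNorm's own affine terms.
ln = nn.LayerNorm(L, elementwise_affine=False)

# Assumed interpretation: one learnable scale and shift per channel,
# shared across all L positions via broadcasting.
weight = nn.Parameter(torch.ones(C, 1))
bias = nn.Parameter(torch.zeros(C, 1))

y = ln(x) * weight + bias
print(y.shape)  # torch.Size([8, 3, 5])
```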

Thanks. But LayerNorm computes the mean/var over the neurons, and it computes them at both training and test time. I still want the mean/var to be computed for each neuron over the batch during training only, and kept fixed at test time.
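
One possible way to get this behaviour from built-in pieces, offered as a sketch rather than a confirmed answer from this thread: flatten (C, L) into a single feature dimension, run nn.BatchNorm1d with affine=False so that it tracks one running mean/var per (C, L) position, then reshape back and apply a per-channel weight and bias. The class name `PerPositionBatchNorm` and its arguments are hypothetical, and the sketch assumes L is fixed and known in advance.

```python
import torch
import torch.nn as nn


class PerPositionBatchNorm(nn.Module):
    """Hypothetical helper: batch statistics per (C, L), affine per C."""

    def __init__(self, num_channels, length, eps=1e-5, momentum=0.1):
        super().__init__()
        # affine=False: this layer only normalizes and tracks running stats,
        # with one mean/var pair for each of the C * L flattened features.
        self.bn = nn.BatchNorm1d(
            num_channels * length, eps=eps, momentum=momentum, affine=False
        )
        # Learnable scale and shift per channel, broadcast over L.
        self.weight = nn.Parameter(torch.ones(num_channels, 1))
        self.bias = nn.Parameter(torch.zeros(num_channels, 1))

    def forward(self, x):
        n, c, l = x.shape
        # (N, C, L) -> (N, C*L): statistics are taken over the batch only.
        x = self.bn(x.reshape(n, c * l)).reshape(n, c, l)
        return x * self.weight + self.bias


torch.manual_seed(0)
layer = PerPositionBatchNorm(num_channels=3, length=5)
x = torch.randn(8, 3, 5)

layer.train()
y_train = layer(x)  # uses batch statistics and updates the running stats

layer.eval()
y_eval = layer(x)   # uses the stored running statistics

print(layer.bn.running_mean.shape)  # torch.Size([15])
```

In training mode the statistics come from the current batch; in eval mode the stored running mean/var are used, which is the usual nn.BatchNorm1d behaviour.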