Non-Affine InstanceNorm Layers

According to the documentation for torch.nn.InstanceNorm1d, when affine is set to True, the beta (additive) and gamma (multiplicative) parameters are learnable.

When affine is set to False, should we infer that beta and gamma are simply absent (i.e., functionally 0 and 1, respectively)?

Also, no details are given about the parameter initialization: when affine is True, how are they initialized? Can the initialization be changed?

(The same question applies to the 2d version, if the answers are different.)


From the source code, it appears that beta and gamma are initialized the same way as in batch norm when affine=True. And here is how batch norm initializes its parameters: link.

As for affine=False, in my (possibly shallow) view it is also similar to batch norm: the parameters are set to None, not to 0 and 1.
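This is easy to check directly. A minimal sketch (assuming a recent PyTorch version, where the affine parameters are initialized to ones and zeros) that inspects both cases and also shows that the initialization can be overridden manually:

```python
import torch
import torch.nn as nn

# With affine=True, weight and bias are learnable nn.Parameter tensors,
# one value per channel.
m = nn.InstanceNorm1d(3, affine=True)
print(m.weight)  # initialized to ones, shape (3,)
print(m.bias)    # initialized to zeros, shape (3,)

# With affine=False, the attributes exist but are None --
# not tensors filled with 1 and 0.
m2 = nn.InstanceNorm1d(3, affine=False)
print(m2.weight is None, m2.bias is None)  # True True

# The default initialization can be changed by overwriting
# the parameters in-place after construction.
with torch.no_grad():
    m.weight.uniform_(0.9, 1.1)
    m.bias.zero_()
```

When the parameters are None, the layer simply skips the affine step, which behaves the same as a fixed gamma of 1 and beta of 0 but avoids the extra multiply/add.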

Alas, that code is rather ambiguous, referencing weight and bias rather than gamma and beta.
Developers, would I be correct in thinking that bias is beta and weight is gamma?

I repeat my request for confirmation:
Developers? Anyone?

Weight and bias should correspond to gamma and beta, as in the nn.BatchNorm case.
Although the parameters are named differently there than in the literature, this creates a consistent API naming scheme.

Right, but the question is, “respectively?”
Does weight correspond to gamma, and bias to beta? Or vice-versa?

Unlike (say) the weights and biases of a convolutional layer, you can’t tell by looking at the shape of the tensor, as the shapes are the same.

Thanks for clarifying the question, as I’ve missed that part. :wink:

Yes, as can be seen in this line of code, at::batch_norm will be called, where the multiplicative factor gamma corresponds to weight and the additive value beta corresponds to bias.
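You can also verify the correspondence numerically. A small sketch: set weight and bias to known values and check that the output equals gamma * normalized + beta, using a non-affine layer to produce the normalized reference:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3, 5)  # (batch, channels, length)

m = nn.InstanceNorm1d(3, affine=True)
with torch.no_grad():
    m.weight.fill_(2.0)  # candidate gamma (multiplicative)
    m.bias.fill_(0.5)    # candidate beta (additive)

# Non-affine layer gives the plain normalized output.
ref = nn.InstanceNorm1d(3, affine=False)

out = m(x)
expected = 2.0 * ref(x) + 0.5  # gamma * normalized + beta
print(torch.allclose(out, expected, atol=1e-6))  # True
```

If the roles were swapped, the check would fail, since the output would instead be 0.5 * normalized + 2.0.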

Excellent, thanks so much.
That’s what it looked like, but I needed to be certain.