Is there a way to specify an alternative axis on which to perform BatchNormalization? I want to apply it to an RNN whose input has the shape:
Batch x Steps x Features.
So the 1D batch normalization should be applied over the last dimension (Features) and not over Steps, which is what happens by default: BatchNorm1d expects a Batch x Channels x Length layout and normalizes over the second dimension, which is correct for convolutional networks.
But I could not find a way to achieve this; it seems this logic is implemented in the native Torch part, since even the functional API doesn't allow an axis to be specified.
So any tricks are welcome. (The only thing I could come up with is transposing the axes before and after the batch normalization, but that seems neither an elegant nor a performant solution to the problem.)
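For reference, here is a minimal sketch of that transpose workaround, wrapped in a small module (the class name `SequenceBatchNorm` is just an illustrative choice, not a PyTorch API):

```python
import torch
import torch.nn as nn

class SequenceBatchNorm(nn.Module):
    """Applies BatchNorm1d over the feature (last) dimension of a
    (batch, steps, features) tensor by transposing to the
    (batch, channels, length) layout that BatchNorm1d expects."""

    def __init__(self, num_features):
        super().__init__()
        self.bn = nn.BatchNorm1d(num_features)

    def forward(self, x):
        # (batch, steps, features) -> (batch, features, steps)
        x = x.transpose(1, 2)
        x = self.bn(x)
        # transpose back to (batch, steps, features)
        return x.transpose(1, 2)

# Example: 8 sequences, 10 steps, 64 features
bn = SequenceBatchNorm(64)
out = bn(torch.randn(8, 10, 64))
print(out.shape)  # torch.Size([8, 10, 64])
```

The two `transpose` calls only change the stride metadata (no copy), though BatchNorm1d may still materialize a contiguous tensor internally, so this does carry some overhead.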