InstanceNorm behavior changed?

I recently updated a network from PT 1.2 to 1.10.
After the update I started getting the error “ValueError: Expected more than 1 value per channel when training…”
I understand why the error occurs: at the end of the network, I have a tensor of size [batch, 1000, 1, 1] that enters an InstanceNorm layer, so there is only a single value per channel to compute statistics from.
However, this never caused a problem with PT 1.1 or 1.2.
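
For reference, here is a minimal sketch that I believe reproduces the error on 1.10 (the sizes are illustrative, matching my case):

```python
import torch
import torch.nn as nn

norm = nn.InstanceNorm2d(1000, affine=True)
norm.train()

x = torch.randn(4, 1000, 1, 1)  # 1x1 spatial: a single value per channel
y = norm(x)  # ValueError on recent PyTorch; ran without error on 1.2
```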

I tried removing this InstanceNorm layer, but the network then gives significantly worse results.
For completeness, I also had to change F.sigmoid to torch.sigmoid in some places, since the former is deprecated.

I tried training multiple times with both PT 1.2 and 1.10, and there is a consistently large difference in results. I also looked at the layer's source code, but couldn't find any relevant difference in the Python part.
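
In the meantime, I'm considering reimplementing the layer manually, which I believe reproduces the old math while skipping the new size check. This is just a sketch of my understanding, not verified against PT 1.2 bit-for-bit:

```python
import torch
import torch.nn as nn

class ManualInstanceNorm2d(nn.Module):
    """Sketch of InstanceNorm2d without the per-channel size check.

    Assumes the old layer simply normalized with per-instance, per-channel
    statistics. Not verified to match PT 1.2 exactly.
    """
    def __init__(self, num_features, eps=1e-5, affine=True):
        super().__init__()
        self.eps = eps
        self.affine = affine
        if affine:
            self.weight = nn.Parameter(torch.ones(num_features))
            self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # statistics over the spatial dims, separately per sample and channel
        var, mean = torch.var_mean(x, dim=(2, 3), keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + self.eps)
        if self.affine:
            x = x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
        return x
```

If I read this right, with a 1x1 spatial input the normalized value is always zero, so the output reduces to the learned bias, which might be why removing the layer changes the results so much.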

Can someone help recreate the old behavior? Thanks!