Autocast with normalization layers

Normalization layers, such as batch normalization and group normalization, involve reduce operations which, as NVIDIA’s Apex suggests, should be done in full FP32.

However, I have not seen these layers’ operations listed on the Ops eligibility page. Hence, autocast would leave their behavior unchanged, which would result in FP16 outputs.

Is this a valid concern? It was not mentioned in the documentation.
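
If the concern turns out to be valid, one possible workaround, sketched here under the assumption that locally disabling autocast is acceptable, is to run the normalization layer outside the autocast region with an FP32 input:

import torch
import torch.nn as nn
from torch.cuda.amp import autocast

bn = nn.BatchNorm1d(4).cuda()

with autocast():
    x = torch.randn(2, 4).half().cuda()
    # Locally disable autocast and cast the input so the reductions
    # inside the normalization run in full FP32.
    with autocast(enabled=False):
        y = bn(x.float())
    print(y.dtype)  # torch.float32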

BatchNorm layers should keep their parameters in FP32 and an FP16 input will be transformed to FP32 before the operations are applied.
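
As a quick check of the parameter side of this, here is a minimal sketch (assuming a CUDA build of PyTorch with torch.cuda.amp available):

import torch
import torch.nn as nn
from torch.cuda.amp import autocast

net = nn.BatchNorm1d(4).cuda()

with autocast():
    x = torch.randn(2, 4).half().cuda()
    y = net(x)

# The affine parameters and running statistics are kept in FP32.
print(net.weight.dtype)        # torch.float32
print(net.running_mean.dtype)  # torch.float32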

Could you point out where in the documentation this behavior is described?

group_norm autocasting might be taken as a hint, but you are right that the page you link to doesn’t mention batch norm (your chance to be a hero to others with the same concern! :wink:). See the sketch below for the group norm case.
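
For comparison, a minimal sketch of the group norm case (assuming nn.GroupNorm dispatches to group_norm, which is on the “autocast to float32” list):

import torch
import torch.nn as nn
from torch.cuda.amp import autocast

with autocast():
    net = nn.GroupNorm(2, 4).cuda()
    x = torch.randn(2, 4, 8).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float32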

I can verify that the output from batch normalization is not autocast to float32, unlike that from layer normalization. This matches the documentation: only layer_norm is on the “autocast to float32” list.

I still cannot agree that this is the expected behavior.

PyTorch version: 1.6.0

import torch
import torch.nn as nn
from torch.cuda.amp import autocast

# Batch norm: the output keeps the FP16 input dtype.
with autocast():
    net = nn.BatchNorm1d(4).cuda()
    x = torch.randn(2, 4).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float16

# Layer norm: the output is autocast to FP32.
with autocast():
    net = nn.LayerNorm(4).cuda()
    x = torch.randn(2, 4).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float32