Normalization layers, such as batch normalization and group normalization, consist of reduce operations which, as NVIDIA's Apex suggests, should be performed in full FP32.
However, I have not seen these layers' operations on the Ops eligibility page. Hence, autocast leaves their behavior unchanged, which would result in FP16 outputs when the inputs are FP16.
Is this a valid concern? It was not mentioned in the documentation.
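As a minimal illustration of why reductions in FP16 are risky (not from the original post, just a sketch): a sum-of-squares reduction, like the one used for the variance in normalization layers, can easily exceed FP16's maximum representable value of about 65504.

import torch

# Hypothetical example: the same reduction in FP32 vs. FP16.
x = torch.full((4096,), 100.0, device="cuda")
print(x.pow(2).sum())         # 40960000.0 in FP32
print(x.half().pow(2).sum())  # inf, overflows the FP16 range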
The autocasting of group_norm might be taken as a hint, but you are right that the page you link doesn't mention batch norm (your chance to be a hero to others having the same concern!).
I can verify that the output of batch normalization is not autocast to float32, unlike that of layer normalization. This is consistent with the documentation: only layer_norm is on the "autocast to float32" list.
I still cannot agree that this is the expected behavior.
PyTorch version: 1.6.0
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

# Batch norm is not on the autocast lists, so the FP16 input passes through unchanged.
with autocast():
    net = nn.BatchNorm1d(4).cuda()
    x = torch.randn(2, 4).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float16

# Layer norm is on the "autocast to float32" list, so its output is promoted.
with autocast():
    net = nn.LayerNorm(4).cuda()
    x = torch.randn(2, 4).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float32
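If the FP16 batch norm output is a concern in practice, one possible workaround (a sketch, not something suggested in this thread) is to disable autocast locally around the layer and cast its input up to float32:

import torch
import torch.nn as nn
from torch.cuda.amp import autocast

net = nn.BatchNorm1d(4).cuda()
x = torch.randn(2, 4).half().cuda()

with autocast():
    # Run the batch-norm layer outside the autocast region with an FP32 input,
    # so its reductions happen in full precision.
    with autocast(enabled=False):
        y = net(x.float())
    print(y.dtype)  # torch.float32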