Normalization layers, such as batch normalization and group normalization, consist of reduce operations which, as NVIDIA's Apex suggests, should be performed in full FP32.
However, I have not seen these layers' operations on the Ops eligibility page. Hence, autocast leaves their behavior unchanged, which would result in FP16 outputs when the inputs are FP16.
Is this a valid concern? It was not mentioned in the documentation.
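As a minimal illustration of why reductions in FP16 are risky (not from the original post, just a sketch): a sum-of-squares reduction, like the one used for the variance in normalization layers, can easily exceed FP16's maximum representable value of about 65504.

import torch

# Hypothetical example: the same reduction in FP32 vs. FP16.
x = torch.full((4096,), 100.0, device="cuda")
print(x.pow(2).sum())         # 40960000.0 in FP32
print(x.half().pow(2).sum())  # inf, overflows the FP16 range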
The autocasting of group_norm might be taken as a hint, but you are right that the page you link doesn't mention batch norm (your chance to be a hero to others having the same concern!).
I can verify that the output of batch normalization is not autocast to float32, unlike that of layer normalization. This is consistent with the documentation: only layer_norm is on the "autocast to float32" list.
I still cannot agree that this is the expected behavior.
PyTorch version: 1.6.0
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

# Batch norm is not on the autocast lists, so the FP16 input passes through unchanged.
with autocast():
    net = nn.BatchNorm1d(4).cuda()
    x = torch.randn(2, 4).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float16

# Layer norm is on the "autocast to float32" list, so its output is promoted.
with autocast():
    net = nn.LayerNorm(4).cuda()
    x = torch.randn(2, 4).half().cuda()
    y = net(x)
    print(y.dtype)  # torch.float32
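If the FP16 batch norm output is a concern in practice, one possible workaround (a sketch, not something suggested in this thread) is to disable autocast locally around the layer and cast its input up to float32:

import torch
import torch.nn as nn
from torch.cuda.amp import autocast

net = nn.BatchNorm1d(4).cuda()
x = torch.randn(2, 4).half().cuda()

with autocast():
    # Run the batch-norm layer outside the autocast region with an FP32 input,
    # so its reductions happen in full precision.
    with autocast(enabled=False):
        y = net(x.float())
    print(y.dtype)  # torch.float32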