Dear all,
I wanted to use automatic mixed precision to train my model.
The output of my model is a weighted average of the outputs of several components, Y = w1*y1 + w2*y2 + ... + wk*yk, where y1, ..., yk are the outputs of the individual components after applying the sigmoid function (like a weighted-average ensemble).
For this reason, I cannot use BCEWithLogitsLoss: applying the sigmoid to a weighted sum of the raw component outputs is not equivalent to the weighted average of the per-component sigmoid outputs, so I have to compute the loss on Y itself (i.e. with BCELoss), which autocast does not allow.
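In code, the setup looks roughly like this (a minimal sketch with made-up shapes and weights, just to illustrate the point):

```python
import torch

# Hypothetical example with k = 3 components and fixed weights.
logits = [torch.randn(8, 1) for _ in range(3)]   # raw per-component outputs
weights = [0.5, 0.3, 0.2]

# What my model returns: weighted average of the per-component sigmoids.
Y = sum(w * torch.sigmoid(z) for w, z in zip(weights, logits))

# Not the same as the sigmoid of the weighted sum of raw outputs,
# so BCEWithLogitsLoss on that weighted sum would compute the wrong loss.
Y_wrong = torch.sigmoid(sum(w * z for w, z in zip(weights, logits)))
```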
Any suggestions?
Many thanks!
From the docs:

The backward passes of torch.nn.functional.binary_cross_entropy() (and torch.nn.BCELoss, which wraps it) can produce gradients that aren't representable in float16. In autocast-enabled regions, the forward input may be float16, which means the backward gradient must be representable in float16 (autocasting float16 forward inputs to float32 doesn't help, because that cast must be reversed in backward). Therefore, binary_cross_entropy and BCELoss raise an error in autocast-enabled regions.
Assuming you can guarantee the numerical stability, remove the loss calculation (and maybe the weighting operations beforehand) from the autocast region so that they run in float32.
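Something along these lines might work (a minimal sketch; components, weights, loader, optimizer, and target are placeholders for your own modules and training setup):

```python
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()

for x, target in loader:
    optimizer.zero_grad()

    # Run the (potentially expensive) forward passes under autocast.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = [m(x) for m in components]

    # Keep the weighting and the loss outside the autocast region and
    # compute them in float32 for numerical stability.
    probs = [torch.sigmoid(z.float()) for z in logits]
    Y = sum(w * p for w, p in zip(weights, probs))
    loss = F.binary_cross_entropy(Y, target.float())

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```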
Sounds good. Will give it a try. Thank you!