I wanted to use automatic mixed precision to train my model.
The output of my model is a weighted average of the outputs of several components:
Y = w1*y1 + w2*y2 + ... + wk*yk,
where y1, ..., yk are the outputs of the individual components after applying the sigmoid function (like a weighted-average ensemble).
For this reason, I cannot use BCEWithLogitsLoss, since taking the sigmoid of a weighted sum of the component logits is not equivalent to the weighted average of the sigmoid outputs.
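A small numeric check of that non-equivalence, using made-up logits and equal weights (the values here are purely illustrative and not from my model):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits of two components and equal ensemble weights
z1, z2 = 0.0, 4.0
w1, w2 = 0.5, 0.5

# Weighted average of the sigmoid outputs (what the ensemble computes)
avg_of_sigmoids = w1 * sigmoid(z1) + w2 * sigmoid(z2)   # about 0.741

# Sigmoid of the weighted average of the logits (what BCEWithLogitsLoss would assume)
sigmoid_of_avg = sigmoid(w1 * z1 + w2 * z2)             # about 0.881

print(avg_of_sigmoids, sigmoid_of_avg)
```

The two values differ noticeably, so the loss has to be computed on the averaged probabilities rather than on a single logit.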
From the docs:
The backward passes of torch.nn.functional.binary_cross_entropy() (and torch.nn.BCELoss, which wraps it) can produce gradients that aren’t representable in float16. In autocast-enabled regions, the forward input may be float16, which means the backward gradient must be representable in float16 (autocasting float16 forward inputs to float32 doesn’t help, because that cast must be reversed in backward). Therefore, binary_cross_entropy and BCELoss raise an error in autocast-enabled regions.
Assuming you can guarantee the numerical stability, remove the loss calculation (and maybe the weighting operations beforehand) from the autocast region to use float32 for them.
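A minimal sketch of that approach, under some assumptions: the Ensemble module, its raw_weights parameter, and the softmax normalization of the weights are hypothetical stand-ins for your model, and the tensor shapes are arbitrary. Only the component forward passes run under autocast; the sigmoid, the weighting, and BCELoss run afterwards in float32.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

device_type = "cuda" if torch.cuda.is_available() else "cpu"
# float16 on GPU; bfloat16 is used here for CPU autocast
amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16

class Ensemble(nn.Module):
    """Hypothetical ensemble: k linear components plus learnable mixing weights."""
    def __init__(self, in_features, k):
        super().__init__()
        self.components = nn.ModuleList(nn.Linear(in_features, 1) for _ in range(k))
        self.raw_weights = nn.Parameter(torch.zeros(k))

    def forward(self, x):
        # Per-component logits, shape (batch, k)
        return torch.cat([c(x) for c in self.components], dim=1)

model = Ensemble(in_features=8, k=3).to(device_type)
criterion = nn.BCELoss()

x = torch.randn(4, 8, device=device_type)
target = torch.randint(0, 2, (4, 1), device=device_type).float()

# Only the component forward passes run in the autocast region
with torch.autocast(device_type=device_type, dtype=amp_dtype):
    logits = model(x)

# Outside autocast: cast to float32, then apply sigmoid, weighting, and the loss
probs = torch.sigmoid(logits.float())
w = torch.softmax(model.raw_weights, dim=0)   # assumption: weights normalized to sum to 1
y = (probs * w).sum(dim=1, keepdim=True)      # Y = w1*y1 + ... + wk*yk
loss = criterion(y, target)
loss.backward()
```

When training on a GPU with float16 you would typically also use a gradient scaler (for example torch.cuda.amp.GradScaler) around the backward pass and optimizer step; that part is omitted here to keep the sketch short.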
Sounds good. Will give it a try. Thank you!