I’m using CrossEntropyLoss in my object detection model, shown below. I’m training with NVIDIA Apex amp at the “O2” optimization level for automatic mixed precision.
```python
import torch


class PredictionHead(torch.nn.Module):
    def __init__(self, class_weights):
        super(PredictionHead, self).__init__()
        class_weights = torch.tensor(class_weights, requires_grad=False)
        self.loss_module_class = torch.nn.CrossEntropyLoss(
            weight=class_weights, reduction="none"
        )

    def forward(self, x):
        # ...
        pass

    def loss(self, predictions, labels):
        return self.loss_module_class(predictions, labels)
```
But I get the error below. It seems that Apex amp converted class_weights to half precision, while the CrossEntropyLoss computation requires full (float32) precision.
```
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 932, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2317, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2115, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #3 'weight' in call to _thnn_nll_loss_forward
  In call to configurable 'train_model' (<function train_model at 0x7f111c9a7f28>) in scope 'train'
  In call to configurable 'train' (<function _run_job at 0x7f1153a08730>)
```
I also opened an issue on the Apex GitHub repo: https://github.com/NVIDIA/apex/issues/837. Any suggestions on how to keep the CrossEntropyLoss weights from being converted from full floats to half floats?
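One workaround I’m considering (a minimal sketch; `loss_fp32` is my own hypothetical helper name, not part of my model): cast both the logits and the class weights back to float32 right before calling cross_entropy, so that any half tensors produced by amp’s O2 casts never reach the nll_loss kernel:

```python
import torch
import torch.nn.functional as F


def loss_fp32(predictions, labels, class_weights):
    """Per-sample cross-entropy that tolerates half-precision inputs.

    Both the logits and the class weights are cast to float32, so the
    underlying nll_loss call always sees matching full-precision dtypes.
    """
    return F.cross_entropy(
        predictions.float(),        # half -> float32 (no-op if already float32)
        labels,
        weight=class_weights.float(),  # half -> float32
        reduction="none",
    )
```

For example, `loss_fp32(torch.randn(4, 3).half(), torch.tensor([0, 1, 2, 0]), torch.tensor([1.0, 2.0, 3.0]).half())` returns a float32 tensor of shape `(4,)`. Alternatively, apex seems to provide `amp.register_float_function` to force a specific function to run in fp32, but I haven’t verified whether that helps in this case.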