I’m training a network that consists of convolutional layers and, at the end, outputs a single binary value (1.0 = true, 0.0 = false). I’m training with mixed precision, with Linear layers at the end, BCEWithLogitsLoss, and the Adam optimizer, on PyTorch 1.12.1.
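For context, my training step looks roughly like this (simplified; the model, sizes, and names here are placeholders, and autocast/GradScaler are disabled so the snippet also runs on CPU):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the real conv + linear network
model = nn.Sequential(nn.Flatten(), nn.Linear(8, 1))
opt = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
# enabled=False so this sketch runs on CPU; the real training uses enabled=True on GPU
scaler = torch.cuda.amp.GradScaler(enabled=False)

inputs = torch.randn(4, 8)
labels = torch.randint(0, 2, (4, 1)).float()  # 0.0 / 1.0 targets as float, not bool

with torch.autocast(device_type="cpu", enabled=False):
    logits = model(inputs)
    loss = criterion(logits, labels)

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
opt.zero_grad()
print(loss.item())
```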
Training generally works fine. But at seemingly random points in the middle of training, it sometimes aborts with the following error:
RuntimeError: Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead.
The error is raised at `return torch.binary_cross_entropy_with_logits(...)`.
Am I doing something wrong? Since the error only happens occasionally, the network itself seems to be generally fine. Could it still be a mistake on my side? Any tips?
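I can reproduce the bare restriction the message refers to, though not the random failure itself: subtracting two bool tensors is rejected, the suggested `^` / `logical_xor()` alternatives work, and BCEWithLogitsLoss runs fine as long as logits and targets are floating point (the values here are made up):

```python
import torch
import torch.nn.functional as F

a = torch.tensor([True, False])
b = torch.tensor([True, True])

# Subtracting two bool tensors raises the same RuntimeError as in the traceback
try:
    a - b
except RuntimeError as e:
    print(e)

# The alternatives suggested by the error message work:
print(a ^ b)                     # element-wise xor
print(torch.logical_xor(a, b))   # same result

# BCEWithLogitsLoss itself is fine with floating-point logits and targets:
logits = torch.randn(4, 1)
targets = torch.randint(0, 2, (4, 1)).float()  # 0.0 / 1.0 labels as float
loss = F.binary_cross_entropy_with_logits(logits, targets)
print(loss.item())
```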
Network structure looks something like this:
# data: input image with 128x80 resolution
data = self.convs(data) # some convolutional layers
data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False) # 64x40
data = self.convs(data)
data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False) # 32x20
data = self.convs(data)
data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False) # 16x10
data = self.convs(data)
data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False) # 8x5
data = self.convs(data)
data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False) # 4x2
data = data.view(data.shape[0], -1)
data = self.linearLayers(data) # uses Linear layers to get down to 1 element
return data
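In case it helps to reproduce, here is a runnable stand-in for that sketch; the channel counts, conv stack, and linear sizes are made up, and the conv stack is reused at every scale just like in the pseudocode above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryClassifier(nn.Module):
    # Placeholder architecture matching the sketch; channel counts are made up.
    def __init__(self, channels: int = 16):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        # One small conv stack, applied at every scale (as in the sketch)
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Spatial size is 4x2 after five halvings of 128x80
        self.linearLayers = nn.Sequential(
            nn.Linear(channels * 4 * 2, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # single logit; BCEWithLogitsLoss applies the sigmoid
        )

    def forward(self, data):
        data = self.stem(data)  # 128x80
        for _ in range(5):      # 64x40, 32x20, 16x10, 8x5, 4x2
            data = self.convs(data)
            data = F.interpolate(data, scale_factor=0.5,
                                 mode="bilinear", align_corners=False)
        data = data.view(data.shape[0], -1)
        return self.linearLayers(data)

model = BinaryClassifier()
out = model(torch.randn(2, 3, 80, 128))  # NCHW: batch of 2 RGB images
print(out.shape)  # one logit per image
```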