Custom Loss Function/Class

I am trying to create a custom loss function to train an autoencoder for image generation. In particular, I want to symmetrize the BCELoss() function. My attempt is as follows:

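import torch.nn.functional as F
from torch import Tensor, nn

class symmBCELoss(nn.BCELoss):
    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        # BCE(input, target): gradients flow to the first argument (input).
        forward_term = F.binary_cross_entropy(
            input, target, weight=self.weight, reduction=self.reduction)
        # BCE(target, input): the arguments are swapped, so input is now the second argument.
        reverse_term = F.binary_cross_entropy(
            target, input, weight=self.weight, reduction=self.reduction)
        return forward_term + reverse_term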

However, when I try to use this as a loss function to train my network I get the error:

RuntimeError: the derivative for 'target' is not implemented

I assume this is because the first argument of F.binary_cross_entropy() is hardcoded to be the input and not the target, so the appropriate gradients don’t match up. Is there a preferred way to set up this custom loss function class in a way that maximizes inheritance from the existing class? Just from the way the source code is laid out it seems like there are useful optimizations in the current implementation of the BCE loss and I would like to take advantage of them if possible.


This happens because F.binary_cross_entropy() can only compute gradients with respect to its first argument. In the second call you make, the second argument (input) is a Tensor that requires gradients, hence the error.

If you need gradients for the target, it might be simpler to rewrite the bce as a simple function that just computes (-self.weight * (target * input.log() + (1 - target) * (1 - input).log())).mean() (assuming mean reduction).
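For illustration, the one-line formula above could be written as a standalone function along these lines. This is only a sketch: the name manual_bce and the eps clamp (used here to keep log() finite) are assumptions of mine, not part of PyTorch, and the built-in loss guards against log(0) in its own way.

```python
import torch

def manual_bce(input, target, weight=None, eps=1e-7):
    # Hypothetical helper: element-wise binary cross entropy with mean reduction.
    # Clamping the probabilities away from 0 and 1 is an assumption made here
    # so that log() never returns -inf.
    input = input.clamp(eps, 1 - eps)
    loss = -(target * input.log() + (1 - target) * (1 - input).log())
    if weight is not None:
        loss = weight * loss
    return loss.mean()

# Both arguments are built from ordinary differentiable ops,
# so autograd can produce gradients for either of them.
pred = (torch.rand(4, 3) * 0.9 + 0.05).requires_grad_()
targ = (torch.rand(4, 3) * 0.9 + 0.05).requires_grad_()
manual_bce(pred, targ).backward()
```

Because nothing here goes through the fused kernel, the gradient with respect to the second argument comes for free from autograd.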

I don’t really want gradients for the target. I still just want gradients for the input, I just want to symmetrize the functional form of the loss function w.r.t. the input and target. And as I said above, I figured it would be better to try and use inheritance whenever possible to keep as much of the underlying optimization as I can. If you’re saying that using inheritance in this way isn’t likely to be possible then I guess I will just have to re-implement the function itself and worry about the performance later.

You don’t want gradients for your “target” Tensor, but you do want gradients for the second argument of bce, right?
And bce does not implement backward towards that second argument.
But in your second call, you give your “input” Tensor (that requires grad) as the second argument.

Ah, right I understand what you mean. That’s correct, I do want gradients through the second argument (or something equivalent). It sounds like what you’re saying is that what I’m looking for isn’t really possible within the inheritance structure of the BCELoss class.

I am afraid BCELoss does not support that.
But looking at the code, BCEWithLogitsLoss does.
So if you actually use a sigmoid before it and want to merge both, that will work :)

Otherwise, yes, you will have to create a custom nn.Module. But the one-line formula for BCELoss given above should not be significantly slower than the all-in-one version.
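A minimal sketch of what such a custom nn.Module might look like, assuming mean reduction and no per-element weight. The class name SymmetricBCELoss, the _bce helper, and the eps clamp are all hypothetical choices for this example rather than anything provided by PyTorch.

```python
import torch
from torch import Tensor, nn

class SymmetricBCELoss(nn.Module):
    """Hypothetical symmetric loss: BCE(input, target) + BCE(target, input)."""

    def __init__(self, eps: float = 1e-7):
        super().__init__()
        # Assumed safeguard: keeps log() finite when a value is exactly 0 or 1.
        self.eps = eps

    def _bce(self, probs: Tensor, labels: Tensor) -> Tensor:
        # Plain formula with mean reduction; differentiable in both arguments.
        probs = probs.clamp(self.eps, 1 - self.eps)
        return -(labels * probs.log() + (1 - labels) * (1 - probs).log()).mean()

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return self._bce(input, target) + self._bce(target, input)

# Usage: only the network output requires grad; the target image does not.
criterion = SymmetricBCELoss()
logits = torch.randn(2, 1, 4, 4, requires_grad=True)
recon = torch.sigmoid(logits)      # stand-in for the autoencoder output
image = torch.rand(2, 1, 4, 4)     # stand-in for the target image
loss = criterion(recon, image)
loss.backward()                    # gradients reach logits through both terms
```

Subclassing nn.Module directly rather than nn.BCELoss gives up the inherited weight and reduction handling, but it avoids routing the reversed term through F.binary_cross_entropy().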