Custom Loss function for a network

I have a network (it can be VGG, ResNet, or DenseNet) whose head/final layer is split into two sibling layers. Both layers have size equal to the number of classes. One layer outputs logits (pre-softmax) while the other outputs a noise value for each class. In simple terms, my loss function is cross entropy over the element-wise sum of the two layers' outputs.
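The two-headed setup described above can be sketched like this (a minimal illustration, not the poster's actual code — the class and layer names are made up, and a single linear layer stands in for the VGG/ResNet/DenseNet backbone):

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Backbone with two sibling heads of size num_classes:
    one for logits, one for per-class noise (illustrative names)."""

    def __init__(self, feat_dim=512, num_classes=10):
        super().__init__()
        # Stand-in for the real feature extractor (VGG/ResNet/DenseNet).
        self.backbone = nn.Linear(feat_dim, feat_dim)
        self.logit_head = nn.Linear(feat_dim, num_classes)
        self.noise_head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        # Return both head outputs; the loss combines them later.
        return self.logit_head(h), self.noise_head(h)
```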

For this, should I extend autograd and implement forward and backward separately for each sibling layer, or can something else be done?

There’s no need to implement the backward yourself. Just compute the loss and call loss.backward().

Are you sure?

The network produces two different outputs. I do an operation combining the two outputs in the loss, not in the forward pass.
How will that work out with backward then?

For clarification, I am trying to do the following: [image of the loss formulation]

As you can see, here y and sigma are network outputs.
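The image itself isn't reproduced in the thread, so the exact formula is unknown. One common formulation along these lines (loss attenuation in the aleatoric-uncertainty literature) perturbs the logits y with Gaussian noise scaled by sigma before taking cross entropy. A hedged sketch under that assumption — the names, shapes, and the sampling step are guesses, not confirmed by the thread:

```python
import torch
import torch.nn.functional as F

# y and sigma stand in for the two head outputs (assumed names/shapes).
y = torch.randn(4, 10, requires_grad=True)      # logits head
sigma = torch.rand(4, 10, requires_grad=True)   # per-class noise head
target = torch.randint(0, 10, (4,))

eps = torch.randn_like(y)                       # one Gaussian noise sample
# Cross entropy over the noise-perturbed logits; autograd handles the rest.
loss = F.cross_entropy(y + sigma * eps, target)
loss.backward()                                 # gradients reach y and sigma
```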

Ref: Image from

Just write your loss in terms of autograd operations and call backward(). You don’t need to do anything special like writing your own autograd.Function with a custom backward.
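Concretely, for the loss as originally stated (cross entropy over the element-wise sum of the two head outputs), that advice looks like this — `logits` and `noise` are illustrative names for the two sibling outputs:

```python
import torch
import torch.nn.functional as F

# Stand-ins for the two head outputs; in practice they come from the model.
logits = torch.randn(4, 10, requires_grad=True)
noise = torch.randn(4, 10, requires_grad=True)
target = torch.randint(0, 10, (4,))

# Cross entropy over the element-wise sum. Because the sum is an autograd
# operation, backward() propagates gradients into both heads automatically.
loss = F.cross_entropy(logits + noise, target)
loss.backward()
# logits.grad and noise.grad are now populated -- no custom Function needed.
```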

That worked out just fine. I just needed to dig deeper into the PyTorch documentation. Thanks.