Auxiliary loss in Fully Convolutional Network

I noticed that the fully convolutional network present in torchvision has an auxiliary head.
I thought torchvision.models.segmentation.fcn_resnet50/101 used the standard ResNet architecture, but in the original ResNet paper the authors don’t mention using any auxiliary loss.

Is the ResNet implemented in torchvision a custom one? Or is it based on a subsequent publication? Is there a paper explaining how it differs from the standard architecture?

Thank you very much