ChauffeurNet Loss functions

Hi,
I’m trying to understand the loss functions used in the ChauffeurNet paper from Waymo.

They want to predict a mask and a point for the future position of the car. The loss function is as follows:


The masks are as follows:

I want to implement these loss functions in PyTorch. For the B_k prediction, I applied a sigmoid to the output of the network and used BCELoss. I don’t understand the two sigmas in eq. (4) for the L_B loss, since BCELoss already has a sigma in itself. I think it is a multi-label binary classification: the value of each pixel in the network output would be between 0 and 1. After reading the PyTorch forum, I realized that I can use BCEWithLogitsLoss and set pos_weight because, as you can see in the image, the classes are imbalanced.
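To make the B_k part concrete, here is a minimal sketch of what I mean. The tensor shapes, the random data, and the pos_weight heuristic (number of negatives divided by number of positives) are my own assumptions and are not taken from the paper. It only checks that BCEWithLogitsLoss on raw logits matches the weighted binary cross-entropy written out by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed shapes: batch of 2, one channel, 8x8 mask.
logits = torch.randn(2, 1, 8, 8)                  # raw network output, no sigmoid
target = (torch.rand(2, 1, 8, 8) > 0.9).float()   # sparse binary mask (imbalanced)

# Assumed heuristic for the class imbalance: #negatives / #positives.
num_pos = target.sum().clamp(min=1.0)
num_neg = target.numel() - target.sum()
pos_weight = (num_neg / num_pos).reshape(1)       # broadcast over all pixels

# BCEWithLogitsLoss applies the sigmoid internally.
loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, target)

# The same quantity written out by hand, averaged over all pixels.
p = torch.sigmoid(logits)
manual = -(pos_weight * target * torch.log(p)
           + (1 - target) * torch.log(1 - p)).mean()

# Without pos_weight, it matches sigmoid followed by BCELoss.
plain_logits = nn.BCEWithLogitsLoss()(logits, target)
plain_probs = nn.BCELoss()(torch.sigmoid(logits), target)

print(loss.item(), manual.item())
```

If this is right, the averaging over pixels is already done by the default reduction="mean", so I would not add an extra sum over the image myself.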
And for the L_P loss, I used a spatial_softmax to convert the output of the network to probabilities. I think the values of all pixels in the output should sum to 1, right? And I think it is a single-label binary classification.
Now I don’t understand which loss function I have to use in PyTorch.
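One candidate I am considering, as a sketch only: flatten the H x W map and treat each pixel as a class, so the spatial softmax plus cross-entropy becomes F.cross_entropy over H*W classes. I am assuming here that L_P is a cross-entropy against a one-hot target map, which I have not confirmed against the paper. The shapes and the target coordinates are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

N, H, W = 2, 8, 8
logits = torch.randn(N, 1, H, W)             # raw network output for P_k

# Hypothetical ground-truth waypoint (row, col) for each sample.
target_rc = torch.tensor([[3, 4], [6, 1]])

# Spatial softmax: softmax over all H*W pixels of each sample.
flat_logits = logits.view(N, H * W)
probs = F.softmax(flat_logits, dim=1).view(N, 1, H, W)

# Each pixel is one class; the target is the flattened pixel index.
target_idx = target_rc[:, 0] * W + target_rc[:, 1]

# cross_entropy applies log_softmax internally, so it takes the raw logits.
loss = F.cross_entropy(flat_logits, target_idx)

# The same value by hand: mean negative log-probability at the target pixel.
picked = probs[torch.arange(N), 0, target_rc[:, 0], target_rc[:, 1]]
manual = -torch.log(picked).mean()

print(probs.sum(dim=(1, 2, 3)), loss.item())
```

With this formulation the probabilities of each sample sum to 1 over the whole image, unlike the per-pixel sigmoid used for B_k.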
What is the difference between BCELoss for single-label vs. multi-label binary classification?
And if I use BCEWithLogitsLoss, how should I use the network at test time? I apply a separate sigmoid, but the results do not seem good.
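This is roughly what I do at test time, as a minimal sketch. The single Conv2d layer is only a hypothetical stand-in for my network, and the 0.5 threshold is my own choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network: it outputs raw logits.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
model.eval()                       # switch off dropout / batch-norm updates

x = torch.randn(1, 3, 8, 8)        # made-up input batch

with torch.no_grad():              # no gradients needed for inference
    logits = model(x)              # same output that was fed to BCEWithLogitsLoss
    probs = torch.sigmoid(logits)  # per-pixel probabilities in [0, 1]
    mask = probs > 0.5             # same as logits > 0, up to float rounding

print(probs.shape, mask.sum().item())
```

Since BCEWithLogitsLoss applies the same sigmoid internally during training, I would expect this to be the matching inference step, so the poor results may come from somewhere else (for example the pos_weight value or the threshold).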

Thanks

@ptrblck Do you have any idea about this problem?