Apparently the mask head uses BCEWithLogitsLoss, which combines a sigmoid with cross-entropy, to avoid competition between masks. I get a negative loss value. Is this a bug, or is something wrong with my encoding? I encoded each target mask as a uint8 array where all pixels are 0 except the object's mask, whose pixels are set to the mask's class (i.e. they match the target class).
Could you post a link to the repository you are using?
Also, could you post the inputs and targets for a negative loss value?
Negative values for nn.BCE(WithLogits)Loss can be created if your target is out of bounds.
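As a quick sanity check, here is a minimal sketch of that failure mode: with targets in [0, 1] the loss is non-negative, but a target outside that range (e.g. a class index like 3 instead of a binary value) can push it negative. The tensor values are just illustrative.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

logits = torch.tensor([2.0])
valid_target = torch.tensor([1.0])  # binary target in [0, 1]
bad_target = torch.tensor([3.0])    # out of bounds, e.g. a class index by mistake

print(criterion(logits, valid_target))  # positive, as expected
print(criterion(logits, bad_target))    # negative
```

So if the masks are encoded with the class number instead of 0/1, any class index above 1 can produce exactly this symptom.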
I’m using this one:
Thanks for the link. Have you checked the target values?
How do I correctly label the masks? If I have K objects across N classes in one image, the mask label array must be HxWxK, but within each map, should the mask values be 1 or the class number?
If you are using nn.BCE(WithLogits)Loss, the target should have the same shape as your output activation, where the channel index corresponds to the class index.
A one indicates the presence of the class at that spatial position, while a zero indicates its absence.
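A minimal sketch of that layout (the shapes and the class-1 patch are just illustrative, not tied to any particular model):

```python
import torch
import torch.nn as nn

batch, num_classes, H, W = 1, 3, 4, 4
logits = torch.randn(batch, num_classes, H, W)  # model output, one channel per class

# target has the same shape as the output; channel c holds 1.0 wherever class c is present
target = torch.zeros(batch, num_classes, H, W)
target[0, 1, 1:3, 1:3] = 1.0  # class 1 occupies a 2x2 patch

loss = nn.BCEWithLogitsLoss()(logits, target)  # non-negative for targets in [0, 1]
```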
This is from mask_rcnn.py:
During training, the model expects both the input tensors, as well as a targets dictionary,
containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box
- masks (Tensor[N, H, W]): the segmentation binary masks for each instance
Here N is the number of ground-truth boxes, not the class index.
So if I have (for example) 3 objects of class 1, 2 objects of class 2, and 1 object of class 3 in a 224x224 image, the mask array will have shape 6x224x224, and each map will be 0 for background and 1 for the object. During training, BCE is applied pixelwise, and the correct class is taken from the labels vector. Since each object's class/label, bbox, and mask are appended to a list, the order stays the same.
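That setup could be sketched as follows. The blob positions are placeholders, and the boxes are derived from the masks here just to show how the per-instance order of labels, boxes, and masks stays aligned:

```python
import torch

H, W = 224, 224
num_objs = 6  # 3 of class 1, 2 of class 2, 1 of class 3

# one binary mask per instance: 0 = background, 1 = object
masks = torch.zeros(num_objs, H, W, dtype=torch.uint8)
for i in range(num_objs):
    masks[i, 10 * i:10 * i + 30, 10 * i:10 * i + 30] = 1  # placeholder blobs

# labels share the same index order as the masks
labels = torch.tensor([1, 1, 1, 2, 2, 3], dtype=torch.int64)

# derive [x0, y0, x1, y1] boxes from the masks, keeping the same order
boxes = []
for m in masks:
    ys, xs = torch.where(m)
    boxes.append([xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()])
boxes = torch.tensor(boxes, dtype=torch.float32)

target = {"boxes": boxes, "labels": labels, "masks": masks}
```

Because index i refers to the same instance in all three tensors, the model can pair each mask with its label and box.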