Question about DETR gradient backpropagation to object queries.

Assume we have only 4 object queries and a batch size of one, only the last decoder layer contributes to the loss, and negative labels are excluded from the loss. If we set tgt_mask to
[[False, False, True, True],
[False, False, True, True],
[True, True, False, False],
[True, True, False, False],]
If, in addition, only the first object query is matched to a true label in a given training step, I would expect the last two object queries not to be updated during backpropagation. But I found that all object queries were updated. Can anyone clarify this? Thank you so much.
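The expectation above can be checked in isolation. The following is a minimal sketch, not DETR's actual decoder: it assumes a stock `nn.TransformerDecoder` with dropout disabled, treats the object queries as a learnable `tgt` tensor, and uses a made-up loss on the first query's output only. All sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_queries, d_model = 4, 16  # hypothetical sizes, chosen for illustration
layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=2, dim_feedforward=32, dropout=0.0
)
decoder = nn.TransformerDecoder(layer, num_layers=2)

# Object queries as a learnable tensor of shape (num_queries, batch, d_model).
queries = nn.Parameter(torch.randn(num_queries, 1, d_model))
memory = torch.randn(10, 1, d_model)  # stand-in for the encoder output

# True means "this query may NOT attend to that query".
tgt_mask = torch.tensor([
    [False, False, True, True],
    [False, False, True, True],
    [True, True, False, False],
    [True, True, False, False],
])

out = decoder(queries, memory, tgt_mask=tgt_mask)

# Assumed loss: only the first query's output is supervised.
loss = out[0].pow(2).sum()
loss.backward()

# One gradient norm per query.
grad_norms = queries.grad.flatten(1).norm(dim=1)
print(grad_norms)
```

In this setup the gradient reaches the first two queries (they share a block in the mask) but not the last two, so any update to the last two would have to come from somewhere other than the current step's gradient.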

DETR assigns all queries as either positive or negative samples and they are supervised with a classification loss. Only the queries assigned as positive samples are supervised with regression losses related to the bounding box prediction.

Each query will be updated by classification loss and the subset of queries assigned as positive samples (foreground) will also be updated by regression losses.

Thank you so much for replying to the post! I just updated it, because I couldn't modify it while it was pending. In my experiment, I ignore negative samples and compute the loss only from positive samples, selected with torch.gather. I also compute the loss only on the output of the last decoder layer. So, in theory, the other object queries should not be updated. But the result confuses me.

I found it may be caused by the momentum in the optimizer. So is it correct that such tgt masks can prevent interactions among object queries?
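The momentum effect is easy to reproduce on its own. This is a small sketch, assuming plain `torch.optim.SGD` with momentum and hand-set gradients; the parameter and its values are made up for illustration. Optimizers that keep running averages, such as Adam, and any weight decay can move a zero-gradient parameter in a similar way.

```python
import torch

# A toy parameter with 4 entries, standing in for 4 object queries.
p = torch.nn.Parameter(torch.ones(4))
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9)

# Step 1: every entry receives a gradient, which fills the momentum buffer.
p.grad = torch.ones(4)
opt.step()
before = p.detach().clone()

# Step 2: the last two entries receive exactly zero gradient.
p.grad = torch.tensor([1.0, 1.0, 0.0, 0.0])
opt.step()

# The last two entries still move, driven by the momentum from step 1.
delta = p.detach() - before
print(delta)
```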

tgt_mask will prevent interactions with the tgt sequence, but the queries will still be supervised by the DETR loss functions by default. To prevent specific queries from being supervised, the loss functions need the same per-query masks.
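A per-query mask on the loss might look like the sketch below. It is not DETR's criterion: the logits, targets, and the `supervised` mask are hypothetical, and a plain cross-entropy stands in for the classification loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical classification logits: 4 queries, 3 classes.
logits = torch.randn(4, 3, requires_grad=True)
targets = torch.tensor([1, 2, 2, 2])

# Assumed per-query mask: only the first query is supervised.
supervised = torch.tensor([True, False, False, False])

# Keep one loss value per query, then zero out the unsupervised ones.
per_query = F.cross_entropy(logits, targets, reduction="none")
loss = (per_query * supervised).sum() / supervised.sum().clamp(min=1)
loss.backward()

# Rows of the unsupervised queries receive zero gradient.
print(logits.grad)
```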

Does your system work in an autoregressive manner?

It is used for trajectory prediction. Many papers use multiple decoders to ensemble results and achieve better performance. So I was wondering whether we could use tgt_mask to make a single large decoder run like multiple small decoders.
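For that idea, a block-diagonal mask can be built for any number of query groups. The helper below is a hypothetical sketch (its name and arguments are not from any library), assuming equally sized groups and the PyTorch convention that True blocks attention.

```python
import torch


def block_diagonal_tgt_mask(num_queries: int, group_size: int) -> torch.Tensor:
    """Return a bool mask where True blocks attention across query groups."""
    group_id = torch.arange(num_queries) // group_size
    # Queries may attend only to queries with the same group id.
    return group_id[:, None] != group_id[None, :]


mask = block_diagonal_tgt_mask(num_queries=4, group_size=2)
print(mask)
```

With 4 queries and groups of 2 this reproduces the mask from the first post. Note that the groups would still share the decoder's weights, so this isolates the queries in self-attention but is not the same as training separate decoders.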