FasterRCNN class encoding

Hello everyone,

I am training a FasterRCNN model to detect objects belonging to two classes (excluding background). These classes are not encoded as 1 and 2 but as 7 and 22. Is it advisable in this case to re-encode the classes as 1 and 2 and set the number of classes (num_classes) of FastRCNNPredictor to 3 (the two classes + background)?
Otherwise I would have to set the number of classes of FastRCNNPredictor to 23 (max encoding + 1 for background), which does not seem ideal.
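For context, this is roughly how I would replace the predictor head with the re-encoded classes (a sketch following the usual torchvision finetuning recipe; the backbone choice here is just an example):

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a pretrained detector and swap in a new box predictor
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
# 2 re-encoded classes (1 and 2) + background
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=3)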

Thanks

I would recommend setting the number of output activations to the number of classes you are actually dealing with.
Otherwise your model might output irrelevant classes without any meaning.

Thanks for your response @ptrblck; I re-encoded the classes on the fly using a transform (see below).
In addition to what you mentioned (having irrelevant classes), my initial worry was that the classification loss would somehow be “diluted” if I set the number of output activations to a value higher than the actual number of classes.

The transform

class EncodeClasses(object):
    """Re-encode class labels.

    Useful to limit the number of output activations to the actual number of
    classes when the initial class codes do not follow a simple sequence
    ``[1:len(classes)]``.
    """
    def __init__(self, encoding):
        """
        Args:
            encoding (dict): Mapping of old class code to new class code
        """
        self.encoding = encoding

    def __call__(self, image, target):
        # Remap against the original labels so that a value that has already
        # been rewritten cannot be matched again by a later (old, new) pair
        old_labels = target['labels']
        labels = old_labels.clone()
        for old, new in self.encoding.items():
            labels[old_labels == old] = new
        target['labels'] = labels
        return image, target
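
To give an idea of how it is used, here is a minimal (hypothetical) example with the 7 -> 1 and 22 -> 2 mapping; the dummy image and target are only there to illustrate the remapping:

import torch

encode = EncodeClasses({7: 1, 22: 2})

# Dummy image and target dict in the torchvision detection format
image = torch.zeros(3, 224, 224)
target = {'labels': torch.tensor([7, 22, 7])}

image, target = encode(image, target)
print(target['labels'])  # tensor([1, 2, 1])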