Why do I have this error: The shape of the mask [69120] at index 0 does not match the shape of the indexed tensor [17280, 12] at index 0

I am working with YoloStereo3D for stereo 3D object detection (stereo camera only, no Velodyne LiDAR) on the KITTI dataset, with EdgeNeXt as the backbone instead of ResNet.

After hard-coding the number of feature-map channels between YoloStereo3D and EdgeNeXt so that they match (a fix for a previous error, linked at the end of this post), I now get an error at a later stage of the training pipeline:

  File "/home/zakaseb/Thesis/YoloStereo3D/Stereo3D/visualDet3D/networks/heads/detection_3d_head.py", line 429, in loss
    useful_mask = anchors['mask'][j] #[N]
IndexError: The shape of the mask [69120] at index 0 does not match the shape of the indexed tensor [17280, 12] at index 0
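
The failing operation itself is plain boolean-mask indexing with mismatched first dimensions. Here is a standalone sketch (not the actual visualDet3D code, just the two shapes taken from the error message) that reproduces the same IndexError:

import torch

# Hypothetical tensors with the shapes from the error message:
# indexing a [17280, 12] tensor with a boolean mask of length 69120 fails
# because boolean indexing requires mask.shape[0] == tensor.shape[0].
indexed = torch.zeros(17280, 12)
bool_mask = torch.zeros(69120, dtype=torch.bool)

try:
    _ = indexed[bool_mask]
except IndexError as e:
    print(e)  # The shape of the mask [69120] at index 0 does not match ...

So the two tensors combined at that line simply disagree about how many anchor positions there are.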

Here is how I changed the backbone:

class YoloStereo3DCore(nn.Module):
    """
        Inference Structure of YoloStereo3D
        Similar to YoloMono3D,
        Left and Right image are fed into the backbone in batch. So they will affect each other with BatchNorm2d.
    """
    def __init__(self, backbone_arguments):
        f = open("/home/zakaseb/Thesis/YoloStereo3D/Stereo3D/Sequence.txt", "a")
        f.write("yolosterero3dCore_init \n")
        f.close()
        super(YoloStereo3DCore, self).__init__()
        self.backbone = edgenext_small(**backbone_arguments)  # backbone swapped here from ResNet to EdgeNeXt

        base_features = 256  # hard-coded for EdgeNeXt; for ResNet it was: 256 if backbone_arguments['depth'] > 34 else 64
        self.neck = StereoMerging(base_features)  # StereoMerging produces the detection features and the depth output
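
To double-check the hard-coded value, something like the shape probe below should work. It is only a sketch: it assumes that my modified edgenext_small (defined further down) returns the feature map(s) that StereoMerging consumes, and that 288x1280 is the KITTI crop size used by YoloStereo3D.

import torch

# Hypothetical sanity check: run a dummy left/right batch through the backbone
# and print the channel count and spatial stride of every returned feature map.
backbone = edgenext_small()
dummy = torch.randn(2, 3, 288, 1280)  # left and right images stacked along the batch dim

with torch.no_grad():
    feats = backbone(dummy)

feats = feats if isinstance(feats, (list, tuple)) else [feats]
for i, f in enumerate(feats):
    print(i, tuple(f.shape), '-> stride', 1280 // f.shape[-1])

My expectation is that the channel count matches the hard-coded base_features = 256, but the stride of the map that feeds the detection head may differ from the ResNet setup, which I suspect is behind the new error.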

Here is edgenext_small():

@BACKBONE_DICT.register_module
def edgenext_small(**kwargs):
    model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[32, 64, 128, 256], expan_ratio=4,
                     global_block=[0, 1, 1, 1],
                     global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                     use_pos_embd_xca=[False, True, False, False],
                     kernel_sizes=[3, 5, 7, 9],
                     d2_scales=[2, 2, 3, 4],
                     classifier_dropout=0.0)
    return model

As per the traceback, the error is raised on this line:

useful_mask = anchors['mask'][j] #[N]

which matches the two-dimensional indexed tensor ([17280, 12]) in the error message.

I don’t see how this issue could be related to the backbone, since it occurs in the detection-head part of the pipeline, but because training worked with ResNet and fails with EdgeNeXt, it has to be related.
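
One back-of-the-envelope check that supports this: the anchor count for such a head is (H / stride) * (W / stride) * anchors_per_location, and the two sizes in the error differ by exactly a factor of 4. Assuming the usual 288x1280 KITTI crop and 12 anchors per location (both are guesses read off the shapes, not values I have verified in my config), stride 16 versus stride 8 reproduces the two numbers exactly:

# Hypothetical anchor-count arithmetic; crop size, strides and anchors-per-location are assumptions
H, W = 288, 1280
anchors_per_loc = 12

print((H // 16) * (W // 16) * anchors_per_loc)  # 18 * 80 * 12  = 17280
print((H // 8)  * (W // 8)  * anchors_per_loc)  # 36 * 160 * 12 = 69120

So my suspicion is that the EdgeNeXt feature map reaching the detection head has a different stride than the ResNet feature map the anchors (or the anchor mask) were precomputed for.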

Previously posted error: Why do I have the error: Given groups=1, weight of size [8, 1024, 1, 1], expected input[8, 304, 9, 40] to have 1024 channels, but got 304 channels - #3 by zakaseb

Kindly advise @ptrblck.