I am working with YoloStereo3D for stereo 3D object detection (stereo camera only, no Velodyne LiDAR) on the KITTI dataset, using EdgeNeXt as the backbone instead of ResNet.
Before changing the backbone from ResNet to EdgeNeXt, everything worked fine with the same KITTI dataset. After the change, I started getting the error below:
RuntimeError: Given groups=1, weight of size [8, 1024, 1, 1], expected input[8, 304, 9, 40] to have 1024 channels, but got 304 channels instead
Here is how I changed the backbone:
class YoloStereo3DCore(nn.Module):
    """
    Inference structure of YoloStereo3D.
    Similar to YoloMono3D, the left and right images are fed into the backbone
    in one batch, so they affect each other through BatchNorm2d.
    """
    def __init__(self, backbone_arguments):
        f = open("/home/zakaseb/Thesis/YoloStereo3D/Stereo3D/Sequence.txt", "a")
        f.write("yolosterero3dCore_init \n")
        f.close()
        super(YoloStereo3DCore, self).__init__()
        self.backbone = edgenext_small(**backbone_arguments)  # changed backbone here (was ResNet)
        base_features = 256  # was: 256 if backbone_arguments['depth'] > 34 else 64 (depended on ResNet depth)
        self.neck = StereoMerging(base_features)  # StereoMerging outputs features and the depth output.
Here is edgenext_small():
@BACKBONE_DICT.register_module
def edgenext_small(pretrained=False, **kwargs):
    # FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
    model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                     global_block=[0, 1, 1, 1],
                     global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                     use_pos_embd_xca=[False, True, False, False],
                     kernel_sizes=[3, 5, 7, 9],
                     d2_scales=[2, 2, 3, 4],
                     classifier_dropout=0.0)
    return model
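For reference, the channel arithmetic behind the error can be sketched as below. This is a sketch under the assumption (suggested by the weight shape [8, 1024, 1, 1] in the traceback) that StereoMerging's first 1x1 conv expects base_features * 4 = 1024 input channels, i.e. the usual ResNet bottleneck expansion of 4, whereas the EdgeNeXt-S config above ends its last stage at dims[-1] = 304 channels:

```python
# Channel bookkeeping for the ResNet -> EdgeNeXt swap (assumption: the neck
# was built for a ResNet-style backbone whose last stage has base_features * 4
# channels, matching the [8, 1024, 1, 1] conv weight in the error message).
resnet_base_features = 256
neck_expected_channels = resnet_base_features * 4  # 1024

# EdgeNeXt-S stage widths, taken from the edgenext_small() config above.
edgenext_dims = [48, 96, 160, 304]
backbone_out_channels = edgenext_dims[-1]  # 304, matching input[8, 304, 9, 40]

# The mismatch that triggers the RuntimeError:
print(neck_expected_channels, "!=", backbone_out_channels)
```

So the neck is still sized for the old ResNet output, not for what EdgeNeXt-S actually produces.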