RuntimeError: Given groups=1, weight of size [256, 304, 3, 3], expected input[1, 2096, 129, 129] to have 304 channels, but got 2096 channels instead

An error occurred while I was training on the VOC2007 dataset:

RuntimeError: Given groups=1, weight of size [256, 304, 3, 3], expected input[1, 2096, 129, 129] to have 304 channels, but got 2096 channels instead.

The code that raises the error is:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Decoder(nn.Module):
    def __init__(self, num_classes, backbone, BatchNorm):
        super(Decoder, self).__init__()
        # Channel count of the low-level feature map for each backbone
        if backbone == 'resnet' or backbone == 'drn':
            low_level_inplanes = 256
        elif backbone == 'xception':
            low_level_inplanes = 128
        elif backbone == 'mobilenet':
            low_level_inplanes = 24
        else:
            raise NotImplementedError

        # 1x1 conv reduces the low-level features to 48 channels
        self.conv1 = nn.Conv2d(low_level_inplanes, 48, 1, bias=False)
        self.bn1 = BatchNorm(48)
        self.relu = nn.ReLU()
        # 304 = 256 + 48, assuming x is the 256-channel ASPP output
        self.last_conv = nn.Sequential(
            nn.Conv2d(304, 256, kernel_size=3, stride=1, padding=1, bias=False),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=False),
            nn.Conv2d(256, num_classes, kernel_size=1, stride=1))

    def forward(self, x, low_level_feat):
        low_level_feat = self.conv1(low_level_feat)
        low_level_feat = self.bn1(low_level_feat)
        low_level_feat = self.relu(low_level_feat)
        # Upsample the high-level features to the low-level spatial size
        x = F.interpolate(x, size=low_level_feat.size()[2:], mode='bilinear', align_corners=True)
        # Concatenate along the channel dimension: channels(x) + 48
        x = torch.cat((x, low_level_feat), dim=1)
        x = self.last_conv(x)

        return x
```

The VOC images have 3 channels, and I used them to train a DeepLab v3+ network. In the code above, the first convolution in `self.last_conv` has 304 input channels. How should I calculate the number of channels for a 3-channel input image without training a convolution or other layer first? Based on the error, should I simply change 304 to 2096? I hope to get an answer, thank you very much.

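It seems this line of code:

`x = torch.cat((x, low_level_feat), dim=1)`

is increasing the number of input channels, since it concatenates both feature maps along the channel dimension, so you would have to set `in_channels` of the first `nn.Conv2d` in `self.last_conv` to 2096 based on the error message.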
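The channel arithmetic can be checked directly: `torch.cat` along `dim=1` adds the channel counts of its inputs. The sketch below uses random tensors with a placeholder 8x8 spatial size (the real maps are 129x129). The 304 matches 256 high-level channels plus the 48 reduced low-level channels, while 2096 = 2048 + 48 suggests that `x` arrived with 2048 channels. That is the output width of the last ResNet stage rather than the 256-channel ASPP output of the usual DeepLab v3+ layout, so it may be worth checking that `x` goes through ASPP before reaching the decoder.

```python
import torch

# Placeholder spatial size; the feature maps in the error are 129x129
H = W = 8
low_level_feat = torch.randn(1, 48, H, W)  # after conv1: 48 channels

# Usual DeepLab v3+ case (assumption): x is the 256-channel ASPP output
x_aspp = torch.randn(1, 256, H, W)
print(torch.cat((x_aspp, low_level_feat), dim=1).shape)  # torch.Size([1, 304, 8, 8])

# What the error implies: x arrived with 2048 channels
x_2048 = torch.randn(1, 2048, H, W)
print(torch.cat((x_2048, low_level_feat), dim=1).shape)  # torch.Size([1, 2096, 8, 8])
```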

Thanks, this problem was solved after the modification, but a new problem appeared:
only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: [2, 513, 513, 3]
The traceback points to:
```python
# Method of the Trainer class (requires: from tqdm import tqdm)
def training(self, epoch):
    train_loss = 0.0
    tbar = tqdm(self.train_loader)
    num_img_tr = len(self.train_loader)
    for i, sample in enumerate(tbar):
        image, target = sample['image'], sample['label']
        if self.config['network']['use_cuda']:
            image, target = image.cuda(), target.cuda()
        self.scheduler(self.optimizer, i, epoch, self.best_pred)
        self.optimizer.zero_grad()
        output = self.model(image)
        loss = self.criterion(output, target)
        loss.backward()
        self.optimizer.step()
        train_loss += loss.item()
        tbar.set_description('Train loss: %.3f' % (train_loss / (i + 1)))
        self.writer.add_scalar('train/total_loss_iter', loss.item(), i + num_img_tr * epoch)

        # Show 10 * 3 inference results each epoch
        if i % (num_img_tr // 10) == 0:
            global_step = i + num_img_tr * epoch
            self.summary.visualize_image(self.writer, self.config['dataset']['dataset_name'], image, target, output, global_step)

    self.writer.add_scalar('train/total_loss_epoch', train_loss, epoch)
    # Images seen this epoch: full batches plus the size of the last batch
    print('[Epoch: %d, numImages: %5d]' % (epoch, i * self.config['training']['batch_size'] + image.shape[0]))
    print('Loss: %.3f' % train_loss)
```

Line 113 is `loss = self.criterion(output, target)`, which calls this loss:
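```python
# Method of the segmentation loss class (requires: import torch.nn as nn)
def CrossEntropyLoss(self, logit, target):
    n, c, h, w = logit.size()
    # further keyword arguments, if any, are not shown
    criterion = nn.CrossEntropyLoss(weight=self.weight, ignore_index=self.ignore_index)
    if self.cuda:
        criterion = criterion.cuda()

    # logit: [N, C, H, W]; target must be [N, H, W] with class indices
    loss = criterion(logit, target.long())

    if self.batch_average:
        loss /= n

    return loss
```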

I changed `loss = criterion(logit, target.long())` to `loss = criterion(logit, torch.squeeze(target).long())`, but it didn't help. What could be the reason?
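`torch.squeeze` only removes dimensions of size 1, so it cannot turn a `[2, 513, 513, 3]` target into `[2, 513, 513]`: none of those dimensions has size 1. A small tensor with the same channel-last layout shows this (the 5x5 spatial size is a placeholder):

```python
import torch

# Same layout as the failing target: [batch, height, width, 3]
target = torch.zeros(2, 5, 5, 3)
print(torch.squeeze(target).shape)  # torch.Size([2, 5, 5, 3]): nothing removed

# squeeze only helps with a size-1 dimension, e.g. a [2, 1, 5, 5] target
single = torch.zeros(2, 1, 5, 5)
print(torch.squeeze(single, dim=1).shape)  # torch.Size([2, 5, 5])
```

Passing `dim=1` explicitly also avoids dropping the batch dimension when the batch size happens to be 1.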

If you are dealing with a multi-class segmentation use case, the targets should have the shape [batch_size, height, width] and contain the class indices in the range [0, nb_classes-1].
Your target seems to be a channel-last RGB image tensor, which will not work.
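For reference, a minimal sketch of shapes that `nn.CrossEntropyLoss` accepts for segmentation: 4D logits `[N, C, H, W]` with a 3D `long` target `[N, H, W]`. It uses 21 classes as in VOC and an arbitrary 4x4 spatial size; `ignore_index=255` is an assumption following the VOC convention of marking object boundaries with 255.

```python
import torch
import torch.nn as nn

num_classes = 21
logits = torch.randn(2, num_classes, 4, 4)         # [N, C, H, W] model output
target = torch.randint(0, num_classes, (2, 4, 4))  # [N, H, W] class indices (int64)

criterion = nn.CrossEntropyLoss(ignore_index=255)
loss = criterion(logits, target)
print(loss.dim())  # 0: a scalar with the default reduction='mean'

# A channel-last 3-channel target like [2, 4, 4, 3] fails the shape check
bad_target = target.unsqueeze(-1).expand(-1, -1, -1, 3)
shape_error_raised = False
try:
    criterion(logits, bad_target)
except RuntimeError as err:
    shape_error_raised = True
    print(err)
```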

If each class in your target uses a specific color code, you would have to convert these color codes to class indices first.
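A minimal sketch of that conversion, assuming the target is a channel-last `uint8` RGB tensor and each class has exactly one color. The three-entry `palette` and the helper name `rgb_to_class_index` are hypothetical; for VOC the palette would list the 21 class colors in index order. Pixels whose color is not in the palette are set to 255 here, an assumption chosen to work with `ignore_index=255`.

```python
import torch


def rgb_to_class_index(target_rgb, palette, ignore_value=255):
    """Map a [N, H, W, 3] color-coded target to [N, H, W] class indices."""
    # palette: [K, 3], row k is the color of class k
    # matches[n, h, w, k] is True where the pixel equals palette color k
    matches = (target_rgb.unsqueeze(-2) == palette).all(dim=-1)
    # Start with ignore_value everywhere, then fill in matched class indices
    index = torch.full(target_rgb.shape[:-1], ignore_value, dtype=torch.long)
    found = matches.any(dim=-1)
    index[found] = matches.float().argmax(dim=-1)[found]
    return index


# Hypothetical 3-class palette and a 1x2x2 color-coded target
palette = torch.tensor([[0, 0, 0], [128, 0, 0], [0, 128, 0]], dtype=torch.uint8)
target_rgb = torch.tensor([[[[0, 0, 0], [128, 0, 0]],
                            [[0, 128, 0], [224, 224, 192]]]], dtype=torch.uint8)
print(rgb_to_class_index(target_rgb, palette).tolist())  # [[[0, 1], [2, 255]]]
```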

The dataset I'm using is VOC2007. I have checked the label images: they are all 8-bit, single-channel indexed images with 21 categories. I have no idea what the problem is.

In that case some transformation or other processing might have created these channel-last 3-channel target tensors in the provided shape [2, 513, 513, 3], which won’t work.
Could you check, where your single-channel targets get converted to 3-channels?
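One common way this can happen, offered as a guess since the dataset code isn't shown: VOC label PNGs are palette (`'P'` mode) images, so `np.array(Image.open(path))` yields an `[H, W]` array of class indices, while calling `.convert('RGB')` on the label expands it into `[H, W, 3]` colors. The sketch builds a tiny palette image in memory instead of reading a real file; the 2x2 label and its two-color palette are hypothetical.

```python
import numpy as np
from PIL import Image

# Hypothetical 2x2 label holding class indices 0 and 1
indices = np.array([[0, 1], [1, 0]], dtype=np.uint8)
label = Image.fromarray(indices)  # mode 'L'
# Attaching a palette turns it into a 'P' image, like a VOC label PNG
# index 0 -> black, index 1 -> (128, 0, 0), remaining entries zero
label.putpalette([0, 0, 0, 128, 0, 0] + [0] * (768 - 6))

as_index = np.array(label)               # keeps the class indices
as_rgb = np.array(label.convert('RGB'))  # expands indices to colors

print(as_index.shape, as_index.max())  # (2, 2) 1
print(as_rgb.shape)                    # (2, 2, 3)
```

If the loader does convert the mask this way, reading it without the RGB conversion keeps the `[H, W]` index layout that `nn.CrossEntropyLoss` expects.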