Change dataset from Cityscapes to binary segmentation

Hi, when I try to change the dataset from Cityscapes to a binary segmentation task, the cross_entropy loss does not seem to work with the binary mask:

      File "source_only.py", line 126, in main
        lr_scheduler, epoch, visualize if args.debug else None, args)
      File "source_only.py", line 188, in train
        loss_cls_s = criterion(pred_s, label_s)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 1048, in forward
        ignore_index=self.ignore_index, reduction=self.reduction)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2693, in cross_entropy
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2390, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

and if I change it to BCELoss:

      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 613, in forward
        return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2755, in binary_cross_entropy
        "Please ensure they have the same size.".format(target.size(), input.size())
    ValueError: Using a target size (torch.Size([2, 256, 512, 3])) that is different to the input size (torch.Size([2, 19, 256, 512])) is deprecated. Please ensure they have the same size.

It goes wrong with the channels, even if I set num_class from 19 to 2.

I have no idea; any guidance would be very helpful!

nn.CrossEntropyLoss used for a multi-class segmentation use case expects a model output in the shape [batch_size, nb_classes, height, width] while the target should have the shape [batch_size, height, width] and contain class indices in the range [0, nb_classes-1] ([0, 1] in your case for a “binary multi-class segmentation”).
The first error is thus raised, since the target seems to have 4 dimensions.
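
For example, a minimal shape check (the sizes below are just illustrative for a binary multi-class setup):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    # output: [batch_size=2, nb_classes=2, height=256, width=512]
    output = torch.randn(2, 2, 256, 512, requires_grad=True)
    # target: [batch_size, height, width] containing class indices in [0, 1]
    target = torch.randint(0, 2, (2, 256, 512))

    loss = criterion(output, target)
    loss.backward()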

On the other hand, nn.BCEWithLogitsLoss expects both the model output and target to have the same shape as [batch_size, nb_classes, height, width] and the target should contain floating point values in the range [0, 1] for each class (nb_classes would be 1 for a binary segmentation use case).
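
Again as an illustrative sketch, using a single channel for the binary case:

    import torch
    import torch.nn as nn

    criterion = nn.BCEWithLogitsLoss()

    # output and target share the shape [batch_size, 1, height, width]
    output = torch.randn(2, 1, 256, 512, requires_grad=True)
    # target contains floating point values in [0, 1]
    target = torch.randint(0, 2, (2, 1, 256, 512)).float()

    loss = criterion(output, target)
    loss.backward()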

Based on the second error message it seems you are trying to pass color images in the channels-last memory format as the target tensors, which won’t work. In case my assumption is correct you would have to map the colors to class values first.
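
Something like this sketch could work for the mapping (the palette below is made up, so replace it with the colors actually used in your masks):

    import numpy as np

    # hypothetical palette: RGB color -> class index
    color_to_class = {
        (0, 0, 0): 0,        # background
        (255, 255, 255): 1,  # foreground
    }

    def map_colors_to_indices(label_rgb):
        # label_rgb: [H, W, 3] uint8 color mask -> [H, W] class index map
        target = np.zeros(label_rgb.shape[:2], dtype=np.int64)
        for color, cls in color_to_class.items():
            target[(label_rgb == color).all(axis=-1)] = cls
        return target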

Hi, I changed the map to

    self.id_to_trainid = {
        0: 0,
        1: 1
    }
    self.trainid2name = {
        0: "back",
        1: "file",
    }

but it seems to be the same problem…

      File "train_src.py", line 314, in <module>
        main()
      File "train_src.py", line 307, in main
        model = train(cfg, args.local_rank, args.distributed)
      File "train_src.py", line 139, in train
        loss = criterion(pred, src_label)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 1048, in forward
        ignore_index=self.ignore_index, reduction=self.reduction)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2693, in cross_entropy
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2390, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [2, 3, 720, 1280]

I don’t know where this mapping is used, but based on the error message your target still contains 4 dimensions (now the channel dimension seems to be in dim1).
Could you show how you’ve tried to map the color values to the class indices in the target tensor?

Thanks for the reply, the map is used this way:

    import os.path as osp
    import pickle

    import numpy as np
    from PIL import Image
    from torch.utils import data


    class GTA5DataSet(data.Dataset):
        def __init__(self,
                     data_root,
                     split,
                     transform=None,
                     max_iters=None,
                     num_classes=2,
                     ignore_label=255,
                     debug=False):

            self.split = split
            self.NUM_CLASS = num_classes
            self.data_root = data_root
            self.data_list = []

            if max_iters is not None:
                # class-balanced re-sampling of the source file list
                self.label_to_file, self.file_to_label = pickle.load(
                    open(osp.join(data_root, "gtav_label_info.p"), "rb"))
                self.img_ids = []
                SUB_EPOCH_SIZE = 3000
                tmp_list = []
                ind = dict()
                for i in range(self.NUM_CLASS):
                    ind[i] = 0
                for e in range(int(max_iters / SUB_EPOCH_SIZE) + 1):
                    cur_class_dist = np.zeros(self.NUM_CLASS)
                    for i in range(SUB_EPOCH_SIZE):
                        if cur_class_dist.sum() == 0:
                            dist1 = cur_class_dist.copy()
                        else:
                            dist1 = cur_class_dist / cur_class_dist.sum()
                        # prefer classes that are underrepresented so far
                        w = 1 / np.log(1 + 1e-2 + dist1)
                        w = w / w.sum()
                        c = np.random.choice(self.NUM_CLASS, p=w)

                        if ind[c] > (len(self.label_to_file[c]) - 1):
                            np.random.shuffle(self.label_to_file[c])
                            ind[c] = ind[c] % (len(self.label_to_file[c]) - 1)

                        c_file = self.label_to_file[c][ind[c]]
                        tmp_list.append(c_file)
                        ind[c] = ind[c] + 1
                        cur_class_dist[self.file_to_label[c_file]] += 1

                self.img_ids = tmp_list

            # (population of self.data_list from the file list is omitted in this post)
            if max_iters is not None:
                self.data_list = self.data_list * int(np.ceil(float(max_iters) / len(self.data_list)))

            print('length of gta5', len(self.data_list))

            self.id_to_trainid = {0: 0, 1: 1}
            self.trainid2name = {
                0: "back",
                1: "file",
            }
            self.transform = transform
            self.ignore_label = ignore_label
            self.debug = debug

        def __len__(self):
            return len(self.data_list)

        def __getitem__(self, index):
            if self.debug:
                index = 0
            datafiles = self.data_list[index]
            # (loading of image, label and name from datafiles is omitted in this post)

            # re-assign labels to match the format of Cityscapes
            label_copy = self.ignore_label * np.ones(label.shape, dtype=np.float32)
            for k, v in self.id_to_trainid.items():
                label_copy[label == k] = v
            label = Image.fromarray(label_copy)

            if self.transform is not None:
                image, label = self.transform(image, label)

            return image, label, name
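
For what it's worth, the id_to_trainid remapping above only takes effect once label is a 2D array of class ids; if the label files are color images, the target keeps its channel dimension after the loop, which matches the [2, 3, 720, 1280] size in the error. Below is a minimal sketch of collapsing the colors first, assuming a black background, a white foreground class, and that datafiles stores the label path under a "label" key (all of these are assumptions, so adjust them to your data):

    # inside __getitem__, before the id_to_trainid remapping;
    # the datafiles["label"] key name is assumed
    label = np.asarray(Image.open(datafiles["label"]).convert("RGB"))  # [H, W, 3]
    # collapse the color mask into a [H, W] index map: 0 = back, 1 = file
    label = (label == 255).all(axis=-1).astype(np.uint8)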