Question about image segmentation

Hi, I'm trying to do image segmentation, but I can't understand how it works.

By the way, I do know how a CNN works for classification (something like MNIST classification).

I just can't understand how the mask image and the labels are used in the network.

Do I need a dataset structured like one of these?

(image, mask image)
[(image, label), (mask image, label)]

Having both a mask image and a label really confuses me.

Specifically, I am trying to segment brain cancer in brain MRI images,
so do I need a dataset like one of these?
(brain MRI image, mask image)
[(brain MRI, cancer or not), (mask image, cancer or not)]

And another question:
do the visualized weights show the predicted mask image?

You would need the former, i.e. an input image and a target mask.
The target mask would now have the shape [batch_size, height, width] and contain the class indices for each pixel in the range [0, nb_classes-1] (similar to a vanilla classification task, but with the additional spatial dimensions).
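
As a rough illustration of these shapes, here is a minimal sketch assuming PyTorch and a two-class setup (background vs. tumor). The sizes are made up, and random tensors stand in for the model output and the real masks; `nn.CrossEntropyLoss` is one common choice of loss that accepts this layout, not necessarily the one you have to use.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; nb_classes = 2 assumes background (0) vs. tumor (1)
batch_size, nb_classes, height, width = 2, 2, 8, 8

# Stand-in for the model output: raw class logits for every pixel
logits = torch.randn(batch_size, nb_classes, height, width)

# Target mask: one class index per pixel, integer (long) dtype,
# with no channel dimension
target = torch.randint(0, nb_classes, (batch_size, height, width))

# CrossEntropyLoss accepts [N, C, H, W] logits with [N, H, W] class indices
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
print(target.shape, target.dtype, loss.item())
```

Each dataset sample would then be a single (image, mask) pair, where the mask plays the role that the scalar label plays in MNIST-style classification.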

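No, the visualized weights would not show the predicted mask; the mask comes from the model's output. Your model could use e.g. a conv layer as the output layer, which would return class logits in the shape [batch_size, nb_classes, height, width]. Again, this would be similar to a standard classification use case, but your output would also have the additional spatial dimensions. Taking the argmax over the class dimension would then give the predicted mask.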
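
To make the output side concrete, here is a minimal sketch, again assuming PyTorch. The tiny two-layer network, the single input channel (grayscale MRI slices), and the 64x64 size are all hypothetical choices for illustration; a real segmentation model (e.g. a U-Net-style architecture) would be much deeper.

```python
import torch
import torch.nn as nn

nb_classes = 2  # assumption: background vs. tumor

# Toy fully convolutional model; the last conv layer is the output layer
# and produces one logit channel per class
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1 input channel (grayscale)
    nn.ReLU(),
    nn.Conv2d(8, nb_classes, kernel_size=1),
)

images = torch.randn(4, 1, 64, 64)  # random stand-in for a batch of MRI slices
with torch.no_grad():
    logits = model(images)          # [batch_size, nb_classes, height, width]

# Predicted mask: most likely class index for each pixel
pred_mask = logits.argmax(dim=1)    # [batch_size, height, width]
print(logits.shape, pred_mask.shape)
```

The predicted mask is computed from the input image on every forward pass, which is why looking at the weights alone would not show it.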