Training Semantic Segmentation

Thanks for all your answers. I have a question and would like your help. My image has three channels (R, G, B) and my label has one channel. I built an hourglass network, but its output is [1, 3, 64, 64] while the label is [1, 1, 256, 256]. How can I resolve this mismatch between the output and label shapes?

Make sure the model outputs the expected number of channels, e.g. by setting out_channels of the last conv layer to the number of classes, and also make sure enough unpooling layers or transposed convolutions are used so that the spatial size of the output matches the target.
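Here is a minimal sketch illustrating both points. It is not your hourglass model; the layer widths, `num_classes`, and the `TinySegNet` name are assumptions chosen only to reproduce the shapes from your post. The final conv sets `out_channels` to the number of classes, and two transposed convolutions scale the 64x64 activation back up to 256x256 so the output lines up with the target:

```python
import torch
import torch.nn as nn

num_classes = 21  # assumption: replace with your actual number of classes

class TinySegNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 256 -> 128
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 128 -> 64
        )
        # Transposed convs restore the spatial size: 64 -> 128 -> 256
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2),
            nn.ReLU(),
        )
        # Last conv layer: out_channels equals the number of classes
        self.classifier = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.features(x)
        x = self.up(x)
        return self.classifier(x)

model = TinySegNet(num_classes)
x = torch.randn(1, 3, 256, 256)                          # RGB input
target = torch.randint(0, num_classes, (1, 256, 256))    # class indices, no channel dim

output = model(x)                                        # [1, num_classes, 256, 256]
loss = nn.CrossEntropyLoss()(output, target)
loss.backward()
print(output.shape, loss.item())
```

Note that nn.CrossEntropyLoss expects the target as class indices with shape [batch_size, height, width], so if your label tensor is stored as [1, 1, 256, 256], remove the channel dimension first, e.g. with `target = target.squeeze(1).long()`.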