I have a problem. I have built a CNN whose output has the shape (BATCH_SIZE, 1, HEIGHT, WIDTH). However, in order to apply the CrossEntropyLoss function, I need an output of shape (BATCH_SIZE, NB_CLASS, HEIGHT, WIDTH). How can I get such a tensor from my original output? I currently have three classes, I want to do image segmentation, and my output images are float32.
As an aside, it sounds like your current CNN performs binary (i.e.,
It sounds like your desired use case is multi-class (with NB_CLASS
classes) semantic segmentation.
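To make the multi-class target concrete, here is a minimal sketch of the shapes CrossEntropyLoss works with in this setting. The sizes are illustrative assumptions (only the three classes come from the question), and random tensors stand in for a real network output and a real mask.

```python
import torch
import torch.nn as nn

# Illustrative sizes; only NB_CLASS = 3 comes from the question.
BATCH_SIZE, NB_CLASS, HEIGHT, WIDTH = 2, 3, 8, 8

# Stand-in for the network output: one raw score (logit) per class per pixel.
logits = torch.randn(BATCH_SIZE, NB_CLASS, HEIGHT, WIDTH)

# Ground-truth mask: one integer class label per pixel (dtype long),
# with no channel dimension.
target = torch.randint(0, NB_CLASS, (BATCH_SIZE, HEIGHT, WIDTH))

# CrossEntropyLoss takes raw logits and averages over all pixels by default.
loss = nn.CrossEntropyLoss()(logits, target)
print(loss.item())
```

Note that the float32 part is the prediction; the target here holds integer class indices rather than float images.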
First, once you’re down to (BATCH_SIZE, 1, HEIGHT, WIDTH), that
is, a single output channel, the per-class information is gone, and you
can no longer recover your desired NB_CLASS channels from it. You will
have to modify the final bit
of your CNN architecture, and the details will depend on your specific
It is likely that your next-to-last layer will produce an “image” with
a number of “feature” channels, and your last layer will recombine
your feature channels into the desired number of “class” channels.
For example, quoting from the original U-Net paper, “At the final layer
a 1x1 convolution is used to map each 64-component feature vector
to the desired number of classes.”
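As a rough sketch of what such a 1x1 convolution does, the snippet below maps a hypothetical 64-channel feature map to three class channels. The 64 is borrowed from the U-Net quote and the batch and image sizes are made up, so substitute whatever your next-to-last layer actually produces.

```python
import torch
import torch.nn as nn

NB_CLASS = 3            # three classes, as in the question
FEATURE_CHANNELS = 64   # assumption, taken from the U-Net quote

# A 1x1 convolution applies the same linear map to every pixel's
# feature vector, turning feature channels into class channels.
head = nn.Conv2d(in_channels=FEATURE_CHANNELS, out_channels=NB_CLASS, kernel_size=1)

# Hypothetical output of the next-to-last layer.
features = torch.randn(4, FEATURE_CHANNELS, 32, 32)

logits = head(features)
print(logits.shape)  # torch.Size([4, 3, 32, 32])
```

The spatial size is unchanged; only the channel dimension goes from 64 to NB_CLASS.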
In the context of PyTorch, you might be looking for a final layer of