Loss and other issues

Hello,

I am sorry if this sounds very naive; I am still new to PyTorch and working my way around. I am doing a project in which I have generated multi-view synthetic images. These images have 3 channels and are 1000 by 1000 pixels. As ground truth I have a 12-channel mask, so the ground truth for a single image has dimensions 1000x1000x12. Each pixel has a vector of 12 float values, and these values indicate certain properties. So this is not a classification problem but can be thought of as regression. Also, I don't want probabilities but rather the raw values of each channel, since the values themselves carry meaning.

I am trying to run the Unet architecture to see how it goes, and to make changes and modifications slowly along the way. I am using 12 output channels at the final layer, but I am not sure if this is the right approach. I have also read about BCEWithLogitsLoss, and I think it is the most fitting loss function in this context. Am I on the right track? Also, I have two 1080s, but even with a batch size of 1 I am getting a CUDA out-of-memory error; sorry if this information is irrelevant here. My main questions are: is this architecture worth experimenting with for the scope of my work, and is my decision to use 12 channels at the final layer sound? Also, is my loss function appropriate, or is there anything better that I should begin with?

Your comments and suggestions will be really appreciated. Thanks!

Hi Farhan!

I assume that you mean you have 12 output channels per pixel in
the final layer. Otherwise I don’t understand what you are doing.

Given that you understand this to be similar to a regression,
MSELoss (mean-squared error) would be the appropriate loss
function.
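
As a minimal sketch of this suggestion: the single convolution below is only
a hypothetical stand-in for the Unet, and the 64 x 64 image size is an
assumption chosen to keep the example light (the images in the question are
1000 x 1000). The only point is that the prediction and the target have the
same shape, with 12 channels per pixel, and that MSELoss is applied to the
raw outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the Unet: any module that maps 3 input channels
# to 12 output channels at the same spatial resolution would work here.
model = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1)

# Random stand-in data, laid out as (batch, channels, height, width).
images = torch.rand(2, 3, 64, 64)
targets = torch.rand(2, 12, 64, 64)  # 12 float values per pixel

criterion = nn.MSELoss()  # averages the squared error over all elements

prediction = model(images)  # raw values, no softmax or sigmoid applied
loss = criterion(prediction, targets)
loss.backward()  # computes gradients for the model parameters
```

Note that PyTorch convolutions expect the channel axis before the spatial
axes, so the 12 values per pixel sit in dimension 1 of the batch.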

(I can’t comment on Unet nor your CudaOutOfMemory error.)

Best.

K. Frank

Hey Frank,

Thanks for commenting!

Yes, I do have 12 values for each pixel, one in each channel. However, these are continuous
values in the range [0, 1].

*Edit: Also, yes, that is what I meant: my output would be a 1000x1000x12 mask, with the 12 channels corresponding to the properties I mentioned for each pixel, where each value can range over [0, 1].
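
A mask stored as height x width x channels would need its channel axis moved to the front before it can be compared with the output of a PyTorch convolutional network. The sketch below shows that step with random stand-in data; the 100 x 100 size and all variable names are hypothetical. Squashing the raw output with a sigmoid so that it lies in the same (0, 1) range as the targets is an assumption on my part, not something suggested in the thread, and the loss works without it.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical ground-truth mask stored as (height, width, channels),
# with random values in [0, 1) standing in for the real properties.
mask_hwc = torch.rand(100, 100, 12)

# Move the channel axis to the front: (H, W, C) -> (C, H, W).
mask_chw = mask_hwc.permute(2, 0, 1).contiguous()

# Hypothetical raw network output for one image; the values are unbounded.
raw_output = torch.randn(12, 100, 100)

# Optional (an assumption, not from the thread): map the raw output
# into (0, 1) so that it matches the range of the targets.
bounded_output = torch.sigmoid(raw_output)

loss = F.mse_loss(bounded_output, mask_chw)
```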