I am sorry if this sounds very naive; I am still new to PyTorch and working my way around it. I am doing a project in which I have generated multi-view synthetic images. These images have 3 channels and are 1000 by 1000 pixels. As ground truth I have a 12-channel mask, so the ground-truth dimension for a single image is 1000x1000x12. Each pixel has a vector of 12 float values, where these values indicate certain properties. So this is not a classification problem, but can instead be thought of as per-pixel regression. Also, I don't want probabilities, but rather the raw values of each channel, since the values themselves carry meaning.
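For concreteness, here is how I lay out a single sample as PyTorch tensors (dummy random data, just to show the shapes; note that PyTorch expects channels first, so my 1000x1000x12 ground truth becomes 12x1000x1000):

```python
import torch

# Hypothetical single sample, channels-first as PyTorch expects:
image = torch.rand(3, 1000, 1000)    # 3-channel synthetic input image
target = torch.rand(12, 1000, 1000)  # 12 float property values per pixel

# A batch of size B would then be (B, 3, 1000, 1000) inputs and
# (B, 12, 1000, 1000) targets.
print(image.shape, target.shape)
```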
I am trying to run the U-Net architecture to see how it goes, and to make changes and modifications slowly along the way. I am using 12 output channels at the final layer; however, I am not sure if this is the right approach. I have also read about BCEWithLogitsLoss, and I think it is the most fitting loss function in this context. Am I on the right track?

Also, I have two 1080s, but even with a batch size of 1 I am getting a CUDA out-of-memory error (sorry if this information is irrelevant here). My main questions are: is this architecture worth experimenting with for the scope of my work, and is my decision to use 12 channels at the final layer sound? And is my loss function appropriate, or is there something better I should begin with?
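To make the setup concrete, here is a minimal sketch of what I mean by 12 output channels. The tiny Sequential model is only a stand-in for the real U-Net encoder-decoder, and the 256x256 crop size is made up to keep the example small; the key point is the final convolution producing 12 raw (unactivated) channels:

```python
import torch
import torch.nn as nn

# Stand-in for a U-Net: what matters here is the 12-channel output head
# with no final activation, so the raw per-channel values come out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 12, kernel_size=1),  # 12 regression channels
)

x = torch.rand(1, 3, 256, 256)   # small dummy crop (batch size 1)
y = torch.rand(1, 12, 256, 256)  # dummy per-pixel float targets in [0, 1]

pred = model(x)  # shape (1, 12, 256, 256)

# Note: BCEWithLogitsLoss assumes the targets lie in [0, 1]; if the raw
# property values are unbounded, a regression loss like nn.MSELoss or
# nn.L1Loss would be the usual alternative.
loss = nn.BCEWithLogitsLoss()(pred, y)
loss.backward()
print(pred.shape, loss.item())
```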
Your comments and suggestions will be really appreciated. Thanks!