Hello there. I am working on a school project where we segment medical images. We use X-ray images and need to process them at full resolution. We implemented a U-Net with transposed convolutions for upsampling (which proved superior to interpolation). The images are 1068x847, which is quite big, but the memory consumption is huge: we need around 8GiB. I accept that the layers have to save a lot of data for backprop, but this seems like too much, so we are worried about implementation mistakes. Is this normal? Is there a simple way to see how much space each layer takes up, so we can make sure this memory usage is justified?
The memory usage might well be expected: with a U-Net at full resolution, the intermediate forward activations usually dominate, not the parameters. You could take a look at e.g. this post for how to estimate the size of the model parameters as well as the intermediate forward activations stored for backprop.
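As a rough, framework-free sketch of that estimate: assuming float32 activations (4 bytes per element), a standard two-convs-per-level encoder with 64 base channels, and that every conv output is kept for backprop, you can sum the activation sizes per level directly from the tensor shapes (the channel/depth numbers here are assumptions, not taken from your code):

```python
# Hypothetical sketch: estimate forward-activation memory per encoder level
# of a U-Net-like model purely from tensor shapes. Assumes float32 (4 bytes
# per element), two conv outputs kept per level, 2x2 max-pooling between
# levels, and channels doubling at each level. Batch size 1.

def activation_bytes(channels, height, width, bytes_per_elem=4):
    """Bytes needed to store one activation tensor of shape (C, H, W)."""
    return channels * height * width * bytes_per_elem

def unet_encoder_estimate(in_h, in_w, base_channels=64, depth=4):
    """Return (total_bytes, per_level_bytes) for a simple encoder."""
    h, w, c = in_h, in_w, base_channels
    per_level = []
    total = 0
    for _ in range(depth):
        # two conv activations are stored at this resolution
        level_bytes = 2 * activation_bytes(c, h, w)
        per_level.append(level_bytes)
        total += level_bytes
        h, w, c = h // 2, w // 2, c * 2  # pool halves spatial dims, channels double

    return total, per_level

total, levels = unet_encoder_estimate(1068, 847)
for i, b in enumerate(levels):
    print(f"level {i}: {b / 2**20:.1f} MiB")
print(f"encoder total: {total / 2**30:.2f} GiB")
```

Note this covers only the encoder activations at batch size 1; the decoder, skip connections, gradients, and optimizer state (e.g. Adam keeps two extra buffers per parameter) add on top, and everything scales linearly with batch size, so several GiB for a full-resolution U-Net is plausible rather than a bug.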