Hello,
I’m using Segmentation Models PyTorch to create and train an FCN for a semantic segmentation task. For training, I chop my training data into patches of equal size (512×512). For inference, I then apply the model to larger inputs, which is possible with FCN architectures thanks to PyTorch’s dynamic computational graph. I just have to make sure the input dimensions are multiples of the FCN’s stride, in my case 32 pixels, which is easily achieved by padding the inputs.
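For reference, the padding step I mean looks roughly like this (the model call is just a placeholder, and the reflect padding assumes the input is larger than the required padding):

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, multiple: int = 32):
    """Pad an NCHW tensor on the right/bottom so H and W become multiples of `multiple`."""
    _, _, h, w = x.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    x_padded = F.pad(x, (0, pad_w, 0, pad_h), mode="reflect")
    return x_padded, (h, w)

# usage sketch:
# x_padded, (h, w) = pad_to_multiple(x)
# with torch.no_grad():
#     y = model(x_padded)   # model = some smp network, placeholder here
# y = y[..., :h, :w]        # crop the output back to the original size
```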
However, I now want to perform inference on hardware with rather limited GPU memory, so I can’t feed the large inputs into the network directly. Instead, I have to chop them into patches and stitch the outputs back together. For this to produce the same results as feeding the whole image to the network, I have to restrict the output to those pixels where the involved convolutions never see any of the network-internal padding, i.e. only the “valid” convolutions. In other words, only those output pixels whose receptive fields don’t extend beyond the given image patch.
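To illustrate what I have in mind, here is a rough sketch of the tiling and stitching. It is not the strict “valid convolutions only” restriction I’m asking about; instead it assumes I can bound the receptive field by a fixed `margin` (a placeholder value here, which would have to be chosen per architecture and be a multiple of the output stride), and it assumes the image is larger than one tile:

```python
import torch
import torch.nn.functional as F

def tiled_inference(model, image, tile=512, margin=64):
    """Run `model` on overlapping tiles of an NCHW image and keep only the
    central region of each tile's output, where the receptive field (assumed
    to fit within `margin` pixels) stays inside the tile."""
    _, _, h, w = image.shape
    step = tile - 2 * margin
    # pad so the tiling covers the image and border tiles get real context
    pad_h = (step - h % step) % step
    pad_w = (step - w % step) % step
    padded = F.pad(image, (margin, margin + pad_w, margin, margin + pad_h), mode="reflect")
    out = None
    for top in range(0, h, step):
        for left in range(0, w, step):
            patch = padded[:, :, top:top + tile, left:left + tile]
            with torch.no_grad():
                pred = model(patch)
            if out is None:
                out = image.new_zeros(image.shape[0], pred.shape[1], h, w)
            # keep only the center, whose receptive field saw no tile border
            valid = pred[:, :, margin:margin + step, margin:margin + step]
            th, tw = min(step, h - top), min(step, w - left)
            out[:, :, top:top + th, left:left + tw] = valid[:, :, :th, :tw]
    return out
```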
Is it possible to restrict the output of an FCN in PyTorch to those pixels where all convolutions are valid, in the sense that they don’t depend on padding?