I read a paper on solving a problem I have using a conv net. The authors say to split the training images into 40x40 pixel patches, so I did. Then I took my test image, also split it into 40x40 patches, ran them through the net, and stitched the results back together. This works, but there are visible seams between the patches. Since my net is convolution-only, I thought I could simply run the whole 400x400 image through it. But when I do that, the output image is completely random. Is that expected? Can't I run an image of any size through a conv net? Or can I only run images of the same size the net was trained on?
If you use a much larger input your kernels might miss the features they were trained on.
So I think it’s reasonable to assume the test input resolution should stay as close to the training resolution as possible.
Any hints on the stitching problem?
What kind of output do you have, i.e. is it a segmentation map or a “natural” image?
Based on the output we could think about normalizing the patches somehow.
A “natural” image. Essentially, I’m teaching the net to improve the quality of JPEG images by removing blockiness. It works quite nicely, except that the seams are somewhat visible. The paper is here: https://arxiv.org/pdf/1708.00838v1.pdf The authors don’t mention how to process a whole image.
Based on this MATLAB code they use a
You are right though, they don’t mention how they got the results from Figure 5.
Are your result images completely rubbish?
Actually, my results are quite nice now that I have let training run longer. But in areas with little color variation one can still spot the seams.
Note that the authors say they use 180x180 images, but they actually split them into 40x40 pixel patches. The patches overlap, with a new patch starting every 20 pixels, and the network is trained on those patches.
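For reference, here is a minimal sketch of that patch extraction in NumPy. The function name `extract_patches` and the random stand-in image are my own assumptions; the paper's actual preprocessing may differ in details such as augmentation.

```python
import numpy as np

def extract_patches(image, patch=40, stride=20):
    """Return every patch x patch crop of `image`, taken every `stride` pixels."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(image[top:top + patch, left:left + patch])
    return np.stack(patches)

# Hypothetical stand-in for one 180x180 RGB training image
image = np.random.rand(180, 180, 3).astype(np.float32)
patches = extract_patches(image)
print(patches.shape)  # (64, 40, 40, 3): 8 start positions per axis
```

With a 180x180 image, 40x40 patches, and a 20 pixel stride, the start positions along each axis are 0, 20, ..., 140, which gives 8 x 8 = 64 patches per image.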
I think the authors may have obtained their final results not just by splitting the input image into separate 40x40 patches, but by also including the overlapping ones and running all of them through the network. They could then blend the outputs of the overlapping and non-overlapping patches with some form of interpolation. I think that could work, but it would add some performance cost to producing the final output image. The authors don’t mention anything like that, so it’s just my guess.
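To make that guess concrete, here is a rough sketch of overlap blending by plain averaging. This is not the authors' method; `blend_overlapping` and the `model` callable are hypothetical names, and `model` is assumed to map a patch to an output of the same shape.

```python
import numpy as np

def blend_overlapping(image, model, patch=40, stride=20):
    """Run `model` on overlapping patches and average the overlapping outputs."""
    h, w = image.shape[:2]
    out = np.zeros_like(image, dtype=np.float64)
    # Per-pixel count of how many patches covered each location
    weight = np.zeros((h, w) + (1,) * (image.ndim - 2), dtype=np.float64)
    # Regular grid of patch origins, plus one flush with the far border
    tops = sorted(set(range(0, h - patch + 1, stride)) | {h - patch})
    lefts = sorted(set(range(0, w - patch + 1, stride)) | {w - patch})
    for top in tops:
        for left in lefts:
            tile = image[top:top + patch, left:left + patch]
            out[top:top + patch, left:left + patch] += model(tile)
            weight[top:top + patch, left:left + patch] += 1.0
    return out / weight

# Sanity check with an identity "model": blending must reproduce the input
img = np.random.rand(100, 90, 3)
restored = blend_overlapping(img, lambda t: t)
```

Uniform averaging only softens seams. Weighting each patch with a window that falls off toward its edges would likely hide them better, at the cost of running the network on several times as many pixels.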
I wrote to the authors regarding this very problem.
Turns out they just push the entire image through the network. I tested it again and it works fine. When I tested it previously, I probably forgot to convert the value range properly (from 0-255 to 0.0-1.0, for instance) or something similar. So during evaluation the input image can indeed be much larger than 40x40 pixels. Well, not arbitrarily large: when I try passing a full HD image through, my GPU (8 GB) runs out of memory. So in the end I will still have to split the input into pieces.
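One way to split a large input without getting seams back is to give each tile a margin of real neighbouring pixels and keep only the centre of its output. Below is a sketch of that idea, including the 0-255 to 0.0-1.0 conversion. Everything here is an assumption for illustration: `tiled_inference`, the tile and halo sizes, and `box_filter`, which is a 3x3 mean filter with zero padding standing in for the trained network.

```python
import numpy as np

def box_filter(x):
    """Stand-in for the trained net: 3x3 mean filter with zero padding."""
    h, w = x.shape
    p = np.pad(x, 1)
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def tiled_inference(image, model, tile=128, halo=8):
    """Process `image` tile by tile, with `halo` pixels of extra context per side."""
    h, w = image.shape
    out = np.empty((h, w), dtype=np.float32)
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            bottom, right = min(top + tile, h), min(left + tile, w)
            # Grow the tile by `halo` pixels, clipped at the image border
            t0, l0 = max(top - halo, 0), max(left - halo, 0)
            b0, r0 = min(bottom + halo, h), min(right + halo, w)
            result = model(image[t0:b0, l0:r0])
            # Keep the centre only; the discarded margin absorbs padding effects
            out[top:bottom, left:right] = result[top - t0:bottom - t0,
                                                 left - l0:right - l0]
    return out

image_u8 = np.random.randint(0, 256, size=(300, 500), dtype=np.uint8)
x = image_u8.astype(np.float32) / 255.0           # 0-255 -> 0.0-1.0
tiled = tiled_inference(x, box_filter)
full = box_filter(x)                               # whole image in one pass
result_u8 = np.clip(np.rint(tiled * 255.0), 0, 255).astype(np.uint8)
```

For a purely convolutional network, the tiled result should match the whole-image result as long as the halo is at least as large as the network's receptive field radius, which depends on its depth and kernel sizes. With this stand-in filter the radius is 1, so a halo of 8 is more than enough.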