The perceptual loss has become very much prevalent with an example shown in this code. However mostly I see people using VGG16 and not VGG19. This could be because generally people use low to medium resolution images such as 400x600 and so the depth of VGG16 may be sufficient. However the output of my denoising network is a high definition image which is 2048 x 4096 and has perfectly registered corresponding ground truth image. Of course at the time of training I just use a patch of 512 x 512 and perceptual loss will be computed over this patch only.
So I am not clear on the following issue regarding which will be the best practice:-
Use VGG16 or VGG19. Since VGG19 has more layers I naively believe that it should be used although most implementations on GitHub still use VGG16.
Even in VGG16 people do not use all the layers. Mostly middle layers. But since I want my restored image to be an identical replica of the ground truth, in the best possible case can I naively use L1norm difference on all max pool layers of VGG? Or should I just use deep layers and skip the shallow layers or just use the middle layers?
Also many implementations such as this one resize the image to 244 x 244. But since mine is a much larger image; bilinearly interpolating 512 x 512 images to this small resolution may be problematic. So what is better? Passing the full resolution image through VGG or the downsized version for computing perceptual loss.
There is not much benefit in the last layers. As these layers become more and more task specific, while we want something that can guide our model to generalize. Also, most of the basic things in the image is easily incorporated from low-to-middle layers of the model. But if you want, you can use last layers, but it is more of a diminishing returns thing.
Also, this also confuses me why people use VGG16, whereas we have much better classification models. One reason that I once found was because it just works and so nobody bothers to change it, as you would have to choose the layers from where you want features if you use a new architecture.