Position-invariant perceptual loss

Is there any way to get a position-invariant perceptual loss?
That is, these image pairs should receive a low loss:
cf746e57d06a_bE9Y400tb0 compared to 087b12a73b83_B44L536Iso

cf746e57d06a_bE9Y400tb0 compared to eb1c9f04e639_1MMupMjNHt

Because they contain the same objects.

But this image pair should receive a high loss:

cf746e57d06a_bE9Y400tb0 compared to 503fd7c3498a_OThVRmItUJ

Since the objects are not the same.

Experiments on the images above show that none of the layers in VGG19 has this property:

Code_SqgKSTLCtv
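To illustrate what I mean, here is a minimal NumPy sketch (not the code I ran above). It uses a random array as a stand-in for one layer's activations and a circular shift as a stand-in for "same objects, different position"; both are assumptions for illustration, and all function names are hypothetical. It compares the usual element-wise feature loss with two losses that discard spatial layout: per-channel spatial averages and Gram matrices (as used in style losses).

```python
import numpy as np

def elementwise_loss(feat_a, feat_b):
    """Usual perceptual loss on one layer: MSE between aligned (C, H, W) feature maps."""
    return float(np.mean((feat_a - feat_b) ** 2))

def pooled_loss(feat_a, feat_b):
    """MSE between per-channel spatial means; positions are averaged away."""
    mean_a = feat_a.mean(axis=(1, 2))
    mean_b = feat_b.mean(axis=(1, 2))
    return float(np.mean((mean_a - mean_b) ** 2))

def gram(feat):
    """Channel-by-channel correlation matrix, summed over all spatial positions."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def gram_loss(feat_a, feat_b):
    """MSE between Gram matrices; also independent of where features occur."""
    return float(np.mean((gram(feat_a) - gram(feat_b)) ** 2))

# Assumption: random values stand in for real activations of some layer.
rng = np.random.default_rng(0)
feat = rng.random((8, 16, 16))

# Assumption: a circular shift models the same content at another position.
shifted = np.roll(feat, shift=(3, 5), axis=(1, 2))

print("element-wise:", elementwise_loss(feat, shifted))  # clearly above zero
print("pooled:      ", pooled_loss(feat, shifted))       # zero up to rounding
print("gram:        ", gram_loss(feat, shifted))         # zero up to rounding
```

This only shows the invariance half of the requirement. Whether pooled or Gram statistics still give a high loss for different objects depends on the features themselves, and real translations of an image do not map to exact circular shifts of the activations.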
