How did you pretrain the Autoencoder?
Based on your explanation, it seems you've used the real images for the pretraining.
Would this approach still work if you now feed the segmentation output of your VNET to this AE?
The image statistics, value ranges, etc. should be quite different, so I'm not sure if it'll work out of the box or if you would need to train both models end-to-end.
The AutoEncoder is pretrained on the segmentation maps used as ACNN inputs, not on the raw images.
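Something like this is roughly what that pretraining step could look like. The `SegAE` architecture, the 32x32 mask size, and all hyperparameters are just placeholders I made up for the example, not the actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical convolutional autoencoder for one-hot segmentation maps.
# 2D masks of size 32x32 are assumed just to keep the example short.
class SegAE(nn.Module):
    def __init__(self, num_classes=2, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(num_classes, 16, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),           # 16 -> 8
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(16, num_classes, 4, stride=2, padding=1),    # 16 -> 32
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


ae = SegAE(num_classes=2)
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for real ground-truth segmentation maps:
labels = torch.randint(0, 2, (4, 32, 32))                        # [N, H, W]
masks_onehot = F.one_hot(labels, 2).permute(0, 3, 1, 2).float()  # [N, C, H, W]

# Pretraining step: the AE just reconstructs the segmentation map itself.
optimizer.zero_grad()
recon_logits, _ = ae(masks_onehot)
loss = criterion(recon_logits, labels)
loss.backward()
optimizer.step()
```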
The idea of the paper is to use the predicted segmentation from the VNET (or whatever segmentation network you are using). The predicted segmentation and the ground-truth segmentation are fed to two AE instances, and the latent representations of both are compared via a Euclidean distance.
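In code, this shape term could look roughly like the sketch below. I'm reusing the hypothetical `SegAE` from above and passing both segmentations through the same frozen encoder (i.e. the two "instances" share the pretrained weights); `vnet`, `optimizer_vnet`, `train_loader`, and `lambda_shape` are placeholders for your own model, optimizer, data, and loss weighting:

```python
import torch
import torch.nn.functional as F

# Freeze the pretrained AE: it only defines the latent "shape" space.
ae.eval()
for p in ae.parameters():
    p.requires_grad_(False)

lambda_shape = 0.1  # assumed weighting between the two loss terms

for images, masks_onehot, labels in train_loader:
    optimizer_vnet.zero_grad()

    logits = vnet(images)                       # predicted segmentation (logits)
    seg_loss = F.cross_entropy(logits, labels)  # the usual segmentation loss

    # Encode the prediction (as soft probabilities) and the ground truth,
    # then penalize the Euclidean distance between the latent codes.
    z_pred = ae.encoder(torch.softmax(logits, dim=1))
    z_gt = ae.encoder(masks_onehot)
    shape_loss = torch.linalg.vector_norm(z_pred - z_gt, dim=1).mean()

    loss = seg_loss + lambda_shape * shape_loss
    loss.backward()
    optimizer_vnet.step()
```

Since the AE is frozen, the gradient of the shape term only flows back through `z_pred` into the VNET, pushing its predictions toward segmentations that look plausible in the learned latent space.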