Hi everyone,
A few questions here.
At what epoch do you typically see masked autoencoders start producing reconstructions that resemble the input during pretraining? I've experimented with pretraining on the BTCV dataset and noticed that after ~800 epochs the reconstructions look really no different from the first 20 or so.
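For reference, this is roughly how I'm inspecting the reconstructions (a minimal sketch assuming the interface of the official MAE code, where the forward pass returns `loss, pred, mask` and `unpatchify` maps patch predictions back to image space; the function name is my own):

```python
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def show_reconstruction(model, imgs, mask_ratio=0.75):
    # Assumes the official MAE interface: forward returns (loss, pred, mask),
    # with pred of shape (N, L, p*p*C) and mask of shape (N, L), 1 = masked.
    model.eval()
    loss, pred, mask = model(imgs, mask_ratio=mask_ratio)
    recon = model.unpatchify(pred)                        # (N, C, H, W)
    # Expand the patch mask to pixel space so we can paste the visible
    # patches back in and only show predictions where input was masked.
    mask = mask.unsqueeze(-1).repeat(1, 1, pred.shape[-1])
    mask = model.unpatchify(mask)                         # 1 = masked, 0 = visible
    pasted = imgs * (1 - mask) + recon * mask
    for title, im in [("input", imgs), ("reconstruction", pasted)]:
        plt.figure()
        plt.title(title)
        plt.imshow(im[0, 0].cpu(), cmap="gray")           # show one 2D slice/channel
        plt.axis("off")
    plt.show()
```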
My loss never really drops below 0.9 either. I'm using a low learning rate and weight decay, following the hyperparameters of the original paper. Is this common?
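For context, this is approximately the optimizer setup I'm following (a sketch of the MAE paper's recipe of AdamW with betas=(0.9, 0.95), weight decay 0.05, and a base lr of 1.5e-4 scaled by batch size / 256; the `batch_size` value here is illustrative, mine is much smaller than the paper's):

```python
import torch

def build_optimizer(model, batch_size):
    # MAE-paper-style recipe (He et al., 2022): linear lr scaling with
    # batch size, AdamW with weight decay 0.05. Note that with a small
    # batch size the scaled lr comes out very small.
    base_lr = 1.5e-4
    lr = base_lr * batch_size / 256
    return torch.optim.AdamW(model.parameters(), lr=lr,
                             betas=(0.9, 0.95), weight_decay=0.05)
```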
Also, is it best to move on to the downstream task (in my case, segmentation) only once the reconstructions look decent?