For deep clustering models in PyTorch, suppose I have a joint objective in which a pretrained autoencoder (used as a feature extractor) and a clustering model are optimized together with a single loss (autoencoder loss + clustering loss). During this joint optimization, should the autoencoder be in evaluation mode (`model.eval()`) or training mode (`model.train()`)?
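For concreteness, here is a minimal sketch of the kind of setup I mean. Everything in it is a hypothetical placeholder: the `AutoEncoder` and `ClusteringHead` classes, the layer sizes, the use of MSE as the autoencoder loss, and a DEC-style KL clustering loss (soft assignments via a Student's t kernel, sharpened target distribution). The `ae_train_mode` flag in the hypothetical `joint_step` is exactly the choice I am asking about.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AutoEncoder(nn.Module):
    """Hypothetical stand-in for the pretrained autoencoder (has BatchNorm and Dropout)."""

    def __init__(self, in_dim: int = 784, latent_dim: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)


class ClusteringHead(nn.Module):
    """Hypothetical DEC-style clustering layer with learnable centroids."""

    def __init__(self, n_clusters: int = 5, latent_dim: int = 10, alpha: float = 1.0):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_clusters, latent_dim))
        self.alpha = alpha

    def forward(self, z):
        # Student's t kernel between each embedding and each centroid, normalized per sample
        dist_sq = torch.cdist(z, self.centroids).pow(2)
        q = (1.0 + dist_sq / self.alpha).pow(-(self.alpha + 1) / 2)
        return q / q.sum(dim=1, keepdim=True)


def target_distribution(q):
    # Sharpened target: square q, divide by cluster frequency, renormalize per sample
    weight = q.pow(2) / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)


def joint_step(ae, head, optimizer, x, ae_train_mode: bool):
    ae.train(ae_train_mode)  # the mode in question: train() if True, eval() if False
    head.train()
    z, x_hat = ae(x)
    ae_loss = F.mse_loss(x_hat, x)            # autoencoder (reconstruction) loss
    q = head(z)
    p = target_distribution(q).detach()       # target treated as a constant
    cluster_loss = F.kl_div(q.log(), p, reduction="batchmean")
    loss = ae_loss + cluster_loss             # joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
ae, head = AutoEncoder(), ClusteringHead()
optimizer = torch.optim.Adam(list(ae.parameters()) + list(head.parameters()), lr=1e-3)
x = torch.randn(32, 784)
loss_train_mode = joint_step(ae, head, optimizer, x, ae_train_mode=True)
loss_eval_mode = joint_step(ae, head, optimizer, x, ae_train_mode=False)
```

In both branches the encoder weights receive gradients and are updated; `train()`/`eval()` only changes how layers such as `nn.Dropout` and `nn.BatchNorm1d` behave (dropout active or not, batch statistics vs. running statistics).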