Problem with disjoint training of an autoencoder and an MLP

I have implemented a network consisting of an autoencoder followed by a multilayer perceptron (MLP). Before the autoencoder I use embedding layers; the embeddings' outputs are the inputs to the autoencoder, and I also use them as the targets when computing the autoencoder's MSE loss. The embeddings are trained together with the autoencoder. My goal is to train the autoencoder first and then use the encoder's output as the input for training the MLP. The autoencoder seems to train well, because its loss decreases. But when I train the autoencoder, freeze its layers, and then train only the MLP on top of the pre-trained encoder, the classification accuracy does not improve.
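In case it helps, here is a minimal sketch of my two-stage setup (the module sizes, names, and dummy data below are placeholders, not my real ones):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_tokens, emb_dim, latent_dim, num_classes = 100, 16, 8, 2

embedding = nn.Embedding(num_tokens, emb_dim)
encoder = nn.Sequential(nn.Linear(emb_dim, latent_dim), nn.ReLU())
decoder = nn.Linear(latent_dim, emb_dim)
mlp = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, num_classes))

# Stage 1: train embedding + autoencoder, MSE against the embedding output
ae_params = (list(embedding.parameters()) + list(encoder.parameters())
             + list(decoder.parameters()))
opt_ae = torch.optim.Adam(ae_params, lr=1e-3)
x = torch.randint(0, num_tokens, (4,))   # dummy batch of token ids
emb = embedding(x)
recon = decoder(encoder(emb))
ae_loss = nn.functional.mse_loss(recon, emb)
opt_ae.zero_grad()
ae_loss.backward()
opt_ae.step()

# Stage 2: freeze embedding + autoencoder, train only the MLP on encoder outputs
for p in ae_params:
    p.requires_grad_(False)
opt_mlp = torch.optim.Adam(mlp.parameters(), lr=1e-3)
y = torch.randint(0, num_classes, (4,))  # dummy labels
with torch.no_grad():
    z = encoder(embedding(x))            # frozen features fed to the MLP
logits = mlp(z)
cls_loss = nn.functional.cross_entropy(logits, y)
opt_mlp.zero_grad()
cls_loss.backward()
opt_mlp.step()
```

In the real code each stage of course runs for many epochs over a data loader; this is just the structure of what gets trained and what gets frozen.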

To make sure my implementation is on the right track, and that the features learned by the autoencoder are not what breaks the classification part of the model, I tried joint training with the following total losses and weights:

Total Loss = 0.02 * classification loss + autoencoder loss
Total Loss = classification loss + 0.2 * autoencoder loss

In both cases everything works fine.

As I said, my problem is that if I train the autoencoder first and then train the classifier, the classifier cannot learn. I cannot figure out the issue. Should I expect it to be helpful to use the embedding output as the input data for training the autoencoder?
Because there is some variation in the embeddings, I initially expected no loss reduction when training the autoencoder. But it does learn and the loss decreases, which suggests that the autoencoder and the embeddings are learning a representation of the data. Is that a correct conclusion?

Can anyone help me with this issue?

I hope my question is clear and you can help me. If you need more details, please let me know.