During training I predict two covariance matrices, which I want to become similar, so I want to introduce a loss term that enforces that. My naive approach was to compute a loss based on the absolute difference. The problem is that this also pushes both matrices toward zero, which is not what I want. Do you guys have ideas here?

Couldn't you divide the loss by the norms of the matrices? That would make the loss large when the norms are small, which counteracts the matrices shrinking.
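A minimal sketch of that suggestion, using NumPy for illustration (the function name and the choice of Frobenius norm are mine; in an actual training loop you would write the same thing with your autodiff framework's tensor ops):

```python
import numpy as np

def normalized_cov_loss(A, B, eps=1e-8):
    """Frobenius distance between A and B, divided by their norms.

    Dividing by the norms makes the loss (approximately) scale-invariant,
    so the optimizer cannot reduce it just by shrinking both matrices
    toward zero. eps guards against division by zero.
    """
    diff = np.linalg.norm(A - B, ord="fro")
    scale = np.linalg.norm(A, ord="fro") + np.linalg.norm(B, ord="fro")
    return diff / (scale + eps)

A = np.eye(2)
B = 2.0 * np.eye(2)
loss = normalized_cov_loss(A, B)          # ~0.333: ||A-B|| / (||A|| + ||B||)
same = normalized_cov_loss(3 * A, 3 * B)  # nearly unchanged under joint scaling
```

Note the trade-off: a purely scale-invariant loss only matches the matrices up to a common scale factor, so if the absolute magnitudes also matter you would combine this with another term.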