Use one Module to provide loss function for another Module

I am using one trained Module (E) to provide the loss function for the output of another Module (M).
Of course, I would like the errors to be back-propagated through the specific features of M’s output and into M.

However, I don’t want E to update its parameters at all.
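Roughly, the idea is something like the following (toy stand-ins, not my real code; in my actual setup E is stored on M as self.E):

    import torch
    import torch.nn as nn

    E = nn.Sequential(nn.Linear(8, 1))       # stand-in for the trained scoring module E
    M_net = nn.Linear(8, 8)                  # stand-in for M's trainable layers
    opt = torch.optim.SGD(M_net.parameters(), lr=0.1)   # optimizer holds only M's parameters

    x = torch.randn(4, 8)
    loss = E(M_net(x)).mean()   # E scores M's output; that score is the loss
    loss.backward()             # gradients flow through E's graph back into M
    opt.step()                  # only M is updated, since only its parameters are in the optimizer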

In the init of M, I have tried:

        for param in self.E.parameters():
            param.requires_grad = False

and also

            param.detach()

but after training M a bit, I see that E has changed. I test this by passing the same dummy input to E before and after training M, and seeing that its output differs.

How can I backprop errors from E to M, but keep E fixed?

By the way, I have searched the forums and found some related posts, but not the answer to my question.

So here’s what I discovered.

        for param in self.E.parameters():
            param.requires_grad = False

works, except that even though I call E.eval() before passing it into M, E somehow ends up back in training mode during M's training, and thus dropout is enabled. If I call E.eval() every time before I use E for evaluation, it is fine. I don't really understand why this is, if someone can explain.
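In case it helps, here is a rough sketch of the pattern I ended up with (toy layer sizes, not my real architecture). My best guess at the cause is that M.train() puts every registered submodule, including self.E, back into training mode, so the sketch also overrides train() on M to force E back into eval mode:

    import torch.nn as nn

    class M(nn.Module):
        def __init__(self, E):
            super().__init__()
            self.body = nn.Linear(8, 8)        # M's own trainable layers (placeholder)
            self.E = E                         # trained scoring module, kept fixed
            for param in self.E.parameters():
                param.requires_grad = False    # E never receives weight updates
            self.E.eval()                      # disable dropout etc. in E

        def train(self, mode=True):
            # nn.Module.train() recursively sets training mode on all submodules,
            # which is what keeps flipping E back into train(); undo that for E.
            super().train(mode)
            self.E.eval()
            return self

        def forward(self, x):
            out = self.body(x)
            return self.E(out)                 # gradients still flow through E into self.body

With this, calling M.train() in the training loop no longer re-enables dropout inside E, and E's weights stay fixed because they never receive gradients.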

This sounds like transfer learning with extra steps. Why not simply transfer the trained portion of E to M and freeze it?

Can you explain what you mean?
The simplest thing would be not to copy the weights and the network structure, but instead just to pass E into M.