Hi,

My loss function requires me to use the weights of the last layer of the model directly. Here is my use case:

- h = output from an intermediate layer
- S = weights of the last layer of the model
- ypred = output of the model, which is discarded during training and used only for inference
- A multivariate normal distribution is constructed by computing mean and covariance for the batch from h and S
- The loss is then the negative log-likelihood of the actual target under the distribution constructed in the previous step
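
A minimal sketch of the setup above, assuming PyTorch. The names `Net` and `batch_gaussian_nll`, the layer sizes, the random data, and the specific way the mean and covariance are built from h and S are illustrative assumptions, not the actual model.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal

torch.manual_seed(0)


class Net(nn.Module):
    """Hypothetical model: a small body followed by a linear last layer."""

    def __init__(self, in_dim=8, hidden_dim=4, out_dim=3):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.last = nn.Linear(hidden_dim, out_dim, bias=False)

    def forward(self, x):
        h = self.body(x)      # h: output of the intermediate layer
        ypred = self.last(h)  # ypred: model output, ignored by the loss below
        return h, ypred


def batch_gaussian_nll(h, S, y, jitter=1e-3):
    """Negative log-likelihood of y under a batch-level Gaussian.

    Assumption: the distribution is built by projecting h with S and
    taking the batch mean and batch covariance of the projection.
    """
    z = h @ S.T                                   # (batch, out_dim)
    mean = z.mean(dim=0)                          # (out_dim,)
    centered = z - mean
    cov = centered.T @ centered / (z.shape[0] - 1)
    cov = cov + jitter * torch.eye(z.shape[1])    # keep it positive definite
    dist = MultivariateNormal(mean, covariance_matrix=cov)
    return -dist.log_prob(y).mean()


model = Net()
x = torch.randn(16, 8)  # random stand-in inputs
y = torch.randn(16, 3)  # random stand-in targets

h, _ = model(x)                   # ypred is discarded during training
S = model.last.weight             # the parameter itself, not a detached copy
loss = batch_gaussian_nll(h, S, y)
loss.backward()

grad_S = model.last.weight.grad
print("last layer has a gradient:", grad_S is not None)
```

Inspecting `.grad` on the last layer's weight after `backward()` is one way to check whether that layer receives a gradient from this loss. This only works if S is the parameter tensor itself, not a detached or NumPy copy of it.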

If I train the model this way, will it learn? In particular, will the weights of the last layer be updated accordingly?