[Solved] Reverse gradients in backward pass

Thanks for sharing your code, Daniel. Very helpful. Two questions:

  1. In the original paper, Bousmalis et al. use the squared Frobenius norm to discourage the shared and private subspaces from encoding redundant information. Did you include this term (i.e., the difference loss) in your loss function? If so, would you mind sharing the code for your loss function? (I've put a rough sketch of what I mean right after these questions.)

  2. Is there a way to update the feature extractor only once, using the losses collected from the decoder and the classifier, i.e., with a single custom loss that combines all the loss terms, in particular the difference and similarity losses? (A sketch of this idea also follows below.)
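
To make question 1 concrete, here is a minimal sketch of how I picture the difference loss: the squared Frobenius norm of the correlation between shared and private features. The function name, the (batch, dim) input shapes, and the row normalization are my own assumptions, not a claim about your code:

```python
import torch
import torch.nn.functional as F

def difference_loss(shared_feat, private_feat):
    # Squared Frobenius norm of the shared/private correlation, || H_s^T H_p ||_F^2,
    # as described by Bousmalis et al. Inputs are assumed to be (batch, dim).
    shared = F.normalize(shared_feat, p=2, dim=1)   # L2-normalize each row (my assumption)
    private = F.normalize(private_feat, p=2, dim=1)
    correlation = shared.t() @ private              # (dim, dim) correlation matrix
    return (correlation ** 2).sum()                 # squared Frobenius norm
```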
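
And for question 2, a rough sketch of the kind of single-update step I have in mind, assuming one optimizer over all modules and that the gradient reversal happens inside the domain classifier. Every module name and loss weight (alpha, beta, gamma) here is a placeholder, not your implementation:

```python
import torch.nn.functional as F

def training_step(inputs, labels, domain_labels,
                  shared_encoder, private_encoder, decoder, classifier,
                  domain_classifier, optimizer,
                  alpha=0.1, beta=0.05, gamma=0.25):
    """Combine all loss terms into one scalar and call backward() once,
    so the shared feature extractor is updated only once per batch."""
    shared_feat = shared_encoder(inputs)
    private_feat = private_encoder(inputs)

    logits = classifier(shared_feat)
    decoded = decoder(shared_feat + private_feat)
    domain_logits = domain_classifier(shared_feat)  # assumes a gradient-reversal layer inside

    task_loss = F.cross_entropy(logits, labels)
    recon_loss = F.mse_loss(decoded, inputs)
    diff_loss = difference_loss(shared_feat, private_feat)    # sketch above
    sim_loss = F.cross_entropy(domain_logits, domain_labels)  # similarity (domain) term

    total_loss = task_loss + alpha * recon_loss + beta * diff_loss + gamma * sim_loss

    optimizer.zero_grad()
    total_loss.backward()  # gradients from every term reach the shared encoder together
    optimizer.step()
    return total_loss.item()
```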

Thank you!