You should only apply the projection on weight.data, so that the operation isn’t taken into account in the computation graph (and it’s gradient isn’t retained). See here.
1 Like
You should only apply the projection on weight.data, so that the operation isn’t taken into account in the computation graph (and it’s gradient isn’t retained). See here.