You should only apply the projection on weight.data, so that the operation isn’t taken into account in the computation graph (and it’s gradient isn’t retained). See here.
You should only apply the projection on weight.data, so that the operation isn’t taken into account in the computation graph (and it’s gradient isn’t retained). See here.