Proper way to do projected gradient descent with optimizer class

You should apply the projection only to `weight.data`, so that the operation isn't recorded in the computation graph (and no gradient is tracked for it). See here.
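A minimal sketch of the pattern, assuming a non-negativity constraint on the weights of a small linear model. The model, the data, and the helper `project_nonnegative_` are hypothetical and only illustrate the idea. The optimizer takes its usual step, and the projection is then applied in place through `.data`, so autograd never sees it:

```python
import torch

torch.manual_seed(0)

# Hypothetical setup: a linear model whose weights must stay non-negative.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(32, 4)
y = torch.randn(32, 1)

def project_nonnegative_(weight):
    # Hypothetical projection onto {w : w >= 0}.
    # Operating on .data keeps the clamp out of the autograd graph.
    weight.data.clamp_(min=0.0)

for _ in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()                     # ordinary gradient step
    project_nonnegative_(model.weight)   # project back onto the feasible set
```

Any other projection (for example, rescaling onto an L2 ball) can be dropped into the helper the same way. Wrapping the in-place update in `with torch.no_grad():` instead of going through `.data` is an equivalent way to keep the projection out of the graph.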
