Proper way to do projected gradient descent with optimizer class

GeoffNN · April 22, 2019, 7:52pm

You should apply the projection only to weight.data, so that the operation isn’t recorded in the computation graph (and its gradient isn’t retained). See here.
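To make that concrete, here is a minimal sketch of one way it could look. The toy objective, the unit-ball constraint, and the helper name `project_onto_ball_` are all made up for illustration and are not from the thread; the only part taken from the advice above is that the projection runs on `.data` after `optimizer.step()`.

```python
import torch

# Hypothetical toy problem: minimize ||w - target||^2 subject to ||w||_2 <= 1.
target = torch.tensor([3.0, 4.0])
w = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)


def project_onto_ball_(param, radius=1.0):
    """Project param in place onto the L2 ball of the given radius.

    Illustrative helper, not a PyTorch API. It works on param.data,
    so autograd does not record the operation.
    """
    norm = param.data.norm()
    if norm > radius:
        param.data.mul_(radius / norm)


for _ in range(200):
    optimizer.zero_grad()
    loss = ((w - target) ** 2).sum()
    loss.backward()
    optimizer.step()        # ordinary gradient step
    project_onto_ball_(w)   # projection step, outside the graph

final_norm = w.norm().item()
```

Wrapping the projection in `with torch.no_grad():` and calling `param.mul_(...)` directly is a commonly used alternative to touching `.data`; either way the point is that the projection stays out of the graph.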