I am training a net with hinge loss (or something similar). When the loss is zero, the gradient should automatically be zero as well. However, I also want L2 regularization, so to save computing time I use the following method.
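For example, here is a quick standalone check (toy tensors, just for illustration) showing that when a hinge-style loss is already zero, `backward()` produces zero gradients:

```python
import torch

# Hinge loss on a toy example: max(0, 1 - y * w.x)
w = torch.ones(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = 1.0

loss = torch.clamp(1 - y * (w * x).sum(), min=0)  # w.x = 6, so loss = 0
loss.backward()

print(loss.item())  # 0.0
print(w.grad)       # tensor([0., 0., 0.]) -- backprop contributes nothing
```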
```python
import torch.optim as optim

# weight_decay handles the L2 regularization inside optimizer.step()
optimizer = optim.SGD(parameters, lr=0.001, weight_decay=0.01)

for it in range(10):
    optimizer.zero_grad()
    a = model(data)
    loss = somecriterion(a)
    if loss.item() != 0:  # skip backprop when the loss is already zero
        loss.backward()
    optimizer.step()      # still applies weight decay
```
This way, when the loss is zero I can skip the backward pass, since the gradients would be zero anyway. And because the optimizer has weight decay, optimizer.step() still updates the model. However, this causes a memory leak. Any suggestions?