I am training a net with a hinge loss (or something very similar). When the loss is zero, the gradient is automatically zero as well. However, I still want L2 regularization to be applied, so to save computing time I use the following method:
import torch.optim as optim

optimizer = optim.SGD(parameters, lr=0.001, weight_decay=0.01)

for iter in range(10):
    optimizer.zero_grad()
    a = model(data)
    loss = somecriterion(a)
    # skip the backward pass when the loss is already zero -- the gradient would be zero anyway
    if loss.item() != 0:
        loss.backward()
    optimizer.step()
This way, when the loss is zero I do not need to compute the backward pass, since the gradient would be zero for sure. And because the optimizer has weight decay, optimizer.step() still trains the model. But this causes a memory leak. Any suggestions?
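For reference, here is a minimal sketch of the zero-loss / zero-gradient property I am relying on, using a plain max(0, 1 - y * score) hinge as a stand-in for my actual criterion (hinge_loss and the toy tensors below are just placeholders, not my real model):

import torch

# stand-in hinge loss: mean(max(0, 1 - y * score)); my real criterion behaves similarly
def hinge_loss(scores, targets):
    return torch.clamp(1 - targets * scores, min=0).mean()

w = torch.ones(3, requires_grad=True)
x = torch.tensor([1.0, 1.0, 1.0])
y = torch.tensor([1.0])

score = (w * x).sum().unsqueeze(0)   # score = 3.0, so the margin is satisfied
loss = hinge_loss(score, y)          # loss == 0 here

if loss.item() != 0:
    loss.backward()
else:
    # the gradient would be exactly zero, so I skip backward() and rely on
    # weight_decay in optimizer.step() for the L2 update
    pass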