In model training, should zero_grad() be called before or after loss.backward()? And why are loss values sometimes fixed?

It should be called before the forward pass on the training input data, like this:

model.zero_grad()            # clear gradients accumulated from the previous step
pred_y = model(X)            # forward pass
loss = criterion(pred_y, y)  # compute the loss
loss.backward()              # backpropagate to compute fresh gradients
optimizer.step()             # update the weights
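
For context, here is a minimal, self-contained sketch of that training step, assuming a toy linear-regression model and made-up tensor shapes. The key point is that gradients are zeroed before backward(), so new gradients do not accumulate on top of the old ones:

import torch
import torch.nn as nn

# Hypothetical toy data: 100 samples with 3 features each
X = torch.randn(100, 3)
y = torch.randn(100, 1)

model = nn.Linear(3, 1)                                    # simple linear model
criterion = nn.MSELoss()                                   # mean-squared-error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(20):
    model.zero_grad()                # clear old gradients
    pred_y = model(X)                # forward pass
    loss = criterion(pred_y, y)      # compute the loss
    loss.backward()                  # backpropagate
    optimizer.step()                 # update the weights

Calling optimizer.zero_grad() instead of model.zero_grad() works the same way here, since the optimizer holds the same parameters.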

As for why some loss values are fixed: it is normal for the loss to plateau for a while, as long as it is not fixed throughout training.
If it stays fixed, you should probably change your model architecture or check whether your dataset is scaled and normalized properly.
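
If you suspect scaling is the issue, a quick check is to standardize the inputs before training. A minimal sketch, assuming X is a 2-D float tensor of raw features:

X_mean = X.mean(dim=0)
X_std = X.std(dim=0)
X_scaled = (X - X_mean) / (X_std + 1e-8)   # small epsilon avoids division by zero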

Also, something I'd add from my experience over the past year: if you are working with structured (tabular) data, neural networks may not be the way forward. Tree-based ensembles such as gradient-boosted trees or random forests are often a better starting point; see the sketch below.

Neural networks tend to outperform other machine-learning models mainly on heavy-lifting tasks such as image modeling or NLP and language modeling.
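
For example, on tabular data a gradient-boosted tree model is often a strong baseline. A minimal sketch using scikit-learn, assuming X and y are NumPy arrays of features and labels:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier()          # gradient-boosted decision trees
clf.fit(X_train, y_train)                   # train on the tabular features
print(clf.score(X_test, y_test))            # mean accuracy on the held-out set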