Multiple forward passes before backward: the loss depends on one forward pass, but the training result depends on all of them

The model training code is as follows:

    optimizer.zero_grad()
    feat = model_w(x2_w)
    loss, p_inds, n_inds, dist_ap, dist_an, dist_mat = global_loss(tri_loss, feat, y_w, normalize_feature=False)
    loss.backward()
    optimizer.step()

If I add another forward pass, the code becomes:

    optimizer.zero_grad()
    feat1 = model_w(G_result.detach())
    feat = model_w(x2_w)
    loss, p_inds, n_inds, dist_ap, dist_an, dist_mat = global_loss(tri_loss, feat, y_w, normalize_feature=False)
    loss.backward()
    optimizer.step()

The training result is different, even though the loss only uses feat.
Can anybody confirm that what I’m doing is correct? If it isn’t, could someone indicate what the correct way would be?

Thanks!

The additional forward pass using G_result will update the running stats in all batch norm layers, if you are using them in your model.
This will most likely result in a different final accuracy.
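
As a minimal sketch of this effect: the small nn.Sequential below is a hypothetical stand-in for model_w (the real model is not shown in this thread), and the shifted random input only imitates data from a different distribution. It shows that a forward pass in train mode changes the batch norm running statistics even when no loss, backward() or optimizer.step() is involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for model_w: any module that contains a batch norm layer.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
bn = model[1]

model.train()
before = bn.running_mean.clone()

# Forward pass only: no loss, no backward(), no optimizer.step().
# The "+ 3.0" shift imitates inputs drawn from a different distribution.
with torch.no_grad():
    _ = model(torch.randn(16, 4) + 3.0)

after = bn.running_mean.clone()
stats_changed = not torch.equal(before, after)
print("running_mean changed by the forward pass alone:", stats_changed)
```

The parameters (weights and biases) are untouched by this pass; only the buffers running_mean, running_var and num_batches_tracked move.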

But the backward pass only depends on feat. If I need feat1 but want G_result to have no influence on the final accuracy, what is the correct way to do this?

The gradients will be calculated using the loss calculated by feat, that’s correct.
However, as explained, the forward pass using G_result might update e.g. the running statistics of batch norm layers, if they are used in the model.
If you don’t want to update these stats, you could call model.eval() before passing G_result to the model, and model.train() again afterwards.
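
A possible way to wrap this toggle, as a sketch only: forward_without_stat_update is a hypothetical helper name, and the small model and random g_result are stand-ins for model_w and G_result. The helper remembers the previous mode and restores it, so it is safe to call during both training and evaluation.

```python
import torch
import torch.nn as nn


def forward_without_stat_update(model, x):
    """Run one forward pass in eval mode, then restore the previous mode."""
    was_training = model.training
    model.eval()
    try:
        return model(x)
    finally:
        model.train(was_training)


torch.manual_seed(0)

# Hypothetical stand-ins for model_w and G_result.
model_w = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
g_result = torch.randn(16, 4) + 3.0
bn = model_w[1]

model_w.train()
stats_before = bn.running_mean.clone()

feat2 = forward_without_stat_update(model_w, g_result.detach())

stats_unchanged = torch.equal(stats_before, bn.running_mean)
print("running_mean unchanged:", stats_unchanged)
print("model back in train mode:", model_w.training)
```

Keep in mind that in eval mode batch norm normalizes with the running statistics instead of the batch statistics, and dropout layers are disabled, so feat2 will generally differ from what the same input would produce in train mode.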

I did this as follows:

    optimizer.zero_grad()
    model_w.eval()
    feat2 = model_w(G_result.detach())
    model_w.train()
    feat = model_w(x2_w)
    loss, p_inds, n_inds, dist_ap, dist_an, dist_mat = global_loss(tri_loss, feat, y_w, normalize_feature=False)
    loss.backward()
    optimizer.step()

The result is still wrong. Is there any other problem?

Could you explain what makes the result wrong?

How large is the difference in final accuracy and what variance do you expect?
I.e. if you run your code without the G_result forward pass (with different seeds, if you set them), how large is the variance in final accuracy then compared to your current run?

After training for 10 epochs, adding the G_result forward pass gives a 10% difference in accuracy.
Without the G_result pass (with different seeds), there is a 1% difference in accuracy.

How many runs did you perform, and how large is the variance across them?

Thank you, I see what you mean. G_result is the GAN output, which has a different data distribution from the real data, so I will try fixing the BN layers during training.
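
For reference, one way to "fix" the BN layers could look like the sketch below. It is an assumption about what is meant here, not code from this thread: freeze_bn is a hypothetical helper, and the small model is a stand-in for model_w. It puts only the batch norm modules into eval mode, so their running statistics stop updating while the rest of the model keeps training.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def freeze_bn(model, freeze_affine=False):
    """Put every batch norm layer into eval mode so its running stats stay fixed.

    With freeze_affine=True the learnable scale and shift are frozen as well.
    """
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.eval()
            if freeze_affine:
                for param in module.parameters():
                    param.requires_grad_(False)


torch.manual_seed(0)

# Hypothetical stand-in for model_w.
model_w = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())
bn = model_w[1]

model_w.train()
freeze_bn(model_w)  # model_w.train() re-enables BN, so repeat this after each call

stats_before = bn.running_mean.clone()
_ = model_w(torch.randn(16, 4) + 3.0)
stats_frozen = torch.equal(stats_before, bn.running_mean)
print("running_mean unchanged while training:", stats_frozen)
```

Freezing from the very first step leaves the running statistics at their initial values (zero mean, unit variance), so this is mostly useful when the statistics are already meaningful, for example after loading pretrained weights or after a few epochs on real data.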