If I have two different neural networks (parametrized by model1 and model2) and corresponding two optimizers, would the below operation using model1.parameters without detach() lead to change in its gradients? My requirement is that I want to just compute the mean squared loss between the two model parameters but update the optimizer corresponding to model1.

parameters_to_vector is differentiable, so yes, gradients will flow back to both models.
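A minimal sketch of the setup described in the question (the layer sizes and optimizer are assumptions). Since only model1's optimizer should take a step, one option is to detach model2's parameter vector so gradients are only tracked for model1:

```python
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector

model1 = nn.Linear(4, 2)
model2 = nn.Linear(4, 2)
opt1 = torch.optim.SGD(model1.parameters(), lr=0.1)

vec1 = parameters_to_vector(model1.parameters())
# Detach model2's vector: no gradient history is recorded for model2.
vec2 = parameters_to_vector(model2.parameters()).detach()

loss = nn.functional.mse_loss(vec1, vec2)
opt1.zero_grad()
loss.backward()
opt1.step()  # only model1 is updated
```

Without the .detach(), gradients would also accumulate in model2's .grad fields; model2's weights still would not change as long as only opt1.step() is called, but the stale gradients would need to be cleared before any later use of model2's optimizer.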

In general, there are very limited cases where you need .detach() within your training function. It is most often used when you want to save the loss for logging, or save a Tensor for later inspection, but you don't need gradient information.
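The logging use case can be sketched in a few lines: detaching (or calling .item()) before storing the loss keeps the autograd graph from being held alive by the log:

```python
import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()

history = []
# Detach before storing so the stored value carries no graph.
history.append(loss.detach())  # or loss.item() for a plain Python float

loss.backward()  # the original loss is still differentiable
```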

You mentioned about parameters_to_vector being differentiable. So, how can I check whether a function is differentiable or not?

Also, can you be more specific about the operations for which detach is required? For instance, if I pass the model parameters to another function, or use them like params = list(self.model1.parameters()), will these require the use of detach()?

In general, all ops in PyTorch are differentiable.
The main exceptions are .detach() and the with torch.no_grad() context. There are also functions that work with nn.Parameter, which needs to remain a leaf and so cannot have gradient history.
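A practical way to check whether an op tracked gradients is to run it on an input that requires grad and inspect the output's .requires_grad and .grad_fn:

```python
import torch
from torch.nn.utils import parameters_to_vector

model = torch.nn.Linear(3, 3)

out = parameters_to_vector(model.parameters())
print(out.requires_grad)   # True: the op is part of the graph
print(out.grad_fn)         # a backward node, not None

with torch.no_grad():
    out2 = model.weight * 2
print(out2.requires_grad)  # False: no_grad breaks the graph
```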

Also, can you be more specific about the operations in which detach is required

Detach is used to break the graph to mess with the gradient computation.
In 99% of the cases, you never want to do that.

The only weird cases where it can be useful are the ones I mentioned above: you want to reuse a Tensor that was produced by a differentiable computation in some side computation that is not expected to be differentiated, and you can use detach() to express that. But this is a rare case.
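That rare case can be sketched as follows: a tensor from a differentiable computation is reused for a side computation (here, a statistic) that should not participate in the backward pass:

```python
import torch

w = torch.randn(3, requires_grad=True)
y = w * 2                  # differentiable computation

stat = y.detach().mean()   # side computation, outside the graph

y.sum().backward()         # unaffected by the detached branch
print(w.grad)              # d(sum(2*w))/dw = 2 for each element
```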

Hi @albanD, I'm in this situation: I need to detach my tensor because the black-box model I use inside my model doesn't accept tensors. How do I do that properly, please? Everything changes, but my parameters' gradients are stuck at zero, and I suspect .detach(). Thank you…

If you want to get gradients through something that our autograd cannot see, you will have to use a custom Function so that you can tell the autograd what the backward is: Extending PyTorch - PyTorch 1.11.0 documentation
From within the forward there, you can just unpack the Tensor into whatever you want and pass that to your black box!
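A hypothetical sketch of that pattern: the black box here is a stand-in that only accepts NumPy arrays and doubles its input, and the backward method must return whatever gradient matches (or approximates) the real black box:

```python
import torch
import numpy as np

def black_box(arr: np.ndarray) -> np.ndarray:
    # Stand-in for external, non-tensor code (assumed behavior: doubling).
    return arr * 2.0

class BlackBoxFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Unpack the tensor for the black box; autograd does not see this.
        out = black_box(x.detach().cpu().numpy())
        return torch.as_tensor(out, dtype=x.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        # Supply the gradient by hand: d(2x)/dx = 2.
        return grad_out * 2.0

x = torch.randn(4, requires_grad=True)
y = BlackBoxFn.apply(x)
y.sum().backward()
print(x.grad)  # 2.0 everywhere, as defined in backward()
```

For a real black box whose derivative is unknown, backward() could instead implement a finite-difference estimate; the key point is that the graph is reconnected through the custom Function rather than silently broken by a bare .detach().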

Thank you so much for your reply. I will try it and get back. I still ask myself why all the parameter gradients are stuck at zero rather than None. Even if I initialize the bias and weights to certain values, they still change during inference; I'm a beginner and that confuses me. Thank you.

Ok, thank you. I'm using Adam without weight decay, but it has momentum, so I get it. I'll try your suggestion and get back. So the lesson for now is to never use .detach() inside a model.