# About the order of detach()

Let’s assume the following situation.

```python
logits = model(inputs)
logits_x = logits[:batch_size]
logits_u_w, logits_u_s = logits[batch_size:].chunk(2)
del logits

# e.g.1
pseudo_label = torch.softmax(logits_u_w, dim=-1).detach()
# e.g.2
pseudo_label = torch.softmax(logits_u_w.detach(), dim=-1)
```

Q. I want the `softmax` applied to `logits_u_w` to be excluded from autograd. Which version is correct, and why?

I would say that your second equation (e.g.2) is the correct one.

To stop a tensor from tracking history, you can call `.detach()` to detach it from the computation history, and to prevent future computation from being tracked.

Hence, the tensor in e.g.1 is tracked through the `softmax`, while e.g.2 stops tracking gradients at `logits_u_w`.
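For illustration, a small standalone sketch of the two orderings, using a random tensor in place of the model's actual logits:

```python
import torch

# stand-in for logits_u_w coming out of the model
logits_u_w = torch.randn(4, 3, requires_grad=True)

# e.g.1: the softmax is recorded in the graph, then the result is detached
tracked = torch.softmax(logits_u_w, dim=-1)
assert tracked.grad_fn is not None       # a graph node was created for softmax
pseudo1 = tracked.detach()
assert pseudo1.requires_grad is False    # the detached copy has no history

# e.g.2: detach first — the softmax is never recorded
pseudo2 = torch.softmax(logits_u_w.detach(), dim=-1)
assert pseudo2.grad_fn is None           # no graph node was ever created
assert pseudo2.requires_grad is False
```

Both final tensors are free of gradient history; the difference is only whether a `SoftmaxBackward` node was built along the way.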


The two will actually give you the same result if you write them this way.

I would recommend the 2nd one, as @tux said: in the first one, you first create the graph for the softmax and then discard it when you detach. In the second one, that part of the graph is never created.
But the difference is going to be very small.
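To check that the two orderings really produce the same values, here is a quick sketch with a random stand-in tensor:

```python
import torch

torch.manual_seed(0)
logits_u_w = torch.randn(4, 3, requires_grad=True)

# e.g.1: build the softmax node, then detach the result
a = torch.softmax(logits_u_w, dim=-1).detach()
# e.g.2: detach first, so the softmax is never tracked
b = torch.softmax(logits_u_w.detach(), dim=-1)

# same underlying data goes through the same kernel, so the
# results are identical; only the (discarded) graph differs
assert torch.equal(a, b)
```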


Thank you for your answer. I find the doc a bit confusing then. From the doc:

To stop a tensor from tracking history, you can call `.detach()` to detach it from the computation history, and to prevent future computation from being tracked.

`.detach()` just detaches the tensor from the graph, and all future operations on it will not be tracked.

But your answer seems to say that it stops tracking, as the doc says, AND also deletes the last operation from the graph (here the `softmax` operation). Could you please clarify?

The only thing `.detach()` does is return a new Tensor without gradient history.

But if, in a single expression, you create the result and then detach it, the result that was tracking history goes out of scope. Since that part of the history is then not referenced by anything, it is deleted.
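A minimal sketch of these two points, assuming nothing beyond standard PyTorch behavior: `.detach()` returns a new tensor that shares storage but carries no history, and dropping the last reference to the tracked result releases its part of the graph.

```python
import torch

x = torch.randn(3, requires_grad=True)

tracked = x * 2                  # has a grad_fn; keeps its graph node alive
detached = tracked.detach()      # new tensor, no history...
assert detached.grad_fn is None
# ...but it shares the same storage as the tracked result
assert detached.data_ptr() == tracked.data_ptr()

# dropping the last reference to the tracked result frees its graph node,
# which is what happens implicitly in `torch.softmax(...).detach()`
del tracked
```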
