Clone and detach in v0.4.0

Sorry if this is repetitive but I still don’t get it. What is wrong with doing clone first and then detach, i.e. .clone().detach()?

If we clone and then detach, we still get a new tensor with its own memory, and we’ve blocked the gradient flow to the earlier graph.

If we do .detach().clone(), then we first create a tensor that shares the same memory but drops the old gradient flow, and then we make a clone of it, so it now has its own memory (but since it is a copy of the detached tensor, it still has no gradient flow to the earlier part of the graph).

These seem equivalent. Are they not? Is there an error in my reasoning?
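The memory-sharing and gradient-flow claims above can be checked directly in PyTorch (a minimal sketch; the tensor `x` is just an illustrative example):

```python
import torch

x = torch.ones(3, requires_grad=True)

d = x.detach()   # shares storage with x, no gradient history
c = d.clone()    # fresh storage, still no gradient history

assert d.data_ptr() == x.data_ptr()   # detach shares memory
assert c.data_ptr() != x.data_ptr()   # clone copies it
assert not c.requires_grad            # no flow back to the graph
```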

Sorry if this is repetitive but I still don’t get it. What is wrong with doing clone first and then detach, i.e. .clone().detach()?

Nothing. They will give an equivalent end result.
The minor optimization of doing detach() first is that the clone operation won’t be tracked: if you do clone first, then the autograd info is created for the clone, and after the detach, because it is inaccessible, it is deleted. So the end result is the same, but you do a bit of extra useless work.
In any meaningful workload you shouldn’t see any perf difference though. So no need to worry too much about it :smiley:
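The equivalence of the end results, and the extra tracking in the clone-first order, can be seen in a small sketch (the tensor `x` here is just an illustrative example):

```python
import torch

x = torch.ones(3, requires_grad=True)

# Order 1: clone, then detach.
a = x.clone().detach()
# Order 2: detach, then clone.
b = x.detach().clone()

# Both results are equal, live in their own memory,
# and are cut off from the autograd graph.
assert torch.equal(a, b)
assert not a.requires_grad and not b.requires_grad
assert a.data_ptr() != x.data_ptr() and b.data_ptr() != x.data_ptr()

# The intermediate clone in order 1 is still tracked by autograd...
assert x.clone().grad_fn is not None
# ...while a clone of a detached tensor never is.
assert x.detach().clone().grad_fn is None
```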


I wish I would have known that there was no difference, but it was hard to know a priori if there was anything subtle I could have missed. Glad to know it’s safe!

Thank you! Everything is finally clear to me. I appreciate your feedback! You’re a boss at this, Alban :muscle:
