Why ExpBackward0._saved_result is not the output

For functions whose gradient depends on their inputs, we can access the _saved_self or _saved_other attributes, and they are the same tensor objects as the inputs:

import torch
x = torch.tensor([1, 2, 3.], requires_grad=True)
y = torch.log(x)  # gradient is y' = 1/x
print(y.grad_fn._saved_self is x)
# True
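
The same identity check appears to hold for _saved_other. A small sketch with a binary op whose gradient needs the other input (my assumption is that multiplication saves both operands this way):

import torch
a = torch.tensor([1., 2., 3.], requires_grad=True)
b = torch.tensor([4., 5., 6.], requires_grad=True)
c = a * b  # dc/da = b, so b is saved as _saved_other
print(c.grad_fn._saved_other is b)
# True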

However, for functions whose gradient depends on their output, _saved_result is not the output itself, but a new tensor that shares the same data:

import torch
x = torch.tensor([1, 2, 3.], requires_grad=True)
z = torch.exp(x)  # gradient is z' = z
print(z.grad_fn._saved_result is z)
print(z.grad_fn._saved_result.data_ptr() == z.data_ptr())
# False
# True
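
And each access to _saved_result appears to hand back yet another fresh tensor sharing the same data (a quick check; the False on the identity test is my expectation, not something the docs promise):

import torch
x = torch.tensor([1, 2, 3.], requires_grad=True)
z = torch.exp(x)
r1 = z.grad_fn._saved_result
r2 = z.grad_fn._saved_result
print(r1 is r2)                        # expected False: a new wrapper per access
print(r1.data_ptr() == r2.data_ptr())  # expected True: same underlying storage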

Why not just point to the output? Just a little curious.

Wouldn’t this create a circular reference for z? In the end it would point to itself and I’m not sure how Python would deal with it.

Hi @ptrblck (and Sam)!

I’m not a python expert, but I’ve come to believe that python first uses
reference counting (presumably to be able to free no-longer-referenced objects
promptly and more cheaply), but will follow up with garbage collection
(to deal with reference cycles).
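
For example, a pure-python cycle is exactly the case where reference counting
alone never frees anything, but the cyclic garbage collector steps in
(a minimal sketch):

import gc
import weakref

class Node:
    pass

n = Node()
n.ref = n           # a self-referencing cycle: the refcount never reaches zero
w = weakref.ref(n)
del n               # reference counting alone cannot free this ...
print(w() is None)  # False: the object is still alive
gc.collect()        # ... but the cyclic garbage collector can
print(w() is None)  # True: the cycle has been collected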

However, in the case of pytorch, it’s not clear what those rascally devs
have been up to …

Consider:

>>> import torch
>>> torch.__version__
'2.0.0'
>>>
>>> # reproduce Sam's results
>>> x = torch.tensor([1, 2, 3.], requires_grad=True)
>>> z = torch.exp(x) # gradient is y'=y
>>> print(z.grad_fn._saved_result is z)
False
>>> print(z.grad_fn._saved_result.data_ptr() == z.data_ptr())
True
>>>
>>> # looks like some kind of self-reference
>>> z.grad_fn
<ExpBackward0 object at 0x000002160E24A1D0>
>>> z.grad_fn._saved_result.grad_fn
<ExpBackward0 object at 0x000002160E24A1D0>
>>> z.grad_fn._saved_result.grad_fn._saved_result.grad_fn
<ExpBackward0 object at 0x000002160E24A1D0>
>>> z.grad_fn._saved_result.grad_fn._saved_result.grad_fn._saved_result.grad_fn
<ExpBackward0 object at 0x000002160E24A1D0>
>>> z.grad_fn._saved_result.grad_fn._saved_result.grad_fn._saved_result.grad_fn._saved_result.grad_fn
<ExpBackward0 object at 0x000002160E24A1D0>
>>>
>>> # but ...
>>> z.grad_fn._saved_result.grad_fn
<ExpBackward0 object at 0x000002160E24A1D0>
>>> z.grad_fn._saved_result
tensor([ 2.7183,  7.3891, 20.0855], grad_fn=<ExpBackward0>)
>>> z.grad_fn._saved_result.grad_fn   # ???
<ExpBackward0 object at 0x0000021615CDF550>
>>>
>>> z.grad_fn._saved_result.grad_fn._saved_result.grad_fn
<ExpBackward0 object at 0x0000021615CDF550>
>>> z.grad_fn._saved_result.grad_fn._saved_result
tensor([ 2.7183,  7.3891, 20.0855], grad_fn=<ExpBackward0>)
>>> z.grad_fn._saved_result.grad_fn._saved_result.grad_fn   # ???
<ExpBackward0 object at 0x000002160E249F30>

So … yeah.

Best.

K. Frank

@ptrblck is correct. The output is not saved because it would create a reference cycle. The reason Python would not be able to clear that cycle automatically is that it would live purely on the C++ side, where Python's garbage collector cannot see it.
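
One way to see the practical upshot (a sketch; the prompt collection is my expectation given the explanation above, not a documented guarantee): because _saved_result is a fresh wrapper rather than the output itself, deleting z leaves no cycle, so reference counting alone can reclaim it immediately.

import weakref
import torch

x = torch.tensor([1, 2, 3.], requires_grad=True)
z = torch.exp(x)
w = weakref.ref(z)
del z               # no cycle from z back to itself through grad_fn ...
print(w() is None)  # expected True: z is freed promptly by refcounting alone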