I was curious about calling backward() twice in series on a sequential computation graph: I have one loss function in the middle of the graph and another at the end, and I call backward() on both. The result does not seem to be equivalent to the second derivative (I checked manually with SymPy). I wasn't actually trying to compute the second derivative (though it would be nice to figure that out too); I was just curious what calling backward twice like this is doing.
So what is it doing?
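Here is a stripped-down sketch of the pattern I mean (toy numbers, not my actual graph):

import torch

w = torch.tensor([2.0], requires_grad=True)

mid_loss = (w * 3.0 - 4.0) ** 2    # loss in the middle of the graph
mid_loss.backward()
print(w.grad)                      # tensor([12.])

final_loss = (w * 5.0 + 6.0) ** 2  # second loss further along
final_loss.backward()
print(w.grad)                      # what does w.grad represent at this point?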
Experiment results:
---- test results ----
experiment_type = grad_constant_but_use_backward_on_loss
-- Pytorch results
g = 12.0
dJ_dw = 16838.0
-- Sympy results
g_SYMPY = 2*x*(w*x - y)
dJ_dw_SYMPY = 197258.000000000
---- test results ----
experiment_type = grad_analytic_only_backward_on_J
-- Pytorch results
g = tensor([12.], grad_fn=<MulBackward0>)
dJ_dw = 197258.0
-- Sympy results
g_SYMPY = 2*x*(w*x - y)
dJ_dw_SYMPY = 197258.000000000
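As a quick sanity check on g itself: at w=2, x=3, y=4 the symbolic gradient 2*x*(w*x - y) evaluates to 2*3*(2*3 - 4) = 12, which matches the PyTorch g in both runs, so the two setups only disagree on dJ_dw in the first experiment.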
Code:
import torch

experiment_type = 'grad_constant_but_use_backward_on_loss'  # the variant shown in the first set of results

## variable declaration
w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([3.0], requires_grad=False)
y = torch.tensor([4.0], requires_grad=False)
x2 = torch.tensor([5.0], requires_grad=False)
y2 = torch.tensor([6.0], requires_grad=False)

if True:
    ## computes the backward pass on J (i.e. dJ_dw), but g was already backward-passed through
    # compute g
    loss = (w*x - y)**2
    loss.backward()
    g = w.grad.item()  # dl_dw
    # compute w_new
    w_new = w - (g + w**2) * g
    # compute final loss J
    J = (w_new + x2 + y2)**2
    # compute the derivative of J
    J.backward()
    #dw_new_dw = w_new.grad.item()
    dJ_dw = w.grad.item()

    print('---- test results ----')
    print(f'experiment_type = {experiment_type}')
    print('-- Pytorch results')
    print(f'g = {g}')
    #print(f'dw_new_dw = {dw_new_dw}')
    print(f'dJ_dw = {dJ_dw}')
    print('-- Sympy results')
    g, dw_new_dw, dJ_dw = symbolic_test(experiment_type)
    print(f'g_SYMPY = {g}')
    #print(f'dw_new_dw_SYMPY = {dw_new_dw}')
    print(f'dJ_dw_SYMPY = {dJ_dw}')
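(Between the two experiment types I also reset the gradient on w so the runs don't interfere with each other; I assume this is the right way to clear it:)

w.grad = None  # drop whatever the previous backward() left behind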
and the SymPy code:
from sympy import symbols, diff

def symbolic_test(experiment_type):
    '''
    Recompute g, dw_new/dw and dJ/dw symbolically for comparison with autograd.
    '''
    w, x, y, x2, y2, g = symbols('w x y x2 y2 g')
    loss = (w*x - y)**2
    grad = diff(loss, w)
    if experiment_type == 'grad_constant':
        ## compute g as a fixed number (analogue of w.grad.item())
        eval_grad = grad.evalf(subs={w: 2, x: 3, y: 4})
        g = eval_grad
        ## compute w_new
        w_new = w - (g + w**2) * g
        ## compute final loss J
        J = (w_new + x2 + y2)**2
        ##
        dw_new_dw = diff(w_new, w).evalf(subs={w: 2, x: 3, y: 4})
        dJ_dw = diff(J, w).evalf(subs={w: 2, x: 3, y: 4, x2: 5, y2: 6})
    else:
        ## keep g as a symbolic expression in the next expressions
        g = grad
        ## compute w_new
        w_new = w - (g + w**2) * g
        ## compute final loss J
        J = (w_new + x2 + y2)**2
        ##
        dw_new_dw = diff(w_new, w).evalf(subs={w: 2, x: 3, y: 4})
        dJ_dw = diff(J, w).evalf(subs={w: 2, x: 3, y: 4, x2: 5, y2: 6})
    return g, dw_new_dw, dJ_dw
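For the second derivative I mentioned at the top (the thing I wasn't actually computing here), my current guess is the create_graph route on the PyTorch side, cross-checked against SymPy; a sketch I haven't folded into the experiments yet:

import torch
from sympy import symbols, diff

# PyTorch side: differentiate the gradient itself by keeping it in the graph
w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([3.0])
y = torch.tensor([4.0])
loss = (w * x - y) ** 2
(g,) = torch.autograd.grad(loss, w, create_graph=True)  # dloss/dw, still differentiable
(d2,) = torch.autograd.grad(g, w)                       # d2loss/dw2
print(d2)                                               # expect 2*x**2 = 18

# SymPy side: the same second derivative symbolically
ws, xs, ys = symbols('w x y')
loss_sym = (ws * xs - ys) ** 2
print(diff(loss_sym, ws, 2).evalf(subs={xs: 3}))        # 18.0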
Interesting! It seems it doesn't matter which of these two I use:

g = w.grad.item()  # dl_dw
#g = w.grad  # dl_dw

However, this does make a difference:

g = w.grad  # dl_dw
g.requires_grad = True
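To see what autograd actually gets handed in each case, I print the flags on g (quick check; I believe w.grad starts out detached from the graph):

g = w.grad
print(g.requires_grad, g.grad_fn, g.is_leaf)  # before: False None True
g.requires_grad = True
print(g.requires_grad, g.grad_fn, g.is_leaf)  # after: True None True (still a leaf, no grad_fn)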
cross posted: