Error with manually traversing computational graph

I have a simple linear regression model and I am running one iteration of training on it.

    import torch
    import torch.nn.functional as F

    # model definition (W_target, batch_x, and batch_y are defined earlier in the script)
    model = torch.nn.Linear(W_target.size(0), 1)
    print(model.state_dict())

    # zero gradients
    model.zero_grad()

    # forward pass and compute loss
    y_preds = model(batch_x)  # computes y = xA^T + b
    output = F.smooth_l1_loss(y_preds, batch_y)
    loss = output.item()
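
For reference, the gradients that the manual traversal below should reproduce can first be obtained with an ordinary backward call (a small sketch continuing the snippet above; retain_graph=True keeps the saved tensors alive so the grad_fn nodes can still be called by hand afterwards):

    # optional reference run, not strictly part of the experiment
    output.backward(retain_graph=True)
    print(model.weight.grad, model.bias.grad)  # expected gradients
    model.zero_grad()                          # clear them again before the manual traversal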

Then I attempt to manually perform the backward pass (what output.backward() would do), like so:

    dloss = torch.tensor(1.)
    print("output.grad_fn:", output.grad_fn)
    smooth_l1_loss_backward = output.grad_fn
    # gradient of the loss w.r.t. y_preds
    dz = smooth_l1_loss_backward(dloss)[0]
    print(dz)
    print("smooth_l1_loss_backward.next_functions:", smooth_l1_loss_backward.next_functions)
    print(id(smooth_l1_loss_backward.next_functions[0][0]))
    assert smooth_l1_loss_backward.next_functions[0][0] == y_preds.grad_fn
    addmm_backward = smooth_l1_loss_backward.next_functions[0][0]
    print(addmm_backward)
    print("addmm_backward.next_functions:", addmm_backward.next_functions)
    # gradients w.r.t. the three inputs of addmm: (bias, input, weight^T)
    d_ypreds = addmm_backward(dz)
    print(d_ypreds)
    d_bias, d_sample, d_weight_t = d_ypreds
    back_bias = addmm_backward.next_functions[0][0]         # AccumulateGrad for the bias
    transpose_backward = addmm_backward.next_functions[2][0]
    print(transpose_backward.next_functions)
    d_weight = transpose_backward(d_weight_t)
    back_weight = transpose_backward.next_functions[0][0]   # AccumulateGrad for the weight
    back_bias(d_bias)     # <- this is where the size error occurs
    back_weight(d_weight)
    print(model.bias.grad)
    print(model.weight.grad)

However, back_bias(d_bias) is not working due to a size error. Also, for some reason, dz = smooth_l1_loss_backward(dloss)[0] is a tensor with a grad_fn. Should that be happening?

In a tutorial that I found, the analogous example looks like this:

    x = torch.randn(4, 4, requires_grad=True)
    y = torch.randn(4, 4, requires_grad=True)
    z = x * y
    l = z.sum()
    dl = torch.tensor(1.)
    back_sum = l.grad_fn
    dz = back_sum(dl)
    back_mul = back_sum.next_functions[0][0]
    dx, dy = back_mul(dz)
    back_x = back_mul.next_functions[0][0]
    back_x(dx)
    back_y = back_mul.next_functions[1][0]
    back_y(dy)
    print(x.grad)
    print(y.grad)

There, dz = back_sum(dl) is a tensor with grad_fn=None.

Please advise. Thank you! @ptrblck

Yes, that is expected. When you call backward() normally, the backward pass runs inside a no_grad context, which is why the resulting gradients don't have a grad_fn.
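
For example (a minimal, self-contained sketch; the layer size and batch size here are made up, not taken from your script):

    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(3, 1)        # made-up sizes
    batch_x = torch.randn(32, 3)
    batch_y = torch.randn(32, 1)
    dloss = torch.tensor(1.)

    # grad mode is on, so calling the node is itself recorded
    # and the result carries a grad_fn
    out = F.smooth_l1_loss(model(batch_x), batch_y)
    dz = out.grad_fn(dloss)[0]
    print(dz.grad_fn)       # some backward node, as in your case

    # inside no_grad nothing is recorded, which is what happens
    # during a normal backward() call
    out = F.smooth_l1_loss(model(batch_x), batch_y)
    with torch.no_grad():
        dz = out.grad_fn(dloss)[0]
    print(dz.grad_fn)       # None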

So when I run dz = smooth_l1_loss_backward(dloss)[0], it should be a tensor with grad_fn=None, right? But as I said above, I am getting a grad_fn.

No, getting a grad_fn is fine. The tutorial is probably getting grad_fn=None because it does something that breaks the graph in some way.

Can you please explain why back_bias(d_bias) causes the error RuntimeError: output with shape [1] doesn't match the broadcast shape [32, 1]? I don't understand why d_bias has shape [32, 1].

d_bias should have the same shape as the bias.
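
To make the shape relationship concrete (an illustrative sketch only; the batch size 32 is taken from your error message, and the explicit sum is just to show the math, not what autograd does internally): the forward pass broadcasts the [1]-shaped bias across the batch, so the gradient that matches the bias is the per-sample gradient summed over that broadcast dimension.

    import torch

    bias = torch.zeros(1)        # same shape as model.bias for out_features=1
    d_z = torch.randn(32, 1)     # shape of dz / d_bias in your snippet

    # bias [1] was broadcast to [32, 1] in the forward pass, so the
    # gradient w.r.t. bias sums over the broadcast (batch) dimension
    d_bias = d_z.sum(dim=0)
    print(d_bias.shape)          # torch.Size([1]) -- now matches bias.shape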

Yep, I understand that that's the issue, but I don't understand the cause of the issue or how to fix it. My gut feeling tells me that there's an issue with dz = smooth_l1_loss_backward(dloss)[0], because my d_bias is equal to dz.