Taking derivative of Linear module

Hi, I am trying something very basic with PyTorch. I create a Linear module and set its weight and bias to values I choose:

import torch

torch_linfn = torch.nn.Linear(2, 3, bias=True)

# weight has shape (out_features, in_features) = (3, 2)
torch_linfn.weight = torch.nn.Parameter(
    torch.tensor([[1, 0], [-1, 14], [5, -9]], dtype=torch.float32))

# bias has shape (out_features,) = (3,)
torch_linfn.bias = torch.nn.Parameter(
    torch.tensor([1, 4, -1], dtype=torch.float32))

Now, I do a forward pass on the module:

x = torch.tensor([1, -12], dtype=torch.float32)
torch_y = torch_linfn(x)
# the output is: tensor([   2., -165.,  112.], grad_fn=<AddBackward0>)
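
For reference, this matches computing W x + b by hand with the values set above (just a quick sanity check on my part):

W = torch_linfn.weight.detach()  # shape (3, 2)
b = torch_linfn.bias.detach()    # shape (3,)
print(W @ x + b)  # tensor([   2., -165.,  112.])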

Now, I hope to get the gradient with respect to the weights (essentially, get dy/dW):

torch_y.backward()

This throws the following error:

>>> torch_y.backward()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/kz-wd-ssd/repo/zkynet/venv/zkynet/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/media/kz-wd-ssd/repo/zkynet/venv/zkynet/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
  File "/media/kz-wd-ssd/repo/zkynet/venv/zkynet/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs

How come? Why doesn't PyTorch's autograd work for non-scalar outputs?

Because y has three elements, there isn't a single gradient but three, one w.r.t. each element of y, so backward() on a non-scalar output doesn't know how to combine them unless you pass an explicit gradient argument. If what you want is the gradient of the summed output, you could do this, which is maybe what you meant:

torch_y.sum().backward()  # adding .sum()
print(torch_linfn.weight.grad)
print(torch_linfn.bias.grad)
Output:
tensor([[  1., -12.],
        [  1., -12.],
        [  1., -12.]])
tensor([1., 1., 1.])
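
Those values also line up with doing the math by hand: for y = Wx + b, the gradient of sum(y) w.r.t. each row of W is just x = [1, -12], and the gradient w.r.t. each bias entry is 1. If you prefer not to sum, backward() also accepts an explicit gradient argument (autograd computes a vector-Jacobian product), and a vector of ones reproduces the .sum() behavior. A minimal sketch:

torch_y = torch_linfn(x)   # re-run the forward pass (the graph above was freed)
torch_linfn.zero_grad()    # clear the grads accumulated by the .sum() backward
torch_y.backward(gradient=torch.ones_like(torch_y))  # weight every dy_i equally
print(torch_linfn.weight.grad)  # same tensor as with .sum().backward()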

Yes, that's what I meant. I didn't know you could get the grad from the weight and bias. Thanks!

Actually, is there a reason PyTorch does not support getting the gradients at the weight and bias without the .sum() call?

It does support it; it's just not the default behavior. Here's an example that backpropagates from each output element separately:

optim = torch.optim.SGD(params=torch_linfn.parameters(), lr=1e-4)

# re-run the forward pass: the earlier backward() call freed the graph
torch_y = torch_linfn(x)

for i_y in range(torch_linfn.weight.shape[0]):
    optim.zero_grad()  # clear the grads left over from the previous iteration
    # backward from a single (scalar) element of y; keep the graph for the next pass
    torch_y[i_y].backward(retain_graph=True)
    print(f"dy_{i_y}/dW[{i_y}]")
    print(torch_linfn.weight.grad[i_y], end="\n\n")

Output:
dy_0/dW[0]
tensor([  1., -12.])

dy_1/dW[1]
tensor([  1., -12.])

dy_2/dW[2]
tensor([  1., -12.])
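
And if you want the whole Jacobian dy/dW in one call instead of a loop, torch.autograd.functional.jacobian can compute it. A sketch, where forward_from_weight is a hypothetical helper that re-expresses the layer as a function of the weight:

import torch.nn.functional as F

def forward_from_weight(w):
    # same layer, but written as a function of the weight matrix
    return F.linear(x, w, torch_linfn.bias)

jac = torch.autograd.functional.jacobian(forward_from_weight, torch_linfn.weight)
print(jac.shape)  # torch.Size([3, 3, 2]), i.e. dy_i / dW_jk

Each slice jac[i] is the gradient of y_i w.r.t. the full weight matrix: row i equals x = [1, -12] and the other rows are zero, which matches the loop above.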

More info about autograd here.
