What is the difference between those two tensors?
loss1 = tensor(22081814., device='cuda:0', grad_fn=<ThAddBackward>)
loss2 = tensor(1272513408., device='cuda:0', grad_fn=<SumBackward0>)
They are the loss values to be used for step.backward().
The last operations on these tensors were apparently an addition and a summation.
Have a look at this dummy code:
import torch

x = torch.randn(1, requires_grad=True) + torch.randn(1)
print(x)  # grad_fn points to the addition
y = torch.randn(2, requires_grad=True).sum()
print(y)  # grad_fn points to the summation
Both operations are valid, and the grad_fn just points to the last operation performed on the tensor. Usually you don’t have to worry about it and can just use the losses to call backward().
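For example, calling backward on a summed loss populates the gradient of the leaf tensor. A minimal sketch:

import torch

w = torch.randn(2, requires_grad=True)
loss = w.sum()   # grad_fn=<SumBackward0>
loss.backward()  # accumulates d(loss)/dw into w.grad
print(w.grad)    # tensor([1., 1.]), since d(sum)/dw_i = 1 for every element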
Thank you very much.
Hello, let’s say that I have an embedding matrix: nn.Embedding(3, 5). I want to average these embeddings, so first I have to sum their elements.
As long as the operations are mathematically equivalent, the forward and backward passes will yield the same results. You might notice a performance difference, though, as many operations are vectorized to speed them up.
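For example, here is a minimal sketch (the 3x5 embedding shape is taken from the question above) showing that an explicit sum followed by a division matches the built-in mean in both the forward and backward pass:

import torch
import torch.nn as nn

emb = nn.Embedding(3, 5)
idx = torch.tensor([0, 1, 2])

avg1 = emb(idx).sum(dim=0) / idx.numel()  # average via explicit sum + division
avg2 = emb(idx).mean(dim=0)               # average via the built-in mean
print(torch.allclose(avg1, avg2))         # True: same forward result

g1 = torch.autograd.grad(avg1.sum(), emb.weight)[0]
g2 = torch.autograd.grad(avg2.sum(), emb.weight)[0]
print(torch.allclose(g1, g2))             # True: same gradients as well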
Thank you very much!!