What is the difference between these two tensors?

```
loss1 = tensor(22081814., device='cuda:0', grad_fn=<ThAddBackward>)
loss2 = tensor(1272513408., device='cuda:0', grad_fn=<SumBackward0>)
```

They are the loss values I call `backward()` on.

The last operations performed on these tensors were apparently an addition and a summation, respectively.

Have a look at this dummy code:

```
import torch

x = torch.randn(1, requires_grad=True) + torch.randn(1)  # last op: addition
print(x)
y = torch.randn(2, requires_grad=True).sum()  # last op: summation
print(y)
```

Both operations are valid, and the `grad_fn` just points to the last operation performed on the tensor.

Usually you don’t have to worry about it and can just use the losses to call `backward`.

Thank you very much.

Hello, let’s say that I have an embedding matrix: `nn.Embedding(3, 5)`.

I want to average these embeddings, so first I have to sum them.

If I use `torch.sum(my_embeddings, dim=0)` or add them one by one, I get the same result, but with different backward operations, as you can see in the screenshot. What is the best option here, or is it the same?

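As long as the operations are mathematically equivalent, the forward and backward passes will yield the same results (up to tiny floating-point rounding differences, since the values may be summed in a different order).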
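
A minimal sketch of that claim, using the `nn.Embedding(3, 5)` from the question (the variable names and the seed are illustrative assumptions): both variants build different graph nodes, yet the averages and the gradients with respect to `emb.weight` match.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(3, 5)  # same shape as in the question
idx = torch.arange(3)     # look up all three rows

# Option A: one vectorized reduction over the rows
rows_a = emb(idx)
sum_a = torch.sum(rows_a, dim=0)
# Option B: add the rows one by one
rows_b = emb(idx)
sum_b = rows_b[0] + rows_b[1] + rows_b[2]

# Different last nodes in the graph (a sum node vs. an add node)
print(type(sum_a.grad_fn).__name__, type(sum_b.grad_fn).__name__)

avg_a = sum_a / rows_a.size(0)
avg_b = sum_b / rows_b.size(0)

# Gradients of each average w.r.t. the embedding weight
grad_a, = torch.autograd.grad(avg_a.sum(), emb.weight)
grad_b, = torch.autograd.grad(avg_b.sum(), emb.weight)

print(torch.allclose(avg_a, avg_b), torch.allclose(grad_a, grad_b))
```

For the averaging itself, `rows.mean(dim=0)` performs the sum and the division in a single call.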

You might notice a performance difference, though: built-in reductions such as `torch.sum` are vectorized, while adding the rows one by one runs a separate operation for each addition.
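
A rough CPU timing sketch of that difference, assuming a hypothetical, larger matrix (1000 × 64) so the gap is measurable; `vectorized_sum`, `loop_sum` and `bench` are illustrative helpers, not PyTorch APIs. The loop version would typically be slower, but the numbers depend on your hardware and PyTorch build, and GPU timings would additionally need `torch.cuda.synchronize()`.

```python
import time
import torch

torch.manual_seed(0)
# Hypothetical sizes, larger than the 3 x 5 example so timing is meaningful
rows = torch.randn(1000, 64)

def vectorized_sum(t):
    # one reduction call over dim 0
    return torch.sum(t, dim=0)

def loop_sum(t):
    # one separate add op per row
    total = torch.zeros(t.size(1))
    for row in t:
        total = total + row
    return total

def bench(fn, t, repeats=20):
    # average wall-clock time per call
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(t)
    return out, (time.perf_counter() - start) / repeats

out_vec, t_vec = bench(vectorized_sum, rows)
out_loop, t_loop = bench(loop_sum, rows)

print(f"vectorized: {t_vec * 1e6:.1f} us, loop: {t_loop * 1e6:.1f} us")
# Same values up to float32 rounding from the different summation order
print(torch.allclose(out_vec, out_loop, atol=1e-3))
```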

Thank you very much!!