Yes, your third version, where you call .backward() twice, is mathematically the same
as your first two versions, where you sum the two losses and then call .backward()
once. (It could well have slightly different numerical round-off error.)
When you call .backward() twice (with no intervening .zero_grad()), the two gradients
get accumulated into the .grad attributes of the various parameters. But the sum of the
gradients is the gradient of the sum, so you get the same result as summing the losses and
calculating the gradient of the sum with a single call to .backward().
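For example, here is a minimal, self-contained sketch that checks this. The model, data, and L1 here are placeholder stand-ins assumed only for illustration (they are not your actual code):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
L1 = torch.nn.L1Loss(reduction='sum')   # stand-in for your loss function

x1, y1 = torch.randn(8, 4), torch.randn(8, 1)
x2, y2 = torch.randn(8, 4), torch.randn(8, 1)

# version 1: sum the two losses, then a single .backward()
model.zero_grad()
(L1(model(x1), y1) + L1(model(x2), y2)).backward()
grad_summed = model.weight.grad.clone()

# version 2: two separate .backward() calls -- gradients accumulate in .grad
model.zero_grad()
L1(model(x1), y1).backward()
L1(model(x2), y2).backward()
grad_accumulated = model.weight.grad.clone()

print(torch.allclose(grad_summed, grad_accumulated))   # True (up to round-off)
```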
Depending on your use case and the amount of memory you have, you could combine
your inputs, x1 and x2, into a single batch tensor, x_both, and, likewise, combine
your targets into a single batch tensor, y_both, and perform a single forward and
backward pass:
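Here is a sketch of what that could look like, continuing the toy example above (again, the model and L1 are placeholders, not your actual code):

```python
# concatenate the two batches along the batch dimension
x_both = torch.cat([x1, x2], dim=0)
y_both = torch.cat([y1, y2], dim=0)

# single forward / backward pass over the combined batch
model.zero_grad()
L1(model(x_both), y_both).backward()
grad_batched = model.weight.grad.clone()

print(torch.allclose(grad_summed, grad_batched))   # True (up to round-off)
```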
Assuming that your loss function, L1(), uses (the equivalent of) reduction = 'sum',
this will be mathematically equivalent to your three versions. (If it uses
reduction = 'mean', the combined-batch loss is divided by the total batch size, so
when x1 and x2 have the same batch size it differs from the sum of the two separate
losses only by a factor of two, which merely rescales the gradient.)