Calling backward() twice

I have a network with two independent neurons at the output.
Each neuron uses a tanh activation.
I compute an error for the first neuron and another for the second.
If I call backward() twice, I get an error.
How do I do this correctly?
As the loss function, I use L1Loss.

I also cannot figure out how to choose the right loss function.
I have two neurons. They work independently and produce values from -1 to 1.
For example, one neuron outputs 0.3, and the correct answer is -0.4, so the network error is 0.7. I need the output reduced. But if I call L1Loss, I get 0.3 - (-0.4) = 0.7, which is positive, so it seems to me the gradients will push the output up. How do I tell the network to reduce the output?
Although maybe I'm wrong and the network will do everything right…
The first question remains: how do I calculate the gradients for two neurons ( loss1.backward(), loss2.backward() )?

You can do, for example, (loss1 + loss2).backward(), or use the torch.autograd.grad function.
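A minimal sketch of the summed-loss approach. The two tanh outputs, the L1Loss, and the example values 0.3 and -0.4 come from the question; the layer sizes and learning rate are arbitrary:

```python
import torch
import torch.nn as nn

# Toy network with two independent tanh outputs (hidden size is arbitrary)
net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2), nn.Tanh())
criterion = nn.L1Loss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

x = torch.randn(1, 4)
target = torch.tensor([[0.3, -0.4]])

out = net(x)
loss1 = criterion(out[:, 0], target[:, 0])
loss2 = criterion(out[:, 1], target[:, 1])

optimizer.zero_grad()
(loss1 + loss2).backward()  # one backward pass fills grads for both heads
optimizer.step()
```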

Every time loss.backward() is called, the computational graph from that forward pass is released.
If you want to use the graph again, call loss1.backward(retain_graph=True) to prevent it from being freed.
And remember to reset the gradients with optimizer.zero_grad() before you start accumulating new ones.
loss.backward() computes the gradients,
and optimizer.step() applies them to update the parameters.
Since you have two losses, you need to be careful about when you reset the gradients and when you update.
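To illustrate what "released" means, a small standalone sketch (the tensors are made up): a second backward() through the same graph raises a RuntimeError unless the first call passed retain_graph=True.

```python
import torch

x = torch.randn(3, requires_grad=True)

# With retain_graph=True, the saved buffers survive the first backward
y = torch.tanh(x).sum()
y.backward(retain_graph=True)
y.backward()  # works: the graph was retained

# Without it, the graph is freed after the first backward
z = torch.tanh(x).sum()
z.backward()
second_backward_failed = False
try:
    z.backward()  # graph already released
except RuntimeError:
    second_backward_failed = True
```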

In the example you stated, the error is 0.7. When you do a backward pass, the network computes gradients such that this error is reduced. I am not sure why you think the gradients will increase it in your example.

Also, as smth said, you don't need to call backward twice; you can simply add the losses and call backward directly on the total loss.
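One way to check this claim (a sketch with an arbitrary toy layer and random data): the gradients from (loss1 + loss2).backward() are identical to those accumulated by two separate backward() calls.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 2)
criterion = nn.L1Loss()
x = torch.randn(1, 4)
target = torch.randn(1, 2)

def two_losses():
    out = torch.tanh(net(x))
    return (criterion(out[:, 0], target[:, 0]),
            criterion(out[:, 1], target[:, 1]))

# Single backward on the summed loss
net.zero_grad()
l1, l2 = two_losses()
(l1 + l2).backward()
grad_summed = net.weight.grad.clone()

# Two backward calls; the gradients accumulate in .grad
net.zero_grad()
l1, l2 = two_losses()
l1.backward(retain_graph=True)
l2.backward()
grad_separate = net.weight.grad.clone()
```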

Assume the network output on the first neuron is 0.7 and the correct answer is 0.2, so the error is 0.5. The second neuron produces 0.1, the correct answer is 0.5, so the error is -0.4.
If you do (loss1 + loss2).backward() -> (0.5 - 0.4 = 0.1).backward(), how does the network know that the first neuron needs to decrease by 0.5 and the second to increase by 0.4?

At the moment I do it like this:

optimizer.zero_grad()
loss1.backward(retain_graph=True)
loss2.backward()
optimizer.step()

Or maybe this would be correct?

optimizer.zero_grad()
loss1.backward(retain_graph=True)
optimizer.step()
optimizer.zero_grad()
loss2.backward()
optimizer.step()

You sum the absolute values of the errors, so your total loss would be 0.5 + 0.4 = 0.9. To decrease the total loss, the network needs to decrease each individual loss, so the gradients are computed such that both losses decrease simultaneously.
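The numbers from the example above can be checked directly (a sketch; reduction='sum' keeps the per-output gradients easy to read). Since the gradient of |out - target| with respect to each output is sign(out - target), the first output is pushed down and the second pushed up:

```python
import torch
import torch.nn.functional as F

out = torch.tensor([0.7, 0.1], requires_grad=True)
target = torch.tensor([0.2, 0.5])

# |0.7 - 0.2| + |0.1 - 0.5| = 0.5 + 0.4 = 0.9
loss = F.l1_loss(out, target, reduction='sum')
loss.backward()

print(out.grad)  # tensor([ 1., -1.]): decrease the first output, increase the second
```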

If the above describes your problem, you can add the absolute values of the losses and do backward once.

The weights for the first neuron need to be adjusted so its output decreases by 0.5, and for the second so its output increases by 0.4. If the errors are 0.5 and -0.4, does backward() do the right thing?

Yes. The error would be zero only if the first output decreases and the second increases, and the model's goal is always to reach zero loss.

I think you do not understand me correctly.
If the output were a single neuron and the error were 0.5, then backward() would do everything correctly.
But I need backward() to change each neuron correctly: first adjust the weights for the first neuron while the second neuron stays unchanged, then adjust the weights for the second neuron. Is that possible?

I cannot understand: are the errors 0.4 and -0.4 the same?
Surely in the first case the output needs to be reduced, and in the second it should be increased?
But if I do

abs(loss)

then the network will not change the weights in the right direction…

What does it mean for "the previous computational graph to be released"?

I just encountered exactly the same problem.

In my case, my two networks can each be trained independently with different loss functions. But putting them together by calling loss1.backward() and loss2.backward() consecutively results in "inplace operation" errors, and the info provided by PyTorch isn't helpful at all.

I fixed the error by making sure the input tensors of the two networks are completely detached. Based on my knowledge, the input tensors will be included in the computation graph by default, and loss.backward() function will remove all tensors that are related to the loss from the graph. That means, if an input tensor is used by both the networks, it will not be visible to loss2’s computation graph after pytorch runs loss1.backward(), which may be identified as an “inplace operation” by the debugging tools.

To detach the tensor, simply organize your code like:

x = x.clone().detach()  # detach the shared input from any previous graph
loss1 = loss_func(y1, net1(x))
loss2 = loss_func(y2, net2(x))
opti1.zero_grad()
opti2.zero_grad()
loss1.backward()
loss2.backward()
opti1.step()
opti2.step()

I’m using torch version 1.13. Please let me know if the solution is not useful.