Oh sorry, I think I read a totally different thing.
An in-place computation means you modified a variable by directly changing its data: for example, writing w[7] = 18 in the forward pass, using in-place sums, and similar operations. Here you have a more detailed explanation.
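A minimal sketch of the failure mode (the tensor names here are made up for illustration): autograd saves the inputs of many ops for the backward pass, so an in-place write like `x[7] = 18` after the tensor has been used invalidates the saved value and raises a RuntimeError.

```python
import torch

# Hedged sketch: an in-place write on a tensor that autograd already
# saved for backward breaks the graph (PyTorch tracks a version counter).
w = torch.ones(10, requires_grad=True)
x = w.clone()            # non-leaf copy, so the in-place write is reached
y = (x * x).sum()        # multiplication saves x for its backward pass
x[7] = 18.0              # in-place write bumps x's version counter

err = None
try:
    y.backward()         # autograd notices x changed since it was saved
except RuntimeError as e:
    err = e
print("autograd error:", err)
```

Running this prints the usual "one of the variables needed for gradient computation has been modified by an inplace operation" message; replacing the in-place write with an out-of-place op (e.g. building a new tensor) makes the backward pass succeed.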
Would you mind posting the code of the networks? I may be able to spot it visually.
Wrt your other question: no. Think of it this way: the gradient of the sum depends on each term only, so the shared network gets a proper joint optimization, as the gradient it receives combines both losses. Otherwise the shared net gets its parameters updated from loss1 first, then the error from loss2 was computed with the previous weights, and the two objectives kind of fight, optimizing the problems separately.
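To make that concrete, here is a sketch (the `shared`/`head1`/`head2` modules and the MSE targets are assumptions, not your actual networks): since d(loss1 + loss2)/dw = dloss1/dw + dloss2/dw, a single backward on the summed loss gives the shared parameters exactly the sum of the two per-loss gradients.

```python
import torch
import torch.nn as nn

# Hedged sketch: a shared trunk feeding two task heads, each with its
# own loss. Module names and shapes are illustrative assumptions.
torch.manual_seed(0)
shared = nn.Linear(4, 4)                        # parameters used by both losses
head1, head2 = nn.Linear(4, 1), nn.Linear(4, 1)
x = torch.randn(8, 4)
t1, t2 = torch.randn(8, 1), torch.randn(8, 1)

h = shared(x)
loss1 = ((head1(h) - t1) ** 2).mean()
loss2 = ((head2(h) - t2) ** 2).mean()

# One backward on the summed loss: the shared weights get the joint gradient.
(loss1 + loss2).backward()
joint_grad = shared.weight.grad.clone()

# Same gradient, obtained by backpropagating each loss from a fresh forward
# pass (crucially, without an optimizer step in between).
shared.weight.grad = None
((head1(shared(x)) - t1) ** 2).mean().backward()
g1 = shared.weight.grad.clone()
shared.weight.grad = None
((head2(shared(x)) - t2) ** 2).mean().backward()
g2 = shared.weight.grad.clone()

print(torch.allclose(joint_grad, g1 + g2))
```

The `allclose` check confirms the summed loss is equivalent to accumulating both gradients before stepping; the problem you describe only appears if you step the optimizer between the two backward passes, so loss2's gradient is stale with respect to the already-updated shared weights.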