Quick question: if the output of my network is a high-dimensional vector and the target value has the same dimension, can I create a loss function that takes a mean squared error for each output node separately, so that each output node gets its own error signal? Is this possible?

In the end, you mathematically only get a gradient with the shape of your parameters if you differentiate a scalar function.
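In practice you can still compute the error per output node and then reduce to a scalar before calling `backward()`; the gradient of the sum is the sum of the per-node gradients, so each node's error contributes its own signal. A minimal sketch (the network and shapes here are illustrative, not from the original post):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy network: 4 inputs, 3 output nodes (shapes chosen for illustration)
net = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)        # batch of 8
target = torch.randn(8, 3)   # target has the same dimension as the output

out = net(x)

# reduction='none' keeps one squared error per element: shape (8, 3)
per_elem = F.mse_loss(out, target, reduction='none')

# Mean over the batch gives one MSE per output node: shape (3,)
per_node_mse = per_elem.mean(dim=0)

# backward() needs a scalar, so reduce the per-node errors to one number.
loss = per_node_mse.sum()
loss.backward()
```

Summing (or averaging) at the end does not blur the per-node errors: by linearity of differentiation, the parameter gradient is exactly the sum of the gradients each node's MSE would produce on its own.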

There are cases where you want the Jacobian, i.e., to differentiate a vector output, but those are rare enough that PyTorch does not support it as well as differentiating scalars. (But you'll find tricks if you search for Jacobian here on the forum.)

Best regards

Thomas