I have a regression problem where the target is a vector of length 8. The first four values represent one output quantity and the next four represent another. Since these are two different quantities with different scales, I want to split the output into two, so that the loss is calculated separately for each quantity and the two losses are then summed before backpropagation.

Can I write something like this in the forward function?

```python
def forward(self, x):
    x = self.layer1(x)  # feature computation
    ...
    x = self.fc(x)  # last layer (fully connected)
    y, z = x.split(2)
    return y, z
```

I don’t know whether the loss function will handle this well. Also, does it require two nn.Linear layers in parallel at the end?

You can just return a tuple, but you’d want x.chunk(2, dim=-1) or something similar to split the last (feature) dimension into two chunks. One fc layer probably works if your optimizer treats tensor entries individually (e.g. Adam is OK). LAMB, however, treats each parameter as one large tensor, and the “size” of the change is proportional to the “length” of the parameter viewed as a vector (both measured by the Frobenius norm), so the two halves of a shared fc weight would not be scaled independently.
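A minimal sketch of the tuple-returning forward with a single fc layer. The class name TwoHeadRegressor and the layer sizes are hypothetical placeholders; only the single fc producing 8 values and the x.chunk(2, dim=-1) split come from the thread.

```python
import torch
import torch.nn as nn

class TwoHeadRegressor(nn.Module):
    # Hypothetical model; in_features and hidden are placeholder sizes.
    def __init__(self, in_features=16, hidden=32):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        # A single fully connected layer produces all 8 outputs at once.
        self.fc = nn.Linear(hidden, 8)

    def forward(self, x):
        x = self.layer1(x)          # feature computation
        x = self.fc(x)              # shape: (batch, 8)
        # Split along the last (feature) dimension into two (batch, 4) tensors.
        y, z = x.chunk(2, dim=-1)
        return y, z

model = TwoHeadRegressor()
y, z = model(torch.randn(5, 16))
print(y.shape, z.shape)  # torch.Size([5, 4]) torch.Size([5, 4])
```

The original x.split(2) uses chunks of size 2 along dim 0, i.e. it cuts the batch into groups of two rows instead of separating the features; x.split(4, dim=-1) would be an equivalent fix. Each returned tensor can then be passed to its own loss call as usual.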
It doesn’t make much sense to do the backpropagation completely separately. It is usually much better to scale the loss terms to some sort of equilibrium (of either the losses or the gradients) and then backpropagate in one go.
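One simple way to put the two losses on a comparable footing is to divide each MSE term by the squared scale of its quantity, which is the same as computing the MSE on normalized targets, and then call backward once on the sum. This is only a sketch under stated assumptions: the scale constants Y_SCALE and Z_SCALE are hypothetical and would normally come from training-set statistics, and a single nn.Linear stands in for the full network. Balancing gradients instead of losses is another option and is not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the full network: 16 inputs -> 8 outputs.
model = nn.Linear(16, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Assumed typical magnitudes of the two quantities (illustrative values).
Y_SCALE = 100.0
Z_SCALE = 0.01

x = torch.randn(32, 16)
# Synthetic targets with deliberately different scales.
target_y = torch.randn(32, 4) * Y_SCALE
target_z = torch.randn(32, 4) * Z_SCALE

pred_y, pred_z = model(x).chunk(2, dim=-1)
# Dividing by scale**2 makes each MSE term roughly O(1).
loss_y = F.mse_loss(pred_y, target_y) / Y_SCALE**2
loss_z = F.mse_loss(pred_z, target_z) / Z_SCALE**2
loss = loss_y + loss_z

optimizer.zero_grad()
loss.backward()   # a single backward pass through the combined loss
optimizer.step()
```

Both halves of the shared fc weight receive gradients from this one backward call, so there is no need for two separate backward passes.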