I have an MLP model and my goal is to predict 2 variables (so ideally I should have 2 neurons in my final layer). However, the first variable (let's call it var1) has sub-values (up to 12), and the second variable (var2) has just a single value. Does it make sense to have 13 neurons in my final layer and, during backprop, compute 2 losses (one w.r.t. var1 and the second w.r.t. var2), then sum them up?

**For better intuition: my scenario is a bit complex, so I'll use a house-prediction analogy**

Imagine we're trying to predict the price of houses and the number of rooms in each house (just for the sake of intuition); in that case we'd have just 2 neurons in the final layer. However, let's say we want to be more specific (and we have enough data) and predict the prices of houses in 12 different states alongside the number of rooms. Then we'd have 13 neurons in our final layer (12 for the prices and 1 for the # of rooms).

- Does this architecture make sense?
- Does it make sense to compute the loss w.r.t. the 2 variables independently and sum them up?

**Something like this**

```
output_dims = 13  # 12 prices + 1 room count
input_dims = 491776
mse_loss = nn.MSELoss()
model = MLP(input_dims, output_dims)

preds = model(x)                            # tensor of shape (batch_size, 13)
l1 = mse_loss(preds[:, :-1], true_prices)   # loss on the 12 prices
l2 = mse_loss(preds[:, -1], true_rooms)     # loss on the number of rooms
loss = l1 + l2
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
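To make the questions concrete, here is a minimal runnable sketch of the summed-loss idea I'm describing; the `MLP` class, hidden size, learning rate, and the synthetic batch are all assumptions for illustration, not my real setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real MLP; the hidden size and the tiny
# input dimension are made up purely for illustration.
class MLP(nn.Module):
    def __init__(self, input_dims, output_dims):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dims, 64),
            nn.ReLU(),
            nn.Linear(64, output_dims),
        )

    def forward(self, x):
        return self.net(x)

model = MLP(input_dims=8, output_dims=13)  # 12 prices + 1 room count
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
mse_loss = nn.MSELoss()

# Fake batch: 32 houses, 8 input features, 12 per-state prices, 1 room count
x = torch.randn(32, 8)
true_prices = torch.randn(32, 12)
true_rooms = torch.randn(32)

losses = []
for _ in range(100):
    preds = model(x)                            # shape (32, 13)
    l1 = mse_loss(preds[:, :-1], true_prices)   # loss on the 12 prices
    l2 = mse_loss(preds[:, -1], true_rooms)     # loss on the room count
    loss = l1 + l2                              # single scalar to backprop
    optimizer.zero_grad()
    loss.backward()                             # gradients flow through both heads
    optimizer.step()
    losses.append(loss.item())
```

Since `loss` is a single scalar, one `backward()` call sends gradients through both slices of the output layer at once, which is what I'd expect the summed formulation to do.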