Could a loss function output a vector instead of a scalar?

Hi,
I want to use a vector loss, i.e. the output of the loss function should be a vector, as follows:

def my_loss(y_pred, y_val):
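    # per-element squared error, computed elementwise, so the result is a vector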
    return (y_pred - y_val)**2

But in the training step, when I call loss.backward(), I get this error:

grad can be implicitly created only for scalar outputs

That means the output of the loss function cannot be a vector! Is there a solution to this problem? I want the loss to be a vector.

Well, the issue here is that in order to take a gradient, the quantity being differentiated must be a single scalar value. How should a vector loss be interpreted?

@saeed_i You can pass reduction='none' to MSELoss to get what you want. But I agree with @eqy: in order to take a gradient, the quantity being differentiated must be a single scalar value. When calculating the gradient, you must consider the error over the complete dataset.
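For example, here is a minimal sketch (with made-up values) of what reduction='none' gives you; note that you still have to reduce the vector to a scalar before calling backward():

import torch
import torch.nn as nn

loss_fn = nn.MSELoss(reduction='none')  # returns per-element squared errors

y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_val = torch.tensor([1.5, 2.5, 2.0])

loss_vec = loss_fn(y_pred, y_val)  # a vector: tensor([0.2500, 0.2500, 1.0000])
loss_vec.mean().backward()         # reduce to a scalar before backward()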

Why can't we do that?
I'm confused about the mathematics of this, although it may be simple!
Can you explain with an example?
Thanks!

Hi Saeed!

When you use a loss function to train a model, the loss function is
telling you which set of model parameters is “better” than other sets
of model parameters.

Let’s say you have a model, and when it has weight_A as its
parameters it produces loss_vector_A = [1.1, 4.4, 2.2].
Let’s also say that when the same model has weight_B as its
parameters it produces loss_vector_B = [2.2, 1.1, 3.3].

Is the model a better model with weight_A or weight_B? If the loss
function produced just a scalar (instead of a vector), we would just say
that the smaller scalar value corresponds to the better model. (That’s
really what “loss function” means.)

(If you say just add up the elements of your loss vectors to see
which model is better, then you would really be saying that
loss_vector_A.sum() and loss_vector_B.sum() should be
your scalar loss-function values.)
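
Concretely, with the example numbers from above:

import torch

loss_vector_A = torch.tensor([1.1, 4.4, 2.2])
loss_vector_B = torch.tensor([2.2, 1.1, 3.3])

loss_vector_A.sum()   # tensor(7.7000)
loss_vector_B.sum()   # tensor(6.6000), so weight_B counts as "better"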

Best.

K. Frank

Yes, I know these things.
I'm confused about the mathematical calculation. Assume we have:
x = torch.tensor([2., 3., 5.], requires_grad=True)
y_1 = x.pow(2)
y_2 = x.pow(2).sum()

then:
y_1 = tensor([ 4.,  9., 25.], grad_fn=<PowBackward0>)
y_2 = tensor(38., grad_fn=<SumBackward0>)

We know the derivative of x^2 is 2x, so the gradient of y_1 should be 2x.
Why do we put sum() at the end of it (y_1 ==> y_2)?

Hi Saeed!

You are making the hidden assumption that y_1[0] depends only
on x[0], y_1[1] only on x[1], and y_1[2] only on x[2]. This
happens to be true in your particular example of y_1 = x.pow(2).

How would you change your reasoning for the case where
y_1 = x * x.roll(1)?

Please look at the concept of the Jacobian matrix and how it is
the generalization of the gradient to a vector-valued function.
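
As a quick illustration, one way to inspect these cross-dependencies
is torch.autograd.functional.jacobian (a sketch using your example
values):

import torch
from torch.autograd.functional import jacobian

x = torch.tensor([2., 3., 5.])

# each y_1[i] now depends on two different elements of x,
# so the Jacobian is no longer diagonal:
jacobian(lambda t: t * t.roll(1), x)
# tensor([[5., 0., 2.],
#         [3., 2., 0.],
#         [0., 5., 3.]])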

(And to avoid any misconception, let me reiterate what I said in
my previous post: In your current example, y_1 is a vector, rather
than a scalar, so it cannot be used as a loss function.)

Best.

K. Frank

I think the backward() documentation explains it pretty well: you can call backward() on a vector if you specify the gradient argument (grad_tensors in torch.autograd.backward()), which is the gradient of some vector-to-scalar function. If this argument consists of ones, that is the same as .sum().backward(); otherwise it is a weighted sum.
As parameter.grad is for scalar-valued functions, a reduction to a scalar is present one way or the other.
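
A minimal sketch of that equivalence, reusing the example from above:

import torch

x = torch.tensor([2., 3., 5.], requires_grad=True)

# passing a vector of ones as the gradient argument ...
x.pow(2).backward(torch.ones_like(x))
x.grad        # tensor([ 4.,  6., 10.]), i.e. 2*x

# ... is the same as reducing to a scalar first:
x.grad = None
x.pow(2).sum().backward()
x.grad        # tensor([ 4.,  6., 10.])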
