A question about backpropagation

Hello everyone,

I have a question about backpropagation.

In my program I use cross_entropy as the cost function.

I use a DNN to predict 3 weights (w1, w2, w3) for 3 different models (M1, M2, M3). After that I calculate the prediction like this:
prediction = w1*M1 + w2*M2 + w3*M3 (w1, w2, w3 are vectors and M1, M2, M3 are matrices)
I have a target matrix, and after training I want the prediction to match the target matrix.
My code is like this:

prediction = w1.mul(M1) + w2.mul(M2) + w3.mul(M3)

loss = loss_func(prediction.cuda(), target.cuda())

I want to ask if I am doing something wrong, because the loss seems difficult to reduce.

Please help me
Thank you

Hello Wentao!

It is not clear to me what you are trying to do.

One key question:

When you call optimizer.step(), what numbers are you
intending to update? Normally, you would have the optimizer
update weights (and biases) in your model. What optimizer
are you using? What did you pass in as your parameters
when you instantiated optimizer?
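For reference, the usual pattern looks like this (a minimal sketch with hypothetical shapes; `step()` only updates the parameters you passed to the optimizer):

```python
import torch

# Stand-in for the DNN (shapes are made up); the key point is that
# optimizer.step() only updates the parameters you pass in here.
model = torch.nn.Linear(8, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step: the model's weights change, nothing else does.
x = torch.randn(4, 8)
weight_before = model.weight.detach().clone()
loss = model(x).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

If the optimizer was instantiated with something other than your DNN's parameters, calling step() will not move the weights that produce w1, w2, and w3.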

Some more questions:

Are M1, M2, and M3 fixed, and you are trying to optimize
the values of w1, w2, and w3? Or vice versa? (Or both?)

Why are the ws vectors? What are the shapes of the ws and
Ms? What is the shape of prediction? What are your inputs?

It sounds like you are trying to get a predicted matrix to match
a target matrix. You say you are using “cross_entropy”. This
doesn’t seem like the appropriate loss function for what I am
guessing you are trying to do. A loss function along the lines
of mean-squared-error might be a better choice.

(CrossEntropyLoss is typically used for classification problems,
and takes integral class labels for its target.)

Good luck.

K. Frank

Hello Frank,

Thank you for your reply!

What I'm trying to do is stream weighting, so M1, M2, M3 are the fixed matrices. I want to use a DNN to predict the weights for the different models, so w1, w2, w3 are vectors with the same length as M1, M2, M3. The optimizer is used to update the parameters of the DNN, which produces the weight vectors. I think cross entropy fits my task better, because the output is a kind of posterior probability. I'm not sure whether I have made a mistake somewhere; I don't know why my loss doesn't decrease.
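Here is a minimal sketch of what I mean (the shapes, features, and small DNN are just illustrative):

```python
import torch

torch.manual_seed(0)

T, C = 10, 4                        # e.g. frames x classes (hypothetical)
# Three fixed posterior matrices from the three models.
M1, M2, M3 = torch.rand(T, C), torch.rand(T, C), torch.rand(T, C)
features = torch.randn(T, 8)        # per-frame DNN input (hypothetical)

# Small DNN that outputs one weight per model, per frame.
dnn = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 3),
    torch.nn.Softmax(dim=1),        # weights sum to 1 per frame
)
optimizer = torch.optim.Adam(dnn.parameters(), lr=0.01)

target = torch.randint(0, C, (T,))  # class labels, as CrossEntropyLoss expects
loss_func = torch.nn.CrossEntropyLoss()

losses = []
for _ in range(100):
    optimizer.zero_grad()
    w = dnn(features)               # shape (T, 3)
    prediction = w[:, 0:1] * M1 + w[:, 1:2] * M2 + w[:, 2:3] * M3
    loss = loss_func(prediction, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

One thing I notice writing this out: CrossEntropyLoss applies log_softmax to its input internally, so it expects raw logits. Feeding it combined probabilities (values in [0, 1]) squeezes the effective logits into a narrow range and makes the gradients small, which could be one reason the loss barely moves.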

Thank you