Using input tensors for loss instead of output y_pred

Hi,

For RL purposes I want to compute an MSE loss from the inputs x and x_expected, i.e. loss(x, x_expected), instead of loss(y_pred, y_expected). In the latter case the gradients change over time, but when using the inputs, they stall. Is there anything I have to consider? Like copying grad_fn from the output y_pred onto x so that the loss recognizes the graph, and so on?

Actually, you point out that the graph might not be recognized, and that makes sense: with x and x_expected, no computation through the network takes place, so no learnable parameters are involved and there shouldn't be any gradients for them. Are x and x_expected themselves coming out of some other computation graph that you expect to learn through your new loss?

No, they are not. They just come in as plain tensors with requires_grad = True and retain_grad().

Here is an example of the scenario: x = sensor data (distance to player) is coming in, and x_target, the target distance to the player, is given (0). The squared difference between the two should make up the loss.
The net predicts actuator movements as y.
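To illustrate why this setup stalls, here is a minimal sketch, assuming a tiny torch.nn.Linear as a stand-in for the real network (the layer sizes and the value 0.17 are made up). The loss is built only from x and x_expected, so the input receives a gradient while the parameters receive none.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Linear(1, 4)  # hypothetical stand-in for the actuator network

x = torch.tensor([[0.17]], requires_grad=True)  # sensor reading (leaf tensor)
x_expected = torch.zeros(1, 1)                  # target distance

y_pred = net(x)  # computed, but never used in the loss below
loss = torch.nn.functional.mse_loss(x, x_expected)
loss.backward()

param_grads = [p.grad for p in net.parameters()]
print(x.grad)       # the input gets a gradient
print(param_grads)  # the parameters were never part of the loss graph
```

An optimizer step on net.parameters() would therefore have nothing to work with here.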

no one? I hope there is a way to do this

Understood, so the learnable parameters are the ones that dictate how the actuator movements are predicted. I'll suggest a scheme with a notation different from yours; I hope it can be of use:
current position/distance to player: x_now
actuator movement: y_now = net(x_now)
next position: x_next = x_now + y_now (I am simplifying this to adding the actuator movement to the position; you can of course use whatever physical function dictates how the action changes the position)
loss: L(x_next, x_target), e.g. (x_next - x_target)**2
Because x_next involves y_now, which in turn involves the net parameters, you will now get nonzero gradients for the parameters.
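The scheme above can be sketched roughly as follows. This is only an illustration: the one-input, one-output Linear layer, the fixed weights, and the additive dynamics are assumptions standing in for the real network and the real physical model.

```python
import torch

net = torch.nn.Linear(1, 1)  # hypothetical one-actuator network
with torch.no_grad():        # fixed weights so the example is deterministic
    net.weight.fill_(0.5)
    net.bias.fill_(0.1)

x_now = torch.tensor([[0.17]])  # current distance to player
x_target = torch.zeros(1, 1)    # desired distance

y_now = net(x_now)              # actuator movement
x_next = x_now + y_now          # assumed toy dynamics; must be differentiable
loss = ((x_next - x_target) ** 2).mean()
loss.backward()

grad_norms = [p.grad.abs().sum().item() for p in net.parameters()]
print(grad_norms)  # nonzero, because the loss depends on net via y_now
```

The key design point is that whatever maps the action to the next state has to be written with differentiable tensor operations, otherwise the chain from the loss back to the parameters is broken.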

hm… very interesting.

But let's say x_now is the 1-D height of the center of the head [17 cm] and y_now outputs 4 leg controls as [0, 0.1, 0.6, 0.9]; then I cannot simply add x_now and y_now together!?

Yes, that is why I made a comment in that part of the scheme. You will have to look at the physical model of your problem to fill in that part (how does the head height change if some legs are extended, etc.).

I hoped the model could learn that ^^ It might easily be more complex than my example; also, it would be nice to keep it as general as possible.

I mean, my first approach would’ve been to use the output (1x4), and just replace the tensor values with the x tensor values (1x1).

Update: I tried this:
(1) combining output with input

x_noleaf = x.clone() # cloning input to use as non-leaf node
x_noleaf[0] = x_noleaf[0] + y[0] - y[0]  # simulates an interaction between the two: x = x + y - y

this didn’t seem to do anything

(2) modifying the output

x_new = output.clone()
x_new = x_new[0][0:2]  # slice to drop the unused elements
x_new.data[ix] = x_old.data[ix]  # overwrite values via .data, which autograd does not track
loss(x_new, x_target)

might work…

I didn't understand the second approach, sorry. For the first, I think you assumed that introducing y into the graph would somehow cause gradient flow. However, you use y[0] as well as -y[0], which produce opposite gradient flows that cancel each other.

Even without going into the details of gradient flow, the statement is essentially the same as x_noleaf[0] = x_noleaf[0], which shouldn't have helped if you think about it.
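The cancellation can be checked directly. In this minimal sketch, y is a plain tensor with requires_grad=True standing in for the network output, and the values are made up for illustration.

```python
import torch

x = torch.tensor([1.7], requires_grad=True)
y = torch.tensor([0.3, 0.1], requires_grad=True)  # stand-in for the network output

x_noleaf = x.clone()                     # non-leaf copy of the input
x_noleaf[0] = x_noleaf[0] + y[0] - y[0]  # y enters with +1 and -1
loss = (x_noleaf ** 2).sum()
loss.backward()

print(x_noleaf.grad_fn is not None)  # the graph does include y ...
print(y.grad)                        # ... but the two contributions cancel
```

So y is formally part of the graph, yet the gradient that reaches it is zero, and everything upstream of y would see zero as well.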

I think you are trying to elbow in gradients, while the overall consideration of why gradients should flow, and whether they will indeed be useful for reducing the loss you are interested in, is getting lost.

Well, yes, I am of course not sure what I am doing. But I mean, I can use any kind of loss in TensorFlow, so it is something special about autograd. If it were up to me, I would simply have output = net(input), backpropagate within there, and it's fine.

My second approach is very simple: copy the output and put the input tensor values inside, while dropping the unused values. Then use the modified output in the loss. So it is kind of hacking the output to match the input values while keeping the graph. The gradients are changing at least, though they quickly go to inf and NaN with SGD.

As far as I understand, you need to update the network with a loss based on a value you provide from some other place. The problem is that a) you have multiple outputs, and b) by replacing the outputs you lose the grad_fn link for the backward pass.
The easiest solution I can imagine is to create a custom loss function: sum all outputs, multiply this sum by zero, and add a torch.Tensor holding your value :)

Thank you all! I think it works better now

import torch

class Custom_MSELoss(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, output, x_train, x_expected):
        # Tie x_train to the graph of `output` without changing its values
        x_train = torch.sum(output) * 0 + x_train  # <--!
        # Mean squared error over the elements of the first row
        return ((x_train[0] - x_expected[0]) ** 2).mean()

criterion = Custom_MSELoss()

Looks fine to me. Could any operation here potentially detach the history?
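One way to check this is to inspect grad_fn and the parameter gradients after a backward pass. The sketch below assumes a small torch.nn.Linear as a hypothetical network and reuses the sum(output) * 0 + x_train line from the custom loss. Nothing in it detaches the history, but multiplying by zero also scales every gradient that flows back into the network by zero, so it is worth confirming whether the parameters actually receive a useful signal.

```python
import torch

net = torch.nn.Linear(1, 4)  # hypothetical network
x_train = torch.tensor([[0.17]])
x_expected = torch.zeros(1, 1)

output = net(x_train)
x_linked = torch.sum(output) * 0 + x_train  # same trick as in Custom_MSELoss
loss = ((x_linked - x_expected) ** 2).mean()
loss.backward()

has_history = loss.grad_fn is not None  # the graph is still attached
all_zero = all(bool((p.grad == 0).all()) for p in net.parameters())
print(has_history, all_zero)
```

If both flags come out true, the loss is connected to the network but cannot change its weights. A differentiable path from the network output to the quantity being penalized, as in the x_next scheme earlier in the thread, would be needed for the parameters to learn from this loss.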