Backward Loss between outputs of different runs

I am trying to incorporate the difference between model outputs from different iterations into the loss for training the model.
Is it possible to have a computation graph that results from the application of the same model to two different inputs?

Something like:

model = nn.Sequential(nn.Linear(input_size, 100),
                      nn.ReLU(),
                      nn.Linear(100, out_size),
                      nn.ReLU())
outputs = []
for idx, (x, _) in enumerate(train_loader):
    output = model(x)
    outputs.append(output)            # keep the output of this forward pass
    if idx % 2 == 1:                  # every second iteration we have two outputs
        loss = criterion(outputs[0], outputs[1])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        outputs = []

In this case, we would be saving the outputs of two sequential iterations and minimizing the loss between those two outputs. My main concern is that the same tensors are included twice in the computation graph. Would this be a problem?

Thanks for the help!

I think this is okay. Remember to zero_grad your optimizer.

Thanks for the response! I will remember to add the zero_grad.
Do you have any idea of what the computation graph will look like? Or how to test whether it’s backpropagating correctly?

Is it possible to have a computation graph that results from the application of the same model to two different inputs?

Yes, this is not a problem.

Do you have any idea of what the computation graph will look like?

The parameters of your net have been used twice, and their gradients will correspond to these two uses (the sum of the gradients).
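If it helps to convince yourself, here is a minimal sketch (the sizes, nn.Linear and nn.MSELoss are just placeholders) checking that the gradient from a loss over two forwards is the sum of the gradients of the two branches:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                        # tiny stand-in for your model
criterion = nn.MSELoss()
x1, x2 = torch.randn(3, 4), torch.randn(3, 4)

# Gradient when both forwards are part of the graph.
model.zero_grad()
criterion(model(x1), model(x2)).backward()
g_both = model.weight.grad.clone()

# Gradient of each branch alone (detach the other one).
model.zero_grad()
criterion(model(x1), model(x2).detach()).backward()
g_first = model.weight.grad.clone()

model.zero_grad()
criterion(model(x1).detach(), model(x2)).backward()
g_second = model.weight.grad.clone()

print(torch.allclose(g_both, g_first + g_second))  # True: the two uses are summed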
If you’re not afraid of a very verbose representation, you can use torchviz to see what the graph will look like.

@albanD
Thanks for the feedback.
I just tried to test one iteration of the model with torchviz and ran into an issue.
It seems that it just shows what the graph would be when we compute the model’s output with just one x (i.e. after one forward).

In my case I am actually doing two forwards of the model and back-propagating the loss between the two outputs.

Testing it with MNIST I get: [torchviz graph screenshot]

Is there any workaround to test this?
My only concern is that the computation might get messed up when the loss is computed between the two forward passes (and not against a target).

Thanks!

Do you give both outputs to torchviz to get this, or just one? You will see the graph for a given output, so if you give the output of a single forward, you’ll see the graph for a single forward.
Or you can give your loss directly, which should depend on both outputs.
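Something like this should work (a rough sketch; the sizes and MSELoss are just placeholders, and it assumes torchviz and graphviz are installed):

import torch
import torch.nn as nn
from torchviz import make_dot

model = nn.Sequential(nn.Linear(10, 100), nn.ReLU(),
                      nn.Linear(100, 5), nn.ReLU())
criterion = nn.MSELoss()
x1, x2 = torch.randn(8, 10), torch.randn(8, 10)

loss = criterion(model(x1), model(x2))         # depends on both forwards
# Pass the loss (not a single output) so both branches show up in one graph.
make_dot(loss, params=dict(model.named_parameters())).render("two_forwards", format="png")

You should then see the two forward branches merging into the single loss node.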