Tracking Loss through Parallel Paths

Joshua_Clancy · December 14, 2019, 12:54am

Hello!

I am a bit of a noob here, so forgive me if this has a simple answer. For context, I successfully implemented a tied-weight VAE a few months ago, and now I am testing out some other ideas to do with autoencoders. My current model has three parallel paths going through it in an attempt to make three specialized and separate latent spaces.

I am experimenting with how to specialize the latent spaces, but that is not my concern here. This question is specifically about the loss. Right now I have a list of losses (one for each path) and I loop through the losses performing backpropagation for each.

for loss in local_loss_list:
    lossy = loss
    # backpropagation
    optimizer.zero_grad()
    lossy.backward(retain_graph=True)
    # one step of the optimizer (using the gradients from backpropagation)
    optimizer.step()
    total_loss = total_loss + lossy

I did this automatically without much thought, but now I am concerned. Will the loss from each parallel path backpropagate properly down the correct path? If so, how? It seems to work, but I do not understand how each loss is being associated with the correct path of weights.

Thanks!!!

albanD · December 14, 2019, 1:14am

Hi,

Autograd computes gradients for whatever you give it. So if you call backward on a loss that was computed from only one branch, it will backprop through only that branch.
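
As a minimal sketch of that behaviour (the two `nn.Linear` branches, their sizes, and the squared-output loss are made up for illustration and are not the model from the question), calling backward on one branch's loss leaves the other branch's gradients untouched:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical, independent branches standing in for the parallel paths.
branch_a = nn.Linear(4, 2)
branch_b = nn.Linear(4, 2)

x = torch.randn(8, 4)
loss_a = branch_a(x).pow(2).mean()  # depends only on branch_a's parameters
loss_b = branch_b(x).pow(2).mean()  # depends only on branch_b's parameters

# Backpropagate only the first branch's loss.
loss_a.backward()

a_has_grad = branch_a.weight.grad is not None
b_has_grad = branch_b.weight.grad is not None
print(a_has_grad, b_has_grad)  # True False
```

The graph recorded during the forward pass for `loss_a` contains only `branch_a`'s parameters, so those are the only ones that receive a gradient.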

Note that if you just want to get the sum of the gradients and do a single update, you can do:

lossy = sum(local_loss_list)
# backpropagation
optimizer.zero_grad()
lossy.backward()
# one step of the optimizer (using the gradients from backpropagation)
optimizer.step()
total_loss = total_loss + lossy
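
To check that the summed loss gives the same gradients as backpropagating each loss in turn, here is a small sketch. The three `nn.Linear` paths and the `compute_losses` helper are hypothetical stand-ins, and the comparison assumes no optimizer step is taken between the backward calls:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Three hypothetical independent paths (not the model from the question).
model = nn.ModuleList([nn.Linear(4, 2) for _ in range(3)])
x = torch.randn(8, 4)

def compute_losses():
    # One illustrative loss per path.
    return [path(x).pow(2).mean() for path in model]

# Option 1: one backward call per loss, letting .grad accumulate.
model.zero_grad()
for loss in compute_losses():
    loss.backward()
grads_separate = [p.grad.clone() for p in model.parameters()]

# Option 2: a single backward call on the summed loss.
model.zero_grad()
sum(compute_losses()).backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

same = all(torch.allclose(a, b) for a, b in zip(grads_separate, grads_summed))
print(same)  # True
```

This compares gradients only. A loop that also calls `optimizer.step()` after each loss, as in the original question, performs several parameter updates per batch, so its resulting weights may differ from those of a single combined update.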

Joshua_Clancy · December 14, 2019, 6:05am

Yeah, now that you mention it, I think I have read that before. Thank you. So to be clear, keeping the parallel paths' losses separate is not actually doing anything, because autograd's computational graph keeps track of which weights are affecting each loss. So I can just add the losses up and do a single update based on the sum? Is there any scenario where I should be worried about autograd not assigning the gradient to the correct weights that are affecting the loss? Or does it just do all of that for you?

albanD · December 16, 2019, 10:43am

Autograd is a tool that just computes gradients. So for each loss, it will set gradients only on the weights that were actually used to compute that loss.
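
The same rule covers the case where the paths share some weights. In the sketch below, the `shared` layer and the two heads are assumptions made for illustration, not something described in the thread. The shared weight receives the sum of both losses' gradients, while each head receives a gradient only from its own loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

shared = nn.Linear(4, 3)  # hypothetical layer used by both paths
head_a = nn.Linear(3, 1)  # hypothetical path-specific layers
head_b = nn.Linear(3, 1)

x = torch.randn(8, 4)
h = shared(x)
loss_a = head_a(h).pow(2).mean()
loss_b = head_b(h).pow(2).mean()

# Gradient of each loss with respect to the shared weight, computed separately.
# torch.autograd.grad returns the gradients without writing to .grad.
(g_a,) = torch.autograd.grad(loss_a, shared.weight, retain_graph=True)
(g_b,) = torch.autograd.grad(loss_b, shared.weight, retain_graph=True)

# Backpropagating the sum populates .grad on every parameter in one pass.
(loss_a + loss_b).backward()

shared_is_sum = torch.allclose(shared.weight.grad, g_a + g_b, atol=1e-6)
heads_have_grad = head_a.weight.grad is not None and head_b.weight.grad is not None
print(shared_is_sum, heads_have_grad)  # True True
```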