Does adding a float variable as an additional loss term work?

TheIllusion · March 5, 2018, 2:55am

Suppose I have a loss term as shown below:

G_sample = G(z)
D_fake = D(G_sample)

# Generator loss
G_loss = 0.5 * torch.mean((D_fake - 1)**2)

G_loss.backward()
G_solver.step()
reset_grad()

If I add any ordinary float variable to G_loss before .backward(),
for example,
G_loss = G_loss + avg_pixel_value

Does avg_pixel_value have any effect during backprop?
Or is it completely ignored?

Since avg_pixel_value itself has no .backward() method, I doubt it has any effect.

My question is: can I add ordinary float terms to a PyTorch loss?

Many thanks.

jpeg729 · March 5, 2018, 6:33am

To differentiate a sum of two terms, you differentiate each term separately. Hence adding a constant value to your loss has no effect whatsoever on the gradients produced by backpropagation.
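A minimal sketch of this point, assuming a small linear layer as a hypothetical stand-in for the real model (the names model, x and grads_for are illustrative only and do not come from the original code). It compares the gradients of the same loss with and without an added Python float:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the model: a single linear layer.
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)


def grads_for(extra):
    """Return the parameter gradients of (loss + extra)."""
    model.zero_grad()
    loss = 0.5 * torch.mean((model(x) - 1) ** 2)
    (loss + extra).backward()
    return [p.grad.clone() for p in model.parameters()]


grads_plain = grads_for(0.0)
grads_shifted = grads_for(123.4)  # a plain Python float, like avg_pixel_value

# The constant shifts the loss value but leaves every gradient unchanged.
same = all(torch.allclose(a, b) for a, b in zip(grads_plain, grads_shifted))
print(same)
```

The float changes the number printed for the loss, but the optimizer step only uses the gradients, so training behaves the same.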

TheIllusion · March 5, 2018, 6:42am

Thanks for the reply.

Even though the additional term I mentioned doesn’t support .backward(), it’s not a constant but a variable (a Python float).

In my sample code,
I think avg_pixel_value might affect the value of G_loss.
(avg_pixel_value changes every iteration)

G_loss = G_loss + avg_pixel_value

Is there any fault in my understanding?

jpeg729 · March 5, 2018, 6:59am

avg_pixel_value is calculated from the input image without using any of the model parameters or outputs, so for backpropagation with respect to the model parameters, it is ignored.

Might I suggest a lecture on backpropagation by Andrej Karpathy.

TheIllusion · March 5, 2018, 7:06am

Thanks for the answer.

What if I use the model’s output as below:
G_loss = G_loss + mean_output_pixel_value

But mean_output_pixel_value is merely a float variable that doesn’t have graph information.

Can mean_output_pixel_value affect back propagation?

jpeg729 · March 5, 2018, 7:27am

Well, when you do G_loss.backward(), PyTorch first looks at what operation produced G_loss, in this case

G_loss = G_loss + mean_output_pixel_value

so pytorch retrieves the values that were summed and basically runs backward on both of them.
But mean_output_pixel_value is either a float value which has no .backward() method, or it is a Variable calculated from input_image which is a Variable with requires_grad=False. Therefore mean_output_pixel_value.backward() does nothing.

I shall try another explanation… The reason for running G_loss.backward() is to calculate a measure of how each parameter affected the value of G_loss, this measure is stored in param.grad for each parameter.

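Now, as far as autograd can tell, the model parameters have no effect whatsoever on mean_output_pixel_value: once the value is a plain float it carries no graph information, so it is treated as a constant. Adding it to G_loss will therefore not change the calculated param.grad values in any way at all.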
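To make the float-versus-tensor distinction concrete, here is a small sketch. It assumes a linear layer G as a hypothetical generator and assumes the float was obtained with .item(); the thread does not say how mean_output_pixel_value was actually computed.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the generator.
G = torch.nn.Linear(3, 5)
z = torch.randn(2, 3)

out = G(z)
as_float = out.mean().item()  # Python float: the graph information is dropped
as_tensor = out.mean()        # tensor: still connected to G's parameters

print(type(as_float).__name__)        # float
print(as_tensor.grad_fn is not None)  # True

# Only the tensor version can send gradients back to G.
as_tensor.backward()
has_grad = G.weight.grad is not None
print(has_grad)
```

If the mean of the output is meant to act as a real loss term, it would need to stay a tensor like as_tensor rather than be converted to a float.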

TheIllusion · March 6, 2018, 3:36am

I’m still confused.

After running
G_loss = G_loss + mean_output_pixel_value,

G_loss will be updated.

Backpropagation is computed by multiplying G_loss with the input values of each later layer.

That’s why I thought that the updated G_loss will have some effect during backprop.

jpeg729 · March 6, 2018, 8:03am

When you write G_loss = G_loss + mean_output_pixel_value, Python first calculates G_loss + mean_output_pixel_value and stores the result in a new tensor, then rebinds the name G_loss to point to this result.

That line does not update the data stored in G_loss. It basically acts like this: new_G_loss = old_G_loss + mean_output_pixel_value.

If you wanted to do an inplace addition, you could do

G_loss += mean_output_pixel_value

but that would either be non-differentiable, or pytorch would assume you meant new_G_loss = old_G_loss + mean_output_pixel_value
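The rebinding behaviour of the out-of-place form can be checked directly. This is a sketch with a hypothetical scalar parameter w standing in for the model, and 10.0 standing in for mean_output_pixel_value:

```python
import torch

# Hypothetical scalar parameter standing in for the model.
w = torch.tensor(2.0, requires_grad=True)

old_G_loss = w ** 2
G_loss = old_G_loss

G_loss = G_loss + 10.0  # builds a new tensor and rebinds the name G_loss

print(G_loss is old_G_loss)  # False: two different tensor objects
print(old_G_loss.item())     # 4.0: the original data is untouched
print(G_loss.item())         # 14.0

G_loss.backward()
print(w.grad.item())         # 4.0, i.e. d(w**2)/dw at w = 2
```

The added 10.0 shows up in the value of G_loss but not in w.grad, which is the same gradient that old_G_loss alone would have produced.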

TheIllusion · March 6, 2018, 9:46am

Thanks for the great answer

Surya_Narayanan · June 8, 2023, 6:37pm

I have a question along these lines

I have a loss and a penalty term

Something like loss = function(inputs) + lambda * penalty(inputs), where function and penalty are both functions of my inputs and lambda is a constant.

However, I notice that my loss term behaves independently of lambda, i.e. it doesn’t change the learning trajectory despite me changing lambda.

Any thoughts why?

ptrblck · June 8, 2023, 10:54pm

Are you able to call loss.backward() if only penalty(inputs) is used?
If not, this would indicate that penalty is detaching the inputs and might create a constant which will not influence the gradients.
On the other hand, if loss.backward() still works and creates valid gradients in the parameters of the model, could you post a minimal and executable code snippet to reproduce the issue?
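As an illustration of the detaching case, here is a sketch with two hypothetical penalty functions (detached_penalty and attached_penalty are made-up names, and w is a stand-in parameter). One returns a Python float via .item(), the other keeps the result as a tensor:

```python
import torch

torch.manual_seed(0)

w = torch.randn(3, requires_grad=True)  # hypothetical parameter
x = torch.randn(3)


def detached_penalty(v):
    # .item() returns a Python float, so autograd sees a constant.
    return (v ** 2).sum().item()


def attached_penalty(v):
    # Stays a tensor, so gradients flow back to v.
    return (v ** 2).sum()


def grad_with(penalty, lam):
    """Gradient of (data term + lam * penalty) with respect to w."""
    loss = (w * x).sum() + lam * penalty(w)
    (g,) = torch.autograd.grad(loss, w)
    return g


# With the detached penalty, changing lam does not change the gradient.
detached_same = torch.allclose(
    grad_with(detached_penalty, 0.1), grad_with(detached_penalty, 10.0)
)
# With the attached penalty, the gradient depends on lam.
attached_same = torch.allclose(
    grad_with(attached_penalty, 0.1), grad_with(attached_penalty, 10.0)
)
print(detached_same, attached_same)
```

A quick check in real code is whether the penalty is still a tensor with requires_grad set to True before it is added to the loss; a float, a NumPy value, or a tensor produced after .detach() will not be.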