Does adding a float variable as an additional loss term work?

Suppose I have a loss term as shown below:

G_sample = G(z)
D_fake = D(G_sample)

# Generator loss
G_loss = 0.5 * torch.mean((D_fake - 1)**2)

G_loss.backward()
G_solver.step()
reset_grad()

If I add any ordinary float variable to G_loss before .backward(),
for example,
G_loss = G_loss + avg_pixel_value

Does avg_pixel_value have any effect during backprop?
Or is it completely ignored?

Since avg_pixel_value itself has no .backward() method, I doubt that it has any effect.

My question is: can I add ordinary float terms to a PyTorch loss?

Many thanks.

To differentiate a sum of two terms, you differentiate each term separately. Hence adding a constant value to your loss has no effect whatsoever on the gradients produced by backpropagation.
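The point above can be checked numerically. The following is a minimal sketch, not the original poster's model: the weight vector `w`, the data `x`, and the helper `grad_of` are hypothetical stand-ins for the generator and discriminator, and the loss only mirrors the shape of the G_loss in the question.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the G/D pipeline: a single weight vector.
w = torch.randn(3, requires_grad=True)
x = torch.randn(5, 3)


def base_loss():
    d_fake = x @ w  # plays the role of D(G(z))
    return 0.5 * torch.mean((d_fake - 1) ** 2)


def grad_of(loss_fn):
    """Return d(loss)/dw for the given loss, starting from an empty .grad."""
    w.grad = None
    loss_fn().backward()
    return w.grad.clone()


avg_pixel_value = 0.37  # plain Python float, assumed value for illustration

grad_plain = grad_of(base_loss)
grad_shifted = grad_of(lambda: base_loss() + avg_pixel_value)

# The loss value shifts by the constant, but the gradients match.
print(torch.allclose(grad_plain, grad_shifted))
```

The reported loss value does change by avg_pixel_value, which is why the number printed during training looks different even though the parameter updates are the same.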

Thanks for the reply.

Even though the additional term I mentioned doesn’t support .backward(), it’s not a constant but a variable (a Python float).

In my sample code,
I think avg_pixel_value might affect the value of G_loss.
(avg_pixel_value changes every iteration)

G_loss = G_loss + avg_pixel_value

Is there any fault in my understanding?

avg_pixel_value is calculated from the input image without using any of the model parameters or outputs, so for backpropagation with respect to the model parameters, it is ignored.

Might I suggest a lecture on backpropagation by Andrej Karpathy.

Thanks for the answer.

What if I use the model’s output, as below:
G_loss = G_loss + mean_output_pixel_value

But mean_output_pixel_value is merely a float variable that doesn’t have graph information.

Can mean_output_pixel_value affect back propagation?


Well, when you call G_loss.backward(), PyTorch first looks at which operation produced G_loss, in this case

G_loss = G_loss + mean_output_pixel_value

so PyTorch retrieves the values that were summed and basically runs backward on both of them.
But mean_output_pixel_value is either a float value, which has no .backward() method, or a Variable calculated from input_image, which is a Variable with requires_grad=False. Therefore mean_output_pixel_value.backward() does nothing.

I shall try another explanation… The reason for running G_loss.backward() is to calculate a measure of how each parameter affected the value of G_loss; this measure is stored in param.grad for each parameter.

Now, since mean_output_pixel_value is a plain float that carries no graph information, autograd sees no dependence of it on the model parameters, so adding it to G_loss will not change the calculated param.grad values in any way at all.
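To make the difference between a float and a graph-connected tensor concrete, here is a small sketch under stated assumptions: `w`, `z`, the `tanh` "model", and the helper `grad_with` are hypothetical, and `.item()` is used as one common way a model output ends up as a Python float.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, requires_grad=True)  # hypothetical model parameter
z = torch.randn(8, 4)                   # hypothetical input batch


def grad_with(extra_kind):
    """Compute d(total loss)/dw with no extra term, a float term, or a tensor term."""
    w.grad = None
    output = torch.tanh(z @ w)  # stand-in for the model output
    loss = 0.5 * torch.mean((output - 1) ** 2)
    mean_out = output.mean()
    if extra_kind == "float":
        loss = loss + mean_out.item()  # Python float: no graph attached
    elif extra_kind == "tensor":
        loss = loss + mean_out         # tensor: still connected to w
    loss.backward()
    return w.grad.clone()


grad_none = grad_with("none")
grad_float = grad_with("float")
grad_tensor = grad_with("tensor")

print(torch.allclose(grad_none, grad_float))   # float term leaves gradients alone
print(torch.allclose(grad_none, grad_tensor))  # tensor term changes them
```

So if the mean output pixel value is meant to influence training, it has to stay a tensor computed from the model output with differentiable operations, rather than being converted to a float first.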

I’m still confused.

After running
G_loss = G_loss + mean_output_pixel_value,

G_loss will be updated.

As I understand it, backpropagation is computed by multiplying G_loss with the input values of each later layer.

That’s why I thought the updated G_loss would have some effect during backprop.

When you write G_loss = G_loss + mean_output_pixel_value, Python first calculates G_loss + mean_output_pixel_value and stores the result in a new tensor, then rebinds the name G_loss to point to this result.

That line does not update the data stored in G_loss. It basically acts like this: new_G_loss = old_G_loss + mean_output_pixel_value.

If you wanted to do an in-place addition, you could do

G_loss += mean_output_pixel_value

but that would either be non-differentiable, or PyTorch would assume you meant new_G_loss = old_G_loss + mean_output_pixel_value
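The rebinding behaviour described above can be seen in a few lines. This is an illustrative sketch only; the tiny parameter `w` and the value 0.25 are made up for the example.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter
old_G_loss = (w ** 2).sum()                       # value 5.0
mean_output_pixel_value = 0.25                    # plain Python float

G_loss = old_G_loss
G_loss = G_loss + mean_output_pixel_value  # new tensor; the name is rebound

print(G_loss is old_G_loss)  # False: two distinct tensor objects
print(old_G_loss.item())     # 5.0, the original tensor is untouched
print(G_loss.item())         # 5.25

G_loss.backward()
print(w.grad)  # tensor([2., 4.]), i.e. 2 * w, the same as without the float
```

The gradient that flows back depends on how the loss was computed from the parameters, not on the numeric value of the loss, so the extra 0.25 never reaches w.grad.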


Thanks for the great answer


I have a question along these lines.

I have a loss and a penalty term,

so something like loss = function(inputs) + lambda * penalty(inputs), where function and penalty are both functions of my inputs and lambda is a constant.

However, I notice that my loss term behaves independently of lambda, i.e. it doesn’t change the learning trajectory despite me changing lambda.

Any thoughts why?

Are you able to call loss.backward() if only penalty(inputs) is used?
If not, this would indicate that penalty is detaching the inputs and might create a constant which will not influence the gradients.
On the other hand, if loss.backward() still works and creates valid gradients in the parameters of the model, could you post a minimal and executable code snippet to reproduce the issue?
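The check suggested above might look like the following sketch. Both penalty functions are hypothetical: `penalty_attached` keeps the graph, while `penalty_detached` imitates one plausible bug (converting to a float and wrapping the result in a fresh tensor), which is only an assumption about what could be going wrong.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # hypothetical model parameter
inputs = torch.randn(6, 3)


def penalty_attached(pred):
    # Differentiable: built only from tensor operations on pred.
    return (pred ** 2).mean()


def penalty_detached(pred):
    # Assumed bug for illustration: the float round trip drops the graph.
    return torch.tensor(float((pred.detach() ** 2).mean()))


pred = inputs @ w
p_ok = penalty_attached(pred)
p_bad = penalty_detached(pred)

# A graph-connected penalty has requires_grad=True and a grad_fn.
print(p_ok.requires_grad, p_ok.grad_fn is not None)
print(p_bad.requires_grad, p_bad.grad_fn)

# Calling backward() on the detached penalty alone raises an error,
# which is the symptom the reply above asks about.
try:
    p_bad.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True
print(backward_failed)
```

If the penalty behaves like `p_bad`, then lambda * penalty(inputs) is a constant as far as autograd is concerned, and changing lambda cannot alter the gradients.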