To differentiate a sum of two terms, you differentiate each term separately. Hence adding a constant value to your loss has no effect whatsoever on the gradients produced by backpropagation.
avg_pixel_value is calculated from the input image without using any of the model parameters or outputs, so for backpropagation with respect to the model parameters, it is ignored.
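A minimal sketch of this (with a hypothetical model and random data standing in for your setup): the gradients after backward are identical whether or not the input-derived constant is added to the loss.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)  # input; requires_grad is False by default

# Loss without the constant term
loss = model(x).pow(2).mean()
loss.backward()
grads_plain = [p.grad.clone() for p in model.parameters()]

# Same loss plus a value computed from the input alone
model.zero_grad()
avg_pixel_value = x.mean()  # no model parameters involved
loss = model(x).pow(2).mean() + avg_pixel_value
loss.backward()
grads_shifted = [p.grad.clone() for p in model.parameters()]

for g_plain, g_shifted in zip(grads_plain, grads_shifted):
    print(torch.allclose(g_plain, g_shifted))  # True for every parameter
```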
Well, when you call G_loss.backward(), PyTorch first looks at which operation produced G_loss — in this case
G_loss = G_loss + mean_output_pixel_value
so PyTorch retrieves the two values that were summed and essentially runs backward through both of them.
But mean_output_pixel_value is either a plain float, which has no .backward() method, or it is a Variable calculated from input_image, which is a Variable with requires_grad=False. Either way, backpropagating through mean_output_pixel_value contributes nothing.
I shall try another explanation… The reason for running G_loss.backward() is to calculate a measure of how each parameter affected the value of G_loss, this measure is stored in param.grad for each parameter.
Now obviously, the model parameters have no effect whatsoever on the mean_output_pixel_value, so adding it to G_loss will not change the calculated param.grad values in any way at all.
When you write G_loss = G_loss + mean_output_pixel_value, Python first calculates G_loss + mean_output_pixel_value, stores the result in a new tensor, and then rebinds the name G_loss to that result.
That line does not update the data stored in the original G_loss tensor. It behaves like this: new_G_loss = old_G_loss + mean_output_pixel_value.
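A tiny sketch of that rebinding (values chosen arbitrarily): the addition builds a brand-new tensor, and the old one is left untouched.

```python
import torch

old_G_loss = torch.tensor(2.0)
G_loss = old_G_loss + 1.0  # out-of-place: builds a new tensor

print(G_loss is old_G_loss)                # False: a different object
print(old_G_loss.item(), G_loss.item())    # 2.0 3.0 — original untouched
```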
If you wanted to do an inplace addition, you could do
G_loss += mean_output_pixel_value
but that would either fail (PyTorch raises an error for in-place operations on tensors whose values are needed for the backward pass), or PyTorch would treat it exactly as new_G_loss = old_G_loss + mean_output_pixel_value — so it makes no difference here.
So, something like loss = function(inputs) + lambda * penalty(inputs), where function and penalty are both functions of my inputs and lambda is a constant.
However, I notice that my loss behaves independently of lambda, i.e. changing lambda doesn't change the learning trajectory at all.
Are you able to call loss.backward() if only penalty(inputs) is used?
If not, that would indicate that penalty is detaching the computation graph and producing a constant, which cannot influence the gradients.
On the other hand, if loss.backward() still works and creates valid gradients in the parameters of the model, could you post a minimal and executable code snippet to reproduce the issue?
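The suggested check can be sketched like this — model, inputs, and penalty are hypothetical stand-ins for the actual code, with the penalty deliberately detaching to show what the failure mode looks like:

```python
import torch

model = torch.nn.Linear(3, 1)
inputs = torch.randn(5, 3)

def penalty(x):
    # If the penalty detaches the graph like this, it becomes a constant
    return model(x).detach().abs().mean()

loss = penalty(inputs)
try:
    loss.backward()
    has_grads = any(p.grad is not None for p in model.parameters())
    print("backward ran, gradients present:", has_grads)
except RuntimeError as e:
    # A detached scalar does not require grad, so backward() fails here
    print("backward failed:", e)
```

If your real penalty behaves like this detached version, backward() either errors out or leaves param.grad untouched, and lambda has no way to affect training.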