Gradients not being calculated as expected

In the following example, the gradient values for critic.bias.grad and critic.weight.grad are zero, but the loss is -0.5. I set up the custom loss to make sure there was nothing funny happening with mse_loss but using mse_loss gives the same result. Why is this the case? I would expect bias grad to be 1 because:
Error = ((target_output - output)^2)/2, so dE/dY would be 2*((out[1]-1)-out[1])*(-1)/2 = 1

Any help understanding why this happens would be very useful for implementing custom layers successfully in PyTorch.

import torch.nn as nn
import torch

class CartPoleNN(nn.Module):
    def __init__(self, input_size):
        super(CartPoleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, 20)  # Fully connected layer 1
        # Assumed: constant init to 0.4. Only ", 0.4)" survives of the
        # original init calls, so nn.init.constant_ is a guess.
        nn.init.constant_(self.fc1.weight, 0.4)
        nn.init.constant_(self.fc1.bias, 0.4)

        # Actor network
        self.actor = nn.Sequential(
            nn.Linear(20, 2),
            nn.Softmax(dim=-1)  # Apply softmax to get action probabilities
        )
        for layer in self.actor:
            if isinstance(layer, nn.Linear):
                nn.init.constant_(layer.weight, 0.4)  # assumed, see above
                nn.init.constant_(layer.bias, 0.4)

        # Critic network
        self.critic = nn.Sequential(
            nn.Linear(20, 1),
        )
        for layer in self.critic:
            if isinstance(layer, nn.Linear):
                nn.init.constant_(layer.weight, 0.4)  # assumed, see above
                nn.init.constant_(layer.bias, 0.4)

    def forward(self, x):
        x = self.fc1(x)  # Pass through the first fully connected layer
        action_probs = self.actor(x)
        critic_value = self.critic(x)
        return action_probs, critic_value

class CustomLoss(nn.Module):
    def __init__(self):
        super(CustomLoss, self).__init__()

    def forward(self, inputs, targets):
        loss = -0.5 * (targets - inputs)**2
        return loss.sum()

model = CartPoleNN(4)
Inputs = torch.tensor([0.0132, -0.2175, -0.0469, 0.2295])
out = model(Inputs)
Values = out[1]
AdvantagePY = out[1]
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Loss = CustomLoss()
CriticLoss = Loss(out[1], out[1]-1)  # target is built from the output itself
model.critic.zero_grad() # Zero the gradients

The loss here is always -0.5 * (out[1] -1 - out[1])**2 = -0.5
Since -0.5 is a constant, and the derivative of a constant with respect to anything is 0, the derivative of the loss with respect to any of the parameters is 0.
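To make this concrete, here is a minimal sketch, not the original poster's exact model: a single `nn.Linear(20, 1)` stands in for the critic head, and the input `x` is random (both are assumptions for illustration). It compares a target that stays attached to the graph with the same target after `.detach()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
critic = nn.Linear(20, 1)  # hypothetical stand-in for the critic head
x = torch.randn(20)        # hypothetical input features

# Case 1: the target (out - 1) is still part of the autograd graph,
# so the two paths through `out` cancel and the loss is a constant.
out = critic(x)
loss = (-0.5 * ((out - 1) - out) ** 2).sum()
loss.backward()
grad_attached = critic.bias.grad.clone()

# Case 2: the target is detached, so autograd treats it as a fixed number.
critic.zero_grad()
out = critic(x)
target = (out - 1).detach()
loss = (-0.5 * (target - out) ** 2).sum()
loss.backward()
grad_detached = critic.bias.grad.clone()

print(grad_attached)  # zero: the loss does not depend on the parameters
print(grad_detached)  # about -1 with the -0.5 factor used in the question
```

Note that with the `-0.5` factor from `CustomLoss`, the detached case gives a bias gradient of about -1 rather than +1; the +1 in the question corresponds to a loss of `+0.5 * (targets - inputs)**2`.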

Hi soulitzer:
Thanks for your kind reply. As you point out, the derivative will be zero in this case because it’s a constant. In my original example I had (target_output - expected_output) = 1, but target_output was not equal to out[1] - 1. To simplify the post and recreate the value I used this example, which turns the loss into a constant and doesn’t help show my issue at all, so I apologize for wasting your time.
It turns out that the issue is with scaling the error before passing it to the loss.
When I have Advantage = (Reward - Values) (with both being vectors), I’m trying to use
Advantage = Advantage/Advantage.abs().max() to normalize the gradient values to [-1, 1], and this is what’s producing the unexpected gradient values for me. I’m trying to scale the gradients based on the value in advantage = (target_output - critic_output). With a manual implementation I did elsewhere I’m able to scale the gradients like this, but it seems PyTorch does not allow it, probably because of the way autograd handles the loss. I will look into it further and maybe create a new post (this time with an acceptable, non-constant working example).
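One possible explanation, offered only as a sketch since the thread does not show the real code: when `Advantage.abs().max()` is computed from tensors that require grad, the scale factor is itself part of the graph, so autograd also differentiates through the `max`. The element that attains the maximum becomes exactly ±1 after scaling, so its own squared term is a constant again. The `rewards` and `values` below are made-up numbers, and a `0.5 * adv**2` loss is assumed.

```python
import torch

rewards = torch.tensor([1.0, 0.5, -2.0])                     # hypothetical targets
values = torch.tensor([0.2, 0.1, 0.3], requires_grad=True)   # stand-in for critic outputs

def grad_wrt_values(loss):
    # Gradient of a scalar loss with respect to `values`.
    (g,) = torch.autograd.grad(loss, values)
    return g

# (a) Scale factor left in the graph: autograd differentiates through max().
adv = rewards - values
g_in_graph = grad_wrt_values((0.5 * (adv / adv.abs().max()) ** 2).sum())

# (b) Scale factor detached: it acts as a plain constant.
adv = rewards - values
scale = adv.abs().max().detach()
g_detached = grad_wrt_values((0.5 * (adv / scale) ** 2).sum())

# (c) Unscaled reference gradient.
adv = rewards - values
g_plain = grad_wrt_values((0.5 * adv ** 2).sum())

print(g_in_graph)
print(g_detached)
print(g_plain / scale ** 2)  # matches (b): a constant rescaling of (c)
```

In this sketch only the detached version is a pure rescaling of the unscaled gradient; the in-graph version differs at the element that holds the maximum. Whether that matches the manual implementation mentioned above is an assumption.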
Appreciate your help, and again, sorry for wasting your time.