RuntimeError : element 0 of tensors does not require grad and does not have a grad_fn

christian12 · May 12, 2023, 12:47pm

Hi,

I’m trying to use the “Artificial Trust-Based Task Allocation (ATTA)” code from this github: Link
I am using Python 3.8.3 and PyTorch 1.7.1 as stated on github. All other packages were installed with pip and are the current available versions.

I’ve been attempting to debug the issue, but I’m unsure what the problem is and how to solve it.

Running file “atta_caseII.py” gives me an error in Line 527.
The code runs for some time until the error occurs.
Adding the full code here exceeds the character limit.
Full code of “atta_caseII.py”: https://github.com/arshaali/artificial-trust-task-allocation/blob/main/code/atta_caseII.py

Line 527:

loss.backward() #take deriv of loss function wrt the model parameters

Error:

Exception has occurred: RuntimeError 
element 0 of tensors does not require grad and does not have a grad_fn 
  File "C:\[PATH_TO_FILE]\code\atta_caseII.py", line 527, in closure      
     loss.backward() #take deriv of loss function wrt the model parameters    
  File "C:\[PATH_TO_FILE]\code\atta_caseII.py", line 538, in <module>      
     optimizer.step(closure) #optimizer calculates the gradient and adjusts the parameters that will minimize the loss function  
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

The code around Line 527:

def closure(): #closure function must be defined for pytorch
    #this will calculate the gradients. this runs everytime.
    #diff1 = model(bin_c, obs_probs_idxs)
                            
    #diff = torch.tensor(model(bin_c, obs_probs_idxs) - obs_probs_vect, requires_grad=True)
    diff = model(bin_c, obs_probs_idxs) - obs_probs_vect #the diff between trust estimated from artificial trust model and trust approximation
    #print("model diff = ", diff1)
    #print("obs_probs_vect = ", obs_probs_vect)
    #print("diff = ", diff)
    #diff.retain_grad()

    # loss = torch.tensor(torch.mean( torch.pow(diff, 2.0) ), requires_grad=True) #loss needs to be defined in pytorch
                            
    loss = torch.mean( torch.pow(diff, 2.0) ) #calculate the current loss
    #loss.retain_grad() #Something to try if current implementation doesnt work
    #loss = torch.mean( torch.pow( (model(bin_c, obs_probs_idxs) - obs_probs_vect), 2.0 ) )
    #print("loss = ", loss)
    #print("loss.grad_fn = ", loss.grad_fn)
    optimizer.zero_grad() #standard command to give. Sets the gradients all to 0.
    #print("ran zero grad")
    loss.backward() #take deriv of loss function wrt the model parameters 
    #pytorch lets you choose a function to minimize. We are minimizing the loss function defined above.
    #print("ran loss backward")
    return loss

    #print("_l1 = ", model.sigm( model.pre_l_1))
    #print("_u1 = ", model.sigm( model.pre_u_1))
    #print("_l2 = ", model.sigm( model.pre_l_2))
    #print("_u2 = ", model.sigm( model.pre_u_2))
                        
                        
optimizer.step(closure) #optimizer calculates the gradient and adjusts the parameters that will minimize the loss function
#running the optimizer to update the parameters below

ptrblck · May 13, 2023, 9:46am

Based on the stacktrace it seems as it loss is detached from the computation graph. Could you check if the model output tensor has a valid .grad_fn or is also already detached?

christian12 · May 13, 2023, 12:14pm

It’s grad_fn: None.

I tried different things and the code works now. What ultimately lead to the solution was a warning section containing requires_grad_() in the PyTorch documentation about torch.tensor.
This is the code which works now:

def closure():
    loss = torch.mean( torch.pow( (model(bin_c, obs_probs_idxs) - obs_probs_vect), 2.0 ) ) 
    loss.requires_grad_(True)
    optimizer.zero_grad()
    loss.backward() 
    return loss
optimizer.step(closure)

I think this code is ok?
Thanks

ptrblck · May 13, 2023, 12:40pm

No, I don’t think your code fixes the actual issue and just masks it by explicitly calling .requires_grad_() on the already detached loss. Could you post the model definition so that I could check if the activation tensors are detached there?

christian12 · May 13, 2023, 1:13pm

Thank you for your quick reply.

I just started with PyTorch but I think you ask about these lines, right?

class RobotTrustModel(torch.nn.Module):

    def __init__(self):
        super(RobotTrustModel, self).__init__()

        #beta is not relevant for artificial trust model (see BTM paper for natural trust)
        self.pre_beta_1 = Parameter(dtype(4.0 * np.ones(1)), requires_grad=True)
        self.pre_beta_2 = Parameter(dtype(4.0 * np.ones(1)), requires_grad=True)

        #instead of using lower bounds of 0 and upper bounds of 1, the code is more
        #stable when using the range [-10,10] and then converting back later to [0,1].
        #requires_grad=True means it is part of the gradient computation (i.e., the value should be updated when being optimized)
        self.pre_l_1 = Parameter(dtype(-10.0 * np.ones(1)), requires_grad=True)
        self.pre_u_1 = Parameter(dtype( 10.0 * np.ones(1)), requires_grad=True)
        self.pre_l_2 = Parameter(dtype(-10.0 * np.ones(1)), requires_grad=True)
        self.pre_u_2 = Parameter(dtype( 10.0 * np.ones(1)), requires_grad=True)

The whole code can be found here:
https://github.com/arshaali/artificial-trust-task-allocation/blob/main/code/atta_caseII.py

christian12 · May 23, 2023, 9:56am

After some more digging I think I found the problem.

What I didn’t know was this:
If any operations are applied to two tensors, where at least one of them has requires_grad=True, the resulting tensor will also have requires_grad=True .

And this is what happens here. It happen that the returned tensor is a product of tensors without requires_grad=True.

That’s the problem right?

Here is my detailed Explanation:
Here we set requires_grad=True, that’s fine.
Code A:

...
from torch.nn import Parameter
...
dtype = torch.DoubleTensor
...
class RobotTrustModel(torch.nn.Module):
    def __init__(self):
       super(RobotTrustModel, self).__init__()
           self.pre_beta_1 = Parameter(dtype(4.0 * np.ones(1)), requires_grad=True)
           self.pre_beta_2 = Parameter(dtype(4.0 * np.ones(1)), requires_grad=True)

           self.pre_l_1 = Parameter(dtype(-10.0 * np.ones(1)), requires_grad=True) 
           self.pre_u_1 = Parameter(dtype( 10.0 * np.ones(1)), requires_grad=True) 
           self.pre_l_2 = Parameter(dtype(-10.0 * np.ones(1)), requires_grad=True) 
           self.pre_u_2 = Parameter(dtype( 10.0 * np.ones(1)), requires_grad=True) 
...

Coming from here:

...
def closure(): 
    diff = model(bin_c, obs_probs_idxs) - obs_probs_vect 
    loss = torch.mean( torch.pow(diff, 2.0) )  
    optimizer.zero_grad()  
    loss.backward() 
    return loss 

optimizer.step(closure)
...

the function “forward” which contains this among other things is called:
Code B:

def forward(self, bin_centers, obs_probs_idxs):
   ...
   for i in range(n_diffs): #loop over the number of the [row,col] indexes (max is nbins x nbins (625))
        # get the index for the row and column
        bin_center_idx_1 = obs_probs_idxs[i, 0] #get the ith cell row
        bin_center_idx_2 = obs_probs_idxs[i, 1] #get the ith cell col
       # pass l, u, beta & the bin to compute_trust for 1 & 2
        trust[i] = self.compute_trust(l_1, u_1, beta_1, bin_centers[bin_center_idx_1]) * self.compute_trust(l_2, u_2, beta_2, bin_centers[bin_center_idx_2])
   ...
   return trust

In order to get tensor trust[i] the function “compute_trust” has to be called twice in each iteration of the loop.
l_1, u_1 & beta_1 and those for _2 all have requires_grad=True because they are all a result of operations of tensors (defined in Code A) where requires_grad=True.

Function “compute_trust”:
Code C:

def compute_trust(self, l, u, b, p):
            trust = 1.0 - 1.0 / (b * (u - l)) * torch.log( (1.0 + torch.exp(b * (p - l))) / (1.0 + torch.exp(b * (p - u))) )

        else: 
            if p <= l: 
                trust = torch.tensor([1.0], requires_grad=True) #assign a trust of 1
            elif p > u: 
                trust = torch.tensor([0.0], requires_grad=True) #assign a trust of 0
            else:
                trust = (u - p) / (u - l + 0.0001) 
        
        return trust

Here is the problem. trust = torch.tensor([1.0] and trust = torch.tensor([0.0] don’t have requires_grad=True set.
In the case that both function calls of “compute_trust” within the loop in Code B always get either trust = torch.tensor([1.0] or trust = torch.tensor([0.0] as a return the returned trust in Code B also won’t have a required_trust= True which results in an error.

Changing trust = torch.tensor([1.0] and trust = torch.tensor([0.0] in Code C to this and the code works:

       if p <= l:  
                trust = torch.tensor([1.0] , requires_grad=True) 
       elif p > u:  
                trust = torch.tensor([0.0] , requires_grad=True)

.
.
.
.
.
The complete relevant code:

...
def closure(): 
    diff = model(bin_c, obs_probs_idxs) - obs_probs_vect 
    loss = torch.mean( torch.pow(diff, 2.0) )  
    optimizer.zero_grad()  
    loss.backward() 
    return loss 

optimizer.step(closure)
...

class RobotTrustModel(torch.nn.Module):

    def __init__(self):
        super(RobotTrustModel, self).__init__()

        #beta is not relevant for artificial trust model (see BTM paper for natural trust)
        self.pre_beta_1 = Parameter(dtype(4.0 * np.ones(1)), requires_grad=True)
        self.pre_beta_2 = Parameter(dtype(4.0 * np.ones(1)), requires_grad=True)

        #instead of using lower bounds of 0 and upper bounds of 1, the code is more
        #stable when using the range [-10,10] and then converting back later to [0,1].
        #requires_grad=True means it is part of the gradient computation (i.e., the value should be updated when being optimized)
        self.pre_l_1 = Parameter(dtype(-10.0 * np.ones(1)), requires_grad=True)
        self.pre_u_1 = Parameter(dtype( 10.0 * np.ones(1)), requires_grad=True)
        self.pre_l_2 = Parameter(dtype(-10.0 * np.ones(1)), requires_grad=True)
        self.pre_u_2 = Parameter(dtype( 10.0 * np.ones(1)), requires_grad=True)


    def forward(self, bin_centers, obs_probs_idxs):
        # the bins and hols [row,col] indexes which are not nan
        #bin_centers and obs_probs_idxs are passed in
        
        n_diffs = obs_probs_idxs.shape[0] #the number of [row,col] indexes (max is nbins x nbins (625))
        trust = torch.zeros(n_diffs) #create a 1xn_diffs array of 0s

        if(self.pre_l_1 > self.pre_u_1): #if the lower bound is greater than the upper bound
            buf = self.pre_l_1 #switch the l_1 and u_1 values
            self.pre_l_1 = self.pre_u_1
            self.pre_u_1 = buf

        if(self.pre_l_2 > self.pre_u_2): #if the lower bound is greater than the upper bound
            buf = self.pre_l_2 #switch the l_2 and u_2 values
            self.pre_l_2 = self.pre_u_2
            self.pre_u_2 = buf

        l_1 = self.sigm(self.pre_l_1) #convert to [0,1] range
        u_1 = self.sigm(self.pre_u_1)
        beta_1 = self.pre_beta_1 * self.pre_beta_1 #want beta to be positive to compute trust using the artificial trust model

        l_2 = self.sigm(self.pre_l_2)
        u_2 = self.sigm(self.pre_u_2)
        beta_2 = self.pre_beta_2 * self.pre_beta_2

        for i in range(n_diffs): #loop over the number of the [row,col] indexes (max is nbins x nbins (625))
            # get the index for the row and column
            bin_center_idx_1 = obs_probs_idxs[i, 0] #get the ith cell row
            bin_center_idx_2 = obs_probs_idxs[i, 1] #get the ith cell col
            # pass l, u, beta & the bin to compute_trust for 1 & 2
            trust[i] = self.compute_trust(l_1, u_1, beta_1, bin_centers[bin_center_idx_1]) * self.compute_trust(l_2, u_2, beta_2, bin_centers[bin_center_idx_2])
            #computing the trust estimate for each cell based on the current lower and upper bounds (basically the 3d trust plot)

        if usecuda:
            trust = trust.cuda()

        return trust

    def compute_trust(self, l, u, b, p):
        #passing in lower bound capability belief, upper bound capability belief, beta, task requirement lambdabar
        # p is the value from one bin
        if b < -50: #this is for natural trust. This never happens for the artificial trust model.
            trust = 1.0 - 1.0 / (b * (u - l)) * torch.log( (1.0 + torch.exp(b * (p - l))) / (1.0 + torch.exp(b * (p - u))) )
            print(f"b {trust}")
        else: #as long as you pass in a positive beta, we will be calculating artificial trust which doesnt depend on beta
            if p <= l: #if lambdabar is less than the lower bound capability belief
                #trust = torch.tensor([1.0])    # OLD
                trust = torch.tensor([1.0], requires_grad=True) #assign a trust of 1
            elif p > u: #if lambdabar is greater than the upper bound capability belief
                #trust = torch.tensor([0.0])    # OLD
                trust = torch.tensor([0.0], requires_grad=True) #assign a trust of 0
            else:
                trust = (u - p) / (u - l + 0.0001) #assign trust as a constant slope between u and l. 0.0001 is to not divide by 0.

        if usecuda:
            trust = trust.cuda()
        
        return trust #returns the trust in human agent given lower bound l, upper bound u, beta term b, and task requirement lambdabar p 

    def sigm(self, x): #sigmoid function to convert [-10,10] (really [-inf,inf]) to [0,1]
        return 1 / (1 + torch.exp(-x))

    def sigmoid(self, lambdabar, agent_c):
        #takes in task requirement for one dimension and the agent's actual capability for that dimension
        #calculates true trust to determine the stochastic task outcome

        #if lambdabar == agent_c, the sigmoid output is 0.5
        eta = 1/50.0 #is a good value through testing for good capability updating
        #eta = 1/5.0
        #eta = 1/500.0
        return 1 / (1 + math.exp((lambdabar - agent_c)/eta))

ptrblck · May 23, 2023, 4:38pm

You explanation is mostly correct, but as already explained setting requires_grad=True on an already detached tensor will not fix the issue.

E.g. look at this example:

model = nn.Linear(1, 1)
x = torch.randn(1, 1)

out = model(x)

# explicitly detach the tensor from the computation graph
out_detached = out.detach()

# this will raise an error
out_detached.mean().backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

# try to "fix" it
out_detached.requires_grad_()

# backward won't fail anymore
out_detached.mean().backward()

# but also no gradiets are calculated
print(model.weight.grad)
# None

christian12 · May 25, 2023, 8:28am

Thanks.

I think I have it now, have I?

I deleted the , requires_grad=True from the self.pre_ variables.
Eg. What used to be
Parameter(dtype(4.0 * np.ones(1)), requires_grad=True) is now
Parameter(dtype(4.0 * np.ones(1))).

...
from torch.nn import Parameter
...
dtype = torch.DoubleTensor
...
class RobotTrustModel(torch.nn.Module):

    def __init__(self):
        super(RobotTrustModel, self).__init__()

        #beta is not relevant for artificial trust model (see BTM paper for natural trust)
        self.pre_beta_1 = Parameter(dtype(4.0 * np.ones(1)))
        self.pre_beta_2 = Parameter(dtype(4.0 * np.ones(1)))

        #instead of using lower bounds of 0 and upper bounds of 1, the code is more
        #stable when using the range [-10,10] and then converting back later to [0,1].
        #requires_grad=True means it is part of the gradient computation (i.e., the value should be updated when being optimized)
        self.pre_l_1 = Parameter(dtype(-10.0 * np.ones(1)))
        self.pre_u_1 = Parameter(dtype( 10.0 * np.ones(1)))
        self.pre_l_2 = Parameter(dtype(-10.0 * np.ones(1)))
        self.pre_u_2 = Parameter(dtype( 10.0 * np.ones(1)))
...

And what used to be:
torch.tensor([1.0], requires_grad=True) is now Parameter(dtype(np.ones(1))) &
torch.tensor([0.0], requires_grad=True) is now Parameter(dtype(np.zeros(1)))

def compute_trust(self, l, u, b, p):
        if b < -50:
            trust = 1.0 - 1.0 / (b * (u - l)) * torch.log( (1.0 + torch.exp(b * (p - l))) / (1.0 + torch.exp(b * (p - u))) )
        else: 
            if p <= l: 
                trust = Parameter(dtype(np.ones(1))) #assign a trust of 1 
            elif p > u: 
                trust = Parameter(dtype(np.zeros(1))) #assign a trust of 0
            else:
                trust = (u - p) / (u - l + 0.0001) 
        
        return trust

With this code requires_grad=True isn’t used anymore and hopefully no tensor gets detached. The whole code runs through without an error.