Error when computing gradients

Hi, I am trying to use a custom embedding loss for training.
However, I keep getting the error below:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Below is my code for the loss function:

def disc_loss(output, num, x=None, y=None, pred=False):
    if pred:
        #pdist = nn.PairwiseDistance(p=2)
        #loss = pdist(x,y).sum()
        num = x.size(1)
        # Euclidean (L2) distance between the two embeddings
        loss = torch.sqrt(torch.sum((x - y) ** 2))
    return loss

x and y in the function are the embeddings that I get from an intermediate conv layer.
I keep getting the error message when I call .backward(), but I am not sure what is wrong with my implementation. Can somebody help me with this?
Thank you!

I think the problem is solved by setting requires_grad = True on the loss
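Just a note for anyone reading later: setting requires_grad = True on the loss usually only silences the error rather than fixing it. A minimal sketch of what happens when the graph is broken (simulated here with .detach()):

```python
import torch

w = torch.randn(3, requires_grad=True)   # stand-in for a model weight
x = torch.randn(3)
loss = (w * x).sum().detach()            # .detach() breaks the autograd graph
loss.requires_grad_(True)                # silences the RuntimeError...
loss.backward()
print(w.grad)                            # ...but the weight gets no gradient: prints None
```

So the model would simply not learn; the real fix is to find where the graph gets disconnected.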


Seems that you want to implement MSELoss? In that case, use the standard nn.MSELoss criterion.
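For reference, a minimal sketch of the built-in criterion on two embedding batches (the shapes here are made up):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
em1 = torch.randn(4, 16, requires_grad=True)  # stand-in for one embedding batch
em2 = torch.randn(4, 16)                      # stand-in for the other
loss = criterion(em1, em2)
loss.backward()
print(em1.grad.shape)  # torch.Size([4, 16])
```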

@omarfoq Hi, yeah, I forgot about MSELoss. However, I’ve just tried with MSELoss and it gives the same error, which seems very weird :frowning:

In that case, the problem is not related to this part. Can you please share how you feed x and y to the loss function, or, if possible, share the part of the code corresponding to your training loop?

@omarfoq Thank you for the help!
Here is a part of my code

    output, em1 = disc(real_image.float())  # Real point cloud data
    _, em2 = disc(inputs2.float())          # Fake or predicted point cloud data
    ls_fake = disc_loss(output, num=1, x=em1, y=em2, pred=True)

Here, disc is the model that I am using, and em1 and em2 are embeddings (features) from an intermediate layer. So I basically have to compute MSELoss on em1 and em2. But I get an error at ls_fake.backward()

@omarfoq Let me know if you need more information!

Hello, it seems that your computation graph gets disconnected when you compute em1 and em2. Can you show how you get them from disc?

Below is a part of the code for my model, disc.

def forward(self, inputs):
        # ... more code above
        x3 = x2.max(dim=-1, keepdim=False)[0]
        em = x3
        x = F.leaky_relu(self.bn6(self.linear1(x3)), negative_slope=0.2) # (batch_size, emb_dims*2) -> (batch_size, 512)
        #x = self.dp1(x)
        x = F.leaky_relu(self.bn7(self.linear2(x)), negative_slope=0.2) # (batch_size, 512) -> (batch_size, 256)
        x = self.dp2(x)
        x = self.linear3(x)                                             # (batch_size, 256) -> (batch_size, output_channels)
        return x,em

em is just the output straight from the convolution and max-pooling.
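As an aside, a quick way to locate where the graph gets disconnected is to check the .grad_fn attribute of the intermediate tensors; a small sketch:

```python
import torch

x = torch.randn(2, 8, requires_grad=True)
y = (x * 2).sum()
print(y.grad_fn is not None)  # True: y is still attached to the autograd graph
z = y.detach()
print(z.grad_fn)              # None: calling z.backward() raises the RuntimeError above
```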


The problem is in

em = x2.max(dim=-1, keepdim=False)[0]

I think you can’t propagate this operation; have a look at this topic and this post

@omarfoq So does this mean I should use the PyTorch built-in max-pooling function for this, right?

That’s one possibility. I think a better implementation may take advantage of hooks: you can keep the original forward as it is and use register_forward_hook to grab the embedding during the forward pass.

@omarfoq Thank you very much for the help! Could you give me an example of how I can use register_forward_hook for my case?



embeddings = {}
def hook_fn(module, input_, output):
    embeddings["embeddings"] = output.squeeze()

model.features.register_forward_hook(hook_fn)


In my case, features is the name of the layer before the last one in the network.
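End to end it looks roughly like this; the model and the layer names (features, head) are toy stand-ins for your disc:

```python
import torch
import torch.nn as nn

embeddings = {}
def hook_fn(module, input_, output):
    embeddings["em"] = output  # stays attached to the graph, so backward() works

# Toy stand-in for disc; 'features' plays the role of the before-last layer
model = nn.Sequential()
model.add_module("features", nn.Linear(8, 4))
model.add_module("head", nn.Linear(4, 2))
model.features.register_forward_hook(hook_fn)

out = model(torch.randn(3, 8))
loss = (embeddings["em"] ** 2).mean()  # any loss computed on the hooked embedding
loss.backward()                        # gradients reach model.features
```

The hook fills the dict on every forward call, so the embedding is always the one from the most recent batch.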