Torch.bmm will break the gradient: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi, all.


Are you certain that it is bmm's fault? What does the rest of your computation graph look like?

Please see this.
Thanks for your attention.
Here, `l_atten` is obtained from `l_feature` through small neural networks, so both `l_feature` and `l_atten` need gradients.
The relevant code looks like this:
```python
l_feature1 = self.feature1(input1)  # [batch_size, 5] tensor; self.feature1 is a small neural network
l_feature2 = self.feature2(input2)  # [batch_size, 5] tensor
l_feature3 = self.feature3(input3)  # [batch_size, 5] tensor
l_feature =, l_feature2, l_feature3), 1)  # combined features, [batch_size, 15] tensor
l_feature = l_feature.view(batch_size, 3, 5)  # reshape the features so I can multiply by the attention coefficients

# attention coefficients
l_atten1 = self.atten1(l_feature1)  # [batch_size, 1] tensor for feature1; self.atten1 is a small neural network
l_atten2 = self.atten2(l_feature2)  # [batch_size, 1] tensor for feature2
l_atten3 = self.atten3(l_feature3)  # [batch_size, 1] tensor for feature3
l_atten =, l_atten2, l_atten3), 1)  # combined attention coefficients, [batch_size, 3] tensor
for i in range(0, batch_size):
    l_atten[i, ] = F.softmax(l_atten[i, ])  # softmax is used for normalization
output = torch.bmm(l_atten.unsqueeze(1), l_feature)  # attention
```
I’m not sure whether I have stated my question clearly. Can you understand it and help me?
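For reference, the failure mode in the loop above can be reproduced with a minimal standalone sketch. The names here are illustrative, and `torch.tanh` merely stands in for the attention sub-networks (any op whose backward pass saves its output will do):

```python
import torch
import torch.nn.functional as F

scores = torch.randn(2, 3, requires_grad=True)
l_atten = torch.tanh(scores)  # tanh saves its output for the backward pass

# Row-wise in-place softmax, as in the loop above: each assignment
# overwrites part of l_atten and bumps its version counter, so the
# tensor tanh saved for backward is no longer the one it produced.
for i in range(l_atten.size(0)):
    l_atten[i] = F.softmax(l_atten[i], dim=0)

failed = False
try:
    l_atten.sum().backward()
except RuntimeError as e:
    failed = True
    print(e)  # "... has been modified by an inplace operation"
```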

The assignment `l_atten[i, ] = F.softmax(l_atten[i, ])` is the in-place operation that breaks the gradient calculation, not `torch.bmm` itself: writing into a slice of `l_atten` modifies a tensor that autograd has already saved for the backward pass.
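A minimal sketch of the out-of-place alternative, with random tensors standing in for the network outputs and the shapes taken from the code above: `F.softmax` can normalize every row at once along `dim=1`, returning a new tensor instead of writing back into `l_atten`:

```python
import torch
import torch.nn.functional as F

batch_size = 4
l_feature = torch.randn(batch_size, 3, 5, requires_grad=True)  # stands in for the reshaped features
l_atten = torch.randn(batch_size, 3, requires_grad=True)       # stands in for the concatenated attention scores

# Out-of-place softmax over the three scores of each sample:
# no loop and no assignment into l_atten, so autograd's saved
# tensors stay valid.
atten = F.softmax(l_atten, dim=1)

# [batch, 1, 3] x [batch, 3, 5] -> [batch, 1, 5]
output = torch.bmm(atten.unsqueeze(1), l_feature)

output.sum().backward()  # runs without the in-place RuntimeError
```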


Thank you!
That’s the key point!

Why don’t you format your code using Markdown?

Thanks for your suggestion.
I will format my code with Markdown next time.