Torch.bmm will break the gradient: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi, all.


Are you certain that it is bmm's fault? What does the rest of your computation graph look like?

Please see this.
Thanks for your attention.
Here, `l_atten` is obtained from `l_feature` through small neural networks, so both `l_feature` and `l_atten` need gradients.
The relevant code looks like this:
```python
l_feature1 = self.feature1(input1)  # [batch_size, 5] tensor; self.feature1 is a small neural network
l_feature2 = self.feature2(input2)  # [batch_size, 5] tensor
l_feature3 = self.feature3(input3)  # [batch_size, 5] tensor
l_feature =, l_feature2, l_feature3), 1)  # combined features, [batch_size, 15] tensor
l_feature = l_feature.view(batch_size, 3, 5)  # reshape the features so I can multiply by the attention coefficients

# attention coefficients
l_atten1 = self.atten1(l_feature1)  # [batch_size, 1] tensor for feature1; self.atten1 is a small neural network
l_atten2 = self.atten2(l_feature2)  # [batch_size, 1] tensor for feature2
l_atten3 = self.atten3(l_feature3)  # [batch_size, 1] tensor for feature3
l_atten =, l_atten2, l_atten3), 1)  # combined attention coefficients, [batch_size, 3] tensor
for i in range(0, batch_size):
    l_atten[i, ] = F.softmax(l_atten[i, ])  # softmax is used for normalization
output = torch.bmm(l_atten.unsqueeze(1), l_feature)  # attention
```
I’m not sure whether I have stated my question clearly. Can you understand it and help me?
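For reference, the failure mode in the loop above can be reproduced with a minimal standalone sketch. The names here are illustrative, and `torch.tanh` merely stands in for the attention sub-networks (any op whose backward pass saves its output will do):

```python
import torch
import torch.nn.functional as F

scores = torch.randn(2, 3, requires_grad=True)
l_atten = torch.tanh(scores)  # tanh saves its output for the backward pass

# Row-wise in-place softmax, as in the loop above: each assignment
# overwrites part of l_atten and bumps its version counter, so the
# tensor tanh saved for backward is no longer the one it produced.
for i in range(l_atten.size(0)):
    l_atten[i] = F.softmax(l_atten[i], dim=0)

failed = False
try:
    l_atten.sum().backward()
except RuntimeError as e:
    failed = True
    print(e)  # "... has been modified by an inplace operation"
```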

The assignment `l_atten[i, ] = F.softmax(l_atten[i, ])` is the in-place operation that breaks the gradient calculation, not `torch.bmm` itself: writing into a slice of `l_atten` modifies a tensor that autograd has already saved for the backward pass.
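A minimal sketch of the out-of-place alternative, with random tensors standing in for the network outputs and the shapes taken from the code above: `F.softmax` can normalize every row at once along `dim=1`, returning a new tensor instead of writing back into `l_atten`:

```python
import torch
import torch.nn.functional as F

batch_size = 4
l_feature = torch.randn(batch_size, 3, 5, requires_grad=True)  # stands in for the reshaped features
l_atten = torch.randn(batch_size, 3, requires_grad=True)       # stands in for the concatenated attention scores

# Out-of-place softmax over the three scores of each sample:
# no loop and no assignment into l_atten, so autograd's saved
# tensors stay valid.
atten = F.softmax(l_atten, dim=1)

# [batch, 1, 3] x [batch, 3, 5] -> [batch, 1, 5]
output = torch.bmm(atten.unsqueeze(1), l_feature)

output.sum().backward()  # runs without the in-place RuntimeError
```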


Thank you!
That’s the key point!

Why don’t you format your code using Markdown?

Thanks for your suggestion.
I will format my code with Markdown next time.