Hi, all.

output = torch.bmm(l_atten.unsqueeze(1), l_feature)

return output
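For context, here is a minimal, self-contained sketch of what that `bmm` call does shape-wise, assuming the shapes described later in the thread (`l_atten` is `[batch, 3]` and `l_feature` is `[batch, 3, 5]`; the batch size of 4 here is arbitrary):

```python
import torch

batch_size = 4
l_atten = torch.rand(batch_size, 3)       # one attention weight per feature
l_feature = torch.rand(batch_size, 3, 5)  # three 5-dim feature vectors per sample

# unsqueeze(1) turns l_atten into [batch, 1, 3]; batch matrix multiply with
# [batch, 3, 5] yields [batch, 1, 5]: a weighted combination of the features
output = torch.bmm(l_atten.unsqueeze(1), l_feature)
print(output.shape)  # torch.Size([4, 1, 5])
```

So `bmm` itself is differentiable here; the question is whether something upstream breaks the graph.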


Are you certain that it is `bmm`'s fault? What does the rest of your computation graph look like?

Please see this.

Thanks for your attention.

Here, l_atten is computed from l_feature by neural networks, so both l_feature and l_atten need gradients in the backward pass.

The relevant code looks like this:

l_feature1 = self.feature1(input1) # l_feature1 is a [batchsize, 5] tensor; self.feature1 is a small neural network

l_feature2 = self.feature2(input2) # l_feature2 is a [batchsize, 5] tensor

l_feature3 = self.feature3(input3) # l_feature3 is a [batchsize, 5] tensor

l_feature = torch.cat((l_feature1, l_feature2, l_feature3), 1) # combine features: [batchsize, 15] tensor

l_feature = l_feature.view(batch_size, 3, 5) # reshape features so that I can multiply by the attention coefficients

# attention coefficients

l_atten1 = self.atten1(l_feature1) # l_atten1 is a [batchsize, 1] tensor for feature1; self.atten1 is a small neural network

l_atten2 = self.atten2(l_feature2) # l_atten2 is a [batchsize, 1] tensor for feature2

l_atten3 = self.atten3(l_feature3) # l_atten3 is a [batchsize, 1] tensor for feature3

l_atten = torch.cat((l_atten1, l_atten2, l_atten3), 1) # combine attention coefficients: [batchsize, 3] tensor

for i in range(0, batch_size):

    l_atten[i, ] = F.softmax(l_atten[i, ]) # softmax is used for normalization

output = torch.bmm(l_atten.unsqueeze(1), l_feature) ### attention

I’m not sure whether I’ve described my question clearly. Can you understand it and help me?

This is the in-place operation that breaks the gradient calculation.
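A minimal sketch of the out-of-place fix: instead of writing softmax results back into `l_atten` row by row, call `F.softmax` once with `dim=1` (normalizing across the three attention coefficients), which creates a new tensor and keeps the graph intact. The random tensors here just stand in for the network outputs:

```python
import torch
import torch.nn.functional as F

batch_size = 4
l_atten = torch.rand(batch_size, 3, requires_grad=True)      # stand-in for the atten nets' output
l_feature = torch.rand(batch_size, 3, 5, requires_grad=True) # stand-in for the feature nets' output

# out-of-place softmax over dim 1: no per-row loop, no in-place assignment
atten = F.softmax(l_atten, dim=1)

output = torch.bmm(atten.unsqueeze(1), l_feature)  # [batch, 1, 5]
output.sum().backward()  # gradients now flow to both l_atten and l_feature
print(l_atten.grad is not None)  # True
```

Each row of `atten` sums to 1, exactly what the loop was computing, but without mutating a tensor that autograd still needs.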


Thank you!

That’s the key point!

Why don’t you format your code using Markdown?

Thanks for your suggestions.

I will consider using Markdown next time.