How to construct an attention


I would like to use an attention mechanism like this:

attention = torch.relu(self.w_fc1(x))
self.x[0] = self.x[0] * torch.softmax(self.w_fc2(attention), dim=0)

where x is the input vector.
(I removed the avg_pool from the first stage of the attention.)
This is in the forward pass.
Model() includes:

self.register_buffer('x', torch.stack([torch.zeros(NUM_INPUT) for _ in range(NUM_HIDDEN)]))
self.w_fc1 = nn.Linear(NUM_INPUT, NUM_INPUT)
self.w_fc2 = nn.Linear(NUM_INPUT, NUM_INPUT)

Then I get an error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [28]] is at version 56; expected version 28 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

What does "version 56" mean?
When I try this instead:

attention = torch.relu(self.w_fc1(x))
self.x[0] = attention

The same error occurs.
I have no idea how to solve this or where I went wrong.
Can anyone point out the problem or suggest a fix?


The error is due to self.x = attention. The exact reason depends on the complete code, but the main idea is as follows. Suppose ‘a’ depends on ‘x’ and ‘y’ for its gradient computation. When you then do self.x = ..., you change the value of x that ‘a’ depends on, so there is no way left to compute the gradients for ‘a’.
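To make the dummy scenario concrete, here is a minimal sketch. The names x, y and a are the made-up variables from the explanation, not anything from the original model, and the tensor sizes are arbitrary. The plain tensor x plays the role of a buffer: it needs no gradient itself, but autograd still saves it because the gradient of a with respect to y is x.

```python
import torch

x = torch.ones(3)                              # stands in for a buffer (no grad needed)
y = torch.full((3,), 2.0, requires_grad=True)  # stands in for something trainable

a = x * y      # autograd saves x here, since d(a)/d(y) = x
x[0] = 5.0     # in-place write AFTER x was saved for backward

try:
    a.sum().backward()
    error_message = None
except RuntimeError as exc:
    # Autograd notices that the saved x no longer matches the current x.
    error_message = str(exc)

print(error_message)
```

If the write were done on a copy instead (for example x.clone()), the saved tensor would stay untouched and backward() would be expected to succeed.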

Hi Singh-san,

Where are “a” and “y” in your explanation?


Those are made-up variables, as it is not possible to tell the exact problem from partial code. I was trying to make up a dummy scenario in which this error would occur.

OK, I have modified my explanation; please reply.

Doing self.x[0] = something is an in-place operation, i.e. no copy of the object is made. So you are modifying the original object, but that original object is also needed for the backprop computation.
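The "version" numbers in the error message refer to a counter that each tensor carries and that goes up on every in-place write. The sketch below reads it through the semi-private attribute _version; the tensor name buf and the sizes are only illustrative. Read this way, "is at version 56; expected version 28" would mean the tensor was saved for backward at version 28 and was written in place again before backward ran.

```python
import torch

buf = torch.zeros(2, 3)
versions = [buf._version]          # counter before any in-place write

buf[0] = torch.ones(3)             # indexed assignment writes into buf in place
versions.append(buf._version)

buf.mul_(2.0)                      # trailing underscore: another in-place op
versions.append(buf._version)

# Out-of-place alternative: build a new tensor and leave buf untouched.
updated = torch.cat([torch.full((1, 3), 7.0), buf[1:]], dim=0)
versions.append(buf._version)      # unchanged by the out-of-place update

same_storage = updated.data_ptr() == buf.data_ptr()
print(versions, same_storage)
```

The counter rises after each in-place write and stays put for the torch.cat version, which returns a separate tensor with its own storage.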

I do not think so. I think you do not understand Python, sorry.

I checked the behavior by inserting print statements. It is the second loss.backward() call within a mini-batch that fails.
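A failure on the second backward call would be consistent with the buffer carrying state (and possibly graph history) from one iteration into the next, although that cannot be confirmed from the partial code. One possible workaround, sketched below under that assumption, is to detach the stored state and replace the buffer out of place instead of writing into self.x[0]. The small sizes, the ones-initialised buffer, the squared-sum loss and the SGD optimizer are all illustrative choices, not taken from the original model.

```python
import torch
import torch.nn as nn

NUM_INPUT, NUM_HIDDEN = 4, 2   # small illustrative sizes, not the original ones


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumption: ones instead of zeros so the gated row is not identically zero.
        self.register_buffer("x", torch.ones(NUM_HIDDEN, NUM_INPUT))
        self.w_fc1 = nn.Linear(NUM_INPUT, NUM_INPUT)
        self.w_fc2 = nn.Linear(NUM_INPUT, NUM_INPUT)

    def forward(self, x):
        attention = torch.relu(self.w_fc1(x))
        gate = torch.softmax(self.w_fc2(attention), dim=0)
        state = self.x.detach()          # drop any history from earlier steps
        row = state[0] * gate            # out-of-place gating of row 0
        # Store a detached new tensor as the state; nothing is written into self.x.
        self.x = torch.cat([row.detach().unsqueeze(0), state[1:]], dim=0)
        return row


torch.manual_seed(0)
model = Model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for _ in range(3):                       # several backward calls in a row
    optimizer.zero_grad()
    out = model(torch.randn(NUM_INPUT))
    loss = (out ** 2).sum()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses)
```

Detaching means gradients do not flow back through earlier iterations via the buffer. If that flow is actually wanted, a different design (for example accumulating the loss and calling backward once) would be needed.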