How to construct an attention mechanism

Hi,

I would like to use an attention like this:

attention = torch.relu(self.w_fc1(x))
self.x[0] = self.x[0] * torch.softmax(self.w_fc2(attention), dim=0)

where x is the input vector.
(I removed the avg_pool of the first stage in the attention.)
This is in the forward() method.
Model() includes:

self.register_buffer('x', torch.stack([torch.zeros(NUM_INPUT) for _ in range(NUM_HIDDEN)]))
self.w_fc1 = nn.Linear(NUM_INPUT, NUM_INPUT)
self.w_fc2 = nn.Linear(NUM_INPUT, NUM_INPUT)
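
For reference, here is a minimal sketch of how I assemble these pieces (the NUM_INPUT / NUM_HIDDEN values and the rest of forward() are placeholders; my real code is larger):

import torch
import torch.nn as nn

NUM_INPUT = 28   # placeholder sizes, my real values differ
NUM_HIDDEN = 4

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent state that I try to update inside forward()
        self.register_buffer('x', torch.stack([torch.zeros(NUM_INPUT) for _ in range(NUM_HIDDEN)]))
        self.w_fc1 = nn.Linear(NUM_INPUT, NUM_INPUT)
        self.w_fc2 = nn.Linear(NUM_INPUT, NUM_INPUT)

    def forward(self, x):
        attention = torch.relu(self.w_fc1(x))
        # the in-place write into the buffer; this is where the error appears
        self.x[0] = self.x[0] * torch.softmax(self.w_fc2(attention), dim=0)
        return self.x[0]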

Then I get this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [28]] is at version 56; expected version 28 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

What does version 56 mean?
When I try this instead:

attention = torch.relu(self.w_fc1(x))
self.x[0] = attention

The same error happens.
I have no idea how to solve it or where I went wrong.
Can anyone point out the problem or suggest a solution?

Best,
S.Takano

The error is due to self.x[0] = attention. The exact reason depends on the complete code, but the main idea is as follows: let ‘a’ depend on ‘x’ and ‘y’ for its gradient computation. When you do self.x[0] = ..., you are changing the value of x that ‘a’ depends on, so there is no way to compute the gradients for ‘a’.
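
Here is a small standalone sketch (dummy tensors, not your model) that triggers the same error. It also shows what the “version” in the message means: autograd keeps an internal version counter on every tensor (exposed as the _version attribute, an implementation detail) and bumps it on every in-place modification. backward() checks that a tensor saved for gradient computation still has the version it had when it was saved; in your message the saved tensor was recorded at version 28 but had been modified in place again (version 56) by the time backward ran.

import torch

w = torch.ones(3, requires_grad=True)
x = torch.zeros(3)      # plays the role of your buffer, requires no grad

a = x * w               # the backward of this mul saves x (needed for w's gradient)
print(x._version)       # 0

x[0] = 5.0              # in-place write into x bumps its version counter
print(x._version)       # 1

a.sum().backward()      # RuntimeError: ... has been modified by an inplace operation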

Hi Singh-san,

Where are “a” and “y” in your explanation?

S.Takano

Those are made-up variables, as it is not possible to tell the exact problem from partial code. So I was trying to make up a dummy scenario in which this error would occur.

OK, I have modified my explanation; please reply.

Doing self.x[0] = something is an in-place operation, i.e. no copy of the object is being made. So you are modifying the original object, but that original object is also needed for the backprop computation.
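
One possible workaround (a sketch, assuming you do not need gradients to flow through the stored state from one step to the next) is to read a clone of the buffer row, compute the new value out of place, and write only a detached copy back into the buffer:

def forward(self, x):
    attention = torch.relu(self.w_fc1(x))
    # clone, so the tensor saved for backward does not share storage with the buffer
    prev = self.x[0].clone()
    new_row = prev * torch.softmax(self.w_fc2(attention), dim=0)
    # update the buffer outside the autograd graph
    with torch.no_grad():
        self.x[0] = new_row.detach()
    return new_row      # use the out-of-place result for the loss

If you do want gradients to flow through the stored state across steps, keep that state in an ordinary tensor (not a buffer) and rebuild it out of place each step instead of writing into it.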

I think NO. I think you do not understand Python, sorry.

I checked the behavior by inserting print statements. Then the second loss.backward() in a mini-batch fails.