```python
def forward(self, input, hidden, encoder_outputs, mask):
    """
    Run LSTM through 1 time step

    SHAPE REQUIREMENT
    - input: <1 x batch_size x N_LETTER>
    - hidden: (<num_layer x batch_size x hidden_size>, <num_layer x batch_size x hidden_size>)
    - lstm_out: <1 x batch_size x N_LETTER>
    """
    # Incorporate attention to LSTM input
    hidden_cat = torch.cat((hidden[0], hidden[1]), dim=2)
    # attn_weights is 1 x batch_sz x MAX_NAME_LEN
    attn_weights = F.softmax(self.attn(torch.cat((input, hidden_cat), 2)), dim=2)
    # Set all pad characters to negative infinity
    attn_weights[mask] = float('-inf')
    # Softmax to re-adjust weights so pad chars have no weight, in torch dimension correlated to name, dim=2
    attn_weights = torch.softmax(attn_weights, dim=2)
    attn_applied = torch.bmm(attn_weights.transpose(0,1),encoder_outputs.transpose(0,1)).transpose(0,1)
    attn_output = torch.cat((input, attn_applied), 2)
    attn_output = F.relu(self.attn_combine(attn_output))
    # Run LSTM
    lstm_out, hidden = self.lstm(attn_output, hidden)
    lstm_out = self.fc1(lstm_out)
    lstm_out = self.softmax(lstm_out)
    return lstm_out, hidden
```

This is my forward function. As you can see, I’m passing in a mask that sets the attention weights at all pad-character positions to negative infinity, and then I’m applying a softmax over them again so the pad characters end up with no weight. I read in a blog that this is how you’re supposed to apply masking to attention, but when I try to compute the loss I get this exception:

“Exception has occurred: RuntimeError

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 2048, 40]], which is output 0 of SoftmaxBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).”
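
For what it's worth, the message points at the line `attn_weights[mask] = float('-inf')`: softmax keeps its output around for the backward pass, and the indexed assignment overwrites that saved output in place (version 0 becomes version 1). A minimal sketch of one common way around this is below. It assumes `mask` is a boolean tensor with the same shape as the attention scores (`True` at pad positions) and that no row is fully masked. The helper name `masked_attention_weights` and the toy tensors are hypothetical and not part of the original model. It masks the raw scores with the out-of-place `masked_fill` and then applies softmax once, rather than softmax, mask, softmax.

```python
import torch
import torch.nn.functional as F


def masked_attention_weights(scores, mask):
    """Hypothetical helper.

    scores: <1 x batch_size x MAX_NAME_LEN> raw (pre-softmax) attention scores
    mask:   bool tensor of the same shape, True where the character is padding
    """
    # masked_fill returns a new tensor, so nothing saved by autograd is overwritten
    masked_scores = scores.masked_fill(mask, float('-inf'))
    # A single softmax: pad positions get exp(-inf) = 0 weight
    return F.softmax(masked_scores, dim=2)


# Toy data (assumed shapes): batch of 2 names, max length 4
scores = torch.randn(1, 2, 4, requires_grad=True)
mask = torch.tensor([[[False, False, True, True],
                      [False, False, False, True]]])

weights = masked_attention_weights(scores, mask)

# Stand-in loss, only to show that backward() now runs without the in-place error
loss = (weights * torch.arange(4.0)).sum()
loss.backward()
```

In the `forward` above, this would roughly correspond to calling the helper on the output of `self.attn(...)` instead of on an already-softmaxed tensor. If the double softmax is intentional, replacing the indexed assignment with `attn_weights = attn_weights.masked_fill(mask, float('-inf'))` should also avoid the in-place modification.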