Using max pooling on the output of an LSTM (considering the mask matrix)

Hi everyone!
I'm trying to apply max pooling to the output of an LSTM.

from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

s_embedding = pack_padded_sequence(s_embedding, s_len_sorted, batch_first=True)
s_output, _ = self.lstm(s_embedding, self.hidden)
s_output, _ = pad_packed_sequence(s_output, batch_first=True)  # (batch, seq_len, hidden)
s_output = s_output[s_idx_reverse.data]  # undo the length-based sorting
s_output = self.maxpool(s_output)  # self.maxpool is an adaptive max pooling layer
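
For reference, here is a minimal standalone sketch of the pooling step, assuming self.maxpool is nn.AdaptiveMaxPool1d(1) (that class and the shapes below are only for illustration). Since nn.AdaptiveMaxPool1d expects input of shape (batch, channels, length), pooling over the time dimension needs a transpose:

import torch
import torch.nn as nn

maxpool = nn.AdaptiveMaxPool1d(1)           # assumed definition of self.maxpool
s_output = torch.randn(4, 7, 16)            # dummy (batch, seq_len, hidden) tensor
pooled = maxpool(s_output.transpose(1, 2))  # pool over seq_len -> (batch, hidden, 1)
pooled = pooled.squeeze(2)                  # (batch, hidden)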

But I think this is not correct, because the many padding zeros in s_output will affect the result of the max pooling. So I use the mask matrix to set those padding positions to float('-inf') before max pooling:

s_output[s_output * s_mask == 0] = float('-inf') # s_mask size: batch * seq_len * 1
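
For context, s_mask is a 0/1 float mask of size batch * seq_len * 1 built from the sequence lengths, roughly like this sketch (the build_mask helper and the dummy lengths are only illustrative, not my actual code):

import torch

def build_mask(s_len, max_len):
    # s_len: (batch,) tensor of true sequence lengths
    # returns a float mask of shape (batch, max_len, 1): 1 for real tokens, 0 for padding
    positions = torch.arange(max_len).unsqueeze(0)   # (1, max_len)
    mask = (positions < s_len.unsqueeze(1)).float()  # (batch, max_len)
    return mask.unsqueeze(2)                         # (batch, max_len, 1)

s_mask = build_mask(torch.tensor([7, 5, 3, 2]), 7)   # dummy lengths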

But I got an error while training:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Question 1: How can I implement this without an in-place operation?
Question 2: If I want to use average pooling instead, what should I do?