How does one apply a manual dropout layer to a packed sequence (specifically in an LSTM on a GPU)? Passing the packed sequence (which comes from the LSTM layer) directly does not work: the dropout layer doesn't know what to do with it and returns something that is no longer a packed sequence. Passing the `.data` tensor of the packed sequence seems like it should work, but results in the AttributeError shown below the code sample.
Perversely, I can make this an inplace operation (again, on the `.data` tensor directly, not the full packed sequence), and it technically works (i.e., it runs) on the CPU, but on the GPU it gives a warning that the inplace operation is modifying a value needed for gradient computation.
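For concreteness, here is a minimal standalone sketch of that inplace variant (the dimensions, lengths, and variable names are invented for illustration, not taken from the real model):

```python
import torch
import torch.nn as nn

# Toy setup: a small LSTM and a padded batch of two sequences.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
inputs = torch.randn(2, 3, 4)  # (batch, seq, features)
packed = nn.utils.rnn.pack_padded_sequence(inputs, [3, 2], batch_first=True)
out, _ = lstm(packed)

# Inplace dropout applied to the PackedSequence's .data tensor directly.
# This runs as-is on the CPU; on the GPU, the backward pass complains that
# an inplace operation modified a value needed for gradient computation.
drop_inplace = nn.Dropout(p=0.5, inplace=True)
drop_inplace(out.data)
```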
- Are the different behaviors between CPU and GPU expected?
- What is the overall correct way to do this on a GPU?
- What is the overall correct way to do this on a CPU?
```python
import torch
import torch.nn as nn

class Model1(nn.Module):
    def __init__(self, ....):
        super(Model1, self).__init__()
        ....
        self.drop = torch.nn.Dropout(p=0.5, inplace=False)

    def forward(self, inputs, lengths):
        pack1 = nn.utils.rnn.pack_padded_sequence(inputs, lengths, batch_first=True)
        out1, self.hidden1 = self.lstm1(pack1, (self.hidden1.detach(), self.hidden1.detach()))
        out1.data = self.drop(out1.data)  # raises the AttributeError below
```
```
AttributeError: can't set attribute
```
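One candidate that occurs to me is to construct a new PackedSequence around the dropped-out tensor instead of assigning to `.data` (a sketch, using the names from the code sample above; I don't know whether this is the intended approach on either device):

```python
# Instead of `out1.data = self.drop(out1.data)` (which fails because
# PackedSequence.data is a read-only attribute), rebuild the sequence
# from the dropped-out data and the original batch_sizes:
out1 = nn.utils.rnn.PackedSequence(self.drop(out1.data), out1.batch_sizes)
```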