I'm using nn.LSTMCell and want everything to move to the GPU when I call model.to(). The LSTMCell weights move, but the hidden state and cell state are plain tensors, and they also have to be on the GPU to match the weights. So I googled how to do this and found that I should register them as buffers with register_buffer(). What I don't understand is how to store the output of LSTMCell in the buffers I registered.
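A minimal sketch of why register_buffer() helps, using a hypothetical `StateDemo` module with assumed sizes: `Module.to()` converts parameters and registered buffers, but leaves plain tensor attributes untouched. It converts the dtype instead of the device so it runs without a GPU; `.to("cuda")` goes through the same mechanism.

```python
import torch
import torch.nn as nn


class StateDemo(nn.Module):
    """Hypothetical module contrasting a registered buffer with a plain attribute."""

    def __init__(self, batch: int = 4, hidden_size: int = 256):
        super().__init__()
        self.lstm = nn.LSTMCell(hidden_size, hidden_size)
        # Registered buffer: tracked by the module, so .to()/.cuda() convert it.
        self.register_buffer("hidden", torch.zeros(batch, hidden_size))
        # Plain attribute: invisible to .to(), stays wherever it was created.
        self.plain_hidden = torch.zeros(batch, hidden_size)


model = StateDemo()
# dtype conversion here; model.to("cuda") would move the same set of tensors.
model = model.to(torch.float64)
print(model.lstm.weight_ih.dtype)  # torch.float64
print(model.hidden.dtype)          # torch.float64
print(model.plain_hidden.dtype)    # torch.float32
```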
Here’s what I tried.
```python
class Model(nn.Module):
    def __init__(self, ...):
        ...
        self.lstm = nn.LSTMCell(256, 256)
        self.register_buffer('hidden', torch.zeros(batch, 256))
        self.register_buffer('cell', torch.zeros(batch, 256))
        ...

    def clear_lstm(self):
        # update buffers to torch.zeros(batch, 256)
        self.register_buffer('hidden', torch.zeros(batch, 256))
        self.register_buffer('cell', torch.zeros(batch, 256))
        pass

    def forward(self, input):
        ...
        self.hidden, self.cell = self.lstm(input, (self.hidden, self.cell))
        ...
```
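As a sketch of how storing into buffers works (sizes and the `batch` default are assumptions): `nn.Module.__setattr__` checks whether the name is already a registered buffer, and if so replaces the tensor in the module's buffer table, so a plain assignment like the one in `forward` keeps `hidden` and `cell` registered as buffers.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    # Minimal sketch; batch=4 and size=256 are assumed values.
    def __init__(self, batch: int = 4, size: int = 256):
        super().__init__()
        self.lstm = nn.LSTMCell(size, size)
        self.register_buffer("hidden", torch.zeros(batch, size))
        self.register_buffer("cell", torch.zeros(batch, size))

    def forward(self, x):
        # "hidden" and "cell" are already buffers, so this assignment updates
        # the buffers in place of registering new plain attributes.
        self.hidden, self.cell = self.lstm(x, (self.hidden, self.cell))
        return self.hidden


model = Model()
out = model(torch.randn(4, 256))
print(sorted(dict(model.named_buffers()).keys()))  # ['cell', 'hidden']
```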
I tried it without calling model.clear_lstm() before running, and it worked, but after 3 runs I ran out of memory (this didn't happen when I called .cuda() on each state instead). When I ran the model after calling model.clear_lstm(), the buffers were no longer on the GPU while the LSTMCell weights were, so I got this error:
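A sketch of one way to address both symptoms, under stated assumptions: in the code above, `clear_lstm()` calls `register_buffer()` with `torch.zeros(...)`, which creates new CPU tensors and overwrites the buffers that `.to()` had moved. Assigning `torch.zeros_like(...)` keeps the existing device and dtype instead. A likely cause of the memory growth is that the stored state still carries its autograd graph into the next run; detaching it between runs drops that history. The method name `detach_state` and the sizes are hypothetical.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    # Minimal sketch; sizes and detach_state() are hypothetical.
    def __init__(self, batch: int = 4, size: int = 256):
        super().__init__()
        self.lstm = nn.LSTMCell(size, size)
        self.register_buffer("hidden", torch.zeros(batch, size))
        self.register_buffer("cell", torch.zeros(batch, size))

    def clear_lstm(self):
        # zeros_like() copies the current buffer's device and dtype, so after
        # model.to("cuda") the reset state is created on the GPU.
        self.hidden = torch.zeros_like(self.hidden)
        self.cell = torch.zeros_like(self.cell)

    def detach_state(self):
        # Keep the values but cut the autograd history, so the graph from
        # earlier runs is not kept alive by the next run.
        self.hidden = self.hidden.detach()
        self.cell = self.cell.detach()

    def forward(self, x):
        self.hidden, self.cell = self.lstm(x, (self.hidden, self.cell))
        return self.hidden


# dtype stands in for device so this runs on a CPU-only machine.
model = Model().to(torch.float64)
for _ in range(3):
    out = model(torch.randn(4, 256, dtype=torch.float64))
    out.sum().backward()
    model.detach_state()  # without this, the 2nd backward hits the freed 1st graph
model.clear_lstm()
```

With the state detached after each backward pass, every run builds a fresh graph instead of extending the previous one.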
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mat2'
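When an error like this appears, it can help to list where every parameter and buffer actually lives. A small sketch with a hypothetical helper `device_report`, shown on a CPU-only model with assumed sizes:

```python
import torch
import torch.nn as nn


def device_report(module: nn.Module) -> dict:
    """Hypothetical helper: map each parameter and buffer name to its device."""
    report = {}
    for name, tensor in module.named_parameters():
        report[name] = str(tensor.device)
    for name, tensor in module.named_buffers():
        report[name] = str(tensor.device)
    return report


class Model(nn.Module):
    # Sizes are assumptions for illustration.
    def __init__(self, batch: int = 4, size: int = 256):
        super().__init__()
        self.lstm = nn.LSTMCell(size, size)
        self.register_buffer("hidden", torch.zeros(batch, size))
        self.register_buffer("cell", torch.zeros(batch, size))


report = device_report(Model())
print(report)
```

On a GPU run, calling this right after `clear_lstm()` would show whether `hidden` and `cell` report a different device from the `lstm.*` weights.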