I want to use LSTMCell in a way that lets me move everything to the GPU by calling model.to(). The hidden state and cell state of LSTMCell have to be on the same device as LSTMCell’s weights, so they need to be moved to the GPU as well. I googled how to do this and found that I should register them as buffers with register_buffer(). But I don’t know how I’m supposed to save the output of LSTMCell to the buffers I initialized.
Here’s what I tried.
class Model(nn.Module):
    def __init__(self, ...):
        ...
        self.lstm = nn.LSTMCell(256, 256)
        self.register_buffer('hidden', torch.zeros(batch, 256))
        self.register_buffer('cell', torch.zeros(batch, 256))
        ...

    def clear_lstm(self):
        # update buffers to torch.zeros(batch, 256)
        self.register_buffer('hidden', torch.zeros(batch, 256))
        self.register_buffer('cell', torch.zeros(batch, 256))

    def forward(self, input):
        ...
        self.hidden, self.cell = self.lstm(input, (self.hidden, self.cell))
        ...
I tried it without calling model.clear_lstm() before running and it worked, but after 3 runs I ran out of memory (this didn’t happen when I called .cuda() on each state instead). When I ran the model after calling model.clear_lstm(), the buffers and LSTMCell weren’t both on the GPU, so I got an error:
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mat2'
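For reference, here is a minimal sketch of one way this could be handled. It is not a confirmed fix. It assumes the memory growth comes from the stored states keeping the autograd graph of earlier runs alive, and that the device error comes from clear_lstm() re-registering fresh CPU tensors. The constructor arguments batch and size, and returning hidden from forward, are placeholders I made up for the example. Note that detaching the states means gradients do not flow back across calls, which may or may not be what you want.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    # `batch` and `size` are hypothetical constructor arguments for this sketch.
    def __init__(self, batch=4, size=256):
        super().__init__()
        self.lstm = nn.LSTMCell(size, size)
        # Buffers follow the module when model.to(device) is called.
        self.register_buffer('hidden', torch.zeros(batch, size))
        self.register_buffer('cell', torch.zeros(batch, size))

    def clear_lstm(self):
        # zeros_like creates the new tensors on the buffers' current device,
        # so the states stay wherever the model was moved to.
        self.hidden = torch.zeros_like(self.hidden)
        self.cell = torch.zeros_like(self.cell)

    def forward(self, input):
        hidden, cell = self.lstm(input, (self.hidden, self.cell))
        # Assigning a tensor to an existing buffer name keeps it registered
        # as a buffer. detach() drops the graph history of this step so it
        # is not carried into the next call.
        self.hidden = hidden.detach()
        self.cell = cell.detach()
        return hidden
```

With this layout, something like model.to('cuda') should move the LSTMCell weights and both state buffers together, and clear_lstm() can be called at any point afterwards without the states falling back to the CPU.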