Hey there!
I'm starting out with PyTorch, so I wanted to implement a neural language model.
Everything was going OK until I started working with the GPU.
I have a typical model that embeds the input, runs an RNN (LSTM), applies an output projection xW+b, then a softmax.
My model looks like:
class RnnLm(nn.Module):
    def __init__(self, params):
        super().__init__()
        self.params = params
        self.embedding = nn.Embedding(num_embeddings=params.vocab_size,
                                      embedding_dim=params.embed_dim)
        self.cell = nn.LSTM(input_size=params.embed_dim,
                            hidden_size=params.hidden_size,
                            batch_first=True)
        self.out_w = autograd.Variable(torch.randn(params.hidden_size, params.vocab_size))
        self.out_b = autograd.Variable(torch.randn(params.vocab_size))

    def _embed_data(self, src):
        """Embeds a batch of word indices."""
        src_var = autograd.Variable(src)
        embedded = self.embedding(src_var)
        return embedded

    def forward(self, inputs):
        # inputs: LongTensor [batch_size x time_steps]
        # emb_inputs: [bs x ts x emb_size]
        emb_inputs = self._embed_data(inputs)
        log("Input: %s ; Embedded: %s" % (str(inputs.size()), str(emb_inputs.size())))
        # Running the RNN
        # o: [bs x ts x h_size]
        # h: [n_layers x bs x h_size]
        # c: [n_layers x bs x h_size]
        o, (h, c) = self.cell(emb_inputs)
        o = o.contiguous()
        self.o = o
        log("Outputs: %s" % str(o.size()))
        log("h %s" % str(h.size()))
        log("c %s" % str(c.size()))
        # Output projection
        # oo: [bs*ts x h_size]
        # logits: [bs*ts x vocab_size]
        oo = o.view(-1, self.params.hidden_size)
        logits = oo @ self.out_w + self.out_b  # out_b broadcasts over the bs*ts rows
        # Softmax
        prediction = F.log_softmax(logits, dim=-1)
        return prediction
The whole code can be seen here: https://github.com/pltrdy/pytorchwork/blob/master/rnn_lm.ipynb — it's quite experimental (i.e. messy).
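As a quick sanity check on the reshape + projection step above, here's a standalone toy version with made-up sizes (none of these numbers come from the real model):

```python
import torch

bs, ts, h, v = 2, 5, 8, 10          # toy sizes: batch, time, hidden, vocab
o = torch.randn(bs, ts, h)          # stand-in for the RNN output [bs x ts x h]
oo = o.contiguous().view(-1, h)     # flatten time into the batch dim: [bs*ts x h]
w = torch.randn(h, v)               # projection weight, like out_w
b = torch.randn(v)                  # projection bias, like out_b
logits = oo @ w + b                 # b broadcasts over the bs*ts rows
print(logits.shape)                 # torch.Size([10, 10]), i.e. [bs*ts x v]
```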
Trying to work with the GPU, I create a model object and call model.cuda().
The problem then comes from out_w and out_b, which are not CUDA tensors:

print("data type: oo: %s; out_w: %s" % (str(type(oo.data)), str(type(self.out_w.data))))

returns:

data type: oo: <class 'torch.cuda.FloatTensor'>; out_w: <class 'torch.FloatTensor'>
oo's type is OK, but out_w should be torch.cuda.FloatTensor.
Obviously, I could add some .cuda() calls for out_w and out_b in RnnLm.__init__, but that feels like fixing the symptom without learning the cause.
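For what it's worth, here is a minimal standalone repro of what I suspect is going on (the Toy class and attribute names are made up): a tensor wrapped in nn.Parameter gets registered by the module, while a plain Variable/tensor attribute does not, so model.cuda(), which walks the registered parameters and buffers, never sees it.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered with the module: visible to .parameters() and moved by .cuda()
        self.registered = nn.Parameter(torch.randn(3, 4))
        # Plain attribute, like out_w above: invisible to the module machinery
        self.plain = torch.randn(3, 4)

toy = Toy()
names = [name for name, _ in toy.named_parameters()]
print(names)  # only 'registered' shows up; 'plain' would be left on the CPU
```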
Thanks for any help or suggestion.