Hello everyone, hope you are doing great.
I have a simple question that might seem dumb, but here it goes.
Why do I get a single hidden state for the whole batch when I try to process each timestep separately?
Consider this simple case:
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size, num_layers=1, bidirectional=False, dropout=0.3):
        super().__init__()
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bidirectional = bidirectional
        self.dropout = dropout
        self.lstm = nn.LSTM(input_size=embedding_dim,
                            hidden_size=self.hidden_size,
                            num_layers=self.num_layers,
                            batch_first=True,
                            dropout=self.dropout,
                            bidirectional=self.bidirectional)
        self.embedding = nn.Embedding(self.vocab_size, embedding_dim)
    def forward(self, x, h):
        x = self.embedding(x)
        out = []
        for t in range(x.size(1)):  # iterate over timesteps (batch_first, so time is dim 1)
            xt = x[:, t]
            print(f'{t}')
            outputs, hidden = self.lstm(xt, h)
            out.append((outputs, hidden))
            print(f'{outputs.shape=}')   # prints (batchsize, timestep, hiddensize)
            print(f'{hidden[0].shape=}') # prints (1, hiddensize)
            print(f'{hidden[1].shape=}') # prints (1, hiddensize)
        return out
enc = Encoder(vocab_size=50, embedding_dim=75, hidden_size=100)
xt = torch.randint(0, 50, size=(5, 30))
h = None
enc(xt, None)
Now I’m expecting the hidden state to come out as (batchsize, hiddensize), the same way my outputs come out fine as (batchsize, timestep, hiddensize). But for some reason the hidden state's shape is (1, hiddensize), not (5, hiddensize), which is the batch size. Basically I'm getting a single hidden state for the whole batch at each iteration, yet the outputs look correct!?
Obviously this doesn't happen if I feed the whole input sequence in all at once, but I need to grab each timestep separately for my Bahdanau attention mechanism. I'm not sure why this is happening.
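For reference, here is a minimal sketch (not my real model, just the same sizes as above: batch=5, seq=30, embed=75, hidden=100) of the whole-sequence case I mean, where the hidden state does come back with the batch dimension:

import torch
import torch.nn as nn

embedding = nn.Embedding(50, 75)
lstm = nn.LSTM(input_size=75, hidden_size=100, num_layers=1, batch_first=True)

x = torch.randint(0, 50, size=(5, 30))   # (batch, seq)
emb = embedding(x)                       # (5, 30, 75)
outputs, (h_n, c_n) = lstm(emb)          # feed the full sequence in one call
print(outputs.shape)  # torch.Size([5, 30, 100]) -> (batch, seq, hidden)
print(h_n.shape)      # torch.Size([1, 5, 100])  -> (num_layers, batch, hidden)
print(c_n.shape)      # torch.Size([1, 5, 100])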
Any help is greatly appreciated.