Hi,
I want to feed 18 images of size (3,128,128) into an LSTM with 17 layers, and I’m a bit confused about what my input should be. The docs say the input should have shape (seq_len, batch_size, input_size). When I draw my first batch from a data loader, I get a tensor of size (18,3,128,128). Does this mean that my LSTM input is seq_len=18, batch_size=1, input_size=3*128*128? Will the LSTM flatten each image to a 3*128*128 vector, or do I have to reshape it manually?
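To make the shape question concrete, here is a minimal sketch of the manual reshape, assuming the 18 images form one sequence with a batch size of 1. As far as I can tell, nn.LSTM expects a 3-D (seq_len, batch, input_size) tensor and does not flatten images on its own. The tensor `frames` is a random stand-in for one batch from the data loader, not real data.

```python
import torch

# Stand-in for one batch from the data loader: 18 frames of shape (3, 128, 128).
frames = torch.randn(18, 3, 128, 128)

# Flatten each frame to a vector, then insert a batch dimension of size 1,
# giving the (seq_len, batch, input_size) layout that nn.LSTM expects by default.
seq = frames.view(18, -1).unsqueeze(1)

print(seq.shape)  # torch.Size([18, 1, 49152])
```

The reshape can live either in the training loop or at the top of forward; doing it in forward keeps the data loader untouched.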
I also want to implement Teacher Forcing so I will be modifying the RNN class.
What should forward look like? Here’s what I’m trying but I can’t figure out how to write it.
    class trialLSTM(nn.Module):
        def __init__(self, seq_len, input_size, hidden_size, batch_size, num_layers):
            super(trialLSTM, self).__init__()
            self.seq_len = seq_len
            self.input_size = input_size
            self.hidden_size = hidden_size
            self.batch_size = batch_size
            self.lstm = nn.LSTM(seq_len, batch_size, input_size)

        def init_hidden(self):
            # initialize the hidden state and the cell state to zeros
            hidden = torch.zeros(self.batch_size, self.hidden_size)
            cell = torch.zeros(self.batch_size, self.hidden_size)
            if gpu:
                hidden = hidden.cuda()
                cell = cell.cuda()
            return hidden, cell

        def forward(self, x, (h_0, c_0)):
            # Incoming x is (18,3,128,128)
            # do I need to reshape it to (1, 3, 128, 128)? like so:
            # for i in range(0, 18):
            #     x[i] = x[i].reshape(1, 3, 128, 128)
            # if yes, do I reshape it here or in the training loop?
            output = torch.empty(seq_len-1, seq_len-1, seq_len-1)
            for t in range(seq_len+1):
                if t==0:
                    hidden, cell = self.lstm(x[0], (h_0,c_0))
                else:
                    hidden, cell = self.lstm(x[t], (h_1,c_1))
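For comparison, a hedged sketch of what a step-by-step forward with teacher forcing could look like. Everything here is an assumption made for illustration, not a confirmed solution: the class name `StepLSTM`, the `readout` layer, the `teacher_forcing` flag, the sigmoid on the output, and the small sizes (8x8 frames, hidden size 32, 2 layers) that keep it cheap to run. Two things it relies on that I am fairly sure of: the nn.LSTM constructor takes (input_size, hidden_size, num_layers), and the initial hidden and cell states have shape (num_layers, batch, hidden_size).

```python
import torch
import torch.nn as nn


class StepLSTM(nn.Module):
    """Hypothetical next-frame predictor; names and sizes are illustrative."""

    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # Sequence length and batch size are read from the input tensor,
        # so they are not constructor arguments of nn.LSTM.
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        # Assumed read-out layer mapping the top hidden state back to frame size.
        self.readout = nn.Linear(hidden_size, input_size)

    def init_hidden(self, batch_size, device=None):
        # One (h, c) slice per layer: (num_layers, batch, hidden_size).
        shape = (self.num_layers, batch_size, self.hidden_size)
        return torch.zeros(shape, device=device), torch.zeros(shape, device=device)

    def forward(self, x, state, teacher_forcing=True):
        # x: (seq_len, batch, input_size), already flattened.
        outputs = []
        step_input = x[0]
        for t in range(x.size(0) - 1):
            # Feed a single time step; unsqueeze restores the seq_len dimension.
            out, state = self.lstm(step_input.unsqueeze(0), state)
            pred = torch.sigmoid(self.readout(out.squeeze(0)))
            outputs.append(pred)
            # Teacher forcing: the next input is the true frame, not the prediction.
            step_input = x[t + 1] if teacher_forcing else pred
        return torch.stack(outputs), state


model = StepLSTM(input_size=3 * 8 * 8, hidden_size=32, num_layers=2)
x = torch.rand(18, 1, 3 * 8 * 8)
preds, (h_n, c_n) = model(x, model.init_hidden(batch_size=1))
print(preds.shape)  # torch.Size([17, 1, 192])
```

With 18 frames this produces 17 predictions, one for each of frames 1 to 17.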
TL;DR
I want to pass data of shape (18,3,128,128), i.e. 18 images of shape (3,128,128) at a time, into an LSTM of 17 layers.
at time 0 input = (data[0], (h_0,c_0)) and output = (h_1,c_1)
at time 1 input = (data[1], (h_1, c_1)) and output = (h_2, c_2)
…
at time 15 input = (data[15], (h_15, c_15)) and output = (h_16, c_16)
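The recurrence listed above runs over time steps, which is a separate thing from num_layers (the number of LSTM layers stacked on top of each other). A small sketch, with toy sizes chosen purely for illustration, comparing one nn.LSTM call over the whole sequence against the same recurrence written out one step at a time with (h, c) carried forward:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=12, hidden_size=5, num_layers=1)
data = torch.randn(18, 1, 12)   # (seq_len, batch, input_size)
h = torch.zeros(1, 1, 5)        # (num_layers, batch, hidden_size)
c = torch.zeros(1, 1, 5)

# One call over the whole sequence: the loop over time happens inside nn.LSTM.
full_out, (h_full, c_full) = lstm(data, (h, c))

# The same recurrence, one time step per call, passing the state along.
step_outs = []
for t in range(data.size(0)):
    out, (h, c) = lstm(data[t:t + 1], (h, c))
    step_outs.append(out)
step_out = torch.cat(step_outs)

# Both routes should agree up to floating-point tolerance.
print(torch.allclose(full_out, step_out, atol=1e-5))
```

So a manual loop is only needed when the input at step t depends on the output of step t-1, for example when teacher forcing is switched off.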
so in my training loop:
    model = trialLSTM(seq_len=18, input_size=(3*128*128), hidden_size=(3*128*128),
                      batch_size=1, num_layers=17)

    def train(epoch):
        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            target = data[1,:]
            if gpu:
                data, target = data.cuda(), target.cuda()
            output = model(data, (h_0, c_0)
            # Here I should get a tensor of 18 hidden states of shape (3,128,128) each right?
            loss = nn.BCELoss(output, target)
            loss.backward()
            optimizer.step()
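A hedged sketch of a single training step for next-frame prediction, under several assumptions of mine: frames are scaled to [0, 1] (which nn.BCELoss expects), the target is the input sequence shifted by one frame, and the sizes are shrunk to 8x8 frames with hidden size 32 and 2 layers. The shrinking is deliberate: by my arithmetic, an LSTM layer with input_size = hidden_size = 3*128*128 would need roughly 19 billion weights. Note that nn.BCELoss is a module that is instantiated first and then called on (output, target).

```python
import torch
import torch.nn as nn

# Small stand-in sizes (assumption) so the sketch runs quickly.
seq_len, batch, feat, hidden = 18, 1, 3 * 8 * 8, 32
lstm = nn.LSTM(feat, hidden, num_layers=2)
readout = nn.Linear(hidden, feat)  # assumed layer mapping hidden state to a frame
params = list(lstm.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
criterion = nn.BCELoss()  # instantiate once, then call on (output, target)

# Stand-in for one batch from train_loader, with values in [0, 1).
data = torch.rand(seq_len, 3, 8, 8)
seq = data.view(seq_len, batch, feat)

optimizer.zero_grad()
inputs, target = seq[:-1], seq[1:]    # frame t is used to predict frame t + 1
out, _ = lstm(inputs)                 # initial (h_0, c_0) default to zeros
output = torch.sigmoid(readout(out))  # (17, 1, feat), values in (0, 1)
loss = criterion(output, target)
loss.backward()
optimizer.step()
print(output.shape, loss.item())
```

Feeding the true frames seq[:-1] in one call is equivalent to teacher forcing at every step, since each input is the ground-truth frame rather than the model's previous prediction.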
What am I doing wrong? What should I be doing? Please help!