I want to feed in 18 images of size (3,128,128) into an lstm of 17 layers. I’m a bit confused about what my input should be. Docs mention that the input should be of shape(seq_len, batch_size, input_size), When I draw my 1st batch using a data loader I get a tensor of size (18,3,128,128) Does this mean that my LSTM input is: seq_len =18, batch_size=1, input size =3128128 ? Will this flatten the image to a 3128128 vector? Or do I have to reshape it manually?
I also want to implement Teacher Forcing so I will be modifying the RNN class.
What should forward look like? Here’s what I’m trying but I can’t figure out how to write it.
class trialLSTM(nn.Module): def __init__(self, seq_len, input_size, hidden_size, batch_size, num_layers): super(trialLSTM, self).__init__() self.seq_len = seq_len self.input_size = input_size self.hidden_size = hidden_size self.batch_size = batch_size self.lstm = nn.LSTM(seq_len, batch_size, input_size) def init_hidden(self): # initialize the hidden state and the cell state to zeros hidden = torch.zeros(self.batch_size, self.hidden_size) cell = torch.zeros(self.batch_size, self.hidden_size) if gpu: hidden = hidden.cuda() cell = cell.cuda() return hidden, cell def forward(self, x, (h_0, c_0)): # Incoming x is (18,3,128,128) #do i need to reshape it to (1, 3, 128, 128)?like so: # for i in range(0, 18): # x[i] = x[i].reshape(1, 3, 128, 128) # if yes, do I reshape it here or in the training loop? output = torch.empty(seq_len-1, seq_len-1, seq_len-1) for t in range(seq_len+1): if t==0: hidden, cell = self.lstm(x, (h_0,c_0)) else: hidden, cell = self.lstm(x[t], (h_1,c_1))
I want to pass data of shape (18,3,128,128) (these are 18 images of shape (3,128,128) at a time in LSTM of 17 layers.
at time 0 input = (data, (h_0,c_0)) and output = (h_1,c_1)
at time 1 input = (data, (h_1, c_1)) and output = (h_2_c_2)
at time 16 input = (data, (h_15, c_15)) and output = (h_16, c_16)
so in my training loop:
model = trialLSTM(seq_len=18, input_size=(3*128*128), hidden_size=(3*128*128), batch_size=1, num_layers=17) def train(epoch): for batch_idx, (data, target) in enumerate(train_loader): optimizer.zero_grad() target = data[1,:] if gpu: data, target = data.cuda(), target.cuda() output = model(data, (h_0, c_0) # Here I should get a tensor of 18 hidden states of shape (3,128,128) each right? loss = nn.BCELoss(output, target) loss.backward() optimizer.step()
What am I doing wrong? What should I be doing? Please help!