Image Captioning Example: Doubt about the input size of the Decoder LSTM

Hi, I’m new to PyTorch, and I have a doubt about the Image Captioning example code. In the DecoderRNN class, the LSTM is defined as:

`self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)`

and in the forward function:

```python
embeddings = self.embed(captions)
embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
```

We first embed the captions and then concatenate the embeddings with the context feature from the EncoderCNN. But the concatenation increases the size beyond embed_size, so how can we forward the result to the LSTM, when the LSTM's input size is already defined as embed_size?
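To make the shapes concrete, here is a minimal sketch of the tensors involved, using made-up sizes (embed_size=256, hidden_size=512, etc. are my own assumptions, not values from the example):

```python
import torch
import torch.nn as nn

# Assumed sizes, for illustration only
embed_size, hidden_size, num_layers = 256, 512, 1
batch, seq_len, vocab_size = 4, 10, 1000

embed = nn.Embedding(vocab_size, embed_size)
lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)

features = torch.randn(batch, embed_size)            # stand-in for the EncoderCNN output
captions = torch.randint(0, vocab_size, (batch, seq_len))

embeddings = embed(captions)                         # shape: (4, 10, 256)
# unsqueeze(1) makes features (4, 1, 256); cat along dim 1 (the time dimension)
embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
print(embeddings.shape)                              # torch.Size([4, 11, 256])

out, _ = lstm(embeddings)                            # out: (4, 11, 512)
```

So if I'm reading this right, the concatenation grows the sequence-length dimension (10 → 11), while the last dimension stays embed_size, which is why the LSTM still accepts it. The image feature effectively becomes the first timestep of the input sequence.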

Am I missing something here ?