Expected input batch_size (5) to match target batch_size (2)

My dataset contains temporal coordinate data of shape (1000, 5, 2). Each (5, 2) element is a sequence of 5 coordinate pairs, like the one below; 1000 such arrays are stacked together to form the dataset.

array([[109, 106],
       [109, 106],
       [110, 109],
       [110, 109],
       [108, 107]])
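
For reference, a minimal sketch of how such arrays can be wrapped into the train_set used by the DataLoader below; the label tensor is a random placeholder, and its (1000, 2) shape is only inferred from the error message, not shown in the post:

    import numpy as np
    import torch
    from torch.utils.data import TensorDataset

    coords = np.random.randint(100, 115, size=(1000, 5, 2))  # 1000 sequences of 5 (x, y) coordinates
    labels = np.random.randint(0, 2, size=(1000, 2))          # placeholder targets: 2 values per sequence (assumption)

    train_set = TensorDataset(torch.from_numpy(coords).float(),
                              torch.from_numpy(labels).long())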

I am feeding it to an LSTM, but I get the following error when passing the output to the loss calculation. My output layer has 2 classes.

ValueError: Expected input batch_size (5) to match target batch_size (2)

The following is the code that I tried:

import torch
import torch.nn as nn
from torch.utils import data

# Create DataLoader
input_size = 2
batch_size = 1
hidden_dim = 12
output_size = 2
epochs = 20
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')
params = {'batch_size': batch_size, 'shuffle': False, 'num_workers': 2, 'pin_memory': True} if use_cuda else {'batch_size': batch_size, 'shuffle': False}

train_loader = data.DataLoader(train_set, **params)
#test_loader = data.DataLoader(test_set, **params)


# Model
class Model(nn.Module):
    def __init__(self, device, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()

        # Defining some parameters
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.device = device
        self.softmax = nn.Softmax(dim=-1)

        #Defining the layers
        # LSTM Layer
        self.lstm = nn.LSTM(input_size, hidden_dim, n_layers, batch_first=True)   
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
    
    def forward(self, x):
        batch_size = x.size(0)

        # Initializing hidden state for first input using method defined below
        hidden = self.init_hidden(batch_size).to(self.device)
        cell_state = self.init_hidden(batch_size).to(self.device)
        
        # Passing in the input and hidden state into the model and obtaining outputs
        out, hidden = self.lstm(x, (hidden,cell_state))
        
        # Reshaping the outputs such that it can be fit into the fully connected layer
        out = out.contiguous().view(-1, self.hidden_dim)
        out = self.fc(out)
        out = self.softmax(out)

        return out, hidden

    def init_hidden(self, batch_size):
        # This method generates the first hidden state of zeros which we'll use in the forward pass
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
        return hidden

# Instantiate the model with hyperparameters
model = Model(device, input_size=input_size, output_size=output_size, hidden_dim=hidden_dim, n_layers=2).to(device)

# Define hyperparameters and loss function 
lr=0.01
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

#Train 
for epoch in range(epochs):
    sum_loss = 0.0
    total = 0 
    for batch_idx, (X, y) in enumerate(train_loader):
        X, y = X.to(device), y.to(device).view(-1)
        optimizer.zero_grad() # Clears existing gradients from the previous step
        output, hidden = model(X)
        loss = criterion(output, y.view(-1).long())
        loss.backward() # Does backpropagation and calculates gradients
        optimizer.step() # Updates the weights accordingly
        sum_loss += loss.item()*y.shape[0]
        total += y.shape[0]
        if (batch_idx+1) % 1000 == 0:
            print('Epoch: {}/{}.............'.format(epoch+1,epochs), end=' ')
            print("Loss: {:.4f}".format(sum_loss/total))

I guess a view or reshape operation might be wrong; in particular, this line of code is worth checking:

        # Reshaping the outputs such that it can be fit into the fully connected layer
        out = out.contiguous().view(-1, self.hidden_dim)

Here you are pushing all "additional" dimensions into dim0, which is the batch dimension and should stay constant.
Use out = out.contiguous().view(out.size(0), -1) instead: it keeps the batch size unchanged and flattens the remaining values into the feature dimension.
If the linear layer then raises a shape mismatch, adapt its in_features to the expected value.
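
To make the difference concrete, here is a small standalone check using the hyperparameters from the posted code (batch_size=1, seq_len=5, hidden_dim=12, batch_first=True):

    import torch

    batch_size, seq_len, hidden_dim = 1, 5, 12
    out = torch.randn(batch_size, seq_len, hidden_dim)  # shape of the LSTM output

    # original reshape: folds the sequence dimension into dim0, so the "batch" grows to 5
    a = out.contiguous().view(-1, hidden_dim)
    print(a.shape)  # torch.Size([5, 12])

    # suggested reshape: keeps dim0 as the batch and flattens the rest into features
    b = out.contiguous().view(out.size(0), -1)
    print(b.shape)  # torch.Size([1, 60])

    # with the second version, the linear layer needs in_features = seq_len * hidden_dim
    fc = torch.nn.Linear(seq_len * hidden_dim, 2)
    print(fc(b).shape)  # torch.Size([1, 2])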

Also, remove the nn.Softmax layer, since nn.CrossEntropyLoss expects raw logits and applies F.log_softmax internally.
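
For illustration, a quick standalone check of what nn.CrossEntropyLoss expects (the values here are just placeholders):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    logits = torch.randn(1, 2)        # raw output of the fc layer, no softmax applied
    target = torch.tensor([1])        # one class index per sample in the batch
    loss = criterion(logits, target)  # log_softmax + NLLLoss are applied internally
    print(loss.item())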

These are the shapes going into and coming out of my architecture, before applying your suggestion to reshape the output:

1. LSTM input: torch.Size([1, 5, 2])
2. LSTM output: torch.Size([1, 5, 100])
3. FC layer input: torch.Size([5, 100])
4. FC layer output: torch.Size([5, 2])

And after applying your suggestion to reshape the LSTM output:

1. LSTM input: torch.Size([1, 5, 2])
2. LSTM output: torch.Size([1, 5, 100])
3. FC layer input: torch.Size([1, 500])
4. FC layer output: torch.Size([1, 2])

According to this post, the output shapes from my architecture when using out = out.contiguous().view(-1, self.hidden_dim) seem right.
When I applied your suggested change, I got the following error:

ValueError: Expected input batch_size (1) to match target batch_size (2).
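
Since the batch sizes still disagree after the reshape fix, it would be worth printing the shapes of the model output and the target right before the loss; a small diagnostic sketch using the names from the code above:

    # Grab one batch and compare the shapes that reach the loss function
    X, y = next(iter(train_loader))
    X, y = X.to(device), y.to(device)

    output, hidden = model(X)
    print("X:", X.shape)            # expected: torch.Size([1, 5, 2])
    print("output:", output.shape)  # torch.Size([1, 2]) after the suggested reshape
    print("y:", y.shape)            # torch.Size([1, 2]) would explain the target batch size of 2

If y really does carry two values per sequence (for example a one-hot pair), then nn.CrossEntropyLoss with a (1, 2) output would instead need a single class index of shape (1,), e.g. via y.argmax(dim=1); whether that is the right conversion depends on what the targets actually represent.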