Confused about LSTM/RNN data iterator

I am attempting to use an LSTM to classify the type of weather on a particular day at a specific location based on features such as humidity, temperature, etc. My data therefore takes the format [locations, days_for_that_location, number_of_dimensions], which turns out to be [100, x, 8], where x is anywhere between 900 and 6000 - I have more data for some locations than for others. I have to classify each day as Sunny, Rainy, Snowy, etc., so the label dataset has shape [locations, days_per_location, 1], which turns out to be [100, x, 1], where the last dimension is a number from 0 to 6, each number representing a type of weather - so that I can use cross-entropy loss. Below is what I built, which I do not think is correct.

In some sense I am trying to use PoS tagging techniques for this.
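
Before the actual code, here is a dummy version of that data layout just to make the shapes concrete (this is purely an illustration, not how my real Data object is built):

import torch

# 100 locations, each with a variable number of days x, 8 features per day
# and one integer label (0..6) per day
dummy_features = [torch.randn(int(torch.randint(900, 6000, (1,))), 8, dtype=torch.float64)
                  for _ in range(100)]
dummy_labels = [torch.randint(0, 7, (f.shape[0], 1)) for f in dummy_features]

print(dummy_features[0].shape, dummy_labels[0].shape)  # torch.Size([x, 8]) torch.Size([x, 1])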

from torch.utils.data import Dataset
import torch

class WeatherData(Dataset):
    def __init__(self, location):
        # keep a (features, labels) pair for each entry in `location`
        self.samples = []
        for day in location:
            self.samples.append((day['features'], day['lables']))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

BATCH_SIZE = 1
DatasetW = WeatherData(Data)
train_iterator = torch.utils.data.DataLoader(DatasetW, batch_size=BATCH_SIZE)
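
With BATCH_SIZE = 1, one batch from train_iterator should look like this (assuming each sample returned by WeatherData holds one location's whole sequence, so the leading 1 is just the batch dimension):

example_features, example_labels = next(iter(train_iterator))
print(example_features.shape)  # e.g. torch.Size([1, x, 8])  - (batch, days, features)
print(example_labels.shape)    # e.g. torch.Size([1, x, 1])  - (batch, days, 1)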

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lstm1 = nn.LSTM(input_size=8, hidden_size=32, num_layers=2, dropout=0.5)
        self.fc1 = nn.Linear(32, 120)
        self.fc2 = nn.Linear(120, 9)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        # x is a single day's feature vector of shape [8];
        # the two unsqueezes make it [seq_len=1, batch=1, input_size=8] for the LSTM
        x = x.unsqueeze(0)
        x = x.unsqueeze(0)
        x, _ = self.lstm1(x)        # -> [1, 1, 32]
        x = F.dropout(x, p=0.9)
        x = F.relu(self.fc1(x))     # -> [1, 1, 120]
        x = F.relu(self.fc2(x))     # -> [1, 1, 9]
        x = self.softmax(x)
        return x


net = Net()
net = net.double()
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
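
To document what the current network produces, here is a quick shape check with one made-up day (double precision to match the net.double() call above; dummy_day and dummy_label are just placeholders):

dummy_day = torch.randn(8, dtype=torch.float64)  # one day's 8 features
out = net(dummy_day)                             # forward() unsqueezes this to [1, 1, 8]
print(out.shape)                                 # torch.Size([1, 1, 9])

dummy_label = torch.tensor([3])                  # made-up class index
loss = criterion(out.squeeze(0), dummy_label)    # CrossEntropyLoss expects [N, C] scores vs [N] targets
print(loss.item())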

Here is where the issue most likely lies - I do not think I should be passing in each data point one at a time.

for epoch in range(100):  # loop over the dataset multiple times

    running_loss = 0.0
    o_lst = []  # predicted class for each day seen this epoch
    l_lst = []  # true label for each day seen this epoch
    for i, data in enumerate(train_iterator, 0):
        inputs, labels = data

        optimizer.zero_grad()
        # step through the sequence one day at a time, updating after every day
        for data_point, lab in zip(inputs[0], labels[0][0]):
            outputs = net(data_point)
            lab = lab.unsqueeze(0)
            outputs = outputs.squeeze(0)

            loss = criterion(outputs, lab)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

            o_lst.append(outputs.argmax().item())
            l_lst.append(lab.item())

    print("Epoch: ", epoch, "Loss: ", running_loss)

print('Finished Training')
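
If it is useful, a rough training accuracy can be read off the o_lst / l_lst lists collected above:

train_acc = sum(int(p == t) for p, t in zip(o_lst, l_lst)) / len(l_lst)
print('rough training accuracy:', train_acc)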

I am certain this is wrong because I am only passing in one data point at a time, and I believe I should be using the LSTM differently. I am confused about how to create a dataset, and a corresponding model, that can handle variable input lengths and take advantage of batching if possible.

The current model works in the sense that it runs, but it does not learn - the loss is stuck - which, to be honest, is not surprising.
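
For reference, one common pattern for batching variable-length sequences is to pad them in a custom collate_fn with torch.nn.utils.rnn.pad_sequence. Below is a rough sketch of that idea; the function name, the batch size of 4 and the -1 padding value are arbitrary choices, and it assumes the features/labels stored in WeatherData are tensors of shape [num_days, 8] and [num_days, 1]. I am not sure whether this is the right fit here:

from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    # batch is a list of (features, labels) pairs, one per location
    feats = [f for f, _ in batch]
    labs = [l.view(-1) for _, l in batch]            # flatten [num_days, 1] -> [num_days]
    lengths = torch.tensor([len(f) for f in feats])  # true lengths, needed for packing/masking later
    feats_padded = pad_sequence(feats, batch_first=True)                  # [batch, max_days, 8]
    labs_padded = pad_sequence(labs, batch_first=True, padding_value=-1)  # -1 marks padded days
    return feats_padded, labs_padded, lengths

padded_iterator = torch.utils.data.DataLoader(DatasetW, batch_size=4, collate_fn=pad_collate)

Presumably the padded label positions would then have to be ignored by the loss (e.g. nn.CrossEntropyLoss(ignore_index=-1)) and the true lengths passed to pack_padded_sequence before the LSTM, but that is exactly the part I am unsure about.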