Hello, PyTorch newbie here. I am training a model that maps variable-length sequential data with dimensions `(batch_size, seq_len, num_features)` to one scalar per sequence, `(batch_size, 1)`, and I am unsure why my training code runs extremely slowly.
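To make the shapes concrete, here is a minimal sketch using random data. The sequence length of 50, the use of `hidden_size=1`, and `batch_first=True` are assumptions for illustration only, not values taken from my actual dataset.

```python
import torch
import torch.nn as nn

# Illustrative sizes; seq_len=50 is an arbitrary assumption.
batch_size, seq_len, num_features = 16, 50, 6

# Random stand-in for one batch of sequential data.
x = torch.randn(batch_size, seq_len, num_features)

# hidden_size=1 gives one value per time step; batch_first=True
# tells the LSTM that the batch dimension comes first.
lstm = nn.LSTM(input_size=num_features, hidden_size=1, batch_first=True)
output, (hidden, cell) = lstm(x)

# Summing over the time dimension leaves one scalar per sequence.
pred = output.sum(dim=1)

shapes = (tuple(x.shape), tuple(output.shape), tuple(pred.shape))
print(shapes)
```

The three printed shapes are the input batch, the per-step LSTM output, and the per-sequence prediction.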

```
import torch
import torch.nn as nn

num_epochs = 100

class DifficultyModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # nn.LSTM has no `output_dim` argument; hidden_size=1 yields one value per time step.
        # batch_first=True matches the (batch_size, seq_len, num_features) input layout.
        self.lstm = nn.LSTM(input_dim, hidden_size=1, batch_first=True)

    def forward(self, x):
        output, (hidden, cell) = self.lstm(x)  # output: (batch_size, seq_len, 1)
        # Sum over time steps, then drop the last dimension: (batch_size,).
        # The target must have the same shape, or MSELoss will broadcast.
        return torch.sum(output, dim=1).squeeze(1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_features = 6
model = DifficultyModel(num_features).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# train_dataloader (defined elsewhere) yields batches with 6 features;
# batch_size defaults to 16 but can be configured.
for epoch in range(num_epochs):
    for i, (X, y) in enumerate(train_dataloader):
        X = X.to(device)
        Y = y.to(device)
        optimizer.zero_grad()
        output = model(X)
        loss = criterion(output, Y)
        loss.backward()
        optimizer.step()
    # Report the loss of the last batch every 5 epochs.
    if epoch % 5 == 0:
        print(f'Epoch {epoch}, Loss {loss.item()}')
```

After 3 hours of running the code on Google Colab's GPU runtime, the training output was:

```
Epoch 0, Loss 7248.39453125
Epoch 5, Loss 340.68792724609375
Epoch 10, Loss 320.6564636230469
Epoch 15, Loss 42.046302795410156
Epoch 20, Loss 62.36985778808594
Epoch 25, Loss 23.50816535949707
Epoch 30, Loss 55.912776947021484
Epoch 35, Loss 43.66672134399414
Epoch 40, Loss 27.032011032104492
Epoch 45, Loss 13.70750617980957
```

How can I be certain that the model is learning properly? Any help would be greatly appreciated.
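For context, here is a minimal sketch of one common way to check this, assuming a held-out validation set is available. It averages the loss over every batch each epoch (the value printed above is only the last batch's loss, so it is noisy) and compares it with the loss on data the model never trains on. `TinyModel`, `make_batches`, `mean_loss`, the learning rate, and the synthetic data are all hypothetical stand-ins, not my real model or dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyModel(nn.Module):
    """Hypothetical stand-in for the real model."""
    def __init__(self, input_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_size=1, batch_first=True)

    def forward(self, x):
        output, _ = self.lstm(x)
        return output.sum(dim=1)  # shape: (batch_size, 1)

def make_batches(n_batches, batch_size=16, seq_len=10, num_features=6):
    """Build synthetic (X, y) batches; the target is an arbitrary function of X."""
    batches = []
    for _ in range(n_batches):
        X = torch.randn(batch_size, seq_len, num_features)
        y = X[:, :, 0].mean(dim=1, keepdim=True)  # shape: (batch_size, 1)
        batches.append((X, y))
    return batches

def mean_loss(model, batches, criterion):
    """Average the loss over all samples without updating the model."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for X, y in batches:
            total += criterion(model(X), y).item() * X.size(0)
            count += X.size(0)
    model.train()
    return total / count

model = TinyModel(6)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
train_batches, val_batches = make_batches(8), make_batches(2)

history = []  # one (train_loss, val_loss) pair per epoch
for epoch in range(3):
    for X, y in train_batches:
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    history.append((mean_loss(model, train_batches, criterion),
                    mean_loss(model, val_batches, criterion)))
    print(f"Epoch {epoch}: train {history[-1][0]:.4f}, val {history[-1][1]:.4f}")
```

If both averages fall over the epochs, the model is probably learning. If the training loss falls while the validation loss rises, that suggests overfitting.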