Training a PyTorch model takes a long time to run

Hello, PyTorch newbie here. I am training a model that maps sequential data of varying length, with shape (batch_size, seq_len, num_features), to a scalar target of shape (batch_size, 1), and I am not sure why my training code runs extremely slowly.

import torch
import torch.nn as nn

num_epochs = 100

class DifficultyModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # nn.LSTM takes (input_size, hidden_size); there is no output_dim argument.
        # batch_first=True matches the (batch, seq_len, features) input layout.
        self.lstm = nn.LSTM(input_dim, hidden_size=1, batch_first=True)

    def forward(self, x):
        output, (hidden, cell) = self.lstm(x)  # output: (batch, seq_len, 1) with batch_first=True
        return torch.sum(output, dim=1)        # (batch, 1), matching the target shape

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

num_features = 6
model = DifficultyModel(num_features).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# train_dataloader yields (X, y) batches with batch_size=16 (configurable); each X has 6 features per time step.
for epoch in range(num_epochs):
    for i, (X, y) in enumerate(train_dataloader):
        X = X.to(device)
        y = y.to(device)

        optimizer.zero_grad()

        output = model(X)
        loss = criterion(output, y)

        loss.backward()
        optimizer.step()

    if epoch % 5 == 0:
        print(f'Epoch {epoch}, Loss {loss.item()}')

Also, after 3 hours of running the code in Google Colab’s GPU runtime, training produced the following output:

Epoch 0, Loss 7248.39453125
Epoch 5, Loss 340.68792724609375
Epoch 10, Loss 320.6564636230469
Epoch 15, Loss 42.046302795410156
Epoch 20, Loss 62.36985778808594
Epoch 25, Loss 23.50816535949707
Epoch 30, Loss 55.912776947021484
Epoch 35, Loss 43.66672134399414
Epoch 40, Loss 27.032011032104492
Epoch 45, Loss 13.70750617980957

How can I be certain that the model is appropriately learning? Any help would be greatly appreciated.

Hi, an LSTM processes its input one time step at a time, so it cannot parallelize across the sequence dimension the way feed-forward layers can; some slowness is expected. Still, it is worth checking that both the model and the data actually end up on the GPU, trying a larger batch size, and, since your sequences have varying lengths, packing the padded batches so the LSTM skips the padding.
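
If your batches are zero-padded to a common length, packing them avoids computing over the padding, which can be a noticeable speedup. Here is a minimal sketch, assuming X is a zero-padded batch and that you have each sequence's true length available (the `lengths` tensor below is hypothetical; substitute however your dataloader tracks lengths):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(6, hidden_size=1, batch_first=True)

X = torch.randn(16, 50, 6)             # zero-padded batch: (batch, seq_len, features)
lengths = torch.randint(1, 51, (16,))  # hypothetical true length of each sequence (CPU tensor)

# Pack so the LSTM only computes over real time steps, not padding
packed = pack_padded_sequence(X, lengths, batch_first=True, enforce_sorted=False)
packed_output, (hidden, cell) = lstm(packed)

# Unpack back to a padded (batch, seq_len, 1) tensor; padded positions are zero,
# so summing over dim=1 still only counts real time steps
output, _ = pad_packed_sequence(packed_output, batch_first=True)
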

The loss is decreasing steadily, so the model is learning something. To judge how well, compare it against a trivial baseline, such as the MSE of always predicting the mean of the training targets; a useful model should end up well below that. Also note that your print shows only the last batch's loss in each epoch, which is noisy; averaging the loss over all batches in the epoch gives a smoother curve.
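
A minimal sketch of that baseline, reusing your train_dataloader (assuming it yields (X, y) pairs as in your training loop):

import torch

# Collect all training targets and compute the MSE of always predicting their mean
ys = torch.cat([y for _, y in train_dataloader])
baseline_mse = torch.mean((ys - ys.mean()) ** 2)
print(f'Mean-predictor baseline MSE: {baseline_mse.item()}')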