Is it normal to see such a large difference between training runs?

I’m not sure where to ask this, and it is an embarrassing question, but I’m baffled by my findings. I’m training LeNet-5 on MNIST, about as common an example as it gets. The training loop, in PyTorch, is below; nothing out of the ordinary. I’m running only 10 epochs, which I believe should be enough, with batch size = 512, lr = 0.1, momentum = 0.1 and a one-cycle learning rate schedule. That sounds like a reasonable setup. Here is where I’m baffled: the runs are vastly different depending on the random seed (i.e. a completely different learning curve each time). How can this be? Am I missing something fundamental in my understanding of neural nets (see the attached TensorBoard capture)?

    for epoch in range(params.num_epochs):
        print(f"Epoch: {epoch}")
        model.train()

        for i, (x_train, y_train) in enumerate(train_loader):

            x_train = x_train.to(device)
            y_train = y_train.to(device)

            logits = model(x_train)

            # print(logits.shape, x_train.shape, y_train.shape)
            loss = criterion(logits, y_train)

            writer.add_scalar("Loss/train", loss.item(), epoch * len(train_loader) + i)
            train_losses.append(loss.item())
            # Backward pass and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Calculate accuracy
            acc = accuracy(logits, y_train)
            writer.add_scalar("Acc/train", acc, epoch * len(train_loader) + i)
            train_accuracies.append(acc)
            sched.step()

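            # Log against the per-batch global step; using `epoch` here would
            # stack every batch of an epoch onto the same x-value.
            writer.add_scalar("First layer norm", torch.linalg.norm(model.conv1.weight).item(), epoch * len(train_loader) + i)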
            
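        # Validation: evaluate without tracking gradients
        model.eval()
        epoch_losses, epoch_accs = [], []
        with torch.no_grad():
            for x_test, y_test in test_loader:

                x_test = x_test.to(device)
                y_test = y_test.to(device)

                logits = model(x_test)
                loss = criterion(logits, y_test)
                valid_losses.append(loss.item())
                epoch_losses.append(loss.item())

                # Calculate accuracy
                acc = accuracy(logits, y_test)
                valid_accuracies.append(acc)
                epoch_accs.append(float(acc))

        # Log one averaged point per epoch rather than one point per test
        # batch at the same step (mean of batch means, so a smaller last
        # batch is weighted slightly too heavily).
        writer.add_scalar("Loss/valid", sum(epoch_losses) / len(epoch_losses), epoch * len(train_loader))
        writer.add_scalar("Acc/valid", sum(epoch_accs) / len(epoch_accs), epoch * len(train_loader))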
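
To put a number on how much the runs differ, the per-step lists collected in the loop (for example `train_losses`) could be compared across seeds. This is only a minimal sketch: `summarize_runs` and the `curves` dict are hypothetical names that are not part of the code above, and it assumes every run logged the same number of steps.

```python
from statistics import mean, stdev


def summarize_runs(curves):
    """Per-step mean and sample standard deviation across seeds.

    `curves` is a hypothetical dict mapping seed -> list of metric values,
    one value per logged step (e.g. the `train_losses` list of that run).
    """
    if not curves:
        raise ValueError("need at least one run")

    lengths = {len(values) for values in curves.values()}
    if len(lengths) != 1:
        raise ValueError("all runs must have the same number of steps")

    # zip(*...) regroups the data so each tuple holds one step across all seeds
    per_step = list(zip(*curves.values()))

    means = [mean(step_values) for step_values in per_step]
    # stdev needs at least two runs; report 0.0 spread for a single run
    stds = [stdev(step_values) if len(step_values) > 1 else 0.0
            for step_values in per_step]
    return means, stds
```

Calling `means, stds = summarize_runs(curves)` after several seeded runs gives one mean and one spread value per step, which shows whether the curves only disagree early in training or stay apart until the end.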