DataLoader on CPU

I’ve been working through a Udemy course, and the following notebook illustrates a HUGE performance cost I am experiencing on an M1 Max Mac Studio:
https://www.learnpytorch.io/04_pytorch_custom_datasets/

I’m convinced the problem isn’t training/testing/inference or anything pertaining to processing the neural net. The problem is the DataLoader. Simply iterating over the DataLoader is drastically slower on the Mac than on Google Colab. This has nothing to do with the GPU, either the T4 on Colab or the M1 GPU. On Colab I have not even enabled the GPU runtime, and on both computers I am not (for this demonstration of the problem) moving the model to the GPU device.

After building the image loader, the model, the train/test functions, etc., there is a fairly normal-looking PyTorch loop over the epochs. For the purpose of the demo, I have disabled all NN functionality by putting a ‘continue’ at the top of the batch loops in the train/test functions, like this:

    for batch, (X, y) in enumerate(dataloader):
        continue  # <=== Just test the dataloader, not any NN functionality
        X, y = X.to(device), y.to(device)
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
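
The same measurement can be taken without touching the train/test functions at all. Below is a minimal sketch of that idea; `time_iteration` is a hypothetical helper, not something from the course notebook. It accepts any re-iterable object, so a DataLoader can be passed in directly.

```python
import time


def time_iteration(iterable, epochs=5):
    """Iterate over `iterable` `epochs` times, doing no work per item.

    Returns a list of (item_count, elapsed_seconds) tuples, one per epoch.
    """
    results = []
    for _ in range(epochs):
        start = time.perf_counter()
        count = 0
        for _item in iterable:
            # Deliberately empty: only the cost of producing items is measured.
            count += 1
        results.append((count, time.perf_counter() - start))
    return results
```

For example, `time_iteration(train_dataloader, epochs=5)` (where `train_dataloader` stands for whatever the notebook's training DataLoader is called) should give per-epoch timings that can be compared directly between Colab and the Mac.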

I then ran the cell that actually runs the epoch loop. As per the course instructor’s intent, it runs quickly on Colab. The total 5-epoch runtime is 3.38s.

But the exact same notebook, downloaded and unmodified, when run on my 10-core M1 Max Mac Studio, takes 302s to run.

The entire dataset this educational notebook uses is 16.2MB. Before anyone responds asking whether anything would obviously make the data big (image size, whatever), please note my comparison above. It runs in 3 or 4 seconds, as intended, on Colab. So the code itself, and the task the instructor designed around it, including the size of the dataset, is already designed to be small and fast.
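
For anyone who wants to confirm that both environments hold the same data, here is a minimal sketch that counts the files and total bytes under a directory. `dataset_footprint` is a hypothetical helper, and the path in the usage note is a placeholder, not the notebook's actual location.

```python
from pathlib import Path


def dataset_footprint(root):
    """Return (file_count, total_bytes) for all regular files under `root`."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes
```

Calling `dataset_footprint("path/to/dataset")` on each machine and comparing the two tuples would rule out a difference in the data itself.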

This is a direct difference between these two environments.

I saw very similar results when I ran on an iMac Pro, although I don’t have the comparison in front of me at the moment.

Does anyone have any idea what could possibly be going on here?

Thank you.