Loading same data but getting different results

I’m trying to split my training data into batches manually so that I can access each batch by index, rather than relying on a single DataLoader to do the batching for me (in which case I can’t index into the individual batches). So I tried the following:

import numpy as np
from torch.utils.data import DataLoader, Subset
from torchvision import datasets

train_data = datasets.ANY(root='data', transform=T_train, download=True)
BS = 200
num_batches = len(train_data) // BS
sequence = list(range(len(train_data)))
np.random.shuffle(sequence)  # Shuffle the sample indices
subsets = [Subset(train_data, sequence[i * BS: (i + 1) * BS]) for i in range(num_batches)]
train_loader = [DataLoader(sub, batch_size=BS) for sub in subsets]  # One single-batch loader per subset of BS samples
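For context, here is a minimal runnable sketch of this approach, with a synthetic TensorDataset standing in for datasets.ANY (the dataset, image shape, and sizes here are placeholders, not the original data):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Synthetic stand-in for datasets.ANY: 1000 fake 3x8x8 "images" with labels.
train_data = TensorDataset(torch.randn(1000, 3, 8, 8), torch.randint(0, 10, (1000,)))
BS = 200
num_batches = len(train_data) // BS
sequence = list(range(len(train_data)))
np.random.shuffle(sequence)  # Shuffle the sample indices
subsets = [Subset(train_data, sequence[i * BS: (i + 1) * BS]) for i in range(num_batches)]
train_loader = [DataLoader(sub, batch_size=BS) for sub in subsets]

# Each loader yields exactly one batch, so batch i is reachable by indexing:
x, y = next(iter(train_loader[2]))
print(x.shape)  # torch.Size([200, 3, 8, 8])
```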

Which works during training just fine.

However, when I attempted another way to manually split the training data I got different end results, even with all the same parameters and the following settings:

device = torch.device('cuda')
torch.manual_seed(0)
np.random.seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.cuda.empty_cache()

This time, the only change was how I split the training data:

train_data = list(datasets.ANY(root='data', transform=T_train, download=True))  # Cast into a list
BS = 200
num_batches = len(train_data) // BS
np.random.shuffle(train_data)  # To shuffle the training data
train_loader = [DataLoader(train_data[i*BS: (i+1)*BS], batch_size=BS) for i in range(num_batches)]

But this gives me different results than the first approach, even though (I believe) both approaches are identical ways of manually splitting the training data into batches. I even tried not shuffling at all and loading the data just as it is, but I still got different results (85.2% vs. 81.98% accuracy). I even manually checked that the batches produced by the two methods contain the same images, in the same order.

Not only that, when I load the training data the conventional way as follows:

BS = 200
train_loader = DataLoader(train_data, batch_size=BS, shuffle=True)

I get even more drastically different results!

Can somebody please explain to me why these differences arise, and how to fix them?

You have confirmed that “the loaded images from the batches match” between the first two methods. What happens afterward? Do you set your seed (e.g. torch.manual_seed(0)) before training?

Hey, yeah — the order of the images within each batch is the same across both approaches. And before training, I’ve set the following:

device = torch.device('cuda')
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
np.random.seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.cuda.empty_cache()

Moreover, I would like to share an update that might shed some light:

The T_train transformation contains some random transformations (horizontal flip, crop). With those enabled, training with the first train_loader took 24.79 s/it, while the second train_loader took 10.88 s/it (even though both perform exactly the same number of parameter updates/steps). When I removed the random transformations from T_train, the first train_loader dropped to 16.99 s/it, while the second stayed at 10.87 s/it. So the second train_loader takes the same time with or without the random transformations. I then brought the random transformations back and visualized the image outputs from the second train_loader to verify whether they were being applied, and indeed they were! So this is really confusing, and I’m not quite sure why the two loaders give different results.
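One way to pin down where the time difference comes from is a tiny toy dataset whose “transform” counts its own calls (everything below is a hypothetical stand-in, not datasets.ANY or T_train): it shows that a DataLoader over the Dataset re-runs __getitem__ every epoch, while list(dataset) runs it once up front and later epochs reuse the stored tensors.

```python
import torch
from torch.utils.data import DataLoader, Dataset

calls = {"n": 0}

class TinySet(Dataset):
    """Toy dataset whose __getitem__ counts calls, standing in for a random transform."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        if idx >= self.n:
            raise IndexError  # lets list(ds) terminate via the sequence protocol
        calls["n"] += 1       # this is where a random transform would run
        return torch.zeros(1), 0

ds = TinySet(4)

# Lazy: a DataLoader over the Dataset calls __getitem__ on every epoch.
loader = DataLoader(ds, batch_size=2)
for _ in range(3):
    for batch in loader:
        pass
print(calls["n"])  # 12 -> the "transform" ran in all three epochs

# Eager: list(ds) calls __getitem__ once per sample, at list-creation time.
calls["n"] = 0
cached = list(ds)
loader2 = DataLoader(cached, batch_size=2)
for _ in range(3):
    for batch in loader2:
        pass
print(calls["n"])  # 4 -> the "transform" ran only once, when the list was built
```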

I am unsure why your second train_loader still applies the random transformations after you remove them. Perhaps you need to clear all the variables and re-run everything?

Aside from that, the other pitfall may be nondeterministic algorithms:
https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
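From that note, a short sketch of the relevant switches (torch.use_deterministic_algorithms is available in PyTorch 1.8+; the CUBLAS_WORKSPACE_CONFIG environment variable applies on CUDA 10.2+ and must be set before any CUDA work):

```python
import os
import torch

# Required for deterministic cuBLAS on CUDA >= 10.2, per the linked note.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # raise an error on nondeterministic ops
torch.backends.cudnn.benchmark = False

print(torch.are_deterministic_algorithms_enabled())  # True
```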