How are the batches created and sampled by the DataLoader?

Dear community,

if I create my batches from a data set using

batches = torch.utils.data.DataLoader(train_set, batch_size=bs, shuffle=True)

will any data point appear in exactly one batch? That is, do the batches together make up my entire data set?
Or is each batch sampled independently from the entire data set, so that some data points might not end up in any batch while the same data point might appear in several batches?

Also, if I do this:

    for epoch in range(epochs):
        for batch_idx, (features, targets) in enumerate(batches):
            output = serial_net(features)

Will the order of the batches fed into the network be the same in every epoch?

Best,
PiF

If you are using the shuffle option in the DataLoader, a RandomSampler will be created as seen here. This sampler sets the replacement argument to False by default, so a random permutation of the indices is applied as seen here.
This makes sure that each sample is only drawn once in this setup.
Note that the last batch might contain fewer than batch_size samples if the length of the dataset (the number of samples) is not evenly divisible by batch_size. If you are using drop_last=True in the DataLoader, this last smaller batch will be removed and the epoch will not contain all samples from the dataset.
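To make that concrete, here is a minimal sketch using a toy TensorDataset (a stand-in for your train_set, so the names and sizes are assumptions). It checks that every sample is drawn exactly once per epoch and shows the smaller last batch:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy dataset with 10 samples (stand-in for train_set)
    features = torch.arange(10).float().unsqueeze(1)
    targets = torch.arange(10)
    train_set = TensorDataset(features, targets)

    # drop_last defaults to False, so the incomplete last batch is kept
    loader = DataLoader(train_set, batch_size=3, shuffle=True)

    seen = []
    for x, y in loader:
        seen.extend(y.tolist())

    print(sorted(seen))  # [0, 1, ..., 9] -> every sample appears exactly once
    # 10 is not divisible by 3, so the last batch holds only 1 sample;
    # with drop_last=True that batch would be dropped and the epoch
    # would not contain all samples from the dataset.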

No, it’ll be reshuffled in each epoch.
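A quick way to see this, again with a toy dataset (an assumption, not your code):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.arange(8))
    loader = DataLoader(dataset, batch_size=4, shuffle=True)

    for epoch in range(2):
        order = [batch.tolist() for (batch,) in loader]
        print(f"epoch {epoch}: {order}")
    # Example output (values will vary):
    # epoch 0: [[5, 2, 7, 0], [3, 6, 1, 4]]
    # epoch 1: [[1, 4, 0, 6], [7, 3, 2, 5]]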

Thank you very much for clarifying!