Sorry for asking this basic question, but I think I was always under the impression that DataLoader's shuffle just reorders the batches without changing the order of the images. For example, my batch size is 2 and my images are: 0, 1, 2, 3, 4, 5, 6, 7
If I call the dataloader with shuffle set to true, I get the following batches: [0, 1], [2, 3], [4, 5], [6, 7] and then the order of these batches is changed, so, in the end, I could get something like: [2, 3], [0, 1], [6, 7], [4, 5]
Is this how shuffle works in the DataLoader, or is the order of the images changed entirely (e.g. to 3, 4, 7, 0, 1, 5, 2, 6) and then converted to batches?
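A quick way to check this empirically is a minimal sketch with a toy tensor standing in for the 8 images (the seed value here is arbitrary, just for reproducibility):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for 8 images: just the numbers 0..7.
data = torch.arange(8)
dataset = TensorDataset(data)

torch.manual_seed(0)  # make the shuffle reproducible
loader = DataLoader(dataset, batch_size=2, shuffle=True)

# With shuffle=True the individual sample indices are permuted first,
# so batches will generally mix images that were never adjacent,
# rather than keeping the consecutive pairs [0, 1], [2, 3], ...
batches = [batch.tolist() for (batch,) in loader]
print(batches)
```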
I guess I had the wrong assumption then. I thought that images were left untouched and only the batches were shuffled around. Also, what if I use a random sampler and then call the dataloader with the random sampler and shuffle set to false. Will I get the behavior where the images are not shuffled but the batches are?
I don’t think we have a fixed-batched random sampler.
It should be “easy” to do: make your dataset have size real_size / batch_size, have it return a whole batch when asked for a single index, then use a regular random sampler and a batch size of 1 for the DataLoader.
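A minimal sketch of that suggestion (the `BatchDataset` name is just illustrative, and it assumes the dataset length divides evenly by the batch size):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BatchDataset(Dataset):
    """Hypothetical wrapper: each index returns one whole fixed batch."""
    def __init__(self, data, batch_size):
        assert len(data) % batch_size == 0  # assumes an even split
        self.data = data
        self.batch_size = batch_size

    def __len__(self):
        # The dataset now has size real_size / batch_size.
        return len(self.data) // self.batch_size

    def __getitem__(self, idx):
        start = idx * self.batch_size
        return self.data[start:start + self.batch_size]

data = torch.arange(8)
# batch_size=1 here: each "sample" is already a full batch, so
# shuffle=True reorders the batches but not the images inside them.
loader = DataLoader(BatchDataset(data, batch_size=2), batch_size=1, shuffle=True)
batches = [b.squeeze(0).tolist() for b in loader]  # drop the leading 1
print(batches)
```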
I have to admit, I’m not sure why it is done this way in this code. Maybe for historical reasons, he was trying other samplers before.
You can double check the doc here to make sure what the behavior will be with shuffle or if you set the sampler.
Hello! I’m quite new here, but I think my question is related to this topic.
When shuffle is True, does it just build batches with random indices, or does it actually shuffle my dataset?
If it shuffles the entire dataset, is my labels tensor shuffled the same way?
I know it can be a stupid question, but since I am building my labels tensor inside my custom dataset class, I don't know if shuffle can break the matching I'm creating.
If shuffle=True is set in the DataLoader, a RandomSampler will be used as seen in these lines of code.
This sampler will create random indices and pass them to the Dataset.__getitem__ method as seen here.
Your data and target correspondence should thus hold, since the same index should be used to load these tensors.
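To make that concrete, here is a small sketch (the dataset and its contents are made up for illustration): the labels are built inside the dataset, and because `__getitem__` uses the same index for data and labels, the pairing survives shuffling.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PairDataset(Dataset):
    """Illustrative dataset: each label is 2x its input, so a broken
    data/label match would be easy to spot."""
    def __init__(self):
        self.data = torch.arange(10).float()
        self.labels = self.data * 2  # built inside the dataset

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # The sampler passes one idx; it indexes both tensors, so the
        # data/label pair stays matched however the indices are shuffled.
        return self.data[idx], self.labels[idx]

loader = DataLoader(PairDataset(), batch_size=4, shuffle=True)
all_matched = all(torch.equal(y, x * 2) for x, y in loader)
print(all_matched)  # True
```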
The idea is to have an extra dimension.
In particular, if you use a TensorDataset, you want to reshape your Tensor from (real_size, ...) to (real_size / batch_size, batch_size, ...) and ask for a batch size of 1 from the DataLoader. That way you will get one batch of size batch_size every time. Note that you get an input of size (1, batch_size, ...) that you might want to reshape to remove the leading 1.
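A minimal sketch of the TensorDataset variant (the sizes here are arbitrary example values):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

real_size, batch_size, n_feat = 8, 2, 3
x = torch.randn(real_size, n_feat)

# Reshape from (real_size, n_feat) to
# (real_size // batch_size, batch_size, n_feat): one "sample" per batch.
x_batched = x.view(real_size // batch_size, batch_size, n_feat)

loader = DataLoader(TensorDataset(x_batched), batch_size=1, shuffle=True)
shapes = []
for (batch,) in loader:
    batch = batch.squeeze(0)  # remove the leading dimension of size 1
    shapes.append(tuple(batch.shape))
print(shapes)  # [(2, 3), (2, 3), (2, 3), (2, 3)]
```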
I know it is an old post, but I came across the exact same problem.
Can you give an example how to do what you suggested?
I'm working on CIFAR, for example, and I want the images in a certain order, so that the dataloader only shuffles between batches and not inside a batch.
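One way to sketch the earlier suggestion for a labeled dataset like CIFAR (the `FixedBatchWrapper` name is hypothetical, and a small fake dataset stands in for CIFAR10 here so the snippet is self-contained):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FixedBatchWrapper(Dataset):
    """Hypothetical wrapper around any map-style (image, label) dataset,
    e.g. torchvision's CIFAR10: each index returns one fixed batch of
    consecutive samples, so shuffling can only reorder whole batches."""
    def __init__(self, base, batch_size):
        self.base = base
        self.batch_size = batch_size

    def __len__(self):
        return len(self.base) // self.batch_size

    def __getitem__(self, idx):
        start = idx * self.batch_size
        items = [self.base[i] for i in range(start, start + self.batch_size)]
        xs = torch.stack([img for img, _ in items])
        ys = torch.tensor([label for _, label in items])
        return xs, ys

# Stand-in for CIFAR: 8 fake 3x4x4 images labeled 0..7 in a fixed order.
fake = [(torch.full((3, 4, 4), float(i)), i) for i in range(8)]
loader = DataLoader(FixedBatchWrapper(fake, batch_size=2),
                    batch_size=1, shuffle=True)
# Batches come out in random order, but each keeps its consecutive pair.
order = [ys.squeeze(0).tolist() for _, ys in loader]
print(order)  # e.g. [[4, 5], [0, 1], [6, 7], [2, 3]]
```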