Customizing the batch with specific elements

Hey, I am a fresh starter with PyTorch. Strangely, I cannot find anything related to this, although it seems rather simple.

I want to structure my batches with specific examples, e.g. all examples in a batch having the same label, or filling the batch with examples from just two classes.

How would I do that? To me it seems the right place for this is the DataLoader and not the Dataset, since the DataLoader is responsible for the batches?

Is there a simple minimal example?

You could write a custom sampler and use the current implementations as the base class.
The sampler is responsible for creating the indices, which are then passed to Dataset.__getitem__.
You could thus use the target tensor and create batches of indices using your custom sampling logic.
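For instance, here is a minimal sketch of such a batch sampler (the class name SingleClassBatchSampler and the grouping logic are my own illustration, not an existing PyTorch class). Since the DataLoader's batch_sampler argument accepts any iterable that yields lists of indices, no base class is strictly required:

```python
import random
from collections import defaultdict

class SingleClassBatchSampler:
    """Yields batches of indices where every sample shares the same label.

    Illustrative sketch: any iterable of index lists can be passed to
    DataLoader via the batch_sampler argument, so plain Python suffices.
    """
    def __init__(self, labels, batch_size):
        self.batch_size = batch_size
        # group sample indices by their label
        self.index_groups = defaultdict(list)
        for idx, label in enumerate(labels):
            self.index_groups[label].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.index_groups.values():
            random.shuffle(indices)
            # split each class's indices into fixed-size batches,
            # dropping the last incomplete batch
            for i in range(0, len(indices) - self.batch_size + 1, self.batch_size):
                batches.append(indices[i:i + self.batch_size])
        random.shuffle(batches)  # mix the per-class batches across the epoch
        return iter(batches)

    def __len__(self):
        # a batch_sampler's __len__ is the number of batches per epoch
        return sum(len(v) // self.batch_size for v in self.index_groups.values())
```

You could then pass it to the DataLoader via e.g. `DataLoader(dataset, batch_sampler=SingleClassBatchSampler(targets, batch_size=4))`.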
In case you would like to use weighted sampling, you could use WeightedRandomSampler instead.
Note, however, that this sampler will not guarantee a specific number of samples per class in each batch.
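A small sketch of how WeightedRandomSampler could be used to counter class imbalance (the targets tensor here is made up for illustration):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# hypothetical targets of an imbalanced 2-class dataset
targets = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# weight each sample inversely to its class frequency
class_counts = torch.bincount(targets)
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[targets]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(targets),
                                replacement=True)

# pass it via: DataLoader(dataset, batch_size=..., sampler=sampler)
drawn = list(sampler)  # indices drawn for one epoch
```

Since the indices are drawn randomly (with replacement), the per-batch class counts will only be balanced in expectation, not exactly.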

What do you mean by target tensor?

The tensor returned by the Dataset containing the target values.
E.g. for multi-class classification it would contain the class indices, which would be passed to nn.CrossEntropyLoss.
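For illustration, a toy Dataset whose target tensor stores class indices (the class name, shapes, and values here are made up):

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Illustrative dataset: targets hold class indices in [0, num_classes)."""
    def __init__(self):
        self.data = torch.randn(6, 4)                    # 6 samples, 4 features
        self.targets = torch.tensor([0, 1, 2, 0, 1, 2])  # the "target tensor"

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]
```

These class indices are exactly the format nn.CrossEntropyLoss expects as its target, and a custom sampler could read `dataset.targets` to build its batches.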

I created a sampler and use it in the DataLoader via the batch_sampler argument. Now I need a __len__ method. What should it return? The number of all data points, or the number of batches the iterator returns?

That is unclear and confusing, and in the link mentioned above it is not well documented…

The Dataset.__len__ method returns the number of samples that should be drawn from the Dataset.

Sorry, but we are talking about the sampler, and in the code above in the link you can see that samplers have a __len__ method. I get an error because I need len(dataloader)…

Why are you talking about the Dataset now? PyTorch is really confusing…