Dataloader batching index

Hi there,
I am trying to do the following.
Given a sequence of images, say pic1, pic2, … pic32, with a batch size of 4, I get batches like:
[pic1, pic2, pic3, pic4], [pic5, pic6, pic7, pic8], etc.
But what I want is for the next batch not to start at element batch_size+1, but to share some elements with the previous batch, e.g.
[pic1, pic2, pic3, pic4], [pic4, pic5, pic6, pic7].

Or just start each batch at a random position in the image sequence.

Cheers.
Xw

Hi WenXiaowei,
You wanted to select batches from your sequence at random positions, as you mentioned above. For that, you can shuffle the images first and then select them. For the overlapping batches, I think torch.Tensor.unfold will be helpful, with a little trick. Put all your pictures in one tensor and unfold its first dimension into windows of your batch size with 25 percent overlap (for a batch size of 4, that means size=4 and step=3).
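To make the unfold suggestion concrete, here is a minimal sketch. The `images` tensor and its shape are made-up stand-ins, and unfolding an index range (rather than the image tensor directly) is just one way to do the "little trick", not necessarily what was meant above.

```python
import torch

# Hypothetical stand-in for 32 ordered images, each of shape (3, 8, 8).
images = torch.randn(32, 3, 8, 8)

batch_size = 4
overlap = 1                    # 25 percent of a batch of 4
step = batch_size - overlap    # each window starts 3 images after the previous one

# Unfold the index range instead of the image tensor: Tensor.unfold appends
# the window dimension last, which is awkward for image data.
windows = torch.arange(images.shape[0]).unfold(0, batch_size, step)

# Index the images with the window indices: (num_windows, batch_size, 3, 8, 8).
batches = images[windows]
```

Here `windows[0]` holds indices 0..3 and `windows[1]` holds 3..6, so consecutive batches share one image and the order is preserved. Trailing images that do not fill a complete window are dropped.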

Hi, thank you for the reply.

I cannot shuffle them because the order is important.
I’ll try the second option :smiley:

Ty

Solved by using a custom batch_sampler when creating the DataLoader, instead of a simple sampler.

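from torch.utils.data import DataLoader

# SequenceSampler is a custom batch sampler that yields lists of dataset indices.
loader = DataLoader(
    dataset,
    batch_sampler=SequenceSampler(len(dataset), batch_size),
    num_workers=num_workers,
    pin_memory=pin_memory,
    collate_fn=collate_fn,
)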
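
The SequenceSampler class itself was not posted, so the following is only a guess at what such a batch sampler could look like. The `overlap` parameter and the choice to drop a trailing partial window are assumptions of this sketch. It is written in plain Python, since `batch_sampler` accepts any iterable that yields lists of indices.

```python
class SequenceSampler:
    """Hypothetical batch sampler: consecutive index windows that share
    `overlap` items with the previous window."""

    def __init__(self, num_samples, batch_size, overlap=1):
        if not 0 <= overlap < batch_size:
            raise ValueError("overlap must satisfy 0 <= overlap < batch_size")
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.step = batch_size - overlap

    def _starts(self):
        # Only full windows are produced; a trailing partial window is dropped.
        return range(0, self.num_samples - self.batch_size + 1, self.step)

    def __iter__(self):
        for start in self._starts():
            yield list(range(start, start + self.batch_size))

    def __len__(self):
        return len(self._starts())


# 32 items with a batch size of 4, as in the original question.
sampler = SequenceSampler(32, 4)
index_batches = list(sampler)
first_two = index_batches[:2]  # [[0, 1, 2, 3], [3, 4, 5, 6]]
```

An instance like this can be passed as `batch_sampler=` exactly as in the post above. Note that `batch_size`, `shuffle`, `sampler`, and `drop_last` must then be left at their defaults, because `batch_sampler` is mutually exclusive with them.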