Efficiency of dataloader and collate for large array-like datasets

For `DataLoader`, set `batch_size=None` (which disables automatic batching) and pass your new sampler via `sampler=`.

Do you mean `batch_sampler=` instead of `sampler=`? I learned from the docs that `batch_sampler=` yields a batch of indices at a time, which suits this case. Here’s the doc: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

  • sampler (Sampler, optional) – defines the strategy to draw samples from the dataset. If specified, shuffle must be False.
  • batch_sampler (Sampler, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
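For what it’s worth, the `batch_size=None` + `sampler=` suggestion does work: with automatic batching disabled, the loader passes whatever the sampler yields straight to `dataset.__getitem__`, so a sampler that yields lists of indices gives you one vectorized array read per batch. A minimal sketch (the `ArrayDataset` and `BatchIndexSampler` names are just illustrative, not part of torch):

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class ArrayDataset(Dataset):
    """Hypothetical dataset over one large array; __getitem__ accepts a list of indices."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, indices):
        # One fancy-indexed read per batch instead of one read per sample.
        return self.data[indices]


class BatchIndexSampler(Sampler):
    """Yields whole batches of shuffled indices at a time."""

    def __init__(self, n, batch_size):
        self.n = n
        self.batch_size = batch_size

    def __iter__(self):
        perm = torch.randperm(self.n)
        for i in range(0, self.n, self.batch_size):
            yield perm[i:i + self.batch_size].tolist()

    def __len__(self):
        return (self.n + self.batch_size - 1) // self.batch_size


data = torch.arange(100, dtype=torch.float32).unsqueeze(1)
loader = DataLoader(
    ArrayDataset(data),
    batch_size=None,            # disable automatic batching
    sampler=BatchIndexSampler(len(data), batch_size=32),
)
shapes = [tuple(batch.shape) for batch in loader]
print(shapes)
```

Each element the loader yields is already a full, collated batch (here three batches of 32 rows and one of 4), so no custom `collate_fn` is needed.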