Regarding batch loading in PyTorch

peepeepoopoo · November 13, 2020, 5:32am

Hi! Just curious how DataLoader creates batches in sequential mode. If I have a list of files in this order:
["file1.txt", "file2.txt", "file3.txt", "file4.txt", "file5.txt", "file6.txt"], would the output order (with a batch size of 2) be: ["file1.txt", "file2.txt"], ["file3.txt", "file4.txt"], ["file5.txt", "file6.txt"]?

ptrblck · November 15, 2020, 8:56am

If you are storing the filenames in e.g. a list in the order you posted, then yes, the batches will come out as you described.
If you are not using shuffle=True, Dataset.__getitem__ will receive indices in sequential order (i.e. 0, 1, 2, 3, ...). What each index maps to depends on how you store and load the data, since the index is simply passed to your __getitem__ method.
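
As a quick way to check this yourself, here is a minimal sketch. The FileNameDataset class and its seen_indices attribute are hypothetical names made up for this illustration; the dataset only returns the filename stored at each index and records which indices were requested. It assumes the default num_workers=0, so everything runs in the main process.

```python
from torch.utils.data import Dataset, DataLoader


class FileNameDataset(Dataset):
    """Hypothetical dataset: returns the filename stored at the given index."""

    def __init__(self, filenames):
        self.filenames = list(filenames)
        self.seen_indices = []  # records the indices passed to __getitem__

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        self.seen_indices.append(index)
        return self.filenames[index]


filenames = [f"file{i}.txt" for i in range(1, 7)]
dataset = FileNameDataset(filenames)

# shuffle=False is the default; it is spelled out here for clarity.
loader = DataLoader(dataset, batch_size=2, shuffle=False)

# Convert each batch to a plain list so it is easy to print and compare.
batches = [list(batch) for batch in loader]
for batch in batches:
    print(batch)

print(dataset.seen_indices)
```

With shuffle=False the batches should follow the list order, two filenames at a time, and seen_indices should read 0 through 5. Setting shuffle=True would change the order in which indices are requested, and with it the contents of each batch.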

peepeepoopoo · November 16, 2020, 12:15am

Thanks! I can confirm this is the case from observing the output.