Is shuffling of the dataset performed by randomizing the access index for the __getitem__ method, or is the dataset itself shuffled in some way? (I doubt the latter, since I only read slices of the data from an HDF5 file.)
My question concerns access to different HDF5 datasets within the __getitem__ method.
The __getitem__ function does exactly what you code it to do.
Good practice is to set up the structure of the data to be loaded in __init__ (for example, by generating a list of files) and then to put all the actual loading work in __getitem__.
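As a rough sketch of that split, here is a map-style dataset that spans several sources. Everything in it is made up for illustration: the class name MultiSourceDataset is hypothetical, and plain tensors stand in for the HDF5 datasets. With a real file you would keep the file and dataset names in __init__ and slice the file inside __getitem__.

```python
import bisect

import torch
from torch.utils.data import Dataset


class MultiSourceDataset(Dataset):
    """Hypothetical dataset spanning several sources (tensors stand in for HDF5 datasets)."""

    def __init__(self, sources):
        # __init__ only records structure: the sources and the cumulative
        # end index of each one in the global index space.
        self.sources = sources
        self.offsets = []
        total = 0
        for src in sources:
            total += len(src)
            self.offsets.append(total)

    def __len__(self):
        # Total number of samples across all sources.
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        # Map the global, zero-based idx to (source, local row), then load it.
        src_id = bisect.bisect_right(self.offsets, idx)
        start = self.offsets[src_id - 1] if src_id > 0 else 0
        return self.sources[src_id][idx - start]


ds = MultiSourceDataset([torch.arange(0, 3), torch.arange(10, 15)])
print(len(ds), ds[0].item(), ds[3].item(), ds[7].item())  # 8 0 10 14
```

Because all the work happens per index, it does not matter in which order the indices arrive.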
PyTorch's Dataset class shuffles nothing. The DataLoader is the class in charge of all that.
At some point you have to report how many samples your data has, which is what __len__ is for.
If you enable shuffling, the DataLoader varies the order of the idx values it passes to __getitem__, but it is totally agnostic to what each idx points to.
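To see the index shuffling in isolation, you can iterate the samplers directly. For a map-style dataset the DataLoader uses a SequentialSampler when shuffle=False and a RandomSampler when shuffle=True. The IndexEcho class below is a hypothetical stand-in, and the seeded generator is only there to make the sketch repeatable.

```python
import torch
from torch.utils.data import Dataset, RandomSampler, SequentialSampler


class IndexEcho(Dataset):
    """Hypothetical dataset that returns the index it was asked for."""

    def __len__(self):
        return 11

    def __getitem__(self, idx):
        return idx


ds = IndexEcho()

# shuffle=False: indices are visited in order, 0 .. len(ds) - 1.
sequential = list(SequentialSampler(ds))

# shuffle=True: a random permutation of the same indices.
# The data itself is never touched or reordered.
generator = torch.Generator().manual_seed(0)
shuffled = list(RandomSampler(ds, generator=generator))

print(sequential)
print(shuffled)
```

Both lists contain exactly the same indices; only the order differs.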
Is this index zero-based, like the Python convention?
Yep
Play with this:
import torch


class Dataset(torch.utils.data.Dataset):
    def __len__(self):
        # Number of samples; valid indices are 0..10
        return 11

    def __getitem__(self, idx):
        # Return the index itself so the access order is visible
        return idx


s = Dataset()
loader = torch.utils.data.DataLoader(s, batch_size=1, shuffle=True, num_workers=0)
for i in loader:
    print(i)