After torch.utils.data.random_split, len() and the shape attribute give different values

I used

train_dataset, val_dataset = torch.utils.data.random_split(dataset=train_val_dataset, lengths=[train_size, val_size])

Then I verified:

len(train_dataset)

and it gives 54000.

But when I verified:

train_dataset.dataset.data.shape

it gives 60000.

Why does this happen?

thanks

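random_split creates Subsets, which store a list of indices and use them to sample from the original dataset. It does not copy or modify the original data. With train_dataset.dataset you are accessing that original dataset, which was not manipulated, so its data attribute still has all 60000 samples, while len(train_dataset) reports the 54000 indices assigned to the split.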
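Here is a minimal sketch of that behavior. It assumes a small TensorDataset of 100 samples as a stand-in for your 60000-sample dataset; the variable names are made up for illustration. The last step, indexing the original tensor with the subset's indices, is one possible way to get only the samples in the split, and it only applies when the underlying data is an indexable tensor or array.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for the original dataset: 100 samples, 1 feature each.
full = TensorDataset(torch.arange(100).float().unsqueeze(1))

# Split into 90 training and 10 validation samples.
train_subset, val_subset = random_split(full, [90, 10])

# len() on the Subset counts the indices assigned to this split.
subset_len = len(train_subset)

# .dataset points back to the original, unmodified dataset.
parent_len = len(train_subset.dataset)
is_same_object = train_subset.dataset is full

# To look at only the samples in the split, index the original data
# with the indices stored in the Subset.
train_data = full.tensors[0][train_subset.indices]
train_shape = tuple(train_data.shape)

print(subset_len, parent_len, is_same_object, train_shape)
```

In your case the analogous check would be len(train_dataset.indices), which should match len(train_dataset).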
