How do I split an iterable dataset into training and test datasets?

I have an iterable dataset object with all of my data files. How can I split it into train and validation sets? I have seen a few solutions for custom datasets, but an iterable dataset does not support the len() operator, so torch.utils.data.random_split() and torch.utils.data.SubsetRandomSampler don’t work.

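class MyIterableDataset(torch.utils.data.IterableDataset):
    def __init__(self):
        ...  # open the data files here

    def __iter__(self):
        yield batch  # one batch read from the data files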

I don’t think you can split the samples of an IterableDataset directly, since it’s meant for e.g. streams of data and has no length or index access.
If your dataset is predefined, you could check whether the stream source itself is able to split and shuffle the data.
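
If the source can be re-read in the same order each time, one possible workaround is to route samples by their position in the stream instead of by a random index. The sketch below is only an illustration under that assumption: `SplitStream`, `make_source`, and `val_every` are hypothetical names, not PyTorch APIs, and the class is plain Python so it runs on its own. To use it with a DataLoader you would make it inherit from torch.utils.data.IterableDataset.

```python
class SplitStream:
    """Iterable view that keeps only the train or validation part of a stream.

    Hypothetical helper, not a PyTorch API. Every `val_every`-th sample goes
    to the validation split and all other samples go to the training split.
    """

    def __init__(self, make_source, split, val_every=5):
        if split not in ("train", "val"):
            raise ValueError("split must be 'train' or 'val'")
        self.make_source = make_source  # callable returning a fresh iterator
        self.split = split
        self.val_every = val_every

    def __iter__(self):
        # Assumes make_source() yields samples in the same order on every call,
        # otherwise the two splits could overlap.
        for index, sample in enumerate(self.make_source()):
            is_val = index % self.val_every == self.val_every - 1
            if is_val == (self.split == "val"):
                yield sample


def make_source():
    # Stand-in for reading samples from the data files in a fixed order.
    return iter(range(10))


train_split = list(SplitStream(make_source, "train"))
val_split = list(SplitStream(make_source, "val"))
```

Both splits read the whole stream and each discards the samples that belong to the other, so this trades extra reading for not needing len(). It also does no shuffling, so any shuffling would still have to come from the source.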
