Sample from multiple datasets

Hi
I have multiple datasets of different lengths and I want to sample from them at different rates. Is there any function like

tf.data.experimental.sample_from_datasets

in PyTorch? I'd appreciate any implementation that achieves this in PyTorch. Thanks! @ptrblck, please help me :)
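For reference, a rough PyTorch analogue can be sketched as an IterableDataset that, at every step, picks one of the source datasets at random with probability proportional to a weight and yields that dataset's next element. This is a minimal sketch, not a built-in PyTorch API: the class name SampleFromDatasets and its weights/seed parameters are hypothetical, it assumes map-style datasets (anything with __len__ and __getitem__), it stops as soon as the chosen dataset runs out, and it does not try to reproduce every option of the TensorFlow function.

```python
import random

from torch.utils.data import IterableDataset


def _iterate(ds):
    # Walk a map-style dataset (anything with __len__ and __getitem__) in index order.
    for j in range(len(ds)):
        yield ds[j]


class SampleFromDatasets(IterableDataset):
    """Hypothetical helper (not a PyTorch API): at every step, choose a source
    dataset with probability proportional to `weights` and yield its next item.
    Iteration stops as soon as the chosen source is exhausted."""

    def __init__(self, datasets, weights, seed=None):
        assert len(datasets) == len(weights), "one weight per dataset"
        self.datasets = datasets
        self.weights = weights
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)  # seeded RNG for reproducible interleaving
        iterators = [_iterate(ds) for ds in self.datasets]
        choices = list(range(len(self.datasets)))
        while True:
            i = rng.choices(choices, weights=self.weights, k=1)[0]
            try:
                yield next(iterators[i])
            except StopIteration:
                return  # the chosen dataset ran out: end the stream
```

It can be wrapped in a DataLoader as usual, e.g. DataLoader(SampleFromDatasets([ds_a, ds_b], weights=[0.3, 0.7]), batch_size=32). The sketch assumes num_workers=0: with more workers, each worker would replay the whole stream, so you would need to shard it (e.g. with torch.utils.data.get_worker_info()).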


What prevents them from having the same length? Why not preprocess them so they have the same length?
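If preprocessing is an option, one simple interpretation (an assumption about what "same length" could mean here) is to wrap the shorter dataset so it cycles through its samples until it matches the length of the longer one. RepeatToLength below is a hypothetical helper, not part of torch.utils.data, and it effectively oversamples the shorter dataset by plain repetition.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class RepeatToLength(Dataset):
    """Hypothetical wrapper: presents `base` as a dataset of `length` items
    by cycling through it (the index wraps around with modulo)."""

    def __init__(self, base, length):
        self.base = base
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        # Index 10 maps back to sample 0, index 25 to sample 5, and so on.
        return self.base[idx % len(self.base)]


# Example: stretch a 10-sample dataset to 100 samples.
small = TensorDataset(torch.arange(10))
padded = RepeatToLength(small, 100)
```

After wrapping, both datasets have the same length, so they can be concatenated with ConcatDataset or indexed in lockstep.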

If I understand your use case correctly, you have e.g.:

dataset1 = ...
dataset2 = ...

print(len(dataset1))  # 10
print(len(dataset2))  # 100

Now you would like to create a “parent dataset” containing both and use some weighted sampling to e.g. oversample dataset1 and create balanced batches from both datasets?
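A common way to build that is to wrap both datasets in torch.utils.data.ConcatDataset and pass a WeightedRandomSampler to the DataLoader, giving every sample a weight of rate / len(its dataset) so that each source is drawn with probability proportional to its rate. In the sketch below the TensorDatasets are stand-ins for dataset1 and dataset2 (10 and 100 samples), and the helper per_sample_weights and the rates list are hypothetical names chosen for illustration. Equal rates give balanced batches on average, not an exact 50/50 split inside every batch.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

# Stand-ins for the two datasets from the question (10 and 100 samples).
dataset1 = TensorDataset(torch.zeros(10, 1))
dataset2 = TensorDataset(torch.ones(100, 1))
datasets = [dataset1, dataset2]

# Hypothetical sampling rates: 0.5 / 0.5 oversamples dataset1 to a balanced mix.
rates = [0.5, 0.5]


def per_sample_weights(datasets, rates):
    # Every sample of datasets[k] gets weight rates[k] / len(datasets[k]), so the
    # total weight of dataset k, and hence its sampling probability, is proportional to rates[k].
    return torch.cat([
        torch.full((len(ds),), rate / len(ds), dtype=torch.double)
        for ds, rate in zip(datasets, rates)
    ])


parent = ConcatDataset(datasets)  # indices 0..9 -> dataset1, 10..109 -> dataset2
weights = per_sample_weights(datasets, rates)

# replacement=True lets the 10 samples of dataset1 be drawn more than once per epoch.
sampler = WeightedRandomSampler(weights, num_samples=len(parent), replacement=True)
loader = DataLoader(parent, batch_size=20, sampler=sampler)

for (x,) in loader:
    # x has shape [batch, 1]; zeros come from dataset1, ones from dataset2.
    frac = (x == 0).float().mean().item()
    print(f"fraction from dataset1 in this batch: {frac:.2f}")
```

Changing rates (e.g. [0.3, 0.7]) changes the mix, and num_samples controls how many draws make up one "epoch" of the loader.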