Sample from multiple datasets

I have multiple datasets of different lengths and I want to sample from them at different rates. Is there any function like

in PyTorch? I would appreciate any implementation that achieves this in PyTorch. Thanks, @ptrblck, please help me :slight_smile:

What prevents them from having the same length? Why not preorganize them so that they have the same length?

If I understand your use case correctly, you have e.g.:

dataset1 = ...
dataset2 = ...

print(len(dataset1))
> 10
print(len(dataset2))
> 100

Now you would like to create a “parent dataset” containing both and use some weighted sampling to e.g. oversample dataset1 and create balanced batches from both datasets?
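
If that is the use case, here is a minimal sketch of one possible approach, assuming `ConcatDataset` as the "parent dataset" and `WeightedRandomSampler` for the oversampling. The toy `TensorDataset` objects, the feature size, and the batch size are placeholders I made up for illustration, and the inverse-length weights are just one choice that draws from both datasets equally often in expectation:

```python
import torch
from torch.utils.data import (
    ConcatDataset,
    DataLoader,
    TensorDataset,
    WeightedRandomSampler,
)

# Hypothetical stand-ins for the two datasets (10 and 100 samples).
# The second tensor marks which dataset a sample came from (0 or 1).
dataset1 = TensorDataset(torch.randn(10, 3), torch.zeros(10, dtype=torch.long))
dataset2 = TensorDataset(torch.randn(100, 3), torch.ones(100, dtype=torch.long))

# "Parent dataset": indices 0..9 map to dataset1, 10..109 to dataset2.
parent = ConcatDataset([dataset1, dataset2])

# One weight per sample, in the same order as the concatenation.
# Using 1 / len(dataset) gives each dataset the same total weight,
# so the smaller dataset1 is oversampled.
weights = torch.cat([
    torch.full((len(dataset1),), 1.0 / len(dataset1)),
    torch.full((len(dataset2),), 1.0 / len(dataset2)),
])

# replacement=True is needed so samples from dataset1 can repeat.
sampler = WeightedRandomSampler(weights, num_samples=len(parent), replacement=True)

# Pass the sampler instead of shuffle=True (the two are mutually exclusive).
loader = DataLoader(parent, batch_size=22, sampler=sampler)

for x, source in loader:
    # The mean of `source` is the fraction of the batch drawn from dataset2;
    # it should hover around 0.5, but varies from batch to batch.
    print(x.shape, source.float().mean().item())
```

To sample at other rates, scale each dataset's block of weights accordingly, e.g. multiply the `dataset1` weights by 3 to draw from it roughly three times as often as from `dataset2`. Note that the balance only holds in expectation, so individual batches will not be split exactly evenly.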