Dataloader questions for torch geometric

Thibault712 · April 22, 2022, 4:24pm

Hi everyone,

I have two questions about torch geometric:

Both concern the DataLoader (the torch geometric one, not the PyTorch one).

1. Is it possible to load the batches on the fly?

I have a big dataset with millions of entries. I can’t load a list of millions of graphs into memory at once, but I can manage to load them batch by batch.

2. Is it possible for the loader to automatically lower the batch size when a batch is too big to handle?

My graphs are not homogeneous in size: sometimes the loader easily handles a batch of 100 and then struggles with a batch of 20. It would be great if there were an option that prevents loading a batch with too many nodes/edges and instead just lowers the batch size for that one batch.

ejguan · April 25, 2022, 8:07pm

For the first question: yes, it is. If you use a Dataset class, you can move your data-loading logic into the __getitem__ method rather than the __init__ method, so that each graph is read only when it is requested.
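
To make that concrete, here is a minimal sketch of the idea. It assumes each graph is stored in its own file; the class name LazyGraphDataset, the load_fn argument, and the file names are all made up for illustration. It is written in plain Python so it runs without extra dependencies, but in practice you would probably subclass torch.utils.data.Dataset or torch_geometric.data.Dataset.

```python
class LazyGraphDataset:
    """Map-style dataset that loads one graph per __getitem__ call.

    Hypothetical example class: a real one would likely subclass
    torch.utils.data.Dataset or torch_geometric.data.Dataset.
    """

    def __init__(self, paths, load_fn):
        # Only the cheap list of file paths is kept in memory.
        self.paths = list(paths)
        # load_fn is an assumed callable: path -> one graph object.
        self.load_fn = load_fn

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # The expensive read happens here, once per requested sample.
        return self.load_fn(self.paths[idx])


# Stand-in loader that records which files were read. A real loader
# might parse the file at `path` into a graph object instead.
loaded = []


def fake_load(path):
    loaded.append(path)
    return {"path": path}


dataset = LazyGraphDataset([f"graph_{i}.pt" for i in range(1000)], fake_load)
sample = dataset[42]  # only this one "file" is read
```

Nothing is read when the dataset is constructed, so the memory cost is the list of paths plus whatever batch the loader is currently assembling.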

For the second question, you should be able to achieve this with a custom BatchSampler. If you know the size of each data entry in advance, you can use the list of sizes to group the data indices into batches, so that every batch stays under a size limit.
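
A rough sketch of such a sampler, assuming you have precomputed the number of nodes of every graph. The name NodeBudgetBatchSampler and the max_nodes / max_graphs parameters are invented for this example, and the greedy grouping is just one possible strategy.

```python
class NodeBudgetBatchSampler:
    """Yields lists of dataset indices whose total node count stays
    within max_nodes, with at most max_graphs graphs per batch.

    Hypothetical example class, not part of PyTorch or torch geometric.
    """

    def __init__(self, sizes, max_nodes, max_graphs):
        self.sizes = list(sizes)      # sizes[i] = node count of graph i
        self.max_nodes = max_nodes    # node budget per batch
        self.max_graphs = max_graphs  # usual upper bound on batch size

    def __iter__(self):
        batch, total = [], 0
        for idx, size in enumerate(self.sizes):
            # Close the current batch if adding this graph would exceed
            # the node budget or the graph-count limit.
            too_many_nodes = total + size > self.max_nodes
            too_many_graphs = len(batch) >= self.max_graphs
            if batch and (too_many_nodes or too_many_graphs):
                yield batch
                batch, total = [], 0
            # A graph larger than max_nodes ends up alone in its batch.
            batch.append(idx)
            total += size
        if batch:
            yield batch

    def __len__(self):
        # Number of batches; recomputed by iterating once.
        return sum(1 for _ in self)


sizes = [10, 20, 50, 5, 5, 100, 3]
sampler = NodeBudgetBatchSampler(sizes, max_nodes=60, max_graphs=3)
batches = list(sampler)  # [[0, 1], [2, 3, 4], [5], [6]]
```

An object like this can be passed through the batch_sampler argument of the PyTorch DataLoader, which the torch geometric DataLoader should forward as a keyword argument. In that case leave batch_size and shuffle at their defaults, and shuffle the indices inside the sampler if you need random order.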