PyG batch generation on the fly

AleTL · March 18, 2022, 12:54pm

Hello!

I have tabular data that I want to transform into several graphs and feed to a GNN. PyTorch Geometric allows me to create my own dataset, either held in memory or stored on disk. But what I want is to generate the graphs on the fly. I don’t want to process the whole table, generate all the graphs, and only then feed the GNN. Instead, every time I need a new batch of graphs, I want to take the tabular data, generate the graphs, feed them to the GNN, discard them, and start again with new data.

Does anyone know whether this is possible, or how I should do it? I’ve been looking for information or code snippets but have found nothing.

Thank you!

ejguan · March 22, 2022, 3:40pm

If you use a PyTorch Dataset, you can implement all of the data-retrieval logic inside the __getitem__ method, so that each sample is generated on the fly when it is requested.
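A minimal sketch of the __getitem__ idea, under stated assumptions: the class name `OnTheFlyGraphDataset`, the `rows_per_graph` parameter, and the rule that every pair of rows in a graph is connected are all hypothetical, since the thread does not say how rows map to graphs. Plain Python containers stand in for tensors so the sketch runs without extra dependencies; in real code the class would subclass torch.utils.data.Dataset and return a torch_geometric.data.Data object.

```python
class OnTheFlyGraphDataset:
    """Map-style dataset that builds one graph per index, only when asked.

    Hypothetical sketch: in practice this would subclass
    torch.utils.data.Dataset and return torch_geometric.data.Data.
    """

    def __init__(self, table, rows_per_graph):
        self.table = table                    # list of rows, each a list of floats
        self.rows_per_graph = rows_per_graph  # assumed: fixed number of rows per graph

    def __len__(self):
        # Number of complete graphs that can be cut from the table.
        return len(self.table) // self.rows_per_graph

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.rows_per_graph
        rows = self.table[start:start + self.rows_per_graph]
        n = len(rows)
        # Assumed graph rule: one node per row, an edge between every
        # ordered pair of distinct nodes.
        sources = [i for i in range(n) for j in range(n) if i != j]
        targets = [j for i in range(n) for j in range(n) if i != j]
        # Nothing is cached, so the graph is freed once the caller drops it.
        return {"x": rows, "edge_index": [sources, targets]}


# Toy table with 6 rows and 2 columns.
table = [[float(r), float(r) * 2.0] for r in range(6)]
dataset = OnTheFlyGraphDataset(table, rows_per_graph=3)
graph = dataset[1]  # built from rows 3..5 at this moment, not before
```

With Data objects as the return value, such a dataset could then be passed to torch_geometric.loader.DataLoader, which collates several graphs into one batch.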

Or you can use torchdata (GitHub - pytorch/data: a PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries), which uses iterator-style DataPipes to help you construct a data pipeline on the fly.
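The iterator-style idea can be illustrated without the torchdata API itself. The sketch below uses plain Python generators; `read_rows`, `chunked`, `rows_to_graph`, and `graph_batches` are hypothetical helper names, and the chain-graph rule is an assumption made only for illustration. In PyTorch core, the same pattern would typically live in the `__iter__` method of a torch.utils.data.IterableDataset.

```python
from itertools import islice


def read_rows(num_rows):
    """Hypothetical stand-in for a streaming tabular source."""
    for r in range(num_rows):
        yield [float(r), float(r) * 2.0]


def chunked(iterable, size):
    """Lazily yield lists of up to `size` consecutive items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def rows_to_graph(rows):
    """Assumed graph rule: one node per row, each linked to the next row."""
    n = len(rows)
    return {"x": rows, "edge_index": [list(range(n - 1)), list(range(1, n))]}


def graph_batches(rows, rows_per_graph, graphs_per_batch):
    """Rows -> graphs -> batches, each stage evaluated only on demand."""
    graphs = (rows_to_graph(chunk) for chunk in chunked(rows, rows_per_graph))
    yield from chunked(graphs, graphs_per_batch)


batch_sizes = []
first_batch = None
for batch in graph_batches(read_rows(10), rows_per_graph=2, graphs_per_batch=2):
    if first_batch is None:
        first_batch = batch
    batch_sizes.append(len(batch))  # a training step would go here
```

Only one batch of graphs exists at a time inside the loop; earlier batches become garbage once the loop variable is rebound, which matches the "generate, feed, remove" cycle described in the question.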