Node regression on graphs with different sizes - PyTorch Geometric

chris · April 9, 2021, 4:19pm

I have a dataset of graphs, and several of them have different target dimensions, e.g.

Graph1 = Data(edge_attr=[9468], edge_index=[2, 9468], x=[299, 21], y=[299])
Graph2 = Data(edge_attr=[9622], edge_index=[2, 9622], x=[309, 21], y=[309])

As you can see, they contain different numbers of nodes (299 vs. 309). Each entry of the target y is a continuous variable that I want to predict. The problem is that, since the graphs have different sizes, I would need a neural network with a dynamic output dimension, which is problematic. My solution would be to “pad” the graphs with isolated nodes (features equal to 0, target equal to 0, no edge connections) so that all graphs have the same number of nodes and targets, but I am not sure whether this is a good solution. Does anyone have an idea of how to solve this problem? I am quite new to the field of geometric deep learning.

Thanks!
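
An alternative to padding that may be worth considering: as far as I know, PyTorch Geometric's own mini-batching concatenates the graphs of a batch into one large disconnected graph, so the node dimension simply grows and no fixed output size is needed. Below is a minimal sketch of that idea in plain PyTorch. The helpers `collate_graphs` and `random_graph` are hypothetical names for illustration, and the random tensors only mimic the shapes of Graph1 and Graph2.

```python
import torch


def collate_graphs(graphs):
    """Merge (x, edge_index, y) triples into one large disconnected graph."""
    xs, edge_indices, ys, batch = [], [], [], []
    offset = 0
    for i, (x, edge_index, y) in enumerate(graphs):
        xs.append(x)
        ys.append(y)
        # Shift node indices so each graph's edges point at its own nodes.
        edge_indices.append(edge_index + offset)
        # Record which graph every node belongs to.
        batch.append(torch.full((x.size(0),), i, dtype=torch.long))
        offset += x.size(0)
    return (
        torch.cat(xs, dim=0),
        torch.cat(edge_indices, dim=1),
        torch.cat(ys, dim=0),
        torch.cat(batch, dim=0),
    )


def random_graph(num_nodes, num_edges, num_features=21):
    """Random stand-in data (assumption: only the shapes match the post)."""
    x = torch.randn(num_nodes, num_features)
    edge_index = torch.randint(0, num_nodes, (2, num_edges))
    y = torch.randn(num_nodes)
    return x, edge_index, y


g1 = random_graph(299, 9468)
g2 = random_graph(309, 9622)
x, edge_index, y, batch = collate_graphs([g1, g2])
print(tuple(x.shape), tuple(edge_index.shape), tuple(y.shape))
```

With this layout a node-level model produces one prediction per row of `x`, and the loss can be computed against the concatenated `y` directly, without dummy nodes.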

chris · April 12, 2021, 6:02pm

I solved it by padding the graphs and then creating a mask that selects the values I want to use for training.
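
The post does not include code, so the following is only a sketch of what padding plus masking could look like in plain PyTorch, not the poster's actual implementation. The names `pad_graph` and `masked_mse` are hypothetical, and the random prediction stands in for a model output.

```python
import torch
import torch.nn.functional as F


def pad_graph(x, y, max_nodes):
    """Pad node features and targets with zeros up to max_nodes rows."""
    num_nodes = x.size(0)
    if num_nodes > max_nodes:
        raise ValueError("graph has more nodes than max_nodes")
    pad = max_nodes - num_nodes
    # F.pad takes (left, right) pairs starting from the last dimension:
    # no padding on the feature axis, `pad` extra rows on the node axis.
    x_pad = F.pad(x, (0, 0, 0, pad))
    y_pad = F.pad(y, (0, pad))
    # True for real nodes, False for padded ones.
    mask = torch.zeros(max_nodes, dtype=torch.bool)
    mask[:num_nodes] = True
    return x_pad, y_pad, mask


def masked_mse(pred, target, mask):
    """Mean squared error over real (unmasked) nodes only."""
    return F.mse_loss(pred[mask], target[mask])


# Stand-in data with the shapes of Graph1, padded to the size of Graph2.
x = torch.randn(299, 21)
y = torch.randn(299)
x_pad, y_pad, mask = pad_graph(x, y, max_nodes=309)

pred = torch.randn(309)  # placeholder for a model's per-node output
loss = masked_mse(pred, y_pad, mask)
print(tuple(x_pad.shape), int(mask.sum()), float(loss))
```

Because the padded nodes are excluded from the loss, their zero targets do not pull the predictions toward 0.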

looc · January 3, 2023, 10:16am

I can see that padding solves it for graphs of similar sizes. But what can I do if my graph sizes differ by orders of magnitude, say, from 100 nodes to 10000 nodes? Especially if I don’t know whether the graphs presented at inference time will exceed the maximum size of the training graphs? Padding does not seem to be a solution for this case.
For example, suppose you have public transport networks for a couple of cities and want to predict a certain property for each station. The training examples have up to N stations, but I want to apply the trained model to a city with 2*N stations.
I only recently started with GNNs, so perhaps I am missing an obvious way to apply them to a task like this.
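
One observation that may help here: in a typical message-passing layer the learned weights depend only on the feature dimension, not on the number of nodes, so the same model can in principle be run on a graph of any size and returns one prediction per node. The sketch below illustrates this in plain PyTorch. `MeanAggregationLayer` and `NodeRegressor` are hypothetical classes written for this example (not PyTorch Geometric classes), and the graphs are random stand-ins.

```python
import torch
from torch import nn


class MeanAggregationLayer(nn.Module):
    """Hypothetical minimal layer: combine each node with its neighbor mean."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Weight shapes depend on feature sizes only, never on node count.
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        src, dst = edge_index
        # Sum the features of incoming neighbors for every target node.
        agg = torch.zeros_like(x).index_add_(0, dst, x[src])
        # Count incoming edges per node; clamp so isolated nodes divide by 1.
        deg = x.new_zeros(x.size(0)).index_add_(0, dst, x.new_ones(dst.size(0)))
        agg = agg / deg.clamp(min=1).unsqueeze(1)
        return self.lin_self(x) + self.lin_neigh(agg)


class NodeRegressor(nn.Module):
    """One message-passing layer followed by a per-node linear readout."""

    def __init__(self, in_dim=21, hidden_dim=32):
        super().__init__()
        self.conv = MeanAggregationLayer(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv(x, edge_index))
        return self.out(h).squeeze(-1)  # shape: [num_nodes]


model = NodeRegressor()
for num_nodes in (100, 10_000):
    x = torch.randn(num_nodes, 21)
    edge_index = torch.randint(0, num_nodes, (2, 5 * num_nodes))
    pred = model(x, edge_index)
    print(num_nodes, tuple(pred.shape))
```

This only shows that the shapes work out. Whether a model trained on graphs with up to N nodes predicts well on a graph with 2*N nodes is a separate question that depends on the data and would need to be checked empirically.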