Dynamic Graph Representation Puzzle

I’ve run into an interesting design problem in my application. I need to support graphs (in the machine learning context, i.e. with associated node/edge features) that have variable size (in terms of number of nodes/edges).

My torch.nn.Module has already been adapted to handle variable sized graphs, and works great, but now I’m turning to the data representation.

Specifically, to take the batch size 1 case, we start with an “empty” graph. At each step of a sampling loop, we either terminate sampling of the graph, or sample a new node type from a given vocabulary and identify which existing nodes it ought to be connected to, and then repeat the sampling loop, until we sample the termination token.

Independent of how I represent a graph, or a batch of graphs, my design objectives are:

  1. Ability to operate on batches of variably-sized graphs, not just one at a time.
  2. The representation must be amenable to the addition of nodes/edges during the sampling procedure.
  3. Throughput: I would like to minimize irregularity in how each graph in the batch is manipulated, perhaps at the cost of some masking logic.

I’ve considered some fairly complicated solutions to this problem, but I’m curious if any data ninjas in the community might offer some advice. I imagine it’s fairly unusual (graphs are commonplace, but typically they have a known static size since they are given and not sampled), but would love any input.