Hi guys,

I have a dataset in which each Data object consists of two separate graphs (the graphs are not connected - **I don’t want them to affect each other until the linear stage** of my model).

I built the data objects and dataset based on the “PairData” example here:

https://pytorch-geometric.readthedocs.io/en/latest/notes/batching.html

My Data object:

```python
class TwoGraphsData(Data):
    def __init__(self, x_a=None, edge_index_a=None, edge_attr_a=None,
                 x_b=None, edge_index_b=None, edge_attr_b=None,
                 y=None, linker_size=0):
        super().__init__()
        self.x_a = x_a
        self.edge_index_a = edge_index_a
        self.edge_attr_a = edge_attr_a
        self.x_b = x_b
        self.edge_index_b = edge_index_b
        self.edge_attr_b = edge_attr_b
        self.y = y
        self.linker_size = linker_size
```

Note: The **order** of the two sub-graphs inside the Data object **doesn’t matter**. Each sub-graph may be the ‘a’ graph or the ‘b’ graph. In fact, the model has to be order-invariant.

My model has some GCNConv, pooling and linear layers.

The **forward** function for a **single** graph in a **regular** data object is:

```python
x, edge_index, batch = data.x.float(), data.edge_index, data.batch
edge_attr = torch.flatten(data.edge_attr)

x = F.relu(self.conv1(x, edge_index, edge_attr))
x, edge_index, edge_attr, batch = self.pool1(x, edge_index, edge_attr, batch)
x1 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

x = F.relu(self.conv2(x, edge_index, edge_attr))
x, edge_index, edge_attr, batch = self.pool2(x, edge_index, edge_attr, batch)
x2 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

x = F.relu(self.conv3(x, edge_index, edge_attr))
x3 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

x = F.relu(x1) + F.relu(x2) + F.relu(x3)
x = torch.cat((x, torch.reshape(data.linker, (data.linker.size()[0], 1))), dim=1)

x = F.relu(self.lin1(x))
x = F.dropout(x, p=self.dropout_ratio, training=self.training)
x = F.relu(self.lin2(x))
x = self.lin3(x)
return x
```

I want to apply the **conv** and **pooling** layers to **each sub-graph separately.**

I want to apply the **linear** layers to the **concatenation** of the conv+pooling results of sub-graph a and sub-graph b.

**I wonder what is the right way to handle “PairData” for correct training.**

I thought about a few options:

**a.** Just apply each conv and pool layer to x_a and x_b in turn, something like:

```python
x_a = F.relu(self.conv1(x_a, edge_index_a, edge_attr_a))
x_b = F.relu(self.conv1(x_b, edge_index_b, edge_attr_b))

# each sub-graph needs its own batch vector (e.g. x_a_batch / x_b_batch
# from follow_batch) - a single shared `batch` would be overwritten
x_a, edge_index_a, edge_attr_a, batch_a = self.pool1(x_a, edge_index_a, edge_attr_a, batch_a)
x_b, edge_index_b, edge_attr_b, batch_b = self.pool1(x_b, edge_index_b, edge_attr_b, batch_b)

x1_a = torch.cat([gmp(x_a, batch_a), gap(x_a, batch_a)], dim=1)
x1_b = torch.cat([gmp(x_b, batch_b), gap(x_b, batch_b)], dim=1)

...

x = F.relu(self.lin1(torch.cat((x1_a, x1_b), dim=1)))

...

return x
```

Actually, I’m not sure this makes sense - how will the gradients and the layer weights be computed?
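For what it’s worth on the gradient question in (a): applying one layer to both branches is the standard Siamese setup, and autograd simply accumulates the gradient contributions of both branches into the shared weights. A tiny pure-PyTorch illustration (the `Linear` here just stands in for a shared conv layer; it is not from the model above):

```python
import torch

lin = torch.nn.Linear(2, 1, bias=False)   # stands in for a shared conv1
x_a = torch.ones(1, 2)                    # branch a input
x_b = 2 * torch.ones(1, 2)                # branch b input

# both branches go through the same layer, as in option (a)
loss = lin(x_a).sum() + lin(x_b).sum()
loss.backward()

# the gradient w.r.t. the shared weight is the sum of both
# branches' contributions: x_a + x_b
print(torch.allclose(lin.weight.grad, x_a + x_b))  # True
```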

**b.** Build a new model that duplicates all the conv and pool layers (for example, x_a will be passed through conv1_a, pool1_a, etc. and x_b through conv1_b, pool1_b, etc.), and concatenate their outputs before the linear layers.

The problem (or not?) with this approach: conv1_a and conv1_b will hold different weights, which means I will have to train the model with some data transformation that swaps sub-graphs a and b (because I want the model to be invariant to the order of the graphs).

Another problem: I guess this model will be heavier than the original one.
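On the order-invariance concern: one alternative to augmenting the data with a/b swaps is to combine the two pooled embeddings with a symmetric operation before the linear layers - this is my own suggestion, not something from the linked docs:

```python
import torch

x1_a = torch.randn(4, 8)   # pooled embedding of sub-graph a (batch of 4)
x1_b = torch.randn(4, 8)   # pooled embedding of sub-graph b

# sum and absolute difference are both invariant to swapping a and b,
# so the linear head sees the same input regardless of sub-graph order
x = torch.cat([x1_a + x1_b, (x1_a - x1_b).abs()], dim=1)
x_swapped = torch.cat([x1_b + x1_a, (x1_b - x1_a).abs()], dim=1)
assert torch.equal(x, x_swapped)
```

With such a combination, a single shared encoder (option a) would make the whole model order-invariant without any swap augmentation.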

**c.** Create a main model that holds two separate copies of the original model (without the linear layers) and pass sub-graph a through model_a and sub-graph b through model_b. Concatenate the results and pass them through the linear layers in the main model.

This approach is similar to b (same problems) - but I want to make sure that it makes sense.
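For reference, option (c) can be sketched as a wrapper module. `encoder_a` / `encoder_b` below are placeholders for the conv+pool stacks (hypothetical names, not from the code above); passing the same module instance for both would turn this into the shared-weight variant of option (a):

```python
import torch
from torch import nn

class PairModel(nn.Module):
    """Option c sketch: one encoder per sub-graph, one shared linear head."""

    def __init__(self, encoder_a, encoder_b, emb_dim, out_dim):
        super().__init__()
        self.encoder_a = encoder_a            # conv+pool stack for sub-graph a
        self.encoder_b = encoder_b            # conv+pool stack for sub-graph b
        self.head = nn.Linear(2 * emb_dim, out_dim)

    def forward(self, x_a, x_b):
        z_a = self.encoder_a(x_a)             # graph-level embedding of a
        z_b = self.encoder_b(x_b)             # graph-level embedding of b
        return self.head(torch.cat([z_a, z_b], dim=1))
```

With identity encoders, `emb_dim=4` and `out_dim=2`, a batch of 3 pairs maps to a `(3, 2)` output.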

So which model do you guys recommend?

Are there other good practices for dealing with such graphs?

Thanks!