I am trying to train a graph neural network and am having trouble with the model's architecture. My data comes as `DataBatch` objects from the `torch_geometric.data` module.

In the batch object (`data`), every row of `data.y` holds the target variables for the nodes of one graph. The batch contains node embeddings from 128 different graphs. I am only interested in the prediction for the first node of every graph, and I am not sure how to proceed. I read that graph attention networks are used for node-level regression.

```
DataBatch(x=[2634, 768], edge_index=[2, 2506], edge_attr=[2506, 1], y=[128, 131], mask=[128, 131], batch=[2634], ptr=[129])
```
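For orientation, the `ptr` and `batch` fields in this printout index the concatenated node matrix `x`. Below is a minimal sketch with a made-up toy batch of three graphs (the sizes 3, 2 and 4 are assumptions for illustration, not the real data). It rebuilds both fields by hand to show that `ptr[i]` is the row where graph `i` starts, so `ptr[:-1]` gives the row of the first node of every graph.

```python
import torch

# Hypothetical toy batch: three graphs with 3, 2 and 4 nodes.
nodes_per_graph = torch.tensor([3, 2, 4])

# `batch` maps every node row to its graph index, as in DataBatch.batch.
batch = torch.repeat_interleave(torch.arange(3), nodes_per_graph)

# `ptr` holds cumulative node counts, as in DataBatch.ptr (length num_graphs + 1).
ptr = torch.cat([torch.zeros(1, dtype=torch.long), nodes_per_graph.cumsum(0)])

# Row index of the first node of every graph.
first_node_idx = ptr[:-1]

print(batch.tolist())           # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(ptr.tolist())             # [0, 3, 5, 9]
print(first_node_idx.tolist())  # [0, 3, 5]
```

This is consistent with the real batch, where `ptr=[129]` has one more entry than the 128 graphs.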

Here is my current architecture:

```
import torch.nn as nn
from torch_geometric.nn import GATConv


class GCNAttentionModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_rate):
        super().__init__()
        # edge_dim=1 matches edge_attr=[2506, 1]; without edge_dim, GATConv ignores edge_attr
        self.conv1 = GATConv(input_dim, hidden_dim, edge_dim=1)
        self.conv2 = GATConv(hidden_dim, output_dim, edge_dim=1)
        self.dropout_rate = dropout_rate
        self.act = nn.LeakyReLU()

    def forward(self, x, edge_index, edge_attr, batch):
        # [num_nodes, input_dim] -> [num_nodes, hidden_dim]
        x = self.conv1(x, edge_index, edge_attr)
        x = self.act(x)
        # x = F.dropout(x, p=self.dropout_rate, training=self.training)
        # [num_nodes, hidden_dim] -> [num_nodes, output_dim]
        x = self.conv2(x, edge_index, edge_attr)
        x = self.act(x)
        # x = F.dropout(x, p=self.dropout_rate, training=self.training)
        return x  # one row per node in the batch; `batch` is currently unused
```

The model output has size `torch.Size([2634, 131])`, and I am not sure whether that is what it should be if I am only interested in predictions for specific nodes. Should the output dimension be 1 if I am interested in only one node per graph? If so, how will the target node's neighbor nodes be aggregated and used? That is what I couldn't figure out. Can anyone help or suggest an approach?
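
One possible way to restrict training to the first node of each graph is to keep the network node-level and index its output with `data.ptr[:-1]`. The sketch below uses random stand-in tensors instead of the real model and data. It assumes that `output_dim` is 1 and that column 0 of `y` and `mask` belongs to the first node of each graph; that layout is a guess from the shapes `y=[128, 131]` and `mask=[128, 131]`, not something confirmed above. Message passing would still run over all nodes, so neighbors contribute to the first node's embedding through the `GATConv` layers even though only the selected rows enter the loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in sizes (assumptions): 4 graphs, at most 5 nodes per graph.
ptr = torch.tensor([0, 3, 5, 9, 11])          # plays the role of data.ptr
num_graphs, max_nodes = ptr.numel() - 1, 5

# Stand-in for the model output with output_dim=1: one scalar per node.
out = torch.randn(int(ptr[-1]), 1, requires_grad=True)

# Assumed target layout: y[i, j] is the target of node j in graph i,
# and mask[i, j] marks which entries are valid.
y = torch.randn(num_graphs, max_nodes)
mask = torch.ones(num_graphs, max_nodes, dtype=torch.bool)

# Keep only the first node of every graph.
first_pred = out[ptr[:-1]].squeeze(-1)        # [num_graphs]
first_target = y[:, 0]                        # [num_graphs]
first_valid = mask[:, 0]                      # [num_graphs]

# Regression loss on the valid first-node entries only.
loss = F.mse_loss(first_pred[first_valid], first_target[first_valid])
loss.backward()

# Only the rows of the first nodes receive a gradient from this loss.
rows_with_grad = out.grad.abs().sum(dim=1).nonzero().squeeze(-1)
print(first_pred.shape, rows_with_grad.tolist())
```

With the real batch, `out` would be the model output of shape `[2634, 1]` and `ptr` would be `data.ptr`, giving 128 predictions per batch.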