Training a graph attention network model for node-level predictions

I am trying to train a graph neural network model and I am having problems with the model's architecture. What I have are DataBatch objects that come from the library.
In the batch object (data), each row of data.y holds the target variables for one graph: the batch contains node embeddings from 128 different graphs, and y has one row of 131 targets per graph. What I am interested in is the prediction for the first node in every graph, and I am not sure how to proceed. I read that graph attention networks are well suited to node-level regression.

DataBatch(x=[2634, 768], edge_index=[2, 2506], edge_attr=[2506, 1], y=[128, 131], mask=[128, 131], batch=[2634], ptr=[129])
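Since y has one row per graph (128 rows) while x has one row per node (2634 rows), the first step is to locate each graph's first node in the flattened node tensor. Assuming PyTorch Geometric batching (suggested by the DataBatch, batch and ptr names), the nodes of each graph are stored contiguously, and ptr=[129] holds the 128 graph offsets plus the total node count, so data.ptr[:-1] should give those indices directly. The sketch below recovers the same indices from the batch vector alone, which is what the forward method receives; first_node_index is a hypothetical helper name.

```python
import torch


def first_node_index(batch: torch.Tensor) -> torch.Tensor:
    """Index of the first node of every graph in a flattened batch.

    batch[i] is the graph id of node i. Assumes graph ids run 0..G-1,
    every graph has at least one node, and each graph's nodes are stored
    contiguously in graph-id order (PyTorch Geometric batches satisfy this).
    """
    counts = torch.bincount(batch)                # number of nodes per graph
    return torch.cumsum(counts, dim=0) - counts   # exclusive prefix sum = start offsets


# Toy batch: graph 0 has 3 nodes, graph 1 has 2 nodes, graph 2 has 4 nodes.
batch = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])
print(first_node_index(batch))  # tensor([0, 3, 5])
```

For the batch in the question, first_node_index(data.batch) should match data.ptr[:-1] and have 128 entries, one per row of data.y.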

Here is my current architecture:

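import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GCNAttentionModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_rate):
        super().__init__()
        # edge_dim=1 lets GATConv use the 1-dimensional edge_attr in its attention scores;
        # without it the edge attributes are not used.
        self.conv1 = GATConv(input_dim, hidden_dim, edge_dim=1)
        self.conv2 = GATConv(hidden_dim, output_dim, edge_dim=1)
        self.dropout_rate = dropout_rate

    def forward(self, x, edge_index, edge_attr, batch):
        # batch is currently unused
        x = self.conv1(x, edge_index, edge_attr)
        x = F.leaky_relu(x)
        x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.conv2(x, edge_index, edge_attr)
        # No activation after the last layer: this is a regression output.
        return x  # shape [num_nodes, output_dim], one row per node in the batch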

The model's output has shape torch.Size([2634, 131]), i.e. one 131-dimensional prediction for every node in the batch, and I am not sure whether that is correct when I am only interested in the predictions for specific nodes. Should the output dimension be 1 if I only care about one node? If so, I can't figure out how the target variables of the neighboring nodes would be aggregated and used. Can anyone help or suggest an approach?
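On the two questions above: the output dimension is the number of targets per prediction, not the number of nodes you predict for, so with y=[128, 131] it stays 131; what shrinks is the number of rows, from 2634 nodes to 128 first nodes. Neighbors contribute through their features, not their targets: each GAT layer mixes a node's embedding with those of the neighbors whose edges in edge_index point into it, so after two layers the first node's row already depends on its 2-hop neighborhood. The remaining piece is a loss that ignores target entries switched off by data.mask. A minimal sketch, assuming mask is (or can be cast to) a boolean tensor with True marking observed targets; masked_mse is a hypothetical helper.

```python
import torch


def masked_mse(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean squared error over the entries where mask is True.

    pred, target and mask all have shape [num_graphs, num_targets]
    (e.g. [128, 131]). mask is assumed to mark observed targets with True/1.
    """
    mask = mask.bool()
    diff = (pred - target)[mask]
    if diff.numel() == 0:
        # No observed targets: return a zero that still keeps pred in the graph.
        return pred.sum() * 0.0
    return (diff ** 2).mean()


# Toy example: 2 graphs, 2 targets each; one target is missing.
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.0, 0.0], [5.0, 4.0]])
mask = torch.tensor([[True, False], [True, True]])
print(masked_mse(pred, target, mask))  # errors 0, -2, 0 -> mean of squares 4/3
```

In a training loop this would be loss = masked_mse(out, data.y, data.mask), where out holds the first-node predictions, followed by the usual loss.backward() and optimizer.step().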