Training a graph attention network model for node-level predictions

I am trying to train a graph neural network model and I am having problems with the model's architecture. My data are DataBatch objects from the torch_geometric.data module.
In a batch object (data), each row of data.y holds the target variables for the nodes of one graph. A batch consists of node embeddings coming from 128 different graphs. What I am interested in is the prediction for the first node in every graph, and I am not sure how to proceed. I read that graph attention networks are commonly used for node-level regression.

DataBatch(x=[2634, 768], edge_index=[2, 2506], edge_attr=[2506, 1], y=[128, 131], mask=[128, 131], batch=[2634], ptr=[129])
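
To make the "first node in every graph" part concrete, here is a minimal sketch of how those nodes could be picked out of a per-node output. It assumes that ptr holds the cumulative node offsets of the graphs in the batch (which is how PyG batches normally store it) and that the first node of each graph is the first row of that graph's block. The tensors below are toy stand-ins with made-up sizes, not the real batch.

```python
import torch

# Toy stand-ins for the fields of a batch (values are made up):
# three graphs with 3, 2 and 4 nodes, 5 output values per node.
batch = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])  # graph id of every node
ptr = torch.tensor([0, 3, 5, 9])                    # cumulative node offsets
out = torch.randn(9, 5)                             # stand-in for a per-node model output

# ptr[i] is the row where graph i starts, so dropping the last entry
# gives the index of the first node of every graph.
first_idx = ptr[:-1]
first_out = out[first_idx]                          # shape [num_graphs, 5]

# If ptr is not available, the same indices can be derived from batch,
# assuming the nodes of each graph are stored contiguously.
counts = torch.bincount(batch)
first_idx_from_batch = torch.cumsum(counts, dim=0) - counts
```

With the real batch, the same indexing would be data.ptr[:-1], giving 128 indices into the 2634 node rows.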

Here is my current architecture:

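import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATConv


class GCNAttentionModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_rate, edge_dim=1):
        super().__init__()
        # edge_dim matches edge_attr=[2506, 1]; in recent PyG versions GATConv
        # does not use edge_attr in the attention scores unless edge_dim is set.
        self.conv1 = GATConv(input_dim, hidden_dim, edge_dim=edge_dim)
        self.conv2 = GATConv(hidden_dim, output_dim, edge_dim=edge_dim)
        self.dropout_rate = dropout_rate

    def forward(self, x, edge_index, edge_attr, batch):
        # Two rounds of attention-weighted message passing; batch is currently unused.
        x = self.conv1(x, edge_index, edge_attr)
        x = F.leaky_relu(x)
        #x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.conv2(x, edge_index, edge_attr)
        x = F.leaky_relu(x)
        #x = F.dropout(x, p=self.dropout_rate, training=self.training)
        return x  # one row per node: [num_nodes, output_dim]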

The model output has size torch.Size([2634, 131]), and I am not sure whether that is what it should be if I am only interested in the prediction for a specific node. Should the output dimension be 1 if I am interested in only one node? If so, how will the target variables of the neighboring nodes be aggregated and used? That is what I couldn't figure out. Can anyone help or suggest an approach?
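
One possible way to frame the output-dimension question, as a sketch only: let the model produce one value per node (output_dim = 1), then compute the loss on the first node of each graph alone. This assumes that column 0 of data.y and data.mask corresponds to the first node of each graph, which is a guess about the data layout. The tensors are toy stand-ins; node_out plays the role of the model output.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins (made-up sizes): 4 graphs with 3, 2, 4 and 2 nodes.
num_graphs, max_nodes = 4, 6
ptr = torch.tensor([0, 3, 5, 9, 11])                # cumulative node offsets
node_out = torch.randn(11, 1, requires_grad=True)   # stand-in for a model output with output_dim = 1
y = torch.randn(num_graphs, max_nodes)              # per-graph rows of node targets
mask = torch.ones(num_graphs, max_nodes, dtype=torch.bool)

# Keep only the prediction for the first node of every graph.
pred_first = node_out[ptr[:-1]].squeeze(-1)         # shape [num_graphs]

# Assumption: column 0 of y / mask belongs to the first node of each graph.
target_first = y[:, 0]
valid = mask[:, 0]

# The loss only sees the first nodes; other rows of node_out get zero gradient here.
loss = F.mse_loss(pred_first[valid], target_first[valid])
loss.backward()
```

In this setup the neighbors' targets are not used at all. What reaches the first node's prediction is the neighbors' input features, through the message passing in the GAT layers (two layers cover the 2-hop neighborhood).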