Graph Neural Network for regression not updating

Hi,

I am currently working on a Graph Neural Network (GNN) for a regression task: predicting revenue from a dataset of 43 features describing the problem setting.

Initially, I explored traditional regression models like Ordinary Least Squares (OLS) and Random Forest, both of which yielded respectable results with R-squared scores of 0.8 and 0.9, respectively. While these models provided valuable insights into the features, I decided to explore GNNs due to the unique characteristics of the problem.

The GNN approach involves modeling the problem as a graph, which seems well-suited to capture the inherent relationships among the features. However, my initial attempts at using a GNN resulted in predictions far from the true values, and the loss remained relatively constant.

Below, I provide an overview of the procedure, the GNN model, and some experiments I have conducted so far:

  1. Data Preprocessing: I applied standard scaling to the data and split it into training and test sets.
  2. Data Loader: I created a data loader using torch_geometric.data, which includes node features, edge_index, and edge_features (a minimal sketch of these two steps follows the model definition below).
  3. GAT Model Definition: I defined the Graph Attention Network (GAT) model as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv

class GAT(torch.nn.Module):
    """Graph Attention Network"""
    def __init__(self, num_features, num_classes, hidden_channels, heads=1, edge_dimension=29):
        super().__init__()
        # single attention layer; its output is heads * hidden_channels wide
        self.gat1 = GATv2Conv(num_features, hidden_channels, heads=heads, edge_dim=edge_dimension)

        self.linear1 = nn.Linear(heads * hidden_channels, hidden_channels)
        self.linear2 = nn.Linear(hidden_channels, hidden_channels)
        self.linear3 = nn.Linear(hidden_channels, hidden_channels)

        self.output = nn.Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index, edge_attr):
        x = self.gat1(x, edge_index, edge_attr=edge_attr)
        x = x.relu()

        x = self.linear1(x)
        x = x.relu()

        x = self.linear2(x)
        x = x.relu()
        x = F.dropout(x, p=0.1, training=self.training)

        x = self.linear3(x)
        x = x.relu()

        x = self.output(x)  # one value per node: shape [num_nodes, num_classes]
        return x
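
For reference, steps 1 and 2 look roughly like this. This is a minimal sketch: the attribute names node, edge_features, and y match what the training loop below expects, but X, y, edge_index, and edge_features are placeholders for the problem-specific graph construction, and the one-node-per-feature encoding is only an illustration.

import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

# 1. standard scaling + train/test split (scaler fit on training data only)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. wrap each observation in a Data object
def make_data(features, target):
    # placeholder encoding: one node per raw feature (43 nodes, 1 feature each)
    node_features = torch.tensor(features, dtype=torch.float).view(-1, 1)
    data = Data(edge_index=edge_index)           # graph structure built elsewhere
    data.node = node_features                    # custom key; PyG's standard key is x
    data.edge_features = edge_features           # shape [num_edges, 29]
    data.num_nodes = node_features.size(0)       # explicit, since PyG infers num_nodes from x
    data.y = torch.tensor([target], dtype=torch.float)
    return data

train_dataset = [make_data(f, t) for f, t in zip(X_train, y_train)]
test_dataset = [make_data(f, t) for f, t in zip(X_test, y_test)]
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)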
  4. Training Procedure: I implemented the training procedure using Mean Squared Error (MSE) loss and the Adam optimizer.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_gat = GAT(num_features=1, num_classes=1, hidden_channels=64).to(device)  # widths match the placeholder encoding above; 64 is illustrative

loss_func = nn.MSELoss()
optimizer = torch.optim.Adam(model_gat.parameters(), lr=0.001)  # , weight_decay=5e-4)

def train_gat():
    model_gat.train()  # set the model to training mode (i.e., apply dropout)
    total_loss = []
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()

        out = model_gat(data.node, data.edge_index, data.edge_features)

        loss = loss_func(out, data.y)
        loss.backward()   # derive gradients
        optimizer.step()  # update all parameters based on the gradients
        total_loss.append(loss.item())  # .item() detaches the value; appending the tensor would keep the whole graph alive
    return sum(total_loss) / len(total_loss)

def test_gat():
    model_gat.eval()
    test_loss = []
    with torch.no_grad():
        for data in test_loader:
            data = data.to(device)
            out = model_gat(data.node, data.edge_index, data.edge_features)  # propagate the data through the model
            test_loss.append(loss_func(out, data.y).item())  # accumulate the MSE of this batch
    return sum(test_loss) / len(test_loss), out, data.y  # out and data.y are from the last batch only


train_losses = []
epochs = 15
for epoch in range(1, epochs + 1):
    loss = train_gat()  # one training pass over the entire dataset
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    train_losses.append(loss)  # save losses so we can plot them

test_loss, pred, true = test_gat()
print(test_loss)

print(pred, true)
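
One thing I have not been able to rule out is a silent shape mismatch: nn.MSELoss broadcasts an output of shape [N, 1] against a target of shape [N] (it only emits a UserWarning), and averaging over the resulting [N, N] matrix can look exactly like a loss that never moves. This is the check I run, reusing the loader from above:

# shape sanity check: MSELoss needs out and data.y to match element-wise
data = next(iter(train_loader)).to(device)
out = model_gat(data.node, data.edge_index, data.edge_features)
print(out.shape, data.y.shape)  # e.g. [num_nodes, 1] vs. [batch_size] would broadcast silently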

Despite several attempts to improve the GNN’s performance, such as adjusting the learning rate, using different loss functions, and increasing the model’s size, the results have not improved significantly. Additionally, I experimented with a smaller dataset containing only three observations to check whether the model could overfit, but it failed to overfit even that.
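
The three-observation overfitting experiment looked roughly like this (a sketch reusing the names defined above):

# overfit sanity check: a healthy model should drive this loss to ~0
tiny_loader = DataLoader(train_dataset[:3], batch_size=3, shuffle=True)
for epoch in range(500):
    model_gat.train()
    for data in tiny_loader:
        data = data.to(device)
        optimizer.zero_grad()
        out = model_gat(data.node, data.edge_index, data.edge_features)
        loss = loss_func(out, data.y)
        loss.backward()
        optimizer.step()
    if epoch % 100 == 0:
        print(f'epoch {epoch}: loss {loss.item():.6f}')  # stays roughly flat for me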

Furthermore, as a comparison, I implemented a Multi-Layer Perceptron (MLP) with the data loader adjusted accordingly (a sketch of that loader follows the model below), but it also failed to produce satisfactory results.

class NN(torch.nn.Module):
    """Graph Attention Network with Batch Normalization"""
    def __init__(self, num_features, num_classes):
        super().__init__()

        self.linear1 = nn.Linear(num_features, 300)
        self.batch_norm1 = nn.BatchNorm1d(300)
        self.linear2 = nn.Linear(300, 50)
        self.batch_norm2 = nn.BatchNorm1d(50)
        self.linear3 = nn.Linear(50, 20)
        self.batch_norm3 = nn.BatchNorm1d(20)

        self.output = nn.Linear(20, num_classes)

    def forward(self, x):
        x = self.linear1(x)
        x = self.batch_norm1(x)
        x = F.relu(x)

        x = self.linear2(x)
        x = self.batch_norm2(x)
        x = F.relu(x)

        x = self.linear3(x)
        x = self.batch_norm3(x)
        x = F.relu(x)

        x = self.output(x)
        return x
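
The adjusted data loader for the MLP is just a plain tensor loader over the scaled feature matrix (a sketch, assuming the arrays from step 1 above):

from torch.utils.data import TensorDataset, DataLoader as TorchDataLoader

# one 43-feature row per sample; target reshaped to match the [N, 1] model output
mlp_train = TensorDataset(
    torch.tensor(X_train, dtype=torch.float),
    torch.tensor(y_train, dtype=torch.float).view(-1, 1),
)
mlp_train_loader = TorchDataLoader(mlp_train, batch_size=32, shuffle=True)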

At this point, I am seeking recommendations or insights into why the GNN approach may not be working as expected. Any help or suggestions would be greatly appreciated.