Hi,
I am currently working on implementing a Graph Neural Network (GNN) for a regression task aimed at predicting revenue. The dataset consists of 43 features describing the problem setting, and the goal is to predict revenue from these features.
Initially, I explored traditional regression models like Ordinary Least Squares (OLS) and Random Forest, both of which yielded respectable results with R-squared scores of 0.8 and 0.9, respectively. While these models provided valuable insights into the features, I decided to explore GNNs due to the unique characteristics of the problem.
The GNN approach involves modeling the problem as a graph, which seems well-suited to capture the inherent relationships among the features. However, my initial attempts at using a GNN resulted in predictions far from the true values, and the loss remained relatively constant.
Below, I provide an overview of the procedure, the GNN model, and some experiments I have conducted so far:
- Data Preprocessing: I applied standard scaling to the data and split it into training and test sets.
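Concretely, this step looks roughly like the following (a sketch: X is the 43-column feature matrix and y the revenue target, both as NumPy arrays; the split parameters are only illustrative):
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training split
X_test = scaler.transform(X_test)        # apply the same scaling to the test split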
- Data Loader: I created a data loader using torch_geometric.data, which includes node features, edge_index, and edge_features.
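A rough sketch of how the Data objects and loaders are built is below; the graph layout shown here (one node per feature, with placeholder connectivity and edge attributes) is purely illustrative, since the real edge construction is problem-specific:
import torch
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

def to_graph(row, target):
    node = torch.tensor(row, dtype=torch.float32).view(-1, 1)        # 43 nodes with 1 feature each (illustrative)
    edge_index = torch.combinations(torch.arange(node.size(0))).t()  # placeholder connectivity, shape [2, num_edges]
    edge_features = torch.randn(edge_index.size(1), 29)              # placeholder edge attributes, edge_dim = 29
    return Data(node=node, edge_index=edge_index, edge_features=edge_features,
                y=torch.tensor([float(target)]), num_nodes=node.size(0))

train_dataset = [to_graph(r, t) for r, t in zip(X_train, y_train)]
test_dataset = [to_graph(r, t) for r, t in zip(X_test, y_test)]
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)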
- GAT Model Definition: I defined the Graph Attention Network (GAT) model as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv

class GAT(torch.nn.Module):
    """Graph Attention Network: one GATv2 layer with edge features, followed by a small MLP head."""
    def __init__(self, num_features, num_classes, hidden_channels, heads=1, edge_dimension=29):
        super().__init__()
        self.gat1 = GATv2Conv(num_features, hidden_channels, heads=heads, edge_dim=edge_dimension)
        self.linear1 = nn.Linear(heads * hidden_channels, hidden_channels)
        self.linear2 = nn.Linear(hidden_channels, hidden_channels)
        self.linear3 = nn.Linear(hidden_channels, hidden_channels)
        self.output = nn.Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index, edge_attr):
        x = self.gat1(x, edge_index, edge_attr=edge_attr)
        x = x.relu()
        x = self.linear1(x)
        x = x.relu()
        x = self.linear2(x)
        x = x.relu()
        x = F.dropout(x, p=0.1, training=self.training)
        x = self.linear3(x)
        x = x.relu()
        x = self.output(x)
        return x
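For completeness, the model is set up along these lines (hidden_channels and heads are example values; num_features has to match the per-node feature dimension of the Data objects, which is 1 in the loader sketch above, and num_classes is 1 for the single revenue target):
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_gat = GAT(num_features=1, num_classes=1, hidden_channels=64,
                heads=1, edge_dimension=29).to(device)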
- Training Procedure: I implemented the training procedure using Mean Squared Error (MSE) loss and the Adam optimizer.
loss_func = nn.MSELoss()
optimizer = torch.optim.Adam(model_gat.parameters(), lr=0.001)  # , weight_decay=5e-4)

def train_gat():
    model_gat.train()  # set the model to training mode (i.e., apply dropout)
    total_loss = []
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        out = model_gat(data.node, data.edge_index, data.edge_features)
        loss = loss_func(out, data.y)
        loss.backward()   # compute gradients
        optimizer.step()  # update all parameters based on the gradients
        total_loss.append(loss.item())  # store the scalar loss rather than the graph-attached tensor
    return sum(total_loss) / len(total_loss)

def test_gat():
    model_gat.eval()
    test_loss = []
    with torch.no_grad():
        for data in test_loader:
            data = data.to(device)
            out = model_gat(data.node, data.edge_index, data.edge_features)  # propagate the data through the model
            test_loss.append(loss_func(out, data.y).item())  # accumulate the MSE on this batch
    return sum(test_loss) / len(test_loss), out, data.y

train_losses = []
epochs = 15
for epoch in range(1, epochs + 1):
    loss = train_gat()  # one training pass over the entire training set
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    train_losses.append(loss)  # save the epoch losses so we can plot them

test_loss, pred, true = test_gat()
print(test_loss)
print(pred, true)
Despite several attempts to improve the GNN’s performance, such as adjusting the learning rate, trying different loss functions, and increasing the model’s size, the results have not improved significantly. I also experimented with a tiny dataset of only three observations to check whether the model could at least overfit, but it could not.
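That sanity check was essentially the following (a sketch; the number of epochs is arbitrary):
# Try to overfit three training graphs; the loss should collapse towards zero, but it stays roughly flat.
tiny_loader = DataLoader(train_dataset[:3], batch_size=3, shuffle=True)
for epoch in range(1, 501):
    for data in tiny_loader:
        data = data.to(device)
        optimizer.zero_grad()
        out = model_gat(data.node, data.edge_index, data.edge_features)
        loss = loss_func(out, data.y)
        loss.backward()
        optimizer.step()
    if epoch % 100 == 0:
        print(f'Epoch {epoch}: loss {loss.item():.4f}')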
Furthermore, as a comparison, I implemented a Multi-Layer Perceptron (MLP) on the same scaled features, adjusting the data loader accordingly, but it also failed to produce satisfactory results:
class NN(torch.nn.Module):
    """Feed-forward baseline (MLP) with batch normalization"""
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.linear1 = nn.Linear(num_features, 300)
        self.batch_norm1 = nn.BatchNorm1d(300)
        self.linear2 = nn.Linear(300, 50)
        self.batch_norm2 = nn.BatchNorm1d(50)
        self.linear3 = nn.Linear(50, 20)
        self.batch_norm3 = nn.BatchNorm1d(20)
        self.output = nn.Linear(20, num_classes)

    def forward(self, x):
        x = self.linear1(x)
        x = self.batch_norm1(x)
        x = F.relu(x)
        x = self.linear2(x)
        x = self.batch_norm2(x)
        x = F.relu(x)
        x = self.linear3(x)
        x = self.batch_norm3(x)
        x = F.relu(x)
        x = self.output(x)
        return x
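The adjusted (non-graph) data loader for the MLP is just a plain tensor loader over the same scaled features, roughly (batch size illustrative):
from torch.utils.data import TensorDataset, DataLoader as TorchDataLoader

mlp_train_ds = TensorDataset(torch.tensor(X_train, dtype=torch.float32),
                             torch.tensor(y_train, dtype=torch.float32).view(-1, 1))
mlp_train_loader = TorchDataLoader(mlp_train_ds, batch_size=32, shuffle=True)

model_mlp = NN(num_features=43, num_classes=1).to(device)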
At this point, I am seeking recommendations or insights into why the GNN approach may not be working as expected. Any help or suggestions would be greatly appreciated.