Regression using PyTorch Geometric

Hi. I’m new to geometric deep learning and GCNNs. I want to train a GCNN model to predict a feature as a regression problem. My code is below:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(data.num_node_features, 100)
        self.conv2 = GCNConv(100, 16)
        self.conv3 = GCNConv(16, data.num_node_features)
        self.linear1 = torch.nn.Linear(104,1)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        x = self.conv3(x, edge_index)
        x = self.linear1(x)
        return F.log_softmax(x, dim=1)

import torch.nn as nn
device = torch.device('cpu')
model = GCN().to(device)
model = model.double()
data = data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(5):
    optimizer.zero_grad()
    out = model(data)
    loss = F.mse_loss(out.squeeze(), data.y.squeeze())
    loss.backward()
    optimizer.step()
    print(f'Epoch: {epoch}, Loss: {loss}')

I am getting a NaN loss. What is wrong with this code?

Also, are there any blog posts on solving regression problems with PyTorch Geometric?
Thanks

The usage of F.log_softmax looks wrong, or at least uncommon, for a regression use case. Could you describe what your target contains?
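To make the concern concrete, here is a minimal sketch in plain PyTorch. It assumes the last layer is a Linear(..., 1), as in the posted model, so the output has shape [num_nodes, 1]; the numbers are made up for illustration. Applying log_softmax over dim=1 then normalizes across a single entry per row, so every prediction becomes 0 and no gradient flows back:

```python
import torch
import torch.nn.functional as F

# Hypothetical raw outputs of a final Linear(..., 1) layer for 5 nodes: shape [5, 1]
raw = torch.tensor([[2.5], [-1.0], [0.3], [10.0], [-7.2]],
                   dtype=torch.float64, requires_grad=True)
# Hypothetical regression targets (e.g. times), one per node
target = torch.tensor([12.0, 30.5, 8.0, 45.0, 22.0], dtype=torch.float64)

# dim=1 has exactly one entry per row, so each row becomes log(1) = 0
squashed = F.log_softmax(raw, dim=1)
print(squashed.squeeze())  # all zeros, whatever the inputs were

# The loss is computed against constant zeros, and its gradient w.r.t. raw vanishes
loss_squashed = F.mse_loss(squashed.squeeze(), target)
loss_squashed.backward()
print(raw.grad.squeeze())  # all zeros: nothing upstream can learn

# For regression, return the raw values and apply the loss directly
loss_raw = F.mse_loss(raw.detach().squeeze(), target)
print(loss_squashed.item(), loss_raw.item())
```

This alone would not explain a NaN loss, but it does mean the model cannot fit the target while the log_softmax is in place.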

Thanks for replying.
My dataset is a traffic dataset, where the target is a time for a vehicle, predicted from the other features. Here y is the time, x holds the node features, and the edge indices are saved in edge_attr.
I also tried removing F.log_softmax and returning `x` directly, but the NaN loss is the same.

@Thfuad did you get your answer?

Yeah, I solved that. There were NaN values in the dataset that I thought I had removed in the preprocessing steps. You should check for this.
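A quick way to run that check right before training is to count non-finite entries in the tensors. This is only a sketch: the helper name `report_non_finite` is made up here, and the toy tensors stand in for `data.x` and `data.y`.

```python
import torch


def report_non_finite(name, tensor):
    """Print and return how many entries of `tensor` are NaN or +/-inf."""
    bad = int((~torch.isfinite(tensor)).sum().item())
    print(f"{name}: {bad} non-finite of {tensor.numel()} values")
    return bad


# Toy stand-ins for data.x and data.y (hypothetical values)
x = torch.tensor([[1.0, 2.0], [float("nan"), 4.0], [5.0, 6.0]])
y = torch.tensor([10.0, 20.0, float("inf")])

bad_x = report_non_finite("x", x)
bad_y = report_non_finite("y", y)

# Indices of the nodes whose features contain a non-finite value
bad_rows = (~torch.isfinite(x).all(dim=1)).nonzero(as_tuple=True)[0]
print("nodes with bad features:", bad_rows.tolist())
```

Keep in mind that in a graph, dropping the affected rows shifts the node indices used by edge_index, so the bad values have to be fixed or the edges re-indexed, not just the rows removed.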

A NaN loss can happen for one of the following reasons:

  1. There are NaN values in the dataset.
  2. The ReLU layers sometimes end up outputting NaN, usually because a NaN or exploding value is already flowing into them. (Trying leaky ReLU instead is a common suggestion.)
  3. Taking torch.sqrt of zero gives an infinite gradient, which can turn into NaN during the backward pass.
  4. Using the wrong loss (e.g. a classification loss in a regression problem).
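The square-root case in point 3 is easy to reproduce in isolation. The sketch below uses plain PyTorch; the helper `grad_of` and the epsilon value 1e-8 are illustrative choices, not something from the posts above.

```python
import torch


def grad_of(fn):
    """Return d(sum(fn(x)))/dx evaluated at x = 0 for a 3-element tensor."""
    x = torch.zeros(3, requires_grad=True)
    fn(x).sum().backward()
    return x.grad


# sqrt(0) itself is 0, but its derivative 1 / (2 * sqrt(x)) is infinite at 0
grad_plain = grad_of(torch.sqrt)

# Multiplying by zero does not rescue it: the backward pass computes 0 / 0
grad_times_zero = grad_of(lambda x: torch.sqrt(x) * 0.0)

# Adding a small epsilon inside the root keeps the gradient finite
grad_safe = grad_of(lambda x: torch.sqrt(x + 1e-8))

# A negative input gives NaN already in the forward pass
forward_negative = torch.sqrt(torch.tensor(-1.0))

print(grad_plain)         # inf
print(grad_times_zero)    # nan
print(grad_safe)          # finite
print(forward_negative)   # nan
```

If it is unclear which operation produces the NaN, calling torch.autograd.set_detect_anomaly(True) before training makes the backward pass raise an error at the offending operation, at the cost of slower training.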

Actually, I want to do regression using PyG. Could you please help me with this error? My data is in an Excel sheet with three columns, Feature1, Feature2 and Feature3, and a fourth column holding the target value. My script is given below:

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch_geometric
from torch_geometric.data import Data, DataLoader
from torch_geometric.nn import GCNConv
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset

# Load your dataset
data = pd.read_excel('/..../Data_GCN.xlsx')
features = data.iloc[:, 0:3].values  # Assuming the first three columns are your features
target = data.iloc[:, 3].values  # Assuming the fourth column is your target
#%%

# Create a cyclic graph
num_nodes = features.shape[0]
edge_index = torch.tensor([(i, (i + 1) % num_nodes) for i in range(num_nodes)], dtype=torch.long).t().contiguous()
#%%

# Convert features and target to PyTorch tensors
x = torch.tensor(features, dtype=torch.float32)
y = torch.tensor(target, dtype=torch.float32).view(-1, 1)
#%%
# Create a PyTorch Geometric Data object
data = Data(x=x, edge_index=edge_index, y=y)
#%%
# Split your data into training and validation sets
train_data, val_data = train_test_split(data, test_size=0.2, random_state=42)

After this last line I get the following error:

Traceback (most recent call last):

File ~/anaconda3/envs/…lib/python3.11/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/Documents/…/Codes/IITK/gcn_reg.py:30
train_data, val_data = train_test_split(data, test_size=0.2, random_state=42)

File ~/anaconda3/envs/,/lib/python3.11/site-packages/sklearn/utils/_param_validation.py:211 in wrapper
return func(*args, **kwargs)

File ~/anaconda3/envs/,/lib/python3.11/site-packages/sklearn/model_selection/_split.py:2640 in train_test_split
return list(

File ~/anaconda3/envs/…/lib/python3.11/site-packages/sklearn/model_selection/_split.py:2642 in <genexpr>
(_safe_indexing(a, train), _safe_indexing(a, test)) for a in arrays

File ~/anaconda3/envs/…/lib/python3.11/site-packages/sklearn/utils/__init__.py:357 in _safe_indexing
return _list_indexing(X, indices, indices_dtype)

File ~/anaconda3/envs/Krishna/lib/python3.11/site-packages/sklearn/utils/__init__.py:211 in _list_indexing
return [X[idx] for idx in key]

File ~/anaconda3/envs/…/lib/python3.11/site-packages/sklearn/utils/__init__.py:211 in <listcomp>
return [X[idx] for idx in key]

File ~/anaconda3/envs/,/lib/python3.11/site-packages/torch_geometric/data/data.py:457 in __getitem__
return self._store[key]

File ~/anaconda3/envs/…/lib/python3.11/site-packages/torch_geometric/data/storage.py:104 in __getitem__
return self._mapping[key]

KeyError: 1

You can assume any data in the Excel sheet; please guide me on what to do in this case. My adjacency matrix is created above. In my case it is a cyclic graph: each node is connected to the next node, and the last one to the first, because it is time series data.

Another question: is it fine to format the data this way, or is a specific data class needed here? I would like to know the standard data format that GCNConv or another GNN layer expects as input.

The data should be in PyTorch Geometric format, with node features (x), targets (y), and the edges of the graph. You can follow the PyTorch Geometric documentation for GNNs, GCNConv, or GraphSAGE.