Graph Neural Network is only predicting one value

Hi, I am working with a very imbalanced dataset and want to train a binary classifier on it. I am using an edge classification approach, treating each edge as a node, since the nodes of my graph have no features. So far I have set up the model and run some training and testing. However, when I inspect the model's outputs, almost all predictions belong to class 0. I am wondering whether my approach is correct, or whether something is wrong with my configuration. My approach uses PyTorch Geometric; here are some details:
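For reference, "treating each edge as a node" is usually done by building the line graph. Every edge of the original graph becomes a node, and two of these new nodes are connected when the corresponding edges share an endpoint. Below is a minimal pure-Python sketch of the directed variant, where edge (u, v) points to every edge (v, w). I believe this is the rule PyG's `torch_geometric.transforms.LineGraph` applies to directed graphs. The helper name `directed_line_graph` is hypothetical. In the resulting graph, new node i corresponds to row i of `edge_attr`, so the edge features become node features indexed by edge id rather than by the original node ids.

```python
from collections import defaultdict


def directed_line_graph(edges):
    """Return the edge list of the directed line graph.

    Hypothetical helper for illustration: each input edge (u, v) becomes a
    new node whose id is its position in `edges`; new node j points to new
    node k whenever edge j ends at the node where edge k starts.
    """
    # Group edge ids by their source node.
    out_edges = defaultdict(list)
    for k, (src, _dst) in enumerate(edges):
        out_edges[src].append(k)

    line_edges = []
    for j, (_src, dst) in enumerate(edges):
        # Edge j = (u, v) connects to every edge leaving v.
        for k in out_edges[dst]:
            line_edges.append((j, k))
    return line_edges


# Toy directed graph: 0 -> 1, 1 -> 2, 1 -> 3, 2 -> 0
edges = [(0, 1), (1, 2), (1, 3), (2, 0)]
print(directed_line_graph(edges))
```

On the toy graph, edge 0 (0 → 1) points to edges 1 and 2 (both start at node 1), edge 1 (1 → 2) points to edge 3, and edge 3 (2 → 0) points back to edge 0. One thing worth checking on this dataset: pairs such as 0 → 177 appear several times in `edge_index`. If I remember correctly, PyG's `LineGraph` coalesces duplicate edges first, which would merge them and break the alignment with `y`, so that is worth verifying before relying on it.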

```
# Graph data, stored as a PyTorch Geometric Data object
Data(edge_index=[2, 811277], type=[9257], edge_attr=[811277, 20], num_nodes=9257)
# Edge attributes
tensor([[2.0100e+03, 8.0000e+00, 2.0000e+01,  ..., 0.0000e+00, 0.0000e+00,
         1.2587e-04],
        [2.0040e+03, 1.2000e+01, 1.7000e+01,  ..., 0.0000e+00, 0.0000e+00,
         1.2587e-04],
        [2.0060e+03, 1.1000e+01, 2.0000e+01,  ..., 9.5853e-05, 0.0000e+00,
         1.2587e-04],
        ...,
        [2.0080e+03, 8.0000e+00, 1.4000e+01,  ..., 1.3214e-04, 0.0000e+00,
         9.3943e-05],
        [2.0080e+03, 8.0000e+00, 1.6000e+01,  ..., 1.3214e-04, 0.0000e+00,
         3.8863e-04],
        [2.0080e+03, 8.0000e+00, 1.7000e+01,  ..., 1.3214e-04, 0.0000e+00,
         1.2587e-04]])
# Edge index
tensor([[   0,    0,    0,  ...,  163,  163,  163],
        [ 177,  177,  177,  ..., 9247, 2135, 5127]])
# Labels y, via y.unique(return_counts=True)
(tensor([0., 1.]), tensor([810537,    740]))
```
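To put these label counts into numbers, here is a quick back-of-the-envelope sketch in plain Python, using only the counts printed above. It computes how well a model that always outputs class 0 already scores, and the positive-class weight given by the common `n_negative / n_positive` heuristic.

```python
import math

n_neg, n_pos = 810537, 740          # counts from y.unique(return_counts=True)
n_total = n_neg + n_pos             # equals the number of edges, 811277

# Accuracy of a model that always predicts class 0.
acc_all_zero = n_neg / n_total

# Binary cross-entropy (natural log) of a model that outputs the constant
# probability p = n_pos / n_total for every edge, i.e. just the class prior.
p = n_pos / n_total
bce_constant = -(p * math.log(p) + (1 - p) * math.log(1 - p))

# Common heuristic for the positive-class weight: negatives per positive.
pos_weight = n_neg / n_pos

print(f"all-zero accuracy: {acc_all_zero:.4%}")
print(f"BCE of constant prior: {bce_constant:.4f}")
print(f"pos_weight: {pos_weight:.1f}")
```

So roughly 99.91% of the edges are class 0, a constant output reaches a BCE of about 0.0073 without learning anything beyond the prior, and the heuristic weight comes out at about 1095.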
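```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels=64):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, 2)
        self.activation = nn.ReLU()
        self.outputs = nn.Linear(2, 1)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = self.activation(x)
        x = self.conv2(x, edge_index)
        x = self.outputs(x)
        return torch.sigmoid(x)  # probabilities in (0, 1), shape [num_rows, 1]
        # return x  # (would return raw logits instead)


# edge_attr ([num_edges, 20]) is passed in the node-feature slot `x`, so GCNConv
# treats row i of edge_attr as the features of node i of edge_index.
# edge_attr1 / edge_index1 are the test-set counterparts.
model = GCN(in_channels=edge_attr.size(dim=1))
criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)

model.train()
for i in range(201):
    optimizer.zero_grad()
    output = model(edge_attr, edge_index)
    # BCELoss needs input and target of the same shape, so flatten both.
    loss = criterion(output.reshape(-1), y.reshape(-1))
    if i % 20 == 0:
        print(output)
        print(f'Epoch: {i}, Loss: {loss.item()}')
    loss.backward()
    optimizer.step()

# Switch to evaluation mode and disable gradient computation
model.eval()
with torch.no_grad():
    output_test = model(edge_attr1, edge_index1)
    print(output_test)
    print(output_test.unique())
```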
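The model above applies a sigmoid and trains with `BCELoss`, and the commented-out `return x` would return logits instead. Below is a minimal sketch of that logits variant, swapping in `BCEWithLogitsLoss` with `pos_weight`. This is a named substitution, not what the code above does. To keep it runnable without PyG, a plain `nn.Linear` stands in for the GCN, and the data are synthetic random tensors (hypothetical). Only the loss and threshold wiring is the point here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data (hypothetical): 20 features per row, about 1% positives.
num_rows, num_feats = 5000, 20
x = torch.randn(num_rows, num_feats)
y = (torch.rand(num_rows) < 0.01).float()
# Shift positives along the first feature so there is a signal to learn.
x[y == 1, 0] += 3.0

# Stand-in for the GCN: it must output raw logits (no sigmoid) of shape [N, 1].
model = nn.Linear(num_feats, 1)

# Weight positive examples by the negative/positive ratio (shape [1]).
n_pos = y.sum()
pos_weight = ((num_rows - n_pos) / n_pos.clamp(min=1)).reshape(1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for epoch in range(300):
    optimizer.zero_grad()
    logits = model(x).view(-1)        # [N]
    loss = criterion(logits, y)       # target has the same shape as logits
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(x).view(-1))   # sigmoid only at prediction time
    preds = (probs > 0.5).long()

print(f"final loss: {loss.item():.4f}")
print("predicted class counts:", torch.bincount(preds, minlength=2).tolist())
```

With this weighting, misclassifying one rare positive costs about as much as misclassifying `pos_weight` negatives, so outputting 0 everywhere is no longer the cheapest solution. The 0.5 threshold is also a choice; on real data it would typically be tuned on a validation set.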

From the discussions I have read, I suspect the class distribution is so imbalanced that the model is heavily biased toward predicting class 0. However, I have tried undersampling the data to a 10:1 ratio, and the model still does not predict class 1, so I think there might be a problem with the model itself. I would love to hear any thoughts or comments on my situation. Thank you for any and all help!
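Two things may help narrow this down, sketched here under stated assumptions. The first is undersampling only the edges that enter the loss, via a mask, rather than deleting edges. Removing edges from `edge_index` also changes the graph the GCN propagates over. The second is tracking precision and recall for class 1 instead of overall accuracy or the loss. `undersample_mask` and `per_class_report` are hypothetical helpers, and the toy labels are made up.

```python
import torch


def undersample_mask(y, neg_per_pos=10, generator=None):
    """Boolean mask keeping every positive and `neg_per_pos` negatives per positive.

    Hypothetical helper: the graph itself is left untouched; the mask only
    selects which edges contribute to the loss.
    """
    pos_idx = (y == 1).nonzero(as_tuple=True)[0]
    neg_idx = (y == 0).nonzero(as_tuple=True)[0]
    n_keep = min(neg_idx.numel(), neg_per_pos * pos_idx.numel())
    perm = torch.randperm(neg_idx.numel(), generator=generator)[:n_keep]
    mask = torch.zeros_like(y, dtype=torch.bool)
    mask[pos_idx] = True
    mask[neg_idx[perm]] = True
    return mask


def per_class_report(preds, y):
    """Precision and recall for class 1, plus the predicted class counts."""
    tp = ((preds == 1) & (y == 1)).sum().item()
    fp = ((preds == 1) & (y == 0)).sum().item()
    fn = ((preds == 0) & (y == 1)).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "predicted_counts": torch.bincount(preds.long(), minlength=2).tolist()}


# Toy labels with a similar kind of imbalance (hypothetical numbers).
g = torch.Generator().manual_seed(0)
y = torch.zeros(10_000)
y[:10] = 1.0
mask = undersample_mask(y, neg_per_pos=10, generator=g)
print(int(mask.sum()), int(y[mask].sum()))  # -> 110 10

# In the training loop, the loss would then be restricted to the masked edges:
# loss = criterion(output.reshape(-1)[mask], y[mask])

# An all-zero predictor has 99.9% accuracy on these labels but zero recall.
all_zero = torch.zeros_like(y)
print(per_class_report(all_zero, y))
```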