GPU training with GCN sparse matrix bug

Hi,

Feature matrix shape: torch.Size([44, 156])
Adjacency matrix shape: torch.Size([44, 44])

By the way, I checked all the inputs and outputs: the adjacency matrix wasn't on the cuda:0 device, so I moved it there and that fixed the error. My current problem: why is the GPU computation slower than my CPU, even though I have tried various batch sizes?
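For reference, the fix was roughly this (a minimal sketch of what I mean by "assigned it"; device is just torch.device("cuda:0") here):

# move the normalized adjacency to the same device as the model
adj_norm = adj_norm.to(device)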

Please see my training step below:

import numpy as np
import scipy.sparse as sp
import torch

for epoch in range(num_epoch):
    model.train()

    for (g, features) in train_data_loader:
        # Dense adjacency of the batch graph, converted to a scipy COO matrix
        adj = g.adjacency_matrix(transpose=False)
        adj = sp.coo_matrix(adj.to_dense())
        n_nodes, feat_dim = features.shape
        nodes = list(g.nodes())

        # Normalized adjacency for the encoder; adjacency + self-loops
        # as the reconstruction target
        adj_norm = preprocess_graph(adj)
        adj_label = adj + sp.eye(adj.shape[0])
        adj_label = torch.FloatTensor(adj_label.toarray()).to(device)

        # Up-weight positive edges to counter class imbalance
        pos_weight = float(adj.shape[0] * adj.shape[0] - adj.sum()) / adj.sum()
        pos_weight = torch.from_numpy(np.array(pos_weight))
        norm = adj.shape[0] * adj.shape[0] / float((adj.shape[0] * adj.shape[0] - adj.sum()) * 2)

        print(features.device, adj_norm.to(device).device, adj_label.device)

        recovered, mu, logvar = model(features, adj_norm.to(device))
        loss = loss_function(recovered, adj_label, mu, logvar, n_nodes, norm, pos_weight)
        optimizer.zero_grad()
        loss.backward()
        cur_loss = loss.item()
        optimizer.step()
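For what it's worth, this is how I time a single batch (a minimal sketch; as far as I know, torch.cuda.synchronize() is needed because CUDA kernels launch asynchronously, so time.perf_counter() alone would under-count the GPU work):

import time
import torch

torch.cuda.synchronize()          # flush any pending GPU work first
start = time.perf_counter()

recovered, mu, logvar = model(features, adj_norm.to(device))
loss = loss_function(recovered, adj_label, mu, logvar, n_nodes, norm, pos_weight)
loss.backward()

torch.cuda.synchronize()          # wait for the backward pass to finish
print(f"one batch took {time.perf_counter() - start:.4f}s")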

Additionally, all inputs are on cuda:0, but the CPU still seems to be doing all the computation. Am I missing anything?
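This is how I verify where everything lives (a sketch, using the tensors from the loop above):

# all of these should print cuda:0 if the GPU is actually used
print(next(model.parameters()).device)    # model weights
print(features.device, adj_label.device)  # inputs and labels
print(pos_weight.device)                  # note: pos_weight above is built from
                                          # numpy and never moved to the GPU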

Thank you