HeteroData and HGTLoader

SKTheLearner · November 23, 2023, 3:02am

Hi,
I have a large HeteroData object like this:
HeteroData(
mol={ x=[876, 384] },
gene={ x=[18211, 18211] },
(mol, perts, gene)={
edge_index=[2, 11181554],
edge_label=[11181554],
},
(gene, rev_perts, mol)={
edge_index=[2, 11181554],
edge_label=[11181554],
}
)

Can someone guide me on how to create mini-batches for training?

I tried the following:
a. Split the data into train, val, and test sets using RandomLinkSplit.
b. Created an HGTLoader from the train split like so:
train_loader = HGTLoader(
    train,
    # Sample 16 nodes per type and per iteration for 2 iterations
    num_samples={key: [16] * 2 for key in train.node_types},
    # Use a batch size of 16 for sampling training nodes of type mol
    batch_size=16,
    input_nodes=('mol', None),
)

c. Then, in the training loop, I iterated with "for t in train_loader:", but the model throws an error:

Input In [12], in EdgeDecoder.forward(self, z_dict, edge_label_index)
     19 def forward(self, z_dict, edge_label_index):
     20     row, col = edge_label_index
---> 21     z = torch.cat([z_dict['mol'][row], z_dict['gene'][col]], dim=-1)
     23     z = self.lin1(z).relu()
     24     z = self.lin2(z)

IndexError: index 859 is out of bounds for dimension 0 with size 32
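It seems fairly obvious that the indices in edge_label_index do not match the nodes in the batch: the batch holds only 32 'mol' embeddings, while index 859 looks like a position in the full graph.
(How can I make sure I pass the correct label indices for the current batch?)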
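
To make the mismatch concrete, here is a minimal pure-Python sketch of the re-indexing the decoder would need. It is not PyG code: the helper to_batch_local and the toy ids are hypothetical, and it assumes the global node ids that ended up in a batch are known for each node type, in the same order as the rows of that batch's feature matrix.

```python
def to_batch_local(edge_label_index, src_ids, dst_ids):
    """Re-index global (src, dst) label edges to positions inside one batch.

    src_ids / dst_ids are assumed to be the global node ids present in the
    batch, ordered like the rows of the batch's feature matrices.
    Edges with an endpoint outside the batch are dropped.
    """
    src_pos = {g: i for i, g in enumerate(src_ids)}
    dst_pos = {g: i for i, g in enumerate(dst_ids)}
    rows, cols, kept = [], [], []
    for k, (r, c) in enumerate(zip(*edge_label_index)):
        if r in src_pos and c in dst_pos:
            rows.append(src_pos[r])
            cols.append(dst_pos[c])
            kept.append(k)  # position of this edge in the original label list
    return [rows, cols], kept


# Toy data (made up): three 'mol' and two 'gene' nodes landed in the batch.
mol_ids = [859, 3, 40]
gene_ids = [17, 5000]
global_edges = [[859, 3, 700], [5000, 17, 17]]

local_edges, kept = to_batch_local(global_edges, mol_ids, gene_ids)
print(local_edges, kept)
```

In this toy case the global pair (859, 5000) becomes the local pair (0, 1), and the edge starting at node 700 is dropped because that node is not in the batch. The kept list can be used to select the matching entries of edge_label. Whether a given loader exposes the global ids of a batch, or already does this re-indexing itself, is something to check in the PyG documentation.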

QUESTIONS:
a. Am I using HGTLoader correctly, or is HGTLoader even the right loader to use?
b. Am I iterating through the batches correctly?

@ptrblck et al