pre_transform adds a feature dimension (PyTorch Geometric)

Hi! When I use a pre_transform on my dataset, the resulting graphs have a different feature dimension than when the same transform is applied to a single graph directly (i.e. outside the pre_transform of a dataset). The same happens with batches: the pre-transformed batches have an extra feature dimension, while batches whose transform is applied on the fly by the DataLoader do not have it.

Example:

import torch
from torch_geometric.data import Data

def some_transform(graph: Data) -> Data:
    # Add a graph-level attribute whose width matches the node feature dimension
    feature_dimension = graph.x.shape[1]
    feature = torch.ones((1, feature_dimension))

    graph.extra_feature = feature

    return graph

from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

batch_size = 128

dataset = TUDataset('./datasets/TUDataset/PROTEINS', name='PROTEINS',
                    pre_transform=some_transform)
dataset = dataset.shuffle()
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

for batch in loader:
    print(batch.x.shape)
    print(batch.extra_feature.shape)

yields

>>> torch.Size([4313, 3])
>>> torch.Size([128, 4])
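
For comparison, this is roughly what I mean by applying the transform at load time instead (a sketch: the PROTEINS_runtime root is just a fresh directory I use so that the previously pre-transformed copy on disk is not reused, and batch_size is 128 as above):

dataset = TUDataset('./datasets/TUDataset/PROTEINS_runtime', name='PROTEINS',
                    transform=some_transform)  # applied on every access, not stored
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

for batch in loader:
    print(batch.x.shape)              # feature dimension is 3
    print(batch.extra_feature.shape)  # torch.Size([128, 3]) -- matches x

Here extra_feature has the same feature dimension as x.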

Why could this be the case? Thanks in advance!

It is hard to tell without seeing what each individual sample and the TUDataset graphs look like.

It may be because the DataLoader performs collation on each batch of data: when graphs are combined into a batch, their per-graph attributes are concatenated as well. Specifically, you may want to check whether the default collation is being applied to extra_feature. It could also be worth checking what x looks like inside the pre_transform itself: pre_transform runs once during processing, and, if I remember correctly, TUDataset only strips the node attributes from x afterwards when use_node_attr=False, so the transform may see a wider x than the one you get back at load time.
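
As a rough sketch of what the default collation does (toy Data objects here, not your actual PROTEINS graphs; Batch.from_data_list is what the DataLoader uses under the hood):

import torch
from torch_geometric.data import Batch, Data

# Two toy graphs, each carrying a (1, 4) graph-level attribute,
# like the one your pre_transform attaches.
g1 = Data(x=torch.randn(5, 4), extra_feature=torch.ones(1, 4))
g2 = Data(x=torch.randn(7, 4), extra_feature=torch.ones(1, 4))

batch = Batch.from_data_list([g1, g2])
print(batch.x.shape)              # torch.Size([12, 4]) -- node features concatenated
print(batch.extra_feature.shape)  # torch.Size([2, 4])  -- one row per graph

Tensor attributes are concatenated along dim 0 by default, so a (1, D) per-graph attribute ends up as (batch_size, D) in the batch, which would account for the 128 in your output.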
