Hi
I've used PyTorch Geometric for a while and have now returned to plain PyTorch.
One thing I've grown very accustomed to in PyTorch Geometric, and haven't found in PyTorch, is the ability to skip dataset preprocessing after it has been done once.
I was wondering whether there is similar functionality here that I'm just missing, or whether I have to implement it myself.
So what am I talking about exactly?
There is a good example of it in the PyTorch Geometric documentation:
import torch
from torch_geometric.data import InMemoryDataset


class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        # Load the already-processed data from disk.
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ['some_file_1', 'some_file_2', ...]

    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        # Download to `self.raw_dir`.
        pass

    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]

        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
So here, the interface provides two things: a directory for the raw data and a directory for the processed data.
The dataset runs the `download` method if the files listed in `raw_file_names` are not already in the raw directory.
Once the `process` method has run, its result is saved into the processed directory, so if the data has already been preprocessed, it is simply loaded from that file every time I reload the dataset.
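To make it concrete, doing this by hand would presumably look something like the sketch below. This is only my guess at a manual version, not anything PyTorch provides: `CachedDataset`, the `process_calls` counter, and the `data.pkl` file name are made up for illustration. To stay dependency-free it uses `pickle` and a plain class with `__len__`/`__getitem__`; in a real project I would expect to subclass `torch.utils.data.Dataset` and use `torch.save`/`torch.load` instead.

```python
import os
import pickle
import tempfile


class CachedDataset:
    """Map-style dataset that preprocesses once and reuses the saved result.

    Hypothetical sketch: the class name, file layout, and counter are
    assumptions for illustration only.
    """

    process_calls = 0  # counts how often the preprocessing actually ran

    def __init__(self, root):
        self.processed_path = os.path.join(root, "processed", "data.pkl")
        # Only preprocess if no processed file exists yet.
        if not os.path.exists(self.processed_path):
            os.makedirs(os.path.dirname(self.processed_path), exist_ok=True)
            samples = self.process()
            with open(self.processed_path, "wb") as f:
                pickle.dump(samples, f)
        # In both cases, the samples are loaded from the processed file.
        with open(self.processed_path, "rb") as f:
            self.samples = pickle.load(f)

    def process(self):
        # Placeholder for the expensive preprocessing step.
        CachedDataset.process_calls += 1
        return [(x, x * x) for x in range(5)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


root = tempfile.mkdtemp()
first = CachedDataset(root)   # runs process() and writes the cache
second = CachedDataset(root)  # finds the cache and skips process()
```

Creating the dataset a second time with the same `root` only reads the cached file, which is exactly the behaviour I'm hoping already exists somewhere.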
Is there functionality like that in PyTorch?