PyTorch Geometric -- how to make your own dataset of multiple graphs?

David_Ireland · June 26, 2021, 11:19pm

I have a list of multiple Data objects that are all independent graphs. How do I make a dataset out of this, so that it behaves like the built-in datasets in the tutorial here? I have tried the tutorial on making your own dataset, but I have absolutely no idea how to make sense of it. (Note: I am experienced with PyTorch but not so much with custom datasets; usually they are not needed.)

All I want to do is make this list of Data objects a dataset so that it has the same functionality as in the tutorial but this seems impossible to do – if someone could help I would be extremely grateful, thanks.

Anil_Kamat · October 13, 2021, 4:48am

I am also looking for this answer. Most of the online sources only talk about the built-in datasets, but to apply a GNN we need to synthesize our own data. If you have found the answer, please share. Thanks.
I think the following method from the PyG docs will work, but it is not an efficient way to do it:

from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

data_list = [Data(...), ..., Data(...)]
loader = DataLoader(data_list, batch_size=32)

Here, each Data object should be created in a loop.

Animesh_Basak_Chowdh · March 15, 2022, 8:21pm

Check out these two files:

github.com

NYU-MLDA/OpenABC/blob/master/datagen/utilities/PyGDataAIG.py

import os
import shutil

import pandas as pd
import networkx as nx
import glob
import pickle
import copy

from typing import Optional, Tuple

import torch
from torch import Tensor
from torch.utils.dlpack import to_dlpack, from_dlpack
import scipy.sparse
import zipfile
import argparse

import torch_geometric
import torch_geometric.data

This file has been truncated.

github.com

NYU-MLDA/OpenABC/blob/master/models/qor/SynthNetV1/netlistDataset.py

import os.path as osp
import torch
from zipfile import ZipFile

import pandas as pd
from torch_geometric.data import Dataset, download_url


class NetlistGraphDataset(Dataset):
    def __init__(self, root, filePath, transform=None, pre_transform=None):
        self.filePath = osp.join(root, filePath)
        super(NetlistGraphDataset, self).__init__(root, transform, pre_transform)

    @property
    def processed_file_names(self):
        fileDF = pd.read_csv(self.filePath)
        return fileDF['fileName'].tolist()

    def len(self):
        return len(self.processed_file_names)

This file has been truncated.

I first did the preprocessing and stored the .pt files in the processed directory. Later, I load them via NetlistGraphDataset. This is flexible if one wants to create different train/test splits over different graph objects.
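
As a rough sketch of how such splits could be prepared: since the dataset class above reads its file list from a CSV with a fileName column, one option is to write a separate CSV per split and create one dataset instance for each. The function write_split_csvs, the file names graph_0.pt etc. and the 80/20 ratio are hypothetical and not taken from the OpenABC repository. The sketch uses only the standard library.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_split_csvs(file_names, out_dir, test_fraction=0.2, seed=0):
    """Shuffle file names and write train.csv / test.csv with a 'fileName' column."""
    names = sorted(file_names)
    random.Random(seed).shuffle(names)  # seeded, so the split is reproducible
    n_test = int(len(names) * test_fraction)
    splits = {"test": names[:n_test], "train": names[n_test:]}

    out_dir = Path(out_dir)
    paths = {}
    for split, split_names in splits.items():
        path = out_dir / f"{split}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["fileName"])  # header matches the column read above
            writer.writerows([name] for name in split_names)
        paths[split] = path
    return paths


# Hypothetical names of preprocessed graph files.
graph_files = [f"graph_{i}.pt" for i in range(10)]
split_dir = tempfile.mkdtemp()
split_paths = write_split_csvs(graph_files, split_dir)
```

Each CSV path could then be passed as the filePath argument of a separate dataset instance, and changing the seed or the fraction gives a different split without touching the processed files.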