Hi, I am trying to make a custom Dataset with PyTorch Geometric. I need to replace the last line of __init__ in the class from the "Creating Your Own Datasets" page of the pytorch_geometric 1.6.3 documentation, adapting it to load several files, because my dataset is spread over multiple files.
From what I understand (I can’t find it in the docs), torch.load(file_path) returns a tuple of Data and slices. How can I combine these for multiple files while keeping the object structure? It is a little hard to explain; I hope you understand my problem. Thank you.
It looks like you can pass an io.BytesIO() into torch.load. I would try reading the files into a single io.BytesIO buffer and then passing that buffer to torch.load().
Ok, but that only moves the problem: how can I concatenate those BytesIO buffers? I tried this:
    buffer = io.BytesIO()
    for file in self.processed_paths:
        with open(file, 'rb') as f:
            buffer.write(f.read())
    self.data, self.slices = torch.load(buffer)
but I get an EOFError when loading.
Try calling buffer.seek(0) before torch.load().
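To show why the rewind matters, here is a minimal sketch. It uses the standard-library pickle module as a stand-in for torch.load (an assumption made only so the snippet runs without PyTorch): after writing, the buffer's position sits at the end, so a reader that starts there finds nothing to read.

```python
import io
import pickle

buffer = io.BytesIO()
pickle.dump({"answer": 42}, buffer)   # writing leaves the position at the end

try:
    pickle.load(buffer)               # nothing left to read from this position
    error_name = None
except EOFError as exc:
    error_name = type(exc).__name__   # the reader hit end-of-file immediately

buffer.seek(0)                        # rewind to the first byte
loaded = pickle.load(buffer)          # now the full object can be read back
```

Note that rewinding only fixes the read position. It does not make several serialized files written back to back behave like one file.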
Adding seek changed the error. Now I get:
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted
This error kind of makes sense: I only concatenated some buffers, and I don’t think that gives torch a well-formed structure to load, does it? Each file works on its own; this isn’t one file that was split into several pieces for some reason.
Let’s say I have a.pt and b.pt. Inside a.pt I have a dataset, and inside b.pt I have another one that I need to merge with it. If I run torch.load('a.pt') it works, and the same goes for b.pt.
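Since each file loads fine on its own, one possible direction is to load every file separately and merge the resulting (data, slices) pairs instead of merging raw bytes. The sketch below only illustrates the bookkeeping, under stated assumptions: each pair is modeled as plain dicts of Python lists (stand-ins for the Data object and its tensors), every file has the same attribute keys, and slices holds cumulative offsets starting at 0. The name merge_collated and the sample values are hypothetical and not part of PyTorch Geometric.

```python
def merge_collated(parts):
    """Merge several (data, slices) pairs into a single pair.

    data:   dict mapping attribute name -> flat list of values for all graphs
    slices: dict mapping attribute name -> cumulative offsets starting at 0,
            so graph i owns data[key][slices[key][i]:slices[key][i + 1]]
    """
    merged_data, merged_slices = {}, {}
    for data, slices in parts:
        for key, values in data.items():
            out_values = merged_data.setdefault(key, [])
            out_slices = merged_slices.setdefault(key, [0])
            offset = out_slices[-1]  # where this file's values will start
            out_values.extend(values)
            # Skip the leading 0 and shift the remaining boundaries.
            out_slices.extend(offset + s for s in slices[key][1:])
    return merged_data, merged_slices


# Hypothetical contents of a.pt and b.pt after loading each one separately.
part_a = ({"x": [1, 2, 3]}, {"x": [0, 2, 3]})     # two graphs: [1, 2] and [3]
part_b = ({"x": [4, 5, 6, 7]}, {"x": [0, 1, 4]})  # two graphs: [4] and [5, 6, 7]

data, slices = merge_collated([part_a, part_b])

# Recover the individual graphs from the merged pair.
graphs = [data["x"][slices["x"][i]:slices["x"][i + 1]]
          for i in range(len(slices["x"]) - 1)]
```

With real tensors, the list concatenation would presumably become torch.cat along each attribute's concatenation dimension, and the offset shift would be applied to the slice tensors. Whether that matches the exact layout stored in your files is worth checking against what torch.load returns for a single file.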