I’ve been stuck on this for a while now, and I’m sure it’s a problem people have dealt with before.
I’m trying to use torch DataLoaders on a Google Cloud VM with a GPU attached, to run training on ImageNet. My options are either to get a persistent disk and download ImageNet onto it, or to download ImageNet into a GCS bucket.
If I go the bucket route, does anyone know the best way to interface it with a torch DataLoader? I’ve looked into WebDataset as well, and it seems cool, but I’d likely have to pass in URLs (which isn’t necessarily a problem).
Long story short: I’m not sure whether I should be mounting the bucket, whether things “just work” if I use WebDataset, and overall what the best/most economical path forward is.
Hi,
I’ve also started looking into this option. Here’s what I’ve found so far:
There is an issue with passing credentials to fsspec in FSSpecFileListerIterDataPipe. A possible workaround is to add a token argument to the fsspec.core.url_to_fs call: fs, path = fsspec.core.url_to_fs(self.root, token='path/to/json')
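To illustrate that call outside the datapipe: url_to_fs forwards extra keyword arguments to the underlying filesystem constructor, which for gs:// URLs is gcsfs. The snippet below uses fsspec’s built-in memory filesystem so it runs without cloud credentials; the GCS form is sketched in the comment (paths there are hypothetical):

```python
import fsspec

# url_to_fs splits a URL into (filesystem, path); extra kwargs are
# forwarded to the filesystem constructor. Demonstrated with the
# built-in memory filesystem so it runs anywhere:
fs, path = fsspec.core.url_to_fs("memory://bucket/imagenet")
print(fs.protocol, path)

# The GCS form of the patched call would look like (hypothetical paths):
# fs, path = fsspec.core.url_to_fs(
#     "gs://my-bucket/imagenet",
#     token="/path/to/service-account.json",  # forwarded to gcsfs
# )
```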
The output of fs.protocol is the tuple ('gcs', 'gs').
I’ve currently patched the code to take the relevant prefix for our GCS path.
I hope to update once I get the pipeline running
BTW, I’m currently investigating two strange issues with the token hotfix.
I tried to implement the datapipe/DataLoader tutorial, but when I try to iterate over the DataLoader the system does nothing. It doesn’t time out, it just keeps running:
...
for idx, batch in enumerate(dl):
    print(idx)
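When a DataLoader hangs like this, a useful first check (a general debugging step, not specific to torchdata) is to iterate with num_workers=0 so everything stays in the main process. The ToyPipe below is a hypothetical stand-in for the real datapipe so the snippet is self-contained:

```python
from torch.utils.data import DataLoader, IterableDataset

# Stand-in for the FSSpec-backed datapipe, so this snippet runs anywhere.
class ToyPipe(IterableDataset):
    def __iter__(self):
        yield from range(4)

# num_workers=0 keeps everything in the main process: if iteration now
# progresses (or raises a real error), the hang is in worker spawning or
# in state that doesn't survive pickling into workers, such as open
# filesystem handles or credentials.
dl = DataLoader(ToyPipe(), batch_size=2, num_workers=0)
for idx, batch in enumerate(dl):
    print(idx, batch)
```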
Another issue: when I run the following example twice, it works the first time, but on the second run (without restarting the kernel) fs.protocol in the open_file_by_fsspec class returns a file URI without the gs:// prefix, so it fails:
datapipe = FSSpecFileLister(root=image_bucket, masks=['*.png'])
file_dp = datapipe.open_file_by_fsspec(mode='rb')
ds = Mapper(file_dp, PIL_open)
for i in ds:
    print(f'{i=}')
    show_image(i)
For 1, can you let us know which versions of the libraries (torch and torchdata) you have installed, and the exact code snippet you are running?
By the hotfix, are you running the code from this PR, or the fix that you described above?
Again, you get a FileNotFoundError.
Once I restart the kernel, it works.
If I run the script from the command line as below, there is no problem:
python my_script.py
so I’m guessing it is something about how the Jupyter kernel uses fsspec.
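One possible explanation for the works-once-per-kernel behaviour (this is only an assumption on my part): fsspec caches filesystem instances keyed by protocol and storage options, so a second run in the same kernel can pick up a stale cached instance. Passing skip_instance_cache=True bypasses that cache. Shown below with the memory filesystem so it runs anywhere; the GCS form in the comment is hypothetical:

```python
import fsspec

# fsspec caches filesystem instances by protocol + storage options.
# skip_instance_cache=True returns a fresh object that is not cached.
fs_fresh = fsspec.filesystem("memory", skip_instance_cache=True)

# Repeated cached calls hand back the very same instance:
fs_a = fsspec.filesystem("memory")
fs_b = fsspec.filesystem("memory")
print(fs_a is fs_b)      # cached calls share one instance
print(fs_fresh is fs_a)  # the uncached one is a distinct object

# Hypothetical GCS form:
# fs = fsspec.filesystem("gs", skip_instance_cache=True,
#                        token="/path/to/service-account.json")
```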
The hotfix is my local one, where I added the token= argument to fsspec.core.url_to_fs.