[SOLVED] Can't load LSUN dataset in PyTorch

Hey!

I'm having a problem when loading the LSUN dataset:

train_dataset = datasets.LSUN(data_path,'train',transform=train_transforms)

This line of code never completes. I've let it run for 30 minutes and it still won't finish. I moved the dataset over to an NVMe disk and it still didn't load.

If I interrupt it, it seems to be stuck in the same place every time:

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-16-cec41a6e3857> in <module>()
----> 1 train_dataset = datasets.LSUN(data_path,'train')

~/.local/lib/python3.5/site-packages/torchvision/datasets/lsun.py in __init__(self, db_path, classes, transform, target_transform)
    104             self.dbs.append(LSUNClass(
    105                 db_path=db_path + '/' + c + '_lmdb',
--> 106                 transform=transform))
    107 
    108         self.indices = []

~/.local/lib/python3.5/site-packages/torchvision/datasets/lsun.py in __init__(self, db_path, transform, target_transform)
     25         else:
     26             with self.env.begin(write=False) as txn:
---> 27                 self.keys = [key for key, _ in txn.cursor()]
     28             pickle.dump(self.keys, open(cache_file, "wb"))
     29         self.transform = transform

~/.local/lib/python3.5/site-packages/torchvision/datasets/lsun.py in <listcomp>(.0)
     25         else:
     26             with self.env.begin(write=False) as txn:
---> 27                 self.keys = [key for key, _ in txn.cursor()]
     28             pickle.dump(self.keys, open(cache_file, "wb"))
     29         self.transform = transform

KeyboardInterrupt: 

Loading the validation or test dataset works fine. What is going on here? I have the ms_celeb_1m dataset, which is larger, but it still loads in only a couple of minutes.

I have no cache files in my directory. I removed the ones that appeared after I loaded the val/test datasets.

Thanks in advance!

It just seems to be incredibly slow. I put some print statements in the LSUN dataset code and it is making progress. It's just going to take a long time. I tried increasing the max number of workers and allowing readahead, so hopefully I don't have to wait too long.
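
For anyone who wants to watch the progress the same way, here is a rough sketch of the key scan with a progress print. It is not the torchvision code: scan_keys and report_every are names I made up, and it assumes the transaction comes from py-lmdb, whose cursor has an iternext(keys=True, values=False) method that yields keys without returning the values.

```python
import sys


def scan_keys(txn, report_every=100000):
    """Collect every key from an open read-only transaction.

    txn is assumed to be a py-lmdb transaction (hypothetical helper,
    not part of torchvision). Progress goes to stderr.
    """
    keys = []
    cursor = txn.cursor()
    # Ask the cursor for keys only, so the image bytes are not returned.
    for key in cursor.iternext(keys=True, values=False):
        keys.append(key)
        if len(keys) % report_every == 0:
            print("scanned %d keys" % len(keys), file=sys.stderr)
    return keys
```

Usage would be something like opening the environment with lmdb.open(db_path, readonly=True, lock=False, readahead=True), then calling scan_keys(txn) inside "with env.begin(write=False) as txn:". Whether readahead or the keys-only cursor actually speeds things up will depend on the disk and the database, so treat this as something to try rather than a guaranteed fix.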

I guess I just have to open it the slow way once. After that, the cache files make loading the dataset fast.
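
The pattern in the traceback (scan the keys, then pickle.dump them to a cache file) boils down to roughly the following. This is a minimal sketch under my own assumptions, not the library's implementation: load_keys, cache_file and scan are hypothetical names, and scan stands in for whatever slow function walks the database.

```python
import os
import pickle


def load_keys(cache_file, scan):
    """Return the key list, running the slow scan only when no cache exists.

    cache_file: path of the pickle file holding the keys (hypothetical name).
    scan: zero-argument callable that does the slow full pass and returns keys.
    """
    if os.path.isfile(cache_file):
        # Fast path: a previous run already wrote the key list.
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    # Slow path: do the full scan once, then save the result for next time.
    keys = scan()
    with open(cache_file, "wb") as f:
        pickle.dump(keys, f)
    return keys
```

This also explains why deleting the cache files brings the slow load back: with no cache file on disk, the full scan has to run again.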
