ImageFolder() hangs and waits forever

I tried to use Amos's DenseNet and Soumith's ImageNet examples and replaced the CIFAR10 dataset with an ImageFolder dataset:

Original:

trainLoader = DataLoader(
    dset.CIFAR10(root='cifar', train=True, download=True,
                 transform=trainTransform),
    batch_size=args.batchSz, shuffle=True, **kwargs)
testLoader = DataLoader(
    dset.CIFAR10(root='cifar', train=False, download=True,
                 transform=testTransform),
    batch_size=args.batchSz, shuffle=False, **kwargs)

Modified:

trainLoader = DataLoader(
    dset.ImageFolder(root='/home/FC/data/P/train',  transform=trainTransform),
    batch_size=args.batchSz, shuffle=True, **kwargs)
testLoader = DataLoader(
    dset.ImageFolder(root='/home/FC/data/P/val',  transform=testTransform),
    batch_size=args.batchSz, shuffle=False, **kwargs)

But the loading process hangs forever. A keyboard interrupt shows:

^CTraceback (most recent call last):
  File "train.py", line 291, in <module>
    main()
  File "train.py", line 132, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "train.py", line 157, in train
    for i, (input, target) in enumerate(train_loader):
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 168, in __next__
    idx, batch = self.data_queue.get()
  File "/conda3/envs/idp/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/conda3/envs/idp/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
KeyboardInterrupt

What should I do to resolve this?
The folder structure is as follows, with 2048x2048 PNGs:

/home/FC/Data/P
-> train -> classes -> images.png
-> val -> classes -> images.png
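
For reference, ImageFolder expects exactly this class-per-sub-directory layout, so a minimal interactive check against it (paths taken from the structure above, with a placeholder transform) would be something like:

import torchvision.datasets as dset
import torchvision.transforms as transforms

# Check that ImageFolder discovers the class sub-directories and their PNGs.
ds = dset.ImageFolder(root='/home/FC/data/P/train',
                      transform=transforms.ToTensor())
print(len(ds))       # total number of images found
print(ds.classes)    # class names taken from the sub-directory names
img, label = ds[0]   # loading one sample also exercises the PNG decoding
print(img.size(), label)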

Could you post your train.py file?

Sure thing, Kaiser. Here is my full train.py in a gist:

I have tweaked only a couple of lines of Brandon Amos's original DenseNet implementation, replacing the CIFAR10 loader with the ImageFolder loader:

Thanks!

Some things I would try:

  1. Have you tried to access your dataset instance through the interactive Python console? If that works, does creating a DataLoader interactively work too? (See the sketch after this list.)
  2. https://github.com/bamos/densenet.pytorch/blob/master/train.py#L101 (background commands are always a bit “difficult”)
  3. https://github.com/bamos/densenet.pytorch/blob/master/train.py#L67 (do you need to use pin_memory? I do not know about this one)
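
A rough sketch of points 1 and 3 combined (the root path is the one from your post, the batch size is arbitrary, and num_workers=0 keeps everything in a single process so that worker/IPC problems are ruled out):

import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

ds = dset.ImageFolder(root='/home/FC/data/P/train',
                      transform=transforms.ToTensor())

# Single-process loading first; then toggle pin_memory (and num_workers)
# to see whether either of them is what makes the real script hang.
loader = DataLoader(ds, batch_size=4, shuffle=True,
                    num_workers=0, pin_memory=True)
for i, (inp, target) in enumerate(loader):
    print(i, inp.size(), target.size())
    if i == 2:   # a few batches are enough to see whether iteration hangs
        break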

Thanks, Kaiser. I am starting to debug now and will look into all three points you mentioned.

Also, are you running your script in Docker?

Hi Adam,
Yes, I am running the scripts in nvidia-docker with --ipc=host.

I got some strange behavior, so I opened an issue about it here: https://github.com/pytorch/pytorch/issues/1120

Looks like the models are hard-coded for 224x224 input dimensions. I am looking into it now.
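
If that turns out to be the problem, one workaround (a sketch only; the exact transform names depend on the torchvision version, older releases use transforms.Scale where newer ones use transforms.Resize) is to shrink the 2048x2048 PNGs down to 224x224 in the transform pipeline:

import torchvision.transforms as transforms

# Resize the large PNGs to the input size the model expects.
trainTransform = transforms.Compose([
    transforms.Scale(256),        # transforms.Resize(256) on newer torchvision
    transforms.CenterCrop(224),   # crop to the 224x224 the model was built for
    transforms.ToTensor(),
])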