When multiple ImageNet training tasks are running on the same machine, the initialization of torchvision.datasets.ImageFolder becomes extremely slow.
The following code takes about 2 minutes when a single training task is running:
import os

import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms


def ilsvrc2012(path, bs=256):
    traindir = os.path.join(path, 'train')
    valdir = os.path.join(path, 'val')
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    # ImageFolder walks the whole train/ directory tree at construction time
    train_dataset = datasets.ImageFolder(
        traindir,
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]))
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=bs, shuffle=True,
        num_workers=8, pin_memory=True)
    val_loader = torch.utils.data.DataLoader(
        datasets.ImageFolder(valdir, transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ])),
        batch_size=bs, shuffle=False,
        num_workers=8, pin_memory=True)
    return train_loader, val_loader
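To isolate where the time goes, I assume (based on the torchvision source) that essentially all of the startup cost is ImageFolder's directory walk: one listing of the class folders, then one listing per class, over all ~1.28M training files. A stdlib-only sketch of that scan, which can be timed on its own without constructing any dataset (scan_imagefolder is my own helper name, and the self-demo below runs on a tiny synthetic tree rather than real ImageNet data):

```python
import os
import tempfile
import time

def scan_imagefolder(root):
    """Replicate the walk ImageFolder performs at init: list the class
    folders, then list every file inside each class folder."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    samples = []
    for idx, cls in enumerate(classes):
        cls_dir = os.path.join(root, cls)
        for entry in sorted(os.scandir(cls_dir), key=lambda e: e.name):
            if entry.is_file():
                samples.append((entry.path, idx))
    return samples

# self-demo on a tiny synthetic tree; a real timing run would point this
# at the actual train/ directory instead
demo = tempfile.mkdtemp()
for cls in ('n01440764', 'n01443537'):
    os.makedirs(os.path.join(demo, cls))
    for i in range(3):
        open(os.path.join(demo, cls, f'img_{i}.JPEG'), 'w').close()
t0 = time.time()
samples = scan_imagefolder(demo)
print(f'scanned {len(samples)} files in {time.time() - t0:.3f}s')
```

If this scan alone already takes minutes while another task is reading from the same disk, the slowdown is pure metadata I/O contention rather than anything PyTorch-specific.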
But when I start another task that reads the same image data, the same code takes an unbelievable half an hour!
The images are stored on an SSD connected to the motherboard over SATA.
System Info:
uname -a
Linux Monster 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Torch Info:
python -c "import torch; print(torch.__version__)"
0.4.1
I just wonder what causes this huge performance gap.
More information can be provided upon request.
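One workaround I am considering (a sketch under the assumption that the repeated directory scan is the bottleneck, not something I have verified cures the contention): do the scan once, cache the (path, class_index) list to disk, and skip the walk on every later run. load_or_scan and the cache-file name are hypothetical helpers of my own:

```python
import os
import pickle

def load_or_scan(root, cache_file):
    """Return the (path, class_index) sample list, scanning the
    directory tree only when no cache file exists yet."""
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    samples = [(os.path.join(root, cls, name), idx)
               for idx, cls in enumerate(classes)
               for name in sorted(os.listdir(os.path.join(root, cls)))]
    with open(cache_file, 'wb') as f:
        pickle.dump(samples, f)
    return samples
```

If I read the torchvision internals correctly, the cached list could then be assigned to a constructed dataset's .samples (and .imgs) to bypass the walk, though that relies on implementation details rather than public API.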