Custom data loading process becomes too slow

I modified the imagenet example to train on my own dataset, and it became quite a bit slower than before. I'm not sure what the main reason is.

First, my dataset has a list of labeled [images, labels] pairs and another list of unlabeled images, so I modified __getitem__ in the ImageFolder class as follows:

def __getitem__(self, index):
    """
    Args:
        index (int): Index
    Returns:
        tuple: (image, target, imgu) where target is the class index of
        the target class and imgu is an unlabeled image.
    """
    # offset into the unlabeled list: it is walked in chunks of nimgs,
    # and midx (set once per epoch) selects the current chunk
    pindex = index + self.midx * self.nimgs
    path, target = self.imgs[index]
    pathu, _ = self.imgus[pindex]
    img = self.loader(path)
    imgu = self.loader(pathu)
    if self.transform is not None:
        img = self.transform(img)
        imgu = self.transform(imgu)
    if self.target_transform is not None:
        target = self.target_transform(target)

    return img, target, imgu

Here, self.imgus is the list of unlabeled images that I added.
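For reference, the extra attributes are set up roughly like this (a minimal sketch; the class name SemiSupervisedFolder, the unlabeled_root argument, and the glob pattern are illustrative, not my exact code):

import glob
import os
import torchvision.datasets as datasets

class SemiSupervisedFolder(datasets.ImageFolder):
    def __init__(self, root, unlabeled_root, transform=None):
        super(SemiSupervisedFolder, self).__init__(root, transform=transform)
        # flat list of (path, dummy_target) pairs for the unlabeled images
        self.imgus = [(p, 0) for p in
                      sorted(glob.glob(os.path.join(unlabeled_root, '*.JPEG')))]
        self.nimgs = len(self.imgs)   # size of the labeled set
        # the unlabeled list is consumed in chunks of nimgs, one chunk per epoch
        self.max_midx = len(self.imgus) // self.nimgs
        self.midx = 0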

Then I changed the training code as follows:

def train(train_loader, model, criterion, optimizer, epoch):

    batch_time = AverageMeter()
    data_time = AverageMeter()
    var_time = AverageMeter()
    model_time = AverageMeter()
    ...
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to train mode
    model.train()

    # set midx: pick which chunk of the unlabeled list to use this epoch
    train_loader.dataset.midx = epoch % train_loader.dataset.max_midx
    print(epoch, train_loader.dataset.midx)
    end = time.time()
    for i, (input, target, inputu) in enumerate(train_loader):

        # measure data loading time
        dtime = time.time()
        data_time.update(dtime - end)

        target = target.cuda(async=True)  # non_blocking=True in PyTorch >= 0.4
        input_var = torch.autograd.Variable(input)
        target_var = torch.autograd.Variable(target)
        inputu_var = torch.autograd.Variable(inputu)
        input_concat_var = torch.cat([input_var, inputu_var])
        # measure Variable creation + concat time
        vtime = time.time()
        var_time.update(vtime - dtime)
        # compute output and measure forward time
        output = model(input_concat_var)
        mtime = time.time()
        model_time.update(mtime - vtime)
        ...
        ...

Now, in the for loop, I get a batch of input, target, and inputu (unlabeled images),
wrap each of them in a Variable,
and concatenate the labeled and unlabeled images before feeding them into the model.
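As a sanity check (illustrative only, assuming batch size 192 per half and 224x224 crops), the concatenation stacks the two halves along the batch dimension:

# input:  (192, 3, 224, 224), inputu: (192, 3, 224, 224)
# after torch.cat -> (384, 3, 224, 224)
assert input_concat_var.size(0) == input_var.size(0) + inputu_var.size(0)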

To check where the code gets slower, I added var_time and model_time as shown in the code.
Below is one part of the log from the terminal (each column shows the current value followed by the running average in parentheses):

Epoch: [2][0/4180]    Time 11.312 (11.312)   Data 8.702 (8.702)  Var 1.706 (1.706)    Model 0.481 (0.481)
Epoch: [2][100/4180]  Time 47.901 (80.702)   Data 0.001 (0.087)  Var 46.429 (79.423)  Model 1.021 (0.765)
Epoch: [2][200/4180]  Time 11.375 (69.206)   Data 0.001 (0.044)  Var 10.028 (67.958)  Model 0.93 (0.779)
Epoch: [2][300/4180]  Time 9.444 (64.922)    Data 0.001 (0.03)   Var 8.087 (63.683)   Model 0.934 (0.783)
Epoch: [2][400/4180]  Time 10.702 (62.866)   Data 0.001 (0.023)  Var 9.8 (61.639)     Model 0.488 (0.777)
Epoch: [2][500/4180]  Time 93.547 (63.055)   Data 0.001 (0.019)  Var 92.354 (61.813)  Model 0.78 (0.796)
Epoch: [2][600/4180]  Time 104.527 (60.569)  Data 0.001 (0.016)  Var 103.357 (59.318) Model 0.761 (0.808)
Epoch: [2][700/4180]  Time 1.772 (57.497)    Data 0.001 (0.014)  Var 0.726 (56.248)   Model 0.639 (0.809)
Epoch: [2][800/4180]  Time 1.706 (50.549)    Data 0.001 (0.012)  Var 0.865 (49.337)   Model 0.39 (0.776)
Epoch: [2][900/4180]  Time 1.741 (45.143)    Data 0.001 (0.011)  Var 0.945 (43.96)    Model 0.392 (0.75)
Epoch: [2][1000/4180] Time 1.879 (40.818)    Data 0.001 (0.01)   Var 0.918 (39.658)   Model 0.564 (0.729)
Epoch: [2][1100/4180] Time 1.879 (37.277)    Data 0.002 (0.009)  Var 0.881 (36.136)   Model 0.588 (0.712)

You can see that batch_time fluctuates a lot, and the increase seems to come mainly from var_time, which ranges from about 1 s up to over 100 s. I understand that the concat operation adds some time (around 1 s), but it's weird that it goes up into the hundreds.
I don't know what makes it so slow. When I watch htop or nvidia-smi during that period, both the CPUs and GPUs are barely used (almost idle).
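One caveat when reading these numbers: CUDA calls run asynchronously, so a plain time.time() around a statement can attribute a wait to the wrong line; the cost surfaces at whatever point next forces synchronization. A minimal sketch of timing a section with explicit synchronization (the timed helper and its names are illustrative):

import time
import torch

def timed(fn, meter):
    # force pending kernels to finish so the interval measures only fn
    torch.cuda.synchronize()
    start = time.time()
    out = fn()
    torch.cuda.synchronize()
    meter.update(time.time() - start)
    return out

# usage inside the training loop, e.g.:
# output = timed(lambda: model(input_concat_var), model_time)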

Is there any problem in my modified code, or could it be a hardware problem?
I'm running on 8 GPUs with 16 data-loading workers; the batch size is 384 (192 each for labeled and unlabeled images).
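For reference, the loader is configured roughly like in the imagenet example (a minimal sketch; train_dataset stands for the modified ImageFolder described above):

import torch.utils.data

train_loader = torch.utils.data.DataLoader(
    train_dataset,       # returns (img, target, imgu) per index
    batch_size=192,      # each item carries one labeled + one unlabeled
                         # image, so a batch holds 384 images in total
    shuffle=True,
    num_workers=16,
    pin_memory=True)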

I am experiencing a similar issue. Any comments on this thread? Thanks