Custom data loading process becomes too slow

I modified the imagenet example to train on my own dataset, and it became quite a bit slower than before. I'm not sure what the main reason is.

First, my dataset has a list of labeled [images, labels] pairs and another list of unlabeled images, so I modified __getitem__ in the ImageFolder class as follows:

def __getitem__(self, index):
    """
    Args:
        index (int): Index
    Returns:
        tuple: (image, target, imgu) where target is the class index of
        the target class and imgu is an unlabeled image.
    """
    # offset into the unlabeled list: it is walked in chunks of nimgs,
    # and midx (set once per epoch) selects the current chunk
    pindex = index + self.midx * self.nimgs
    path, target = self.imgs[index]
    pathu, _ = self.imgus[pindex]
    img = self.loader(path)
    imgu = self.loader(pathu)
    if self.transform is not None:
        img = self.transform(img)
        imgu = self.transform(imgu)
    if self.target_transform is not None:
        target = self.target_transform(target)

    return img, target, imgu

Here, self.imgus is the list of unlabeled images that I added.
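For reference, the extra attributes are set up roughly like this (a minimal sketch; the class name SemiSupervisedFolder, the unlabeled_root argument, and the glob pattern are illustrative, not my exact code):

import glob
import os
import torchvision.datasets as datasets

class SemiSupervisedFolder(datasets.ImageFolder):
    def __init__(self, root, unlabeled_root, transform=None):
        super(SemiSupervisedFolder, self).__init__(root, transform=transform)
        # flat list of (path, dummy_target) pairs for the unlabeled images
        self.imgus = [(p, 0) for p in
                      sorted(glob.glob(os.path.join(unlabeled_root, '*.JPEG')))]
        self.nimgs = len(self.imgs)   # size of the labeled set
        # the unlabeled list is consumed in chunks of nimgs, one chunk per epoch
        self.max_midx = len(self.imgus) // self.nimgs
        self.midx = 0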

Then I changed the training code as follows:

def train(train_loader, model, criterion, optimizer, epoch):

    batch_time = AverageMeter()
    data_time = AverageMeter()
    var_time = AverageMeter()
    model_time = AverageMeter()
    ...
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to train mode
    model.train()

    # set midx: pick which chunk of the unlabeled list to use this epoch
    train_loader.dataset.midx = epoch % train_loader.dataset.max_midx
    print(epoch, train_loader.dataset.midx)
    end = time.time()
    for i, (input, target, inputu) in enumerate(train_loader):

        # measure data loading time
        dtime = time.time()
        data_time.update(dtime - end)

        target = target.cuda(async=True)  # non_blocking=True in PyTorch >= 0.4
        input_var = torch.autograd.Variable(input)
        target_var = torch.autograd.Variable(target)
        inputu_var = torch.autograd.Variable(inputu)
        input_concat_var = torch.cat([input_var, inputu_var])
        # measure Variable creation + concat time
        vtime = time.time()
        var_time.update(vtime - dtime)
        # compute output and measure forward time
        output = model(input_concat_var)
        mtime = time.time()
        model_time.update(mtime - vtime)
        ...
        ...

Now, in the for loop, I get a batch of input, target, and inputu (unlabeled images),
wrap each of them in a Variable,
and concatenate the labeled and unlabeled images before feeding them into the model.
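As a sanity check (illustrative only, assuming batch size 192 per half and 224x224 crops), the concatenation stacks the two halves along the batch dimension:

# input:  (192, 3, 224, 224), inputu: (192, 3, 224, 224)
# after torch.cat -> (384, 3, 224, 224)
assert input_concat_var.size(0) == input_var.size(0) + inputu_var.size(0)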

To check where the code gets slower, I added var_time and model_time as shown in the code.
Below is one part of the log from the terminal (each column shows the current value followed by the running average in parentheses):

Epoch: [2][0/4180]    Time 11.312 (11.312)   Data 8.702 (8.702)  Var 1.706 (1.706)    Model 0.481 (0.481)
Epoch: [2][100/4180]  Time 47.901 (80.702)   Data 0.001 (0.087)  Var 46.429 (79.423)  Model 1.021 (0.765)
Epoch: [2][200/4180]  Time 11.375 (69.206)   Data 0.001 (0.044)  Var 10.028 (67.958)  Model 0.93 (0.779)
Epoch: [2][300/4180]  Time 9.444 (64.922)    Data 0.001 (0.03)   Var 8.087 (63.683)   Model 0.934 (0.783)
Epoch: [2][400/4180]  Time 10.702 (62.866)   Data 0.001 (0.023)  Var 9.8 (61.639)     Model 0.488 (0.777)
Epoch: [2][500/4180]  Time 93.547 (63.055)   Data 0.001 (0.019)  Var 92.354 (61.813)  Model 0.78 (0.796)
Epoch: [2][600/4180]  Time 104.527 (60.569)  Data 0.001 (0.016)  Var 103.357 (59.318) Model 0.761 (0.808)
Epoch: [2][700/4180]  Time 1.772 (57.497)    Data 0.001 (0.014)  Var 0.726 (56.248)   Model 0.639 (0.809)
Epoch: [2][800/4180]  Time 1.706 (50.549)    Data 0.001 (0.012)  Var 0.865 (49.337)   Model 0.39 (0.776)
Epoch: [2][900/4180]  Time 1.741 (45.143)    Data 0.001 (0.011)  Var 0.945 (43.96)    Model 0.392 (0.75)
Epoch: [2][1000/4180] Time 1.879 (40.818)    Data 0.001 (0.01)   Var 0.918 (39.658)   Model 0.564 (0.729)
Epoch: [2][1100/4180] Time 1.879 (37.277)    Data 0.002 (0.009)  Var 0.881 (36.136)   Model 0.588 (0.712)

You can see that batch_time fluctuates a lot, and the increase seems to come mainly from var_time, which ranges from about 1 s up to over 100 s. I understand that the concat operation adds some time (around 1 s), but it's weird that it goes up into the hundreds.
I don't know what makes it so slow. When I watch htop or nvidia-smi during that period, both the CPUs and GPUs are barely used (almost idle).
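One caveat when reading these numbers: CUDA calls run asynchronously, so a plain time.time() around a statement can attribute a wait to the wrong line; the cost surfaces at whatever point next forces synchronization. A minimal sketch of timing a section with explicit synchronization (the timed helper and its names are illustrative):

import time
import torch

def timed(fn, meter):
    # force pending kernels to finish so the interval measures only fn
    torch.cuda.synchronize()
    start = time.time()
    out = fn()
    torch.cuda.synchronize()
    meter.update(time.time() - start)
    return out

# usage inside the training loop, e.g.:
# output = timed(lambda: model(input_concat_var), model_time)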

Is there any problem in my modified code, or could it be a hardware problem?
I'm running on 8 GPUs with 16 data-loading workers; the batch size is 384 (192 each for labeled and unlabeled images).
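For reference, the loader is configured roughly like in the imagenet example (a minimal sketch; train_dataset stands for the modified ImageFolder described above):

import torch.utils.data

train_loader = torch.utils.data.DataLoader(
    train_dataset,       # returns (img, target, imgu) per index
    batch_size=192,      # each item carries one labeled + one unlabeled
                         # image, so a batch holds 384 images in total
    shuffle=True,
    num_workers=16,
    pin_memory=True)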

I am experiencing a similar issue. Any comments on this thread? Thanks