I modified the imagenet example to train on my own dataset, and it became quite a bit slower than before. I'm not sure what the main reason is.
First, my dataset has a list of labeled [image, label] pairs and another list of unlabeled images, so I modified `__getitem__` in the `ImageFolder` class as follows:
```python
def __getitem__(self, index):
    """
    Args:
        index (int): Index
    Returns:
        tuple: (image, target, imgu) where target is class_index of the
            target class and imgu is an unlabeled image.
    """
    # pick the unlabeled image from the slice selected by midx
    pindex = index + self.midx * self.nimgs
    path, target = self.imgs[index]
    pathu, _ = self.imgus[pindex]
    img = self.loader(path)
    imgu = self.loader(pathu)
    if self.transform is not None:
        img = self.transform(img)
        imgu = self.transform(imgu)
    if self.target_transform is not None:
        target = self.target_transform(target)
    return img, target, imgu
```
`self.imgus` is the added list of unlabeled images.
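For context, the extra attributes referenced above (`imgus`, `nimgs`, `midx`, `max_midx`) fit together roughly like this; the class name and the construction of the unlabeled list here are just for illustration, the real code is more involved:

```python
import torchvision.datasets as datasets

class SemiSupervisedImageFolder(datasets.ImageFolder):  # name is illustrative
    def __init__(self, root, unlabeled_root, transform=None, target_transform=None):
        super(SemiSupervisedImageFolder, self).__init__(root, transform, target_transform)
        # unlabeled images come from a second ImageFolder; only the paths are used
        self.imgus = datasets.ImageFolder(unlabeled_root).imgs
        self.nimgs = len(self.imgs)                    # size of one unlabeled "slice"
        self.max_midx = len(self.imgus) // self.nimgs  # number of usable slices
        self.midx = 0                                  # slice used in the current epoch
```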
Then I changed the training code as follows:
```python
def train(train_loader, model, criterion, optimizer, epoch):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    var_time = AverageMeter()
    model_time = AverageMeter()
    ...
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to train mode
    model.train()

    # set midx
    train_loader.dataset.midx = epoch % train_loader.dataset.max_midx
    print(epoch, train_loader.dataset.midx)

    end = time.time()
    for i, (input, target, inputu) in enumerate(train_loader):
        # measure data loading time
        dtime = time.time()
        data_time.update(dtime - end)

        target = target.cuda(async=True)
        input_var = torch.autograd.Variable(input)
        target_var = torch.autograd.Variable(target)
        inputu_var = torch.autograd.Variable(inputu)
        input_concat_var = torch.cat([input_var, inputu_var])
        vtime = time.time()
        var_time.update(vtime - dtime)

        # compute output
        output = model(input_concat_var)
        mtime = time.time()
        model_time.update(mtime - vtime)
        ...
```
Now, in the for loop, I get a batch of `input`, `target`, and `inputu` (unlabeled images), wrap each of them in a `Variable`, and concatenate the labeled and unlabeled images before feeding them into the model.
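With 192 images in each part, the concatenated tensor the model sees is an ordinary 384-sample batch; a quick shape check (illustrative sizes, old `Variable` API as in the code above):

```python
import torch
from torch.autograd import Variable

# dummy batches standing in for one loader iteration (sizes assumed)
input = torch.randn(192, 3, 224, 224)   # labeled images
inputu = torch.randn(192, 3, 224, 224)  # unlabeled images

input_concat_var = torch.cat([Variable(input), Variable(inputu)])  # cat along dim 0
print(input_concat_var.size())  # torch.Size([384, 3, 224, 224])
```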
In order to check where the code gets slower, I added `var_time` and `model_time` as shown in the code above.
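One caveat I'm aware of: CUDA calls are asynchronous, so `time.time()` alone can charge GPU work to whichever later line happens to force a synchronization. If exact per-section timings matter, each section would need to be bracketed with `torch.cuda.synchronize()`, roughly like this:

```python
import time
import torch

torch.cuda.synchronize()  # drain any pending GPU work first
t0 = time.time()

output = model(input_concat_var)  # the section being timed

torch.cuda.synchronize()  # wait until the forward pass has actually finished
model_time.update(time.time() - t0)
```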
Below is one part of the log from the terminal; each column shows the latest value followed by the running average in parentheses:
```
Epoch: [2][0/4180]      Time 11.312 (11.312)    Data 8.702 (8.702)   Var 1.706 (1.706)      Model 0.481 (0.481)
Epoch: [2][100/4180]    Time 47.901 (80.702)    Data 0.001 (0.087)   Var 46.429 (79.423)    Model 1.021 (0.765)
Epoch: [2][200/4180]    Time 11.375 (69.206)    Data 0.001 (0.044)   Var 10.028 (67.958)    Model 0.930 (0.779)
Epoch: [2][300/4180]    Time 9.444 (64.922)     Data 0.001 (0.030)   Var 8.087 (63.683)     Model 0.934 (0.783)
Epoch: [2][400/4180]    Time 10.702 (62.866)    Data 0.001 (0.023)   Var 9.800 (61.639)     Model 0.488 (0.777)
Epoch: [2][500/4180]    Time 93.547 (63.055)    Data 0.001 (0.019)   Var 92.354 (61.813)    Model 0.780 (0.796)
Epoch: [2][600/4180]    Time 104.527 (60.569)   Data 0.001 (0.016)   Var 103.357 (59.318)   Model 0.761 (0.808)
Epoch: [2][700/4180]    Time 1.772 (57.497)     Data 0.001 (0.014)   Var 0.726 (56.248)     Model 0.639 (0.809)
Epoch: [2][800/4180]    Time 1.706 (50.549)     Data 0.001 (0.012)   Var 0.865 (49.337)     Model 0.390 (0.776)
Epoch: [2][900/4180]    Time 1.741 (45.143)     Data 0.001 (0.011)   Var 0.945 (43.960)     Model 0.392 (0.750)
Epoch: [2][1000/4180]   Time 1.879 (40.818)     Data 0.001 (0.010)   Var 0.918 (39.658)     Model 0.564 (0.729)
Epoch: [2][1100/4180]   Time 1.879 (37.277)     Data 0.002 (0.009)   Var 0.881 (36.136)     Model 0.588 (0.712)
```
You can see that `batch_time` fluctuates a lot, and the major part of the increase seems to come from `var_time`, which ranges from about 1 second up to over 100 seconds. I understand that the `concat` operation adds some time (around 1 second), but it is weird that it goes up to a hundred seconds.
I don't know what makes it so slow. When I watch `htop` and `nvidia-smi` during those periods, both the CPUs and the GPUs are barely used (almost idle).
Is there any problem in my modified code, or could it be a hardware problem?
I'm running on 8 GPUs with 16 workers, and the batch size is 384 (192 each for labeled and unlabeled images).
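For completeness, the loader and model are set up along the lines of the imagenet example; roughly (a sketch, variable names assumed):

```python
import torch

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=384, shuffle=True,
    num_workers=16, pin_memory=True)

model = torch.nn.DataParallel(model).cuda()  # replicate the model across the 8 GPUs
```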