I am having trouble understanding how batch size affects speed when the input requires grad.
Here is my code:
import time

import torch


def test_generate_time(model, epoch_num, bs):
    model.eval()
    # Freeze the weights. The attribute is `requires_grad`; assigning to
    # `require_grad` only creates an unused attribute and freezes nothing.
    for p in model.parameters():
        p.requires_grad = False
    model = model.cuda()
    imgs = torch.zeros((bs, 3, 224, 224)).float().cuda()
    imgs.requires_grad = True  # want to update inputs
    labels = torch.ones(bs).long().cuda()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam([imgs], lr=1e-6)
    for epoch in range(epoch_num):
        optimizer.zero_grad()  # clear the input gradient from the previous step
        epoch_begin = time.time()
        forward_start = time.time()
        out = model(imgs)
        loss = criterion(out, labels)
        forward_time = time.time() - forward_start
        backward_start = time.time()
        loss.backward()
        optimizer.step()
        backward_time = time.time() - backward_start
        epoch_time = time.time() - epoch_begin
        print('Bs {} forward time {:.3f} backward time {:.3f} epoch_time {:.3f}'.format(bs, forward_time, backward_time, epoch_time))
And here are the results for batch size 1 and batch size 32:
Bs 1 forward time 0.008 backward time 0.017 epoch_time 0.025
Bs 32 forward time 0.021 backward time 0.296 epoch_time 0.317
Batch size seems to have a big influence on speed here, which contradicts my understanding: when training a network's weights (rather than its inputs), batch size has little impact on the time per iteration. I also timed an ordinary image classification training step for comparison:
Bs 1 forward time 0.0018 backward time 0.006
Bs 32 forward time 0.0021 backward time 0.006
I really don’t know what’s happening here. Does anyone know why the two cases differ, or how to speed up training the inputs with a large batch size?
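One thing that may be worth checking is the timing itself. CUDA kernels are launched asynchronously, so `time.time()` taken right after `model(imgs)` can return before the GPU has finished the forward pass, and the remaining work is then counted in whichever timer next waits for the GPU. Below is a minimal sketch of a synchronized measurement. The helper name `timed_step` is hypothetical, and it assumes the model, inputs, and labels already live on the same device.

```python
import time

import torch


def timed_step(model, imgs, labels, criterion, optimizer):
    """Run one input-update step and return (forward_seconds, backward_seconds).

    Hypothetical helper: it synchronizes with the GPU before reading each
    timestamp so that queued kernels are charged to the correct phase.
    """
    use_cuda = imgs.is_cuda

    def sync():
        # Block until all queued CUDA kernels have finished (skipped on CPU).
        if use_cuda:
            torch.cuda.synchronize()

    optimizer.zero_grad()

    sync()
    start = time.perf_counter()
    loss = criterion(model(imgs), labels)
    sync()
    forward_time = time.perf_counter() - start

    start = time.perf_counter()
    loss.backward()
    optimizer.step()
    sync()
    backward_time = time.perf_counter() - start

    return forward_time, backward_time
```

Calling `timed_step(model, imgs, labels, criterion, optimizer)` inside the loop, after a few warm-up iterations, should give forward and backward times that can be compared more fairly across batch sizes. Whether this accounts for the gap above is not something the sketch can confirm by itself.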