# Speed of different batch sizes

I'm having trouble figuring out the influence of batch size when the input requires grad.

Here is my code:

```python
import time

import torch


def test_generate_time(model, epoch_num, bs):
    model.eval()
    for p in model.parameters():
        p.requires_grad = False  # fixed typo: `require_grad` silently does nothing
    model = model.cuda()

    imgs = torch.zeros((bs, 3, 224, 224)).float().cuda()
    imgs.requires_grad = True  # want to update the inputs, not the weights
    labels = torch.ones(bs).long().cuda()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam([imgs], lr=1e-6)

    for epoch in range(epoch_num):
        epoch_begin = time.time()

        forward_start = time.time()
        out = model(imgs)
        loss = criterion(out, labels)
        forward_time = time.time() - forward_start

        backward_start = time.time()
        optimizer.zero_grad()  # clear input gradients accumulated in the previous epoch
        loss.backward()
        optimizer.step()
        backward_time = time.time() - backward_start

        epoch_time = time.time() - epoch_begin
        print('Bs {} forward time {:.3f} backward time {:.3f} epoch_time {:.3f}'.format(
            bs, forward_time, backward_time, epoch_time))
```
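For reference, a hypothetical way to invoke it (the ResNet here is just an illustrative stand-in for whatever classifier you use; any model taking 224x224 inputs works):

```python
import torchvision

# resnet18 is only an example; epoch_num and bs are arbitrary test values.
model = torchvision.models.resnet18()
test_generate_time(model, epoch_num=10, bs=32)
```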

And here is the result for batch size 1 and batch size 32:

```
Bs 1  forward time 0.008 backward time 0.017 epoch_time 0.025
Bs 32 forward time 0.021 backward time 0.296 epoch_time 0.317
```

It seems batch size has a big influence on speed, which contradicts my understanding: when training a network's weights (rather than its inputs), batch size usually has little impact on per-iteration time. I also timed an ordinary image-classification run:

```
Bs 1  forward time 0.0018 backward time 0.006
Bs 32 forward time 0.0021 backward time 0.006
```

I really don't know what's happening here. Does anyone know why they differ, or how to improve the speed of training inputs with a large batch size?

It looks like you're running all your code on CUDA but you don't do any synchronization when timing.
The CUDA API is asynchronous, so you need to manually add `torch.cuda.synchronize()` before calling `time.time()` if you want to measure the actual runtime.
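For example, a minimal sketch of the timing loop with synchronization added (variable names mirror the function above):

```python
import time

import torch

# Wait for any previously queued kernels before starting the clock.
torch.cuda.synchronize()
forward_start = time.time()
out = model(imgs)
loss = criterion(out, labels)
torch.cuda.synchronize()  # forward kernels may still be running otherwise
forward_time = time.time() - forward_start

backward_start = time.time()
loss.backward()
optimizer.step()
torch.cuda.synchronize()  # likewise for the backward/step kernels
backward_time = time.time() - backward_start
```

Without these calls, `time.time()` mostly measures how long it takes to *enqueue* the kernels, and the cost of the actual GPU work gets attributed to whichever later call happens to block. Alternatively, `torch.cuda.Event(enable_timing=True)` can record timestamps on the GPU stream itself.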


Thanks a lot! That's very helpful.