import datetime
import torch
import torch.nn as nn

m = nn.Conv1d(16, 512, 3, stride=2).cuda()
input = torch.randn(20, 16, 24000).cuda()
for _ in range(40):
    torch.cuda.synchronize()
    t1 = datetime.datetime.now()
    output = m(input)
    torch.cuda.synchronize()  # wait for the kernel to finish before stopping the timer
    t2 = datetime.datetime.now()
    print((t2 - t1).total_seconds() * 1000, 'ms')
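As an aside, a more robust way to time GPU work is with CUDA events after a warm-up phase, rather than `datetime` around each call. A minimal sketch, assuming the same layer and input shape as above (the `time_forward` helper and its defaults are my own, and it falls back to wall-clock timing when no GPU is present):

```python
import time
import torch
import torch.nn as nn

def time_forward(batch_size, length=24000, iters=40, warmup=10):
    """Average forward time (ms) of the Conv1d layer over `iters` runs."""
    use_cuda = torch.cuda.is_available()
    device = 'cuda' if use_cuda else 'cpu'
    m = nn.Conv1d(16, 512, 3, stride=2).to(device)
    x = torch.randn(batch_size, 16, length, device=device)
    with torch.no_grad():
        for _ in range(warmup):        # warm-up: kernel selection/caching happens here
            m(x)
        if use_cuda:
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            torch.cuda.synchronize()
            start.record()
            for _ in range(iters):
                m(x)
            end.record()
            torch.cuda.synchronize()   # wait for all queued kernels before reading the timers
            return start.elapsed_time(end) / iters
        t0 = time.perf_counter()       # CPU fallback: plain wall-clock timing
        for _ in range(iters):
            m(x)
        return (time.perf_counter() - t0) * 1000 / iters

# Short sequence length here just to keep the demo quick.
print(f"batch 20: {time_forward(20, length=2400, iters=5, warmup=2):.2f} ms")
```

Averaging over many iterations after warm-up avoids counting one-time costs such as cuDNN algorithm selection in the measurement.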
When the batch size is 20, the forward pass takes 12 ms. When I increase the batch size to 60, it takes 26 ms.
Is this normal? I thought that on a GPU, as long as there is enough CUDA memory, increasing the batch size wouldn't make the forward pass slower.
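For context, the arithmetic work in this layer grows linearly with the batch size. A rough multiply-accumulate (MAC) count, as a sketch that ignores bias terms and whatever algorithm cuDNN actually picks:

```python
# Rough MAC count for Conv1d(16, 512, kernel_size=3, stride=2)
# applied to an input of shape (batch, 16, 24000); bias ignored, no padding.
c_in, c_out, k, stride, length = 16, 512, 3, 2, 24000
out_len = (length - k) // stride + 1          # 11999 output positions

def macs(batch):
    # each output element needs c_in * k multiply-accumulates
    return batch * c_out * out_len * c_in * k

print(macs(60) / macs(20))   # 3.0 -- triple the batch means triple the work
```

So tripling the batch from 20 to 60 triples the total work the GPU has to do.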