I am trying to compare the time taken by two different models (ELMo vs. BERT) while predicting in named entity recognition. It involves copying the logits from GPU to CPU, which takes a different amount of time for each model.

This code snippet is the same for both models:
```python
logits = torch.argmax(F.log_softmax(logits, dim=2), dim=2)
batch_start_time = time.time()
logits = logits.detach().cpu().numpy()
print(logits.shape)
print("Batch time before loop :: %s seconds" % (time.time() - batch_start_time))
```
Output for ELMo:

```
(32, 128) Batch time before loop :: 0.0035903453826904297 seconds
```
Output for BERT:

```
(32, 128) Batch time before loop :: 0.1682131290435791 seconds
```
Why is there this difference, and are there any other factors that influence this time?
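One factor worth ruling out when measuring this yourself: CUDA kernels launch asynchronously, and `.cpu()` forces a synchronization, so a timer started just before the copy can also absorb whatever GPU work is still in flight (for example the model's forward pass, which is much heavier for BERT than for ELMo). A minimal sketch of timing the device-to-host copy in isolation by synchronizing first — the tensor shape and `num_tags=10` here are made up to match the printed `(32, 128)` output:

```python
import time

import torch
import torch.nn.functional as F


def time_gpu_to_cpu(logits):
    """Time only the device-to-host copy of the predicted labels.

    Without torch.cuda.synchronize(), the .cpu() call blocks until
    all previously queued kernels finish, so the measured interval
    includes pending GPU work, not just the transfer itself.
    """
    preds = torch.argmax(F.log_softmax(logits, dim=2), dim=2)
    if logits.is_cuda:
        torch.cuda.synchronize()  # drain any in-flight kernels first
    start = time.time()
    preds = preds.detach().cpu().numpy()
    elapsed = time.time() - start
    return preds, elapsed


# Hypothetical batch shaped like the question's output: (32, 128, num_tags)
device = "cuda" if torch.cuda.is_available() else "cpu"
logits = torch.randn(32, 128, 10, device=device)
preds, elapsed = time_gpu_to_cpu(logits)
print(preds.shape, elapsed)
```

If the gap between the two models shrinks dramatically after adding the synchronize, the difference was queued forward-pass work rather than the transfer itself.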