Understanding the time taken to move data from GPU to CPU

I am trying to compare the time taken by two different models (ELMo vs. BERT) while predicting in named entity recognition. This involves moving the logits from the GPU to the CPU, which takes a different amount of time for each model.

This code snippet is the same for both models:

    logits = torch.argmax(F.log_softmax(logits, dim=2), dim=2)
    batch_start_time = time.time()
    logits = logits.detach().cpu().numpy()   # copy predictions from GPU to CPU
    print(logits.shape)
    pred_batch = []
    print("Batch time before loop :: %s seconds" % (time.time() - batch_start_time))

Output for the ELMo model:

(32, 128)

Batch time before loop :: 0.0035903453826904297 seconds

Output for the BERT model:

(32, 128)

Batch time before loop :: 0.1682131290435791 seconds

Why is there this difference, and are there any other factors that influence this time?

CUDA operations are executed asynchronously, so you should synchronize the code before starting and before stopping the timer using torch.cuda.synchronize().

In your current code snippet, logits = logits.detach().cpu().numpy() creates a synchronization point, so the code waits at this line for all preceding operations to finish (which might also include the forward pass of your model) while the timer is already running.
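As a minimal sketch (assuming the same `logits`, `time`, and `torch.nn.functional as F` context as the snippet above), the timing could be made comparable between the two models by synchronizing around the timed section:

    import time
    import torch
    import torch.nn.functional as F

    torch.cuda.synchronize()           # wait for all pending GPU work, e.g. the forward pass
    batch_start_time = time.time()

    logits = torch.argmax(F.log_softmax(logits, dim=2), dim=2)
    logits = logits.detach().cpu().numpy()   # device-to-host copy

    torch.cuda.synchronize()           # ensure all queued GPU work is finished before stopping the timer
    print("Batch time before loop :: %s seconds" % (time.time() - batch_start_time))

This way the measured interval only covers the argmax and the GPU-to-CPU copy, rather than also absorbing whatever work was still queued on the GPU when the timer started.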
