CPU faster than GPU?

@mattinjersey, it seems to me that the difference between your code and @ptrblck’s code is that the latter only measures the computation time on the GPU; it does not account for the data-transfer time (in @ptrblck’s code the data is transferred only once).

Your code, on the other hand, transfers data to the GPU at every iteration, so you are also measuring that transfer time on each step.

Is it possible, as @ptrblck suggested, to write your ComputeResults() function as a dataset, so that a DataLoader can produce batches in pinned (page-locked) memory (by passing pin_memory=True)? Once a batch is in pinned memory, you can also pass non_blocking=True to the cuda() / to() calls so that the host-to-device transfer overlaps with computation.

These two steps should help amortize the data-transfer cost that is slowing you down.
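A minimal sketch of what that could look like, assuming ComputeResults() can be expressed as producing one sample per index (the shapes, sizes, and the ResultsDataset name below are just placeholders):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ResultsDataset(Dataset):
    """Wraps the per-sample logic so the DataLoader can prefetch on the CPU."""

    def __init__(self, num_samples):
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Placeholder: replace with your ComputeResults() logic for one sample.
        x = torch.randn(1024)
        y = torch.randn(1)
        return x, y


loader = DataLoader(
    ResultsDataset(10_000),
    batch_size=64,
    num_workers=2,     # generate batches in background worker processes
    pin_memory=True,   # place batches in pinned (page-locked) host memory
)

device = torch.device("cuda")
model = torch.nn.Linear(1024, 1).to(device)

for x, y in loader:
    # non_blocking=True lets the host-to-device copy overlap with GPU work.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    out = model(x)
```

With this setup the workers prepare the next batches while the GPU is busy, so the copies no longer sit on the critical path of every iteration.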
