I am trying to move a big tensor from CPU to GPU with the .cuda() method,
but I found that the time it takes varies a lot. For example, the first transfer takes about a few ms, but the second may take a few seconds. When I use pdb to debug the code, the TENSOR.cuda() operation is fast if I pause for a while before executing it. If I do not pause, the operation takes much longer.
When I use Ctrl+C to stop the program, the code is on
PyTorch does not seem to run as plain Python code. The Python code is more like a stream of instructions: until we explicitly call print or otherwise use the result, the result is not actually in memory. So every time I call print or .cuda(), it takes a while to fetch the data.
I haven't tested this out in PyTorch itself, but generally speaking, any time you ask the GPU to do something, it's fairly asynchronous. So, you send a request, in the mail as it were, and at some point in the future, when the GPU feels like it, has finished its breakfast, read the morning paper etc., you get the results back.
Now, ok, it doesn't really read the paper etc., but it does take a while to do stuff. If you send more stuff whilst the first stuff hasn't finished, obviously the second set of stuff will be delayed for a while.
Copying data to the GPU counts as 'doing stuff'. It may look instantaneous, because the instruction returns immediately, but it'll take a while.
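The mail analogy can be sketched as a toy model in plain Python. This is only an illustration of asynchronous command submission, not real CUDA code; `ToyDevice` and all its names are made up. Submitting returns immediately, but waiting for a later command also includes whatever was still queued before it:

```python
import queue
import threading
import time

class ToyDevice:
    """Toy model of an async device: commands go into a queue and a
    worker "GPU" thread executes them later, in order."""

    def __init__(self):
        self.commands = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            duration, done = self.commands.get()
            time.sleep(duration)   # pretend to do the work
            done.set()

    def submit(self, duration):
        done = threading.Event()
        self.commands.put((duration, done))
        return done                # returns immediately ("async")

    def synchronize(self, done):
        done.wait()                # block until the work really finished

dev = ToyDevice()
first = dev.submit(0.2)            # a "slow copy": enqueued, returns at once
t0 = time.perf_counter()
second = dev.submit(0.0)           # a "fast" command, queued behind the first
dev.synchronize(second)            # waiting for it also absorbs the 0.2 s
elapsed = time.perf_counter() - t0
```

Here `elapsed` comes out around 0.2 s even though the second command itself does nothing, which is the same effect as a `.cuda()` call that looks slow only because earlier work was still in flight.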
Hunt for an instruction with a name like 'sync', or similar, and have a play with that. In fact, there's an example I made earlier here:
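In PyTorch that instruction is `torch.cuda.synchronize()`. A minimal timing sketch, assuming a CUDA-capable PyTorch install (the helper name `timed_cuda_copy` is made up; it just returns `None` when no GPU is present):

```python
import time

import torch

def timed_cuda_copy(t):
    """Time a CPU -> GPU copy with explicit synchronization barriers."""
    if not torch.cuda.is_available():
        return None
    torch.cuda.synchronize()   # drain any previously queued GPU work
    start = time.perf_counter()
    g = t.cuda()               # the copy itself
    torch.cuda.synchronize()   # wait until the copy has really finished
    return time.perf_counter() - start
```

The first `synchronize()` matters as much as the second: without it, the timer also charges this copy for whatever work was still queued on the GPU, which is exactly the variable timing described in the question.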