What is the overhead of converting NumPy arrays to PyTorch tensors and vice versa?
Probably about 1 microsecond (basically the cost of a Python call). There is no memory copy or anything, so it’s quite efficient.
PyTorch tensors and NumPy arrays share the same memory locations.
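A quick way to see the memory sharing for yourself. This is only a minimal sketch for CPU tensors; the variable names are made up for illustration.

```python
import numpy as np
import torch

a = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(a)   # wraps the same buffer, no copy is made
t[0] = 1.0                # write through the tensor...
b = t.numpy()             # ...and convert back, again without a copy

# Both checks rely on the tensor and the arrays pointing at one buffer.
shares = np.shares_memory(a, b)
print(a[0], shares)
```

Because nothing is copied, a write through either object is visible through the other, so call .clone() or .copy() first if you need independent data.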
I’ve compared converting to NumPy arrays from PyTorch and TensorFlow here: http://stsievert.com/blog/2017/09/07/pytorch/
On my local machine, PyTorch takes 0.5 microseconds to convert between the two.
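If you want to check the number on your own machine, something like the following should do. It is a rough sketch using timeit, not the code from the linked notebooks, and the array size and iteration count are arbitrary choices.

```python
import timeit

import numpy as np
import torch

x = np.random.rand(1000).astype(np.float32)
t = torch.from_numpy(x)
n = 100_000  # number of repetitions, picked arbitrarily

# Average time per call in microseconds, in each direction.
to_torch_us = timeit.timeit(lambda: torch.from_numpy(x), number=n) / n * 1e6
to_numpy_us = timeit.timeit(lambda: t.numpy(), number=n) / n * 1e6

print(f"NumPy -> PyTorch: {to_torch_us:.2f} us per call")
print(f"PyTorch -> NumPy: {to_numpy_us:.2f} us per call")
```

The result should not depend much on the array size, since no data is copied in either direction.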
@smth @stsievert what if I convert from a CUDA tensor? Is that still just two Python function calls, or does going from a GPU tensor to a CPU tensor take much longer? I use quite a number of conversions in my code and am wondering whether they are slowing me down.
The code behind these timings can be found at https://github.com/stsievert/pytorch-timing-comparisons in Jupyter notebooks. They time the conversion of a CPU tensor to a NumPy array, for both TensorFlow and PyTorch.
I would expect that converting from a PyTorch GPU tensor to an ndarray is O(n), since it has to transfer all n floats from GPU memory to CPU memory. I’m not sure about the constant factor, but I would expect it to be fairly small.
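For reference, the GPU case looks roughly like this. It is a sketch that falls back to the CPU path when no CUDA device is available; the tensor size is arbitrary.

```python
import torch

t = torch.ones(1_000_000)

if torch.cuda.is_available():
    g = t.cuda()              # host-to-device copy, O(n)
    # Calling .numpy() directly on a CUDA tensor raises an error,
    # so the data has to come back to host memory first.
    back = g.cpu().numpy()    # device-to-host copy, then a zero-copy view
else:
    back = t.numpy()          # CPU only: zero-copy view, no transfer

print(back.shape, back[0])
```

The .cpu() call is where the O(n) transfer happens; the .numpy() call that follows is as cheap as in the CPU-only case.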
Of course the big O constant is small – memory copies are fast.
But the big O constant is still significant. Memory is a bottleneck – CPUs spend most of their time waiting for registers to be filled, not waiting for computation to be finished.
Try to minimize the number of CPU <=> GPU transfers. I believe you want to use async=True (renamed to non_blocking=True in PyTorch 0.4) in cuda() or cpu() when you can.
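A sketch of what an asynchronous transfer might look like. It assumes a recent PyTorch release, where the argument is spelled non_blocking, and it pins the host memory first, since the copy can only overlap with other work when the source is in page-locked memory. Without a CUDA device it skips the transfer.

```python
import torch

x = torch.randn(1024, 1024)

if torch.cuda.is_available():
    pinned = x.pin_memory()              # page-locked host memory
    g = pinned.cuda(non_blocking=True)   # copy may overlap with host-side work
    # ... other CPU work could run here while the copy is in flight ...
    torch.cuda.synchronize()             # wait for the copy before relying on it
    result = g.cpu()                     # blocking copy back to the host
else:
    result = x                           # no GPU available: nothing to transfer

print(torch.equal(result, x))
```

Batching many small transfers into one larger copy usually helps as well, since each transfer carries a fixed launch cost on top of the O(n) part.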