Introducing SpeedTorch: 4x speed CPU->GPU transfer, 110x GPU->CPU transfer

As far as I can tell, CuPy arrays are only intended to hold CUDA (device) data, but in this case the array is actually holding CPU data (pinned host memory). You can check with something like:

cupy.cuda.runtime.pointerGetAttributes(gadgetCPU.CUPYcorpus.data.ptr).memoryType

This prints 1 (= cudaMemoryTypeHost). On gadgetGPU it prints 2 (= cudaMemoryTypeDevice). (cudaMemoryType reference)
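
For reference, here's a minimal sketch of how a CuPy ndarray can end up backed by pinned host memory: allocate page-locked memory through the CUDA runtime and wrap the raw pointer with cupy.cuda.UnownedMemory. This is just one way to reproduce the effect; I'm not claiming it's how SpeedTorch builds its CUPYcorpus buffers internally.

    import cupy

    nbytes = 1024 * 4  # 1024 float32 elements

    # Allocate page-locked (pinned) host memory via the CUDA runtime.
    host_ptr = cupy.cuda.runtime.hostAlloc(nbytes, 0)

    # Wrap the raw host pointer so CuPy treats it as array storage.
    # owner=None means we manage the lifetime ourselves; device_id=0 is a
    # simplification, since the memory doesn't really live on any device.
    mem = cupy.cuda.UnownedMemory(host_ptr, nbytes, None, device_id=0)
    arr = cupy.ndarray((1024,), dtype=cupy.float32,
                       memptr=cupy.cuda.MemoryPointer(mem, 0))

    # The "CUDA" array's data pointer is actually host memory.
    attrs = cupy.cuda.runtime.pointerGetAttributes(arr.data.ptr)
    print(attrs.memoryType)  # 1 == cudaMemoryTypeHost (newer CuPy: attrs.type)

    cupy.cuda.runtime.freeHost(host_ptr)  # UnownedMemory won't free it for us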

You can do something similar in PyTorch from the C++ API using torch::from_blob. Here’s an example. Note there’s a check in from_blob that tries to prevent this sort of thing, but the check is broken (what luck!).
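
The linked example is C++, but you can sanity-check the pinned-memory half of the idea from Python (this isn't the from_blob trick itself, just PyTorch's own pinned allocation run through the same pointer check as above):

    import cupy
    import torch

    # A page-locked CPU tensor allocated through PyTorch.
    t = torch.empty(1024, pin_memory=True)

    # Same check as before: the tensor's data pointer is pinned host memory.
    attrs = cupy.cuda.runtime.pointerGetAttributes(t.data_ptr())
    print(attrs.memoryType)  # 1 == cudaMemoryTypeHost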
