Non-blocking transfer to GPU is not working

We recently had some issues with pinned memory (@rwightman reported it here), which have since been fixed. You could try the nightly build or build from source and check if the profiling results change.
For general data loading bottlenecks, have a look at this post.
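For reference, here is a minimal sketch of the usual pattern. It assumes the batches come from a standard `DataLoader`: pinned host memory via `pin_memory=True`, then `.to(device, non_blocking=True)`. The helper names `transfer_batch` and `time_loader` and the synthetic dataset are hypothetical, for illustration only. Note the `torch.cuda.synchronize()` before the timer is read. Without it, the measured time can stop while asynchronous copies are still in flight, which makes the profile misleading.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def transfer_batch(batch: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a CPU batch to `device`, asynchronously when possible (hypothetical helper)."""
    if device.type == "cuda" and not batch.is_pinned():
        # Page-locked (pinned) host memory is needed for a truly async host-to-device copy.
        # Pinning here costs an extra synchronous copy, so prefer DataLoader(pin_memory=True).
        batch = batch.pin_memory()
    # With a pinned source, non_blocking=True can return before the copy finishes.
    return batch.to(device, non_blocking=True)


def time_loader(loader: DataLoader, device: torch.device) -> float:
    """Return wall-clock seconds spent loading and transferring all batches."""
    start = time.perf_counter()
    for inputs, targets in loader:
        inputs = transfer_batch(inputs, device)
        targets = transfer_batch(targets, device)
    if device.type == "cuda":
        # Wait for any in-flight async copies before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start


if __name__ == "__main__":
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    # Synthetic stand-in data (assumption): 512 images of shape 3x32x32 with integer labels.
    dataset = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
    # pin_memory=True makes the DataLoader hand out batches already in pinned memory.
    loader = DataLoader(dataset, batch_size=64, pin_memory=use_cuda)
    print(f"{device}: {time_loader(loader, device):.4f} s")
```

Comparing the printed time with `pin_memory=False` on a GPU machine gives a rough idea of whether the async path is actually being taken. A proper profiler trace will show the overlap more reliably.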