The PyTorch documentation says: "Host to GPU copies are much faster when they originate from pinned (page-locked) memory. CPU tensors and storages expose a pin_memory() method, that returns a copy of the object, with data put in a pinned region.
Also, once you pin a tensor or storage, you can use asynchronous GPU copies. Just pass an additional non_blocking=True argument to a to() or a cuda() call. This can be used to overlap data transfers with computation."
So I expected the code to run faster. Instead, it takes more time: with non_blocking=True the code takes over an hour to run, whereas without it the run takes 11 minutes.
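
For reference, here is a minimal sketch of the pattern the quoted documentation describes. It is not the code from the slow run. The helper name `timed_copy` and the tensor shape are hypothetical, chosen only for illustration. It assumes that a fair timing needs `torch.cuda.synchronize()` around the copy, because a non-blocking copy may return before the transfer has finished. It falls back to the CPU when no GPU is present.

```python
import time

import torch


def timed_copy(batch, device, non_blocking):
    """Copy `batch` to `device` and return (copy, elapsed seconds).

    Hypothetical helper for illustration only.
    """
    if device.type == "cuda":
        # Wait for earlier GPU work so it is not counted in the timing.
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    out = batch.to(device, non_blocking=non_blocking)
    if device.type == "cuda":
        # A non-blocking copy can return early; wait until it really finishes.
        torch.cuda.synchronize(device)
    return out, time.perf_counter() - start


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = torch.randn(256, 1024)  # arbitrary example size

if device.type == "cuda":
    # Pinning only makes sense (and only works) when a CUDA device exists.
    batch = batch.pin_memory()

moved, elapsed = timed_copy(batch, device, non_blocking=True)
print(f"copied to {moved.device} in {elapsed:.6f} s")
```

Timing a single copy this way with and without `non_blocking=True` could help show whether the slowdown comes from the transfer itself or from somewhere else in the pipeline.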