How to reduce the latency of transferring tensors from CPU to GPU

Hi PyTorch team,

I am looking for the fastest way to transfer tensors from CPU to GPU. According to a blog post from NVIDIA, host-to-device transfers are faster from pinned (page-locked) memory, so I ran a simple experiment on an A100 using the following code:

```python
import time

import pandas as pd
import torch
from tqdm import tqdm


def get_performance_metrics(x):
    shape = tuple(x.size())

    # Time pin_memory(): gets a page-locked host buffer and copies x into it.
    time_start = time.perf_counter()
    x = x.pin_memory()
    torch.cuda.synchronize()
    elapsed_pin = time.perf_counter() - time_start

    # Time the host-to-device copy from the pinned buffer.
    time_start = time.perf_counter()
    x = x.to("cuda:0")
    torch.cuda.synchronize()
    elapsed_cuda = time.perf_counter() - time_start

    return {
        "shape": shape,
        "pin-memory(ms)": elapsed_pin * 1000,
        "pinned->cuda(ms)": elapsed_cuda * 1000,
    }


df = []
for seq in tqdm(range(1, 1050, 5)):
    # Path 1: pin the tensor first, then copy it to the GPU.
    x = torch.rand((24, 2, 1, seq, 4096)).type(torch.float16)
    m = get_performance_metrics(x)
    del x
    torch.cuda.empty_cache()  # release unused cached GPU memory

    # Path 2: copy the pageable (unpinned) tensor to the GPU directly.
    y = torch.rand((24, 2, 1, seq, 4096)).type(torch.float16)
    time_start = time.perf_counter()
    y_gpu = y.to("cuda")
    torch.cuda.synchronize()
    elapsed_cuda = time.perf_counter() - time_start
    m["direct to cuda(ms)"] = elapsed_cuda * 1000

    del y, y_gpu
    torch.cuda.empty_cache()

    df.append(m)

df = pd.DataFrame(df).round(2)
df.to_excel("tensor_copy_measurement.xlsx", index=False)  # needs an Excel writer such as openpyxl
```
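Each row above comes from a single measurement, so one-off costs land directly in the numbers: the 1245.59 ms for the very first `pin_memory()` call most likely includes CUDA context creation, and because `torch.cuda.empty_cache()` runs every iteration, each `.to("cuda")` probably also pays for a fresh device allocation. A minimal sketch of a timing helper that warms up first and reports the median of several repeats; `time_median_ms` is a hypothetical name, and `sync` is where `torch.cuda.synchronize` would be passed when timing GPU work.

```python
import statistics
import time
from typing import Callable, Optional


def time_median_ms(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 3,
    repeats: int = 20,
) -> float:
    """Median wall-clock time of fn() in milliseconds.

    Warm-up calls absorb one-off costs (e.g. CUDA context creation or a first
    allocation). `sync` is called before the clock stops so that asynchronous
    work is included; pass torch.cuda.synchronize when timing GPU copies.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000)
    # The median ignores a few slow samples, unlike a single measurement.
    return statistics.median(samples)
```

For example, `time_median_ms(lambda: x_pinned.to("cuda:0"), sync=torch.cuda.synchronize)` (with `x_pinned` a hypothetical already-pinned tensor) times only the copy. Without `empty_cache()` between repeats, the caching allocator serves the device buffer from its cache after the first call, so allocation drops out of the measurement. PyTorch's built-in `torch.utils.benchmark.Timer` is an alternative for this kind of measurement.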

In general, copying from a pinned tensor (pinned->cuda) is faster than copying the pageable tensor directly (direct to cuda), but this does not hold for every data point. There are also two odd phenomena:

  1. Transferring a tensor of shape (24, 2, 1, 1026, 4096) is faster than transferring the smaller (24, 2, 1, 966, 4096) one.
  2. Some measurements are outliers that take significantly longer than their neighbours.
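One way to compare rows of different sizes, such as the two shapes in item 1, is to divide the payload by the time to get an effective bandwidth. A small sketch, assuming contiguous float16 tensors (2 bytes per element) and decimal GB; `payload_bytes` and `bandwidth_gbps` are hypothetical helper names. Using the pinned->cuda(ms) times reported in the table below, the 966 row works out to roughly 5.1 GB/s and the 1026 row to roughly 19.3 GB/s, which suggests the 966 measurement was slowed by something other than the amount of data.

```python
from math import prod

BYTES_PER_FP16 = 2


def payload_bytes(shape, itemsize=BYTES_PER_FP16):
    """Bytes occupied by a contiguous tensor of this shape."""
    return prod(shape) * itemsize


def bandwidth_gbps(shape, ms, itemsize=BYTES_PER_FP16):
    """Effective transfer rate in GB/s (10**9 bytes per second)."""
    return payload_bytes(shape, itemsize) / (ms / 1000) / 1e9


# Times copied from the "pinned->cuda(ms)" column of the table.
for seq, ms in [(966, 74.24), (1026, 20.91)]:
    shape = (24, 2, 1, seq, 4096)
    print(seq, payload_bytes(shape), round(bandwidth_gbps(shape, ms), 2))
```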

| # | shape | pin-memory(ms) | pinned->cuda(ms) | direct to cuda(ms) |
|---:|---|---:|---:|---:|
| 0 | (24, 2, 1, 1, 4096) | 1245.59 | 0.33 | 0.24 |
| 1 | (24, 2, 1, 6, 4096) | 1.47 | 0.3 | 0.42 |
| 2 | (24, 2, 1, 11, 4096) | 2.45 | 0.42 | 0.53 |
| 3 | (24, 2, 1, 16, 4096) | 0.22 | 0.54 | 0.66 |
| 4 | (24, 2, 1, 21, 4096) | 0.19 | 0.59 | 0.81 |
| 5 | (24, 2, 1, 26, 4096) | 4.83 | 0.73 | 1 |
| 6 | (24, 2, 1, 31, 4096) | 0.31 | 0.86 | 1.1 |
| 7 | (24, 2, 1, 36, 4096) | 0.38 | 0.98 | 1.27 |
| 8 | (24, 2, 1, 41, 4096) | 0.45 | 1.13 | 1.46 |
| 9 | (24, 2, 1, 46, 4096) | 8.2 | 1.19 | 1.62 |
| 10 | (24, 2, 1, 51, 4096) | 0.56 | 1.31 | 1.86 |
| 11 | (24, 2, 1, 56, 4096) | 0.64 | 1.46 | 2.11 |
| 12 | (24, 2, 1, 61, 4096) | 0.71 | 1.54 | 2.36 |
| 13 | (24, 2, 1, 66, 4096) | 1.04 | 1.67 | 91.19 |
| 14 | (24, 2, 1, 71, 4096) | 1.09 | 1.74 | 2.82 |
| 15 | (24, 2, 1, 76, 4096) | 1.15 | 1.93 | 2.99 |
| 16 | (24, 2, 1, 81, 4096) | 1.22 | 1.97 | 3.2 |
| 17 | (24, 2, 1, 86, 4096) | 17.01 | 2.04 | 3.53 |
| 18 | (24, 2, 1, 91, 4096) | 1.06 | 2.16 | 3.67 |
| 19 | (24, 2, 1, 96, 4096) | 1.16 | 120.73 | 3.93 |
| 20 | (24, 2, 1, 101, 4096) | 1.16 | 2.42 | 4.08 |
| 21 | (24, 2, 1, 106, 4096) | 1.22 | 2.5 | 4.26 |
| 22 | (24, 2, 1, 111, 4096) | 1.28 | 2.6 | 4.45 |
| 23 | (24, 2, 1, 116, 4096) | 1.35 | 2.68 | 64.01 |
| 24 | (24, 2, 1, 121, 4096) | 1.4 | 2.78 | 4.88 |
| 25 | (24, 2, 1, 126, 4096) | 1.43 | 2.92 | 5.13 |
| 26 | (24, 2, 1, 131, 4096) | 1.5 | 3 | 5.29 |
| 27 | (24, 2, 1, 136, 4096) | 1.56 | 142.71 | 5.46 |
| 28 | (24, 2, 1, 141, 4096) | 1.66 | 3.23 | 5.67 |
| 29 | (24, 2, 1, 146, 4096) | 1.68 | 3.37 | 5.86 |
| 30 | (24, 2, 1, 151, 4096) | 1.72 | 3.39 | 11.65 |
| 31 | (24, 2, 1, 156, 4096) | 1.75 | 3.5 | 6.22 |
| 32 | (24, 2, 1, 161, 4096) | 1.79 | 3.63 | 6.46 |
| 33 | (24, 2, 1, 166, 4096) | 1.87 | 3.72 | 63.62 |
| 34 | (24, 2, 1, 171, 4096) | 35.3 | 3.78 | 6.86 |
| 35 | (24, 2, 1, 176, 4096) | 1.98 | 3.92 | 7.02 |
| 36 | (24, 2, 1, 181, 4096) | 2.7 | 128.2 | 7.37 |
| 37 | (24, 2, 1, 186, 4096) | 2.79 | 4.1 | 7.38 |
| 38 | (24, 2, 1, 191, 4096) | 2.86 | 4.24 | 92.4 |
| 39 | (24, 2, 1, 196, 4096) | 2.93 | 4.26 | 7.77 |
| 40 | (24, 2, 1, 201, 4096) | 3.01 | 4.39 | 8.01 |
| 41 | (24, 2, 1, 206, 4096) | 3.08 | 11.48 | 8.2 |
| 42 | (24, 2, 1, 211, 4096) | 3.14 | 4.61 | 8.46 |
| 43 | (24, 2, 1, 216, 4096) | 3.21 | 4.68 | 8.62 |
| 44 | (24, 2, 1, 221, 4096) | 3.29 | 4.8 | 8.77 |
| 45 | (24, 2, 1, 226, 4096) | 3.36 | 4.88 | 87.67 |
| 46 | (24, 2, 1, 231, 4096) | 3.43 | 4.96 | 9.14 |
| 47 | (24, 2, 1, 236, 4096) | 3.49 | 5.06 | 100.49 |
| 48 | (24, 2, 1, 241, 4096) | 3.52 | 5.26 | 9.56 |
| 49 | (24, 2, 1, 246, 4096) | 3.58 | 5.35 | 54.47 |
| 50 | (24, 2, 1, 251, 4096) | 3.64 | 5.43 | 9.98 |
| 51 | (24, 2, 1, 256, 4096) | 3.71 | 5.53 | 10.14 |
| 52 | (24, 2, 1, 261, 4096) | 3.77 | 5.62 | 10.37 |
| 53 | (24, 2, 1, 266, 4096) | 3.86 | 5.74 | 10.53 |
| 54 | (24, 2, 1, 271, 4096) | 3.94 | 5.81 | 10.72 |
| 55 | (24, 2, 1, 276, 4096) | 4.02 | 5.93 | 10.9 |
| 56 | (24, 2, 1, 281, 4096) | 4.09 | 6.01 | 11.16 |
| 57 | (24, 2, 1, 286, 4096) | 4.14 | 38.6 | 11.33 |
| 58 | (24, 2, 1, 291, 4096) | 4.21 | 6.29 | 11.53 |
| 59 | (24, 2, 1, 296, 4096) | 4.27 | 6.25 | 11.7 |
| 60 | (24, 2, 1, 301, 4096) | 4.36 | 6.46 | 11.92 |
| 61 | (24, 2, 1, 306, 4096) | 4.42 | 6.51 | 12.11 |
| 62 | (24, 2, 1, 311, 4096) | 4.5 | 65.78 | 12.26 |
| 63 | (24, 2, 1, 316, 4096) | 4.56 | 6.77 | 95.24 |
| 64 | (24, 2, 1, 321, 4096) | 4.64 | 6.82 | 12.69 |
| 65 | (24, 2, 1, 326, 4096) | 4.8 | 53.54 | 12.9 |
| 66 | (24, 2, 1, 331, 4096) | 4.86 | 7.02 | 43.83 |
| 67 | (24, 2, 1, 336, 4096) | 4.95 | 7.18 | 13.26 |
| 68 | (24, 2, 1, 341, 4096) | 5 | 7.2 | 13.45 |
| 69 | (24, 2, 1, 346, 4096) | 72.38 | 7.37 | 13.43 |
| 70 | (24, 2, 1, 351, 4096) | 5.13 | 7.44 | 13.54 |
| 71 | (24, 2, 1, 356, 4096) | 5.19 | 7.55 | 13.71 |
| 72 | (24, 2, 1, 361, 4096) | 5.28 | 7.64 | 14.06 |
| 73 | (24, 2, 1, 366, 4096) | 5.35 | 7.69 | 99.09 |
| 74 | (24, 2, 1, 371, 4096) | 5.42 | 7.8 | 14.34 |
| 75 | (24, 2, 1, 376, 4096) | 5.49 | 7.9 | 14.51 |
| 76 | (24, 2, 1, 381, 4096) | 5.57 | 8.04 | 14.67 |
| 77 | (24, 2, 1, 386, 4096) | 4.19 | 8.21 | 14.96 |
| 78 | (24, 2, 1, 391, 4096) | 4.27 | 8.25 | 137.99 |
| 79 | (24, 2, 1, 396, 4096) | 5.81 | 8.36 | 15.35 |
| 80 | (24, 2, 1, 401, 4096) | 5.85 | 8.4 | 15.59 |
| 81 | (24, 2, 1, 406, 4096) | 5.96 | 26.22 | 15.78 |
| 82 | (24, 2, 1, 411, 4096) | 5.99 | 8.6 | 15.93 |
| 83 | (24, 2, 1, 416, 4096) | 6.08 | 8.72 | 16.12 |
| 84 | (24, 2, 1, 421, 4096) | 6.17 | 8.82 | 166.67 |
| 85 | (24, 2, 1, 426, 4096) | 6.24 | 8.94 | 16.5 |
| 86 | (24, 2, 1, 431, 4096) | 6.28 | 9.07 | 16.67 |
| 87 | (24, 2, 1, 436, 4096) | 6.36 | 9.17 | 17.05 |
| 88 | (24, 2, 1, 441, 4096) | 6.72 | 9.38 | 17.11 |
| 89 | (24, 2, 1, 446, 4096) | 6.74 | 156.34 | 17.29 |
| 90 | (24, 2, 1, 451, 4096) | 6.81 | 140.54 | 17.45 |
| 91 | (24, 2, 1, 456, 4096) | 6.9 | 130 | 17.72 |
| 92 | (24, 2, 1, 461, 4096) | 6.92 | 121.27 | 17.83 |
| 93 | (24, 2, 1, 466, 4096) | 7.04 | 110.51 | 17.98 |
| 94 | (24, 2, 1, 471, 4096) | 7.09 | 99.92 | 18.18 |
| 95 | (24, 2, 1, 476, 4096) | 7.15 | 87.87 | 18.54 |
| 96 | (24, 2, 1, 481, 4096) | 7.26 | 69.33 | 18.79 |
| 97 | (24, 2, 1, 486, 4096) | 7.36 | 58.46 | 18.97 |
| 98 | (24, 2, 1, 491, 4096) | 7.43 | 46.41 | 19.16 |
| 99 | (24, 2, 1, 496, 4096) | 7.48 | 39.62 | 19.4 |
| 100 | (24, 2, 1, 501, 4096) | 7.57 | 21.8 | 19.55 |
| 101 | (24, 2, 1, 506, 4096) | 7.44 | 22.77 | 19.85 |
| 102 | (24, 2, 1, 511, 4096) | 5.78 | 10.54 | 19.98 |
| 103 | (24, 2, 1, 516, 4096) | 5.59 | 10.8 | 20.09 |
| 104 | (24, 2, 1, 521, 4096) | 5.66 | 10.83 | 20.42 |
| 105 | (24, 2, 1, 526, 4096) | 5.73 | 10.95 | 20.54 |
| 106 | (24, 2, 1, 531, 4096) | 5.75 | 11.03 | 20.71 |
| 107 | (24, 2, 1, 536, 4096) | 5.83 | 11.17 | 21.24 |
| 108 | (24, 2, 1, 541, 4096) | 5.87 | 11.23 | 21.18 |
| 109 | (24, 2, 1, 546, 4096) | 5.91 | 11.34 | 21.24 |
| 110 | (24, 2, 1, 551, 4096) | 5.98 | 11.47 | 138.33 |
| 111 | (24, 2, 1, 556, 4096) | 8.44 | 11.57 | 21.63 |
| 112 | (24, 2, 1, 561, 4096) | 8.48 | 11.68 | 21.86 |
| 113 | (24, 2, 1, 566, 4096) | 8.57 | 11.77 | 22.08 |
| 114 | (24, 2, 1, 571, 4096) | 8.64 | 95.79 | 22.19 |
| 115 | (24, 2, 1, 576, 4096) | 8.73 | 11.9 | 22.39 |
| 116 | (24, 2, 1, 581, 4096) | 8.78 | 12.06 | 22.68 |
| 117 | (24, 2, 1, 586, 4096) | 8.88 | 12.15 | 22.82 |
| 118 | (24, 2, 1, 591, 4096) | 8.95 | 12.29 | 23 |
| 119 | (24, 2, 1, 596, 4096) | 9.01 | 130.26 | 23.32 |
| 120 | (24, 2, 1, 601, 4096) | 9.09 | 12.38 | 23.5 |
| 121 | (24, 2, 1, 606, 4096) | 9.18 | 12.51 | 44.14 |
| 122 | (24, 2, 1, 611, 4096) | 9.24 | 12.65 | 23.82 |
| 123 | (24, 2, 1, 616, 4096) | 9.32 | 12.71 | 24.02 |
| 124 | (24, 2, 1, 621, 4096) | 9.4 | 12.82 | 24.28 |
| 125 | (24, 2, 1, 626, 4096) | 9.46 | 12.94 | 133.27 |
| 126 | (24, 2, 1, 631, 4096) | 6.82 | 13.13 | 24.65 |
| 127 | (24, 2, 1, 636, 4096) | 7 | 163.93 | 24.81 |
| 128 | (24, 2, 1, 641, 4096) | 6.92 | 13.23 | 177.68 |
| 129 | (24, 2, 1, 646, 4096) | 6.97 | 13.35 | 25.13 |
| 130 | (24, 2, 1, 651, 4096) | 9.56 | 107.45 | 25.39 |
| 131 | (24, 2, 1, 656, 4096) | 9.63 | 13.55 | 101.99 |
| 132 | (24, 2, 1, 661, 4096) | 9.68 | 13.65 | 25.76 |
| 133 | (24, 2, 1, 666, 4096) | 9.75 | 54.84 | 25.93 |
| 134 | (24, 2, 1, 671, 4096) | 9.82 | 13.86 | 57.36 |
| 135 | (24, 2, 1, 676, 4096) | 9.95 | 13.93 | 26.47 |
| 136 | (24, 2, 1, 681, 4096) | 7.37 | 14.09 | 26.48 |
| 137 | (24, 2, 1, 686, 4096) | 140.9 | 14.22 | 26.63 |
| 138 | (24, 2, 1, 691, 4096) | 7.45 | 14.28 | 26.9 |
| 139 | (24, 2, 1, 696, 4096) | 7.51 | 14.37 | 27.32 |
| 140 | (24, 2, 1, 701, 4096) | 7.55 | 14.46 | 27.26 |
| 141 | (24, 2, 1, 706, 4096) | 7.7 | 124.03 | 27.07 |
| 142 | (24, 2, 1, 711, 4096) | 7.66 | 14.68 | 27.1 |
| 143 | (24, 2, 1, 716, 4096) | 7.73 | 14.71 | 27.47 |
| 144 | (24, 2, 1, 721, 4096) | 7.78 | 14.87 | 160.05 |
| 145 | (24, 2, 1, 726, 4096) | 10.95 | 14.91 | 27.86 |
| 146 | (24, 2, 1, 731, 4096) | 11.01 | 14.99 | 180.19 |
| 147 | (24, 2, 1, 736, 4096) | 11.05 | 15.09 | 28.19 |
| 148 | (24, 2, 1, 741, 4096) | 11.14 | 15.2 | 129.51 |
| 149 | (24, 2, 1, 746, 4096) | 11.23 | 15.25 | 28.61 |
| 150 | (24, 2, 1, 751, 4096) | 11.32 | 15.45 | 86.7 |
| 151 | (24, 2, 1, 756, 4096) | 11.38 | 15.6 | 29.12 |
| 152 | (24, 2, 1, 761, 4096) | 11.46 | 15.64 | 38.48 |
| 153 | (24, 2, 1, 766, 4096) | 11.53 | 15.73 | 29.29 |
| 154 | (24, 2, 1, 771, 4096) | 11.61 | 15.77 | 29.51 |
| 155 | (24, 2, 1, 776, 4096) | 11.65 | 15.94 | 29.56 |
| 156 | (24, 2, 1, 781, 4096) | 11.72 | 15.95 | 30.13 |
| 157 | (24, 2, 1, 786, 4096) | 11.81 | 16.21 | 29.89 |
| 158 | (24, 2, 1, 791, 4096) | 11.87 | 16.14 | 30.07 |
| 159 | (24, 2, 1, 796, 4096) | 11.94 | 16.29 | 30.48 |
| 160 | (24, 2, 1, 801, 4096) | 12.02 | 16.37 | 30.56 |
| 161 | (24, 2, 1, 806, 4096) | 12.12 | 16.46 | 30.89 |
| 162 | (24, 2, 1, 811, 4096) | 12.17 | 112.46 | 31.02 |
| 163 | (24, 2, 1, 816, 4096) | 12.27 | 16.63 | 31.27 |
| 164 | (24, 2, 1, 821, 4096) | 12.34 | 16.91 | 31.31 |
| 165 | (24, 2, 1, 826, 4096) | 12.4 | 16.77 | 31.63 |
| 166 | (24, 2, 1, 831, 4096) | 12.48 | 17.02 | 32.18 |
| 167 | (24, 2, 1, 836, 4096) | 12.54 | 17.02 | 140.98 |
| 168 | (24, 2, 1, 841, 4096) | 12.6 | 17.15 | 32.2 |
| 169 | (24, 2, 1, 846, 4096) | 12.7 | 17.23 | 32.36 |
| 170 | (24, 2, 1, 851, 4096) | 12.77 | 17.36 | 33.89 |
| 171 | (24, 2, 1, 856, 4096) | 12.84 | 17.49 | 32.87 |
| 172 | (24, 2, 1, 861, 4096) | 13 | 17.48 | 32.95 |
| 173 | (24, 2, 1, 866, 4096) | 12.98 | 17.6 | 33.1 |
| 174 | (24, 2, 1, 871, 4096) | 13.08 | 17.69 | 140.24 |
| 175 | (24, 2, 1, 876, 4096) | 13.19 | 164.71 | 33.59 |
| 176 | (24, 2, 1, 881, 4096) | 13.19 | 17.82 | 33.64 |
| 177 | (24, 2, 1, 886, 4096) | 13.3 | 17.91 | 34.15 |
| 178 | (24, 2, 1, 891, 4096) | 13.36 | 17.92 | 34.11 |
| 179 | (24, 2, 1, 896, 4096) | 13.45 | 18.14 | 47.32 |
| 180 | (24, 2, 1, 901, 4096) | 13.51 | 161.89 | 187.27 |
| 181 | (24, 2, 1, 906, 4096) | 13.55 | 137.23 | 185.95 |
| 182 | (24, 2, 1, 911, 4096) | 13.64 | 132.53 | 188.22 |
| 183 | (24, 2, 1, 916, 4096) | 13.71 | 126.04 | 186.95 |
| 184 | (24, 2, 1, 921, 4096) | 13.78 | 119.1 | 159.75 |
| 185 | (24, 2, 1, 926, 4096) | 13.87 | 103.94 | 148.76 |
| 186 | (24, 2, 1, 931, 4096) | 13.92 | 107.04 | 144.25 |
| 187 | (24, 2, 1, 936, 4096) | 14.03 | 91.58 | 146.02 |
| 188 | (24, 2, 1, 941, 4096) | 14.13 | 91.43 | 142.19 |
| 189 | (24, 2, 1, 946, 4096) | 14.16 | 96.95 | 139.02 |
| 190 | (24, 2, 1, 951, 4096) | 14.14 | 90.7 | 134 |
| 191 | (24, 2, 1, 956, 4096) | 14.14 | 83.12 | 128.91 |
| 192 | (24, 2, 1, 961, 4096) | 14.21 | 67.87 | 121.61 |
| 193 | (24, 2, 1, 966, 4096) | 14.29 | 74.24 | 113.4 |
| 194 | (24, 2, 1, 971, 4096) | 14.38 | 67.69 | 112.32 |
| 195 | (24, 2, 1, 976, 4096) | 14.41 | 52.07 | 98.98 |
| 196 | (24, 2, 1, 981, 4096) | 14.48 | 43.74 | 96.83 |
| 197 | (24, 2, 1, 986, 4096) | 14.57 | 42.39 | 95.48 |
| 198 | (24, 2, 1, 991, 4096) | 14.64 | 43.3 | 89.24 |
| 199 | (24, 2, 1, 996, 4096) | 14.76 | 38.1 | 87.16 |
| 200 | (24, 2, 1, 1001, 4096) | 14.79 | 34.66 | 80.94 |
| 201 | (24, 2, 1, 1006, 4096) | 14.89 | 20.43 | 75.97 |
| 202 | (24, 2, 1, 1011, 4096) | 14.68 | 23.79 | 63.71 |
| 203 | (24, 2, 1, 1016, 4096) | 14.82 | 20.61 | 48.38 |
| 204 | (24, 2, 1, 1021, 4096) | 14.9 | 20.74 | 39.35 |
| 205 | (24, 2, 1, 1026, 4096) | 14.92 | 20.91 | 39.28 |
| 206 | (24, 2, 1, 1031, 4096) | 15 | 20.84 | 39.39 |
| 207 | (24, 2, 1, 1036, 4096) | 14.92 | 21.11 | 39.48 |
| 208 | (24, 2, 1, 1041, 4096) | 14.93 | 21.2 | 39.67 |
| 209 | (24, 2, 1, 1046, 4096) | 15.01 | 21.29 | 39.89 |
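With one measurement per shape, isolated spikes (for example 120.73 ms for seq = 96 in the pinned->cuda(ms) column, where the neighbouring rows take about 2 ms) are hard to tell apart from genuine size effects. A minimal sketch of a neighbour-median spike filter; `flag_spikes` is a hypothetical helper and the `radius` and `factor` defaults are arbitrary assumptions. It also shows that the slow bands (seq = 446 to 506 in pinned->cuda, and seq = 901 to roughly 1016 in both cuda columns) are sustained rather than isolated, so a filter like this will not flag them.

```python
from statistics import median


def flag_spikes(values, radius=2, factor=3.0):
    """Return indices whose value exceeds `factor` times the median of the
    neighbouring values within `radius` positions (the point itself excluded).

    Only isolated spikes are caught: in a sustained band of slow rows the
    neighbours are slow too, which raises the reference median.
    """
    flagged = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        if neighbours and v > factor * median(neighbours):
            flagged.append(i)
    return flagged


# "pinned->cuda(ms)" for seq = 81, 86, 91, 96, 101 (rows 16-20 of the table).
pinned = [1.97, 2.04, 2.16, 120.73, 2.42]
print(flag_spikes(pinned))
```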

My questions are:

  1. Am I setting up the experiment correctly? What factors am I missing?
  2. What is the fastest way to transfer a tensor from CPU to GPU on an A100?
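One commonly used pattern, sketched here under assumptions rather than as a verified answer for the A100: allocate the pinned staging buffer and the device tensor once and reuse them, so each transfer is a CPU memcpy into page-locked memory plus a DMA copy, with no new pinned or device allocation in the hot path. `make_buffers`, `upload` and `time_h2d_ms` are hypothetical helper names. Timing uses CUDA events (`torch.cuda.Event(enable_timing=True)`), which measure on the GPU stream instead of wall-clock time. On a machine without CUDA the sketch falls back to plain CPU tensors and skips the timing.

```python
import statistics

import torch


def make_buffers(shape, dtype=torch.float16):
    """Allocate a reusable host staging buffer and a destination tensor once.

    pin_memory=True needs CUDA, so the host buffer is pinned only when a GPU
    is present; on a CPU-only machine both buffers are ordinary CPU tensors.
    """
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda:0" if use_cuda else "cpu")
    host = torch.empty(shape, dtype=dtype, pin_memory=use_cuda)
    dev = torch.empty(shape, dtype=dtype, device=device)
    return host, dev


def upload(src, host, dev):
    """Copy a pageable CPU tensor into `dev` through the reused pinned buffer."""
    host.copy_(src)                     # CPU memcpy into page-locked memory
    dev.copy_(host, non_blocking=True)  # asynchronous DMA when dev is on the GPU
    if dev.is_cuda:
        # `host` must not be overwritten while the DMA is in flight; a full
        # synchronize is the simplest (not the fastest) way to guarantee that.
        torch.cuda.synchronize(dev.device)
    return dev


def time_h2d_ms(host, dev, repeats=10):
    """Median pinned host-to-device copy time in ms, measured with CUDA events."""
    if not dev.is_cuda:
        return None
    dev.copy_(host, non_blocking=True)  # warm-up
    torch.cuda.synchronize()
    times = []
    for _ in range(repeats):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        dev.copy_(host, non_blocking=True)
        end.record()
        end.synchronize()  # wait until the copy and the end event complete
        times.append(start.elapsed_time(end))
    return statistics.median(times)
```

For repeated transfers of same-shaped tensors, `make_buffers` runs once and `upload` runs per batch. The synchronize inside `upload` could be replaced by recording an event after the copy and waiting on it before the next `host.copy_`, which lets CPU work overlap with the DMA; that variant is not shown here.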