Hi PyTorch team,
I am looking for the fastest way to transfer tensors from CPU to GPU. According to the NVIDIA blog, data transfers are faster from pinned memory, so I ran a simple experiment on an A100 using the following code.
import time

import torch
import pandas as pd
from tqdm import tqdm


def get_performance_metrics(x):
    shape = tuple(x.size())

    time_start = time.perf_counter()
    x = x.pin_memory()
    torch.cuda.synchronize()
    elapsed_pin = time.perf_counter() - time_start

    time_start = time.perf_counter()
    x = x.to("cuda:0")
    torch.cuda.synchronize()
    elapsed_cuda = time.perf_counter() - time_start

    return {
        "shape": shape,
        "pin-memory(ms)": elapsed_pin * 1000,
        "pinned->cuda(ms)": elapsed_cuda * 1000,
    }


df = []
for seq in tqdm(range(1, 1050, 5)):
    x = torch.rand((24, 2, 1, seq, 4096)).type(torch.float16)
    m = get_performance_metrics(x)
    del x
    torch.cuda.empty_cache()

    y = torch.rand((24, 2, 1, seq, 4096)).type(torch.float16)
    time_start = time.perf_counter()
    y.to("cuda")
    torch.cuda.synchronize()
    elapsed_cuda = time.perf_counter() - time_start
    m["direct to cuda(ms)"] = elapsed_cuda * 1000
    del y
    torch.cuda.empty_cache()

    df.append(m)

df = pd.DataFrame(df).round(2)
df.to_excel("tensor_copy_measurement.xlsx", index=False)
In general, pinning the tensor first tends to give a lower transfer latency, but this does not hold for every data point. There are also two odd phenomena:
- Transferring a tensor of shape (24, 2, 1, 1026, 4096) is faster than (24, 2, 1, 966, 4096)
- There are some outliers which take significantly more time.
| | shape | pin-memory(ms) | pinned->cuda(ms) | direct to cuda(ms) |
---|---|---|---|---|
0 | (24, 2, 1, 1, 4096) | 1245.59 | 0.33 | 0.24 |
1 | (24, 2, 1, 6, 4096) | 1.47 | 0.3 | 0.42 |
2 | (24, 2, 1, 11, 4096) | 2.45 | 0.42 | 0.53 |
3 | (24, 2, 1, 16, 4096) | 0.22 | 0.54 | 0.66 |
4 | (24, 2, 1, 21, 4096) | 0.19 | 0.59 | 0.81 |
5 | (24, 2, 1, 26, 4096) | 4.83 | 0.73 | 1 |
6 | (24, 2, 1, 31, 4096) | 0.31 | 0.86 | 1.1 |
7 | (24, 2, 1, 36, 4096) | 0.38 | 0.98 | 1.27 |
8 | (24, 2, 1, 41, 4096) | 0.45 | 1.13 | 1.46 |
9 | (24, 2, 1, 46, 4096) | 8.2 | 1.19 | 1.62 |
10 | (24, 2, 1, 51, 4096) | 0.56 | 1.31 | 1.86 |
11 | (24, 2, 1, 56, 4096) | 0.64 | 1.46 | 2.11 |
12 | (24, 2, 1, 61, 4096) | 0.71 | 1.54 | 2.36 |
13 | (24, 2, 1, 66, 4096) | 1.04 | 1.67 | 91.19 |
14 | (24, 2, 1, 71, 4096) | 1.09 | 1.74 | 2.82 |
15 | (24, 2, 1, 76, 4096) | 1.15 | 1.93 | 2.99 |
16 | (24, 2, 1, 81, 4096) | 1.22 | 1.97 | 3.2 |
17 | (24, 2, 1, 86, 4096) | 17.01 | 2.04 | 3.53 |
18 | (24, 2, 1, 91, 4096) | 1.06 | 2.16 | 3.67 |
19 | (24, 2, 1, 96, 4096) | 1.16 | 120.73 | 3.93 |
20 | (24, 2, 1, 101, 4096) | 1.16 | 2.42 | 4.08 |
21 | (24, 2, 1, 106, 4096) | 1.22 | 2.5 | 4.26 |
22 | (24, 2, 1, 111, 4096) | 1.28 | 2.6 | 4.45 |
23 | (24, 2, 1, 116, 4096) | 1.35 | 2.68 | 64.01 |
24 | (24, 2, 1, 121, 4096) | 1.4 | 2.78 | 4.88 |
25 | (24, 2, 1, 126, 4096) | 1.43 | 2.92 | 5.13 |
26 | (24, 2, 1, 131, 4096) | 1.5 | 3 | 5.29 |
27 | (24, 2, 1, 136, 4096) | 1.56 | 142.71 | 5.46 |
28 | (24, 2, 1, 141, 4096) | 1.66 | 3.23 | 5.67 |
29 | (24, 2, 1, 146, 4096) | 1.68 | 3.37 | 5.86 |
30 | (24, 2, 1, 151, 4096) | 1.72 | 3.39 | 11.65 |
31 | (24, 2, 1, 156, 4096) | 1.75 | 3.5 | 6.22 |
32 | (24, 2, 1, 161, 4096) | 1.79 | 3.63 | 6.46 |
33 | (24, 2, 1, 166, 4096) | 1.87 | 3.72 | 63.62 |
34 | (24, 2, 1, 171, 4096) | 35.3 | 3.78 | 6.86 |
35 | (24, 2, 1, 176, 4096) | 1.98 | 3.92 | 7.02 |
36 | (24, 2, 1, 181, 4096) | 2.7 | 128.2 | 7.37 |
37 | (24, 2, 1, 186, 4096) | 2.79 | 4.1 | 7.38 |
38 | (24, 2, 1, 191, 4096) | 2.86 | 4.24 | 92.4 |
39 | (24, 2, 1, 196, 4096) | 2.93 | 4.26 | 7.77 |
40 | (24, 2, 1, 201, 4096) | 3.01 | 4.39 | 8.01 |
41 | (24, 2, 1, 206, 4096) | 3.08 | 11.48 | 8.2 |
42 | (24, 2, 1, 211, 4096) | 3.14 | 4.61 | 8.46 |
43 | (24, 2, 1, 216, 4096) | 3.21 | 4.68 | 8.62 |
44 | (24, 2, 1, 221, 4096) | 3.29 | 4.8 | 8.77 |
45 | (24, 2, 1, 226, 4096) | 3.36 | 4.88 | 87.67 |
46 | (24, 2, 1, 231, 4096) | 3.43 | 4.96 | 9.14 |
47 | (24, 2, 1, 236, 4096) | 3.49 | 5.06 | 100.49 |
48 | (24, 2, 1, 241, 4096) | 3.52 | 5.26 | 9.56 |
49 | (24, 2, 1, 246, 4096) | 3.58 | 5.35 | 54.47 |
50 | (24, 2, 1, 251, 4096) | 3.64 | 5.43 | 9.98 |
51 | (24, 2, 1, 256, 4096) | 3.71 | 5.53 | 10.14 |
52 | (24, 2, 1, 261, 4096) | 3.77 | 5.62 | 10.37 |
53 | (24, 2, 1, 266, 4096) | 3.86 | 5.74 | 10.53 |
54 | (24, 2, 1, 271, 4096) | 3.94 | 5.81 | 10.72 |
55 | (24, 2, 1, 276, 4096) | 4.02 | 5.93 | 10.9 |
56 | (24, 2, 1, 281, 4096) | 4.09 | 6.01 | 11.16 |
57 | (24, 2, 1, 286, 4096) | 4.14 | 38.6 | 11.33 |
58 | (24, 2, 1, 291, 4096) | 4.21 | 6.29 | 11.53 |
59 | (24, 2, 1, 296, 4096) | 4.27 | 6.25 | 11.7 |
60 | (24, 2, 1, 301, 4096) | 4.36 | 6.46 | 11.92 |
61 | (24, 2, 1, 306, 4096) | 4.42 | 6.51 | 12.11 |
62 | (24, 2, 1, 311, 4096) | 4.5 | 65.78 | 12.26 |
63 | (24, 2, 1, 316, 4096) | 4.56 | 6.77 | 95.24 |
64 | (24, 2, 1, 321, 4096) | 4.64 | 6.82 | 12.69 |
65 | (24, 2, 1, 326, 4096) | 4.8 | 53.54 | 12.9 |
66 | (24, 2, 1, 331, 4096) | 4.86 | 7.02 | 43.83 |
67 | (24, 2, 1, 336, 4096) | 4.95 | 7.18 | 13.26 |
68 | (24, 2, 1, 341, 4096) | 5 | 7.2 | 13.45 |
69 | (24, 2, 1, 346, 4096) | 72.38 | 7.37 | 13.43 |
70 | (24, 2, 1, 351, 4096) | 5.13 | 7.44 | 13.54 |
71 | (24, 2, 1, 356, 4096) | 5.19 | 7.55 | 13.71 |
72 | (24, 2, 1, 361, 4096) | 5.28 | 7.64 | 14.06 |
73 | (24, 2, 1, 366, 4096) | 5.35 | 7.69 | 99.09 |
74 | (24, 2, 1, 371, 4096) | 5.42 | 7.8 | 14.34 |
75 | (24, 2, 1, 376, 4096) | 5.49 | 7.9 | 14.51 |
76 | (24, 2, 1, 381, 4096) | 5.57 | 8.04 | 14.67 |
77 | (24, 2, 1, 386, 4096) | 4.19 | 8.21 | 14.96 |
78 | (24, 2, 1, 391, 4096) | 4.27 | 8.25 | 137.99 |
79 | (24, 2, 1, 396, 4096) | 5.81 | 8.36 | 15.35 |
80 | (24, 2, 1, 401, 4096) | 5.85 | 8.4 | 15.59 |
81 | (24, 2, 1, 406, 4096) | 5.96 | 26.22 | 15.78 |
82 | (24, 2, 1, 411, 4096) | 5.99 | 8.6 | 15.93 |
83 | (24, 2, 1, 416, 4096) | 6.08 | 8.72 | 16.12 |
84 | (24, 2, 1, 421, 4096) | 6.17 | 8.82 | 166.67 |
85 | (24, 2, 1, 426, 4096) | 6.24 | 8.94 | 16.5 |
86 | (24, 2, 1, 431, 4096) | 6.28 | 9.07 | 16.67 |
87 | (24, 2, 1, 436, 4096) | 6.36 | 9.17 | 17.05 |
88 | (24, 2, 1, 441, 4096) | 6.72 | 9.38 | 17.11 |
89 | (24, 2, 1, 446, 4096) | 6.74 | 156.34 | 17.29 |
90 | (24, 2, 1, 451, 4096) | 6.81 | 140.54 | 17.45 |
91 | (24, 2, 1, 456, 4096) | 6.9 | 130 | 17.72 |
92 | (24, 2, 1, 461, 4096) | 6.92 | 121.27 | 17.83 |
93 | (24, 2, 1, 466, 4096) | 7.04 | 110.51 | 17.98 |
94 | (24, 2, 1, 471, 4096) | 7.09 | 99.92 | 18.18 |
95 | (24, 2, 1, 476, 4096) | 7.15 | 87.87 | 18.54 |
96 | (24, 2, 1, 481, 4096) | 7.26 | 69.33 | 18.79 |
97 | (24, 2, 1, 486, 4096) | 7.36 | 58.46 | 18.97 |
98 | (24, 2, 1, 491, 4096) | 7.43 | 46.41 | 19.16 |
99 | (24, 2, 1, 496, 4096) | 7.48 | 39.62 | 19.4 |
100 | (24, 2, 1, 501, 4096) | 7.57 | 21.8 | 19.55 |
101 | (24, 2, 1, 506, 4096) | 7.44 | 22.77 | 19.85 |
102 | (24, 2, 1, 511, 4096) | 5.78 | 10.54 | 19.98 |
103 | (24, 2, 1, 516, 4096) | 5.59 | 10.8 | 20.09 |
104 | (24, 2, 1, 521, 4096) | 5.66 | 10.83 | 20.42 |
105 | (24, 2, 1, 526, 4096) | 5.73 | 10.95 | 20.54 |
106 | (24, 2, 1, 531, 4096) | 5.75 | 11.03 | 20.71 |
107 | (24, 2, 1, 536, 4096) | 5.83 | 11.17 | 21.24 |
108 | (24, 2, 1, 541, 4096) | 5.87 | 11.23 | 21.18 |
109 | (24, 2, 1, 546, 4096) | 5.91 | 11.34 | 21.24 |
110 | (24, 2, 1, 551, 4096) | 5.98 | 11.47 | 138.33 |
111 | (24, 2, 1, 556, 4096) | 8.44 | 11.57 | 21.63 |
112 | (24, 2, 1, 561, 4096) | 8.48 | 11.68 | 21.86 |
113 | (24, 2, 1, 566, 4096) | 8.57 | 11.77 | 22.08 |
114 | (24, 2, 1, 571, 4096) | 8.64 | 95.79 | 22.19 |
115 | (24, 2, 1, 576, 4096) | 8.73 | 11.9 | 22.39 |
116 | (24, 2, 1, 581, 4096) | 8.78 | 12.06 | 22.68 |
117 | (24, 2, 1, 586, 4096) | 8.88 | 12.15 | 22.82 |
118 | (24, 2, 1, 591, 4096) | 8.95 | 12.29 | 23 |
119 | (24, 2, 1, 596, 4096) | 9.01 | 130.26 | 23.32 |
120 | (24, 2, 1, 601, 4096) | 9.09 | 12.38 | 23.5 |
121 | (24, 2, 1, 606, 4096) | 9.18 | 12.51 | 44.14 |
122 | (24, 2, 1, 611, 4096) | 9.24 | 12.65 | 23.82 |
123 | (24, 2, 1, 616, 4096) | 9.32 | 12.71 | 24.02 |
124 | (24, 2, 1, 621, 4096) | 9.4 | 12.82 | 24.28 |
125 | (24, 2, 1, 626, 4096) | 9.46 | 12.94 | 133.27 |
126 | (24, 2, 1, 631, 4096) | 6.82 | 13.13 | 24.65 |
127 | (24, 2, 1, 636, 4096) | 7 | 163.93 | 24.81 |
128 | (24, 2, 1, 641, 4096) | 6.92 | 13.23 | 177.68 |
129 | (24, 2, 1, 646, 4096) | 6.97 | 13.35 | 25.13 |
130 | (24, 2, 1, 651, 4096) | 9.56 | 107.45 | 25.39 |
131 | (24, 2, 1, 656, 4096) | 9.63 | 13.55 | 101.99 |
132 | (24, 2, 1, 661, 4096) | 9.68 | 13.65 | 25.76 |
133 | (24, 2, 1, 666, 4096) | 9.75 | 54.84 | 25.93 |
134 | (24, 2, 1, 671, 4096) | 9.82 | 13.86 | 57.36 |
135 | (24, 2, 1, 676, 4096) | 9.95 | 13.93 | 26.47 |
136 | (24, 2, 1, 681, 4096) | 7.37 | 14.09 | 26.48 |
137 | (24, 2, 1, 686, 4096) | 140.9 | 14.22 | 26.63 |
138 | (24, 2, 1, 691, 4096) | 7.45 | 14.28 | 26.9 |
139 | (24, 2, 1, 696, 4096) | 7.51 | 14.37 | 27.32 |
140 | (24, 2, 1, 701, 4096) | 7.55 | 14.46 | 27.26 |
141 | (24, 2, 1, 706, 4096) | 7.7 | 124.03 | 27.07 |
142 | (24, 2, 1, 711, 4096) | 7.66 | 14.68 | 27.1 |
143 | (24, 2, 1, 716, 4096) | 7.73 | 14.71 | 27.47 |
144 | (24, 2, 1, 721, 4096) | 7.78 | 14.87 | 160.05 |
145 | (24, 2, 1, 726, 4096) | 10.95 | 14.91 | 27.86 |
146 | (24, 2, 1, 731, 4096) | 11.01 | 14.99 | 180.19 |
147 | (24, 2, 1, 736, 4096) | 11.05 | 15.09 | 28.19 |
148 | (24, 2, 1, 741, 4096) | 11.14 | 15.2 | 129.51 |
149 | (24, 2, 1, 746, 4096) | 11.23 | 15.25 | 28.61 |
150 | (24, 2, 1, 751, 4096) | 11.32 | 15.45 | 86.7 |
151 | (24, 2, 1, 756, 4096) | 11.38 | 15.6 | 29.12 |
152 | (24, 2, 1, 761, 4096) | 11.46 | 15.64 | 38.48 |
153 | (24, 2, 1, 766, 4096) | 11.53 | 15.73 | 29.29 |
154 | (24, 2, 1, 771, 4096) | 11.61 | 15.77 | 29.51 |
155 | (24, 2, 1, 776, 4096) | 11.65 | 15.94 | 29.56 |
156 | (24, 2, 1, 781, 4096) | 11.72 | 15.95 | 30.13 |
157 | (24, 2, 1, 786, 4096) | 11.81 | 16.21 | 29.89 |
158 | (24, 2, 1, 791, 4096) | 11.87 | 16.14 | 30.07 |
159 | (24, 2, 1, 796, 4096) | 11.94 | 16.29 | 30.48 |
160 | (24, 2, 1, 801, 4096) | 12.02 | 16.37 | 30.56 |
161 | (24, 2, 1, 806, 4096) | 12.12 | 16.46 | 30.89 |
162 | (24, 2, 1, 811, 4096) | 12.17 | 112.46 | 31.02 |
163 | (24, 2, 1, 816, 4096) | 12.27 | 16.63 | 31.27 |
164 | (24, 2, 1, 821, 4096) | 12.34 | 16.91 | 31.31 |
165 | (24, 2, 1, 826, 4096) | 12.4 | 16.77 | 31.63 |
166 | (24, 2, 1, 831, 4096) | 12.48 | 17.02 | 32.18 |
167 | (24, 2, 1, 836, 4096) | 12.54 | 17.02 | 140.98 |
168 | (24, 2, 1, 841, 4096) | 12.6 | 17.15 | 32.2 |
169 | (24, 2, 1, 846, 4096) | 12.7 | 17.23 | 32.36 |
170 | (24, 2, 1, 851, 4096) | 12.77 | 17.36 | 33.89 |
171 | (24, 2, 1, 856, 4096) | 12.84 | 17.49 | 32.87 |
172 | (24, 2, 1, 861, 4096) | 13 | 17.48 | 32.95 |
173 | (24, 2, 1, 866, 4096) | 12.98 | 17.6 | 33.1 |
174 | (24, 2, 1, 871, 4096) | 13.08 | 17.69 | 140.24 |
175 | (24, 2, 1, 876, 4096) | 13.19 | 164.71 | 33.59 |
176 | (24, 2, 1, 881, 4096) | 13.19 | 17.82 | 33.64 |
177 | (24, 2, 1, 886, 4096) | 13.3 | 17.91 | 34.15 |
178 | (24, 2, 1, 891, 4096) | 13.36 | 17.92 | 34.11 |
179 | (24, 2, 1, 896, 4096) | 13.45 | 18.14 | 47.32 |
180 | (24, 2, 1, 901, 4096) | 13.51 | 161.89 | 187.27 |
181 | (24, 2, 1, 906, 4096) | 13.55 | 137.23 | 185.95 |
182 | (24, 2, 1, 911, 4096) | 13.64 | 132.53 | 188.22 |
183 | (24, 2, 1, 916, 4096) | 13.71 | 126.04 | 186.95 |
184 | (24, 2, 1, 921, 4096) | 13.78 | 119.1 | 159.75 |
185 | (24, 2, 1, 926, 4096) | 13.87 | 103.94 | 148.76 |
186 | (24, 2, 1, 931, 4096) | 13.92 | 107.04 | 144.25 |
187 | (24, 2, 1, 936, 4096) | 14.03 | 91.58 | 146.02 |
188 | (24, 2, 1, 941, 4096) | 14.13 | 91.43 | 142.19 |
189 | (24, 2, 1, 946, 4096) | 14.16 | 96.95 | 139.02 |
190 | (24, 2, 1, 951, 4096) | 14.14 | 90.7 | 134 |
191 | (24, 2, 1, 956, 4096) | 14.14 | 83.12 | 128.91 |
192 | (24, 2, 1, 961, 4096) | 14.21 | 67.87 | 121.61 |
193 | (24, 2, 1, 966, 4096) | 14.29 | 74.24 | 113.4 |
194 | (24, 2, 1, 971, 4096) | 14.38 | 67.69 | 112.32 |
195 | (24, 2, 1, 976, 4096) | 14.41 | 52.07 | 98.98 |
196 | (24, 2, 1, 981, 4096) | 14.48 | 43.74 | 96.83 |
197 | (24, 2, 1, 986, 4096) | 14.57 | 42.39 | 95.48 |
198 | (24, 2, 1, 991, 4096) | 14.64 | 43.3 | 89.24 |
199 | (24, 2, 1, 996, 4096) | 14.76 | 38.1 | 87.16 |
200 | (24, 2, 1, 1001, 4096) | 14.79 | 34.66 | 80.94 |
201 | (24, 2, 1, 1006, 4096) | 14.89 | 20.43 | 75.97 |
202 | (24, 2, 1, 1011, 4096) | 14.68 | 23.79 | 63.71 |
203 | (24, 2, 1, 1016, 4096) | 14.82 | 20.61 | 48.38 |
204 | (24, 2, 1, 1021, 4096) | 14.9 | 20.74 | 39.35 |
205 | (24, 2, 1, 1026, 4096) | 14.92 | 20.91 | 39.28 |
206 | (24, 2, 1, 1031, 4096) | 15 | 20.84 | 39.39 |
207 | (24, 2, 1, 1036, 4096) | 14.92 | 21.11 | 39.48 |
208 | (24, 2, 1, 1041, 4096) | 14.93 | 21.2 | 39.67 |
209 | (24, 2, 1, 1046, 4096) | 15.01 | 21.29 | 39.89 |
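To compare rows of different sizes, the latencies can be converted into an effective bandwidth. The sketch below is only arithmetic on the table values: it assumes 2 bytes per float16 element and uses decimal gigabytes, and the helper names `tensor_nbytes` and `effective_bandwidth_gb_s` are made up for illustration.

```python
import math

BYTES_PER_FP16 = 2  # a float16 element occupies 2 bytes


def tensor_nbytes(shape, bytes_per_element=BYTES_PER_FP16):
    """Size in bytes of a dense tensor with the given shape."""
    return math.prod(shape) * bytes_per_element


def effective_bandwidth_gb_s(shape, elapsed_ms, bytes_per_element=BYTES_PER_FP16):
    """Bytes moved divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    return tensor_nbytes(shape, bytes_per_element) / (elapsed_ms / 1000) / 1e9


# pinned->cuda latencies taken from rows 193 and 205 of the table
slow = effective_bandwidth_gb_s((24, 2, 1, 966, 4096), 74.24)
fast = effective_bandwidth_gb_s((24, 2, 1, 1026, 4096), 20.91)
print(f"seq=966:  {slow:.1f} GB/s")
print(f"seq=1026: {fast:.1f} GB/s")
```

Seen this way, the seq=966 row is not just slower in absolute terms: the larger tensor was copied at a several times higher rate, which suggests the slow rows reflect something other than the amount of data.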
My questions are:
- Am I setting up the experiment correctly? What factors am I missing?
- What is the fastest way to transfer a tensor from CPU to GPU on A100?
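One variant of the measurement that might rule out some noise is sketched below. It runs a few untimed warm-up copies first (the 1245.59 ms in row 0 probably includes one-time CUDA initialization, though I have not verified that), repeats each copy several times, and reports the median so that a single outlier does not dominate. The helper names `median_time_ms` and `benchmark_pinned_copy`, and the warm-up and repeat counts, are my own choices rather than anything from PyTorch.

```python
import statistics
import time


def median_time_ms(fn, sync=lambda: None, warmup=3, repeats=10):
    """Median wall-clock time of fn() in milliseconds, after untimed warm-up calls."""
    for _ in range(warmup):
        fn()
    sync()  # make sure warm-up work has finished before timing starts
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued GPU work before stopping the clock
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def benchmark_pinned_copy(seq, device="cuda:0"):
    """Hypothetical usage: pin once outside the timed region, time only the copy."""
    import torch  # imported here so the timing helper also works without PyTorch

    x = torch.rand((24, 2, 1, seq, 4096)).to(torch.float16).pin_memory()
    return median_time_ms(
        lambda: x.to(device, non_blocking=True),
        sync=torch.cuda.synchronize,
    )
```

Calling `benchmark_pinned_copy(966)` and `benchmark_pinned_copy(1026)` on the GPU machine would then show whether the ordering between those two shapes survives repeated measurement.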