As I learned from the documentation, “By default, GPU operations are asynchronous. When you call a function that uses the GPU, the operations are enqueued to the particular device, but not necessarily executed until later. This allows us to execute more computations in parallel, including operations on CPU or other GPUs.”
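For reference, the quoted behavior is easy to see with a plain matmul. This is just a sketch with small sizes; it falls back to CPU when no GPU is present, in which case the two timings are similar:

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(1000, 1000, device=device)

if torch.cuda.is_available():
    torch.cuda.synchronize()
t = time.time()
y = x @ x  # on a GPU this only enqueues the kernel and returns quickly
enqueue = time.time() - t
if torch.cuda.is_available():
    torch.cuda.synchronize()  # block until the kernel actually finishes
total = time.time() - t
print(enqueue, total)  # on a GPU, enqueue is typically much smaller than total
```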
However, nn.LSTM does not seem to behave asynchronously, and I don’t understand why. See the examples below.
LSTM:
import time
import torch
import torch.nn as nn

device = torch.device("cuda")
a = torch.rand(1000, 20, 10000).to(device)
net = nn.LSTM(10000, 100).to(device)
torch.cuda.synchronize()
t = time.time()
with torch.no_grad():
    for i in range(10):
        c = net(a)
print(time.time() - t)  # before synchronize
torch.cuda.synchronize()
print(time.time() - t)  # after synchronize
0.3161756992340088
0.3302645683288574
Linear:
a = torch.rand(1000, 20, 10000).to(device)
net = nn.Linear(10000, 100).to(device)
torch.cuda.synchronize()
t = time.time()
with torch.no_grad():
    for i in range(10):
        c = net(a)
print(time.time() - t)  # before synchronize
torch.cuda.synchronize()
print(time.time() - t)  # after synchronize
0.0007715225219726562
0.02486276626586914
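As an aside (not part of my original runs), wall-clock timing around asynchronous launches is easy to get wrong; torch.cuda.Event records timestamps on the device itself. A sketch using smaller tensors than above, falling back to wall-clock timing when CUDA is unavailable:

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.rand(100, 20, 1000).to(device)
net = nn.Linear(1000, 100).to(device)

if torch.cuda.is_available():
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    with torch.no_grad():
        for _ in range(10):
            c = net(a)
    end.record()
    torch.cuda.synchronize()          # wait so elapsed_time is valid
    print(start.elapsed_time(end))    # milliseconds spent on the device
else:
    t = time.time()
    with torch.no_grad():
        for _ in range(10):
            c = net(a)
    print(time.time() - t)            # seconds of wall-clock time on CPU
```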