I want to measure the execution time of a torch operation. I have tried three approaches.

First, the PyTorch profiler:
import torch
from torch.profiler import profile, record_function, ProfilerActivity

device = "cuda:0"
add_1 = torch.randn(1, 64, 1, 1).to(device)

with profile(activities=[ProfilerActivity.CUDA, ProfilerActivity.CPU],
             record_shapes=True, profile_memory=True, with_stack=True) as prof:
    with record_function("operation"):
        rsqrt = torch.ops.aten.rsqrt.default(add_1)
print(prof.key_averages().table())
The profiler reports the following execution times:
Self CPU time total: 22.537ms
Self CUDA time total: 1.000us
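To see the per-operator breakdown, I also print the summary sorted by CUDA time (a small variation of the call above; the row_limit value is just an arbitrary choice):

# Same prof object as above, sorted by total CUDA time
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))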
Second, I timed the call with time.time() around explicit stream synchronizations:

import time

torch.cuda.current_stream().synchronize()
t0 = time.time()
rsqrt = torch.ops.aten.rsqrt.default(add_1)
torch.cuda.current_stream().synchronize()
t1 = time.time()
This snippet gives me t1 - t0 ≈ 0.17 ms.
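Since a single launch of such a tiny kernel is dominated by launch overhead, I am also wondering whether I should average over many iterations instead, something like this (a sketch; num_iters = 1000 is an arbitrary choice):

num_iters = 1000  # arbitrary repetition count for averaging
torch.cuda.current_stream().synchronize()
t0 = time.time()
for _ in range(num_iters):
    rsqrt = torch.ops.aten.rsqrt.default(add_1)
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(f"Average time per call: {(t1 - t0) / num_iters * 1000} ms")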
Third, I used CUDA events:

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
rsqrt = torch.ops.aten.rsqrt.default(add_1)
end.record()
torch.cuda.current_stream().synchronize()
print(f"GPU time: {start.elapsed_time(end)} ms")
torch.cuda.Event gives me a GPU execution time of 0.10 ms.
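I also considered a warm-up phase plus averaging with events, roughly like this (a sketch; the warm-up and repetition counts are arbitrary choices):

# Warm-up so one-time costs (CUDA context setup, allocator caching) are excluded
for _ in range(10):
    torch.ops.aten.rsqrt.default(add_1)
torch.cuda.current_stream().synchronize()

num_iters = 1000  # arbitrary repetition count
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(num_iters):
    torch.ops.aten.rsqrt.default(add_1)
end.record()
torch.cuda.current_stream().synchronize()
print(f"Average GPU time per call: {start.elapsed_time(end) / num_iters} ms")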
-
What is the right approach to measuring the execution time of a torch operation?
-
I also have a model whose training time I want to measure:
import time

import torch
import torchvision

device = "cuda:0"
num_iter = 1

model = torchvision.models.resnet50(pretrained=True).to(device)
inputs = torch.randn(1, 3, 224, 224).to(device)
labels = torch.randn(1, 1000).to(device)

learning_rate = 0.001
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Synchronize before and after so the timed region covers all queued GPU work
torch.cuda.current_stream().synchronize()
t0 = time.time()

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

torch.cuda.current_stream().synchronize()
t1 = time.time()

print(f"Total time taken for the model training: {(t1 - t0) / num_iter * 1000} ms")
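I also wondered whether I should warm up before timing and average over several iterations, something like this (a sketch; the num_warmup and num_iter values are arbitrary choices):

num_warmup = 3   # arbitrary warm-up iterations to exclude one-time setup costs
num_iter = 10    # arbitrary number of timed iterations

for i in range(num_warmup + num_iter):
    if i == num_warmup:
        # start the clock only after the warm-up iterations have finished
        torch.cuda.current_stream().synchronize()
        t0 = time.time()
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(f"Average time per training iteration: {(t1 - t0) / num_iter * 1000} ms")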
Is this the right way to measure the training time of a model?
Any help would be appreciated. Thanks.