Big gaps between forward and backward computations when profiling the forward and backward pass of a convolution

I am profiling the forward and backward pass of a convolution with the code below and visualizing the resulting trace in chrome://tracing.

I notice big gaps between the forward and backward computations, before and after torch::autograd::GraphRoot, as shown in the picture below.

Any ideas why these gaps are there?

import torch
import torch.nn.functional as F
import torch.autograd as Ag

Device = torch.device("cuda:0")
ProblemSize = 100
NumChannels = 5
NumFilters = 96
ClassTypeName = "Single"
ClassType = torch.float32

X = torch.rand(1, NumChannels, ProblemSize, ProblemSize, dtype=ClassType, requires_grad=True).to(Device)
weights = torch.rand(NumFilters, NumChannels, 10, 10, dtype=ClassType, requires_grad=True).to(Device)
    
# warm up: run forward/backward twice so one-time CUDA and cuDNN setup costs don't show up in the profile
Y = F.conv2d(X, weights)
Ysum = torch.sum(Y)
grad = Ag.grad(Ysum, weights)

Y = F.conv2d(X, weights)
Ysum = torch.sum(Y)
grad = Ag.grad(Ysum, weights)

#profile
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    Y = F.conv2d(X, weights)
    Ysum = torch.sum(Y)
    grad = Ag.grad(Ysum, weights)
print(prof)
prof.export_chrome_trace("pytorch_profile_conv_grad.json")

I am also running into the same issue and am looking for a solution. @ptrblck @albanD

Hi,

Quite a few things happen before the start of the backward pass (discovering the graph and working out what needs to be computed in it). I don't know what the scale of the benchmark is here, but for very small ops this overhead can be noticeable.
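One way to check whether the gaps are mostly this fixed per-backward setup cost is to time the same forward/backward at two problem sizes and see how the per-iteration time scales. The sketch below is not from this thread; it reuses the shapes from the original snippet, picks an arbitrary larger size (400) for comparison, and times the region with CUDA events.

import torch
import torch.nn.functional as F
import torch.autograd as Ag

Device = torch.device("cuda:0")

def time_grad(problem_size, num_channels=5, num_filters=96, repeats=10):
    # create leaf tensors directly on the GPU
    x = torch.rand(1, num_channels, problem_size, problem_size,
                   device=Device, requires_grad=True)
    w = torch.rand(num_filters, num_channels, 10, 10,
                   device=Device, requires_grad=True)
    Ag.grad(torch.sum(F.conv2d(x, w)), w)  # warm up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(repeats):
        Ysum = torch.sum(F.conv2d(x, w))
        Ag.grad(Ysum, w)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / repeats  # ms per forward+backward

# If the small case is barely faster than the large one, a fixed per-call
# cost (graph discovery, GraphRoot dispatch, etc.) dominates at this scale.
for size in (100, 400):
    print(size, "->", round(time_grad(size), 3), "ms")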
