I need to profile the backward pass of a model running on a GPU, and see how much time each layer's gradient computation took, along with the achieved TFLOPs during the operation. The problem is that if I use a profiler such as Nsight Systems, I cannot tell which kernel ran for which layer, because I cannot annotate the backward pass using NVTX. Is there some way the backward pass can be profiled?
autograd.profiler should give you the runtime for the backward functions. If you spot a bottleneck, you could run Nsight Systems in isolation on this particular module.
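As a minimal sketch of that approach (the model and tensor sizes here are made up for illustration), `torch.autograd.profiler` records each backward function as its own event, so per-layer gradient timings show up as separate rows; on a GPU you would additionally enable CUDA timing (e.g. via the newer `torch.profiler`) to get device-side times:

```python
import torch

# Toy model; layer sizes are arbitrary for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
x = torch.randn(32, 64)

# Record both forward and backward ops.
with torch.autograd.profiler.profile() as prof:
    out = model(x)
    out.sum().backward()

# Backward nodes appear as their own entries (e.g. AddmmBackward0),
# so they can be attributed to individual layers.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

The table lists each op's self and total time, which is usually enough to spot which layer's backward dominates before dropping down to Nsight Systems.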
Hi, I have the same issue profiling the backward pass of each layer. Can you give me some hints for solving this problem? Thanks for any code or suggestions.
If you just want to profile the backward pass and get its runtime, this code snippet might be helpful:
```python
import time

import torch


def profile(module, input):
    # Warmup: run the forward and backward passes a few times
    # so CUDA kernels are compiled and caches are warm.
    for _ in range(50):
        output = module(input)
    g0 = torch.rand_like(output)
    for _ in range(50):
        output = module(input)
        output.backward(g0)

    nb_iters = 100

    # Profile forward pass
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(nb_iters):
        output = module(input)
    torch.cuda.synchronize()
    end = time.time()
    fwd_time = (end - start) / nb_iters

    # Profile forward + backward pass
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(nb_iters):
        output = module(input)
        # Reset the gradient so backward doesn't accumulate
        # (assumes the module has a .weight parameter).
        module.weight.grad = None
        output.backward(g0)
    torch.cuda.synchronize()
    end = time.time()
    all_time = (end - start) / nb_iters

    # Backward time is the combined time minus the forward-only time.
    bwd_time = all_time - fwd_time
    return fwd_time, bwd_time
```