Benchmark Backward

What is the recommended way to benchmark the backward path?

I have a large model and a surprisingly long backward time.

I suspect that the long backward time is caused by a few layers.

Is there a way to benchmark a model's backward pass that gives a list of layers along with the time each one took in the backward?

Would you be looking for something like the autograd profiler?

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import profiler

# Define a simple neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 20)
        self.fc3 = nn.Linear(20, 5)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize the network, loss function, and optimizer
net = SimpleNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Dummy input and target
inputs = torch.randn(1, 10)  # avoid shadowing the built-in input()
target = torch.randn(1, 5)

# Forward pass
output = net(inputs)
loss = criterion(output, target)

# Profile the backward pass
with profiler.profile(record_shapes=True) as prof:
    with profiler.record_function("backward"):
        loss.backward()

# Print the profiling results
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

Running this prints a table like the following (exact timings vary between runs and machines):

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                               backward        27.23%       8.207ms       100.00%      30.145ms      30.145ms             1  
                                aten::mse_loss_backward        17.68%       5.330ms        35.11%      10.585ms       5.293ms             2  
autograd::engine::evaluate_function: MseLossBackward...         0.11%      33.000us        29.28%       8.827ms       8.827ms             1  
                                       MseLossBackward0        11.04%       3.327ms        29.17%       8.794ms       8.794ms             1  
     autograd::engine::evaluate_function: ReluBackward0         0.09%      28.000us        19.90%       5.999ms       2.999ms             2  
                                          ReluBackward0         0.10%      31.000us        19.81%       5.971ms       2.986ms             2  
                               aten::threshold_backward        19.70%       5.940ms        19.70%       5.940ms       2.970ms             2  
    autograd::engine::evaluate_function: AddmmBackward0         0.89%     269.000us        19.56%       5.897ms       1.966ms             3  
                                              aten::sum         0.27%      82.000us        11.19%       3.373ms       1.124ms             3  
                                            aten::fill_        10.93%       3.294ms        10.93%       3.294ms     823.500us             4  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 30.145ms
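A few adjustments usually make this table easier to act on. The `CPU total` column includes time spent in child calls, which is why the `backward` row always sits at 100%; sorting by `self_cpu_time_total` instead points at the ops that actually do the work. The first backward call also pays one-time setup costs, which likely accounts for most of the ~30 ms reported for such a tiny network, so a few warm-up iterations before profiling help. Finally, since `record_shapes=True` is already set, `key_averages(group_by_input_shape=True)` should split ops such as `aten::mm` into separate rows per tensor shape, which helps map them back to individual layers. The sketch below uses the newer `torch.profiler` API; the `nn.Sequential` stand-in model, the batch size of 64, the five warm-up iterations and the `backward_pass` label are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, record_function

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for SimpleNet: three Linear layers with distinct shapes.
net = nn.Sequential(
    nn.Linear(10, 50), nn.ReLU(),
    nn.Linear(50, 20), nn.ReLU(),
    nn.Linear(20, 5),
).to(device)
criterion = nn.MSELoss()
inputs = torch.randn(64, 10, device=device)
target = torch.randn(64, 5, device=device)


def forward_loss():
    net.zero_grad(set_to_none=True)
    return criterion(net(inputs), target)


# Warm-up: the first iterations pay one-time costs (allocations, lazy
# initialization) that would otherwise dominate the measurement.
for _ in range(5):
    forward_loss().backward()

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

loss = forward_loss()  # run the forward pass outside the profiled region
with profile(activities=activities, record_shapes=True) as prof:
    with record_function("backward_pass"):
        loss.backward()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU kernels to finish

# Grouping by input shape should put ops issued by different layers
# (e.g. aten::mm) into separate rows, since each layer sees different shapes.
events = prof.key_averages(group_by_input_shape=True)
print(events.table(sort_by="self_cpu_time_total", row_limit=15))
```

On a GPU, the CUDA self-time column is usually more informative than CPU time, because the backward kernels run asynchronously and CPU time mostly reflects launch overhead.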
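Note that the profiler reports autograd operations (`AddmmBackward0`, `ReluBackward0`, ...) rather than layers. For a rough per-layer breakdown, module backward hooks are one alternative: a backward pre-hook fires when the gradient w.r.t. a module's output arrives, and a full backward hook fires once the gradient w.r.t. its input has been computed, so the time between the two approximates that module's backward. This is a minimal sketch assuming PyTorch 2.0 or later (for `register_full_backward_pre_hook`) and CPU execution; `attach_backward_timers` is a hypothetical helper, not a PyTorch API. The input is created with `requires_grad=True` because, as far as I can tell, when no module input requires grad PyTorch calls the full backward hook as soon as the output gradient arrives, which would report roughly zero for the first layer. Functional calls made outside any `nn.Module`, such as the `torch.relu` calls in `SimpleNet.forward`, are not attributed to any layer, and on a GPU the hooks would need `torch.cuda.synchronize()` calls to measure kernel time, which itself perturbs the run.

```python
import time

import torch
import torch.nn as nn


class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 20)
        self.fc3 = nn.Linear(20, 5)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)


def attach_backward_timers(model):
    """Hypothetical helper: for every leaf module, record the wall-clock time
    between the arrival of the gradient w.r.t. its output (pre-hook) and the
    moment the gradient w.r.t. its input is ready (full backward hook)."""
    starts, timings, handles = {}, {}, []
    for name, module in model.named_modules():
        if any(True for _ in module.children()):
            continue  # skip containers such as the top-level model itself

        def pre_hook(mod, grad_output, name=name):
            starts[name] = time.perf_counter()

        def post_hook(mod, grad_input, grad_output, name=name):
            timings[name] = time.perf_counter() - starts[name]

        # register_full_backward_pre_hook requires PyTorch >= 2.0
        handles.append(module.register_full_backward_pre_hook(pre_hook))
        handles.append(module.register_full_backward_hook(post_hook))
    return timings, handles


net = SimpleNet()
timings, handles = attach_backward_timers(net)

# requires_grad=True so that fc1's full backward hook fires after fc1's
# backward has run, not as soon as its output gradient arrives.
inputs = torch.randn(64, 10, requires_grad=True)
target = torch.randn(64, 5)

# A few iterations; each one overwrites the previous timings, so the
# printed numbers come from the last (warmed-up) backward pass.
for _ in range(3):
    net.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(net(inputs), target)
    loss.backward()

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds * 1e3:.3f} ms")

for handle in handles:
    handle.remove()  # detach the timers when done
```

The list is printed from slowest to fastest layer. For a model this small the numbers mostly reflect overhead, but on a large model the same approach should make the few layers dominating the backward stand out.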