Computing the gradient w.r.t. the network parameters on each input sample


I need to compute the norm of the gradients w.r.t. the network parameters for each input sample individually. Right now I am doing this as follows:

pred = net(input)
for i in range(len(input)):
    net.zero_grad()
    loss = criterion(pred[i:i + 1], targets[i:i + 1])  # only consider the loss on the i-th sample
    loss.backward(retain_graph=True)  # keep the graph so we can backprop again on the next sample

    # gather gradients w.r.t. the weights on the current sample
    grad_l = []
    for param in net.parameters():
        # check if the param has a grad
        if param.grad is not None:
            grad_l.append(param.grad.flatten())
    grad_l =, 0)
    norm = grad_l.norm(2)

However, it is notably slow. Is there any other way/trick to do so in a faster manner?