Efficiently computing gradients for each example in a large dataset

Given a trained model (M), I'm interested in scoring the utility of new (unseen) examples in a pool for an active learning task. For this, I need the magnitude of the gradient that each new example would induce if M were trained on it. In code, it looks something like this:

losses, grads = [], []
for i in range(X_pool.shape[0]):
    pred = model(X_pool[i:i+1])            # forward pass on a single example
    loss = loss_func(pred, y_pool[i:i+1])

    model.zero_grad()
    loss.backward()

    # .item() detaches the values so the autograd graph isn't kept alive
    losses.append(loss.item())
    grads.append(layer.weight.grad.norm().item())  # `layer` is the layer of interest
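
One partial improvement I'm aware of is restricting the backward pass to the parameters I actually need, via the `inputs` argument of `Tensor.backward` (available since PyTorch 1.8). A minimal sketch of the changed loop body, assuming `layer` is the module whose gradient norm I want:

    model.zero_grad(set_to_none=True)
    # Accumulate gradients only into the parameters of `layer`;
    # autograd skips work that is not needed for these inputs.
    loss.backward(inputs=list(layer.parameters()))

This trims some work per iteration, but it is still one backward pass per example.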

Even so, one backward pass per example is quite slow for a large pool, especially since this whole computation sits in an inner loop in my scenario. How can this be made more efficient, ideally by vectorizing over the pool?
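
The direction I've been eyeing is computing all per-example gradients in one vectorized call with `torch.func` (the per-sample-gradients recipe). A minimal sketch, assuming PyTorch >= 2.0 and the same `model`, `loss_func`, `X_pool`, `y_pool` as above; the key "layer.weight" is a placeholder for whatever name `model.named_parameters()` gives the layer of interest:

    import torch
    from torch.func import functional_call, vmap, grad_and_value

    # Detached copies of parameters/buffers so the model can be called functionally.
    params = {k: v.detach() for k, v in model.named_parameters()}
    buffers = {k: v.detach() for k, v in model.named_buffers()}

    def per_sample_loss(params, buffers, x, y):
        # Add a batch dimension of 1 so the model sees the shape it expects.
        pred = functional_call(model, {**params, **buffers}, (x.unsqueeze(0),))
        return loss_func(pred, y.unsqueeze(0))

    # grad_and_value returns (grads, loss); vmap vectorizes both over the pool.
    compute = vmap(grad_and_value(per_sample_loss), in_dims=(None, None, 0, 0))
    per_sample_grads, losses = compute(params, buffers, X_pool, y_pool)

    # Gradient norm per example for the layer of interest.
    grad_norms = per_sample_grads["layer.weight"].flatten(1).norm(dim=1)

If memory becomes an issue for a large pool, `vmap` accepts a `chunk_size` argument in recent PyTorch versions to process the pool in slices. Is this the right approach, or is there something faster?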

I appreciate any suggestions or pointers. Thanks!

Did you ever figure anything out? I'm doing a linear regression with one layer, but it's diverging… any advice?