I am training ResNet-101 on Tiny ImageNet. How can I compute the gradient L2 norm and the gradient noise (see the last two lines of the code below) per epoch? L is a standard loss function.
My idea is to compute them per iteration and then average over the epoch. Is that approach valid for the gradient norm?
for batch_num, (X, y) in enumerate(train_loader):
    X = X.to(device)
    y = y.to(device)
    optimizer.zero_grad()
    y_pred = model(X)
    loss = loss_fn(y_pred, y)
    loss_value += loss.item()
    loss.backward()
    optimizer.step()
    # Collect the gradients of all parameters for this batch
    all_batch_gradients = []
    for param in model.parameters():
        all_batch_gradients.append(param.grad.view(-1, 1).to("cpu"))
    all_batch_gradients = torch.cat(all_batch_gradients)
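To make the question concrete, here is a minimal self-contained sketch of what I mean, assuming "noise" is the deviation of each batch gradient from the epoch-mean gradient (the tiny `nn.Linear` model and random batches are placeholders standing in for ResNet-101 and `train_loader`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)                       # placeholder for ResNet-101
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Fake "loader": a few random batches standing in for train_loader
batches = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(5)]

grad_norms = []   # per-iteration L2 norms
batch_grads = []  # flattened per-batch gradient vectors, for the noise estimate
for X, y in batches:
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    # Flatten all parameter gradients into one vector BEFORE the step
    g = torch.cat([p.grad.detach().reshape(-1) for p in model.parameters()])
    grad_norms.append(g.norm(p=2).item())
    batch_grads.append(g.clone())
    optimizer.step()

mean_norm = sum(grad_norms) / len(grad_norms)         # average norm over the epoch
G = torch.stack(batch_grads)                          # (num_batches, num_params)
noise = (G - G.mean(dim=0)).norm(dim=1).mean().item() # mean deviation from epoch mean
print(mean_norm, noise)
```

Two caveats I am aware of: averaging per-iteration norms gives the mean batch-gradient norm, which is not the same as the norm of the full-batch gradient (E‖g‖ ≠ ‖E g‖); and because `optimizer.step()` updates the weights every iteration, the per-batch gradients within an epoch are taken at different parameter values, so this "noise" estimate mixes sampling noise with parameter drift.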