I want to save time by using the gradients of the last layer in place of the gradients of all layers. However, I find that computing the gradient of only the last layer takes the same time as computing the gradients of all layers.
The code for calculating the gradient of the last layer:
last_layer = list(net.children())[-1]
for param in net.parameters():
    param.requires_grad = False
for param in last_layer.parameters():
    param.requires_grad = True
for i in range(len(inputs)):
    optimizer.zero_grad()
    loss[i].backward(retain_graph=True)
    last_grad_norm = 0.0
    for name, para in last_layer.named_parameters():
        if 'weight' in name:
            last_grad_norm += para.grad.norm().cpu().item()
    result.append(last_grad_norm)
The code for calculating the gradients of all layers:
for i in range(len(inputs)):
    optimizer.zero_grad()
    loss[i].backward(retain_graph=True)
    all_grad_norm = 0.0
    # sum the weight-gradient norms over every layer of the network
    for name, para in net.named_parameters():
        if 'weight' in name:
            all_grad_norm += para.grad.norm().cpu().item()
    result.append(all_grad_norm)
I want to know why it consumes the same time even though I only calculate gradients for some of the layers. I would also like to know how I can get the gradients of the last layer faster. I would really appreciate it if someone could help.
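For reference, here is a minimal sketch of one thing that might be worth trying. It rests on an assumption about the setup that the snippets above do not show: if `loss` was computed before `requires_grad` was changed, the recorded graph likely still covers every layer, so `backward()` would still walk through all of them. The sketch freezes the earlier layers before the forward pass and asks `torch.autograd.grad` for the last-layer weight gradients only. The small `nn.Sequential` model, the random `inputs` and `targets`, and the cross-entropy loss are hypothetical stand-ins, not the real network or data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for `net`; the real model is not shown in the post.
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
last_layer = list(net.children())[-1]

# Freeze everything except the last layer BEFORE the forward pass, so that
# autograd never records a backward path into the earlier layers.
for param in net.parameters():
    param.requires_grad = False
for param in last_layer.parameters():
    param.requires_grad = True

inputs = torch.randn(5, 8)            # hypothetical batch
targets = torch.randint(0, 4, (5,))   # hypothetical labels

# Per-sample losses, computed after freezing (assumed loss function).
loss = nn.functional.cross_entropy(net(inputs), targets, reduction="none")

# Only the weight tensors of the last layer, matching the 'weight' filter above.
last_weights = [p for n, p in last_layer.named_parameters() if "weight" in n]

result = []
for i in range(len(inputs)):
    # autograd.grad returns gradients only for the listed tensors and does
    # not write to .grad, so no optimizer.zero_grad() call is needed here.
    grads = torch.autograd.grad(loss[i], last_weights, retain_graph=True)
    result.append(sum(g.norm().item() for g in grads))
```

Whether this is actually faster will depend on the model. The backward pass through the last layer still has to run once per sample, so the saving comes only from skipping the earlier layers.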