Hi,
below is a function I wrote
to implement the interpretation method ‘smooth grad’.
def smooth_grad(inputs, labels, target_layer, args, by_label=True):
    iterations = args.smooth_num
    alpha = args.smooth_std
    for i in range(iterations):
        inputs_noise = inputs + alpha*torch.randn(inputs.shape).cuda()
        activation_output = prediction(inputs_noise)
        if i == 0:
            R = simple_grad(activation_output, labels, target_layer, args, no_R_process=True).detach()
        else:
            # If you want to train the model by using LR with smooth grad, then you need to remove detach() function.
            R += simple_grad(activation_output, labels, target_layer, args, no_R_process=True).detach()
    return R
# Do something with R
The simple_grad function calculates the gradient of activation_output with respect to inputs_noise, using “forward propagation”. At the end, I’m going to backpropagate through the result that I compute from R.
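To make the role of detach() concrete, here is a minimal sketch of the same loop on a tiny CPU model. Everything in it is an assumption for illustration: the small nn.Sequential stands in for vgg16, simple_grad_sketch and smooth_grad_sketch are hypothetical names, and simple_grad_sketch is only my guess at what simple_grad does (gradient of the labelled logit with respect to the noisy input, via torch.autograd.grad). It also averages over iterations, whereas the loop above sums.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network (the post uses vgg16).
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))


def simple_grad_sketch(inputs_noise, labels, create_graph):
    """Assumed behaviour: gradient of the labelled logit w.r.t. the noisy input."""
    logits = model(inputs_noise)
    score = logits.gather(1, labels.view(-1, 1)).sum()
    # create_graph=True keeps the graph of the gradient itself, so that a
    # later backward() can flow through it into the model parameters.
    (grad,) = torch.autograd.grad(score, inputs_noise, create_graph=create_graph)
    return grad


def smooth_grad_sketch(inputs, labels, iterations=4, std=0.1, keep_graph=False):
    R = torch.zeros_like(inputs)
    for _ in range(iterations):
        noise = std * torch.randn_like(inputs)
        inputs_noise = (inputs + noise).requires_grad_(True)
        g = simple_grad_sketch(inputs_noise, labels, create_graph=keep_graph)
        # With detach(), each iteration's graph can be freed right away,
        # so only one forward pass is alive at a time.
        R = R + (g if keep_graph else g.detach())
    return R / iterations


x = torch.randn(2, 4)
y = torch.tensor([0, 2])

R_detached = smooth_grad_sketch(x, y, keep_graph=False)
print(R_detached.requires_grad)  # detached result carries no graph

R_live = smooth_grad_sketch(x, y, keep_graph=True)
R_live.pow(2).sum().backward()   # backprop through every iteration
print(model[0].weight.grad is not None)
```

In this sketch the detached variant never holds more than one iteration's graph, while the keep_graph=True variant keeps all of them until backward() is called, so its memory use grows with the number of iterations.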
What I found is that even though I’m using vgg16 with an input of size 8x3x224x224 and
running multiple forward passes (16 of them), it takes less than 11 GB of GPU memory.
Questions
- How is this possible?
- Is there any problem with using a for loop like this? Will backprop be able to flow through all 16 forward passes?