Down below is the smooth gradient function I coded
to implement the interpretation method 'smooth grad' (SmoothGrad).
```python
def smooth_grad(inputs, labels, target_layer, args, by_label=True):
    iterations = args.smooth_num
    alpha = args.smooth_std
    for i in range(iterations):
        inputs_noise = inputs + alpha * torch.randn(inputs.shape).cuda()
        activation_output = prediction(inputs_noise)
        if i == 0:
            R = simple_grad(activation_output, labels, target_layer, args, no_R_process=True).detach()
        else:
            # If you want to train the model by using LR with smooth grad, then you need to remove the detach() call.
            R += simple_grad(activation_output, labels, target_layer, args, no_R_process=True).detach()
    return R  # Do something with R
```
The simple_grad function calculates the gradient of activation_output with respect to inputs_noise during the "forward propagation". At the end, I'm going to backpropagate with the result I build from R.
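To make the gradient part concrete, this is essentially the pattern I mean (a minimal standalone sketch, not my actual simple_grad; the model, tensor names, and shapes here are just placeholders):

```python
import torch
import torchvision

# Placeholder model and data, only to illustrate the gradient call.
model = torchvision.models.vgg16().cuda().eval()
inputs_noise = torch.randn(8, 3, 224, 224, device="cuda", requires_grad=True)
labels = torch.randint(0, 1000, (8,), device="cuda")

logits = model(inputs_noise)                        # forward pass
score = logits.gather(1, labels.view(-1, 1)).sum()  # target-class scores

# Gradient of the target scores w.r.t. the noisy input.
# create_graph=True keeps this gradient in the autograd graph, so a later
# backward pass on R can still flow through it (unless detach() cuts it off).
grad = torch.autograd.grad(score, inputs_noise, create_graph=True)[0]
```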
What I found is that even though I'm using VGG16 with an 8x3x224x224 input and
there are multiple forward passes (16 of them), it uses less than 11 GB of GPU memory.
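For reference, the peak allocation can be checked like this (a sketch using the standard torch.cuda memory counters; smooth_grad and the tensors are the ones defined above):

```python
import torch

torch.cuda.reset_peak_memory_stats()
R = smooth_grad(inputs, labels, target_layer, args)
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory allocated: {peak_gb:.2f} GB")
```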
- How is this possible?
- Is there any problem with the way I used the for loop like this? Will backprop be able to flow through all 16 of those forward paths?