Multiple forward passes from a single input using a for loop

Hi,
Below is the smooth gradient function I wrote
to implement the interpretation method ‘smooth grad’.

    import torch

    # prediction() and simple_grad() are defined elsewhere in my code.
    def smooth_grad(inputs, labels, target_layer, args, by_label=True):
        iterations = args.smooth_num  # number of noisy copies of the input
        alpha = args.smooth_std       # standard deviation of the added noise
        R = None
        for _ in range(iterations):
            # randn_like creates the noise on the same device and dtype as inputs.
            inputs_noise = inputs + alpha * torch.randn_like(inputs)
            activation_output = prediction(inputs_noise)
            # If you want to train the model by using LR with smooth grad, remove .detach().
            grad = simple_grad(activation_output, labels, target_layer, args, no_R_process=True).detach()
            R = grad if R is None else R + grad

        return R

    # Do something with R

simple_grad is a function that calculates the gradient of activation_output with respect to inputs_noise with “forward propagation”. At the end, I am going to backpropagate through a result that I compute from R.
What I found is that even though I’m using vgg16 with an input of size 8x3x224x224 and
there are multiple forward passes (16 of them), it uses less than 11 GB of GPU memory.
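To make the setup concrete, here is a minimal sketch of what a simple_grad-style helper might look like. It is not my actual simple_grad: `toy_model`, `input_gradient` and the tiny tensor sizes are made up for illustration, and it assumes the gradient is taken with `torch.autograd.grad` on the logit of each sample's label.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the real network (the post uses vgg16).
toy_model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))

def input_gradient(model, x, labels, create_graph=False):
    """Hypothetical stand-in for simple_grad: d(label logit) / d(input)."""
    x = x.detach().requires_grad_(True)  # treat the noisy input as a leaf
    logits = model(x)
    # Pick each sample's logit for its label and sum them into a scalar.
    score = logits.gather(1, labels.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=create_graph)
    return grad

inputs = torch.randn(2, 4)
labels = torch.tensor([0, 2])
noisy = inputs + 0.1 * torch.randn_like(inputs)

g = input_gradient(toy_model, noisy, labels)
print(g.shape)          # same shape as the noisy input
print(g.requires_grad)  # False, because create_graph=False
```

With `create_graph=False` the returned gradient carries no graph, which is roughly the situation after calling `.detach()` on the result.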

Questions

  1. How is this possible?
  2. Is there any problem with using a for loop like this? Will backprop be able to flow through all 16 forward paths?
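One way to probe both questions on a small scale is the sketch below. It uses a toy model instead of vgg16, and `net`, `accumulate` and the `detach` flag are hypothetical names. It assumes the per-pass gradient comes from `torch.autograd.grad`, with `create_graph=True` when the result should stay differentiable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))  # toy model

def accumulate(inputs, labels, n=16, std=0.1, detach=True):
    """Sum input gradients over n noisy copies of the inputs."""
    total = torch.zeros_like(inputs)
    for _ in range(n):
        x = (inputs + std * torch.randn_like(inputs)).requires_grad_(True)
        score = net(x).gather(1, labels.unsqueeze(1)).sum()
        # create_graph=True keeps the gradient itself differentiable.
        (g,) = torch.autograd.grad(score, x, create_graph=not detach)
        total = total + (g.detach() if detach else g)
    return total

inputs = torch.randn(2, 4)
labels = torch.tensor([0, 2])

r_detached = accumulate(inputs, labels, detach=True)
r_attached = accumulate(inputs, labels, detach=False)

print(r_detached.requires_grad)  # False: no graph is kept across iterations
print(r_attached.requires_grad)  # True: all 16 passes stay in the graph

# Backprop through the attached result reaches the model parameters.
r_attached.pow(2).sum().backward()
first_weight_grad = net[0].weight.grad
```

As far as I can tell, the detached variant lets each pass's graph be freed before the next one starts, so peak memory stays closer to one pass than to sixteen. That would also mean nothing can flow back through R unless the `.detach()` calls are removed.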