How can I modify the output gradients (dE/dO) of each layer before the input gradients (dE/dI) are computed during back-propagation? I need to mask the output gradients so that dE/dI = w * mask1(dE/dO). I looked at register_backward_hook(), but according to the documentation it only lets me modify the input gradients dE/dI after they have been calculated. The same question applies to masking the output gradients before the weight gradients are computed: dE/dW = I * mask2(dE/dO). Note that mask1 and mask2 can be different.
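
To make the target behaviour concrete, here is a minimal sketch for a single bias-free linear layer, written as a custom torch.autograd.Function. The class name MaskedLinearFn, the tensor shapes, and the choice of elementwise 0/1 masks are assumptions made only for illustration; mask1 and mask2 could be any functions of dE/dO, and this may not be the only or the recommended way to get this effect.

```python
import torch


class MaskedLinearFn(torch.autograd.Function):
    """Hypothetical linear op O = I @ W.T with separately masked backward paths."""

    @staticmethod
    def forward(ctx, inp, weight, mask1, mask2):
        # Keep everything the backward pass needs.
        ctx.save_for_backward(inp, weight, mask1, mask2)
        return inp @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight, mask1, mask2 = ctx.saved_tensors
        # dE/dI = mask1(dE/dO) @ W  (mask assumed to be an elementwise multiply)
        grad_inp = (grad_out * mask1) @ weight
        # dE/dW = mask2(dE/dO)^T @ I
        grad_weight = (grad_out * mask2).t() @ inp
        # The masks themselves receive no gradient.
        return grad_inp, grad_weight, None, None


torch.manual_seed(0)
inp = torch.randn(4, 3, requires_grad=True)     # batch of 4, 3 input features
weight = torch.randn(2, 3, requires_grad=True)  # 2 output features

# Illustrative masks: mask1 keeps output unit 0, mask2 keeps output unit 1.
mask1 = torch.tensor([[1.0, 0.0]]).expand(4, 2)
mask2 = torch.tensor([[0.0, 1.0]]).expand(4, 2)

out = MaskedLinearFn.apply(inp, weight, mask1, mask2)
out.sum().backward()

print(inp.grad)     # only output unit 0 contributes to dE/dI
print(weight.grad)  # only output unit 1 contributes to dE/dW
```

Because both gradients are computed by hand inside backward(), the two masks are applied independently, which is the part a hook that runs after dE/dI is computed does not seem to allow.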