Hi, I'm new to PyTorch.
Starting from a simple CNN architecture, I would like to apply a pruning task (one that keeps only some nodes, similar to dropout) layer-wise in my network. By building a mask with the same shape as the gradient of the input/output (the activation map), I want to modify the weights or gradients during the backpropagation step.
The flow I am hoping for is as follows:
- forward pass,
- loss calculation,
- making a mask for each layer using the weights (W), bias (b), gradients of the input (dL/dx) / output (dL/dy), etc.,
- modifying the gradients with that mask, which consists of 1s (meaningful nodes) and 0s (meaningless ones),
- updating the weights layer-wise using the modified gradients.
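Since the masks are built after the loss (and its gradients) are available, the steps above can be sketched without any hooks at all: run `loss.backward()`, build a mask per parameter, multiply it into `.grad` in place, then call `optimizer.step()`. This is a minimal sketch with a toy network; the magnitude-based mask criterion here is a hypothetical placeholder for whatever criterion you derive from W, b, dL/dx, dL/dy:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 8 * 8, 10),
)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(2, 1, 8, 8)
target = torch.randint(0, 10, (2,))

out = net(x)                                      # 1) forward pass
loss = nn.functional.cross_entropy(out, target)   # 2) loss calculation
loss.backward()                                   # populates p.grad for every parameter

with torch.no_grad():
    for p in net.parameters():
        if p.grad is None:
            continue
        # 3) hypothetical mask: keep gradient entries at or above the median magnitude
        mask = (p.grad.abs() >= p.grad.abs().median()).float()
        # 4) zero out the "meaningless" gradient entries in place
        p.grad.mul_(mask)

opt.step()                                        # 5) update uses only surviving gradients
```

Because the modification happens between `backward()` and `step()`, the optimizer never sees the masked-out gradient entries, which is exactly the "update using the modified gradients" step.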
I have already read similar Q&As about this in the PyTorch community and tried to apply them, as below.
Case 1) Using register_backward_hook: the examples I found only multiply the gradients by a scalar (e.g. gradients * 10). I can get the gradient of the input (dL/dx) or output (dL/dy), which has the same size as my mask, but I don't know how to write the modified gradient back manually.
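For Case 1, a backward hook can do more than scale by a scalar: if the hook returns a tuple, that tuple replaces grad_input and is what gets propagated further upstream. A sketch using `register_full_backward_hook` (the non-deprecated successor of `register_backward_hook`, available since PyTorch 1.8), with a hypothetical sign-based mask standing in for your real criterion:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 2, 3, padding=1)

def mask_grad_input(module, grad_input, grad_output):
    # grad_input is a tuple of gradients w.r.t. the module's forward inputs;
    # for Conv2d its only entry is dL/dx.
    gi = grad_input[0]
    if gi is None:
        return None
    # Hypothetical mask: keep only positions with a positive gradient.
    mask = (gi > 0).float()
    # Returning a tuple overrides the grad_input propagated upstream.
    return (gi * mask,) + grad_input[1:]

handle = conv.register_full_backward_hook(mask_grad_input)

x = torch.randn(1, 1, 5, 5, requires_grad=True)
conv(x).sum().backward()
# x.grad now holds the masked dL/dx that the hook returned.
handle.remove()
```

Note that the full backward hook only lets you rewrite the gradients flowing to the module's inputs, not `weight.grad` directly; for parameter gradients the post-`backward()` approach is simpler.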
Case 2) Writing my own function class with torch.autograd.Function, as below. However, 1) I couldn't find an example for the convolutional case (we can't debug the internals of a conv layer in detail), and 2) since all I want is to apply a pre-computed mask to the gradients, this approach is hard for me to follow.
```python
class my_function(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias):
        # Reimplement forward; stash what backward needs via ctx.save_for_backward.
        ctx.save_for_backward(input, weight, bias)
        if input.dim() == 2 and bias is not None:
            # Fused op is marginally faster.
            return torch.addmm(bias, input, weight.t())
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors  # saved_variables is deprecated
        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)
        # Return one gradient per forward argument, in the same order.
        return grad_input, grad_weight, grad_bias
```
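For the convolutional case, the same pattern works if the backward is expressed with the `torch.nn.grad.conv2d_input` / `conv2d_weight` helpers. A sketch, under the assumption that the mask (same shape as dL/dy) is precomputed and simply passed in as an extra forward argument:

```python
import torch
from torch.nn.grad import conv2d_input, conv2d_weight

class MaskedConv2dFn(torch.autograd.Function):
    """Conv2d whose backward multiplies dL/dy by a precomputed 0/1 mask."""

    @staticmethod
    def forward(ctx, input, weight, bias, mask):
        ctx.save_for_backward(input, weight, bias, mask)
        return torch.nn.functional.conv2d(input, weight, bias, padding=1)

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias, mask = ctx.saved_tensors
        grad_output = grad_output * mask  # sparsify dL/dy before everything else
        grad_input = conv2d_input(input.shape, weight, grad_output, padding=1)
        grad_weight = conv2d_weight(input, weight.shape, grad_output, padding=1)
        grad_bias = grad_output.sum(dim=(0, 2, 3)) if bias is not None else None
        return grad_input, grad_weight, grad_bias, None  # no gradient for the mask

torch.manual_seed(0)
x = torch.randn(1, 2, 6, 6, requires_grad=True)
w = torch.randn(3, 2, 3, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
mask = (torch.rand(1, 3, 6, 6) > 0.5).float()  # same shape as the output dL/dy

out = MaskedConv2dFn.apply(x, w, b, mask)
out.sum().backward()
```

Masking dL/dy once, before computing grad_input/grad_weight/grad_bias from it, means every downstream gradient only sees the surviving positions, which matches the spirit of the Sparsified Back Propagation paper.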
I am now referring to the ICML 2017 paper on Sparsified Back Propagation [https://arxiv.org/abs/1706.06197].
Any advice on how to implement this concept would be very helpful. Thank you!