I am trying to understand the implementation of AdderNet, specifically this file: AdderNet/adder.py at master · huawei-noah/AdderNet · GitHub
In particular the backward pass: can someone elaborate on `X_col` and `W_col`, and how they relate to the AdderNet paper: https://arxiv.org/pdf/1912.13200.pdf
```python
@staticmethod
def backward(ctx, grad_output):
    W_col, X_col = ctx.saved_tensors
    grad_W_col = ((X_col.unsqueeze(0) - W_col.unsqueeze(2)) * grad_output.unsqueeze(1)).sum(2)
    grad_W_col = grad_W_col / grad_W_col.norm(p=2).clamp(min=1e-12) * math.sqrt(W_col.size(1) * W_col.size(0)) / 5
    grad_X_col = (-(X_col.unsqueeze(0) - W_col.unsqueeze(2)).clamp(-1, 1) * grad_output.unsqueeze(1)).sum(0)
    return grad_W_col, grad_X_col
```
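To make my question concrete, here is a toy NumPy sketch of how I currently read the shapes and the two gradient rules (this is my own illustration, not code from the repo; the shape names `ckk` and `L` are my assumptions for channels×kernel×kernel and number of patch locations after unfold):

```python
import numpy as np

# My reading of adder.py: after im2col/unfold, X_col has shape (c*k*k, L)
# where L = N * h_out * w_out, and W_col has shape (n_filters, c*k*k).
rng = np.random.default_rng(0)
ckk, L, n_filters = 4, 6, 3
X_col = rng.standard_normal((ckk, L))
W_col = rng.standard_normal((n_filters, ckk))

# Forward: negative L1 distance between every filter and every patch column.
diff = X_col[None, :, :] - W_col[:, :, None]   # (n_filters, ckk, L)
out = -np.abs(diff).sum(axis=1)                # (n_filters, L)

grad_output = np.ones_like(out)                # pretend upstream gradient

# Paper Eq. (8): full-precision gradient w.r.t. the filters, dY/dF = X - F.
grad_W_col = (diff * grad_output[:, None, :]).sum(axis=2)                   # (n_filters, ckk)

# Paper Eq. (9): clipped gradient w.r.t. the input, dY/dX = HardTanh(F - X).
grad_X_col = (np.clip(-diff, -1, 1) * grad_output[:, None, :]).sum(axis=0)  # (ckk, L)

print(grad_W_col.shape, grad_X_col.shape)
```

If this reading is right, the extra `norm(...)` scaling on `grad_W_col` in the repo would then be the adaptive learning-rate trick from Section 3.3 of the paper, which my sketch leaves out.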
In particular, I don't really understand how the HardTanh function (stated in Section 3.2 of the paper) is executed here, and what the backprop has to do with `X_col`, since as I understand it `X_col` is just the input patches laid out as columns.
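My current guess is that the `.clamp(-1, 1)` in `grad_X_col` is exactly the HardTanh from Eq. (9), applied elementwise to F − X. A tiny self-check of that identity (my own toy code, not from the repo):

```python
import numpy as np

def hardtanh(x):
    # HT(x) = x for -1 <= x <= 1, clipped to +/-1 outside (Eq. (9) in the paper)
    return np.clip(x, -1.0, 1.0)

x = np.array([-2.5, -1.0, -0.3, 0.0, 0.7, 1.0, 4.2])
print(hardtanh(x))

# Since clamp is odd-symmetric, clamp(-t, -1, 1) == -clamp(t, -1, 1),
# so -(X_col - W_col).clamp(-1, 1) is the same as HardTanh(W_col - X_col).
t = np.array([-3.0, 0.5, 2.0])
print(np.allclose(np.clip(-t, -1, 1), -np.clip(t, -1, 1)))
```

Is that the right way to read it, i.e. the clamp is the whole HardTanh and there is no separate activation call?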