I am trying to understand the implementation of AdderNet, specifically this file: AdderNet/adder.py at master · huawei-noah/AdderNet · GitHub
In particular the backward pass: can someone elaborate on `X_col` and `W_col`, and how they relate to the AdderNet paper: https://arxiv.org/pdf/1912.13200.pdf
```python
@staticmethod
def backward(ctx, grad_output):
    W_col, X_col = ctx.saved_tensors
    grad_W_col = ((X_col.unsqueeze(0) - W_col.unsqueeze(2)) * grad_output.unsqueeze(1)).sum(2)
    grad_W_col = grad_W_col / grad_W_col.norm(p=2).clamp(min=1e-12) * math.sqrt(W_col.size(1) * W_col.size(0)) / 5
    grad_X_col = (-(X_col.unsqueeze(0) - W_col.unsqueeze(2)).clamp(-1, 1) * grad_output.unsqueeze(1)).sum(0)
    return grad_W_col, grad_X_col
```
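To make my question concrete, here is a toy NumPy sketch of how I currently read the shapes and the two gradient rules (this is my own illustration, not code from the repo; the shape names `ckk` and `L` are my assumptions for channels×kernel×kernel and number of patch locations after unfold):

```python
import numpy as np

# My reading of adder.py: after im2col/unfold, X_col has shape (c*k*k, L)
# where L = N * h_out * w_out, and W_col has shape (n_filters, c*k*k).
rng = np.random.default_rng(0)
ckk, L, n_filters = 4, 6, 3
X_col = rng.standard_normal((ckk, L))
W_col = rng.standard_normal((n_filters, ckk))

# Forward: negative L1 distance between every filter and every patch column.
diff = X_col[None, :, :] - W_col[:, :, None]   # (n_filters, ckk, L)
out = -np.abs(diff).sum(axis=1)                # (n_filters, L)

grad_output = np.ones_like(out)                # pretend upstream gradient

# Paper Eq. (8): full-precision gradient w.r.t. the filters, dY/dF = X - F.
grad_W_col = (diff * grad_output[:, None, :]).sum(axis=2)                   # (n_filters, ckk)

# Paper Eq. (9): clipped gradient w.r.t. the input, dY/dX = HardTanh(F - X).
grad_X_col = (np.clip(-diff, -1, 1) * grad_output[:, None, :]).sum(axis=0)  # (ckk, L)

print(grad_W_col.shape, grad_X_col.shape)
```

If this reading is right, the extra `norm(...)` scaling on `grad_W_col` in the repo would then be the adaptive learning-rate trick from Section 3.3 of the paper, which my sketch leaves out.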
In particular, I don't really understand how the HardTanh function (stated in Section 3.2 of the paper) is executed here, and what the backprop has to do with `X_col`, since as I understand it `X_col` is just the input patches laid out as columns.
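My current guess is that the `.clamp(-1, 1)` in `grad_X_col` is exactly the HardTanh from Eq. (9), applied elementwise to F − X. A tiny self-check of that identity (my own toy code, not from the repo):

```python
import numpy as np

def hardtanh(x):
    # HT(x) = x for -1 <= x <= 1, clipped to +/-1 outside (Eq. (9) in the paper)
    return np.clip(x, -1.0, 1.0)

x = np.array([-2.5, -1.0, -0.3, 0.0, 0.7, 1.0, 4.2])
print(hardtanh(x))

# Since clamp is odd-symmetric, clamp(-t, -1, 1) == -clamp(t, -1, 1),
# so -(X_col - W_col).clamp(-1, 1) is the same as HardTanh(W_col - X_col).
t = np.array([-3.0, 0.5, 2.0])
print(np.allclose(np.clip(-t, -1, 1), -np.clip(t, -1, 1)))
```

Is that the right way to read it, i.e. the clamp is the whole HardTanh and there is no separate activation call?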