I am trying to replace Convolution ops with Similarity ops defined by Deep SimNet

which changes

$$y = \sum_i w_i \, x_i$$

to

$$y = \sum_i w_i \, |x_i - t_i|$$
It just adds an L1 distance term alongside the multiplication in the convolution. t has the same shape as w.

It looks like this in code:

```
# N, F, C, H, W, HH, WW, H_out, W_out are:
# batch_size, out_channel, in_channel, in_height, in_width, kernel_height, kernel_width, out_height, out_width
x = Im2Col.apply(x, self.kernel_size, self.dilation, self.padding, self.stride)
x = x.unsqueeze(1) # x.shape = N, 1, C*HH*WW, H_out*W_out
w = self.weight.view(1, F, C*HH*WW, 1) # w.shape = 1, F, C*HH*WW, 1
t = self.tamplate.view_as(w) # t.shape = w.shape
```

where x, w and t are the input, weight and template in the equation above.
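To make the shapes concrete, here is a minimal sketch of what broadcasting `x` against `w` (or `t`) produces. The sizes are hypothetical and are not the ones from my network, `broadcast_shape` is an illustrative helper rather than a PyTorch function, and float32 (4 bytes per element) is assumed.

```python
from math import prod

def broadcast_shape(a, b):
    """Broadcast two equal-rank shapes: a size-1 dim stretches to match the other."""
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(db if da == 1 else da)
    return tuple(out)

# Hypothetical sizes, chosen only for illustration.
N, F, C, HH, WW, H_out, W_out = 8, 64, 64, 3, 3, 8, 8
K, L = C * HH * WW, H_out * W_out

x_shape = (N, 1, K, L)   # im2col output after unsqueeze(1)
w_shape = (1, F, K, 1)   # weight (and template) after view

full_shape = broadcast_shape(x_shape, w_shape)  # shape of x.sub(t) or w.mul(x)
full_elems = prod(full_shape)
full_mib = full_elems * 4 / 2**20               # assuming float32

print(full_shape, full_elems, f"{full_mib:.0f} MiB")
```

Any elementwise op between `x` and `w` or `t` yields a tensor of the full `(N, F, C*HH*WW, H_out*W_out)` shape, which is `F` times larger than `x` itself.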

The plain convolution looks like:

```
x = w.mul(x).sum(-2)
x = x.view(N,F,H_out, W_out)
```

which takes about 700 MB under torch.no_grad() and 1200 MB in the training phase. That is similar to torch.nn.Conv2d (although, as far as I know, torch.nn.Conv2d may use a different algorithm such as FFT, while I use im2col here).

But the Similarity looks like:

```
x = x.sub(t).abs().mul(w).sum(-2)
x = x.view(N,F,H_out, W_out)
```

which takes about 1500 MB under torch.no_grad() and over 10000 MB (10 GB!) in the training phase.
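One possible way to reason about the gap is to count the tensors that have to stay alive for the backward pass in each variant. The sketch below is only shape arithmetic, not a measurement. It assumes that autograd keeps the inputs of `abs()` and `mul()` for backward while `sub()` and `sum()` keep no full-size tensor, and it reuses hypothetical sizes.

```python
from math import prod

# Hypothetical sizes, chosen only for illustration.
N, F, K, L = 8, 64, 576, 64      # K = C*HH*WW, L = H_out*W_out

x_elems = prod((N, 1, K, L))     # im2col input
w_elems = prod((1, F, K, 1))     # weight (template has the same size)
full_elems = prod((N, F, K, L))  # any broadcast result of x with w or t

# Convolution: w.mul(x).sum(-2)
# Assumption: mul keeps only its two inputs, x and w, both small.
conv_saved = x_elems + w_elems

# Similarity: x.sub(t).abs().mul(w).sum(-2)
# Assumption: abs keeps its input (x - t, full size) and
# mul keeps its inputs (|x - t|, full size, and w).
sim_saved = 2 * full_elems + w_elems

print("conv elements kept for backward:", conv_saved)
print("sim  elements kept for backward:", sim_saved)
print("ratio:", sim_saved / conv_saved)
```

If that assumption holds, each Similarity layer keeps two full-size tensors per forward pass, whereas the convolution only allocates one full-size temporary that can be freed right after the `sum`.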

Why? I only added sub(t).abs() before mul(w). Why does that make training use almost 10x more memory?

BTW, the net I use is resnet18, with all the Conv2d layers replaced by Similarity layers.