Dear @ptrblck,

A few months ago, I created this topic.

I re-implemented your code snippet for replacing the standard conv. layer in AlexNet with the new one discussed in this topic. However, when I use the `torch.allclose` function (from your code) to check whether the outputs of the standard conv. layer and the modified one are equal, the result is always `False`, even though the program runs without any error. That is, the input passes through all the layers without error, but `torch.allclose` always returns `False`. The difference between your code and mine is that, for simplification, your code sets the bias parameter to `False`, while my code also keeps the bias values.
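To make the bias bookkeeping concrete, here is a toy sketch (hypothetical sizes and values, not the actual layers) of what the scatter-add does to per-slice biases. Each output channel receives the sum of `in_channels` grouped slices, so any bias the grouped conv carries is accumulated `in_channels` times, while the reference conv adds its bias exactly once:

```python
import torch

torch.manual_seed(0)
in_channels, out_channels = 3, 2

# Hypothetical per-slice biases of the grouped conv (one per (in, out) pair)
# and the reference conv's bias (one per output channel).
bias_grouped = torch.randn(in_channels, out_channels)
bias_ref = torch.randn(out_channels)

# After the scatter-add, each output channel contains the sum of the
# in_channels grouped biases that were folded into it.
accumulated = bias_grouped.sum(dim=0)

# The two paths can only agree if these accumulated biases reproduce the
# reference bias -- which a randomly initialized grouped bias will not do.
print(torch.allclose(accumulated, bias_ref))
```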

I have included my code below, and I would really appreciate your help. What is wrong with my code that makes the outputs differ?

Thank you.

The code snippet for modifying the conv layers:

```python
for name, module in alex_net._modules.items():
    if name == 'features':
        for n, m in module._modules.items():
            if type(m) == torch.nn.modules.conv.Conv2d:
                alex_net._modules[name][int(n)] = custom_conv(alex_net._modules[name][int(n)])
print(alex_net)
```
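For context, the same in-place replacement pattern can be sketched on a tiny stand-in `Sequential` (with a hypothetical `wrap_conv` wrapper, not the real AlexNet or the `custom_conv` below):

```python
import torch.nn as nn

# Hypothetical stand-in for the custom_conv wrapper described in this post.
class wrap_conv(nn.Module):
    def __init__(self, conv):
        super().__init__()
        self.original_conv = conv

    def forward(self, x):
        return self.original_conv(x)

# A tiny 'features'-style Sequential standing in for alex_net.features.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)

# Replace every Conv2d in place by assigning into the Sequential by index.
for i, m in enumerate(features):
    if isinstance(m, nn.Conv2d):
        features[i] = wrap_conv(m)

print(features)
```

Using `isinstance(m, nn.Conv2d)` instead of comparing `type(m)` directly is the more common idiom, though both work for the plain `Conv2d` layers in AlexNet.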

Output (I keep the standard conv. layer inside the `custom_conv` class so that I can compare their outputs):

```
AlexNet(
  (features): Sequential(
    (0): custom_conv(
      (original_conv): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
      (conv_grouped): Conv2d(3, 192, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2), groups=3)
    )
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): custom_conv(
      (original_conv): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (conv_grouped): Conv2d(64, 12288, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=64)
    )
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): custom_conv(
      (original_conv): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv_grouped): Conv2d(192, 73728, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192)
    )
    (7): ReLU(inplace=True)
    (8): custom_conv(
      (original_conv): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv_grouped): Conv2d(384, 98304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384)
    )
    (9): ReLU(inplace=True)
    (10): custom_conv(
      (original_conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv_grouped): Conv2d(256, 65536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
    )
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
```

The custom conv. layer implementation:

```python
import torch
import torch.nn as nn

class custom_conv(nn.Module):
    def __init__(self, conv_module):
        super(custom_conv, self).__init__()
        self.original_conv = conv_module
        self.in_channels = conv_module.in_channels
        self.out_channels = conv_module.out_channels
        self.kernel_size = conv_module.kernel_size
        self.stride = conv_module.stride
        self.padding = conv_module.padding
        self.conv_grouped = nn.Conv2d(self.in_channels, self.in_channels * self.out_channels,
                                      self.kernel_size, self.stride, self.padding,
                                      groups=self.in_channels, bias=True)
        with torch.no_grad():
            self.conv_grouped.weight.copy_(
                conv_module.weight.permute(1, 0, 2, 3).reshape(
                    self.in_channels * self.out_channels, 1,
                    self.kernel_size[0], self.kernel_size[1]))

    def __call__(self, x):
        out = self.original_conv(x)
        # print('original output ', out.shape)
        N = out.shape[0]
        H = out.shape[2]
        W = out.shape[3]
        out_grouped = self.conv_grouped(x)
        # print(out_grouped.shape)
        out_grouped = out_grouped.view(N, self.in_channels, self.out_channels, H, W).\
            permute(0, 2, 1, 3, 4).reshape(N, self.in_channels * self.out_channels, H, W)
        idx = torch.arange(self.out_channels)
        idx = torch.repeat_interleave(idx, self.in_channels)
        idx = idx[None, :, None, None].expand(N, -1, H, W)
        out_grouped_reduced = torch.zeros_like(out)
        out_grouped_reduced.scatter_add_(dim=1, index=idx, src=out_grouped)
        print(torch.allclose(out_grouped_reduced, out, atol=5e-6),
              (out_grouped_reduced - out).abs().max())
        return out_grouped_reduced, out
```
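For comparison, here is a minimal standalone sketch of the same grouped-conv decomposition (my own reconstruction with small hypothetical sizes, not the class above) in which the grouped conv carries no bias of its own and the original conv's bias is added back exactly once after the scatter-add. In this form the two outputs do match:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_channels, out_channels = 3, 4          # small hypothetical sizes
x = torch.randn(2, in_channels, 8, 8)

# Reference conv with a bias.
conv = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=True)

# Grouped decomposition: one output channel per (input, output) pair.
# Note bias=False -- the grouped conv must not add its own bias terms.
grouped = nn.Conv2d(in_channels, in_channels * out_channels, 3,
                    padding=1, groups=in_channels, bias=False)
with torch.no_grad():
    grouped.weight.copy_(
        conv.weight.permute(1, 0, 2, 3).reshape(
            in_channels * out_channels, 1, 3, 3))

out = conv(x)
g = grouped(x)
N, _, H, W = g.shape
g = g.view(N, in_channels, out_channels, H, W).permute(0, 2, 1, 3, 4) \
     .reshape(N, in_channels * out_channels, H, W)
idx = torch.repeat_interleave(torch.arange(out_channels), in_channels)
idx = idx[None, :, None, None].expand(N, -1, H, W)
summed = torch.zeros_like(out)
summed.scatter_add_(dim=1, index=idx, src=g)

# Add the original conv's bias back exactly once per output channel.
summed = summed + conv.bias[None, :, None, None]
print(torch.allclose(summed, out, atol=1e-5))
```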

I pass a random input with shape `[1, 3, 224, 224]`, and the output of my code is as follows:

```
layer index: 0
False tensor(0.2228, grad_fn=)
original shape : torch.Size([1, 64, 55, 55])
grouped shape: torch.Size([1, 64, 55, 55])
layer index: 1
original shape : torch.Size([1, 64, 55, 55])
grouped shape: torch.Size([1, 64, 55, 55])
layer index: 2
original shape : torch.Size([1, 64, 27, 27])
grouped shape: torch.Size([1, 64, 27, 27])
layer index: 3
False tensor(2.6001, grad_fn=)
original shape : torch.Size([1, 192, 27, 27])
grouped shape: torch.Size([1, 192, 27, 27])
layer index: 4
original shape : torch.Size([1, 192, 27, 27])
grouped shape: torch.Size([1, 192, 27, 27])
layer index: 5
original shape : torch.Size([1, 192, 13, 13])
grouped shape: torch.Size([1, 192, 13, 13])
layer index: 6
False tensor(10.5771, grad_fn=)
original shape : torch.Size([1, 384, 13, 13])
grouped shape: torch.Size([1, 384, 13, 13])
layer index: 7
original shape : torch.Size([1, 384, 13, 13])
grouped shape: torch.Size([1, 384, 13, 13])
layer index: 8
False tensor(10.4743, grad_fn=)
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 9
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 10
False tensor(10.4024, grad_fn=)
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 11
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 12
original shape : torch.Size([1, 256, 6, 6])
grouped shape: torch.Size([1, 256, 6, 6])
layer index: 12
original shape : torch.Size([1, 256, 6, 6])
grouped shape: torch.Size([1, 256, 6, 6])
layer index: 12
original shape : torch.Size([1, 1000])
grouped shape: torch.Size([1, 1000])
```