Dear @ptrblck,
A few months ago I created this topic. I re-implemented your code snippet for replacing the standard conv layers in AlexNet with the new one discussed in this topic. However, when I use the torch.allclose function (from your code) to check whether the outputs of the standard conv layer and the modified one are equal, the result is always False, even though the program runs without any error. That is, the input passes through all layers without error, but torch.allclose always returns False. The difference between your code and mine is that you set the bias parameter to False for simplification, while my code also considers the bias values.
I provide my code below. I would really appreciate your help: what is wrong with my code that makes the outputs differ?
Thank you.
The code snippet for modifying the conv layers:
for name, module in alex_net._modules.items():
    if name == 'features':
        for n, m in module._modules.items():
            if type(m) == torch.nn.modules.conv.Conv2d:
                alex_net._modules[name][int(n)] = custom_conv(alex_net._modules[name][int(n)])
print(alex_net)
Output (I keep the standard conv layer inside the custom_conv class so that I can compare their outputs):
AlexNet(
  (features): Sequential(
    (0): custom_conv(
      (original_conv): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
      (conv_grouped): Conv2d(3, 192, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2), groups=3)
    )
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): custom_conv(
      (original_conv): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (conv_grouped): Conv2d(64, 12288, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=64)
    )
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): custom_conv(
      (original_conv): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv_grouped): Conv2d(192, 73728, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192)
    )
    (7): ReLU(inplace=True)
    (8): custom_conv(
      (original_conv): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv_grouped): Conv2d(384, 98304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384)
    )
    (9): ReLU(inplace=True)
    (10): custom_conv(
      (original_conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv_grouped): Conv2d(256, 65536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
    )
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
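For reference, the replacement loop above can be reproduced on a small stand-in model; this is a minimal sketch with hypothetical layer sizes and a bare wrapper in place of custom_conv, so it runs without torchvision:

```python
import torch
import torch.nn as nn

# Small stand-in for AlexNet's features block (hypothetical sizes,
# not the real AlexNet ones).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)

class Wrapper(nn.Module):
    """Minimal placeholder for the custom_conv wrapper from the post."""
    def __init__(self, conv):
        super().__init__()
        self.original_conv = conv

    def forward(self, x):
        return self.original_conv(x)

# Replace every Conv2d in place, mirroring the loop over alex_net._modules.
for n, m in list(model._modules.items()):
    if isinstance(m, nn.Conv2d):
        model._modules[n] = Wrapper(m)

print(model)  # the Conv2d entries now appear as Wrapper(...)
out = model(torch.randn(1, 3, 8, 8))
print(out.shape)  # torch.Size([1, 16, 8, 8])
```

The `list(...)` around `.items()` avoids mutating the dict while iterating over it; the same pattern applies to the nested `features` loop on the real AlexNet.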
The custom conv layer implementation:
class custom_conv(nn.Module):
    def __init__(self, conv_module):
        super(custom_conv, self).__init__()
        self.original_conv = conv_module
        self.in_channels = conv_module.in_channels
        self.out_channels = conv_module.out_channels
        self.kernel_size = conv_module.kernel_size
        self.stride = conv_module.stride
        self.padding = conv_module.padding
        self.conv_grouped = nn.Conv2d(self.in_channels, self.in_channels * self.out_channels,
                                      self.kernel_size, self.stride, self.padding,
                                      groups=self.in_channels, bias=True)
        with torch.no_grad():
            self.conv_grouped.weight.copy_(
                conv_module.weight.permute(1, 0, 2, 3).reshape(
                    self.in_channels * self.out_channels, 1,
                    self.kernel_size[0], self.kernel_size[1]))

    def __call__(self, x):
        out = self.original_conv(x)
        # print('original output ', out.shape)
        N = out.shape[0]
        H = out.shape[2]
        W = out.shape[3]
        out_grouped = self.conv_grouped(x)
        # print(out_grouped.shape)
        out_grouped = out_grouped.view(N, self.in_channels, self.out_channels, H, W).\
            permute(0, 2, 1, 3, 4).reshape(N, self.in_channels * self.out_channels, H, W)
        idx = torch.arange(self.out_channels)
        idx = torch.repeat_interleave(idx, self.in_channels)
        idx = idx[None, :, None, None].expand(N, -1, H, W)
        out_grouped_reduced = torch.zeros_like(out)
        out_grouped_reduced.scatter_add_(dim=1, index=idx, src=out_grouped)
        print(torch.allclose(out_grouped_reduced, out, atol=5e-6),
              (out_grouped_reduced - out).abs().max())
        return out_grouped_reduced, out
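As a sanity check of the channel-wise decomposition itself, here is a minimal sketch with small hypothetical sizes and bias=False on both convs (i.e. with your simplification from the original snippet): summing the grouped conv's per-input-channel maps reproduces the standard conv exactly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small sizes for illustration.
cin, cout, k = 3, 8, 3
conv = nn.Conv2d(cin, cout, k, padding=1, bias=False)

# One group per input channel; each group produces cout maps.
grouped = nn.Conv2d(cin, cin * cout, k, padding=1, groups=cin, bias=False)
with torch.no_grad():
    # Group g holds the filters W[:, g] of the standard conv,
    # same permute/reshape as in custom_conv above.
    grouped.weight.copy_(conv.weight.permute(1, 0, 2, 3).reshape(cin * cout, 1, k, k))

x = torch.randn(1, cin, 16, 16)
ref = conv(x)
# Grouped output layout is (N, cin, cout, H, W); summing over the
# input-channel axis recovers the standard conv output.
out = grouped(x).view(1, cin, cout, 16, 16).sum(dim=1)
print(torch.allclose(ref, out, atol=1e-5))  # True
```

With bias enabled, note that the grouped conv carries cin independent bias vectors that all get accumulated into each output channel by the scatter_add, whereas the standard conv adds its bias exactly once, so the two paths are no longer computing the same quantity unless the biases are handled explicitly.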
I pass a random input with shape [1, 3, 224, 224], and the output of my code is as follows:
layer index: 0
False tensor(0.2228, grad_fn=)
original shape : torch.Size([1, 64, 55, 55])
grouped shape: torch.Size([1, 64, 55, 55])
layer index: 1
original shape : torch.Size([1, 64, 55, 55])
grouped shape: torch.Size([1, 64, 55, 55])
layer index: 2
original shape : torch.Size([1, 64, 27, 27])
grouped shape: torch.Size([1, 64, 27, 27])
layer index: 3
False tensor(2.6001, grad_fn=)
original shape : torch.Size([1, 192, 27, 27])
grouped shape: torch.Size([1, 192, 27, 27])
layer index: 4
original shape : torch.Size([1, 192, 27, 27])
grouped shape: torch.Size([1, 192, 27, 27])
layer index: 5
original shape : torch.Size([1, 192, 13, 13])
grouped shape: torch.Size([1, 192, 13, 13])
layer index: 6
False tensor(10.5771, grad_fn=)
original shape : torch.Size([1, 384, 13, 13])
grouped shape: torch.Size([1, 384, 13, 13])
layer index: 7
original shape : torch.Size([1, 384, 13, 13])
grouped shape: torch.Size([1, 384, 13, 13])
layer index: 8
False tensor(10.4743, grad_fn=)
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 9
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 10
False tensor(10.4024, grad_fn=)
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 11
original shape : torch.Size([1, 256, 13, 13])
grouped shape: torch.Size([1, 256, 13, 13])
layer index: 12
original shape : torch.Size([1, 256, 6, 6])
grouped shape: torch.Size([1, 256, 6, 6])
layer index: 12
original shape : torch.Size([1, 256, 6, 6])
grouped shape: torch.Size([1, 256, 6, 6])
layer index: 12
original shape : torch.Size([1, 1000])
grouped shape: torch.Size([1, 1000])